Data compression and generative modeling are two fundamentally related tasks. Intuitively, the essence of compression is to find the “patterns” in the data and assign fewer bits to the more frequent ones. Knowing exactly how frequently each pattern occurs requires a good probabilistic model of the data distribution, which is precisely the objective of (likelihood-based) generative modeling. Motivated by this connection, we study image and video compression from the perspective of probabilistic generative modeling.
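To make this connection concrete, here is a minimal sketch (a toy illustration, not taken from the publications): the average rate of an ideal entropy coder driven by a probabilistic model equals the cross-entropy between the data distribution and the model, so minimizing bits is the same as maximizing likelihood. The alphabet and distributions below are invented for the example.

```python
import numpy as np

# Under an ideal entropy coder, a symbol x is coded with about
# -log2 p_model(x) bits, so the average rate is the cross-entropy
# H(p_data, p_model).

rng = np.random.default_rng(0)

p_data  = np.array([0.6, 0.3, 0.1])   # true (unknown) symbol frequencies
p_model = np.array([0.5, 0.3, 0.2])   # a learned probabilistic model

symbols = rng.choice(3, size=100_000, p=p_data)

# Ideal code length the model assigns to each observed symbol.
bits = -np.log2(p_model[symbols])

print(f"average rate:  {bits.mean():.4f} bits/symbol")           # ~ H(p_data, p_model)
print(f"entropy bound: {-(p_data * np.log2(p_data)).sum():.4f}")  # best achievable rate

# The gap between the two numbers is KL(p_data || p_model); it vanishes
# exactly when the model matches the data distribution, which is the
# maximum-likelihood objective of generative modeling.
```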
Visual data have traditionally been intended for human viewing, and compression techniques have accordingly been designed to reconstruct the original data. A recent paradigm, Coding for Machines, has grown rapidly to embrace the era of AI and deep learning. In many modern applications involving autonomous visual analysis, visual data must be compressed and stored or transmitted, but are then processed only by an AI algorithm (rather than human eyes) and are never reconstructed to their original form. Traditional methods, designed mostly to reconstruct the visual signal, are inefficient in this new paradigm, and new techniques must be developed to address its challenges and requirements.
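As a hedged toy illustration of the machine-oriented objective, the sketch below contrasts pixel-level distortion with downstream task accuracy under increasingly coarse quantization; the “codec” (uniform quantization) and the “task” (a fixed linear classifier) are stand-ins invented for the example, not the methods from the publications.

```python
import numpy as np

# Toy contrast between the traditional rate-distortion objective and a
# machine-oriented one. Coarser quantization step -> fewer bits.

rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 8))    # source signals
w = rng.normal(size=8)              # fixed downstream classifier
labels = x @ w > 0                  # task ground truth on the originals

def code(x, step):
    """Uniform quantization standing in for a full codec."""
    return np.round(x / step) * step

for step in (0.25, 1.0, 2.0):
    x_hat = code(x, step)
    distortion = np.mean((x - x_hat) ** 2)           # what reconstruction-oriented codecs optimize
    task_acc = np.mean((x_hat @ w > 0) == labels)    # what matters when only a machine looks
    print(f"step={step:4.2f}  MSE={distortion:6.4f}  task accuracy={task_acc:.3f}")

# Distortion grows rapidly with coarser quantization while task accuracy
# degrades far more slowly: bits spent on signal fidelity are partly
# wasted when the data are analyzed but never viewed.
```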
In recent years, there has been growing interest in developing novel techniques to increase the coding efficiency of video compression methods. One approach is to use texture and motion models of the content in a scene. Based on these models, parts of a video frame are not coded, or “skipped”, by a classical motion-compensated coder; the models are then used at the decoder to reconstruct the missing regions. We propose several spatial texture models for video coding, investigating several texture features in combination with two segmentation strategies to detect texture regions in a video sequence. The detected regions are not encoded with motion-compensated coding; instead, the model parameters are sent to the decoder as side information. After decoding, the frame is reconstructed by inserting the synthesized texture into the skipped areas.
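The following sketch outlines this encoder/decoder flow with deliberately simplified stand-ins: block variance replaces the texture features studied in the papers, and a two-parameter mean/deviation model replaces the actual texture model sent as side information.

```python
import numpy as np

# Schematic texture-skip pipeline: detect texture blocks, leave them
# uncoded, signal a tiny parametric model as side information, and
# synthesize them at the decoder.

rng = np.random.default_rng(0)
frame = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))   # smooth background
frame[:, 32:] += rng.normal(0.0, 30.0, size=(64, 32))   # noisy "texture" half

BLOCK = 8

def is_texture(block, thresh=300.0):
    """Crude stand-in detector: high local variance marks a texture block."""
    return block.var() > thresh

side_info = {}
for i in range(0, 64, BLOCK):
    for j in range(0, 64, BLOCK):
        block = frame[i:i+BLOCK, j:j+BLOCK]
        if is_texture(block):
            # Encoder: skip the block; send only model parameters.
            side_info[(i, j)] = (block.mean(), block.std())

# Non-texture blocks would go through the classical motion-compensated
# coder; copying the frame stands in for that decode here.
decoded = frame.copy()

# Decoder: fill each skipped region from its texture model parameters.
for (i, j), (mu, sigma) in side_info.items():
    decoded[i:i+BLOCK, j:j+BLOCK] = mu + sigma * rng.standard_normal((BLOCK, BLOCK))

print(f"{len(side_info)} of {(64 // BLOCK) ** 2} blocks skipped and synthesized")
```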
Using an approach similar to texture-based video coding, we consider motion models based on human visual motion perception. We describe a motion classification model that separates foreground objects containing noticeable motion from the background. This motion model is then used in the encoder to again allow regions to be skipped rather than coded by the motion-compensated encoder. Our results indicate a significant increase in coding efficiency compared to the spatial texture-based methods.
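Here is a minimal sketch of the motion-classification step, assuming simple frame differencing as the motion cue; the perceptual motion model from the publications would replace the threshold test below.

```python
import numpy as np

# Classify blocks as foreground (noticeable motion) or background using
# a crude motion-energy measure on the temporal difference.

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(64, 64)).astype(float)
curr = prev.copy()
curr[16:32, 16:32] += rng.normal(0.0, 40.0, size=(16, 16))  # moving foreground object

BLOCK, THRESH = 8, 10.0
foreground = []
for i in range(0, 64, BLOCK):
    for j in range(0, 64, BLOCK):
        # Mean absolute temporal difference as the motion-energy measure.
        energy = np.abs(curr[i:i+BLOCK, j:j+BLOCK] - prev[i:i+BLOCK, j:j+BLOCK]).mean()
        if energy > THRESH:
            foreground.append((i, j))

# Foreground blocks go to the motion-compensated coder; background blocks
# are skipped and later reconstructed from the model, as in the
# texture-based scheme.
print(f"{len(foreground)} foreground blocks coded, "
      f"{(64 // BLOCK) ** 2 - len(foreground)} background blocks skipped")
```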
As demand for web-based video consumption grows, there has been increasing interest in new approaches to improving the coding efficiency of modern video codecs. We propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in the texture regions of a video, achieving potential coding gains with the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality.
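The sketch below illustrates only the decoder-side reconstruction step, assuming the CNN has already produced a binary texture mask and approximating the global motion model with a pure translation for simplicity; the actual method operates inside AV1, which supports richer global motion models such as affine warps.

```python
import numpy as np

# Fill CNN-detected texture regions by warping a reference frame with the
# signalled global motion parameters, instead of coding those blocks.

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(float)  # prior decoded frame
decoded = rng.integers(0, 256, size=(64, 64)).astype(float)    # stand-in for normally decoded content

texture_mask = np.zeros((64, 64), dtype=bool)
texture_mask[8:24, 8:40] = True      # stand-in for the CNN segmentation output

dx, dy = 2, -1                       # signalled global motion parameters

# Warp the reference frame by the global motion model (translation here).
warped = np.roll(np.roll(reference, dy, axis=0), dx, axis=1)

# Texture blocks are never coded: they are filled from the warped reference,
# while the remaining pixels come from the normally decoded frame.
reconstruction = np.where(texture_mask, warped, decoded)

print(f"{texture_mask.mean():.1%} of pixels reconstructed by the global motion model")
```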