Data compression and generative modeling are two fundamentally related tasks. Intuitively, the essence of compression is to find the “patterns” in the data and assign fewer bits to the more frequent ones. Knowing exactly how frequently each pattern occurs requires a good probabilistic model of the data distribution, which is precisely the objective of (likelihood-based) generative modeling. Motivated by this connection, we study image and video compression from the perspective of probabilistic generative modeling.
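To make this connection concrete, here is a minimal sketch (a toy illustration, not taken from the publications): the average rate of an ideal entropy coder driven by a probabilistic model equals the cross-entropy between the data distribution and the model, so minimizing bits is the same as maximizing likelihood. The alphabet and distributions below are invented for the example.

```python
import numpy as np

# Under an ideal entropy coder, a symbol x is coded with about
# -log2 p_model(x) bits, so the average rate is the cross-entropy
# H(p_data, p_model).

rng = np.random.default_rng(0)

p_data  = np.array([0.6, 0.3, 0.1])   # true (unknown) symbol frequencies
p_model = np.array([0.5, 0.3, 0.2])   # a learned probabilistic model

symbols = rng.choice(3, size=100_000, p=p_data)

# Ideal code length the model assigns to each observed symbol.
bits = -np.log2(p_model[symbols])

print(f"average rate:  {bits.mean():.4f} bits/symbol")           # ~ H(p_data, p_model)
print(f"entropy bound: {-(p_data * np.log2(p_data)).sum():.4f}")  # best achievable rate

# The gap between the two numbers is KL(p_data || p_model); it vanishes
# exactly when the model matches the data distribution, which is the
# maximum-likelihood objective of generative modeling.
```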
Visual data have traditionally been intended for human viewing, and compression techniques have accordingly been designed to reconstruct the original data. A recent paradigm, Coding for Machines, has grown rapidly to embrace the era of AI and deep learning. In many modern applications involving autonomous visual analysis, visual data must be compressed and stored or transmitted, but are then processed only by an AI algorithm (rather than human eyes) and are never reconstructed to their original form. Traditional methods, designed mostly to reconstruct the visual signal, are inefficient in this new paradigm, and new techniques must be developed to address its challenges and requirements.
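As a hedged toy illustration of the machine-oriented objective, the sketch below contrasts pixel-level distortion with downstream task accuracy under increasingly coarse quantization; the “codec” (uniform quantization) and the “task” (a fixed linear classifier) are stand-ins invented for the example, not the methods from the publications.

```python
import numpy as np

# Toy contrast between the traditional rate-distortion objective and a
# machine-oriented one. Coarser quantization step -> fewer bits.

rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 8))    # source signals
w = rng.normal(size=8)              # fixed downstream classifier
labels = x @ w > 0                  # task ground truth on the originals

def code(x, step):
    """Uniform quantization standing in for a full codec."""
    return np.round(x / step) * step

for step in (0.25, 1.0, 2.0):
    x_hat = code(x, step)
    distortion = np.mean((x - x_hat) ** 2)           # what reconstruction-oriented codecs optimize
    task_acc = np.mean((x_hat @ w > 0) == labels)    # what matters when only a machine looks
    print(f"step={step:4.2f}  MSE={distortion:6.4f}  task accuracy={task_acc:.3f}")

# Distortion grows rapidly with coarser quantization while task accuracy
# degrades far more slowly: bits spent on signal fidelity are partly
# wasted when the data are analyzed but never viewed.
```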
In recent years, there has been growing interest in developing novel techniques to increase the coding efficiency of video compression methods. One approach is to use texture and motion models of the content in a scene. Based on these models, parts of a video frame are not coded, or “skipped”, by a classical motion-compensated coder; the models are then used at the decoder to reconstruct the missing regions. We propose several spatial texture models for video coding, investigating several texture features in combination with two segmentation strategies to detect texture regions in a video sequence. The detected regions are not encoded with motion-compensated coding; instead, the model parameters are sent to the decoder as side information. After decoding, the frame is reconstructed by inserting the synthesized texture into the skipped areas.
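The following sketch outlines this encoder/decoder flow with deliberately simplified stand-ins: block variance replaces the texture features studied in the papers, and a two-parameter mean/deviation model replaces the actual texture model sent as side information.

```python
import numpy as np

# Schematic texture-skip pipeline: detect texture blocks, leave them
# uncoded, signal a tiny parametric model as side information, and
# synthesize them at the decoder.

rng = np.random.default_rng(0)
frame = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))   # smooth background
frame[:, 32:] += rng.normal(0.0, 30.0, size=(64, 32))   # noisy "texture" half

BLOCK = 8

def is_texture(block, thresh=300.0):
    """Crude stand-in detector: high local variance marks a texture block."""
    return block.var() > thresh

side_info = {}
for i in range(0, 64, BLOCK):
    for j in range(0, 64, BLOCK):
        block = frame[i:i+BLOCK, j:j+BLOCK]
        if is_texture(block):
            # Encoder: skip the block; send only model parameters.
            side_info[(i, j)] = (block.mean(), block.std())

# Non-texture blocks would go through the classical motion-compensated
# coder; copying the frame stands in for that decode here.
decoded = frame.copy()

# Decoder: fill each skipped region from its texture model parameters.
for (i, j), (mu, sigma) in side_info.items():
    decoded[i:i+BLOCK, j:j+BLOCK] = mu + sigma * rng.standard_normal((BLOCK, BLOCK))

print(f"{len(side_info)} of {(64 // BLOCK) ** 2} blocks skipped and synthesized")
```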
Using an approach similar to texture-based video coding, we consider motion models based on human visual motion perception. We describe a motion classification model that separates foreground objects containing noticeable motion from the background. This motion model is then used in the encoder to again allow regions to be skipped rather than coded by the motion-compensated encoder. Our results indicate a significant increase in coding efficiency compared to the spatial texture-based methods.
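Here is a minimal sketch of the motion-classification step, assuming simple frame differencing as the motion cue; the perceptual motion model from the publications would replace the threshold test below.

```python
import numpy as np

# Classify blocks as foreground (noticeable motion) or background using
# a crude motion-energy measure on the temporal difference.

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(64, 64)).astype(float)
curr = prev.copy()
curr[16:32, 16:32] += rng.normal(0.0, 40.0, size=(16, 16))  # moving foreground object

BLOCK, THRESH = 8, 10.0
foreground = []
for i in range(0, 64, BLOCK):
    for j in range(0, 64, BLOCK):
        # Mean absolute temporal difference as the motion-energy measure.
        energy = np.abs(curr[i:i+BLOCK, j:j+BLOCK] - prev[i:i+BLOCK, j:j+BLOCK]).mean()
        if energy > THRESH:
            foreground.append((i, j))

# Foreground blocks go to the motion-compensated coder; background blocks
# are skipped and later reconstructed from the model, as in the
# texture-based scheme.
print(f"{len(foreground)} foreground blocks coded, "
      f"{(64 // BLOCK) ** 2 - len(foreground)} background blocks skipped")
```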
As demand for web-based video consumption grows, there has been increasing interest in new approaches to improving the coding efficiency of modern video codecs. We propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in the texture regions of a video, achieving potential coding gains with the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality.
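The sketch below illustrates only the decoder-side reconstruction step, assuming the CNN has already produced a binary texture mask and approximating the global motion model with a pure translation for simplicity; the actual method operates inside AV1, which supports richer global motion models such as affine warps.

```python
import numpy as np

# Fill CNN-detected texture regions by warping a reference frame with the
# signalled global motion parameters, instead of coding those blocks.

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(float)  # prior decoded frame
decoded = rng.integers(0, 256, size=(64, 64)).astype(float)    # stand-in for normally decoded content

texture_mask = np.zeros((64, 64), dtype=bool)
texture_mask[8:24, 8:40] = True      # stand-in for the CNN segmentation output

dx, dy = 2, -1                       # signalled global motion parameters

# Warp the reference frame by the global motion model (translation here).
warped = np.roll(np.roll(reference, dy, axis=0), dx, axis=1)

# Texture blocks are never coded: they are filled from the warped reference,
# while the remaining pixels come from the normally decoded frame.
reconstruction = np.where(texture_mask, warped, decoded)

print(f"{texture_mask.mean():.1%} of pixels reconstructed by the global motion model")
```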