Human Visual System Based Rate Scalable Video Coding
Disclosure Number: IPCOM000014938D
Original Publication Date: 2002-Jan-29
Included in the Prior Art Database: 2003-Jun-20

Ligang Lu and Zhou Wang

What is disclosed is a Human Visual System (HVS) based scalable video coding scheme.

A. General Framework

First, we divide the whole video sequence into groups of pictures (GOPs). Each GOP has one intra-coded frame (I frame), and the remaining frames are predictively coded frames (P frames).
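The GOP division can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `split_into_gops`, the fixed GOP length, and the placement of the I frame at the start of each GOP are assumptions made for the example.

```python
from typing import List, Tuple


def split_into_gops(num_frames: int, gop_size: int) -> List[List[Tuple[int, str]]]:
    """Group frame indices into GOPs and label each frame 'I' or 'P'.

    Assumption: every GOP has the same length (the last one may be
    shorter) and its first frame is the I frame.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    gops = []
    for start in range(0, num_frames, gop_size):
        end = min(start + gop_size, num_frames)
        # The first frame of the GOP is intra coded, the rest are predicted.
        gops.append([(i, "I" if i == start else "P") for i in range(start, end)])
    return gops


if __name__ == "__main__":
    for gop in split_into_gops(7, 3):
        print(gop)
```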

  To encode an I frame, we first apply the discrete wavelet transform (DWT) to obtain the wavelet coefficients. The Human Visual System (HVS) model is employed to determine the visually important points in the image and the importance values of the wavelet coefficients. These importance values are converted into weights for the wavelet coefficients. An embedded encoding algorithm is then used to generate the scalable bitstream.
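The first two I frame steps (transform, then weighting) can be sketched as below. The text does not name the wavelet filter or the form of the weights, so this example assumes a one-level averaging Haar transform and a per-coefficient multiplicative weight map; the function names are hypothetical. The embedded encoder is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def haar_1d(signal: Sequence[float]) -> List[float]:
    """One level of an averaging Haar transform: pair averages, then pair differences."""
    half = len(signal) // 2
    avg = [(signal[2 * i] + signal[2 * i + 1]) / 2.0 for i in range(half)]
    diff = [(signal[2 * i] - signal[2 * i + 1]) / 2.0 for i in range(half)]
    return avg + diff


def haar_2d(image: Matrix) -> Matrix:
    """Apply haar_1d to every row, then to every column (even dimensions assumed)."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


def weight_coefficients(coeffs: Matrix, weights: Matrix) -> Matrix:
    """Scale each coefficient by its importance weight (assumed multiplicative)."""
    return [[c * w for c, w in zip(c_row, w_row)]
            for c_row, w_row in zip(coeffs, weights)]


if __name__ == "__main__":
    flat = [[8.0] * 4 for _ in range(4)]
    # A constant image has energy only in the low-pass quadrant.
    print(haar_2d(flat))
```

Under this assumption, a larger weight makes a coefficient look more significant to an embedded coder, so it would tend to be sent earlier in the bitstream.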

  For the P frames, we perform motion estimation from the previous frame. The result of the motion estimation algorithm is a set of motion vectors, which are used for motion compensation. Motion compensation is performed on two versions of the previous frame: the original previous frame and a decoded version of it. The final prediction frame is the weighted combination of the two motion compensation results, with the weighting values taken from the HVS model. This is a novel P frame prediction technique and will be discussed in detail later. The wavelet transform is applied to the prediction error frame, and the resulting coefficients are HVS-weighted and coded with the encoding algorithm.
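The weighted combination of the two motion-compensated frames can be sketched as follows. The exact combination rule is not given in this part of the text, so the example assumes a per-pixel convex combination with weights in [0, 1], where the weight applies to the prediction from the original frame; the names `combine_predictions` and `prediction_error` are hypothetical.

```python
from typing import List

Frame = List[List[float]]


def combine_predictions(mc_original: Frame, mc_decoded: Frame, weights: Frame) -> Frame:
    """Per-pixel blend: w * mc_original + (1 - w) * mc_decoded.

    Assumption: each weight w lies in [0, 1] and is supplied by the HVS model.
    """
    return [[w * o + (1.0 - w) * d for o, d, w in zip(o_row, d_row, w_row)]
            for o_row, d_row, w_row in zip(mc_original, mc_decoded, weights)]


def prediction_error(current: Frame, prediction: Frame) -> Frame:
    """Residual frame that would be passed on to the wavelet transform."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


if __name__ == "__main__":
    pred = combine_predictions([[10.0, 20.0]], [[20.0, 40.0]], [[1.0, 0.5]])
    print(pred)
    print(prediction_error([[12.0, 33.0]], pred))
```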

  During the encoding process, a rate control algorithm is used to allocate bits to each frame. The allocation is determined by the available bandwidth, the HVS modeling results and the frame prediction error.
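One simple way to combine the three inputs is sketched below. This is an illustrative assumption rather than the disclosed rate control algorithm: the bit budget is taken as bandwidth divided by frame rate times the number of frames, and each frame receives a share proportional to the product of a hypothetical HVS importance score and its prediction error.

```python
from typing import List


def allocate_bits(bandwidth_bps: float, frame_rate: float,
                  hvs_importance: List[float],
                  prediction_error: List[float]) -> List[float]:
    """Split a bit budget across frames in proportion to importance * error.

    Assumptions: both score lists are non-negative and have one entry per
    frame; the proportional rule itself is only an example.
    """
    n = len(hvs_importance)
    if n == 0:
        return []
    total_bits = bandwidth_bps / frame_rate * n
    scores = [h * e for h, e in zip(hvs_importance, prediction_error)]
    total_score = sum(scores)
    if total_score == 0:
        # No frame stands out, so fall back to an equal split.
        return [total_bits / n] * n
    return [total_bits * s / total_score for s in scores]


if __name__ == "__main__":
    print(allocate_bits(300.0, 30.0, [1.0, 1.0], [1.0, 3.0]))
```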

B. Automatic Determination of Important Regions

Every point in the picture could be visually very important. However, it is not practical to examine all the points. One reason is the high computational complexity. The other is that too many bits would be needed to encode the coordinates of all the selected points. To limit the computation, we choose the centers of all the macroblocks as candidate foveation points, so that only one bit per macroblock is needed to encode this information.
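The candidate points and the one-bit-per-macroblock signalling can be sketched as follows. The 16x16 macroblock size, the raster scan order, and the bit packing layout are assumptions for illustration, and frame dimensions are assumed to be multiples of the macroblock size.

```python
from typing import List, Tuple


def macroblock_centers(width: int, height: int, mb_size: int = 16) -> List[Tuple[int, int]]:
    """Return (x, y) centers of all macroblocks in raster order."""
    return [(x + mb_size // 2, y + mb_size // 2)
            for y in range(0, height, mb_size)
            for x in range(0, width, mb_size)]


def pack_selection_bits(selected: List[bool]) -> bytes:
    """Pack one flag per macroblock into bytes, most significant bit first."""
    out = bytearray((len(selected) + 7) // 8)
    for i, flag in enumerate(selected):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


if __name__ == "__main__":
    centers = macroblock_centers(32, 32)
    print(centers)
    # Mark the first and third macroblocks as foveation points.
    print(pack_selection_bits([True, False, True, False]).hex())
```

Because the decoder can regenerate the same list of centers from the frame size, only the flags need to be transmitted, not the coordinates.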

  The methods for finding foveation points differ for I frames and P frames. For I frames, we first detect the points that are of interest to the human eye. In the current work, human faces are assumed to be of interest. Bright areas are also selected as important regions, because the human eye is more sensitive to errors occurring in bright areas.
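The bright-area criterion for I frames can be sketched as a per-macroblock test on mean luminance. The threshold value, the use of the mean, and the function name are assumptions for illustration; face detection is outside the scope of this example.

```python
from typing import List


def bright_macroblocks(luma: List[List[float]], mb_size: int = 16,
                       threshold: float = 128.0) -> List[bool]:
    """Flag each macroblock (raster order) whose mean luminance exceeds threshold.

    Assumption: luma is a non-empty 2-D list of luminance samples in [0, 255].
    """
    height, width = len(luma), len(luma[0])
    flags = []
    for y in range(0, height, mb_size):
        for x in range(0, width, mb_size):
            # Collect the samples of this block, clipped at the frame border.
            block = [luma[j][i]
                     for j in range(y, min(y + mb_size, height))
                     for i in range(x, min(x + mb_size, width))]
            flags.append(sum(block) / len(block) > threshold)
    return flags


if __name__ == "__main__":
    frame = [[200.0, 200.0, 10.0, 10.0],
             [200.0, 200.0, 10.0, 10.0]]
    print(bright_macroblocks(frame, mb_size=2))
```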

  For P frames, the most important information is how much new information is given in comparison with the prediction frame. Therefore, regions with poor motion prediction are selected as foveation points. We also care more about the regions of interest, but if the motion prediction from the previous frame is already good enough, it is not necess...