
Write Combining Optimization of Storage for Improving Decompression and Rendering Speed of MPEG Format Players

IP.com Disclosure Number: IPCOM000125085D
Publication Date: 2005-May-18
Document File: 2 page(s) / 126K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is an algorithm for storing decoded minimum coded units (MCUs) to main memory. Each MCU contains 16 pieces of data, one from each of 16 different lines of video. Benefits include delivering MCUs to main memory with minimal overhead.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 66% of the total text.

Background

Current CPUs have several ways to deliver data to main memory. The CPU can use special instructions to write an MCU directly to main memory, or it can simply store the MCU to the cache and deliver it to memory later. However, both methods have disadvantages:

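1.      Storing data directly to memory. The CPU has special non-temporal store instructions (movntdq, movntq) that write data to memory without placing it in the cache. These stores are collected in special Write Combining (WC) buffers, which are then written to memory as a unit. On the Pentium 4 these buffers are 64 bytes long, but an MCU row is only 16 bytes wide. Because each of the 16 rows of an MCU belongs to a different video line, and therefore to a different 64-byte block of memory, the CPU has to flush a partially filled WC buffer 16 times per MCU. These partial writes waste memory bandwidth and, as a result, decrease decoding speed.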

2.      Storing data to cache. This method pollutes the cache and wastes memory bandwidth. When the decoder first writes to a line of the destination buffer, the CPU unnecessarily loads that line from memory into the cache, even though its old contents are about to be overwritten. Moreover, destination data that the decoder no longer needs occupies much of the cache. As a result, the CPU spends too much time finding free cache lines and writing the lines that hold destination data back to main memory.

General Description

The disclosed method uses an algorithm...