
Methods and apparatus for implementing dot products and performing data reformatting in a SIMD vector-media unit

IP.com Disclosure Number: IPCOM000021052D
Original Publication Date: 2003-Dec-18
Included in the Prior Art Database: 2003-Dec-18
Document File: 3 page(s) / 71K

Publishing Venue

IBM

Abstract

According to the present invention, there is provided one or more of: (1) an architecture implementing a dot-product function without limiting or impacting the main data flow, while sharing most of the data path of the SIMD unit, so that a final adder is the only additional hardware needed (traditional reduction units are arranged as a separate unit, and thus require additional chip area); (2) an architecture for the scalar result file that allows it to be accessed either as single scalar values or as an instruction-specified n-element SIMD word, combining data as needed to pack it in a specific format. This register file offers several benefits: it allows dynamic data reformatting (in particular, but not limited to, packing) and decouples the critical path for updating the SIMD register file from the worst-case pipeline delay associated with dot-product execution; (3) a scalar access file which can be used to implement efficient data reformatting, in addition to the previously described use as target and staging area for dot-product or other computational operations.

This is the abbreviated version, containing approximately 51% of the total text.

THIS COPY WAS MADE FROM AN INTERNAL IBM DOCUMENT AND NOT FROM THE PUBLISHED BOOK

YOR820030032 Louis J Percello/Watson/IBM Michael Gschwind

Methods and apparatus for implementing dot products and performing data reformatting in a SIMD vector-media unit

According to the present invention, there is provided (one or more of): an architecture implementing a dot-product function without limiting or impacting the main data flow, while sharing most of the data path of the SIMD unit, so that a final adder is the only additional hardware needed (traditional reduction units are arranged as a separate unit, and thus require additional chip area); an architecture for the scalar result file that allows it to be accessed either as single scalar values or as an instruction-specified 4-element SIMD word, combining data as needed to pack it in a specific format. This register file offers several benefits: it allows dynamic data reformatting (in particular, but not limited to, packing) and decouples the critical path for updating the SIMD register file from the worst-case pipeline delay associated with dot-product execution; a scalar access file which can be used to implement efficient data reformatting, in addition to the previously described use as target and staging area for dot-product or other computational operations.
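
The two access modes of the scalar result file described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the disclosed hardware: the class name `ScalarResultFile`, its method names, and the 16-entry size are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


class ScalarResultFile:
    """Behavioral model of a small scalar result file (hypothetical names and size)."""

    def __init__(self, num_entries: int = 16) -> None:
        # The entry count is an assumption; the disclosure does not state a size here.
        self.entries: List[int] = [0] * num_entries

    def write_scalar(self, index: int, value: int) -> None:
        """Store one scalar result, e.g. the output of a dot-product operation."""
        self.entries[index] = value

    def read_scalar(self, index: int) -> int:
        """Access mode 1: read a single scalar value."""
        return self.entries[index]

    def read_simd_word(self, indices: Sequence[int]) -> List[int]:
        """Access mode 2: gather instruction-specified entries into one SIMD word."""
        return [self.entries[i] for i in indices]


srf = ScalarResultFile()
for slot, value in enumerate([10, 20, 30, 40, 50]):
    srf.write_scalar(slot, value)

# A 4-element word assembled from entries 3, 0, 4 and 1, in that order.
word = srf.read_simd_word([3, 0, 4, 1])
```

In this model the index list plays the role of the instruction-specified selection, so arbitrary scalar results can be packed into one word without passing through a permute step.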

The aim of the present invention is to provide data arrangement flexibility without requiring a permute unit, since the permute unit is restricted to sourcing from two VMX registers (or their equivalent in similar media architectures). To maintain code compatibility and the existing VMX programming model, which programmers are skilled at exploiting, this invention adds a separate register file. The main benefits of this structure are its small size, and hence the reduced penalty for the various bussing structures, and the ability to efficiently provide multiple rename accesses. Providing such accesses to the vector registers (VFR in the VMX specification) would require ports above and beyond the pre-existing ports already required to support multiple VMX execution pipelines.

The present invention addresses the need to compute a plurality of dot products on SIMD vectors, then pack the dot products into a single long vector for storage to memory. This style of programming is necessary in some environments, e.g., those that have adopted the AOS (array-of-structures) vertex encoding style over the more efficient SOA (structure-of-arrays) vertex encoding style. These operations are hard to implement in modern SIMD architectures, as they require non-optimized data flow inside the macro, and additional merge operations (or partial writeback to registers).
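
To make the AOS case concrete, the following sketch models the pattern described above: a lane-wise multiply on the shared SIMD data path, a final cross-lane add, and packing of four scalar results into one vector. It is an illustrative model only; the function names, the 4-lane width, and the sample data are assumptions rather than anything taken from the disclosure. The SOA variant is included to show why that layout needs no cross-lane reduction.

```python
from typing import List, Sequence

LANES = 4  # assumed SIMD width: four elements per vector


def simd_multiply(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Lane-wise multiply, standing in for the shared SIMD data path."""
    assert len(a) == len(b) == LANES
    return [x * y for x, y in zip(a, b)]


def final_add(products: Sequence[int]) -> int:
    """Cross-lane sum, standing in for the single additional final adder."""
    return sum(products)


def dot_product(a: Sequence[int], b: Sequence[int]) -> int:
    return final_add(simd_multiply(a, b))


def pack_dot_products(vectors: Sequence[Sequence[int]], weights: Sequence[int]) -> List[int]:
    """AOS case: one dot product per input vector, packed into a single vector."""
    assert len(vectors) == LANES
    return [dot_product(v, weights) for v in vectors]


def soa_dot_products(xs, ys, zs, ws, weights) -> List[int]:
    """SOA case: the same results from lane-wise multiply-adds, with no cross-lane sum."""
    px, py, pz, pw = weights
    return [x * px + y * py + z * pz + w * pw for x, y, z, w in zip(xs, ys, zs, ws)]


# AOS layout: each inner list is one vertex (x, y, z, w). Sample data is made up.
vertices_aos = [
    [1, 2, 3, 1],
    [4, 5, 6, 1],
    [7, 8, 9, 1],
    [0, 1, 0, 1],
]
weights = [2, 1, 0, 3]

packed = pack_dot_products(vertices_aos, weights)

# Transposing the same data into SOA layout gives one list per component.
xs, ys, zs, ws = (list(column) for column in zip(*vertices_aos))
packed_soa = soa_dot_products(xs, ys, zs, ws, weights)
```

Both paths produce the same packed vector. In the AOS path each result is a scalar that must then be merged with three others, which is the packing step the scalar access file is meant to serve.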

According to the present invention, a scalar access file is provided, and is used as the target for dot-product and other operations (preferably those generating a single scalar value in lieu of a vector). The result of a dot product operation is allocated to a single scalar access file, which has s...