Multi-channel pipelined accumulator

IP.com Disclosure Number: IPCOM000234853D
Original Publication Date: 2014-Feb-11
Included in the Prior Art Database: 2014-Feb-11
Document File: 3 page(s) / 324K

Publishing Venue

Microsoft

Related People

Karin Strauss: INVENTOR

This is the abbreviated version, containing approximately 39% of the total text.

Defensive Publication Form

Document Author (alias) kstrauss

Defensive Publication Title Multi-channel pipelined accumulator

Name(s) of All Contributors

Scott Hauck, Karin Strauss, Jeremy Fowers, Kalin Ovtcharov

Summary of the Defensive Publication/Abstract

In high-efficiency computing, reduction is a common type of computation, and multiple reductions can often be processed in parallel. For example, in matrix-vector multiplication, each row of the matrix goes through a dot product with the vector: for each index of the same-sized row and vector, the row element at that index is multiplied by the corresponding vector element, and all the products are accumulated. Dot products for different rows can be performed in parallel. Accumulators for certain operations, such as floating-point addition, may take longer than one cycle. In this example, dedicating one accumulator per dot product unit may add complexity and/or waste, i.e., cycles in which accumulators go unused.
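
As a minimal illustration of the reduction described above, the following Python sketch computes a matrix-vector product as one independent dot product per row. The function names `dot` and `matvec` are chosen here for illustration and do not come from the publication; the sketch shows only the arithmetic, not any hardware timing.

```python
def dot(row, vec):
    """Reduce one row against the vector: multiply element-wise, then accumulate."""
    if len(row) != len(vec):
        raise ValueError("row and vector must have the same size")
    acc = 0.0
    for r, v in zip(row, vec):
        acc += r * v  # each iteration depends on the previous running sum
    return acc


def matvec(matrix, vec):
    """Each row's dot product is independent, so the rows could run in parallel."""
    return [dot(row, vec) for row in matrix]


print(matvec([[1, 2], [3, 4]], [5, 6]))  # [17.0, 39.0]
```

The running sum `acc` is the dependency that matters in hardware: when one addition takes several cycles, the next addition for the same row cannot start until the previous one has finished.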

This document describes the architecture of a high-efficiency multi-channel pipelined accumulator. For highest efficiency, the module offers as many channels as the latency of its accumulator in cycles, admits one input per channel per cycle, and can emit up to one accumulated result per cycle. The module achieves high efficiency by keeping all its adders busy every cycle, as long as the set of elements to be added for each channel contains at least as many elements as the accumulator's latency in cycles.
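
To illustrate why matching the number of channels to the adder latency helps, the sketch below simulates a simplified stand-in, not the module described in this publication: a single shared adder served round-robin, taking one input per cycle in total. The function `simulate`, its scheduling policy, and the utilization metric are all assumptions made for illustration; the actual design admits one input per channel per cycle and uses several adders, which this toy model does not attempt to reproduce.

```python
def simulate(channels, latency):
    """Round-robin the given channels (lists of numbers) onto one adder.

    Hypothetical model: an add issued at cycle c makes that channel's running
    sum available again at cycle c + latency. Returns the per-channel sums and
    the fraction of cycles in which the adder started a new add.
    """
    k = len(channels)
    sums = [0] * k
    next_idx = [0] * k   # next element to consume in each channel
    ready_at = [0] * k   # first cycle at which each running sum is usable again
    remaining = sum(len(c) for c in channels)
    cycle = issued = 0
    while remaining:
        ch = cycle % k   # each channel owns a fixed time slot
        if next_idx[ch] < len(channels[ch]) and ready_at[ch] <= cycle:
            sums[ch] += channels[ch][next_idx[ch]]
            next_idx[ch] += 1
            ready_at[ch] = cycle + latency  # loop-carried dependency
            issued += 1
            remaining -= 1
        cycle += 1
    return sums, (issued / cycle if cycle else 0.0)


data = list(range(1, 9))          # 8 elements per channel
print(simulate([data], 4))        # one channel: the adder idles most cycles
print(simulate([data] * 4, 4))    # four channels, latency 4: an add starts every cycle
```

With one channel and a latency of 4 cycles, a new add can start only every fourth cycle. With four channels, each channel's running sum returns just as its next time slot arrives, so in this model the adder never idles.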

This solution could enable high-performance, high-efficiency computations that involve accumulation, such as those used in computational fluid dynamics, computer vision, robotics, structural engineering, machine learning, and financial modeling, among others.



Description

The central piece of this module is a pipelined accumulator. A pipelined accumulator as defined here accepts two inputs per cycle and emits one result per cycle, corresponding to the addition of two inputs from a prior cycle. This describes the throughput of the pipelined accumulator, i.e., how many inputs it admits and how many outputs it emits per cycle; it says little about its latency, i.e., how long it takes for the sum of two inputs to appear at the pipelined accumulator's output. Some types of addition, such as floating-point addition, may require more than one cycle to complete, making the latency of the pipelined accumulator n cycles, where n is an integer greater than 0.
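
The distinction between throughput and latency drawn above can be sketched with a toy cycle-level model. The class `PipelinedAdder` and its `step` method are hypothetical names introduced here; the model assumes only what the paragraph states, namely one operand pair accepted per cycle and each sum emitted n cycles after its operands entered.

```python
from collections import deque


class PipelinedAdder:
    """Toy model: one operand pair in per cycle, its sum out `latency` cycles later."""

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be an integer greater than 0")
        # One slot per pipeline stage; None marks an empty stage (a bubble).
        self.stages = deque([None] * latency)

    def step(self, operands=None):
        """Advance one cycle; `operands` is an (a, b) pair or None for no input."""
        result = self.stages.popleft()  # the oldest stage completes this cycle
        self.stages.append(None if operands is None else operands[0] + operands[1])
        return result


adder = PipelinedAdder(latency=3)
inputs = [(1, 2), (3, 4), (5, 6), (7, 8), None, None, None]
print([adder.step(pair) for pair in inputs])
# [None, None, None, 3, 7, 11, 15]
```

After the first three cycles fill the pipeline, one sum emerges every cycle (throughput), yet each individual sum appears three cycles after its operands entered (latency).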

A pipeline accumulat...