Method for preprocessing variable width ALU operands

IP.com Disclosure Number: IPCOM000016698D
Publication Date: 2003-Jul-09
Document File: 5 page(s) / 178K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for ALU operand preprocessing in an architecture which supports ALU operations of different width. Benefits include improved performance and improved support for future technologies.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 38% of the total text.

Background

The method is used in a simple reduced instruction set computer (RISC) microprocessor core designed for low-level data processing. Used in a subsystem, the microprocessor core acts as a controller and linker to a variable number of coprocessors. The core itself mainly performs the arithmetic operations involved in data processing and address manipulation. It is a 32-bit machine, but all arithmetic logic unit (ALU) instructions are available in 8-, 16-, and 32-bit versions. The ALU provides a range of Boolean operations as well as barrel shifting, addition, and subtraction. From a timing point of view, the longest path through the ALU typically involves the 32-bit adder.

The core is implemented as a four-stage pipeline (though some instructions go through a fifth stage). The four stages are (see Figure 1):

• Instruction fetch

• Decode

• ALU operation

• Memory access

As with all pipelined machines, the core’s design must address data dependency issues when an instruction requires an operand that is being modified by one of the two instructions that immediately precede it. To eliminate data dependency stalls, the core employs the typical approach of bypassing the combinatorial outputs of the ALU back to the stage that fetches the operands.

The decode stage of the pipeline examines the current instruction and processes it, including generating addresses for one or two operands. These addresses are used to fetch operands from the register file. However, a data dependency occurs when one of the two previous instructions has modified the required operand. Because operand write-back does not occur until the fourth stage of the pipeline, the value in the register file is out of date and should not be used. The operand fetch logic therefore includes bypass matching: the processor compares the required operands against those being modified by the previous two instructions and selects data bypassed back from the ALU or memory stages as appropriate.
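The bypass matching described above can be illustrated with a small behavioural model. This is a minimal sketch in Python, not the disclosed logic: the names `InFlight` and `fetch_operand` are hypothetical, operands are matched at whole-register granularity, and the rule that the newer (ALU-stage) result takes priority over the older (memory-stage) one is an assumption that follows from program order rather than something the text states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InFlight:
    """Result of an earlier instruction that has not yet been written back."""
    dest: Optional[int]  # destination register number, or None if nothing is written
    value: int           # value that will eventually reach the register file


def fetch_operand(reg: int,
                  regfile: List[int],
                  alu_stage: Optional[InFlight],
                  mem_stage: Optional[InFlight]) -> int:
    """Return the freshest value of register `reg` for the decode stage.

    alu_stage models the immediately preceding instruction and mem_stage the
    one before it. Assumption: when both write `reg`, the newer result wins.
    """
    if alu_stage is not None and alu_stage.dest == reg:
        return alu_stage.value   # bypass from the ALU stage
    if mem_stage is not None and mem_stage.dest == reg:
        return mem_stage.value   # bypass from the memory stage
    return regfile[reg]          # no match: the register file value is current
```

For example, if the register file still holds a stale value for register 4 while the instruction in the ALU stage is writing register 4, `fetch_operand` returns the ALU-stage value instead of the register file entry.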

Due to the combined 8/16/32-bit functionality, the core is considered to be sliced into four byte lanes, with lane 0 containing the most significant byte. The 32-bit adder comprises four byte-wide adders, with the carry output of adder 3 feeding into the carry input of adder 2, and so on. A bypass may involve only one (or two) byte lanes of an operand, for example, if the required operand is 32 bits wide and the previous instruction modifies only one of its bytes.
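The byte-lane organisation can be sketched as follows. This is an illustrative Python model under stated assumptions, not the disclosed hardware: all function names are hypothetical, a narrower operation is modelled as a contiguous range of lanes with the carry chain confined to that range, lanes outside the range are returned as zero, and bypassed data is assumed to arrive aligned to the same lanes as the register file value.

```python
def byte_add(a, b, carry_in):
    """One 8-bit adder slice: returns (sum byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def to_lanes(word):
    """Split a 32-bit word into four bytes; lane 0 is the most significant."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def from_lanes(lanes):
    """Reassemble four byte lanes (lane 0 first) into a 32-bit word."""
    word = 0
    for byte in lanes:
        word = (word << 8) | byte
    return word


def lane_add(a, b, first_lane=0, last_lane=3):
    """Add lanes first_lane..last_lane of two 32-bit words.

    The carry starts at last_lane (the least significant lane of the range)
    and ripples toward first_lane, e.g. adder 3 feeds adder 2.
    Assumption: lanes outside the range are returned as zero.
    """
    a_lanes, b_lanes = to_lanes(a), to_lanes(b)
    result = [0, 0, 0, 0]
    carry = 0
    for lane in range(last_lane, first_lane - 1, -1):
        result[lane], carry = byte_add(a_lanes[lane], b_lanes[lane], carry)
    return from_lanes(result), carry


def merge_bypass(regfile_word, bypass_word, modified_lanes):
    """Use bypassed bytes for modified_lanes and register file bytes elsewhere."""
    stale, fresh = to_lanes(regfile_word), to_lanes(bypass_word)
    return from_lanes([fresh[i] if i in modified_lanes else stale[i]
                       for i in range(4)])
```

With the defaults, `lane_add` behaves as a 32-bit ripple-carry add across all four slices; restricting the range to lane 3 alone models an 8-bit add whose carry does not propagate into lane 2. `merge_bypass` shows the partial-bypass case, where a 32-bit operand takes only the modified byte from the bypass path and the remaining bytes from the register file.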

Several operands are visible to the operator (see Figure 2). The 8/16/32-bit operands are shared. All registers d0-d23 are valid as 8-bit operands, but only even-numbered registers are valid as 16-bit operands and only those divisible by...