
High Performance Two Cycle Loop Decimal Multiply Algorithm

IP.com Disclosure Number: IPCOM000053142D
Original Publication Date: 1981-Sep-01
Included in the Prior Art Database: 2005-Feb-12
Document File: 4 page(s) / 103K

Publishing Venue

IBM

Related People

Angiulli, JM: AUTHOR [+5]

Abstract

In certain high-performance processors, decimal multiply execution time depends on operand length and on the data, and in most cases the operation is time-consuming. In certain commercial job environments, decimal multiply takes up a disproportionate amount of CPU Busy Time because of the large number of cycles required to manipulate the data before the final product is determined.

This article describes the extension of a double-word binary parallel adder with excess-6 (+6) arithmetic to perform decimal additions. It introduces a dynamic (data-dependent) working storage addressing facility, A Register byte ingating and mark generation, shift alignment, new micro-orders to control the execution, and a new Instruction Element operand setup.
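The disclosure does not give the adder's logic. As a rough software analogue, the sketch below applies the well-known excess-6 technique to a 64-bit double word holding 16 BCD digits: bias one operand by 6 in every digit, add in binary, then subtract 6 from each digit that produced no carry. The names `bcd_add_dw`, `to_bcd`, and `from_bcd` are hypothetical, and the sketch assumes unsigned BCD digits with no sign nibble.

```python
MASK64 = (1 << 64) - 1
SIXES = 0x6666_6666_6666_6666          # +6 bias in each of the 16 digit positions
CARRY_BITS = 0x1_1111_1111_1111_1110   # bit 4k (k = 1..16): carry into nibble k


def to_bcd(n: int) -> int:
    """Pack a non-negative integer into BCD, one decimal digit per nibble."""
    result, shift = 0, 0
    while n:
        n, digit = divmod(n, 10)
        result |= digit << shift
        shift += 4
    return result


def from_bcd(x: int) -> int:
    """Unpack an unsigned BCD value into an ordinary integer."""
    n, place = 0, 1
    while x:
        n += (x & 0xF) * place
        x >>= 4
        place *= 10
    return n


def bcd_add_dw(a: int, b: int) -> tuple[int, int]:
    """Add two 16-digit BCD double words using a binary add plus excess-6 correction.

    Returns (sum, decimal carry out of the high-order digit).
    """
    t1 = a + SIXES                  # bias every digit of a by +6
    t2 = t1 + b                     # plain binary add: a biased digit sum > 15 carries
    carries_in = t2 ^ t1 ^ b        # bit i is set where a carry entered bit i
    no_carry = ~carries_in & CARRY_BITS          # digit boundaries that saw no carry
    sixes = (no_carry >> 2) | (no_carry >> 3)    # a 6 in each digit that did not carry
    total = t2 - sixes              # remove the bias from those digits only
    return total & MASK64, total >> 64
```

For example, `bcd_add_dw(0x45, 0x37)` returns `(0x82, 0)`, and `bcd_add_dw(0x9999999999999999, 0x1)` returns `(0, 1)`.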

This invention uses the new facilities in a novel way and heavily overlaps the processing to significantly improve decimal multiply execution. In Fig. 1, the roles of the multiplier (OPND 1) and the multiplicand (OPND 2) are reversed, and the multiplier is fetched, left-aligned, and set up in the A Reg 1 (even if it crosses a DW (double word) boundary in storage). If the multiplier is shorter than 8 bytes, a new shift alignment technique zeroes out the unwanted data contained in the same DW: the shifter uses the inversion of the multiplier length (L2) to shift the multiplier right by the correct number of bytes, right-aligning it before sign determination and multiplier multiple generation. The multiplicand is ingated into the A Reg 1 under microcode control as the multiplier is gated to the B Reg 2. After the product sign is determined and retired, the multiplier multiples (0-9) are built via the parallel decimal adder 3 and stored in the corresponding working store 4 locations (0-9) for later reference.

The first multiplicand digit is ingated via the serial adder 5, and the two-cycle overlapped tight loop shown in Fig. 2 is entered. This loop is the key mechanism behind the significant performance gain of this algorithm.
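A minimal sketch of the operand setup, under stated assumptions: the multiplier has been left-aligned in a 64-bit double word as packed decimal (two digits per byte, sign in the last nibble), and L2 is the instruction length code, that is, the operand length in bytes minus 1. Under that assumption the 3-bit inversion of L2 equals 8 minus the operand length, which is exactly the number of unwanted bytes to shift off. All function names are hypothetical, and the digit-serial `bcd_add` merely stands in for the parallel decimal adder 3.

```python
NEGATIVE_SIGNS = {0xB, 0xD}  # packed-decimal minus codes; A, C, E, F are plus


def right_align(dw: int, l2: int) -> int:
    """Right-align a left-aligned operand held in a 64-bit double word.

    Assumes l2 is the length code (operand bytes - 1, range 0..7), so its
    3-bit inversion is 7 - l2 = 8 - length: the count of unwanted bytes.
    Shifting right by that many bytes drops them and zero-fills on the left.
    """
    shift_bytes = ~l2 & 0b111
    return dw >> (8 * shift_bytes)


def product_sign(mplier_sign: int, mcand_sign: int) -> int:
    """Return the product sign code: 0xD if the operand signs differ, else 0xC."""
    negative = (mplier_sign in NEGATIVE_SIGNS) != (mcand_sign in NEGATIVE_SIGNS)
    return 0xD if negative else 0xC


def bcd_add(a: int, b: int) -> int:
    """Add two unsigned BCD values one digit at a time."""
    result, shift, carry = 0, 0, 0
    while a or b or carry:
        s = (a & 0xF) + (b & 0xF) + carry
        carry = 1 if s > 9 else 0
        result |= (s - 10 * carry) << shift
        a, b, shift = a >> 4, b >> 4, shift + 4
    return result


def build_multiples(aligned: int) -> list[int]:
    """Return multiples 0..9 of the multiplier magnitude (working store slots 0..9)."""
    magnitude = aligned >> 4  # drop the sign nibble
    table = [0]
    for _ in range(9):
        table.append(bcd_add(table[-1], magnitude))  # k*m = (k-1)*m + m
    return table
```

With the multiplier +123 (packed `0x123C`, L2 = 1) fetched into a double word whose other six bytes hold unrelated data, `right_align` shifts right by 6 bytes to yield `0x123C`, and `build_multiples(0x123C)[9]` is `0x1107`.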

In essence, the two-cycle processing loop works as follows. In cycle 1, the partial product corresponding to multiplicand digit D1 is developed in the B Reg 2 (B + C → B). (The working store 4 location was addressed by D1, read out, and gated into the C Reg 6 in the previous cycle.) The multiplier multiple is added to the running partial product (initially 0) in the decimal adder 3, and the result is gated to the B Reg 2. Also, the next multiplicand digit (D2) is ingated from the A Reg 1 to set up the read of the corresponding working store 4 location (multiplier multiple) in the next cycle.
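To make the overlap concrete, the hypothetical `loop_schedule` function below lists the register transfers per cycle, one multiplicand digit per two cycles. It is an interpretation of the text, not the disclosed microcode: it assumes the working store read for D1 occupies one setup cycle (cycle 0) and that each later read happens in cycle 2 of the preceding iteration. Digits are given low-order first.

```python
def loop_schedule(mcand_digits: list[int]) -> list[tuple[int, str]]:
    """List (cycle, action) pairs for the overlapped two-cycle loop."""
    # Setup: D1 addresses the working store; its multiple is gated into C.
    schedule = [(0, f"read WS[{mcand_digits[0]}] -> C")]
    cycle = 0
    for i, digit in enumerate(mcand_digits):
        has_next = i + 1 < len(mcand_digits)
        # Cycle 1: add the multiple already in C; ingate the next digit from A.
        cycle += 1
        action = f"B + C -> B (multiplier x {digit})"
        if has_next:
            action += f"; ingate D{i + 2} from A"
        schedule.append((cycle, action))
        # Cycle 2: retire a product digit, shift B, and read the next multiple.
        cycle += 1
        action = f"retire product digit {i + 1}; shift B one digit"
        if has_next:
            action += f"; read WS[{mcand_digits[i + 1]}] -> C"
        schedule.append((cycle, action))
    return schedule
```

For n multiplicand digits this gives 2n loop cycles plus the setup read; the point of the overlap is that the working store read for the next digit never costs a cycle of its own.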

In cycle 2, the final product digit for D1 is retired from B Reg 2 (bits 60-63) via the serial adder 5 to the A Reg 1, where the product is built up prior to storing. The final product digits replace the originally fetched multiplicand digits in that area. The partial product is shifted one digit position to line it up for the next digit (D2) addit...
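The arithmetic of the whole loop can be sketched in software as follows. This is a hypothetical model, not the disclosed microcode: operands are unsigned BCD magnitudes (sign handling omitted), `product` stands in for the A Reg area where product digits replace multiplicand digits, and because the text breaks off here, the final step (storing the high-order digits left in B after the last multiplicand digit) is an assumption.

```python
def bcd_add(a: int, b: int) -> int:
    """Add two unsigned BCD values one digit at a time (stand-in for adder 3)."""
    result, shift, carry = 0, 0, 0
    while a or b or carry:
        s = (a & 0xF) + (b & 0xF) + carry
        carry = 1 if s > 9 else 0
        result |= (s - 10 * carry) << shift
        a, b, shift = a >> 4, b >> 4, shift + 4
    return result


def decimal_multiply(mcand: int, n_digits: int, mplier: int) -> int:
    """Multiply BCD magnitudes, consuming n_digits multiplicand digits low-order first."""
    # Setup: multiples 0..9 of the multiplier, one per working store slot.
    ws = [0]
    for _ in range(9):
        ws.append(bcd_add(ws[-1], mplier))
    b = 0                   # B Reg: running partial product, initially 0
    product, shift = 0, 0   # retired product digits (the A Reg area)
    c = ws[mcand & 0xF]     # C Reg: multiple for D1, read before the loop
    for i in range(n_digits):
        # Cycle 1: B + C -> B; ingate the next multiplicand digit.
        b = bcd_add(b, c)
        next_digit = (mcand >> (4 * (i + 1))) & 0xF
        # Cycle 2: retire B's low-order digit, shift B right one digit,
        # and read the next digit's multiple into C.
        product |= (b & 0xF) << shift
        shift += 4
        b >>= 4
        c = ws[next_digit]
    # Assumption: the digits remaining in B form the high-order part of the product.
    return product | (b << shift)
```

For example, `decimal_multiply(0x45, 2, 0x123)` retires 5, then 3, and leaves 55 in B, giving `0x5535` (45 × 123 = 5535).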