
Implementation of atomic memory operations utilizing hardware speculation.
Disclosure Number: IPCOM000015871D
Original Publication Date: 2002-Dec-11
Included in the Prior Art Database: 2003-Jun-21
Document File: 2 page(s) / 64K

This is the abbreviated version, containing approximately 49% of the total text.


     Compare and Swap operations are used by a microprocessor to control the modification of memory that may be shared by other CPUs or programs running in the system. Without a control mechanism, two CPUs trying to reference and update the same memory locations could overwrite each other's results, causing a write-after-write hazard. Compare and Swap operations use atomic memory references to update shared storage message lists or to create semaphores that control updates and accesses to shared regions of memory.
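
     As an illustration of how software typically relies on such an operation, the following is a minimal sketch in C11 of a retry loop that updates a shared counter. It uses the standard <stdatomic.h> interface rather than the S/390 instructions themselves, and the helper name shared_add is hypothetical, not taken from this disclosure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `delta` to a counter shared between CPUs.
 * The update is retried until the compare-and-swap succeeds, so a
 * concurrent writer can never be silently overwritten. */
static uint32_t shared_add(_Atomic uint32_t *counter, uint32_t delta)
{
    assert(counter != NULL);

    /* Snapshot of the value we expect to find in memory. */
    uint32_t expected = atomic_load(counter);

    /* If memory still holds `expected`, store expected + delta.
     * Otherwise the call writes the current memory value back into
     * `expected` and returns false, and the loop tries again. */
    while (!atomic_compare_exchange_weak(counter, &expected,
                                         expected + delta)) {
        /* Another CPU changed the counter first; retry with the new value. */
    }
    return expected + delta;
}
```

     Calling shared_add(&counter, 1) from several threads increments the counter without a lock; each caller receives the value that its own update produced.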

     The CS, CGS, and CDS instructions are variations of compare and swap instructions in the S/390 and Z-Series architecture. Each instruction contains three fields (R1, R2, and R3) used to specify operand data. R1 and R3 specify locations of data in the general purpose register file (GPR), and R2 specifies a location in memory. The first and second operands (specified by R1 and R2 respectively) are compared. If they are equal, the third operand (specified by R3) is stored at the second operand location. If they are unequal, the second operand is loaded into the first operand location. The result of the comparison is indicated by the condition code (1). Because the decision whether or not to store operand 3 depends on the result of the comparison, this instruction typically takes two cycles to complete in a pipelined processor. The goal of this work is to describe an innovative combination of store cancelling and late-select logic that doubles the performance of these instructions for the non-serializing cases.
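
     The operand behaviour described above can be sketched as a small software model in C. This is an illustration only: it is not atomic, it models only a 32-bit operand, and the function name cs_model is hypothetical. The condition code values (0 for equal, 1 for unequal) follow the architectural definition of COMPARE AND SWAP and are not stated in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the compare and swap operand semantics.
 *   r1      : first operand (a GPR), compared and possibly reloaded
 *   storage : second operand (a memory location)
 *   r3      : third operand (a GPR), the replacement value
 * Returns the modelled condition code: 0 if the operands were equal
 * (store performed), 1 if they were unequal (first operand reloaded).
 * NOTE: plain C with no atomicity; real hardware performs the
 * compare and the store as one indivisible operation. */
static int cs_model(uint32_t *r1, uint32_t *storage, uint32_t r3)
{
    assert(r1 != NULL && storage != NULL);

    if (*r1 == *storage) {
        *storage = r3;   /* equal: third operand replaces second operand */
        return 0;
    }
    *r1 = *storage;      /* unequal: second operand replaces first operand */
    return 1;
}
```

     The model makes the data dependency visible: whether storage is written cannot be known until the comparison has completed.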

Compare and Swap Improvement

     In a pipelined processor, instructions are executing in every stage of the processor at the same time. When an instruction requires the same pipeline stage for more than one cycle, the pipeline must be stalled while that instruction continues executing in the same hardware for multiple cycles. For example, the pipeline design of a fixed point unit may include a stage for operand fetch (E0), a multi-cycle stage for instruction execution (Ex) where processed data is looped back into the same pipeline stage for additional operations, and a stage to record results (PA). When designing for an aggressive cycle time, the control signals that determine whether information is to be stored into memory must be set two cycles before PA. This is necessary to provide enough time to prioritize the cache and compute the next store address. The control logic that determines which bytes are to be stored to memory (byte marks) must be set one cycle before PA.
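
     The timing constraints above can be worked through with a short C sketch. It assumes a simplified cycle numbering that is not given in the text: E0 is cycle 0, the execute cycles E1..En occupy cycles 1..n, and PA follows immediately at cycle n + 1. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Latest cycles at which the store controls must be set, under the
 * assumed numbering E0 = 0, E1..En = 1..n, PA = n + 1. */
struct store_deadlines {
    int pa_cycle;            /* cycle in which results are recorded */
    int store_request_cycle; /* store control signals: two cycles before PA */
    int byte_mark_cycle;     /* byte marks: one cycle before PA */
};

static struct store_deadlines deadlines_for(int execute_cycles)
{
    assert(execute_cycles >= 1);

    struct store_deadlines d;
    d.pa_cycle = execute_cycles + 1;
    d.store_request_cycle = d.pa_cycle - 2;
    d.byte_mark_cycle = d.pa_cycle - 1;
    return d;
}
```

     Under these assumptions, an instruction with two execute cycles must raise its store request in cycle 1 (E1) and set its byte marks in cycle 2 (E2). An instruction with a single execute cycle would have to raise its store request in cycle 0 (E0), before any comparison performed in E1 could have completed, while its byte marks would not be due until E1.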

     The following sequence shows how previous implementations of the CS, CGS, and CDS instructions require the execute pipeline stage in the fixed point unit (FXU) for two cycles. Referring to the old algorithm in Figure 1:
- E0: The operands are fetched during E0.
- E1: During the first execute cycle, the write logic for the GPR is activated and the first and second operands are passed throu...