
Elimination of Address Generation Interlocks on Sequences of Load Instructions

IP.com Disclosure Number: IPCOM000047089D
Original Publication Date: 1983-Sep-01
Included in the Prior Art Database: 2005-Feb-07
Document File: 4 page(s) / 59K

Publishing Venue

IBM

Related People

Meltzer, D: AUTHOR

Abstract

The method described reduces the effect of address generation interlocks (AGIs) on processor performance for the sequences of load instructions that control programs commonly use to obtain the addresses of system control blocks.

The method described reduces the effect of address generation interlocks (AGIs) on processor performance for sequences of load instructions commonly used to obtain the addresses of control blocks. A common sequence of instructions used by control programs to obtain addressability to a system control block is, e.g.:

    L   1,16(0,1)
    L   1,X(,1)
    L   1,Y(,1)

where X and Y are constants that depend on which control block the programmer wishes to access. Execution of this sequence on an overlapped machine with a 2-cycle cache and single-cycle decode/address generation would be as shown in Fig. 1. The cycles labelled A are Decode/Agen cycles, C1 and C2 are the first and second cache access cycles, and X is the execution cycle for simple ops. The timing diagram shows the AGI condition being discovered at cycles 2 and 5. The timing also assumes that the exemplary machine has a bypass mechanism, so that cycles 4, 7 and 10 each contain both the execution cycle of a load and the Decode/Agen cycle of the instruction waiting on it.
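Fig. 1 is not reproduced in this extract, but the timing it depicts follows from the stated assumptions. The following sketch (not part of the disclosure; the pipeline model and the trailing BCR consumer are assumptions added for illustration) recomputes that schedule: each load occupies A, C1, C2 and X cycles, an instruction that needs register 1 for address generation cannot take its A cycle until the loaded value can be bypassed to it, and the bypass lets that A cycle coincide with the load's X cycle.

    # Minimal sketch (assumption, not from the disclosure): reproduce the Fig. 1 timing
    # for the exemplary machine -- single-cycle Decode/Agen (A), two cache cycles (C1, C2),
    # an execution/putaway cycle (X), and a bypass that lets a waiting instruction's A cycle
    # share the cycle in which the prior load executes.

    def schedule(instrs):
        """instrs: (text, needs_r1_for_agen, loads_r1) tuples in program order."""
        rows = []
        next_decode = 1          # earliest cycle the next instruction could decode
        r1_bypass = 0            # cycle at which a new R1 value can be bypassed into Agen
        for text, needs_r1, loads_r1 in instrs:
            if needs_r1 and r1_bypass > next_decode:
                print(f"AGI on {text!r} discovered at cycle {next_decode}")
            a = max(next_decode, r1_bypass if needs_r1 else 0)
            c1, c2, x = a + 1, a + 2, a + 3
            rows.append((text, a, c1, c2, x))
            if loads_r1:
                r1_bypass = x    # a dependent Decode/Agen may share the load's X cycle
            next_decode = a + 1
        return rows

    sequence = [
        ("L 1,16(0,1)", True, True),
        ("L 1,X(,1)",   True, True),
        ("L 1,Y(,1)",   True, True),
        ("BCR 15,1",    True, False),   # assumed waiting instruction; later stages simplified
    ]

    for text, a, c1, c2, x in schedule(sequence):
        print(f"{text:12s}  A={a:2d}  C1={c1:2d}  C2={c2:2d}  X={x:2d}")

Run as written, this reports the AGI discoveries at cycles 2 and 5 and places the waiting instructions' Decode/Agen cycles at 4, 7 and 10, matching the description of Fig. 1 above.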

This sequence then represents the minimum execution time on the exemplary machine under the assumptions described above, as is known from the prior art. Note that the third instruction is also commonly a BC, BCR or BALR that uses the address obtained from the control block whose address was loaded by the preceding instruction. Simple ops on the exemplary machine would execute with a Decode/Agen every cycle until some pipeline disruption, whereas the sequence of instructions shown above decodes one instruction every four cycles. In the assumed machine, many cycles are lost to AGI conditions, and many of these are caused by L-L, L-BALR or L-BCR sequences. When used as described to obtain the addresses of control blocks, these sequences have the characteristic that, on any one machine with its resident operating system, the load instructions always occupy a fixed virtual address and, for the V=F nucleus, a fixed real address. In general, the instruction at a particular location returns the same result each time it is executed, with the exception of the third instruction, which returns the same result when used to obtain addressability to global control blocks and different results when used for private control blocks. Using this characteristic, a lookaside scheme resolves the AGI in advance of the time at which it would be resolved according to the timing chart described above.
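The extracted text does not show the lookaside mechanism itself, so the following is only a behavioural sketch (the table organisation, the confirm-and-repair policy, and every identifier and address are assumptions, not taken from Fig. 2): a table keyed by the load's fixed instruction address remembers the value that load returned last time, offers it at Decode/Agen time so the dependent instruction need not wait out the interlock, and is checked against the value the load actually fetches.

    # Sketch of a load-lookaside table (names, addresses and the repair policy are assumptions).
    class LoadLookaside:
        def __init__(self):
            self.table = {}                       # load instruction address -> last loaded value

        def predict(self, instr_addr):
            """Consulted at Decode/Agen time; None means no early resolution is possible."""
            return self.table.get(instr_addr)

        def confirm(self, instr_addr, actual, predicted):
            """Called when the load's data arrives from the cache."""
            self.table[instr_addr] = actual       # always remember the latest value
            if predicted is None:
                return "no entry: the dependent instruction waited (normal AGI timing)"
            if predicted == actual:
                return "hit: the AGI was resolved early"
            return "mismatch: redo the dependent instruction with the correct value"

    lookaside = LoadLookaside()
    guess = lookaside.predict(0x0001F4)                    # made-up fixed instruction address
    print(lookaside.confirm(0x0001F4, 0x00ABCD00, guess))  # first execution: no entry yet
    guess = lookaside.predict(0x0001F4)
    print(lookaside.confirm(0x0001F4, 0x00ABCD00, guess))  # later executions: hit

Under this assumed policy, loads that fetch global control blocks would hit almost every time, while the private-control-block case noted above is the one the confirm step would have to repair.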

Fig. 2 shows a solution as it would be embedded in the data path of an exemplary machine. Normal operand references to the cache are made via a virtual address EA bus 1 from the I unit. This address would normally address, in parallel, a translation mechanism, shown as DLAT 8, and a directory for the cache 10. The appropriate compare logic then gates, via 13, the appropriate element of the congruence class stored in the cache array...
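For readers following the data-path description, the conventional access path just described can be sketched as follows (the geometry, identifiers and miss handling are assumptions, not the disclosure's): the untranslated offset bits of the effective address select a congruence class in the directory while the DLAT translates the virtual page in parallel, and the compare logic gates the element whose directory entry matches the translated absolute page.

    # Sketch of the conventional parallel DLAT/directory lookup (geometry and names assumed).
    PAGE, LINE, WAYS = 4096, 128, 4
    SETS = PAGE // LINE                            # congruence classes indexed by offset bits

    def cache_reference(ea, dlat, directory, cache_array):
        """dlat: {virtual page -> absolute page}; directory/cache_array: [set][way] lists."""
        vpage, offset = divmod(ea, PAGE)
        cc = offset // LINE                        # congruence class from untranslated bits
        abs_page = dlat.get(vpage)                 # DLAT lookup (parallel to the directory read)
        if abs_page is None:
            return "DLAT miss: perform dynamic address translation"
        for way in range(WAYS):                    # compare logic matches directory entries
            if directory[cc][way] == abs_page:     # against the translated absolute page
                return cache_array[cc][way]        # gate the selected element of the class
        return "directory miss: fetch the line from storage"

    # Hypothetical use: one valid line resident in congruence class 0, way 0.
    dlat = {0x10: 0x2A}
    directory = [[None] * WAYS for _ in range(SETS)]
    cache_array = [[None] * WAYS for _ in range(SETS)]
    directory[0][0], cache_array[0][0] = 0x2A, "cached doubleword"
    print(cache_reference(0x10 * PAGE + 0x0010, dlat, directory, cache_array))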