Method and apparatus for synchronization between out of order dispatched instruction stream and execution queue
Disclosure Number: IPCOM000126022D
Original Publication Date: 2005-Jun-28
Included in the Prior Art Database: 2005-Jun-28
Document File: 9 page(s) / 147K


The key invention of the algorithm is to assign two additional ID tags to every instruction and to partially decode the instructions at an early stage (before the queue) in an inserted re-ordering block. This effectively puts the out of order dispatched instruction stream back into its original "order", even after out of order retirement, so memory resource conflicts and synchronization problems can be avoided.
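
As a rough illustration of the re-ordering idea, the following minimal Python sketch buffers tagged instructions that arrive out of order and releases them in their original sequence. The abbreviated text does not define the two ID tags, so the names `seq_id` (position in the original program order) and `unit_id` (target execution queue), and the class `ReorderBlock`, are assumptions made only for this example and are not taken from the disclosure.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TaggedInstr:
    # Hypothetical tag 1: position in the original program order.
    seq_id: int
    # Hypothetical tag 2: which execution queue the instruction targets.
    unit_id: int = field(compare=False)
    opcode: str = field(compare=False)


class ReorderBlock:
    """Buffers early arrivals and releases instructions strictly by seq_id."""

    def __init__(self):
        self._heap = []      # min-heap ordered by seq_id
        self._next_seq = 0   # next seq_id allowed to leave the block

    def push(self, instr):
        """Accept one instruction; return the instructions now in order."""
        heapq.heappush(self._heap, instr)
        released = []
        while self._heap and self._heap[0].seq_id == self._next_seq:
            released.append(heapq.heappop(self._heap))
            self._next_seq += 1
        return released


# Usage: instructions arrive as 2, 0, 1 but leave as 0, 1, 2.
rb = ReorderBlock()
arrivals = [(2, 1, "fadd"), (0, 0, "add"), (1, 0, "lw")]
out = [[i.seq_id for i in rb.push(TaggedInstr(*a))] for a in arrivals]
```

In this sketch only `seq_id` takes part in ordering; `unit_id` is carried along so that a later stage could route each released instruction to its queue.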

This is the abbreviated version, containing approximately 20% of the total text.


AUS820041923 Derek S Jennings/Watson/IBM Oliver K Ban

Method and apparatus for synchronization between out of order dispatched instruction stream and execution queue

In a modern super scalar microprocessor design such as the one shown in Diagram 1, it is difficult to keep the dispatched instruction stream and the execution queue synchronized:

[Image not reproduced: block diagram of the VISA processor. Legible block labels (some truncated in extraction): Bus Interface Unit with wide data bus; Instruction fetcher; Rename buffer; Branch Processing Unit; Cond Reg; Mux / Distributer; Instruction Reservation; Clock Multiplier; Register File with multiple input ports and multiple output ...; Instruction Dispatch Unit with speculative out of order dispatch; Load Queue; Load/store Unit; Internal Bus; Execution Unit; Address Translation Unit for virtual address management; I-Cache / D-Cache Management; Reorder buffer; Scan chain and self-test vector generation co-processor; Level 2 on chip Data and Instruction ...; Completion Unit entry reorder buffer.]



Diagram 1: Typical super scalar CPU architecture

A typical super scalar RISC CPU, as illustrated in Diagram 1, is a complicated concurrent machine, so it is usually difficult to keep bus cycle accuracy between the software model and the hardware HDL design. This is especially true if the super scalar machine involves an out of order dispatched instruction stream.

The division of the blocks inside the implementation is based on functional separation. Another consideration is the processor's internal bus boundary. With these considerations in mind, the processor micro architecture is divided into the integer execution unit (IEU), floating point unit (FPU), bus interface unit (BIU), instruction fetch unit (IFU), floating point register file (FPRF), integer register file (IRF), and some other blocks, as shown in Diagram 1.

Conceptually, the processor is designed around the pipeline that starts at the read port of the vector register file, passes through the vector execution unit, and ends at the write port of the vector register file. The instruction flow side of the design is therefore handled mainly in the IFU, where pre-decode and fetch are performed.

The concurrency problem is complicated even further by the addition of the floating point instruction stream, since the FPU has separate instruction queues and execution engines.
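
To make the separate-queue problem concrete, here is a small Python sketch under stated assumptions: the four-instruction program, the queue names `int_q` and `fp_q`, and the helper `merge_in_program_order` are all hypothetical and serve only as illustration. Once integer and floating point instructions are split into their own queues, their relative order is lost unless each entry carries a sequence tag. With such a tag, the original interleaving can be recovered.

```python
import heapq
from collections import deque

# Hypothetical program: (opcode, kind) pairs in original program order.
program = [("add", "int"), ("fmul", "fp"), ("lw", "int"), ("fadd", "fp")]

# Split into per-unit queues, tagging each entry with its sequence number.
int_q, fp_q = deque(), deque()
for seq, (op, kind) in enumerate(program):
    (fp_q if kind == "fp" else int_q).append((seq, op))


def merge_in_program_order(*queues):
    """Merge per-unit queues (each already sorted by tag) by sequence tag."""
    return [op for _, op in heapq.merge(*queues)]


restored = merge_in_program_order(int_q, fp_q)
```

Each queue on its own is still in order, so a simple merge on the tag is enough here. Entries that arrive out of order within a queue would need buffering as well.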

A typical floating point or integer execution unit block diagram is shown in Diagram 2.

Diagram 2: Typical execution unit block diagram

The internal timing of a modern CPU is typically a multi-stage pipelined operation, as shown in Diagram 3.

If we denote the pipeline stages as P-1, P-2, ..., P-n, the execution sequence can be described as follows (for simplicity, a 5 stage pipeline is shown here):

P-1: multiple instructions fetched

[Image not reproduced. Legible diagram labels: Reservation Queue and Dispatch; Register File of Floating Point; Interface; Memory Manage ...]