Browse Prior Art Database

Asymmetric Execution/Decoder Concurrence

IP.com Disclosure Number: IPCOM000052472D
Original Publication Date: 1981-Jun-01
Included in the Prior Art Database: 2005-Feb-11
Document File: 2 page(s) / 51K

Publishing Venue

IBM

Related People

Agerwala, TK: AUTHOR [+2]

Abstract

A 2-at-a-time fixed-point E-unit can be implemented at a much lower cost than a 2-at-a-time decoder. An infinite cache flow analysis shows that a balance between decoder and E-unit concurrences is most effective. However, finite cache penalties have become a larger portion of achievable cycles per instruction, and a productive work opportunity exists if the decoder can continue to process instructions during the cycles following an operand miss. The scheme set forth enhances the cycles saved by implementing an E-unit with higher concurrency than the I-unit (as shown in Fig. 1).



As shown in Fig. 2, a high-performance pipelined processor with balanced 1-at-a-time I- and E-units will decode and execute instructions A through L in 17 cycles. This assumes:
- no pipeline disruptions,
- decode and operand fetch during a cache miss,
- 2-cycle cache access,
- a cache with concurrent access and line put-away, and
- no additional cache faults.

Enhancing the E-unit concurrency to 2-at-a-time (as proposed above) allows the same instructions to be decoded and executed in 12 cycles. This is illustrated in Fig. 3.

Under the above assumptions, if the queue size is commensurate with the number of cycles required to resolve a cache fault, the 2-at-a-time E-unit recovers the entire cache fault penalty.
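Figs. 2 and 3 themselves are in the suppressed images, but the cycle counts can be reproduced with a minimal cycle-level sketch, assuming (for illustration only) a single 5-cycle operand miss on the first instruction, a one-per-cycle decoder, and an unbounded instruction queue. The names `simulate`, `e_width`, and `miss_penalty` are invented here and do not come from the disclosure.

```python
def simulate(n_instr=12, e_width=1, miss_penalty=5):
    """Count cycles to decode and execute n_instr instructions.

    The decoder issues one instruction per cycle into a queue; the
    E-unit retires up to e_width queued instructions per cycle, but
    idles while the operand miss (assumed here to hit the first
    instruction) is resolved.  The decoder keeps running during the
    miss, which is what builds up the queue.
    """
    decoded = executed = queued = 0
    stall = miss_penalty            # cycles left to resolve the fault
    cycle = 0
    while executed < n_instr:
        cycle += 1
        if decoded < n_instr:       # decode continues during the miss
            decoded += 1
            queued += 1
        if stall > 0:
            stall -= 1              # E-unit waits out the cache fault
        else:
            done = min(e_width, queued)
            executed += done
            queued -= done
    return cycle

print(simulate(e_width=1))  # 17 cycles, as in Fig. 2
print(simulate(e_width=2))  # 12 cycles, as in Fig. 3
```

Under these assumptions the 5 stall cycles build a 5-deep queue, matching the miss-resolution time, so the 2-at-a-time E-unit drains the backlog and the 5-cycle fault penalty is fully recovered, consistent with the claim above.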
