
Enhancements for Multithreaded Processors: Improved Backward Compatibility and Improved Idle Code Detection

IP.com Disclosure Number: IPCOM000015527D
Original Publication Date: 2002-Jan-27
Included in the Prior Art Database: 2003-Jun-20
Document File: 1 page(s) / 44K

Publishing Venue

IBM


This is the abbreviated version, containing approximately 52% of the total text.


Disclosed are two enhancements for multithreaded processors. One improves backward compatibility with code that is not aware that the processor is multithreaded; the other improves the performance of software that is aware of it. A multithreaded processor, as used here, contains only one logical core pipeline but stores the thread state of multiple threads on die. The processor uses a selection algorithm to determine which thread may use the logical core on any given clock cycle.

ENHANCEMENT 1: Automatic Yield on Branch

Any multithreaded processor must include an algorithm that determines when to switch from one bank of thread state to another. This algorithm may rely heavily on the stall conditions common to any computer, such as cache misses: the processor automatically "puts to sleep" any thread that is stalled. Since cache misses and other stalls happen relatively frequently, this usually shares the core fairly. However, some code may enter tight loops in which all of the necessary instructions and data are contained in the L1 cache. In this case, the thread might run indefinitely without yielding the logical core.
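
The starvation case can be illustrated with a toy cycle-level model. This is only a sketch under assumptions that are not in the disclosure: each thread is described by a function that says whether its n-th instruction stalls, a stalled thread becomes ready again after a fixed penalty, and the names `run_stall_only` and `STALL_CYCLES` are hypothetical.

```python
# Toy model of stall-driven thread switching (illustrative only).
# Assumption: stalls[t](n) returns True when thread t's n-th instruction
# stalls (for example on a cache miss). Nothing here models real hardware.

STALL_CYCLES = 3  # hypothetical stall penalty, in clock cycles


def run_stall_only(stalls, cycles):
    """Simulate `cycles` clock cycles; return instructions issued per thread.

    The core switches threads only when the current thread is stalled.
    """
    n = len(stalls)
    issued = [0] * n
    wake_at = [0] * n  # cycle at which each thread is ready again
    current = 0
    for cycle in range(cycles):
        if wake_at[current] > cycle:
            # Current thread is asleep: pick another thread that is ready.
            ready = [t for t in range(n) if wake_at[t] <= cycle]
            if not ready:
                continue  # every thread is stalled; the core idles
            current = ready[0]
        if stalls[current](issued[current]):
            wake_at[current] = cycle + STALL_CYCLES  # put the thread to sleep
        issued[current] += 1
    return issued


def tight_loop(n):
    return False  # everything hits in L1, so this thread never stalls


def misses_sometimes(n):
    return n % 4 == 3  # every fourth instruction stalls


print(run_stall_only([tight_loop, misses_sometimes], 100))  # -> [100, 0]
```

In this model the tight loop in thread 0 keeps the core for all 100 cycles and thread 1 never issues an instruction, whereas two threads that both stall periodically take turns.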

A multithreaded processor could solve this problem by using branches as implicit yield points. Whenever a thread executes a branch, that thread is treated as low priority. If any other threads are ready to run (i.e. not stalled), one of them takes over. However, if the current thread is the only one that is ready to run, it continues without stalling.

This improves sharing of the processor, and thus perceived overall system performance. It does not require any code changes to existing programs.
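
One possible reading of the yield-on-branch rule is sketched below as a thread-selection function. The disclosure says only that "one of them" takes over, so the round-robin order used here is an assumption, and the function name `next_thread` and its parameters are hypothetical.

```python
def next_thread(current, ready, took_branch):
    """Pick the thread that owns the logical core on the next cycle.

    current     -- index of the thread that just executed
    ready       -- list of booleans; ready[t] is False while thread t is stalled
    took_branch -- True if the current thread just executed a branch
    Returns a thread index, or None if every thread is stalled.
    """
    n = len(ready)
    # Other threads in round-robin order starting after `current`.
    # The ordering is an assumption; the text does not specify one.
    others = [(current + k) % n for k in range(1, n)]
    ready_others = [t for t in others if ready[t]]

    if ready[current] and not (took_branch and ready_others):
        return current  # no yield point reached, or nobody else is ready
    if ready_others:
        return ready_others[0]  # stalled, or branched while another thread is ready
    return None  # all threads are stalled; the core idles
```

For example, `next_thread(0, [True, True], took_branch=True)` hands the core to thread 1, while `next_thread(0, [True, False], took_branch=True)` leaves thread 0 running because no other thread is ready. Existing programs need no changes under this rule because the yield point is an ordinary branch, not a new instruction.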

ENHANCEMENT 2: ConditionalWaitOnSignal and Signal instructions

A thread must often be paused to wait for a condition, such as the release of a semaphore or the setting of a signal. Generally, this waiting must be provided by the operating system, and putting a thread to sleep is a relatively expensive operation.

This could be partially solved by adding ConditionalWaitOnSignal (CWOS) and Signal (SIG) instructions to the instruction set. CWO...