Storage Reservation Method for a Superscalar Embedded Processor with Lower Power Density Consideration in Multi-processor System

IP.com Disclosure Number: IPCOM000124403D
Original Publication Date: 2005-Apr-19
Included in the Prior Art Database: 2005-Apr-19
Document File: 3 page(s) / 40K

Publishing Venue

IBM

Abstract

Disclosed is a technique for implementing the lwarx and stwcx. storage reservation mechanism in a multiprocessing system design. The technique provides a dedicated mechanism between the processor and the L2 cache for handling the stwcx. instruction, and it observes and processes the reservation flag and address, which are held in the processor, in a particular way for snoop pushes.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Many multiprocessing (MP) systems maintain the reservation mechanism (the reservation address granule and the reservation control) within the L2 cache (or Bus Interface Unit), the logic unit closest to the bus, where processor bus traffic can be observed directly. The processors in such systems are designed to execute instructions out of order to reduce the effects of processing delay and latency, and they process the reservation in the L2 cache. As a result, the processor must wait for the L2 cache to complete an instruction that requires the reservation before its own operations can move forward.
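For readers less familiar with the instruction pair, the reservation semantics that any such design has to provide can be sketched in C as below. This is a minimal software model, not the disclosed hardware: the names `reservation_t`, `model_lwarx`, `model_stwcx`, `model_remote_store` and `atomic_increment` are hypothetical, the 32-byte granule size is an assumption (the text does not state one), and the model assumes the store-conditional also compares the granule address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed granule size; the disclosure does not state one. */
#define GRANULE_BYTES 32u

/* Hypothetical model of the reservation state: a flag plus an address granule. */
typedef struct {
    bool     valid;    /* reservation flag */
    uint32_t granule;  /* reserved address, aligned to the granule */
} reservation_t;

static uint32_t granule_of(uint32_t addr)
{
    return addr & ~(GRANULE_BYTES - 1u);
}

/* lwarx: load a word and establish a reservation on its granule. */
static uint32_t model_lwarx(reservation_t *r, const uint32_t *mem, uint32_t addr)
{
    r->valid = true;
    r->granule = granule_of(addr);
    return mem[addr / 4u];
}

/* stwcx.: store only if the reservation still holds; the reservation is
 * cleared whether or not the store is performed. */
static bool model_stwcx(reservation_t *r, uint32_t *mem, uint32_t addr, uint32_t value)
{
    bool ok = r->valid && r->granule == granule_of(addr);
    if (ok)
        mem[addr / 4u] = value;
    r->valid = false;
    return ok;
}

/* A store by another processor to the reserved granule cancels the reservation. */
static void model_remote_store(reservation_t *r, uint32_t *mem, uint32_t addr, uint32_t value)
{
    mem[addr / 4u] = value;
    if (r->valid && r->granule == granule_of(addr))
        r->valid = false;
}

/* Typical retry loop built on the pair; returns the number of attempts.
 * inject_conflict simulates one competing store during the first attempt. */
static uint32_t atomic_increment(reservation_t *r, uint32_t *mem, uint32_t addr,
                                 bool inject_conflict)
{
    uint32_t attempts = 0;
    for (;;) {
        uint32_t old = model_lwarx(r, mem, addr);
        attempts++;
        if (inject_conflict && attempts == 1u)
            model_remote_store(r, mem, addr, old + 100u);
        if (model_stwcx(r, mem, addr, old + 1u))
            return attempts;
    }
}
```

In this model, calling `atomic_increment` with `inject_conflict` set makes the first store-conditional fail, so the loop retries once. In the conventional arrangement described above, each of these reservation checks would be resolved in the L2 cache while the processor waits.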

    The solution implemented in the PPC450 embedded processor (an embedded central processing unit core) incorporates the "lwarx and stwcx." storage reservation mechanism into the existing PPC440 design and the L2 cache without complicating the design, with minimal logic changes in the CPU, and with minimal added processor power dissipation. The custom-circuit CAMRAM design of the processor's L1 data cache has also been kept intact. Because the PPC440 already had a basic pipeline structure for the lwarx and stwcx. instructions, and because the L2 cache already had snooping logic, what needed to be added was a reservation granule handling feature in the processor, with the L1 D-cache operating as a write-through cache, so that the L1 D-cache and the L2 cache together resolve the stwcx. operation. A "reservation only" snoop mechanism has also been added between the L2 cache and the processor's L1 data cache to handle certain snoops. This mechanism gives good lwarx and stwcx. performance with a minimal increase in processor power and size, while preserving most of the processor pipeline structure and the CAMRAM design (no ECC or MESI is needed in the L1 D-cache), because bus snoop traffic is filtered by the L2 cache while reservation control is maintained within the L1 cache. The L2 cache's "inclusive cache" capability is used to further reduce the effect of snoop traffic on the L1 data cache and on the reservation logic in the processor. The overall implementation approach is illustrated in Figure-1.
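The division of work described above, with the L2 cache filtering bus snoops and the processor keeping the reservation flag and address, might be modeled roughly as follows. This is a sketch under stated assumptions rather than the disclosed logic: the types `cpu_reservation_t` and `l2_model_t`, the function names, the 32-byte line size and the four-entry L2 directory are all hypothetical, and the model assumes that the reserved line is resident in the inclusive L2 and that the L2 forwards a "reservation only" snoop only when a snooped write hits a line it holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* assumed line and granule size */
#define L2_LINES   4u   /* tiny hypothetical L2 directory */

/* Reservation state kept in the processor (flag plus address granule). */
typedef struct {
    bool     valid;
    uint32_t granule;
} cpu_reservation_t;

/* Hypothetical L2 directory; 'forwarded' counts snoops passed to the processor. */
typedef struct {
    bool     present[L2_LINES];
    uint32_t tag[L2_LINES];
    unsigned forwarded;
} l2_model_t;

static uint32_t line_of(uint32_t addr)
{
    return addr & ~(LINE_BYTES - 1u);
}

static bool l2_holds(const l2_model_t *l2, uint32_t addr)
{
    for (size_t i = 0; i < L2_LINES; i++)
        if (l2->present[i] && l2->tag[i] == line_of(addr))
            return true;
    return false;
}

/* Processor side: a "reservation only" snoop compares the snooped address
 * with the reserved granule and clears the flag on a match. */
static void cpu_reservation_snoop(cpu_reservation_t *r, uint32_t addr)
{
    if (r->valid && r->granule == line_of(addr))
        r->valid = false;
}

/* L2 side: filter bus snoops. Because the L2 is assumed inclusive, a miss
 * here means the L1 cannot hold the line, so the processor is not disturbed. */
static void l2_bus_snoop_write(l2_model_t *l2, cpu_reservation_t *r, uint32_t addr)
{
    if (!l2_holds(l2, addr))
        return;
    l2->forwarded++;
    cpu_reservation_snoop(r, addr);
}
```

With a reservation on line 0x100 and only that line present in the modeled L2, a snooped write to 0x200 is dropped at the L2, while a snooped write to 0x104 is forwarded and clears the reservation flag in the processor.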

    The PPC450 processor is a redesigned PPC440 with an L2 cache, as illustrated in Figure-1. The overall design of the PPC440 is well known among IBM designers: it is a 2-way issue superscalar processor with separate L1 instruction and data caches. In this implementation, the L1 data cache is configured as a write-through cache to support memory coherency. The L2 cache will be configur...