Matching Prediction Algorithms to Accelerate Cache Coherence Numa Protocol

IP.com Disclosure Number: IPCOM000031030D
Original Publication Date: 2004-Sep-07
Included in the Prior Art Database: 2004-Sep-07
Document File: 1 page(s) / 41K

Publishing Venue

IBM

Abstract

Directory coherence protocols, available in any NUMA system, suffer from performance degradation when dealing with 3-hop fetches. This is especially a problem in commercial workloads (due to their sharing patterns). Today, several algorithms exist to monitor and predict 3-hop fetches in order to accelerate performance. Presented here is a method that takes this performance acceleration a step further.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

    The core idea is to have the NUMA system support several different algorithms, ranging from the basic NUMA algorithm to broadcast and the suggested accelerators, and to match each memory region (a cache line or larger) to the best-suited algorithm in that pool by analyzing the region's dynamic behavior. All of this is done by hardware at run time. The dynamic analysis is based on the number of readers and writers in competition.

    The main advantage is that the latency-bandwidth trade-off is fine-tuned on a per-memory-region basis, thus achieving higher performance.
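As a rough software illustration of the dynamic analysis (the disclosure describes hardware, so this is only a model), the sketch below counts the distinct reader and writer nodes seen for each memory region. The names `DirectoryMonitor`, `RegionActivity`, `record`, and `counts`, as well as the 128-byte region size, are assumptions made for the example and do not come from the disclosure.

```python
class RegionActivity:
    """Hypothetical per-region record of which nodes read and wrote it."""

    def __init__(self):
        self.readers = set()  # node ids that issued reads
        self.writers = set()  # node ids that issued writes


class DirectoryMonitor:
    """Software model of a directory that observes accesses per region."""

    def __init__(self, region_bytes=128):
        # Assumed granularity: one cache line or larger.
        self.region_bytes = region_bytes
        self.activity = {}  # region index -> RegionActivity

    def region_of(self, address):
        """Map a byte address to its region index."""
        return address // self.region_bytes

    def record(self, node, address, is_write):
        """Note that `node` read or wrote the region containing `address`."""
        entry = self.activity.setdefault(self.region_of(address), RegionActivity())
        if is_write:
            entry.writers.add(node)
        else:
            entry.readers.add(node)

    def counts(self, address):
        """Return (distinct readers, distinct writers) for the region."""
        entry = self.activity.get(self.region_of(address))
        if entry is None:
            return (0, 0)
        return (len(entry.readers), len(entry.writers))
```

For example, after two nodes read and one of them writes addresses inside the same 128-byte region, `counts` reports two readers and one writer for that region, which is the kind of input the matching step would work from.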

    First, introduce a notion of time. It is needed to supply time-stamps and to ensure that conclusions are not too old and are probably still relevant (for example, a very slow T=2 bit counter that counts 01->10->11, where 00 means invalid and time-stamps that are two generations old become invalidated).
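The time-stamp example can be modeled in a few lines. This is a sketch under stated assumptions, not the disclosed hardware: it assumes the counter wraps from 11 back to 01 (the text shows only 01->10->11), and that stale stamps are swept to 00 on every counter step so that a wrapped value is never mistaken for a fresh one. The names `advance`, `age`, and `RegionStamps` are hypothetical.

```python
INVALID = 0b00  # reserved code: no valid time-stamp


def advance(now):
    """Step the slow 2-bit counter: 01 -> 10 -> 11 -> 01 (wrap is assumed)."""
    return 0b01 if now == 0b11 else now + 1


def age(stamp, now):
    """Generations elapsed since `stamp`, over the three valid codes."""
    return (now - stamp) % 3


class RegionStamps:
    """Software model of per-region time-stamps kept by a directory."""

    def __init__(self):
        self.now = 0b01
        self.stamps = {}  # region id -> 2-bit stamp

    def touch(self, region):
        """Stamp a region with the current generation."""
        self.stamps[region] = self.now

    def tick(self):
        """Advance the generation and invalidate stamps two generations old."""
        self.now = advance(self.now)
        for region, stamp in list(self.stamps.items()):
            if stamp != INVALID and age(stamp, self.now) == 2:
                self.stamps[region] = INVALID

    def is_valid(self, region):
        """True if the region still carries a usable time-stamp."""
        return self.stamps.get(region, INVALID) != INVALID
```

With three valid codes, an age of three generations would alias to zero, which is why the sketch invalidates a stamp as soon as it reaches age two rather than checking ages lazily.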

    Sorting hardware (in each directory) should classify each memory chunk into one of the following example groups:

1 - Chunks that are in no competition (up to a single writing and reading node)
2 - C...
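A minimal sketch of the sorting step follows, covering only the first group, since the remaining groups are cut off in this abbreviated text. It assumes that "up to a single writing and reading node" means at most one distinct node touches the chunk. `Group.OTHER` is a stand-in for the groups not shown here, and the mapping from groups to algorithm names is a placeholder invented for illustration, not the disclosed assignment.

```python
from enum import Enum


class Group(Enum):
    NO_COMPETITION = 1  # group 1 in the list above
    OTHER = 2           # stand-in for the groups not shown in this text


def classify(reader_nodes, writer_nodes):
    """Classify a chunk from the sets of node ids that read and wrote it."""
    # Assumed reading of group 1: at most one distinct node is involved.
    if len(set(reader_nodes) | set(writer_nodes)) <= 1:
        return Group.NO_COMPETITION
    return Group.OTHER


# Placeholder pool: which algorithm serves which group is an assumption.
ALGORITHM_POOL = {
    Group.NO_COMPETITION: "basic-numa",
    Group.OTHER: "broadcast",
}


def select_algorithm(reader_nodes, writer_nodes):
    """Pick an algorithm name from the pool for one chunk."""
    return ALGORITHM_POOL[classify(reader_nodes, writer_nodes)]
```

A chunk read and written only by node 0 falls into group 1 under this reading, while a chunk with readers on two nodes does not.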