Method for software-enforced strict processor affinity to reduce coherency traffic in a cache-coherent MP system

IP.com Disclosure Number: IPCOM000033803D
Publication Date: 2004-Dec-28
Document File: 4 page(s) / 52K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for software-enforced strict processor affinity to reduce coherency traffic in a cache-coherent multiprocessor (MP) system. Benefits include improved functionality and improved performance.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 45% of the total text.

Background

              Processor affinity is a conventional technique for mapping processes to processors in an MP system. The general idea of processor affinity has also been applied in the networking space by having the network interface card (NIC) map incoming packets to processors based on each packet's flow. Existing techniques do not define flow-specific application data structures; instead, they rely on temporal and spatial locality to keep flow-specific data in a given processor's cache.
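As a non-normative illustration of the conventional technique, an operating system can be asked to restrict a process to one processor; on Linux, for example, this is done with sched_setaffinity. The helper below is a minimal sketch; the name pin_to_cpu is illustrative and not part of the disclosure.

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_setaffinity, cpu_set_t (Linux-specific) */

/* Restrict the calling process/thread to a single CPU so that its
   working set stays resident in that CPU's cache hierarchy.
   Returns 0 on success, -1 on failure. */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Pinning every thread that touches a flow's data in this way is what keeps that data within a single processor's cache hierarchy.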

              In conventional cache-coherent MP systems, hardware maintains data cache coherency via a protocol, such as snooping or a directory mechanism. For flow-specific data structures, the protocol imposes unnecessary overhead by increasing the demands on bus bandwidth. For example, the modified/exclusive/shared/invalid (MESI) protocol defines four states for a cacheline in an MP system. These states enable a single processor to own a cacheline without broadcasting changes to that cacheline to other processors. However, a processor using these states is still required to snoop for all changes to all cachelines, even cachelines that are flow-specific and will never be accessed or modified by another processor.
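To make the snoop overhead concrete, the state transitions that remote snoops force under standard MESI can be modeled in a few lines. This is an illustrative sketch of the conventional protocol, not part of the disclosed method:

```c
/* The four MESI states of a cacheline. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* A remote processor's write snoop invalidates any local copy of the
   line, regardless of its current state. */
mesi_t on_remote_write_snoop(mesi_t s) {
    (void)s;
    return INVALID;
}

/* A remote read snoop demotes an owned (exclusive or modified) line
   to shared; an invalid line stays invalid. */
mesi_t on_remote_read_snoop(mesi_t s) {
    return (s == INVALID) ? INVALID : SHARED;
}
```

The cost the disclosure targets is that these transitions must be evaluated for every snooped address, including addresses that only one processor will ever touch.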

 

              For networking applications, if all of the packets in the same transmission control protocol (TCP) stream can be pinned to a particular processor in an MP system, all flow-specific data can be pinned to that processor without exercising the cache-coherency protocol for that data during the lifetime of the flow.

              Flow pinning is performed to increase data cache locality. However, some cache coherency traffic is still unnecessarily present on the processor-to-processor interconnect (see Figure 1).

              MP cache coherency protocols reduce the amount of coherency traffic by enabling one processor to own a cacheline. However, these protocols must assume that another processor could, at any time, obtain access to any of the owned cache lines.

              A high-performance wireless and networking architecture includes per-cacheline locking, with which the processor can ensure that a given cacheline is never evicted. This concept has not conventionally been extended to MP systems.

              A typical flow is defined by the 5-tuple, the Internet Protocol term for the source address, destination address, source port, destination port, and protocol that identify a TCP stream. However, each application can define a flow differently. In general, a flow is a group of related packets that may require in-order processing and that access the same flow-specific state.
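A minimal sketch of a deterministic flow-to-processor mapping, assuming the conventional TCP/IP 5-tuple as the flow key (the names flow_key and flow_to_cpu, and the particular hash, are illustrative assumptions, not from the disclosure):

```c
#include <stdint.h>

/* The conventional 5-tuple flow key for TCP/IP. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

/* Deterministically map a flow to one of ncpus processors.
   Any hash works, as long as it is stable for the lifetime of
   the flow, so that every packet of a stream (and therefore all
   of its flow-specific state) lands on the same CPU. */
unsigned flow_to_cpu(const struct flow_key *k, unsigned ncpus) {
    uint32_t h = k->src_ip ^ k->dst_ip ^ k->protocol;
    h ^= ((uint32_t)k->src_port << 16) | k->dst_port;
    h ^= h >> 16;   /* fold the high bits into the low bits */
    return h % ncpus;
}
```

Because the mapping depends only on the 5-tuple, two packets of the same stream can never be steered to different processors, which is the property flow pinning relies on.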

General description

              The disclosed method detects flow-specific data through a simple software construct. Cache co...