
DMA Controller as Cache-Line Pre-fetcher

IP.com Disclosure Number: IPCOM000237155D
Publication Date: 2014-Jun-05
Document File: 4 page(s) / 303K

Publishing Venue

The IP.com Prior Art Database

Abstract

In a multi-core platform, Receive Packet Steering (RPS) distributes packet-processing load across cores. Under RPS, a master core receives interrupts for incoming packets and subsequently queues the packets to backlog queues associated with threads running on other cores. The secondary cores then process these packets, balancing the load among multiple cores. Multiple incoming network flows thus achieve a degree of load balancing and performance scaling under RPS. However, when a secondary core tries to read the packet contents, it encounters a cache-line miss for the packet data, and the cache line must be fetched from main memory or from another core's cache. This data-access latency can be substantial, depending on the size of the data set the secondary core must read.
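For concreteness, RPS in Linux is enabled per receive queue through a CPU bitmap exposed in sysfs. The following sketch in C steers packets from queue rx-0 of a device to cores 1-3; the device name eth0 and the mask value are illustrative assumptions, not part of this disclosure.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* Per-queue CPU bitmap telling RPS which cores may process
         * packets from this receive queue (standard Linux sysfs knob). */
        const char *path = "/sys/class/net/eth0/queues/rx-0/rps_cpus";
        const char *mask = "e"; /* 0b1110: cores 1-3; core 0 remains the master */

        int fd = open(path, O_WRONLY);
        if (fd < 0) {
            perror("open rps_cpus");
            return 1;
        }
        if (write(fd, mask, strlen(mask)) < 0)
            perror("write rps_cpus");
        close(fd);
        return 0;
    }

With this mask, core 0 continues to take the Rx interrupts while cores 1-3 drain the backlog queues, which is the division of labor described above.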



Problem

Figure 1 depicts the flow corresponding to packet handling with RPS. The input packet is DMA'ed to host memory (the memory access is validated by the IOMMU), and an Rx interrupt is raised to the primary RPS core. Once the primary core completes first-level packet processing, it queues the packet and notifies a secondary core to begin processing it. On receiving the notification, the secondary core attempts to process the packet. Because the cache lines holding the packet data are not present in the secondary core's cache, it takes a read miss and must obtain the data from the primary core's cache via cache-line snooping, from the platform cache, or from DDR memory. These accesses stall the instruction pipeline and thus lower performance.
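To make the stall concrete, the sketch below shows the access pattern on the secondary core: the first dereference of the packet buffer misses in the local cache, and the conventional software mitigation is an explicit prefetch ahead of use. The struct pkt layout and process_packet are hypothetical names for illustration; __builtin_prefetch is the standard GCC/Clang prefetch intrinsic.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical descriptor handed from the primary RPS core to a
     * secondary core; only the pointer crosses cores, not the packet's
     * cache lines. */
    struct pkt {
        uint8_t *data; /* packet payload DMA'ed into host memory */
        size_t   len;
    };

    uint32_t process_packet(const struct pkt *p)
    {
        /* Software mitigation: start fetching the first cache lines
         * before touching them. To hide the full miss latency, the
         * prefetch must be issued far enough ahead of the use, e.g.
         * at dequeue time rather than immediately before the read. */
        __builtin_prefetch(p->data, 0 /* read */, 3 /* keep in cache */);
        __builtin_prefetch(p->data + 64, 0, 3); /* next 64-byte line */

        uint32_t sum = 0;
        for (size_t i = 0; i < p->len; i++)
            sum += p->data[i]; /* first reads here miss without prefetch */
        return sum;
    }

Even with software prefetching, the prefetch distance is hard to tune and the miss latency is only partially hidden, which motivates offloading the prefetch to a hardware agent such as the DMA controller.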

Figure 1: RPS with Snoop or DDR Access on Secondary Core