
Method to Enable Receive Packet Steering to Multicore for a Single Elephant Flow

IP.com Disclosure Number: IPCOM000236758D
Publication Date: 2014-May-14
Document File: 5 page(s) / 524K

Publishing Venue

The IP.com Prior Art Database

Abstract

High-performance Network Interface Cards (NICs) classify incoming flows and distribute them to different queues. With this distribution, all packets belonging to a flow are queued in the same queue. These NICs have multiple physical queues so that a particular flow can be assigned to a particular queue, and the host has multiple CPUs to process the packets. For receive-side packet processing, these queues have a logical affinity to the CPUs in the multi-core SoC, as shown in Figure 1.



Figure 1: Classification of packets for multi-core system


Network flows are typically directed to a single queue so that all packets of a flow are processed in order. Ordering, however, is not a hard requirement for all types of flows; in a UDP flow, for example, strict ordering is not required. In stateless flows, packets have no specific relationship to, or dependency on, one another.

Due to the affinity of one flow to one queue, and thus to one CPU, the processing rate of a flow is limited by the processing power of that CPU. This affinity is not a hard requirement for stateless packet forwarding, so the one-flow-per-queue constraint can be relaxed. This paper describes an apparatus and method for stateless distribution of packets among multiple queues to achieve better CPU load balancing and thus better CPU utilization. The method applies to packets that do not require ordering and are stateless with respect to the other packets of their flow.

Problem

If a single high-bandwidth network stream maps to a single flow, all of its packets are enqueued in one queue and eventually routed to one CPU, one at a time. This causes congestion even when other CPUs are available to process the packets. The impact of a high-bandwidth flow being routed to a single CPU is shown in Figure 2.

Figure 2: Elephant flow processing by single CPU

This scenario generalizes to multiple physical queues, each handling its own elephant flow. Because of the fixed queue-to-core affinity, each such flow can be processed only by its assigned CPU, even when other CPUs have spare capacity, resulting in congestion and a hard performance limit.

Solution

Receive Packet Steering (RPS) is a software approach in which core C0 analyzes each packet and can steer it to other core(s) based on their load profiles. This places additional load on the primary core C0, and the cores must maintain a semaphore mechanism to access the shared resource (i.e., physical queue 0). In addition, RPS still delivers the packets of one flow to one CPU at a time. This method is an offline redistribution of the packets.

The paper presents programmable packet re-distributor logic for stateless flows. This scheme performs in-line redistribution of packets based on the loading of the processing engines. The...