
Method and apparatus for autonomous statistic support

IP.com Disclosure Number: IPCOM000239523D
Publication Date: 2014-Nov-13

Publishing Venue

The IP.com Prior Art Database

Abstract

Data-path applications need to keep track of various statistics. There are pre-processing statistics, i.e., those updated before a packet is handed to the data-plane, and post-processing statistics, i.e., those updated based on the packet after the data-plane has processed it. The statistics are kept per network session or per flow and need to be read periodically by the control plane. This results in contention for the statistics between multiple data-plane threads at the pre-processing and post-processing stages. On a multicore processor, when the load for a network flow is distributed across multiple cores, statistic updates require synchronization to avoid simultaneous access and corruption. High-end processors have programmable I/O controllers on chip. After the network interface card (NIC) classifies packets to their respective flows, the per-flow packets are sent to these programmable I/O controllers for specialized work, namely pre-processing statistic updates. The data-plane application can likewise send post-processed packets to the controllers for the corresponding statistic updates before the packet is autonomously sent out on the egress interface. The CPUs running the data-plane application do not need to synchronize and are thus offloaded from statistic updates.
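A conceptual sketch of this split, in C, is given below. The descriptor layout, queue identifiers, and the io_ctrl_enqueue() call are hypothetical placeholders for whatever enqueue mechanism a particular processor's programmable I/O controller exposes; they are not an API described in this disclosure.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet descriptor; real hardware defines its own format. */
struct pkt_desc {
    uint32_t flow_id;   /* flow assigned by NIC classification */
    uint32_t len;
    void    *data;
};

/* Placeholder for a processor-specific primitive that hands a descriptor
 * to the on-chip programmable I/O controller. */
static bool io_ctrl_enqueue(int queue, struct pkt_desc *d)
{
    (void)queue;
    (void)d;
    return true;
}

enum { PRE_STATS_QUEUE, POST_STATS_AND_TX_QUEUE };

/* Ingress: after the NIC classifies the packet to its flow, the packet is
 * handed to the I/O controller, which updates the pre-processing
 * statistics before the data-plane CPU ever touches the counters. */
void on_ingress(struct pkt_desc *d)
{
    io_ctrl_enqueue(PRE_STATS_QUEUE, d);
}

/* Egress: the data-plane application hands the processed packet back to
 * the I/O controller, which updates the post-processing statistics and
 * autonomously transmits the packet on the egress interface. */
void on_egress(struct pkt_desc *d)
{
    io_ctrl_enqueue(POST_STATS_AND_TX_QUEUE, d);
}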



Problem

Data-path applications need to keep track of various statistics. There are pre-processing statistics, i.e., those updated before the packet is modified by the data-plane, and post-processing statistics, i.e., those updated after the packet contents have been updated by the application. The statistic updates for both stages are repeated for every packet and every flow.
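For concreteness, a minimal sketch of such per-flow counters is shown below; the structure and field names are illustrative assumptions, not taken from this disclosure.

#include <stdint.h>

/* Illustrative per-flow statistics; field names are hypothetical. */
struct flow_stats {
    /* Pre-processing statistics: updated before the data-plane modifies
     * the packet, e.g. at classification on ingress. */
    uint64_t rx_packets;
    uint64_t rx_bytes;
    /* Post-processing statistics: updated after the application has
     * updated the packet contents, just before egress. */
    uint64_t tx_packets;
    uint64_t tx_bytes;
    uint64_t drops;
};

/* Both update sites run for every packet of every flow. */
static inline void stats_pre(struct flow_stats *s, uint32_t pkt_len)
{
    s->rx_packets++;
    s->rx_bytes += pkt_len;
}

static inline void stats_post(struct flow_stats *s, uint32_t pkt_len, int dropped)
{
    if (dropped) {
        s->drops++;
    } else {
        s->tx_packets++;
        s->tx_bytes += pkt_len;
    }
}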

On a multicore processor, when the load for a network flow is distributed across multiple CPUs, statistic updates become a contended operation at both stages:

Statistic updates are intrusive: at both the pre-processing and post-processing stages the application has to perform load/store accesses to the statistics in DDR and decide which of the multiple statistics to update for each packet. The following approaches are available:

1.    Shared statistics: As shown in Figure 1 below, the per-flow statistics are maintained in DDR. When the program running on multiple cores needs to update a statistic, a lock is acquired to make sure that no two thread contexts update the statistics at the same time. After the updates, the lock is released. Performance is reduced by the serialized access to the shared statistics, heavy coherency traffic for the statistic cache lines, instruction pipeline stalls while the statistics are loaded, cache thrashing, context switches among application threads, and so on (a sketch of this locked scheme is given after Figure 1).

Figure 1: Locked statistic updates

An increase in the number of flows further aggravates cache thrashing because of the shared statistics kept per flow. If one tries to pack multiple shared statistics together, false cache-line sharing adds to the performance issues.
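A minimal sketch of this locked scheme, assuming a POSIX spinlock per flow and an illustrative counter layout (neither is specified by the disclosure):

#include <pthread.h>
#include <stdint.h>

/* Shared per-flow statistics held in DDR, protected by a lock (Figure 1).
 * Layout and names are illustrative. The lock must be initialized with
 * pthread_spin_init() when the flow entry is created. */
struct shared_flow_stats {
    pthread_spinlock_t lock;
    uint64_t packets;
    uint64_t bytes;
    uint64_t errors;
};

/* Every data-plane thread touching the same flow serializes here, at both
 * the pre-processing and the post-processing stage. */
void locked_stats_update(struct shared_flow_stats *s,
                         uint32_t pkt_len, int is_error)
{
    pthread_spin_lock(&s->lock);   /* contended across cores for the flow */
    s->packets++;
    s->bytes += pkt_len;
    if (is_error)
        s->errors++;
    pthread_spin_unlock(&s->lock);
}

The lock acquisition and the load of the shared cache line are exactly the serialized, coherency-heavy steps identified above as the performance bottleneck.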

2.    Per-CPU statistics: With per-CPU statistics, the number of statistic copies grows with the number of online CPUs. This results in a larger memory requirement than would...