
Method for inline atomic statistic updates with programmable I/O controller

IP.com Disclosure Number: IPCOM000239522D
Publication Date: 2014-Nov-13
Document File: 4 page(s) / 119K

Publishing Venue

The IP.com Prior Art Database

Abstract

Data-path applications need to keep track of various statistics, including total-packet counters, good/bad-packet counters, ICV failures, byte counts, etc. These statistics exist at various program pipeline stages and are updated for every processed packet. The statistics are kept per network session or per flow and need to be read periodically by the control plane. This results in contention for the statistics between multiple data-plane threads. On a multicore processor, with load distribution among multiple cores per network flow, statistic updates require synchronization to avoid simultaneous access and corruption. High-end processors use programmable I/O controllers for job processing at various pipeline stages, and per-flow packets are sent to these controllers for specialized work. The proposed design updates per-flow data-path statistics during job processing by these I/O controllers, without data-path core intervention, when packets are sent to them. CPUs running the data-plane application do not need to synchronize and are thus offloaded from statistic updates.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.


Problem

Data-path applications need to keep track of various statistics, including total-packet counters, good/bad-packet counters, ICV failures, byte counts, etc. These statistics exist at various program stages and are updated for every processed packet of a flow.

On a multicore processor, with load distribution among multiple CPUs per network flow, statistic updates become a contended operation:

Statistic updates are intrusive: the pipeline stages of a processing application have to perform load/store accesses to DDR for the statistics and make logical decisions about which statistics to update for each packet. The following approaches are available:

1.    Shared statistics: Here, the per-flow statistics are maintained in a single location in DDR. When a program running on multiple cores needs to update a statistic, a lock is acquired to ensure that no two thread contexts update the statistics at the same time; after the update, the lock is released. Because access to the shared statistics is serialized, performance is reduced by coherency traffic for the statistics, instruction-pipeline stalls while the statistics are loaded, cache thrashing, increased program footprint, context switches among application threads, etc.

Figure 1: Locked statistic updates

An increase in the number of flows further aggravates cache thrashing, because each flow's statistics are shared across cores. If one tries to pack the statistics of multiple flows together, false cacheline sharing makes the performance problems worse.

2.    Per-CPU statistics: With per-CPU statistics, as shown in Figure 2, the number of statistic copies grows with the number of online CPUs. This requires more memory than shared statistics would. With an increase in the number of simultaneously active flows, the total number of...
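The per-CPU trade-off can be sketched as below. This is an illustrative assumption, not the disclosure's code: the CPU count, structure names, and read-side summation are invented for the example, and it shows both the lock-free update path and the memory/read cost that grows with the number of CPU copies.

```c
#include <stdint.h>

#define NUM_CPUS 8  /* illustrative; real code sizes this to the online-CPU count */

/* One private copy of the counters per CPU per flow (names are
 * illustrative). Each core increments only its own slot, so the update
 * path needs no lock, but memory use scales with NUM_CPUS x flows. */
struct percpu_flow_stats {
    struct {
        uint64_t total_pkts;
        uint64_t byte_count;
    } cpu[NUM_CPUS];
};

/* Lock-free update path: the caller passes the id of the CPU it runs on. */
static void percpu_stats_update(struct percpu_flow_stats *s, int cpu,
                                uint32_t pkt_len)
{
    s->cpu[cpu].total_pkts++;
    s->cpu[cpu].byte_count += pkt_len;
}

/* The control plane pays at read time: it must walk and sum every
 * per-CPU copy to obtain the flow total. */
static uint64_t percpu_stats_total_pkts(const struct percpu_flow_stats *s)
{
    uint64_t sum = 0;
    for (int i = 0; i < NUM_CPUS; i++)
        sum += s->cpu[i].total_pkts;
    return sum;
}
```

As the text notes, the memory requirement multiplies with the number of online CPUs and active flows, which is the cost this design trades for the lock-free update path.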