
Adaptive Threshold and Data Protection for Distributed Tracing Systems

IP.com Disclosure Number: IPCOM000236743D
Publication Date: 2014-May-14
Document File: 6 page(s) / 91K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a system to improve the efficiency of collecting and saving large amounts of performance events in a distributed tracing system. The novel contribution is a distributed tracing algorithm comprising multiple data acquisition nodes and data collectors that use adaptive load control and data protection to selectively save traces based on the associated execution times.




In distributed tracing systems, collecting and saving large amounts of performance events can significantly impact performance. Some current systems implement a sampling algorithm in order to limit the number of traces saved and to minimize the performance impact of data collection. The problem with this approach is that it randomly discards traces, including those that take longer than usual to execute and are therefore more likely to be the target of an investigation. It also does not allow precise measurement of resource utilization, as there is no way to evaluate the source of the traces that were discarded.

The novel contribution is a distributed tracing algorithm comprising multiple data acquisition nodes and data collectors that use adaptive load control and data protection to selectively save traces based on the associated execution times. Instead of randomly sampling out individual traces, the approach is to continuously measure the elapsed time of each trace segment and discard those segments that fall below a configurable adaptive threshold. The system also protects important events that cannot be discarded. This approach provides precise measurement of resource utilization because key trace events in the system can be protected and counted.
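The following is a minimal sketch of this filtering idea: segments shorter than a configurable threshold are dropped, while segments marked as protected are always kept. The names (TraceSegment, AdaptiveFilter, threshold_ms, protected) and the use of Python are illustrative assumptions, not details taken from the disclosure.

    # Sketch only: threshold-based trace filtering with data protection.
    from dataclasses import dataclass

    @dataclass
    class TraceSegment:
        name: str
        elapsed_ms: float
        protected: bool = False   # key events that must never be discarded

    class AdaptiveFilter:
        def __init__(self, threshold_ms: float):
            self.threshold_ms = threshold_ms   # configurable adaptive threshold
            self.discarded = 0                 # discarded segments are still counted

        def accept(self, segment: TraceSegment) -> bool:
            # Keep protected segments and any segment at or above the threshold.
            if segment.protected or segment.elapsed_ms >= self.threshold_ms:
                return True
            self.discarded += 1
            return False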

Distributed tracing systems are typically based on individual trace events, each carrying its own timestamp. Once a trace event is generated and saved, the next trace event has no recollection of the previous timestamp. Elapsed times can therefore only be calculated after the fact, during the collection or analysis phase.

The distributed tracing algorithm uses pairs of start and stop events as individual timers. Once started, timers are kept in memory until stopped. Elapsed time calculations are then performed on the fly, allowing timers that are too short to be discarded on the spot so that only the important timers are saved. This strategy also allows the processing of nested timers and the calculation of self-time measurements.
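Below is a hedged sketch of how start/stop pairs could be kept in memory as timers, with nesting tracked on a stack and self-time computed as a timer's elapsed time minus the time spent in its children. The Timer and Tracer classes, the stack-based nesting, and the min_elapsed_s cutoff are assumptions made for illustration.

    # Sketch only: start/stop event pairs as in-memory timers with
    # nested-timer handling and self-time calculation.
    import time

    class Timer:
        def __init__(self, name):
            self.name = name
            self.start = time.perf_counter()
            self.children_elapsed = 0.0   # accumulated time of nested timers

    class Tracer:
        def __init__(self, min_elapsed_s=0.001):
            self.min_elapsed_s = min_elapsed_s   # discard timers shorter than this
            self.stack = []                      # currently running (nested) timers
            self.saved = []

        def start(self, name):
            self.stack.append(Timer(name))

        def stop(self):
            timer = self.stack.pop()
            elapsed = time.perf_counter() - timer.start
            # Self-time = own elapsed time minus time spent in nested timers.
            self_time = elapsed - timer.children_elapsed
            if self.stack:
                self.stack[-1].children_elapsed += elapsed
            # Discard short timers on the spot; save only the significant ones.
            if elapsed >= self.min_elapsed_s:
                self.saved.append((timer.name, elapsed, self_time))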

The novel approach allows the system to filter out trace events based on their significance in terms of execution time, instead of relying on random sampling.

Figure 1 describes the topology of a typical distributed tracing system. Instrumented processes generate traces that are filtered and temporarily saved in local collector storage (1). These traces are then sent to one or more remote collectors (2) that process and save the data in a centralized database (3). The items in GREEN (1 and 2) are the parts related to this disclosure.

Figure 1: Network topology of a typical distributed tracing system
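The rough data-flow sketch below maps the numbered items in Figure 1 to code: a local collector (1) filters incoming trace segments and buffers them, then forwards them to a remote collector (2) that writes to a centralized store (3). The LocalCollector and RemoteCollector names are hypothetical, and the filter is assumed to be the AdaptiveFilter sketched earlier.

    # Sketch only: topology of Figure 1 expressed as a simple pipeline.
    class LocalCollector:
        # (1) Filters traces and buffers them in local collector storage.
        def __init__(self, trace_filter, remote):
            self.trace_filter = trace_filter   # e.g. the AdaptiveFilter sketch above
            self.remote = remote
            self.buffer = []

        def receive(self, segment):
            if self.trace_filter.accept(segment):
                self.buffer.append(segment)

        def flush(self):
            # (2) Periodically send buffered traces to a remote collector.
            self.remote.store(self.buffer)
            self.buffer = []

    class RemoteCollector:
        # (2)/(3) Processes traces and saves them to the centralized database.
        def __init__(self, database):
            self.database = database   # any list-like sink stands in for the database

        def store(self, segments):
            self.database.extend(segments)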


The distributed tracing algorithm introduces a new method for filtering (1) and transferring (2) data, using adaptive load control and data protection to selectively save traces based on their elapsed times.
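One plausible way to realize the adaptive load control mentioned above is sketched below: the elapsed-time threshold is raised when the collector is saving more traces than a target rate and lowered again when traffic is light. The control law and the target_rate and adjust_factor parameters are assumptions, not details from the disclosure.

    # Sketch only: one possible adaptive adjustment of the elapsed-time threshold.
    class AdaptiveThreshold:
        def __init__(self, threshold_ms=1.0, target_rate=1000, adjust_factor=1.25):
            self.threshold_ms = threshold_ms
            self.target_rate = target_rate        # desired saved traces per interval
            self.adjust_factor = adjust_factor

        def update(self, saved_last_interval):
            # Called once per collection interval with the number of saved traces.
            if saved_last_interval > self.target_rate:
                self.threshold_ms *= self.adjust_factor   # too much data: be stricter
            elif saved_last_interval < self.target_rate // 2:
                self.threshold_ms /= self.adjust_factor   # headroom: keep more detail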

Filtering (1)

Existing distributed tracin...