Adaptive Middleware Logging Using Trace Trees Stability

IP.com Disclosure Number: IPCOM000010762D
Original Publication Date: 2003-Jan-16
Included in the Prior Art Database: 2003-Jan-16
Document File: 4 page(s) / 73K

Publishing Venue

IBM

Abstract

Disclosed is a method for structuring and viewing computer program tracing data as trees that allow easy identification of unusual behaviour based upon the compression of the most frequent sequences of trace data identifiers.

This is the abbreviated version, containing approximately 37% of the total text.

In servicing middleware products, a constant and real problem is distinguishing 'normal' diagnostic data from data that relates to problematic running of the system, and also dealing with the sheer volume of problem data. This problem occurs particularly with trace and log file entries.

    The current common practice is to provide a controllable level of detail in trace output.

    However, a very detailed tracing system hurts performance and often makes it time-consuming and difficult to find the start of a problem. Turning up full trace detail also often perturbs the system enough to make timing or stress problems 'go away'. On the other hand, 'chunky' (coarse-grained) trace often does not give enough data to find or fix a problem or, more importantly, to work out why the system generated an error in the first place. Additionally, if a problem occurs only on a customer site or takes many hours to reproduce, having to request a re-run with further tinkering with the tracing levels is undesirable.

    This idea is a novel way to abstract and structure trace events so that the trace subsystem can more easily observe patterns of behaviour in the product. Usual and unusual patterns can be recognized simply enough that the RAS (reliability, availability, serviceability) system can do this itself in 'real time', while the system is running, and can react to them while a problem is just beginning to occur. This allows improved data capture while the patient is running towards the cliff, rather than relying on a post mortem on the beach or asking the patient to repeat the failure (which is not always possible).

    The core of the invention is to use event abstraction to impose a tree-like structure on the stream of trace events (perhaps occurring on different threads or at different times) and thereby to recognize 'similar' events by their context within the tree. Once the system can recognize the overall 'location' of a trace event (not just its event ID) and has somewhere to store data on a series of such 'similar' events (such as the statistical likelihood of what the next event will be), it can build a model of 'usual' current and future events (perhaps with probabilities attached) based on the current trace tree context. The system can then react to the 'unusual', all without the programmer having to identify these 'unusual' events manually.
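
The idea in the paragraph above can be illustrated with a minimal sketch. It is not the disclosed implementation: the abbreviated text does not say how events are abstracted into a tree, so this sketch assumes hypothetical 'enter', 'exit' and 'point' event kinds, with enter/exit pairs defining the nesting. The names TraceNode, TraceTree, observe, min_samples and threshold are all invented for illustration. Each tree node counts which events have been seen directly within its context, and an event is treated as 'unusual' when it is rare in a context that has already been observed often.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """One position in the trace tree: an event ID seen in a given context."""
    event_id: str
    children: dict = field(default_factory=dict)            # event_id -> TraceNode
    next_counts: Counter = field(default_factory=Counter)   # events seen in this context

    def child(self, event_id):
        # Events with the same ID under the same parent share one node;
        # that shared node is what makes them 'similar' by context.
        if event_id not in self.children:
            self.children[event_id] = TraceNode(event_id)
        return self.children[event_id]


class TraceTree:
    """Assumed model: enter/exit events define nesting (not from the source)."""

    def __init__(self, min_samples=20, threshold=0.05):
        self.root = TraceNode("<root>")
        self.stack = [self.root]        # current context: path from the root
        self.min_samples = min_samples  # observations needed before judging
        self.threshold = threshold      # relative frequency below which an event is 'unusual'

    def observe(self, kind, event_id):
        """Record one event; return True if it is unusual in the current context."""
        context = self.stack[-1]
        label = f"{kind}:{event_id}"
        seen = sum(context.next_counts.values())
        prob = context.next_counts[label] / seen if seen else 0.0
        unusual = seen >= self.min_samples and prob < self.threshold
        context.next_counts[label] += 1
        if kind == "enter":
            self.stack.append(context.child(event_id))
        elif kind == "exit" and len(self.stack) > 1:
            self.stack.pop()
        return unusual


# Usage: learn a routine pattern, then feed an event that does not fit it.
tree = TraceTree()
for _ in range(50):
    tree.observe("enter", "request")
    tree.observe("point", "parse")
    tree.observe("exit", "request")
tree.observe("enter", "request")
print(tree.observe("point", "timeout"))
```

The sketch keeps a single context stack, so it covers one thread only; events arriving on several threads would need one stack per thread, or some other abstraction that the abbreviated text does not describe.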

    Building a 'tree' of trace events has been done before: performance groups sometimes summarize profiling data by compressing a trace file into a tree-like structure. However, processing events into a graph in real time, using this graph to recognize unusual events, and changing the event capturing system based on the model built so far is (we believe) novel.
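
One way that "changing the event capturing system" might look is sketched below. This is an assumption, not the mechanism the disclosure describes: recent events are held in a small in-memory buffer and only written out, along with a few following events, once the model flags something unusual. AdaptiveCapture, is_unusual, history and boost are hypothetical names, and is_unusual stands in for whatever model (such as a trace tree) makes the judgement.

```python
from collections import deque


class AdaptiveCapture:
    """Keeps trace output terse until the model flags something unusual."""

    def __init__(self, is_unusual, history=4, boost=3):
        self.is_unusual = is_unusual         # callable(event) -> bool, supplied by the model
        self.recent = deque(maxlen=history)  # latest events, held back in memory
        self.boost = boost                   # number of following events to capture in full
        self.remaining = 0                   # events left in the current detailed window
        self.captured = []                   # stand-in for the real trace or log sink

    def record(self, event):
        if self.is_unusual(event):
            # Write out the lead-up to the anomaly, then open a detailed window.
            self.captured.extend(self.recent)
            self.recent.clear()
            self.captured.append(event)
            self.remaining = self.boost
        elif self.remaining > 0:
            self.captured.append(event)
            self.remaining -= 1
        else:
            # Routine event: keep it only in the bounded buffer.
            self.recent.append(event)


# Usage: only the events around the flagged one reach the sink.
capture = AdaptiveCapture(lambda e: e == "timeout", history=2, boost=1)
for event in ["a", "b", "c", "timeout", "d", "e"]:
    capture.record(event)
print(capture.captured)  # ['b', 'c', 'timeout', 'd']
```

The bounded buffer keeps the cost of routine tracing low while still preserving the events that led up to the anomaly, which is the data a later re-run would otherwise have to reproduce.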

    A simple function based on the current trace tree context is described below, but further elaborations based on recent trace events can be imagined.

    For example,...