Self-Activate Back Tracing

IP.com Disclosure Number: IPCOM000125654D
Original Publication Date: 2005-Jun-10
Included in the Prior Art Database: 2005-Jun-10
Document File: 3 page(s) / 237K

Publishing Venue

IBM

Abstract

Disclosed is a device for recording system information, generally called a log or trace. To record as much detail as possible without impacting system performance, it employs two threshold levels, three groups of data, and a buffer that holds data temporarily. The method optimizes trace data recording and so shortens the time needed to solve a problem.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 56% of the total text.


Disclosed is a device for recording system information, generally called a log or trace. Log and trace data record system behavior and are used to determine system problems. Usually each piece of trace data has an importance, and trace data are divided into two groups by a threshold level. One is the "recorded group": as shown in Figure 1, all trace data whose importance is higher than the level fall into this group and are saved to permanent storage, such as a file. The other is the "discarded group": the rest of the trace data fall into this group and are discarded.
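The single-threshold scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify`, the sample importance values, and the strict "greater than" comparison are assumptions, since the text does not say how data whose importance equals the level are handled.

```python
def classify(importances, threshold):
    """Split trace data into (recorded, discarded) using one threshold.

    Assumption: only data strictly above the threshold are recorded.
    """
    recorded = [i for i in importances if i > threshold]
    discarded = [i for i in importances if i <= threshold]
    return recorded, discarded


# Sample importance values (illustrative), classified at threshold level 5.
trace = [7, 4, 6, 7, 6, 6, 4, 1, 4, 1, 1, 7]
recorded, discarded = classify(trace, threshold=5)
print(recorded)   # [7, 6, 7, 6, 6, 7]
print(discarded)  # [4, 4, 1, 4, 1, 1]
```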

[Figure 1 image suppressed. Labels: Trace data (number is importance); Classify; Threshold level = 5; Recorded group; Discarded group.]

Figure 1: Current Trace data categorization

[Figure 2 image suppressed. Labels: Threshold level; time; Normal operation; System Failure; Time period of reproduction and determination of the problem.]

Using the level, the system can record only important data and discard unnecessary data. Raising the level means that only a few very important data are recorded. This needs little storage space and has a low impact on system performance, but the smaller amount of data makes problem determination difficult. Lowering the level records more of the data required to understand what happened in the system when the problem occurred, but it needs a large amount of storage space and has a high impact on system performance.
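To make the trade-off concrete, the short sketch below counts how many entries of one sample trace would be saved at several threshold levels. The helper `recorded_count`, the sample values, and the strict comparison are illustrative assumptions, not part of the disclosure.

```python
def recorded_count(importances, threshold):
    """Count trace data that would be saved (assumes strictly above threshold)."""
    return sum(1 for i in importances if i > threshold)


trace = [7, 4, 6, 7, 6, 6, 4, 1, 4, 1, 1, 7]
counts = {level: recorded_count(trace, level) for level in (6, 5, 3, 0)}
for level, count in counts.items():
    print(f"threshold {level}: {count} of {len(trace)} recorded")
```

At the highest level only 3 of the 12 entries are kept, while at level 0 all 12 are written, which is the storage and performance cost the paragraph above describes.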

In a typical production environment, the level is set high for performance reasons. As Figure 2 shows, if a system failure occurs, the amount of recorded trace data is small, so the problem may have to be reproduced with a low threshold level. In other words, the system failure must happen again before the problem can be determined, analyzed, and solved. This is the issue this disclosure addresses.

Figure 2: System failure and reproduction activity.

The goals are as follows:

- In normal operation, unnecessary trace data must be discarded.

- When a system failure occurs, all of the data from right before the failure are recorded.

To solve the issue, this method employs two threshold levels and divides trace data into three groups. They are the "Critical group", the "Normal operation group" and "Detail...
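Because the text is cut off here, the exact handling of each group is not available. The sketch below is one plausible reading, based only on the abstract (two threshold levels, three groups, a temporary buffer) and the goals stated above. It assumes that critical data trigger saving of the buffered detail, that normal-operation data are saved immediately, and that detail data are held in a bounded buffer and otherwise lost. The class name `BackTracer`, its parameters, the sequence numbers, and the sample messages are all hypothetical.

```python
from collections import deque


class BackTracer:
    """Hypothetical two-threshold tracer with a temporary buffer."""

    def __init__(self, critical_level, normal_level, buffer_size):
        if normal_level > critical_level:
            raise ValueError("normal_level must not exceed critical_level")
        self.critical_level = critical_level
        self.normal_level = normal_level
        # Bounded buffer: the oldest detail entries drop out automatically.
        self.buffer = deque(maxlen=buffer_size)
        self.storage = []  # stands in for permanent storage such as a file
        self._seq = 0

    def trace(self, importance, message):
        self._seq += 1
        entry = (self._seq, importance, message)
        if importance > self.critical_level:
            # Assumed critical group: save buffered detail, then this entry.
            self.storage.extend(self.buffer)
            self.buffer.clear()
            self.storage.append(entry)
        elif importance > self.normal_level:
            # Assumed normal operation group: saved immediately.
            self.storage.append(entry)
        else:
            # Assumed detail group: kept only temporarily.
            self.buffer.append(entry)


tracer = BackTracer(critical_level=8, normal_level=5, buffer_size=3)
events = [
    (6, "request received"),
    (2, "parsed header"),
    (1, "cache lookup"),
    (3, "cache miss"),
    (4, "opened connection"),
    (9, "connection reset"),
    (2, "retry scheduled"),
]
for importance, message in events:
    tracer.trace(importance, message)

for entry in tracer.storage:
    print(entry)
```

In this run the critical entry (importance 9) causes the three most recent detail entries to be saved just ahead of it, while the older detail entry has already fallen out of the buffer. Each entry carries a sequence number because buffered entries may reach storage later than newer normal-operation entries, so a reader can restore time order by sorting.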