An effective method of detecting abnormally slow response time situations

IP.com Disclosure Number: IPCOM000127579D
Original Publication Date: 2005-Sep-02
Included in the Prior Art Database: 2005-Sep-02
Document File: 3 page(s) / 34K

Publishing Venue

IBM

Abstract

A method is disclosed that distinguishes abnormally slow response time situations from the slow responses that occur many times even during usual operations. An abnormally slow response time situation indicates that some abnormal event has happened. The method has two major stages. In the preparation stage, a set of statistical values on response times is gathered during usual operations. In the execution stage, the method can be used both for real-time monitoring and for batch analysis of transaction response time logs. In real-time monitoring, a program monitors the response times of transactions and evaluates them against the prepared values so that it can distinguish abnormally slow response time situations from ordinary ones. In batch analysis, a program analyzes the large volume of response time data for a day, detects the times when abnormally slow response time situations happened, and evaluates whether the response times of the transactions for that day were abnormal from a stochastic viewpoint.

This is the abbreviated version, containing approximately 29% of the total text.

A method is disclosed that distinguishes abnormally slow response time situations from the slow responses that occur many times even during usual operations. The method has two major stages. The steps of the preparation stage are summarized as follows.
(1) Select transaction IDs that represent the target system and gather their response time data for several usual days.
(2) Determine threshold values, for example 95th-percentile response times, that are regarded as the boundary of slow response time, for each selected transaction ID and for each time interval that has different operational characteristics.
(3) Determine how many of the transactions processed in a monitoring interval must be slower than the threshold value determined in step (2) for the situation to be considered very rare, or abnormal, from a stochastic viewpoint.
(4) Determine how many intervals in a day must be judged abnormal by the criterion of step (3) for the day to be considered very rare, or abnormal, from a stochastic viewpoint.
(5) Adjust the values determined in step (3) by analyzing actual cases to investigate how many of the processed transactions are slower than the threshold value for each time interval and transaction ID.
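
The preparation steps above can be sketched in code. The following is a minimal illustration, not the disclosed implementation: the function names, the nearest-rank percentile rule, and the significance level `alpha` are all hypothetical, and the "stochastic viewpoint" of steps (3) and (4) is modeled here with a binomial distribution under the assumption that each transaction independently exceeds the 95th-percentile threshold with probability 0.05. The available text does not state which distribution the method uses.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile (an assumed rule; the source does not specify one)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def build_thresholds(samples, pct=95):
    """Step (2): one threshold per (transaction ID, hour).

    samples: iterable of (txn_id, hour, response_time) gathered on usual days.
    """
    grouped = defaultdict(list)
    for txn_id, hour, response_time in samples:
        grouped[(txn_id, hour)].append(response_time)
    return {key: percentile(times, pct) for key, times in grouped.items()}


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rare_count(n, p, alpha):
    """Steps (3) and (4): smallest k with P(X >= k) <= alpha.

    Assumes independent trials, which is a modeling assumption of this sketch.
    """
    for k in range(n + 2):  # k = n + 1 always satisfies the condition (tail is 0)
        if binom_tail(n, k, p) <= alpha:
            return k
```

For example, `rare_count(20, 0.05, 0.01)` gives the number of slow transactions out of 20 processed that would occur with probability at most 1% under this model. The same function could serve step (4) by passing the number of intervals in a day and the per-interval false-alarm probability. Step (5) would then adjust these computed counts against observed cases.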

In the monitoring or analyzing stage, the following steps are performed.
(6) Compare the response times of the selected transactions with the threshold values determined in step (2) for the corresponding execution time interval and transaction ID, and classify each transaction as normal or slow.
(7) Compare the count of slow transactions with the threshold count determined in step (5) for the corresponding processed transaction count to check whether an abnormally slow response time situation happened.
(8) Compare the count of such abnormally slow response time situations for the day with the value determined in step (4) to check whether some abnormal event happened.
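
As a rough illustration of steps (6) to (8), the sketch below processes one day of records in batch. All names are hypothetical. `thresholds` stands for the per-interval values from step (2), `count_limit` for a lookup from the processed transaction count to the adjusted threshold count from step (5), and `interval_limit` for the value from step (4). Treating a count equal to the limit as abnormal (`>=`) is an assumption of this sketch; the text does not say whether the comparison is strict.

```python
from collections import Counter


def find_abnormal_intervals(records, thresholds, count_limit):
    """Steps (6) and (7) for one day of data.

    records: iterable of (txn_id, hour, response_time).
    thresholds: dict mapping (txn_id, hour) to the slow-response boundary.
    count_limit: callable mapping a processed count to the slow-count limit.
    Returns the sorted (txn_id, hour) keys judged abnormally slow.
    """
    processed = Counter()
    slow = Counter()
    for txn_id, hour, response_time in records:
        key = (txn_id, hour)
        processed[key] += 1
        if response_time > thresholds[key]:  # step (6): slow vs. normal
            slow[key] += 1
    # Step (7): compare the slow count with the limit paired with the processed count.
    return sorted(key for key, n in processed.items() if slow[key] >= count_limit(n))


def day_is_abnormal(abnormal_keys, interval_limit):
    """Step (8): too many abnormal intervals in one day suggests an abnormal event."""
    return len(abnormal_keys) >= interval_limit
```

A real-time monitor would apply the same comparisons incrementally, evaluating each monitoring interval as it closes rather than after the whole day has been collected.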

The following is the detail of each step. The above brief descriptions are repeated for convenience.
(1) Select transaction IDs that represent the target system and gather their response time data for several usual days.

As a rule of thumb, select transactions that are processed more than one thousand times a day.

Select transactions whose peak times differ from one another as much as possible.
(2) Determine threshold values, for example 95th-percentile response times, that are regarded as the boundary of slow response time, for each selected transaction ID and for each time interval that has different operational characteristics.

The 95th-percentile response times, which are only an example, are used in a later evaluation step together with the count of slower transactions. Thresholds are set per time interval rather than as a single daily value because response time distributions differ greatly depending on the other processes running concurrently. Using hourly interv...