Performance bottleneck detection with regression analysis for web-based applications
Original Publication Date: 2005-Apr-08
Included in the Prior Art Database: 2005-Apr-08
Disclosed are a system and method that monitor component response times and resource utilization and use regression methods to determine the performance bottleneck in a complex web-based application. The method computes a weighted contribution from each utilization factor and predicts component response time using linear regression analysis. The combined weight factors can be used to determine the performance bottleneck and recommend an optimization solution for the entire application.
The problem
The performance of a complex system depends on many factors. In this type of system, the amount of available data may also be overwhelming. Relations between the observable data and system performance are inherent in the system. If these relations can be uncovered, system anomalies can be responded to more effectively, thereby improving productivity and reducing service interruptions.
Web applications are typical manifestations of this type of problem: the response time to a request from a browser may be the aggregate of the processing times of the web server, application server, database queries and various backend systems. The complexity of the business logic in web applications makes it impossible to use a static formula to represent the behavior of the system; therefore, statistical models are used to determine the root cause of a problem based on collected performance data.
The assumption is that system response time is a linear or close-to-linear function of one or more independent variables plus an error term that accounts for all other factors not captured by the identified variables. The assumption is reasonable in most cases because time itself is linear. In cases where observable data are not time based (e.g. memory usage), an error term attempts to capture the effect of the higher order terms as described below.
This method attempts to uncover the change in system response time in relation to the change in the independent parameters:
R = a1 * v1 + a2 * v2 + a3 * v3 + ... + ak * vk + E
Where:
R: response time of the transaction under consideration (the dependent variable)
v1, v2, v3, ..., vk: measurable independent variables, which could be elapsed times
E: error term
The coefficients are to be determined (they indicate how a change in these independent variables affects values of the dependent variable):
a1, a2, a3, ..., ak
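As a minimal sketch of how the model above is evaluated, the Python snippet below computes a predicted response time from a set of coefficients and measured variables, along with the size of each term. The variable names and all numeric values are hypothetical and chosen only for illustration. Treating a_i * v_i as a variable's contribution is one reading of the weighted contribution mentioned in the abstract, not something the disclosure spells out.

```python
def predict_response_time(coefficients, variables, error=0.0):
    """Evaluate R = a1*v1 + a2*v2 + ... + ak*vk + E for one transaction."""
    if len(coefficients) != len(variables):
        raise ValueError("need exactly one coefficient per variable")
    return sum(a * v for a, v in zip(coefficients, variables)) + error


def term_contributions(names, coefficients, variables):
    """Return each term a_i * v_i keyed by its variable name."""
    return {n: a * v for n, a, v in zip(names, coefficients, variables)}


# Hypothetical names and values, for illustration only.
names = ["web_server_s", "app_server_s", "db_query_s"]
coefficients = [1.0, 1.5, 2.0]
variables = [0.5, 1.0, 2.0]  # assumed elapsed times in seconds

predicted = predict_response_time(coefficients, variables)
contributions = term_contributions(names, coefficients, variables)

# Assumption: the term with the largest share is the bottleneck candidate.
largest = max(contributions, key=contributions.get)
print(predicted, largest)
```

In this sketch the largest term is only a candidate for the bottleneck; the ranking is meaningful only once the coefficients have been estimated from real measurements.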
For a web application, the variables can be independent measured data collected from various components of the system, i.e. the web server, application server, database and remote systems (such as SAP). Regression analysis is used because the values of the coefficients need not be estimated in advance; the regression analysis yields these values.
The next sections detail the procedure for identifying independent variables, collecting data, and performing the analysis.
The following steps are performed:
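The coefficient estimation can be sketched with ordinary least squares. The Python example below is an illustrative sketch under stated assumptions, not the disclosed implementation: it uses only the standard library, solves the normal equations directly, fits no intercept (matching the form of the equation for R), and runs on made-up, noise-free observations. The function names and all data values are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augment the matrix so row operations update both sides together.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variables are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_coefficients(samples, response_times):
    """Least-squares estimate of a1..ak from (X^T X) a = X^T R."""
    k = len(samples[0])
    xtx = [[sum(s[i] * s[j] for s in samples) for j in range(k)] for i in range(k)]
    xtr = [sum(s[i] * r for s, r in zip(samples, response_times)) for i in range(k)]
    return solve_linear_system(xtx, xtr)


# Hypothetical observations: each row holds (v1, v2) for one transaction.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
# Generated as R = 2*v1 + 3*v2 with no noise, so the fit should land near 2 and 3.
response_times = [8.0, 7.0, 18.0, 17.0]

coefficients = fit_coefficients(samples, response_times)
# The residual of each observation plays the role of the error term E.
residuals = [r - sum(a * v for a, v in zip(coefficients, s))
             for s, r in zip(samples, response_times)]
print([round(a, 6) for a in coefficients])
```

With real log data the residuals will not be zero, and a numerically more robust solver (for example a QR-based least-squares routine from a numerical library) would usually be preferable to the normal equations.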
Determine the driving factors and quantify the influences of each variable. Usually this involves analysis of the transaction path and identifying the service time taken within each component.
Collect data from various logs on the system. The logs include application logs (for each piece of software in the software stack), resource utilization logs (by operating system), etc. The data extracted may include application based event elapsed tim...