
System and method to sense and isolate the problematic node in distributed performance test environment

IP.com Disclosure Number: IPCOM000229536D
Publication Date: 2013-Aug-06
Document File: 8 page(s) / 273K

Publishing Venue

The IP.com Prior Art Database

Abstract

A network sniffer and a linear function are used to sense and isolate the problematic node in a distributed performance test environment. Assume there is an approximately linear relationship between the events sent by the client and the network packets sent by a given node (a server or an intermediate node). Increase the workload until a node's network packets violate the linear function; that node indicates the bottleneck. The traditional ways of monitoring a distributed system, either logging in to each system to check its logs or installing a monitoring agent to gather performance results, both take considerable effort. This article introduces a methodology that uses network traffic and a linear function to sense and isolate the bottleneck node. It is quicker, automatic, takes less effort, and consumes fewer system resources.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 44% of the total text.



This invention provides a system and method to sense and isolate the problematic node in a distributed performance test environment.

Today, to provide high-performance and highly reliable services, more and more IT systems are designed as distributed systems. A node that exceeds its capacity under high workload becomes the bottleneck of the whole system. During performance testing, people want to find that bottleneck quickly.

Traditionally, there are two ways to do this. One is to log in to each node and check its log files; in a large distributed system this is heavy manual work that takes a lot of time and effort. The other is to install a monitoring agent on the distributed system, which quickly provides detailed information, including indications of the bottleneck. Unfortunately, a monitoring product that must be embedded in or integrated with the server instance consumes system resources and reduces the performance of the whole system, which noticeably distorts the performance test results.

We therefore need a system and method that can sense and isolate the bottleneck node in the whole system quickly, automatically, and with less system resource consumption.

A distributed performance test system contains many nodes, each with its own input and output network packets. The load generator (client) sends out events (that is, requests).

We assume that there is an approximately linear relationship between the events sent out from the client and the network packets sent out from a given node, for example y = bx + a, where x represents the events sent out from the client and y represents the network packets sent out from that node. We also assume that, while the whole system is healthy, this linear relationship does not change significantly between low and high workload.
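As an illustration of how the parameters of such a line might be estimated from n low-load samples (x_i, y_i), one common choice is ordinary least squares. The disclosure does not name a fitting method, so least squares, the relative-deviation measure d, and the threshold \tau below are all assumptions made for this sketch:

```latex
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
\[
d(x, y) = \frac{\lvert y - (b x + a) \rvert}{\lvert b x + a \rvert},
\qquad
\text{node flagged when } d(x, y) > \tau
\]
```

Here \bar{x} and \bar{y} are the sample means of the client events and of the node's outgoing traffic, respectively.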

When the workload is low and the system is healthy, we sniff and capture the network packets on each node (the sniffer can be installed on other servers, so it has little influence on the systems under test), build a linear function for each node, and compute its parameters "a" and "b". Next, we increase the workload until it exceeds the system capacity and errors occur. At that point the previous linear relationship no longer holds well on some nodes, and many nodes may violate the linear function seriously; the first node to violate it is probably the bottleneck of the whole system. This approach finds the bottleneck more quickly, automatically, and with less system resource consumption.
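The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the disclosure's implementation: the function names `fit_line` and `first_violating_node`, the input layout, the use of ordinary least squares, and the 20% relative tolerance are all hypothetical choices. The counts are assumed to come from a sniffer that has already tallied client events and each node's outgoing network packets (the disclosure's "packages") per measurement interval.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = b*x + a (fitting method is an assumption).

    Returns (a, b).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct workload levels")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def first_violating_node(
    baseline: Dict[str, List[Tuple[float, float]]],
    ramp: List[Tuple[float, Dict[str, float]]],
    tolerance: float = 0.2,  # hypothetical threshold, not from the source
) -> Optional[str]:
    """Return the first node whose traffic departs from its baseline line.

    baseline: node -> [(client_events, node_packets), ...] at low workload.
    ramp: [(client_events, {node: node_packets}), ...] in increasing load.
    """
    # Step 1: fit one line per node while the system is healthy.
    models = {
        node: fit_line([x for x, _ in samples], [y for _, y in samples])
        for node, samples in baseline.items()
    }
    # Step 2: walk up the workload ramp and stop at the first violation.
    for events, observed in ramp:
        worst_node, worst_dev = None, tolerance
        for node, packets in observed.items():
            a, b = models[node]
            predicted = b * events + a
            # Relative deviation; the floor of 1.0 avoids division by zero.
            dev = abs(packets - predicted) / max(abs(predicted), 1.0)
            if dev > worst_dev:
                worst_node, worst_dev = node, dev
        if worst_node is not None:
            return worst_node
    return None
```

If several nodes cross the tolerance in the same measurement step, this sketch reports the one with the largest relative deviation; that tie-break is a design choice of the example, since the source only says that the first node to violate the function is probably the bottleneck.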

[The figure 1 shows the topology of the sample system, illustrates the relationship between the events that is sent out from Load generat...