Method for a scalable systems monitoring solution

IP.com Disclosure Number: IPCOM000205846D
Publication Date: 2011-Apr-06
Document File: 3 page(s) / 77K

Publishing Venue

The IP.com Prior Art Database

Abstract

Monitoring system parameters such as CPU usage, memory usage, network bandwidth usage, and logged errors is an integral part of any systems management solution. Current monitoring solutions employ some form of the master-slave (client-server) principle, in which one master server gathers monitoring data from multiple client nodes. To monitor n nodes, the master server must contact all n nodes. Establishing communication with a node requires resources (memory, processing power) on the master server, so the more nodes there are, the more resources the master server needs. Moreover, as the number of nodes grows, it becomes increasingly complex for the master server to handle data arriving simultaneously from all of them. These resource requirements, together with the complexity of handling data from many nodes at once, limit the scalability of a monitoring product.

This is the abbreviated version, containing approximately 52% of the total text.

Fig-1: Master-Slave model for monitoring

The above figure depicts the existing master-slave model for monitoring. To monitor 5 nodes, the master server must contact all 5 of them.

Proposed here is a scalable method for monitoring hundreds or thousands of nodes in a cluster. According to the proposed method, every node in the cluster monitors itself as well as exactly two other nodes. Each node runs an agent, which contains the necessary logic for monitoring and for gathering data from its peers. Agent technology is well known in the art and is the basis for all monitoring solutions. To gather monitoring data for the entire cluster (that is, for all nodes), the master node contacts every third node in the cluster, starting from the first node.
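As a rough illustration of the polling schedule described above, the following Python sketch lists the nodes a master would contact. It assumes the nodes are numbered 1 through n and reads "every third node, starting from the first" literally as nodes 1, 4, 7, and so on. The function name master_contact_list and the 1-based numbering are assumptions made for illustration and do not appear in the disclosure.

```python
def master_contact_list(n, step=3):
    """Return the 1-based node numbers the master polls.

    Assumption (not from the disclosure): nodes are numbered 1..n and
    the master starts at node 1, then skips ahead by `step` each time.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return list(range(1, n + 1, step))


# For the 5-node cluster of Fig-1, the master polls 2 nodes instead of 5.
contacts = master_contact_list(5)
print(contacts)       # [1, 4]
print(len(contacts))  # 2
```

Under this reading, the number of connections the master opens grows as roughly n/3 rather than n.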

The invention is described further with the help of the following embodiment.

A monitoring agent is installed and running on every node in the cluster.

The agent on a given node selects its peers based on either the IP address or the host-name.

Normally, in a cluster or a data-center with thousands of nodes, the IP addresses or the host-names of the nodes follow some well-defined sequence. This embodiment of the invention makes use of that sequence.

For example, suppose there are 5 nodes (m=5) with IP addresses in the range 192.168.1.1 to 192.168.1.5. The nodes are then usually allocated IP addresses in sequence: 192.168.1.1, 192.168.1.2, 192.168.1.3, and so on. As per the invention, if a particular node's IP address is 192.168.1.2, then that node also monitors the nodes with IP addresses 192.168.1.1 and 192.168.1.3. In other words, the node with IP address 192.168.1.2 is the aggregator for nodes 192.168.1.1 and 192.168.1.3. The relevant logic is embedded in the agent code itself. As for the data communication between monitoring agents, it is well known in the art...
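The neighbor rule in the IP address example can be sketched in Python with the standard ipaddress module. This is a minimal sketch, not the agent logic from the disclosure: the function name peers_for is hypothetical, and the handling of the first and last addresses in the range (they simply get fewer peers) is an assumption, since the text available here does not say what happens at the ends of the sequence.

```python
import ipaddress


def peers_for(node_ip, first_ip, last_ip):
    """Return the IP addresses immediately before and after node_ip.

    Assumption (not from the disclosure): neighbors falling outside
    [first_ip, last_ip] are dropped, so end nodes have a single peer.
    """
    node = ipaddress.ip_address(node_ip)
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    if not first <= node <= last:
        raise ValueError(f"{node_ip} is outside the cluster range")
    # Address objects support integer arithmetic, so the previous and
    # next addresses in the sequence are node - 1 and node + 1.
    candidates = [node - 1, node + 1]
    return [str(ip) for ip in candidates if first <= ip <= last]


# The node at 192.168.1.2 aggregates data for its two neighbors.
print(peers_for("192.168.1.2", "192.168.1.1", "192.168.1.5"))
# ['192.168.1.1', '192.168.1.3']
```

A host-name based variant would need a comparable ordering on names (for example, a numeric suffix), which is not specified in the text available here.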