Dynamic Workload Assignment for Network Monitoring

IP.com Disclosure Number: IPCOM000125488D
Original Publication Date: 2005-Jun-03
Included in the Prior Art Database: 2005-Jun-03
Document File: 4 page(s) / 104K

Publishing Venue

IBM

Abstract

Business processes are indirectly supported by network resources, so network monitoring is a fundamental discipline in business process management. As the technology supporting the business processes evolves, there is a need to represent far more complex dependencies between the processes and the monitored resources in the network. As a result, a much larger number of network resources must be monitored. With the advent of autonomic provisioning for hardware and software, the number of resources to be monitored is not only larger but also much more volatile. The challenge of dynamically allocating network monitoring resources in a TCP/IP network must be addressed in order to cope with a large and dynamic installed base of network devices. An initial solution to this problem is the static, distributed allocation of resources, in which the network monitoring resources are assigned to certain hosts at configuration time. These hosts become responsible for monitoring the network areas assigned to them. The drawbacks are extended planning periods and the need to overprovision resources to cope with spikes and fluctuations both in the number of network devices to be monitored and in the activity level of these devices.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 33% of the total text.

The core idea of the invention is to shift network monitoring workload dynamically across monitoring engines according to three factors: (1) spare processing capacity at the monitoring engines, (2) monitoring cost for a set of network devices, and (3) network 'distance' between the monitoring engines and the resources to be monitored.
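As an illustration only, the three factors could be combined as in the following minimal sketch. The disclosure does not give a selection formula; the rule used here (discard engines without enough spare capacity, then prefer the nearest one) and the names Candidate, pick_monitor and monitoring_cost are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One monitoring engine considered for a set of network devices."""
    name: str
    spare_capacity: float  # fraction of processing capacity still free, 0.0 to 1.0
    latency_ms: float      # network 'distance' to the devices, in milliseconds


def pick_monitor(candidates, monitoring_cost):
    """Return the name of the preferred engine, or None if none can take the work.

    monitoring_cost is the fraction of one engine's capacity that the device
    set is expected to consume (a hypothetical unit chosen for this sketch).
    """
    # Factors 1 and 2: keep only engines whose spare capacity covers the cost.
    eligible = [c for c in candidates if c.spare_capacity >= monitoring_cost]
    if not eligible:
        return None
    # Factor 3: prefer the nearest engine; break ties by most spare capacity.
    best = min(eligible, key=lambda c: (c.latency_ms, -c.spare_capacity))
    return best.name


engines = [
    Candidate("m1", spare_capacity=0.10, latency_ms=5.0),
    Candidate("m2", spare_capacity=0.50, latency_ms=12.0),
    Candidate("m3", spare_capacity=0.40, latency_ms=30.0),
]
choice = pick_monitor(engines, monitoring_cost=0.30)
```

Here m1 is the closest engine but lacks capacity for a cost of 0.30, so the sketch selects m2, the nearest engine that can absorb the work.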

Network distance on an IP network can be measured in several ways: in terms of hops, which is the number of nodes between two devices; or in terms of latency, which is the time elapsed between an ICMP PING request sent from one node and the arrival of the subsequent response from the second node. For the purpose of this invention, we have chosen latency.

The devices on an IP network can be discovered and monitored in several ways: through IP polling, through router queries, or through monitoring of SNMP traps.

This invention utilizes two main building blocks:
a) A Monitoring Engine, or Monitor, which can poll the network devices connected to an IP router to determine their status and eventually notify a management application, such as a network topology server or an event management solution.
b) A Provisioning Service component, which can manage the work assignment to Monitoring Engines and control the lifecycle of Monitoring Engines to reflect the varying loads in the network. For instance, the Provisioning Service may start a new Monitoring Engine within a host in a clustered environment if a workload spike exceeds the capacity of the current Monitoring Engines on the network.
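The lifecycle control described in (b) might look like the following minimal sketch. It assumes, purely for illustration, that capacity is expressed as the number of devices one Monitoring Engine can poll and that engines are launched through a caller-supplied start_engine function; neither detail comes from the disclosure.

```python
class ProvisioningService:
    """Hypothetical sketch: starts Monitoring Engines when the load exceeds capacity."""

    def __init__(self, engine_capacity, start_engine):
        if engine_capacity <= 0:
            raise ValueError("engine_capacity must be positive")
        self.engine_capacity = engine_capacity  # devices per engine (assumed unit)
        self.start_engine = start_engine        # callable that launches an engine, returns its id
        self.engines = []                       # ids of the engines currently running

    def ensure_capacity(self, device_count):
        """Start engines until total capacity covers device_count.

        Returns the ids of the engines started by this call.
        """
        started = []
        while len(self.engines) * self.engine_capacity < device_count:
            engine_id = self.start_engine()
            self.engines.append(engine_id)
            started.append(engine_id)
        return started


def make_launcher():
    """Stand-in for starting an engine on a cluster host; yields engine-1, engine-2, ..."""
    counter = {"n": 0}

    def launch():
        counter["n"] += 1
        return f"engine-{counter['n']}"

    return launch


service = ProvisioningService(engine_capacity=100, start_engine=make_launcher())
first = service.ensure_capacity(250)   # three engines are needed for 250 devices
second = service.ensure_capacity(250)  # capacity already sufficient, nothing started
spike = service.ensure_capacity(301)   # a workload spike triggers one more engine
```

The sketch only scales up; stopping idle engines, which the lifecycle control would also cover, is left out.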

Assume an IP network arranged as in Figure 1 and an initial set of three network monitoring agents.

Each Monitor periodically informs the Provisioning Service component about the following metrics:
a) Average network latency to network devices within a sub-network (this should be done through sampling of a few addresses instead of using an exhaustive scanning process)
b) Number of network devices under each network tree assigned to it for monitoring
c) Total elapsed time to monitor all network devices under each network tree assigned to it for monitoring
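The sampling approach suggested for metric (a) can be sketched as follows. This is an assumption-laden illustration: the function name sample_average_latency, the default sample size of 5, and the probe callback (which stands in for an ICMP PING round trip and returns milliseconds, or None when there is no answer) are all hypothetical.

```python
import ipaddress
import random
import statistics


def sample_average_latency(subnet, probe, sample_size=5, rng=None):
    """Average latency over a random sample of host addresses in subnet.

    probe(address) is supplied by the caller and returns the round-trip time
    in milliseconds, or None when the address does not answer.
    Returns None when no sampled address answers.
    """
    rng = rng or random.Random()
    # Listing every host is acceptable for small sub-networks; a very large
    # range would need index-based sampling instead.
    hosts = list(ipaddress.ip_network(subnet).hosts())
    sample = rng.sample(hosts, min(sample_size, len(hosts)))
    timings = [t for t in (probe(str(addr)) for addr in sample) if t is not None]
    return statistics.mean(timings) if timings else None


probed = []


def fake_probe(address):
    """Test double for an ICMP PING: records the address, reports 10 ms."""
    probed.append(address)
    return 10.0


average = sample_average_latency("192.168.1.0/24", fake_probe, sample_size=5)
```

Only five of the 254 host addresses are probed, which keeps the measurement cheap compared with an exhaustive scan.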

Based on system configuration and runtime information provided by the Monitoring Engines, the Provisioning Service holds the following data tables during its runtime:
a) Network latency between Monitoring Engine and network tree
b) CPU utilization on each Monitoring Engine
c) Number of network devices for each network tree assigned to a Monitoring Engine
d) Total processing time for a network tree under a Monitoring Engine
e) Maximum polling duration for each network tree. This is a configuration value; when a Monitoring Engine exceeds it for a network tree, the network tree should be split up and reassigned among the remaining Monitors
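One possible in-memory layout for tables (a) to (e), together with the check implied by table (e), is sketched below. The field names, the key shapes and the trees_to_split helper are assumptions for illustration; how a tree is actually split and reassigned is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class ProvisioningTables:
    """Hypothetical layout of the data held by the Provisioning Service."""
    latency_ms: dict = field(default_factory=dict)         # (a) (engine, tree) -> latency
    cpu_utilization: dict = field(default_factory=dict)    # (b) engine -> fraction 0.0 to 1.0
    device_count: dict = field(default_factory=dict)       # (c) (engine, tree) -> devices
    processing_time_s: dict = field(default_factory=dict)  # (d) (engine, tree) -> seconds
    max_polling_s: dict = field(default_factory=dict)      # (e) tree -> configured limit

    def trees_to_split(self):
        """Return sorted (engine, tree) pairs whose processing time exceeds the limit."""
        return sorted(
            key
            for key, elapsed in self.processing_time_s.items()
            if key[1] in self.max_polling_s and elapsed > self.max_polling_s[key[1]]
        )


tables = ProvisioningTables()
tables.max_polling_s = {"10.0.1.0/24": 60.0, "10.0.2.0/24": 60.0}
tables.processing_time_s = {
    ("m1", "10.0.1.0/24"): 95.0,  # over the 60 s limit
    ("m2", "10.0.2.0/24"): 40.0,  # within the limit
}
overdue = tables.trees_to_split()
```

In this example only the tree polled by m1 exceeds its configured maximum, so it is the only candidate for splitting.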

The Provisioning Service component uses tables "a" and "b" to determine whether work reallocation should involve reassignment of a network tree to a new Monitor (CPU utilization exceeds a certain threshold) or reassig...