A system resource usage tracking tool in a cluster environment
Original Publication Date: 2002-Oct-12
Included in the Prior Art Database: 2003-Jun-21
A tool is disclosed that keeps track of cluster computer system resource usage history in a quantitative way. When monitoring or testing a cluster environment, such a history is very important for analyzing the workload balancing of the system software and the effectiveness of test workloads. For example, on an IBM SP (Scalable POWERparallel®) computer system, we can check the instantaneous system status, or check the system error report entries for major system errors, by using AIX® (Advanced Interactive eXecutive) commands. However, system administrators and testers are also interested in how the system resources, such as CPU, memory, disk I/O and network, have been used while the systems are running unattended. The available tools cannot provide detailed data about system resource usage history. This invention is intended to provide a tool that keeps track of the usage history of these resources in a quantitative manner.
This invention can be implemented by using a client/server model. Any node in the cluster environment can be selected as the central console node, which controls starting, stopping and viewing the resource usage monitoring. From the central console node, monitoring of any resource of interest is started by starting a background process on the node(s) of interest. The usage of the selected resources is recorded on each node as a function of time. Whenever necessary, the data stored on an individual node is copied over to the central console node and viewed there in a graphical manner. The monitoring process can also be stopped from the central console node.
Since the system resource usage history data can be collected without much human intervention, the cost of administering or testing the system can be significantly reduced.
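The per-node recording and the copy-to-console step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the function names `record_usage` and `merge_node_logs`, the CSV layout, and the `read_sample` callable (a stand-in for whatever actually measures CPU, memory, disk I/O and network on a node) are all hypothetical.

```python
import csv
import io
import time
from typing import Callable, Dict, List, TextIO, Tuple


def record_usage(
    read_sample: Callable[[], Dict[str, float]],
    out: TextIO,
    samples: int,
    interval_s: float = 0.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Node side: append `samples` timestamped usage rows to `out`.

    `read_sample` is a hypothetical sampler returning the keys
    "cpu", "mem", "disk_io" and "net" (an assumed layout).
    """
    writer = csv.writer(out)
    for i in range(samples):
        s = read_sample()
        writer.writerow(
            [f"{clock():.3f}", s["cpu"], s["mem"], s["disk_io"], s["net"]]
        )
        if i < samples - 1:
            sleep(interval_s)  # wait before taking the next sample


def merge_node_logs(
    logs: Dict[str, str],
) -> List[Tuple[float, str, float, float, float, float]]:
    """Console side: combine per-node CSV text into rows sorted by time."""
    rows = []
    for node, text in logs.items():
        for ts, cpu, mem, disk, net in csv.reader(io.StringIO(text)):
            rows.append(
                (float(ts), node, float(cpu), float(mem), float(disk), float(net))
            )
    rows.sort()  # orders by timestamp first, then by node name
    return rows
```

In a real deployment the sampler might, for example, parse the output of system commands on each node, and the console would fetch each node's log file before merging and plotting; both of those steps are left out of this sketch.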
The collected data can be used to analyze how the CPU, memory, disk I/O, network and other resources have been used, how effective the workload distribution software is, or how effective test workloads are in stressing the system resources. This tool may even be able to detect intrusion by hackers by comparing an unusual system resource usage pattern against the normal one.
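The disclosure does not say how normal and unusual usage patterns would be compared. One simple possibility, shown here purely as an assumption, is to flag samples that lie more than a chosen number of standard deviations away from the mean of a baseline recording. The function name `unusual_samples` and the threshold `k` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def unusual_samples(
    baseline: Sequence[float],
    observed: Sequence[float],
    k: float = 3.0,
) -> List[int]:
    """Return indices of `observed` values far from the baseline mean.

    Assumed rule (not from the disclosure): a value is unusual when it
    deviates from the baseline mean by more than k standard deviations.
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as unusual.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) > k * sigma]
```

For instance, with a baseline of CPU percentages hovering around 10, an observed sample of 95 would be flagged while samples of 10 to 12 would not. A practical detector would likely need per-resource baselines and some allowance for time-of-day effects.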