Method to Manage Analytics Data Collection of Large System Workloads

IP.com Disclosure Number: IPCOM000241913D
Publication Date: 2015-Jun-08
Document File: 3 page(s) / 59K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is an analysis scheme for smarter software instrumentation data collection, implemented to help resolve potential customer workload issues. The method includes novel approaches for both data collection and data analysis.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 50% of the total text.

Increasingly complex customer workloads demand greater system efficiency and scalability. Large operating systems (OS) sometimes encounter resource usage imbalances that contribute to performance problems while running customer workloads. Such problems appear as resource shortages or contention, slow response times, degraded performance, input/output (I/O) issues, queue or backlog growth, and other unhealthy conditions within the OS.

The cause of a workload problem may be difficult to determine. A problem in one area of the system may cause symptoms in another area. Some problems are transient in nature, which makes them difficult to capture at the moment they occur.

Existing analytics tools such as trace and system dumps that collect data for analysis may not simultaneously capture the metrics of all the software components to provide a comprehensive view of the problem. A smarter data collection and analysis scheme is needed.

The novel contribution is an analysis scheme for smarter software instrumentation data collection, implemented to help resolve potential customer workload issues.

Software components continuously interact with other components and compete for the same finite hardware resources. Resource shortages or bottlenecks may occur during the workload. Statistical averages taken over long intervals (e.g., 15 minutes) normally do not provide adequate workload-specific detail for conditions that are transient in nature.

Instrumenting multiple software components to provide synchronized analytics data at relatively short periodic intervals yields finer-grained, component-level statistics for the workload.
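The synchronized, short-interval collection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `collectors` mapping, the snapshot layout, and the injectable `clock` and `sleep` parameters are all assumptions introduced here.

```python
import time
from typing import Callable, Dict, List

# Assumption: each instrumented component exposes a zero-argument callable
# that returns its current metrics as a name -> value mapping.
Collector = Callable[[], Dict[str, float]]


def take_snapshot(collectors: Dict[str, Collector], timestamp: float) -> dict:
    """Read every component's metrics and tag them with one shared timestamp."""
    return {
        "timestamp": timestamp,
        "components": {name: collect() for name, collect in collectors.items()},
    }


def collect_periodically(
    collectors: Dict[str, Collector],
    interval_s: float,
    count: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Take `count` snapshots on a fixed schedule, `interval_s` seconds apart."""
    snapshots = []
    start = clock()
    for i in range(count):
        # Schedule against the start time so collection delays do not accumulate.
        target = start + i * interval_s
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        snapshots.append(take_snapshot(collectors, target))
    return snapshots
```

Tagging all components with the same scheduled timestamp is what lets later analysis line up one component's metrics against another's for the same interval.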

The novel analysis scheme sends the captured analytics metrics in real time to a monitoring application for safekeeping and analysis. The monitoring application searches for anomalies in the workload and initiates additional actions when necessary to raise an alert or to correct the condition.
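The disclosure does not state how the monitoring application detects anomalies. As one possible sketch, the class below keeps a rolling baseline per metric and flags values that deviate by more than a chosen number of standard deviations. The `Monitor` class, its parameters, and the snapshot layout are hypothetical, and the rolling-deviation rule is a stand-in technique chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class Monitor:
    """Hypothetical monitoring application: stores recent history, flags outliers."""

    def __init__(self, window=12, threshold=3.0, on_alert=print):
        self.window = window        # number of past intervals used as the baseline
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.on_alert = on_alert    # action per anomaly (alert or corrective hook)
        self.history = {}           # (component, metric) -> deque of recent values

    def ingest(self, snapshot):
        """Store one snapshot; return the (component, metric, value) anomalies in it."""
        anomalies = []
        for component, metrics in snapshot["components"].items():
            for metric, value in metrics.items():
                past = self.history.setdefault(
                    (component, metric), deque(maxlen=self.window)
                )
                if len(past) == self.window:  # judge only against a full baseline
                    mu, sigma = mean(past), pstdev(past)
                    # A perfectly flat baseline (sigma == 0) is skipped in this sketch.
                    if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                        anomalies.append((component, metric, value))
                past.append(value)
        for component, metric, value in anomalies:
            self.on_alert(snapshot["timestamp"], component, metric, value)
        return anomalies
```

The `on_alert` callback is where an alert would be raised or a corrective action started; a production monitor would likely need a more robust detection rule than this one.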

The novel approach includes Smart Data Collection. Analytics metrics captured over long durations (e.g., 15 minutes) may not be helpful for transient resource conditions. Detailed data collected more frequently, over shorter intervals, is necessary to solve many efficiency issues.
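A small worked example shows why a long-interval average can hide a transient condition. The numbers are invented for illustration: one response-time sample per second for 15 minutes, with a 20-second spike from 10 ms to 500 ms.

```python
def interval_averages(samples, interval):
    """Average consecutive groups of `interval` samples."""
    averages = []
    for i in range(0, len(samples), interval):
        chunk = samples[i:i + interval]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Illustrative data only: 900 one-second samples at 10 ms,
# with a transient 20-second spike to 500 ms.
samples = [10.0] * 900
for second in range(400, 420):
    samples[second] = 500.0

long_avg = interval_averages(samples, 900)   # one 15-minute average
short_avg = interval_averages(samples, 10)   # ninety 10-second averages

print(f"15-minute average: {long_avg[0]:.1f} ms")
print(f"worst 10-second average: {max(short_avg):.1f} ms")
```

The single 15-minute figure is about 20.9 ms, which looks only mildly elevated, while two of the 10-second intervals average the full 500 ms and expose the spike.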

Software components interact with other components that compete for the same finite hardware resources. Periodic system-wide snapshots of analytics metrics from a group of software components provide valuable component-level insight into how the behavior of different components is correlated. A slowdown in response in one component may be caused by a shortage or bottleneck (e.g., locking) in another.
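One way to look for such cross-component relationships is to correlate metric series taken from the same snapshots. The sketch below uses a Pearson correlation as an assumed technique; the component names, metric names, and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def series(snapshots, component, metric):
    """Extract one metric of one component across all snapshots."""
    return [s["components"][component][metric] for s in snapshots]


# Hypothetical snapshots: lock waiters in one component, response time in another.
lock_waiters = [0, 0, 5, 9, 1, 0]
response_ms = [10, 11, 60, 100, 20, 10]
snapshots = [
    {"timestamp": t, "components": {
        "lock_manager": {"waiters": w},
        "request_router": {"response_ms": r},
    }}
    for t, (w, r) in enumerate(zip(lock_waiters, response_ms))
]

r = pearson(
    series(snapshots, "lock_manager", "waiters"),
    series(snapshots, "request_router", "response_ms"),
)
print(f"correlation between lock waiters and response time: {r:.2f}")
```

A strong positive value suggests the two metrics move together and points the investigation toward the locking component, although correlation alone does not establish which component is the cause.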


Instrumentation metrics that can be collected by each component may include the following:

• Average/maximum component response times, queue lengths, etc.
• Serialization contention within the component
• Significant efficiency/throughput-related events
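The metrics listed above could be carried in a per-interval record such as the following. This is only a sketch: the `ComponentMetrics` type, its field names, and the `summarize` helper are assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ComponentMetrics:
    """Hypothetical record of one component's metrics for one collection interval."""
    component: str
    interval_start: float
    interval_s: float
    avg_response_ms: float
    max_response_ms: float
    avg_queue_length: float
    max_queue_length: int
    serialization_contention_count: int
    events: List[str] = field(default_factory=list)  # significant throughput events


def summarize(
    component: str,
    interval_start: float,
    interval_s: float,
    response_ms: Sequence[float],
    queue_lengths: Sequence[int],
    contention_count: int,
    events: Sequence[str] = (),
) -> ComponentMetrics:
    """Reduce the raw observations of one interval to a single record."""
    if not response_ms or not queue_lengths:
        raise ValueError("at least one observation is required per interval")
    return ComponentMetrics(
        component=component,
        interval_start=interval_start,
        interval_s=interval_s,
        avg_response_ms=sum(response_ms) / len(response_ms),
        max_response_ms=max(response_ms),
        avg_queue_length=sum(queue_lengths) / len(queue_lengths),
        max_queue_length=max(queue_lengths),
        serialization_contention_count=contention_count,
        events=list(events),
    )
```

Keeping both the average and the maximum for each interval preserves short peaks that an average alone would smooth away.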

This lar...