
Method for Analytical Prioritization and Handling of Interrupted Failure Data Capture

IP.com Disclosure Number: IPCOM000247690D
Publication Date: 2016-Sep-27
Document File: 4 page(s) / 495K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method for analytical prioritization and handling of interrupted failure data capture is disclosed. The method allows for the interruption of failure data collection (dumps, logs, etc.) and ensures the data collected up to the point of the interruption is readable, usable, and relevant to the failure.




Disclosed is a method for analytical prioritization and handling of interrupted failure data capture. The method allows failure data collection (dumps, logs, etc.) to be interrupted while ensuring the data collected up to the point of interruption is readable, usable, and relevant to the failure. To ensure relevant data is identified even if the collection is interrupted, analytics are used to auto-generate rules for the data collection. This approach addresses the time pressure that arises when systems must be recovered as soon as possible, and it avoids the incomplete or unusable data that otherwise results when collection is interrupted by user or external intervention.

A constant issue in today's large enterprise systems is ensuring the right data is collected on a failure. In most cases, the system software simply tries to capture everything. Often this takes longer than a system owner is willing to wait, so they interrupt the data collection (Ctrl+C, system power off, power cycle, etc.). These interruptions frequently leave the collected data lost or unusable.

The disclosed method dynamically determines the most critical data to collect during a system failure and ensures that whatever data is collected prior to interruption is complete and usable. During failure data collection, an option is provided to cancel the collection. If the collection is cancelled, this action triggers a signal to the collection process to gather the remaining relevant data before exiting. The rules for collecting the most valuable data are built into the code after being auto-generated by analytics that examine which data developers use most often to isolate this specific type of failure, the most recent software changes in the area of the failure, and other analytic inputs.
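As a minimal illustration of this cancel-and-finish behavior, the Python sketch below installs a Ctrl+C handler that, rather than killing the collector, switches it to a critical-items-only mode. The rule table, failure signatures, artifact names, and the collect_artifact helper are hypothetical placeholders, not part of the disclosure.

    import signal

    # Hypothetical rule table, auto-generated offline by analytics: artifacts
    # ranked by how often developers used them to isolate each failure signature.
    PRIORITY_RULES = {
        "storage_timeout": ["component_trace", "io_queue_dump", "recent_patch_log"],
        "default": ["system_log", "core_summary"],
    }

    cancel_requested = False

    def on_cancel(signum, frame):
        # Ctrl+C no longer aborts collection outright; it requests a graceful wrap-up.
        global cancel_requested
        cancel_requested = True

    def collect_artifact(name):
        # Placeholder for the real capture logic (dump extraction, log copy, ...).
        print("collecting", name)

    def collect(failure_signature, all_artifacts):
        signal.signal(signal.SIGINT, on_cancel)
        critical = PRIORITY_RULES.get(failure_signature, PRIORITY_RULES["default"])
        # Critical artifacts are captured first, so an interruption still
        # yields the data most relevant to this failure.
        ordered = critical + [a for a in all_artifacts if a not in critical]
        for name in ordered:
            if cancel_requested and name not in critical:
                break  # cancel received: finish critical items, skip the rest
            collect_artifact(name)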

After collecting the most valuable data, the collection process ensures the data is in a readable and usable format, then exits.
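One plausible way to guarantee a readable, usable result, continuing the sketch above, is to seal whatever was captured into a checksummed archive and publish it with an atomic rename, so an interrupted run never leaves a half-written file behind. The file names and archive layout are illustrative assumptions.

    import hashlib
    import io
    import json
    import os
    import tarfile

    def finalize(collected_paths, out_path):
        # Checksum each captured file so developers can later verify integrity.
        manifest = {}
        for p in collected_paths:
            with open(p, "rb") as f:
                manifest[os.path.basename(p)] = hashlib.sha256(f.read()).hexdigest()

        tmp = out_path + ".partial"
        with tarfile.open(tmp, "w:gz") as tar:
            for p in collected_paths:
                tar.add(p, arcname=os.path.basename(p))
            # Embed the manifest inside the archive itself.
            data = json.dumps(manifest, indent=2).encode()
            info = tarfile.TarInfo("MANIFEST.json")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        # Rename is atomic on the same filesystem: readers only ever see
        # either no archive or a complete one.
        os.rename(tmp, out_path)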

There are three major pieces:

1. When a failure does occur and the debug data is sent back to development for debug, a program analyzes all debug data used by development to get to the root cause of the issue. This is an indication of which data is most critical for this particular failure signature (see the sketch after this list).

2. The program analyzes the different internal software components that look at the debug data. This is an indication of which software components should have their debug dat...
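To make piece 1 concrete, a minimal sketch (assuming historical debug-session records are available) might aggregate, per failure signature, which artifacts developers actually used to reach root cause, keeping the most-used ones as the critical-collection rules. The session records and artifact names below are hypothetical.

    from collections import Counter, defaultdict

    def generate_rules(debug_sessions, top_n=3):
        # debug_sessions: (failure_signature, artifacts_used_to_find_root_cause)
        # records harvested from past investigations.
        usage = defaultdict(Counter)
        for signature, artifacts in debug_sessions:
            usage[signature].update(artifacts)
        # The artifacts developers touched most often become the
        # critical-collection rules for that signature.
        return {
            sig: [name for name, _ in counts.most_common(top_n)]
            for sig, counts in usage.items()
        }

    # Example: two past investigations of the same failure signature.
    sessions = [
        ("storage_timeout", ["io_queue_dump", "component_trace"]),
        ("storage_timeout", ["component_trace", "recent_patch_log"]),
    ]
    print(generate_rules(sessions))
    # {'storage_timeout': ['component_trace', 'io_queue_dump', 'recent_patch_log']}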