
Context Aware Diagnostics and Error Correction

IP.com Disclosure Number: IPCOM000224134D
Publication Date: 2012-Dec-11
Document File: 5 page(s) / 81K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article describes a context-aware diagnostic and error correction mechanism for distributed solutions spanning multiple products. The mechanism is dynamic and self-adjusting for a given customer environment and aims to improve time to value.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 34% of the total text.


Products such as IBM Smart Analytics Systems, which combine multiple products from various vendors, face customer problems that are difficult to recreate, transient in nature, and usually not caught in test environments. Systems Director is one such component. Discovery, one of the major tasks it performs, involves multiple mixed-vendor products; it fails for some specific end-points and succeeds for others. As a result, a great deal of time and money is spent on recreation, troubleshooting, and analysis. Diagnostic and corrective approaches therefore play a crucial role in the timely and cost-effective resolution of a problem.

Current approaches are naïve and simplistic and provide limited help. Some frameworks provide levels of logging and some provide component-specific logging, but these do not suffice for the issues described above. Logging everything is not an option for efficiency reasons. What is needed is a smart mechanism that not only provides the required details but also decides the level of detail and, if needed, solves the problem on a per-environment basis. Reduction in time to value is the key factor.
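As an illustration only, the idea of deciding the level of detail per environment can be sketched as a small policy that raises log verbosity for a context (for example, one end-point) after it has failed repeatedly. The abbreviated text does not describe the actual mechanism, so the class name `ContextAwareLogLevel`, the `failure_threshold` parameter, and the end-point names are all assumptions made for this sketch.

```python
import logging


class ContextAwareLogLevel:
    """Hypothetical policy: choose a log level per context (e.g. per end-point).

    Contexts that have failed at least `failure_threshold` times are logged
    at DEBUG; all other contexts stay at WARNING to keep logging cheap.
    """

    def __init__(self, failure_threshold=2):
        self.failure_threshold = failure_threshold
        self.failures = {}  # context name -> consecutive failure count

    def record_failure(self, context):
        self.failures[context] = self.failures.get(context, 0) + 1

    def record_success(self, context):
        # A success resets the context, so verbosity drops back to the default.
        self.failures.pop(context, None)

    def level_for(self, context):
        if self.failures.get(context, 0) >= self.failure_threshold:
            return logging.DEBUG
        return logging.WARNING


# Usage: only the end-point that keeps failing gets detailed logging.
policy = ContextAwareLogLevel(failure_threshold=2)
policy.record_failure("endpoint-a")
policy.record_failure("endpoint-a")
level_a = policy.level_for("endpoint-a")
level_b = policy.level_for("endpoint-b")
```

The point of keying the decision on the failing context is that detailed logging is paid for only where a problem is actually recurring, rather than being enabled globally.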

Problem Scope
Most customer-reported issues

Are difficult to recreate - the user may not even know how the application state/data ended up corrupt


Are transient - may not always occur using the same set of test cases


Require a significant amount of time to recreate due to complex and time-consuming steps


Are specific to the customer environment and not reproducible in the support test environment


May recur repeatedly in a given customer environment


Customer Support teams

Have limited access to the customer environment in order to debug


May have to visit customer labs


Solution in question

Is a combination of various vendor products


Has more advanced versions on the market, but the customer cannot move to them yet


Is legacy

Current Solution approaches
Provide logging mechanisms inline with code


Provide various kinds of logs to support team to debug a reported problem


Allow additional loggers to be enabled on request by support teams


Retry actions automatically for a certain number of times before they error out (most of them by design)

Have a definite lifetime for support


Have a support team that provides resolutions for errors caused by environment issues, hot patches for known/new issues, and upgrades of the solution application to higher fix packs
Are a combination of various vendor products and versions, which makes debugging the problem difficult
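The automatic-retry behaviour listed among the current approaches can be sketched as follows. This is a minimal illustration, not the behaviour of any particular product: the function name `retry` and the parameters `max_attempts` and `delay_seconds` are hypothetical.

```python
import time


def retry(action, max_attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or max_attempts is reached.

    Re-raises the last exception once the attempts are exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # error out after the final attempt
            time.sleep(delay_seconds)


# Usage: an action that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_discovery():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "discovered"


result = retry(flaky_discovery, max_attempts=3)
```

A blind retry like this can mask a transient problem without recording why the earlier attempts failed, which is one reason such issues remain hard to diagnose afterwards.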

The following steps are usually followed once an error has occurred:

Capture logs - send to support team


Enable additional loggers; re-run the test cases; resend logs to support team


Enable additional loggers; wait for the problem to recur when no definite method of recreating it is known or when time is constrained

All of the above consume additional resources in terms of


Time - test/retest/wait


Additional and unwanted impacts


Frustration on delays


Addi...