
Context Aware Diagnostics and Error Correction
Disclosure Number: IPCOM000224134D
Publication Date: 2012-Dec-11
Document File: 5 page(s) / 81K

Publishing Venue

The Prior Art Database


This article describes a context-aware diagnostic and error correction mechanism for distributed solutions spanning multiple products. The mechanism is dynamic and self-adjusting for a given customer environment and aims to improve time to value.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 34% of the total text.


Context Aware Diagnostics and Error Correction

Products like IBM Smart Analytics Systems, which combine multiple products from various vendors, face customer problems that are difficult to recreate, transient in nature, and usually not caught in test environments. Systems Director is one such component: discovery, one of its major tasks, spans multiple mixed-vendor products and fails for some specific end-points while succeeding for others. As a result, considerable time and money are spent on recreation, troubleshooting, and analysis. Diagnostic and corrective approaches therefore play a crucial role in timely and cost-effective resolution of the problem. Current approaches are naïve and simplistic and provide limited help. Some frameworks provide levels of logging and some provide component-specific logging, but these do not suffice for the issues described above. Logging everything is not an option for efficiency reasons. There is a need for a smart mechanism that not only provides the required details but also decides the level of detail and, if needed, solves the problem on a per-environment basis. Reduction in time to value is the key factor.
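
To make the logging limitation concrete, the following is a minimal sketch of level-based and component-specific logging using Python's standard logging module. The component names ("solution.discovery", "solution.inventory") and the messages are assumptions made for illustration; the disclosure does not name any loggers or a logging framework.

```python
import io
import logging

# Hypothetical component names; the disclosure does not name any loggers.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

solution = logging.getLogger("solution")
solution.addHandler(handler)
solution.setLevel(logging.WARNING)   # default level: keep logging overhead low
solution.propagate = False           # keep this sketch's output in `stream` only

discovery = logging.getLogger("solution.discovery")
inventory = logging.getLogger("solution.inventory")

discovery.debug("probing end-point A")   # dropped: inherited level is WARNING

# Support requests more detail, but only for the failing component.
discovery.setLevel(logging.DEBUG)
discovery.debug("probing end-point A")   # recorded
inventory.debug("scanning inventory")    # still dropped

captured = stream.getvalue()
```

In this sketch someone still has to choose the level by hand, per component, after the failure has already happened. That is the limitation described above.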

Problem Scope
Most customer-reported issues

Are difficult to recreate - the user may not even know how the application state/data ended up corrupt

Are transient - may not always occur using the same set of test cases

Require a significant amount of time to recreate due to complex and time-consuming steps

Are specific to the customer environment and not reproducible in the support test environment

May recur repeatedly in a given customer environment

Customer Support teams

Have limited access to the customer environment in order to debug

May have to visit customer labs

Solution in question

Is a combination of various vendor products

Has newer versions on the market, but the customer cannot move to them yet

Is legacy

Current Solution approaches
Provide logging mechanisms inline with code

Provide various kinds of logs to support team to debug a reported problem

Allow additional loggers to be enabled on request by support teams

Retry actions automatically for a certain number of times before they error out (most of them by


Have a definite lifetime for support

Have support teams that provide resolutions for errors due to environment issues, hot patches for known/new issues, and upgrades of the solution application to higher fix packs

Are a combination of various vendor products and versions, which makes debugging difficult
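
One of the current approaches listed above, retrying an action a fixed number of times before it errors out, can be sketched as follows. This is an illustration under stated assumptions, not the behaviour of any specific product: the function names `run_with_retries` and `flaky_discovery`, the logger name, and the default of three attempts are all hypothetical.

```python
import logging

log = logging.getLogger("solution.retry")  # hypothetical logger name


def run_with_retries(action, max_attempts=3):
    """Call action() up to max_attempts times; re-raise the last failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            # Only the attempt number and the error text are recorded here;
            # no environment context is captured.
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise


# Hypothetical transient failure: fails twice, then succeeds.
calls = {"count": 0}


def flaky_discovery():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("end-point unreachable")
    return "discovered"


result = run_with_retries(flaky_discovery, max_attempts=3)
```

A fixed retry count like this can hide a transient failure without explaining it, and it behaves identically in every environment.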

The following steps are usually followed once an error has occurred:

Capture logs - send to support team

Enable additional loggers; re-run the test cases; resend logs to support team

Enable additional loggers; wait for the problem to recur when no definite method of recreating it is known or when time is constrained

All of the above consume additional resources in terms of

Time - test/retest/wait

Additional and unwanted impacts

Frustration over delays