Method and Apparatus for Consolidation of Multiple Errors into a Single Log Entry in a Distributed System
Publication Date: 2012-Jul-03
The IP.com Prior Art Database
Disclosed is a method and apparatus for consolidation of multiple errors into a single log entry in a distributed system.
In a distributed computing environment, such as a stacked network switch, a single command from an external interface may need to be split into multiple internal commands that are distributed to individual compute elements for processing. Each of these internal commands may fail, resulting in an error being reported through the software back to the external interface. Typically, any such errors are reported to the external interface simply as pass/fail, with no indication of, or correlation to, any error log entries that may have been created during processing. This makes problems very hard for service, support, and development personnel to analyze and debug.
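To make the failure mode concrete, the following Python sketch models one external command fanned out to several compute elements (for example, units in a switch stack) and the conventional pass/fail reduction. The names `InternalResult`, `dispatch`, and `conventional_response`, and the unit and log ID numbers, are hypothetical and not part of the disclosure; the sketch only shows that the reduced result carries no link to the per-unit error log entries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InternalResult:
    """Outcome of one internal command on one compute element (hypothetical model)."""
    unit: int
    ok: bool
    error_log_id: Optional[int] = None  # ID of the error log entry created on failure


def dispatch(units: List[int], run: Callable[[int], InternalResult]) -> List[InternalResult]:
    """Split one external command into one internal command per unit and run each."""
    return [run(unit) for unit in units]


def conventional_response(results: List[InternalResult]) -> bool:
    """Conventional reduction: a bare pass/fail that drops every error log ID."""
    return all(r.ok for r in results)


if __name__ == "__main__":
    # Hypothetical scenario: units 2 and 3 fail and create error log entries 101 and 102.
    failures: Dict[int, int] = {2: 101, 3: 102}

    def run(unit: int) -> InternalResult:
        if unit in failures:
            return InternalResult(unit, ok=False, error_log_id=failures[unit])
        return InternalResult(unit, ok=True)

    results = dispatch([1, 2, 3, 4], run)
    print(conventional_response(results))  # False, with no hint that logs 101 and 102 exist
```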
An alternative approach would be simply to consolidate all information from the failures into one log entry. This alternative has numerous problems, such as truncation if the log entry has a limited size, and a single serviceable event standing in for multiple serviceable failures. As a result, failures would go unreported and, therefore, unresolved by traditional service methodologies.
This invention provides a mechanism that keeps the individual log events and adds a single response event correlating all of the individual entries. Correlation becomes simple, which reduces the need for detailed analysis and guesswork about the root cause of a fail response, without losing data or serviceable events.
To solve the problems described above, this invention introduces the concept of a summary log entry, together with rules for creating, committing, and summarizing log entries for distributed commands.
A normal error log entry consists of data related to a particular failure: what failed, what part needs to be serviced (if any), any recovery actions, and extended error data that is unique to that failure. All of this data is packaged into an error log entry and assigned an identifier unique to that log entry. The summary log entry is a special class of log entry that contains a set of error log IDs referencing each of the individual error log entries created in this call chain, as well as extended error data relevant to the command being processed.
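The two kinds of entry can be modeled as in the following Python sketch. The field names (`what_failed`, `service_part`, `recovery_actions`, `extended_data`), the `summarize` helper, and the use of a shared increasing counter to produce unique IDs are illustrative assumptions; the disclosure specifies only what each entry contains and that every entry receives an identifier unique to it.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical ID source shared by both entry kinds, so every entry gets a unique ID.
_next_log_id = itertools.count(1)


@dataclass
class ErrorLogEntry:
    """Normal entry: data about one particular failure."""
    what_failed: str
    service_part: Optional[str]      # part that needs to be serviced, if any
    recovery_actions: List[str]
    extended_data: Dict[str, str]    # extended error data unique to this failure
    log_id: int = field(default_factory=lambda: next(_next_log_id))


@dataclass
class SummaryLogEntry:
    """Summary entry: references each error log entry created in the call chain."""
    referenced_log_ids: List[int]
    extended_data: Dict[str, str]    # data relevant to the command being processed
    log_id: int = field(default_factory=lambda: next(_next_log_id))


def summarize(command: str, entries: List[ErrorLogEntry]) -> SummaryLogEntry:
    """Build a summary entry that correlates the individual entries by ID only."""
    return SummaryLogEntry(
        referenced_log_ids=[e.log_id for e in entries],
        extended_data={"command": command, "failure_count": str(len(entries))},
    )
```

Because the summary stores only IDs, each individual entry remains its own serviceable event and no failure data has to be squeezed into one size-limited record.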
In addition to the summary entry itself, this invention defines rules for managing the summary and basic entries. These rules are as follows.
If an API is distributed across multiple compute elements or can be targeted to multiple hardware entities, it must return a summary entry.
If an API is known to access only one compute element and only one hardware entity within the domain of that compute element, it may return a singular entry.
If it is unknown what the API can access, it must return a summary entry.
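The three rules can be expressed as one decision function, sketched below in Python. `ApiScope` and `summary_required` are hypothetical names; an unknown scope is represented by `None`, which falls through to the mandatory-summary case as the third rule requires. Because the second rule is permissive ("may"), the function reports only whether a summary is required, leaving a singular entry as an allowed rather than mandatory choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiScope:
    """What an API is known to access; None in a field means unknown (hypothetical)."""
    compute_elements: Optional[int]   # number of compute elements the API may touch
    hardware_entities: Optional[int]  # number of hardware entities it may target


def summary_required(scope: Optional[ApiScope]) -> bool:
    """Return True when the rules require the API to return a summary entry."""
    # Rule 3: if it is unknown what the API can access, a summary is required.
    if scope is None or scope.compute_elements is None or scope.hardware_entities is None:
        return True
    # Rule 1: distributed across multiple compute elements, or able to target
    # multiple hardware entities, so a summary is required.
    if scope.compute_elements > 1 or scope.hardware_entities > 1:
        return True
    # Rule 2: one compute element and one hardware entity, so a singular entry is allowed.
    return False


if __name__ == "__main__":
    print(summary_required(ApiScope(compute_elements=4, hardware_entities=4)))  # True
    print(summary_required(ApiScope(compute_elements=1, hardware_entities=1)))  # False
    print(summary_required(None))  # True
```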