Prioritizing Data Collection in a Constrained-Memory System
Original Publication Date: 2009-Dec-03
Included in the Prior Art Database: 2009-Dec-03
Disclosed is a method for capturing the most important debug data when the work space available is limited.
When a system fails, it is often desirable to keep the system as operational as possible, yet still collect debug data so that causal analysis can be performed and the failure can ultimately be corrected. These two goals are typically in conflict. Some buffer space can be set aside for capturing debug data, but in a system with constrained resources, the space dedicated to such a buffer may be much smaller than the space used to perform normal system work. Consequently, the desired debug data can exceed the buffer space reserved for holding it. Traditional methods to deal with this problem are:
1. Discard additional debug data when the buffer space is exhausted.
2. Steal space from a subsystem so all debug data can be stored.
3. Use an external system to drive debug data collection.
The first method causes the last debug data identified to be lost, regardless of its importance or volatility. The second method suspends the subsystem whose space is stolen, followed by a restart and recovery of that subsystem. The third method requires an external system to have access to the candidate data on the failing system and to have some knowledge about which data to collect. This typically leads to slower and excessive data collection, and may require a suspension of all activity on the failing system, followed by a restart and recovery of that system.
A fourth method is to prioritize debug data as it is identified by the failing system, temporarily storing a copy of the most volatile data first and deferring collection of large, less-volatile data. With this method, when debug data is identified, not only is the data's location determined, but so too are its persistence characteristics and its priority relative to other debug data. From this information, the data collection program can determine the best use of the limited space reserved for debug data, deferring collection of some data until the full data set is off-loaded from the system. This method permits the system and all its subsystems to continue running during the capture of debug data, reduces both the space needed to buffer debug data and the time needed to store volatile debug data, and provides more of the desired debug data for problem analysis. It depends on some debug data being kept unbuffered in the live system until the full data collection is off-loaded to external storage.
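The prioritization described above can be illustrated with a minimal sketch. This is not the disclosed implementation. The names DebugItem and plan_capture, the integer volatility and priority scales, and the sample sizes are all hypothetical, chosen only to show one way a collector might decide which data to copy into the small reserved buffer and which to leave in place until off-load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugItem:
    """Hypothetical descriptor for one piece of identified debug data."""
    name: str
    size: int        # bytes needed to buffer a copy
    volatility: int  # assumed scale: higher = live copy changes sooner, 0 = stable
    priority: int    # assumed scale: lower = more important for analysis


def plan_capture(items, buffer_capacity):
    """Split items into (buffered, deferred) for a buffer of the given size.

    Items are considered most-volatile first, then by priority. An item is
    copied into the buffer only if it is volatile and still fits; everything
    else is deferred and read from the live system at off-load time.
    """
    buffered, deferred = [], []
    free = buffer_capacity
    order = sorted(items, key=lambda it: (-it.volatility, it.priority))
    for item in order:
        if item.volatility > 0 and item.size <= free:
            buffered.append(item)
            free -= item.size
        else:
            deferred.append(item)
    return buffered, deferred


# Illustrative data only; sizes and names are made up for this sketch.
items = [
    DebugItem("heap_image", size=4096, volatility=0, priority=2),
    DebugItem("cpu_registers", size=64, volatility=3, priority=1),
    DebugItem("trace_ring", size=512, volatility=2, priority=1),
    DebugItem("io_queue", size=768, volatility=2, priority=3),
]
buffered, deferred = plan_capture(items, buffer_capacity=1024)
print([it.name for it in buffered])
print([it.name for it in deferred])
```

In this sketch a volatile item that no longer fits is deferred rather than discarded, so it may still be collected at off-load time, although its contents may have changed by then. A real collector would need its own policy for that case.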
To defer some data capture, a data buffer is preallocated during system initialization from the memory space available to the system. Because debug data capture is expected to be a relatively rare occurrence, and memory is a precious resource, only a small fraction of the total memory available to the system is dedicated to this debug data buffer. If a new debug data capture is requested while the system is running, each item of data desired for that collection is identified by its type, priority, location and, if possible, its size (see Figure 1). For data whose size cannot be readily...