Memory Error Detection Using Light-Weight Software Approach
Original Publication Date: 2005-Aug-16
Included in the Prior Art Database: 2005-Aug-16
Problem: Need the ability to apply a threshold to certain errors when detecting and logging them.
Contribution: A light-weight software method for detecting and characterizing memory errors in systems that lack full hardware support for detecting and reporting such errors.
Benefit 1: Eliminates the need for intrusive, high polling rates, which improves system performance.
Benefit 2: Detects critical errors within a single polling interval, which improves system reliability.
Our approach improves the timeliness of error detection in system service routines that use polling instead of interrupts. If the periodic system service detects an error, it performs a focused characterization of the memory at the failing address. This characterization allows potentially critical errors to be detected within a single polling event. The method is intended for systems that can detect memory errors but cannot invoke a system service to characterize the severity of an error. The system is assumed to be able to capture the address of at least one of the memory accesses that caused the detected error, for example the address of the first or the last error. A high-level description of the steps follows.
1) The system service is invoked periodically.
2) The system service checks for the presence of an error.
3) If an error exists, the system service obtains the address of at least one of the errors.
4) Clear the error status.
5) A tight loop repeatedly accesses the failing address. After each access, the cache line containing the failing address is flushed using the CLFLUSH instruction or an equivalent, so that the next access is served from memory rather than from the cache.
6) After each cache line flush, check for the presence of a memory error and increment the error count if one is found.
7) Clear the error status.
8) If the number of errors is greater than an acceptable th...