Memory Error Detection Using Light-Weight Software Approach

IP.com Disclosure Number: IPCOM000126967D
Original Publication Date: 2005-Aug-16
Included in the Prior Art Database: 2005-Aug-16
Document File: 2 page(s) / 24K

Publishing Venue

IBM

Abstract

Problem: The ability to apply a threshold to certain errors is needed when detecting and logging them.

This is the abbreviated version, containing approximately 52% of the total text.

Memory Error Detection Using Light-Weight Software Approach

     Contribution: Light-weight software method for detecting and characterizing memory errors in systems without full hardware support for detection and reporting of such errors.

Benefit 1: Eliminates need for intrusive high polling rates. This improves system performance.

Benefit 2: Detects critical errors within a single polling interval. This improves reliability of the system.

     Our approach improves the timeliness of error detection in system service routines that use polling in lieu of interrupt generation. If an error is detected during the periodic system service, a focused characterization of memory is made. This characterization allows potentially critical errors to be detected within a single polling event. The method is intended for systems that can detect memory errors but cannot invoke a system service to characterize the severity of the error. The system is assumed to be able to capture the address of at least one of the memory accesses that caused the detected memory error, for example the address of the first or last error. A high-level description of the steps is listed below.
1) The system service is invoked periodically.
2) The system service checks for the presence of an error.
3) If an error exists, the system service obtains the address of at least one of the errors.
4) The error status is cleared.
5) A tight loop is executed to access the failing address. Following each access, the cache line containing the failing address is flushed using the CLFLUSH instruction or an equivalent.
6) Following each cache line flush, the system service checks for the presence of a memory error and increments the error count as required.
7) The error status is cleared.
8) If the number of errors is greater than an acceptable th...
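
The steps listed above can be sketched in C as a single polling routine. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the hook table (mem_err_hooks), the result type, the probe count, and the simulated platform are all hypothetical names introduced here, and the real error-status and address registers are chipset specific. The sketch also assumes the error status is sticky, so it clears the status after each counted error inside the loop. Because the text of step 8 is cut off in this excerpt, the routine only reports whether the count exceeded a caller-supplied threshold and leaves any follow-up action to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hooks; none of these names appear in the disclosure. */
typedef struct {
    bool (*error_pending)(void *ctx);                         /* steps 2 and 6 */
    volatile uint8_t *(*error_address)(void *ctx);            /* step 3 */
    void (*clear_error)(void *ctx);                           /* steps 4 and 7 */
    void (*flush_line)(const volatile void *addr, void *ctx); /* step 5 */
    void *ctx;
} mem_err_hooks;

typedef struct {
    bool error_seen;       /* an error was pending when the service ran */
    unsigned error_count;  /* errors observed while probing the address */
    bool over_threshold;   /* error_count > threshold */
} mem_err_result;

/* Body of one periodic service invocation (step 1 is the caller's timer). */
mem_err_result poll_memory_errors(const mem_err_hooks *h,
                                  unsigned probes, unsigned threshold)
{
    mem_err_result r = { false, 0u, false };
    assert(h != NULL);

    if (!h->error_pending(h->ctx))                      /* step 2 */
        return r;
    r.error_seen = true;

    volatile uint8_t *addr = h->error_address(h->ctx);  /* step 3 */
    h->clear_error(h->ctx);                             /* step 4 */

    for (unsigned i = 0; i < probes; i++) {
        (void)*addr;                    /* step 5: access the failing address */
        h->flush_line(addr, h->ctx);    /* step 5: flush its cache line */
        if (h->error_pending(h->ctx)) { /* step 6: check and count */
            r.error_count++;
            h->clear_error(h->ctx);     /* assumption: status is sticky */
        }
    }

    h->clear_error(h->ctx);                             /* step 7 */
    r.over_threshold = r.error_count > threshold;       /* comparison only */
    return r;
}

/* Simulated platform for illustration: an error is latched on every
   fail_every-th flush. Real hardware would latch it on the memory access. */
typedef struct {
    uint8_t cell;
    bool status;
    unsigned flushes;
    unsigned fail_every;
} sim_platform;

static bool sim_pending(void *ctx) { return ((sim_platform *)ctx)->status; }
static volatile uint8_t *sim_address(void *ctx) { return &((sim_platform *)ctx)->cell; }
static void sim_clear(void *ctx) { ((sim_platform *)ctx)->status = false; }
static void sim_flush(const volatile void *addr, void *ctx)
{
    sim_platform *p = ctx;
    (void)addr;
    p->flushes++;
    if (p->fail_every != 0 && p->flushes % p->fail_every == 0)
        p->status = true;
}

/* Runs one polling pass against the simulated platform. */
mem_err_result run_simulation(bool initial_error, unsigned fail_every,
                              unsigned probes, unsigned threshold)
{
    sim_platform p = { 0, initial_error, 0, fail_every };
    mem_err_hooks h = { sim_pending, sim_address, sim_clear, sim_flush, &p };
    return poll_memory_errors(&h, probes, threshold);
}
```

On an x86 target, the flush_line hook could plausibly wrap the `_mm_clflush` intrinsic from `<emmintrin.h>`. The hook indirection is used here only so the sketch compiles and runs without access to platform registers.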