
Fault-tolerant self-replication of firmware image

IP.com Disclosure Number: IPCOM000022037D
Original Publication Date: 2004-Feb-20
Included in the Prior Art Database: 2004-Feb-20
Document File: 2 page(s) / 55K

Publishing Venue

IBM

Abstract

Common fault-tolerant firmware designs fail over to a previous valid level of code when the current level fails. However, doing so re-exposes the system to the bugs and vulnerabilities that the current level fixed. This article suggests a solution to this problem.

This is the abbreviated version, containing approximately 52% of the total text.

Most modern embedded microprocessor designs include an EPROM that retains the firmware while the system is powered off. EPROMs are slow devices, so the firmware is usually copied into faster RAM at power-up (and sometimes after a reset or firmware update). As EPROM memory is inexpensive, a known technique is to choose an EPROM large enough to hold two copies of the firmware; one copy is designated the primary image and the other the secondary image. Each copy includes a checksum so that EPROM failures can be detected. If the primary image becomes defective, a hardware circuit or firmware algorithm detects the checksum error and automatically switches over to the secondary image.
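The checksum-based switch-over described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the article does not name a checksum algorithm or an image layout, so the CRC-32 trailer, the helper names (seal, is_valid, select_boot_image) and the sample payloads are all hypothetical.

```python
import zlib

CRC_SIZE = 4  # assumption: a 4-byte CRC-32 trailer closes each image


def seal(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 (hypothetical layout)."""
    return payload + zlib.crc32(payload).to_bytes(CRC_SIZE, "little")


def is_valid(image: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    if len(image) <= CRC_SIZE:
        return False
    payload, stored = image[:-CRC_SIZE], image[-CRC_SIZE:]
    return zlib.crc32(payload).to_bytes(CRC_SIZE, "little") == stored


def select_boot_image(primary: bytes, secondary: bytes) -> str:
    """Prefer the primary image; fall back to the secondary on a checksum error."""
    if is_valid(primary):
        return "primary"
    if is_valid(secondary):
        return "secondary"
    raise RuntimeError("no valid firmware image")


# Simulate a single defective bit appearing in the primary image.
primary = bytearray(seal(b"firmware v2"))
secondary = seal(b"firmware v1")
primary[3] ^= 0x01
chosen = select_boot_image(bytes(primary), secondary)  # falls back to the secondary
```

In a real product the same decision might be made by a hardware circuit rather than by code; the sketch only models the firmware variant.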

    It is common design practice to protect against the download of a bad firmware image. A bad download can result from human error (downloading an invalid image) or from a spurious electrical problem during the download operation. A good design, already known and used in existing products, is to download the new firmware image to the secondary half of the EPROM. When the download is complete, it is verified by confirming that there is no checksum error, and then a pointer is changed to "swap over" the primary and secondary images. After a successful download, the new firmware is designated the primary image and the old firmware is designated the secondary image.
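The download-verify-swap sequence can be modelled roughly as follows. The class DualImageEprom, its method names, the use of CRC-32 and the choice to start with two identical copies are assumptions made for illustration; the article specifies only that the download goes to the secondary half and that a pointer is changed after the checksum verifies.

```python
import zlib


def checksum(payload: bytes) -> int:
    """Stand-in checksum (CRC-32); the article does not name an algorithm."""
    return zlib.crc32(payload)


class DualImageEprom:
    """Toy model: two image slots plus a pointer naming the primary slot."""

    def __init__(self, initial_payload: bytes):
        crc = checksum(initial_payload)
        # Assumption: both slots start out holding the same image.
        self.slots = [
            {"payload": initial_payload, "crc": crc},
            {"payload": initial_payload, "crc": crc},
        ]
        self.primary_index = 0

    @property
    def secondary_index(self) -> int:
        return 1 - self.primary_index

    def download(self, received: bytes, expected_crc: int) -> bool:
        """Write to the secondary slot; swap only if the checksum verifies."""
        target = self.slots[self.secondary_index]
        target["payload"] = received
        target["crc"] = expected_crc
        if checksum(target["payload"]) != target["crc"]:
            return False  # bad download: the primary image is untouched
        self.primary_index = self.secondary_index  # the "swap over"
        return True

    def primary_payload(self) -> bytes:
        return self.slots[self.primary_index]["payload"]


eprom = DualImageEprom(b"firmware v1")
new_image = b"firmware v2"
ok = eprom.download(new_image, checksum(new_image))
# A transfer that was corrupted in flight fails verification and is not activated.
rejected = eprom.download(b"firmware v3 with a glitch", checksum(b"firmware v3"))
```

Because the pointer changes only after verification, a failed download never replaces the running image, although in this model it does overwrite whatever the secondary slot held before.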

    However, there is a problem with this approach. If the EPROM subsequently becomes faulty and a defective bit appears in the primary image, then after the next power cycle the primary image will fail the checksum test and the secondary image will be used. The firmware level therefore regresses to an older version, and important fixes and enhancements may be "lost". This article addresses and overcomes this problem.

    To preserve the fixes and enhancements introduced by downloading a new level of code, our aim is to replicate the contents of the new primary image into the secondary area. The method described here allows this replication to happen automatically, without involving the system in further download operations; instead, the device performs its normal operations and manages the replication as a specific task. The replication is thus done over a period of time, and the process is managed by an algorithm in the new firmware.

    Once the processor is up and running, it periodically checks the progress of the replication process. If the microprocessor is multi-threaded, a single thread may be responsible for this. On a single-threaded microprocessor, the main execution loop has to apportion some time to copying the data from the primary image to the secondary area. In time, this process wi...
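One way the single-threaded case might look is sketched below: each pass of the main loop does its normal work and then copies one small chunk from the primary image to the secondary area. Everything specific here is an assumption rather than the article's method: the Replicator class, the chunk size, the final CRC-32 comparison and the restart-on-mismatch policy are hypothetical, and a real EPROM would also need device-specific erase and program steps that this model ignores.

```python
import zlib

CHUNK_SIZE = 4  # assumption: bytes copied per main-loop pass (tiny for the demo)


class Replicator:
    """Copies the primary image into the secondary area one chunk at a time."""

    def __init__(self, primary: bytes, secondary: bytearray):
        if len(primary) != len(secondary):
            raise ValueError("this sketch assumes equally sized image areas")
        self.primary = primary
        self.secondary = secondary
        self.offset = 0  # progress marker: bytes replicated so far
        self.done = False

    def step(self) -> None:
        """Do one small unit of work, then return control to the caller."""
        if self.done:
            return
        if self.offset < len(self.primary):
            end = min(self.offset + CHUNK_SIZE, len(self.primary))
            self.secondary[self.offset:end] = self.primary[self.offset:end]
            self.offset = end
            return
        # All chunks copied: accept the copy only if the checksums agree.
        if zlib.crc32(bytes(self.secondary)) == zlib.crc32(self.primary):
            self.done = True
        else:
            self.offset = 0  # assumed policy: restart replication on a mismatch


def main_loop(replicator: Replicator, normal_work, passes: int) -> None:
    """Each pass performs normal work first, then a slice of replication."""
    for _ in range(passes):
        normal_work()
        replicator.step()


primary = b"firmware v2 image"
secondary = bytearray(b"firmware v1 image")
work_log = []
rep = Replicator(primary, secondary)
main_loop(rep, lambda: work_log.append("tick"), passes=3)
offset_after_three = rep.offset  # replication is partway through
main_loop(rep, lambda: work_log.append("tick"), passes=10)
```

On a multi-threaded microprocessor, the same step() method could instead be called repeatedly from a dedicated thread, leaving the main loop unchanged.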