Browse Prior Art Database

Correctly Finding a Failed Task in a Pre-Emptive Non-Time-Sliced Operating System

IP.com Disclosure Number: IPCOM000131858D
Publication Date: 2005-Nov-21
Document File: 5 page(s) / 52K

Publishing Venue

The IP.com Prior Art Database

Abstract

Background There exists a problem which applies to pre-emptive (not purely time-sliced), multi-tasking operating systems. The solution disclosed herein applies to this type of OS. A watchdog is a utility that monitors all of the current tasks to ensure functionality. Each task will report or “kick” the watchdog during task-specific intervals of time to let the watchdog know that it is still running. Timeout values, also known as reporting periods, vary from one task to another task. A task is expired when its timeout period has expired. If the watchdog does not receive a signal or “kick” from a task within the task’s timeout value, the task has reported, indicating a failed task. Some currently implemented operating systems use two mechanisms to ensure functionality of software: 1. “Kicking” the watchdog so the device does not reset 2. Having all the tasks report periodically to the watchdog Note: the watchdog is the task with the highest priority; it does not report to itself. 1. Technical Problem The nature of the scheduling of a non-time-sliced OS will prevent other tasks from running if a task with a higher priority has failed to report. Therefore, for debugging purposes, it is very useful to be able to precisely identify the task that failed to report to the watchdog. A current implementation in an OS to find a timed-out task, which failed to report to the watchdog task, is reporting the wrong information. The current implementation in an OS code looks randomly for a task that failed to report. It stops as soon as it finds one expired task and reports that as the failure task. This is incorrect, since it isn’t necessarily the first task that expired which failed. Since tasks are executed by order of priority, if a task expires, other lower-priority tasks will expire eventually also. Therefore, it may be any task with a higher priority, other than the first expired task, that is the failing task. It became necessary to find a solution to efficiently find the correct failing task. 2. Novel Aspect of Invention Our new implementation was intended to precisely and optimally find a failing task by using an effective search algorithm. It uses elimination via top and bottom delimiters to limit the scope of search for the failing task. 3. An Overview Drawing Figure 1: Tasks reporting to the watchdog task. The watchdog is a task that monitors all tasks in the operating system. All tasks will report, or “kick” the watchdog during task-specific intervals of time, to let the watchdog know that it is still running. Timeout values, also known as reporting periods, vary from one task to another task. A task is expired when its timeout period has expired. 4. A High-Level Block Diagram The nature of a pre-emptive, priority-based operating system will prevent other tasks with lower priority from running if a particular task fails to report to the watchdog (i.e., times-out.) Below are two high-level block diagrams depicting the events which occur after a task times-out. . Please note: the task number is its priority: the greater the number, the greater the priority. For example, task 3 has a higher priority than task 2. Figure 2.1: Tasks reporting to the watchdog task. Figure 2.2: Eventually all the tasks with lower priority all time-out 5. A Detailed block diagram of our implementation: Figure 3. High-level flowchart. The first step in our algorithm to identify the correct failed task is to sort the tasks by order of priority, from the highest to lowest priority. Next, while the counters count down for each task, search for one that is expired (does not report to watchdog) and that is not a blocked task or has a timeout value of zero. An expired task indicates that either this particular task or one with a higher priority has failed. Also, check that the task that expired is not the sleep task. If all these conditions are met, stop counting down and begin the search for the failed task. Begin searching: Set the expired task as the bottom delimiter for the search interval, and the top delimiter. Initially the highest priority task is the top delimiter: if the top delimiter is updated, then every task above it is functional. Loop from the task with priority one above the bottom delimiter, to the top delimiter; incrementing the index by 1 each time. With each iteration in the loop, first check if the correct failed task is found: this is when top and bottom delimiters are reporting neighbors, or equal. One of two things can happen to a task: (1) it can expire, which means every task below it can be discarded, so move the bottom delimiter up one, or (2) the task can get updated, which means every task above it in priority is fine. If the task didn’t expire, check if task was updated. If so, change the value of the top delimiter to the updated task, since every task above it is functional. By changing the top and bottom delimiters, we narrow the window of search in each iteration of the loop. We are able to find the correct failed task almost immediately. The search is stopped when one of the following conditions is met: 1. Only two neighbour reporting tasks are left. 2. The value of the bottom delimiter is equal to the value of top delimiter. In either condition is met, the correct failed task found is the bottom delimiter task. Another function checks to see whether two tasks are neighbouring reporting tasks. It determines if the two are neighboring tasks by priority. The function receives a top priority task and bottom priority (top and bottom delimiter), determines if they are neighbours based on this condition: if a task timeout value is NOT 0 and is NOT blocked, there exists tasks in between top and bottom, therefore they are not neighbours.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 50% of the total text.

DETECTING FAILED TASKS

Correctly Finding a Failed Task in a Pre-Emptive Non-Time-Sliced Operating System

Disclosed Anonymously

Background

There exists a problem which applies to pre-emptive (not purely time-sliced), multi-tasking operating systems. The solution disclosed herein applies to this type of OS.

 

A watchdog is a utility that monitors all of the current tasks to ensure functionality.  Each task will report or “kick” the watchdog during task-specific intervals of time to let the watchdog know that it is still running.  Timeout values, also known as reporting periods, vary from one task to another task. A task is expired when its timeout period has expired.

If the watchdog does not receive a signal or “kick” from a task within the task’s timeout value, the task has reported, indicating a failed task.

Some currently implemented operating systems use two mechanisms to ensure functionality of software:

 

  1. “Kicking” the watchdog so the device does not reset
  2. Having all the tasks report periodically to the watchdog

Note: the watchdog is the task with the highest priority; it does not report to itself.

1.  Technical Problem

The nature of the scheduling of a non-time-sliced OS will prevent other tasks from running if a task with a higher priority has failed to report.
Therefore, for debugging purposes, it is very useful to be able to precisely identify the task that failed to report to the watchdog.

A current implementation in an OS to find a timed-out task, which failed to report to the watchdog task, is reporting the wrong information.

The current implementation in an OS code looks randomly for a task that failed to report. It stops as soon as it finds one expired task and reports that as the failure task. This is incorrect, since it isn’t necessarily the first task that expired which failed. Since tasks are executed by order of priority, if a task expires, other lower-priority tasks will expire eventually also. Therefore, it may be any task with a higher priority, other than the first expired task, that is the failing task.

It became necessary to find a solution to efficiently find the correct failing task.

2.  Novel Aspect of Invention

Our new implementation was intended to precisely and optimally find a failing task by using an effective search algorithm. It uses elimination via top and bottom delimiters to limit the scope of search for the failing task.

3. An Overview Drawing

 

 

Figure 1: Tasks reporting to the watchdog task.

The watchdog is a task that monitors all tasks in the operating system.  All tasks will report, or “kick” the watchdog during task-specific intervals of time, to let the watchdog know that it is still running.  Timeout values, also known as reporting periods, vary from one task to another task. A task is expired when its timeout period has expired.

 


4. A High-Level Block Diagram

The nature of a pre-emptive, priority-based operating system will prevent other tasks with lower priority from running...