Mechanisms for detecting and handling timing errors
The design and analysis of real-time systems depend heavily on knowing the worst-case execution times (WCET) of periodic threads and aperiodic servers.
Accurately measuring WCET, however, is often difficult and sometimes impossible, for several reasons:
• Interrupts that execute longer than expected or occur more frequently than anticipated may steal critical execution time from the highest-priority threads.
• Variations in processing speed due to caching, pipelining, and bus arbitration may alter WCET.
• There is no easy way to accurately measure execution times of embedded code.
As long as scheduling policies are based on WCET, these difficulties in measuring WCET inevitably lead to timing errors in the system. Many of these errors go undetected until more catastrophic failures occur; others cause the system to fail to meet its specifications for reasons that are not obvious.
We have created low-overhead, policy-independent real-time operating system (RTOS) mechanisms that detect and handle these types of timing errors. The mechanisms can be used with a variety of common scheduling algorithms, and serve as the basis for easily extending these policies to incorporate aperiodic servers, soft real-time threads, imprecise computations, and adaptive real-time scheduling. The mechanisms have been incorporated into the Chimera RTOS [9].
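To make the notion of a timing error concrete, the following is a minimal sketch of how a kernel might classify one at run time. It assumes each thread carries a design-time WCET budget and a relative deadline, and that the scheduler charges consumed CPU time to the thread. All names (`thread_timing_t`, `timing_release`, `timing_charge`, `timing_check`) are hypothetical and chosen for illustration only; this is not the Chimera API and not necessarily how the mechanisms described here are implemented.

```c
#include <stdint.h>

/* Hypothetical types and names for illustration; not the Chimera RTOS API. */
typedef enum {
    TIMING_OK,
    TIMING_WCET_OVERRUN,     /* used more CPU time than the assumed WCET */
    TIMING_DEADLINE_MISSED   /* deadline passed before the cycle completed */
} timing_status_t;

typedef struct {
    uint32_t wcet_budget_us; /* WCET assumed at design time */
    uint32_t deadline_us;    /* relative deadline, measured from release */
    uint32_t release_us;     /* absolute release time of the current cycle */
    uint32_t used_us;        /* CPU time consumed so far in this cycle */
} thread_timing_t;

/* Start a new cycle: record the release time and reset the usage counter. */
void timing_release(thread_timing_t *t, uint32_t now_us)
{
    t->release_us = now_us;
    t->used_us = 0;
}

/* Charge CPU time to the thread, e.g. when it is switched out. */
void timing_charge(thread_timing_t *t, uint32_t ran_us)
{
    t->used_us += ran_us;
}

/* Classify the thread's state. A missed deadline takes precedence over a
 * WCET overrun. Unsigned subtraction keeps the elapsed-time computation
 * correct across a single wraparound of the microsecond clock. */
timing_status_t timing_check(const thread_timing_t *t, uint32_t now_us)
{
    uint32_t elapsed_us = now_us - t->release_us;

    if (elapsed_us > t->deadline_us)
        return TIMING_DEADLINE_MISSED;
    if (t->used_us > t->wcet_budget_us)
        return TIMING_WCET_OVERRUN;
    return TIMING_OK;
}
```

In this sketch the check depends only on per-thread bookkeeping, not on how threads are prioritized, which is one way such a check could remain independent of the scheduling policy. What happens after a non-OK status is returned (for example, invoking a user-supplied handler) is deliberately left out.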