Robustness Testing of a Distributed Simulation Backplane
Creating robust software requires not only careful specification and implementation, but also quantitative measurement. This paper describes Ballista exception handling testing of the High Level Architecture Run-Time Infrastructure (HLA RTI). The RTI is a standard distributed simulation system intended to provide completely robust exception handling, yet implementations have normalized robustness failure rates as high as 10%. Non-robust testing responses include exception handler crashes, segmentation violations, "unknown" exceptions, and task hangs. Other issues include different robustness failure modes across ports to two operating systems, and mandatory client machine rebooting after a particular RTI failure. Testing the RTI led to scalable extensions of the Ballista architecture for handling exception-based error reporting models, testing object-oriented software structures (including call-backs, pass by reference, and constructors), and operating in a state-rich, distributed system environment. These results demonstrate that robustness testing can provide useful feedback to high-quality software development processes, and can be applied to domains well beyond the previous work on testing operating systems.