Better Robustness Testing for Autonomy Systems

posted on 16.09.2020, 15:49 by Milda Zizyte
Successfully testing autonomy systems is important for reducing the likelihood that these systems will cause damage to people, themselves, or the environment. Historically, robustness testing has been successful at finding failures in traditional system software. Robustness testing uses a chosen test value input generation technique to exercise the system under test with potentially exceptional inputs and evaluate how the system performs. However, assessing coverage for a given input generation technique, especially in black box testing, is tricky. Past work has justified new input generation techniques on the basis that they find a non-zero number of failures, or find more failures than other methods. Measuring the efficacy of these techniques only in this way does not account for the complexity or uniqueness of the failures found. No strongly justified metrics of comparison or systematic ways to combine test value input generation techniques have been introduced.

In this dissertation, we explore two main robustness testing input generation techniques: fuzzing and dictionary-based testing. These techniques represent two different ways of sampling the possible input space for a given parameter. Fuzzing can theoretically generate any value, but may generate wasteful test cases due to the size of the sample space. Conversely, dictionary-based testing may more closely match the distribution of failure-triggering inputs, but is restricted in scope by the predetermined values in the dictionary. By introducing metrics to compare these techniques, we can highlight how these tradeoffs manifest on actual systems.

To perform this comparison, we have created an approach to test autonomy systems and apply both test input generation techniques to an assortment of systems. We introduce the comparison metrics of efficiency and effectiveness, and show that both test methods have areas of strength, weakness, and similar performance. By delving deeper into the reasons for these differences and similarities, we justify combining the test input generation techniques in a hybridized way.

We propose various hybrid testing methods and evaluate them according to our metrics of comparison. We find that dictionary-based testing, followed by fuzzing, performs the best according to our metrics. We show that this happens because of a path dependency in testing: deeper bugs cannot be found until fragile fields are eliminated from testing. We discuss how both of our metrics were necessary to reach this insight. We also include general insights from testing autonomy systems, such as the low dimensionality of failure-triggering inputs. Our recommendations of testing frameworks, test input generation techniques, test case selection strategies for a hybrid testing method, and metrics of evaluation can be used to test robotics software effectively and efficiently in the future, which is a step toward safer autonomy systems.
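
The hybrid ordering described in the abstract (dictionary-based testing first, then fuzzing once fragile fields are set aside) can be illustrated with a minimal Python sketch. This is not the dissertation's framework: the dictionary values, the field names `speed` and `heading`, the fuzzing range, and the functions `dictionary_inputs`, `fuzz_inputs`, `hybrid_test`, and `toy_system` are all hypothetical, chosen only to show the idea under simple assumptions.

```python
import math
import random

# Hypothetical dictionary of exceptional values for a numeric field (an assumption).
DICTIONARY = [0.0, -1.0, 1.0, float("inf"), float("-inf"), float("nan"), 1e308, -1e308]


def dictionary_inputs(fields):
    """Yield one single-field test case per (field, dictionary value) pair."""
    for field in fields:
        for value in DICTIONARY:
            yield {field: value}


def fuzz_inputs(fields, count, rng, low=-1e4, high=1e4):
    """Yield `count` single-field test cases with uniformly random values."""
    for _ in range(count):
        yield {rng.choice(fields): rng.uniform(low, high)}


def hybrid_test(system, fields, fuzz_count, seed=0):
    """Run the dictionary phase, drop fields it broke, then fuzz the remaining fields.

    `system(case)` is assumed to return True when the case is handled correctly.
    """
    rng = random.Random(seed)
    failures, fragile = [], set()
    for case in dictionary_inputs(fields):
        if not system(case):
            failures.append(case)
            fragile.update(case)  # records the field name(s) used in the failing case
    remaining = [f for f in fields if f not in fragile]
    if remaining:
        for case in fuzz_inputs(remaining, fuzz_count, rng):
            if not system(case):
                failures.append(case)
    return failures, fragile


def toy_system(case):
    """Hypothetical system under test: fragile on `speed`, with a deeper bug on `heading`."""
    speed = case.get("speed", 0.0)
    heading = case.get("heading", 0.0)
    if math.isnan(speed) or math.isinf(speed):
        return False  # shallow failure that dictionary values trigger
    if 1000 < heading < 5000:
        return False  # deeper failure that no dictionary value triggers
    return True


if __name__ == "__main__":
    found, fragile_fields = hybrid_test(toy_system, ["speed", "heading"], fuzz_count=50)
    print("fragile fields:", fragile_fields)
    print("failing cases:", len(found))
```

In this toy setup the dictionary phase flags `speed` as fragile, so the fuzzing budget is spent only on `heading`, where the remaining failure region lies. Whether a given fuzzing run actually lands in that region depends on the seed and the budget.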




Degree Type



Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)


Philip Koopman