Better Robustness Testing for Autonomy Systems

Zizyte, Milda

doi:10.1184/R1/12938561.v1

thesis

posted on 2020-09-16, 15:49, authored by Milda Zizyte

Successfully testing autonomy systems is important to decrease the likelihood that these systems will cause damage to people, themselves, or the environment.

Historically, robustness testing has been successful at finding failures in traditional system software. Robustness testing uses a chosen test value input generation technique to exercise the system under test with potentially exceptional inputs and evaluates how the system performs. However, assessing coverage for a given input generation technique is difficult, especially in black-box testing. Past work has justified new input generation techniques on the basis that they find a non-zero number of failures, or more failures than other methods. Measuring efficacy this way does not consider the complexity or uniqueness of those failures, and no strongly justified comparison metrics or systematic ways to combine test value input generation techniques have been introduced.

In this dissertation, we explore two main robustness testing input generation techniques: fuzzing and dictionary-based testing. These techniques represent two different ways of sampling the possible input space for a given parameter. Fuzzing can theoretically generate any value, but may generate wasteful test cases because the sample space is so large. Conversely, dictionary-based testing may more closely match the distribution of failure-triggering inputs, but its scope is restricted to the predetermined values in the dictionary. By introducing metrics to compare these techniques, we can highlight how these tradeoffs manifest on actual systems.

To perform this comparison, we created an approach to test autonomy systems and applied both test input generation techniques to an assortment of systems. We introduce the comparison metrics of efficiency and effectiveness, and show that each test method has areas of strength and weakness, as well as areas where the two perform similarly. By examining the reasons for these differences and similarities, we justify combining the test input generation techniques in a hybridized way. We propose several hybrid testing methods and evaluate them according to our comparison metrics. We find that dictionary-based testing followed by fuzzing performs best according to our metrics. We show that this happens because of a path dependency in testing: deeper bugs cannot be found until fragile fields are eliminated from testing. We discuss how both of our metrics were necessary to reach this insight. We also include general insights from testing autonomy systems, such as the low dimensionality of failure-triggering inputs. Our recommendations regarding testing frameworks, test input generation techniques, test case selection strategies for a hybrid testing method, and evaluation metrics can be used to test robotics software effectively and efficiently in the future, which is a step toward safer autonomy systems.
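The abstract contrasts fuzzing (any value is possible) with dictionary-based testing (a predetermined list of exceptional values) and reports that dictionary-based testing followed by fuzzing performed best, because fragile fields must be eliminated before deeper bugs can be reached. The Python sketch below illustrates that idea on a toy example only. Everything in it is a hypothetical assumption for illustration: the system under test `toy_system`, its fields `speed` and `heading`, the values in `FLOAT_DICTIONARY`, the bit-level `fuzz_float` generator, and the simple rule that any field failing under a dictionary value is marked fragile and held at its nominal value during fuzzing. It is not the thesis's testing framework or its test case selection strategy.

```python
import math
import random
import struct
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical dictionary of exceptional float values (an assumption for
# illustration; not the dictionary used in the thesis).
FLOAT_DICTIONARY: List[float] = [0.0, -1.0, math.nan, math.inf, -math.inf]

# Hypothetical nominal message for the toy system under test.
NOMINAL: Dict[str, float] = {"speed": 1.0, "heading": 0.0}


def fuzz_float(rng: random.Random) -> float:
    """Fuzzing: reinterpret 64 random bits as an IEEE-754 double, so any
    double (including NaN and the infinities) can in principle be produced."""
    bits = rng.getrandbits(64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def toy_system(msg: Dict[str, float]) -> None:
    """Hypothetical system under test; a raised RuntimeError counts as a failure."""
    speed = msg["speed"]
    # Fragile field: checked first, so a bad speed ends the test case
    # before heading is ever examined.
    if not math.isfinite(speed) or speed < 0.0:
        raise RuntimeError("crash: invalid speed")
    heading = msg["heading"]
    if not math.isfinite(heading):
        return  # non-finite headings are rejected gracefully (not a failure)
    # Deeper bug: large but finite headings are mishandled.
    if abs(heading) > 1e6:
        raise RuntimeError("crash: heading overflow")


def hybrid_test(
    system: Callable[[Dict[str, float]], None],
    nominal: Dict[str, float],
    dictionary: List[float],
    fuzz_budget: int,
    seed: int = 0,
) -> Tuple[Set[str], Set[str]]:
    """Dictionary-based testing followed by fuzzing. Returns the fields marked
    fragile and the set of distinct failure messages observed."""
    failures: Set[str] = set()
    fragile: Set[str] = set()

    # Phase 1: dictionary-based testing, mutating one field at a time.
    for field in nominal:
        for value in dictionary:
            msg = dict(nominal)
            msg[field] = value
            try:
                system(msg)
            except RuntimeError as err:
                failures.add(str(err))
                fragile.add(field)

    # Phase 2: fuzz only the fields that did not fail in phase 1;
    # fragile fields stay at their nominal values.
    remaining = [f for f in nominal if f not in fragile]
    rng = random.Random(seed)
    for _ in range(fuzz_budget if remaining else 0):
        msg = dict(nominal)
        for field in remaining:
            msg[field] = fuzz_float(rng)
        try:
            system(msg)
        except RuntimeError as err:
            failures.add(str(err))

    return fragile, failures


if __name__ == "__main__":
    fragile, failures = hybrid_test(toy_system, NOMINAL, FLOAT_DICTIONARY, fuzz_budget=200)
    print("fragile fields:", sorted(fragile))
    print("distinct failures:", sorted(failures))
```

In this toy setup, the dictionary phase should flag `speed` as fragile, and the fuzzing phase, with `speed` held at its nominal value, should then reach the heading bug. The large-but-finite headings that trigger that bug are not in `FLOAT_DICTIONARY`, so only fuzzing reaches them, which mirrors the scope restriction of dictionary-based testing described above. Holding fragile fields fixed is just one simple way to model the path dependency; the thesis's own selection strategies may differ.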

History

Date: 2020-05-06
Degree Type: Dissertation
Department: Electrical and Computer Engineering
Degree Name: Doctor of Philosophy (PhD)
Advisor(s): Philip Koopman

Keywords: fuzz testing; robotics systems; robustness testing; safety-critical systems; software safety; software testing; Computer Engineering

Licence: In Copyright
