Certifiable Evaluation for Safe Intelligent Autonomy
Intelligent autonomous systems, such as those in healthcare, transportation, and manufacturing, require thorough safety evaluation before deployment. Traditional evaluation methods, however, struggle with the high-dimensional input spaces and rare failures characteristic of safety-critical autonomous systems. This presents a significant challenge: the extreme rarity of failures, combined with high dimensionality, makes it difficult to generate enough data to estimate rare-event probabilities accurately. Previous approaches, from naive sampling to accelerated sampling algorithms, have produced incorrect and overconfident results, posing a risk to public safety.
To address this challenge, this thesis proposes three novel safety evaluation algorithms that incorporate machine learning: Deep Importance Sampling (Deep IS), Deep Probabilistic Accelerated Evaluation (Deep-PrAE), and Computationally Efficient and Robust Evaluation of Safety (CERTIFY). These algorithms are scalable, reliable, and efficient, enabling the evaluation of intelligent autonomy prototypes on digital-twin platforms such as high-fidelity driving simulators. Machine learning allows them to handle high-dimensional input spaces and extreme rarity, making them suitable for evaluating modern safe intelligent autonomy algorithms.
Deep IS is a highly efficient evaluation algorithm that estimates the rare-event probabilities associated with the safety of autonomous systems by combining a deep neural network, mixed-integer programming, and importance sampling. Benchmarking studies show that Deep IS identifies and quantifies safety risks in autonomous-vehicle perception algorithms with superior efficiency and accuracy compared to traditional evaluation algorithms such as naive Monte Carlo, multilevel splitting, and the cross-entropy method.
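The importance-sampling core underlying Deep IS can be illustrated on a one-dimensional toy problem. The sketch below is illustrative only: the failure region and its dominating point are written in closed form, whereas Deep IS would learn the region with a deep neural network and locate dominating points with mixed-integer programming.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Toy rare event: "failure" occurs when a standard normal input exceeds 4.0.
# In Deep IS the failure set would be learned by a neural network and its
# dominating points extracted via mixed-integer programming; here the set
# {x > 4} and its dominating point (x = 4) are known in closed form.
threshold = 4.0
n = 100_000

# Naive Monte Carlo: out of 100,000 draws only a handful land in the
# failure region, so the estimate is extremely noisy.
x_naive = rng.standard_normal(n)
p_naive = (x_naive > threshold).mean()

# Importance sampling: draw from N(threshold, 1), centred on the dominating
# point, and reweight each sample by the likelihood ratio p(x)/q(x).
x_is = rng.normal(loc=threshold, scale=1.0, size=n)
log_w = -0.5 * x_is**2 + 0.5 * (x_is - threshold) ** 2
p_is = ((x_is > threshold) * np.exp(log_w)).mean()

exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))  # P(N(0,1) > 4) ~ 3.2e-5
print(f"naive: {p_naive:.2e}  IS: {p_is:.2e}  exact: {exact:.2e}")
```

Because the proposal concentrates samples near the failure boundary, the importance-sampling estimate attains a relative error of well under a few percent at a sample size where naive Monte Carlo sees only a handful of failures.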
Deep-PrAE builds upon Deep IS by providing theoretical guarantees for efficient estimation of upper or lower bounds on rare-event probabilities. This prevents underestimation of the target value in black-box problems, and Deep-PrAE achieves valid bounds efficiently across a range of numerical experiments while balancing efficiency and correctness relative to other evaluation algorithms. CERTIFY combines Deep-PrAE with a neural network verification algorithm to efficiently compute an upper bound on the failure rate of an autonomous system. In benchmarking studies on safety certification tasks, that is, verifying that an algorithm's true failure rate lies below a given threshold, CERTIFY demonstrated up to two orders of magnitude improvement in efficiency. While maintaining a valid upper bound comparable to Deep-PrAE's, CERTIFY yields a much faster evaluation procedure even when the underlying problem dimension is high.
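The bounding principle behind Deep-PrAE and CERTIFY can be sketched in the same toy setting. Here the certified outer approximation of the failure set is simply assumed for illustration; CERTIFY would obtain it from a neural network verification algorithm. Because the outer set contains every true failure, the estimated probability of landing in it is a valid upper bound on the true failure rate.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical black-box system: it fails whenever the scalar input
# exceeds 4.2, but the evaluator does not know this boundary exactly.
def system_fails(x):
    return x > 4.2

# Assumed output of a verification step: a certified outer approximation
# {x > 4.0}, which provably contains the true failure set {x > 4.2}.
outer = 4.0

# Importance sampling targeted at the outer set. Since the outer set
# contains every failure, this estimates an upper bound on the true rate.
n = 100_000
x = rng.normal(loc=outer, scale=1.0, size=n)
w = np.exp(-0.5 * x**2 + 0.5 * (x - outer) ** 2)
p_upper = ((x > outer) * w).mean()

p_true = 0.5 * math.erfc(4.2 / math.sqrt(2.0))  # exact tail ~ 1.3e-5
print(f"certified upper bound: {p_upper:.2e}  true rate: {p_true:.2e}")
```

The bound is conservative by construction: it can never fall below the true failure rate, which is what allows a certification claim ("the failure rate is below the threshold") to remain valid even when the system itself is a black box.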
The proposed algorithms were benchmarked in a common driving-scenario simulator, highlighting their relevance to assessing complex autonomous-vehicle algorithms such as perception systems under various noise models. The findings suggest that traditional evaluation algorithms may prematurely conclude safety due to insufficient sample sizes, making them unreliable for certifying safe autonomous systems. The proposed algorithms, in contrast, offer a scalable, reliable, and efficient alternative, computing a meaningful upper bound that can certify an autonomous agent's performance. These findings empower developers, researchers, and policymakers to select and deploy suitable evaluation algorithms that effectively identify and mitigate safety risks. This study contributes to ongoing efforts to develop safer and more reliable intelligent autonomy across application domains involving rare failures and high-dimensional spaces.
- Mechanical Engineering
- Doctor of Philosophy (PhD)