Cybersecurity is data-rich and therefore a natural setting for machine learning (ML). However, many challenges hamper the deployment of ML in cybersecurity systems and organizations. One major challenge is that the human-machine relationship lacks explainability. Broadly, explainability in cybersecurity data science runs in two directions:
- Model-to-Human: predictive models inform the human cyber experts
- Human-to-Model: human cyber experts inform the predictive models
When we build systems that combine both directions, we encourage a bidirectional, continuous relationship between the human and the machine. We consider the absence of this two-way relationship a barrier to adopting ML systems at the cybersecurity-operations level. Basic explainable cybersecurity ML is achievable today, but there are opportunities for significant improvement. In this blog post, we first provide an overview of explainability in ML. Next, we illustrate (1) model-to-human explainability using decision trees, an inherently interpretable model form, applied to cybersecurity data. We then illustrate (2) human-to-model explainability through the feature-engineering step of a cybersecurity ML pipeline. Finally, motivated by progress toward physics-informed ML, we recommend the research needed to bring cybersecurity ML to the level of two-way explainability required for adoption at the cybersecurity-operations level.
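To make these two directions concrete, below is a minimal sketch using scikit-learn on wholly synthetic data; the feature names, labels, and thresholds are hypothetical illustrations, not from the post. The expert-chosen features stand in for human-to-model explainability (an analyst encodes domain knowledge as model inputs), and the printed decision-tree rules stand in for model-to-human explainability (the model's logic is rendered in a form an analyst can read and audit).

```python
# Minimal sketch of two-way explainability, assuming scikit-learn.
# All feature names and data below are synthetic stand-ins.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Human-to-model: a cyber expert encodes domain knowledge as features,
# e.g., flow duration, outbound bytes, and failed-login counts.
feature_names = ["flow_duration_s", "bytes_out", "failed_logins"]

# Synthetic stand-in data: 200 network flows labeled benign (0) or malicious (1).
rng = np.random.default_rng(0)
X = rng.random((200, 3)) * [600.0, 1e6, 10.0]
y = (X[:, 2] > 5).astype(int)  # toy labeling rule: many failed logins -> malicious

# Model-to-human: a shallow decision tree yields rules an analyst can read.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(clf, feature_names=feature_names))
```

The shallow depth is a deliberate choice in this sketch: capping the tree keeps the exported rule set short enough for an analyst to verify against their own expertise, which is precisely the model-to-human direction described above.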
Publisher Statement
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The views, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision, unless designated by other documentation. References herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise do not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
This report was prepared for the SEI Administrative Agent, AFLCMC/AZS, 5 Eglin Street, Hanscom AFB, MA 01731-2100.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.
Copyright Statement
Copyright 2023 Carnegie Mellon University.