Carnegie Mellon University

Cyber-Informed Machine Learning

posted on 2025-02-11, 22:31 authored by Jeffrey Mellon, Clarence Worrell

Consider a security operations center (SOC) that monitors network and endpoint data in real time to identify threats to its enterprise. Depending on the size of the organization, the SOC may receive on the order of 200,000 alerts per day. Only a small portion of these alerts can receive human attention, because each investigated alert may require 15 to 20 minutes of analyst time to answer a critical question for the enterprise: Is this a benign event, or is my organization under attack? This challenge affects nearly all organizations, since even small enterprises generate far more network, endpoint, and log events than humans can effectively monitor. SOCs therefore must employ security monitoring software to pre-screen and downsample the number of logged events requiring human investigation.

Machine learning (ML) for cybersecurity has been researched extensively because SOC activities are data rich, and ML is now increasingly deployed in security software. ML is not yet broadly trusted in SOCs, however, and a major barrier is that ML methods suffer from a lack of explainability. Without explanations, it is reasonable for SOC analysts not to trust the ML. Outside of cybersecurity, there are broad demands for ML explainability. The European General Data Protection Regulation (Article 22 and Recital 71) encodes into law the “right to an explanation” when ML is used in a way that significantly affects an individual. The SOC analyst also needs explanations because the decisions they must make, often under time pressure and with ambiguous information, can have significant impacts on their organization.

We propose cyber-informed machine learning as a conceptual framework that emphasizes three types of explainability when ML is used for cybersecurity:

  • data-to-human
  • model-to-human
  • human-to-model

In this blog post, we provide an overview of each type of explainability, and we recommend the research needed to achieve the level of explainability necessary to encourage the use of ML-based systems that support cybersecurity operations.
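To make the model-to-human idea concrete, the following is a minimal sketch of per-alert explanation for alert triage. It is not taken from the post: the alert features (bytes_out, failed_logins, rare_port, off_hours), the synthetic data, and the choice of scikit-learn with a linear model are all illustrative assumptions, chosen so that each alert's score can be decomposed into per-feature contributions an analyst could inspect.

    # Minimal sketch of model-to-human explainability for alert triage.
    # Assumptions (not from the post): a hypothetical alert dataset with
    # numeric features, scikit-learn available, and a linear model whose
    # per-alert contributions can be read directly from the coefficients.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    feature_names = ["bytes_out", "failed_logins", "rare_port", "off_hours"]

    # Synthetic training data: 1,000 alerts, label 1 = malicious.
    X = rng.normal(size=(1000, len(feature_names)))
    y = (X @ np.array([0.8, 1.2, 0.5, 0.3])
         + rng.normal(scale=0.5, size=1000) > 0).astype(int)

    model = LogisticRegression().fit(X, y)

    # Explain a single alert: contribution of each feature to the log-odds,
    # presented alongside the prediction so the analyst sees *why*.
    alert = X[0]
    contributions = model.coef_[0] * alert
    for name, value, contrib in zip(feature_names, alert, contributions):
        print(f"{name:>14}: value={value:+.2f}  contribution={contrib:+.2f}")
    print(f"Predicted probability of attack: {model.predict_proba([alert])[0, 1]:.2f}")

A real deployment would use richer features and models, but the design point is the same: the triage interface surfaces which features drove each alert's score, rather than only the score itself.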


Publisher Statement

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The views, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision, unless designated by other documentation. References herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise do not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.

This report was prepared for the SEI Administrative Agent, AFLCMC/AZS, 5 Eglin Street, Hanscom AFB, MA 01731-2100.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

Copyright Statement

Copyright 2025 Carnegie Mellon University.
