Carnegie Mellon University

DataOps: Towards More Reliable Machine Learning Systems

online resource
Posted on 2025-04-23, 17:30; authored by Daniel DeCapria

As organizations increasingly rely on machine learning (ML) systems for mission-critical tasks, they face significant challenges in managing the raw material of these systems: data. Data scientists and engineers grapple with ensuring data quality, maintaining consistency across different versions, tracking changes over time, and coordinating work across teams. These challenges are amplified in defense contexts, where decisions based on ML models can have significant consequences and where strict regulatory requirements demand complete traceability and reproducibility. DataOps emerged as a response to these challenges, providing a systematic approach to data management that enables organizations to build and maintain reliable, trustworthy ML systems. In our previous post, we introduced our series on machine learning operations (MLOps) testing and evaluation (T&E) and outlined the three key domains we'll be exploring: DataOps, ModelOps, and EdgeOps. In this post, we're diving into DataOps, which focuses on the management and optimization of data throughout its lifecycle. DataOps is a critical component that forms the foundation of any successful ML system.

Publisher Statement

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. [DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

Copyright Statement

Copyright 2025 Carnegie Mellon University.
