Carnegie Mellon University

Real-Time Telemetry Systems for Multidimensional Streaming Data: A Case Study on Video Viewership Analytics

thesis
posted on 2022-11-16, 21:49 authored by Antonios Manousis

Large-scale infrastructures across various domains (e.g., Internet services, sensor farms, and operations monitoring) produce ever-increasing amounts of streaming data. Because these data streams contain invaluable operational insights, operators invest heavily in telemetry frameworks that extract these insights and use them to ensure their infrastructure’s reliability and growth. In this dissertation, we focus on telemetry for streaming video infrastructures. In particular, this work is motivated by a previously unexplored aspect of video telemetry, namely viewership analytics: detecting and diagnosing video viewership anomalies simultaneously across multiple subpopulations of viewers.

This dissertation aims to enhance video operators’ toolbox with novel telemetry capabilities for viewership analytics. However, designing telemetry workflows for viewership analytics proves challenging on multiple fronts. First, increases in the volume and dimensionality of incoming data streams cause a combinatorial explosion in the number of data subpopulations to monitor and, as a result, prohibitive cost and resource overheads for operators. Second, the contextual and non-stationary nature of viewership complicates the detection and diagnosis of viewership anomalies. Last, the need to monitor ever-increasing numbers of viewer subpopulations simultaneously makes it harder to extract the few critical, and often highly localized, events of interest that provide actionable insights to operators.
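To make the combinatorial explosion concrete, the sketch below assumes a subpopulation is defined by fixing some dimensions of a viewing session to specific values and leaving the rest as a wildcard. Under that assumption, a single session with d dimensions contributes to 2^d subpopulations, and the number of distinct subpopulations grows as the product of (cardinality + 1) over all dimensions. The dimension names and cardinalities used here are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import prod

WILDCARD = "*"  # hypothetical marker for "any value" in a dimension

def subpopulations(record):
    """Return every subpopulation key a record belongs to.

    For each dimension, the key either keeps the record's value or
    replaces it with the wildcard, so a d-dimensional record yields 2**d keys.
    """
    options = [(value, WILDCARD) for value in record]
    return list(product(*options))

def total_subpopulations(cardinalities):
    """Number of distinct keys: each dimension takes one of its values or the wildcard."""
    return prod(c + 1 for c in cardinalities)

# Hypothetical session with dimensions (country, ISP, device, CDN).
record = ("US", "ISP-A", "mobile", "CDN-1")
print(len(subpopulations(record)))  # 16 keys for 4 dimensions

# Hypothetical cardinalities for the same four dimensions.
print(total_subpopulations([200, 5000, 10, 20]))  # 232201431
```

Even with only four modest dimensions, the space of candidate subpopulations reaches hundreds of millions, which is why monitoring every one of them exactly becomes expensive.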

Our work addresses these challenges through the design and implementation of a suite of practical tools for video viewership analytics. First, we introduce Hydra, a novel sketch-based analytics framework for efficient and general analytics over multidimensional data streams. We show that Hydra offers robust accuracy guarantees at one tenth (or less) of the operational cost of exact analytics frameworks, with query latencies up to 20× lower than existing alternatives. In Proteas, our second contribution, we leverage key structural insights of viewership to enable accurate detection and insightful diagnosis of viewership anomalies. We show that our approach keeps the number of false positives low and outperforms the closest state-of-the-art alternatives. Last, we illustrate how these insights can be combined in the design of an end-to-end telemetry framework. Through extensive analysis driven by real-world datasets, we demonstrate that our findings can yield substantial cost and resource benefits over existing solutions. We also discuss their potential applicability to domains beyond video.
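The general idea behind sketch-based analytics, trading exact answers for bounded error in fixed memory, can be illustrated with a plain Count-Min sketch that counts sessions per subpopulation. This is a minimal sketch of a generic, well-known technique and is not Hydra’s actual data structure; the width, depth, hashing scheme, and wildcard encoding of subpopulations are all assumptions made for illustration.

```python
import hashlib
from itertools import product

class CountMinSketch:
    """Generic Count-Min sketch (not Hydra): fixed memory, estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        # Assumed sizes; memory is width * depth counters regardless of key count.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Derive an independent-looking hash per row by salting the key with the row number.
        data = repr((row, key)).encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Hash collisions only inflate counters, so the minimum is the tightest estimate.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))

def subpopulations(record, wildcard="*"):
    """Hypothetical encoding: every mix of concrete values and wildcards."""
    return product(*[(value, wildcard) for value in record])

sketch = CountMinSketch()
# Hypothetical sessions with dimensions (country, device).
sessions = [("US", "mobile"), ("US", "desktop"), ("FR", "mobile")]
for session in sessions:
    for key in subpopulations(session):
        sketch.add(key)

print(sketch.estimate(("US", "*")))      # never below the true count of 2
print(sketch.estimate(("*", "mobile")))  # never below the true count of 2
print(sketch.estimate(("*", "*")))       # never below the true count of 3
```

The design choice illustrated here is that memory stays constant as the number of subpopulations grows, at the price of one-sided overestimation from hash collisions; wider tables reduce that error.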

History

Date

2021-10-22

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Vyas Sekar