Carnegie Mellon University
CMU-CS-21-128.pdf (55.38 MB)

Dynamic Model Specialization for Efficient Inference, Training and Supervision

Download (55.38 MB)
posted on 2022-01-24, 21:41 authored by Ravi Teja MullapudiRavi Teja Mullapudi
Recent supervised learning approaches focus on designing and building models that generalize to a wide range of scenarios. The key ingredients for building these general models are large scale datasets that capture a diverse set of scenarios and computational resources to train large models. This large scale supervised learning approach has well known scalability challenges namely: 1) accurate general models are computationally expensive for training and inference 2) collecting and labeling large datasets requires extensive human effort and 3) datasets need to be repeatedly curated due to shifts in the target distribution. In this thesis, we argue that in many cases creating a set of highly specialized models that span the domain of interest can reduce model inference, training, and supervision costs, compared to creating a single monolithic model that generalizes across the entire domain. Specifically, we exploit temporal specialization for building efficient video segmentation models. We show that continuously specializing a compact model to the content in a video stream enables accurate and efficient inference. We leverage specialization to visually similar categories for building efficient image classification architectures. We
show that by specializing model features to discriminate between visually similar categories, one can improve inference efficiency by only computing the subset
of features necessary for classifying a specific image. We exploit specialization to individual categories for reducing human labeling effort in building models for rare categories. We show that models specialized for binary classification of
individual rare categories reduce human e?ort in mining large unlabeled data collections for relevant examples. More broadly, we demonstrate that by dynamically specializing to a moment in time, to an input scene, or to a specific
object category, it is possible to train accurate models quickly, reduce inference costs, and significantly reduce the amount of supervision required for training.




Degree Type

  • Dissertation


  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)


Kayvon Fatahalian Deva Ramanan

Usage metrics



    Ref. manager