With the increasing prevalence of large datasets of images, machine learning has all but overtaken the field of computer vision. In place of specialized domain knowledge, many problems are now dominated by deep neural networks
that are trained end-to-end on collections of labeled examples. But can we trust their predictions in real-world applications? Purely data-driven approaches can
be thwarted by high dimensionality, insufficient training data variability, intrinsic problem ambiguity, or adversarial vulnerability. In this thesis, we address two strategies for encouraging more effective generalization: (1) integrating prior knowledge through inference constraints and (2) theoretically motivated model selection. While inherently challenging for feed-forward deep networks, both strategies are prevalent in traditional techniques for data decomposition such as component analysis and sparse coding. Building upon recent connections between deep learning and
sparse approximation theory, we develop new methods to bridge this gap between deep and shallow learning.
We first introduce a formulation for data decomposition posed as approximate constraint satisfaction, which can accommodate richer instance-level prior knowledge. We apply this framework in Semantic Component Analysis, a method for weakly-supervised semantic segmentation with constraints that encourage interpretability even in the absence of supervision. From its close relationship
to standard component analysis, we also derive Additive Component Analysis for learning nonlinear manifold representations with roughness-penalized additive
models.
Then, we propose Deep Component Analysis, an expressive model of constrained data decomposition that enforces hierarchical structure through multiple layers of constrained latent variables. While inference in this model can be approximated by feed-forward deep networks, exact inference requires an iterative algorithm for minimizing approximation error subject to constraints. We implement this algorithm using Alternating Direction Neural Networks, recurrent neural networks that can be trained discriminatively with backpropagation. Generalization capacity is improved by replacing nonlinear activation functions with constraints that are enforced by feedback connections. We demonstrate this improvement experimentally through applications to single-image depth prediction with sparse output constraints.
Finally, we propose a technique for deep model selection motivated by sparse approximation theory. Specifically, we interpret the activations of feed-forward
deep networks with rectified linear units as algorithms for approximate inference in structured nonnegative sparse coding models. These models are then compared
by their capacities for achieving low mutual coherence, which is theoretically tied to the uniqueness and robustness of sparse representations. This provides a framework
for jointly quantifying the contributions of architectural hyperparameters such as depth, width, and skip connections without requiring expensive validation
on a specific dataset. Experimentally, we show a correlation between a lower bound on mutual coherence and validation error across a variety of common network architectures including DenseNets and ResNets. More broadly, this suggests promising new opportunities for understanding and designing deep learning architectures based on connections to structured data decomposition.
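To make the central quantity concrete: the mutual coherence of a dictionary is the largest absolute normalized inner product between two distinct atoms (columns). The NumPy sketch below is only illustrative and rests on stated assumptions: the function names `mutual_coherence` and `welch_bound` are hypothetical, the random dictionary is a placeholder rather than a trained network, and the Welch bound shown is the generic lower bound for any dictionary of a given size, which is not necessarily the architecture-dependent bound developed in this thesis.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute normalized inner product between distinct columns of D.

    Assumes D has no all-zero columns (hypothetical helper, not from the thesis).
    """
    D = np.asarray(D, dtype=float)
    # Normalize each column (atom) to unit Euclidean norm.
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    gram = np.abs(D.T @ D)
    # Ignore each atom's inner product with itself.
    np.fill_diagonal(gram, 0.0)
    return gram.max()

def welch_bound(m, n):
    """Generic Welch lower bound on the coherence of n atoms in R^m."""
    if n <= m:
        # With at most m atoms, mutually orthogonal atoms are possible.
        return 0.0
    return np.sqrt((n - m) / (m * (n - 1)))

# Placeholder dictionaries: an overcomplete random one and an orthogonal one.
rng = np.random.default_rng(0)
D_random = rng.standard_normal((16, 64))
D_orthogonal = np.eye(16)

print("random:    ", mutual_coherence(D_random), ">=", welch_bound(16, 64))
print("orthogonal:", mutual_coherence(D_orthogonal), ">=", welch_bound(16, 16))
```

An overcomplete dictionary (more atoms than dimensions) cannot reach zero coherence, so the bound grows as atoms are added at fixed dimension. This is the sense in which a lower bound on coherence can be compared across model sizes without reference to a specific dataset.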