Neural Architectures Towards Invariant Representation Learning
Thesis posted on 21.05.2021, 19:44 by Dipan Pal
The world is complex and ever-changing, yet structured. Given this complexity, how can an organism living in such a world, or even an artificial system, distill this information to perceive what is important to its intention and what is not? What theoretical principles underlie such an ability? More interestingly, how do these principles operate in the chaotic randomness and complexity of neural circuits in mammalian brains? These questions form the very heart of the study of representation learning, and also point to perhaps the most interesting directions the field must explore.
Indeed, one of the fundamental pursuits of machine learning and artificial intelligence is learning to be invariant to nuisance transformations in the data. Most prior work has addressed this challenge through different aspects of the deep learning pipeline, such as loss functions, data augmentation and, more recently, self-supervision techniques. However, the core architecture or structure of these networks has yet to be adapted to this challenge. In this thesis, we explore how to learn and encode invariance towards nuisance transformations by redesigning the convolution architecture itself, leading to more powerful and efficient neural networks.
We present two fundamental improvements to neural architecture design through NPTNs (Non-Parametric Transformation Networks) and PRCNs (Permanent Random Connectome Networks). These are designed to be drop-in replacements for the ubiquitous vanilla convolution layer. NPTNs are a natural generalization of ConvNets and, unlike almost all previous works in deep architectures, they make no assumption regarding the structure of the invariances present in the data. PRCNs, on the other hand, are initialized with random connectomes (not just weights), which are a small subset of the connections in a fully connected convolution layer. Importantly, these connections in PRCNs, once initialized, remain permanent throughout training and testing. Permanent random connectomes make these architectures loosely more biologically plausible than many other mainstream network architectures, which require highly ordered structures. They also offer insights towards computational models of random connectomes in the visual cortex. Empirically, we find that these randomly initialized permanent connections have positive effects on generalization and parameter efficiency.
These ideas open a new dimension in deep network design, providing more versatile and effective learning. More importantly, they offer initial answers to some of the fundamental and motivating questions in representation learning that we highlighted above.
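As a rough illustration of the one PRCN property the abstract states explicitly (a small random subset of channel connections, fixed at initialization and never changed), a minimal NumPy sketch might look like the following. The function names `make_connectome` and `masked_conv2d`, the binary-mask formulation, and the `keep` parameter are assumptions made for illustration only; the actual PRCN layer defined in the thesis may be constructed quite differently.

```python
import numpy as np

def make_connectome(out_ch, in_ch, keep, rng):
    """Hypothetical helper: draw a fixed random binary connectivity mask.

    Each output channel is wired to `keep` randomly chosen input channels.
    The mask is created once and is never updated afterwards.
    """
    mask = np.zeros((out_ch, in_ch), dtype=np.float64)
    for o in range(out_ch):
        mask[o, rng.choice(in_ch, size=keep, replace=False)] = 1.0
    return mask

def masked_conv2d(x, weights, mask):
    """Valid 2D cross-correlation whose channel connections are gated by `mask`.

    x: (in_ch, H, W), weights: (out_ch, in_ch, k, k), mask: (out_ch, in_ch).
    Weights on masked-out connections have no effect on the output.
    """
    w = weights * mask[:, :, None, None]
    out_ch, _, k, _ = w.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((out_ch, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + k, j:j + k]
            # Contract over (in_ch, k, k) to get one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
mask = make_connectome(out_ch=8, in_ch=16, keep=4, rng=rng)  # fixed for good
weights = rng.normal(size=(8, 16, 3, 3))                     # trainable part
x = rng.normal(size=(16, 10, 10))
y = masked_conv2d(x, weights, mask)
print(y.shape)  # (8, 8, 8)
```

In this sketch only `weights` would be updated during training, while `mask` stays the same at training and test time. Here each output channel uses 4 of 16 possible input connections, which is one simple way to read "a small subset of the connections in a fully connected convolution layer".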
Department: Electrical and Computer Engineering
- Doctor of Philosophy (PhD)