Search Algorithms and Search Spaces for Neural Architecture Search
Neural architecture search (NAS) was recently proposed to automate the design of network architectures. Instead of relying on manual design, NAS searches for a well-performing architecture in a data-driven way. Despite its impressive progress, NAS is still far from being widely adopted in practice as a common paradigm for architecture design. This thesis aims to develop principled NAS methods that automate the design of neural networks and reduce human effort in architecture tuning as much as possible. To achieve this goal, we focus on developing better search algorithms and search spaces, both of which are important for the performance of NAS.
For search algorithms, we first present an efficient NAS framework based on Bayesian optimization (BO). Specifically, we propose a method to learn an embedding space over the domain of network architectures, which makes it possible to define a kernel function on that domain, a necessary component for applying BO to NAS. Then, we propose a neighborhood-aware NAS formulation to improve the generalization of architectures found by NAS. The proposed formulation is general enough to be applied to various search algorithms, including both sampling-based and gradient-based algorithms.
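To illustrate why an embedding space makes a kernel available, the following is a minimal sketch, not the method developed in the thesis. It assumes each architecture has already been mapped to a fixed-length vector and uses a standard squared-exponential (RBF) kernel on those vectors. The architecture names, embedding values, and the helpers `rbf_kernel` and `gram_matrix` are hypothetical.

```python
import math

def rbf_kernel(u, v, length_scale=1.0):
    """Squared-exponential kernel between two embedding vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

# Hypothetical 2-D embeddings of three architectures (made-up values;
# in practice these would come from a learned embedding model).
embeddings = {
    "arch_a": [0.0, 0.0],
    "arch_b": [0.1, 0.0],
    "arch_c": [3.0, 4.0],
}

def gram_matrix(names, length_scale=1.0):
    """Pairwise kernel matrix, the quantity a Gaussian process surrogate needs."""
    return [
        [rbf_kernel(embeddings[p], embeddings[q], length_scale) for q in names]
        for p in names
    ]

names = sorted(embeddings)
K = gram_matrix(names)
```

Under these assumptions, architectures that lie close together in the embedding space receive a kernel value near 1, so a Gaussian process surrogate would treat their performance as strongly correlated.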
For search spaces, we first extend NAS beyond discovering convolutional cells to attention cells. We propose a search space for spatiotemporal attention cells that use attention operations as the primary building block. Our discovered attention cells not only outperform manually designed ones, but also generalize well across different modalities, backbones, and datasets. Then, we show that committee-based models (ensembles or cascades) are an overlooked design space for efficient models. We find that simply building committees from off-the-shelf pre-trained models can match or exceed the accuracy of state-of-the-art models while being drastically more efficient. Finally, we point out the importance of controlling for cost when comparing different LiDAR-based 3D object detectors. We show that SECOND, a simple baseline generally believed to have been significantly surpassed, can almost match the performance of the state-of-the-art method on the Waymo Open Dataset when allowed a similar latency.
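The cascade form of a committee mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the thesis: it assumes each pre-trained model returns a label together with a confidence score, and that the cascade stops at the first model whose confidence reaches a threshold. The function `cascade_predict`, the stand-in models, and the threshold value are all hypothetical.

```python
def cascade_predict(models, x, threshold=0.9):
    """Query models from cheapest to most expensive; stop once one is confident.

    Each model is a callable returning (label, confidence).
    Returns (label, number_of_models_used).
    """
    if not models:
        raise ValueError("cascade needs at least one model")
    for used, model in enumerate(models, start=1):
        label, confidence = model(x)
        if confidence >= threshold:
            break  # early exit: later, costlier models are skipped
    return label, used

# Stand-in models with made-up outputs (assumptions for illustration only).
def small_model(x):
    return ("cat", 0.95) if x == "easy" else ("cat", 0.55)

def large_model(x):
    return ("dog", 0.97) if x == "hard" else ("cat", 0.99)

easy_result = cascade_predict([small_model, large_model], "easy")
hard_result = cascade_predict([small_model, large_model], "hard")
```

In this sketch the easy input is answered by the small model alone, while the hard input falls through to the large model, which is where the average-cost saving of a cascade would come from.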
- Doctor of Philosophy (PhD)