On Designing Resource-Constrained CNNs Efficiently

thesis
posted on 03.03.2022, 20:45 authored by Ting-wu Chin
Deep Convolutional Neural Networks (CNNs) have been adopted in many computer vision applications to achieve high performance. However, the growing computational demand of CNNs has made it increasingly difficult to deploy state-of-the-art CNNs onto resource-constrained platforms. As a result, model compression and acceleration have emerged as an important field of research. In this thesis, we aim to make CNNs friendlier to resource-limited platforms from two perspectives. The first is to introduce novel ways of compressing and accelerating CNNs; the second is to reduce the overhead of existing methodologies for constructing resource-constrained CNNs.
In the first perspective, we propose one novel technique for model acceleration and another for model compression. First, we propose AdaScale, an algorithm that automatically scales the resolution of input images to improve both the speed and accuracy of a video object detection system. Second, we identify the Winning-Bitwidth phenomenon: some weight bitwidths are more efficient than others for model compression when the filter counts of the CNN are allowed to change.
In the second perspective, we propose three novel algorithms for accelerating existing filter pruning methods for constructing resource-constrained CNNs. First, we propose LeGR, an algorithm that learns a global ranking among the filters of a pre-trained CNN, so that compressing the CNN to different target constraint levels can be done efficiently by greedily pruning filters in the order of the learned ranking. Second, we improve upon LeGR with Joslim, an algorithm that trains a CNN from scratch by jointly optimizing its weights and filter counts such that the trained CNN can be pruned without fine-tuning. Joslim is more efficient than LeGR because LeGR requires the pruned models to be fine-tuned before they are usable. Lastly, we propose Width Transfer, which improves the efficiency of filter pruning methods derived from a neural architecture search perspective. Width Transfer assumes that the optimized filter counts are regular across the depths and widths of a CNN architecture and are invariant to the size and resolution of the training dataset. As a result, Width Transfer performs neural architecture search for filter counts by solving a proxy problem with much lower overhead.
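The greedy step described for LeGR, pruning filters in the order of a global ranking until a target constraint is met, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function name `prune_by_global_ranking`, the per-filter scores, and the additive per-filter cost model are all assumptions made for illustration; in particular, how the ranking is learned is not shown, and real resource costs such as FLOPs also depend on neighboring layers.

```python
from typing import Dict, List, Tuple


def prune_by_global_ranking(
    scores: Dict[str, List[float]],
    cost_per_filter: Dict[str, float],
    budget: float,
    min_filters: int = 1,
) -> Tuple[Dict[str, List[int]], float]:
    """Greedily drop the lowest-ranked filters until the cost fits the budget.

    scores          -- hypothetical global ranking: layer name -> score per filter
    cost_per_filter -- assumed additive cost of one filter in each layer
    budget          -- target total cost (the "constraint level")
    min_filters     -- never prune a layer below this many filters
    """
    # Start with every filter kept.
    keep = {layer: list(range(len(s))) for layer, s in scores.items()}
    total = sum(len(s) * cost_per_filter[layer] for layer, s in scores.items())

    # One ranking across all layers, lowest score first.
    ranked = sorted(
        (score, layer, idx)
        for layer, layer_scores in scores.items()
        for idx, score in enumerate(layer_scores)
    )

    for _, layer, idx in ranked:
        if total <= budget:
            break  # constraint met; stop pruning
        if len(keep[layer]) <= min_filters:
            continue  # keep the layer from collapsing entirely
        keep[layer].remove(idx)
        total -= cost_per_filter[layer]

    return keep, total


# Toy example with made-up scores and costs.
toy_scores = {"conv1": [0.9, 0.1, 0.5], "conv2": [0.3, 0.8]}
toy_costs = {"conv1": 2.0, "conv2": 1.0}
kept, cost = prune_by_global_ranking(toy_scores, toy_costs, budget=5.0)
```

Because the ranking is computed once, calling the function again with a different `budget` reuses the same ordering, which is the source of the efficiency the abstract attributes to a global ranking.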

History

Date

07/07/2021

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Diana Marculescu