Carnegie Mellon University
Bhardwaj_cmu_0041E_10447.pdf (10.87 MB)

Exploiting Network Science for Feature Extraction and Representation Learning

thesis
posted on 2019-10-15, 20:26 authored by Kartikeya Bhardwaj
Networks are ubiquitous in many real-world problems, such as modeling information diffusion over social networks, transportation systems, protein-protein interactions, human mobility, and computational sustainability, among many others. Recently, due to the ongoing Big Data revolution, the fields of machine learning and Artificial Intelligence (AI) have also become extremely important, with AI largely dominated by representation learning techniques such as deep learning. However, research at the intersection of network science, machine learning, and AI remains largely unexplored. Specifically, most prior research focuses on how machine learning techniques can be used to solve “network” problems, such as predicting information diffusion on social networks or classifying blogger interests in a blog network. By contrast, in this thesis, we answer the following key question: How can we exploit network science to improve machine learning and representation learning models when addressing general problems? To answer this question, we address several problems at the intersection of network science, machine learning, and AI. Specifically, we address four fundamental research challenges: (i) Network Science for Traditional Machine Learning, (ii) Representation Learning for Small-Sample Datasets, (iii) Network Science-Based Deep Learning Model Compression, and (iv) Network Science for Neural Architecture Space Exploration. In other words, we show that many problems are governed by latent network dynamics which must be incorporated into the machine learning or representation learning models.
To this end, we first demonstrate how network science can be used for traditional machine learning problems such as spatiotemporal time-series prediction and application-specific feature extraction. More precisely, we propose a new framework called Network-of-Dynamic Bayesian Networks (NDBN) to address a complex probabilistic learning problem over networks with known but rapidly changing structure. We also propose a new domain-specific network inference approach for the case where the network structure is unknown and only high-dimensional data is available. We further introduce a new network science-based, application-specific feature extraction method called K-Hop Learning. As concrete case studies, we show that both the NDBN framework and K-Hop Learning significantly outperform traditional machine learning techniques for computational sustainability problems such as short-term solar energy and river flow prediction, respectively. We then discuss how network science can be used to address general representation
learning problems with high-dimensional and small-sample datasets. Here, we propose a new network community-based dimensionality reduction framework called
FeatureNet. Our approach is based on a new correlations-based network construction technique that explicitly discovers hidden communities in high-dimensional raw data.
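FeatureNet's exact construction is detailed later in the thesis; as a hedged illustration of the idea it describes, the sketch below builds a correlation-based feature network, discovers communities in it, and summarizes each community as one reduced feature. The threshold value, the community-detection algorithm, and the mean-based summarization are assumptions made for this example, not the thesis's implementation:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def reduce_dimensions(X, corr_threshold=0.5):
    """Illustrative community-based dimensionality reduction (assumption,
    not FeatureNet's actual algorithm): build a feature-correlation
    network, find its communities, and summarize each community by the
    mean of its member features."""
    n_features = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)  # feature-feature correlation matrix

    # Connect feature pairs whose |correlation| exceeds the threshold.
    G = nx.Graph()
    G.add_nodes_from(range(n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if abs(corr[i, j]) > corr_threshold:
                G.add_edge(i, j)

    # Hidden feature groups = network communities (modularity maximization).
    communities = list(greedy_modularity_communities(G))

    # One reduced feature per community: the mean of its member features.
    reduced = np.column_stack([X[:, sorted(c)].mean(axis=1) for c in communities])
    return reduced, communities

# Synthetic example: six features formed from two independent latent signals,
# so the correlation network should split into two communities.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0] + 0.01 * rng.normal(size=200) for _ in range(3)] +
                    [base[:, 1] + 0.01 * rng.normal(size=200) for _ in range(3)])
Z, comms = reduce_dimensions(X)
print(Z.shape)  # one reduced feature per discovered community
```

On this toy data the six raw features collapse to two community-level features, one per latent signal, which is the kind of hidden-group structure the correlation network is meant to expose.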
We show the effectiveness of FeatureNet on many diverse small-sample problems, for which deep learning typically overfits. We demonstrate that our technique achieves significantly higher accuracy than ten state-of-the-art dimensionality reduction methods (up to 40% improvement) on these small-sample problems. Since a simple correlations-based network alone cannot capture meaningful features for problems like image classification, we next focus on deep learning models such as Convolutional Neural Networks (CNNs). Indeed, in the era of the Internet-of-Things (IoT),
computational costs of deep networks have become a critical challenge for deploying such models on resource-constrained edge devices. Consequently, model compression has emerged as an important area of research. However, when a computationally expensive CNN (or even a compressed model) cannot fit within the memory budget of a single IoT device, it must be distributed across multiple devices, which leads to significant inter-device communication. To alleviate this problem, we propose a new model compression framework called the Network-of-Neural Networks (NoNN), which first exploits network science to partition a large “teacher” model’s knowledge into disjoint groups and then trains an individual “student” model for each group. This results in a set of student modules that satisfy the strict resource constraints of individual IoT devices. Extensive experiments on five well-known image classification tasks show that NoNN achieves accuracy similar to the teacher model and significantly outperforms the prior art. We also deploy our proposed framework on real hardware such as Raspberry Pis and Odroids to demonstrate that NoNN results in up to a 12× reduction in latency and up to a 14× reduction in energy per device, with negligible loss of accuracy. Finally, since deep networks are essentially networks of (artificial) neurons, network
science is a perfect candidate to study their architectural characteristics. Hence, we model deep networks from a network science perspective to identify which architecture-level characteristics enable models with different numbers of parameters and layers to achieve comparable accuracy. To this end, we propose new metrics called NN-Mass and NN-Density to study the architecture design space of deep networks. We further theoretically demonstrate that (i) for a given depth and width, CNN architectures with higher NN-Mass achieve lower generalization error, and (ii) irrespective of the number of parameters and layers (but for the same width), models with similar NN-Mass yield similar test accuracy. We then present extensive empirical evidence for the above two theoretical insights by conducting experiments on real image classification tasks such as CIFAR-10 and CIFAR-100. Lastly, we exploit the latter insight to directly design efficient architectures that achieve accuracy comparable to large models (~97% on the CIFAR-10 dataset) with up to a 3× reduction in total parameters. This ultimately reveals how model sizes can be reduced directly from the architecture perspective.
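The precise definitions of NN-Mass and NN-Density appear in the thesis itself; the sketch below only illustrates the underlying intuition that long-range skip connections among layers form a network whose link density can summarize an architecture with a single scalar. Both the capped-skip counting rule and the `mass = width × density` combination here are assumptions for illustration, not the thesis's formulas:

```python
def skip_density(depth, tc):
    """Illustrative link density of long-range skips in a DenseNet-style
    cell: layer i+1 may receive skip inputs from at most `tc` of its i
    earlier layers. Returns actual links / possible links (a sketch of
    the idea behind NN-Density, not the thesis's definition)."""
    actual = sum(min(i, tc) for i in range(1, depth))    # capped skips received
    possible = sum(i for i in range(1, depth))           # all earlier layers
    return actual / possible

def nn_mass_sketch(depth, width, tc):
    """Width-scaled skip density: one scalar intended to be comparable
    across architectures of different depths (assumed combination rule)."""
    return width * skip_density(depth, tc)

print(round(nn_mass_sketch(20, 32, 5), 2))  # 20-layer, width-32 cell -> 14.32
```

Under this toy rule, a shallow cell with unrestricted skips reaches density 1.0, while deeper cells with the same cap score lower, so depth and skip connectivity trade off inside one number, mirroring the abstract's claim that models of different depths but similar NN-Mass behave similarly.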
In summary, in this thesis, we address several problems at the intersection of network science, machine learning, and representation learning. Our research comprehensively demonstrates that network science can not only play a significant role in, but also lead to excellent results for, both machine learning and representation learning.

History

Date

2019-09-10

Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Radu Marculescu
