Microbiome profiles from the human body and environmental niches have become publicly available owing to recent advances in high-throughput sequencing technologies. Indeed, recent studies have identified distinct microbiome profiles in healthy and sick individuals for a variety of diseases, which suggests that the microbiome profile can serve as a diagnostic tool for identifying an individual's disease state. However, the high-dimensional nature of metagenomic data poses a significant challenge to existing machine learning models. Consequently, to enable personalized treatments, an efficient framework that can accurately and robustly differentiate between healthy and sick microbiome profiles is needed.
In this paper, we propose MetaNN (i.e., classification of host phenotypes from Metagenomic data using Neural Networks), a neural network framework that uses a new data augmentation technique to mitigate the effects of over-fitting.
We show that MetaNN outperforms existing state-of-the-art models in terms of classification accuracy for both synthetic and real metagenomic data. These results pave the way towards developing personalized treatments for microbiome-related diseases.
Publisher Statement: This is the publisher version of Lo, C., Marculescu, R. MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinformatics 20, 314 (2019). doi:10.1186/s12859-019-2833-2
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
This article was published open access using the Carnegie Mellon University Libraries' Article Processing Charge (APC) fund