MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples.
Issue Date
2018-07-01
Metadata
Abstract
Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. A major challenge in microbiome research is the classification of microbial communities of different environments or host phenotypes. The most common and cost-effective approach for such studies to date is 16S rRNA gene sequencing. Recent falls in sequencing costs have increased the demand for simple, efficient and accurate methods for rapid detection or diagnosis with proven applications in medicine, agriculture and forensic science. We describe a reference- and alignment-free approach for predicting environments and host phenotypes from 16S rRNA gene sequencing based on k-mer representations that benefits from a bootstrapping framework for investigating the sufficiency of shallow sub-samples. Deep learning methods as well as classical approaches were explored for predicting environments and host phenotypes. A k-mer distribution of shallow sub-samples outperformed Operational Taxonomic Unit (OTU) features in the tasks of body-site identification and Crohn's disease prediction. Aside from being more accurate, using k-mer features in shallow sub-samples (i) allows skipping the computationally costly sequence alignments required in OTU-picking and (ii) provides a proof of concept for the sufficiency of shallow and short-length 16S rRNA sequencing for phenotype prediction. In addition, k-mer features predicted representative 16S rRNA gene sequences of 18 ecological environments and 5 organismal environments with high macro-F1 scores of 0.88 and 0.87, respectively. For large datasets, deep learning outperformed classical methods such as Random Forest and Support Vector Machine. The software and datasets are available at https://llp.berkeley.edu/micropheno. Supplementary data are available at Bioinformatics online.
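The two core ideas in the abstract, representing a sample as a k-mer distribution over its reads (no reference, no alignment) and checking whether a shallow sub-sample of reads already reproduces that distribution, can be illustrated with the minimal sketch below. It is not MicroPheno's implementation. The function names, the use of total-variation distance as the stability measure, the handling of non-ACGT symbols, and the toy reads are illustrative assumptions; the paper's actual k values, sample sizes and divergence measure may differ. A small macro-F1 helper is included because the abstract reports results in that metric.

```python
import itertools
import random
from collections import Counter

NUCLEOTIDES = "ACGT"
VALID = set(NUCLEOTIDES)


def all_kmers(k):
    """All 4**k DNA k-mers in lexicographic order; fixes the feature order."""
    return ["".join(p) for p in itertools.product(NUCLEOTIDES, repeat=k)]


def kmer_profile(reads, k):
    """Normalized k-mer frequency vector pooled over all reads of one sample.

    Every overlapping k-mer of every read is counted, so no alignment or
    reference database is needed. Assumption: k-mers containing non-ACGT
    symbols (e.g. N) are simply skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[kmer] += 1
    total = sum(counts.values())
    vocab = all_kmers(k)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


def total_variation(p, q):
    """Total-variation distance between two frequency vectors (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def subsample_instability(reads, k, sample_size, n_resamples=10, seed=0):
    """Mean distance between shallow bootstrap sub-samples and the full sample.

    Each sub-sample draws `sample_size` reads with replacement. A value near 0
    suggests that `sample_size` reads already capture the sample's k-mer
    profile. The distance measure is an illustrative assumption.
    """
    rng = random.Random(seed)
    full = kmer_profile(reads, k)
    dists = []
    for _ in range(n_resamples):
        # Here `k=` is random.choices' sample-count argument, not the k-mer length.
        sub = rng.choices(reads, k=sample_size)
        dists.append(total_variation(kmer_profile(sub, k), full))
    return sum(dists) / len(dists)


def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy reads, invented purely for illustration.
    reads = ["ACGTACGGTA", "TTGCAACGTA", "ACGGTTACGA", "GATTACAGGT"]
    for n in (1, 2, 4):
        print(n, round(subsample_instability(reads, k=3, sample_size=n), 3))
```

In a full pipeline, each sample's (sub-sampled) profile would become one row of a feature matrix for a classifier such as a Random Forest, a Support Vector Machine or a neural network, and predictions on held-out samples would be scored with a macro-averaged F1 like `macro_f1`; that classification step is omitted here. Macro-averaging weights every environment or phenotype class equally, so rare classes count as much as common ones.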
Bioinformatics. 2018 Jul 1;34(13):i32-i42. doi: 10.1093/bioinformatics/bty296.
Affiliation
BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany.
Publisher
Oxford University Press
PubMed ID
29950008
Additional Links
https://llp.berkeley.edu/micropheno
Type
Article
ISSN
1367-4811
10.1093/bioinformatics/bty296
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International
Related articles
- DiTaxa: nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection.
- Authors: Asgari E, Münch PC, Lesker TR, McHardy AC, Mofrad MRK
- Issue date: 2019 Jul 15
- Updating the 97% identity threshold for 16S ribosomal RNA OTUs.
- Authors: Edgar RC
- Issue date: 2018 Jul 15
- 16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.
- Authors: Woloszynek S, Zhao Z, Chen J, Rosen GL
- Issue date: 2019 Feb
- MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples.
- Authors: Asgari E, Garakani K, McHardy AC, Mofrad MRK
- Issue date: 2019 Mar 15
- A comprehensive evaluation of the sl1p pipeline for 16S rRNA gene sequencing analysis.
- Authors: Whelan FJ, Surette MG
- Issue date: 2017 Aug 14