• A Fréchet tree distance measure to compare phylogeographic spread paths across trees.

      Reimering, Susanne; Muñoz, Sebastian; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Nature Publishing Group, 2018-11-19)
      Phylogeographic methods reconstruct the origin and spread of taxa by inferring locations for internal nodes of the phylogenetic tree from sampling locations of genetic sequences. This is commonly applied to study pathogen outbreaks and spread. To evaluate such reconstructions, the inferred spread paths from root to leaf nodes should be compared to other methods or references. Ancestral state reconstructions are commonly evaluated by node-wise comparisons, which require the same tree topology; however, the true topology is usually unknown. Here, we present a method for comparing phylogeographies across different trees inferred from the same taxa. We compare paths of locations by calculating discrete Fréchet distances. By correcting the distances for the number of paths passing through a node, we define the Fréchet tree distance as a distance measure between phylogeographies. As an application, we compare phylogeographic spread patterns on trees inferred with different methods from hemagglutinin sequences of H5N1 influenza viruses, finding that both tree inference and ancestral reconstruction cause variation in phylogeographic spread that is not directly reflected by topological differences. The method is suitable for comparing phylogeographies inferred with different tree or phylogeographic inference methods to each other or to a known ground truth, thus enabling a quality assessment of such techniques.
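The abstract compares root-to-leaf location paths by their discrete Fréchet distance. A minimal sketch of that building block, using the standard dynamic-programming recurrence of Eiter and Mannila, could look as follows. The function name, the use of (x, y) coordinates with Euclidean distance, and the example paths are illustrative assumptions; the path-count correction that turns pairwise path distances into the Fréchet tree distance is not reproduced here.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def discrete_frechet(p: Sequence[T], q: Sequence[T],
                     dist: Callable[[T, T], float]) -> float:
    """Discrete Fréchet distance between two non-empty paths p and q.

    ca[i][j] holds the distance between the prefixes p[:i+1] and q[:j+1]
    (Eiter & Mannila dynamic programme, O(len(p) * len(q)) time).
    """
    if not p or not q:
        raise ValueError("both paths must be non-empty")
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                # advance along p, along q, or along both simultaneously
                prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(prev, d)
    return ca[n - 1][m - 1]


# Hypothetical example: two spread paths given as (x, y) coordinates.
path_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(path_a, path_b, math.dist))  # 1.0
```

For discrete locations without coordinates, any distance on location labels (for example a 0/1 mismatch cost) can be passed as `dist` instead of `math.dist`.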
    • From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer.

      Weimann, Aaron; Mooren, Kyra; Frank, Jeremy; Pope, Phillip B; Bremges, Andreas; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2017-01-31)
      The number of sequenced genomes is growing exponentially, profoundly shifting the bottleneck from data generation to genome interpretation. Traits are often used to characterize and distinguish bacteria and are likely a driving factor in microbial community composition, yet little is known about the traits of most microbes. We describe Traitar, the microbial trait analyzer, which is a fully automated software package for deriving phenotypes from a genome sequence. Traitar provides phenotype classifiers to predict 67 traits related to the use of various substrates as carbon and energy sources, oxygen requirement, morphology, antibiotic susceptibility, proteolysis, and enzymatic activities. Furthermore, it suggests protein families associated with the presence of particular phenotypes. Our method uses L1-regularized L2-loss support vector machines for phenotype assignments based on phyletic patterns of protein families and their evolutionary histories across a diverse set of microbial species. We demonstrate that Traitar reliably assigns phenotypes to bacterial genomes from 572 species of eight phyla, also when based on incomplete single-cell genomes and simulated draft genomes. We also showcase its application in metagenomics by verifying and complementing a manual metabolic reconstruction of two novel Clostridiales species based on draft genomes recovered from commercial biogas reactors. Traitar is available at https://github.com/hzi-bifo/traitar.
      IMPORTANCE Bacteria are ubiquitous in our ecosystem and have a major impact on human health, e.g., by supporting digestion in the human gut. Bacterial communities can also aid in biotechnological processes such as wastewater treatment or decontamination of polluted soils. Diverse bacteria contribute their unique capabilities to the functioning of such ecosystems, but lab experiments to investigate those capabilities are labor-intensive. Major advances in sequencing techniques open up the opportunity to study bacteria by their genome sequences. For this purpose, we have developed Traitar, software that predicts traits of bacteria on the basis of their genomes. It is applicable to studies with tens or hundreds of bacterial genomes. Traitar may help researchers in microbiology to pinpoint the traits of interest, reducing the amount of wet lab work required.
    • Functional omics analyses reveal only minor effects of microRNAs on human somatic stem cell differentiation.

      Schira-Heinen, Jessica; Czapla, Agathe; Hendricks, Marion; Kloetgen, Andreas; Wruck, Wasco; Adjaye, James; Kögler, Gesine; Werner Müller, Hans; Stühler, Kai; Trompeter, Hans-Ingo; et al. (NPG, 2020-02-24)
      The contribution of microRNA-mediated posttranscriptional regulation to the final proteome in differentiating cells remains elusive. Here, we evaluated the impact of microRNAs (miRNAs) on the proteome of human umbilical cord blood-derived unrestricted somatic stem cells (USSC) during retinoic acid (RA)-induced differentiation using a systemic approach that combined next-generation sequencing of mRNA and miRNA expression with quantitative mass spectrometry-based proteome analyses. Interestingly, regulation of mRNAs and their corresponding proteins was highly correlated during RA incubation. Additionally, RA-induced USSC were clearly separated from native USSC, shifting from a proliferating to a metabolic phenotype. Bioinformatic integration of up- and downregulated miRNAs and proteins initially implied a strong impact of the miRNome on the XXL-USSC proteome. However, quantitative proteome analysis of the miRNA contribution to the final proteome after ectopic overexpression of the downregulated miR-27a-5p and miR-221-5p, or inhibition of the upregulated miR-34a-5p, followed by RA induction revealed only minor proportions of differentially abundant proteins. In addition, only small overlaps of these regulated proteins with inversely abundant proteins in non-transfected RA-treated USSC were observed. Hence, mRNA transcription rather than miRNA-mediated regulation is the driving force for protein regulation upon RA incubation, strongly suggesting that miRNAs act as fine-tuning regulators rather than as active primary switches during RA induction of USSC.
    • Genome-guided design of a defined mouse microbiota that confers colonization resistance against Salmonella enterica serovar Typhimurium.

      Brugiroux, Sandrine; Beutler, Markus; Pfann, Carina; Garzetti, Debora; Ruscheweyh, Hans-Joachim; Ring, Diana; Diehl, Manuel; Herp, Simone; Lötscher, Yvonne; Hussain, Saib; et al. (2016-11-21)
      Protection against enteric infections, also termed colonization resistance, results from mutualistic interactions of the host and its indigenous microbes. The gut microbiota of humans and mice is highly diverse and it is therefore challenging to assign specific properties to its individual members. Here, we have used a collection of murine bacterial strains and a modular design approach to create a minimal bacterial community that, once established in germ-free mice, provided colonization resistance against the human enteric pathogen Salmonella enterica serovar Typhimurium (S. Tm). Initially, a community of 12 strains, termed Oligo-Mouse-Microbiota (Oligo-MM
    • Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life.

      Vatanen, Tommi; Plichta, Damian R; Somani, Juhi; Münch, Philipp C; Arthur, Timothy D; Hall, Andrew Brantley; Rudolf, Sabine; Oakeley, Edward J; Ke, Xiaobo; Young, Rachel A; et al. (Springer-Nature, 2019-01-01)
      The human gut microbiome matures towards the adult composition during the first years of life and is implicated in early immune development. Here, we investigate the effects of microbial genomic diversity on gut microbiome development using integrated early childhood data sets collected in the DIABIMMUNE study in Finland, Estonia and Russian Karelia. We show that gut microbial diversity is associated with household location and linear growth of children. Single nucleotide polymorphism- and metagenomic assembly-based strain tracking revealed large and highly dynamic microbial pangenomes, especially in the genus Bacteroides, in which we identified evidence of variability deriving from Bacteroides-targeting bacteriophages. Our analyses revealed functional consequences of strain diversity; only 10% of Finnish infants harboured Bifidobacterium longum subsp. infantis, a subspecies specialized in human milk metabolism, whereas Russian infants commonly maintained a probiotic Bifidobacterium bifidum strain in infancy. Groups of bacteria contributing to diverse, characterized metabolic pathways converged to highly subject-specific configurations over the first two years of life. This longitudinal study extends the current view of early gut microbial community assembly based on strain-level genomic variation.
    • Genomics and prevalence of bacterial and archaeal isolates from biogas-producing microbiomes.

      Maus, Irena; Bremges, Andreas; Stolze, Yvonne; Hahnke, Sarah; Cibis, Katharina G; Koeck, Daniela E; Kim, Yong S; Kreubel, Jana; Hassa, Julia; Wibberg, Daniel; et al. (2017)
      To elucidate biogas microbial communities and processes, the application of high-throughput DNA analysis approaches is becoming increasingly important. Unfortunately, the generated data can be interpreted only partially and rudimentarily, since databases lack reference sequences.
    • Hepatitis C reference viruses highlight potent antibody responses and diverse viral functional interactions with neutralising antibodies.

      Bankwitz, Dorothea; Bahai, Akash; Labuhn, Maurice; Doepke, Mandy; Ginkel, Corinne; Khera, Tanvi; Todt, Daniel; Ströh, Luisa J; Dold, Leona; Klein, Florian; et al. (BMJ Publishing Group, 2020-12-15)
      Community-acquired pneumonia caused by primary infection or superinfection with Streptococcus pneumoniae can lead to acute respiratory distress requiring mechanical ventilation. The pore-forming toxin pneumolysin alters the alveolar-capillary barrier and causes extravasation of protein-rich fluid into the interstitial pulmonary tissue, which impairs gas exchange. Platelets usually prevent endothelial leakage in inflamed pulmonary tissue by sealing inflammation-induced endothelial gaps. We not only confirm that S pneumoniae induces CD62P expression in platelets, but we also show that, in the presence of pneumolysin, CD62P expression is not associated with platelet activation. Pneumolysin induces pores in the platelet membrane, which allow anti-CD62P antibodies to stain the intracellular CD62P without platelet activation. Pneumolysin treatment also results in calcium efflux, increase in light transmission by platelet lysis (not aggregation), loss of platelet thrombus formation in the flow chamber, and loss of pore-sealing capacity of platelets in the Boyden chamber. Specific anti-pneumolysin monoclonal and polyclonal antibodies inhibit these effects of pneumolysin on platelets, as do polyvalent human immunoglobulins. In a post hoc analysis of the prospective randomized phase 2 CIGMA trial, we show that administration of a polyvalent immunoglobulin preparation was associated with a nominally higher platelet count and nominally improved survival in patients with severe S pneumoniae-related community-acquired pneumonia. Although no definitive conclusion can be made due to the low number of patients, our findings provide a rationale for investigating polyvalent immunoglobulin preparations that target pneumolysin in severe community-acquired pneumococcal pneumonia, to counteract the risk of these patients becoming ventilation dependent. This trial was registered at www.clinicaltrials.gov as #NCT01420744.
    • The homeobox transcription factor HB9 induces senescence and blocks differentiation in hematopoietic stem and progenitor cells.

      Ingenhag, Deborah; Reister, Sven; Auer, Franziska; Bhatia, Sanil; Wildenhain, Sarah; Picard, Daniel; Remke, Marc; Hoell, Jessica I; Kloetgen, Andreas; Sohn, Dennis; et al. (Ferrata Storti Foundation, 2019-01-01)
      The homeobox gene
    • How to Grow a Computational Biology Lab.

      McHardy, Alice Carolyn; Helmholtz Centre for Infection Research, Inhoffenstr. 7, 38124 Braunschweig, Germany. (2015-09)
    • Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.

      Frank, J A; Pan, Y; Tooming-Klunderud, A; Eijsink, V G H; McHardy, A C; Nederbragt, A J; Pope, P B; Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, 1432 Norway. (2016)
      DNA assembly is a core methodological step in metagenomic pipelines used to study the structure and function within microbial communities. Here we investigate the utility of Pacific Biosciences long, high-accuracy circular consensus sequencing (CCS) reads for metagenomic projects. We compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community. Eight SMRT cells produced approximately 94 Mb of CCS reads from a biogas reactor microbiome sample, with an average length of 1319 nt and 99.7% accuracy. CCS data assembly generated a comparable number of large contigs (greater than 1 kb) to those assembled from a ~190x larger HiSeq dataset (~18 Gb) produced from the same sample (i.e., approximately 62% of total contigs). Hybrid assemblies using PacBio CCS and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length and number of large contigs. The incorporation of CCS data produced significant enhancements in taxonomic binning and genome reconstruction of two dominant phylotypes, which assembled and binned poorly using HiSeq data alone. Collectively, these results illustrate the value of PacBio CCS reads in certain metagenomics applications.
    • In Silico Vaccine Strain Prediction for Human Influenza Viruses.

      Klingen, Thorsten R; Reimering, Susanne; Guzmán, Carlos A; McHardy, Alice C; Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2017-10-09)
      Vaccines preventing seasonal influenza infections save many lives every year; however, due to rapid viral evolution, they have to be updated frequently to remain effective. To identify appropriate vaccine strains, the World Health Organization (WHO) operates a global program that continually generates and interprets surveillance data. Over the past decade, sophisticated computational techniques, drawing from multiple theoretical disciplines, have been developed that predict viral lineages rising to predominance, assess their suitability as vaccine strains, link genetic to antigenic alterations, as well as integrate and visualize genetic, epidemiological, structural, and antigenic data. These could form the basis of an objective and reproducible vaccine strain-selection procedure utilizing the complex, large-scale data types from surveillance. To this end, computational techniques should already be incorporated into the vaccine-selection process in an independent, parallel track, and their performance continuously evaluated.
    • An Integrated Metagenome Catalog Reveals New Insights into the Murine Gut Microbiome.

      Lesker, Till R; Durairaj, Abilash C; Gálvez, Eric J C; Lagkouvardos, Ilias; Baines, John F; Clavel, Thomas; Sczyrba, Alexander; McHardy, Alice C; Strowig, Till; HZI, Helmholtz-Zentrum für Infektionsforschung GmbH, Inhoffenstr. 7, 38124 Braunschweig, Germany.
      The complexity of host-associated microbial ecosystems requires host-specific reference catalogs to survey the functions and diversity of these communities. We generate a comprehensive resource, the integrated mouse gut metagenome catalog (iMGMC), comprising 4.6 million unique genes and 660 metagenome-assembled genomes (MAGs), many (485 MAGs, 73%) of which are linked to reconstructed full-length 16S rRNA gene sequences. iMGMC enables unprecedented coverage and taxonomic resolution of the mouse gut microbiota; i.e., more than 92% of MAGs lack species-level representatives in public repositories (<95% ANI match). The integration of MAGs and 16S rRNA gene data allows more accurate prediction of functional profiles of communities than predictions based on 16S rRNA amplicons alone. Accompanying iMGMC, we provide a set of MAGs representing 1,296 gut bacteria obtained through complementary assembly strategies. We envision that integrated resources such as iMGMC, together with MAG collections, will enhance the resolution of numerous existing and future sequencing-based studies.
    • Investigation of different nitrogen reduction routes and their key microbial players in wood chip-driven denitrification beds.

      Grießmeier, Victoria; Bremges, Andreas; McHardy, Alice Carolyn; Gescher, Johannes; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2017-12-05)
      Field denitrification beds containing polymeric plant material are increasingly used to eliminate nitrate from agricultural drainage water. They mirror a number of anoxic ecosystems. However, knowledge of the microbial composition, the interaction of microbial species, and the carbon degradation processes within these denitrification systems is sparse. This study revealed several new aspects of the carbon and nitrogen cycle, and these findings can be correlated with the dynamics of the microbial community composition and the activity of key species. Members of the order Pseudomonadales seem to be important players in denitrification at low nitrate concentrations, while a switch to higher nitrate concentrations seems to select for members of the orders Rhodocyclales and Rhizobiales. We observed that high nitrate loading rates lead to an unpredictable transition of the community's activity from denitrification to dissimilatory reduction of nitrate to ammonium (DNRA). This transition is mirrored by an increase in transcripts of the nitrite reductase gene nrfAH and the increase correlates with the activity of members of the order Ignavibacteriales. Denitrification reactors sustained the development of an archaeal community consisting of members of the Bathyarchaeota and methanogens belonging to the Euryarchaeota. Unexpectedly, the activity of the methanogens positively correlated with the nitrate loading rates.
    • MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples.

      Asgari, Ehsaneddin; Garakani, Kiavash; McHardy, Alice C; Mofrad, Mohammad R K; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Oxford University Press, 2018-07-01)
      Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. A major challenge in microbiome research is the classification of microbial communities of different environments or host phenotypes. The most common and cost-effective approach for such studies to date is 16S rRNA gene sequencing. Recent falls in sequencing costs have increased the demand for simple, efficient and accurate methods for rapid detection or diagnosis with proven applications in medicine, agriculture and forensic science. We describe a reference- and alignment-free approach for predicting environments and host phenotypes from 16S rRNA gene sequencing based on k-mer representations that benefits from a bootstrapping framework for investigating the sufficiency of shallow sub-samples. Deep learning methods as well as classical approaches were explored for predicting environments and host phenotypes. A k-mer distribution of shallow sub-samples outperformed Operational Taxonomic Unit (OTU) features in the tasks of body-site identification and Crohn's disease prediction. Aside from being more accurate, using k-mer features in shallow sub-samples (i) allows skipping the computationally costly sequence alignments required in OTU-picking and (ii) provides a proof of concept for the sufficiency of shallow and short-length 16S rRNA sequencing for phenotype prediction. In addition, k-mer features predicted representative 16S rRNA gene sequences of 18 ecological environments and 5 organismal environments with high macro-F1 scores of 0.88 and 0.87, respectively. For large datasets, deep learning outperformed classical methods such as Random Forest and Support Vector Machine. The software and datasets are available at https://llp.berkeley.edu/micropheno. Supplementary data are available at Bioinformatics online.
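The abstract describes representing a 16S rRNA sample as a k-mer distribution computed from a shallow random sub-sample of its reads. A minimal sketch of such a representation is shown below. It is not the MicroPheno implementation: the helper names, the default k = 6, sampling reads without replacement, and skipping k-mers that contain ambiguous bases are all assumptions made for illustration.

```python
import itertools
import random
from typing import List, Optional, Sequence


def kmer_profile(reads: Sequence[str], k: int = 6) -> List[float]:
    """Normalized frequency vector over all 4**k DNA k-mers (lexicographic order)."""
    index = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            j = index.get(read[i:i + k])
            if j is not None:  # k-mers containing N or other ambiguity codes are skipped
                counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def subsample_profile(reads: Sequence[str], n_reads: int, k: int = 6,
                      seed: Optional[int] = None) -> List[float]:
    """k-mer profile of a random shallow sub-sample of n_reads reads (hypothetical helper)."""
    rng = random.Random(seed)
    sample = rng.sample(list(reads), n_reads)  # sampling without replacement
    return kmer_profile(sample, k)


# Hypothetical toy reads; real 16S rRNA reads are much longer.
reads = ["ACGTACGTAC", "TTGACCAGTN", "GGGCCCAAAT"]
profile = subsample_profile(reads, 2, k=4, seed=1)
print(len(profile))  # 256
```

Repeating `subsample_profile` with different seeds gives a set of bootstrap profiles whose spread indicates whether a given sub-sample depth is sufficient; the resulting vectors can then be fed to any classifier.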
    • Modular Traits of the Rhizobiales Root Microbiota and Their Evolutionary Relationship with Symbiotic Rhizobia.

      Garrido-Oter, Ruben; Nakano, Ryohei Thomas; Dombrowski, Nina; Ma, Ka-Wai; McHardy, Alice C; Schulze-Lefert, Paul; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Elsevier, 2018-07-11)
      Animal-microbe facultative symbioses play a fundamental role in ecosystem and organismal health. Yet, due to the flexible nature of their association, the selection pressures that act on animals and their facultative symbionts remain elusive. Here we apply experimental evolution to Drosophila melanogaster associated with its growth-promoting symbiont Lactobacillus plantarum, representing a well-established model of facultative symbiosis. We find that the diet of the host, rather than the host itself, is a predominant driving force in the evolution of this symbiosis. Furthermore, we identify a mechanism resulting from the bacterium's adaptation to the diet, which confers growth benefits to the colonized host. Our study reveals that bacterial adaptation to the host's diet may be the foremost step in determining the evolutionary course of a facultative animal-microbe symbiosis.
    • Novel Syntrophic Populations Dominate an Ammonia-Tolerant Methanogenic Microbiome.

      Frank, J A; Arntzen, M Ø; Sun, L; Hagen, L H; McHardy, A C; Horn, S J; Eijsink, V G H; Schnürer, A; Pope, P B; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2017-05-10)
      Biogas reactors operating with protein-rich substrates have high methane potential and industrial value; however, they are highly susceptible to process failure because of the accumulation of ammonia. High ammonia levels cause a decline in acetate-utilizing methanogens and instead promote the conversion of acetate via a two-step mechanism involving syntrophic acetate oxidation (SAO) to H2 and CO2, followed by hydrogenotrophic methanogenesis. Despite the key role of syntrophic acetate-oxidizing bacteria (SAOB), only a few culturable representatives have been characterized. Here we show that the microbiome of a commercial, ammonia-tolerant biogas reactor harbors a deeply branched, uncultured phylotype (unFirm_1) accounting for approximately 5% of the 16S rRNA gene inventory and sharing 88% 16S rRNA gene identity with its closest characterized relative. Reconstructed genome and quantitative metaproteomic analyses imply unFirm_1's metabolic dominance and SAO capabilities, whereby the key enzymes required for acetate oxidation are among the most highly detected in the reactor microbiome. While culturable SAOB were identified in genomic analyses of the reactor, their limited proteomic representation suggests that unFirm_1 plays an important role in channeling acetate toward methane. Notably, unFirm_1-like populations were found in other high-ammonia biogas installations, suggesting a broader importance for this novel clade of SAOB in anaerobic fermentations.
      IMPORTANCE The microbial production of methane or "biogas" is an attractive renewable energy technology that can recycle organic waste into biofuel. Biogas reactors operating with protein-rich substrates such as household, municipal or agricultural wastes have significant industrial and societal value; however, they are highly unstable and frequently collapse due to the accumulation of ammonia. We report the discovery of a novel uncultured phylotype (unFirm_1) that is highly detectable in metaproteomic data generated from an ammonia-tolerant commercial reactor. Importantly, unFirm_1 is proposed to perform a key metabolic step in biogas microbiomes, whereby it syntrophically oxidizes acetate to hydrogen and carbon dioxide, which methanogens then convert to methane. Only very few culturable syntrophic acetate-oxidizing bacteria have been described, and all were detected at low in situ levels compared to unFirm_1. Broader comparisons produced the hypothesis that unFirm_1 is a key mediator toward the successful long-term stable operation of biogas production using protein-rich substrates.
    • The PARA-suite: PAR-CLIP specific sequence read simulation and processing.

      Kloetgen, Andreas; Borkhardt, Arndt; Hoell, Jessica I; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2016)
      Next-generation sequencing technologies have profoundly impacted biology over recent years. Experimental protocols such as photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), which identifies protein-RNA interactions on a genome-wide scale, commonly employ deep sequencing. With PAR-CLIP, the incorporation of photoactivatable nucleosides into nascent transcripts leads to high rates of specific nucleotide conversions during reverse transcription. So far, the specific properties of PAR-CLIP-derived sequencing reads have not been assessed in depth. [The source code of the PARA-suite toolkit and the PARA-suite aligner (BWA PARA) is available at https://github.com/akloetgen/PARA-suite and https://github.com/akloetgen/PARA-suite_aligner, respectively, under the GNU GPLv3 license.]
    • Pediatric ALL relapses after allo-SCT show high individuality, clonal dynamics, selective pressure, and druggable targets.

      Hoell, Jessica I; Ginzel, Sebastian; Kuhlen, Michaela; Kloetgen, Andreas; Gombert, Michael; Fischer, Ute; Hein, Daniel; Demir, Salih; Stanulla, Martin; Schrappe, Martin; et al. (American Society of Hematology, 2019-10-22)
      Survival of patients with pediatric acute lymphoblastic leukemia (ALL) after allogeneic hematopoietic stem cell transplantation (allo-SCT) is mainly compromised by leukemia relapse, carrying a dismal prognosis. As novel individualized therapeutic approaches are urgently needed, we performed whole-exome sequencing of leukemic blasts of 10 children with post-allo-SCT relapses with the aim of thoroughly characterizing the mutational landscape and identifying druggable mutations. We found that post-allo-SCT ALL relapses display highly diverse and mostly patient-individual genetic lesions. Moreover, mutational cluster analysis showed substantial clonal dynamics during leukemia progression from initial diagnosis to relapse after allo-SCT. Only very few alterations stayed constant over time. This dynamic clonality was exemplified by the detection of thiopurine resistance-mediating mutations in the nucleotidase NT5C2 in 3 patients' first relapses, which disappeared in the post-allo-SCT relapses once the selective pressure of maintenance chemotherapy was relieved. Moreover, we identified TP53 mutations in 4 of 10 patients after allo-SCT, reflecting acquired chemoresistance associated with selective pressure of prior antineoplastic treatment. Finally, in the post-allo-SCT relapses of 9 of 10 children, we found alterations in genes for which targeted therapies with novel agents are readily available. We could show efficient targeting of leukemic blasts by APR-246 in 2 patients carrying TP53 mutations. Our findings shed light on the genetic basis of post-allo-SCT relapse and may pave the way for unraveling novel therapeutic strategies in this challenging situation.
    • Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic.

      Reimering, Susanne; Muñoz, Sebastian; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (PLOS, 2020-02-01)
      Influenza A viruses cause seasonal epidemics and occasional pandemics in the human population. While the worldwide circulation of seasonal influenza is at least partly understood, the exact migration patterns between countries, states or cities are not well studied. Here, we use the Sankoff algorithm for parsimonious phylogeographic reconstruction together with effective distances based on a worldwide air transportation network. By first simulating geographic spread and then phylogenetic trees and genetic sequences, we confirmed that reconstructions with effective distances inferred phylogeographic spread more accurately than reconstructions with geographic distances and Bayesian reconstructions with BEAST that do not use any distance information, and yielded results comparable to the Bayesian reconstruction using distance information via a generalized linear model. Our method extends Bayesian methods that estimate rates from the data by using fine-grained locations such as airports and inferring intermediate locations not observed among sampled isolates. When applied to sequence data of the pandemic H1N1 influenza A virus in 2009, our approach correctly inferred the origin and proposed the airports mainly involved in the spread of the virus. In the case of a novel outbreak, this approach allows rapid analysis of sequence data and inference of the origin and spread routes to improve disease surveillance and control.
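The abstract builds on the Sankoff algorithm for parsimonious reconstruction, with transition costs taken from effective distances in an air transportation network. A generic sketch of Sankoff's two-pass small-parsimony algorithm on a rooted tree is shown below. The toy cost table stands in for effective distances and is hypothetical, as are all node and location names; this is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def sankoff(children: Dict[str, List[str]], root: str,
            leaf_states: Dict[str, str], states: Sequence[str],
            cost: Callable[[str, str], float]) -> Tuple[float, Dict[str, str]]:
    """Small-parsimony (Sankoff) reconstruction of internal-node locations.

    children maps each internal node to its child nodes; leaves are absent
    from it and carry an observed location in leaf_states. cost(a, b) is the
    cost of a transition from location a (parent) to location b (child).
    """
    score: Dict[str, Dict[str, float]] = {}

    def up(node: str) -> None:
        # Bottom-up pass: score[node][s] = minimal subtree cost if node is in state s.
        kids = children.get(node, [])
        if not kids:
            obs = leaf_states[node]
            score[node] = {s: 0.0 if s == obs else math.inf for s in states}
            return
        for kid in kids:
            up(kid)
        score[node] = {
            s: sum(min(cost(s, t) + score[kid][t] for t in states) for kid in kids)
            for s in states
        }

    up(root)
    assignment = {root: min(states, key=lambda s: score[root][s])}

    def down(node: str) -> None:
        # Top-down pass: choose each child's state given its parent's chosen state.
        for kid in children.get(node, []):
            parent = assignment[node]
            assignment[kid] = min(states, key=lambda t: cost(parent, t) + score[kid][t])
            down(kid)

    down(root)
    return score[root][assignment[root]], assignment


# Hypothetical toy costs for three locations on a line: A - B - C.
toy = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0}


def toy_cost(a: str, b: str) -> float:
    return 0.0 if a == b else toy.get((a, b), toy.get((b, a)))


tree = {"root": ["n1", "leaf3"], "n1": ["leaf1", "leaf2"]}
leaves = {"leaf1": "A", "leaf2": "B", "leaf3": "C"}
total, locs = sankoff(tree, "root", leaves, ["A", "B", "C"], toy_cost)
print(total, locs["root"], locs["n1"])  # 2.0 B B
```

Ties between equally parsimonious locations are broken by the order of `states`; a real analysis would typically enumerate or sample among tied optima instead. Because the costs only need to be a function of two locations, an effective-distance matrix can be plugged in as `cost` without changing the algorithm.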
    • PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes.

      Gregor, Ivan; Dröge, Johannes; Schirmer, Melanie; Quince, Christopher; McHardy, Alice C; Helmholtz Centre for Infection Research, Inhoffenstr. 7, D-38124 Braunschweig, Germany. (2016)
      Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for recovering species bins from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4- to 6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows Gb-sized metagenomes to be analyzed with inexpensive hardware, and species- or genus-level bins to be recovered with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
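PhyloPythiaS+ counts 4- to 6-mers simultaneously as features for taxonomic binning. The accelerated counting algorithm itself is not described in the abstract, so the following is only a naive baseline for comparison: it visits each sequence position once and updates the counts for every k in the range at that position. The function name and the choice to skip windows containing ambiguous bases are assumptions; this is not the PhyloPythiaS+ implementation.

```python
from collections import Counter
from typing import Dict


def count_kmers(seq: str, k_min: int = 4, k_max: int = 6) -> Dict[int, Counter]:
    """Count all k-mers for k_min <= k <= k_max in a single pass over seq.

    Windows that contain characters other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = {k: Counter() for k in range(k_min, k_max + 1)}
    for i in range(len(seq)):
        for k in range(k_min, k_max + 1):
            if i + k > len(seq):
                break  # longer k-mers cannot fit at this position either
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[k][kmer] += 1
    return counts


counts = count_kmers("ACGTACGT")
print(counts[4]["ACGT"])  # 2
```

The cost of this baseline grows with the sequence length times the number of k values; faster schemes typically encode k-mers as integers and update them incrementally from one window to the next instead of slicing strings.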