• The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

      Zhou, Naihui; Jiang, Yuxiang; Bergquist, Timothy R; Lee, Alexandra J; Kacsoh, Balint Z; Crocker, Alex W; Lewis, Kimberley A; Georghiou, George; Nguyen, Huy N; Hamid, Md Nafiz; et al. (BMC, 2019-11-19)
      BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
    • CAMISIM: simulating metagenomes and microbial communities.

      Fritz, Adrian; Hofmann, Peter; Majda, Stephan; Dahms, Eik; Dröge, Johannes; Fiedler, Jessika; Lesker, Till R; Belmann, Peter; DeMaere, Matthew Z; Darling, Aaron E; et al. (BioMedCentral, 2019-02-08)
      Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.
    • CAMITAX: Taxon labels for microbial genomes.

      Bremges, Andreas; Fritz, Adrian; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Oxford Academic, 2020-01-01)
      BACKGROUND: The number of microbial genome sequences is increasing exponentially, especially thanks to recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to genomes is key and often a prerequisite for downstream analyses. FINDINGS: We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance-, 16S ribosomal RNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project, and we show that the ensemble classification method in CAMITAX improved on all individual methods across tested ranks. CONCLUSIONS: While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under Apache License 2.0 at https://github.com/CAMI-challenge/CAMITAX.
    • 'Candidatus Adiutrix intracellularis', an endosymbiont of termite gut flagellates, is the first representative of a deep-branching clade of Deltaproteobacteria and a putative homoacetogen.

      Ikeda-Ohtsubo, Wakako; Strassert, Jürgen F H; Köhler, Tim; Mikaelyan, Aram; Gregor, Ivan; McHardy, Alice C; Tringe, Susannah Green; Hugenholtz, Phil; Radek, Renate; Brune, Andreas; et al. (2016-09)
      Termite gut flagellates are typically colonized by specific bacterial symbionts. Here we describe the phylogeny, ultrastructure and subcellular location of 'Candidatus Adiutrix intracellularis', an intracellular symbiont of Trichonympha collaris in the termite Zootermopsis nevadensis. It represents a novel, deep-branching clade of uncultured Deltaproteobacteria widely distributed in intestinal tracts of termites and cockroaches. Fluorescence in situ hybridization and transmission electron microscopy localized the endosymbiont near hydrogenosomes in the posterior part and near the ectosymbiont 'Candidatus Desulfovibrio trichonymphae' in the anterior part of the host cell. The draft genome of 'Ca. Adiutrix intracellularis' obtained from a metagenomic library revealed the presence of a complete gene set encoding the Wood-Ljungdahl pathway, including two homologs of fdhF encoding hydrogenase-linked formate dehydrogenases (FDHH) and all other components of the recently described hydrogen-dependent carbon dioxide reductase (HDCR) complex, which substantiates previous claims that the symbiont is capable of reductive acetogenesis from CO2 and H2. The close phylogenetic relationship between the HDCR components and their homologs in homoacetogenic Firmicutes and Spirochaetes suggests that the deltaproteobacterium acquired the capacity for homoacetogenesis via lateral gene transfer. The presence of genes for nitrogen fixation and the biosynthesis of amino acids and cofactors indicates the nutritional nature of the symbiosis.
    • "Candidatus Paraporphyromonas polyenzymogenes" encodes multi-modular cellulases linked to the type IX secretion system.

      Naas, A E; Solden, L M; Norbeck, A D; Brewer, H; Hagen, L H; Heggenes, I M; McHardy, A C; Mackie, R I; Paša-Tolić, L; Arntzen, M Ø; et al. (2018-03-01)
      In nature, obligate herbivorous ruminants have a close symbiotic relationship with their gastrointestinal microbiome, which proficiently deconstructs plant biomass. Despite decades of research, lignocellulose degradation in the rumen has thus far been attributed to a limited number of culturable microorganisms. Here, we combine meta-omics and enzymology to identify and describe a novel Bacteroidetes family ("Candidatus MH11") composed entirely of uncultivated strains that are predominant in ruminants and only distantly related to previously characterized taxa.
    • Cellular Importin-α3 Expression Dynamics in the Lung Regulate Antiviral Response Pathways against Influenza A Virus Infection.

      Thiele, Swantje; Stanelle-Bertram, Stephanie; Beck, Sebastian; Kouassi, Nancy Mounogou; Zickler, Martin; Müller, Martin; Tuku, Berfin; Resa-Infante, Patricia; van Riel, Debby; Alawi, Malik; et al.
      Importin-α adaptor proteins orchestrate dynamic nuclear transport processes involved in cellular homeostasis. Here, we show that importin-α3, one of the main NF-κB transporters, is the most abundantly expressed classical nuclear transport factor in the mammalian respiratory tract. Importin-α3 promoter activity is regulated by TNF-α-induced NF-κB in a concentration-dependent manner. High-level TNF-α-inducing highly pathogenic avian influenza A viruses (HPAIVs) isolated from fatal human cases harboring human-type polymerase signatures (PB2 627K, 701N) significantly downregulate importin-α3 mRNA expression in primary lung cells. Importin-α3 depletion is restored upon back-mutating the HPAIV polymerase into an avian-type signature (PB2 627E, 701D) that can no longer induce high TNF-α levels. Importin-α3-deficient mice show reduced NF-κB-activated antiviral gene expression and increased influenza lethality. Thus, importin-α3 plays a key role in antiviral immunity against influenza. Lifting the bottleneck in importin-α3 availability in the lung might provide a new strategy to combat respiratory virus infections.
    • Characterisation of a stable laboratory co-culture of acidophilic nanoorganisms.

      Krause, Susanne; Bremges, Andreas; Münch, Philipp C; McHardy, Alice C; Gescher, Johannes; Helmholtz Centre for infection research, Inhoffenstr. 7, 38124 Braunschweig, Germany. (2017-06-12)
      This study describes the laboratory cultivation of ARMAN (Archaeal Richmond Mine Acidophilic Nanoorganisms). After 2.5 years of successive transfers in an anoxic medium containing ferric sulfate as an electron acceptor, a consortium was attained that is comprised of two members of the order Thermoplasmatales, a member of a proposed ARMAN group, as well as a fungus. The 16S rRNA identity of one archaeon is only 91.6% compared to the most closely related isolate Thermogymnomonas acidicola. Hence, this organism is the first member of a new genus. The enrichment culture is dominated by this microorganism and the ARMAN. The third archaeon in the community seems to be present in minor quantities and has a 100% 16S rRNA identity to the recently isolated Cuniculiplasma divulgatum. The enriched ARMAN species is most probably incapable of sugar metabolism because the key genes for sugar catabolism and anabolism could not be identified in the metagenome. Metatranscriptomic analysis suggests that the TCA cycle funneled with amino acids is the main metabolic pathway used by the archaea of the community. Microscopic analysis revealed that growth of the ARMAN is supported by the formation of cell aggregates. These might enable feeding of the ARMAN by or on other community members.
    • Computational prediction of vaccine strains for human influenza A (H3N2) viruses.

      Steinbrück, L; Klingen, T R; McHardy, A C; Helmholtz Centre for infection research, Inhoffenstr. 7, 38124 Braunschweig, Germany. (2014-10)
      Human influenza A viruses are rapidly evolving pathogens that cause substantial morbidity and mortality in seasonal epidemics around the globe. To ensure continued protection, the strains used for the production of the seasonal influenza vaccine have to be regularly updated, which involves data collection and analysis by numerous experts worldwide. Computer-guided analysis is becoming increasingly important in this problem due to the vast amounts of generated data. We here describe a computational method for selecting a suitable strain for production of the human influenza A virus vaccine. It interprets available antigenic and genomic sequence data based on measures of antigenic novelty and rate of propagation of the viral strains throughout the population. For viral isolates sampled between 2002 and 2007, we used this method to predict the antigenic evolution of the H3N2 viruses in retrospective testing scenarios. When seasons were scored as true or false predictions, our method returned six true positives, three false negatives, eight true negatives, and one false positive, or 78% accuracy overall. In comparison to the recommendations by the WHO, we identified the correct antigenic variant once at the same time and twice one season ahead. Even though it cannot be ruled out that practical reasons such as lack of a sufficiently well-growing candidate strain may in some cases have prevented recommendation of the best-matching strain by the WHO, our computational decision procedure allows quantitative interpretation of the growing amounts of data and may help to match the vaccine better to predominating strains in seasonal influenza epidemics. Importance: Human influenza A viruses continuously change antigenically to circumvent the immune protection evoked by vaccination or previously circulating viral strains. To maintain vaccine protection and thereby reduce the mortality and morbidity caused by infections, regular updates of the vaccine strains are required. We have developed a data-driven framework for vaccine strain prediction which facilitates the computational analysis of genetic and antigenic data and does not rely on explicit evolutionary models. Our computational decision procedure generated good matches of the vaccine strain to the circulating predominant strain for most seasons and could be used to support the expert-guided prediction made by the WHO; it thus may allow an increase in vaccine efficacy.
    • Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research.

      Hufsky, Franziska; Lamkiewicz, Kevin; Almeida, Alexandre; Aouacheria, Abdel; Arighi, Cecilia; Bateman, Alex; Baumbach, Jan; Beerenwinkel, Niko; Brandt, Christian; Cacciabue, Marco; et al. (Oxford Academic, 2020-11-04)
      SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories.
    • Coupling of diversification and pH adaptation during the evolution of terrestrial Thaumarchaeota.

      Gubry-Rangin, Cécile; Kratsch, Christina; Williams, Tom A; McHardy, Alice C; Embley, T Martin; Prosser, James I; Macqueen, Daniel J; Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen AB24 2TZ, United Kingdom (2015-07-28)
      The Thaumarchaeota is an abundant and ubiquitous phylum of archaea that plays a major role in the global nitrogen cycle. Previous analyses of the ammonia monooxygenase gene amoA suggest that pH is an important driver of niche specialization in these organisms. Although the ecological distribution and ecophysiology of extant Thaumarchaeota have been studied extensively, the evolutionary rise of these prokaryotes to ecological dominance in many habitats remains poorly understood. To characterize processes leading to their diversification, we investigated coevolutionary relationships between amoA, a conserved marker gene for Thaumarchaeota, and soil characteristics, by using deep sequencing and comprehensive environmental data in Bayesian comparative phylogenetics. These analyses reveal a large and rapid increase in diversification rates during early thaumarchaeotal evolution; this finding was verified by independent analyses of 16S rRNA. Our findings suggest that the entire Thaumarchaeota diversification regime was strikingly coupled to pH adaptation but less clearly correlated with several other tested environmental factors. Interestingly, the early radiation event coincided with a period of pH adaptation that enabled the terrestrial Thaumarchaeota ancestor to initially move from neutral to more acidic and alkaline conditions. In contrast to classic evolutionary models, whereby niches become rapidly filled after adaptive radiation, global diversification rates have remained stably high in Thaumarchaeota during the past 400-700 million years, suggesting an ongoing high rate of niche formation or switching for these microbes. Our study highlights the enduring importance of environmental adaptation during thaumarchaeotal evolution and, to our knowledge, is the first to link evolutionary diversification to environmental adaptation in a prokaryotic phylum.
    • Determination of antigenicity-altering patches on the major surface protein of human influenza A/H3N2 viruses.

      Kratsch, Christina; Klingen, Thorsten R; Mümken, Linda; Steinbrück, Lars; McHardy, Alice Carolyn; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2016-01)
      Human influenza viruses are rapidly evolving RNA viruses that cause short-term respiratory infections with substantial morbidity and mortality in annual epidemics. Uncovering the general principles of viral coevolution with human hosts is important for pathogen surveillance and vaccine design. Protein regions are an appropriate model for the interactions between two macromolecules, but the currently used epitope definition for the major antigen of influenza viruses, namely hemagglutinin, is very broad. Here, we combined genetic, evolutionary, antigenic, and structural information to determine the most relevant regions of the hemagglutinin of human influenza A/H3N2 viruses for interaction with human immunoglobulins. We estimated the antigenic weights of amino acid changes at individual sites from hemagglutination inhibition data using antigenic tree inference followed by spatial clustering of antigenicity-altering protein sites on the protein structure. This approach determined six relevant areas (patches) for antigenic variation that had a key role in the past antigenic evolution of the viruses. Previous transitions between successive predominating antigenic types of H3N2 viruses always included amino acid changes in either the first or second antigenic patch. Interestingly, there was only partial overlap between the antigenic patches and the patches under strong positive selection. Therefore, besides alterations of antigenicity, other interactions with the host may shape the evolution of human influenza A/H3N2 viruses.
    • EDEN: evolutionary dynamics within environments.

      Münch, Philipp C; Stecher, Bärbel; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Oxford Academic, 2017-10-15)
      Metagenomics revolutionized the field of microbial ecology, giving access to Gb-sized datasets of microbial communities under natural conditions. This enables fine-grained analyses of the functions of community members, studies of their association with phenotypes and environments, as well as of their microevolution and adaptation to changing environmental conditions. However, phylogenetic methods for studying adaptation and evolutionary dynamics are not able to cope with big data. EDEN is the first software for the rapid detection of protein families and regions under positive selection, as well as their associated biological processes, from meta- and pangenome data. It provides an interactive result visualization for detailed comparative analyses. Availability and implementation: EDEN is available as a Docker installation under the GPL 3.0 license, allowing its use on common operating systems, at http://www.github.com/hzi-bifo/eden.
    • Eleven grand challenges in single-cell data science.

      Lähnemann, David; Köster, Johannes; Szczurek, Ewa; McCarthy, Davis J; Hicks, Stephanie C; Robinson, Mark D; Vallejos, Catalina A; Campbell, Kieran R; Beerenwinkel, Niko; Mahfouz, Ahmed; et al. (BMC, 2020-02-07)
      The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
    • Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses.

      Deng, Zhi-Luo; Dhingra, Akshay; Fritz, Adrian; Götting, Jasper; Münch, Philipp C; Steinbrück, Lars; Schulz, Thomas F; Ganzenmüller, Tina; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Oxford University Press (OUP), 2020-07-07)
      Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a ‘G.G’ context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.
    • Evolution of 2009 H1N1 influenza viruses during the pandemic correlates with increased viral pathogenicity and transmissibility in the ferret model.

      Otte, Anna; Marriott, Anthony C; Dreier, Carola; Dove, Brian; Mooren, Kyra; Klingen, Thorsten R; Sauter, Martina; Thompson, Katy-Anne; Bennett, Allan; Klingel, Karin; et al. (2016)
      There is increasing evidence that 2009 pandemic H1N1 influenza viruses have evolved after pandemic onset giving rise to severe epidemics in subsequent waves. However, it still remains unclear which viral determinants might have contributed to disease severity after pandemic initiation. Here, we show that distinct mutations in the 2009 pandemic H1N1 virus genome have occurred with increased frequency after pandemic declaration. Among those, a mutation in the viral hemagglutinin was identified that increases 2009 pandemic H1N1 virus binding to human-like α2,6-linked sialic acids. Moreover, these mutations conferred increased viral replication in the respiratory tract and elevated respiratory droplet transmission between ferrets. Thus, our data show that 2009 H1N1 influenza viruses have evolved after pandemic onset giving rise to novel virus variants that enhance viral replicative fitness and respiratory droplet transmission in a mammalian animal model. These findings might help to improve surveillance efforts to assess the pandemic risk by emerging influenza viruses.
    • Evolutionary model for the unequal segregation of high copy plasmids.

      Münch, Karin; Münch, Richard; Biedendieck, Rebekka; Jahn, Dieter; Müller, Johannes; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (PLOS, 2019-01-01)
      Plasmids are extrachromosomal DNA elements of microorganisms encoding beneficial genetic information. They were thought to be equally distributed to daughter cells during cell division. Here we use mathematical modeling to investigate the evolutionary stability of plasmid segregation for high-copy plasmids (plasmids that are present in up to several hundred copies per cell) carrying antibiotic resistance genes. Evolutionarily stable strategies (ESS) are determined by numerical analysis of a plasmid-load structured population model. The theory predicts that the evolutionarily stable segregation strategy of a cell depends on the plasmid copy number: for low and medium plasmid load, both daughters receive on average an equal share of plasmids, while in the case of high plasmid load, one daughter obtains distinctively and systematically more plasmids. These findings are in good agreement with recent experimental results. We discuss the interpretation and practical consequences.
    • Evolutionary Stabilization of Cooperative Toxin Production through a Bacterium-Plasmid-Phage Interplay.

      Spriewald, Stefanie; Stadler, Eva; Hense, Burkhard A; Münch, Philipp C; McHardy, Alice C; Weiss, Anna S; Obeng, Nancy; Müller, Johannes; Stecher, Bärbel; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (ASM, 2020-07-21)
      Colicins are toxins produced and released by Enterobacteriaceae to kill competitors in the gut. While group A colicins employ a division of labor strategy to liberate the toxin into the environment via colicin-specific lysis, group B colicin systems lack cognate lysis genes. In Salmonella enterica serovar Typhimurium (S. Tm), the group B colicin Ib (ColIb) is released by temperate phage-mediated bacteriolysis. Phage-mediated ColIb release promotes S. Tm fitness against competing Escherichia coli. It remained unclear how prophage-mediated lysis is realized in a clonal population of ColIb producers and if prophages contribute to evolutionary stability of toxin release in S. Tm. Here, we show that prophage-mediated lysis occurs in an S. Tm subpopulation only, thereby introducing phenotypic heterogeneity to the system. We established a mathematical model to study the dynamic interplay of S. Tm, ColIb, and a temperate phage in the presence of a competing species. Using this model, we studied long-term evolution of phage lysis rates in a fluctuating infection scenario. This revealed that phage lysis evolves as a bet-hedging strategy that maximizes phage spread, regardless of whether colicin is present or not. We conclude that the ColIb system, lacking its own lysis gene, is making use of the evolutionarily stable phage strategy to be released. Prophage lysis genes are highly prevalent in nontyphoidal Salmonella genomes. This suggests that the release of ColIb by temperate phages is widespread. In conclusion, our findings shed new light on the evolution and ecology of group B colicin systems. IMPORTANCE: Bacteria are excellent model organisms to study mechanisms of social evolution. The production of public goods, e.g., toxin release by cell lysis in clonal bacterial populations, is a frequently studied example of cooperative behavior. Here, we analyze evolutionary stabilization of toxin release by the enteric pathogen Salmonella. The release of colicin Ib (ColIb), which is used by Salmonella to gain an edge against competing microbiota following infection, is coupled to bacterial lysis mediated by temperate phages. Here, we show that phage-dependent lysis and subsequent release of colicin and phage particles occurs only in part of the ColIb-expressing Salmonella population. This phenotypic heterogeneity in lysis, which represents an essential step in the temperate phage life cycle, has evolved as a bet-hedging strategy under fluctuating environments such as the gastrointestinal tract. Our findings suggest that prophages can thereby evolutionarily stabilize costly toxin release in bacterial populations.
    • A Fréchet tree distance measure to compare phylogeographic spread paths across trees.

      Reimering, Susanne; Muñoz, Sebastian; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Nature Publishing Group, 2018-11-19)
      Phylogeographic methods reconstruct the origin and spread of taxa by inferring locations for internal nodes of the phylogenetic tree from sampling locations of genetic sequences. This is commonly applied to study pathogen outbreaks and spread. To evaluate such reconstructions, the inferred spread paths from root to leaf nodes should be compared to other methods or references. Usually, ancestral state reconstructions are evaluated by node-wise comparisons, therefore requiring the same tree topology, which is usually unknown. Here, we present a method for comparing phylogeographies across different trees inferred from the same taxa. We compare paths of locations by calculating discrete Fréchet distances. By correcting the distances by the number of paths going through a node, we define the Fréchet tree distance as a distance measure between phylogeographies. As an application, we compare phylogeographic spread patterns on trees inferred with different methods from hemagglutinin sequences of H5N1 influenza viruses, finding that both tree inference and ancestral reconstruction cause variation in phylogeographic spread that is not directly reflected by topological differences. The method is suitable for comparing phylogeographies inferred with different tree or phylogeographic inference methods to each other or to a known ground truth, thus enabling a quality assessment of such techniques.
    • From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer.

      Weimann, Aaron; Mooren, Kyra; Frank, Jeremy; Pope, Phillip B; Bremges, Andreas; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (2017-01-31)
      The number of sequenced genomes is growing exponentially, profoundly shifting the bottleneck from data generation to genome interpretation. Traits are often used to characterize and distinguish bacteria and are likely a driving factor in microbial community composition, yet little is known about the traits of most microbes. We describe Traitar, the microbial trait analyzer, which is a fully automated software package for deriving phenotypes from a genome sequence. Traitar provides phenotype classifiers to predict 67 traits related to the use of various substrates as carbon and energy sources, oxygen requirement, morphology, antibiotic susceptibility, proteolysis, and enzymatic activities. Furthermore, it suggests protein families associated with the presence of particular phenotypes. Our method uses L1-regularized L2-loss support vector machines for phenotype assignments based on phyletic patterns of protein families and their evolutionary histories across a diverse set of microbial species. We demonstrate reliable phenotype assignment for Traitar to bacterial genomes from 572 species of eight phyla, also based on incomplete single-cell genomes and simulated draft genomes. We also showcase its application in metagenomics by verifying and complementing a manual metabolic reconstruction of two novel Clostridiales species based on draft genomes recovered from commercial biogas reactors. Traitar is available at https://github.com/hzi-bifo/traitar. IMPORTANCE: Bacteria are ubiquitous in our ecosystem and have a major impact on human health, e.g., by supporting digestion in the human gut. Bacterial communities can also aid in biotechnological processes such as wastewater treatment or decontamination of polluted soils. Diverse bacteria contribute with their unique capabilities to the functioning of such ecosystems, but lab experiments to investigate those capabilities are labor-intensive. Major advances in sequencing techniques open up the opportunity to study bacteria by their genome sequences. For this purpose, we have developed Traitar, software that predicts traits of bacteria on the basis of their genomes. It is applicable to studies with tens or hundreds of bacterial genomes. Traitar may help researchers in microbiology to pinpoint the traits of interest, reducing the amount of wet lab work required.
    • Functional omics analyses reveal only minor effects of microRNAs on human somatic stem cell differentiation.

      Schira-Heinen, Jessica; Czapla, Agathe; Hendricks, Marion; Kloetgen, Andreas; Wruck, Wasco; Adjaye, James; Kögler, Gesine; Werner Müller, Hans; Stühler, Kai; Trompeter, Hans-Ingo; et al. (NPG, 2020-02-24)
      The contribution of microRNA-mediated posttranscriptional regulation on the final proteome in differentiating cells remains elusive. Here, we evaluated the impact of microRNAs (miRNAs) on the proteome of human umbilical cord blood-derived unrestricted somatic stem cells (USSC) during retinoic acid (RA) differentiation by a systemic approach using next generation sequencing analysing mRNA and miRNA expression and quantitative mass spectrometry-based proteome analyses. Interestingly, regulation of mRNAs and their dedicated proteins highly correlated during RA-incubation. Additionally, RA-induced USSC demonstrated a clear separation from native USSC thereby shifting from a proliferating to a metabolic phenotype. Bioinformatic integration of up- and downregulated miRNAs and proteins initially implied a strong impact of the miRNome on the XXL-USSC proteome. However, quantitative proteome analysis of the miRNA contribution on the final proteome after ectopic overexpression of downregulated miR-27a-5p and miR-221-5p or inhibition of upregulated miR-34a-5p, respectively, followed by RA-induction revealed only minor proportions of differentially abundant proteins. In addition, only small overlaps of these regulated proteins with inversely abundant proteins in non-transfected RA-treated USSC were observed. Hence, mRNA transcription rather than miRNA-mediated regulation is the driving force for protein regulation upon RA-incubation, strongly suggesting that miRNAs are fine-tuning regulators rather than active primary switches during RA-induction of USSC.