• Needs for an Integration of Specific Data Sources and Items - First Insights of a National Survey Within the German Center for Infection Research.

      Jakob, Carolin E M; Stecher, Melanie; Fuhrmann, Sandra; Wingen-Heimann, Sebastian; Heinen, Stephanie; Anton, Gabriele; Behnke, Michael; Behrends, Uta; Boeker, Martin; Castell, Stefanie; et al. (IOS Press, 2021-05-24)
      State-subsidized programs develop medical data integration centers in Germany. To get infectious disease (ID) researchers involved in the process of data sharing, common interests and minimum data requirements were prioritized. In 06/2019, we initiated the German Infectious Disease Data Exchange (iDEx) project. We developed and performed an online survey to determine prioritization of requests for data integration and exchange in ID research. The survey was designed with three sub-surveys, including a ranking of 15 data categories and 184 specific data items and a query of 51 available data collecting systems. A total of 84 researchers from 17 fields of ID research participated in the survey (predominant research fields: gastrointestinal infections n=11, healthcare-associated and antibiotic-resistant infections n=10, hepatitis n=10). 48% (40/84) of participants had experience as a medical doctor. The three top ranked data categories were microbiology and parasitology, experimental data, and medication (53%, 52%, and 47% of maximal points, respectively). The most relevant data items for these categories were bloodstream infections, availability of biomaterial, and medication (88%, 87%, and 94% of maximal points, respectively). The ranking of requests of data integration and exchange is diverse and depends on the chosen measure. However, there is a need to promote discipline-related digitalization and data exchange.
    • Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit.

      Meyer, Fernando; Lesker, Till-Robin; Koslicki, David; Fritz, Adrian; Gurevich, Alexey; Darling, Aaron E; Sczyrba, Alexander; Bremges, Andreas; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (Nature Research, 2021-03-01)
      Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
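
      As an illustration of the kind of binning metrics such a tutorial covers, the following minimal sketch computes per-bin purity and completeness from a gold-standard mapping. It is not the CAMI toolkit's code; the function name `bin_quality`, the input dictionaries, and the toy contigs are hypothetical, and the definitions used here (base pairs of the dominant genome divided by bin size or by genome size) are one common convention assumed for this example.

```python
from collections import Counter, defaultdict


def bin_quality(assignments, true_genome, length):
    """Per-bin purity and completeness (illustrative definitions).

    assignments: contig -> predicted bin (unbinned contigs are simply absent)
    true_genome: contig -> gold-standard genome
    length:      contig -> length in base pairs
    """
    # Total size of each gold-standard genome, counting all of its contigs.
    genome_size = Counter()
    for contig, genome in true_genome.items():
        genome_size[genome] += length[contig]

    # Base pairs contributed by each genome to each predicted bin.
    per_bin = defaultdict(Counter)
    for contig, bin_id in assignments.items():
        per_bin[bin_id][true_genome[contig]] += length[contig]

    result = {}
    for bin_id, counts in per_bin.items():
        genome, bp = counts.most_common(1)[0]  # dominant genome in this bin
        result[bin_id] = {
            "genome": genome,
            "purity": bp / sum(counts.values()),
            "completeness": bp / genome_size[genome],
        }
    return result


# Toy gold standard: genome gA has 4000 bp, genome gB has 4000 bp.
length = {"c1": 1000, "c2": 2000, "c3": 1000, "c4": 3000, "c5": 1000}
true_genome = {"c1": "gA", "c2": "gA", "c3": "gA", "c4": "gB", "c5": "gB"}
# Toy prediction: c3 is left unbinned, c5 is placed in the wrong bin.
assignments = {"c1": "bin1", "c2": "bin1", "c5": "bin1", "c4": "bin2"}

quality = bin_quality(assignments, true_genome, length)
for bin_id in sorted(quality):
    print(bin_id, quality[bin_id])
```

      In this toy case, bin1 is contaminated by one contig of gB and misses one contig of gA, so both its purity and its completeness fall below 1, while bin2 is pure but incomplete.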
    • Hepatitis C reference viruses highlight potent antibody responses and diverse viral functional interactions with neutralising antibodies.

      Bankwitz, Dorothea; Bahai, Akash; Labuhn, Maurice; Doepke, Mandy; Ginkel, Corinne; Khera, Tanvi; Todt, Daniel; Ströh, Luisa J; Dold, Leona; Klein, Florian; et al. (BMJ Publishing Group, 2020-12-15)
      Community-acquired pneumonia by primary or superinfections with Streptococcus pneumoniae can lead to acute respiratory distress requiring mechanical ventilation. The pore-forming toxin pneumolysin alters the alveolar-capillary barrier and causes extravasation of protein-rich fluid into the interstitial pulmonary tissue, which impairs gas exchange. Platelets usually prevent endothelial leakage in inflamed pulmonary tissue by sealing inflammation-induced endothelial gaps. We not only confirm that S pneumoniae induces CD62P expression in platelets, but we also show that, in the presence of pneumolysin, CD62P expression is not associated with platelet activation. Pneumolysin induces pores in the platelet membrane, which allow anti-CD62P antibodies to stain the intracellular CD62P without platelet activation. Pneumolysin treatment also results in calcium efflux, increase in light transmission by platelet lysis (not aggregation), loss of platelet thrombus formation in the flow chamber, and loss of pore-sealing capacity of platelets in the Boyden chamber. Specific anti-pneumolysin monoclonal and polyclonal antibodies inhibit these effects of pneumolysin on platelets as do polyvalent human immunoglobulins. In a post hoc analysis of the prospective randomized phase 2 CIGMA trial, we show that administration of a polyvalent immunoglobulin preparation was associated with a nominally higher platelet count and nominally improved survival in patients with severe S pneumoniae-related community-acquired pneumonia. Although, due to the low number of patients, no definitive conclusion can be made, our findings provide a rationale for investigation of pharmacologic immunoglobulin preparations to target pneumolysin by polyvalent immunoglobulin preparations in severe community-acquired pneumococcal pneumonia, to counteract the risk of these patients becoming ventilation dependent. This trial was registered at www.clinicaltrials.gov as #NCT01420744.
    • Longitudinal Multi-omics Analyses Identify Responses of Megakaryocytes, Erythroid Cells, and Plasmablasts as Hallmarks of Severe COVID-19.

      Bernardes, Joana P; Mishra, Neha; Tran, Florian; Bahmer, Thomas; Best, Lena; Blase, Johanna I; Bordoni, Dora; Franzenburg, Jeanette; Geisen, Ulf; Josephs-Spaulding, Jonathan; et al. (Elsevier (Cell Press), 2020-11-26)
      Temporal resolution of cellular features associated with a severe COVID-19 disease trajectory is needed for understanding skewed immune responses and defining predictors of outcome. Here, we performed a longitudinal multi-omics study using a two-center cohort of 14 patients. We analyzed the bulk transcriptome, bulk DNA methylome, and single-cell transcriptome (>358,000 cells, including BCR profiles) of peripheral blood samples harvested from up to 5 time points. Validation was performed in two independent cohorts of COVID-19 patients. Severe COVID-19 was characterized by an increase of proliferating, metabolically hyperactive plasmablasts. Coinciding with critical illness, we also identified an expansion of interferon-activated circulating megakaryocytes and increased erythropoiesis with features of hypoxic signaling. Megakaryocyte- and erythroid-cell-derived co-expression modules were predictive of fatal disease outcome. The study demonstrates broad cellular effects of SARS-CoV-2 infection beyond adaptive immune cells and provides an entry point toward developing biomarkers and targeted treatments of patients with COVID-19.
    • Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research.

      Hufsky, Franziska; Lamkiewicz, Kevin; Almeida, Alexandre; Aouacheria, Abdel; Arighi, Cecilia; Bateman, Alex; Baumbach, Jan; Beerenwinkel, Niko; Brandt, Christian; Cacciabue, Marco; et al. (Oxford Academic, 2020-11-04)
      SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories.
    • Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment.

      Schulte-Schrepping, Jonas; Reusch, Nico; Paclik, Daniela; Baßler, Kevin; Schlickeiser, Stephan; Zhang, Bowen; Krämer, Benjamin; Krammer, Tobias; Brumhard, Sophia; Bonaguro, Lorenzo; et al. (Elsevier/Cell Press, 2020-08-05)
      Coronavirus disease 2019 (COVID-19) is a mild to moderate respiratory tract infection; however, a subset of patients progress to severe disease and respiratory failure. The mechanism of protective immunity in mild forms and the pathogenesis of severe COVID-19 associated with increased neutrophil counts and dysregulated immune responses remain unclear. In a dual-center, two-cohort study, we combined single-cell RNA-sequencing and single-cell proteomics of whole-blood and peripheral-blood mononuclear cells to determine changes in immune cell composition and activation in mild versus severe COVID-19 (242 samples from 109 individuals) over time. HLA-DRhiCD11chi inflammatory monocytes with an interferon-stimulated gene signature were elevated in mild COVID-19. Severe COVID-19 was marked by occurrence of neutrophil precursors, as evidence of emergency myelopoiesis, dysfunctional mature neutrophils, and HLA-DRlo monocytes. Our study provides detailed insights into the systemic immune response to SARS-CoV-2 infection and reveals profound alterations in the myeloid cell compartment associated with severe COVID-19.
    • Evolutionary Stabilization of Cooperative Toxin Production through a Bacterium-Plasmid-Phage Interplay.

      Spriewald, Stefanie; Stadler, Eva; Hense, Burkhard A; Münch, Philipp C; McHardy, Alice C; Weiss, Anna S; Obeng, Nancy; Müller, Johannes; Stecher, Bärbel; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (ASM, 2020-07-21)
      Colicins are toxins produced and released by Enterobacteriaceae to kill competitors in the gut. While group A colicins employ a division of labor strategy to liberate the toxin into the environment via colicin-specific lysis, group B colicin systems lack cognate lysis genes. In Salmonella enterica serovar Typhimurium (S. Tm), the group B colicin Ib (ColIb) is released by temperate phage-mediated bacteriolysis. Phage-mediated ColIb release promotes S. Tm fitness against competing Escherichia coli. It remained unclear how prophage-mediated lysis is realized in a clonal population of ColIb producers and if prophages contribute to evolutionary stability of toxin release in S. Tm. Here, we show that prophage-mediated lysis occurs in an S. Tm subpopulation only, thereby introducing phenotypic heterogeneity to the system. We established a mathematical model to study the dynamic interplay of S. Tm, ColIb, and a temperate phage in the presence of a competing species. Using this model, we studied long-term evolution of phage lysis rates in a fluctuating infection scenario. This revealed that phage lysis evolves as a bet-hedging strategy that maximizes phage spread, regardless of whether colicin is present or not. We conclude that the ColIb system, lacking its own lysis gene, is making use of the evolutionarily stable phage strategy to be released. Prophage lysis genes are highly prevalent in nontyphoidal Salmonella genomes. This suggests that the release of ColIb by temperate phages is widespread. In conclusion, our findings shed new light on the evolution and ecology of group B colicin systems. IMPORTANCE: Bacteria are excellent model organisms to study mechanisms of social evolution. The production of public goods, e.g., toxin release by cell lysis in clonal bacterial populations, is a frequently studied example of cooperative behavior. Here, we analyze evolutionary stabilization of toxin release by the enteric pathogen Salmonella. The release of colicin Ib (ColIb), which is used by Salmonella to gain an edge against competing microbiota following infection, is coupled to bacterial lysis mediated by temperate phages. Here, we show that phage-dependent lysis and subsequent release of colicin and phage particles occurs only in part of the ColIb-expressing Salmonella population. This phenotypic heterogeneity in lysis, which represents an essential step in the temperate phage life cycle, has evolved as a bet-hedging strategy under fluctuating environments such as the gastrointestinal tract. Our findings suggest that prophages can thereby evolutionarily stabilize costly toxin release in bacterial populations.
    • Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses

      Deng, Zhi-Luo; Dhingra, Akshay; Fritz, Adrian; Götting, Jasper; Münch, Philipp C; Steinbrück, Lars; Schulz, Thomas F; Ganzenmüller, Tina; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (Oxford University Press (OUP), 2020-07-07)
      Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a ‘G.G’ context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.
    • YBX1 Indirectly Targets Heterochromatin-Repressed Inflammatory Response-Related Apoptosis Genes through Regulating CBX5 mRNA.

      Kloetgen, Andreas; Duggimpudi, Sujitha; Schuschel, Konstantin; Hezaveh, Kebria; Picard, Daniel; Schaal, Heiner; Remke, Marc; Klusmann, Jan-Henning; Borkhardt, Arndt; McHardy, Alice C; et al. (MDPI, 2020-06-23)
      Medulloblastomas arise from undifferentiated precursor cells in the cerebellum and account for about 20% of all solid brain tumors during childhood; standard therapies include radiation and chemotherapy, which oftentimes come with severe impairment of the cognitive development of the young patients. Here, we show that the posttranscriptional regulator Y-box binding protein 1 (YBX1), a DNA- and RNA-binding protein, acts as an oncogene in medulloblastomas by regulating cellular survival and apoptosis. We observed different cellular responses upon YBX1 knockdown in several medulloblastoma cell lines, with significantly altered transcription and subsequent apoptosis rates. Mechanistically, PAR-CLIP for YBX1 and integration with RNA-Seq data uncovered direct posttranscriptional control of the heterochromatin-associated gene CBX5; upon YBX1 knockdown and subsequent CBX5 mRNA instability, heterochromatin-regulated genes involved in inflammatory response, apoptosis and death receptor signaling were de-repressed. Thus, YBX1 acts as an oncogene in medulloblastoma through indirect transcriptional regulation of inflammatory genes regulating apoptosis and represents a promising novel therapeutic target in this tumor entity.
    • Functional omics analyses reveal only minor effects of microRNAs on human somatic stem cell differentiation.

      Schira-Heinen, Jessica; Czapla, Agathe; Hendricks, Marion; Kloetgen, Andreas; Wruck, Wasco; Adjaye, James; Kögler, Gesine; Werner Müller, Hans; Stühler, Kai; Trompeter, Hans-Ingo; et al. (NPG, 2020-02-24)
      The contribution of microRNA-mediated posttranscriptional regulation on the final proteome in differentiating cells remains elusive. Here, we evaluated the impact of microRNAs (miRNAs) on the proteome of human umbilical cord blood-derived unrestricted somatic stem cells (USSC) during retinoic acid (RA) differentiation by a systemic approach using next generation sequencing analysing mRNA and miRNA expression and quantitative mass spectrometry-based proteome analyses. Interestingly, regulation of mRNAs and their dedicated proteins highly correlated during RA-incubation. Additionally, RA-induced USSC demonstrated a clear separation from native USSC thereby shifting from a proliferating to a metabolic phenotype. Bioinformatic integration of up- and downregulated miRNAs and proteins initially implied a strong impact of the miRNome on the XXL-USSC proteome. However, quantitative proteome analysis of the miRNA contribution on the final proteome after ectopic overexpression of downregulated miR-27a-5p and miR-221-5p or inhibition of upregulated miR-34a-5p, respectively, followed by RA-induction revealed only minor proportions of differentially abundant proteins. In addition, only small overlaps of these regulated proteins with inversely abundant proteins in non-transfected RA-treated USSC were observed. Hence, mRNA transcription rather than miRNA-mediated regulation is the driving force for protein regulation upon RA-incubation, strongly suggesting that miRNAs are fine-tuning regulators rather than active primary switches during RA-induction of USSC.
    • Eleven grand challenges in single-cell data science.

      Lähnemann, David; Köster, Johannes; Szczurek, Ewa; McCarthy, Davis J; Hicks, Stephanie C; Robinson, Mark D; Vallejos, Catalina A; Campbell, Kieran R; Beerenwinkel, Niko; Mahfouz, Ahmed; et al. (BMC, 2020-02-07)
      The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
    • Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic.

      Reimering, Susanne; Muñoz, Sebastian; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (PLOS, 2020-02-01)
      Influenza A viruses cause seasonal epidemics and occasional pandemics in the human population. While the worldwide circulation of seasonal influenza is at least partly understood, the exact migration patterns between countries, states or cities are not well studied. Here, we use the Sankoff algorithm for parsimonious phylogeographic reconstruction together with effective distances based on a worldwide air transportation network. By first simulating geographic spread and then phylogenetic trees and genetic sequences, we confirmed that reconstructions with effective distances inferred phylogeographic spread more accurately than reconstructions with geographic distances and Bayesian reconstructions with BEAST that do not use any distance information, and led to comparable results to the Bayesian reconstruction using distance information via a generalized linear model. Our method extends Bayesian methods that estimate rates from the data by using fine-grained locations like airports and inferring intermediate locations not observed among sampled isolates. When applied to sequence data of the pandemic H1N1 influenza A virus in 2009, our approach correctly inferred the origin and proposed airports mainly involved in the spread of the virus. In the case of a novel outbreak, this approach allows rapid analysis of sequence data and inference of origin and spread routes to improve disease surveillance and control.
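
      The Sankoff algorithm named in this abstract can be sketched as follows. This is a generic textbook version, not the authors' implementation: the tree, the three location codes, and the cost values (standing in for effective distances) are invented for illustration, and the function name `sankoff` is hypothetical.

```python
from math import inf


def sankoff(tree, root, leaf_states, states, cost):
    """Return (minimum total cost, {node: state}) for a rooted tree.

    tree:        internal node -> list of children
    leaf_states: leaf -> observed state (here: a location)
    cost:        cost[a][b] = cost of a change from state a to state b
    """
    score = {}  # score[node][s] = minimum cost of the subtree if node is in state s

    def up(node):
        children = tree.get(node, [])
        if not children:
            # Leaf: only the observed state is allowed.
            score[node] = {s: (0.0 if s == leaf_states[node] else inf) for s in states}
            return
        for child in children:
            up(child)
        # Internal node: each child independently picks its cheapest state.
        score[node] = {
            s: sum(min(cost[s][t] + score[c][t] for t in states) for c in children)
            for s in states
        }

    def down(node, state, assignment):
        # Traceback: fix the parent's state, then choose the best child states.
        assignment[node] = state
        for child in tree.get(node, []):
            best = min(states, key=lambda t: cost[state][t] + score[child][t])
            down(child, best, assignment)

    up(root)
    root_state = min(states, key=lambda s: score[root][s])
    assignment = {}
    down(root, root_state, assignment)
    return score[root][root_state], assignment


# Hypothetical example: three locations with made-up pairwise costs.
states = ["MEX", "JFK", "FRA"]
cost = {
    "MEX": {"MEX": 0.0, "JFK": 1.0, "FRA": 3.0},
    "JFK": {"MEX": 1.0, "JFK": 0.0, "FRA": 2.0},
    "FRA": {"MEX": 3.0, "JFK": 2.0, "FRA": 0.0},
}
tree = {"root": ["n1", "C"], "n1": ["A", "B"]}
leaf_states = {"A": "MEX", "B": "JFK", "C": "FRA"}

total_cost, locations = sankoff(tree, "root", leaf_states, states, cost)
print(total_cost, locations)
```

      The upward pass stores, for every node and candidate location, the cheapest cost of explaining the leaves below it; the downward pass then reads off one optimal labeling of the internal nodes. Ties between equally cheap labelings are broken by the order of `states` in this sketch.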
    • CAMITAX: Taxon labels for microbial genomes.

      Bremges, Andreas; Fritz, Adrian; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (Oxford Academic, 2020-01-01)
      BACKGROUND: The number of microbial genome sequences is increasing exponentially, especially thanks to recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to genomes is key and often a prerequisite for downstream analyses. FINDINGS: We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance-, 16S ribosomal RNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project, and we show that the ensemble classification method in CAMITAX improved on all individual methods across tested ranks. CONCLUSIONS: While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under Apache License 2.0 at https://github.com/CAMI-challenge/CAMITAX.
    • The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

      Zhou, Naihui; Jiang, Yuxiang; Bergquist, Timothy R; Lee, Alexandra J; Kacsoh, Balint Z; Crocker, Alex W; Lewis, Kimberley A; Georghiou, George; Nguyen, Huy N; Hamid, Md Nafiz; et al. (BMC, 2019-11-19)
      BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
    • Pediatric ALL relapses after allo-SCT show high individuality, clonal dynamics, selective pressure, and druggable targets.

      Hoell, Jessica I; Ginzel, Sebastian; Kuhlen, Michaela; Kloetgen, Andreas; Gombert, Michael; Fischer, Ute; Hein, Daniel; Demir, Salih; Stanulla, Martin; Schrappe, Martin; et al. (American Society of Hematology, 2019-10-22)
      Survival of patients with pediatric acute lymphoblastic leukemia (ALL) after allogeneic hematopoietic stem cell transplantation (allo-SCT) is mainly compromised by leukemia relapse, carrying dismal prognosis. As novel individualized therapeutic approaches are urgently needed, we performed whole-exome sequencing of leukemic blasts of 10 children with post-allo-SCT relapses with the aim of thoroughly characterizing the mutational landscape and identifying druggable mutations. We found that post-allo-SCT ALL relapses display highly diverse and mostly patient-individual genetic lesions. Moreover, mutational cluster analysis showed substantial clonal dynamics during leukemia progression from initial diagnosis to relapse after allo-SCT. Only very few alterations stayed constant over time. This dynamic clonality was exemplified by the detection of thiopurine resistance-mediating mutations in the nucleotidase NT5C2 in 3 patients' first relapses, which disappeared in the post-allo-SCT relapses on relief of selective pressure of maintenance chemotherapy. Moreover, we identified TP53 mutations in 4 of 10 patients after allo-SCT, reflecting acquired chemoresistance associated with selective pressure of prior antineoplastic treatment. Finally, in 9 of 10 children's post-allo-SCT relapse, we found alterations in genes for which targeted therapies with novel agents are readily available. We could show efficient targeting of leukemic blasts by APR-246 in 2 patients carrying TP53 mutations. Our findings shed light on the genetic basis of post-allo-SCT relapse and may pave the way for unraveling novel therapeutic strategies in this challenging situation.
    • Structures and functions linked to genome-wide adaptation of human influenza A viruses.

      Klingen, Thorsten R; Loers, Jens; Stanelle-Bertram, Stephanie; Gabriel, Gülsah; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (Springer-Nature, 2019-04-18)
      Human influenza A viruses elicit short-term respiratory infections with considerable mortality and morbidity. While H3N2 viruses have circulated for more than 50 years, the recent introduction of pH1N1 viruses presents an excellent opportunity for a comparative analysis of the genome-wide evolutionary forces acting on both subtypes. Here, we inferred patches of sites relevant for adaptation, i.e. being under positive selection, on eleven viral protein structures, from all available data since 1968 and correlated these with known functional properties. Overall, pH1N1 have more patches than H3N2 viruses, especially in the viral polymerase complex, while antigenic evolution is more apparent for H3N2 viruses. In both subtypes, NS1 has the highest patch and patch site frequency, indicating that NS1-mediated viral attenuation of host inflammatory responses is a continuously intensifying process, elevated even in the longtime-circulating subtype H3N2. We confirmed the resistance-causing effects of two pH1N1 changes against oseltamivir in NA activity assays, demonstrating the value of the resource for discovering functionally relevant changes. Our results represent an atlas of protein regions and sites with links to host adaptation, antiviral drug resistance and immune evasion for both subtypes for further study.
    • Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX).

      Asgari, Ehsaneddin; McHardy, Alice C; Mofrad, Mohammad R K; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (Springer Nature, 2019-03-05)
    • Assessing taxonomic metagenome profilers with OPAL.

      Meyer, Fernando; Bremges, Andreas; Belmann, Peter; Janssen, Stefan; McHardy, Alice C; Koslicki, David; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56,38106 Braunschweig, Germany. (BioMedCentral, 2019-03-04)
      The explosive growth in taxonomic metagenome profiling methods over the past years has created a need for systematic comparisons using relevant performance criteria. The Open-community Profiling Assessment tooL (OPAL) implements commonly used performance metrics, including those of the first challenge of the initiative for the Critical Assessment of Metagenome Interpretation (CAMI), together with convenient visualizations. In addition, we perform in-depth performance comparisons with seven profilers on datasets of CAMI and the Human Microbiome Project. OPAL is freely available at https://github.com/CAMI-challenge/OPAL.
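
      To make the idea of comparing a predicted taxonomic profile against a gold standard concrete, here is a minimal sketch that computes a few widely used measures at a single taxonomic rank: presence/absence precision and recall, the L1 norm of the abundance difference, and the Bray-Curtis distance. The abstract does not list which metrics OPAL implements, so this selection is an assumption, and the code is not taken from OPAL; the function name `profile_metrics` and the toy abundances are hypothetical.

```python
def profile_metrics(gold, pred):
    """Compare two profiles given as dicts of taxon -> relative abundance."""
    taxa = set(gold) | set(pred)

    # Presence/absence comparison.
    tp = sum(1 for t in taxa if gold.get(t, 0) > 0 and pred.get(t, 0) > 0)
    fp = sum(1 for t in taxa if gold.get(t, 0) == 0 and pred.get(t, 0) > 0)
    fn = sum(1 for t in taxa if gold.get(t, 0) > 0 and pred.get(t, 0) == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Abundance comparison.
    l1_norm = sum(abs(gold.get(t, 0) - pred.get(t, 0)) for t in taxa)
    total = sum(gold.values()) + sum(pred.values())
    bray_curtis = l1_norm / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "l1_norm": l1_norm,
        "bray_curtis": bray_curtis,
    }


# Toy genus-level profiles (relative abundances summing to 1).
gold = {"Bacteroides": 0.5, "Lactobacillus": 0.3, "Escherichia": 0.2}
pred = {"Bacteroides": 0.6, "Lactobacillus": 0.2, "Clostridium": 0.2}

metrics = profile_metrics(gold, pred)
print(metrics)
```

      Here the prediction misses one genus and reports one that is absent from the gold standard, so precision and recall are both 2/3, while the two abundance-based measures quantify how far the predicted proportions are from the true ones.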
    • CAMISIM: simulating metagenomes and microbial communities.

      Fritz, Adrian; Hofmann, Peter; Majda, Stephan; Dahms, Eik; Dröge, Johannes; Fiedler, Jessika; Lesker, Till R; Belmann, Peter; DeMaere, Matthew Z; Darling, Aaron E; et al. (BioMedCentral, 2019-02-08)
      Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.
    • Toward unrestricted use of public genomic data.

      Amann, Rudolf I; Baichoo, Shakuntala; Blencowe, Benjamin J; Bork, Peer; Borodovsky, Mark; Brooksbank, Cath; Chain, Patrick S G; Colwell, Rita R; Daffonchio, Daniele G; Danchin, Antoine; et al. (AAAS, 2019-01-25)
      Despite some notable progress in data sharing policies and practices, restrictions are still often placed on the open and unconditional use of various genomic data after they have received official approval for release to the public domain or to public databases. These restrictions, which often conflict with the terms and conditions of the funding bodies who supported the release of those data for the benefit of the scientific community and society, are perpetuated by the lack of clear guiding rules for data usage. Existing guidelines for data released to the public domain recognize but fail to resolve tensions between the importance of free and unconditional use of these data and the “right” of the data producers to the first publication. This self-contradiction has resulted in a loophole that allows different interpretations and a continuous debate between data producers and data users on the use of public data. We argue that the publicly available data should be treated as open data, a shared resource with unrestricted use for analysis, interpretation, and publication.