    • Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses.

      Deng, Zhi-Luo; Dhingra, Akshay; Fritz, Adrian; Götting, Jasper; Münch, Philipp C; Steinbrück, Lars; Schulz, Thomas F; Ganzenmüller, Tina; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Oxford University Press (OUP), 2020-07-07)
      Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of closely related, low-divergence viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one assembler, Savage, recovered low-abundance strains, and only in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false-positive variant calls, which were strongly enriched in T-to-G changes in a ‘G.G’ context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow, named QuasiModo (Quasispecies Metric determination on omics), under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.
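      The enrichment of shared false positives in T-to-G changes within a ‘G.G’ context (reference G on both flanks of the called site) suggests a simple context check on variant calls. The sketch below is only an illustration of that pattern, not a filter from the paper or the QuasiModo workflow; the function name and interface are assumptions.

```python
def is_gg_context_artifact(ref_seq: str, pos: int, ref_base: str, alt_base: str) -> bool:
    """Flag a T>G variant call whose reference neighbors are both G ('G.G' context).

    ref_seq: reference sequence; pos: 0-based position of the call.
    Hypothetical helper: the abstract reports this enrichment among shared
    false positives, but does not prescribe a filtering rule.
    """
    if not (ref_base == "T" and alt_base == "G"):
        return False
    if pos == 0 or pos >= len(ref_seq) - 1:
        return False  # no flanking bases at the sequence ends
    return ref_seq[pos - 1] == "G" and ref_seq[pos + 1] == "G"
```

      For example, a T>G call at the middle of reference `GTG` matches the context, while the same call on `ATG` does not.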
    • Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit.

      Meyer, Fernando; Lesker, Till-Robin; Koslicki, David; Fritz, Adrian; Gurevich, Alexey; Darling, Aaron E; Sczyrba, Alexander; Bremges, Andreas; McHardy, Alice C; BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. (Nature Research, 2021-03-01)
      Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
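      Among the binning metrics the tutorial covers, per-bin purity and per-genome completeness are the most fundamental. The sketch below illustrates the standard base-pair-weighted definitions against a gold-standard contig-to-genome mapping; it is a didactic illustration, not the CAMI toolkit itself (CAMI's AMBER tool computes these and many more metrics from .binning files), and all names here are assumptions.

```python
from collections import Counter

def bin_purity_completeness(bin_to_contigs, gold_contig_to_genome, contig_lengths):
    """Per-bin purity and completeness, weighted by base pairs.

    Purity: fraction of a bin's base pairs belonging to its majority genome.
    Completeness: fraction of that genome's base pairs recovered in the bin.
    Illustrative sketch only; interfaces are assumptions.
    """
    # Total size of each gold-standard genome, in base pairs.
    genome_sizes = Counter()
    for contig, genome in gold_contig_to_genome.items():
        genome_sizes[genome] += contig_lengths[contig]

    results = {}
    for bin_id, contigs in bin_to_contigs.items():
        per_genome = Counter()
        total_bp = 0
        for contig in contigs:
            length = contig_lengths[contig]
            total_bp += length
            per_genome[gold_contig_to_genome[contig]] += length
        # The genome contributing the most base pairs defines the bin's identity.
        majority_genome, majority_bp = per_genome.most_common(1)[0]
        purity = majority_bp / total_bp
        completeness = majority_bp / genome_sizes[majority_genome]
        results[bin_id] = (majority_genome, purity, completeness)
    return results
```

      For instance, a bin holding 100 bp of genome A and 50 bp of genome B has purity 2/3; if genome A totals 200 bp, the bin's completeness for A is 0.5.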