
dc.contributor.author: Deng, Zhi-Luo
dc.contributor.author: Dhingra, Akshay
dc.contributor.author: Fritz, Adrian
dc.contributor.author: Götting, Jasper
dc.contributor.author: Münch, Philipp C
dc.contributor.author: Steinbrück, Lars
dc.contributor.author: Schulz, Thomas F
dc.contributor.author: Ganzenmüller, Tina
dc.contributor.author: McHardy, Alice C
dc.date.accessioned: 2021-03-23T12:39:07Z
dc.date.available: 2021-03-23T12:39:07Z
dc.date.issued: 2020-07-07
dc.identifier.citation: Briefings in Bioinformatics, 2020;, bbaa123, https://doi.org/10.1093/bib/bbaa123. [en_US]
dc.identifier.doi: 10.1093/bib/bbaa123
dc.identifier.uri: http://hdl.handle.net/10033/622786
dc.description.abstract: Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a ‘G.G’ context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data. [en_US]
dc.description.sponsorship: Deutsches Zentrum für Infektionsforschung [en_US]
dc.language.iso: en [en_US]
dc.publisher: Oxford University Press (OUP) [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Molecular Biology [en_US]
dc.subject: Information Systems [en_US]
dc.title: Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses [en_US]
dc.type: Problem solving protocol [en_US]
dc.identifier.eissn: 1477-4054
dc.contributor.department: BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. [en_US]
dc.identifier.journal: Briefings in Bioinformatics [en_US]
refterms.dateFOA: 2021-03-23T12:39:07Z
dc.source.journaltitle: Briefings in Bioinformatics


Files in this item

Name: Deng et al.pdf
Size: 4.340Mb
Format: PDF
Description: Open Access publication

Name: Deng_suppl.pdf
Size: 3.314Mb
Format: PDF
Description: supplemental material


Except where otherwise noted, this item's license is described as Attribution 4.0 International