    • Dual RNA-seq analysis of in vitro infection multiplicity and RNA depletion methods in Chlamydia-infected epithelial cells.

      Hayward, Regan J; Humphrys, Michael S; Huston, Wilhelmina M; Myers, Garry S A; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Nature Research, 2021-05-17)
      Dual RNA-seq experiments examining viral and bacterial pathogens are increasing, but vary considerably in their experimental designs, such as infection rates and RNA depletion methods. Here, we have applied dual RNA-seq to Chlamydia trachomatis-infected epithelial cells to examine transcriptomic responses from both organisms. We compared two time points post infection (1 and 24 h), three multiplicity of infection (MOI) ratios (0.1, 1 and 10) and two RNA depletion methods (rRNA and polyA). Capture of bacterial-specific RNA was greatest when combining rRNA and polyA depletion, and when using a higher MOI. However, under these conditions, host RNA capture was negatively impacted. Although it is tempting to use high infection rates, the implications for host cell survival, the potentially reduced length of infection cycles and real-world applicability should be considered. These data highlight the delicate balance between host and pathogen RNA capture and will assist future transcriptomic studies in achieving more specific and relevant infection-related biological insights.
    • Functional analysis of colonization factor antigen I positive enterotoxigenic Escherichia coli identifies genes implicated in survival in water and host colonization.

      Abd El Ghany, Moataz; Barquist, Lars; Clare, Simon; Brandt, Cordelia; Mayho, Matthew; Joffre, Enrique; Sjöling, Åsa; Turner, A Keith; Klena, John D; Kingsley, Robert A; et al.
      Enterotoxigenic Escherichia coli (ETEC) expressing the colonization pili CFA/I are common causes of diarrhoeal infections in humans. Here, we use a combination of transposon mutagenesis and transcriptomic analysis to identify genes and pathways that contribute to ETEC persistence in water environments and colonization of a mammalian host. ETEC persisting in water exhibit a distinct RNA expression profile from those growing in richer media. Multiple pathways were identified that contribute to water survival, including lipopolysaccharide biosynthesis and stress response regulons. The analysis also indicated that ETEC growing in vivo in mice encounter a bottleneck driving down the diversity of colonizing ETEC populations.
    • Global RNA profiles show target selectivity and physiological effects of peptide-delivered antisense antibiotics.

      Popella, Linda; Jung, Jakob; Popova, Kristina; Ðurica-Mitić, Svetlana; Barquist, Lars; Vogel, Jörg; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany.
      Antisense peptide nucleic acids (PNAs) inhibiting mRNAs of essential genes provide a straightforward way to repurpose our knowledge of bacterial regulatory RNAs for the development of programmable species-specific antibiotics. While there is ample proof of PNA efficacy, their target selectivity and impact on bacterial physiology are poorly understood. Moreover, while antibacterial PNAs are typically designed to block mRNA translation, their effects on target mRNA levels have not been well investigated. Here, we pioneer the use of global RNA-seq analysis to decipher PNA activity in a transcriptome-wide manner. We find that PNA-based antisense oligomer conjugates robustly decrease mRNA levels of the widely used target gene acpP in Salmonella enterica, with limited off-target effects. Systematic analysis of PNAs attached to several different carrier peptides reveals not only differences in bactericidal efficiency but also activation of stress pathways. In particular, KFF-, RXR- and Tat-PNA conjugates induce the PhoP/Q response, whereas the latter two additionally trigger several distinct pathways. We show that constitutive activation of the PhoP/Q response can lead to Tat-PNA resistance, illustrating the utility of RNA-seq for understanding PNA antibacterial activity. In sum, our study establishes an experimental framework for the design and assessment of PNA antimicrobials in the long-term quest to use them for precision editing of the microbiota.
    • Global identification of RsmA/N binding sites in Pseudomonas aeruginosa by UV CLIP-seq.

      Chihara, Kotaro; Barquist, Lars; Takasugi, Kenichi; Noda, Naohiro; Tsuneda, Satoshi; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Taylor & Francis, 2021-04-27)
      Pseudomonas aeruginosa harbours two redundant RNA-binding proteins, RsmA and RsmN (RsmA/N), which play a critical role in balancing acute and chronic infections. However, their in vivo binding sites on target transcripts and their overall impact on physiology remain unclear. In this study, we applied in vivo UV crosslinking immunoprecipitation followed by RNA sequencing (UV CLIP-seq) to detect RsmA/N-binding sites at single-nucleotide resolution and mapped more than 500 binding sites to approximately 400 genes directly bound by RsmA/N in P. aeruginosa. This also confirmed the ANGGA sequence in apical loops, skewed towards 5' UTRs, as a consensus motif for RsmA/N binding. Genetic analysis combined with the CLIP-seq results suggested previously unrecognized RsmA/N targets involved in LPS modification. Moreover, the RsmA/N-titrating RNAs RsmY/RsmZ may be positively regulated through RsmA/N-mediated translational repression of their upstream regulators, providing a possible mechanistic explanation for homoeostasis of the Rsm system. Thus, our study provides a detailed view of RsmA/N-RNA interactions and a resource for further investigation of the pleiotropic effects of RsmA/N on gene expression in P. aeruginosa.
    • The minimal meningococcal ProQ protein has an intrinsic capacity for structure-based global RNA recognition.

      Bauriedl, Saskia; Gerovac, Milan; Heidrich, Nadja; Bischler, Thorsten; Barquist, Lars; Vogel, Jörg; Schoen, Christoph; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Nature Research, 2020-06-04)
      FinO-domain proteins are a widespread family of bacterial RNA-binding proteins with regulatory functions. Their target spectrum ranges from a single RNA pair, in the case of plasmid-encoded FinO, to global RNA regulons, as with enterobacterial ProQ. To assess whether the FinO domain itself is intrinsically selective or promiscuous, we determine the in vivo targets of Neisseria meningitidis ProQ, which consists solely of a FinO domain. UV-CLIP-seq identifies associations with 16 small non-coding RNAs (sRNAs) and 166 mRNAs. Meningococcal ProQ predominantly binds to highly structured regions and generally acts to stabilize its RNA targets. Loss of ProQ alters transcript levels of >250 genes, demonstrating that this minimal ProQ protein impacts gene expression globally. Phenotypic analyses indicate that ProQ promotes oxidative stress resistance and DNA damage repair. We conclude that FinO-domain proteins recognize some abundant type of RNA shape and evolve RNA-binding selectivity through acquisition of additional regions that constrain target recognition.
    • A high-resolution transcriptome map identifies small RNA regulation of metabolism in the gut microbe Bacteroides thetaiotaomicron.

      Ryan, Daniel; Jenniches, Laura; Reichardt, Sarah; Barquist, Lars; Westermann, Alexander J; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (NPG, 2020-07-16)
      Bacteria of the genus Bacteroides are common members of the human intestinal microbiota and important degraders of polysaccharides in the gut. Among them, the species Bacteroides thetaiotaomicron has emerged as the model organism for functional microbiota research. Here, we use differential RNA sequencing (dRNA-seq) to generate a single-nucleotide resolution transcriptome map of B. thetaiotaomicron grown under defined laboratory conditions. An online browser, called 'Theta-Base' ( www.helmholtz-hiri.de/en/datasets/bacteroides ), is launched to interrogate the obtained gene expression data and annotations of ~4500 transcription start sites, untranslated regions, operon structures, and 269 noncoding RNA elements. Among the latter is GibS, a conserved, 145 nt-long small RNA that is highly expressed in the presence of N-acetyl-D-glucosamine as sole carbon source. We use computational predictions and experimental data to determine the secondary structure of GibS and identify its target genes. Our results indicate that sensing of N-acetyl-D-glucosamine induces GibS expression, which in turn modifies the transcript levels of metabolic enzymes.
    • Single-Nucleotide RNA Maps for the Two Major Nosocomial Pathogens Enterococcus faecalis and Enterococcus faecium.

      Michaux, Charlotte; Hansen, Elisabeth E; Jenniches, Laura; Gerovac, Milan; Barquist, Lars; Vogel, Jörg; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Frontiers, 2020-11-25)
      Enterococcus faecalis and Enterococcus faecium are two major clinically relevant species of the genus Enterococcus and are notorious as leading agents of nosocomial infections. Despite their critical importance for public health worldwide, essential resources such as deep transcriptome annotations remain scarce, which also limits our understanding of post-transcriptional control by small regulatory RNAs (sRNAs) in these bacteria. Here, using the dRNA-seq technique in combination with ANNOgesic analysis, we mapped and annotated transcription start sites (TSS) of both E. faecalis V583 and E. faecium AUS0004 at single-nucleotide resolution. Analyzing bacteria in late exponential phase, we capture ~40% (E. faecalis) and 43% (E. faecium) of the annotated protein-coding genes, determine 5' and 3' UTR (untranslated region) lengths, and detect instances of leaderless mRNAs. The transcriptome maps revealed sRNA candidates in both bacteria, including some found in previous studies as well as new ones. Expression of candidate sRNAs is being confirmed under biologically relevant environmental conditions. This comprehensive global TSS mapping atlas provides a valuable resource for RNA biology and gene expression analysis in the enterococci. It can be accessed online at www.helmholtz-hiri.de/en/datasets/enterococcus through an instance of the genome browser JBrowse.
    • Conditional Hfq Association with Small Noncoding RNAs in Pseudomonas aeruginosa Revealed through Comparative UV Cross-Linking Immunoprecipitation Followed by High-Throughput Sequencing.

      Chihara, Kotaro; Bischler, Thorsten; Barquist, Lars; Monzon, Vivian A; Noda, Naohiro; Vogel, Jörg; Tsuneda, Satoshi (2019-12-03)
      Bacterial small noncoding RNAs (sRNAs) play posttranscriptional regulatory roles in cellular responses to changing environmental cues and in adaptation to harsh conditions. Generally, the RNA-binding protein Hfq helps sRNAs associate with target mRNAs to modulate their translation and to modify global RNA pools depending on physiological state. Here, a combination of in vivo UV cross-linking immunoprecipitation followed by high-throughput sequencing (CLIP-seq) and total RNA-seq showed that Hfq interacts with different regions of the Pseudomonas aeruginosa transcriptome under planktonic versus biofilm conditions. In the present approach, P. aeruginosa Hfq preferentially interacted with repeats of the AAN triplet motif at mRNA 5' untranslated regions (UTRs) and in sRNAs, and with U-rich sequences at rho-independent terminators. Further transcriptome analysis suggested that the association of sRNAs with Hfq is primarily a function of their expression levels, strongly supporting the notion that the pool of Hfq-associated RNAs is equilibrated by RNA concentration-driven cycling on and off Hfq. Overall, our combinatorial CLIP-seq and total RNA-seq approach highlights conditional sRNA associations with Hfq as a novel aspect of posttranscriptional regulation in P. aeruginosa.
      IMPORTANCE The Gram-negative bacterium P. aeruginosa is ubiquitously distributed in diverse environments and can cause severe biofilm-related infections in at-risk individuals. Although the presence of a large number of putative sRNAs and widely conserved RNA chaperones in this bacterium implies the importance of posttranscriptional regulatory networks for responding to environmental fluctuations, limited information is available regarding the global role of RNA chaperones such as Hfq in the P. aeruginosa transcriptome, especially under different environmental conditions. Here, we characterize Hfq-dependent differences in gene expression and biological processes in two physiological states: the planktonic and biofilm forms. A combinatorial comparative CLIP-seq and total RNA-seq approach uncovered condition-dependent association of RNAs with Hfq in vivo and expands the potential direct regulatory targets of Hfq in the P. aeruginosa transcriptome.
    • A decade of advances in transposon-insertion sequencing.

      Cain, Amy K; Barquist, Lars; Goodman, Andrew L; Paulsen, Ian T; Parkhill, Julian; van Opijnen, Tim; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Springer Nature, 2020-06-12)
      It has been 10 years since the introduction of modern transposon-insertion sequencing (TIS) methods, which combine genome-wide transposon mutagenesis with high-throughput sequencing to estimate the fitness contribution or essentiality of each genetic component in a bacterial genome. Four TIS variations were published in 2009: transposon sequencing (Tn-Seq), transposon-directed insertion site sequencing (TraDIS), insertion sequencing (INSeq) and high-throughput insertion tracking by deep sequencing (HITS). TIS has since become an important tool for molecular microbiologists, being one of the few genome-wide techniques that directly links phenotype to genotype and ultimately can assign gene function. In this Review, we discuss the recent applications of TIS to answer overarching biological questions. We explore emerging and multidisciplinary methods that build on TIS, with an eye towards future applications.
    • Plugging Small RNAs into the Network.

      Barquist, Lars; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (ASM, 2020-06-02)
      Small RNAs (sRNAs) have been discovered in every bacterium examined and have been shown to play important roles in the regulation of a diverse range of behaviors, from metabolism to infection. However, despite a wide range of available techniques for discovering and validating sRNA regulatory interactions, only a minority of these molecules have been well characterized. In part, this is due to the nature of posttranscriptional regulation: the activity of an sRNA depends on the state of the transcriptome as a whole, so characterization is best carried out under the conditions in which it is naturally active. In this issue of mSystems, Arrieta-Ortiz and colleagues (M. L. Arrieta-Ortiz, C. Hafemeister, B. Shuster, N. S. Baliga, et al., mSystems 5:e00057-20, 2020, https://doi.org/10.1128/mSystems.00057-20) present a network inference approach based on estimating sRNA activity across transcriptomic compendia. This shows promise not only for identifying new sRNA regulatory interactions but also for pinpointing the conditions in which these interactions occur, providing a new avenue toward functional characterization of sRNAs.
    • Dual RNA-seq of Orientia tsutsugamushi informs on host-pathogen interactions for this neglected intracellular human pathogen.

      Mika-Gospodorz, Bozena; Giengkam, Suparat; Westermann, Alexander J; Wongsantichon, Jantana; Kion-Crosby, Willow; Chuenklin, Suthida; Wang, Loo Chien; Sunyakumthorn, Piyanate; Sobota, Radoslaw M; Subbian, Selvakumar; et al. (Nature Publishing Group, 2020-07-03)
      Studying emerging or neglected pathogens is often challenging due to insufficient information and the absence of genetic tools. Dual RNA-seq provides insights into host-pathogen interactions, and is particularly informative for intracellular organisms. Here we apply dual RNA-seq to Orientia tsutsugamushi (Ot), an obligate intracellular bacterium that causes the vector-borne human disease scrub typhus. Half the Ot genome is composed of repetitive DNA, and there is minimal collinearity in gene order between strains. Integrating RNA-seq, comparative genomics, proteomics, and machine learning to study the transcriptional architecture of Ot, we find evidence for widespread post-transcriptional antisense regulation. Comparing the host response to two clinical isolates, we identify distinct immune response networks for each strain, leading to predictions of relative virulence that are validated in a mouse infection model. Thus, dual RNA-seq can provide insight into the biology and host-pathogen interactions of a poorly characterized and genetically intractable organism such as Ot.
    • Rapid transcriptional responses to serum exposure are associated with sensitivity and resistance to antibody-mediated complement killing in invasive Salmonella Typhimurium ST313.

      Ondari, Edna M; Klemm, Elizabeth J; Msefula, Chisomo L; El Ghany, Moataz Abd; Heath, Jennifer N; Pickard, Derek J; Barquist, Lars; Dougan, Gordon; Kingsley, Robert A; MacLennan, Calman A; et al. (F1000Research, 2019-01-01)
      Background: Salmonella Typhimurium ST313 exhibits signatures of adaptation to invasive human infection, including higher resistance to humoral immune responses than gastrointestinal isolates. Full resistance to antibody-mediated complement killing (serum resistance) among nontyphoidal Salmonellae is uncommon, but selection of highly resistant strains could compromise vaccine-induced antibody immunity. Here, we address the hypothesis that serum resistance is due to a distinct genotype or transcriptome response in S. Typhimurium ST313. Methods: Six S. Typhimurium ST313 bloodstream isolates, three of which were antibody resistant, were studied. The genomic content of the strains (single nucleotide polymorphisms and larger chromosomal modifications) was determined by Illumina and PacBio sequencing and functionally characterized using RNA-seq, transposon directed insertion site sequencing (TraDIS), targeted gene deletion and transfer of selected point mutations, in an attempt to identify features associated with serum resistance. Results: Sequence polymorphisms in genes from strains with atypical serum susceptibility did not significantly alter the serum-killing phenotype when transferred from highly resistant or highly susceptible strains into a strain with intermediate susceptibility. No large chromosomal modifications typified serum resistance or susceptibility. Genes required for serum resistance identified by TraDIS and RNA-seq included those involved in exopolysaccharide synthesis, iron scavenging and metabolism. Most of the down-regulated genes were associated with membrane proteins. Resistant and susceptible strains had distinct transcriptional responses to serum, particularly in genes responsible for polysaccharide biosynthesis. Genes of the wca locus, involved in biosynthesis of the colanic acid exopolysaccharide, were more strongly upregulated in susceptible strains, whereas expression of fepE, a regulator of very long-chain lipopolysaccharide, increased in resistant strains. Conclusion: Clinical isolates of S. Typhimurium ST313 exhibit distinct antibody susceptibility phenotypes that may be associated with changes in gene expression on exposure to serum.
    • Global Maps of ProQ Binding In Vivo Reveal Target Recognition via RNA Structure and Stability Control at mRNA 3' Ends.

      Holmqvist, Erik; Li, Lei; Bischler, Thorsten; Barquist, Lars; Vogel, Jörg; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Elsevier, 2018-06-07)
      The conserved RNA-binding protein ProQ has emerged as the centerpiece of a previously unknown third large network of post-transcriptional control in enterobacteria. Here, we have used in vivo UV crosslinking and RNA sequencing (CLIP-seq) to map hundreds of ProQ binding sites in Salmonella enterica and Escherichia coli. Our analysis of these binding sites, many of which are conserved, suggests that ProQ recognizes its cellular targets through RNA structural motifs found in small RNAs (sRNAs) and at the 3′ end of mRNAs. Using the cspE mRNA as a model for 3′ end targeting, we reveal a function for ProQ in protecting mRNA against exoribonucleolytic activity. Taken together, our results underpin the notion that ProQ governs a post-transcriptional network distinct from those of the well-characterized sRNA-binding proteins, CsrA and Hfq, and suggest a previously unrecognized, sRNA-independent role of ProQ in stabilizing mRNAs.
    • Transcriptional noise and exaptation as sources for bacterial sRNAs.

      Jose, Bethany R; Gardner, Paul P; Barquist, Lars; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (Portland Press, 2019-04-30)
      Understanding how new genes originate and integrate into cellular networks is key to understanding evolution. Bacteria present unique opportunities for both the natural history and experimental study of gene origins, due to their large effective population sizes, rapid generation times, and ease of genetic manipulation. Bacterial small non-coding RNAs (sRNAs), in particular, many of which operate through a simple antisense regulatory logic, may serve as tractable models for exploring processes of gene origin and adaptation. Understanding how and on what timescales these regulatory molecules arise has important implications for understanding the evolution of bacterial regulatory networks, in particular, for the design of comparative studies of sRNA function. Here, we introduce relevant concepts from evolutionary biology and review recent work that has begun to shed light on the timescales and processes through which non-functional transcriptional noise is co-opted to provide regulatory functions. We explore possible scenarios for sRNA origin, focusing on the co-option, or exaptation, of existing genomic structures which may provide protected spaces for sRNA evolution.
    • Functional analysis of Salmonella Typhi adaptation to survival in water.

      Kingsley, Robert A; Langridge, Gemma; Smith, Sarah E; Makendi, Carine; Fookes, Maria; Wileman, Tom M; El Ghany, Moataz Abd; Turner, A Keith; Dyson, Zoe A; Sridhar, Sushmita; et al. (Wiley-Blackwell, 2018-11-18)
      Contaminated water is a major risk factor associated with the transmission of Salmonella enterica serovar Typhi (S. Typhi), the aetiological agent of human typhoid. However, little is known about how this pathogen adapts to living in the aqueous environment. We used transcriptome analysis (RNA-seq) and transposon mutagenesis (TraDIS) to characterize these adaptive changes and identify multiple genes that contribute to survival. Over half of the genes in the S. Typhi genome altered expression level within the first 24 h following transfer from broth culture to water, although relatively few did so in the first 30 min. Genes linked to central metabolism, stress associated with arrested proton motive force and respiratory chain factors changed expression levels. Additionally, motility and chemotaxis genes increased expression, consistent with a scavenging lifestyle. The viaB-associated gene tviC encoding a glcNAc epimerase that is required for Vi polysaccharide biosynthesis was, along with several other genes, shown to contribute to survival in water. Thus, we define regulatory adaptation operating in S. Typhi that facilitates survival in water.
    • A global genomic approach uncovers novel components for twitching motility-mediated biofilm expansion in Pseudomonas aeruginosa.

      Nolan, Laura M; Whitchurch, Cynthia B; Barquist, Lars; Katrib, Marilyn; Boinett, Christine J; Mayho, Matthew; Goulding, David; Charles, Ian G; Filloux, Alain; Parkhill, Julian; et al. (Microbiology Society, 2018-11-01)
      Pseudomonas aeruginosa is an extremely successful pathogen able to cause both acute and chronic infections in a range of hosts, utilizing a diverse arsenal of cell-associated and secreted virulence factors. A major cell-associated virulence factor, the Type IV pilus (T4P), is required for epithelial cell adherence and mediates a form of surface translocation termed twitching motility, which is necessary to establish a mature biofilm and actively expand these biofilms. P. aeruginosa twitching motility-mediated biofilm expansion is a coordinated, multicellular behaviour, allowing cells to rapidly colonize surfaces, including implanted medical devices. Although at least 44 proteins are known to be involved in the biogenesis, assembly and regulation of the T4P, with additional regulatory components and pathways implicated, it is unclear how these components and pathways interact to control these processes. In the current study, we used a global genomics-based random-mutagenesis technique, transposon directed insertion-site sequencing (TraDIS), coupled with a physical segregation approach, to identify all genes implicated in twitching motility-mediated biofilm expansion in P. aeruginosa. Our approach allowed identification of both known and novel genes, providing new insight into the complex molecular network that regulates this process in P. aeruginosa. Additionally, our data suggest that the flagellum-associated gene products have a differential effect on twitching motility, based on whether components are intra- or extracellular. Overall, the success of our TraDIS approach supports the use of this global genomic technique for investigating virulence genes in bacterial pathogens.
    • Morphological, genomic and transcriptomic responses of Klebsiella pneumoniae to the last-line antibiotic colistin.

      Cain, Amy K; Boinett, Christine J; Barquist, Lars; Dordel, Janina; Fookes, Maria; Mayho, Matthew; Ellington, Matthew J; Goulding, David; Pickard, Derek; Wick, Ryan R; et al. (2018-06-29)
      Colistin remains one of the few antibiotics effective against multi-drug resistant (MDR) hospital pathogens, such as Klebsiella pneumoniae. Yet resistance to this last-line drug is rapidly increasing. Characterized mechanisms of col
    • Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica.

      Wheeler, Nicole E; Gardner, Paul P; Barquist, Lars; HIRI, Helmholtz-Institut für RNA-basierte Infektionsforschung, Josef-Schneider-Strasse 2, 97080 Würzburg, Germany. (2018-01-01)
      Emerging pathogens are a major threat to public health; however, understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein-coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest classifier to identify strains associated with extraintestinal disease. Members of the species fall along a continuum, from pathovars which cause gastrointestinal infection and low mortality, associated with a broad host range, to those that cause invasive infection and high mortality, associated with a narrowed host range. Our random forest classifier learned to perfectly discriminate long-established gastrointestinal and invasive serovars of Salmonella. Additionally, it was able to discriminate recently emerged Salmonella Enteritidis and Typhimurium lineages associated with invasive disease in immunocompromised populations in sub-Saharan Africa, as well as within-host adaptation to invasive infection. We dissect the architecture of the model to identify the genes that were most informative of phenotype, revealing a common theme of degradation of metabolic pathways in extraintestinal lineages. This approach accurately identifies patterns of gene degradation and diversifying selection specific to invasive serovars that have been captured by more labour-intensive investigations, but can be readily scaled to larger analyses.