RNA landscape of the emerging cancer-associated microbe Fusobacterium nucleatum

Fusobacterium nucleatum, long known as a constituent of the oral microflora, has recently garnered renewed attention for its association with several different human cancers. The growing interest in this emerging cancer-associated bacterium contrasts with a paucity of knowledge about its basic gene expression features and physiological responses. As fusobacteria lack all established small RNA-associated proteins, post-transcriptional networks in these bacteria are also unknown. In the present study, using differential RNA-sequencing, we generate high-resolution global RNA maps for five clinically relevant fusobacterial strains—F. nucleatum subspecies nucleatum, animalis, polymorphum and vincentii, as well as F. periodonticum—for early, mid-exponential growth and early stationary phase. These data are made available in an online browser, and we use these to uncover fundamental aspects of fusobacterial gene expression architecture and a suite of non-coding RNAs. Developing a vector for functional analysis of fusobacterial genes, we discover a conserved fusobacterial oxygen-induced small RNA, FoxI, which serves as a post-transcriptional repressor of the major outer membrane porin FomA. Our findings provide a crucial step towards delineating the regulatory networks enabling F. nucleatum adaptation to different environments, which may elucidate how these bacteria colonize different compartments of the human body. Several Fusobacterium species, such as Fusobacterium nucleatum, have been associated with cancer. Here, using differential RNA-sequencing, the authors provide high-resolution global RNA maps for five clinically relevant fusobacterial strains, elucidating basic aspects of fusobacterial gene expression and identifying multiple non-coding RNAs, including an oxygen-induced small RNA, FoxI, which represses the major outer membrane porin FomA.

Transcription structure of virulence-associated genes. Although Fnn lacks typical secretion systems for export of effector proteins, there is a growing list of Fnn genes with proven roles in virulence 22,23,47, including genes encoding type V autotransporters 44,48-50. Our RNA maps not only show that these genes are constitutively expressed but also provide expression context for many of them (Fig. 2a), for example, revealing monocistronic expression of adhesin FadA, which recognizes host cells and triggers β-catenin signalling 13,15,51. By contrast, the important Gal-GalNAc lectin Fap2 is part of a dicistron. Likewise, the predicted FadA paralogue FN1529 is co-expressed with RadD (FN1526), a putative type Va autotransporter important for interspecies adherence and biofilm formation 52,53.
The serine protease fusolisin (FN1426) 54 is a more complex case: the ORF is monocistronic but the 3′-region of this gene is independently transcribed into the FunR47 sRNA (Fig. 2a). Altogether, 65 of 208 proposed virulence-associated genes 55 are transcribed from a pTSS (Supplementary Dataset 4). Now, having their 5′-UTRs defined, these genes lend themselves to investigation of potential post-transcriptional control of fusobacterial virulence.

RNA-based annotations of ORFs and operons.
RNA maps are a powerful tool to assign and correct diverse gene expression features based on experimental data, which includes the global verification of ORF annotations in fully sequenced genomes 36 . To correct Fnn ORFs where necessary, we double-checked all coding sequences (CDSs) lacking a canonical starting codon or ribosome-binding site (RBS), or with an extendable ORF. These reannotations are in excellent agreement with the latest genome sequence update at Fusoportal 23 (Supplementary Dataset 5).
Small proteins represent an overlooked class of bacterial gene products, the functional importance of which is just beginning to unfold 56. Fnn has 22 annotated small proteins <50 amino acids in length, many of which are associated with transposase genes or are ambiguous, lacking an AUG start codon or an RBS. Using our TSS maps, we enrich the genome annotation with three previously overlooked, high-confidence candidates of small ORFs, naming them fspC1 to fspC3. For example, we propose fspC3 to encode a conserved ~48-amino acid hydrophobic peptide and to be cotranscribed with an operon for nucleotide metabolism and transfer RNA (tRNA) maturation/repair functions (Fig. 2b). The fspC1 ORF encodes a 41-amino acid peptide with a predicted transmembrane domain and lies close to the gene of anti-termination factor NusB (Supplementary Fig. 2a), whereas fspC2 lies upstream of a predicted glutamate carboxypeptidase and encodes a 33-amino acid hydrophobic peptide (Supplementary Fig. 2b). Their strong sequence conservation among Fusobacterium sp. suggests that the FspC1, FspC2 and FspC3 peptides play important roles in fusobacterial physiology.
Fig. 1 | Differential RNA-seq for F. nucleatum subsp. nucleatum. a, Overview of analysed growth conditions (E, M and S phases) and experimental workflow for transcriptome analysis via dRNA-seq. Genome-wide read distribution for F. nucleatum subsp. nucleatum is shown, followed by the validation of the previously annotated TSS for tnaA. Growth data are represented by the mean (± s.d.) from three biological replicates. r.p.m., reads per million. b, Venn diagram showing the number of detected TSSs for each class. The lower panel shows TSS classification based on expression strength and genomic location. c, Analysis of promoter regions associated with detected TSSs using the MEME 45 suite identified a promoter motif in ~93% of analysed pTSSs. An extended −10 box and the −35 box, which are separated by an AT-rich region, are indicated. d, Length distribution and corresponding occurrences of all 5′-UTRs associated with pTSSs (black) and sTSSs (red). A consensus SD sequence was predicted using MEME analysis. The average distance from the start codon is indicated.

Previous operon annotations in Fnn have greatly relied on computational inference from other bacteria 57. Our TSS-based annotation predicts a total of 428 operons (Supplementary Dataset 6), which includes previously unknown polycistrons such as that of phospholipase A1 type Vd autotransporter FplA (FN1704) 58 with the FN1706-FN1707 genes (Supplementary Fig. 3a). The longest operon predicted spans 23 genes, encoding multiple ribosomal proteins and preprotein translocase SecY (Supplementary Dataset 6). TSSs inside primary operons predict the presence of 53 suboperons.
Of note, in ~40% (21 suboperons) of such cases, the iTSS uncouples the last or the last two genes from the full operon (Supplementary Dataset 6). This is well illustrated with the FN1326-FN1320 operon, wherein gene FN1321 encoding an orphan response regulator seems to be conditionally uncoupled from an upstream biosynthesis gene cluster ( Supplementary Fig. 3b).
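The suboperon logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the ANNOgesic procedure: the function name is hypothetical, and the gene order of the FN1326-FN1320 operon is assumed to follow consecutive locus tags, which the text does not state explicitly.

```python
from typing import List, Tuple


def suboperon_from_itss(operon_genes: List[str],
                        first_gene_after_itss: str) -> Tuple[List[str], int]:
    """Return the suboperon started by an internal TSS and its gene count.

    operon_genes must be listed in the direction of transcription.
    """
    idx = operon_genes.index(first_gene_after_itss)
    if idx == 0:
        # A TSS in front of the first gene is the operon's own TSS.
        raise ValueError("TSS upstream of the first gene is not internal")
    suboperon = operon_genes[idx:]
    return suboperon, len(suboperon)


# Assumed gene order (consecutive locus tags), for illustration only.
operon = ["FN1326", "FN1325", "FN1324", "FN1323",
          "FN1322", "FN1321", "FN1320"]
suboperon, n_uncoupled = suboperon_from_itss(operon, "FN1321")

# Share of suboperons in which the iTSS uncouples the last one or two genes.
fraction_terminal = 21 / 53
```

Under these assumptions, an iTSS in front of FN1321 uncouples the last two genes of the operon, and 21 of 53 suboperons corresponds to the ~40% quoted in the text.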
Riboregulatory elements in F. nucleatum. Typical 5′-UTR-borne, cis-regulatory RNA elements are riboswitches and RNA thermometers, which are RNA structures used by many pathogenic bacteria for location-dependent, post-transcriptional control of virulence genes 59,60 . The Rfam database 61 and literature searches led us to annotate 12 putative riboswitches in Fnn (Supplementary Dataset 7), 3 of which may sense cobalamin and thereby control genes involved in uptake of vitamin B 12 , iron or both (FN0300, Fe 2+ /vitamin B 12 -binding protein; FN1971, TonB-dependent receptor; FN1381, putative autotransporter) (Extended Data Fig. 5). Two riboswitches may sense flavin mononucleotide, one of which is associated with ribH (FN1505) encoding the riboflavin synthase β-subunit. The remaining ones belong to the families of glycine, lysine, purine, SAM and thiamine pyrophosphate riboswitches, and are generally found upstream of biosynthesis and/ or metabolism genes of these ligands (Supplementary Dataset 7). Ligand-responsive 5′-UTRs in F. nucleatum further include a glucosamine-6-phosphate-sensing ribozyme 62 ; as in Bacillus subtilis, this ribozyme may feedback-control glucosamine-6-phosphate synthase (FN0452) levels, and thereby cell wall production. Of other cis elements in 5′-UTRs (Supplementary Dataset 7), we found an RNA leader known to autoregulate the synthesis of ribosomal protein L10 (refs. 63,64 ). In addition, a putative binding site of the regulatory PyrR protein in the 5′-UTR of the pyr operon indicates transcriptional attenuation 65 . By contrast, we hardly found candidates for RNA thermometers. None of the five predicted candidates lies in a 5′-UTR (Supplementary Dataset 7). This currently leaves F. nucleatum without standard RNA thermometers, perhaps owing to the bacterium's stable environmental temperature in the human body.
Core ncRNAs and an active CRISPR-Cas system. Bacteria possess several ubiquitous stable RNAs other than ribosomal (r)RNA and tRNA, all of which lacked annotation in the Fnn genome. Guided by Rfam database predictions, we successfully probed 4.5S RNA, M1 RNA (RNase P) and transfer-messenger (tm)RNA on northern blots and observed constitutive expression over the course of growth (Fig. 3a). The 105-nt 4.5S RNA shows the conserved apical GGAA tetraloop on which the signal recognition particle for cotranslational delivery of inner membrane (IM) protein assembles 66 (Extended Data Fig. 6). As in many other bacteria 67 , the ~330-nt M1 RNA is expressed independently of the RNase P protein (FN0002); it is processed off a dicistronic transcript with hypothetical ORF FN1315. The Fnn tmRNA is 363 nt long and its internal small ORF (14 amino acids) for trans-peptide tagging shows an interesting sequence dichotomy between oral and non-oral isolates of fusobacteria (Extended Data Fig. 7). Importantly, however, its overall conservation argues that tmRNA-SmpB (FN0609), protein-associated rescue of stalled ribosomes on damaged mRNAs is a core process in fusobacteria.
The 6S RNA is a bacterial riboregulator shown to bind RNAP and modulate transcription in Escherichia coli and Bacillus subtilis 68 . The 6S RNA genes (ssrS) are hard to predict for a general lack of conserved primary sequence. In the present study, gene synteny searches combined with previous RNA structure predictions by others 69 identified an ssrS gene between serine protease FN0508 and arginyl-tRNA synthetase ArgS (FN0506), antisense to hypothetical ORF FN0507 (Fig. 3a,b). Several observations argue for this to be a functional 6S RNA: its overall abundance and accumulation towards stationary phase; its conserved structure as a long hairpin with an internal bulge mimicking a DNA open promoter complex; and our successful detection of tiny (26-nt long) pRNAs, the synthesis of which marks 6S RNA-related RNAP activity 68 (Fig. 3b,c and Extended Data Fig. 8).
CRISPR-Cas systems protect against unwanted foreign DNA in many bacteria. In the Fnn genome we identified a putative type I-B system with 17 repeats (Fig. 3d), one of which (repeat no. 11) displays a full match for the fusobacterial phage ФFunu2 (ref. 70). This CRISPR-Cas locus must be constitutively active, given that both dRNA-seq and northern blot analysis showed pre-crRNA processing into individual CRISPR RNAs (crRNAs) under all conditions tested. As previously seen with type I-B systems in Clostridium thermocellum and Methanococcus maripaludis 71, the processing patterns are complex, exhibiting stable intermediates of double or even multiple repeat-spacer pairs. It is interesting that both the Cas genes and the crRNAs are upregulated in stationary phase cells, indicating a possible trade-off between active anti-phage defence and growth optimization (Supplementary Dataset 2). Further investigation across a greater number of strains is warranted to see whether these expression and processing patterns are conserved. A preliminary analysis of the four additional strains suggests that F. periodonticum also upregulates its CRISPR-Cas defence towards the stationary phase (Supplementary Dataset 2).
The full ncRNA suite. To comprehensively annotate additional non-coding transcripts, we combined ANNOgesic predictions 39 with manual inspection of the dRNA-seq data. This yielded 43 sRNA candidates from all over the genome (Fig. 4a,b), which were either transcribed from independent genes in intergenic regions (IGRs) or processed off mRNAs. For nomenclature, we refer to them as FunR sRNAs until a function has been assigned. Extensive northern blot validation experiments using RNA samples from 3 growth phases confirmed 24 of 43 tested candidates, ranging from 56 nt to 345 nt in length (Fig. 4c). Nine of these validated sRNAs are expressed from stand-alone sRNA genes in IGRs (Supplementary Dataset 8). In addition, 12 ncRNAs possess promoters that exhibit the same σ 70 motif seen in mRNA genes ( Supplementary Fig. 4).
Most of the candidate sRNAs are highly specific to Fnn, at least in the primary sequence (Fig. 4b). BLASTN analysis (75% nucleotide identity cut-off) across 36 fusobacterial species and subspecies suggests that there are some broadly conserved fusobacterial core sRNAs, for example, FunR12 or FunR23. At the other end of the spectrum, FunR19 is found in the reference strain F. nucleatum ATCC 25586 and only two additional strains. The FunR19 sRNA is located in the 5′-UTR of a putative reverse transcriptase gene, which is reminiscent of bacterial retrons and indicates a function in specialized anti-phage defence ( Fig. 4c) 72 . In addition, we noted intriguing similarities in genomic location to sRNAs from other bacterial genera that may provide functional clues. For example, FunR7 being located in the 5′-region of the stress chaperone gene, clpB, shared its location with the abundant ProQ-binding sRNA RyfD of E. coli and Salmonella sp. 73,74 (Fig. 4c). In E. coli, RyfD can inhibit biofilm formation 75 , which is an important physiological trait for fusobacteria as well. FunR27 is a 5′-UTR-derived sRNA that comprises a predicted SAM-I riboswitch region of the metK mRNA (S-adenosylmethionine synthase). Such riboswitch-derived sRNAs were proposed to act in trans as gene expression regulators in Listeria monocytogenes 76 . An interesting example of a cis-antisense sRNA is FunR43; it overlaps with the 3′-end of an SIR2-domain protein (FN1185) in a putative genomic 'defence island' , indicating an anti-phage function 77 .
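The conservation call behind the BLASTN analysis can be written as a simple predicate. The sketch below combines the 75% nucleotide identity cut-off stated above with the length requirement given in the Fig. 4 legend (a hit must not be shorter than 75% of the sRNA's length in the reference strain). The function name and argument layout are hypothetical, and how the aligned length was measured in practice is an assumption.

```python
def is_conserved(percent_identity: float,
                 aligned_length: int,
                 reference_length: int,
                 min_identity: float = 75.0,
                 min_length_fraction: float = 0.75) -> bool:
    """Apply the identity and length cut-offs to a single BLASTN hit."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    long_enough = aligned_length >= min_length_fraction * reference_length
    return percent_identity >= min_identity and long_enough


# Example with an 87-nt reference sRNA (the length reported for FunR22).
hit_ok = is_conserved(80.0, 70, 87)        # 70 nt is at least 65.25 nt
hit_short = is_conserved(80.0, 60, 87)     # fails the length cut-off
hit_diverged = is_conserved(74.9, 87, 87)  # fails the identity cut-off
```

Both criteria must hold, so a full-length but diverged hit and a highly similar but truncated hit are both scored as not conserved.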
FoxI is a conserved oxygen-induced sRNA. To gain initial insight into fusobacterial sRNA functions, we selected FunR22 for further analysis. As shown in Fig. 5a, FunR22 sequences are present in all F. nucleatum strains as well as in related F. periodonticum and F. hwasookii, usually in close vicinity to the rsmB (rRNA methyltransferase) or trpB (tryptophan synthase) genes. The dRNA-seq data argue that FunR22 is transcribed from a stand-alone sRNA gene. Northern blotting validated both the predicted length (87 nt) and growth-phase-dependent expression of FunR22 (Fig. 5b). Sequence alignment of FunR22 sequences and in silico RNA folding suggest two major sRNA regions, a 33-nt single-stranded region with a putative conserved seed sequence for mRNA recognition and a long Rho-independent terminator hairpin ( Fig. 5c,d). In addition, the alignment shows a highly conserved promoter region that differs from the σ 70 consensus (Fig. 5c), indicating that expression of FunR22 is regulated under specific conditions. To identify such conditions, we exposed F. nucleatum to oxidative or oxygen stress (H 2 O 2 or O 2 ), heat shock (42 °C), membrane stress (bile, acidic pH, lysozyme) and iron limitation (depletion of Fe 2+ ). Within this panel, we observed a specific upregulation of FunR22 on a 20-min oxygen shock, based on which we renamed it FoxI for fusobacterial oxygen-induced sRNA (Fig. 5e).

Plasmid pEcoFus-1 as a shuttle vector for functional analysis.
Functional characterization of bacterial sRNAs requires gene disruption and overexpression methods as well as knowledge of sRNA-associated proteins. To advance functional RNA analysis in fusobacteria, we developed a genetic system for sRNA expression by building a new shuttle vector, pEcoFus (Fig. 6a), on a pRPF185 chassis 78, in which the cloned (sRNA) gene of interest is expressed from the constitutive fusobacterial 4.5S RNA promoter mapped above. Northern blot analysis showed that we achieved stable expression of several sRNAs from this vector (Fig. 6a). Importantly, none of these sRNAs was toxic, either in the cloning vehicle, E. coli, or in the target organism, F. nucleatum.

Fig. 3 | […] tmRNA and M1 RNA of RNase P across the different growth stages by northern blotting. b, Read distribution for the 6S RNA detecting an antisense TSS supporting the transcription of the pRNA. c, Secondary structure prediction of the putative 6S RNA shows a two-handle stem-loop structure with an internal bulge region commonly found in known 6S RNAs. The TSS associated with the identified pRNA (blue) is indicated with an arrow. d, Northern blot validation for the processing of the CRISPR array by probing for the overlapping region between individual spacer and repeat pairs. The results show generation of single spacer-repeat pairs of ~67 nt (indicated by red asterisks) increasing towards the stationary phase, and also larger fragments indicating a more complex processing of the array. Rho-independent terminators were predicted via ANNOgesic.
FoxI represses the major OM protein FomA. To identify putative mRNA targets of FoxI, we overexpressed it in F. nucleatum from pEcoFus (Fig. 6a). Although the overexpression caused no apparent morphological changes, it prevented F. nucleatum from reaching full cell density (Extended Data Fig. 9a). […] revealed depletion of a very abundant protein in the 35- to 55-kDa range in the FoxI-expressing strain, compared with empty pEcoFus (Fig. 6b). Mass spectrometry (MS) of the excised band predicted this depleted protein to be the ~42 kDa outer membrane (OM) porin FomA (Fig. 6c and Extended Data Fig. 9c-f). The OM localization of this putative FoxI target was fully supported by cell fractionation experiments as well as western blot analysis with a FomA antibody (Fig. 6b). Next, we used the RNA-RNA interaction (IntaRNA) algorithm 79 to predict possible base-pairing interactions between FoxI and the fomA mRNA. The top prediction was an 8-bp RNA helix with a bulged A, sequestering the RBS of this target and so repressing translation initiation of the fomA mRNA (Fig. 6d). Most importantly, this interaction would fully engage a conserved region of the fomA mRNA (Fig. 6e) and one of the two candidate seed regions in FoxI, that is, the conserved cytidine-rich stretch upstream of the long 3′-hairpin (Fig. 5c,d). Indeed, mutation of three consecutive cytidines to adenines (sRNA variant FoxI-3C) abrogated both the growth phenotype and the downregulation of FomA (Fig. 6b,c and Extended Data Fig. 9a). Expression of the sRNA itself was, however, largely unaffected by this mutation (Extended Data Fig. 9b). Thus, the almost complete depletion of one of the most abundant Fnn proteins by the FoxI sRNA is very likely to occur on the post-transcriptional level by a mechanism of translational repression.

Fig. 4 | […] For each strain, sRNAs were considered to be conserved if they displayed ≥75% nucleotide identity and were not shorter than 75% of the sRNA's length in the reference strain. An asterisk marks sRNA candidates that overlapped with riboswitch predictions, but detection of a stable fragment by northern blot analysis indicated potential dual functions. Grey boxes indicate that the ncRNAs did not meet the required criteria but were predicted via Rfam. Purple ncRNAs were identified only when comparing genomic synteny in addition to the secondary structure of these regions. c, Northern blot validation of predicted sRNAs in three different growth phases. The sRNA candidates were classified into established classes of intergenic, 5′-UTR- or 3′-UTR-derived or antisense sRNAs. Rho-independent terminators were predicted by ANNOgesic. Processing sites are indicated by a dashed line.

Discussion
The phylum Fusobacteria, despite its importance for human and veterinary medicine, is understudied with respect to molecular mechanisms of gene expression and RNA biology. Our global RNA maps obtained for five fusobacterial strains provide an important resource in the quest to understand how gene regulation enables this group of microbes to dwell and proliferate in diverse animals 6. In addition, although high-throughput screening for fusobacterial gene function is in its infancy 80,81, our single-nucleotide expression maps will be invaluable for scoring effects of transposon insertions outside reading frames. Our study increases by a factor of 1,000 the number of mapped 5′-ends, effectively assigning a primary TSS or polycistronic transcription to most F. nucleatum genes. Specifically, we assigned TSSs to 706 genes or operons, which include the two major virulence factors FadA and Fap2, and ~200 other putative virulence factors 55 (Figs. 1b and 2a, and Supplementary Dataset 1). Their observed growth-independent expression starkly contrasts with the common condition-dependent induction of virulence genes in many other human pathogens. This supports F. nucleatum's role as an opportunistic pathogen and generalist 82, thus in part explaining why the bacterium can colonize additional sites in the human body. With regard to general transcription signals, our analysis of different growth stages shows that most fusobacterial promoters possess an extended −10 box; a −35 box is less prevalent in Fnp and Fnv. Despite these differences, our results indicate a shared recognition sequence for the fusobacterial housekeeping σ 70 factor. Intriguingly, the fusobacterial promoters differ from those of closely related species (for example, Bacteroides fragilis 83, B. thetaiotaomicron 84 or Porphyromonas gingivalis 85), and seem more similar to promoters of proteobacteria 36,41,46 or Firmicutes 86.
We not only corrected the in silico annotation for ~22% of all genes of F. nucleatum, but also added hundreds of functional elements. In addition to three previously overlooked conserved small ORFs, we provided evidence for a rich layer of ncRNAs of diverse origin, including a dozen sRNAs from 'empty' IGRs. Except for four ubiquitous RNAs (6S RNA, 4.5S RNA, tmRNA and M1 of RNase P), the fusobacterial sRNAs seem to be unique, showing no obvious sequence homology outside this phylum. Yet, our discovery of FoxI as a repressor of the major porin FomA establishes proof of principle that fusobacteria share with many other bacteria the use of sRNAs to regulate envelope composition 28. Mechanistically, we consider it likely that, in vivo, stable formation of the predicted 8-bp FoxI-fomA RNA duplex is mediated by an RNA-binding protein (RBP). As fusobacteria lack CsrA, Hfq and ProQ 87,88, the use of FoxI as an RNA bait in experimental RBP discovery 29,74 promises to expand the currently known set of sRNA-related RBPs. Candidates include the KhpA/B proteins, which have been predicted to associate with sRNAs in Gram-positive species 89-91.
Work in other bacteria has shown that conserved sRNAs are often the most stringently regulated genes within a given regulon 92-95. Although the reported colonization of different body sites 2-4,96-101 implies that F. nucleatum possesses multiple environmental sensing and stress response pathways, the responsible transcription factors are unknown. In the present study, we observed strong and selective upregulation of the FoxI sRNA after oxygen exposure, paired with exceptional sequence conservation of the foxI promoter, indicating the presence of key elements for its oxygen-dependent activation (or derepression). These observations provide invaluable starting points to find a transcription factor that enables F. nucleatum to respond to elevated oxygen levels. Physiologically, the FoxI-mediated repression of FomA synthesis when F. nucleatum senses an oxygen-rich environment, that is, after leaving its stable niches in oral biofilms or cancer tissue, may protect from the host's immune system, because FomA is known to be recognized by both adaptive and innate immune pathways 102.
In conclusion, high-resolution RNA maps are essential prerequisites for the study of host-pathogen interactions by advanced transcriptomics methods such as dual RNA-seq 103 or bacterial single-cell RNA-seq 104,105 , in particular, when trying to track F. nucleatum gene activity in cancer tissue. Improving the poor genetic tractability of this organism, our new shuttle vector (Fig. 6a) should help to accelerate the discovery of gene function in niche adaptation and survival. Furthermore, this expression system may facilitate genome-wide antisense knockdown of chromosomal genes, as pioneered in Staphylococcus aureus 19 years ago 106 . Alternatively, it may be used to repurpose the CRISPR-Cas locus of F. nucleatum, which is here shown to be functional (Fig. 3d), for intrinsic gene regulation. All in all, the RNA-centric approach in the present study opens up new avenues for molecular microbiology excursions into an understudied phylum of great medical importance.

Methods
Bacterial strains and growth conditions. All oligonucleotides, plasmids or strains used in the present study can be found in Supplementary Dataset 9. The study utilized four different subspecies of F. nucleatum: F. nucleatum subsp. nucleatum (ATCC 25586 and ATCC 23726), subsp. polymorphum (ATCC 10953), subsp. vincentii 3_1_36A2 and subsp. animalis 7_1, as well as a strain of F. periodonticum. ATCC 25586 and ATCC 10953 were obtained from the German Collection of Microorganisms and Cell Culture (DSMZ) and ATCC 23726 was obtained from the American Type Culture Collection (ATCC). The Fnv, Fna and Fup strains were a kind gift from E. Allen-Vercoe (University of Guelph, Canada). All strains were routinely grown at 37 °C in 80:10:10 (N 2 :H 2 :CO 2 ) on brain-heart infusion (BHI) 2% agar plates supplemented with 1% (w:v) yeast extract, 1% (w:v) glucose, 5 µg ml −1 of haemin and 1% (v:v) fetal bovine serum (BHI-C). Liquid cultures for all experiments were grown in Columbia broth medium without agitation (below). Precultures were prepared 24 h before inoculating working cultures at a 1:50 dilution. When using F. nucleatum subsp. nucleatum ATCC 23726 carrying a plasmid, agar plates were supplemented with 5 µg ml −1 of thiamphenicol and liquid cultures contained 2.5 µg ml −1 of the antibiotic.
Sample collection for dRNA-seq analysis and RNA extraction. Samples for dRNA-seq analysis were collected from bacterial cultures corresponding to the early logarithmic, mid-logarithmic and early stationary growth phase. Three biological replicates were collected for each time point. Samples were fixed by the addition of STOP Mix (95% (v:v) EtOH, 5% (v:v) phenol) before snap-freezing in liquid nitrogen. All samples were subsequently stored at −80 °C until RNA extraction. RNA extraction was performed as previously reported 36 . In short, frozen bacterial cultures were thawed on ice and centrifuged, and cell pellets were resuspended in lysis solution (600 µl of 0.5 mg ml −1 of lysozyme in Tris-ethylenediaminetetraacetic acid buffer, pH 8.0 with 60 µl 10% sodium dodecylsulfate (SDS)). Bacterial cells were lysed by placing the samples for 2 min at 65 °C in a water bath and the reaction was stopped by addition of 65 µl of 3 M sodium acetate, pH 5.2. Total RNA was extracted from the lysates using the hot phenol method 107 .
Generation and sequencing of cDNA libraries for dRNA-seq. Complementary DNA libraries for dRNA-seq were generated by Vertis Biotechnology AG. This was performed as described in Berezikov et al. 108 while omitting the RNA size-fractionation step before cDNA synthesis. To summarize, two libraries were created for each total RNA sample: one included all transcripts (TEX−), and the second was enriched for primary transcripts by treatment with terminator exonuclease (TEX+). This was followed by addition of a poly(A) tail to equal amounts of RNA samples using a poly(A) polymerase. Next, the 5′-triphosphate residues were removed with tobacco acid pyrophosphatase before ligation of the 5′-RNA adapter. This was used for the generation of first-strand cDNA using oligo(dT)-adapter primer and MMLV reverse transcriptase. The cDNA concentration was further increased to 20-30 ng µl −1 through PCR amplification utilizing a high-fidelity DNA polymerase. Library-specific barcodes were included in the 5′-sequencing adapters to allow for multiplexing. The cDNA libraries for the untreated samples were prepared in a similar fashion except that the TEX treatment was omitted, the final cDNA concentration was 10-20 ng µl −1 and barcode sequences were included in the 5′- and 3′-TruSeq sequencing adapters. The different cDNA libraries were pooled in approximately equimolar amounts before being sequenced on an Illumina NextSeq 500 system (75 bp single-end read length). The cDNA libraries were sequenced by Vertis Biotechnology AG (for F. nucleatum subsp. nucleatum and polymorphum) or by the Core Unit SysMed (University of Würzburg; for F. nucleatum subsp. animalis and vincentii, and F. periodonticum).

Read mapping of dRNA-seq data. The raw data are available at the Gene Expression Omnibus (GEO) under accession no. GSE161360.
Adapter clipping and quality trimming of the Illumina reads in FASTQ format was performed using the fastq_quality_trimmer function of the FASTX toolkit v.0.10.1 (http://hannonlab.cshl.edu/fastx_toolkit). The READemption 0.4.3 tool was used to perform the following steps with its subcommands 'create', 'align' and 'coverage' 109: the poly(A) tail sequences were computationally removed before the size-filtering step, which excluded all sequences <12 nt. The following reference genomes for each isolate were downloaded from the National Center for Biotechnology Information (NCBI) ftp server and used for mapping via Segemehl v.0.2.0 (ref. 110) […]

Prediction of TSSs, transcripts, UTRs, operons, sORFs, sRNAs, CRISPR locus, regulatory RNA elements and terminators. For the following predictions, the ANNOgesic tool was used 39: all parameters were kept at the default setting, if not otherwise specified. TSSs were predicted using the default setting of ANNOgesic, which further categorizes the TSSs into five different classes: pTSSs, sTSSs, iTSSs, aTSSs and oTSSs. For this, all parameters had to be met within all replicates of each condition to be annotated as a TSS. In addition, predicted sTSSs were excluded if a pTSS was predicted to be <7 nt away. The TSS prediction was further improved by manual curation. The annotation of UTRs was conducted by adjusting the default settings to allow for a maximum length of 300 nt and by extending 5′-UTRs up to 25 nt to be connected with detected transcripts. All sRNAs were predicted through the input of detected promoter sequences (below) while further allowing for the detection of 5′-UTR-derived sRNAs. In addition, ANNOgesic takes TSSs, coverage and the presence of terminators into account when predicting the length of sRNA candidates. For predictions of transcripts, operon structures, CRISPR locus and regulatory RNA elements, the default settings of ANNOgesic were used.
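The sTSS exclusion rule stated above (drop a secondary TSS if a primary TSS lies <7 nt away) can be illustrated with a short post-filter. This is a sketch under stated assumptions and not part of ANNOgesic: the function name and the tuple layout are hypothetical, and restricting the comparison to TSSs on the same strand is our assumption, as the text does not specify it.

```python
from typing import List, Tuple

Tss = Tuple[int, str]  # (1-based genome position, strand "+" or "-")


def filter_secondary_tss(primary: List[Tss],
                         secondary: List[Tss],
                         min_distance: int = 7) -> List[Tss]:
    """Keep only sTSSs with no same-strand pTSS closer than min_distance nt."""
    kept = []
    for pos, strand in secondary:
        too_close = any(
            p_strand == strand and abs(pos - p_pos) < min_distance
            for p_pos, p_strand in primary
        )
        if not too_close:
            kept.append((pos, strand))
    return kept


# Toy coordinates, for illustration only.
p_tss = [(100, "+"), (500, "-")]
s_tss = [(104, "+"), (107, "+"), (503, "+")]
kept_stss = filter_secondary_tss(p_tss, s_tss)
```

In this toy example the sTSS at position 104 is removed (4 nt from the pTSS at 100), the one at 107 is kept (exactly 7 nt away) and the one at 503 is kept because the nearby pTSS lies on the opposite strand.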
Rho-independent terminator prediction was carried out using ANNOgesic, which combines two heuristic prediction algorithms: TransTermHP 111 analysis and a detection of sharp coverage decreases around the predicted terminator sequence.
Promoter and SD sequence detection. To detect putative promoter motifs, 50 nt upstream of each detected TSS, including the TSS position, were extracted using BEDtools 112 and further analysed with MEME v.4.12.0 (ref. 45). The same procedure was performed for all genes lacking a primary TSS. For this, 100-nt-long sequences upstream of the start codon were extracted to cover 5′-UTRs that were not otherwise accounted for. To identify SD sequences, all 5′-UTR sequences were extracted (above) and used as input for MEME analysis.
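The window extraction that feeds the motif search can be sketched in plain Python. This is an illustration rather than the BEDtools command used in the study: the function names are hypothetical, TSS coordinates are assumed to be 1-based, "50 nt upstream, including the TSS position" is read here as 50 upstream nucleotides plus the TSS itself (51 nt), and the genome is treated as linear, so windows are clipped at the sequence ends.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_window(genome: str, tss: int, strand: str,
                    upstream: int = 50) -> str:
    """Return the upstream window ending at the TSS, read 5' to 3'.

    tss is a 1-based position; the TSS nucleotide is the last base returned.
    """
    if not 1 <= tss <= len(genome):
        raise ValueError("TSS position outside the genome")
    if strand == "+":
        start = max(0, tss - 1 - upstream)
        return genome[start:tss]
    if strand == "-":
        end = min(len(genome), tss + upstream)
        return reverse_complement(genome[tss - 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy sequence, for illustration only.
toy = "ACGTTGCAAGCT"
plus_window = promoter_window(toy, 6, "+", upstream=3)
minus_window = promoter_window(toy, 6, "-", upstream=3)
```

For minus-strand TSSs the window is taken downstream in genome coordinates and reverse-complemented, so that every sequence passed to the motif search reads in the direction of transcription.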

Northern blot analysis.
For northern blot analysis, 3-10 µg of DNase I-treated total RNA from three biological replicates for each time point or strain was separated on a 6% polyacrylamide gel containing 7 M urea. After transfer of the RNA to Hybond-XL membranes, hybridization took place overnight at 42 °C with [γ-32P]ATP end-labelled deoxyribonucleotide probes. A Typhoon FLA 7000 phosphorimager (GE Healthcare) was used for signal visualization.
Reannotation of coding sequences. All annotated CDSs were checked for the presence of an AUG start codon. In addition, the 20 nt upstream of the start codon were extracted to analyse whether an SD sequence was present. In the absence of either, the ORFs were manually inspected. In the case of a missing AUG start codon, the sequences were surveyed for an in-frame start codon up- or downstream of the annotated start. A similar analysis was conducted in the case of an absent SD sequence. CDSs for which the corrected start was supported by both the presence of a start codon (AUG, GUG or UUG) and an SD sequence, as well as by sequence conservation in other F. nucleatum subsp. nucleatum strains, were included for reannotation.
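The survey for in-frame alternative start codons can be sketched in Python as below. In the study this step involved manual inspection, so the code is only a hypothetical helper. It assumes a coding-strand DNA sequence (ATG, GTG and TTG standing in for AUG, GUG and UUG), a 0-based index for the annotated start, and an arbitrary search limit of 30 codons in each direction. The SD sequence and conservation checks are not modelled.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, annotated_start, max_codons=30):
    """List in-frame alternative start codons around an annotated start.

    seq: coding-strand DNA sequence.
    annotated_start: 0-based index of the first base of the annotated start.
    max_codons: assumed search limit in each direction.
    Returns 0-based positions sorted by distance from the annotated start.
    The search in each direction ends at the first in-frame stop codon.
    """
    seq = seq.upper()
    candidates = []
    for step in (-3, 3):  # walk upstream, then downstream, codon by codon
        pos = annotated_start + step
        for _ in range(max_codons):
            if pos < 0 or pos + 3 > len(seq):
                break
            codon = seq[pos:pos + 3]
            if codon in STOP_CODONS:
                break
            if codon in START_CODONS:
                candidates.append(pos)
            pos += step
    return sorted(candidates, key=lambda p: abs(p - annotated_start))
```

Each returned position would still need to be checked for an upstream SD sequence and for conservation in related strains before being accepted as a corrected start.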
Construction of shuttle vector pEcoFus. The E. coli-C. difficile shuttle vector pRPF185 (ref. 78) was used as a backbone. In the first step, the backbone was opened by inverse PCR using primers JVO-17251/17252 to remove repA and orfB while introducing PvuI and NotI restriction sites. The origin of replication for F. nucleatum was amplified from pORI92 (a gift from G. Bachrach) using primers JVO-17207/17248 to introduce PvuI and NotI restriction sites, cut and ligated with the backbone, resulting in pFP14. Next, pFP14 was digested with KpnI and BamHI before being assembled together with an amplified fragment of the 100-nt promoter region of the 4.5S RNA using the NEBuilder Hifi Assembly Cloning kit (New England Biolabs) and transformed into E. coli topF′. This resulted in the vector pEcoFus for constitutive overexpression of any gene of interest.

Electroporation of F. nucleatum subsp. nucleatum. Electrocompetent cells of the genetically tractable F. nucleatum subsp. nucleatum ATCC 23726 were generated in a similar fashion to that described in Haake et al. 113 . In short, cells were harvested from the M phase at 4,000g for 10 min at 4 °C. Subsequently, the cells were washed in ice-cold and prereduced 10% (v:v) glycerol solution for a total of five washes. Finally, the cells were resuspended to 60 optical density at 600 nm (OD600) units per ml in 10% glycerol solution, aliquoted and stored at −80 °C until use. Then, 80 µl of competent cells were transformed with 5 µg of dialysed plasmid (2.0 kV, 1-mm gap) and recovered in Columbia broth containing 1 mM MgCl2 for 2 h before plating the bacteria on BHI-C plates containing 5 µg ml−1 of thiamphenicol. This routinely yielded >10 colony-forming units per µg of plasmid DNA. Transformants were re-streaked on to fresh BHI-C plates before verification that they carried the correct insert and preparation of glycerol stocks or downstream analyses.

Stress exposure of F. nucleatum. Fnn was grown to mid-exponential phase before being exposed to different stress conditions for 20 min. The following treatments were conducted inside the anaerobic chamber: H2O2 (400 µM; AppliChem), bile (0.05% (w:v); Sigma-Aldrich, catalogue no. 70168), iron depletion (300 µM 2,2-bipyridyl; Sigma-Aldrich, catalogue no. D216305-10G) and lysozyme (50 µg ml−1; Roth, catalogue no. 8259.2). All treated cultures and the untreated control were kept in the incubator at 37 °C for the duration of the treatment. For acidic pH treatment, the bacteria were harvested (37 °C; 5 min at 4,000g) and resuspended in pH-adjusted (pH 5.5 or pH 7 for the control), prewarmed and reduced medium. Similarly, for the O2 shock, the samples were spread on to Petri dishes and incubated at 37 °C under atmospheric conditions. The heat shock was conducted at 42.5 °C in a water bath outside the chamber. Before removing samples from the chamber, the Falcon tubes were wrapped with parafilm to avoid oxygen exposure. At the end of each treatment, cells were fixed in STOP mix and RNA was isolated as described above.

Subcellular fractionation. Subcellular fractionation was performed as described by Knoke et al. 114 for two biological replicates. In short, 50 OD600 units were harvested from M-phase cultures at 4,000g for 10 min at 4 °C. The cell pellet was resuspended in lysis buffer (10 mM Tris-HCl, pH 7.5, 0.2 mM phenylmethylsulfonyl fluoride, 1 mM MgCl2) and total protein samples were collected. The cells were lysed via sonication and centrifuged at 4,000g for 10 min at 4 °C. A lysate sample was collected before performing another centrifugation step at 15,000g for 1 h at 4 °C. Samples for the cytosolic fraction were collected and the pellet was incubated with 0.5% N-lauroylsarcosine in lysis buffer overnight at 4 °C, to separate the IM and OM. Another high-speed centrifugation step was performed (100,000g for 2 h at 4 °C), collecting the IM fraction and dissolving the OM fraction in lysis buffer containing 0.5% Triton X-100. For denaturing SDS-polyacrylamide gel electrophoresis (PAGE) analysis, 0.1 OD600 unit was loaded for total cell lysate and the cytosolic fraction, whereas 1 OD600 unit was loaded for both IM and OM fractions. For western blot analysis of FomA, a polyclonal antibody was generated in rabbits against a synthetic peptide derived from FomA (amino acid sequence: KKFATYNKGDKKSQF). Peptide synthesis, validation via ELISA and subsequent antigen purification were performed by Eurogentec. Afterwards, an unstained gel was transferred to a poly(vinylidene fluoride) membrane (GE Healthcare Life Sciences) and incubated with anti-FomA antibody before being detected using an anti-rabbit secondary antibody (Thermo Fisher Scientific, catalogue no. 31460).

NanoLC-MS/MS analysis of protein samples.
Quantification of FomA from F. nucleatum samples was conducted similarly to the procedure described in Hör et al. 91 . Denatured samples for total protein (empty vector control, FoxI or FoxI-3C overexpression) or the OM fraction (empty vector control) were separated on a denaturing 15% SDS gel and, for each sample, the region of interest was excised from the gels for two replicates (Fig. 6b). The samples were destained in 100 mM ammonium bicarbonate containing 30% acetonitrile and shrunk in 100% acetonitrile before trypsin digestion (0.1 µg in 100 mM ammonium bicarbonate, overnight at 37 °C). The samples were then dissolved in 5% formic acid for subsequent nano-liquid chromatography-tandem MS (LC-MS/MS) analysis. For nanoLC-MS/MS analysis, an Orbitrap Fusion (Thermo Fisher Scientific) combined with a PicoView Ion source (New Objective) and an Easy-nLC 1000 (Thermo Fisher Scientific) was used. The dissolved samples were loaded on PicoFrit capillary columns (30 cm × 150 µm ID, New Objective) packed with ReproSil-Pur 120 C18-AQ (1.9 µm, Dr. Maisch) and separated using a 140-min linear gradient (3-30% acetonitrile, 0.1% formic acid) at a flow rate of 500 nl min−1. Both MS (60,000 scans; target value for AGC: 2 × 10^5) and MS/MS (7,500 scans; target value for AGC: 5 × 10^4) analyses were conducted using the Orbitrap with the higher-energy collisional dissociation fragmentation set to 35% normalized collision energy. A fixed cycle time of 3 s was used for the Top Speed data-dependent MS/MS method, whereas a further exclusion of 1 repeat count per min was applied. Precursor selection was conducted at a minimum threshold of 50,000 while excluding singly charged precursors. Internal calibration with Easy-IC was used to improve mass:charge ratio assignment. Subsequent analysis of the data was conducted using MaxQuant (v. 1

Reporting Summary
Last updated by author(s): May 4, 2021


Data
RNA-seq data can be accessed at NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under the accession number GSE161360. The RNA-seq data can further be accessed via https://helmholtz-hiri.de/en/datasets/fusobacterium. MS data can be accessed at the Proteomics Identification Database PRIDE (https://www.ebi.ac.uk/pride) under the accession number PXD022474. For all other data types, the source data files are provided along with the final manuscript.


Life sciences study design

Sample size
The dRNA-seq was performed in biological triplicates. Northern blot analysis for growth and stress conditions was performed in biological triplicates and for the sRNA overexpression in biological duplicates. The subcellular fractionation was conducted for two biological replicates. The sample size was determined based upon experience with previous studies (PMID: 20164839; PMID: 26307765).

Data exclusions
No data were excluded.

Replication
The dRNA-seq was performed in biological triplicates. Northern blot analysis for growth conditions was performed in biological triplicates and for the sRNA overexpression in biological duplicates. The subcellular fractionation was conducted for two biological replicates. All repeats verified the original finding. All growth curves were conducted in biological triplicates. Mass spectrometry analysis was conducted in biological duplicates.