Variability of strain engraftment and predictability of microbiome composition after fecal microbiota transplantation across different diseases
Metagenomic dataset search strategy and selection
We systematically searched PubMed, Scopus and ISI Web of Knowledge as of 8 February 2021 for potentially eligible studies using the following search string: ((faecal microbiota suspension) OR (fecal microbiota suspension) OR (faecal microbiota transplant*) OR (fecal microbiota transplant*) OR (faecal microbiota donation) OR (fecal microbiota donation) OR (faecal microbiota transfer) OR (fecal microbiota transfer) OR (faecal microbiota infusion) OR (fecal microbiota infusion) OR (faecal microbial suspension) OR (fecal microbial suspension) OR (faecal microbial transplant*) OR (fecal microbial transplant*) OR (faecal microbial donation) OR (fecal microbial donation) OR (faecal microbial transfer) OR (fecal microbial transfer) OR (faecal microbial infusion) OR (fecal microbial infusion) OR (faecal suspension) OR (fecal suspension) OR (faecal transplant*) OR (fecal transplant*) OR (faecal donation) OR (fecal donation) OR (faecal transfer) OR (fecal transfer) OR (faecal infusion) OR (fecal infusion) OR (bacteriotherapy) OR (stool transplant*) OR (stool donation) OR (stool transfer) OR (stool infusion) OR (FMT)) AND ((Metagenom*) OR (shotgun) OR (engraft*) OR (whole genom*) OR (transkingdom) OR (WGS)). In addition, we manually searched the bibliographies of papers of interest to provide additional references. When needed, we contacted the authors to obtain additional data, metadata or clarification of study methods.
We considered as eligible all original studies with the following characteristics: (1) human subjects of any age were treated with nonautologous FMT; (2) shotgun metagenomic analysis of donor feces and of recipient feces (before and after treatment) was performed. We excluded studies in which the only therapeutic treatment for the disease was based on antibiotics. We further excluded those studies using microbial consortium-based transplantation approaches (instead of donor stool-based transplantations), those in which fewer than three recipients were enrolled and if raw sequencing data or metadata were not available or incomplete. In the case of randomized controlled trials that used autologous FMTs as placebo, we included only patients treated with nonautologous FMT. If studies used stool from mixed donors for FMT (multidonor FMT), they were included only if sequencing of multidonor stool batches were available. Finally, we excluded animal model studies or nonoriginal studies (reviews, meta-analyses, editorials, and so on). The eligibility of each study was assessed independently by two reviewers (N.K. and S.P.), and any disagreements were resolved by the opinion of a third reviewer (G.I.).
Sequencing data files and metadata were downloaded from public repositories as indicated in the original publications. If data were not publicly available, we contacted authors asking to provide them through private correspondence.
Metadata extraction and curation
Metadata extraction was carried out independently by two reviewers (N.K. and S.P.), using a data collection form. Discrepancies between the two reviewers were resolved by the opinion of a third investigator (G.I.). The following data were extracted from each study if available: author names, publication year, Bioproject Accession code, sequencing depth, study location, number of total samples, study disease, number of recipients and donors, donor type (that is, whether donor individuals were related to the recipient, either family/household members or through friendship or whether they were unrelated), use of antibiotics before FMT, characteristics of infused feces (grams, volumes, use of frozen/fresh material), routes and number of infusions, follow-up, and clinical and microbiological outcomes. Data were not analyzed by sex or gender due to lack of this information in most of the published datasets.
Newly collected metagenomic datasets
Three Italian cohorts were newly collected as case series and sequenced in the context of this study. A first cohort (This_study_Cdiff) was collected between February 2021 and August 2021 at the Fondazione Policlinico Gemelli IRCCS in Rome, Italy, and included 16 adult subjects with recurrent C. difficile infection and no history of other GI disorders or GI surgery. Patients were treated with a single fecal transplant from six different donors, and their stool was collected just before FMT and at different timepoints (7, 15, 30, 60, 180 and 240 days) after FMT. FMT was performed with frozen fecal material. Donor selection and manipulation of fecal material were performed following international guidelines3. All patients underwent FMT by colonoscopy, after bowel lavage and a 3-day vancomycin regimen, as previously described1. A total of 94 stool samples were sequenced. A second cohort (This_study_IBD) was collected from May 2017 to October 2017 at the Ospedale Bambino Gesù IRCCS in Rome, Italy, and included two pediatric patients with mild-to-moderately active IBD despite traditional treatments, without any active GI infection, placed central venous catheter or critical illness or comorbidity. They received a single FMT (one patient from a related donor, the other from an unrelated donor). Stool samples were collected and sequenced at follow-up visits up to 30 days after treatment, yielding eight metagenomic samples. A third cohort (This_study_MDRB), from the Ospedale Pediatrico Bambino Gesù IRCCS in Rome, Italy, included, between October 2018 and March 2019, five pediatric patients with large bowel colonization with MDRB and either acute leukemia (n = 4 patients) or severe combined immunodeficiency (n = 1 subject). Patients underwent single (n = 4 subjects) or sequential (n = 1 subjects, n = 2 procedures) fecal transplant from one of two donors. Stool samples were collected and sequenced at follow-up visits up to 30 days after FMT (n = 13 metagenomic samples in total). In both pediatric cohorts, FMT was performed as previously described63. Written informed consent was obtained from all participants (or the parents of pediatric participants). No compensation was provided to the participants. Consistent metadata of all 115 samples newly collected in this study can be found in Supplementary Table 2.
Samples were collected using a stool collector with a DNA stabilization buffer, brought directly by patients to the FMT centers in a refrigerated box within 6 h from collection, and then stored at –80 °C for up to 36 months before being shipped in dry ice to the CIBIO Department (Trento, Italy) for DNA extraction and sequencing. DNA extraction was performed using the DNeasy PowerSoil Pro Kit (Qiagen) according to the manufacturer’s procedures. No human DNA sequence depletion or enrichment of microbial or viral DNA was performed. DNA concentration was measured with Qubit (Thermo Fisher Scientific) and DNA was then stored at –20 °C. Sequencing libraries were prepared using the Illumina DNA Prep (M) Tagmentation kit (Illumina) following the manufacturer’s guidelines. Sequencing was performed on the Illumina NovaSeq 6000 platform at a target sequencing depth of 7.5 Gbp following the manufacturer’s protocols.
Newly generated shotgun metagenomic sequences were preprocessed and quality controlled using the pipeline available at https://github.com/SegataLab/preprocessing and KneadData within bioBakery v.3 (ref. 23). Shortly, reads were quality controlled and those of low quality (average quality score
Definition of clinical response across studies
To evaluate the association between microbial engraftment and clinical success, we identified all studies that expressed clinical outcomes as binary variables, for which single individual metadata were available or could be retrieved from the publication via manual curation, and for which both the clinically successful and the unsuccessful groups had at least one FMT triad. Ten published studies (AggarwalaV_2021, BarYoseph_2020, BaruchE_2020, DavarD_2021, GollR_2020, SmillieC_2018, SuskindD_2015, VaughnB_2016, ZhaoH_2020, IaniroG_2020) and the three new cohorts (This_Study_Cdiff, This_Study_IBD, This_Study_MDRB) were included. Clinical success was defined as C. difficile infection cure in three studies (AggarwalaV_2021, SmillieC_2018, This_Study_Cdiff), as eradication of MDRB in two studies (BarYoseph_2020, This_Study_MDRB), as objective tumor regression by imaging according to iRECIST criteria65 in two studies (BaruchE_2020, DavarD_2021), as reduction by more than 75 points in the IBS-Severity Scoring System (IBS-SSS) in GollR_2020, as resolution of diarrhea in IaniroG_2020, as reduction by >25% in the Yale Global Tic Severity Scale (YGTSS-TTS) and reduction by more than three in the Harvey-Bradshaw Index (HBI) change without an increase in IBD-related medications in VaughnB_2016, as clinical remission expressed as Pediatric Crohn’s Disease Activity Index (PCDAI) of less than ten in SuskindD_2015, and as clinical remission expressed as Pediatric Ulcerative Colitis Activity Index (PUCAI) of less than ten in This_Study_IBD.
Building the expanded SGB database
SGBs are clusters of microbial genomes and MAGs defined to have no more than 5% pairwise genetic divergence25. SGBs can contain taxonomically labeled microbial genomes from isolate sequencing (kSGBs) or can lack taxonomic contextualization from isolate sequencing (uSGBs; that is, SGBs with no cultured isolate). In this work, we first extended the SGB database and then employed it to detect and profile the taxa present in metagenomes belonging to any kSGB or uSGB at species- and strain-level resolution.
The custom extended database was built starting from the 154,723 MAGs and 80,990 reference isolate genomes from Pasolli et al.25 and further expanded using the same approach with 616,805 MAGs from different human body sites, animal hosts and other environments, together with 155,767 reference genomes in the National Center for Biotechnology Information GenBank database66 available as of November 2020. MAGs were assembled from metagenomes by applying metaSPAdes67 (v.3.10.1) or MEGAHIT68 (v.1.1.1) to each sample separately as reported in Pasolli et al.25. Obtained assembled contigs longer than 1,500 nucleotides were binned into MAGs with MetaBAT2 (ref. 69) (v.2.12.1). We executed CheckM (v.1.1.4)70 on the 1,008,148 genomes, filtering those with completeness below 50% or contamination above 5% to ensure high quality. Next, we minimized the redundancy among genomes by computing Mash distances71 on the quality-controlled sequences, and dereplicating sequences at 99.99% genetic identity. A total of 729,195 genomes (560,076 MAGs (Supplementary Table 15) and 169,119 reference genomes) were kept in the extended database used for species- and strain-level profiling, thus leveraging reference-based profiling with information provided by metagenome assembly. Reference isolate genomes and MAGs were then clustered into SGBs spanning at least 5% genetic diversity, and SGBs to genus-level genome bins (GGBs; 15% genetic diversity) and family-level genome bins (FGBs; 30% genetic diversity), following the procedure described in Pasolli et al.25. ‘phylophlan_metagenomic’—a subroutine of PhyloPhlAn 372 that applies Mash71 to estimate the whole-genome average nucleotide identity among genomes—was used to assign MAGs to SGBs. Reference genomes and MAGs for which no SGB with at least 5% average genetic distance was present in the database were assigned to new SGBs based on the average linkage hierarchical clustering (with the dendrogram cut at 5% genetic distance). Similarly, when no GGBs or FGBs below the genetic distance threshold existed, SGBs were assigned to new GGBs and FGBs following the same procedure.
Prokka (v.1.12 and v.1.13)73 was used to annotate the open reading frames of all reference genomes and MAGs. Coding sequences were assigned to a UniRef90 cluster74 by performing a Diamond search (v.0.9.24)75 of the coding sequences on the UniRef90 database (v.201906) and assigning a UniRef90 identifier when the mean sequence identity to the centroid sequence was greater than 90% and covered more than 80% of the centroid sequence. Sequences that could not be assigned to any UniRef90 cluster following this procedure were de novo clustered with MMseqs2 (ref. 76) to SGBs following the Uniclust90 criteria77.
Definition of kSGBs and uSGBs and taxonomic assignment
SGBs containing at least one reference genome (kSGBs) were assigned the same species-level taxonomy of the reference genomes included in the kSGB following a majority rule. SGBs containing no reference genomes (uSGBs) were given the taxonomic annotation of the corresponding GGB (up to the genus level) if this included reference genomes, and of the FGB (up to the family level) if that included reference genomes. Alternatively, if no reference genomes were contained in the FGB, a phylum-level taxonomic label was assigned based on the majority rule of up to 100 closest reference genomes to the MAGs in the SGB as determined by ‘phylophlan_metagenomic’. Taxonomic assignment of SGBs profiled in this study can be found in Supplementary Table 3.
Species-level profiling of metagenomic samples
Species-level profiling was performed on samples sequenced to a depth higher than 1 Gbp (n = 1,419; 100 samples being excluded from downstream analyses) using MetaPhlAn 4 (ref. 23,39) with default parameters and the custom extended SGB database. uSGBs with fewer than five MAGs were discarded, as there is a higher risk of them being the result of assembly artifacts or chimeric sequences. Next, SGB core genes were defined as ORFs in a UniRef90 family or in a de novo clustered gene family (based on the Uniclust90 clustering procedure77) that were detected in at least half of the genomes of the SGB. Core genes were further filtered by selecting the highest threshold that allowed obtaining at least 800 core genes. The obtained core genes were then split into fragments of 150 nt, and such fragments were then aligned against the genomes of all SGBs using Bowtie2 (v.18.104.22.168; –sensitive option)64. Marker genes of a SGB were defined as core genes whose fragments were found in less than 1% of the genomes of any other SGB. When fewer than ten marker genes were found for a SGB, conflicts were defined as occurrences of more than 200 of its core genes in more than 1% of the genomes of another SGB. All conflicts for each SGB were then retrieved to generate conflict graphs. Conflict graphs were processed iteratively, and SGBs were merged for each conflict to both minimize the number of merged SGBs and maximize the number of markers. Finally, a maximum of 200 marker genes were selected for each SGB, prioritizing first their uniqueness and next the larger sizes. SGBs with fewer than ten markers were discarded at this point. Merged SGBs (SGB_group) profiled in this study can be found in Supplementary Table 3. The resulting 5.1 M marker genes (average: 189 ± 34.25 s.d. marker genes/SGB) were used as a new reference database for MetaPhlAn 4 (species-level profiling) and StrainPhlAn 4 (strain-level profiling). The presence of Blastocystis and the identification of its different subtypes was inferred with a mapping-based computational pipeline described elsewhere55.
Strain-level profiling of metagenomic samples
Strain profiling was performed with a modified version of StrainPhlAn 3 (ref. 23) using the custom SGB marker database described above that has been released as StrainPhlAn 439. We modified the StrainPhlAn code to change the sample and marker filtering behavior to allow for profiling more samples and SGBs. A sample was kept as long as it had at least 20 markers (parameter–sample_with_n_markers) and a marker was kept as long as it was present in 50% of the samples (parameter–marker_in_n_samples). After this first filtering, we retained samples with at least ten markers (parameter–sample_with_n_markers_after_filt). All 2,576 SGBs profiled by MetaPhlAn were initially considered for the strain-level profiling.
To improve accuracy of strain sharing detection and to more confidently define strain identity, we additionally considered samples from curatedMetagenomicData (cMD) R package78 (v.3.15). We included 4,443 human gut metagenomic samples from 962 individuals older than 6 years from ‘Westernized’ populations (as defined in cMD) that were sampled longitudinally, obtained from 18 datasets (Supplementary Table 11). For each subject and each SGB, two samples being at most 6 months apart were selected. When more than two timepoints close in time were available, we selected the pair that maximized the lower estimated coverage of the SGB among the two samples, that is, maximized their chance to pass the filtering steps in StrainPhlAn. In case of ties, we took those with higher coverage. Coverage of an SGB in a sample was estimated as [sample sequencing depth] × [relative abundance of the SGB] / [estimated genome length], with estimated genome length being extracted from the MetaPhlAn enlarged database described above. For kSGBs this is determined using only the genome lengths of the reference genomes in the kSGB, whereas for uSGBs 7% is added to the average genome length (estimated to be the average difference between the genome sizes of reference genomes and MAGs within the same SGB).
We included in the strain analysis samples as primary (that is, those that are used to select markers, parameter–samples) if they had an estimated coverage of at least 2X that of a given SGB genome, otherwise they were added as secondary samples (that is, those that are added only after the markers are selected with the primary samples, parameter–secondary_samples). In total, 1,033 SGBs that were detected in at least 20 primary samples were profiled at the strain level. To exclude strains likely coming from food sources, we included 216 MAGs in 19 SGBs (Supplementary Table 16) coming from food samples79 and used them in the StrainPhlAn profiling with the –secondary_references parameters. Samples that had StrainPhlAn mutation rates less than 0.0015 to any food MAG were discarded following the same procedure as in (Valles-Colomer et al., manuscript in preparation). SGBs in which more than 20% of the samples would be discarded using this criterion—constituting in large part of strains regularly found in food—were fully excluded (n = 3 SGBs: Bifidobacterium animalis SGB17278, Lactobacillus acidophilus SGB7044, Streptococcus thermophilus SGB8002). Additionally, we excluded 7 SGBs for which the marker genes alignment length was shorter than 1,000 nucleotides, and another 11 SGBs for which StrainPhlAn was not successful in building a phylogenetic tree.
Inference of strain transmission events
We obtained phylogenetic distances between strains as their leaf-to-leaf branch lengths along the trees (that is, patristic distances) produced by StrainPhlAn (built on marker genes alignments, retaining positions with at least 1% variability), normalized by dividing them by the median phylogenetic distance. As no consensus definition of strain is currently available, to infer strain identity and supported by the clear bimodal distribution of patristic distances of strains from the same individual with the highest peak in 0 (ref. 22), we defined and applied operational species-specific definitions by identifying the threshold that optimally separated phylogenetic distance distributions of strains of a given species in the same individual sampled at two timepoints (same strain), to that in unrelated individuals (different strains) whenever enough data were available. For all strain-level profiled SGBs, we determined the phylogenetic distance threshold that best separates strains from the same subject (different post-FMT timepoints of the same recipient or different samples of the same donor subject or different additional longitudinal samples of the same subject, always less than 6 months apart) from those of unrelated subjects with no possibility of direct transmission (subjects in different datasets) in the datasets we used in this study. For SGBs for which at least 50 same-individual and 50 unrelated comparisons were available, we determined the threshold that maximizes Youden’s index (defined as sensitivity + specificity – 1). If the resulting calculated threshold was greater than the fifth percentile of the distribution of subjects in different datasets, we adjusted the threshold to the 5th percentile as a bound on the false discovery rate (FDR). For SGBs for which fewer than 50 same-individual comparisons but at least 50 unrelated comparisons were available (in which optimal thresholds cannot reliably be estimated), we used the third percentile of the interindividual phylogenetic distances of subjects in different datasets, which corresponded to the median of all the calculated percentiles in (Valles-Colomer et al., manuscript in preparation). SGBs for which fewer than 50 unrelated comparisons were available (n = 17) were discarded. The SGB-specific phylogenetic distance thresholds for all 995 strain-level analyzed SGBs can be found in Supplementary Table 3. Finally, we defined strain identity for pairs of strains when their pairwise genetic distance fell below the SGB-specific thresholds.
Strain-level profiling allows identification of mislabeled samples80. We identified and excluded post-FMT samples (n = 21 out of 1,419) that did not share any strain with neither their corresponding pre-FMT sample nor the donor’s sample—something highly unexpected due to the high temporal stability of the gut microbiome22,23,36,81 and thus potential cases of sample mislabeling. We also identified outliers with more than 20 shared strains between pre-FMT and donor samples while being from two supposedly unrelated individuals (n = 2 cases; Supplementary Fig. 15), most probably not representing true recipient–donor pairs. The third outlier with more than 20 shared strains was coming from a dataset using both related and unrelated donors, but the Bray–Curtis dissimilarity between the donor and pre-FMT samples was close to zero (Bray–Curtis = 0.019) suggesting they are the same biological sample and confirming the mislabeling. Finally, we excluded the ZouM_2019 cohort from the analysis because strain-sharing sample clustering was heavily discordant from the grouping of FMT triads according to the metadata (Extended Data Fig. 1) and ZouM_2019 was the only dataset with a median of only one strain shared between post-FMT and donor samples (Supplementary Fig. 16), further suggesting systematic errors in the metadata.
Inferring donor subject grouping
In three cohorts (BarYosephH_2020, DammanC_2015 and LeoS_2020) some donors provided stool material to multiple recipients, but we could not solve which donor samples were transferred to which patients, either from the metadata or through private correspondence with the authors. Therefore, we inferred grouping of donor samples into subjects using strain sharing: donor samples sharing more than 15 strains were grouped into one subject. This threshold allows confident matching of samples from the same subject, since unrelated samples very rarely share more than five strains (0.08% of pairs of samples), whereas longitudinal post-FMT samples frequently share more than 15 (56.8% of pairs of samples; Supplementary Fig. 17) as also reported elsewhere22. Indeed, in these three datasets samples from the same assigned donor always shared at least 15 strains, while this was never observed among samples from different donor individuals.
Inferring donor–recipient matching
Donor–recipient matching was unavailable for DammanC_2015 and we were unable to obtain it through private correspondence with the authors. However, as at least one post-FMT sample of a recipient always shared eight or more strains with one donor subject, while no post-FMT samples of the same recipient shared eight or more strains with any other donor subject (Supplementary Fig. 18), we used the criterion of sharing eight or more strains to infer donor–recipient matching in the dataset.
Definition of FMT triads
We considered only complete FMT triads, that is, sets of at least one sample from the recipient pre-FMT, at least one from the donor, and at least one from the recipient post-FMT. In case of multiple sequential FMT transplants, we included only the first one. In case of multiple pre-FMT samples, we used the one collected closest to the FMT. When multiple donor samples were available and there was no indication of which one was used, we picked one randomly since donor samples from the same individual are reasonably stable in terms of species-level composition and strain identity8,22 (Supplementary Fig. 19). Finally, when multiple post-FMT samples were available, we picked the one closest to 30 days post-FMT, which is the value that minimizes the sum of absolute deviations of timepoints (Supplementary Fig. 1). Where there was more than one round of treatment, we considered only those post-FMT samples that were taken before the second treatment round.
Assessing strain sharing, retention and engraftment
We defined strain-sharing rates as the total number of shared strains between two samples divided by the number of species profiled by StrainPhlAn in common between the two samples. To quantify the fraction of post-FMT strains that were already present pre-FMT or that are shared with the donor, we defined the fraction of retained strains as the fraction of post-FMT strains shared with pre-FMT (shared strains between post-FMT and pre-FMT divided by the number of strains profiled at post-FMT) and the fraction of donor strains as the fraction of post-FMT strains shared with the donor (shared strains between post-FMT and donor divided by the number of strains profiled at post-FMT).
Next, we determined the number of engrafted strains as the (absolute) number of shared strains between post-FMT and the donor excluding the strains shared between pre-FMT and the donor samples. In this context we defined four categories that describe the relationship between donor- and recipient individuals (Fig. 1e). ‘Related’: individuals are genetically related or cohabiting/friends; ‘unrelated’: individuals are neither genetically related nor cohabiting/friends as stated in the study manuscript, recruited through public advertisement or hospital’s cohorts; ‘mixed’: only some of the individuals are genetically related or cohabiting/friends; ‘unknown’: the relation of donors to recipients was not stated in the manuscript or metadata. The number of strains that could engraft is defined as the number of cases in which StrainPhlAn can profile the strain in the donor sample while excluding both the shared strains between pre-FMT and donor and the cases where the species is present in the post-FMT, but no strain is profiled by StrainPhlAn (as in these cases it is not possible to determine the strain identity). Finally the strain engraftment rate was defined as the number of engrafted strains divided by the number of strains that could engraft. This measure was computed for each FMT triad (by aggregating over species) and also for each species (by aggregating over FMT triads). In the latter case, only species with at least 15 FMT triads from at least four datasets in which the strain could engraft were included in the analyses.
Visualization and ordinations of strain sharing in cohorts
To visualize strain sharing in datasets, we computed networks as well as t-SNE plots based on the number of shared strains between pairs of samples. Unsupervised networks were visualized using the igraph package in R (v.1.2.6)82 with the Fruchterman–Reingold layout algorithm with squared edge weights, with edges being the number of shared strains and nodes representing samples. Only edges with more than one shared strain are shown. The t-SNE plot was generated using the scikit-learn package83 in Python (v.1.0.2) with perplexity set to 20 and remaining parameters left default.
Comparing strain- and species-level β-diversities for FMT triad clustering
To compare how well strain- and species-level information allow clustering of samples from the same FMT triads, we performed K-medoids clustering with partitioning around medoids (PAM) algorithm implemented in scikit-learn-extra Python package (v.0.2.0) using strain sharing rates dissimilarities (defined as 1 – strain sharing rate) as compared with Aitchison distance and Bray–Curtis dissimilarity (on untransformed data, after arcsine square root transformation and after logit transformation). In case of Aitchison distance, the zeros were replaced by the per taxon minimal nonzero abundance and in case of logit transformation the zeros were replaced by the half of the minimal nonzero abundance globally. Clustering quality was assessed using the clustering purity, which is defined as the fraction of samples that belong to the majority class in their respective cluster. When calculating the purity of FMT triads with shared donor samples (donor samples having been administered to several recipients), we treated the single sample as multiple samples, each belonging to one of the associated FMT triads. In this way the association was considered pure if the donor sample was clustered with any of the triads it belongs to.
Prevalence of the SGBs across different human body sites
We profiled 9,900 healthy human microbiome samples from 59 datasets spanning different body sites (airways, gastrointestinal tract, oral, skin and urogenital tract; Supplementary Table 11) using MetaPhlAn 4 (ref. 23,39) with default parameters and the custom SGB database (see above). Only individuals older than 3 years and from cohorts involving industrialized nonrural populations (defined as ‘Westernized’ in cMD78) were considered. Age, lifestyle and disease status were considered as reported in cMD78.
Annotation of SGB phenotypic traits
SGB phenotypes were predicted using Traitar (v.1.1.12)62 on the genes present in 50% of genomes available for each SGB in the custom SGB database. Only annotations for which the phypat and the phypat + PGL classifiers predictions were in agreement were used.
Total strain-sharing variance explained by FMT triad membership (Fig. 1a) was assessed by PERMANOVA on strain-sharing-based dissimilarities using the adonis function in the vegan package in R (v.2.5–7)84. Dissimilarities were computed within each dataset as 1 – (n/M), where n is the number of shared strains and M is the maximum of the number of shared strains.
To compare differences between median strain sharing or engraftment measures (Figs. 1e and 2a,b) in two groups of datasets against the null distribution, permutation tests were applied by randomly permuting the assignments between labels and dataset identifiers 9,999 times.
LOESS fit in Fig. 4d was computed using the geom_smooth function from the ggplot2 (v.3.3.5) in R with standard parameters.
To compare median strain-sharing rates between triads in which the FMT procedure was clinically defined as ‘successful’ and those in which was clinically ‘unsuccessful’ (see above) (Fig. 2c), we applied four statistical tests. First, we used a permutation test applied by randomly permuting the success labels within each dataset 9,999 times. Second, we fitted a linear mixed model predicting strain engraftment rate with the clinical success as an indicator variable and the dataset identifier as a random effect using the R package lme4 (ref. 85); the significance was assessed by performing a likelihood-ratio test against a null model without the success indicator variable. Third, we computed median strain sharing rates of successful and unsuccessful groups within each dataset and compared the medians of the successful group with the unsuccessful groups with the Wilcoxon signed-rank test as implemented in the SciPy package86 (v.1.7.3) in Python. Correction for multiple testing (Benjamini–Hochberg procedure, Q) was applied when appropriate with significance defined at Q < 0.1.
A multivariate analysis was performed to assess associations between strain engraftment rates and clinical/nonclinical variables. We included both covariates describing the clinical process, the recipient’s and donor’s microbiomes, and experimental variables consistently available across studies: antibiotics intake (that is, intake close to FMT treatment, intake as a FMT pretreatment or no antibiotic intake); whether the FMT was done to treat an infectious or noninfectious disease; administration of fresh or frozen stool; the amount of feces administered (in grams); the route of FMT administration categorized in ‘upper GI’ routes (capsules, enteroscopy, nasogastric tube, nasoduodenal tube, upper endoscopy, PEG), ‘lower GI’ routes (colonoscopy) and ‘mixed’ routes (FMT protocols utilizing both upper and lower routes for the same recipient); recipient’s age (in years); recipient’s and donor’s α-diversity (Shannon index on species-level abundances); the Bray–Curtis β-diversity and strain-sharing rate between recipient pre-FMT and donor; usage of bead-beating steps for DNA extraction; broad geographic regions based on the recipient’s lifestyle and diet (Mediterranean consisting of Israel, Italy and France87; North America consisting of the United States and Canada; Central and Northern Europe consisting of Norway, the Netherlands and Germany; and China). Categorical variables were converted to sets of binary variables, one per each category level (one-hot encoding). All variables were standardized by subtracting the mean and dividing by the s.d.
Since many variables in the analysis are correlated with each other (Supplementary Fig. 6), we performed partial least squares decomposition, which is well-suited for multicollinear data, where the standard linear models are inappropriate. We used the PLSRegression class with parameter scale=False from the scikit-learn83 Python library (v.1.0.2). The coefficients for each variable composing each component were retrieved through the x_weights_ parameter and the transformed data matrix through the x_scores_ variable returned from the fit_transform method. We regressed each component separately on the strain engraftment rate with ordinary least squares. The first two components were explaining the most the strain engraftment rate and were the only ones significantly associated with it (R2 = 0.187, Q = 6 × 10–10 and R2 = 0.046, Q = 3.8 × 10–3 for the first and second component, respectively; Extended Data Fig. 5). We assessed the association of the variables with the components by hierarchical bootstrap, that is, by resampling the datasets and for each dataset resampling the FMT triads and the associated variables. By resampling the data matrix this way and repeating the PLS decomposition (9,999 iterations) we obtained an estimate of empirical distribution for each weight coefficient.
We used an ML modeling approach to predict the taxonomic composition (presence/absence and relative abundance) of the post-FMT microbiome. To this end, we first organized the data such that each datapoint represented a species in a specific FMT triad. We did not consider species absent in both recipient pre-FMT and donor. As features associated with each datapoint we used information specific to each FMT triad (Jaccard distances and Bray–Curtis dissimilarities between pre-FMT and donor samples as estimates for their microbiome compositional similarity, ratio of pre-FMT and donor species abundances, time between FMT and sample collection), species relative abundances for all samples (abundances in the post-FMT were treated as the dependent variables), and Shannon entropy values for pre-FMT and donor samples, information about species (taxonomy, prevalence in an unrelated set of metagenomic samples23) and cohort-specific information (dataset, disease infectivity).
We trained RF models88 both in a LODO as well as in a fivefold CV fashion. In the CV setting, we repeated the entire training/evaluation with five resamplings and averaged the prediction probabilities. To avoid overestimating model performance, we omitted species that were absent in both pre-FMT and donor samples in the evaluation step since those are easy to predict (Fig. 4a,b). Training and evaluation of RF models was done using the classif.ranger learner (for the presence/absence classifier) and regr.ranger (for the relative abundance regressor) from the mlr3 package (v.0.10) in R89 with parameter importance = ‘permutation’. We used the unbiased AUROC metric to evaluate the performance of the presence/absence classifier. Feature importance values were obtained directly from the trained RF regression model. Reported AUROC values were calculated per FMT triad and correspond to the AUROC of the predicted post-FMT species against the species actually detected in the post-FMT sample.
The pre-FMT/donor exchange simulations are based on the idea that we can exchange the real pre-FMT/donor individuals with others (from different FMT triads) in silico and then predict and analyze the post-FMT microbiome of these artificial triads. (Fig. 4c,d). Here, we chose random pre-FMT/donor samples from a different FMT triad of the same dataset and exchanged all associated features. We ensured that donor samples came from a different FMT triad and from a different donor individual (since some donor individuals donated stool to more than one FMT triad). In these experiments, we only considered datasets with at least three donors.
To evaluate the ability of the presence/absence classifier to predict continuous post-FMT microbiome traits (Fig. 4e,f,h,i), we computed the predicted species richness of certain groups of bacteria (richness, proteobacterial richness, Firmicutes richness, Bacteroidetes richness, PREDICT 1 species richness (Supplementary Table 14), richness of oral bacterial (Supplementary Table 13). We summed up raw prediction probabilities to estimate richness values. Similarly, for the evaluation of the abundance regressor, we computed the predicted cumulative abundance of the same groups of bacteria described above.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.