Long-term life history predicts current gut microbiome in a population-based cohort study

Study cohort
The study protocol was approved by the ethics committees of Bolzano and Verona by Comitato Etico della Azienda Sanitaria dell’ Alto Adige, Provincia Autonoma di Bolzano, and conformed to the Declaration of Helsinki. Fecal samples were collected in the Bruneck Study, a prospective population-based study on the epidemiology and pathogenesis of atherosclerosis launched in 1990 in Bruneck in northwest Italy6. Bruneck is an urban area located in an alpine region in northern Italy (South Tyrol). The genetic background of the population is heterogeneous, with sizable segments of the population having either a northern Italian or Austro-German background. Population mobility within the Bruneck area is low, at 0.2% per year at the time of the study. The official population register contains information obtained from the national census and is continuously updated regarding births, deaths and changes of residence. The study population was recruited as a sex- and age-stratified random sample of all inhabitants aged 40–79 years (125 men and women each in the fifth to eighth decade of life) selected using a computer-based random number generator. The baseline examination was performed in 1990 (July to November) with follow-up evaluations at 5-year intervals. Of the total sample, 93.6% participated in the baseline examination. The study population was white. The study has extensive metadata on all individuals since 1990 with comprehensive evaluations every 5 years up to 2016. The 2016 evaluation was performed with a 6-month delay in the spring of 2016 rather than in the autumn of 2015 (as usual) due to delays in ethics approval. All study participants provided written informed consent. Stool samples (n = 325) were collected at the most recent time point during the 2016 evaluation when study participants were 65 to 98 years old. Metadata collected include anthropometric information, each individual’s physician-confirmed medical history and diseases, food intake, lifestyle, vascular risk factors, medication and laboratory parameters4,22,23,24,25,26. In the survey area, virtually all inhabitants are referred to one local hospital that works closely with the general practitioners, which allows retrieval of full medical information. Accordingly, in this study, information on clinical diseases (current and past) and morbidities as well as medication does not rely on the participant’s self-report but was validated by medical records and based on standard diagnostic criteria.
Dietary intake was evaluated by quinquennial (1995, 2000, 2005, 2010 and 2015) dietician-administered 118-item food-frequency questionnaires (FFQs) based on the gold-standard FFQ by Willett and Stampfer27 and adapted to the dietary peculiarities in the survey area22,26. Dieticians made use of illustrative photos of foods when exploring aphasic patients and of information provided by spouses, caregivers and nursing homes. For each item in the FFQ, a common unit or portion size was specified, and we instructed participants to customize how often on average they had consumed that amount in the past years. The nine response categories ranged from ‘never’ to ‘six or more times a day’. We calculated nutritional intake by assigning a weight proportional to the frequency of use for each food (once per day equals a weight of one), multiplying this weight by the nutrient value for the specified size and summing the contribution of all foods. Nutrient composition data for foods were based on the US Department of Agriculture Nutrient Database (Release 23) (US Department of Agriculture, Agricultural Research Service, 2010, USDA National Nutrient Database for Standard Reference, Release 23; http://www.ars.usda.gov/ba/bhnrc/ndl). We dissected complex foods into component foods using common recipes. Estimates of nutrient intakes were calorie adjusted. For this purpose, we used the residuals obtained by regressing polyamine or other nutrient intake on total energy intake26,28. The reproducibility and validity of the original FFQ are well documented27 and extend to its application in the Bruneck Study, in which it was compared against 9-d diet records22,26. The Alternative Healthy Eating Index (AHEI), a measure of diet quality, significantly associated with the risk of major chronic diseases in a large number of studies, was calculated as described previously29. We did not consider the ‘duration of multivitamin use component’ because multivitamin supplementation was almost absent in our cohort. Accordingly, this index has eight components in our study (vegetable score, fruit score, cereal fiber score, alcohol score, meat ratio score, nuts and soy score, trans-fat score, polyunsaturated-to-saturated fatty acids ratio)29. Physical activity was quantified using the Baecke questionnaire30 and the Adult Compendium of Physical Activities to rate activity intensities, and the average metabolic-equivalent hours per week were calculated using these results (overall and separated into sports and non-sport physical activity). Individuals were coded as current smokers or non-smokers (including former smokers) with assessment of pack-years of smoking25. Alcohol intake was quantified in grams per day. BMI was calculated as weight in kilograms divided by height squared in meters. Systolic and diastolic blood pressure measures were taken after the participant had been sitting for at least 10 min, and the mean of three independent measurements was calculated. Hypertension was defined as systolic blood pressure ≥140 mm Hg, or diastolic blood pressure ≥90 mm Hg or the use of antihypertensive drugs. Socioeconomic status was defined on a three-category scale (low, medium and high) based on information about the occupational status and educational level of the person with the highest income in the household. Blood samples were taken in the morning hours after an overnight fast and 12 h of abstinence from smoking and immediately processed or stored at −70 °C. Diabetes mellitus was diagnosed when fasting plasma glucose exceeded 126 mg dl-1 or when participants were on antidiabetic medication. Laboratory parameters were assessed by standard methods in certified laboratories as detailed previously4,22,23,24,25,26. All study participants underwent ultrasound and transient elastography (Fibroscan, Echosens) examination to evaluate hepatic steatosis and liver stiffness. Of 325 individuals, 20 were excluded because of missing data for laboratory parameters, liver stiffness, stool features and visceral fat thickness. Variables with missing data for fewer than five individuals were replaced by the cohort mean or data were otherwise removed throughout the analysis (variables removed: muscle mass (%), metabolic rate, Bristol stool score, and fat mass (kg)). The FGFP cohort used in the present study (n = 2,215) is an expanded version of the first round of sampling completed in 2014 (n = 1,106)1,31.
DNA extraction and sequencing
Fecal DNA extraction and sequencing were performed as described previously1. Briefly, DNA was extracted from 150–200 mg of the frozen samples using the MagAttract PowerMicrobiome DNA/RNA KF kit (QIAGEN) following the manufacturer’s instructions. The V4 region of 16 S rRNA genes was amplified using the 515 F/806 R primer pair and purified using the QIAquick PCR Purification Kit. Sequencing was performed using the Illumina MiSeq platform (MiSeq Reagent Kit v2) and HiSeq 2500 system (151bp paired-end reads) for the Bruneck Study and the FGFP cohorts, respectively.
Microbial load measurement by flow cytometry
Microbial load of the study cohort was measured as described previously7. Briefly, 200–250 mg of frozen (−80 °C) fecal aliquots was diluted in saline solution (0.85% NaCl; VWR International) and filtered using a sterile syringe filter (a pore size of 5 µm; Sartorius Stedim Biotech). Next, 1 ml of the microbial cell suspension obtained was stained with 1 µl of SYBR Green I (1:100 dilution in DMSO; Thermo Fisher Scientific) and incubated for 15 min in the dark at 37 °C. The flow cytometry analysis was performed using a C6 Accuri flow cytometer (BD Biosciences) according to Prest et al.11. Fluorescence events were monitored using the FL1 533/30-nm and FL3 > 670-nm optical detectors. The BD Accuri CFlow software was used to gate and separate the microbial fluorescence events on the FL1/FL3 density plot from the fecal sample background. A threshold value of 2,000 was applied on the FL1 channel. Based on the exact weight of the aliquots analyzed, cell counts were converted to microbial loads per gram of fecal material.
Relative and quantitative microbiome profiling
After demultiplexing with LotuS v1.565 (ref. 32), fastq sequences were further processed following the DADA2 microbiome pipeline33. Briefly, sequence reads were first filtered and trimmed with the following parameters: truncQ=11, truncLen=c(130,200) and trimLeft=c(30, 30). Filtered reads were denoised using the DADA2 algorithm, which infers the sequencing errors. After removing chimeras, an amplicon sequence variant table was constructed, and taxonomy was assigned using the Ribosomal Database Project (RDP) classifier implemented in DADA2 (RDP trainset 16/release 11.5). The ELDERMET cohort data (n = 752) were obtained from the Sequence Read Archive under study accession number PRJNA283106. The dataset was processed using the same DADA2 pipeline following the recommendations for 454 sequencing technology and using the following filtering and trimming parameters: trimLeft=c(15) and truncLen=c(200). For the diversity analysis, we only included community-dwelling individuals and the first time point (n = 153).
To prepare the QMP table, the relative microbiome profiling (RMP) taxonomic table was then corrected for copy number and rarefied to even sampling depth by dividing the sequencing depth by the cell count and was subsequently multiplied by bacterial cell load to quantify the number of bacteria per gram of fecal sample as previously described in ref. 9. One participant was further excluded due to low read counts during the data conversion. Using this approach, the sequencing data became proportional to the microbial loads in the samples. All analysis was performed based on QMP unless otherwise noted.
Fecal moisture content
Moisture content was determined as the percentage of mass loss after lyophilization from 200–300 mg of frozen aliquots of non-homogenized fecal material (−80 °C). Lyophilization was performed for 2 d.
Fecal calprotectin measurement
Fecal calprotectin concentrations were determined using the fCAL ELISA kit (Bühlmann) on frozen fecal material (−80 °C). The level of calprotectin was corrected for the amount of fecal samples used.
Microbiome and statistical analysis
Statistical and microbiome analyses were performed in R (version 3.6.0)34 using the phyloseq35, vegan36, pairwiseAdonis37, rcompanion38, CoDaSeq39, DirichletMultinomial40, lm.beta41 and ppcor42 packages. Past lifestyle and dietary patterns were tested by autocorrelation (function ‘acf’) and a linear mixed model followed by the likelihood-ratio test:
$$begin{array}{l}{{{mathrm{Null}}}},{{{mathrm{model}}}}:{{{mathrm{dietary}}}},{{{mathrm{habit}}}},{{{mathrm{or}}}},{{{mathrm{lifestyle}}}}sim left( {1|{{{mathrm{participant}}}}} right) {{{mathrm{Alternative}}}},{{{mathrm{model}}}}:{{{mathrm{dietary}}}},{{{mathrm{habit}}}},{{{mathrm{or}}}},{{{mathrm{lifestyle}}}}sim {{{mathrm{time}}}} + left( {1|{{{mathrm{participant}}}}} right)end{array}$$
For the microbiota associations with any host parameters, taxa found in less than 20% of the population were excluded for noise reduction and alleviation of multiple-testing correction. Comparison of two groups was performed using the Wilcoxon rank-sum test, and Kruskal–Wallis test was used when analyzing more than two groups followed by post hoc Dunn’s test. Count data were analyzed by Fisher’s exact test. Taxonomic associations with host parameters were determined by partial correlation to adjust for confounders using the R package ppcor43. All statistical tests used were two sided. All statistical tests were followed by multiple-testing correction using the Benjamini–Hochberg method when testing more than two features. Data distribution was assumed to be normal, but if this was not the case, nonparametric testing or data transformation was applied.
Analysis of community variations using the current and past variables
The explanatory power of cohort covariates and their combined effect size for the microbial community variation was evaluated as described previously1. Briefly, distance-based RDA (db-RDA) was performed on the genus level using the Bray–Curtis dissimilarity as implemented in vegan36. Covariates (FDR < 0.1) found in this step were entered for forward stepwise model selection to measure their cumulative effect sizes. Before the analysis, the collinearity of variables was assessed by using Spearman’s rank correlation and the Wilcoxon rank-sum test for continuous and binary variables, respectively. One of the collinear variables was removed based on its representativeness and the explanatory power of its effect size of > |0.8| (Supplementary Table 20). To assess the effect of past events or host parameter shifts on the current microbiome variation, different approaches were performed for continuous and binary variables (infection, medication and smoking). For continuous variables, variable shifts between each time point and the year 2016 were calculated by subtracting the values. History of the categorical binary variables was determined by summing the event that occurred between the two time points. Smoking was taken as smoking history if the individuals were current smokers at the time point. Comparison of past and present nonredundant effect size was performed by likelihood-ratio test.
Associations of the past with the current microbiome
Enterotyping based on the DMM approach was performed as described by Holmes et al.43 on a genus-abundance RMP matrix using the R package DirichletMultinomial41 and the FGFP cohort (n = 2,215) as a background dataset. Evaluation of model fit was performed using the Bayesian information criterion (BIC) where the best model fit was found at four Dirichlet components. Taxonomic association analysis after adjusting for age and stool moisture was performed by fitting a GLM (link = logit). Beta-blocker treatment and hemoglobin clusters were used as binary dependent variables and genera were used as independent variables. Standardized β coefficients were calculated using the R package lm.beta41. Significant associations of deconfounded genera with the host parameters were tested by performing likelihood-ratio tests. Clustering of individuals was carried out by categorizing them as high or low based on the median values measured in the first time point. Multiple linear regression was performed on non-sport physical activity, hemoglobin and alanine transaminase, regressing out the effect of age, sex and BMI. Before the regression, physical activity and alanine transaminase were transformed by inverse normal transformation to fit a normal distribution.
Prediction of the current microbiome based on life history
To construct a microbiome prediction model, a random forest classifier (R package caret44) was trained by setting the historical metadata as the predictor variables and the enterotype as the response variable. Here, the historical covariates were corrected for time effects by retrieving residuals from autocorrelative models (that is, dependent variables ~ year) for each individual. Enterotype prediction was carried out for each time point and all years together to determine the most predictive variables regardless of the time points. We followed a nested cross-validation approximation, which includes data balancing, feature selection and hyperparameter optimization to eliminate redundant variables, simplify the model and improve the model’s performance. The outer loop was subjected to 40 rounds of k-fold cross-validation, while the inner loop was subjected to 5 rounds. Splitting the training dataset into training and validation datasets allowed for data balancing, feature selection and hyperparameter adjustment in the inner loop (Supplementary Information Fig. 2). The parameters that maximized the Matthews correlation coefficient (MCC), using function ‘mcc’ (R package mltools)45, and AUC values, using function ‘roc’ (R package pROC)46, were selected to train and test on the 40 partitions of the outer loop.
Data balancing
Due to the dataset’s imbalance property with the Prevotella enterotype showing the largest imbalance, a permuted covariate may be as good a predictor as the true historical data when one or more classes have meager proportions compared to the other classes. The enterotype distribution in the Bruneck cohort (B1 = 34.4%, B2 = 24.6%, P = 13.44% and R = 27.5%) was in the range of moderately (7:3) to highly (8:2 or 9:1) imbalanced. Enterotype data balancing was carried out using the synthetic minority over-sampling technique (SMOTE47) (R package DMwR48), the function ‘ROSE’ (R package ROSE49) and the down- and upsampling methods (R package caret44). To avoid overfitting due to the small sample proportion in the training partition, downsampling was skipped when the sample size was less than that in the first quartile (77 samples). As a result, four datasets were created, each of which was balanced independently. These datasets were further used for feature selection and hyperparameter tuning.
Feature selection
Feature selection was performed using recursive feature elimination (RFE)44. The RFE algorithm performs iterative modeling for feature selection. At each iteration, the top-ranked predictors are retained, and the model is reevaluated, with the best model being determined by the highest accuracy. This analysis was carried out using the function ‘rfe’ in the caret R package with the following parameters: functions=rfFuncs, method=‘cv’, metric=‘kappa’ and Number=10. The number of features selected from each iteration was set to be selected from one-quarter of the available covariates in the dataset. Cohen’s kappa metric was used as a selection criterion, given that it has a better performance than the accuracy score in imbalanced datasets. Feature selection was performed for each of the four previously balanced datasets (Supplementary Information Fig. 2).
Hyperparameter optimization
Once feature selection was performed, a random forest classifier was implemented for each of the four previously balanced datasets with its respective selected feature. Each model was tuned using a grid search optimization strategy (mtry=1:15 and ntree=1000 with 10 repetitions) using the caret R package44 functions ‘trainControl’ and ‘train’. The optimal parameters were the ones that had an AUC of > 0.7 and maximized the MCC, an index for an imbalanced dataset that incorporates all information from the confusion matrix.
Model performance
The model’s performance was assessed by applying the best parameters and features for each round to the remaining 40 rounds (functions ‘trainControl’, ‘train’ and ‘predict’; R package caret44). The model parameters and features that maximized the AUC and the mean MCC of the 40 rounds of k-fold cross-validation were selected as the best model. Random forest feature importance was estimated using the mean decrease in accuracy implemented in the caret package44 (function ‘varImp’).
Assessment of the effect of additional features
To verify that the prediction based on covariates with data for all years was not solely due to increasing the size of the feature pool from which the model could select, we evaluated the effect of additional data features by adding an increasing numbers of randomly selected additional features (10%, 25%, 50%, 75% and 100% of the entire historical dataset) to the 2016 data. By comparing the mean AUCs of prediction models with and without feature selection, we observed that the additional number of features was not associated with greater prediction power (Spearman’s correlation, P > 0.05 (Extended Data Fig. 3a,b); of note, with feature selection, a smaller number of features enters the model compared to the initial input). AUC values were significantly improved with the feature selection approach, even with a lower number of features entering the model compared to the one without the feature selection (Wilcoxon rank-sum test, P < 0.0001; Extended Data Fig. 3c). Within the feature selection prediction models, a greater number of initial input features did not significantly increase the number of features entering into the prediction model (Spearman’s rho = 0.048; P = 0.771; Extended Data Fig. 3d).
Statistics and reproducibility
We used all survival data from the Bruneck cohort since its inception in 1990; therefore, no statistical method was used to predetermine the sample size. Of 325 individuals, 20 were excluded due to missing data for laboratory parameters such as liver stiffness, stool features and visceral fat thickness. Missing data less than five was replaced by the cohort mean or otherwise removed throughout the analysis. To verify that the improvement in explanatory power was not due to an extra number of data features, we carried out the prediction analysis with randomly permuted historical covariates. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.