Clarifying variance: moderate cross‑cohort batch effects after harmonized processing
We identified 15 cohorts across 11 studies (Supplementary Table S5) that met the inclusion criteria, which required a focus on melanoma patients undergoing ICI therapy targeting PD-1 and/or CTLA-4 and/or FMT, the use of gut microbiome profiling through shotgun metagenomics sequencing, and the availability of clinical response data stratified by responders and non-responders. Because we uniformly reprocessed the raw reads to derive both taxonomic and functional profiles, author‑provided profiles were not required and, when available, were used only for quality control and sensitivity checks. Despite meeting these criteria, the included studies exhibited variability in DNA extraction toolkits, which can significantly impact downstream analyses due to differences in DNA yield33,34.
To assess inter-study variability, we examined beta diversity (abundance‑based, non‑binary), which measures differences in microbial composition between samples and evaluated the similarity of microbial profiles across studies (Methods). Variability in microbial composition can arise from differences in patient populations, sample processing methods, sequencing techniques, or underlying biological heterogeneity, all of which can introduce batch effects that obscure true biological signals. By analyzing beta diversity metrics, we aimed to determine the extent of divergence between studies and identify clusters of studies with similar microbial communities. We applied this analysis to all 15 cohorts, including both ICI-only and ICI + FMT studies, and performed batch-effect correction for each stratum and for all available omics layers. Using the ICI-only taxonomy profiles as an example, studies such as ‘Spencer’, ‘Gop’, and ‘Matson’ clustered closely together before correction, suggesting high similarity in microbial composition, whereas ‘Glitza_ici’ and ‘Simpson’ formed a separate cluster, indicative of greater dissimilarity (Fig. 2a, left). The hierarchical clustering of the eight studies after batch effect correction demonstrates a reduction in inter-study dissimilarity, as evidenced by lower dissimilarity values (Fig. 2a, right).
Fig. 2: Beta diversity clustering and the effect of batch correction.The alternative text for this image may have been generated using AI.
a Hierarchical clustering of eight studies (across 12 cohorts) before(left) and after(right) batch effect correction, respectively. Before batch effect correction, distinct groupings influenced by batch effects were shown. The clustering pattern suggests that study-specific biases contribute significantly to the observed dissimilarities. After batch effect correction, clustering of the same studies after batch effect correction reveals a reduction in batch effects, with studies clustering more closely together. However, some degree of dissimilarity remains, suggesting that while the correction mitigates batch effects, it does not entirely eliminate them. Residual batch effects may still influence clustering patterns, likely due to inherent study-specific differences that persist despite correction. b Principal component analysis (PCA) of species-level profiles colored by study69. Pre-correction (Left), the data points cluster by study batch, showing greater batch-related variation. Post-correction (Right), batch effects are reduced, and the studies align more closely while still retaining meaningful biological signals.
Nevertheless, some study-specific structure persisted, with cohorts such as ‘Lee’ and ‘McCulloch’ still forming distinct clusters. This residual clustering suggests that, while the correction substantially mitigated inter-study variability, it did not eliminate all batch effects. The persistence of these patterns may reflect unmeasured confounders, experimental differences, or inherent biological variability between studies, highlighting the difficulty of fully normalizing batch effects in complex multi-cohort datasets and underscoring the need for cautious interpretation of inter-study comparisons even after correction. The differences in microbial community profiles across studies emphasized the need for subsequent batch (study) effect correction to mitigate potential inter-study variability. To this end, we applied the MMUPHin workflow and quantified the variance explained by batch effects by permutational multivariate analysis of variance (PERMANOVA). Continuing with the ICI-only taxonomy profiling example, PERMANOVA results before batch-effect correction (Fig. 2b, left) showed that dataset grouping explained 12.148% of the variance (R² = 0.12148, p = 0.001), indicating a moderate influence of study-specific effects. After batch-effect correction, the variance explained by dataset grouping decreased to 4.261% (R² = 0.04261, p = 0.001). This reduction demonstrates that batch correction effectively minimized inter-study variability, although a small residual effect remains. The corresponding increase in residual variance (from 87.852% to 95.739%) further supports the conclusion that most of the remaining variability is now attributable to biological or experimental noise rather than systematic batch effects. This reduction in cohort-driven variability highlights the success of the batch correction procedure in minimizing study-specific biases, allowing for a more accurate assessment of the underlying biological signals.
Species-level signatures differ between ICI and ICI + FMT cohorts
For species-level taxonomic profiling, we stratified the studies into ICI-only (k = 8) and ICI + FMT (k = 3) groups and fitted a compound Poisson linear model (CPLM) using the MMUPHin lm_meta framework to associate species abundance with response status (responders (R) vs. non-responders (NR)). In the ICI-only stratum, study was included as a random effect, whereas in the ICI + FMT stratum both study and patient were included as random effects. Species were considered significant at pval < 0.05 with low-to-moderate heterogeneity (I² < 50%, except for Holdemanella porci is 72% and Erysipelatoclostridium ramosum is 55%), and those additionally passing FDR correction (q < 0.25) are marked with asterisks in Fig. 3a. For the ICI-only stratum, all significant species are displayed; for the ICI + FMT stratum, which yielded many associations, we show the eight most responder-enriched and eight most non‑responder–enriched species ranked by standardized effect size (z = coef / s.e.).
Fig. 3: Taxonomic correlates of response across ICI and ICI + FMT cohorts.The alternative text for this image may have been generated using AI.
a Standardized coefficients (z = coefficient / s.e.) for species associated with clinical response in ICI‑only cohorts (left, k = 8 studies) and ICI + FMT cohorts (right, k = 3 studies). Bars to the right of zero indicate enrichment in responders and bars to the left enrichment in non‑responders. Blue and salmon bars correspond to ICI‑only species favouring responders and non‑responders, respectively; teal and mustard bars show the same for ICI + FMT. The lower panel highlights species that overlap between the two meta‑analyses. b Longitudinal relative abundance of three overlapping species, including GGB9501 SGB14898, Pseudoflavonifractor capillosus and Clostridiaceae bacterium Marseille Q4143, in the Baruch, Davar and Bertrand ICI + FMT trials. Lines show mean relative abundance (%) in responders (blue) and non‑responders (salmon), with shaded bands indicating 95% confidence intervals. Davar is restricted to days 1-110, and Bertrand is summarized by treatment stages S1-S4. Legend entries (e.g. Responder (2 out of 3)) report the number of patients with detectable abundance of the species out of the total number of responders or non‑responders in each trial.
In the ICI‑only cohorts, several short‑chain-fatty‑acid-producing Clostridiales displayed positive standardized coefficients, indicating an association with response, including Roseburia sp., Dorea formicigenerans, and an unclassified Clostridiales bacterium (Fig. 3a, left). These taxa are broadly consistent with previous reports linking Dorea formicigenerans and other Lachnospiraceae species to improved outcomes or longer progression‑free survival in patients treated with PD‑1–based regimens35. In contrast, Hungatella hathewayi, Clostridium innocuum, and Erysipelatoclostridium ramosum showed negative standardized coefficients, indicating an association with non‑response; these taxa have been linked to dysbiosis, antibiotic exposure, and poorer ICI outcomes in other settings36. Together, these results suggest that, in ICI‑only therapy, gut communities characterized by higher modeled abundance of butyrate‑producing commensals and lower modeled abundance of potentially pathobiontic Firmicutes may favor clinical response.
The ICI + FMT stratum showed a distinct pattern (Fig. 3a, right). Among the top responder-associated taxa were multiple Alistipes and Bacteroides species, Holdemania filiformis, and several unclassified SGBs, many of which also belong to the Clostridiales and Bacteroidales orders. Notably, Holdemania filiformis has repeatedly been reported as enriched in responders to combined CTLA‑4/PD‑1 blockade, consistent with our findings in the FMT‑augmented setting35. Non‑responders in the ICI + FMT cohorts, by contrast, showed higher abundances of Lachnospira eligens, Bifidobacterium pseudocatenulatum, Anaerobutyricum hallii, Coprococcus eutactus, and related Firmicutes. Intriguingly, several of these taxa, including B. pseudocatenulatum and Coprococcus eutactus, have been associated with favorable responses or prolonged progression‑free survival in previous ICI-only studies and reviews6, yet in our ICI + FMT meta-analysis, they were linked to non‑response, highlighting that the same species can behave differently depending on treatment context and ecological background.
We next examined species that were significant in both strata. Three taxa, GGB9501 SGB14898, Clostridiaceae bacterium Marseille Q4143, and Pseudoflavonifractor capillosus, were shared between the ICI-only and ICI + FMT analyses but exhibited discordant directions of association (Fig. 3a, bottom). GGB9501 SGB14898 and P. capillosus were associated with non‑response in ICI-only cohorts yet favored responders in the ICI + FMT stratum, whereas the Clostridiaceae bacterium showed the opposite pattern. Although these species have been only sparsely discussed in the ICI literature, P. capillosus has been implicated in dietary and metabolic interventions that modulate anti‑tumor immunity, including ketogenic diet-based strategies and experimental approaches to enhance immunotherapy37. Such directionally opposite effects across treatment strata mirror recent longitudinal and multi‑cohort observations where species like Coprococcus eutactus show regimen‑dependent associations with progression‑free survival under monotherapy versus combination checkpoint blockade38.
Overall, these species-level meta-analyses reveal that there is no single universal “good” or “bad” bacterium for ICI response. Instead, overlapping sets of SCFA‑producing Clostridiales, Bacteroidales, and Actinobacteria are differentially associated with outcomes depending on whether patients receive ICI alone or ICI combined with FMT. The observation that several taxa previously linked to favorable ICI responses, such as Bifidobacterium pseudocatenulatum and Coprococcus eutactus, are enriched among non‑responders in the ICI + FMT group underscores the context‑dependent nature of microbiome–immunotherapy interactions and suggests that microbiome‑based interventions may reshape not only community composition but also the functional role of individual species in determining treatment outcome.
To further explore these discordant associations, we examined the longitudinal relative abundance of the three overlapping taxa in the ICI + FMT cohorts (k = 3; Baruch, Davar, and Bertrand) (Fig. 3b). In Baruch and Bertrand, GGB9501 SGB14898 and Pseudoflavonifractor capillosus were rare at baseline but showed a clear increase in relative abundance over time or treatment stage in responders, whereas non‑responders remained at consistently low levels. In Davar, these species also displayed intermittent spikes, primarily in responders. By contrast, Clostridiaceae bacterium Marseille Q4143 tended to be more abundant in non‑responders than responders across time in Baruch and at early time points in Davar, with little separation between groups in Bertrand. These temporal patterns support the meta‑analytic finding that the direction of association for these taxa is reversed between the ICI‑only and ICI + FMT strata, and illustrate how FMT-driven ecological shifts can cause the same species to preferentially expand in either responders or non‑responders depending on treatment context.
Distinct functional pathway signatures of ICI response in ICI‑only versus ICI + FMT cohorts
To assess how clinical and technical covariates shape microbiome composition at different functional levels, we performed PERMANOVA with study included as a blocking factor on taxonomic, biosynthetic gene cluster (BGC), and pathway profiles in the ICI‑only and ICI + FMT strata (Fig. 4a). In the ICI‑only cohorts, geographic area explained the largest fraction of between‑sample variation (up to ~5% at the taxonomic level), followed by age and sequencing instrument, all of which showed significant effects across at least one omics layer. Treatment regimen (anti‑PD‑1 monotherapy vs. PD‑1/CTLA‑4 combination) and sex contributed smaller but still detectable fractions of variance. In the ICI + FMT cohorts, age emerged as the dominant covariate, particularly for pathways, with additional contributions from sex and study area. Although these covariates each explained only a modest proportion of the total variance, their consistent effects across taxa, BGCs, and pathways highlight the importance of accounting for host and technical factors when interpreting functional associations with outcome.
Fig. 4: Metadata effects and pathway-level associations with ICI response in ICI-only and ICI + FMT cohorts.The alternative text for this image may have been generated using AI.
a PERMANOVA analysis of clinical and technical covariates on microbiome composition, stratified by treatment group. Bars show the percentage of variance explained (R²) by each metadata variable (x-axis) for species-, BGC-, and pathway-level Bray–Curtis dissimilarities, with study included as a blocking factor. Statistical significance was assessed using permutational multivariate analysis of variance (PERMANOVA; adonis function in the vegan R package) with 999 permutations. All tests were two-sided. Asterisks indicate covariates with p < 0.05. b Pathway-level meta-analysis of associations with response status. For each stratum (left, ICI-only; right, ICI + FMT), volcano plots display standardized coefficients (coef/SE) from compound Poisson linear models (CPLM; MaAsLin2) on the x-axis and −log10(p-value) on the y-axis. Each point represents a MetaCyc pathway; positive coefficients indicate higher modeled pathway abundance in responders and negative coefficients indicate enrichment in non-responders. P-values were derived from two-sided Wald tests. Multiple comparisons were controlled using the Benjamini–Hochberg false discovery rate (FDR) procedure within each stratum, and pathways passing FDR correction (q < 0.25) are highlighted (green, enriched in responders; red, enriched in non-responders) and labeled with pathway names.
We next asked which microbial pathways were directly associated with immunotherapy response after adjusting for study effects. For each stratum, we fitted CPLMs on pathway abundances within each cohort and combined estimates using meta‑analysis, treating the standardized coefficient (coef/SE) as the effect size; positive values indicate pathways with higher modeled abundance in responders, and negative values indicate enrichment in non‑responders. In the ICI‑only group (Fig. 4b, left), several amino‑acid biosynthesis pathways showed positive associations with response, including L‑arginine biosynthesis II (acetyl cycle), L‑methionine biosynthesis III, and the superpathway of L‑lysine, L‑threonine, and L‑methionine biosynthesis II. In contrast, L‑lysine degradation I and (aminomethyl) phosphonate degradation were significantly depleted in responders (qval.fdr < 0.25). These findings suggest that microbiomes of responders are functionally skewed toward anabolic amino‑acid biosynthesis9,35, whereas non‑responders harbor communities with greater potential for amino‑acid and phosphonate catabolism, consistent with the emerging view that microbial amino‑acid metabolism can modulate host immunometabolism39 and responsiveness to checkpoint blockade40,41.
In the ICI + FMT strata (Fig. 4b, right; Supplementary Fig. S5), a different functional pattern emerged. A pyrimidine deoxyribonucleoside salvage pathway was positively associated with response, indicating an enrichment of nucleotide salvage capacity in responders. Nucleotide metabolism, including pyrimidine salvage, is increasingly recognized as a key regulator of immune cell proliferation and effector function, as well as a vulnerability in cancer cells42. In contrast, several carbohydrate and redox‑linked pathways were negatively associated with response, including 1,5‑anhydrofructose degradation, Entner–Doudoroff pathway I, L‑arginine biosynthesis III (via N‑acetyl‑L‑citrulline), and the superpathway of menaquinol‑8 (vitamin K₂) biosynthesis III, together with dTDP‑α‑D‑ravidosamine and dTDP‑4‑acetyl‑α‑D‑ravidosamine biosynthesis. Menaquinone biosynthesis and related vitamin K₂ pathways are known to support the growth and respiratory activity of specific gut bacteria and can shape community composition and host–microbe interactions43. The dTDP‑ravidosamine pathway contributes activated sugar donors used for glycosylation, and more broadly, tumor‑associated glycosylation and elevated UDP‑sugar pools have been implicated in promoting hyaluronan accumulation, immune evasion, and resistance to therapy44.
Together, these pathway‑level analyses indicate that the functional architecture of the gut microbiome associated with ICI response differs markedly between ICI‑only and ICI + FMT settings. In ICI‑only therapy, responders are linked to enrichment of amino‑acid biosynthetic capacity, whereas in the FMT‑augmented setting, response is instead associated with nucleotide salvage and relative depletion of carbohydrate catabolism, menaquinone biosynthesis, and sugar‑nucleotide glycosylation pathways. These context‑dependent functional signatures complement the species‑level findings and underscore that both microbial composition and metabolic potential jointly shape clinical outcome under different immunotherapy regimens.
Multi-omics cross-study profiling uncovers ecological and metabolic divides between ICI responders and non‑responders
To test whether microbiome predictors generalize across cohorts, we trained modality-specific and integrated multimodal models on seven melanoma ICI studies using IntegratedLearner32 with random forest as the base learner (The Glitza cohort was excluded from LODO modeling because only 6 patient-level profiles were available after within-patient aggregation.) and evaluated all pairwise train–test combinations across studies (Fig. 5a). Across most pairs, AUC–ROC values were above random expectation when models were applied to external cohorts, and leave-one-dataset-out (LODO) models performed similarly to within-study models, indicating that a subset of microbial features carries reproducible information about response status.
Fig. 5: Cross-study generalization of microbiome predictors and stable multi-modal features.The alternative text for this image may have been generated using AI.
a Cross-study area-under-the-ROC (AUC-ROC) for predicting response across seven ICI cohorts using IntegratedLearner32 with random forest as the base learner. Each heatmap shows performance when training on the row study and testing on the column study; the boxed diagonal cells indicate within-study performance, and the bottom “LODO” row shows leave-one-dataset-out models trained on all other cohorts. Left, early fusion model using simple concatenation of all microbiome features; right, late fusion model that combines modality-specific predictors using an ensemble. Warmer colors indicate higher AUC-ROC. b Top 10 cross-study stable features per modality (taxa, pathways, and biosynthetic gene clusters). Bars show the cross-study feature importance score, summarizing how consistently each feature contributes to prediction across LODO models.
Using the LODO models, we extracted the top 10 cross‑study stable features per modality (Fig. 5b). In the BGC panel (“bgc”), responder‑associated features included a mix of antimicrobial nonribosomal peptide synthetase (NRPS) and RiPP‑like clusters (for example, a Rothia lanthipeptide, BGC0003096, and the stenothricin NRPS BGC0000431), together with loci involved in exopolysaccharide and capsule‑like glycans, including an EPS locus from Lactobacillus johnsonii (BGC0000767) and Klebsiella‑like capsular clusters (BGC0000730 and BGC0000733). Non‑responder‑associated features comprised an Enterococcus faecalis capsular polysaccharide locus (BGC0000792), the tilivalline NRPS toxin cluster from Klebsiella oxytoca (BGC0000446), a Streptomyces polyketide (BGC0000268), and a salivaricin bacteriocin locus (BGC0000548). NRPS and RiPP clusters often encode narrow‑spectrum antimicrobial peptides that modulate competition within the gut microbiota45, while EPS and capsule loci from organisms such as L. johnsonii, Klebsiella and Enterococcus can influence adhesion, barrier interactions, and immune evasion46. These annotations suggest that stable BGC features on both sides of the classifier are enriched for surface‑associated and competitive functions, rather than fitting a simple “beneficial antimicrobial vs harmful capsule” dichotomy, and they remain hypotheses that will require functional validation.
At the species level (Fig. 5b, “taxa” and Supplementary Fig. S2), responder‑associated features included Veillonella parvula, Fusicatenibacter saccharivorans and Blautia faecis. Veillonella species are obligate anaerobes that ferment host‑derived lactate to short‑chain fatty acids (SCFAs), particularly propionate47. F. saccharivorans and Blautia spp. are common SCFA producers, and B. faecis has been repeatedly identified as a butyrate‑producing commensal decreased in inflammatory disease48. SCFAs such as acetate, propionate and butyrate can support epithelial barrier function and modulate immune responses49, providing a plausible, but still hypothetical-link between these taxa and ICI responsiveness. Several negatively weighted taxa enriched in non‑responders, including Parabacteroides distasonis and recently described Firmicutes clades, have context‑dependent associations in the literature, so their roles in ICI outcome remain uncertain.
In the pathway panel (Fig. 5b, “pwy” and Supplementary Fig. S3), responder‑associated features were primarily central metabolic and biosynthetic pathways involved in carbohydrate fermentation, amino‑acid and cofactor synthesis, and chorismate‑linked metabolism, all functions commonly encoded in commensal anaerobes that contribute to SCFA and vitamin production49. Non‑responder‑associated pathways were enriched for sugar‑nucleotide and related routes that contribute to specialized cell‑surface glycans, including components of capsular and lipopolysaccharide structures in Gram‑negative bacteria, which are important for virulence and immune modulation50.
Overall, the cross‑study stable features (Fig. 5a,b, Supplementary Fig. S4) point to consistent differences between responders and non‑responders in taxa, BGCs, and pathways linked to SCFA production, EPS and capsule biosynthesis, and other surface‑associated functions. These patterns are biologically plausible given existing literature, but should be interpreted as hypotheses about mechanism rather than direct proof of causality.

