Clinical characteristics of the patients
This study included 10 paired samples of BLCA tissues and adjacent noncancerous tissues. The specific clinical characteristics of the patients are detailed in Table 1. In terms of the mean age of patients according to the MTHFR (677C > T) genotype, those with the TT genotype (n = 4) were 55.75 years (range: 49–65), those with the CT genotype (n = 5) were 69.60 years (range: 57–80), and the single patient with the CC genotype (n = 1) was 72 years old. In terms of the sex distribution, males accounted for 75% (3/4) of the TT genotype group, 100% (5/5) of the CT group, and 0% (0/1) of the CC group. With respect to gene methylation, TWIST1 showed methylation rates of 75% (3/4) in the TT group, 80% (4/5) in the CT group, and 100% (1/1) in the CC group. Both ONECUT2 and VIM were methylated in all cases across all genotypes. Tumor grading and staging were assessed using the World Health Organization (WHO) criteria and the TNM staging system. The results indicated no significant differences among the different genotype groups (Kruskal–Wallis test, p > 0.05), suggesting that the MTHFR (677C > T) genotype may not directly influence the morphological characteristics of the tumor.
Table 1 Summary of clinical characteristics in patients.
Genome-wide differential methylation analysis
Using whole-genome bisulfite sequencing (WGBS), we first assessed the methylation levels across samples of different genotypes. In tissues with the TT genotype (n = 4), the average methylation level across the genome (0.57 ± 0.08) was significantly lower than that in the CT genotype group (n = 5) (0.64 ± 0.07) (p = 0.031, Fig. 1A). This observation aligns with the mechanism of reduced availability of methyl donors due to decreased activity of the MTHFR enzyme. Genomic annotation analysis revealed that the methylation levels in the TSS_up5kb, transcript, and TTS_down5kb regions were significantly lower in the TT genotype than in the CT genotype (Fig. 1B-D). Furthermore, we compared the methylation levels of regulatory elements, including enhancers, insulators, promoters, silencers, and transcriptional cis-regulatory regions, and found that the methylation levels in the TT genotype were consistently lower than those in the CT genotype (Fig. 1E-I).
Fig. 1
Genome-wide methylation characteristics of the TT genotype group (n = 4 pairs) and CT genotype group (n = 5 pairs). (A) Comparison of average genome-wide methylation levels. (B-D) Comparison of methylation levels across genomic annotation regions (TSS up 5 kb, transcript, TTS_down5K). (E-I) Comparison of methylation levels across regulatory elements (enhancer, insulator, promoter, silencer, cis_regulatory).
Subsequently, we evaluated the differences in methylation between BLCA and adjacent tissues. In the TT genotype group (n = 4), we identified a total of 2,033,114 significant DMPs (△β ≥ 0.2, p < 0.05), of which 87,967 were hypermethylated sites and 1,945,147 were hypomethylated (Fig. 2A). Meanwhile, for the CT genotype group, 2,407,546 DMPs were identified, including 275,313 hypermethylated and 2,132,233 hypomethylated sites (Fig. 2B). Owing to the presence of only one individual in the CC genotype group, differential methylation analysis was not performed for this group. Focusing on tumor tissues, comparison between TT and CT genotypes identified 26,755 DMPs unique to TT genotype samples and 194,308 DMPs unique to CT genotype samples (n = 5; Fig. 2C).
Fig. 2
Genome-wide differential methylation analysis across MTHFR genotypes. (A-B) Differentially methylated positions (DMPs) between tumor and adjacent normal tissues in the TT genotype group (n = 4 pairs) and CT genotype group (n = 5 pairs), respectively. (C) Differentially methylated sites between TT and CT genotype tumor samples. (D) Comparison of the number of differentially methylated regions (DMRs). (E) Distribution of DMRs across genomic regions by genotype. (F) Comparison of the number of differentially methylated genes (DMGs). (G-H) Gene Ontology (GO) enrichment analysis of DMGs in the CT genotype group and TT genotype group, respectively.
A total of 96,867 differentially methylated regions (DMRs) were identified in the TT genotype group and 187,796 in the CT genotype group (Fig. 2D; Supplementary Table S1). Genomic annotation indicated that 4.53% of DMRs in the TT group were located in promoter regions (defined as 2 kb upstream and downstream of the TSS), whereas 55.73% were intragenic (intronic and exonic) and 22.65% were intergenic (Fig. 2E). Corresponding proportions in the CT group were 3.06%, 48.45%, and 27.39%, respectively (Fig. 2E). The CT genotype group exhibited a greater number of differentially methylated genes (DMGs) (n = 1,395) than the TT genotype group (n = 787) (Fig. 2F; Supplementary Table S2). Gene Ontology (GO) enrichment analysis demonstrated that CT-associated DMGs were significantly enriched in biological processes including protein catabolism, lipid metabolism, and responses to growth factors (Fig. 2G; Supplementary Table S3). In contrast, TT-specific DMGs were predominantly enriched in processes related to embryonic morphogenesis, central nervous system neuron differentiation, and head development (Fig. 2H; Supplementary Table S4). Pathways overlapping between the two groups were primarily associated with skeletal system development. These findings suggest that MTHFR 677 C > T genotypes may differentially influence BLCA development and progression through distinct biological processes, including protein and lipid metabolism as well as developmental and differentiation pathways.
Relationship between molecular subgroup classification and clinical prognosis
Comparison of DMPs identified by WGBS with CpG sites from the TCGA-BLCA 450 K methylation array yielded 1,977 overlapping CpG sites, including 1,158 CT-associated DMPs and 819 TT-associated ones. NMF clustering was performed using the β values of these sites. The optimal number of clusters was determined based on the cophenetic correlation coefficient (0.98) and silhouette width (0.85) (Fig. 3A,B). Accordingly, TCGA-BLCA samples (n = 408) were classified into three molecular subgroups based on WGBS-derived methylation characteristics (Fig. 3C). Subgroup 1 (n = 77) exhibited high methylation levels (median = 0.42), Subgroup 2 (n = 135) displayed the lowest methylation (median = 0.36), and Subgroup 3 (n = 200) showed high methylation (median = 0.45) (Fig. 3D, E). Although significant differences in methylation levels were observed among these three subgroups, all demonstrated reduced methylation levels compared with the normal samples (Fig. 3F). Kaplan–Meier survival analysis revealed that the overall survival (OS) of patients in Subgroup 2 was significantly better than that of those in the other two subgroups (log-rank p = 0.008, Fig. 3F), whereas no significant difference in progression-free survival (PFS) was observed (p = 0.71). Subgroup 2, which exhibited the best prognosis, was also associated with a lower tumor grade (higher proportion of G1-G2, P < 0.001) and earlier stage (higher proportion of stages I-II, p < 0.001) (Table 2).
Fig. 3
Methylation subtyping based on DMCs associated with MTHFR (677C > T) genotype. (A) Plot of the cophenetic correlation coefficient for Non-negative matrix factorization(NMF), indicating clustering stability (values closer to 1 reflect greater robustness). (B) Silhouette score analysis evaluating clustering quality, with higher scores indicating better-defined clusters. (C) Heatmap of methylation-based clustering in TCGA-BLCA samples. (D) Comparison of average methylation levels across the three molecular subgroups. (E) Comparison of subgroup-specific methylation levels relative to normal samples. (F) Kaplan–Meier survival curves showing overall survival differences among the three subgroups (Subgroup 1, n = 77; Subgroup 2, n = 135; Subgroup 3, n = 200); two samples were excluded due to missing survival data.
Table 2 Association between molecular subgroups and clinical characteristics.
We then performed multivariate Cox proportional hazards regression analyses incorporating the methylation-based subgroup classification alongside standard clinicopathological variables. For overall survival (OS), Subgroup 2 (HR = 0.63, 95% CI: 0.40–0.79, p = 0.048) and subgroup 1 (reference) showed significantly better outcomes compared to subgroup 3 (HR = 1.32, 95% CI: 1.12–1.52, p = 0.05), after adjusting for age, sex, stage, and grade (Supplementary Table S5). For progression-free survival (PFS), the methylation subgroup classification did not show prognostic significance (p > 0.05). These results indicated that the methylation-based molecular classification provides prognostic information independent of standard clinicopathological variables in terms of overall survival.
Independent validation
Clustering analysis of the GSE183920 dataset further supported the robustness of the three molecular subgroups (Supplementary Fig. 1 A). This dataset provided event data regarding patient mortality over a 10-year period. In subgroup 1 (n = 258), 163 patients died within the 10-year survival period, resulting in a mortality rate of 63.18%. In subgroup 2 (n = 153), 77 patients died, corresponding to a mortality rate of 50.32%. Finally, in subgroup 3 (n = 192), 133 patients died, leading to a mortality rate of 69.27%. The mortality rate for subgroup 2 was significantly lower than those of the other two subgroups (χ2 test, p = 0.001) (Supplementary Fig. 1B). We also identified the presence of three subgroups in GSE193208 (cfDNA methylation) (Supplementary Fig. 1 C). Notably, the complete response rate to neoadjuvant chemotherapy was significantly higher in subgroup 2 patients (48.72%) than in subgroup 1 (33.33%) and subgroup 3 ones (25.00%) (p = 0.05, Supplementary Fig. 1D).
Analysis of tumor microenvironment characteristics
Given the marked prognostic differences among the three WGBS-derived subgroups, tumor immune microenvironment characteristics were further evaluated to determine whether these differences might contribute to variation in treatment response and survival outcomes. CIBERSORT analysis of immune cell infiltration in TCGA-BLCA data indicated that Subgroup 2, which exhibited the most favorable prognosis, had the highest proportions of plasma cells, lymphocytes, and memory B cells (Supplementary Fig. S2A-C). In contrast, naïve B cells were most abundant in Subgroup 3 (Supplementary Fig. S2D). Macrophage populations (M0, M1, and M2) were significantly reduced in Subgroup 2 compared with the levels in Subgroups 1 and 3 (Supplementary Fig. S2E-G). Additionally, the proportion of monocytes was significantly higher in Subgroup 2 (Supplementary Fig. S2I), whereas neutrophil infiltration was significantly lower (Supplementary Fig. S2J). No significant differences were observed among subgroups for eosinophils or activated and resting natural killer (NK) cells (Supplementary Fig. S2H, S2K, S2L). Activated and resting CD4 memory T cells were most abundant in Subgroup 3 (Supplementary Fig. S2M, S2N), whereas naïve CD4+ T cells were most abundant in Subgroup 2 (Supplementary Fig. S2O). Although CD8+ T cells were slightly more abundant in Subgroup 2 than in Subgroups 1 and 3, the difference was not statistically significant (Supplementary Fig. S2P).
Analysis of the expression of immune checkpoint molecules revealed that the expression levels of genes such as PD1, PD-L1, CTLA4, and LAG3 were significantly higher in subgroup 3 than in the other two subgroups (ANOVA, p < 0.05) (Fig. 4A).
Fig. 4
Tumor microenvironment characteristics across the three molecular subgroups. (A) Heatmap of immune cell infiltration estimated by the CIBERSORT algorithm. (B) Box plots of immune checkpoint gene expression (PDCD1, CD276, CTLA4, LAG3). (C) Comparison of predicted immunotherapy response using the Tumor Immune Dysfunction and Exclusion (TIDE) score. The TIDE algorithm estimates a patient’s response to immune checkpoint blockade therapy based on the gene expression characteristics of T-cell dysfunction and rejection. The higher the score, the lower the possibility of a response to immunotherapy. (D) Immunotherapy response rate between the three subgroups in IMvigor210 cohort (n = 195).
Furthermore, upon using the TIDE algorithm to predict responses to immunotherapy, substantial variations were observed in the T-cell dysfunction scores across the three patient subgroups. Specifically, subgroup 3 exhibited the highest score, followed by subgroup 1 and then subgroup 2 (Fig. 4B). Subsequently, the tumor immune microenvironment in the GSE183920 dataset was inferred using the epigenetic deconvolution method (HiTIMED). It was found that the overall scores of the three immune cell subgroups were consistent with the TCGA BLCA subtypes. Notably, in subgroup 3, there was a significant increase in neutrophil infiltration (ANOVA, P < 0.001), while the proportion of CD8+ T cells decreased significantly (p = 0.003) (Fig. 4C). Using the methylation-based subgroup gene signatures derived from our TCGA-BLCA analysis, we classified IMvigor210 samples into three subgroups (C1/C2/C3: n = 41/55/99) using the nearest-centroid classifier approach. Results showed that subgroup 2 exhibited the highest objective response rate (CR&PR: 44.44%) compared to Subgroup 1 (24.24%) and Subgroup 3 (15.56%; χ2 square test, p = 0.0013, Fig. 4D). These findings suggest that WGBS-derived methylation patterns may influence patient sensitivity to immunotherapy by modulating the tumor microenvironment.
Subgroup comparison of genetic mutations
The analysis of mutation profiles from TCGA-BLCA samples revealed distinct mutation patterns among the different subgroups. For example, subgroup 1 exhibited the notable enrichment of RB1 mutations (29.87%), while subgroup 2 was characterized by higher rates of FGFR3 (29.85%), STAG2 (21.64%), and PIK3CA (22.23%) mutations. subgroup 3 displayed significant rates of RB1 (21.11%) and PIK3CA (25.13%) mutations (Fig. 5A-C). Furthermore, mutational signature analysis indicated that subgroup 1 predominantly exhibited signature 3, which is associated with defects in DNA double-strand break (DSB) repair via homologous recombination (HR). In contrast, subgroup 2 was enriched with signature 2, which is linked to APOBEC enzyme activity, while subgroup 3 demonstrated significant enrichment of signature 5, which is related to clock-like aging (Fig. 5D). Additionally, we assessed the tumor mutational burden (TMB) across the three subgroups and observed that the TMB of subgroup 2 (median = 3.99) was lower than those of subgroups 1 and 3(median = 4.36/4.16); however, no statistically significant differences were found (P = 0.81). These genomic characteristics suggest that the methylation patterns associated with different MTHFR (677C > T) genotypes may influence the genomic stability of BLCA.
Fig. 5
Genomic alterations across methylation-based subgroups in BLCA. (A-C) Mutation frequencies of key driver genes in Subgroups 1–3. (D) Distribution of mutational signatures across the three subgroups.
Integrated transcriptomic data analysis
Differentially expressed gene (DEG) analysis was performed for each TCGA-BLCA subgroup, identifying a total of 2,333 significant DEGs (FDR < 0.05, |log₂ fold change|≥ 1; Supplementary Fig. S3A; Supplementary Table S6). Genes specifically upregulated in Subgroup 1 (n = 17) were primarily associated with positive regulation of calcium ion import and the phosphatidylcholine biosynthetic process (Supplementary Fig. S3B; Supplementary Table S7). In Subgroup 2, 1,291 upregulated genes were identified and were predominantly enriched in processes such as RNA splicing and ATP synthesis (Supplementary Fig. S3C; Supplementary Table S8). In contrast, 1,025 genes upregulated in Subgroup 3 were significantly associated with immune-related biological processes, including leukocyte-mediated immunity (Supplementary Fig. S3D; Supplementary Table S9).
Integration of methylation and transcriptomic data demonstrated a significant inverse correlation between promoter methylation and gene expression (median ρ = − 0.17, adjusted p < 0.05). This association was most pronounced in Subgroup 2 (median ρ for Subgroups 1, 2, and 3: − 0.12, − 0.16, and − 0.13, respectively).
We further discovered that TRIM27, a gene marker for the subgroup 2, and RASSF1, a gene marker for the subgroup 3, exhibited the most significant negative correlation between methylation and expression levels. Notably, 122 methylation probes were associated with TRIM27, of which 59.84% demonstrated a significant negative correlation (Fig. 6A). By sorting the samples from low to high based on expression levels, we observed that samples from the subgroup 2 were predominantly located within the high-expression range, with the promoter-related region probes of the subgroup 2 samples exhibiting lower methylation levels (Fig. 6B). A box plot (Fig. 6C) further confirmed that the expression level of the TRIM27 gene in the subgroup 2 was significantly higher than those in the other two subgroups. Survival analysis indicated that patients in the group with high TRIM27 expression (top 25% of samples) had a significantly better prognosis than those in the low expression group (lowest 25%) (Fig. 6D), with a log-rank P-value of 0.0025 and a hazard ratio (HR) of 0.49. Similarly, our analysis of the RASSF1 gene revealed that it exhibited the highest expression level in the subgroup 3 (Fig. 6E), while the methylation level of its promoter region was the lowest in this subgroup. Increased expression of RASSF1 was significantly associated with reduced overall survival in patients (Fig. 6F), consistent with the poor prognosis of the subgroup 3.
Fig. 6
Analysis of methylation-expression relationships for TRIM27 and RASSF1 across subgroups. (A) Correlation between methylation probes and TRIM27 expression. (B) Methylation heatmap ordered by TRIM27 expression levels. (C) Comparison of TRIM27 expression across subgroups. (D) Association between TRIM27 expression and overall survival. (E) Comparison of RASSF1 expression across subgroups. (F) Association between RASSF1 expression and overall survival.
These results indicate that the methylation patterns associated with the MTHFR (677C > T) genotype may influence the prognosis of BLCA patients by regulating the expression of key genes. TRIM27, a marker gene for the subgroup 2, is associated with a favorable prognosis when it is hypomethylated and highly expressed. In contrast, RASSF1, a marker gene for the subgroup 3, is linked to a poor prognosis when its expression is upregulated. This integrated analysis of epigenetics and transcriptomics provides new insights into the role of the MTHFR (677C > T) polymorphism in the development and progression of bladder cancer.
Experimental validation of the TRIM27 and RASSF1 methylation–expression association
Gene expression quantification and bisulfite Sanger sequencing were performed for TRIM27 and RASSF1 in 10 fresh tumor tissues to evaluate the relationship between promoter methylation and gene expression. Based on analyses of the TCGA-BLCA dataset, DMRs were identified within the promoter regions of both genes (Supplementary Table S10). Because each DMR encompassed multiple CpG sites, probes exhibiting the most significant negative correlations (lowest p values) were prioritized. Accordingly, cg23756442 and cg06980053 were selected for TRIM27 and RASSF1, respectively (Supplementary Table S10). qPCR and Sanger sequencing results demonstrated that increased methylation at the selected CpG site in the TRIM27 promoter was associated with reduced gene expression (Fig. 7A; Supplementary Table S11), consistent with TCGA findings. Similarly, methylation of the key CpG site in the RASSF1 promoter region was inversely correlated with gene expression (Fig. 7B; Supplementary Table S12). These experimental findings further support the bioinformatics-derived associations and indicate that promoter methylation plays a regulatory role in modulating TRIM27 and RASSF1 expression in bladder cancer.
Fig. 7
Experimental validation of methylation-expression relationships for TRIM27 and RASSF1 in 10 bladder cancer tissue samples. (A) Correlation between TRIM27 expression and methylation status of the target promoter region. (B) Correlation between RASSF1 expression and methylation status of the target promoter region. Correlations were assessed using Spearman’s rank correlation coefficient.

