We selected genes that are recognized both as somatic drivers in cancer and as causative genes in hereditary cancer syndromes, for which BoostDM models are available (Supplementary Table 1), and that have >100 unique single base substitutions classified as pathogenic, likely pathogenic, benign or likely benign across ClinGen and our in-house variant dataset. These were ATM, BRCA1, BRCA2, CDH1, PTEN and TP53. For each gene, we used BoostDM models trained on cancer types that correspond to the organs primarily affected in the tumor spectrum of the associated hereditary syndrome [17,18,19,20,21,22,23,24]. When such models were not available, we used models trained on “SOLID cancers” or, failing that, another available model (Table 1).
Table 1 Genes evaluated, cancer-specific BoostDM models used for classification, number of variants evaluated, and performance of best-performing models.
A total of 1,275 germline single base substitutions in the selected genes (ATM, BRCA1, BRCA2, CDH1, PTEN, and TP53) were included in the analysis. These variants had previously been classified as pathogenic or likely pathogenic (hereafter referred to as PVs; n = 562) or as benign or likely benign (hereafter referred to as BVs; n = 713, of which 424 were non-synonymous) based on ACMG/AMP variant classification guidelines, as curated by ClinGen expert panels (source: https://erepo.clinicalgenome.org/evrepo/) or by the Genetic Diagnostic Unit of the Hereditary Cancer Program at the Catalan Institute of Oncology (Table 1). Of all PVs and BVs analyzed, 41.5% (529/1,275) were also reported as somatic variants in public cancer datasets (TCGA and/or COSMIC). When considering only PVs, this proportion increased to 54% (302/562), highlighting the substantial overlap between germline PVs and somatic alterations in tumors. The complete list of germline variants analyzed, including their pathogenicity scores and presence in TCGA or COSMIC, is provided in Supplementary Table 2.
Applying BoostDM’s established threshold for categorizing somatic variants (score > 0.5, likely driver mutation; score ≤ 0.5, likely passenger mutation), the best cancer type-specific models for each gene correctly classified 74.5% (419/562) of PVs and 99% (706/713) of all BVs, or 98.6% (418/424) of non-synonymous BVs.
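The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the example scores are hypothetical, and only the 0.5 cutoff and the driver/passenger labels come from the text.

```python
DRIVER_THRESHOLD = 0.5  # cutoff stated in the text


def classify(score: float) -> str:
    """Label a BoostDM score: strictly above the cutoff is a likely driver."""
    return "likely driver" if score > DRIVER_THRESHOLD else "likely passenger"


def is_concordant(score: float, clinical_label: str) -> bool:
    """A PV is concordant when called a driver; a BV when called a passenger."""
    predicted_driver = score > DRIVER_THRESHOLD
    return predicted_driver == (clinical_label == "PV")


# Hypothetical scores, for illustration only (not taken from the study data).
examples = [(0.91, "PV"), (0.50, "BV"), (0.12, "PV")]
for score, label in examples:
    print(score, label, classify(score), is_concordant(score, label))
```

Note that a score of exactly 0.5 falls on the passenger side, since the driver condition is a strict inequality.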
When analyzed by gene, performance varied. For ATM, using the BoostDM model trained on lung adenocarcinoma data, 80.9% (68/84) of PVs were identified as likely drivers, while all BVs (129/129) were correctly classified as non-drivers. Because neither a phenotype-matched nor a SOLID model was available for ATM, it was analyzed using a non–phenotype-matched model (lung adenocarcinoma); the corresponding results should therefore be interpreted with caution. For BRCA1 and BRCA2, the SOLID cancer model correctly classified 57.1% (64/112) and 69.1% (65/94) of PVs, and 100% (150/150) and 99.2% (252/254) of BVs, respectively. The CDH1, PTEN, and TP53 models (trained on breast, SOLID, and breast cancer data, respectively) achieved correct classification rates of 91.4% (74/81), 65.6% (63/96), and 89.5% (85/95) for PVs, and 97.0% (98/101), 100% (16/16), and 96.8% (61/63) for BVs (Table 1). The performance of additional gene–cancer model combinations is detailed in Supplementary Table 3.
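As a consistency check, the per-gene counts quoted above can be summed to recover the overall totals. The sketch below only re-adds numbers stated in the text; the dictionary layout and variable names are illustrative choices.

```python
# (correct PVs, total PVs, correct BVs, total BVs), copied from the text above
counts = {
    "ATM": (68, 84, 129, 129),
    "BRCA1": (64, 112, 150, 150),
    "BRCA2": (65, 94, 252, 254),
    "CDH1": (74, 81, 98, 101),
    "PTEN": (63, 96, 16, 16),
    "TP53": (85, 95, 61, 63),
}

pv_correct = sum(c[0] for c in counts.values())
pv_total = sum(c[1] for c in counts.values())
bv_correct = sum(c[2] for c in counts.values())
bv_total = sum(c[3] for c in counts.values())

print(f"PVs correctly classified: {pv_correct}/{pv_total}")  # 419/562
print(f"BVs correctly classified: {bv_correct}/{bv_total}")  # 706/713
print(f"Total variants: {pv_total + bv_total}")              # 1275
```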
BoostDM showed robust classification performance for non-synonymous, non-missense variants (stop-gain, start-loss, stop-loss, and splicing-region variants), correctly classifying 92.3% (326/353) of them (Fig. 1). Notably, most misclassified variants in this category (25 of 27) were splicing-region variants affecting canonical or non-canonical splicing positions. Most incorrectly classified splicing-related PVs were observed in BRCA2, which accounted for 68% (15/22) of all splicing PV misclassifications (Supplementary Fig. 1). A plausible explanation is that, in the IntOGen pipeline, splicing mutations in BRCA2 do not show a positive selection signal strong enough to meet the threshold for inclusion in the positive training set. This signal is estimated by splicing-specific dN/dS values, a metric that quantifies whether splicing-disrupting mutations occur more often than expected by chance. As a result, most splicing-affecting variants are excluded during model training, which reduces the contribution of splicing-related features to the final classifier. This can be an issue for cancer driver genes in which splicing variants contribute significantly to pathogenicity but the positive selection signal for splicing is not strong enough. One possible way to address this limitation would be to relax the criteria for including mutations in the training set.
Fig. 1: BoostDM scores for non-synonymous, non-missense variants and correspondence with ACMG/AMP-based classifications according to variant type.
Left panel: Distribution (box and violin plots) of BoostDM scores for non-synonymous, non-missense pathogenic and benign variants. Right panel: Type of variants and BoostDM misclassifications by variant type.
As expected, the performance of BoostDM for missense variants was lower than for loss-of-function variants. BoostDM correctly classified 46% (103/223) of missense PVs and 99.5% (408/410) of missense BVs (Fig. 2). More than half of missense PVs (120/223) therefore received a benign BoostDM score, with misclassifications affecting all genes (Supplementary Table 2, Supplementary Fig. 2).
Fig. 2: Performance of BoostDM, AlphaMissense and REVEL scores for the prediction of pathogenic and benign missense variants in the studied genes.
A Distribution (box and violin plots) of scores for benign and pathogenic variants. B Receiver operating characteristic (ROC) curves and area under the curve (AUC) values for each predictor.
We compared BoostDM to AlphaMissense [14] and REVEL [12] —two high-performing, ClinGen-endorsed tools for missense variant pathogenicity prediction [25]— in their ability to classify germline missense variants. The continuous output scores (ranging from 0 to 1 for all predictors), without any binary categorization, were used to compare the performance of the tools. While BoostDM demonstrated competitive performance (AUC = 0.905; 95% CI: 0.881–0.930), it did not outperform AlphaMissense (AUC = 0.969; 95% CI: 0.955–0.982) or REVEL (AUC = 0.962; 95% CI: 0.944–0.980) in overall pathogenicity prediction accuracy (Fig. 2A and B). Pairwise AUC comparisons using DeLong’s test revealed significant differences between BoostDM and AlphaMissense (p = 1.45×10⁻⁷), and between BoostDM and REVEL (p = 1.46×10⁻⁵), but not between AlphaMissense and REVEL (p = 0.355).
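For readers less familiar with the metric, the AUC on continuous scores can be read as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one. The sketch below computes it directly from that definition. It is an illustration under stated assumptions: the function name and the toy scores are hypothetical, it is not the software used in the study, and it does not implement DeLong's test or the confidence intervals.

```python
def roc_auc(scores_pathogenic, scores_benign):
    """AUC as the fraction of (pathogenic, benign) pairs ranked correctly.

    Ties between a pathogenic and a benign score count as half a win,
    which matches the Mann-Whitney U formulation of the AUC.
    """
    wins = 0.0
    for p in scores_pathogenic:
        for b in scores_benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(scores_pathogenic) * len(scores_benign))


# Toy scores for illustration only (not study data).
toy_pathogenic = [0.9, 0.8, 0.4]
toy_benign = [0.1, 0.3, 0.5]
print(roc_auc(toy_pathogenic, toy_benign))  # 8 of 9 pairs ranked correctly
```

Because this formulation uses only the ranking of scores, it needs no binary cutoff, which is why the predictors can be compared on their raw 0 to 1 outputs.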
Importantly, a low BoostDM score ( ≤ 0.5) was not a reliable indicator of benignity, as 23% (120/528) of missense variants predicted as benign by BoostDM were classified as PVs by ClinGen or expert review. In contrast, a high BoostDM score ( > 0.5) was strongly predictive of pathogenicity: 98% (103/105) of these missense variants were classified as PVs. These findings suggest that, although BoostDM may have limited utility for excluding pathogenicity, a high score could serve as supportive evidence for pathogenicity (consistent with ACMG/AMP rule PP3) when interpreting missense variants. Interestingly, only missense variants in TP53 and PTEN received BoostDM scores >0.5 (Supplementary Fig. 2).
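The proportions above follow from the overall missense confusion matrix at the 0.5 threshold. In the sketch below, the four cell counts are derived from figures quoted in the text (103/223 missense PVs and 408/410 missense BVs correctly classified); the helper function and its name are illustrative rather than part of the study's code.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# All missense variants, derived from counts in the text:
# 103 of 223 PVs scored > 0.5; 408 of 410 BVs scored <= 0.5.
tp, fn = 103, 223 - 103  # PVs called driver / called passenger
tn, fp = 408, 410 - 408  # BVs called passenger / called driver

metrics = binary_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here 1 - NPV is 120/528, the 23% of benign predictions that were PVs, and the PPV is 103/105, the 98% quoted for high scores.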
To further evaluate the gene-specific performance of BoostDM for missense variant classification, we calculated key performance metrics for each gene and across all genes (Table 2). While BoostDM models consistently achieved high specificity, sensitivity varied considerably by gene. Notably, BoostDM failed to identify any pathogenic missense variants in ATM, BRCA1, BRCA2, and CDH1, resulting in a sensitivity of 0.000 for these genes; that is, all missense PVs in these genes were incorrectly classified as benign at the standard >0.5 threshold. The PTEN model demonstrated moderate sensitivity (0.523) and perfect specificity (1.000), yielding a positive predictive value (PPV) of 1.000 but a low negative predictive value (NPV = 0.139), which indicates that most benign predictions for PTEN were false negatives. The TP53 model showed the best performance, with high sensitivity (0.885), specificity (0.964), and F1 score (0.926), indicating that the BoostDM model for TP53 is particularly effective at distinguishing pathogenic from benign missense variants. The reasons for the observed inter-gene variability in BoostDM performance for missense variants remain unclear and warrant further investigation, particularly as additional classified variants become available for genes with smaller numbers of missense PVs, such as ATM, BRCA2, and CDH1.
Table 2 Performance metrics of BoostDM for missense variant pathogenicity prediction per gene, applying the 0.5 score threshold.
We acknowledge the potential limitations of using BoostDM models trained on cancer types that may not be directly relevant to the corresponding hereditary syndrome (e.g., lung adenocarcinoma for ATM or generalized “SOLID cancer” models for other genes). Further limitations may stem from the assumption that the mechanisms underlying somatic mutations in tumor cells are similar to those underlying germline variants. In particular, many germline PVs are absent from somatic datasets, as expected given that somatic mutations in tumors are shaped by positive selection. Somatic cancer drivers are selected for their ability to confer a growth advantage on the tumor, whereas germline variants may have different biological effects that do not confer such a selective advantage. This discrepancy may affect the accuracy of somatic model predictions when they are applied to germline contexts. Further research, including the development of models trained specifically on germline variant data, would help to improve the accuracy of pathogenicity predictions in hereditary contexts.

