Overview of SPIFEE
SPIFEE models spatial relationships in the TME by representing each sample as a graph, where nodes correspond to TME entities (e.g., cells, clusters, spots) and edges encode functions that describe relationships between entities as a function of spatial distance or other contextual factors. Node features encode phenotypic or functional properties of the entity. These embeddings may be derived from deep learning models or other representation learning techniques that capture biologically meaningful information. Meanwhile, edge features are multi-dimensional feature vectors, representing a sampled function characterizing the spatial relationship between the entities (Fig. 1a). These graphs then serve as input to a GAT architecture, where we leverage the attention mechanism to understand the most influential edges, and hence interactions, influencing the models’ decisions in differentiating pathologies of interest.
Fig. 1: Overview of the SPIFEE framework.
a SPIFEE represents spatial data as graphs, with edges encoding functional relationships. Nodes can represent any TME entity, such as cells or spots, and node features encode high-dimensional feature representations of nodes, capturing characteristics such as phenotypes, or functional or structural properties. b SPIFEE is widely adaptable to spatial biological data, here its usage is demonstrated on mIF, H&E, and ST data. First, nodes are identified from the spatial data. For mIF, we define entities as cell types, for H&E it is phenotypic clusters, and for ST it is pathways. c Node features are defined for the TME entities, capturing phenotypic or functional characteristics. d SPIFEE graphs are then constructed where edges encode spatial interaction between nodes. e A graph attention network is trained and attention is analyzed to identify the top attended interactions in influencing model predictions. These can be interpreted to be important multi-scale interactions in distinguishing groups such as cancer subtypes and survival.
Patient cohorts spanning multiple modalities and scales
We demonstrate our method on two main independent datasets: UM-PD, an mIF dataset from the University of Michigan, representing pancreatic diseases (PD), and TCGA-NSCLC, from The Cancer Genome Atlas, representing non-small cell lung cancer (NSCLC).
UM-PD consists of mIF stained tissues. The dataset consists of 198 samples (imaged cores), representing 105 patients diagnosed with either chronic pancreatitis (CP) or pancreatic ductal adenocarcinoma (PDAC). A 6-marker panel was used to phenotype 5 cell types of interest: antigen presenting cells (APC), cytotoxic T lymphocytes (CTL), epithelial cells (Epi), T helper cells (THelper), and T regulatory cells (Treg). In addition to imaging data, single-cell RNA sequencing (scRNA-seq) was performed on six organ donors, providing transcriptomic profiles for normal pancreatic tissue.
TCGA-NSCLC is a dataset consisting of H&E-stained WSIs. We leverage this dataset in two ways, generating the following sub-datasets we will refer to as: (1) TCGA-NSCLC-HPC and (2) TCGA-NSCLC-ST. These are TCGA-NSCLC datasets that utilize histomorphological phenotype clusters (HPC) information and have been in-silico stained for ST, respectively. TCGA-NSCLC contains information for lung adenocarcinoma (LUAD) versus lung squamous cell carcinoma (LUSC) subtype classification as well as LUAD survival.
TCGA-NSCLC-HPC utilizes HPCs discovered by Quiros et al.23 associated with the subtype classification and LUAD survival objectives. These HPCs, discovered in a self-supervised manner, represent distinct morphological patterns captured across tile patches in the WSIs. For the subtype classification objective, we obtained 607 samples (WSIs), and for the survival regression objective, we obtained 395 samples (patients). These LUAD samples included information on survival time and censorship. Of the 395 patients included in the survival analysis, 199 experienced the event of interest, and 196 were censored.
TCGA-NSCLC-ST uses slides generated with DeepSpot24, a deep learning method that predicts ST directly from H&E images. The authors have validated its application specifically on the TCGA-NSCLC cohort. We leveraged 994 samples (WSIs), representing 898 patients for the subtype classification objective.
The additional datasets, Stanford-CRC and LUAD-LUSC-Visium, are used for generalization and method validation, and are introduced in later sections. A summary of the spatial datasets used and their details are found in Table 1.
Table 1 Spatial datasets used in this study
SPIFEE enables flexible representation of many functions to model the TME
In this work, we construct multiple sets of graphs, where each set corresponds to a distinct edge type used to model interactions within the tissue. For each edge type, a graph is generated per sample, sharing the same node set but differing in how edges and edge features are defined. Specifically, we implement edge types capturing the following spatial co-localization and interaction relationships: G-cross area under the curve (AUC), G-cross function, and modified geographical weighted regression (mGWR) function (Methods). Briefly, G-cross measures the degree of spatial mixing between two entities over a distance range25,26, and mGWR captures the variation in interaction between entities across fields of view27,28. We use the scalar G-cross AUC as a baseline to compare against the richer, functional edge representations.
SPIFEE predicts PD from single-cell level data and identifies key cell-cell interactions
We constructed graphs from UM-PD such that nodes represent unique cell types in each sample, and node features representing cell communication embeddings derived from the associated scRNA-seq data (Fig. 1b). As such, edges capture spatial relationships between cell types. Graphs corresponding to each edge type are then used as input to our GAT models (Methods). Given the limited sample size, we employ repeated k-fold cross-validation, reporting performance metrics averaged across all 50 validation folds and 95% confidence interval bounds.
We first compare SPIFEE to Proximogram21, as SPIFEE extends this prior framework (Table 2). We observe that allowing the number of nodes to vary (including only the cell types present in each sample), permitting directed edges, and the use of GAT leads to substantial performance improvements. Finally, employing the G-cross function edge type yields additional gains in performance, highlighting the advantage of utilizing the full function rather than its reduced AUC summary. A more detailed investigation of the function compared to the scalar counterpart is presented in a later section.
Table 2 Average predictive performance for distinguishing CP from PDAC in UM-PD, evaluated across all validation folds for each edge type model
In evaluating the attention scores across the graphs for each edge type, we identified key edges most influential in differentiating CP from PDAC (Table 3). With the G-cross function edge type, we observe that the (Epi, APC) and (Epi, THelper) relationships are top-attended edges in the CP and PDAC groups, respectively. This can be explained by key differences in the epithelial-immune interactions in the inflammatory environment as opposed to the TME. APCs and CD3+ T helper cells are influential in modulating both the respective microenvironments, with CD3+ THelpers present at the invasive front of PDAC29,30. Overall, we observe distinct patterns and the emphasis placed on different cell types across various edge types. The G-cross function highlights the elevated role of epithelial cells, while the G-cross AUC underscores the heightened importance of THelper cells. Meanwhile, the mGWR function places emphasis on both of these cells with each another.
Table 3 Top three highly attended edges for predicting CP or PDAC in UM-PD
SPIFEE shows improved performance over existing methods and strong generalizability
Furthermore, SPIFEE demonstrates improved performance on classification tasks compared with competitive existing methods (Table 4). We evaluate our framework on both our primary dataset, UM-PD, and an independent generalization dataset, Stanford-CRC. The Stanford-CRC dataset consists of CRC tissues profiled using CODEX imaging from Stanford University. It includes 196 samples from 109 patients, with a binary primary outcome: recurrence or death within 60 months versus survival beyond 60 months. A 40-marker panel was used to phenotype 14 cell types of interest.
Table 4 Benchmarking against other competitive graph-based methods on UM-PD dataset and Stanford-CRC generalization dataset
We compare SPIFEE against two competitive existing methods: Mew31 and SPACE-GM11. These are both GNN-based approaches designed for mIF data in which nodes correspond to individual cells with features including normalized biomarker expression and cell size, and edges are weighted by cell-cell distances.
To enable fair comparison, while also highlighting SPIFEE’s flexibility, we evaluate three different node feature types: (1) the same feature vector used by Mew and SPACE-GM, (2) cell-type proportion per sample, and (3) our scRNA-seq-derived communication embeddings. Across all node feature types, SPIFEE outperformed both Mew and SPACE-GM. On both UM-PD and Stanford-CRC, SPIFEE achieved gains using the same biomarker expression and cell size node features used by the baseline methods. On UM-PD, scRNA-seq embeddings were the second-best performing feature type, while on Stanford-CRC, cell proportion features were second-best.
For the Stanford-CRC dataset, matched scRNA-seq data were unavailable. To generate scRNA-seq embeddings for this cohort, we leveraged a publicly available dataset from seven primary CRC patients with the closest matching cell types (Methods).
Additionally, SPIFEE hyperparameters optimized on UM-PD were applied directly to Stanford-CRC, demonstrating strong generalizability to independent datasets. These results emphasize the flexibility of our graph representation, which also enables substitution of any available node features: matched scRNA-seq embeddings when possible, related publicly available scRNA-seq data when matched data are absent, or imaging-derived features such as biomarker expression, cell size, or simple cell type proportions.
Functional representations retain predictive information for PD better than scalar reductions
We next sought to investigate: (1) whether high edge attention within each group corresponds to stronger or weaker interactions between entities, and (2) what additional information the full edge function provides beyond the scalar summary. To achieve this, we focused on the top highly attended edges in the G-cross function models (Table 3). We plot the average G-cross functions per group, and compute the AUCs of each function within each group. To assess if the groups are significantly different based on the AUC values, we perform statistical testing using the Wilcoxon rank-sum test.
We observe that the (Epi, Treg) edge, which was uniquely highly attended to in the PDAC group, corresponds to increased interactions (visualized by the average functions and the AUC in Fig. 2a), as compared to the CP group (p < 0.001). In CP, the (Epi, APC) edge was most highly attended to, and was also highly attended to in PDAC. From Fig. 2b, high interactions of these cell types is indicative of CP, while low interactions characterizes PDAC (p < 0.001). For (Epi, THelper) interactions, the reduction to the AUC of the functions does not provide information on how these patterns influence CP or PDAC (Fig. 2c). However, in visualizing the G-cross functions, there is a clear difference in how the interactions trend over distance. We see that CP is characterized by a sharper rise over increasing distances than PDAC, indicating that T helper cells are more spatially enriched around epithelial cells in CP, suggesting closer or more frequent immune surveillance or interaction, whereas in PDAC, these immune cells appear more spatially distant or excluded from epithelial regions.
Fig. 2: Average G-cross functions and corresponding area under the curve (AUC) of UM-PD G-cross function model for top attended edges.
a Epithelial, T regulatory (Treg), b Epithelial, antigen presenting cell (APC), c Epithelial, T helper cell (THelper), d Epithelial, cytotoxic T cell (CTL). Statistical significance was assessed using the Wilcoxon rank-sum test, with a p-value threshold of 0.05.
Particularly interesting is the (Epi, CTL) case (Fig. 2d), where we again observe no significant differences between the AUCs. While the average functions also yield similar AUCs (PDAC=2.903, CP=2.716), the shape of the curves reveals a more nuanced distinction. Specifically, the CP curve rises more steeply at shorter distances compared to PDAC, indicating a higher probability of close-range epithelial-CTL interactions. In contrast, PDAC requires longer distances to reach comparable cumulative interaction probabilities. This suggests that, although the overall frequency of interactions may be similar, the spatial dynamics differ substantially–potentially reflecting more immediate or direct immune engagement in CP, versus more dispersed or delayed interactions in PDAC. This highlights the limitations of AUC as a summary metric, and underscores the value of analyzing the shape and features of the G-cross function itself for deeper insight into spatial cellular dynamics.
SPIFEE identifies disease-specific interactions that are uniquely captured using functional metrics
We next analyze edge pairs that are differentially attended to across the CP and PDAC cohorts. In contrast with the previous analysis where we assess top attended edges in isolation, here we identify edges that have been differentially attended to across CP and PDAC, and consequently point to differences in importance assigned to a given edge across cohorts. This was examined for the G-cross AUC, G-cross function and mGWR function edge type models individually. The modified GWR metric captures the variation in spatial interactions across a given field of view, in contrast to the G-function and the corresponding AUC metric, capturing infiltration of cells around a reference point at a fixed distance25,27.
The observed differences in top attended edges (Table 3) and in relative attention scores (Fig. 3), emphasizes that each edge type, as well as the respective metric, offers a unique perspective on the spatial arrangement and interactions of cells in the TME. At the same time, disregarding directionality, we observe that some pairs of interactions occur across all three distinct models, the interactions between Tregs and APCs being an example. Such commonalities indicate consensus on spatial relationships of varying types that are still deemed influential in the underlying biology governing the distribution of these cells in the TME.
Fig. 3: Edges, or cell pair interactions, differentially attended to between CP and PDAC in UM-PD across all edge type models.
a G-cross AUC, b G-cross function, c mGWR function. Statistical significance was assessed using the Wilcoxon rank-sum test, with a p-value threshold of 0.05.
Of the top-attended edges, three key edge pairs were differentially attended to between CP and PDAC across all three edge type models, with (APC, Treg), (Epi, Treg), and (THelper, Treg) being highly attended to in PDAC over CP. For context, PDAC has a highly immunosuppressive and hypoxic environment compared to CP, elevating the role and interactions of Tregs with other elements in the TME32,33,34. From our previous study, it is well-established that Treg interactions with cancerous epithelial cells are elevated in PDAC compared to CP21. Tregs play a critical role in immune evasion for tumor cells, explaining their increased recruitment to tumor regions in PDAC, compared to CP, where there is no protection for the Tregs from cytotoxic immune response35. We observe that there is also an increasingly important role assigned to the (APC, Treg) edges in PDAC compared to CP. In this study, APCs are primarily CD163-expressing cells, a subset of which have been known to exhibit tumor-promoting characteristics together with Tregs36, explaining their heightened importance in the TME.
Similar trends are observed in other interactions with Tregs, indicating that Treg interactions may potentially be the key differentiator between CP and PDAC. This may be explained by several factors. Firstly, Tregs are present at a higher proportion in the pancreatic TME in relation to other immune cells, so their interactions are selectively highlighted more in PDAC. Thus, Treg interactions are greater in PDAC over CP, where other cells have a much more active role in promoting and maintaining various stages of inflammation, and in turn offer protection for the Tregs in the PDAC environment. Secondly, they are also more actively involved in the creation and maintenance of the environment in PDAC37. In contrast, CP is an inflammatory condition compared to PDAC, so the inherent mechanisms and signaling pathways involved may not all be the same29.
SPIFEE predicts lung cancer subtypes and survival and identifies key cluster-cluster interactions
We next constructed graphs from TCGA-NSCLC-HPC such that nodes are unique phenotypic clusters present in the sample, and edges are the spatial relationships between clusters (Fig. 1b). With these input graphs, we trained GAT models across all edge representations, and observe that our approach achieves a performance of over 0.8 over all edge types and metrics in the classification task. This highlights the capabilities in extending the framework and graph representations past the cellular level (with mIF) to a higher cluster level (with H&E). Graphs using the G-cross function edge representations marginally outperformed the other edge representations in distinguishing LUAD from LUSC (Table 5).
Table 5 Summary of results for subtype classification and survival prediction on TCGA-NSCLC-HPC
Our results on the classification task are competitive with existing works. Jiang et al.38 performed a comprehensive benchmark across eight methods on the same TCGA-NSCLC dataset for the identical LUAD vs. LUSC classification task. Reported accuracies ranged from 0.785 to 0.970, with a mean of 0.876, placing SPIFEE well within the performance range of these existing H&E-based approaches. There are some caveats to this comparison, particularly in different data splits used, as well as resolution of images used. The competitive models were trained on TCGA H&E slides at a higher resolution (10x magnification), whereas our analysis was carried out at a lower resolution (5x), which typically provides less fine-grained morphological detail and therefore represents a more challenging classification setting.
In predicting LUAD survival, the mGWR function edge type slightly outperforms the others according to the concordance-index (C-index) and Akaike information criterion (AIC) (Table 5). To predict survival time, we trained the GAT model on a regression task to predict survival rank, then leveraged embeddings from the model as input to a Cox proportional hazard model39 (Methods). We then stratified patients into low- and high-risk groups based on their true survival rank. Using repeated k-fold cross-validation, the survival model achieved a training C-index of 0.667 ± 0.018, indicating moderate generalization with no evidence of severe overfitting. Further, our results are on par with the previously reported C-index score of 0.6 by Quiros et al. 23. However, our approach enables additional insights and interpretability on cluster interactions most influential on stratifying risk.
On evaluating the attention scores for each sample in our survival model, we identified key edge pairs associated with high-risk and low-risk across the three edge types, with the results summarized in Supplementary Table 1. For the best-performing edge type model based on the G-cross function, we observe that (HPC0, HPC1) and (HPC1, HPC0) are the top-attended edges for high-risk and low-risk patients, respectively, with the results flipped for the mGWR function. (HPC20, HPC22) and (HPC1, HPC16) are the top-attended G-cross AUC edges for high-risk and low-risk cases. The LUAD HPC reference atlas table provided by Quiros et al. provided some histopathological and morphological context, and was used to infer biological features underlying our presented results.
We observe that HPC20, a connective mesenchymal tissue cluster consisting of peribronchial fat with moderate lymphocytic infiltration, is highly correlated with HPC22 in high-risk patients in the G-cross AUC model. HPC22, in turn, is characterized as mesenchymal tissue associated with large vessel lumina, with moderate lymphocytic infiltration23. This can be a sign of strong stromal growth and increased vascularization supporting tumor growth, or it can also indicate vascular invasion, which correlates with a higher risk of metastatic spread40,41.
Stromal HPC16 is associated with elastosis, indicating the presence of inflamed elastic fibers in and around tumor tissues in the lung, thus influencing tumor architecture and remodeling in this tissue42. The interaction of inflamed elastic-fibers-rich stroma (HPC16) and mesenchymal tissue associated with large vessel lumina is quite important, as it may be indicative of an active vascular and tissue remodeling within the TME, indicating potential tumor progression40,43. Additionally, the epithelial-mesenchymal tissue transition has been potentially linked to the proliferation of cancer-associated fibroblasts, which have been identified as an important marker for LUAD transition44. This was an HPC that was not identified as one previously associated with either better or worse survival23, thus highlighting that our framework is able to capture relationships or attributes that may not have been identified using other modeling paradigms. The interaction between diverse, inflamed sparse tumor tissue (HPC9) with highly immune tumor-stromal tissue (HPCs 0,1) across multiple models, points to immune-stromal niches or aggregates that are present in tumor tissue, with high intercellular crosstalk via cytokines and growth factors45. We also observed that (HPC16, HPC1) interaction edges were highly attended to in both low-risk and high-risk patients in the G-cross AUC model. HPC1 clusters consists of mostly inflamed compact stroma-predominant tumor tissue with high lymphocytic infiltration, and was correlated with good outcomes in the reference study23. The presence of immune-rich HPC1 environments in proximity to regions of active tissue modeling can also be potentially indicative of an active immune response being mounted against the tumor, leading to classic markers such as inflammation. This has been alluded to in specific breast cancer cohorts and can potentially explain the elevation of these edges by the model in low-risk patients46. Additionally, the high attention given to edges with HPC0 and HPC1—both correlated with good outcomes in the reference paper, and the highest attending edges across the G-cross function and mGWR models—validates that our framework is able to identify interactions that are clinically significant23.
Surprisingly, all three models pick up interactions between various tumor-stromal HPCs (HPCs 1, 0, 9), which are usually transitory regions within the tissues. Thus, our framework enables the identification of patterns and interactions at the tumor-stroma interface, providing insights into the evolution of the TME. This can facilitate the discovery of markers for treatment efficacy and disease progression through rigorous downstream analyses.
SPIFEE nominates molecular pathway interactions key in differentiating NSCLC subtypes from H&E only
Leveraging the ST data from TCGA-NSCLC-ST, we constructed graphs such that nodes represent unique molecular pathways (determined by spot-level transcriptomic information, see Methods), and edges are the spatial relationships between pathways (Fig. 1b). Overall, SPIFEE achieved a performance of around over 0.7 across all edge types and metrics (Table 6), and revealed functional relationships between regions exhibiting different pathway enrichment. Importantly, we demonstrate the ability of our framework to obtain rich molecular insights and nominate key interactions from routine H&E only.
Table 6 Average predictive performance for distinguishing LUAD from LUSC in TCGA-NSCLC-ST, evaluated across all validation folds for each edge type model
As was performed for the previous datasets, we evaluated and identified edge pairs with the highest attention weights for LUAD and LUSC across all three models (Table 7). From our results, we observe that there is uniqueness in the set of high attention edges identified across the models.
Table 7 Top three highly attended edges for predicting LUAD or LUSC in TCGA-NSCLC-ST
In the G-cross AUC model, we observe that the (JAK-STAT, PI3K), (JAK-STAT, Hypoxia) and (JAK-STAT, EGFR) were identified as key pairs for both LUAD and LUSC. In LUAD, both JAK-STAT and EGFR pathways are heavily involved in tumor remodeling, as both pathways have been implicated in promoting PD-L1 expression in various cells in the TME, promoting immune escape and tumor progression in the lung47,48,49. In LUSC, observing regions of JAK-STAT and Hypoxia upregulation in close proximity is not a surprising finding, as both pathways have a shared impact on immune suppression and increasing tumor plasticity in the TME. The hypoxia pathway is heavily involved in tumor plasticity through metabolic reprogramming and cancer cell proliferation50,51.
In the G-cross function model, we observe that the (Hypoxia, EGFR) and (TRAIL, Hypoxia) edges were identified as the top attended edges for LUAD and LUSC, respectively. EGFR mutations have been identified as a critical factor influencing LUAD malignancies52. Their long-term moderate interactions with hypoxic regions in the TME have been associated with resistance to EGFR-inhibitor-based therapies53. Integrating clinical context into the workflow can help confirm if these interactions are elevated in patients non-responsive to these therapeutic regimens for further research. On the other hand, regions having high TRAIL pathway scores seem to correlate strongly with high hypoxia activity in LUSC. A hypoxic environment is a classic marker of the worsening tumor environment, with specific hypoxia-promoting markers being induced by the tumor cells increasing aggressiveness and progression to tumor malignancy, specifically in LUSC54. The presence of other death ligand pathways in the surrounding environment, such as TRAIL, induces increased Interleukin-8 (IL-8) expression, which is a marker of cancer progression55.
In contrast, the edge pairs between spots with elevated (Estrogen, TGFb) and (Hypoxia, JAK-STAT) pathways are identified as highly attended to in LUAD and LUSC, respectively for the mGWR function-based model. The proximity of regions with enriched Estrogen and TGFb pathways in LUAD is consistent with literature, as both are involved in promoting cell invasion and metastasis in the microenvironment56. A similar observation can be made about the heightened interactions of regions exhibiting Hypoxia and JAK-STAT pathways in LUSC, both of which are key pathways involved in immune modulation in the TME. Many studies demonstrate a potential cooperative relationship in animal models and other tissue between the hypoxic environment and JAK-STAT activation57. A key point to note is that the Estrogen pathway-associated edges are highly attended to in LUAD, which is supported by earlier studies demonstrating the upregulation of genes and associated pathways in LUAD and other cancer types58,59.
In conclusion, we observe that both the G-cross function and mGWR function based models highlight key features and pathways that are strongly associated with the cancer subtypes and progression in literature, and how regions of high pathway activity interact with each other. Concurrently, our methodology also critically highlights common pathology-specific driver pathways (such as Estrogen and JAK-STAT) across multiple viewpoints of querying both real and simulated biological data. However, due to the imputed nature of the dataset, further validation on a real ST cohort is necessary to establish the full validity of the interactions.
Validation of SPIFEE-nominated pathway interactions on matched bulk transcriptomics and real ST data
In order to validate the findings from our H&E ST pathway analysis, we adopted two main strategies: (1) we performed pathway correlation analysis on associated bulk TCGA-NSCLC transcriptomics and identified pathways pairs that were highly correlated within LUAD and LUSC, and (2) we replicated the pathway analysis on an independent 10x Genomics Visium ST LUAD and LUSC dataset (n = 20) from De Zuani et al.60. Using the G-cross function, we then identified overlap between top pathway pairs nominated by our method and found in the real ST data.
First, we performed pathway analysis separately on the associated LUAD and LUSC bulk trascriptomics samples using PROGENy, and computed pathway enrichment z-scores for all 14 human pathway signatures in the database61. We then performed correlation testing between pathway pairs within the bulk data. Spearman correlation with a p-value threshold p < 0.05 with Benjamini-Hochberg FDR correction was used to compute pairwise correlations across samples for all pathways. Finally, the top 20 pathway correlation pairs with the lowest FDR scores were selected for comparison against the high attention edge pairs identified from the SPIFEE H&E ST models. The results are presented in Supplementary Tables 6 and 7, where the bolded pathway pairs are those that overlapped with our SPIFEE findings in Table 7. For example, we observed that TRAIL–EGFR pairs are prevalent across both LUAD and LUSC groups in both the bulk data and from the DeepSpot-based results from SPIFEE. This indicates that the SPIFEE top edges identified are not purely spurious, but are identifying biologically relevant signals.
Secondly, we also modeled spatial interactions between regions exhibiting high pathway activity on the real ST dataset (n = 20, nLUAD = 12, nLUSC = 8). Similar to the analysis we performed on our virtual stained ST dataset, PROGENy pathway analysis was performed, and pathways were assigned to each spot according to highest enrichment61. Following this, spatial proximity analysis was done using the G-cross AUC framework previously used in our edge generation process to identify the top interacting pathways in the LUAD and LUSC tissue samples (at a distance of 150 microns). The top ten pathway interactions identified from the real ST dataset, and their corresponding G-cross AUC scores are presented in Supplementary Table 4. We found that edges nominated by SPIFEE overlapped with some of the top edges identified in the real ST dataset, again demonstrating biological significance of our framework.
Notably, JAK-STAT self-interactions and Hypoxia self-interactions show strong overlap in both LUAD and LUSC between model attention patterns and pathway importance scores. Particular, in the LUSC samples where spots with high PI3K, Hypoxia, TGFb, JAK-STAT, and TRAIL tend to co-localize near each other with high proximity scores. When querying interactions between spots enriched with differing pathways, we observed TRAIL and PI3K spots near each other. This interaction was observed as highly enriched for predicting LUAD in the G-cross function based model, which is encouraging even after accounting for potential batch effects such as acquisition modality, tumor stage, and treatment variables.

