S2C2 database retrieval and processing
The L–R database was developed following the approach outlined in our previous study,13 with updated information incorporated from KEGG (updated as of Jan 2024). Signaling and gene interaction data were sourced from KEGG and Ingenuity Pathway Analysis (IPA, version 22.0.2). Self-binding interactions were excluded from the integrated network to create simplified, unidirectional pathway networks. Pathways with identical or similar names from multiple sources were retained and labeled according to their originating database. These combined resources were then merged into a comprehensive gene signaling database to support S2C2 modeling. Each pathway in the S2C2 database was transformed into a graph, \(G=(V,E,W)\), where the node set V represents the genes in the pathway, the edge set E denotes the interactions among these nodes, and W indicates the context-specific weight of each edge based on the nature of the interaction as defined by the gene signaling database. Positive interactions, such as activation, causation, binding, phosphorylation, and expression, were assigned a positive weight (+1). In contrast, inhibitory interactions, including inhibition, deactivation, dephosphorylation, and dissociation, were assigned a negative weight (−1). For interactions categorized as indirect or unspecified, a neutral weight (+0.5) was applied.
Preprocessing AD single-nucleus RNA sequencing (snRNA-seq) data from Tsai’s dataset35
The raw cell counts were obtained from Synapse (syn18485175) and preprocessed according to the protocols outlined in the original paper, utilizing the Seurat package.93 Differentially expressed genes (DEGs) were identified using the FindMarkers function in Seurat. Four brain cell types were annotated: astrocytes, microglia, inhibitory neurons, and excitatory neurons.
Preprocessing AD snRNA-seq data from Swarup’s dataset46
The dataset was accessed via Synapse (syn22079621), and the preprocessing steps provided by the original authors were followed using code from their GitHub repository (https://github.com/swaruplab/Single-nuclei-epigenomic-and-transcriptomic-landscape-in-Alzheimer-disease/), resulting in a fully processed Seurat object. For comparison, DEGs between the control group and AD patients were identified for key cell types (astrocytes, microglia, inhibitory neurons, and excitatory neurons) using the FindMarkers function in Seurat.
Preprocessing AD snRNA-seq data from the Kampmann dataset47
Data were retrieved from Synapse (syn21788402), focusing on two brain regions: the superior frontal gyrus (SFG) and the entorhinal cortex (EC). Each region was treated as a separate dataset, labeled Leng_EC and Leng_SFG. All the data were processed using the code provided by the original authors, generating two distinct Seurat objects. AD pathology was classified based on Braak staging, where a Braak stage of zero indicates early AD pathology and a stage of six indicates moderate to severe pathology. For comparative analysis, DEGs between Braak stages zero and six were identified using Seurat’s FindMarkers function.
Preprocessing of TICAtlas single-cell RNA sequencing (scRNA-seq) data
Processed RDS files for the TICAtlas were obtained from Zenodo (https://doi.org/10.5281/zenodo.4263972) as described by Nieto et al.36 For this study, we specifically extracted and analyzed data from breast cancer and NSCLC cohorts. Analysis focused on characterizing cellular crosstalk within the immune microenvironment; specifically, we investigated interactions between M2 tumor-associated macrophages (M2 TAMs) and natural killer (NK) cells in breast cancer and interactions between M2 TAMs and conventional dendritic cells (cDCs) in NSCLC.
Single-cell RNA-seq data analysis of bone metastasis
Bone marrow single-cell gene expression data were processed using Cell Ranger 7.1.0 software, with alignment to the mouse genome reference, refdata-gex-mm10-2020-A. Quality control was conducted via the Seurat pipeline, excluding cells with more than 2500 or fewer than 200 unique features, as well as those with over 5% mitochondrial content. Data normalization was performed using the global-scaling LogNormalize method, which adjusts the expression levels of each feature by dividing by the total expression for each cell, multiplying by a scale factor (default 10,000), and applying a log-transformation. A linear transformation (scaling) was subsequently applied, followed by dimensionality reduction using principal component analysis (PCA). Cell clustering was achieved through Seurat’s FindClusters function, utilizing a graph-based approach to group cells. Nonlinear dimensional reduction was conducted with UMAP to visualize clusters. DEGs for each cluster were identified using the FindMarkers function, and cell types were annotated based on known markers: erythroid cells (Tfrc, Gypa, Gata1, Klf1, Hbb-bs, Hba-a1/a2), MSCs (Nt5e, Thy1, Ly6a), T cells (Cd4, Cd3e), B cells (Cd19), endothelial cells (Cd34, Tek, Kdr), myeloid cells (Cd33, Itgam, Ly6g), and macrophages (Adgre1, Csf1r, Cd68).
Spatial transcriptome data analysis of bone metastasis
Spatial transcriptomics data from the early bone-metastasis mouse model induced by Lewis carcinoma cells (LLC1-GFP) were generated using the 10x Visium platform. FASTQ files from four samples were aligned to the mouse refdata-gex-mm10-2020-A reference genome using Space Ranger v2.1.0. Initial preprocessing of the spot-by-gene expression data was performed with Seurat, and normalization was applied to account for variance in sequencing depth across data points. The four samples were then integrated into a single dataset using Seurat’s FindIntegrationAnchors function, where pairs of similar cells (anchors) were identified across datasets through reciprocal PCA and mutual nearest neighbors methodologies. These anchors enabled the alignment and integration of the datasets, mitigating batch effects while preserving shared biological structures. Dimensionality reduction and clustering were carried out using the same approaches as described for single-cell data analysis.
Multimodality data integration of bone metastasis
To identify tumor regions in the spatial transcriptomic tissue slices, immunofluorescence (IF) images with GFP signals (LLC1-GFP), marking tumor cells, were generated from tissue slices adjacent to those used for spatial transcriptomics. Tumor-enriched regions in the IF images were aligned with the spatial transcriptomics (ST) images using ImageJ. To locate stroma- and immune-enriched areas, scRNA-seq data were integrated with the ST data using the anchor-based integration workflow from Seurat v3, enabling probabilistic transfer of annotations from a reference dataset to a query set. This process facilitated a probabilistic classification of cell types for each ST spot according to scRNA-seq-derived cell type identities.
Spatial measurement of cell‒cell distance/clustering in ST data using Ripley’s K function
Analyzing all pairwise cell‒cell communications demands substantial computational time and resources, especially for spatial transcriptomic data where dozens of cell groups may be identified on each tissue slide, potentially resulting in hundreds of cell‒cell interaction pairs to predict cell communication. However, autocrine and paracrine communications typically occur among cells within approximately 200 μm.94 Thus, running crosstalk algorithms on all possible cell pairs is unnecessary and inefficient. To address this, a bivariate K function was introduced to assess the spatial relationships between stromal and tumor cell groups before applying S2C2 to spatial transcriptomics data. The K function is defined as:
$${K}_{i,j}\left(r\right)=\frac{1}{{D}_{i}{D}_{j}A}{\sum }_{k}{\sum }_{l}I\left({d}_{{i}_{k},{j}_{l}} < \,r\right)$$
where:
-
\(I[\bullet ]\) denotes the indicator function, which equals 1 if the condition is true and 0 otherwise.
-
\({D}_{i}\) and \({D}_{j}\) represent the densities of cell group i and cell group j in each ST sample.
-
\({d}_{{i}_{k},{j}_{l}}\) is the distance between the kth cell of group i and the lth cell of group j.
-
A refers to the area of the tissue region in each ST sample.
-
r is the selected radius (default is 200 µm), representing the potential L–R interaction range for local paracrine pathways when calculating the bivariate K function.
A higher K score indicates greater clustering of the two groups of cells within the specified radius in each sample. To assess whether the two cell types are significantly clustered across all samples, a normalized K score is calculated for each sample and subsequently averaged over all ST samples. Monte Carlo simulations are performed to determine the upper and lower bounds of the 95% confidence interval for the K values. The normalized K score is then defined as: \({K}_{i{,}j}^{norm}=\frac{2{K}_{i,j}-{K}_{i,j}^{upper}-{K}_{i,j}^{lower}}{{K}_{i,j}^{upper}-{K}_{i,j}^{lower}}\)
where:
-
\({K}_{i,j}^{upper}\) and \({K}_{i,j}^{lower}\) are the upper and lower bounds of the 95% confidence interval obtained from the Monte Carlo simulations.
-
\({K}_{i,j}\) is the observed K score between cell groups i and j.
This normalization enables direct comparisons across multiple samples by ensuring a consistent range of variation in spatial clustering. A normalized K score greater than 1 indicates significant clustering between groups i and j, while a score less than −1 indicates significant dispersion between the groups. Scores between −1 and 1 suggest nonsignificance. Only cell groups with an average normalized K score greater than 1 across all samples are recommended for spatial crosstalk analysis.
Preparation of S2C2 input
A standard Seurat object in R, containing gene expression data in the assay slot and cell type annotations in the metadata slot, was used as input for S2C2. This object is generated through conventional workflows applicable to both single-cell and spatial transcriptomics data, including data normalization, PCA-based dimensionality reduction, clustering, and annotation. To compare different conditions, sample condition information, such as AD vs. Control in the AD crosstalk use case, must be included in the metadata slot. If this information is absent, comparisons are performed within the same condition, as shown in the bone metastasis use case. For both the Java-based GUI and the CLI, an RDS file saved using the Seurat saveRDS function can be used as input.
Prediction of ligand–receptor (L–R) interactions with S2C2
L–R interactions between sender and receiver cells were determined based on their expression patterns, characterized by fold change (FC) and the percentage of expressed cells (PE) in each cell group. Ligands were filtered using default thresholds: an average log2-fold change greater than 0.2, expression in more than 0.5% of the sender cell population, and a p value below 0.05. Receptors were selected if they were expressed in more than 0.5% of the receiver cell group, as receptor upregulation is not required to activate downstream pathways. An enrichment score (ES) was calculated for each L–R pair to further refine the interactions. The ES is defined as:
$$ES\left({C}_{s},{C}_{r},L,R\right)=mean[{P{E}_{L}^{{C}_{s}}\times \log (FC}_{L}^{{C}_{s}})+{P{E}_{{\rm{R}}}^{{{\rm{C}}}_{{\rm{r}}}}\times log(FC}_{R}^{{C}_{r}})]$$
where:
-
\(F{C}_{L}^{{C}_{s}}\) and \(P{E}_{L}^{{C}_{s}}\) denote the average fold change and the percentage of expressed cells for ligand \(L\) in the specified sender cell \({C}_{s}\).
-
\(F{C}_{R}^{{C}_{r}}\)g and \(P{E}_{R}^{{C}_{r}}\) denote the corresponding values for receptor \(R\) in receiver cell \({C}_{r}\).
When different sample conditions are provided, the fold change is calculated by comparing the mean gene expression of the cell groups across conditions (e.g., AD vs. control). If no sample conditions are specified, the fold change is calculated by comparing the value across different cell groups (e.g., tumor vs. nontumor). The enrichment score represents the strength of the interaction between the ligand and receptor. A higher enrichment score indicates that both ligand and receptor exhibit larger fold changes and are expressed in more cells. Only L–R pairs with a positive enrichment score were predicted as potential intercellular communication candidates and exported to the “LR_pairs.txt” output file. Additionally, the significance of the enrichment score for the (L, R) pair from sender cell group \({C}_{s}\) to receiver cell group \({C}_{r}\) was calculated to measure the overall contribution of the (\({C}_{s},{C}_{r}\)) cell pairs to the (L, R) interaction among all possible cell pairs. This significance score is defined as:
$$E{S}_{sig}\left({C}_{s},{C}_{r},L,R\right)=\frac{{\sum }_{i,j}I[ES\left({C}_{{i}},{C}_{{j}},L,R\right)\ge ES\left({C}_{{s}},{C}_{{r}},L,R\right)]}{{\sum }_{i,j}1}\in (0,1]$$
where:
-
\(I[\bullet ]\) is the indicator function.
-
\(i,j\in \{1,2,\ldots ,k\}\) represents the set of all k cell types in the dataset.
A lower value of \({{ES}}_{{sig}}\) indicates a more significant contribution of the cell type pair (\({C}_{s},{C}_{r}\)) to the \((L,R)\) interaction among all possible cell pairs. This specificity score is included in the “LR_pairs.txt” output file for reference.
Prediction of activated pathway branches with S2C2
Tens of L–R interactions are typically included within each established pathway in KEGG or IPA, followed by various signal flow paths triggered by each L–R interaction, potentially involving hundreds of genes contributing to each pathway. Therefore, a single activation score for each pathway is not considered insightful, as different activation patterns regulating various biological functions of cells are encompassed within the same comprehensive pathway. To specify which specific downstream flows are triggered by each L–R interaction, the concept of pathway branches is introduced. Each pathway’s branches are defined as the subgraph of its pathway graph \(G=(V,E,W)\), denoted by \(P=({V}_{p},{E}_{p},{W}_{p})\),where:
-
\({V}_{p}=\left\{{v}_{0},{v}_{1},\ldots ,{v}_{m}\right\}\subseteq V\) denotes the node set of subgraph P.
-
\({E}_{p}=\left\{{e}_{0},{e}_{1},\ldots ,{e}_{n}\right\}\subseteq E\) denotes the edge set of subgraph P.
-
\({W}_{p}:{E}_{p}\to \{1,-\mathrm{1,0.5}\}\subseteq W\) denotes the weight of \({E}_{p}\) in subgraph P.
-
\(\text{deg}({{\rm{v}}}_{{\rm{i}}})=2\,\mathrm{for}\,{\rm{i}}\,\mathrm{in}\,\left\{1,2,\ldots ,{\rm{m}}-1\right\}\,\mathrm{or}\,1\,\mathrm{for}\,{\rm{i}}\,\mathrm{in}\,\left\{0,{\rm{m}}\right\}\) denotes the node degree, reflecting the number of edges linked to the node.
-
\(\text{deg}\,({{\rm{e}}}_{{\rm{j}}})=2\,{\rm{for}}\,{\rm{j}}\,{\rm{in}}\,\{0,1,\ldots ,{\rm{n}}\}\) denotes the edge degree, reflecting the number of nodes connected to the edge.
Here, \({v}_{0}=L,{v}_{1}=R\) represents the ligand and receptor pair with an enrichment score greater than 0, while \({v}_{m}\) represents the transcription factor or leaf node, which has no outgoing edges as defined in KEGG and IPA. The set of {\({v}_{2},{v}_{3},\ldots ,{v}_{m-1}\)} denotes all signaling genes linking the (L, R) pair to transcription factors. Note that \(n=m-1\), indicating the property of a graph path. The pathway branches are generated by linking (L, R) to each transcription factor using the shortest_paths function in igraph (R package version 2.0.3, https://CRAN.R-project.org/package=igraph). Spearman correlation scores for genes along the pathway branch were calculated to assess the direct correlation between consecutive genes on each edge of the pathway branch. Finally, a PAS for pathway branch P triggered by interaction of (L, R) is calculated as:
$$PAS(L,R,P)={\rm{\lambda }}\left(logF{C}_{{L}}+logF{C}_{{R}}\right)\times {w}_{L,R}+\left(1-\lambda \right){\sum }_{i=1}^{m-1}{w}_{{e}_{i}}\times cor\left({v}_{{i}},{v}_{i+1}\right)\times logF{C}_{{{v}}_{{i}}}$$
where:
-
\({\mathrm{FC}}_{L}\), \({{FC}}_{R}\) and \({{FC}}_{{v}_{i}}\) are the fold changes in ligand, receptor, and downstream branch genes \({v}_{i}\).
-
\({w}_{{e}_{i}}\in \{-\mathrm{1,1,0.5}\}\) is the edge weight in pathway branch P.
-
\({w}_{L,R}=\frac{{\sum }_{i=1}^{m-1}{w}_{{e}_{i}}* {cor}\left({v}_{i},{v}_{i+1}\right)}{n}\) denotes the weight of L–R interactions. Note that this differs from \({w}_{{e}_{0}}\), which equals 1.
-
\(\lambda\) is a hyperparameter that controls the balance between the influence of the L–R pair and their downstream pathway branches on cell communication signaling activation. When λ is set to 0.5 (default), the L–R pair and the downstream pathway genes contribute equally to the PAS. A \(\lambda\) value of 0 indicates that the score is based entirely on downstream pathway genes, ignoring the L–R influence. A value of 1 makes the score depend solely on the L–R pair.
To identify significant pathway branches, a permutation test is performed by S2C2 for each pathway branch j over n iterations. In the ith iteration, random genes are sampled to replace the pathway genes, and the pathway activity score \({\mathrm{PAS}}_{i}^{(j)}\) is calculated, with the default \(n=500\) iterations. An empirical distribution of \({\mathrm{PAS}}^{(j)}\) is formed from these 500 scores. This process is repeated for each pathway branch, and the p value for pathway branch j is calculated by:
$${p}^{j}=\frac{{\sum }_{i}I({\mathrm{PAS}}_{i}^{(j)} > {\mathrm{PAS}}^{(j)})}{n}.$$
A smaller p value indicates higher significance in pathway alterations.
Prioritization of ligands and terminal nodes
To facilitate efficient experimental validation of cell communication predictions, the predicted ligands, pathway branches, and terminal nodes, such as transcription factors or their targets, need to be ranked. Beyond enrichment scores, ligands can also be ranked by the number of associated downstream activated pathway branches and downstream genes. Higher-ranked ligands may indicate that more signaling pathways are triggered, suggesting the regulation of various cellular biological processes. Similarly, transcription factors can be ranked, with higher-ranked factors reflecting converged signaling transductions from diverse L–R interactions.
Gene coverage enhancement to expand the 1k gene list to an 18k gene list
A random selection of 1000 genes was drawn from the original 18,000-gene list in the Tsai dataset.35 Subsets containing these 1000 genes were generated for both the Tsai and Swarup46 datasets. The H5ad files for these subsets, as well as the original datasets, were loaded using the Scanpy package in Python. Data normalization was performed using total count normalization, followed by log normalization. The data were then embedded using the tasks.embed_data function from scGPT.37
Two deep learning models, MappingNetwork and DecoderNetwork, were developed to learn from the embedding space. In MappingNetwork, the scGPT embeddings from the 1000-gene dataset were taken as input and passed through a multilayer perceptron (MLP) with four layers to predict the scGPT embeddings from the original 18,000-gene dataset. The structure for each layer \(i\in \{\mathrm{1,2,3}\}\) is
$${z}_{i}={W}_{i}{h}_{i-1}+{b}_{i}\left(Linear\,Transformation\right)$$
$${a}_{i}=\sigma \left({z}_{i}\right)(Activation\,Function)$$
$${\widetilde{a}}_{i}=Batch\,Norm\left({a}_{{i}}\right)(Batch\,Normalization)$$
$${h}_{i}=Dropout\left({\widetilde{{\rm{a}}}}_{{\rm{i}}}\right)(Dropout)$$
where:
-
\({h}_{0}={E}_{1000}\) represents the embeddings of 1000 genes by scGPT.
-
\({{\widetilde{E}}_{18,000}=h}_{4}=\) \({W}_{4}{h}_{3}+{b}_{4}\) represents the predicted embeddings of 18,000 genes by scGPT.
In DecoderNetwork, the output from MappingNetwork, along with the original 1000-gene expression profile, was taken as input and passed through an MLP with five layers to decode the embeddings back to the original gene expression for the 18,000 genes. The structure for each layer \(i\in \left\{\mathrm{1,2,3,4}\right\}\) is
$${z}_{i}^{{\prime} }={W}_{i}^{{\prime} }{h}_{i-1}^{{\prime} }+{b}_{i}^{{\prime} }\left(Linear\,Transformation\right)$$
$${a}_{i}^{{\prime} }=\sigma \left({z}_{i}^{{\prime} }\right)(Activation\,Function)$$
$${h}_{i}^{{\prime} }=Dropout\left({a}_{i}^{{\prime} }\right)(Dropout)$$
where:
-
\({h}_{0}^{{\prime} }=\mathrm{Concat}({G}_{1000},{\widetilde{E}}_{18,000})\) represents the concatenated vector of the original gene expression values for 1000 genes and the predicted embeddings from the MappingNetwork.
-
\({{\widetilde{G}}_{18,000}=h}_{5}^{{\prime} }={W}_{5}^{{\prime} }{h}_{4}^{{\prime} }+{b}_{5}^{{\prime} }\) represents the predicted gene expression value of the 18,000 genes.
The number of layers for both the MappingNetwork and DecoderNetwork was determined based on the dimensions of the input and target outputs. The MappingNetwork consists of four layers, where the number of neurons doubles in each layer, starting from a 512-dimensional input, increasing to 1024, then 2048, before scaling back down to 1024, and finally to 512. In contrast, the DecoderNetwork uses five layers to progressively upscale the combined input (512 + 1000 = 1512 dimensions), doubling the neurons in each layer and ultimately producing an 18K-dimensional output. The LeakyReLU activation function, with a slope of 0.01, was applied in the MappingNetwork to prevent neuron inactivity (dead neurons). In contrast, the simpler and more efficient ReLU activation function was used in the DecoderNetwork, as the model’s overall capacity could tolerate a few inactive neurons without significantly impacting learning. \(\mathrm{BatchNorm}\) was not applied to the DecoderNetwork due to the skewness in the target gene expression data. Both models were trained using mean squared error loss and optimized with the Adam optimizer at a learning rate of 0.001. Training was conducted on a cloud GPU server equipped with two RTX A5000 GPUs connected via NVLink. The Tsai dataset was split into 80% for training and 20% for validation, with a batch size of 64. The Swarup dataset was used for independent testing. Both models were trained for 50 epochs.
Model parameter sensitivity analysis
The hyperparameters of S2C2 were tested to evaluate how the predicted L–R interactions and activated pathway branches were affected. λ values ranging from 0 to 1 were tested in increments of 0.05, with the number of downstream pathway branches being used as the performance metric. Log-fold change thresholds between 0.10 and 0.50 were tested in increments of 0.05. As the log-fold change threshold was increased, both the number of L–R pairs and the number of significant pathway branches decreased. Percent expression threshold values ranging from 0.5 to 5% were tested in increments of 0.5%. When higher percent expression thresholds were applied, a decrease in both the number of L–R pairs and the significant pathway branches was observed.
In silico prediction and validation with pathway perturbation using S2C2. In silico experiments were conducted to evaluate the reliability of S2C2 in pathway prediction by manipulating the expression levels of L–R pairs and downstream genes (Supplementary Figs. 11 and 12). For the MAPK and NFκB signaling pathways, which were not identified in the original S2C2 predictions, a manual amplification of a log(1.5) increase in gene expression was applied to the ligands and receptors in these pathways, respectively. Conversely, for the PI3K–Akt and RAS pathways, which had been identified by S2C2, manual reductions were introduced to the ligands and receptors. S2C2 was then applied to the modified datasets, and the number of identified L–R pairs, significant downstream branches, unique genes within those branches, and PASs were compared with the original predictions. Furthermore, 15 downstream genes in the PI3K–AKT pathway, excluding ligands and receptors, were inhibited, and similar comparative analyses were performed to assess whether S2C2 is sensitive to downstream gene changes.
Following each perturbation, the modified expression matrix was processed using S2C2. The outputs were compared to the original, unperturbed results by quantifying (i) the number of significant L–R pairs, (ii) the number of significant downstream branches, (iii) the count of unique downstream genes, and (iv) the change in PAS. To evaluate the performance of S2C2 relative to other cell‒cell communication tools, LIANA+ and NicheNet were applied to two simulated datasets: one with simulated MAPK upregulation and another with RAS downregulation. For each tool, the number of predicted L–R pairs specific to the perturbed pathway was quantified.
Multicell cascading crosstalk prediction using S2C2
Signaling cascades were inferred by S2C2 to trace communication from microglia to astrocytes to excitatory neurons (Mic→Ast→Exc) and from astrocytes to microglia to excitatory neurons (Ast→Mic→Exc). Transcription factors were extracted from the Mic→Ast/Ast→Mic pathways, and their downstream targets were identified using TF–target interaction databases. Ligands that overlapped with target genes were traced to excitatory neuron pathways and annotated with PAS and p values. A signaling axis was revealed in which astrocytic transcription factors, including FOS and JUN, regulated ligands such as VEGFB, SPP1, and VEGFA. These ligands were found to activate downstream cascades in excitatory neurons, particularly those involved in focal adhesion signaling. Notably, FOS and JUN, key regulators in our Mic→Ast signaling cascade, were central to the ERK-FOS inflammatory axis implicated in Alzheimer’s pathology.
Implementation of S2C2
To support customized implementation of S2C2, three different approaches have been developed: a GUI, CLI, and R script. The S2C2 GUI is Java-based, ensuring cross-platform compatibility across Windows, macOS, and Linux, with built-in visualization capabilities. The CLI version is designed for cloud server environments, while the R script provides a fast and efficient option tailored for bioinformaticians working with single-cell and spatial transcriptomics (ST) data in R. These options make S2C2 a versatile solution for users on various operating systems.
Parallel computing is supported by S2C2, allowing multiple calculations for selected cell type pairs to be executed efficiently. A detailed tutorial, accessible through the provided GitHub link (https://github.com/methodistsmab/S2C2/), has been created with instructions for installing S2C2 on Windows, macOS, and Linux. System requirements have been outlined, recommending at least 8 GB of RAM, with 16 GB preferable for handling large-scale single-cell and ST data, as well as a multicore processor with at least 2.5 GHz per core. For processing multiple cell‒cell crosstalk pairs, four or more cores are advised to significantly enhance performance.
In the tutorial, guidance is provided on utilizing key S2C2 features, such as generating L–R chord diagrams and identifying significant branches. Other functionalities, including project management, progress monitoring, drug discovery tools, highlighting pathways and sub-pathways, organizing network layouts, and controlling network edge visibility and properties, are also explained. Additionally, network node detection is supported. Essential files are generated, which can be directly used with other tools, such as Cytoscape for visualization and Connectivity Map for drug repositioning.
Design and implementation of prompting strategies
Four distinct prompting strategies were implemented using the web-based version of GPT-4o: (1) zero-shot prompting, (2) zero-shot chain-of-thought (CoT) prompting, (3) few-shot prompting, and (4) few-shot CoT prompting.
-
In zero-shot prompting, GPT-°was provided with simple, direct instructions that conveyed the essential information about the input file types and formats, without the inclusion of illustrative examples.
-
In few-shot prompting, curated examples of validated cell–cell interactions—drawn from peer-reviewed studies across diverse disease contexts—were presented as reference points to enhance the model’s ability to generalize and predict novel interactions grounded in established biological knowledge.
-
In CoT prompting, the model was systematically guided through a stepwise reasoning process. These steps included (i) identification of relevant L–R pairs, (ii) analysis of downstream signaling pathways linked to the identified pairs, and (iii) assessment of their biological relevance in specific disease contexts. The model was explicitly instructed to “complete the task above step-by-step” to encourage structured reasoning, transparency, and interpretability.
All prompting strategies were subjected to iterative refinement to improve clarity and precision. This refinement process was driven by the continuous evaluation of model outputs. Whenever ambiguous, incorrect, or incomplete responses were encountered, the prompts were revised to include more detailed task instructions, clearer definitions of biological concepts, and additional criteria necessary for accurate model inference. These iterative adjustments minimized ambiguity, enhanced interpretability, and improved the biological validity of the generated outputs.
Validation and interpretation of iS2C2 predictions
The prompt engineering strategies were evaluated based on three of the four robustness axes used to evaluate the iS2C2 framework, including accuracy, reproducibility, and interpretability, while excluding sensitivity to perturbation, as this metric applies only to evaluating S2C2.
-
Accuracy was assessed by calculating the proportion of L–R pairs and their associated downstream pathways predicted by GPT-°that overlapped with those derived from S2C2-based cell–cell interaction predictions.
-
Reproducibility was measured using the APRT, defined as the proportion of L–R pairs consistently identified in at least 80% (i.e., four out of five) of repeated runs.
-
Interpretability was evaluated through a blinded assessment conducted by six domain experts specializing in cell–cell interactions and disease biology. Confidence scores were provided under three sequential conditions: (1) relying solely on domain expertise; (2) supplementing expert judgment with GPT-4o–generated hypotheses contextualized by S2C2 predictions; and (3) further incorporating independently identified literature evidence.
Interpretability ratings were recorded using a five-point Likert scale, ranging from 1 (strong disagreement with disease relevance) to 5 (strong agreement).
Evaluation of large language models for S2C2
Three LLMs (Llama 3.2, Gemini 2.5 Pro, and KIMI K2) were evaluated, and their performance was compared to previously obtained results from ChatGPT. Two types of prompt strategies were employed: simple prompts, which provided direct instructions without examples, and enhanced prompts, which included a chain-of-thought approach with illustrative examples to guide model reasoning. Llama 3.2 was assessed only in the simple zero-shot configuration.
Significant L–R interactions generated from single-cell RNA sequencing analyses (LLM_significant_branches.csv) were supplied to the models, which were then tasked with predicting biologically relevant L–R pairs and their downstream signaling pathways. Gemini 2.5 Pro and KIMI K2 were accessed through their respective web-based chat interfaces, while Llama 3.2 was operated locally via the Msty desktop application.
Model performance was assessed using two primary metrics. Accuracy was defined as the proportion of correctly predicted L–R pairs and downstream pathways. The APRT was calculated as the proportion of L–R pairs appearing in at least 80% of repeated experiments. The results from Gemini 2.5 Pro, KIMI K2, and Llama 3.2 were benchmarked against ChatGPT outcomes obtained under four conditions: simple zero-shot, simple few-shot, enhanced zero-shot, and enhanced few-shot.
Wet lab validation of S2C2 predictions
To validate the predictions of the cointelligent framework related to AD (Fig. 4b), experimental approaches were employed to assess the consistency between in silico predictions and pathological outcomes. The interactions between astrocytes and excitatory neurons were examined using in vitro models, with gene knockdown perturbations, phenotypic observations, and the assessment of proinflammatory markers. For further details, please refer to the subsequent sections in the Methods.
Astrocyte culture
Human astrocyte cell lines (ACBRI 376) were purchased from Cell Systems (ACBRI 376). Upon arrival, the cell line was stored in liquid nitrogen vapor. One day later, the cells were thawed and cultured promptly to maintain optimal viability. Culturing was performed in ATCC-formulated Dulbecco’s Modified Eagle’s Medium (Catalog No. 30-2002) supplemented with 10% FBS, 1% N-2 Plus Media Supplement (AR003, R&D Systems), and 1% penicillin‒streptomycin (Lonza, Switzerland). After 3 days of incubation, the medium was removed, and the flasks were washed three times with D-PBS. Cells were then trypsinized with 0.05% trypsin-EDTA (Gibco, MA, USA) for 5 min at 37 °C. The enzymatic reaction was halted by adding ten volumes of DMEM, and the cells were centrifuged at 2000 × g for 5 min at room temperature. The cell pellet was resuspended in 12-well plates with astrocyte media as needed. Cells were maintained at 37 °C with 5% CO2 for 2 days before being subjected to shRNA treatment in Opti-MEM® medium. One day after transfection, the Opti-MEM® medium was replaced with fresh astrocyte media for primary astrocyte collection or Dulbecco’s modified Eagle’s medium (DMEM) (Gibco, MA, USA) with high glucose containing 1% penicillin‒streptomycin and 10% FBS for coculture with neuronal cells.
Neural progenitor cell line culture
ReNCell VM cells (EMD Millipore, SCC008) were obtained from EMD Millipore, USA. As previously reported, ReNCell VM cells were seeded into flasks treated with Cell Basement Membrane (ATCC, ACS-3035) at a density of 4 × 103 cells/well (1.25 × 104 cells/cm²) using ReNcell® NSC Maintenance Medium (Millipore Cat. No. SCM005) with a Growth Kit (ATCC® ACS-3003™) for neural progenitor cell expansion. The cells were maintained until reaching 90–95% confluency and then washed three times with D-PBS. Subsequently, 0.5 ml of Accutase (Millipore Cat. No. SCR005) was added, and the cells were incubated in a 37 °C and 5% CO2 incubator for 3–5 min. After incubation, 3 ml of prewarmed ReN proliferation medium was added, and the cell suspensions were transferred to a 15 ml sterile conical tube for centrifugation at 2000 × g for 3 min. After removing the supernatant, the cell pellet was resuspended in 10 ml of ReN proliferation medium. Progenitor cells were induced to differentiate by removing the growth factors bFGF and EGF from the tissue culture medium for 4-7 days. The differentiated neurons were then maintained at 37 °C and 5% CO2 for subsequent analysis or for coculture with astrocytes and neurons.
Target gene knockdown with shRNA
Short hairpin RNAs (shRNAs) were utilized to selectively knock down genes coding for ligands and/or receptors in astrocytes and neurons, respectively. The shRNA knockdown lentiviral vectors were obtained from Vectorbuilder, with sequences provided in Supplementary Table 4. Cells were transiently transfected with the specified lentivirus using Lipofectamine LTX in combination with Plus Reagent (Thermo Fisher, Catalog number: 15338030). Target gene expression was monitored 2–3 days post-transfection. After optimizing the dosage and treatment duration of the shRNA, the shRNAs were introduced into the indicated cells to achieve gene knockdown.
Astrocyte-neuron coculture
The culture medium from the astrocytes, with or without target shRNA pretreatment, was collected to incubate the well-differentiated neurons. All ligands secreted by astrocytes were present in the medium, mimicking the L–R crosstalk between astrocytes and neurons in the coculture model.
Quantitative PCR (qPCR) analysis
To determine the target gene expression levels upon shRNA treatment, total RNAs were extracted using the RNeasy Plus Mini Kit (Qiagen, 74134) following the manufacturer’s instructions. The quality of the RNA was evaluated using a NanoDrop 2000, and cDNA synthesis of each RNA sample was performed using the Superscript VILO cDNA Synthesis Kit (Thermo Scientific, Cat. 11754050) according to the manufacturer’s instructions. Gene expression levels were quantified via qPCR using Perfecta® SYBR® Green FastMix® Reaction Mixes (QuantaBio, Reactions per Kit = 5000, Cat. 101414-280) with a LightCycler® 480 Instrument (Roche Sequencing & Life Science). The results were normalized to the β-ACTIN level and calculated using the 2−ΔΔCt method. Details of the primer sequences are provided in Supplementary Table 4. Each qPCR was independently performed in triplicate.
Bone metastasis models
All animal experiments were conducted in accordance with protocols approved by the Baylor College of Medicine Institutional Animal Care and Use Committee. The bone metastasis model was established by internal iliac artery injection as previously described.50 Briefly, mice were anesthetized and secured on a plastic board. An ~1.5 cm incision was made along the line between the femoral and iliac bones, followed by blunt dissection of the muscles to expose the common iliac artery. LLC1 cells suspended in 0.1 ml PBS were injected into the artery using a 31 G needle. Hemostasis was achieved by applying gentle pressure with a cotton tip until bleeding ceased (approximately 10 min). The incision was then sutured, and the animals were monitored until full recovery from anesthesia and restoration of normal movement. Tumor bone metastases were imaged using the IVIS system. Methods for the collection of spatial or single-cell data from mouse bone metastases have been previously reported.95
For in vivo treatment, tamoxifen was administered by intraperitoneal injection at a dosage of 5 mg/kg/day. For in vitro experiments, MCF-7 cells were treated with 100 nM tamoxifen for 24 h, followed by collection for qPCR analysis as described above.
Image processing
Images were acquired using a SLIDEVIEW VS200 research slide scanner (Olympus, USA). Consistent settings were uniformly applied to all images from the same experimental set. The analysis was conducted by a blinded investigator using the open-source ImageJ/Fiji software (NIH, Version 1.52a).
Comparison of existing cell‒cell communication algorithms
In this study, two widely used algorithms for single-cell data analysis, CellChat14 and LIANA+,17 were compared with S2C2 using a public AD dataset. The outputs of S2C2, NicheNet, and LIANA+ were systematically compared based on several quantitative metrics. First, the total number of significant L–R pairs identified by each algorithm was quantified. Second, the functionality for downstream prediction was compared between S2C2 and NicheNet by enumerating their predicted ligand-target links. Finally, the concordance among the algorithms was determined by calculating the intersection of their respective L–R and LT prediction sets. This allowed for an objective assessment of the consistency and divergence in outputs across the different computational methods.
Statistical analysis
GraphPad Prism 9 and/or 10 (GraphPad Software, CA, USA) was used for all statistical analyses. Data from individual experiments are presented as the mean ± standard deviation and were assessed using Student’s t-test, one-way ANOVA, or two-way ANOVA, followed by Tukey’s post hoc test for multiple comparisons. For comparisons involving more than two groups, paired one-way ANOVA was utilized, followed by Tukey’s post hoc test. Statistical significance was denoted as *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001.

