Human samples
All patients provided informed consent for the use of the tumor samples in this study. The protocol used for sample collection was approved by the local ethics committee (CER-VD, BASEC ID 2016-02094).
Datasets
Study design
Data were generated from 10 FFPE NSCLC samples and 17 FFPE breast cancer samples. Xenium spatial transcriptomics was performed on all samples using different gene panels, with the 5K panel specifically applied to lung samples. In addition, matched snRNA-seq and multiplexed IHC data were generated for selected samples, as detailed below and in Fig. 1.
Xenium
Xenium lung tissues were processed using three gene panels in parallel, the 10x Human Lung panel, a custom immuno-oncology panel (Custom IO; see Supplementary Table 2 for the gene list) and the 10x Prime 5K Human Pan Tissue & Pathways panel, allowing an investigation into how the choice of a gene panel may affect downstream analyses. For breast samples, we used the 10x Human Breast panel.
snRNA-seq
snRNA-seq data were obtained from FFPE blocks from eight patients, four per disease group, on the Chromium platform with NEXT-GEM FLEX chemistry. Details about this dataset and its methodology are available in the publication by Dong et al.2.
IHC
Two post-Xenium slides—one per disease—containing a total of five tissue sections were immunohistochemically stained for 4′,6-diamidino-2-phenylindole (DAPI; Spectral DAPI, AKOYA), CD8 (Cellmarque, Clone SP16, 108R) and pan-cytokeratin (panCK, Dako, Clone AE1/AE3, IS053) in an autostainer (ventana Discovery Ultra, Roche) using the Tyramide Signal Amplification system (OPAL, Akoya), allowing us to identify CD8+ T cells and malignant cells. Antigenicity was reduced in some samples, particularly those processed with the multimodal staining protocol used for Xenium segmentation, leading to failed IHC staining in certain cases. As a result, IHC yielded usable staining quality in only five samples.
Data preprocessing and QC
snRNA-seq
Filtered barcode counts from CellRanger were analyzed using the Seurat package33 (v. 5.0.1) in R (v. 4.3.2, https://www.R-project.org/). Cells with at least 200 detected genes and fewer than 20% of reads mapping to mitochondrial genes were retained. To recover neutrophils, the gene threshold was lowered to 100 genes per cell, and cells forming a distinct cluster with high expression of canonical neutrophil markers (that is, FCGR3B, S100A9, IL1R2, CSF3R, FPR1 and NAMPT) were retained.
Raw counts were normalized using SCTransform, and the top 3,000 variable genes across samples were selected using SelectIntegrationFeatures. Dimensionality reduction was done using principal component analysis (PCA). Clustering was done via Seurat’s shared nearest neighbor modularity optimization algorithm (FindNeighbors and FindClusters) using 30 principal components and resolutions between 0.4 and 0.8.
Clusters were annotated based on the top differentially expressed genes and canonical markers. Significant markers were identified with Seurat::FindAllMarkers() (two-sided Wilcoxon’s rank sum test, Bonferroni correction, adjusted P < 0.05, log fold change >1). Subclustering enabled finer annotation, resulting in four hierarchical annotation levels. Cell-type labels were standardized to match ontology naming conventions. This dataset was used as a matched reference for RCTD cell-type annotation of Xenium data.
Annotation Level 2.1 was added for visualizing cell types on Xenium. As an extension of the broader Level 2, it distinguishes CD8+ T cells and supports validation with IHC staining.
Xenium
10x segmentation
To facilitate segmentation, Xenium’s standard output includes a DAPI-stained image for nuclear localization along with the x, y and z coordinates of individual transcripts. Segmentation algorithms can leverage this information to assign transcripts to individual cells. Unless the multimodal segmentation kit is used, the default 10x segmentation algorithm expands from the nucleus outward—up to 5 µm or until it reaches the boundary of an adjacent cell.
In our study, the multimodal segmentation kit was applied only to the 5K panel. Therefore, we used the default 5-µm nuclear expansion-based segmentation for all other panels. In practice, segmentation results are identical for a substantial proportion of cells, as the multimodal method defaults to the 5-µm nuclear expansion when membrane staining is weak or absent. In either case, we consider these approaches to provide reasonable and consistent baselines for our analyses.
Alternative segmentations
Several alternative segmentation methods have been developed to improve upon the default 10x approach described above. In this study, we evaluated Baysor (v0.7.0, https://github.com/kharchenkolab/Baysor), ProSeg (v2, https://github.com/dcjones/proseg, commit ef2d1ca8c535fd911b7cd37da47c84de98e784a4) and Segger (a version adapted by us at https://github.com/bdsc-tds/segger_dev, commit 4bf56dec2a364de8eee4fcab663e798eb106e21a)—all of which utilize transcript-level information to guide segmentation. We applied these methods with default arguments to all Xenium samples. Specifically, when running Segger, we used 6,000 tokens for samples from the 5K panel and 500 tokens for others. These methods represent robust alternatives and serve as valuable baselines for benchmarking segmentation performance.
Quality control
Cells with fewer than ten counts or five genes were filtered out, as well as genes expressed in fewer than ten cells.
IHC
Analysis
Xenium and IHC images were coregistered using Xenium Explorer (v3.2.0). Xenium’s cell segmentation, as a GeoJSON file, was loaded into a project in Qupath (v0.5.1) with the corresponding IHC image (details included as part of our pipeline; see ‘Code availability’ section), and an object classifier was trained within QuPath to classify cells based on protein expression.
Downstream analyses
UMAP visualization
UMAPs were computed for log-normalized snRNA-seq and for Xenium data at the sample and panel level using default scanpy34 parameters for PCA, k-nearest neighbor (kNN) search and UMAP, except kNN n_neighbors, which was set to 50.
RCTD cell-type decomposition
RCTD22 is a computational method for deconvolving spatial transcriptomics data using single-cell RNA-seq references. It uses a Poisson regression model to represent each spatial unit (spot or segmented cell) as a mixture of cell-type-specific profiles, assigning weights that capture the contribution of each type. In doublet mode, RCTD restricts the decomposition to two cell types, assigning w1 to the primary type and w2 to the secondary type. In our analysis, we interpret the primary cell type as the cell’s true underlying identity, whereas the secondary type and its weight w2 are treated as indicative of contaminating signal and its magnitude.
In addition to cell-type composition, RCTD in doublet mode assigns a ‘spot class’ to each unit, with four possible values: (1) singlet, for cells confidently assigned to a single type; (2) doublet_certain, for cells confidently annotated as a mixture of two types; (3) doublet_uncertain, for cells with a confident primary type but unclear contamination sources; and (4) rejects, for cells with uncertain annotation. This classification is based on the difference between two scores: the singlet score, the residual error between the observed profile and the primary reference profile, and the doublet score, the residual error to the weighted sum (w1 and w2) of primary and secondary profiles. This doublet–singlet calling is based on the difference between two scores: singlet score and doublet score. The doublet calling is performed on a fixed threshold. Because residuals accumulate over all expressed genes, the score difference tends to be higher in larger panels (for example, 5K Prime), leading to higher apparent doublet rates. Therefore, doublet scores and rates are only meaningful for comparisons within the same panel.
Xenium annotation with RCTD
We performed cell-type annotation of Xenium data using the RCTD algorithm in doublet mode, leveraging both matched snRNA-seq and external scRNA-seq references (see ‘Data availability’ section). Each sample was annotated independently. To enhance the specificity of tumor cell detection, matched references were restricted to include only tumor cells derived from the corresponding donor, when donor-specific cells were available. Reference datasets were filtered to remove cells with fewer than 10 or more than 2,000 unique molecular identifiers (UMIs), and cell types represented by fewer than 25 cells were excluded. RCTD was run on raw UMI count data from both the reference and Xenium data. Due to the inherently lower UMI counts in single-cell resolution Xenium data, we applied a more permissive filtering strategy: cells with fewer than 10 total UMIs were excluded and do not appear in further plots, and the UMI_min_sigma parameter was reduced to 100 to account for the reduced transcript coverage compared with spot-based spatial transcriptomics in the original RCTD application. To improve the robustness of RCTD annotation, we provided the class_df parameter, which groups cell types into broader classes. Finally, we applied mild postprocessing to the RCTD results, removing an arbitrary second cell type in highly confident singlets—specifically, singlets that showed no evidence of a secondary signal.
Spatial spillover and contamination metrics
A cell’s neighborhood composition is defined as the relative proportion of cell types within a cell’s spatial neighborhood. The spatial neighborhood consists of up to 20 nearest cells within a 15-µm radius. Each neighbor is represented by two cell types (primary and secondary) with associated weights defined above (namely w1 and w2), obtained from the RCTD decomposition. Within a neighborhood, weights are aggregated by cell type and normalized to sum to 1, yielding a vector of relative cell-type proportions, referred to as the neighborhood composition.
The proportion of the secondary cell type in the neighborhood is defined as the neighborhood composition weight corresponding to the secondary cell type. Equivalently, it can be described as the average weight of the signal corresponding to a given cell’s secondary cell type across all other cells in its spatial neighborhood.
The pairwise spillover index is defined as the cosine similarity between a cell’s secondary cell-type weight (w2) and the proportion of that same cell type within the cell’s spatial neighborhood, computed across all cells sharing a specific primary and secondary cell-type pair. A higher index suggests greater evidence of spillover from type 2 into type 1. For each sample, a pairwise spillover index was computed for cell-type pairs having at least 20 observations. Nonsignificant scores (<20 observations or permutation test false discovery rate (FDR) >0.01) were replaced with zeros.
The cell-type spillover index is derived by averaging the pairwise spillover indices across all primary cell types for each given secondary (contaminating) cell type. A higher index indicates stronger spillover originating from that secondary cell type. When averaging across all primary cell types, statistically nonsignificant observations (<20 observations or permutation test FDR >0.01) were treated as zeros.
SPLIT correction
To reduce contamination from secondary cell-type signals in an observed expression profile, we leveraged RCTD output. RCTD provides, for each cell, the primary and secondary cell-type labels, along with their weights w1 and w2 = 1 − w1, which reflect the relative contributions of the primary and secondary reference profiles to the observed expression. Using these weights and the reference profiles, we computed a purified expression profile designed to separate the contribution of the primary cell type from the secondary cell type. Specifically, for each cell:
$${{\bf{x}}}_{\mathrm{doublet}-\mathrm{SPLIT}}=\frac{{w}_{1}{{\bf{ref}}}_{1}}{{w}_{1}{{\bf{ref}}}_{1}+{w}_{2}{{\bf{ref}}}_{2}}\odot {{\bf{x}}}_{\mathrm{observed}},$$
where ref1 and ref2 are the average reference profiles of the primary and secondary cell types, respectively. Fraction bar and ‘\(\odot\)’ correspond to element-wise division and product, respectively. Intuitively, this ratio (used as a scaling factor) is an estimate of the expected fraction of transcripts coming from the primary cell type. The complement, namely \(\frac{{w}_{2}{{\bf{ref}}}_{2}}{{w}_{1}{{\bf{ref}}}_{1}+{w}_{2}{{\bf{ref}}}_{2}}\), being the expected fraction of transcripts coming from the secondary cell type.
We call this approach doublet-SPLIT and by default apply it to cells classified as ‘doublets_certain’ and ‘singlets’ that showed signs of contamination—that is, cells for which both a primary and secondary cell type were confidently assigned by RCTD.
For cells labeled as ‘doublets_uncertain’, only the primary cell type is confidently assigned, and the secondary identity is considered ambiguous. To purify these cells, we performed a full-SPLIT as
$${{\bf{x}}}_{\mathrm{full}-\mathrm{SPLIT}}=\frac{{w}_{1}{{\bf{ref}}}_{1}}{{\sum }_{i=1}^{k}{w}_{i}\,{{\bf{ref}}}_{i}}\odot {{\bf{x}}}_{\mathrm{observed}}.$$
In addition, we observed that RCTD occasionally swaps the primary and secondary cell types. To account for this, we implemented a label-correction step termed SPLIT-shift. If a cell has primary label ct1 and secondary label ct2, but its transcriptomic neighborhood is primarily composed of ct2 cells, we assume a potential label swap and perform purification using ct2 as the primary identity, namely
$${{\bf{x}}}_{\mathrm{SPLIT}-\mathrm{shift}}=\frac{{w}_{2}{{\bf{ref}}}_{2}}{{w}_{1}{{\bf{ref}}}_{1}+{w}_{2}{{\bf{ref}}}_{2}}\odot {{\bf{x}}}_{\mathrm{observed}}.$$
To support SPLIT-shift decisions, we also constructed a transcriptomic neighborhood for each cell, defined by a kNN graph (k = 10) based on Euclidean distance in PCA space.
Which cells to SPLIT
We offer several modes to decide which cells should be purified:
-
Default: purifies all cells having secondary cell type by applying doublet-SPLIT to singlets and doublets_certain, and full-SPLIT to doublets_uncertain.
-
balance_score_based: singlets and doublets_certain undergo doublet-SPLIT only if they show evidence of contamination, specifically, if they are surrounded by cells of their assigned secondary type. Doublets_uncertain still receive full-SPLIT; all others remain unchanged. To support this decision-making, we introduce a spatial neighborhood-based metric, neighborhood_weight_second_type (referred to as proportion of secondary cell type in cell’s neighborhood in Fig. 3), which represents the average proportion of a cell’s secondary type found within its local spatial neighborhood. In this study, we defined each cell’s spatial neighborhood using a kNN approach with (k = 20) based on Euclidean distance in physical space. To ensure locality, we removed all edges longer than 15 µm.
Alternative count correction
ResolVI14 is a probabilistic model that refines spatial transcriptomic data by resolving cell-type mixtures using variational inference. In this study, we applied ResolVI using default parameters, with models trained either in an unsupervised manner or supervised using Level 2.1 cell-type labels, with T cell subtypes and malignant cell subtypes each grouped into one category. We treated sample information as a batch covariate, as recommended through personal communication with the authors (https://github.com/bdsc-tds/xenium_analysis_pipeline/issues/83). Because ResolVI requires input expression values to be integers, ProSeg expression matrices were rounded to the nearest integer.
Ovrlpy15 combines a vertical subslicing strategy with an unsupervised, segmentation-free analysis of spatial transcriptomics data to identify and correct for potential spatial doublets. In this study, we applied ovrlpy with default parameters, with a threshold of either 0.5 or 0.7 applied to the pixel signal integrity map, as recommended in the original preprint and through personal communication with the authors (https://github.com/HiDiHlabs/ovrl.py/issues/40), respectively.
Comparing segmentations and count correction methods
Computational procedures and metrics for assessing cell type separation, batch effect, data quality and potential spatial signal contamination. Analyses focused on quantifying specificity (that is, potential contamination between neighboring cell types) and sensitivity (that is, data preservation, similarity to chromium profiles) were adapted from Mitchel et al.16. These were applied to each sample, before and after count correction.
Assessment of cell-type separation and batch effect strength
Biological conservation of cell types and batch effect strength were assessed using metrics proposed by Luecken et al.26, as implemented in the scib-metrics package (https://github.com/YosefLab/scib-metrics) and Calinski–Harabasz, Davies–Bouldin scores from scikit-learn35. Higher scores indicate improved biological conservation or batch integration; Extended Data Fig. 2 uses color coding to facilitate interpretation and comparison across methods. Malignant cells were excluded before computing these metrics because they are often sample specific. Datasets were downsampled to a maximum of 100,000 cells to speed up calculations.
Assessment of specificity and potential contamination
To evaluate potential transcript spillover or misassignment between spatially adjacent cells of different types, we reimplemented a recently proposed approach16. For each pair of distinct cell types, denoted as the target cell type i and the potential contaminating cell type j, cells annotated as type i were assigned a binary label: 1 for those spatially adjacent (based on centroid distance ≤15 µm) to at least one cell of type j, and 0 for those not adjacent to any cell of type j. We then trained a logistic regression model to predict, based on the gene expression profile of a cell of type i, whether it is adjacent to a cell of type j.
For each sample after QC filtering, input features were z-scored log-normalized expression levels of all genes for cells of type i. We used the implementation from scikit-learn35 with default parameters, except for max_iter, which was increased to 500 and class_weight set to ‘balanced’ to account for possible imbalance of target labels.
Genes were ranked by the magnitude of their coefficients to assess their potential association with signal contamination from cell type j within cell type i. To evaluate whether these ranked gene lists were enriched for markers of cell type j, we counted the number of markers present in the top 20 ranks. Cell type markers were defined independently for each sample, as the top 20 genes identified via Wilcoxon rank-sum test statistics, comparing with all other cells.
This analysis was run across all segmentations of each Xenium sample, both before and after applying count correction. We used RCTD cell-type annotation Level 2.1, grouping T cell subtypes into one class and tumor subtypes into one class. Cell types with more than 50,000 cells were downsampled to a maximum of 25,000 cells from each class. For a given sample, cell types were excluded from analysis if fewer than 100 cells were available in both target classes (proximal versus distal to cell type j).
In addition, we ran Wilcoxon differential expression tests between classes, on all samples except those with fewer than 30 cells available in both target classes. For T cell contamination by malignant cells, we ran Prerank gene set enrichment analysis (GSEA) using genes ranked by log-fold changes and custom gene sets (Supplementary Tables 3 and 4).
Assessment of sensitivity and data quality
After applying QC filters (see ‘Xenium’ in ‘Data preprocessing and QC’ section in Methods) to raw or corrected count matrices, standard QC metrics were computed per cell, including the total number of UMIs or reads detected, and the number of genes expressed (defined as genes with >0 counts). Mean and median values for the number of genes expressed per cell, as well as the mean UMI count per cell, were calculated across all cells.
To assess the biological fidelity of the expression profiles, we compared them with reference chromium data. Pseudo-bulk expression profiles were generated for each identified cell type in our dataset by summing gene expression across all cells assigned to that type. We then calculated the cosine similarity between log-normalized pseudo-bulk profiles and corresponding cell type profiles derived from Chromium scRNA-seq. The similarity calculation was performed using all shared genes.
Analysis coordination
Given the raw Xenium data and the curated snRNA-seq reference, we implemented a reproducible Snakemake36 pipeline to coordinate the above sophisticated analysis on a high-performance cluster. The number of threads and the amount of memory for each task were allocated dynamically, with a limit of 48 threads and 1 TB memory, and the run time was limited to 3 days (72 h). Tasks that failed due to out-of-memory errors were excluded from the results.
Statistics and reproducibility
All correlations were quantified using Spearman’s rank correlation coefficient (R). Statistical significance was assessed using a two-sided t-test of the null hypothesis R = 0.
Statistical significance of cosine similarity metric was assessed using a one-sided permutation test (n = 1,000).
The micrographs shown in Fig. 3f,g are illustrative. Confounding of primary and contaminating cell-type assignments by RCTD, occurring when contaminating signal dominates the true signal, was observed consistently within each slide and across all five slides with successful IHC staining.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

