Patient cohorts
Four gallbladder carcinoma samples for scRNA-seq were obtained from patients with gallbladder disease undergoing surgical resection at Yan’an Affiliated Hospital of Kunming Medical University and The Third Affiliated Hospital of Kunming Medical University. All clinical data were sourced from electronic medical records with informed consent. Sample collection was approved by the Ethical Committee of Yan’an Affiliated Hospital of Kunming Medical University (No. 2023-024-01) and was conducted in accordance with the Declaration of Helsinki. Fresh surgical specimens were divided under sterile conditions: one portion was directly processed for scRNA-seq, and the other underwent formalin fixation for histopathological analysis. All diagnoses were independently confirmed by three pathologists through review of hematoxylin-eosin-stained sections. Clinical samples collected for scRNA-seq were required to meet the following criteria. Inclusion criteria: 1) diagnosis of gallbladder adenoma or carcinoma confirmed by histological methods; 2) availability of complete clinical and pathological data, including TNM stage and treatment records; 3) availability of sufficient tumor tissue for analysis; 4) receipt of a standard surgical regimen relevant to the study objectives. Exclusion criteria: 1) carcinomas originating in the bile duct; 2) lack of pathological diagnosis; 3) incomplete key clinical data or follow-up records; 4) a history of other active malignant tumors within the past 5 years; 5) prior anticancer therapy that could interfere with the study’s endpoints.
Single-cell RNA-sequencing
Tissue samples were minced in a cell culture dish and enzymatically digested in centrifuge tubes using a mixture containing Collagenase I (Sigma), Collagenase IV (Sigma), DNase I (Sigma), and Dispase (Corning) at 37 °C for 30 min. The digested cell suspension was filtered through a 40-μm cell strainer (BD Falcon) to obtain a single-cell suspension. For viability assessment, aliquots of the suspension were mixed with 0.4% trypan blue at a 9:1 ratio and analyzed using the Countess® II Automated Cell Counter, ensuring ≥90% viability. Cell concentrations were adjusted to ≥1000 cells/μL prior to processing. Cell suspensions were processed on the 10X Genomics GemCode platform, which uses microfluidic partitioning to generate Gel Bead-In-Emulsions (GEMs). Libraries were generated from the cDNAs on the 10X Genomics 5’ platform using Chromium Next GEM Single Cell 5’ Reagent Kits v3.1. Upon dissolution of the Gel Beads in the GEMs, the encapsulated primers, comprising Illumina® R1 sequences, 16-nucleotide 10x Genomics barcodes, 10-nucleotide unique molecular identifiers (UMIs), and poly-dT sequences, were released into the cell lysate. Barcoded, full-length cDNA was then synthesized by reverse transcription using polyadenylated mRNA transcripts as templates.
Post-GEM reactions were purified using silane magnetic beads to remove residual reagents. The cDNA was amplified by PCR to obtain sufficient material for library preparation. The R1 primer sequence was incorporated during GEM incubation, while the P5, P7, sample index, and R2 sequences were introduced during library construction through end repair, A-tailing, adapter ligation, and PCR. The final Illumina-ready libraries contained paired-end constructs flanked by P5 and P7 sequences, with Read 1 encoding the 10x barcodes and UMIs and Read 2 capturing the cDNA fragments. The cDNA libraries were sequenced on an Illumina NovaSeq X Plus.
Single-cell RNA-seq data processing and data quality control
Raw sequencing data (BCL files) were processed using 10X Genomics Cell Ranger (version 3.1.0) for FASTQ file conversion, read alignment, and gene expression quantification. Following quality control filtering of low-confidence barcodes and UMIs, sequencing reads were aligned to the GRCh38.100 human reference genome. Post-alignment data from individual samples were processed in Seurat (version 3.1.1), excluding cells with extreme gene counts (<500 or >4000 genes detected), elevated mitochondrial content (>10%), or excessive UMI counts (≥8000 UMIs) [22]. Potential doublets were computationally identified and removed using DoubletFinder (version 2.0.3) through artificial nearest-neighbor analysis (pANN scoring) calibrated to the experimental cell density [23]. To minimize the influence of batch effects and behavioral conditions on clustering, we integrated all samples using the Harmony R package, ultimately resolving 28,548 high-quality cells expressing 39,974 detectable genes from 4 samples [24]. Following log-normalization, the integrated datasets were scaled and subjected to principal component analysis (PCA) for dimensionality reduction prior to downstream clustering analyses [25]. Six cell types were annotated based on expression of the following marker genes: CD2, CD3D, and CD3E for lymphocytes; EPCAM and KRT19 for epithelial cells; DCN and COL1A1 for fibroblasts; VWF, RAMP2, and PECAM1 for endothelial cells; CD14 and CD68 for myeloid cells; IGHA1, IGHG1, and JCHAIN for plasma cells.
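The per-cell quality thresholds above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the Seurat code used in the study; the `CellQC` record and the `passes_qc` helper are hypothetical names introduced here, and the boundary handling (inclusive or exclusive) follows the inequalities as written in the text.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Hypothetical per-cell QC summary (not a Seurat object)."""
    barcode: str
    n_genes: int      # number of genes detected in the cell
    n_umis: int       # total UMI count of the cell
    pct_mito: float   # percentage of UMIs from mitochondrial genes


def passes_qc(cell, min_genes=500, max_genes=4000,
              max_pct_mito=10.0, max_umis=8000):
    """Keep a cell only if it violates none of the stated exclusion rules."""
    if cell.n_genes < min_genes or cell.n_genes > max_genes:
        return False          # <500 or >4000 genes detected
    if cell.pct_mito > max_pct_mito:
        return False          # >10% mitochondrial content
    if cell.n_umis >= max_umis:
        return False          # >=8000 UMIs
    return True


# Toy input: illustrative values only, not data from the study.
cells = [
    CellQC("cell_ok", n_genes=1500, n_umis=4200, pct_mito=3.5),
    CellQC("cell_few_genes", n_genes=320, n_umis=700, pct_mito=2.0),
    CellQC("cell_high_mito", n_genes=1800, n_umis=5000, pct_mito=22.0),
    CellQC("cell_deep", n_genes=3900, n_umis=8000, pct_mito=4.0),
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Doublet removal and Harmony integration are separate steps and are not represented in this sketch.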
Trajectory analysis
To investigate dynamic biological processes, including cell state transitions and lineage progression, cellular differentiation trajectories were reconstructed using Monocle (version 2.36.0) [26]. Ordering genes were selected based on two criteria: (1) detection in ≥10% of cells, and (2) significant differential expression (q-value < 0.01) determined by Monocle’s differentialGeneTest function with Benjamini-Hochberg correction. The ggridges package (version 0.5.3) was used to analyze the distribution of cells from different groups along the pseudotime axis.
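The gene-selection rule can be illustrated with a small, self-contained Python sketch: Benjamini-Hochberg adjustment of raw p-values followed by the two filters (detection fraction and q-value). This is not Monocle's implementation; the function names and the toy inputs are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values stay monotone (each q is the minimum over larger ranks).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_ordering_genes(genes, pvalues, detection_fraction,
                          min_fraction=0.10, max_q=0.01):
    """Keep genes detected in >=10% of cells with q-value < 0.01."""
    qvalues = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, frac in zip(genes, qvalues, detection_fraction)
        if frac >= min_fraction and q < max_q
    ]


# Toy input: gene names and numbers are illustrative only.
selected = select_ordering_genes(
    genes=["GENE_A", "GENE_B", "GENE_C"],
    pvalues=[0.001, 0.002, 0.5],
    detection_fraction=[0.50, 0.05, 0.90],
)
```

In this toy input GENE_B is significant but detected in too few cells, and GENE_C is widely detected but not significant, so only GENE_A passes both filters.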
Enrichment analysis
Pseudotime-dependent genes were further subjected to GO and KEGG enrichment analysis using the clusterProfiler package (Version 3.11.0) with default settings. The nonparametric, unsupervised algorithm implemented in the gene set variation analysis (GSVA) package (Version 1.42.0) was used to score EMT in the different states generated with Monocle [27]. The signature genes of EMT and other related biological processes were obtained from the fifty hallmark gene sets in the MSigDB database (https://www.gsea-msigdb.org/gsea/msigdb).
Immunohistochemical (IHC) staining and grading
Gallbladder tissue samples were obtained by surgical resection from 108 chronic cholecystitis and 72 gallbladder carcinoma cases, fixed in 10% neutral buffered formalin, and embedded in paraffin. Serial 4 μm sections were deparaffinized in xylene and rehydrated through graded ethanol. Microwave-based antigen retrieval was performed in 0.01 M citrate buffer (pH 6.0), followed by blockade of endogenous peroxidase with 3% H₂O₂ for 10 min and blocking of nonspecific binding with 5% horse serum for 1 h at room temperature. Sections were incubated overnight at 4 °C with anti-OLFM4 primary antibody (Proteintech 28432-1-AP, 1:100 in antibody diluent), followed by detection with an HRP-conjugated secondary antibody and hematoxylin counterstaining for 3 min. The staining score was calculated as the intensity of positive staining (0 = negative; 1 = weak; 2 = moderate; 3 = strong) multiplied by the percentage of positive cells.
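As a worked example of the scoring rule, the sketch below multiplies the intensity grade by the percentage of positive cells. It assumes the percentage is expressed on a 0 to 100 scale, which gives a score range of 0 to 300; the text does not state the scale, and `ihc_score` is a hypothetical helper name.

```python
def ihc_score(intensity, percent_positive):
    """Intensity grade (0-3) multiplied by percent positive cells (0-100).

    The 0-100 scale for the percentage is an assumption of this sketch.
    """
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    return intensity * percent_positive


# Moderate staining (2) in 60% of cells gives 2 * 60 = 120.
example_score = ihc_score(2, 60)
```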
Clinical tissue microarrays and Kaplan-Meier survival analysis
Two paraffin-embedded cancer tissue microarrays (TMAs) containing 220 tumor tissues and 24 adjacent non-tumor tissues were obtained from the National Engineering Center for Biochip (Outdo Biotech, China), with ethical approval granted by the Institutional Review Board of Outdo Biotech. Patients were stratified into high- and low-expression groups based on OLFM4 expression. Kaplan-Meier survival curves were generated to evaluate differences in overall survival (OS) between the groups, and significance was assessed by the log-rank test.
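For readers unfamiliar with the two statistics, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator and a two-group log-rank test. It is not the software used in the study, the cutoff used to define high and low OLFM4 groups is not modeled, and the toy survival times are illustrative only. Events are coded 1 (death) and 0 (censored).

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (months); not patient data from the study.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                             [10, 11, 12, 13, 14], [1] * 5)
```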
Cell culture
GBC-SD and NOZ cell lines were purchased from the Cell Bank of Type Culture Collection of Chinese Academy of Sciences (Shanghai, China). All cell lines were propagated and cultured according to the instructions of the providing institution and used within 15 passages. GBC-SD cells were cultured in RPMI-1640 medium (Biological Industries) supplemented with 10% FBS (Biological Industries), 100 U/mL penicillin, and 100 μg/mL streptomycin (Biological Industries). NOZ cells were cultured in DMEM/F12 medium (Gibco) supplemented with 10% FBS (Biological Industries), 100 U/mL penicillin, and 100 μg/mL streptomycin (Biological Industries). Short tandem repeat profiling was used to authenticate the cell lines before use in experiments. Cultured cells were maintained at 37 °C with 5% CO₂ in a humidified incubator.
Vector construction and transduction
OLFM4, CEACAM6, or TGFβR1 knockdown in GBC-SD and NOZ cells was performed using the hU6-MCS-CBh-gcGFP-IRES-puromycin and hU6-MCS-CMV-Neomycin vectors. OLFM4 and CEACAM6 overexpression vectors were generated using the Ubi-MCS-SV40-EGFP-IRES-puromycin and Ubi-MCS-3FLAG-SV40-Neomycin vectors. Both the vectors and the recombinant lentiviruses were designed and synthesized by GeneChem (Shanghai, China).
The interference sequences are:
shOLFM4-1: 5′-CCGGAGTGCAGAGCATTAACTATAACTCGAGTTATAGTTAATGCTCTGCACTTTTTTG-3′;
shOLFM4-2: 5′-CCGGCCCTAATGCTGCCTATAATAACTCGAGTTATTATAGGCAGCATTAGGGTTTTTG-3′;
shCEACAM6-1: 5′-CCGGGGTTTATCAATGGGACGTTCCCTCGAGGGAACGTCCCATTGATAAACCTTTTTG-3′;
shCEACAM6-2: 5′-CCGGGGAACGATGCAGGATCCTATGCTCGAGCATAGGATCCTGCATCGTTCCTTTTTG-3′;
shTGFBR1-1: 5′-CCGGGCCTTGAGAGTAATGGCTAAACTCGAGTTTAGCCATTACTCTCAAGGCTTTTTG-3′;
shTGFBR1-2: 5′-CCGGGCCACAGATACCATTGATATTCTCGAGAATATCAATGGTATCTGTGGCTTTTTG-3′;
shControl (Scramble): 5′-CCGGAGTTCTCCGAACGTGTCACGTCTCGAGACGTGACACGTTCGGAGAATTTTTG-3′.
The OLFM4 and CEACAM6 overexpression lentiviral vectors were built using NCBI Reference Sequences NM_006418 (OLFM4) and NM_002483.7 (CEACAM6). Transduced GBC-SD and NOZ cells were selected with 4 μg/mL puromycin for 7 days or 300 μg/mL G418 for 14 days, and immunoblotting was performed to validate protein expression levels.
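The shRNA oligonucleotides listed above appear to share a common hairpin layout: a CCGG overhang, a sense stem, a CTCGAG loop, an antisense stem, and a TTTTTG terminator. That layout is inferred here from the sequences themselves, not stated by the authors. Under that assumption, the sketch below splits an oligonucleotide into its stems and checks that the antisense stem is the reverse complement of the sense stem; `split_hairpin` and `reverse_complement` are hypothetical helper names, and only shOLFM4-1 is checked.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def split_hairpin(oligo, overhang="CCGG", loop="CTCGAG", terminator="TTTTTG"):
    """Split an shRNA oligo into (sense, antisense) stems.

    The overhang, loop, and terminator defaults are assumptions inferred
    from the listed sequences; the first loop match is used as the split.
    """
    if not oligo.startswith(overhang) or not oligo.endswith(terminator):
        raise ValueError("oligo does not match the assumed layout")
    core = oligo[len(overhang):len(oligo) - len(terminator)]
    loop_start = core.find(loop)
    if loop_start < 0:
        raise ValueError("loop sequence not found")
    sense = core[:loop_start]
    antisense = core[loop_start + len(loop):]
    return sense, antisense


# shOLFM4-1 as given in the text.
sh_olfm4_1 = "CCGGAGTGCAGAGCATTAACTATAACTCGAGTTATAGTTAATGCTCTGCACTTTTTTG"
sense, antisense = split_hairpin(sh_olfm4_1)
stems_pair = reverse_complement(sense) == antisense
```

A check of this kind is a quick way to catch transcription errors in oligonucleotide lists; a stem mismatch does not by itself mean the construct is wrong, since the assumed layout may not apply to every sequence.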
Western blot
Protein samples (20–40 μg) were separated on 10% SDS-PAGE gels and subsequently transferred to PVDF membranes (Millipore). Following blocking with 5% BSA/TBST for 1 h at room temperature, the membranes were incubated with primary antibodies (diluted in 5% BSA/TBST) at 4 °C overnight. The primary antibodies used for Western blotting are summarized in Supplementary Table S1. After three 10-min TBST washes, membranes were probed with horseradish peroxidase (HRP)-conjugated secondary antibodies for 1 h at room temperature. Following three additional TBST washes (10 min/wash), protein bands were visualized using enhanced chemiluminescence detection reagent (ECL, ProteinTech) on a chemiluminescence imaging system. In the TGF-β/Smad3 signaling blockade assay, cells were treated with the TGF-β type I receptor kinase inhibitor IN1130 (Cat No. HY-18758, MCE) and the phospho-Smad3 inhibitor E-SIS3 (Cat No. HY-13013, MCE). For rescue experiments, OLFM4-knockdown GBC cells were incubated with recombinant human TGF-β protein (Cat No. HZ-1011, ProteinTech) at the indicated doses and time points. To explore the role of p-AKT in the CEACAM6/EMT cascade, GBC cells were treated with the p-AKT inhibitor Capivasertib (Cat No. HY-15431, MCE) at different concentrations, and immunoblotting was performed to examine the expression of EMT markers.
Cell invasion and migration assays
Two GBC cell lines (GBC-SD and NOZ) with OLFM4 or CEACAM6 overexpression or knockdown were used to investigate the effects of these genes on invasion and migration. For cell invasion assays, Transwell chambers (8 μm pore, Corning) in 24-well plates were coated with 100 μL of Matrigel (400 μg/mL; Matrigel Basement Membrane Matrix High Concentration, Corning) before use. Cell migration assays were conducted without Matrigel coating using otherwise identical procedures. GBC cells transduced with the indicated lentiviruses or treated with Capivasertib were trypsinized, washed twice with PBS, resuspended in serum-free DMEM/F12 medium (Gibco), and seeded into the upper chamber. In the p-AKT inhibition assay, the culture medium was supplemented with Capivasertib at the indicated concentrations. The lower chamber was filled with 500 μL of culture medium containing 20% FBS. After 24 h of incubation at 37 °C with 5% CO₂, cells remaining in the upper chambers were removed with cotton swabs. Migrated or invaded cells were fixed in methanol and stained with 1% crystal violet solution, and the chambers were photographed with a microscope (Eclipse 50i POL, Nikon) equipped with a camera. Cells were then quantified using ImageJ software.
In vivo tumor metastasis models
For tumor metastasis models, six-week-old male nude mice were purchased from SiPeiFu Biotechnology Co., Ltd. (Beijing, China). After acclimatization for one week, the mice were randomly assigned to either the experimental or control group. Approximately 1 × 10⁶ GBC-SD or NOZ cells stably expressing either OLFM4 or CEACAM6 were injected into mice via the tail vein. Correspondingly, GBC cells transduced with shRNA vectors against OLFM4 or CEACAM6 were inoculated at the same cell number. Cells transduced with negative control or scramble RNA vectors served as controls, and all viral vectors carried a firefly luciferase reporter. In vivo imaging and quantification of bioluminescence were performed on a BLT AniView600 Living Image system. D-luciferin, a substrate for firefly luciferase, was injected intraperitoneally (150 μl D-luciferin at 30 mg/mL, Cat No. E1605, Promega) 5 min before luminescence imaging. After injection, mice were placed in a sealed optical imaging tray ventilated with 1.5% isoflurane. All animal experiments were approved by the Animal Ethics and Welfare Committee (AEWC) of Yan’an Affiliated Hospital of Kunming Medical University (No. 2023025).
Bulk RNA sequencing and data analysis
Total RNA was isolated from GBC-SD cells stably transduced with the scramble or shOLFM4 construct using TRIzol (Invitrogen). Following RNA quality assessment, mRNA was enriched using Oligo(dT) magnetic beads. The RNA sequencing libraries were constructed with the KAPA Stranded RNA-Seq Library Preparation Kit (Illumina) according to the manufacturer’s protocol. Libraries were subjected to 150 bp paired-end sequencing on an Illumina HiSeq2000 platform with three independent biological replicates per group. Raw sequencing reads were aligned to the human reference genome GRCh38.87 using the STAR aligner (Version 2.7.10). Gene expression levels were quantified as fragments per kilobase of transcript per million mapped fragments (FPKM) using Cufflinks (Version 2.2.1). Differential gene expression analysis between the Scramble and shOLFM4 groups was performed using the limma R package (v3.54.2) with thresholds of |log2FC| > 1, p-value < 0.01, and q-value < 0.05. Protein-protein interaction networks were reconstructed using the STRING database (Version 11.5) and visualized in Cytoscape (Version 3.9.1).
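The three differential-expression thresholds combine as a logical AND. The sketch below applies them to a table of per-gene statistics; it illustrates the cutoffs only and does not reproduce the limma analysis. The function name, the direction of the fold change (assumed here to be shOLFM4 relative to Scramble), and the toy values are assumptions.

```python
def classify_gene(log2_fc, p_value, q_value,
                  min_abs_log2_fc=1.0, max_p=0.01, max_q=0.05):
    """Return 'up', 'down', or 'ns' using |log2FC| > 1, p < 0.01, q < 0.05."""
    significant = (
        abs(log2_fc) > min_abs_log2_fc
        and p_value < max_p
        and q_value < max_q
    )
    if not significant:
        return "ns"
    return "up" if log2_fc > 0 else "down"


# Toy statistics: (gene, log2FC, p-value, q-value); illustrative only.
results = [
    ("GENE_A", 1.8, 0.0005, 0.010),
    ("GENE_B", -2.3, 0.0040, 0.030),
    ("GENE_C", 0.7, 0.0001, 0.002),   # fold change too small
    ("GENE_D", 2.1, 0.0200, 0.040),   # p-value above 0.01
]
calls = {gene: classify_gene(fc, p, q) for gene, fc, p, q in results}
```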
Immunofluorescence and confocal microscopy
GBC-SD and NOZ cells were plated on coverslips at 50,000 cells per well in a 24-well plate. After three washes with PBS, the cells were fixed in 4% paraformaldehyde in PBS for 10 min. Cells were permeabilized with 0.1% Triton X-100 in PBS for 10 min and blocked with 5% horse serum for 1 h at room temperature before primary antibody staining. For GBC tissue samples, sections were obtained from gallbladder carcinoma cases, fixed in 10% neutral buffered formalin, and embedded in paraffin. The 4 μm sections were deparaffinized in xylene and rehydrated through graded ethanol. Microwave-based antigen retrieval was performed in 0.01 M citrate buffer (pH 6.0), and the sections were blocked with 5% horse serum for 1 h at room temperature. Primary antibodies against OLFM4, CEACAM6, or TGFBR1 (summarized in Supplementary Table S1) were added and incubated overnight at 4 °C. After three washes, the corresponding secondary antibodies were added at 1:500 (Alexa Fluor 488, Alexa Fluor 594, Abcam). After three further washes, the coverslips were mounted using antifade mounting medium with DAPI (Cat No. P0131, Beyotime Biotechnology). Stained cells were imaged using a confocal microscope (LSM980, ZEISS) or a fluorescence microscope (Axio Scope5, ZEISS). For colocalization analysis, two independent experiments were quantified using ImageJ with the Coloc 2 plug-in.
ELISA
To measure TGF-β in the medium, GBC-SD cells with OLFM4 knockdown or overexpression were cultured in 12-well plates and starved in serum-free medium for 12 h. The supernatants were collected and centrifuged at 300 × g for 5 min at 4 °C. TGF-β concentrations in the medium were determined by ELISA using the Human TGF-β ELISA kit (Cat No. EHC107b, Neobioscience) according to the manufacturer’s instructions.
Co-immunoprecipitation
Approximately 1 × 10⁶ GBC-SD or NOZ cells were harvested and washed twice with PBS. The cells were lysed in 1 ml lysis buffer (50 mM Tris-HCl, 150 mM NaCl, 1% Triton X-100, 1 mM EDTA, Millipore) supplemented with cOmplete™ EDTA-free Protease Inhibitor Cocktail (Roche) and PhosSTOP Phosphatase Inhibitor Cocktail Tablets (Roche) for 30 min on ice. The cell lysates were centrifuged at 15,000 × g for 5 min at 4 °C. Protein concentrations were determined by a BCA-based method. For inputs, 500 µg of cell lysate was mixed with 5× Loading Buffer (Solarbio) and heated for 10 min at 100 °C. Immunoprecipitations were performed using the Catch and Release kit v2.0 (Cat. #17-500, Millipore). Briefly, a total reaction volume of 500 µl, comprising 500 µg of cell lysate, 4 µg of antibody (anti-OLFM4, anti-TGFβRI, or normal IgG antibody), 10 µl of antibody capture affinity ligand, and an appropriate amount of wash buffer, was incubated in a spin column on a rotating wheel at 4 °C for 12 h. The column was then centrifuged and washed three times with 400 µl wash buffer. The eluates were collected for immunoblotting by adding denaturing elution buffer containing β-ME.
His pull-down assay
For the pull-down assay, anti-His-tag mAb-Magnetic Beads (Cat No. D291-11, MBL) were blocked with 5% BSA for 1 h, then incubated with OLFM4 protein (Cat No. HY-P71179, MedChemExpress) and TGFBR1 protein (Cat No. T07-13G, SignalChem) at 4 °C for 2 h on a rotating wheel. In competitive binding assays, TGF-β (Cat No. HZ-1011, Proteintech) was added at concentrations of 0.5 and 0.25 μg/mL. After incubation, the supernatants were completely discarded using a magnetic stand, and the protein-bound beads were washed five times with wash buffer. Proteins were eluted by adding loading buffer and heating at 100 °C for immunoblotting.

