Our pipeline uses a multimodal constraint autoencoder (scHCAE) to integrate multi-omics data during clustering, and a matrix factorization-based model (scMF) to predict the target genes regulated by a transcription factor (TF).
- Python 3.8 or higher
- Required packages: torch, scikit-learn (sklearn), scipy, scanpy, h5py, numpy, pandas
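Before running the scripts, it can help to confirm the environment. The check below is a minimal sketch that assumes the import names match the package list above (scikit-learn is imported as `sklearn`). It only verifies that each module can be found and that Python is at least 3.8. It does not check package versions. The helper name `missing_requirements` is hypothetical.

```python
import importlib.util
import sys

# Import names for the required packages; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["torch", "sklearn", "scipy", "scanpy", "h5py", "numpy", "pandas"]


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return a list of problems: an outdated Python and/or modules that cannot be found."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d+ required, found %d.%d"
                        % (*min_python, *sys.version_info[:2]))
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


print(missing_requirements() or "environment OK")
```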
git clone https://github.com/xianglin226/Multi-SC.git
To run the scHCAE for clustering with a specified number of clusters, use the following command:
python -u run_scMultiCluster3.py \
--n_clusters 6 \
--data_file GSE178707_neatseq_lane1.h5
To predict target genes regulated by TFs using scMF, run:
python -u run_scMF.py \
--data_file processedinput_scMF_lane1.h5
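The two commands can also be scripted from Python, for example to try several cluster counts. This sketch builds the same argument lists using only the `--n_clusters` and `--data_file` flags shown above and prints them as a dry run. It assumes the scripts are in the current working directory, such as the root of the cloned repository. The helper names `schcae_command`, `scmf_command` and `run`, and the sweep over cluster counts, are hypothetical.

```python
import shlex
import subprocess
import sys


def schcae_command(data_file, n_clusters):
    """Argument list for the scHCAE clustering script (flags as shown in this README)."""
    return [sys.executable, "-u", "run_scMultiCluster3.py",
            "--n_clusters", str(n_clusters), "--data_file", data_file]


def scmf_command(data_file):
    """Argument list for the scMF target-gene script (flags as shown in this README)."""
    return [sys.executable, "-u", "run_scMF.py", "--data_file", data_file]


def run(cmd, dry_run=True):
    """Print the shell-equivalent command; launch it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical sweep over cluster counts; dry run only prints the commands.
for k in (4, 6, 8):
    run(schcae_command("GSE178707_neatseq_lane1.h5", k))
run(scmf_command("processedinput_scMF_lane1.h5"))
```

Pass `dry_run=False` to `run` to launch the scripts once the printed commands look right.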
The example data can be accessed here:
- GSE178707_neatseq_lane1.h5
- GSE178707_neatseq_lane2.h5
- GSM5123951_TEAseq_well1.h5
Each of these files contains:
- X1: Gene expression data (RNA)
- X2: Protein expression data (ADT)
- X3: Chromatin accessibility data (ATAC)
- X4: Chromatin accessibility data (ATAC) mapped to gene features
- Genes: Gene features (rows of X1)
- ADT: Surface protein features (rows of X2)
- Peaks: Peak features (rows of X3)
- GeneFromPeaks: Gene features (rows of X4)
- Barcode: Cell barcodes
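Each feature list labels the rows of one matrix, so a loaded file can be sanity-checked by comparing shapes. This sketch assumes every field is a dense dataset at the top level of the file. It also assumes cells are the columns, one per entry in `Barcode`. Neither assumption is confirmed by this README. `check_layout` is a hypothetical helper, and the toy sizes are illustrative only.

```python
import numpy as np

# Assumed pairing of each data matrix with the feature list labelling its rows,
# following the field descriptions above.
ROW_LABELS = {"X1": "Genes", "X2": "ADT", "X3": "Peaks", "X4": "GeneFromPeaks"}


def check_layout(data):
    """Return a list of shape mismatches (empty if the layout is consistent).

    `data` is any mapping from field name to an array-like with a `.shape`,
    e.g. a dict of NumPy arrays or an open h5py.File holding dense datasets.
    Assumes features are rows and cells are columns, one column per barcode.
    """
    if "Barcode" not in data:
        return ["missing: Barcode"]
    n_cells = data["Barcode"].shape[0]
    issues = []
    for matrix_key, label_key in ROW_LABELS.items():
        missing = [key for key in (matrix_key, label_key) if key not in data]
        if missing:
            issues.append("missing: " + ", ".join(missing))
            continue
        n_rows, n_cols = data[matrix_key].shape[:2]
        n_labels = data[label_key].shape[0]
        if n_rows != n_labels:
            issues.append(f"{matrix_key} has {n_rows} rows but {label_key} has {n_labels} entries")
        if n_cols != n_cells:
            issues.append(f"{matrix_key} has {n_cols} columns but Barcode has {n_cells} entries")
    return issues


# Toy example with 3 cells and hypothetical feature counts.
toy = {"X1": np.zeros((5, 3)), "Genes": np.arange(5),
       "X2": np.zeros((2, 3)), "ADT": np.arange(2),
       "X3": np.zeros((4, 3)), "Peaks": np.arange(4),
       "X4": np.zeros((5, 3)), "GeneFromPeaks": np.arange(5),
       "Barcode": np.arange(3)}
print(check_layout(toy))  # prints []
```

With h5py installed, the same check can run directly on a file opened as `h5py.File("GSE178707_neatseq_lane1.h5", "r")`, provided the layout assumptions above hold.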
processedinput_scMF_lane1.h5
- B: ADT-to-cell matrix
- W: Gene-to-ADT matrix
- X: Cell-to-gene matrix
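Read together, these shapes are consistent with a factorization in which the cell-by-gene matrix X is approximated by (W B)^T. The sketch below assumes that relationship, holds B fixed, and estimates W by ridge regression. It then ranks genes by their weight in one ADT column. It only illustrates how the matrices fit together and is not the scMF model, whose objective and constraints are not described here. `estimate_gene_to_adt`, `top_targets` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def estimate_gene_to_adt(X, B, ridge=1e-3):
    """Ridge-regression estimate of W (genes x ADTs) such that X is close to (W @ B).T.

    X: cells x genes matrix; B: ADTs x cells matrix.
    Minimizing ||X - B.T @ W.T||^2 + ridge * ||W||^2 gives
    (B @ B.T + ridge * I) @ W.T = B @ X.
    """
    n_adt = B.shape[0]
    gram = B @ B.T + ridge * np.eye(n_adt)   # ADTs x ADTs
    rhs = B @ X                              # ADTs x genes
    return np.linalg.solve(gram, rhs).T      # genes x ADTs


def top_targets(W, genes, adt_index, n=10):
    """Return the n gene names with the largest weights in column adt_index of W."""
    order = np.argsort(W[:, adt_index])[::-1][:n]
    return [genes[i] for i in order]


# Synthetic check: recover a known W from noise-free data (hypothetical sizes).
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 50))        # 3 ADTs x 50 cells
W_true = rng.normal(size=(8, 3))    # 8 genes x 3 ADTs
X = (W_true @ B).T                  # 50 cells x 8 genes
W_hat = estimate_gene_to_adt(X, B, ridge=1e-8)
print(np.abs(W_hat - W_true).max())
```

A closed-form ridge solve keeps the sketch short and deterministic. The real model may add constraints such as nonnegativity or sparsity on W, which a plain least-squares solve does not enforce.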