- Create a new folder in `subworkflows/deconvolution`, containing:
    - A script which (minimally) takes as input a single-cell reference matrix, spatial expression matrix, and cell type annotation column, and returns as output a TSV file of the predicted proportions (spots in rows and cell types in columns)
    - A Nextflow process which calls this script
- Include the process in `subworkflows/deconvolution/run_methods.nf`
As an example, we will add a simple non-negative least squares (NNLS) regression algorithm to the pipeline. For R methods, we assume the single-cell reference is a Seurat object with a cell type annotation column, and that the spatial object is either a Seurat object or a synthetic dataset generated by synthspot (a named list with the expression matrix in `counts`).
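To make the core computation concrete, here is a minimal sketch of NNLS deconvolution in Python. It assumes a dense signature matrix (e.g., per-cell-type mean expression from the reference); the function name and variables are illustrative, not the pipeline's actual R implementation.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(signature: np.ndarray, spots: np.ndarray) -> np.ndarray:
    """signature: genes x cell_types; spots: genes x n_spots.

    Returns an n_spots x cell_types matrix of proportions (rows sum to 1).
    """
    props = np.zeros((spots.shape[1], signature.shape[1]))
    for j in range(spots.shape[1]):
        # Solve min ||signature @ x - spot|| subject to x >= 0
        coef, _ = nnls(signature, spots[:, j])
        total = coef.sum()
        # Normalize coefficients to proportions (guard against all-zero fits)
        props[j] = coef / total if total > 0 else coef
    return props
```

Because NNLS is solved independently per spot, the loop parallelizes trivially if needed.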
- Create a directory `subworkflows/deconvolution/nnls` containing:
    - `script_nf.R`: a script that runs NNLS. We use `R.utils::commandArgs` to parse command line arguments, with `sc_input` and `sp_input` the paths to the single-cell and spatial objects, and `annot` the name of the cell type annotation column. The script returns a TSV file of the spot x cell type proportion matrix. We recommend copying the template of results printing, because we 1) do not include rownames, 2) remove non-alphanumeric characters from cell types, and 3) shell sort the cell types.
    - `run_method.nf`: a Nextflow process that runs `script_nf.R`. For simple cases you can simply replace "nnls" with your method name. You can remove the `container` directive if you only wish to run it locally.
    - OPTIONAL: build a Docker container with your method (see `Dockerfile`)
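The three result-printing conventions can be paraphrased in Python as follows (the pipeline's own template is in R; this sketch only illustrates the expected output format, with a hypothetical `format_proportions` helper).

```python
import re


def format_proportions(props: dict[str, list[float]]) -> str:
    """props maps a cell type name to its per-spot proportions.

    Returns TSV text: a header of cleaned, sorted cell types (no rownames),
    then one row of proportions per spot.
    """
    # 2) remove non-alphanumeric characters from cell type names
    cleaned = {re.sub(r"[^a-zA-Z0-9]", "", name): vals for name, vals in props.items()}
    # 3) sort the cell types lexicographically (R's shell sort order)
    celltypes = sorted(cleaned)
    header = "\t".join(celltypes)
    n_spots = len(next(iter(cleaned.values())))
    # 1) no rownames: each row contains only the proportion values
    rows = ["\t".join(str(cleaned[ct][i]) for ct in celltypes) for i in range(n_spots)]
    return "\n".join([header] + rows)
```

Keeping the column order and naming deterministic is what lets the downstream metric computation match predicted columns against ground-truth cell types.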
- Add nnls to `subworkflows/deconvolution/run_methods.nf`:
    - In the `include` statement at the beginning of the file (`include { runNNLS } from './nnls/run_method.nf'`)
    - In the parameter `all_methods` (`all_methods = "music,rctd, ... ,dstg,nnls"`)
    - In the `runMethods` workflow:

      ```
      if ( methods =~ /nnls/ ){
          runNNLS(pair_input_ch)
          output_ch = output_ch.mix(runNNLS.out)
      }
      ```
- Test it out with:

  ```
  nextflow run main.nf --methods nnls -profile local \
      --sc_input unit-test/test_sc_data.rds --sp_input unit-test/test_sp_data.rds \
      --annot subclass
  ```
For Python methods, the workflow for simple algorithms like NNLS is exactly the same, except the inputs are expected to be h5ad files instead of Seurat objects. However, most Python methods make use of Bayesian probabilistic models (e.g., cell2location, stereoscope, and DestVI) and comprise separate model building and model fitting steps. Hence, you would need two scripts and two Nextflow processes.
Typically, the model building script (`build_model.py`) only takes the single-cell object and annotation column as input, while the model fitting script (`fit_model.py`) takes the model and the spatial dataset as input. The Nextflow processes in `run_method.nf` would minimally look something like:
```
process buildModel {
    input:
        path (sc_input)
    output:
        path (model)
    script:
        """
        python build_model.py $sc_input --annot $params.annot
        # Assume the script outputs a file called "model" containing the built model
        """
}

process fitModel {
    input:
        path (sp_input)
        path (model)
    output:
        path (output_props)
    script:
        """
        python fit_model.py $sp_input $model
        # Assume the script outputs a file called "output_props" containing output proportions
        """
}
```
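A matching `build_model.py` could start from a skeleton like the one below. This is a hypothetical sketch, not the pipeline's actual script: the CLI mirrors the `python build_model.py $sc_input --annot $params.annot` call above, and the "model" written here is just a placeholder for whatever your method actually trains and serializes.

```python
import argparse
import pickle


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Build a deconvolution model")
    parser.add_argument("sc_input", help="path to the single-cell h5ad file")
    parser.add_argument("--annot", required=True, help="cell type annotation column name")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: a real script would train the model on args.sc_input using
    # args.annot, then serialize the fitted model. The file must be literally
    # named "model", since that is what the buildModel process declares as output.
    with open("model", "wb") as f:
        pickle.dump({"sc_input": args.sc_input, "annot": args.annot}, f)
```

The script would end with the usual `if __name__ == "__main__": main()` entry point; `fit_model.py` would follow the same pattern with `sp_input` and `model` as positional arguments.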
Note: in the cell2location/stereoscope/DestVI processes you will instead see `input: tuple path (sp_input), path (sp_input_rds)`. This is because, although we internally convert the RDS file to an h5ad file, the original RDS file is still needed for metric computation. You will need to follow this format when implementing your own method.
Then, you can also add the method in `subworkflows/deconvolution/run_methods.nf`:
- In the `include` statement at the beginning of the file (`include { runMethod } from './method_name/run_method.nf'`)
- In the parameter `all_methods` (`all_methods = "music,rctd, ... ,dstg,nnls,method_name"`)
- In `python_methods` (`python_methods = ['stereoscope', ... 'method_name']`)
- In the `runMethods` workflow
Adding your process to the `runMethods` workflow is slightly more complicated than in the R case, since we want to be able to run multiple spatial datasets while building the model only once. In Nextflow, a channel can only be consumed once, so we need to replicate the model channel once per spatial dataset. As an example, here is how this was done for cell2location:

```
buildCell2locationModel(sc_input_conv)              // build the model using the single-cell dataset

// Repeat the model output for each spatial file
buildCell2locationModel.out                         // .out refers to the model file
    .combine(sp_input_pair)                         // "combine" (cartesian product) the model channel with each spatial input channel
    .multiMap { model_sc_file, sp_file_h5ad, sp_file_rds ->  // remap the combined channel of three components
        model: model_sc_file                        // the model file is accessed via "model"
        sp_input: tuple sp_file_h5ad, sp_file_rds } // the spatial files are grouped as a tuple, accessed via "sp_input"
    .set{ c2l_combined_ch }                         // name this channel c2l_combined_ch

fitCell2locationModel(c2l_combined_ch.sp_input,     // fit the model using the sp_input tuple and the model file
                      c2l_combined_ch.model)
formatC2L(fitCell2locationModel.out)                // format the TSV file output by cell2location
output_ch = output_ch.mix(formatC2L.out)
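If the channel operators are unfamiliar, the combine/multiMap pattern can be illustrated without Nextflow: pair one built model with every spatial dataset via a cartesian product, then split the combined tuples back into parallel "model" and "sp_input" streams. The file names below are made up for illustration.

```python
from itertools import product

models = ["model_sc"]  # one model per single-cell reference
spatial = [("ds1.h5ad", "ds1.rds"), ("ds2.h5ad", "ds2.rds")]

# ~ buildCell2locationModel.out.combine(sp_input_pair): cartesian product
combined = [(m, h5ad, rds) for m, (h5ad, rds) in product(models, spatial)]

# ~ .multiMap { ... }: split the combined tuples into two parallel streams
model_ch = [m for m, _, _ in combined]                    # ~ c2l_combined_ch.model
sp_input_ch = [(h5ad, rds) for _, h5ad, rds in combined]  # ~ c2l_combined_ch.sp_input
```

The single model thus appears once per spatial dataset, so each `fitModel` invocation receives a matching (model, spatial input) pair.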