Merge branch 'main' of github.com:HusseinLakkis01/scCoAnnotate into main
Hussein Lakkis committed Jul 13, 2022
2 parents 5bf8960 + 8c6e900 commit 56c48cd
Showing 1 changed file with 72 additions and 18 deletions.
90 changes: 72 additions & 18 deletions README.md
@@ -1,9 +1,8 @@
# scCoAnnotate <img src ="https://user-images.githubusercontent.com/59002771/130340419-3d1eff0b-ecb2-4104-9bf4-1bb968aff433.png" width="50" height="50">
# Summary

scCoAnnotate predicts cell types from scRNA-seq data using a fast and efficient pipeline that increases automation and reduces the need to run several scripts and experiments. The user selects which single-cell projection tools to run on a chosen reference in order to annotate a list of query datasets. The pipeline then outputs a consensus of the predictions across the selected tools. Classifiers are trained on the genes common to the reference and all query datasets.
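
The README does not state how the consensus is computed. As a rough illustration only, the sketch below assumes a simple majority vote over the per-tool labels for one cell; the function name `consensus_label` and the `"Unsure"` tie label are hypothetical and are not taken from the pipeline.

```python
from collections import Counter


def consensus_label(predictions, tie_label="Unsure"):
    """Return the most frequent label among per-tool predictions for one cell.

    `predictions` maps a tool name to its predicted label. Majority voting and
    the `tie_label` fallback are assumptions made for illustration only.
    """
    counts = Counter(predictions.values())
    ranked = counts.most_common(2)
    if not ranked:
        # No tool produced a prediction for this cell.
        return tie_label
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        # The two leading labels have the same number of votes.
        return tie_label
    return ranked[0][0]


# Hypothetical predictions for a single cell from three tools.
cell_predictions = {"SingleR": "Astrocyte", "SciBet": "Astrocyte", "CHETAH": "Neuron"}
print(consensus_label(cell_predictions))
```

A different rule (for example weighting tools, or requiring a minimum level of agreement) would only change the body of `consensus_label`.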

The pipeline also features parallelization options to exploit the computational resources available.

@@ -30,7 +29,11 @@ Current version of snakemake is snakemake/5.32.0

Using snakemake is straightforward. The rules and processes are arranged as per this rule graph:

<img width="758" align = 'center' alt="rule_graph" src="https://user-images.githubusercontent.com/59002771/130340625-1239a7ec-dfd5-4005-aa90-c65ada201886.png">
Rule `preprocess` finds the genes common to the reference and all query datasets and creates temporary reference and query datasets restricted to those genes. Rule `concat` appends all predictions into one tab-separated file (`prediction_summary.tsv`) and computes the consensus prediction.


![save_as_a_png](https://user-images.githubusercontent.com/59002771/178054140-e7129733-6a8f-4819-8162-c29b3954d303.png)
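
The common-gene step performed by `preprocess` can be pictured with the minimal sketch below. It is not the pipeline's own code: it assumes each CSV has genes as column headers with a first column holding cell identifiers, and the helper names `read_gene_names` and `common_genes` are hypothetical.

```python
import csv
import io


def read_gene_names(csv_text):
    """Return gene names from the header row, skipping the first column.

    Assumes the first column holds cell identifiers (an assumption, not
    something the README specifies).
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[1:]


def common_genes(reference_genes, query_gene_lists):
    """Return genes found in the reference and in every query, in reference order."""
    shared = set(reference_genes)
    for genes in query_gene_lists:
        shared &= set(genes)
    return [gene for gene in reference_genes if gene in shared]


# Tiny in-memory stand-ins for the reference and two query expression files.
reference_csv = "cell,GeneA,GeneB,GeneC\nc1,1,0,3\n"
query_csvs = [
    "cell,GeneB,GeneC,GeneD\nq1,2,2,0\n",
    "cell,GeneC,GeneB\nq2,5,1\n",
]

shared = common_genes(
    read_gene_names(reference_csv),
    [read_gene_names(text) for text in query_csvs],
)
print(shared)
```

Restricting every dataset to the same gene list is what allows one trained classifier to be applied to all query datasets.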




@@ -43,9 +46,13 @@ snakemake --use-conda --configfile config.yml --cores 3
## Config File:
```yaml
output_dir: <path to outputs directory>
reference: <path to reference csv file with counts per cell, genes as columns and cells as rows>
reference: <path to reference csv file with RAW counts per cell, genes as columns and cells as rows>
labfile: <csv with labels per cell, the column header for the labels should be "label">
test: <path to test csv file with counts per cell, genes as columns and cells as rows>
test:
- <path to test csv file 1 with RAW counts per cell, genes as columns and cells as rows>
- <path to test csv file 2 with RAW counts per cell, genes as columns and cells as rows>
# ... one entry per additional test file
rejection: <whether or not to reject poorly classified cells by SVM, default is True>
tools_to_run: # List of tools to run
- <tool 1>
@@ -56,17 +63,28 @@ tools_to_run: # List of tools to run
### An Example Config is attached
```yaml

output_dir: Results
reference: /project/kleinman/hussein.lakkis/from_hydra/2021_01_07-Cross_Validation_and_Benchmark/2021_04_05-SVM_and_SVMrej/data/scRNAseq_Benchmark_datasets/Joint_Mouse/joint_mouse.training.csv
labfile: /project/kleinman/hussein.lakkis/from_hydra/2021_01_07-Cross_Validation_and_Benchmark/2021_04_05-SVM_and_SVMrej/data/scRNAseq_Benchmark_datasets/Joint_Mouse/full_labels.csv
test: /project/kleinman/zahedeh.bashardanesh/from_beluga/2020-11_MANAV/data/S-10068_28741/expr.csv
rejection: "True"
output_dir: /project/kleinman/hussein.lakkis/from_hydra/test
reference: /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/reference/reference.csv
labfile: /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/reference/labels.csv
test:
- /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/BT2016062/expression.csv
- /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/BT2018022/expression.csv
- /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1190_S-1197/expression.csv
- /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1569_S-1569/expression.csv
- /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1694_S-1694_multiome/expression.csv
- /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1701_S-1701_multiome/expression.csv
rejection: True
tools_to_run:
- correlation
- SciBet
- scmapcell
- scmapcluster
- ACTINN
- SVM_reject
- SingleCellNet
- SciBet
- scHPL
- correlation
- CHETAH
- correlation
```
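
Because `test` changes from a single path to a list of paths, it can help to check a config before launching a run. The sketch below is one possible check, not part of the pipeline: it works on a config that has already been loaded into a Python dictionary, and the name `validate_config` and the rule that `rejection` may be a boolean or the strings "True"/"False" are assumptions.

```python
REQUIRED_KEYS = ("output_dir", "reference", "labfile", "test", "tools_to_run")


def validate_config(config):
    """Return a list of problems found in `config`; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # `test` and `tools_to_run` are written as YAML lists in the config template.
    for key in ("test", "tools_to_run"):
        value = config.get(key)
        if value is not None and (not isinstance(value, list) or not value):
            problems.append(f"{key} must be a non-empty list")

    # Assumption: booleans and the quoted strings "True"/"False" are both accepted.
    rejection = config.get("rejection", True)
    if rejection not in (True, False, "True", "False"):
        problems.append("rejection must be True or False")

    return problems


# Hypothetical config that still uses the old single-path form of `test`.
old_style = {
    "output_dir": "Results",
    "reference": "reference.csv",
    "labfile": "labels.csv",
    "test": "expression.csv",
    "tools_to_run": ["SingleR"],
}
print(validate_config(old_style))
```

Loading the YAML file itself would need a YAML parser, which is left out here to keep the sketch free of third-party dependencies.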
## Submission File:
@@ -75,7 +93,7 @@ An example of the submission file is also available in this repository and is ca
``` bash
#!/usr/bin/bash
#PBS -N Snakemake)_Pipeline
#PBS -N scCoAnnotate
#PBS -o logs/out.txt
#PBS -e logs/err.txt
#PBS -l walltime=20:00:00
@@ -109,8 +127,10 @@ snakemake --use-conda --configfile config.yml --cores 3
6. SVM Rejection
7. [SingleR](https://bioconductor.org/packages/release/bioc/html/SingleR.html)
8. [SingleCellNet](https://github.com/pcahan1/singleCellNet)

and many tools such as scMap Cell and my own classifier are being tested to be integrated in the pipeline.
9. [CHETAH](https://www.bioconductor.org/packages/release/bioc/html/CHETAH.html)
10. [scHPL](https://github.com/lcmmichielsen/scHPL)
11. [scPred](https://github.com/powellgenomicslab/scPred)
12. [scmap (cell and cluster)](https://bioconductor.org/packages/release/bioc/html/scmap.html)



@@ -130,11 +150,17 @@ pandas==1.1.5
numpy==1.19.5
numpy-groupies==0.9.13
numpydoc==1.1.0
scHPL==0.0.2
```

## R Libraries:

```
scPred == 1.9.2
SingleCellExperiment == 1.12.0
SummarizedExperiment == 1.20.0
CHETAH == 1.6.0
scmap == 1.12.0
singleCellNet == 0.1.0
scibet == 1.0
SingleR == 1.4.1
@@ -146,3 +172,31 @@ ggsci == 2.9
tidyverse == 1.3.1
```
# Adding New Tools:

To add a new tool, add a rule based on this template to the Snakefile:

``` python
# Template: replace every {tool_name} below with the name of the new tool.
rule {tool_name}:
    input:
        reference = "{output_dir}/expression.csv".format(output_dir = config['output_dir']),
        labfile = config["labfile"],
        test = expand("{output_dir}/{sample}/expression.csv", sample = samples, output_dir = config['output_dir']),
        output_dir = expand("{output_dir}/{sample}", sample = samples, output_dir = config['output_dir'])
    output:
        pred = expand("{output_dir}/{sample}/{tool_name}/{tool_name}_pred.csv", sample = samples, output_dir = config["output_dir"]),
        test_time = expand("{output_dir}/{sample}/{tool_name}/{tool_name}_test_time.csv", sample = samples, output_dir = config["output_dir"]),
        training_time = expand("{output_dir}/{sample}/{tool_name}/{tool_name}_training_time.csv", sample = samples, output_dir = config["output_dir"])
    log: expand("{output_dir}/{sample}/{tool_name}/{tool_name}.log", sample = samples, output_dir = config["output_dir"])
    shell:
        # The tool script receives the reference, labels, test files and output directories.
        "Rscript Scripts/run_{tool_name}.R "
        "--ref {input.reference} "
        "--labs {input.labfile} "
        "--test {input.test} "
        "--output_dir {input.output_dir} "
        "&> {log}"
```
The tool script you add must generate output files that match the `output` section of the rule.
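
To make the expected file layout concrete, the sketch below lists the paths a new tool script would need to write, following the path patterns in the rule template. The helper `expected_outputs`, the tool name `myTool`, and the sample name `sampleA` are hypothetical; how the Snakefile derives sample names is not shown in this README.

```python
from pathlib import PurePosixPath

# File suffixes taken from the `output` section of the rule template.
OUTPUT_SUFFIXES = ("pred", "test_time", "training_time")


def expected_outputs(output_dir, samples, tool_name):
    """Return the CSV paths the rule template declares for one tool."""
    paths = []
    for suffix in OUTPUT_SUFFIXES:
        for sample in samples:
            path = PurePosixPath(output_dir, sample, tool_name, f"{tool_name}_{suffix}.csv")
            paths.append(str(path))
    return paths


# Hypothetical tool and sample names, for illustration only.
for path in expected_outputs("Results", ["sampleA"], "myTool"):
    print(path)
```

Comparing this list with the files actually present after a trial run is a quick way to see why Snakemake might report missing outputs for a newly added rule.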


