Merge branch 'main' of github.com:HusseinLakkis01/scCoAnnotate into main

HusseinLakkis01 · Jul 13, 2022 · 56c48cd
2 parents 5bf8960 + 8c6e900
commit 56c48cd
Showing 1 changed file with 72 additions and 18 deletions.
diff --git a/README.md b/README.md
@@ -1,9 +1,8 @@
 # scCoAnnotate <img src ="https://user-images.githubusercontent.com/59002771/130340419-3d1eff0b-ecb2-4104-9bf4-1bb968aff433.png" width="50" height="50">
-<<<<<<< HEAD
-=======
 
-scRNA seq based Prediction of cell-types using a fast and efficient pipeline to increase automation and reduce the need to run several scripts and experiments. The pipeline allows the user to select what single-cell projection tools they want to run on a selected reference to annotate a list of query datasets.
->>>>>>> 4ca4f4307d12d050953dc5c184efcbf2f0facbab
+# Summary
+
+scRNA-seq based prediction of cell types using a fast and efficient pipeline that increases automation and reduces the need to run several scripts and experiments. The pipeline lets the user select which single-cell projection tools to run on a selected reference to annotate a list of query datasets. It then outputs a consensus of the predictions across the selected tools. The pipeline trains classifiers on the genes common to the reference and all query datasets.
 
 The pipeline also features parallelization options to exploit the computational resources available. 
 
@@ -30,7 +29,11 @@ Current version of snakemake is snakemake/5.32.0
 
 Using snakemake is straightforward. The rules and processes are arranged as per this rule graph:
 
-<img width="758" align = 'center' alt="rule_graph" src="https://user-images.githubusercontent.com/59002771/130340625-1239a7ec-dfd5-4005-aa90-c65ada201886.png">
+Rule `preprocess` gets the common genes and creates temporary reference and query datasets based on those common genes. Rule `concat` appends all predictions into one tab-separated file (`prediction_summary.tsv`) and gets the consensus prediction.
+
+
+![save_as_a_png](https://user-images.githubusercontent.com/59002771/178054140-e7129733-6a8f-4819-8162-c29b3954d303.png)
+
 
 
 
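The `preprocess` and `concat` steps described in the hunk above can be illustrated in plain Python. This is a minimal sketch, not the pipeline's actual code: the function names `common_genes` and `consensus`, the majority-vote rule, and the `"Unsure"` tie label are all assumptions made for illustration, and the real pipeline may define its consensus differently.

```python
from collections import Counter


def common_genes(reference_genes, query_gene_lists):
    """Return genes found in the reference and in every query, in reference order."""
    shared = set(reference_genes)
    for genes in query_gene_lists:
        shared &= set(genes)
    return [gene for gene in reference_genes if gene in shared]


def consensus(predictions, tie_label="Unsure"):
    """Majority vote over one cell's per-tool predictions (hypothetical rule).

    `predictions` maps tool name -> predicted label for a single cell.
    Ties and empty inputs yield `tie_label`, an assumption of this sketch.
    """
    counts = Counter(predictions.values()).most_common()
    if not counts:
        return tie_label
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]
```

Restricting every dataset to the shared genes first means each tool sees the same feature set, so the per-tool predictions being voted on are comparable.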
@@ -43,9 +46,13 @@ snakemake --use-conda --configfile config.yml --cores 3
 ##  Config File:
 ```yaml 
 output_dir: <path to outputs directory>
-reference: <path to reference csv file with counts per cell, genes as columns and cells as rows>
+reference: <path to reference csv file with RAW counts per cell, genes as columns and cells as rows>
 labfile: <csv with labels per cell, the column header for the labels should be "label">
-test: <path to test csv file with counts per cell, genes as columns and cells as rows>
+test:
+  - <path to test csv file 1 with RAW counts per cell, genes as columns and cells as rows>
+  - <path to test csv file 2 with RAW counts per cell, genes as columns and cells as rows>
+  # ...
+  # add one list entry per additional test csv file
 rejection: <whether or not to reject poorly classified cells by SVM, default is True>
 tools_to_run: # List of tools to run
   - <tool 1>
@@ -56,17 +63,28 @@ tools_to_run: # List of tools to run
 ### An Example Config is attached 
 
 ```yaml 
-
-output_dir: Results
-reference: /project/kleinman/hussein.lakkis/from_hydra/2021_01_07-Cross_Validation_and_Benchmark/2021_04_05-SVM_and_SVMrej/data/scRNAseq_Benchmark_datasets/Joint_Mouse/joint_mouse.training.csv
-labfile: /project/kleinman/hussein.lakkis/from_hydra/2021_01_07-Cross_Validation_and_Benchmark/2021_04_05-SVM_and_SVMrej/data/scRNAseq_Benchmark_datasets/Joint_Mouse/full_labels.csv
-test: /project/kleinman/zahedeh.bashardanesh/from_beluga/2020-11_MANAV/data/S-10068_28741/expr.csv
-rejection: "True"
+output_dir: /project/kleinman/hussein.lakkis/from_hydra/test
+reference: /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/reference/reference.csv
+labfile: /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/reference/labels.csv
+test:
+      - /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/BT2016062/expression.csv
+      - /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/BT2018022/expression.csv
+      - /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1190_S-1197/expression.csv
+      - /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1569_S-1569/expression.csv
+      - /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1694_S-1694_multiome/expression.csv
+      - /projects/kleinman/hussein.lakkis/from_hydra/Collab/HGG_Selin_Revision/test/P-1701_S-1701_multiome/expression.csv
+rejection: True
 tools_to_run:
-      - correlation
-      - SciBet
+      - scmapcell
+      - scmapcluster
       - ACTINN
       - SVM_reject
+      - SingleCellNet
+      - SciBet
+      - scHPL
+      - correlation
+      - CHETAH
+      - correlation
 ```
 
 ## Submission File:
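With `test` now a list and `tools_to_run` growing, a small sanity check on the parsed config can catch mistakes before Snakemake starts. The sketch below operates on an already-loaded config dictionary. The helper name `normalize_config`, the choice of required keys, and the decision to accept a single `test` path are assumptions of this example, not behavior of the pipeline.

```python
# Keys taken from the config template; treating them all as required is an assumption.
REQUIRED_KEYS = ("output_dir", "reference", "labfile", "test", "tools_to_run")


def normalize_config(config):
    """Check required keys and return a copy with list-valued, de-duplicated entries."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing required keys: {missing}")

    normalized = dict(config)
    # Accept a single path as well as a list of paths for `test`.
    tests = config["test"]
    normalized["test"] = [tests] if isinstance(tests, str) else list(tests)
    # dict.fromkeys keeps the first occurrence of each tool, preserving order.
    normalized["tools_to_run"] = list(dict.fromkeys(config["tools_to_run"]))
    return normalized
```

De-duplicating `tools_to_run` is relevant here because a tool name repeated in the list (as `correlation` is in the example config) would otherwise be requested twice.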
@@ -75,7 +93,7 @@ An example of the submission file is also available in this repository and is ca
 
 ``` bash 
 #!/usr/bin/bash
-#PBS -N Snakemake)_Pipeline
+#PBS -N scCoAnnotate
 #PBS -o logs/err.txt
 #PBS -e logs/out.txt
 #PBS -l walltime=20:00:00
@@ -109,8 +127,10 @@ snakemake --use-conda --configfile config.yml --cores 3
 6. SVM Rejection
 7. [SingleR](https://bioconductor.org/packages/release/bioc/html/SingleR.html)
 8. [SingleCellNet](https://github.com/pcahan1/singleCellNet)
-
-and many tools such as scMap Cell and my own classifier are being tested to be integrated in the pipeline.
+9. [CHETAH](https://www.bioconductor.org/packages/release/bioc/html/CHETAH.html)
+10. [scHPL](https://github.com/lcmmichielsen/scHPL)
+11. [scPred](https://github.com/powellgenomicslab/scPred)
+12. [scmap (cell and cluster)](https://bioconductor.org/packages/release/bioc/html/scmap.html)
 
 
 
@@ -130,11 +150,17 @@ pandas==1.1.5
 numpy==1.19.5
 numpy-groupies==0.9.13
 numpydoc==1.1.0
+scHPL==0.0.2
 ```
 
 ## R Libraries:
 
 ```
+scPred == 1.9.2
+SingleCellExperiment == 1.12.0
+SummarizedExperiment == 1.20.0
+CHETAH == 1.6.0
+scmap == 1.12.0
 singleCellNet == 0.1.0
 scibet == 1.0
 SingleR == 1.4.1
@@ -146,3 +172,31 @@ ggsci == 2.9
 tidyverse == 1.3.1
 ```
 # Adding New Tools:
+
+To add a new tool, add a rule based on this template to the Snakefile, replacing `{tool_name}` with the name of the tool:
+
+``` python
+rule {tool_name}:
+  input:
+    reference = "{output_dir}/expression.csv".format(output_dir =config['output_dir']),
+    labfile = config["labfile"],
+    test = expand("{output_dir}/{sample}/expression.csv",sample = samples,output_dir=config['output_dir']),
+    output_dir =  expand("{output_dir}/{sample}",sample = samples,output_dir=config['output_dir'])
+
+  output:
+    pred = expand("{output_dir}/{sample}/{tool_name}/{tool_name}_pred.csv", sample  = samples,output_dir=config["output_dir"]),
+    test_time = expand("{output_dir}/{sample}/{tool_name}/{tool_name}_test_time.csv",sample  = samples,output_dir=config["output_dir"]),
+    training_time = expand("{output_dir}/{sample}/{tool_name}/{tool_name}_training_time.csv",sample  = samples,output_dir=config["output_dir"])
+  log: expand("{output_dir}/{sample}/{tool_name}/{tool_name}.log", sample = samples,output_dir=config["output_dir"])
+  shell:
+    "Rscript Scripts/run_{tool_name}.R "
+    "--ref {input.reference} "
+    "--labs {input.labfile} "
+    "--test {input.test} "
+    "--output_dir {input.output_dir} "
+    "&> {log}"
+```
+The tool script you add must generate outputs that match the outputs declared in the rule.
+
+
+
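To make the output contract of the rule template concrete, the following sketch lists the files a new tool's script is expected to produce, following the `output:` patterns in the template. The helper names `expected_outputs` and `missing_outputs` are hypothetical and are not part of the repository. The contents of each file are not specified in the template, so they are not checked here.

```python
from pathlib import Path


def expected_outputs(output_dir, samples, tool_name):
    """List the files the rule template declares for `tool_name`, per sample."""
    suffixes = ("pred.csv", "test_time.csv", "training_time.csv")
    return [
        Path(output_dir) / sample / tool_name / f"{tool_name}_{suffix}"
        for sample in samples
        for suffix in suffixes
    ]


def missing_outputs(output_dir, samples, tool_name):
    """Return the declared output files that do not exist on disk."""
    return [
        path
        for path in expected_outputs(output_dir, samples, tool_name)
        if not path.is_file()
    ]
```

Running `missing_outputs` after a manual test of a new script gives a quick check that every sample directory received its prediction and timing files before the rule is wired into the pipeline.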