CCBR · kelly-sovacool · Nov 23, 2024 · Nov 22, 2024 · Nov 22, 2024 · Nov 23, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,7 +1,5 @@
 ## CHARLIE development version
 
-### bug fixes
-
 - CHARLIE was falsely throwing a file permissions error for tempdir values containing bash variables. (#118, @kelly-sovacool)
 - Singularity bind paths were not being set properly. (#119, @kelly-sovacool)
 - Update docker containers to set `$PYTHONPATH`. (#119, #125, @kelly-sovacool)
@@ -10,6 +8,7 @@
 - Fix `reconfig` to correctly replace variables in the config file. (#121, @kelly-sovacool)
 - Prevent using excessive memory when copying reference files. (#126, @kelly-sovacool)
 - Fix missing output files due to file system latency and use real (absolute) paths where possible. (#130, @kelly-sovacool)
+- Update documentation to reflect Biowulf usage and the improved test dataset. (#132, @kelly-sovacool)
 
 ## CHARLIE 0.11.0
 

diff --git a/README.md b/README.md
@@ -65,6 +65,14 @@ For complete documentation, view the website <https://CCBR.github.io/CHARLIE/>.
 
 ### 3. Software Dependencies
 
+CHARLIE is already installed on Biowulf.
+It is included in the ccbrpipeliner module from release 7 onward.
+To load the module, run:
+
+```bash
+module load ccbrpipeliner/7
+```
+
 The following version of various bioinformatics tools are using within CHARLIE:
 
 | tool          | version  |
@@ -97,7 +105,7 @@ The following version of various bioinformatics tools are using within CHARLIE:
 ### 4. Usage
 
 ```bash
- % ./charlie
+charlie
 
 
 ##########################################################################################
@@ -148,7 +156,7 @@ VIRUSES:
 ##########################################################################################
 
 USAGE:
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>
+  charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>
 
 Required Arguments:
 1.  WORKDIR     : [Type: String]: Absolute or relative path to the output folder with write permissions.
@@ -177,17 +185,17 @@ Optional Arguments:
 
 
 Example commands:
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=init
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=dryrun
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=run
+  charlie -w=/my/output/folder -m=init
+  charlie -w=/my/output/folder -m=dryrun
+  charlie -w=/my/output/folder -m=run
 
 ##########################################################################################
 
 VersionInfo:
-  python          : 3.7
-  snakemake       : 7.19.1
-  pipeline_home   : /vf/users/Ziegelbauer_lab/Pipelines/circRNA/activeDev
-  git commit/tag  : 1ae5ca091976364369784f67adffbbbf1dcdb7d5    v0.8-197-g1ae5ca0
+  python          : 3
+  snakemake       : 7
+  pipeline_home   : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
+  git commit/tag  : 613fb617f1ed426fb8900f98e599ca0497a67cc0    v0.11.0-49-g613fb61
 
 ##########################################################################################
 ```
@@ -230,7 +238,7 @@ This will create the folder provided by `-w=`. The user should have write permis
 
 #### Dry-run
 
-Test data (1 paired-end subsample and 1 single-end subsample) have been including under the `.tests/dummy_fastqs` folder. After running in `-m=init`, `samples.tsv` should be edited to point the copies of the above mentioned samples with the column headers:
+Test data (1 paired-end subsample and 1 single-end subsample) have been included under the `/data/CCBR_Pipeliner/testdata/circRNA/human` folder. After running in `-m=init`, `samples.tsv` should be edited to point to copies of these samples, using the column headers:
 
 - sampleName
 - path_to_R1_fastq
@@ -302,14 +310,15 @@ Running...
 
 ##### 6.1 Test Data
 
-The `.tests/dummy_fastqs` folder in the repo has test dataset:
+The `/data/CCBR_Pipeliner/testdata/circRNA/human` folder on Biowulf contains the test dataset:
 
 ```bash
-% tree .tests/dummy_fastqs
-.tests/dummy_fastqs
-├── GI1_N.R1.fastq.gz
-├── GI1_N.R2.fastq.gz
-└── GI1_T.R1.fastq.gz
+tree /data/CCBR_Pipeliner/testdata/circRNA/human
+/data/CCBR_Pipeliner/testdata/circRNA/human
+├── GI1_N_ss.R1.fastq.gz
+├── GI1_N_ss.R2.fastq.gz
+├── GI1_T_ss.R1.fastq.gz
+└── samples.tsv
 ```
 
 `GI1_N` is a PE sample while `GI1_T` is a SE sample.
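
A minimal sketch of how the sample sheet for this test dataset could be written by hand, assuming the three tab-delimited columns listed in the Dry-run section (`sampleName`, `path_to_R1_fastq`, `path_to_R2_fastq`). The sample names `GI1_N` and `GI1_T`, the `FASTQ_DIR` and `SAMPLES_TSV` variables, and the output location are illustrative assumptions, not part of CHARLIE itself; the `samples.tsv` shipped in the test data folder may differ.

```bash
#!/usr/bin/env bash
# Sketch only: builds a samples.tsv for the test FASTQs.
# FASTQ_DIR and SAMPLES_TSV are hypothetical variables used for illustration.
set -euo pipefail

# Assumed location of the test FASTQs (or of your own copies of them).
fastq_dir="${FASTQ_DIR:-/data/CCBR_Pipeliner/testdata/circRNA/human}"
# Assumed output file; in practice this is the samples.tsv created by -m=init.
out="${SAMPLES_TSV:-samples.tsv}"

{
  # Fixed, tab-delimited header.
  printf 'sampleName\tpath_to_R1_fastq\tpath_to_R2_fastq\n'
  # Paired-end sample: both R1 and R2 columns are filled in.
  printf 'GI1_N\t%s\t%s\n' "$fastq_dir/GI1_N_ss.R1.fastq.gz" "$fastq_dir/GI1_N_ss.R2.fastq.gz"
  # Single-end sample: the R2 column is left blank.
  printf 'GI1_T\t%s\t\n' "$fastq_dir/GI1_T_ss.R1.fastq.gz"
} > "$out"
```

Leaving the third column empty, rather than dropping it, keeps every row at three tab-separated fields, which matches the description of single-end samples in the sample sheet.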

diff --git a/docs/tutorial.md b/docs/tutorial.md
@@ -4,52 +4,22 @@
 
 - [Biowulf](https://hpc.nih.gov/) account: Biowulf account can be requested [here](https://hpc.nih.gov/docs/accounts.html).
 
-- Membership to Ziegelbauer user group on Biowulf. You can check this by typing the following command:
+#### Installation
 
-  ```bash
-  % groups
-  ```
-
-output:
-
-```bash
-CCBR kopardevn Ziegelbauer_lab
-```
-
-If `Ziegelbauer_lab` is not listed then you can email a request to be added to the groups [here](mailto:staff@hpc.nih.gov)
-
-#### Location
-
-Different versions of circRNA DAQ pipeline have been parked at `/data/Ziegelbauer_lab/Pipelines/circRNA`
+CHARLIE is already installed on Biowulf.
+It is included in the ccbrpipeliner module from release 7 onward.
+To load the module, run:
 
 ```bash
-% ls /data/Ziegelbauer_lab/Pipelines/circRNA
+module load ccbrpipeliner/7
 ```
 
-output:
-
-```bash
-v0.1.0
-v0.10.0
-v0.10.0-dev
-v0.2.1
-v0.3.3
-v0.4.2
-v0.5.2
-v0.6.5
-v0.7.0
-v0.8
-v0.9.0
-```
-
-The exacts versions listed here may changed as newer versions are added. Also, the `dev` version is pointing to the most recent untagged version of the pipeline (use at own risk!)
-
 #### Init
 
 To get help about the pipeline you can run:
 
 ```bash
-% bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie
+charlie --help
 ```
 
 output:
@@ -76,7 +46,7 @@ Please contact Vishal Koparde for comments/questions (vishal.koparde@nih.gov)
 
 ##########################################################################################
 
-CHARLIE can be used to DAQ(Detect/Annotate/Quantify) circRNAs in hosts and viruses.
+CHARLIE can be used to DAQ (Detect/Annotate/Quantify) circRNAs in hosts and viruses.
 
 Here is the list of hosts and viruses that are currently supported:
 
@@ -104,7 +74,7 @@ VIRUSES:
 ##########################################################################################
 
 USAGE:
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>
+  charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>
 
 Required Arguments:
 1.  WORKDIR     : [Type: String]: Absolute or relative path to the output folder with write permissions.
@@ -133,17 +103,17 @@ Optional Arguments:
 
 
 Example commands:
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=init
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=dryrun
-  bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=run
+  charlie -w=/my/output/folder -m=init
+  charlie -w=/my/output/folder -m=dryrun
+  charlie -w=/my/output/folder -m=run
 
 ##########################################################################################
 
 VersionInfo:
-  python          : 3.7
-  snakemake       : 7.19.1
-  pipeline_home   : /vf/users/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev
-  git commit/tag  : b2cf2f089788651041b16bf4378c2c5172c13cb2    v0.10.0-2-gb2cf2f0
+  python          : 3
+  snakemake       : 7
+  pipeline_home   : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
+  git commit/tag  : 613fb617f1ed426fb8900f98e599ca0497a67cc0    v0.11.0-49-g613fb61
 
 ##########################################################################################
 ```
@@ -154,15 +124,15 @@ VersionInfo:
 To initial the working directory run:
 
 ```bash
-% bash <path to charlie> -w=<path to output dir> -m=init
+charlie -w=<path to output dir> -m=init
 ```
 
 This assumes that `<path to output dir>` does not exist before running the above command and is at a location where write permissions are available.
 
 The above command creates `<path to output dir>` folder and creates 2 subfolders `logs` and `stats` inside that folder along with `config.yaml` and `samples.tsv` files.
 
 ```bash
-% tree <path to output dir>
+tree <path to output dir>
 ```
 
 ##### config.yaml
@@ -188,14 +158,15 @@ Tab delimited definition of sample sheet. The header is fixed and each row repre
 2. path_to_R1_fastq = absolute path to the read1 fastq.gz file.
 3. path_to_R2_fastq = absolute path to the read2 fastq.gz file. If the sample was sequenced in single-end mode, then leave this blank.
 
-The `.tests/dummy_fastqs` folder in the repo has test dataset:
+The `/data/CCBR_Pipeliner/testdata/circRNA/human` folder on Biowulf contains the test dataset:
 
 ```bash
-% tree .tests/dummy_fastqs
-.tests/dummy_fastqs
-├── GI1_N.R1.fastq.gz
-├── GI1_N.R2.fastq.gz
-└── GI1_T.R1.fastq.gz
+tree /data/CCBR_Pipeliner/testdata/circRNA/human
+/data/CCBR_Pipeliner/testdata/circRNA/human
+├── GI1_N_ss.R1.fastq.gz
+├── GI1_N_ss.R2.fastq.gz
+├── GI1_T_ss.R1.fastq.gz
+└── samples.tsv
 ```
 
 `GI1_N` is a PE sample while `GI1_T` is a SE sample.
@@ -205,7 +176,7 @@ The `.tests/dummy_fastqs` folder in the repo has test dataset:
 Once the `samples.tsv` file has been edited appropriately to include the desired samples, it is a good idea to **dryrun** the pipeline to ensure that everything will work as desired. Dryrun can be run as follows:
 
 ```bash
-bash <path to charlie> -w=<path to output dir> -m=dryrun
+charlie -w=<path to output dir> -m=dryrun
 ```
 
 This will create the reference fasta and gtf file based on the selections made in the `config.yaml`. Hence, can take a few minutes to run.
@@ -215,7 +186,7 @@ This will create the reference fasta and gtf file based on the selections made i
 Upon verifying that dryrun is successful. You can then submit the job to the cluster using the following command:
 
 ```bash
-bash <path to charlie> -w=<path to output dir> -m=run
+charlie -w=<path to output dir> -m=run
 ```
 
 which will produce something like this:
@@ -273,7 +244,7 @@ Running...
 In this example, `14743440` is the jobid returned by the slurm job scheduler on biowulf. This means that the job was successfully submitted, it will spawn off other subjobs which in-turn will be run and outputs will be moved to the `results` folder created inside the working directory supplied at command line. You can check the status of your queue of jobs in biowulf running:
 
 ```bash
-% squeue -u `whoami`
+squeue -u `whoami`
 ```
 
 output:
@@ -290,7 +261,7 @@ Next, just sit tight until the pipeline finishes. You can keep monitoring the qu
 Once completed the output should something like this:
 
 ```bash
-% tree <path to output dir>
+tree <path to output dir>
 ```
 
 output: