diff --git a/CHANGELOG.md b/CHANGELOG.md
index a244c6b..3a759cc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,5 @@
## CHARLIE development version
-### bug fixes
-
- CHARLIE was falsely throwing a file permissions error for tempdir values containing bash variables. (#118, @kelly-sovacool)
- Singularity bind paths were not being set properly. (#119, @kelly-sovacool)
- Update docker containers to set `$PYTHONPATH`. (#119, #125, @kelly-sovacool)
@@ -10,6 +8,7 @@
- Fix `reconfig` to correctly replace variables in the config file. (#121, @kelly-sovacool)
- Prevent using excessive memory when copying reference files. (#126, @kelly-sovacool)
- Fix missing output files due to file system latency and use real (absolute) paths where possible. (#130, @kelly-sovacool)
+- Update documentation to reflect Biowulf usage and the improved test dataset. (#132, @kelly-sovacool)
## CHARLIE 0.11.0
diff --git a/README.md b/README.md
index a53db66..b1867ba 100644
--- a/README.md
+++ b/README.md
@@ -65,6 +65,14 @@ For complete documentation, view the website .
### 3. Software Dependencies
+CHARLIE is already installed on Biowulf.
+It is included in the ccbrpipeliner module from release 7 onward.
+To load the module, run:
+
+```bash
+module load ccbrpipeliner/7
+```
+
The following version of various bioinformatics tools are using within CHARLIE:
| tool | version |
@@ -97,7 +105,7 @@ The following version of various bioinformatics tools are using within CHARLIE:
### 4. Usage
```bash
- % ./charlie
+charlie
##########################################################################################
@@ -148,7 +156,7 @@ VIRUSES:
##########################################################################################
USAGE:
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w/--workdir= -m/--runmode=
+ charlie -w/--workdir= -m/--runmode=
Required Arguments:
1. WORKDIR : [Type: String]: Absolute or relative path to the output folder with write permissions.
@@ -177,17 +185,17 @@ Optional Arguments:
Example commands:
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=init
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=dryrun
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=run
+ charlie -w=/my/output/folder -m=init
+ charlie -w=/my/output/folder -m=dryrun
+ charlie -w=/my/output/folder -m=run
##########################################################################################
VersionInfo:
- python : 3.7
- snakemake : 7.19.1
- pipeline_home : /vf/users/Ziegelbauer_lab/Pipelines/circRNA/activeDev
- git commit/tag : 1ae5ca091976364369784f67adffbbbf1dcdb7d5 v0.8-197-g1ae5ca0
+ python : 3
+ snakemake : 7
+ pipeline_home : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
+ git commit/tag : 613fb617f1ed426fb8900f98e599ca0497a67cc0 v0.11.0-49-g613fb61
##########################################################################################
```
@@ -230,7 +238,7 @@ This will create the folder provided by `-w=`. The user should have write permis
#### Dry-run
-Test data (1 paired-end subsample and 1 single-end subsample) have been including under the `.tests/dummy_fastqs` folder. After running in `-m=init`, `samples.tsv` should be edited to point the copies of the above mentioned samples with the column headers:
+Test data (1 paired-end subsample and 1 single-end subsample) are included in the `/data/CCBR_Pipeliner/testdata/circRNA/human` folder. After running with `-m=init`, edit `samples.tsv` to point to copies of these samples, using the column headers:
- sampleName
- path_to_R1_fastq
@@ -302,14 +310,15 @@ Running...
##### 6.1 Test Data
-The `.tests/dummy_fastqs` folder in the repo has test dataset:
+The `/data/CCBR_Pipeliner/testdata/circRNA/human` folder on Biowulf contains a test dataset:
```bash
-% tree .tests/dummy_fastqs
-.tests/dummy_fastqs
-├── GI1_N.R1.fastq.gz
-├── GI1_N.R2.fastq.gz
-└── GI1_T.R1.fastq.gz
+tree /data/CCBR_Pipeliner/testdata/circRNA/human
+/data/CCBR_Pipeliner/testdata/circRNA/human
+├── GI1_N_ss.R1.fastq.gz
+├── GI1_N_ss.R2.fastq.gz
+├── GI1_T_ss.R1.fastq.gz
+└── samples.tsv
```
`GI1_N` is a PE sample while `GI1_T` is a SE sample.
diff --git a/docs/tutorial.md b/docs/tutorial.md
index 2b797c3..e3aa099 100644
--- a/docs/tutorial.md
+++ b/docs/tutorial.md
@@ -4,52 +4,22 @@
- [Biowulf](https://hpc.nih.gov/) account: Biowulf account can be requested [here](https://hpc.nih.gov/docs/accounts.html).
-- Membership to Ziegelbauer user group on Biowulf. You can check this by typing the following command:
+#### Installation
- ```bash
- % groups
- ```
-
-output:
-
-```bash
-CCBR kopardevn Ziegelbauer_lab
-```
-
-If `Ziegelbauer_lab` is not listed then you can email a request to be added to the groups [here](mailto:staff@hpc.nih.gov)
-
-#### Location
-
-Different versions of circRNA DAQ pipeline have been parked at `/data/Ziegelbauer_lab/Pipelines/circRNA`
+CHARLIE is already installed on Biowulf.
+It is included in the ccbrpipeliner module from release 7 onward.
+To load the module, run:
```bash
-% ls /data/Ziegelbauer_lab/Pipelines/circRNA
+module load ccbrpipeliner/7
```
-output:
-
-```bash
-v0.1.0
-v0.10.0
-v0.10.0-dev
-v0.2.1
-v0.3.3
-v0.4.2
-v0.5.2
-v0.6.5
-v0.7.0
-v0.8
-v0.9.0
-```
-
-The exacts versions listed here may changed as newer versions are added. Also, the `dev` version is pointing to the most recent untagged version of the pipeline (use at own risk!)
-
#### Init
To get help about the pipeline you can run:
```bash
-% bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie
+charlie --help
```
output:
@@ -76,7 +46,7 @@ Please contact Vishal Koparde for comments/questions (vishal.koparde@nih.gov)
##########################################################################################
-CHARLIE can be used to DAQ(Detect/Annotate/Quantify) circRNAs in hosts and viruses.
+CHARLIE can be used to DAQ (Detect/Annotate/Quantify) circRNAs in hosts and viruses.
Here is the list of hosts and viruses that are currently supported:
@@ -104,7 +74,7 @@ VIRUSES:
##########################################################################################
USAGE:
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w/--workdir= -m/--runmode=
+ charlie -w/--workdir= -m/--runmode=
Required Arguments:
1. WORKDIR : [Type: String]: Absolute or relative path to the output folder with write permissions.
@@ -133,17 +103,17 @@ Optional Arguments:
Example commands:
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=init
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=dryrun
- bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=run
+ charlie -w=/my/output/folder -m=init
+ charlie -w=/my/output/folder -m=dryrun
+ charlie -w=/my/output/folder -m=run
##########################################################################################
VersionInfo:
- python : 3.7
- snakemake : 7.19.1
- pipeline_home : /vf/users/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev
- git commit/tag : b2cf2f089788651041b16bf4378c2c5172c13cb2 v0.10.0-2-gb2cf2f0
+ python : 3
+ snakemake : 7
+ pipeline_home : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
+ git commit/tag : 613fb617f1ed426fb8900f98e599ca0497a67cc0 v0.11.0-49-g613fb61
##########################################################################################
```
@@ -154,7 +124,7 @@ VersionInfo:
To initial the working directory run:
```bash
-% bash -w= -m=init
+charlie -w= -m=init
```
This assumes that `` does not exist before running the above command and is at a location where write permissions are available.
@@ -162,7 +132,7 @@ This assumes that `` does not exist before running the above
The above command creates `` folder and creates 2 subfolders `logs` and `stats` inside that folder along with `config.yaml` and `samples.tsv` files.
```bash
-% tree
+tree
```
##### config.yaml
@@ -188,14 +158,15 @@ Tab delimited definition of sample sheet. The header is fixed and each row repre
2. path_to_R1_fastq = absolute path to the read1 fastq.gz file.
3. path_to_R2_fastq = absolute path to the read2 fastq.gz file. If the sample was sequenced in single-end mode, then leave this blank.
-The `.tests/dummy_fastqs` folder in the repo has test dataset:
+The `/data/CCBR_Pipeliner/testdata/circRNA/humans` folder on Biowulf contains a test dataset:
```bash
-% tree .tests/dummy_fastqs
-.tests/dummy_fastqs
-├── GI1_N.R1.fastq.gz
-├── GI1_N.R2.fastq.gz
-└── GI1_T.R1.fastq.gz
+tree /data/CCBR_Pipeliner/testdata/circRNA/humans
+/data/CCBR_Pipeliner/testdata/circRNA/humans
+├── GI1_N_ss.R1.fastq.gz
+├── GI1_N_ss.R2.fastq.gz
+├── GI1_T_ss.R1.fastq.gz
+└── samples.tsv
```
`GI1_N` is a PE sample while `GI1_T` is a SE sample.
@@ -205,7 +176,7 @@ The `.tests/dummy_fastqs` folder in the repo has test dataset:
Once the `samples.tsv` file has been edited appropriately to include the desired samples, it is a good idea to **dryrun** the pipeline to ensure that everything will work as desired. Dryrun can be run as follows:
```bash
-bash -w= -m=dryrun
+charlie -w= -m=dryrun
```
This will create the reference fasta and gtf file based on the selections made in the `config.yaml`. Hence, can take a few minutes to run.
@@ -215,7 +186,7 @@ This will create the reference fasta and gtf file based on the selections made i
Upon verifying that dryrun is successful. You can then submit the job to the cluster using the following command:
```bash
-bash -w= -m=run
+charlie -w= -m=run
```
which will produce something like this:
@@ -273,7 +244,7 @@ Running...
In this example, `14743440` is the jobid returned by the slurm job scheduler on biowulf. This means that the job was successfully submitted, it will spawn off other subjobs which in-turn will be run and outputs will be moved to the `results` folder created inside the working directory supplied at command line. You can check the status of your queue of jobs in biowulf running:
```bash
-% squeue -u `whoami`
+squeue -u `whoami`
```
output:
@@ -290,7 +261,7 @@ Next, just sit tight until the pipeline finishes. You can keep monitoring the qu
Once completed the output should something like this:
```bash
-% tree
+tree
```
output: