update docs for use with ccbrpipeliner/7 #132

Status: Merged (3 commits, Nov 23, 2024)
CHANGELOG.md: 3 changes (1 addition, 2 deletions)
@@ -1,7 +1,5 @@
## CHARLIE development version

### bug fixes

- CHARLIE was falsely throwing a file permissions error for tempdir values containing bash variables. (#118, @kelly-sovacool)
- Singularity bind paths were not being set properly. (#119, @kelly-sovacool)
- Update docker containers to set `$PYTHONPATH`. (#119, #125, @kelly-sovacool)
@@ -10,6 +8,7 @@
- Fix `reconfig` to correctly replace variables in the config file. (#121, @kelly-sovacool)
- Prevent using excessive memory when copying reference files. (#126, @kelly-sovacool)
- Fix missing output files due to file system latency and use real (absolute) paths where possible. (#130, @kelly-sovacool)
- Update documentation to reflect biowulf usage and improved test dataset. (#132, @kelly-sovacool)

## CHARLIE 0.11.0

README.md: 41 changes (25 additions, 16 deletions)
@@ -65,6 +65,14 @@ For complete documentation, view the website <https://CCBR.github.io/CHARLIE/>.

### 3. Software Dependencies

CHARLIE is already installed on Biowulf.
It is included in the `ccbrpipeliner` module from release 7 onward.
To load the module, run:

```bash
module load ccbrpipeliner/7
```
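Scripts that call CHARLIE can fail early when the module has not been loaded. A minimal sketch, assuming only that loading `ccbrpipeliner/7` puts a `charlie` executable on `PATH` (as the commands below imply); `require_charlie` is a hypothetical helper name, not part of CHARLIE:

```bash
# Hypothetical helper (not part of CHARLIE): fail early if `charlie` is not
# on PATH, e.g. because `module load ccbrpipeliner/7` was not run in this shell.
require_charlie() {
  if ! command -v charlie >/dev/null 2>&1; then
    echo "charlie not found on PATH; run 'module load ccbrpipeliner/7' first" >&2
    return 1
  fi
  echo "using charlie at: $(command -v charlie)"
}
```

For example, `module load ccbrpipeliner/7 && require_charlie` prints the resolved path of `charlie`, or stops with a message if it cannot be found.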

The following versions of various bioinformatics tools are used within CHARLIE:

| tool | version |
@@ -97,7 +105,7 @@ The following version of various bioinformatics tools are using within CHARLIE:
### 4. Usage

```bash
% ./charlie
charlie


##########################################################################################
@@ -148,7 +156,7 @@ VIRUSES:
##########################################################################################

USAGE:
bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>
charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>

Required Arguments:
1. WORKDIR : [Type: String]: Absolute or relative path to the output folder with write permissions.
@@ -177,17 +185,17 @@ Optional Arguments:


Example commands:
bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=init
bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=dryrun
bash /data/Ziegelbauer_lab/Pipelines/circRNA/activeDev/charlie -w=/my/output/folder -m=run
charlie -w=/my/output/folder -m=init
charlie -w=/my/output/folder -m=dryrun
charlie -w=/my/output/folder -m=run

##########################################################################################

VersionInfo:
python : 3.7
snakemake : 7.19.1
pipeline_home : /vf/users/Ziegelbauer_lab/Pipelines/circRNA/activeDev
git commit/tag : 1ae5ca091976364369784f67adffbbbf1dcdb7d5 v0.8-197-g1ae5ca0
python : 3
snakemake : 7
pipeline_home : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
git commit/tag : 613fb617f1ed426fb8900f98e599ca0497a67cc0 v0.11.0-49-g613fb61

##########################################################################################
```
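The example commands are meant to be run in order: `init`, then editing `samples.tsv`, then `dryrun`, then `run`. A minimal sketch of the last two steps that only submits the run if the dry run succeeds, assuming `charlie` exits with a non-zero status when the dry run fails (the help text above does not state this); `dryrun_then_run` is a hypothetical helper:

```bash
# Hypothetical helper (not part of CHARLIE): run the dry run and submit the
# real run only if the dry run exits successfully. Assumes `charlie` is on
# PATH, WORKDIR was created with -m=init, and samples.tsv has been edited.
dryrun_then_run() {
  local workdir=$1
  if [ -z "$workdir" ]; then
    echo "usage: dryrun_then_run <WORKDIR>" >&2
    return 2
  fi
  if ! charlie -w="$workdir" -m=dryrun; then
    echo "dry run failed; not submitting -m=run" >&2
    return 1
  fi
  charlie -w="$workdir" -m=run
}
```

Usage: `dryrun_then_run /my/output/folder`, after `charlie -w=/my/output/folder -m=init` and editing `samples.tsv`.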
@@ -230,7 +238,7 @@ This will create the folder provided by `-w=`. The user should have write permis

#### Dry-run

Test data (1 paired-end subsample and 1 single-end subsample) have been included under the `.tests/dummy_fastqs` folder. After running with `-m=init`, `samples.tsv` should be edited to point to the copies of the above-mentioned samples with the column headers:
Test data (1 paired-end subsample and 1 single-end subsample) have been included under the `/data/CCBR_Pipeliner/testdata/circRNA/human` folder. After running with `-m=init`, `samples.tsv` should be edited to point to the copies of the above-mentioned samples with the column headers:

- sampleName
- path_to_R1_fastq
@@ -302,14 +310,15 @@ Running...

##### 6.1 Test Data

The `.tests/dummy_fastqs` folder in the repo contains a test dataset:
The `/data/CCBR_Pipeliner/testdata/circRNA/human` folder on Biowulf contains a test dataset:

```bash
% tree .tests/dummy_fastqs
.tests/dummy_fastqs
├── GI1_N.R1.fastq.gz
├── GI1_N.R2.fastq.gz
└── GI1_T.R1.fastq.gz
tree /data/CCBR_Pipeliner/testdata/circRNA/human
/data/CCBR_Pipeliner/testdata/circRNA/human
├── GI1_N_ss.R1.fastq.gz
├── GI1_N_ss.R2.fastq.gz
├── GI1_T_ss.R1.fastq.gz
└── samples.tsv
```

`GI1_N` is a PE sample while `GI1_T` is a SE sample.
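The test folder already ships with a `samples.tsv`. For your own data, a minimal sketch that builds a tab-delimited sample sheet from a folder of FASTQs, assuming the `<sampleName>.R1.fastq.gz` / `<sampleName>.R2.fastq.gz` naming used above and the sample-sheet columns `sampleName`, `path_to_R1_fastq` and `path_to_R2_fastq`; `make_samples_tsv` is a hypothetical helper, not part of CHARLIE:

```bash
# Hypothetical helper (not part of CHARLIE): write a tab-delimited
# samples.tsv for every <sample>.R1.fastq.gz found in a folder.
# A matching <sample>.R2.fastq.gz makes the sample paired-end;
# otherwise the R2 column is left blank (single-end).
make_samples_tsv() {
  local fastq_dir=$1 out=$2 abs r1 r2 sample
  if [ -z "$fastq_dir" ] || [ -z "$out" ]; then
    echo "usage: make_samples_tsv <FASTQ_DIR> <OUT_TSV>" >&2
    return 2
  fi
  # Resolve the folder to an absolute path, since the sheet expects absolute paths.
  abs=$(cd "$fastq_dir" && pwd -P) || return 1
  printf 'sampleName\tpath_to_R1_fastq\tpath_to_R2_fastq\n' > "$out"
  for r1 in "$abs"/*.R1.fastq.gz; do
    [ -e "$r1" ] || continue        # no matches: the glob stays literal
    sample=$(basename "$r1" .R1.fastq.gz)
    r2="$abs/$sample.R2.fastq.gz"
    [ -e "$r2" ] || r2=""           # single-end sample
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2" >> "$out"
  done
}
```

For example, `make_samples_tsv /path/to/fastqs /my/output/folder/samples.tsv` overwrites the sheet created by `-m=init`; review it before the dry run.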
docs/tutorial.md: 85 changes (28 additions, 57 deletions)
@@ -4,52 +4,22 @@

- [Biowulf](https://hpc.nih.gov/) account: Biowulf account can be requested [here](https://hpc.nih.gov/docs/accounts.html).

- Membership in the Ziegelbauer user group on Biowulf. You can check this by typing the following command:
#### Installation

```bash
% groups
```

output:

```bash
CCBR kopardevn Ziegelbauer_lab
```

If `Ziegelbauer_lab` is not listed, you can email a request to be added to the group [here](mailto:staff@hpc.nih.gov).

#### Location

Different versions of the circRNA DAQ pipeline have been parked at `/data/Ziegelbauer_lab/Pipelines/circRNA`
CHARLIE is already installed on Biowulf.
It is included in the `ccbrpipeliner` module from release 7 onward.
To load the module, run:

```bash
% ls /data/Ziegelbauer_lab/Pipelines/circRNA
module load ccbrpipeliner/7
```

output:

```bash
v0.1.0
v0.10.0
v0.10.0-dev
v0.2.1
v0.3.3
v0.4.2
v0.5.2
v0.6.5
v0.7.0
v0.8
v0.9.0
```

The exact versions listed here may change as newer versions are added. Also, the `dev` version points to the most recent untagged version of the pipeline (use at your own risk!)

#### Init

To get help about the pipeline, run:

```bash
% bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie
charlie --help
```

output:
@@ -76,7 +46,7 @@ Please contact Vishal Koparde for comments/questions (vishal.koparde@nih.gov)

##########################################################################################

CHARLIE can be used to DAQ(Detect/Annotate/Quantify) circRNAs in hosts and viruses.
CHARLIE can be used to DAQ (Detect/Annotate/Quantify) circRNAs in hosts and viruses.

Here is the list of hosts and viruses that are currently supported:

@@ -104,7 +74,7 @@ VIRUSES:
##########################################################################################

USAGE:
bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>
charlie -w/--workdir=<WORKDIR> -m/--runmode=<RUNMODE>

Required Arguments:
1. WORKDIR : [Type: String]: Absolute or relative path to the output folder with write permissions.
@@ -133,17 +103,17 @@ Optional Arguments:


Example commands:
bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=init
bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=dryrun
bash /data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev/charlie -w=/my/output/folder -m=run
charlie -w=/my/output/folder -m=init
charlie -w=/my/output/folder -m=dryrun
charlie -w=/my/output/folder -m=run

##########################################################################################

VersionInfo:
python : 3.7
snakemake : 7.19.1
pipeline_home : /vf/users/Ziegelbauer_lab/Pipelines/circRNA/v0.10.0-dev
git commit/tag : b2cf2f089788651041b16bf4378c2c5172c13cb2 v0.10.0-2-gb2cf2f0
python : 3
snakemake : 7
pipeline_home : /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHARLIE/.v0.11.1
git commit/tag : 613fb617f1ed426fb8900f98e599ca0497a67cc0 v0.11.0-49-g613fb61

##########################################################################################
```
@@ -154,15 +124,15 @@ VersionInfo:
To initialize the working directory, run:

```bash
% bash <path to charlie> -w=<path to output dir> -m=init
charlie -w=<path to output dir> -m=init
```

This assumes that `<path to output dir>` does not exist before running the above command and is at a location where write permissions are available.

The above command creates the `<path to output dir>` folder with 2 subfolders, `logs` and `stats`, along with the `config.yaml` and `samples.tsv` files.

```bash
% tree <path to output dir>
tree <path to output dir>
```
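The two assumptions above (the output folder does not exist yet, and its location is writable) and the resulting layout can be checked with a short sketch. `check_new_workdir` and `check_init_layout` are hypothetical helpers, not part of CHARLIE, and `check_new_workdir` additionally assumes that the parent folder must already exist:

```bash
# Hypothetical helpers (not part of CHARLIE).
# Before init: the output folder must not exist yet, and its parent folder
# must exist and be writable (an assumption of this sketch).
check_new_workdir() {
  local workdir=$1 parent
  if [ -e "$workdir" ]; then
    echo "$workdir already exists" >&2
    return 1
  fi
  parent=$(dirname "$workdir")
  if [ ! -d "$parent" ] || [ ! -w "$parent" ]; then
    echo "$parent is missing or not writable" >&2
    return 1
  fi
}

# After init: expect logs/ and stats/ plus config.yaml and samples.tsv.
check_init_layout() {
  local workdir=$1 missing=0 d f
  for d in logs stats; do
    if [ ! -d "$workdir/$d" ]; then
      echo "missing folder: $workdir/$d" >&2
      missing=1
    fi
  done
  for f in config.yaml samples.tsv; do
    if [ ! -f "$workdir/$f" ]; then
      echo "missing file: $workdir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical sequence is `check_new_workdir <path to output dir>`, then the `-m=init` command, then `check_init_layout <path to output dir>`.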

##### config.yaml
@@ -188,14 +158,15 @@ Tab delimited definition of sample sheet. The header is fixed and each row repre
2. path_to_R1_fastq = absolute path to the read1 fastq.gz file.
3. path_to_R2_fastq = absolute path to the read2 fastq.gz file. If the sample was sequenced in single-end mode, then leave this blank.

The `.tests/dummy_fastqs` folder in the repo contains a test dataset:
The `/data/CCBR_Pipeliner/testdata/circRNA/humans` folder on Biowulf contains a test dataset:

```bash
% tree .tests/dummy_fastqs
.tests/dummy_fastqs
├── GI1_N.R1.fastq.gz
├── GI1_N.R2.fastq.gz
└── GI1_T.R1.fastq.gz
tree /data/CCBR_Pipeliner/testdata/circRNA/humans
/data/CCBR_Pipeliner/testdata/circRNA/humans
├── GI1_N_ss.R1.fastq.gz
├── GI1_N_ss.R2.fastq.gz
├── GI1_T_ss.R1.fastq.gz
└── samples.tsv
```

`GI1_N` is a PE sample while `GI1_T` is a SE sample.
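Before the dry run, a quick check of the edited `samples.tsv` can catch common mistakes. A minimal sketch, assuming the fixed three-column, tab-delimited header described above and that both FASTQ paths must be absolute; `validate_samples_tsv` is a hypothetical helper, and CHARLIE itself may apply different or additional checks:

```bash
# Hypothetical helper (not part of CHARLIE): sanity-check a samples.tsv.
# Checks the fixed header, then that each R1 path (and R2 path, if given)
# is absolute and points to an existing file.
validate_samples_tsv() {
  local tsv=$1 tab header name r1 r2 rest fq status=0
  if [ ! -f "$tsv" ]; then
    echo "not found: $tsv" >&2
    return 1
  fi
  tab=$(printf '\t')
  {
    IFS= read -r header
    if [ "$header" != "sampleName${tab}path_to_R1_fastq${tab}path_to_R2_fastq" ]; then
      echo "unexpected header in $tsv" >&2
      return 1
    fi
    # Tab is an IFS whitespace character, so a trailing empty R2 column
    # (single-end sample) is read as an empty string.
    while IFS=$tab read -r name r1 r2 rest || [ -n "$name" ]; do
      [ -n "$name" ] || continue                # skip blank lines
      if [ -z "$r1" ]; then
        echo "$name: missing R1 path" >&2
        status=1
      fi
      for fq in "$r1" "$r2"; do
        [ -n "$fq" ] || continue                # blank R2 means single-end
        case $fq in
          /*) [ -f "$fq" ] || { echo "$name: file not found: $fq" >&2; status=1; } ;;
          *)  echo "$name: path is not absolute: $fq" >&2; status=1 ;;
        esac
      done
    done
  } < "$tsv"
  return "$status"
}
```

For example, `validate_samples_tsv <path to output dir>/samples.tsv` returns 0 when every row passes and lists each problem on stderr otherwise.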
@@ -205,7 +176,7 @@ The `.tests/dummy_fastqs` folder in the repo has test dataset:
Once the `samples.tsv` file has been edited appropriately to include the desired samples, it is a good idea to **dryrun** the pipeline to ensure that everything will work as desired. The dry run can be started as follows:

```bash
bash <path to charlie> -w=<path to output dir> -m=dryrun
charlie -w=<path to output dir> -m=dryrun
```

This will create the reference FASTA and GTF files based on the selections made in `config.yaml`, so it can take a few minutes to run.
@@ -215,7 +186,7 @@ This will create the reference fasta and gtf file based on the selections made i
After verifying that the dry run is successful, you can submit the job to the cluster using the following command:

```bash
bash <path to charlie> -w=<path to output dir> -m=run
charlie -w=<path to output dir> -m=run
```

which will produce something like this:
@@ -273,7 +244,7 @@ Running...
In this example, `14743440` is the job ID returned by the Slurm job scheduler on Biowulf, which means that the job was successfully submitted. It will spawn other subjobs, which in turn will run, and their outputs will be moved to the `results` folder created inside the working directory supplied at the command line. You can check the status of your queue of jobs on Biowulf by running:

```bash
% squeue -u `whoami`
squeue -u `whoami`
```

output:
@@ -290,7 +261,7 @@ Next, just sit tight until the pipeline finishes. You can keep monitoring the qu
Once the pipeline has completed, the output should look something like this:

```bash
% tree <path to output dir>
tree <path to output dir>
```

output: