Merge pull request #102 from CCBR/iss-99
create Docker containers and remove envmodules
kelly-sovacool authored Sep 12, 2024
2 parents e359e03 + 65b73ad commit 928e8a5
Showing 59 changed files with 1,525 additions and 1,600 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,39 @@
name: build

on:
  push:
    branches:
      - master
      - main
      - develop
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11"]
        snakemake-version: ["7.32.3"]
    steps:
      - uses: actions/checkout@v4
      - uses: mamba-org/setup-micromamba@v1
        with:
          environment-name: build
          cache-environment: true
          condarc: |
            channels:
              - conda-forge
              - bioconda
          create-args: >-
            python=${{ matrix.python-version }}
            snakemake=${{ matrix.snakemake-version }}
            setuptools
            pip
            pytest
      - name: Test
        run: |
          python -m pytest
        env:
          TMPDIR: ${{ runner.temp }}
        shell: micromamba-shell {0}
5 changes: 0 additions & 5 deletions .github/workflows/docker-auto.yml
@@ -6,11 +6,6 @@ on:
       - main
     paths:
       - "docker/**"
-  pull_request:
-    branches:
-      - main
-    paths:
-      - "docker/**"

 jobs:
   generate-matrix:
14 changes: 14 additions & 0 deletions .github/workflows/projects.yml
@@ -0,0 +1,14 @@
name: Add issues/PRs to user projects

on:
  issues:
    types:
      - assigned
  pull_request:
    types:
      - assigned

jobs:
  add-to-project:
    uses: CCBR/.github/.github/workflows/auto-add-user-project.yml@v0.1.0
    secrets: inherit
2 changes: 2 additions & 0 deletions .gitignore
@@ -22,3 +22,5 @@ resources/mouse_mm9_circRNAs_putative_spliced_sequence.fa.gz
workflow/scripts/copy_strand_info.py
.tests/lint_workdir
**/tmp*
**__pycache__/**
*.pyc
75 changes: 75 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,75 @@
# CHARLIE development version

- Major updates to convert CHARLIE from a biowulf-specific to a platform-agnostic pipeline (#102, @kelly-sovacool):
  - All rules now use containers instead of envmodules.
  - Default config and cluster config files are provided for use on biowulf and FRCE.
  - New entry `TEMPDIR` in the config file sets the temporary directory location for rules that require transient storage.
  - New `--singcache` argument to provide a singularity cache dir location. The singularity cache dir is automatically set inside `/data/$USER/` or `$WORKDIR/` if `--singcache` is not provided.
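
The fallback order described for `--singcache` can be sketched in bash as follows. This is an illustration only and is not taken from the commit: the function name `resolve_singcache` and its third argument are hypothetical, and "available" is assumed here to mean that the data directory exists and is writable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CHARLIE): choose a singularity cache dir.
#   $1 = value passed via --singcache (may be empty)
#   $2 = working directory (WORKDIR)
#   $3 = data root to probe; defaults to /data/$USER (argument added for testing)
resolve_singcache() {
    local requested="$1" workdir="$2" data_root="${3:-/data/${USER:-}}"
    if [ -n "$requested" ]; then
        # An explicit --singcache value always wins.
        printf '%s\n' "$requested"
    elif [ -d "$data_root" ] && [ -w "$data_root" ]; then
        # Assumption: "available" means the directory exists and is writable.
        printf '%s\n' "${data_root}/.singularity"
    else
        # Otherwise fall back to the working directory.
        printf '%s\n' "${workdir}/.singularity"
    fi
}
```

A caller might then export the result, for example `export SINGULARITY_CACHEDIR="$(resolve_singcache "" "$PWD")"`; whether the CHARLIE wrapper sets this variable in the same way is not shown in this diff.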

# CHARLIE 0.10.1

- strands are reported together; strands from all callers are reported
- both + and - flanking sites are reported
- rev-comp function updated
- tool versions updated to match the tools available on BIOWULF

# CHARLIE 0.9.0

Significant upgrades since the last release:

- updates to wrapper script, many new arguments/options added
- new per-sample counts table format
- new all-sample master counts matrix with min-nreads filtering and ntools column to show number of tools supporting the circRNA call
- new version of Snakemake
- cluster_status script added for forced completion of pipeline upon TIMEOUTs
- updated flowchart from lucid charts
- added circRNAfinder, find_circ, circExplorer2_bwa and other tools
- optimized execution and resource requirements
- updated viral annotations (Thanks Sara!)
- new method to extract linear counts, create linear BAMs using circExplorer2 outputs
- new job reporting using jobby and its derivatives
- separated creation of BWA and BOWTIE2 index from creation of STAR index to speed things up
- parallelized find_circ
- better cleanup (eg. deleting \_STARgenome folders, etc.) for much smaller digital footprint
- multitude of comments throughout the snakefiles including listing of output file column descriptions
- preliminary GH actions added

# CHARLIE 0.7.0

- 5 circRNA callers
- all-sample counts matrix with annotations

# CHARLIE 0.6.9

- Optimized pysam scripts
- fixed premature completion of singularity rules

# CHARLIE 0.6.5

- updated config.yaml to use the latest HSV-1 annotations received from Sarah (050421)

# CHARLIE 0.6.4

- create linear reads BAM file
- create linear reads BigWigs for each region in the .regions file.

# CHARLIE 0.6.3

- QOS not working for Taka... removed from cluster.json
- recall rule requires python/3.7 ... env module updated

# CHARLIE 0.6.2

- BSJ files are in BSJ subfolder... bug fix for v0.6.1

# CHARLIE 0.6.1

- customBSJs recalled from STAR alignments
- only for PE
- removes erroneously called CircExplorer BSJs
- create sense and anti-sense BSJ BAMs and BW for each reference (host+viruses)
- find reads which contribute to CIRI BSJs but not on the STAR list of BSJ reads, see if they contribute to novel (not called by STAR) BSJs and append novel BSJs to customBSJ list

# CHARLIE 0.6.0

cutadapt_min_length to cutadapt rule... setting it to 15 in config (for miRNAs, Biot and short viral features)
5 changes: 5 additions & 0 deletions CITATION.cff
@@ -6,6 +6,11 @@ authors:
    orcid: https://orcid.org/0000-0001-8978-8495
    affiliation: Advanced Biomedical Computational Science, Frederick National Laboratory
      for Cancer Research, Frederick, MD 21702, USA
  - family-names: Sovacool
    given-names: Kelly
    orcid: https://orcid.org/0000-0003-3283-829X
    affiliation: Advanced Biomedical Computational Science, Frederick National Laboratory
      for Cancer Research, Frederick, MD 21702, USA
title: 'CHARLIE: Circrnas in Host And viRuses anaLysis pIpEline'
url: https://ccbr.github.io/CHARLIE/
repository-code: https://github.com/CCBR/CHARLIE
108 changes: 55 additions & 53 deletions README.md
@@ -1,49 +1,50 @@
# CHARLIE

![img](https://img.shields.io/github/issues/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/forks/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/stars/CCBR/CHARLIE?style=for-the-badge)![img](https://img.shields.io/github/license/CCBR/CHARLIE?style=for-the-badge)

### Table of Contents

- [CHARLIE - **C**ircrnas in **H**ost **A**nd vi**R**uses ana**L**ysis p**I**p**E**line](#charlie)
- [Table of Contents](#table-of-contents)
- [1. Introduction](#1-introduction)
- [2. Flowchart](#2-flowchart)
- [3. Software Dependencies](#3-software-dependencies)
- [4. Usage](#4-usage)
- [5. License](#5-license)
- [6. Testing](#6-testing)
- [6.1 Test data](#61-test-data)
- [6.2 Expected output](#62-expected-output)

### 1. Introduction

**C**ircrnas in **H**ost **A**nd vi**R**uses ana**L**ysis p**I**p**E**line

Things to know about CHARLIE:

- Snakemake workflow to detect, annotate and quantify (DAQ) host and viral circular RNAs.
- Primarily developed to run on [BIOWULF](https://hpc.nih.gov/)
- Reach out to [Vishal Koparde](mailto:vishal.koparde@nih.gov) for questions/comments/requests.


This circularRNA detection pipeline uses CIRCExplorer2, CIRI2 and many other tools in parallel to detect, quantify and annotate circRNAs. Here is a list of tools that can be run using CHARLIE:

| circRNA Detection Tool | Aligner(s) | Run by default |
| ----------------------------------------------------------- | ---------------- | -------------- |
| [CIRCExplorer2](https://github.com/YangLab/CIRCexplorer2) | STAR<sup>1</sup> | Yes |
| [CIRI2](https://sourceforge.net/projects/ciri/files/CIRI2/) | BWA<sup>1</sup> | Yes |
| [CIRCExplorer2](https://github.com/YangLab/CIRCexplorer2) | BWA<sup>1</sup> | Yes |
| [CLEAR](https://github.com/YangLab/CLEAR) | STAR<sup>1</sup> | Yes |
| [DCC](https://github.com/dieterich-lab/DCC) | STAR<sup>2</sup> | Yes |
| [circRNAFinder](https://github.com/bioxfu/circRNAFinder) | STAR<sup>3</sup> | Yes |
| [find_circ](https://github.com/marvin-jens/find_circ) | Bowtie2 | Yes |
| [MapSplice](https://github.com/merckey/MapSplice2) | BWA<sup>2</sup> | No |
| [NCLScan](https://github.com/TreesLab/NCLscan) | NovoAlign | No |

> Note: STAR<sup>1</sup>, STAR<sup>2</sup>, STAR<sup>3</sup> denote 3 different sets of alignment parameters, etc.
> Note: BWA<sup>1</sup>, BWA<sup>2</sup> denote 2 different alignment parameters, etc.
### 2. Flowchart

![](docs/images/CHARLIE_v0.8.x.png)

For complete documentation with tutorial go [here](https://CCBR.github.io/CHARLIE/).
@@ -54,33 +55,32 @@ For complete documentation with tutorial go [here](https://CCBR.github.io/CHARLI

The following versions of various bioinformatics tools are used within CHARLIE:

| tool | version |
| ------------- | -------- |
| blat | 3.5 |
| bedtools | 2.30.0 |
| bowtie | 2-2.5.1 |
| bowtie | 1.3.1 |
| bwa | 0.7.17 |
| circexplorer2 | 2.3.8 |
| cufflinks | 2.2.1 |
| cutadapt | 4.4 |
| fastqc | 0.11.9 |
| hisat | 2.2.2.1 |
| java | 18.0.1.1 |
| multiqc | 1.9 |
| parallel | 20231122 |
| perl | 5.34 |
| picard | 2.27.3 |
| python | 2.7 |
| python | 3.8 |
| sambamba | 0.8.2 |
| samtools | 1.16.1 |
| STAR | 2.7.6a |
| stringtie | 2.2.1 |
| ucsc | 450 |
| R | 4.0.5 |
| novocraft | 4.03.05 |

### 4. Usage

Expand Down Expand Up @@ -155,6 +155,7 @@ Required Arguments:

Optional Arguments:

--singcache|-c : singularity cache directory. Default is `/data/${USER}/.singularity` if available, or falls back to `${WORKDIR}/.singularity`. Use this flag to specify a different singularity cache directory.
--host|-g : supply host at command line. hg38 or mm39. (--runmode=init only)
--additives|-a : supply comma-separated list of additives at command line. ERCC or BAC16Insert or both (--runmode=init only)
--viruses|-v : supply comma-separated list of viruses at command line (--runmode=init only)
Expand Down Expand Up @@ -219,8 +220,8 @@ This will create the folder provided by `-w=`. The user should have write permis
Test data (1 paired-end subsample and 1 single-end subsample) have been included under the `.tests/dummy_fastqs` folder. After running in `-m=init`, `samples.tsv` should be edited to point to the copies of the above-mentioned samples with the column headers:
- sampleName
- path_to_R1_fastq
- path_to_R2_fastq
Column `path_to_R2_fastq` will be blank in case of single-end samples.
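
As a rough illustration of the `samples.tsv` layout described above, the sketch below checks the three column headers and labels each sample as paired-end or single-end depending on whether `path_to_R2_fastq` is blank. The function name `classify_samples` is hypothetical and not part of CHARLIE, and the file is assumed to be tab-separated with the headers on the first line.

```bash
#!/usr/bin/env bash
# Hypothetical checker (not part of CHARLIE): label each sample as PE or SE.
#   $1 = path to samples.tsv (assumed tab-separated, header on line 1)
classify_samples() {
    awk -F'\t' '
        NR == 1 {
            # The three column headers named in the README.
            if ($1 != "sampleName" || $2 != "path_to_R1_fastq" || $3 != "path_to_R2_fastq") {
                print "unexpected header in samples.tsv" > "/dev/stderr"
                bad = 1
                exit
            }
            next
        }
        $1 == "" { next }                              # skip empty lines
        { print $1 "\t" ($3 == "" ? "SE" : "PE") }     # blank R2 column means single-end
        END { exit bad }
    ' "$1"
}
```

Running `classify_samples samples.tsv` before `-m=dryrun` prints one line per sample and exits non-zero when the header does not match.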
@@ -234,6 +235,7 @@ bash <path to charlie> -w=<path to output dir> -m=dryrun
This will create the reference fasta and gtf file based on the selections made in the `config.yaml`.
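
The dry run and the actual run can be chained so that `-m=run` is attempted only after `-m=dryrun` succeeds. The wrapper below is a minimal sketch, not something shipped with CHARLIE: `run_after_dryrun` is a hypothetical name, and the sketch assumes the charlie script exits with a non-zero status when the dry run fails.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of CHARLIE): dry run first, then the real run.
#   $1 = path to the charlie script
#   $2 = output (working) directory, passed through as -w=
run_after_dryrun() {
    local charlie="$1" workdir="$2"
    # Assumption: a failed dry run is signalled by a non-zero exit status.
    if bash "$charlie" -w="$workdir" -m=dryrun; then
        bash "$charlie" -w="$workdir" -m=run
    else
        echo "dryrun failed; not starting the run" >&2
        return 1
    fi
}
```

For example, `run_after_dryrun <path to charlie> <path to output dir>` uses the same two placeholders as the commands shown in the README.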
#### Run
If `-m=dryrun` was successful, then simply do `-m=run`. The output will look something like this:
```
Expand Down Expand Up @@ -307,5 +309,5 @@ Expected output from the sample data is stored under `.tests/expected_output`.
More details about running test data can be found [here](https://ccbr.github.io/CHARLIE/tutorial).
> DISCLAIMER:
>
> CHARLIE is built to be run only on [BIOWULF](https://hpc.nih.gov). A newer HPC-agnostic version of CHARLIE is planned for 2024.
1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
0.10.1-dev