This repository has been archived by the owner on Oct 2, 2020. It is now read-only.
[WIP] CNVkit tool definitions #93
Open: anton-khodak wants to merge 10 commits into common-workflow-library:master from anton-khodak:master
Commits
f205c3f Initial incomplete cnvkit-batch tool definition
1de8a71 Fix cnvkit-batch job
467076d Fix cnvkit-batch job once again
54a3847 remove stdout from test file
4a0d359 confused letters
5da14f6 Add a few more tools
9d8fec8 Update to CWL v.1.0
1065d24 Delete tools without jobs
375acb2 Remove all tools except from cnvkit
5ce1aeb Revert deleting files
@@ -0,0 +1,16 @@
{
  "bam_files": [
    "*Tumor.bam"
  ],
  "normal":[
    "*Normal.bam"
  ],
  "targets": "my_baits.bed",
  "split": true,
  "annotate": "refFlat.txt",
  "fasta": "hg19.fasta",
  "access": "data/access-5kb-mappable.hg19.bed",
  "output_dir": "results/",
  "diagram": true,
  "scatter": true
}
@@ -0,0 +1,18 @@
- args: [
    "cnvkit.py",
    "batch",
    "--access", "data/access-5kb-mappable.hg19.bed",
    "--annotate", "refFlat.txt",
    "--diagram",
    "--fasta", "hg19.fasta",
    "--normal", "*Normal.bam",
    "--output-dir", "results/",
    "--processes", "1",
    "--scatter",
    "--split",
    "--targets", "my_baits.bed",
    "*Tumor.bam",
  ]
  job: cnvkit-batch-job.json
  tool: ../tools/cnvkit-batch.cwl
  doc: General test of command line generation
@@ -0,0 +1,13 @@
{
  "segment": "segment.cns",
  "chromosome": "chr1",
  "split": true,
  "gene": "gen1, gen2",
  "range_list": "chr -start-end",
  "sample_id": "data/access-5kb-mappable.hg19.bed",
  "vcf": "data.vcf",
  "y_min": 3.04,
  "y_max": 4.04,
  "trend": true,
  "output": "result.txt"
}
@@ -0,0 +1,19 @@
- args: [
    "cnvkit.py",
    "scatter",
    "--chromosome", "chr1",
    "--gene", "gen1, gen2",
    "--min-variant-depth", "20",
    "--output", "result.txt",
    "--range-list", "chr -start-end",
    "--sample-id", "data/access-5kb-mappable.hg19.bed",
    "--segment", "segment.cns",
    "--trend",
    "--vcf", "data.vcf",
    "--width", "1000000.0",
    "--y-max", "4.04",
    "--y-min", "3.04",
  ]
  job: cnvkit-scatter-job.json
  tool: ../tools/cnvkit-scatter.cwl
  doc: General test of command line generation
@@ -0,0 +1,9 @@
{
  "cnarray": "*Tumor.bam",
  "segments": "*Normal.cns",
  "drop_low_coverage": true,
  "output": "results/result.txt",
  "stdev": true,
  "mad": true,
  "pi": true
}
@@ -0,0 +1,16 @@
- args: [
    "cnvkit.py",
    "segmetrics",
    "--alpha", "0.05",
    "--bootstrap", "100",
    "--drop-low-coverage",
    "--mad",
    "--output", "results/result.txt",
    "--pi",
    "--segments", "*Normal.cns",
    "--stdev",
    "*Tumor.bam"
  ]
  job: cnvkit-segmetrics-job.json
  tool: ../tools/cnvkit-segmetrics.cwl
  doc: General test of command line generation
@@ -0,0 +1,8 @@
{
  "interval": "*Tumor.bam",
  "annotate": "refFlat.txt",
  "avg_size": 33,
  "output": "results.json",
  "short_names": true,
  "split": true
}
@@ -0,0 +1,14 @@
- args: [
    "cnvkit.py",
    "target",
    "--annotate", "refFlat.txt",
    "--avg-size", "33",
    "--output", "results.json",
    "--short-names",
    "--split",
    "*Tumor.bam"
  ]
  job: cnvkit-target-job.json
  tool: ../tools/cnvkit-target.cwl
  doc: General test of command line generation
@@ -0,0 +1,8 @@
command from cnvkit batch tutorial (https://cnvkit.readthedocs.io/en/v0.7.11/pipeline.html#batch) I'm trying to run

cnvkit.py batch *Tumor.bam --normal *Normal.bam \
    --targets my_baits.bed --split --annotate refFlat.txt \
    --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
    --output-reference my_reference.cnn --output-dir results/ \
    --diagram --scatter
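
The batch job file in this pull request sets every option from the tutorial command except --output-reference. A job file that mirrors the full command might look like the sketch below. It is written in YAML job syntax and is only an illustration, not part of the pull request; the input ids are the ones declared in the cnvkit batch tool definition (bam_files, normal, output_reference, and so on).

```yaml
# Hypothetical job file mirroring the tutorial command.
# Input ids are those declared in the cnvkit batch tool definition.
bam_files:
  - "*Tumor.bam"
normal:
  - "*Normal.bam"
targets: my_baits.bed
split: true
annotate: refFlat.txt
fasta: hg19.fasta
access: data/access-5kb-mappable.hg19.bed
# Passed in the tutorial command but not set in the JSON job file:
output_reference: my_reference.cnn
output_dir: results/
diagram: true
scatter: true
```

One thing to keep in mind: because these inputs are typed as string and a CWL runner does not start a shell unless asked to, patterns such as "*Tumor.bam" would most likely reach cnvkit.py unexpanded, unlike in the shell command above.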
@@ -0,0 +1,160 @@
#!/usr/bin/env cwl-runner
# This tool description was generated automatically by argparse2cwl ver. 0.2.8
# To generate again: $ cnvkit.py --generate_cwl_tool
# Help: $ cnvkit.py --help_arg2cwl

cwlVersion: "cwl:v1.0"

class: CommandLineTool
baseCommand: ['cnvkit.py', 'batch']

description: |
  Run the complete CNVkit pipeline on one or more BAM files.

inputs:

  bam_files:
    type:
    - "null"
    - type: array
      items: string

    description: Mapped sequence reads (.bam)
    inputBinding:
      position: 1

  male_reference:
    type: ["null", boolean]
    default: False
    description: Use or assume a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
    inputBinding:
      prefix: --male-reference

  count_reads:
    type: ["null", boolean]
    default: False
    description: Get read depths by counting read midpoints within each bin. (An alternative algorithm).
    inputBinding:
      prefix: --count-reads

  processes:
    type: ["null", int]
    default: 1
    description: Number of subprocesses used to running each of the BAM files in parallel. Give 0 or a negative value to use the maximum number of available CPUs. [Default - process each BAM in serial]
    inputBinding:
      prefix: --processes

  rlibpath:
    type: ["null", string]
    description: Path to an alternative site-library to use for R packages.
    inputBinding:
      prefix: --rlibpath

  normal:
    type:
    - "null"
    - type: array
      items: string

    description: Normal samples (.bam) to construct the pooled reference. If this option is used but no files are given, a "flat" reference will be built.
    inputBinding:
      prefix: --normal

  fasta:
    type: ["null", string]
    description: Reference genome, FASTA format (e.g. UCSC hg19.fa)
    inputBinding:
      prefix: --fasta

  targets:
    type: ["null", string]
    description: Target intervals (.bed or .list)
    inputBinding:
      prefix: --targets

  antitargets:
    type: ["null", string]
    description: Antitarget intervals (.bed or .list)
    inputBinding:
      prefix: --antitargets

  annotate:
    type: ["null", string]
    description: UCSC refFlat.txt or ensFlat.txt file for the reference genome. Pull gene names from this file and assign them to the target regions.
    inputBinding:
      prefix: --annotate

  short_names:
    type: ["null", boolean]
    default: False
    description: Reduce multi-accession bait labels to be short and consistent.
    inputBinding:
      prefix: --short-names

  split:
    type: ["null", boolean]
    default: False
    description: Split large tiled intervals into smaller, consecutive targets.
    inputBinding:
      prefix: --split

  target_avg_size:
    type: ["null", int]
    description: Average size of split target bins (results are approximate).
    inputBinding:
      prefix: --target-avg-size

  access:
    type: ["null", string]
    description: Regions of accessible sequence on chromosomes (.bed), as output by the 'access' command.
    inputBinding:
      prefix: --access

  antitarget_avg_size:
    type: ["null", int]
    description: Average size of antitarget bins (results are approximate).
    inputBinding:
      prefix: --antitarget-avg-size

  antitarget_min_size:
    type: ["null", int]
    description: Minimum size of antitarget bins (smaller regions are dropped).
    inputBinding:
      prefix: --antitarget-min-size

  output_reference:
    type: ["null", string]
    description: Output filename/path for the new reference file being created. (If given, ignores the -o/--output-dir option and will write the file to the given path. Otherwise, "reference.cnn" will be created in the current directory or specified output directory.)
    inputBinding:
      prefix: --output-reference

  reference:
    type: ["null", string]
    description: Copy number reference file (.cnn).
    inputBinding:
      prefix: --reference

  output_dir:
    type: ["null", string]
    default: .
    description: Output directory.
    inputBinding:
      prefix: --output-dir

  scatter:
    type: ["null", boolean]
    default: False
    description: Create a whole-genome copy ratio profile as a PDF scatter plot.
    inputBinding:
      prefix: --scatter

  diagram:
    type: ["null", boolean]
    default: False
    description: Create a diagram of copy ratios on chromosomes as a PDF.
    inputBinding:
      prefix: --diagram

outputs:
  []
There are several outputs from this command and they vary based on the input BAM filenames and the options given.
- If the --scatter option is given, then for each tumor/test sample, "Sample-scatter.pdf" is created
- The --diagram option creates "Sample-diagram.pdf"
- If -d/--output-dir is specified, the created file names are relative to (i.e. in) that specified directory
- If the -r/--reference option is not given, then a .cnn file is created either with the filename given by --output-reference (regardless of the -d/--output-dir path) or by default "cnv_reference.cnn"
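
The output rules in this comment could be turned into a CWL outputs section to replace the empty one in the tool definition. The fragment below is a rough sketch under several assumptions, not a tested definition: the output ids (scatter_plots, diagrams, new_reference) are made up for illustration, the glob patterns are inferred from the file names quoted in the comment, and the default reference is assumed to land inside the output directory. It relies on the output_dir and output_reference inputs already declared in the tool.

```yaml
# Hypothetical sketch only: ids and glob patterns are assumptions
# based on the review comment, not taken from the pull request.
requirements:
  - class: InlineJavascriptRequirement   # needed for the ${...} expression below

outputs:
  scatter_plots:
    # One "<Sample>-scatter.pdf" per tumor/test sample when --scatter is set;
    # an empty array otherwise.
    type:
      type: array
      items: File
    outputBinding:
      glob: "$(inputs.output_dir)/*-scatter.pdf"

  diagrams:
    # One "<Sample>-diagram.pdf" per sample when --diagram is set.
    type:
      type: array
      items: File
    outputBinding:
      glob: "$(inputs.output_dir)/*-diagram.pdf"

  new_reference:
    # Only created when -r/--reference is not given, hence optional.
    # Uses --output-reference if set, else the default name from the comment
    # (assumed here to be written inside the output directory).
    type: ["null", File]
    outputBinding:
      glob: |
        ${
          return inputs.output_reference
            ? inputs.output_reference
            : inputs.output_dir + "/cnv_reference.cnn";
        }
```

A CWL runner only collects files written under the job's output directory, so an --output-reference path pointing elsewhere would probably not be captured by a glob like this. The default file name should also be checked against the CNVkit version in use, since the tool's own help text mentions "reference.cnn".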