This repository has been archived by the owner on Oct 2, 2020. It is now read-only.

[WIP] CNVkit tool definitions #93

Open
wants to merge 10 commits into
base: master
16 changes: 16 additions & 0 deletions test/cnvkit-batch-job.json
@@ -0,0 +1,16 @@
{
  "bam_files": [
    "*Tumor.bam"
  ],
  "normal": [
    "*Normal.bam"
  ],
  "targets": "my_baits.bed",
  "split": true,
  "annotate": "refFlat.txt",
  "fasta": "hg19.fasta",
  "access": "data/access-5kb-mappable.hg19.bed",
  "output_dir": "results/",
  "diagram": true,
  "scatter": true
}
18 changes: 18 additions & 0 deletions test/cnvkit-batch-test.yaml
@@ -0,0 +1,18 @@
- args: [
    "cnvkit.py",
    "batch",
    "--access", "data/access-5kb-mappable.hg19.bed",
    "--annotate", "refFlat.txt",
    "--diagram",
    "--fasta", "hg19.fasta",
    "--normal", "*Normal.bam",
    "--output-dir", "results/",
    "--processes", "1",
    "--scatter",
    "--split",
    "--targets", "my_baits.bed",
    "*Tumor.bam",
    ]
  job: cnvkit-batch-job.json
  tool: ../tools/cnvkit-batch.cwl
  doc: General test of command line generation
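The expected `args` list in this test follows a recognizable order: prefixed options sorted by input name, then the positional BAM pattern. The sketch below is a simplified illustration of how such a list could be assembled from the job file, assuming bindings are sorted by `(position, input name)` with a default position of 0. `build_args` and `BATCH_BINDINGS` are hypothetical names introduced here, cover only the inputs this test exercises, and do not reproduce a real CWL runner's full binding rules.

```python
# Hypothetical subset of the cnvkit-batch.cwl input bindings used by the test.
BATCH_BINDINGS = {
    "bam_files": {"position": 1},
    "access": {"prefix": "--access"},
    "annotate": {"prefix": "--annotate"},
    "diagram": {"prefix": "--diagram", "default": False},
    "fasta": {"prefix": "--fasta"},
    "normal": {"prefix": "--normal"},
    "output_dir": {"prefix": "--output-dir", "default": "."},
    "processes": {"prefix": "--processes", "default": 1},
    "scatter": {"prefix": "--scatter", "default": False},
    "split": {"prefix": "--split", "default": False},
    "targets": {"prefix": "--targets"},
}


def build_args(base_command, bindings, job):
    """Assemble an argv list from a job dict (simplified, assumed rules)."""
    entries = []
    for name, binding in bindings.items():
        value = job.get(name, binding.get("default"))
        if value is None or value is False:
            continue  # unset inputs and false flags contribute nothing
        tokens = [binding["prefix"]] if "prefix" in binding else []
        if value is True:
            pass  # boolean flag: the prefix alone is emitted
        elif isinstance(value, list):
            # Simplification: the prefix is emitted once, then every item.
            tokens.extend(str(item) for item in value)
        else:
            tokens.append(str(value))
        # Assumed sort key: explicit position (default 0), then input name.
        entries.append(((binding.get("position", 0), name), tokens))
    entries.sort(key=lambda entry: entry[0])

    argv = list(base_command)
    for _, tokens in entries:
        argv.extend(tokens)
    return argv
```

Under these assumptions, feeding the contents of `cnvkit-batch-job.json` to `build_args(["cnvkit.py", "batch"], BATCH_BINDINGS, job)` yields the same sequence as the `args` list above, including `--processes 1`, which comes from the tool's default rather than from the job file.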
13 changes: 13 additions & 0 deletions test/cnvkit-scatter-job.json
@@ -0,0 +1,13 @@
{
  "segment": "segment.cns",
  "chromosome": "chr1",
  "split": true,
  "gene": "gen1, gen2",
  "range_list": "chr -start-end",
  "sample_id": "data/access-5kb-mappable.hg19.bed",
  "vcf": "data.vcf",
  "y_min": 3.04,
  "y_max": 4.04,
  "trend": true,
  "output": "result.txt"
}
19 changes: 19 additions & 0 deletions test/cnvkit-scatter-test.yaml
@@ -0,0 +1,19 @@
- args: [
    "cnvkit.py",
    "scatter",
    "--chromosome", "chr1",
    "--gene", "gen1, gen2",
    "--min-variant-depth", "20",
    "--output", "result.txt",
    "--range-list", "chr -start-end",
    "--sample-id", "data/access-5kb-mappable.hg19.bed",
    "--segment", "segment.cns",
    "--trend",
    "--vcf", "data.vcf",
    "--width", "1000000.0",
    "--y-max", "4.04",
    "--y-min", "3.04",
    ]
  job: cnvkit-scatter-job.json
  tool: ../tools/cnvkit-scatter.cwl
  doc: General test of command line generation
9 changes: 9 additions & 0 deletions test/cnvkit-segmetrics-job.json
@@ -0,0 +1,9 @@
{
  "cnarray": "*Tumor.bam",
  "segments": "*Normal.cns",
  "drop_low_coverage": true,
  "output": "results/result.txt",
  "stdev": true,
  "mad": true,
  "pi": true
}
16 changes: 16 additions & 0 deletions test/cnvkit-segmetrics-test.yaml
@@ -0,0 +1,16 @@
- args: [
    "cnvkit.py",
    "segmetrics",
    "--alpha", "0.05",
    "--bootstrap", "100",
    "--drop-low-coverage",
    "--mad",
    "--output", "results/result.txt",
    "--pi",
    "--segments", "*Normal.cns",
    "--stdev",
    "*Tumor.bam"
    ]
  job: cnvkit-segmetrics-job.json
  tool: ../tools/cnvkit-segmetrics.cwl
  doc: General test of command line generation
8 changes: 8 additions & 0 deletions test/cnvkit-target-job.json
@@ -0,0 +1,8 @@
{
  "interval": "*Tumor.bam",
  "annotate": "refFlat.txt",
  "avg_size": 33,
  "output": "results.json",
  "short_names": true,
  "split": true
}
14 changes: 14 additions & 0 deletions test/cnvkit-target-test.yaml
@@ -0,0 +1,14 @@
- args: [
    "cnvkit.py",
    "target",
    "--annotate",
    "refFlat.txt",
    "--avg-size", "33",
    "--output", "results.json",
    "--short-names",
    "--split",
    "*Tumor.bam"
    ]
  job: cnvkit-target-job.json
  tool: ../tools/cnvkit-target.cwl
  doc: General test of command line generation
8 changes: 8 additions & 0 deletions test/test-files/cnvkit-batch/draft.txt
@@ -0,0 +1,8 @@
command from cnvkit batch tutorial (https://cnvkit.readthedocs.io/en/v0.7.11/pipeline.html#batch) I'm trying to run


cnvkit.py batch *Tumor.bam --normal *Normal.bam \
--targets my_baits.bed --split --annotate refFlat.txt \
--fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
--output-reference my_reference.cnn --output-dir results/ \
--diagram --scatter
58,939 changes: 58,939 additions & 0 deletions test/test-files/cnvkit-batch/refFlat.txt

Large diffs are not rendered by default.

Empty files modified (mode 100755 → 100644):

  • tools/GATK-BaseRecalibrator.cwl
  • tools/GATK-FastaAlternateReferenceMaker.cwl
  • tools/GATK-HaplotypeCaller.cwl
  • tools/GATK-IndelRealigner.cwl
  • tools/GATK-PrintReads.cwl
  • tools/GATK-RealignTargetCreator.cwl
  • tools/STAR.cwl
  • tools/alea-alignReads.cwl
  • tools/alea-createGenome.cwl
  • tools/alea-insilico.cwl
  • tools/alea-phaseVCF.cwl
  • tools/bcftools-concat.cwl
  • tools/bcftools-consensus.cwl
  • tools/bedtools-genomecov.cwl
  • tools/bowtie.cwl
  • tools/bwa-aln.cwl
  • tools/bwa-index.cwl
  • tools/bwa-mem.cwl

160 changes: 160 additions & 0 deletions tools/cnvkit-batch.cwl
@@ -0,0 +1,160 @@
#!/usr/bin/env cwl-runner
# This tool description was generated automatically by argparse2cwl ver. 0.2.8
# To generate again: $ cnvkit.py --generate_cwl_tool
# Help: $ cnvkit.py --help_arg2cwl

cwlVersion: "cwl:v1.0"

class: CommandLineTool
baseCommand: ['cnvkit.py', 'batch']

description: |
  Run the complete CNVkit pipeline on one or more BAM files.

inputs:

  bam_files:
    type:
    - "null"
    - type: array
      items: string

    description: Mapped sequence reads (.bam)
    inputBinding:
      position: 1

  male_reference:
    type: ["null", boolean]
    default: False
    description: Use or assume a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
    inputBinding:
      prefix: --male-reference

  count_reads:
    type: ["null", boolean]
    default: False
    description: Get read depths by counting read midpoints within each bin. (An alternative algorithm).
    inputBinding:
      prefix: --count-reads

  processes:
    type: ["null", int]
    default: 1
    description: Number of subprocesses used to running each of the BAM files in parallel. Give 0 or a negative value to use the maximum number of available CPUs. [Default - process each BAM in serial]
    inputBinding:
      prefix: --processes

  rlibpath:
    type: ["null", string]
    description: Path to an alternative site-library to use for R packages.
    inputBinding:
      prefix: --rlibpath

  normal:
    type:
    - "null"
    - type: array
      items: string

    description: Normal samples (.bam) to construct the pooled reference. If this option is used but no files are given, a "flat" reference will be built.
    inputBinding:
      prefix: --normal

  fasta:
    type: ["null", string]
    description: Reference genome, FASTA format (e.g. UCSC hg19.fa)
    inputBinding:
      prefix: --fasta

  targets:
    type: ["null", string]
    description: Target intervals (.bed or .list)
    inputBinding:
      prefix: --targets

  antitargets:
    type: ["null", string]
    description: Antitarget intervals (.bed or .list)
    inputBinding:
      prefix: --antitargets

  annotate:
    type: ["null", string]
    description: UCSC refFlat.txt or ensFlat.txt file for the reference genome. Pull gene names from this file and assign them to the target regions.
    inputBinding:
      prefix: --annotate

  short_names:
    type: ["null", boolean]
    default: False
    description: Reduce multi-accession bait labels to be short and consistent.
    inputBinding:
      prefix: --short-names

  split:
    type: ["null", boolean]
    default: False
    description: Split large tiled intervals into smaller, consecutive targets.
    inputBinding:
      prefix: --split

  target_avg_size:
    type: ["null", int]
    description: Average size of split target bins (results are approximate).
    inputBinding:
      prefix: --target-avg-size

  access:
    type: ["null", string]
    description: Regions of accessible sequence on chromosomes (.bed), as output by the 'access' command.
    inputBinding:
      prefix: --access

  antitarget_avg_size:
    type: ["null", int]
    description: Average size of antitarget bins (results are approximate).
    inputBinding:
      prefix: --antitarget-avg-size

  antitarget_min_size:
    type: ["null", int]
    description: Minimum size of antitarget bins (smaller regions are dropped).
    inputBinding:
      prefix: --antitarget-min-size

  output_reference:
    type: ["null", string]
    description: Output filename/path for the new reference file being created. (If given, ignores the -o/--output-dir option and will write the file to the given path. Otherwise, "reference.cnn" will be created in the current directory or specified output directory.)
    inputBinding:
      prefix: --output-reference

  reference:
    type: ["null", string]
    description: Copy number reference file (.cnn).
    inputBinding:
      prefix: --reference

  output_dir:
    type: ["null", string]
    default: .
    description: Output directory.
    inputBinding:
      prefix: --output-dir

  scatter:
    type: ["null", boolean]
    default: False
    description: Create a whole-genome copy ratio profile as a PDF scatter plot.
    inputBinding:
      prefix: --scatter

  diagram:
    type: ["null", boolean]
    default: False
    description: Create a diagram of copy ratios on chromosomes as a PDF.
    inputBinding:
      prefix: --diagram


outputs:
  []

There are several outputs from this command and they vary based on the input BAM filenames and the options given.

  • For each tumor/test-sample BAM named e.g. Sample.bam, the outputs are: "Sample.targetcoverage.cnn", "Sample.antitargetcoverage.cnn", "Sample.cnr", "Sample.cns"
  • If the --scatter option is given, then for each tumor/test sample, "Sample-scatter.pdf" is created
  • Similarly, the --diagram option creates "Sample-diagram.pdf"
  • For all of the above, if -d/--output-dir is specified, the created file names are relative to (i.e. in) that specified directory
  • If the -r/--reference option is not given, then a .cnn file is created either with the filename given by --output-reference (regardless of the -d/--output-dir path) or by default "cnv_reference.cnn"
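
The naming rules listed above could be sketched roughly as follows, for example to work out what the tool's `outputs` section would need to capture. This is only an illustration of the rules as stated in this comment: `expected_batch_outputs` is a hypothetical helper, the BAM names are assumed to be already expanded (not glob patterns), and placing the default reference file inside the output directory is an assumption taken from the `--output-reference` help text rather than something stated here.

```python
import posixpath


def expected_batch_outputs(bam_files, output_dir=None, scatter=False,
                           diagram=False, reference=None,
                           output_reference=None,
                           default_reference="cnv_reference.cnn"):
    """List the files a `cnvkit.py batch` run is expected to create."""

    def in_output_dir(filename):
        # With -d/--output-dir, created files live inside that directory.
        return posixpath.join(output_dir, filename) if output_dir else filename

    outputs = []
    for bam in bam_files:
        # "path/to/Sample.bam" -> "Sample"
        sample = posixpath.basename(bam)
        if sample.endswith(".bam"):
            sample = sample[:-len(".bam")]
        suffixes = [".targetcoverage.cnn", ".antitargetcoverage.cnn",
                    ".cnr", ".cns"]
        if scatter:
            suffixes.append("-scatter.pdf")
        if diagram:
            suffixes.append("-diagram.pdf")
        outputs.extend(in_output_dir(sample + suffix) for suffix in suffixes)

    if reference is None:
        if output_reference:
            # Used exactly as given, regardless of the output directory.
            outputs.append(output_reference)
        else:
            # Assumption: the default reference lands in the output directory.
            outputs.append(in_output_dir(default_reference))
    return outputs
```

Because the result depends on both the input file names and the flags, a static list of output names in the tool definition would not cover every run; patterns derived from the inputs would be needed instead.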
