Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2 from nvnieuwk/dev
Version 0.1.0
Showing 22 changed files with 1,467 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# This workflow will build a golang project | ||
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-go | ||
|
||
name: Go | ||
|
||
on: | ||
  push: | ||
    branches: [ "main", "dev" ] | ||
  pull_request: | ||
    branches: [ "main", "dev" ] | ||
|
||
jobs: | ||
|
||
  build: | ||
    runs-on: ubuntu-latest | ||
    steps: | ||
    - uses: actions/checkout@v3 | ||
|
||
    - name: Set up Go | ||
      uses: actions/setup-go@v4 | ||
      with: | ||
        go-version: '1.21' | ||
|
||
    - name: Install dependencies | ||
      run: go get . | ||
|
||
    - name: Build | ||
      run: go build -v ./... | ||
|
||
    - name: Test | ||
      run: go test -v ./... |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
on: | ||
  release: | ||
    types: [created] | ||
|
||
permissions: | ||
  contents: write | ||
  packages: write | ||
|
||
jobs: | ||
  release-bedgovcf: | ||
    name: release bedgovcf ${{ matrix.goos }}_${{ matrix.goarch }} | ||
    runs-on: ubuntu-latest | ||
    strategy: | ||
      matrix: | ||
        goos: [linux, darwin] | ||
        goarch: [amd64, arm64] | ||
        exclude: | ||
          - goos: linux | ||
            goarch: arm64 | ||
|
||
    steps: | ||
    - uses: actions/checkout@v3 | ||
    - uses: wangyoucao577/go-release-action@v1 | ||
      with: | ||
        github_token: ${{ secrets.GITHUB_TOKEN }} | ||
        goos: ${{ matrix.goos }} | ||
        goarch: ${{ matrix.goarch }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,61 @@ | ||
# svync | ||
⚠️ This tool is still under development, please check back in the future ⚠️ | ||
Svync is a tool designed to synchronize structural variant calls from different callers. It uses YAML configs to define how to handle the standardization. | ||
|
||
## Usage | ||
```bash | ||
svync --config <config.yaml> --input <input.vcf> | ||
``` | ||
|
||
### Arguments | ||
#### Required | ||
| Argument | Description | | ||
| --- | --- | | ||
| `--config`/`-c` | Path to the YAML config file | | ||
| `--input`/`-i` | Path to the input VCF file | | ||
|
||
#### Optional | ||
| Argument | Description | Default | | ||
| --- | --- | --- | | ||
| `--output`/`-o` | Path to the output VCF file | `stdout` | | ||
| `--nodate`/`--nd` | Do not add the date to the output VCF file | `false` | | ||
| `--mute-warnings`/`--mw` | Do not output warnings | `false` | | ||
|
||
## Configuration | ||
The configuration file is the core of the standardization in Svync. More information can be found in the [configuration documentation](docs/configuration.md). | ||
|
||
|
||
## Installation | ||
### Mamba/Conda | ||
This is the preferred way of installing BedGoVcf. | ||
|
||
```bash | ||
mamba install -c bioconda bedgovcf | ||
``` | ||
|
||
or with conda: | ||
|
||
```bash | ||
conda install -c bioconda bedgovcf | ||
``` | ||
|
||
### Precompiled binaries | ||
Precompiled binaries are available for Linux and macOS on the [releases page](https://github.com/nvnieuwk/svync/releases). | ||
|
||
|
||
### Installation from source | ||
Make sure you have Go installed on your machine (or [install](https://go.dev/doc/install) it if you don't have it yet). | ||
|
||
Then run these commands to install bedgovcf: | ||
|
||
```bash | ||
go get . | ||
go build . | ||
sudo mv bedgovcf /usr/local/bin/ | ||
``` | ||
|
||
Next run this command to check if it was correctly installed: | ||
|
||
```bash | ||
bedgovcf --help | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Test config for Delly SV caller | ||
id: "delly_$INFO/SVTYPE" | ||
alt: | ||
  BND: TRA | ||
info: | ||
  CALLER: | ||
    value: delly | ||
    description: SV caller | ||
    number: 1 | ||
    type: string | ||
  TEST: | ||
    value: $INFO/END,$INFO/CIEND/1 | ||
    description: Test info field | ||
    number: 2 | ||
    type: integer | ||
  SVLEN: | ||
    value: ~sub:$INFO/END,$POS | ||
    description: SV length | ||
    number: 2 | ||
    type: integer | ||
    alts: | ||
      DEL: -~sub:$INFO/END,$POS | ||
      INS: $INFO/INSLEN | ||
format: | ||
  PE: | ||
    value: $FORMAT/DR,$FORMAT/DV | ||
    description: Paired-read support for the ref and alt alleles in the order listed | ||
    number: 2 | ||
    type: integer |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Test config for GRIDSS SV caller | ||
id: "gridss_$INFO/SVTYPE" | ||
info: | ||
  CALLER: | ||
    value: gridss | ||
    description: SV caller | ||
    number: 1 | ||
    type: string | ||
  CIPOS: | ||
    value: $INFO/CIPOS | ||
    description: Confidence interval around POS for imprecise variants | ||
    number: 2 | ||
    type: Integer | ||
    alts: | ||
      BND: | ||
  CIEND: | ||
    value: $INFO/CIRPOS | ||
    description: Confidence interval around END position for imprecise variants | ||
    number: 2 | ||
    type: Integer | ||
  SVLEN: | ||
    value: $INFO/SVLEN | ||
    description: The length of the structural variant | ||
    number: 1 | ||
    type: Integer | ||
  IMPRECISE: | ||
    value: $INFO/IMPRECISE | ||
    description: Imprecise structural variation | ||
    number: 0 | ||
    type: flag | ||
format: | ||
  GT: | ||
    value: ./. | ||
    description: Genotype | ||
    number: 1 | ||
    type: string |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
##fileformat=VCFv4.2 | ||
##FILTER=<ID=PASS,Description="All filters passed"> | ||
##fileDate=20231204 | ||
##ALT=<ID=DEL,Description="Deletion"> | ||
##ALT=<ID=DUP,Description="Duplication"> | ||
##ALT=<ID=INV,Description="Inversion"> | ||
##ALT=<ID=BND,Description="Translocation"> | ||
##ALT=<ID=INS,Description="Insertion"> | ||
##FILTER=<ID=LowQual,Description="Poor quality and insufficient number of PEs and SRs."> | ||
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="PE confidence interval around END"> | ||
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="PE confidence interval around POS"> | ||
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for POS2 coordinate in case of an inter-chromosomal translocation"> | ||
##INFO=<ID=POS2,Number=1,Type=Integer,Description="Genomic position for CHR2 in case of an inter-chromosomal translocation"> | ||
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant"> | ||
##INFO=<ID=PE,Number=1,Type=Integer,Description="Paired-end support of the structural variant"> | ||
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends"> | ||
##INFO=<ID=SRMAPQ,Number=1,Type=Integer,Description="Median mapping quality of split-reads"> | ||
##INFO=<ID=SR,Number=1,Type=Integer,Description="Split-read support"> | ||
##INFO=<ID=SRQ,Number=1,Type=Float,Description="Split-read consensus alignment quality"> | ||
##INFO=<ID=SVINSSEQ,Number=1,Type=String,Description="Split-read consensus sequence"> | ||
##INFO=<ID=CE,Number=1,Type=Float,Description="Consensus sequence entropy"> | ||
##INFO=<ID=CT,Number=1,Type=String,Description="Paired-end signature induced connection type"> | ||
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Insertion length for SVTYPE=INS."> | ||
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation"> | ||
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation"> | ||
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant"> | ||
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV"> | ||
##INFO=<ID=INSLEN,Number=1,Type=Integer,Description="Predicted length of the insertion"> | ||
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Predicted microhomology length using a max. edit distance of 2"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Log10-scaled genotype likelihoods for RR,RA,AA genotypes"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> | ||
##FORMAT=<ID=FT,Number=1,Type=String,Description="Per-sample genotype filter"> | ||
##FORMAT=<ID=RC,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the SV"> | ||
##FORMAT=<ID=RCL,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the left control region"> | ||
##FORMAT=<ID=RCR,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the right control region"> | ||
##FORMAT=<ID=RDCN,Number=1,Type=Integer,Description="Read-depth based copy-number estimate for autosomal sites"> | ||
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# high-quality reference pairs"> | ||
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality variant pairs"> | ||
##FORMAT=<ID=RR,Number=1,Type=Integer,Description="# high-quality reference junction reads"> | ||
##FORMAT=<ID=RV,Number=1,Type=Integer,Description="# high-quality variant junction reads"> | ||
##reference=reference.fasta | ||
##contig=<ID=chr14,length=2000001> | ||
##contig=<ID=chr16,length=2000001> | ||
##contig=<ID=chrX,length=2000001> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PosCon1 | ||
chr16 86933 DEL00000000 T <DEL> 120 LowQual PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.6;END=1349692;PE=0;MAPQ=0;CT=3to5;CIPOS=-9,9;CIEND=-9,9;SRMAPQ=60;INSLEN=0;HOMLEN=9;SR=2;SRQ=0.986667;SVINSSEQ=AAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATATATATATATATATATATATATATATATACACATACATATATACGGTTGATTTTTACATATTGATCTTGTATCTTGTAACCTTGCTGAACTTGTTCATTAGTTCTAAT;CE=1.61868 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-24.3989,-2.10615,0:21:PASS:0:31692:19715:3:0:0:0:7 | ||
chr16 1077371 INV00000001 T <INV> 58 LowQual IMPRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=1078502;PE=2;MAPQ=29;CT=5to5;CIPOS=-392,392;CIEND=-392,392 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-0.521621,-107.701:6:LowQual:40:102:49:2:19:2:0:0 | ||
chr16 1123476 INV00000002 A <INV> 180 PASS PRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=1604486;PE=0;MAPQ=0;CT=5to5;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=3;SR=3;SRQ=0.98;SVINSSEQ=GAATTGCTTGAACACTGCACCACTGCACTCCAGCCTGGGTGACAGAGGAAGACTCTTTCTCCAAAAAAAAAGAATGTTTTCCTACATATATATATATATATATATATATATACACACACACACACACACACACACACACACACACAGTCT;CE=1.88447 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-38.9839,-3.59622,0:36:PASS:11456:39951:0:7:0:0:0:12 | ||
chr16 1135261 INS00000003 C <INS> 299 PASS PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv1.1.6;END=1135262;SVLEN=27;PE=0;MAPQ=0;CT=NtoN;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=27;HOMLEN=2;SR=5;SRQ=1;SVINSSEQ=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCACTGGAAACAGCCAAGAGATCCTTCAAAAAGTGAATGGATAAACCAACTGTAACTCATTCATACAGTGGAACGTTAATCAGCAATTCTAAAAATGAGCTATCAAGTCACAAAAAGACAAAGAAGAACCTTAACACAAAATAACA;CE=1.67205 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-90.4974,-7.52311,0:75:PASS:11619:25087:13468:2:0:0:0:25 |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
# Configuration | ||
The configuration file consists of 4 main parts: | ||
1. `id` | ||
2. `alt` | ||
3. `info` | ||
4. `format` | ||
|
||
## `id` | ||
The `id` section is used to define the ID of the variant. The `id` section can be defined as follows: | ||
```yaml | ||
id: <id> | ||
``` | ||
The value for the ID can be resolved (see [Resolvable fields](#resolvable-fields)). All IDs get a unique number appended to them to ensure that they are unique. | ||
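As a rough illustration of the behaviour described above, the following Python sketch resolves an ID template and appends a running number. The helper names and the `_<n>` suffix format are assumptions made for this example; the exact suffix Svync writes is not specified here.

```python
import itertools
import re


def resolve_id(template, info):
    """Expand `$INFO/<field>` placeholders in an ID template (hypothetical helper)."""
    return re.sub(r"\$INFO/(\w+)", lambda m: str(info[m.group(1)]), template)


def assign_ids(template, records):
    """Yield one ID per record; the `_<n>` suffix format is an assumption."""
    counter = itertools.count(1)
    for info in records:
        yield f"{resolve_id(template, info)}_{next(counter)}"


records = [{"SVTYPE": "DEL"}, {"SVTYPE": "INV"}, {"SVTYPE": "DEL"}]
ids = list(assign_ids("delly_$INFO/SVTYPE", records))
```

Two records with the same resolved ID still end up with distinct identifiers because of the appended counter.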
## `alt` | ||
The `alt` section can be used to change the ALT field and SVTYPE info field for each variant. The `alt` section can be defined as follows: | ||
```yaml | ||
alt: | ||
  <alt>: <new_alt> | ||
``` | ||
|
||
For example, you might want to rename the `BND` ALT to `TRA` (e.g. when standardizing Delly output): | ||
```yaml | ||
alt: | ||
  BND: TRA | ||
``` | ||
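The effect of such a mapping can be sketched in Python as follows. This is only an illustration under the assumption that symbolic alleles are written in angle brackets and matched on their bare name; `apply_alt_mapping` is a hypothetical helper, not part of Svync.

```python
def apply_alt_mapping(alt, info, mapping):
    """Return (new_alt, new_info) with ALT and INFO/SVTYPE renamed per `mapping`."""
    bare = alt.strip("<>")
    new = mapping.get(bare, bare)
    # Keep the angle brackets if the original ALT was a symbolic allele
    symbolic = alt.startswith("<") and alt.endswith(">")
    new_alt = f"<{new}>" if symbolic else new
    new_info = dict(info)
    if new_info.get("SVTYPE") == bare:
        new_info["SVTYPE"] = new
    return new_alt, new_info


mapping = {"BND": "TRA"}
renamed = apply_alt_mapping("<BND>", {"SVTYPE": "BND", "CHR2": "chrX"}, mapping)
untouched = apply_alt_mapping("<DEL>", {"SVTYPE": "DEL"}, mapping)
```

ALT values without an entry in the mapping pass through unchanged.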
|
||
## `info` | ||
The `info` section can be used to change the info fields for each variant. The `info` section can be defined as follows: | ||
```yaml | ||
info: | ||
  <info_field>: | ||
    value: <new_value> | ||
    type: <new_type> | ||
    description: <new_description> | ||
    number: <new_number> | ||
    alts: | ||
      <alt>: <new_value> | ||
      <alt>: <new_value> | ||
``` | ||
### value | ||
The `value` field can be used to change the default value of the info field. The value can be resolved (see [Resolvable fields](#resolvable-fields)). | ||
|
||
### type | ||
The `type` field can be used to set the type of the info field (This will be reflected in the header of the output VCF file). | ||
|
||
### description | ||
The `description` field can be used to set the description of the info field (This will be reflected in the header of the output VCF file). | ||
|
||
### number | ||
The `number` field can be used to set the `Number` attribute of the info field, i.e. how many values it holds (this will be reflected in the header of the output VCF file). | ||
|
||
### alts | ||
The `alts` field can be used to set the value of the info field for a specific ALT. The value can be resolved (see [Resolvable fields](#resolvable-fields)). | ||
|
||
For example, when all `SVLEN` info fields are positive, you may want to negate the value for all deletions: | ||
```yaml | ||
info: | ||
  SVLEN: | ||
    value: $INFO/SVLEN | ||
    type: Integer | ||
    description: "Structural variant length" | ||
    number: 1 | ||
    alts: | ||
      DEL: -$INFO/SVLEN | ||
``` | ||
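The selection between the default `value` and a per-ALT override can be pictured with this small Python sketch. It assumes the config has already been loaded into a dictionary; `pick_template` and `render` are hypothetical names, and `render` only handles the single variable used in this example.

```python
def pick_template(field_cfg, alt):
    """Return the per-ALT template if one is configured, else the default `value`."""
    return field_cfg.get("alts", {}).get(alt, field_cfg["value"])


def render(template, info):
    # Minimal substitution for this sketch: only `$INFO/SVLEN` is handled
    return template.replace("$INFO/SVLEN", str(info["SVLEN"]))


svlen_cfg = {"value": "$INFO/SVLEN", "alts": {"DEL": "-$INFO/SVLEN"}}

deletion = render(pick_template(svlen_cfg, "DEL"), {"SVLEN": 300})
insertion = render(pick_template(svlen_cfg, "INS"), {"SVLEN": 27})
```

Only deletions pick up the leading minus sign; every other ALT falls back to the default template.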
|
||
## `format` | ||
The `format` section can be used to change the format fields for each variant. The `format` section can be defined as follows: | ||
```yaml | ||
format: | ||
  <format_field>: | ||
    value: <new_value> | ||
    type: <new_type> | ||
    description: <new_description> | ||
    number: <new_number> | ||
    alts: | ||
      <alt>: <new_value> | ||
      <alt>: <new_value> | ||
``` | ||
|
||
The format fields work the same as the info fields (see [Info](#info)). | ||
|
||
## Resolvable fields | ||
|
||
Some config values can contain variables and functions, which are resolved to a value for each variant. | ||
|
||
### Variables | ||
|
||
A variable is resolved by prefixing the field name with a `$`. | ||
|
||
The following variables are available: | ||
1. `$FORMAT/<format_field>` => This is only accessible from other format fields | ||
   - An additional `/<number>` can be added to get a specific value in case of multiple values | ||
2. `$INFO/<info_field>` | ||
   - An additional `/<number>` can be added to get a specific value in case of multiple values | ||
3. `$POS` | ||
4. `$CHROM` | ||
5. `$ALT` | ||
6. `$QUAL` | ||
7. `$FILTER` | ||
|
||
For example `$INFO/SVLEN` will be resolved to the value of the `SVLEN` info field. | ||
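The lookup rules above can be sketched in Python as shown below. The record layout and the `resolve_variable` helper are assumptions for illustration only, and the sketch treats the optional `/<number>` as a zero-based index, which the text above does not confirm. The sample values come from the first record of the Delly test VCF.

```python
def resolve_variable(var, record):
    """Resolve `$POS`, `$INFO/<field>` or `$INFO/<field>/<index>` against a record dict."""
    parts = var.lstrip("$").split("/")
    if parts[0] in ("INFO", "FORMAT"):
        value = record[parts[0]][parts[1]]
        if len(parts) == 3:
            # Assumption: the index is zero-based
            return value.split(",")[int(parts[2])]
        return value
    # Top-level columns: $POS, $CHROM, $ALT, $QUAL, $FILTER
    return record[parts[0]]


record = {
    "CHROM": "chr16", "POS": "86933", "ALT": "<DEL>",
    "QUAL": "120", "FILTER": "LowQual",
    "INFO": {"END": "1349692", "CIEND": "-9,9", "SVTYPE": "DEL"},
    "FORMAT": {"DR": "0", "DV": "0"},
}

end = resolve_variable("$INFO/END", record)
ciend_upper = resolve_variable("$INFO/CIEND/1", record)
pos = resolve_variable("$POS", record)
```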
|
||
### Functions | ||
|
||
Functions perform simple calculations on resolved values. | ||
|
||
More functions can be added in the future. Please open an issue to request new functions. | ||
|
||
#### `~sub` | ||
The `~sub` function can be used to subtract values from each other. The function can be used as follows: | ||
|
||
```yaml | ||
~sub:<value_start>,<value_to_subtract>,<value_to_subtract>,... | ||
``` | ||
|
||
:warning: only integers and floats are supported for this function :warning: | ||
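To make the semantics concrete, here is a minimal Python sketch of how `~sub` could be evaluated once all variables have been resolved to numbers. `evaluate_sub` is a hypothetical helper, not Svync's implementation; the worked input uses `END` and `POS` from the first record of the Delly test VCF.

```python
def evaluate_sub(expr):
    """Evaluate `~sub:a,b,c,...` as a - b - c - ... on already-resolved numbers."""
    prefix = "~sub:"
    if not expr.startswith(prefix):
        raise ValueError(f"not a ~sub expression: {expr!r}")
    # float() raises ValueError for anything that is not an integer or float
    numbers = [float(part) for part in expr[len(prefix):].split(",")]
    result = numbers[0]
    for number in numbers[1:]:
        result -= number
    # Report whole results as integers so that SVLEN-style values stay integral
    return int(result) if result.is_integer() else result


# ~sub:$INFO/END,$POS with END=1349692 and POS=86933
svlen = evaluate_sub("~sub:1349692,86933")
```

With the `DEL: -~sub:$INFO/END,$POS` override from the Delly test config, the same record would presumably get the negated value.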
|
||
#### `~sum` | ||
The `~sum` function can be used to take the sum of all values. The function can be used as follows: | ||
|
||
```yaml | ||
~sum:<value_start>,<value_to_add>,<value_to_add>,... | ||
``` | ||
|
||
:warning: only integers and floats are supported for this function :warning: | ||
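A comparable Python sketch for `~sum`, again assuming the arguments are already resolved; `evaluate_sum` is a hypothetical name used only for this example.

```python
def evaluate_sum(expr):
    """Evaluate `~sum:a,b,c,...` as a + b + c + ... on already-resolved numbers."""
    prefix = "~sum:"
    if not expr.startswith(prefix):
        raise ValueError(f"not a ~sum expression: {expr!r}")
    # float() raises ValueError for anything that is not an integer or float
    total = sum(float(part) for part in expr[len(prefix):].split(","))
    return int(total) if total.is_integer() else total


# For instance, adding up two read-support counts
support = evaluate_sum("~sum:19,2")
```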
|
||
#### `~len` | ||
The `~len` function can be used to get the length of a string value. The function can be used as follows: | ||
|
||
```yaml | ||
~len:<value> | ||
``` |
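As a sketch of what `~len` does, the Python snippet below counts the characters of an already-resolved string, for example an inserted sequence. `evaluate_len` is a hypothetical helper for illustration only.

```python
def evaluate_len(expr):
    """Evaluate `~len:<value>` as the number of characters in the resolved value."""
    prefix = "~len:"
    if not expr.startswith(prefix):
        raise ValueError(f"not a ~len expression: {expr!r}")
    return len(expr[len(prefix):])


# For instance, the length of a short inserted sequence
insert_length = evaluate_len("~len:ACGTT")
```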