Merge pull request #2 from nvnieuwk/dev
Version 0.1.0
nvnieuwk authored Jan 22, 2024
2 parents 0d35e76 + d387aa6 commit 4d27add
Showing 22 changed files with 1,467 additions and 1 deletion.
31 changes: 31 additions & 0 deletions .github/workflows/go.yml
@@ -0,0 +1,31 @@
# This workflow will build a golang project
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-go

name: Go

on:
  push:
    branches: [ "main", "dev" ]
  pull_request:
    branches: [ "main", "dev" ]

jobs:

  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3

    - name: Set up Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.21'

    - name: Install dependencies
      run: go get .

    - name: Build
      run: go build -v ./...

    - name: Test
      run: go test -v ./...
27 changes: 27 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,27 @@
on:
  release:
    types: [created]

permissions:
  contents: write
  packages: write

jobs:
  release-bedgovcf:
    name: release bedgovcf ${{ matrix.goos }}_${{ matrix.goarch }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        goos: [linux, darwin]
        goarch: [amd64, arm64]
        exclude:
          - goos: linux
            goarch: arm64

    steps:
      - uses: actions/checkout@v3
      - uses: wangyoucao577/go-release-action@v1
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          goos: ${{ matrix.goos }}
          goarch: ${{ matrix.goarch }}
4 changes: 4 additions & 0 deletions .gitignore
@@ -7,6 +7,7 @@
*.dll
*.so
*.dylib
svync

# Test binary, built with `go test -c`
*.test
@@ -19,3 +20,6 @@

# Go workspace file
go.work

# Output files
test.vcf
61 changes: 60 additions & 1 deletion README.md
@@ -1,2 +1,61 @@
# svync
⚠️ This tool is still under development, please check back in the future ⚠️
Svync is a tool designed to synchronize structural variant calls from different callers. It uses YAML configs to define how each caller's output is standardized.

## Usage
```bash
svync --config <config.yaml> --input <input.vcf>
```

### Arguments
#### Required
| Argument | Description |
| --- | --- |
| `--config`/`-c` | Path to the YAML config file |
| `--input`/`-i` | Path to the input VCF file |

#### Optional
| Argument | Description | Default |
| --- | --- | --- |
| `--output`/`-o` | Path to the output VCF file | `stdout` |
| `--nodate`/`--nd` | Do not add the date to the output VCF file | `false` |
| `--mute-warnings`/`--mw` | Do not output warnings | `false` |
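
When Svync is driven from a script, the flags in the tables above can be assembled programmatically. The following is a minimal sketch, assuming `svync` is on the `PATH` and that `--nodate` and `--mute-warnings` are boolean switches that take no value; the helper name `build_svync_command` is hypothetical and not part of Svync.

```python
from typing import List, Optional


def build_svync_command(
    config: str,
    input_vcf: str,
    output: Optional[str] = None,
    nodate: bool = False,
    mute_warnings: bool = False,
) -> List[str]:
    """Assemble an svync command line from the documented flags (hypothetical helper)."""
    cmd = ["svync", "--config", config, "--input", input_vcf]
    if output is not None:
        # Without --output, the VCF is written to stdout.
        cmd += ["--output", output]
    if nodate:
        # Assumption: boolean flags are passed without a value.
        cmd.append("--nodate")
    if mute_warnings:
        cmd.append("--mute-warnings")
    return cmd


# Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
print(" ".join(build_svync_command("data/delly.yaml", "data/test1.delly.vcf.gz", output="test.vcf")))
```

Building the command as a list rather than a single string avoids shell quoting problems with file paths.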

## Configuration
The configuration file is the core of the standardization in Svync. More information can be found in the [configuration documentation](docs/configuration.md).


## Installation
### Mamba/Conda
This is the preferred way of installing Svync.

```bash
mamba install -c bioconda svync
```

or with conda:

```bash
conda install -c bioconda svync
```

### Precompiled binaries
Precompiled binaries are available for Linux and macOS on the [releases page](https://github.com/nvnieuwk/svync/releases).


### Installation from source
Make sure you have Go installed on your machine (or [install](https://go.dev/doc/install) it if you don't have it yet).

Then run these commands to install Svync:

```bash
go get .
go build .
sudo mv svync /usr/local/bin/
```

Next, run this command to check that it was installed correctly:

```bash
svync --help
```

29 changes: 29 additions & 0 deletions data/delly.yaml
@@ -0,0 +1,29 @@
# Test config for Delly SV caller
id: "delly_$INFO/SVTYPE"
alt:
  BND: TRA
info:
  CALLER:
    value: delly
    description: SV caller
    number: 1
    type: string
  TEST:
    value: $INFO/END,$INFO/CIEND/1
    description: Test info field
    number: 2
    type: integer
  SVLEN:
    value: ~sub:$INFO/END,$POS
    description: SV length
    number: 2
    type: integer
    alts:
      DEL: -~sub:$INFO/END,$POS
      INS: $INFO/INSLEN
format:
  PE:
    value: $FORMAT/DR,$FORMAT/DV
    description: Paired-read support for the ref and alt alleles in the order listed
    number: 2
    type: integer
36 changes: 36 additions & 0 deletions data/gridss.yaml
@@ -0,0 +1,36 @@
# Test config for GRIDSS SV caller
id: "gridss_$INFO/SVTYPE"
info:
  CALLER:
    value: gridss
    description: SV caller
    number: 1
    type: string
  CIPOS:
    value: $INFO/CIPOS
    description: Confidence interval around POS for imprecise variants
    number: 2
    type: Integer
    alts:
      BND:
  CIEND:
    value: $INFO/CIRPOS
    description: Confidence interval around END position for imprecise variants
    number: 2
    type: Integer
  SVLEN:
    value: $INFO/SVLEN
    description: The length of the structural variant
    number: 1
    type: Integer
  IMPRECISE:
    value: $INFO/IMPRECISE
    description: Imprecise structural variation
    number: 0
    type: flag
format:
  GT:
    value: ./.
    description: Genotype
    number: 1
    type: string
50 changes: 50 additions & 0 deletions data/test1.delly.vcf
@@ -0,0 +1,50 @@
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20231204
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
##FILTER=<ID=LowQual,Description="Poor quality and insufficient number of PEs and SRs.">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="PE confidence interval around END">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="PE confidence interval around POS">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for POS2 coordinate in case of an inter-chromosomal translocation">
##INFO=<ID=POS2,Number=1,Type=Integer,Description="Genomic position for CHR2 in case of an inter-chromosomal translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=PE,Number=1,Type=Integer,Description="Paired-end support of the structural variant">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
##INFO=<ID=SRMAPQ,Number=1,Type=Integer,Description="Median mapping quality of split-reads">
##INFO=<ID=SR,Number=1,Type=Integer,Description="Split-read support">
##INFO=<ID=SRQ,Number=1,Type=Float,Description="Split-read consensus alignment quality">
##INFO=<ID=SVINSSEQ,Number=1,Type=String,Description="Split-read consensus sequence">
##INFO=<ID=CE,Number=1,Type=Float,Description="Consensus sequence entropy">
##INFO=<ID=CT,Number=1,Type=String,Description="Paired-end signature induced connection type">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Insertion length for SVTYPE=INS.">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
##INFO=<ID=INSLEN,Number=1,Type=Integer,Description="Predicted length of the insertion">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Predicted microhomology length using a max. edit distance of 2">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Log10-scaled genotype likelihoods for RR,RA,AA genotypes">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=FT,Number=1,Type=String,Description="Per-sample genotype filter">
##FORMAT=<ID=RC,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the SV">
##FORMAT=<ID=RCL,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the left control region">
##FORMAT=<ID=RCR,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the right control region">
##FORMAT=<ID=RDCN,Number=1,Type=Integer,Description="Read-depth based copy-number estimate for autosomal sites">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# high-quality reference pairs">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality variant pairs">
##FORMAT=<ID=RR,Number=1,Type=Integer,Description="# high-quality reference junction reads">
##FORMAT=<ID=RV,Number=1,Type=Integer,Description="# high-quality variant junction reads">
##reference=reference.fasta
##contig=<ID=chr14,length=2000001>
##contig=<ID=chr16,length=2000001>
##contig=<ID=chrX,length=2000001>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PosCon1
chr16 86933 DEL00000000 T <DEL> 120 LowQual PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.6;END=1349692;PE=0;MAPQ=0;CT=3to5;CIPOS=-9,9;CIEND=-9,9;SRMAPQ=60;INSLEN=0;HOMLEN=9;SR=2;SRQ=0.986667;SVINSSEQ=AAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATATATATATATATATATATATATATATATACACATACATATATACGGTTGATTTTTACATATTGATCTTGTATCTTGTAACCTTGCTGAACTTGTTCATTAGTTCTAAT;CE=1.61868 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-24.3989,-2.10615,0:21:PASS:0:31692:19715:3:0:0:0:7
chr16 1077371 INV00000001 T <INV> 58 LowQual IMPRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=1078502;PE=2;MAPQ=29;CT=5to5;CIPOS=-392,392;CIEND=-392,392 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-0.521621,-107.701:6:LowQual:40:102:49:2:19:2:0:0
chr16 1123476 INV00000002 A <INV> 180 PASS PRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=1604486;PE=0;MAPQ=0;CT=5to5;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=3;SR=3;SRQ=0.98;SVINSSEQ=GAATTGCTTGAACACTGCACCACTGCACTCCAGCCTGGGTGACAGAGGAAGACTCTTTCTCCAAAAAAAAAGAATGTTTTCCTACATATATATATATATATATATATATATACACACACACACACACACACACACACACACACACAGTCT;CE=1.88447 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-38.9839,-3.59622,0:36:PASS:11456:39951:0:7:0:0:0:12
chr16 1135261 INS00000003 C <INS> 299 PASS PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv1.1.6;END=1135262;SVLEN=27;PE=0;MAPQ=0;CT=NtoN;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=27;HOMLEN=2;SR=5;SRQ=1;SVINSSEQ=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCACTGGAAACAGCCAAGAGATCCTTCAAAAAGTGAATGGATAAACCAACTGTAACTCATTCATACAGTGGAACGTTAATCAGCAATTCTAAAAATGAGCTATCAAGTCACAAAAAGACAAAGAAGAACCTTAACACAAAATAACA;CE=1.67205 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-90.4974,-7.52311,0:75:PASS:11619:25087:13468:2:0:0:0:25
Binary file added data/test1.delly.vcf.gz
Binary file not shown.
Binary file added data/test1.delly.vcf.gz.tbi
Binary file not shown.
Binary file added data/test2.gridss.vcf.gz
Binary file not shown.
Binary file added data/test2.gridss.vcf.gz.tbi
Binary file not shown.
134 changes: 134 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,134 @@
# Configuration
The configuration file consists of 4 main parts:
1. `id`
2. `alt`
3. `info`
4. `format`

## `id`
The `id` section defines the ID of each variant. It can be defined as follows:
```yaml
id: <id>
```
The value for the ID can be resolved (see [Resolvable fields](#resolvable-fields)). All IDs get a unique number appended to them to ensure that they are unique.
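
As an illustration, the sketch below resolves the template `delly_$INFO/SVTYPE` (taken from `data/delly.yaml`) and appends a running number. The underscore separator, the counter starting at 0, and the helper name `make_id` are assumptions made for this example; the text above only states that a unique number is appended.

```python
import itertools
import re
from typing import Dict, Iterator


def make_id(template: str, info: Dict[str, str], counter: Iterator[int]) -> str:
    """Resolve $INFO/<field> references in the template and append a unique number."""
    resolved = re.sub(r"\$INFO/(\w+)", lambda m: info[m.group(1)], template)
    # Assumption: the unique number is joined with an underscore.
    return f"{resolved}_{next(counter)}"


counter = itertools.count()
print(make_id("delly_$INFO/SVTYPE", {"SVTYPE": "DEL"}, counter))  # delly_DEL_0
print(make_id("delly_$INFO/SVTYPE", {"SVTYPE": "INV"}, counter))  # delly_INV_1
```
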
## `alt`
The `alt` section can be used to change the ALT field and SVTYPE info field for each variant. The `alt` section can be defined as follows:
```yaml
alt:
  <alt>: <new_alt>
```

For example, you might want to change the `BND` ALT to `TRA` (as Delly configs may need):
```yaml
alt:
  BND: TRA
```
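
The effect of such a mapping on a single record can be sketched as follows. This is not Svync's implementation: it assumes the lookup key is the record's `SVTYPE` and that the new ALT is written as a symbolic allele (`<TRA>`); the function name `apply_alt_mapping` is hypothetical.

```python
from typing import Dict, Tuple


def apply_alt_mapping(alt: str, svtype: str, mapping: Dict[str, str]) -> Tuple[str, str]:
    """Return the (ALT, SVTYPE) pair after applying the `alt` section of a config."""
    new_type = mapping.get(svtype)
    if new_type is None:
        # No mapping configured for this type: leave the record unchanged.
        return alt, svtype
    # Assumption: the renamed ALT is emitted as a symbolic allele.
    return f"<{new_type}>", new_type


mapping = {"BND": "TRA"}
print(apply_alt_mapping("<BND>", "BND", mapping))  # ('<TRA>', 'TRA')
print(apply_alt_mapping("<DEL>", "DEL", mapping))  # ('<DEL>', 'DEL')
```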

## `info`
The `info` section can be used to change the info fields for each variant. The `info` section can be defined as follows:
```yaml
info:
  <info_field>:
    value: <new_value>
    type: <new_type>
    description: <new_description>
    number: <new_number>
    alts:
      <alt>: <new_value>
      <alt>: <new_value>
```
### value
The `value` field can be used to change the default value of the info field. The value can be resolved (see [Resolvable fields](#resolvable-fields)).

### type
The `type` field can be used to set the type of the info field (This will be reflected in the header of the output VCF file).

### description
The `description` field can be used to set the description of the info field (This will be reflected in the header of the output VCF file).

### number
The `number` field can be used to set the number of the info field (This will be reflected in the header of the output VCF file).

### alts
The `alts` field can be used to set the value of the info field for a specific ALT. The value can be resolved (see [Resolvable fields](#resolvable-fields)).

For example, when all `SVLEN` info fields are positive, you may want to set the field to the negative length for all deletions:
```yaml
info:
  SVLEN:
    value: $INFO/SVLEN
    type: Integer
    description: "Structural variant length"
    number: 1
    alts:
      DEL: -$INFO/SVLEN
```
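
The selection between `value` and an `alts` entry can be sketched as a simple lookup. The code below is an illustration under the assumption that an ALT-specific entry, when present, fully replaces the default `value`; the function name `pick_value_template` is hypothetical and the config is represented as a plain dictionary.

```python
from typing import Any, Dict, Optional


def pick_value_template(field: Dict[str, Any], alt: str) -> Optional[str]:
    """Return the ALT-specific template if one is configured, else the default `value`."""
    alts = field.get("alts") or {}
    if alt in alts:
        return alts[alt]
    return field.get("value")


svlen = {
    "value": "$INFO/SVLEN",
    "type": "Integer",
    "description": "Structural variant length",
    "number": 1,
    "alts": {"DEL": "-$INFO/SVLEN"},
}
print(pick_value_template(svlen, "DEL"))  # -$INFO/SVLEN
print(pick_value_template(svlen, "INS"))  # $INFO/SVLEN
```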

## `format`
The `format` section can be used to change the format fields for each variant. The `format` section can be defined as follows:
```yaml
format:
  <format_field>:
    value: <new_value>
    type: <new_type>
    description: <new_description>
    number: <new_number>
    alts:
      <alt>: <new_value>
      <alt>: <new_value>
```

The format fields work the same as the info fields (see [Info](#info)).

## Resolvable fields

Some fields can be resolved to a value.

### Variables

A variable is resolved by prefixing the field name with a `$`.

The following variables are available:
1. `$FORMAT/<format_field>` => This is only accessible from other format fields
   - An additional `/<number>` can be added to get a specific value in case of multiple values
2. `$INFO/<info_field>`
   - An additional `/<number>` can be added to get a specific value in case of multiple values
3. `$POS`
4. `$CHROM`
5. `$ALT`
6. `$QUAL`
7. `$FILTER`

For example `$INFO/SVLEN` will be resolved to the value of the `SVLEN` info field.
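
A minimal sketch of this lookup is given below, using values from the first record of `data/test1.delly.vcf`. The record layout (a dictionary with `INFO` and `FORMAT` entries holding lists of strings) and the function name `resolve_variable` are hypothetical, and the `/<number>` suffix is assumed to be a 0-based index, which is not stated above.

```python
from typing import Any, Dict


def resolve_variable(token: str, record: Dict[str, Any]) -> str:
    """Resolve a `$`-prefixed token against a parsed VCF record (hypothetical layout)."""
    parts = token.lstrip("$").split("/")
    if parts[0] in ("INFO", "FORMAT"):
        values = record[parts[0]][parts[1]]  # list of string values for the field
        if len(parts) == 3:
            # Assumption: the trailing number is a 0-based index into the values.
            return values[int(parts[2])]
        return ",".join(values)
    # Plain columns such as $POS, $CHROM, $ALT, $QUAL and $FILTER.
    return str(record[parts[0]])


record = {
    "CHROM": "chr16",
    "POS": 86933,
    "INFO": {"END": ["1349692"], "CIEND": ["-9", "9"]},
    "FORMAT": {"DR": ["0"], "DV": ["0"]},
}
print(resolve_variable("$INFO/END", record))      # 1349692
print(resolve_variable("$INFO/CIEND/1", record))  # 9
print(resolve_variable("$POS", record))           # 86933
```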

### Functions

Functions are very simple calculations that can be done on the values.

More functions can be added in the future. Please open an issue to request new functions.

#### `~sub`
The `~sub` function can be used to subtract values from each other. The function can be used as follows:

```yaml
~sub:<value_start>,<value_to_subtract>,<value_to_subtract>,...
```

:warning: only integers and floats are supported for this function :warning:

#### `~sum`
The `~sum` function can be used to take the sum of all values. The function can be used as follows:

```yaml
~sum:<value_start>,<value_to_add>,<value_to_add>,...
```

:warning: only integers and floats are supported for this function :warning:

#### `~len`
The `~len` function can be used to get the length of a string value. The function can be used as follows:

```yaml
~len:<value>
```
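
The three functions can be sketched as follows once all variables have been replaced by their values. This is an illustrative evaluator, not Svync's code: the name `apply_function` is hypothetical, and parsing each argument as an integer first and as a float otherwise is an assumption.

```python
from typing import Union

Number = Union[int, float]


def _to_number(text: str) -> Number:
    """Parse an integer if possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def apply_function(expression: str) -> Number:
    """Evaluate `~sub`, `~sum` or `~len` on already-resolved values."""
    name, _, raw_args = expression.partition(":")
    if name == "~len":
        # Length of the string value as given.
        return len(raw_args)
    numbers = [_to_number(arg) for arg in raw_args.split(",")]
    if name == "~sub":
        # Subtract every following value from the first one.
        result = numbers[0]
        for number in numbers[1:]:
            result -= number
        return result
    if name == "~sum":
        return sum(numbers)
    raise ValueError(f"unknown function: {name}")


# END and POS of the first record in data/test1.delly.vcf.
print(apply_function("~sub:1349692,86933"))  # 1262759
print(apply_function("~sum:1,2,3.5"))        # 6.5
print(apply_function("~len:ACGT"))           # 4
```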