## Release a.b
## Release 1.13 (7th July 2021)
This release brings new options and significant changes in BAQ parametrization
in `bcftools mpileup`. The previous behavior can be triggered by providing
the `--config 1.12` option. Please see https://github.com/samtools/bcftools/pull/1474
for details.
Changes affecting the whole of bcftools, or multiple commands:
* Improved build system
Changes affecting specific commands:
* bcftools annotate:
- Fix a rare bug: when INFO/END was present, all INFO fields were removed
with `bcftools annotate -x INFO`, and BCF output was produced, the removed
INFO/END continued to inform the end coordinate, causing incorrect
retrieval of records with the -r option (#1483)
- Support for matching annotation line by ID, in addition to CHROM,POS,REF,
and ALT (#1461)
    bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf
* bcftools csq:
- Previously, when the GFF and VCF/fasta used different chromosome naming
conventions (e.g. chrX vs X), no consequences were added. The program now
attempts to detect such differences and removes or adds the "chr" prefix
so that chromosome names in the GFF and VCF/fasta match (#1507)
- Parametrize the brief-predictions feature to allow an explicit number of
amino acids to be printed. Note that the `-b, --brief-predictions` option
is being replaced with `-B, --trim-protein-seq INT`
* bcftools +fill-tags:
- Generalized and improved support for custom functions that add new
INFO tags based on arbitrary `-i, --include`-style expressions. For
example, to calculate a missing INFO/DP annotation from FORMAT/AD,
one can use:
    -t 'DP:1=int(sum(FORMAT/AD))'
Here the optional ":1" part specifies that a single value will be
added (by default Number=. is used) and the optional int(...) adds
an integer value (by default Type=Float is used).
- When FORMAT/GT is not present, INFO/AF is now calculated from INFO/AC
and INFO/AN.
* bcftools gtcheck:
- Switch between FORMAT/GT and FORMAT/PL when one is (implicitly) requested
but only the other is available
- Improve diagnostics: print warnings when a line cannot be matched and
report the number of lines skipped for various reasons (#1444)
- Minor bug fix: after PLs became the default, the `--distinctive-sites`
option started to require an explicit `--error-probability 0`
* bcftools index:
- The program now accepts either the data file name or the index file name,
which is more convenient when printing index statistics (-n, -s)
* bcftools isec:
- Always generate sites.txt with isec -p (#1462)
* bcftools +mendelian:
- Consider only complete trios, do not crash on sample name typos (#1520)
* bcftools mpileup:
- New `--seed` option for reproducibility of subsampling code in HTSlib
- The SCR annotation which shows the number of soft-clipped reads now
correctly pools reads together regardless of the variant type. Previously
only reads with indels were included at indel sites.
- Major revamp of BAQ. Please see https://github.com/samtools/bcftools/pull/1474
for details. The previous behavior can be triggered by providing the `--config 1.12`
option.
- Thanks to improvements in HTSlib, the removal of overlapping reads (which can
be disabled with the `-x, --ignore-overlaps` option) is no longer systematically
biased (https://github.com/samtools/htslib/pull/1273)
- Modified the scale of the Mann-Whitney U tests. New INFO/*Z annotations are
printed instead; for example, MQBZ replaces MQB.
* bcftools norm:
- Fix Type=Flag output in `norm --atomize` (#1472)
- Atomization no longer discards ALT=. records
- Atomization of AD and QS tags now correctly updates occurrences of duplicate
alleles within different haplotypes
- Fix a bug in atomization of Number=A,R tags
* bcftools reheader:
- Add `-T, --temp-prefix` option
* bcftools +setGT:
- The plugin can now set a wider range of genotypes by allowing custom
genotypes to be specified. For example, to force a heterozygous
genotype it is now possible to use expressions like:
    c:'m|M'
    c:0/1
    c:0
* bcftools +split-vep:
- New `-u, --allow-undef-tags` option
- Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The
`-p, --annot-prefix` option is now applied before anything else, which
allows its use with the `-f, --format` and `-c, --columns` options.
- Some consequence field names, such as "pos(1-based)", are not valid tag
names. Field names are now trimmed to exclude brackets.
* bcftools +tag2tag:
- New --QR-QA-to-QS option to convert annotations generated by FreeBayes
to QS used by BCFtools
* bcftools +trio-dnm:
- Add support for sites with more than four alleles. Note that only the
four most frequent alleles are considered; the model remains unchanged.
Previously such sites were skipped.
- New --use-NAIVE option for naive DNM calling based solely on FORMAT/GT
and expected Mendelian inheritance. This option is suitable for prefiltering.
- Fix behavior to match the documentation: the `--dnm-tag DNG` option now
correctly outputs log-scaled values by default, not phred-scaled ones.
- Fix bug in VAF calculation, homozygous de novo variants were incorrectly
reported as having VAF=50%
- Fix arithmetic underflow which could lead to imprecise scores and improve
sensitivity in high coverage regions
- Allow combining --pn and --pns to set the noise thresholds independently
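The two +fill-tags calculations in the 1.13 notes reduce to simple arithmetic. The Python sketch below restates them under a few assumptions. FORMAT/AD is already parsed into one list of integers per sample, with the REF count first and no missing values. INFO/AC holds one count per ALT allele. The function names are hypothetical. The sketch illustrates the formulas only and is not the plugin's C implementation.

```python
from typing import List, Sequence


def info_dp_from_ad(ad_per_sample: Sequence[Sequence[int]]) -> int:
    """Sum FORMAT/AD over all samples and alleles, as in the
    expression 'DP:1=int(sum(FORMAT/AD))' (one integer value)."""
    return int(sum(sum(ad) for ad in ad_per_sample))


def info_af_from_ac_an(ac: Sequence[int], an: int) -> List[float]:
    """Per-ALT allele frequency AF = AC / AN, the fallback used
    when FORMAT/GT is not available."""
    if an <= 0:
        raise ValueError("AN must be positive for AF to be defined")
    return [count / an for count in ac]


if __name__ == "__main__":
    # Three samples at a biallelic site; each AD is (REF reads, ALT reads).
    ad = [[10, 5], [7, 0], [3, 3]]
    print(info_dp_from_ad(ad))         # 28
    print(info_af_from_ac_an([3], 6))  # [0.5]
```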
## Release 1.12 (17th March 2021)
Changes affecting the whole of bcftools, or multiple commands:
* The output file type is determined from the output file name suffix, where
available, so the -O/--output-type option is often no longer necessary.
* Make F_MISSING in filtering expressions work for sites with multiple
ALT alleles (#1343)
* Fix N_PASS and F_PASS to behave as expected when reverse logic is used
(#1397). As a side effect, `query` (and programs like `+trio-stats`) now
behave differently with these expressions, operating in site-oriented
rather than sample-oriented mode. For example, the new behavior could be:
    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
    11 A 0/0
    11 B 0/0
    11 C 1/1
while previously the same expression would return:
    11 C 1/1
The original mode can be mimicked by splitting the filtering into two steps:
    bcftools view -i'N_PASS(GT="alt")==1' | \
    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
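The difference between the two modes can be modelled in a few lines of Python. This is a sketch with hypothetical names, not bcftools code. A site is represented as a position plus a sample-to-genotype mapping. GT="alt" is assumed to mean that the genotype carries at least one non-reference allele, which is consistent with how the 0/0 and 1/1 genotypes behave in the example.

```python
from typing import Dict, List, Tuple

# Hypothetical site representation: (POS, {sample name: genotype}).
Site = Tuple[int, Dict[str, str]]


def is_alt(gt: str) -> bool:
    """True if the genotype carries at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def n_pass(samples: Dict[str, str]) -> int:
    """Number of samples for which GT="alt" holds, i.e. N_PASS(GT="alt")."""
    return sum(is_alt(gt) for gt in samples.values())


def query_new(site: Site) -> List[str]:
    """Site-oriented mode (1.12): the expression selects the site,
    and then every sample is printed."""
    pos, samples = site
    if n_pass(samples) != 1:
        return []
    return [f"{pos} {name} {gt}" for name, gt in samples.items()]


def query_old(site: Site) -> List[str]:
    """Sample-oriented mode (before 1.12), equivalent to the two-step
    view/query pipeline: select the site, then print only samples
    that satisfy GT="alt"."""
    pos, samples = site
    if n_pass(samples) != 1:
        return []
    return [f"{pos} {name} {gt}" for name, gt in samples.items() if is_alt(gt)]


if __name__ == "__main__":
    site = (11, {"A": "0/0", "B": "0/0", "C": "1/1"})
    print("\n".join(query_new(site)))
    print("\n".join(query_old(site)))
```

For the example site (A=0/0, B=0/0, C=1/1), `query_new` returns all three lines and `query_old` returns only `11 C 1/1`, matching the two outputs quoted in the entry.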
Changes affecting specific commands:
* bcftools annotate:
- New `--rename-annots` option to help fix broken VCFs (#1335)
- New -C option to read a long list of options from a file, avoiding very
long command lines.
- New `append-missing` logic allows annotations to be added for each ALT
allele in the same order as they appear in the VCF. Note that this is
not bulletproof. For this to work:
- the annotation file must have one line per ALT allele
- fields must contain a single value, as multiple values are appended
as they are and would break the correspondence between the alleles
and values
* bcftools concat:
- Do not phase genotypes by mistake if they are not already phased
with `-l` (#1346)
* bcftools consensus:
- New `--mask-with`, `--mark-del`, `--mark-ins`, `--mark-snv` options
(#1382, #1381, #1170)
- Symbolic <DEL> should have only one REF base. If there are multiple,
take POS+1 as the first deleted base.
- Make consensus work when the first base of the reference genome is
deleted. In this situation the VCF record has POS=1 and the first
REF base cannot precede the event. (#1330)
* bcftools +contrast:
- The NOVELGT annotation was previously not added when requested.
* bcftools convert:
- Make the --hapsample and --hapsample2vcf options consistent with each
other and with the documentation.
* bcftools call:
- Revamp of `call -G`, previously sample grouping by population was not
truly independent and could still be influenced by the presence of other
sample groups.
- Optional addition of INFO/PV4 annotation with `call -a INFO/PV4`
- Remove generation of useless HOB and ICB annotation;
use `+fill-tags -- -t HWE,ExcHet` instead
- The `call -f` option was renamed to `-a` to (1) make it consistent with
`mpileup` and (2) indicate that it includes both INFO and FORMAT
annotations, not just FORMAT as previously
- Any sensible Number=R,Type=Integer annotation can be used with -G,
such as AD or QS
- Don't trim QUAL. Although the usefulness of this change is questionable
for a true probabilistic interpretation (such high precision is unrealistic),
using QUAL as a score rather than a probability is helpful and permits more
fine-grained filtering
- Fix a suspected bug in `call -F`; in the worst case, the change at least
improves readability
- `call -C trio` is temporarily disabled
* bcftools csq:
- Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with too
many per-sample consequences
- Fix a bug which incorrectly handled the --ncsq parameter and could clash
with reserved BCF values, consequently producing truncated or even incorrect
output of the %TBCSQ formatting expression in `bcftools query`. To account
for the reserved values, the new default value is --ncsq 15 (#1428)
* bcftools +fill-tags:
- MAF definition revised for multiallelic sites, the second most common
allele is considered to be the minor allele (#1313)
- New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads,
provided FORMAT/AD is present
* bcftools gtcheck:
- Support matching of a single sample against all other samples in the file
with `-s qry:sample -s gt:-`. This was previously not possible: either the
full cross-check mode had to be run or a list of pairs/samples had to
be created explicitly
* bcftools merge:
- Make `merge -R` behavior consistent with other commands and pull in
overlapping records with POS outside of the regions (#1374)
- Bug fix (#1353)
* bcftools mpileup:
- Add new optional tag `mpileup -a FORMAT/QS`
* bcftools norm:
- New `-a, --atomize` functionality to decompose complex variants,
for example MNVs into consecutive SNVs
- New option `--old-rec-tag` to indicate the original variant
* bcftools query:
- Incorrect fields were printed in the per-sample output when a subset
of samples was requested via -s/-S and the order of samples in the
header was different from the requested -s/-S order (#1435)
* bcftools +prune:
- New options --random-seed and --nsites-per-win-mode (#1050)
* bcftools +split-vep:
- Transcript selection now also works on the raw CSQ/BCSQ annotation.
- Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349)
* bcftools stats:
- Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to
predefined bins, use an open-range logarithmic binning instead
- Plot dual ts/tv stats: per quality bin, and cumulative as if the threshold
were applied to the whole dataset
* bcftools +trio-dnm2:
- Major revamp of +trio-dnm plugin, which is now deprecated and replaced by
+trio-dnm2.
The original trio-dnm calling model used genotype likelihoods (PLs) as the
input for calling. However, that is flawed because PLs make assumptions
which are unsuitable for de novo calling: PL(RR) can become bigger than
PL(RA) even when the ALT allele is present in the parents. Note that
this is true also for other programs such as DeNovoGear which rely on
the same samtools calculation.
The new recommended workflow is
    bcftools mpileup -a AD,QS -f ref.fa -Ou proband.bam father.bam mother.bam |
    bcftools call -mv -Ou |
    bcftools +trio-dnm -p proband,father,mother -Oz -o output.vcf.gz
This new version also implements the DeNovoGear model. The original
behavior of trio-dnm is no longer supported.
For more details see http://samtools.github.io/bcftools/trio-dnm.pdf
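The revised MAF definition and the new VAF annotation from the 1.12 notes can be sketched as follows. The sketch makes several assumptions. Allele frequencies are given with REF first and then each ALT. VAF is taken to be the per-ALT fraction of FORMAT/AD reads. VAF1 is not modelled. A depth of zero returns None as a stand-in for a missing value. Function names are hypothetical.

```python
from typing import List, Optional, Sequence


def maf(allele_freqs: Sequence[float]) -> float:
    """Minor allele frequency at a possibly multiallelic site.

    allele_freqs lists REF first, then each ALT. The second most
    common allele is treated as the minor allele, so REF itself
    can be the minor allele.
    """
    if len(allele_freqs) < 2:
        # Monomorphic site: no second allele.
        return 0.0
    return sorted(allele_freqs, reverse=True)[1]


def vaf(ad: Sequence[int]) -> List[Optional[float]]:
    """Fraction of reads supporting each ALT allele in one sample,
    computed from FORMAT/AD (REF count first)."""
    depth = sum(ad)
    if depth == 0:
        # No reads: the fraction is undefined (missing).
        return [None] * (len(ad) - 1)
    return [alt / depth for alt in ad[1:]]


if __name__ == "__main__":
    print(maf([0.5, 0.3, 0.2]))  # 0.3
    print(maf([0.2, 0.7, 0.1]))  # 0.2, here REF is the minor allele
    print(vaf([10, 5, 5]))       # [0.25, 0.25]
```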
## Release 1.11 (22nd September 2020)
Changes affecting the whole of bcftools, or multiple commands:
* Filtering -i/-e expressions
- Breaking change in -i/-e expressions on the FILTER column. Originally
it was possible to query only a subset of filters, but not an exact match.
The new behavior is:
    FILTER="A"   .. exact match, for example "A;B" does not pass
    FILTER!="A"  .. exact match, for example "A;B" does pass
    FILTER~"A"   .. both "A" and "A;B" pass
    FILTER!~"A"  .. neither "A" nor "A;B" pass
- Fix commutative comparison operators; in some cases swapping the operands
would produce incorrect results (#1224; #1266)
- Better support for filtering on sample subsets
- Add SMPL_*/S* family of functions that evaluate within rather than across
all samples. (#1180)
* Improvements in the build system
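The four FILTER operators listed above can be expressed as set comparisons. The following Python sketch illustrates that reading under assumptions; it is not the parser bcftools uses. Both the FILTER column and the queried value are split on ";" into sets. `=` and `!=` compare the sets exactly, while `~` and `!~` test whether every queried filter is present. Special values such as PASS and "." receive no special treatment here.

```python
from typing import Set


def parse_filter(column: str) -> Set[str]:
    """Split a VCF FILTER column such as "A;B" into a set of names."""
    return set(column.split(";"))


def filter_matches(record_filter: str, op: str, query: str) -> bool:
    """Evaluate FILTER<op>"query" against one record's FILTER column."""
    rec, qry = parse_filter(record_filter), parse_filter(query)
    if op == "=":
        return rec == qry          # exact match of the whole set
    if op == "!=":
        return rec != qry
    if op == "~":
        return qry <= rec          # all queried filters are present
    if op == "!~":
        return not qry <= rec
    raise ValueError(f"unsupported operator: {op}")


if __name__ == "__main__":
    for op in ("=", "!=", "~", "!~"):
        print(op, filter_matches("A", op, "A"), filter_matches("A;B", op, "A"))
```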
Changes affecting specific commands:
* bcftools annotate:
- Previously it was not possible to use `--columns =TAG` with INFO tags,
and the `--merge-logic` feature was restricted to tab files with BEG,END
columns; it now also works with REF,ALT.
- Make `annotate -TAG/+TAG` work also with FORMAT fields. (#1259)
- ID and FILTER can be transferred to INFO and ID can be populated from
INFO. However, the FILTER column still cannot be populated from an INFO
tag because all possible FILTER values must be known at the time of
writing the header (#947; #1187)
* bcftools consensus:
- Fix handling of symbolic deletions and overlapping variants.
(#1149; #1155; #1295)
- Fix `--iupac-codes` crash on REF-only positions with `ALT="."`. (#1273)
- Fix `--chain` crash. (#1245)
- Preserve the case of the genome reference. (#1150)
- Add new `-a, --absent` option to set positions with no supporting
evidence to "N" (or any other character). (#848; #940)
* bcftools convert:
- The option `--vcf-ids` now also works with `--haplegendsample2vcf`. (#1217)
- New option `--keep-duplicates`
* bcftools csq:
- Add `misc/gff2gff.py` script for conversion between various flavors of
GFF files. The initial commit supports only one type and was contributed
by @flashton2003. (#530)
- Add missing consequence types. (PR #1203; #1292)
- Allow overlapping CDS to support ribosomal slippage. (#1208)
* bcftools +fill-tags:
- Added new annotations: INFO/END, TYPE, F_MISSING.
* bcftools filter:
- Make `--SnpGap` optionally also filter SNPs close to other variant types.
(#1126)
* bcftools gtcheck:
- Complete revamp of the command. The new version is faster and allows
N:M sample comparisons, not just 1:N or NxN comparisons.
Some functionality was lost (plotting and clustering) but may be added
back by popular demand.
* bcftools +mendelian:
- Revamp of user options, output VCFs with mendelian errors annotation,
read PED files (thanks to Giulio Genovese).
* bcftools merge:
- Update headers when appropriate with the '--info-rules *:join' INFO rule.
(#1282)
- Local-allele merging, producing LAA and LPL when requested; a draft
implementation of https://github.com/samtools/hts-specs/pull/434 (#1138)
- New `--no-index` option to merge unindexed files. Requires the input
files to have chromosomes in the same order, consistent with the order
of sequences in the header. (PR #1253; samtools/htslib#1089)
- Fixes in gVCF merging. (#1127; #1164)
* bcftools norm:
- Fix the `--check-ref s` reference-setting feature with non-ACGT bases.
(#473; #1300)
- New `--keep-sum` switch to keep vector sum constant when splitting
multiallelics. (#360)
* bcftools +prune:
- Extend to allow annotating with various LD metrics: r^2,
Lewontin's D' (PMID:19433632), or Ragsdale's D (PMID:31697386).
* bcftools query:
- New `%N_PASS()` formatting expression to output the number of samples
that pass the filtering expression.
* bcftools reheader:
- Improved error reporting to prevent user mistakes. (#1288)
* bcftools roh:
- Several fixes and improvements
- the `--AF-file` description incorrectly suggested "REF\tALT" instead
of the correct "REF,ALT". (#1142)
- RG lines could have negative length. (#1144)
- new `--include-noalt` option to allow also ALT=. records. (#1137)
* bcftools +scatter:
- New plugin intended as a convenient inverse to `concat`
(thanks to Giulio Genovese, PR #1249)
* bcftools +split:
- New `--groups-file` option for more flexibility of defining desired
output. (#1240)
- New `--hts-opts` option to reduce required memory by reusing one
output header, and to allow overriding the default hFile block size
with `--hts-opts block_size=XXX`. On some file systems (Lustre) the
default size can be 4M, which becomes a problem when splitting files
with 10+ samples.
- Add support for multisample output and sample renaming
* bcftools +split-vep:
- Add default types (Integer, Float, String) for VEP subfields and make
`--columns -` extract all subfields into INFO tags in one go.
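For reference, the textbook definitions of two of the LD metrics now offered by +prune are sketched below in Python, computed from haplotype frequencies: D = p_AB - p_A p_B, Lewontin's D' = D / D_max, and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The sketch assumes the haplotype frequency p_AB is already known, for example from phased data. It does not show how the plugin estimates these values from genotypes, and it omits Ragsdale's D.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: marginal frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # Lewontin's normalisation: D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_stats(0.5, 0.5, 0.5))   # complete LD: (0.25, 1.0, 1.0)
    print(ld_stats(0.25, 0.5, 0.5))  # equilibrium: (0.0, 0.0, 0.0)
```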
## Release 1.10.2 (19th December 2019)
This is a release fix that corrects minor inconsistencies discovered in
previous deliverables.
## Release 1.10 (6th December 2019)
* Numerous bug fixes, usability improvements and sanity checks were added
to prevent common user errors.
* The -r, --regions (and -R, --regions-file) option should never create
unsorted VCFs or duplicate records again. This also fixes rare cases where
a spanning deletion makes a subsequent record invisible to `bcftools isec`
and other commands.
* Additions to filtering and formatting expressions
- support for the spanning deletion alternate allele (ALT=*)
- new ILEN filtering expression to be able to filter by indel length
- new MEAN, MEDIAN, MODE, STDEV, phred filtering functions
- new formatting expression %PBINOM (phred-scaled binomial probability),
%INFO (the whole INFO column), %FORMAT (the whole FORMAT column),
%END (end position of the REF allele), %END0 (0-based end position
of the REF allele), %MASK (with multiple files indicates the presence
of the site in other files)
* New plugins
- `+gvcfz`: compress gVCF file by resizing gVCF blocks according to
specified criteria
- `+indel-stats`: collect various indel-specific statistics
- `+parental-origin`: determine parental origin of a CNV region
- `+remove-overlaps`: remove overlapping variants.
- `+split-vep`: query structured annotations such as INFO/CSQ created by
bcftools/csq or VEP
- `+trio-dnm`: screen variants for possible de-novo mutations in trios
* `annotate`
- new -l, --merge-logic option for combining multiple overlapping regions
* `call`
- new `bcftools call -G, --group-samples` option which allows grouping
samples into populations and applying the HWE assumption within but
not across the groups.
* `csq`
- significant reduction of memory usage in the local -l mode for VCFs
with thousands of samples and 20% reduction in the non-local
haplotype-aware mode.
- fixes a small memory leak and formatting issue in FORMAT/BCSQ at
sites with many consequences
- do not print protein sequence of start_lost events
- support for "start_retained" consequence
- support for symbolic insertions (ALT="<INS...>"), "feature_elongation"
consequence
- new -b, --brief-predictions option to output abbreviated protein
predictions.
* `concat`
- the `--naive` option now checks header compatibility when concatenating
multiple files.
* `consensus`
- add a new `-H, --haplotype 1pIu/2pIu` feature to output first/second
allele for phased genotypes and the IUPAC code for unphased genotypes
- new -p, --prefix option to add a prefix to sequence names on output
* `+contrast`
- added support for Fisher's test probability and other annotations
* `+fill-from-fasta`
- new -N, --replace-non-ACGTN option
* `+dosage`
- fix some serious bugs in dosage calculation
* `+fill-tags`
- extended to perform simple on-the-fly calculations such as calculating
INFO/DP from FORMAT/DP.
* `merge`
- add support for merging FORMAT strings
- bug fixed in gVCF merging
* `mpileup`
- a new optional SCR annotation for the number of soft-clipped reads
* `reheader`
- new -f, --fai option for updating contig lines in the VCF header
* `+trio-stats`
- extend output to include DNM homs and recurrent DNMs
* VariantKey support
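Two of the new 1.10 expressions have definitions simple enough to write down directly. ILEN is negative for deletions and positive for insertions, and phred scaling is -10 log10(p). The Python sketch below assumes a single REF/ALT pair of literal bases (no symbolic alleles) and returns None for non-indels as a stand-in for a missing value. It is an illustration, not bcftools code.

```python
import math
from typing import Optional


def ilen(ref: str, alt: str) -> Optional[int]:
    """Indel length as len(ALT) - len(REF): positive for insertions,
    negative for deletions, None when the lengths are equal."""
    diff = len(alt) - len(ref)
    return diff if diff != 0 else None


def phred(p: float) -> float:
    """Phred-scale a probability: -10 * log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    print(ilen("A", "ATG"))         # 2 (insertion of two bases)
    print(ilen("ACGT", "A"))        # -3 (deletion of three bases)
    print(ilen("A", "G"))           # None (SNV)
    print(round(phred(0.001), 6))   # 30.0
```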
## Release 1.9 (18th July 2018)
* `annotate`
- REF and ALT columns can now be transferred from the annotation file.
- fixed bug when setting vector_end values.
* `consensus`
- new -M option to control output at missing genotypes
- variants immediately following insertions should not be skipped. Note,
however, that the current fix requires a normalized VCF and may still
falsely skip variants adjacent to multiallelic indels.
- bug fixed in -H selection handling
* `convert`
- the --tsv2vcf option now makes the missing genotypes diploid, "./."
instead of "."
- the behavior of -i/-e with --gvcf2vcf changed. Previously only sites with
FILTER set to "PASS" or "." were expanded and the -i/-e options dropped
sites completely. The new behavior is to let the -i/-e options control
which records will be expanded. In order to drop records completely,
one can stream through "bcftools view" first.
* `csq`
- since the real consequences of start/splice events are not known,
the amino acid positions at subsequent variants should stay unchanged
- add `--force` option to skip malformed transcripts in GFFs with
out-of-phase CDS exons.
* `+dosage`: output all alleles and all their dosages at multiallelic sites
* `+fixref`: fix serious bug in -m top conversion
* `-i/-e` filtering expressions:
- add two-tailed binomial test
- add functions N_PASS() and F_PASS()
- add support for lists of samples in filtering expressions: with many
samples it was impractical to list them all on the command line, so samples
can now be read from a file, e.g. GT[@samples.txt]="het"
- allow multiple perl functions in the expressions and some bug fixes
- fix a parsing problem, '@' was not removed from '@filename' expressions
* `mpileup`: fixed bug where, if samples were renamed using the `-G`
(`--read-groups`) option, some samples could be omitted from the output file.
* `norm`: update INFO/END when normalizing indels
* `+split`: new -S option to subset samples and to use custom file names
instead of the defaults
* `+smpl-stats`: new plugin
* `+trio-stats`: new plugin
* Fixed build problems with non-functional configure script produced on
some platforms
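The 1.9 notes add a two-tailed binomial test to the filtering expressions, and the 1.6 notes below use the same kind of test in +setGT. bcftools' exact formula is not given here, so the following Python sketch uses one common convention, the one R's binom.test uses: sum the probabilities of all outcomes that are no more likely than the observed count. Treat it as an illustration of the technique, not as the implementation in bcftools.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_tailed(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes that are no more likely than the observed one."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that outcomes tied with the observed
    # one are not lost to floating-point rounding.
    threshold = observed * (1 + 1e-7)
    total = sum(
        prob
        for prob in (binom_pmf(i, n, p) for i in range(n + 1))
        if prob <= threshold
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical example: 8 ALT reads out of 10, expected fraction 0.5
    print(binom_test_two_tailed(8, 10))  # 0.109375
```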
## Release 1.8 (April 2018)
* `-i, -e` filtering: Support for custom perl scripts
* `+contrast`: New plugin to annotate genotype differences between groups
of samples
* `+fixploidy`: New options for simpler ploidy usage
* `+setGT`: Target genotypes can be set to phased by giving `--new-gt p`
* `run-roh.pl`: Allow passing options directly to `bcftools roh`
* A number of bug fixes
## Release 1.7 (February 2018)
* `-i, -e` filtering: Major revamp, improved filtering by FORMAT fields
and missing values. New GT=ref,alt,mis etc keywords, check the documentation
for details.
* `query`: Only samples matching the expression are printed when both the -f
and -i/-e expressions contain genotype fields. Note that this changes the
original behavior. Previously all samples were output when one matching
sample was found. This functionality can be achieved by pre-filtering with
view and then streaming to query. Compare
    bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i'GT="alt"' file.bcf
and
    bcftools view -i'GT="alt"' file.bcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'
* `annotate`: New -k, --keep-sites option
* `consensus`: Fix --iupac-codes output
* `csq`: Homs always considered phased and other fixes
* `norm`: Make `-c none` work and remove `query -c`
* `roh`: Fix errors in the RG output
* `stats`: Allow IUPAC ambiguity codes in the reference file; report the number of missing genotypes
* `+fill-tags`: Add ExcHet annotation
* `+setGT`: Fix bug in binom.test calculation; previously it worked only for nAlt<nRef!
* `+split`: New plugin to split a multi-sample file into single-sample files in one go
* Improve python3 compatibility in plotting scripts
## Release 1.6 (September 2017)
* New `sort` command.
* New options added to the `consensus` command. Note that the `-i, --iupac`
option has been renamed to `-I, --iupac`, in favor of the standard
`-i, --include`.
* Filtering expressions (`-i/-e`): support for `GT=<type>` expressions and
for lists and ranges (#639) - see the man page for details.
* `csq`: relax some GFF3 parsing restrictions to enable using Ensembl
GFF3 files for plants (#667)
* `stats`: add further documentation to output stats files (#316) and
include haploid counts in per-sample output (#671).
* `plot-vcfstats`: further fixes for Python3 (@nsoranzo, #645, #666).
* `query` bugfix (#632)
* `+setGT` plugin: new option to set genotypes based on a two-tailed binomial
distribution test. Also, allow combining `-i/-e` with `-t q`.
* `mpileup`: fix typo (#636)
* `convert --gvcf2vcf` bugfix (#641)
* `+mendelian`: recognize some mendelian inconsistencies that were
being missed (@oronnavon, #660), also add support for multiallelic
sites and sex chromosomes.
## Release 1.5 (June 2017)
* Added autoconf support to bcftools. See `INSTALL` for more details.
* `norm`: Make norm case insensitive (#601). Trim the reference allele (#602).
* `mpileup`: fix for misreported indel depths for reads containing adjacent
indels (3c1205c1).
* `plot-vcfstats`: Open stats file in text mode, not binary (#618).
* `fixref` plugin: Allow multiallelic sites in the `-i, --use-id reference`.
Also flip genotypes, not just REF/ALT!
* `merge`: fix gVCF merge bug when last record on a chromosome opened a
gVCF block (#616)
* New options added to the ROH plotting script.
* `consensus`: Properly flush chain info (#606, thanks to @krooijers).
* New `+prune` plugin for pruning sites by LD (R2) or maximum number of
records within a window.
* New N_MISSING, F_MISSING (number and fraction missing) filtering
expressions.
* Fix HMM initialization in `roh` when snapshots are used in multiple
chromosome VCF.
* Fix buffer overflow (#607) in `filter`.
## Release 1.4.1 (8 May 2017)
* `roh`: Fixed malfunctioning options `-m, --genetic-map` and `-M, --rec-rate`,
and newly allowed their combination. Added a convenience wrapper `misc/run-roh.pl`
and an interactive script for visualizing the calls `misc/plot-roh.py`.
* `csq`: More control over warning messages (#585).
* Portability improvements (#587). Still work to be done on this front.
* Add support for breakends to `view`, `norm`, `query` and filtering (#592).
* `plot-vcfstats`: Fix for python 2/3 compatibility (#593).
* New `-l, --list` option for `+af-dist` plugin.
* New `-i, --use-id` option for `+fixref` plugin.
* Add `--include/--exclude` options to `+guess-ploidy` plugin.
* New `+check-sparsity` plugin.
* Miscellaneous bugfixes for #575, #584, #588, #599, #535.
## Release 1.4 (13 March 2017)
Two new commands - `mpileup` and `csq`:
* The `mpileup` command has been imported from samtools to bcftools. The
reasoning behind this is that bcftools calling is intimately tied to mpileup
and any change to one often requires changes to the other. Only the
genotype likelihood (BCF output) part of mpileup has moved to bcftools,
while the textual pileup output remains in samtools. The BCF output option
in `samtools mpileup` will likely be removed in a release or two or when
changes to `bcftools call` are incompatible with the old mpileup output.
The basic mpileup functionality remains unchanged as do most of the command
line options, but there are some differences and new features that one
should be aware of:
- The option `samtools mpileup -t, --output-tags` changed to `bcftools
mpileup -a, --annotate` to avoid conflict with the `-t, --targets`
option common across other bcftools commands.
- `-O, --output-BP` and `-s, --output-MQ` are no longer used as they are
only for textual pileup output, which is not included in `bcftools
mpileup`. `-O` short option reassigned to `--output-type` and `-s`
reassigned to `--samples` for consistency with other bcftools commands.
- `-g, --BCF`, `-v, --VCF`, and `-u, --uncompressed` options from
`samtools mpileup` are no longer used, being replaced by the
`-O, --output-type` option common to other bcftools commands.
- The `-f, --fasta-ref` option is now required by default to help avoid user
errors. Can be disabled using `--no-reference`.
- The option `-d, --depth .. max per-file depth` now behaves as expected
and according to the documentation, and prints meaningful diagnostics.
- The `-S, --samples-file` can be used to rename samples on the fly. See man
page for details.
- The `-G, --read-groups` functionality has been extended to allow
reassignment, grouping and exclusion of readgroups. See man page for
details.
- The `-l, --positions` replaced by the `-t, --targets` and
`-T, --targets-file` options to be consistent with other bcftools
commands.
- gVCF output is supported. Per-sample gVCFs created by mpileup can be
merged using `bcftools merge --gvcf`.
- Can generate mpileup output on multiple (indexed) regions using the
`-r, --regions` and `-R, --regions-file` options. In samtools, one
was restricted to a single region with the `-r, --region` option.
- Several speedups thanks to @jkbonfield (cf3a55a).
* `csq`: New command for haplotype-aware variant consequence calling.
See the man page and [paper](https://www.ncbi.nlm.nih.gov/pubmed/28205675).

Updates, improvements and bugfixes for many other commands:
* `annotate`: `--collapse` option added. `--mark-sites` now works with
VCF files as well as tab-delimited files. It is now possible to annotate
a subset of samples from a tab-delimited file, not just a VCF file (#469).
Bugfixes (#428).
* `call`: New option `-F, --prior-freqs` to take advantage of prior knowledge
of population allele frequencies. Improved calculation of the QUAL score,
particularly for REF sites (#449, 7c56870). `PL` values >= 256 are now
allowed in `call -m`. Bugfixes (#436).
* `concat`: `--naive` now works with vcf.gz files in addition to BCF files.
* `consensus`: handle variants overlapping region boundaries (#400).
* `convert`: `--gvcf2vcf` now supports gVCFs from mpileup and GATK. New `--sex`
option to assign sex to be used in certain output types (#500). Large speedup
of `--hapsample` and `--haplegendsample` (e8e369b), especially with the
`--threads` option enabled. Bugfixes (#460).
* `cnv`: improvements to output (be8b378).
* `filter`: bugfixes (#406).
* `gtcheck`: improved cross-check mode (#441).
* `index`: Can now specify the path to the output index file. Also gains the
`--threads` option.
* `merge`: Large overhaul of the `merge` command, including support for merging
gVCF files created by `bcftools mpileup --gvcf` via the new `-g, --gvcf`
option. New options `-F` to control filter logic and `-0` to set missing
data to REF. Resolved a number of longstanding issues (#296, #361, #401,
#408, #412).
* `norm`: Bugfixes (#385, #452, #439), more informative error messages (#364).
* `query`: Support for `%END`, plus the 0-based `%POS0` and `%END0`, allowing
easy BED format output (#479). `%TBCSQ` for use with the new `csq` command.
Bugfixes (#488, #489).
* `plugin`: A number of new plugins:
  - `GTsubset` (thanks to @dlaehnemann)
  - `ad-bias`
  - `af-dist`
  - `fill-from-fasta`
  - `fixref`
  - `guess-ploidy` (deprecates the `vcf2sex` plugin)
  - `isecGT`
  - `trio-switch-rate`

  and changes to existing plugins:
  - `tag2tag`: Added `gp-to-gt`, `pl-to-gl` and `--threshold` options and
    bugfixes (#475).
  - `ad-bias`: New `-d` option for minimum depth.
  - `impute-info`: Bugfix (49a9eaf).
  - `fill-tags`: Added the ability to aggregate tags for sample subgroups,
    thanks to @mh11 (#503). HWE tag added as an option.
  - `mendelian`: Bugfix (#566).
* `reheader`: Allow multi-space delimiters in the `--samples` option.
* `roh`: Now possible to process multiple samples at once. This allows
considerable speedups for files with thousands of samples, where the cost of
the HMM is negligible compared to I/O and decompression. In order to fit tens
of thousands of samples in memory, a sliding HMM can be used (new
`--buffer-size` option). Viterbi training now uses the Baum-Welch algorithm
and works much better. Added support for gVCFs or FORMAT/PL tags. Added
`-o, --output` and `-O, --output-type` options to control output of sites or
regions (compression optional). Many bugs fixed: no longer segfaults on
missing PL values, and a typo in the genetic map calculation that resulted in
a slowdown and incorrect results has been corrected.
* `stats`: Bugfixes (16414e6). New options `--af-bins` and `--af-tag` to control
allele frequency binning of the output. Per-sample genotype concordance tables
added (#477).
* `view`: Various bugfixes for `-a, --trim-alt-alleles` with missing data, and
more informative error messages on failure to help pinpoint problems.

General changes:
* Timestamps are now added to header lines summarising the command (#467).