## Release 1.22 (30th May 2025)
Changes affecting the whole of bcftools, or multiple commands:
* Add support for matching lines by ID via the --pair-logic and --collapse options (#1739)
* The -i/-e filtering expressions
- The expressions now properly match the regex negation of missing values, e.g. -i 'TAG!~"\."' (#2355)
- Added support for Fisher's exact test
* Add the option `-v, --verbosity INT` to all bcftools commands and plugins. Verbosity values
bigger than 3 are passed to the underlying HTSlib library so that the user can investigate
network issues and other problems occurring at the library level.
Changes affecting specific commands:
* bcftools annotate
- Fix Number in the header definition of transferred FILTER and ID tags (#2335)
* bcftools call
- The `-s, --samples` option was not working properly. This is now fixed and the option
also supports sample negation as advertised in the manual page, e.g. `-s ^sample1,sample2`
to include all samples but sample1 and sample2 (#2380)
* bcftools consensus
- Preserve entire missing gVCF blocks with --missing (#2350)
- Fixed a bug, the `-S, --samples-file` option is no longer ignored (#2398)
* bcftools convert
- The command `convert --gvcf2vcf` was not filling the REF allele when BCF was output (#243)
* bcftools csq
- Check the input GFF for features outside transcript boundaries and extend the transcript
to contain the feature fully (#2323)
- Add experimental support for alternative genetic code tables, accessible via
a new option `-C, --genetic-code` (#2368)
- Change in the `--unify-chr-names` option, no automatic sequence name modification
is attempted anymore, the prefixes to trim must be given explicitly. For example,
if run with `--unify-chr-names chr,Chromosome,-`, the program will trim the "chr"
prefix in the VCF, "Chromosome" in the GFF, leaving the fasta unchanged (#2378)
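The prefix trimming described for `--unify-chr-names` can be pictured with a small Python sketch. This is not the bcftools implementation: the function names are hypothetical, and it assumes the argument always carries three comma-separated fields in the order VCF, GFF, fasta, with `-` meaning "leave unchanged", as in the example above.

```python
def parse_prefixes(spec):
    """Split a 'vcf,gff,fasta' prefix list; '-' means leave names unchanged."""
    parts = spec.split(",")
    if len(parts) != 3:
        # Assumption: exactly three fields, as in the example from the notes
        raise ValueError("expected three comma-separated prefixes")
    return [None if p == "-" else p for p in parts]


def trim(name, prefix):
    """Remove the prefix from a sequence name when it is present."""
    if prefix and name.startswith(prefix):
        return name[len(prefix):]
    return name


vcf_prefix, gff_prefix, fasta_prefix = parse_prefixes("chr,Chromosome,-")
print(trim("chr1", vcf_prefix))         # 1
print(trim("Chromosome1", gff_prefix))  # 1
print(trim("1", fasta_prefix))          # 1 (fasta names are left as they are)
```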
* bcftools +fill-tags
- Thanks to the extension of filtering expressions with Fisher's exact test, the plugin
can now be used to add FT annotation (#1582)
* bcftools merge
- Preserve phasing in half-missing genotypes (#2331)
- The option `--merge none` is expected to create no new multiallelic sites, but it should
allow to merge, say, A>C with A>C,AT (#2333)
- Make `--merge both` work with indel-only records; for example, the multiallelic
site G>GT,T should be merged with G>GT (#2339)
- Do not merge symbolic alleles unless they have not just the same type, e.g. <DEL>,
but also the same length, i.e. the INFO/END coordinate (#2362)
- Fix a bug where an incorrectly formatted gVCF file with overlapping blocks would trigger
an infinite loop in the program (#2410)
* bcftools mpileup
- The -r/-R options newly merge overlapping regions, preventing the output of duplicate sites
* bcftools norm
- Print the number of removed duplicate sites in the final statistics (#2346)
- Preserve the original alleles in `--old-rec-tag` when `--check-ref s` requested (#2357)
- Print a warning when INFO/SVLEN is not defined as Number=A (#2371)
* plot-vcfstats
- Make the option `-s, --sample-names` functional again (#2353)
* bcftools +prune
- New option to remove or annotate clusters of sites within a window
* bcftools query
- The functions used in -i/-e filtering expressions (such as SUM, MEDIAN, etc) can be
now used in formatting expressions (#2271).
If the VCF contains INFO/AD and FORMAT/AD, try:
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %sSUM(FMT/AD)]'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %SUM(FMT/AD)]'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(FMT/AD)'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(INFO/AD)'
- Make it possible to refer to the ID column from the FORMAT expression (#2337)
bcftools query test.vcf -f 'ID=%ID ID=[ %/ID] vs FMT_ID=[ %ID]'
* bcftools roh
- New visualization tool misc/roh-viz, see below
* bcftools +setGT
- Support for setting missing genotypes with arbitrary ploidy via `-n c:./.` (#2303)
* bcftools +split-vep
- The `-s, --select` option was extended to print only one consequence. Previously it
was possible to select a single transcript (e.g., the one with the worst consequence),
and it was possible to filter by consequence severity (e.g., missense or worse),
but in some cases multiple consequences are reported within a single transcript
(e.g., start_lost&splice_region). The extended option allows to print the worst
part, for example as
--select primary:missense+:worst
* bcftools +trio-dnm2
- Fix a problem with --strictly-novel option which would neglect the presence of the apparent de novo
allele in the father for male offspring
- Fix a problem with uncalled mosaic chrX variants in males
* roh-viz
- HTML/JavaScript visualization of bcftools/roh output and homozygosity rate.
* bcftools +vrfs
- New experimental plugin for scoring variants and assessing site noisiness (variant read frequency profiles)
from a large number of unaffected parental samples
## Release 1.21 (12th September 2024)
Changes affecting the whole of bcftools, or multiple commands:
* Support multiple semicolon-separated strings when filtering by ID using -i/-e (#2190).
For example, `-i 'ID="rs123"'` now correctly matches `rs123;rs456`
* The filtering expression ILEN can be positive (insertion), negative (deletion), zero
(balanced substitutions), or set to missing value (symbolic alleles).
* bcftools query
* bcftools +split-vep
- The column indices printed by default with `-H` (e.g., "#[1]CHROM") can now be
suppressed by giving the option twice `-HH` (#2152)
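As an illustration of the two filtering changes in this section (ID matching and ILEN), here is a short Python sketch. It is only a model of the described behaviour, not bcftools code. The function names are hypothetical, and computing ILEN as the ALT length minus the REF length is an assumption that is consistent with the sign convention stated above.

```python
def id_matches(id_column, wanted):
    """True when any semicolon-separated ID equals the wanted string."""
    return wanted in id_column.split(";")


def ilen(ref, alt):
    """Length difference ALT minus REF, or None (missing) for a symbolic allele."""
    if alt.startswith("<"):
        return None
    return len(alt) - len(ref)


print(id_matches("rs123;rs456", "rs123"))  # True
print(id_matches("rs1234", "rs123"))       # False, whole IDs are compared
print(ilen("A", "AT"))                     # 1  (insertion)
print(ilen("AT", "A"))                     # -1 (deletion)
print(ilen("AC", "GT"))                    # 0  (balanced substitution)
print(ilen("A", "<DEL>"))                  # None (symbolic allele)
```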
Changes affecting specific commands:
* bcftools annotate
- Support dynamic variables read from a tab-delimited annotation file (#2151)
For example, in the two cases below the field 'STR' from the -a file is required to match
the INFO/TAG in VCF. In the first example the alleles REF,ALT must match, in the second
example they are ignored. The option -k is required to output also records that were not
annotated:
bcftools annotate -a ann.tsv.gz -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf
bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,-,SCORE,~STR -i'TAG={STR}' -k in.vcf
- When adding Type=String annotations from a tab-delimited file, encode characters with
special meaning using percent encoding (';', '=' in INFO and ':' in FORMAT) (#2202)
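The percent encoding of Type=String annotations mentioned above can be sketched as follows. This is a minimal Python illustration, not the code used by `bcftools annotate`. The function name is hypothetical, and escaping `%` itself is an added assumption so that decoding stays unambiguous.

```python
def encode_value(value, field):
    """Percent-encode characters that would break an INFO or FORMAT field."""
    # Characters named in the notes: ';' and '=' in INFO, ':' in FORMAT
    special = {"INFO": ";=", "FORMAT": ":"}[field]
    out = []
    for ch in value:
        # Assumption: '%' is escaped too, so an encoded value can be decoded safely
        if ch in special or ch == "%":
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(encode_value("a=b;c", "INFO"))   # a%3Db%3Bc
print(encode_value("1:2", "FORMAT"))   # 1%3A2
print(encode_value("1:2", "INFO"))     # 1:2 (':' has no special meaning in INFO here)
```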
* bcftools consensus
- Allow to apply a reference allele which overlaps a previous deletion, there is no
need to complain about overlapping alleles in such case
- Fix a bug which required `-s -` to be present even when there were no samples in the VCF
(#2260)
* bcftools csq
- Fix a rare bug where indel combined with a substitution ending at exon boundary is
incorrectly predicted to have 'inframe' rather than 'frameshift' consequence (#2212)
* bcftools gtcheck
- Fix a segfault with --no-HWE-prob. The bug was introduced with the output format change in
1.19 which replaced the DC section with DCv2 (#2180)
- The number of matching genotypes in the DCv2 output was not calculated correctly with
non-zero `-E, --error-probability`. Consequently, also the average HWE score was incorrect.
The main output, the discordance score, was not affected by the bug
* bcftools +mendelian2
- Include the number of good cases where at least one of the trio genotypes has an alternate
allele (#2204)
- Fix the error message which would report the wrong sample when non-existent sample is given.
Note that the bug only affected the error message, the program otherwise assigns the family
members correctly (#2242)
* bcftools merge
- Fix a severe bug in merging of FORMAT fields with Number=R and Number=A values. For example,
rows with high-coverage FORMAT/AD values (bigger than or equal to 128) could have been assigned
to incorrect samples. The bug was introduced in version 1.19. For details see #2244.
* bcftools mpileup
- Return non-zero error code when the input BAM/CRAM file is truncated (#2177)
- Add FORMAT/AD annotation by default, disable with `-a -AD`
* bcftools norm
- Support realignment of symbolic <DUP.*> alleles, similarly to <DEL.*> added previously
(#1919,#2145)
- Fix in reporting reference allele genotypes with `--multi-overlaps .` (#2160)
- Support of duplicate removal of symbolic alleles of the same type but different SVLEN (#2182)
- New `-S, --sort` switch to optionally sort output records by allele (#1484)
- Add the `-i/-e` filtering options to select records for normalization. Note duplicate
removal ignores this option.
- Fix a bug where `--atomize` would not fill GT alleles for atomized SNVs followed by
an indel (#2239)
* bcftools +remove-overlaps
- Revamp the program to allow greater flexibility, with the following new options:
-M, --mark-tag TAG Mark -m sites with INFO/TAG
-m, --mark EXPR Mark (if also -M is present) or remove sites [overlap]
dup .. all overlapping sites
overlap .. overlapping sites
min(QUAL) .. mark sites with lowest QUAL until overlaps are resolved
--missing EXPR Value to use for missing tags with -m 'min(QUAL)'
0 .. the default
DP .. heuristics, scale maximum QUAL value proportionally to INFO/DP
--reverse Apply the reverse logic, for example preserve duplicates instead of removing
-O, --output-type t t: plain list of sites (chr,pos), tz: compressed list
* bcftools +tag2tag
- The conversions --LXX-to-XX, --XX-to-LXX were working but specific cases such as --LAD-to-AD were not.
- Print more informative error message when source tag type violates VCF specification
* bcftools +trio-dnm2
- Better handling of the --strictly-novel functionality, especially with respect to chrX inheritance
## Release 1.20 (15th April 2024)
Changes affecting the whole of bcftools, or multiple commands:
* Add short option -W for --write-index. The option now accepts an optional parameter
which allows to choose between TBI and CSI index format.
Changes affecting specific commands:
* bcftools consensus
- Add new --regions-overlap option which allows to take into account overlapping deletions
that start out of the fasta file target region.
* bcftools isec
- Add new option `-l, --file-list` to read the list of file names from a file
* bcftools merge
- Add new option `--force-single` to support single-file edge case (#2100)
* bcftools mpileup
- Add new option --indels-cns for an alternative indel calling model, which should increase
the speed on long read data (thanks to using edlib) and the precision (thanks to a number
of heuristics).
* bcftools norm
- Change the order of atomization and multiallelic splitting (when both -a,-m are given)
from "atomize first, then split" to "split first, then atomize". This usually results
in a simpler VCF representation. The previous behaviour can be achieved by explicitly
streaming the output of the --atomize command into the --multiallelics splitting command.
- Fix Type=String multiallelic splitting for Number=A,R,G tags with incorrect number
of values.
- Merging into multiallelic sites with `bcftools norm -m +indels` did not work. This is
now fixed and the merging is now more strict about variant types, for example complex
events, such as AC>TGA, are not considered as indels anymore (#2084)
* bcftools reheader
- Allow reading the input file from a stream with --fai (#2088)
* bcftools +setGT
- Support for custom genotypes based on the allele with higher depth, such
as `--new-gt c:0/X` (#2065)
* bcftools +split-vep
- When only one of the tags is present, automatically choose INFO/BCSQ (the default
tag name produced by `bcftools csq`) or INFO/CSQ (produced by VEP). When both
tags are present, use the default INFO/CSQ.
- Transcript selection by MANE, PICK, and user-defined transcripts, for example
--select CANONICAL=YES
--select MANE_SELECT!=""
--select PolyPhen~probably_damaging
- Select all matching transcripts via --select, not just one
- Change automatic type parsing of VEP fields cDNA_position, CDS_position, and Protein_position
from Integer to String, as they can be of the form "8586-8599/9231". The type Integer can be
still enforced with `-c cDNA_position:int,CDS_position:int,Protein_position:int`.
- Recognize `-c field:str`, not just `-c field:string`, as advertised in the usage page
- Fix a bug which made filtering expression containing missing values crash (#2098)
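The automatic choice between INFO/BCSQ and INFO/CSQ described at the top of the +split-vep notes above amounts to a simple preference rule. The Python sketch below only restates that rule. The function name is hypothetical and the real plugin reads the tags from the VCF header rather than from a Python set.

```python
def pick_annotation_tag(info_tags):
    """Choose the annotation tag: CSQ wins when both tags are present."""
    for tag in ("CSQ", "BCSQ"):  # CSQ is listed first because it is the default
        if tag in info_tags:
            return tag
    return None  # neither tag is defined


print(pick_annotation_tag({"DP", "BCSQ"}))         # BCSQ
print(pick_annotation_tag({"DP", "CSQ"}))          # CSQ
print(pick_annotation_tag({"DP", "CSQ", "BCSQ"}))  # CSQ
```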
* bcftools stats
- When GT is missing but AD is present, the program determines the alternate allele from AD.
However, if the AD tag has incorrect number of values, the program would exit with an error
printing "Requested allele outside valid range". This is now fixed by taking into account
the actual number of ALT alleles.
* bcftools +tag2tag
- Support for conversion from tags using localized alleles (e.g. LPL, LAD) to the family of
standard tags (PL, AD)
* bcftools +trio-dnm2
- Extend --strictly-novel to exclude cases where the non-Mendelian allele
is the reference allele. The change is motivated by the observation that
this class of variants is enriched for errors (especially for indels),
and better corresponds with the option name.
## Release 1.19 (12th December 2023)
Changes affecting the whole of bcftools, or multiple commands:
* Filtering expressions can be given a file with list of strings to match, this
was previously possible only for the ID column. For example
ID=@file .. selects lines with ID present in the file
INFO/TAG=@file.txt .. selects lines where TAG has a string value listed in the file
INFO/TAG!=@file.txt .. TAG must not have a string value listed in the file
* Allow to query REF,ALT columns directly, for example
-e 'REF="N"'
Changes affecting specific commands:
* bcftools annotate
- Fix `bcftools annotate --mark-sites`, VCF sites overlapping regions in a BED file
were not annotated (#1989)
- Add flexibility to FILTER column transfers and allow transfers within the same file,
across files, and in combination. For examples see
http://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info
* bcftools call
- Output MIN_DP rather than MinDP in gVCF mode
- New `-*, --keep-unseen-allele` option to output the unobserved allele <*>,
intended for gVCF.
* bcftools head
- New `-s, --samples` option to include the #CHROM header line with samples.
* bcftools gtcheck
- Add output options `-o, --output` and `-O, --output-type`
- Add filtering options `-i, --include` and `-e, --exclude`
- Rename the short option `-e, --error-probability` from lower case to upper
case `-E, --error-probability`
- Changes to the output format, replace the DC section with DCv2:
- adds a new column for the number of matching genotypes
- The --error-probability is newly interpreted as the probability of erroneous
allele rather than genotype. In other words, the calculation of the discordance
score now considers the probability of genotyping error to be different
for HOM and HET genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0).
- fixes in HWE score calculation plus output average HWE score rather
than absolute HWE score
- better description of fields
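To see why an allele-level error probability makes HET and HOM errors differ, consider a toy model in Python. It is not the calculation used by gtcheck: it assumes that `dsg=0` denotes a true genotype with no alternate alleles and that each of the two alleles is miscalled independently with probability `e`. Under those assumptions the inequality P(0/1|dsg=0) > P(1/1|dsg=0) follows for any small `e`.

```python
def genotype_probs_given_hom_ref(e):
    """Toy model: probability of calling 0/0, 0/1 or 1/1 when the truth is 0/0.

    Assumes two alleles, each miscalled independently with probability e.
    """
    return {
        "0/0": (1 - e) ** 2,      # neither allele miscalled
        "0/1": 2 * e * (1 - e),   # exactly one allele miscalled
        "1/1": e ** 2,            # both alleles miscalled
    }


probs = genotype_probs_given_hom_ref(0.01)
print(probs["0/1"] > probs["1/1"])  # True
```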
* bcftools merge
- Add `-m` modifiers to suppress the output of the unseen allele <*> or <NON_REF>
at variant sites (e.g. `-m both,*`) or all sites (e.g. `-m both,**`)
* bcftools mpileup
- Output MIN_DP rather than MinDP in gVCF mode
* bcftools norm
- Add the number of joined lines to the summary output, for example
Lines total/split/joined/realigned/skipped: 6/0/3/0/0
- Allow combining -m and -a with --old-rec-tag (#2020)
- Symbolic <DEL> alleles caused norm to expand REF to the full length of the deletion.
This was not intended and problematic for long deletions, the REF allele should list
one base only (#2029)
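For scripts that consume the `bcftools norm` summary, the line shown above can be split into named counters. The Python sketch below parses exactly the form quoted in these notes. The function name is hypothetical and the whitespace in the real program output may differ.

```python
def parse_norm_summary(line):
    """Turn 'Lines total/split/...: 6/0/...' into a dict of named counts."""
    label, _, counts = line.partition(":")
    names = label.split()[-1].split("/")            # drop the leading 'Lines'
    values = [int(x) for x in counts.strip().split("/")]
    if len(names) != len(values):
        raise ValueError("mismatched number of labels and counts")
    return dict(zip(names, values))


summary = parse_norm_summary("Lines total/split/joined/realigned/skipped: 6/0/3/0/0")
print(summary["joined"])  # 3
```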
* bcftools query
- Add new `-N, --disable-automatic-newline` option for pre-1.18 query formatting behavior
when newline would not be added when missing
- Make the automatic addition of the newline character more predictable and,
when missing, always put it at the end of the expression. In version 1.18 it could
be added at the end of the expression (for per-site expressions) or inside the square
brackets (for per-sample expressions). The new behavior is:
- if the formatting expression contains a newline character, do nothing
- if there is no newline character and -N, --disable-automatic-newline is given, do nothing
- if there is no newline character and -N is not given, insert newline at the end of the expression
See #1969 for details
- Add new `-F, --print-filtered` option to output a default string for samples that would otherwise
be filtered by `-i/-e` expressions.
- Include sample name in the output header with `-H` whenever it makes sense (#1992)
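The three newline rules listed above for `bcftools query` can be written as a small decision function. This Python sketch is only a restatement of those rules, with a hypothetical function name. It treats the formatting expression as typed on the command line, where a newline appears as the two-character escape `\n`.

```python
def finalize_format(fmt, disable_automatic_newline=False):
    """Apply the newline rules to a formatting expression as typed by the user."""
    if "\\n" in fmt:
        return fmt                 # rule 1: a newline is already present
    if disable_automatic_newline:
        return fmt                 # rule 2: -N given, leave the expression alone
    return fmt + "\\n"             # rule 3: append at the very end


print(finalize_format(r"%CHROM\t%POS"))                # %CHROM\t%POS\n
print(finalize_format(r"[%SAMPLE %GT]"))               # [%SAMPLE %GT]\n
print(finalize_format(r"%CHROM\t%POS", True))          # %CHROM\t%POS
```

Note that for the per-sample expression the newline lands after the closing bracket, not inside it, which is the change relative to version 1.18.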
* bcftools +split-vep
- Fix on the fly filtering involving numeric subfields, e.g. `-i 'MAX_AF<0.001'` (#2039)
- Interpret default column type names (--columns-types) as entire strings, rather than
substrings to avoid unexpected spurious matches (i.e. internally add ^ and $ to all
field names)
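The effect of adding `^` and `$` to field names, as described above, is easy to demonstrate with Python's `re` module. This is an illustration only: the pattern `AF` and the field `MAX_AF` are chosen for the example and are not claimed to be entries of the plugin's default type table.

```python
import re


def matches_field(pattern, field_name, anchored=True):
    """Match a column-type pattern against a field name."""
    if anchored:
        # New behaviour: the pattern must cover the entire field name
        return re.search("^" + pattern + "$", field_name) is not None
    # Old behaviour: a substring match is enough
    return re.search(pattern, field_name) is not None


print(matches_field("AF", "MAX_AF", anchored=False))  # True, spurious substring match
print(matches_field("AF", "MAX_AF"))                  # False
print(matches_field("AF", "AF"))                      # True
```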
* bcftools +trio-dnm2
- Do not flag paternal genotyping errors as de novo mutations. Specifically, when father's
chrX genotype is 0/1 and mother's 0/0, 0/1 in the child will not be marked as DNM.
* bcftools view
- Add new `-A, --trim-unseen-allele` option to remove the unseen allele <*> or <NON_REF>
at variant sites (`-A`) or all sites (`-AA`)
## Release 1.18 (25th July 2023)
Changes affecting the whole of bcftools, or multiple commands:
* Support auto indexing during writing BCF and VCF.gz via new `--write-index` option
Changes affecting specific commands:
* bcftools annotate
- The `-m, --mark-sites` option can be now used to mark all sites without the
need to provide the `-a` file (#1861)
- Fix a bug where the `-m` function did not respect the `--min-overlap` option (#1869)
- Fix a bug when update of INFO/END results in assertion error (#1957)
* bcftools concat
- New option `--drop-genotypes`
* bcftools consensus
- Support higher-ploidy genotypes with `-H, --haplotype` (#1892)
- Allow `--mark-ins` and `--mark-snv` with a character, similarly to `--mark-del`
* bcftools convert
- Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to sites-only VCFs
* bcftools csq
- New `--unify-chr-names` option to automatically unify different chromosome
naming conventions in the input GFF, fasta and VCF files (e.g. "chrX" vs "X")
- More versatility in parsing various flavors of GFF
- A new `--dump-gff` option to help with debugging and investigating the internals
of GFF parsing
- When printing consequences in nonsense mediated decay transcripts, include 'NMD_transcript'
in the consequence part of the annotation. This is to make filtering easier and analogous to
VEP annotations. For example the consequence annotation
3_prime_utr|PCGF3|ENST00000430644|NMD
is newly printed as
3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD
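The change in the NMD consequence string shown above can be mimicked with a few lines of Python. This is a sketch of the string transformation only, not of how `bcftools csq` builds its annotations. The function name is hypothetical, and it assumes the shortened four-field form from the example, with the biotype in the last field.

```python
def add_nmd_flag(annotation):
    """Append '&NMD_transcript' to the consequence when the biotype is NMD."""
    fields = annotation.split("|")
    consequence, biotype = fields[0], fields[-1]
    # Assumption: consequence is the first field and biotype the last one
    if biotype == "NMD" and "NMD_transcript" not in consequence.split("&"):
        fields[0] = consequence + "&NMD_transcript"
    return "|".join(fields)


print(add_nmd_flag("3_prime_utr|PCGF3|ENST00000430644|NMD"))
# 3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD
```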
* bcftools gtcheck
- Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL, etc modes. This
information is important for interpretation of the discordance score, as only the
GT-vs-GT matching can be interpreted as the number of mismatching genotypes.
* bcftools +mendelian2
- Fix in command line argument parsing, the `-p` and `-P` options were not
functioning (#1906)
* bcftools merge
- New `-M, --missing-rules` option to control the behavior of merging of vector tags
to prevent mixtures of known and missing values in tags when desired
- Use values pertaining to the unknown allele (<*> or <NON_REF>) when available
to prevent mixtures of known and missing values (#1888)
- Revamped line matching code to fix problems in gVCF merging where split gVCF blocks
would not update genotypes (#1891, #1164).
* bcftools mpileup
- Fix a bug in --indels-2.0 which caused an endless loop when CIGAR operator 'H' or 'P'
was encountered
* bcftools norm
- The `-m, --multiallelics +` mode now preserves phasing (#1893)
- Symbolic <DEL.*> alleles are now normalized too (#1919)
- New `-g, --gff-annot` option to right-align indels in forward transcripts to follow
HGVS 3'rule (#1929)
* bcftools query
- Force newline character in formatting expression when not given explicitly
- Fix `-H` header output in formatting expressions containing newlines
* bcftools reheader
- Make `-f, --fai` aware of long contigs not representable by 32-bit integer (#1959)
* bcftools +split-vep
- Prevent a segfault when `-i/-e` use a VEP subfield not included in `-f` or `-c` (#1877)
- New `-X, --keep-sites` option complementing the existing `-x, --drop-sites` options
- Force newline character in formatting expression when not given explicitly
- Fix a subtle ambiguity: identical rows must be returned when `-s` is applied regardless
of `-f` containing the `-a` VEP tag itself or not.
* bcftools stats
- Collect new VAF (variant allele frequency) statistics from FORMAT/AD field
- When counting transitions/transversions, consider also alternate het genotypes
* plot-vcfstats
- Add three new VAF plots
## Release 1.17 (21st February 2023)
Changes affecting the whole of bcftools, or multiple commands:
* The -i/-e filtering expressions
- Error checks were added to prevent incorrect use of vector arithmetics. For example,
when evaluating the sum of two vectors A and B, the resulting vector could contain
nonsense values when the input vectors were not of the same length. The fix introduces
the following logic:
- evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A)
- evaluate to C_i = A_i + B_0 when length(B)=1 and set length(C)=length(A)
- evaluate to C_i = A_0 + B_i when length(A)=1 and set length(C)=length(B)
- throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1
- Arrays in Number=R tags can be now subscripted by alleles found in FORMAT/GT. For example,
FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele
FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth bigger than 20
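The vector arithmetic rules listed above (element-wise sum for equal lengths, scalar expansion when one side has length 1, an error otherwise) can be sketched in Python as follows. This mirrors the stated logic only. The function name is hypothetical and the real filtering code also deals with missing values and other operators.

```python
def vector_add(a, b):
    """Add two vectors following the length rules for -i/-e expressions."""
    if len(a) == len(b):
        return [x + y for x, y in zip(a, b)]   # C_i = A_i + B_i
    if len(b) == 1:
        return [x + b[0] for x in a]           # C_i = A_i + B_0
    if len(a) == 1:
        return [a[0] + y for y in b]           # C_i = A_0 + B_i
    raise ValueError("vectors of different lengths cannot be combined")


print(vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(vector_add([1, 2, 3], [10]))          # [11, 12, 13]
print(vector_add([1], [10, 20]))            # [11, 21]
```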
* The commands `consensus -H` and `+split-vep -H`
- Drop unnecessary leading space in the first header column and newly print `#[1]columnName`
instead of the previous `# [1]columnName` (#1856)
Changes affecting specific commands:
* bcftools +allele-length
- Fix overflow for indels longer than 512bp and aggregate alleles equal or larger than
that in the same bin (#1837)
* bcftools annotate
- Support sample reordering of annotation file (#1785)
- Restore lost functionality of the --pair-logic option (#1808)
* bcftools call
- Fix a bug where too many alleles passed to `-C alleles` via `-T` caused memory
corruption (#1790)
- Fix a bug where indels constrained with `-C alleles -T` would sometimes be missed (#1706)
* bcftools consensus
- BREAKING CHANGE: the option `-I, --iupac-codes` newly outputs IUPAC codes based on FORMAT/GT
of all samples. The `-s, --samples` and `-S, --samples-file` options can be used to subset
samples. In order to ignore samples and consider only the REF and ALT columns (the original
behavior prior to 1.17), run with `-s -` (#1828)
* bcftools convert
- Make variantkey conversion work for sites without an ALT allele (#1806)
* bcftools csq
- Fix a bug where a MNV with multiple consequences (e.g. missense + stop_gained)
would report only the less severe one (#1810)
- GFF file parsing was made slightly more flexible, newly ids can be just 'XXX'
rather than, for example, 'gene:XXX'
- New gff2gff perl script to fix GFF formatting differences
* bcftools +fill-tags
- More of the available annotations are now added by the `-t all` option
* bcftools +fixref
- New INFO/FIXREF annotation
- New -m swap mode
* bcftools +mendelian
- The +mendelian plugin has been deprecated and replaced with +mendelian2. The
function of the plugin is the same but the command line options and the output
format have changed, and for this reason it was introduced as a new plugin.
* bcftools mpileup
- Most of the annotations generated by mpileup are now optional via the
`-a, --annotate` option, and several new (mostly experimental) annotations were added.
- New option `--indels-2.0` for an EXPERIMENTAL indel calling model. This model aims
to address some known deficiencies of the current indel calling algorithm, specifically,
it uses a diploid reference consensus sequence. Note that in the current version it
has the potential to increase sensitivity but at the cost of decreased specificity.
- Make the FS annotation (Fisher exact test strand bias) functional and remove it
from the default annotations
* bcftools norm
- New --multi-overlaps option allows setting overlapping alleles either to the
ref allele (the current default) or to a missing allele (#1764 and #1802)
- Fixed a bug in `-m -` which did not split missing FORMAT values correctly and
could lead to empty FORMAT fields such as `::` instead of the correct `:.:` (#1818)
- The `--atomize` option previously would not split complex indels such as C>GGG.
Newly these will be split into two records C>G and C>CGG (#1832)
* bcftools query
- Fix a rare bug where the printing of SAMPLE field with `query` was incorrectly
suppressed when the `-e` option contained a sample expression while the formatting
query did not. See #1783 for details.
* bcftools +setGT
- Add new `--new-gt X` option (#1800)
- Add new `--target-gt r:FLOAT` option to randomly select a proportion of genotypes (#1850)
- Fix a bug where `-t ./x` mode was advertised as selecting both phased and unphased
half-missing genotypes, but was in fact selecting only unphased genotypes (#1844)
* bcftools +split-vep
- New options `-g, --gene-list` and `--gene-list-fields` which allow prioritizing
consequences from a list of genes, or restrict output to the listed genes
- New `-H, --print-header` option to print the header with `-f`
- Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. There the
LoF_info subfield contains commas which, in general, makes it impossible to parse the
VEP subfields. The +split-vep plugin can now work with such files, replacing the offending
commas with slash (/) characters. See also https://github.com/Ensembl/ensembl-vep/issues/1351
- Newly the `-c, --columns` option can be omitted when a subfield is used in `-i/-e` filtering
expression. Note that `-c` may still have to be given when it is not possible to infer the
type of the subfield. Note that this is an experimental feature.
* bcftools stats
- The per-sample stats (PSC) would not be computed when `-i/-e` filtering options and
the `-s -` option were given but the expression did not include sample columns (#1835)
* bcftools +tag2tag
- Revamp of the plugin to allow wider range of tag conversions, specifically all combinations
from FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT
* bcftools +trio-dnm2
- New `-n, --strictly-novel` option to downplay alleles which violate Mendelian
inheritance but are not novel
- Allow setting the `--pn` and `--pns` options separately for SNVs and indels and make
the indel settings more strict by default
- Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values
* bcftools +variant-distance
- New option `-d, --direction` to choose the directionality: forward, reverse, nearest (the default)
or both (#1829)
## Release 1.16 (18th August 2022)
* New plugin `bcftools +variant-distance` to annotate records with distance to the
nearest variant (#1690)
Changes affecting the whole of bcftools, or multiple commands:
* The -i/-e filtering expressions
- Added support for querying of multiple filters, for example `-i 'FILTER="A;B"'`
can be used to select sites with two filters "A" and "B" set. See the documentation
for more examples.
- Added modulo arithmetic operator
Changes affecting specific commands:
* bcftools annotate
- A bug introduced in 1.14 caused records with INFO/END annotation to
incorrectly trigger `-c ~INFO/END` mode of comparison even when not explicitly
requested, which resulted in the annotation not being transferred from a tab-delimited
file (#1733)
* bcftools merge
- New `-m snp-ins-del` switch to merge SNVs, insertions and deletions separately (#1704)
* bcftools mpileup
- New NMBZ annotation for Mann-Whitney U-z test on number of mismatches within
supporting reads
- Suppress the output of MQSBZ and FS annotations in absence of alternate allele
* bcftools +scatter
- Fix erroneous addition of duplicate PG lines
* bcftools +setGT
- Custom genotypes (e.g. `-n c:1/1`) now correctly override ploidy
## Release 1.15.1 (7th April 2022)
* bcftools annotate
- New `-H, --header-line` convenience option to pass a header line on command line,
this complements the existing `-h, --header-lines` option which requires a file
with header lines
* bcftools csq
- A list of consequence types supported by `bcftools csq` has been added to
the manual page. (#1671)
* bcftools +fill-tags
- Extend generalized functions so that FORMAT tags can be filled as well, for example:
bcftools +fill-tags in.bcf -o out.bcf -- -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'
- Allow multiple custom functions in a single run. Previously the program would silently
go with the last one, assigning the same values to all (#1684)
* bcftools norm
- Fix an assertion failure triggered when a faulty VCF file with a '-'
character in the REF allele was used with `bcftools norm --atomize`. This
option now checks that the REF allele only includes the allowed characters
A, C, G, T and N. (#1668)
- Fix the loss of phasing in half-missing genotypes in variant atomization (#1689)
* bcftools roh
- Fix a bug that could result in an endless loop or incorrect AF estimate when
missing genotypes are present and the `--estimate-AF -` option was used (#1687)
* bcftools +split-vep
- VEP fields with characters disallowed in VCF tag names by the specification (such as '-'
in 'M-CAP') couldn't be queried. This has been fixed, the program now sanitizes the field
names, replacing invalid characters with underscore (#1686)
## Release 1.15 (21st February 2022)
* New `bcftools head` subcommand for conveniently displaying the headers
of a VCF or BCF file. Without any options, this is equivalent to
`bcftools view --header-only --no-version` but more succinct and memorable.
* The `-T, --targets-file` option had the following bug originating in HTSlib code:
when an uncompressed file with multiple columns CHR,POS,REF was provided, the
REF would be interpreted as 0 gigabases (#1598)
Changes affecting specific commands:
* bcftools annotate
- In addition to `--rename-annots`, which requires a file with name mappings,
it is now possible to do the same on the command line `-c NEW_TAG:=OLD_TAG`
- Add new option --min-overlap to specify the minimum required
overlap of intersecting regions
- Allow transferring ALT from a VCF with or without replacement using
bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz
* bcftools convert
- Revamp of `--gensample`, `--hapsample` and `--haplegendsample` family of options
which includes the following changes:
- New `--3N6` option to output/input the new version of the .gen file format,
see https://www.cog-genomics.org/plink/2.0/formats#gen
- Deprecate the `--chrom` option in favor of `--3N6`. A simple `cut` command
can be used to convert from the new 3*M+6 column format to the format printed
with `--chrom` (`cut -d' ' -f1,3-`).
- The CHROM:POS_REF_ALT IDs which are used to detect strand swaps are required
and must appear either in the "SNP ID" column or the "rsID" column. The column
is autodetected for `--gensample2vcf`, can be the first or the second for
`--hapsample2vcf` (depending on whether the `--vcf-ids` option is given), must be
the first for `--haplegendsample2vcf`.
* bcftools csq
- Allow GFF files with phase column unset
* bcftools filter
- New `--mask`, `--mask-file` and `--mask-overlap` options to soft filter
variants in regions (#1635)
* bcftools +fixref
- The `-m id` option now works also for non-dbSNP ids, i.e. not just `rsINT`
- New `-m flip-all` mode for flipping all sites, including ambiguous A/T and C/G sites
* bcftools isec
- Prevent segfault on sites filtered with -i/-e in all files (#1632)
* bcftools mpileup
- More flexible read filtering using the options
--ls, --skip-all-set .. skip reads with all of the FLAG bits set
--ns, --skip-any-set .. skip reads with any of the FLAG bits set
--lu, --skip-all-unset .. skip reads with all of the FLAG bits unset
--nu, --skip-any-unset .. skip reads with any of the FLAG bits unset
The existing synonymous options will continue to function but their use
is discouraged
--rf, --incl-flags STR|INT Required flags: skip reads with mask bits unset
--ff, --excl-flags STR|INT Filter flags: skip reads with mask bits set
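The four new read-filtering modes differ only in how the read's FLAG is compared with the option's bit mask. The Python sketch below restates the option descriptions above as bit operations. It is a reading of those descriptions, not code taken from bcftools; the helper name `should_skip` is hypothetical, and treating a mask of 0 as "option not given" is an assumption.

```python
def should_skip(flag, mask, mode):
    """Return True when a read with SAM FLAG `flag` would be skipped.

    `mode` is one of the four long option names and `mask` is the option's value.
    Hypothetical helper based on the option descriptions only.
    """
    if mask == 0:
        return False                 # assumption: an empty mask filters nothing
    common = flag & mask
    if mode == "skip-all-set":       # --ls: every mask bit is set in the read
        return common == mask
    if mode == "skip-any-set":       # --ns: at least one mask bit is set
        return common != 0
    if mode == "skip-all-unset":     # --lu: none of the mask bits is set
        return common == 0
    if mode == "skip-any-unset":     # --nu: at least one mask bit is unset
        return common != mask
    raise ValueError(f"unknown mode: {mode}")


UNMAP, DUP = 0x4, 0x400              # standard SAM FLAG bits
print(should_skip(UNMAP, UNMAP | DUP, "skip-any-set"))
print(should_skip(UNMAP, UNMAP | DUP, "skip-all-set"))
```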
* bcftools query
- Make the `--samples` and `--samples-file` options work also in the `--list-samples`
mode. Add a new `--force-samples` option which enables proceeding even when some of
the requested samples are not present in the VCF (#1631)
* bcftools +setGT
- Fix a bug in `-t q -e EXPR` logic applied on FORMAT fields, sites with all
samples failing the expression EXPR were incorrectly skipped. This problem
affected only the use of `-e` logic, not the `-i` expressions (#1607)
* bcftools sort
- make use of the TMPDIR environment variable when defined
* bcftools +trio-dnm2
- The --use-NAIVE mode now also adds the de novo allele in FORMAT/VA
## Release 1.14 (22nd October 2021)
Changes affecting the whole of bcftools, or multiple commands:
* New `--regions-overlap` and `--targets-overlap` options which address
a long-standing design problem with subsetting VCF files by region.
BCFtools recognizes two sets of options, one for streaming (`-t/-T`) and
one for index-jumping (`-r/-R`). They behave differently: the first
includes only records with POS coordinate within the regions, the other
includes records overlapping the regions. The two new options allow modifying the
default behavior, see the man page for more details.
* The `--output-type` option can be used to override the default compression
level
Changes affecting specific commands:
* bcftools annotate
- when `--set-id` and `--remove` are combined, `--set-id` cannot use
tags deleted by `--remove`. This is now detected and the program
exits with an informative error message instead of segfaulting
(#1540)
- while non-symbolic variants are uniquely identified by POS,REF,ALT,
symbolic alleles starting at the same position were indistinguishable.
This prevented correct matching of records with the same positions and
variant type but different length given by INFO/END (samtools/htslib@60977f2).
When annotating from a VCF/BCF, the matching is done automatically. When
annotating from a tab-delimited text file, this feature can be invoked
by using `-c INFO/END`.
- add a new '.' modifier to control whether missing values should be carried
over from a tab-delimited file or not. For example:
-c TAG .. adds TAG if the source value is not missing. If TAG
exists in the target file, it will be overwritten
-c .TAG .. adds TAG even if the source value is missing. This
can overwrite non-missing values with a missing value
and can create empty VCF fields (`TAG=.`)
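The difference between `-c TAG` and `-c .TAG` can be summarized in a few lines of Python. This is a minimal sketch that only restates the two rules above; the helper `transfer_tag` and its arguments are hypothetical and do not correspond to bcftools internals.

```python
MISSING = "."


def transfer_tag(target_value, source_value, carry_missing=False):
    """Value of TAG in the target record after annotation.

    carry_missing=False models `-c TAG`, carry_missing=True models `-c .TAG`.
    `target_value` is None when TAG is absent from the target record.
    Hypothetical helper; it only restates the two rules described above.
    """
    if source_value == MISSING and not carry_missing:
        return target_value   # source is missing: leave the target untouched
    return source_value       # otherwise the source value wins, even if it is "."


print(transfer_tag("5", "7"))                      # existing value is overwritten
print(transfer_tag("5", "."))                      # missing source is ignored
print(transfer_tag("5", ".", carry_missing=True))  # missing value is carried over
```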
* bcftools +check-ploidy
- by default missing genotypes are not used when determining ploidy.
With the new option `-m, --use-missing` it is possible to use the
information carried in the missing and half-missing genotypes
(e.g. ".", "./." or "./1")
* bcftools concat:
- new `--ligate-force` and `--ligate-warn` options for finer control
of `-l, --ligate` behavior in imperfect overlaps. The new default is
to throw an error when sites present in one chunk but absent in the
other are encountered. To drop such sites and proceed, use the new
`--ligate-warn` option (previously this was the default). To keep such
sites, use the new `--ligate-force` option (#1567).
* bcftools consensus:
- Apply mask even when the VCF has no notion about the chromosome. It
was possible to encounter this problem when `contig` lines were not
present in the VCF header and no variants were called on that chromosome
(#1592)
* bcftools +contrast:
- support for chunking within map/reduce framework allowing to collect
NASSOC counts even for empty case/control sample sets (#1566)
* bcftools csq:
- bug fix, compound indels were not recognised in some cases (#1536)
- compound variants were incorrectly marked as 'inframe' even when
stop codon would occur before the frame was restored (#1551)
- bug fix, FORMAT/BCSQ bitmasks could have been assigned incorrectly
to some samples at multiallelic sites, a superset of the correct
consequences would have been set (#1539)
- bug fix, the upstream stop could be falsely assigned to all samples in
a multi-sample VCF even if the stop was relevant for a single sample
only (#1578)
- further improve the detection of mismatching chromosome naming
(e.g. "chrX" vs "X") in the GFF, VCF and fasta files
* bcftools merge:
- keep (sum) INFO/AN,AC values when merging VCFs with no samples (#1394)
* bcftools mpileup:
- new --indel-size option which allows increasing the maximum indel size
considered; large deletions in long read data are otherwise
lost.
* bcftools norm:
- atomization now supports Number=A,R string annotations (#1503)
- assign as many alternate alleles to genotypes at multiallelic sites
in the `-m +` mode, disregarding the phase. Previously the program
assumed to be executed as an inverse operation of `-m -`, but when
that was not the case, reference alleles would have been filled
instead of multiple alternate alleles (#1542)
* bcftools sort:
- increase accuracy of the --max-mem option limit, previously the limit
could be exceeded by more than 20% (#1576)
* bcftools +trio-dnm:
- new `--with-pAD` option to allow processing of VCFs without FORMAT/QS.
The existing `--ppl` option was changed to the analogous `--with-pPL`
* bcftools view:
- the functionality of the option --compression-level lost in 1.12
has been restored
## Release 1.13 (7th July 2021)
This release brings new options and significant changes in BAQ parametrization
in `bcftools mpileup`. The previous behavior can be triggered by providing
the `--config 1.12` option. Please see https://github.com/samtools/bcftools/pull/1474
for details.
Changes affecting the whole of bcftools, or multiple commands:
* Improved build system
Changes affecting specific commands:
* bcftools annotate:
- Fix a rare bug when INFO/END is present, all INFO fields are removed
with `bcftools annotate -x INFO` and BCF output is produced. Then the
removed INFO/END continues to inform the end coordinate and causes
incorrect retrieval of records with the -r option (#1483)
- Support for matching annotation line by ID, in addition to CHROM,POS,REF,
and ALT (#1461)
bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf
* bcftools csq:
- When GFF and VCF/fasta use a different chromosome naming convention
(e.g. chrX vs X), no consequences would be added. Newly the program
attempts to detect these differences and remove/add the "chr" prefix
to chromosome name to match the GFF and VCF/fasta (#1507)
- Parametrize brief-predictions parameter to allow explicit number of
amino acids to be printed. Note that the `-b, --brief-predictions` option
is being replaced with `-B, --trim-protein-seq INT`
* bcftools +fill-tags:
- Generalization and better support for custom functions that allow
adding new INFO tags based on arbitrary `-i, --include` type of
expressions. For example, to calculate a missing INFO/DP annotation
from FORMAT/AD, it is possible to use:
-t 'DP:1=int(sum(FORMAT/AD))'
Here the optional ":1" part specifies that a single value will be
added (by default Number=. is used) and the optional int(...) adds
an integer value (by default Type=Float is used).
- When FORMAT/GT is not present, the INFO/AF tag will be newly calculated
from INFO/AC and INFO/AN.
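As an illustration of the `-t 'DP:1=int(sum(FORMAT/AD))'` expression above, the following Python sketch computes a single integer depth from per-sample allelic depths. It assumes that `sum` aggregates over all samples and alleles and that missing values are ignored; both are assumptions made for this example, and the helper name `info_dp_from_ad` is hypothetical.

```python
def info_dp_from_ad(ad_per_sample):
    """Sum FORMAT/AD over all samples and alleles and return an integer.

    `ad_per_sample` holds one list of allelic depths per sample.
    None stands for a missing value and is ignored (an assumption).
    """
    total = sum(d for ad in ad_per_sample for d in ad if d is not None)
    return int(total)  # mirrors the int(...) wrapper that requests an integer value


# Three samples at a biallelic site; the second sample has a missing ALT depth
print(info_dp_from_ad([[10, 2], [7, None], [0, 5]]))
```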
* bcftools gtcheck:
- Switch between FORMAT/GT or FORMAT/PL when one is (implicitly) requested
but only the other is available
- Improve diagnostics, printing warnings when a line cannot be matched and
the number of lines skipped for various reasons (#1444)
- Minor bug fix, with PLs being the default, the `--distinctive-sites` option
started to require explicit `--error-probability 0`
* bcftools index:
- The program now accepts both data file name and the index file name. This
adds to user convenience when running index statistics (-n, -s)
* bcftools isec:
- Always generate sites.txt with isec -p (#1462)
* bcftools +mendelian:
- Consider only complete trios, do not crash on sample name typos (#1520)
* bcftools mpileup:
- New `--seed` option for reproducibility of subsampling code in HTSlib
- The SCR annotation which shows the number of soft-clipped reads now
correctly pools reads together regardless of the variant type. Previously
only reads with indels were included at indel sites.
- Major revamp of BAQ. Please see https://github.com/samtools/bcftools/pull/1474
for details. The previous behavior can be triggered by providing the `--config 1.12`
option.
- Thanks to improvements in HTSlib, the removal of overlapping reads (which can
be disabled with the `-x, --ignore-overlaps` options) is not systematically biased
anymore (https://github.com/samtools/htslib/pull/1273)
- Modified scale of Mann-Whitney U tests. Newly INFO/*Z annotations will be printed,
for example MQBZ replaces MQB.
* bcftools norm:
- Fix Type=Flag output in `norm --atomize` (#1472)
- Atomization must not discard ALT=. records
- Atomization of AD and QS tags now correctly updates occurrences of duplicate
alleles within different haplotypes
- Fix a bug in atomization of Number=A,R tags
* bcftools reheader:
- Add `-T, --temp-prefix` option
* bcftools +setGT:
- A wider range of genotypes can be set by the plugin by allowing
specifying custom genotypes. For example, to force a heterozygous
genotype it is now possible to use expressions like:
c:'m|M'
c:0/1
c:0
* bcftools +split-vep:
- New `-u, --allow-undef-tags` option
- Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The
`-p, --annot-prefix` option is now applied before doing anything else
which allows its use with `-f, --format` and `-c, --columns` options.
- Some consequence field names may not constitute a valid tag name, such
as "pos(1-based)". Newly field names are trimmed to exclude brackets.
* bcftools +tag2tag:
- New --QR-QA-to-QS option to convert annotations generated by FreeBayes
to QS used by BCFtools
* bcftools +trio-dnm:
- Add support for sites with more than four alleles. Note that only the
four most frequent alleles are considered, the model remains unchanged.
Previously such sites were skipped.
- New --use-NAIVE option for a naive DNM calling based solely on FORMAT/GT
and expected Mendelian inheritance. This option is suitable for prefiltering.
- Fix behavior to match the documentation, the `--dnm-tag DNG` option now
correctly outputs log scaled values by default, not phred scaled.
- Fix bug in VAF calculation, homozygous de novo variants were incorrectly
reported as having VAF=50%
- Fix arithmetic underflow which could lead to imprecise scores and improve
sensitivity in high coverage regions
- Allow combining --pn and --pns to set the noise thresholds independently
## Release 1.12 (17th March 2021)
Changes affecting the whole of bcftools, or multiple commands:
* The output file type is determined from the output file name suffix, where
available, so the -O/--output-type option is often no longer necessary.
* Make F_MISSING in filtering expressions work for sites with multiple
ALT alleles (#1343)
* Fix N_PASS and F_PASS to behave according to expectation when reverse
logic is used (#1397). This fix has the side effect of `query` (or
programs like `+trio-stats`) behaving differently with these expressions,
operating now in site-oriented rather than sample-oriented mode. For
example, the new behavior could be:
bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
11 A 0/0
11 B 0/0
11 C 1/1
while previously the same expression would return:
11 C 1/1
The original mode can be mimicked by splitting the filtering into two steps:
bcftools view -i'N_PASS(GT="alt")==1' | \
bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
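The difference between the site-oriented and the two-step behavior shown above can be mimicked in Python on the same three-sample example. This is a sketch of the described semantics only; the helper names are hypothetical, and `has_alt` is a simplified stand-in for the `GT="alt"` expression.

```python
def has_alt(gt):
    """True when a genotype string such as '0/1' carries a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def site_oriented(site, n_required=1):
    """New behavior: the site passes as a whole, so every sample is printed."""
    n_pass = sum(has_alt(gt) for _, gt in site)
    return list(site) if n_pass == n_required else []


def two_step(site, n_required=1):
    """Original behavior mimicked: select the site first, then filter samples."""
    return [(s, gt) for s, gt in site_oriented(site, n_required) if has_alt(gt)]


site = [("A", "0/0"), ("B", "0/0"), ("C", "1/1")]
print(site_oriented(site))  # all three samples of the passing site
print(two_step(site))       # only the sample carrying the alternate allele
```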
Changes affecting specific commands:
* bcftools annotate:
- New `--rename-annots` option to help fix broken VCFs (#1335)
- New -C option allows a long list of options to be read from a file to
prevent very long command lines.
- New `append-missing` logic allows annotations to be added for each ALT
allele in the same order as they appear in the VCF. Note that this is
not bullet proof. In order for this to work:
- the annotation file must have one line per ALT allele
- fields must contain a single value as multiple values are appended
as they are and would break the correspondence between the alleles
and values
* bcftools concat:
- Do not phase genotypes by mistake if they are not already phased
with `-l` (#1346)
* bcftools consensus:
- New `--mask-with`, `--mark-del`, `--mark-ins`, `--mark-snv` options
(#1382, #1381, #1170)
- Symbolic <DEL> should have only one REF base. If there are multiple,
take POS+1 as the first deleted base.
- Make consensus work when the first base of the reference genome is
deleted. In this situation the VCF record has POS=1 and the first
REF base cannot precede the event. (#1330)
* bcftools +contrast:
- The NOVELGT annotation was previously not added when requested.
* bcftools convert:
- Make the --hapsample and --hapsample2vcf options consistent with each
other and with the documentation.
* bcftools call:
- Revamp of `call -G`, previously sample grouping by population was not
truly independent and could still be influenced by the presence of other
sample groups.
- Optional addition of INFO/PV4 annotation with `call -a INFO/PV4`
- Remove generation of useless HOB and ICB annotation;
use `+fill-tags -- -t HWE,ExcHet` instead
- The `call -f` option was renamed to `-a` to (1) make it consistent with
`mpileup` and (2) to indicate that it includes both INFO and FORMAT
annotations, not just FORMAT as previously
- Any sensible Number=R,Type=Integer annotation can be used with -G,
such as AD or QS
- Don't trim QUAL; although usefulness of this change is questionable for
true probabilistic interpretation (such high precision is unrealistic),
using QUAL as a score rather than probability is helpful and permits more
fine-grained filtering
- Fix a suspected bug in `call -F`; at the very least, the change improves
readability
- `call -C trio` is temporarily disabled
* bcftools csq:
- Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with too
many per-sample consequences
- Fix a bug which incorrectly handled the --ncsq parameter and could clash
with reserved BCF values, consequently producing truncated or even incorrect
output of the %TBCSQ formatting expression in `bcftools query`. To account
for the reserved values, the new default value is --ncsq 15 (#1428)
* bcftools +fill-tags:
- MAF definition revised for multiallelic sites, the second most common
allele is considered to be the minor allele (#1313)
- New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads
provided FORMAT/AD is present
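The revised MAF definition and the idea of an alternate-read fraction can be sketched as follows. This is a minimal Python illustration under stated assumptions: allele counts are given with REF first, ties and missing data get no special treatment, and `alt_read_fraction` is a generic helper that pools all alternate alleles. The helper names are hypothetical and the exact definitions used by `+fill-tags` for VAF and VAF1 are not spelled out here.

```python
def minor_allele_frequency(allele_counts):
    """MAF as the frequency of the second most common allele.

    `allele_counts` holds the observed count of every allele, REF first.
    Hypothetical helper; ties and missing data are not treated specially.
    """
    total = sum(allele_counts)
    if total == 0 or len(allele_counts) < 2:
        return None
    second = sorted(allele_counts, reverse=True)[1]
    return second / total


def alt_read_fraction(ad):
    """Fraction of reads supporting any alternate allele, from FORMAT/AD (REF first)."""
    depth = sum(ad)
    return None if depth == 0 else sum(ad[1:]) / depth


print(minor_allele_frequency([70, 20, 10]))  # triallelic site
print(alt_read_fraction([12, 8]))            # 12 REF reads, 8 ALT reads
```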
* bcftools gtcheck:
- support matching of a single sample against all other samples in the file
with `-s qry:sample -s gt:-`. This was previously not possible, either
full cross-check mode had to be run or a list of pairs/samples had to
be created explicitly
* bcftools merge:
- Make `merge -R` behavior consistent with other commands and pull in
overlapping records with POS outside of the regions (#1374)
- Bug fix (#1353)
* bcftools mpileup:
- Add new optional tag `mpileup -a FORMAT/QS`
* bcftools norm:
- New `-a, --atomize` functionality to decompose complex variants,
for example MNVs into consecutive SNVs
- New option `--old-rec-tag` to indicate the original variant
* bcftools query:
- Incorrect fields were printed in the per-sample output when subset
of samples was requested via -s/-S and the order of samples in the
header was different from the requested -s/-S order (#1435)
* bcftools +prune:
- New options --random-seed and --nsites-per-win-mode (#1050)
* bcftools +split-vep:
- Transcript selection now works also on the raw CSQ/BCSQ annotation.
- Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349)
* bcftools stats:
- Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to
predefined bins, use an open-range logarithmic binning instead
- plot dual ts/tv stats: per quality bin and cumulative as if threshold
applied on the whole dataset
* bcftools +trio-dnm2:
- Major revamp of +trio-dnm plugin, which is now deprecated and replaced by
+trio-dnm2.
The original trio-dnm calling model used genotype likelihoods (PLs) as the
input for calling. However, that is flawed because PLs make assumptions
which are unsuitable for de novo calling: PL(RR) can become bigger than
PL(RA) even when the ALT allele is present in the parents. Note that
this is true also for other programs such as DeNovoGear which rely on
the same samtools calculation.
The new recommended workflow is
bcftools mpileup -a AD,QS -f ref.fa -Ou proband.bam father.bam mother.bam |
bcftools call -mv -Ou |
bcftools +trio-dnm2 -p proband,father,mother -Oz -o output.vcf.gz
This new version also implements the DeNovoGear model. The original
behavior of trio-dnm is no longer supported.
For more details see http://samtools.github.io/bcftools/trio-dnm.pdf
## Release 1.11 (22nd September 2020)
Changes affecting the whole of bcftools, or multiple commands:
* Filtering -i/-e expressions
- Breaking change in -i/-e expressions on the FILTER column. Originally
it was possible to query only a subset of filters, but not an exact match.
The new behavior is:
FILTER="A" .. exact match, for example "A;B" does not pass
FILTER!="A" .. exact match, for example "A;B" does pass
FILTER~"A" .. both "A" and "A;B" pass
FILTER!~"A" .. neither "A" nor "A;B" pass
- Fix in commutative comparison operators, in some cases reversing sides
would produce incorrect results (#1224; #1266)
- Better support for filtering on sample subsets
- Add SMPL_*/S* family of functions that evaluate within rather than across
all samples. (#1180)
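The four FILTER comparison operators introduced above can be modeled with set operations. The Python sketch below treats FILTER as a set of names, so exact matching is set equality and `~` is a subset test. Ignoring the order of the filters is an assumption of this sketch, and the helper names are hypothetical.

```python
def parse_filter(value):
    """Split a FILTER string such as 'A;B' into a set of filter names."""
    return set(value.split(";"))


def filter_matches(record_filter, op, query):
    """Evaluate FILTER <op> "query" for op in '=', '!=', '~', '!~'."""
    have, want = parse_filter(record_filter), parse_filter(query)
    if op == "=":
        return have == want        # exact match
    if op == "!=":
        return have != want        # anything but an exact match
    if op == "~":
        return want <= have        # all queried filters are present
    if op == "!~":
        return not (want <= have)  # at least one queried filter is absent
    raise ValueError(f"unknown operator: {op}")


for op in ("=", "!=", "~", "!~"):
    print(op, filter_matches("A", op, "A"), filter_matches("A;B", op, "A"))
```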
* Improvements in the build system
Changes affecting specific commands:
* bcftools annotate:
- Previously it was not possible to use `--columns =TAG` with INFO tags
and the `--merge-logic` feature was restricted to tab files with BEG,END
columns, now extended to work also with REF,ALT.
- Make `annotate -TAG/+TAG` work also with FORMAT fields. (#1259)
- ID and FILTER can be transferred to INFO and ID can be populated from
INFO. However, the FILTER column still cannot be populated from an INFO
tag because all possible FILTER values must be known at the time of
writing the header (#947; #1187)
* bcftools consensus:
- Fix in handling symbolic deletions and overlapping variants.
(#1149; #1155; #1295)
- Fix `--iupac-codes` crash on REF-only positions with `ALT="."`. (#1273)
- Fix `--chain` crash. (#1245)
- Preserve the case of the genome reference. (#1150)
- Add new `-a, --absent` option which allows setting positions with no
supporting evidence to "N" (or any other character). (#848; #940)
* bcftools convert:
- The option `--vcf-ids` now works also with `--haplegendsample2vcf`. (#1217)
- New option `--keep-duplicates`
* bcftools csq:
- Add `misc/gff2gff.py` script for conversion between various flavors of
GFF files. The initial commit supports only one type and was contributed
by @flashton2003. (#530)
- Add missing consequence types. (PR #1203; #1292)
- Allow overlapping CDS to support ribosomal slippage. (#1208)
* bcftools +fill-tags:
- Added new annotations: INFO/END, TYPE, F_MISSING.
* bcftools filter:
- Make `--SnpGap` optionally filter also SNPs close to other variant types.
(#1126)
* bcftools gtcheck:
- Complete revamp of the command. The new version is faster and allows
N:M sample comparisons, not just 1:N or NxN comparisons.
Some functionality was lost (plotting and clustering) but may be added
back on popular demand.
* bcftools +mendelian:
- Revamp of user options, output VCFs with mendelian errors annotation,
read PED files (thanks to Giulio Genovese).
* bcftools merge:
- Update headers when appropriate with the '--info-rules *:join' INFO rule.
(#1282)
- Local alleles merging that produce LAA and LPL when requested, a draft
implementation of https://github.com/samtools/hts-specs/pull/434 (#1138)
- New `--no-index` which allows unindexed files to be merged. Requires the input
files to have chromosomes in the same order and consistent with the order
of sequences in the header. (PR #1253; samtools/htslib#1089)
- Fixes in gVCF merging. (#1127; #1164)
* bcftools norm:
- Fixes in `--check-ref s` reference setting features with non-ACGT bases.
(#473; #1300)
- New `--keep-sum` switch to keep vector sum constant when splitting
multiallelics. (#360)
* bcftools +prune:
- Extend to allow annotating with various LD metrics: r^2,
Lewontin's D' (PMID:19433632), or Ragsdale's D (PMID:31697386).
* bcftools query:
- New `%N_PASS()` formatting expression to output the number of samples
that pass the filtering expression.
* bcftools reheader:
- Improved error reporting to prevent user mistakes. (#1288)
* bcftools roh:
- Several fixes and improvements
- the `--AF-file` description incorrectly suggested "REF\tALT" instead
of the correct "REF,ALT". (#1142)
- RG lines could have negative length. (#1144)
- new `--include-noalt` option to allow also ALT=. records. (#1137)
* bcftools scatter:
- New plugin intended as a convenient inverse to `concat`
(thanks to Giulio Genovese, PR #1249)
* bcftools +split:
- New `--groups-file` option for more flexibility of defining desired
output. (#1240)
- New `--hts-opts` option to reduce required memory by reusing one
output header and allow overriding the default hFile's block size
with `--hts-opts block_size=XXX`. On some file systems (lustre) the
default size can be 4M which becomes a problem when splitting files
with 10+ samples.
- Add support for multisample output and sample renaming
* bcftools +split-vep:
- Add default types (Integer, Float, String) for VEP subfields and make
`--columns -` extract all subfields into INFO tags in one go.
## Release 1.10.2 (19th December 2019)
This is a release fix that corrects minor inconsistencies discovered in
previous deliverables.
## Release 1.10 (6th December 2019)
* Numerous bug fixes, usability improvements and sanity checks were added
to prevent common user errors.
* The -r, --regions (and -R, --regions-file) option should never create
unsorted VCFs or duplicate records again. This also fixes rare cases where
a spanning deletion makes a subsequent record invisible to `bcftools isec`
and other commands.
* Additions to filtering and formatting expressions
- support for the spanning deletion alternate allele (ALT=*)
- new ILEN filtering expression to be able to filter by indel length
- new MEAN, MEDIAN, MODE, STDEV, phred filtering functions
- new formatting expression %PBINOM (phred-scaled binomial probability),
%INFO (the whole INFO column), %FORMAT (the whole FORMAT column),
%END (end position of the REF allele), %END0 (0-based end position
of the REF allele), %MASK (with multiple files indicates the presence
of the site in other files)
* New plugins
- `+gvcfz`: compress gVCF file by resizing gVCF blocks according to
specified criteria
- `+indel-stats`: collect various indel-specific statistics
- `+parental-origin`: determine parental origin of a CNV region
- `+remove-overlaps`: remove overlapping variants.
- `+split-vep`: query structured annotations such as INFO/CSQ created by
bcftools/csq or VEP
- `+trio-dnm`: screen variants for possible de-novo mutations in trios
* `annotate`
- new -l, --merge-logic option for combining multiple overlapping regions
* `call`
- new `bcftools call -G, --group-samples` option which allows grouping
samples into populations and applying the HWE assumption within but
not across the groups.
* `csq`
- significant reduction of memory usage in the local -l mode for VCFs
with thousands of samples and 20% reduction in the non-local
haplotype-aware mode.
- fixes a small memory leak and formatting issue in FORMAT/BCSQ at
sites with many consequences
- do not print protein sequence of start_lost events
- support for "start_retained" consequence
- support for symbolic insertions (ALT="<INS...>"), "feature_elongation"
consequence
- new -b, --brief-predictions option to output abbreviated protein
predictions.
* `concat`
- the `--naive` command now checks header compatibility when concatenating
multiple files.
* `consensus`
- add a new `-H, --haplotype 1pIu/2pIu` feature to output first/second
allele for phased genotypes and the IUPAC code for unphased genotypes
- new -p, --prefix option to add a prefix to sequence names on output
* `+contrast`
- added support for Fisher's test probability and other annotations
* `+fill-from-fasta`
- new -N, --replace-non-ACGTN option
* `+dosage`
- fix some serious bugs in dosage calculation
* `+fill-tags`
- extended to perform simple on-the-fly calculations such as calculating
INFO/DP from FORMAT/DP.
* `merge`
- add support for merging FORMAT strings
- bug fixed in gVCF merging
* `mpileup`
- a new optional SCR annotation for the number of soft-clipped reads
* `reheader`
- new -f, --fai option for updating contig lines in the VCF header
* `+trio-stats`
- extend output to include DNM homs and recurrent DNMs
* VariantKey support
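
As an illustration of the ILEN filtering expression listed above, the sketch below computes an indel length as the difference between the ALT and REF allele lengths, positive for insertions and negative for deletions. The helper name `ilen` and the choice to return `None` for symbolic, spanning-deletion, and missing alleles are assumptions of this example, not a description of the bcftools internals.

```python
def ilen(ref, alt):
    """Indel length: positive for insertions, negative for deletions.

    Returns None for alleles without a literal sequence (assumption of
    this sketch): symbolic alleles such as <INS>, the spanning deletion
    "*", and the missing allele ".".
    """
    if alt.startswith("<") or alt in ("*", "."):
        return None
    return len(alt) - len(ref)


if __name__ == "__main__":
    for ref, alt in [("A", "ATG"), ("ATG", "A"), ("A", "*")]:
        print(ref, alt, ilen(ref, alt))
```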
## Release 1.9 (18th July 2018)
* `annotate`
- REF and ALT columns can now be transferred from the annotation file.
- fixed bug when setting vector_end values.
* `consensus`
- new -M option to control output at missing genotypes
- variants immediately following insertions should not be skipped. Note
however, that the current fix requires normalized VCF and may still
falsely skip variants adjacent to multiallelic indels.
- bug fixed in -H selection handling
* `convert`
- the --tsv2vcf option now makes the missing genotypes diploid, "./."
instead of "."
- the behavior of -i/-e with --gvcf2vcf changed. Previously only sites with
FILTER set to "PASS" or "." were expanded and the -i/-e options dropped
sites completely. The new behavior is to let the -i/-e options control
which records will be expanded. In order to drop records completely,
one can stream through "bcftools view" first.
* `csq`
- since the real consequence of start/splice events is not known,
the amino acid positions at subsequent variants should stay unchanged
- add `--force` option to skip malformatted transcripts in GFFs with
out-of-phase CDS exons.
* `+dosage`: output all alleles and all their dosages at multiallelic sites
* `+fixref`: fix serious bug in -m top conversion
* `-i/-e` filtering expressions:
- add two-tailed binomial test
- add functions N_PASS() and F_PASS()
- add support for lists of samples in filtering expressions, because with many
samples it was impractical to list them all on the command line. Samples
can now be read from a file, e.g., GT[@samples.txt]="het"
- allow multiple perl functions in the expressions and some bug fixes
- fix a parsing problem, '@' was not removed from '@filename' expressions
* `mpileup`: fixed bug where, if samples were renamed using the `-G`
(`--read-groups`) option, some samples could be omitted from the output file.
* `norm`: update INFO/END when normalizing indels
* `+split`: new -S option to subset samples and to use custom file names
instead of the defaults
* `+smpl-stats`: new plugin
* `+trio-stats`: new plugin
* Fixed build problems with non-functional configure script produced on
some platforms
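
The two-tailed binomial test added to the `-i/-e` filtering expressions above can be pictured with the following sketch. It uses one common definition of a two-sided exact test, summing the probabilities of all outcomes that are no more likely than the observed one. Whether bcftools uses exactly this definition is an assumption, and `binom_two_tailed` is a hypothetical helper name.

```python
from math import comb


def binom_two_tailed(k, n, p=0.5):
    """Two-sided exact binomial p-value for k successes in n trials.

    Sums the probability of every outcome that is at most as likely as
    the observed one (a common convention; assumed here, not taken from
    the bcftools source).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that exact ties survive rounding error
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # e.g. 1 ALT read out of 10 at a site expected to be heterozygous
    print(binom_two_tailed(1, 10))
```

With 10 reads and an expected allele balance of 0.5, observing 0 or 10 ALT reads is the most extreme outcome, while 5 ALT reads gives a p-value of 1.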
## Release 1.8 (April 2018)
* `-i, -e` filtering: Support for custom perl scripts
* `+contrast`: New plugin to annotate genotype differences between groups
of samples
* `+fixploidy`: New options for simpler ploidy usage
* `+setGT`: Target genotypes can be set to phased by giving `--new-gt p`
* `run-roh.pl`: Allow passing options directly to `bcftools roh`
* A number of bug fixes
## Release 1.7 (February 2018)
* `-i, -e` filtering: Major revamp, improved filtering by FORMAT fields
and missing values. New GT=ref,alt,mis etc keywords, check the documentation
for details.
* `query`: Only matching samples are printed when both the -f and -i/-e
expressions contain genotype fields. Note that this changes the original
behavior: previously all samples were output when one matching sample was
found. The previous behavior can still be achieved by pre-filtering with view
and then streaming to query. Compare
bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i'GT="alt"' file.bcf
and
bcftools view -i'GT="alt"' file.bcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'
* `annotate`: New -k, --keep-sites option
* `consensus`: Fix --iupac-codes output
* `csq`: Homs always considered phased and other fixes
* `norm`: Make `-c none` work and remove `query -c`
* `roh`: Fix errors in the RG output
* `stats`: Allow IUPAC ambiguity codes in the reference file; report the number of missing genotypes
* `+fill-tags`: Add ExcHet annotation
* `+setGT`: Fix bug in binom.test calculation, previously it worked only for nAlt<nRef!
* `+split`: New plugin to split a multi-sample file into single-sample files in one go
* Improve python3 compatibility in plotting scripts
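
The change in `query` behavior described above can be modelled with a small sketch. The record layout, the helper names, and the reading of `GT="alt"` as "genotype carries at least one non-reference allele" are assumptions of this example. The first function mimics per-sample filtering (only matching samples are printed), the second mimics site-level pre-filtering followed by printing every sample.

```python
# Toy records: (chrom, pos, {sample: genotype}); hypothetical layout
RECORDS = [
    ("1", 100, {"A": "0/0", "B": "0/1"}),
    ("1", 200, {"A": "0/0", "B": "0/0"}),
]


def is_alt(gt):
    """True if the genotype has at least one non-reference, non-missing allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def query_matching_samples(records):
    """Print only the samples that match the expression."""
    return [
        f"{chrom}:{pos} {sample} {gt}"
        for chrom, pos, samples in records
        for sample, gt in samples.items()
        if is_alt(gt)
    ]


def query_after_site_filter(records):
    """Keep sites with any matching sample, then print all their samples."""
    return [
        f"{chrom}:{pos} {sample} {gt}"
        for chrom, pos, samples in records
        if any(is_alt(gt) for gt in samples.values())
        for sample, gt in samples.items()
    ]


if __name__ == "__main__":
    print(query_matching_samples(RECORDS))
    print(query_after_site_filter(RECORDS))
```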
## Release 1.6 (September 2017)
* New `sort` command.
* New options added to the `consensus` command. Note that the `-i, --iupac`
option has been renamed to `-I, --iupac`, in favor of the standard
`-i, --include`.
* Filtering expressions (`-i/-e`): support for `GT=<type>` expressions and
for lists and ranges (#639) - see the man page for details.
* `csq`: relax some GFF3 parsing restrictions to enable using Ensembl
GFF3 files for plants (#667)
* `stats`: add further documentation to output stats files (#316) and
include haploid counts in per-sample output (#671).
* `plot-vcfstats`: further fixes for Python3 (@nsoranzo, #645, #666).
* `query` bugfix (#632)
* `+setGT` plugin: new option to set genotypes based on a two-tailed binomial
distribution test. Also, allow combining `-i/-e` with `-t q`.
* `mpileup`: fix typo (#636)
* `convert --gvcf2vcf` bugfix (#641)
* `+mendelian`: recognize some mendelian inconsistencies that were
being missed (@oronnavon, #660), also add support for multiallelic
sites and sex chromosomes.
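
To make the notion of a Mendelian inconsistency in the `+mendelian` entry above concrete, here is a minimal sketch for diploid autosomal trios, including multiallelic sites. It is not the plugin's algorithm: the function name and the tuple representation of genotypes are assumptions, and sex chromosomes and missing genotypes are not handled.

```python
def is_mendelian_consistent(child, father, mother):
    """Check a diploid trio genotype for Mendelian consistency.

    Genotypes are tuples of allele indices, e.g. (0, 1) for 0/1.
    The child is consistent if one allele can come from the father
    and the other from the mother.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


if __name__ == "__main__":
    # Child 1/2 with parents 0/1 and 0/2 is consistent
    print(is_mendelian_consistent((1, 2), (0, 1), (0, 2)))
    # Child 1/1 with a 0/0 father is not
    print(is_mendelian_consistent((1, 1), (0, 0), (0, 1)))
```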
## Release 1.5 (June 2017)
* Added autoconf support to bcftools. See `INSTALL` for more details.
* `norm`: Make norm case insensitive (#601). Trim the reference allele (#602).
* `mpileup`: fix for misreported indel depths for reads containing adjacent
indels (3c1205c1).
* `plot-vcfstats`: Open stats file in text mode, not binary (#618).
* `fixref` plugin: Allow multiallelic sites in the `-i, --use-id reference`.
Also flip genotypes, not just REF/ALT!
* `merge`: fix gVCF merge bug when last record on a chromosome opened a
gVCF block (#616)
* New options added to the ROH plotting script.
* `consensus`: Properly flush chain info (#606, thanks to @krooijers).
* New `+prune` plugin for pruning sites by LD (R2) or maximum number of
records within a window.
* New N_MISSING, F_MISSING (number and fraction missing) filtering
expressions.
* Fix HMM initialization in `roh` when snapshots are used in multiple
chromosome VCF.
* Fix buffer overflow (#607) in `filter`.
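
The N_MISSING and F_MISSING expressions listed above report the number and fraction of missing genotypes at a site. The sketch below shows the arithmetic under the assumption that a genotype counts as missing only when every allele is `.`; how bcftools treats half-missing genotypes such as `./1` is not addressed here, and the helper names are hypothetical.

```python
def n_missing(genotypes):
    """Count genotypes in which every allele is missing ('.')."""
    count = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            count += 1
    return count


def f_missing(genotypes):
    """Fraction of missing genotypes; 0.0 for a site without samples."""
    if not genotypes:
        return 0.0
    return n_missing(genotypes) / len(genotypes)


if __name__ == "__main__":
    site = ["0/1", "./.", ".", "1|1"]
    print(n_missing(site), f_missing(site))
```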
## Release 1.4.1 (8 May 2017)
* `roh`: Fixed malfunctioning options `-m, --genetic-map` and `-M, --rec-rate`,
and newly allowed their combination. Added a convenience wrapper `misc/run-roh.pl`
and an interactive script for visualizing the calls `misc/plot-roh.py`.
* `csq`: More control over warning messages (#585).
* Portability improvements (#587). Still work to be done on this front.
* Add support for breakends to `view`, `norm`, `query` and filtering (#592).
* `plot-vcfstats`: Fix for python 2/3 compatibility (#593).
* New `-l, --list` option for `+af-dist` plugin.
* New `-i, --use-id` option for `+fixref` plugin.
* Add `--include/--exclude` options to `+guess-ploidy` plugin.
* New `+check-sparsity` plugin.
* Miscellaneous bugfixes for #575, #584, #588, #599, #535.
## Release 1.4 (13 March 2017)
Two new commands - `mpileup` and `csq`:
* The `mpileup` command has been imported from samtools to bcftools. The
reasoning behind this is that bcftools calling is intimately tied to mpileup
and any change to one often requires changes to the other. Only the
genotype likelihood (BCF output) part of mpileup has moved to bcftools,
while the textual pileup output remains in samtools. The BCF output option
in `samtools mpileup` will likely be removed in a release or two or when
changes to `bcftools call` are incompatible with the old mpileup output.
The basic mpileup functionality remains unchanged as do most of the command
line options, but there are some differences and new features that one
should be aware of:
- The option `samtools mpileup -t, --output-tags` changed to `bcftools
mpileup -a, --annotate` to avoid conflict with the `-t, --targets`
option common across other bcftools commands.
- `-O, --output-BP` and `-s, --output-MQ` are no longer used as they are
only for textual pileup output, which is not included in `bcftools
mpileup`. `-O` short option reassigned to `--output-type` and `-s`
reassigned to `--samples` for consistency with other bcftools commands.
- `-g, --BCF`, `-v, --VCF`, and `-u, --uncompressed` options from
`samtools mpileup` are no longer used, being replaced by the
`-O, --output-type` option common to other bcftools commands.
- The `-f, --fasta-ref` option is now required by default to help avoid user
errors. Can be disabled using `--no-reference`.
- The option `-d, --depth .. max per-file depth` now behaves as expected
and according to the documentation, and prints meaningful diagnostics.
- The `-S, --samples-file` can be used to rename samples on the fly. See man
page for details.
- The `-G, --read-groups` functionality has been extended to allow
reassignment, grouping and exclusion of readgroups. See man page for
details.
- The `-l, --positions` replaced by the `-t, --targets` and
`-T, --targets-file` options to be consistent with other bcftools
commands.
- gVCF output is supported. Per-sample gVCFs created by mpileup can be
merged using `bcftools merge --gvcf`.
- Can generate mpileup output on multiple (indexed) regions using the
`-r, --regions` and `-R, --regions-file` options. In samtools, one
was restricted to a single region with the `-r, --region` option.
- Several speedups thanks to @jkbonfield (cf3a55a).
* `csq`: New command for haplotype-aware variant consequence calling.
See man page and [paper](https://www.ncbi.nlm.nih.gov/pubmed/28205675).
Updates, improvements and bugfixes for many other commands:
* `annotate`: `--collapse` option added. `--mark-sites` now works with
VCF files rather than just tab-delimited files. Now possible to annotate
a subset of samples from tab file, not just VCF file (#469). Bugfixes (#428).
* `call`: New option `-F, --prior-freqs` to take advantage of prior knowledge
of population allele frequencies. Improved calculation of the QUAL score
particularly for REF sites (#449, 7c56870). `PLs>=256` allowed in
`call -m`. Bugfixes (#436).
* `concat --naive` now works with vcf.gz in addition to bcf files.
* `consensus`: handle variants overlapping region boundaries (#400).
* `convert`: gvcf2vcf support for mpileup and GATK. New `--sex` option to
assign sex to be used in certain output types (#500). Large speedup of
`--hapsample` and `--haplegendsample` (e8e369b) especially with `--threads`
option enabled. Bugfixes (#460).
* `cnv`: improvements to output (be8b378).
* `filter`: bugfixes (#406).
* `gtcheck`: improved cross-check mode (#441).
* `index` can now specify the path to the output index file. Also, gains the
`--threads` option.
* `merge`: Large overhaul of `merge` command including support for merging
gVCF files created by `bcftools mpileup --gvcf` with the new `-g, --gvcf`
option. New options `-F` to control filter logic and `-0` to set missing
data to REF. Resolved a number of longstanding issues (#296, #361, #401,
#408, #412).
* `norm`: Bugfixes (#385,#452,#439), more informative error messages (#364).
* `query`: `%END` plus `%POS0`, `%END0` (0-indexed) support - allows easy BED
format output (#479). `%TBCSQ` for use with the new `csq` command. Bugfixes
(#488,#489).
* `plugin`: A number of new plugins:
- `GTsubset` (thanks to @dlaehnemann)
- `ad-bias`
- `af-dist`
- `fill-from-fasta`
- `fixref`
- `guess-ploidy` (deprecates `vcf2sex` plugin)
- `isecGT`
- `trio-switch-rate`
and changes to existing plugins:
- `tag2tag`: Added `gp-to-gt`, `pl-to-gl` and `--threshold` options and
bugfixes (#475).
- `ad-bias`: New `-d` option for minimum depth.
- `impute-info`: Bugfix (49a9eaf).
- `fill-tags`: Added ability to aggregate tags for sample subgroups, thanks
to @mh11. (#503). HWE tag added as an option.
- `mendelian`: Bugfix (#566).
* `reheader`: allow multispace delimiters in `--samples` option.
* `roh`: Now possible to process multiple samples at once. This allows
considerable speedups for files with thousands of samples where the cost of
HMM is negligible compared to I/O and decompressing. In order to fit tens of
thousands samples in memory, a sliding HMM can be used (new `--buffer-size`
option). Viterbi training now uses the Baum-Welch algorithm and works much
better. Support for gVCFs or FORMAT/PL tags. Added `-o, --output` and
`-O, --output-type` options to control output of sites or regions
(compression optional). Many bugs fixed: missing PL values no longer cause a
segfault, and a typo in the genetic map calculation that resulted in a
slowdown and incorrect results was corrected.
* `stats`: Bugfixes (16414e6), new options `--af-bins` and `--af-tag` to control
allele frequency binning of output. Per-sample genotype concordance tables
added (#477).
* `view -a, --trim-alt-alleles`: various bugfixes for missing data; more
informative errors are now given on failure to help pinpoint problems.
General changes:
* Timestamps are now added to header lines summarising the command (#467).
* Use of the `--threads` options should be faster across the board thanks to
changes in HTSlib, meaning threads are now shared by the compression
and decompression calls.
* Changes to genotype filtering with `-i, --include` and `-e, --exclude` (#454).
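
The `query` entry above notes that `%POS0` together with `%END` allows easy BED output. The reason is a coordinate convention: VCF positions are 1-based and inclusive, while BED intervals are 0-based and half-open, so the BED start is POS-1 and the BED end equals the 1-based inclusive end. The sketch below illustrates this, assuming the end position is derived from the REF allele length (no INFO/END tag); `vcf_to_bed` is a hypothetical helper name.

```python
def vcf_to_bed(chrom, pos, ref):
    """Convert a VCF record span to a BED interval.

    pos is the 1-based VCF position. The end of the REF allele in
    1-based inclusive coordinates is pos + len(ref) - 1, which is also
    the exclusive end in 0-based half-open BED coordinates.
    """
    start0 = pos - 1              # 0-based start
    end = pos + len(ref) - 1      # 1-based inclusive end == BED end
    return chrom, start0, end


if __name__ == "__main__":
    # A 3 bp REF allele at position 100 spans bases 100..102
    print("\t".join(map(str, vcf_to_bed("chr1", 100, "ACG"))))
```

The interval length `end - start0` equals the REF allele length, which is a quick sanity check for any such conversion.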
## Noteworthy changes in release 1.3.1 (22 April 2016)
* The `concat` command has a new `--naive` option for faster operations on
large BCFs (PR #359).
* `GTisec`: new plugin courtesy of David Laehnemann (@dlaehnemann) to count
genotype intersections across all possible sample subsets in a VCF file.
* Numerous VCF parsing fixes.
* Build fix: _peakfit.c_ now builds correctly with GSL v2 (#378).
* Various bug fixes and improvements to the `annotate` (#365), `call` (#366),
`index` (#367), `norm` (#368, #385), `reheader` (#356), and `roh` (#328)
commands, and to the `fill-tags` (#345) and `tag2tag` (#394) plugins.
* Clarified documentation of `view` filter options, and of the
`--regions-file` and `--targets-file` options (#357, #411).
## Noteworthy changes in release 1.3 (15 December 2015)
* `bcftools call` has new options `--ploidy` and `--ploidy-file` to make
handling sample ploidy easier. See man page for details.
* `stats`: `-i`/`-e` short options changed to `-I`/`-E` to be consistent with
the filtering `-i`/`-e` (`--include`/`--exclude`) options used in other
tools.
* general `--threads` option to control the number of output compression
threads used when outputting compressed VCF or BCF.
* `cnv` and `polysomy`: new commands for detecting CNVs, aneuploidy, and
contamination from SNP genotyping data.
* various new options, plugins, and bug fixes, including #84, #201, #204,
#205, #208, #211, #222, #225, #242, #243, #249, #282, #285, #289, #302,
#311, #318, #336, and #338.
## Noteworthy changes in release 1.2 (2 February 2015)
* new `bcftools consensus` command
* new `bcftools annotate` plugins: fixploidy, vcf2sex, tag2tag
* more features in `bcftools convert` command, amongst others new
`--hapsample` function (thanks to Warren Kretzschmar @wkretzsch)
* support for complements in `bcftools annotate --remove`
* support for `-i`/`-e` filtering expressions in `bcftools isec`
* improved error reporting
* `bcftools call`
- the default prior increased from `-P 1e-3` to `-P 1.1e-3`, some clear
calls were missed with default settings previously
- support for the new symbolic allele `<*>`
- support for `-f GQ`
- bug fixes, such as: proper trimming of DPR tag with `-c`; the `-A` switch
does not add back records removed by `-v` and the behaviour has been made
consistent with `-c` and `-m`
* many bug fixes and improvements, such as
- bug in filtering, FMT & INFO vs INFO & FMT
- fixes in `bcftools merge`
- filter update AN/AC with `-S`
- isec outputs matching records for both VCFs in the Venn mode
- annotate considers alleles when working with `Number=A,R` tags
- new `--set-id` feature for annotate
- `convert` can be used similarly to `view`