Version 1.14.11 (16th October 2018)
---------------
Updates:
* CRAM: http(s) queries now honour redirects.
The User-Agent header is also set, which is necessary in some
proxies.
Bug fixes:
* CRAM: fix to major range query bug introduced in 1.14.10.
* CRAM: more bug fixing on range queries when multi-threading (EOF
detection).
* The test harness now works correctly in Bourne shell, without
using bashisms.
Version 1.14.10 (26th September 2018)
---------------
Updates:
* BAM: Libdeflate support (https://github.com/ebiggers/libdeflate).
This library is significantly faster than zlib, so it is a good
alternative to the Cloudflare and/or Intel libraries.
Configure using --with-libdeflate=/dir/to/deflate/install
* CRAM *EXPERIMENTAL*: Added custom quality and identifier codecs.
Also added the ability to use libbsc as a general purpose codec.
These are NOT OFFICIAL and so not enabled by default (version 3.0).
However as a technology demonstration only, they are available with
scramble -V3.1 or -V4.0 for evaluation and to promote discussion on
future CRAM formats. Do not use these on production data.
Implementations of the codecs and CRAM version 4.0 layout are liable
to change without prior warning.
* CRAM: name sorted files now automatically switch to non-ref mode.
Bug fixes:
* CRAM: Considerable fixes to multi-threading.
- Using more than 1 slice per container with threading now works.
- Removal of race conditions when using CRAM_OPT_REQUIRED_FIELDS.
- Combinations of ref and no-ref mode in adjacent containers.
- Other misc. threading bugs.
* Corrected end-of-range check in some scenarios.
* CRAM: bug fix to index creation when a slice contains exactly one
alignment.
* SAM: fixed parsing of illegal sequence characters (eg "Z").
These are now treated as "N" and not "=".
* BAM/SAM: protect against out of bound CIGAR operations.
* CRAM: hardening of rANS codec against malicious input.
Also fixed a very rare frequency renormalisation case.
* CRAM: fix for range queries used in conjunction with turning off
sequence retrieval (via CRAM_OPT_REQUIRED_FIELDS).
* Improved test harness for Windows and some header file problems.
* Fixed bgzip on big endian systems. (Debian bugs 876839, 876840)
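The SAM fix for illegal sequence characters can be pictured with BAM's 4-bit base codes, whose alphabet is "=ACMGRSVTWYHKDBN". The following is only a sketch under that assumption, not io_lib's code; the function names are invented for illustration. An unrecognised character such as "Z" falls back to code 15 ("N") rather than code 0 ("="):

```python
# BAM stores each base as a 4-bit code indexing into this alphabet.
NIBBLE_CODES = "=ACMGRSVTWYHKDBN"
ENCODE = {ch: code for code, ch in enumerate(NIBBLE_CODES)}
N_CODE = ENCODE["N"]  # 15


def encode_base(ch):
    """Map one base to its 4-bit code; unknown characters become N (15)."""
    return ENCODE.get(ch.upper(), N_CODE)


def pack_sequence(seq):
    """Pack two bases per byte, first base in the high nibble."""
    codes = [encode_base(ch) for ch in seq]
    if len(codes) % 2:
        codes.append(0)  # pad the unused low nibble of the last byte
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def unpack_sequence(packed, length):
    """Inverse of pack_sequence for the first `length` bases."""
    out = []
    for i in range(length):
        byte = packed[i // 2]
        code = byte >> 4 if i % 2 == 0 else byte & 0x0F
        out.append(NIBBLE_CODES[code])
    return "".join(out)
```

With this fallback, a round trip of "ACZT" comes back as "ACNT"; a table defaulting to 0 would instead yield "AC=T".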
Version 1.14.9 (9th February 2017)
--------------
Updates:
* BAM: Added CRC checking. Bizarrely this was absent here and in most
other BAM implementations too. Pure BAM decode of an uncompressed
BAM is around 9% slower and compressed BAM to compressed BAM is
almost identical. The most significant hit is reading uncompressed
BAM (and doing nothing else) which is 120% slower as CRC dominates.
Options are available to disable the CRC checking in case this is an
issue (scramble -!).
* CRAM: Now supports bgzipped fasta references.
* CRAM/SAM: Headers are now kept in the same basic type order while
transcoding. (Eg all @PG before all @SQ, or vice versa, depending on
input ordering.)
* CRAM: Compression level 1 is now faster but larger. (The old -1 and
-2 were too similar.)
* CRAM: Improved compression efficiency in some files, when switching
from sorted to unsorted data.
* CRAM: Speedups and improvements to memory handling under GNU
malloc. See the scram_init() function.
* CRAM: Sped up the rANS codecs on x86_64 platforms (assembly code).
* CRAM: Improved multi-threading performance during decode.
* CRAM: Block CRC checks are now only done when the block is used,
speeding up multi-threading and tools that do not decode all blocks
(eg flagstat).
* Scramble -g and -G options to generate and reuse bgzip indices when
reading and writing BAM files.
* Scramble -q option to omit updating the @PG header records.
* Experimental cram_filter tool has been added, to rapidly produce
cram subsets.
* Migrated code base to git. Use github for primary repository.
Dropped ChangeLog file (recommend git clone and "git log
--abbrev-commit --pretty=medium --stat" for an svn similar log
style).
* BAM: minor improvements to gcc SIMD auto-vectorisation.
* Minor improvements to dstring memory usage (potentially reducing
memory usage when loading very large BAM headers).
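For readers unfamiliar with the BAM CRC check added in this release: each BGZF block is a gzip member that ends with a CRC32 of the uncompressed data followed by its length (ISIZE). The sketch below is an illustration in Python rather than io_lib's implementation; make_member() and crc_ok() are hypothetical helper names, and the member built here omits the extra header field that real BGZF blocks carry.

```python
import struct
import zlib


def make_member(payload):
    """Build a minimal gzip member: 10-byte header, raw deflate, trailer."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw deflate
    body = comp.compress(payload) + comp.flush()
    header = b"\x1f\x8b\x08\x00" + b"\x00" * 4 + b"\x00\xff"
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def crc_ok(member):
    """Return True if the stored CRC32 and ISIZE match the inflated data."""
    pos = 10
    if member[3] & 0x04:  # FEXTRA flag set: skip the extra field
        (xlen,) = struct.unpack_from("<H", member, pos)
        pos += 2 + xlen
    stored_crc, isize = struct.unpack("<II", member[-8:])
    data = zlib.decompress(member[pos:-8], -15)
    return (zlib.crc32(data) & 0xFFFFFFFF) == stored_crc and len(data) == isize
```

Skipping the crc32() call is what a "disable CRC" option saves, which is why the cost is most visible when reading uncompressed BAM.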
Bug fixes:
* BAM: Fixed the bin value calculation for placed but unmapped reads.
* CRAM: Fixed file descriptor leak in refs_load_fai().
* CRAM: Fixed a crash in MD5 calculation for sequences beyond the
reference end.
* CRAM: Bug fixes when encoding malformed @SQ records.
* CRAM: Fixed a rare renormalisation bug in rANS codec.
* Fixed tests so make -j worked.
* Removed ancient, broken and unused popen() code.
Version 1.14.8 (22nd April 2016)
--------------
* SAM: Small speed up to record parsing.
* CRAM: Scramble now has -p and -P options to control whether to
force the BAM auxiliary sizes (8 vs 16 vs 32-bit integer quantities)
rather than reducing to smallest size required, and whether to
preserve the order of auxiliary tags including RG, NM and MD.
This latter option requires storing these values verbatim instead of
regenerating them on-the-fly, but note this only preserves tag order
with Scramble / Htslib. Htsjdk will still produce these fields out
of order.
* CRAM no longer stores data in the CORE block, permitting greater
flexibility in choosing which fields to decode. (This change is
also mirrored in htslib and htsjdk.)
* CRAM: ref.fai files in a different order to @SQ headers should now
work correctly.
* CRAM required-fields parameter no longer forces quality decoding
when asking for sequence.
* CRAM: More robustness / safety checks during decoding; itf8 bounds
checks, running out of memory, bounds checks in BETA codec, and
more.
* CRAM auto-generated read names are consistent regardless of range
queries. They also now match those produced by htslib.
* CRAM: the rANS codec should now be slightly faster at decoding.
* CRAM: there is a newer (faster than vanilla Zlib) crc32
implementation. If you are linking against CloudFlare's optimised
Zlib you should configure with --disable-own-crc to utilise their
assembly PCLMUL CRC implementation.
* CRAM bug fix: removed potential (but unobserved) possibility of
8-bit quantities stored as a 16-bit value in BAM being converted
incorrectly within CRAM.
* CRAM bug fix: fixed field widths for cram_dump and cram_size.
* SAM bug fix: no more complaining about "unknown" sort order.
* A few compiler warnings in cram_dump / cram_size have gone away.
Many small CRAM code tweaks to aid comparisons to htslib. It should
also be easier to build under Microsoft Visual Studio (although no
project file is provided).
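As background to the itf8 bounds checks mentioned in this release, the following sketch decodes one ITF8 value (the variable-length 32-bit integer encoding used by CRAM) and refuses to read past the end of the buffer. It is an illustrative Python version written for these notes, not the library's C code; itf8_decode() is a made-up name.

```python
def itf8_decode(buf, pos=0):
    """Decode one ITF8 integer from buf at pos.

    Returns (value, bytes_consumed) with value as a signed 32-bit integer.
    Raises ValueError if the buffer ends before the value is complete.
    """
    if pos >= len(buf):
        raise ValueError("ITF8: no input left")
    b0 = buf[pos]
    # The number of leading 1 bits in the first byte gives the extra byte count.
    if b0 < 0x80:
        n, val = 1, b0
    elif b0 < 0xC0:
        n, val = 2, b0 & 0x3F
    elif b0 < 0xE0:
        n, val = 3, b0 & 0x1F
    elif b0 < 0xF0:
        n, val = 4, b0 & 0x0F
    else:
        n, val = 5, b0 & 0x0F
    if pos + n > len(buf):  # the bounds check
        raise ValueError("ITF8: truncated value")
    if n == 5:
        b1, b2, b3, b4 = buf[pos + 1:pos + 5]
        # Only the low 4 bits of the final byte are used.
        val = (val << 28) | (b1 << 20) | (b2 << 12) | (b3 << 4) | (b4 & 0x0F)
    else:
        for b in buf[pos + 1:pos + n]:
            val = (val << 8) | b
    val &= 0xFFFFFFFF
    if val >= 0x80000000:
        val -= 1 << 32  # reinterpret as signed 32-bit
    return val, n
```

A truncated input such as the two bytes 0xC0 0x01 (which announces three bytes) raises ValueError instead of reading beyond the buffer.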
Version 1.14.7 (18th February 2016)
--------------
* Some speed ups to BAM encoding, particularly when using uncompressed
BAM.
* Scramble now has a lossy read-name method (scramble -n) when
outputting to CRAM.
* Tidied up the formatting of cram_size.
* Cram_dump now prints the TD map.
* CRAM bug fix: Scramble -N was sometimes failing when multi-threaded.
* CRAM bug fix: The code once again builds if CRAM_IO_CUSTOM_BUFFERING
is disabled.
* CRAM bug fix: avoid undefined behaviour in some uses of
CRAM_OPT_REQUIRED_FIELD (not readily noticeable from command line
tools).
* CRAM bug fix: in very rare cases TLEN could change during decode
(albeit fixing it) for read-pairs that spanned references.
* CRAM bug fix: CIGAR sequences with more than 2^27 operations are now
supported.
* CRAM bug fix: fixed an assertion failure triggered with some
repeated templates.
Version 1.14.6 (6th November 2015)
--------------
* CRAM bug fix, reversing a bug introduced during 1.14.5. Output from
cramtools could trigger a crash during decoding with scramble. This
happened where a HUFFMAN codec was specified with zero symbols.
E.g. "DL => HUFFMAN {0, 0}" from cram_dump for a slice where there
are no D operators in the cigar strings for this slice.
Version 1.14.5 (5th November 2015)
--------------
* Scramble now has a way to control the maximum number of bases per
slice (default 5Mb), forcing a new slice if this limit is hit before
hitting the existing max sequences per slice limit. This improves
performance on very long read data. (PacBio, ONT, etc)
* Improvements for MacOS X building.
* Removed erroneous debugging output from Scramble.
* Fixed cram_dump so it works on longer sequences, eg PacBio data.
* Fixed cram_size (but not yet cram_dump summary output) to handle
multi-byte content_ids when reporting which block type is for which
data series.
* Fixed a bug with multi-slice containers, broken since r3946
(1.14.1).
* Bug fixes to the libmaus2/biobambam2 interface code. Part of this
change includes simplifying how auxiliary tag content_ids are
assigned for CRAM.
* This io_lib release should now work again when being used by the
current (albeit 2013) release of Staden Package.
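The bases-per-slice limit introduced in this release can be pictured as a second cut-off alongside the existing sequences-per-slice limit. The sketch below is a guess at the general shape of such logic, not Scramble's actual code: the function name is hypothetical, the 10000 sequence default is an assumption, and whether the read that crosses the limit stays in the current slice is also an assumption.

```python
def split_into_slices(read_lengths, max_seqs=10000, max_bases=5_000_000):
    """Group reads into slices, closing a slice when either limit is reached.

    read_lengths: iterable of read lengths in bases.
    Returns a list of slices, each a list of read lengths.
    """
    slices, current, bases = [], [], 0
    for length in read_lengths:
        current.append(length)
        bases += length
        # Close the slice on whichever limit is hit first.
        if len(current) >= max_seqs or bases >= max_bases:
            slices.append(current)
            current, bases = [], 0
    if current:
        slices.append(current)  # final, partially filled slice
    return slices
```

For short reads the sequence count triggers first; for very long reads (PacBio, ONT) the base count does, which keeps slices from growing excessively large.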
Version 1.14.4 (5th October 2015)
--------------
CRAM changes:
* Fixed a CRAM encoding bug with compression level 6 and above where
the resulting CRAM file could not be decoded. This has been in
existence since 1.13.8 (and possibly 1.13.6 under rarer conditions).
* New scramble option -H to avoid printing header in SAM output.
Version 1.14.3 (29th September 2015)
--------------
CRAM changes:
* Disabled the experimental slice checksum headers (SD and BD *slice*
tags) as their behaviour is still undefined and not in the CRAM
specification. These were left in in error.
* Fixed scram_merge to honour the -R option to specify a region.
(NB: This isn't really a properly supported tool, but a test of the
library code.)
* Fixed a bug in decoding memory that caused lzma level 9 compressed
files to be unable to be decompressed.
* Minor updates to scramble usage text.
Version 1.14.2 (16th September 2015)
--------------
CRAM changes:
* Bug fix to SAM header parsing so that it now permits nul characters.
(This is a long standing bug.)
* Bug fix to auxiliary tag compression; we failed to correctly cache
the best codec to use, resulting in slower (but valid) compression
times.
Version 1.14.1 (10th September 2015)
--------------
CRAM changes:
* Small improvements to compression ratios. CRAM auxiliary fields are
now always written to their own blocks. Also now experiment with
level 1 zlib compression in addition to the required level, as
on some data series this is the best solution.
* Removed support for writing CRAM version 1.0. This format was never
truly spec compliant anyway due to errors with the first
specification.
* rANS O1 memory allocation is now via malloc rather than the stack,
permitting building on MacOS X again.
* Fixed crash in multi-threaded decoding if not decoding positional
data (via cram_required_fields() function).
* Fixed a bug with non-reference encoded CRAMs and indexing.
Version 1.14.0 (10th July 2015)
--------------
* The default CRAM format type is now version 3.0. You can still
generate version 2.1 files using scramble -V2.1
* Lots of BAM/CRAM code hardening against I/O errors or corrupted data
files. Some have been from visual inspection while many have come
from automated "american fuzzy lop" fuzz testing. See the ChangeLog
for the full list.
* Improvements to compilation; we now compile with -Wall and this
should produce no warnings. Let us know if you get them. There is
a configure --disable-warnings option to switch off -Wall.
* CRAM: added mmap support for references. This can reduce the total
memory footprint if many instances of scramble are running and also
reduces I/O when we are using small regions of cached md5
references.
* CRAM: improved compatibility with Java Cramtools CRC32 checking
(Cram version 3.0). We've also now done full integration checking
between the two implementations to ensure best compatibility with
version 3.0.
* CRAM: better support for Biobambam/libmaus; provides an in-memory
buffered alternative to the file descriptor to allow better
multi-threading performance with libmaus.
* CRAM: we now correctly spot sequence "*" when generating a CRAM file
so we can correctly export it again (cram version 3.0). Similarly
quality "*" is better handled too when being passed on to Cramtools.
* CRAM: bug fixed NM:i tag so it no longer counts hard clips. Also
fixed NM/MD for sequence "*" and cases where one was previously
present but not the other.
* CRAM: bug fixed handling unmapped reads with sequence "*".
* CRAM: bug fix to index querying when a read starts precisely on a
boundary of a cram slice.
* CRAM: fixed the container number of blocks field to be computed
correctly for multi-slice containers. (Oddly this didn't actually
matter for Scramble or Cramtools.)
Version 1.13.10 (3rd Mar 2015)
---------------
* Reduced memory usage for coordinate sorted CRAM files with many references per
slice.
* More error protection for mismatching .fai/@SQ headers.
* Improved handling of alignments off the end of references.
Version 1.13.9 (29th Jan 2015)
--------------
* Improved CRAM stats array usage. Previously it could create
sub-optimal HUFFMAN trees in rare situations. Harmless, but larger
output than necessary.
* The "configure --enable-custom-buffering" (or --disable-) mode, on
by default, adds an additional scram_open interface to allow low
level I/O operations to be externally defined. This is used within
Biobambam to replace stdio with custom code supporting an iRODS
backend. See scram_open_cram_via_callbacks().
* CRAM should now be 100% lossless, barring a few specific broken
inputs (eg CIGAR strings on unaligned data). If it detects flags,
pnext or tlen fields that would differ if decoded using the
read-pairing algorithm built in to scramble's cram decoder then it
stores the read verbatim to avoid deduplicating these fields. It
also has better support for the Supplementary flag.
* Improved support for the Supplementary flags when auto-generating
SAM flags.
* Fixed an issue where new gcc with -O3 could crash in processing SAM
due to SIMD vectorisation and unaligned memory accesses.
* Cram_index now works via a pipe, by specifying "-" as the input filename.
Version 1.13.8 (12th Jan 2015)
--------------
* The REF_PATH and RAWDATA variable expansion can now handle URLs
without an explicit URL= component. It also understands the format
of URLs and doesn't require a double colon (::) to escape a single
colon any more.
* Removed a few compiler warnings.
* CRAM: Improved test harness to test Scramble -e and -x. Fixed an
issue related to -x not setting RI data series in some cases.
* CRAM: BS:Z and BI:Z now go to separate external blocks, for
(usually) improved compression. Also added special blocks for
IonTorrent ZM:B and FZ:B tags.
* CRAM: Reading CRAM indices is now much faster. Also fixed a bug
when doing multiple range queries that fell exactly on CRAM
container boundaries.
* CRAM: Bug fixes and efficiency improvements to the logic to work out
which data-series to decode.
* CRAM: Better handling of MD and NM tags when using non-reference
encoding. Also sped up the MD string generation.
* CRAM: Improved support for dealing with both primary and secondary
alignments.
* CRAM: Better support for name-sorted data. It worked before, but we
had too many re-loads of the reference sequence. Similarly removed
pointless reference sequence loads when encoding with scramble -x.
* CRAM: Various minor memory leaks removed.
* CRAM: Multi-threading updates.
Fixed some uninitialised memory accesses causing crashes on
SPARC/Solaris. Also fixed issues when using range-requests while
multi-threading. Less mutex locking when using name-sorted data.
* CRAM: Removed spurious warnings about lack of EOF block when reading
older format CRAM files.
* CRAM: Bug fix to the (undocumented) SAM 'd' aux type. Only used here
because samtools supports it.
* CRAM: Bug fix when attempting to decode 0 bytes from an external
block.
* CRAM: More support for version 3.0. The 'b' and 'q' CRAM feature
codes have been implemented along with better support for the old
BYTE_ARRAY_LEN encoder. Support for compressed SAM headers.
Unified RANS0/RANS1 codecs to RANS (switches order itself).
EXPERIMENTAL: enable using "scramble -V3.0".
This has now been cross-validated against Java cram_tools 3.0
format. (Note this is incompatible with the v3.0 files produced
from earlier Scramble.)
* BAM: Added support for CIGAR strings above 65535 elements long. See
http://sourceforge.net/p/samtools/mailman/message/30672431/
* BAM: Removed buffer overrun in records with no auxiliary data and a
record length of 1024 or a higher power of 2.
* BAM: Handle newline/carriage-return format files.
* BAM: Bug fixed the bin/index calculation; Now [beg,end) instead of
[beg,end].
* BAM: Fixed the SAM parser to handle integers between -2billion and
+4billion. (Incoming change to the SAM spec.)
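The BAM bin fix in this release ([beg,end) instead of [beg,end]) is easiest to see with the reg2bin calculation given in the SAM specification. The following is a Python transcription of that well-known function, offered as an illustration rather than as io_lib's source; beg is 0-based inclusive and end is 0-based exclusive.

```python
def reg2bin(beg, end):
    """Return the smallest UCSC-style bin fully containing [beg, end)."""
    end -= 1  # convert the exclusive end to the last covered base
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kb bins, offset 4681
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kb bins, offset 585
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mb bins, offset 73
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mb bins, offset 9
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mb bins, offset 1
    return 0
```

An alignment covering bases 0 to 16383 (end = 16384) fits in the first 16 kb bin, 4681. Treating the end as inclusive would push it up to the enclosing 128 kb bin, 585, which is the kind of off-by-one the closed interval produced.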
Version 1.13.7 (30th May 2014)
--------------
* CRAM: Bug fixed the required fields detection code. (It was crashing
when running scram_flagstat on Cramtools output.)
* CRAM: Bug fix to cram_dump output on files using E_BYTE_ARRAY_*
codecs.
* BAM/CRAM: Modified the thread-pool to try and minimise the number of
threads used when the program hits an I/O bottleneck. This avoids
CPU auto-frequency-scaling causing slowdowns.
Version 1.13.6 (19th May 2014)
--------------
* CRAM: Major overhaul of how data series are assigned to CORE vs
EXTERNAL blocks. The net effect is that CRAM files should become
slightly smaller and also faster for decoding when the decoder only
needs specific SAM columns. This has dramatically sped up flagstats.
* CRAM: Selection of compression algorithm for external blocks is now
more advanced. We allow specifying multiple compression methods (eg gzip,
bzip2, lzma, rANS) and the tool will learn which methods work best
for which blocks and adapt. (This matters only for v3.0 CRAM
specification, so is experimental.)
* Cram_dump should now do a better job of auto-detecting binary vs
printable text, allowing printing of arbitrary blocks in a
friendlier fashion. It also now tracks which block is for which
data-series and displays these in the summary output.
* Cram_index has been refactored, with the code moving out of the
program and into the io_lib library. It has also been bug-fixed to
cope with multiple references packed into a single cram container.
* CRAM: bug fixes
- The EOF writing code now uses the correct bit stream for value -1.
- Changing the version with scramble -V wasn't having any effect.
- The BETA codec now correctly honours the beta offset value for zero
length codes. (Previously unused)
- EOF now returns the correct value when a CRAM file is closed
before attempting to decode the first sequence. (Ie header only.)
* BAM: bug fixes
- Adding @SQ lines to a BAM file with no textual representation of
the SAM header now works.
- Fixed an issue in multi-threaded decoding, causing rare deadlocks.
* CRAM: experimental changes
- Further tweaks to the rANS codec used for the version 3 CRAM.
(Ongoing work, to be used for experimentation only.)
Version 1.13.5 (28th February 2014)
--------------
* CRAM: Fixed two bugs involving reference sequences:
- When loading a fasta file containing un-folded sequence (all on
one line) the input data wasn't uppercased. This could lead to
invalid slice MD5 sums if the fasta file contained lowercase
sequence.
- In some situations the MD5 header in the @SQ line would be
computed on a blank sequence, leading to header errors.
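To see why the missing uppercasing mattered: reference MD5 sums are defined over the normalised sequence, so the same bases in lower case hash to a different value. The sketch below shows the idea in Python; it is a simplification (it strips whitespace and uppercases, nothing more) and reference_md5() is a name invented for this example.

```python
import hashlib


def reference_md5(sequence):
    """MD5 of a reference sequence after removing whitespace and uppercasing."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()
```

Folded, unfolded, upper-case and soft-masked (lower-case) copies of the same reference all give the same digest here, whereas hashing the raw lower-case text would not match.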
Version 1.13.4 (17th February 2014)
--------------
* BAM: Fixed some buffer overruns in BAM decoding.
* CRAM: Added support for CRAM EOF blocks (new in CRAM v2.1
specification). Also improved BAM EOF block checking. v2.1 is now
the default output version.
* CRAM: Fixed an error causing multi-threading to take longer for the
first additional thread.
* CRAM: Improved memory-caching of reference sequences when fetching
via MD5 path. Also reduced memory used when loading via REF_PATH.
* CRAM: Experimental / alpha quality code for CRAM version 3.0,
including new codecs (rANS & arithmetic coders) and CRCs.
* CRAM: Small improvements to the bzip2 (-j) mode. It now periodically
tests bzip2 vs gzip and uses whichever is best rather than forcing
all compression to go via bzip2.
* CRAM: Fixed crash when handling CRAM files with non-sequential
reference IDs.
* Scramble: Now supports 8-way quality binning as an output option.
* Index_tar: Fixed debian bug #729276 - buffer overflow.
* Improved Windows building via cross-compilation.
Version 1.13.3 (25th October 2013)
--------------
* Improved robustness of CRAM support.
* Fixed important bugs in CRAM multi-threading support.
* CTF has been removed from io_lib source tree. Use 1.13.2 or older if
you still need this.
* CRAM now supports the new SAM 0x800 supplementary flag.
* Minor optimisations to CRAM compression levels (-1, -3 etc) so they
are more distinct.
* Fixed bug with curl timeouts when fetching large traces and/or
reference sequences.
Version 1.13.2 (25th June 2013)
--------------
* Added multi-threading support for sam/bam/cram I/O.
* New scram_flagstat command, mainly to act as a test harness for
reading speed.
* Bug fixes and improvements to reference sequence handling, in
particular when dealing with unsorted data.
* Sped up SAM decoding by about 70%. Also improved robustness of
header parsing.
* Improved automatic file type detection (scramble).
* The CRAM header block is now padded out with lots of nul characters
to permit inline editing, although we have no tool or API to do this
currently.
Version 1.13.1 (3rd May 2013)
--------------
* CRAM now has support for storing unsorted data and for using
non-reference based encoding (although this isn't very efficient).
* CRAM can now use the EBI MD5 server: http://www.ebi.ac.uk/ena/cram/md5/%s
The library will use a colon separated REF_PATH environ (see
TRACE_PATH for analogous examples) to find references, and if set
will write a local MD5 cache to REF_CACHE environ.
* Added a rudimentary scram_pileup command, mainly as a test of the
library.
* CRAM now supports bzip2 encoding, specified using scramble -j.
* Various speed increases.
* Improved BAM support for non-intel hardware.
* Bug fixes to CRAM mate flags in various scenarios.
* Fixes to generation of NM and MD strings.
* Can now cope with BAM files containing no text headers; only binary
@SQ records.
* Can also cope with SAM/BAM files containing no @SQ records at all
(entirely unmapped).
* More rigorous error checking in BAM/SAM/CRAM code.
* The code is more compatible with linking into samtools (a pilot
project is ongoing).
* CRAM encoding is more robust to broken CIGAR strings.
* Various bug fixes to cram indexing.
* [API] Various function renaming, to allow this and samtools
libraries to be linked into the same applications and also for
tidiness.
*_next_seq() becomes *_get_seq()
refs type is now refs_t.
cram_fd->SAM_hdr is now cram_fd->header
sam_header_*() becomes sam_hdr_*()
* [API] bam_construct_seq now has a different calling syntax, to make
it easier for memory allocation.
* [API] Added various auxiliary field construction functions to BAM.
* [API] Setting cram options is now done via a varargs syntax.
* [API] Added extern "C" around headers to permit use by C++.
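As an illustration of the varargs style of option setting mentioned above, here is a minimal sketch. All names (demo_fd, demo_set_option, the DEMO_OPT_* constants) are hypothetical and are not the library's actual API; the point is only that one entry point can accept a differently typed value per option.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical option identifiers; not the real cram option names. */
enum { DEMO_OPT_VERBOSITY, DEMO_OPT_SEQS_PER_SLICE, DEMO_OPT_PREFIX };

/* Hypothetical handle holding the option values. */
typedef struct {
    int verbosity;
    int seqs_per_slice;
    const char *prefix;
} demo_fd;

/*
 * Set one option. The type of the trailing argument depends on 'opt':
 * int for the first two options, const char * for the prefix.
 * Returns 0 on success, -1 for an unknown option.
 */
int demo_set_option(demo_fd *fd, int opt, ...)
{
    va_list args;
    int ret = 0;

    va_start(args, opt);
    switch (opt) {
    case DEMO_OPT_VERBOSITY:
        fd->verbosity = va_arg(args, int);
        break;
    case DEMO_OPT_SEQS_PER_SLICE:
        fd->seqs_per_slice = va_arg(args, int);
        break;
    case DEMO_OPT_PREFIX:
        fd->prefix = va_arg(args, const char *);
        break;
    default:
        ret = -1;   /* unknown option: no argument is consumed */
        break;
    }
    va_end(args);
    return ret;
}
```

The option parameter is declared as int rather than as an enum type so that it is unaffected by default argument promotion, which keeps va_start() well defined on any compiler.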
Version 1.13.0 (21st Mar 2013)
--------------
* Added SAM, BAM and CRAM APIs.
* Added scramble and cram_merge tools using the above APIs.
* Copied in (needs to be "move") dstring, string_alloc and zfio files
from main Staden Package tree.
* Minor code tidyups to remove warnings from gcc -Wall and from the
Intel compiler.
Version 1.12.5 (3rd Feb 2011)
--------------
* Fixed detection of requirement for va_copy(); affecting some Mac
builds.
* Added hash_exp. I'm not sure this is of practical use, but it was in
the source tree and in theory at least still works.
* Removed minor memory leak in HashTableResize().
Version 1.12.4 (7th July 2010)
--------------
* Fixed bug added with 1.12.3 in extracting data from .hash files.
Version 1.12.3 (6th July 2010)
--------------
* Resolved compilation problems: endianness and large file support are
now detected in a wider variety of cases.
* srf2fastq now automatically outputs appropriate calibrated vs
non-calibrated values without having to explicitly use -c. This
option is now only needed when both CNF1 and CNF4 ZTR chunks are
present (it'll default to CNF4).
* srf2fastq no longer forces "." to be reformatted to "N".
* srf_dump_all improvements when facing read names with "#index" from
tagged runs.
* srf_filter can now act as a pipe, reading from stdin and writing to
stdout.
* hash_tar can now index multiple tar files together into one .hash
file. hash_list -l has been extended to list the originating .tar.
Additionally "hash_tar -m map_file" allows renaming of tar filenames
prior to indexing, allowing for a hacky way to work around modified
names, cases, directory layouts, etc.
Version 1.12.2 (15th Jan 2010)
--------------
* Extra options in srf2fastq: -S to output split regions sequentially
to stdout. -r to request a region to be reverse complemented before
output.
* API addition
- Added pooled_alloc.h. This is a general purpose mechanism of
pooling multiple fixed size memory allocations into fewer malloc()
library calls.
- HashTables now have a HASH_POOL_ITEMS option to use the above
pooling system. This reduces memory wasted and speeds them up.
* Bug fix: Fixed ztr_add_text() so that it leaves two nul bytes on the
end of TEXT chunks instead of one, as documented in the ZTR
specification.
* Bug fix: Fixed buffer overrun in parse region chunks; srf2fastq and
srf2fasta.
* Bug fix: API read_sff_read_data() did not skip ahead to the next
8-byte boundary.
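The pooling idea behind pooled_alloc.h (many fixed size items served from a few malloc() calls) can be sketched as follows. This is not the pooled_alloc.h interface: every name here (demo_pool, demo_pool_alloc, and so on) is hypothetical, and the 16-byte alignment is an assumption that holds on common platforms.

```c
#include <stdlib.h>

#define DEMO_ALIGN 16   /* assumed alignment; sufficient on common platforms */

/* Each malloc()ed block starts with a link to the previous block. */
typedef struct demo_block {
    struct demo_block *next;
} demo_block;

typedef struct {
    size_t item_size;        /* item size rounded up to DEMO_ALIGN */
    size_t items_per_block;  /* items carved out of each block */
    size_t used_in_block;    /* items already handed out from newest block */
    demo_block *blocks;      /* list of blocks, newest first */
    void *free_list;         /* returned items, linked through their first bytes */
    size_t malloc_calls;     /* number of blocks obtained from malloc() */
} demo_pool;

static size_t demo_round_up(size_t n)
{
    return (n + DEMO_ALIGN - 1) / DEMO_ALIGN * DEMO_ALIGN;
}

demo_pool *demo_pool_create(size_t item_size, size_t items_per_block)
{
    demo_pool *p;

    if (item_size == 0 || items_per_block == 0)
        return NULL;
    p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    p->item_size = demo_round_up(item_size);
    p->items_per_block = items_per_block;
    p->used_in_block = 0;
    p->blocks = NULL;
    p->free_list = NULL;
    p->malloc_calls = 0;
    return p;
}

void *demo_pool_alloc(demo_pool *p)
{
    size_t header = demo_round_up(sizeof(demo_block));

    /* Reuse a previously freed item if there is one. */
    if (p->free_list) {
        void *item = p->free_list;
        p->free_list = *(void **)item;
        return item;
    }

    /* Start a new block when there is none or the newest one is full. */
    if (!p->blocks || p->used_in_block == p->items_per_block) {
        demo_block *b = malloc(header + p->item_size * p->items_per_block);
        if (!b)
            return NULL;
        b->next = p->blocks;
        p->blocks = b;
        p->used_in_block = 0;
        p->malloc_calls++;
    }
    return (char *)p->blocks + header + p->item_size * p->used_in_block++;
}

/* Items go back to the pool, not to the C library. */
void demo_pool_free(demo_pool *p, void *item)
{
    *(void **)item = p->free_list;
    p->free_list = item;
}

void demo_pool_destroy(demo_pool *p)
{
    demo_block *b, *next;

    if (!p)
        return;
    for (b = p->blocks; b; b = next) {
        next = b->next;
        free(b);
    }
    free(p);
}
```

With 100 items per block, 250 allocations cost three malloc() calls for item storage instead of 250, which is where the memory and speed savings come from.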
Version 1.12.1 (7th August 2009)
--------------
* Fixed the endianness detection in io_lib/os.h when used in
conjunction with auto-conf. This fix allows for "fat" binaries to be
built on MacOS X.
* Fixed io_lib-config program to use -lstaden-read instead of -lread.
Version 1.12.0 (29th July 2009)
--------------
* Renamed the library from libread.so to libstaden-read.so. This was
already the case for the Fedora bundled RPM.
* Switched to using libtool to allow building of dynamic libraries.
Note that this is tweaked to not use -rpath though. Proper library
versioning has been added too.
* Removed deprecated platform specific tools: illumina2srf,
srf2illumina.
* Srf_info now reports the compressed size of chunks, sorted by type,
in addition to their counts. It also correctly sums to over 2Gb now
for base-call counting.
* Various SRF tools have had the maximum sequence length changed from
1024 to 10000. This allows for even the most gifting capillary traces.
* API
- The Array functions now take size_t instead of int for the
array dimensions. (API CHANGE)
- Removed the (unused?) pipe2 function from compress.h. This was
intended to be internal only, and it now clashes with a new linux
kernel function. (API CHANGE)
- Added iterators to the HashTable* api.
* Bug fixes
- Fixed a memory allocation bug in the codes2codeset() function.
- ztr2read() should now work better on ZTR structs with no BPOS
chunk.
- Fixed various srf tools when facing an SRF file containing zero
chunks in the data block header.
- index_tar handles some GNU tar extensions better (LongLink).
Version 1.11.6.1 (9th December 2008)
----------------
* Identical except removal of a debugging printf statement in solexa2srf.
Version 1.11.6 (9th December 2008)
--------------
* illumina2srf, srf2illumina, srf2fastq
- We no longer change from log-odds to phred when storing data in
SRF, instead preferring to just mark it in correct input
scale. srf2fastq now honours this scale information and so the
conversion from log-odd to phred is done at the export stage
instead. (Chris Saunders)
- Bug fix to srf2illumina qcal conversion. Combined with above
changes the qcal output should now be 100% identical to the
original data input via illumina2srf.
* API
- New function srf_next_ztr_flags. This is like srf_next_ztr but
also returns the SRF flags value (good/bad read, etc).
* srf_filter, srf2fastq, srf_info (Steven Leonard)
- Improved support for multiple index blocks in SRF files, eg from
manually concatenated files.
- srf2fastq now sports options for splitting the output into
multiple fastq files when the input data is a paired-end run.
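For reference, the standard relation between a log-odds (Solexa-style) quality and a phred quality is shown below. The notes above do not state the exact formula used by srf2fastq, so treat this as the textbook conversion rather than a description of the code. With error probability p, the log-odds score is -10 log10(p / (1 - p)) and the phred score is -10 log10(p), which gives:

```latex
\[
Q_{\mathrm{phred}} = 10 \log_{10}\!\left(1 + 10^{\,Q_{\mathrm{logodds}}/10}\right)
\]
\[
Q_{\mathrm{logodds}} = 0 \;\Rightarrow\; Q_{\mathrm{phred}} = 10 \log_{10} 2 \approx 3.01,
\qquad
Q_{\mathrm{logodds}} = 40 \;\Rightarrow\; Q_{\mathrm{phred}} = 10 \log_{10}(10001) \approx 40.0004
\]
```

The two scales agree closely for high qualities and diverge for low ones, which is why recording the input scale and converting only at export loses nothing.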
Version 1.11.5 (3rd December 2008)
--------------
* Illumina2srf
- Fixed major bug with using *both* -qf and -qr together. The
quality values for the reverse strand were shifted by one
character.
- Fixed qcal quality values so they're not shifted down by 64
(illumina format fastq).
- Fixed bugs in parsing directory names if not matching the expected
format.
* Removed major memory leaks from srf_filter.
* hash_sff now has support for outputting the table of contents to a
new file rather than appending to an existing sff file or copying
the entire contents to a new file.
* Various man pages have been added. The list is still incomplete
though. Additions are most welcome.
* New program: srf_list. This lists and/or counts the number of
sequences within an SRF file.
Version 1.11.4 (11th September 2008)
--------------
* New "make check" build target to perform some automated testing.
Currently limited to testing the SRF tools.
* Fixed machine endianness issues. Specifically this resolves known Intel
MacOS-X problems.
* New SRF tools
- srf_info: reports simple metrics on the contents of an SRF file.
- srf_filter: slices and dices the SRF file to produce a new one
with various types of data removed.
* illumina2srf
- Minor float/int rounding change when storing int/nse/sig2 data.
- Improved error detection such that it returns a failure code more
often given a parsing issue.
- Added -pf/pr parameters for storing Phasing files.
- Reduced memory usage, especially on large numbers of clusters per
tile. We may now produce multiple DBH blocks per tile. Also major
reduction to memory when handling the .params files.
- Added storage of 2nd .params file (firecrest).
- Fixed bug in the automatic base-call version identification.
- Fixed a bug with using -qf/qr when not providing all tiles (ie not
starting from tile number 1).
- Bug fix with storing the reverse matrix file in paired-end runs; a
duplicate of the forward one was being used instead.
* General SRF
- Improved error checking in srf_index_hash. It now spots duplicate
reads and also has a -c option to check an existing SRF file
without writing the index.
- Fixed a memory leak in srf_next_ztr(), triggered in srf2fastq -C.
Version 1.11.3 (9th July 2008)
--------------
* illumina2srf change:
- IMPORTANT bug fix to illumina2srf when using the "-r" flag to
store raw (.int and .nse) data. This could often result in
corrupting the data ZTR meta-data for the SMP4 chunks resulting in
confusion over which trace channels are raw and which are
processed.
Fortunately the corruption is reversible. For more details and a
fix see the ssrformat announcement of the issue:
http://www.bcgsc.ca/pipermail/ssrformat/2008-July/000531.html
* General SRF changes:
- Removed a memory leak in ztr_find_chunks().
- Added SRFB_NULL_INDEX as an SRF block type. This provides a more
transparent way to skip over the 8 zero value bytes that may exist
at the end of an SRF file missing an index block.
* Other changes
- Fixed a bug in extract_seq when operating on multiple files and
outputting to a file rather than a pipe. An erroneous seek in the
mFILE code led to it repeatedly truncating the output, resulting
in one sequence file at the end instead of multiple files.
Version 1.11.2 (4th June 2008)
--------------
* solexa2srf/srf2solexa changes:
- Renamed to illumina2srf/srf2illumina.
- Incorporated support for the IPAR format (Come Raczy, Illumina).
- Added support for qcal format data (Come Raczy).
- Added -C option to tag data as failing the chastity filter, but it
is still included in the SRF output (Camil Toma).
- Many more additional features added to srf_dump_all provided by
Camil Toma. It somewhat overlaps srf2solexa now, but may still
have its own use.
- Ztr TEXT chunks now output in srf2solexa.
- Improved ways to specify matrices (-mf/-mr) in solexa2srf.
- solexa2srf is substantially faster when reading gzipped files.
- The -N/-n naming scheme options for solexa2srf now default to the
same conventions used by GERALD. Added additional %d, %m and %r
format rules too.
- Calibrated confidence values are now output if -qf or -qr
parameters are used, in addition to uncalibrated ones. These are
stored in phred scale in a CNF1 ZTR chunk.
* srf2fastq now has a -c option to output calibrated confidence values
(if present). It also supports multiple archives on the command line.
* SRF fixes:
- Better handling of full pathnames in solexa2srf.
- Use binary IO mode; fixes bugs on Windows.
- Fixed an error where some chunks were not compressed properly
(valid still, just not compressed).
- Removed memory corruption in solexa2srf (in rare cases).
- Fixed bug with binary formatted read_id suffixes (fixed by
Cristian Goina).
- Initialised memory in hash table code (used in indexing amongst
other things).
- Indexes very occasionally failed to find a trace that did in fact
exist.
- Removed memory leak in construct_trace_name (patch from John
Emhoff, Helicos).
- Fixed reading of XML block in srf_read_xml(). From John Emhoff.
* Added SRF= format string to TRACE_PATH to facilitate on-the-fly
extraction from indexed SRF files. This means io_lib can now
transparently pull traces from an archive or treat it as if it was a
directory - eg "foo.srf/IL15_..._123:456".
* Bug fix (SF-1898427) - now builds on Fedora.
* Better handling of 64-bit file size sensing in autoconf.
Version 1.11.1 (not officially released - internal testing only)
--------------
Version 1.11.0 (20th February 2008)
--------------
First official release of v1.11.0 and SRF support.
* Further speed improvements to solexa2srf.
* Added extract_qual program (analogous to extract_seq).
* Added new srf2fasta program and also sped up srf2fastq by 25%.
* Solexa2srf now supports storing the raw .int/.nse trace data instead
of or in addition to the processed .sig2 data.
* Solexa2srf now stores enough to reproduce sufficient firecrest
output to rerun the solexa basecaller. Specifically that's a couple
matrix files and 'region' data for paired end runs.
* Minor changes / bug fixes:
- extract_seq no longer attempts to gzip the output by default if
the input was gzipped
- ztr2read conversion (eg visible in trace_dump) now correctly
handles ZTR files with multiple SMP4 chunks.
- Fixed memory leaks in various bits of SRF code (srf_extract_linear
mainly and srf_index_hash).
Version 1.11.0b8 (25th January 2008)
----------------
(Hopefully final beta test of SRF code before official 1.11.0 release.)
* Fixed a bug in the index format. We incorrectly handled null dbhFile and
containerFile elements and incorrectly computed the index size.
* Improvements for solexa2srf code.
- Can store raw vs processed data
- Stores matrix and .params contents.
- Optional chastity filtering.
- Input data may now be gzipped.
* Minor fixes to output of trace_dump and ztr_dump.
* Minor srf_index_hash bug fixes (when dealing with concatenated
indexed files).
Version 1.11.0b7 (11th January 2008)
----------------
* IMPORTANT bug fix to the SRF format. The Data Block Header had the
blocksize field 4 bytes too large. Now fixed. Old SRF files will not
be readable by this new code (as they were in error).
Version 1.11.0b6 (2nd January 2008)
----------------
* Changes to adhere to SRF v1.3:
* Removal of the readID counter.
* Added support for printf style name formatting.
* Minor index format tweaks (64-bit data, dch/container filenames).
Index format is therefore now 1.01.
Version 1.11.0b5 (8th November 2007)
----------------
* Major reorganisation of directories. All library code is in subdir
"io_lib". The code now uses "io_lib/xxx.h" in all include statements
too.
* Fixed memory leaks in ZTR code
* Various SRF bug fixes and better support for sample OFFS metadata in
both ZTR/ZTR.
* Added srf_extract_hash program to perform random-access on a hash
indexed SRF archive.
Version 1.11.0b4 (26th October 2007)
----------------
* The SRF format now supported adheres to version 1.2.
* More speedups, in particular focusing on uncompression this time, so
srf2solexa is an order of magnitude faster.
* ztr2read() now honours the read_sections() options and so is much
faster when only decoding (say) base and quality values.
* New program srf2fastq.
* Internal changes to various ztr data structures. If you use these
yourself take note of the new ztr_owns fields to avoid memory leaks.
Version 1.11.0b3 (16th October 2007)
----------------
* Major speed improvements for compression. solexa2srf is now 30-35x faster.
* Fixed various buffer overruns and memory leaks reported by valgrind
in the new deflate interlaced and SRF code.
Version 1.11.0b2 (2nd October 2007)
----------------
* Minor version change to fix typos in Makefile system.
Version 1.11.0b1 (28th September 2007)
----------------
Beta release 1.
* Added preliminary SRF support. This consists of a new subdirectory
'srf' (yes these all really need merging into a single directory,
but that's a later task), a substantial update to ZTR and a variety
of SRF tools in progs.
The old huffman_static.[ch] files were renamed and substantially
worked upon to create deflate_interlaced.[ch].
Added new compression types. xrle2, tshift and qshift. The latter two
of these are very specific to trace and quality packings. May need to
rename to be more generic.
Version 1.10.3 (???)
--------------
* The HashTable interface now also allows for Bob Jenkins' lookup3
64-bit hash function. This allows for substantially larger hash
tables.
* Replaced tempnam() with tmpfile(). On systems without tmpfile
(Windows) this is simply a wrapper to use the old tempnam calls.
* hash_extract bug fix for windows: now operates in binary mode.
* INCOMPATIBLE CHANGE: On windows we now use semi-colon as the path
separator. The reason is that with the MinGW getenv() seems to do
"clever things" with PATH variables and consequently ends up
corrupting our clumsy attempt of escaping colons in paths.
* Fasta format is semi-supported in "plain" format. It returns the
first entry when reading.
* Experimental support for static huffman (STHUFF) compression type.
Version 1.10.2 (30th May 2007)
--------------
Primarily this is a bug fix release.
* Convert_trace now has -signed and -noneg options to control signed
vs unsigned issues when shifting trace data about.
* Include files now have C++ extern "C" style guards around them.
* Various programs now accept -ztr command line arguments to force ZTR
format reading. This is for consistency's sake only and it is
recommended that users simply let the programs automatically detect
the file formats.
* Hash_exp now outputs to the same file containing the experiment
files (in appended hash-table mode). It also has better Windows
handling (stripping ^M and using binary mode).
* hash_extract bug fix: now only needs at least 1 filename specified
when fofn mode is not in use.
* mFILE emulation: bug fixes when dealing with ftruncate, append mode,
checking for read/write flags, new mfcreate_from() function.
* ZTR: added an experimental ZTR_FORM_STHUFF compression scheme. This
uses static huffman encoding on a predefined hard-coded set of
huffman tables. The purpose (as yet not put into action) is to allow
efficient compression of very small data sets for Illumina, AB
SOLiD, etc style traces.
Version 1.10.1 (20th June 2006)
--------------
* Trace files are now opened in read-only mode by default
(open_trace_file func).
Version 1.10.0 (15th June 2006)
--------------
* Two new environment variables are used, EXP_PATH and TRACE_PATH, to
replace RAWDATA. EXP_PATH is used when the new open_exp_mfile()
function is called and TRACE_PATH is used when open_trace_mfile() is
called. Both default to using RAWDATA when EXP or TRACE env is not
found. Also defined a trace type TT_ANYTR which is analogous to the
existing TT_ANY except it will not look for experiment or plain
format files.
Modified the various example programs to use the appropriate open
call. This allows for traces and experiment files to have identical
names, such as is usually the case when querying named trace objects
from a trace server.
* New program: extract_fastq to generate FASTQ output format.
* New program: hash_exp. This allows multiple experiment files to be
concatenated together and then indexed so io_lib can still treat
them as single files.
* The URL based search path mechanism now by default uses libcurl
instead of wget. This makes it considerably faster.
* If an element in RAWDATA, EXP_PATH or TRACE_PATH now starts with the
pipe symbol ("|") then the compressed file extension code is negated
for that search element. (This prevents looking for foo.gz, foo.Z,
foo.bz2, etc if it fails to find foo.)
* Added HashTableDel() and HashTableRemove() functions to take items
out of a hash table.
* ZTR's compress_chunk() and uncompress_chunk() functions are now
externally callable.
* New program io_lib-config. This has --version, --cflags and --libs
options to query the appropriate configuration when compiling and
linking against io_lib. There's also a new io_lib.m4 file which
provides an AC_CHECK_IO_LIB autoconf macro to use io_lib-config and
generate appropriate Makefile substitutions.
* Updated the autoconf code to support libcurl searching.
* Renamed SCF's delta_samples[12] functions to be
scf_delta_samples[12]. (From Saul Kravitz)
* Added a '-error filename' option to convert_trace. (From Saul Kravitz)
* Bug fix: HashTableAdd() now works properly with non-string keys.
* Bug fix to read_dup().
* Bug fix to xrle which could read past the array bounds. It also now
handles run-lengths of 256 or more.
* Bug fix: the fwrite_* functions no longer close the FILE pointer
given to them.
* Bug fix to fdetermine_trace_type(); it now rewinds the file back.
* Bug fix to mfseek and mrewind; they both now clear the EOF flag.
* Bug fix to find_file_dir().
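The search path behaviour described in this release (falling back to RAWDATA, and the leading "|" that switches off the compressed-extension search) might be sketched like this. The function names are hypothetical, and the extension list and its order are assumptions taken from the examples in the text (foo.gz, foo.Z, foo.bz2); the real code handles more cases.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_MAX 256

/*
 * Pick the search path for a lookup: the specific variable
 * (e.g. "TRACE_PATH" or "EXP_PATH") if set, otherwise RAWDATA.
 * May return NULL when neither is set.
 */
const char *demo_pick_path(const char *specific_var)
{
    const char *path = getenv(specific_var);
    return path ? path : getenv("RAWDATA");
}

/*
 * List the filenames to try for 'file' within one search path element.
 * A leading '|' on the element means "do not try compressed extensions".
 * Returns the number of names written to out[] (at most max_out).
 */
size_t demo_candidates(const char *element, const char *file,
                       char out[][DEMO_NAME_MAX], size_t max_out)
{
    static const char *exts[] = { ".gz", ".Z", ".bz2" };  /* assumed list */
    size_t n = 0, i;
    int try_compressed = 1;

    if (element[0] == '|') {
        try_compressed = 0;
        element++;              /* skip the pipe symbol itself */
    }

    if (n < max_out)
        snprintf(out[n++], DEMO_NAME_MAX, "%s/%s", element, file);

    for (i = 0; try_compressed && i < sizeof(exts) / sizeof(exts[0])
                && n < max_out; i++)
        snprintf(out[n++], DEMO_NAME_MAX, "%s/%s%s", element, file, exts[i]);

    return n;
}
```

For the element "/data/traces" this yields four candidates (foo, foo.gz, foo.Z, foo.bz2); for "|/data/traces" it yields only foo, which avoids three failed lookups per missing file.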
Version 1.9.2 (14th December 2005)
-------------
* Added AC_CHECK_LIB calls for the nsl and socket libraries
(gethostbyname / socket functions). Needed for Solaris compilations.
* In extract_seq, used open_trace_mfile instead of
open_trace_file. Functionally this is the same, but it is faster.
* fwrite_reading() now frees the temporary mFILE it created.
* mfreopen_compressed() no longer closes the original FILE
pointer. This brings it back into line with the original
functionality provided in 1.8.x. It also cures a bug where the old
file pointer was often left open, meaning that operating on many files
could cause a resource leak ending in the inability to open
more trace files.
* Added private_data and private_size to the Read struct. Populate
these when reading SCF files.
* Hash_extract now returns an error code to the calling process upon
failure.
* Major overhaul of hash_sff. It no longer loads the entire file into
memory. It can now cope with adding a hash index to an archive that
already contains an index.
* Added support for 454's "sorted index" code. NB this is based on the
extraction code from their getsff.c code and has not been tested
with a genuine indexed SFF file yet.
* Fixed an uninitialised memory access in mfload().
* Fixed a bug where hash query searches for items that do not exist
and map to an empty bucket could cause hangs or crashes.
* Fixed a hang in mfload() when reading a zero length file.
Version 1.9.1
-------------
* Implemented the SFF (454) file structure, currently as read-only.
This is supported both as an archive containing multiple files and
also as a single SFF entry.
* Allow for SFF=? components in RAWDATA search path.
* Tar files, SFF archives and hashed archives (eg hashed tar, sff, or
"solid" archives) may now be used as part of a pathname. Eg if a
tar file foo.tar contains entry xyzzy.ztr then we can ask to fetch
trace foo.tar/xyzzy.ztr instead of requiring setting of the
RAWDATA environment variable.
* Changed the HashFile format slightly. It's now format 1.00.
The key difference is that it has a file footer pointing back to the
hashfile header (so the hashfile can be appended to an archive) and
it also has an offset in the header to apply to all seeks within the
archive itself, so it can be prepended to an archive that's already
been indexed without breaking the offsets.
Extended the hash_tar program to allow control over these header options.
* Fixed divide-by-zero bug when calling mfread for zero
* Removed the warning for unknown ZTR chunk types. It now just
silently stores them in memory.
* mfopen now honours binary versus ascii differences (and so updated
Read.c calls accordingly) so that Windows works better.
* Removed file descriptor 'leak' in write_reading().
* Unset compression_used when opening uncompressed files instead of
leaving as the last value.
* Fixed a file descriptor (and some memory) leak in
freopen_compressed. (Bug ID #1289095)
* Fixed the hash file saving and loading so that it works on all
platforms instead of just x86 linux. There were bugs in assuming the
size of structures. The assumptions are still there in that I assume
they pad the same internally (for ease of coding - we can change it
when we finally see a system which operates differently), but the
final "boundary" padding has been resolved.
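To illustrate why a footer that points back to the index header lets an index be appended to an archive, here is a minimal sketch working on an in-memory copy of a file. The layout is invented for the example (a 4-byte "DEMO" tag followed by an 8-byte big-endian position) and is not the real HashFile 1.00 layout; all names are hypothetical. Decoding the integer byte by byte also avoids the structure size and padding assumptions mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FOOTER_MAGIC "DEMO"  /* hypothetical 4-byte tag */
#define DEMO_FOOTER_SIZE  12      /* 4-byte tag + 8-byte position */

/* Decode a big-endian 64-bit value one byte at a time, so the result
 * does not depend on host endianness or on struct padding. */
static uint64_t demo_be64(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Look at the last DEMO_FOOTER_SIZE bytes of the file. If they form a
 * valid footer, store the position of the index header in *index_start
 * and return 0; otherwise return -1.
 */
int demo_find_index(const unsigned char *file, size_t file_len,
                    uint64_t *index_start)
{
    const unsigned char *footer;
    uint64_t pos;

    if (file_len < DEMO_FOOTER_SIZE)
        return -1;
    footer = file + file_len - DEMO_FOOTER_SIZE;
    if (memcmp(footer, DEMO_FOOTER_MAGIC, 4) != 0)
        return -1;
    pos = demo_be64(footer + 4);
    if (pos > file_len - DEMO_FOOTER_SIZE)
        return -1;              /* would point into or past the footer */
    *index_start = pos;
    return 0;
}
```

A reader only needs to seek to the end of the file, read the footer and jump back to the index, regardless of how large the archive in front of it is.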
Version 1.9.0
-------------
* ***INCOMPATIBILITIES*** to 1.8.12
- The Exp_info structure now internally contains an "mFILE *" member
instead of "FILE *" member. If you use the experiment file functions
for I/O then hopefully it'll still work. However if you directly
manipulated the Exp_info yourself using fprintf etc then you will
need to modify your code.
- Some functions no longer have external scope. Most of these did not
previously have external function prototypes. If you have a burning
need to use one of these, please contact me directly via sourceforge.
The full list is:
ctfType (global variable) ztr_encode_samples_C
replace_nl ztr_encode_samples_G
ctfDecorrelate ztr_encode_samples_T
exp_print_line_ ztr_decode_samples
find_file_tar ztr_encode_bases
find_file_archive ztr_decode_bases
find_file_url ztr_encode_positions
ztr_write_header ztr_decode_positions
ztr_write_chunk ztr_encode_confidence_1
ztr_read_header ztr_decode_confidence_1
ztr_read_chunk_hdr ztr_encode_confidence_4
compress_chunk ztr_decode_confidence_4
uncompress_chunk ztr_encode_text
ztr_encode_samples_4 ztr_decode_text
ztr_decode_samples_4 ztr_encode_clips
ztr_encode_samples_common ztr_decode_clips
ztr_encode_samples_A
- Some external functions have changed prototypes to use mFILE instead
of FILE. Most cases of these I've put in place a wrapper function
with the old name, but not yet all. Functions changed are:
ctfFRead write_scf_samples32
ctfFWrite write_scf_base
exp_print_line write_scf_bases
exp_print_mline write_scf_bases3
exp_print_seq write_scf_comment
read_scf_header fcompress_file
read_scf_sample1 fopen_compressed
read_scf_samples1 freopen_compressed
read_scf_samples31 be_write_int_1
read_scf_sample2 be_write_int_2
read_scf_samples2 be_write_int_4
read_scf_samples32 be_read_int_1
read_scf_base be_read_int_2
read_scf_bases be_read_int_4
read_scf_bases3 le_write_int_1
read_scf_comment le_write_int_2
write_scf_header le_write_int_4
write_scf_sample1 le_read_int_1
write_scf_samples1 le_read_int_2
write_scf_samples31 le_read_int_4
write_scf_samples2 fdetermine_trace_type
- Removed support for the OLD unix "pack" program as a valid trace
compression algorithm.
- Removed CORBA support. (It wasn't enabled and I've no idea if it
even worked as I cannot test it.)
- The default search order for RAWDATA now has the current working
directory at the end of RAWDATA instead of the start.
* Significant speed ups, particularly when dealing with reading
gzipped files or when extracting data from tar files.
* New external functions for faster access via mFILE (memory-file)
structs. These mimic the fread/fwrite calls, but with mfread/mfwrite
etc.
* Numerous minor tweaks and updates to fix compiler warnings on
stricter modes of the Intel C Compiler.
* Preliminary support for storing pyrosequencing style traces. This
has been modeled on the flowgram data from 454, but should be
applicable to other platforms. ZTR has been updated to incorporate
this too.
The Read structure also has flow, flow_order, nflows and flow_raw
elements too. Code to convert these into the more usual traceA/C/G/T
arrays exists currently as part of Trev (in tk_utils in the Staden
Package), but this may move into io_lib for the next official release.
* New hash_tar and hash_extract programs. These replace the index_tar
program for fast random access. For RAWDATA include "HASH=hashfile"
as an element to get io_lib to use the archive hash. It's possible
to create hash files of most archive formats as the hash itself
contains the offset and size of each item in the archive. This means
that extracting an item does not need to know the format of the
original archive.
Some benchmarks show that on ext3 it's actually faster to extract
files from the hash than directly via the directory. This was
testing with ~200,000 files, whereupon directory lookups become
slow. I'd imagine ReiserFS or similar to be faster.
* Added an XRLE encoding for ZTR. This is similar to the existing RLE
mechanism but it copes with run length encoding of items larger than
a single byte. Its current use is for storing the 4-base repeating
flow order in 454 data.
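The idea of run-length encoding items wider than one byte, as XRLE does, can be sketched as below. The output layout used here (a one-byte count followed by the record) is an assumption made for the example and is not the XRLE byte format defined by ZTR; demo_rle_records is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Run-length encode fixed-width records. The input is in_len bytes
 * holding records of rec_size bytes each. The output is a sequence of
 * (count, record) pairs, where count is a single byte in the range
 * 1..255; longer runs are split into several pairs.
 *
 * Returns the number of bytes written, or 0 if in_len is not a
 * multiple of rec_size, if out is too small, or if the input is empty.
 */
size_t demo_rle_records(const unsigned char *in, size_t in_len,
                        size_t rec_size,
                        unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (rec_size == 0 || in_len % rec_size != 0)
        return 0;

    while (i < in_len) {
        size_t run = 1;

        /* Extend the run while the next whole record matches this one. */
        while (run < 255 &&
               i + (run + 1) * rec_size <= in_len &&
               memcmp(in + i, in + i + run * rec_size, rec_size) == 0)
            run++;

        if (o + 1 + rec_size > out_cap)
            return 0;
        out[o++] = (unsigned char)run;
        memcpy(out + o, in + i, rec_size);
        o += rec_size;
        i += run * rec_size;
    }
    return o;
}
```

Encoding the 16-byte flow order "TACGTACGTACGAAAA" with a record size of 4 produces 10 bytes: a count of 3 with "TACG", then a count of 1 with "AAAA". A byte-wise RLE would find no runs at all in the repeating "TACG" part.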
Version 1.8.12
--------------
* The ABI format code now reads the confidence values from KB (via
PCON field).
* New program: trace_dump. Like scf_dump, but deals with generic input
formats.
* Slightly more sensible average spacing calculation in the ABI
reading code. It's still not perfect, but is only used when the real
spacing value is negative or zero.
* Disabled the base-reordering fix for ABI files. We believe the bug
causing this no longer exists.
* Experiment file format: added FT (EMBL feature table) and LF
(LiGation; a combination of LI and LE) records.
* Experiment files: strip out digits from the sequence we read
(for better support of EMBL files).
* Experiment files: fixed a potential buffer overrun in the conversion
of binary confidence values to ascii values.
* Minor improvements to portability (INT_MAX vs MAXINT2) and removal
of some compilation warnings.
* Extract_seq now accepts a -fofn argument.
* New functions: read_update_base_positions() and
read_update_confidence_values() to replace read_update_opos().
These apply an edit buffer to the sequence details and are used (for
example) within Trev for saving edits back to a trace file.
* Better error handling in fcompress_file().
* New specifiers in RAWDATA. Added a generic URL format (eg
"URL=http://some/where/trace=%s") implemented via use of wget. There
is also an ARC= format to make use of the Sanger Trace Archive,
although currently this will not work externally.
* Zero memory used in read_alloc(). Fixes to read_dup().
Version 1.8.11
--------------
* Rewrote the background subtraction in convert_trace to deal with each
channel independently.
* Make install now installs the include files (all of them, although not all
are strictly required) in $prefix/include/io_lib/.
* Moved the ABI filter wheel order (FWO) reading from outside the sample
reading code into the general reading bit as this is needed for reading the
comments too (it also applies to the order of the signal strengths). Hence
when the READ_COMMENTS section only is defined it now works correctly.
* Moved the DataCount #defines into static values and added an
abi_set_data_counts function to change these. This allows reading of the raw
data from ABI files. This is used within the new convert_trace -abi_data
option.
* Removed a one-byte write buffer overflow in the CTF writing code.
* New Experiment file records WL and WR for indicating clip points within a WT
trace.
* Removed the saved copy of fp for exp_fread_info in 'e' structure as it
doesn't belong to us. (If we do store it there then the exp_destroy_info
function will free it and this causes bugs.). POTENTIAL INCOMPATIBILITY:
if you assumed that exp_destroy_info closed the files that you opened and
passed into exp_fread_info, then this is no longer true.
* New function read_dup() to copy a Read structure.
* get_read_conf() now deals with loading confidence values from any suitable
format and not just SCF.
* Fixed memory leak in ztr (ztr->text_segments).
Version 1.8.10
--------------
* Added Steven Leonard's changes to index_tar. It no longer adds index entries
for directories, unless -d is specified. It also now supports longer names
using the @LongLink tar extension.
* Fixed a bug in exp2read where the base positions were random if experiment
files are loaded without referencing a trace and without having ON lines.
* New program get_comment. This queries and extracts text fields held within
the Read 'info' section.
* Overhaul of convert_trace to support the makeSCF options (normalise etc).
Version 1.8.9
-------------
Sorry this isn't a proper changes-by-source listing. Any suggestions for how I
collate the 'cvs log' output into something more concise? The below text is
simply a list of changes, but more complete than in the NEWS file.
* ZTR spec updated to v1.2. The chebyshev predictor has been rewritten in
integer format. The old chebyshev still has a format type allocated to it
(73), but the new ICHEB format (74) is now the default. The old floating
point method was potentially unstable (eg when running on non IEEE fp
systems). The new method also seems to save a bit more space.
* The docs and code disagreed for CNF4 storage. Changed the docs to reflect
the code (which does as intended).
* ZTR speed increase. Follow1 is substantially faster, improving write
speed by about 10%.
* New named formats types. ZTR1, ZTR2 and ZTR3. ZTR defaults to ZTR2, but we
can explicitly ask for another compression level if desired. Also explicit
statement of format (TT_ZTR instead of TT_ANY) removes the need for
a rewind() call and so ZTR can now work through a pipe.
* General tidy up to remove a few compilation warnings (missing include files,
signed vs unsigned issues, etc).
* Initial support is included for BioLIMS integration, but this is not
complete. (Unfortunately it requires access to a non-public library.)
* New function compress_str2int - opposite of existing compress_int2str.
* (Steven Leonard). Uses zlib for gzip compression and decompression.
These are extracts from the full Staden Package change log. They may not be
immediately obvious when taken out of context, but we feel this information
may still be useful to the users of io_lib.
23rd August 2000, James
-----------------------
1. Removed find_trace_file and added an open_trace_file function.
The idea is that searching for a file's existence is better done by attempting
to open it. This in turn allows for more possibilities of file searching.
Makefile
utils/open_trace_file.c
read/Read.c
read/scf_extras.c
read/translate.[ch]
progs/extract_seq.c
2. Added a TAR option to RAWDATA. We can now read trace files directly from
tar files (although they cannot be written to directly).
utils/open_trace_file.c
utils/tar_format.h
3. Created an index_tar program to optimise tar reading, although it is not
mandatory.
progs/index_tar.c
progs/Makefile
4. Fixed a bug when dealing with plain text files containing spaces.
plain/seqIOPlain.c
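The idea behind item 1 above (finding a file by trying to open it rather than
by testing for its existence) can be sketched as follows. This is only an
illustration: open_first_match() and its directory-array argument are
hypothetical and are not the open_trace_file() interface.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not io_lib's open_trace_file(): try to open 'name'
 * in each directory of 'dirs' (a NULL-terminated array) and return the
 * first stream that opens, or NULL if none do.
 */
FILE *open_first_match(const char *const *dirs, const char *name)
{
    char path[1024];
    size_t i;

    for (i = 0; dirs[i] != NULL; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;               /* path too long, skip this directory */

        FILE *fp = fopen(path, "rb");
        if (fp != NULL)
            return fp;              /* opening succeeded, so the file exists */
    }
    return NULL;
}
```

Because the caller gets back an open stream, there is no gap between checking
for the file and using it, and other sources (such as a tar file) could be
tried in the same loop.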
31st July 2000, James
---------------------
1. Renamed TTFF to be ZTR.
read/Read.[ch]
utils/traceType.c
utils/compress.c
ttff/* -> ztr/*
README
2. ZTR reading will now stop when it spots a ZTR magic number. This allows
concatenation of ZTR files.
ztr/ztr.[ch]
15th June 2000, James
---------------------
1. Added a TTFF_FOLLOW filter type to TTFF. This is enabled with compression
level 2 for the chromatogram data.
io_lib/ttff/ttff.[ch]
io_lib/ttff/compression.[ch]
9th June 2000, James
--------------------
* RELEASED 1.8.4 */
1. Added zlib bits to windows compilation.
io_lib/mk/windows.mk
2. Updated convert_trace. It can now reduce sample-size to 8-bit (with the
"-8" option) and the formats may now be specified as either integer or text
format. The text format is case insensitive.
io_lib/progs/convert_trace.c
io_lib/utils/traceType.c
3. More windows binary vs ascii fixes. When reading we switch to binary mode
before attempting fdetermine_trace_type, otherwise it fails to auto-detect
TTFF (which includes a newline as part of the magic number). Also added a
_setmode() call to the fwrite_reading code too.
io_lib/read/Read.c
4. Changed the default compression technique of TTFF to that used in 1.8.2. I
accidentally left it set to the experimental dynamic-delta method in 1.8.3,
which currently doesn't have the uncompression function! Also removed lots of
debugging output.
io_lib/ttff/ttff.c
io_lib/ttff/ttff_translate.c
5. Bug fix to exp2read - when no right hand quality cutoff is specified we
were defaulting to the left end of the trace, instead of the right end. (This
only happens when opening experiment files which do not have clip points.)
io_lib/read/translate.c
6. Changed the strftime() format in ABI reading code to use %H:%M:%S instead
of %T, as %T doesn't appear to be part of ANSI (I think it's probably
XPG4-UNIX). It worked on Unix machines, but not on MS Windows.
io_lib/abi/seqIOABI.c
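As an illustration of item 6 above, the portable form of the time-of-day
format looks roughly like this. format_time_of_day() is a made-up name for
the example and is not taken from seqIOABI.c.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format the time of day as HH:MM:SS using only the %H, %M and %S
 * conversions. Returns the number of characters written, or 0 if 'buf'
 * is too small. format_time_of_day() is an illustrative name only.
 */
size_t format_time_of_day(const struct tm *t, char *buf, size_t len)
{
    return strftime(buf, len, "%H:%M:%S", t);
}
```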
8th June 2000, James
--------------------
* RELEASED 1.8.3 */
1. Updated the CTF support so that it includes a couple of new block
types. This allows for base positions being non-sequentially ordered, as is
possible in severe compressions.
io_lib/ctf/ctfCompress.c
2. Overhaul of TTFF format - now more PNG based in style. Still highly
experimental.
io_lib/ttff/*
16th May 2000, James
--------------------
* RELEASED 1.8.0 */
1. Added szip support. Szip generally gives better compression ratios than
gzip and often marginally better than bzip2, but is generally considerably
slower at decompression.
io_lib/utils/compress.[ch]
2. Merged in Jean Thierre-Mieg's CTF code. This is a compressed trace format
which holds the same data as SCF, but in reduced space.
io_lib/read/Read.[ch]
io_lib/utils/traceType.c
io_lib/ctf/*
3. Added my own highly experimental TTFF format. (Thanks to Jean Thierre-Mieg
for re-awakening my interest in this.) TTFF files are typically equivalent in
size to bzip2'ed SCF files, but are much quicker to write than any of the
currently supported compressed formats. Depends on zlib.
io_lib/read/Read.[ch]
io_lib/utils/traceType.c
io_lib/ttff/*
4. Reorganised the Makefiles for easier building.
*/Makefile
5. New program "convert_trace". Primarily a test tool at present as it needs
a friendlier interface.
progs/convert_trace.c
20th April 2000, James
----------------------
1. Removed a file-descriptor leak in extract_seq.
io_lib/progs/extract_seq.c
22nd March 2000, James
----------------------
1. Fixed bug in time formatting from ABI files. We used strftime code
%a without setting tm.tm_wday (number of days since Sunday). It's not
easy to work that out, so we convert from struct tm to time_t, which
resets any erroneous elements of struct tm. Also fixed a silly error
where the end time was set to the start time (incorrectly).
io_lib/abi/seqIOABI.c
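The struct tm to time_t conversion described above relies on mktime(), which
normalises the structure it is given and fills in tm_wday and tm_yday. A
minimal sketch, assuming the date is supplied as year, month and day;
normalise_date() is a hypothetical name and not the io_lib code.

```c
#include <time.h>

/*
 * Fill in tm_wday (and tm_yday) for a calendar date by passing the
 * struct tm through mktime(), which normalises it in place.
 * Returns 0 on success, -1 if the date cannot be represented.
 * normalise_date() is an illustrative name, not an io_lib function.
 */
int normalise_date(struct tm *t, int year, int month, int day)
{
    struct tm zero = {0};

    *t = zero;
    t->tm_year  = year - 1900;   /* years since 1900 */
    t->tm_mon   = month - 1;     /* months since January */
    t->tm_mday  = day;
    t->tm_hour  = 12;            /* midday avoids daylight-saving edge cases */
    t->tm_isdst = -1;            /* let the library decide on DST */

    return mktime(t) == (time_t)-1 ? -1 : 0;
}
```

After the call, strftime() with %a has a valid tm_wday to work from.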
25th February 2000, James
-------------------------
2. Added checks for QR <= QL in the exp2read conversion function. This caused
trev to display incorrectly (blanking incorrect screen portions) when dealing
with inconsistent experiment files. Also changed qclip so that it doesn't
create this inconsistent case.
io_lib/read/translate.c
1st February 2000, Kathryn
--------------------------
1. Fixed bug which caused init_exp to crash when QL was more than 5 digits.
Increased it to handle 15 digits.
io_lib/read/translate.c
27th January 2000, James
------------------------
1. Moved Gap4's copy of scf_extras into io_lib, and renamed io_lib's
scf_bits to be scf_extras (to avoid editing too many #include statements).
Without this we were getting errors due to dynamic linking using odd
copies. Eg loading libread.so and then libgap.so meant that
find_trace_file called from edUtils2.c (libgap.so) would pick up the first
copy from libread.so, despite the fact that there's also a copy in the
same libgap.so.
gap4/scf_extras.[ch]
io_lib/scf_bits.[ch]
25th January 2000, Kathryn
--------------------------
1. Fixed crash in qclip due to insufficient arguments being passed to
find_trace_file and also fixed an array bounds error in scan_right of qclip.c
io_lib/read/scf_bits.c
19th January 2000, James
------------------------
4. Copied bits of the fakii and cap2/3 scf/expFile reading code into
io_lib. Not all of this is in there, just the things which seem to be
common and sensibly fit there. This also helps qclip to build on Windows.
FIXME: We should now remove some of this code from Gap4.
Also fixed a small memory leak in fopen_compressed() - it wasn't freeing
the result of tempnam().
io_lib/read/translate.c
io_lib/read/scf_bits.[ch]
io_lib/read/seqInfo.[ch]
io_lib/utils/files.c
io_lib/utils/compress.c
31st August 1999, James
-----------------------
1. -fasta_out mode of extract_seq now changes - to N.
io_lib/progs/extract_seq.c
27th August 1999, James
-----------------------
1. The order of information items added by the abi to scf code has
changed, to make it more sensible. Also fixed a bug in the textual (rather
than numerical) date output, and wrote this to the DATE field.
io_lib/abi/seqIOABI.c
2. makeSCF no longer adds a MACH field, as this was redundant.
io_lib/abi/makeSCF.c
3. Extract_seq now has proper use of CL and CR when using -cosmid_only. It
was assuming they were the same as QL/QR and SL/SR, which is not the case
(rather it's like having a CS line of `CL`..`CR`). Extract_seq also now
has a -fasta_out format option and can handle multiple files, which makes
it easier to produce a fasta file from multiple experiment files.
io_lib/progs/extract_seq.c
4th August 1999, James
----------------------
1. The exp2read() function in io_lib now initialises the confidence arrays
(eg r->prob_A) to zero, or to the experiment file AV line.
io_lib/read/translate.c
2nd June 1999, James
--------------------
1. The MegaBACE sequencer creates ABI files. However it does so in an odd way.
Sometimes the samples arrays are truncated such that bases are positioned
above samples which are not stored in the ABI file. We now realloc the samples
array in such cases and fill out the remainder with blank data. This removes a
crash in trev when viewing such data.
io_lib/abi/seqIOABI.c
2. Fixed a memory corruption of io-lib compression. The switch to use tempnam
(for Windows) implies that the filename returned is no longer allocated by us.
Unfortunately we forgot to remove the xfree(fname) calls.
src/io_lib/utils/compress.c
18th May 1999, James
--------------------
1. Fixed the trace rescaling option of makeSCF. We now go through the rescale
function twice. Once to work out the maximum value, and again to do the
rescaling. This fixes a bug where the maximum value after rescaling was
sometimes above 65536 and hence caused "trace wraparound" effects.
io_lib/progs/makeSCF.c
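A simplified sketch of the two-pass approach described above. This is not
makeSCF's rescale function: it assumes plain linear scaling, and
rescale_trace() and its parameters are invented for the example. The point is
only that the maximum is known before any value is scaled, so no result can
exceed 65535, the largest 16-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Two-pass rescale sketch (not makeSCF's actual code): pass one finds the
 * largest sample, pass two scales every sample so that the largest one
 * maps to 'target', so nothing can wrap around a 16-bit value.
 */
void rescale_trace(const uint32_t *in, uint16_t *out, size_t n, uint16_t target)
{
    uint32_t max = 0;
    size_t i;

    for (i = 0; i < n; i++)          /* pass 1: find the maximum */
        if (in[i] > max)
            max = in[i];

    for (i = 0; i < n; i++) {        /* pass 2: scale relative to it */
        if (max == 0)
            out[i] = 0;              /* all-zero trace: nothing to scale */
        else
            out[i] = (uint16_t)(((uint64_t)in[i] * target) / max);
    }
}
```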
26th April 1999, JohnT
----------------------
1. Allow : to be entered in RAW_DATA by using ::
Misc/find.c
io_lib/utils/find.c
2. Support for fetching trace files using Corba
Modified:
Misc/find.c
mk/misc.mk
io_lib/utils/find.c
init_exp/init_exp.c
io_lib/read/Makefile
io_lib/utils/find.c
io_lib/utils/compress.c
io_lib/utils/Makefile
mk/global.mk
Added:
io_lib/utils/corba.cpp
io_lib/utils/stcorba.h
Generated from IDL:
io_lib/utils/trace.h
io_lib/utils/trace.cpp
io_lib/utils/basicServer.h
io_lib/utils/basicServer.cpp
3. Added ABI utility progs to NT port
mk/abi.mk
4. Added Windows 95 support
io_lib/utils/compress.c
mk/WINNT.mk
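Item 1 above (writing :: to get a literal : in RAW_DATA) amounts to an escape
rule when splitting the search path. A minimal sketch of such a splitter
follows; next_path_element() is a hypothetical helper and is not the parser
in find.c.

```c
#include <stddef.h>

/*
 * Copy the next element of a ':'-separated search path into 'out',
 * turning each "::" into a literal ':'. Returns a pointer to the start of
 * the following element, or NULL when the path is exhausted.
 * next_path_element() is a hypothetical helper, not the io_lib parser.
 */
const char *next_path_element(const char *path, char *out, size_t len)
{
    size_t n = 0;

    if (path == NULL || *path == '\0' || len == 0)
        return NULL;

    while (*path != '\0') {
        if (*path == ':') {
            if (path[1] != ':') {    /* single ':' ends this element */
                path++;
                break;
            }
            path++;                  /* "::" is an escaped ':' */
        }
        if (n + 1 < len)
            out[n++] = *path;        /* silently truncate if 'out' is small */
        path++;
    }
    out[n] = '\0';
    return path;
}
```

Calling it in a loop until it returns NULL walks the path one element at a
time, which is how a Windows-style entry containing a drive letter could be
kept as a single element.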
5th March 1999, JohnT
---------------------
Various changes for WINNT support as follows:
io_lib/utils - Don't redirect to /dev/null on WINNT
3rd February 1999, James
------------------------
1. Fixed problems reported by Insure on Windows NT.
These are mainly lack of prototypes (malloc/memcpy) and not returning properly
from 'int' functions. However one fix to seqed_translate.c (find_line_start3)
was an array read overflow.
io_lib/progs/makeSCF.c
18th January 1999, James
------------------------
1. Changed the read2exp io_lib translation function so that it can accept
lowercase a,c,g,t. Oddly enough it was already coded to accept lowercase IUB
codes, but we missed out a,c,g and t!
io_lib/read/translate.c
15th January 1999, JohnT
------------------------
Modified files throughout for Windows NT Compatibility as follows:
8. need to explicitly set text or binary file mode under WINNT
io_lib/exp_file/expFileIO.c
18. need to include stddef.h for size_t with Visual C++
io_lib/utils/array.h
19. need to have target LIBS (not LIB) and correct ordering for correct make
on WINNT. Also need additional abstractions to allow for different compile
and link calling conventions with Visual C++, and have rules for building
Windows .def files.
io_lib/abi/Makefile
io_lib/alf/Makefile
io_lib/exp_file/Makefile
io_lib/plain/Makefile
io_lib/progs/Makefile
io_lib/read/Makefile
io_lib/scf/Makefile
io_lib/utils/Makefile
18th December 1998, James
-------------------------
1. Added bzip2 recognition to the (de)compression code of io_lib. This is now
the latest bzip, and is recognised by phred (unlike bzip version 1). Bzip2 is
approx the same as bzip1, but more or less twice as fast for decompression.
io_lib/utils/compress.c
27th November 1998, James
-------------------------
1. Fixed the trace file searching mechanism in io_lib. When loading an
experiment file with LN/LT lines, we now first search for the trace file
relative to the location of the experiment file.
io_lib/read/Read.c
io_lib/read/translate.[ch]
16th November 1998, James
-------------------------
4. Added NT (NoTe) and GD (Gap4 Database) line types to the experiment file.
io_lib/exp_file/expFile.[ch]
24th September 1998, James
--------------------------
1. The scf reading and writing code now handles traces with zero bases.
Previously this failed after a malloc(0).
io_lib/scf/read_scf.c
io_lib/scf/write_scf.c
2. The ABI file reading code has been tidied up. It now also supports
conversion of more ABI fields, including RUND, RUNT, SPAC(2), CMNT, LANE and
MTXF.
io_lib/abi/seqIOABI.c
17th July 1998, James
---------------------
1. Extract_seq now copes with sequences containing no SQ line (instead of just
SEGV).
io_lib/progs/extract_seq
9th July 1998, James
--------------------
1. Enforce IUBC code set in io_lib when converting from trace (any format) to
experiment file. We leave the IUBC 'N' intact.
io_lib/read/translate.c
28th May 1998, James
--------------------
1. Added a read_sections() function to io_lib so that programs can state
which bits of a trace file they are interested in. The loading code only
then parses those bits. This can give big speed increases to things like
init_exp, which only wants bases and does not care about the delta-delta
format of SCF trace data.
io_lib/read/Read.h
io_lib/read/translate.c
io_lib/scf/scf.h
io_lib/scf/read_scf.c
io_lib/abi/seqIOABI.c
io_lib/alf/seqIOALF.c
init_exp/init_exp.c
3. Extract GELN (gel name) from ABI file when converting to SCF.
io_lib/abi/seqIOABI.[ch]
2. Improved the makeSCF -normalise option. Background subtraction is now
cleaner (and simpler) and it also now scales the heights. Moved it to io_lib
as it's now freely available.
io_lib/progs/makeSCF.c
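The read_sections() idea in item 1 above (a program states which parts of a
trace it wants and the loaders skip the rest) can be pictured as a bitmask
that each loader consults. The flag names, values and helper functions below
are hypothetical and are not copied from Read.h.

```c
/* Hypothetical section flags; io_lib's real names and values may differ. */
enum {
    SEC_BASES    = 1 << 0,
    SEC_SAMPLES  = 1 << 1,
    SEC_COMMENTS = 1 << 2,
    SEC_ALL      = SEC_BASES | SEC_SAMPLES | SEC_COMMENTS
};

static int wanted_sections = SEC_ALL;   /* default: parse everything */

/* Record which sections the caller is interested in. */
void set_wanted_sections(int sections)
{
    wanted_sections = sections;
}

/* A loader asks this before doing the work of decoding a section. */
int section_wanted(int section)
{
    return (wanted_sections & section) != 0;
}
```

A program that only needs the base calls would request SEC_BASES once at
start-up, and a loader would then skip decoding the sample data.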
23rd March 1998, James
----------------------
1. Removed the change made on 7th May 1997 to seqIOPlain.c. This code is used
by extract_seq, and so clipping in seqIOPlain causes double clipping (and
hence wrong sections).
io_lib/plain/seqIOPlain.c
11th March 1998, James
----------------------
2. Removed the requirement of EXP_FILE_LINE_LENGTH in exp_fread_info().
This allows for (eg) tags with very long comments to be read in without
being truncated.
io_lib/exp_file/expFileIO.c
4th March 1998, James
---------------------
1. Following advice from Leif Hansson <leif.hansson@mbox4.swipnet.se>, the ALF
reading code now reads the "Raw data" subfile when the "Processed data"
subfile is not present, as "Processed data" is apparently an optional output
of the pharmacia software. Raw data is in the same format, although I do not
know what processing takes place to convert it to Processed data. (Looking at
some real traces, apparently none!)
io_lib/alf/seqIOALF.c
24th February 1998, James
-------------------------
1. Added an ABI in MacBinary format file type detector so that these are
now autodetected.
io_lib/utils/traceType.c
15th January 1998, James
------------------------
1. Rewrote the delta_samples1/2 functions to be faster. They now take between
0.55 and 0.7 times the original time.
io_lib/scf/misc_scf.c
4th December 1997, James
------------------------
1. First post-release bug fix.
Io_lib incorrectly sets read->trace_name when reading anything except SCF files.
This means that when outputting to an experiment file no LN line is present.
io_lib/read/Read.c
1st October 1997, James
-----------------------
1. Allow for SCF files to contain 0 bases. This mainly affects memory
allocation, but also the display widget.
io_lib/scf/read_scf.c
io_lib/utils/read_alloc.c
28/29th August 1997, James
--------------------------
2. Added a few changes to make the code more portable for the Mac. Not really
used at present.
Misc/os.h
Misc/files.c
io_lib/utils/traceType.c
io_lib/read/translate.c
io_lib/utils/compress.c
30th June 1997, James
---------------------
1. The exp2read function produced invalid rightCutoff values (INT_MAX) when no
QR line is present. It now correctly sets it to 0.
io_lib/read/translate.c