Version 6.6.0 15-Jul-2013
Restrict and related applications now ignore matches where the
enzyme site is wider than the sequence length.
The SRS server at EMBL-EBI no longer serves the EMBL database!
EBI's SRS server databases in server.srs have been updated to
reflect their reduced service.
Reading large sequences is more efficient. Reference counted
strings are used for output. Where gaps do not need to be
replaced, a single copy of the sequence string is used for input,
processing and output.
New sequence format iguspto supports a variant of the
intelligenetics format with tolerance for format variants on
input.
Calculation of isoelectric point has been updated to use the same
data values as Expasy and the Open Bio packages. New data file
Epkexpasy.dat holds the values used by Expasy.
The final position of the reverse strand is now correctly numbered
in the output of sixpack and showseq.
Eukaryote join features in union were not correctly copied after
subfeatures were implemented to hold exons. The union code now
correctly relocates subfeatures.
Complex (join) feature positions were not relocated when the
parent sequence was trimmed by start and end position. This was
introduced when subfeatures were implemented, and is now
corrected.
New option -methionine for transeq translates any start codon as
methionine when a specific range is given (including 1 to end) and
an alternative genetic code is specified.
Wildcard filenames were broken by the query language rewrite. The
previous functionality is restored. Any query can use a wildcard
filename with '*' or '?' characters. The order in which files are
processed is determined by the operating system.
Dbxreport and dbxstat now support databases with a dbalias
(alternative base name for the database files).
Restriction digest applications occasionally reported more than
one identical match where several enzymes recognize the same
target site. The testing of isoschizomers has been improved to
catch these cases. In practice most runs are with only a few named
enzymes with different sites.
Fragment lengths in restrict are now included as extra columns in
the output, giving the fragments to the 5' and 3' side of each cut
in the forward strand. Note that the output includes all possible
cut sites, though it may be impossible for a double digest to
physically cut at each of two closely spaced sites.
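As an illustration of the extra fragment columns, the following Python sketch computes the 5' and 3' fragment lengths for each cut on the forward strand. It is not the restrict source code. The function name, the linear-sequence model and the convention that a cut at position p falls immediately after base p are all assumptions made for this example.

```python
def flanking_fragments(cuts, seq_len):
    """Return (cut, length_5prime, length_3prime) for each forward-strand cut.

    Assumptions for this sketch (not taken from restrict itself):
    the sequence is linear, and a cut at position p falls after base p.
    """
    ordered = sorted(set(cuts))
    # Sequence start and end act as the outermost fragment boundaries.
    bounds = [0] + ordered + [seq_len]
    result = []
    for i, cut in enumerate(ordered, start=1):
        five_prime = cut - bounds[i - 1]   # distance back to previous cut or start
        three_prime = bounds[i + 1] - cut  # distance on to next cut or end
        result.append((cut, five_prime, three_prime))
    return result


if __name__ == "__main__":
    for row in flanking_fragments([10, 3], 15):
        print(row)
```

Every cut site is reported here regardless of spacing, which mirrors the note above that all possible cut sites are listed even when two sites are too close for a double digest to cut both.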
The -name option of restrict had no effect on report output and
has been removed.
Cachedbfetch corrects bad EDAM references in the definitions
returned by EBI's dbfetch and wsdbfetch servers, which use
EDAM_syntax: where EDAM_format: is expected.
Sequence identifiers now remove characters that may confuse output
file generation, changing to underscore any forward or backslash
(interpreted as host system paths), commas, semicolons and colons.
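The identifier cleanup described above can be sketched in a few lines of Python. This is an illustration only, not the EMBOSS C implementation, and the function name is hypothetical.

```python
# Characters named in the entry above: forward slash, backslash,
# comma, semicolon and colon.
_UNSAFE = "/\\,;:"
_TABLE = str.maketrans(_UNSAFE, "_" * len(_UNSAFE))


def sanitize_identifier(name: str) -> str:
    """Replace each unsafe character with an underscore (hypothetical helper)."""
    return name.translate(_TABLE)


if __name__ == "__main__":
    print(sanitize_identifier("db:id/1\\x,y;z"))
```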
Sequence input now warns for bad sequence characters when the
format is known. When auto-detecting the format the warnings are
turned off so that failed formats can silently be ignored, but
when reading further sequences from the same input file warnings
are enabled. They can be disabled for individual format parsers by
passing zero as the format code to seqAppendWarn.
New nibble (nib) format stores sequence data in half-byte binary
compressed format. The format is available for input and output
but, as a binary format, it can only be read from a file, not from a pipe.
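To show what half-byte storage means in practice, here is a minimal Python sketch that packs two bases into each byte. The code table, the padding rule and the function names are assumptions chosen for the example. They are not claimed to match the on-disk layout that EMBOSS reads and writes, which also has details (such as any header) not covered here.

```python
# Hypothetical 4-bit codes for illustration only.
CODES = {"T": 0, "C": 1, "A": 2, "G": 3, "N": 4}
BASES = {value: base for base, value in CODES.items()}


def pack_nibbles(seq: str) -> bytes:
    """Pack bases two per byte, first base in the high-order four bits."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        high = CODES[seq[i].upper()]
        # An odd-length sequence leaves the final low nibble as zero padding.
        low = CODES[seq[i + 1].upper()] if i + 1 < len(seq) else 0
        out.append((high << 4) | low)
    return bytes(out)


def unpack_nibbles(data: bytes, length: int) -> str:
    """Reverse pack_nibbles; length is needed to discard any padding nibble."""
    bases = []
    for byte in data:
        bases.append(BASES[byte >> 4])
        bases.append(BASES[byte & 0x0F])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack_nibbles("ACGTN")
    print(len(packed), unpack_nibbles(packed, 5))
```

Because the sequence length has to be known before the packed bytes can be decoded, a format of this kind is naturally read from a seekable file rather than a stream.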
New GDE format for sequence input and output - a simple format
with a #id prefix.
Support added for SwissProt OH (viral host) records.
New sequence input associated qualifier (available for all
sequence inputs) -squick reads only the id, accession, description
and sequence, saving unnecessary parsing of more complex input
formats such as swissprot, embl and genbank.
String parsing objects are now reused rather than deleted to save
memory reallocation in parsing input streams with a large number
of entries. Input source code now uses reusable token objects
cleared only when the program exits.
Acdpretty now correctly preserves in-line comments in ACD files.
Efficiency improvements in matching sets of characters in strings,
especially in functions used for each entry in a large set of
input sequences.
New applications xmlget and xmltext read XML data, for example
from dbfetch:embl which offers emblxml format. Output can be as
input or in reformatted versions.
QA tests of EMBASSY applications look in a test/data directory in
the EMBASSY package as an alternative place for data files
prefixed by TESTDATA:
Clustal omega data types added to knowntypes.standard file.
Ranges can use a syntax of start+len or start,+len to give the
length rather than the end position. The end is calculated from
the start and length and used internally. This syntax allows a
closer fit to the command line of primer3_core in eprimer32 where
ranges in the native application are always specified as start and
length.
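The start-and-length syntax can be illustrated with a short Python sketch. It assumes 1-based inclusive positions, so that the end is start + len - 1. That convention, the function name and the regular expression are assumptions for this example rather than details taken from the EMBOSS source.

```python
import re

# Accepts "start+len" and "start,+len", with optional surrounding spaces.
_RANGE_RE = re.compile(r"^\s*(\d+)\s*,?\s*\+\s*(\d+)\s*$")


def range_from_length(text: str):
    """Convert a start+len range into (start, end), assuming inclusive ends."""
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a start+len range: {text!r}")
    start, length = int(match.group(1)), int(match.group(2))
    return start, start + length - 1


if __name__ == "__main__":
    print(range_from_length("10+5"))
    print(range_from_length("10,+5"))
```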
List file inputs now report an error if any text follows the first
token on a line, unless it is a comment following a '#'
character. Previous versions treated any remaining text as a
comment and silently ignored it.
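The stricter list file rule can be sketched as follows. This Python function is a hypothetical illustration, not the EMBOSS parser, and it makes the simplifying assumption that a '#' anywhere on the line starts a comment.

```python
def parse_listfile_line(line: str):
    """Return the single token on a list file line, or None if there is none.

    Raises ValueError when extra text follows the first token and that
    text is not a comment introduced by '#'.
    """
    content = line.split("#", 1)[0]  # drop any trailing comment
    tokens = content.split()
    if not tokens:
        return None  # blank line or comment-only line
    if len(tokens) > 1:
        raise ValueError(f"unexpected text after {tokens[0]!r}: {tokens[1]!r}")
    return tokens[0]


if __name__ == "__main__":
    print(parse_listfile_line("seq1.fasta  # first entry"))
```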
New sequence format iguspto supports a multi-line IG format used
by the US Patent Office. The multi-line descriptions are preserved
only if EMBOSS reads and writes in this format. We can add the
capability to any other multi-line input format where the original
description lines should be preserved. Other formats treat
descriptions as a single record to be wrapped where there is a
maximum record length (e.g. in EMBL format).
Programs dreg and preg now only report sequences where a pattern
match was found, which is the same behaviour as fuzznuc, fuzzpro
and fuzztran.
New code added to handle xml datatype. Supports multiple named XML
formats, using the DOM parsers to interpret data. Multiple XML
input formats are supported, but on output, in the absence of a
conversion method, the original XML is normally reported as plain
"xml" format.
In database definitions, "example" is now a list attribute which
can appear multiple times, allowing multiple example queries to be
defined as separate records, with possible documentation following
a '!' delimiter.
Showserver now scales the column headers better for long cache
file names.
Showdb now displays the taxons, examples, and aliases defined for
a database. Examples and aliases can be preceded by a count of the
number of each. All columns are displayed with -full; individual
elements are controlled by -numtaxons, -taxscope (-taxonomy is a
database type option), -examples, -numexamples, -aliases and
-numaliases.
Showdb now displays a count of the number of fields in addition to
the list of field names. New command line qualifier -numfields
controls the display of the field count.
Showdb now displays all types defined for a database, separated by
commas, but will only display a database once so that, for
example, a protein and protfeatures database will appear in the
protein database set first (if displayed). If only the features
databases are displayed then it will appear with them.
Showdb no longer shows the access levels (id, query and all) by
default for a database. New command line qualifier -access or the
existing -full qualifier will show these values.
Entrez access was specific to sequence data retrieval. Entrez
server retrievals can now automatically detect ID and accession
fields and read text entries with textget where a text format is
available.
Genbank-related protein formats Refseqp and Genpept are updated to
process all record types. Genpept feature handling is updated to
correct the handling of multiple locations by using subfeatures.
GenBank and Refseq formats now handle the full set of record types
including common species names, reference details and comments.
Dbtell -full reports any alias names for a database after the
definition.
Dbtell recognizes alias names for a database, reporting the master
database definition and a comment describing where the alias is
defined.
Dbtell -server reports the database definition for a server. All
attributes are reported in the database definition, whether
defined for the database or at the server level.
Servertell -full now reports the definitions of all databases for
the server, including all aliases defined in the server definition
file. Without -full an extra comment line in the output suggests
running with -full for more detailed information.
Restrict output now sorts by the position closest to the start for
matches on the reverse strand (for an asymmetric target
site). This sort change can produce additional matches in the
output of restover.
Embossversion is now set to fail with a message if the update
information URL is unreachable.
HTTP and FTP error messages were simplified and blank lines removed.
The valgrind.pl script has a new qualifier -debug which runs the
test with -debug on the command line.
Needle, needleall and water now fail with a "die" message if there
is insufficient virtual memory to calculate the alignment between
two long sequences.
Indexing with the dbx applications miscalculated the secondary
page capacity when the secondary page size is less than the
primary page size.
Ranges in a file can use a dash as a delimiter for the start and
end positions in addition to white space.
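A range line with either delimiter could be read along these lines. The Python sketch below is illustrative only. The function name is hypothetical, and it handles just the two positions, without making any claim about what else a range file line may contain.

```python
import re

# Start and end separated by a dash (optionally spaced) or by white space.
_LINE_RE = re.compile(r"^\s*(\d+)(?:\s*-\s*|\s+)(\d+)\s*$")


def parse_range_line(line: str):
    """Return (start, end) from a line such as '10-20' or '10 20'."""
    match = _LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a start/end range: {line!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_range_line("10-20"))
    print(parse_range_line("10 20"))
```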
For all data types, format names can be replaced by EDAM format
term identifiers, for example 1927 for "embl". The format terms
are defined in the source code. We will need to define aliases or
use more complex queries if a format splits into a hierarchy,
but this is unlikely in most cases.
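The substitution of an EDAM identifier for a format name amounts to a table lookup, sketched below in Python. Only the 1927 to "embl" pair comes from the entry above. The table and function names are hypothetical, and the real definitions live in the EMBOSS source code.

```python
# Stand-in lookup table; only the mapping stated in the text is included.
EDAM_FORMATS = {"1927": "embl"}


def resolve_format(name: str) -> str:
    """Map an EDAM format term identifier to a format name.

    Anything not found in the table is returned unchanged, so ordinary
    format names pass straight through.
    """
    return EDAM_FORMATS.get(name, name)


if __name__ == "__main__":
    print(resolve_format("1927"))
    print(resolve_format("fasta"))
```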
On FreeBSD systems embossversion source code has quotes corrected
on the line that reports FreeBSDLF is defined.
Version 6.5.0 15-Jul-2012
On Windows (mEMBOSS) the user home directory is checked for the
.embossrc file and .embossdata directory, using emboss.default for
settings defined for all users.
Database definitions with multiple types and formats now check
that there is at least one valid format defined for each type of
data.
The qatest.pl script handles references to the user's home
directory on Windows. "~/" is replaced with the user's home
directory, with the full path or filename enclosed in quotes.
The qatest.pl script has a new qualifier -debug which runs the
test with -debug on the command line. For ACD utility tests the
application name is taken from the first command line parameter
and will not match the debug file so these will give an error for
an unknown .dbg file. For all other tests this is a simple way to
obtain debug output for a problematic test result.
EMBOSS supports SOAP protocol access using the Apache axis2c
library. We use version 1.6.0 for testing. Installation can be
tricky on some systems. We are happy to help anyone who finds
problems. A copy of the library is included in the initial 6.5.0.0
mEMBOSS build.
Date parsing for EMBL, GenBank, SwissProt, Refseq and related
formats has been made more robust.
New application embossupdate checks for the availability of an
updated EMBOSS distribution or patches from the EMBOSS website and
FTP server. Embossupdate can be run at the end of a successful
installation or reinstallation. We hope this will help our
users to keep their versions up to date more easily.
Feature data can be read from PIR and GCG formatted databases.
EDAM is updated to release 1.1. EDAM is used to define EMBOSS and
EMBASSY applications, to describe EMBOSS defined databases and
entries in the DRCAT data resource catalogue. This is a prerelease
from the EDAM team to ensure EMBOSS has the most recent set of
terms.
Lists and tables now support very large numbers, requiring long
integers (datatype ajulong) to represent the return values from
ajListGetLength, ajTableGetLength and ajTableGetSize. Further
extensions are planned in future releases.
Directory inputs now interpret ~/ or ~user/ in the user response
in the same way as file inputs.
Application embossversion -full now reports the versions of all
libraries, and all configuration settings used to compile EMBOSS,
plus the sizes of standard data types.
Dbxfasta has a new format "idsv" which finds sequence version
values if the accession number has a .number suffix.
Dbxflat creates a sequence version for UniProt entries using the
accession number and the sequence version from the DT records.
Dbx indexing stores secondary reference file positions only if the
database has more than one data file per entry. The entries file
records the number of files in the database and can if needed
store more than one reference file. Identifiers indexes can store
more entries per page for databases with one file (embl, uniprot),
but support reference files for gcg, pir and taxonomy indexing.
Dbx indexing supports separate caches for primary and secondary
pages. Larger caches can reduce the number of physical reads and
writes at the cost of a small increase in CPU time. The organism
and description indexes for large databases can have terms that
appear in a very large number of entries (e.g. 'protein' in
UniProt or 'bacteria' in EMBL). Secondary cache sizes up to 100k
can be used to try to reduce the physical page rewrites needed as
these indexes grow.
Dbx indexing supports a smaller size for secondary index
pages. These hold the lists of entry ids for indexed strings, and
the file offsets for non-unique identifiers (e.g. secondary
accession numbers). The environment variable EMBOSS_SECPAGESIZE
defaults to 512, a quarter of the EMBOSS_PAGESIZE value of 2048.
Resource definitions can specify field-specific secondary page
sizes using, for example, accsecpagesize: "256".
Dbx indexing applications (dbxflat, dbxfasta, dbxgcg, dbxedam,
dbxobo, dbxresource, dbxtax) secondary index files (e.g. keyword,
taxonomy and description indexes) are more compact. The entry ids
for each keyword are stored as a simple list unless more than one
index page is needed. As most indexed tokens are in only a few
entries this saves many pages while the index is being built. The
compressed index size is also smaller.
Dbxflat, dbxfasta and dbxgcg now report index terms that exceed the
maximum length (attributes idlen, acclen, deslen, orglen, keylen,
svlen, gilen). Each term beyond the current maximum is
reported. When the run is completed, the longest term length for
each index field is reported so that excessively large values can
be reduced.
Dbxflat, dbxfasta and dbxgcg have improved memory efficiency on
large indexing runs. Many more internal data structures are reused
in the parsers.
Window length options are renamed to -window consistently across
all EMBOSS applications. The change applies to pepwindow and
pepwindowall.
Multiple inputs to einverted gave inconsistent results as two
internal variables were not reset for each new sequence.
Resource definitions for uniprot (swissresource) and embl
(emblresource) are updated to allow the maximum size for database
index keys. If the database contains longer values in future they
will be truncated and the maximum size found by the parser will be
reported by dbxflat.
New resource definitions chebiresource and sworesource are
provided in emboss.standard to index ontologies with
exceptionally large index keys.
Ontologies CHEBI, ECO, GO, PW, RO, SO are updated.
Ontology SWO is added. This is the software ontology, in its OBO
format. Some identifiers are really URLs.
Sequence and other databases with an organism ('org') or taxonomy
('tax') index can restrict retrieval to one or more indexed
organism names or any other indexed level in the
taxonomy. Examples include EMBL or UniProt whether indexed locally
with dbxflat or accessed through the EBI's SRS server as srs:embl
or srs:uniprot. A new database attribute 'organisms' can be used
to define one or more organisms or taxonomy levels to restrict
data retrieval from the master index of the complete file. A value
using EMBOSS query syntax of "rattus|mus" will allow data from
both genera to be retrieved. Values can also be separated by tabs,
commas ',' or semicolons ';'. As organisms can include spaces we
chose not to allow space as a delimiter. The organisms attribute
is implemented for method "emboss" and "srswww" to allow remote
retrieval. We can implement organisms for other access methods if
there is a demand from the user community.
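The delimiter choice for the 'organisms' attribute can be sketched in Python as follows. This is an illustration of the rule described above, not the EMBOSS code; the function name split_organisms is hypothetical, and '|' is treated here simply as one more separator.

```python
import re

# Delimiters named in the text: '|', tab, comma and semicolon.
# A plain space is deliberately NOT a delimiter, because organism
# names such as "Homo sapiens" contain spaces.
ORGANISM_DELIMITERS = re.compile(r"[|\t,;]")


def split_organisms(value):
    """Split an 'organisms' attribute value into individual names."""
    parts = ORGANISM_DELIMITERS.split(value)
    return [name.strip() for name in parts if name.strip()]
```

With this rule "rattus|mus" yields two genera, while "Homo sapiens" stays a single name.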
Ontology databases can combine more than one branch of an ontology
in a single file. Examples include the Gene Ontology (GO) with
namespaces for cellular_location, molecular_function and
biological_process and EDAM with data, format, identifier,
operation and topic. A new database attribute 'namespace' can be
used to define one or more namespaces to restrict data retrieval
from the master index of the complete file. This is tricky for
EDAM data which is in the data or identifier namespaces. A value
using EMBOSS query syntax of "data|identifier" or spaced with
"data identifier" will allow data from both namespaces to be
retrieved. The namespace attribute is implemented for method
"emboss" (how the ontologies are indexed in the distribution) and
"srswww" to allow remote retrieval. We can implement namespace for
other access methods if there is a demand from the user community.
EDAM release 1.0 is included. Major changes were needed to EMBOSS
internals as the identifiers are all changed (different term ID
number and different prefix). ACD files and the DRCAT data
resource catalogue are updated with the nearest equivalent terms
from EDAM 1.0.
Assembly data is now loaded a few records at a time using a new
"loader" object. This allows very large files to be processed in
chunks.
Variation data is now loaded a few records at a time using a new
"loader" object. This allows very large files to be processed in
chunks.
Support for BioPerl/Open-Bio OBDA flatfile indexes is included as
database access method 'obda'. The indexing in BioPerl 1.6 is
broken for EMBL as the semicolon is not removed from
identifiers. The secondary index files have duplicated
records. Both problems should be fixed in a future BioPerl
release. Note also that OBDA indexing parses only the primary
accession number so that other accessions are not retrievable from
OBDA index files.
EMBL entries with a single (source) feature could ignore the
feature.
Output files for fuzznuc, fuzzpro, fuzztran, dreg and preg
included the pattern name and the pattern string in the last
release. The output format is changed to remove the space between
the pattern name and string so that parsers see the expected number
of space-delimited fields in the output.
The query language parser has been rewritten to handle the new
-iquery and -ioffset qualifiers. Badly formed queries may now
produce different error messages.
Any input type that uses queries, with the exception of URL
inputs, can use two new associated qualifiers. -ioffset is the
initial non-zero offset when reading from a file or a
URL. -iquery is the query field which can be applied to an FTP or
HTTP URL or to any query in a list file. These names also apply to
sequence and feature input where other qualifiers begin with 's'
and 'f' respectively.
FTP and HTTP URLs can now be used directly as input queries for
all data types in place of file names. EMBOSS automatically
detects the ftp:// or http:// prefix and uses the appropriate
protocol. Any query or offset is ignored as there is no way to
distinguish these from a genuine part of the URL.
Patterns for fuzznuc, fuzzpro and fuzztran can include escaped
codes to skip the expansion of ambiguity codes and look for them
explicitly in the input. A backslash (shells may need two) before
the code specifies an exact match, for example \S will only match
S in the input.
Patterns for fuzznuc with ambiguity codes are now expanded to
include the ambiguity code (and any overlapped ambiguity
codes). For example, S matches [GCS] and B (not A) matches
[TGCBSYK].
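The expansion and escape rules can be checked with a small Python sketch. It assumes the standard IUPAC nucleotide codes and treats a code as matching every code whose base set is contained in its own, which reproduces the two examples given (S and B). The function names are hypothetical, and the sketch handles plain strings of codes only, not the full pattern syntax of the applications.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_code(code):
    """Return every code whose bases all fall within those of `code`."""
    wanted = set(IUPAC[code])
    return {c for c, bases in IUPAC.items() if set(bases) <= wanted}


def pattern_to_regex(pattern):
    """Turn a plain string of codes into a regular expression.

    A backslash before a character requests an exact match, so the
    escaped code is not expanded.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # literal character
            i += 2
            continue
        if ch.upper() in IUPAC:
            parts.append("[" + "".join(sorted(expand_code(ch.upper()))) + "]")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)
```

In this sketch an unescaped S becomes the class [CGS], while an escaped S matches only the letter S in the input.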
A new AJAX source file ajtagval.c handles general tag-value pairs
of strings which have uses beyond feature internals.
Pepwheel can plot up to 5 sets of residues, with a total of
"steps" at each level. Leucine zipper plots with a step of 7 and 2
turns required more residues to be visible. The updated pepwheel
rescales the size of the inner wheel to allow more residues to be
displayed.
Sequence and assembly reading in BAM format always fails if no
match was found in the first pass - attempting to read again could
loop with the same result as the file is rewound. Rereading is
intended for text formats such as FASTA where the next entry may
match.
Header files in AJAX and NUCLEUS have been cleaned to remove
redundant references. A new include file ajlib.h includes the core
set of ajdefine, ajarch, ajmem, ajmess, ajfmt and ajstr which were
almost universally included. Applications are expected to use
emboss.h as their only include, but references to ajax.h and
emboss.h in the libraries are now all replaced with the minimally
required set of include files.
The server.entrez file has been updated using a script
serverentrez.pl which queries Eutils to obtain a list of database
names and fields. An internal array is used to define the
datatypes and formats for each database as these are defined only
in a series of HTML tables in other pages.
Reading from the NCBI Entrez server failed. The cause was trimming
newlines from a reference-counted string where the data returned
has CR-LF format but only one character was removed.
New xygraph output device support for datafile formats. "bedgraph"
outputs in BedGraph format. "wig" outputs in Wiggle format.
The "sequence" attribute is implemented for xygraph outputs. If
set true, the X-axis label defaults to the name of the first input
and the source name used in datafile outputs is also the name of
the first input.
Dottup and dotmatcher now have the first sequence on the X axis
and the second on the Y axis. This follows standards for datafile
output of graphical data which default to the X axis relating to
the first input sequence.
Dbx index files from earlier releases defaulted to "secondary"
indexes. The test for an index with no "Type" parameter defined
now picks up the standard Identifier indexed fields (id, acc, sv
and gi) correctly. The files were identified by field name, but
the test was using the file extension.
Fuzznuc, fuzzpro, fuzztran, dreg and preg when searching with a
regular expression found only the largest possible match at each
start position. A new function in recent releases of the PCRE
regular expression library supports searching for all matches
using function ajRegExecallC instead of ajRegExecC. These
applications can now find all overlapping matches to a pattern
using a regular expression.
The PCRE library is updated to include the pcre_dfa_exec
function. This is called by ajRegExecall and ajRegExecallC. The
regular expression can be compiled as usual. The new calls set an
internal value to the number of matches found, retrievable by
ajRegGetMatches. Offsets (ajRegOffsetI) and substrings (ajRegSubI)
return these matches, starting at zero which is the longest match
(the same as in ajRegExec). Any shorter matches with the same
start are stored in place of bracketed substrings.
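The idea of reporting every match that shares a start position, longest first, can be illustrated in Python. Python's re module has no equivalent of pcre_dfa_exec, so this sketch swaps in a brute-force test of every possible end position; it shows the shape of the result only and says nothing about how the PCRE or AJAX functions are implemented. The function name is hypothetical.

```python
import re


def all_matches_at(regex, text, start):
    """Return all non-empty matches of `regex` beginning at `start`.

    Matches are listed longest first, mirroring the ordering described
    above where match zero is the longest one.
    """
    compiled = re.compile(regex)
    hits = []
    # Try every end position from the end of the text back to start + 1.
    for end in range(len(text), start, -1):
        if compiled.fullmatch(text, start, end):
            hits.append(text[start:end])
    return hits
```

For the expression "a+" against "aaab" at position 0 this gives three matches of decreasing length, where a conventional search would report only the longest.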
Prettyplot options are changed to remove dependencies on other
options. Option -plurality (which depended on the sequence
alignment weight or the number of input sequences) is now -ratio
with a default of 0.5. This is exactly equivalent to the default
-plurality value or half the total weight. Option -resbreak is
replaced by -blocksperline with a default value of 1. This has the
same default output as the -resbreak option which defaulted to the
-residuesperline value.
All header files now have an @include comment block which includes
the LGPL licence and RCS tags. Header files are commented in
consistent sections. The C++ compile extern wrapper for C
declarations is now a macro to avoid indentation issues in emacs
and other editors.
All obsolete functions are moved to the end of source files and
wrapped in an #ifdef AJ__COMPILE_DEPRECATED block. The configure
option --enable-buildalldeprecated includes these functions in
compilation. Functions described in the 6.2.0 books are included
in a similar AJ__COMPILE_DEPRECATED_BOOK block and built with the
--enable-buildbookdeprecated configure option.
Diffseq produced incorrect results when reporting an insertion in
the second sequence. The error was introduced in release 6.0.0. It
is fixed by defining a "between" location for the insert site in
the first sequence, and by adding support for "between" features
to diffseq and other report formats. A new constructor
ajFeatNewBetween with one position makes creating such features
easier.
New function ajListDrop removes a node from a list by searching
for its address.
Test data includes a new EMBL data file syn.dat containing a
circular sequence.
GFF3 input combines features with the same ID under a generated
parent so that features can be linked as subfeatures and sorted
together. These features are identified by the Flags attribute and
excluded from GFF3 output.
GFF3 output is required to use different feature types for
parent and child. This is broken by the annotated parent feature
we need to represent EMBL/GenBank/DDBJ joins. For these, the
parent has a new type of biological_region with a new featflag
type=CDS (for example) so we can restore the correct internal
representation when reading the GFF3 file.
A new sequence associated qualifier -scircular defines a sequence
input as a circular molecule where this is not defined in the
input format, for example EMBL/Genbank and GFF3 have the
information but FASTA input does not. For feature input there is a
new -fcircular qualifier. Any circular definition in a sequence
format overrides this qualifier. Sequences with features are set
circular if the feature table input is defined as circular.
GFF3 format has been corrected using the online GFF3
validator. Protein feature type names are corrected to use the
current SO term name. Tags are converted to lower case on output
and back to standard case on input, for example /EC_number in EMBL
format, as GFF tags must start in lower case.
In GFF3 protein features now always use '.' for the
strand. Previous releases could also write '+'. Both are
acceptable as input.
GFF3 and GFF2 scores now use a general floating point format to
write 4 significant figures (rather than 3 decimal places) to cope
with very large and very small score values. Trailing zeroes
after the decimal point are omitted in this format. A score of
zero is written as a dot (missing value).
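A minimal sketch of the score formatting in Python, assuming that the "general floating point format" behaves like the C-style %.4g conversion (an assumption; the exact format string used by EMBOSS is not stated here). The function name is hypothetical.

```python
def format_gff_score(score):
    """Format a GFF score with 4 significant figures.

    A score of zero is written as '.', the GFF missing value. The 'g'
    presentation type drops trailing zeroes after the decimal point and
    switches to exponent notation for very large or very small values.
    """
    if score == 0:
        return "."
    return format(score, ".4g")
```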
Sequence queries can use two alternative syntaxes for sequence
ranges. Appending :start:end allows a syntax similar to DAS
queries. Appending :start..end allows a syntax similar to
EMBL/GenBank locations in other entries. Both can be followed by
:r to reverse the sequence region.
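The two range syntaxes can be sketched with one regular expression in Python. This is an illustration rather than the EMBOSS query parser; the function name split_range and the query strings used with it are hypothetical.

```python
import re

# A trailing :start:end or :start..end, optionally followed by :r.
RANGE_SUFFIX = re.compile(
    r"^(?P<query>.+?):(?P<start>\d+)(?::|\.\.)(?P<end>\d+)(?P<reverse>:r)?$"
)


def split_range(query):
    """Split a query into (base query, start, end, reverse).

    Queries without a range suffix are returned unchanged with start
    and end set to None.
    """
    match = RANGE_SUFFIX.match(query)
    if match is None:
        return query, None, None, False
    return (
        match.group("query"),
        int(match.group("start")),
        int(match.group("end")),
        match.group("reverse") is not None,
    )
```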
Sequences and reference sequences can be read from EMBL CON
division entries by using the same database with an ACC (accession
number) index to read the sequence fragments defined in the CO
record(s).
New code added to handle reference sequences in ajrefseq* source
files. The AjPRefseq object will hold large reference sequence data
in managed memory buffers.
Database definitions can use a new attribute "special" to give a
name=value definition for any attribute specific to one access
method. The first instances are SpeciesIdentifier for
ensemblgenomes databases, and tags for processing assembled entries
in CON (constructed) entries in EMBL. ConDatabase is the database
name used, ConField is the index field. By default CON entries use
the ACC field of the same database.
Standardized all licensing references in the libraries to GNU Lesser
GPL version 2.1. Added CVS keywords to record the CVS file
version, and the date and user of the latest commit.
Microbial genomes in ensemblgenomes have an enumerated species
code which must be included in a data retrieval request. The
codes are temporarily added to the comment attribute of the
databases in the server cache file. This will be replaced by a
more complete solution in the next release.
The DRCAT.dat file has a new set of lines to handle Nucleic Acids
Research classifications. A new NARCat line code is now separately
parsed by dbxresource into the NAR category name and the URI.
Long tag values in GFF3 format could exceed limits in the regular
expression. This is fixed by first testing for and replacing
escaped quotes and then using a simpler expression to extract
quoted string values.
When reading ranges from a file the strings were overwritten by
the parser.
Application tcode results disagreed with the original
publication. The calculation parameters have been corrected.
EDAM.obo is updated. 28 terms were added. Descriptions were
updated and names changed.
Short descriptions of EMBOSS and EMBASSY applications have been
updated to use consistent terminology and grammar rules.
Dbxflat failed to parse the organism ('org') field of a GenBank
entry when another secondary field (keyword or description) was
also parsed in the same run.
Dbxflat and dbiflat now use a separate parser for SwissProt format
data files. Previous releases used the EMBL parser which failed to
identify the first word in the specially formatted SwissProt
description records. The change only affects the 'des' index
field.
Reading ABI format failed to read the sample name field and
machine name. The sample name is now correctly parsed. The sample
name is used by EMBOSS as the sequence identifier.
Formats specified on the command line were ignored by database
queries. Ignoring them was correct in previous releases, where
only one format was permitted, but a command line format is needed
from 6.4.0, where a database may have multiple possible formats. Any
format defined elsewhere on the command line is now used if there is
no format in the query string.
ACD files are stricter in checking ambiguous qualifiers. Options
that are also a short form of another qualifier now generate
warnings. These can be turned off with the application attribute
wrapper: "Y" where a third party command line is wrapped.
Showfeat had an option -type which was ambiguous. Changed the
options so those with a match option (-typematch) have a show
equivalent -typeshow to display the column.
Emma had options -dend and -slow which were short forms of other
qualifiers. They are renamed -dendreuse and -slowalign. The old
qualifier name will now give an "ambiguous qualifier" error
message and report the new name.
Eprimer3 and eprimer32 had options -otm and -osize which were
short forms of other qualifiers, and could cause confusion
between optimum and oligo values. They are renamed -opttm and
-optsize.
Helixturnhelix had an advanced option -sd which was a short form
of sequence qualifier -sdbname. It is renamed to -sdvalue.
Prettyplot had an option -box which was a short form of other
qualifiers. It has been renamed -doboxes to match the related
qualifier -docolour.
Showserver had an option -server which was a short form of
-serverversion (itself named to avoid a clash with -version). This
option is now renamed -servername.
Supermatcher and wordfinder had an option -errorfile which was a
longer form of the standard qualifier -error which can suppress
the reporting of error messages. The -errorfile qualifiers are
renamed -errfile.
Revseq added 'Reversed:' to the sequence description. For use
cases where the original sequence description is preferred
(e.g. FASTQ format formatted descriptions) a new -notag option
retains the original description.
Cirdna prints text inside solid blocks invisibly. When printed
outside, the text scaling was too small. The text scale is now
adjusted for the radius and sequence length so that labels should
be readable outside the box.
Fuzznuc, fuzzpro and fuzztran using a pattern file ignored the
command line -mismatch qualifier for the first pattern. The
default mismatch is now set to this value at the start of the
pattern matching loop in the library.
qatest.pl which runs the QA tests now checks for a qatest.dat file
in the EMBOSS source directory and additional qatest.dat files in
the test subdirectory for all EMBASSY packages found under the
source embassy/ directory. By providing individual qatest.dat
files for each package we can simplify testing for a core
distribution. Some of the older EMBASSY packages derived from
domainatrix have cross-dependencies where one test uses the output
of an application from another package. New AX and AY lines define
foreign tests which are executed even where a single EMBASSY
package has been specified with the -embassy=package qualifier on
the command line.
Version 6.4.0 15-Jul-2011
DBXFLAT can index FASTQ format short read sequence files, allowing
individual sequences to be rapidly retrieved by name.
Genpept format has changed since we last tested it. The LOCUS line
is simpler. EMBOSS now supports GenPept as documented and
distributed by NCBI.
Sequence in SAM format ignores the reference sequence
name. Previous releases saved it as the accession number, but this
is inappropriate as it is then reported as the identifier in EMBL
format.
The -help output (and documentation) for align and report output
types now includes the default format if defined in the ACD file.
New code added to handle variation data in ajvar* source
files. The AjPVar object will hold genetic variation data from the
Ensembl API and from VCF input files.
New access methods for URLs have been added as ajurlread.c and for
URL output methods as ajurlwrite.c - supporting collecting and
reporting of URLs as output. URLs are saved as an array of strings,
intended to be reported as a set of links to the underlying data.
Sequence format "raw" now only reads binary files, which means it
cannot be used for piped data. The change was needed to avoid
accepting binary data where a file has a NULL and then no newline,
for example ABI data files where the initial 'ABIF' could be read
as a valid sequence.
Application tcode failed to plot results for more than one
sequence. It also reported a plplot error when reading random
non-coding input. It also failed to report the threshold lines
when they were outside the range of observed scores.
Four new functions combine tables where the keys and values are of
the same types. In each case the tables are resized to the larger
of the hash array sizes, and then at each hash array position all
keys in both tables are compared. The functions differ only in the
actions taken when a match is or is not found. ajTableMergeAnd
keeps all keys that are in both tables. ajTableMergeEor is the
inverse keeping only keys that are in only one table.
ajTableMergeNot removes keys that are also in the second
table. ajTableMergeOr adds keys from the second table that do not
match. All remaining keys and values are deleted using the tables'
built-in destructor functions.
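The four merge outcomes can be summarised with Python dictionaries. This sketch only models which keys survive; the AJAX functions work on hash tables in place and delete the discarded keys and values, whereas these hypothetical helpers return new dictionaries.

```python
def merge_and(first, second):
    """Keep keys of `first` that are also in `second`."""
    return {k: v for k, v in first.items() if k in second}


def merge_eor(first, second):
    """Keep keys that are in exactly one of the two tables."""
    merged = {k: v for k, v in first.items() if k not in second}
    merged.update({k: v for k, v in second.items() if k not in first})
    return merged


def merge_not(first, second):
    """Remove from `first` any key that is also in `second`."""
    return {k: v for k, v in first.items() if k not in second}


def merge_or(first, second):
    """Add keys from `second` that are not already in `first`."""
    merged = dict(first)
    for key, value in second.items():
        merged.setdefault(key, value)
    return merged
```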
Some data resource catalogue applications failed when run with the
-debug option. Their debug calls have been updated.
New application dbtell reports the attributes for a database.
All messages written to the user are also logged to the debug file
to help locate where they are generated when debugging.
Applications showfeat, extractfeat and coderet are updated to
follow the new features/subfeatures data structures.
When using a simple numeric database identifier, the SV field is
only searched if it is defined.
Access to local SRS databases created an invalid command line for
getz with a stray '+' character needed only in the web version.
Nexus format input can now handle a missing taxlabels block by
using the matrix block to read sequence names.
GFF3 tag names are automatically converted to lower case unless
they match a known GFF3 "special" tag name.
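The case rule can be sketched in Python as below. The set of reserved names is taken from the attribute list in the GFF3 specification; whether EMBOSS checks exactly this list is an assumption, and the function name is hypothetical.

```python
# Attribute names reserved by the GFF3 specification. All of them start
# with an upper-case letter. Assumed, not confirmed, to be the set of
# "special" tags that EMBOSS leaves untouched.
GFF3_RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap", "Derives_from",
    "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def gff3_tag_name(tag):
    """Lower-case a tag name unless it is a reserved GFF3 tag."""
    return tag if tag in GFF3_RESERVED_TAGS else tag.lower()
```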
GFF3 format has been rewritten to comply strictly with the GFF3
standard on the sequence ontology website. Characters are now
escaped in tag values. The 'featflag' tag has been changed to
convert the hex value into a readable list of flags, with some
flags now inferred from the content of the GFF line. The GFF3
special tags (all starting with an upper-case letter) are now
stored separately. The ID and Parent tags are used in
post-processing to build subfeatures which are stored under the
feature with an ID matching their first Parent tag.
GFF3 input requires the optional EMBOSS type comment to identify a
protein GFF3 file as there is currently no safe way to distinguish
protein from nucleotide features using only the standard GFF3
format.
GFF3 sequence format failed to read files with additional
## comment records after the header block. These comments are now
ignored.
Feature objects have been extended. A feature may now include a
list of subfeatures. This is intended to allow exons to be stored
under the feature to which they belong. With this new structure,
sorting feature tables becomes easy as there is no need to match
group tags and sort by ID. Features simply sort by their main
(parent) feature, with the other subfeatures (exons) unseen by the
sort algorithm.
Application restrict crashed when the enzyme list was empty. It
reported invalid enzyme names, but not 'no enzyme name given'.
Reference-counted lists are enabled with the constructor
ajListNewRef creating a reference-counted copy. Lists are only
deleted when the reference count falls to zero.
Reference-counted tables are enabled with the constructor
ajTableNewRef creating a reference-counted copy. Tables are only
deleted when the reference count falls to zero.
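The reference-counting behaviour for lists and tables can be modelled with a short Python sketch. It assumes that a "reference-counted copy" is the same object with its count incremented; the class and method names are hypothetical and do not correspond to AJAX functions.

```python
class RefCounted:
    """Toy model of a reference-counted container."""

    def __init__(self, payload):
        self.payload = payload
        self.refs = 1          # the creator holds the first reference
        self.deleted = False

    def new_ref(self):
        """Hand out another reference to the same object."""
        self.refs += 1
        return self

    def delete(self):
        """Drop one reference; free the payload only at zero.

        Returns True if the object was actually deleted.
        """
        self.refs -= 1
        if self.refs == 0:
            self.payload = None
            self.deleted = True
        return self.deleted
```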
Table code has been rewritten to automatically delete keys unless
the table is created with a Const version of the constructor. All
table constructors are renamed, with the older names retained as
"deprecated" functions which do not delete keys or values. All
EMBOSS code has been changed to use the new function names.
New functions ajTableMatch, ajTableMatchC and ajTableMatchS test
whether a key is present in a table. They can be used where
ajTableFetch is inadequate because the value may be NULL. Some code used
ajTableFetchKey but this is intended only for case-insensitive keys.
Tables (AjPTable) have defined functions to hash and compare
keys. Two new functions can be defined to delete keys and
values. By default these are NULL and no keys or values are
deleted. The functions can be ajMemFree to simply free memory, or
more complex object destructors. As these require a void** argument
(all keys and values are void* internally) wrappers are needed
around object destructors. We recommend appending 'Void' to the
standard destructor name and casting the void** argument to pass
to the object-specific destructor.
Tables (AjPTable) can be resized using the ajTableResizeLen
function. When adding to a table with ajTablePut the table is
automatically resized when the number of entries exceeds an
average of 8 per bucket.
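A toy hash table in Python shows the resize trigger. The threshold of 8 entries per bucket comes from the text; doubling the number of buckets on resize is an assumption made for the sketch, as are the class and method names.

```python
class BucketTable:
    """Chained hash table that grows when buckets get too full."""

    MAX_AVERAGE = 8  # average entries per bucket before resizing

    def __init__(self, size=4):
        self.buckets = [[] for _ in range(size)]
        self.count = 0

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # replace, count unchanged
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > self.MAX_AVERAGE * len(self.buckets):
            # Assumption: double the bucket array on each resize.
            self.resize(2 * len(self.buckets))

    def get(self, key):
        for existing, value in self.buckets[hash(key) % len(self.buckets)]:
            if existing == key:
                return value
        return None

    def resize(self, size):
        """Rehash every entry into a bucket array of the new size."""
        entries = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(size)]
        for key, value in entries:
            self.buckets[hash(key) % size].append((key, value))
```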
Function ajMemFree now accepts a void** argument and sets the
pointer to zero after freeing the memory. All EMBOSS code calls this
through the AJFREE macro which is now safer to use as the pointer
appears only once in the generated code.
Application digest conflicted with the name of a utility on some
systems. It has been renamed to pepdigest.
In the emboss.standard and emboss.default files certain attributes
can appear more than once if defined as type "ATTR_LIST" in the
ajnam.c source file. These include a new attribute 'field:' defined
once for each database query field, superseding the 'fields:'
list of field names. The 'field:' attribute has a list of field
names, with the first being the name preferred by EMBOSS and
others acceptable on the command line. A '!' delimiter marks the
end of the field names and the start of a free text description.
This style of description is also allowed for other attributes,
including 'taxon:' and the 'edam*:' attributes. The syntax is
taken from the metadata in OBO format.
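The 'field:' value layout can be sketched in Python. The sketch assumes the field names are separated by white space, which is not stated above, and the function name and example values are hypothetical.

```python
def parse_field_attribute(value):
    """Split a 'field:' value into (preferred name, aliases, description).

    Everything before the first '!' is a list of field names; everything
    after it is a free text description.
    """
    names_part, _, description = value.partition("!")
    names = names_part.split()  # assumption: white-space separated names
    if not names:
        raise ValueError("no field names before '!'")
    return names[0], names[1:], description.strip()
```

The first name is the one preferred by EMBOSS; any others are accepted on the command line.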
Data retrieval using the HTTP protocol now checks for redirects in
the header and replaces the file buffer with the results from the
new URL. This allows EMBOSS to read outdated URLs for database
access.
New trace functions ajTableFetchTrace and ajTablePutTrace help to
debug adding new keys to a table.
New parsing function ajStrTokenNextParseDelimiters returns the
delimiter string in addition to the token parsed from a string
token handler.
Application einverted could report a bad alignment if the matched
region reached the end of the search window. Matches which go
beyond the search window are now ignored. This bug was reported
with a very low threshold score and was unlikely to be noticed
with the default settings.
Sequence format treecon failed if the only line of input started
with a number. Failure to find a second record now simply returns
false.
Tables can now use integer keys and values of four types - integer
and long, signed and unsigned. The unsigned longs are used
internally for emblcd index reading and for b+tree index creation.
Report output from pattern matching applications (fuzznuc,
fuzzpro, fuzztran, dreg, preg) now includes the pattern as well
as the pattern name in the '*pat' or 'Pattern_name' feature tag
value.
New applications search the EDAM ontology by each of its query
fields, with common options to restrict the results to one of the
7 EDAM namespaces. Further new applications look for EDAM terms
with each of the 5 common relationships for EDAM data terms:
has_input, has_output, is_identifier_of, is_format_of and
is_source_of. The sixth relationship has_attribute is only used by
the obsolete 'entity' namespace terms.
New application dbxresource indexes the data resource catalogue
DRCAT.dat which is distributed with EMBOSS. Most fields in DRCAT
are indexed. The EDAM and Taxon fields are used by other
applications to search the EDAM and TAXON databases for terms which
are in turn used to select DRCAT entries by taxon, data type,
format, identifier and resource.
Any menu (list and selection ACD types) which allows all options
to be selected now accepts "*" to select everything. This can be
the default (e.g. for database index fields) or can be specified
by the user with quotes to protect it from interpretation by the
Unix shell.
Tokens indexed with the dbx* programs now have white space indexed
as underscores. Any index files with spaces in the tokens need to
be re-indexed. This applies to keyword and organism indexes.
New code added to handle short read assemblies in ajassem* source
files. The AjPAssem object will hold large numbers of short reads
in managed memory buffers.
New template for adding data types with specific formats for input
and output and data access methods. These templates are stored in
ajwxyz* source files with a script newdatatypes.pl to
automatically create new, properly named, stub functions in the
AJAX core and ajaxdb libraries.
Program nthseq now simply reports an error (not a fatal error) if
too few sequences were read.
Feature input and output was in one large file. This has now been
refactored with ajfeatdata.h for the data structures, ajfeatread.c
for input formats, ajfeatwrite.c for output formats and remaining
feature object handling code in ajfeat.c.
New access methods for text have been added as ajtextread.c and
for text output methods as ajtextwrite.c - supporting text and
(preserved) HTML and XML output. Text is saved as an array of
strings, intended to be used as one per input record although
storing the entire text in the first string is also possible.
Data queries have been made general. A new AjPQuery object handles
queries for any datatype, storing a list of field names and
queries, plus an operator (OR, AND, NOT, EOR, ELSE) for combining
fields. Previous releases had a hard-coded search for "id or
accession" which now uses the new query structure. Extensions to
the query language will allow more complex combinations, and will
allow any field to be defined for an external data resource
(e.g. fields for an SRSWWW server).
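A minimal sketch of how operator-linked field queries might be
combined, using Python sets of entry identifiers. The function name
combine is hypothetical, and the meanings given here to NOT (left but
not right) and ELSE (use the right-hand result only when the left-hand
result is empty) are assumptions for illustration, not the AjPQuery
implementation.

```python
def combine(left, op, right):
    """Combine two sets of matching entry IDs with a query operator."""
    if op == "OR":
        return left | right          # in either result
    if op == "AND":
        return left & right          # in both results
    if op == "NOT":
        return left - right          # assumed: left but not right
    if op == "EOR":
        return left ^ right          # in exactly one result
    if op == "ELSE":
        return left if left else right  # assumed: fall back when left is empty
    raise ValueError(f"unknown operator: {op}")


# Hypothetical results of searching the id and acc fields.
by_id = set()
by_acc = {"X65923"}
print(sorted(combine(by_id, "ELSE", by_acc)))
```

Under these assumptions an "id or accession" search could be written
as an id query linked by ELSE (or OR) to an acc query.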
All data reading access methods have been restructured. Methods
that essentially return an open file with the pointer set to the
start of an entry (which covers most of the original access
methods) are moved to a new source file ajtextdb.c and use a new
AjPTextin input object which is included within AjPSeqin for
sequence input and AjPOboin for OBO term input. These functions
are generalized for any input data in some text-based file
format. Sequence access will first check for a text-based access
method, and then for a sequence-specific method (e.g. ensembl).
Other input datatypes can do the same. The code for OBO ontology
terms will use the new text access methods. Code for access to
other input data types (feature, alignment) will now be relatively
easy to add. Text retrieval of data from a new list of data
resources can also use these access methods.
Program einverted required at least one base between the halves of
an inverted repeat. Blunt joins are now reported where previous
versions reported a 2 base gap.
Error messages from database indexing now include the filename of
the index file. This is useful when identifying the indexing
operation where the problem occurred.
EMBOSS database index files are extended to mark numeric and
string index pages. In previous releases all were marked as
strings. Older index files remain valid for sequence retrieval,
but not for the new dbxreport index analysis application.
New application dbxreport analyses the contents of an EMBOSS
index, reporting the numbers of keys of various types, number of
pages, and percent free space. It also checks that all pages in
the index have been used and are linked to a higher page.
New application dbxedam is an extended version of dbxobo which
also indexes EDAM-specific relationships between terms.
New application dbxobo indexes OBO format ontology files. Index
fields are id, acc (alt_id records), name (name and synonym
records), ns (namespace records), isa (is_a records pointing to
the parent term) and des (def records).
EMBOSS database index files include an extra count value
"fullcount" for the total number of words indexed. The "count"
value is the number of unique terms (for example, words in
descriptions or accession numbers).
EMBOSS database index files include an extra type value "Type"
with the value "Identifier" for a simple primary identifier such
as ID or accession, and "Secondary" for an index of secondary terms
which points to the entry unique ID.
Database indexing application dbxfasta could corrupt index files with
long words in the description index. Dbxfasta now checks the
maximum word length, and as an added safeguard the indexing
library code also checks and truncates any word longer than the
maximum.
New application seqcount returns the number of sequences read.
This simple application was requested on the EMBOSS mailing list
to avoid complicated command line manipulations and unnecessary
sequence output.
acdpretty now writes lines up to 75 characters wide. The width was
restricted to 50 to allow space for in-line comments but this
restricted the length of indented text too severely.
In emboss.defaults and the user's .embossrc file variables are now
resolved at read time, including the names of include files. This
can simplify the configuration files for sites running more than
one installation.
Patched: SAM format file entries with negative insert sizes are
valid but were wrongly rejected.
Patched: BAM format misread the quality scores. An offset of 33
used to report values for debugging was incorrectly included in
the stored values.
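The distinction between stored and displayed quality values can be
sketched as follows. This is an illustration only, not the EMBOSS
code; the function names are hypothetical. It assumes BAM quality
bytes hold phred scores directly, with 33 added only when rendering
FASTQ (Sanger) characters.

```python
def bam_qualities(raw):
    """Return phred scores from BAM quality bytes (stored with no offset)."""
    return list(raw)


def as_fastq_sanger(scores):
    """Render phred scores as FASTQ (Sanger) characters: add 33 for display only."""
    return "".join(chr(q + 33) for q in scores)


raw = bytes([30, 31, 40, 2])      # made-up quality bytes
scores = bam_qualities(raw)
print(scores)                     # stored values stay as phred scores
print(as_fastq_sanger(scores))    # offset applied only in the text form
```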
Configuration now uses autoheader and has less dependency
on the libtool version.
Version 6.3.0 15-Jul-2010
'ensembl' is a new access method for accessing Ensembl
from MySQL. Queries take the form:
seqret ensembl:human:ENST00000262160
seqret ensembl:human:ENST0000026216?
seqret ensembl:human:ENSE00001533831
showing that transcripts, translations and exons are retrievable
and that partial queries are allowed. Example database
definitions are given in the emboss.default.template file. Please
read the note above those definitions regarding fair use of
the public Ensembl servers.
'sql' is a new access method for networked SQL servers
(MySQL or PostgreSQL). The server and database are described
using the 'url' field. As for biomart (described below) the
database definition must include definitions of new attributes
'sequence' (the sequence column) and 'identifier' (the
column used in the query). Additional columns may be
returned as description text if they are listed in the 'returns'
attribute of the DB definition. An example definition is
given in emboss.default.template.
tfextract has been updated to deal with multiple pattern lines
and empty sequence lines.
Three automatic EMBOSS environment variables are
added. EMBOSS_INSTALLDIRECTORY is the installation directory
reported by embossversion -full, EMBOSS_BASEDIRECTORY is the base
directory reported by embossversion -full, and
EMBOSS_ROOTDIRECTORY is the root directory reported by
embossversion -full. These are needed to allow the QA test
database definitions to point to the test data for the current
installation, and appear in the test/.embossrc file.
Validation of EMBL/GenBank feature tables has been updated by
reading EMBL release 104 (June 2010) and allowing many feature
qualifier non-standard values that appear in that release.
Biomart is a new access method for sequence databases. The
database definition must include definitions of new attributes
'sequence' (the biomart sequence attribute) and 'identifier' (the
Biomart identifier attribute). Additional attributes may be
returned as description text if they are listed in the 'returns'
attribute of the DB definition. An example definition is
given in emboss.default.template.
Database definitions have a new attribute serverversion which is
used by SRSWWW access to choose the best way to retrieve data.
SRSWWW database access, for example from the EBI's srs.ebi.ac.uk
server, had a problem processing queries returning more than 30
entries. This is now corrected by first asking the server for the
number of entries and then accessing the data in chunks. This will
unfortunately slow down SRSWWW access for single entries but was
the only solution available after checking with EBI's SRS support
team.
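The count-then-chunk approach can be sketched as below. The function
name chunk_ranges and the chunk size of 30 (taken from the limit
mentioned above) are assumptions; the actual request size used by
EMBOSS is not stated here.

```python
def chunk_ranges(total, chunk_size=30):
    """Yield (first, last) 1-based inclusive entry ranges covering `total` entries."""
    for first in range(1, total + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total)


# Suppose the server reported 65 matching entries.
for first, last in chunk_ranges(65):
    print(first, last)
```

Each range would correspond to one request to the server, which is
why a single-entry query now costs two round trips.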
Infoseq has a new column "organism" which shows the species line
from an EMBL or UniProt entry. In a future release this may be
changed to show the standard name for the NCBI taxon identifier
from an entry as the species definitions for these databases can
be long with alternative names and possibly additional species.
Amino acid 280nm extinction coefficients in file Eamino.dat have
been adjusted to match those of the Expasy 'protparam' tool.
Pepstats now reports values with cysteine residues reduced and as
cysteine bridges.
Database types, originally defined as simply "N" for nucleotide
and "P" for protein, should now be named in full. The names are
expanded automatically when reading the definitions in the
emboss.default and .embossrc files. Expanding the types allows for
new database types to be added in the near future.
EMBOSS can now read and write BAM (binary SAM) sequence files to
extract all sequences and quality scores, for example to write
them out in FASTQ format. Although BAM data can also be read
through a pipe as standard input, in this case the format must be
specified on the command line as it is not currently possible for
EMBOSS to read a buffered text file as binary data.
Needle dynamic programming algorithm updated to allow adjacent
gaps in opposite strands.
Rabin-Karp multi pattern search algorithm moved into the nucleus
library. supermatcher application seed finding step updated to use
Rabin-Karp multi-pattern search.
Banded Smith-Waterman algorithm used by supermatcher and
wordfinder applications has been revised, fixing a problem with
occasional inconsistent alignments. Basic SAM format support has
been added for these two applications as well as for the wordmatch
application. supermatcher assumes the second sequence as the
reference sequence while wordfinder and wordmatch consider the
first sequence as the reference sequence.
The acdvalid application now reads the EDAM (EMBRACE Data and
Methods) ontology to validate EDAM references in relations
attributes. All applications are expected to have at least one
topic and at least one operation term. Other qualifiers can have
any number of data terms.
New source file ajtax.c provides parsing and validation for the
NCBI taxonomy in its .dmp file form. The parser reads all taxonomy
data into memory. This takes up too much space for practical use,
so is only intended for subsets. The parser will be reused to
develop indexing applications to provide fast lookup of taxon
identifiers.
New source file ajobo.c provides parsing and validation for OBO
format ontology files. The parser includes strict warnings
according to the OBO format documentation, but these can be turned
off as in many cases the OBO foundry ontologies do not follow the
exact standard. Examples include terms not in sorted order, and
Typedef stanzas following Term stanzas, and dbxrefs to
non-existent terms (e.g. GO:ma in the gene ontology to cite a
curator).
Support for PDF and SVG graphic file output has been added. SVG
requires no additional libraries. PDF support requires the libhpdf
library (which, somewhat confusingly, is provided by the libharu
project). EMBOSS will attempt to find the library and development
files automatically and add PDF support (or not) appropriately.
However, if libhpdf is in a non-standard place, a --with-hpdf=DIR
configuration switch can be optionally used.
The output of showalign has changed. The reference sequence now
appears at the top, if selected. The ticks and sequence position
numbering are relative to a selected reference sequence. Gaps
within the reference appear as '.' and are not counted in
numbering. End gaps appear as '.' with 'V' and 'v' as the major
and minor tick marks, and numbering from -1 before the start and
from +1 after the end of the reference. The additional copy of the
consensus is no longer reported.
When reading ABI trace files the quality scores can now be
read. They are undefined in ABI files, but assumed to be phred
scores. ABI files can have two sequences and sets of quality
scores. The first is from the instrument base calling. The second
is from a second base caller. Where two sets are found, EMBOSS now
reads the second set.
Application nospace has a new -menu option to trim all, trailing, or
excess whitespace.
Output type outfileall is obsolete (it is essentially an outfile)
and has been deleted. No application was using it.
Input type filelist (comma-delimited list of filenames) now trims
excess whitespace from the beginning and end of each filename.
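As a rough illustration of the trimming (not the ACD code itself; the
function name is hypothetical), a comma-delimited list could be
handled like this:

```python
def parse_filelist(value):
    """Split a comma-delimited list and trim whitespace around each filename."""
    return [name.strip() for name in value.split(",")]


print(parse_filelist(" one.fasta , two.fasta ,three.fasta"))
```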
Command line qualifiers with an '=' but no value now have a
value of an empty string. Previous releases set the value to "=".
The file extension for directory, dirlist and outdir ACD datatypes
is now a qualifier. This allows it to be defined as a default in
the ACD file but also substituted by the user. An empty string
means 'ignore the extension'. To specify 'no extension' a single
space can be used as the value.
On the command line, for a parameter (with no qualifier name
given) a single dot was used as a missing value in previous
versions. This causes problems when specifying the current
directory as a dot. On the command line an empty (missing) value
must now be an empty quoted string '' or "".
Ampersands in application descriptions have been removed. They
confuse HTML versions of documentation.
The QA test script qatest.pl has new options -simple to turn off
messages when running with a local test file, and -with to cancel
-without options.
Output redirected to a file can now use ajSysExecOutname functions
to pass the filename to be used for standard output and possibly
standard error. The filename is most usefully picked up from a new
function ajAcdGetOutfileName which closes an ACD outfile and
returns the name of the file. The file will be empty if simply
opened, or will have existing contents if the append attribute is
true in the ACD file.
The output from tfscan is now in report format, replacing the
undefined text file produced in previous releases.
Where a new string is created by ajStrAssignS (the standard string
copy functions) the reserved space for the string is enough to
hold the current string value. In past releases the reserved
memory was the same size as the reserved memory of the string
being copied. This wasted memory where a large string had a short
value, especially when copying records read from a buffered input
file.
Sequence input formats now turn off buffering of input once they
can no longer fail (for example, FASTA format after the header
record will read everything until it finds another header).
Make ajaxdb code IPv6 compliant. Remove gethostbyname config
check.
pcre, expat & zlib include files now install to separate
subdirectories.
Showfeat failed to sort features with 'join' locations. The
sorting is corrected. A future internal change will improve
feature sorting in all cases.
Restriction mapping applications now process bad enzyme input
files without crashing.
PNG graphics output had an unwanted blank margin that did not
appear in other output formats. This is now turned off through
plplot.
Prettyplot formatting is corrected to improve the centring of
characters within boxes.
Restriction mapping applications no longer have an upper limit on
the number of cuts.
Warning messages for EMBL format sequences created by ENSEMBL
have been turned off.
Corrected references to the EMBL/GenBank feature table
documentation in ACD files and web pages.
embossversion now reports the setting of debug options, and
corrects variable name warnrange to acdwarnrange.
Any numeric ACD type (integer, float, range or array) with
calculated values for the minimum or maximum attributes can
potentially have an impossible range (maximum less than minimum)
at run time. ACD processing now discovers these calculated values,
and requires a definition for a new attribute 'failrange'. If this
is defined true, a 'failmessage' attribute must also be defined to
explain why the values are invalid (e.g. input sequence too short
for the algorithm). If 'failrange' is false, a value for another
new attribute 'trueminimum' must be set to define which of the
minimum or maximum values is to be used as the only accepted
value.
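The decision described above can be sketched as follows. This is an
illustration, not the ACD parser; the function name resolve_range is
hypothetical, and it assumes that a true 'trueminimum' selects the
minimum and a false one selects the maximum.

```python
def resolve_range(minimum, maximum, failrange, failmessage="", trueminimum=False):
    """Return the (min, max) range to accept, or raise if it is impossible and failrange is set."""
    if maximum >= minimum:
        return minimum, maximum               # range is possible, use as is
    if failrange:
        raise ValueError(failmessage or "impossible range")
    only = minimum if trueminimum else maximum  # assumed meaning of trueminimum
    return only, only                          # a single accepted value


# A calculated maximum (e.g. from a short sequence) below the minimum.
print(resolve_range(20, 5, failrange=False, trueminimum=True))
```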
PNG graphics output had a plplot-defined margin limiting the
available plot space. This is now removed, allowing applications
such as prettyplot more space to display results.
Resource attribute identifier: is obsoleted. No code used it. It
is no longer allowed in resource definitions.
Database attributes identifier: description: and command: are
obsoleted. No code used them. They are no longer allowed in
database definitions.
Version 6.2.0 15-Jan-2010
Fixed GFF2 and GFF3 feature formats to always have the start
position less than the end position for features on the '-'
strand.
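A small sketch of the coordinate rule, assuming a feature is held as
start, end and strand values (the function name is hypothetical):
the strand column carries the direction, so the positions themselves
are always written in ascending order.

```python
def gff_coords(start, end, strand):
    """Return (start, end, strand) with start <= end, whatever the strand."""
    return min(start, end), max(start, end), strand


print(gff_coords(900, 100, "-"))
```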
Updated sequence format refseqp to handle features for proteins in
the latest release of refseq protein.gpff files.
A new function ajDebugTest can be used to turn on/off specific
debug calls. The only argument is a quoted string. A file
.debugtest in the current directory or the user's home directory
is read. This contains a list of tokens to be debugged, so
ajDebugTest returns true if any of these tokens is passed in.
Optionally, the name in .debugtest can be followed by a number
which is the maximum number of times that token will be reported.
ajDebugTest is intended for developers who use ajDebug calls that
may be expensive or be excessively called.
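The token and count logic can be modelled roughly as below. This is a
sketch of the behaviour described above, not the AJAX code. The class
name is hypothetical, and it assumes one token per line in .debugtest
with an optional count after white space.

```python
class DebugTest:
    """Model of a .debugtest token list with optional per-token report limits."""

    def __init__(self, lines):
        self.limits = {}   # token -> maximum reports, or None for no limit
        self.counts = {}   # token -> reports so far
        for line in lines:
            fields = line.split()
            if not fields:
                continue
            token = fields[0]
            self.limits[token] = int(fields[1]) if len(fields) > 1 else None
            self.counts[token] = 0

    def test(self, token):
        """Return True if debugging for this token is on and under its limit."""
        if token not in self.limits:
            return False
        limit = self.limits[token]
        if limit is not None and self.counts[token] >= limit:
            return False
        self.counts[token] += 1
        return True


dt = DebugTest(["seqread", "tablefetch 2"])   # made-up token names
print([dt.test("tablefetch") for _ in range(3)])
```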
Some attributes in ACD files may appear more than once. These
include any relations: attribute (now being populated with
references to the new EDAM ontology), the groups attribute for
applications, the (currently unused) keywords attribute for
applications, and the external attribute for applications.
Any external application must now be defined in the ACD file with
an external: attribute in the application section. The string
value has the name of the application as the first word, followed
by a message to be printed if it is not found. When the ACD file
is parsed, before any user prompts, the external applications are
searched for by first looking for an environment variable
EMBOSS_appname and then checking for an executable file in the
current directory or in the path.
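The search order can be sketched in Python as follows. This is not
the ACD implementation. The function name find_external is
hypothetical, and it assumes the EMBOSS_appname variable holds the
path to the executable and uses the application name exactly as
written.

```python
import os
import shutil


def find_external(appname, environ=None):
    """Locate an external application: variable first, then current directory, then PATH."""
    environ = os.environ if environ is None else environ
    override = environ.get(f"EMBOSS_{appname}")   # assumed to hold a path
    if override:
        return override
    local = os.path.join(os.getcwd(), appname)
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return local
    return shutil.which(appname, path=environ.get("PATH"))  # None if not found


# With a hypothetical variable set, the lookup stops at the first step.
print(find_external("clustalw", {"EMBOSS_clustalw": "/opt/bin/clustalw2", "PATH": ""}))
```

A result of None corresponds to the case where the message from the
external: attribute would be printed.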
All applications should be launched by using the name returned by
the new ajAcdGetPathC or ajAcdGetPathS functions. This ensures the
application has been found in ACD processing and any
EMBOSS_appname variable has been tested.
The acdvalid utility now tests for duplicate attributes.
Format specifiers for strings and characters (%S, %s and %c) now
have two flags U (e.g. %US) for uppercase and L for lower case
output.
The configure.in and main package Makefile.am files handle
--enable-devwarnings differently. For the imported libraries this
level of warning message is turned off. Messages are still
generated for warnings from the main EMBOSS libraries and
applications.
The QA testing script qatest.pl has new options -nocheck to skip
"make check" applications and -noembassy to skip EMBASSY packages.
Extractfeat failed to accept all features by default.
Extractfeat failed on reverse direction nucleotide features.
Coderet miscounted non-coding sequences in the output table.
Graphics devices now have improved and additional checks. 'tek'
was rejected as an ambiguous match. 'das' is only valid for an
xygraph - one based on sequence positions. On Windows (using
mEMBOSS) the plplot version supports fewer devices and these are
now excluded from selection.
The change to graphics library access makes the ajGraphInit call
which registered graphics functions for use by ACD parsing
redundant. In its place we need to register data access
functions. As all applications make use of this, we now include
this automatically in embInit so there is no longer a need for
applications to make a separate call before invoking
code (e.g. ACD parsing) that may require registration of
functions.
The AJAX ACD code is now in a separate library. New core library
functions store and retrieve ACD persistent data such as the
program name, command line and list of inputs. As ACD is now
linked separately from core AJAX and the graphics library, the
callback mechanism for ajGraph functions to be called from ACD is
no longer needed.
The database access code in ajseqdb.c has been moved to a separate
higher level library. This is where we will insert code to access
the new ensembl library functions in AJAX, and possible future
data access libraries. A callback mechanism is used so that the
embInit call automatically registers data access methods to make
them available within the core library functions that read
sequences. This allows ajSeqRead to remain in the core library
while calling database access methods that in turn may invoke
ensembl access.
The PCRE (perl-compatible regular expressions) code in AJAX has
been updated to release 7.9 of PCRE. Previous releases were still
at version 4.3. The code is standard PCRE code with the LINK_SIZE
set to 4 bytes to allow matches in long sequences.
ACD files include relations attributes with text taken from terms
in the EMBRACE EDAM ontology. These terms are also described in
the knowntypes.standard file and are matched to the known types
when validating ACD files.
EMBOSS now uses a more complete User-Agent string when
communicating with HTTP servers.
FASTQ short read sequence formats now read and write faster using
lookup tables to avoid calculations in the conversion of quality
scores.
FASTQ short read formats have additional warning messages for bad
or incomplete data.
All sequence input formats now recognize invalid partial entries
at the end of the input data and report an error message. A
notable exception is FASTA format where a partial entry is still a
valid ID line - these will give errors for zero length sequence
unless empty sequences are allowed.
Common output formats now write faster, using lightweight output
functions to copy strings to the output file.
SwissProt output formats now wrap long OS lines.
Needle has been updated with end-gap penalties support, allowing
complete global pairwise alignments. Three new options have been added:
the endopen and endextend options are used to specify
the gap opening and extension penalties for the end gaps,
while the endweight option turns on/off weighting of the end gaps.
New application needleall for all against all global/overlap
pairwise alignment of sequences in two multi-sequence files.
wordmatch updated for multi-sequence files using a modified version
of the Rabin-Karp algorithm for multi-pattern search. Also added is
a log file with statistical information on pattern matches.
The updated wordmatch can, for example, be used for efficiently
finding multiple patterns in large fastq files.
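For readers unfamiliar with the technique, a textbook Rabin-Karp
multi-pattern search for equal-length patterns is sketched below. It
is not the modified version used by wordmatch; the function name,
hash base and modulus are arbitrary choices for illustration.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Return (position, pattern) hits, in position order, for equal-length patterns."""
    patterns = list(patterns)
    if not patterns:
        return []
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    if len(text) < m:
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Index every pattern by its hash so one lookup tests all patterns.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)      # weight of the character leaving the window
    h = full_hash(text[:m])
    hits = []
    for i in range(len(text) - m + 1):
        if h in by_hash:
            window = text[i:i + m]
            if window in by_hash[h]:  # confirm, since hashes can collide
                hits.append((i, window))
        if i + m < len(text):
            # Roll the hash: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


print(rabin_karp_multi("ACGTACGTTG", ["ACGT", "GTTG"]))
```

The rolling hash is updated in constant time per position, so the
cost of scanning the text does not grow with the number of patterns.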
Application documentation has a new format HTML table for the
command line options. This is excluded from the text
documentation, where the format of the help output is improved.
Function names standardized for ajcod.c ajrange.c ajtranslate.c
ajgraph.c ajhist.c and a few other functions renamed. The old
names continue to work as "deprecated" functions although these
will generate warning messages with the gcc compiler.
Infoseq option -version is renamed -seqversion to avoid a clash
with the new global -version qualifier.
Three new "make check" applications entrailshtml, entrailsbook and
entrailswiki generate tables of internal data in HTML, DocBook or
WikiText formats. These are intended to update the website, books
and Wiki with the latest internal details. The -tables qualifier
specifies one or more tables to be printed. By default, all tables
are produced. The book tables are sorted in format name order.
Alignment output included headers only for EMBOSS-specific
formats. The headers have been dropped from the FASTA MARKX0
through MARKX10 formats to allow standard FASTA suite parsers to
use the EMBOSS versions of these outputs.
Fastq-solexa sequence formats converted phred scores of 1 to
Solexa scores of -6. They now convert to the limit of -5.
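The arithmetic behind this fix can be checked with the standard
conversion Q_solexa = 10 * log10(10^(Q_phred/10) - 1). The sketch
below is not the EMBOSS lookup table; the function name is
hypothetical, and returning the floor for phred 0 (where the formula
is undefined) is an assumption.

```python
import math


def phred_to_solexa(phred, floor=-5):
    """Convert a phred score to a rounded Solexa score, never below `floor`."""
    if phred <= 0:
        return floor      # assumed: formula is undefined at phred 0
    solexa = 10 * math.log10(10 ** (phred / 10) - 1)
    return max(floor, round(solexa))


# Phred 1 gives about -5.87, which rounds to -6 without the limit.
print(phred_to_solexa(1))
```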
Fastq-sanger sequence format incorrectly stopped when the quality
scores started with a '@' (phred quality 31).
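One way to avoid this class of error is to read quality lines until
their length matches the sequence, rather than treating any line
starting with '@' as a new record. The sketch below illustrates that
approach. It is not the EMBOSS parser, and the function name is
hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) records from an iterable of FASTQ lines."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue                      # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq_lines = []
        for line in it:
            if line.startswith("+"):
                break                     # separator before the quality lines
            seq_lines.append(line)
        else:
            raise ValueError("missing '+' separator line")
        sequence = "".join(seq_lines)
        quality = ""
        # Length, not a leading '@', decides where the quality string ends.
        while len(quality) < len(sequence):
            try:
                quality += next(it)
            except StopIteration:
                raise ValueError("quality shorter than sequence") from None
        if len(quality) != len(sequence):
            raise ValueError("quality longer than sequence")
        yield header[1:], sequence, quality


records = ["@r1\n", "ACGT\n", "+\n", "@III\n", "@r2\n", "GG\n", "+\n", "!!\n"]
for rec in read_fastq(records):
    print(rec)
```

In the example the quality line of the first record starts with '@'
(character code 64, phred 31 with the Sanger offset of 33) and is
still read as quality data.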
Intelligenetics sequence format now correctly ignores additional
carriage control characters.
Genbank-like protein formats (genpept and refseqp) failed when
reading more than one sequence. The input is now buffered when
the format is automatically reassigned to a related parser.
The -help output now includes the one-line documentation string
from the ACD file and the version number information reported by
--version.
All applications have a -version (or --version) qualifier which
will report the EMBOSS version number. For EMBASSY applications it
will also report the EMBASSY package version number as
"PACKAGE:version". All EMBASSY applications need to call embInitP
with an additional parameter of VERSION which will be defined
automatically by the configure.in template. If the "versionnumber"
attribute is defined in the ACD file this will also be reported as
the application version "programname:version".
The ACD application attribute "version:" is renamed
"versionnumber:" to avoid a name clash with the new -version
qualifier. We need to use the qualifier name "-version" for
compatibility with other systems and applications, so the renaming
of the attribute is unavoidable. We believe it was only used (as
originally intended) for the definition of external applications
by SoapLab.
Version 6.1.0 15-Jul-2009
New application showpep displays protein sequences. Showseq is now
limited to nucleotide sequences. Many of the showseq options are
not appropriate for proteins. Showpep makes the remaining showseq
options available.
A new data structure AjPSeqXref holds details of cross-references
between a sequence object and any other data resource. The
cross-reference attributes include a type to indicate the source
of the cross-reference, for example XREF_DR for a reference in a
DR line from EMBL or Swiss-Prot. The other attributes are the
database name and up to 4 identifiers (as in the Swiss-Prot DR
line definition) and a start and end position where the source is
a feature table entry.
When reading a sequence with an identifiable species, attempts are
made to define the NCBI taxonomy identifier for the
species. Possible sources include the OX line in Swiss-Prot, the
taxon cross-reference in the EMBL/GenBank/DDBJ feature table
(available only if the feature table is read) and the species name
which can be matched to a set of common species obtained from
NCBI.
Swissprot entry descriptions in FASTA output no longer have a
trailing '.'. Where the source entry has the new Swiss-Prot DE
line format the name is built from the recommended full name with
other names in round brackets.
Binary files now consistently have null characters after strings
to pad them to full length. Previous versions wrote whatever
followed the NULL in the string object. The resulting files now
look cleaner although any extra characters were always ignored
when reading dbi index files.
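The padding rule can be illustrated with a fixed-width field written
from Python. This is a sketch only; the function name and the field
width are hypothetical and do not describe the dbi file layout.

```python
def pad_field(text, width):
    """Encode text into a fixed-width field, filling the remainder with null bytes."""
    data = text.encode("ascii")
    if len(data) >= width:
        raise ValueError("text does not fit with a terminating null")
    return data.ljust(width, b"\0")


print(pad_field("X65923", 10))
```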
Test databases were updated on 24th June 2009.
Blank lines are ignored before any sequence input. This is to
support the use of seqret to read data pasted into web forms where
extra blank lines are often accidentally included.
FASTQ is now a valid sequence format and can be detected
automatically. "fastq" format ignores all quality scores as there
is no automatic and safe way to determine whether scores are for
Sanger/phred or Illumina/Solexa quality. To read the quality
scores we support formats "fastq-sanger" and "fastq-illumina". We
also support "fastq-int" to read quality scores as integers. These
scores are assumed to be Sanger quality. For Illumina quality
scores out of range, a warning message is written once for each
sequence. Sanger scores do not have out of range values as they
allow the full set of quality characters, although high values
(over 40) should only appear for contig consensus sequences.
MEGA format has been rewritten to support the file format used by
MEGA 4. Title can be in mixed case. Format and Gene/domain command
lines are processed. Multiple gene/domain files are read by EMBOSS
as separate alignment sets by seqretsetall. This may change in a
future release as MEGA4 processes them as one alignment with
annotated gene regions. While EMBOSS has no annotation specific to
alignments this is a reasonable compromise.
embossdata will now always return directory listings alphabetically.
A new ACD function replaces an attribute value with an EMBOSS or
environment variable. The attribute syntax is (@value:VARNAME).
Infile datatypes in ACD have a new attribute directory: which
defines the default directory to be searched. If the user
specifies an explicit path the directory attribute is ignored.
Applications writing out multiple sets of sequences now correctly
reset the sequence output. This only affected one test application
in EMBOSS 6.0.1 (input type seqsetall and output type seqoutall).
Applications that use single letter qualifier names (for example
the HMMERNEW wrappers for HMMER applications) can be confused if a
single letter qualifier name matches uniquely an associated
qualifier for a preceding command line qualifier. An additional
check now ensures that a unique qualifier (for example -o) is
correctly recognized.
Global alignments with needle in rare cases missed the optimal
alignment of the first 2 residues. This was a bug introduced in
6.0.0.
When reading data using a launched application, including the SRS
access method which launches "getz", closing the input without
reading to the end caused the file close function to loop
forever. Examples included nthseq and seqret -firstonly both of
which stop reading when they have reached the nth or first
sequence. File closing now only waits if the input has reached end
of file, and has a timeout on the wait to break out of the loop.
Intelligenetics format sequence files with more than one sequence
are now read correctly. Where the sequence ends with a number,
intelligenetics format sequences can now be automatically
detected.
Add -methylation option to restrict/restover/remap/showseq
to simulate (e.g.) dam/dcm restriction enzyme knockouts.
remap now correctly reports restriction enzymes cutting a
greater number of times than an optionally-supplied maximum
value. The primary function of the application was unaffected.
showfeat has a new option -joinfeatures to display all exons on
one line for a join feature location. In previous releases this
was one of the -sort options. It is now possible to use
-joinfeatures and to select a sort order.
Installing without X11 (using the --without-x option for
./configure) used "x11" as the default graphics device in some
applications. These now use "png" (if available) or "ps".
needle and water with the -nobrief option repeated report header
information on the longest and shortest similarity and identities
because the previous header content was not cleared. This only
affected results where there was more than one sequence as the
second input.
In the EMBL/GenBank feature table the group() and one_of()
operators are obsolete. They are automatically converted to
order().
The command line syntax using the master qualifier name as a
suffix (for example -sreverse_asequence) ignored the master
qualifier name and set values for all matching inputs. This syntax
is intended as a way for wrappers to better control the use of
associated qualifiers, as it is cleaner than using a numeric
suffix (-sreverse1 -sreverse2 etc.).
Using -sreverse on the command line could reverse protein
sequences for inputs that can read more than one sequence (seqall,
seqaset, seqsetall). -sreverse is now only set for nucleotide
sequence inputs. Single sequence inputs correctly ignored the
-sreverse value.
Multiple sequence sets can be read as input type seqsetall, but
when this input was used for a single sequence set input (type
seqset) all sequence sets were read. seqset input now stops after
the first set (for example a PHYLIP or MSF alignment).
Genbank test data had incorrect format. The data was extracted
from a set of test GCG databases and had spaces in the feature
locations.
extractfeat now uses the new feature fetch functions and can
retrieve features that include joins across entries.
Feature parsing functions are added to fetch sequences from other
entries. These depend on reusing the USA of the original sequence,
with the identifier of the external sequence inserted in place
of the original. This is known to work for database references and
flat files.
coderet was limited to EMBL/GenBank feature tables. It now
processes any valid feature input including GFF files. The
previous parsing functions are obsolete and have been removed
as coderet was the only application calling them.
Very large pairwise alignments can fail to back trace through the
alignment because of rounding error. The alignment and traceback
functions now use double precision to maintain accuracy.
pepwindow and pepwindowall missed the plot value for the last
window in the sequence.
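A missed final window is usually an off-by-one error in the loop
bound. The following is a minimal sketch of the corrected bound, not
the EMBOSS source. The function name window_means is hypothetical.

```python
def window_means(values, window):
    """Return the mean of every full window, including the final one."""
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(values)
    # There are n - window + 1 full windows. Looping only to
    # n - window would silently drop the last one.
    return [sum(values[i:i + window]) / window
            for i in range(n - window + 1)]
```

A sequence shorter than the window simply yields no values.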
pepwindow and pepwindowall now process sequence ranges -sbegin and
-send.
pepwindow and pepwindowall now default to a window length of 19,
ideal for transmembrane regions. The old default of 7 was short
and gave noisy results.
pepwindow and pepwindowall have an extra option -normalize to
convert the amino acid data in the datafile to mean 0.0 and
standard deviation 1.0. The default Kyte-Doolittle data is not
normalized.
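As an illustration of what normalizing to mean 0.0 and standard
deviation 1.0 involves, here is a short sketch using the published
Kyte-Doolittle hydropathy values. It assumes the population standard
deviation; whether pepwindow uses the population or the sample form
is not stated here. The function name normalize is hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def normalize(scale):
    """Rescale a residue -> value table to mean 0.0, std. dev. 1.0."""
    values = list(scale.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {aa: (v - mu) / sigma for aa, v in scale.items()}
```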
The EMBL/Genbank feature table definitions have been updated to
version 8.0 (October 2008). Sequence ontology terms are now
available for all feature types except S_region for which no
specific SO term exists. S_region is attached to an internal term
derived from SO:0000301 as a placeholder.
Programs searching with regular expressions and patterns reported
the pattern name with '1' added to the end. This was to support
pattern and regular expression files with multiple patterns. When
only one pattern is given on the command line the '1' is no longer
added.
Programs searching with regular expressions (dreg and preg) missed
overlapping matches to the pattern. The algorithm now steps
forward one character from the start of the match and searches
again. Some regular expressions with wildcards may produce a large
number of overlapping matches especially in low-complexity regions.
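The stepping strategy described above can be sketched with Python's
re module. This illustrates the technique only and is not the dreg or
preg code. The function name is hypothetical and positions are
0-based.

```python
import re

def overlapping_matches(pattern, text):
    """Find all matches, including ones that overlap earlier hits."""
    regex = re.compile(pattern)
    hits = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        hits.append((m.start(), m.end(), m.group()))
        # Step forward one character from the START of the match,
        # not from its end, so overlapping hits are not skipped.
        pos = m.start() + 1
    return hits
```

For the pattern "AA" against "AAAA" a non-overlapping search finds
two hits, while the stepping search finds three.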
Protein sequences in GFF format now use GFF3 by default. For
release 6.0.0 protein sequences were written in GFF2 while the
GFF3 protein feature definitions were redefined using the Sequence
Ontology. This process is now completed.
When a sequence is reversed by revseq the description is tagged
with "Reversed: " so that the output and any sequence derived from
it has a note of the history.
EMBL and GenBank formats when used to read multiple entries failed
to reset the list of citations. Although the first set of
citations was reported correctly, all other entries in the same
run included the citation list from the first entry.
SwissProt/UniProt entries now preserve the complete entry content
when read and rewritten. All feature types are preserved and
feature lines wrap according to the widths in UniProt 14.8. Date
lines are stored and written. Comments are stored in blocks.
Database cross-references are stored in a list. The description
lines are saved in the new SwissProt structure. Tests on a set of
complex entries confirm that EMBOSS is able to read and write an
exact copy of this sample set.
Protein feature keys now use the Sequence Ontology identifiers
as internal names. This may change the way some feature keys are
converted between data formats. Protein feature keys have been
updated to correct some conversions, for example to distinguish
between "coiled coil" from pepcoil and "random coil" from garnier
output.
Fitch sequence format was only able to read a single
sequence. EMBOSS can now read 'fitch' as a multiple sequence
format.
Extractfeat now cleanly processes minscore and maxscore as limits
on the score. By default any score is allowed if these are
unchanged. Previous releases required minimum and maximum to be
equal - or minimum greater than maximum - to permit any feature
score.
New feature XML output format DASGFF. Feature output functions
have a changed interface to pass the AjPFeattabOut object so that
additional processing can handle the opening and closing of an XML
output file.
New sequence output formats "dasdna" and "das" write DASDNA and
DASSEQUENCE XML outputs. Sequence output functions have a new
capability to define a Cleanup function to write the final lines
of an XML output file. The AjPSeqout data structure already has
the Count attribute needed to identify the first sequence so that
the XML header can be written.
New environment variable EMBOSS_ACDFILENAME provides an
alternative way to set the default output filename for EMBOSS
applications. If set to true, the filename is used rather than the
current behaviour of using the first sequence name as the default
filename. When the filename is used the case of the name is
preserved.
Corrected display of exon ranges in showseq. Exons now display in
their original frame (all were displayed in frame 1 in earlier
versions). Display of 3-letter amino acid names corrected (but we
hope nobody is using 3-letter codes any more!)
Added create attribute for outdir datatype in ACD. If true, the
output directory will be created if it does not already exist.
The default is false: output directories must already exist. This
is the behaviour in previous releases.
Added attribute aligned for datatype seqoutall in ACD
files. Applications can write multiple sequences as a seqoutset
(aligned or unaligned) and can also write seqoutall - writing
sequences one at a time without first storing them as a set.
For phylogenetic applications (PHYLIPNEW) reading distance matrix
files failed for some formats written by other
applications. Distance matrix input now works for multiple
matrices in square, upper-triangular and lower-triangular formats.
The PLPLOT graphics library uses 4 environment variables to allow
local configuration. EMBOSS uses a local copy in libeplplot. For
sites that have the native PLPLOT also in use we have renamed the
environment variables to use the prefix EPLPLOT. This protects
EMBOSS from any configuration set only for the local plplot.
The variables are: EPLPLOT_BIN EPLPLOT_LIB EPLPLOT_TCL and
EPLPLOT_HOME. Versions of EMBOSS up to 2.8.0 defined PLPLOT_LIB
but this value is now automatically set and the environment
variable is no longer needed.
Command line qualifiers are renamed where the first 5 characters
are the same. These were:
eprimer3 major revision of all options
est2genome -splice to -usesplice
prettyplot -boxcolval to -boxuse
octanol -*plot to -plot*
showfeat -match* to -*match; -source to -origin
showpep -match* to -*match
showseq -match* to -*match; -source to -origin
vectorstrip -vectorfile to -readfile; -linker* to -*linker
and similar changes for EMBASSY applications.
ACD processing now objects if two or more qualifiers are not
unique in the first 6 characters. In a future release we would
like to reduce this to a 5 character unique name. Several EMBASSY
applications need to be modified to comply with this requirement.
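A uniqueness check of this kind can be sketched as follows. This is
an illustration and not the ACD parser. The function name is
hypothetical, and the handling of names shorter than the prefix
length is an assumption.

```python
from collections import defaultdict

def ambiguous_qualifiers(names, prefix_len=6):
    """Group qualifier names that share their first prefix_len chars."""
    groups = defaultdict(list)
    for name in names:
        # Names shorter than prefix_len are keyed on the whole name.
        groups[name[:prefix_len]].append(name)
    return {prefix: same for prefix, same in groups.items()
            if len(same) > 1}
```

With the vectorstrip names mentioned elsewhere in these notes,
"vectorfile" and "vectorsfile" collide on "vector", whereas
"readfile" and "vectorsfile" do not.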
MEMENEW updated for meme/mast version 4.0.0. ememe now
produces fasta, html, text, xml and xsl outputs. A new variant,
ememetext, produces only the text and fasta outputs.
DBX index file key deletion code added for ID/ACC/SV/KW/DE/TX
indexes.
HTTP access now adds a User-Agent string with the EMBOSS version
number so that servers can count the number of EMBOSS requests.
PDB model structures failed to generate a new name for each
model. Duplicate sequence names are not ideal. The model number
(from the MODEL record) is now appended to each sequence name in
"pdb" and "pdbnuc" format. The "pdbseq" and "pdbnucseq" formats
read a single copy of each sequence from the SEQRES records.
Added two new PDB formats to read nucleotide data. These are named
"pdbnuc" and "pdbnucseq". They are not available by default, to
avoid the problem of reading both protein and nucleotide sequence
data from a structure file for an oligonucleotide binding protein.
Alignment outputs now include most of the multiple sequence
alignment formats that EMBOSS can write. The functions for these
are trivial to write. New functions can be added to use any
existing sequence output format for alignments.
PDB entries can be read in two ways, with two named
formats. Sequence format "pdb" reads the ATOM records. Sequence
format "pdbseq" reads the SEQRES records. By default, only "pdb"
format was used, and could crash on entries where the ATOM records
were missing. Both formats now fail silently if no sequences are
found. By default, "pdb" format is used first, and if that fails
"pdbseq" will be tried.
The EMBOSS logfile (defined by variable EMBOSS_LOGFILE) now
reports two extra values: the number of cpu seconds and the
number of elapsed time seconds.
Extra stop codons in getorf for ORFs ending close to the end of
the input sequence no longer appear.
For optional qualifiers (defined as "nullok" in the ACD file) the
command line option -no(qualname) was causing output files to
appear by resetting the value to an empty string, which in turn
was converted to the default filename. Now -no(qualname) turns off
any output file defined with nullok, and -(qualname) "" asks for
an output file that is off by default and uses the default
filename for it.
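The intended behaviour can be summarised in a small sketch. This is a
simplified model, not the ACD command line parser: it ignores
abbreviations and the -qual=value syntax. The function and parameter
names are hypothetical.

```python
def resolve_outfile(argv, qualname, default_name, on_by_default=True):
    """Return the output filename, or None if output is turned off."""
    if "-no" + qualname in argv:
        # -no(qualname) always turns a nullok output off.
        return None
    flag = "-" + qualname
    if flag in argv:
        i = argv.index(flag)
        value = argv[i + 1] if i + 1 < len(argv) else ""
        # An empty value asks for the file under its default name.
        return value or default_name
    return default_name if on_by_default else None
```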
Report output has a new tail format that reports the total
sequences and total sequence length read by the applications. The
previous "Total_sequences" report was the number of sequences
included in the report. This is renamed to "Reported_sequences".
Where the number of hits was limited by the -rmaxseq or -rmaxall
options, the number of unreported hits also appears. If the
-rmaxall limit was exceeded, the report tail ends with
"Maxhits_stop: Y". If the -rmaxseq limit is exceeded, the sequence
report includes (as before) "HitLimit: max/total".
Refseq protein and Genpept now use a modified genbank format to
avoid warnings for "aa" replacing "bp" on the LOCUS line and to
provide better control over any other differences between
nucleotide and protein entries. Genbank format automatically calls
refseqp format if a LOCUS line has "aa".
Swissprot output was missing a '.' at the end of the organism line.
vectorstrip failed if the user neither provided a filename for
the -vectorsfile option nor specified -novectorfile to turn off
file reading. The ACD file is changed so that a vectorsfile is
required if -vectorfile is true, and a check is put into the code
to catch the problem if the ACD interface changes in future.
Allow user-defined -carboxyl parameter for iep.
jaspscan now allows multiple sequences to be scanned.
Version 6.0.0 15-Jul-2008
New application aligncopy reads a set of aligned sequences and
prints a report in one of the standard alignment formats that can
accept the same number of sequences. Pairwise alignment formats
can only be used if the input has exactly two sequences.
New application aligncopypair reads a set of aligned sequences and
prints a report for each pair of aligned sequences in one of the
standard alignment formats.
New application featreport reads a sequence and a feature table,
and writes a report in any of the standard report formats.
New application featcopy reads and writes a feature table to
convert feature formats.
New applications maskambignuc and maskambigprot replace ambiguity
characters in nucleotide sequences with 'N' and in protein
sequences with 'X'.
New application consambig reports an alignment consensus sequence
using ambiguity characters. The intended use cases are sequencing
reads and SNP reporting.
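The idea of an ambiguity consensus can be sketched using the standard
IUPAC nucleotide codes. This is not the consambig algorithm: the
treatment of gaps and of ambiguity codes already present in the input
(both ignored here) is an assumption, and the function name is
hypothetical.

```python
# Standard IUPAC codes keyed by the set of bases they represent.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

def ambiguity_consensus(aligned):
    """Return one IUPAC code per alignment column."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        # Assumption: only A, C, G and T contribute to the consensus.
        bases = frozenset(c.upper() for c in column if c.upper() in "ACGT")
        # A column with no A/C/G/T at all is reported as a gap.
        consensus.append(IUPAC.get(bases, "-"))
    return "".join(consensus)
```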
New application sizeseq sorts sequences in ascending or descending
order of length. This is a port of the application seqsort from
the domsearch EMBASSY package.
New application skipredundant uses pairwise sequence matches to
exclude sequences that are similar from an input set. This is a
modified version of the application seqnr from the domsearch
EMBASSY package.
New applications provide utility functions for former GCG users:
nohtml removes HTML tags, notab replaces tabs with spaces,
nospace removes all whitespace from a file, skipspace removes
extra whitespace from a file.
Older EMBOSS applications can now generate a warning message
stating that they are marked as 'obsolete' with an explanation and
an indication of alternative programs in EMBOSS or in an EMBASSY
package. This warning can be turned off by defining environment
variable EMBOSS_WARNOBSOLETE with a value of "N" or by defining
the same variable in the emboss.defaults or ~/.embossrc files. We
will begin to mark applications as 'obsolete' in future releases.
A new EMBASSY package "myembossdemo" contains the demonstration
applications demoalign, demofeatures, demolist, demoreport,
demosequence, demostring, demostringnew and demotable that
illustrate how to use EMBOSS data types in your own
applications. The myembossdemo package allows novice developers to
try simple EMBOSS programming. The myemboss package is available
for adding your own applications. The demo applications are no
longer distributed with the main EMBOSS package. They were not
installed and were only built with the "make check" option.
Application short descriptions have been revised. The maximum
length of application one line descriptions is increased from 60
to 70 characters. The descriptions are easier to write. Output
from wossname can now be 90 characters wide. Interfaces that use
the description in menus may need to allow some extra space.
Function names in ajfile.c have been standardized. Old names are
still accepted but are marked as "deprecated" and will generate
warnings with the gcc compiler (see ajstr below). Other compilers
will see no difference. New source files ajfiledata.c and
ajfileio.c have been added. The buffered file data structures are
renamed internally to be more consistent (AjPFileBuff to AjPFilebuff).
notseq was unable to search for IDs containing '|' characters.
It uses string matching (not regular expressions), and these
characters are valid in NCBI-style FASTA files if read with the
"pearson" format which accepts the whole ID string without parsing.
The sequence alignment code has been updated. Sequence alignments
with low gap penalties failed to allow two gaps (one in each
sequence) without a match in between. The embAlign functions are
now simplified. Scores are returned by the PathCalc functions. The
Walk functions that walk through the path and return the aligned
sequences are faster and need fewer parameters. Profile alignments
occasionally duplicated residues in the sequence around gap
positions. Fast alignments around a limited width include
additional residues at each end and require an offset rather than
separate start positions. The offset is the difference between the
two start positions used in 5.0.0 and earlier releases.
Eprimer3 citations are corrected in the help text (from the ACD
file) and in the documentation. The citation errors were traced to
the original primer3_core documentation which has now been
corrected.
Wordmatch could confuse overlapping matches. It occasionally
extended the wrong match and missed a corresponding new match.
Seqmatchall results were correct with the default output
format which reports match positions, but gave incorrect results
with some other local alignment formats that include the sequence.
Seqmatchall now stores alignments in the same way as other local
alignment applications, and the alignment internals are corrected
to ensure other applications will not have the same problem.
Emma was officially supporting clustalw 1.83. Issues with clustalw
2.0 are now resolved and this version is supported if clustalw2 is
installed. Emma executes an application called clustalw (not
clustalw2) so version 2.0 must be installed under this name or an
environment variable EMBOSS_CLUSTALW needs to be defined to point
to the executable clustalw2 file.
Sequence format "selex" allows invalid sequence data files to be
accepted as input. Selex format is still available but is no
longer included in the formats that can be automatically
detected. When reading selex format data, users need to put
"-sformat selex" on the command line, or specify "selex::" at the
front of the USA. See the HMMER (old version EMBASSY package)
documentation for examples. HMMERNEW (recommended) examples use
Stockholm format and so are unchanged.
Program dbxfasta now defaults to a filename of "*.fasta".
The previous default "*.dat" is not commonly used for FASTA format
databases.
Program msbar block mutations were 1 longer than the specified
block and may crash if the block size was fixed (minimum and
maximum block sizes the same). This off-by-one error is now
corrected.
In GenBank output format, multiple line KEYWORD sections were not
formatted correctly.
ACD list and select values (the menus that appear in the user
prompt) can now have ACD variables. Although useful for local
application development these are not used in EMBOSS distributed
ACD files because the variables are difficult for web and GUI
interfaces to resolve when presenting the menu text.
List and Table internal data structures are now cached so that
creating and deleting temporary lists and tables is more efficient.
In emboss.default database definitions the filename and exclude
values can be delimited by spaces, commas or semicolons. Previous
releases used only spaces. Parsing is now consistent with the
fields definition which allowed all the above characters.
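Splitting on any of the three delimiters can be sketched as below.
This illustrates the rule only and is not the EMBOSS parser. The
function name is hypothetical.

```python
import re

def split_names(value):
    """Split a value on runs of spaces, commas or semicolons."""
    return [tok for tok in re.split(r"[ ,;]+", value.strip()) if tok]
```

For example, "a.dat b.dat,c.dat; d.dat" yields four names.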
Protein sequences with pyrrolysine ('O') had 'O' converted to a
gap because this was a gap character in early versions of
Phylip. This was patched in 5.0.0 to allow 'O' in UniProt release
13. The gap character is upper case only, so 'o' was correctly
read as pyrrolysine.
Wordfinder used the same descriptions for two pairs of qualifiers.
The descriptions are changed to make their meaning clear in
commandline help and in web interfaces.
New function ajTimeDiff returns the difference in seconds between
two time values.
Profiling tests showed that file reading and string handling can
be made faster. String handling called functions many levels
deep. Making this code inline and using macro versions improved
performance for applications (e.g. database indexing) that use
many string calls. File input requires each input line to be
copied. Using copy-by-reference (ajStrAssignRef) often makes this
more efficient. Existing macros now test for undefined strings:
MAJSTRGETLEN, MAJSTRGETPTR, MAJSTRGETRES and MAJSTRGETUSE. New
macros are added for string handling: MAJSTRDEL,
MAJSTRGETUNIQUESTR, MAJSTRCMPC and MAJSTRCMPS.
Memory management includes new macros AJCRESIZE0 and AJRESIZE0,
which provide resize functions that guarantee new memory is set to
zero. The functions must be given the original allocated size.
Using the GNU C run-time library, calls to mcheck and mprobe are
available to test for memory corruption by examining the bytes
before and after an address allocated by malloc. This can be
turned on for any application, including Unix commands, with the
environment variable MALLOC_CHECK_ which has values 0, 1, 2 or
3. 1 writes to standard error when a problem is found, 2 aborts
the program, 3 does both and 0 ignores errors. No recompilation
is needed for this simple method. EMBOSS now has a ./configure
option --enable-mprobe which enables two new
functions. ajMemProbe, passed an address from malloc (AJNEW0,
AJCNEW0, etc.) tests the bytes before and after and reports any
errors. The advantage of using ajMemProbe rather than mprobe is
that a macro MAJMEMPROBE also reports the file and line number
where it was called. To avoid large numbers of messages (when
code has problems) a limit can be set with ajMemCheckSetLimit
after which the program will exit. Note that enable-mprobe is
incompatible with using valgrind to test for memory leaks - as
mprobe and mcheck have to look at illegal bytes before and after
allocated memory blocks. Memory checking is turned on by a call to
mcheck, passing the function ajMemCheck, in ajnam.c before the
first memory allocation. If any program calls malloc before
calling embInit or embInitP this call will fail and issue a
warning (if compiled with --enable-mprobe). A special call
ajStrProbe tests any string with mprobe. Special calls ajListProbe
and ajListProbeData test lists and their contents. For more
details see http://www.gnu.org/software/libc/manual/
Protein sequences from the Staden package were read as nucleotide
because they were missing information on the ID line to identify
EMBL or SWISSPROT format. The sequences are now tested and
correctly typed.
Wordcount now accepts protein sequences as input. Previous
releases only allowed nucleotide sequences.
Wordfinder options had the same information prompt. These have
been changed from "limit" to "minimum" and "maximum" to make their
function clear.
Prompting for values from the user now includes a test for
standard input in use as an input file. If standard input is open,
the default response is accepted and a message is written to the
user. This is to avoid problems with command lines that use
"stdin" as an input and do not include -auto.
The acdpretty utility can now preserve comments in ACD files.
Comments are maintained in blocks with blank lines before and
after. Inline comments are started in column 50 unless they are
exceptionally long. Comments themselves have white space cleaned
up but otherwise are not reformatted.
A new function ajAcdGetValueDefault is added to return the default
value of an ACD qualifier. This can be combined with
ajAcdIsUserdefined in wrappers to test for values changed by the
user.
Infile qualifiers in ACD have a new attribute "trydefault" which
allows the default filename to fail. Any filename provided by the
user has to exist. This was added to support the behaviour of the
MIRA EMBASSY package. To allow an infile to fail the attribute
"nullok" also must be set to "Y".
Applications which produce an output file or graphics often
created an empty output file when the plot was selected.
The ACD files have been corrected to only create the file if it
will be written to. Applications changed are charge, dan,
freak, hmoment, iep and tcode.
Whichdb only writes to its output file if -get is false.
With -get it creates sequences. The outfile is no longer created
when whichdb is in -get mode.
String functions corrected so that Case in the name always means
case-insensitive and works by converting to upper case. Some
functions were defined the wrong way, with "Case" for the
case-insensitive form.
GFF3 format is now the default feature output.
A new function ajFeatIsCds identifies protein coding nucleotide
features (CDS) using the SO identifier. A new function
ajFeattagIsNote identifies feature tags that are for the default
feature tag.
Protein features now use the new Sequence Ontology terms defined
by BioSapiens. These are not yet accepted by GFF3 validators. The
new SO identifiers are added to protein feature definitions and
used internally.
Feature format definitions (the Efeatures and Etags files)
now allow #include references to other files. This allows a
standard EMBL and Swissprot feature table definition to be
included by the internal and GFF definitions. Redefinitions are
allowed using + and - prefixes to add and remove tags for existing
feature types.
GFF3 format feature (and report) output is added.
A new application "density" has been added. This reports the
A+C+G+T and AT+GC densities of nucleic acid sequences within
an adjustable sliding window. Plots of A+C+G+T or AT+GC are
optionally produced.
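A sliding-window composition calculation of this general kind can be
sketched as follows. It assumes "density" means the fraction of the
window occupied by each base or base pair class; the exact definition
and output layout used by the application are not given here. The
function name is hypothetical.

```python
def window_densities(seq, window):
    """Per-window fractions of A, C, G, T and of AT and GC."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    rows = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        # Assumption: density = count of the base / window length.
        row = {base: chunk.count(base) / window for base in "ACGT"}
        row["AT"] = row["A"] + row["T"]
        row["GC"] = row["G"] + row["C"]
        rows.append(row)
    return rows
```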
Molecular weight programs (e.g. digest, mowse) now have a
-mono switch to allow use of monoisotopic weights.
By default, average molecular weights are used.
The Eamino.dat format has changed. Molecular weight information
has been removed and put in its own Emolwt.dat file. This latter
now allows specification of average and monoisotopic weights. Values
for hydrogen and oxygen are specified as well as the amino acid weights.
The library representation of amino acid property information
has been changed. The EmbPropTable global table has been
removed and replaced with EmbPPropAmino and EmbPPropMolwt objects.
Pepcoil now produces a report (replacing a text output) in "motif"
format. The default is changed to not report non coiled-coil
regions as they are hard to distinguish in this format.
The "motif" report format is extended to allow two score positions
marked with "*" and "+" and labelled internally as "pos" and
"pos2". No application uses pos2 (it was added for pepcoil, but
both score maximum positions are always the same).
A new function ajAcdIsUserdefined allows wrappers to test which
qualifiers have values changed by the user so that they can use
shorter command lines to launch the wrapped application.
jaspscan application added. Scans sequences for transcription
factors using the JASPAR matrices.
jaspextract application added to move the JASPAR matrices into the
EMBOSS data area subdirectories.
Alignment format "trace", used to display internal data content, is
renamed to "debug" to be consistent with other formats. A "debug"
format is added for feature output.
Application documentation has been updated to remove obsolete
references to EMBL database identifiers. These are replaced with
the correct accession numbers.
Two new entries have been added to the "tembl" test EMBL database
for use in the QA tests.
Report output now checks the sequence and feature table type. If
the sequence is not a valid protein, protein-only formats (pir,
swiss) will fail with an error message. Similarly, if the sequence
is not a valid nucleotide sequence then nucleotide-only formats
(embl, genbank) will fail with an error message.
Garnier now uses the correct SwissProt and internal feature keys
for protein secondary structure. The results will appear much
better for example as a swissprot feature table. This required
rewriting of the internals by recoding the secondary structure
features with a "garnier" tag replacing the previous "helix",
"sheet", "turns" and "coil" tags. The default output is
unchanged. The results in other report formats will be changed.
Silent no longer reports the "Dir" column. This is replaced by the
new "Strand" column which reports "+" for a forward feature and
"-" for a reverse feature.
The following programs have changed default report output, with
the strand included for nucleotide sequences: equicktandem,
etandem, fuzznuc, fuzztran, recoder, restrict, silent, tcode,
twofeat. The strand column can be removed with the new command line
associated qualifier -norstrandshow.
Reports for nucleotide sequences have confusing ways to represent
the start and end positions for features on the complementary
strand. A strand column has been added to these reports,
controlled by a new -rstrandshow qualifier and attribute. By
default the strand is shown for all nucleotide reports (see a list
of changed program outputs above). The start position is always
lower than the end position for features on the complementary
strand indicating the region that should be reversed. In past
releases the seqtable report format (fuzznuc, dreg, dan)
confusingly reversed start and end positions to indicate the
unreported strand. For all report formats (nametable, table) the
start and end positions are now consistent with nucleotide feature
formats (gff, embl, genbank).
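For scripts that parsed the old seqtable output, converting a record
with reversed positions into the new convention might look like the
sketch below. The function name is hypothetical, and the assumption
that start greater than end always meant the reverse strand follows
the description above.

```python
def normalise_hit(start, end):
    """Return (start, end, strand) with start <= end."""
    if start <= end:
        return start, end, "+"
    # Old-style record: positions were swapped to signal the
    # complementary strand. Restore the order and report "-".
    return end, start, "-"
```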
Reports from dreg incorrectly reported sequences reversed with the
-sreverse qualifier.
Report headers now include the text "(Reversed)" when the input
sequence(s) are reverse complemented.
Phylogenetic trees in newick format are now parsed into internal
trees and converted back for use by Phylip. This allows us to
read other tree formats and pass them to Phylip (e.g. Nexus).
Some ACD data types did not allow the input to be NULL because
extra tests were carried out on the results. These are all cleaned
up and tested so that they can safely be set to nullok and missing
in local applications.
New sequence reading formats for PDB files. By default the ATOM
records are used (format "pdb"). An alternative format "pdbseq"
will read the SEQRES records which give the original sequence. The
ATOM records give the sequence determined from the structure.
Improved the help text for the -stdout and -filter options to
explain output files are written to standard output. Some users
expected graphics output (from plplot) to be controlled.
Version 5.0.0 15-Jul-2007
Extractalign is a new application to extract regions from a
sequence alignment in the same way extractseq extracts regions
from single sequences.
The MRS server in Nijmegen changed its syntax just before our
release. A new database access method "MRS3" supports the main
MRS3 server. We have very little documentation on the changed URL
query syntax. Access by ID appears to work at this stage. The
database URL is defined as http://mrs.cmbi.ru.nl/mrs-3/plain.do
The plain text output is now defined in the URL. The database
names have all changed on the server. At present the same server
appears to still support the old MRS access method with the URL
http://mrs.cmbi.ru.nl/mrs/cgi-bin/mrs.cgi
ACD parsing now allows square brackets within quoted strings.
Functions for lists and tables have been renamed to new standard
naming conventions. Some source files remain to be standardized
after the release, most importantly ajfile, ajfeat and some
remaining ajseq source files.
Warning messages are available for sequence formats that do not
allow additional characters. The environment variable
EMBOSS_SEQWARN needs to be set to "Y" to enable warnings. For
example, EMBL format allows numbers in the sequence records. Fasta
and related formats now warn for any characters that are not
whitespace and not known sequence characters. These warnings are
controlled by an environment variable so they can be disabled (or
enabled) for specific installations and/or wrappers. We expect
many cut-and-paste inputs can generate warnings. EMBOSS will
normally silently remove non-sequence characters.
Regular expression pattern file names (for dreg and preg) were
converted to upper case if the ACD file required the patterns to
be upper case.
The EMBOSS commandline now accepts gnu-style syntax with
--qualifier (we allow one or two '-' characters). Users who tried
this syntax were confused because EMBOSS treated --qualifier as a
parameter. In many cases it was used as the output filename, which
would give no error message but make it hard to find the output.
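Wrappers that need to accept both spellings can reduce them to the
single-dash form before further processing. The following is a
minimal sketch and not the EMBOSS command line code. The function
name is hypothetical.

```python
def normalise_qualifier(token):
    """Map --qualifier to -qualifier, leaving other tokens alone."""
    if token.startswith("--") and len(token) > 2 and token[2] != "-":
        return token[1:]
    return token
```

Plain parameters such as filenames pass through unchanged.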
Antigenic now accepts any protein sequence as input (earlier
versions did not allow ambiguity codes). B and Z are treated as
weighted averages of D/N and E/Q. All others are converted to X
and treated as a weighted average of all values. The data table
used has no information for selenocysteine or pyrrolysine.
Dottup is corrected to plot only the selected sequence range. The
plot lines were 1 residue too long (only noticeable on very short
sequences).
Distance matrix data can now read multiple distance matrices from
a single input file. This is used by three programs (fneighbor,
ffitch and fkitsch) in the phylipnew EMBASSY package.
Discrete states input now correctly defaults to all non-space
characters if no characters attribute is given in the ACD file.
This was the intention, but two programs (fpars and fdiscboot)
were instead accepting only 0 and 1. Other phylip programs have
their discrete state character set specified in the ACD file.
A new function ajSystemOut calls a system command, and redirects
standard output to a named file.
Function names are standardized for the ajsys, ajtime and ajutil
functions.
New function ajStrTableFreeKey frees only the key from tables
where the value is a constant.
Error messages from reading badly formatted comparison matrix
files are improved to report the line and the token that failed
to parse.
Test data has been updated. EMBL and SwissProt entries are updated
to the latest versions of these entries. Swnew entries are now a
selection from the SpTrEmbl subset in UniProt. The wormpep
database is obsolete. We do not have current data for the gb
directory which contained GCG reformatted genbank entries.
NBRF (or PIR) format failed to read some entries from SRSWWW
servers because the sequence ID does not match if the protein is a
fragment.
Efficiency of building large strings is greatly improved by
doubling the reserved space each time the end is reached. This
speeds up the reading of all long sequences.
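The doubling strategy can be sketched as follows. This is an illustration
in Python, not the AJAX string code; the class name, the reallocation
counter and the starting reserve are assumptions made for the example.

```python
class GrowableBuffer:
    """Byte buffer that doubles its reserved space whenever it runs out."""

    def __init__(self, reserve=16):
        self.data = bytearray(max(1, reserve))  # reserved space
        self.length = 0                         # bytes actually used
        self.reallocations = 0                  # how often we had to grow

    def append(self, chunk):
        needed = self.length + len(chunk)
        if needed > len(self.data):
            new_size = len(self.data)
            while new_size < needed:
                new_size *= 2                   # double, do not grow by a fixed step
            self.data.extend(bytes(new_size - len(self.data)))
            self.reallocations += 1
        self.data[self.length:needed] = chunk
        self.length = needed

    def value(self):
        return bytes(self.data[:self.length])
```

Appending 1000 four-byte chunks to a buffer that starts with 4 reserved
bytes needs only 10 reallocations, where growing by a fixed small step
would need hundreds.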
String function ajStrFmtWrap, which wraps strings for output, now
respects newlines in the original string. A new function
ajStrFmtWrapAt prefers to wrap at a selected character, for
example ',' for author lists.
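The behaviour described can be illustrated with a short Python sketch.
It is not the ajStrFmtWrapAt implementation; the function name, the
fallback to spaces and the hard break are assumptions for the example.

```python
def wrap_at(text, width, prefer=","):
    """Wrap text to at most `width` columns, breaking after `prefer` if possible."""
    lines = []
    for paragraph in text.split("\n"):          # newlines in the input are kept
        rest = paragraph
        while len(rest) > width:
            # Prefer to break just after the last preferred character that fits.
            cut = rest.rfind(prefer, 0, width) + 1
            if cut <= 0:
                cut = rest.rfind(" ", 0, width + 1)   # otherwise break at a space
            if cut <= 0:
                cut = width                           # otherwise break mid-word
            lines.append(rest[:cut].rstrip())
            rest = rest[cut:].lstrip()
        lines.append(rest)
    return "\n".join(lines)


print(wrap_at("Smith J., Jones A., Brown B.", 12))
```

For the author list above with a width of 12, each line ends after a
comma rather than splitting a name from its initial.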
Sequence objects are extended to include the full set of fields
defined in EMBL, Genbank and UniProt database entries. The "embl",
"genbank" and "swissprot" formats now read and write all fields,
so that entries will be rewritten exactly as in the originals
except for a few minor corrections (extra spaces in feature tables
are removed). We cannot guarantee that information is preserved
when writing out in a different format. For example, EMBL and
Genbank formats do not contain the same information.
GIF graphics output added where the gd library is a recent enough
version to provide support.
The plplot graphics library has been updated to 5.7.2. New files
are disptab.h and pldll.h. File gd.c replaces file gdpng.c and needed
one change for FREETYPE.
Infoseq can now optionally display the database name.
The acdvalid utility warns about qualifier names that do not fit
the standard naming convention. The messages now include a
suggested valid name, for example an input file called -sites
will be suggested as -sitesfile.
Sequence output in EMBL and SWISS formats now defaults to the new
format of the databases from 2006. The previous formats are still
available as "emblold" and "swissold". As sequence input, "embl"
and "swiss" formats will read both versions of the files.
Function ajTableRemove deletes an entry in a table, but only
returns the value. This is replaced by ajTableRemoveKey which also
returns the original key. The caller now owns both the value and
the key, and is responsible for deleting them. ajTableRemove is now
declared obsolete and will be removed from a future release.
Infoseq by default uses columns with fixed width, but this fails
to delimit long sequence names (for example, long file names and
paths). Two changes make this better. Infoseq now inserts a space
in column-delimited output (the default) when a string fills the
whole column. It is also now possible to specify a tab as
delimiter with -nocolumn -delimiter "\t" to return to 3.0.0
behaviour. This was needed for the W2H interface and maybe some
other wrappers.
Renamed libplplot to libeplplot and plplot headers are now
installed to include/eplplot. This avoids collisions with later
versions of plplot.
Version 4.1.0 04-mar-2007
Bugfix 1: graphics output failed to reset the title correctly in
some applications. Prettyplot and banana badly rescaled the output
from the second page of multipage output. Abiview produced
additional blank pages with only the title. Abiview also had bugs
in display when the user changed the window size or asked for
separate plots for each trace.
A new ACD attribute outputmodifier: "Y" identifies qualifiers that
cause the kinds of output changes that can break parsers. An
obvious example is the -html qualifier on many of the utility
programs. This attribute is a warning to wrapper developers and
maintainers that they may want to fix the value of this qualifier
and not allow users to change it. In some cases (as with toggle
qualifiers) it may be useful to wrap each possible value
separately. For example, tfm can run as an HTML version (-html)
and a text version (-nohtml -nomore).
Backtranseq now keeps stop positions in the sequence and replaces
them with the most common stop codon. Previous releases converted
stops to 'X' and back translated them as 'NNN'.
Reading sequences in NBRF (or PIR) format now only removes one '*'
from the end, allowing protein sequences to end with a stop codon.
Reading NBRF format sequences in FASTA format was retaining a ';'
in front of the sequence ID. This is now fixed.
Pattern files and regular expression files now use the -pformat
and -pname associated qualifiers which were ignored when they
first appeared in 4.0.0. Pattern file formats are "fasta" for the
original format in 4.0.0 with FASTA style identifiers, and
"simple" for files with a single pattern on each line. The format
defaults to testing the first character for a '>'. The pattern
name is used to set a name of "name1", "name2" and so on if no
name is in the FASTA file. By default patterns are called
pattern1, regular expressions are called "regex1".
Added a new function to read from a buffered file and trim
newlines. It was not needed before because input functions were
doing their own trimming.
Valgrind memory leak tests now cover all QA tests. The command
line is captured and used to generate test cases. Script
valgrind.pl knows about the few cases that need input files copied
and preprocesses them by name. A few tests can be flagged as
ignored. This is intended for tests known to run for a very long
time under valgrind. Memory leaks are fixed for all programs in
the main EMBOSS package and for the most used ones in the EMBASSY
packages.
A new environment variable ACDCOMMANDLINELOG takes a filename as
its value. This saves the command line equivalent of a program
run, converting user responses to prompts into their command line
equivalents. A number of bugs in command line saving for report
headers were identified and fixed.
Two string functions had their names reversed. ajStrRemoveWhite is
to remove all white space from a string, ajStrRemoveWhiteExcess is
to remove white space from the ends and replace internal
whitespace with single spaces. When function names were
standardized these names were reversed. As function calls were
converted automatically EMBOSS code worked as before, but
developers will have noticed that the functions did not behave as
expected. This is now corrected, and all existing calls in the
EMBOSS code have been checked and converted.
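The intended difference between the two functions can be shown with
Python equivalents. These are illustrations of the documented behaviour
only, not the AJAX code, and the Python function names are made up for
the example.

```python
def remove_white(text):
    """Like ajStrRemoveWhite: remove all white space from the string."""
    return "".join(text.split())


def remove_white_excess(text):
    """Like ajStrRemoveWhiteExcess: trim the ends and collapse internal
    runs of white space to single spaces."""
    return " ".join(text.split())


print(repr(remove_white("  A C  G\tT ")))         # 'ACGT'
print(repr(remove_white_excess("  A C  G\tT ")))  # 'A C G T'
```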
Showseq with a sequence end position now stops output at the end
of the user-specified range. Previous releases printed the whole
of the line with the last base/residue.
SRS servers use "gid" as the field name for GI numbers. The field
name has been changed to allow GI searches with local SRS and
remote SRSWWW access to Genbank.
A new configure option for developers --enable-devwarnings
turns on many more warning messages from the gcc compiler. Not all
warnings are useful - the less useful gcc options are documented
(and commented out) in the configure.in file devwarnings section.
Warnings include missing function prototypes, signed/unsigned
comparisons, potential loss of precision in casts, use of global
names (index for example) as variables.
Function names in ajseqwrite.c have been standardized. Old names are
still accepted but are marked as "deprecated" and will generate
warnings with the gcc compiler (see ajstr below). Other compilers
will see no difference.
Edialign is a new application, a port of the DIALIGN2 program by
B. Morgenstern, using an ACD file written by Guy Bottu. It takes
as input nucleic acid or protein sequences and produces as output
a multiple sequence alignment. The sequences need not be similar
over their complete length, since the program constructs
alignments from gapfree pairs of similar segments of the
sequences.
Wordfinder is a new application to find word-based matches of
limited size. It is based on code from supermatcher. The inputs
are reversed so the query sequence set (unaligned) is compared to
a streamed database of sequences. (Supermatcher should perhaps
have its inputs in this order too). Limits are provided for the
length of the word match and the length of the alignment. The
default gap penalties are also increased to limit the gaps allowed
in alignment.
Word-based algorithms found too many matches where both sequences
contain runs of X (protein) or N (nucleotide). These are now
ignored when building the word table.
Word-based algorithms complained if a sequence was shorter than
the wordsize. This was a problem for database searches with some
short sequences present. They now run silently and simply return
no word matches.
The EMBL format sequence entry parser was able to read swissprot
sequence data, but not the feature table. Efficiency improvements
to set the sequence type to nucleotide for EMBL entries showed
that swissprot entries were being read by the EMBL parser. A test
for swissprot protein information on the ID line should redirect
these entries to the swissprot parser. In previous releases the
sequence type was not set, so there was no problem with the
sequence type - although feature lines may not have been readable
from swissprot format flat files. Database definitions specify the
swiss or embl format so they are not affected.
Large sequences were running very slowly. This was traced to the
way sequence types are tested using regular expressions processed
by calls to the PCRE library. These calls were replaced by simple
string functions as they are only testing that a sequence is
entirely composed of characters from an allowed set. An
additional speedup was achieved by defining only upper case
characters as required (almost halving the number of tests) and
testing the upper case version of the sequence characters.
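The replacement test can be sketched in Python as a plain set membership
check. The allowed character set below is a hypothetical example, not
the set EMBOSS uses, and the function name is made up.

```python
# Hypothetical allowed set, defined in upper case only.
NUCLEOTIDE_CHARS = frozenset("ACGTUN-")


def is_composed_of(sequence, allowed):
    """True if every character of the upper-cased sequence is in `allowed`.

    No regular expression is needed: the test is only whether the sequence
    is entirely made of characters from a fixed set.
    """
    return set(sequence.upper()) <= allowed


print(is_composed_of("acgtNNac", NUCLEOTIDE_CHARS))  # True
print(is_composed_of("ACGT*", NUCLEOTIDE_CHARS))     # False
```

Defining the set in upper case and upper-casing the input means each
character class is listed once rather than in both cases.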
Sequence translation in the reverse direction adds extra amino
acids for partial codons. In the forward direction the overhang
was miscalculated so these codons were missed. No users have
complained, probably because in most cases they are translated as
'X' (it needs a 4-base wobble in the code to convert the first 2
bases of a codon into a single amino acid).
Sequence translation was relatively slow, at least on very large
sequences. Profiling with gprof indicated some changes to reduce
the number of string handling calls (each was very fast, but
there was a very large number of calls). The internal tables were
resized (from 15 elements to 16) for more efficient mapping.
Parsing NCBI format ID lines saves the database. This is available
for writing NCBI formatted output ID lines, but is not to be used
in reporting the USA.
Added "refseq" as a sequence and feature format. Initially a
simple alias of GenBank but we may let them diverge later.
REFSEQ entries have their own idea of what a ProteinID in the
feature table looks like, as they use REFSEQP protein IDs.
Validation now allows the third character to be an underscore.
Large numbers of database files could make the dbi indexing
programs (dbiflat, dbifasta, dbigcg, dbiblast) fail at the sort
merge stage when the index files are combined. The sort merge is
now in 2 steps to limit the number of open files required in the
system sort utility.
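The two-step merge can be sketched as below. This is an illustration
only: in-memory sorted lists stand in for the sorted index files, Python's
heapq.merge stands in for the system sort utility, and the function name
and the max_open limit are assumptions.

```python
import heapq


def merge_in_two_steps(sorted_runs, max_open=4):
    """Merge sorted runs without ever merging more than max_open at once.

    Step 1 merges the runs in batches of at most max_open.
    Step 2 merges the intermediate results. Two steps are enough for up
    to max_open * max_open input runs.
    """
    batches = [sorted_runs[i:i + max_open]
               for i in range(0, len(sorted_runs), max_open)]
    intermediate = [list(heapq.merge(*batch)) for batch in batches]
    return list(heapq.merge(*intermediate))


runs = [[1, 5], [2, 6], [3, 7], [4, 8], [0, 9]]
print(merge_in_two_steps(runs, max_open=3))
```

Here five runs are merged as one batch of three and one batch of two,
followed by a final merge of the two intermediate results.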
Added a script emblsplit.pl to split EMBL and UniProt database files
into 2Gbyte chunks.
The -sid qualifier now overwrites the sequence id if used. The
-sid value will be used for creating the output filename and for
reporting the sequence identifier in output files. For more than
one sequence as input currently the same ID is used. We may change
this in future to generate new IDs from this base name.
New sequence format gifasta is the same as "ncbi" but uses the GI
number as the identifier. Because the output is the same for both
formats we have to require -sformat gifasta to be on the
commandline. The default for such files will remain "ncbi" as the
automatically processed format. On output if there is no GI number
a dummy value of "000000" is currently used.
coderet now writes non-coding sequence to a new output file.
New feature function ajFeatLocMark marks selected features as
lower case. Used by coderet to report non-coding regions.
The help output now correctly reports output sequence default
filenames.
Phylip input distance matrices now allow integer values to be
treated as reals, although there is a possible confusion over
integer replicate values so the use of a trailing ".0" is strongly
recommended.
Sequences with NCBI deflines and no ID after the final "|" were
using the version part of the seqversion ("1" from "AB123456.1")
instead of the "AB123456" part to set the ID.
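The corrected choice of ID can be illustrated as follows. This Python
sketch is not the EMBOSS NCBI ID parser, and the function name, GI number
and entry name in the example are invented for illustration.

```python
def id_from_defline(defline):
    """Pick a sequence ID from an NCBI-style defline.

    If nothing follows the final '|', fall back to the accession part of
    the seqversion ("AB123456" from "AB123456.1"), not the version part.
    """
    token = defline.lstrip(">").split(None, 1)[0]
    fields = token.split("|")
    if fields[-1]:
        return fields[-1]
    seqversion = fields[-2]
    accession, _, _version = seqversion.partition(".")
    return accession


print(id_from_defline(">gi|12345|dbj|AB123456.1| example entry"))  # AB123456
```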
Graph titles were not standard on the general "graph" type output,
but are consistent for xygraph outputs. A new attribute gdesc
defines a prefix for graph titles which can be appended to by the
calling program, usually with a description of the input (sequence
USA, input filename). A new call ajGraphSetTitlePlus defines the
text to add to the gdesc as "[gdesc] of [text]". All graphs were
standardized except pepinfo which has 10 subplot titles already in
the intended format. This will be corrected later to have standard
main titles and shorter subplot titles.
The version of plplot we use has a bug in calculating character
sizes where the origin in user units is not the default of
(0,0). This has been fixed in the plgchrW and plstrlW functions in
the copy that is included with EMBOSS.
Dreg and preg ignored sequence begin and end positions. Both
programs now use the embpatlist function calls to process sequence
ranges.
Fuzznuc, fuzzpro and fuzztran lost the ability to use the sequence
begin and end positions when we switched to pattern lists. This
has been restored in the pattern list processing code.
The logfile caused a file close error if it was read only (because
it had not been successfully opened). Opening the logfile now
tests the file is writable and ignores logging for a read-only file.
More case-sensitive sequence comparison and matching functions
added to be consistent about providing both versions.
A few sequence databases have no accession number. For these a new
database attribute hasaccession: "N" in emboss.default prevents
EMBOSS trying to search the ACC field in addition to the ID field.
A few databases with duplicate IDs should be treated as
case-sensitive. The original example was a pdbprot database,
containing FASTA format sequences of individual chains from PDB
entries. In PDB, the entry itself is a 4-character string, and the
chain is a single character A through Z. When an entry has more
than 26 chains, the next 26 are labelled a through z. Pdbprot
appends these as _A, _B, etc. PDBPROT is available from some
public SRS servers - see the official list at
http://downloads.lionbio.co.uk/publicsrs.html.
This is resolved by adding a new database attribute caseidmatch in
emboss.default. A value of "Y" will force EMBOSS to exactly match
the case of the whole ID. This is done by post-processing and
rejecting entries with an ID that fails to match.
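The post-processing step can be sketched as a filter applied after a
case-insensitive lookup. This is an assumption-laden illustration in
Python: the index layout and function name are hypothetical, and only
the caseidmatch idea comes from the text above.

```python
def lookup(index, query_id, caseidmatch=False):
    """Look up entries by ID in an index keyed on upper-cased IDs.

    With caseidmatch set, entries whose ID does not match the query
    exactly (including case) are rejected afterwards.
    """
    hits = index.get(query_id.upper(), [])
    if caseidmatch:
        hits = [entry_id for entry_id in hits if entry_id == query_id]
    return hits


# Two chains of one PDB entry whose IDs differ only in case.
index = {"1ABC_A": ["1abc_A", "1abc_a"]}
print(lookup(index, "1abc_a"))                    # both chains
print(lookup(index, "1abc_a", caseidmatch=True))  # only the lower-case chain
```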
The run date included in report output has changed format to have
the day first and to lose the leading zero when the day is 1st to
9th of the month.
Program cpgplot can run on more than one input sequence, but the
plot failed on the second sequence. Fixing this required adding a
new function ajGraphDataReplaceI to replace the 1st, 2nd, 3rd,
etc. subgraph. Some memory cleanup was also added to remove
the replaced graph data objects.
Programs pepwindow and pepwindowall can now process any
protein sequence. In previous versions pepwindow was restricted to
pureprotein (no ambiguity codes) while pepwindowall accepted any
protein sequence (it has to handle gaps) but was using a score of
zero for unknown amino acid residues. Changed so that missing amino
acid values can be filled in using Dayhoff frequency weighted
averages for B, J and Z and an overall average for X, J and O.
Program octanol can accept any protein sequence. Interpolated
values are used for B, Z and J. An average over all values is used
for X and also for O and U where there is no data. Interpolations
and averages used the Dayhoff amino acid frequencies.
Program iep can accept any protein sequence. Ambiguity codes B and
Z are resolved by converting to the carboxylic acid (D or E) or
amide (N or Q) according to the Dayhoff amino acid frequencies,
giving a consistent value for any input protein.
Sequence set type testing was checking whether the seqset is
defined as protein but ignoring the type of the first
sequence. This is now fixed.
Program tfm looked in the obsolete install directory with the -html
option. Changed to find the embassy package name from the
installed ACD file and then to find the installed HTML file. If
EMBOSS has not been installed, will also search the original
source files.
Modified NCBI/FASTA format to preserve the database name from the
NCBI style ID. The database name is reported in one of the many
and varied NCBI syntax variants, depending on whether there is a
version or accession number, and whether there is an EMBOSS
database name also involved (for example, an entry in a file
indexed with dbxfasta or dbifasta).
Modified "pearson" sequence format to keep the FASTA file ID
complete. For historical reasons GCG-style dbname:id syntax was
still having the db part trimmed. This will still be trimmed from
fasta or ncbi format.
The report for digest has Cterm and Nterm columns capitalized to
match the rest of the report. Sequence ranges now give correct
cterm and nterm results.
The list file Cut.index for codon usage tables was changed to
remove old file names (commented out list at the end) and to
remove underscores from the species names.
Programs water, needle, merger and prophet calculate an internal
path size from the lengths of the input sequences. For sequences
that are too long, a fatal error is produced. But if the sequences
are extremely long, the test failed and the program gave a
segmentation fault. This fix tests in a different way that will
catch all cases. (added as a fix to 4.0.0)
The new MRS access method used a general search. This gave strange
results when the ID or accession appeared in any other entry. It
appears that MRS can search for id or accession only. This worked
on the main MRS server at least. (added as a fix to 4.0.0)
New database access methods MRS and DBFETCH need to be explicitly
turned on so that showdb can report them. (added as a fix to
4.0.0)
When deleting the last line of buffered input, failed to reset the
pointer to the last buffered line. This only affected debug
traces. Unfortunately, the ajFileBuffClear function does call the
debug trace. In practice we have only seen this bug when
processing sequence data in EMBL format from an MRS server. (added
as a fix to 4.0.0)
Pattern and regular expression searches failed to correctly
reverse a nucleotide sequence. The change is to use
ajSeqReverseForce (always reverses the sequence provided) instead
of ajSeqReverseDo (which only reverses if the reverse flag is
set). (added as a fix to 4.0.0)
Reports in list format failed to write a usable USA for "asis"
sequence input, and incorrectly reported reverse strand nucleotide
features. (added as a fix to 4.0.0)
The list files Matrices.nucleotide, Matrices.protein and
Matrices.proteinstructure now have comment headers explaining
their format. Fixed issues with nucleotide features in the
reverse direction in reports. The start/end positions were stored
the wrong way around and then reversed again when reported in one
of the report formats. However, reporting as EMBL features showed
the incorrect storage. ajFeatNewII now checks start/end and
reverses the feature if start is greater than end. ajFeatNewIIRev
sets the reverse strand and also checks that the start position is
greater than (or equal to) the end position. (added as a fix to 4.0.0)
To reduce the size of very large reports, for example when fuzznuc
or fuzzpro run over very large databases, new qualifiers are added
to report output. -rmaxseq gives the maximum hits for any one
sequence, -rmaxall gives the total maximum number of hits. The
report tail contains a record of the number of hits reported and
found. The qualifiers are intended for web interfaces to control
the maximum output they need to report. When the maximum hits
figure is reached, ajReportWrite returns false so that programs
can terminate at that point. (added as a fix to 4.0.0)
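How a program might use the two limits can be sketched as below. This is
a Python illustration of the described behaviour, not the ajReportWrite
code; the class, its attributes and the output layout are hypothetical.

```python
class Report:
    """Collects hits, honouring a per-sequence and an overall maximum."""

    def __init__(self, max_seq=None, max_all=None):
        self.max_seq = max_seq   # maximum hits for any one sequence
        self.max_all = max_all   # maximum hits in total
        self.found = 0           # hits found, whether reported or not
        self.reported = 0        # hits actually written
        self.lines = []

    def write(self, seq_id, hits):
        """Write hits for one sequence; return False once max_all is reached."""
        self.found += len(hits)
        kept = hits if self.max_seq is None else hits[:self.max_seq]
        if self.max_all is not None:
            kept = kept[:max(0, self.max_all - self.reported)]
        self.lines.extend(f"{seq_id}\t{hit}" for hit in kept)
        self.reported += len(kept)
        return self.max_all is None or self.reported < self.max_all

    def tail(self):
        return f"Reported hits: {self.reported}, found hits: {self.found}"
```

A caller can loop over sequences and stop as soon as write returns
false, which is the point of the return value described above.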
Reports now write a header and tail when closed, to make sure that
all programs will write something to the report file. The default
header contains the command line provenance, the tail contains the
number of sequences and hits. (added as a fix to 4.0.0)
Version 4.0.0 15-jul-2006
The format of the knowntypes.standard file in the emboss/acd
directory has changed to list the knowntype first, then the
datatype and finally the description. The file should be sorted by
knowntype, and any description should not end in "file" so that
file and directory prompts can be generated.
Standard prompts can be generated from the knowntype for files,
directories and other data types. This can reduce the need for
special information: attributes, but to help those who maintain
parsers and wrappers we will try to keep an information string in
the ACD file to match the prompt generated by EMBOSS. Acdvalid
will report cases where the information string does not match the
generated prompt. There may be a few cases where two inputs or
outputs of the same knowntype are needed.
The output produced by -help provides more information about
associated qualifiers than the HTML table view (from acdtable)
which is included in the HTML documentation in the
distribution. However, there is also a lot of extra information
in the acdtable output on the default values and the allowed
values for each qualifier. The -help output is now expanded to
include all the information provided by the acdtable view. A
benefit of this is that we can now remove the badly formatted
acdtable from the text version of the documentation. This is used
by tfm so the output of the tfm program will now be easier to read.
The default prompts for input and output files have been very
simple for the first 10 years. EMBOSS now has a "known type"
defined for all files in ACD. The known type is now included in
the automatically generated prompt for input and output files. To
help in this process, the known type should not have the word
"file" at the end. This will be added automatically in the prompt.
Printing with conversion type %g could write extra zeros where the
decimal point was stripped. In C, %g conversion removes trailing
zeros and the decimal point if nothing remains after it. The AJAX
print conversion functions added extra zeros at the start of the
output to extend the result up to the expected width.
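The standard %g behaviour referred to here can be seen with Python's
printf-style formatting, which follows the C rules for this conversion.
The helper name is made up for the example, and the AJAX print functions
themselves are not shown.

```python
def format_g(value, width=0):
    """Format with C-style %g, padded on the left with spaces to `width`."""
    return "%*g" % (width, value)


print(format_g(100.0))           # 100   (trailing zeros and the point are removed)
print(format_g(1.50))            # 1.5
print(repr(format_g(100.0, 6)))  # '   100' (padding is spaces, not zeros)
```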
Prophet modified to use an "align:" ACD definition rather than an
"outfile:". A bug which was mixing up the name of the profile with
the name of the sequence has been fixed.
Simple XML DOM added. This has no additional library
dependencies. This is a preliminary step in producing (revisiting)
XML graphics output etc.
EMBL/Genbank have agreed to add a new amino acid code 'O' for
pyrrolysine. O has been added to EMBOSS checking for protein
sequence data, and to the existing data files that contain 'U'
(selenocysteine). IUPAC/IUBMB has accepted the use of O for protein
sequences. This means that any alphabetic text is now a valid
protein sequence. There are 20 naturally occurring amino acids,
plus 'X' (unknown) 'B' and 'Z' ('D' or 'N' and 'E' or 'Q' for
analysis of complete digests) 'J' ('I' or 'L' in mass spectrometry)
plus 'U' (selenocysteine) and 'O' (pyrrolysine). There is a small
complication - older versions of phylip sometimes use 'O' as a gap
character. EMBOSS will still allow this in nucleotide sequences.
New sequence access method "mrs" uses CMBI's "Maarten's Retrieval
System" http://mrs.cmbi.ru.nl/mrs/cgi-bin/mrs.cgi to query
databases by ID or accession.
New sequence access method "dbfetch" uses the EBI's dbfetch REST
services http://www.ebi.ac.uk/cgi-bin/dbfetch to query databases
by ID or accession.
iep changed to allow users to specify number of modified
(uncharged) lysines and intrachain disulphide bridges. This
includes extensions to embIep functions to include the two new
parameters. These updates were provided by Clemens Broger of
F. Hoffmann-La Roche Ltd.
Changes to splitter and union by Kim Rutherford (Artemis
maintainer at the Sanger Institute) allow features to be preserved
for nucleotide sequences. The default operation of both programs
is unchanged.
Regular expression pattern lists are accepted by dreg and preg.
The output reports include pattern names which default to regex1,
regex2, and so on. The "regex" prefix can be set using the new
associated qualifier -pname on the command line.
Prosite pattern lists are accepted by fuzznuc, fuzzpro and fuzztran.
The output reports include pattern names which default to pattern1,
pattern2, and so on. The "pattern" prefix can be set using the new
associated qualifier -pname on the command line.
Regular expressions have the same syntax as the new pattern
datatype - they can be in a file, with pattern names, and have a
qualifier -pname to set the name for a pattern. Regular
expressions also have a type defined in ACD which can be
nucleotide (e.g. for dreg), protein (e.g. for preg) and string for
general patterns. Function ajAcdGetRegexSingle will read a single
regular expression. ajAcdGetRegex now reads a list of regular
expressions.
New ACD pattern type reads a PROSITE style pattern, or @filename
where filename contains patterns with names in FASTA
format. Patterns in the file are concatenated if on multiple
lines. The file may also contain mismatch=n after the ID to set
the number of mismatches for a pattern. Patterns also have
associated qualifiers -pmismatch and -pname for the pattern on the
commandline or all patterns in the file.
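A reader for such a pattern file could look roughly like this. It is a
Python sketch under stated assumptions, not the ajPat code: the function
name, the dictionary layout and the handling of lines before the first
'>' header are all hypothetical.

```python
def parse_pattern_file(text, prefix="pattern"):
    """Parse FASTA-style pattern text into a list of dicts.

    Header lines start with '>' and may carry a name and mismatch=n.
    Pattern lines following a header are concatenated. Unnamed patterns
    get prefix + position (pattern1, pattern2, ...).
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, mismatch = "", 0
            for field in line[1:].split():
                if field.startswith("mismatch="):
                    mismatch = int(field.split("=", 1)[1])
                elif not name:
                    name = field
            patterns.append({"name": name, "mismatch": mismatch, "pattern": ""})
        elif patterns:
            patterns[-1]["pattern"] += line   # multi-line patterns are joined
    for number, entry in enumerate(patterns, start=1):
        if not entry["name"]:
            entry["name"] = f"{prefix}{number}"
    return patterns


example = ">zf mismatch=1\nC-x(2,4)-C-\nx(3)-H\n>\nAAGCTT\n"
for entry in parse_pattern_file(example):
    print(entry)
```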
Pattern processing is changed to use lists of patterns, as
submitted by Henrikki Almusa of Medicel in Helsinki. This is
implemented as new ACD data type "pattern" which required some
nucleus embPat functions and data types to be moved to AJAX ajPat
so that they can be called from ajacd.c
"a2m" alignment format (which is just fasta) is now supported in
ACD.
New EMBASSY MEME package containing "wrapper" applications
providing an EMBOSS-style interface to the applications in
the original MEME package version 3.0.14 developed by Timothy
L. Bailey. The package is fully documented.
New EMBASSY HMMER package contains "wrapper" applications
providing an EMBOSS-style interface to the applications in
the original HMMER package version 2.3.2 developed by Sean Eddy.
The package is fully documented.
ACD dirlist: order of list of files is now system-independent.
fuzztran: now always generates an output file, even if there
is no data.
coderet: now writes any permutation of cds, mrna and protein
sequence output to separate files. Output file formats may
be set independently and have the default file extensions of
"cds", "mrna" and "prot".
oddcomp: New ACD option to set the window size equal to length
of the current protein. Code cleaned up.
Restrict: alphabetic sorting fixed in the case where -limit
is specified.
Digest changed to add ragging option. Original code was
contributed by Gregoire R Thomas.
infoseq: code largely rewritten. Two new advanced ACD options
to specify output using a user-defined delimiter or in columns.
Output much cleaner, e.g. columns are aligned.
Digest changed to read a sequence stream (earlier versions read
only one sequence). Code for this was contributed by Henrikki
Almusa of Medicel in Finland.
Two new programs makenucseq and makeprotseq have been submitted by
Henrikki Almusa of Medicel in Finland. They create sets of random
sequences. Sequence composition can be specified by a codon usage
file or by pepstats output.
New format "swissnew", with aliases "swnew" and "swissprotnew",
added. UniProt has announced future changes to the UniProt entry
format, which is still called "swiss" in EMBOSS. The ID line has
"Reviewed" and "Unreviewed" in place of "STANDARD" and
"PRELIMINARY", and no longer has the "PRT;" placeholder for the
EMBL format "division" - now obsolete as EMBL has changed this
part of their ID line in the latest release. In EMBOSS 4.0.0 we
replace "STANDARD" with "Unreviewed" as more appropriate to
entries that come from FASTA files and other sources.
Programs which analyze nucleotide features now call ajFeatGet
functions in most places. In previous releases, some of these
programs used the internal feature data structures directly.
GFF format feature files are designed for nucleotide
sequences. EMBOSS supports the use of GFF for protein sequence.
Feature keys (to use the EMBL/Genbank feature table term) are now
defined with external names for each format and a list of internal
names to be used by EMBOSS. This greatly simplified the
conversion of SwissProt and PIR feature tables. The internal table
also has a list of aliases. The internal aliases for nucleotide
features are as far as possible identifiers from the Sequence
Ontology SOFA (feature annotation) subset. In a few cases, where
multiple EMBL/Genbank terms map to a single SOFA term, new terms
have been added to extend the SOFA name uniquely (we simply append
the EMBL/Genbank feature key).
MSF format files with more than 5000 sequences were truncated on
input - only the first 5000 names were being read. This limit has
been removed. As "emma" uses MSF format for the clustalw run it
launches, this problem limited emma to 5000 output sequences in
previous releases.
The EMBL database has changed its ID line. The new line has
semicolons after each token, the primary accession instead of the
ID (there is no ID in the new EMBL format), and the sequence
version as a number. Internally in EMBOSS we continue to build the
accnum.n style sequence version. We expect most other packages
will take some time to change EMBL formats, so for output this is
called "emblnew" format. As input, "embl" format will accept both
the old and new style entries. For database indexing, dbiflat and
dbxflat will read old and new formats as "embl" by looking for SV
on the ID line. EMBL and EMBLNEW format output is also improved by
wrapping long DE lines.
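Telling the two ID line styles apart by looking for SV can be sketched
as follows. This Python example is not the EMBOSS parser; the function
name and returned fields are hypothetical, and the two ID lines are
illustrative examples of each style.

```python
def parse_embl_id(line):
    """Classify an EMBL ID line as old or new style by the SV token."""
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    first = fields[0].split()
    if len(fields) > 1 and fields[1].startswith("SV "):
        accession = first[0]
        version = int(fields[1].split()[1])
        # Internally the accnum.n style sequence version is still built.
        return {"style": "new", "accession": accession,
                "seqversion": f"{accession}.{version}"}
    return {"style": "old", "id": first[0]}


print(parse_embl_id("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."))
print(parse_embl_id("ID   ATTS1      standard; RNA; PLN; 1859 BP."))
```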
Wossname will now search for each word in a phrase used as the
search text. By default, all words must match. A new qualifier
-noallmatch tells wossname to match any word in the
search. Partial word matches are accepted so "restrict" will match
"restriction". The search term is also compared to the groups and
keywords attributes in the ACD file. A new qualifier -showkey will
report the keywords to help explain why applications were matched.
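The matching rule can be sketched in a few lines of Python. This is not
the wossname source; the function name and the sample description are
assumptions, and only the all-words versus any-word and partial-match
behaviour is taken from the text.

```python
def matches(search_text, description, all_match=True):
    """Match each word of the search text as a substring of the description.

    all_match=True requires every word to match (the default);
    all_match=False accepts a match on any word, as with -noallmatch.
    """
    words = search_text.lower().split()
    text = description.lower()
    found = [word in text for word in words]   # partial word matches count
    return all(found) if all_match else any(found)


description = "Finds restriction enzyme cleavage sites"
print(matches("restrict sites", description))                 # True
print(matches("restrict protein", description))               # False
print(matches("restrict protein", description, all_match=False))  # True
```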
All ACD files have a new application attribute keywords: which
provides keywords to search for in addition to the groups. This
is intended for keywords which are hard to include correctly in
the short description. A file keywords.standard is provided with a
list of all keywords. This is for use by utilities searching
programs by keyword, which will be expected to check the groups
and keywords attributes in a single query.
Reading a sequence of type "any" sets the sequence type to
nucleotide by default. Any x or X ambiguity codes will be
converted to 'n' or 'N' to avoid confusion in programs that will
convert a second nucleotide sequence (alignment programs, for
example). X is allowed as an unknown character in nucleotide
sequences (and N is also allowed as 'any base').
Stockholm and Selex sequence formats, used mainly by the HMMER and
HMMERNEW embassy packages, have been corrected for a few cases
where automatic format detection generated errors.
Function names in ajseq.c have been standardized. Old names are
still accepted but are marked as "deprecated" and will generate
warnings with the gcc compiler (see ajstr below). Other compilers
will see no difference.
Further correction to reversed sequence numbering for local alignments
from water and supermatcher. For these local alignments all reversed
alignments were ending at "1" because the end offset was not
calculated correctly. Matcher called a different function to set
sequence positions and reported correct positions.
For alignments with a line of gaps, adjusted the numbering to
report the last sequence position instead of the next at the start
of the line.
Program einverted output is changed to include the sequence ID
and the program input is changed to process more than one sequence
as input. The change to the output format was needed to indicate
which sequence is reported. The program is also speeded up by not
dynamically resizing the internal arrays used to hold sequence
positions.
Added additional information to "entrails" output (entrails is
built by "make check" and displays internal data to assist
developers of wrappers and interfaces). The output now includes
application attributes and reports definitions which are aliases
(with -full on the commandline).
Added -mincount option to wordcount to report only words occurring
a given number of times. The default of 1 does not change the
previous results.
Oddcomp had a number of bugs. A window size equal to the sequence
length resulted in no hits. The word size was used before reading
the input file. A match in the last possible window was missed.
Biosed modified to specify a position so it can be used to edit A
to L in position 2 (for example) in a single sequence or
throughout an alignment. Normal use is unchanged. If there is
demand, the target could be changed from a string to a pattern.
Clustal sequence format output is now version 1.83 with 60
bases/residues per line. Previous EMBOSS releases reported it as
1.4 and printed 50 bases/residues per line.
The tmap program had an upper limit of 6000 residues and 300
sequences. All fixed size arrays were made dynamic. The length
limit was exceeded by one of our users.
GCG formatted databases were found to have split entries into more
than 1000 chunks - for example human chromosome 7 in a TPA (third
party annotation) entry in EMBL. A regular expression is now used
to check for any number of subsequences in GCG data.
ajSysStrTok and ajSysStrTokR changed to match the behaviour of the
C run time library function strtok. Both now keep their internal
pointer at the first delimiter after the matched token. This only
changes the result if the delimiter set is changed on the next call.
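The C library behaviour referred to here can be seen with strtok itself. The sketch below uses only the standard function (not ajSysStrTok) and a hypothetical helper name. It changes the delimiter set between calls, which is the one case where the position of the saved pointer matters.

```c
#include <string.h>

/* Hypothetical helper: splits text in place into three tokens,
 * changing the delimiter set after the first call to strtok. */
void split_with_changing_delims(char *text, char **first,
                                char **second, char **third)
{
    *first  = strtok(text, ",");  /* token ends at the first ','       */
    *second = strtok(NULL, " ");  /* delimiter set changes to a space  */
    *third  = strtok(NULL, " ");  /* same delimiter set as before      */
}
```

Called on a writable buffer holding "alpha,beta gamma", the three tokens are "alpha", "beta" and "gamma". The comma that ended the first token has already been consumed when the second call runs with a different delimiter set.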
Another code cleanup is the addition of Exit functions to all AJAX
and NUCLEUS source files that could still have static memory
allocated when a program ends. We aim to clean up memory for all
the standard memory tests in test/memtest.dat. This includes
creating a new function acdReset which resets the state of ACD
processing so that a new ACD file could, in theory, be read once a
program has completed. All programs need to call the embExit
function at the end to call the NUCLEUS and AJAX cleanup
functions. Some of these functions will also log memory usage
statistics if debugging is turned on (-debug on the command line).
We are working through all the library code making standard
function names. Old function names will be retained at least until
release 4.0.0. They are marked with the __deprecated flag, which
causes the gcc compiler to report all uses of the old name. Other
compilers are not affected. The first set to be processed is in
ajstr.c (string and character functions).
Sequence reading from website URLs now defaults to HTTP 1.1, with
chunked blocks of data. A bug in processing small (single line)
chunks was fixed.
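For readers unfamiliar with HTTP 1.1 chunked data, a minimal decoder is sketched below. It is not the EMBOSS implementation: the function name is hypothetical, the body is assumed to be text held in a NUL-terminated string, and trailers after the last chunk are ignored.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes a chunked body from in into out.
 * Each chunk is "<hex size>\r\n<data>\r\n"; a size of 0 ends the body.
 * Returns the number of bytes decoded, or -1 if the input is malformed
 * or out is too small. out is always NUL-terminated on success. */
long decode_chunked(const char *in, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;

    for (;;) {
        char *end;
        const char *eol;
        unsigned long len = strtoul(in, &end, 16);  /* chunk size in hex */

        if (end == in)
            return -1;                  /* no hex digits found */
        eol = strstr(end, "\r\n");      /* skips any chunk extension */
        if (eol == NULL)
            return -1;
        in = eol + 2;
        if (len == 0)
            break;                      /* last chunk */
        if (len >= outsize - used)
            return -1;                  /* no room for data plus NUL */
        if (strlen(in) < len + 2)
            return -1;                  /* truncated chunk */
        memcpy(out + used, in, len);
        used += len;
        if (in[len] != '\r' || in[len + 1] != '\n')
            return -1;                  /* chunk data must end in CRLF */
        in += len + 2;
    }

    out[used] = '\0';
    return (long) used;
}
```

A body of "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" decodes to "Wikipedia". Small chunks holding a single short line, as in the bug mentioned above, are handled by the same loop.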
Report and alignment output now includes the full commandline used
to run the program, with any replies to prompts included.
Excel report format includes a column for Strand to indicate
sequences on the reverse strand. The strand column is + for a
forward feature (all protein features are forward) or - for a
reverse direction feature.
New sequence type gapstopprotein for proteins with gaps and
internal stops.
Translation functions in ajax/ajtranslate.c have been cleaned up.
New program backtranambig to backtranslate as most ambiguous
codons.
Phylip sequence format can now read sets of alignments with blank
lines in between. Such formats were produced by the new fseqboot
program and used by the new phylip programs and seqsetall in ACD.
The list of graph devices produced when an invalid device (or '?')
is given now lists only the unique devices (those defined
differently in the plplot library code) with alternative names
(xwindows for x11, for example) added in brackets. Specifying an
ambiguous device used to accept the first match found; now an
error message is given.
Prettyplot and cons were producing different consensus
sequences. Comparison of the results showed two problems. Cons was
missing consensus characters because of an error in calculating
the plurality (since fixed in prettyplot, but the library function
used by cons had not been corrected). Prettyplot was missing
consensus characters for a different reason - prettyplot has a
"collision detection" feature to skip consensus characters for
positions where more than one amino acid or base is valid as a
consensus character. This was turned on by default, when the ACD
file clearly states it should be turned off. In fixing both bugs
the two programs will give the same consensus, except for cases
where collisions occur - in these cases prettyplot may not select
the same character as cons, where both are equally valid.
Programs that write sequences need to call ajSeqWriteClose before
they exit. This forces output from sequence formats that save up
sequences in memory and write at the end. An example is MSF, which
has to wait for all sequences in order to calculate the file
checksum.
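To show why such a format cannot be written one sequence at a time, here is a sketch of a file-level checksum that depends on every sequence. It assumes the position-weighted checksum commonly described for GCG-style files and a file total formed by summing the per-sequence values. The function names are hypothetical and the exact arithmetic used by EMBOSS is not stated in this entry.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed per-sequence checksum: each character (upper-cased) is
 * weighted by its position, cycling 1..57, and summed modulo 10000. */
int gcg_style_checksum(const char *seq)
{
    long check = 0;
    size_t i;

    for (i = 0; seq[i] != '\0'; i++) {
        long weight = (long) (i % 57) + 1;
        check = (check + weight * toupper((unsigned char) seq[i])) % 10000;
    }
    return (int) check;
}

/* Assumed file checksum: sum of the per-sequence values modulo 10000.
 * It can only be computed once all sequences are available, which is
 * why the writer has to hold them until the output is closed. */
int file_checksum(const char *const *seqs, size_t nseqs)
{
    long total = 0;
    size_t i;

    for (i = 0; i < nseqs; i++)
        total = (total + gcg_style_checksum(seqs[i])) % 10000;
    return (int) total;
}
```

The file header precedes the sequence data but carries a value derived from all of it, so nothing can be flushed until the last sequence has been seen.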
Functions that process directories now skip the '.' and '..'
directories so that '*' wildcards will work correctly.
Prettyplot has been revised. A debugging commandline option has
been removed. String commandline options have been changed to
array and select types for better validation with the same user
responses. Colours are now corrected for proteins - in version
3.0.0 and earlier the colours depended on the column order in the
matrix. Nucleotide colours follow the ABI base colours used in
abiview. The examples in the documentation showed no boxes because
of low sequence weights in the MSF format input data. The weights
have been updated to give the 'expected' results.
All programs now store the command line needed to recreate the
run. The result is logged by the database indexing programs, and
will be added to other program outputs in a future release. The
command line includes all non-default responses to prompts by the
user.
dbiflat, dbifasta, dbigcg and dbiblast set the system sort to use
normal "C" sort order. On systems where the locale is set to a
language other than English, sort can have strange behaviour. In
particular, the underscore character fails to sort in the correct
place so that indexing SwissProt/UniProt or RefSeq entries fails
to put certain entries in the correct sort position for
retrieval. There is now no need to set LC_ALL=C locally, although
this is good practice whenever sort is used.
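The "C" sort order is plain byte order, which can be reproduced with strcmp. The sketch below assumes ASCII and uses hypothetical entry names. It is meant only to show where the underscore falls in that ordering, not how the indexing programs call sort.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two strings byte by byte, as the "C" locale does. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Hypothetical helper: sorts an array of entry names in "C" order. */
void sort_ids_c_order(const char **ids, size_t n)
{
    qsort(ids, n, sizeof ids[0], cmp_bytes);
}
```

In ASCII the underscore (0x5F) sorts after digits and upper-case letters, so the names P1_HUMAN, P1A_HUMAN and P10_HUMAN come out as P10_HUMAN, P1A_HUMAN, P1_HUMAN. A locale-aware sort may place the underscore elsewhere, and a lookup that expects byte order would then miss entries.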
Version 3.0.0 15-jul-2005
Gap penalty qualifiers were standardized for all programs.
water, needle and other alignment programs occasionally could
report suboptimal alignments (off by the gap extension penalty
score). The reported alignments were correct, but rearranging the
gaps could give a slightly higher score. Matcher and stretcher use
different alignment functions and were unaffected.
Cpgplot no longer has a -shift option to speed processing on long
sequences. The output was broken. We will restore it if there is
demand.
Two new variables added for developers using the MYEMBOSS package
to write their own EMBOSS programs. EMBOSS_MYEMBOSSROOT (the same
will work for other EMBASSY packages) points to the location of
the ACD files for an EMBASSY package which is not installed - as
would be the case for an ordinary user developing and maintaining
their own code using MYEMBOSS. This requires the use of embInitP
rather than embInit to pass the package name - something all
EMBASSY programs should (and will) do. The second variable is
EMBOSS_ACDUTILROOT and is required so that utilities such as
acdvalid can also find the ACD files. Utilities acdvalid, acdc,
acdhelp, acdtable and acdpretty use embInit as they know nothing
about any package name.
Sequence sets (seqset and seqsetall) have a new ACD attribute
"aligned" which is true or false. If true, the sequences will be
extended with gaps and passed to the application as a full
alignment. It is assumed that they are already aligned. If false,
the application needs all sequences in memory but has no need for
aligned input. The aligned attribute is required (to help ACD
parsers) so acdvalid will object if it is not found.
embossdata now requires a filename, or an empty string to search
for all files. If no filename is given, it will prompt for one
with a default of an empty string.
acdvalid now tests the order in which sections appear in the ACD
file. The order must be: input, required, additional, advanced,
output. There are already constraints on which ACD data types can
appear in each section. All existing ACD files passed this test.
If any external ACD files have a problem the acdvalid tests can be
revised.
Sequence format "experiment" is now correctly the Staden package
experiment file format. The description is taken from the "EX"
experiment description line. EMBL line types (including features)
are allowed in this format and are supported if used before the
sequence. The accuracy values are read and stored (one per base,
using the highest base value if all 4 bases have individual
numbers) and written. These values could possibly be passed to
primer3, for example.
Staden and GCG input formats can now parse out comments from
anywhere in the sequence records.
Nexus and nexusnon output formats now correctly report the
datatype for protein alignments.
Documentation of the @data datatype header tags updated on the
developers webpages.
Coderet reports the number of CDS, mRNA and translation sequences
to an output file. Requested for easier tracing of inputs that
gave no sequences.
Nbrf (pir) input can now read from an SRSWWW server. The problem
was that SRS reports an extra ">P1;seqid" header before the
sequence. Now if there is no sequence, a duplicate header (one
with the same ID) can be skipped.
Clustal output format no longer writes in blocks of 10.
Clustal and other multiple sequence formats were unable to return
single named sequences. Fixed for all such formats.
Phylip3 output renamed phylipnon for compatibility with other
formats. The phylip3 name is retained for back compatibility. The
header for phylip non-interleaved format is corrected to that
accepted by phylip 3.6 (no need for YF on the header line, and
correct number of sequences). Documentation of these formats (for
seqret and general format documentation) has been updated.
Programs chips, cusp, prettyseq and showtran used a codon usage
table as input only to define the genetic code (amino acids for
each codon) for the table they produce. This is no longer needed
as a new AjPCod constructor ajCodNewCode can be given a genetic
code (default 0 to use the standard code) and will set the amino
acid data.
The ajCodClear function now clears all data, including the amino
acid assignments, for use in reading multiple codon usage
formats. A new function ajCodClearData clears only the data and
other values, and leaves the amino acid assignments in case other
applications may make the same assumptions.
Codon usage input filenames can now be used to set the output
filename. The codcmp program for example will no longer default to
"outfile.codcmp" for output. However, this can cause unexpected
results when a codon usage table and a sequence are read in, so
codon usage filenames are only used if no other input file (or
sequence, or feature table, or other input type) has been
read. This is done by passing a "reset" boolean when setting the
saved first input file name so that other inputs can overwrite a
name defined by a codon usage input. A remaining side effect is
that if the first input is stdin (for example with -filter on)
then a second input file can now set the default for output. The
recommendation for anyone developing wrappers is to always
explicitly set the output filenames if there is a need to know the
name for a specific output.
Codon usage tables support multiple formats. All can be read
automatically. EMBOSS will now, for example, accept native GCG
codon usage tables including those used by the codonusage and
transterm databases. The format can be specified for "codon" input
by a -format qualifier. Outcodon is now used as an ACD datatype
for writing codon usage tables, and has a -oformat qualifier. A
new application codcopy can inter-convert the codon usage table
formats. The default codon usage table format is called "emboss"
and includes structured comments to identify the species, database
release, database division, number of CDSs and codons, and GC
content. These values are calculated or searched for in the text
within a file for other formats.
In the emboss.default and .embossrc files the same name can be
used for variables, databases, and resources. In previous versions
a single table was used and name clashes could occur. This becomes
an issue with the increasing use of resource definitions.
Colours for abiview set to the ABI standard colours.
Sequence types explicitly set in source code for cons, sixpack and
backtranseq. GCG format output was showing nucleotide instead of
protein sequence type.
Correction to reversed sequence numbering for local alignments
from water.
Version 2.10.0 03-Jan-2005
Profile analysis with gprof indicates that the regular expressions
(and the PCRE library) are very inefficient. Wildcards in regular
expressions lead to millions of recursive calls to the match
function. Although they are very readable for code maintenance,
we replaced them for EMBL sequence and feature reading to get about a
4-fold speedup. Profile analysis will continue up to version 3.0.0.
Feature table updated for nucleotide sequences to
EMBL/GenBank/DDBJ version 6.2. A few obsoleted qualifiers.
tranalign now allows for the proteins to have Methionine residues
at the start which now match a START codon in the corresponding
nucleic acid sequence.
diffseq has a new option '-global' which makes it treat the whole
of the sequences as regions to be aligned, rather than the
default which looks for the longest region of overlap and only
reports differences within that overlapping region. This new
option is useful when looking at protein and mRNA sequences
which are expected to align over their whole length.
Alignment output issues resolved. Specifying begin and end of
input sequences now works for all alignment formats. Markx formats
have been rewritten as the original code we used has nasty
dependencies on global variables which we struggled to reproduce
for all cases. The rewritten code is much simpler. Note that the
gap penalty reported by markx10 format is the EMBOSS
penalty. Markx10 as used in the FASTA package subtracts the gap
extension penalty from the gap penalty ... and adds it back when
calculating.
transeq failed to check sequence ranges in list files
correctly. It was only using the range from the first sequence if
the USA included a start and end. The range is now reset for each
sequence.
remap (and other programs that display translations) had problems
with masking ORFs (using strange characters instead of '0'),
caused by bad calls to an AJAX function.
Entrez added as an access method. Sequence format must be
genbank. Server URL is hard-coded at NCBI (for now). Works by
finding GIs (GenInfo Identifiers) that match the query, and then
retrieving them one at a time. This is still a prototype - more work is
needed. Note that apparently Entrez cannot retrieve by LOCUS (id).
Seqhound added as an access method. Sequence format must be
genbank. Needs a URL to find the server. Works by finding GIs
(GenInfo Identifiers) that match the query, and then retrieving
them one at a time. This is still a prototype - more work is
needed. Some Entrez error conditions are less graceful in
SeqHound. Des and Key searches are turned off until SeqHound adds
indexing for these. Org searches work, but require the numeric
taxon ID. This is not friendly, so we are looking for a way to get
the taxid from the species or genus.
Direct access databases now support exclude wildcards. The syntax
is as for emblcd indexing, but only files listed in filename are
included.
Database names must be letters, numbers and underscores
only. Reading emboss.default and .embossrc now generates a warning
message for any bad database name. Bad names were ignored by USA
processing, leading to confusing results.
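The naming rule is simple enough to sketch in a few lines of C. The function name is hypothetical, and rejecting an empty name is an assumption not stated above.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if name contains only letters,
 * numbers and underscores, 0 otherwise. An empty name is rejected
 * (an assumption for this sketch). */
int dbname_is_valid(const char *name)
{
    if (*name == '\0')
        return 0;

    for (; *name != '\0'; name++)
        if (!isalnum((unsigned char) *name) && *name != '_')
            return 0;

    return 1;
}
```

A name such as embl_new passes, while my-db and swiss.prot would trigger the warning.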
seqretsplit has a new -feature option (as for seqret).
noreturn can write files for PC or Mac file systems using a new
-system qualifier.
FASTA format sequence files with a sequence ID starting P1; were
assumed to be PIR format. These can now be read as FASTA, assuming
that PIR format has already been tested for.
Sequences with zero length were accepted. Sequences must now have a
length of at least 1. Some user scripts could create FASTA format
files with no sequence, or with the sequence on the ID line. These
can crash many programs, including a core dump from clustalw
(through emma).
Added a calculated attribute "haslengths" to (phylogenetic) tree
input in ACD for use in phylipnew interfaces
Wossname and seealso have a new commandline option -showembassy
which defines one embassy package to be shown. The main use is in
finding applications when automatically building the
documentation, but end users and interface builders may find some
uses for this option too.
Added an "embassy" string attribute to the application in ACD so
that wossname can find whether an application is in EMBASSY or
not. Wossname was depending on the source directory, but could not
distinguish between EMBOSS and EMBASSY ACD files once they were
installed.
The EFUNC and EDATA databases have been enhanced to provide better
views and links within SRS. The new versions are available at both
HGMP and EBI. In future, EBI will probably become the sole site
(as HGMP/RFCGR is closing in 2005).
The official EMBOSS website has moved to emboss.sourceforge.net
which includes redefining links in applications and major
modifications to the scripts which maintain the application web
pages. The sourceforge web pages are now committed to CVS under
doc/sourceforge. The pages on sourceforge itself can only be
modified by registering at sourceforge and joining the emboss
project.
Version 2.9.0 15-Jul-2004
ajListMapRead and ajListstrMapRead functions for read-only lists.
As an added check, the functions these call for each element have
a different prototype.
ajStrStr function now returns const, as do various 'Get' functions.
The few cases where a true char* is needed must now call
ajStrStrMod with the AjPStr passed by reference so that we can
check it is being modified. All calls to ajStrStr in EMBOSS and
most EMBASSY packages have been resolved to remove compiler
warning messages. ajStrFix also needs the AjPStr passed by
reference.
tfm -html now gives full path to image files.
Remove need for the definition of PLPLOT_LIB.
Add configuration for cygwin dlls.
Allow filenames of the form drive:/filename for cygwin.
Fixes for list files with sequence ranges in the USAs. The
sequence input object is now reset during list processing.
Sequence sets with begin and end positions are now automatically
trimmed on input. This applies for example to list input with
ranges in the USAs for programs such as polydot which were
previously reporting the entire sequence.
graph output now has the default title including the date in
dd-mmm-yy format instead of the unreadable dd/mm/yy format.
Align output for seqmatchall (like wordmatch). The algorithm is
not maintaining the sequence accession and description
information. They may be restored in a future update.
infoalign now also displays the weight of the sequences in the
alignment. This can be turned off using '-noweight'.
New output types in ACD for all input data types, including those
for phylogenetics and protein structure data. Initially these are
a new AjPOutfile type with a defined format (fixed until any of
them has a choice).
Programs that produce graphics or text (outfile) output now
by default will not create the outfile if there is a graph (done
by setting the nullok attribute of the outfile).
Acdvalid now checks for incomplete ACD types and attributes.
trimest now has the option '-toplower' which changes the
poly-A tail to lower-case instead of cutting it off.
New ACD attribute 'relation' added to all ACD types. This will
hold some information about how output data types relate to inputs
and parameters. The syntax of the string is not yet clear. Running
of EMBOSS programs will not be affected - the relation string is
defined for web services and related wrappers to maintain
provenance better.
New ACD function oneof added, syntax is @($(var)=={a,b,c}) to test
for a choice of menu options. Intended to clean up some ACD files
- but they are already clean so it may not be useful. At some
stage the unused ACD functions should be declared obsolete for
simplicity (and efficiency). We will leave the code in place, but
remove them from the list of functions tested.
acdvalid now tests the knowntype attribute for strings. ACD files
have been cleaned up to give knowntypes for all strings (defined
in knowntypes.standard) or to convert strings to datafile or other
ACD types as appropriate.
showfeat now has the qualifier '-annotation'. This allows you to
add your own brief annotations of regions on the displayed
figure.
remap now has the option '-frame' which allows you to specify
a list of the frames to be translated and displayed.
Major cleanup of @data documentation. Added @datatype for typedef
data types (e.g. AjBool). Checking all have attributes, and all
attribute names and types match. Comments in the code are moved to
the @attr documentation. Added an @cc documentation line for
comments.
Eprimer3 has been changed so that it runs a separate child process
of primer3_core for every sequence. This is to cure a problem
seen when more than about 23 sequences were input, in which there
was some blocking contention between the input and output streams.
Major cleanup of ACD files to match acdvalid standards. Featout
qualifiers are now -outfeat, which means all output qualifiers start with
-out but it does clash with -outfile so -outf is not always usable
as an abbreviation.
Options for emma have been cleaned up. -insist is no longer used
(use -sprotein instead) and -slowfast is now a simple boolean
-slow. Both changes lead to a much cleaner ACD file.
Options for eprimer3 have been cleaned up. New options -primer
(true) and -hybridprobe (false) make the dependencies far
simpler. The default task is now 1 (same as the old zero) and the
-hybridprobe option is needed to calculate the hybridization
probes. This removes a lot of dependencies on tasks 1 and 4
(hybridprobe) and not-task-4 (primer).
New AjPDir to hold directory path and default extension. Intended
for domainatrix applications. This requires changing
ajAcdGetDirectory to return an AjPDir and providing
ajAcdGetDirectoryName to return the path as a string. Several
programs were changed to reflect this changed call.
New ACD type outdirectory for a directory to which files will be
written. Must have a knowntype describing the files that will
appear there. Expected qualifier name is -outdir.
compseq now has the option '-calcfreq'. This makes it calculate
the expected frequencies of the words in the sequences from the
observed frequencies of the single bases or residues in those
sequences.
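One plausible reading of this calculation, assuming residues are independent, is that the expected frequency of a word is the product of the observed frequencies of its letters. The entry does not give the formula, so the C sketch below (with a hypothetical function name) shows that assumption only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expected frequency of word in seq, taken as
 * the product of the observed single-residue frequencies in seq.
 * Returns 0.0 for an empty sequence. */
double expected_word_freq(const char *seq, const char *word)
{
    size_t counts[256] = {0};
    size_t n = strlen(seq);
    size_t i;
    double freq = 1.0;

    if (n == 0)
        return 0.0;

    /* observed counts of each single base or residue */
    for (i = 0; i < n; i++)
        counts[(unsigned char) seq[i]]++;

    /* multiply the observed frequency of each letter in the word */
    for (i = 0; word[i] != '\0'; i++)
        freq *= (double) counts[(unsigned char) word[i]] / (double) n;

    return freq;
}
```

For the sequence AACG, A has frequency 0.5 and C has 0.25, so the word AC has an expected frequency of 0.125 under this assumption.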
HTML data from remote sites is becoming more complex. EMBOSS now
makes a first pass to look for a single preformatted block and
accepts this as the data (thus avoiding horrors such as the Entrez
headers and javascript which NCBI's search service includes).
At the same time, an old fix to patch SRS 6.1.0 output has been
removed as this clashed with the new code.
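The first pass can be pictured with the sketch below. It is not the EMBOSS code: the function name is hypothetical, and the tags are assumed to be lower case with no attributes.

```c
#include <string.h>

/* Hypothetical helper: if html holds exactly one <pre>...</pre> block,
 * returns a pointer to its content and stores the content length in
 * *len. Returns NULL if there is no block or more than one. */
const char *single_pre_block(const char *html, size_t *len)
{
    const char *open = strstr(html, "<pre>");
    const char *close;

    if (open == NULL)
        return NULL;
    open += strlen("<pre>");

    close = strstr(open, "</pre>");
    if (close == NULL)
        return NULL;

    /* a second block means the page is not a simple data wrapper */
    if (strstr(close + strlen("</pre>"), "<pre>") != NULL)
        return NULL;

    *len = (size_t) (close - open);
    return open;
}
```

When the single block is found, everything outside it (headers, scripts, navigation) is ignored and only the block content is passed on as data.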
Optional outputs have a new behaviour. With nulldefault defined,
an output is, by default, turned off and will return a NULL value
to the calling program if nullok is set. Setting the value to ""
on the command line will now ask for the standard filename to be
generated. The "missing" attribute, if defined, allows simply
-qualname on the commandline to request the default filename,
although care must be taken to avoid anything following the
qualifier appearing to be a filename. This means the qualifier
must be last on the commandline, or must be followed by another
qualifier.
Indexing programs dbifasta and dbiflat no longer store the source
directory in the division.lkp file - directory is specified in the
database definition. This was only done originally to share index
files with "efetch" at the Sanger Centre. With index files and
data files in the same directory (as for efetch) it is not needed.
All ACD files revised for new acdvalid checks.
New ACD section "additional" added for qualifiers with
additional:"Y" defined. These have been put in the "advanced"
section until now. Acdvalid checks that these qualifiers are in
the appropriate section.
Acdvalid now checks that qualifiers are in the expected
section. All input qualifiers (including cfile and datafile) are
now in the input section, all output qualifiers are in the output
section. All (remaining) standard, additional and advanced
qualifiers are in the "required" "additional" and "advanced"
sections.
New ACD type "toggle" added. This is the same as "boolean" but is
allowed in any section by "acdvalid" checks. Toggle is to be used
for ACD qualifiers that "toggle" (turn on or off) other
qualifiers. An example in many ACD files would be "-plot".
Cirdna and lindna now dynamically allocate memory. For simplicity
they do still have an upper limit for the number of groups and
labels per group, but no longer have static arrays.
Version 2.8.0 30-Nov-2003
tfm accepts the PAGER environment variable. It can be overridden
by EMBOSS_PAGER.
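The precedence can be sketched as below. The function name and the fallback value are hypothetical, and treating an empty variable as unset is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: picks the pager command. emboss_pager and pager
 * are the values of EMBOSS_PAGER and PAGER (NULL if unset); fallback is
 * used when neither is set. Empty values count as unset (assumption). */
const char *choose_pager(const char *emboss_pager, const char *pager,
                         const char *fallback)
{
    if (emboss_pager != NULL && *emboss_pager != '\0')
        return emboss_pager;            /* EMBOSS_PAGER overrides PAGER */
    if (pager != NULL && *pager != '\0')
        return pager;
    return fallback;
}
```

A caller could pass getenv("EMBOSS_PAGER") and getenv("PAGER") from stdlib.h, with a fallback such as "more" chosen purely for illustration.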
Fix for HTTP 1.1 lines for MacOSX added (Cedric Rossi).
The home directory ~/.embossrc file can be turned off with
"setenv EMBOSS_RCHOME N" This was added for cleaner QA tests
but may have other uses.
Report format output added (by Henrikki Almusa) for dreg, preg,
recoder and silent.
pestfind renamed to epestfind and handling of terminal water
residue adjusted.
Align formats: Added "tcoffee" as a valid -aformat which writes a
T-Coffee library file suitable for input as -in=Lfilename to
T-Coffee.
Pepstats: added molar extinction coefficient and extinction
coefficient at 1mg/ml for A280.
Nexus format sequence input added, with new functions to parse all
standard nexus files. Later releases will accept nexus format for
other input data.
Jackknifer, Mega, Treecon, Mase and Fitch formats parsed, at least in
their EMBOSS output forms.
Underscores are allowed in accession numbers and sequence versions
to handle REFSEQ fasta format entries.
New function ajRegPre returns the original string before the
regular expression match.
New function ajStrArrayDel deletes a string array.
New function ajListstrToArrayApp appends strings in a list to the
end of a string array.
Sequence input changes: Allow '?' as a valid character (it has
been seen in phylip sequences) for 'unknown' and convert to X for
protein (or any) and 'N' for nucleotide. Note that this can give
an X or N depending on whether the program accepts nucleotide only
or any sequence. We may find a cleaner fix, but it would depend on
knowing the sequence type.
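The conversion amounts to the small C sketch below. The function name and the flag are hypothetical; the flag stands in for whether the program accepts nucleotide input only.

```c
/* Hypothetical helper: replaces '?' in place with 'N' when only
 * nucleotide input is accepted, or with 'X' for protein or "any". */
void unknown_to_code(char *seq, int nucleotide_only)
{
    const char code = nucleotide_only ? 'N' : 'X';

    for (; *seq != '\0'; seq++)
        if (*seq == '?')
            *seq = code;
}
```

The same input therefore yields X in a program that accepts any sequence and N in one that accepts nucleotides only, which is the inconsistency noted above.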
Added binding factor output to tfscan plus option to specify a
custom data file
Removed the Henry Spencer regular expression libraries. There were
a few calls to the ajPosReg functions, but only to test it worked
the same way as ajReg. Added a case-insensitive ajRegComp and
ajRegCompC (which the ajPosReg functions had) using
PCRE. Farewell, Henry. You were a great servant to EMBOSS.
Water S-W alignment program no longer truncates some matches
Vector arithmetic added to ajax library.
Compilation now uses large file handling by default. To disable use
--disable-large when configuring. An effect is to make the default
size of ajlongs 64 bits.
Pepstats modified to allow multiple sequences
Major (well, obvious impact on ACD authors) ACD change - the
"required" attribute is renamed "standard" and the "optional"
attribute is renamed "additional". They have exactly the same
functions as before. The change is to (hopefully) make their
meaning more obvious to those developing ACD parsers and wrappers
for EMBOSS. ACD attribute "standardtype" clashed with "standard"
and is renamed "knowntype".
ACD attributes have been added for applications and for all ACD
types to make wrappers easier to control. These new attributes are
specifically for SoapLab from EBI, and need not have any impact on
other wrappers (SoapLab uses ACD to define non-EMBOSS applications
and needs extra attributes to define some additional properties).
pepinfo now writes to a file with a standard output filename of
(sequenceid).pepinfio instead of pepinfo.out
Completed the standardization of ACD definitions, using "acdvalid"
to remove all errors and allowing only selected and hard to avoid
warnings to remain. The warnings are for calculated "required" or
"optional" definitions (simple true/false relations to another
boolean are accepted). In particular: all essential inputs and
outputs are parameters, with standardtype defined. Non-essential
inputs and outputs have the nullok attribute set. Information
strings are defined only where there is no standard prompt.
The definition of AjPStr and other "pointers to structs" is
causing strange problems in specifying "const" for structs that
are unchanged by function calls. In summary, it appears (for all
compilers we tried) that "const" only knows it is for a pointer if
it can see the "*" in the type. This means, for example, that
"const AjPStr" failed but "const AjOStr*" worked. With "const" if
it knows it is a pointer, it makes the data structure
constant. Otherwise it makes the pointer itself constant, the
equivalent of "AjOStr* const". We fixed this by changing AjPStr to
be a #define of AjOStr*. This has the advantages that most code is
unaffected and that const now works as expected. The only code
changes we needed are lines with multiple AjPStr definitions
(which is anyway deprecated), for example "AjPStr astr, bstr"
which clearly fail when you think about the #define (astr is an
AjPStr, but bstr is now an AjOStr and will give strange compiler
errors). We may change this again to define a separate const data
type for each struct, but probably the #define is a good solution
and we expect to stay with it.
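The effect can be reproduced outside EMBOSS with a stand-in struct. In the C sketch below all Demo names are hypothetical; only the typedef versus #define contrast mirrors the description above.

```c
#include <stddef.h>

typedef struct DemoSStr {
    size_t len;
    char *ptr;
} DemoOStr;

typedef DemoOStr *DemoPStrTypedef;   /* pointer hidden in a typedef */
#define DemoPStrDefine DemoOStr*     /* pointer visible after expansion */

/* "const DemoPStrTypedef" is "DemoOStr* const": the pointer is
 * constant, but the struct it points to can still be modified. */
void set_len_via_typedef(const DemoPStrTypedef str, size_t len)
{
    str->len = len;
}

/* "const DemoPStrDefine" expands to "const DemoOStr*": the struct is
 * read-only here, so an assignment to str->len would not compile. */
size_t get_len_via_define(const DemoPStrDefine str)
{
    return str->len;
}

/* With the macro, only the first declarator is a pointer:
 * "DemoPStrDefine a, b;" expands to "DemoOStr* a, b;". */
size_t second_declarator_size(void)
{
    DemoPStrDefine a = NULL, b;

    (void) a;
    return sizeof b;                 /* size of the struct, not a pointer */
}
```

This is why lines with several declarations per statement had to be split: the second name silently becomes a struct rather than a pointer to one.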
PCRE is now the library of choice for regular expressions. This
allows the full Perl regular expression syntax, and was very easy
to integrate. Regular expressions are used internally for parsing
and for manipulating strings such as file and directory names, and
also for matching by programs such as dreg and preg.
The previous Henry Spencer library functions are renamed from
ajReg to ajHsReg. The Posix version of the Henry Spencer library
remains available as ajPosReg but may be removed as it was not
used by the EMBOSS distribution, and PCRE can provide the same or
higher functionality.
acdpretty now writes the name of the output file to standard
output. For example "Created seqret.acdpretty".
The ACD qualifiers -acdpretty -acdtable and -acdlog are
removed. Programs acdpretty and acdtable do the first two tasks
(in the same way as before). To turn on the acdlog file, use
environment variable EMBOSS_ACDLOG.
Graphs can now use "-graph data" to produce files compatible with
the Staden package's spin2 and spin GUIs. This makes some ACD
options obsolete, especially the various -data and -outfile
combinations. Banana already wrote an output file which caused
some confusion in these options. The outfile and the graph are
both produced by default, but have the nullok attribute and can be
turned off with -nooutfile or -nograph on the command line.
graph and xygraph output can now be optional - the ACD files can
have a nullok: "Y" attribute which allows -nograph on the command
line.
In ACD files alternatives for protein and nucleotide input are
common. Added an automatic variable $(acdprotein) which is defined
as the calculated ".protein" attribute of the first input
sequence(s). The value will be "Y" or "N". Acdvalid will check
that this is how proteins are tested, so the original
"$(asequence.protein)" syntax will become obsolete. The intention is
that any wrappers can use this to make protein and nucleotide
versions of the ACD file, and in general to use only simple
boolean tests in calculated ACD values.
Added wait call to wait for a piped command to complete
before reading data (needed for listfile input with
many piped reads, for example getz calls from SRS databases).
Version 2.7.1 03-jun-2003
Corrected Jemboss for displaying emma & prettyplot forms
Corrected display of recognition sequence for restrict -solofragment
Version 2.7.0 01-jun-2003
Standardtype attribute added for filelist in ACD
Datafile for mwfilter changed from string to datafile ACD type.
A new test application acdvalid will check for deprecated ACD
syntax and report errors for something that should be fixed, or
warnings for something still to be clearly defined. None of these
"errors" will stop an ACD file from working correctly, but they do
cause confusion to the authors and maintainers of wrappers, GUIs,
and so on.
Sequence types are extended to include new types for programs that
can handle selenocysteine.
Sequence types are simplified so that input can be converted to
the specified type. Gaps can be removed, and unsupported
characters can be converted to X for protein or N for nucleotide.
A few applications may be unable to handle any ambiguity
(pureprotein, puredna, etc.) and will require correct input. To
make it safe to run a program over (for example) swissprot or
embl, such programs should read single sequences only, or be
converted to support ambiguity codes. This may take a little
time. banana, octanol and pepwindow already read single sequences.
In need of attention are hmoment and iep.
In ACD files a new application attribute "external" is added where
a third-party tool is needed. Examples include clustalw (emma) and
primer3_core (eprimer3 and primers).
ACD definitions for feature and featout now have a "type"
attribute. The feature output type defaults to the sequence type,
as for sequence output. Feature types are "protein" or
"nucleotide" or "any".
ACD sections now have "information" instead of merely "info" for
consistency.
Boundary fix for ajStrMask
Tightened up on reporting of isoschizomer groups in 'showseq -limit'
and 'remap -limit'.
Added embPatRestrictPreferred.
Added -individual option to RESTRICT. This gives the fragment
lengths produced by restriction assuming only each named RE
of the set that can cut the sequence is used. Results are
added to the tail section of the report.
Added a -equivalences option (on by default) to rebaseextract.
This option calculates an embossre.equ file using RE
prototypes in the withrefm file.
A guide to the EMBASSY package domainatrix (domainatrix.doc)
has been added to /emboss/emboss/doc/manuals
Extractfeat now has the -describe qualifier to allow it to add
the value of selected tags to the Description line of the output
sequence.
Revseq can now read in gapped nucleic acid sequences.
Removed old corba code in preparation for adding corba server as
an embassy package.
Simplified error messages for sequence reading, and corrected
handling of a bad USA as the first in a list file.
Padded temporary filename for emma to avoid clustalw bug with
short input filename (this will not work in all cases and
a corrected clustalw should be used nevertheless).
-help output modified to align all the qualifiers
acdpretty output revised to resolve to full names
Complete overhaul of all ACD error conditions. Parsing and command
line validation messages are now all used, and all tested in the
qatest suite. These tests used bad ACD files in the test/acd
directory.
whichdb failed to report error messages. They are now turned on -
and most of the common errors are reported with less verbosity.
TCODE application added. Calculates the TESTCODE statistic.
Eprimer3 now reports the primer positions using the coordinates
of the original sequence when -sbegin and -send are used to
specify a sub-sequence to consider. The input ranges, such as
the -exclude and -target ranges are always given using the
positions from the original sequence.
tfm looks for documentation in EMBOSS_DOCROOT (an environment
variable, or defined in emboss.default), then in the install
directory, and finally the original build directory.
In some cases, EMBOSS programs could terminate with an exit status
of 255 (-1). Terminating with a "Die:" message exits with status 1.
All exit calls now use either 0 (success) or the standard
library EXIT_FAILURE value (usually 1).
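The 255 arises because, on POSIX systems, only the low 8 bits of an exit value reach the parent process, so -1 is seen as 255. A minimal sketch of that truncation and of the convention now used follows. Both helper names are hypothetical and are not AJAX functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Exit status as seen by a POSIX parent process: only the low 8 bits survive. */
int demo_visible_status(int code)
{
    return code & 0xFF;
}

/* Hypothetical fatal-error helper (not the AJAX function): report, then
   leave with the standard library failure value rather than -1. */
void demo_die(const char *message)
{
    fprintf(stderr, "Die: %s\n", message);
    exit(EXIT_FAILURE);
}
```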
All report output fields have a new attribute (and qualifier)
rscoreshow which defaults to "Y". Setting rscoreshow: "N" will
remove the score from the output, except for GFF where it is
required, and SRS format where it can be kept for use in standard
parsers. The aim is to exclude the score value from applications
that have no scoring method (restrict for example). For these,
putting -rscore on the command line will override the ACD file and
display the score.
Showseq and showfeat both now have the qualifier '-stricttags'.
By default if any tag/value pair in a feature matches the
specified tag and value, then all the tags/value pairs of that
feature will be displayed. If '-stricttags' is set to be true,
then only those tag/value pairs in a feature that match the
specified tag and value will be displayed.
Megamerger now has the qualifier '-prefer' which makes it use
the first sequence to create the merged sequence whenever there is a
mismatch between the two sequences.
Sirna now has the qualifier '-context' which writes the first
two bases (in brackets) of the 23 base target region.
Maskseq and maskfeat now both have the qualifier '-tolower'
which will change the masked regions to lower-case characters
instead of replacing them with a mask character.
ACD parsing internals are rewritten to find and report errors more
cleanly and to make the syntax stricter for other ACD parsers used
by (for example) GUI developers.
Sequence output types now have a 'type:' attribute which defaults
to the type of the first input sequence. For most applications
this is good enough as a default. For those which add gaps or
translate DNA to protein (or vice versa) a 'type:' attribute will
be needed. This is to improve support for automated workflow
building by more strongly typing input and output data.
acdpretty now wraps long lines of ACD definitions, splitting at
any lone backslash (which defines a newline for -help output) or
at whitespace. Attributes and sections are indented by 2 spaces.
Until now, the ACD file syntax has allowed name=value syntax and
the use of {} () and even <> for quoted strings just in case they
needed both ' and " characters. These are now removed. We believe
no ACD files were using this syntax.
valgrind.pl is a new addition to the script directory that runs
valgrind memory leak tests under Linux. The tests are a copy of
those in purify.pl - they may one day move to a separate file.
EMBOSS feature output now copies (where available) the name of the
input sequence as the filename, so filenames match more closely to
the sequence output. For example, "seqret -feat tembl:paamir" will
now create 2 files called paamir.fasta and paamir.gff where the
feature file previously was called 'unknown.gff'.
EMBOSS feature output defaults (as before) to GFF format, but the
default format can now be set by variable EMBOSS_OUTFEATFORMAT
All EMBOSS output files now have a default output directory
(required by some webservices implementations that run in
the 'wrong' default directory). Variable EMBOSS_OUTDIRECTORY
if set becomes the default output directory for outfile, align,
report, graph, sequence and feature output.
The output directory can also be set from the command line (or as
an ACD attribute) using the associated qualifier -odirectory
(outfile), -rdirectory (report) -adirectory (align) -gdirectory
(Graph and graphxy) -osdirectory (sequence) or -ofdirectory
(featout).
The "g*" attributes for graph and graphxy in ACD have been deleted as
they have the same name (and function) as existing associated
qualifiers - and can still be used with these names in ACD files.
Duplicate ACD attribute and associated qualifier functions exist
in many ACD types, but usually have different names and so are
left for compatibility purposes.
emboss.default and ~/.embossrc configuration files now have
extensive error messages reporting filename and line number.
showdb has additional validation for all database definitions.
Environment variable EMBOSS_NAMVALID (boolean) turns this on for
all programs.
ajnam.c has debugging turned on by environment variable
EMBOSS_NAMDEBUG (boolean). This processing (of emboss.default and
~/.embossrc) happens before command line option -debug has taken
effect. The output goes to standard error.
Function ajFmtVPrintS is a previously missing complement to ajFmtPrintS
EMBL/Genbank feature tables updated to FTv5.0
SwissProt feature table '<' '>' and '?' location modifiers are
now handled correctly.
Added new applications acdlog, acdpretty and acdtable. Run like
acdc they provide the same functions as the command line options
-acdlog -acdpretty and "-acdtable -help". These -acd options are
now obsolete and will be removed in a future release to clean up
the ACD interface.
Transeq now has the option '-clean' that converts all '*'
characters to 'X's. This may be useful because not all programs
accept protein sequences containing '*' characters.
Version 2.6.0 20-Sep-2002
Showdb now can display the presence of any of the extra sv, des,
org, and key search fields that can be used to index and search in
databases.
Added twofeat - Finds neighbouring pairs of features in sequences.
Extractfeat - added option (-featinname) to include the name of
the feature as part of the ID name of the sequence that is
written out.
Added sirna - designs siRNA probes in mRNA.
Sigcleave sorts results highest score first.
Helixturnhelix sorts results highest score first and reports the
score position as an integer.
Added pestfind.
Moved the following programs into the "domainatrix" embassy
package:
contacts, domainer, fraggle, hetparse, hmmgen, interface,
pdbparse, pdbtosp, profgen, scopalign, scopnr, scopparse,
scoprep, scopreso, scopseqs, seqalign, seqnr, seqsearch,
seqsort, seqwords, siggen, sigplot, sigscan
Palindrome no longer reports palindromes that are only composed
of N's.
Msbar can now check that the result doesn't match a set of
other input sequences. For example you could specify that it
doesn't match the input sequence or a set of previously produced
mutation results.
Getorf reporting of circular genome positions tidied up - it now
reports positions starting in the range 1 to the sequence length
and indicates if the ORF goes through the breakpoint. A clear
indication of when ORFs are in the reverse sense has been added.
Pasteseq now behaves correctly when -sask2, -sbegin2 or -send2
are used.
Version 2.5.1 12-Aug-2002
Whichdb new option -showall to see which databases are being
searched for use where searches hang. The order of searching is
undefined - it depends on the order in which databases are
returned from the internal table, which is unrelated to the order
in which they were defined.
Wordmatch alignments save the entire sequence but use part only.
Fixed all alignment formats to work with these by adding a
SubOffset attribute.
Duplicate IDs fix. The database indexing programs skipped
duplicate IDs but did not reset the size of the entryname index
file so some queries could fail to find the later IDs in the
databases. Duplicate IDs are illegal for -nosystemsort (no easy
way to correct because entry numbers are stored internally). For
the default case duplicate IDs are merged even if they are
different. REFSEQ is the main problem area.
Writing data files used EMBOSS_DATA, or by default the install
directory. Earlier versions, if not installed, could write to the
source tree emboss/data directory. Fixed to continue if there is
no install data directory, and to check EMBOSS_DATA (if defined) is
a real directory.
Sigcleave options pval and nval hardcoded. They depend on the
weight matrix size - which is hardcoded as 15 in the ACD file and
is not checked in the program. They were introduced in EGCG in
1988 but never used because no other weight matrix length was
tried.
Version 2.5.0 25-may-2002
"fasta" format now uses the "ncbi" parser, so both formats report
"fasta" as the format. "pearson" is the old "fasta" format, for the few
cases (empty IDs for example) where ncbi parsing fails completely.
SPLITTER changed to match documentation. Old behaviour is
now selectable by using the -addoverlap command line
option.
Configuration modifications. --without-x works. Removed odd
but harmless -I definitions. PNG detection improved.
Corrected EMBLCD index searching for queries that start with a
wildcard. For example, tembl-key:?* should search for all entries
that have a keyword (key:* is regarded as 'all entries'). Entries
with no keyword (in PIR's pir4.ref file for example) will be
ignored.
Updated source code docs for EFUNC and EDATA. Corrected all bad
headers. efunc.out has no errors. efunc.check only reports
'missing headers' for duplicated function names (#ifdef code)
which is a known 'feature'.
Updated source code to fix most lines over 80 bytes.
Calculated ACD attributes now QA tested. Feature attributes will
be correctly set, although none are used in the ACD files at present.
purify.pl has a new option -block=n where n is a number from 1
upwards. 1 runs the first 10 tests, 2 runs the next 10
(blocksize=10 is hardcoded for now).
Cleaned up string position code. Inspections showed ajStrPos and
related functions gave results from 0 to length of a string. This
caused confusion in many other functions and applications. These
functions are now static strPos functions because only ajstr.c had
calls to them (though the ajStrPos versions are still available).
All calls were checked for positions out of range. As a result,
many calls to ajStrAssSub and ajStrCut were fixed. ajStrInsertC
requires a value from 0 to length (start position to insert can be
before or after the string, or any position in between). Fixed by
passing length+1 to strPosII.
Added a function ajUtilCatch for use in debugging with gdb. When
a nasty special case occurs, call ajUtilCatch and make it a
breakpoint in gdb. The resulting backtrace will give the call stack
and all variable values.
Cleaned up code for chunked HTML input. Added a new variable
EMBOSS_HTTPVERSION which defaults to 1.0 (so HTTP is not chunked)
and a DB attribute httpversion. This must be a floating point
number, and is included in the HTTP header to specify the HTTP
protocol version to be used. There is no check in the code to
change behaviour for different versions. This is used in the
SRSWWW and URL access methods.
Added check to qatest.pl to report any EMBOSS (rather than
EMBASSY) applications for which there is no defined test. The
EMBASSY test uses wossname results, checked against the names of
ACD files in the source tree, as qatest always runs in the test/qa
directory.
Allowed sequences as values for EMBL rpt_unit feature qualifiers
because so many entries have them. They are illegal according to
the Version 4.0 (current) feature table document.
Allow ? before from and to feature locations in SwissProt. For
now, these are ignored, though we could add something to hold them
for accurate output.
Added modified Harrison solubility probability to PEPSTATS
ACD attributes now have descriptions in the ajacd.c code which are
reported by 'entrails'. All ACD attributes have been checked by
inspection of the code to note those which are used/unused by ACD.
The ACD "type" attribute for files is renamed "standardtype" to
reflect its intended use to note standard file types for linking
applications. Sequences and alignments still have a "type"
attribute for protein or dna sequence types.
Aaindexextract (new) reads the AAINDEX database and writes each
entry to data/AAINDEX directory. New function ajFileDataDirNew to
read data files from a named directory. New ACD datafile attribute
'directory' passed to ajFileDataDirNew. AAINDEX directory defined
for pepwindow and pepwindowall.
Palindrome can now read in multiple sequences
Palindrome now does not print a '|' in an alignment where there
is a mismatched pair of bases.
Added filelist datatype to ACD
Mwcontam program added. Displays molecular weights that are common
across a set of files.
Showfeat - added '-sort join' to display joined features on one line.
Diffseq - don't give summary of SNPs if the sequences are proteins.
Inclusion of stat64 and readdir64 for offsetbits=64 (ajfile.c
and ajsys.c)
Workaround for broken Solaris readdir64_r (jembossctl)
Infoseq can now optionally display GI and Sequence Version numbers.
Notseq can now read in a file of sequence names.
Added '-alternative' qualifier to transeq to allow reverse frame
translations to be done using the codons counted from the start
of the reversed sequence, rather than, by default, using the
codons of the corresponding forward frame.
Added the qualifier '-join' to the program extractfeat.
If '-join' is set then joined features, such as 'CDS' and 'mRNA'
are output as a single concatenated sequence.
Changed the default output filename from 'stdout' to a file for
the following:
infoalign
megamerger
merger
showalign
showfeat
showseq
textsearch
Lindna/cirdna can now draw filled boxes and the user can change the
text size on the command-line. They can also read and display
complete genomic sequences.
Major new revision of protein structure applications - w/o full
documentation.
New applications have been added:
pdbparse.c / acd
scopseqs.c / acd
scopnr.c / acd
seqsearch.c / acd
seqwords.c / acd
seqalign.c / acd
hetparse.c / acd
scopreso.c / acd
scoprep.c / acd
profgen.c / acd
funky.c / acd
hmmgen.c / acd
fraggle.c / acd
Some applications have been deleted:
scope.c / acd
nrscope.c / acd
psiblasts.c / acd
swissparse.c / acd
alignwrap.c / acd
dichet.c / acd
The deleted applications have been replaced as follows:
coordenew --> pdbparse (coordnew was deleted a while back)
scope --> scopparse
nrscope --> scopnr
psiblasts --> seqsearch
swissparse --> seqwords
alignwrap --> seqalign
New versions of code have been committed:
pdbparse.c / acd
domainer.c / acd
contacts.c / acd
interface.c / acd
pdbtosp.c / acd
scopparse.c / acd
scopreso.c / acd
scopseqs.c / acd
scopnr.c / acd
scoprep.c / acd
scopalign.c / acd
seqsearch.c / acd
seqwords.c / acd
seqsort.c / acd
seqnr.c / acd
seqalign.c / acd
siggen.c / acd
sigscan.c / acd
sigplot.c / acd
hetparse.c / acd
profgen.c / acd
funky.c / acd
hmmgen.c / acd
Plus
ajxyz.c / ajxyz.h
Short summaries of the applications are as follows:
pdbparse - Parses pdb files and writes cleaned-up protein
coordinate files.
domainer - Reads protein coordinate files and writes
domains coordinate files.
contacts - Reads coordinate files and writes files of
intra-chain residue-residue contact data.
interface- Reads coordinate files and writes files of
inter-chain residue-residue contact data.
pdbtosp - Convert raw swissprot:pdb equivalence file to
embl-like format.
scopparse- Converts raw scop classification files to a
file in embl-like format.
scopreso - Removes low resolution domains from a scop
classification file.
scopseqs - Adds pdb and swissprot sequence records to a
scop classification file.
scopnr - Removes redundant domains from a scop
classification file.
scoprep - Reorder scop classification file so that the
representative structure of each family is
given first.
scopalign- Generate alignments for families in a scop
classification file by using STAMP.
seqsearch- Generate files of hits for families in a scop
classification file by using PSI-BLAST with
seed alignments.
seqwords - Generate files of hits for scop families by
searching swissprot with keywords.
seqsort - Reads multiple files of hits and writes a
non-ambiguous file of hits (scop families file)
plus a validation file.
seqnr - Removes redundant hits from a scop families file.
seqalign - Generate extended alignments for families in
a scop families file by using CLUSTALW with seed
alignments.
siggen - Generates a sparse protein signature from an
alignment and residue contact data.
sigscan - Scans a signature against swissprot and writes
a signature hits file.
sigplot - Reads a signature hits file and validation file
and generates gnuplot data files of signature
performance.
profgen - Generates various profiles for each alignment
in a directory.
hmmgen - Generates a hidden Markov model for each alignment
in a directory.
hetparse - Converts raw dictionary of heterogen groups to
a file in embl-like format.
funky - Reads clean coordinate files and writes file
of protein-heterogen contact data.
Updated "make check" program entrails. Corrected sequence format
reports, added report and alignment formats and database access
methods.
Added scripts/logreport1.pl to report EMBOSS usage from the
logfile. Takes the logfile name on the command line. Reports
total use, most active user, and total user count.
Extractseq now only reads one sequence as input.
Version 2.4.1 14-may-2002
Fixed error reading multiple databases
Fixed MacOSX reading of incomplete sequence files
Fixed indexing of REFSEQ
Version 2.4.0 11-Apr-2002
New Jemboss authorizing server code. This uses a new set-uid
program (jembossctl) to perform tasks as the user.
New alignment output format "match" for wordmatch, reports the
length, sequence names, and range in each sequence.
emboss.default.template has been changed to include the new SRSWWW
access method and the fields definitions for the test databases.
In dbiblast, renamed the -filename option -filenames to match the
other dbi indexing programs, and because wildcard filenames are
supported.
Removed the -staden option for the dbi indexing programs. This had
no effect (it was originally included to rename files as
division.lookup for use by internal utilities at the Sanger
Centre).
In qatest.pl test script, added test for missing expected file.
Only seen for obsolete secondary output files; no tests were passing
that should have failed.
Script (scripts/dbilist.pl) to report the contents of EMBLCD
database indices created by dbiflat, dbigcg, dbifasta or dbiblast.
Proxy HTTP access for remote servers. Define EMBOSS_PROXY as an
environment variable, or in emboss.default. Can also be set for
any database as proxy: "hostname:port" or overridden with
proxy: ":" to use a local server for a database. This is used by
both the URL and SRSWWW access methods.
New ajListUnique function to remove duplicate nodes in a list.
New embxyz.c / .h embXyzSeqsetNRRange functions added
Report format "table" is the default for several applications. In
this format, the sequence USA has been removed because it already
appears in the sequence header part of the report. A new format
"-rformat nametable" will produce the previous report output for
users who are relying on parsing it.
Output files defined with the "nullok" attribute in ACD are not
created unless requested. The file name and extension are ignored.
It is possible to add a new associated qualifier to control this
behaviour, but its use may be confusing with more than one output
file.
Precision attribute for report score (default is 3). Other
floating point report values are written as strings by the
original application so their precision is defined in the
code. The score is a float, as part of the internal (GFF) feature
structure. A zero value produces an integer score (strictly, it
uses %.0f as the format). Set precision for etandem, fuzznuc,
fuzzpro, fuzztran, patmatdb, patmatmotifs (integer scores) and
restrict (no score)
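A minimal sketch of how a precision value can drive the score format, using the standard printf "%.*f" conversion so that a precision of 0 behaves as "%.0f". The helper name demo_format_score is hypothetical and this is not the report code itself; the fallback to 3 for a negative value is an assumption based on the documented default.

```c
#include <stdio.h>

/* Hypothetical helper: write score into buf using the given number of
   decimal places. Returns the character count reported by snprintf. */
int demo_format_score(char *buf, size_t size, float score, int precision)
{
    if (precision < 0)
        precision = 3;  /* assumed fallback to the documented default of 3 */
    return snprintf(buf, size, "%.*f", precision, score);
}
```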
Report output for equicktandem and etandem, with -origfile to
write the original output format for sites (Sanger for example)
who still require it. By default, the origfile output file is
not created.
Report output for patmatdb and patmatmotifs. For patmatmotifs the
prosite documentation appears in the report footer, with the
addition of the motif name and the number of matches in the
sequence.
Report headers and footers automatically trim last newline.
Reports in -rformat SeqTable right-align numbers.
Report output for marscan (-rformat GFF by default)
Report output for fuzztran (-rformat table with the translation
included as a report field). Using -rformat seqtable with fuzztran
now also shows the original DNA sequence.
Report output for fuzznuc and fuzzpro (-rformat SeqTable by default)
New report qualifiers -raccshow to include accession in header
and -rdesshow to include description in header
Two access methods "file" and "offset" were defined as valid in
database definitions, but are really reserved for simple file reading.
They are removed from the database access methods list.
Two access methods "cmd" and "nbrf" are obsolete (cmd was never
implemented, nbrf is replaced by gcg which includes a query
mechanism). Both are removed from the database access methods list,
and the source code is commented out.
SRS, SRSFASTA and SRSWWW database access can read all entries. This
is not recommended for SRSWWW access because it will read
everything into memory - all of EMBL for example - then strip out
HTML tags before reading. For SRS it is not recommended because
"methodall: direct" is faster. For SRSFASTA it is necessary
because using SRSFASTA implies EMBOSS does not read the original
data format. However, not implementing an "all" search left a gap
in the SRS access methods which would generate a bad SRS command
line or URL.
NBRF sequence reading trims last character only if it is '*'
to catch cases where SRS reports the sequence as 'plain'
GCG database text has the spaces in ". ." strings removed.
Database entry text and sequence saved for binary formats (GCG, BLAST)
for use by entret and other applications
Dbiblast indices with split databases (formatdb -v) fixed for reading
all entries (was only reading the first file)
Dbiblast and dbigcg indices support exclude and file definitions
to create database subsets
Database include and file definitions can use the simple filename.
In some cases the full path was used. Database files are checked
both with and without the directory path for back-compatibility.
srswww access method created to query a remote web server.
Preferred to using URL access as SRS queries can be built.
Sequence objects include the SeqVersion, Keyword list and Taxonomy
list.
The GI number is read as an alternative SeqVersion where it is
available (GenBank and some NCBI formats). The GI number is
reported in GenBank format if available, but the GenBank VERSION
line may have only the SeqVersion if, for example, the sequence
was read from an EMBL entry. "sv" queries check both the
SeqVersion and GI number.
Accession numbers have a strict definition, which covers the old
and new EMBL/GenBank format, SwissProt, PIR, and REFSEQ
(NM_nnnnnn). Earlier versions would accept any "accession number"
in some sequence formats, especially NCBI format.
SeqVersion (EMBL SV line, GenBank VERSION line) is used in preference
to accession number where available. Can also be read in FASTA
and NCBI formats. Where only the SeqVersion is available, the
accession number is generated.
USA queries implement searches by SV, DES, ORG and KEY. These work
with SRS access methods (SRS, SRSFASTA, SRSWWW) by building SRS
queries, and with direct access (simple file reading) by
testing the sequence object.
Key and Org queries are for full keywords (including spaces) and
for each level of the taxonomy.
Des queries, if the access method does not provide a mechanism
(that is, if it does not have its own index), are applied to
words within the description. Words start with a letter or number,
and end with a letter or number. SRS typically does the same, but
allows a single quote at the end. This catches words such as 3'
and 5' but is a problem with some quoted text.
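The word rule above can be sketched as a small tokenizer. This is an illustration only, not the EMBOSS implementation: it assumes tokens are separated by whitespace and then trimmed so that each word starts and ends with a letter or digit, and demo_next_word is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch only: copy the next "word" from text (starting at *pos) into out.
   Assumption: tokens are whitespace-delimited, then trimmed so that they
   start and end with a letter or digit. Returns 1 if a word was found. */
int demo_next_word(const char *text, size_t *pos, char *out, size_t outsize)
{
    size_t i = *pos;

    if (outsize == 0)
        return 0;

    while (text[i] != '\0') {
        size_t start, end;

        /* skip whitespace between tokens */
        while (isspace((unsigned char)text[i]))
            i++;
        if (text[i] == '\0')
            break;

        /* find the end of the whitespace-delimited token */
        start = i;
        while (text[i] != '\0' && !isspace((unsigned char)text[i]))
            i++;
        end = i;

        /* trim leading and trailing characters that are not letters or digits */
        while (start < end && !isalnum((unsigned char)text[start]))
            start++;
        while (end > start && !isalnum((unsigned char)text[end - 1]))
            end--;

        if (end > start) {
            size_t len = end - start;
            size_t k;
            if (len >= outsize)
                len = outsize - 1;  /* truncate to fit the buffer */
            for (k = 0; k < len; k++)
                out[k] = text[start + k];
            out[len] = '\0';
            *pos = i;
            return 1;
        }
        /* token had no letters or digits (e.g. "--"): try the next one */
    }

    *pos = i;
    return 0;
}
```

Under these assumptions the trailing quote in 5' is dropped, which is the difference from SRS noted above.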
Queries for ID ACC SV DES ORG and KEY are valid for all file
access methods, including URL, external, cmd, app, file and by
default any new method added. If the internal query data is not
flagged by the access method (to show the database has been
queried) the sequence object is automatically tested.
Missing description, keyword, organism, or seqversion fields cause
queries to fail if they are used on inappropriate data.
Dbiflat, dbigcg dbifasta and dbiblast can index the new
fields. All fields are available in dbiflat and dbigcg. The sv and
des fields are available in dbifasta and dbiblast. If any specific
formats make it possible to parse the org (or key) field they can
be added as new formats.
The new EMBLCD index files are named as follows: des for the
descriptions (no obvious standard name), seqvn for the seqversion
(no obvious standard name), keyword for keywords (EMBLCD
distribution name) and taxon for organism (EMBLCD distribution
name). The EMBLCD distribution also included a freetext index
which is similar to the SRS alltext search so we did not use the
name for the description index.
We are working through the EMBLCD format documentation to make
EMBOSS indices more compatible. For example, all tokens in the TRG
index files should have trailing spaces. We use a NULL to mark the
end of the string.
EMBLCD index files now expand to fit the longest token, including
the entryname index which was limited to 12 characters (only one
site reported a problem with this in dbifasta with long ID names).
A new qualifier -maxindex sets an upper limit (25 is recommended)
to limit the size of all index files. Currently this applies to
all indices. We can add separate maxima for each field if
needed. We expect very few sites to use the extra index fields
as SRS is a simpler alternative.
New database definition token 'fields' with a list of indexed fields
can be set to 'sv des org key' for SRS databases.
USAs check the query field against the database 'fields'
definition. ID and ACC are always allowed. dbname:name still
searches ID and ACC (no change from previous version)
USAs with a filename can include the new query fields. The syntax is
filename:field:query for example empro.dat:id:eclaci (the extended
syntax is because empro.dat-id:eclaci looks like a filename ending
in -id)
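A minimal sketch of splitting the filename:field:query form. It is not the EMBOSS USA parser: demo_split_file_usa is a hypothetical name, the split at the last two colons is an assumption, and other USA forms (database names, format prefixes and so on) are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: split "filename:field:query" at the last two colons.
   Assumption: neither field nor query contains a colon. Copies the three
   parts into the caller's buffers; returns 1 on success, 0 otherwise. */
int demo_split_file_usa(const char *usa,
                        char *file, char *field, char *query, size_t size)
{
    const char *last = strrchr(usa, ':');
    const char *prev = NULL;
    const char *p;
    size_t flen, dlen, qlen;

    if (last == NULL || size == 0)
        return 0;

    /* find the colon before the last one */
    for (p = usa; p < last; p++)
        if (*p == ':')
            prev = p;
    if (prev == NULL)
        return 0;

    flen = (size_t)(prev - usa);
    dlen = (size_t)(last - prev - 1);
    qlen = strlen(last + 1);
    if (flen == 0 || dlen == 0 || qlen == 0)
        return 0;
    if (flen >= size || dlen >= size || qlen >= size)
        return 0;

    memcpy(file, usa, flen);
    file[flen] = '\0';
    memcpy(field, prev + 1, dlen);
    field[dlen] = '\0';
    memcpy(query, last + 1, qlen);
    query[qlen] = '\0';
    return 1;
}
```

With the example from the text, empro.dat:id:eclaci splits into the file empro.dat, the field id and the query eclaci. A plain filename with no colons is rejected by this sketch.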
Application 'tranalign' added.
This aligns nucleic coding regions based on a set of aligned proteins.
Version 2.3.1 07-mar-2002
Est2genome fixed for large alignments (over 40Mbase for
est * genomic sequence length).
Sequence reading for ABI files fixed (and selex files tested).
Genbank feature input working.
Pepinfo PNG output larger to make the text readable (only affects
PNG output).
Empty sequence file input fails gracefully.
Empty sequence input fails gracefully (and only needs one
^D from stdin).
Version 2.3.0 03-mar-2002
Seqretall, seqretallfeat and seqretset moved to 'make check'.
Seqret has all the functionality of the above.
Fix for NBRF accession number reading (ajseqread.c).
Whichdb program added.
Fix for dbifasta and wormpep.
Fix for problem reading plain format sequences by primer3.
Primer3 renamed eprimer3 to avoid conflicts with the Whitehead
Institute's Primer3 version 3.0.6.
Transeq's '-frame' can have a list of values, as: '-frame=1,2,3'.
Non-existent files in lists are again ignored.
Various wildcard database search fixes.
ESIM4 added as an embassy package.
Version 2.2.0 12-Jan-2002
New applications:
Biosed, Contacts, Dichet, Psiblasts,
Scopalign, Sigscan, Siggen.
Configure tidy.
Alignment report fixes.
Version 2.1.0 24-Dec-2001
Jemboss.
More formats for reports and alignments.
Version 2.0.1 29-jul-2001
Release of HMMER as an embassy package.
DBIGCG bugfix
Version 2.0.0 15-jul-2001
New feature table handling etc.
Version 1.13.1 25-may-2001
Fix emboss.default.template problem
Version 1.13.0 24-may-2001
New applications showalign and embossversion.
Prophet fixed.
Version 1.12.0 17-Apr-2001
New applications distmat and cai.
Version 1.11.0 10-mar-2001
New applications charge and degapseq.
Version 1.10.1 26-Feb-2001
Bug fixes of marscan, getorf and garnier
Version 1.10.0 18-Feb-2001
New applications scope, nrscope, domainer.
Initial large file model support.
Version 1.9.0 22-Jan-2001
New applications abiview and recode.
Linked list and string iterator code rewritten.
Version 1.8.0 20-Nov-2000
New application coderet.
Corba test routines
Version 1.7.0 31-Oct-2000
New application entret.
GCG output style changed.
Fixed -slower & -supper input options for multiple sequences
Version 1.6.3 25-Oct-2000
Further mods for seqed files.
Rewrite of profile core routines.
Added %id, %sim and fasta output to needle and water.
Version 1.6.2 23-Oct-2000
Now reads GCG seqed mangled files.
Phylip output fixed.
Numerous minor changes.
Version 1.6.1 11-Oct-2000
RedHat Linux 7.0 fpos_t fix
Version 1.6.0 06-Oct-2000
New application cons.
Version 1.5.6 3-Oct-2000
URL access handles new SRS6.07* format.
Library and applications leak-free.
Error messages made less daunting.
Version 1.5.5 28-Sep-2000
dbigcg changes for genbank.
Memory leaks plugged.
Version 1.5.4 23-Sep-2000
Added blast multi-volume support for database indexing.
More gui hints in ACD files.
Version 1.5.3 18-Sep-2000
LinuxPPC support added.
Version 1.5.2 5-Sep-2000
dbigcg changes for embl database in GCG format.
Version 1.5.1 09-Sep-2000
Changes to graphics data output for GUIs.
Version 1.5.0 07-Sep-2000
New application emowse.
Version 1.4.3 03-Sep-2000
tfm corrected.
HTML documentation corrected.
More GUI work.
Version 1.4.2 29-aug-2000
Changes to graphics data output for GUIs.
Version 1.4.1 25-aug-2000
Minor library changes.
Version 1.4.0 20-aug-2000
New application silent
Version 1.3.1 18-aug-2000
Indexing filenamelen fix.
Modification to diffseq.
Version 1.3.0 17-aug-2000
New applications vectorstrip and diffseq.
Version 1.2.0 15-aug-2000
Version 1.1.0 09-aug-2000
Version 1.0.2 08-aug-2000
Version 1.0.0 15-jul-2000
Version 0.0.4 Dec-1998