1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266
|
#!/usr/bin/perl
#
# Script to convert raw trace files into dbEST submission
# files, using phred and cross_match
# See trace2dbest documentation for more details
# Version 3 mainly by rearrangements by Ralf Schmid
# last update 10/11/05 Ralf Schmid
# modified trimming
# Author - Alasdair Anthony, University of Edinburgh
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
use warnings;
use strict;
use Term::ANSIColor;
use Term::ReadLine; #makes user input a bit more friendly
use File::Path; #for rmtree
use File::Copy;
use Cwd; #see sub storependingEST
use Bio::SearchIO;
use Bio::SeqIO;
#### from config file
my $vector_file;
my $ecoli_file;
my $Lib_file;
my $Pub_file;
my $Cont_file;
my $EST_file;
my $phredoptions;
#### general global variables
my @PATH=split(":","$ENV{'PATH'}"); ### Get users Path
my $homedir = $ENV{HOME};
my $version = "3.0.1";
my $phredenv = `printenv | grep PHRED_PARAMETER_FILE`;
chomp $phredenv;
if ($phredenv) {$phredenv =~ s/PHRED_PARAMETER_FILE=//;}
my $ESTfileflag=0;
my $Pubfileflag=0;
my $Libfileflag=0;
my $Contfileflag = 0;
my $dirflag=0;
my $pblastall =1; my $pblastcl3=1; #allows bits of trace2dbest to be missed out if certain prog not in PATH
my $no_vect =1; #flag to indicate if user can see their vector in vector.seq
my $trace_dir_flag =1; #flag to allow returns to start of trace directory section
my $blast = 0;
my $blast_save =0;
my $flag =1; #general use flag, generally used for while loops enclosing print questions
my @dirlist=("fasta","phd_dir","qual","scf","subfiles","fastafiles", "blast_reports", "partigene", "process", "process/traceinfo", "raw_traces");
my @progs = ("phred", "cross_match");
my @optprogs = ("blastcl3", "blastall");
#declare @
my (@stats, @ESTfile_data, @Libfile_data, @Pubfile_data, @Contfile_data, @blastdb);
#declare $
my ($answer,$less_flag, $dir, $file_type, $type_id, $input, $seq_primer, $pcrf_primer, $pcrr_primer, $p_end, $comment, $public_date,
$adapter, $scheme, $tracedir, $library, $high_qual_cutoff, $vminmatch, $vminscore,$eminmatch, $eminscore, $poly_num,
$sl_flag, $sl_seq, $blastdata, $blast_database, $ecoli_cross, $cont_name, $lib_name, $pub_cit, $lib_descr, $final_file_num,
$dbESTsubmission, $species, $save_location, $no_lib, $no_cont, $no_pub, $chim_flag, $chim_flag_e, $bits_cutoff, $read_gnu);
#declare %
for(my $i=0;$i<6;$i++) { #set stats to 0
$stats[$i]=0;
}
#### less/more for display
if (-f "/usr/bin/less"){
$less_flag = 1;
} else {
$less_flag = 0;
}
#### readline stuff
my $term = new Term::ReadLine 'sample';
$term ->ornaments(0); #stops prompt getting underlined
my $attribs = $term->Attribs;
$attribs->{completion_entry_function} =
$attribs->{filename_completion_function};
if ($term ->ReadLine() =~ /Gnu$/) { #returns the actual package that executes the commands - we need gnu
$read_gnu = 1;
}
else {
print "The readline package Term::ReadLine:Gnu was not found.\nInstalling this package will make interaction with trace2dbest more friendly.\nContinuing...\n\n";
$read_gnu = 0;
}
my $log = "logfile";
if (-e "logfile") {
unlink "logfile";
}
open(LOG, ">$log");
#############################################################################
############################ start main menu ################################
#############################################################################
&print_title();
&options();
#########################################################################################################################
sub options () { #### options menu
my $answer=0;
while($answer!=7) {
$answer=&title_page();
if($answer==1) { &check_config();}
if($answer==2) { &tests(); } # run tests
if($answer==3) { &processing_options(); } #processing
if($answer==4) { &blast(); } #preliminary annotation
if($answer==5) { &paperwork_options(); } #edit files
if($answer==6) { &submission(); } #as it says
}
system("clear");
exit(); #exit program
}
############################################################################################################################
###################################################################################
sub title_page() { #### title comments and option selection
print "\n\t Enter the number corresponding to the part of the trace2dbEST\n";
print "\t process you want to perform:\n\n";
print "\t\t1. Setup configuration.\n";
print "\t\t2. Checks and tests.\n";
print "\t\t3. Process traces.\n";
print "\t\t4. Blasts for preliminary annotation.\n";
print "\t\t5. Create or view submission information.\n";
print "\t\t6. Prepare dbEST submission files.\n";
print "\t\t7. Exit.\n\t";
my $flag=0;
while($flag==0) {
$answer=<>;
if($answer=~/^[1|2|3|4|5|6|7]$/) { $flag=1; next; }
else {print " You have entered $answer This is not an option. Please try again\n";}
}
return $answer;
}
####################################################################################
#########################################################################################################################
sub processing_options () { #### options menu
my $answer=0;
while($answer!=3) {
$answer=&processing_page();
if($answer==1) { &process("1"); } # quick&dirty
if($answer==2) { &process("2"); } # full monty
if($answer==3) { &options(); } #back
}
system("clear");
exit(); #exit program
}
############################################################################################################################
###################################################################################
sub processing_page() { #### title comments and option selection
print "\n\t Enter the number corresponding to the processing option you want\n";
print "\t to perform:\n\n";
print "\t\t1. Standard.\n";
print "\t\t2. Advanced.\n";
print "\t\t3. Back to main menu.\n\t";
my $flag=0;
while($flag==0) {
$answer=<>;
if($answer=~/^[1|2|3]$/) { $flag=1; next; }
else {print " You have entered $answer This is not an option. Please try again\n";}
}
return $answer;
}
####################################################################################
#########################################################################################################################
sub paperwork_options () { #### options menu
my $answer=0;
print colored("\n\t##### SUBMISSION INFORMATION #####\n","white bold", "on_black");
while($answer!=5) {
$answer=&paperwork_page();
if($answer==1) { &paperwork("Library"); }
if($answer==2) { &paperwork("Contact"); }
if($answer==3) { &paperwork("Publication"); }
if($answer==4) { &paperwork("EST"); }
if($answer==5) { &options(); } #back
}
system("clear");
exit();
}
############################################################################################################################
###################################################################################
sub paperwork_page() { #### title comments and option selection
print "\n\t This menu allows you to view and add information";
print "\n\t required for submitting sequences to dbEST.";
print "\n\t Enter the number corresponding to the file you want\n";
print "\t to check:\n\n";
print "\t\t1. Library files.\n";
print "\t\t2. Contact files.\n";
print "\t\t3. Publication files.\n";
print "\t\t4. EST files.\n";
print "\t\t5. Back to main menu.\n\t";
my $flag=0;
while($flag==0) {
$answer=<>;
if($answer=~/^[1|2|3|4|5]$/) { $flag=1; next; }
else {print " You have entered $answer This is not an option. Please try again\n";}
}
return $answer;
}
####################################################################################
############################################################################################################################
sub print_title() { ##### title scheme
#displays title
print "\n";
print colored("\t################################################################\n","white bold", "on_black");
print colored("\t### ###\n","white bold", "on_black");
print colored("\t### trace2dbest - Vs $version ###\n","white bold", "on_black");
print colored("\t### a trace file processing and dbEST ###\n","white bold", "on_black");
print colored("\t### sequence submission tool ###\n","white bold", "on_black");
print colored("\t### ###\n","white bold", "on_black");
print colored("\t### A. Anthony, R. Schmid and colleagues for BANG 2005 ###\n","white bold", "on_black");
print colored("\t### ###\n","white bold", "on_black");
print colored("\t### News, upgrades and help: nematode.bioinf\@ed.ac.uk ###\n","white bold", "on_black");
print colored("\t### ###\n","white bold", "on_black");
print colored("\t################################################################\n","white bold", "on_black");
}
###########################################################################################################################
############################
##### SECTION 1 CONFIG #####
############################
sub check_config() {
print colored("\n\t##### CONFIGURATION FILE #####\n","white bold", "on_black");
print "\nThis option will create or update the trace2dbest configuration file\n";
print "'.trace2dbest.conf' in your home directory. Please note, you can also\n";
print "edit your configuration file later on using a text editor.\n";
my $filename = "~/.trace2dbest.conf";
$filename =~ s{ ^ ~ ( [^/]* ) }
{ $1
? (getpwnam($1))[7]
: ( $ENV{HOME} || $ENV{LOGDIR}
|| (getpwuid($>))[7]
)
}ex;
my $config_flag;
if (-e "$filename") {
&get_config();
print "\n\nA configuration file '.trace2dbest.conf' does already exist in your home \n";
print "directory. If you want to recreate the configuration file from scratch please \n";
print "type 'y' now. Typing 'n' will keep it as it is.";
$config_flag=&yes_no;
}
else {
$config_flag=1;
$vector_file="/usr/software/trace2dbest/trace2dbest/vector.seq";
$ecoli_file="/usr/software/trace2dbest/trace2dbest/ecoli.seq";
$Lib_file="/usr/software/trace2dbest/trace2dbest/Libfile.db";
$Pub_file="/usr/software/trace2dbest/trace2dbest/Pubfile.db";
$Cont_file="/usr/software/trace2dbest/trace2dbest/Contfile.db";
$EST_file="/usr/software/trace2dbest/trace2dbest/ESTfile.db";
$phredoptions = "-cd scf -pd phd_dir -qd qual -sd fasta -exit_nomatch -trim_alt \\\"\\\" 2>>phredlogfile 1> /dev/null";
}
#### now doing the real work - if wanted
if ($config_flag==1) {
print "\ntrace2dbest can screen for vector contamination using a FASTA file\n";
print "containing the relevant vector sequences. The location of this file\n";
print "'vector.seq' is set to \'$vector_file\'.\nDo you want to change this entry?";
my $answer = &yes_no;
if ($answer == 1) {
my $input = &get_input("\nPlease enter new location for 'vector.seq' file (full path)\n");
chomp $input;
if (-e "$input") {$vector_file=$input}
else {print "$input does not exist, keeping $vector_file\n"}
}
print "\ntrace2dbest can screen for E. coli contamination using a FASTA file\n";
print "containing the relevant sequences. The location of this file\n";
print "'ecoli.seq' is set to \'$ecoli_file\'.\nDo you want to change this entry?";
$answer = &yes_no;
if ($answer == 1) {
my $input= &get_input ("\nPlease enter new location for 'ecoli.seq' file (full path)\n");
chomp $input;
if (-e "$input") {$ecoli_file=$input}
else {print "$input does not exist, keeping $ecoli_file\n"}
}
print "\ntrace2dbest store entries for library files in the file 'Libfile.db'\n";
print "The location of this file is set to $Lib_file . \nDo you want to change this?\n";
$answer = &yes_no;
if ($answer == 1) {
my $input= &get_input ("\nPlease enter a new location (full path).\n");
chomp $input;
$Lib_file = $input;
}
print "\ntrace2dbest store entries for publication files in the file 'Pubfile.db'\n";
print "The location of this file is set to $Pub_file . \nDo you want to change this?\n";
$answer = &yes_no;
if ($answer == 1) {
my $input= &get_input ("\nPlease enter a new location (full path).\n");
chomp $input;
$Pub_file = $input;
}
print "\ntrace2dbest store entries for contact files in the file 'Contfile.db'\n";
print "The location of this file is set to $Cont_file . \nDo you want to change this?\n";
$answer = &yes_no;
if ($answer == 1) {
my $input= &get_input ("\nPlease enter a new location (full path).\n");
chomp $input;
$Cont_file = $input;
}
print "\ntrace2dbest store entries for EST files in the file 'ESTfile.DB'\n";
print "The location of this file is set to $EST_file . \nDo you want to change this?\n";
$answer = &yes_no;
if ($answer == 1) {
my $input= &get_input ("\nPlease enter a new location (full path).\n");
chomp $input;
$EST_file = $input;
}
&create_conf($vector_file,$ecoli_file,$Lib_file,$Pub_file,$Cont_file,$EST_file,$phredoptions);
}
print "\n\nBack to main menu\n";
&options;
}
###################################################################################################################
############################
#### SECTION 2 - tests #####
############################
sub tests () {
print colored("\n\t##### TESTS & CHECKS #####\n","white bold", "on_black");
print "\nThis option will test for the presence of the required executables,\n";
print "and also check for the presence of some environmental variables and datafiles.\n";
print "We recommend to sort out any issues highlighted in this option before \n";
print "processing real data.\n\n";
sleep(1);
&get_config;
#### first executables
print "The following programs are essential for trace2dbest:\n";
foreach my $prog (@progs) {
if (my $tmp=&find_program("$prog","1")) {print "searching for: $prog\t................OK\n"; sleep(1);}
}
print "\n\nThe following programs are optional for trace2dbest:\n";
foreach my $prog (@optprogs) {
if (my $tmp=&find_program("$prog","1")) {print "searching for: $prog\t................OK\n"; sleep(1);}
}
#### environmental variables
print "\n\nThe phred parameter file is essential for trace2dbest:\n";
if ($phredenv) {
print "checking ENV: PHRED_PARAMETER_FILE\t........OK\n"; sleep(1);
if (-e "$phredenv") {
print "checking file: PHRED_PARAMETER_FILE\t........OK\n"; sleep(1);
}
else {
print colored("ERROR: PHRED_PARAMETER_FILE does not exist.\n","red bold");
}
}
else {
print colored("ERROR: PHRED_PARAMETER_FILE environment variable is not set.\n","red bold");
}
#### files
print "\n\nThe following files need to be writeable for trace2dbest\n";
if (-e "$Lib_file" && -w "$Lib_file") {
print "checking: Libfile.db\t................OK\n";sleep (1);
}
elsif (-e "$Lib_file") {
print colored("ERROR: $Lib_file not writeable - trying to fix it\n","red bold");
my $lucky = system ("chmod $Lib_file 755");
unless ($lucky) {
print "checking: Libfile.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: fixing failed\n","red bold");}
}
else {
print colored("ERROR: $Lib_file does not exist - trying to create it\n","red bold");
my $lucky = system ("touch $Lib_file");
unless ($lucky) {
print "checking: Libfile.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: creating $Lib_file failed\n","red bold");}
}
if (-e "$Pub_file" && -w "$Pub_file") {
print "checking: Pubfile.db\t................OK\n";sleep (1);
}
elsif (-e "$Pub_file") {
print colored("ERROR: $Pub_file not writeable - trying to fix it\n","red bold");
my $lucky = system ("chmod $Pub_file 755");
unless ($lucky) {
print "checking: Pubfile.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: fixing failed\n","red bold");}
}
else {
print colored("ERROR: $Pub_file does not exist - trying to create it\n","red bold");
my $lucky = system ("touch $Pub_file");
unless ($lucky) {
print "checking: Pubfile.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: creating $Pub_file failed\n","red bold");}
}
if (-e "$Cont_file" && -w "$Cont_file") {
print "checking: Contfile.db\t................OK\n";sleep (1);
}
elsif (-e "$Cont_file") {
print colored("ERROR: $Cont_file not writeable - trying to fix it\n","red bold");
my $lucky = system ("chmod $Cont_file 755");
unless ($lucky) {
print "checking: Contfile.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: fixing failed\n","red bold");}
}
else {
print colored("ERROR: $Cont_file does not exist - trying to create it\n","red bold");
my $lucky = system ("touch $Cont_file");
unless ($lucky) {
print "checking: Contfile.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: creating $Cont_file failed\n","red bold");}
}
if (-e "$EST_file" && -w "$EST_file") {
print "checking: EST_file.db\t................OK\n";sleep (1);
}
elsif (-e "$EST_file") {
print colored("ERROR: $EST_file not writeable - trying to fix it\n","red bold");
my $lucky = system ("chmod $EST_file 755");
unless ($lucky) {
print "checking: EST_file.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: fixing failed\n","red bold");}
}
else {
print colored("ERROR: $EST_file does not exist - trying to create it\n","red bold");
my $lucky = system ("touch $EST_file");
unless ($lucky) {
print "checking: EST_file.db\t................OK\n";sleep (1);
}
else{print colored("ERROR: creating $EST_file failed\n","red bold");}
}
#### more files
print "\n\nThe following files are optional for trace2dbest\n";
if (-e "$vector_file") {print "checking: vector.seq\t................OK\n";sleep (1);}
else { print colored("Couldn't find $vector_file , we recommend to install them now.\n","red bold");
}
sleep(1);
if (-e "$ecoli_file") {print "checking: ecoli.seq\t................OK\n";sleep (1);}
else { print colored("Couldn't find $ecoli_file , we recommend to install them now.\n","red bold");
}
sleep(1);
print "\n\n\nTests and checks completed - Back to main menu.\n\n";
}
##############################
#### SECTION 3 processing ####
##############################
sub process () {
print colored("\n\t##### PROCESSING TRACES #####\n","white bold", "on_black");
print "\nUsing this option will process your trace files. First you will be asked to\n";
print "enter some information required for successfull processing.\n";
sleep(1);
&get_config;
my $process=$_[0];
if ($process == 1) {
print "You have opted for the 'standard' option - This option uses standard locations\n";
print "for files and standard parameters for processing. It is used best for quick tests.\n";
print "To make most of your analysis we recommend to use the 'advanced' option\n";
}
else {
print "You have opted for the 'advanced' option - This option allows you to set several\n";
print "processing parameters to make most of your analysis. \n";
}
sleep(1);
#### naming scheme
if ($process == 1) {$scheme = 1;}
else {
print colored ("\nTrace file naming scheme\n","bold");
print "In order for your traces to be processed, the file names must follow one of\nthese schemes:\n";
print "\n1 - NERC Environmental Genomics scheme\n2 - STRESSGENES scheme\n";
$input = &get_input("\nPlease enter the appropriate number: ");
chomp $input;
$flag =1;
while ($flag) {
if ($input !~ /^(1|2)$/) {
$input = &get_input("Please enter 1 or 2: \n");
chomp $input;
} else {
$flag =0;
}
}
$scheme = $input;
}
#### trace directory
print colored ("\n\nTrace file directory","bold");
print "\n"; #needed to stop the following question coming up in bold
while ($trace_dir_flag) {
$input = &get_input("Please enter the full path of the directory containing the trace files\nto be processed\n");
while ($input =~ /\./) { #check for . usage (invalid)
print "The current directory is not a valid option. See trace2dbest documentation\n";
print "for more details.\n";
$input = &get_input("Please enter a different directory or q to quit trace2dbest.\n");
chomp $input;
}
while (!-R $input) {
$input = &get_input("ERROR: $input is not readable. Please re-enter the trace file\ndirectory path or hit 'q' to quit\n");
chomp $input;
if ($input =~ /^q{1}$/i) {
print "Ok, trace2dbest will now exit.\n";
exit();
}
while ($input =~ /\./) { #check for . usage (invalid)
print "The current directory is not a valid option. See trace2dbest documentation\n";
print "for more details.\n";
$input = &get_input("Please enter a different directory or q to quit trace2dbest.\n");
chomp $input;
}
}
$tracedir = $input;
#get total files in dir and total with library identifer
opendir(DIR, "$tracedir") or die "$tracedir could not be opened!\n";
my @file_num = grep !/^\.\.?\z/ , readdir DIR; #don't get . and ..
my $file_nums=@file_num;
my @trace_num; #declare trace_num
close DIR; #don't know why DIR has to be closed and then re=opened
opendir(DIR, "$tracedir")or die "$tracedir could not be opened!\n";
if ($scheme == 1) { #EG format
@trace_num = grep { m/^[A-Za-z]{2}_\w{2,5}_\d{2,3}[A-Za-z]\d{2}$/ } readdir DIR;
} elsif ($scheme ==2) { #SG format
@trace_num = grep { m/^[A-Z][a-z][A-Z]{2}[0-9]{2}[a-z][0-9]{2}[a-z][0-9]{2}[a-z][0-9]_[A-Z][a-z]{2}[A-Z][a-z]$/ } readdir DIR;
} else {
die "Naming scheme outside allowable range!\n";
}
my $trace_nums=@trace_num;
if ($trace_nums ==0) { #special case for when NO trace files are found
$input = &get_input("\nNo files with the specified naming scheme were found in $tracedir.\nHit return to re-enter the trace file directory details, or q to quit\ntrace2dbest (so you can edit your trace directory): ");
chomp $input;
if ($input =~ /^q{1}$/i) {
print "Ok, trace2dbest will now exit...\n";
exit();
} else {
next;
}
}
my @nolib;
if ($file_nums > $trace_nums) {
print "\nWarning: There are $file_nums files in $tracedir, however, only $trace_nums files match\nthe specified naming scheme.\n";
print "The following files in do not match the specified naming scheme: \n\n";
#for the grep operator (cf glob) we can use a regex to match the specified naming scheme.
if ($scheme == 1) { #EGformat
@nolib = grep { !m/^([A-Za-z]{2}_\w{2,5}_\d{2,3}[A-Za-z]\d{2})$/} @file_num; #find files that don't match (!m) EG format regex (could have used two foreach loops)
} elsif ($scheme ==2) { #SG format
@nolib = grep { !m/^([A-Z][a-z][A-Z]{2}[0-9]{2}[a-z][0-9]{2}[a-z][0-9]{2}[a-z][0-9]_[A-Z][a-z]{2}[A-Z][a-z])$/} @file_num;
} else {
die "Naming scheme outside allowable range!\n";
}
my $rejects = join "\n", @nolib;
print $rejects;
print "\n\nThe above files will be DELETED and not be further processed. ";
} elsif ($file_nums == $trace_nums) {
print "\nThere are $trace_nums files that match this naming scheme in $tracedir\n";
} else {
die "There are $trace_nums trace files but only $file_nums files!\n";
}
$input = "y";
unless ($process == 1) {$input = &get_input("Is this correct? (y/n): "); chomp $input;};
if ($input =~ /n/i) {
$input = &get_input("Hit return to re-enter the trace file directory details, or q to quit\ntrace2dbest (so you can edit your trace directory): ");
chomp $input;
if ($input =~ /^q{1}$/i) {
print "Ok, trace2dbest will now exit...\n";
exit();
}
} else {
foreach my $file (@nolib) {
my $rm_file = $tracedir . "/" . $file;
print "removing $rm_file\n";
unlink ($rm_file) or die "could not unlink $rm_file! Please manually edit your trace directory\n"; #remove any files in trace that don't match naming scheme -brutal.
}
$trace_dir_flag =0;
}
$final_file_num = $trace_nums; #save trace nums as global variable for use later. ok to use trace nums because file_num and trace_num should be the same now.
}
#### get rid of old directories
foreach $dir (@dirlist) {
if(stat("$dir")) {
$dirflag=1;
last;
}
}
if($dirflag==1) {
print colored ("\nCheck working directory\n","bold");
print colored ("\n#### WARNING ! ####","green bold");
print "\nSome directories/files from previous runs exist. These\n"; $|=1;
print "directories must be removed before trace2dbest can continue.\n";
print "Remove now [y/n]? : ";
&query_for_exit();
foreach $dir (@dirlist) {
if(stat("$dir")) {
print "Remove directory \'$dir\' and its contents ? [y/n] : ";
&query_for_exit();
&rmtree($dir);
}
}
}
if (-e "phredlogfile") {
print "Removing old phred logfile\n";
unlink "phredlogfile";
}
if (-e "cmlogfile") {
print "Removing old cross_match logfile\n";
unlink "cmlogfile";
}
##### set up dirs
mkdir "subfiles";
mkdir "fasta";
mkdir "qual";
mkdir "phd_dir";
mkdir "scf";
mkdir "fastafiles";
mkdir "partigene";
mkdir "blast_reports";
mkdir "process";
mkdir "process/traceinfo";
mkdir "raw_traces";
my @traces = glob ("$tracedir/*");
foreach my $file (@traces) {
copy ($file, "raw_traces")or die "Can't copy $tracedir to raw_traces\n", print LOG " $tracedir/* to raw_traces:$!\n";
}
##### get info for specialists
if ($process == 2) {
print "\nThe following information is required to allow trace2dbest to process your\ntraces efficiently.\n";
print colored ("\nAdapter","bold");
print "\n"; #on separate line so next question doesn't get bolded
$flag =1;
while ($flag) {
#print "If you would like trace2dbest to trim off an adapter sequence (and everything\nupstream of it), please enter the adapter sequence here or hit\nreturn to continue: ";
$adapter = &get_input("If you would like trace2dbest to trim off an adapter sequence (and everything\nupstream of it), please enter the adapter sequence here or hit\nreturn to continue: ");
chomp $adapter;
#this check has been hashed out so that a regex can be used in the adapter sequence
#if ($adapter =~ /[^ACGT]/i) {
# print "Please enter a valid DNA sequence.\n";
#} else {
$flag = 0;
#}
}
print colored ("\nVector file\n","bold");
unless (-s $vector_file) {
print "ERROR: $vector_file does not appear to exist or is empty.\n";
print "Please check your setup - Back to main menu.\n";
&options;
}
$no_vect=0; #assume vector is in file (but see below)
$input= &get_input("Would you like to see a list of all the vector sequences in\n $vector_file? (y/n): ");
chomp $input;
if ($input =~ /y/i) {
$no_vect = &Get_vectors($vector_file); #will return 1 if vector not in file
unless ($no_vect=1) {
print "Please update $vector_file . - Exiting now.\n";
sleep(1);
exit;
}
}
#extra bit to check that ecoli.seq is in place
print colored ("\nE.coli sequence information","bold");
print "\n"; #so that next bit doesn't get bolded
$input = &get_input("Do you want to screen for E.coli sequence in your ESTs? (y/n): ");
chomp $input;
if ($input =~ /y/i) {
print "Ok, checking for ecoli.seq file...";
if (-s "$ecoli_file") { #file exists with non-zero size
print "ecoli.seq file found at $ecoli_file.\n";
} else {
print "ERROR: The file ecoli.seq was not found in $ecoli_file. Please check your configuration file.\n";
print "trace2dbest will NOT screen for E.coli sequence.\n";
}
$ecoli_cross = 0;
}
else {
print "Ok, trace2dbest will NOT screen for E.coli sequence.\n";
$ecoli_cross = 0;
}
print "\nFor each of the parameters, enter the value you wish to use or hit return to\nuse the default shown in brackets().\n";
print colored ("\nphred length cut-off setting","bold");
print "\n";
$flag = "1";
while ($flag) {
my $input = &get_input("Number of high quality bases required in sequence (default setting is 150 bases)");
chomp $input;
if ($input) {
if ($input =~ /^\d*$/ ) {
$high_qual_cutoff = $input;
print "High quality cut-off set to $high_qual_cutoff\n";
$flag = 0;
} else {
print "You must enter a number!\n";
next;
}
}else {
$high_qual_cutoff = 150;
print "High quality cut-off of $high_qual_cutoff selected\n";
$flag =0;
}
}
print colored ("\ncross_match parameter setting\n","bold");
print "cross_match for vector sequence - \n";
$flag = "1"; #flag reused
while ($flag) {
my $input = &get_input("minmatch (10): ");
chomp $input;
if ($input) {
if ($input =~ /^\d*$/ ) {
$vminmatch = $input;
print "minmatch set to $vminmatch\n";
$flag = 0;
} else {
print "You must enter a number!\n";
next;
}
}else {
$vminmatch = 10;
print "minmatch value of $vminmatch selected\n";
$flag =0;
}
}
$flag = 1;
while ($flag) {
my $input = &get_input("minscore (20): ");
chomp $input;
if ($input) {
if ($input =~ /^\d*$/ ) {
$vminscore = $input;
print "minscore set to $vminscore\n";
$flag = 0;
} else {
print "You must enter a number!\n";
next;
}
}else {
$vminscore = 20;
print "minscore value of $vminscore selected\n";
$flag =0;
}
}
if ($ecoli_cross) {
print "cross_match for E.coli sequence - \n";
$flag = "1";
while ($flag) {
my $input = &get_input("minmatch (20): ");
chomp $input;
if ($input) {
if ($input =~ /^\d*$/ ) {
$eminmatch = $input;
print "minmatch set to $eminmatch\n";
$flag = 0;
} else {
print "You must enter a number!\n";
next;
}
}else {
$eminmatch = 20;
print "minmatch value of $eminmatch selected\n";
$flag =0;
}
}
$flag = 1;
while ($flag) {
my $input = &get_input("minscore (30): ");
chomp $input;
if ($input) {
if ($input =~ /^\d*$/ ) {
$eminscore = $input;
print "minscore set to $eminscore\n";
$flag = 0;
} else {
print "You must enter a number!\n";
next;
}
}else {
$eminscore = 30;
print "minscore value of $eminscore selected\n";
$flag =0;
}
}
}
print colored ("\nPoly(A) tail length cut-off setting","bold");
print "\n";
$flag = 1;
while ($flag) {
my $input = &get_input("Enter number bases in poly(A) tail (default setting is 12 consecutive A bases): ");
chomp $input;
if ($input) { #if $input eq "0" then this will not pass and poly a will get set to 12. setting poly(A) to zero is a bad idea
if ($input =~ /^\d{1,2}$/ ) {
$poly_num = $input;
print "poly(A) tail set to $poly_num\n";
$flag=0;
} else {
print "You must enter a whole number between 1 and 99!\n";
next;
}
} else {
$poly_num = 12;
print "poly(A) tail set to $poly_num\n";
$flag=0;
}
}
print colored ("\nSpliced leader 1 identification\n","bold");
print "If you wish to trim the nematode spliced leader sequence, type 'yes', or\n";
print "enter an alternative nucleotide sequence.";
$input = &get_input("If you do not wish to trim any spliced leader sequence just hit return: ");
chomp $input;
if ($input) {
if ($input =~ /yes/i) {
$sl_flag = 1;
$sl_seq = "ggtttaattacccaagtttgag";
print "\nThe nematode SL sequence will be trimmed\n";
} elsif ($input =~ /[acgt]*/) {
$sl_flag =1;
$sl_seq = $input;
print "The sequence $sl_seq will be trimmed\n";
}
} else {
print "\nOk, no SL will be trimmed\n";
$sl_flag = 0;
}
}
#### using all standard
else {
unless (-e "$vector_file") {
print "WARNING: couldn't find vector.seq\n";
}
$ecoli_cross =1;
unless (-e "$vector_file") {
print "Ok, trace2dbest will NOT screen for E.coli sequence.\n";
$ecoli_cross = 0;
}
$vminmatch = 10; $vminscore =20; $eminmatch=20; $eminscore =30; $high_qual_cutoff =150; $poly_num = 12;
}
##### phred it
print "trace2dbest has gathered all necessary information for trace processing\n";
print "Starting phred (basecalling from raw traces) ...\n";
sleep(2);
print "Running phred - please wait...";
#see below (sequence processing) for explaination of phred arguements
system("phred -id $tracedir $phredoptions "); #exit_nomatch means phred will exit if the trace chemistry is not found in phredpar.dat
if (my @phred_out = glob ("fasta/*.seq")) {
my $phred_out_size = @phred_out;
if ($phred_out_size == $final_file_num) {
print LOG "PHRED: .seq files found in fasta - phred ok.\n";
print "Done\n";
} else {
print LOG "\nPHRED WARNING: $final_file_num files with a valid name where passed to phred, but only $phred_out_size files were processed!\n";
my $unphreded = $final_file_num - $phred_out_size;
print "\n\t$unphreded of $final_file_num files were not processed by phred. See phredlogfile and logfile for details.\n";
my $input = &get_input("\tEnter 'q' to quit or hit return to continue: ");
chomp $input;
if ($input =~ /^q{1}$/i) {
print "Ok, trace2dbest will now exit...\n";
exit();
}
}
$stats[1] = $phred_out_size;
} else {
print LOG "PHRED: No .seq files found in fasta directory\n";
print colored ("FAILED!\n", "red bold");
print "Check logfile and phredlogfile. trace2dbest will now exit...\n";
exit();
}
#cross_match for vector
#concatenate the .seq files created by phred into all.seq - makes cross_match quicker
system("cat ./fasta/*.seq > all.seq" || die "cat fasta/*.seq failed!\n");
print "Running cross_match (screening for vector sequence) - please wait...";
system("cross_match all.seq $vector_file -minmatch $vminmatch -minscore $vminscore -screen 2>>cmlogfile 1> /dev/null");
if (-s "all.seq.screen") {
print LOG "Vector crossmatch: all.seq.screen - vector crossmatch ok.\n";
print "Done\n";
} else {
print LOG "Vector crossmatch: No all.seq.screen\n";
print colored ("FAILED!\n", "red bold");
print "Check logfile and cmlogfile. trace2dbest will now exit...\n";
exit();
}
#change the cross_match Xs to Zs so that they can be differentiated from the cross_match Xs from ecoli screen.
open (SCREENED, "all.seq.screen")or die "Could not open all.seq.screen|\n";
my @screened = <SCREENED>;
close SCREENED;
foreach my $line (@screened) {
unless ($line =~ /^>/) {
$line =~ s/X/Z/g;
}
}
open (SCREEN_F, ">all.seq.screen"); #overwrite all.seq.screen
print SCREEN_F @screened;
close SCREEN_F;
#cross_match for ecoli
if ($ecoli_cross) {
rename "all.seq.screen", "all.n"; #vector masked sequence produced by cross_match to all.n
print "Running cross_match (screening for E. coli sequence) - please wait...";
system("cross_match all.n $ecoli_file -minmatch $eminmatch -minscore $eminscore -screen 2>>cmlogfile 1> /dev/null");
if (-s "all.n.screen") {
print LOG "Ecoli crossmatch: all.n.screen - ecoli crossmatch ok.\n";
} else {
print LOG "Ecoli crossmatch: No all.n.screen\n";
print colored ("FAILED!\n", "red bold");
print "Check logfile and cmlogfile. trace2dbest will now exit...\n";
exit();
}
split_screened("all.n.screen","fasta"); #split output (all.n.screen) into individual files
if (glob ("fasta/*.seqsc")) {
print LOG "E.coli crossmatch: .seqsc files found in fasta - ecoli cm ok.\n";
print "Done\n";
} else {
print LOG "E.coli crossmatch: No .seqsc files found in fasta\n";
print colored ("FAILED!\n", "red bold");
print "Check logfile and cmlogfile. trace2dbest will now exit...\n";
exit();
}
unlink glob "all.n*";
unlink glob "all.seq*";
} else {
split_screened ("all.seq.screen", "fasta");
unlink ("all.seq","all.seq.log", "all.seq.screen");
}
print colored("\nSequence processing\n", "bold");
#after basecalling by phred, the sequence is trimmed using the lowest(upstream) and highest(downstream) trim values from the phd
#file TRIM: field and the trim values suggested by the -trim arguement (shown in header of fasta dir files). Generally the lowest
#and highest values come from the phd file but sometimes -trim gives the lowest trim position. The remaining trim values are used
#to define the "high quality" section of the sequence. These values are changed accordingly following the subsequent sequnce
#processing to remove vector, ecoli, NNN+ etc.
opendir (PHD_DIR, "./phd_dir"); #open the directory to allow looping thru the list of files
while (my $file= readdir PHD_DIR) {
my ($sequence, $t_hq_start, $t_hq_total, $t_hq_end, $p_hq_start, $p_hq_end,
$trim_start, $trim_end, $hi_qual_start, $hi_qual_end, $seq_length, $trace_name, $svector_seq);#clear varibables
my $polyA_des = "poly(A) not found";
next if $file =~ /^\./; #avoid . and ..
$file=~ s/\.phd\.1/\.seqsc/; #convert file name
open (FASTA_FILE, "<./fasta/$file") or die "The file $file in the ./fasta directory could not be opened!\n";
while (my $line = <FASTA_FILE> || !eof(FASTA_FILE)) {
if ($line =~ />.+\s+\d+\s+(\d+)\s+(\d+)/) { #get the trim info from the fasta header (number of bases to be trimmed at begining and length of seq after trimming
$t_hq_start = $1;
$t_hq_total =$2;
if ($t_hq_total ==0) {
$stats[2]++; #add to 'number of low quality traces' stats
}
} else {
chomp $line;
$sequence .= $line; #add the sequence to $sequence
}
}
close (FASTA_FILE);
$file=~ s/\.seqsc/\.phd\.1/; #and back again
open (PHD_FILE, "<./phd_dir/$file") or die "The file $file in the ./phd_dir directory could not be opened!\n"; #open the equivalent file in the phd_dir
while (my $line = <PHD_FILE>){
if($line=~/TRIM:\s(-?\d+)\s(-?\d+)/){ #get 1st and last bases of high quality read segment use -? to allow for presence of -1
$p_hq_start=$1; $p_hq_end=$2; last; #from trim line of phd file
}
}
close(PHD_FILE);
#convert trim positions to same format
$t_hq_end = $t_hq_start + $t_hq_total;
$t_hq_start++; #in fasta file this value refers to number of bases to be trimmed at start, we want position of high quality, so add 1
#get lowest and highest for trimming, use remaining two values for defining high qual section
if ($p_hq_start < 0) {
$p_hq_start = 0;
}
if ($t_hq_start >= $p_hq_start) {
$trim_start = $p_hq_start-1; #minus 1 to give position of last base to be trimmed at start of raw sequence (= number of bases trimmed at start)
$hi_qual_start = $t_hq_start;
} else {
$trim_start = $t_hq_start -1; #minus 1 to give position of last base to be trimmed at start of raw sequence
$hi_qual_start = $p_hq_start;
}
if ($t_hq_end <= $p_hq_end) {
$trim_end = $p_hq_end +1; #add one to give postion of first base to be trimmed at end of raw sequence
$hi_qual_end = $t_hq_end;
} else {
$trim_end = $t_hq_end +1; #add one to give postion of first base to be trimmed at end of raw sequence
$hi_qual_end = $p_hq_end;
}
#trim sequence
$seq_length = ($trim_end -$trim_start) -1; #minus 1 to give length of trimmed seq
#trim_start shouldn't be negative because that messes up the substr. so if <0 add 1. make new variable to keep trim_start consistent with rest of script
my $trim_start_new;
if ($trim_start < 0) {
$trim_start_new = 0;
} else {
$trim_start_new = $trim_start;
}
$sequence = substr ($sequence, $trim_start_new, $seq_length);
#convert hi qual regions to positions in trimmed sequence (instead of raw sequence)
$hi_qual_start = $hi_qual_start -$trim_start;
$hi_qual_end = $hi_qual_end - $trim_start; #adjust high qual region positions
#print sequence details to process dir
($trace_name = $file) =~ s/\.phd.1//;
open (TRACEINFO, ">./process/traceinfo/$trace_name") or die "The file $trace_name in the ./process/traceinfo directory could not be created/opened!\n";
print TRACEINFO "$trace_name Sequence processing information\n";
print TRACEINFO "Quality trimming information\n";
print TRACEINFO "Position of last base trimmed at start of raw sequence: $trim_start\n";
print TRACEINFO "Position of first base trimmed at end of raw sequence: $trim_end\n";
print TRACEINFO "Length of sequence after trimming: $seq_length\n";
print TRACEINFO "Start position of high quality region in trimmed sequence: $hi_qual_start\n";
print TRACEINFO "End position of high quality region in trimmed sequence: $hi_qual_end\n";
#chuck out sequences that have high qual section < $high_qual_cutoff
if (($hi_qual_end - $hi_qual_start) <= $high_qual_cutoff) {
my $high_qual_length = $hi_qual_end - $hi_qual_start;
if ($high_qual_length < 0) { $high_qual_length =0 }
print TRACEINFO "Sequence rejected - high quality length () is less than $high_qual_cutoff\n";
print "$trace_name has beeen rejected, high quality sequence length = $high_qual_length\n";
print LOG "$trace_name has beeen rejected, high quality sequence length = $high_qual_length\n";
next;
}
#trim adapter (if one has been entered)
#do this before vector trimming, otherwise vector trimming may take off a little bit of adapter meaning that this section wouldn't recognise it.
if($adapter){
if($sequence =~ s/^(.{0,200}$adapter)//i) { # remove adapter if it appears in first 200 bases
my $adapter_trim = $1;
my @adapter_trim = split ('', $adapter_trim);
my $at_length= @adapter_trim;
open (ADAPTERTRIM, ">>./process/adapter_trim") or die "Could not open process/adapter_trim!\n";
print ADAPTERTRIM "$trace_name: $at_length bases were trimmed\nTrimmed sequence: $adapter_trim\n\n";
close ADAPTERTRIM;
print TRACEINFO "Adapter trimming\nAdapter sequence: $adapter\nLength of sequence trimmed (from start):$at_length\nSequence trimmed= $adapter_trim\n";
$hi_qual_start = $hi_qual_start - $at_length; #may make hi qual start -ve but that shouldn't matter
$hi_qual_end = $hi_qual_end -$at_length;
}
}
#trimming for vector - Xs now converted to Zs
if($sequence =~ s/^(.{0,150}Z+)//i) { #substitue sequence beginning with any number of chars between
#0 and 150, followed by any number of X's, with nothing.
my $svector_seq = $1;
my @svector = split('',$svector_seq);
my $sv_length = @svector;
open (VECTORTRIM, ">>./process/vector_trim") or die "Could not open process/vector_trim!\n";
print VECTORTRIM "$trace_name: $sv_length bases were trimmed at the start of the sequence.\nTrimmed sequence: $svector_seq\n\n";
close VECTORTRIM;
print TRACEINFO "Leading vector trimming\nLength trimmed at start= $sv_length\nSequence trimmed= $svector_seq\n";
$hi_qual_start = $hi_qual_start - $sv_length; #may make hi qual start -ve but that shouldn't matter
$hi_qual_end = $hi_qual_end -$sv_length;
}
if($sequence =~ s/(Z+.{0,150})$//i){ # remove trailing vector seq
my $evector_seq = $1;
my @evector = split('',$evector_seq);
my $ev_length = @evector; #sv_length reused
open (VECTORTRIM, ">>./process/vector_trim") or die "Could not open process/vector_trim!\n";
print VECTORTRIM "$trace_name: $ev_length bases were trimmed at the end of the sequence.\nTrimmed sequence: $evector_seq\n\n";
close VECTORTRIM;
print TRACEINFO "Trailing vector trimming\nLength trimmed at end= $ev_length\nSequence trimmed= $evector_seq\n";
#high qual start shouldn't be affected but high qual end might be. hi qual end can be made = end of seq later.
}
#trimming for ecoli - Xs still Xs
if($sequence =~ s/^(.{0,150}X+)//i) { #substitue sequence beginning with any number of chars between
#0 and 150, followed by any number of X's, with nothing.
my $secoli_seq = $1;
my @secoli = split('',$secoli_seq);
my $sv_length = @secoli;
open (ECOLITRIM, ">>./process/ecoli_trim") or die "Could not open process/ecoli_trim!\n";
print ECOLITRIM "$trace_name: $sv_length bases were trimmed at the start of the sequence.\nTrimmed sequence: $secoli_seq\n\n";
close ECOLITRIM;
print TRACEINFO "Leading E.coli sequence trimming\nLength trimmed at start= $sv_length\nSequence trimmed= $secoli_seq\n";
$hi_qual_start = $hi_qual_start - $sv_length; #may make hi qual start -ve but that shouldn't matter
$hi_qual_end = $hi_qual_end -$sv_length;
}
if($sequence =~ s/(X+.{0,150})$//i){ # remove trailing ecoli seq
my $eecoli_seq = $1;
my @eecoli = split('',$eecoli_seq);
my $ev_length = @eecoli; #sv_length reused
open (ECOLITRIM, ">>./process/ecoli_trim") or die "Could not open process/ecoli_trim!\n";
print ECOLITRIM "$trace_name: $ev_length bases were trimmed at the end of the sequence.\nTrimmed sequence: $eecoli_seq\n\n";
close ECOLITRIM;
print TRACEINFO "Trailing E.coli sequence trimming\nLength trimmed at end= $ev_length\nSequence trimmed= $eecoli_seq\n";
#high qual start shouldn't be affected but high qual end might be. hi qual end can be made = end of seq later.
}
#trimming for N's
if($sequence =~ s/^(.{0,150}NNN+)//i) { # remove low quality bases if at least 3 in a row
my $ntrim_seq = $1;
my @ntrim = split('',$ntrim_seq);
my $nt_length = @ntrim;
open (QUALTRIM, ">>./process/quality_trim") or die "Could not open process/quality_trim!\n";
print QUALTRIM "$trace_name: $nt_length bases trimmed from start of sequence\nTrimmed sequence:$ntrim_seq\n\n";
close QUALTRIM;
print TRACEINFO "Start N trimming\nLength of sequence trimmed (from start): $nt_length\nSequence trimmed= $ntrim_seq\n";
$hi_qual_start = $hi_qual_start - $nt_length; #may make hi qual start -ve but that shouldn't matter
$hi_qual_end = $hi_qual_end -$nt_length;
}
#trim SL-1 or equivalent
#Most C. elegans genes are trans-spliced at their 5' end to a small leader exon,
#called SL-1. Estimates of the actual proportion range from 80% to 60%. The SL-1 sequence can be used as a
#5' end marker - any sequence upstream of it can be removed.
#this bit of code finds the sl and any sequence before it.
#complement sl sequence not trimmed because this is likley to be in the low quality region of the sequence read.
#added 6/5/03
#this an optional step and the user can enter their own sequence for removal
if ($sl_flag) { #sl sequence to be found and all upstream sequence removed
if ($sequence =~ s/^(.*$sl_seq)//i) {
my $sl_trim = $1;
my @sl_trim = split ('',$sl_trim);
my $st_length = @sl_trim;
open (ADAPTERTRIM, ">>./process/adapter_trim") or die "Could not open process/adapter_trim!\n";
print ADAPTERTRIM "$trace_name: $sl_seq found, $st_length bases were trimmed\nTrimmed sequence: $sl_trim\n\n";
close ADAPTERTRIM;
print TRACEINFO "Trimming for $sl_seq\nLength of sequence trimmed (from start): $st_length\nSequence trimmed= $sl_trim\n";
$hi_qual_start = $hi_qual_start - $st_length; #may make hi qual start -ve but that shouldn't matter
$hi_qual_end = $hi_qual_end -$st_length;
}
}
#poly A trimming there is still the potential that a poly A/T will not be properly trimmed if it overlaps the 150 bp boundary, but this shouldn't cause too much of a problem.
my $polya_cut = $high_qual_cutoff - $poly_num;
if ($sequence =~ /(.{$polya_cut,}?)(a{$poly_num}.*)/i) {
#last occurence of polya after acceptable length
#poly_num is the user specified length of poly a tail (default is 12)
my $pa_trim = $2;
$sequence = $1; #not using substition of $2 so that we get the right string in the right position extracted.
my @pa_trim = split ('',$pa_trim);
my $pat_length = @pa_trim;
open (POLYATRIM, ">>./process/polya_trim") or die "Could not open process/polya_trim!\n";
print POLYATRIM "$trace_name: $pat_length bases were trimmed for poly(A)\nTrimmed sequence: $pa_trim\n\n";
close POLYATRIM;
print TRACEINFO "Poly(A) trimming\nLength of sequence trimmed (from end): $pat_length\nSequence trimmed= $pa_trim\n";
$polyA_des = "poly(A) trimmed";
}
#polyT trimming - revised algorithm.
my $rev_sequence = reverse $sequence; #reverse sequence so we can look for polyT in same way as polyA
if ($rev_sequence =~ /(.{$polya_cut,}?)(t{$poly_num}.*)/i ) {
#poly_num is the user specified length of poly a tail (default is 12)
my $pt_trim = $2;
$rev_sequence = $1;
my @pt_trim = split ('',$pt_trim);
my $ptt_length = @pt_trim;
$pt_trim = reverse $pt_trim; #make pt_trim represent trimmed sequence as it would be in the actual (non reversed) sequence
open (POLYATRIM, ">>./process/polya_trim") or die "Could not open process/polya_trim!\n";
print POLYATRIM "$trace_name: $ptt_length bases were trimmed for poly(t)\nTrimmed sequence: $pt_trim\n\n";
close POLYATRIM;
print TRACEINFO "Poly(t) trimming\nLength of sequence trimmed (from start): $ptt_length\nSequence trimmed= $pt_trim\n";
$hi_qual_start = $hi_qual_start - $ptt_length; #may make hi qual start -ve but that shouldn't matter
$hi_qual_end = $hi_qual_end -$ptt_length;
$polyA_des = "poly(T) trimmed";
}
$sequence = reverse $rev_sequence; #return $sequence to original orientation
#check for Zs (vector) in middle of sequence (leading and trailing Zs already removed). Zs in middle may have occured because the trace represents a chimeric EST or cross_match has mistakenly identified ecoli seq in middle of sequence. they may also be other reasons.
$chim_flag = 0;
my $mid_z_num = 0;
while ($sequence =~ s/[z]/n/i) {
$mid_z_num++;
$chim_flag = 1;
}
if ($chim_flag) {
open (VECTORTRIM, ">>./process/vector_trim") or die "Could not open process/vector_trim!\n";
print VECTORTRIM "$trace_name is a potential chimera - cross_match has masked $mid_z_num vector bases\n";
print VECTORTRIM "in the middle of this sequence. The cross_match X's have been replaced with n's.\n";
close VECTORTRIM;
print TRACEINFO "**potential chimera - cross_match has masked $mid_z_num vector bases in the middle of the sequence!\n";
print "$trace_name is a potential chimera - cross_match has masked $mid_z_num vector bases\n";
print "in the middle of this sequence. The cross_match X's have been replaced with n's.\n";
print LOG "$trace_name is a potential chimera - cross_match has masked $mid_z_num vector bases in the\nmiddle of this sequence. The cross_match X's have been replaced with n's.\n";
}
#repeat above for Xs (ecoli)
#---check for Xs in middle of sequence (leading and trailing Xs already removed). Xs in middle may have occured because the trace represents a chimeric EST or cross_match has mistakenly identified ecoli seq in middle of sequence. they may also be other reasons.
$chim_flag_e = 0;
my $mid_x_num = 0;
while ($sequence =~ s/[x]/n/i) {
$mid_x_num++;
$chim_flag_e = 1;
}
if ($chim_flag_e) {
open (VECTORTRIM, ">>./process/vector_trim") or die "Could not open process/vector_trim!\n";
print VECTORTRIM "$trace_name is a potential chimera - cross_match has masked $mid_x_num E.coli bases\n";
print VECTORTRIM "in the middle of this sequence. The cross_match X's have been replaced with n's.\n";
close VECTORTRIM;
print TRACEINFO "**potential chimera - cross_match has masked $mid_x_num E.coli bases in the middle of the sequence!\n";
print "$trace_name is a potential chimera - cross_match has masked $mid_x_num E.coli bases\n";
print "in the middle of this sequence. The cross_match X's have been replaced with n's.\n";
print LOG "$trace_name is a potential chimera - cross_match has masked $mid_x_num E.coli bases in the\nmiddle of this sequence. The cross_match X's have been replaced with n's.\n";
}
#finalise and check high qual lengths
my @sequence = split('',$sequence);
my $sequence_length = @sequence;
if ($hi_qual_end > $sequence_length) { #adjust high qual end in case it is longer than the sequence length (e.g. if a big chunk of trailing vector was removed)
$hi_qual_end = $sequence_length;
}
if ($hi_qual_start < 1) { #adjust high qual start in case it is less than 1 (e.g. if a big chunk of leading vector was removed)
$hi_qual_start = 1;
}
#chuck out sequences that have high qual section < $high_qual_cutoff
if (($hi_qual_end - $hi_qual_start) <= $high_qual_cutoff) {
my $high_qual_length = $hi_qual_end - $hi_qual_start;
if ($high_qual_length < 0) { $high_qual_length =0 } #make high qual a sensible number if less than zero
print TRACEINFO "Sequence rejected - high quality length ($high_qual_length bp) is less than $high_qual_cutoff\n";
print "$trace_name has beeen rejected, high quality sequence length after trimming = $high_qual_length (cut off $high_qual_cutoff)\n";
print LOG "$trace_name has beeen rejected, high quality sequence length after trimming = $high_qual_length (cut off $high_qual_cutoff)\n";
next;
}
open(OUTFILE,">>subfiles/$trace_name");
print OUTFILE ">$trace_name hq start $hi_qual_start hq end $hi_qual_end $polyA_des";
if ($chim_flag) {
print OUTFILE " chimeric(vector)?";
}
if ($chim_flag_e) {
print OUTFILE " chimeric(E.coli)?";
}
print OUTFILE "\n";
my $n=0;
while ($sequence =~ /(.)/g) {
print OUTFILE "$1";
$n++;
if(($n % 60) == 0) {
print OUTFILE "\n";
}
}
if(($n % 60) != 0) {
print OUTFILE "\n";
}
close OUTFILE;
close TRACEINFO; close VECTORTRIM; close QUALTRIM; close ADAPTERTRIM; close POLYATRIM; #if not already closed
} #while from PHD_DIR
# statistics
print colored("\nStatistics\n", "bold");
opendir (DIR, "./subfiles");
while (defined (my $file = readdir(DIR))) {
if (($scheme ==1 && $file =~ /^[A-Za-z]{2}_\w{2,5}_(\d{2,3})([A-Za-z])(\d{2})$/) ||
($scheme ==2 && $file =~ /^[A-Z][a-z][A-Z]{2}[0-9]{2}[a-z]([0-9]{2})([a-z])([0-9]{2})[a-z][0-9]_[A-Z][a-z]{2}[A-Z][a-z]$/)) {
my ($hi_qual_s, $hi_qual_end, $seq);
open(INFILE,"<subfiles/$file"); #open this file to read in seq info
while (my $line = <INFILE>) {
if ($line=~ /^>.+\shq\sstart\s(\d+)\shq\send\s(\d+)(.*)/) { #regex matching fasta type header of the file
$hi_qual_s=$1; $hi_qual_end=$2;
}
else{ # Read in the sequence
chop $line;
$seq .= $line;
}
}
$stats[3]++; #add to "number of submission files"
$stats[4] += length($seq); #gives length of sequence
$stats[5]+= ($hi_qual_end - $hi_qual_s); #gives length of high qual part of seq
}
}
print ("\n\t$stats[1]\t\tTraces processed\n");
if ($stats[1] == 0) {
print "No traces processed. Check logfile, cmlogfile and phred logfile for errors\n";
print "trace2dbest will now exit\n";
exit();
}
my $good = $stats[1] - $stats[2];
my $percent = int ($good * 100 / $stats[1]);
print ("\t$good ($percent%)\t'Good quality' traces\n");
my $percent2 = int ($stats[3] * 100/$stats[1]);
print ("\t$stats[3] ($percent2%)\tSubmissable sequences after trimming\n");
if ($stats[3] == 0) {
print "No submissable traces. Check logfile, cmlogfile and phred logfile for errors\n";
print "trace2dbest will now exit\n";
exit();
}
my $av_l = int($stats[4] / $stats[3]);
print ("\t$av_l\t\tAverage length of submissable sequences\n");
my $av_h = int($stats[5] / $stats[3]);
print ("\t$av_h\t\tAverage number of high quality bases for submissable sequences\n");
print ("\n\n");
#write these stats to a file for inclusion in output dir
open (STATS, ">>./process/statistics");
print STATS "\n\t### Trace process statistics ###\n";
print STATS "$stats[1]\t\tTraces processed\n";
print STATS "$good ($percent%)\t\t'Good quality' traces\n";
print STATS "$stats[3] ($percent2%)\t\tSubmissable sequences after trimming\n";
print STATS "$av_l\t\tAverage length of submissable sequences\n";
print STATS "$av_h\t\tAverage number of high quality bases for submissable sequences\n";
sleep(1);
print "\n\n\nDone - Back to processing.\n\n";
}
##############################
#### SECTION 4 annotation ####
##############################
sub blast () {
print colored("\n\t##### ANNOTATION OF SEQUENCES #####\n","white bold", "on_black");
print "\ntrace2dbest gives the option to preliminary annotate your ESTs based\n";
print "on BLAST. This annotation will be added to the header each sequence before\n";
print "submission to dbEST\n";
sleep(1);
&get_config;
if (!$pblastall && !$pblastcl3) {
print "trace2dbest facilitates preliminary annotation in the form of a BLAST hit,\n
however neither remote nor local BLAST utilities were found in your PATH.\n";
} else {
print "Would you like to add BLAST-based preliminary annotation to the sequences?\n";
my $input = &get_input(" (y/n): ");
chomp $input;
if ($input !~ /y/i) {
print "Ok, no BLAST annotation will be added.\n"; &options;
} else {
if ($pblastall && $pblastcl3) {
print "You have two options:\n\n";
print "\t1 - Remote BLAST using NCBI (max ~400 sequences/day)\n";
print "\t2 - Local BLAST\n\n";
while () {
my $input = &get_input("Enter number: ");
chomp $input;
if ($input =~ /^1$/) {
$blast = 1;
print "Ok, remote BLAST will be carried out on the processed sequences.\n";
last;
} elsif ($input =~ /^2$/) {
$blast = 2;
last;
} else {
print "Please enter 1 or 2\n";
next;
}
}
}elsif ($pblastall==1) {
$blast =2
} elsif ($pblastcl3==1) {
$blast = 1;
print "Ok, remote BLAST will be carried out on the processed sequences.\n";
} else {
die "blastall = $pblastall, blastcl3 = $pblastcl3";
}
#selection of local blast databse
if ($blast ==2) {
my $flag =1;
while ($flag) {
my $input = &get_input("Please enter the name of the directory that holds your BLAST databases\n(e.g. /home/db/blastdb): ");
chomp $input;
if ($input =~ /^q{1}$/i) {
print "Ok, trace2dbest will now exit...\n";
exit ();
}
if (-d $input && -r $input) {
$blastdata =$input;
@blastdb = glob ("$blastdata/*.pin");
if (!@blastdb) {
print "$blastdata does not appear to have any protein BLAST databases, please try again.\n";
next; #start while loop again now so that flag doesn't get set to 0
}
$flag = 0;
} else {
print "$input is not a directory or could not be read. Please try\nagain or type q to exit.\n";
}
}
if (@blastdb) {
print "The following protein BLAST databases were found-\n";
my $number =1;
foreach my $db (@blastdb) {
$db =~ s/$blastdata\///;
$db =~ s/\.pin//;
print "$number - $db\n";
$number++;
}
$number = $number -1; #get real number of databases
my $flag =1;
while ($flag) {
my $input = &get_input("Select a database by entering its number: ");
chomp $input;
if ($input !~ /^\s*[0-9]+\s*$/i) { #only numbers and spaces
print "Please enter a number between 1 and $number.\n";
} elsif ($input > $number && $input >= 1) {
print "Please enter a number between 1 and $number.\n";
} else {
$input--;
$blast_database = $blastdb[$input];
$blast_database = $blastdata."/".$blast_database; #add path to blast database
print "Ok, the database $blast_database will be used\n";
$flag =0;
}
}
}
}
$flag = 1;
while ($flag) {
$input = &get_input("What BLAST bit score cut-off value would you like to use (default 55)? ");
chomp $input;
if ($input) {
if ($input !~ /^\s*[0-9]+\s*$/) {
print "please enter a number.\n";
} else {
$bits_cutoff = $input;
$flag = 0;
}
} else {
$bits_cutoff = 55;
$flag = 0;
}
}
$blast_save =1;
}
}
#### run blasts,
my $seq;
unless (-e "subfiles") {
print "\nCouldn't find subfiles directory. Please change your working\n";
print "directory back to the directory where you have processed your\n";
print "traces. - Exiting now.\n\n";
sleep(1);
exit;
}
my @list = glob ("subfiles/*");
unless (@list) {print "subfiles directory appears to be empty - Please check.";exit;}
if (-e "blast_reports/blasts_full") {system ("mv blast_reports/blasts_full blast_reports/blasts_full.old")}
if (-e "blast_reports/blasts_tophit") {system ("mv blast_reports/blasts_tophit blast_reports/blasts_tophit.old")}
foreach my $list (@list) {
if ($list =~ /\.sub/) { print "subfiles directory has been used already - Please check."; exit}
my $inseq = Bio::SeqIO->new('-file' => "$list",
'-format' => 'Fasta');
while (my $handle = $inseq->next_seq) {
$seq=$handle->seq();
}
$list =~ s/subfiles\///;
put_id($blast,$blast_save,$seq,$list,$blast_database, $bits_cutoff);
}
sleep(1);
print "\n\n\nBLAST completed - Back to main menu.\n\n";
}
###########################################
#### SECTION 5 SUBMISSION INFORMATION ####
###########################################
sub paperwork () {
&get_config;
sleep(1);
my $file_type=$_[0];
my $tryagain = 0;
if ($file_type eq "Library") {
&file_update($file_type,$Lib_file,$tryagain)
}
elsif ($file_type eq "Contact") {
&file_update($file_type,$Cont_file,$tryagain)
}
elsif ($file_type eq "Publication") {
&file_update($file_type,$Pub_file,$tryagain)
}
elsif ($file_type eq "EST") {
&file_update($file_type,$EST_file,$tryagain)
}
else {print "Bugs have eaten some bytes - Exiting now\n"; exit;}
sleep(1);
print "\n\n\nSubmission information update done - Back to main menu.\n\n";
}
###########################################
#### SECTION 6 SUBMISSION ###############
###########################################
sub submission () {
print colored("\n\t##### SEQUENCE SUBMISSION #####\n","white bold", "on_black");
print "\nThis option gets the files ready for dbEST submission and, if you want to,\n";
print "submits them to dbEST immediately\n";
sleep(1);
&get_config;
# Ask for info to complete Lib, Cont and Pub files - then EST file can be made (using some info from the other files)
print colored("\nLib, Cont, Pub and EST file information\n", "bold");
my $tryagain = 0; #flag so that if users need to 'try again' to get the file then they don't get the header info which they've seen already
while($Libfileflag==0) { #get Library file info
$file_type = "Library";
$type_id = "TYPE: Lib";
my $answer=&file_choices_page($file_type, $Lib_file, $tryagain);
$tryagain =1;
if($answer=~ /h/i) { &print_help(); next; } #use less to show trace2dbest.doc
if($answer=~ /q/i) { print "Ok, trace2dbest exiting...\nBye \n"; exit(); } #exit program
if($answer == 1) { @Libfile_data= &dummyLib_file_data(); $no_lib = 1;} #no Lib file needed so make dummy file with lib name for library field of est file. set flag so we know what to add to sub file
else { @Libfile_data=&get_file_data($answer, $Lib_file, $type_id); } #record from Libfile.db to be used
if (@Libfile_data) {
$Libfileflag =1;
foreach my $line (@Libfile_data) { #extract various bit of info for other bits of the process
if ($line =~ /^NAME:\s(.*)/) {
$lib_name = $1;
}
if ($line =~ /ORGANISM:\s(.*)/) {
$species = $1;
}
}
}
}
$tryagain = 0;
while($Contfileflag==0) { #get Contact file info
$file_type = "Contact";
$type_id = "TYPE: Cont";
my $answer=&file_choices_page($file_type, $Cont_file, $tryagain);
$tryagain =1;
if($answer=~ /h/i) { &print_help(); next; } #use less to show trace2dbest.doc
if($answer=~ /q/i) { print "Ok, trace2dbest exiting...\nBye \n"; exit(); } #exit program
if($answer == 1) { @Contfile_data= &dummyCont_file_data(); $no_cont = 1;} #no Cont file needed so make dummy file with name for contact field of est file.
else { @Contfile_data=&get_file_data($answer, $Cont_file, $type_id); } #record from Contfile.db to be used
if (@Contfile_data) {
$Contfileflag =1;
foreach my $line (@Contfile_data) {
if ($line =~ /^NAME:\s(.*)/) {
$cont_name = $1;
}
}
}
}
$tryagain = 0;
while($Pubfileflag==0) { #get Publication file info
$file_type = "Publication";
$type_id = "TYPE: Pub";
my $answer=&file_choices_page($file_type, $Pub_file, $tryagain);
$tryagain =1;
if($answer=~ /h/i) { &print_help(); next; } #use less to show trace2dbest.doc
if($answer=~ /q/i) { print "Ok, trace2dbest exiting...\nBye \n"; exit(); } #exit program
if($answer == 1) { @Pubfile_data= &dummyPub_file_data(); $no_pub = 1;} #if pub file is not to be created. make dummy file with just the title to put in EST file
else { #record from Pubfile.db to be used
@Pubfile_data=&get_file_data($answer, $Pub_file, $type_id);
#get pub_cit for est file. needs to be done separately from cases where pub file was made
if (@Pubfile_data) { #...but we only want to extract this info if a pub file was selected
my $flag =0;
foreach my $line (@Pubfile_data) {
if ($line =~ /AUTHORS:/ || $line =~ /\|\|/) { #need || in case dummpy pub file being used
$flag = 0; #the multi lined title has come to an end so stop reading lines to pub-cit
}
if ($flag == 1) {
$pub_cit = $line;
chomp $pub_cit;
}
if ($line =~ /^TITLE:/ ) {
$flag = 1;
}
}
}
}
if (@Pubfile_data) {
$Pubfileflag =1;
}
}
chomp $pub_cit;
#use species name from lib file to make directory structure do now in case it fails
$species =~ s/\s/_/g; #swap spaces for underscores
if (-w "/home/db/est_solutions") {
unless (-w "/home/db/est_solutions/$species") {
mkdir "/home/db/est_solutions/$species", 0777 or die "Can't make directory /home/db/est_solutions/$species\n $!\n";
}
unless (-w "/home/db/est_solutions/$species/trace2dbest") {
mkdir "/home/db/est_solutions/$species/trace2dbest", 0777 or die "Can't make directory /home/db/est_solutions/$species/trace2dbest\n$!\n";
}
$save_location = 1; #so we know where to save the files at the end
} else { #set up directory structure in user's home directory
unless (-w "$homedir/est_solutions") {
mkdir "$homedir/est_solutions", 0755 or die "Can't make directory : $!\n";
}
unless (-w "$homedir/est_solutions/$species") {
mkdir "$homedir/est_solutions/$species" , 0755 or die "Can't make directory : $!\n";
}
unless (-w "$homedir/est_solutions/$species/trace2dbest") {
mkdir "$homedir/est_solutions/$species/trace2dbest" , 0755 or die "Can't make directory : $!\n";
}
$save_location = 0;
}
my @EST_file = &EST_file_data();
#set variables for final EST submission files
foreach my $field (@EST_file) {
if ($field =~ /SEQ_PRIMER:\s(.*)/ ) {
$seq_primer = $1; #primer name extracted later
} elsif ($field =~ /PCR_F:\s(.*)/ ) {
$pcrf_primer = $1;
}elsif ($field =~ /PCR_B:\s(.*)/ ) {
$pcrr_primer = $1;
}elsif ($field =~ /P_END:\s(.*)/ ) {
$p_end = $1;
}elsif ($field =~ /PUBLIC:\s(.*)/ ) {
$public_date = $1;
}elsif ($field =~ /COMMENT:\s(.*)/ ) {
$comment = $1;
}
}
my $blastexist;
if (-e "blast_reports/blasts_tophit") {$blastexist=1;}
else {
print "No BLAST results found - blast_reports/blasts_tophit\n";
print "doesn't seem to exist\n";
my $carryon = &get_input("Would you like to carry on without using annotation? [y/n]");
chomp $carryon;
unless ($carryon =~ /y/i) {
print "Exiting now.\n";
exit;
}
}
# creation of submission files
print colored("\nCreating submission files...\n", "bold");
#
my %blast;
if ($blastexist) {open(BLASTFILE,"<blast_reports/blasts_tophit");
while (my $line = <BLASTFILE>) {
chomp $line;
if ($line=~ /^([A-Za-z]{2}_\w{2,5}_\d{2,3}[A-Za-z]\d{2})\s(.*)/) {
$blast{$1} = $2;
}
}
}
opendir (DIR, "./subfiles");
unless ($scheme) {
print colored ("\nTrace file naming scheme\n","bold");
print "In order to extract some necessary information, the file names must follow one of\nthese schemes:\n";
print "\n1 - NERC Environmental Genomics scheme\n2 - STRESSGENES scheme\n";
$input = &get_input("\nPlease enter the appropriate number: ");
chomp $input;
$flag =1;
while ($flag) {
if ($input !~ /^(1|2)$/) {
$input = &get_input("Please enter 1 or 2: \n");
chomp $input;
} else {
$flag =0;
}
}
$scheme = $input;
}
while (defined (my $file = readdir(DIR))) {
if (($scheme ==1 && $file =~ /^[A-Za-z]{2}_\w{2,5}_(\d{2,3})([A-Za-z])(\d{2})$/) ||
($scheme ==2 && $file =~ /^[A-Z][a-z][A-Z]{2}[0-9]{2}[a-z]([0-9]{2})([a-z])([0-9]{2})[a-z][0-9]_[A-Z][a-z]{2}[A-Z][a-z]$/)) {
my ($hi_qual_s, $hi_qual_end, $seq, $put_id, $plate, $row, $column, $poly_A, $chimeric, $chimeric_e);
$plate = $1; $row = $2; $column = $3; #get plate coordinates from name to use in submission file
open(INFILE,"<subfiles/$file"); #open this file to read in seq info
open(SUBFILE,">subfiles/$file.sub"); #create this file for EST sub file
open(FASTAFILE,">fastafiles/$file.fsa"); #create this file to hold the processed seq.
while (my $line = <INFILE>) {
if ($line=~ /^>.+\shq\sstart\s(\d+)\shq\send\s(\d+)(.*)/) { #regex matching fasta type header of the file
$hi_qual_s=$1; $hi_qual_end=$2; my $dollar3 = $3;
if ($dollar3) {
if ($dollar3 =~ /trimmed/){
$poly_A=1;
}
if ($dollar3 =~ /chimeric\(vector\)/) {
$chimeric =1;
}
if ($dollar3 =~ /chimeric\(E.coli\)/) {
$chimeric_e = 1;
}
}
}
else{ # Read in the sequence
chop $line;
$seq .= $line;
}
}
if ($blastexist) {$put_id = $blast{$file}}
#extract from $primer the primer NAME. $primer should be in format (e.g.) "T7 (aggagagagcgtg)" or "T7"
my $primer_name = $seq_primer;
if ($primer_name =~ s/\(.*\)//) {} #get name part
if ($primer_name =~ s/\s//g) {} #remove whitespace
print SUBFILE "TYPE: EST\nSTATUS: New\n";
print SUBFILE "CONT_NAME: $cont_name\nCITATION:\n";
print SUBFILE "$pub_cit\n";
print SUBFILE "LIBRARY: $lib_name\n";
print SUBFILE "EST#: $file\_$primer_name\n";
print SUBFILE "CLONE: $file\n";
print SUBFILE "SOURCE:\n";
print SUBFILE "PCR_F: $pcrf_primer\n";
print SUBFILE "PCR_B: $pcrr_primer\n";
print SUBFILE "PLATE: $plate\n";
print SUBFILE "ROW: $row\n";
print SUBFILE "COLUMN: $column\n";
print SUBFILE "SEQ_PRIMER: $seq_primer\n";
if ($p_end) {
print SUBFILE "P_END: $p_end\n";
}
print SUBFILE "HIQUAL_START: $hi_qual_s\n";
print SUBFILE "HIQUAL_STOP: $hi_qual_end\n";
print SUBFILE "DNA_TYPE: cDNA\n";
print SUBFILE "PUBLIC: $public_date\n";
if($put_id && $put_id !~ /No\shits\sfound/) {
print SUBFILE "PUT_ID: Similar to $put_id\n";
} else {
print SUBFILE "PUT_ID:\n";
}
if ($poly_A) {
print SUBFILE "POLYA: Y\n";
}
print SUBFILE "COMMENT:\n";
if ($comment) {
print SUBFILE "$comment\n";
}
if ($chimeric) { #1 if x's (from vector)(possible chimera) found in sequence
print SUBFILE "A section within this EST was identified as potentially derived from\nvector sequence. It has been replaced by 'N' residues.\n";
}
if ($chimeric_e) { #1 if x's (from E.coli)(possible chimera) found in sequence
print SUBFILE "A section within this EST was identified as potentially derived from\nE.coli sequence. It has been replaced by 'N' residues.\n";
}
print SUBFILE "SEQUENCE:\n";
print FASTAFILE ">$file ";
if($put_id && $put_id !~ /No\shits\sfound/) {
print FASTAFILE "Similar to $put_id\n";
}
else {
print FASTAFILE "no similar sequence found\n";
}
my $n=0;
while ($seq=~/(.)/g) {
print SUBFILE "$1";
print FASTAFILE "$1";
$n++;
if(($n % 60) == 0) { print FASTAFILE "\n"; print SUBFILE "\n"; }
}
if(($n % 60) != 0) { print FASTAFILE "\n"; print SUBFILE "\n"; }
print SUBFILE "||\n";
close SUBFILE;
close FASTAFILE;
}
}
print colored ("Done\n", "bold");
#Section 7 - submission
print colored ("\nSubmission and saving of files\n\n", "bold");
opendir (SUBFILES, "./subfiles") or die "Could not open ./subfiles\n";
my @sub_files = grep /.sub/ , readdir SUBFILES; #don't get . and ..
foreach my $sub_file (@sub_files) {
open (DBEST, "./subfiles/$sub_file") or die "could not open ./subfile/$sub_file\n";
while (my $line = <DBEST>) {
$dbESTsubmission .= $line;
}
}
#add Pub, Lib and Cont files
if (!$no_lib) { #only put these files in sub file if they are to be submitted.
$dbESTsubmission = (join "", @Libfile_data).$dbESTsubmission;
}
if (!$no_pub) {
$dbESTsubmission = (join "", @Pubfile_data).$dbESTsubmission;
}
if (!$no_cont) {
$dbESTsubmission = (join "", @Contfile_data).$dbESTsubmission;
}
#write the sub file to a file
open (FINALSUB, ">./dbEST_submission.txt");
print FINALSUB $dbESTsubmission;
close FINALSUB;
print "The EST submission records have been merged into one file.\n";
my $input_view = &get_input("Would you like to view this completed submission file? [y/n]");
chomp $input_view;
if ($input_view =~ /y/i) {
if ($less_flag == 1) {
system("less dbEST_submission.txt");
} else {
system("more dbEST_submission.txt");
}
}
print "\nThe EST submission file may now be sent by e-mail to NCBI dbEST.\n";
print "Enter yes to send file to dbEST now, or any other key to continue without\n";
my $input_submit = &get_input("submitting (your file will be saved): ");
chomp $input_submit;
my $sent_flag = 0;
if ($input_submit =~ /^yes$/i) {
my @info = &sendEST($dbESTsubmission); #send sub file and get reply to for email to egdc
my $mail_flag = shift @info;
if ($mail_flag) {
print "Mail sent - A copy of the mail was sent to the reply email entered above.\n";
$sent_flag = 1;
}
my $reply_to = shift @info;
my $smtp_server = shift @info;
}
else {
print "Ok, file not submitted to dbEST.\n"
}
#merge logfiles and tidy up.
if (-e "phredlogfile") {
system ("cat phredlogfile >> logfile");
unlink "phredlogfile";
}
if (-e "cmlogfile") {
system ("cat cmlogfile >> logfile");
unlink "cmlogfile";
}
if (-e "temp.seq") {
unlink "temp.seq";
}
&save_files($sent_flag, $save_location);
print "\n\n\nEnd of submission option - Back to main menu.\n\n";
}
#######n##################
######################################################/#\#####}#####################################################
#################### - - SUB ROUTINES - - ###########/###\####}####################+++++++#########################
####################################################/#####\###}~~~################################################
##########################
#####################################################################################
sub file_choices_page() {
#Asks where to get Publication/Lib/Cont file info from (and if it is required)
my ($file_type) =shift @_;
my ($db_file) = shift @_;
my ($skipintro) = shift @_;
my $answer;
unless ($skipintro) {
print colored ("\n$file_type file\n", "bold");
print "\n\tEach batch of EST submissions to dbEST must have an associated\n";
print "\t$file_type file.\n";
print "\tPlease choose one of the following options:\n";
}
print "\n\t1 - $file_type file already submitted\n";
if (!-r $db_file) { #this loop allows us to reset the location of the files, e.g. reset the $lib_file variable
while (!-r $db_file) { #file is (!not)readable, e.g. .db files haven't stayed in installation dir of basedir/file.db
$db_file = &get_input("Warning: $db_file\nwas not found.\nPlease enter the location of the $file_type file: ");
chomp $db_file;
}
#correct .db file location if it has changed from default defined at top of script
#and make $db_file point to the actual file
if ($file_type eq "Library") {
$Lib_file = $db_file."/Libfile.db";
$db_file = $Lib_file;
}
if ($file_type eq "Publication") {
$Pub_file = $db_file."/Pubfile.db";
$db_file = $Pub_file;
}
if ($file_type eq "Contact") {
$Cont_file = $db_file."/Contfile.db";
$db_file = $Cont_file;
}
}
open (DBFILE, "$db_file") or die "Error: $db_file could not be opened!\n";
my $num = 1;
my $flag = 1;
my $marker = 0;
while (my $line =<DBFILE>) {
if ($marker) { #in the 1st line of TITLE free text in pub file
$num++;
if ($num == 2) { #we only want to print this line once
print "\n\tOr use a saved file...\n\n";
}
print "\t$num - ";
chomp $line; #keeps pub file entries in same format as others
print $line;
print "\n";
$marker = 0;
} elsif ($line =~ /^TITLE:/) { #pub file
$marker = 1; #next line is the title free text
} elsif ($line =~ /^NAME:\s(.*)/) { #lib and cont files
$num++;
if ($num == 2) { #we only want to print this line once
print "\n\tOr use a saved file...\n\n";
}
print "\t$num - ";
print $1;
print "\n";
}
}
close DBFILE;
my $flag1=0;
while($flag1==0) {
$answer= &get_input("\n\tDon't fancy one of those? Enter h for help or q to quit.\n");
chomp $answer;
if($answer =~/h/i || $answer =~/q/i ) { $flag1=1; next; }
elsif ($answer =~ /^\d+$/) { #if it's only digits
if ($answer > $num || $answer == 0) { #if answer is a number outside the right range...
print "Please enter a number between 1 and $num, or h for help and q to quit:\n";
}
elsif ($answer eq 1) { #else it must be in range so send back if 1
$flag1=1;
next;
}
else { #one of the records from ESTfile.db has been selected
$flag1 = 1;
next;
}
}
else { #if answer is not digits...
print "Please enter a number between 1 and $num, or h for help and q to quit:\n";
}
}
return $answer;
}
############################################################################################
sub file_update() {
#Asks where to get Publication/Lib/Cont file info from (and if it is required)
my ($file_type) =shift @_;
my ($db_file) = shift @_;
my ($skipintro) = shift @_;
my $answer; my $i; my $n; my $m;
unless ($skipintro) {
print colored ("\n$file_type file\n", "bold");
print "\n\tEach batch of EST submissions to dbEST must have an associated\n";
print "\t$file_type file.\n";
}
unless (-e "$db_file") {system ("touch $db_file")}
open (DBFILE, "$db_file") or die "Error: $db_file could not be opened!\n";
my @list; my $marker=0;
while (my $line =<DBFILE>) {
if ($marker==1) {chomp $line; push (@list,$line);}
if ($line =~ /^TITLE:/) { #pub file
$marker = 1; #next line is the title free text
}
else {$marker=0}
if ($line =~ /^NAME:\s(.*)/) { push (@list,$1);} #lib and cont files
if ($line =~ /^RECORD_ID:\s(.*)/) { push (@list,$1);} #EST files
}
close DBFILE;
my $number = scalar @list;
if ($number == 0) {
print "There are no entries for $file_type available yet.\n";
if ($file_type =~ "Library") {newLib_file_data()}
if ($file_type =~ "Publication") {newPub_file_data()}
if ($file_type =~ "Contact") {newCont_file_data()}
if ($file_type =~ "EST") {new_EST_file()}
print "\nBack to main menu\n\n";
&options;
}
else {
print "The following entries for $file_type are available for viewing - :\n\n";
for ($i=0;$i<$number;$i++) {
$n = $i+1;
print "$n - $list[$i]\n";
}
my $m = $n + 1;
print "$m - or alternatively choose this option to add a new entry.\n";
print "\n\nSelect the respective number to view an entry or to create a new entry:\n";
my $flag1=0;
while($flag1==0) {
$answer= &get_input("\n\n");
chomp $answer;
if ($answer =~ /^\d+$/) { #if it's only digits
if ($answer > $m || $answer == 0) { #if answer is a number outside the right range...
print "Please enter a number between 1 and $m .\n";
}
else { #one of the records from ESTfile.db has been selected
$flag1 = 1;
next;
}
}
else { #if answer is not digits...
print "Please enter a number between 1 and $m, or h for help and q to quit:\n";
}
}
if ($answer==$m) {
if ($file_type =~ "Library") {newLib_file_data()}
if ($file_type =~ "Publication") {newPub_file_data()}
if ($file_type =~ "Contact") {newCont_file_data()}
if ($file_type =~ "EST") {new_EST_file()}
}
else {
open (DBFILE, "$db_file") or die "Error: $db_file could not be opened!\n";
my $dummy=1; print "\n\n";
while (my $line =<DBFILE>) {
if ($dummy == $answer) {print $line}
if ($line =~ /^\|\|/) { $dummy++}
}
close DBFILE;
}
}
print "\nBack to main menu\n\n\n"; &options;
}
############################################################################################
sub get_file_data() {
#gets the selected record from ESTfile.db, Libfile.db, Pubfile.db or Contfile.db
my ($record_num) = shift @_;
my ($db_file) = shift @_;
my ($type_id) = shift @_;
my $count =0;
my $flag;
my @rec;
if ($type_id ne "RECORD_ID: ") {
$record_num = $record_num -1; #set record num for all file types except ESTfile (different because no "file not needed" option)
} else {
$record_num--; #make record equal to the desired record number for all other types of file
}
open (DBFILE, "$db_file");
while (my $line =<DBFILE>) {
if ($line =~ /^$type_id/) {
$count++;
}
if ($count == $record_num) {
$flag = 1;
}
if ($flag) {
push (@rec, $line);
}
if ($line =~ /^\|\|/) {
$flag = 0;
}
}
print "\nPlease take a few seconds to check this information\n\n";
print @rec;
#print "\nIs this the correct data? [y/n]\n";
my $input = &get_input("\nIs this the correct data? [y/n]\n");
chomp $input;
if ($input =~ /y/i) {
return @rec;
} else {
print "Ok, please choose again.\n";
undef @rec; #make the array null to indicate that no file has been selected
return @rec;
}
}
################################################################################################
sub newPub_file_data() {
#gets info from user to make Pub file.
#General plan - from the user input side, it is better to get all the obligatory info, then ask for the optional
#info. Use the status to ask only relevent questions. storing the answers to the obligatory info as varibles
# (instead of just putting them straight into the array) means that they can be added later, with the
# non-obligatory information. This allows all the data to be put in the array in the right order.
my @nPub_file_data;
my $ok_flag = 1;
while ($ok_flag) {
$nPub_file_data[0] = "TYPE: Pub\n";
my $ob_info1 = "STATUS: ";
my $ob_info2 = "TITLE: \n"; #set up variables for the obligatory info (\n in TITLE so the free text starts on the next line)
my $ob_info3 = "AUTHORS: \n";
my $ob_info4 = "YEAR: ";
my ($status, $nob_info1, $nob_info2, $nob_info3, $nob_info4, $nob_info5 );
print "\nInformation for new Publication file\n";
print "Please answer the following questions about the publication for the ESTs you wish\n";
print "to submit.\n";
my $flag = 1;
while ($flag == 1) {
print "\nWhat is the publication status? 1=unpublished, 2=submitted, 3=in press,\n4=published.\n";
$status = &get_input("Enter the appropriate number: ");
chomp $status;
if ($status !~ /^[1-4]$/) {
print "Please enter a number between 1 and 4!";
} else {
$flag =0;
}
}
$ob_info1 .= $status; #save status for later (only ask relevent questions for non-obligatory part of pub file)
$ob_info1 .= "\n";
$flag = 1;
while ($flag) {
print "\nWhat is the title of the article? ";
print "(The title you enter here will also be used in the CITATION field of the EST file.)\n";
#print "Title: ";
$pub_cit = &get_input("Title: "); #get this for est file
$pub_cit.= "\n"; #replace newline removed in get_input
if ($pub_cit =~ /\w+/) { #check pub cit has someting in it.
$flag =0;
} else {
print "You must enter something!\n";
}
}
$ob_info2 .= $pub_cit;
$flag = 1;
while ($flag) {
my $input .= &get_input("Who are the authors?\n");
$input .= "\n";
if ($input =~ /\w+/) {
$flag = 0;
$ob_info3 .= $input;
} else {
print "You must enter something!\n";
}
}
$flag = 1;
while ($flag) {
my $input .= &get_input("Year of publication?\n");
$input .= "\n";
if ($input =~ /^[0-9]{4}$/) { #only 4 digits
$flag = 0;
$ob_info4 .= $input;
} else {
print "You must enter a year in the valid format, e.g. 2004!\n";
}
}
if ($status == 2 ||$status== 3 ||$status== 4) { #only ask questions that are relevent given the pub status
$nob_info1 = "JOURNAL: " . &get_input("What is the publication journal?\n") . "\n";
}
if ($status== 4) {
$nob_info2 = "VOLUME: " . &get_input("What is the volume number? ") . "\n";
$nob_info3 = "ISSUE: " . &get_input("What is the issue number? ") . "\n";
$nob_info4 = "PAGES: " . &get_input("What's the page numbers? ") . "\n";
$nob_info5 = "MEDUID: " .&get_input("Medline unique identifer (if known): ") . "\n";
}
#assemble pub file and display. if ok then proceed, else redo!
$nPub_file_data [1] = $ob_info2; #add title
$nPub_file_data [2] = $ob_info3; #add authors
if ($status == 1) {
$nPub_file_data [3] = $ob_info4; #add year
$nPub_file_data [4] = $ob_info1; #add status
}
elsif ($status == 2 || $status ==3) {
$nPub_file_data [3] = $nob_info1; #add journal
$nPub_file_data [4] = $ob_info4; #add year
$nPub_file_data [5] = $ob_info1; #add status
}
elsif ($status == 4) {
$nPub_file_data [3] = $nob_info1; #add journal
$nPub_file_data [4] = $nob_info2; #add volume
$nPub_file_data [5] = $nob_info3; #add issue
$nPub_file_data [6] = $nob_info4; #add pages
$nPub_file_data [7] = $ob_info4; #add year
$nPub_file_data [8] = $ob_info1; #add status
}
#separate section to add meduid, if present
#if ($nob_info5 =~ /MEDUID:\s\w+/) { #true if an identifier entered
if ($nob_info5) {
unshift @nPub_file_data, $nob_info5; #add nob_info5 to start of the array
}
print "\n\tPlease check your new Pub file:\n @nPub_file_data\n";
#print "\nAre you happy to continue? (Y/N): ";
my $input = &get_input("\nAre you happy to continue? (Y/N): ");
chomp $input;
if ($input =~ /n/i) {
print "Ok, please re-enter the details...\n";
}else {
push @nPub_file_data, "||\n"; #add double pipe file separator to end of array
&save_new_file(@nPub_file_data);
$ok_flag = 0;
}
}
return @nPub_file_data;
}
################################################################################################
sub dummyPub_file_data() {
#Pub file not needed so get 'title' info for est file
my @dPub_file_data;
#print "What is the title of the article in the Pub file you have submitted?\n";
my $input .= &get_input("What is the title of the article in the Pub file you have submitted?\n");
#chomp $input;
$pub_cit = $input; #get this for est file
$dPub_file_data [0] = "TYPE: Pub\n";
$dPub_file_data [1] = "TITLE:\n" .$input;
push @dPub_file_data, "||\n"; #add double pipe file separator to end of array
return @dPub_file_data;
}
################################################################################################
sub newLib_file_data() {
#now updated to include all library file fields
my @nLib_file_data;
my $ok_flag = 1;
while ($ok_flag) {
@nLib_file_data = "";
$nLib_file_data[0] = "TYPE: Lib\n";
print "\nInformation for new Library file\n";
print "Please answer the following questions about the Library used to generate the ESTs\n";
print "you wish to submit.\n";
my $input; #gets used for all inputs. sometimes use strict seems really stupid. or is it just me?
my $flag =1;
while ($flag) {
#print "What is the name of the library?\n";
$input = &get_input("What is the name of the library?\n");
chomp $input;
if ($input) { #check that something has been entered
$flag = 0;
} else {
print "You must enter the library name\n";
}
}
$nLib_file_data[1] = "NAME: " . $input . "\n";
$flag =1; #reset flag for next question
while ($flag) {
#print "What is the scientific name of the organism from which the library was prepared?\n";
$input = &get_input("What is the scientific name of the organism from which the library was prepared?\n");
chomp $input;
if ($input =~ /[A-Z]+\s[A-Z]+/i) { #rough match of a binomial name
$flag=0;
} else {
print "You must enter the bionomial scientific name of the organism\n";
}
}
$nLib_file_data[2] = "ORGANISM: " . $input ."\n";
print "The following information is not compulsory but please enter as much as you can.\nIf you can't enter the requested information, press enter.\n";
#print "Organism strain: ";
my $number = 3; #for place in array. could use push or something but this works as well.
$input = &get_input("Organism strain: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "STRAIN: " .$input ."\n" ;
$number++;
}
#print "Cultivar: ";
$input = &get_input("Cultivar: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "CULTIVAR: " .$input ."\n" ;
$number++;
}
#print "Sex: ";
$input = &get_input("Sex: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "SEX: " .$input ."\n" ;
$number++;
}
#print "Organ: ";
$input = &get_input("Organ: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "ORGAN: " .$input ."\n" ;
$number++;
}
#print "Tissue: ";
$input = &get_input("Tissue: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "TISSUE: " .$input ."\n" ;
$number++;
}
#print "Cell type: ";
$input = &get_input("Cell Type: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "CELL_TYPE: " .$input ."\n" ;
$number++;
}
#print "Cell line: ";
$input = &get_input("Cell Line: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "CELL_LINE: " .$input ."\n" ;
$number++;
}
#print "Stage: ";
$input = &get_input("Stage: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "STAGE: " .$input ."\n" ;
$number++;
}
#print "Host: ";
$input = &get_input("Host: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "HOST: " .$input ."\n" ;
$number++;
}
#print "Vector: ";
$input = &get_input("Vector: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "VECTOR: " .$input ."\n" ;
$number++;
}
#print "Vector type: ";
$input = &get_input("Vector type: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "V_TYPE: " .$input ."\n" ;
$number++;
}
#print "Restriction enzyme 1: ";
$input = &get_input("Restriction enzyme 1: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "RE_1: " .$input ."\n" ;
$number++;
}
#print "REstriction enzyme 2: ";
$input = &get_input("Restriction enzyme 2: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "RE_2: " .$input ."\n" ;
$number++;
}
#print "Library description: ";
$input = &get_input("Library description: ");
chomp $input;
if ($input) {
$nLib_file_data[$number]= "DESCR: \n" .$input ."\n" ;
$number++;
}
print "\n\tPlease check your new Lib file:\n @nLib_file_data\n";
#print "\nAre you happy to continue? (Y/N): ";
$input = &get_input("\nAre you happy to continue? (Y/N): ");
chomp $input;
if ($input =~ /n/i) {
print "Ok, please re-enter the details...\n";
}else {
push @nLib_file_data, "||\n"; #add double pipe file separator to end of array
&save_new_file(@nLib_file_data);
$ok_flag = 0;
}
}
return @nLib_file_data;
}
################################################################################################
sub dummyLib_file_data() {
#Lib file not needed so get 'name' info for est file
my @dLib_file_data;
my $flag=1;
my $input;
while ($flag) {
#print "What is the name of the library?\n";
$input .= &get_input("What is the name of the library?\n");
chomp $input;
if ($input) {
$flag=0;
} else {
print "You must enter the library name\n";
}
}
#print "From what organism are the ESTs derived (e.g. Lumbricus rubellus)?\n";
$species = &get_input("From what organism are the ESTs derived (e.g. Lumbricus rubellus)?\n");
chomp $species;
$flag = 1;
while ($flag) {
if ($species !~ /[A-Z]*\s[A-Z]/i) {
#print "Please enter the binomial name for the species\n";
$species = &get_input("Please enter the binomial name for the species\n");
chomp $species;
} else {
$flag =0;
}
}
$dLib_file_data [0] = "TYPE: Lib\n";
$dLib_file_data [1] = "NAME: " .$input;
push @dLib_file_data, "||\n"; #add double pipe file separator to end of array
return @dLib_file_data;
}
################################################################################################
sub newCont_file_data() {
my @nCont_file_data;
my $ok_flag = 1;
while ($ok_flag) {
@nCont_file_data = "";
$nCont_file_data[0] = "TYPE: Cont\n";
print "\nInformation for new Contact file\n";
print "Please answer the following questions about the contact person for the ESTs\n";
print "you wish to submit.\n";
my $input; #gets used for all inputs.
my $flag =1;
while ($flag) {
#print "What is the name of the person submitting the ESTs?\n";
$input = &get_input("What is the name of the person submitting the ESTs?\n");
chomp $input;
if ($input) { #check that something has been entered
$flag = 0;
} else {
print "You must enter a contact name\n";
}
}
$nCont_file_data[1] = "NAME: " . $input . "\n";
print "The following information is not compulsory but please enter as much as you can.\nIf you can't enter the requested information, press enter.\n";
#print "FAX number: ";
my $number = 2;
$input = &get_input("FAX number: ");
chomp $input;
if ($input) {
$nCont_file_data[$number]= "FAX: " .$input ."\n" ;
$number++;
}
#print "Telephone number: ";
$input = &get_input("Telephone number: ");
chomp $input;
if ($input) {
$nCont_file_data[$number]= "TEL: " .$input ."\n" ;
$number++;
}
#print "Email: ";
$input = &get_input("Email: ");
chomp $input;
if ($input) {
$nCont_file_data[$number]= "EMAIL: " .$input ."\n" ;
$number++;
}
#print "Laboratory providing EST: ";
$input = &get_input("Laboratory providing EST: ");
chomp $input;
if ($input) {
$nCont_file_data[$number]= "LAB: " .$input ."\n" ;
$number++;
}
#print "Institution name: ";
$input = &get_input("Institution name: ");
chomp $input;
if ($input) {
$nCont_file_data[$number]= "INST: " .$input ."\n" ;
$number++;
}
#print "Address (comma delineate): ";
$input = &get_input("Address (comma delineate): ");
chomp $input;
if ($input) {
$nCont_file_data[$number]= "ADDR: " .$input ."\n" ;
$number++;
}
print "\n\tPlease check your new Cont file:\n @nCont_file_data\n";
#print "\nAre you happy to continue? (Y/N): ";
$input = &get_input("\nAre you happy to continue? (Y/N): ");
chomp $input;
if ($input =~ /n/i) {
print "Ok, please re-enter the details...\n";
}else {
push @nCont_file_data, "||\n"; #add double pipe file separator to end of array
&save_new_file(@nCont_file_data);
$ok_flag = 0;
}
}
return @nCont_file_data;
}
################################################################################################
sub dummyCont_file_data() {#Cont file not needed so get 'cont name' info for est file
my @dCont_file_data;
my $flag=1;
my $input;
while ($flag) {
#print "What is the name of the person submitting the ESTs?\n";
$input .= &get_input("What is the name of the person submitting the ESTs?\n");
chomp $input;
if ($input) {
$flag =0;
} else {
print "You must enter a name\n";
}
}
$dCont_file_data [0] = "TYPE: Cont\n";
$dCont_file_data [1] = "NAME: " .$input;
push @dCont_file_data, "||\n"; #add double pipe file separator to end of array
return @dCont_file_data;
}
################################################################################################
sub save_new_file() {
#allows the new EST, Pub, Lib, or Cont file to be saved to the relevant .db file
my @new_file = @_;
my $file; #gets set to scalar rep one of the .db files
#print "Would you like to save this file for future use?(y/n):";
my $input = &get_input("Would you like to save this file for future use?(y/n):");
chomp $input;
if ($input =~ /y/i) {
foreach my $line (@new_file) {
if ($line =~ /TYPE:\s.*Pub/) {
$file = $Pub_file;
} elsif ($line =~ /TYPE:\s.*Lib/) {
$file = $Lib_file;
} elsif ($line =~ /TYPE:\s.*Cont/) {
$file = $Cont_file;
} elsif ($line =~ /TYPE:\s.*EST/) {
$file = $EST_file;
#print "Please enter a name for this file: "; #get name for this record so it can be identifed when recalling saved files
my $id = &get_input("Please enter a name for this file: ");
chomp $id;
unshift @new_file, "RECORD_ID: $id\n";
last;
}
}
open (DBFILE, ">>$file") or die "Could not open $file\n"; #use >> for appending to file
print DBFILE @new_file;
print "File saved to $file\n";
close DBFILE;
} else {
print "Ok, file will not be saved.\n";
}
}
################################################################################################
sub EST_file_data () {
#allows user to select saved file or enter new EST file info (via sub new_EST_file), both return an array with the file data.
my $sub_flag = 1;
print colored ("\nEST File\n", "bold");
print "\n\tEach sequence submitted to dbEST must have an associated EST file.\n\tPlease choose one of the following options for the creation of this file:\n\n";
while ($sub_flag) {
$sub_flag = 0; #hope everything goes ok
print "\t1 - Enter information for a new EST file now\n";
if (!-r $EST_file) { #this loop allows us to reset the location of the file
while (!-r $EST_file) { #file is (!not)readable, e.g. .db files haven't stayed in installation dir of basedir/file.db
#print "Warning: $EST_file was not found.\nPlease enter the location of the EST file: ";
$EST_file = &get_input("Warning: $EST_file was not found.\nPlease enter the location of the EST file: ");
chomp $EST_file;
}
}
open (ESTFILE, "$EST_file") or die "Error: $EST_file could not be opened!\n";
my $num = 1;
while (my $line =<ESTFILE>) {
if ($line =~ /^RECORD_ID:\s(.*)/) {
$num++;
if ($num == 2) { #we only want to print this line once
print "\n\tOr use a saved file...\n\n";
}
print "\t$num - ";
print $1;
print "\n";
}
}
close ESTFILE;
# print "\n\tDon't fancy one of those? Enter h for help or q to quit.\n";
my $flag1=0;
while($flag1==0) {
my $answer= &get_input("\n\tDon't fancy one of those? Enter h for help or q to quit.\n");
chomp $answer;
if($answer=~ /h/i) { &print_help(); next; } #use less to show trace2dbest.doc
if($answer=~ /q/i) { print "Ok, trace2dbest exiting...\nBye \n"; exit(); } #exit program
if($answer =~/h/i || $answer =~/q/i ) { $flag1=1; next; }
elsif ($answer =~ /^\d+$/) { #if it's only digits
if ($answer > $num || $answer == 0) { #if answer is a number outside the right range...
print "Please enter a number between 1 and $num, or h for help and q to quit:\n";
}
elsif ($answer eq 1) {
@ESTfile_data = &new_EST_file();
$flag1=1;
}
else { #one of the records from ESTfile.db has been selected
my $type_id = "RECORD_ID: "; #set type id for get_file_data
@ESTfile_data = &get_file_data($answer, $EST_file, $type_id);
$flag1 = 1;
}
}
else { #if answer is not digits...
print "Please enter a number between 1 and $num, or h for help and q to quit:\n";
}
}
if (@ESTfile_data) {
return @ESTfile_data;
} else {
$sub_flag = 1; #everything not ok, return to near start
}
}
}
################################################################################################
sub new_EST_file() {
#gets info from user to make EST.db file
my @nEST_file_data;
my $input;
my $ok_flag = 1;
while ($ok_flag) {
my $flag = 1;
while ($flag) {
$flag=0;
#print "\nWhat was the sequencing primer used? [enter the primer name, optionally followed by the\nsequence in brackets()].\nE.g. SAC(GGGAACAAAAGCTGGAG): ";
$input = &get_input("\nWhat was the sequencing primer used? [enter the primer name, optionally followed by the\nsequence in brackets()].\nE.g. SAC(GGGAACAAAAGCTGGAG): ");
chomp $input;
unless ($input) { #simple check for some kind of input
print "You must enter at least the name of the sequencing primer.\n";
$flag=1;
}
$nEST_file_data[0] = "SEQ_PRIMER: $input\n";
}
#print "What was the forward PCR primer? ";
$input = get_input("What was the forward PCR primer? ");
chomp $input;
$nEST_file_data[1] = "PCR_F: $input\n";
#print "What was the reverse PCR primer? ";
$input = get_input("What was the reverse PCR primer? ");
chomp $input;
$nEST_file_data[2] = "PCR_B: $input\n";
$flag =1;
while ($flag) {
#print "Which end was sequenced (5'/3')? If neither, just hit return: ";
$input = get_input("Which end was sequenced (5'/3')? If neither, just hit return: ");
chomp $input;
if ($input =~ /(5'|3')/ || !$input) {
$flag = 0;
} else {
print "Please enter 5' or 3'\n";
}
}
$nEST_file_data[3] = "P_END: $input\n";
#get release date
print "\nPlease enter the date you would like your data to be made public. This date\n";
print "will be inserted into the PUBLIC field of the submission file.\n";
#print "Enter the date in the format MM/DD/YYYY. For immediate release, just press enter.\n";
my $input = get_input("Enter the date in the format MM/DD/YYYY. For immediate release, just press enter.\n");
$input .= "\n";
if ($input) {
while ($input !~ /(\d\d)\/(\d\d)\/(\d\d\d\d)\n/) {
if ($input=~/^\s$/) {
$input = "";
last;
}
else {
print "Please re-enter the date in the format MM/DD/YYYY\n";
$input = get_input("Please re-enter the date in the format MM/DD/YYYY\n");
$input .= "\n";
}
}
}
chomp $input;
$nEST_file_data[4] = "PUBLIC: $input\n";
#get comment
#print "Please enter a comment about the ESTs for inclusion in the \"COMMENT\" field of\nthe EST file.\n";
$input = get_input("Please enter a comment about the ESTs for inclusion in the \"COMMENT\" field of\nthe EST file.\n");
chomp $input;
$nEST_file_data[5] = "COMMENT: $input\n";
#now check it's ok and save then return.
print "\n\tPlease check the data you have entered:\n @nEST_file_data\n";
print "\nAre you happy to continue? (Y/N): ";
$input = get_input("\nAre you happy to continue? (Y/N): ");
chomp $input;
if ($input =~ /n/i) {
print "Ok, please re-enter the details...\n";
}else {
unshift @nEST_file_data, "TYPE: EST\n"; #add file type to start of array
push @nEST_file_data, "||\n"; #add double pipe file separator to end of array
&save_new_file(@nEST_file_data);
$ok_flag = 0;
}
}
return @nEST_file_data;
}
################################################################################################
sub Get_vectors () {
#uses vector file path to read in (fasta format) file and print out vector names (fasta headers)
my ($file) = @_;
my $line;
my @vectors;
my $count;
my $input;
my $return; #the return value, 1 for vector not there (or no FASTA headers found), 0 if it is.
open (VECTORFILE, "<$file") || die "$file could not be opened";
while ($line = <VECTORFILE>) {
if ($line =~ /^>/) {
push @vectors, $line;
$count++;
}
}
if ($count) {
print "\nThe following $count vectors were found in $file\n\n";
print @vectors;
#print "\nAre you satisfied that the relevant vector is here? (y/n): ";
$input = &get_input("\nAre you satisfied that the relevant vector is here? (y/n): ");
chomp $input;
if ($input =~ /n/i) {
print "Ok, please re-select a vector file...\n\n";
$return =1;
}
}else {
print "ERROR: No FASTA headers were found in $file.\nPlease re-select a FASTA format file...\n\n";
$return =1;
}
return $return;
}
#################################################################################
sub split_screened {
# converts a fasta file to individual
# txt files
my $file=shift(@_);
my $dir=shift(@_);
my $flag=0;
my $in_seq= "";
my @heading;
my $header;
my $filename;
open(FSAFILE, "$file") || die "Can't open $file\n";; #catenated cross_match output file
while(my $line=<FSAFILE>) {
if($flag==1 && $line!~/^>/) {
chop $line;
$in_seq.=$line;
}
if($flag==1 && $line=~/^>/) { # print to file
$flag=0;
open(OUTFILE, ">$dir/$filename.seqsc") || die "Can't open $dir/$filename.seqsc\n";
print OUTFILE "$header\n";
my $n=0;
while ($in_seq=~/(.)/g) {
print OUTFILE "$1";
$n++;
if(($n % 60) == 0) { print OUTFILE "\n"; }
}
if(($n % 60) != 0) { print OUTFILE "\n"; }
$in_seq="";
close OUTFILE;
}
if($flag==0 && substr($line,0,1) eq ">" ) { #an unusual regex.
chop $line;
$flag=1;
$header=$line;
@heading = split(' ',$line);
$heading[0] =~ s/>//;
$filename = $heading[0];
next;
}
}
#print last record to file
open(OUTFILE, ">>$dir/$filename.seqsc") || die "Can't open $filename.txt\n";
print OUTFILE "$header\n";
my $n=0;
while ($in_seq=~/(.)/g) {
print OUTFILE "$1";
$n++;
if(($n % 60) == 0) { print OUTFILE "\n"; }
}
if(($n % 60) != 0) { print OUTFILE "\n"; }
close OUTFILE;
}
################################################################################################
sub put_id {
#runs BLAST (either local or remote) and gets id line of top hit
my $blast_type = shift (@_);
my $blast_save = shift (@_);
my $sequence = shift (@_);
my $file = shift (@_);
my $blast_db = shift (@_);
my $bits_cutoff = shift (@_); #sounds painful.
my $prot_id = "";
my $flag = 0;
my $evalue = "0";
my $scores;
my $bits; #ooh er
open(TEMPFILE,">temp.seq");
print TEMPFILE ">Query\n$sequence\n";
close TEMPFILE;
while ($flag < 5) { ## Attempt blast 5 times
if ($blast_type == 1) {
print "Remote BLAST of $file ...\n";
system ("blastcl3 -p blastx -d nr -i temp.seq -o blast.out -T F");
my $aflag = 0;
while ($aflag == 0) {
wait();
open(FH,"blast.out");
if( -s FH > 550) { $aflag=1; close(FH); next; } #-s =non-zero file test operator
close(FH);
}
} else {
print "Local BLAST of $file ...\n";
system("blastall -p blastx -d $blast_db -i temp.seq -o blast.out -T F");
}
open(FH,"blast.out");
if(-s FH > 550) { $flag=10; close(FH); next; }
close(FH);
$flag++;
}
if($flag==10){
open(BLASTFILE,"<blast.out");
if ($blast_save ==1) {
open (BLASTSAVE, ">>blast_reports/blasts_full");
print BLASTSAVE "\n==========================================================================\n";
print BLASTSAVE "\n$file BLAST report\n";
}
$flag=0;
while (my $line=<BLASTFILE>) {
if ($blast_save ==1) {
print BLASTSAVE "$line";
}
if($line=~/^>(.*)/ && $flag==0) { $prot_id=$1; $flag=1; next; }
if($line =~ /Expect\s=\s(\w.*)/ && $flag==1) {
chomp $line;
$scores = $line;
$scores =~ /Score\s+=\s+(\d+)/;
$bits = $1;
if ($bits >= $bits_cutoff) {
$evalue=1; #use as flag to say that hit was found
}
last;
}
}
if ($evalue ==0) { #catch no hit situation
$prot_id = "";
} else {
$prot_id =~ s/>.*//; #remove > from hit description
$prot_id =~ s/\[(.*)\]/ \- $1/; #remove square brackets from descr
$prot_id .= ". $scores"; #add score and e-value
open (BLASTHIT, ">>blast_reports/blasts_tophit");
print BLASTHIT ("$file $prot_id\n");
}
} else {
print LOG "$file blast failed\n";
}
unlink("blast.out"); #removes blast.out
close BLASTFILE; close BLASTHIT;
return $prot_id;
}
###############################################################################
sub get_id () {
}
#################################################################################
sub sendEST (){
#send the submission file by email to dbEST
use Mail::Mailer;
my ($dbESTsend) = @_;
my $flag = 1; #flag set to 0 if mail returns error
my $recipient = "batch-sub\@ncbi.nlm.nih.gov";
#my $recipient = "al.anthony\@ed.ac.uk";
print "The EST submission file will now be sent by e-mail to\n";
print "$recipient\n";
#print "Please enter your e-mail address for replies:\n";
my $replyto = &get_input("Please enter your e-mail address for replies:\n");
chomp $replyto;
my $subject = "EST submission";
my $body = "\n$dbESTsend";
my $SMTP_SERVER = &get_input("Please enter your SMTP server for outgoing mail (your\nlocal computer support will be able to tell you what this is):\n");
chomp $SMTP_SERVER;
print "Sending to $recipient...\n";
eval {
my $mailer = Mail::Mailer->new("smtp", Server => $SMTP_SERVER);
$mailer -> open ({ From => "",
To => $recipient,
Cc => $replyto,
'Reply-to' => $replyto,
Subject => $subject });
print $mailer $body;
$mailer ->close;
};
if ($@) { #this variable will contain any error messages returned
print colored ("\nWARNING! - There was a problem sending this email.\n", "green bold");
print "see logfile for details. trace2dbest continuing...\n\n";
open (LOG, ">>logfile");
print LOG "MAIL WARNING!\n$@\nsmtp server was $SMTP_SERVER\nrecipeint was $recipient\nreply to was $replyto\n";
close LOG;
$flag = 0;
}
#put smtp server and reply to in array and return it for use in EGTDC mail
my @info = ($flag, $replyto, $SMTP_SERVER);
return @info;
}
################################################################################
sub save_files (){
#saves all the output files in the users home dir or the standard common location.
my ($sent) = shift @_;
my ($place) = shift @_;
my $location;
my @traces;
my $date= `date "+%m_%d_%y_%H-%M-%S"`; #get date and time to use as a unique identifier for file
chomp $date;
my $x = $date;
if ($sent) { #add tag to indicate if file was submitted to dbEST or not
$x .= "submitted";
} else {
$x .= "unsubmitted";
}
if ($place) {
$location = "/home/db/est_solutions/$species/trace2dbest/$x";
} else {
$location = "$homedir/est_solutions/$species/trace2dbest/$x";
}
mkdir "$location", 0777 or die "Can't make directory $location/raw_traces\n$!\n";
my @files = ("dbEST_submission.txt", "blast_reports/","fasta/","fastafiles/","partigene/","phd_dir/","process/","qual/","scf/","subfiles/");
foreach my $file (@files) {
rename "$file", "$location/$file" or die "There was an error moving $file. Check logfile for details\n",
print LOG "rename $file to $location$file failed with $!\n";
}
#put relevant files in partigene dir
my @quals = glob ("$location/qual/*.qual");
foreach my $qual (@quals) {
copy ("$qual" , "$location/partigene") or die "Can't copy $qual to $location/partigene\n", print LOG "$qual to $location/partigene:$!\n";
}
my @seqs = glob ("$location/fasta/*.seq");
foreach my $seq (@seqs) {
copy ("$seq" , "$location/partigene") or die "Can't copy $seq/* to $location/partigene\n", print LOG "$seq/* to $location/partigene:$!\n";
}
system ("mv raw_traces $location/raw_traces");
close LOG;
rename "logfile", "$location/logfile" or die "Could not move logfile!\n$!";
print "\n\nThe EST submission file and the other output directories have been\nsaved in the following location:\n";
print "$location\n\n";
sleep(2);
}
########################################################################################################################
sub print_help() {
#prints help document using less
#system("less $basedir/trace2dbest_UG.txt");
#temporarily disabled - incompatibility with some unix tastes
# Ok this is a bit of a joke ...
print "\nSearching for online help ..."; sleep (2);
print "\nCouldn't find online help - please use userguide provided with the trace2dbest package.\n\n";
}
##########################################################################################################################
#########################################################################################################################
sub find_program() { #### find executables
my $prog=$_[0];
my $check=$_[1];
my $pathflag=0;
my $path;
my $finalpath;
foreach $path (@PATH) {
if (-f "$path/$prog") {
$pathflag=1; $finalpath=$path; last;
}
}
if($pathflag==0) {
print colored("Can't find the $prog utility on your system\n","red bold");
print colored("Please ensure it is installed and in your path\n","red bold");
sleep(1);
unless ($check) {exit()};
return 0;
}
else { return "$finalpath/$prog"; }
}
#########################################################################################################################
sub query_for_exit() {
#exits program if 'n' entered
my $input='';
while($input!~/y|n/i) {
#print "\b";
$input= &get_input("\b");
chomp $input;
}
if($input=~/^n/i) { print "Bye \n"; exit(); }
}
##########################################################################################################################
####################################################################################
sub yes_no() { #### returns '1' for yes
my $yflag=0;
print colored("\n[y/n] : ","green bold");
my $input='';
while($input!~/y|n/i) {
print "\b";
$input=<STDIN>;
chomp $input;
}
if($input=~/^y/i) { $yflag=1; }
return $yflag;
}
####################################################################################
sub get_input() {
#uses either readline module or the traditional way to get user response to question printed to screen
my $input;
my $question = shift (@_);
if ($read_gnu) { #true if gnu readline module installed
$input = $term->readline("$question"); #print question and get user input
} else { #do it the old way, without readline
print "$question";
$input =<>;
}
chomp $input; #remove trailing newline
$input =~ s/\s*$//; #remove trailing space
return $input;
}
######################################################################################################################
#########################################################################################################################
sub get_config() {
my $filename = "~/.trace2dbest.conf";
$filename =~ s{ ^ ~ ( [^/]* ) }
{ $1
? (getpwnam($1))[7]
: ( $ENV{HOME} || $ENV{LOGDIR}
|| (getpwuid($>))[7]
)
}ex;
open (CONFILE,"$filename") || die "Error opening .trace2dbest.conf\n";
while (my $line=<CONFILE>) {
if ($line=~/^VECTOR\=(.+)/i) { $vector_file=$1; }
if ($line=~/^ECOLI\=(.+)/i) { $ecoli_file=$1; }
if ($line=~/^LIB_FILE\=(.+)/i) { $Lib_file=$1; }
if ($line=~/^PUB_FILE\=(.+)/i) { $Pub_file=$1; }
if ($line=~/^CONT_FILE\=(.+)/i) { $Cont_file=$1; }
if ($line=~/^EST_FILE\=(.+)/i) { $EST_file=$1; }
if ($line=~/^PHRED_OPTIONS\=(.+)/i) { $phredoptions=$1; }
}
close (CONFILE);
}
##############################################################################################################
############################################################################################################################
sub create_conf() { #### creates configfile if not found in home directory
my $config_file = $homedir;
$config_file = "$config_file" . "/.trace2dbest.conf";
open (CONF, ">$config_file") || die "Sorry, could not open/.trace2dbest.conf $!";
print CONF "### location of vector.seq ###\n";
print CONF "VECTOR=$_[0]\n";
print CONF "### location of ecoli.seq file ###\n";
print CONF "ECOLI=$_[1]\n";
print CONF "### location of library file ###\n";
print CONF "LIB_FILE=$_[2]\n";
print CONF "### location of publication files ###\n";
print CONF "PUB_FILE=$_[3]\n";
print CONF "### location of cont file ###\n";
print CONF "CONT_FILE=$_[4]\n";
print CONF "### location of EST file ###\n";
print CONF "EST_FILE=$_[5]\n";
print CONF "### phred parameters (Change only if you are really sure what you are doing)###\n";
print CONF "PHRED_OPTIONS=$_[6]\n";
close (CONF) || die "Error - cannot close: $!\n";
}
#######################################################################################################
|