1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295
|
==========
Change Log
==========
NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as
`ParserElement.parseString`) will start to raise `DeprecationWarnings`. 3.2.0 should
get released some time later in 2024. I currently plan to completely
drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until
at least late 2024 if not 2025. So there is plenty of time to convert existing parsers to
the new function names before the old functions are completely removed. (Big
help from Devin J. Pohly in structuring the code to enable this peaceful transition.)
Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.
Version 3.1.3 - August, 2024
----------------------------
- Added new `Tag` ParserElement, for inserting metadata into the parsed results.
This allows a parser to add metadata or annotations to the parsed tokens.
The `Tag` element also accepts an optional `value` parameter, defaulting to `True`.
See the new `tag_metadata.py` example in the `examples` directory.
Example:
# add tag indicating mood
end_punc = "." | ("!" + Tag("enthusiastic")))
greeting = "Hello" + Word(alphas) + end_punc
result = greeting.parse_string("Hello World.")
print(result.dump())
result = greeting.parse_string("Hello World!")
print(result.dump())
prints:
['Hello', 'World', '.']
['Hello', 'World', '!']
- enthusiastic: True
- Added example `mongodb_query_expression.py`, to convert human-readable infix query
expressions (such as `a==100 and b>=200`) and transform them into the equivalent
query argument for the pymongo package (`{'$and': [{'a': 100}, {'b': {'$gte': 200}}]}`).
Supports many equality and inequality operators - see the docstring for the
`transform_query` function for more examples.
- Fixed issue where PEP8 compatibility names for `ParserElement` static methods were
not themselves defined as `staticmethods`. When called using a `ParserElement` instance,
this resulted in a `TypeError` exception. Reported by eylenburg (#548).
- To address a compatibility issue in RDFLib, added a property setter for the
`ParserElement.name` property, to call `ParserElement.set_name`.
- Modified `ParserElement.set_name()` to accept a None value, to clear the defined
name and corresponding error message for a `ParserElement`.
- Updated railroad diagram generation for `ZeroOrMore` and `OneOrMore` expressions with
`stop_on` expressions, while investigating #558, reported by user Gu_f.
- Added `<META>` tag to HTML generated for railroad diagrams to force UTF-8 encoding
with older browsers, to better display Unicode parser characters.
- Fixed some cosmetics/bugs in railroad diagrams:
- fixed groups being shown even when `show_groups`=False
- show results names as quoted strings when `show_results_names`=True
- only use integer loop counter if repetition > 2
- Some type annotations added for parse action related methods, thanks August
Karlstedt (#551).
- Added exception type to `trace_parse_action` exception output, while investigating
SO question posted by medihack.
- Added `set_name` calls to internal expressions generated in `infix_notation`, for
improved railroad diagramming.
- `delta_time`, `lua_parser`, `decaf_parser`, and `roman_numerals` examples cleaned up
to use latest PEP8 names and add minor enhancements.
- Fixed bug (and corresponding test code) in `delta_time` example that did not handle
weekday references in time expressions (like "Monday at 4pm") when the weekday was
the same as the current weekday.
- Minor performance speedup in `trim_arity`, to benefit any parsers using parse actions.
- Added early testing support for Python 3.13 with JIT enabled.
Version 3.1.2 - March, 2024
---------------------------
- Added `ieee_float` expression to `pyparsing.common`, which parses float values,
plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).
- Updated pep8 synonym wrappers for better type checking compatibility. PR submitted
by Ricardo Coccioli (#507).
- Fixed empty error message bug, PR submitted by InSync (#534). This _should_ return
pyparsing's exception messages to a former, more helpful form. If you have code that
parses the exception messages returned by pyparsing, this may require some code
changes.
- Added unit tests to test for exception message contents, with enhancement to
`pyparsing.testing.assertRaisesParseException` to accept an expected exception message.
- Updated example `select_parser.py` to use PEP8 names and added Groups for better retrieval
of parsed values from multiple SELECT clauses.
- Added example `email_address_parser.py`, as suggested by John Byrd (#539).
- Added example `directx_x_file_parser.py` to parse DirectX template definitions, and
generate a Pyparsing parser from a template to parse .x files.
- Some code refactoring to reduce code nesting, PRs submitted by InSync.
- All internal string expressions using '%' string interpolation and `str.format()`
converted to f-strings.
Version 3.1.1 - July, 2023
--------------------------
- Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)
- Fixed bug in bad exception messages raised by Forward expressions. PR submitted
by Kyle Sunden, thanks for your patience and collaboration on this (#493).
- Fixed regression in SkipTo, where ignored expressions were not checked when looking
for the target expression. Reported by catcombo, Issue #500.
- Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)
- Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)
Version 3.1.0 - June, 2023
--------------------------
- Added `tag_emitter.py` to examples. This example demonstrates how to insert
tags into your parsed results that are not part of the original parsed text.
Version 3.1.0b2 - May, 2023
---------------------------
- Updated `create_diagram()` code to be compatible with railroad-diagrams package
version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars),
reported by Sam Morley-Short.
- Fixed bug in `NotAny`, where parse actions on the negated expr were not being run.
This could cause `NotAny` to incorrectly fail if the expr would normally match,
but would fail to match if a condition used as a parse action returned False.
Fixes Issue #482, raised by byaka, thank you!
- Fixed `create_diagram()` to accept keyword args, to be passed through to the
`template.render()` method to generate the output HTML (PR submitted by Aussie Schnore,
good catch!)
- Fixed bug in `python_quoted_string` regex.
- Added `examples/bf.py` Brainf*ck parser/executor example. Illustrates using
a pyparsing grammar to parse language syntax, and attach executable AST nodes to
the parsed results.
Version 3.1.0b1 - April, 2023
-----------------------------
- Added support for Python 3.12.
- API CHANGE: A slight change has been implemented when unquoting a quoted string
parsed using the `QuotedString` class. Formerly, when unquoting and processing
whitespace markers such as \t and \n, these substitutions would occur first, and
then any additional '\' escaping would be done on the resulting string. This would
parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed
in a single pass working left to right, so the quoted string "\\n" would get unquoted
to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq,
thanks!
- Added named field "url" to `pyparsing.common.url`, returning the entire
parsed URL string.
- Fixed bug when parse actions returned an empty string for an expression that
had a results name, that the results name was not saved. That is:
expr = Literal("X").add_parse_action(lambda tokens: "")("value")
result = expr.parse_string("X")
print(result["value"])
would raise a `KeyError`. Now empty strings will be saved with the associated
results name. Raised in Issue #470 by Nicco Kunzmann, thank you.
- Fixed bug in `SkipTo` where ignore expressions were not properly handled while
scanning for the target expression. Issue #475, reported by elkniwt, thanks
(this bug has been there for a looooong time!).
- Updated `ci.yml` permissions to limit default access to source - submitted by Joyce
Brum of Google. Thanks so much!
- Updated the `lucene_grammar.py` example (better support for '*' and '?' wildcards)
and corrected the test cases - brought to my attention by Elijah Nicol, good catch!
Version 3.1.0a1 - March, 2023
-----------------------------
- API ENHANCEMENT: `Optional(expr)` may now be written as `expr | ""`
This will make this code:
"{" + Optional(Literal("A") | Literal("a")) + "}"
writable as:
"{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
- `Literal("")` now internally generates an `Empty()` (and no longer raises an exception)
- `Empty` is now a subclass of `Literal`
Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.
- Added new class property `identifier` to all Unicode set classes in `pyparsing.unicode`,
using the class's values for `cls.identchars` and `cls.identbodychars`. Now Unicode-aware
parsers that formerly wrote:
ppu = pyparsing.unicode
ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
ident = ppu.Greek.identifier
# or
# ident = ppu.Ελληνικά.identifier
- `ParseResults` now has a new method `deepcopy()`, in addition to the current
`copy()` method. `copy()` only makes a shallow copy - any contained `ParseResults`
are copied as references - changes in the copy will be seen as changes in the original.
In many cases, a shallow copy is sufficient, but some applications require a deep copy.
`deepcopy()` makes a deeper copy: any contained `ParseResults` or other mappings or
containers are built with copies from the original, and do not get changed if the
original is later changed. Addresses issue #463, reported by Bryn Pickering.
- Reworked `delimited_list` function into the new `DelimitedList` class.
`DelimitedList` has the same constructor interface as `delimited_list`, and
in this release, `delimited_list` changes from a function to a synonym for
`DelimitedList`. `delimited_list` and the older `delimitedList` method will be
deprecated in a future release, in favor of `DelimitedList`.
- Error messages from `MatchFirst` and `Or` expressions will try to give more details
if one of the alternatives matches better than the others, but still fails.
Question raised in Issue #464 by msdemlei, thanks!
- Added new class method `ParserElement.using_each`, to simplify code
that creates a sequence of `Literals`, `Keywords`, or other `ParserElement`
subclasses.
For instance, to define suppressible punctuation, you would previously
write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
`using_each` will also accept optional keyword args, which it will
pass through to the class initializer. Here is an expression for
single-letter variable names that might be used in an algebraic
expression:
algebra_var = MatchFirst(
Char.using_each(string.ascii_lowercase, as_keyword=True)
)
- Added new builtin `python_quoted_string`, which will match any form
of single-line or multiline quoted strings defined in Python. (Inspired
by discussion with Andreas Schörgenhumer in Issue #421.)
- Extended `expr[]` notation for repetition of `expr` to accept a
slice, where the slice's stop value indicates a `stop_on`
expression:
test = "BEGIN aaa bbb ccc END"
BEGIN, END = Keyword.using_each("BEGIN END".split())
body_word = Word(alphas)
expr = BEGIN + Group(body_word[...:END]) + END
# equivalent to
# expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
print(expr.parse_string(test))
Prints:
['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
- `ParserElement.validate()` is deprecated. It predates the support for left-recursive
parsers, and was prone to false positives (warning that a grammar was invalid when
it was in fact valid). It will be removed in a future pyparsing release. In its
place, developers should use debugging and analytical tools, such as `ParserElement.set_debug()`
and `ParserElement.create_diagram()`.
(Raised in Issue #444, thanks Andrea Micheli!)
- Added bool `embed` argument to `ParserElement.create_diagram()`.
When passed as True, the resulting diagram will omit the `<DOCTYPE>`,
`<HEAD>`, and `<BODY>` tags so that it can be embedded in other
HTML source. (Useful when embedding a call to `create_diagram()` in
a PyScript HTML page.)
- Added `recurse` argument to `ParserElement.set_debug` to set the
debug flag on an expression and all of its sub-expressions. Requested
by multimeric in Issue #399.
- Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.
- Fixed bug in `Word` when `max=2`. Also added performance enhancement
when specifying `exact` argument. Reported in issue #409 by
panda-34, nice catch!
- `Word` arguments are now validated if `min` and `max` are both
given, that `min` <= `max`; raises `ValueError` if values are invalid.
- Fixed bug in srange, when parsing escaped '/' and '\' inside a
range set.
- Fixed exception messages for some `ParserElements` with custom names,
which instead showed their contained expression names.
- Fixed bug in pyparsing.common.url, when input URL is not alone
on an input line. Fixes Issue #459, reported by David Kennedy.
- Multiple added and corrected type annotations. With much help from
Stephen Rosen, thanks!
- Some documentation and error message clarifications on pyparsing's
keyword logic, cited by Basil Peace.
- General docstring cleanup for Sphinx doc generation, PRs submitted
by Devin J. Pohly. A dirty job, but someone has to do it - much
appreciated!
- `invRegex.py` example renamed to `inv_regex.py` and updated to PEP-8
variable and method naming. PR submitted by Ross J. Duff, thanks!
- Removed examples `sparser.py` and `pymicko.py`, since each included its
own GPL license in the header. Since this conflicts with pyparsing's
MIT license, they were removed from the distribution to avoid
confusion among those making use of them in their own projects.
Version 3.0.9 - May, 2022
-------------------------
- Added Unicode set `BasicMultilingualPlane` (may also be referenced
as `BMP`) representing the Basic Multilingual Plane (Unicode
characters up to code point 65535). Can be used to parse
most language characters, but omits emojis, wingdings, etc.
Raised in discussion with Dave Tapley (issue #392).
- To address mypy confusion of `pyparsing.Optional` and `typing.Optional`
resulting in `error: "_SpecialForm" not callable` message
reported in issue #365, fixed the import in `exceptions.py`. Nice
sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you!
(Removed definitions of `OptionalType`, `DictType`, and `IterableType`
and replaced them with `typing.Optional`, `typing.Dict`, and
`typing.Iterable` throughout.)
- Fixed typo in jinja2 template for railroad diagrams, thanks for the
catch Nioub (issue #388).
- Removed use of deprecated `pkg_resources` package in
railroad diagramming code (issue #391).
- Updated `bigquery_view_parser.py` example to parse examples at
https://cloud.google.com/bigquery/docs/reference/legacy-sql
Version 3.0.8 - April, 2022
---------------------------
- API CHANGE: modified `pyproject.toml` to require Python version
3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6
fail in evaluating the `version_info` class (implemented using
`typing.NamedTuple`). If you are using an earlier version of Python
3.6, you will need to use pyparsing 2.4.7.
- Improved pyparsing import time by deferring regex pattern compiles.
PR submitted by Anthony Sottile to fix issue #362, thanks!
- Updated build to use flit, PR by Michał Górny, added `BUILDING.md`
doc and removed old Windows build scripts - nice cleanup work!
- More type-hinting added for all arithmetic and logical operator
methods in `ParserElement`. PR from Kazantcev Andrey, thank you.
- Fixed `infix_notation`'s definitions of `lpar` and `rpar`, to accept
parse expressions such that they do not get suppressed in the parsed
results. PR submitted by Philippe Prados, nice work.
- Fixed bug in railroad diagramming with expressions containing `Combine`
elements. Reported by Jeremy White, thanks!
- Added `show_groups` argument to `create_diagram` to highlight grouped
elements with an unlabeled bounding box.
- Added `unicode_denormalizer.py` to the examples as a demonstration
of how Python's interpreter will accept Unicode characters in
identifiers, but normalizes them back to ASCII so that identifiers
`print` and `𝕡𝓻ᵢ𝓃𝘁` and `𝖕𝒓𝗂𝑛ᵗ` are all equivalent.
- Removed imports of deprecated `sre_constants` module for catching
exceptions when compiling regular expressions. PR submitted by
Serhiy Storchaka, thank you.
Version 3.0.7 - January, 2022
-----------------------------
- Fixed bug #345, in which delimitedList changed expressions in place
using `expr.streamline()`. Reported by Kim Gräsman, thanks!
- Fixed bug #346, when a string of word characters was passed to WordStart
or `WordEnd` instead of just taking the default value. Originally posted
as a question by Parag on StackOverflow, good catch!
- Fixed bug #350, in which `White` expressions could fail to match due to
unintended whitespace-skipping. Reported by Fu Hanxi, thank you!
- Fixed bug #355, when a `QuotedString` is defined with characters in its
quoteChar string containing regex-significant characters such as ., *,
?, [, ], etc.
- Fixed bug in `ParserElement.run_tests` where comments would be displayed
using `with_line_numbers`.
- Added optional "min" and "max" arguments to `delimited_list`. PR
submitted by Marius, thanks!
- Added new API change note in `whats_new_in_pyparsing_3_0_0`, regarding
a bug fix in the `bool()` behavior of `ParseResults`.
Prior to pyparsing 3.0.x, the `ParseResults` class implementation of
`__bool__` would return `False` if the `ParseResults` item list was empty,
even if it contained named results. In 3.0.0 and later, `ParseResults` will
return `True` if either the item list is not empty *or* if the named
results dict is not empty.
# generate an empty ParseResults by parsing a blank string with
# a ZeroOrMore
result = Word(alphas)[...].parse_string("")
print(result.as_list())
print(result.as_dict())
print(bool(result))
# add a results name to the result
result["name"] = "empty result"
print(result.as_list())
print(result.as_dict())
print(bool(result))
Prints:
[]
{}
False
[]
{'name': 'empty result'}
True
In previous versions, the second call to `bool()` would return `False`.
- Minor enhancement to Word generation of internal regular expression, to
emit consecutive characters in range, such as "ab", as "ab", not "a-b".
- Fixed character ranges for search terms using non-Western characters
in booleansearchparser, PR submitted by tc-yu, nice work!
- Additional type annotations on public methods.
Version 3.0.6 - November, 2021
------------------------------
- Added `suppress_warning()` method to individually suppress a warning on a
specific ParserElement. Used to refactor `original_text_for` to preserve
internal results names, which, while undocumented, had been adopted by
some projects.
- Fix bug when `delimited_list` was called with a str literal instead of a
parse expression.
Version 3.0.5 - November, 2021
------------------------------
- Added return type annotations for `col`, `line`, and `lineno`.
- Fixed bug when `warn_ungrouped_named_tokens_in_collection` warning was raised
when assigning a results name to an `original_text_for` expression.
(Issue #110, would raise warning in packaging.)
- Fixed internal bug where `ParserElement.streamline()` would not return self if
already streamlined.
- Changed `run_tests()` output to default to not showing line and column numbers.
If line numbering is desired, call with `with_line_numbers=True`. Also fixed
minor bug where separating line was not included after a test failure.
Version 3.0.4 - October, 2021
-----------------------------
- Fixed bug in which `Dict` classes did not correctly return tokens as nested
`ParseResults`, reported by and fix identified by Bu Sun Kim, many thanks!!!
- Documented API-changing side-effect of converting `ParseResults` to use `__slots__`
to pre-define instance attributes. This means that code written like this (which
was allowed in pyparsing 2.4.7):
result = Word(alphas).parseString("abc")
result.xyz = 100
now raises this Python exception:
AttributeError: 'ParseResults' object has no attribute 'xyz'
To add new attribute values to ParseResults object in 3.0.0 and later, you must
assign them using indexed notation:
result["xyz"] = 100
You will still be able to access this new value as an attribute or as an
indexed item.
- Fixed bug in railroad diagramming where the vertical limit would count all
expressions in a group, not just those that would create visible railroad
elements.
Version 3.0.3 - October, 2021
-----------------------------
- Fixed regex typo in `one_of` fix for `as_keyword=True`.
- Fixed a whitespace-skipping bug, Issue #319, introduced as part of the revert
of the `LineStart` changes. Reported by Marc-Alexandre Côté,
thanks!
- Added header column labeling > 100 in `with_line_numbers` - some input lines
are longer than others.
Version 3.0.2 - October, 2021
-----------------------------
- Reverted change in behavior with `LineStart` and `StringStart`, which changed the
interpretation of when and how `LineStart` and `StringStart` should match when
a line starts with spaces. In 3.0.0, the `xxxStart` expressions were not
really treated like expressions in their own right, but as modifiers to the
following expression when used like `LineStart() + expr`, so that if there
were whitespace on the line before `expr` (which would match in versions prior
to 3.0.0), the match would fail.
3.0.0 implemented this by automatically promoting `LineStart() + expr` to
`AtLineStart(expr)`, which broke existing parsers that did not expect `expr` to
necessarily be right at the start of the line, but only be the first token
found on the line. This was reported as a regression in Issue #317.
In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new
`AtLineStart` and `AtStringStart` expression classes, so that parsers can chose
whichever behavior applies in their specific instance. Specifically:
# matches expr if it is the first token on the line
# (allows for leading whitespace)
LineStart() + expr
# matches only if expr is found in column 1
AtLineStart(expr)
- Performance enhancement to `one_of` to always generate an internal `Regex`,
even if `caseless` or `as_keyword` args are given as `True` (unless explicitly
disabled by passing `use_regex=False`).
- `IndentedBlock` class now works with `recursive` flag. By default, the
results parsed by an `IndentedBlock` are grouped. This can be disabled by constructing
the `IndentedBlock` with `grouped=False`.
Version 3.0.1 - October, 2021
-----------------------------
- Fixed bug where `Word(max=n)` did not match word groups less than length 'n'.
Thanks to Joachim Metz for catching this!
- Fixed bug where `ParseResults` accidentally created recursive contents.
Joachim Metz on this one also!
- Fixed bug where `warn_on_multiple_string_args_to_oneof` warning is raised
even when not enabled.
Version 3.0.0 - October, 2021
-----------------------------
- A consolidated list of all the changes in the 3.0.0 release can be found in
`docs/whats_new_in_3_0_0.rst`.
(https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst)
Version 3.0.0.final - October, 2021
-----------------------------------
- Added support for python `-W` warning option to call `enable_all_warnings`() at startup.
Also detects setting of `PYPARSINGENABLEALLWARNINGS` environment variable to any non-blank
value. (If using `-Wd` for testing, but wishing to disable pyparsing warnings, add
`-Wi:::pyparsing`.)
- Fixed named results returned by `url` to match fields as they would be parsed
using `urllib.parse.urlparse`.
- Early response to `with_line_numbers` was positive, with some requested enhancements:
. added a trailing "|" at the end of each line (to show presence of trailing spaces);
can be customized using `eol_mark` argument
. added expand_tabs argument, to control calling str.expandtabs (defaults to True
to match `parseString`)
. added mark_spaces argument to support display of a printing character in place of
spaces, or Unicode symbols for space and tab characters
. added mark_control argument to support highlighting of control characters using
'.' or Unicode symbols, such as "␍" and "␊".
- Modified helpers `common_html_entity` and `replace_html_entity()` to use the HTML
entity definitions from `html.entities.html5`.
- Updated the class diagram in the pyparsing docs directory, along with the supporting
.puml file (PlantUML markup) used to create the diagram.
- Added global method `autoname_elements()` to call `set_name()` on all locally
defined `ParserElements` that haven't been explicitly named using `set_name()`, using
their local variable name. Useful for setting names on multiple elements when
creating a railroad diagram.
a = pp.Literal("a")
b = pp.Literal("b").set_name("bbb")
pp.autoname_elements()
`a` will get named "a", while `b` will keep its name "bbb".
Version 3.0.0rc2 - October, 2021
--------------------------------
- Added `url` expression to `pyparsing_common`. (Sample code posted by Wolfgang Fahl,
very nice!)
This new expression has been added to the `urlExtractorNew.py` example, to show how
it extracts URL fields into separate results names.
- Added method to `pyparsing_test` to help debugging, `with_line_numbers`.
Returns a string with line and column numbers corresponding to values shown
when parsing with expr.set_debug():
data = """\
A
100"""
expr = pp.Word(pp.alphanums).set_name("word").set_debug()
print(ppt.with_line_numbers(data))
expr[...].parseString(data)
prints:
1
1234567890
1: A
2: 100
Match word at loc 3(1,4)
A
^
Matched word -> ['A']
Match word at loc 11(2,7)
100
^
Matched word -> ['100']
- Added new example `cuneiform_python.py` to demonstrate creating a new Unicode
range, and writing a Cuneiform->Python transformer (inspired by zhpy).
- Fixed issue #272, reported by PhasecoreX, when `LineStart`() expressions would match
input text that was not necessarily at the beginning of a line.
As part of this fix, two new classes have been added: AtLineStart and AtStringStart.
The following expressions are equivalent:
LineStart() + expr and AtLineStart(expr)
StringStart() + expr and AtStringStart(expr)
[`LineStart` and `StringStart` changes reverted in 3.0.2.]
- Fixed `ParseFatalExceptions` failing to override normal exceptions or expression
matches in `MatchFirst` expressions. Addresses issue #251, reported by zyp-rgb.
- Fixed bug in which `ParseResults` replaces a collection type value with an invalid
type annotation (as a result of changed behavior in Python 3.9). Addresses issue #276, reported by
Rob Shuler, thanks.
- Fixed bug in `ParseResults` when calling `__getattr__` for special double-underscored
methods. Now raises `AttributeError` for non-existent results when accessing a
name starting with '__'. Addresses issue #208, reported by Joachim Metz.
- Modified debug fail messages to include the expression name to make it easier to sync
up match vs success/fail debug messages.
Version 3.0.0rc1 - September, 2021
----------------------------------
- Railroad diagrams have been reformatted:
. creating diagrams is easier - call
expr.create_diagram("diagram_output.html")
create_diagram() takes 3 arguments:
. the filename to write the diagram HTML
. optional 'vertical' argument, to specify the minimum number of items in a path
to be shown vertically; default=3
. optional 'show_results_names' argument, to specify whether results name
annotations should be shown; default=False
. every expression that gets a name using `setName()` gets separated out as
a separate subdiagram
. results names can be shown as annotations to diagram items
. `Each`, `FollowedBy`, and `PrecededBy` elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND]
annotations
. removed annotations for Suppress elements
. some diagram cleanup when a grammar contains Forward elements
. check out the examples make_diagram.py and railroad_diagram_demo.py
- Type annotations have been added to most public API methods and classes.
- Better exception messages to show full word where an exception occurred.
Word(alphas, alphanums)[...].parseString("ab1 123", parseAll=True)
Was:
pyparsing.ParseException: Expected end of text, found '1' (at char 4), (line:1, col:5)
Now:
pyparsing.exceptions.ParseException: Expected end of text, found '123' (at char 4), (line:1, col:5)
- Suppress can be used to suppress text skipped using "...".
source = "lead in START relevant text END trailing text"
start_marker = Keyword("START")
end_marker = Keyword("END")
find_body = Suppress(...) + start_marker + ... + end_marker
print(find_body.parseString(source).dump())
Prints:
['START', 'relevant text ', 'END']
- _skipped: ['relevant text ']
- New string constants `identchars` and `identbodychars` to help in defining identifier Word expressions
Two new module-level strings have been added to help when defining identifiers, `identchars` and `identbodychars`.
Instead of writing::
import pyparsing as pp
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
you will be able to write::
identifier = pp.Word(pp.identchars, pp.identbodychars)
Those constants have also been added to all the Unicode string classes::
import pyparsing as pp
ppu = pp.pyparsing_unicode
cjk_identifier = pp.Word(ppu.CJK.identchars, ppu.CJK.identbodychars)
greek_identifier = pp.Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
- Added a caseless parameter to the `CloseMatch` class to allow for casing to be
ignored when checking for close matches. (Issue #281) (PR by Adrian Edwards, thanks!)
- Fixed bug in Located class when used with a results name. (Issue #294)
- Fixed bug in `QuotedString` class when the escaped quote string is not a
repeated character. (Issue #263)
- `parseFile()` and `create_diagram()` methods now will accept `pathlib.Path`
arguments.
Version 3.0.0b3 - August, 2021
------------------------------
- PEP-8 compatible names are being introduced in pyparsing version 3.0!
All methods such as `parseString` have been replaced with the PEP-8
compliant name `parse_string`. In addition, arguments such as `parseAll`
have been renamed to `parse_all`. For backward-compatibility, synonyms for
all renamed methods and arguments have been added, so that existing
pyparsing parsers will not break. These synonyms will be removed in a future
release.
In addition, the Optional class has been renamed to Opt, since it clashes
with the common typing.Optional type specifier that is used in the Python
type annotations. A compatibility synonym is defined for now, but will be
removed in a future release.
- HUGE NEW FEATURE - Support for left-recursive parsers!
Following the method used in Python's PEG parser, pyparsing now supports
left-recursive parsers when left recursion is enabled.
import pyparsing as pp
pp.ParserElement.enable_left_recursion()
# a common left-recursion definition
# define a list of items as 'list + item | item'
# BNF:
# item_list := item_list item | item
# item := word of alphas
item_list = pp.Forward()
item = pp.Word(pp.alphas)
item_list <<= item_list + item | item
item_list.run_tests("""\
To parse or not to parse that is the question
""")
Prints:
['To', 'parse', 'or', 'not', 'to', 'parse', 'that', 'is', 'the', 'question']
Great work contributed by Max Fischer!
- `delimited_list` now supports an additional flag `allow_trailing_delim`,
to optionally parse an additional delimiter at the end of the list.
Contributed by Kazantcev Andrey, thanks!
- Removed internal comparison of results values against b"", which
raised a `BytesWarning` when run with `python -bb`. Fixes issue #271 reported
by Florian Bruhin, thank you!
- Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by
legrandlegrand - much better.
- Python 3.5 will not be supported in the pyparsing 3 releases. This will allow
for future pyparsing releases to add parameter type annotations, and to take
advantage of dict key ordering in internal results name tracking.
Version 3.0.0b2 - December, 2020
--------------------------------
- API CHANGE
`locatedExpr` is being replaced by the class `Located`. `Located` has the same
constructor interface as `locatedExpr`, but fixes bugs in the returned
`ParseResults` when the searched expression contains multiple tokens, or
has internal results names.
`locatedExpr` is deprecated, and will be removed in a future release.
Version 3.0.0b1 - November, 2020
--------------------------------
- API CHANGE
Diagnostic flags have been moved to an enum, `pyparsing.Diagnostics`, and
they are enabled through module-level methods:
- `pyparsing.enable_diag()`
- `pyparsing.disable_diag()`
- `pyparsing.enable_all_warnings()`
- API CHANGE
Most previous `SyntaxWarnings` that were warned when using pyparsing
classes incorrectly have been converted to `TypeError` and `ValueError` exceptions,
consistent with Python calling conventions. All warnings warned by diagnostic
flags have been converted from `SyntaxWarnings` to `UserWarnings`.
- To support parsers that are intended to generate native Python collection
types such as lists and dicts, the `Group` and `Dict` classes now accept an
additional boolean keyword argument `aslist` and `asdict` respectively. See
the `jsonParser.py` example in the `pyparsing/examples` source directory for
how to return types as `ParseResults` and as Python collection types, and the
distinctions in working with the different types.
In addition parse actions that must return a value of list type (which would
normally be converted internally to a `ParseResults`) can override this default
behavior by returning their list wrapped in the new `ParseResults.List` class:
# this parse action tries to return a list, but pyparsing
# will convert to a ParseResults
def return_as_list_but_still_get_parse_results(tokens):
return tokens.asList()
# this parse action returns the tokens as a list, and pyparsing will
# maintain its list type in the final parsing results
def return_as_list(tokens):
return ParseResults.List(tokens.asList())
This is the mechanism used internally by the `Group` class when defined
using `aslist=True`.
- A new `IndentedBlock` class is introduced, to eventually replace the
current `indentedBlock` helper method. The interface is largely the same,
however, the new class manages its own internal indentation stack, so
it is no longer necessary to maintain an external `indentStack` variable.
- API CHANGE
Added `cache_hit` keyword argument to debug actions. Previously, if packrat
parsing was enabled, the debug methods were not called in the event of cache
hits. Now these methods will be called, with an added argument
`cache_hit=True`.
If you are using packrat parsing and enable debug on expressions using a
custom debug method, you can add the `cache_hit=False` keyword argument,
and your method will be called on packrat cache hits. If you choose not
to add this keyword argument, the debug methods will fail silently,
behaving as they did previously.
- When using `setDebug` with packrat parsing enabled, packrat cache hits will
now be included in the output, shown with a leading '*'. (Previously, cache
hits and responses were not included in debug output.) For those using custom
debug actions, see the previous item regarding an optional API change
for those methods.
- `setDebug` output will also show more details about what expression
is about to be parsed (the current line of text being parsed, and
the current parse position):
Match integer at loc 0(1,1)
1 2 3
^
Matched integer -> ['1']
The current debug location will also be indicated after whitespace
has been skipped (was previously inconsistent, reported in Issue #244,
by Frank Goyens, thanks!).
- Modified the repr() output for `ParseResults` to include the class
name as part of the output. This is to clarify for new pyparsing users
who misread the repr output as a tuple of a list and a dict. pyparsing
results will now read like:
ParseResults(['abc', 'def'], {'qty': 100}]
instead of just:
(['abc', 'def'], {'qty': 100}]
- Fixed bugs in Each when passed `OneOrMore` or `ZeroOrMore` expressions:
. first expression match could be enclosed in an extra nesting level
. out-of-order expressions now handled correctly if mixed with required
expressions
. results names are maintained correctly for these expressions
- Fixed traceback trimming, and added `ParserElement.verbose_traceback`
save/restore to `reset_pyparsing_context()`.
- Default string for `Word` expressions now also include indications of
`min` and `max` length specification, if applicable, similar to regex length
specifications:
Word(alphas) -> "W:(A-Za-z)"
Word(nums) -> "W:(0-9)"
Word(nums, exact=3) -> "W:(0-9){3}"
Word(nums, min=2) -> "W:(0-9){2,...}"
Word(nums, max=3) -> "W:(0-9){1,3}"
Word(nums, min=2, max=3) -> "W:(0-9){2,3}"
For expressions of the `Char` class (similar to `Word(..., exact=1)`, the expression
is simply the character range in parentheses:
Char(nums) -> "(0-9)"
Char(alphas) -> "(A-Za-z)"
- Removed `copy()` override in `Keyword` class which did not preserve definition
of ident chars from the original expression. PR #233 submitted by jgrey4296,
thanks!
- In addition to `pyparsing.__version__`, there is now also a `pyparsing.__version_info__`,
following the same structure and field names as in `sys.version_info`.
Version 3.0.0a2 - June, 2020
----------------------------
- Summary of changes for 3.0.0 can be found in "What's New in Pyparsing 3.0.0"
documentation.
- API CHANGE
Changed result returned when parsing using `countedArray`,
the array items are no longer returned in a doubly-nested
list.
- An excellent new enhancement is the new railroad diagram
generator for documenting pyparsing parsers:
import pyparsing as pp
from pyparsing.diagram import to_railroad, railroad_to_html
from pathlib import Path
# define a simple grammar for parsing street addresses such
# as "123 Main Street"
# number word...
number = pp.Word(pp.nums).setName("number")
name = pp.Word(pp.alphas).setName("word")[1, ...]
parser = number("house_number") + name("street")
parser.setName("street address")
# construct railroad track diagram for this parser and
# save as HTML
rr = to_railroad(parser)
Path('parser_rr_diag.html').write_text(railroad_to_html(rr))
Very nice work provided by Michael Milton, thanks a ton!
- Enhanced default strings created for Word expressions, now showing
string ranges if possible. `Word(alphas)` would formerly
print as `W:(ABCD...)`, now prints as `W:(A-Za-z)`.
- Added `ignoreWhitespace(recurse:bool = True)`` and added a
recurse argument to `leaveWhitespace`, both added to provide finer
control over pyparsing's whitespace skipping. Also contributed
by Michael Milton.
- The unicode range definitions for the various languages were
recalculated by interrogating the unicodedata module by character
name, selecting characters that contained that language in their
Unicode name. (Issue #227)
Also, pyparsing_unicode.Korean was renamed to Hangul (Korean
is also defined as a synonym for compatibility).
- Enhanced `ParseResults` dump() to show both results names and list
subitems. Fixes bug where adding a results name would hide
lower-level structures in the `ParseResults`.
- Added new __diag__ warnings:
"warn_on_parse_using_empty_Forward" - warns that a Forward
has been included in a grammar, but no expression was
attached to it using '<<=' or '<<'
"warn_on_assignment_to_Forward" - warns that a Forward has
been created, but was probably later overwritten by
erroneously using '=' instead of '<<=' (this is a common
mistake when using Forwards)
(**currently not working on PyPy**)
- Added `ParserElement`.recurse() method to make it simpler for
grammar utilities to navigate through the tree of expressions in
a pyparsing grammar.
- Fixed bug in `ParseResults` repr() which showed all matching
entries for a results name, even if `listAllMatches` was set
to False when creating the `ParseResults` originally. Reported
by Nicholas42 on GitHub, good catch! (Issue #205)
- Modified refactored modules to use relative imports, as
pointed out by setuptools project member jaraco, thank you!
- Off-by-one bug found in the roman_numerals.py example, a bug
that has been there for about 14 years! PR submitted by
Jay Pedersen, nice catch!
- A simplified Lua parser has been added to the examples
(lua_parser.py).
- Added make_diagram.py to the examples directory to demonstrate
creation of railroad diagrams for selected pyparsing examples.
Also restructured some examples to make their parsers importable
without running their embedded tests.
Version 3.0.0a1 - April, 2020
-----------------------------
- Removed Py2.x support and other deprecated features. Pyparsing
now requires Python 3.5 or later. If you are using an earlier
version of Python, you must use a Pyparsing 2.4.x version
Deprecated features removed:
. `ParseResults.asXML()` - if used for debugging, switch
to using `ParseResults.dump()`; if used for data transfer,
use `ParseResults.asDict()` to convert to a nested Python
dict, which can then be converted to XML or JSON or
other transfer format
. `operatorPrecedence` synonym for `infixNotation` -
convert to calling `infixNotation`
. `commaSeparatedList` - convert to using
pyparsing_common.comma_separated_list
. `upcaseTokens` and `downcaseTokens` - convert to using
`pyparsing_common.upcaseTokens` and `downcaseTokens`
. __compat__.collect_all_And_tokens will not be settable to
False to revert to pre-2.3.1 results name behavior -
review use of names for `MatchFirst` and Or expressions
containing And expressions, as they will return the
complete list of parsed tokens, not just the first one.
Use `__diag__.warn_multiple_tokens_in_named_alternation`
to help identify those expressions in your parsers that
will have changed as a result.
- Removed support for running `python setup.py test`. The setuptools
maintainers consider the test command deprecated (see
<https://github.com/pypa/setuptools/issues/1684>). To run the Pyparsing test,
use the command `tox`.
- API CHANGE:
The staticmethod `ParseException.explain` has been moved to
`ParseBaseException.explain_exception`, and a new `explain` instance
method added to `ParseBaseException`. This will make calls to `explain`
much more natural:
try:
expr.parseString("...")
except ParseException as pe:
print(pe.explain())
- POTENTIAL API CHANGE:
`ZeroOrMore` expressions that have results names will now
include empty lists for their name if no matches are found.
Previously, no named result would be present. Code that tested
for the presence of any expressions using "if name in results:"
will now always return True. This code will need to change to
"if name in results and results[name]:" or just
"if results[name]:". Also, any parser unit tests that check the
`asDict()` contents will now see additional entries for parsers
having named `ZeroOrMore` expressions, whose values will be `[]`.
- POTENTIAL API CHANGE:
Fixed a bug in which calls to `ParserElement.setDefaultWhitespaceChars`
did not change whitespace definitions on any pyparsing built-in
expressions defined at import time (such as `quotedString`, or those
defined in pyparsing_common). This would lead to confusion when
built-in expressions would not use updated default whitespace
characters. Now a call to `ParserElement.setDefaultWhitespaceChars`
will also go and update all pyparsing built-ins to use the new
default whitespace characters. (Note that this will only modify
expressions defined within the pyparsing module.) Prompted by
work on a StackOverflow question posted by jtiai.
- Expanded __diag__ and __compat__ to actual classes instead of
just namespaces, to add some helpful behavior:
- enable() and .disable() methods to give extra
help when setting or clearing flags (detects invalid
flag names, detects when trying to set a __compat__ flag
that is no longer settable). Use these methods now to
set or clear flags, instead of directly setting to True or
False.
import pyparsing as pp
pp.__diag__.enable("warn_multiple_tokens_in_named_alternation")
- __diag__.enable_all_warnings() is another helper that sets
all "warn*" diagnostics to True.
pp.__diag__.enable_all_warnings()
- added new warning, "warn_on_match_first_with_lshift_operator" to
warn when using '<<' with a '|' `MatchFirst` operator, which will
create an unintended expression due to precedence of operations.
Example: This statement will erroneously define the `fwd` expression
as just `expr_a`, even though `expr_a | expr_b` was intended,
since '<<' operator has precedence over '|':
fwd << expr_a | expr_b
To correct this, use the '<<=' operator (preferred) or parentheses
to override operator precedence:
fwd <<= expr_a | expr_b
or
fwd << (expr_a | expr_b)
- Cleaned up default tracebacks when getting a `ParseException` when calling
`parseString`. Exception traces should now stop at the call in `parseString`,
and not include the internal traceback frames. (If the full traceback
is desired, then set `ParserElement`.verbose_traceback to True.)
- Fixed `FutureWarnings` that sometimes are raised when '[' passed as a
character to Word.
- New namespace, assert methods and classes added to support writing
unit tests.
- `assertParseResultsEquals`
- `assertParseAndCheckList`
- `assertParseAndCheckDict`
- `assertRunTestResults`
- `assertRaisesParseException`
- `reset_pyparsing_context` context manager, to restore pyparsing
config settings
- Enhanced error messages and error locations when parsing fails on
the Keyword or `CaselessKeyword` classes due to the presence of a
preceding or trailing keyword character. Surfaced while
working with metaperl on issue #201.
- Enhanced the Regex class to be compatible with re's compiled with the
re-equivalent regex module. Individual expressions can be built with
regex compiled expressions using:
import pyparsing as pp
import regex
# would use regex for this expression
integer_parser = pp.Regex(regex.compile(r'\d+'))
Inspired by PR submitted by bjrnfrdnnd on GitHub, very nice!
- Fixed handling of `ParseSyntaxExceptions` raised as part of Each
expressions, when sub-expressions contain '-' backtrack
suppression. As part of resolution to a question posted by John
Greene on StackOverflow.
- Potentially *huge* performance enhancement when parsing Word
expressions built from pyparsing_unicode character sets. Word now
internally converts ranges of consecutive characters to regex
character ranges (converting "0123456789" to "0-9" for instance),
resulting in as much as 50X improvement in performance! Work
inspired by a question posted by Midnighter on StackOverflow.
- Improvements in select_parser.py, to include new SQL syntax
from SQLite. PR submitted by Robert Coup, nice work!
- Fixed bug in `PrecededBy` which caused infinite recursion, issue #127
submitted by EdwardJB.
- Fixed bug in `CloseMatch` where end location was incorrectly
computed; and updated partial_gene_match.py example.
- Fixed bug in `indentedBlock` with a parser using two different
types of nested indented blocks with different indent values,
but sharing the same indent stack, submitted by renzbagaporo.
- Fixed bug in Each when using Regex, when Regex expression would
get parsed twice; issue #183 submitted by scauligi, thanks!
- `BigQueryViewParser.py` added to examples directory, PR submitted
by Michael Smedberg, nice work!
- booleansearchparser.py added to examples directory, PR submitted
by xecgr. Builds on searchparser.py, adding support for '*'
wildcards and non-Western alphabets.
- Fixed bug in delta_time.py example, when using a quantity
of seconds/minutes/hours/days > 999.
- Fixed bug in regex definitions for real and sci_real expressions in
pyparsing_common. Issue #194, reported by Michael Wayne Goodman, thanks!
- Fixed `FutureWarning` raised beginning in Python 3.7 for Regex expressions
containing '[' within a regex set.
- Minor reformatting of output from `runTests` to make embedded
comments more visible.
- And finally, many thanks to those who helped in the restructuring
of the pyparsing code base as part of this release. Pyparsing now
has more standard package structure, more standard unit tests,
and more standard code formatting (using black). Special thanks
to jdufresne, klahnakoski, mattcarmody, and ckeygusuz, to name just
a few.
Version 2.4.7 - April, 2020
---------------------------
- Backport of selected fixes from 3.0.0 work:
. Each bug with Regex expressions
. And expressions not properly constructing with generator
. Traceback abbreviation
. Bug in delta_time example
. Fix regexen in pyparsing_common.real and .sci_real
. Avoid FutureWarning on Python 3.7 or later
. Cleanup output in runTests if comments are embedded in test string
Version 2.4.6 - December, 2019
------------------------------
- Fixed typos in White mapping of whitespace characters, to use
correct "\u" prefix instead of "u\".
- Fix bug in left-associative ternary operators defined using
infixNotation. First reported on StackOverflow by user Jeronimo.
- Backport of pyparsing_test namespace from 3.0.0, including
TestParseResultsAsserts mixin class defining unittest-helper
methods:
. def assertParseResultsEquals(
self, result, expected_list=None, expected_dict=None, msg=None)
. def assertParseAndCheckList(
self, expr, test_string, expected_list, msg=None, verbose=True)
. def assertParseAndCheckDict(
self, expr, test_string, expected_dict, msg=None, verbose=True)
. def assertRunTestResults(
self, run_tests_report, expected_parse_results=None, msg=None)
. def assertRaisesParseException(self, exc_type=ParseException, msg=None)
To use the methods in this mixin class, declare your unittest classes as:
from pyparsing import pyparsing_test as ppt
class MyParserTest(ppt.TestParseResultsAsserts, unittest.TestCase):
...
Version 2.4.5 - November, 2019
------------------------------
- NOTE: final release compatible with Python 2.x.
- Fixed issue with reading README.rst as part of setup.py's
initialization of the project's long_description, with a
non-ASCII space character causing errors when installing from
source on platforms where UTF-8 is not the default encoding.
Version 2.4.4 - November, 2019
--------------------------------
- Unresolved symbol reference in 2.4.3 release was masked by stdout
buffering in unit tests, thanks for the prompt heads-up, Ned
Batchelder!
Version 2.4.3 - November, 2019
------------------------------
- Fixed a bug in ParserElement.__eq__ that would for some parsers
create a recursion error at parser definition time. Thanks to
Michael Clerx for the assist. (Addresses issue #123)
- Fixed bug in indentedBlock where a block that ended at the end
of the input string could cause pyparsing to loop forever. Raised
as part of discussion on StackOverflow with geckos.
- Backports from pyparsing 3.0.0:
. __diag__.enable_all_warnings()
. Fixed bug in PrecededBy which caused infinite recursion, issue #127
. support for using regex-compiled RE to construct Regex expressions
Version 2.4.2 - July, 2019
--------------------------
- Updated the shorthand notation that has been added for repetition
expressions: expr[min, max], with '...' valid as a min or max value:
- expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
- expr[1, ...] is equivalent to OneOrMore(expr)
- expr[n, ...] or expr[n,] is equivalent
to expr*n + ZeroOrMore(expr)
(read as "n or more instances of expr")
- expr[..., n] is equivalent to expr*(0, n)
- expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception
if more than n exprs exist in the input stream. If this
behavior is desired, then write expr[..., n] + ~expr.
Better interpretation of [...] as ZeroOrMore raised by crowsonkb,
thanks for keeping me in line!
If upgrading from 2.4.1 or 2.4.1.1 and you have used `expr[...]`
for `OneOrMore(expr)`, it must be updated to `expr[1, ...]`.
- The defaults on all the `__diag__` switches have been set to False,
to avoid getting alarming warnings. To use these diagnostics, set
them to True after importing pyparsing.
Example:
import pyparsing as pp
pp.__diag__.warn_multiple_tokens_in_named_alternation = True
- Fixed bug introduced by the use of __getitem__ for repetition,
overlooking Python's legacy implementation of iteration
by sequentially calling __getitem__ with increasing numbers until
getting an IndexError. Found during investigation of problem
reported by murlock, merci!
Version 2.4.2a1 - July, 2019
----------------------------
It turns out I got the meaning of `[...]` absolutely backwards,
so I've deleted 2.4.1 and am repushing this release as 2.4.2a1
for people to give it a try before I can call it ready to go.
The `expr[...]` notation was pushed out to be synonymous with
`OneOrMore(expr)`, but this is really counter to most Python
notations (and even other internal pyparsing notations as well).
It should have been defined to be equivalent to ZeroOrMore(expr).
- Changed [...] to emit ZeroOrMore instead of OneOrMore.
- Removed code that treats ParserElements like iterables.
- Change all __diag__ switches to False.
Version 2.4.1.1 - July 24, 2019
-------------------------------
This is a re-release of version 2.4.1 to restore the release history
in PyPI, since the 2.4.1 release was deleted.
There are 3 known issues in this release, which are fixed in
the upcoming 2.4.2:
- API change adding support for `expr[...]` - the original
code in 2.4.1 incorrectly implemented this as OneOrMore.
Code using this feature under this release should explicitly
use `expr[0, ...]` for ZeroOrMore and `expr[1, ...]` for
OneOrMore. In 2.4.2 you will be able to write `expr[...]`
equivalent to `ZeroOrMore(expr)`.
- Bug if composing And, Or, MatchFirst, or Each expressions
using an expression. This only affects code which uses
explicit expression construction using the And, Or, etc.
classes instead of using overloaded operators '+', '^', and
so on. If constructing an And using a single expression,
you may get an error that "cannot multiply ParserElement by
0 or (0, 0)" or a Python `IndexError`. Change code like
cmd = Or(Word(alphas))
to
cmd = Or([Word(alphas)])
(Note that this is not the recommended style for constructing
Or expressions.)
- Some newly-added `__diag__` switches are enabled by default,
which may give rise to noisy user warnings for existing parsers.
You can disable them using:
import pyparsing as pp
pp.__diag__.warn_multiple_tokens_in_named_alternation = False
pp.__diag__.warn_ungrouped_named_tokens_in_collection = False
pp.__diag__.warn_name_set_on_empty_Forward = False
pp.__diag__.warn_on_multiple_string_args_to_oneof = False
pp.__diag__.enable_debug_on_named_expressions = False
In 2.4.2 these will all be set to False by default.
Version 2.4.1 - July, 2019
--------------------------
- NOTE: Deprecated functions and features that will be dropped
in pyparsing 2.5.0 (planned next release):
. support for Python 2 - ongoing users running with
Python 2 can continue to use pyparsing 2.4.1
. ParseResults.asXML() - if used for debugging, switch
to using ParseResults.dump(); if used for data transfer,
use ParseResults.asDict() to convert to a nested Python
dict, which can then be converted to XML or JSON or
other transfer format
. operatorPrecedence synonym for infixNotation -
convert to calling infixNotation
. commaSeparatedList - convert to using
pyparsing_common.comma_separated_list
. upcaseTokens and downcaseTokens - convert to using
pyparsing_common.upcaseTokens and downcaseTokens
. __compat__.collect_all_And_tokens will not be settable to
False to revert to pre-2.3.1 results name behavior -
review use of names for MatchFirst and Or expressions
containing And expressions, as they will return the
complete list of parsed tokens, not just the first one.
Use __diag__.warn_multiple_tokens_in_named_alternation
(described below) to help identify those expressions
in your parsers that will have changed as a result.
- A new shorthand notation has been added for repetition
expressions: expr[min, max], with '...' valid as a min
or max value:
- expr[...] is equivalent to OneOrMore(expr)
- expr[0, ...] is equivalent to ZeroOrMore(expr)
- expr[1, ...] is equivalent to OneOrMore(expr)
- expr[n, ...] or expr[n,] is equivalent
to expr*n + ZeroOrMore(expr)
(read as "n or more instances of expr")
- expr[..., n] is equivalent to expr*(0, n)
- expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception
if more than n exprs exist in the input stream. If this
behavior is desired, then write expr[..., n] + ~expr.
- '...' can also be used as short hand for SkipTo when used
in adding parse expressions to compose an And expression.
Literal('start') + ... + Literal('end')
And(['start', ..., 'end'])
are both equivalent to:
Literal('start') + SkipTo('end')("_skipped*") + Literal('end')
The '...' form has the added benefit of not requiring repeating
the skip target expression. Note that the skipped text is
returned with '_skipped' as a results name, and that the contents of
`_skipped` will contain a list of text from all `...`s in the expression.
- '...' can also be used as a "skip forward in case of error" expression:
expr = "start" + (Word(nums).setName("int") | ...) + "end"
expr.parseString("start 456 end")
['start', '456', 'end']
expr.parseString("start 456 foo 789 end")
['start', '456', 'foo 789 ', 'end']
- _skipped: ['foo 789 ']
expr.parseString("start foo end")
['start', 'foo ', 'end']
- _skipped: ['foo ']
expr.parseString("start end")
['start', '', 'end']
- _skipped: ['missing <int>']
Note that in all the error cases, the '_skipped' results name is
present, showing a list of the extra or missing items.
This form is only valid when used with the '|' operator.
- Improved exception messages to show what was actually found, not
just what was expected.
word = pp.Word(pp.alphas)
pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)
Former exception message:
pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)
New exception message:
pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)
- Added diagnostic switches to help detect and warn about common
parser construction mistakes, or enable additional parse
debugging. Switches are attached to the pyparsing.__diag__
namespace object:
- warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results
name is defined on a MatchFirst or Or expression with one or more And subexpressions
(default=True)
- warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results
name is defined on a containing expression with ungrouped subexpressions that also
have results names (default=True)
- warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined
with a results name, but has no contents defined (default=False)
- warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is
incorrectly called with multiple str arguments (default=True)
- enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent
calls to ParserElement.setName() (default=False)
warn_multiple_tokens_in_named_alternation is intended to help
those who currently have set __compat__.collect_all_And_tokens to
False as a workaround for using the pre-2.3.1 code with named
MatchFirst or Or expressions containing an And expression.
- Added ParseResults.from_dict classmethod, to simplify creation
of a ParseResults with results names using a dict, which may be nested.
This makes it easy to add a sub-level of named items to the parsed
tokens in a parse action.
- Added asKeyword argument (default=False) to oneOf, to force
keyword-style matching on the generated expressions.
- ParserElement.runTests now accepts an optional 'file' argument to
redirect test output to a file-like object (such as a StringIO,
or opened file). Default is to write to sys.stdout.
- conditionAsParseAction is a helper method for constructing a
parse action method from a predicate function that simply
returns a boolean result. Useful for those places where a
predicate cannot be added using addCondition, but must be
converted to a parse action (such as in infixNotation). May be
used as a decorator if default message and exception types
can be used. See ParserElement.addCondition for more details
about the expected signature and behavior for predicate condition
methods.
- While investigating issue #93, I found that Or and
addCondition could interact to select an alternative that
is not the longest match. This is because Or first checks
all alternatives for matches without running attached
parse actions or conditions, orders by longest match, and
then rechecks for matches with conditions and parse actions.
Some expressions, when checking with conditions, may end
up matching on a shorter token list than originally matched,
but would be selected because of its original priority.
This matching code has been expanded to do more extensive
searching for matches when a second-pass check matches a
smaller list than in the first pass.
- Fixed issue #87, a regression in indented block.
Reported by Renz Bagaporo, who submitted a very nice repro
example, which makes the bug-fixing process a lot easier,
thanks!
- Fixed MemoryError issue #85 and #91 with str generation for
Forwards. Thanks decalage2 and Harmon758 for your patience.
- Modified setParseAction to accept None as an argument,
indicating that all previously-defined parse actions for the
expression should be cleared.
- Modified pyparsing_common.real and sci_real to parse reals
without leading integer digits before the decimal point,
consistent with Python real number formats. Original PR #98
submitted by ansobolev.
- Modified runTests to call postParse function before dumping out
the parsed results - allows for postParse to add further results,
such as indications of additional validation success/failure.
- Updated statemachine example: refactored state transitions to use
overridden classmethods; added <statename>Mixin class to simplify
definition of application classes that "own" the state object and
delegate to it to model state-specific properties and behavior.
- Added example nested_markup.py, showing a simple wiki markup with
nested markup directives, and illustrating the use of '...' for
skipping over input to match the next expression. (This example
uses syntax that is not valid under Python 2.)
- Rewrote delta_time.py example (renamed from deltaTime.py) to
fix some omitted formats and upgrade to latest pyparsing idioms,
beginning with writing an actual BNF.
- With the help and encouragement from several contributors, including
Matěj Cepl and Cengiz Kaygusuz, I've started cleaning up the internal
coding styles in core pyparsing, bringing it up to modern coding
practices from pyparsing's early development days dating back to
2003. Whitespace has been largely standardized along PEP8 guidelines,
removing extra spaces around parentheses, and adding them around
arithmetic operators and after colons and commas. I was going to hold
off on doing this work until after 2.4.1, but after cleaning up a
few trial classes, the difference was so significant that I continued
on to the rest of the core code base. This should facilitate future
work and submitted PRs, allowing them to focus on substantive code
changes, and not get sidetracked by whitespace issues.
Version 2.4.0 - April, 2019
---------------------------
- Well, it looks like the API change that was introduced in 2.3.1 was more
drastic than expected, so for a friendlier forward upgrade path, this
release:
. Bumps the current version number to 2.4.0, to reflect this
incompatible change.
. Adds a pyparsing.__compat__ object for specifying compatibility with
future breaking changes.
. Conditionalizes the API-breaking behavior, based on the value
pyparsing.__compat__.collect_all_And_tokens. By default, this value
will be set to True, reflecting the new bugfixed behavior. To set this
value to False, add to your code:
import pyparsing
pyparsing.__compat__.collect_all_And_tokens = False
. User code that is dependent on the pre-bugfix behavior can restore
it by setting this value to False.
In 2.5 and later versions, the conditional code will be removed and
setting the flag to True or False in these later versions will have no
effect.
- Updated unitTests.py and simple_unit_tests.py to be compatible with
"python setup.py test". To run tests using setup, do:
python setup.py test
python setup.py test -s unitTests.suite
python setup.py test -s simple_unit_tests.suite
Prompted by issue #83 and PR submitted by bdragon28, thanks.
- Fixed bug in runTests handling '\n' literals in quoted strings.
- Added tag_body attribute to the start tag expressions generated by
makeHTMLTags, so that you can avoid using SkipTo to roll your own
tag body expression:
a, aEnd = pp.makeHTMLTags('a')
link = a + a.tag_body("displayed_text") + aEnd
for t in s.searchString(html_page):
print(t.displayed_text, '->', t.startA.href)
- indentedBlock failure handling was improved; PR submitted by TMiguelT,
thanks!
- Address Py2 incompatibility in simpleUnitTests, plus explain() and
Forward str() cleanup; PRs graciously provided by eswald.
- Fixed docstring with embedded '\w', which creates SyntaxWarnings in
Py3.8, issue #80.
- Examples:
- Added example parser for rosettacode.org tutorial compiler.
- Added example to show how an HTML table can be parsed into a
collection of Python lists or dicts, one per row.
- Updated SimpleSQL.py example to handle nested selects, reworked
'where' expression to use infixNotation.
- Added include_preprocessor.py, similar to macroExpander.py.
- Examples using makeHTMLTags use new tag_body expression when
retrieving a tag's body text.
- Updated examples that are runnable as unit tests:
python setup.py test -s examples.antlr_grammar_tests
python setup.py test -s examples.test_bibparse
Version 2.3.1 - January, 2019
-----------------------------
- POSSIBLE API CHANGE: this release fixes a bug when results names were
attached to a MatchFirst or Or object containing an And object.
Previously, a results name on an And object within an enclosing MatchFirst
or Or could return just the first token in the And. Now, all the tokens
matched by the And are correctly returned. This may result in subtle
changes in the tokens returned if you have this condition in your pyparsing
scripts.
- New staticmethod ParseException.explain() to help diagnose parse exceptions
by showing the failing input line and the trace of ParserElements in
the parser leading up to the exception. explain() returns a multiline
string listing each element by name. (This is still an experimental
method, and the method signature and format of the returned string may
evolve over the next few releases.)
Example:
# define a parser to parse an integer followed by an
# alphabetic word
expr = pp.Word(pp.nums).setName("int")
+ pp.Word(pp.alphas).setName("word")
try:
# parse a string with a numeric second value instead of alpha
expr.parseString("123 355")
except pp.ParseException as pe:
print(pp.ParseException.explain(pe))
Prints:
123 355
^
ParseException: Expected word (at char 4), (line:1, col:5)
__main__.ExplainExceptionTest
pyparsing.And - {int word}
pyparsing.Word - word
explain() will accept any exception type and will list the function
names and parse expressions in the stack trace. This is especially
useful when an exception is raised in a parse action.
Note: explain() is only supported under Python 3.
- Fix bug in dictOf which could match an empty sequence, making it
infinitely loop if wrapped in a OneOrMore.
- Added unicode sets to pyparsing_unicode for Latin-A and Latin-B ranges.
- Added ability to define custom unicode sets as combinations of other sets
using multiple inheritance.
class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
pass
turkish_word = pp.Word(Turkish_set.alphas)
- Updated state machine import examples, with state machine demos for:
. traffic light
. library book checkin/checkout
. document review/approval
In the traffic light example, you can use the custom 'statemachine' keyword
to define the states for a traffic light, and have the state classes
auto-generated for you:
statemachine TrafficLightState:
Red -> Green
Green -> Yellow
Yellow -> Red
Similar for state machines with named transitions, like the library book
state example:
statemachine LibraryBookState:
New -(shelve)-> Available
Available -(reserve)-> OnHold
OnHold -(release)-> Available
Available -(checkout)-> CheckedOut
CheckedOut -(checkin)-> Available
Once the classes are defined, then additional Python code can reference those
classes to add class attributes, instance methods, etc.
See the examples in examples/statemachine
- Added an example parser for the decaf language. This language is used in
CS compiler classes in many colleges and universities.
- Fixup of docstrings to Sphinx format, inclusion of test files in the source
package, and convert markdown to rst throughout the distribution, great job
by Matěj Cepl!
- Expanded the whitespace characters recognized by the White class to include
all unicode defined spaces. Suggested in Issue #51 by rtkjbillo.
- Added optional postParse argument to ParserElement.runTests() to add a
custom callback to be called for test strings that parse successfully. Useful
for running tests that do additional validation or processing on the parsed
results. See updated chemicalFormulas.py example.
- Removed distutils fallback in setup.py. If installing the package fails,
please update to the latest version of setuptools. Plus overall project code
cleanup (CRLFs, whitespace, imports, etc.), thanks Jon Dufresne!
- Fix bug in CaselessKeyword, to make its behavior consistent with
Keyword(caseless=True). Fixes Issue #65 reported by telesphore.
Version 2.3.0 - October, 2018
-----------------------------
- NEW SUPPORT FOR UNICODE CHARACTER RANGES
This release introduces the pyparsing_unicode namespace class, defining
a series of language character sets to simplify the definition of alphas,
nums, alphanums, and printables in the following language sets:
. Arabic
. Chinese
. Cyrillic
. Devanagari
. Greek
. Hebrew
. Japanese (including Kanji, Katakana, and Hirigana subsets)
. Korean
. Latin1 (includes 7 and 8-bit Latin characters)
. Thai
. CJK (combination of Chinese, Japanese, and Korean sets)
For example, your code can define words using:
korean_word = Word(pyparsing_unicode.Korean.alphas)
See their use in the updated examples greetingInGreek.py and
greetingInKorean.py.
This namespace class also offers access to these sets using their
unicode identifiers.
- POSSIBLE API CHANGE: Fixed bug where a parse action that explicitly
returned the input ParseResults could add another nesting level in
the results if the current expression had a results name.
vals = pp.OneOrMore(pp.pyparsing_common.integer)("int_values")
def add_total(tokens):
tokens['total'] = sum(tokens)
return tokens # this line can be removed
vals.addParseAction(add_total)
print(vals.parseString("244 23 13 2343").dump())
Before the fix, this code would print (note the extra nesting level):
[244, 23, 13, 2343]
- int_values: [244, 23, 13, 2343]
- int_values: [244, 23, 13, 2343]
- total: 2623
- total: 2623
With the fix, this code now prints:
[244, 23, 13, 2343]
- int_values: [244, 23, 13, 2343]
- total: 2623
This fix will change the structure of ParseResults returned if a
program defines a parse action that returns the tokens that were
sent in. This is not necessary, and statements like "return tokens"
in the example above can be safely deleted prior to upgrading to
this release, in order to avoid the bug and get the new behavior.
Reported by seron in Issue #22, nice catch!
- POSSIBLE API CHANGE: Fixed a related bug where a results name
erroneously created a second level of hierarchy in the returned
ParseResults. The intent for accumulating results names into ParseResults
is that, in the absence of Group'ing, all names get merged into a
common namespace. This allows us to write:
key_value_expr = (Word(alphas)("key") + '=' + Word(nums)("value"))
result = key_value_expr.parseString("a = 100")
and have result structured as {"key": "a", "value": "100"}
instead of [{"key": "a"}, {"value": "100"}].
However, if a named expression is used in a higher-level non-Group
expression that *also* has a name, a false sub-level would be created
in the namespace:
num = pp.Word(pp.nums)
num_pair = ("[" + (num("A") + num("B"))("values") + "]")
U = num_pair.parseString("[ 10 20 ]")
print(U.dump())
Since there is no grouping, "A", "B", and "values" should all appear
at the same level in the results, as:
['[', '10', '20', ']']
- A: '10'
- B: '20'
- values: ['10', '20']
Instead, an extra level of "A" and "B" show up under "values":
['[', '10', '20', ']']
- A: '10'
- B: '20'
- values: ['10', '20']
- A: '10'
- B: '20'
This bug has been fixed. Now, if this hierarchy is desired, then a
Group should be added:
num_pair = ("[" + pp.Group(num("A") + num("B"))("values") + "]")
Giving:
['[', ['10', '20'], ']']
- values: ['10', '20']
- A: '10'
- B: '20'
But in no case should "A" and "B" appear in multiple levels. This bug-fix
fixes that.
If you have current code which relies on this behavior, then add or remove
Groups as necessary to get your intended results structure.
Reported by Athanasios Anastasiou.
- IndexError's raised in parse actions will get explicitly reraised
as ParseExceptions that wrap the original IndexError. Since
IndexError sometimes occurs as part of pyparsing's normal parsing
logic, IndexErrors that are raised during a parse action may have
gotten silently reinterpreted as parsing errors. To retain the
information from the IndexError, these exceptions will now be
raised as ParseExceptions that reference the original IndexError.
This wrapping will only be visible when run under Python3, since it
emulates "raise ... from ..." syntax.
Addresses Issue #4, reported by guswns0528.
- Added Char class to simplify defining expressions of a single
character. (Char("abc") is equivalent to Word("abc", exact=1))
- Added class PrecededBy to perform lookbehind tests. PrecededBy is
used in the same way as FollowedBy, passing in an expression that
must occur just prior to the current parse location.
For fixed-length expressions like a Literal, Keyword, Char, or a
Word with an `exact` or `maxLen` length given, `PrecededBy(expr)`
is sufficient. For varying length expressions like a Word with no
given maximum length, `PrecededBy` must be constructed with an
integer `retreat` argument, as in
`PrecededBy(Word(alphas, nums), retreat=10)`, to specify the maximum
number of characters pyparsing must look backward to make a match.
pyparsing will check all the values from 1 up to retreat characters
back from the current parse location.
When stepping backwards through the input string, PrecededBy does
*not* skip over whitespace.
PrecededBy can be created with a results name so that, even though
it always returns an empty parse result, the result *can* include
named results.
Idea first suggested in Issue #30 by Freakwill.
- Updated FollowedBy to accept expressions that contain named results,
so that results names defined in the lookahead expression will be
returned, even though FollowedBy always returns an empty list.
Inspired by the same feature implemented in PrecededBy.
Version 2.2.2 - September, 2018
-------------------------------
- Fixed bug in SkipTo, if a SkipTo expression that was skipping to
an expression that returned a list (such as an And), and the
SkipTo was saved as a named result, the named result could be
saved as a ParseResults - should always be saved as a string.
Issue #28, reported by seron.
- Added simple_unit_tests.py, as a collection of easy-to-follow unit
tests for various classes and features of the pyparsing library.
Primary intent is more to be instructional than actually rigorous
testing. Complex tests can still be added in the unitTests.py file.
- New features added to the Regex class:
- optional asGroupList parameter, returns all the capture groups as
a list
- optional asMatch parameter, returns the raw re.match result
- new sub(repl) method, which adds a parse action calling
re.sub(pattern, repl, parsed_result). Simplifies creating
Regex expressions to be used with transformString. Like re.sub,
repl may be an ordinary string (similar to using pyparsing's
replaceWith), or may contain references to capture groups by group
number, or may be a callable that takes an re match group and
returns a string.
For instance:
expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
expr.transformString("h1: This is the title")
will return
<h1>This is the title</h1>
- Fixed omission of LICENSE file in source tarball, also added
CODE_OF_CONDUCT.md per GitHub community standards.
Version 2.2.1 - September, 2018
-------------------------------
- Applied changes necessary to migrate hosting of pyparsing source
over to GitHub. Many thanks for help and contributions from hugovk,
jdufresne, and cngkaygusuz among others through this transition,
sorry it took me so long!
- Fixed import of collections.abc to address DeprecationWarnings
in Python 3.7.
- Updated oc.py example to support function calls in arithmetic
expressions; fixed regex for '==' operator; and added packrat
parsing. Raised on the pyparsing wiki by Boris Marin, thanks!
- Fixed bug in select_parser.py example, group_by_terms was not
reported. Reported on SF bugs by Adam Groszer, thanks Adam!
- Added "Getting Started" section to the module docstring, to
guide new users to the most common starting points in pyparsing's
API.
- Fixed bug in Literal and Keyword classes, which erroneously
raised IndexError instead of ParseException.
Version 2.2.0 - March, 2017
---------------------------
- Bumped minor version number to reflect compatibility issues with
OneOrMore and ZeroOrMore bugfixes in 2.1.10. (2.1.10 fixed a bug
that was introduced in 2.1.4, but the fix could break code
written against 2.1.4 - 2.1.9.)
- Updated setup.py to address recursive import problems now
that pyparsing is part of 'packaging' (used by setuptools).
Patch submitted by Joshua Root, much thanks!
- Fixed KeyError issue reported by Yann Bizeul when using packrat
parsing in the Graphite time series database, thanks Yann!
- Fixed incorrect usages of '\' in literals, as described in
https://docs.python.org/3/whatsnew/3.6.html#deprecated-python-behavior
Patch submitted by Ville Skyttä - thanks!
- Minor internal change when using '-' operator, to be compatible
with ParserElement.streamline() method.
- Expanded infixNotation to accept a list or tuple of parse actions
to attach to an operation.
- New unit test added for dill support for storing pyparsing parsers.
Ordinary Python pickle can be used to pickle pyparsing parsers as
long as they do not use any parse actions. The 'dill' module is an
extension to pickle which *does* support pickling of attached
parse actions.
Version 2.1.10 - October, 2016
-------------------------------
- Fixed bug in reporting named parse results for ZeroOrMore
expressions, thanks Ethan Nash for reporting this!
- Fixed behavior of LineStart to be much more predictable.
LineStart can now be used to detect if the next parse position
is col 1, factoring in potential leading whitespace (which would
cause LineStart to fail). Also fixed a bug in col, which is
used in LineStart, where '\n's were erroneously considered to
be column 1.
- Added support for multiline test strings in runTests.
- Fixed bug in ParseResults.dump when keys were not strings.
Also changed display of string values to show them in quotes,
to help distinguish parsed numeric strings from parsed integers
that have been converted to Python ints.
Version 2.1.9 - September, 2016
-------------------------------
- Added class CloseMatch, a variation on Literal which matches
"close" matches, that is, strings with at most 'n' mismatching
characters.
- Fixed bug in Keyword.setDefaultKeywordChars(), reported by Kobayashi
Shinji - nice catch, thanks!
- Minor API change in pyparsing_common. Renamed some of the common
expressions to PEP8 format (to be consistent with the other
pyparsing_common expressions):
. signedInteger -> signed_integer
. sciReal -> sci_real
Also, in trying to stem the API bloat of pyparsing, I've copied
some of the global expressions and helper parse actions into
pyparsing_common, with the originals to be deprecated and removed
in a future release:
. commaSeparatedList -> pyparsing_common.comma_separated_list
. upcaseTokens -> pyparsing_common.upcaseTokens
. downcaseTokens -> pyparsing_common.downcaseTokens
(I don't expect any other expressions, like the comment expressions,
quotedString, or the Word-helping strings like alphas, nums, etc.
to migrate to pyparsing_common - they are just too pervasive. As for
the PEP8 vs camelCase naming, all the expressions are PEP8, while
the parse actions in pyparsing_common are still camelCase. It's a
small step - when pyparsing 3.0 comes around, everything will change
to PEP8 snake case.)
- Fixed Python3 compatibility bug when using dict keys() and values()
in ParseResults.getName().
- After some prodding, I've reworked the unitTests.py file for
pyparsing over the past few releases. It uses some variations on
unittest to handle my testing style. The test now:
. auto-discovers its test classes (while maintining their order
of definition)
. suppresses voluminous 'print' output for tests that pass
Version 2.1.8 - August, 2016
----------------------------
- Fixed issue in the optimization to _trim_arity, when the full
stacktrace is retrieved to determine if a TypeError is raised in
pyparsing or in the caller's parse action. Code was traversing
the full stacktrace, and potentially encountering UnicodeDecodeError.
- Fixed bug in ParserElement.inlineLiteralsUsing, causing infinite
loop with Suppress.
- Fixed bug in Each, when merging named results from multiple
expressions in a ZeroOrMore or OneOrMore. Also fixed bug when
ZeroOrMore expressions were erroneously treated as required
expressions in an Each expression.
- Added a few more inline doc examples.
- Improved use of runTests in several example scripts.
Version 2.1.7 - August, 2016
----------------------------
- Fixed regression reported by Andrea Censi (surfaced in PyContracts
tests) when using ParseSyntaxExceptions (raised when using operator '-')
with packrat parsing.
- Minor fix to oneOf, to accept all iterables, not just space-delimited
strings and lists. (If you have a list or set of strings, it is
not necessary to concat them using ' '.join to pass them to oneOf,
oneOf will accept the list or set or generator directly.)
Version 2.1.6 - August, 2016
----------------------------
- *Major packrat upgrade*, inspired by patch provided by Tal Einat -
many, many, thanks to Tal for working on this! Tal's tests show
faster parsing performance (2X in some tests), *and* memory reduction
from 3GB down to ~100MB! Requires no changes to existing code using
packratting. (Uses OrderedDict, available in Python 2.7 and later.
For Python 2.6 users, will attempt to import from ordereddict
backport. If not present, will implement pure-Python Fifo dict.)
- Minor API change - to better distinguish between the flexible
numeric types defined in pyparsing_common, I've changed "numeric"
(which parsed numbers of different types and returned int for ints,
float for floats, etc.) and "number" (which parsed numbers of int
or float type, and returned all floats) to "number" and "fnumber"
respectively. I hope the "f" prefix of "fnumber" will be a better
indicator of its internal conversion of parsed values to floats,
while the generic "number" is similar to the flexible number syntax
in other languages. Also fixed a bug in pyparsing_common.numeric
(now renamed to pyparsing_common.number), integers were parsed and
returned as floats instead of being retained as ints.
- Fixed bug in upcaseTokens and downcaseTokens introduced in 2.1.5,
when the parse action was used in conjunction with results names.
Reported by Steven Arcangeli from the dql project, thanks for your
patience, Steven!
- Major change to docs! After seeing some comments on reddit about
general issue with docs of Python modules, and thinking that I'm a
little overdue in doing some doc tuneup on pyparsing, I decided to
following the suggestions of the redditor and add more inline examples
to the pyparsing reference documentation. I hope this addition
will clarify some of the more common questions people have, especially
when first starting with pyparsing/Python.
- Deprecated ParseResults.asXML. I've never been too happy with this
method, and it usually forces some unnatural code in the parsers in
order to get decent tag names. The amount of guesswork that asXML
has to do to try to match names with values should have been a red
flag from day one. If you are using asXML, you will need to implement
your own ParseResults->XML serialization. Or consider migrating to
a more current format such as JSON (which is very easy to do:
results_as_json = json.dumps(parse_result.asDict()) Hopefully, when
I remove this code in a future version, I'll also be able to simplify
some of the craziness in ParseResults, which IIRC was only there to try
to make asXML work.
- Updated traceParseAction parse action decorator to show the repr
of the input and output tokens, instead of the str format, since
str has been simplified to just show the token list content.
(The change to ParseResults.__str__ occurred in pyparsing 2.0.4, but
it seems that didn't make it into the release notes - sorry! Too
many users, especially beginners, were confused by the
"([token_list], {names_dict})" str format for ParseResults, thinking
they were getting a tuple containing a list and a dict. The full form
can be seen if using repr().)
For tracing tokens in and out of parse actions, the more complete
repr form provides important information when debugging parse actions.
Version 2.1.5 - June, 2016
------------------------------
- Added ParserElement.split() generator method, similar to re.split().
Includes optional arguments maxsplit (to limit the number of splits),
and includeSeparators (to include the separating matched text in the
returned output, default=False).
- Added a new parse action construction helper tokenMap, which will
apply a function and optional arguments to each element in a
ParseResults. So this parse action:
def lowercase_all(tokens):
return [str(t).lower() for t in tokens]
OneOrMore(Word(alphas)).setParseAction(lowercase_all)
can now be written:
OneOrMore(Word(alphas)).setParseAction(tokenMap(str.lower))
Also simplifies writing conversion parse actions like:
integer = Word(nums).setParseAction(lambda t: int(t[0]))
to just:
integer = Word(nums).setParseAction(tokenMap(int))
If additional arguments are necessary, they can be included in the
call to tokenMap, as in:
hex_integer = Word(hexnums).setParseAction(tokenMap(int, 16))
- Added more expressions to pyparsing_common:
. IPv4 and IPv6 addresses (including long, short, and mixed forms
of IPv6)
. MAC address
. ISO8601 date and date time strings (with named fields for year, month, etc.)
. UUID (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
. hex integer (returned as int)
. fraction (integer '/' integer, returned as float)
. mixed integer (integer '-' fraction, or just fraction, returned as float)
. stripHTMLTags (parse action to remove tags from HTML source)
. parse action helpers convertToDate and convertToDatetime to do custom parse
time conversions of parsed ISO8601 strings
- runTests now returns a two-tuple: success if all tests succeed,
and an output list of each test and its output lines.
- Added failureTests argument (default=False) to runTests, so that
tests can be run that are expected failures, and runTests' success
value will return True only if all tests *fail* as expected. Also,
parseAll now defaults to True.
- New example numerics.py, shows samples of parsing integer and real
numbers using locale-dependent formats:
4.294.967.295,000
4 294 967 295,000
4,294,967,295.000
Version 2.1.4 - May, 2016
------------------------------
- Split out the '==' behavior in ParserElement, now implemented
as the ParserElement.matches() method. Using '==' for string test
purposes will be removed in a future release.
- Expanded capabilities of runTests(). Will now accept embedded
comments (default is Python style, leading '#' character, but
customizable). Comments will be emitted along with the tests and
test output. Useful during test development, to create a test string
consisting only of test case description comments separated by
blank lines, and then fill in the test cases. Will also highlight
ParseFatalExceptions with "(FATAL)".
- Added a 'pyparsing_common' class containing common/helpful little
expressions such as integer, float, identifier, etc. I used this
class as a sort of embedded namespace, to contain these helpers
without further adding to pyparsing's namespace bloat.
- Minor enhancement to traceParseAction decorator, to retain the
parse action's name for the trace output.
- Added optional 'fatal' keyword arg to addCondition, to indicate that
a condition failure should halt parsing immediately.
Version 2.1.3 - May, 2016
------------------------------
- _trim_arity fix in 2.1.2 was very version-dependent on Py 3.5.0.
Now works for Python 2.x, 3.3, 3.4, 3.5.0, and 3.5.1 (and hopefully
beyond).
Version 2.1.2 - May, 2016
------------------------------
- Fixed bug in _trim_arity when pyparsing code is included in a
PyInstaller, reported by maluwa.
- Fixed catastrophic regex backtracking in implementation of the
quoted string expressions (dblQuotedString, sglQuotedString, and
quotedString). Reported on the pyparsing wiki by webpentest,
good catch! (Also tuned up some other expressions susceptible to the
same backtracking problem, such as cStyleComment, cppStyleComment,
etc.)
Version 2.1.1 - March, 2016
---------------------------
- Added support for assigning to ParseResults using slices.
- Fixed bug in ParseResults.toDict(), in which dict values were always
converted to dicts, even if they were just unkeyed lists of tokens.
Reported on SO by Gerald Thibault, thanks Gerald!
- Fixed bug in SkipTo when using failOn, reported by robyschek, thanks!
- Fixed bug in Each introduced in 2.1.0, reported by AND patch and
unit test submitted by robyschek, well done!
- Removed use of functools.partial in replaceWith, as this creates
an ambiguous signature for the generated parse action, which fails in
PyPy. Reported by Evan Hubinger, thanks Evan!
- Added default behavior to QuotedString to convert embedded '\t', '\n',
etc. characters to their whitespace counterparts. Found during Q&A
exchange on SO with Maxim.
Version 2.1.0 - February, 2016
------------------------------
- Modified the internal _trim_arity method to distinguish between
TypeError's raised while trying to determine parse action arity and
those raised within the parse action itself. This will clear up those
confusing "<lambda>() takes exactly 1 argument (0 given)" error
messages when there is an actual TypeError in the body of the parse
action. Thanks to all who have raised this issue in the past, and
most recently to Michael Cohen, who sent in a proposed patch, and got
me to finally tackle this problem.
- Added compatibility for pickle protocols 2-4 when pickling ParseResults.
In Python 2.x, protocol 0 was the default, and protocol 2 did not work.
In Python 3.x, protocol 3 is the default, so explicitly naming
protocol 0 or 1 was required to pickle ParseResults. With this release,
all protocols 0-4 are supported. Thanks for reporting this on StackOverflow,
Arne Wolframm, and for providing a nice simple test case!
- Added optional 'stopOn' argument to ZeroOrMore and OneOrMore, to
simplify breaking on stop tokens that would match the repetition
expression.
It is a common problem to fail to look ahead when matching repetitive
tokens if the sentinel at the end also matches the repetition
expression, as when parsing "BEGIN aaa bbb ccc END" with:
"BEGIN" + OneOrMore(Word(alphas)) + "END"
Since "END" matches the repetition expression "Word(alphas)", it will
never get parsed as the terminating sentinel. Up until now, this has
to be resolved by the user inserting their own negative lookahead:
"BEGIN" + OneOrMore(~Literal("END") + Word(alphas)) + "END"
Using stopOn, they can more easily write:
"BEGIN" + OneOrMore(Word(alphas), stopOn="END") + "END"
The stopOn argument can be a literal string or a pyparsing expression.
Inspired by a question by Lamakaha on StackOverflow (and many previous
questions with the same negative-lookahead resolution).
- Added expression names for many internal and builtin expressions, to
reduce name and error message overhead during parsing.
- Converted helper lambdas to functions to refactor and add docstring
support.
- Fixed ParseResults.asDict() to correctly convert nested ParseResults
values to dicts.
- Cleaned up some examples, fixed typo in fourFn.py identified by
aristotle2600 on reddit.
- Removed keepOriginalText helper method, which was deprecated ages ago.
Superceded by originalTextFor.
- Same for the Upcase class, which was long ago deprecated and replaced
with the upcaseTokens method.
Version 2.0.7 - December, 2015
------------------------------
- Simplified string representation of Forward class, to avoid memory
and performance errors while building ParseException messages. Thanks,
Will McGugan, Andrea Censi, and Martijn Vermaat for the bug reports and
test code.
- Cleaned up additional issues from enhancing the error messages for
Or and MatchFirst, handling Unicode values in expressions. Fixes Unicode
encoding issues in Python 2, thanks to Evan Hubinger for the bug report.
- Fixed implementation of dir() for ParseResults - was leaving out all the
defined methods and just adding the custom results names.
- Fixed bug in ignore() that was introduced in pyparsing 1.5.3, that would
not accept a string literal as the ignore expression.
- Added new example parseTabularData.py to illustrate parsing of data
formatted in columns, with detection of empty cells.
- Updated a number of examples to more current Python and pyparsing
forms.
Version 2.0.6 - November, 2015
------------------------------
- Fixed a bug in Each when multiple Optional elements are present.
Thanks for reporting this, whereswalden on SO.
- Fixed another bug in Each, when Optional elements have results names
or parse actions, reported by Max Rothman - thank you, Max!
- Added optional parseAll argument to runTests, whether tests should
require the entire input string to be parsed or not (similar to
parseAll argument to parseString). Plus a little neaten-up of the
output on Python 2 (no stray ()'s).
- Modified exception messages from MatchFirst and Or expressions. These
were formerly misleading as they would only give the first or longest
exception mismatch error message. Now the error message includes all
the alternatives that were possible matches. Originally proposed by
a pyparsing user, but I've lost the email thread - finally figured out
a fairly clean way to do this.
- Fixed a bug in Or, when a parse action on an alternative raises an
exception, other potentially matching alternatives were not always tried.
Reported by TheVeryOmni on the pyparsing wiki, thanks!
- Fixed a bug to dump() introduced in 2.0.4, where list values were shown
in duplicate.
Version 2.0.5 - October, 2015
-----------------------------
- (&$(@#&$(@!!!! Some "print" statements snuck into pyparsing v2.0.4,
breaking Python 3 compatibility! Fixed. Reported by jenshn, thanks!
Version 2.0.4 - October, 2015
-----------------------------
- Added ParserElement.addCondition, to simplify adding parse actions
that act primarily as filters. If the given condition evaluates False,
pyparsing will raise a ParseException. The condition should be a method
with the same method signature as a parse action, but should return a
boolean. Suggested by Victor Porton, nice idea Victor, thanks!
- Slight mod to srange to accept unicode literals for the input string,
such as "[а-яА-Я]" instead of "[\u0430-\u044f\u0410-\u042f]". Thanks
to Alexandr Suchkov for the patch!
- Enhanced implementation of replaceWith.
- Fixed enhanced ParseResults.dump() method when the results consists
only of an unnamed array of sub-structure results. Reported by Robin
Siebler, thanks for your patience and persistence, Robin!
- Fixed bug in fourFn.py example code, where pi and e were defined using
CaselessLiteral instead of CaselessKeyword. This was not a problem until
adding a new function 'exp', and the leading 'e' of 'exp' was accidentally
parsed as the mathematical constant 'e'. Nice catch, Tom Grydeland - thanks!
- Adopt new-fangled Python features, like decorators and ternary expressions,
per suggestions from Williamzjc - thanks William! (Oh yeah, I'm not
supporting Python 2.3 with this code any more...) Plus, some additional
code fixes/cleanup - thanks again!
- Added ParserElement.runTests, a little test bench for quickly running
an expression against a list of sample input strings. Basically, I got
tired of writing the same test code over and over, and finally added it
as a test point method on ParserElement.
- Added withClass helper method, a simplified version of withAttribute for
the common but annoying case when defining a filter on a div's class -
made difficult because 'class' is a Python reserved word.
Version 2.0.3 - October, 2014
-----------------------------
- Fixed escaping behavior in QuotedString. Formerly, only quotation
marks (or characters designated as quotation marks in the QuotedString
constructor) would be escaped. Now all escaped characters will be
escaped, and the escaping backslashes will be removed.
- Fixed regression in ParseResults.pop() - pop() was pretty much
broken after I added *improvements* in 2.0.2. Reported by Iain
Shelvington, thanks Iain!
- Fixed bug in And class when initializing using a generator.
- Enhanced ParseResults.dump() method to list out nested ParseResults that
are unnamed arrays of sub-structures.
- Fixed UnboundLocalError under Python 3.4 in oneOf method, reported
on Sourceforge by aldanor, thanks!
- Fixed bug in ParseResults __init__ method, when returning non-ParseResults
types from parse actions that implement __eq__. Raised during discussion
on the pyparsing wiki with cyrfer.
Version 2.0.2 - April, 2014
---------------------------
- Extended "expr(name)" shortcut (same as "expr.setResultsName(name)")
to accept "expr()" as a shortcut for "expr.copy()".
- Added "locatedExpr(expr)" helper, to decorate any returned tokens
with their location within the input string. Adds the results names
locn_start and locn_end to the output parse results.
- Added "pprint()" method to ParseResults, to simplify troubleshooting
and prettified output. Now instead of importing the pprint module
and then writing "pprint.pprint(result)", you can just write
"result.pprint()". This method also accepts additional positional and
keyword arguments (such as indent, width, etc.), which get passed
through directly to the pprint method
(see https://docs.python.org/2/library/pprint.html#pprint.pprint).
- Removed deprecation warnings when using '<<' for Forward expression
assignment. '<<=' is still preferred, but '<<' will be retained
for cases where '<<=' operator is not suitable (such as in defining
lambda expressions).
- Expanded argument compatibility for classes and functions that
take list arguments, to now accept generators as well.
- Extended list-like behavior of ParseResults, adding support for
append and extend. NOTE: if you have existing applications using
these names as results names, you will have to access them using
dict-style syntax: res["append"] and res["extend"]
- ParseResults emulates the change in list vs. iterator semantics for
methods like keys(), values(), and items(). Under Python 2.x, these
methods will return lists, under Python 3.x, these methods will
return iterators.
- ParseResults now has a method haskeys() which returns True or False
depending on whether any results names have been defined. This simplifies
testing for the existence of results names under Python 3.x, which
returns keys() as an iterator, not a list.
- ParseResults now supports both list and dict semantics for pop().
If passed no argument or an integer argument, it will use list semantics
and pop tokens from the list of parsed tokens. If passed a non-integer
argument (most likely a string), it will use dict semantics and
pop the corresponding value from any defined results names. A
second default return value argument is supported, just as in
dict.pop().
- Fixed bug in markInputline, thanks for reporting this, Matt Grant!
- Cleaned up my unit test environment, now runs with Python 2.6 and
3.3.
Version 2.0.1 - July, 2013
--------------------------
- Removed use of "nonlocal" that prevented using this version of
pyparsing with Python 2.6 and 2.7. This will make it easier to
install for packages that depend on pyparsing, under Python
versions 2.6 and later. Those using older versions of Python
will have to manually install pyparsing 1.5.7.
- Fixed implementation of <<= operator to return self; reported by
Luc J. Bourhis, with patch fix by Mathias Mamsch - thanks, Luc
and Mathias!
Version 2.0.0 - November, 2012
------------------------------
- Rather than release another combined Python 2.x/3.x release
I've decided to start a new major version that is only
compatible with Python 3.x (and consequently Python 2.7 as
well due to backporting of key features). This version will
be the main development path from now on, with little follow-on
development on the 1.5.x path.
- Operator '<<' is now deprecated, in favor of operator '<<=' for
attaching parsing expressions to Forward() expressions. This is
being done to address precedence of operations problems with '<<'.
Operator '<<' will be removed in a future version of pyparsing.
Version 1.5.7 - November, 2012
-----------------------------
- NOTE: This is the last release of pyparsing that will try to
maintain compatibility with Python versions < 2.6. The next
release of pyparsing will be version 2.0.0, using new Python
syntax that will not be compatible for Python version 2.5 or
older.
- An awesome new example is included in this release, submitted
by Luca DellOlio, for parsing ANTLR grammar definitions, nice
work Luca!
- Fixed implementation of ParseResults.__str__ to use Pythonic
''.join() instead of repeated string concatenation. This
purportedly has been a performance issue under PyPy.
- Fixed bug in ParseResults.__dir__ under Python 3, reported by
Thomas Kluyver, thank you Thomas!
- Added ParserElement.inlineLiteralsUsing static method, to
override pyparsing's default behavior of converting string
literals to Literal instances, to use other classes (such
as Suppress or CaselessLiteral).
- Added new operator '<<=', which will eventually replace '<<' for
storing the contents of a Forward(). '<<=' does not have the same
operator precedence problems that '<<' does.
- 'operatorPrecedence' is being renamed 'infixNotation' as a better
description of what this helper function creates. 'operatorPrecedence'
is deprecated, and will be dropped entirely in a future release.
- Added optional arguments lpar and rpar to operatorPrecedence, so that
expressions that use it can override the default suppression of the
grouping characters.
- Added support for using single argument builtin functions as parse
actions. Now you can write 'expr.setParseAction(len)' and get back
the length of the list of matched tokens. Supported builtins are:
sum, len, sorted, reversed, list, tuple, set, any, all, min, and max.
A script demonstrating this feature is included in the examples
directory.
- Improved linking in generated docs, proposed on the pyparsing wiki
by techtonik, thanks!
- Fixed a bug in the definition of 'alphas', which was based on the
string.uppercase and string.lowercase "constants", which in fact
*aren't* constant, but vary with locale settings. This could make
parsers locale-sensitive in a subtle way. Thanks to Kef Schecter for
his diligence in following through on reporting and monitoring
this bugfix!
- Fixed a bug in the Py3 version of pyparsing, during exception
handling with packrat parsing enabled, reported by Catherine
Devlin - thanks Catherine!
- Fixed typo in ParseBaseException.__dir__, reported anonymously on
the SourceForge bug tracker, thank you Pyparsing User With No Name.
- Fixed bug in srange when using '\x###' hex character codes.
- Added optional 'intExpr' argument to countedArray, so that you
can define your own expression that will evaluate to an integer,
to be used as the count for the following elements. Allows you
to define a countedArray with the count given in hex, for example,
by defining intExpr as "Word(hexnums).setParseAction(int(t[0],16))".
Version 1.5.6 - June, 2011
----------------------------
- Cleanup of parse action normalizing code, to be more version-tolerant,
and robust in the face of future Python versions - much thanks to
Raymond Hettinger for this rewrite!
- Removal of exception cacheing, addressing a memory leak condition
in Python 3. Thanks to Michael Droettboom and the Cape Town PUG for
their analysis and work on this problem!
- Fixed bug when using packrat parsing, where a previously parsed
expression would duplicate subsequent tokens - reported by Frankie
Ribery on stackoverflow, thanks!
- Added 'ungroup' helper method, to address token grouping done
implicitly by And expressions, even if only one expression in the
And actually returns any text - also inspired by stackoverflow
discussion with Frankie Ribery!
- Fixed bug in srange, which accepted escaped hex characters of the
form '\0x##', but should be '\x##'. Both forms will be supported
for backwards compatibility.
- Enhancement to countedArray, accepting an optional expression to be
used for matching the leading integer count - proposed by Mathias on
the pyparsing mailing list, good idea!
- Added the Verilog parser to the provided set of examples, under the
MIT license. While this frees up this parser for any use, if you find
yourself using it in a commercial purpose, please consider making a
charitable donation as described in the parser's header.
- Added the excludeChars argument to the Word class, to simplify defining
a word composed of all characters in a large range except for one or
two. Suggested by JesterEE on the pyparsing wiki.
- Added optional overlap parameter to scanString, to return overlapping
matches found in the source text.
- Updated oneOf internal regular expression generation, with improved
parse time performance.
- Slight performance improvement in transformString, removing empty
strings from the list of string fragments built while scanning the
source text, before calling ''.join. Especially useful when using
transformString to strip out selected text.
- Enhanced form of using the "expr('name')" style of results naming,
in lieu of calling setResultsName. If name ends with an '*', then
this is equivalent to expr.setResultsName('name',listAllMatches=True).
- Fixed up internal list flattener to use iteration instead of recursion,
to avoid stack overflow when transforming large files.
- Added other new examples:
. protobuf parser - parses Google's protobuf language
. btpyparse - a BibTex parser contributed by Matthew Brett,
with test suite test_bibparse.py (thanks, Matthew!)
. groupUsingListAllMatches.py - demo using trailing '*' for results
names
Version 1.5.5 - August, 2010
----------------------------
- Typo in Python3 version of pyparsing, "builtin" should be "builtins".
(sigh)
Version 1.5.4 - August, 2010
----------------------------
- Fixed __builtins__ and file references in Python 3 code, thanks to
Greg Watson, saulspatz, sminos, and Mark Summerfield for reporting
their Python 3 experiences.
- Added new example, apicheck.py, as a sample of scanning a Tcl-like
language for functions with incorrect number of arguments (difficult
to track down in Tcl languages). This example uses some interesting
methods for capturing exceptions while scanning through source
code.
- Added new example deltaTime.py, that takes everyday time references
like "an hour from now", "2 days ago", "next Sunday at 2pm".
Version 1.5.3 - June, 2010
--------------------------
- ======= NOTE: API CHANGE!!!!!!! ===============
With this release, and henceforward, the pyparsing module is
imported as "pyparsing" on both Python 2.x and Python 3.x versions.
- Fixed up setup.py to auto-detect Python version and install the
correct version of pyparsing - suggested by Alex Martelli,
thanks, Alex! (and my apologies to all those who struggled with
those spurious installation errors caused by my earlier
fumblings!)
- Fixed bug on Python3 when using parseFile, getting bytes instead of
a str from the input file.
- Fixed subtle bug in originalTextFor, if followed by
significant whitespace (like a newline) - discovered by
Francis Vidal, thanks!
- Fixed very sneaky bug in Each, in which Optional elements were
not completely recognized as optional - found by Tal Weiss, thanks
for your patience.
- Fixed off-by-1 bug in line() method when the first line of the
input text was an empty line. Thanks to John Krukoff for submitting
a patch!
- Fixed bug in transformString if grammar contains Group expressions,
thanks to patch submitted by barnabas79, nice work!
- Fixed bug in originalTextFor in which trailing comments or otherwised
ignored text got slurped in with the matched expression. Thanks to
michael_ramirez44 on the pyparsing wiki for reporting this just in
time to get into this release!
- Added better support for summing ParseResults, see the new example,
parseResultsSumExample.py.
- Added support for composing a Regex using a compiled RE object;
thanks to my new colleague, Mike Thornton!
- In version 1.5.2, I changed the way exceptions are raised in order
to simplify the stacktraces reported during parsing. An anonymous
user posted a bug report on SF that this behavior makes it difficult
to debug some complex parsers, or parsers nested within parsers. In
this release I've added a class attribute ParserElement.verbose_stacktrace,
with a default value of False. If you set this to True, pyparsing will
report stacktraces using the pre-1.5.2 behavior.
- New examples:
. pymicko.py, a MicroC compiler submitted by Zarko Zivanov.
(Note: this example is separately licensed under the GPLv3,
and requires Python 2.6 or higher.) Thank you, Zarko!
. oc.py, a subset C parser, using the BNF from the 1996 Obfuscated C
Contest.
. stateMachine2.py, a modified version of stateMachine.py submitted
by Matt Anderson, that is compatible with Python versions 2.7 and
above - thanks so much, Matt!
. select_parser.py, a parser for reading SQLite SELECT statements,
as specified at https://www.sqlite.org/lang_select.html this goes
into much more detail than the simple SQL parser included in pyparsing's
source code
. excelExpr.py, a *simplistic* first-cut at a parser for Excel
expressions, which I originally posted on comp.lang.python in January,
2010; beware, this parser omits many common Excel cases (addition of
numbers represented as strings, references to named ranges)
. cpp_enum_parser.py, a nice little parser posted my Mark Tolonen on
comp.lang.python in August, 2009 (redistributed here with Mark's
permission). Thanks a bunch, Mark!
. partial_gene_match.py, a sample I posted to Stackoverflow.com,
implementing a special variation on Literal that does "close" matching,
up to a given number of allowed mismatches. The application was to
find matching gene sequences, with allowance for one or two mismatches.
. tagCapture.py, a sample showing how to use a Forward placeholder to
enforce matching of text parsed in a previous expression.
. matchPreviousDemo.py, simple demo showing how the matchPreviousLiteral
helper method is used to match a previously parsed token.
Version 1.5.2 - April, 2009
------------------------------
- Added pyparsing_py3.py module, so that Python 3 users can use
pyparsing by changing their pyparsing import statement to:
import pyparsing_py3
Thanks for help from Patrick Laban and his friend Geremy
Condra on the pyparsing wiki.
- Removed __slots__ declaration on ParseBaseException, for
compatibility with IronPython 2.0.1. Raised by David
Lawler on the pyparsing wiki, thanks David!
- Fixed bug in SkipTo/failOn handling - caught by eagle eye
cpennington on the pyparsing wiki!
- Fixed second bug in SkipTo when using the ignore constructor
argument, reported by Catherine Devlin, thanks!
- Fixed obscure bug reported by Eike Welk when using a class
as a ParseAction with an errant __getitem__ method.
- Simplified exception stack traces when reporting parse
exceptions back to caller of parseString or parseFile - thanks
to a tip from Peter Otten on comp.lang.python.
- Changed behavior of scanString to avoid infinitely looping on
expressions that match zero-length strings. Prompted by a
question posted by ellisonbg on the wiki.
- Enhanced classes that take a list of expressions (And, Or,
MatchFirst, and Each) to accept generator expressions also.
This can be useful when generating lists of alternative
expressions, as in this case, where the user wanted to match
any repetitions of '+', '*', '#', or '.', but not mixtures
of them (that is, match '+++', but not '+-+'):
codes = "+*#."
format = MatchFirst(Word(c) for c in codes)
Based on a problem posed by Denis Spir on the Python tutor
list.
- Added new example eval_arith.py, which extends the example
simpleArith.py to actually evaluate the parsed expressions.
Version 1.5.1 - October, 2008
-------------------------------
- Added new helper method originalTextFor, to replace the use of
the current keepOriginalText parse action. Now instead of
using the parse action, as in:
fullName = Word(alphas) + Word(alphas)
fullName.setParseAction(keepOriginalText)
(in this example, we used keepOriginalText to restore any white
space that may have been skipped between the first and last
names)
You can now write:
fullName = originalTextFor(Word(alphas) + Word(alphas))
The implementation of originalTextFor is simpler and faster than
keepOriginalText, and does not depend on using the inspect or
imp modules.
- Added optional parseAll argument to parseFile, to be consistent
with parseAll argument to parseString. Posted by pboucher on the
pyparsing wiki, thanks!
- Added failOn argument to SkipTo, so that grammars can define
literal strings or pyparsing expressions which, if found in the
skipped text, will cause SkipTo to fail. Useful to prevent
SkipTo from reading past terminating expression. Instigated by
question posed by Aki Niimura on the pyparsing wiki.
- Fixed bug in nestedExpr if multi-character expressions are given
for nesting delimiters. Patch provided by new pyparsing user,
Hans-Martin Gaudecker - thanks, H-M!
- Removed dependency on xml.sax.saxutils.escape, and included
internal implementation instead - proposed by Mike Droettboom on
the pyparsing mailing list, thanks Mike! Also fixed erroneous
mapping in replaceHTMLEntity of " to ', now correctly maps
to ". (Also added support for mapping ' to '.)
- Fixed typo in ParseResults.insert, found by Alejandro Dubrovsky,
good catch!
- Added __dir__() methods to ParseBaseException and ParseResults,
to support new dir() behavior in Py2.6 and Py3.0. If dir() is
called on a ParseResults object, the returned list will include
the base set of attribute names, plus any results names that are
defined.
- Fixed bug in ParseResults.asXML(), in which the first named
item within a ParseResults gets reported with an <ITEM> tag
instead of with the correct results name.
- Fixed bug in '-' error stop, when '-' operator is used inside a
Combine expression.
- Reverted generator expression to use list comprehension, for
better compatibility with old versions of Python. Reported by
jester/artixdesign on the SourceForge pyparsing discussion list.
- Fixed bug in parseString(parseAll=True), when the input string
ends with a comment or whitespace.
- Fixed bug in LineStart and LineEnd that did not recognize any
special whitespace chars defined using ParserElement.setDefault-
WhitespaceChars, found while debugging an issue for Marek Kubica,
thanks for the new test case, Marek!
- Made Forward class more tolerant of subclassing.
Version 1.5.0 - June, 2008
--------------------------
This version of pyparsing includes work on two long-standing
FAQ's: support for forcing parsing of the complete input string
(without having to explicitly append StringEnd() to the grammar),
and a method to improve the mechanism of detecting where syntax
errors occur in an input string with various optional and
alternative paths. This release also includes a helper method
to simplify definition of indentation-based grammars. With
these changes (and the past few minor updates), I thought it was
finally time to bump the minor rev number on pyparsing - so
1.5.0 is now available! Read on...
- AT LAST!!! You can now call parseString and have it raise
an exception if the expression does not parse the entire
input string. This has been an FAQ for a LONG time.
The parseString method now includes an optional parseAll
argument (default=False). If parseAll is set to True, then
the given parse expression must parse the entire input
string. (This is equivalent to adding StringEnd() to the
end of the expression.) The default value is False to
retain backward compatibility.
Inspired by MANY requests over the years, most recently by
ecir-hana on the pyparsing wiki!
- Added new operator '-' for composing grammar sequences. '-'
behaves just like '+' in creating And expressions, but '-'
is used to mark grammar structures that should stop parsing
immediately and report a syntax error, rather than just
backtracking to the last successful parse and trying another
alternative. For instance, running the following code:
port_definition = Keyword("port") + '=' + Word(nums)
entity_definition = Keyword("entity") + "{" +
Optional(port_definition) + "}"
entity_definition.parseString("entity { port 100 }")
pyparsing fails to detect the missing '=' in the port definition.
But, since this expression is optional, pyparsing then proceeds
to try to match the closing '}' of the entity_definition. Not
finding it, pyparsing reports that there was no '}' after the '{'
character. Instead, we would like pyparsing to parse the 'port'
keyword, and if not followed by an equals sign and an integer,
to signal this as a syntax error.
This can now be done simply by changing the port_definition to:
port_definition = Keyword("port") - '=' + Word(nums)
Now after successfully parsing 'port', pyparsing must also find
an equals sign and an integer, or it will raise a fatal syntax
exception.
By judicious insertion of '-' operators, a pyparsing developer
can have their grammar report much more informative syntax error
messages.
Patches and suggestions proposed by several contributors on
the pyparsing mailing list and wiki - special thanks to
Eike Welk and Thomas/Poldy on the pyparsing wiki!
- Added indentedBlock helper method, to encapsulate the parse
actions and indentation stack management needed to keep track of
indentation levels. Use indentedBlock to define grammars for
indentation-based grouping grammars, like Python's.
indentedBlock takes up to 3 parameters:
- blockStatementExpr - expression defining syntax of statement
that is repeated within the indented block
- indentStack - list created by caller to manage indentation
stack (multiple indentedBlock expressions
within a single grammar should share a common indentStack)
- indent - boolean indicating whether block must be indented
beyond the current level; set to False for block of
left-most statements (default=True)
A valid block must contain at least one indented statement.
- Fixed bug in nestedExpr in which ignored expressions needed
to be set off with whitespace. Reported by Stefaan Himpe,
nice catch!
- Expanded multiplication of an expression by a tuple, to
accept tuple values of None:
. expr*(n,None) or expr*(n,) is equivalent
to expr*n + ZeroOrMore(expr)
(read as "at least n instances of expr")
. expr*(None,n) is equivalent to expr*(0,n)
(read as "0 to n instances of expr")
. expr*(None,None) is equivalent to ZeroOrMore(expr)
. expr*(1,None) is equivalent to OneOrMore(expr)
Note that expr*(None,n) does not raise an exception if
more than n exprs exist in the input stream; that is,
expr*(None,n) does not enforce a maximum number of expr
occurrences. If this behavior is desired, then write
expr*(None,n) + ~expr
- Added None as a possible operator for operatorPrecedence.
None signifies "no operator", as in multiplying m times x
in "y=mx+b".
- Fixed bug in Each, reported by Michael Ramirez, in which the
order of terms in the Each affected the parsing of the results.
Problem was due to premature grouping of the expressions in
the overall Each during grammar construction, before the
complete Each was defined. Thanks, Michael!
- Also fixed bug in Each in which Optional's with default values
were not getting the defaults added to the results of the
overall Each expression.
- Fixed a bug in Optional in which results names were not
assigned if a default value was supplied.
- Cleaned up Py3K compatibility statements, including exception
construction statements, and better equivalence between _ustr
and basestring, and __nonzero__ and __bool__.
Version 1.4.11 - February, 2008
-------------------------------
- With help from Robert A. Clark, this version of pyparsing
is compatible with Python 3.0a3. Thanks for the help,
Robert!
- Added WordStart and WordEnd positional classes, to support
expressions that must occur at the start or end of a word.
Proposed by piranha on the pyparsing wiki, good idea!
- Added matchOnlyAtCol helper parser action, to simplify
parsing log or data files that have optional fields that are
column dependent. Inspired by a discussion thread with
hubritic on comp.lang.python.
- Added withAttribute.ANY_VALUE as a match-all value when using
withAttribute. Used to ensure that an attribute is present,
without having to match on the actual attribute value.
- Added get() method to ParseResults, similar to dict.get().
Suggested by new pyparsing user, Alejandro Dubrovksy, thanks!
- Added '==' short-cut to see if a given string matches a
pyparsing expression. For instance, you can now write:
integer = Word(nums)
if "123" == integer:
# do something
print [ x for x in "123 234 asld".split() if x==integer ]
# prints ['123', '234']
- Simplified the use of nestedExpr when using an expression for
the opening or closing delimiters. Now the content expression
will not have to explicitly negate closing delimiters. Found
while working with dfinnie on GHOP Task #277, thanks!
- Fixed bug when defining ignorable expressions that are
later enclosed in a wrapper expression (such as ZeroOrMore,
OneOrMore, etc.) - found while working with Prabhu
Gurumurthy, thanks Prahbu!
- Fixed bug in withAttribute in which keys were automatically
converted to lowercase, making it impossible to match XML
attributes with uppercase characters in them. Using with-
Attribute requires that you reference attributes in all
lowercase if parsing HTML, and in correct case when parsing
XML.
- Changed '<<' operator on Forward to return None, since this
is really used as a pseudo-assignment operator, not as a
left-shift operator. By returning None, it is easier to
catch faulty statements such as a << b | c, where precedence
of operations causes the '|' operation to be performed
*after* inserting b into a, so no alternation is actually
implemented. The correct form is a << (b | c). With this
change, an error will be reported instead of silently
clipping the alternative term. (Note: this may break some
existing code, but if it does, the code had a silent bug in
it anyway.) Proposed by wcbarksdale on the pyparsing wiki,
thanks!
- Several unit tests were added to pyparsing's regression
suite, courtesy of the Google Highly-Open Participation
Contest. Thanks to all who administered and took part in
this event!
Version 1.4.10 - December 9, 2007
---------------------------------
- Fixed bug introduced in v1.4.8, parse actions were called for
intermediate operator levels, not just the deepest matching
operation level. Again, big thanks to Torsten Marek for
helping isolate this problem!
Version 1.4.9 - December 8, 2007
--------------------------------
- Added '*' multiplication operator support when creating
grammars, accepting either an integer, or a two-integer
tuple multiplier, as in:
ipAddress = Word(nums) + ('.'+Word(nums))*3
usPhoneNumber = Word(nums) + ('-'+Word(nums))*(1,2)
If multiplying by a tuple, the two integer values represent
min and max multiples. Suggested by Vincent of eToy.com,
great idea, Vincent!
- Fixed bug in nestedExpr, original version was overly greedy!
Thanks to Michael Ramirez for raising this issue.
- Fixed internal bug in ParseResults - when an item was deleted,
the key indices were not updated. Thanks to Tim Mitchell for
posting a bugfix patch to the SF bug tracking system!
- Fixed internal bug in operatorPrecedence - when the results of
a right-associative term were sent to a parse action, the wrong
tokens were sent. Reported by Torsten Marek, nice job!
- Added pop() method to ParseResults. If pop is called with an
integer or with no arguments, it will use list semantics and
update the ParseResults' list of tokens. If pop is called with
a non-integer (a string, for instance), then it will use dict
semantics and update the ParseResults' internal dict.
Suggested by Donn Ingle, thanks Donn!
- Fixed quoted string built-ins to accept '\xHH' hex characters
within the string.
Version 1.4.8 - October, 2007
-----------------------------
- Added new helper method nestedExpr to easily create expressions
that parse lists of data in nested parentheses, braces, brackets,
etc.
- Added withAttribute parse action helper, to simplify creating
filtering parse actions to attach to expressions returned by
makeHTMLTags and makeXMLTags. Use withAttribute to qualify a
starting tag with one or more required attribute values, to avoid
false matches on common tags such as <TD> or <DIV>.
- Added new examples nested.py and withAttribute.py to demonstrate
the new features.
- Added performance speedup to grammars using operatorPrecedence,
instigated by Stefan Reichör - thanks for the feedback, Stefan!
- Fixed bug/typo when deleting an element from a ParseResults by
using the element's results name.
- Fixed whitespace-skipping bug in wrapper classes (such as Group,
Suppress, Combine, etc.) and when using setDebug(), reported by
new pyparsing user dazzawazza on SourceForge, nice job!
- Added restriction to prevent defining Word or CharsNotIn expressions
with minimum length of 0 (should use Optional if this is desired),
and enhanced docstrings to reflect this limitation. Issue was
raised by Joey Tallieu, who submitted a patch with a slightly
different solution. Thanks for taking the initiative, Joey, and
please keep submitting your ideas!
- Fixed bug in makeHTMLTags that did not detect HTML tag attributes
with no '= value' portion (such as "<td nowrap>"), reported by
hamidh on the pyparsing wiki - thanks!
- Fixed minor bug in makeHTMLTags and makeXMLTags, which did not
accept whitespace in closing tags.
Version 1.4.7 - July, 2007
--------------------------
- NEW NOTATION SHORTCUT: ParserElement now accepts results names using
a notational shortcut, following the expression with the results name
in parentheses. So this:
stats = "AVE:" + realNum.setResultsName("average") + \
"MIN:" + realNum.setResultsName("min") + \
"MAX:" + realNum.setResultsName("max")
can now be written as this:
stats = "AVE:" + realNum("average") + \
"MIN:" + realNum("min") + \
"MAX:" + realNum("max")
The intent behind this change is to make it simpler to define results
names for significant fields within the expression, while keeping
the grammar syntax clean and uncluttered.
- Fixed bug when packrat parsing is enabled, with cached ParseResults
being updated by subsequent parsing. Reported on the pyparsing
wiki by Kambiz, thanks!
- Fixed bug in operatorPrecedence for unary operators with left
associativity, if multiple operators were given for the same term.
- Fixed bug in example simpleBool.py, corrected precedence of "and" vs.
"or" operations.
- Fixed bug in Dict class, in which keys were converted to strings
whether they needed to be or not. Have narrowed this logic to
convert keys to strings only if the keys are ints (which would
confuse __getitem__ behavior for list indexing vs. key lookup).
- Added ParserElement method setBreak(), which will invoke the pdb
module's set_trace() function when this expression is about to be
parsed.
- Fixed bug in StringEnd in which reading off the end of the input
string raises an exception - should match. Resolved while
answering a question for Shawn on the pyparsing wiki.
Version 1.4.6 - April, 2007
---------------------------
- Simplified constructor for ParseFatalException, to support common
exception construction idiom:
raise ParseFatalException, "unexpected text: 'Spanish Inquisition'"
- Added method getTokensEndLoc(), to be called from within a parse action,
for those parse actions that need both the starting *and* ending
location of the parsed tokens within the input text.
- Enhanced behavior of keepOriginalText so that named parse fields are
preserved, even though tokens are replaced with the original input
text matched by the current expression. Also, cleaned up the stack
traversal to be more robust. Suggested by Tim Arnold - thanks, Tim!
- Fixed subtle bug in which countedArray (and similar dynamic
expressions configured in parse actions) failed to match within Or,
Each, FollowedBy, or NotAny. Reported by Ralf Vosseler, thanks for
your patience, Ralf!
- Fixed Unicode bug in upcaseTokens and downcaseTokens parse actions,
scanString, and default debugging actions; reported (and patch submitted)
by Nikolai Zamkovoi, spasibo!
- Fixed bug when saving a tuple as a named result. The returned
token list gave the proper tuple value, but accessing the result by
name only gave the first element of the tuple. Reported by
Poromenos, nice catch!
- Fixed bug in makeHTMLTags/makeXMLTags, which failed to match tag
attributes with namespaces.
- Fixed bug in SkipTo when setting include=True, to have the skipped-to
tokens correctly included in the returned data. Reported by gunars on
the pyparsing wiki, thanks!
- Fixed typobug in OnceOnly.reset method, omitted self argument.
Submitted by eike welk, thanks for the lint-picking!
- Added performance enhancement to Forward class, suggested by
akkartik on the pyparsing Wiki discussion, nice work!
- Added optional asKeyword to Word constructor, to indicate that the
given word pattern should be matched only as a keyword, that is, it
should only match if it is within word boundaries.
- Added S-expression parser to examples directory.
- Added macro substitution example to examples directory.
- Added holaMundo.py example, excerpted from Marco Alfonso's blog -
muchas gracias, Marco!
- Modified internal cyclic references in ParseResults to use weakrefs;
this should help reduce the memory footprint of large parsing
programs, at some cost to performance (3-5%). Suggested by bca48150 on
the pyparsing wiki, thanks!
- Enhanced the documentation describing the vagaries and idiosyncrasies
of parsing strings with embedded tabs, and the impact on:
. parse actions
. scanString
. col and line helper functions
(Suggested by eike welk in response to some unexplained inconsistencies
between parsed location and offsets in the input string.)
- Cleaned up internal decorators to preserve function names,
docstrings, etc.
Version 1.4.5 - December, 2006
------------------------------
- Removed debugging print statement from QuotedString class. Sorry
for not stripping this out before the 1.4.4 release!
- A significant performance improvement, the first one in a while!
For my Verilog parser, this version of pyparsing is about double the
speed - YMMV.
- Added support for pickling of ParseResults objects. (Reported by
Jeff Poole, thanks Jeff!)
- Fixed minor bug in makeHTMLTags that did not recognize tag attributes
with embedded '-' or '_' characters. Also, added support for
passing expressions to makeHTMLTags and makeXMLTags, and used this
feature to define the globals anyOpenTag and anyCloseTag.
- Fixed error in alphas8bit, I had omitted the y-with-umlaut character.
- Added punc8bit string to complement alphas8bit - it contains all the
non-alphabetic, non-blank 8-bit characters.
- Added commonHTMLEntity expression, to match common HTML "ampersand"
codes, such as "<", ">", "&", " ", and """. This
expression also defines a results name 'entity', which can be used
to extract the entity field (that is, "lt", "gt", etc.). Also added
built-in parse action replaceHTMLEntity, which can be attached to
commonHTMLEntity to translate "<", ">", "&", " ", and
""" to "<", ">", "&", " ", and "'".
- Added example, htmlStripper.py, that strips HTML tags and scripts
from HTML pages. It also translates common HTML entities to their
respective characters.
Version 1.4.4 - October, 2006
-------------------------------
- Fixed traceParseAction decorator to also trap and record exception
returns from parse actions, and to handle parse actions with 0,
1, 2, or 3 arguments.
- Enhanced parse action normalization to support using classes as
parse actions; that is, the class constructor is called at parse
time and the __init__ function is called with 0, 1, 2, or 3
arguments. If passing a class as a parse action, the __init__
method must use one of the valid parse action parameter list
formats. (This technique is useful when using pyparsing to compile
parsed text into a series of application objects - see the new
example simpleBool.py.)
- Fixed bug in ParseResults when setting an item using an integer
index. (Reported by Christopher Lambacher, thanks!)
- Fixed whitespace-skipping bug, patch submitted by Paolo Losi -
grazie, Paolo!
- Fixed bug when a Combine contained an embedded Forward expression,
reported by cie on the pyparsing wiki - good catch!
- Fixed listAllMatches bug, when a listAllMatches result was
nested within another result. (Reported by don pasquale on
comp.lang.python, well done!)
- Fixed bug in ParseResults items() method, when returning an item
marked as listAllMatches=True
- Fixed bug in definition of cppStyleComment (and javaStyleComment)
in which '//' line comments were not continued to the next line
if the line ends with a '\'. (Reported by eagle-eyed Ralph
Corderoy!)
- Optimized re's for cppStyleComment and quotedString for better
re performance - also provided by Ralph Corderoy, thanks!
- Added new example, indentedGrammarExample.py, showing how to
define a grammar using indentation to show grouping (as Python
does for defining statement nesting). Instigated by an e-mail
discussion with Andrew Dalke, thanks Andrew!
- Added new helper operatorPrecedence (based on e-mail list discussion
with Ralph Corderoy and Paolo Losi), to facilitate definition of
grammars for expressions with unary and binary operators. For
instance, this grammar defines a 6-function arithmetic expression
grammar, with unary plus and minus, proper operator precedence,and
right- and left-associativity:
expr = operatorPrecedence( operand,
[("!", 1, opAssoc.LEFT),
("^", 2, opAssoc.RIGHT),
(oneOf("+ -"), 1, opAssoc.RIGHT),
(oneOf("* /"), 2, opAssoc.LEFT),
(oneOf("+ -"), 2, opAssoc.LEFT),]
)
Also added example simpleArith.py and simpleBool.py to provide
more detailed code samples using this new helper method.
- Added new helpers matchPreviousLiteral and matchPreviousExpr, for
creating adaptive parsing expressions that match the same content
as was parsed in a previous parse expression. For instance:
first = Word(nums)
matchExpr = first + ":" + matchPreviousLiteral(first)
will match "1:1", but not "1:2". Since this matches at the literal
level, this will also match the leading "1:1" in "1:10".
In contrast:
first = Word(nums)
matchExpr = first + ":" + matchPreviousExpr(first)
will *not* match the leading "1:1" in "1:10"; the expressions are
evaluated first, and then compared, so "1" is compared with "10".
- Added keepOriginalText parse action. Sometimes pyparsing's
whitespace-skipping leaves out too much whitespace. Adding this
parse action will restore any internal whitespace for a parse
expression. This is especially useful when defining expressions
for scanString or transformString applications.
- Added __add__ method for ParseResults class, to better support
using Python sum built-in for summing ParseResults objects returned
from scanString.
- Added reset method for the new OnlyOnce class wrapper for parse
actions (to allow a grammar to be used multiple times).
- Added optional maxMatches argument to scanString and searchString,
to short-circuit scanning after 'n' expression matches are found.
Version 1.4.3 - July, 2006
------------------------------
- Fixed implementation of multiple parse actions for an expression
(added in 1.4.2).
. setParseAction() reverts to its previous behavior, setting
one (or more) actions for an expression, overwriting any
action or actions previously defined
. new method addParseAction() appends one or more parse actions
to the list of parse actions attached to an expression
Now it is harder to accidentally append parse actions to an
expression, when what you wanted to do was overwrite whatever had
been defined before. (Thanks, Jean-Paul Calderone!)
- Simplified interface to parse actions that do not require all 3
parse action arguments. Very rarely do parse actions require more
than just the parsed tokens, yet parse actions still require all
3 arguments including the string being parsed and the location
within the string where the parse expression was matched. With this
release, parse actions may now be defined to be called as:
. fn(string,locn,tokens) (the current form)
. fn(locn,tokens)
. fn(tokens)
. fn()
The setParseAction and addParseAction methods will internally decorate
the provided parse actions with compatible wrappers to conform to
the full (string,locn,tokens) argument sequence.
- REMOVED SUPPORT FOR RETURNING PARSE LOCATION FROM A PARSE ACTION.
I announced this in March, 2004, and gave a final warning in the last
release. Now you can return a tuple from a parse action, and it will
be treated like any other return value (i.e., the tuple will be
substituted for the incoming tokens passed to the parse action,
which is useful when trying to parse strings into tuples).
- Added setFailAction method, taking a callable function fn that
takes the arguments fn(s,loc,expr,err) where:
. s - string being parsed
. loc - location where expression match was attempted and failed
. expr - the parse expression that failed
. err - the exception thrown
The function returns no values. It may throw ParseFatalException
if it is desired to stop parsing immediately.
(Suggested by peter21081944 on wikispaces.com)
- Added class OnlyOnce as helper wrapper for parse actions. OnlyOnce
only permits a parse action to be called one time, after which
all subsequent calls throw a ParseException.
- Added traceParseAction decorator to help debug parse actions.
Simply insert "@traceParseAction" ahead of the definition of your
parse action, and each invocation will be displayed, along with
incoming arguments, and returned value.
- Fixed bug when copying ParserElements using copy() or
setResultsName(). (Reported by Dan Thill, great catch!)
- Fixed bug in asXML() where token text contains <, >, and &
characters - generated XML now escapes these as <, > and
&. (Reported by Jacek Sieka, thanks!)
- Fixed bug in SkipTo() when searching for a StringEnd(). (Reported
by Pete McEvoy, thanks Pete!)
- Fixed "except Exception" statements, the most critical added as part
of the packrat parsing enhancement. (Thanks, Erick Tryzelaar!)
- Fixed end-of-string infinite looping on LineEnd and StringEnd
expressions. (Thanks again to Erick Tryzelaar.)
- Modified setWhitespaceChars to return self, to be consistent with
other ParserElement modifiers. (Suggested by Erick Tryzelaar.)
- Fixed bug/typo in new ParseResults.dump() method.
- Fixed bug in searchString() method, in which only the first token of
an expression was returned. searchString() now returns a
ParseResults collection of all search matches.
- Added example program removeLineBreaks.py, a string transformer that
converts text files with hard line-breaks into one with line breaks
only between paragraphs.
- Added example program listAllMatches.py, to illustrate using the
listAllMatches option when specifying results names (also shows new
support for passing lists to oneOf).
- Added example program linenoExample.py, to illustrate using the
helper methods lineno, line, and col, and returning objects from a
parse action.
- Added example program parseListString.py, to which can parse the
string representation of a Python list back into a true list. Taken
mostly from my PyCon presentation examples, but now with support
for tuple elements, too!
Version 1.4.2 - April 1, 2006 (No foolin'!)
-------------------------------------------
- Significant speedup from memoizing nested expressions (a technique
known as "packrat parsing"), thanks to Chris Lesniewski-Laas! Your
mileage may vary, but my Verilog parser almost doubled in speed to
over 600 lines/sec!
This speedup may break existing programs that use parse actions that
have side-effects. For this reason, packrat parsing is disabled when
you first import pyparsing. To activate the packrat feature, your
program must call the class method ParserElement.enablePackrat(). If
your program uses psyco to "compile as you go", you must call
enablePackrat before calling psyco.full(). If you do not do this,
Python will crash. For best results, call enablePackrat() immediately
after importing pyparsing.
- Added new helper method countedArray(expr), for defining patterns that
start with a leading integer to indicate the number of array elements,
followed by that many elements, matching the given expr parse
expression. For instance, this two-liner:
wordArray = countedArray(Word(alphas))
print wordArray.parseString("3 Practicality beats purity")[0]
returns the parsed array of words:
['Practicality', 'beats', 'purity']
The leading token '3' is suppressed, although it is easily obtained
from the length of the returned array.
(Inspired by e-mail discussion with Ralf Vosseler.)
- Added support for attaching multiple parse actions to a single
ParserElement. (Suggested by Dan "Dang" Griffith - nice idea, Dan!)
- Added support for asymmetric quoting characters in the recently-added
QuotedString class. Now you can define your own quoted string syntax
like "<<This is a string in double angle brackets.>>". To define
this custom form of QuotedString, your code would define:
dblAngleQuotedString = QuotedString('<<',endQuoteChar='>>')
QuotedString also supports escaped quotes, escape character other
than '\', and multiline.
- Changed the default value returned internally by Optional, so that
None can be used as a default value. (Suggested by Steven Bethard -
I finally saw the light!)
- Added dump() method to ParseResults, to make it easier to list out
and diagnose values returned from calling parseString.
- A new example, a search query string parser, submitted by Steven
Mooij and Rudolph Froger - a very interesting application, thanks!
- Added an example that parses the BNF in Python's Grammar file, in
support of generating Python grammar documentation. (Suggested by
J H Stovall.)
- A new example, submitted by Tim Cera, of a flexible parser module,
using a simple config variable to adjust parsing for input formats
that have slight variations - thanks, Tim!
- Added an example for parsing Roman numerals, showing the capability
of parse actions to "compile" Roman numerals into their integer
values during parsing.
- Added a new docs directory, for additional documentation or help.
Currently, this includes the text and examples from my recent
presentation at PyCon.
- Fixed another typo in CaselessKeyword, thanks Stefan Behnel.
- Expanded oneOf to also accept tuples, not just lists. This really
should be sufficient...
- Added deprecation warnings when tuple is returned from a parse action.
Looking back, I see that I originally deprecated this feature in March,
2004, so I'm guessing people really shouldn't have been using this
feature - I'll drop it altogether in the next release, which will
allow users to return a tuple from a parse action (which is really
handy when trying to reconstuct tuples from a tuple string
representation!).
Version 1.4.1 - February, 2006
------------------------------
- Converted generator expression in QuotedString class to list
comprehension, to retain compatibility with Python 2.3. (Thanks, Titus
Brown for the heads-up!)
- Added searchString() method to ParserElement, as an alternative to
using "scanString(instring).next()[0][0]" to search through a string
looking for a substring matching a given parse expression. (Inspired by
e-mail conversation with Dave Feustel.)
- Modified oneOf to accept lists of strings as well as a single string
of space-delimited literals. (Suggested by Jacek Sieka - thanks!)
- Removed deprecated use of Upcase in pyparsing test code. (Also caught by
Titus Brown.)
- Removed lstrip() call from Literal - too aggressive in stripping
whitespace which may be valid for some grammars. (Point raised by Jacek
Sieka). Also, made Literal more robust in the event of passing an empty
string.
- Fixed bug in replaceWith when returning None.
- Added cautionary documentation for Forward class when assigning a
MatchFirst expression, as in:
fwdExpr << a | b | c
Precedence of operators causes this to be evaluated as:
(fwdExpr << a) | b | c
thereby leaving b and c out as parseable alternatives. Users must
explicitly group the values inserted into the Forward:
fwdExpr << (a | b | c)
(Suggested by Scot Wilcoxon - thanks, Scot!)
Version 1.4 - January 18, 2006
------------------------------
- Added Regex class, to permit definition of complex embedded expressions
using regular expressions. (Enhancement provided by John Beisley, great
job!)
- Converted implementations of Word, oneOf, quoted string, and comment
helpers to utilize regular expression matching. Performance improvements
in the 20-40% range.
- Added QuotedString class, to support definition of non-standard quoted
strings (Suggested by Guillaume Proulx, thanks!)
- Added CaselessKeyword class, to streamline grammars with, well, caseless
keywords (Proposed by Stefan Behnel, thanks!)
- Fixed bug in SkipTo, when using an ignoreable expression. (Patch provided
by Anonymous, thanks, whoever-you-are!)
- Fixed typo in NoMatch class. (Good catch, Stefan Behnel!)
- Fixed minor bug in _makeTags(), using string.printables instead of
pyparsing.printables.
- Cleaned up some of the expressions created by makeXXXTags helpers, to
suppress extraneous <> characters.
- Added some grammar definition-time checking to verify that a grammar is
being built using proper ParserElements.
- Added examples:
. LAparser.py - linear algebra C preprocessor (submitted by Mike Ellis,
thanks Mike!)
. wordsToNum.py - converts word description of a number back to
the original number (such as 'one hundred and twenty three' -> 123)
. updated fourFn.py to support unary minus, added BNF comments
Version 1.3.3 - September 12, 2005
----------------------------------
- Improved support for Unicode strings that would be returned using
srange. Added greetingInKorean.py example, for a Korean version of
"Hello, World!" using Unicode. (Thanks, June Kim!)
- Added 'hexnums' string constant (nums+"ABCDEFabcdef") for defining
hexadecimal value expressions.
- NOTE: ===THIS CHANGE MAY BREAK EXISTING CODE===
Modified tag and results definitions returned by makeHTMLTags(),
to better support the looseness of HTML parsing. Tags to be
parsed are now caseless, and keys generated for tag attributes are
now converted to lower case.
Formerly, makeXMLTags("XYZ") would return a tag with results
name of "startXYZ", this has been changed to "startXyz". If this
tag is matched against '<XYZ Abc="1" DEF="2" ghi="3">', the
matched keys formerly would be "Abc", "DEF", and "ghi"; keys are
now converted to lower case, giving keys of "abc", "def", and
"ghi". These changes were made to try to address the lax
case sensitivity agreement between start and end tags in many
HTML pages.
No changes were made to makeXMLTags(), which assumes more rigorous
parsing rules.
Also, cleaned up case-sensitivity bugs in closing tags, and
switched to using Keyword instead of Literal class for tags.
(Thanks, Steve Young, for getting me to look at these in more
detail!)
- Added two helper parse actions, upcaseTokens and downcaseTokens,
which will convert matched text to all uppercase or lowercase,
respectively.
- Deprecated Upcase class, to be replaced by upcaseTokens parse
action.
- Converted messages sent to stderr to use warnings module, such as
when constructing a Literal with an empty string, one should use
the Empty() class or the empty helper instead.
- Added ' ' (space) as an escapable character within a quoted
string.
- Added helper expressions for common comment types, in addition
to the existing cStyleComment (/*...*/) and htmlStyleComment
(<!-- ... -->)
. dblSlashComment = // ... (to end of line)
. cppStyleComment = cStyleComment or dblSlashComment
. javaStyleComment = cppStyleComment
. pythonStyleComment = # ... (to end of line)
Version 1.3.2 - July 24, 2005
-----------------------------
- Added Each class as an enhanced version of And. 'Each' requires
that all given expressions be present, but may occur in any order.
Special handling is provided to group ZeroOrMore and OneOrMore
elements that occur out-of-order in the input string. You can also
construct 'Each' objects by joining expressions with the '&'
operator. When using the Each class, results names are strongly
recommended for accessing the matched tokens. (Suggested by Pradam
Amini - thanks, Pradam!)
- Stricter interpretation of 'max' qualifier on Word elements. If the
'max' attribute is specified, matching will fail if an input field
contains more than 'max' consecutive body characters. For example,
previously, Word(nums,max=3) would match the first three characters
of '0123456', returning '012' and continuing parsing at '3'. Now,
when constructed using the max attribute, Word will raise an
exception with this string.
- Cleaner handling of nested dictionaries returned by Dict. No
longer necessary to dereference sub-dictionaries as element [0] of
their parents.
=== NOTE: THIS CHANGE MAY BREAK SOME EXISTING CODE, BUT ONLY IF
PARSING NESTED DICTIONARIES USING THE LITTLE-USED DICT CLASS ===
(Prompted by discussion thread on the Python Tutor list, with
contributions from Danny Yoo, Kent Johnson, and original post by
Liam Clarke - thanks all!)
Version 1.3.1 - June, 2005
----------------------------------
- Added markInputline() method to ParseException, to display the input
text line location of the parsing exception. (Thanks, Stefan Behnel!)
- Added setDefaultKeywordChars(), so that Keyword definitions using a
custom keyword character set do not all need to add the keywordChars
constructor argument (similar to setDefaultWhitespaceChars()).
(suggested by rzhanka on the SourceForge pyparsing forum.)
- Simplified passing debug actions to setDebugAction(). You can now
pass 'None' for a debug action if you want to take the default
debug behavior. To suppress a particular debug action, you can pass
the pyparsing method nullDebugAction.
- Refactored parse exception classes, moved all behavior to
ParseBaseException, and the former ParseException is now a subclass of
ParseBaseException. Added a second subclass, ParseFatalException, as
a subclass of ParseBaseException. User-defined parse actions can raise
ParseFatalException if a data inconsistency is detected (such as a
begin-tag/end-tag mismatch), and this will stop all parsing immediately.
(Inspired by e-mail thread with Michele Petrazzo - thanks, Michelle!)
- Added helper methods makeXMLTags and makeHTMLTags, that simplify the
definition of XML or HTML tag parse expressions for a given tagname.
Both functions return a pair of parse expressions, one for the opening
tag (that is, '<tagname>') and one for the closing tag ('</tagname>').
The opening tagame also recognizes any attribute definitions that have
been included in the opening tag, as well as an empty tag (one with a
trailing '/', as in '<BODY/>' which is equivalent to '<BODY></BODY>').
makeXMLTags uses stricter XML syntax for attributes, requiring that they
be enclosed in double quote characters - makeHTMLTags is more lenient,
and accepts single-quoted strings or any contiguous string of characters
up to the next whitespace character or '>' character. Attributes can
be retrieved as dictionary or attribute values of the returned results
from the opening tag.
- Added example minimath2.py, a refinement on fourFn.py that adds
an interactive session and support for variables. (Thanks, Steven Siew!)
- Added performance improvement, up to 20% reduction! (Found while working
with Wolfgang Borgert on performance tuning of his TTCN3 parser.)
- And another performance improvement, up to 25%, when using scanString!
(Found while working with Henrik Westlund on his C header file scanner.)
- Updated UML diagrams to reflect latest class/method changes.
Version 1.3 - March, 2005
----------------------------------
- Added new Keyword class, as a special form of Literal. Keywords
must be followed by whitespace or other non-keyword characters, to
distinguish them from variables or other identifiers that just
happen to start with the same characters as a keyword. For instance,
the input string containing "ifOnlyIfOnly" will match a Literal("if")
at the beginning and in the middle, but will fail to match a
Keyword("if"). Keyword("if") will match only strings such as "if only"
or "if(only)". (Proposed by Wolfgang Borgert, and Berteun Damman
separately requested this on comp.lang.python - great idea!)
- Added setWhitespaceChars() method to override the characters to be
skipped as whitespace before matching a particular ParseElement. Also
added the class-level method setDefaultWhitespaceChars(), to allow
users to override the default set of whitespace characters (space,
tab, newline, and return) for all subsequently defined ParseElements.
(Inspired by Klaas Hofstra's inquiry on the Sourceforge pyparsing
forum.)
- Added helper parse actions to support some very common parse
action use cases:
. replaceWith(replStr) - replaces the matching tokens with the
provided replStr replacement string; especially useful with
transformString()
. removeQuotes - removes first and last character from string enclosed
in quotes (note - NOT the same as the string strip() method, as only
a single character is removed at each end)
- Added copy() method to ParseElement, to make it easier to define
different parse actions for the same basic parse expression. (Note, copy
is implicitly called when using setResultsName().)
(The following changes were posted to CVS as Version 1.2.3 -
October-December, 2004)
- Added support for Unicode strings in creating grammar definitions.
(Big thanks to Gavin Panella!)
- Added constant alphas8bit to include the following 8-bit characters:
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ
- Added srange() function to simplify definition of Word elements, using
regexp-like '[A-Za-z0-9]' syntax. This also simplifies referencing
common 8-bit characters.
- Fixed bug in Dict when a single element Dict was embedded within another
Dict. (Thanks Andy Yates for catching this one!)
- Added 'formatted' argument to ParseResults.asXML(). If set to False,
suppresses insertion of whitespace for pretty-print formatting. Default
equals True for backward compatibility.
- Added setDebugActions() function to ParserElement, to allow user-defined
debugging actions.
- Added support for escaped quotes (either in \', \", or doubled quote
form) to the predefined expressions for quoted strings. (Thanks, Ero
Carrera!)
- Minor performance improvement (~5%) converting "char in string" tests
to "char in dict". (Suggested by Gavin Panella, cool idea!)
Version 1.2.2 - September 27, 2004
----------------------------------
- Modified delimitedList to accept an expression as the delimiter, instead
of only accepting strings.
- Modified ParseResults, to convert integer field keys to strings (to
avoid confusion with list access).
- Modified Combine, to convert all embedded tokens to strings before
combining.
- Fixed bug in MatchFirst in which parse actions would be called for
expressions that only partially match. (Thanks, John Hunter!)
- Fixed bug in fourFn.py example that fixes right-associativity of ^
operator. (Thanks, Andrea Griffini!)
- Added class FollowedBy(expression), to look ahead in the input string
without consuming tokens.
- Added class NoMatch that never matches any input. Can be useful in
debugging, and in very specialized grammars.
- Added example pgn.py, for parsing chess game files stored in Portable
Game Notation. (Thanks, Alberto Santini!)
Version 1.2.1 - August 19, 2004
-------------------------------
- Added SkipTo(expression) token type, simplifying grammars that only
want to specify delimiting expressions, and want to match any characters
between them.
- Added helper method dictOf(key,value), making it easier to work with
the Dict class. (Inspired by Pavel Volkovitskiy, thanks!).
- Added optional argument listAllMatches (default=False) to
setResultsName(). Setting listAllMatches to True overrides the default
modal setting of tokens to results names; instead, the results name
acts as an accumulator for all matching tokens within the local
repetition group. (Suggested by Amaury Le Leyzour - thanks!)
- Fixed bug in ParseResults, throwing exception when trying to extract
slice, or make a copy using [:]. (Thanks, Wilson Fowlie!)
- Fixed bug in transformString() when the input string contains <TAB>'s
(Thanks, Rick Walia!).
- Fixed bug in returning tokens from un-Grouped And's, Or's and
MatchFirst's, where too many tokens would be included in the results,
confounding parse actions and returned results.
- Fixed bug in naming ParseResults returned by And's, Or's, and Match
First's.
- Fixed bug in LineEnd() - matching this token now correctly consumes
and returns the end of line "\n".
- Added a beautiful example for parsing Mozilla calendar files (Thanks,
Petri Savolainen!).
- Added support for dynamically modifying Forward expressions during
parsing.
Version 1.2 - 20 June 2004
--------------------------
- Added definition for htmlComment to help support HTML scanning and
parsing.
- Fixed bug in generating XML for Dict classes, in which trailing item was
duplicated in the output XML.
- Fixed release bug in which scanExamples.py was omitted from release
files.
- Fixed bug in transformString() when parse actions are not defined on the
outermost parser element.
- Added example urlExtractor.py, as another example of using scanString
and parse actions.
Version 1.2beta3 - 4 June 2004
------------------------------
- Added White() token type, analogous to Word, to match on whitespace
characters. Use White in parsers with significant whitespace (such as
configuration file parsers that use indentation to indicate grouping).
Construct White with a string containing the whitespace characters to be
matched. Similar to Word, White also takes optional min, max, and exact
parameters.
- As part of supporting whitespace-signficant parsing, added parseWithTabs()
method to ParserElement, to override the default behavior in parseString
of automatically expanding tabs to spaces. To retain tabs during
parsing, call parseWithTabs() before calling parseString(), parseFile() or
scanString(). (Thanks, Jean-Guillaume Paradis for catching this, and for
your suggestions on whitespace-significant parsing.)
- Added transformString() method to ParseElement, as a complement to
scanString(). To use transformString, define a grammar and attach a parse
action to the overall grammar that modifies the returned token list.
Invoking transformString() on a target string will then scan for matches,
and replace the matched text patterns according to the logic in the parse
action. transformString() returns the resulting transformed string.
(Note: transformString() does *not* automatically expand tabs to spaces.)
Also added scanExamples.py to the examples directory to show sample uses of
scanString() and transformString().
- Removed group() method that was introduced in beta2. This turns out NOT to
be equivalent to nesting within a Group() object, and I'd prefer not to sow
more seeds of confusion.
- Fixed behavior of asXML() where tags for groups were incorrectly duplicated.
(Thanks, Brad Clements!)
- Changed beta version message to display to stderr instead of stdout, to
make asXML() easier to use. (Thanks again, Brad.)
Version 1.2beta2 - 19 May 2004
------------------------------
- *** SIMPLIFIED API *** - Parse actions that do not modify the list of tokens
no longer need to return a value. This simplifies those parse actions that
use the list of tokens to update a counter or record or display some of the
token content; these parse actions can simply end without having to specify
'return toks'.
- *** POSSIBLE API INCOMPATIBILITY *** - Fixed CaselessLiteral bug, where the
returned token text was not the original string (as stated in the docs),
but the original string converted to upper case. (Thanks, Dang Griffith!)
**NOTE: this may break some code that relied on this erroneous behavior.
Users should scan their code for uses of CaselessLiteral.**
- *** POSSIBLE CODE INCOMPATIBILITY *** - I have renamed the internal
attributes on ParseResults from 'dict' and 'list' to '__tokdict' and
'__toklist', to avoid collisions with user-defined data fields named 'dict'
and 'list'. Any client code that accesses these attributes directly will
need to be modified. Hopefully the implementation of methods such as keys(),
items(), len(), etc. on ParseResults will make such direct attribute
accessess unnecessary.
- Added asXML() method to ParseResults. This greatly simplifies the process
of parsing an input data file and generating XML-structured data.
- Added getName() method to ParseResults. This method is helpful when
a grammar specifies ZeroOrMore or OneOrMore of a MatchFirst or Or
expression, and the parsing code needs to know which expression matched.
(Thanks, Eric van der Vlist, for this idea!)
- Added items() and values() methods to ParseResults, to better support using
ParseResults as a Dictionary.
- Added parseFile() as a convenience function to parse the contents of an
entire text file. Accepts either a file name or a file object. (Thanks
again, Dang!)
- Added group() method to And, Or, and MatchFirst, as a short-cut alternative
to enclosing a construct inside a Group object.
- Extended fourFn.py to support exponentiation, and simple built-in functions.
- Added EBNF parser to examples, including a demo where it parses its own
EBNF! (Thanks to Seo Sanghyeon!)
- Added Delphi Form parser to examples, dfmparse.py, plus a couple of
sample Delphi forms as tests. (Well done, Dang!)
- Another performance speedup, 5-10%, inspired by Dang! Plus about a 20%
speedup, by pre-constructing and cacheing exception objects instead of
constructing them on the fly.
- Fixed minor bug when specifying oneOf() with 'caseless=True'.
- Cleaned up and added a few more docstrings, to improve the generated docs.
Version 1.1.2 - 21 Mar 2004
---------------------------
- Fixed minor bug in scanString(), so that start location is at the start of
the matched tokens, not at the start of the whitespace before the matched
tokens.
- Inclusion of HTML documentation, generated using Epydoc. Reformatted some
doc strings to better generate readable docs. (Beautiful work, Ed Loper,
thanks for Epydoc!)
- Minor performance speedup, 5-15%
- And on a process note, I've used the unittest module to define a series of
unit tests, to help avoid the embarrassment of the version 1.1 snafu.
Version 1.1.1 - 6 Mar 2004
--------------------------
- Fixed critical bug introduced in 1.1, which broke MatchFirst(!) token
matching.
**THANK YOU, SEO SANGHYEON!!!**
- Added "from future import __generators__" to permit running under
pre-Python 2.3.
- Added example getNTPservers.py, showing how to use pyparsing to extract
a text pattern from the HTML of a web page.
Version 1.1 - 3 Mar 2004
-------------------------
- ***Changed API*** - While testing out parse actions, I found that the value
of loc passed in was not the starting location of the matched tokens, but
the location of the next token in the list. With this version, the location
passed to the parse action is now the starting location of the tokens that
matched.
A second part of this change is that the return value of parse actions no
longer needs to return a tuple containing both the location and the parsed
tokens (which may optionally be modified); parse actions only need to return
the list of tokens. Parse actions that return a tuple are deprecated; they
will still work properly for conversion/compatibility, but this behavior will
be removed in a future version.
- Added validate() method, to help diagnose infinite recursion in a grammar tree.
validate() is not 100% fool-proof, but it can help track down nasty infinite
looping due to recursively referencing the same grammar construct without some
intervening characters.
- Cleaned up default listing of some parse element types, to more closely match
ordinary BNF. Instead of the form <classname>:[contents-list], some changes
are:
. And(token1,token2,token3) is "{ token1 token2 token3 }"
. Or(token1,token2,token3) is "{ token1 ^ token2 ^ token3 }"
. MatchFirst(token1,token2,token3) is "{ token1 | token2 | token3 }"
. Optional(token) is "[ token ]"
. OneOrMore(token) is "{ token }..."
. ZeroOrMore(token) is "[ token ]..."
- Fixed an infinite loop in oneOf if the input string contains a duplicated
option. (Thanks Brad Clements)
- Fixed a bug when specifying a results name on an Optional token. (Thanks
again, Brad Clements)
- Fixed a bug introduced in 1.0.6 when I converted quotedString to use
CharsNotIn; I accidentally permitted quoted strings to span newlines. I have
fixed this in this version to go back to the original behavior, in which
quoted strings do *not* span newlines.
- Fixed minor bug in HTTP server log parser. (Thanks Jim Richardson)
Version 1.0.6 - 13 Feb 2004
----------------------------
- Added CharsNotIn class (Thanks, Lee SangYeong). This is the opposite of
Word, in that it is constructed with a set of characters *not* to be matched.
(This enhancement also allowed me to clean up and simplify some of the
definitions for quoted strings, cStyleComment, and restOfLine.)
- **MINOR API CHANGE** - Added joinString argument to the __init__ method of
Combine (Thanks, Thomas Kalka). joinString defaults to "", but some
applications might choose some other string to use instead, such as a blank
or newline. joinString was inserted as the second argument to __init__,
so if you have code that specifies an adjacent value, without using
'adjacent=', this code will break.
- Modified LineStart to recognize the start of an empty line.
- Added optional caseless flag to oneOf(), to create a list of CaselessLiteral
tokens instead of Literal tokens.
- Added some enhancements to the SQL example:
. Oracle-style comments (Thanks to Harald Armin Massa)
. simple WHERE clause
- Minor performance speedup - 5-15%
Version 1.0.5 - 19 Jan 2004
----------------------------
- Added scanString() generator method to ParseElement, to support regex-like
pattern-searching
- Added items() list to ParseResults, to return named results as a
list of (key,value) pairs
- Fixed memory overflow in asList() for deeply nested ParseResults (Thanks,
Sverrir Valgeirsson)
- Minor performance speedup - 10-15%
Version 1.0.4 - 8 Jan 2004
---------------------------
- Added positional tokens StringStart, StringEnd, LineStart, and LineEnd
- Added commaSeparatedList to pre-defined global token definitions; also added
commasep.py to the examples directory, to demonstrate the differences between
parsing comma-separated data and simple line-splitting at commas
- Minor API change: delimitedList does not automatically enclose the
list elements in a Group, but makes this the responsibility of the caller;
also, if invoked using 'combine=True', the list delimiters are also included
in the returned text (good for scoped variables, such as a.b.c or a::b::c, or
for directory paths such as a/b/c)
- Performance speed-up again, 30-40%
- Added httpServerLogParser.py to examples directory, as this is
a common parsing task
Version 1.0.3 - 23 Dec 2003
---------------------------
- Performance speed-up again, 20-40%
- Added Python distutils installation setup.py, etc. (thanks, Dave Kuhlman)
Version 1.0.2 - 18 Dec 2003
---------------------------
- **NOTE: Changed API again!!!** (for the last time, I hope)
+ Renamed module from parsing to pyparsing, to better reflect Python
linkage.
- Also added dictExample.py to examples directory, to illustrate
usage of the Dict class.
Version 1.0.1 - 17 Dec 2003
---------------------------
- **NOTE: Changed API!**
+ Renamed 'len' argument on Word.__init__() to 'exact'
- Performance speed-up, 10-30%
Version 1.0.0 - 15 Dec 2003
---------------------------
- Initial public release
Version 0.1.1 thru 0.1.17 - October-November, 2003
--------------------------------------------------
- initial development iterations:
- added Dict, Group
- added helper methods oneOf, delimitedList
- added helpers quotedString (and double and single), restOfLine, cStyleComment
- added MatchFirst as an alternative to the slower Or
- added UML class diagram
- fixed various logic bugs
|