7022 7023 7024 7025 7026 7027 7028 7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042 7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084 7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126 7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154 7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 7177 7178 7179 7180 7181 7182 7183 7184 7185 7186 7187 7188 7189 7190 7191 7192 7193 7194 7195 7196 7197 7198 7199 7200 7201 7202 7203 7204 7205 7206 7207 7208 7209 7210 7211 7212 7213 7214 7215 7216 7217 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 7228 7229 7230 7231 7232 7233 7234 7235 7236 7237 7238 7239 7240 7241 7242 7243 7244 7245 7246 7247 7248 7249 7250 7251 7252 7253 7254 7255 7256 7257 7258 7259 7260 7261 7262 7263 7264 7265 7266 7267 7268 7269 7270 7271 7272 7273 7274 7275 7276 7277 7278 7279 7280 7281 7282 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 7293 7294 7295 7296 7297 7298 7299 7300 7301 7302 7303 7304 7305 7306 7307 7308 7309 7310 7311 7312 7313 7314 7315 7316 7317 7318 7319 7320 7321 7322 7323 7324 7325 7326 7327 7328 7329 7330 7331 7332 7333 7334 7335 7336 7337 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 7348 7349 7350 7351 7352 7353 7354 7355 7356 7357 7358 7359 7360 7361 7362 7363 7364 7365 7366 7367 7368 7369 7370 7371 7372 7373 7374 7375 7376 7377 7378 7379 7380 7381 7382 7383 7384 7385 7386 7387 7388 7389 7390 7391 7392 7393 7394 7395 7396 7397 7398 7399 7400 7401 7402 7403 7404 7405 7406 7407 7408 7409 7410 7411 7412 7413 7414 7415 7416 7417 7418 7419 7420 7421 
7422 7423 7424 7425 7426 7427 7428 7429 7430 7431 7432 7433 7434 7435 7436 7437 7438 7439 7440 7441 7442 7443 7444 7445 7446 7447 7448 7449 7450 7451 7452 7453 7454 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 7470 7471 7472 7473 7474 7475 7476 7477 7478 7479 7480 7481 7482 7483 7484 7485 7486 7487 7488 7489 7490 7491 7492 7493 7494 7495 7496 7497 7498 7499 7500 7501 7502 7503 7504 7505 7506 7507 7508 7509 7510 7511 7512 7513 7514 7515 7516 7517 7518 7519 7520 7521 7522 7523 7524 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 7535 7536 7537 7538 7539 7540 7541 7542 7543 7544 7545 7546 7547 7548 7549 7550 7551 7552 7553 7554 7555 7556 7557 7558 7559 7560 7561 7562 7563 7564 7565 7566 7567 7568 7569 7570 7571 7572 7573 7574 7575 7576 7577 7578 7579 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599 7600 7601 7602 7603 7604 7605 7606 7607 7608 7609 7610 7611 7612 7613 7614 7615 7616 7617 7618 7619 7620 7621 7622 7623 7624 7625 7626 7627 7628 7629 7630 7631 7632 7633 7634 7635 7636 7637 7638 7639 7640 7641 7642 7643 7644 7645 7646 7647 7648 7649 7650 7651 7652 7653 7654 7655 7656 7657 7658 7659 7660 7661 7662 7663 7664 7665 7666 7667 7668 7669 7670 7671 7672 7673 7674 7675 7676 7677 7678 7679 7680 7681 7682 7683 7684 7685 7686 7687 7688 7689 7690 7691 7692 7693 7694 7695 7696 7697 7698 7699 7700 7701 7702 7703 7704 7705 7706 7707 7708 7709 7710 7711 7712 7713 7714 7715 7716 7717 7718 7719 7720 7721 7722 7723 7724 7725 7726 7727 7728 7729 7730 7731 7732 7733 7734 7735 7736 7737 7738 7739 7740 7741 7742 7743 7744 7745 7746 7747 7748 7749 7750 7751 7752 7753 7754 7755 7756 7757 7758 7759 7760 7761 7762 7763 7764 7765 7766 7767 7768 7769 7770 7771 7772 7773 7774 7775 7776 7777 7778 7779 7780 7781 7782 7783 7784 7785 7786 7787 7788 7789 7790 7791 7792 7793 7794 7795 7796 7797 7798 7799 7800 7801 7802 7803 7804 7805 7806 7807 7808 7809 7810 7811 7812 7813 7814 7815 7816 7817 7818 7819 7820 7821 
7822 7823 7824 7825 7826 7827 7828 7829 7830 7831 7832 7833 7834 7835 7836 7837 7838 7839 7840 7841 7842 7843 7844 7845 7846 7847 7848 7849 7850 7851 7852 7853 7854 7855 7856 7857 7858 7859 7860 7861 7862 7863 7864 7865 7866 7867 7868 7869 7870 7871 7872 7873 7874 7875 7876 7877 7878 7879 7880 7881 7882 7883 7884 7885 7886 7887 7888 7889 7890 7891 7892 7893 7894 7895 7896 7897 7898 7899 7900 7901 7902 7903 7904 7905 7906 7907 7908 7909 7910 7911 7912 7913 7914 7915 7916 7917 7918 7919 7920 7921 7922 7923 7924 7925 7926 7927 7928 7929 7930 7931 7932 7933 7934 7935 7936 7937 7938 7939 7940 7941 7942 7943 7944 7945 7946 7947 7948 7949 7950 7951 7952 7953 7954 7955 7956 7957 7958 7959 7960 7961 7962 7963 7964 7965 7966 7967 7968 7969 7970 7971 7972 7973 7974 7975 7976 7977 7978 7979 7980 7981 7982 7983 7984 7985 7986 7987 7988 7989 7990 7991 7992 7993 7994 7995 7996 7997 7998 7999 8000 8001 8002 8003 8004 8005 8006 8007 8008 8009 8010 8011 8012 8013 8014 8015 8016 8017 8018 8019 8020 8021 8022 8023 8024 8025 8026 8027 8028 8029 8030 8031 8032 8033 8034 8035 8036 8037 8038 8039 8040 8041 8042 8043 8044 8045 8046 8047 8048 8049 8050 8051 8052 8053 8054 8055 8056 8057 8058 8059 8060 8061 8062 8063 8064 8065 8066 8067 8068 8069 8070 8071 8072 8073 8074 8075 8076 8077 8078 8079 8080 8081 8082 8083 8084 8085 8086 8087 8088 8089 8090 8091 8092 8093 8094 8095 8096 8097 8098 8099 8100 8101 8102 8103 8104 8105 8106 8107 8108 8109 8110 8111 8112 8113 8114 8115 8116 8117 8118 8119 8120 8121 8122 8123 8124 8125 8126 8127 8128 8129 8130 8131 8132 8133 8134 8135 8136 8137 8138 8139 8140 8141 8142 8143 8144 8145 8146 8147 8148 8149 8150 8151 8152 8153 8154 8155 8156 8157 8158 8159 8160 8161 8162 8163 8164 8165 8166 8167 8168 8169 8170 8171 8172 8173 8174 8175 8176 8177 8178 8179 8180 8181 8182 8183 8184 8185 8186 8187 8188 8189 8190 8191 8192 8193 8194 8195 8196 8197 8198 8199 8200 8201 8202 8203 8204 8205 8206 8207 8208 8209 8210 8211 8212 8213 8214 8215 8216 8217 8218 8219 8220 8221 
8222 8223 8224 8225 8226 8227 8228 8229 8230 8231 8232 8233 8234 8235 8236 8237 8238 8239 8240 8241 8242 8243 8244 8245 8246 8247 8248 8249 8250 8251 8252 8253 8254 8255 8256 8257 8258 8259 8260 8261 8262 8263 8264 8265 8266 8267 8268 8269 8270 8271 8272 8273 8274 8275 8276 8277 8278 8279 8280 8281 8282 8283 8284 8285 8286 8287 8288 8289 8290 8291 8292 8293 8294 8295 8296 8297 8298 8299 8300 8301 8302 8303 8304 8305 8306 8307 8308 8309 8310 8311 8312 8313 8314 8315 8316 8317 8318 8319 8320 8321 8322 8323 8324 8325 8326 8327 8328 8329 8330 8331 8332 8333 8334 8335 8336 8337 8338 8339 8340 8341 8342 8343 8344 8345 8346 8347 8348 8349 8350 8351 8352 8353 8354 8355 8356 8357 8358 8359 8360 8361 8362 8363 8364 8365 8366 8367 8368 8369 8370 8371 8372 8373 8374 8375 8376 8377 8378 8379 8380 8381 8382 8383 8384 8385 8386 8387 8388 8389 8390 8391 8392 8393 8394 8395 8396 8397 8398 8399 8400 8401 8402 8403 8404 8405 8406 8407 8408 8409 8410 8411 8412 8413 8414 8415 8416 8417 8418 8419 8420 8421 8422 8423 8424 8425 8426 8427 8428 8429 8430 8431 8432 8433 8434 8435 8436 8437 8438 8439 8440 8441 8442 8443 8444 8445 8446 8447 8448 8449 8450 8451 8452 8453 8454 8455 8456 8457 8458 8459 8460 8461 8462 8463 8464 8465 8466 8467 8468 8469 8470 8471 8472 8473 8474 8475 8476 8477 8478 8479 8480 8481 8482 8483 8484 8485 8486 8487 8488 8489 8490 8491 8492 8493 8494 8495 8496 8497 8498 8499 8500 8501 8502 8503 8504 8505 8506 8507 8508 8509 8510 8511 8512 8513 8514 8515 8516 8517 8518 8519 8520 8521 8522 8523 8524 8525 8526 8527 8528 8529 8530 8531 8532 8533 8534 8535 8536 8537 8538 8539 8540 8541 8542 8543 8544 8545 8546 8547 8548 8549 8550 8551 8552 8553 8554 8555 8556 8557 8558 8559 8560 8561 8562 8563 8564 8565 8566 8567 8568 8569 8570 8571 8572 8573 8574 8575 8576 8577 8578 8579 8580 8581 8582 8583 8584 8585 8586 8587 8588 8589 8590 8591 8592 8593 8594 8595 8596 8597 8598 8599 8600 8601 8602 8603 8604 8605 8606 8607 8608 8609 8610 8611 8612 8613 8614 8615 8616 8617 8618 8619 8620 8621 
8622 8623 8624 8625 8626 8627 8628 8629 8630 8631 8632 8633 8634 8635 8636 8637 8638 8639 8640 8641 8642 8643 8644 8645 8646 8647 8648 8649 8650 8651 8652 8653 8654 8655 8656 8657 8658 8659 8660 8661 8662 8663 8664 8665 8666 8667 8668 8669 8670 8671 8672 8673 8674 8675 8676 8677 8678 8679 8680 8681 8682 8683 8684 8685 8686 8687 8688 8689 8690 8691 8692 8693 8694 8695 8696 8697 8698 8699 8700 8701 8702 8703 8704 8705 8706 8707 8708 8709 8710 8711 8712 8713 8714 8715 8716 8717 8718 8719 8720 8721 8722 8723 8724 8725 8726 8727 8728 8729 8730 8731 8732 8733 8734 8735 8736 8737 8738 8739 8740 8741 8742 8743 8744 8745 8746 8747 8748 8749 8750 8751 8752 8753 8754 8755 8756 8757 8758 8759 8760 8761 8762 8763 8764 8765 8766 8767 8768 8769 8770 8771 8772 8773 8774 8775 8776 8777 8778 8779 8780 8781 8782 8783 8784 8785 8786 8787 8788 8789 8790 8791 8792 8793 8794 8795 8796 8797 8798 8799 8800 8801 8802 8803 8804 8805 8806 8807 8808 8809 8810 8811 8812 8813 8814 8815 8816 8817 8818 8819 8820 8821 8822 8823 8824 8825 8826 8827 8828 8829 8830 8831 8832 8833 8834 8835 8836 8837 8838 8839 8840 8841 8842 8843 8844 8845 8846 8847 8848 8849 8850 8851 8852 8853 8854 8855 8856 8857 8858 8859 8860 8861 8862 8863 8864 8865 8866 8867 8868 8869 8870 8871 8872 8873 8874 8875 8876 8877 8878 8879 8880 8881 8882 8883 8884 8885 8886 8887 8888 8889 8890 8891 8892 8893 8894 8895 8896 8897 8898 8899 8900 8901 8902 8903 8904 8905 8906 8907 8908 8909 8910 8911 8912 8913 8914 8915 8916 8917 8918 8919 8920 8921 8922 8923 8924 8925 8926 8927 8928 8929 8930 8931 8932 8933 8934 8935 8936 8937 8938 8939 8940 8941 8942 8943 8944 8945 8946 8947 8948 8949 8950 8951 8952 8953 8954 8955 8956 8957 8958 8959 8960 8961 8962 8963 8964 8965 8966 8967 8968 8969 8970 8971 8972 8973 8974 8975 8976 8977 8978 8979 8980 8981 8982 8983 8984 8985 8986 8987 8988 8989 8990 8991 8992 8993 8994 8995 8996 8997 8998 8999 9000 9001 9002 9003 9004 9005 9006 9007 9008 9009 9010 9011 9012 9013 9014 9015 9016 9017 9018 9019 9020 9021 
9022 9023 9024 9025 9026 9027 9028 9029 9030 9031 9032 9033 9034 9035 9036 9037 9038 9039 9040 9041 9042 9043 9044 9045 9046 9047 9048 9049 9050 9051 9052 9053 9054 9055 9056 9057 9058 9059 9060 9061 9062 9063 9064 9065 9066 9067 9068 9069 9070 9071 9072 9073 9074 9075 9076 9077 9078 9079 9080 9081 9082 9083 9084 9085 9086 9087 9088 9089 9090 9091 9092 9093 9094 9095 9096 9097 9098 9099 9100 9101 9102 9103 9104 9105 9106 9107 9108 9109 9110 9111 9112 9113 9114 9115 9116 9117 9118 9119 9120 9121 9122 9123 9124 9125 9126 9127 9128 9129 9130 9131 9132 9133 9134 9135 9136 9137 9138 9139 9140 9141 9142 9143 9144 9145 9146 9147 9148 9149 9150 9151 9152 9153 9154 9155 9156 9157 9158 9159 9160 9161 9162 9163 9164 9165 9166 9167 9168 9169 9170 9171 9172 9173 9174 9175 9176 9177 9178 9179 9180 9181 9182 9183 9184 9185 9186 9187 9188 9189 9190 9191 9192 9193 9194 9195 9196 9197 9198 9199 9200 9201 9202 9203 9204 9205 9206 9207 9208 9209 9210 9211 9212 9213 9214 9215 9216 9217 9218 9219 9220 9221 9222 9223 9224 9225 9226 9227 9228 9229 9230 9231 9232 9233 9234 9235 9236 9237 9238 9239 9240 9241 9242 9243 9244 9245 9246 9247 9248 9249 9250 9251 9252 9253 9254 9255 9256 9257 9258 9259 9260 9261 9262 9263 9264 9265 9266 9267 9268 9269 9270 9271 9272 9273 9274 9275 9276 9277 9278 9279 9280 9281 9282 9283 9284 9285 9286 9287 9288 9289 9290 9291 9292 9293 9294 9295 9296 9297 9298 9299 9300 9301 9302 9303 9304 9305 9306 9307 9308 9309 9310 9311 9312 9313 9314 9315 9316 9317 9318 9319 9320 9321 9322 9323 9324 9325 9326 9327 9328 9329 9330 9331 9332 9333 9334 9335 9336 9337 9338 9339 9340 9341 9342 9343 9344 9345 9346 9347 9348 9349 9350 9351 9352 9353 9354 9355 9356 9357 9358 9359 9360 9361 9362 9363 9364 9365 9366 9367 9368 9369 9370 9371 9372 9373 9374 9375 9376 9377 9378 9379 9380 9381 9382 9383 9384 9385 9386 9387 9388 9389 9390 9391 9392 9393 9394 9395 9396 9397 9398 9399 9400 9401 9402 9403 9404 9405 9406 9407 9408 9409 9410 9411 9412 9413 9414 9415 9416 9417 9418 9419 9420 9421 
9422 9423 9424 9425 9426 9427 9428 9429 9430 9431 9432 9433 9434 9435 9436 9437 9438 9439 9440 9441 9442 9443 9444 9445 9446 9447 9448 9449 9450 9451 9452 9453 9454 9455 9456 9457 9458 9459 9460 9461 9462 9463 9464 9465 9466 9467 9468 9469 9470 9471 9472 9473 9474 9475 9476 9477 9478 9479 9480 9481 9482 9483 9484 9485 9486 9487 9488 9489 9490 9491 9492 9493 9494 9495 9496 9497 9498 9499 9500 9501 9502 9503 9504 9505 9506 9507 9508 9509 9510 9511 9512 9513 9514 9515 9516 9517 9518 9519 9520 9521 9522 9523 9524 9525 9526 9527 9528 9529 9530 9531 9532 9533 9534 9535 9536 9537 9538 9539 9540 9541 9542 9543 9544 9545 9546 9547 9548 9549 9550 9551 9552 9553 9554 9555 9556 9557 9558 9559 9560 9561 9562 9563 9564 9565 9566 9567 9568 9569 9570 9571 9572 9573 9574 9575 9576 9577 9578 9579 9580 9581 9582 9583 9584 9585 9586 9587 9588 9589 9590 9591 9592 9593 9594 9595 9596 9597 9598 9599 9600 9601 9602 9603 9604 9605 9606 9607 9608 9609 9610 9611 9612 9613 9614 9615 9616 9617 9618 9619 9620 9621 9622 9623 9624 9625 9626 9627 9628 9629 9630 9631 9632 9633 9634 9635 9636 9637 9638 9639 9640 9641 9642 9643 9644 9645 9646 9647 9648 9649 9650 9651 9652 9653 9654 9655 9656 9657 9658 9659 9660 9661 9662 9663 9664 9665 9666 9667 9668 9669 9670 9671 9672 9673 9674 9675 9676 9677 9678 9679 9680 9681 9682 9683 9684 9685 9686 9687 9688 9689 9690 9691 9692 9693 9694 9695 9696 9697 9698 9699 9700 9701 9702 9703 9704 9705 9706 9707 9708 9709 9710 9711 9712 9713 9714 9715 9716 9717 9718 9719 9720 9721 9722 9723 9724 9725 9726 9727 9728 9729 9730 9731 9732 9733 9734 9735 9736 9737 9738 9739 9740 9741 9742 9743 9744 9745 9746 9747 9748 9749 9750 9751 9752 9753 9754 9755 9756 9757 9758 9759 9760 9761 9762 9763 9764 9765 9766 9767 9768 9769 9770 9771 9772 9773 9774 9775 9776 9777 9778 9779 9780 9781 9782 9783 9784 9785 9786 9787 9788 9789 9790 9791 9792 9793 9794 9795 9796 9797 9798 9799 9800 9801 9802 9803 9804 9805 9806 9807 9808 9809 9810 9811 9812 9813 9814 9815 9816 9817 9818 9819 9820 9821 
9822 9823 9824 9825 9826 9827 9828 9829 9830 9831 9832 9833 9834 9835 9836 9837 9838 9839 9840 9841 9842 9843 9844 9845 9846 9847 9848 9849 9850 9851 9852 9853 9854 9855 9856 9857 9858 9859 9860 9861 9862 9863 9864 9865 9866 9867 9868 9869 9870 9871 9872 9873 9874 9875 9876 9877 9878 9879 9880 9881 9882 9883 9884 9885 9886 9887 9888 9889 9890 9891 9892 9893 9894 9895 9896 9897 9898 9899 9900 9901 9902 9903 9904 9905 9906 9907 9908 9909 9910 9911 9912 9913 9914 9915 9916 9917 9918 9919 9920 9921 9922 9923 9924 9925 9926 9927 9928 9929 9930 9931 9932 9933 9934 9935 9936 9937 9938 9939 9940 9941 9942 9943 9944 9945 9946 9947 9948 9949 9950 9951 9952 9953 9954 9955 9956 9957 9958 9959 9960 9961 9962 9963 9964 9965 9966 9967 9968 9969 9970 9971 9972 9973 9974 9975 9976 9977 9978 9979 9980 9981 9982 9983 9984 9985 9986 9987 9988 9989 9990 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010 10011 10012 10013 10014 10015 10016 10017 10018 10019 10020 10021 10022 10023 10024 10025 10026 10027 10028 10029 10030 10031 10032 10033 10034 10035 10036 10037 10038 10039 10040 10041 10042 10043 10044 10045 10046 10047 10048 10049 10050 10051 10052 10053 10054 10055 10056 10057 10058 10059 10060 10061 10062 10063 10064 10065 10066 10067 10068 10069 10070 10071 10072 10073 10074 10075 10076 10077 10078 10079 10080 10081 10082 10083 10084 10085 10086 10087 10088 10089 10090 10091 10092 10093 10094 10095 10096 10097 10098 10099 10100 10101 10102 10103 10104 10105 10106 10107 10108 10109 10110 10111 10112 10113 10114 10115 10116 10117 10118 10119 10120 10121 10122 10123 10124 10125 10126 10127 10128 10129 10130 10131 10132 10133 10134 10135 10136 10137 10138 10139 10140 10141 10142 10143 10144 10145 10146 10147 10148 10149 10150 10151 10152 10153 10154 10155 10156 10157 10158 10159 10160 10161 10162 10163 10164 10165 10166 10167 10168 10169 10170 10171 10172 10173 10174 10175 10176 10177 10178 10179 10180 10181 10182 10183 10184 
10185 10186 10187 10188 10189 10190 10191 10192 10193 10194 10195 10196 10197 10198 10199 10200 10201 10202 10203 10204 10205 10206 10207 10208 10209 10210 10211 10212 10213 10214 10215 10216 10217 10218 10219 10220 10221 10222 10223 10224 10225 10226 10227 10228 10229 10230 10231 10232 10233 10234 10235 10236 10237 10238 10239 10240 10241 10242 10243 10244 10245 10246 10247 10248 10249 10250 10251 10252 10253 10254 10255 10256 10257 10258 10259 10260 10261 10262 10263 10264 10265 10266 10267 10268 10269 10270 10271 10272 10273 10274 10275 10276 10277 10278 10279 10280 10281 10282 10283 10284 10285 10286 10287 10288 10289 10290 10291 10292 10293 10294 10295 10296 10297 10298 10299 10300 10301 10302 10303 10304 10305 10306 10307 10308 10309 10310 10311 10312 10313 10314 10315 10316 10317 10318 10319 10320 10321 10322 10323 10324 10325 10326 10327 10328 10329 10330 10331 10332 10333 10334 10335 10336 10337 10338 10339 10340 10341 10342 10343 10344 10345 10346 10347 10348 10349 10350 10351 10352 10353 10354 10355 10356 10357 10358 10359 10360 10361 10362 10363 10364 10365 10366 10367 10368 10369 10370 10371 10372 10373 10374 10375 10376 10377 10378 10379 10380 10381 10382 10383 10384 10385 10386 10387 10388 10389 10390 10391 10392 10393 10394 10395 10396 10397 10398 10399 10400 10401 10402 10403 10404 10405 10406 10407 10408 10409 10410 10411 10412 10413 10414 10415 10416 10417 10418 10419 10420 10421 10422 10423 10424 10425 10426 10427 10428 10429 10430 10431 10432 10433 10434 10435 10436 10437 10438 10439 10440 10441 10442 10443 10444 10445 10446 10447 10448 10449 10450 10451 10452 10453 10454 10455 10456 10457 10458 10459 10460 10461 10462 10463 10464 10465 10466 10467 10468 10469 10470 10471 10472 10473 10474 10475 10476 10477 10478 10479 10480 10481 10482 10483 10484 10485 10486 10487 10488 10489 10490 10491 10492 10493 10494 10495 10496 10497 10498 10499 10500 10501 10502 10503 10504 10505 10506 10507 10508 10509 10510 10511 10512 10513 10514 10515 10516 10517 
10518 10519 10520 10521 10522 10523 10524 10525 10526 10527 10528 10529 10530 10531 10532 10533 10534 10535 10536 10537 10538 10539 10540 10541 10542 10543 10544 10545 10546 10547 10548 10549 10550 10551 10552 10553 10554 10555 10556 10557 10558 10559 10560 10561 10562 10563 10564 10565 10566 10567 10568 10569 10570 10571 10572 10573 10574 10575 10576 10577 10578 10579 10580 10581 10582 10583 10584 10585 10586 10587 10588 10589 10590 10591 10592 10593 10594 10595 10596 10597 10598 10599 10600 10601 10602 10603 10604 10605 10606 10607 10608 10609 10610 10611 10612 10613 10614 10615 10616 10617 10618 10619 10620 10621 10622 10623 10624 10625 10626 10627 10628 10629 10630 10631 10632 10633 10634 10635 10636 10637 10638 10639 10640 10641 10642 10643 10644 10645 10646 10647 10648 10649 10650 10651 10652 10653 10654 10655 10656 10657 10658 10659 10660 10661 10662 10663 10664 10665 10666 10667 10668 10669 10670 10671 10672 10673 10674 10675 10676 10677 10678 10679 10680 10681 10682 10683 10684 10685 10686 10687 10688 10689 10690 10691 10692 10693 10694 10695 10696 10697 10698 10699 10700 10701 10702 10703 10704 10705 10706 10707 10708 10709 10710 10711 10712 10713 10714 10715 10716 10717 10718 10719 10720 10721 10722 10723 10724 10725 10726 10727 10728 10729 10730 10731 10732 10733 10734 10735 10736 10737 10738 10739 10740 10741 10742 10743 10744 10745 10746 10747 10748 10749 10750 10751 10752 10753 10754 10755 10756 10757 10758 10759 10760 10761 10762 10763 10764 10765 10766 10767 10768 10769 10770 10771 10772 10773 10774 10775 10776 10777 10778 10779 10780 10781 10782 10783 10784 10785 10786 10787 10788 10789 10790 10791 10792 10793 10794 10795 10796 10797 10798 10799 10800 10801 10802 10803 10804 10805 10806 10807 10808 10809 10810 10811 10812 10813 10814 10815 10816 10817 10818 10819 10820 10821 10822 10823 10824 10825 10826 10827 10828 10829 10830 10831 10832 10833 10834 10835 10836 10837 10838 10839 10840 10841 10842 10843 10844 10845 10846 10847 10848 10849 10850 
10851 10852 10853 10854 10855 10856 10857 10858 10859 10860 10861 10862 10863 10864 10865 10866 10867 10868 10869 10870 10871 10872 10873 10874 10875 10876 10877 10878 10879 10880 10881 10882 10883 10884 10885 10886 10887 10888 10889 10890 10891 10892 10893 10894 10895 10896 10897 10898 10899 10900 10901 10902 10903 10904 10905 10906 10907 10908 10909 10910 10911 10912 10913 10914 10915 10916 10917 10918 10919 10920 10921 10922 10923 10924 10925 10926 10927 10928 10929 10930 10931 10932 10933 10934 10935 10936 10937 10938 10939 10940 10941 10942 10943 10944 10945 10946 10947 10948 10949 10950 10951 10952 10953 10954 10955 10956 10957 10958 10959 10960 10961 10962 10963 10964 10965 10966 10967 10968 10969 10970 10971 10972 10973 10974 10975 10976 10977 10978 10979 10980 10981 10982 10983 10984 10985 10986 10987 10988 10989 10990 10991 10992 10993 10994 10995 10996 10997 10998 10999 11000 11001 11002 11003 11004 11005 11006 11007 11008 11009 11010 11011 11012 11013 11014 11015 11016 11017 11018 11019 11020 11021 11022 11023 11024 11025 11026 11027 11028 11029 11030 11031 11032 11033 11034 11035 11036 11037 11038 11039 11040 11041 11042 11043 11044 11045 11046 11047 11048 11049 11050 11051 11052 11053 11054 11055 11056 11057 11058 11059 11060 11061 11062 11063 11064 11065 11066 11067 11068 11069 11070 11071 11072 11073 11074 11075 11076 11077 11078 11079 11080 11081 11082 11083 11084 11085 11086 11087 11088 11089 11090 11091 11092 11093 11094 11095 11096 11097 11098 11099 11100 11101 11102 11103 11104 11105 11106 11107 11108 11109 11110 11111 11112 11113 11114 11115 11116 11117 11118 11119 11120 11121 11122 11123 11124 11125 11126 11127 11128 11129 11130 11131 11132 11133 11134 11135 11136 11137 11138 11139 11140 11141 11142 11143 11144 11145 11146 11147 11148 11149 11150 11151 11152 11153 11154 11155 11156 11157 11158 11159 11160 11161 11162 11163 11164 11165 11166 11167 11168 11169 11170 11171 11172 11173 11174 11175 11176 11177 11178 11179 11180 11181 11182 11183 
11184 11185 11186 11187 11188 11189 11190 11191 11192 11193 11194 11195 11196 11197 11198 11199 11200 11201 11202 11203 11204 11205 11206 11207 11208 11209 11210 11211 11212 11213 11214 11215 11216 11217 11218 11219 11220 11221 11222 11223 11224 11225 11226 11227 11228 11229 11230 11231 11232 11233 11234 11235 11236 11237 11238 11239 11240 11241 11242 11243 11244 11245 11246 11247 11248 11249 11250 11251 11252 11253 11254 11255 11256 11257 11258 11259 11260 11261 11262 11263 11264 11265 11266 11267 11268 11269 11270 11271 11272 11273 11274 11275 11276 11277 11278 11279 11280 11281 11282 11283 11284 11285 11286 11287 11288 11289 11290 11291 11292 11293 11294 11295 11296 11297 11298 11299 11300 11301 11302 11303 11304 11305 11306 11307 11308 11309 11310 11311 11312 11313 11314 11315 11316 11317 11318 11319 11320 11321 11322 11323 11324 11325 11326 11327 11328 11329 11330 11331 11332 11333 11334 11335 11336 11337 11338 11339 11340 11341 11342 11343 11344 11345 11346 11347 11348 11349 11350 11351 11352 11353 11354 11355 11356 11357 11358 11359 11360 11361 11362 11363 11364 11365 11366 11367 11368 11369 11370 11371 11372 11373 11374 11375 11376 11377 11378 11379 11380 11381 11382 11383 11384 11385 11386 11387 11388 11389 11390 11391 11392 11393 11394 11395 11396 11397 11398 11399 11400 11401 11402 11403 11404 11405 11406 11407 11408 11409 11410 11411 11412 11413 11414 11415 11416 11417 11418 11419 11420 11421 11422 11423 11424 11425 11426 11427 11428 11429 11430 11431 11432 11433 11434 11435 11436 11437 11438 11439 11440 11441 11442 11443 11444 11445 11446 11447 11448 11449 11450 11451 11452 11453 11454 11455 11456 11457 11458 11459 11460 11461 11462 11463 11464 11465 11466 11467 11468 11469 11470 11471 11472 11473 11474 11475 11476 11477 11478 11479 11480 11481 11482 11483 11484 11485 11486 11487 11488 11489 11490 11491 11492 11493 11494 11495 11496 11497 11498 11499 11500 11501 11502 11503 11504 11505 11506 11507 11508 11509 11510 11511 11512 11513 11514 11515 11516 
11517 11518 11519 11520 11521 11522 11523 11524 11525 11526 11527 11528 11529 11530 11531 11532 11533 11534 11535 11536 11537 11538 11539 11540 11541 11542 11543 11544 11545 11546 11547 11548 11549 11550 11551 11552 11553 11554 11555 11556 11557 11558 11559 11560 11561 11562 11563 11564 11565 11566 11567 11568 11569 11570 11571 11572 11573 11574 11575 11576 11577 11578 11579 11580 11581 11582 11583 11584 11585 11586 11587 11588 11589 11590 11591 11592 11593 11594 11595 11596 11597 11598 11599 11600 11601 11602 11603 11604 11605 11606 11607 11608 11609 11610 11611 11612 11613 11614 11615 11616 11617 11618 11619 11620 11621 11622 11623 11624 11625 11626 11627 11628 11629 11630 11631 11632 11633 11634 11635 11636 11637 11638 11639 11640 11641 11642 11643 11644 11645 11646 11647 11648 11649 11650 11651 11652 11653 11654 11655 11656 11657 11658 11659 11660 11661 11662 11663 11664 11665 11666 11667 11668 11669 11670 11671 11672 11673 11674 11675 11676 11677 11678 11679 11680 11681 11682 11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 11693 11694 11695 11696 11697 11698 11699 11700 11701 11702 11703 11704 11705 11706 11707 11708 11709 11710 11711 11712 11713 11714 11715 11716 11717 11718 11719 11720 11721 11722 11723 11724 11725 11726 11727 11728 11729 11730 11731 11732 11733 11734 11735 11736 11737 11738 11739 11740 11741 11742 11743 11744 11745 11746 11747 11748 11749 11750 11751 11752 11753 11754 11755 11756 11757 11758 11759 11760 11761 11762 11763 11764 11765 11766 11767 11768 11769 11770 11771 11772 11773 11774 11775 11776 11777 11778 11779 11780 11781 11782 11783 11784 11785 11786 11787 11788 11789 11790 11791 11792 11793 11794 11795 11796 11797 11798 11799 11800 11801 11802 11803 11804 11805 11806 11807 11808 11809 11810 11811 11812 11813 11814 11815 11816 11817 11818 11819 11820 11821 11822 11823 11824 11825 11826 11827 11828 11829 11830 11831 11832 11833 11834 11835 11836 11837 11838 11839 11840 11841 11842 11843 11844 11845 11846 11847 11848 11849 
11850 11851 11852 11853 11854 11855 11856 11857 11858 11859 11860 11861 11862 11863 11864 11865 11866 11867 11868 11869 11870 11871 11872 11873 11874 11875 11876 11877 11878 11879 11880 11881 11882 11883 11884 11885 11886 11887 11888 11889 11890 11891 11892 11893 11894 11895 11896 11897 11898 11899 11900 11901 11902 11903 11904 11905 11906 11907 11908 11909 11910 11911 11912 11913 11914 11915 11916 11917 11918 11919 11920 11921 11922 11923 11924 11925 11926 11927 11928 11929 11930 11931 11932 11933 11934 11935 11936 11937 11938 11939 11940 11941 11942 11943 11944 11945 11946 11947 11948 11949 11950 11951 11952 11953 11954 11955 11956 11957 11958 11959 11960 11961 11962 11963 11964 11965 11966 11967 11968 11969 11970 11971 11972 11973 11974 11975 11976 11977 11978 11979 11980 11981 11982 11983 11984 11985 11986 11987 11988 11989 11990 11991 11992 11993 11994 11995 11996 11997 11998 11999
|
% This file is part of HINT
% Copyright 2017-2021 Martin Ruckert, Hochschule Muenchen, Lothstrasse 64, 80336 Muenchen
%
% Permission is hereby granted, free of charge, to any person obtaining a copy
% of this software and associated documentation files (the "Software"), to deal
% in the Software without restriction, including without limitation the rights
% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
% copies of the Software, and to permit persons to whom the Software is
% furnished to do so, subject to the following conditions:
%
% The above copyright notice and this permission notice shall be
% included in all copies or substantial portions of the Software.
%
% THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
% COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
% WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
% OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
% THE SOFTWARE.
%
% Except as contained in this notice, the name of the copyright holders shall
% not be used in advertising or otherwise to promote the sale, use or other
% dealings in this Software without prior written authorization from the
% copyright holders.
\input btxmac.tex
\input hintmac.tex
%% defining how to display certain C identifiers
@s int8_t int
@s uint8_t int
@s int16_t int
@s uint16_t int
@s uint32_t int
@s int32_t int
@s uint64_t int
@s bool int
@
\makeindex
\maketoc
\makecode
%\makefigindex
\titletrue
\null
\font\largetitlefont=cmssbx10 scaled\magstep4
\font\Largetitlefont=cmssbx10 at 40pt
\font\hugetitlefont=cmssbx10 at 48pt
\font\smalltitlefontit=cmbxti10 scaled\magstep3
\font\smalltitlefont=cmssbx10 scaled\magstep3
%halftitle
\def\raggedleft{\leftskip=0pt plus 5em\parfillskip=0pt
\spaceskip=.3333em \xspaceskip=0.5em \emergencystretch=1em\relax
\hyphenpenalty=1000\exhyphenpenalty=1000\pretolerance=10000\linepenalty=5000
}
\hbox{}
\vskip 0pt plus 1fill
{ \baselineskip=60pt
\hugetitlefont\hfill HINT:\par
\Largetitlefont\raggedleft The File Format\par
}
\vskip 0pt plus 5fill
\eject
% verso of half title
\titletrue
\null
\vfill
\eject
% title
\titletrue
\hbox{}
\vskip 0pt plus 1fill
{
\baselineskip=1cm\parindent=0pt
{\largetitlefont\raggedright HINT: The File Format}\par
\leftline{\smalltitlefont Version 1.4}
\vskip 10pt plus 0.5fill
\leftline{\smalltitlefont Reflowable}
\vskip-3pt
\leftline{\smalltitlefont Output}
\vskip-3pt
\leftline{\smalltitlefont for \TeX}
\vskip 10pt plus 0.5fill
\hskip 0pt plus 2fill{\it F\"ur meine Mutter}\hskip 0pt plus 0.5fill\hbox{}
\bigskip
\vskip 10pt plus 3fill
\raggedright\baselineskip=12pt
{\bf MARTIN RUCKERT} \ {\it Munich University of Applied Sciences}\par
\bigskip
\leftline{Second edition}
\bigskip
% \leftline{\bf Eigendruck im Selbstverlag}
% \bigskip
}
\eject
% verso of title
% copyright page (ii)
\titletrue
\begingroup
\figrm
\parindent=0pt
%\null
{\raggedright\advance\rightskip 3.5pc
The author has taken care in the preparation of this book,
but makes no expressed or implied warranty of any kind and assumes no
responsibility for errors or omissions. No liability is assumed for
incidental or consequential damages in connection with or arising out
of the use of the information or programs contained herein.
\bigskip
{\figtt\obeylines\obeyspaces\baselineskip=11pt
Ruckert, Martin.
HINT: The File Format
Includes index.
ISBN 979-854992684-4
}
\bigskip
{\raggedright\advance\rightskip 3.5pc
\def\:{\discretionary{}{}{}}
Internet page {\tt http:\://hint.\:userweb.\:mwn.\:de/\:hint/\:format.html}
may contain current information about this book, downloadable software,
and news.
\vfill
Copyright $\copyright$ 2019, 2021 by Martin Ruckert
\smallskip
All rights reserved.
Printed by Kindle Direct Publishing.
This publication is protected by copyright, and permission must be
obtained prior to any prohibited reproduction, storage in
a~retrieval system, or transmission in any form or by any means, electronic,
mechanical, photocopying, recording, or likewise.
To obtain permission to use material from this work, please submit a written
request to Martin Ruckert,
Hochschule M\"unchen,
Fakult\"at f\"ur Informatik und Mathematik,
Lothstrasse 64,
80335 M\"unchen,
Germany.
\medskip
{\tt ruckert\:@@cs.hm.edu}
\medskip
ISBN-13: 979-854992684-4\par
\medskip
First printing: August 2019\par
Second edition: August 2021\par
\medskip
Last commit: \input lastcommit.tex
\par
}
}
\endgroup
\frontmatter
\plainsection{Preface}
Late in summer 2017, with my new \CEE\ based {\tt cweb} implementation
of \TeX\cite{Knuth:tex} in hand\cite{MR:webtocweb}\cite{MR:tug38}\cite{MR:web2w}, I started to write
the first prototype of the \HINT\ viewer. I basically made two copies
of \TeX: In the first copy, I replaced the |build_page| procedure by
an output routine which used more or less the printing routines
already available in \TeX. This was the beginning of the
\HINT\ file format.
In the second copy, I replaced \TeX's main loop by an input routine
that would feed the \HINT\ file more or less directly to \TeX's
|build_page| procedure. And after replacing \TeX's |ship_out|
procedure by a modified rendering routine of a dvi viewer that I had
written earlier for my experiments with \TeX's Computer Modern
fonts\cite{MR:tug37}, I had my first running \HINT\ viewer. My
sabbatical during the following Fall term gave me time for ``rapid
prototyping'' various features that I considered necessary for
reflowable \TeX\ output\cite{MR:tug39}.
The textual output format derived from the original \TeX\ debugging
routines proved to be insufficient when I implemented a ``page up''
button because it did not support reading the page content
``backwards''. As a consequence, I developed a compact binary file
format that could be parsed easily in both directions. The \HINT\
short file format was born. I stopped an initial attempt at
eliminating the old textual format because it was so much nicer when
debugging. Instead, I converted the long textual format into the short
binary format as a preliminary step in the viewer. This was not a long
term solution. When opening a big file, as produced from a 1000-page
\TeX\ file, the parsing took several seconds before the first
page would appear on screen. This delay, observed on a fast desktop
PC, is barely tolerable, and the delay one would expect on a low-cost,
low-power, mobile device seemed prohibitive. The consequence is
simple: The viewer will need an input file in the short format; and to
support debugging (or editing), separate programs are needed to
translate the short format into the long format and back again. But
for the moment, I did not bother to implement any of this but
continued with unrestricted experimentation.
With the beginning of the Spring term 2018, I stopped further
experiments with the \HINT\ viewer and decided that I had to write
down a clean design of the \HINT\ file format. Or of both file
formats? Professors are supposed to do research, and hence I tried an
experiment: Instead of writing down a traditional language
specification, I decided to stick with the ``literate programming''
paradigm\cite{Knuth:lp} and write the present book. It describes and implements
the \.{stretch} and \.{shrink} programs translating one file format
into the other. As a side effect, it contains the underlying language
specification. Whether this experiment is a success as a language
specification remains to be seen, and you should see for yourself. But
the only important measure for the value of a scientific experiment is
how much you can learn from it---and I learned a lot.
The whole project turned out to be much more difficult than I had
expected. Early on, I decided that I would use a recursive descent
parser for the short format and an LR($k$) parser for the long
format. Of course, I would use {\tt lex}/{\tt flex} and {\tt yacc}/{\tt bison}
to generate the LR($k$) parser, and so I had to extend the {\tt
cweb} tools\cite{Knuth:cweb} to support the corresponding source files.
Around mid-May, after writing down about 100 pages, the first
problems emerged that could not be resolved with my current
approach. I had started to describe font definitions containing
definitions of the interword glue and the default hyphen, and the
declarative style of my exposition started to conflict with the
sequential demands of writing an output file. So it was time for a
first complete redesign. Two more passes over the whole book were
necessary to find the concepts and the structure that would allow me
to go forward and complete the book as you see it now.
While rewriting was on its way, many ``nice ideas'' were pruned from
the book. For example, the initial idea of optimizing the \HINT\ file
while translating it was first reduced to just gathering statistics
and then disappeared completely. The added code and complexity were
just too distracting.
What you see before you is still a snapshot of the \HINT\ file format
because its development is still under way. We will know what
features are needed for a reflowable \TeX\ file format only after many
people have started using the format. To use the format, the end-user
will need implementations, and the implementer will need a language
specification. The present book is the first step in an attempt to
solve this ``chicken or egg'' dilemma.
\vskip 1cm
\noindent {\it M\"unchen\hfil\break
August 20, 2019 \hfill Martin Ruckert}
\tableofcontent
%\thefigindex
\mainmatter
\section{Introduction}\label{intro}
This book defines a file format for reflowable text.
Actually it describes two file formats: a long format
that optimizes readability for human beings, and
a short format that optimizes readability for machines
and the use of storage space. Both formats use the concept of nodes and lists of
nodes to describe the file content. Programs that process these nodes
will likely want to convert the compressed binary representation of a
node---the short format---or the lengthy textual representation of a
node---the long format---into a convenient internal representation.
So most of what follows is just a description of these nodes: their short format,
their long format and sometimes their internal representation.
Whereas the description of the long and short external formats is part
of the file specification, the description of the internal representation
is just informational. Different internal representations can be chosen
based on the individual needs of the program.
While defining the format, I illustrate the processing of long and short format
files by implementing two utilities: \.{shrink} and \.{stretch}.
\.{shrink} converts the long format into the short format and \.{stretch}
goes the other way.
There is also a prototype viewer for this
file format and a special version of \TeX\cite{DK:texbook} to produce output
in this format. Neither is described here; a survey describing
them can be found in \cite{MR:tug39}.
\subsection{Glyphs}
Let's start with a simple and very common kind of node: a node describing
a character.
Because we describe a format that is used to display text,
we are not so much interested in the
character itself but we are interested in the specific glyph\index{glyph}.
In typography, a glyph is a unique mark to be placed on the page representing
a character. For example, the glyph representing the character `a' can have
many forms, among them `{\it a\/}', `{\bf a}', or `{\tenss a}'.
Such glyphs come in collections, called fonts, representing every character
of the alphabet in a consistent way.
The long format of a node describing the glyph `a'
might look like this:`` \.{<glyph} \.{97} \.{*1>}''.
Here ``\.{97}'' is the character code which
happens to be the ASCII code of the letter `a' and ``{\tt *1}'' is a font reference
that stands for ``Computer Modern Roman 10pt''.
Reference numbers, as you can see,
start with an asterisk reminiscent of references in the \CEE\ programming language.
The asterisk enables us to distinguish between ordinary numbers like ``\.{1}'' and references like ``{\tt *1}''.
To make this node more readable, we will see in section~\secref{chars} that it is also
possible to write `` \.{<glyph 'a' (cmr10) *1>}''.
The latter form uses a comment ``\.{(cmr10)}'', enclosed in parentheses, to
give an indication of what kind of font happens to be font 1, and it uses ``\.{'a'}'',
the character enclosed in single quotes to denote the ASCII code of `a'.
But let's keep things simple for now and stick with the decimal notation of the character code.
The rest is common for all nodes: a keyword, here ``\.{glyph}'', and a pair of pointed brackets ``\.{<}\dots\.{>}''.
Internally, we represent a glyph by the font number
and the character number or character code.
To store the internal representation of a glyph node,
we define an appropriate structure type, named after the node with an uppercase first letter.
@<hint types@>=
typedef struct {@+ uint32_t c;@+ uint8_t f; @+} Glyph;
@
Let us now look at the program \.{shrink} and see how it will convert the long format description
to the internal representation of the glyph and finally to a short format description.
\subsection{Scanning the Long Format}
First, \.{shrink} reads the input file and extracts a sequence of
tokens. This is called ``scanning''\index{scanning}. We generate the
procedure to do the scanning using the program
\.{flex}\cite{JL:flexbison}\index{flex+{\tt flex}} which is the GNU
version of the common UNIX tool \.{lex}\cite{JL:lexyacc}\index{lex+{\tt lex}}.
The input to \.{flex} is a list of pattern/\kern -1pt action rules
where the pattern is a regular expression and the action is a piece of
\CEE\ code. Most of the time, the \CEE\ code is very simple: it just
returns the right token\index{token} number to the parser which we
consider shortly.
The code that defines the tokens will be marked with a line ending in
``\redsymbol''. This symbol\index{symbol} stands for ``{\it Reading
the long format\/}''. These code sequences define the syntactical
elements of the long format and at the same time implement the reading
process. All sections where that happens are preceded by a similar
heading and for reference they are conveniently listed together
starting on page~\pageref{codeindex}.
\codesection{\redsymbol}{Reading the Long Format}\redindex{1}{2}{Glyphs}
@s START symbol
@s END symbol
@s GLYPH symbol
@s UNSIGNED symbol
@s REFERENCE symbol
@<symbols@>=
%token START "<"
%token END ">"
%token GLYPH "glyph"
%token <u> UNSIGNED
%token <u> REFERENCE
@
You might notice that a small caps font is used for |START|, |END| or |GLYPH|.
These are ``terminal symbols'' or ``tokens''.
Next are the scanning rules which define the connection between tokens and their
textual representation.
@<scanning rules@>=
::@="<"@> :< SCAN_START; return START; >:
::@=">"@> :< SCAN_END; return END; >:
::@=glyph@> :< return GLYPH; >:
::@=0|[1-9][0-9]*@> :< SCAN_UDEC(yytext); return UNSIGNED; >:
::@=\*(0|[1-9][0-9]*)@> :< SCAN_UDEC(yytext+1); return REFERENCE; >:
::@=[[:space:]]@> :< ; >:
::@=\([^()\n]*[)\n]@> :< ; >:
@
As we will see later, the macros starting with |SCAN_|\dots\ are scanning macros.
Here |SCAN_UDEC| is a macro that converts the decimal representation
that did match the given pattern to an unsigned integer value; it is explained in
section~\secref{integers}.
The macros |SCAN_START| and |SCAN_END| are explained in section~\secref{text}.
The action ``{\tt ;}'' is a ``do nothing'' action; here it causes spaces or comments\index{comment}
to be ignored. Comments start with an opening parenthesis and are terminated by a
closing parenthesis or the end of line character.
The pattern ``\.{[\^()\\n]}'' is a negated
character class that matches all characters except parentheses and the newline
character. These are not allowed inside comments. For detailed information about
the patterns used in a \.{flex} program, see the \.{flex} user manual\cite{JL:flexbison}.
\subsection{Parsing the Long Format}
\label{parse_glyph}
Next, the tokens produced by the scanner are assembled into larger entities.
This is called ``parsing''\index{parsing}.
We generate the procedure to do the parsing using the program \.{bison}\cite{JL:flexbison}\index{bison+{\tt bison}} which is
the GNU version of the common UNIX tool \.{yacc}\cite{JL:lexyacc}\index{yacc+{\tt yacc}}.
The input to \.{bison} is a list of parsing rules, called a ``grammar''\index{grammar}.
The rules describe how to build larger entities from smaller entities.
For a simple glyph node like `` \.{<glyph 97 *1>}'', we need just these rules:
\codesection{\redsymbol}{Reading the Long Format}%\redindex{1}{2}{Glyphs}
@s content_node symbol
@s node symbol
@s glyph symbol
@s Glyph int
@s start symbol
@<symbols@>=
%type <u> start
%type <c> glyph
@
@<parsing rules@>=@/
glyph: UNSIGNED REFERENCE @/{ $$.c=$1; REF(font_kind,$2); $$.f=$2; };
content_node: start GLYPH glyph END { hput_tags($1,hput_glyph(&($3))); };
start: START {HPUTNODE; $$=(uint32_t)(hpos++-hstart);}
@
You might notice that a slanted font is used for |glyph|, |content_node|, or |start|.
These are ``nonterminal symbols'' and occur on the left hand side of a rule. On the
right hand side of a rule you find nonterminal symbols, as well as terminal\index{terminal symbol} symbols
and \CEE\ code enclosed in braces.
Within the \CEE\ code, the expressions |$1| and |$2| refer to the variables on the parse stack
that are associated with the first and second symbol on the right hand side of the rule.
In the case of our glyph node, these will be the values 97 and 1, respectively, as produced
by the macro |SCAN_UDEC|.
|$$| refers to the variable associated with the left hand side of the rule.
These variables contain the internal representation of the object in question.
The type of the variable is specified by a mandatory {\bf token} or optional {\bf type} clause
when we define the symbol.
In the above {\bf type} clause for |start| and |glyph|, the identifiers |u| and |c| refer to
the |union| declaration of the parser (see page~\pageref{union})
where we find |uint32_t u| and |Glyph c|. The macro |REF| tests a reference number for
its valid range.
Reading a node is usually split into the following sequence of steps:
\itemize
\item Reading the node specification, here a |glyph|
consisting of an |UNSIGNED| value and a |REFERENCE| value.
\item Creating the internal representation in the variable |$$|
based on the values of |$1|, |$2|, \dots\ Here the character
code field |c| is initialized using the |UNSIGNED| value
stored in |$1| and the font field |f| is initialized using
|$2| after checking the reference number for the proper range.
\item A |content_node| rule explaining that |start| is followed by |GLYPH|,
the keyword that directs the parser to |glyph|, the
node specification, and a final |END|.
\item Parsing |start|, which is defined as the token |START|, assigns
to the corresponding variable |p| on the parse stack the current
position |hpos| in the output and increments that position
to make room for the start byte, which we will discuss shortly.
\item At the end of the |content_node| rule, the \.{shrink} program calls
a {\it hput\_\dots\/} function, here |hput_glyph|, to write the short
format of the node as given by its internal representation to the output
and return the correct tag value.
\item Finally the |hput_tags| function will add the tag as a start byte and end byte
to the output stream.
\enditemize
Now let's see how writing the short format works in detail.
\subsection{Writing the Short Format}
A content node in short form begins with a start\index{start byte}
byte. It tells us what kind of node it is. To describe the content of
a short \HINT\ file, 32 different kinds\index{kind} of nodes are
defined. Hence the kind of a node can be stored in 5 bits and the
remaining bits of the start byte can be used to contain a 3 bit
``info''\index{info} value.
We define an enumeration type to give symbolic names to the
kind-values. The exact numerical values are of no specific
importance; we will see in section~\secref{text}, however, that the
assignment chosen below has certain advantages.
Because the usage of kind-values in content nodes is slightly
different from the usage in definition nodes, we define alternative
names for some kind-values. To display readable names instead of
numerical values when debugging, we define two arrays of strings as
well. Keeping the definitions consistent is achieved by creating all
definitions from the same list of identifiers using different
definitions of the macro |DEF_KIND|.
@<hint basic types@>=
#define DEF_KIND(C,D,N) @[C##_kind=N@]
typedef enum {@+@<kinds@>@+,@+ @<alternative kind names@> @+} Kind;
#undef DEF_KIND
@
@<define |content_name| and |definition_name|@>=
#define DEF_KIND(C,D,N) @[#C@]
const char *content_name[32]=@+{@+@<kinds@>@;@+}@+;
#undef DEF_KIND@#
#define DEF_KIND(C,D,N) @[#D@]
const char *definition_name[0x20]=@+{@+@<kinds@>@;@+}@+;
#undef DEF_KIND
@
@<print |content_name| and |definition_name|@>=
printf("const char *content_name[32]={");
for (k=0; k<= 31;k++)
{ printf("\"%s\"",content_name[k]);
if (k<31) printf(", ");
}
printf("};\n\n");
printf("const char *definition_name[32]={");
for (k=0; k<= 31;k++)
{ printf("\"%s\"",definition_name[k]);
if (k<31) printf(", ");
}
printf("};\n\n");
@
\goodbreak
\index{glyph kind+\\{glyph\_kind}}
\index{font kind+\\{font\_kind}}
\index{penalty kind+\\{penalty\_kind}}
\index{int kind+\\{int\_kind}}
\index{kern kind+\\{kern\_kind}}
\index{xdimen kind+\\{xdimen\_kind}}
\index{ligature kind+\\{ligature\_kind}}
\index{disc kind+\\{disc\_kind}}
\index{glue kind+\\{glue\_kind}}
\index{language kind+\\{language\_kind}}
\index{rule kind+\\{rule\_kind}}
\index{image kind+\\{image\_kind}}
\index{baseline kind+\\{baseline\_kind}}
\index{dimen kind+\\{dimen\_kind}}
\index{hbox kind+\\{hbox\_kind}}
\index{vbox kind+\\{vbox\_kind}}
\index{par kind+\\{par\_kind}}
\index{math kind+\\{math\_kind}}
\index{table kind+\\{table\_kind}}
\index{item kind+\\{item\_kind}}
\index{hset kind+\\{hset\_kind}}
\index{vset kind+\\{vset\_kind}}
\index{hpack kind+\\{hpack\_kind}}
\index{vpack kind+\\{vpack\_kind}}
\index{stream kind+\\{stream\_kind}}
\index{page kind+\\{page\_kind}}
\index{range kind+\\{range\_kind}}
\index{adjust kind+\\{adjust\_kind}}
\index{param kind+\\{param\_kind}}
\index{list kind+\\{list\_kind}}
\label{kinddef}
@<kinds@>=
DEF_KIND(l@&ist,l@&ist,0),@/
DEF_KIND(p@&aram,p@&aram,1),@/
DEF_KIND(r@&ange,r@&ange,2),@/
DEF_KIND(x@&dimen,x@&dimen,3),@/
DEF_KIND(a@&djust,a@&djust,4),@/
DEF_KIND(g@&lyph, f@&ont,5),@/
DEF_KIND(k@&ern,d@&imen,6),@/
DEF_KIND(g@&lue,g@&lue,7),@/
DEF_KIND(l@&igature,l@&igature,8),@/
DEF_KIND(d@&isc,d@&isc,9),@/
DEF_KIND(l@&anguage,l@&anguage,10),@/
DEF_KIND(r@&ule,r@&ule,11),@/
DEF_KIND(i@&mage,i@&mage,12),@/
DEF_KIND(l@&eaders,l@&eaders,13),@/
DEF_KIND(b@&aseline,b@&aseline,14),@/
DEF_KIND(h@&b@&ox,h@&b@&ox,15),@/
DEF_KIND(v@&b@&ox,v@&b@&ox,16),@/
DEF_KIND(p@&ar,p@&ar,17),@/
DEF_KIND(m@&ath,m@&ath,18),@/
DEF_KIND(t@&able,t@&able,19),@/
DEF_KIND(i@&tem,i@&tem,20),@/
DEF_KIND(h@&set,h@&set,21),@/
DEF_KIND(v@&set,v@&set,22),@/
DEF_KIND(h@&pack,h@&pack,23),@/
DEF_KIND(v@&pack,v@&pack,24),@/
DEF_KIND(s@&tream,s@&tream,25),@/
DEF_KIND(p@&age,p@&age,26),@/
DEF_KIND(l@&ink,l@&abel,27),@/
DEF_KIND(c@&olor,c@&olor,28),@/
DEF_KIND(u@&ndefined1,u@&ndefined1,29),@/
DEF_KIND(u@&ndefined2,u@&ndefined2,30),@/
DEF_KIND(p@&enalty, i@&nt,31)
@t@>
@
For a few kind-values we have
alternative names; we will use them
to express different intentions when using them.
@<alternative kind names@>=
font_kind=glyph_kind,int_kind=penalty_kind, unknown_kind=penalty_kind,
dimen_kind=kern_kind, label_kind=link_kind, outline_kind=link_kind@/@t{}@>
@
The info\index{info value} values can be used to represent numbers
in the range 0 to 7; for an example
see the |hput_glyph| function later in this section.
Mostly, however, the individual bits are used as flags indicating the presence
or absence of immediate parameter values. If the info bit is set, it
means the corresponding parameter is present as an immediate value; if it
is zero, it means that there is no immediate parameter value present, and
the node specification will reveal what value to use instead.
In some cases there is a common default value that can be used, in other
cases a one byte reference number is used to select a predefined value.
To make the binary
representation of the info bits more readable, we define an
enumeration type.
\index{b000+\\{b000}}
\index{b001+\\{b001}}
\index{b010+\\{b010}}
\index{b011+\\{b011}}
\index{b100+\\{b100}}
\index{b101+\\{b101}}
\index{b110+\\{b110}}
\index{b111+\\{b111}}
@<hint basic types@>=
typedef enum {@+ b000=0,b001=1,b010=2,b011=3,b100=4,b101=5,b110=6,b111=7@+ } Info;
@
After the start byte follows the node content and it is the purpose of
the start byte to reveal the exact syntax and semantics of the node
content. Because we want to be able to read the short form of a \HINT\
file in forward direction and in backward direction, the start byte is
duplicated after the content as an end\index{end byte} byte.
We store a kind and an info value in one byte and call this a tag.
@<hint basic types@>=
typedef uint8_t Tag;
@
The following macros are used to assemble and disassemble tags:\index{TAG+\.{TAG}}
@<hint macros@>=
#define @[KIND(T)@] (((T)>>3)&0x1F)
#define @[NAME(T)@] @[content_name[KIND(T)]@]
#define @[INFO(T)@] ((T)&0x7)
#define @[TAG(K,I)@] (((K)<<3)|(I))
@
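As a small worked example of the tag layout, the following sketch packs
and unpacks a tag outside of the \HINT\ sources. It repeats the three
macros from above; the helper |make_tag| is hypothetical and exists only
for this illustration, and the two kind-values are copied from the list
of kinds.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t Tag;

/* Same bit layout as the macros above:
   kind in the upper 5 bits, info in the lower 3 bits. */
#define KIND(T)   (((T) >> 3) & 0x1F)
#define INFO(T)   ((T) & 0x7)
#define TAG(K, I) (((K) << 3) | (I))

/* Two kind-values copied from the list of kinds. */
enum { glyph_kind = 5, penalty_kind = 31 };

/* Hypothetical helper (not part of the HINT sources):
   build a tag after checking that both fields are in range. */
static Tag make_tag(unsigned kind, unsigned info)
{
    assert(kind <= 31 && info <= 7);
    return (Tag)TAG(kind, info);
}
```

For a glyph node with a one byte character code, |make_tag(glyph_kind,1)|
yields the byte |0x29|: the kind 5 shifted left by three bits is |0x28|,
and the info value 1 fills the lowest bit.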
Writing a short format \HINT\ file is implemented by a collection of {\it hput\_\kern 1pt\dots\/} functions;
most of the time, they follow the same schema:
\itemize
\item First, we define a variable for |info|.
\item Then follows the main part of the function body, where we
decide on the output format, do the actual output and set the |info| value accordingly.
\item We combine the info value with the kind-value and return the correct tag.
\item The tag value will be passed to |hput_tags| which generates
debugging information, if requested, and stores the tag before and after the node content.
\enditemize
After these preparations, we turn our attention again to the |hput_glyph| function.
The font number in a glyph node is between 0 and 255 and fits nicely in one byte,
but the character code is more difficult: we want to store the most common character
codes as a single byte and less frequent codes with two, three, or even four bytes.
Naturally, we use the |info| bits to store the number of bytes needed for the character code.
\codesection{\putsymbol}{Writing the Short Format}\putindex{1}{2}{Glyphs}
@<put functions@>=
static uint8_t hput_n(uint32_t n)
{@+ if (n<=0xFF) @+
{@+HPUT8(n);@+ return 1;@+}
else if (n<=0xFFFF) @+
{@+HPUT16(n);@+ return 2;@+}
else if (n<=0xFFFFFF)@+
{@+HPUT24(n);@+ return 3;@+}
else @+
{@+HPUT32(n);@+ return 4;@+}
}
Tag hput_glyph(Glyph *g)
{ Info info;
info = hput_n(g->c);
HPUT8(g->f);@/
return TAG(glyph_kind,info);
}
@
The |hput_tags| function is called after the node content has been written to the
stream. It gets the position of the start byte and the tag. With this information
it writes the start byte at the given position and the end byte at the current stream position.
@<put functions@>=
void hput_tags(uint32_t pos, Tag tag)
{ DBGTAG(tag,hstart+pos);DBGTAG(tag,hpos);
HPUTX(1); *(hstart+pos)=*(hpos++)=tag; @+
}
@
The variables |hpos| and |hstart|, the macros |HPUT8|, |HPUT16|,
|HPUT24|, |HPUT32|, and |HPUTX| are all defined in
section~\secref{HPUT}; they put 8, 16, 24, or 32 bits into the output
stream and check for sufficient space in the output buffer. The macro
|DBGTAG| writes debugging output; its definition is found in
section~\secref{error_section}.
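To make the resulting byte sequence concrete, here is a self-contained
sketch that combines the work of |hput_glyph| and |hput_tags| for a
single glyph node. It is not the code of \.{shrink}: the function
|encode_glyph| is hypothetical, it writes into a caller supplied buffer
instead of the output stream, and it assumes that multi-byte values are
stored with the most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t c; uint8_t f; } Glyph;

enum { GLYPH_KIND = 5 }; /* value of glyph_kind in the list of kinds */

/* Hypothetical helper: write one glyph node to buf and return its size.
   Assumes big-endian byte order and a buffer of at least 7 bytes. */
static size_t encode_glyph(const Glyph *g, uint8_t *buf)
{
    size_t pos = 1; /* position 0 is reserved for the start byte */
    unsigned n, i;
    uint8_t tag;

    assert(g != NULL && buf != NULL);
    /* number of bytes needed for the character code */
    if (g->c <= 0xFF) n = 1;
    else if (g->c <= 0xFFFF) n = 2;
    else if (g->c <= 0xFFFFFF) n = 3;
    else n = 4;
    for (i = n; i > 0; i--) /* most significant byte first */
        buf[pos++] = (uint8_t)(g->c >> (8 * (i - 1)));
    buf[pos++] = g->f;      /* one byte font number */
    tag = (uint8_t)((GLYPH_KIND << 3) | n);
    buf[0] = tag;           /* start byte */
    buf[pos++] = tag;       /* end byte */
    return pos;
}
```

Under these assumptions, the node `` \.{<glyph 97 *1>}'' becomes the four
bytes |0x29|, |0x61|, |0x01|, |0x29|.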
Now that we have seen the general outline of the \.{shrink} program,
starting with a long format file and ending with a short format file,
we will look at the program \.{stretch} that reverses this
transformation.
\subsection{Parsing the Short Format}
The inverse of writing the short format with a {\it hput\_\kern 1pt\dots\/} function
is reading the short format with a {\it hget\_\kern 1pt\dots\/} function.
The schema of {\it hget\_\kern 1pt\dots\/} functions reverses the schema of {\it hput\_\kern 1pt\dots\/} functions.
Here is the code for the initial and final part of a get function:
@<read the start byte |a|@>=
Tag a,z; /* the start and the end byte*/
uint32_t node_pos=(uint32_t)(hpos-hstart);
if (hpos>=hend) QUIT("Attempt to read a start byte at the end of the section");
HGETTAG(a);@/@t{}@>
@
@<read and check the end byte |z|@>=
HGETTAG(z);@+
if (a!=z)
QUIT(@["Tag mismatch [%s,%d]!=[%s,%d] at 0x%x to " SIZE_F "\n"@],@|
NAME(a),INFO(a),NAME(z),INFO(z),@|node_pos, hpos-hstart-1);
@
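Because the end byte repeats the start byte, a node can also be entered
from the back. The following sketch illustrates this for glyph nodes
only; it is not taken from the viewer. The function
|decode_glyph_backward| is hypothetical, and it assumes the glyph layout
described earlier (character code bytes, most significant byte first,
followed by one font byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t c; uint8_t f; } Glyph;

enum { GLYPH_KIND = 5 }; /* value of glyph_kind in the list of kinds */

/* Hypothetical helper: end points just past the end byte of a glyph node.
   On success, store the glyph in *g and return a pointer to its start byte;
   return NULL if the bytes before end do not form a glyph node. */
static const uint8_t *decode_glyph_backward(const uint8_t *begin,
                                            const uint8_t *end, Glyph *g)
{
    uint8_t z;
    unsigned n, i;
    const uint8_t *start;

    assert(begin != NULL && end != NULL && g != NULL && begin <= end);
    if (end - begin < 1) return NULL;
    z = end[-1];                 /* the end byte */
    n = z & 0x7;                 /* info: size of the character code */
    if ((z >> 3) != GLYPH_KIND || n < 1 || n > 4) return NULL;
    /* node size: start byte, n code bytes, font byte, end byte */
    if (end - begin < (ptrdiff_t)(n + 3)) return NULL;
    start = end - (n + 3);
    if (*start != z) return NULL; /* start and end byte must agree */
    g->c = 0;
    for (i = 1; i <= n; i++) g->c = (g->c << 8) | start[i];
    g->f = start[n + 1];
    return start;
}
```

The returned pointer can be passed as the new |end| to step further
back, one node at a time.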
The central routine to parse\index{parsing} the content section of a short format
file is the function |hget_content_node| which calls |hget_content| to
do most of the processing.
|hget_content_node| will read a content node in short format and write
it out in long format: It reads the start\index{start byte} byte |a|, writes the |START|
token using the function |hwrite_start|, and based on |KIND(a)|, it
writes the node's keyword found in the |content_name| array. Then it
calls |hget_content| to read the node's content and write it out.
Finally it reads the end\index{end byte} byte, checks it against the start byte, and
finishes up the content node by writing the |END| token using the
|hwrite_end| function. The function returns the tag byte so that
the calling function might check that the content node meets its requirements.
|hget_content| uses the start byte |a|, passed as a parameter, to
branch directly to the reading routine for the given combination of
kind and info value. The reading routine will read the data and store
its internal representation in a variable. All that the \.{stretch}
program needs to do with this internal representation is writing it in
the long format. As we will see, the call to the proper
{\it hwrite\_\kern 1pt\dots} function is included as the final part of the
reading routine (avoiding another switch statement).
\codesection{\getsymbol}{Reading the Short Format}\getindex{1}{2}{Content Nodes}
@<get functions@>=
static void hget_content(Tag a);
Tag hget_content_node(void)
{ @<read the start byte |a|@>@;@+ hwrite_start();
if (content_known[KIND(a)]&(1<<INFO(a))) hwritef("%s",content_name[KIND(a)]);
hget_content(a);@/
@<read and check the end byte |z|@>@; hwrite_end();
return a;
}
static void hget_content(Tag a)
{@+
switch (a)@/
{@+
@<cases to get content@>@;@t\1@>@/
default:
if (!hget_unknown(a))
TAGERR(a);
break;@t\2@>@/
}
}
@
We implement the code to read a glyph node in two stages.
First we define a general reading macro |HGET_GLYPH(I,G)| that reads a glyph node with info value |I| into
a |Glyph| variable |G|; then we insert this macro
in the above switch statement for all cases where it applies.
Knowing the function |hput_glyph|, the macro |HGET_GLYPH| should not be a surprise.
It reverses |hput_glyph|, storing the glyph node in its internal representation.
After that, the \.{stretch} program calls |hwrite_glyph| to produce the glyph
node in long format.
\codesection{\getsymbol}{Reading the Short Format}\getindex{1}{2}{Glyphs}
@<get macros@>=
#define @[HGET_N(I,X)@] \
if ((I)==1) (X)=HGET8;\
else if ((I)==2) HGET16(X);\
else if ((I)==3) HGET24(X);\
else if ((I)==4) HGET32(X);
#define @[HGET_GLYPH(I,G)@] \
HGET_N(I,(G).c); (G).f=HGET8; @+REF_RNG(font_kind,(G).f);@/\
hwrite_glyph(&(G));\
@
Note that we allow a glyph to reference a font even before that font is defined.
This is necessary because fonts usually contain definitions---for example
the font's hyphen character---that reference this or other fonts.
@<cases to get content@>=
@t\1\kern1em@>case TAG(glyph_kind,1): @+{@+Glyph g;@+ HGET_GLYPH(1,g);@+}@+break;
case TAG(glyph_kind,2): @+{@+Glyph g;@+ HGET_GLYPH(2,g);@+}@+break;
case TAG(glyph_kind,3): @+{@+Glyph g;@+ HGET_GLYPH(3,g);@+}@+break;
case TAG(glyph_kind,4): @+{@+Glyph g;@+ HGET_GLYPH(4,g);@+}@+break;
@
If this two-stage method seems strange to you, consider what the \CEE\ compiler will
do with it. It will expand the |HGET_GLYPH| macro four times inside the switch
statement. The macro is, however, expanded with a constant |I| value, so the expansion
of the |if| statement in |HGET_GLYPH(1,g)|, for example,
will become ``|if (1==1)| \dots\ |else if (1==2)| \dots''
and the compiler will have no difficulties eliminating the constant tests and
the dead branches altogether. This is the most effective use of the switch statement:
a single jump takes you to a specialized code to handle just the given combination
of kind and info value.
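The same technique can be tried in isolation with the following sketch.
It is a simplified stand-in and not the code of \.{stretch}: the macros
|GET_N| and |GET_GLYPH| and the function |decode_glyph| are hypothetical,
they read from a plain byte pointer, and they assume that the most
significant byte comes first. Instead of a chain of |if| statements,
|GET_N| uses a loop whose bound is a constant at every point of use, so
a compiler can typically unroll it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t c; uint8_t f; } Glyph;

#define TAG(K, I) (((K) << 3) | (I))
enum { GLYPH_KIND = 5 }; /* value of glyph_kind in the list of kinds */

/* Read I bytes, most significant byte first (an assumption), advancing P.
   I is a compile-time constant wherever the macro is used below. */
#define GET_N(I, X, P) do { \
    int i_; \
    (X) = 0; \
    for (i_ = 0; i_ < (I); i_++) (X) = ((X) << 8) | *(P)++; \
} while (0)

#define GET_GLYPH(I, G, P) do { \
    GET_N(I, (G).c, P); (G).f = *(P)++; \
} while (0)

/* Hypothetical helper: decode the glyph node starting at p.
   Returns a pointer just past the end byte, or NULL on a tag error. */
static const uint8_t *decode_glyph(const uint8_t *p, Glyph *g)
{
    uint8_t a;

    assert(p != NULL && g != NULL);
    a = *p++; /* the start byte */
    switch (a) { /* one case per combination of kind and info */
    case TAG(GLYPH_KIND, 1): GET_GLYPH(1, *g, p); break;
    case TAG(GLYPH_KIND, 2): GET_GLYPH(2, *g, p); break;
    case TAG(GLYPH_KIND, 3): GET_GLYPH(3, *g, p); break;
    case TAG(GLYPH_KIND, 4): GET_GLYPH(4, *g, p); break;
    default: return NULL;
    }
    return (*p++ == a) ? p : NULL; /* the end byte must match */
}
```

Each case label is a constant expression, so the switch can be compiled
into a jump table indexed by the tag byte.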
Last but not least, we implement the function |hwrite_glyph| to write a
glyph node in long form---that is: in a form that is as readable as possible.
\subsection{Writing the Long Format}
The |hwrite_glyph| function inverts the scanning and parsing process we have described
at the very beginning of this chapter.
To implement the |hwrite_glyph| function, we use the function |hwrite_charcode|
to write the character code.
Besides writing the character code as a decimal number, this function can also handle other
representations of character codes as fully explained in section~\secref{chars}.
We split off the writing of the opening and the closing pointed bracket, because
we will need this function very often and because it will keep track of the |nesting|
of nodes and indent them accordingly. The |hwrite_range| and |hwrite_label| functions
used in |hwrite_end| are discussed in sections~\secref{range} and~\secref{hwritelabel}.
\codesection{\wrtsymbol}{Writing the Long Format}\wrtindex{1}{2}{Glyphs}
@<write functions@>=
int nesting=0;
void hwrite_nesting(void)
{ int i;
hwritec('\n');
for (i=0;i<nesting;i++) hwritec(' ');
}
void hwrite_start(void)
{ @+hwrite_nesting();@+ hwritec('<');@+ nesting++;
}
void hwrite_range(void);
void hwrite_label(void);
void hwrite_end(void)
{ nesting--; hwritec('>');
if (section_no==2)
{ if (nesting==0) hwrite_range();
hwrite_label();
}
}
void hwrite_comment(char *str)
{ char c;
if (str==NULL) return;
hwritef(" (");
while ((c=*str++)!=0)
if (c=='(' || c==')') hwritec('_');
else if (c=='\n') hwritef("\n(");
else hwritec(c);
hwritec(')');
}
void hwrite_charcode(uint32_t c);
void hwrite_ref(int n);
void hwrite_glyph(Glyph *g)
{ char *n=hfont_name[g->f];
hwrite_charcode(g->c);
hwrite_ref(g->f);
if (n!=NULL) hwrite_comment(n);
}
@
The two primitive operations to write the long format file are defined
as macros:
@<write macros@>=
#define @[hwritec(c)@] @[(hout?putc(c,hout):0)@]
#define @[hwritef(...)@] @[(hout?fprintf(hout,__VA_ARGS__):0)@]
@
Now that we have completed the round trip of shrinking and stretching
glyph nodes, we continue the description of the \HINT\ file formats
in a more systematic way.
\section{Data Types}\hascode
\subsection{Integers}
\label{integers}
We have already seen the pattern/\kern -1pt action rule for unsigned decimal\index{decimal number} numbers. It remains
to define the macro |SCAN_UDEC| which converts a string containing an unsigned\index{unsigned} decimal
number into an unsigned integer\index{integer}.
We use the \CEE\ library function |strtoul|:
\readcode
@<scanning macros@>=
#define @[SCAN_UDEC(S)@] @[yylval.u=strtoul(S,NULL,10)@]
@
Unsigned integers can be given in hexadecimal\index{hexadecimal} notation as well.
@<scanning definitions@>=
::@=HEX@> :< @=[0-9A-F]@> >:
@
@<scanning rules@>=
::@=0x{HEX}+@> :< SCAN_HEX(yytext+2); return UNSIGNED; >:
@
Note that the pattern above allows only upper case letters in the
hexadecimal notation for integers.
@<scanning macros@>=
#define @[SCAN_HEX(S)@] @[yylval.u=strtoul(S,NULL,16)@]
@
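The two scanning macros can be tried out in isolation. The following is a minimal sketch, assuming that |yylval.u| is a 32 bit unsigned value; the function names |scan_udec| and |scan_hex| are made up for this illustration and return the value instead of storing it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for SCAN_UDEC: convert an unsigned decimal string. */
uint32_t scan_udec(const char *s)
{ return (uint32_t)strtoul(s,NULL,10);
}

/* Hypothetical stand-in for SCAN_HEX: s points past the "0x" prefix,
   just like yytext+2 in the scanning rule. */
uint32_t scan_hex(const char *s)
{ return (uint32_t)strtoul(s,NULL,16);
}
```

Note that |strtoul| itself would also accept lower case hexadecimal digits; the restriction to upper case letters comes from the scanner pattern, not from the conversion.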
Last but not least, we add rules for signed\index{signed integer} integers.
@s SIGNED symbol
@s number symbol
@s integer symbol
@<symbols@>=
%token <i> SIGNED
%type <i> integer
@
@<scanning rules@>=
::@=[+-](0|[1-9][0-9]*)@> :< SCAN_DEC(yytext); return SIGNED; >:
@
@<scanning macros@>=
#define @[SCAN_DEC(S)@] @[yylval.i=strtol(S,NULL,10)@]
@
@<parsing rules@>=
integer: SIGNED @+| UNSIGNED { RNG("number",$1,0,0x7FFFFFFF);};
@
To preserve the ``signedness'' of an integer also for positive signed integers
in the long format, we implement the function |hwrite_signed|.
\writecode
@<write functions@>=
void hwrite_signed(int32_t i)
{ if (i<0) hwritef(" -%d",-i);
else hwritef(" +%d",+i);
}
@
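To see the effect of the mandatory sign, here is a small sketch that formats into a buffer instead of writing to |hout|. The name |format_signed| and the buffer interface are assumptions of this example. As a deliberate difference to |hwrite_signed|, the value is widened to 64 bits before it is negated, so that the smallest 32 bit integer is handled without overflow.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical variant of hwrite_signed: format i with a leading space
   and a mandatory sign into buf; returns the length snprintf reports. */
int format_signed(char *buf, size_t n, int32_t i)
{ int64_t v=i; /* widening keeps -v well defined for INT32_MIN */
  if (v<0) return snprintf(buf,n," -%lld",(long long)-v);
  else return snprintf(buf,n," +%lld",(long long)v);
}
```

With this sketch, the value 5 is formatted as ``\.{ +5}'', which the scanner reads back as a |SIGNED| token and not as an |UNSIGNED| one.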
Reading and writing integers in the short format is done directly with the {\tt HPUT} and {\tt HGET}
macros.
\subsection{Strings}
\label{strings}
Strings\index{string} are needed in the definition part of a \HINT\
file to specify names of objects, and in the long file format, we also use them for file\index{file name} names.
In the long format, strings are sequences of characters delimited by single quote\index{single quote} characters;
for example: ``\.{'Hello'}'' or ``\.{'cmr10-600dpi.tfm'}''; in the short format, strings are
byte sequences terminated by a zero byte.
Because file names are system dependent, we do not allow arbitrary characters in strings
but only printable ASCII codes which we can reasonably expect to be available on most operating systems.
If your file names in a long format \HINT\ file are supposed to be portable,
you should probably be even more restrictive. For example, you should avoid characters like
``\.{\\}'' or ``\.{/}'' which are used in different ways for directories.
The internal representation of a string is a simple zero terminated \CEE\ string.
When scanning a string, we copy it to the |str_buffer| keeping track
of its length in |str_length|. When done,
we make a copy for permanent storage and return the pointer to the parser.
To operate on the |str_buffer|, we define a few macros.
The constant |MAX_STR| determines the maximum size of a string (including the zero byte) to be $2^{10}$ bytes.
This restriction is part of the \HINT\ file format specification.
@<scanning macros@>=
#define MAX_STR (1<<10) /* $2^{10}$ Byte or 1kByte */
static char str_buffer[MAX_STR];
static int str_length;
#define STR_START @[(str_length=0)@]
#define @[STR_PUT(C)@] @[(str_buffer[str_length++]=(C))@]
#define @[STR_ADD(C)@] @[STR_PUT(C);RNG("String length",str_length,0,MAX_STR-1)@]
#define STR_END @[str_buffer[str_length]=0@]
#define SCAN_STR @[yylval.s=str_buffer@]
@
To scan a string, we switch the scanner to |STR| mode when we find a quote character,
then we scan bytes in the range |0x20| to |0x7E|, which is the range of printable ASCII
characters, until we find the closing single\index{single quote} quote.
Quote characters inside the string are written as two consecutive single quote characters.
\readcode
@s STRING symbol
@s STR symbol
@s INITIAL symbol
@<scanning definitions@>=
%x STR
@
@<symbols@>=
%token <s> STRING
@
@<scanning rules@>=
::@='@> :< STR_START; BEGIN(STR); >:
<STR>{
::@='@> :< STR_END; SCAN_STR; BEGIN(INITIAL); return STRING; >:
::@=''@> :< STR_ADD('\''); >:
::@=[\x20-\x7E]@> :< STR_ADD(yytext[0]); >:
::@=.@> :< RNG("String character",yytext[0],0x20,0x7E); >:
::@=\n@> :< QUIT("Unterminated String in line %d",yylineno); >:
}
@
The function |hwrite_string| reverses this process; it must take care of the quote symbols.
\writecode
@<write functions@>=
void hwrite_string(char *str)
{@+hwritec(' ');
if (str==NULL) hwritef("''");
else@/
{ hwritec('\'');
while (*str!=0)@/
{ @+if (*str=='\'') hwritec('\'');
hwritec(*str);
str++;
}
hwritec('\'');
}
}
@
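The quoting convention can be illustrated with a buffer based sketch. The function |quote_string| below is hypothetical and not part of the \HINT\ code; it assumes a non-null argument and doubles every embedded quote in the same way as |hwrite_string|.

```c
#include <stddef.h>

/* Hypothetical helper: write str in long format string notation to out
   (capacity n), doubling embedded single quotes. Returns the number of
   characters written, or 0 if out is too small. */
size_t quote_string(const char *str, char *out, size_t n)
{ size_t k=0, need=2; /* the two delimiting quotes */
  const char *p;
  for (p=str; *p!=0; p++) need+=(*p=='\'')? 2: 1;
  if (need+1>n) return 0; /* no room for the text and the zero byte */
  out[k++]='\'';
  for (p=str; *p!=0; p++)
  { if (*p=='\'') out[k++]='\''; /* a quote is written twice */
    out[k++]=*p;
  }
  out[k++]='\'';
  out[k]=0;
  return k;
}
```

For the three character string consisting of ``\.{a}'', a quote, and ``\.{b}'', the sketch produces the six characters \.{'a''b'}.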
In the short format, a string is just a byte sequence terminated by a zero byte.
This makes the function |hput_string|, to write a string, and the macro |HGET_STRING|,
to read a string in short format, very simple. Note that after writing an unbounded
string to the output buffer, the macro |HPUTNODE| will make sure that there is enough
space left to write the remainder of the node.
\putcode
@<put functions@>=
void hput_string(char *str)
{ char *s=str;
if (s!=NULL)
{ do {
HPUTX(1);
HPUT8(*s);
} while (*s++!=0);
HPUTNODE;
}
else HPUT8(0);
}
@
\getcode
@<shared get macros@>=
#define @[HGET_STRING(S)@] @[S=(char*)hpos;\
while(hpos<hend && *hpos!=0) { RNG("String character",*hpos,0x20,0x7E); hpos++;}\
hpos++;
@
\subsection{Character Codes}
\label{chars}
We have already seen in the introduction that character\index{character code} codes can be written as decimal numbers
and section~\secref{integers} adds the possibility to use hexadecimal numbers as well.
It is, however, in most cases more readable if we represent character codes directly
using the characters themselves. Writing ``\.{a}'' is just so much better than writing ``\.{97}''.
To distinguish the character ``\.{9}'' from the number ``\.{9}'', we use the common technique
of enclosing characters within single\index{single quote} quotes. So ``\.{'9'}'' is the character code and
``\.{9}'' is the number.
Therefore we will define |CHARCODE| tokens and complement the parsing rules of section~\secref{parse_glyph}
with the following rule:
\readcode
@<parsing rules@>=
glyph: CHARCODE REFERENCE @|{ $$.c=$1; REF(font_kind,$2); $$.f=$2; };
@
If the character codes are small, we can represent them using
ASCII character codes. We do not offer a special notation for very small
character codes that map to the non-printable ASCII control codes; for them, the decimal
or hexadecimal notation will suffice.
For larger character codes, we use the multibyte encoding scheme known from UTF8\index{UTF8} as
follows. Given a character code~|c|:
\itemize
\item
Values in the range |0x00| to |0x7f| are encoded as a single byte with a leading bit of 0.
@<scanning definitions@>=
::@=UTF8_1@> :< @=[\x00-\x7F]@> >:
@
@<scanning macros@>=
#define @[SCAN_UTF8_1(S)@] @[yylval.u=((S)[0]&0x7F)@]
@
\item
Values in the range |0x80| to |0x7FF| are encoded in two bytes with the first byte
having three high bits |110|, indicating a two byte sequence, and the lower five bits equal
to the five high bits of |c|. It is followed by a continuation byte having two high bits |10|
and the lower six bits
equal to the lower six bits of |c|.
@<scanning definitions@>=
::@=UTF8_2@> :< @=[\xC0-\xDF][\x80-\xBF]@> >:
@
@<scanning macros@>=
#define @[SCAN_UTF8_2(S)@] @[yylval.u=(((S)[0]&0x1F)<<6)+((S)[1]&0x3F)@]
@
\item
Values in the range |0x800| to |0xFFFF| are encoded in three bytes with the first byte
having the high bits |1110| indicating a three byte sequence followed by two continuation bytes.
@<scanning definitions@>=
::@=UTF8_3@> :< @=[\xE0-\xEF][\x80-\xBF][\x80-\xBF]@> >:
@
@<scanning macros@>=
#define @[SCAN_UTF8_3(S)@] @[yylval.u=(((S)[0]&0x0F)<<12)+(((S)[1]&0x3F)<<6)+((S)[2]&0x3F)@]
@
\item
Values in the range |0x10000| to |0x1FFFFF| are encoded in four bytes with the first byte
having the high bits |11110| indicating a four byte sequence followed by three continuation bytes.
@<scanning definitions@>=
::@=UTF8_4@> :< @=[\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF]@> >:
@
@<scanning macros@>=
#define @[SCAN_UTF8_4(S)@] @[yylval.u=(((S)[0]&0x07)<<18)+(((S)[1]&0x3F)<<12)+@|(((S)[2]&0x3F)<<6)+((S)[3]&0x3F)@]
@
\enditemize
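The four cases above can be combined into a single decoding function. The following is a sketch under the assumption that the input is available as an array of bytes; the name |utf8_decode| is made up for this example. Like the scanning definitions, it checks only the bit patterns of the lead byte and the continuation bytes and does not reject overlong encodings.

```c
#include <stdint.h>

/* Decode one UTF8 sequence starting at s. Stores the character code in *c
   and returns the number of bytes used, or 0 if s does not start with a
   valid lead byte followed by the required continuation bytes. */
int utf8_decode(const unsigned char *s, uint32_t *c)
{ int i, n;
  uint32_t v;
  if (s[0]<0x80) { *c=s[0]; return 1; }               /* 0xxxxxxx */
  else if ((s[0]&0xE0)==0xC0) { n=2; v=s[0]&0x1F; }   /* 110xxxxx */
  else if ((s[0]&0xF0)==0xE0) { n=3; v=s[0]&0x0F; }   /* 1110xxxx */
  else if ((s[0]&0xF8)==0xF0) { n=4; v=s[0]&0x07; }   /* 11110xxx */
  else return 0;
  for (i=1; i<n; i++)
  { if ((s[i]&0xC0)!=0x80) return 0; /* not a continuation byte 10xxxxxx */
    v=(v<<6)|(s[i]&0x3F);
  }
  *c=v;
  return n;
}
```

The lead byte of a four byte sequence contributes three bits, which is why the mask |0x07| is needed to reach character codes up to |0x1FFFFF|.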
In the long format file, we enclose a character code in single\index{single quote} quotes, just as we do for strings.
This is convenient but it has the downside that we must exercise special care when giving the
scanning rules in order
not to confuse character codes with strings. Furthermore, we must convert character codes back into strings
in the rare case where the parser expects a string and gets a character code because the string
was only a single character long.
Let's start with the first problem:
The scanner might confuse a string\index{string} and a character code if the first or second
character of the string is a quote character which is written as two consecutive quotes.
For example \.{'a''b'} is a string with three characters, ``\.{a}'',
``\.{'}'', and ``\.{b}''. Two character codes would need a space to separate
them like this: \.{'a' 'b'}.
@s CHARCODE symbol
@<symbols@>=
%token <u> CHARCODE
@
@<scanning rules@>=
::@='''@> :< STR_START; STR_PUT('\''); BEGIN(STR); >:
::@=''''@> :< SCAN_UTF8_1(yytext+1); return CHARCODE; >:
::@='[\x20-\x7E]''@> :< STR_START; STR_PUT(yytext[1]); STR_PUT('\''); BEGIN(STR); >:
::@='''''@> :< STR_START; STR_PUT('\''); STR_PUT('\''); BEGIN(STR); >:
::@='{UTF8_1}'@> :< SCAN_UTF8_1(yytext+1); return CHARCODE; >:
::@='{UTF8_2}'@> :< SCAN_UTF8_2(yytext+1); return CHARCODE; >:
::@='{UTF8_3}'@> :< SCAN_UTF8_3(yytext+1); return CHARCODE; >:
::@='{UTF8_4}'@> :< SCAN_UTF8_4(yytext+1); return CHARCODE; >:
@
If needed, the parser can convert character codes back to single character strings.
@s string symbol
@<symbols@>=
%type <s> string
@
@<parsing rules@>=
string: STRING @+ | CHARCODE { static char s[2];
RNG("String element",$1,0x20,0x7E);
s[0]=$1; s[1]=0; $$=s;};
@
The function |hwrite_charcode| will write a character code. While ASCII codes are handled directly,
larger character codes are passed to the function |hwrite_utf8|,
which returns the number of bytes written.
\writecode
@<write functions@>=
int hwrite_utf8(uint32_t c)
{@+ if (c<0x80)
{ hwritec(c); @+return 1;@+ }
else if (c<0x800)@/
{ hwritec(0xC0|(c>>6));@+ hwritec(0x80|(c&0x3F));@+ return 2;@+}
else if (c<0x10000)@/
{ hwritec(0xE0|(c>>12)); hwritec(0x80|((c>>6)&0x3F));@+ hwritec(0x80|(c&0x3F)); return 3; }
else if (c<0x200000)@/
{ hwritec(0xF0|(c>>18));@+ hwritec(0x80|((c>>12)&0x3F));
hwritec(0x80|((c>>6)&0x3F));@+ hwritec(0x80|(c&0x3F)); return 4;}
else
RNG("character code",c,0,0x1FFFFF);
return 0;
}
void hwrite_charcode(uint32_t c)
{ @+if (c < 0x20)
{ if (option_hex) hwritef(" 0x%02X",c); /* non printable ASCII */
else hwritef(" %u",c);
}
else if (c=='\'') hwritef(" ''''");
else if (c<=0x7E) hwritef(" \'%c\'",c); /* printable ASCII */
else if (option_utf8) { hwritef(" \'"); @+ hwrite_utf8(c); @+ hwritec('\'');@+}
else if (option_hex) hwritef(" 0x%04X",c);
else hwritef(" %u",c);
}
@
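The choice of notation made by |hwrite_charcode| can be observed with the following sketch. It formats into a buffer, replaces the global |option_hex| by a parameter, and leaves out the UTF8 option; the name |format_charcode| and its interface are assumptions made for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical variant of hwrite_charcode without the UTF8 option.
   Formats c into buf and returns the length that snprintf reports. */
int format_charcode(char *buf, size_t n, uint32_t c, int use_hex)
{ if (c=='\'') return snprintf(buf,n," ''''");   /* the quote itself */
  if (c>=0x20 && c<=0x7E)                         /* printable ASCII */
    return snprintf(buf,n," '%c'",(int)c);
  if (use_hex)
  { if (c<0x20) return snprintf(buf,n," 0x%02X",(unsigned)c); /* control codes */
    else return snprintf(buf,n," 0x%04X",(unsigned)c);        /* codes above 0x7E */
  }
  return snprintf(buf,n," %u",(unsigned)c);       /* decimal fallback */
}
```

In this sketch the character code 97 is formatted as ``\.{ 'a'}'', while the code 10 becomes ``\.{ 0x0A}'' or ``\.{ 10}'' depending on |use_hex|.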
\getcode
@<shared get functions@>=
#define @[HGET_UTF8C(X)@] (X)=HGET8;@+ if ((X&0xC0)!=0x80) \
QUIT(@["UTF8 continuation byte expected at " SIZE_F " got 0x%02X\n"@],hpos-hstart-1,X)@;
uint32_t hget_utf8(void)
{ uint8_t a;
a=HGET8;
if (a<0x80) return a;
else
{ if ((a&0xE0)==0xC0) @/
{ uint8_t b; @+ HGET_UTF8C(b);
return ((a&~0xE0)<<6)+(b&~0xC0);
}
else if ((a&0xF0)==0xE0) @/
{ uint8_t b,c; @+ HGET_UTF8C(b); @+ HGET_UTF8C(c);
return ((a&~0xF0)<<12)+((b&~0xC0)<<6)+(c&~0xC0);
}
else if ((a&0xF8)==0xF0) @/
{ uint8_t b,c,d; @+ HGET_UTF8C(b); @+ HGET_UTF8C(c); @+ HGET_UTF8C(d);
return ((a&~0xF8)<<18)@|+ ((b&~0xC0)<<12)+((c&~0xC0)<<6)+(d&~0xC0);
}
else QUIT("UTF8 byte sequence expected");
}
}
@
\putcode
@<put functions@>=
void hput_utf8(uint32_t c)
{ @+HPUTX(4);
if (c<0x80)
HPUT8(c);
else if (c<0x800)
{ HPUT8(0xC0|(c>>6));@+ HPUT8(0x80|(c&0x3F));@+ }
else if (c<0x10000)@/
{ HPUT8(0xE0|(c>>12)); HPUT8(0x80|((c>>6)&0x3F));@+ HPUT8(0x80|(c&0x3F)); }
else if (c<0x200000)@/
{ HPUT8(0xF0|(c>>18));@+ HPUT8(0x80|((c>>12)&0x3F));
HPUT8(0x80|((c>>6)&0x3F));@+ HPUT8(0x80|(c&0x3F)); }
else
RNG("character code",c,0,0x1FFFFF);
}
@
\subsection{Floating Point Numbers}
You know a floating point number\index{floating point number} when
you see it because it features a radix\index{radix point} point. The
optional exponent\index{exponent} allows you to ``float'' the point.
\readcode
@s FPNUM symbol
@s number symbol
@<symbols@>=
%token <f> FPNUM
%type <f> number
@
@<scanning rules@>=
::@=[+-]?[0-9]+\.[0-9]+(e[+-]?[0-9])?@> :< SCAN_DECFLOAT; return FPNUM; >:
@
The layout of floating point variables of type |double|
or |float| typically follows the IEEE754\index{IEEE754}
standard\cite{IEEE754-1985}\cite{IEEE754-2008}.
We use the following definitions:
\index{float32 t+\&{float32\_t}}
\index{float64 t+\&{float64\_t}}
@<hint basic types@>=
#define FLT_M_BITS 23
#define FLT_E_BITS 8
#define FLT_EXCESS 127
#define DBL_M_BITS 52
#define DBL_E_BITS 11
#define DBL_EXCESS 1023
@
@s float32_t int
@s float64_t int
We expect a variable of type |float64_t| to have a binary
representation using 64 bits. The most significant bit is the sign
bit, then follow $|DBL_E_BITS|=11$ bits for the
exponent\index{exponent}, and $|DBL_M_BITS|=52$ bits for the
mantissa\index{mantissa}. The sign\index{sign bit} bit is 1 for a
negative number and 0 for a positive number. A floating point number
is stored in normalized\index{normalization} form which means that the
mantissa is shifted such that it has exactly 52+1 bits not counting
leading zeros. The leading bit is then always 1 and there is no need
to store it. So 52 bits suffice. To store the exponent, the excess
$q=1023$ is added and the result is stored as an unsigned 11 bit
number. For example if we regard the exponent bits and the mantissa
bits as unsigned binary numbers $e$ and $m$ then the absolute value of
such a floating point number can be expressed as
$(1+m\cdot2^{-52})\cdot2^{e-1023}$. We make similar assumptions about
variables of type |float32_t| using the constants as defined above.
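The layout just described can be checked with a few lines of code. This is only a sketch: it assumes that the \CEE\ type |double| is a 64 bit IEEE754 number, it uses |memcpy| to access the bits, and the names |split_double| and |join_double| are invented for this example.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Split a double into its sign bit, the stored exponent e (with the
   excess 1023 added), and the 52 stored mantissa bits m. */
void split_double(double d, int *sign, int *e, uint64_t *m)
{ uint64_t bits;
  memcpy(&bits,&d,sizeof bits); /* assumes a 64 bit IEEE754 double */
  *sign=(int)(bits>>63);
  *e=(int)((bits>>52)&0x7FF);
  *m=bits&(((uint64_t)1<<52)-1);
}

/* Rebuild the absolute value (1+m*2^-52)*2^(e-1023) of a normalized number. */
double join_double(int e, uint64_t m)
{ return ldexp(1.0+ldexp((double)m,-52),e-1023);
}
```

For example, $-6.5=-1.625\cdot2^2$ has the sign bit 1, the stored exponent $2+1023=1025$, and the mantissa bits of $0.625\cdot2^{52}$.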
To convert the decimal representation of a floating point number to
binary values of type |float64_t|, we use a \CEE\ library function.
@<scanning macros@>=
#define SCAN_DECFLOAT @[yylval.f=atof(yytext)@]
@
When the parser expects a floating point number and gets an integer number,
it converts it. So whenever in the long format a floating point number is expected,
an integer number will do as well.
@<parsing rules@>=
number: UNSIGNED {$$=(float64_t)$1; } | SIGNED {$$=(float64_t)$1; } | FPNUM;
@
Unfortunately the decimal representation is not optimal for floating
point numbers since even simple numbers in decimal notation like $0.1$
do not have an exact representation as a binary floating point number.
So if we want a notation that allows an exact representation of binary
floating point numbers, we must use a hexadecimal\index{hexadecimal}
representation. Hexadecimal floating point numbers start with an
optional sign, then as usual the two characters ``{\tt 0x}'', then
follows a sequence of hex digits, a radix point, more hex digits, and
an optional exponent. The optional exponent starts with the character
``{\tt x}'', followed by an optional sign, and some more hex
digits. The hexadecimal exponent is given as a base 16 number and it
is interpreted as an exponent with the base 16. As an example an
exponent of ``{\tt x10}'', would multiply the mantissa by $16^{16}$.
In other words it would shift any mantissa 16 hexadecimal digits to
the left. Here are the exact rules:
@<scanning rules@>=
::@=[+-]?0x{HEX}+\.{HEX}+(x[+-]?{HEX}+)?@> :< SCAN_HEXFLOAT; return FPNUM; >:
@
@<scanning macros@>=
#define SCAN_HEXFLOAT @[yylval.f=xtof(yytext)@]
@
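There is no function in the \CEE\ library for this hexadecimal floating point notation
(the hexadecimal floating point constants of C99 use a binary exponent introduced by ``{\tt p}''),
so we have to write our own conversion routine.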
The function |xtof| converts a string matching the above regular expression to
its binary representation. Its outline is very simple:
@<scanning functions@>=
float64_t xtof(char *x)
{ int sign, digits, exp;
uint64_t mantissa=0;
DBG(DBGFLOAT,"converting %s:\n",x);
@<read the optional sign@>@;
x=x+2; /* skip ``\.{0x}'' */
@<read the mantissa@>@;
@<normalize the mantissa@>@;
@<read the optional exponent@>@;
@<return the binary representation@>@;
}
@
Now the pieces:
@<read the optional sign@>=
if (*x=='-') { sign=-1;@+ x++;@+ }
else if (*x=='+') { sign=+1;@+ x++;@+ }
else @+sign=+1;
DBG(DBGFLOAT,"\tsign=%d\n",sign);
@
When we read the mantissa, we use the temporary variable |mantissa|, keep track
of the number of digits, and adjust the exponent while reading the fractional part.
@<read the mantissa@>=
digits=0;
while (*x=='0') x++; /*ignore leading zeros*/
while (*x!='.')@/
{ mantissa=mantissa<<4;
if (*x<'A') mantissa=mantissa+*x-'0';
else mantissa=mantissa+*x-'A'+10;
x++;
digits++;
}
x++; /* skip ``\.{.}'' */
exp=0;
while (*x!=0 && *x!='x')@/
{ mantissa=mantissa<<4;
exp=exp-4;
if (*x<'A') mantissa=mantissa+*x-'0';
else mantissa=mantissa+*x-'A'+10;
x++;
digits++;
}
DBG(DBGFLOAT,"\tdigits=%d mantissa=0x%" PRIx64 ", exp=%d\n",@|digits,mantissa,exp);
@
To normalize the mantissa, first we shift it to place exactly one nonzero hexadecimal
digit to the left of the radix point. Then we shift it right bit-wise until there is
just a single 1 bit to the left of the radix point.
To compensate for the shifting, we adjust the exponent accordingly.
Finally we remove the most significant bit because it is
not stored.
@<normalize the mantissa@>=
if (mantissa==0) return 0.0;
{ int s;
s = digits-DBL_M_BITS/4;
if (s>1)
mantissa=mantissa>>(4*(s-1));
else if (s<1)
mantissa=mantissa<<(4*(1-s));
exp=exp+4*(digits-1);
DBG(DBGFLOAT,"\tdigits=%d mantissa=0x%" PRIx64 ", exp=%d\n",@|digits,mantissa,exp);
while ((mantissa>>DBL_M_BITS)>1)@/ { mantissa=mantissa>>1; @+ exp++;@+ }
DBG(DBGFLOAT,"\tdigits=%d mantissa=0x%" PRIx64 ", exp=%d\n",@|digits,mantissa,exp);
mantissa=mantissa&~((uint64_t)1<<DBL_M_BITS);
DBG(DBGFLOAT,"\tdigits=%d mantissa=0x%" PRIx64 ", exp=%d\n",@|digits,mantissa,exp);
}
@
In the printed representation,
the exponent is an exponent with base 16. For example, an exponent of 2 shifts
the hexadecimal mantissa two hexadecimal digits to the left, which corresponds to a
multiplication by ${16}^2$.
@<read the optional exponent@>=
if (*x=='x')@/
{ int s, e=0;
x++; /* skip the ``\.{x}'' */
if (*x=='-') {s=-1;@+x++;@+}
else if (*x=='+') {s=+1;@+x++;@+}
else s=+1;
DBG(DBGFLOAT,"\texpsign=%d\n",s);
DBG(DBGFLOAT,"\texp=%d\n",exp);
while (*x!=0 )
{ if (*x<'A') e=16*e+(*x-'0'); /* accumulate the base 16 digits of the exponent */
else e=16*e+(*x-'A'+10);
x++;
}
exp=exp+4*s*e; /* one hexadecimal digit corresponds to four bits */
DBG(DBGFLOAT,"\texp=%d\n",exp);
}
RNG("Floating point exponent",@|exp,-DBL_EXCESS,DBL_EXCESS);
@
To assemble the binary representation, we use a |union| of a |float64_t| and |uint64_t|.
@<return the binary representation@>=
{ union {@+float64_t d; @+uint64_t bits; @+} u;
if (sign<0) sign=1;@+ else@+ sign=0; /* the sign bit */
exp=exp+DBL_EXCESS; /* the exponent bits */
u.bits=((uint64_t)sign<<63)@/
| ((uint64_t)exp<<DBL_M_BITS) | mantissa;
DBG(DBGFLOAT," return %f\n",u.d);
return u.d;
}
@
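When testing |xtof|, it helps to have an independent way to compute the value of a hexadecimal floating point number. The following sketch is such a reference evaluator, not the method used above: it works with |double| arithmetic and |ldexp| instead of assembling the bits, it assumes that the input matches the scanning rule, and the names |hexval| and |hexfloat_value| are hypothetical. Mantissas with more than 53 significant bits may be rounded differently than in |xtof|.

```c
#include <math.h>

/* value of an upper case hexadecimal digit */
static int hexval(char c)
{ return c<'A'? c-'0': c-'A'+10;
}

/* Hypothetical reference evaluator for strings like "-0x1A.8x+2". */
double hexfloat_value(const char *x)
{ double m=0.0;
  int sign=+1, esign=+1, e=0, shift=0;
  if (*x=='-') { sign=-1; x++; }
  else if (*x=='+') x++;
  x+=2; /* skip "0x" */
  for (; *x!='.'; x++) m=16.0*m+hexval(*x);
  x++; /* skip "." */
  for (; *x!=0 && *x!='x'; x++)
  { m=16.0*m+hexval(*x); shift++; } /* count the fractional digits */
  if (*x=='x')
  { x++;
    if (*x=='-') { esign=-1; x++; }
    else if (*x=='+') x++;
    for (; *x!=0; x++) e=16*e+hexval(*x); /* the exponent is a base 16 number */
  }
  return sign*ldexp(m,4*(esign*e-shift)); /* 16^k equals 2^(4k) */
}
```

With this evaluator, ``\.{0x1.8}'' yields 1.5, and ``\.{0x1.0x10}'' yields $16^{16}=2^{64}$, in agreement with the description of the exponent given above.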
The inverse function is |hwrite_float64|. It strives to print floating point numbers
as readably as possible. So numbers without fractional part are written as integers.
Numbers that need at most four digits after the decimal point
(the code tests whether $10000\cdot|d|$ is an integer) are written in
decimal notation. All other values are written as hexadecimal floating point numbers.
We avoid an exponent if it can be avoided by using up to |MAX_HEX_DIGITS|.
For the use with extended dimensions, floating point numbers should be printed as a suffix:
without a leading space and with a mandatory sign.
\writecode
@<write functions@>=
#define MAX_HEX_DIGITS 12
void hwrite_float64(float64_t d, bool suffix)
{ uint64_t bits, mantissa;
int exp, digits;
if (!suffix) hwritec(' ');
else if (d>=0) hwritec('+');
if (floor(d)==d)
{ hwritef("%d",(int)d);@+ return;@+}
if (floor(10000.0*d)==10000.0*d)
{ hwritef("%g",d); @+return;@+}
DBG(DBGFLOAT,"Writing hexadecimal float %f\n",d);
if (d<0) { hwritec('-');@+ d=-d;@+}
hwritef("0x");
@<extract mantissa and exponent@>@;
if (exp>MAX_HEX_DIGITS)
@<write large numbers@>@;
else if (exp>=0) @<write medium numbers@>@;
else @<write small numbers@>@;
}
@
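The decision between the three notations can be isolated in a small function. The sketch below uses |modf| instead of |floor| to test for a zero fractional part, which gives the same answer for finite values; the type |Notation| and the function |choose_notation| are assumptions of this example.

```c
#include <math.h>

typedef enum { AS_INTEGER, AS_DECIMAL, AS_HEXFLOAT } Notation;

/* Hypothetical helper: which notation would be used for d? */
Notation choose_notation(double d)
{ double ip;
  if (modf(d,&ip)==0.0) return AS_INTEGER;         /* no fractional part */
  if (modf(10000.0*d,&ip)==0.0) return AS_DECIMAL; /* at most four decimal places */
  return AS_HEXFLOAT;                              /* everything else */
}
```

So 3.0 is written as an integer, 0.25 as a decimal number, and a value like $1/3$, which has no short decimal representation, as a hexadecimal floating point number.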
The extraction just reverses the creation of the binary representation.
@<extract mantissa and exponent@>=
{ union {@+float64_t d; @+ uint64_t bits; @+} u;
u.d=d; @+ bits=u.bits;
}
mantissa= bits&(((uint64_t)1<<DBL_M_BITS)-1);
mantissa=mantissa+((uint64_t)1<<DBL_M_BITS);
exp= ((bits>>DBL_M_BITS)&((1<<DBL_E_BITS)-1))-DBL_EXCESS;
digits=DBL_M_BITS+1;
DBG(DBGFLOAT,"\tdigits=%d mantissa=0x%" PRIx64 " binary exp=%d\n",@|digits,mantissa,exp);
@
After we have obtained the binary exponent,
we round it toward zero to a multiple of four, shifting the mantissa
to compensate, and convert it to a hexadecimal exponent.
@<extract mantissa and exponent@>=
{ int r;
if (exp>=0)
{ r= exp%4;
if (r>0)
{ mantissa=mantissa<<r; @+exp=exp-r; @+digits=digits+r; @+}
}
else
{ r=(-exp)%4;
if (r>0)
{ mantissa=mantissa>>r; @+exp=exp+r; @+digits=digits-r;@+}
}
}
exp=exp/4;
DBG(DBGFLOAT,"\tdigits=%d mantissa=0x%" PRIx64 " hex exp=%d\n",@|digits,mantissa,exp);
@
In preparation for writing,
we shift the mantissa to the left so that the leftmost hexadecimal
digit of it will occupy the 4 leftmost bits of the variable |mantissa|.
@<extract mantissa and exponent@>=
mantissa=mantissa<<(64-DBL_M_BITS-4); /* move leading digit to leftmost nibble */
@
If the exponent is larger than |MAX_HEX_DIGITS|, we need to
use an exponent even if the mantissa uses only a few digits.
When we use an exponent, we always write exactly one digit preceding the radix point.
@<write large numbers@>=
{ DBG(DBGFLOAT,"writing large number\n");
hwritef("%X.",(uint8_t)(mantissa>>60));
mantissa=mantissa<<4;
do {
hwritef("%X",(uint8_t)(mantissa>>60));
mantissa=mantissa<<4;
} while (mantissa!=0);
hwritef("x+%X",exp);
}
@
If the exponent is small and non negative, we can write the
number without an exponent by writing the radix point at the
appropriate place.
@<write medium numbers@>=
{ DBG(DBGFLOAT,"writing medium number\n");
do {
hwritef("%X",(uint8_t)(mantissa>>60));
mantissa=mantissa<<4;
if (exp--==0) hwritec('.');
} while (mantissa!=0 || exp>=-1);
}
@
Last but not least, we write numbers that would require additional zeros after the
radix point with an exponent, because it keeps the mantissa shorter.
@<write small numbers@>=
{ DBG(DBGFLOAT,"writing small number\n");
hwritef("%X.",(uint8_t)(mantissa>>60));
mantissa=mantissa<<4;
do {
hwritef("%X",(uint8_t)(mantissa>>60));
mantissa=mantissa<<4;
} while (mantissa!=0);
hwritef("x-%X",-exp);
}
@
Compared to the complications of long format floating point numbers,
the short format is very simple because we just use the binary representation.
Since 32 bit floating point numbers offer sufficient precision we use only
the |float32_t| type.
It is however not possible to just write |HPUT32(d)| for a |float32_t| variable |d|
or |HPUT32((uint32_t)d)| because in the \CEE\ language this would imply
converting the floating point number to an integer, discarding its fractional part.
But we have seen before how to convert floating point values to bit patterns.
@<put functions@>=
void hput_float32(float32_t d)
{ union {@+float32_t d; @+ uint32_t bits; @+} u;
u.d=d; @+ HPUT32(u.bits);
}
@
@<shared get functions@>=
float32_t hget_float32(void)
{ union {@+float32_t d; @+ uint32_t bits; @+} u;
HGET32(u.bits);
return u.d;
}
@
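The difference between a value conversion and a reinterpretation of the bits is easy to demonstrate. The following sketch assumes that |float| is a 32 bit IEEE754 number and uses |memcpy| where the functions above use a |union|; the function names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of f without changing it. */
uint32_t float32_bits(float f)
{ uint32_t u;
  memcpy(&u,&f,sizeof u);
  return u;
}

/* Return the float that has the bit pattern u. */
float bits_float32(uint32_t u)
{ float f;
  memcpy(&f,&u,sizeof f);
  return f;
}
```

The bit pattern of 1.0 is |0x3F800000|, whereas the cast |(uint32_t)1.75f| yields the integer 1 and loses the fractional part.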
\subsection{Fixed Point Numbers}
\TeX\ internally represents most real numbers as fixed\index{fixed
point number} point numbers or ``scaled integers''\index{scaled
integer}. The type {\bf Scaled} is defined as a signed 32 bit
integer, but we consider it as a fixed point number with the binary
radix point just in the middle with sixteen bits before and sixteen
bits after it. To convert an integer into a scaled number, we
multiply it by |ONE|; to convert a floating point number into a scaled
number, we multiply it by |ONE| and |ROUND| the result to the nearest
integer; to convert a scaled number to a floating point number we
divide it by |(float64_t)ONE|.
\noindent
@<hint basic types@>=
typedef int32_t Scaled;
#define ONE ((Scaled)(1<<16))
@
@<hint macros@>=
#define ROUND(X) ((int)((X)>=0.0?floor((X)+0.5):ceil((X)-0.5)))
@
\writecode
@<write functions@>=
void hwrite_scaled(Scaled x)
{ hwrite_float64(x/(float64_t)ONE, false);
}
@
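The three conversions described above fit into a few lines. This is a sketch only: the function names are hypothetical, and rounding is done by adding or subtracting 0.5 before the truncating cast, which gives the same result as |ROUND| without calling |floor| or |ceil|.

```c
#include <stdint.h>

typedef int32_t Scaled;
#define ONE ((Scaled)(1<<16))

/* integer to scaled: multiply by ONE */
Scaled scaled_from_int(int n)
{ return n*ONE;
}

/* floating point to scaled: multiply by ONE and round to the nearest integer */
Scaled scaled_from_double(double x)
{ x=x*ONE;
  return (Scaled)(x>=0.0? x+0.5: x-0.5); /* the cast truncates toward zero */
}

/* scaled to floating point: divide by ONE */
double scaled_to_double(Scaled s)
{ return s/(double)ONE;
}
```

For example, 1.5 is represented by the scaled integer $1.5\cdot2^{16}=98304$. The sketch does not check for overflow.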
\subsection{Dimensions}
In the long format,
the dimensions\index{dimension} of characters, boxes, and other things can be given
in three units: \.{pt}, \.{in}, and \.{mm}.
\readcode
@s PT symbol
@s MM symbol
@s INCH symbol
@s dimension symbol
@s DIMEN symbol
@<symbols@>=
%token DIMEN "dimen"
%token PT "pt"
%token MM "mm"
%token INCH "in"
%type <d> dimension
@
@<scanning rules@>=
::@=dimen@> :< return DIMEN; >:
::@=pt@> :< return PT; >:
::@=mm@> :< return MM; >:
::@=in@> :< return INCH; >:
@
The unit ``\.{pt}'' is a printer's point\index{point}\index{pt+{\tt pt}}.
The unit ``\.{in}'' stands for inches\index{inch}\index{in+{\tt in}} and we have $1\.{in}= 72.27\,\.{pt}$.
The unit ``\.{mm}'' stands for millimeter\index{millimeter}\index{mm+{\tt mm}} and we have $1\.{in}= 25.4\,\.{mm}$.
The definition of a printer's\index{printers point} point given above follows the definition used in
\TeX; it is slightly larger than the official printer's
point, which was defined to equal exactly 0.013837\.{in} by the American Typefounders
Association in~1886\cite{DK:texbook}.
We follow the tradition of \TeX\ and
store dimensions as ``scaled points''\index{scaled point}, that is, a dimension of $d$ points is
stored as $d\cdot2^{16}$ rounded to the nearest integer.
The maximum absolute value of a dimension is $(2^{30}-1)$ scaled points.
@<hint basic types@>=
typedef Scaled Dimen;
#define MAX_DIMEN ((Dimen)(0x3FFFFFFF))
@
@<parsing rules@>=
dimension: number PT @|{$$=ROUND($1*ONE); RNG("Dimension",$$,-MAX_DIMEN,MAX_DIMEN); }
| number INCH @|{$$=ROUND($1*ONE*72.27); RNG("Dimension",$$,-MAX_DIMEN,MAX_DIMEN);@+}
| number MM @|{$$=ROUND($1*ONE*(72.27/25.4)); RNG("Dimension",$$,-MAX_DIMEN,MAX_DIMEN);@+};
@
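The arithmetic of the three parsing rules can be reproduced outside of the parser. In the following sketch, the names |SP_PER_PT|, |round_sp|, |dimen_from_pt|, |dimen_from_in|, and |dimen_from_mm| are made up for the illustration, and the range check against |MAX_DIMEN| is omitted.

```c
#include <stdint.h>

typedef int32_t Dimen;
#define SP_PER_PT 65536.0 /* scaled points per point, the value of ONE */

/* round to the nearest integer, as ROUND does */
static Dimen round_sp(double x)
{ return (Dimen)(x>=0.0? x+0.5: x-0.5);
}

Dimen dimen_from_pt(double d) { return round_sp(d*SP_PER_PT); }
Dimen dimen_from_in(double d) { return round_sp(d*SP_PER_PT*72.27); }
Dimen dimen_from_mm(double d) { return round_sp(d*SP_PER_PT*(72.27/25.4)); }
```

One inch is $72.27\cdot65536=4736286.72$ scaled points before rounding, so it is stored as 4736287; the same value results for 25.4 millimeters.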
When \.{stretch} is writing dimensions in the long format,
for simplicity it always uses the unit ``\.{pt}''.
\writecode
@<write functions@>=
void hwrite_dimension(Dimen x)
{ hwrite_scaled(x);
hwritef("pt");
}
@
In the short format, dimensions are stored as 32 bit scaled point values without conversion.
\getcode
@<get functions@>=
void hget_dimen(Tag a)
{ if (INFO(a)==b000)
{uint8_t r; r=HGET8; REF(dimen_kind,r); hwrite_ref(r);}
else
{ uint32_t d; HGET32(d); hwrite_dimension(d); }
}
@
\putcode
@<put functions@>=
Tag hput_dimen(Dimen d)
{ HPUT32(d);
return TAG(dimen_kind, b001);
}
@
\subsection{Extended Dimensions}\index{extended dimension}\index{hsize+{\tt hsize}}\index{vsize+{\tt vsize}}
The dimension that is probably used most frequently in a \TeX\ file is {\tt hsize}:
the ho\-ri\-zon\-tal size of a line of text. Common are also assignments
like \.{\\hsize=0.5\\hsize} \.{\\advance\\hsize by -10pt}, for example to
get two columns with lines almost half as wide as usual, leaving a small gap
between left and right column. Similar considerations apply to {\tt vsize}.
Because we aim at a reflowable format for \TeX\ output, we have to postpone
such computations until the values of \.{hsize} and \.{vsize} are known in the viewer.
Until then, we do symbolic computations on linear functions\index{linear function} of \.{hsize} and \.{vsize}.
We call such a linear function $w+h\cdot\.{hsize}+v\cdot\.{vsize}$
an extended dimension and represent it by the three numbers $w$, $h$, and $v$.
@<hint basic types@>=
typedef struct {@+
Dimen w; @+ float32_t h, v; @+
} Xdimen;
@
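Once \.{hsize} and \.{vsize} are known, an extended dimension is evaluated by inserting them into the linear function. The following sketch shows this step; the function |xdimen_value| is a hypothetical helper on the viewer side, and rounding the result to the nearest scaled point is an assumption of this example.

```c
#include <stdint.h>

typedef int32_t Dimen;
typedef struct { Dimen w; float h, v; } Xdimen;

/* Hypothetical helper: the value of w+h*hsize+v*vsize in scaled points. */
Dimen xdimen_value(const Xdimen *x, Dimen hsize, Dimen vsize)
{ double r=x->w+(double)x->h*hsize+(double)x->v*vsize;
  return (Dimen)(r>=0.0? r+0.5: r-0.5); /* round to the nearest scaled point */
}
```

The assignments \.{\\hsize=0.5\\hsize} \.{\\advance\\hsize by -10pt} from above lead to $w=-10\,\.{pt}$, $h=0.5$, and $v=0$; for an \.{hsize} of 400\.{pt} the sketch returns 190\.{pt}.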
Since very often a component of an extended dimension is zero, we
store in the short format only the nonzero components and use the
info bits to mark them: |b100| indicates that |w| is present,
|b010| that |h| is present, and |b001| that |v| is present.
If all three components are zero, a zero |w| is stored with the info value |b100|.
\readcode
@s XDIMEN symbol
@s xdimen symbol
@s xdimen_node symbol
@s H symbol
@s V symbol
@<symbols@>=
%token XDIMEN "xdimen"
%token H "h"
%token V "v"
%type <xd> xdimen
@
@<scanning rules@>=
::@=xdimen@> :< return XDIMEN; >:
::@=h@> :< return H; >:
::@=v@> :< return V; >:
@
@<parsing rules@>=
xdimen: dimension number H number V { $$.w=$1; @+$$.h=$2; @+$$.v=$4; }
| dimension number H { $$.w=$1; @+$$.h=$2; @+$$.v=0.0; }
| dimension number V { $$.w=$1; @+$$.h=0.0; @+$$.v=$2; }
| dimension { $$.w=$1; @+$$.h=0.0; @+$$.v=0.0; };
xdimen_node: start XDIMEN xdimen END { hput_tags($1,hput_xdimen(&($3))); };
@
\writecode
@<write functions@>=
void hwrite_xdimen(Xdimen *x)
{ hwrite_dimension(x->w);
if (x->h!=0.0) {hwrite_float64(x->h,true); @+hwritec('h');@+}
if (x->v!=0.0) {hwrite_float64(x->v,true); @+hwritec('v');@+}
}
void hwrite_xdimen_node(Xdimen *x)
{ hwrite_start(); hwritef("xdimen"); hwrite_xdimen(x); hwrite_end();}
@
\getcode
@<get macros@>=
#define @[HGET_XDIMEN(I,X)@] \
if((I)&b100) HGET32((X).w);@+ else (X).w=0;\
if((I)&b010) (X).h=hget_float32(); @+ else (X).h=0.0;\
if((I)&b001) (X).v=hget_float32(); @+else (X).v=0.0;
@
@<get functions@>=
void hget_xdimen(Tag a, Xdimen *x)
{ switch(a)
{
case TAG(xdimen_kind,b001): HGET_XDIMEN(b001,*x);@+break;
case TAG(xdimen_kind,b010): HGET_XDIMEN(b010,*x);@+break;
case TAG(xdimen_kind,b011): HGET_XDIMEN(b011,*x);@+break;
case TAG(xdimen_kind,b100): HGET_XDIMEN(b100,*x);@+break;
case TAG(xdimen_kind,b101): HGET_XDIMEN(b101,*x);@+break;
case TAG(xdimen_kind,b110): HGET_XDIMEN(b110,*x);@+break;
case TAG(xdimen_kind,b111): HGET_XDIMEN(b111,*x);@+break;
default:
QUIT("Extent expected got [%s,%d]",NAME(a),INFO(a));
}
}
@
Note that the info value |b000|, usually indicating a reference,
is not supported for extended dimensions.
Most nodes that need an extended dimension offer the opportunity to give
a reference directly without the start and end byte.
An exception is the glue node,
but glue nodes that need an extended width are rare.
@<get functions@>=
void hget_xdimen_node(Xdimen *x)
{ @<read the start byte |a|@>@;
if (KIND(a)==xdimen_kind)
hget_xdimen(a,x);
else
QUIT("Extent expected at 0x%x got %s",node_pos,NAME(a));
@<read and check the end byte |z|@>@;
}
@
\putcode
@<put functions@>=
Tag hput_xdimen(Xdimen *x)
{ Info info=b000;
if (x->w==0 && x->h==0.0 && x->v==0.0){ HPUT32(0); @+info|=b100; @+}
else
{ if (x->w!=0) { HPUT32(x->w); @+info|=b100; @+}
if (x->h!=0.0) { hput_float32(x->h); @+info|=b010; @+}
if (x->v!=0.0) { hput_float32(x->v); @+info|=b001; @+}
}
return TAG(xdimen_kind,info);
}
void hput_xdimen_node(Xdimen *x)
{ uint32_t p=hpos++-hstart;
hput_tags(p, hput_xdimen(x));
}
@
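The relation between the components of an extended dimension, the info bits, and the size of the node content can be summarized as follows. The sketch assumes that |b100|, |b010|, and |b001| have the values 4, 2, and 1; the functions |xdimen_info| and |xdimen_size| are invented for this illustration.

```c
#include <stdint.h>

typedef struct { int32_t w; float h, v; } Xdimen;

/* Hypothetical helper: the info bits that hput_xdimen would choose. */
unsigned xdimen_info(const Xdimen *x)
{ unsigned info=0;
  if (x->w!=0) info|=4;    /* b100: w is stored */
  if (x->h!=0.0f) info|=2; /* b010: h is stored */
  if (x->v!=0.0f) info|=1; /* b001: v is stored */
  if (info==0) info=4;     /* all zero: store a zero w, b000 is not available */
  return info;
}

/* Number of content bytes: four for each stored component. */
unsigned xdimen_size(unsigned info)
{ return 4*(((info>>2)&1)+((info>>1)&1)+(info&1));
}
```

An extended dimension with all components zero therefore still occupies four bytes between the start and the end byte.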
\subsection{Stretch and Shrink}\label{stretch}
In section~\secref{glue}, we will consider glue\index{glue} which
is something that can stretch and shrink.
The stretchability\index{stretchability} and shrinkability\index{shrinkability} of the
glue can be given in ``\.{pt}'' like a dimension,
but there are three more units: \.{fil}, \.{fill}, and \.{filll}.
A glue with a stretchability of $1\,\hbox{\tt fil}$ will stretch infinitely more
than a glue with a stretchability of $1\,\hbox{\tt pt}$. So if you stretch both glues
together, the first glue will do all the stretching and the latter will not stretch
at all. The ``\.{fil}'' glue simply has a higher order of infinity.
You might guess that ``\.{fill}'' glue and ``\.{filll}'' glue have even higher
orders of infinite stretchability.
The order of infinity is 0 for \.{pt}, 1 for \.{fil}, 2 for \.{fill}, and 3 for \.{filll}.
The internal representation of a stretch is a variable of type |Stretch|.
It stores the floating point value and the order of infinity separately as a |float64_t| and an |Order|.
The short format tries to be space efficient and because it is not necessary to give the
stretchability with a precision exceeding about six decimal digits,
we use a single 32 bit floating point value.
To write a |float32_t| value and an order value as one 32 bit value,
we round the two lowest bits of the |float32_t| variable to zero
using ``round to even'' and store the order of infinity in these bits.
We define a union type \&{Stch} to simplify conversion.
@<hint basic types@>=
typedef enum { @+ normal_o=0, fil_o=1, fill_o=2, filll_o=3@+} Order;
typedef struct {@+ float64_t f;@+ Order o; @+} Stretch;
typedef union {@+float32_t f; @+ uint32_t u; @+} Stch;
@
\putcode
@<put functions@>=
void hput_stretch(Stretch *s)
{ uint32_t mantissa, lowbits, sign, exponent;
Stch st;
st.f=s->f;
DBG(DBGFLOAT,"joining %f->%f(0x%X),%d:",s->f,st.f,st.u,s->o);
mantissa = st.u &(((uint32_t)1<<FLT_M_BITS)-1);
lowbits = mantissa&0x7; /* lowest 3 bits */
exponent=(st.u>>FLT_M_BITS)&(((uint32_t)1<<FLT_E_BITS)-1);
sign=st.u & ((uint32_t)1<<(FLT_E_BITS+FLT_M_BITS));
DBG(DBGFLOAT,"s=%d e=0x%x m=0x%x",sign, exponent, mantissa);
switch (lowbits) /* round to even */
{ @+case 0: break; /* no change */
case 1: mantissa = mantissa -1; @+break;/* round down */
case 2: mantissa = mantissa -2; @+break;/* round down to even */
case 3: mantissa = mantissa +1; @+break; /* round up */
case 4: break; /* no change */
case 5: mantissa = mantissa -1; @+break;/* round down */
case 6: mantissa = mantissa +1; /* round up to even, fall through */
case 7: mantissa = mantissa +1; /* round up to even */
if (mantissa >= ((uint32_t)1<<FLT_M_BITS))@/
{exponent++; /* adjust exponent */
RNG("Float32 exponent",exponent,1,2*FLT_EXCESS);
@+mantissa=0; /* after the carry, all fraction bits are zero */
}
break;
}
DBG(DBGFLOAT," round s=%d e=0x%x m=0x%x",sign, exponent, mantissa);
st.u=sign| (exponent<<FLT_M_BITS) | mantissa | s->o;
DBG(DBGFLOAT,"float %f hex 0x%x\n",st.f, st.u);
HPUT32(st.u);
}
@
\getcode
@<get macros@>=
#define @[HGET_STRETCH(S)@] { Stch st; @+ HGET32(st.u);@+ S.o=st.u&3; st.u&=~3; S.f=st.f; @+}
@
\readcode
@s FIL symbol
@s FILL symbol
@s FILLL symbol
@s order symbol
@<symbols@>=
%token FIL "fil"
%token FILL "fill"
%token FILLL "filll"
%type <st> stretch
%type <o> order
@
@<scanning rules@>=
::@=fil@> :< return FIL; >:
::@=fill@> :< return FILL; >:
::@=filll@> :< return FILLL; >:
@
@s stretch symbol
@s Stretch int
@<parsing rules@>=
order: PT {$$=normal_o;} | FIL {$$=fil_o;} @+| FILL {$$=fill_o;} @+| FILLL {$$=filll_o;};
stretch: number order { $$.f=$1; $$.o=$2; };
@
\writecode
@<write functions@>=
void hwrite_order(Order o)
{ switch (o)
{ case normal_o: hwritef("pt"); @+break;
case fil_o: hwritef("fil"); @+break;
case fill_o: hwritef("fill"); @+break;
case filll_o: hwritef("filll"); @+break;
default: QUIT("Illegal order %d",o); @+ break;
}
}
void hwrite_stretch(Stretch *s)
{ hwrite_float64(s->f, false);
hwrite_order(s->o);
}
@
\section{Simple Nodes}\hascode
\label{simple}
\subsection{Penalties}
Penalties\index{penalty} are very simple nodes. They specify the cost of breaking a
line or page at the present position. For the internal representation
we use an |int32_t|. The full range of integers is, however, not
used. Instead penalties must be between -20000 and +20000.
(\TeX\ specifies a range of -10000 to +10000, but plain \TeX\ uses the value -20000
when it defines the supereject control sequence.)
The more general node is called an integer node;
it shares the same kind-value |int_kind=penalty_kind|
but allows the full range of values.
The info value of a penalty node is 1 or 2 and indicates the number of bytes
used to store the integer. The info value 3 can be used for general
integers (see section~\secref{definitions}) that need four bytes of storage.
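As a small illustration of this size rule, the following sketch picks the
info value for a signed integer. The names |int_info| and |int_bytes| are
hypothetical; the thresholds are simply the ranges of one byte and two byte
two's complement numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the info value (1, 2, or 3) of an integer, that is,
   the smallest two's complement field of 1, 2, or 4 bytes that holds it. */
static int int_info(int32_t n)
{ if (n >= -0x80 && n < 0x80) return 1;
  if (n >= -0x8000 && n < 0x8000) return 2;
  return 3;
}

/* Hypothetical helper: the number of bytes that follow for a given info value. */
static int int_bytes(int info)
{ assert(info >= 1 && info <= 3);
  return info == 3 ? 4 : info;
}
```

Every legal penalty lies between -20000 and +20000 and therefore fits into
two bytes, which is why penalty nodes need only the info values 1 and 2.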
\readcode
@s PENALTY symbol
@s INTEGER symbol
@s penalty symbol
@<symbols@>=
%token PENALTY "penalty"
%token INTEGER "int"
%type <i> penalty
@
@<scanning rules@>=
::@=penalty@> :< return PENALTY; >:
::@=int@> :< return INTEGER; >:
@
@<parsing rules@>=
penalty: integer {RNG("Penalty",$1,-20000,+20000);$$=$1;};
content_node: start PENALTY penalty END { hput_tags($1,hput_int($3));@+};
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>case TAG(penalty_kind,1): @+{int32_t p;@+ HGET_PENALTY(1,p);@+} @+break;
case TAG(penalty_kind,2): @+{int32_t p;@+ HGET_PENALTY(2,p);@+} @+break;
case TAG(penalty_kind,3): @+{int32_t p;@+ HGET_PENALTY(3,p);@+} @+break;
@
@<get macros@>=
#define @[HGET_PENALTY(I,P)@] \
if (I==1) {int8_t n=HGET8; @+P=n;@+ } \
else if (I==2) {int16_t n;@+ HGET16(n);@+RNG("Penalty",n,-20000,+20000); @+ P=n; @+}\
else if (I==3) {int32_t n;@+ HGET32(n);@+RNG("Penalty",n,-20000,+20000); @+ P=n; @+}\
hwrite_signed(P);
@
\putcode
@<put functions@>=
Tag hput_int(int32_t n)
{ Info info;
if (n>=0) @/
{ @+if (n<0x80) { @+HPUT8(n); @+info=1;@+ }
else if (n<0x8000) {@+ HPUT16(n);@+ info=2;@+ }
else {@+ HPUT32(n);@+ info=3;@+ }
}
else@/
{@+ if (n>=-0x80) {@+ HPUT8(n);@+ info=1;@+ }
else if (n>=-0x8000) {@+ HPUT16(n);@+ info=2;@+ }
else {@+ HPUT32(n);@+ info=3;@+ }
}
return TAG(int_kind,info);
}
@
\subsection{Languages}
To render a \HINT\ file on screen, information about the language is not necessary.
Knowing the language is, however, very important for language translation and
text to speech conversion, which makes texts accessible to the visually impaired.
For this reason, \HINT\ offers the opportunity to add this information
and encourages authors to supply it.
Language information by itself is not sufficient to decode text. It must be supplemented
by information about the character encoding (see section~\secref{fonts}).
To represent language information, the World Wide Web relies on universally
accepted standards. The Internet Engineering Task Force (IETF) has defined tags for identifying
languages\cite{rfc5646}: short strings like ``en'' for English
or ``de'' for German, and longer ones like ``sl-IT-nedis'' for the specific variant of
the Nadiza dialect of Slovenian that is spoken in Italy.
We assume that any \HINT\ file
will contain only a small number of different languages and all language nodes can be
encoded using a reference to a predefined node from the
definition section (see section~\secref{reference}).
In the definition section, a language node will just
contain the language tag as given in~\cite{iana:language} (see section~\secref{definitions}).
\readcode
@s LANGUAGE symbol
@s language symbol
@<symbols@>=
%token LANGUAGE "language"
@
@<scanning rules@>=
::@=language@> :< return LANGUAGE; >:
@
When encoding language nodes in the short format,
we use the info value |b000| for language nodes in the definition section
and for language nodes in the content section that contain just a one-byte
reference (see section~\secref{reference}).
We use the info values |1| to |7| as a shorthand for
the references {\tt *0} to {\tt *6} to the predefined language nodes.
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<cases to get content@>=
@t\1\kern1em@>case TAG(language_kind,1): REF(language_kind,0); @+hwrite_ref(0); @+break;
case TAG(language_kind,2): REF(language_kind,1); @+hwrite_ref(1); @+break;
case TAG(language_kind,3): REF(language_kind,2); @+hwrite_ref(2); @+break;
case TAG(language_kind,4): REF(language_kind,3); @+hwrite_ref(3); @+break;
case TAG(language_kind,5): REF(language_kind,4); @+hwrite_ref(4); @+break;
case TAG(language_kind,6): REF(language_kind,5); @+hwrite_ref(5); @+break;
case TAG(language_kind,7): REF(language_kind,6); @+hwrite_ref(6); @+break;
@
\putcode
@<put functions@>=
Tag hput_language(uint8_t n)
{ if (n<7) return TAG(language_kind,n+1);
HPUT8(n); return TAG(language_kind,0);
}
@
\subsection{Rules}
Rules\index{rule} are simply black rectangles having a height, a depth, and a
width. All of these dimensions can also be negative but a rule will
not be visible unless its width is positive and its height plus depth
is positive.
As a specialty, rules can have ``running dimensions''\index{running dimension}. If any of the
three dimensions is a running dimension, its actual value will be
determined by running the rule up to the boundary of the innermost
enclosing box. The width is never running in a horizontal\index{horizontal list} list; the
height and depth are never running in a vertical\index{vertical list} list. In the long
format, we use a vertical bar ``{\tt \VB}'' or a horizontal bar
``{\tt \_}'' (underscore character) to indicate a running
dimension. Of course the vertical bar is meant to indicate a running
height or depth while the horizontal bar stands for a running
width. The parser, however, makes no distinction between the two and
you can use either of them. In the short format, we follow \TeX\ and
implement a running dimension by using the special value
$-2^{30}=|0xC0000000|$.
@<hint macros@>=
#define RUNNING_DIMEN 0xC0000000
@
It would have been possible to allow extended dimensions in a rule node,
but in most circumstances, the mechanism of running dimensions is sufficient
and simpler to use. If a rule is needed that requires an extended dimension as
its length, it is always possible to put it inside a suitable box and use a
running dimension.
To make the short format encoding more compact, the first info bit
|b100| will be zero to indicate a running height, bit |b010| will be
zero to indicate a running depth, and bit |b001| will be zero to
indicate a running width.
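The following sketch shows how the three info bits and the size of the node
content follow from the dimensions of a rule. The names |SKETCH_RUNNING|,
|rule_info|, and |rule_bytes| are hypothetical, and dimensions are assumed to
be plain |int32_t| values.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RUNNING (-(INT32_C(1) << 30)) /* the running value, -2^30 */

/* Hypothetical helper: a bit is set if the dimension is stored in the node
   and zero if the dimension is running. */
static unsigned rule_info(int32_t h, int32_t d, int32_t w)
{ unsigned info = 0;
  if (h != SKETCH_RUNNING) info |= 4u; /* b100: height is stored */
  if (d != SKETCH_RUNNING) info |= 2u; /* b010: depth is stored */
  if (w != SKETCH_RUNNING) info |= 1u; /* b001: width is stored */
  return info;
}

/* Hypothetical helper: four bytes of content for every stored dimension. */
static unsigned rule_bytes(unsigned info)
{ assert(info <= 7);
  return 4u * (((info >> 2) & 1u) + ((info >> 1) & 1u) + (info & 1u));
}
```

A typical horizontal rule with a running width and explicit height and depth
gets the info value |b110| and needs eight bytes between its two tags.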
Because leaders\index{leaders} (see section~\secref{leaders}) may contain a rule
node, we also provide functions to read and write a complete rule
node. While parsing the symbol ``{\sl rule\/}'' will just initialize a variable of type
\&{Rule} (the writing is done with a separate routine),
parsing a {\sl rule\_node\/} will always include writing it.
% Currently no predefined rules.
%Further, a {\sl rule\_node} will permit the
%use of a predefined rule (see section~\secref{reference}),
@<hint types@>=
typedef struct {@+
Dimen h,d,w; @+
} Rule;
@
\readcode
@s RULE symbol
@s RUNNING symbol
@s rule_dimension symbol
@s rule symbol
@s rule_node symbol
@<symbols@>=
%token RULE "rule"
%token RUNNING "|"
%type <d> rule_dimension
%type <r> rule
@
@<scanning rules@>=
::@=rule@> :< return RULE; >:
::@="|"@> :< return RUNNING; >:
::@="_"@> :< return RUNNING; >:
@
@<parsing rules@>=
rule_dimension: dimension@+ | RUNNING {$$=RUNNING_DIMEN;};
rule: rule_dimension rule_dimension rule_dimension @/
{ $$.h=$1; @+ $$.d=$2; @+ $$.w=$3;
if ($3==RUNNING_DIMEN && ($1==RUNNING_DIMEN || $2==RUNNING_DIMEN))
QUIT("Incompatible running dimensions 0x%x 0x%x 0x%x",@|$1,$2,$3); };
rule_node: start RULE rule END { hput_tags($1,hput_rule(&($3))); };
content_node: rule_node;
@
\writecode
@<write functions@>=
static void hwrite_rule_dimension(Dimen d, char c)
{ @+if (d==RUNNING_DIMEN) hwritef(" %c",c);
else hwrite_dimension(d);
}
void hwrite_rule(Rule *r)
{ @+hwrite_rule_dimension(r->h,'|');
hwrite_rule_dimension(r->d,'|');
hwrite_rule_dimension(r->w,'_');
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(rule_kind,b011): {Rule r;@+ HGET_RULE(b011,r); @+hwrite_rule(&(r));@+ } @+break;
case TAG(rule_kind,b101): {Rule r;@+ HGET_RULE(b101,r); @+hwrite_rule(&(r));@+ } @+break;
case TAG(rule_kind,b001): {Rule r;@+ HGET_RULE(b001,r); @+hwrite_rule(&(r));@+ } @+break;
case TAG(rule_kind,b110): {Rule r;@+ HGET_RULE(b110,r); @+hwrite_rule(&(r));@+ } @+break;
case TAG(rule_kind,b111): {Rule r;@+ HGET_RULE(b111,r); @+hwrite_rule(&(r));@+ } @+break;
@
@<get macros@>=
#define @[HGET_RULE(I,R)@]@/\
if ((I)&b100) HGET32((R).h); @+else (R).h=RUNNING_DIMEN;\
if ((I)&b010) HGET32((R).d); @+else (R).d=RUNNING_DIMEN;\
if ((I)&b001) HGET32((R).w); @+else (R).w=RUNNING_DIMEN;
@
@<get functions@>=
void hget_rule_node(void)
{ @<read the start byte |a|@>@;
if (KIND(a)==rule_kind) @/
{ @+Rule r; @+HGET_RULE(INFO(a),r); @/
hwrite_start();@+ hwritef("rule"); @+hwrite_rule(&r); @+hwrite_end();
}
else
QUIT("Rule expected at 0x%x got %s",node_pos,NAME(a));
@<read and check the end byte |z|@>@;
}
@
\putcode
@<put functions@>=
Tag hput_rule(Rule *r)
{ Info info=b000;
if (r->h!=RUNNING_DIMEN) { HPUT32(r->h); @+info|=b100; @+}
if (r->d!=RUNNING_DIMEN) { HPUT32(r->d); @+info|=b010; @+}
if (r->w!=RUNNING_DIMEN) { HPUT32(r->w); @+info|=b001; @+}
return TAG(rule_kind,info);
}
@
\subsection{Kerns}
A kern\index{kern} is a bit of white space with a certain length. If the kern is part of a
horizontal list, the length is measured in the horizontal direction,
if it is part of a vertical list, it is measured in the vertical
direction. The length of a kern is mostly given as a dimension
but provisions are made to use extended dimensions as well.
The typical
use of a kern is its insertion between two characters to make the natural
distance between them a bit wider or smaller. In the latter case, the kern
has a negative length. The typographic optimization just described is called
``kerning'' and has given the kern node its name.
Kerns inserted from font information or math mode calculations are normal kerns,
while kerns inserted from \TeX's {\tt \BS kern} or {\tt \BS/}
commands are explicit kerns.
Kern nodes do not disappear at a line break unless they are explicit\index{explicit kern}.
In the long format, explicit kerns are marked with an ``!'' sign
and in the short format with the |b100| info bit.
The two low order info bits are: 0 for a reference to a dimension, 1 for a reference to
an extended dimension, 2 for an immediate dimension, and 3 for an immediate extended dimension node.
To distinguish in the long format between a reference to a dimension and a reference to an extended dimension,
the latter is prefixed with the keyword ``{\tt xdimen}'' (see section~\secref{reference}).
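To make the encoding concrete, here is a sketch that derives the info value
of a kern that is written with an immediate value whenever possible. The type
|XdimenSketch| and the function |kern_info| are hypothetical stand-ins; the
field names |w|, |h|, and |v| follow the extended dimension used in this
document, and their exact types are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extended dimension: w + h*hsize + v*vsize. */
typedef struct { int32_t w; float h, v; } XdimenSketch;

/* Hypothetical helper: bit b100 marks an explicit kern, the two low bits
   select how the length is given. */
static unsigned kern_info(bool explicit_kern, const XdimenSketch *d)
{ unsigned info = explicit_kern ? 4u : 0u;
  if (d->h != 0.0f || d->v != 0.0f)
    return info | 3u;       /* immediate extended dimension node */
  if (d->w != 0)
    return info | 2u;       /* immediate dimension */
  return info;              /* reference to the predefined zero dimension */
}
```

The values 0 and 1 of the two low bits are produced here only for the zero
dimension; other references are the topic of section~\secref{reference}.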
@<hint types@>=
typedef struct {@+
bool x;@+
Xdimen d;@+
} Kern;
@
\readcode
@s KERN symbol
@s EXPLICIT symbol
@s kern symbol
@s explicit symbol
@<symbols@>=
%token KERN "kern"
%token EXPLICIT "!"
%type <b> explicit
%type <kt> kern
@
@<scanning rules@>=
::@=kern@> :< return KERN; >:
::@=!@> :< return EXPLICIT; >:
@
@<parsing rules@>=
explicit: {$$=false;} @+| EXPLICIT {$$=true;};
kern: explicit xdimen {$$.x=$1; $$.d=$2;};
content_node: start KERN kern END { hput_tags($1,hput_kern(&($3)));};
@
\writecode
@<write functions@>=
void hwrite_explicit(bool x)
{ @+if (x) hwritef(" !"); @+}
void hwrite_kern(Kern *k)
{ @+hwrite_explicit(k->x);
if (k->d.h==0.0 && k->d.v==0.0 && k->d.w==0) hwrite_ref(zero_dimen_no);
else hwrite_xdimen(&(k->d));
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(kern_kind,b010): @+ {@+Kern k; @+HGET_KERN(b010,k);@+ } @+break;
case TAG(kern_kind,b011): @+ {@+Kern k; @+HGET_KERN(b011,k);@+ } @+break;
case TAG(kern_kind,b110): @+ {@+Kern k; @+HGET_KERN(b110,k);@+ } @+break;
case TAG(kern_kind,b111): @+ {@+Kern k; @+HGET_KERN(b111,k);@+ } @+break;
@
@<get macros@>=
#define @[HGET_KERN(I,K)@] \
K.x=(I)&b100;\
if (((I)&b011)==2) {HGET32(K.d.w);@+ K.d.h=K.d.v=0.0;@+}\
else if (((I)&b011)==3) hget_xdimen_node(&(K.d));\
hwrite_kern(&(K));
@
\putcode
@<put functions@>=
Tag hput_kern(Kern *k)
{ Info info;
if (k->x) info=b100; @+else info=b000;
if (k->d.h==0.0 && k->d.v==0.0)
{ if (k->d.w==0) HPUT8(zero_dimen_no);
else {HPUT32(k->d.w); info=info|2;@+}
}
else {hput_xdimen_node(&(k->d));info=info|3;@+}
return TAG(kern_kind,info);
}
@
\subsection{Glue}\label{glue}
%Glue considerations
%So what are the cases:
%\itemize
%\item reference to a dimen (common)
%\item reference to a xdimen
%\item reference to a dimen plus and minus
%\item reference to a xdimen plus and minus
%\item reference to a dimen plus
%\item reference to a xdimen plus
%\item reference to a dimen minus
%\item reference to a xdimen minus
%\item dimen
%\item xdimen
%\item dimen plus and minus
%\item xdimen plus and minus (covers all other cases)
%\item dimen plus
%\item xdimen plus
%\item dimen minus
%\item xdimen minus
%\item plus and minus
%\item plus
%\item minus
%\item zero glue (rare, can be replaced by a reference to the zero glue)
%\item reference to a predefined glue (common)
%\enditemize
%This is a total of 21 cases. Can we use the info bits to specify 7 common
%cases and one catch all? First the use of an extended dimension in a glue
%is probably not very common. More typically is the use of a fill glue
%that extends to the boundaries of the enclosing box.
%Here is the statistics for ctex:
%total 58937 glue entries
%total 49 defined glues (so 200 still available)
%There are three font specific glues defined for each font used in texts.
%The explicit glue nodes are the following:
%\itemize
%\item 35\% is predefined zero glue
%\item 30\% are 39 other predefined glue most of them less than 1%
%\item 8\% (4839) is one glue with 25pt pure stretch with order 0
%\item 25\% (14746) is one glue with 100pt stretch and 10pt shrink with order 0
%\item 2\% (1096) is one glue with 10pt no stretch and shrink
%\item 0\% (13) are 7 different glues with no stretch and shrink
%\item 0\% (3) different glues with width!=0 and some stretch of order 0
%\item 0\% (27) 20 different glues with stretch and shrink
%\enditemize
%Some more glue with 1fil is insider 55 leaders
%one vset has an extent 1 no stretch and shrink
%56 hset specify an extent 2 and 1 fil stretch
We have seen in section~\secref{stretch} how to deal with
stretchability\index{stretchability} and
shrinkability\index{shrinkability} and we will need this now.
Glue\index{glue} has a natural width---which in general can be an
extended dimension---and in addition it can stretch and shrink. It
might have been possible to allow an extended dimension also for the
stretch\-ability or shrink\-ability of a glue, but this seems of
little practical relevance and so simplicity won over generality.
Even with that restriction, it is an understatement to regard glue
nodes as ``simple'' nodes.
%, and we could equally well list them in
%section~\secref{composite} as composite nodes.
To use the info bits in the short format wisely, I collected some
statistical data using the \TeX book as an example. It turns out that
about 99\% of all the 58937 glue nodes (not counting the interword
glues used inside texts) could be covered with only 43 predefined
glues. So this is by far the most common case; we reserve the info
value |b000| to cover it and postpone the description of such glue
nodes until we describe references in section~\secref{reference}.
We expect the remaining cases to contribute not too much to the file
size, and hence, simplicity is a more important aspect than efficiency
when allocating the remaining info values.
Looking at the glues in more detail, we find that the most common
cases are those where either one, two, or all three glue components
are zero. We use the two lowest bits to indicate the presence of a
nonzero stretchability or shrinkability and reserve the info values
|b001|, |b010|, and |b011| for those cases where the width of the glue
is zero. The zero glue, where all components are zero, is defined as
a fixed, predefined glue instead of reserving a special info value for
it. The cost of one extra byte when encoding it seems not too high a
price to pay. After reserving the info value |b111| for the most
general case of a glue, we have only three more info values left:
|b100|, |b101|, and |b110|. Keeping things simple implies using the
two lowest info bits---as before---to indicate a nonzero
stretchability or shrinkability. For the width, three choices remain:
using a reference to a dimension, using a reference to an extended
dimension, or using an immediate value. Since references to glues are
already supported, an immediate width seems best for glues that are
not frequently reused, avoiding the overhead of references.
% It also makes parsing simpler because we avoid the confusion
% between references to dimensions
% and references to glues and references to extended dimensions.
Here is a summary of the info bits and the implied layout
of glue nodes in the short format:
\itemize
\item |b000|: reference to a predefined glue
\item |b001|: zero width and nonzero shrinkability
\item |b010|: zero width and nonzero stretchability
\item |b011|: zero width and nonzero stretchability and shrinkability
\item |b100|: nonzero width
\item |b101|: nonzero width and nonzero shrinkability
\item |b110|: nonzero width and nonzero stretchability
\item |b111|: extended dimension and nonzero stretchability and shrinkability
\enditemize
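The selection of the info value can be sketched as a small decision function.
The type |GlueSketch| and the function |glue_info| are hypothetical; the
sketch assumes, in line with |hput_glue|, that a glue with a nonzero plain
width and both stretchability and shrinkability falls into the general case
|b111|.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a glue: extended width (w, h, v) plus the
   numeric parts of stretchability and shrinkability. */
typedef struct { int32_t w; float h, v; double stretch, shrink; } GlueSketch;

static unsigned glue_info(const GlueSketch *g)
{ bool plain = g->h == 0.0f && g->v == 0.0f; /* width is not extended */
  bool p = g->stretch != 0.0, m = g->shrink != 0.0;
  if (plain && g->w == 0)   /* zero width; b000 refers to the zero glue */
    return (p ? 2u : 0u) | (m ? 1u : 0u);
  if (plain && !(p && m))   /* immediate width, at most one of the two */
    return 4u | (p ? 2u : 0u) | (m ? 1u : 0u);
  return 7u;                /* general case: stretch, shrink, xdimen node */
}
```

Only the result 0 stands for a reference; all other results describe a glue
whose components are stored immediately in the node.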
@<hint basic types@>=
typedef struct {@+
Xdimen w; @+
Stretch p, m;@+
} Glue;
@
To test for a zero glue,
we implement a macro:
@<hint macros@>=
#define @[ZERO_GLUE(G)@] ((G).w.w==0 && (G).w.h==0.0 && (G).w.v==0.0 && (G).p.f==0.0 && (G).m.f==0.0)
@
Because other nodes (leaders, baselines, and fonts)
contain glue nodes as parameters, we provide functions
to read and write a complete glue node in the same way as we did
for rule nodes.
Further, such an internal {\sl glue\_node\/} has the special property that
in the short format a node for the zero glue might be omitted entirely.
\readcode
@s GLUE symbol
@s glue symbol
@s glue_node symbol
@s PLUS symbol
@s MINUS symbol
@s plus symbol
@s minus symbol
@<symbols@>=
%token GLUE "glue"
%token PLUS "plus"
%token MINUS "minus"
%type <g> glue
%type <b> glue_node
%type <st> plus minus
@
@<scanning rules@>=
::@=glue@> :< return GLUE; >:
::@=plus@> :< return PLUS; >:
::@=minus@> :< return MINUS; >:
@
@<parsing rules@>=
plus: { $$.f=0.0; $$.o=0; } | PLUS stretch {$$=$2;};
minus: { $$.f=0.0; $$.o=0; } | MINUS stretch {$$=$2;};
glue: xdimen plus minus {$$.w=$1; $$.p=$2; $$.m=$3; };
content_node: start GLUE glue END {if (ZERO_GLUE($3)) {HPUT8(zero_skip_no);
hput_tags($1,TAG(glue_kind,0)); } else hput_tags($1,hput_glue(&($3))); };
glue_node: start GLUE glue END @/
{@+ if (ZERO_GLUE($3)) { hpos--; $$=false;@+}@/
else { hput_tags($1,hput_glue(&($3))); $$=true;@+}@+ };
@
\writecode
@<write functions@>=
void hwrite_plus(Stretch *p)
{ @+if (p->f!=0.0) { hwritef(" plus");@+hwrite_stretch(p); @+}
}
void hwrite_minus(Stretch *m)
{@+ if (m->f!=0.0) { hwritef(" minus");@+hwrite_stretch(m); @+}
}
void hwrite_glue(Glue *g)
{ hwrite_xdimen(&(g->w)); @+
hwrite_plus(&g->p); @+hwrite_minus(&g->m);
}
void hwrite_ref_node(Kind k, uint8_t n);
void hwrite_glue_node(Glue *g)
{@+
if (ZERO_GLUE(*g)) hwrite_ref_node(glue_kind,zero_skip_no);
else @+{ hwrite_start(); @+hwritef("glue"); @+hwrite_glue(g); @+hwrite_end();@+}
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(glue_kind,b001): { Glue g;@+ HGET_GLUE(b001,g);@+ hwrite_glue(&g);@+}@+break;
case TAG(glue_kind,b010): { Glue g;@+ HGET_GLUE(b010,g);@+ hwrite_glue(&g);@+}@+break;
case TAG(glue_kind,b011): { Glue g;@+ HGET_GLUE(b011,g);@+ hwrite_glue(&g);@+}@+break;
case TAG(glue_kind,b100): { Glue g;@+ HGET_GLUE(b100,g);@+ hwrite_glue(&g);@+}@+break;
case TAG(glue_kind,b101): { Glue g;@+ HGET_GLUE(b101,g);@+ hwrite_glue(&g);@+}@+break;
case TAG(glue_kind,b110): { Glue g;@+ HGET_GLUE(b110,g);@+ hwrite_glue(&g);@+}@+break;
case TAG(glue_kind,b111): { Glue g;@+ HGET_GLUE(b111,g);@+ hwrite_glue(&g);@+}@+break;
@
@<get macros@>=
#define @[HGET_GLUE(I,G)@] {\
if((I)!=b111) { if ((I)&b100) HGET32((G).w.w);@+ else (G).w.w=0;}\
if((I)&b010) HGET_STRETCH((G).p) @+else (G).p.f=0.0, (G).p.o=0;\
if((I)&b001) HGET_STRETCH((G).m) @+else (G).m.f=0.0, (G).m.o=0;\
if((I)==b111) hget_xdimen_node(&((G).w)); else (G).w.h=(G).w.v=0.0;@+}
@
The |hget_glue_node| can cope with a glue node that is omitted and
will supply a zero glue instead.
@<get functions@>=
void hget_glue_node(void)
{ @<read the start byte |a|@>@;
if (KIND(a)!=glue_kind)
{@+ hpos--; hwrite_ref_node(glue_kind,zero_skip_no);@+return; @+}
if (INFO(a)==b000)
{ uint8_t n=HGET8;@+ REF(glue_kind,n);@+hwrite_ref_node(glue_kind,n); @+}
else
{ @+Glue g; @+HGET_GLUE(INFO(a),g);@+ hwrite_glue_node(&g);@+}
@<read and check the end byte |z|@>@;
}
@
\putcode
@<put functions@>=
Tag hput_glue(Glue *g)
{ Info info=b000;
if (ZERO_GLUE(*g)) { HPUT8(zero_skip_no); @+ info=b000; }
else if ( (g->w.w==0 && g->w.h==0.0 && g->w.v==0.0))
{ if (g->p.f!=0.0) { hput_stretch(&g->p); @+info|=b010; @+}
if (g->m.f!=0.0) { hput_stretch(&g->m); @+info|=b001; @+}
}
else if ( g->w.h==0.0 && g->w.v==0.0 && (g->p.f==0.0 || g->m.f==0.0))
{ HPUT32(g->w.w); @+ info=b100;
if (g->p.f!=0.0) { hput_stretch(&g->p); @+info|=b010; @+}
if (g->m.f!=0.0) { hput_stretch(&g->m); @+info|=b001; @+}
}
else@/
{ hput_stretch(&g->p);@+ hput_stretch(&g->m);
hput_xdimen_node(&(g->w));
info=b111;
}
return TAG(glue_kind,info);
}
@
\section{Lists}\hascode\label{lists}
When a node contains multiple other nodes, we package these nodes into
a list\index{list} node. It is important to note that list nodes
never occur as individual nodes; they only occur as parts of other
nodes. In total, we have three different types of lists: plain lists
that use the kind-value |list_kind|, text\index{text} lists that use
the kind-value |list_kind| together with the info bit |b100|,
and parameter\index{parameter} lists that use the
kind-value |param_kind|. A description of the first two types of
lists follows here. Parameter lists are described in section~\secref{paramlist}.
Because lists are of variable size, it is not possible in the short
format to tell from the kind and info bits of a tag byte the size of
the list node. So advancing from the beginning of a list node to the
next node after the list is not as simple as usual. To solve this
problem, we store the size of the list immediately after the start
byte and before the end byte. Alternatively we could require programs
to traverse the entire list. The latter solution is more compact but
inefficient for lists with many nodes; our solution will cost some
extra bytes, but the number of extra bytes will only grow
logarithmically with the size of the \HINT\ file. It would be
possible to allow both methods so that a \HINT\ file could balance
size and time trade-offs by making small lists---where the size can be
determined easily by reading the entire list---without size
information and making large lists with size information so that they
can be skipped easily without reading them. But the added complexity
seems too high a price to pay.
Now consider the problem of reading a content stream starting at an arbitrary
position $i$ in the middle of the stream. This situation occurs
naturally when resynchronizing\index{resynchronization} a content stream after
an error has been detected, but implementing links poses a similar problem.
We can inspect the byte at position $i$ and see
if it is a valid tag. If yes, we are faced with the problem of
verifying that this is not a mere coincidence.
So we determine the size $s$ of the node. If the byte in question is a start byte,
we should find a matching byte $s$ bytes later in the stream; if it is an end byte,
we should find the matching byte $s$ bytes earlier in the stream; if we
find no matching byte, this was neither a start nor an end byte.
If we find exactly one matching byte, we can be quite confident (error
probability 1/256 if assuming equal probability of all byte values)
that we have found a tag, and we know whether
it is the beginning or the end tag. If we find two matching bytes, we
have most likely the start or the end of a node, but we do not know which
of the two. To find out which of the two possibilities is true
or to reduce the probability of an error, we can
check the start and end byte of the node immediately preceding a start byte or
immediately following an end byte in a similar way.
By testing two more bytes, this additional check will reduce the error
probability further to $1/2^{24}$ (under the same assumption as before). So
checking more nodes is rarely necessary. This whole scheme
would, however, not work if we happen to find a tag byte that indicates
either the beginning or the end of a list without specifying the size
of the list. Sure, we can verify the bytes before and after it to
find out whether the byte following it is the beginning of a node and the
byte preceding it is the end of a node, but we still don't know if the
byte itself starts a node list or ends a node list. Even reading along
in either direction until finding a matching tag will not answer the
question. The situation is better if we specify a
size: we can read the suspected size after or before the tag and check if we
find a matching tag and size at the position indicated.
In the short format, we use the two lower bits of the |info| value to indicate the number of
bytes used to store the list size: A list with $\hbox{|info&0x3|}=1$ uses 1 byte,
with $\hbox{|info&0x3|}=2$ uses 2 bytes, and with $\hbox{|info&0x3|}=3$ uses 4 bytes.
The |info&0x3| value zero is reserved for references to predefined lists.
An empty list is always represented using zero as the reference number.
General predefined lists are currently implemented only for parameter lists.
Storing the list size immediately preceding the end tag creates a new
problem: If we try to recover from an error, we might not know the
size of the list and searching for the end of a list, we might be
unable to tell the difference between the bytes that encode the list
size and the start tag of a possible next node. If we parse the
content backward, the problem is completely symmetric.
To solve the problem, we insert an additional byte immediately before
the final size and after the initial size marking the size boundary.
We choose the byte values |0xFF|, |0xFE|, and |0xFD|, which cannot
be confused with valid tag bytes and indicate that the size is
stored using 1, 2, or 4 bytes respectively. Under regular
circumstances, these bytes are simply skipped. When searching for the
list end (or start), these bytes would correspond to
|TAG(penalty_kind,i)| with $7 \ge \hbox{|i|} \ge 5$ and cannot be
confused with valid penalty nodes, which use only the info values 0, 1,
and~2. An empty list always uses the info value 0 and the reference value 0.
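The relation between the list size, the two low info bits, the width of the
size field, and the boundary mark can be summarized in a short sketch. The
function names are hypothetical, and the sketch assumes that the smallest
size field able to hold the size is chosen; the real implementation may
reserve a larger field based on the estimate given in the long format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the two low info bits for a list of s bytes. */
static unsigned list_size_info(uint32_t s)
{ if (s == 0) return 0;       /* empty list: info 0 with reference 0 */
  if (s < 0x100) return 1;    /* size fits into 1 byte */
  if (s < 0x10000) return 2;  /* size fits into 2 bytes */
  return 3;                   /* size needs 4 bytes */
}

/* Hypothetical helper: the width in bytes of the size field. */
static unsigned list_size_bytes(unsigned info)
{ assert(info >= 1 && info <= 3);
  return info == 3 ? 4u : info;
}

/* Hypothetical helper: the boundary mark, 0xFF, 0xFE, or 0xFD. */
static uint8_t list_boundary_mark(unsigned info)
{ assert(info >= 1 && info <= 3);
  return (uint8_t)(0x100u - info);
}
```

With these values, a nonempty list costs two tags, two size fields, and two
boundary marks in addition to its content.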
We are a bit lazy when it comes to the internal representation of a list.
Since we need the representation as a short format byte sequence anyway,
it consists of the position |p| of the start of the byte sequence
combined with an integer |s| giving the size of the byte sequence.
If the list is empty, |s| is zero.
@<hint types@>=
typedef struct {@+
Tag t;@+
uint32_t p;@+
uint32_t s;@+
} List;
@
The major drawback of this choice of representation is that it ties
together the reading of the long format and the writing of the short
format; these are no longer independent.
So starting with the present section, we have to take the short format
representation of a node into account already when we parse the long
format representation.
In the long format, we may start a list node with an
estimate\index{estimate} of the size needed to store the list in the
short format. We do not want to require the exact size because this
would make editing of long format \HINT\ files almost impossible. Of
course this makes it also impossible to derive the exact |s| value of
the internal representation from the long format
representation. Therefore we start by parsing the estimate of the list
size and use it to reserve the necessary number of bytes to store the
size. Then we parse the |content_list|. As a side effect---and this
is an important point---this will write the list content in short
format into the output buffer. As mentioned above, whenever a node
contains a list, we need to consider this side effect when we give the
parsing rules. We will see examples for this in
section~\secref{composite}.
The function |hput_list| will be called {\it after} the short format
of the list is written to the output. Before we pass the internal
representation of the list to the |hput_list|
function, we update |s| and |p|. Further, we pass the position in the stream where the
list size and its boundary mark is supposed to be.
Before |hput_list| is called, space for the tag, the size, and the boundary mark
is allocated based on the estimate. The function
|hsize_bytes| computes the number of bytes required to store the list
size, and the function |hput_list_size| will later write the list
size. If the estimate turns out to be wrong, the list data can be moved
to make room for a larger or smaller size field.
If the long format does not specify a size estimate, a suitable default must be chosen.
A statistical analysis shows
%
%statistics about list sizes using my old prototype
%
%name type size_byte list_count total_size
%hello.hnt text 1 6 748
% text 2 2 1967
% list 1 65 3245
% list 2 1 352
%web2w.hnt text 1 1043 121925
% text 2 1344 859070
% list 1 19780 725514
% list 2 487 199243
%ctex.hnt text 1 9121 4241128
% text 2 12329 7872687
% text 3 1 75010
% list 1 121557 4600743
% list 2 222 147358
%
that most plain lists need only a single byte to store the size; and even the
total amount of data contained in these lists exceeds the amount of data stored
in longer lists by a factor of about 3. Hence if we do not have an estimate,
we reserve only a single byte to store the size of a list.
The statistics look different for lists stored as a text: The number of texts
that require two bytes for the size is slightly larger than the number of texts that
need only one byte, and the total amount of data stored in these texts is larger
by a factor of 2 to 7 than the total amount of data found in all other texts.
Hence as a default, we reserve two bytes to store the size for texts.
\subsection{Plain Lists}\label{plainlists}
Plain list nodes start and end with a tag of kind |list_kind|.
Not uncommon are empty\index{empty list} lists; these can be
stored using $|info|=0$ and a reference to the predefined empty list.
Writing the long format uses the fact that the function
|hget_content_node|, as implemented in the \.{stretch} program, will
output the node in the long format.
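The byte layout of a plain list in the short format can be illustrated
by a small stand-alone sketch. It is not part of the \HINT\ sources:
the names |put_plain_list|, |put_size|, and |MAKE_TAG| are hypothetical,
and the sketch assumes that a tag consists of five kind bits followed by
three info bits and that sizes are stored with the most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LIST_KIND = 0 };  /* |list_kind==0| */
#define MAKE_TAG(K, I) ((uint8_t)(((K) << 3) | (I)))  /* assumed tag layout */

/* number of bytes needed for the size field; mirrors |hsize_bytes| */
static int size_bytes(uint32_t n)
{ if (n == 0) return 0;
  else if (n < 0x100) return 1;
  else if (n < 0x10000) return 2;
  else return 4;
}

/* store |n| in |j| bytes, most significant byte first (assumed byte order) */
static uint8_t *put_size(uint8_t *p, uint32_t n, int j)
{ while (j-- > 0) *p++ = (uint8_t)(n >> (8 * j));
  return p;
}

/* write a complete plain list node to |out|; returns the number of bytes written */
size_t put_plain_list(uint8_t *out, const uint8_t *content, uint32_t s)
{ uint8_t *p = out;
  int j = size_bytes(s);
  int k = (j == 4) ? 3 : j;       /* info value 0, 1, 2, or 3 */
  uint8_t tag = MAKE_TAG(LIST_KIND, k);
  assert(out != NULL);
  *p++ = tag;
  if (k == 0) *p++ = 0;           /* reference to the predefined empty list */
  else
  { p = put_size(p, s, j);
    *p++ = (uint8_t)(0x100 - k);  /* boundary mark */
    memcpy(p, content, s);
    p += s;
    *p++ = (uint8_t)(0x100 - k);  /* boundary mark */
    p = put_size(p, s, j);
  }
  *p++ = tag;
  return (size_t)(p - out);
}
```

A list with three content bytes needs nine bytes in total: the size
and the boundary mark appear on both sides of the content, so the list
can be skipped starting from either end.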
\readcode
@s list symbol
@s content_list symbol
@s estimate symbol
@s position symbol
@<symbols@>=
%type <l> list
%type <u> position content_list
@
@<parsing rules@>=
position: {$$=hpos-hstart;};
content_list: @+ position @+
| content_list content_node;
estimate: {hpos+=2; } @+
| UNSIGNED {hpos+=hsize_bytes($1)+1; } ;
list: start estimate content_list END @/
{@+$$.t=TAG(list_kind,b010);@+ $$.p=$3; @+ $$.s=(hpos-hstart)-$3;
hput_tags($1,hput_list($1+1, &($$)));@+};
@
\writecode
@<write functions@>=
void hwrite_list(List *l)
{ uint32_t h=hpos-hstart, e=hend-hstart; /* save |hpos| and |hend| */
hpos=l->p+hstart;@+ hend=hpos+l->s;
if (KIND(l->t)==list_kind)
{ if (INFO(l->t)&b100) @<write a text@>@; else @<write a list@>@; }
else QUIT("List expected got %s", content_name[KIND(l->t)]);
hpos=hstart+h;@+ hend=hstart+e; /* restore |hpos| and |hend| */
}
@
@<write a list@>=
{@+if (l->s==0) hwritef(" <>");@/
else@/
{@+DBG(DBGNODE,"Write list at 0x%x size=%u\n", l->p, l->s);
@+hwrite_start();@+
if (section_no==2) hwrite_label();
if (l->s>0xFF) hwritef("%d",l->s);
while(hpos<hend)
hget_content_node();
hwrite_end();
}
}
@
\getcode
@<shared get functions@>=
void hget_size_boundary(Info info)
{ uint32_t n;
info=info&0x3;
if (info==0) return;
n=HGET8;
if (n!=0x100-info) QUIT("Non matching boundary byte 0x%x with info value %d at 0x%x",
n, info,(uint32_t)(hpos-hstart-1));
}
uint32_t hget_list_size(Info info)
{ uint32_t n=0;
info=info&0x3;
if (info==0) return 0;
else if (info==1) n=HGET8;
else if (info==2) HGET16(n);
else if (info==3) HGET32(n);
else QUIT("List info %d must be 0, 1, 2, or 3",info);
return n;
}
void hget_list(List *l)
{@+if (KIND(*hpos)!=list_kind && KIND(*hpos)!=param_kind) @/
QUIT("List expected at 0x%x", (uint32_t)(hpos-hstart));
else
{
@<read the start byte |a|@>@;
l->t=a;
HGET_LIST(INFO(a),*l);
@<read and check the end byte |z|@>@;
DBG(DBGNODE,"Get list at 0x%x size=%u\n", l->p, l->s);
}
}
@
If a list has the info value zero, the list is the empty list.
Other list references are currently not implemented.
@<shared get macros@>=
#define @[HGET_LIST(I,L)@] \
if (((I)&0x3)==0) {uint8_t n=HGET8; @+REF_RNG(KIND((L).t),n);@+ (L).s=0;@+}\
else { (L).s=hget_list_size(I); hget_size_boundary(I);\
(L).p=hpos-hstart; \
hpos=hpos+(L).s; hget_size_boundary(I);\
{ uint32_t s=hget_list_size(I); \
if (s!=(L).s) \
QUIT(@["List sizes at 0x%x and " SIZE_F " do not match 0x%x != 0x%x"@],node_pos+1,hpos-hstart-I-1,(L).s,s);}}
@
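Because the size and the boundary mark are repeated after the content,
a list can also be located starting from its end. The following
stand-alone sketch illustrates this; the name |list_content_from_end|
is hypothetical, the sketch assumes that sizes are stored with the most
significant byte first, and it trusts the caller to pass a pointer into
a buffer that is large enough for the backward accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* read a |j|-byte size field stored most significant byte first (assumed) */
static uint32_t get_size(const uint8_t *p, int j)
{ uint32_t n = 0;
  while (j-- > 0) n = (n << 8) | *p++;
  return n;
}

/* |end| points just past the end tag of a nonempty list node.
   Return a pointer to the first content byte and store the size in |*s|;
   return NULL for an empty list or if the size fields or marks disagree. */
const uint8_t *list_content_from_end(const uint8_t *end, uint32_t *s)
{ int k, j;
  const uint8_t *size2, *content, *size1;
  assert(end != NULL && s != NULL);
  k = end[-1] & 0x3;              /* info bits selecting the size field width */
  j = (k == 3) ? 4 : k;           /* width of the size field in bytes */
  if (k == 0) return NULL;        /* empty list: no size fields to inspect */
  size2 = end - 1 - j;            /* trailing size field */
  *s = get_size(size2, j);
  if (size2[-1] != 0x100 - k) return NULL;    /* trailing boundary mark */
  content = size2 - 1 - *s;
  if (content[-1] != 0x100 - k) return NULL;  /* leading boundary mark */
  size1 = content - 1 - j;
  if (get_size(size1, j) != *s) return NULL;  /* both sizes must agree */
  return content;
}
```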
\putcode
@<put functions@>=
uint8_t hsize_bytes(uint32_t n)
{ @+if (n==0) return 0;
else if (n<0x100) return 1;
else if (n<0x10000) return 2;
else return 4;
}
void hput_list_size(uint32_t n, int i)
{ @+if (i==0) return;
else if (i==1) HPUT8(n);
else if (i==2) HPUT16(n);
else HPUT32(n);
}
Tag hput_list(uint32_t start_pos, List *l)
{ @+if (l->s==0)
{ hpos=hstart+start_pos; @+ HPUT8(0); @+return TAG(KIND(l->t),INFO(l->t)&b100);@+}
else@/
{ uint32_t list_end=hpos-hstart;
int i=l->p -start_pos-1; /* number of bytes allocated for size */
int j=hsize_bytes(l->s); /* number of bytes needed for size */
Info k;
DBG(DBGNODE,"Put list at 0x%x size=%u\n", l->p, l->s);
if (i>j && l->s> 0x100) j=i; /* avoid moving large lists */
if (j==4) k=3; else k=j; /* the info value must match the size field actually written */
if (i!=j)@/
{ int d= j-i;
DBG(DBGNODE,"Moving %u byte by %d\n", l->s,d);
if (d>0) HPUTX(d);
memmove(hstart+l->p+d,hstart+l->p,l->s);
@<adjust label positions after moving a list@>@;
l->p=l->p+d;@+
list_end=list_end+d;
}
hpos=hstart+start_pos; @+ hput_list_size(l->s,j);@+ HPUT8(0x100-k);
hpos=hstart+list_end;@+ HPUT8(0x100-k);@+ hput_list_size(l->s,j);
return TAG(KIND(l->t),k|(INFO(l->t)&b100));
}
}
@
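The moving of the list data in |hput_list| can be isolated in a few
lines. The following sketch is only an illustration: the name
|resize_size_field| is hypothetical, and the caller is assumed to have
made sure, as |HPUTX| does in the code above, that the buffer has room
for the additional bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Move |s| content bytes that start at offset |p| in |buf| so that the
   size field in front of them changes from |i| to |j| bytes.
   Returns the new offset of the content. */
uint32_t resize_size_field(uint8_t *buf, uint32_t p, uint32_t s, int i, int j)
{ int d = j - i;
  assert(buf != NULL && i >= 0 && j >= 0);
  if (d != 0)
    memmove(buf + p + d, buf + p, s);  /* source and target overlap */
  return (uint32_t)(p + d);
}
```

Since source and target overlap whenever the list is longer than the
shift distance, |memmove| rather than |memcpy| is required.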
\subsection{Texts}\label{text}
A Text\index{text} is a list of nodes with a representation optimized
for character nodes. In the long format, a sequence of characters
like ``{\tt Hello}'' is written ``\.{<glyph 'H'} \.{*0>} \.{<glyph} \.{'e'}
\.{*0>} \.{<glyph 'l' *0>} \.{<glyph 'l' *0>} \.{<glyph 'o' *0>}'', and
even in the short format it requires 4 bytes per character! As a text,
the same sequence is written ``{\tt\,"Hello"\,}'' in the long format and the
short format usually requires just 1 byte per character. Indeed,
except for the bytes with values from |0x00| to |0x20|, which are
considered control\index{control code} codes, all bytes and all
\hbox{UTF-8}\index{UTF8} multibyte sequences are simply considered
character\index{character code} codes. They are equivalent to a glyph
node in the ``current font''. The current\index{current font}
font\index{font} is font number 0 at the beginning of a text and it
can be changed using the control codes. We introduce the concept of a
``current font'' because we do not expect the font to change too
often, and it allows for a more compact representation if we do not
store the font with every character code. It has an important
disadvantage though: storing only font changes prevents us from
parsing a text backwards; we always have to start at the beginning of
the text, where the font is known to be font number~0.
Defining a second format for encoding lists of nodes adds another
difficulty to the problem we discussed at the beginning of
section~\secref{lists}. When we try to recover from an error and start
reading a content stream at an arbitrary position, the first thing we
need to find out is whether at this position we have the tag byte of
an ordinary node or whether we have a position inside a text.
Inside a text, character nodes start with a byte in the range
|0x21|--|0xF7|. This is a wide range and it overlaps considerably with
the range of valid tag bytes. It is however possible to choose the
kind-values in such a way that the control codes do not overlap with
the valid tag bytes that start a node.
For this reason, the values
|list_kind==0|, |param_kind==1|, |range_kind==2|, |xdimen_kind==3|, and
|adjust_kind==4| were chosen on page~\pageref{kinddef}. Lists,
parameter lists, and extended dimensions occur only {\it inside} of
content nodes, but are not content nodes in their own right;
page ranges occur only in the definition section; so the
values |0x00| to |0x1F| are not used as tag bytes of content
nodes. The value |0x20| would, as a tag byte, indicate an adjust node
(|adjust_kind==4|) with info value zero. Because there are no
predefined adjustments, |0x20| is not used as a tag byte either.
(An alternative choice would be to use the kind value 4 for paragraph
nodes because there are no predefined paragraphs.)
The largest byte that starts a UTF-8 code is |0xF7|; hence, there are
eight possible control codes, from |0xF8| to |0xFF|, available. The
first three values |0xF8|, |0xF9|, and |0xFA| are actually used for
penalty nodes with info values 0, 1, and 2. The last three
|0xFD|, |0xFE|, and |0xFF| are used as boundary marks for the text
size and therefore we can use only |0xFB| and \label{FC}|0xFC| as control codes.
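The partition of the byte values inside a text, as described in the
preceding paragraphs, can be summarized in a short sketch. The type
|ByteClass| and the function |classify_text_byte| are hypothetical
names introduced only for this illustration.

```c
#include <assert.h>

typedef enum {
  BYTE_CONTROL,      /* control code: 0x00 to 0x20, 0xFB, and 0xFC */
  BYTE_CHARACTER,    /* 0x21 to 0xF7: part of a character code */
  BYTE_PENALTY_TAG,  /* 0xF8 to 0xFA: tags of penalty nodes */
  BYTE_BOUNDARY      /* 0xFD to 0xFF: boundary marks of the text size */
} ByteClass;

ByteClass classify_text_byte(int b)
{ assert(b >= 0 && b <= 0xFF);
  if (b <= 0x20) return BYTE_CONTROL;
  else if (b <= 0xF7) return BYTE_CHARACTER;
  else if (b <= 0xFA) return BYTE_PENALTY_TAG;
  else if (b <= 0xFC) return BYTE_CONTROL;
  else return BYTE_BOUNDARY;
}
```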
In the long format, we do not provide a syntax for specifying a size
estimate\index{estimate} as we did for plain lists, because we expect
text to be quite short. We allocate two bytes for the size and hope
that this will prove to be sufficient most of the time. Further, we
will disallow the use of non-printable ASCII codes, because these
are---by definition---not very readable, and we will give special
meaning to some of the printable ASCII codes because we will need a
notation for the beginning and ending of a text, for nodes inside a
text, and the control codes.
Here are the details:
\itemize
\item In the long format, a text starts and ends with a
double\index{double quote} quote character ``{\tt "}''. In the short
format, texts are encoded similarly to lists, setting the info bit |b100|.
\item Arbitrary nodes can be embedded inside a text. In the long
format, they are enclosed in pointed brackets \.{<} \dots \.{>} as
usual. In the short format, an arbitrary node can follow the control
code $|txt_node|=|0x1E|$. Because text may occur in nodes, the scanner
needs to be able to parse texts nested inside nodes nested inside
nodes nested inside texts \dots\ To accomplish this, we use the
``stack'' option of \.{flex} and include the pushing and popping of the
stack in the macros |SCAN_START| and |SCAN_END|.
\item The space\index{space character} character ``\.{\ }'' with ASCII
value |0x20| stands in both formats for the font specific interword
glue node (control code |txt_glue|).
\item The hyphen\index{hyphen character} character ``\.{-}'' in the
long format and the control code $|txt_hyphen|=|0x1F|$ in the short
format stand for the font specific discretionary hyphenation node.
\item In the long format, the backslash\index{backslash} character
``\.{\\}'' is used as an escape character. It is used to introduce
notations for control codes, as described below, and to access the
character codes of those ASCII characters that otherwise carry a
special meaning. For example ``{\tt \BS "}'' denotes the character code
of the double quote character ``{\tt "}''; and similarly ``\.{\\\\}'',
``\.{\\<}'', ``\.{\\>}'', ``\.{\\\ }'', and ``\.{\\-}'' denote the
character codes of ``\.{\\}'', ``\.{<}'', ``\.{>}'', ``\.{\ }'', and
``\.{-}'' respectively.
\item In the long format, a TAB-character (ASCII code
|0x09|)\index{tab character} is silently converted to a
space\index{space character} character (ASCII code |0x20|);
a NL-character\index{newline character} (ASCII code |0x0A|), together
with surrounding spaces, TAB-characters,
and CR-characters\index{carriage return character} (ASCII code |0x0D|),
is silently converted to a single space character. All other ASCII
characters in the range |0x00| to |0x1F| are not allowed inside a
text. This rule avoids the problems arising from ``invisible''
characters embedded in a text and it allows breaking texts into lines,
even with indentation\index{indentation}, at word boundaries.
To allow breaking a text into lines without inserting spaces, a
NL-character together with surrounding spaces, TAB-characters, and
CR-characters is completely ignored if the whole group of spaces,
TAB-characters, CR-characters, and the NL-character is preceded by a
backslash character.
For example, the text ``\.{"There\ is\ no\ more\ gas\ in\ the\
tank."}''\hfil\break can be written as \medskip
\qquad\vbox{\hsize=0.5\hsize\noindent
\.{"There\ is\ }\hfil\break
\.{\hbox to 2em {$\rightarrow$\hfill}no more g\\\ \ }\hfil\break
\.{\hbox to 2em {$\rightarrow$\hfill}as in the tank."}
}\hss
To break long lines when writing a long format file, we use the
variable |txt_length| to keep track of the approximate length of the
current line.
\item The control codes $|txt_font|=|0x00|$, |0x01|, |0x02|, \dots,
and |0x07| are used to change the current font to font
number 0, 1, 2, \dots, and 7. In the long format these control
codes are written \.{\\0}, \.{\\1}, \.{\\2}, \dots, and \.{\\7}.
\item The control code $|txt_global|=|0x08|$ is followed by a second
parameter byte. If the value of the parameter byte is $n$, it will set
the current font to font number $n$. In the long format, the two-byte
sequence is written ``\.{\\F}$n$\.{\\}'' where $n$ is the decimal
representation of the font number.
\item The control codes |0x09|, |0x0A|, |0x0B|, |0x0C|, |0x0D|,
|0x0E|, |0x0F|, and |0x10| are also followed by a second parameter
byte. They are used to reference the global definitions of
penalty\index{penalty}, kern\index{kern}, ligature\index{ligature},
disc\index{discretionary hyphen}, glue\index{glue}, language\index{language},
rule\index{rule}, and image\index{image} nodes. The parameter byte
contains the reference number. For example, the byte sequence |0x09|
|0x03| is equivalent to the node \.{<penalty *3>}.
In the long format these two-byte sequences are written
``\.{\\P}$n$\.{\\}'' (penalty),
``\.{\\K}$n$\.{\\}'' (kern),
``\.{\\L}$n$\.{\\}'' (ligature),
``\.{\\D}$n$\.{\\}'' (disc),
``\.{\\G}$n$\.{\\}'' (glue),
``\.{\\S}$n$\.{\\}'' (speak or German ``Sprache''),
``\.{\\R}$n$\.{\\}'' (rule), and
``\.{\\I}$n$\.{\\}'' (image), where $n$ is the decimal representation
of the parameter value.
\item The control codes from $|txt_local|=|0x11|$ to |0x1C| are used
to reference one of the 12 font specific parameters\index{font
parameter}. In the long format they are written ``\.{\\a}'',
``\.{\\b}'', ``\.{\\c}'', \dots, ``\.{\\j}'', ``\.{\\k}'', ``\.{\\l}''.
\item The control code $|txt_cc|=|0x1D|$ is used as a prefix for an
arbitrary character code represented as a UTF-8 multibyte sequence.
Its main purpose is to provide a method for including character codes
less than or equal to |0x20|, which otherwise would be considered control
codes. In the long format, the byte sequence is written ``\.{\\C}$n$\.{\\}''
where $n$ is the decimal representation of the character code.
\item The control code $|txt_node|=|0x1E|$ is used as a prefix for an
arbitrary node in short format. In the long format, it is written
``\.{<}'' and is followed by the node content in long format
terminated by ``\.{>}''.
\item The control code $|txt_hyphen|=|0x1F|$ is used to access the
font specific discretionary hyphen\index{hyphen}. In the long format
it is simply written as ``\.{-}''.
\item The control code $|txt_glue|=|0x20|$ is the space character; it
is used to access the font specific interword\index{interword glue}
glue. In the long format, we use the space character\index{space
character} ``\.{\ }'' as well.
\item The control code $|txt_ignore|=|0xFB|$ is ignored; its position
can be used in a link to specify a position between two characters. In
the long format it is written as ``\.{\\@@}''.
\item The control code |0xFC| is currently unused.
\enditemize
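The white space rules for the long format given above can be made
concrete with a small stand-alone sketch. It is not the scanner used
by the \HINT\ programs, which implements these rules with \.{flex}
patterns; the function |normalize_text| is a hypothetical helper that
rewrites a text so that every line break is either replaced by a
single space or, if preceded by a backslash, removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_blank(char c) { return c == ' ' || c == '\t' || c == '\r'; }

/* Copy |in| to |out| applying the white space rules for long format texts.
   |out| must provide at least |strlen(in)+1| bytes. Returns the length of
   the result or -1 if a forbidden control character is found. */
long normalize_text(const char *in, char *out)
{ size_t i = 0, n = 0;
  assert(in != NULL && out != NULL);
  while (in[i] != 0)
  { unsigned char c = (unsigned char)in[i];
    if (c == '\\')
    { size_t j = i + 1;
      while (is_blank(in[j])) j++;
      if (in[j] == '\n')          /* escaped line break: drop the whole group */
      { j++;
        while (is_blank(in[j])) j++;
        i = j;
      }
      else                        /* ordinary escape: keep backslash and next byte */
      { out[n++] = in[i++];
        if (in[i] != 0) out[n++] = in[i++];
      }
    }
    else if (is_blank(in[i]) || c == '\n')
    { size_t j = i;
      int newline = 0;
      while (is_blank(in[j]) || in[j] == '\n')
      { if (in[j] == '\n') newline = 1;
        j++;
      }
      if (newline) { out[n++] = ' '; i = j; }  /* line break with surrounding blanks */
      else if (c == '\r') return -1;           /* CR is allowed only next to NL */
      else { out[n++] = ' '; i++; }            /* space or TAB becomes a space */
    }
    else if (c < 0x20) return -1;              /* other control characters */
    else out[n++] = in[i++];
  }
  out[n] = 0;
  return (long)n;
}
```

Applied to the three-line example given above, the sketch yields the
single line ``\.{There is no more gas in the tank.}''.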
For the control codes, we define an enumeration type
and for references, a reference type.
@<hint types@>=
typedef enum { @+txt_font=0x00, txt_global=0x08, txt_local=0x11,
txt_cc=0x1D, txt_node=0x1E, txt_hyphen=0x1F,
txt_glue=0x20, txt_ignore=0xFB} Txt;
@
\readcode
@s TXT symbol
@s TXT_START symbol
@s TXT_END symbol
@s TXT_FONT symbol
@s TXT_LOCAL symbol
@s TXT_GLOBAL symbol
@s TXT_FONT_GLUE symbol
@s TXT_FONT_HYPHEN symbol
@s TXT_CC symbol
@s TXT_IGNORE symbol
@s text symbol
@<scanning definitions@>=
%x TXT
@
@<symbols@>=
%token TXT_START TXT_END TXT_IGNORE
%token TXT_FONT_GLUE TXT_FONT_HYPHEN
%token <u> TXT_FONT TXT_LOCAL
%token <rf> TXT_GLOBAL
%token <u> TXT_CC
%type <u> text
@
@<scanning rules@>=
::@=\"@> :< SCAN_TXT_START; return TXT_START; >:
<TXT>{
::@=\"@> :< SCAN_TXT_END; return TXT_END; >:
::@="<"@> :< SCAN_START; return START; >:
::@=">"@> :< QUIT("> not allowed in text mode");>:
::@=\\\\@> :< yylval.u='\\'; return TXT_CC; >:
::@=\\\"@> :< yylval.u='"'; return TXT_CC; >:
::@=\\"<"@> :< yylval.u='<'; return TXT_CC; >:
::@=\\">"@> :< yylval.u='>'; return TXT_CC; >:
::@=\\" "@> :< yylval.u=' '; return TXT_CC; >:
::@=\\"-"@> :< yylval.u='-'; return TXT_CC; >:
::@=\\"@@"@> :< return TXT_IGNORE; >:
::@=[ \t\r]*(\n[ \t\r]*)+@> :< return TXT_FONT_GLUE; >:
::@=\\[ \t\r]*\n[ \t\r]*@> :< ; >:
::@=\\[0-7]@> :< yylval.u=yytext[1]-'0'; return TXT_FONT; >:
::@=\\F[0-9]+\\@> :< SCAN_REF(font_kind); return TXT_GLOBAL; >:
::@=\\P[0-9]+\\@> :< SCAN_REF(penalty_kind); return TXT_GLOBAL; >:
::@=\\K[0-9]+\\@> :< SCAN_REF(kern_kind); return TXT_GLOBAL; >:
::@=\\L[0-9]+\\@> :< SCAN_REF(ligature_kind); return TXT_GLOBAL; >:
::@=\\D[0-9]+\\@> :< SCAN_REF(disc_kind); return TXT_GLOBAL; >:
::@=\\G[0-9]+\\@> :< SCAN_REF(glue_kind); return TXT_GLOBAL; >:
::@=\\S[0-9]+\\@> :< SCAN_REF(language_kind); return TXT_GLOBAL; >:
::@=\\R[0-9]+\\@> :< SCAN_REF(rule_kind); return TXT_GLOBAL; >:
::@=\\I[0-9]+\\@> :< SCAN_REF(image_kind); return TXT_GLOBAL; >:
::@=\\C[0-9]+\\@> :< SCAN_UDEC(yytext+2); return TXT_CC; >:
::@=\\[a-l]@> :< yylval.u=yytext[1]-'a'; return TXT_LOCAL; >:
::@=" "@> :< return TXT_FONT_GLUE; >:
::@="-"@> :< return TXT_FONT_HYPHEN; >:
::@={UTF8_1}@> :< SCAN_UTF8_1(yytext); return TXT_CC; >:
::@={UTF8_2}@> :< SCAN_UTF8_2(yytext); return TXT_CC; >:
::@={UTF8_3}@> :< SCAN_UTF8_3(yytext); return TXT_CC; >:
::@={UTF8_4}@> :< SCAN_UTF8_4(yytext); return TXT_CC; >:
}
@
@<scanning macros@>=
#define @[SCAN_REF(K)@] @[yylval.rf.k=K;@+ yylval.rf.n=atoi(yytext+2)@;@]
static int scan_level=0;
#define SCAN_START @[yy_push_state(INITIAL);@+if (1==scan_level++) hpos0=hpos;@]
#define SCAN_END @[if (scan_level--) yy_pop_state(); @/else QUIT("Too many '>' in line %d",yylineno)@]
#define SCAN_TXT_START @[BEGIN(TXT)@;@]
#define SCAN_TXT_END @[BEGIN(INITIAL)@;@]
@
@s txt symbol
@<parsing rules@>=
list: TXT_START position @|
{hpos+=4; /* start byte, two size bytes, and boundary byte */ }
text TXT_END@|
{ $$.t=TAG(list_kind,b110);$$.p=$4; $$.s=(hpos-hstart)-$4;
hput_tags($2,hput_list($2+1, &($$)));@+};
text: position @+| text txt;
txt: TXT_CC { hput_txt_cc($1); }
| TXT_FONT { REF(font_kind,$1); hput_txt_font($1); }
| TXT_GLOBAL { REF($1.k,$1.n); hput_txt_global(&($1)); }
| TXT_LOCAL { RNG("Font parameter",$1,0,11); hput_txt_local($1); }
| TXT_FONT_GLUE { HPUTX(1); HPUT8(txt_glue); }
| TXT_FONT_HYPHEN { HPUTX(1);HPUT8(txt_hyphen); }
| TXT_IGNORE { HPUTX(1);HPUT8(txt_ignore); }
| { HPUTX(1); HPUT8(txt_node);} content_node;
@
The following function keeps track of the position in the current line.
If the line gets too long it will break the text at the next space
character. If no suitable space character comes along,
the line will be broken after any regular character.
\writecode
@<write a text@>=
{@+if (l->s==0) hwritef(" \"\"");
else@/
{ int pos=nesting+20; /* estimate */
hwritef(" \"");
while(hpos<hend)@/
{ int i=hget_txt();
if (i<0)
{ if (pos++<70) hwritec(' ');
else hwrite_nesting(), pos=nesting;
}
else if (i==1 && pos>=100)@/
{ hwritec('\\'); @+hwrite_nesting(); @+pos=nesting; @+}
else
pos+=i;
}
hwritec('"');
}
}
@
The function returns the number of characters written
because this information is needed in |hget_txt| below.
@<write functions@>=
int hwrite_txt_cc(uint32_t c)
{@+ if (c<0x20)
return hwritef("\\C%d\\",c);
else@+
switch(c)
{ case '\\': return hwritef("\\\\");
case '"': return hwritef("\\\"");
case '<': return hwritef("\\<");
case '>': return hwritef("\\>");
case ' ': return hwritef("\\ ");
case '-': return hwritef("\\-");
default: return option_utf8?hwrite_utf8(c):hwritef("\\C%d\\",c);
}
}
@
\getcode
@<get macros@>=
#define @[HGET_GREF(K,S)@] {uint8_t n=HGET8;@+ REF(K,n); @+ return hwritef("\\" S "%d\\",n);@+}
@
The function |hget_txt| reads a text element and writes it immediately.
To enable the insertion of line breaks when writing a text, we need to keep track
of the number of characters in the current line. For this purpose
the function |hget_txt| returns the number of characters written.
It returns $-1$ if a space character needs to be written
providing a good opportunity for a break.
@<get functions@>=
int hget_txt(void)
{@+ if (*hpos>=0x80 && *hpos<=0xF7)
{ if (option_utf8)
return hwrite_utf8(hget_utf8());
else
return hwritef("\\C%d\\",hget_utf8());
}
else @/
{ uint8_t a;
a=HGET8;
switch (a)
{ case txt_font+0: return hwritef("\\0");
case txt_font+1: return hwritef("\\1");
case txt_font+2: return hwritef("\\2");
case txt_font+3: return hwritef("\\3");
case txt_font+4: return hwritef("\\4");
case txt_font+5: return hwritef("\\5");
case txt_font+6: return hwritef("\\6");
case txt_font+7: return hwritef("\\7");
case txt_global+0: HGET_GREF(font_kind,"F");
case txt_global+1: HGET_GREF(penalty_kind,"P");
case txt_global+2: HGET_GREF(kern_kind,"K");
case txt_global+3: HGET_GREF(ligature_kind,"L");
case txt_global+4: HGET_GREF(disc_kind,"D");
case txt_global+5: HGET_GREF(glue_kind,"G");
case txt_global+6: HGET_GREF(language_kind,"S");
case txt_global+7: HGET_GREF(rule_kind,"R");
case txt_global+8: HGET_GREF(image_kind,"I");
case txt_local+0: return hwritef("\\a");
case txt_local+1: return hwritef("\\b");
case txt_local+2: return hwritef("\\c");
case txt_local+3: return hwritef("\\d");
case txt_local+4: return hwritef("\\e");
case txt_local+5: return hwritef("\\f");
case txt_local+6: return hwritef("\\g");
case txt_local+7: return hwritef("\\h");
case txt_local+8: return hwritef("\\i");
case txt_local+9: return hwritef("\\j");
case txt_local+10: return hwritef("\\k");
case txt_local+11: return hwritef("\\l");
case txt_cc: return hwrite_txt_cc(hget_utf8());
case txt_node: { int i;
@<read the start byte |a|@>@;
i=hwritef("<");
i+= hwritef("%s",content_name[KIND(a)]);@+ hget_content(a);
@<read and check the end byte |z|@>@;
hwritec('>');@+ return i+10; /* just an estimate */
}
case txt_hyphen: hwritec('-'); @+return 1;
case txt_glue: return -1;
case '\\': return hwritef("\\\\");
case '<': return hwritef("\\<");
case '>': return hwritef("\\>");
case '"': return hwritef("\\\"");
case '-': return hwritef("\\-");
case txt_ignore: return hwritef("\\@@");
default: hwritec(a); @+return 1;
}
}
}
@
\putcode
@<put functions@>=
void hput_txt_cc(uint32_t c)
{ @+ if (c<=0x20) { HPUTX(2); HPUT8(txt_cc);@+ HPUT8(c); @+ }
else hput_utf8(c);
}
void hput_txt_font(uint8_t f)
{@+ if (f<8) HPUTX(1),HPUT8(txt_font+f);
else QUIT("Use \\F%d\\ instead of \\%d for font %d in a text",f,f,f);
}
void hput_txt_global(Ref *d)
{ @+ HPUTX(2);
switch (d->k)
{ case font_kind: HPUT8(txt_global+0);@+ break;
case penalty_kind: HPUT8(txt_global+1);@+ break;
case kern_kind: HPUT8(txt_global+2);@+ break;
case ligature_kind: HPUT8(txt_global+3);@+ break;
case disc_kind: HPUT8(txt_global+4);@+ break;
case glue_kind: HPUT8(txt_global+5);@+ break;
case language_kind: HPUT8(txt_global+6);@+ break;
case rule_kind: HPUT8(txt_global+7);@+ break;
case image_kind: HPUT8(txt_global+8);@+ break;
default: QUIT("Kind %s not allowed as a global reference in a text",NAME(d->k));
}
HPUT8(d->n);
}
void hput_txt_local(uint8_t n)
{ HPUTX(1);
HPUT8(txt_local+n);
}
@
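The bytes that |hput_txt_cc| contributes to a text can be illustrated
by a stand-alone sketch. It assumes that |hput_utf8| produces the usual
UTF-8 encoding with sequences of up to four bytes, so that no start
byte exceeds |0xF7|; the function |put_txt_cc_bytes| and its buffer
interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TXT_CC = 0x1D };  /* the control code |txt_cc| */

/* Store character code |c| at |p| as it would appear inside a text;
   returns the number of bytes written or 0 if |c| is out of range. */
size_t put_txt_cc_bytes(uint8_t *p, uint32_t c)
{ assert(p != NULL);
  if (c <= 0x20)                  /* would be mistaken for a control code */
  { p[0] = TXT_CC; p[1] = (uint8_t)c; return 2; }
  else if (c < 0x80)
  { p[0] = (uint8_t)c; return 1; }
  else if (c < 0x800)
  { p[0] = (uint8_t)(0xC0 | (c >> 6));
    p[1] = (uint8_t)(0x80 | (c & 0x3F));
    return 2;
  }
  else if (c < 0x10000)
  { p[0] = (uint8_t)(0xE0 | (c >> 12));
    p[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    p[2] = (uint8_t)(0x80 | (c & 0x3F));
    return 3;
  }
  else if (c < 0x200000)
  { p[0] = (uint8_t)(0xF0 | (c >> 18));
    p[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
    p[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    p[3] = (uint8_t)(0x80 | (c & 0x3F));
    return 4;
  }
  else return 0;
}
```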
\section{Composite Nodes}\hascode
\label{composite}
The nodes that we consider in this section can contain one or more list nodes.
When we implement the parsing\index{parsing} routines
for composite nodes in the long format, we have to take into account
that parsing such a list node will already write the list node
to the output. So we split the parsing of composite nodes into several parts
and store the parts immediately after parsing them. On the parse stack, we will only
keep track of the info value.
This new strategy is not as transparent as our previous strategy used for
simple nodes where we had a clean separation of reading and writing:
reading would store the internal representation of a node and writing the internal
representation to output would start only after reading is completed.
The new strategy, however, makes it easier to reuse
the grammar\index{grammar} rules for the component nodes.
Another rule applies to composite nodes: in the short format, the subnodes
will come at the end of the node, and especially a list node that contains content nodes
comes last. This helps when traversing the content section as we will see in
appendix~\secref{fastforward}.
\subsection{Boxes}\label{boxnodes}
The central structuring elements of \TeX\ are boxes\index{box}.
Boxes have a height |h|, a depth |d|, and a width |w|.
The shift amount |a| shifts the contents of the box,
the glue ratio\index{glue ratio} |r| is a factor applied to the glue inside the box,
the glue order |o| is its order of stretchability\index{stretchability},
and the glue sign |s| is $-1$ for shrinking\index{shrinkability},
0 for rigid, and $+1$ for stretching.
Most importantly, a box contains a list |l| of content nodes inside the box.
@<hint types@>=
typedef struct @/{@+ Dimen h,d,w,a;@+ float32_t r;@+ int8_t s,o; @+List l; @+} Box;
@
There are two types of boxes: horizontal\index{horizontal box} boxes
and vertical\index{vertical box} boxes.
The difference between the two is simple:
a horizontal box aligns the reference\index{reference point}
points of its content nodes horizontally, and a positive shift amount\index{shift amount} |a|
shifts the box down;
a vertical box aligns\index{alignment} the reference\index{reference point}
points vertically, and a positive shift amount |a| shifts the box right.
Not all box parameters are used frequently. In the short format, we use the info bits
to indicate which of the parameters are used.
Whereas the width of a horizontal box is nonzero most of the time (80\%),
other parameters are zero most of the time,
like the shift amount (99\%) or the glue settings (99.8\%).
The depth is zero in about 77\% of the cases, the height in about 53\%,
and both together are zero in about 47\%. The results for vertical boxes,
which constitute about 20\% of all boxes, are similar,
except that the depth is zero in about 89\% of the cases,
but the height and width are almost never zero.
For this reason we use bit |b001| to indicate a nonzero depth,
bit |b010| for a nonzero shift amount, and |b100| for nonzero glue settings.
Glue sign and glue order can be packed as two nibbles in a single byte.
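The use of the info bits and the packing of glue sign and glue order
can be sketched as follows. The function names are hypothetical, and
the sketch decodes the upper nibble explicitly as a four-bit two's
complement number instead of relying on a shift of a signed value.

```c
#include <assert.h>
#include <stdint.h>

enum { B001 = 1, B010 = 2, B100 = 4 };

/* info bits of a box: nonzero depth, nonzero shift amount, glue settings */
int box_info(int32_t d, int32_t a, float r, int s)
{ int info = 0;
  if (d != 0) info |= B001;
  if (a != 0) info |= B010;
  if (r != 0.0f && s != 0) info |= B100;
  return info;
}

/* pack glue sign |s| (-1, 0, or +1) and glue order |o| into one byte */
uint8_t pack_sign_order(int s, int o)
{ assert(s >= -1 && s <= 1 && o >= 0 && o <= 0xF);
  return (uint8_t)((((unsigned)s & 0xF) << 4) | (unsigned)o);
}

/* the upper nibble read as a four-bit two's complement number */
int unpack_sign(uint8_t b)
{ int hi = b >> 4;
  return (hi & 0x8) ? hi - 16 : hi;
}

/* the lower nibble */
int unpack_order(uint8_t b)
{ return b & 0xF;
}
```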
% A different use of the info bits for vertical and horizontal boxes is possible,
% but does not warrant the added complexity.
\goodbreak
\readcode
@s HBOX symbol
@s VBOX symbol
@s box symbol
@s boxparams symbol
@s hbox_node symbol
@s vbox_node symbol
@s box_dimen symbol
@s box_shift symbol
@s box_glue_set symbol
@<symbols@>=
%token HBOX "hbox"
%token VBOX "vbox"
%token SHIFTED "shifted"
%type <info> box box_dimen box_shift box_glue_set
@
@<scanning rules@>=
::@=hbox@> :< return HBOX; >:
::@=vbox@> :< return VBOX; >:
::@=shifted@> :< return SHIFTED; >:
@
@<parsing rules@>=@/
box_dimen: dimension dimension dimension @/
{$$= hput_box_dimen($1,$2,$3); };
box_shift: {$$=b000;} @+
| SHIFTED dimension {$$=hput_box_shift($2);};
box_glue_set: {$$=b000;}
| PLUS stretch { $$=hput_box_glue_set(+1,$2.f,$2.o); }
| MINUS stretch { $$=hput_box_glue_set(-1,$2.f,$2.o); };
box: box_dimen box_shift box_glue_set list {$$=$1|$2|$3; };
hbox_node: start HBOX box END { hput_tags($1, TAG(hbox_kind,$3)); };
vbox_node: start VBOX box END { hput_tags($1, TAG(vbox_kind,$3)); };
content_node: hbox_node @+ | vbox_node;
@
\writecode
@<write functions@>=
void hwrite_box(Box *b)
{ hwrite_dimension(b->h);
hwrite_dimension(b->d);
hwrite_dimension(b->w);
if (b->a!=0) { hwritef(" shifted"); @+hwrite_dimension(b->a); @+}
if (b->r!=0.0 && b->s!=0 )@/
{ @+if (b->s>0) @+hwritef(" plus"); @+else @+hwritef(" minus");
@+hwrite_float64(b->r, false); @+hwrite_order(b->o);
}
hwrite_list(&(b->l));
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(hbox_kind,b000): {Box b; @+HGET_BOX(b000,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b001): {Box b; @+HGET_BOX(b001,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b010): {Box b; @+HGET_BOX(b010,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b011): {Box b; @+HGET_BOX(b011,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b100): {Box b; @+HGET_BOX(b100,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b101): {Box b; @+HGET_BOX(b101,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b110): {Box b; @+HGET_BOX(b110,b); @+hwrite_box(&b);@+} @+ break;
case TAG(hbox_kind,b111): {Box b; @+HGET_BOX(b111,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b000): {Box b; @+HGET_BOX(b000,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b001): {Box b; @+HGET_BOX(b001,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b010): {Box b; @+HGET_BOX(b010,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b011): {Box b; @+HGET_BOX(b011,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b100): {Box b; @+HGET_BOX(b100,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b101): {Box b; @+HGET_BOX(b101,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b110): {Box b; @+HGET_BOX(b110,b); @+hwrite_box(&b);@+} @+ break;
case TAG(vbox_kind,b111): {Box b; @+HGET_BOX(b111,b); @+hwrite_box(&b);@+} @+ break;
@
@<get macros@>=
#define @[HGET_BOX(I,B)@] \
HGET32(B.h);\
if ((I)&b001) HGET32(B.d); @+ else B.d=0;\
HGET32(B.w);\
if ((I)&b010) HGET32(B.a); @+else B.a=0;\
if ((I)&b100) @/{ B.r=hget_float32();@+ B.s=HGET8; @+ B.o=B.s&0xF; @+B.s=B.s>>4;@+ }\
else { B.r=0.0;@+ B.o=B.s=0;@+ }\
hget_list(&(B.l));
@
@<get functions@>=
void hget_hbox_node(void)
{ Box b;
@<read the start byte |a|@>@;
if (KIND(a)!=hbox_kind) QUIT("Hbox expected at 0x%x got %s",node_pos,NAME(a));
HGET_BOX(INFO(a),b);@/
@<read and check the end byte |z|@>@;
hwrite_start();@+
hwritef("hbox");@+
hwrite_box(&b);@+
hwrite_end();
}
void hget_vbox_node(void)
{ Box b;
@<read the start byte |a|@>@;
if (KIND(a)!=vbox_kind) QUIT("Vbox expected at 0x%x got %s",node_pos,NAME(a));
HGET_BOX(INFO(a),b);@/
@<read and check the end byte |z|@>@;
hwrite_start();@+
hwritef("vbox");@+
hwrite_box(&b);@+
hwrite_end();
}
@
\putcode
@<put functions@>=
Info hput_box_dimen(Dimen h, Dimen d, Dimen w)
{ Info i;
@+HPUT32(h);
if (d!=0) { HPUT32(d); @+i=b001;@+ } @+else@+ i=b000;
HPUT32(w);
return i;
}
Info hput_box_shift(Dimen a)
{ @+if (a!=0) { @+ HPUT32(a); @+return @+ b010;@+} @+ else @+return b000;
}
Info hput_box_glue_set(int8_t s, float32_t r, Order o)
{ @+if (r!=0.0 && s!=0 )
{ hput_float32(r);@+
HPUT8((s<<4)|o);@+
return b100;@+
}
else return b000;
}
@
\subsection{Extended Boxes}
Hi\TeX\ produces two kinds of extended\index{extended box} horizontal
boxes, |hpack_kind| and |hset_kind|, and the same for vertical boxes
using |vpack_kind| and |vset_kind|. Let us focus on horizontal boxes;
the handling of vertical boxes is completely parallel.
The \\{hpack} procedure of Hi\TeX\ produces an extended box of |hset_kind|
either if it is given an extended\index{extended dimension} dimension as its width
or if it discovers that the width of its content is an extended
dimension. After the final width of the box has been computed in the
viewer, it just remains to set the glue; a very simple operation
indeed.
If the \\{hpack} procedure of Hi\TeX\ cannot determine the natural
dimensions of the box content because it contains
paragraphs\index{paragraph} or other extended boxes, it produces a box
of |hpack_kind|. Now the viewer needs to traverse the list of content
nodes to determine the natural\index{natural dimension}
dimensions. Even the amount of stretchability\index{stretchability}
and shrinkability\index{shrinkability} has to be determined in the
viewer. For example, the final stretchability of a paragraph with some
stretchability in the baseline\index{baseline skip} skip will depend
on the number of lines which, in turn, depends on \.{hsize}. It is
not possible to merge these traversals of the box content with the
traversal necessary when displaying the box. The latter needs to
convert glue nodes into positioning instructions which requires a
fixed glue\index{glue ratio} ratio. The computation of the glue ratio,
however, requires a complete traversal of the content.
In the short format of a box node of type |hset_kind|, |vset_kind|,
|hpack_kind|, or |vpack_kind|, the info bit |b100| indicates, if set,
a complete extended dimension, and if unset, a reference to a
predefined extended dimension for the target size; the info bit |b010|
indicates a nonzero shift amount. For a box of type |hset_kind| or
|vset_kind|, the info bit |b001| indicates, if set, a nonzero depth.
For a box of type |hpack_kind| or |vpack_kind|, the info bit |b001|
indicates, if set, an additional target size, and if unset, an exact
target size. For a box of type |vpack_kind| also the maximum depth
is given. If in the long format the maximum depth is omitted, the
value |MAX_DIMEN| is used.
The reference point of a vertical box is usually the reference point of
the last box inside it and multiple vertical boxes are aligned
along this common baseline. Occasionally, however, we want to align
vertical boxes using the baselines of their first box.
We indicate this alternative setting of the reference point
using the keyword {\tt top} in the long form.
In the short form, we use the fact that the absolute value of any dimension
is less than or equal to |MAX_DIMEN| which is equal to |0x3fffffff|. This means
that the two most significant bits are always the same. So a vtop node
can be marked by toggling the second of these bits.
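The following C fragment is a minimal sketch of this marking scheme. The
helper names |mark_vtop|, |is_vtop|, and |unmark_vtop| are hypothetical,
and the sketch assumes that dimensions are stored as 32-bit two's
complement integers. The range test is the same one that the |HGET_SET|
and |HGET_PACK| macros use to detect a marked value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DIMEN 0x3fffffff   /* largest absolute value of a dimension */
#define VTOP_BIT  0x40000000   /* the second most significant bit */

/* Toggle the second most significant bit of an in-range dimension. */
static int32_t mark_vtop(int32_t d)
{
  assert(d >= -MAX_DIMEN && d <= MAX_DIMEN);
  return d ^ VTOP_BIT;
}

/* A marked value always falls outside the legal range of dimensions. */
static bool is_vtop(int32_t d)
{
  return d > MAX_DIMEN || d < -MAX_DIMEN;
}

/* Toggling the bit a second time recovers the original dimension. */
static int32_t unmark_vtop(int32_t d)
{
  return d ^ VTOP_BIT;
}
```

Because the two most significant bits of a legal dimension agree,
toggling one of them makes them differ, and this works for positive and
negative dimensions alike.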
\readcode
@s box_options symbol
@s vbox_dimen symbol
@s hpack symbol
@s vpack symbol
@s box_goal symbol
@s HPACK symbol
@s HSET symbol
@s VPACK symbol
@s VSET symbol
@s TO symbol
@s ADD symbol
@s box_flex symbol
@s vxbox_node symbol
@s hxbox_node symbol
@s DEPTH symbol
@s max_depth symbol
@<symbols@>=
%token HPACK "hpack"
%token HSET "hset"
%token VPACK "vpack"
%token VSET "vset"
%token DEPTH "depth"
%token ADD "add"
%token TO "to"
%type <info> box_options box_goal hpack vpack vbox_dimen
%type <d> max_depth
@
@<scanning rules@>=
::@=hpack@> :< return HPACK; >:
::@=hset@> :< return HSET; >:
::@=vpack@> :< return VPACK; >:
::@=vset@> :< return VSET; >:
::@=add@> :< return ADD; >:
::@=to@> :< return TO; >:
::@=depth@> :< return DEPTH; >:
@
@<parsing rules@>=
box_flex: plus minus { hput_stretch(&($1));hput_stretch(&($2)); };
box_options: box_shift box_flex xdimen_ref list {$$=$1;}
| box_shift box_flex xdimen_node list {$$=$1|b100;};
hxbox_node: start HSET box_dimen box_options END { hput_tags($1, TAG(hset_kind,$3|$4)); };
vbox_dimen: box_dimen
| TOP dimension dimension dimension @/
{$$= hput_box_dimen($2,$3^0x40000000,$4); };
vxbox_node: start VSET vbox_dimen box_options END { hput_tags($1, TAG(vset_kind,$3|$4)); };
box_goal: TO xdimen_ref {$$=b000;}
| ADD xdimen_ref {$$=b001;}
| TO xdimen_node {$$=b100;}
| ADD xdimen_node {$$=b101;};
hpack: box_shift box_goal list {$$=$2;};
hxbox_node: start HPACK hpack END { hput_tags($1, TAG(hpack_kind,$3)); };
max_depth: {$$=MAX_DIMEN;} | MAX DEPTH dimension { $$=$3; };
vpack: max_depth {HPUT32($1);} box_shift box_goal list {$$= $3|$4;}
| TOP max_depth {HPUT32($2^0x40000000);} @/ box_shift box_goal list {$$= $4|$5;};
vxbox_node: start VPACK vpack END { hput_tags($1, TAG(vpack_kind,$3)); };
content_node: vxbox_node | hxbox_node;
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(hset_kind,b000): HGET_SET(hset_kind,b000); @+ break;
case TAG(hset_kind,b001): HGET_SET(hset_kind,b001); @+ break;
case TAG(hset_kind,b010): HGET_SET(hset_kind,b010); @+ break;
case TAG(hset_kind,b011): HGET_SET(hset_kind,b011); @+ break;
case TAG(hset_kind,b100): HGET_SET(hset_kind,b100); @+ break;
case TAG(hset_kind,b101): HGET_SET(hset_kind,b101); @+ break;
case TAG(hset_kind,b110): HGET_SET(hset_kind,b110); @+ break;
case TAG(hset_kind,b111): HGET_SET(hset_kind,b111); @+ break;@#
case TAG(vset_kind,b000): HGET_SET(vset_kind,b000); @+ break;
case TAG(vset_kind,b001): HGET_SET(vset_kind,b001); @+ break;
case TAG(vset_kind,b010): HGET_SET(vset_kind,b010); @+ break;
case TAG(vset_kind,b011): HGET_SET(vset_kind,b011); @+ break;
case TAG(vset_kind,b100): HGET_SET(vset_kind,b100); @+ break;
case TAG(vset_kind,b101): HGET_SET(vset_kind,b101); @+ break;
case TAG(vset_kind,b110): HGET_SET(vset_kind,b110); @+ break;
case TAG(vset_kind,b111): HGET_SET(vset_kind,b111); @+ break;@#
case TAG(hpack_kind,b000): HGET_PACK(hpack_kind,b000); @+ break;
case TAG(hpack_kind,b001): HGET_PACK(hpack_kind,b001); @+ break;
case TAG(hpack_kind,b010): HGET_PACK(hpack_kind,b010); @+ break;
case TAG(hpack_kind,b011): HGET_PACK(hpack_kind,b011); @+ break;
case TAG(hpack_kind,b100): HGET_PACK(hpack_kind,b100); @+ break;
case TAG(hpack_kind,b101): HGET_PACK(hpack_kind,b101); @+ break;
case TAG(hpack_kind,b110): HGET_PACK(hpack_kind,b110); @+ break;
case TAG(hpack_kind,b111): HGET_PACK(hpack_kind,b111); @+ break;@#
case TAG(vpack_kind,b000): HGET_PACK(vpack_kind,b000); @+ break;
case TAG(vpack_kind,b001): HGET_PACK(vpack_kind,b001); @+ break;
case TAG(vpack_kind,b010): HGET_PACK(vpack_kind,b010); @+ break;
case TAG(vpack_kind,b011): HGET_PACK(vpack_kind,b011); @+ break;
case TAG(vpack_kind,b100): HGET_PACK(vpack_kind,b100); @+ break;
case TAG(vpack_kind,b101): HGET_PACK(vpack_kind,b101); @+ break;
case TAG(vpack_kind,b110): HGET_PACK(vpack_kind,b110); @+ break;
case TAG(vpack_kind,b111): HGET_PACK(vpack_kind,b111); @+ break;
@
@<get macros@>=
#define @[HGET_SET(K,I)@] @/\
{ Dimen h,d; @+HGET32(h);\
if ((I)&b001) HGET32(d); @+ else d=0;\
if (K==vset_kind && (d>MAX_DIMEN || d<-MAX_DIMEN)) { hwritef(" top"); d^=0x40000000;}\
hwrite_dimension(h); hwrite_dimension(d); @+}\
{ Dimen w; @+HGET32(w); @+hwrite_dimension(w);@+} \
if ((I)&b010) { Dimen a; @+HGET32(a); hwritef(" shifted"); @+hwrite_dimension(a);@+}\
{ Stretch p; @+HGET_STRETCH(p);@+hwrite_plus(&p);@+}\
{ Stretch m; @+HGET_STRETCH(m);@+hwrite_minus(&m);@+}\
if ((I)&b100) {Xdimen x;@+ hget_xdimen_node(&x); @+hwrite_xdimen_node(&x);@+} else HGET_REF(xdimen_kind);\
{ List l; @+hget_list(&l);@+ hwrite_list(&l); @+}
@#
#define @[HGET_PACK(K,I)@] @/\
if (K==vpack_kind) {@+Dimen d; HGET32(d); \
if (d>MAX_DIMEN || d<-MAX_DIMEN) { hwritef(" top"); d^=0x40000000;}\
if (d!=MAX_DIMEN) {hwritef(" max depth");@+hwrite_dimension(d);}} \
if ((I)&b010)@+{@+Dimen s; HGET32(s); hwritef(" shifted"); @+hwrite_dimension(s); }\
if ((I)&b001) hwritef(" add");@+ else hwritef(" to");\
if ((I)&b100) {Xdimen x;@+ hget_xdimen_node(&x);@+hwrite_xdimen_node(&x);@+}\
else @+HGET_REF(xdimen_kind);\
{ List l; @+hget_list(&l);@+ hwrite_list(&l); @+}
@
\subsection{Leaders}\label{leaders}
Leaders\index{leaders} are a special type of glue that is best explained by a few
examples.
Whereas ordinary glue fills its designated space with \hfil\ whiteness,\break
leaders fill their designated space with either a rule \xleaders\hrule\hfil\ or\break
some sort of repeated\leaders\hbox to 15pt{$\hss.\hss$}\hfil content.\break
In multiple leaders, the dots\leaders\hbox to 15pt{$\hss.\hss$}\hfil are usually aligned\index{alignment} across lines,\break
as in the last\leaders\hbox to 15pt{$\hss.\hss$}\hfil three lines.\break
Unless you specify centered\index{centered}\cleaders\hbox to 15pt{$\hss.\hss$}\hfil leaders\break
or you specify expanded\index{expanded}\xleaders\hbox to 15pt{$\hss.\hss$}\hfil leaders.\break
The former pack the repeated content tightly and center
it in the available space; the latter distribute
the extra space evenly between all the repeated instances.
In the short format, the two lowest info bits store the type
of leaders: 1 for aligned, 2 for centered, and 3 for expanded.
The |b100| info bit is usually set and only zero in the unlikely
case that the glue is zero and therefore not present.
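The three placement rules can be sketched in C as follows. This is an
illustration under simplifying assumptions, not code taken from the
viewer: the names |Layout| and |place_leaders| are hypothetical, all
values are nonnegative integers in scaled points, positions are measured
from the left edge of the enclosing box, and rounding is ignored. The
parameter |type| takes the value of the two lowest info bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: number of copies, position of the first copy,
   and the extra gap inserted between adjacent copies. */
typedef struct { int count; int32_t first; int32_t gap; } Layout;

/* Place copies of a box of width w in the space [x0, x0+space];
   type is 1 for aligned, 2 for centered, and 3 for expanded leaders. */
static Layout place_leaders(int type, int32_t x0, int32_t space, int32_t w)
{
  Layout r = {0, x0, 0};
  assert(type >= 1 && type <= 3);
  assert(w > 0 && space >= 0 && x0 >= 0);
  if (type == 1) {
    /* aligned: copies sit at multiples of w in the enclosing box */
    int32_t first = ((x0 + w - 1) / w) * w;
    if (first + w <= x0 + space) {
      r.count = (x0 + space - first) / w;
      r.first = first;
    }
  } else {
    int32_t n = space / w;          /* as many copies as fit */
    int32_t extra = space - n * w;  /* the space left over */
    r.count = n;
    if (type == 2)                  /* centered: half of the extra space first */
      r.first = x0 + extra / 2;
    else {                          /* expanded: spread over n+1 places */
      r.gap = extra / (n + 1);
      r.first = x0 + r.gap;
    }
  }
  return r;
}
```

Aligned leaders are the only kind whose positions do not depend on where
the designated space starts, which is what makes the dots line up across
lines.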
\readcode
@s LEADERS symbol
@s ALIGN symbol
@s CENTER symbol
@s EXPAND symbol
@s leaders symbol
@s ltype symbol
@<symbols@>=
%token LEADERS "leaders"
%token ALIGN "align"
%token CENTER "center"
%token EXPAND "expand"
%type <info> leaders
%type <info> ltype
@
@<scanning rules@>=
::@=leaders@> :< return LEADERS; >:
::@=align@> :< return ALIGN; >:
::@=center@> :< return CENTER; >:
::@=expand@> :< return EXPAND; >:
@
@<parsing rules@>=
ltype: {$$=1;} | ALIGN {$$=1;} @+| CENTER {$$=2;} @+| EXPAND {$$=3;};
leaders: glue_node ltype rule_node {@+if ($1) $$=$2|b100;@+else $$=$2; @+}
| glue_node ltype hbox_node {@+if ($1) $$=$2|b100;@+else $$=$2;@+}
| glue_node ltype vbox_node {@+if ($1) $$=$2|b100;@+else $$=$2;@+};
content_node: start LEADERS leaders END @| {@+ hput_tags($1, TAG(leaders_kind, $3));}
@
\writecode
@<write functions@>=
void hwrite_leaders_type(int t)
{@+
if (t==2) hwritef(" center");
else if (t==3) hwritef(" expand");
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(leaders_kind,1): @+ HGET_LEADERS(1); @+break;
case TAG(leaders_kind,2): @+ HGET_LEADERS(2); @+break;
case TAG(leaders_kind,3): @+ HGET_LEADERS(3); @+break;
case TAG(leaders_kind,b100|1): @+ HGET_LEADERS(b100|1); @+break;
case TAG(leaders_kind,b100|2): @+ HGET_LEADERS(b100|2); @+break;
case TAG(leaders_kind,b100|3): @+ HGET_LEADERS(b100|3); @+break;
@
@<get macros@>=
#define @[HGET_LEADERS(I)@]@/ \
if ((I)&b100) hget_glue_node();\
hwrite_leaders_type((I)&b011);\
if (KIND(*hpos)==rule_kind) hget_rule_node(); \
else if (KIND(*hpos)==hbox_kind) hget_hbox_node(); \
else hget_vbox_node();
@
\subsection{Baseline Skips}
Baseline\index{baseline skip} skips are small amounts of glue inserted
between two consecutive lines of text. To get nice looking pages, the
amount of glue\index{glue} inserted must take into account the depth
of the line above the glue and the height of the line below the glue
to achieve a constant distance of the baselines. For example, if we
have the lines
\medskip
\qquad\vbox{\hsize=0.5\hsize\noindent
``There is no\hfil\break
more gas\hfil\break
in the tank.''
}\hss
\medskip\noindent
\TeX\ will insert 7.69446pt of baseline skip between the first and the
second line and 3.11111pt of baseline skip between the second and the
third line. This is due to the fact that the first line has no
descenders, its depth is zero, the second line has no ascenders but
the ``g'' descends below the baseline, and the third line has
ascenders (``t'', ``h'',\dots) so it is higher than the second line.
\TeX's choice of baseline skips ensures that the baselines are exactly
12pt apart in both cases.
Things get more complicated if the text contains mathematical formulas because then
a line can get so high or deep that it is impossible to keep the distance between
baselines constant without two adjacent lines touching each other. In such cases,
\TeX\ will insert a small minimum line skip glue\index{line skip glue}.
For the whole computation, \TeX\ uses three parameters: {\tt base\-line\-skip},
{\tt line\-skip\-limit},\index{line skip limit} and
{\tt lineskip}. {\tt baselineskip} is a glue value; its size is the
normal distance of two baselines. \TeX\ adjusts the size of the
{\tt baselineskip} glue for the height and the depth of the two lines and
then checks the result against {\tt lineskiplimit}. If the result is
smaller than {\tt lineskiplimit} it will use the {\tt lineskip} glue
instead.
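The core of this computation can be sketched in a few lines of C. The
sketch makes several simplifying assumptions: the type |SimpleBaseline|
and the function |interline_skip| are hypothetical, only the natural size
of the two glues is considered (stretchability and shrinkability are
omitted), and none of \TeX's special cases is covered. All values are
given in scaled points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT 65536  /* one point expressed in scaled points */

/* Hypothetical, simplified parameter set: natural sizes only. */
typedef struct {
  int32_t baselineskip;   /* desired distance between two baselines */
  int32_t lineskip;       /* glue used when the lines come too close */
  int32_t lineskiplimit;  /* threshold for switching to lineskip */
} SimpleBaseline;

/* Size of the glue between a line of the given depth and a
   following line of the given height. */
static int32_t interline_skip(const SimpleBaseline *b,
                              int32_t depth_above, int32_t height_below)
{
  int32_t skip;
  assert(b != NULL);
  skip = b->baselineskip - depth_above - height_below;
  return skip < b->lineskiplimit ? b->lineskip : skip;
}
```

With a baseline skip of 12pt, a line of depth 2pt followed by a line of
height 7pt receives 3pt of glue, so the baselines are again 12pt apart.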
Because the depth and the height of lines depend on the outcome
of the line breaking\index{line breaking}
routine, baseline computations must be done in the viewer.
The situation gets even more complicated because \TeX\ can manipulate the insertion
of baseline skips in various ways. Therefore \HINT\ requires the insertion of
baseline nodes wherever the viewer is supposed to perform a baseline skip
computation.
In the short format of a baseline definition, we store only
the nonzero components and use the
info bits to mark them: |b100| implies $|bs|\ne0$,
|b010| implies $|ls|\ne 0$, and |b001| implies $|lslimit|\ne 0$.
If the baseline has only zero components, we put a reference to baseline number 0
in the output.
@<hint basic types@>=
typedef struct {@+
Glue bs, ls;@+
Dimen lsl;@+
} Baseline;
@
\readcode
@s BASELINE symbol
@s baseline symbol
@<symbols@>=
%token BASELINE "baseline"
%type <info> baseline
@
@<scanning rules@>=
::@=baseline@> :< return BASELINE; >:
@
@<parsing rules@>=
baseline: dimension { if ($1!=0) HPUT32($1); }
glue_node glue_node @/{ $$=b000; if ($1!=0) $$|=b001;
if ($3) $$|=b100;
if ($4) $$|=b010;
@+};
content_node: start BASELINE baseline END @/
{ @+if ($3==b000) HPUT8(0); @+hput_tags($1,TAG(baseline_kind, $3)); };
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(baseline_kind,b001): { Baseline b;@+ HGET_BASELINE(b001,b);@+ }@+break;
case TAG(baseline_kind,b010): { Baseline b;@+ HGET_BASELINE(b010,b);@+ }@+break;
case TAG(baseline_kind,b011): { Baseline b;@+ HGET_BASELINE(b011,b);@+ }@+break;
case TAG(baseline_kind,b100): { Baseline b;@+ HGET_BASELINE(b100,b);@+ }@+break;
case TAG(baseline_kind,b101): { Baseline b;@+ HGET_BASELINE(b101,b);@+ }@+break;
case TAG(baseline_kind,b110): { Baseline b;@+ HGET_BASELINE(b110,b);@+ }@+break;
case TAG(baseline_kind,b111): { Baseline b;@+ HGET_BASELINE(b111,b);@+ }@+break;
@
@<get macros@>=
#define @[HGET_BASELINE(I,B)@] \
if((I)&b001) HGET32((B).lsl); @+else B.lsl=0; hwrite_dimension(B.lsl);\
if((I)&b100) hget_glue_node(); \
else {B.bs.p.o=B.bs.m.o=B.bs.w.w=0; @+B.bs.w.h=B.bs.w.v=B.bs.p.f=B.bs.m.f=0.0; @+hwrite_glue_node(&(B.bs));@+}\
if((I)&b010) hget_glue_node(); \
else {B.ls.p.o=B.ls.m.o=B.ls.w.w=0; @+B.ls.w.h=B.ls.w.v=B.ls.p.f=B.ls.m.f=0.0; @+hwrite_glue_node(&(B.ls));@+}
@
\putcode
@<put functions@>=
Tag hput_baseline(Baseline *b)
{ Info info=b000;
if (!ZERO_GLUE(b->bs)) @+info|=b100;
if (!ZERO_GLUE(b->ls)) @+ info|=b010;
if (b->lsl!=0) { @+ HPUT32(b->lsl); @+info|=b001; @+}
return TAG(baseline_kind,info);
}
@
\subsection{Ligatures}
Ligatures\index{ligature} occur only in horizontal lists. They specify characters
that combine the glyphs of several characters into one specialized
glyph. For example in the word ``{\it difficult\/}'' the three letters
``{\it f{}f{}i\/}'' are combined into the ligature ``{\it ffi\/}''.
Hence, a ligature is very similar to a simple glyph node; the
characters that got replaced are, however, retained in the ligature
because they might be needed for example to support searching. Since
ligatures are therefore only specialized lists of characters and since
we have a very efficient way to store such lists of characters, namely
as a |text|, input and output of ligatures is quite simple.
The info value zero is reserved for references to a ligature. If the
info value is between 1 and 6, it gives the number of bytes used to encode
the characters in UTF8. Note that a ligature will always include a
glyph byte, so the minimum size is 1. A typical ligature like ``{\it fi\/}''
will need 3 bytes: the ligature character ``{\it fi\/}'', and
the replacement characters ``f'' and ``i''. More bytes might be
required if the character codes exceed |0x7F| since we use the UTF8
encoding scheme for larger character codes. If the info value is 7,
a full text node follows the font byte. In the long
format, we give the font, the character code, and then the replacement
characters represented as a text.
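The following C sketch shows how the info value follows from the
character codes. The function names |utf8_length| and |ligature_info| are
hypothetical; the sketch assumes the encoding lengths of the UTF8 scheme
for codes up to |0x1FFFFF| and, like |hput_ligature|, uses the info value
7 whenever the characters need 7 bytes or more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode the character code c in UTF8. */
static int utf8_length(uint32_t c)
{
  assert(c <= 0x1FFFFF);
  if (c < 0x80) return 1;
  if (c < 0x800) return 2;
  if (c < 0x10000) return 3;
  return 4;
}

/* Info value for a ligature character lig that replaces the n
   characters in repl: the total byte count if it is below 7,
   otherwise 7 to announce a full text node. */
static int ligature_info(uint32_t lig, const uint32_t *repl, int n)
{
  int size = utf8_length(lig);
  int i;
  assert(repl != NULL || n == 0);
  for (i = 0; i < n; i++)
    size += utf8_length(repl[i]);
  return size < 7 ? size : 7;
}
```

If the ligature character has a code below |0x80|, the ligature
``{\it fi\/}'' yields the info value 3; if its code were |0xFB01|
instead (an assumption made only for this illustration), the ligature
character alone would need 3 bytes and the info value would be 5.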
@<hint types@>=
typedef struct{@+uint8_t f; @+List l;@+} Lig;
@
\readcode
@s ref symbol
@s LIGATURE symbol
@s ligature symbol
@s cc_list symbol
@s lig_cc symbol
@<symbols@>=
%token LIGATURE "ligature"
%type <u> lig_cc
%type <lg> ligature
%type <u> ref
@
@<scanning rules@>=
::@=ligature@> :< return LIGATURE; >:
@
@<parsing rules@>=@/
cc_list:@+ | cc_list TXT_CC { hput_utf8($2); };
lig_cc: UNSIGNED {RNG("UTF-8 code",$1,0,0x1FFFFF);$$=hpos-hstart; hput_utf8($1); };
lig_cc: CHARCODE {$$=hpos-hstart; hput_utf8($1); };
ref: REFERENCE { HPUT8($1); $$=$1; };
ligature: ref { REF(font_kind,$1);} lig_cc TXT_START cc_list TXT_END @/
{ $$.f=$1; $$.l.p=$3; $$.l.s=(hpos-hstart)-$3;
RNG("Ligature size",$$.l.s,0,255);};
content_node: start LIGATURE ligature END {hput_tags($1,hput_ligature(&($3)));};
@
\writecode
@<write functions@>=
void hwrite_ligature(Lig *l)
{ uint32_t pos=hpos-hstart;
hwrite_ref(l->f);
hpos=l->l.p+hstart;
hwrite_charcode(hget_utf8());
hwritef(" \"");
while (hpos<hstart+l->l.p+l->l.s)
hwrite_txt_cc(hget_utf8());
hwritec('"');
hpos=hstart+pos;
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(ligature_kind,1):@+ {Lig l; @+HGET_LIG(1,l);@+} @+break;
case TAG(ligature_kind,2):@+ {Lig l; @+HGET_LIG(2,l);@+} @+break;
case TAG(ligature_kind,3):@+ {Lig l; @+HGET_LIG(3,l);@+} @+break;
case TAG(ligature_kind,4):@+ {Lig l; @+HGET_LIG(4,l);@+} @+break;
case TAG(ligature_kind,5):@+ {Lig l; @+HGET_LIG(5,l);@+} @+break;
case TAG(ligature_kind,6):@+ {Lig l; @+HGET_LIG(6,l);@+} @+break;
case TAG(ligature_kind,7):@+ {Lig l; @+HGET_LIG(7,l);@+} @+break;
@
@<get macros@>=
#define @[HGET_LIG(I,L)@] @/\
(L).f=HGET8;REF(font_kind,(L).f);\
if ((I)==7) hget_list(&((L).l)); \
else { (L).l.s=(I); (L).l.p=hpos-hstart; @+ hpos+=(L).l.s;} \
hwrite_ligature(&(L));
@
\putcode
@<put functions@>=
Tag hput_ligature(Lig *l)
{ @+if (l->l.s < 7) return TAG(ligature_kind,l->l.s);
else@/
{ uint32_t pos=l->l.p;
l->l.t=TAG(list_kind,b100);
hput_tags(pos,hput_list(pos+1, &(l->l)));
return TAG(ligature_kind,7);
}
}
@
\subsection{Discretionary breaks}\label{discbreak}\index{discretionary break}
\HINT\ is capable of breaking paragraphs into lines. It does this
primarily at interword spaces but it might also break a line in the
middle of a word if it finds a discretionary\index{discretionary break}
line break there. These discretionary breaks are usually
provided by an automatic hyphenation algorithm but they might also be
explicitly\index{explicit} inserted by the author of a
document.
When a line break occurs at such a discretionary break, the line
before the break ends with a |pre_break| list of nodes, the line after
the break starts with a |post_break| list of nodes, and the next
|replace_count| nodes after the discretionary break will be
ignored. Both lists must consist entirely of glyphs\index{glyph},
kerns\index{kern}, boxes\index{box}, rules\index{rule}, or
ligatures\index{ligature}. For example, an ordinary discretionary
break will have a |pre_break| list containing ``-'', an empty
|post_break| list, and a |replace_count| of zero.
The long format starts with an optional ``{\tt !}'', indicating an
explicit discretionary break, followed by the replace-count.
Then comes the pre-break list followed by the post-break list.
The replace-count can be omitted if it is zero;
an empty post-break list may be omitted as well.
Both lists may be omitted only if both are empty.
In the short format, the three components of a disc node are stored
in this order: |replace_count|, |pre_break| list, and |post_break| list.
The |b100| bit in the info value indicates the presence of a replace-count,
the |b010| bit the presence of a |pre_break| list,
and the |b001| bit the presence of a |post_break| list.
Since the info value |b000| is reserved for references, at least one
of these must be specified; so we represent a node with empty lists
and a replace\index{replace count} count of zero using the info value
|b100| and a zero byte for the replace count.
Replace counts must be in the range 0 to 31; so the short format can
set the high bit of the replace count to indicate an explicit\index{explicit} break.
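As a minimal sketch, the packing of the replace count byte and the choice
of the info value can be written in C as shown below. The function names
are hypothetical and the list sizes are passed as plain byte counts; the
logic of |disc_info| follows the |hput_disc| function given later in this
section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the replace count with the flag for an explicit break. */
static uint8_t pack_replace(bool explicit_break, unsigned count)
{
  assert(count <= 31);
  return (uint8_t)(count | (explicit_break ? 0x80 : 0x00));
}

/* Extract the two components again. */
static unsigned replace_count(uint8_t b) { return b & 0x7F; }
static bool is_explicit(uint8_t b) { return (b & 0x80) != 0; }

/* Info value for a disc node with replace count byte r and the
   given sizes of the pre-break and post-break lists. */
static unsigned disc_info(uint8_t r, size_t pre_size, size_t post_size)
{
  unsigned info = 0;
  if (r != 0) info |= 4;               /* b100: replace count present */
  if (post_size != 0) info |= 3;       /* b011: both lists are stored */
  else if (pre_size != 0) info |= 2;   /* b010: only the pre-break list */
  if (info == 0) info = 4;             /* b000 is reserved for references */
  return info;
}
```

An ordinary discretionary break with a pre-break list containing ``-'',
an empty post-break list, and a replace count of zero therefore gets the
info value |b010|.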
@<hint types@>=
typedef struct@+ {@+ bool x; @+List p,q;@+ uint8_t r;@+ } Disc;
@
\readcode
@s DISC symbol
@s disc symbol
@s disc_node symbol
@s replace_count symbol
@<symbols@>=
%token DISC "disc"
%type <dc> disc
%type <u> replace_count
@
@<scanning rules@>=
::@=disc@> :< return DISC; >:
@
@<parsing rules@>=@/
replace_count: explicit {@+ if ($1) {$$=0x80; HPUT8(0x80);@+}@+ else $$=0x00;@+}
| explicit UNSIGNED { RNG("Replace count",$2,0,31);
$$=($2)|(($1)?0x80:0x00); @+ if ($$!=0) HPUT8($$);@+};
disc: replace_count list list { $$.r=$1;$$.p=$2; $$.q=$3;
if ($3.s==0) { hpos=hpos-3;@+ if ($2.s==0) hpos=hpos-3; @+}@+}
| replace_count list { $$.r=$1;$$.p=$2; if ($2.s==0) hpos=hpos-3;@+ $$.q.s=0; }
| replace_count { $$.r=$1;$$.p.s=0; $$.q.s=0; };
disc_node: start DISC disc END
{hput_tags($1,hput_disc(&($3)));};
content_node: disc_node;
@
\writecode
@<write functions@>=
void hwrite_disc(Disc *h)
{ @+hwrite_explicit(h->x);
if (h->r!=0) hwritef(" %d",h->r);
if (h->p.s!=0 || h->q.s!=0) hwrite_list(&(h->p));
if (h->q.s!=0) hwrite_list(&(h->q));
}
void hwrite_disc_node(Disc *h)
{ @+ hwrite_start(); @+hwritef("disc"); @+ hwrite_disc(h); @+hwrite_end();}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(disc_kind,b001): {Disc h; @+HGET_DISC(b001,h);@+ hwrite_disc(&h); @+} @+break;
case TAG(disc_kind,b010): {Disc h; @+HGET_DISC(b010,h);@+ hwrite_disc(&h); @+} @+break;
case TAG(disc_kind,b011): {Disc h; @+HGET_DISC(b011,h);@+ hwrite_disc(&h); @+} @+break;
case TAG(disc_kind,b100): {Disc h; @+HGET_DISC(b100,h);@+ hwrite_disc(&h); @+} @+break;
case TAG(disc_kind,b101): {Disc h; @+HGET_DISC(b101,h);@+ hwrite_disc(&h); @+} @+break;
case TAG(disc_kind,b110): {Disc h; @+HGET_DISC(b110,h);@+ hwrite_disc(&h); @+} @+break;
case TAG(disc_kind,b111): {Disc h; @+HGET_DISC(b111,h);@+ hwrite_disc(&h); @+} @+break;
@
@<get macros@>=
#define @[HGET_DISC(I,Y)@]\
if ((I)&b100) {uint8_t r=HGET8; (Y).r=r&0x7F; @+ RNG("Replace count",(Y).r,0,31); @+(Y).x=(r&0x80)!=0; @+}\
@+else { (Y).r=0; @+ (Y).x=false;@+}\
if ((I)&b010) hget_list(&((Y).p)); else { (Y).p.p=hpos-hstart; @+(Y).p.s=0; @+(Y).p.t=TAG(list_kind,b000); @+}\
if ((I)&b001) hget_list(&((Y).q)); else { (Y).q.p=hpos-hstart; @+(Y).q.s=0; @+(Y).q.t=TAG(list_kind,b000); @+}
@
@<get functions@>=
void hget_disc_node(Disc *h)
{ @<read the start byte |a|@>@;
if (KIND(a)!=disc_kind || INFO(a)==b000)
QUIT("Hyphen expected at 0x%x got %s,%d",node_pos,NAME(a),INFO(a));
HGET_DISC(INFO(a),*h);
@<read and check the end byte |z|@>@;
}
@
When |hput_disc| is called, the node is already written to the output,
but empty lists might have been deleted, and the info value needs to be determined.
Because the info value |b000| is reserved for references, a zero replace
count is written to avoid this case.
\putcode
@<put functions@>=
Tag hput_disc(Disc *h)
{ Info info=b000;
if (h->r!=0) info|=b100;
if (h->q.s!=0) info|=b011;
else if (h->p.s!=0) info|=b010;
if (info==b000) { @+info|=b100; @+HPUT8(0);@+}
return TAG(disc_kind,info);
}
@
\subsection{Paragraphs}
The most important procedure that the \HINT\ viewer inherits from \TeX\ is the
line breaking routine. If the horizontal size of the paragraph is not known,
breaking the paragraph\index{paragraph} into lines must be postponed and this is done by creating
a paragraph node. The paragraph node must contain all information that \TeX's
line breaking\index{line breaking} algorithm needs to do its job.
Besides the horizontal list describing the content of the paragraph and
the extended dimension describing the horizontal size,
this is the set of parameters that guide the line breaking algorithm:
\itemize
\item
Integer parameters:\hfill\break
{\tt pretolerance} (badness tolerance before hyphenation),\hfill\break
{\tt tolerance} (badness tolerance after hyphenation),\hfill\break
{\tt line\_penalty} (added to the badness of every line, increase to get fewer lines),\hfill\break
{\tt hy\-phen\_pe\-nal\-ty} (penalty for break after hyphenation break),\hfill\break
{\tt ex\_hy\-phen\_pe\-nal\-ty} (penalty for break after explicit\index{explicit} break),\hfill\break
{\tt doub\-le\_hy\-phen\_de\-merits} (demerits for double hyphen break),\hfill\break
{\tt final\_hyphen\_de\-me\-rits} (demerits for final hyphen break),\hfill\break
{\tt adj\_demerits} (demerits for adjacent incompatible lines),\hfill\break
{\tt looseness} (make the paragraph that many lines longer than its optimal size),\hfill\break
{\tt inter\_line\_penalty} (additional penalty between lines),\hfill\break
{\tt club\_pe\-nal\-ty} (penalty for creating a club line),\hfill\break
{\tt widow\_penalty} (penalty for creating a widow line),\hfill\break
{\tt display\_widow\_penalty} (ditto, just before a display),\hfill\break
{\tt bro\-ken\_pe\-nal\-ty} (penalty for breaking a page at a broken line),\hfill\break
{\tt hang\_af\-ter} (start/end hanging indentation at this line).
\item
Dimension parameters:\hfill\break
{\tt line\_skip\_limit} (threshold for {\tt line\_skip} instead of {\tt base\-line\_skip}),\hfill\break
{\tt hang\_in\-dent} (amount of hanging indentation),\hfill\break
{\tt emergency\_stretch} (stretchability added to every line in the final pass of line breaking).
\item
Glue parameters:\hfill\break
{\tt baseline\_skip} (desired glue between baselines),\hfill\break
{\tt line\_skip} (interline glue if {\tt baseline\_skip} is infeasible),\hfill\break
{\tt left\_skip} (glue at left of justified lines),\hfill\break
{\tt right\_skip} (glue at right of justified lines),\hfill\break
{\tt par\_fill\_skip} (glue on last line of paragraph).
\enditemize
For a detailed explanation of these parameters and how they influence
line breaking, you should consult the {\TeX}book\cite{DK:texbook};
\TeX's {\tt parshape} feature is currently not implemented. There are
default values for all of these parameters (see section~\secref{defaults}),
and therefore it might not be necessary to specify any of them.
Any local adjustments are stored in a list of
parameters that is part of the paragraph node.
A further complication arises from displayed\index{displayed formula} formulas
that interrupt a paragraph. Such displays are described in the next
section.
To summarize, a paragraph node in the long format specifies an
extended dimension, a parameter list,
and a node list. The extended dimension is given either as an
|xdimen| node (info bit |b100|) or as a reference; similarly the parameter list
can be embedded in the node (info bit |b010|) or again it is given by a reference.
\readcode
@s PAR symbol
@s par symbol
@s xdimen_ref symbol
@s param_ref symbol
@s par_dimen symbol
@<symbols@>=
%token PAR "par"
%type <info> par
@
@<scanning rules@>=
::@=par@> :< return PAR; >:
@
The following parsing rules are slightly more complicated than I would
like them to be, but it seems more important to achieve a regular
layout of the short format nodes where all sub nodes are located at
the end of a node. In this case, I want to put a |param_ref| before
an |xdimen| node, but otherwise have the |xdimen_ref| before a
|param_list|. The |par_dimen| rule is introduced only to avoid a
reduce/reduce conflict in the parser.
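The resulting order of the parts in the short format can be summarized by
the following C sketch; the enumeration |ParPart| and the function
|par_layout| are hypothetical and serve only as an illustration. The
order agrees with the order in which the |HGET_PAR| macro reads the
parts.

```c
#include <assert.h>

/* The possible parts of a paragraph node in the short format. */
typedef enum {
  XDIMEN_REF, XDIMEN_NODE, PARAM_REF, PARAM_LIST, CONTENT_LIST
} ParPart;

/* Store in part[0..2] the order of the parts for the given info value. */
static void par_layout(unsigned info, ParPart part[3])
{
  assert(info == 0 || info == 2 || info == 4 || info == 6);
  if (info == 4) {
    /* b100: the parameter reference precedes the xdimen node,
       so that all sub nodes are located at the end of the node */
    part[0] = PARAM_REF;
    part[1] = XDIMEN_NODE;
  } else {
    part[0] = (info & 4) ? XDIMEN_NODE : XDIMEN_REF;
    part[1] = (info & 2) ? PARAM_LIST : PARAM_REF;
  }
  part[2] = CONTENT_LIST;
}
```

Only the info value |b100| deviates from the regular order: it is the
single case where a node would otherwise be followed by a reference.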
@<parsing rules@>=
par_dimen: xdimen { hput_xdimen_node(&($1)); };
par: xdimen_ref param_ref list {$$=b000;}
| xdimen_ref param_list list { $$=b010;}
| xdimen param_ref { hput_xdimen_node(&($1)); } list { $$=b100;}
| par_dimen param_list list { $$=b110;};
content_node: start PAR par END { hput_tags($1,TAG(par_kind,$3));};
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(par_kind,b000): @+HGET_PAR(b000);@+break;
case TAG(par_kind,b010): @+HGET_PAR(b010);@+break;
case TAG(par_kind,b100): @+HGET_PAR(b100);@+break;
case TAG(par_kind,b110): @+HGET_PAR(b110);@+break;
@
@<get macros@>=
#define @[HGET_PAR(I)@] @/\
{ uint8_t n;\
if ((I)==b100) {n=HGET8; @+REF(param_kind,n);@+}\
if ((I)&b100) {Xdimen x; @+hget_xdimen_node(&x); @+hwrite_xdimen(&x);@+} else HGET_REF(xdimen_kind);\
if ((I)&b010) { List l; @+hget_param_list(&l); @+hwrite_param_list(&l); @+} \
else if ((I)!=b100) HGET_REF(param_kind)@; else hwrite_ref(n);\
{ List l; @+hget_list(&l);@+ hwrite_list(&l); @+}}
@
\subsection{Mathematics}\index{Mathematics}\index{displayed formula}
\gdef\subcodetitle{Displayed Math}
Being able to handle mathematics\index{mathematics} nicely is one
of the primary features of \TeX\ and
so you should expect the same from \HINT.
We start here with the more complex case---displayed equations---and finish with the
simpler case of mathematical formulas that are part of the normal flow of text.
Displayed equations occur inside a paragraph\index{paragraph}
node. They interrupt normal processing of the paragraph and the
paragraph processing is resumed after the display. Positioning of the
display depends on several parameters, the shape of the paragraph, and
the length of the last line preceding the display. Displayed formulas
often feature an equation number which can be placed either left or
right of the formula. Also the size of the equation number will
influence the placement of the formula.
In a \HINT\ file, the parameter list is followed by a list of content
nodes, representing the formula, and an optional horizontal box
containing the equation number.
In the short format, we use the info bit |b100| to indicate the
presence of a parameter list (which might be empty---so it's actually the absence of a
reference to a parameter list); the info bit |b010| to indicate the presence of
a left equation number; and the info bit |b001| for a right
equation\index{equation number} number.
In the long format, we use ``{\tt eqno}'' or ``{\tt left eqno}'' to indicate presence and
placement of the equation number.
\readcode
@s MATH symbol
@s math symbol
@<symbols@>=
%token MATH "math"
%type <info> math
@
@<scanning rules@>=
::@=math@> :< return MATH; >:
@
@<parsing rules@>=
math: param_ref list {$$=b000;}
| param_ref list hbox_node {$$=b001;}
| param_ref hbox_node list {$$=b010;}
| param_list list {$$=b100;}
| param_list list hbox_node {$$=b101;}
| param_list hbox_node list {$$=b110;};
content_node: start MATH math END @/{ hput_tags($1,TAG(math_kind,$3));};
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(math_kind,b000): HGET_MATH(b000); @+ break;
case TAG(math_kind,b001): HGET_MATH(b001); @+ break;
case TAG(math_kind,b010): HGET_MATH(b010); @+ break;
case TAG(math_kind,b100): HGET_MATH(b100); @+ break;
case TAG(math_kind,b101): HGET_MATH(b101); @+ break;
case TAG(math_kind,b110): HGET_MATH(b110); @+ break;
@
@<get macros@>=
#define @[HGET_MATH(I)@] \
if ((I)&b100) { List l; @+hget_param_list(&l); @+hwrite_param_list(&l); @+} \
else HGET_REF(param_kind);\
if ((I)&b010) hget_hbox_node(); \
{ List l; @+hget_list(&l);@+ hwrite_list(&l); @+} \
if ((I)&b001) hget_hbox_node();
@
\gdef\subcodetitle{Text Math}
Things are much simpler if mathematical formulas are embedded in regular text.
Here it is just necessary to mark the beginning and the end of the formula
because glue inside a formula is not a possible point for a line break.
To break the line within a formula you can insert a penalty node.
In the long format, such a simple math node just consists of the keyword ``on''
or ``off''. In the short format, there are two info values still unassigned:
we use |b011| for ``off'' and |b111| for ``on''.
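A small C sketch may help to see how the eight info values of a math node
are divided up. The names |MathClass|, |text_math_info|, and |math_class|
are hypothetical; the encoding of ``on'' and ``off'' is the one used by
the parsing rules.

```c
#include <assert.h>
#include <stdbool.h>

/* The three classes of math nodes that share one kind. */
typedef enum { MATH_DISPLAY, MATH_ON, MATH_OFF } MathClass;

/* Info value of a text math node: b011 for off, b111 for on. */
static unsigned text_math_info(bool on)
{
  return 3u | ((on ? 1u : 0u) << 2);
}

/* A displayed formula never has a left and a right equation number
   at the same time, so info values with both low bits set are free. */
static MathClass math_class(unsigned info)
{
  assert(info <= 7);
  if ((info & 3u) != 3u) return MATH_DISPLAY;
  return (info & 4u) ? MATH_ON : MATH_OFF;
}
```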
\readcode
@s ON symbol
@s OFF symbol
@s on_off symbol
@<symbols@>=
%token ON "on"
%token OFF "off"
%type <i> on_off
@
@<scanning rules@>=
::@=on@> :< return ON; >:
::@=off@> :< return OFF; >:
@
@<parsing rules@>=
on_off: ON {$$=1;} | OFF {$$=0;};
math: on_off { $$=b011|($1<<2); };
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(math_kind,b111): hwritef(" on");@+break;
case TAG(math_kind,b011): hwritef(" off");@+break;
@
Note that \TeX\ allows math nodes to specify a width using the current value of
mathsurround. If this width is nonzero, it is equivalent to inserting a
kern node before the math on node and after the math off node.
\subsection{Adjustments}\label{adjust}
An adjustment\index{adjustment} occurs only in paragraphs\index{paragraph}.
When the line breaking\index{line breaking} routine finds an adjustment, it inserts
the vertical material contained in the adjustment node right after the current line.
Adjustments simply contain a list node.
\vbox{\readcode\vskip -\baselineskip\putcode}
@s ADJUST symbol
@<symbols@>=
%token ADJUST "adjust"
@
@<scanning rules@>=
::@=adjust@> :< return ADJUST; >:
@
@<parsing rules@>=
content_node: start ADJUST list END { hput_tags($1,TAG(adjust_kind,1));};
@
\vbox{\getcode\vskip -\baselineskip\writecode}
@<cases to get content@>=
@t\1\kern1em@>
case TAG(adjust_kind,1):@+ { List l;@+hget_list(&l); @+ hwrite_list(&l); @+} @+ break;
@
\subsection{Tables}\index{alignment}
As long as a table contains no dependencies on \.{hsize} and \.{vsize},
Hi\TeX\ can expand an alignment into a set of nested horizontal and
vertical boxes and no special processing is required.
As long as only the size of the table itself but neither the tabskip
glues nor the table content depends on \.{hsize} or \.{vsize}, the table
just needs an outer node of type |hset_kind| or |vset_kind|. If there
is nonaligned material inside the table that depends on \.{hsize} or
\.{vsize}, a vpack or hpack node is still sufficient.
While it is reasonable to restrict the tabskip glues to be ordinary
glue values without \.{hsize} or \.{vsize} dependencies, it might be
desirable to have content in the table that does depend on \.{hsize} or
\.{vsize}. For the latter case, we need a special kind of table
node. Here is why:
As soon as the dimension of an item in the table is an extended
dimension, it is no longer possible to compute the maximum natural width
of a column, because it is not possible to compare extended dimensions
without knowing \.{hsize} and \.{vsize}. Hence the computation of maximum
widths needs to be done in the viewer. After knowing the width of the columns,
the setting of tabskip glues is easy to compute.
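The following sketch, which is not part of the \HINT\ code, illustrates why the column widths can only be computed once \.{hsize} and \.{vsize} are known. The type |XD| and the functions |xd_eval| and |column_maxima| are hypothetical names introduced for illustration; the sketch assumes that an extended dimension has the form $w+h\cdot\hbox{hsize}+v\cdot\hbox{vsize}$.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended dimension: w + h*hsize + v*vsize. */
typedef struct { double w, h, v; } XD;

/* Evaluate an extended dimension once hsize and vsize are known. */
static double xd_eval(XD x, double hsize, double vsize)
{ return x.w + x.h*hsize + x.v*vsize; }

/* Store in max[c] the largest evaluated width found in column c.
   The table is given row by row: item[r*cols+c] is row r, column c. */
static void column_maxima(const XD *item, size_t rows, size_t cols,
                          double hsize, double vsize, double *max)
{ size_t r, c;
  assert(item != NULL && max != NULL);
  for (c = 0; c < cols; c++) max[c] = 0.0;
  for (r = 0; r < rows; r++)
    for (c = 0; c < cols; c++)
    { double w = xd_eval(item[r*cols+c], hsize, vsize);
      if (w > max[c]) max[c] = w;
    }
}
```

With a single column that holds one item of fixed width 40 and one item of width $0.5\cdot\hbox{hsize}$, the second item is the widest for an \.{hsize} of 100 but the first item is the widest for an \.{hsize} of 60. Which item determines the column width is therefore not known when the file is generated.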
To implement these extended tables, we will need a table node that
specifies a direction, either horizontal or vertical; a list of
tabskip glues, with the provision that the last tabskip glue in the
list is repeated as long as necessary; and a list of table content.
The table's content is stacked, either vertically or
horizontally, orthogonal to the alignment direction of the table.
The table's content consists of nonaligned content, for example extra glue
or rules, and aligned content.
Each element of aligned content
is called an outer item and it consists of a list of inner items.
For example in a horizontal alignment, each row is an outer item
and each table entry in that row is an inner item.
An inner item contains a box node (of kind |hbox_kind|, |vbox_kind|,
|hset_kind|, |vset_kind|, |hpack_kind|, or |vpack_kind|) followed by
an optional span count.
The glue of the boxes in the inner items will be reset so that all boxes in the same
column reach the same maximum column width. The span counts will be replaced by
the appropriate number of empty boxes and tabskip glues. Finally, the
glue in the outer item will be set to obtain the desired size
of the table.
The definitions below specify just a |list| for the list of tabskip glues and a
list for the outer table items.
This is just for convenience; the first list must contain glue
nodes and the second list must contain nonaligned content and inner item nodes.
We reuse the |H| and |V| tokens, defined as part of the specification
of extended dimensions, to indicate the alignment direction of the
table. To tell a reference to an extended dimension from a reference
to an ordinary dimension, we prefix the former with an |XDIMEN| token;
for the latter, the |DIMEN| token is optional. The scanner will
recognize not only ``item'' as an |ITEM| token but also ``row'' and
``column''. This allows a more readable notation, for example by
marking the outer items as rows and the inner items as columns.
In the short format, the |b010| bit is used to mark a vertical table
and the |b101| bits indicate how the table size is specified; an outer
item node has the info value |b000|, an inner item node with info
value |b111| contains an extra byte for the span count, otherwise the
info value is equal to the span count.
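As a reading aid, the info bits just described can be decoded as in the following sketch. It is not part of the \HINT\ code; the type |TableInfo| and the functions |table_info| and |item_span| are hypothetical names. The meaning of the two size bits (|b001| for ``add'' instead of ``to'', |b100| for an extended dimension node instead of a reference) is taken from the |HGET_TABLE| macro of this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the three info bits of a table node. */
typedef struct {
  bool vertical;    /* b010: vertical table, otherwise horizontal */
  bool additive;    /* b001: size given as "add", otherwise as "to" */
  bool xdimen_node; /* b100: size is an xdimen node, otherwise a reference */
} TableInfo;

static TableInfo table_info(uint8_t info)
{ TableInfo t;
  assert(info < 8);
  t.vertical    = (info & 0x2) != 0;
  t.additive    = (info & 0x1) != 0;
  t.xdimen_node = (info & 0x4) != 0;
  return t;
}

/* Span count of an inner item; extra is the additional byte that
   is present only if the info value is b111. */
static unsigned item_span(uint8_t info, uint8_t extra)
{ assert(info >= 1 && info < 8); /* b000 marks an outer item */
  return info == 7 ? extra : info;
}
```

Span counts from 1 to 6 thus need no extra byte, which covers the vast majority of tables.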
\readcode
@s TABLE symbol
@s ITEM symbol
@s table symbol
@s span_count symbol
@<symbols@>=
%token TABLE "table"
%token ITEM "item"
%type <info> table span_count
@
@<scanning rules@>=
::@=table@> :< return TABLE; >:
::@=item@> :< return ITEM; >:
::@=row@> :< return ITEM; >:
::@=column@> :< return ITEM; >:
@
@<parsing rules@>=
span_count: UNSIGNED { $$=hput_span_count($1); };
content_node: start ITEM content_node END { hput_tags($1,TAG(item_kind,1)); };
content_node: start ITEM span_count content_node END {@+ hput_tags($1,TAG(item_kind,$3));};
content_node: start ITEM list END { hput_tags($1,TAG(item_kind,b000));};
table: H box_goal list list {$$=$2;};
table: V box_goal list list {$$=$2|b010;};
content_node: start TABLE table END { hput_tags($1,TAG(table_kind,$3));};
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(table_kind,b000): @+ HGET_TABLE(b000); @+ break;
case TAG(table_kind,b001): @+ HGET_TABLE(b001); @+ break;
case TAG(table_kind,b010): @+ HGET_TABLE(b010); @+ break;
case TAG(table_kind,b011): @+ HGET_TABLE(b011); @+ break;
case TAG(table_kind,b100): @+ HGET_TABLE(b100); @+ break;
case TAG(table_kind,b101): @+ HGET_TABLE(b101); @+ break;
case TAG(table_kind,b110): @+ HGET_TABLE(b110); @+ break;
case TAG(table_kind,b111): @+ HGET_TABLE(b111); @+ break;@#
case TAG(item_kind,b000): @+{@+ List l;@+ hget_list(&l);@+ hwrite_list(&l);@+ } @+ break;
case TAG(item_kind,b001): hget_content_node(); @+ break;
case TAG(item_kind,b010): hwritef(" 2");@+hget_content_node(); @+ break;
case TAG(item_kind,b011): hwritef(" 3");@+hget_content_node(); @+ break;
case TAG(item_kind,b100): hwritef(" 4");@+hget_content_node(); @+ break;
case TAG(item_kind,b101): hwritef(" 5");@+hget_content_node(); @+ break;
case TAG(item_kind,b110): hwritef(" 6");@+hget_content_node(); @+ break;
case TAG(item_kind,b111): hwritef(" %u",HGET8);@+hget_content_node(); @+ break;
@
@<get macros@>=
#define @[HGET_TABLE(I)@] \
if ((I)&b010) hwritef(" v"); @+else hwritef(" h"); \
if ((I)&b001) hwritef(" add");@+ else hwritef(" to");\
if ((I)&b100) {Xdimen x; hget_xdimen_node(&x); @+hwrite_xdimen_node(&x);@+} else HGET_REF(xdimen_kind)@;\
{@+ List l; @+hget_list(&l);@+ hwrite_list(&l);@+ } /* tabskip */ \
{@+ List l; @+hget_list(&l);@+ hwrite_list(&l);@+ } /* items */
@
\putcode
@<put functions@>=
Info hput_span_count(uint32_t n)
{ if (n==0) QUIT("Span count in item must not be zero");
else if (n<7) return n;
else if (n>0xFF) QUIT("Span count %u must not exceed 255",n);
else
{ HPUT8(n); return 7; }
}
@
\section{Extensions}\hascode
\subsection{Images}
In the first implementation attempt, images behaved pretty much
like glue\index{glue}. They could stretch (or shrink) together with
the surrounding glue to fill a horizontal or vertical box. While I
thought this would be in line with \TeX's concepts, it proved to be a
bad decision because images, as opposed to glue, would stretch or
shrink horizontally {\it and} vertically at the same time.
This would require a two pass algorithm to pack boxes: first to
determine the glue setting and a second pass to determine the proper
image dimensions. Otherwise incorrect width or height values would
propagate all the way through a sequence of nested boxes. Even worse,
this two pass algorithm would be needed in the viewer if images
were contained in boxes that had extended dimensions.
The new design described below allows images with extended dimensions.
This covers the case of stretchable or shrinkable images inside of
extended boxes. The given extended dimensions are considered maximum
values. The stretching or shrinking of images will always preserve the
$\hbox{aspect ratio}=\hbox{width}/\hbox{height}$.
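The rule that the given dimensions are maximum values and that the aspect ratio is preserved can be sketched as follows. The function |fit_image| is a hypothetical helper, not part of the \HINT\ code, and it assumes that the extended dimensions have already been evaluated to positive numbers.

```c
#include <assert.h>

/* Hypothetical helper: the largest width and height that fit into
   max_w by max_h while keeping width/height equal to aspect. */
static void fit_image(double max_w, double max_h, double aspect,
                      double *w, double *h)
{ assert(aspect > 0.0 && max_w > 0.0 && max_h > 0.0);
  if (max_w / aspect <= max_h)   /* the width is the limiting dimension */
  { *w = max_w; *h = max_w / aspect; }
  else                           /* the height is the limiting dimension */
  { *h = max_h; *w = max_h * aspect; }
}
```

For an image with aspect ratio 2 and a maximum size of 400 by 100, the height is the limiting dimension and the image is displayed with a size of 200 by 100.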
For convenience, we allow missing values in the long format, for
example the aspect ratio, if they can be determined from the image
data. In the short format, the necessary information for a correct
layout must be available without using the image data.
In the long format, the only required parts of an image node are: the
number of the auxiliary section where the image data can be found and
the descriptive text which is there to make the document more
accessible. The section number is followed by the optional aspect
ratio, width, and height of the image. If some of these values are
missing, it must be possible to determine them from the image
data. The node ends with the description.
The short format starts with the section number of the image data and
ends with the description. Missing values for aspect ratio, width, and
height are only allowed if they can be recomputed from the image data.
A missing width or height is represented by a reference to the zero
extended dimension. If the |b100| bit is set, the aspect ratio is
present as a 32 bit floating point value followed by extended
dimensions for width and height. The info value |b100| indicates a
width reference followed by a height reference; the value |b111|
indicates a width node followed by a height node; the value |b110|
indicates a height reference followed by a width node; and the value
|b101| indicates a width reference followed by a height node. The
last two rules reflect the requirement that subnodes are always
located at the end of a node.
The remaining info values are used as follows:
The value |b000| is used for a reference to an image.
The value |b011| indicates an immediate width and an immediate height.
The value |b010| indicates an aspect ratio and an immediate width.
The value |b001| indicates an aspect ratio and an immediate height.
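The eight info values can be summarized in a small lookup function. The following sketch is only an illustration: the flag names and the functions |image_layout| and |image_fixed_bytes| are hypothetical, and the byte sizes (four bytes for the aspect ratio and for an immediate dimension, one byte for a reference) are those read by the |HGET_IMAGE| macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags describing the content of an image node. */
enum { IMG_IS_REF = 1, IMG_ASPECT = 2,
       IMG_W_REF = 4, IMG_H_REF = 8,       /* one byte references */
       IMG_W_NODE = 16, IMG_H_NODE = 32,   /* extended dimension subnodes */
       IMG_W_DIMEN = 64, IMG_H_DIMEN = 128 /* immediate 32 bit dimensions */
};

static unsigned image_layout(uint8_t info)
{ assert(info < 8);
  switch (info)
  { case 0: return IMG_IS_REF;
    case 1: return IMG_ASPECT | IMG_H_DIMEN;
    case 2: return IMG_ASPECT | IMG_W_DIMEN;
    case 3: return IMG_W_DIMEN | IMG_H_DIMEN;
    case 4: return IMG_ASPECT | IMG_W_REF | IMG_H_REF;
    case 5: return IMG_ASPECT | IMG_W_REF | IMG_H_NODE;
    case 6: return IMG_ASPECT | IMG_H_REF | IMG_W_NODE;
    default: return IMG_ASPECT | IMG_W_NODE | IMG_H_NODE;
  }
}

/* Number of bytes between the section number and the first subnode
   or the description list (for info values other than b000). */
static unsigned image_fixed_bytes(uint8_t info)
{ unsigned l = image_layout(info), n = 0;
  if (l & IMG_ASPECT) n += 4;  /* 32 bit floating point value */
  if (l & IMG_W_REF) n += 1;
  if (l & IMG_H_REF) n += 1;
  if (l & IMG_W_DIMEN) n += 4;
  if (l & IMG_H_DIMEN) n += 4;
  return n;
}
```

In every case the fixed size data comes first and the subnodes, if any, come last.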
The following data type stores image information. The width and height
are given as extended dimensions, either directly in |w| and |h|
or as references in |wr| and |hr|.
@<hint types@>=
typedef struct {@+
uint16_t n;@+
float32_t a;@+
Xdimen w,h;@+
uint8_t wr,hr;@+
} Image;
@
\readcode
@s IMAGE symbol
@s image symbol
@s image_aspect symbol
@s image_width symbol
@s image_height symbol
@s image_spec symbol
@<symbols@>=
%token IMAGE "image"
%token WIDTH "width"
%token HEIGHT "height"
%type <xd> image_width image_height
%type <f> image_aspect
%type <info> image_spec image
@
@<scanning rules@>=
::@=image@> :< return IMAGE; >:
::@=width@> :< return WIDTH; >:
::@=height@> :< return HEIGHT; >:
@
@<parsing rules@>=
image_aspect: number {$$=$1;} | {$$=0.0;};
image_width: WIDTH xdimen { $$=$2;}
| { $$=xdimen_defaults[zero_xdimen_no];};
image_height: HEIGHT xdimen { $$=$2; }
| { $$=xdimen_defaults[zero_xdimen_no];};
image_spec: UNSIGNED image_aspect image_width image_height
{$$=hput_image_spec($1,$2,0,&($3),0,&($4));}
| UNSIGNED image_aspect WIDTH REFERENCE image_height
{$$=hput_image_spec($1,$2,$4,NULL,0,&($5));}
| UNSIGNED image_aspect image_width HEIGHT REFERENCE
{$$=hput_image_spec($1,$2,0,&($3),$5,NULL);}
| UNSIGNED image_aspect WIDTH REFERENCE HEIGHT REFERENCE
{$$=hput_image_spec($1,$2,$4,NULL,$6,NULL);};
image: image_spec list {$$=$1;};
content_node: start IMAGE image END { hput_tags($1,TAG(image_kind,$3));};
@
When a short format file is generated, the image width and height must be
determined, if necessary, from the image file.
The following function will write this information into the long format file.
Editing the image file at a later time and converting the short format
file back to a long format file will preserve the old information.
This is not always a desirable effect. It would be possible to
eliminate information about the image size when writing the long format
if that information can be derived from the image file. The latter
solution might have the disadvantage that information about a
desired image size might get lost when editing an image file.
\writecode
@<write functions@>=
void hwrite_image(Image *x)
{ RNG("Section number",x->n,3,max_section_no); hwritef(" %u",x->n);
if (x->a!=0.0) hwrite_float64(x->a, false);
if (x->wr!=0) hwritef(" width *%u",x->wr);
else if (x->w.w!=0 ||x->w.h!=0.0 || x->w.v!=0.0)
{ hwritef(" width"); hwrite_xdimen(&x->w); }
if (x->hr!=0) hwritef(" height *%u",x->hr);
else if (x->h.w!=0 || x->h.h!=0.0 || x->h.v!=0.0)
{ hwritef(" height"); hwrite_xdimen(&x->h); }
}
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(image_kind,b001): @+ HGET_IMAGE(b001);@+break;
case TAG(image_kind,b010): @+ HGET_IMAGE(b010);@+break;
case TAG(image_kind,b011): @+ HGET_IMAGE(b011);@+break;
case TAG(image_kind,b100): @+ HGET_IMAGE(b100);@+break;
case TAG(image_kind,b101): @+ HGET_IMAGE(b101);@+break;
case TAG(image_kind,b110): @+ HGET_IMAGE(b110);@+break;
case TAG(image_kind,b111): @+ HGET_IMAGE(b111);@+break;
@
@<get macros@>=
#define @[HGET_IMAGE(I)@] @/\
{ Image x={0};\
HGET16(x.n);\
if ((I)&b100) { x.a=hget_float32();\
if ((I)==b111) {hget_xdimen_node(&x.w);hget_xdimen_node(&x.h);}\
else if ((I)==b110) {x.hr=HGET8;hget_xdimen_node(&x.w);}\
else if ((I)==b101) {x.wr=HGET8;hget_xdimen_node(&x.h);}\
else {x.wr=HGET8;x.hr=HGET8;}}\
else if((I)==b011) {HGET32(x.w.w);HGET32(x.h.w);} \
else if((I)==b010) { x.a=hget_float32(); HGET32(x.w.w);}\
else if((I)==b001){ x.a=hget_float32(); HGET32(x.h.w);}\
hwrite_image(&x);\
{List d; hget_list(&d);hwrite_list(&d);}}@/
@
\putcode
@<put functions@>=
@<image functions@>@;
Info hput_image_spec(uint32_t n, float32_t a,
uint32_t wr, Xdimen *w, uint32_t hr, Xdimen *h)
{ HPUT16(n);
if (w!=NULL && h!=NULL)
{ if (w->h==0.0 && w->v==0.0 && h->h==0.0 && h->v==0.0)
return hput_image_dimens(n,a,w->w,h->w);
else
{ hput_image_aspect(n,a);
hput_xdimen_node(w);hput_xdimen_node(h);
return b111;
}
}
else if (w!=NULL && h==NULL)
{ if (w->h==0.0 && w->v==0.0 && hr==zero_xdimen_no)
return hput_image_dimens(n,a,w->w,0);
else
{ hput_image_aspect(n,a);
HPUT8(hr);hput_xdimen_node(w);
return b110;
}
}
else if (w==NULL && h!=NULL)
{ if (wr==zero_xdimen_no && h->h==0.0 && h->v==0.0)
return hput_image_dimens(n,a,0,h->w);
else
{ hput_image_aspect(n,a);
HPUT8(wr);hput_xdimen_node(h);
return b101;
}
}
else
{ if (wr==zero_xdimen_no && hr==zero_xdimen_no)
return hput_image_dimens(n,a,0,0);
else
{ hput_image_aspect(n,a);
HPUT8(wr);HPUT8(hr);
return b100;
}
}
}
@
If extended dimensions are involved,
the long format might very well specify different values than
stored in the image. In this case the given dimensions are
interpreted as maximum dimensions. If the aspect ratio is missing,
we use |hextract_image_dimens| to extract it from the image file.
@<image functions@>=
static void hput_image_aspect(int n,double a)
{
if (a==0.0) {Dimen w,h; hextract_image_dimens(n,&a,&w,&h);}
if (a!=0.0) hput_float32(a);
else QUIT("Unable to determine aspect ratio of image %s",dir[n].file_name);
}
@
If no extended dimensions are involved in an image specification,
we use |hput_image_dimens|.
Because the long format can omit part of the image specification,
we use |hextract_image_dimens| to extract information from
the image file and merge this information
with the data supplied in the long format.
@<image functions@>=
@<auxiliar image functions@>@;
static Info hput_image_dimens(int n,float32_t a, Dimen w, Dimen h)
{ Dimen iw,ih;
double ia;
if (w>0 && h>0)
{ HPUT32(w); HPUT32(h); return b011; }
else if (a>0 && w>0)
{ hput_float32((float32_t)a); HPUT32(w); return b010; }
else if (a>0 && h>0)
{ hput_float32((float32_t)a); HPUT32(h); return b001; }
hextract_image_dimens(n,&ia,&iw,&ih);
@<merge stored image dimensions with dimensions given@>@;
if (iw>0)
{ hput_float32((float32_t)ia); HPUT32(iw); return b010; }
else if (ih>0)
{ hput_float32((float32_t)ia); HPUT32(ih); return b001; }
else
{ iw=-iw; ih=-ih; /*we accept the default resolution*/
HPUT32(iw); HPUT32(ih); return b011;
}
}
@
If the width, height or aspect ratio is stored in the
image file, we can merge this information with the information given in
the long format.
It is considered an error if the function |hextract_image_dimens|
cannot extract the aspect ratio. Absolute width and height values,
however, might be missing. If the aspect ratio is computed from the
number of horizontal and vertical pixels, |hextract_image_dimens|
makes the reasonable assumption that the intended resolution
is 72dpi and converts the image dimensions to scaled points.
It negates these values to indicate
that the resolution is just a guess. This allows other programs
to use different default resolutions if desired.
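The convention of negating guessed dimensions can be illustrated by the following sketch. The functions |guessed_dimen| and |accept_guess| are hypothetical, and the constant |SP_PER_PT| is assumed to have the same value as |ONE|, that is, the number of scaled points in one point.

```c
#include <assert.h>
#include <stdint.h>

#define SP_PER_PT 65536.0 /* scaled points per point, assumed equal to ONE */

/* Convert a pixel count to scaled points at an assumed 72 dpi and
   negate the result to mark it as a guess. For nonnegative values,
   adding 0.5 and truncating rounds to the nearest integer. */
static int32_t guessed_dimen(double px)
{ assert(px >= 0.0);
  return -(int32_t)(0.5 + SP_PER_PT*72.27*px/72.0);
}

/* A consumer that is content with the default resolution simply
   undoes the negation. */
static int32_t accept_guess(int32_t d)
{ return d < 0 ? -d : d; }
```

An image that is 720 pixels wide is 10 inches or 722.7pt wide at 72dpi, so |guessed_dimen(720)| yields $-47362867$ scaled points.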
@<merge stored image dimensions with dimensions given@>=
{ if (ia==0.0)
{ if (a!=0.0) ia=a;
else if(w!=0 && h!=0) ia=(double)w/(double)h;
else QUIT("Unable to determine aspect ratio of image %s",dir[n].file_name);
}
/* here the aspect ratio |ia| is known */
if (w==0 && h==0) /*neither width nor height specified*/
{ if (ih>0) iw=round(ih * ia);
else if (iw>0) ih=round(iw/ia);
}
else if (h==0) /*width specified*/
{ iw=w;@+ ih=round(w/ia);@+ }
else if (w==0) /*height specified*/
{ ih=h;@+ iw=round(h*ia);@+}
else /* both specified */
{ ih = h;@+
iw = w;@+
}
}
@
Before we present the code to extract image dimensions from
various types of image files, we define a few macros and
variables for reading these image files.
@<auxiliar image functions@>=
#define IMG_BUF_MAX 54
#define IMG_HEAD_MAX 2
static unsigned char img_buf[IMG_BUF_MAX];
static size_t img_buf_size;
#define @[LittleEndian32(X)@] (img_buf[(X)]+(img_buf[(X)+1]<<8)+\
(img_buf[(X)+2]<<16)+(img_buf[(X)+3]<<24))
#define @[BigEndian16(X)@] (img_buf[(X)+1]+(img_buf[(X)]<<8))
#define @[BigEndian32(X)@] (img_buf[(X)+3]+(img_buf[(X)+2]<<8)+\
(img_buf[(X)+1]<<16)+(img_buf[(X)]<<24))
#define Match2(X,A,B) ((img_buf[(X)]==(A)) && (img_buf[(X)+1]==(B)))
#define Match4(X,A,B,C,D) (Match2(X,A,B)&&Match2((X)+2,C,D))
#define @[GET_IMG_BUF(X)@] \
if (img_buf_size<(X)) \
{ size_t i=fread(img_buf+img_buf_size,1,(X)-img_buf_size,f); \
if (ferror(f)) QUIT("Unable to read image %s",fn); /* |i| is unsigned and never negative */ \
else if (i==0) QUIT("Unable to read image header %s",fn); \
else img_buf_size+=i; \
}
@
Considering the different image formats, we start with Windows
Bitmaps. A Windows bitmap file usually has the extension {\tt .bmp}
but the better way to check for a Windows bitmap file is to examine
the first two bytes of the file: the ASCII codes for `B' and `M'. Once
we have verified the file type, we find the width and height of the
bitmap in pixels at offsets |0x12| and |0x16| stored as little-endian
32 bit integers. At offsets |0x26| and |0x2A|, we find the horizontal
and vertical resolution in pixels per meter stored in the same format.
This is sufficient to compute the true width and height of the image
in scaled points.
The Windows Bitmap format is easy to process but not very
efficient. So the support for this format in the \HINT\ format is
deprecated and will disappear. You should use one of the formats
described next.
@<auxiliar image functions@>=
static bool get_BMP_info(FILE *f, char *fn, double *a, Dimen *w, Dimen *h)
{ double wpx,hpx;
double xppm,yppm;
GET_IMG_BUF(2);
if (!Match2(0,'B','M')) return false;
GET_IMG_BUF(0x2E);
wpx=(double)LittleEndian32(0x12); /*width in pixel*/
hpx=(double)LittleEndian32(0x16); /*height in pixel*/
xppm=(double)LittleEndian32(0x26); /* horizontal pixel per meter*/
yppm=(double)LittleEndian32(0x2A); /* vertical pixel per meter*/
*w= floor(0.5+ONE*(72.27*1000.0/25.4)*wpx/xppm); /* 72.27pt per inch, as for PNG */
*h= floor(0.5+ONE*(72.27*1000.0/25.4)*hpx/yppm);
*a = (wpx/xppm)/(hpx/yppm);
return true;
}
@
Now we repeat this process for image files using the Portable Network
Graphics file format. This file format is well suited to simple
graphics that do not use color gradients. These images usually have
the extension {\tt .png} and start with an eight byte signature:
|0x89| followed by the ASCII codes `P', `N', `G', followed by a
carriage return (|0x0D|) and line feed (|0x0A|), a DOS end-of-file
character (|0x1A|), and a final line feed (|0x0A|). After the signature
follows a list of chunks. The first chunk is the image header chunk.
Each chunk starts with the size of the chunk stored as big-endian 32
bit integer, followed by the chunk name stored as four ASCII codes
followed by the chunk data and a CRC. The size, as stored in the
chunk, includes neither the size itself, nor the name, nor the
CRC. The first chunk is the IHDR chunk. The chunk data of the IHDR
chunk starts with the width and the height of the image in pixels
stored as 32 bit big-endian integers.
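The layout of the first 24 bytes can be checked with a small function that works on a buffer in memory. This is a sketch for illustration only: the names |be32| and |png_pixel_size| are hypothetical, and the resolution is not considered at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32 bit big-endian integer. */
static uint32_t be32(const unsigned char *p)
{ return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
       | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: extract width and height in pixels from the
   first 24 bytes of a PNG file held in memory. */
static bool png_pixel_size(const unsigned char *buf, size_t len,
                           uint32_t *w, uint32_t *h)
{ static const unsigned char sig[8] =
    {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
  assert(buf != NULL && w != NULL && h != NULL);
  if (len < 24 || memcmp(buf, sig, 8) != 0) return false;
  /* bytes 8 to 11 hold the chunk size, bytes 12 to 15 the chunk name */
  if (memcmp(buf + 12, "IHDR", 4) != 0) return false;
  *w = be32(buf + 16);
  *h = be32(buf + 20);
  return true;
}
```

Converting each byte to |uint32_t| before shifting avoids shifting a value into the sign bit of an |int|.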
Finding the image resolution takes some more effort. The image
resolution is stored in an optional chunk named ``pHYs'' for the
physical pixel dimensions. All we know is that this chunk, if it
exists, will appear after the IHDR chunk and before the (required)
IDAT chunk. The pHYs chunk contains two 32 bit big-endian integers,
giving the horizontal and vertical pixels per unit, and a one byte
unit specifier, which is either 0 for an undefined unit or 1 for the
meter as unit. With an undefined unit, only the aspect ratio of the
pixels and hence the aspect ratio of the image can be determined.
It is not uncommon, however, that the resolution in such a case is given
as dots per inch. So we decide to assume the latter.
If the resolution cannot be determined, we assume a resolution
of 72dpi and negate width and height to inform the calling procedure
of this arbitrary choice.
@<auxiliar image functions@>=
static bool get_PNG_info(FILE *f, char *fn, double *a, Dimen *w, Dimen *h)
{ int pos, size;
double wpx,hpx; /* width and height in pixel */
double xppu,yppu; /* pixel per unit in x and y direction */
int unit;
GET_IMG_BUF(24);
if (!Match4(0, 0x89, 'P', 'N', 'G') ||
!Match4(4, 0x0D, 0x0A, 0x1A, 0x0A)) return false;
size=BigEndian32(8);
if (!Match4(12,'I', 'H', 'D', 'R')) return false;
wpx=(double)BigEndian32(16);
hpx=(double)BigEndian32(20);
pos=20+size;
while (true)
{ if (fseek(f,pos,SEEK_SET)!=0) return false;
img_buf_size=0;
GET_IMG_BUF(17);
size=BigEndian32(0);
if (Match4(4,'p', 'H', 'Y', 's')) /*must occur before IDAT chunk*/
{ xppu =(double)BigEndian32(8);
yppu =(double)BigEndian32(12);
unit=img_buf[16];
if (unit==0) /* assuming unit is inch */
{ *w=floor(0.5+ONE*72.27*wpx/xppu);
*h=floor(0.5+ONE*72.27*hpx/yppu);
*a =(wpx/xppu)/(hpx/yppu);
return true;
}
else if (unit==1) /* unit is meter */
{
*w=floor(0.5+ONE*(72.27/0.0254)*wpx/xppu);
*h=floor(0.5+ONE*(72.27/0.0254)*hpx/yppu);
*a = (wpx/xppu)/(hpx/yppu);
return true;
}
else
break;
}
else if (Match4(4,'I', 'D', 'A', 'T'))
break;
else
pos=pos+12+size;
}
/*we assume 72dpi and negate the results*/
*w=-floor(0.5+ONE*72.27*wpx/72.0);
*h=-floor(0.5+ONE*72.27*hpx/72.0);
*a =wpx/hpx;
return true;
}
@
For photographs, the JPEG File Interchange Format (JFIF) is more
appropriate. JPEG files come with all sorts of file extensions like
{\tt .jpg}, {\tt .jpeg}, or {\tt .jfif}. We check the file signature:
it starts with the SOI (Start of Image) marker |0xFF|, |0xD8|.
Most likely it will be followed by the JFIF-Tag.
The JFIF-Tag starts with the segment marker
APP0 (|0xFF|, |0xE0|) followed by the 2 byte segment size, followed by
the ASCII codes `J', `F', `I', `F' followed by a zero byte. Next is a
two byte version number which we do not read. Before the resolution
proper there is a resolution unit indicator byte (0 = no units, 1 =
dots per inch, 2 = dots per cm) and then come the horizontal and
vertical resolution, both as 16 bit big-endian integers.
Instead of the JFIF-Tag, there might as well be an Exif-Tag
which starts with the segment marker
APP1 (|0xFF|, |0xE1|) followed by the 2 byte segment size.
Currently this tag is not decoded.
To find the actual width and height,
we have to search for a start of frame marker
(|0xFF|, |0xC0|+$n$ with $0\le n\le 15$), which is followed by the 2
byte segment size, the 1 byte sample precision, the 2 byte height, and
the 2 byte width.
If the resolution was given explicitly in the JFIF-Tag,
we use it. If there was no such tag or the unit was undefined,
we proceed as we did for the PNG file.
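The handling of the three unit indicators can be sketched in isolation. The function |jfif_dpi| below is a hypothetical helper that mirrors the conversion used for JPEG files in this section; it assumes that both density values are positive.

```c
#include <assert.h>

/* Hypothetical helper: convert the density fields of a JFIF-Tag to
   dots per inch. unit: 0 = no units, 1 = dots per inch, 2 = dots per cm. */
static void jfif_dpi(int unit, double xd, double yd,
                     double *xdpi, double *ydpi)
{ assert(xd > 0.0 && yd > 0.0);
  if (unit == 1)        /* already in dots per inch */
  { *xdpi = xd; *ydpi = yd; }
  else if (unit == 2)   /* dots per cm: one inch is 2.54cm */
  { *xdpi = xd*2.54; *ydpi = yd*2.54; }
  else                  /* only the pixel aspect ratio is known */
  { *xdpi = 72.0; *ydpi = 72.0*yd/xd; }
}
```

Without units, the two density values only describe the shape of a pixel; the sketch keeps that ratio and assumes 72dpi in the horizontal direction.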
@<auxiliar image functions@>=
static bool get_JPG_info(FILE *f, char *fn, double *a, Dimen *w, Dimen *h)
{ int pos, size;
double wpx,hpx;
double xppu=72.0,yppu=72.0;
int unit;
GET_IMG_BUF(18);
if (!Match2(0, 0xFF,0xD8)) /* SOI Start of Image */
return false;
pos=2;
while (true)
{ if (fseek(f,pos,SEEK_SET)!=0)
return false;
img_buf_size=0;
GET_IMG_BUF(16);
if (img_buf[0] != 0xFF) return false; /* Not the start of a segment */
if ( img_buf[1] == 0xE0 &&
Match4(4,'J', 'F', 'I', 'F')) /* APP0 JFIF Tag */
{ unit=img_buf[11];
xppu=(double)BigEndian16(12);
yppu=(double)BigEndian16(14);
if (unit==1)
; /* already in dpi */
else if (unit==2)
{ xppu=xppu*2.54; /* convert dot per cm to dpi */
yppu=yppu*2.54;
}
else
{ yppu=72.0*yppu/xppu; /* assume 72dpi */
xppu=72.0;
}
}
else if (img_buf[1] == 0xC0 || img_buf[1] == 0xC2) /* SOF Start of Frame */
{ hpx =(double)BigEndian16(5);
wpx =(double)BigEndian16(7);
*w = floor(0.5+ONE*72.27*wpx/xppu);
*h = floor(0.5+ONE*72.27*hpx/yppu);
*a = (wpx/xppu)/(hpx/yppu);
return true;
}
else if (img_buf[1] == 0xD9) /* EOI End of Image */
return false;
size= BigEndian16(2);
pos=pos+2+size;
}
return false;
}
@
There is still one image format missing: scalable vector graphics.
At the moment, I tend not to include a further image format into
the definition of the \HINT\ file format but instead use the
PostScript subset that is used for Type 1 fonts to encode
vector graphics. Any \HINT\ viewer must support Type 1
PostScript fonts and hence it already has the necessary interpreter.
So it seems reasonable to put the burden of converting vector graphics
into a Type 1 PostScript font on the generator of \HINT\ files
and keep the \HINT\ viewer as small and simple as possible.
An alternative which would impose only a slight burden on the \HINT\ file
viewer is the use of the rsvg library.
After having considered the various types of image files,
we now determine width, height and aspect ratio based on
such an image file.
We combine all the above functions into the |hextract_image_dimens|
function.
@<image functions@>=
void hextract_image_dimens(int n, double *a, Dimen *w, Dimen *h)
{ char *fn;
FILE *f;
*a=0.0;
*w=*h=0;
fn=dir[n].file_name;
f=fopen(fn,"rb");
if (f!=NULL)
{ img_buf_size=0;
if (!get_BMP_info(f,fn,a,w,h) &&
!get_PNG_info(f,fn,a,w,h) &&
!get_JPG_info(f,fn,a,w,h))
DBG(DBGDEF,"Unknown image type %s",fn);
fclose(f);
DBG(DBGDEF,"image %d: width= %fpt height= %fpt aspect=%f\n",
n,*w/(double)ONE,*h/(double)ONE,*a);
}
}
@
\subsection{Positions, Outlines, Links, and Labels}\label{labels}
\index{position}\index{outline}\index{link}\index{label}
A viewer can usually not display the entire content section of
a \HINT\ file. Instead it will display a page of content and will give
its user various means to change the page. This might be as simple as
a ``page down'' or ``page up'' button (or gesture) and as
sophisticated as searching using regular expressions. More
traditional ways to navigate the content include the use of a table of
contents or an index of keywords. All these methods of changing a page
have in common that a part of the content that fits nicely in the
screen area provided by the output device must be rendered given a
position inside the content section.
Let's assume that the viewer uses a \HINT\ file in short
format---after all that's the format designed for precisely this use.
A position inside the content section is then the position of the
starting byte of a node. Such a position can be stored as a 32 bit
number. Because even the smallest node contains two tag bytes,
the position of any node is strictly smaller than the maximum 32 bit
number which we can conveniently use as a ``non position''.
@<hint macros@>=
#define HINT_NO_POS 0xFFFFFFFF
@
To render a page starting at a given position is not difficult:
We just read content nodes, starting at the given position and feed
them to \TeX's page builder until the page is complete. To implement a
``clickable'' table of contents, this is good enough. We store with
every entry in the table of contents the position of the section
header, and when the user clicks the entry, the viewer can display a
new page starting exactly with that section header.
Things are slightly more complex if we want to implement a ``page
down'' button. If we press this button, we want the next page to
start exactly where the current page has ended. This is
typically in the middle of a paragraph node, and it might even be in
the middle of a hyphenated word in that paragraph. Fortunately,
paragraph and table nodes are the only nodes that can be broken across page
boundaries. But broken paragraph nodes are a common case nonetheless,
and unless we want to search for the enclosing node, we need to
augment in this case the primary 32 bit position inside the content
section with a secondary position. Most of the
time, 16 bits will suffice for this secondary position if we give it
relative to the primary position. Further, if the list of nodes forming the
paragraph is given as a text, we need to know the current font at the
secondary position. Of course, the viewer can find it by scanning the
initial part of the text, but when we think of a page down button, the
viewer might already know it from rendering the previous page.
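A viewer might represent such a two part position as in the following sketch. The type |Position| and the functions |pos_absolute|, |pos_pack|, and |pos_unpack| are hypothetical and only illustrate the idea of a 32 bit primary position augmented by a 16 bit relative offset; they are not taken from an actual viewer.

```c
#include <assert.h>
#include <stdint.h>

#define NO_POS 0xFFFFFFFFu /* same value as HINT_NO_POS */

/* Hypothetical two part position: the start of a node plus an offset
   into that node, for pages that begin inside a paragraph. */
typedef struct {
  uint32_t primary;   /* byte position of the enclosing node */
  uint16_t secondary; /* offset relative to primary, 0 if unused */
} Position;

/* Absolute byte position designated by p. */
static uint32_t pos_absolute(Position p)
{ assert(p.primary != NO_POS);
  return p.primary + p.secondary;
}

/* Pack both parts into one 64 bit value, for example as a cache key. */
static uint64_t pos_pack(Position p)
{ return ((uint64_t)p.primary << 16) | p.secondary; }

static Position pos_unpack(uint64_t k)
{ Position p;
  p.primary = (uint32_t)(k >> 16);
  p.secondary = (uint16_t)(k & 0xFFFF);
  return p;
}
```

Keeping the primary position separate lets the viewer find the enclosing paragraph node without searching backward through the content.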
Similar is the case of a ``page up'' button. Only here we need a page
that ends precisely where our current page starts. Possibly even with
the initial part of a hyphenated word. Here we need a reverse version
of \TeX's page builder that assembles a ``good'' page from the bottom
up instead of from the top down. Sure, the viewer can cache the start
position of the previous page (or the rendering of the entire page) if
the reader has reached the current page using the page down
button. But this is not possible in all cases. The reader might have
reached the current page using the table of contents or even an index
or a search form.
This is the most complex case to consider: a link from an index or a
search form to the position of a keyword in the main text. Let's assume
someone looks up the word ``M\"unchen''. Should the viewer then
generate a page that starts in the middle of a sentence with the word
``M\"unchen''? Probably not! We want a page that shows at least the whole sentence if
not the whole paragraph. Of course the program that generates the
link could specify the position of the start of the paragraph instead
of the position of the word. But that will not solve the problem. Just
imagine reading the groundbreaking masterpiece of a German philosopher
on a small hand-held device: the paragraph will most likely be very
long and perhaps only part of the first sentence will fit on the small
screen. So the desired keyword might not be found on the page that
starts with the beginning of the paragraph; it might not even be on
the next page or the one after it. Only the viewer can decide what is the
best fragment of content to display around the position of the given
keyword.
To summarize, we need three different ways to render a page for a given position:
\itemize
\item A page that starts exactly at the given position.
\item A page that ends exactly at the given position.
\item The ``best'' page that contains the given position somewhere in the middle.
\enditemize
\noindent
A possible way to find the ``best'' page for the last case
could be the following:
\itemize
\item If the position is inside a paragraph, break the paragraph
into lines. One line will contain
the given position. Let's call this the destination line.
\item If the paragraph will not fit entirely on the page,
start the page with the beginning of the
paragraph if that will place the destination line on the page, otherwise
start with a line in the paragraph that is about half a page
before the destination line.
\item Else traverse the content list backward for about $2/3$ of the
page height and forward for about $2/3$ of the page height, searching
for the smallest negative penalty node. Use the penalty node found as
either the beginning or ending of the page.
\item If there are several equally low negative penalty nodes, prefer
penalties preceding the destination line over penalty nodes following
it. A good page start is more important than a good page end.
\item If there are still several equally low negative penalty
nodes, choose the one whose distance to the destination line is closest
to $1/2$ of the page height.
\item If no negative penalty nodes could be found, start the page with
the paragraph containing the destination line.
\item Once the page start (or end) is found, use \TeX's page builder
(or its reverse variant) to complete the page.
\enditemize
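The three rules for choosing among negative penalty nodes can be sketched as a comparison function. The type |Candidate| and the function |best_break| are hypothetical; the sketch assumes that the candidates within about $2/3$ of the page height around the destination line have already been collected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate: a penalty node near the destination line. */
typedef struct {
  int penalty;     /* penalty value; only negative values qualify */
  double distance; /* signed distance to the destination line,
                      negative if the node precedes the line */
} Candidate;

static double absd(double x) { return x < 0 ? -x : x; }

/* Return the index of the preferred candidate or -1 if none qualifies. */
static int best_break(const Candidate *c, size_t n, double page_height)
{ int best = -1;
  size_t i;
  assert(c != NULL || n == 0);
  for (i = 0; i < n; i++)
  { if (c[i].penalty >= 0) continue;
    if (best < 0) { best = (int)i; continue; }
    if (c[i].penalty != c[best].penalty)   /* the lowest penalty wins */
    { if (c[i].penalty < c[best].penalty) best = (int)i; }
    else if ((c[i].distance < 0) != (c[best].distance < 0))
    { if (c[i].distance < 0) best = (int)i; } /* prefer a preceding node */
    else if (absd(absd(c[i].distance) - page_height/2)
             < absd(absd(c[best].distance) - page_height/2))
      best = (int)i;      /* prefer a distance close to half a page */
  }
  return best;
}
```

If the function returns $-1$, no negative penalty node was found and the page starts with the paragraph containing the destination line.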
We call content nodes that reference some position inside the content section
``link'' nodes. The position that is referenced is called the destination of the link.
Link nodes always occur in pairs of a ``start'' link
followed by a corresponding ``end'' link that both reference the same position
%, the same nesting level, % not sure!
and no other link nodes between them.
The content between the two will constitute the visible part of the link.
To encode a position inside the content section that can be used
as the destination of a link node, another kind of node is needed which
we call a ``label''.
Links are not the only way to navigate inside a large
document. The user interface can also present an ``outline''
of the document that can be used for navigation.
An outline node implements an association between a name displayed by the
user interface of the \HINT\ viewer and the destination position in the \HINT\ document.
Outline nodes, link nodes, and label nodes can share
the same kind-value, and we have |outline_kind==link_kind==label_kind|.
To distinguish an outline node from a label node---both occur
in the short format definition section---the |b100| info bit is set in an
outline node.
@<get functions@>=
void hget_outline_or_label_def(Info i, uint32_t node_pos)
{ @+if (i&b100)
@<get and write an outline node@>@;
else
@<get and store a label node@>@;
}
@
The next thing we need to implement is a new maximum number
for outline nodes. We store this number in the variable
|max_outline| and limit it to a 16 bit value.
In the short format, the value of |max_outline| is stored with the
other maximum values using the kind value |outline_kind==label_kind| and the info
value |b100| for a single byte and |b101| for a two byte value.
\codesection{\getsymbol}{Reading the Short Format}\getindex{1}{7}{Special Maximum Values}
@<cases of getting special maximum values@>=
@t\1\kern1em@>
case TAG(outline_kind,b100):
case TAG(outline_kind,b101): max_outline=n;
DBG(DBGDEF|DBGLABEL,"max(outline) = %d\n",max_outline); break;
@
\codesection{\putsymbol}{Writing the Short Format}\putindex{1}{7}{Special Maximum Values}
@<cases of putting special maximum values@>=
if (max_outline>-1)
{ uint32_t pos=hpos++-hstart;
DBG(DBGDEF|DBGLABEL,"max(outline) = %d\n",max_outline);
hput_tags(pos,TAG(outline_kind,b100|(hput_n(max_outline)-1)));
}
@
\codesection{\wrtsymbol}{Writing the Long Format}\wrtindex{1}{7}{Special Maximum Values}
@<cases of writing special maximum values@>=
@t\1\kern1em@>
case label_kind:
if (max_ref[label_kind]>-1)@/
{ hwrite_start();
hwritef("label %d",max_ref[label_kind]);
hwrite_end();@+
}
if (max_outline>-1)@/
{ hwrite_start();
hwritef("outline %d", max_outline);
hwrite_end();@+
}
break;
@
\codesection{\redsymbol}{Reading the Long Format}\redindex{1}{7}{Special Maximum Values}
@<parsing rules@>=
max_value: OUTLINE UNSIGNED { max_outline=$2;
RNG("max outline",max_outline,0, 0xFFFF);
DBG(DBGDEF|DBGLABEL,"Setting max outline to %d\n",max_outline);
};
@
After having seen the maximum values, we now explain labels, then links,
and finally outlines.
To store labels, we define a data type |Label| and an array |labels|
indexed by the label's reference number.
@<hint basic types@>=
typedef struct
{@+ uint32_t pos; /* position */
uint32_t pos0; /* secondary position */
uint8_t where; /* where on the rendered page */
bool used; /* label used in a link or an outline */
int next; /* reference in a linked list */
uint8_t f; /* font, currently not used */
} Label;
@
The |where| field indicates where the label position
should be on the rendered page: at the top,
at the bottom, or somewhere in the middle.
An undefined label has |where| equal to zero.
@<hint macros@>=
#define LABEL_UNDEF 0
#define LABEL_TOP 1
#define LABEL_BOT 2
#define LABEL_MID 3
@
@<common variables@>=
Label *labels=NULL;
int first_label=-1;
@
The variable |first_label| will be used together with the |next| field of
a label to construct a linked list of labels.
@<initialize definitions@>=
if (max_ref[label_kind]>=0)@/
ALLOCATE(labels,max_ref[label_kind]+1,Label);
@
The implementation of labels has to solve the
problem of forward links:
a link node that references a label
that is not yet defined.
We solve this problem by
keeping all labels in the definition section.
So for every label at least a definition is available
before we start with the content section and we can fill
in the position when the label is found.
If we restrict labels to the definition section and
do not have an alternative representation, the number of possible references
is a hard limit on the number of labels in a document.
Therefore label references are allowed to use 16 bit reference numbers.
In the short format,
the |b001| bit indicates a two byte reference number if set, and a one byte
reference number otherwise.
In the short format, the complete information about a label is in the definition section.
In the long format, this is not possible because we do not have node positions.
Therefore we will put label nodes at appropriate points in the content section
and compute the label position when writing the short format.
\gdef\subcodetitle{Labels}
\readcode
@s LABEL symbol
@s BOT symbol
@s MID symbol
@s placement symbol
@<symbols@>=
%token LABEL "label"
%token BOT "bot"
%token MID "mid"
%type <i> placement
@
@<scanning rules@>=
::@=label@> :< return LABEL; >:
::@=bot@> :< return BOT; >:
::@=mid@> :< return MID; >:
@
A label node specifies the reference number and a placement.
@<parsing rules@>=
placement: TOP {$$=LABEL_TOP;} | BOT {$$=LABEL_BOT;} | MID {$$=LABEL_MID;} | {$$=LABEL_MID;};
content_node: START LABEL REFERENCE placement END @|
{ hset_label($3,$4); @+};
@
After parsing a label, the function |hset_label| is called.
@<put functions@>=
void hset_label(int n,int w )
{ Label *t;
REF_RNG(label_kind,n);
t=labels+n;@/
if (t->where!=LABEL_UNDEF)
MESSAGE("Duplicate definition of label %d\n",n);
t->where=w;
t->pos=hpos-hstart;
t->pos0=hpos0-hstart;
t->next=first_label; first_label=n;
}
@
The above function will simply store the data obtained in the |labels| array.
The generation of the short format output is
postponed until the entire content section has been parsed and
the positions of all labels are known.
One more complication needs to be considered: The |hput_list| function
is allowed to move lists in the output stream and if positions
inside the list were recorded in a label, these labels need an
adjustment. To find out quickly if any labels are affected,
the |hset_label| function
constructs a linked list of labels starting with the reference number
of the most recent label in |first_label| and the
reference number of the label preceding label |i| in |labels[i].next|.
Because labels are recorded with increasing positions,
the list will be sorted with positions decreasing.
@<adjust label positions after moving a list@>=
{ int i;
for (i=first_label;i>=0 && labels[i].pos>=l->p;i=labels[i].next)
{ DBG(DBGNODE|DBGLABEL,"Moving label *%d by %d\n", i,d);@/
labels[i].pos+=d;
if (labels[i].pos0>=l->p) labels[i].pos0+=d;
}
}
@
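The benefit of keeping the list sorted by decreasing position is that the adjustment loop can stop
at the first label that lies before the moved list. The following stand-alone sketch shows this
early exit on a reduced label type; |MiniLabel| and |shift_labels| are hypothetical names
introduced here for illustration, and the secondary position |pos0| is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical version of the Label type. */
typedef struct {
  uint32_t pos; /* position in the output stream */
  int next;     /* index of the label recorded just before this one, or -1 */
} MiniLabel;

/* Shift all labels at or after position p by d bytes.
   The list starting at first is sorted by decreasing position,
   so the loop ends at the first label in front of p.
   Returns the number of labels that were moved. */
int shift_labels(MiniLabel *labels, int first, uint32_t p, int32_t d)
{ int i, moved=0;
  for (i=first; i>=0 && labels[i].pos>=p; i=labels[i].next)
  { /* a label must not be moved in front of position 0 */
    assert(d>=0 || labels[i].pos>=(uint32_t)(-(int64_t)d));
    labels[i].pos+=(uint32_t)d;
    moved++;
  }
  return moved;
}
```

With three labels at the positions 10, 40, and 90, moving everything from position 30 onward
by 5 bytes changes only the last two labels; the label at position 10 ends the loop.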
The |hwrite_label| function\label{hwritelabel} is the reverse of the above parsing rule.
Note that it is different from the
usual |hwrite_|\dots\ functions; we will see shortly why that is so.
%see |hwrite_range|
\writecode
@<write functions@>=
void hwrite_label(void) /* called in |hwrite_end| and at the start of a list */
{@+ while (first_label>=0 && (uint32_t)(hpos-hstart)>=labels[first_label].pos)@/
{ Label *t=labels+first_label;
DBG(DBGLABEL,"Inserting label *%d\n", first_label);
hwrite_start();
hwritef("label *%d",first_label);
if (t->where==LABEL_TOP) hwritef(" top");
else if (t->where==LABEL_BOT) hwritef(" bot");
nesting--;hwritec('>'); /* avoid a recursive call to |hwrite_end| */
first_label=labels[first_label].next;
}
}
@
The short format specifies the label positions in the definition section.
This is not possible in the long format because there are no ``positions''
in the long format. Therefore long format label nodes must
be inserted in the content section just before those nodes
that should come after the label. The function |hwrite_label| is called
in |hwrite_end|. At that point |hpos| is the position of the next node
and it can be compared with the positions of the labels taken from
the definition section.
Because |hpos| is strictly increasing while reading the content section,
the comparison can be made efficient by sorting the labels.
The sorting uses the |next| field in the
array of |labels| to construct a linked list. After sorting, the value of
|first_label| is the index of the label with the smallest position;
and for each |i|, the value of |labels[i].next| is the index of
the label with the next bigger position. If |labels[i].next| is negative,
there is no next bigger position.
Currently a simple insertion sort is used.
The insertion sort will work well if the labels are already
mostly in ascending order.
If we expect lots of labels in random order,
a more sophisticated sorting algorithm might be appropriate.
@<write functions@>=
void hsort_labels(void)
{ int i;
if (max_ref[label_kind]<0)
{ first_label=-1; return; @+} /* empty list */
first_label=max_ref[label_kind];
while (first_label>=0 && labels[first_label].where==LABEL_UNDEF)
first_label--;
if (first_label<0) return; /* no defined labels */
labels[first_label].next=-1;
DBG(DBGLABEL,"Sorting %d labels\n",first_label+1);
for (i=first_label-1; i>=0; i--) /* insert label |i| */
if (labels[i].where!=LABEL_UNDEF)@/
{ uint32_t pos=labels[i].pos;
if (labels[first_label].pos >= pos)@/
{ labels[i].next= first_label; first_label=i;@+ } /* new smallest */
else @/
{ int j;
for (j= first_label;
labels[j].next>=0 && labels[labels[j].next].pos<pos;
j=labels[j].next) continue;
labels[i].next=labels[j].next; labels[j].next=i;
}
}
}
@
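To see the insertion sort at work on a small example, here is a simplified stand-alone variant.
It is a sketch only: it assumes that all entries are defined, so the test for |LABEL_UNDEF| is
dropped, and the names |MiniLabel| and |sort_labels| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical version of the Label type. */
typedef struct {
  uint32_t pos; /* position of the label */
  int next;     /* index of the label with the next bigger position, or -1 */
} MiniLabel;

/* Link l[0..n-1] into a list sorted by increasing position.
   Returns the index of the label with the smallest position,
   or -1 if there are no labels. */
int sort_labels(MiniLabel *l, int n)
{ int first, i;
  assert(n>=0);
  if (n==0) return -1;
  first=n-1; l[first].next=-1; /* a one element list is sorted */
  for (i=n-2;i>=0;i--) /* insert label i */
  { if (l[first].pos>=l[i].pos)
    { l[i].next=first; first=i; } /* new smallest position */
    else
    { int j=first;
      /* advance to the last label with a position smaller than l[i].pos */
      while (l[j].next>=0 && l[l[j].next].pos<l[i].pos) j=l[j].next;
      l[i].next=l[j].next; l[j].next=i;
    }
  }
  return first;
}
```

For the positions 30, 10, and 20 stored at the indices 0, 1, and 2, the function returns 1
and the |next| fields link the labels in the order 1, 2, 0.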
The following code is used to get label information from the
definition section and store it in the |labels| array.
The |b010| bit indicates the presence of a secondary position for the label.
\getcode
@<get and store a label node@>=
{ Label *t;
int n;
if (i&b001) HGET16(n); @+else n=HGET8;
REF_RNG(label_kind,n);
t=labels+n;
if (t->where!=LABEL_UNDEF)
DBG(DBGLABEL,"Duplicate definition of label %d at 0x%x\n",n, node_pos);
HGET32(t->pos);
t->where=HGET8;
if (t->where==LABEL_UNDEF || t->where>LABEL_MID)
DBG(DBGLABEL,"Label %d where value invalid: %d at 0x%x\n",n,t->where,node_pos);
if (i&b010) /* secondary position */
{ HGET32(t->pos0); t->f=HGET8;@+}
else t->pos0=t->pos;
DBG(DBGLABEL,"Defining label %d at 0x%x/0x%x\n",n,t->pos0,t->pos);
}
@
The function |hput_label| is simply the reverse of the above code.
\putcode
@<put functions@>=
Tag hput_label(int n, Label *l)
{ Info i=b000;
HPUTX(13);
if (n>0xFF) {i|=b001; HPUT16(n);@+}@+ else HPUT8(n);
HPUT32(l->pos);
HPUT8(l->where);
if (l->pos!=l->pos0)
{ i|=b010; HPUT32(l->pos0); HPUT8(l->f); @+}
DBG(DBGLABEL,"Defining label %d at 0x%x/0x%x\n",n,l->pos0,l->pos);
return TAG(label_kind,i);
}
@
|hput_label_defs| is called by the parser after the entire content
section has been processed; it appends the label definitions
to the definition section.
%Using the fact that the linked list
%starting at |first_label| already contains all labels in
%order of descending position, we could easily output the
%labels in sorted order and reconstruct the sorting while reading
%in the labels. The \HINT\ format however does not require
%label nodes to be sorted and the |hsort_labels| function
%can not be avoided.
The outlines are stored after the labels because they reference the labels.
@<put functions@>=
extern void hput_definitions_end(void);
extern Tag hput_outline(Outline *t);
void hput_label_defs(void)
{ int n;
section_no=1;
hstart=dir[1].buffer;
hend=hstart+ dir[1].bsize;
hpos=hstart+dir[1].size;@/
@<output the label definitions@>@;
@<output the outline definitions@>@;
hput_definitions_end();
}
@
@<output the label definitions@>=
for (n=0; n<=max_ref[label_kind]; n++)@/
{ Label *l=labels+n;
uint32_t pos;
if (l->used)@/
{ pos=hpos++-hstart;
hput_tags(pos,hput_label(n,l));
if (l->where==LABEL_UNDEF)
MESSAGE("WARNING: Label *%d is used but not defined\n",n);
else
DBG(DBGDEF|DBGLABEL,"Label *%d defined 0x%x\n",n,pos);@/
}
else
{ if (l->where!=LABEL_UNDEF)
{ pos=hpos++-hstart;
hput_tags(pos,hput_label(n,l));
DBG(DBGDEF|DBGLABEL,"Label *%d defined but not used 0x%x\n",n,pos);@/
}
}
}
@
Links are simpler than labels. They are found only in the
content section and resemble pretty much what we have seen for other
content nodes. Let's look at them next.
When reading a short format link node,
we use again the |b001| info bit to indicate a 16 bit reference
number to a label.
To help a reader tell a link from ordinary text, links should be
visually different. This is supported in the \HINT\ file format
by associating a different color scheme with a link.
In the short format, the |b100| bit indicates that a color set reference
(see section~\secref{colors}) follows after the label reference.
A color reference to 1 in the start node and to |0xFF| in the end node
is the default and is omitted.
Because color changes are local to the enclosing box or paragraph,
a link is local as well. Here and
in the following, when we say ``box'' we also mean ``paragraph''.
A link starts with a ``start'' link and ends with either an ``end''
link or the end of the enclosing box. Links must not be nested.
It is an error to have two start links in the same box without
an end link between them.
An application can choose to continue a link in the next box
by inserting a copy of the start link node at the beginning of
the new box.
In short: ``end'' links are mandatory when separating two links
but optional if they just precede the end of the box.
The |b010| info bit indicates a ``start'' link;
otherwise it is an ``end'' link.
\gdef\subcodetitle{Links}
\getcode
@<get macros@>=
#define @[HGET_LINK(I)@] @/\
{ int n,c; if (I&b001) HGET16(n);@+ else n=HGET8;\
if (I&b100) c=HGET8; else c=(I&b010)?1:0xFF;\
hwrite_link(n,c,I&b010); @+}
@
@<cases to get content@>=
@t\1\kern1em@>
case TAG(link_kind,b000): @+ HGET_LINK(b000);@+ break;
case TAG(link_kind,b001): @+ HGET_LINK(b001);@+ break;
case TAG(link_kind,b010): @+ HGET_LINK(b010);@+ break;
case TAG(link_kind,b011): @+ HGET_LINK(b011);@+ break;
case TAG(link_kind,b100): @+ HGET_LINK(b100);@+ break;
case TAG(link_kind,b101): @+ HGET_LINK(b101);@+ break;
case TAG(link_kind,b110): @+ HGET_LINK(b110);@+ break;
case TAG(link_kind,b111): @+ HGET_LINK(b111);@+ break;
@
The function |hput_link| will insert the link in the output stream and return
the appropriate tag.
\putcode
@<put functions@>=
Tag hput_link(int n, int c, int on)
{ Info i;
REF_RNG(label_kind,n);
labels[n].used=true;
if (on) i=b010;@+ else i=b000;
if (n>0xFF) { i|=b001; HPUT16(n);@+} @+else HPUT8(n);
if ((on && c!=1) ||(!on && c!=0xFF)) { i|=b100; HPUT8(c); }
return TAG(link_kind,i);
}
@
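The rules for the three info bits of a link node can be summarized in a small function that
computes the info value without writing any bytes. This is a sketch for illustration: the
constants |B001|, |B010|, and |B100| are stand-ins for the info bits of this document, with the
numeric values assumed here, and |link_info| is a hypothetical name.

```c
#include <assert.h>

/* Stand-ins for the info bits b001, b010, and b100 (values assumed). */
enum { B001=1, B010=2, B100=4 };

/* Compute the info bits of a link node.
   n:  reference number of the label (at most 16 bit)
   c:  reference number of the color set (one byte)
   on: nonzero for a start link, zero for an end link */
unsigned link_info(int n, int c, int on)
{ unsigned i= on ? B010 : 0;
  assert(n>=0 && n<=0xFFFF && c>=0 && c<=0xFF);
  if (n>0xFF) i|=B001; /* two byte label reference */
  /* the defaults, 1 for a start link and 0xFF for an end link, are omitted */
  if ((on && c!=1) || (!on && c!=0xFF)) i|=B100;
  return i;
}
```

A start link to label 5 using the default color set 1 needs only the |B010| bit, while a start
link to label 300 with color set 7 sets all three bits.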
\readcode
@s LINK symbol
@<symbols@>=
%token LINK "link"
@
@<scanning rules@>=
::@=link@> :< return LINK; >:
@
@<parsing rules@>=
content_node:start LINK REFERENCE on_off END
{@+ hput_tags($1,hput_link($3,$4?1:0xFF,$4));@+ };
content_node:start LINK REFERENCE on_off REFERENCE END
{@+ hput_tags($1,hput_link($3,$5,$4));@+ };
@
\writecode
@<write functions@>=
void hwrite_link(int n, int c, uint8_t on)
{ REF_RNG(label_kind,n);
if (labels[n].where==LABEL_UNDEF)
MESSAGE("WARNING: Link to an undefined label %d\n",n);
hwrite_ref(n);
if (on) hwritef(" on");
else hwritef(" off");
if ((on && c!=1)||(!on && c!=0xFF))
{ REF_RNG(color_kind,c); hwrite_ref(c); }
}
@
Now we look at the
outline nodes which are found only in the definition section.
Every outline node is associated with a label node, giving the position in the
document, and a unique title that should tell the user
what to expect when navigating to this position. For example
an item with the title ``Table of Contents'' should navigate
to the page that shows the table of contents.
The sequence of outline nodes found in the definition section
gets a tree structure by assigning to each item a depth level.
@<hint types@>=
typedef struct {@+
uint8_t *t; /* title */
int s; /* title size */
int d; /* depth */
uint16_t r; /* reference to a label */
} Outline;
@
@<shared put variables@>=
Outline *outlines;
@
@<initialize definitions@>=
if (max_outline>=0)@/
ALLOCATE(outlines,max_outline+1,Outline);
@
Child items follow their parent item and have a bigger depth level.
In the short format, the first item must be a root item, with
a depth level of 0. Further, if any item has the depth $d$, then the
item following it must have either the same depth $d$ in which
case it is a sibling, or the depth $d+1$ in which case it is a child,
or a depth $d^\prime$ with $0\le d^\prime<d$ in which case it is a sibling
of the latest ancestor with depth $d^\prime$. Because the depth is
stored in a single byte, the maximum depth is |0xFF|.
In the long format, the depth assignments are more flexible.
We allow any signed integer, but insist that the depth
assignments can be compressed to depth levels for the
short format using the following algorithm:
@<compress long format depth levels@>=
n=0;@+
while (n<=max_outline)
n=hcompress_depth(n,0);
@
Outline items must be listed in the order
in which they should be displayed.
The function |hcompress_depth(n,c)| will compress the subtree starting at
|n| with root level |d| to a new tree with the same structure
and root level |c|. It returns the outline number of the
following subtree.
@<put functions@>=
int hcompress_depth(int n, int c)
{ int d=outlines[n].d;
if (c>0xFF)
QUIT("Outline %d, depth level %d to %d out of range",n,d,c);
while (n<=max_outline)
if (outlines[n].d==d)
outlines[n++].d=c;
else if (outlines[n].d>d)
n=hcompress_depth(n,c+1);
else break;
return n;
}
@
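A worked example may help to see what the compression does. The following stand-alone sketch
applies the same recursion to a plain array of depth values instead of the |outlines| array;
the names |compress_depth| and |compress_all| are hypothetical, and the range error is reported
with an assertion instead of |QUIT|.

```c
#include <assert.h>

/* Compress the subtree starting at d[n] to root level c.
   max is the index of the last entry.
   Returns the index of the following subtree. */
int compress_depth(int *d, int max, int n, int c)
{ int root;
  assert(n>=0 && n<=max);
  assert(c<=0xFF); /* the depth must fit into a single byte */
  root=d[n];
  while (n<=max)
    if (d[n]==root) d[n++]=c; /* a sibling of the root */
    else if (d[n]>root) n=compress_depth(d,max,n,c+1); /* a child subtree */
    else break; /* this entry belongs to an ancestor */
  return n;
}

/* Compress all depth values in d[0..max]. */
void compress_all(int *d, int max)
{ int n=0;
  while (n<=max) n=compress_depth(d,max,n,0);
}
```

Applied to the long format depth values $-3$, 5, 5, 7, $-3$, 0, the result is the sequence
0, 1, 1, 2, 0, 1, which satisfies the rules of the short format.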
For an outline node, the |b001| bit indicates a two byte reference to a label.
There is no reference number for an outline item itself:
it is never referenced anywhere in a \HINT\ file.
\gdef\subcodetitle{Outlines}
\vbox{\getcode\vskip -\baselineskip\writecode}
@<get and write an outline node@>=
{ int r,d;
List l;
static int outline_no=-1;
hwrite_start();@+hwritef("outline");
++outline_no;
RNG("outline",outline_no, 0, max_outline);
if (i&b001) HGET16(r);@+ else r=HGET8;
REF_RNG(link_kind,r);
if (labels[r].where==LABEL_UNDEF)@/
MESSAGE("WARNING: Outline with undefined label %d at 0x%x\n",@|r, node_pos);
hwritef(" *%d",r);@/
d=HGET8; hwritef(" %d",d);@/
hget_list(&l);hwrite_list(&l);@/
hwrite_end();
}
@
When parsing an outline definition in the long format,
we parse the outline title as a |list| which will
write the representation of the list to the output stream.
Writing the outline definitions, however, must be postponed
until the labels have found their way into the definition
section. So we save the list's representation in the
outline node for later use and remove it again from the
output stream.
\readcode
@s OUTLINE symbol
@<symbols@>=
%token OUTLINE "outline"
@
@<scanning rules@>=
::@=outline@> :< return OUTLINE; >:
@
@<parsing rules@>=
def_node: START OUTLINE REFERENCE integer position list END {
static int outline_no=-1;
$$.k=outline_kind; $$.n=$3;
if ($6.s==0) QUIT("Outline with empty title in line %d",yylineno);
outline_no++;
hset_outline(outline_no,$3,$4,$5);
};
@
@<put functions@>=
void hset_outline(int m, int r, int d, uint32_t pos)
{ Outline *t;
RNG("Outline",m,0,max_outline);
t=outlines+m;
REF_RNG(label_kind,r);
t->r=r;
t->d=d;
t->s=hpos-(hstart+pos);
hpos=(hstart+pos);
ALLOCATE(t->t,t->s,uint8_t);
memmove(t->t,hpos,t->s);
labels[r].used=true;
}
@
To output the title, we need to move the list back to the output stream.
Before doing so, we allocate space (and make sure there is room left for the
end tag of the outline node), and after doing so, we release
the memory used to save the title.
@<output the title of outline |*t|@>=
memmove(hpos,t->t,t->s);
hpos=hpos+t->s;
free(t->t);
@
We output all outline definitions from 0 to |max_outline| and
check that every one of them has a title. Thereby we make sure
that in the short format |max_outline| matches the number of
outline definitions.
\putcode
@<put functions@>=
Tag hput_outline(Outline *t)
{ Info i=b100;
HPUTX(t->s+4);
if (t->r>0xFF) {i|=b001; @+HPUT16(t->r);@+} @+else HPUT8(t->r);
labels[t->r].used=true;
HPUT8(t->d);
@<output the title of outline |*t|@>@;
return TAG(outline_kind,i);
}
@
@<output the outline definitions@>=
@<compress long format depth levels@>@;
for (n=0;n<=max_outline;n++)
{ Outline *t=outlines+n;
uint32_t pos;
pos=hpos++-hstart;
if (t->s==0 || t->t==NULL)
QUIT("Definition of outline %d has an empty title",n);
DBG(DBGDEF|DBGLABEL,"Outline *%d defined\n",n);@/
hput_tags(pos,hput_outline(t));
}
@
\subsection{Colors}\label{colors}
This is the third draft of implementing color\index{color} specifications in
a \HINT\ file.
According to the initial philosophy of a \HINT\ file, a viewer must be
capable of rendering a page given just any valid position in the
content section without reading the entire file. This makes it
impossible to use global information; only the information that is
locally available can be used. Given a file position, the viewer will
compute a representation of the page, insert it into a page template,
and pass it to the renderer. Color will not affect the position of
glyphs or rules and so it is sufficient to process the color
information when rendering the page. The renderer will, however,
render the page always from the top down and from left to right. As a
consequence of the rendering order, it is very well possible to work
with a color state within the top level boxes.
A separate issue is the specification of color changes on the top
level. While a vertical list contains no character nodes, a color
specification might still affect the background color and the
foreground color of rules. Because we still want to avoid the search
for color nodes on the top level, we restrict the scope of a color
node on the top level. It will extend only to the next possible page
break and applications like Hi\TeX\ must repeat a top level color node
after every node that could be used as a page break.
% Some statistics
% format.hnt 4065407 1103094 byte (compressed), 31802 top level content nodes
% Nodes that could be page breaks:
% glue nodes 12346 if the preceding node is a page break
% kern nodes 0 if the next node is a glue.
% penalty nodes 2193 (unless they are infinity or bigger)
% so about 15000 extra color nodes using about 45000 bytes
% could be needed. The extra space required is close to 1 percent
% not counting compression.
% Link nodes 10154 (5077 on and 5077 off nodes)
% Having extra color nodes for links is at most 30462 byte.
% This is less than 1 percent --- again without accounting for compression.
% This could be reduced to 10154 byte if we use the b100 bit
% and include the color ref in the link node - which might be
% a good idea.
The nesting of boxes on a page together with the transparency of colors
leads to the problem of stacking several layers of color one on top
of the other.
Here is an example: An outer box might specify blue as a background color
and white as a foreground color while an inside box specifies a transparent
grey background and a transparent black foreground.
Then we expect text in the outer box to have white letters on a blue background.
Further we want to see the inner box casting a grey shadow on the blue
background, resulting in a mix of blue and grey, with black letters on
top of it that are not completely black but let the background shine through.
To limit the complexity, the \HINT\ file format will allow this stacking
of colors only when nesting boxes. But inside a box, there is at any position
only one foreground and one background color; a color change inside a box
will simply replace the current colors.
If an application like Hi\TeX\ wants
to implement nesting colors inside a box, it has to implement its own color
stack and compute the necessary color mixtures.
There is only one exception to this concept: When a new box starts,
the current colors will be those of the enclosing box. These colors
can be restored after a color change by using the \<color off> command.
The limited complexity is necessary to simplify the splitting of
boxes, for example by the line breaking routine. Repeating the last
color node before the split just after the split is sufficient.
Inside a horizontal list, a background color will extend from
top to bottom; inside a vertical list a background color will extend
from the left edge to the right edge.
If the document does not want to change the background color,
a completely transparent color should be used.
While the current implementation of searching does not use the
background color, a color set will still specify background and foreground
for all colors. This is simpler, easier to extend at a later time,
and the overhead is small.
After these preliminaries let's turn our attention to the design of
a suitable color concept.
%Before the present implementation of colors,
%we used a foreground color and a
%background color; then we had a special foreground color for links;
%and finaly the implementation of searching used special forground
%colors to mark matches and highlight the current search focus. As if
%that was not sufficient, viewers could be switched to ``dark mode'' using
%a different forground and background color, and of course it was
%desirable to adjust the link, mark, and focus color as well.
%
%In the future, a document designer might want to indicate a link using a
%different background color instead or in combination with a special
%foreground color. Similar considerations apply to the colors used for
%searching. By the way: the interaction of link color with mark or
%focus color was an open question; probably mixing the colors might be a
%good solution. But there is another feature that sets links appart
%from the matching text in a search: The start and end of a link is
%already recorded in a \HINT\ file. So an automatic color change might
%be called for.
%In the first redesign, a color set consisted of eight color pairs for
%normal, link, mark, and focus colors in day mode
%and four more color pairs for night mode;
%each pair specifying a background and a foreground color.
%
%It turned out that color switching using three bits in a status
%byte was inconvenient. Further this solution was mixing two different
%causes for a color change: The doument author's requests for a color
%change and the user interface's requests for a color change either
%by switching between day and night mode or by searching.
%While switching beween day and night mode requires the author to
%supply two complete color schemas, searching just needs to change
%the color of glyphs according the the given color schema.
%While the color change for links is always known from the document,
%the color changes due to searching depend on text input at run time.
%So mixing all that together was possibly not the best solution.
%The new and second redesign tries to seperate the different concerns.
Colors come in sets.
A color set supports two modes: day and night mode.
In future extensions it might be possible for
an author to invent color sets for winter or summer, fall or
spring, or any other reasonable or unreasonable purpose.
For each mode a color set specifies three color styles:
one for normal text, one for marked text and one for in-focus text.
The switching between different modes and different styles is
left to the user interface.
We store a color set as an array of 12 words.
The first 6 words are for day mode, the next 6 words are for night mode.
For each mode we have three color pairs and each pair consists
of a foreground and a background color each stored as an RGBA value.
@<hint basic types@>=
typedef uint32_t ColorSet[2*3*2];
@
To extract the various sub-arrays, we have the following macros:
@<hint macros@>=
#define CURCOLOR(M,S,C) ((C)+6*(M)+2*(S))
#define DAY(C) CURCOLOR(0,0,C)
#define NIGHT(C) CURCOLOR(1,0,C)
#define HIGH(C) CURCOLOR(0,1,C)
#define FOCUS(C) CURCOLOR(0,2,C)
#define FG(C) ((C)[0])
#define BG(C) ((C)[1])
@
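The following sketch shows how these macros select one color pair out of the 12 words of a
color set. It repeats the type and the macros from above to be self-contained; the helper
functions |foreground| and |background| are hypothetical and only serve as an illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ColorSet[2*3*2];
#define CURCOLOR(M,S,C) ((C)+6*(M)+2*(S))
#define FG(C) ((C)[0])
#define BG(C) ((C)[1])

/* mode:  0 = day, 1 = night
   style: 0 = normal, 1 = marked, 2 = in focus */
uint32_t foreground(const uint32_t *cs, int mode, int style)
{ assert(mode>=0 && mode<2 && style>=0 && style<3);
  return FG(CURCOLOR(mode,style,cs));
}

uint32_t background(const uint32_t *cs, int mode, int style)
{ assert(mode>=0 && mode<2 && style>=0 && style<3);
  return BG(CURCOLOR(mode,style,cs));
}
```

For example, the pair for in-focus text in night mode occupies the words with index
$6\cdot1+2\cdot2=10$ (foreground) and 11 (background).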
We will allow up to 255 color sets that are stored in the definition
section and are referenced in the content section by a single byte.
The definition of different color sets and the switching between them
is left to the document author.
The color set with reference number zero specifies the default colors.
At the root of a page template, the default color set is selected
and the whole page is filled with the background color for normal text.
For links, by default the color set with number one is used.
Section~\secref{colordefault} specifies default values for both
color sets; the default colors can be overwritten.
The color sets with reference numbers zero and one are not stored in the
definition section of a short format file if they are the same as the default values.
This makes files not using colors compatible with older versions of the \HINT\ file
format.
Now we are ready for the implementation.
\vbox{\readcode\vskip -\baselineskip\putcode}
@s COLOR symbol
@s color symbol
@s color_pair symbol
@s color_tripple symbol
@s color_set symbol
@s color_null symbol
@<symbols@>=
%token COLOR "color"
@
@<scanning rules@>=
::@=color@> :< return COLOR; >:
@
Colors can be specified as a single number, preferably in hexa\-decimal
notation, giving the red, green, blue, and alpha value in a single
number. For example \.{0xFF0000FF} would be pure red,
and \.{0x00FF0080} would be transparent green.
Of course even decimal values can be used. A good example is the
value \.{0} which is equivalent to but a bit shorter than \.{0x0} or
\.{0x00000000} which describes a completely transparent black.
It is invisible because the alpha value is zero.
Alternatively, colors can be given as a list of three or four numbers
enclosed in pointed brackets \.{<} \dots \.{>}.
If only three numbers are given, the color is opaque with an alpha
value equivalent to \.{0xFF}. Using this format
the same colors as before can be written \.{<0xFF 0 0>} (pure red),
\.{<0 0xFF 0 0x80>} (transparent green) and \.{<0 0 0 0>} (transparent black).
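The correspondence between the two notations is a matter of shifting the four components into
one 32 bit word, red first and alpha last. A minimal sketch, with the hypothetical helper names
|rgba| and |rgb|:

```c
#include <assert.h>
#include <stdint.h>

/* Pack red, green, blue, and alpha, each in the range 0 to 0xFF,
   into a single RGBA value. */
uint32_t rgba(unsigned r, unsigned g, unsigned b, unsigned a)
{ assert(r<=0xFF && g<=0xFF && b<=0xFF && a<=0xFF);
  return ((uint32_t)r<<24)|((uint32_t)g<<16)|((uint32_t)b<<8)|(uint32_t)a;
}

/* Three components describe an opaque color: alpha is 0xFF. */
uint32_t rgb(unsigned r, unsigned g, unsigned b)
{ return rgba(r,g,b,0xFF);
}
```

With these helpers, |rgb(0xFF,0,0)| yields |0xFF0000FF|, |rgba(0,0xFF,0,0x80)| yields
|0x00FF0080|, and |rgba(0,0,0,0)| yields 0.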
The parser will put the color definition into |colors_n|
using the index |colors_i|. As we will see later, the |colors_n|
array is initialized with the colors in |colors_0| which in turn
is initialized from |color_defaults[0]|.
|colors_0| can be changed but only if that change occurs before
any other color definition.
@<common variables@>=
ColorSet colors_0, colors_n; /* default and current color set */
int colors_i; /* current color */
@
@<initialize definitions@>=
{ int i;
for (i=0;i<sizeof(ColorSet)/4;i++)
colors_0[i]=color_defaults[0][i];
}
@
@<parsing rules@>=@/
color: START UNSIGNED UNSIGNED UNSIGNED UNSIGNED END@|
{ RNG("red",$2,0,0xFF); RNG("green",$3,0,0xFF);
RNG("blue",$4,0,0xFF); RNG("alpha",$5,0,0xFF);
colors_n[colors_i++]=($2<<24)|($3<<16)|($4<<8)|$5;
}
| START UNSIGNED UNSIGNED UNSIGNED END@|
{ RNG("red",$2,0,0xFF); RNG("green",$3,0,0xFF);
RNG("blue",$4,0,0xFF);
colors_n[colors_i++]=($2<<24)|($3<<16)|($4<<8)|0xFF;
};
color: UNSIGNED { colors_n[colors_i++]=$1;};
@
Colors are always specified in pairs: a foreground color followed by
a background color enclosed in pointed brackets \.{<} \dots \.{>} as
usual. For convenience, the background color can be omitted;
in this case a completely transparent background is assumed.
@<parsing rules@>=@/
color_pair: START color color END
| START color END { colors_n[colors_i++]=0; };
color_unset: { colors_i+=2;};
@
A complete color set consists of six color pairs
organized in two |color_tripple|s:
the first three pairs for normal, mark, and focus text
in day mode are followed by the three pairs in night mode.
The |color_tripple| for night mode is optional; and within
a |color_tripple| all color pairs except the first one are
optional. An omitted color is replaced by the
corresponding color from the color set zero. To make the replacement
process more predictable, the specification of color set zero---if
present---must come first.
If the default color set itself is redefined, an unspecified
color will not change the default color.
To be open to future changes, color set definitions in the short format
will contain after the reference number the number of color pairs that follow.
Currently this value is always six.
@<parsing rules@>=@/
color_tripple: START color_pair color_unset color_unset END
| START color_pair color_pair color_unset END
| START color_pair color_pair color_pair END
;
color_set: color_tripple color_tripple;
color_set: color_tripple color_unset color_unset color_unset;
def_node: start COLOR ref { HPUT8(6); color_init(); } color_set END
{ DEF($$,color_kind,$3); hput_color_def($1,$3); };
@
@<put functions@>=
void color_init(void)
{ int i;
for (i=0;i<sizeof(ColorSet)/4;i++) colors_n[i]=colors_0[i];
colors_i=0;
}
static Tag hput_color_set(int n)
{ static bool first_color=true;
int i;
if (n==0)
{ if (first_color)
for (i=0;i<sizeof(ColorSet)/4;i++) colors_0[i]=colors_n[i];
else
QUIT("Redefinition of color set 0 must be the first color definition");
}
first_color=false;
HPUTX(sizeof(ColorSet)+1);
for (i=0;i<sizeof(ColorSet)/4;i++) HPUT32(colors_n[i]);
return TAG(color_kind,b000);
}
@
The function |hput_color_def| checks if color sets zero or one need to be written.
If not, the function will reset |hpos| to undo the writing of the tag
and the number of colors in the set.
@<put functions@>=
static bool colors_equal(ColorSet a, ColorSet b)
{ int i;
for (i=0;i<sizeof(ColorSet)/4;i++)
if (a[i]!=b[i]) return false;
return true;
}
void hput_color_def(uint32_t pos, int n)
{ if ((n==0 && colors_equal(color_defaults[0], colors_n)) ||
(n==1 && colors_equal(color_defaults[1], colors_n)))
{ hpos=hstart+pos;
return;
}
hput_tags(pos,hput_color_set(n));
}
@
Compared to the definitions, the content nodes are pretty simple.
The special color number |0xFF| is reserved to indicate an
\<color off> node in the short format.
@<parsing rules@>=
content_node: start COLOR ref END
{REF_RNG(color_kind,$3); hput_tags($1,TAG(color_kind,b000));};
content_node: start COLOR OFF END
{ HPUT8(0xFF); hput_tags($1,TAG(color_kind,b000));};
@
\vbox{\writecode\vskip -\baselineskip\getcode}
We continue with the color content nodes:
@<cases to get content@>=
@t\1\kern1em@>
case TAG(color_kind,b000):
{ uint8_t n=HGET8;@+
if (n==0xFF) hwritef(" off");
else { REF(color_kind,n); @+hwrite_ref(n);@+}
}
break;
@
And now we turn to the color definitions:
@<get functions@>=
void hwrite_color_pair(uint32_t f, uint32_t b)
{ hwritec('<');
if (f==0) hwritec('0'); else hwritef("0x%08X",f);
if (b!=0) hwritef(" 0x%08X",b);
hwritec('>');
}
void hget_color_set(uint32_t node_pos, ColorSet cs)
{ int i,m;
for (i=0;i<sizeof(ColorSet)/4;i++)
HGET32(cs[i]);
for(m=0;m<2;m++)
{ uint32_t *c, *d;
bool diff_high, diff_focus;
if (m==0)
{ c=cs; d=color_defaults[0]; }
else
{ c=NIGHT(cs); d=NIGHT(color_defaults[0]);
if (memcmp(c,d,sizeof(ColorSet)/2)==0)
return;
}
hwrite_start();
diff_high=FG(HIGH(c))!=FG(HIGH(d))|| BG(HIGH(c))!=BG(HIGH(d));
diff_focus=FG(FOCUS(c))!=FG(FOCUS(d))||BG(FOCUS(c))!=BG(FOCUS(d));
hwrite_color_pair(FG(c),BG(c));
if (diff_high || diff_focus)
{ hwritec(' '); hwrite_color_pair(FG(HIGH(c)),BG(HIGH(c)));}
if (diff_focus)
{ hwritec(' '); hwrite_color_pair(FG(FOCUS(c)),BG(FOCUS(c)));}
hwrite_end();
}
}
@
@<cases to get definitions for |color_kind|@>=
case b000:
{ int k;
ColorSet c;
static bool first_color=true;
k=HGET8;
if (k<6)
QUIT("Definition %d of color set needs 6 color pairs only %d given\n",n,k);
hget_color_set(node_pos,c);
if (n==0)
{ if (!first_color)
QUIT("Definition of color set zero must be first");
memcpy(&color_defaults[0],&c,sizeof(ColorSet));
}
first_color=false;
}
break;
@
\subsection{Rotation}
When it comes to rotation, there is a big difference between printed books and
computer displays. For example, if a book contains a table that is rotated
to fill a page in landscape mode, the reader can rotate the book and read
the table. If you are looking at the same page displayed on a
big computer monitor, you will most likely not turn the whole monitor.
Instead your viewing application will be able to perform the rotation for you
before displaying the page. A smart phone, on the other hand, is
easy to turn. But very likely, it will try to be smart and rerender the
content on the display to keep the same orientation.
Occasionally, however, rotation of text is a desirable feature. For example,
if a table has lots of tall columns with lengthy column headers, it might be
useful to rotate the column headers in order to keep the column width within
reasonable limits.
A simple solution therefore would be optional parameters for boxes
specifying center and angle for rotating the box.
\subsection{Unknown Extensions}
Starting with the inclusion in the \TeX\ Live 2022 distribution, the \HINT\ file format
became accessible to a wider audience which brought the constant rewrite and upgrade cycle
to a sudden halt. Except for bug fixes, pretty much nothing happened for about a year.
When the \TeX\ Live 2023 distribution started to appear on the horizon, one extension
that I had on my wish-list already for a long time---the support of \TeX's
\\{vtop} boxes---was definitely due for implementation.
Adding new tag bytes to the specification of the short file format will, however,
invalidate all \HINT\ file viewers and requires everybody to upgrade the viewing
application. Because the \HINT\ file format is still in its infancy, more
such additions are to be expected and the new version 2.0 file format needs a way to handle
such yet unknown extensions gracefully. For this purpose the definition section
now may specify additional entries for the |hnode_size| array.
All \HINT\ file viewers starting with version 2.0 will use these entries to skip
unknown nodes and display the remaining content of \HINT\ files.
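How a viewer can skip a node it does not understand is sketched below
under simplifying assumptions: the hypothetical table |size_table| holds,
for each tag, the total size of the node including start and end byte, or
zero if the tag is unknown; trailing nodes, which the real |hnode_size|
entries can also describe, are ignored here. The function name
|skip_fixed_node| is likewise an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* Return the position just after the node that starts at pos,
   or 0 if the node cannot be skipped. */
static size_t skip_fixed_node(const uint8_t *buf, size_t len, size_t pos,
                              const uint8_t size_table[256])
{ uint8_t tag, size;
  if (pos>=len) return 0;
  tag=buf[pos];
  size=size_table[tag];
  if (size<2 || size>len-pos) return 0; /* no size known, or node truncated */
  if (buf[pos+size-1]!=tag) return 0;   /* the end byte must repeat the start byte */
  return pos+size;
}
```

Because a valid node occupies at least two bytes, the return value 0
can safely signal failure.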
\readcode
In the long format, unknown nodes, whether in the definition or the content section,
start with the keyword \\{unknown}.
@s UNKNOWN symbol
@<symbols@>=
%token UNKNOWN "unknown"
@
@<scanning rules@>=
::@=unknown@> :< return UNKNOWN; >:
@
In the definition section, the keyword is followed by the tag and the length of the initial
part of the node (not counting the start byte), after which optionally follows the number of trailing nodes embedded
in the unknown node. There is no need for a maximum value, because the information
is stored directly in the |hnode_size| array.
@<parsing rules@>=
def_node: start UNKNOWN UNSIGNED UNSIGNED END { hput_tags($1,hput_unknown_def($3,$4,0));@+}
| start UNKNOWN UNSIGNED UNSIGNED UNSIGNED END { hput_tags($1,hput_unknown_def($3,$4,$5));@+};
@
In the content section, the keyword is followed by the tag value, the remaining byte values
belonging to the initial part and the nodes belonging to the trailing part.
The end byte, which is equal to the start byte, is omitted from the long format.
@<symbols@>=
%type <u> unknown_bytes
%type <u> unknown_nodes
@
@<parsing rules@>=
content_node: start UNKNOWN UNSIGNED unknown_bytes unknown_nodes END { hput_tags($1,hput_unknown($1,$3,$4,$5));@+}
unknown_bytes: {$$=0;} | unknown_bytes UNSIGNED { RNG("byte",$2,0,0xFF); HPUT8($2); $$=$1+1; };
unknown_node: content_node | xdimen_node | list | named_param_list;
unknown_nodes: {$$=0;} | unknown_nodes unknown_node { RNG("unknown subnodes",$1,0,3); $$=$1+1; };
@
\putcode
In the short format, definitions for unknown nodes are marked with |TAG(unknown_kind,b100)|.
This tag is not used elsewhere (see also page \pageref{FC}).
We do not check for multiple definitions of the same
tag, but only the first of them is considered valid.
After the start byte follows the unknown tag and the corresponding entry in the |hnode_size| array.
@<put functions@>=
uint32_t hput_unknown_def(uint32_t t, uint32_t b, uint32_t n)
{ if (n==0)
{ RNG("unknown tag",t,TAG(param_kind,7)+1,TAG(int_kind,0)-1);
RNG("unknown initial bytes",b,0,0x7F-2);
HPUT8(t);
HPUT8(b+2); /* adding start and end byte */
if (hnode_size[t]==0)
{ hnode_size[t]=NODE_SIZE(b,0);
DBG(DBGTAGS,"Defining unknown node size %d,%d for tag 0x%x\n",b,n,t);
}
}
else
{ int i;
RNG("unknown tag",t,TAG(param_kind,7)+1,TAG(int_kind,0)-1);
RNG("unknown initial bytes",b,0,0x1F-1);
RNG("unknown trailing nodes",n,1,4);
HPUT8(t);
i=NODE_SIZE(b,n);
HPUT8(i);
if (hnode_size[t]==0)
{ hnode_size[t]=i;
DBG(DBGTAGS,"Defining unknown node size %d,%d for tag 0x%x\n",b,n,t);
}
}
return TAG(unknown_kind,b100);
}
@
In the content section, the unknown nodes are of course marked with their unknown tag.
@<put functions@>=
Tag hput_unknown(uint32_t pos,uint32_t t, uint32_t b, uint32_t n)
{ int s;
RNG("unknown tag",t,TAG(param_kind,7)+1,TAG(int_kind,0)-1);
if (n==0)
{ RNG("unknown initial bytes",b,0,0x7F-2);
s=NODE_SIZE(b,0);
}
else
{ RNG("unknown initial bytes",b,0,0x1F-2);
RNG("unknown trailing nodes",n,1,4);
s=NODE_SIZE(b,n);
}
DBG(DBGTAGS,"Adding unknown node size %d,%d tag 0x%x at 0x%x\n",b,n,t,pos);
if (hnode_size[t]!=s)
QUIT(@["Size %d of unknown node [%s,%d] at " SIZE_F " does not match %d\n"@],s,NAME(t),INFO(t),hpos-hstart,hnode_size[t]);
return (Tag)t;
}
@
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<get functions@>=
void hget_unknown_def(void)
{ Tag t; signed char i, b=0, n=0;
t=HGET8;
i=HGET8;
if (i==0)
QUIT("Zero not allowed for unknown node size at 0x%x\n",(uint32_t)(hpos-hstart-2));
hwrite_start(); @+hwritef("unknown");
b=NODE_HEAD(i); n=NODE_TAIL(i);
if (n==0)
hwritef(" 0x%02X %d",t,b);
else
hwritef(" 0x%02X %d %d",t,b, n);
if (hnode_size[t]==0)
{ hnode_size[t]=i;
DBG(DBGTAGS,"Defining node size %d,%d for tag 0x%x\n",b,n,t);
}
hwrite_end();
}
@
The |hget_unknown| function tries to process an unknown node with the help of
an entry in the |hnode_size| array. The definition section can be used to provide
this extra information. If successful, the function returns 1; otherwise it returns 0.
@<get functions@>=
int hget_unknown(Tag a)
{ int b, n;
int8_t s;
s = hnode_size[a];
DBG(DBGTAGS,"Trying unknown tag 0x%x at 0x%x\n",a,(uint32_t)(hpos-hstart-1));
if (s==0) return 0;
b=NODE_HEAD(s); n=NODE_TAIL(s);
DBG(DBGTAGS,"Trying unknown node size %d %d\n",b,n);
hwritef("unknown 0x%02X", a);
while (b>0) { a=HGET8; hwritef(" 0x%02X",a); b--;}
while (n>0) {
a=*hpos;
if (KIND(a)==xdimen_kind) { Xdimen x; hget_xdimen_node(&x); @+hwrite_xdimen_node(&x); }
else if (KIND(a)==param_kind) { List l; @+hget_param_list(&l);@+ hwrite_named_param_list(&l); @+}
else if (KIND(a)<=list_kind) { List l; @+hget_list(&l);@+ hwrite_list(&l); @+}
else hget_content_node();
n--; }
return 1;
}
@
\section{Replacing \TeX's Page Building Process}
\TeX\ uses an output\index{output routine} routine to finalize the page. It uses the accumulated material
from the page builder, found in {\tt box255}, attaches headers, footers, and floating material
like figures, tables, and footnotes. The latter material is specified by insert nodes
while headers and footers are often constructed using mark nodes.
Running an output routine requires the full power of the \TeX\ engine and will not be
part of the \HINT\ viewer. Therefore, \HINT\ replaces output routines by page templates\index{template}.
As \TeX\ can use different output routines for different parts of a book---for example
the index might use a different output routine than the main body of text---\HINT\
will allow multiple page templates. To support different output media, the page
templates will be named and a suitable user interface may offer the user a selection
of possible page layouts. In this way, the page layout remains in the hands of the
book designer, and the user still has the opportunity to pick a layout that best fits
the display device.
\TeX\ uses insertions to describe floating content that is not necessarily displayed
where it is specified. Three examples may illustrate this:
\itemize
\item Footnotes\footnote*{Like this one.} are specified in the middle of the text but are displayed at the
bottom of the page. Several
footnotes\index{footnote} on the same page are collected and displayed together. The
page layout may specify a short rule to separate footnotes from the
main text, and if there are many short footnotes, it may use two columns
to display them. In extreme cases, the page layout may demand a long
footnote to be split and continued on the next page.
\item Illustrations\index{illustration} may be displayed exactly where specified if there is enough
room on the page, but may move to the top of the page, the bottom of the page,
the top of next page, or a separate page at the end of the chapter.
\item Margin notes\index{margin note} are displayed in the margin on the same page starting at the top
of the margin.
\enditemize
\HINT\ uses page templates and content streams to achieve similar effects.
But before I describe the page building\index{page building} mechanisms of \HINT, let me summarize \TeX's page builder.
\TeX's page builder ignores leading glue\index{glue}, kern\index{kern}, and penalty\index{penalty} nodes until the first
box\index{box} or rule\index{rule} is encountered;
whatsit\index{whatsit node} nodes do not really contribute anything to a page; mark\index{mark node} nodes are recorded for later use.
Once the first box, rule, or insert\index{insert node} arrives, \TeX\ makes copies of all parameters
that influence the page building process and uses these copies. These parameters
are the |page_goal| and the |page_max_depth|. Further, the variables
|page_total|, |page_shrink|, |page_stretch|, |page_depth|,
and {\it insert\_pe\-nal\-ties\/} are initialized to zero.
The top skip\index{top skip} adjustment is made
when the first box or rule arrives---possibly after an insert.
Now the page builder accumulates material: normal material goes into {\tt box255}\index{box 255} and will change |page_total|, |page_shrink|,
|page_stretch|, and |page_depth|. The latter is adjusted so that
it does not exceed |page_max_depth|.
The handling of inserts\index{insert node} is more complex.
\TeX\ creates an insert class using \.{newinsert}. This reserves a number $n$
and four registers: {\tt box\hair$n$} for the inserted material, {\tt count\hair$n$} for the
magnification factor $f$, {\tt dimen\hair$n$} for the maximum size per page $d$, and {\tt skip\hair$n$} for the
extra space needed on a page if there are any insertions of class $n$.
For example plain \TeX\ allocates $n=254$ for footnotes\index{footnote} and sets
{\tt count254} to~$1000$, {\tt dimen254} to 8in, and {\tt skip254} to {\tt \BS bigskipamount}.
An insertion node will specify the insertion class $n$, some vertical material,
its natural height plus depth $x$, a {\it split\-\_top\-\_skip}, a {\it split\-\_max\_depth},
and a {\it floa\-ting\-\_pe\-nal\-ty}.
Now assume that an insert node with subtype 254 arrives at the page builder.
If this is the first such insert, \TeX\ will decrease the |page_goal|
by the width of skip254 and add its stretchability and shrinkability
to the total stretchability and shrinkability of the page. Later,
the output routine will add some space and the footnote rule to fill just that
much space and add just that much shrinkability and stretchability to the page.
Then \TeX\ will normally add the vertical material in the insert node to
box254 and decrease the |page_goal| by $x\times f/1000$.
Special processing is required if \TeX\ detects that there is not enough space on
the current page to accommodate the complete insertion.
If a previous insert already did not fit on the page, the |floating_penalty|
given in the insert node is simply added to the total |insert_penalties|.
Otherwise \TeX\ will test that the total natural height plus depth of box254
including $x$ does not exceed the maximum size $d$ and that the
$|page_total| + |page_depth| + x\times f/1000 - |page_shrink| \le |page_goal|$.
If one of these tests fails, the current insertion
is split in such a way as to make the size of the remaining insertions just pass the tests
stated above.
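The two tests can be written down as a small predicate. The following is
only a sketch: it uses floating point values where \TeX\ uses scaled
integers, and the type |PageStateSketch| and the function |insert_fits|
are hypothetical names introduced for this illustration.

```c
#include <stdbool.h>

/* The page builder variables that enter the test. */
typedef struct {
  double page_goal, page_total, page_depth, page_shrink;
} PageStateSketch;

/* box_size: natural height plus depth already collected in the insertion box
   x: natural height plus depth of the arriving insertion
   f: magnification factor of the insertion class
   d: maximum size per page of the insertion class */
static bool insert_fits(const PageStateSketch *p, double box_size,
                        double x, int f, double d)
{ bool size_ok = box_size + x <= d;
  bool page_ok = p->page_total + p->page_depth + x*f/1000.0 - p->page_shrink
                 <= p->page_goal;
  return size_ok && page_ok;
}
```

With a magnification factor of 500, as for a two column arrangement, an
insertion counts only half its size against the page, so it may fit where
the same insertion with a factor of 1000 would have to be split.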
Whenever a glue node, or penalty node, or a kern node that is followed by glue arrives
at the page builder, it rates the current position as a possible end of the page based on
the shrinkability of the page and the difference between |page_total| and |page_goal|.
As the page fills, the page breaks tend to become better and better until the
page starts to get overfull and the page breaks get worse and worse until
they reach the point where they become |awful_bad|. At that point,
the page builder returns to the best page break found so far and fires up the
output routine.
Let's look next at the problems that show up when implementing a replacement mechanism for \HINT.
\enumerate
\item
An insertion node can not always specify its height $x$ because insertions may contain paragraphs that need
to be broken into lines and the height of a paragraph depends in some nonobvious way on
its width.
\item
Before the viewer can compute the height $x$, it needs to know the width of the insertion. Just imagine
displaying footnotes in two columns or setting notes in the margin. Knowing the width, it
can pack the vertical material and derive its height and depth.
\item
\TeX's plain format provides an insert macro that checks whether there is still space
on the current page, and if so, it creates a contribution to the main text body, otherwise it
creates a topinsert. Such a decision needs to be postponed to the \HINT\ viewer.
\item
\HINT\ has no output routines that would specify something like the space and the rule preceding the footnote.
\item
\TeX's output routines have the ability to inspect the content of the boxes,
split them, and distribute the content over the page.
For example, the output routine for an index set in two column format might
expect a box containing index entries up to a height of $2\times\.{vsize}$.
It will split this box in the middle and display the top part in the left
column and the bottom part in the right column. With this approach, the
last page will show two partly filled columns of about equal size.
\item
\HINT\ has no mark nodes that could be used to create page headers or footers.
Marks, like output routines, contain token lists and need the full \TeX\ interpreter
for processing them. Hence, \HINT\ does not support mark nodes.
\endenumerate
Here now is the solution I have chosen for \HINT:
Instead of output routines, \HINT\ will use page templates.
Page templates are basically vertical boxes with placeholders marking the
positions where the content of the box registers, filled by the page builder,
should appear.
To output the page, the viewer traverses the page template,
replaces the placeholders by the appropriate box content, and
sets the glue. Inside the page template, we can use insert nodes to act
as placeholders.
It is only natural to treat the page's main body, the
inserts, and the marks using the same mechanism. We call this
mechanism a content stream\index{stream}.
Content streams are identified by a stream number in the range 0 to 254;
the number 255 is used to indicate an invalid stream number.
The stream number 0 is reserved for the main content stream; it is always defined.
Besides the main content stream, there are four types of streams:
\itemize
\item normal streams correspond to \TeX's inserts and accumulate content on the page,
\item first\index{first stream} streams correspond to \TeX's first marks and will contain only the first insertion of the page,
\item last\index{last stream} streams correspond to \TeX's bottom marks and will contain only the last insertion of the page, and
\item top\index{top stream} streams correspond to \TeX's top marks. Top streams are not yet implemented.
\enditemize
Nodes from the content section are considered contributions to stream 0 except
for insert nodes which will specify the stream number explicitly.
If the stream is not defined or is not used in the current page template, its content is simply ignored.
The page builder needs a mechanism to redirect contributions from one content
stream to another content stream based on the availability of space.
Hence a \HINT\ content stream can optionally specify a preferred stream number,
where content should go if there is still space available, a next stream number,
where content should go if the present stream has no more space available, and
a split ratio if the content is to be split between these two streams before
filling in the template.
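The redirection between streams might be sketched like this. The sketch
covers only the case without a split ratio; the type |StreamSketch|, its
field |room| for the space still available, and the function
|route_contribution| are assumptions made for the illustration and do not
occur in the \HINT\ sources.

```c
#include <stdint.h>

#define NO_STREAM 255 /* the invalid stream number */

typedef struct {
  uint8_t preferred, next; /* stream numbers or NO_STREAM */
  double room;             /* hypothetical: space still available */
} StreamSketch;

/* Return the stream that receives a contribution of the given size
   addressed to stream cur. The result is NO_STREAM if nothing has room
   and no next stream is given; the caller must then decide what to do.
   The array s must cover every stream number that is referenced. */
static uint8_t route_contribution(const StreamSketch s[], uint8_t cur,
                                  double size)
{ uint8_t p=s[cur].preferred, n=s[cur].next;
  if (p!=NO_STREAM && s[p].room>=size) return p;
  if (s[cur].room>=size) return cur;
  return n;
}
```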
Various stream parameters govern the treatment of contributions to the stream
and the page building process.
\itemize
\item The magnification factor $f$: Inserting a box of height $h$ to this stream will contribute $h\times f/1000$
to the height of the page under construction. For example, a stream
that uses a two column format will have an $f$ value of 500; a stream
that specifies notes that will be displayed in the page margin will
have an $f$ value of zero.
\item The height $h$: The extended dimension $h$ gives the maximum height this
stream is allowed to occupy on the current page.
To continue the previous example, a stream that will be split into two columns
will have $h=2\cdot\.{vsize}$, and a stream that specifies
notes that will be displayed in the page margin will have
$h=1\cdot\.{vsize}$. You can restrict the amount of space occupied by
footnotes to the bottom quarter by setting the corresponding $h$ value
to $h=0.25\cdot\.{vsize}$.
\item The depth $d$: The dimension $d$ gives the maximum depth this
stream is allowed to have after formatting.
\item The width $w$: The extended dimension $w$ gives the width of this stream
when formatting its content. For example margin notes
should have the width of the margin less some surrounding space.
\item The ``before'' list $b$: If there are any contributions to this
stream on the current page, the material in list $b$
is inserted {\it before\/} the material from the stream itself. For
example, the short line that separates the footnotes from the main
page will go, together with some surrounding space, into the list~$b$.
\item The top skip glue $g$: This glue is inserted between the material
from list $b$ and the first box of the stream, reduced
by the height of the first box. Hence it specifies the distance between
the material in $b$ and the first baseline of the stream content.
\item The ``after'' list $a$: The list $a$ is treated like list $b$ but
its material is placed {\it after\/} the material from the stream itself.
\item The ``preferred'' stream number $p$: If $p\ne 255$, it is the number of
the {\it preferred\/} stream. If stream $p$ still has
enough room to accommodate the current contribution, move the
contribution to stream $p$, otherwise keep it. For example, you can
move an illustration to the main content stream, provided there is
still enough space for it on the current page, by setting $p=0$.
\item The ``next'' stream number $n$: If $n\ne 255$, it is the number of the
{\it next\/} stream. If a contribution can not be
accommodated in stream $p$ nor in the current stream, treat it as an
insertion to stream $n$. For example, you can move contributions to
the next column after the first column is full, or move illustrations
to a separate page at the end of the chapter.
\item The split ratio\index{split ratio} $r$: If $r$ is positive, both $p$ and $n$ must
be valid stream numbers and content is not immediately moved to stream $p$ or $n$ as described before.
Instead the content is kept in the stream itself until the current page is complete.
Then, before inserting the streams into the page template, the content of
this stream is formatted as a vertical box, the vertical box is
split into a top fraction and a bottom fraction in the ratio $r/1000$
for the top and $(1000-r)/1000$ for the bottom, and finally the top
fraction is moved to stream $p$ and the bottom fraction to stream
$n$. You can use this feature, for example, to implement footnotes
arranged in two columns of about equal size by collecting all the
footnotes in one stream and then splitting the footnotes with $r=500$
before placing them on the page into a right and left column. Even
three or more columns can be implemented by cascades of streams using
this mechanism.
\enditemize
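The arithmetic behind the magnification factor $f$ and the split ratio $r$
is simple enough to state as code. The function names below are
hypothetical, and the sketch works with plain heights; a real viewer can
split a vertical box only at a legal breakpoint, so the ratio is met only
approximately.

```c
/* Height that a box of height h adds to the page under construction
   when it is inserted into a stream with magnification factor f. */
static double page_contribution(double h, int f)
{ return h*f/1000.0; }

/* Ideal heights of the top and bottom fraction when a formatted stream
   of total height h is split with ratio r, where r is in 0..1000. */
static void split_by_ratio(double h, int r, double *top, double *bottom)
{ *top = h*r/1000.0;
  *bottom = h - *top;
}
```

With $f=500$ a box of height 100pt contributes 50pt to the page; with
$f=0$, as for margin notes, it contributes nothing. Splitting 300pt of
footnotes with $r=500$ aims at two columns of 150pt each.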
\subsection{Stream Definitions}
\index{stream}
There are four types of streams: normal streams that work like \TeX's inserts;
and first, last, and top streams that work like \TeX's marks.
For the latter types, the long format uses a matching keyword and the
short format the two least significant info bits. All stream definitions
start with the stream number.
In definitions of normal streams, the number is followed, in this order, by
\itemize
\item the maximum insertion height,
\item the magnification factor, and
\item information about splitting the stream.
It consists of: a preferred stream, a next stream, and a split ratio.
An asterisk indicates a missing stream reference; in the
short format the stream number 255 serves the same purpose.
\enditemize
All stream definitions finish with
\itemize
\item the ``before'' list,
\item an extended dimension node specifying the width of the inserted material,
\item the top skip glue,
\item the ``after'' list,
\item and the total height, stretchability, and shrinkability of the material in
the ``before'' and ``after'' list.
\enditemize
A special case is the stream definition for stream 0, the main content stream.
None of the above information is necessary for it, so it is omitted.
Stream definitions, including the definition of stream 0,
occur only inside page template definitions\index{template}
where they occur twice in two different roles:
In the stream definition list, they define properties of the stream
and in the template they mark the insertion point (see section~\secref{page}).
In the latter case, stream nodes just contain the stream number.
Because a template looks like ordinary vertical material,
we would like to use the same functions for parsing it.
But stream definitions are very different from stream content
nodes. To solve the problem for the long format,
the scanner will return two different tokens
when it sees the keyword ``{\tt stream}''.
In the definition section, it will return
|STREAMDEF| and in the content section |STREAM|.
The same problem is solved in the short format
by using the |b100| bit to mark a definition.
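The use of the three info bits for stream tags in the short format can be
summarized by the following sketch. It works on the info value alone and
does not use the macros of the \HINT\ sources; the enumeration and the
function names are hypothetical.

```c
#include <stdbool.h>

/* The two least significant info bits select the type of stream. */
typedef enum {
  SK_NORMAL=0, SK_FIRST=1, SK_LAST=2, SK_TOP=3
} StreamTypeSketch;

/* The b100 bit distinguishes a definition from a content node. */
static bool is_stream_definition(unsigned info)
{ return (info & 4)!=0; }

static StreamTypeSketch stream_type(unsigned info)
{ return (StreamTypeSketch)(info & 3); }
```

An info value of |b110|, for example, denotes the definition of a last
stream. Inside a template, the tag with info |b100| followed by just the
stream number marks an insertion point; it is told apart from a full
definition by its context.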
\goodbreak
\vbox{\readcode\vskip -\baselineskip\putcode}
@s STREAM symbol
@s STREAMDEF symbol
@s TOP symbol
@s FIRST symbol
@s LAST symbol
@s NOREFERENCE symbol
@s stream_type symbol
@s stream_info symbol
@s stream_split symbol
@s stream_link symbol
@s stream_def_node symbol
@s stream_ins_node symbol
@s stream_ref symbol
@<symbols@>=
%token STREAM "stream"
%token STREAMDEF "stream (definition)"
%token FIRST "first"
%token LAST "last"
%token TOP "top"
%token NOREFERENCE "*"
%type <info> stream_type
%type <u> stream_ref
%type <rf> stream_def_node
@
@<scanning rules@>=
::@=stream@> :< if (section_no==1) return STREAMDEF; else return STREAM;@+ >:
::@=first@> :< return FIRST; >:
::@=last@> :< return LAST; >:
::@=top@> :< return TOP; >:
::@=\*@> :< return NOREFERENCE; >:
@
@<parsing rules@>=
stream_link: ref { REF_RNG(stream_kind,$1); } | NOREFERENCE {HPUT8(255);};
stream_split: stream_link stream_link UNSIGNED @/{RNG("split ratio",$3,0,1000); HPUT16($3);};
stream_info: xdimen_node UNSIGNED @/{RNG("magnification factor",$2,0,1000); HPUT16($2);} stream_split;
stream_type: stream_info {$$=0;} |FIRST {$$=1;} @+ | LAST {$$=2;} @+ |TOP {$$=3;} ;
stream_def_node: start STREAMDEF ref stream_type @/
list xdimen_node glue_node list glue_node END @/
{@+ DEF($$,stream_kind,$3); @+ hput_tags($1,TAG(stream_kind,$4|b100));};
stream_ins_node: start STREAMDEF ref END@/
{ RNG("Stream insertion",$3,0,max_ref[stream_kind]); hput_tags($1,TAG(stream_kind,b100));};
content_node: stream_def_node @+ | stream_ins_node;
@
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<get stream information for normal streams@>=
{ Xdimen x;
uint16_t f,r;
uint8_t n;
DBG(DBGDEF,"Defining normal stream %d at " SIZE_F "\n",*(hpos-1),hpos-hstart-2);
hget_xdimen_node(&x); @+hwrite_xdimen_node(&x);
HGET16(f); @+RNG("magnification factor",f,0,1000);@+ hwritef(" %d",f);
n=HGET8; if (n==255) hwritef(" *"); else { REF_RNG(stream_kind,n);@+hwrite_ref(n);@+}
n=HGET8; if (n==255) hwritef(" *"); else { REF_RNG(stream_kind,n);@+hwrite_ref(n);@+}
HGET16(r); RNG("split ratio",r,0,1000); hwritef(" %d",r);
}
@
@<get functions@>=
static bool hget_stream_def(void)
{@+ if (KIND(*hpos)!=stream_kind || !(INFO(*hpos)&b100))
return false;
else
{ Ref df;
@<read the start byte |a|@>@;
DBG(DBGDEF,"Defining stream %d at " SIZE_F "\n",*hpos,hpos-hstart-1);
DEF(df,stream_kind,HGET8);
hwrite_start();@+hwritef("stream");@+@+hwrite_ref(df.n);
if (df.n>0)
{ Xdimen x; @+ List l;
if (INFO(a)==b100) @<get stream information for normal streams@>@;
else if (INFO(a)==b101) hwritef(" first");
else if(INFO(a)==b110) hwritef(" last");
else if (INFO(a)==b111) hwritef(" top");
hget_list(&l);@+ hwrite_list(&l);
hget_xdimen_node(&x); @+hwrite_xdimen_node(&x);
hget_glue_node();@+
hget_list(&l);@+ hwrite_list(&l);@+
hget_glue_node();
}
@<read and check the end byte |z|@>@;
hwrite_end();
return true;
}
}
@
When stream definitions are part of the page template, we call them
stream insertion points.
They contain only the stream reference and
are parsed by the usual content parsing functions.
@<cases to get content@>=
@t\1\kern1em@>
case TAG(stream_kind,b100): {uint8_t n=HGET8;@+ REF_RNG(stream_kind,n); @+hwrite_ref(n); @+ break; @+}
@
\subsection{Stream Content}
Stream\index{stream} nodes occur in the content section where they
must not be inside other nodes except toplevel
paragraph\index{paragraph} nodes. A normal stream node contains in this
order: the stream reference number, the optional stream parameters,
and the stream content. The content is either a vertical box or an
extended vertical box. The stream parameters consist of the
|floating_penalty|, the |split_max_depth|, and the
|split_top_skip|. The parameter list can be given
explicitly or as a reference.
In the short format, the info bits |b010| indicate
a normal stream content node with an explicit parameter list
and the info bits |b000| a normal stream with a parameter list reference.
If the info bit |b001| is set, we have a content node of type top, first,
or last. In this case, the short format has instead of the parameter list
a single byte indicating the type.
These types are not yet implemented.
\goodbreak
\vbox{\readcode\vskip -\baselineskip\putcode}
@s stream symbol
@<symbols@>=
%type <info> stream
@
@<parsing rules@>=
stream: param_list list {$$=b010;}
| param_ref list {$$=b000;};
content_node: start STREAM stream_ref stream END
@/{@+hput_tags($1,TAG(stream_kind,$4)); @+};
@
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<cases to get content@>=
@t\1\kern1em@>
case TAG(stream_kind,b000): HGET_STREAM(b000); @+ break;
case TAG(stream_kind,b010): HGET_STREAM(b010); @+ break;
@
When we read stream numbers, we relax the define-before-use policy.
We just check that the stream number is in the correct range.
\goodbreak
@<get macros@>=
#define @[HGET_STREAM(I)@] @/\
{uint8_t n=HGET8;@+ REF_RNG(stream_kind,n); @+hwrite_ref(n);@+}\
if ((I)&b010) { List l; @+hget_param_list(&l); @+hwrite_param_list(&l); @+} \
else HGET_REF(param_kind);\
{ List l; @+hget_list(&l);@+ hwrite_list(&l); @+}
@
\subsection{Page Template Definitions}\label{page}
A \HINT\ file can define multiple page templates\index{template}. Not only
might an index demand a different page layout than the main body of text,
but the front page or the chapter headings might also use their own page templates.
Further, the author of a \HINT\ file might define a two column format as
an alternative to a single column format to be used if the display area
is wide enough.
To help in selecting the right page template, page template definitions start with
a name and an optional priority\index{priority}; the default priority is 1.
The names might appear in a menu from which the user
can select a page layout that best fits her taste.
Without user interaction, the
system can pick the template with the highest priority. Of course,
a user interface might provide means to alter priorities. Future
versions might include sophisticated feature-vectors that
identify templates that are good for large or small displays,
landscape or portrait mode, etc \dots
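Selecting a template without user interaction then amounts to a search
for the highest priority. The sketch below assumes a hypothetical array of
template descriptions; the rule that the first of several templates with
equal priority wins is my assumption, since nothing above specifies how
ties are resolved.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
  const char *name;  /* shown in the menu */
  uint8_t priority;  /* 0 to 255 */
} TemplateSketch;

/* Return the index of the template with the highest priority;
   n must be positive. The first template wins a tie. */
static size_t pick_template(const TemplateSketch t[], size_t n)
{ size_t i, best=0;
  for (i=1;i<n;i++)
    if (t[i].priority>t[best].priority) best=i;
  return best;
}
```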
After the priority follows a glue node to specify the topskip glue
and the dimension of the maximum page depth,
an extended dimension to specify the page height and
an extended dimension to specify the page width.
Then follows the main part of a page template definition: the template.
The template consists of a list of vertical material.
To construct the page, this list will be placed
into a vertical box and the glue will be set.
But of course before doing so, the viewer will
scan the list and replace all stream insertion points
by the appropriate content streams.
Let's call the vertical box obtained this way ``the page''.
The page will fill the entire display area top to bottom and left to right.
It defines not only the appearance of the main body of text
but also the margins, the header, and the footer.
Because the \.{vsize} and \.{hsize} variables of \TeX\ are used for
the vertical and horizontal dimension of the main body of text---they
do not include the margins---the page will usually be wider than \.{hsize}
and taller than \.{vsize}. The dimensions of the page are part
of the page template. The viewer, knowing the actual dimensions
of the display area, can derive from them the actual values of \.{hsize}
and \.{vsize}.
Stream definitions are listed after the template.
The page template with number 0 is always defined and has priority 0.
It will display just the main content stream. It puts a small margin
of $\.{hsize}/8 -4.5\hbox{pt}$ all around it.
Given a letter size page, 8.5 inch wide, this formula yields a margin of 1 inch,
matching \TeX's plain format. The margin will be positive as long as
the page is wider than $1/2$ inch. For narrower pages, there will be no
margin at all. In general, the \HINT\ viewer will never set \.{hsize} larger
than the width of the page or \.{vsize} larger than its height.
%8.5 in should give 1 inch margin 2/17
%612pt should give 72pt margin
%72pt = 612/8-4.5pt
%This would give a positive margin starting at 36pt or 1/2 inch
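The arithmetic of the default margin can be checked with a small sketch.
It assumes that the quantity divided by 8 is the width of the page in points,
which is the reading under which a 612pt wide page yields a 72pt margin.
The function names |default_margin_pt| and |default_hsize_pt| are made up
for this illustration and are not part of the \HINT\ code.

```c
#include <assert.h>

/* Margin of page template 0 in points, given the page width in points.
   Assumption: the quantity divided by 8 is the page width. */
double default_margin_pt(double page_width_pt)
{ double m;
  assert(page_width_pt >= 0.0);
  m = page_width_pt / 8.0 - 4.5;
  return m > 0.0 ? m : 0.0; /* no margin at all for very narrow pages */
}

/* Width left for the main body of text after subtracting both margins. */
double default_hsize_pt(double page_width_pt)
{ return page_width_pt - 2.0 * default_margin_pt(page_width_pt);
}
```

For a letter size page, |default_margin_pt(612.0)| gives 72pt and leaves
468pt for the text; for any page of at most 36pt width the margin is zero.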
\goodbreak
\vbox{\readcode\vskip -\baselineskip\putcode}
@s PAGE symbol
@s page_priority symbol
@s page symbol
@s stream_def_list symbol
@<symbols@>=
%token PAGE "page"
@
@<scanning rules@>=
::@=page@> :< return PAGE; >:
@
@<parsing rules@>=
page_priority: { HPUT8(1); }
| UNSIGNED { RNG("page priority",$1,0,255); HPUT8($1); };
stream_def_list: | stream_def_list stream_def_node;
page: string { hput_string($1);} page_priority glue_node dimension {@+HPUT32($5);@+}
xdimen_node xdimen_node
list stream_def_list ;
@
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<get functions@>=
void hget_page(void)
{ char *n; uint8_t p; Xdimen x; List l;
HGET_STRING(n);@+ hwrite_string(n);
p=HGET8; @+ if (p!=1) hwritef(" %d",p);
hget_glue_node();
hget_dimen(TAG(dimen_kind,b001));
hget_xdimen_node(&x); @+hwrite_xdimen_node(&x); /* page height */
hget_xdimen_node(&x); @+hwrite_xdimen_node(&x); /* page width */
hget_list(&l);@+ hwrite_list(&l);
while (hget_stream_def()) continue;
}
@
\subsection{Page Ranges}\label{range}\index{page range}
Not every template\index{template} is necessarily valid for the entire content
section. A page range specifies a start position $a$ and an end
position $b$ in the content section and the page template is valid if
the start position $p$ of the page is within that range: $a\le p < b$.
If paging backward this definition might cause problems because the
start position of the page is known only after the page has been
built. In this case, the viewer might choose a page template based on
the position at the bottom of the page.
If it turns out that this ``bottom template''
is no longer valid when the page builder has found the start of the
page, the viewer might display the page anyway with the bottom
template, it might just display the page with the new ``top
template'', or rerun the whole page building process using this time
the ``top template''. None of these alternatives is guaranteed to
produce a perfect result because changing the page template might
change the amount of material that fits on the page. A good page
template design should take this into account.
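How a viewer might combine priorities and page ranges can be sketched as
follows. The type |TemplateRange| and the function |pick_template| are
hypothetical. A real viewer needs a faster data structure and has to decide
how to break ties between templates of equal priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint8_t  page;      /* page template number */
  uint8_t  priority;  /* larger values win */
  uint32_t from, to;  /* valid for start positions p with from <= p < to */
} TemplateRange;      /* hypothetical type, not part of the format */

/* Return the number of the valid template with the highest priority;
   template 0, which has priority 0, is the fallback. */
uint8_t pick_template(const TemplateRange *r, size_t n, uint32_t p)
{ uint8_t best = 0, best_priority = 0;
  size_t i;
  for (i = 0; i < n; i++)
    if (r[i].from <= p && p < r[i].to && r[i].priority > best_priority)
    { best = r[i].page; best_priority = r[i].priority; }
  return best;
}
```

Because the test uses $a\le p<b$, a page starting exactly at the end position
of a range no longer qualifies for that range's template.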
The representation of page ranges differs significantly for the short
format and the long format. The short format will include a list of page
ranges in the definition section which consist of a page template number,
a start position, and an end position. In the long format, the start
and end positions of a page
range are marked with page range nodes switching the availability of a
page template on and off. Such a page range node must be a top level node.
It is an error to switch a page template
off that was not switched on, or to switch a page template on that was
already switched on. It is permissible to omit switching off a page
template at the very end of the content section.
While we parse a long format \HINT\ file, we store page ranges and generate
the short format after reaching the end of the content section.
While we parse a short format \HINT\ file,
we check at the end of each top level node whether we should insert a
page range node into the output.
For the \.{shrink} program, it is best
to store the start and end positions of all page ranges
in an array sorted by the position\footnote*{For a \HINT\ viewer,
a data structure which allows fast retrieval of all
valid page templates for a given position is needed.}.
To check the restrictions on the switching of page templates, we
maintain for every page template an index into the range array
which identifies the position where the template was switched on.
A zero value instead of an index will identify templates that
are currently invalid. When switching a range off again, we
link the two array entries using this index. These links
are useful when producing the range nodes in short format.
A range node in short format contains the template number, the
start position and the end position.
A zero start position
is not stored; the info bit |b100| indicates a nonzero start position.
An end position equal to |HINT_NO_POS| is not stored;
the info bit |b010| indicates a smaller end position.
The info bit |b001| indicates that positions are stored using 4 byte;
otherwise 2 byte are used for the positions.
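The following sketch computes the info bits for a given pair of positions
in the same way as |hput_range_defs| does, with |b001| selecting 4 byte
positions as in |hget_range|. The names |range_info| and
|range_position_bytes| are introduced only for this example, and the value
of |NO_POS| is an assumption standing in for |HINT_NO_POS|.

```c
#include <assert.h>
#include <stdint.h>

#define NO_POS 0xFFFFFFFFu /* assumed value of HINT_NO_POS */

/* Compute the three info bits of a short format range node. */
unsigned range_info(uint32_t from, uint32_t to)
{ unsigned info = 0;
  if (from != 0)
  { info |= 4;                      /* b100: start position is stored */
    if (from > 0xFFFF) info |= 1;   /* b001: positions need 4 byte */
  }
  if (to != NO_POS)
  { info |= 2;                      /* b010: end position is stored */
    if (to > 0xFFFF) info |= 1;
  }
  return info;
}

/* Number of position bytes that follow the template number. */
unsigned range_position_bytes(unsigned info)
{ unsigned width = (info & 1) ? 4 : 2;
  return ((info & 4) ? width : 0) + ((info & 2) ? width : 0);
}
```

A range that covers the entire content section needs no position bytes at
all, and a single large position forces both positions to 4 byte.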
@<hint types@>=
typedef
struct {@+uint8_t pg; @+uint32_t pos; @+ bool on; @+int link;@+} RangePos;
@
@<common variables@>=
RangePos *range_pos;
int next_range=1, max_range;
int *page_on;
@
@<initialize definitions@>=
ALLOCATE(page_on,max_ref[page_kind]+1,int);
ALLOCATE(range_pos,2*(max_ref[range_kind]+1),RangePos);
@
@<hint macros@>=
#define @[ALLOCATE(R,S,T)@] @/((R)=@[(T *)calloc((S),sizeof(T)),\
(((R)==NULL)?QUIT("Out of memory for " #R):0))
#define @[REALLOCATE(R,S,T)@] @/((R)=@[(T *)realloc((R),(S)*sizeof(T)),\
(((R)==NULL)?QUIT("Out of memory for " #R):0))
@
\readcode
@s RANGE symbol
@<symbols@>=
%token RANGE "range"
@
@<scanning rules@>=
::@=range@> :< return RANGE; >:
@
@<parsing rules@>=
content_node: START RANGE REFERENCE ON END @/{ REF(page_kind,$3);hput_range($3,true);}
| START RANGE REFERENCE OFF END @/{ REF(page_kind,$3);hput_range($3,false);};
@
\writecode
@<write functions@>=
void hwrite_range(void) /* called in |hwrite_end| */
{ uint32_t p=hpos-hstart;
DBG(DBGRANGE,"Range check at pos 0x%x next at 0x%x\n",p,range_pos[next_range].pos);
while (next_range<max_range && range_pos[next_range].pos <= p)
{ hwrite_start();
hwritef("range *%d ",range_pos[next_range].pg);
if (range_pos[next_range].on) hwritef("on"); else hwritef("off");
nesting--; @+hwritec('>'); /* avoid a recursive call to |hwrite_end| */
next_range++;
}
}
@
\getcode
@<get functions@>=
void hget_range(Info info, uint8_t pg)
{ uint32_t from, to;
REF(page_kind,pg);
REF(range_kind,(next_range-1)/2);
if (info&b100) @+
{ @+ if (info&b001) HGET32(from); @+else HGET16(from); @+}
else from=0;
if (info&b010) @+
{ @+if (info&b001) HGET32(to); @+else HGET16(to); @+}
else to=HINT_NO_POS;
range_pos[next_range].pg=pg;
range_pos[next_range].on=true;
range_pos[next_range].pos=from;
DBG(DBGRANGE,"Range *%d from 0x%x\n",pg,from);
DBG(DBGRANGE,"Range *%d to 0x%x\n",pg,to);
next_range++;
if (to!=HINT_NO_POS) @/
{ range_pos[next_range].pg=pg;
range_pos[next_range].on=false;
range_pos[next_range].pos=to;
next_range++;
}
}
@
@<write functions@>=
void hsort_ranges(void) /* simple insert sort by position */
{ int i;
DBG(DBGRANGE,"Range sorting %d positions\n",next_range-1);
for(i=3; i<next_range; i++)@/
{ int j = i-1;
if (range_pos[i].pos < range_pos[j].pos) @/
{ RangePos t;
t= range_pos[i];
do {
range_pos[j+1] = range_pos[j];
j--;
} while (t.pos < range_pos[j].pos); /* |range_pos[i]| was overwritten by the first shift */
range_pos[j+1] = t;
}
}
max_range=next_range; @+next_range=1; /* prepare for |hwrite_range| */
}
@
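An insertion sort has to compare against the saved element, because the
slot it came from is overwritten by the first shift. The sketch below shows
this on a reduced entry type. The names |Pos| and |sort_by_pos| are
hypothetical, and, as an assumption mirroring the zero initialized entry 0
of |range_pos|, the element at index 0 serves as a sentinel with position 0.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t pg; uint32_t pos; } Pos; /* reduced, hypothetical entry */

/* Stable insertion sort of a[1..n-1] by pos; a[0] is an unused sentinel
   whose pos must be 0 so that the inner loop stops there at the latest. */
void sort_by_pos(Pos *a, int n)
{ int i;
  assert(n < 1 || a[0].pos == 0);
  for (i = 2; i < n; i++)
  { Pos t = a[i];              /* save the element before it is overwritten */
    int j = i - 1;
    while (t.pos < a[j].pos)   /* compare with the saved copy, not with a[i] */
    { a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = t;
  }
}
```

Using a strict comparison keeps entries with equal positions in their
original order, so an ``off'' entry never moves in front of an earlier entry
at the same position.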
\putcode
@<put functions@>=
void hput_range(uint8_t pg, bool on)
{ if (((next_range-1)/2)>max_ref[range_kind])
QUIT("Page range %d > %d",(next_range-1)/2,max_ref[range_kind]);
if (on && page_on[pg]!=0)
QUIT(@["Template %d is switched on at 0x%x and " SIZE_F@],@|
pg, range_pos[page_on[pg]].pos, hpos-hstart);
else if (!on && page_on[pg]==0)
QUIT(@["Template %d is switched off at " SIZE_F " but was not on"@],@|
pg, hpos-hstart);
DBG(DBGRANGE,@["Range *%d %s at " SIZE_F "\n"@],pg,on?"on":"off",hpos-hstart);
range_pos[next_range].pg=pg;
range_pos[next_range].pos=hpos-hstart;
range_pos[next_range].on=on;
if (on) page_on[pg]=next_range;
else @/{ range_pos[next_range].link =page_on[pg];
range_pos[page_on[pg]].link=next_range;
page_on[pg]=0; }
next_range++;
}
void hput_range_defs(void)
{ int i;
section_no=1;
hstart=dir[1].buffer;
hend=hstart+ dir[1].bsize;
hpos=hstart+dir[1].size;
for (i=1; i< next_range;i++)
if (range_pos[i].on)@/
{ Info info=b000;
uint32_t p=hpos++-hstart;
uint32_t from, to;
HPUT8(range_pos[i].pg);
from= range_pos[i].pos;
if (range_pos[i].link!=0) to = range_pos[range_pos[i].link].pos;
else to=HINT_NO_POS;
if (from!=0) @/
{ info=info|b100;@+ if (from>0xFFFF) @+ info=info|b001;@+}
if (to!=HINT_NO_POS) @/
{ info=info|b010;@+ if (to>0xFFFF) info=info|b001;@+ }
if (info & b100) @/
{ @+if (info & b001) HPUT32(from); @+else HPUT16(from); @+}
if (info & b010) @/
{ @+if (info & b001) HPUT32(to); @+else HPUT16(to); @+}
DBG(DBGRANGE,"Range *%d from 0x%x to 0x%x\n",@|range_pos[i].pg,from, to);
hput_tags(p,TAG(range_kind,info));
}
hput_definitions_end();
}
@
\section{File Structure}\hascode
All \HINT\ files\index{file} start with a banner\index{banner} as
described below. After that, they contain three mandatory
sections\index{section}: the directory\index{directory section}
section, the definition\index{definition section} section, and the
content\index{content section} section. Usually, further
optional\index{optional section} sections follow. In short format
files, these contain auxiliary\index{auxiliary file} files
(fonts\index{font}, images\index{image}, \dots) necessary for
rendering the content. In long format files, the directory section
will simply list the file names of the auxiliary files.
\subsection{Banner}
All \HINT\ files start with a banner\index{banner}. The banner contains only
printable ASCII characters and spaces;
its end is marked with a newline character\index{newline character}.
The first four byte are the ``magic'' number by which you recognize a \HINT\
file. It consists of the four ASCII codes `{\tt H}', `{\tt I}', `{\tt N}',
and `{\tt T}' in the long format and `{\tt h}', `{\tt i}', `{\tt n}',
and `{\tt t}' in the short format. Then follows a space, then
the version number, a dot, the sub-version number, and another
space. Both numbers are encoded as decimal ASCII strings. The
remainder of the banner is simply ignored but may be used to contain
other useful information about the file. The maximum size of the
banner is 256 byte.
@<hint macros@>=
#define MAX_BANNER 256
@
\goodbreak
To check the banner, we have the function |hcheck_banner|;
it returns |true| if successful.
@<common variables@>=
char hbanner[MAX_BANNER+1];
int hbanner_size=0;
@
@<function to check the banner@>=
bool hcheck_banner(char *magic)
{
int v,s;
char *t;
t=hbanner;
if (strncmp(magic,hbanner,4)!=0)
{ MESSAGE("This is not a %s file\n",magic); return false; }
else t+=4;
if(hbanner[hbanner_size-1]!='\n')
{ MESSAGE("Banner exceeds maximum size=0x%x\n",MAX_BANNER); return false; }
if (*t!=' ')
{ MESSAGE("Space expected in banner after %s\n",magic); return false; }
else t++;
v=strtol(t,&t,10);
if (*t!='.')
{ MESSAGE("Dot expected in banner after HINT version number\n"); return false; }
else t++;
s=strtol(t,&t,10);
if (v!=HINT_VERSION)
{ MESSAGE("Wrong HINT version: got %d.%d, expected %d.%d\n",
v,s,HINT_VERSION,HINT_MINOR_VERSION); return false; }
#if 0 /* minor versions should be downward compatible */
if (s<HINT_MINOR_VERSION)
{ MESSAGE("Outdated HINT minor version: got %d.%d, expected %d.%d\n",
v,s,HINT_VERSION,HINT_MINOR_VERSION); }
else
#endif
if (s>HINT_MINOR_VERSION)
{ MESSAGE("More recent HINT minor version: got %d.%d, expected %d.%d, update your application\n",
v,s,HINT_VERSION,HINT_MINOR_VERSION); }
if (*t!=' ' && *t!='\n')
{ MESSAGE("Space expected in banner after HINT minor version\n"); return false; }
LOG("%s file version " HINT_VERSION_STRING ":%s", magic, t);
DBG(DBGDIR,"banner size=0x%x\n",hbanner_size);
return true;
}
@
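The structure of the banner can also be illustrated without the
surrounding infrastructure. The following sketch is not the function used
by the \HINT\ programs; |parse_banner| is a hypothetical helper that only
splits a banner into magic number, version, and sub-version and reports
whether the layout is as described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split a banner such as "hint 1.3 comment\n" into its parts.
   Returns 1 on success and 0 on a malformed banner. */
int parse_banner(const char *banner, const char *magic, long *version, long *minor)
{ char *t;
  size_t n = strlen(banner);
  assert(strlen(magic) == 4);
  if (n == 0 || banner[n - 1] != '\n') return 0; /* missing newline */
  if (strncmp(magic, banner, 4) != 0) return 0;  /* wrong magic number */
  if (banner[4] != ' ') return 0;                /* space after the magic number */
  *version = strtol(banner + 5, &t, 10);
  if (*t != '.') return 0;                       /* dot between the two numbers */
  *minor = strtol(t + 1, &t, 10);
  return *t == ' ' || *t == '\n';                /* the rest is ignored */
}
```

Note that |strtol| itself skips leading white space and accepts a sign, so a
strict checker would have to test for a digit before calling it.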
To read a short format file, we use the macro |HGET8|. It returns a single byte.
We read the banner knowing that it ends with a newline character
and is at most |MAX_BANNER| byte long. Because this is the first access to a yet unknown file,
we are very careful and make sure we do not read past the end of the file.
Checking the banner is a separate step.
\getcode
@<get file functions@>=
void hget_banner(void)
{ hbanner_size=0;
while (hbanner_size<MAX_BANNER && hpos<hend)
{ uint8_t c=HGET8;
hbanner[hbanner_size++]=c;
if (c=='\n') break;
}
hbanner[hbanner_size]=0;
}
@
To read a long format file, we use the function |fgetc|.
\readcode
@<read the banner@>=
{ hbanner_size=0;
while ( hbanner_size<MAX_BANNER)
{ int c=fgetc(hin);
if (c==EOF) break;
hbanner[hbanner_size++]=c;
if (c=='\n') break;
}
hbanner[hbanner_size]=0;
}
@
Writing the banner to a short format file is accomplished by calling
|hput_banner| with the ``magic'' string |"hint"| as a first argument
and a (short) comment as the second argument.
\putcode
@<function to write the banner@>=
static size_t hput_banner(char *magic, char *str)
{ size_t s=fprintf(hout,"%s " HINT_VERSION_STRING " %s\n",magic,str);
if (s>MAX_BANNER) QUIT("Banner too big");
return s;
}
@
\writecode
Writing the banner of a long format file is essentially the same as for a short
format file: we call |hput_banner| with |"HINT"| as the first argument.
\subsection{Long Format Files}\gdef\subcodetitle{Banner}%
After reading and checking the banner, reading a long format file is
simply done by calling |yyparse|. The following rule gives the big picture:
\readcode
@s hint symbol
@s content_section symbol
@<parsing rules@>=
hint: directory_section definition_section content_section ;
@
\subsection{Short Format Files}\gdef\subcodetitle{Primitives}%
A short format\index{short format} file starts with the banner and continues
with a list of sections. Each section has a maximum size
of $2^{32}$ byte or 4GByte. This restriction ensures that positions\index{position}
inside a section can be stored as 32 bit integers, a feature that
we will need only for the so called ``content'' section, but it
is also nice for implementers to know in advance what sizes to expect.
The big picture is captured by the |hput_hint| function:
@<put functions@>=
static size_t hput_root(void);
static size_t hput_section(uint16_t n);
static size_t hput_optional_sections(void);
size_t hput_hint(char * str)
{ size_t s;
DBG(DBGBASIC,"Writing hint output %s\n",str);
s=hput_banner("hint",str);
DBG(DBGDIR,@["Root entry at " SIZE_F "\n"@],s);
s+=hput_root();
DBG(DBGDIR,@["Directory section at " SIZE_F "\n"@],s);
s+=hput_section(0);
DBG(DBGDIR,@["Definition section at " SIZE_F "\n"@],s);
s+=hput_section(1);
DBG(DBGDIR,@["Content section at " SIZE_F "\n"@],s);
s+=hput_section(2);
DBG(DBGDIR,@["Auxiliary sections at " SIZE_F "\n"@],s);
s+=hput_optional_sections();
DBG(DBGDIR,@["Total number of bytes written " SIZE_F "\n"@],s);
return s;
}
@
When we work on a section, we will have the entire section in
memory and use three variables to access it: |hstart|
points to the first byte of the section, |hend| points
to the byte after the last byte of the section, and |hpos| points to the
current position inside the section.\label{hpos}
The auxiliary variable |hpos0| contains the |hpos| value of the
last content node on nesting level zero.
@<common variables@>=
uint8_t *hpos=NULL, *hstart=NULL, *hend=NULL, *hpos0=NULL;
@
There are two sets of macros that read or write binary data at the current position
and advance the stream position accordingly.\label{HPUT}\label{HGET}
\getcode
@<shared get macros@>=
#define HGET_ERROR @/ QUIT(@["HGET overrun in section %d at " SIZE_F "\n"@],@|section_no,hpos-hstart)
#define @[HEND@] @[((hpos<=hend)?0:(HGET_ERROR,0))@]
#define @[HGET8@] ((hpos<hend)?*(hpos++):(HGET_ERROR,0))
#define @[HGET16(X)@] ((X)=(hpos[0]<<8)+hpos[1],hpos+=2,HEND)
#define @[HGET24(X)@] ((X)=(hpos[0]<<16)+(hpos[1]<<8)+hpos[2],hpos+=3,HEND)
#define @[HGET32(X)@] ((X)=(hpos[0]<<24)+(hpos[1]<<16)+(hpos[2]<<8)+hpos[3],hpos+=4,HEND)
#define @[HGETTAG(A)@] @[A=HGET8,DBGTAG(A,hpos-1)@]
@
\putcode
@<put functions@>=
void hput_error(void)
{@+if (hpos<hend) return;
QUIT(@["HPUT overrun section %d pos=" SIZE_F "\n"@],@|section_no,hpos-hstart);
}
@
@<put macros@>=
extern void hput_error(void);
#define @[HPUT8(X)@] (hput_error(),*(hpos++)=(X))
#define @[HPUT16(X)@] (HPUT8(((X)>>8)&0xFF),HPUT8((X)&0xFF))
#define @[HPUT24(X)@] (HPUT8(((X)>>16)&0xFF),HPUT8(((X)>>8)&0xFF),HPUT8((X)&0xFF))
#define @[HPUT32(X)@] (HPUT8(((X)>>24)&0xFF),HPUT8(((X)>>16)&0xFF),HPUT8(((X)>>8)&0xFF),HPUT8((X)&0xFF))
@
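The byte order used for multi-byte values, most significant byte first, can
be seen in isolation in the following sketch. The functions |put32| and
|get32| are hypothetical stand-ins that work on an explicit buffer instead
of |hpos| and |hend|. The casts to |uint32_t| are an addition of this sketch;
they keep the left shift well defined for bytes with the top bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store x at p[0..3], most significant byte first; returns 0 if the
   buffer of the given size is too small. */
int put32(uint8_t *p, size_t size, uint32_t x)
{ if (size < 4) return 0;
  p[0] = (uint8_t)(x >> 24);
  p[1] = (uint8_t)(x >> 16);
  p[2] = (uint8_t)(x >> 8);
  p[3] = (uint8_t)x;
  return 1;
}

/* Read the four bytes back in the same order. */
int get32(const uint8_t *p, size_t size, uint32_t *x)
{ if (size < 4) return 0;
  *x = ((uint32_t)p[0] << 24) + ((uint32_t)p[1] << 16)
     + ((uint32_t)p[2] << 8) + (uint32_t)p[3];
  return 1;
}
```

Writing a value and reading it back yields the same value on any host,
regardless of the byte order of the machine.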
The above macros test for buffer overruns\index{buffer overrun};
allocating sufficient buffer space is done separately.
Before writing a node, we will insert a test and increase the buffer if necessary.
@<put macros@>=
void hput_increase_buffer(uint32_t n);
#define @[HPUTX(N)@] @[(((hend-hpos) < (N))? hput_increase_buffer(N):(void)0)@]
#define HPUTNODE @[HPUTX(MAX_TAG_DISTANCE)@]
#define @[HPUTTAG(K,I)@] @|@[(HPUTNODE,@+DBGTAG(TAG(K,I),hpos),@+HPUT8(TAG(K,I)))@]
@
Fortunately the only data types that have an unbounded size are
strings\index{string} and texts\index{text}.
For these we insert specific tests. For all other cases a relatively
small upper bound on the maximum distance between two tags can be determined.
Currently the maximum distance between tags is 26 byte as can be determined
from the |hnode_size| array described in appendix~\secref{fastforward}.
The definition below uses a slightly larger value leaving some room
for future changes in the design of the short file format.
@<hint macros@>=
#define MAX_TAG_DISTANCE 32
@
\subsection{Mapping a Short Format File to Memory}
In the following, we implement two alternatives to map a file into memory.
The first implementation opens the file, gets its size, allocates memory,
and reads the file. The second implementation uses a call to |mmap|.
Since modern computers with 64-bit hardware have a huge address space,
using |mmap| to map the entire file into virtual memory is the most efficient way
to access a large file. ``Mapping'' is not the same as ``reading'', and it is
not the same as allocating precious memory; all that is done by the
operating system when needed. Mapping just reserves addresses.
There is one disadvantage of mapping: it typically locks the underlying file
and will not allow a separate process to modify it. This prevents using
this method for previewing a \HINT\ file while editing and recompiling it.
In this case, the first implementation, which has a copy of the file in memory,
is the better choice. To select the second implementation, define the macro |USE_MMAP|.
The following functions map and unmap a short format input
file setting |hin_addr| to its address and |hin_size| to its size.
The value |hin_addr==NULL| indicates that no file is open.
The variable |hin_time| is set to the time when the file was last modified.
It can be used to detect modifications of the file and reload it.\label{map}
@<common variables@>=
char *hin_name=NULL;
uint64_t hin_size=0;
uint8_t *hin_addr=NULL;
uint64_t hin_time=0;
@
@<map functions@>=
#ifndef USE_MMAP
void hget_unmap(void)
{@+ if (hin_addr!=NULL) free(hin_addr);
hin_addr=NULL;
hin_size=0;
}
bool hget_map(void)
{ FILE *f;
struct stat st;
size_t s,t;
uint64_t u;
f= fopen(hin_name,"rb");
if (f==NULL)@/
{ MESSAGE("Unable to open file: %s\n", hin_name);@+ return false;@+ }
if (stat(hin_name,&st)<0)
{ MESSAGE("Unable to obtain file size: %s\n", hin_name);
fclose(f);
return false;
}
if (st.st_size==0)
{ MESSAGE("File %s is empty\n", hin_name);
fclose(f);
return false;
}
u=st.st_size;
if (hin_addr!=NULL) hget_unmap();
hin_addr=malloc(u);
if (hin_addr==NULL)
{ MESSAGE("Unable to allocate 0x%"PRIx64" byte for File %s\n", u,hin_name);
fclose(f);
return false;
}
t=0;
do{
s=fread(hin_addr+t,1,u,f);
if (s<=0)
{ MESSAGE("Unable to read file %s\n",hin_name);
fclose(f);
free(hin_addr);
hin_addr=NULL;
return false;
}
t=t+s;@+
u=u-s;
} while (u>0);
fclose(f); /* the file is completely in memory now */
hin_size=st.st_size;
hin_time=st.st_mtime;
return true;
}
#else
#include <sys/mman.h>
void hget_unmap(void)
{@+ munmap(hin_addr,hin_size);
hin_addr=NULL;
hin_size=0;
}
bool hget_map(void)
{ struct stat st;
int fd;
fd = open(hin_name, O_RDONLY, 0);
if (fd<0)@/
{ MESSAGE("Unable to open file %s\n", hin_name);@+ return false;@+ }
if (fstat(fd, &st)<0)
{ MESSAGE("Unable to get file size\n");
close(fd);
return false;
}
if (st.st_size==0)
{ MESSAGE("File %s is empty\n",hin_name);
close(fd);
return false;
}
if (hin_addr!=NULL) hget_unmap();
hin_size=st.st_size;
hin_time=st.st_mtime;
hin_addr= mmap(NULL,hin_size,PROT_READ,MAP_PRIVATE,fd, 0);
if (hin_addr==MAP_FAILED)
{ close(fd);
hin_addr=NULL;hin_size=0;
MESSAGE("Unable to map file into memory\n");
return false;
}
close(fd);
return true;
}
#endif
@
\subsection{Compression}
The short file format offers the possibility to store sections in
compressed\index{compression} form. We use the {\tt zlib}\index{zlib+{\tt zlib}} compression library\cite{zlib}\cite{RFC1950}
to deflate\index{deflate} and inflate\index{inflate} individual sections. When one of the following
functions is called, we can get the section buffer, the buffer size
and the size actually used from the directory entry. If a section
needs to be inflated, its size after decompression is found in the
|xsize| field; if a section needs to be deflated, its size after
compression will be known after deflating it.
@s z_stream int
@<get file functions@>=
static void hdecompress(uint16_t n)
{ z_stream z; /* decompression stream */
uint8_t *buffer;
int i;
DBG(DBGCOMPRESS,"Decompressing section %d from 0x%x to 0x%x byte\n",@|n, dir[n].size, dir[n].xsize);
z.zalloc = (alloc_func)0;@+
z.zfree = (free_func)0;@+
z.opaque = (voidpf)0;
z.next_in = hstart;
z.avail_in = hend-hstart;
if (inflateInit(&z)!=Z_OK)
QUIT("Unable to initialize decompression: %s",z.msg);
ALLOCATE(buffer,dir[n].xsize+MAX_TAG_DISTANCE,uint8_t);
DBG(DBGBUFFER,"Allocating output buffer size=0x%x, margin=0x%x\n",dir[n].xsize,MAX_TAG_DISTANCE);
z.next_out = buffer;
z.avail_out =dir[n].xsize+MAX_TAG_DISTANCE;
i= inflate(&z, Z_FINISH);
DBG(DBGCOMPRESS,"in: avail/total=0x%x/0x%lx "@|"out: avail/total=0x%x/0x%lx, return %d;\n",@|
z.avail_in,z.total_in, z.avail_out, z.total_out,i);
if (i!=Z_STREAM_END)
QUIT("Unable to complete decompression: %s",z.msg);
if (z.avail_in != 0)
QUIT("Decompression missed input data");
if (z.total_out != dir[n].xsize)
QUIT("Decompression output size mismatch 0x%lx != 0x%x",z.total_out, dir[n].xsize );
if (inflateEnd(&z)!=Z_OK)
QUIT("Unable to finalize decompression: %s",z.msg);
dir[n].buffer=buffer;
dir[n].bsize=dir[n].xsize;
hpos0=hpos=hstart=buffer;
hend=hstart+dir[n].xsize;
}
@
@<put functions@>=
static void hcompress(uint16_t n)
{ z_stream z; /* compression stream */
uint8_t *buffer;
int i;
if (dir[n].size==0) { dir[n].xsize=0;@+ return; @+}
DBG(DBGCOMPRESS,"Compressing section %d of size 0x%x\n",n, dir[n].size);
z.zalloc = (alloc_func)0;@+
z.zfree = (free_func)0;@+
z.opaque = (voidpf)0;
if (deflateInit(&z,Z_DEFAULT_COMPRESSION)!=Z_OK)
QUIT("Unable to initialize compression: %s",z.msg);
ALLOCATE(buffer,dir[n].size+MAX_TAG_DISTANCE,uint8_t);
z.next_out = buffer;
z.avail_out = dir[n].size+MAX_TAG_DISTANCE;
z.next_in = dir[n].buffer;
z.avail_in = dir[n].size;
i=deflate(&z, Z_FINISH);
DBG(DBGCOMPRESS,"deflate in: avail/total=0x%x/0x%lx out: avail/total=0x%x/0x%lx, return %d;\n",@|
z.avail_in,z.total_in, z.avail_out, z.total_out,i);
if (z.avail_in != 0)
QUIT("Compression missed input data");
if (i!=Z_STREAM_END)
QUIT("Compression incomplete: %s",z.msg);
if (deflateEnd(&z)!=Z_OK)
QUIT("Unable to finalize compression: %s",z.msg);
DBG(DBGCOMPRESS,"Compressed 0x%lx byte to 0x%lx byte\n",@|z.total_in,z.total_out);
free(dir[n].buffer);
dir[n].buffer=buffer;
dir[n].bsize=dir[n].size+MAX_TAG_DISTANCE;
dir[n].xsize=dir[n].size;
dir[n].size=z.total_out;
}
@
\subsection{Reading Short Format Sections}
\gdef\subcodetitle{Sections}%
After mapping the file at address |hin_addr|, access to sections of the
file is provided by decompressing them if necessary and
setting the three pointers |hpos|, |hstart|, and
|hend|.
To read sections of a short format input file, we use the function |hget_section|.
\getcode
%\codesection{\getsymbol}\getindex{1}{3}{Files}
@<get file functions@>=
void hget_section(uint16_t n)
{ DBG(DBGDIR,"Reading section %d\n",n);
RNG("Section number",n,0,max_section_no);
if (dir[n].buffer!=NULL && dir[n].xsize>0)
{ hpos0=hpos=hstart=dir[n].buffer;
hend=hstart+dir[n].xsize;
}
else
{ hpos0=hpos=hstart=hin_addr+dir[n].pos;
hend=hstart+dir[n].size;
if (dir[n].xsize>0) hdecompress(n);
}
}
@
\subsection{Writing Short Format Sections}
\gdef\subcodetitle{Sections}%
To write a short format file, we allocate for each of the first three sections a
suitable buffer\index{buffer}, then fill these buffers, and finally write them
out in sequential order.
@<put functions@>=
#define BUFFER_SIZE 0x400
void new_output_buffers(void)
{ dir[0].bsize=dir[1].bsize=dir[2].bsize=BUFFER_SIZE;
DBG(DBGBUFFER,"Allocating output buffer size=0x%x, margin=0x%x\n",BUFFER_SIZE,MAX_TAG_DISTANCE);
ALLOCATE(dir[0].buffer,dir[0].bsize+MAX_TAG_DISTANCE,uint8_t);
ALLOCATE(dir[1].buffer,dir[1].bsize+MAX_TAG_DISTANCE,uint8_t);
ALLOCATE(dir[2].buffer,dir[2].bsize+MAX_TAG_DISTANCE,uint8_t);
}
void hput_increase_buffer(uint32_t n)
{ size_t bsize;
uint32_t pos, pos0;
const double buffer_factor=1.4142136; /* $\sqrt 2$ */
pos=hpos-hstart; pos0=hpos0-hstart;
bsize=dir[section_no].bsize*buffer_factor+0.5;
if (bsize<pos+n) bsize=pos+n;
if (bsize>=HINT_NO_POS) bsize=HINT_NO_POS;
if (bsize<pos+n) QUIT(@["Unable to increase buffer size " SIZE_F " by 0x%x byte"@],@|hpos-hstart,n);
DBG(DBGBUFFER,@["Reallocating output buffer "@|" for section %d from 0x%x to " SIZE_F " byte\n"@],
section_no,dir[section_no].bsize,bsize);
REALLOCATE(dir[section_no].buffer,bsize,uint8_t);
dir[section_no].bsize=(uint32_t)bsize;
hstart=dir[section_no].buffer;
hend=hstart+bsize;
hpos0=hstart+pos0; hpos=hstart+pos;
}
static size_t hput_data(uint16_t n, uint8_t *buffer, uint32_t size)
{ size_t s;
s=fwrite(buffer,1,size,hout);
if (s!=size)
QUIT(@["short write " SIZE_F " < %d in section %d"@],s,size,n);
return s;
}
static size_t hput_section(uint16_t n)
{ return hput_data(n, dir[n].buffer,dir[n].size);
}
@
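The growth policy of |hput_increase_buffer| can be studied separately from
the reallocation. The sketch below computes only the new size; the function
name |grown_size| is hypothetical, the value of |NO_POS| is an assumption
standing in for |HINT_NO_POS|, and the 64 bit intermediate is an addition
that keeps |pos+n| from wrapping around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_POS 0xFFFFFFFFu /* assumed value of HINT_NO_POS */

/* New buffer size when n more bytes are needed at offset pos,
   or 0 if the section would exceed the 32 bit limit. */
size_t grown_size(uint32_t bsize, uint32_t pos, uint32_t n)
{ const double factor = 1.4142136;          /* roughly the square root of 2 */
  uint64_t need = (uint64_t)pos + n;        /* cannot overflow in 64 bit */
  uint64_t size = (uint64_t)(bsize * factor + 0.5);
  if (size < need) size = need;             /* a large request wins */
  if (size > NO_POS) size = NO_POS;         /* sections are limited in size */
  if (size < need) return 0;                /* the request cannot be met */
  return (size_t)size;
}
```

Growing by a constant factor keeps the number of reallocations logarithmic
in the final size; two successive increases roughly double the buffer.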
\section{Directory Section}
A \HINT\ file is subdivided into sections and
each section can be identified by its section number.
The first three sections, numbered 0, 1, and 2, are mandatory:
directory\index{directory section} section, definition section, and content section.
The directory section, which we explain now, lists all sections
that make up a \HINT\ file.
A document will often contain not only plain text but also other media,
for example illustrations. Illustrations are produced with specialized
tools and stored in specialized files. Because a \HINT\ file in short format
should be self contained, these special files are embedded in the \HINT\ file
as optional sections.
Because a \HINT\ file in long format should be readable, these special files
are written to disk and only the file names are retained in the directory.
Writing special files to disk also has the advantage that you can modify
them individually before embedding them in a short format file.
\subsection{Directories in Long Format}\gdef\subcodetitle{Directory Section}%
The directory\index{directory section} section of a long format \HINT\ file starts
with the ``\.{directory}'' keyword; then follows the maximum section number used and
a list of directory entries, one for each optional section numbered 3 and above.
Each entry consists of the keyword ``\.{section}'' followed by the
section number, followed by the file name.
The section numbers must be unique and fit into 16 bit.
The directory entries must be ordered with strictly increasing section numbers.
Keeping section numbers consecutive is recommended because it reduces the
memory footprint if directories are stored as arrays indexed by the section
number as we will do below.
\readcode
@s directory_section symbol
@s entry_list symbol
@s entry symbol
@s DIRECTORY symbol
@s SECTION symbol
@<symbols@>=
%token DIRECTORY "directory"
%token SECTION "section"
@
@<scanning rules@>=
::@=directory@> :< return DIRECTORY; >:
::@=section@> :< return SECTION; >:
@
@<parsing rules@>=
directory_section: START DIRECTORY UNSIGNED @|{new_directory($3+1); new_output_buffers();} entry_list END ;
entry_list: @,@+ | entry_list entry;
entry: START SECTION UNSIGNED string END @/
{ RNG("Section number",$3,3,max_section_no); hset_entry(&(dir[$3]), $3,0,0,$4);};
@
We use a dynamically allocated array
of directory entries to store the directory.
@<directory entry type@>=
typedef struct {
uint64_t pos;
uint32_t size, xsize;
uint16_t section_no;
char *file_name;
uint8_t *buffer;
uint32_t bsize;
} Entry;
@
The function |new_directory| allocates the directory.
@<directory functions@>=
Entry *dir=NULL;
uint16_t section_no, max_section_no;
void new_directory(uint32_t entries)
{ DBG(DBGDIR,"Creating directory with %d entries\n", entries);
RNG("Directory entries",entries,3,0x10000);
max_section_no=entries-1;@+
ALLOCATE(dir,entries,Entry);
dir[0].section_no=0; @+ dir[1].section_no=1; @+ dir[2].section_no=2;
}
@
The function |hset_entry| fills in the appropriate entry.
@<directory functions@>=
void hset_entry(Entry *e, uint16_t i, uint32_t size, uint32_t xsize, @|char *file_name)
{ e->section_no=i;
e->size=size; @+e->xsize=xsize;
if (file_name==NULL || *file_name==0)
e->file_name=NULL;
else
e->file_name=strdup(file_name);
DBG(DBGDIR,"Creating entry %d: \"%s\" size=0x%x xsize=0x%x\n",@|i,file_name,size,xsize);
}
@
Writing the auxiliary files depends on the {\tt -a}, {\tt -g} and {\tt -f}
options.
@<without {\tt -f} skip writing an existing file@>=
if ( !option_force && access(aux_name,F_OK)==0)
{ MESSAGE("File '%s' exists.\n"@| "To rewrite the file use the -f option.\n",
aux_name);
continue;
}
@
The above code uses the |access| function, and we need to make sure it is defined:
@<make sure |access| is defined@>=
#ifdef WIN32
#include <io.h>
#define @[access(N,M)@] @[_access(N, M )@]
#define F_OK 0
#else
#include <unistd.h>
#endif
@
With the {\tt -g} option, filenames are considered global, and files
are written to the filesystem possibly overwriting the existing files.
For example, a font embedded in a \HINT\ file might replace a font of
the same name in some operating system's font folder.
If the \HINT\ file is {\tt shrink}ed on one system and
{\tt stretch}ed on another system, this is usually not the desired behavior.
Without the {\tt -g} option,\label{absrel} the files will be written in two local directories.
The names of these directories are derived from the output file name,
replacing the extension ``{\tt .hint}'' with ``{\tt .abs}'' if the original
filename contained an absolute path, and replacing it with ``{\tt .rel}''
if the original filename contained a relative path. Inside these directories,
the path as given in the filename is retained.
When {\tt shrink}ing a \HINT\ file without the {\tt -g} option,
the original filenames can be reconstructed.
@<compute a local |aux_name|@>=
{ char *path=dir[i].file_name;
int path_length=(int)strlen(path);
@<determine whether |path| is absolute or relative@>@;
@<replace links to the parent directory@>@;
DBG(DBGDIR,"Replacing auxiliary file name:\n\t%s\n->\t%s\n",path,aux_name);
}
@
@<determine whether |path| is absolute or relative@>=
int aux_length;
enum {absolute=0, relative=1} name_type;
char *aux_ext[2]={".abs/",".rel/"};
int ext_length=5;
aux_length=stem_length+ext_length+path_length;
ALLOCATE(aux_name,aux_length+1,char);
strcpy(aux_name,stem_name);
if (path[0]=='/')
{ name_type=absolute;
strcpy(aux_name+stem_length,aux_ext[name_type]);
strcpy(aux_name+stem_length+ext_length,path+1);
}
else if (path_length>3 && isalpha(path[0]) &&
path[1]==':' && path[2]=='/')
{ name_type=absolute;
strcpy(aux_name+stem_length,aux_ext[name_type]);
strcpy(aux_name+stem_length+ext_length,path);
aux_name[stem_length+ext_length+1]='_';
}
else
{ name_type=relative;
strcpy(aux_name+stem_length,aux_ext[name_type]);
strcpy(aux_name+stem_length+ext_length,path);
}
@
When the {\tt -g} option is not given, auxiliary files are written into
special subdirectories. To prevent them from escaping into the global
file system, we replace links to the parent directory ``{\tt ../}'' by
``{\tt \_\,\_/}''.
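As an illustration, the replacement can be written as a standalone
function. The name |replace_parent_links| and the parameter |from|,
the offset where the retained path starts, are assumptions made for
this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: overwrite every "../" in name, starting at
   offset from, by "__/" so that the name cannot climb out of its
   directory. The length of the name does not change. */
void replace_parent_links(char *name, size_t from)
{ size_t length, k;
  assert(name != NULL);
  length = strlen(name);
  for (k = from; k + 2 < length; k++)
    if (name[k] == '.' && name[k + 1] == '.' && name[k + 2] == '/')
    { name[k] = name[k + 1] = '_';
      k = k + 2; /* continue after the slash */
    }
}
```

Applied to {\tt paper.rel/../../etc/passwd} with |from| set to 10, the
sketch yields {\tt paper.rel/\_\,\_/\_\,\_/etc/passwd}, which stays
below {\tt paper.rel}.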
@<replace links to the parent directory@>=
{ int k;
for (k=stem_length+ext_length; k<aux_length-3;k++)
if (aux_name[k]=='.'&& aux_name[k+1]=='.'&& aux_name[k+2]=='/')
{ aux_name[k]=aux_name[k+1]='_';k=k+2;}
}
@
It remains to create the directories along the path we might have constructed.
@<make sure the path in |aux_name| exists@>=
{ char *path_end;
path_end=aux_name+1;
while (*path_end!=0)
{ if(*path_end=='/')
{ struct stat s;
*path_end=0;
if (stat(aux_name,&s)==-1)
{
#ifdef WIN32
if (mkdir(aux_name)!=0)
#else
@t\2\kern-1em@>if (mkdir(aux_name,0777)!=0)
#endif
QUIT("Unable to create directory %s",aux_name);
DBG(DBGDIR,"Creating directory %s\n",aux_name);
} else if (!(S_IFDIR&(s.st_mode)))
QUIT("Unable to create directory %s, file exists",aux_name);
*path_end='/';
}
path_end++;
}
}
@
\writecode
@<write functions@>=
@<make sure |access| is defined@>@;
extern char *stem_name;
extern int stem_length;
void hget_section(uint16_t n);
void hwrite_aux_files(void)
{ int i;
if (!option_aux) return;
DBG(DBGBASIC|DBGDIR,"Writing %d aux files\n",max_section_no-2);
for (i=3;i<=max_section_no;i++)
{ FILE *f;
char *aux_name=NULL;
if (option_global)
aux_name=strdup(dir[i].file_name);
else
@<compute a local |aux_name|@>@;
@<without {\tt -f} skip writing an existing file@>@;
@<make sure the path in |aux_name| exists@>@;
f=fopen(aux_name,"wb");
if (f==NULL)
QUIT("Unable to open file '%s' for writing",aux_name);
else
{ size_t s;
hget_section(i);
DBG(DBGDIR,"Writing file %s\n",aux_name);
s=fwrite(hstart,1,dir[i].size,f);
if (s!=dir[i].size) QUIT("writing file %s",aux_name);
fclose(f);
}
free(aux_name);
}
}
@
We write the directory, and the directory entries
in long format using the following functions.
@<write functions@>=
static void hwrite_entry(int i)
{ hwrite_start();
hwritef("section %u",dir[i].section_no);@+ hwrite_string(dir[i].file_name);
hwrite_end();
}
void hwrite_directory(void)
{ int i;
if (dir==NULL) QUIT("Directory not allocated");
section_no=0;
hwritef("<directory %u", max_section_no);@/
for (i=3;i<=max_section_no;i++)
hwrite_entry(i);
hwritef("\n>\n");
}
@
\subsection{Directories in Short Format}
The directory\index{directory section} section of a short format file contains entries
for all sections including the directory section itself. After reading the
directory section, enough information---position and size---is available to
access any section directly. As usual, a directory entry starts and ends with
a tag byte. The kind part of an entry's tag is not used; it is always zero.
The value $s$ of the two least significant bits of the info part indicates
that sizes are stored using $s+1$ bytes. The most significant bit of the info
part is 1 if the section is stored in compressed\index{compression} form. In this case the size
of the section is followed by the size of the section after decompressing it.
After the tag byte follows the section number. In the short format file,
section numbers must be strictly increasing and consecutive. This is redundant but helps
with checking. Then follows the size---or the sizes---of the section. After the size
follows the file name terminated by a zero byte. The file name might be an empty
string in which case there is just the zero byte. After the zero byte follows
a copy of the tag byte.
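The layout just described can be made concrete with a small encoder.
This is a sketch under two assumptions that are not stated above:
multi-byte numbers are stored with the most significant byte first,
and, since the kind part is zero, the tag byte equals the info value.
The names |put_bytes| and |encode_entry| are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* store the n low-order bytes of v, most significant byte first
   (the byte order is an assumption of this sketch) */
static uint8_t *put_bytes(uint8_t *p, uint32_t v, int n)
{ while (n-- > 0) *p++ = (uint8_t)(v >> (8 * n));
  return p;
}

/* Hypothetical encoder for one directory entry; returns the number
   of bytes written. A non-zero xsize marks a compressed section. */
size_t encode_entry(uint8_t *buf, uint16_t section_no, uint32_t size,
                    uint32_t xsize, const char *file_name)
{ uint8_t *p = buf;
  uint8_t s, tag; /* sizes take s+1 bytes */
  assert(buf != NULL);
  if (size < 0x100 && xsize < 0x100) s = 0;
  else if (size < 0x10000 && xsize < 0x10000) s = 1;
  else if (size < 0x1000000 && xsize < 0x1000000) s = 2;
  else s = 3;
  tag = (uint8_t)(s | (xsize != 0 ? 4 : 0)); /* kind is zero */
  *p++ = tag;
  p = put_bytes(p, section_no, 2);
  p = put_bytes(p, size, s + 1);
  if (xsize != 0) p = put_bytes(p, xsize, s + 1);
  if (file_name == NULL) file_name = "";
  strcpy((char *)p, file_name);
  p += strlen(file_name) + 1; /* the name and its zero byte */
  *p++ = tag;                 /* copy of the tag byte */
  return (size_t)(p - buf);
}
```

Under these assumptions, an uncompressed section 3 of size |0x1234| with
the file name {\tt a.ttf} occupies twelve bytes: the tag |0x01|, the
section number in two bytes, the size in two bytes, the six bytes of
the name including the zero byte, and again the tag |0x01|.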
Here is the macro and function to read a directory\index{directory entry} entry:
\gdef\subcodetitle{Directory Entries}%
\getcode
@<shared get macros@>=
#define @[HGET_SIZE(I)@] \
if ((I)&b100) { \
if (((I)&b011)==0) s=HGET8,xs=HGET8; \
else if (((I)&b011)==1) HGET16(s),HGET16(xs); \
else if (((I)&b011)==2) HGET24(s),HGET24(xs); \
else if (((I)&b011)==3) HGET32(s),HGET32(xs); \
} \
else { \
if (((I)&b011)==0) s=HGET8; \
else if (((I)&b011)==1) HGET16(s); \
else if (((I)&b011)==2) HGET24(s); \
else if (((I)&b011)==3) HGET32(s); \
}
#define @[HGET_ENTRY(I,E)@] \
{ uint16_t i; \
uint32_t s=0,xs=0; \
char *file_name; \
HGET16(i); @+HGET_SIZE(I); @+HGET_STRING(file_name); @/\
hset_entry(&(E),i,s,xs,file_name); \
}
@
@<get file functions@>=
void hget_entry(Entry *e)
{ @<read the start byte |a|@>@;
DBG(DBGDIR,"Reading directory entry\n");
switch(a)
{ case TAG(0,b000+0): HGET_ENTRY(b000+0,*e);@+ break;
case TAG(0,b000+1): HGET_ENTRY(b000+1,*e);@+ break;
case TAG(0,b000+2): HGET_ENTRY(b000+2,*e);@+ break;
case TAG(0,b000+3): HGET_ENTRY(b000+3,*e);@+ break;
case TAG(0,b100+0): HGET_ENTRY(b100+0,*e);@+ break;
case TAG(0,b100+1): HGET_ENTRY(b100+1,*e);@+ break;
case TAG(0,b100+2): HGET_ENTRY(b100+2,*e);@+ break;
case TAG(0,b100+3): HGET_ENTRY(b100+3,*e);@+ break;
default: TAGERR(a); @+ break;
}
@<read and check the end byte |z|@>@;
}
@
Because the first entry in the directory section describes the
directory section itself, we cannot check its info bits in advance to determine
whether it is compressed or not. Therefore the directory section
starts with a root entry, which is always uncompressed. It describes
the remainder of the directory which follows.
There are two differences between the root entry and a normal entry:
it starts with the maximum section number instead of the section number zero,
and we set its position to the position of the
entry for section 1 (which might already be compressed).
The name of the directory section must be the empty string.
\gdef\subcodetitle{Directory Section}%
\getcode
@<get file functions@>=
static void hget_root(Entry *root)
{ DBG(DBGDIR,"Root entry at " SIZE_F "\n",hpos-hstart);
hget_entry(root);
root->pos=hpos-hstart;
max_section_no=root->section_no;
root->section_no=0;
if (max_section_no<2) QUIT("Sections 0, 1, and 2 are mandatory");
}
void hget_directory(void)
{ int i;
Entry root={0};
hget_root(&root);
DBG(DBGDIR,"Directory\n");
new_directory(max_section_no+1);
dir[0]=root;
DBG(DBGDIR,"Directory entry 1 at 0x%"PRIx64"\n",dir[0].pos);
hget_section(0);
for (i=1;i<=max_section_no;i++)@/
{ hget_entry(&(dir[i]));@+
dir[i].pos=dir[i-1].pos +dir[i-1].size;@+
DBG(DBGDIR,"Section %d at 0x%"PRIx64"\n",i,dir[i].pos);
}
}
void hclear_dir(void)
{ int i;
if (dir==NULL) return;
for (i=0;i<3;i++) /* currently the only compressed sections */
if (dir[i].xsize>0 && dir[i].buffer!=NULL) free(dir[i].buffer);
free(dir); dir=NULL;
}
@
Armed with these preparations, we can put the directory into the \HINT\ file.
\gdef\subcodetitle{Directory Section}%
\putcode
@<put functions@>=
static void hput_entry(Entry *e)
{ Info b;
if (e->size<0x100 && e->xsize<0x100) b=b000;
else if (e->size<0x10000 &&e->xsize<0x10000) b=b001;
else if (e->size<0x1000000 &&e->xsize<0x1000000) b=b010;
else b=b011;
if (e->xsize!=0) b =b|b100;
DBG(DBGTAGS,"Directory entry no=%d size=0x%x xsize=0x%x\n",e->section_no, e->size, e->xsize);
HPUTTAG(0,b);@/
HPUT16(e->section_no);
switch (b) {
case b000: HPUT8(e->size);@+break;
case b001: HPUT16(e->size);@+break;
case b010: HPUT24(e->size);@+break;
case b011: HPUT32(e->size);@+break;
case b100: HPUT8(e->size);@+HPUT8(e->xsize);@+break;
case b101: HPUT16(e->size);@+HPUT16(e->xsize);@+break;
case b110: HPUT24(e->size);@+HPUT24(e->xsize);@+break;
case b111: HPUT32(e->size);@+HPUT32(e->xsize);@+break;
default: QUIT("Can't happen");@+ break;
}
hput_string(e->file_name);@/
DBGTAG(TAG(0,b),hpos);@+HPUT8(TAG(0,b));
}
static void hput_directory_start(void)
{ DBG(DBGDIR,"Directory Section\n");
section_no=0;
hpos=hstart=dir[0].buffer;
hend=hstart+dir[0].bsize;
}
static void hput_directory_end(void)
{ dir[0].size=hpos-hstart;
DBG(DBGDIR,"End Directory Section size=0x%x\n",dir[0].size);
}
static size_t hput_root(void)
{ uint8_t buffer[MAX_TAG_DISTANCE];
size_t s;
hpos=hstart=buffer;
hend=hstart+MAX_TAG_DISTANCE;
dir[0].section_no=max_section_no;
hput_entry(&dir[0]);
s=hput_data(0, hstart,hpos-hstart);
DBG(DBGDIR,@["Writing root size=" SIZE_F "\n"@],s);
return s;
}
extern int option_compress;
static char **aux_names;
void hput_directory(void)
{ int i;
@<update the file sizes of optional sections@>@;
if (option_compress) { hcompress(1); @+hcompress(2); @+}
hput_directory_start();
for (i=1; i<=max_section_no; i++)
{ dir[i].pos=dir[i-1].pos+dir[i-1].size;
DBG(DBGDIR,"writing entry %u at 0x%" PRIx64 "\n",i, dir[i].pos);
hput_entry(&dir[i]);
}
hput_directory_end();
if (option_compress) hcompress(0);
}
@
Now let us look at the optional sections described in the directory
entries 3 and above. Where these files are found depends on the {\tt
-g} and {\tt -a} options.
With the {\tt -g} option given, only the file names as given in the
directory entries are used. With the {\tt -a} option given, the file
names are translated to filenames in the {|hin_name|\tt .abs} and
{|hin_name|\tt .rel} directories, as described in
section~\secref{absrel}. If neither the {\tt -a} nor the {\tt -g}
option is given, {\tt shrink} first tries the translated filename and
then the global filename before it gives up.
When the \.{shrink} program writes the directory section in the short
format, it needs to know the sizes of all the sections---including the
optional sections. These sizes are not provided in the long format
because it is safer and more convenient to let the machine figure out
the file sizes\index{file size}. But before we can determine the
size, we need to determine the file.
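The lookup order for the optional files can be summarized as a small
decision function. The sketch below is not the code used by
\.{shrink}; the enumeration |aux_choice| and the function
|choose_aux_file| are hypothetical names, and the two existence flags
stand for the results of calls to |stat|.

```c
/* which file name is used for an optional section */
enum aux_choice { AUX_NONE, AUX_LOCAL, AUX_GLOBAL };

/* Hypothetical decision function: local_exists and global_exists
   tell whether the translated name and the name from the directory
   entry refer to existing files. */
enum aux_choice choose_aux_file(int option_global, int option_aux,
                                int local_exists, int global_exists)
{ if (option_global)              /* -g: global names only */
    return global_exists ? AUX_GLOBAL : AUX_NONE;
  if (local_exists)               /* the translated name comes first */
    return AUX_LOCAL;
  if (option_aux)                 /* -a: no fallback to the global name */
    return AUX_NONE;
  return global_exists ? AUX_GLOBAL : AUX_NONE;
}
```

A result of |AUX_NONE| corresponds to the case where the program gives
up with an error message.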
@<update the file sizes of optional sections@>=
{ int i;
ALLOCATE(aux_names,max_section_no+1,char *);
for (i=3;i<=max_section_no;i++)
{ struct stat s;
if (!option_global)
{ char * aux_name=NULL;
@<compute a local |aux_name|@>@;
if (stat(aux_name,&s)==0)
aux_names[i]=aux_name;
else
{ if (option_aux) QUIT("Unable to find file '%s'",aux_name);
free(aux_name); aux_name=NULL;
}
}
if ((aux_names[i]==NULL && !option_aux) || option_global)
{ if (stat(dir[i].file_name,&s)!=0)
QUIT("Unable to find file '%s'",dir[i].file_name);
}
dir[i].size=s.st_size;
dir[i].xsize=0;
DBG(DBGDIR,"section %i: found file %s size %u\n",i,aux_names[i]?aux_names[i]:dir[i].file_name, dir[i].size);
}
}
@
@<rewrite the file names of optional sections@>=
{ int i;
for (i=3;i<=max_section_no;i++)
if (aux_names[i]!=NULL)
{ free(dir[i].file_name);
dir[i].file_name=aux_names[i];
aux_names[i]=NULL;
}
}
@
The computation of the sizes of the mandatory sections will be
explained later.
\gdef\subcodetitle{Optional Sections}%
To conclude this section, here is the function that adds the files that
are described in the directory entries 3 and above to a \HINT\ file in short format.
\putcode
@<put functions@>=
static size_t hput_optional_sections(void)
{ int i;
size_t s=0;
DBG(DBGDIR,"Optional Sections\n");
for (i=3; i<=max_section_no; i++)@/
{ FILE *f;
size_t fsize;
char *file_name=dir[i].file_name;
DBG(DBGDIR,"adding file %d: %s\n",dir[i].section_no,file_name);
if (dir[i].xsize!=0) @/
DBG(DBGDIR,"Compression of auxiliary files is currently not supported\n");
f=fopen(file_name,"rb");
if (f==NULL) QUIT("Unable to read section %d, file %s",
dir[i].section_no,file_name);
fsize=0;
while (!feof(f))@/
{ size_t s,t;
char buffer[1<<13]; /* 8kByte */
s=fread(buffer,1,1<<13,f);@/
if (ferror(f)) QUIT("reading file %s",file_name);
t=fwrite(buffer,1,s,hout);
if (s!=t) QUIT("writing file %s",file_name);
fsize=fsize+t;
}
fclose(f);
if (fsize!=dir[i].size)
QUIT(@["File size " SIZE_F " does not match section[0] size %u"@],@|fsize,dir[i].size);
s=s+fsize;
}
return s;
}
@
\section{Definition Section}\index{definition section}
\label{defsection}
In a typical \HINT\ file, there are many things that are used over and over again.
Examples are the interword glue of a specific font and the indentation of
the first line of a paragraph. The definition section contains this information so that
it can be referenced in the content section by a simple reference number.
In addition there are a few parameters that guide the routines of \TeX.
An example is the ``above display skip'', which controls the amount of white space
inserted above a displayed equation, or the ``hyphen penalty'' that tells \TeX\
the ``\ae sthetic cost'' of ending a line with a hyphenated word. These parameters
also get their values in the definition section as explained in section~\secref{defaults}.
The simplest way to store these definitions is to store them in an array indexed by the
reference numbers.
To simplify the dynamic allocation of these arrays, the list of definitions
will always start with the list of maximum\index{maximum values} values: a list that contains
for each node type the maximum reference number used.
In the long format, the definition section starts with the keyword \.{definitions},
followed by the list of maximum values,
followed by the definitions proper.
When writing the short format, we start by positioning the output stream at the beginning of
the definition buffer and we end with recording the size of the definition section
in the directory.
\readcode
@s definition_section symbol
@s definition_list symbol
@s definition symbol
@s DEFINITIONS symbol
@<symbols@>=
%token DEFINITIONS "definitions"
@
@<scanning rules@>=
::@=definitions@> :< return DEFINITIONS; >:
@
@<parsing rules@>=
definition_section: START DEFINITIONS { hput_definitions_start();}@/
max_definitions definition_list @/
END {hput_definitions_end();};
definition_list: @+ | definition_list def_node;
@
\writecode
@<write functions@>=
void hwrite_definitions_start(void)
{ section_no=1; @+hwritef("<definitions");
}
void hwrite_definitions_end(void)
{ hwritef("\n>\n");
}
@
@<get functions@>=
void hget_definition_section(void)
{ DBG(DBGBASIC|DBGDEF,"Definitions\n");
hget_section(1);
hwrite_definitions_start();
DBG(DBGDEF,"List of maximum values\n");
hget_max_definitions();
@<initialize definitions@>@;
hwrite_max_definitions();
DBG(DBGDEF,"List of definitions\n");
while (hpos<hend)
hget_def_node();
hwrite_definitions_end();
}
@
\putcode
@<put functions@>=
void hput_definitions_start(void)
{ DBG(DBGDEF,"Definition Section\n");
section_no=1;
hpos=hstart=dir[1].buffer;
hend=hstart+dir[1].bsize;
}
void hput_definitions_end(void)
{ dir[1].size=hpos-hstart;
DBG(DBGDEF,"End Definition Section size=0x%x\n",dir[1].size);
}
@
\gdef\codetitle{Definitions}
\hascode
\subsection{Maximum Values}\index{maximum values}
To help implementations allocate the right amount of memory for the
definitions, the definition section starts with a list of maximum
values. For each kind of node, we store the maximum valid reference
number in the array |max_ref| which is indexed by the kind-values.
For a reference number |n| and kind-value $k$ we have
$0\le n\le |max_ref[k]|$.
To make sure that a hint file without any definitions
will work, some definitions have default values.
The initialization of default and maximum values is described
in section~\secref{defaults}. The maximum
reference number that has a default value is stored in the array
|max_default|.
We have $-1 \le |max_default[k]| \le |max_ref[k]| < 2^{16}$,
and for most $k$ even $|max_ref[k]| < 2^{8}$.
Specifying maximum values that are lower than the
default\index{default value} values is not allowed in the short
format; in the long format, lower values are silently ignored. Some
default values are permanently fixed; for example the zero glue with
reference number |zero_skip_no| must never change. The array
|max_fixed| stores the maximum reference number that has a fixed value for a
given kind. Definitions with reference numbers less than or equal to the
corresponding |max_fixed[k]| number are disallowed. Usually we have
$-1 \le |max_fixed[k]| \le |max_default[k]| $, but if for a kind-value
$k$ no definitions, and hence no maximum values are allowed, we set
$|max_fixed[k]|=|0x10000|>|max_default[k]| $.
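The relations between the three arrays can be spelled out as
predicates. The following sketch uses a hypothetical |struct limits|
that holds the three values for a single kind; the numbers in the
note after the code are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* the three limits for one kind (hypothetical structure) */
struct limits { int max_fixed, max_default, max_ref; };

/* a reference n may be used if it lies in the range 0 to max_ref */
int may_reference(const struct limits *l, int n)
{ assert(l != NULL);
  return 0 <= n && n <= l->max_ref;
}

/* a reference n may be defined if it is above the fixed range */
int may_define(const struct limits *l, int n)
{ assert(l != NULL);
  return l->max_fixed < n && n <= l->max_ref;
}

/* without a definition, only references up to max_default have a value */
int has_default(const struct limits *l, int n)
{ assert(l != NULL);
  return 0 <= n && n <= l->max_default;
}
```

With the limits 0, 14, and 57 for fixed, default, and maximum values,
reference 0 may be used but not defined, references 1 to 57 may be
defined, and only references 0 to 14 have a value without a
definition.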
We use the |max_ref| array whenever we find a
reference number in the input to check if it is within the proper range.
@<debug macros@>=
#define @[REF_RNG(K,N)@] if ((int)(N)>max_ref[K]) QUIT("Reference %d to %s out of range [0 - %d]",\
(N),definition_name[K],max_ref[K])
@
In the long format file, the list of maximum values starts with
``\.{<max }'', then follow pairs of keywords and numbers like
``\.{<glue 57>}'', and it ends with ``\.{>}''. In the short format,
we start the list of maximums with a |list_kind| tag and end it with
a |list_kind| tag. Each maximum value is preceded and followed by a
tag byte with the appropriate kind-value. The info value has its |b001| bit
cleared if the maximum value is in the range 0 to |0xFF| and fits into a
single byte; the info value has its |b001| bit set if it fits into two bytes.
Currently only the |label_kind| may need to use two bytes.
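A maximum value in the short format can be produced as in the
following sketch. It assumes that a tag byte holds the kind in its
upper five bits and the info in its lower three bits, and that two-byte
values are stored high byte first; the names |SKETCH_TAG| and
|encode_max| are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* assumed tag layout: five bits of kind, three bits of info */
#define SKETCH_TAG(K, I) ((uint8_t)(((K) << 3) | (I)))

/* Hypothetical encoder for one maximum value n of the given kind;
   returns the number of bytes written, which is 3 or 4. */
size_t encode_max(uint8_t *buf, unsigned kind, unsigned n)
{ uint8_t *p = buf;
  uint8_t tag;
  assert(buf != NULL && kind < 32 && n <= 0xFFFF);
  tag = SKETCH_TAG(kind, n > 0xFF ? 1 : 0); /* info bit: two bytes */
  *p++ = tag;
  if (n > 0xFF) *p++ = (uint8_t)(n >> 8); /* high byte first (assumed) */
  *p++ = (uint8_t)(n & 0xFF);
  *p++ = tag; /* the closing tag repeats the opening tag */
  return (size_t)(p - buf);
}
```

For the arbitrarily chosen kind-value 3 and the maximum 57, the sketch
yields the three bytes |0x18|, |0x39|, and |0x18|.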
@<debug macros@>=
#define MAX_REF(K) ((K)==label_kind?0xFFFF:0xFF)
@
Other info values are reserved for future extensions.
After reading the maximum values, we initialize the data structures for
the definitions.
\readcode
@s max_list symbol
@s max_value symbol
@s max_definitions symbol
@s MAX symbol
@<symbols@>=
%token MAX "max"
@
@<scanning rules@>=
::@=max@> :< return MAX; >:
@
@<parsing rules@>=
max_definitions: START MAX max_list END @|
{ @<initialize definitions@>@;@+ hput_max_definitions(); };
max_list:@+ | max_list START max_value END;
max_value: FONT UNSIGNED { hset_max(font_kind,$2); }
| INTEGER UNSIGNED { hset_max(int_kind,$2); }
| DIMEN UNSIGNED { hset_max(dimen_kind,$2); }
| LIGATURE UNSIGNED { hset_max(ligature_kind,$2); }
| DISC UNSIGNED { hset_max(disc_kind,$2); }
| GLUE UNSIGNED { hset_max(glue_kind,$2); }
| LANGUAGE UNSIGNED { hset_max(language_kind,$2); }
| RULE UNSIGNED { hset_max(rule_kind,$2); }
| IMAGE UNSIGNED { hset_max(image_kind,$2); }
| LEADERS UNSIGNED { hset_max(leaders_kind,$2); }
| BASELINE UNSIGNED { hset_max(baseline_kind,$2); }
| XDIMEN UNSIGNED { hset_max(xdimen_kind,$2); }
| PARAM UNSIGNED { hset_max(param_kind,$2); }
| STREAMDEF UNSIGNED { hset_max(stream_kind,$2); }
| PAGE UNSIGNED { hset_max(page_kind,$2); }
| RANGE UNSIGNED { hset_max(range_kind,$2); }
| LABEL UNSIGNED { hset_max(label_kind,$2); }
| COLOR UNSIGNED { hset_max(color_kind,$2); };
@
@<parsing functions@>=
void hset_max(Kind k, int n)
{ DBG(DBGDEF,"Setting max %s to %d\n",definition_name[k],n);
RNG("Maximum",n,max_fixed[k]+1,MAX_REF(k));
if (n>max_ref[k])
max_ref[k]=n;
}
@
\writecode
@<write functions@>=
void hwrite_max_definitions(void)
{ Kind k;
hwrite_start();@+
hwritef("max");
for (k=0; k<32;k++)
if (max_ref[k]>max_default[k])@/
{@+ switch (k)
{ @<cases of writing special maximum values@>@;
default:
hwrite_start();
hwritef("%s %d",definition_name[k], max_ref[k]);
hwrite_end();
break;
}
}
hwrite_end();
}
@
\getcode
@<get file functions@>=
void hget_max_definitions(void)
{ Kind k;
@<read the start byte |a|@>@;
if (a!=TAG(list_kind,0)) QUIT("Start of maximum list expected");
for(k= 0;k<32;k++)max_ref[k]= max_default[k]; max_outline=-1;
while (true) @/
{ int n;
if (hpos>=hend) QUIT("Unexpected end of maximum list");
node_pos=hpos-hstart;
HGETTAG(a);@+
k=KIND(a);@+
if (k==list_kind) break;
if (INFO(a)&b001) HGET16(n); @+else n=HGET8;
switch (a)
{ @<cases of getting special maximum values@>@;
default:
if (max_fixed[k]>max_default[k])
MESSAGE("Maximum value for kind %s not supported\n",definition_name[k]); else
{ RNG("Maximum number",n,max_default[k],MAX_REF(k));
max_ref[k]=n;
DBG(DBGDEF,"max(%s) = %d\n",definition_name[k],max_ref[k]);
}
break;
}
@<read and check the end byte |z|@>@;
}
if (INFO(a)!=0) QUIT("End of maximum list with info %d", INFO(a));
DBG(DBGDEF,"Getting Max Definitions END\n");
}
@
\putcode
@<put functions@>=
void hput_max_definitions(void)
{ Kind k;
DBG(DBGDEF,"Writing Max Definitions\n");
HPUTTAG(list_kind,0);
for (k=0; k<32; k++)
if (max_ref[k]>max_default[k])
{ uint32_t pos=hpos++-hstart;
DBG(DBGDEF,"max(%s) = %d\n",definition_name[k],max_ref[k]);
hput_tags(pos,TAG(k,hput_n(max_ref[k])-1));
}
@<cases of putting special maximum values@>@;
HPUTTAG(list_kind,0);
DBG(DBGDEF,"Writing Max Definitions End\n");
}
@
\subsection{Definitions}\label{definitions}
A definition\index{definition section} associates a reference number
with a content node. Here is an example: A glue definition associates
a glue number, for example 71, with a glue specification. In the long
format this might look like ``{\tt \.{<}glue *71 4pt plus 5pt minus
0.5pt\.{>}}'' which makes glue number 71 refer to a 4pt glue with a
stretchability of 5pt and a shrinkability of 0.5pt.
Such a glue definition differs from a normal glue node only by an extra
byte value immediately following the keyword or, in the short format, the start byte.
Whenever we need this glue in the content section, we can say
``{\tt \.{<}glue *71\.{>}}''. Because we restrict the number of glue definitions
to at most 256, a single byte is sufficient to
store the reference number. The \.{shrink} and \.{stretch} programs
will, however, not bother to store glue definitions. Instead they will
write them in the new format immediately to the output.
The parser will handle definitions in any order, but the order is relevant
if a definition references another definition, and of course,
it never does any harm to present definitions in a systematic way.
As a rule, the definition of a reference must always precede the
use of that reference. While this is always the case for
references in the content section, it restricts the use of
references inside the definition section.
The definitions for integers, dimensions, extended dimensions,
languages, rules, ligatures, and images are ``simple''.
They never contain references and so it is always possible to list them first.
The definition of glues may contain extended dimensions,
the definitions of baselines may reference glue nodes, and
the definitions of parameter lists contain definitions of integers, dimensions,
and glues. So these definitions should follow in this order.
The definitions of leaders and discretionary breaks allow boxes.
While these boxes are usually
quite simple, they may contain arbitrary references---including again
references to leaders and discretionary breaks. So, at least in principle,
they might impose complex (or even unsatisfiable) restrictions
on the order of those definitions.
The definitions of fonts contain not only ``simple'' definitions
but also the definitions of interword glues and hyphens
introducing additional ordering restrictions.
The definitions of hyphens regularly contain glyphs which in turn reference
a font---typically the font that just gets defined.
Therefore we relax the define-before-use policy for glyphs:
Glyphs may reference a font before the font is defined.
The definitions of page templates contain lists of arbitrary content
nodes, and while the boxes inside leaders or discretionary breaks tend to be simple,
the content of page templates is often quite complex.
Page templates are probably the source of most ordering restrictions.
Placing page templates towards the end of the list of definitions
might be a good idea.
%
Stream definitions are a special case. These occur only as part of
the corresponding page template definition and are listed at its end.
So references to them will occur in the page template always before their
definition.
%
Finally, the definitions of page ranges always reference a page template
and they should come after the page template definitions.
For technical reasons explained in section~\secref{labels},
definitions of labels and outlines come last.
To avoid complex dependencies, an application can always choose not to
use references in the definition section. There are only three types of
nodes where references cannot be avoided: fonts are referenced in glyph nodes,
labels are referenced in outlines,
and languages are referenced in boxes or page templates.
Possible ordering restrictions can be satisfied if languages are defined early.
To check the define-before-use policy, we use an array of bitvectors,
but we limit checking to the first 256 references.
We have for every reference number $|N|<256$ and every kind |K| a single
bit which is set if and only if the corresponding reference is defined.
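The bookkeeping can be sketched with plain functions instead of
macros. The names |defined_bits|, |mark_defined|, and |is_defined| are
hypothetical; the sketch uses an unsigned constant for the shift so
that bit 31 can be set without overflow.

```c
#include <assert.h>
#include <stdint.h>

#define KINDS 32    /* kind-values 0 to 31 */
#define CHECKED 256 /* only references below this number are tracked */

static uint32_t defined_bits[CHECKED / 32][KINDS];

/* record that reference n of kind k has been defined */
void mark_defined(unsigned n, unsigned k)
{ assert(k < KINDS);
  if (n < CHECKED)
    defined_bits[n / 32][k] |= UINT32_C(1) << (n % 32);
}

/* references that are not tracked always count as defined */
int is_defined(unsigned n, unsigned k)
{ assert(k < KINDS);
  if (n >= CHECKED) return 1;
  return (int)((defined_bits[n / 32][k] >> (n % 32)) & 1);
}
```

Each kind uses eight words of 32 bits, one bit per tracked reference
number, so the whole table needs 1024 bytes.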
@<definition checks@>=
uint32_t definition_bits[0x100/32][32]={{0}};
#define @[SET_DBIT(N,K)@] ((N)>0xFF?1:(definition_bits[(N)/32][K]|=((uint32_t)1<<((N)&(32-1)))))
#define @[GET_DBIT(N,K)@] ((N)>0xFF?1:((definition_bits[(N)/32][K]>>((N)&(32-1)))&1))
#define @[DEF(D,K,N)@] (D).k=K;@+ (D).n=(N);@+SET_DBIT((D).n,(D).k);\
DBG(DBGDEF,"Defining %s %d\n",definition_name[(D).k],(D).n);\
RNG("Definition",(D).n,max_fixed[(D).k]+1,max_ref[(D).k]);
#define @[REF(K,N)@] REF_RNG(K,N);if(!GET_DBIT(N,K)) \
QUIT("Reference %d to %s before definition",(N),definition_name[K])
@
@<initialize definitions@>=
definition_bits[0][list_kind]=(1<<(MAX_LIST_DEFAULT+1))-1;
definition_bits[0][param_kind]=(1<<(MAX_LIST_DEFAULT+1))-1;
definition_bits[0][int_kind]=(1<<(MAX_INT_DEFAULT+1))-1;
definition_bits[0][dimen_kind]=(1<<(MAX_DIMEN_DEFAULT+1))-1;
definition_bits[0][xdimen_kind]=(1<<(MAX_XDIMEN_DEFAULT+1))-1;
definition_bits[0][glue_kind]=(1<<(MAX_GLUE_DEFAULT+1))-1;
definition_bits[0][baseline_kind]=(1<<(MAX_BASELINE_DEFAULT+1))-1;
definition_bits[0][page_kind]=(1<<(MAX_PAGE_DEFAULT+1))-1;
definition_bits[0][stream_kind]=(1<<(MAX_STREAM_DEFAULT+1))-1;
definition_bits[0][range_kind]=(1<<(MAX_RANGE_DEFAULT+1))-1;
definition_bits[0][color_kind]=(1<<(MAX_COLOR_DEFAULT+1))-1;
@
\goodbreak
\vbox{\readcode\vskip -\baselineskip\putcode}
@s font symbol
@<symbols@>=
%type <rf> def_node
@
@<parsing rules@>=
def_node:
start FONT ref font END @| { DEF($$,font_kind,$3);@+ hput_tags($1,$4);@+}
| start INTEGER ref integer END @| { DEF($$,int_kind,$3);@+ hput_tags($1,hput_int($4));@+}
| start DIMEN ref dimension END @| { DEF($$,dimen_kind,$3);@+ hput_tags($1,hput_dimen($4));}
| start LANGUAGE ref string END @| { DEF($$,language_kind,$3);@+hput_string($4); hput_tags($1,TAG(language_kind,0));}
| start GLUE ref glue END @| { DEF($$,glue_kind,$3);@+ hput_tags($1,hput_glue(&($4)));}
| start XDIMEN ref xdimen END @| { DEF($$,xdimen_kind,$3);@+ hput_tags($1,hput_xdimen(&($4)));}
| start RULE ref rule END @| { DEF($$,rule_kind,$3);@+ hput_tags($1,hput_rule(&($4)));}
| start LEADERS ref leaders END @| { DEF($$,leaders_kind,$3);@+ hput_tags($1,TAG(leaders_kind, $4));}
| start BASELINE ref baseline END @| { DEF($$,baseline_kind,$3);@+hput_tags($1,TAG(baseline_kind, $4));@+}
| start LIGATURE ref ligature END @| { DEF($$,ligature_kind,$3);@+hput_tags($1,hput_ligature(&($4)));}
| start DISC ref disc END @| { DEF($$,disc_kind,$3);@+ hput_tags($1,hput_disc(&($4)));}
| start IMAGE ref image END @| { DEF($$,image_kind,$3);@+ hput_tags($1,TAG(image_kind,$4));}
| start PARAM ref parameters END @| { DEF($$,param_kind,$3);@+ hput_tags($1,hput_list($1+2,&($4)));}
| start PAGE ref page END @| { DEF($$,page_kind,$3);@+ hput_tags($1,TAG(page_kind,0));};
@
There are a few cases where one wants to define a reference by a reference.
For example, a \HINT\ file may want to set the {\tt parfillskip} glue to zero.
While there are multiple ways to define the zero glue, the canonical way is a reference
using the |zero_skip_no|. All these cases have in common that the reference to be defined
is one of the default references and the defining reference is one of the fixed references.
We add a few parsing rules and a testing macro for those cases where the number
of default definitions is greater than the number of fixed definitions.
@<definition checks@>=
#define @[DEF_REF(D,K,M,N)@] DEF(D,K,M);\
if ((int)(M)>max_default[K]) QUIT("Defining non default reference %d for %s",M,definition_name[K]); \
if ((int)(N)>max_fixed[K]) QUIT("Defining reference %d for %s by non fixed reference %d",M,definition_name[K],N);
@
@<parsing rules@>=
def_node:
start INTEGER ref ref END @/{DEF_REF($$,int_kind,$3,$4); hput_tags($1,TAG(int_kind,0)); }
| start DIMEN ref ref END @/{DEF_REF($$,dimen_kind,$3,$4); hput_tags($1,TAG(dimen_kind,0)); }
| start GLUE ref ref END @/{DEF_REF($$,glue_kind,$3,$4); hput_tags($1,TAG(glue_kind,0)); };
@
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<get functions@>=
void hget_definition(int n, Tag a, uint32_t node_pos)
{@+ switch(KIND(a))
{ case font_kind: hget_font_def(INFO(a),n);@+ break;
case param_kind:
{@+ List l; l.t=a; @+HGET_LIST(INFO(a),l); @+hwrite_parameters(&l); @+ break;@+}
case page_kind: hget_page(); @+break;
case dimen_kind: hget_dimen(a); @+break;
case xdimen_kind:
{@+ Xdimen x; @+hget_xdimen(a,&x); @+hwrite_xdimen(&x); @+break;@+ }
case language_kind:
if (INFO(a)!=b000)
QUIT("Info value of language definition must be zero");
else
{ char *n; HGET_STRING(n);@+ hwrite_string(n); }
break;
case color_kind:
switch (INFO(a))
{ @<cases to get definitions for |color_kind|@>@;
default:
QUIT("Undefined tag %d for color_kind definition at 0x%x",INFO(a),node_pos);
}
break;
default:
hget_content(a); @+break;
}
}
void hget_def_node()
{ Kind k;
@<read the start byte |a|@>@;
k=KIND(a);
if (k==unknown_kind && INFO(a)==b100)
hget_unknown_def();
else if (k==label_kind)
hget_outline_or_label_def(INFO(a),node_pos);
else
{ int n;
n=HGET8;
if (k!=range_kind) REF_RNG(k,n);
SET_DBIT(n,k);
if (k==range_kind)
hget_range(INFO(a),n);
else
{ hwrite_start(); @+hwritef("%s *%d",definition_name[k],n);
hget_definition(n,a,node_pos);
hwrite_end();
}
if(n>max_ref[k] || n <= max_fixed[k])
QUIT("Definition %d for %s out of range [%d - %d]",@|
n, definition_name[k],max_fixed[k]+1,max_ref[k]);
if (max_fixed[k]>max_default[k])
QUIT("Definitions for kind %s not supported", definition_name[k]);
}
@<read and check the end byte |z|@>@;
}
@
\subsection{Parameter Lists}\label{paramlist}\index{parameter list}
Because the content section is a ``stateless'' list of nodes, the
definitions we see in the definition section can never change. It is
however occasionally necessary to make local modifications to some of
these definitions, because some definitions are parameters of the
algorithms borrowed from \TeX. Nodes that need such modifications, for
example the paragraph nodes that are passed to \TeX's line breaking
algorithm, contain a list of local definitions called parameters.
Typically sets of related parameters are needed. To facilitate a
simple reference to such a set of parameters, we allow predefined
parameter lists that can be referenced by a single number. The
parameters of \TeX's routines are quite basic---integers\index{integer},
dimensions\index{dimension}, and glues\index{glue}---and all
of them have default values.
Therefore we restrict the definitions in parameter lists to such
basic definitions.
@<parsing functions@>=
void check_param_def(Ref *df)
{ if(df->k!=int_kind && df->k!=dimen_kind && @| df->k!=glue_kind)
QUIT("Kind %s not allowed in parameter list", definition_name[df->k]);
if(df->n<=max_fixed[df->k] || max_default[df->k]<df->n)
QUIT("Parameter %d for %s not allowed in parameter list", df->n, definition_name[df->k]);
}
@
The definitions below repeat the definitions we have seen for lists in section~\secref{plainlists}
with small modifications. For example we use the kind-value |param_kind|. An empty parameter list
is omitted in the long format as well as in the short format.
\goodbreak
\vbox{\readcode\vskip -\baselineskip\putcode}
@s PARAM symbol
@s def_list symbol
@s parameters_node symbol
@s def_node symbol
@s parameters symbol
@s named_param_list symbol
@s param_list symbol
@<symbols@>=
%token PARAM "param"
%type <u> def_list
%type <l> parameters
@
@<scanning rules@>=
::@=param@> :< return PARAM; >:
@
@<parsing rules@>=
def_list: position @+
| def_list def_node {check_param_def(&($2));};
parameters: estimate def_list { $$.p=$2; $$.t=TAG(param_kind,b001); $$.s=(hpos-hstart)-$2;};
@
Using a parsing rule like
``\nts{param\_list}: \nts{start} \ts{PARAM} \nts{parameters} \ts{END}'',
an empty parameter list will be written as ``\.{<param>}''.
This looks ugly and seems like unnecessary syntax because
the parser already knows that a parameter list will come next.
Therefore the keyword can be omitted except in definitions and in unknown nodes.
@<parsing rules@>=
named_param_list: start PARAM parameters END @/
{ @+ hput_tags($1,hput_list($1+1,&($3)));@+};
param_list: named_param_list | start parameters END @/
{ @+ hput_tags($1,hput_list($1+1,&($2)));@+};
@
\writecode
@<write functions@>=
void hwrite_parameters(List *l)
{ uint32_t h=hpos-hstart, e=hend-hstart; /* save |hpos| and |hend| */
hpos=l->p+hstart;@+ hend=hpos+l->s;
if (l->s>0xFF) hwritef(" %d",l->s);
while(hpos<hend) hget_def_node();
hpos=hstart+h;@+ hend=hstart+e; /* restore |hpos| and |hend| */
}
void hwrite_param_list(List *l)
{ hwrite_start();@+
hwrite_parameters(l);
hwrite_end();
}
void hwrite_named_param_list(List *l)
{ hwrite_start();@+
hwritef("param");
hwrite_parameters(l);
hwrite_end();
}
@
\getcode
@<get functions@>=
void hget_param_list(List *l)
{ @+if (KIND(*hpos)!=param_kind) @/
QUIT("Parameter list expected at 0x%x", (uint32_t)(hpos-hstart));
else hget_list(l);
}
@
\subsection{Fonts}\label{fonts}
Another definition that has no corresponding content node is the
font\index{font} definition. Fonts by themselves do not constitute
content, instead they are used in glyph\index{glyph} nodes.
Further, fonts are never directly embedded in a content node; in a content node, a
font is always specified by its font number. Because a font number is stored
in a single byte, the number of
fonts that can be used in a \HINT\ file is limited to at most 256.
A long format font definition starts with the keyword ``\.{font}'' and
is followed by the font number, as usual prefixed by an asterisk. Then
comes the font specification with the font name, the font size,
the section number of the \TeX\ font metric file, and the
section number of the file containing the glyphs for the font.
The \HINT\ format supports \.{.pk} files, the traditional font format
for \TeX, and the more modern PostScript Type 1 fonts,
TrueType fonts, and OpenType fonts.
Starting with version 2.2, there is new support for TrueType and OpenType
fonts while \.{.pk} fonts are considered deprecated.
The previously mandatory \.{.tfm} file is no longer required for
TrueType and OpenType fonts. Instead, the hint viewer is required to
extract the necessary font metrics directly from the font files.
In a \HINT\ file, text is represented as a sequence of numbers called
character codes. \HINT\ files use the UTF-8 character encoding
scheme (CES) to map these numbers to their representation as byte
sequences. For example the number ``|0xE4|'' is encoded as the byte
sequence ``|0xC3| |0xA4|''. The same number |0xE4| can, however, represent
different characters depending on the coded character set (CCS). For
example in the common ISO-8859-1 (Latin 1) encoding the number |0xE4|
is the umlaut ``\"a'', whereas in ISO-8859-7 (Latin/Greek) it is
the Greek letter ``$\delta$'', and in the EBCDIC encoding, used on IBM
mainframes, it is the upper case letter ``U''.
The character encoding is
irrelevant for rendering a \HINT\ file as long as the character codes
in the glyph nodes are consistent with the character codes used in the font
file, but the character encoding is necessary for all programs that
need to ``understand'' the content of the \HINT\ file, for example
programs that translate a \HINT\ document to a different language
or convert text to speech.
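The mapping from a character code to its byte sequence can be illustrated
with a small encoder. This is a sketch of standard UTF-8 only; the function
name |utf8_encode| is hypothetical and the code is not taken from the
\HINT\ sources, which may handle character codes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode the character code c as standard UTF-8 into buf, which must
   provide room for 4 bytes. Returns the number of bytes written,
   or 0 if c is a surrogate or lies above 0x10FFFF. */
static size_t utf8_encode(uint32_t c, uint8_t *buf)
{ assert(buf != NULL);
  if (c < 0x80)
  { buf[0] = (uint8_t)c;
    return 1;
  }
  if (c < 0x800)
  { buf[0] = (uint8_t)(0xC0 | (c >> 6));
    buf[1] = (uint8_t)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000)
  { if (c >= 0xD800 && c <= 0xDFFF) return 0; /* surrogates are not characters */
    buf[0] = (uint8_t)(0xE0 | (c >> 12));
    buf[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (uint8_t)(0x80 | (c & 0x3F));
    return 3;
  }
  if (c <= 0x10FFFF)
  { buf[0] = (uint8_t)(0xF0 | (c >> 18));
    buf[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
    buf[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    buf[3] = (uint8_t)(0x80 | (c & 0x3F));
    return 4;
  }
  return 0;
}
```

For the character code |0xE4| the function writes the two bytes
|0xC3| and |0xA4|, matching the example in the text.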
For TrueType and OpenType fonts, the \HINT\ viewer is required to support
only two character encodings: |FT_ENCODING_ADOBE_CUSTOM|, used for the
traditional \TeX\ encoding scheme of fonts,
and |FT_ENCODING_UNICODE|, used for TrueType and OpenType fonts that do not have
a \.{.tfm} file along with them. To be precise: a font that has no \.{.tfm}
file along with it must be encoded in Unicode. That is, a glyph node
specifies the Unicode value of the desired character, which is then
translated to the glyph number using |FT_ENCODING_UNICODE|.
A ligature node specifies the glyph number directly without
a need to translate it further, but the replacement list of the ligature
contains the Unicode values of the replacement characters.
In the short format we use the info value |b000| for a font with \.{.tfm} file
and the info value |b001| for a font without \.{.tfm} file.
The Internet Engineering Task Force (IETF) has established a character set
registry\cite{ietf:charset-mib} that defines an enumeration of all
registered coded character sets\cite{iana:charset-mib}. The coded
character set numbers are in the range 1--2999.
This encoding number, as given in~\cite{iana:charset},
might be one possibility for specifying the font encoding as
part of a font definition, but no such addition is planned at the moment.
Currently, it is only required that a font specifies
an interword glue and a default discretionary break. After that comes
a list of up to 12 font specific parameters.
The font size specifies the desired ``at size''\index{font at size}
which might be different from the ``design size''\index{font design size}
of the font as stored in the \.{.tfm} file.
In the short format, the font specification is given in the same order
as in the long format.
Our internal representation of a font just stores the font name
because in the long format we add the font name as a comment to glyph
nodes.
@<common variables@>=
char **hfont_name; /* dynamically allocated array of font names */
@
@<hint basic types@>=
#define MAX_FONT_PARAMS 11
@
@<initialize definitions@>=
ALLOCATE(hfont_name,max_ref[font_kind]+1,char *);
@
\readcode
@s FONT symbol
@s fref symbol
@s font_param_list symbol
@s font_param symbol
@s font_head symbol
@<symbols@>=
%token FONT "font"
%type <info> font font_head
@
@<scanning rules@>=
::@=font@> :< return FONT; >:
@
Note that we set the definition bit early because the definition of font |f|
might involve glyphs that reference font |f| (or other fonts).
@<parsing rules@>=@/
font: font_head font_param_list;
font_head: string dimension UNSIGNED UNSIGNED @/
{uint8_t f=$<u>@&0; SET_DBIT(f,font_kind); @+hfont_name[f]=strdup($1); $$=hput_font_head(f,hfont_name[f],$2,$3,$4);};
font_head: string dimension UNSIGNED @/
{uint8_t f=$<u>@&0; SET_DBIT(f,font_kind); @+hfont_name[f]=strdup($1); $$=hput_font_head(f,hfont_name[f],$2,-1,$3);};
font_param_list: glue_node disc_node @+ | font_param_list font_param ;
font_param: @/
start PENALTY fref penalty END { hput_tags($1,hput_int($4));}
| start KERN fref kern END { hput_tags($1,hput_kern(&($4)));}
| start LIGATURE fref ligature END { hput_tags($1,hput_ligature(&($4)));}
| start DISC fref disc END { hput_tags($1,hput_disc(&($4)));}
| start GLUE fref glue END { hput_tags($1,hput_glue(&($4)));}
| start LANGUAGE fref string END { hput_string($4);hput_tags($1,TAG(language_kind,0));}
| start RULE fref rule END { hput_tags($1,hput_rule(&($4)));}
| start IMAGE fref image END { hput_tags($1,TAG(image_kind,$4));};
fref: ref @/{ RNG("Font parameter",$1,0,MAX_FONT_PARAMS); };
@
\goodbreak
\vbox{\getcode\vskip -\baselineskip\writecode}
@<get functions@>=
static void hget_font_params(void)
{ Disc h;
hget_glue_node();
hget_disc_node(&(h));@+ hwrite_disc_node(&h);
DBG(DBGDEF,"Start font parameters\n");
while (KIND(*hpos)!=font_kind)@/
{ Ref df;
@<read the start byte |a|@>@;
df.k=KIND(a);
df.n=HGET8;
DBG(DBGDEF,"Reading font parameter %d: %s\n",df.n, definition_name[df.k]);
if (df.k!=penalty_kind && df.k!=kern_kind && df.k!=ligature_kind && @|
df.k!=disc_kind && df.k!=glue_kind && df.k!=language_kind && @| df.k!=rule_kind && df.k!=image_kind)
QUIT("Font parameter %d has invalid type %s",df.n, content_name[df.k]);
RNG("Font parameter",df.n,0,MAX_FONT_PARAMS);
hwrite_start(); @+ hwritef("%s *%d",content_name[KIND(a)],df.n);
hget_definition(df.n,a,node_pos);
hwrite_end();
@<read and check the end byte |z|@>@;
}
DBG(DBGDEF,"End font parameters\n");
}
void hget_font_def(Info i, uint8_t f)
{ char *n; @+Dimen s=0;@+uint16_t m,y;
HGET_STRING(n);@+ hwrite_string(n);@+ hfont_name[f]=strdup(n);
HGET32(s); @+ hwrite_dimension(s);
DBG(DBGDEF,"Font %s size 0x%x\n", n, s);
if (i==b000) { HGET16(m); @+RNG("Font metrics",m,3,max_section_no); }
HGET16(y); @+RNG("Font glyphs",y,3,max_section_no);
if (i==b000) hwritef(" %d",m);
hwritef(" %d",y);
hget_font_params();
DBG(DBGDEF,"End font definition\n");
}
@
\putcode
@<put functions@>=
Tag hput_font_head(uint8_t f, char *n, Dimen s, @| int m, uint16_t y)
{ Info i;
DBG(DBGDEF,"Defining font %d (%s) size 0x%x\n", f, n, s);
hput_string(n);
HPUT32(s);@+
if (m>=0) {i=b000; HPUT16(m);} else i=b001;
HPUT16(y);
return TAG(font_kind,i);
}
@
\subsection{References}\label{reference}
We have seen how to make definitions; now let's see how to
reference\index{reference} them. In the long format, we can simply
write the reference number after the keyword, like this:
``{\tt \.{<}glue *17\.{>}}''.
The asterisk\index{asterisk} is necessary to keep apart,
for example, a penalty with value 50,
written ``{\tt \.{<}penalty 50\.{>}}'',
from a penalty referencing the integer
definition number 50, written ``{\tt \.{<}penalty *50\.{>}}''.
@<hint types@>=
typedef struct { @+Kind k; @+int n; @+} Ref;
@
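The role of the asterisk can be illustrated with a few lines of C.
The real scanner is generated by \.{flex}, so the following is only a
sketch under simplifying assumptions; the type |Operand| and the function
|parse_operand| are hypothetical and do not occur in the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type: a literal value or a reference number. */
typedef struct { bool is_ref; long n; } Operand;

/* Parse the text that follows a keyword such as "penalty":
   "*50" is a reference to definition number 50,
   "50" is the literal value 50.
   Returns false if no number is found. */
static bool parse_operand(const char *s, Operand *out)
{ char *end;
  assert(s != NULL && out != NULL);
  while (*s == ' ') s++;        /* skip leading blanks */
  out->is_ref = (*s == '*');    /* the asterisk marks a reference */
  if (out->is_ref) s++;
  out->n = strtol(s, &end, 10);
  return end != s;
}
```

Applied to the text after the keyword in ``{\tt \.{<}penalty 50\.{>}}''
the sketch yields the literal value 50; applied to the text in
``{\tt \.{<}penalty *50\.{>}}'' it yields a reference to definition number 50.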
\goodbreak
\vbox{\readcode\vskip -\baselineskip\putcode}
@<parsing rules@>=
xdimen_ref: ref { REF(xdimen_kind,$1);};
param_ref: ref { REF(param_kind,$1); };
stream_ref: ref { REF_RNG(stream_kind,$1); };
content_node:
start PENALTY ref END @/{ REF(penalty_kind,$3); @+ hput_tags($1,TAG(penalty_kind,0)); }
|start KERN explicit ref END @/
{ REF(dimen_kind,$4); @+ hput_tags($1,TAG(kern_kind,($3)?b100:b000)); }
|start KERN explicit XDIMEN ref END @/
{ REF(xdimen_kind,$5); @+hput_tags($1,TAG(kern_kind,($3)?b101:b001)); }
|start GLUE ref END @/{ REF(glue_kind,$3); @+ hput_tags($1,TAG(glue_kind,0)); }
|start LIGATURE ref END @/{ REF(ligature_kind,$3); @+ hput_tags($1,TAG(ligature_kind,0)); }
|start DISC ref END @/{ REF(disc_kind,$3); @+ hput_tags($1,TAG(disc_kind,0)); }
|start RULE ref END @/{ REF(rule_kind,$3); @+ hput_tags($1,TAG(rule_kind,0)); }
|start IMAGE ref END @/{ REF(image_kind,$3);@+ hput_tags($1,TAG(image_kind,0)); }
|start LEADERS ref END @/{ REF(leaders_kind,$3); @+ hput_tags($1,TAG(leaders_kind,0)); }
|start BASELINE ref END @/{ REF(baseline_kind,$3);@+ hput_tags($1,TAG(baseline_kind,0)); }
|start LANGUAGE REFERENCE END @/{ REF(language_kind,$3);@+ hput_tags($1,hput_language($3)); };
glue_node: start GLUE ref END @/{ REF(glue_kind,$3);
if ($3==zero_skip_no) { hpos=hpos-2; $$=false;@+ }
else {hput_tags($1,TAG(glue_kind,0)); $$=true;@t\2@>@+}};
@
\getcode
@<cases to get content@>=
@t\1\kern1em@>
case TAG(penalty_kind,0): HGET_REF(penalty_kind); @+break;
case TAG(kern_kind,b000): HGET_REF(dimen_kind); @+break;
case TAG(kern_kind,b100): hwritef(" !"); @+HGET_REF(dimen_kind); @+break;
case TAG(kern_kind,b001): @| hwritef(" xdimen");@+ HGET_REF(xdimen_kind); @+break;
case TAG(kern_kind,b101): @| hwritef(" ! xdimen");@+ HGET_REF(xdimen_kind); @+break;
case TAG(ligature_kind,0): HGET_REF(ligature_kind); @+break;
case TAG(disc_kind,0): HGET_REF(disc_kind); @+break;
case TAG(glue_kind,0): HGET_REF(glue_kind); @+break;
case TAG(language_kind,b000): HGET_REF(language_kind); @+break;
case TAG(rule_kind,0): HGET_REF(rule_kind); @+break;
case TAG(image_kind,0): HGET_REF(image_kind); @+break;
case TAG(leaders_kind,0): HGET_REF(leaders_kind); @+break;
case TAG(baseline_kind,0): HGET_REF(baseline_kind); @+break;
@
@<get macros@>=
#define @[HGET_REF(K)@] {uint8_t n=HGET8;@+ REF(K,n); @+hwrite_ref(n);@+}
@
\writecode
@<write functions@>=
void hwrite_ref(int n)
{hwritef(" *%d",n);@+}
void hwrite_ref_node(Kind k, uint8_t n)
{ hwrite_start(); @+hwritef("%s",content_name[k]);@+ hwrite_ref(n); @+hwrite_end();}
@
\section{Defaults}\label{defaults}\index{default value}
Several of the predefined values found in the definition section are used
as parameters for the routines borrowed from \TeX\ to display the content
of a \HINT\ file. These values must be defined, but it is inconvenient if
the same standard definitions need to be placed in each and every \HINT\ file.
Therefore we specify in this chapter reasonable default values.
As a consequence, even a \HINT\ file without any definitions should
produce sensible results when displayed.
The definitions that have default values are integers, dimensions,
extended dimensions, glues, baselines, labels, page templates,
streams, and page ranges.
Each of these defaults has its own subsection below.
Actually the defaults for extended dimensions, baselines, and labels
are not needed by \TeX's routines, but it is nice to have default
values for the extended dimensions that represent
\.{hsize} and \.{vsize}, for a zero baseline skip, and for a label for the table
of contents.
The array |max_default| contains for each kind-value the largest number
that has a default value. The function |hset_max| is used to initialize them.
The programs \.{shrink} and \.{stretch} do not actually use the defaults,
but it would be possible to suppress definitions if the defined value
is the same as the default value.
%
We start by setting |max_default[k]=-1|, meaning no defaults,
and |max_fixed[k]=0x10000|, meaning no definitions.
The following subsections will then overwrite these values for
all kinds of definitions that have defaults.
It remains to reset |max_fixed| to $-1$ for all those kinds
that have no defaults but allow definitions.
@<take care of variables without defaults@>=
for (k=0; k<32; k++) max_default[k]=-1,max_fixed[k]=0x10000;
@/@t}$\hangindent=1em${@>max_fixed[font_kind]= max_fixed[ligature_kind]= max_fixed[disc_kind]
@|=max_fixed[language_kind]=max_fixed[rule_kind]= max_fixed[image_kind]
@|= max_fixed[leaders_kind]= max_fixed[param_kind]=max_fixed[label_kind]@|= -1;
@
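The interplay of the two limits can be summarized in two predicates.
This is a sketch of one reading of the conventions described above:
the type |Limits| and the functions |may_define| and |has_default| are
hypothetical, and the integer limits are the ones assigned in the
subsection on integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of limits, mirroring one entry of max_fixed
   and one entry of max_default. */
typedef struct { int max_fixed; int max_default; } Limits;

static const Limits no_definitions = { 0x10000, -1 }; /* the initial state */
static const Limits no_defaults    = { -1, -1 };      /* for example fonts */
static const Limits int_limits     = { 0, 22 };       /* integers */

/* Numbers up to max_fixed are fixed and can not be defined in a file. */
static bool may_define(Limits l, int n)
{ assert(n >= 0);
  return n > l.max_fixed;
}

/* Numbers up to max_default have a default value. */
static bool has_default(Limits l, int n)
{ assert(n >= 0);
  return n <= l.max_default;
}
```

In the initial state no number can be defined; for a kind without defaults
every number can be defined but none has a default; for integers, number 0
is fixed, the numbers 1 to 22 have defaults that can be redefined, and
larger numbers are available for new definitions.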
\subsection{Integers}
Integers\index{integer} are very simple objects, and it might be tempting not to
use predefined integers at all. But the \TeX\ typesetting engine,
which is used by \HINT, uses many integer parameters to fine tune
its operations. As we will see, all these integer parameters have a predefined
integer number that refers to an integer definition.
Integers and penalties\index{penalty} share the same kind-value. So a penalty node that references
one of the predefined penalties simply contains the integer number as a reference
number.
The following integer numbers are predefined.
The zero integer is fixed with integer number zero. %It is never redefined.
The default values are taken from {\tt plain.tex}.
@<default names@>=
typedef enum {@t}$\hangindent=2em${@>
zero_int_no=0,
pretolerance_no=1,
tolerance_no=2,
line_penalty_no=3,
hyphen_penalty_no=4,
ex_hyphen_penalty_no=5,
club_penalty_no=6,
widow_penalty_no=7,
display_widow_penalty_no=8,
broken_penalty_no=9,
pre_display_penalty_no=10,
post_display_penalty_no=11,
inter_line_penalty_no=12,
double_hyphen_demerits_no=13,
final_hyphen_demerits_no=14,
adj_demerits_no=15,
looseness_no=16,
time_no=17,
day_no=18,
month_no=19,
year_no=20,
hang_after_no=21,
floating_penalty_no=22
} Int_no;
#define MAX_INT_DEFAULT floating_penalty_no
@
@<define |int_defaults|@>=
max_default[int_kind]=MAX_INT_DEFAULT;
max_fixed[int_kind]=zero_int_no;
int_defaults[zero_int_no]=0;
int_defaults[pretolerance_no]=100;
int_defaults[tolerance_no]=200;
int_defaults[line_penalty_no]=10;
int_defaults[hyphen_penalty_no]=50;
int_defaults[ex_hyphen_penalty_no]=50;
int_defaults[club_penalty_no]=150;
int_defaults[widow_penalty_no]=150;
int_defaults[display_widow_penalty_no]=50;
int_defaults[broken_penalty_no]=100;
int_defaults[pre_display_penalty_no]=10000;
int_defaults[post_display_penalty_no]=0;
int_defaults[inter_line_penalty_no]=0;
int_defaults[double_hyphen_demerits_no]=10000;
int_defaults[final_hyphen_demerits_no]=5000;
int_defaults[adj_demerits_no]=10000;
int_defaults[looseness_no]=0;
int_defaults[time_no]=720;
int_defaults[day_no]=4;
int_defaults[month_no]=7;
int_defaults[year_no]=1776;
int_defaults[hang_after_no]=1;
int_defaults[floating_penalty_no]=20000;
@#
printf("int32_t int_defaults[MAX_INT_DEFAULT+1]={");
for (i=0; i<= max_default[int_kind];i++)@/
{ printf("%d",int_defaults[i]);@+
if (i<max_default[int_kind]) printf(", ");@+
}
printf("};\n\n");
@
\subsection{Dimensions}
Notice that there are default values for the two dimensions \.{hsize} and \.{vsize}.
These are the ``design sizes'' for the hint file. While it might not be possible
to display the \HINT\ file using these values of \.{hsize} and \.{vsize},
these are the author's recommendation for the best ``viewing experience''.
\noindent
@<default names@>=
typedef enum {@t}$\hangindent=2em${@>
zero_dimen_no=0,
hsize_dimen_no=1,
vsize_dimen_no=2,
line_skip_limit_no=3,
max_depth_no=4,
split_max_depth_no=5,
hang_indent_no=6,
emergency_stretch_no=7,
quad_no=8,
math_quad_no=9
} Dimen_no;
#define MAX_DIMEN_DEFAULT math_quad_no
@
@<define |dimen_defaults|@>=
max_default[dimen_kind]=MAX_DIMEN_DEFAULT;
max_fixed[dimen_kind]=zero_dimen_no;@#
dimen_defaults[zero_dimen_no]=0;
dimen_defaults[hsize_dimen_no]=(Dimen)(6.5*72.27*ONE);
dimen_defaults[vsize_dimen_no]=(Dimen)(8.9*72.27*ONE);
dimen_defaults[line_skip_limit_no]=0;
dimen_defaults[split_max_depth_no]=(Dimen)(3.5*ONE);
dimen_defaults[hang_indent_no]=0;
dimen_defaults[emergency_stretch_no]=0;
dimen_defaults[quad_no]=10*ONE;
dimen_defaults[math_quad_no]=10*ONE;@#
printf("Dimen dimen_defaults[MAX_DIMEN_DEFAULT+1]={");
for (i=0; i<= max_default[dimen_kind];i++)
{ printf("0x%x",dimen_defaults[i]);
if (i<max_default[dimen_kind]) printf(", ");
}
printf("};\n\n");
@
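The arithmetic behind the default sizes can be checked with a short sketch.
It assumes that |Dimen| is a 32 bit integer and that |ONE| is $2^{16}$,
the number of scaled points in one point as in \TeX; both are assumptions
made for this illustration, and the helper functions are hypothetical.
The sketch also shows why the parentheses after the cast are needed.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Dimen;       /* assumption: a 32 bit scaled point value */
#define ONE ((Dimen)0x10000) /* assumption: 1pt is 2^16 scaled points */

/* Convert inches to scaled points using 72.27pt per inch. */
static Dimen inches_to_dimen(double in)
{ assert(in >= 0.0);
  return (Dimen)(in * 72.27 * ONE);
}

/* The cast binds tighter than the multiplication: with parentheses
   the product 3.5*ONE is computed first and then truncated. */
static Dimen with_parens(void)
{ return (Dimen)(3.5 * ONE);
}

/* Without parentheses 3.5 is truncated to 3 before multiplying. */
static Dimen without_parens(void)
{ return (Dimen)3.5 * ONE;
}
```

Under these assumptions the default \.{hsize} of 6.5 inches is 469.755pt,
and |with_parens| returns 3.5pt whereas |without_parens| returns only 3pt.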
\subsection{Extended Dimensions}
Extended dimensions\index{extended dimension} can be used in a variety of nodes, for example
kern\index{kern} and box\index{box} nodes.
We define three fixed extended dimensions: zero, hsize, and vsize.
In contrast to the \.{hsize} and \.{vsize} dimensions defined in the previous
section, the extended dimensions defined here are linear functions that always evaluate
to the current horizontal and vertical size in the viewer.
@<default names@>=
typedef enum {
zero_xdimen_no=0,
hsize_xdimen_no=1,
vsize_xdimen_no=2
} Xdimen_no;
#define MAX_XDIMEN_DEFAULT vsize_xdimen_no
@
@<define |xdimen_defaults|@>=
max_default[xdimen_kind]=MAX_XDIMEN_DEFAULT;
max_fixed[xdimen_kind]=vsize_xdimen_no;@#
printf("Xdimen xdimen_defaults[MAX_XDIMEN_DEFAULT+1]={"@/
"{0x0, 0.0, 0.0}, {0x0, 1.0, 0.0}, {0x0, 0.0, 1.0}"@/
"};\n\n");
@
\subsection{Glue}
There are predefined glue\index{glue} numbers that correspond to the skip parameters of \TeX.
The default values are taken from {\tt plain.tex}.
@<default names@>=
typedef enum {@t}$\hangindent=2em${@>
zero_skip_no=0,
fil_skip_no=1,
fill_skip_no=2,
line_skip_no=3,
baseline_skip_no=4,
above_display_skip_no=5,
below_display_skip_no=6,
above_display_short_skip_no=7,
below_display_short_skip_no=8,
left_skip_no=9,
right_skip_no=10,
top_skip_no=11,
split_top_skip_no=12,
tab_skip_no=13,
par_fill_skip_no=14
} Glue_no;
#define MAX_GLUE_DEFAULT par_fill_skip_no
@
@<define |glue_defaults|@>=
max_default[glue_kind]=MAX_GLUE_DEFAULT;
max_fixed[glue_kind]=fill_skip_no;
glue_defaults[fil_skip_no].p.f=1.0;
glue_defaults[fil_skip_no].p.o=fil_o;
glue_defaults[fill_skip_no].p.f=1.0;
glue_defaults[fill_skip_no].p.o=fill_o;@#
glue_defaults[line_skip_no].w.w=1*ONE;
glue_defaults[baseline_skip_no].w.w=12*ONE;
glue_defaults[above_display_skip_no].w.w=12*ONE;
glue_defaults[above_display_skip_no].p.f=3.0;
glue_defaults[above_display_skip_no].p.o=normal_o;
glue_defaults[above_display_skip_no].m.f=9.0;
glue_defaults[above_display_skip_no].m.o=normal_o;
glue_defaults[below_display_skip_no].w.w=12*ONE;
glue_defaults[below_display_skip_no].p.f=3.0;
glue_defaults[below_display_skip_no].p.o=normal_o;
glue_defaults[below_display_skip_no].m.f=9.0;
glue_defaults[below_display_skip_no].m.o=normal_o;
glue_defaults[above_display_short_skip_no].p.f=3.0;
glue_defaults[above_display_short_skip_no].p.o=normal_o;
glue_defaults[below_display_short_skip_no].w.w=7*ONE;
glue_defaults[below_display_short_skip_no].p.f=3.0;
glue_defaults[below_display_short_skip_no].p.o=normal_o;
glue_defaults[below_display_short_skip_no].m.f=4.0;
glue_defaults[below_display_short_skip_no].m.o=normal_o;
glue_defaults[top_skip_no].w.w=10*ONE;
glue_defaults[split_top_skip_no].w.w=(Dimen)(8.5*ONE);
glue_defaults[par_fill_skip_no].p.f=1.0;
glue_defaults[par_fill_skip_no].p.o=fil_o;
#define @[PRINT_GLUE(G)@] \
@[printf("{{0x%x, %f, %f},{%f, %d},{%f, %d}}",\
G.w.w, G.w.h, G.w.v, G.p.f, G.p.o, G.m.f,G.m.o)@]@#
printf("Glue glue_defaults[MAX_GLUE_DEFAULT+1]={\n");
for (i=0; i<= max_default[glue_kind];i++)@/
{ PRINT_GLUE(glue_defaults[i]); @+
if (i<max_default[glue_kind]) printf(",\n");
}
printf("};\n\n");
@
We fix the glue definition with number zero to be the ``zero glue'': a
glue with width zero and zero stretchability and shrinkability. Here
is the reason: In the short format, the info bits of a glue node
indicate which components of a glue are nonzero. Therefore the zero
glue should have an info value of zero---which on the other hand is
reserved for a reference to a glue definition. Hence, the best way to
represent a zero glue is as a predefined glue.
\subsection{Baseline Skips}
The zero baseline\index{baseline skip}, which inserts no baseline skip, is predefined.
@<default names@>=
typedef enum {@+
zero_baseline_no=0@+
} Baseline_no;
#define MAX_BASELINE_DEFAULT zero_baseline_no
@
@<define |baseline_defaults|@>=
max_default[baseline_kind]=MAX_BASELINE_DEFAULT;
max_fixed[baseline_kind]=zero_baseline_no;@#
{ Baseline z={{{0}}};
printf("Baseline baseline_defaults[MAX_BASELINE_DEFAULT+1]={{");
PRINT_GLUE(z.bs); @+printf(", "); @+PRINT_GLUE(z.ls); printf(", 0x%x}};\n\n",z.lsl);
}
@
\subsection{Labels}
The zero label\index{label} is predefined. It should point to the
``home'' position of the document which should be the position
where a user can start reading or navigating the document.
For a short document this is usually the start of the document,
and hence, the default is the first position of the content section.
For a larger document, the home position could point to the
table of contents where a reader will find links to other parts
of the document.
@<default names@>=
typedef enum {@+
zero_label_no=0@+
} Label_no;
#define MAX_LABEL_DEFAULT zero_label_no
@
@<define |label_defaults|@>=
max_default[label_kind]=MAX_LABEL_DEFAULT;
printf("Label label_defaults[MAX_LABEL_DEFAULT+1]="@|"{{0,0,LABEL_TOP,true,0,0}};\n\n");
@
\subsection{Streams}
The zero stream\index{stream} is predefined for the main content.
@<default names@>=
typedef enum {@+
zero_stream_no=0@+
} Stream_no;
#define MAX_STREAM_DEFAULT zero_stream_no
@
@<define stream defaults@>=
max_default[stream_kind]=MAX_STREAM_DEFAULT;
max_fixed[stream_kind]=zero_stream_no;
@
\subsection{Page Templates}
The zero page template\index{template} is a predefined, built-in page template.
@<default names@>=
typedef enum {@+
zero_page_no=0@+
} Page_no;
#define MAX_PAGE_DEFAULT zero_page_no
@
@<define page defaults@>=
max_default[page_kind]=MAX_PAGE_DEFAULT;
max_fixed[page_kind]=zero_page_no;
@
\subsection{Page Ranges}
The page\index{page range} range for the zero page template is
the entire content section.
@<default names@>=
typedef enum {@+
zero_range_no=0@+
} Range_no;
#define MAX_RANGE_DEFAULT zero_range_no
@
@<define range defaults@>=
max_default[range_kind]=MAX_RANGE_DEFAULT;
max_fixed[range_kind]=zero_range_no;
@
\subsection{Lists, Texts, and Parameters}
@<default names@>=
typedef enum {@+
empty_list_no=0@+
} List_no;
#define MAX_LIST_DEFAULT empty_list_no
@
@<define range defaults@>=
max_default[list_kind]=MAX_LIST_DEFAULT;
max_fixed[list_kind]=empty_list_no;
max_default[param_kind]=MAX_LIST_DEFAULT;
max_fixed[param_kind]=empty_list_no;
@
\subsection{Colors}\label{colordefault}
@<default names@>=
typedef enum {@+
zero_color_no=0, link_color_no=1@+
} Color_no;
#define MAX_COLOR_DEFAULT link_color_no
@
The default colors for day mode are
black on white, red on white, and green on white;
the links in day mode are blue.
In night mode the background becomes black, the normal text white,
and the other colors become slightly lighter.
We store each color as a byte array in RGBA format;
we combine a pair of colors for foreground and background in an array;
we combine three such pairs for normal, mark, and focus text in an array;
and we define a color set as two such triples, one for day and one for night mode.
@<define |color_defaults|@>=
max_default[color_kind]=MAX_COLOR_DEFAULT;
max_fixed[color_kind]=-1;
printf("ColorSet color_defaults[MAX_COLOR_DEFAULT+1]=\n"@|
"{{0x000000FF, 0xFFFFFF00,\n" /* black on white */@|
" 0xEE0000FF, 0xFFFFFF00,\n" /* dark red */
" 0x00EE00FF, 0xFFFFFF00,\n" /* dark green */@|
" 0xFFFFFFFF, 0x00000000," /* white on black */
" 0xFF1111FF, 0x00000000,\n" /* light red */
" 0x11FF11FF, 0x00000000},\n" /* light green*/@|
" {0x0000EEFF, 0xFFFFFF00,\n" /* dark blue on white */
" 0xEE0000FF, 0xFFFFFF00,\n" /* dark red on white */
" 0x00EE00FF, 0xFFFFFF00,\n" /* dark green on white */@|
" 0x1111FFFF, 0x00000000,\n" /* light blue on black */
" 0xFF1111FF, 0x00000000,\n" /* light red on black */
" 0x11FF11FF, 0x00000000\n" /* light green on black */
"}};\n\n");
@
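The hexadecimal constants above can be taken apart as follows.
The sketch assumes that the most significant byte of a constant is the
red component and the least significant byte the alpha component;
the type |Rgba| and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Rgba;

/* Split a 32 bit color value into its components, assuming that the
   most significant byte is red and the least significant byte is alpha. */
static Rgba rgba_unpack(uint32_t c)
{ Rgba x;
  x.r = (uint8_t)(c >> 24);
  x.g = (uint8_t)(c >> 16);
  x.b = (uint8_t)(c >> 8);
  x.a = (uint8_t)c;
  return x;
}

/* Combine four components, each in the range 0 to 0xFF, into one value. */
static uint32_t rgba_pack(unsigned r, unsigned g, unsigned b, unsigned a)
{ assert(r <= 0xFF && g <= 0xFF && b <= 0xFF && a <= 0xFF);
  return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | (uint32_t)a;
}
```

Under this assumption the dark red foreground |0xEE0000FF| is fully opaque,
while the white background |0xFFFFFF00| has an alpha value of zero.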
\section{Content Section}
The content section\index{content section} is just a list of nodes.
Within the \.{shrink} program,
reading a node in long format will trigger writing the node in short format.
Similarly within the \.{stretch} program, reading a node
in short format will cause writing it in long format. As a consequence,
the main task of writing the content section in long format is accomplished
by calling |get_content|, and writing it in the short format
is accomplished by parsing the |content_list|.
%\readcode
\codesection{\redsymbol}{Reading the Long Format}\redindex{1}{6}{Content Section}
\label{content}%
@s CONTENT symbol
@<symbols@>=
%token CONTENT "content"
@
@<scanning rules@>=
::@=content@> :< return CONTENT; >:
@
@<parsing rules@>=
content_section: START CONTENT { hput_content_start(); } @| content_list END @|
{ hput_content_end(); hput_range_defs(); hput_label_defs(); };
@
%\writecode
\codesection{\wrtsymbol}{Writing the Long Format}\wrtindex{1}{6}{Content Section}
@<write functions@>=
void hwrite_content_section(void)
{ section_no=2;
hwritef("<content");
hsort_ranges();
hsort_labels();
hget_content_section();
hwritef("\n>\n");
}
@
%\getcode
\codesection{\getsymbol}{Reading the Short Format}\getindex{1}{6}{Content Section}
@<get functions@>=
void hget_content_section()
{ DBG(DBGBASIC|DBGDIR,"Content\n");
hget_section(2);
hwrite_range();
hwrite_label();
while(hpos<hend)
hget_content_node();
}
@
%\putcode
\codesection{\putsymbol}{Writing the Short Format}\putindex{1}{6}{Content Section}
@<put functions@>=
void hput_content_start(void)
{ DBG(DBGDIR,"Content Section\n");
section_no=2;
hpos0=hpos=hstart=dir[2].buffer;
hend=hstart+dir[2].bsize;
}
void hput_content_end(void)
{
dir[2].size=hpos-hstart; /* Updating the directory entry */
DBG(DBGDIR,"End Content Section, size=0x%x\n", dir[2].size);
}
@
\section{Processing the Command Line}
The following code explains the command line\index{command line}
parameters and options\index{option}\index{debugging}.
It tells us what to expect in the rest of this section.
{\def\SP{\hskip .5em}
@<explain usage@>=
fprintf(stdout,
"Usage: %s [OPTION]... FILENAME%s\n",prog_name, in_ext);@/
fprintf(stdout,DESCRIPTION);
fprintf(stdout,
"\nOptions:\n"@/
"\t --help \t display this message\n"@/
"\t --version\t display the HINT version\n"@/
"\t -l \t redirect stderr to a log file\n"@/
#if defined (STRETCH) || defined (SHRINK)
"\t -o FILE\t specify an output file name\n"@/
#endif
#if defined (STRETCH)
"\t -a \t write auxiliary files\n"@/
"\t -g \t do not use localized names (implies -a)\n"@/
"\t -f \t force overwriting existing auxiliary files\n"@/
"\t -u \t enable writing utf8 character codes\n"@/
"\t -x \t enable writing hexadecimal character codes\n"@/
#elif defined (SHRINK)
"\t -a \t use only localized names\n"@/
"\t -g \t do not use localized names\n"@/
"\t -c \t enable compression\n"@/
#endif
);
#ifdef DEBUG
fprintf(stdout,"\t -d XXXX \t set debug flag to hexadecimal value XXXX.\n"
"\t\t\t OR together these values:\n");@/
fprintf(stdout,"\t\t\t XX=%03X basic debugging\n", DBGBASIC);@/
fprintf(stdout,"\t\t\t XX=%03X tag debugging\n", DBGTAGS);@/
fprintf(stdout,"\t\t\t XX=%03X node debugging\n",DBGNODE);@/
fprintf(stdout,"\t\t\t XX=%03X definition debugging\n", DBGDEF);@/
fprintf(stdout,"\t\t\t XX=%03X directory debugging\n", DBGDIR);@/
fprintf(stdout,"\t\t\t XX=%03X range debugging\n",DBGRANGE);@/
fprintf(stdout,"\t\t\t XX=%03X float debugging\n", DBGFLOAT);@/
fprintf(stdout,"\t\t\t XX=%03X compression debugging\n", DBGCOMPRESS);@/
fprintf(stdout,"\t\t\t XX=%03X buffer debugging\n", DBGBUFFER);@/
fprintf(stdout,"\t\t\t XX=%03X flex debugging\n", DBGFLEX);@/
fprintf(stdout,"\t\t\t XX=%03X bison debugging\n", DBGBISON);@/
fprintf(stdout,"\t\t\t XX=%03X TeX debugging\n", DBGTEX);@/
fprintf(stdout,"\t\t\t XX=%03X Page debugging\n", DBGPAGE);@/
fprintf(stdout,"\t\t\t XX=%03X Font debugging\n", DBGFONT);@/
fprintf(stdout,"\t\t\t XX=%03X Render debugging\n", DBGRENDER);@/
fprintf(stdout,"\t\t\t XX=%03X Label debugging\n", DBGLABEL);@/
#endif
@
}
We define constants for different debug flags.
@<debug constants@>=
#define DBGNONE 0x0
#define DBGBASIC 0x1
#define DBGTAGS 0x2
#define DBGNODE 0x4
#define DBGDEF 0x8
#define DBGDIR 0x10
#define DBGRANGE 0x20
#define DBGFLOAT 0x40
#define DBGCOMPRESS 0x80
#define DBGBUFFER 0x100
#define DBGFLEX 0x200
#define DBGBISON 0x400
#define DBGTEX 0x800
#define DBGPAGE 0x1000
#define DBGFONT 0x2000
#define DBGRENDER 0x4000
#define DBGLABEL 0x8000
@
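Because each flag occupies its own bit, several kinds of debugging output
can be requested at once. The following sketch shows how the argument of
the \.{-d} option combines flags; the functions |parse_debugflags| and
|debug_enabled| are hypothetical, and the test with a bitwise and is an
assumption about how the flags are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DBGDEF 0x8
#define DBGDIR 0x10

/* Parse the hexadecimal argument of the -d option. */
static unsigned int parse_debugflags(const char *arg)
{ assert(arg != NULL);
  return (unsigned int)strtol(arg, NULL, 16);
}

/* True if at least one of the bits in mask is set in flags. */
static bool debug_enabled(unsigned int flags, unsigned int mask)
{ return (flags & mask) != 0;
}
```

For example the argument \.{18} is the sum of |0x8| and |0x10| and therefore
selects definition debugging together with directory debugging.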
Next we define common variables that are
needed in all three programs defined here.
@<common variables@>=
unsigned int debugflags=DBGNONE;
int option_utf8=false;
int option_hex=false;
int option_force=false;
int option_global=false;
int option_aux=false;
int option_compress=false;
char *stem_name=NULL;
int stem_length=0;
@
The variable |stem_name| contains the name of the input file
not including the extension. The space allocated for it
is large enough to append an extension with up to five characters.
It can be used with the extension {\tt .log} for the log file,
with {\tt .hint} or {\tt .hnt} for the output file,
and with {\tt .abs} or {\tt .rel} when writing or reading the auxiliary sections.
The {\tt stretch} program will overwrite the |stem_name|
using the name of the output file if it is set with the {\tt -o}
option.
Next are the variables that are local in the |main| program.
@<local variables in |main|@>=
char *prog_name;
char *in_ext;
char *out_ext;
int option_log=false;
#ifndef SKIP
char *file_name=NULL;
int file_name_length=0;
#endif
@
Processing the command line looks for options and then sets the
input file name\index{file name}. For compatibility with
GNU standards, the long options {\tt --help} and {\tt --version}
are supported in addition to the short options.
@<process the command line@>=
debugflags=DBGBASIC;
prog_name=argv[0];
if (argc < 2)
{ fprintf(stderr,
"%s: no input file given\n"
"Try '%s --help' for more information\n",prog_name, prog_name);
exit(1);
}
argv++; /* skip the program name */
while (*argv!=NULL)
{ if ((*argv)[0]=='-')
{ char option=(*argv)[1];
switch(option)
{ case '-':
if (strcmp(*argv,"--version")==0)
{ fprintf(stderr,"%s version " HINT_VERSION_STRING "\n",prog_name);
exit(0);
}
else if (strcmp(*argv,"--help")==0)
{ @<explain usage@>@;
fprintf(stdout,"\nFor further information and reporting bugs see https://hint.userweb.mwn.de/\n");
exit(0);
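}
else /* an unknown long option must not fall through to the next case */
{ fprintf(stderr,
"%s: unrecognized option '%s'\n"
"Try '%s --help' for more information\n",prog_name,*argv,prog_name);
exit(1);
}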
case 'l': option_log=true; @+break;
#if defined (STRETCH) || defined (SHRINK)
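case 'o': argv++;
if (*argv==NULL) /* the file name is missing */
{ fprintf(stderr,
"%s: option -o expects an argument\n"
"Try '%s --help' for more information\n",prog_name, prog_name);
exit(1);
}
file_name_length=(int)strlen(*argv);
ALLOCATE(file_name,file_name_length+6,char); /*plus extension*/
strcpy(file_name,*argv);@+ break;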
case 'g': option_global=option_aux=true; @+break;
case 'a': option_aux=true; @+break;
#endif
#if defined (STRETCH)
case 'u': option_utf8=true;@+break;
case 'x': option_hex=true;@+break;
case 'f': option_force=true; @+break;
#elif defined (SHRINK)
case 'c': option_compress=true; @+break;
#endif
case 'd': @/
argv++; if (*argv==NULL)
{ fprintf(stderr,
"%s: option -d expects an argument\n"
"Try '%s --help' for more information\n",prog_name, prog_name);
exit(1);
}
debugflags=strtol(*argv,NULL,16);
break;
default:
{ fprintf(stderr,
"%s: unrecognized option '%s'\n"
"Try '%s --help' for more information\n",prog_name,*argv,prog_name);
exit(1);
}
}
}
else /* the input file name */
{ int path_length=(int)strlen(*argv);
int ext_length=(int)strlen(in_ext);
ALLOCATE(hin_name,path_length+ext_length+1,char);
strcpy(hin_name,*argv);
if (path_length<ext_length
|| strncmp(hin_name+path_length-ext_length,in_ext,ext_length)!=0)
{ strcat(hin_name,in_ext);
path_length+=ext_length;
}
stem_length=path_length-ext_length;
ALLOCATE(stem_name,stem_length+6,char);
strncpy(stem_name,hin_name,stem_length);
stem_name[stem_length]=0;
if (*(argv+1)!=NULL)
{ fprintf(stderr,
"%s: extra argument after input file name: '%s'\n"
"Try '%s --help' for more information\n",prog_name,*(argv+1),prog_name);
exit(1);
}
}
argv++;
}
if (hin_name==NULL)
{ fprintf(stderr,
"%s: missing input file name\n"
"Try '%s --help' for more information\n",prog_name,prog_name);
exit(1);
}
@
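The treatment of the input file name can be tried in isolation.
The following sketch is an illustration under simplifying assumptions:
plain |malloc| replaces the |ALLOCATE| macro, and the function name
|ensure_extension| is hypothetical. It appends the extension only if the
argument does not already end with it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of arg that ends
   with ext. The extension is appended only if it is missing.
   The caller is responsible for freeing the result. */
char *ensure_extension(const char *arg, const char *ext)
{ size_t n = strlen(arg), e = strlen(ext);
  char *name = malloc(n + e + 1);
  if (name == NULL) return NULL;
  strcpy(name, arg);
  if (!(n >= e && strcmp(name + n - e, ext) == 0))
    strcat(name, ext);
  return name;
}
```

The length of the stem is then the length of the result minus the length
of the extension; for the arguments {\tt doc} and {\tt doc.hint} with the
extension {\tt .hint}, both yield a stem of three characters.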
After the command line has been processed, three file streams need to be opened:
the input file |hin|\index{input file}, the output file |hout|\index{output file},
and the log file |hlog|\index{log file}, which receives error messages and debugging output.
For technical reasons, the scanner\index{scanning} generated by \.{flex} needs
an input file |yyin|\index{input file} which is set to |hin|
and an output file |yyout| (which is not used).
@<common variables@>=
FILE *hin=NULL, *hout=NULL, *hlog=NULL;
@
The log file is opened first because
this is the place where error messages\index{error message}
should go while the other files are opened.
It inherits its name from the input file name.
@<open the log file@> =
if (option_log)
{
strcat(stem_name,".log");
hlog=freopen(stem_name,"w",stderr);
if (hlog==NULL)
{ fprintf(stderr,"Unable to open logfile %s",stem_name);
hlog=stderr;
}
stem_name[stem_length]=0;
}
else
hlog=stderr;
@
Once we have established logging, we can try to open the other files.
@<open the input file@>=
hin=fopen(hin_name,"rb");
if (hin==NULL) QUIT("Unable to open input file %s",hin_name);
@
@<open the output file@>=
if (file_name!=NULL)
{ int ext_length=(int)strlen(out_ext);
if (file_name_length<=ext_length
|| strncmp(file_name+file_name_length-ext_length,out_ext,ext_length)!=0)
{ strcat(file_name,out_ext); file_name_length+=ext_length; }
}
else
{ file_name_length=stem_length+(int)strlen(out_ext);
ALLOCATE(file_name,file_name_length+1,char);
strcpy(file_name,stem_name);@+
strcpy(file_name+stem_length,out_ext);
}
{ char *aux_name=file_name;
@<make sure the path in |aux_name| exists@>@;
aux_name=NULL;
}
hout=fopen(file_name,"wb");
if (hout==NULL) QUIT("Unable to open output file %s",file_name);
@
The {\tt stretch} program will replace the |stem_name| using the stem of the
output file.
@<determine the |stem_name| from the output |file_name|@>=
stem_length=file_name_length-(int)strlen(out_ext);
ALLOCATE(stem_name,stem_length+6,char);
strncpy(stem_name,file_name,stem_length);
stem_name[stem_length]=0;
@
At the very end, we will close the files again.
@<close the input file@>=
if (hin_name!=NULL) free(hin_name);
if (hin!=NULL) fclose(hin);
@
@<close the output file@>=
if (file_name!=NULL) free(file_name);
if (hout!=NULL) fclose(hout);
@
@<close the log file@>=
if (hlog!=NULL) fclose(hlog);
if (stem_name!=NULL) free(stem_name);
@
\section{Error Handling and Debugging}\label{error_section}
There is no good program without good error handling\index{error message}\index{debugging}.
To print messages\index{message} or indicate errors, I define the following macros:
\index{MESSAGE+\.{MESSAGE}}\index{QUIT+\.{QUIT}}
@(error.h@>=
#ifndef _ERROR_H
#define _ERROR_H
#include <stdlib.h>
#include <stdio.h>
extern FILE *hlog;
extern uint8_t *hpos, *hstart;
#ifndef LOG_PREFIX
#define LOG_PREFIX "HINT "
#endif
#define @[LOG(...)@] @[(fprintf(hlog,LOG_PREFIX __VA_ARGS__),fflush(hlog))@]
#define @[MESSAGE(...)@] @[(fprintf(hlog,LOG_PREFIX __VA_ARGS__),fflush(hlog))@]
#define @[QUIT(...)@] (MESSAGE("ERROR: " __VA_ARGS__),fprintf(hlog,"\n"),exit(1))
#endif
@
The amount of debugging\index{debugging} depends on the debugging flags.
For portability, we first define the output specifier for pointer differences (type |ptrdiff_t|) such as |hpos-hstart|.
\index{DBG+\.{DBG}}\index{SIZE F+\.{SIZE\_F}}\index{DBGTAG+\.{DBGTAG}}
\index{RNG+\.{RNG}}\index{TAGERR+\.{TAGERR}}
@<debug macros@>=
#ifdef WIN32
#define SIZE_F "0x%tx"
#else
#define SIZE_F "0x%tx"
#endif
#ifdef DEBUG
#define @[DBG(FLAGS,...)@] ((debugflags & (FLAGS))?LOG(__VA_ARGS__):0)
#else
#define @[DBG(FLAGS,...)@] (void)0
#endif
#define @[DBGTAG(A,P)@] @[DBG(DBGTAGS,@["tag [%s,%d] at " SIZE_F "\n"@],@|NAME(A),INFO(A),(P)-hstart)@]
#define @[RNG(S,N,A,Z)@] @/\
if ((int)(N)<(int)(A)||(int)(N)>(int)(Z)) QUIT(S@, " %d out of range [%d - %d]",N,A,Z)
#define @[TAGERR(A)@] @[QUIT(@["Unknown tag [%s,%d] at " SIZE_F "\n"@],NAME(A),INFO(A),hpos-hstart)@]
@
The \.{bison} generated parser will need a function |yyerror| for
error reporting. We can define it now:
@<parsing functions@>=
extern int yylineno;
int yyerror(const char *msg)
{ QUIT(" in line %d %s",yylineno,msg);
return 0;
}
@
To enable the generation of debugging code, \.{bison} also needs the following:
@<enable bison debugging@>=
#ifdef DEBUG
#define YYDEBUG 1
extern int yydebug;
#else
#define YYDEBUG 0
#endif
@
\appendix
\section{Traversing Short Format Files}\label{fastforward}
For applications like searching or repositioning a file after reloading
a possibly changed version of a file, it is useful to have a fast way
of getting from one content node to the next.
For many nodes, it is possible to know the size of the
node from the tag. So the fastest way to get to the next node
is to look up the node size in a table.
Other important nodes, for example hbox, vbox, or par nodes, end with a
list node and it is possible to know the size of the node up to the final
list. With that knowledge it is possible to skip the initial
part of the node, then skip the list, and finally skip the tag byte.
The size of the initial part can be stored in the same node size table
using negated values. What works for lists,
of course, will work for other kinds of nodes as well.
So we use the lowest two bits of the negative values in the size table
to store the number of embedded nodes that follow after the initial part.
To combine the number of leading bytes and the number of trailing nodes
into a single number, we use the macro |NODE_SIZE|.
We can get back both values using the
macros |NODE_HEAD| and |NODE_TAIL|.
@<hint macros@>=
#define @[NODE_SIZE(H,T)@] ((T)==0?(H)+2:-4*((H)+1)+((T)-1))
#define @[NODE_HEAD(N)@] ((N)>0?(N)-2:-((N)>>2)-1)
#define @[NODE_TAIL(N)@] ((N)<0?((N)&0x3)+1:0)
@
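A small check illustrates the encoding. The sketch below repeats the three
macros without the formatting controls; the function names are hypothetical
and serve only this illustration. It assumes that shifting a negative |int|
to the right is an arithmetic shift, which the C standard leaves to the
implementation but which common compilers provide.

```c
#include <assert.h>

#define NODE_SIZE(H,T) ((T)==0?(H)+2:-4*((H)+1)+((T)-1))
#define NODE_HEAD(N) ((N)>0?(N)-2:-((N)>>2)-1)
#define NODE_TAIL(N) ((N)<0?((N)&0x3)+1:0)

/* Returns 1 if encoding h leading bytes and t trailing nodes gives a value
   that fits into a signed char and decodes to h and t again. */
int node_size_round_trips(int h, int t)
{ int n = NODE_SIZE(h, t);
  return n >= -128 && n <= 127 && NODE_HEAD(n) == h && NODE_TAIL(n) == t;
}

/* Checks all combinations of 0 to 31 leading bytes and 0 to 4 trailing nodes. */
int all_node_sizes_round_trip(void)
{ int h, t;
  for (h = 0; h <= 31; h++)
    for (t = 0; t <= 4; t++)
      if (!node_size_round_trips(h, t)) return 0;
  return 1;
}
```

For example, |NODE_SIZE(2,0)| is 4, the total size of a node with two content
bytes and its two tag bytes, and |NODE_SIZE(8,1)| is $-36$, that is
$-4\cdot 9+0$: eight leading bytes followed by one embedded node.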
For list nodes neither of these methods works and these nodes can be marked
with a zero entry in the node size table.
This leads to the following code for a ``fast forward'' function
for |hpos|:
@<shared skip functions@>=
uint32_t hff_list_pos=0, hff_list_size=0;
Tag hff_tag;
void hff_hpos(void)
{ signed char i,b,n;
hff_tag=*hpos;@+
DBGTAG(hff_tag,hpos);
i= hnode_size[hff_tag];
if (i>0) {hpos=hpos+NODE_HEAD(i)+2; @+return;@+ }
else if (i<0)
{ n=NODE_TAIL(i);@+ b=NODE_HEAD(i);
hpos=hpos+1+b; /* skip initial part */
while (n>0)
{ hff_hpos(); @+n--; @+} /* skip trailing nodes */
hpos++;/* skip end byte */
return;
}
else if (hff_tag <=TAG(param_kind,7))
@<advance |hpos| over a list@>@;
TAGERR(hff_tag);
}
@
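The control flow of |hff_hpos| can be tried on a toy example.
The sketch below is not the HINT format: the tags 1 and 2, the table
|toy_size|, and the function |toy_skip| are hypothetical, lists are left out,
and positions are passed as indices instead of using the global |hpos|.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_SIZE(H,T) ((T)==0?(H)+2:-4*((H)+1)+((T)-1))
#define NODE_HEAD(N) ((N)>0?(N)-2:-((N)>>2)-1)
#define NODE_TAIL(N) ((N)<0?((N)&0x3)+1:0)

/* Toy size table (hypothetical tags, not HINT tags):
   tag 1: two content bytes, no trailing nodes;
   tag 2: one content byte followed by one embedded node. */
static const signed char toy_size[3] = { 0, NODE_SIZE(2,0), NODE_SIZE(1,1) };

/* Returns the position just after the node starting at pos,
   or pos itself if the tag has no usable table entry. */
size_t toy_skip(const uint8_t *buf, size_t pos)
{ uint8_t tag = buf[pos];
  signed char i;
  if (tag > 2) return pos;
  i = toy_size[tag];
  if (i > 0) return pos + (size_t)i;         /* whole node, both tags included */
  if (i < 0)
  { int n = NODE_TAIL(i);
    pos = pos + 1 + (size_t)NODE_HEAD(i);    /* start tag and initial part */
    while (n-- > 0) pos = toy_skip(buf, pos); /* trailing nodes */
    return pos + 1;                          /* end tag */
  }
  return pos;
}

/* A tag 2 node embedding a tag 1 node, followed by another tag 1 node. */
const uint8_t toy_buf[11] = { 2, 0xAA, 1, 0x10, 0x20, 1, 2, 1, 0x30, 0x40, 1 };
```

Starting at position 0, the first call skips the outer node of seven bytes,
and a second call starting at position 7 skips the remaining four bytes.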
We will put the |hnode_size| variable into the {\tt tables.c} file
using the following code. We add some comments and
split negative values into their components to make the result more
readable.
@<print the |hnode_size| variable@>=
printf("signed char hnode_size[0x100]= {\n");
for (i=0; i<=0xff; i++)@/
{ signed char s = hnode_size[i];
if (s>=0) printf("%d",s); else printf("-4*%d+%d",-(s>>2),s&3);
if (i<0xff) printf(","); else printf("};");
if ((i&0x7)==0x7) printf(" /* %s */\n", content_name[KIND(i)]);
}
printf("\n\n");
@
When dealing with unknown content nodes, it is convenient to know which
nodes are known and which are not. For this purpose the |content_known|
array contains one byte for each kind value; bit $i$ of such a byte is set
if the node with the corresponding kind and info value $i$ is known.
@<print the |content_known| variable@>=
for (k=0;k<32;k++)
for(i=0;i<8;i++)
if (hnode_size[TAG(k,i)]!=0)
content_known[k]|=(1<<i);
printf("uint8_t content_known[32]= {\n");
for (k=0; k<32; k++)@/
{ printf("0x%02X",content_known[k]);
if (k<31) printf(",");
else printf("};");
printf(" /* %s */\n",content_name[k]);
}
printf("\n");
@
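A lookup in such a table could look like the following sketch.
It rests on an assumption that is not stated in this section, namely
that a tag stores the kind in its upper five bits and the info value in its
lower three bits; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tag layout: kind in the upper five bits, info in the lower three. */
#define TOY_KIND(T) (((T)>>3)&0x1F)
#define TOY_INFO(T) ((T)&0x7)
#define TOY_TAG(K,I) (((K)<<3)+(I))

/* Returns 1 if the node with this tag is marked as known, 0 otherwise. */
int tag_is_known(const uint8_t known[32], uint8_t tag)
{ return (known[TOY_KIND(tag)] >> TOY_INFO(tag)) & 1;
}

/* Toy table: only kind 3 has known nodes, for the info values 1 and 2. */
const uint8_t toy_known[32] = { 0, 0, 0, 0x06 };
```

With this toy table, the tags built from kind 3 and the info values 1 or 2 are
reported as known, and all other tags as unknown.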
\subsection{Lists}\index{list}\index{text}\index{parameters}
Lists don't follow the usual schema of nodes. They have a variable size
that is stored in the node. We keep position and size in global variables
so that the list that ends a node can be conveniently located.
@<advance |hpos| over a list@>=
switch (INFO(hff_tag)&0x3){
case 0: hff_list_pos=hpos-hstart+1;hff_list_size=0; hpos=hpos+3;@+ return;
case 1: hpos++;@+ hff_list_size=HGET8;@+ hff_list_pos=hpos-hstart+1; hpos=hpos+1+hff_list_size+1+1+1;@+ return;
case 2: hpos++;@+ HGET16(hff_list_size);@+hff_list_pos=hpos-hstart+1; hpos=hpos+1+hff_list_size+1+2+1;@+ return;
case 3: hpos++;@+ HGET32(hff_list_size);@+hff_list_pos=hpos-hstart+1; hpos=hpos+1+hff_list_size+1+4+1;@+ return;
default: QUIT(@["List with unknown info [%s,%d] at " SIZE_F "\n"@],NAME(hff_tag),INFO(hff_tag),hpos-hstart);
}
@
Actually list nodes never occur as content nodes in their own right but only as subnodes of
content nodes.
Now let's consider the different kinds of nodes.
\subsection{Glyphs}\index{glyph}
We start with the glyph nodes. All glyph nodes
have a start and an end tag, one byte for the font,
and depending on the info from 1 to 4 bytes for the character code.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(glyph_kind,1)] = NODE_SIZE(1+1,0);
hnode_size[TAG(glyph_kind,2)] = NODE_SIZE(1+2,0);
hnode_size[TAG(glyph_kind,3)] = NODE_SIZE(1+3,0);
hnode_size[TAG(glyph_kind,4)] = NODE_SIZE(1+4,0);
@
\subsection{Penalties}\index{penalty}
Penalty nodes contain either a one byte reference or a one, two, or four byte number.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(penalty_kind,0)] = NODE_SIZE(1,0);
hnode_size[TAG(penalty_kind,1)] = NODE_SIZE(1,0);
hnode_size[TAG(penalty_kind,2)] = NODE_SIZE(2,0);
hnode_size[TAG(penalty_kind,3)] = NODE_SIZE(4,0);
@
\subsection{Kerns}\index{kern}
Kern nodes contain either a reference (to a dimension or to an extended dimension),
a dimension, or an extended dimension node.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(kern_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(kern_kind,b001)] = NODE_SIZE(1,0);
hnode_size[TAG(kern_kind,b010)] = NODE_SIZE(4,0);
hnode_size[TAG(kern_kind,b011)] = NODE_SIZE(0,1);
hnode_size[TAG(kern_kind,b100)] = NODE_SIZE(1,0);
hnode_size[TAG(kern_kind,b101)] = NODE_SIZE(1,0);
hnode_size[TAG(kern_kind,b110)] = NODE_SIZE(4,0);
hnode_size[TAG(kern_kind,b111)] = NODE_SIZE(0,1);
@
\subsection{Extended Dimensions}\index{extended dimension}
Extended dimensions contain either one, two, or three 4 byte values depending
on the info bits.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(xdimen_kind,b100)] = NODE_SIZE(4,0);
hnode_size[TAG(xdimen_kind,b010)] = NODE_SIZE(4,0);
hnode_size[TAG(xdimen_kind,b001)] = NODE_SIZE(4,0);
hnode_size[TAG(xdimen_kind,b110)] = NODE_SIZE(4+4,0);
hnode_size[TAG(xdimen_kind,b101)] = NODE_SIZE(4+4,0);
hnode_size[TAG(xdimen_kind,b011)] = NODE_SIZE(4+4,0);
hnode_size[TAG(xdimen_kind,b111)] = NODE_SIZE(4+4+4,0);
@
\subsection{Language}\index{language}
Language nodes either code the language in the info value or they contain
a reference byte.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(language_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(language_kind,1)] = NODE_SIZE(0,0);
hnode_size[TAG(language_kind,2)] = NODE_SIZE(0,0);
hnode_size[TAG(language_kind,3)] = NODE_SIZE(0,0);
hnode_size[TAG(language_kind,4)] = NODE_SIZE(0,0);
hnode_size[TAG(language_kind,5)] = NODE_SIZE(0,0);
hnode_size[TAG(language_kind,6)] = NODE_SIZE(0,0);
hnode_size[TAG(language_kind,7)] = NODE_SIZE(0,0);
@
\subsection{Rules}\index{rule}
Rules usually contain a reference, otherwise
they contain either one, two, or three 4 byte values depending
on the info bits.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(rule_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(rule_kind,b100)] = NODE_SIZE(4,0);
hnode_size[TAG(rule_kind,b010)] = NODE_SIZE(4,0);
hnode_size[TAG(rule_kind,b001)] = NODE_SIZE(4,0);
hnode_size[TAG(rule_kind,b110)] = NODE_SIZE(4+4,0);
hnode_size[TAG(rule_kind,b101)] = NODE_SIZE(4+4,0);
hnode_size[TAG(rule_kind,b011)] = NODE_SIZE(4+4,0);
hnode_size[TAG(rule_kind,b111)] = NODE_SIZE(4+4+4,0);
@
\subsection{Glue}\index{glue}
Glues usually contain a reference; otherwise
they contain one, two, or three 4 byte values depending
on the info bits, or two 4 byte values followed
by an extended dimension node.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(glue_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(glue_kind,b100)] = NODE_SIZE(4,0);
hnode_size[TAG(glue_kind,b010)] = NODE_SIZE(4,0);
hnode_size[TAG(glue_kind,b001)] = NODE_SIZE(4,0);
hnode_size[TAG(glue_kind,b110)] = NODE_SIZE(4+4,0);
hnode_size[TAG(glue_kind,b101)] = NODE_SIZE(4+4,0);
hnode_size[TAG(glue_kind,b011)] = NODE_SIZE(4+4,0);
hnode_size[TAG(glue_kind,b111)] = NODE_SIZE(4+4,1);
@
\subsection{Boxes}\index{box}
The layout of boxes is quite complex and explained in section~\secref{boxnodes}.
All boxes contain height and width, some contain a depth, some a shift amount,
and some a glue setting together with glue sign and glue order.
The last item in a box is a node list.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(hbox_kind,b000)] = NODE_SIZE(4+4,1); /* tag, height, width*/
hnode_size[TAG(hbox_kind,b001)] = NODE_SIZE(4+4+4,1); /* and depth */
hnode_size[TAG(hbox_kind,b010)] = NODE_SIZE(4+4+4,1); /* or shift */
hnode_size[TAG(hbox_kind,b011)] = NODE_SIZE(4+4+4+4,1); /* or both */
hnode_size[TAG(hbox_kind,b100)] = NODE_SIZE(4+4+5,1); /* and glue setting*/
hnode_size[TAG(hbox_kind,b101)] = NODE_SIZE(4+4+4+5,1); /* and depth */
hnode_size[TAG(hbox_kind,b110)] = NODE_SIZE(4+4+4+5,1); /* or shift */
hnode_size[TAG(hbox_kind,b111)] = NODE_SIZE(4+4+4+4+5,1); /*or both */
hnode_size[TAG(vbox_kind,b000)] = NODE_SIZE(4+4,1); /* same for vbox*/
hnode_size[TAG(vbox_kind,b001)] = NODE_SIZE(4+4+4,1);
hnode_size[TAG(vbox_kind,b010)] = NODE_SIZE(4+4+4,1);
hnode_size[TAG(vbox_kind,b011)] = NODE_SIZE(4+4+4+4,1);
hnode_size[TAG(vbox_kind,b100)] = NODE_SIZE(4+4+5,1);
hnode_size[TAG(vbox_kind,b101)] = NODE_SIZE(4+4+4+5,1);
hnode_size[TAG(vbox_kind,b110)] = NODE_SIZE(4+4+4+5,1);
hnode_size[TAG(vbox_kind,b111)] = NODE_SIZE(4+4+4+4+5,1);
@
\subsection{Extended Boxes}\index{extended box}
Extended boxes start with height, width, depth, stretch, or shrink components.
Then follows an extended dimension either as a reference or a node.
The node ends with a list.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(hset_kind,b000)] = NODE_SIZE(4+4+4+4+1,1);
hnode_size[TAG(hset_kind,b001)] = NODE_SIZE(4+4+4+4+4+1,1);
hnode_size[TAG(hset_kind,b010)] = NODE_SIZE(4+4+4+4+4+1,1);
hnode_size[TAG(hset_kind,b011)] = NODE_SIZE(4+4+4+4+4+4+1,1);
hnode_size[TAG(vset_kind,b000)] = NODE_SIZE(4+4+4+4+1,1);
hnode_size[TAG(vset_kind,b001)] = NODE_SIZE(4+4+4+4+4+1,1);
hnode_size[TAG(vset_kind,b010)] = NODE_SIZE(4+4+4+4+4+1,1);
hnode_size[TAG(vset_kind,b011)] = NODE_SIZE(4+4+4+4+4+4+1,1);
hnode_size[TAG(hset_kind,b100)] = NODE_SIZE(4+4+4+4,2);
hnode_size[TAG(hset_kind,b101)] = NODE_SIZE(4+4+4+4+4,2);
hnode_size[TAG(hset_kind,b110)] = NODE_SIZE(4+4+4+4+4,2);
hnode_size[TAG(hset_kind,b111)] = NODE_SIZE(4+4+4+4+4+4,2);
hnode_size[TAG(vset_kind,b100)] = NODE_SIZE(4+4+4+4,2);
hnode_size[TAG(vset_kind,b101)] = NODE_SIZE(4+4+4+4+4,2);
hnode_size[TAG(vset_kind,b110)] = NODE_SIZE(4+4+4+4+4,2);
hnode_size[TAG(vset_kind,b111)] = NODE_SIZE(4+4+4+4+4+4,2);
@
The hpack and vpack nodes start with a shift amount and in case of vpack a depth.
Then again an extended dimension and a list.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(hpack_kind,b000)] = NODE_SIZE(1,1);
hnode_size[TAG(hpack_kind,b001)] = NODE_SIZE(1,1);
hnode_size[TAG(hpack_kind,b010)] = NODE_SIZE(4+1,1);
hnode_size[TAG(hpack_kind,b011)] = NODE_SIZE(4+1,1);
hnode_size[TAG(vpack_kind,b000)] = NODE_SIZE(4+1,1);
hnode_size[TAG(vpack_kind,b001)] = NODE_SIZE(4+1,1);
hnode_size[TAG(vpack_kind,b010)] = NODE_SIZE(4+4+1,1);
hnode_size[TAG(vpack_kind,b011)] = NODE_SIZE(4+4+1,1);
hnode_size[TAG(hpack_kind,b100)] = NODE_SIZE(0,2);
hnode_size[TAG(hpack_kind,b101)] = NODE_SIZE(0,2);
hnode_size[TAG(hpack_kind,b110)] = NODE_SIZE(4,2);
hnode_size[TAG(hpack_kind,b111)] = NODE_SIZE(4,2);
hnode_size[TAG(vpack_kind,b100)] = NODE_SIZE(4,2);
hnode_size[TAG(vpack_kind,b101)] = NODE_SIZE(4,2);
hnode_size[TAG(vpack_kind,b110)] = NODE_SIZE(4+4,2);
hnode_size[TAG(vpack_kind,b111)] = NODE_SIZE(4+4,2);
@
\subsection{Leaders}\index{leaders}
Most leader nodes will use a reference.
Otherwise they contain a glue node followed by a box or rule node.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(leaders_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(leaders_kind,1)] = NODE_SIZE(0,1);
hnode_size[TAG(leaders_kind,2)] = NODE_SIZE(0,1);
hnode_size[TAG(leaders_kind,3)] = NODE_SIZE(0,1);
hnode_size[TAG(leaders_kind,b100|1)] = NODE_SIZE(0,2);
hnode_size[TAG(leaders_kind,b100|2)] = NODE_SIZE(0,2);
hnode_size[TAG(leaders_kind,b100|3)] = NODE_SIZE(0,2);
@
\subsection{Baseline Skips}\index{baseline skip}
Here we expect either a reference or two optional glue nodes followed by an optional dimension.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(baseline_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(baseline_kind,b001)] = NODE_SIZE(4,0);
hnode_size[TAG(baseline_kind,b010)] = NODE_SIZE(0,1);
hnode_size[TAG(baseline_kind,b100)] = NODE_SIZE(0,1);
hnode_size[TAG(baseline_kind,b110)] = NODE_SIZE(0,2);
hnode_size[TAG(baseline_kind,b011)] = NODE_SIZE(4,1);
hnode_size[TAG(baseline_kind,b101)] = NODE_SIZE(4,1);
hnode_size[TAG(baseline_kind,b111)] = NODE_SIZE(4,2);
@
\subsection{Ligatures}\index{ligature}
As usual a reference is possible, otherwise the font is followed by character bytes
as given by the info. Only if the info value is 7, the number of character bytes
is stored separately.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(ligature_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(ligature_kind,1)] = NODE_SIZE(1+1,0);
hnode_size[TAG(ligature_kind,2)] = NODE_SIZE(1+2,0);
hnode_size[TAG(ligature_kind,3)] = NODE_SIZE(1+3,0);
hnode_size[TAG(ligature_kind,4)] = NODE_SIZE(1+4,0);
hnode_size[TAG(ligature_kind,5)] = NODE_SIZE(1+5,0);
hnode_size[TAG(ligature_kind,6)] = NODE_SIZE(1+6,0);
hnode_size[TAG(ligature_kind,7)] = NODE_SIZE(1,1);
@
\subsection{Discretionary breaks}\index{discretionary break}
The simple cases here are references, discretionary breaks
with empty pre- and post-list, or with a zero line skip limit.
Otherwise one or two lists are followed by an optional replace count.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(disc_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(disc_kind,b010)] = NODE_SIZE(0,1);
hnode_size[TAG(disc_kind,b011)] = NODE_SIZE(0,2);
hnode_size[TAG(disc_kind,b100)] = NODE_SIZE(1,0);
hnode_size[TAG(disc_kind,b110)] = NODE_SIZE(1,1);
hnode_size[TAG(disc_kind,b111)] = NODE_SIZE(1,2);
@
\subsection{Paragraphs}\index{paragraph}
Paragraph nodes contain an extended dimension, a parameter list, and a list.
The first two can be given as a reference.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(par_kind,b000)] = NODE_SIZE(1+1,1);
hnode_size[TAG(par_kind,b010)] = NODE_SIZE(1,2);
hnode_size[TAG(par_kind,b110)] = NODE_SIZE(0,3);
hnode_size[TAG(par_kind,b100)] = NODE_SIZE(1,2);
@
\subsection{Mathematics}\index{mathematics}\index{displayed formula}
Displayed math needs a parameter list, either as list or as reference
followed by an optional left or right equation number and a list.
Text math is simpler: the only information is in the info value.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(math_kind,b000)] = NODE_SIZE(1,1);
hnode_size[TAG(math_kind,b001)] = NODE_SIZE(1,2);
hnode_size[TAG(math_kind,b010)] = NODE_SIZE(1,2);
hnode_size[TAG(math_kind,b100)] = NODE_SIZE(0,2);
hnode_size[TAG(math_kind,b101)] = NODE_SIZE(0,3);
hnode_size[TAG(math_kind,b110)] = NODE_SIZE(0,3);
hnode_size[TAG(math_kind,b111)] = NODE_SIZE(0,0);
hnode_size[TAG(math_kind,b011)] = NODE_SIZE(0,0);
@
\subsection{Adjustments}\index{adjustment}
@<initialize the |hnode_size| array@>=
hnode_size[TAG(adjust_kind,1)] = NODE_SIZE(0,1);
@
\subsection{Tables}\index{alignment}
Tables have an extended dimension either as a node or as a reference followed
by two lists.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(table_kind,b000)] = NODE_SIZE(1,2);
hnode_size[TAG(table_kind,b001)] = NODE_SIZE(1,2);
hnode_size[TAG(table_kind,b010)] = NODE_SIZE(1,2);
hnode_size[TAG(table_kind,b011)] = NODE_SIZE(1,2);
hnode_size[TAG(table_kind,b100)] = NODE_SIZE(0,3);
hnode_size[TAG(table_kind,b101)] = NODE_SIZE(0,3);
hnode_size[TAG(table_kind,b110)] = NODE_SIZE(0,3);
hnode_size[TAG(table_kind,b111)] = NODE_SIZE(0,3);
@
Outer item nodes are lists of inner item nodes, inner item nodes are box nodes
followed by an optional span count.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(item_kind,b000)] = NODE_SIZE(0,1); /* outer */
hnode_size[TAG(item_kind,1)] = NODE_SIZE(0,1); /* inner */
hnode_size[TAG(item_kind,2)] = NODE_SIZE(0,1);
hnode_size[TAG(item_kind,3)] = NODE_SIZE(0,1);
hnode_size[TAG(item_kind,4)] = NODE_SIZE(0,1);
hnode_size[TAG(item_kind,5)] = NODE_SIZE(0,1);
hnode_size[TAG(item_kind,6)] = NODE_SIZE(0,1);
hnode_size[TAG(item_kind,7)] = NODE_SIZE(1,1);
@
\subsection{Images}\index{image}
If not given by a reference, images contain a section reference and optional dimensions and a descriptive list.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(image_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(image_kind,b001)] = NODE_SIZE(2+4+4,1);
hnode_size[TAG(image_kind,b010)] = NODE_SIZE(2+4+4,1);
hnode_size[TAG(image_kind,b011)] = NODE_SIZE(2+4+4,1);
hnode_size[TAG(image_kind,b100)] = NODE_SIZE(2+4+1+1,1);
hnode_size[TAG(image_kind,b101)] = NODE_SIZE(2+4+1,2);
hnode_size[TAG(image_kind,b110)] = NODE_SIZE(2+4+1,2);
hnode_size[TAG(image_kind,b111)] = NODE_SIZE(2+4,3);
@
\subsection{Links}\index{link}
Links contain either a 2 byte or a 1 byte reference and possibly a color reference.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(link_kind,b000)] = NODE_SIZE(1,0);
hnode_size[TAG(link_kind,b001)] = NODE_SIZE(2,0);
hnode_size[TAG(link_kind,b010)] = NODE_SIZE(1,0);
hnode_size[TAG(link_kind,b011)] = NODE_SIZE(2,0);
hnode_size[TAG(link_kind,b100)] = NODE_SIZE(2,0);
hnode_size[TAG(link_kind,b101)] = NODE_SIZE(3,0);
hnode_size[TAG(link_kind,b110)] = NODE_SIZE(2,0);
hnode_size[TAG(link_kind,b111)] = NODE_SIZE(3,0);
@
\subsection{Streams}\index{stream}
After the stream reference follows a parameter list, either as reference
or as a list, and then a content list.
@<initialize the |hnode_size| array@>=
hnode_size[TAG(stream_kind,b000)] = NODE_SIZE(1+1,1);
hnode_size[TAG(stream_kind,b010)] = NODE_SIZE(1,2);
hnode_size[TAG(stream_kind,b100)] = NODE_SIZE(1,0);
@
\subsection{Colors}\index{color}
@<initialize the |hnode_size| array@>=
hnode_size[TAG(color_kind,b000)] = NODE_SIZE(1,0);
@
\section{Reading Short Format Files Backwards}
This section is not really part of the file format definition, but it
illustrates an important property of the content section in short
format files: it can be read in both directions. This is important
because we want to be able to start at an arbitrary point in the
content and from there move pagewise backward.
The program {\tt skip}\index{skip+{\tt skip}} described in this
section does just that. As we see in appendix~\secref{skip}, its
|main| program is almost the same as the |main| program of the program
{\tt stretch} in appendix~\secref{stretchmain}.
The major difference is the removal of an output file
and the replacement of the call to |hwrite_content_section| by
a call to |hteg_content_section|.
@<skip functions@>=
static void hteg_content_section(void)
{ hget_section(2);
hpos=hend;
while(hpos>hstart)
hteg_content_node();
}
@
The functions |hteg_content_section| and |hteg_content_node| above are
reverse versions of the functions |hget_content_section| and
|hget_content_node|. Many such ``reverse functions'' will follow now
and we will consistently use the same naming scheme: replacing
``{\it get\/}'' by ``{\it teg\/}'' or ``{\tt GET}'' by ``{\tt TEG}''.
The {\tt skip} program does not do much input
checking; it will just extract enough information from a content node
to skip a node and ``advance'' or better ``retreat'' to the previous
node.
@<skip functions@>=
static void hteg_content_node(void)
{ @<skip the end byte |z|@>@;
hteg_content(z);
@<skip and check the start byte |a|@>@;
}
static void hteg_content(Tag z)
{@+ switch (z)@/
{
@<cases to skip content@>@;@t\1@>@/
default:
if (!hteg_unknown(z))
TAGERR(z);
break;@t\2@>@/
}
}
@
The code to skip the end\index{end byte} byte |z| and to check the start\index{start byte} byte |a| is used repeatedly.
@<skip the end byte |z|@>=
Tag a,z; /* the start and the end byte*/
uint32_t node_pos=hpos-hstart;
if (hpos<=hstart) return;
HTEGTAG(z);
@
@<skip and check the start byte |a|@>=
HTEGTAG(a);
if (a!=z) QUIT(@["Tag mismatch [%s,%d]!=[%s,%d] at " SIZE_F " to 0x%x\n"@],@|NAME(a),INFO(a),NAME(z),INFO(z),@|
hpos-hstart,node_pos-1);
@
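The pattern of reading the end byte, skipping the content, and checking the
start byte can be illustrated with a toy format. In the sketch below, the
tags, the table |toy_content_len|, and the function |toy_retreat| are
hypothetical; every toy node has a fixed number of content bytes between its
two tags, and errors are reported by a return value instead of calling |QUIT|.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Content bytes between the two tags for the toy tags 0, 1, and 2. */
static const int toy_content_len[3] = { 0, 2, 1 };

/* Retreats from pos, the position just after a node, to the start of that
   node. Returns the new position, or -1 on underflow, unknown tag,
   or if start byte and end byte do not match. */
long toy_retreat(const uint8_t *buf, long pos)
{ uint8_t a, z;
  long len;
  if (pos < 2) return -1;
  z = buf[pos - 1];              /* the end byte */
  if (z > 2) return -1;
  len = toy_content_len[z];
  if (pos < len + 2) return -1;  /* the node would start before the buffer */
  a = buf[pos - 2 - len];        /* the start byte */
  if (a != z) return -1;
  return pos - 2 - len;
}

/* Two well formed toy nodes, and one node with a wrong end byte. */
const uint8_t toy_nodes[7] = { 1, 0x10, 0x20, 1, 2, 0x30, 2 };
const uint8_t toy_broken[4] = { 1, 0x10, 0x20, 2 };
```

Starting at position 7 of |toy_nodes|, two calls lead back to the positions 4
and 0; on |toy_broken| the mismatch of start and end byte is detected.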
We replace the ``{\tt GET}'' macros by the following ``{\tt TEG}'' macros:
@<shared get macros@>=
#define @[HBACK(X)@] @[((hpos-(X)<hstart)?(QUIT("HTEG underflow\n"),NULL):(hpos-=(X)))@]
#define @[HTEG8@] (HBACK(1),hpos[0])
#define @[HTEG16(X)@] (HBACK(2),(X)=(hpos[0]<<8)+hpos[1])
#define @[HTEG24(X)@] (HBACK(3),(X)=(hpos[0]<<16)+(hpos[1]<<8)+hpos[2])
#define @[HTEG32(X)@] (HBACK(4),(X)=(hpos[0]<<24)+(hpos[1]<<16)+(hpos[2]<<8)+hpos[3])
#define @[HTEGTAG(X)@] @[X=HTEG8,DBGTAG(X,hpos)@]
@
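The effect of these macros is to move the position back first and then to
read the bytes in the usual big-endian order. The following sketch shows the
same idea with functions instead of macros; the type |Cursor| and the function
names are hypothetical, and an underflow is reported by the return value
instead of a call to |QUIT|.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *start; const uint8_t *pos; } Cursor;

/* Moves the cursor back by n bytes; on underflow it returns 0
   and leaves the cursor unchanged. */
int cursor_back(Cursor *c, size_t n)
{ if ((size_t)(c->pos - c->start) < n) return 0;
  c->pos -= n;
  return 1;
}

/* Reads the two bytes before the cursor as a big-endian number. */
int teg16(Cursor *c, uint16_t *x)
{ if (!cursor_back(c, 2)) return 0;
  *x = (uint16_t)(((unsigned)c->pos[0] << 8) + c->pos[1]);
  return 1;
}

/* Reads the four bytes before the cursor as a big-endian number. */
int teg32(Cursor *c, uint32_t *x)
{ if (!cursor_back(c, 4)) return 0;
  *x = ((uint32_t)c->pos[0] << 24) + ((uint32_t)c->pos[1] << 16)
     + ((uint32_t)c->pos[2] << 8) + (uint32_t)c->pos[3];
  return 1;
}

/* Sample data: a 32 bit number followed by a 16 bit number. */
const uint8_t toy_bytes[6] = { 0x12, 0x34, 0x56, 0x78, 0xAB, 0xCD };
```

With a cursor at the end of |toy_bytes|, the 16 bit value is obtained first
and the 32 bit value second, the reverse of the order used when reading forward.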
Now we review step by step the different kinds of nodes.
\subsection{Floating Point Numbers}\index{floating point number}
\noindent
@<shared skip functions@>=
float32_t hteg_float32(void)
{ union {@+float32_t d; @+ uint32_t bits; @+} u;
HTEG32(u.bits);
return u.d;
}
@
\subsection{Extended Dimensions}\index{extended dimension}
\noindent
@<skip macros@>=
#define @[HTEG_XDIMEN(I,X)@] \
if((I)&b001) HTEG32((X).v); \
if((I)&b010) HTEG32((X).h);\
if((I)&b100) HTEG32((X).w);
@
@<skip functions@>=
static void hteg_xdimen_node(Xdimen *x)
{ @<skip the end byte |z|@>@;
switch(z)
{
#if 0
/* currently the info value 0 is not supported */
case TAG(xdimen_kind,b000): /* see section~\secref{reference} */
{uint8_t n;@+ n=HTEG8;} @+ break;
#endif
case TAG(xdimen_kind,b001): HTEG_XDIMEN(b001,*x);@+break;
case TAG(xdimen_kind,b010): HTEG_XDIMEN(b010,*x);@+break;
case TAG(xdimen_kind,b011): HTEG_XDIMEN(b011,*x);@+break;
case TAG(xdimen_kind,b100): HTEG_XDIMEN(b100,*x);@+break;
case TAG(xdimen_kind,b101): HTEG_XDIMEN(b101,*x);@+break;
case TAG(xdimen_kind,b110): HTEG_XDIMEN(b110,*x);@+break;
case TAG(xdimen_kind,b111): HTEG_XDIMEN(b111,*x);@+break;
default:
QUIT("Extent expected at 0x%x got %s",node_pos,NAME(z)); @+ break;
}
@<skip and check the start byte |a|@>@;
}
@
\subsection{Stretch and Shrink}\index{stretchability}\index{shrinkability}
\noindent
@<skip macros@>=
#define @[HTEG_STRETCH(S)@] { Stch st; @+ HTEG32(st.u);@+ S.o=st.u&3;@+ st.u&=~3;@+ S.f=st.f; @+}
@
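The macro |HTEG_STRETCH| reads a 32 bit value, takes the order from its lowest
two bits, clears these bits, and interprets the result as a floating point
number. The sketch below shows this decoding step together with a matching
encoding step, which is added here only for illustration. The type
|ToyStretch| and the function names are hypothetical, and the code assumes
that |float| is a 32 bit IEEE single precision number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float f; uint8_t o; } ToyStretch;

/* Splits a 32 bit word into the order (the lowest two bits) and the factor
   (the word with these two bits cleared, read as a float). */
ToyStretch decode_stretch(uint32_t u)
{ ToyStretch s;
  s.o = (uint8_t)(u & 3);
  u &= ~(uint32_t)3;
  memcpy(&s.f, &u, sizeof s.f);
  return s;
}

/* Illustrative inverse: replaces the lowest two bits of the float by the order. */
uint32_t encode_stretch(float f, uint8_t o)
{ uint32_t u;
  memcpy(&u, &f, sizeof u);
  return (u & ~(uint32_t)3) + (o & 3u);
}
```

The price for storing the order this way is the loss of the two least
significant bits of the mantissa.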
\subsection{Glyphs}\index{glyph}
\noindent
@<skip macros@>=
#define HTEG_GLYPH(I,G) \
(G).f=HTEG8; \
if (I==1) (G).c=HTEG8;\
else if (I==2) HTEG16((G).c);\
else if (I==3) HTEG24((G).c);\
else if (I==4) HTEG32((G).c);
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(glyph_kind,1): @+{@+Glyph g;@+ HTEG_GLYPH(1,g);@+}@+break;
case TAG(glyph_kind,2): @+{@+Glyph g;@+ HTEG_GLYPH(2,g);@+}@+break;
case TAG(glyph_kind,3): @+{@+Glyph g;@+ HTEG_GLYPH(3,g);@+}@+break;
case TAG(glyph_kind,4): @+{@+Glyph g;@+ HTEG_GLYPH(4,g);@+}@+break;
@
\subsection{Penalties}\index{penalty}
\noindent
@<skip macros@>=
#define @[HTEG_PENALTY(I,P)@] \
if (I==1) {int8_t n; @+n=HTEG8; @+P=n;@+ } \
else if (I==2) {int16_t n;@+ HTEG16(n); @+ P=n; @+}\
else if (I==3) {int32_t n;@+ HTEG32(n); @+ P=n; @+}
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(penalty_kind,1): @+{int32_t p;@+ HTEG_PENALTY(1,p);@+} @+break;
case TAG(penalty_kind,2): @+{int32_t p;@+ HTEG_PENALTY(2,p);@+} @+break;
case TAG(penalty_kind,3): @+{int32_t p;@+ HTEG_PENALTY(3,p);@+} @+break;
@
\subsection{Kerns}\index{kern}
\noindent
@<skip macros@>=
#define @[HTEG_KERN(I,X)@] @[if (((I)&b011)==2) HTEG32(X.w); else if (((I)&b011)==3) hteg_xdimen_node(&(X))@]
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(kern_kind,b010): @+ {@+Xdimen x; @+HTEG_KERN(b010,x);@+ } @+break;
case TAG(kern_kind,b011): @+ {@+Xdimen x; @+HTEG_KERN(b011,x);@+ } @+break;
case TAG(kern_kind,b110): @+ {@+Xdimen x; @+HTEG_KERN(b110,x);@+ } @+break;
case TAG(kern_kind,b111): @+ {@+Xdimen x; @+HTEG_KERN(b111,x);@+ } @+break;
@
\subsection{Language}\index{language}
\noindent
@<cases to skip content@>=
@t\kern1em@>case TAG(language_kind,1):
case TAG(language_kind,2):
case TAG(language_kind,3):
case TAG(language_kind,4):
case TAG(language_kind,5):
case TAG(language_kind,6):
case TAG(language_kind,7):@+break;
@
\subsection{Rules}\index{rule}
\noindent
@<skip macros@>=
#define @[HTEG_RULE(I,R)@]@/\
if ((I)&b001) HTEG32((R).w); @+else (R).w=RUNNING_DIMEN;\
if ((I)&b010) HTEG32((R).d); @+else (R).d=RUNNING_DIMEN;\
if ((I)&b100) HTEG32((R).h); @+else (R).h=RUNNING_DIMEN;
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(rule_kind,b011): @+ {Rule r;@+ HTEG_RULE(b011,r);@+ }@+ break;
case TAG(rule_kind,b101): @+ {Rule r;@+ HTEG_RULE(b101,r);@+ }@+ break;
case TAG(rule_kind,b001): @+ {Rule r;@+ HTEG_RULE(b001,r);@+ }@+ break;
case TAG(rule_kind,b110): @+ {Rule r;@+ HTEG_RULE(b110,r);@+ }@+ break;
case TAG(rule_kind,b111): @+ {Rule r;@+ HTEG_RULE(b111,r);@+ }@+ break;
@
@<skip functions@>=
static void hteg_rule_node(void)
{ @<skip the end byte |z|@>@;
if (KIND(z)==rule_kind) { @+Rule r; @+HTEG_RULE(INFO(z),r); @+}
else
QUIT("Rule expected at 0x%x got %s",node_pos,NAME(z));
@<skip and check the start byte |a|@>@;
}
@
\subsection{Glue}\index{glue}
\noindent
@<skip macros@>=
#define @[HTEG_GLUE(I,G)@] @/\
if(I==b111) hteg_xdimen_node(&((G).w)); else (G).w.h=(G).w.v=0.0;\
if((I)&b001) HTEG_STRETCH((G).m) @+else (G).m.f=0.0, (G).m.o=0; \
if((I)&b010) HTEG_STRETCH((G).p) @+else (G).p.f=0.0, (G).p.o=0;\
if((I)!=b111) { @+if ((I)&b100) HTEG32((G).w.w);@+ else (G).w.w=0;@+ }
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(glue_kind,b001): @+{ Glue g;@+ HTEG_GLUE(b001,g);@+}@+break;
case TAG(glue_kind,b010): @+{ Glue g;@+ HTEG_GLUE(b010,g);@+}@+break;
case TAG(glue_kind,b011): @+{ Glue g;@+ HTEG_GLUE(b011,g);@+}@+break;
case TAG(glue_kind,b100): @+{ Glue g;@+ HTEG_GLUE(b100,g);@+}@+break;
case TAG(glue_kind,b101): @+{ Glue g;@+ HTEG_GLUE(b101,g);@+}@+break;
case TAG(glue_kind,b110): @+{ Glue g;@+ HTEG_GLUE(b110,g);@+}@+break;
case TAG(glue_kind,b111): @+{ Glue g;@+ HTEG_GLUE(b111,g);@+}@+break;
@
@<skip functions@>=
static void hteg_glue_node(void)
{ @<skip the end byte |z|@>@;
if (INFO(z)==b000) HTEG_REF(glue_kind);
else
{ @+Glue g; @+HTEG_GLUE(INFO(z),g);@+}
@<skip and check the start byte |a|@>@;
}
@
\subsection{Boxes}\index{box}
\noindent
@<skip macros@>=
#define @[HTEG_BOX(I,B)@] \
hteg_list(&(B.l));\
if ((I)&b100) @/{ B.s=HTEG8; @+ B.r=hteg_float32();@+ B.o=B.s&0xF; @+B.s=B.s>>4;@+ }\
else { B.r=0.0;@+ B.o=B.s=0;@+ }\
if ((I)&b010) HTEG32(B.a); @+else B.a=0;\
HTEG32(B.w);\
if ((I)&b001) HTEG32(B.d); @+ else B.d=0;\
HTEG32(B.h);
@
@<cases to skip content@>=
@t\1\kern1em@> case TAG(hbox_kind,b000): @+{Box b; @+HTEG_BOX(b000,b);@+} @+ break;
case TAG(hbox_kind,b001): @+{Box b; @+HTEG_BOX(b001,b);@+} @+ break;
case TAG(hbox_kind,b010): @+{Box b; @+HTEG_BOX(b010,b);@+} @+ break;
case TAG(hbox_kind,b011): @+{Box b; @+HTEG_BOX(b011,b);@+} @+ break;
case TAG(hbox_kind,b100): @+{Box b; @+HTEG_BOX(b100,b);@+} @+ break;
case TAG(hbox_kind,b101): @+{Box b; @+HTEG_BOX(b101,b);@+} @+ break;
case TAG(hbox_kind,b110): @+{Box b; @+HTEG_BOX(b110,b);@+} @+ break;
case TAG(hbox_kind,b111): @+{Box b; @+HTEG_BOX(b111,b);@+} @+ break;
case TAG(vbox_kind,b000): @+{Box b; @+HTEG_BOX(b000,b);@+} @+ break;
case TAG(vbox_kind,b001): @+{Box b; @+HTEG_BOX(b001,b);@+} @+ break;
case TAG(vbox_kind,b010): @+{Box b; @+HTEG_BOX(b010,b);@+} @+ break;
case TAG(vbox_kind,b011): @+{Box b; @+HTEG_BOX(b011,b);@+} @+ break;
case TAG(vbox_kind,b100): @+{Box b; @+HTEG_BOX(b100,b);@+} @+ break;
case TAG(vbox_kind,b101): @+{Box b; @+HTEG_BOX(b101,b);@+} @+ break;
case TAG(vbox_kind,b110): @+{Box b; @+HTEG_BOX(b110,b);@+} @+ break;
case TAG(vbox_kind,b111): @+{Box b; @+HTEG_BOX(b111,b);@+} @+ break;
@
@<skip functions@>=
static void hteg_hbox_node(void)
{ Box b;
@<skip the end byte |z|@>@;
if (KIND(z)!=hbox_kind) QUIT("Hbox expected at 0x%x got %s",node_pos,NAME(z));
HTEG_BOX(INFO(z),b);@/
@<skip and check the start byte |a|@>@;
}
static void hteg_vbox_node(void)
{ Box b;
@<skip the end byte |z|@>@;
if (KIND(z)!=vbox_kind) QUIT("Vbox expected at 0x%x got %s",node_pos,NAME(z));
HTEG_BOX(INFO(z),b);@/
@<skip and check the start byte |a|@>@;
}
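@ To make the treatment of the glue set byte in |HTEG_BOX| concrete, here is a
minimal sketch, again outside the \HINT\ sources. It assumes, as the macro
suggests, that the high four bits of the byte hold the sign |s| and the low
four bits hold the order |o|; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* combine a 4-bit sign code and a 4-bit order into one byte */
uint8_t demo_pack_glue_set(unsigned sign, unsigned order)
{ assert(sign <= 0xF);
  assert(order <= 0xF);
  return (uint8_t)((sign << 4) + order);
}

/* split the byte again, in the same way as HTEG_BOX does */
void demo_unpack_glue_set(uint8_t byte, unsigned *sign, unsigned *order)
{ *order = byte & 0xF;
  *sign = byte >> 4;
}
```

Packing sign 1 and order 2 gives the byte |0x12|, and unpacking that byte
returns the same two values.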
@
\subsection{Extended Boxes}\index{extended box}
\noindent
@<skip macros@>=
#define @[HTEG_SET(I)@] @/\
{ List l; @+hteg_list(&l); @+} \
if ((I)&b100) {Xdimen x;@+ hteg_xdimen_node(&x); @+} \
else HTEG_REF(xdimen_kind);\
{ Stretch m; @+HTEG_STRETCH(m);@+}\
{ Stretch p; @+HTEG_STRETCH(p);@+}\
if ((I)&b010) { Dimen a; @+HTEG32(a);@+} \
{ Dimen w; @+HTEG32(w);@+} \
{ Dimen d; @+if ((I)&b001) HTEG32(d); @+ else d=0;@+}\
{ Dimen h; @+HTEG32(h);@+}
@#
#define @[HTEG_PACK(K,I)@] @/\
{ List l; @+hteg_list(&l); @+} \
if ((I)&b100) {Xdimen x; hteg_xdimen_node(&x);@+} @+ else HTEG_REF(xdimen_kind);\
if ((I)&b010) { Dimen d; @+HTEG32(d); @+ }\
if ((K)==vpack_kind) { Dimen d; @+HTEG32(d); @+ }
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(hset_kind,b000): HTEG_SET(b000); @+ break;
case TAG(hset_kind,b001): HTEG_SET(b001); @+ break;
case TAG(hset_kind,b010): HTEG_SET(b010); @+ break;
case TAG(hset_kind,b011): HTEG_SET(b011); @+ break;
case TAG(hset_kind,b100): HTEG_SET(b100); @+ break;
case TAG(hset_kind,b101): HTEG_SET(b101); @+ break;
case TAG(hset_kind,b110): HTEG_SET(b110); @+ break;
case TAG(hset_kind,b111): HTEG_SET(b111); @+ break;@#
case TAG(vset_kind,b000): HTEG_SET(b000); @+ break;
case TAG(vset_kind,b001): HTEG_SET(b001); @+ break;
case TAG(vset_kind,b010): HTEG_SET(b010); @+ break;
case TAG(vset_kind,b011): HTEG_SET(b011); @+ break;
case TAG(vset_kind,b100): HTEG_SET(b100); @+ break;
case TAG(vset_kind,b101): HTEG_SET(b101); @+ break;
case TAG(vset_kind,b110): HTEG_SET(b110); @+ break;
case TAG(vset_kind,b111): HTEG_SET(b111); @+ break;@#
case TAG(hpack_kind,b000): HTEG_PACK(hpack_kind,b000); @+ break;
case TAG(hpack_kind,b001): HTEG_PACK(hpack_kind,b001); @+ break;
case TAG(hpack_kind,b010): HTEG_PACK(hpack_kind,b010); @+ break;
case TAG(hpack_kind,b011): HTEG_PACK(hpack_kind,b011); @+ break;
case TAG(hpack_kind,b100): HTEG_PACK(hpack_kind,b100); @+ break;
case TAG(hpack_kind,b101): HTEG_PACK(hpack_kind,b101); @+ break;
case TAG(hpack_kind,b110): HTEG_PACK(hpack_kind,b110); @+ break;
case TAG(hpack_kind,b111): HTEG_PACK(hpack_kind,b111); @+ break;@#
case TAG(vpack_kind,b000): HTEG_PACK(vpack_kind,b000); @+ break;
case TAG(vpack_kind,b001): HTEG_PACK(vpack_kind,b001); @+ break;
case TAG(vpack_kind,b010): HTEG_PACK(vpack_kind,b010); @+ break;
case TAG(vpack_kind,b011): HTEG_PACK(vpack_kind,b011); @+ break;
case TAG(vpack_kind,b100): HTEG_PACK(vpack_kind,b100); @+ break;
case TAG(vpack_kind,b101): HTEG_PACK(vpack_kind,b101); @+ break;
case TAG(vpack_kind,b110): HTEG_PACK(vpack_kind,b110); @+ break;
case TAG(vpack_kind,b111): HTEG_PACK(vpack_kind,b111); @+ break;
@
\subsection{Leaders}\index{leaders}
\noindent
@<skip macros@>=
#define @[HTEG_LEADERS(I)@]@/ \
if (KIND(hpos[-1])==rule_kind) hteg_rule_node(); \
else if (KIND(hpos[-1])==hbox_kind) hteg_hbox_node();\
else hteg_vbox_node();\
if ((I)&b100) hteg_glue_node();
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(leaders_kind,1): @+ HTEG_LEADERS(1); @+break;
case TAG(leaders_kind,2): @+ HTEG_LEADERS(2); @+break;
case TAG(leaders_kind,3): @+ HTEG_LEADERS(3); @+break;
case TAG(leaders_kind,b100|1): @+ HTEG_LEADERS(b100|1); @+break;
case TAG(leaders_kind,b100|2): @+ HTEG_LEADERS(b100|2); @+break;
case TAG(leaders_kind,b100|3): @+ HTEG_LEADERS(b100|3); @+break;
@
\subsection{Baseline Skips}\index{baseline skip}
\noindent
@<skip macros@>=
#define @[HTEG_BASELINE(I,B)@] \
if((I)&b010) hteg_glue_node(); \
else {B.ls.p.o=B.ls.m.o=B.ls.w.w=0; @+B.ls.w.h=B.ls.w.v=B.ls.p.f=B.ls.m.f=0.0;@+}\
if((I)&b100) hteg_glue_node(); \
else {B.bs.p.o=B.bs.m.o=B.bs.w.w=0; @+B.bs.w.h=B.bs.w.v=B.bs.p.f=B.bs.m.f=0.0;@+}\
if((I)&b001) HTEG32((B).lsl); @+else B.lsl=0;
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(baseline_kind,b001): @+{ Baseline b;@+ HTEG_BASELINE(b001,b);@+ }@+break;
case TAG(baseline_kind,b010): @+{ Baseline b;@+ HTEG_BASELINE(b010,b);@+ }@+break;
case TAG(baseline_kind,b011): @+{ Baseline b;@+ HTEG_BASELINE(b011,b);@+ }@+break;
case TAG(baseline_kind,b100): @+{ Baseline b;@+ HTEG_BASELINE(b100,b);@+ }@+break;
case TAG(baseline_kind,b101): @+{ Baseline b;@+ HTEG_BASELINE(b101,b);@+ }@+break;
case TAG(baseline_kind,b110): @+{ Baseline b;@+ HTEG_BASELINE(b110,b);@+ }@+break;
case TAG(baseline_kind,b111): @+{ Baseline b;@+ HTEG_BASELINE(b111,b);@+ }@+break;
@
\subsection{Ligatures}\index{ligature}
\noindent
@<skip macros@>=
#define @[HTEG_LIG(I,L)@] @/\
if ((I)==7) hteg_list(&((L).l)); \
else {(L).l.s=(I); @+hpos-=(L).l.s; @+ (L).l.p=hpos-hstart;@+} \
(L).f=HTEG8;
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(ligature_kind,1):@+ {Lig l; @+HTEG_LIG(1,l);@+} @+break;
case TAG(ligature_kind,2):@+ {Lig l; @+HTEG_LIG(2,l);@+} @+break;
case TAG(ligature_kind,3):@+ {Lig l; @+HTEG_LIG(3,l);@+} @+break;
case TAG(ligature_kind,4):@+ {Lig l; @+HTEG_LIG(4,l);@+} @+break;
case TAG(ligature_kind,5):@+ {Lig l; @+HTEG_LIG(5,l);@+} @+break;
case TAG(ligature_kind,6):@+ {Lig l; @+HTEG_LIG(6,l);@+} @+break;
case TAG(ligature_kind,7):@+ {Lig l; @+HTEG_LIG(7,l);@+} @+break;
@
\subsection{Discretionary Breaks}\index{discretionary breaks}
\noindent
@<skip macros@>=
#define @[HTEG_DISC(I,H)@]\
if ((I)&b001) hteg_list(&((H).q)); else { (H).q.p=hpos-hstart; @+(H).q.s=0; @+(H).q.t=TAG(list_kind,b000); @+}\
if ((I)&b010) hteg_list(&((H).p)); else { (H).p.p=hpos-hstart; @+(H).p.s=0; @+(H).p.t=TAG(list_kind,b000); @+} \
if ((I)&b100) (H).r=HTEG8; @+else (H).r=0;
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(disc_kind,b001): @+{Disc h; @+HTEG_DISC(b001,h); @+} @+break;
case TAG(disc_kind,b010): @+{Disc h; @+HTEG_DISC(b010,h); @+} @+break;
case TAG(disc_kind,b011): @+{Disc h; @+HTEG_DISC(b011,h); @+} @+break;
case TAG(disc_kind,b100): @+{Disc h; @+HTEG_DISC(b100,h); @+} @+break;
case TAG(disc_kind,b101): @+{Disc h; @+HTEG_DISC(b101,h); @+} @+break;
case TAG(disc_kind,b110): @+{Disc h; @+HTEG_DISC(b110,h); @+} @+break;
case TAG(disc_kind,b111): @+{Disc h; @+HTEG_DISC(b111,h); @+} @+break;
@
\subsection{Paragraphs}\index{paragraph}
\noindent
@<skip macros@>=
#define @[HTEG_PAR(I)@] @/\
{ List l; @+hteg_list(&l); @+} \
if ((I)&b010) { List l; @+hteg_param_list(&l); @+} else if ((I)!=b100) HTEG_REF(param_kind);\
if ((I)&b100) {Xdimen x; @+ hteg_xdimen_node(&x); @+} else HTEG_REF(xdimen_kind);\
if ((I)==b100) HTEG_REF(param_kind);
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(par_kind,b000): @+HTEG_PAR(b000);@+break;
case TAG(par_kind,b010): @+HTEG_PAR(b010);@+break;
case TAG(par_kind,b100): @+HTEG_PAR(b100);@+break;
case TAG(par_kind,b110): @+HTEG_PAR(b110);@+break;
@
\subsection{Mathematics}\index{mathematics}\index{displayed formula}%
\noindent
@<skip macros@>=
#define @[HTEG_MATH(I)@] \
if ((I)&b001) hteg_hbox_node();\
{ List l; @+hteg_list(&l); @+} \
if ((I)&b010) hteg_hbox_node(); \
if ((I)&b100) { List l; @+hteg_param_list(&l); @+} @+ else HTEG_REF(param_kind);
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(math_kind,b000): HTEG_MATH(b000); @+ break;
case TAG(math_kind,b001): HTEG_MATH(b001); @+ break;
case TAG(math_kind,b010): HTEG_MATH(b010); @+ break;
case TAG(math_kind,b100): HTEG_MATH(b100); @+ break;
case TAG(math_kind,b101): HTEG_MATH(b101); @+ break;
case TAG(math_kind,b110): HTEG_MATH(b110); @+ break;
case TAG(math_kind,b011):
case TAG(math_kind,b111): @+ break;
@
\subsection{Images}\index{image}
\noindent
@<skip macros@>=
#define @[HTEG_IMAGE(I)@] @/\
{ Image x={0}; List d; hteg_list(&d);\
if ((I)&b100) {\
if ((I)==b111) {hteg_xdimen_node(&x.h);hteg_xdimen_node(&x.w);}\
else if ((I)==b110) {hteg_xdimen_node(&x.w);x.hr=HTEG8;}\
else if ((I)==b101) {hteg_xdimen_node(&x.h);x.wr=HTEG8;}\
else {x.hr=HTEG8;x.wr=HTEG8;}\
x.a=hteg_float32();}\
else if((I)==b011) {HTEG32(x.h.w);HTEG32(x.w.w);} \
else if((I)==b010) { HTEG32(x.w.w); x.a=hteg_float32();}\
else if((I)==b001){ HTEG32(x.h.w); x.a=hteg_float32();}\
HTEG16(x.n);}
@
@<cases to skip content@>=
@t\1\kern1em@>
case TAG(image_kind,b001): @+ HTEG_IMAGE(b001);@+break;
case TAG(image_kind,b010): @+ HTEG_IMAGE(b010);@+break;
case TAG(image_kind,b011): @+ HTEG_IMAGE(b011);@+break;
case TAG(image_kind,b100): @+ HTEG_IMAGE(b100);@+break;
case TAG(image_kind,b101): @+ HTEG_IMAGE(b101);@+break;
case TAG(image_kind,b110): @+ HTEG_IMAGE(b110);@+break;
case TAG(image_kind,b111): @+ HTEG_IMAGE(b111);@+break;
@
\subsection{Links and Labels}
\noindent
@<skip macros@>=
#define @[HTEG_LINK(I)@] @/\
{ uint16_t n; if ((I)&b100) n=HTEG8; if ((I)&b001) HTEG16(n);@+ else n=HTEG8; @+}
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(link_kind,b000): @+ HTEG_LINK(b000); @+break;
case TAG(link_kind,b001): @+ HTEG_LINK(b001); @+break;
case TAG(link_kind,b010): @+ HTEG_LINK(b010); @+break;
case TAG(link_kind,b011): @+ HTEG_LINK(b011); @+break;
case TAG(link_kind,b100): @+ HTEG_LINK(b100); @+break;
case TAG(link_kind,b101): @+ HTEG_LINK(b101); @+break;
case TAG(link_kind,b110): @+ HTEG_LINK(b110); @+break;
case TAG(link_kind,b111): @+ HTEG_LINK(b111); @+break;
@
\subsection{Colors}
\noindent
@<cases to skip content@>=
@t\1\kern1em@>
case TAG(color_kind,b000): @+ (void)HTEG8; @+break;
@
\subsection{Plain Lists, Texts, and Parameter Lists}\index{list}
\noindent
@<shared skip functions@>=
uint32_t hteg_list_size(Info info)
{ uint32_t n=0;
info=info&0x3;
if (info==0) return 0;
else if (info==1) n=HTEG8;
else if (info==2) HTEG16(n);
else if (info==3) HTEG32(n);
else QUIT("List info %d must be 0, 1, 2, or 3",info);
return n;
}
@
@<skip functions@>=
static void hteg_size_boundary(Info info)
{ uint32_t n;
info=info&0x3;
if (info==0) return;
n=HTEG8;
if (n!=0x100-info) QUIT(@["List size boundary byte 0x%x does not match info value %d at " SIZE_F@],
n, info,hpos-hstart);
}
static void hteg_list(List *l)
{ @<skip the end byte |z|@>@,
@+if (KIND(z)!=list_kind && KIND(z)!=param_kind) @/
QUIT("List expected at 0x%x", (uint32_t)(hpos-hstart));
else if ((INFO(z)&0x3)==0) {HBACK(1); l->s=0;@+}
else
{ uint32_t s;
l->t=z;
l->s=hteg_list_size(INFO(z));
hteg_size_boundary(INFO(z));
hpos=hpos-l->s;
l->p=hpos-hstart;
hteg_size_boundary(INFO(z));
s=hteg_list_size(INFO(z));
if (s!=l->s) QUIT(@["List sizes at " SIZE_F " and 0x%x do not match 0x%x != 0x%x"@],
hpos-hstart,node_pos-1,s,l->s);
}
@<skip and check the start byte |a|@>@;
}
static void hteg_param_list(List *l)
{ @+if (KIND(*(hpos-1))!=param_kind) return;
hteg_list(l);
}
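@ The framing that |hteg_list| checks can be pictured as: tag, size, boundary
byte, content, boundary byte, size, tag. The sketch below walks this framing
backwards over a plain byte buffer. It is an illustration under assumptions:
all names are hypothetical, sizes are taken to be big-endian, only nonempty
lists (info bits 1, 2, or 3) are handled, bounds checks against the start of
the buffer are omitted, and errors are reported by returning $-1$ instead of
calling |QUIT|.

```c
#include <assert.h>
#include <stdint.h>

/* step back one byte and return it */
unsigned demo_teg8(const uint8_t **pos)
{ *pos -= 1;
  return **pos;
}

/* step back over a big-endian size of 1, 2, or 4 bytes (info 1, 2, or 3) */
uint32_t demo_teg_size(const uint8_t **pos, unsigned info)
{ static const unsigned width[4] = {0, 1, 2, 4};
  uint32_t n = 0;
  unsigned i;
  *pos -= width[info];
  for (i = 0; i < width[info]; i++) n = (n << 8) + (*pos)[i];
  return n;
}

/* skip one framed list backwards; return its content size or -1 */
long demo_teg_list(const uint8_t **pos)
{ unsigned z = demo_teg8(pos);                    /* end tag */
  unsigned info = z & 3;
  uint32_t s;
  assert(info != 0);                              /* sketch: nonempty lists only */
  s = demo_teg_size(pos, info);                   /* trailing size */
  if (demo_teg8(pos) != 0x100 - info) return -1;  /* boundary after the content */
  *pos -= s;                                      /* the content itself */
  if (demo_teg8(pos) != 0x100 - info) return -1;  /* boundary before the content */
  if (demo_teg_size(pos, info) != s) return -1;   /* leading size must agree */
  if (demo_teg8(pos) != z) return -1;             /* start tag must equal end tag */
  return (long)s;
}
```

Because the size and the boundary byte appear on both sides of the content,
the same buffer can be traversed from either end, which is what the skip
functions rely on.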
@
\subsection{Adjustments}\index{adjustment}
\noindent
@<cases to skip content@>=
@t\1\kern1em@>case TAG(adjust_kind,b001): @+ { List l; @+hteg_list(&l);@+ } @+ break;
@
\subsection{Tables}\index{table}
\noindent
@<skip macros@>=
#define @[HTEG_TABLE(I)@] \
{@+ List l; @+ hteg_list(&l);@+}\
{@+ List l; @+ hteg_list(&l);@+}\
if ((I)&b100) {Xdimen x;@+ hteg_xdimen_node(&x);@+} else HTEG_REF(xdimen_kind)@;
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(table_kind,b000): @+ HTEG_TABLE(b000); @+ break;
case TAG(table_kind,b001): @+ HTEG_TABLE(b001); @+ break;
case TAG(table_kind,b010): @+ HTEG_TABLE(b010); @+ break;
case TAG(table_kind,b011): @+ HTEG_TABLE(b011); @+ break;
case TAG(table_kind,b100): @+ HTEG_TABLE(b100); @+ break;
case TAG(table_kind,b101): @+ HTEG_TABLE(b101); @+ break;
case TAG(table_kind,b110): @+ HTEG_TABLE(b110); @+ break;
case TAG(table_kind,b111): @+ HTEG_TABLE(b111); @+ break;@#
case TAG(item_kind,b000): @+{@+ List l; @+hteg_list(&l);@+ } @+ break;
case TAG(item_kind,b001): hteg_content_node(); @+ break;
case TAG(item_kind,b010): hteg_content_node(); @+ break;
case TAG(item_kind,b011): hteg_content_node(); @+ break;
case TAG(item_kind,b100): hteg_content_node(); @+ break;
case TAG(item_kind,b101): hteg_content_node(); @+ break;
case TAG(item_kind,b110): hteg_content_node(); @+ break;
case TAG(item_kind,b111): hteg_content_node(); @+{uint8_t n;@+ n=HTEG8;@+}@+ break;
@
\subsection{Stream Nodes}\index{stream}
@<skip macros@>=
#define @[HTEG_STREAM(I)@] @/\
{ List l; @+hteg_list(&l); @+}\
if ((I)&b010) { List l; @+hteg_param_list(&l); @+} @+ else HTEG_REF(param_kind);\
HTEG_REF(stream_kind);
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(stream_kind,b000): HTEG_STREAM(b000); @+ break;
case TAG(stream_kind,b010): HTEG_STREAM(b010); @+ break;
@
\subsection{References}\index{reference}
\noindent
@<skip macros@>=
#define @[HTEG_REF(K)@] do@+{uint8_t n; @+ n=HTEG8;@+} @+ while (false)
@
@<cases to skip content@>=
@t\1\kern1em@>case TAG(penalty_kind,0): HTEG_REF(penalty_kind); @+break;
case TAG(kern_kind,b000): HTEG_REF(dimen_kind); @+break;
case TAG(kern_kind,b100): HTEG_REF(dimen_kind); @+break;
case TAG(kern_kind,b001): HTEG_REF(xdimen_kind); @+break;
case TAG(kern_kind,b101): HTEG_REF(xdimen_kind); @+break;
case TAG(ligature_kind,0): HTEG_REF(ligature_kind); @+break;
case TAG(disc_kind,0): HTEG_REF(disc_kind); @+break;
case TAG(glue_kind,0): HTEG_REF(glue_kind); @+break;
case TAG(language_kind,0): HTEG_REF(language_kind); @+break;
case TAG(rule_kind,0): HTEG_REF(rule_kind); @+break;
case TAG(image_kind,0): HTEG_REF(image_kind); @+break;
case TAG(leaders_kind,0): HTEG_REF(leaders_kind); @+break;
case TAG(baseline_kind,0): HTEG_REF(baseline_kind); @+break;
@
\subsection{Unknown Nodes}
@<skip functions@>=
static int hteg_unknown(Tag z)
{ int b, n;
int8_t s;
s = hnode_size[z];
DBG(DBGTAGS,"Trying unknown tag 0x%x at 0x%x\n",z,(uint32_t)(hpos-hstart-1));
if (s==0) return 0;
b=NODE_HEAD(s); n=NODE_TAIL(s);
DBG(DBGTAGS,"Trying unknown node size %d %d\n",b,n);
while (n>0) {
z=*(hpos-1);
if (KIND(z)==xdimen_kind) { Xdimen x; hteg_xdimen_node(&x); }
else if (KIND(z)==param_kind) { List l; @+hteg_param_list(&l); @+}
else if (KIND(z)<=list_kind) { List l; @+hteg_list(&l); @+}
else hteg_content_node();
n--; }
while (b>0) { z=HTEG8; b--;}
return 1;
}
@
\section{Code and Header Files}\index{code file}\index{header file}
\subsection{{\tt basetypes.h}}
To define basic types in a portable way, we create an include file.
The macro |_MSC_VER| (Microsoft Visual C Version)\index{Microsoft Visual C}
is defined only when this compiler is used.
\index{false+\\{false}}\index{true+\\{true}}\index{bool+\&{bool}}
@(basetypes.h@>=
#ifndef __BASETYPES_H__
#define __BASETYPES_H__
#include <stdlib.h>
#include <stdio.h>
#ifndef _STDLIB_H
#define _STDLIB_H
#endif
#ifdef _MSC_VER
#include <windows.h>
#define uint8_t UINT8
#define uint16_t UINT16
#define uint32_t UINT32
#define uint64_t UINT64
#define int8_t INT8
#define int16_t INT16
#define int32_t INT32
#define bool BOOL
#define true (0==0)
#define false (!true)
#define __SIZEOF_FLOAT__ 4
#define __SIZEOF_DOUBLE__ 8
#define PRIx64 "I64x"
#pragma @[warning( disable : @[4244@]@t @> @[4996@]@t @> @[4127@])@]
#else
#include <stdint.h>
#include <stdbool.h>
#include <inttypes.h>
#include <unistd.h>
#ifdef WIN32
#include <io.h>
#endif
#endif
typedef float float32_t;
typedef double float64_t;
#if __SIZEOF_FLOAT__!=4
#error @=float32 type must have size 4@>
#endif
#if __SIZEOF_DOUBLE__!=8
#error @=float64 type must have size 8@>
#endif
#define HINT_VERSION 2
#define HINT_MINOR_VERSION 2
#define AS_STR(X) #X
#define VERSION_AS_STR(X,Y) AS_STR(X) "." AS_STR(Y)
#define HINT_VERSION_STRING VERSION_AS_STR(HINT_VERSION, HINT_MINOR_VERSION)
#endif
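@ The two-level macro |VERSION_AS_STR| is needed because the stringizing
operator of the \CEE\ preprocessor does not expand its operand. The small
sketch below, with hypothetical |DEMO_| names that are not part of
{\tt basetypes.h}, shows the difference.

```c
#include <assert.h>
#include <string.h>

#define DEMO_VERSION 2
#define DEMO_MINOR_VERSION 2
#define DEMO_AS_STR(X) #X
#define DEMO_VERSION_AS_STR(X,Y) DEMO_AS_STR(X) "." DEMO_AS_STR(Y)

/* the arguments are macro-expanded before they reach the # operator */
const char *demo_version_string(void)
{ return DEMO_VERSION_AS_STR(DEMO_VERSION, DEMO_MINOR_VERSION);
}

/* applying # directly yields the name of the macro, not its value */
const char *demo_unexpanded(void)
{ return DEMO_AS_STR(DEMO_VERSION);
}
```

The first function returns the string with the two numbers separated by a
period; the second returns the spelled-out macro name.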
@
\subsection{{\tt format.h}}\index{format.h+{\tt format.h}}
The \.{format.h} file contains definitions of types, macros, variables and functions
that are needed in other compilation units.
@(format.h@>=
#ifndef _HFORMAT_H_
#define _HFORMAT_H_
@<debug macros@>@;
@<debug constants@>@;
@<hint macros@>@;
@<hint basic types@>@;
@<default names@>@;
extern const char *content_name[32];
extern const char *definition_name[32];
extern unsigned int debugflags;
extern FILE *hlog;
extern int max_fixed[32], max_default[32], max_ref[32], max_outline;
extern int32_t int_defaults[MAX_INT_DEFAULT+1];
extern Dimen dimen_defaults[MAX_DIMEN_DEFAULT+1];
extern Xdimen xdimen_defaults[MAX_XDIMEN_DEFAULT+1];
extern Glue glue_defaults[MAX_GLUE_DEFAULT+1];
extern Baseline baseline_defaults[MAX_BASELINE_DEFAULT+1];
extern Label label_defaults[MAX_LABEL_DEFAULT+1];
extern ColorSet color_defaults[MAX_COLOR_DEFAULT+1];
extern signed char hnode_size[0x100];
extern uint8_t content_known[32];
#endif
@
\subsection{{\tt tables.c}}\index{tables.c+{\tt tables.c}}\index{mktables.c+{\tt mktables.c}}
For maximum flexibility and efficiency, the file {\tt tables.c}
is generated by a \CEE\ program.
Here is the |main| program of {\tt mktables}:
@(mktables.c@>=
#include "basetypes.h"
#include "format.h"
@<skip macros@>@;
int max_fixed[32], max_default[32];
int32_t int_defaults[MAX_INT_DEFAULT+1]={0};
Dimen dimen_defaults[MAX_DIMEN_DEFAULT+1]={0};
Xdimen xdimen_defaults[MAX_XDIMEN_DEFAULT+1]={{0}};
Glue glue_defaults[MAX_GLUE_DEFAULT+1]={{{0}}};
Baseline baseline_defaults[MAX_BASELINE_DEFAULT+1]={{{{0}}}};
signed char hnode_size[0x100]={0};
uint8_t content_known[32]={0};
@<define |content_name| and |definition_name|@>@;
int main(void)
{ Kind k;
int i;
printf("#include \"basetypes.h\"\n"@/
"#include \"format.h\"\n\n");@/
@<print |content_name| and |definition_name|@>@;
printf("int max_outline=-1;\n\n");
@<take care of variables without defaults@>@;
@<define |int_defaults|@>@;
@<define |dimen_defaults|@>@;
@<define |glue_defaults|@>@;
@<define |xdimen_defaults|@>@;
@<define |baseline_defaults|@>@;
@<define page defaults@>@;
@<define stream defaults@>@;
@<define range defaults@>@;
@<define |label_defaults|@>@;
@<define |color_defaults|@>@;
@<print defaults@>@;
@<initialize the |hnode_size| array@>@;
@<print the |hnode_size| variable@>@;
@<print the |content_known| variable@>@;
return 0;
}
@
The following code prints the arrays |max_fixed|, |max_default|, and |max_ref|.
Note that |max_ref| starts out with the values of |max_default|.
@<print defaults@>=
printf("int max_fixed[32]= {");
for (k=0; k<32; k++)@/
{ printf("%d",max_fixed[k]);@+
if (k<31) printf(", ");@+
}
printf("};\n\n");@#
printf("int max_default[32]= {");
for (k=0; k<32; k++)@/
{ printf("%d",max_default[k]);@+
if (k<31) printf(", ");@+
}
printf("};\n\n");
printf("int max_ref[32]= {");
for (k=0; k<32; k++)@/
{ printf("%d",max_default[k]);@+
if (k<31) printf(", ");@+
}
printf("};\n\n");
@
\subsection{{\tt get.h}}\index{get.h+{\tt get.h}}
The \.{get.h} file contains function prototypes for all the functions
that read the short format.
@(get.h@>=
@<hint types@>@;
@<directory entry type@>@;
@<shared get macros@>@;
extern Entry *dir;
extern uint16_t section_no, max_section_no;
extern uint8_t *hpos, *hstart, *hend, *hpos0;
extern uint64_t hin_size, hin_time;
extern uint8_t *hin_addr;
extern Label *labels;
extern char *hin_name;
extern bool hget_map(void);
extern void hget_unmap(void);
extern void new_directory(uint32_t entries);
extern void hset_entry(Entry *e, uint16_t i, @|uint32_t size, uint32_t xsize, char *file_name);
extern void hget_banner(void);
extern void hget_section(uint16_t n);
extern void hget_entry(Entry *e);
extern void hget_directory(void);
extern void hclear_dir(void);
extern bool hcheck_banner(char *magic);
extern int max_range;
extern void hget_max_definitions(void);
extern uint32_t hget_utf8(void);
extern void hget_size_boundary(Info info);
extern uint32_t hget_list_size(Info info);
extern void hget_list(List *l);
extern float32_t hget_float32(void);
extern void hff_hpos(void);
extern uint32_t hff_list_pos, hff_list_size;
extern Tag hff_tag;
extern float32_t hteg_float32(void);
extern uint32_t hteg_list_size(Info info);
/* disabled: these two functions are declared |static| in the skip functions */
#if 0
extern void hteg_list(List *l);
extern void hteg_size_boundary(Info info);
#endif
@
\subsection{{\tt get.c}}\index{get.c+{\tt get.c}}
@(get.c@>=
#include "basetypes.h"
#include <string.h>
#include <math.h>
#include <zlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "error.h"
#include "format.h"
#include "get.h"
@<common variables@>@;
@<map functions@>@;
@<function to check the banner@>@;
@<directory functions@>@;
@<get file functions@>@;
@<shared get functions@>@;
@<shared skip functions@>@;
@
\subsection{{\tt put.h}}\index{put.h+{\tt put.h}}
The \.{put.h} file contains function prototypes for all the functions
that write the short format.
@(put.h@>=
@<put macros@>@;
@<hint macros@>@;
@<hint types@>@;
@<directory entry type@>@;
extern Entry *dir;
extern uint16_t section_no, max_section_no;
extern uint8_t *hpos, *hstart, *hend, *hpos0;
extern RangePos *range_pos;
extern int next_range, max_range;
extern int *page_on;
extern Label *labels;
extern int first_label;
extern int max_outline;
extern Outline *outlines;
extern FILE *hout;
extern void new_directory(uint32_t entries);
extern void new_output_buffers(void);
/* declarations for the parser */
extern void hput_definitions_start(void);
extern void hput_definitions_end(void);
extern void hput_content_start(void);
extern void hput_content_end(void);
extern void hset_label(int n,int w);
extern Tag hput_link(int n, int c, int on);
extern void hset_outline(int m, int r, int d, uint32_t p);
extern void hput_label_defs(void);
extern void hput_tags(uint32_t pos, Tag tag);
extern Tag hput_glyph(Glyph *g);
extern Tag hput_xdimen(Xdimen *x);
extern Tag hput_int(int32_t p);
extern Tag hput_language(uint8_t n);
extern Tag hput_rule(Rule *r);
extern Tag hput_glue(Glue *g);
extern Tag hput_list(uint32_t size_pos, List *y);
extern uint8_t hsize_bytes(uint32_t n);
extern void hput_txt_cc(uint32_t c);
extern void hput_txt_font(uint8_t f);
extern void hput_txt_global(Ref *d);
extern void hput_txt_local(uint8_t n);
extern Info hput_box_dimen(Dimen h, Dimen d, Dimen w);
extern Info hput_box_shift(Dimen a);
extern Info hput_box_glue_set(int8_t s, float32_t r, Order o);
extern void hput_stretch(Stretch *s);
extern Tag hput_kern(Kern *k);
extern void hput_utf8(uint32_t c);
extern Tag hput_ligature(Lig *l);
extern Tag hput_disc(Disc *h);
extern Info hput_span_count(uint32_t n);
extern void hextract_image_dimens(int n, double *a, Dimen *w, Dimen *h);
extern Info hput_image_spec(uint32_t n, float32_t a, uint32_t wr, Xdimen *w, uint32_t hr, Xdimen *h);
extern int colors_i;
extern ColorSet colors_0, colors_n;
extern void color_init(void);
extern void hput_color_def(uint32_t pos, int n);
extern void hput_string(char *str);
extern void hput_range(uint8_t pg, bool on);
extern void hput_max_definitions(void);
extern Tag hput_dimen(Dimen d);
extern Tag hput_font_head(uint8_t f, char *n, Dimen s,@| int m, uint16_t y);
extern void hput_range_defs(void);
extern void hput_xdimen_node(Xdimen *x);
extern void hput_directory(void);
extern size_t hput_hint(char * str);
extern void hput_list_size(uint32_t n, int i);
extern uint32_t hput_unknown_def(uint32_t t, uint32_t b, uint32_t n);
extern Tag hput_unknown(uint32_t pos,uint32_t t, uint32_t b, uint32_t n);
extern int hcompress_depth(int n, int c);
@
\subsection{{\tt put.c}}\label{writeshort}\index{put.c+{\tt put.c}}
\noindent
@(put.c@>=
#include "basetypes.h"
#include <string.h>
#include <ctype.h>
#include <math.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <zlib.h>
#include "error.h"
#include "format.h"
#include "put.h"
@<common variables@>@;
@<shared put variables@>@;
@<directory functions@>@;
@<function to write the banner@>@;
@<put functions@>@;
@
\subsection{{\tt lexer.l}}\index{lexer.l+{\tt lexer.l}}\index{scanning}
The definitions for lex are collected in the file {\tt lexer.l}.
@(lexer.l@>=
%{
#include "basetypes.h"
#include "error.h"
#include "format.h"
#include "put.h"
@<enable bison debugging@>@;
#include "parser.h"
@<scanning macros@>@;@+
@<scanning functions@>@;
int yywrap (void )@+{ return 1;@+}
#ifdef _MSC_VER
#pragma warning( disable : 4267)
#endif
%}
%option yylineno stack batch never-interactive
%option debug
%option nounistd nounput noinput noyy_top_state
@<scanning definitions@>@/
%%
@<scanning rules@>@/
::@=[a-z]+@> :< QUIT("Unexpected keyword '%s' in line %d",@|yytext,yylineno); >:
::@=.@> :< QUIT("Unexpected character '%c' (0x%02X) in line %d",@|yytext[0]>' '?yytext[0]:' ',yytext[0],yylineno); >:
%%
@
\subsection{{\tt parser.y}}\index{parser.y+{\tt parser.y}}\index{parsing}
The grammar rules for bison are collected in the file {\tt parser.y}.
% for the option %token-table use the command line parameter -k
@(parser.y@>=
%{
#include "basetypes.h"
#include <string.h>
#include <math.h>
#include "error.h"
#include "format.h"
#include "put.h"
extern char **hfont_name; /* in common variables */
@<definition checks@>@;
extern void hset_entry(Entry *e, uint16_t i, @|uint32_t size,
uint32_t xsize, char *file_name);
@<enable bison debugging@>@;
extern int yylex(void);
@<parsing functions@>@;
%}
@t{\label{union}\index{union}\index{parsing}}@>
%union {uint32_t u; @+ int32_t i; @+ char *s; @+ float64_t f; @+ Glyph c;
@+ Dimen @+d; Stretch st; @+ Xdimen xd; @+ Kern kt;
@+ Rule r; @+ Glue g; @+ @+ Image x;
@+ List l; @+ Box h; @+ Disc dc; @+ Lig lg;
@+ Ref rf; @+ Info info; @+ Order o; bool@+ b;
}
@t{}@>
%error_verbose
%start hint
@t@>
@<symbols@>@/
%%
@<parsing rules@>@;
%%
@
\subsection{{\tt shrink.c}}\index{shrink.c+{\tt shrink.c}}
\.{shrink} is a \CEE\ program translating a \HINT\ file in long format into a \HINT\ file in short format.
@(shrink.c@>=
#include "basetypes.h"
#include <math.h>
#include <string.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef WIN32
#include <direct.h>
#endif
#include <zlib.h>
#include "error.h"
#include "format.h"
#include "put.h"
@<enable bison debugging@>@;
#include "parser.h"
extern void yyset_debug(int lex_debug);
extern int yylineno;
extern FILE *yyin, *yyout;
extern int yyparse(void);
@<put macros@>@;
@<common variables@>@;
@<shared put variables@>@;
@<function to check the banner@>@;
@<directory functions@>@;
@<function to write the banner@>@;
@<put functions@>@;
#define SHRINK
#define DESCRIPTION "\nConvert a `long' ASCII HINT file into a `short' binary HINT file.\n"
int main(int argc, char *argv[])
{ @<local variables in |main|@>@;
in_ext=".hint";
out_ext=".hnt";
@<process the command line@>@;
if (debugflags&DBGFLEX) yyset_debug(1); else yyset_debug(0);
#if YYDEBUG
if (debugflags&DBGBISON) yydebug=1;
else yydebug=0;
#endif
@<open the log file@>@;
@<open the input file@>@;
@<open the output file@>@;
yyin=hin;
yyout=hlog;
@<read the banner@>@;
if (!hcheck_banner("HINT")) QUIT("Invalid banner");
yylineno++;
DBG(DBGBISON|DBGFLEX,"Parsing Input\n");
yyparse();
hput_directory();
@<rewrite the file names of optional sections@>@;
hput_hint("created by shrink");
@<close the output file@>@;
@<close the input file@>@;
@<close the log file@>@;
return 0;
}
@
\subsection{{\tt stretch.c}}\label{stretchmain}\index{stretch.c+{\tt stretch.c}}
\.{stretch} is a \CEE\ program translating a \HINT\ file in short
format into a \HINT\ file in long format.
@(stretch.c@>=
#include "basetypes.h"
#include <math.h>
#include <string.h>
#include <ctype.h>
#include <zlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef WIN32
#include <direct.h>
#endif
#include <fcntl.h>
#include "error.h"
#include "format.h"
#include "get.h"
@<get macros@>@;
@<write macros@>@;
@<common variables@>@;
@<shared put variables@>@;
@<map functions@>@;
@<function to check the banner@>@;
@<function to write the banner@>@;
@<directory functions@>@;
@<definition checks@>@;
@<get function declarations@>@;
@<write functions@>@;
@<get file functions@>@;
@<shared get functions@>@;
@<get functions@>@;
#define STRETCH
#define DESCRIPTION "\nConvert a `short' binary HINT file into a `long' ASCII HINT file.\n"
int main(int argc, char *argv[])
{ @<local variables in |main|@>@;
in_ext=".hnt";
out_ext=".hint";
@<process the command line@>@;
@<open the log file@>@;
@<open the output file@>@;
@<determine the |stem_name| from the output |file_name|@>@;
if (!hget_map()) QUIT("Unable to map the input file");
hpos=hstart=hin_addr;
hend=hstart+hin_size;
hget_banner();
if (!hcheck_banner("hint")) QUIT("Invalid banner");
hput_banner("HINT","created by stretch");
hget_directory();
hwrite_directory();
hget_definition_section();
hwrite_content_section();
hwrite_aux_files();
hget_unmap();
@<close the output file@>@;
DBG(DBGBASIC,"End of Program\n");
@<close the log file@>@;
return 0;
}
@
In the above program, the get functions call the write functions,
and the write functions call some get functions. This requires
function declarations to satisfy the declare-before-use requirement
of \CEE. Some of the necessary function declarations are already
contained in {\tt get.h}. The remaining declarations are these:
@<get function declarations@>=
extern void hget_xdimen_node(Xdimen *x);
extern void hget_def_node(void);
extern void hget_font_def(Info i, uint8_t f);
extern void hget_content_section(void);
extern Tag hget_content_node(void);
extern void hget_glue_node(void);
extern void hget_rule_node(void);
extern void hget_hbox_node(void);
extern void hget_vbox_node(void);
extern void hget_param_list(List *l);
extern int hget_txt(void);
extern int hget_unknown(Tag a);
@
\subsection{{\tt skip.c}}\label{skip}\index{skip.c+{\tt skip.c}}
\.{skip} is a \CEE\ program reading the content section of a \HINT\ file in short format
backwards.
@(skip.c@>=
#include "basetypes.h"
#include <math.h>
#include <string.h>
#include <ctype.h>
#include <zlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "error.h"
#include "format.h"
#if 1
#include "get.h"
#else
@<hint types@>@;
@<directory entry type@>@;
@<shared get macros@>@;
#endif
@<get macros@>@;
@<write macros@>@;
@<common variables@>@;
@<shared put variables@>@;
@<map functions@>@;
@<function to check the banner@>@;
@<directory functions@>@;
@<shared get macros@>@;
@<get file functions@>@;
@<skip macros@>@;
@<skip function declarations@>@;
@<shared skip functions@>@;
@<skip functions@>@;
@<definition checks@>@;
@<get function declarations@>@;
@<write functions@>@;
@<shared get functions@>@;
@<get functions@>@;
#define SKIP
#define DESCRIPTION "\nThis program tests parsing a binary HINT file in reverse direction.\n"
int main(int argc, char *argv[])
{ @<local variables in |main|@>@;
in_ext=".hnt";
out_ext=".bak";
@<process the command line@>@;
@<open the log file@>@;
hout=NULL;
if (!hget_map()) QUIT("Unable to map the input file");
hpos=hstart=hin_addr;
hend=hstart+hin_size;
hget_banner();
if (!hcheck_banner("hint")) QUIT("Invalid banner");
hget_directory();
hget_definition_section();
DBG(DBGBASIC,"Skipping Content Section\n");
hteg_content_section();
DBG(DBGBASIC,"Fast forward Content Section\n");
hpos=hstart;
while(hpos<hend)
{ hff_hpos();
if (KIND(*(hpos-1))==par_kind && KIND(hff_tag)==list_kind && hff_list_size>0 && !(INFO(hff_tag)&b100))
{ uint8_t *p=hpos,*q;
DBG(DBGTAGS,"Fast forward list at 0x%x, size %d\n",hff_list_pos,hff_list_size);
hpos=hstart+hff_list_pos;
q=hpos+hff_list_size;
while (hpos<q)
hff_hpos();
DBG(DBGTAGS,"Fast forward list end at 0x%x\n",(uint32_t)(hpos-hstart));
hpos=p;
DBG(DBGTAGS,"Continue at 0x%x\n",(uint32_t)(hpos-hstart-1));
}
}
hget_unmap();
@<close the log file@>@;
return 0;
}
@
As we have already seen in the {\tt stretch} program, a few
function declarations are necessary to satisfy the declare-before-use
requirement of \CEE.
@<skip function declarations@>=
static void hteg_content_node(void);
static void hteg_content(Tag z);
static void hteg_xdimen_node(Xdimen *x);
static void hteg_list(List *l);
static void hteg_param_list(List *l);
static void hteg_rule_node(void);
static void hteg_hbox_node(void);
static void hteg_vbox_node(void);
static void hteg_glue_node(void);
static int hteg_unknown(Tag z);
@
\thecodeindex
\crosssections
\plainsection{References}
{\baselineskip=11pt
\def\bfblrm{\small\rm}%
\def\bblem{\small\it}%
\bibliography{../hint}
\bibliographystyle{plain}
}
\plainsection{Index}
{
\def\_{{\tt \UL}} % underline in a string
\catcode`\_=\active \let_=\_ % underline is a letter
\input format.ind
}
\write\cont{} % ensure that the contents file isn't empty
% \write\cont{\catcode `\noexpand\@=12\relax} % \makeatother
\closeout\cont% the contents information has been fully gathered