http-analyze(1) Version 2.4 http-analyze(1)
NAME
http-analyze - a fast log analyzer for web servers
SYNOPSIS
http-analyze [-{hdmBVX}] [-3aefgnqvxyM] [-b bufsize] [-c cfgfile]
[-i newcfg] [-l libdir] [-o outdir] [-p prvdir] [-s subopt,...]
[-t num,...] [-u time] [-w hits] [-F logfmt] [-L lang] [-C chrset]
[-I date] [-E date] [-G suffix,...] [-H idxfile,...] [-O vname,...]
[-P prolog] [-R docroot] [-S srvname] [-T TLDfile] [-U srvurl]
[-W 3Dwin] [-Z showdom] [logfile[...]]
DESCRIPTION
http-analyze analyzes the logfile of a web server and creates a detailed
summary of the server's access load in graphical, tabular, and three-
dimensional form. The analyzer does this by
o reading all logfiles specified on the command line,
o saving all unique (different) URLs, hostnames, referrer URLs and
user agents,
o accounting for hits (successful requests), files sent, files
cached, data sent, etc.,
o and finally creating a statistics report for the period detected
in the logfile(s).
The resulting statistics report is a comprehensive view of the server's
logfile. The server writes a logfile entry for every response on behalf
of a request from a browser or a forwarding system such as proxy servers.
To understand the meaning of the terms in the report, you need a little
knowledge about the type of data your web server records in its logfile.
LOGFILE FORMATS
NCSA Common Logfile Format (CLF)
The basic logfile format supported by almost all servers is the NCSA
Common Logfile Format. It contains the following information for each
request (hit):
dns-name - auth-user [date] "clf-request" clf-status ct-length
where the fields have the following meaning:
dns-name       The IP number of the system accessing the web server. If
               there is an entry in the Domain Name System (DNS) for this
               IP number and the web server is configured to do DNS
               lookups, the corresponding hostname is logged instead.
-              Unused.
auth-user      The username provided by the client if authentication was
               required.
[date]         The date of the access in the format
               [DD/MMM/YYYY:HH:MM:SS +-ZZZZ].
Page 1 (printed 11/1/99)
"clf-request"  The request in the format "method URI proto", where method is
               one of GET, HEAD, POST, PUT, BROWSE, OPTIONS, DELETE or
               TRACE; URI is the Uniform Resource Identifier, and proto is
               the HTTP version number.
clf-status     The (numerical) response code from the server.
ct-length      This is either the size of the document or the amount of
               data actually sent over the wire.
The following is an example of an entry in NCSA Common Logfile Format:
car.4rent.de - - [01/Aug/1999:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
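As an illustration of the field layout described above, the following is a minimal Python sketch that splits one CLF entry into its fields. It is not the parser used by http-analyze; the regular expression, the function name parse_clf and the dictionary keys are assumptions made for this example only.

```python
import re
from datetime import datetime

# Pattern for one NCSA Common Logfile Format entry, following the field
# list above. The group names are illustrative, not taken from http-analyze.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<length>\d+|-)'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # [DD/MMM/YYYY:HH:MM:SS +-ZZZZ]; %b assumes English month abbreviations.
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Assumption: a server may log "-" when no data was sent.
    entry["length"] = 0 if entry["length"] == "-" else int(entry["length"])
    return entry

entry = parse_clf(
    'car.4rent.de - - [01/Aug/1999:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393'
)
```

For the sample entry, the sketch yields the hostname in entry["host"], the status code as an integer in entry["status"] and the size in entry["length"].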
W3C Extended Logfile Format (ELF)
The W3C Extended Logfile Format (ELF) is basically NCSA CLF plus user-
agent and referrer URL information. http-analyze supports two variants
of this extended format: DLF and ELF.
The DLF format adds the referrer URL and the user-agent in this order,
with or without surrounding double quotes:
CLF "referrer_URL" "user_agent"
CLF referrer_URL user_agent
This is an example of an entry in DLF format (wrapped on two lines for
readability):
car.4rent.de - - [01/Aug/1999:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
"http://inet-tv.net/hot.html" "Mozilla/4.05 (X11; I; IRIX64 6.4 IP30)"
The ELF format also adds the referrer URL and the user-agent, but in the
opposite order and without the double quotes:
CLF user_agent referrer_URL
This is an example of an entry in ELF format (wrapped on two lines for
readability):
car.4rent.de - - [01/Aug/1999:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
Mozilla/4.05 (X11; I; IRIX64 6.4 IP30) http://inet-tv.net/index.html
The ELF variant is the preferred method to pass referrer URL and user-
agent information. When this format is used, http-analyze searches
backwards for the protocol specification of the referrer URL (to be
precise, it looks for the colon in http:) and then for the preceding
blank. This ensures that broken referrer URLs which contain blanks or
double quotes are handled correctly.
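The backward search described above can be sketched in Python as follows. This is an interpretation of the description, not the code used by http-analyze: the function name split_elf_tail is hypothetical, and the sketch assumes that the last colon in the text following the CLF fields belongs to the protocol specification of the referrer URL.

```python
def split_elf_tail(tail):
    """Split the text following the CLF fields into (user_agent, referrer).

    Assumption: the last colon in the tail belongs to the protocol
    specification of the referrer URL (for example "http:").
    """
    tail = tail.strip()
    colon = tail.rfind(":")
    if colon == -1:
        # No protocol found, e.g. the referrer was logged as "-":
        # fall back to splitting at the last blank.
        blank = tail.rfind(" ")
    else:
        # Search backwards from the colon for the preceding blank.
        blank = tail.rfind(" ", 0, colon)
    if blank == -1:
        return "", tail
    return tail[:blank], tail[blank + 1:]

user_agent, referrer = split_elf_tail(
    "Mozilla/4.05 (X11; I; IRIX64 6.4 IP30) http://inet-tv.net/index.html"
)
```

Because the split point is found by searching backwards from the colon, blanks inside the user-agent and blanks that follow the protocol specification in a broken referrer URL do not change the result.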
To select either logfile format, edit the configuration file of your web
server and define the fields to be logged. See the web server's
documentation for information on how to customize logging.
Automatic detection of the logfile format
http-analyze tries to detect the correct logfile format automatically by
analyzing the first few entries of a logfile (this works only if your
server records a hyphen (`-') for empty referrer URL or user-agent
fields). If http-analyze detects referrer URL and user-agent
information, it assumes the ELF variant of the W3C Extended Logfile
Format. To process the DLF variant, specify the logfile format
explicitly using the option -F.
Logfile data used by http-analyze
The statistics report shows a summary of the information which has been
recorded into the logfile by the web server. For each logfile entry
http-analyze processes the origin (sitename) and date of the request, the
request method, the URL of the requested object, the server's response on
behalf of the request, the size of the requested object and optionally
the user-agent and the referrer URL if sent by the client.
Note that http-analyze does not recognize visitors, email addresses of
users visiting your server, the path a user took through your web site,
the last page visited by a user before leaving your site, or anything
else not recorded in the server's logfile. Although hostnames are
recorded for each request, they do not necessarily correspond to the
real system actually used by a visitor: the request could be forwarded
through a dialup service, for example. Furthermore, no request may get
logged by your server at all while someone is surfing through cached
copies of parts of your site, depending on the configuration of his/her
browser.
BASIC OPERATION
By default, http-analyze creates a full statistics report for a whole
month, which contains complete details for the period determined by the
timestamps of the first and last logfile entry processed. It is
therefore extremely important to always feed all logfiles for a whole
month into http-analyze, no matter how frequently you rotate (save) the
logfiles.
The recommended way of providing an up-to-date statistics report for a
web server is to have a script run http-analyze automatically on a
regular basis, say twice per day, and have it process the current logfile
of the web server from the beginning of the current month until today.
On the first day of a new month, the logfile should be saved elsewhere and
the web server should be restarted to create a new logfile for the new
month. Then run http-analyze on the old (saved) logfile to create a final
statistics report for the previous month. A history file is used to
produce a summary for the last 12 months on the main page of the
statistics report without having to analyze logfiles for those older
periods again.
If you rotate the logfiles more often to be able to compress them (for
example, once per day), you must uncompress and concatenate all separate
logfiles for the whole month into one chronologically ordered data
stream, which can then be processed by http-analyze.
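One way to build such a data stream is sketched below in Python. The sketch assumes gzip-compressed daily logfiles whose names sort chronologically (a hypothetical scheme such as access.19990801.gz, access.19990802.gz, and so on); neither the naming scheme nor the function name concatenate_logs comes from http-analyze.

```python
import gzip
import shutil
import sys
from pathlib import Path

def concatenate_logs(paths, out):
    """Write the uncompressed contents of each logfile to `out`, in order."""
    for path in paths:
        # Files ending in .gz are uncompressed on the fly, others copied as-is.
        opener = gzip.open if str(path).endswith(".gz") else open
        with opener(path, "rb") as src:
            shutil.copyfileobj(src, out)

if __name__ == "__main__":
    # Hypothetical naming scheme: the date in the filename makes a plain
    # sort by name equivalent to chronological order.
    logs = sorted(Path(".").glob("access.199908*.gz"))
    concatenate_logs(logs, sys.stdout.buffer)
```

The output can be redirected into a single file, which is then passed to http-analyze as the logfile argument.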
Full statistics report
For technical reasons, a full statistics report will not be created
before the second day of a new month, although the totals for the first
day of the new month on the summary main page of the report will be
updated. A full statistics report contains a detailed summary including
the following items (see the section Interpretation of the results for an
explanation of the terms):
o the number of hits, files sent/cached, pageviews, sessions and the
amount of data sent
o the total amount of data requested, transferred, and saved by
caching mechanisms
o the total number of unique URLs, sites, sessions, browser types and
referrer URLs
o the total number of all response codes other than Code 200 (OK)
o the total number of requests which required authentication
o the average load per week, day, hour, minute and second
o the Top 7 days, 24 hours, 5 minutes and 5 seconds
o the Top 30 most commonly accessed URLs (hits, files, pageviews,
sessions, data sent)
o the 10 least frequently accessed URLs (hits, files, pageviews,
sessions, data sent)
o the Top 30 client domains, browser types, and referrer hosts
o an overview and a detailed list of all files, sitenames, browser
types and referrer URLs
o a list of all Code 404 (Not Found) responses
Short statistics report
Since analyzing the complete logfile for a whole month increases
processing time on heavily accessed web servers, you can instruct
http-analyze to create a short statistics report for the current day only. In
this mode, http-analyze updates only the daily totals for the current
month in the Hits by Day section of the report and saves the results in a
history file. If the analyzer is then run a second time to update the
short statistics report, it skips all logfile entries from the beginning
of the month until it detects any entries for the current day, which are
then processed to produce an up-to-date Hits by Day section in the
statistics report.
In short statistics mode, http-analyze needs only a fraction of the
processing time required for a full statistics report, but it updates
only a very small part of the statistics report, so this should be
considered an additional feature rather than a replacement for the full
statistics mode. The recommended way of using this feature is to have
http-analyze generate a full statistics report once per day or week,
while generating an up-to-date short statistics report as often as once
per hour or day.
USER INTERFACES
Two user interfaces exist for access to the statistics report: a
conventional interface suitable for any browser and a frames-based
interface which requires JavaScript.
The conventional interface
The conventional interface appears as in version 1.9 if JavaScript is
disabled in your browser or the option -g was specified at invocation of
http-analyze. If JavaScript is enabled, the following separate windows
are used for different parts of the report to allow for easy navigation:
The Main window
This window is used for most parts of the report such as the yearly,
monthly, daily and weekly summaries, the Top N lists and the
overviews. Hotlinks in the Top N lists most often point to the
corresponding page, which is then displayed in the Viewer window if
the link is followed, while hotlinks in the overviews point to the
detailed lists, which will show up in the List window.
The Navigation window
If JavaScript is enabled in your browser and a summary for a year or
a month is loaded into the main window, a small window containing a
navigation panel will pop up. If JavaScript is disabled, the
navigation links appear at the bottom of the monthly summary pages.
In the latter case, use the Back button of your browser for
navigation.
The List window
This window is used for the detailed lists of URLs, sites, browser
types and referrer URLs. A separate window for those (often large)
lists causes them to be loaded only once if you follow any link in
the Main window while the List window is still open.
The Viewer window
This window is used for external pages which are loaded by following
the hotlinks in the statistics report. This way, you can visit the
pages referred to in the report without having to go back and forth
between the report itself and the pages listed there.
The 3D window
This window is used for the 3D (VRML) model of the statistics. If
you have JavaScript enabled, the window's size will be set to the
smallest possible size so that the 3D model fits onto the screen or
to the dimensions specified with the 3DWinSize directive.
The frames-based interface
The frames-based interface requires a JavaScript-enabled browser. It
contains the following frames and windows:
The Navigation frame
This frame contains navigation buttons and text. You can specify
its width using the NavigFrame directive in the configuration file.
The Main frame
This frame is used for most parts of the report such as the yearly,
monthly, daily and weekly summaries, the Top N lists and the
overviews. Hotlinks in the Top N lists most often point to the
corresponding page, which is displayed in the Viewer window if the
link is followed, while hotlinks in the overviews point to the
detailed lists, which show up in the List window.
The List window
This window is used for the detailed lists of URLs, sites, browser
types and referrer URLs. A separate window for those (often large)
lists causes them to be loaded only once if the links in the Main
window are followed and the List window is still open.
The Viewer window
This (separate) window is used for external pages which are loaded
by following the hotlinks in the statistics report. This way, you
can visit the pages referred to in the report without having to go
back and forth between the report and the pages listed there.
The 3D window
This window is used for the 3D (VRML) model of the statistics.
Depending on the setting of the 3DWindow directive in the
configuration file, this is either a separate window (external) or a
new frame (internal) inside the Main frame (actually, two frames are
created which replace the former Main frame when the 3D model is
being displayed). In case of a separate (external) 3D window, you
can specify its dimensions using the 3DWinSize directive.
The 3D model
The 3D model requires a VRML 2.0 plug-in such as CosmoPlayer from Cosmo
Software (http://cosmosoftware.com/). Using this plug-in, which is
available for Silicon Graphics, Windows and Macintosh platforms, you can
"walk" or "fly" through the model and view the scene from all sides.
If you look at the models, don't forget to touch the buddha appearing in
our 3D logo on top of the statistics report in the yearly summary pages!
The 3D model contains two scenes (models): one shows the hits, files,
cached files, sites and the amount of data sent by day, and the other
shows the server's access load by weekday and hour. To view the second
scene, click on the scene switch at the top right of the model. To
navigate through the 3D space, use the pre-defined Viewpoints (camera
positions) and CosmoPlayer's Navigation panel. For customization, use the
CosmoPlayer pop-up menu.
The 3D representation of hits by weekday and hour in the second scene
allows easy identification of the times at which your server was busiest
serving requests.
INTERPRETATION OF THE RESULTS
http-analyze creates a summary of the information found in the server's
logfile. The analyzer counts the requests, saves the unique URLs,
sitenames, browser types and referrer URLs and creates a comprehensive
statistics report. The following terms are used in this report:
Hits
(color: green) A hit is any response from the web server on behalf
of a request sent by a browser, such as text (HTML) files, images,
applets, audio/movie clips and even error messages. For example, if
a page is requested which contains two inline images, the server
would generate three hits: one hit for the HTML page itself and two
hits for the images. If an invalid URL is requested, the server
would respond with a Code 404 (Not Found) status code, which is also
a response accounted for as a hit.
Files
(color: blue) If the server sends back a file for a request, this
is accounted for as a Code 200 (OK) response. Such a response is
classified as a file sent. Again, file here means any kind of
file, no matter whether it contains text (HTML documents) or binary
data (images, applets, movies, etc.). Note that if you configured
the web server to log only accesses to HTML files, but not
images or any other binary data, the number of files would correspond
directly to the number of documents served.
Cached
(color: yellow) A cached file is a Code 304 (Not Modified) response.
This response is generated by the server if a document hasn't
changed since the last time it was transferred to the site
requesting it. If the browser has access to a local copy of a
document requested by a user, either through its local disk cache
or through a caching server, it sends out a conditional request,
which asks for the document to be sent only if it has been changed
since it was last requested. If the document hasn't been
changed since then, the server sends back a Code 304 response to
inform the browser that it can use its local copy.
While this caching mechanism can significantly reduce network
traffic, it causes an inaccuracy in the statistics report regarding
the number of times a file is requested, for two reasons:
First, the browser can be configured to send conditional requests
every time, once per session or never if a cached file is requested.
Second, online services, ISPs, companies and many other
organizations use so-called caching servers or proxies, which themselves
fulfill requests if the file is found in the cache. Since proxies
can serve hundreds to thousands of users, requests from certain
sites could be caused by thousands of users requesting a cached file
or by just one person with his/her browser configured to not cache
anything at all.
The ratio between files sent and cached files therefore reflects the
efficiency of caching mechanisms - but only for those requests which
were handled by your web server.
Pageviews
(color: magenta) The pageview mechanism can be used to separate
requests for text or HTML files from all other types of requests.
If a filename pattern has been defined, http-analyze classifies all
URLs matching this pattern as pageviews (text files), which allows
you to estimate the number of "real" text documents transmitted by
your web server. Filename patterns may be defined using the option
-G or the PageView directive in the configuration file. The suffix
.html is already pre-defined.
KBytes transferred
(color: orange) This is the amount of data sent during the whole
summary period as reported by the server. Note that some servers
record the size of a document instead of the actual number of bytes
transferred. In most cases the two are the same, but if a user
interrupts the transmission by pressing the browser's stop button
before the page has been received completely, some servers (for
example all Netscape web servers) log the size of the file instead
of the amount of data actually transmitted.
KBytes requested
This is the amount of data requested during the whole summary
period. http-analyze computes this number by summing up the values
of KBytes transferred and KBytes saved by cache (see below).
KBytes saved by cache
The amount of data saved by various caching mechanisms. This value
is computed by multiplying the number of cached file (Code 304)
responses by the size of the corresponding file. Because
http-analyze can determine the size of a file only if the file has been
transmitted at least once in the same summary period, the values for
KBytes saved by cache and KBytes requested are just approximations
of the real values.
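The computation described above can be sketched roughly as follows. This is only an illustration, not the analyzer's actual code: the tuple layout (url, status, bytes) and the function name cache_savings are assumptions, and the sketch works in bytes rather than KBytes.

```python
from collections import Counter

def cache_savings(entries):
    """entries: iterable of (url, status, bytes_sent) for one summary period.

    Hypothetical record layout, chosen for illustration only.
    """
    known_size = {}          # url -> size seen in a Code 200 response
    cached_hits = Counter()  # url -> number of Code 304 responses
    transferred = 0
    for url, status, nbytes in entries:
        if status == 200:
            known_size[url] = nbytes
            transferred += nbytes
        elif status == 304:
            cached_hits[url] += 1
    # A 304 for a file never sent in this period adds nothing, because its
    # size is unknown. This is why the result is only an approximation.
    saved = sum(n * known_size.get(url, 0) for url, n in cached_hits.items())
    requested = transferred + saved
    return transferred, saved, requested

log = [
    ("/index.html", 200, 4096),
    ("/index.html", 304, 0),
    ("/index.html", 304, 0),
    ("/logo.gif", 304, 0),   # size unknown: never sent in this period
]
print(cache_savings(log))  # (4096, 8192, 12288)
```

Note how the Code 304 response for /logo.gif contributes nothing to the saved amount, although a real transfer was avoided.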
Unique URLs
The total number of unique URLs is the number of different URLs
(files) on your web server which have been requested at least once
in the corresponding summary period.
Referrer URLs
If a user follows a link to your web site and his/her browser sends
the URL of the page containing the link to the server, this URL is
logged as the referrer URL (the location referring to your
document). Note that the browser does not necessarily send a
referrer URL and even if it does, a proxy server may alter or delete
it before forwarding the request to a web server. Such requests
appear under Unknown in the referrer URL list.
Self-referrer URLs
As soon as the browser detects any inline objects (images, applets,
etc.) in a page just loaded, it sends out separate requests for
those objects. If the objects reside on the same server as the page
referring to them, the corresponding referrer URLs contain the URL
of the page on your server. Such referrer URLs are called self-referrer
URLs. If configured correctly, http-analyze separates all
self-referrer URLs from the rest of the referrer URLs in the report.
This makes it possible to separate accesses which were caused by
inline objects in a text page from the remaining (external)
accesses.
Unique sites
This is the number of all different hostnames or IP addresses found
in the logfile. Each different hostname is counted only once per
period, so this number shows how many systems sent requests to
your server.
Sessions
(color: red) Similar to unique sites, this is the number of
different hostnames or IP addresses accessing the server during a
certain time-window, which defaults to one day for backward
compatibility. Accesses from a known hostname outside this
time-window are accounted for as a new session. You can increase or
decrease the time-window for sessions using the option -u or the
Session directive in the configuration file. For example, if you
set the time-window to 2 hours, all accesses from the same host in
less than 2 hours are accounted for as the same session, while any
access more than 2 hours apart from the first one is accounted for
as a new session.
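A minimal sketch of this counting rule, assuming the accesses are already sorted by time. The function name count_sessions and the (host, timestamp) layout are illustrative assumptions. The text does not say how an access exactly on the window boundary is handled; the sketch treats it as part of the same session.

```python
from datetime import datetime, timedelta

def count_sessions(accesses, window=timedelta(days=1)):
    """accesses: iterable of (host, datetime) pairs, sorted by time."""
    session_start = {}  # host -> time of the first access of its current session
    sessions = 0
    for host, when in accesses:
        start = session_start.get(host)
        # A new session begins on the first access from a host, or when the
        # access lies more than one window after the start of the session.
        if start is None or when - start > window:
            session_start[host] = when
            sessions += 1
    return sessions

t0 = datetime(1999, 11, 1, 10, 0)
accesses = [
    ("a.example.com", t0),
    ("b.example.com", t0 + timedelta(minutes=30)),
    ("a.example.com", t0 + timedelta(minutes=90)),   # same session as t0
    ("a.example.com", t0 + timedelta(minutes=150)),  # more than 2 hours after t0
    ("a.example.com", t0 + timedelta(minutes=200)),  # within the new session
]
print(count_sessions(accesses, timedelta(hours=2)))  # 3
print(count_sessions(accesses))                      # 2 (one-day default)
```

The same five accesses yield three sessions with a 2-hour window but only two with the one-day default, which shows how strongly this number depends on the chosen time-window.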
Request Method
The browser uses a certain method to request a document from a web
server. For example, documents, images, applets, etc. are usually
requested using the GET method. Other often used methods are the
HEAD method to request more information about a document such as its
size without having the server send its actual content, and the POST
method, a special way to transfer user input from forms into CGI
scripts.
Although all logfile entries with a valid request method are
accounted for as hits, only URLs requested using either the GET or
the POST method are processed further. The remaining hits are
summarized under Request Methods other than GET/POST.
Response Codes
In reply to a request from a browser, the server sends back a status
code such as a Code 200 (OK) or Code 404 (Not Found) response.
Similar to the request methods, the analyzer accounts for any valid
response code as a hit, but it only processes those URLs which
caused a Code 200 (OK), Code 304 (Not Modified), or Code 404 (Not
Found) response from the server. All other responses are summarized
in the monthly summary page under Other Response Codes. See the
current HTTP specification at http://www.w3.org/ for information
about all valid response codes and their meaning. http-analyze
recognizes HTTP/1.1 responses according to RFC2616.
Unresolved
A system identifies itself to a web server using an IP number.
Depending on the configuration, the web server might perform a DNS
lookup to resolve the IP number into a hostname. If no hostname has
been assigned to this IP number, only the IP number is logged. Such
requests are accounted for under Unresolved in the country list of
the statistics report. Since some systems intentionally have no
hostname, a percentage of up to 35% for unresolved IP numbers is
absolutely normal.
If the country list shows only 100% unresolved IP numbers, either
enable the DNS lookup in your web server or have a DNS resolver
utility preprocess the logfile before feeding the data into
http-analyze. For our Commercial Service Licensees, we offer a fast DNS
resolver utility with negative caching and a history mechanism.
Visit the support site at http://support.netstore.de/ for more
information.
What the report does NOT show ...
Due to the nature of the HTTP protocol used for communication between the
browser and the server and due to the type of information available in
the server's logfile, the analyzer can not:
o identify a person as a visitor of your server,
o count the number of visitors of your server,
o find out the email address of a visitor,
o track the path a visitor takes through your site,
o measure the time a visitor sees a page of your server,
o determine the last page someone saw before leaving your site,
o inform you about the sudden death of the visitor while looking at
your homepage,
o nor show any other information not recorded in the server's
logfile.
Even if you classify certain URLs as pageviews or use a specific
time-window to count sessions, this in no way tells you anything about the
number of real visitors of your server.
However, if you use an appropriate server structure with files grouped by
their content or if you use the HideURL directive to group unstructured
files together, the statistics report does show you at least a trend or a
tendency. Following the numbers for some time, you soon get a feeling for
which documents are most interesting for the visitors of your site.
OUTPUT FILES
A statistics report is created in the current directory or in the output
directory specified at invocation of http-analyze. All output files are
placed into separate subdirectories to reduce the number of directory
entries per report. Those subdirectories are named wwwYYYY, where YYYY
is the year of the period covered by the report.
The analyzer can be instructed to place files with "private" data such
as overviews and detailed lists of files, hosts, browser types, and
referrer URLs in a separate ("private") subdirectory. The web server
can then be configured to request authentication for access to files in
this directory. See also the option -p and the PrivateDir directive in
the configuration file.
Note: for protection of the whole report, you would configure your web
server to request authentication for any file in the statistics output
directory. A separate private area is needed only if you want to secure
certain lists while granting access to the rest of the statistics report.
The following list shows all output files of a full statistics report in
a wwwYYYY directory:
index.html
is the main page for the year and contains the total numbers of
hits, files sent, cached files, pageviews, sessions and data sent
per month in tabular and graphical form for the last 12 months. At
the end of the year, this file contains the values for the whole
year, while the values for the last 12 months are then continued
in the index file for the new year. This page is displayed in the
Main window.
statsMMYY.html and totalsMMYY.html
contain the total summary for the month MM of year YY in tabular
form. The file totalsMMYY.html is the frames version of the report
in statsMMYY.html. In the conventional interface, this page is
displayed in the Main window.
jsnav.html and navMMYY.html
navigation panels for JavaScript-enabled browsers, shown in the
Navigation window.
daysMMYY.html
contains the numbers of hits, files sent, cached files, pageviews,
sessions and data sent per day for the month MM of year YY. This
report is displayed in the Main window.
avloadMMYY.html
contains a graphical representation of the average hits per
weekday/hour and the top seconds, minutes, hours, and days of the
period. This list appears in the Main window.
countryMMYY.html
contains the list of all countries the visitors of your web server
came from. This information is determined by analyzing the
top-level domain (TLD) of the hostname assigned to a system in the
Domain Name System (DNS). The country report is displayed in the
Main window.
Note that the country list is meaningful only for hostnames with ISO
two-letter domains. All other domains (.com, .org, .net, etc.) are
used by organizations world-wide, so they are not assigned a
country, but listed literally in the charts. The ISO country code
for the U.S. is .us, by the way, not .com ...
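The classification described above might look like the following sketch. The function name classify_host and the returned labels are illustrative assumptions, and the test for a country domain (two letters) is a simplification of a lookup in the real TLD list.

```python
import ipaddress

def classify_host(host):
    """Return a rough country-list bucket for a logged hostname."""
    try:
        ipaddress.ip_address(host)
        return "Unresolved"       # only an IP number was logged
    except ValueError:
        pass                      # not an IP number, so treat it as a hostname
    tld = host.rsplit(".", 1)[-1].lower()
    if len(tld) == 2 and tld.isalpha():
        return f"country .{tld}"  # ISO two-letter domain
    return f"domain .{tld}"       # listed literally, no country assigned

print(classify_host("www.example.de"))   # country .de
print(classify_host("www.example.com"))  # domain .com
print(classify_host("192.0.2.7"))        # Unresolved
```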
3DstatsMMYY.html, 3DstatsMMYY.wrl.gz, 3DstatsYYYY.html, 3DstatsYYYY.wrl.gz
are pre-requisite files for the 3D statistics models in the Virtual
Reality Modeling Language (VRML). Those models are created if the
option -3 is given at invocation of http-analyze. To view those
models, you need a VRML2.0 compatible plug-in such as the free
CosmoPlayer from Cosmo Software, which is currently available for
Silicon Graphics, Windows and Macintosh systems. See
http://cosmosoftware.com/ for more information. All 3D models are
displayed in the 3D window, so that you can compare them with the
graphs in the conventional report.
topurlMMYY.html, topdomMMYY.html, topuagMMYY.html, toprefMMYY.html
These files contain the Top Ten lists (actually it's Top N, where N
is a configurable number) of the files, sites, browser types and
referrer URLs. The URLs shown in topurlMMYY.html are either the
real URLs requested by the visitor or an item (arbitrary text) you
chose to collect certain file names under (see the HideURL
directive in the configuration file).
The domain names shown in topdomMMYY.html are either the
second-level domains of the hosts accessing your server if the DNS name is
available or an item you chose to collect certain hostnames under
(see the HideSys directive in the configuration file). Unresolved IP
numbers show up as Unresolved.
The file topuagMMYY.html contains a list of all different user
agents which have been used by visitors to access your web site.
The user agent information is an identification sent by the browser
and logged by the web server. Although the format for this
identification is well-defined, it isn't obeyed by any browser. If
possible, http-analyze reduces the name of the user agent in the Top
lists to the browser type including the first digit of its version
number. If it is not possible to isolate the browser type from the
user agent, the full identification string as sent by the browser is
stored.
A referrer URL is the URL of the page containing a link to your web
site, which has been followed by someone to reach your site. Note
that for manually entered URLs no referrer URL gets logged. Also,
some browsers do not send a referrer URL or send a faked one.
Entries without a referrer URL are collected under Unknown in the
referrer list. The list of referrer URLs is displayed in the Main
window.
filesMMYY.html, sitesMMYY.html, agentsMMYY.html, refersMMYY.html
Those files contain a complete overview of the files, sites, browser
types and referrer URLs, similar to the Top N lists.
lfilesMMYY.html, lsitesMMYY.html, lagentsMMYY.html, lrefersMMYY.html
Those files contain the detailed lists of all files, sites, browser
types and referrer URLs, similar to the previous lists, but sorted
by item (if any) and hits. On frequently accessed sites those lists
can become rather large, so they are shown in the separate List
window.
rfilesMMYY.html
contains all invalid URLs which caused the server to respond with a
Code 404 (Not Found) status. If there is a large number of hits for
certain files the server couldn't find, it's probably due to missing
inline images or other HTML objects embedded in other pages. This
report is displayed in the Main window.
rsitesMMYY.html
contains the list of reverse domains. This report is displayed in
the Main window.
frames.html, header.html
These two files are required for the frames-based user interface.
All other files are shared with the ones for the non-frames UI. In
the frames-based UI, the Main window is inside the frame, while the
List window is still an external window. The 3D window may be
inside the frame or an external window (see the 3DWindow directive).
gr-icon.png
This small icon showing the graph from the main page is displayed on
the main page under the base directory for each statistics report.
OPTIONS
-h print a short help list explaining the meaning of the options. Use
-hh to print an even more detailed help.
-d (daily mode) generate a short statistics report for the current day
only. If a history file exists, the values for the previous days
will be read from this history file and the corresponding logfile
entries are skipped. If no history exists, the whole logfile will be
processed and a history file will be created (unless -n is also
given).
-m (monthly mode) generate a full statistics report for a whole month.
In this mode, the values from the history file are used only to
create a summary page for the last 12 months. The timestamps from
the logfile entries fed into http-analyze always take precedence
over any records in the history unless the option -e is specified.
-B create buttons only and exit. The analyzer copies or links the
required files and buttons from the central directory HA_LIBDIR into
the output directory specified by -o.
-V (version) print the version number of http-analyze and exit.
-X print the URL to file a bug report. Use command substitution or cut
& paste to pass this URL to your favourite browser, complete the
form and submit it.
-3 create a VRML2.0-compliant 3D model of the statistics in addition to
the regular statistics report. You need a VRML2.0 compliant plug-in
such as CosmoPlayer from Cosmo Software to view the resulting model.
-a ignore all requests for URLs which required authentication. If your
statistics report is publicly available, you probably do not want to
have "secret URLs" listed in the report. See also the AuthURL
directive in the configuration file.
-e use the history file even in full statistics (-m) mode. If this
option is specified and you analyze the logfiles for several months
in one run, http-analyze uses the results recorded in the history
file for previous months and skips all logfile entries up to the
first day of a new month not recorded in the history (usually the
current month). This option is useful if you rotate your logfile
once per quarter and want http-analyze to skip all entries for
previous months which have been processed already.
-f create an additional frames-based user interface for the statistics
report. This interface requires JavaScript.
-g (generic interface) create a conventional (non-frames) user
interface for the statistics report without the optional
JavaScript-based navigation window. By default, the conventional
interface includes JavaScript enhancements for window control, which
only become active if the user has enabled JavaScript in his/her
browser. Use this option only to completely disable JavaScript
enhancements in the report even if the user has enabled JavaScript
in the browser.
-n (no update) do not update the history file. Since the history is
used in the statistics report to create the main summary page with
the results of the last 12 months, this option must be used to avoid
messing up the statistics report when analyzing logfiles for previous
months (before the last one).
-q do not strip arguments to CGI scripts. By default, http-analyze
strips arguments from CGI URLs to be able to lump them together. If
your server creates dynamic HTML files through a CGI script, they
are reduced to the URL of the script. If -q is specified, those
argument lists are left intact and CGI URLs with different arguments
are treated as different URLs. Note that this only works for
requests to scripts which receive their arguments using the GET
method, not the POST method. See the section Interpretation of the
results for an explanation of the request methods and the StripCGI
directive.
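The effect of -q on the hit counts can be illustrated with a small sketch. The helper name normalize_cgi_url is an assumption, as is the idea that arguments are everything after the first '?' in the URL.

```python
from collections import Counter

def normalize_cgi_url(url, keep_args=False):
    """Strip the argument list from a URL unless keep_args is set (as with -q)."""
    if keep_args:
        return url
    return url.split("?", 1)[0]

requests = [
    "/cgi-bin/search?q=apples",
    "/cgi-bin/search?q=pears",
    "/index.html",
]

# Default behaviour: both search requests are lumped under the script URL.
default = Counter(normalize_cgi_url(u) for u in requests)
# With arguments kept, each argument list counts as a different URL.
with_q = Counter(normalize_cgi_url(u, keep_args=True) for u in requests)

print(default["/cgi-bin/search"])  # 2
print(len(with_q))                 # 3
```

Arguments sent with the POST method travel in the request body and never appear in the logged URL, which is why they cannot be distinguished either way.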
-v (verbose) comment ongoing processing. Warnings are printed only in
verbose mode. Use this option to see how http-analyze processes the
logfile. If -v is doubled, a dot is printed for each new day in the
logfile.
-x list each image URL literally rather than lumping them together
under the item All images. Without this option, http-analyze
collects all requests for images (*.gif, *.png, *.jpg, *.ief, *.pcd,
*.rgb, *.xbm, *.xpm, *.xwd, *.tif) under the item All images to
avoid cluttering up the lists with lots of image URLs. If -x is
given, each image URL is listed literally unless matched by an
explicit HideURL directive in the configuration file.
-M MS IIS-Mode: use case-insensitive matching for URLs. This violates
the standard, but is necessary for logfiles produced by IIS servers
to correctly identify unique URLs.
-b bufsize
defines the size of the I/O buffer for reading the logfile (default:
64KB). Usually, the best size for I/O buffers is reported on a
per-file basis by the operating system, but some operating systems
report the logical blocksize instead. If http-analyze -v reports a "Best
buffer size for I/O" less than or equal to 8KB, you should specify
a size of 16KB for pipes and up to 64KB for disk files to increase
the processing speed.
-c cfgfile
use cfgfile as the configuration file. A configuration file allows
you to define the behaviour of http-analyze and to define the "look
& feel" of the statistics report. See the section Configuration
File for a description of possible settings, which are called
directives in the following text.
-i newcfg
create a new configuration file named newcfg. If an old
configuration file was also specified using the -c option, older
settings are retained in the new file. Any command line options
take precedence over old configuration file entries and will be
transformed into the corresponding directive if possible. For
example, specifying the output directory using the option -o outdir
will produce an entry OutputDir outdir in the new configuration
file.
-l libdir
use libdir as the central library directory where http-analyze looks
for the pre-requisite files, buttons, and license information
(usually /usr/local/lib/http-analyze). This location can also be
specified using the environment variable HA_LIBDIR.
-o outdir
use outdir instead of the current directory as the output directory
for the statistics report. http-analyze checks automatically for
the required files and buttons in outdir. If they are missing or
out of date, the analyzer copies them from HA_LIBDIR into the output
directory. See also the OutputDir and the BtnSymlink directives.
-p prvdir
defines the name of a "private" directory for the detailed lists
of files, sites, browsers and referrer URLs. Because prvdir must
reside directly under the output directory, its name may not contain
any slashes ('/'). A private directory for detailed lists may be
useful to restrict access to those lists if the rest of the
statistics report is publicly available. Note that for restricting
access to the complete statistics report, you do not need to place
the detailed lists in a private directory. See also the PrivateDir
directive.
-s subopt,...
suppress certain lists in the report. See also the Suppress
directive. subopt may be:
AVLoad     to suppress the average load report (top seconds/minutes/hours),
URLs       to suppress the overview and list of URLs/items,
URLList    to suppress the list of URLs/items only,
Code404    to suppress the list of Code 404 (Not Found) responses,
Sites      to suppress the overview and list of client domains,
RSites     to suppress the overview of reverse client domains,
SiteList   to suppress the list of all client domains/hostnames,
Agents     to suppress the overview and list of browser types,
Referrer   to suppress the overview and list of referrer URLs,
Country    to suppress the list of countries,
Pageviews  to suppress pageview rating (cached files are shown instead),
AuthReq    to suppress requests which required authentication,
Graphics   to suppress images such as graphs and pie charts,
Hotlinks   to suppress hotlinks in the list of all URLs,
Interpol   to suppress interpolation of values in graphs.
-t num
defines the size of certain lists. num is either a positive number
or the value 0 to suppress the corresponding list. You specify the
list by appending one of the following characters to the number
shown here as '#' (note that the characters are case-sensitive):
#U  # is the number of entries in the Top URL list (default: 30),
#L  # is the number of entries in the Least URL list (default: 10),
#S  # is the number of entries in the Top domain list (default: 30),
#A  # is the number of entries in the Top agent/browser list (default: 30),
#R  # is the number of entries in the Top referrer URL list (default: 30),
#d  # is the number of entries in the Top days table (default: 7),
#h  # is the number of entries in the Top hours table (default: 24),
#m  # is the number of entries in the Top minutes table (default: 5),
#s  # is the number of entries in the Top seconds table (default: 5),
#N  # is the size of the navigation frame (default: 120 pixels).
You can specify more than one num with a single -t option by
separating them with commas as in -t 20U,0L,20S. See also the Top*
directives in the configuration file.
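To make the argument syntax concrete, the following sketch parses a -t value into list sizes. The function name parse_top_spec is hypothetical and the parser only mirrors the syntax and defaults described above; it is not how http-analyze itself is implemented.

```python
import re

# Defaults as listed above; the keys are the case-sensitive list characters.
DEFAULTS = {"U": 30, "L": 10, "S": 30, "A": 30, "R": 30,
            "d": 7, "h": 24, "m": 5, "s": 5, "N": 120}

def parse_top_spec(spec):
    """Parse a value such as '20U,0L,20S' into a dict of list sizes."""
    sizes = dict(DEFAULTS)
    for item in spec.split(","):
        match = re.fullmatch(r"(\d+)([ULSARdhmsN])", item.strip())
        if match is None:
            raise ValueError(f"invalid list size: {item!r}")
        sizes[match.group(2)] = int(match.group(1))  # 0 suppresses the list
    return sizes

sizes = parse_top_spec("20U,0L,20S")
print(sizes["U"], sizes["L"], sizes["S"], sizes["A"])  # 20 0 20 30
```

Because the characters are case-sensitive, 5S sets the Top domain list while 5s sets the Top seconds table.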
-u time
defines the time-window for counting sessions. See Sessions in the
section Interpretation of the results for an explanation of this
term.
-w hits
sets the noise-level to hits. If a noise-level is defined, all
URLs, sites, agents and referrer URLs with hits below this level are
collected under the item Noise in the Top N lists and overviews to
avoid cluttering up those lists. See also the NoiseLevel directive.
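The noise-level mechanism amounts to a simple threshold, sketched below. The function name apply_noise_level and the dict layout are assumptions made for illustration.

```python
def apply_noise_level(hits_by_item, noise_level):
    """Collect all items with hits below noise_level under the item 'Noise'."""
    kept = {}
    noise = 0
    for item, hits in hits_by_item.items():
        if hits < noise_level:
            noise += hits          # lumped together, the item name is dropped
        else:
            kept[item] = hits
    if noise:
        kept["Noise"] = noise
    return kept

hits = {"/index.html": 100, "/old/a.html": 3, "/old/b.html": 1}
print(apply_noise_level(hits, 5))  # {'/index.html': 100, 'Noise': 4}
```

The total number of hits is preserved; only the rarely requested items lose their individual entries.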
-I date
skip all logfile entries until this day (exclusive). The date may
be specified as DD/MM/YYYY or MM/YYYY, where MM is the number or
the name of a month. Note that in full statistics mode, DD defaults
to the first day of the month if absent. If you specify any other
day in this mode, unpredictable results may occur. For example,
-I Feb restricts the analysis to the February of the current year.
-E date
skip all logfile entries starting from this day on (inclusive). The
date format is the same as in -I. To restrict analysis to a certain
period, specify the starting date using -I and the first date to be
ignored using -E. For example, -I Jan/99 -E Feb/99 restricts the
analysis to January 1999.
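Taken together, -I and -E describe a half-open date range: the -I date itself is analyzed, the -E date is not. A minimal sketch of that rule follows; the function name in_window and its parameter names are illustrative assumptions.

```python
from datetime import date

def in_window(day, skip_until=None, skip_from=None):
    """Return True if a logfile entry dated `day` would be analyzed.

    skip_until plays the role of -I (exclusive: the day itself is kept),
    skip_from plays the role of -E (inclusive: the day itself is skipped).
    """
    if skip_until is not None and day < skip_until:
        return False
    if skip_from is not None and day >= skip_from:
        return False
    return True

# Equivalent of -I Jan/99 -E Feb/99, with DD defaulting to the first day.
start, end = date(1999, 1, 1), date(1999, 2, 1)
print(in_window(date(1998, 12, 31), start, end))  # False
print(in_window(date(1999, 1, 1), start, end))    # True
print(in_window(date(1999, 1, 31), start, end))   # True
print(in_window(date(1999, 2, 1), start, end))    # False
```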
-F logfmt
the logfile format to use. Valid keywords for logfmt are auto for
auto-sensing the logfile format, clf for the Common Logfile Format,
or dlf and elf for the two variants of the W3C Extended Logfile
Format. See also the section Logfile Formats above.
-L lang
use the language lang for warning messages and for the statistics
report. See also the directive Language and the section
Multi-National Language Support for more information about localization of
http-analyze.
-C chrset
force use of chrset for the browser's encoding when displaying the
statistics report. This is needed for languages which require
special character sets such as Chinese. See also HTMLCharSet and
the section about Multi-National Language Support.
-G pattern,...
defines additional pageview patterns. All URLs matching one of the
patterns are classified as pageviews (text files). If a pattern
starts with a slash ('/'), it is treated as a prefix each URL is
compared with; otherwise it is treated as a suffix. The suffix .html
is pre-defined by default. You can add up to 9 more patterns here, for
example .shtml, .text and /cgi-bin/. To specify more than one pattern
with a single -G option, use commas to separate them. See also the
PageView directive.
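The prefix/suffix rule can be sketched in a few lines. The function name is_pageview is hypothetical, and the sketch assumes the pattern is compared against the plain URL path as logged.

```python
def is_pageview(url, patterns=(".html",)):
    """Classify a URL as a pageview if it matches any of the patterns."""
    for pattern in patterns:
        if pattern.startswith("/"):
            if url.startswith(pattern):   # leading slash: prefix match
                return True
        elif url.endswith(pattern):       # otherwise: suffix match
            return True
    return False

patterns = (".html", ".shtml", ".text", "/cgi-bin/")
print(is_pageview("/docs/intro.html", patterns))   # True
print(is_pageview("/cgi-bin/search", patterns))    # True
print(is_pageview("/images/logo.gif", patterns))   # False
```

With only the pre-defined .html suffix, a URL such as /news.shtml is not counted as a pageview, so additional suffixes have to be listed explicitly.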
-H idxfile,...
defines additional directory index filenames. The name index.html
is pre-defined by default. http-analyze truncates URLs containing
an index filename so that they merge with '/' (their "base URL").
For example, /dir/index.html is truncated to /dir/. You can add up
to 9 more names for directory index files, for example Welcome.html
or home.html. To specify more than one name with a single -H
option, use commas to separate them. See also the IndexFiles
directive.
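The truncation described for index files can be illustrated as follows. The function name truncate_index is an assumption; the sketch only treats the last path component as a possible index filename.

```python
def truncate_index(url, index_files=("index.html",)):
    """Reduce a URL ending in a directory index filename to its base URL."""
    head, sep, last = url.rpartition("/")
    if sep and last in index_files:
        return head + "/"
    return url

names = ("index.html", "Welcome.html", "home.html")
print(truncate_index("/dir/index.html", names))  # /dir/
print(truncate_index("/Welcome.html", names))    # /
print(truncate_index("/dir/page.html", names))   # /dir/page.html
```

After truncation, requests for /dir/ and /dir/index.html are counted under the same URL.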
-O vname,...
defines additional (virtual) names for this server to be classified
as self-referrer URLs. The server's primary name (from -S or -U) is
pre-defined already. If vname doesn't include a protocol specifier,
two URLs with the http and the https protocol specifier are added
for each name. To specify more than one server name with a single
-O option, use commas to separate them. See also the VirtualNames
directive.
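A rough sketch of how such names could be matched against a referrer URL is shown below. The function name is_self_referrer is hypothetical, and comparing scheme and hostname is an assumption about what "self-referrer" matching involves, not a description of the analyzer's internals.

```python
from urllib.parse import urlsplit

def is_self_referrer(referrer, names):
    """Return True if the referrer URL points to one of the server's names."""
    allowed = set()
    for name in names:
        if "://" in name:
            parts = urlsplit(name)                 # protocol given explicitly
            allowed.add((parts.scheme, parts.hostname))
        else:
            allowed.add(("http", name.lower()))    # bare name: add both
            allowed.add(("https", name.lower()))   # http and https variants
    parts = urlsplit(referrer)
    return (parts.scheme, parts.hostname) in allowed

names = ["www.example.com", "https://shop.example.com"]
print(is_self_referrer("http://www.example.com/page.html", names))  # True
print(is_self_referrer("https://www.example.com/", names))          # True
print(is_self_referrer("http://shop.example.com/cart", names))      # False
print(is_self_referrer("http://other.example.org/", names))         # False
```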
-P prolog
use prolog as the prolog file for a yearly VRML model (optional).
The file 3Dprolog.wrl is included in the distribution as an example.
Note that the resulting VRML model for a whole year may be suitable
only for viewing on a fast system such as a workstation. The
monthly VRML models do not need a prolog file and can be viewed on
any platform without problems. See also the VRMLProlog directive.
-R docroot
restrict logfile analysis to the given Document Root. If docroot is
prefixed by a '!', analysis takes place for all directories except
docroot. If docroot does not start with a slash ('/'), it is
interpreted as the name of a virtual server, which is matched
against the normally unused second field of a logfile entry.
Intended for use with virtual servers with a separate Document Root
or for which the hostname is recorded in the second field of a
logfile entry. See also the DocRoot directive.
-S srvname
use srvname for the server name. If no server name is defined,
http-analyze uses the hostname of the system it is running on. The
server name must be a fully qualified domain name, not a URL. See
also the ServerName directive.
-T TLDfile
use TLDfile for the list of valid top-level domains (TLDs). This
list currently includes all ISO two-letter country domains, the
well-known domains .net, .int, .org, .com, .edu, .gov, .mil, .arpa,
.nato, and the new CORE top-level domains .firm, .info, .shop,
.arts, .web, .rec, and .nom. The length of a top-level domain in
the TLD file may not exceed 6 characters. Since http-analyze uses
its built-in defaults if no TLD file is specified, you will rarely
need this option. See also the TLDFile directive and the sample TLD
file included in the distribution.
-U srvurl
defines srvurl as the URL of the server to be used for hotlinks in
URL lists. Useful if the report for your web server is published on
another server. Also necessary for virtual servers to have
http-analyze generate correct hypertext links in the report. See also
the ServerURL directive.
-W 3Dwin
defines the window for the VRML model. The keyword 3Dwin may be
either extern or intern for display of the VRML model in a new,
external window or in the lower half of the main frame respectively
(meaningful only in the frames-based interface).
-Z showdom
defines showdom as the number of components in a domain name which
make up the organizational part. This is usually the second-level
domain, so that the last two components of the domain name (for
example, company.com) are used as the organizational part.
However, some countries prefer to use third-level domains, so that
the hostnames use 4 or more components, where the last 3 are used
for the organizational part (as in company.co.uk). To recognize
such third-level domains, showdom can be set to the value 3.
Hostnames with exactly 3 components will still be reduced to their
second-level domain if showdom is set to 3.
logfile(s)
These are the name(s) of the logfile(s) to process. If more than one
file is given, they are processed in the order in which their names
appear on the command line. http-analyze checks for the existence
of all files before processing them. If a `-' is specified as the
filename, standard input is read. If no file is given, the analyzer
processes either the default logfile specified in the configuration
file or the standard input.
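The selection of input files described above can be sketched in a few
lines of Python. This is only an illustration, not the analyzer's own
code; the function name select_inputs and its parameters are
hypothetical.

```python
import os

def select_inputs(cli_files, default_logfile=None):
    """Return the inputs to read; '-' stands for standard input."""
    if cli_files:
        files = list(cli_files)        # keep the command-line order
    elif default_logfile:
        files = [default_logfile]      # e.g. taken from a LogFile directive
    else:
        files = ["-"]                  # fall back to standard input
    # Check that every named file exists before any of them is processed.
    missing = [f for f in files if f != "-" and not os.path.exists(f)]
    if missing:
        raise FileNotFoundError("missing logfile(s): " + ", ".join(missing))
    return files
```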
Typical Usage
This is an example of the typical use of http-analyze on Unix systems:
$ http-analyze -v3f -o /usr/web/htdocs/stats /usr/ns-home/logs/access.log
On Windows systems, open a DOS window, change into the directory where
you installed http-analyze and run a command similar to the following:
C:> http-analyze -v3f -o c:\web\htdocs\stats c:\programs\msiis\access.log
Note that on Windows systems, http-analyze searches for the required
buttons and files in the subdirectory files of the current directory it
is running in. Therefore, if you get error messages about missing
buttons, make sure you changed into the directory the analyzer is
installed in (by default the installation directory is
C:\Programs\RENT-A-GURU\http-analyze2.4).
CONFIGURATION FILE
You can define server-specific configuration settings for http-analyze in
an analyzer configuration file. To have the analyzer use such a
configuration file, specify its name with the option -c cfgfile or the
environment variable HA_CONFIG. Note that command line options always
take precedence over settings in a configuration file.
If the option -i newcfg is specified, http-analyze creates a
configuration template in the file newcfg. Any other command line
options present will be transformed into their appropriate definitions in
the new configuration file. The settings can then be customized further
by manually editing the configuration definitions using a standard text
editor.
To update an old configuration file to the new format, specify its name
using the option -c in addition to -i. This will instruct the analyzer
to retain any settings from the old file.
The configuration file contains a single directive per line. Except for
IndexFiles, PageView, AddDomain, VirtualNames, Ign*, and Hide*, each
directive may appear only once in the configuration file. Following a
directive field there are one or two value fields, which must be
separated from the directive and each other by one or more tab characters.
Blanks are considered part of the string in an optional third field only.
All directive names are case-insensitive. Comment lines starting with a
hash character (#) are ignored.
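To illustrate the file format, the following Python sketch splits
configuration text into directives and value fields. It is not the
analyzer's parser: the name parse_config is hypothetical, and the sketch
assumes that only tab characters separate fields. It does not model any
special treatment of blanks in the first two fields.

```python
def parse_config(text):
    """Return a list of (directive, [value fields]) pairs."""
    entries = []
    for line in text.splitlines():
        # Skip empty lines and comment lines starting with '#'.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # One or more tabs separate the fields, so drop empty pieces.
        fields = [f for f in line.split("\t") if f != ""]
        directive = fields[0].strip().lower()   # names are case-insensitive
        entries.append((directive, fields[1:]))
    return entries
```

Blanks inside a value field, as in the description string of a HideSys
directive, are preserved because the sketch splits on tabs only.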
3DWinSize widthxheight
Defines the size of the 3D window (default: 520x420 pixels).
Example:
3DWinSize 540x450
3DWindow keyword
Defines the 3D window the VRML model is displayed in (same as option
-W). The keyword may be either extern (default) or intern for
display of the VRML model in a new, external window or in the lower
half of the main frame, respectively. Example:
3DWindow intern
AddDomain domain string
Add entries to the domain table causing certain domains to be
allocated to the mock domain string. Wildcards in domain are
ignored. This directive is useful to collect certain hostnames (for
example, the hosts of world-wide operating online services) under
some string (item) instead of the country associated with the
top-level domain. Example:
AddDomain .compuserve.com CompuServe
AddDomain .aol.com AOL
AuthURL boolean value
Defines whether accesses which required authentication should be
skipped. By default, such URLs appear in the report just like
ordinary URLs. If AuthURL is set to Off, No, None, False, or 0, the
analyzer skips authenticated requests in the logfile, so that they
will be suppressed from the statistics report. Example:
AuthURL No
BtnSymlink boolean value
Creates symbolic links to the required buttons and files in HA_LIBDIR
instead of copying them into the output directory. If you are going
to analyze a large number of virtual servers which reside on the same
host, you can probably save disk space by avoiding copies of all
buttons and files into each output directory. Note that this
directive can be used only on systems which support symbolic links.
Example:
BtnSymlink Yes
CustLogoW image srvurl and CustLogoB image srvurl
Defines images for use as customer logos in the statistics report.
This feature is available only in the commercial version of the
analyzer. image is the name of the image file relative to the output
directory OutputDir and srvurl is the URL to be followed if the user
clicks on the image. To use your own logos, create two images: one
for use on white backgrounds (CustLogoW) and another one for use on
black backgrounds (CustLogoB). The images should be approximately
72x72 pixels in size and must be placed into the buttons subdirectory
of the central libdir (HA_LIBDIR/btn). Next time a report is
generated, the analyzer copies those logos into the output directory
and includes them in the report. Example:
CustLogoW btn/mycompany_sw.png http://www.mycompany.com/
CustLogoB btn/mycompany_sb.png http://www.mycompany.com/
DefaultMode mode
The default operation mode of http-analyze. The value field contains
either the keyword daily for short statistics mode or monthly for
full statistics mode (see also options -d and -m). If left
undefined, the default is full statistics mode (monthly). Example:
DefaultMode daily
DocRoot docroot
Restricts logfile analysis to the given Document Root (same as option
-R). If docroot is prefixed by a `!', analysis takes place for all
directories except docroot. If docroot does not start with a slash
(`/'), it is interpreted as the name of a virtual server, which is
matched against the normally unused second field of a logfile entry.
Useful for virtual servers with a separate Document Root. Note: Do
not define this directive to analyze the whole server. Explicitly
setting DocRoot to `/' (the default) only increases processing time.
Example:
DocRoot /customer/
DocRoot www.customer.com
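The filtering rule can be pictured with the following Python sketch. It
is an illustration under assumptions, not the analyzer's implementation:
the name in_docroot is hypothetical, a docroot starting with a slash is
assumed to be matched as a prefix of the URL, and the `!' prefix is
assumed to negate virtual server names as well.

```python
def in_docroot(url, vhost, docroot):
    """Return True if a logfile entry should be analyzed.

    url   -- requested URL path from the logfile entry
    vhost -- second field of the logfile entry (virtual server name)
    """
    negate = docroot.startswith("!")
    if negate:
        docroot = docroot[1:]
    if docroot.startswith("/"):
        matched = url.startswith(docroot)            # assumed prefix match
    else:
        matched = vhost.lower() == docroot.lower()   # virtual server name
    return matched != negate
```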
HTMLCharSet chrset
Force use of chrset for the browser's encoding when displaying the
statistics report (same as option -C). This is needed for languages
which require special character sets such as Chinese. See also the
section about Multi-National Language Support. Example:
HTMLCharSet iso-8859-1
HTMLPrefix prefix and HTMLTrailer trailer
The HTML prefix and trailer to be inserted into the statistics output
files at the top and bottom of the page. If defined, the HTMLPrefix
string must include the <BODY> tag. To read the HTML code from a
file, specify its name as the prefix or trailer. Example:
HTMLPrefix <BODY BGCOLOR="#FF0000">
HTMLTrailer <A HREF="/intern/">Back</A> to the internal page.
HeadFont fontlist, TextFont fontlist and ListFont fontlist
The fonts to use for headers, for regular text, and for the detailed
lists. If unset, the analyzer uses a list of common sans-serif fonts
for headers and regular text and a monospaced (fixed) font for the
detailed lists. To force the navigator's default for fonts, use the
keyword default as the fontname. Example:
HeadFont Helvetica,Arial,Geneva,sans-serif
TextFont Helvetica,Arial,Geneva,sans-serif
ListFont Courier,monospaced
HeadSize size, TextSize size, SmallSize size and ListSize size
The font sizes for headings (navigator default, usually 3), regular
text (default: 2), small text (default: 1) and lists (default: 2).
TextSize replaces the former FontSize, which is still recognized for
backward compatibility with older configuration files. Example:
HeadSize 4
SmallSize 2
HideAgent agent string
Hide a browser type under an arbitrary string (item). Needed only
for a certain browser whose vendor still can't spell its name
correctly. Only the leading part of the browser type is compared
against agent, so no wildcards are needed in the second field.
Example:
HideAgent Mozilla/4.0 (compatible; MSIE 4. MSIE 4.*
HideAgent Mozilla/3.0 (compatible; MSIE 3. MSIE 3.*
HideRefer referrer string
Hide certain referrer URLs under an arbitrary string (item). Useful
to map different referrer URLs for a given host to a common name.
Since only the leading string of the referrer URL is compared against
referrer, there is no need to specify wildcards. As in HideAgent, a
wildcard suffix is removed from the string, while a wildcard prefix
is taken literally.
If the second argument contains a string in square brackets, this
defines the CGI parameter which specifies the search key for search
engines. In this case, the search key will be extracted from the
argument list and prominently displayed after the name of the search
engine/web server. See also the configuration file template produced
by http-analyze -i for more examples of how to use the HideRefer
directive. Example:
HideRefer http://www.altavista.com/ AltaVista [q=]
HideRefer http://lycospro.lycos.com/ Lycos [query=]
HideRefer http://www.excite.com/ Excite [search=]
HideRefer http://www.dino-online.de/ Dino Online [query=]
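As a rough illustration of the search key extraction, the following
Python sketch pulls the value of the configured CGI parameter out of a
referrer URL. It is not the analyzer's code; the function name
search_key and the sample referrer URL used with it are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def search_key(referrer, prefix, param):
    """Return the search key from a referrer URL, or None.

    prefix -- leading string of the referrer URL (second field)
    param  -- CGI parameter as written in square brackets, e.g. 'q='
    """
    if not referrer.startswith(prefix):
        return None
    values = parse_qs(urlsplit(referrer).query).get(param.rstrip("="))
    return values[0] if values else None
```

For example, with the prefix http://www.altavista.com/ and the
parameter q=, a referrer ending in ?pg=q&q=web+statistics yields the
search key "web statistics".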
HideSys hostname string
Hide a hostname under an arbitrary string (item). The string may
contain blanks. If the first character of string is a `[', this item
is suppressed in the Top N lists. Hidden items are accounted for
separately, but in the summary they are collected under the
description defined with this directive. You may use the wildcard
character `*' as either a prefix or as a suffix of the hostname (as
in *.host.com and 192.168.12.*), but not as both. Hostnames are
case-insensitive.
When building the list of countries, http-analyze determines the
country from the top-level domain given in hostname. If hostname is
an IP address, you can optionally define the top-level domain to be
accounted for by appending it in square brackets to the string as
shown in the last example below. Example:
HideSys *.mycompany.com MY COMPANY
HideSys 192.168.12.* MY COMPANY [US]
HideURL url string
Hide a URL under an arbitrary string (item). The string may contain
blanks. If the first character of string is a `[', this item is
suppressed in the Top N lists. Hidden items are accounted for
separately, but in the summary they are collected under the
description defined with this directive. You may use the wildcard
character `*' as either a prefix or as a suffix of the URL (as in
*.map and /subdir/*), but not as both. URLs are case-sensitive as
required by the HTTP standard. If the option -M is specified, URLs
will become case-insensitive for compatibility with non-compliant web
servers. Note that images are hidden automatically under All images
by default unless -x is specified. Example:
HideURL *.map [All image maps]
HideURL /robots.txt [Robot control file]
HideURL /newsletter/* MyCompany Monthly Newsletter
HideURL /products/* MyCompany Products
HideURL /~delta-t/ DELTA-t Homepage
HideURL /~delta-t/* DELTA-t more pages
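The wildcard rule shared by the Hide* and Ign* directives (a `*' as
either a prefix or a suffix, but not both) can be sketched in Python as
follows. The function name wildcard_match is hypothetical and the code
only illustrates the matching rule; it is not taken from the analyzer.

```python
def wildcard_match(pattern, value, ignore_case=False):
    """Match value against a pattern with an optional leading or trailing '*'."""
    if ignore_case:                  # hostnames, or URLs when -M is given
        pattern, value = pattern.lower(), value.lower()
    if pattern.startswith("*"):      # '*' as prefix: compare the tail
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):        # '*' as suffix: compare the head
        return value.startswith(pattern[:-1])
    return value == pattern          # no wildcard: exact match
```

With the examples above, /newsletter/* matches any URL below
/newsletter/, while /~delta-t/ matches only that exact URL.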
IgnURL url and IgnSys hostname
Ignore entries with a specific URL or accesses from a certain system.
You may use the wildcard character `*' as either a prefix or as a
suffix of the URL or the hostname (as in *.png, /subdir/file* and
*.host.com), but not as both. Note that all logfile entries are
compared against this list while http-analyze reads the logfile,
as opposed to the HideURL and HideSys directives, which are looked up
only after all entries have been reduced to the set of unique URLs and
hostnames, respectively. Therefore, many IgnURL/IgnSys definitions
will significantly increase the processing time of http-analyze.
Example:
IgnURL *.gif,*.png,*.jpg,*.jpeg
IgnURL /stats/
IndexFiles idxfile[,idxfile...]
Defines additional directory index filenames (same as option -H).
The name index.html is pre-defined by default. http-analyze
truncates URLs containing an index filename so that they merge with
`/' (their "base URL"). For example, /dir/index.html is truncated
to /dir/. You can add up to 9 more names for directory index files.
Note that each name requires another table lookup, which may
significantly increase processing time. Example:
IndexFiles Welcome.html,home.html,index.htm
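The truncation of index filenames can be illustrated with a short Python
sketch. The name truncate_index is hypothetical, and the sketch assumes
the index filename is the last path component of the URL; it is not the
analyzer's own code.

```python
def truncate_index(url, index_files=("index.html",)):
    """Reduce a URL ending in an index filename to its base URL."""
    head, sep, last = url.rpartition("/")
    if sep and last in index_files:
        return head + "/"
    return url
```

With the IndexFiles example above, /dir/Welcome.html and /dir/index.htm
would both merge with /dir/.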
Language lang
Use the language lang for warning messages and for the statistics
report (same as option -L). See the section Multi-National Language
Support for more information about localization of http-analyze.
Example:
Language de
LogFile filename
The name of the server's logfile. If you define a default name for
the logfile, this file is processed if no other filenames are
explicitly specified on the command line. If no logfile is
specified, http-analyze always reads stdin. Example:
LogFile /usr/ns-home/logs/access
LogFormat logfmt
Use this logfile format. Valid values for logfmt are auto for
auto-sensing the logfile format, clf for the NCSA Common Logfile Format,
or dlf and elf for the two supported variants of the W3C Extended
Logfile Format. See the section Logfile Formats for a detailed
description of those formats. Example:
LogFormat clf
MSIISmode boolean value
Use case-insensitive string comparison for URLs. Needed for MS IIS,
which makes no distinction between upper- and lower-case characters.
MS users may regard this as an enhancement, while for the rest of the
world this is just a violation of the RFC2616 HTTP standard and
should be ignored. Example:
MSIISmode Yes
NavWinSize widthxheight
Defines the size of the navigation window which pops up in the
conventional interface if JavaScript is enabled. Useful if the
browser displays scrollbars when using the default size of 420x190
pixels. Example:
NavWinSize 440x200
NavigFrame size
Defines the size of the navigation frame in pixels. Useful if the
browser displays scrollbars when using the default size of 120
pixels. Example:
NavigFrame 140
NoiseLevel hits
Sets the noise-level to hits. If a noise-level is defined, all URLs,
sites, agents and referrer URLs with hits below this level are
collected under the item Noise in the Top N lists and overviews to
avoid cluttering up those lists. Example:
NoiseLevel 7
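The effect of a noise-level on a list can be pictured with the following
Python sketch, which lumps all items below the level into a single Noise
entry. The function name apply_noise_level and the sample counts are
hypothetical; this is not the analyzer's implementation.

```python
def apply_noise_level(hits_by_item, noise_level):
    """Collect all items with hits below noise_level under 'Noise'."""
    result = {}
    noise = 0
    for item, hits in hits_by_item.items():
        if hits < noise_level:
            noise += hits            # below the level: count as noise
        else:
            result[item] = hits      # at or above the level: keep as is
    if noise:
        result["Noise"] = noise
    return result
```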
OutputDir directory
The name of the directory where the output files of the statistics
report should be created (same as option -o). By default, the output
directory is the current directory. Example:
OutputDir /usr/web/htdocs/stats
PageView pattern[,pattern...]
Defines additional pageview patterns (same as option -G). All URLs
matching one of the patterns are classified as pageviews (text
files). If pattern starts with a slash (`/'), it is treated as a
prefix each URL is compared with; otherwise it is treated as a suffix.
The suffix .html is pre-defined by default. You can add up to 9 more
patterns here, for example .shtml, .text and /cgi-bin/. Note that each
pattern requires another table lookup, which may significantly increase
processing time. Example:
PageView .shtml,.text,/cgi-bin/
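The pageview classification can be illustrated in Python as follows. The
function name is_pageview is hypothetical and the sketch only mirrors
the prefix/suffix rule described above; it is not the analyzer's code.

```python
def is_pageview(url, patterns=(".html",)):
    """Return True if url matches one of the pageview patterns."""
    for pattern in patterns:
        if pattern.startswith("/"):
            if url.startswith(pattern):   # leading slash: prefix pattern
                return True
        elif url.endswith(pattern):       # otherwise: suffix pattern
            return True
    return False
```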
PrivateDir prvdir
Defines the name of a "private" directory for the detailed lists of
files, sites, browsers and referrer URLs (same as option -p).
Because prvdir must reside directly under the output directory, its
name may not contain any slashes (`/'). A private directory for
detailed lists may be useful to restrict access to those lists if the
rest of the statistics report is publicly available. Note that for
restricting access to the complete statistics report, you do not need
to place the detailed lists in a private directory. Example:
PrivateDir lists
RegInfo customer_name registration_ID
Defines the customer's name and the registration ID, which are both
shown on the main page in the summary report. Example:
RegInfo MyCompany 3745JMJZ00000311300000682344
ReportTitle title
The document title to use in the statistics report. Example:
ReportTitle Access Statistics for MyCompany
ServerName srvname
The official name of the server (same as option -S). If no server
name is defined, http-analyze uses the hostname of the system it is
running on. The server name must be a fully qualified domain name,
not a URL. Example:
ServerName www.mycompany.com
ServerURL srvurl
The URL of the server to be used for hotlinks in URL lists (same as
option -U). Useful if the report for your web server is published on
another server. Also necessary for virtual servers to have
http-analyze generate correct hypertext links in the report. Example:
ServerURL http://www.mycompany.com
Session time
The time-window for counting sessions. All accesses from the same host
inside this time-window are accounted for as the same session. If the
distance between two adjacent accesses from the same host is greater
than the time-window, the accesses from this host are accounted for as
different sessions. Example:
Session 4 hours
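One way to picture the session rule is the following Python sketch,
which starts a new session whenever the gap between two adjacent
accesses from the same host exceeds the time-window. The function name
count_sessions and the input layout are hypothetical; the analyzer's
actual bookkeeping may differ.

```python
def count_sessions(accesses, window_seconds):
    """Count sessions in a chronological list of (host, timestamp) pairs.

    Timestamps are given in seconds.
    """
    last_seen = {}
    sessions = 0
    for host, timestamp in accesses:
        previous = last_seen.get(host)
        # First access of a host, or gap larger than the window: new session.
        if previous is None or timestamp - previous > window_seconds:
            sessions += 1
        last_seen[host] = timestamp
    return sessions
```

With a window of 4 hours (14400 seconds), two accesses from one host
that are one hour apart count as a single session.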
ShowDomain number
Defines the number of components in a domain name which make up the
organizational part (same as option -Z). This is usually the
second-level domain, so that the last two components of the domain
name (for example, company.com) are used as the organizational part.
However, some countries prefer to use third-level domains, so that
the hostnames use 4 or more components, where the last 3 are used for
the organizational part (as in company.co.uk). To recognize such
third-level domains, ShowDomain can be set to the value 3. Hostnames
with exactly 3 components will still be reduced to their second-level
domain if ShowDomain is set to 3. Example:
ShowDomain 3
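The reduction of hostnames to their organizational part can be sketched
in Python as follows. The function name organizational_part is
hypothetical and only the values 2 and 3 described above are considered;
this is an illustration, not the analyzer's code.

```python
def organizational_part(hostname, show_domain=2):
    """Reduce a hostname to its organizational part."""
    parts = hostname.split(".")
    if len(parts) > show_domain:
        keep = show_domain            # enough components: use ShowDomain
    else:
        keep = min(len(parts), 2)     # otherwise fall back to second level
    return ".".join(parts[-keep:])
```

With ShowDomain set to 3, www.company.co.uk is reduced to company.co.uk,
while www.company.com (exactly 3 components) is still reduced to
company.com.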
StripCGI boolean value
Defines whether arguments to CGI scripts are stripped (setting this
to a false value is the same as option -q). By
default, http-analyze strips arguments from CGI URLs to be able to
lump them together. If your server creates dynamic HTML files
through a CGI script, they are reduced to the URL of the script. If
StripCGI is set to Off, No, None, False or 0, those argument lists
are left intact and CGI URLs with different arguments are treated as
different URLs. Note that this only works for requests to scripts
which receive their arguments using the GET, but not the POST method.
See the section Interpretation of the results for an explanation of
the request methods. Example:
StripCGI No
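The effect of this setting on a logged URL can be illustrated with a
small Python sketch. It assumes that the arguments of a GET request
follow the first `?' in the URL; the function name normalize_cgi_url and
the sample URL are hypothetical.

```python
def normalize_cgi_url(url, strip_cgi=True):
    """Return the URL as it would be counted in the report."""
    if strip_cgi:
        return url.split("?", 1)[0]   # drop the argument list
    return url                        # keep arguments: distinct URLs
```

With stripping enabled, /cgi-bin/show?id=1 and /cgi-bin/show?id=2 are
both counted as /cgi-bin/show.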
Suppress subopt,...
Suppress certain lists in the report (same as -s). subopt may be one
of:
AVLoad     to suppress the average load report (top seconds/minutes/hours),
URLs       to suppress the overview and list of URLs/items,
URLList    to suppress the list of URLs/items only,
Code404    to suppress the list of Code 404 (Not Found) responses,
Sites      to suppress the overview and list of client domains,
RSites     to suppress the overview of reverse client domains,
SiteList   to suppress the list of all client domains/hostnames,
Agents     to suppress the overview and list of browser types,
Referrer   to suppress the overview and list of referrer URLs,
Country    to suppress the list of countries,
Pageviews  to suppress pageview rating (cached files are shown instead),
AuthReq    to suppress requests which required authentication,
Graphics   to suppress images such as graphs and pie charts,
Hotlinks   to suppress hotlinks in the list of all URLs,
Interpol   to suppress interpolation of values in graphs.
Example:
Suppress Country,Interpol
TLDFile filename
Use filename for the list of top-level domains (same as option -T).
This list includes all ISO two-letter country domains, the well-known
domains .net, .int, .org, .com, .edu, .gov, .mil, .arpa, .nato, and
the new CORE top-level domains .firm, .info, .shop, .arts, .web,
.rec, and .nom. The length of a domain in the TLD file may not
exceed 6 characters. Since http-analyze uses its built-in defaults
if no TLD file is specified, you will rarely need this directive.
Example:
TLDFile /usr/local/lib/http-analyze/TLD
TblFormat tblname specifier
Defines the layout of tables in the statistics report. The argument
tblname may be one of:
Month      for the statistics of the last 12 months (main page)
Day        for the daily statistics in the short and full summaries
Load       for the average load by weekday, hour, minute, second
Country    for the list of countries
TopTen     for all Top N lists
Overview   for all overviews
Lists      for all detailed lists (preformatted text)
NotFound   for the list of NotFound responses
The specifier string defines the items to be shown in the table:
n, N       an index number or label (don't touch!)
h, H       the number of hits
f, F       the number of files sent
c, C       the number of cached files
p, P       the number of pageviews
s, S       the number of sessions
k, K       the amount of data sent in Kbytes (integer value)
B          the amount of data sent in bytes (float value)
L          a dynamically created label (don't touch!)
If a format specifier is used in upper-case, the value displayed in
the report will include the percentage for this number. Example:
TblFormat Month n h f c p s k
TblFormat Day n H F C P S k
TblFormat Country N H F P S k L
Top{Days,Hours,Minutes,Seconds,URLs,Sites,Agents,Refers}, LeastURLs
Defines the size of certain Top N tables and lists. If set to zero,
the corresponding list will be suppressed. Example:
TopURLs 20
LeastURLs 0
TopDays 14
VirtualNames vname,...
The list of additional ("virtual") names for this server to be
classified as self-referrer URLs. The server's primary name (from
ServerName or ServerURL) is pre-defined already. If vname doesn't
include a protocol specifier, two URLs with the http and the https
protocol specifier will be added for each name. Since self-referrers
are suppressed from the list of referrer URLs, the remaining entries
give a good impression of external pages referring to some
document on your site. Example:
VirtualNames www2.mycompany.com,mycompany.com
VirtualNames www.customer.com,customer.com
VirtualNames http://www.other.com,https://secure.other.com
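The expansion of virtual names into self-referrer URLs can be sketched
in Python as follows. The function name expand_virtual_names is
hypothetical, and the test for a protocol specifier (the presence of
"://") is an assumption made for this illustration.

```python
def expand_virtual_names(value):
    """Expand a comma-separated VirtualNames value into a list of URLs."""
    urls = []
    for name in value.split(","):
        name = name.strip()
        if "://" in name:                 # protocol given: use as is
            urls.append(name)
        else:                             # otherwise add http and https
            urls.extend(["http://" + name, "https://" + name])
    return urls
```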
VRMLProlog file
The name of a prolog file for a yearly VRML model (same as option
-P). Pathnames not beginning with a `/' are relative to OutputDir.
If a prolog file is given, an additional yearly model with all
12 monthly models embedded as inlines is created. See the section
Output files for further information about this yearly model.
Example:
VRMLProlog 3Dprolog.wrl
MULTI-NATIONAL LANGUAGE SUPPORT
http-analyze supports Multi-National Language Support (MNLS) according to
the X/Open Portability Guide (XPG4) and the System V Interface Definition
(SVR4). For systems without MNLS support, a simple native implementation
is used. See the file INSTALL included in the distribution for
information about installation of the appropriate MNLS support for your
system. The option -V displays the type of MNLS support compiled into a
binary.
All text strings and messages of http-analyze are contained in a separate
message catalog, which is read at start-up of the program. If a message
catalog is installed in the system, you can select the language to be
used for warning messages and for the statistics report by setting the
appropriate locale. This can be done by defining the LANG (XPG4/SVR4
MNLS) or the HA_LANG (native MNLS) environment variable or by using the
option -L. When using -L, the analyzer switches to the specified
language as soon as it has recognized the option. If no message catalog
exists for the specified locale, http-analyze uses built-in messages in
English.
Certain languages require a specific character set to be used by the
browser when displaying the statistics report. This can be defined using
the option -C or the HTMLCharSet directive. The following table summarizes
the most common combinations of languages and character sets. Note that
the name of the locale is system-specific (for example, de could be
de-iso8859 on some systems).
Country            Locale   Encoding
Standard C         C        us-ascii
Arabic Countries   ar       iso-8859-6
Belarus            be       iso-8859-5
Bulgaria           bg       iso-8859-5
Czech Republic     cs       iso-8859-2
Denmark            da       iso-8859-1
Germany            de       iso-8859-1
Greece             el       iso-8859-7
Spain              es       iso-8859-1
Mexico             es_MX    iso-8859-1
Finland            fi       iso-8859-1
France             fr       iso-8859-1
Switzerland        fr_CH    iso-8859-1
Croatia            hr       iso-8859-2
Hungary            hu       iso-8859-2
Iceland            is       iso-8859-1
Italy              it       iso-8859-1
Israel             iw       iso-8859-8
Japan              ja       Shift_JIS or iso-2022-jp
Korea              ko       EUC-kr or iso-2022-kr
Netherlands        nl       iso-8859-1
Belgium            nl_BE    iso-8859-1
Norway             no       iso-8859-1
Poland             pl       iso-8859-2
Portugal           pt       iso-8859-1
Russia             ru       KOI8-R or iso-8859-5
Sweden             sv       iso-8859-1
Chinese            zh       big5
Since the message catalogs are independent of the base software, more
languages may become available without having to re-compile or re-install
the software. Please visit the homepage of http-analyze for up-to-date
information about the available languages. For more information about
localization, see environ(5) and setlocale(3) in the online manual.
EXAMPLES
After successful compilation of http-analyze you can test-run the
analyzer before installing it permanently. Just create a subdirectory
for the output files and run http-analyze on either one of the sample
logfiles included in the distribution (as shown below) or your web
server's logfile. For example, to create a full statistics report
including a frames-based interface and a 3D VRML model in the
subdirectory testd, use the following commands:
$ cd http-analyze2.4
$ mkdir testd
$ http-analyze -vm3f -o testd files/logfmt.elf
http-analyze 2.4 (IP22; IRIX 6.2; XPG4 MNLS; PNG)
Copyright 1999 by RENT-A-GURU(TM)
Generating full statistics in output directory `testd'
Reading data from `files/logfmt.elf'
Best blocksize for I/O is set to 64 KB
Hmm, looks like Extended Logfile Format (ELF)
Start new period at 01/Jan/1999
Creating VRML model for January 1999
Creating full statistics for January 1999
... processing URLs
... processing hostnames
... processing user agents
... processing referrer URLs
Total entries read: 8, processed: 8
Clear almost all counters at 03/Jan/1999
Start new period at 01/Feb/1999
No more hits since 02/Feb/1999
Creating VRML model for February 1999
Creating full statistics for February 1999
... processing URLs
... processing hostnames
... processing user agents
... processing referrer URLs
... updating `www1999/index.html': last report is for February 1999
Total entries read: 3, processed: 3
Statistics complete until 28/Feb/1999
$
To view the statistics report, start your browser and open the file
testd/index.html.
For permanent installation of http-analyze, issue a make install to copy
the required files into the appropriate directory. The executable is
usually installed in /usr/local/bin, while the required buttons and files
are placed under /usr/local/lib/http-analyze unless this has been changed
by defining the HA_LIBDIR make macro during installation.
Note that you no longer need to install files in a new statistics output
directory if they have been installed in HA_LIBDIR; this is now done
automatically by http-analyze when it runs for the first time on this
output directory.
Following are some more examples, which assume that the analyzer has been
installed permanently. The first command processes an archived logfile
logYYYY/access.MM from the server's log directory to create a report for
January 1999 in the directory /usr/web/htdocs/stats:
$ cd /usr/ns-home/logs
$ http-analyze -vm3f -o /usr/web/htdocs/stats log1999/access.01
The next command uncompresses the logfiles for a whole year and feeds the
data via a pipe into the analyzer, which then creates a statistics report
for this period. All options are passed to the analyzer through a
customized configuration file specified with -c:
$ gzcat log1998/access.[01]?.gz | http-analyze -c /usr/httpd/analyze.conf -
The following command creates a configuration file template with the name
sample.conf. Any additional options will be transformed into the
appropriate directives in the new configuration file. In this example,
the server's name specified with -S is transformed into a ServerName
directive and the output directory specified with -o is transformed into
an OutputDir directive. All other directives are set to their respective
default value. To further customize any settings, use a standard text
editor.
$ http-analyze -i sample.conf -S www.myserver.com -o /usr/web/htdocs/stats
To update an old configuration file into the new format while retaining
any old settings, specify its name when creating the new file. Again,
command line options may be used to alter certain settings; they take
precedence over definitions in the old configuration file. The
following command reads the file oldfile.conf and transforms its content
into a new file named newfile.conf:
$ http-analyze -c oldfile.conf -i newfile.conf
REGULAR INVOCATION VIA CRON
Although http-analyze can be run manually to process logfiles, it is
usually executed automatically on a regular basis. On Unix systems you
use the cron(1) utility, while Windows systems provide similar
functionality with the AT command. To have your statistics report updated
automatically, use the following scheme:
1) Install a cron job which calls http-analyze -m3f to create a full
statistics report once per hour or twice per day depending on the
processing load caused by analyzing the logfile. Note that the
full statistics report is created for the first time on the
second day of a new month.
2) Optionally install a cron job which calls http-analyze -d more
often to create a short statistics report. Although this will
only update the Hits by day section of the report, the advantage
of the short statistics mode is that http-analyze needs only a
fraction of the time required to create a full statistics report.
However, this is only needed if creating a full statistics report
takes more than 15 minutes.
3) Install a shell script which rotates (saves) the server's
logfile, restarts the web server, and then creates the final
summary for this period. Have cron execute this script at 00:00
on the first day of a new month. See the script rotate-httpd for
an example of how to do this for several virtual web servers at
once.
4) Because of delays in the execution of the script which rotates the
logfile, heavily used servers sometimes write a few entries for
the new month into the old logfile. http-analyze usually detects
and ignores such "noise" appearing at the end of a logfile.
However, to initialize the files for the new month, you should
run http-analyze -m3f on the logfile for the current month
immediately after the statistics for the previous month have been
generated.
Note that all cron jobs must run with the user ID of the owner of the
output directory except for rotate-httpd, which must run with the user ID
of the server user. This is a sample crontab(1) for the scheme described
above:
# Generate a full statistics report twice per day at 01:17 and 13:17
17 1,13 * * * /usr/local/bin/http-analyze -m3f -c /usr/httpd/analyze.conf
# Generate a short statistics report each hour except at 01:17 or 13:17
17 2-12 * * * /usr/local/bin/http-analyze -d -c /usr/httpd/analyze.conf
17 14-23 * * * /usr/local/bin/http-analyze -d -c /usr/httpd/analyze.conf
# Rotate the logfiles at the first day of a new month at 00:00
0 0 1 * * /usr/local/bin/rotate-httpd
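The rotation step of the scheme above has to decide where to save the
logfile of the month that just ended. The following is only a minimal
sketch of that calculation, not the rotate-httpd script itself: the
function name prev_month_archive is hypothetical, and the
logYYYY/access.MM layout is assumed from the archived logfile names used
in the EXAMPLES section.

```shell
#!/bin/sh
# Hypothetical helper: print the archive name (logYYYY/access.MM) for the
# month preceding the given year and month.
prev_month_archive() {
    year=$1
    month=${2#0}            # strip a leading zero so 08/09 are not read as octal
    if [ "$month" -eq 1 ]; then
        # January: the previous month is December of the previous year
        year=$((year - 1))
        month=12
    else
        month=$((month - 1))
    fi
    printf 'log%04d/access.%02d\n' "$year" "$month"
}

# Example: a run on 1 January 1999 archives the log for December 1998.
prev_month_archive 1999 01
```

A rotation script run by cron at 00:00 on the first day of a month could
pass the current year and month to such a helper and move the server's
logfile to the resulting name before restarting the server.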
PERFORMANCE CONSIDERATIONS
The processing time needed to create full statistics reports depends on
many factors:
o The size of the I/O buffer (reported by http-analyze when -v is
given) should be as big as possible. For example, a buffer size
of 64KB can significantly reduce disk activity when reading the
logfile.
o If many Ign* directives are defined, the analyzer must compare
each logfile entry against each entry in the corresponding Ign*
list. The recommended way to suppress certain parts of the web
server in the statistics report is to have the server not record
any accesses to those areas in the logfile. Similarly, many Hide*
directives may also require additional table lookups, although
this will happen only once for each unique URL, sitename,
browser type or referrer URL.
o If StripCGI is set to No, more memory is required.
o Some systems impose a memory limit on a per-process basis (see
ulimit(1) and setrlimit(3)). http-analyze has no unusual
main-memory requirements (to be precise: "the bigger, the
better"), but you should make sure that about 5-10 MB is available
for processing a medium-size logfile.
TROUBLESHOOTING
If you encounter any problems using the analyzer, you may find the verbose
mode helpful. Each -v option increases the verbosity level. At verbosity
level 1, http-analyze comments on ongoing processing; at level 2 it
indicates progress by printing a dot for each new day discovered in the
logfile. At level 3, a debug message is printed for each logfile entry
parsed successfully, and at level 4 an even more detailed message
appears on standard error. Furthermore, compiling http-analyze without
the macro NDEBUG includes various assertion checks in the executable.
$ http-analyze -vvvm3f -o testd files/logfmt.elf
http-analyze 2.4 (IP22; IRIX 6.2; XPG4 MNLS; PNG)
Copyright 1999 by RENT-A-GURU(TM)
Generating full statistics in output directory `testd'
Reading data from `files/logfmt.elf'
Best blocksize for I/O is set to 64 KB
Hmm, looks like Extended Logfile Format (ELF)
1 01/Jan/1999:16:37:25 [298971279], req="GET /", sz=280 <- OK (Code 200), PAGEVIEW
Start new period at 01/Jan/1999
2 01/Jan/1999:16:38:39 [298971355], req="GET /def/", sz=910 <- OK (Code 200), PAGEVIEW
3 02/Jan/1999:16:39:39 [299060697], req="GET /abc/", sz=910 <- OK (Code 200), PAGEVIEW
...
Filing bug-reports
If you want to file a bug report, use the option -X to have http-analyze
generate a URL of a bug reporting form with some information already
filled in. You can pass this URL to your favourite browser using
cut and paste or, on Unix systems, using command substitution as in:
$ netscape `http-analyze -X`
This addresses a bug report form on http://support.netstore.de/ with the
following information already filled in:
o the customer's name as specified in the registration
o the registration ID with licensing information
(Personal/Commercial License)
o the version number of http-analyze
o the platform the program was compiled for.
Using this interface to submit bug reports will ensure proper handling
and timely response. Please note that although we gladly accept bug
reports from everyone, only Commercial Service Licensees are entitled to
request technical assistance or open a support call.
REGISTRATION
http-analyze is available through our web site for evaluation purposes.
In the evaluation version an "unregistered version" button will show up
in the statistics report. To replace this button with the Netstore(R)
logo of the free version for personal and educational use, just click on
the "unregistered version" button to follow the link to the online
registration form on our web site and register for a free, non-commercial
version.
NON-COMMERCIAL VERSION
After registration you will receive by email a registration ID and two
registration images as replacements for the "unregistered version"
button. In the free version, the Netstore(R) logo, a copyright
note and a link to the homepage of http-analyze appear in the statistics
report; these must be left intact according to the license under which
this software is made available to you.
COMMERCIAL VERSION
If you use http-analyze for commercial purposes such as providing
statistics services for your customers, you must buy a Commercial Service
License, available from RENT-A-GURU(R) and its authorized resellers. You
will receive by email from our office a registration ID and two
registration images as replacements for the "unregistered version"
button.
In the commercial version, the Netstore(R) logo, the copyright note and
the link to the homepage of http-analyze are suppressed in the
statistics report, except that the logo and copyright note appear
once on the main page and inside the navigation frame. On all other
pages, your company's name is shown. Additionally, you can add your
company's logo to the report using the CustLogoW and CustLogoB directives
in the configuration file, which are enabled in the commercial version
only. Except for this feature and the individual support for Commercial
Service Licensees, both versions of the software have identical
functionality.
BRANDING THE SOFTWARE
For all license types, you have to brand your copy of http-analyze with
the registration ID and the registration images. The registration ID may
be set either in a system-wide file (usually
/usr/local/lib/http-analyze/REGID) or via the RegInfo directives in an
analyzer configuration file. The latter method requires specification of
the configuration file each time http-analyze is invoked. If you create a
system-wide registration file, the registration information applies to
all virtual servers being analyzed.
To brand the software, detach the registration images we sent to you from
the email. After detaching them, there should be two files,
free-netstore_s[bw].png for the free version and comm-netstore_s[bw].png
for the commercial version. Next, define the HA_LIBDIR environment
variable if you chose a directory other than the default
(/usr/local/lib/http-analyze) as the central libdir. For example, if you
can't become root, you would choose a directory for which you have write
permissions, install the analyzer files there and then use the HA_LIBDIR
variable to pass its name to http-analyze. Finally, brand the software
by executing the following command as root:
# http-analyze -r "Customer Name" regID type
Registration information saved in file `/usr/local/lib/http-analyze/REGID'
#
where Customer Name is the name of the organization this license is
registered for, regID is the registration ID of the license and type is
either the keyword free or comm according to the type of the license.
Now run the analyzer to have the new buttons appear in the statistics
report.
Note that running the analyzer for the first time will automatically
install or update any older buttons and files in the statistics output
directory; there is no need to run a helper application as was the case
in previous versions of http-analyze.
YEAR 2000 COMPLIANCE
All versions 2.X and above of http-analyze are fully Year 2000 compliant.
There will be no problems with date-related functions after the year 1999
as long as the operating system itself is also Year 2000 compliant. Year
2000 compliant means that the software does not produce errors in
date-related data or calculations or experience loss of functionality as a
result of the transition to the year 2000. This Year 2000 compliance
statement is not a product warranty. http-analyze is provided under the
terms of the license agreement included in each distribution.
Please see http://www.netstore.de/Supply/http-analyze/year2000.html for
more information about the Year 2000 compliance real-time tests we
ran with http-analyze.
DATE USAGE IN HTTP-ANALYZE
The analyzer depends on the timestamp found in the logfile entries
produced by a web server. For the NCSA Common Logfile Format and the
W3C Extended Logfile Format a Year 2000 compliant date format was chosen
from the beginning. This unique date format is, and always was,
required by http-analyze to be able to generate a statistics report, so
there are no problems other than those caused by your operating system
(see below).
To retain compatibility with previous versions of the log analyzer,
http-analyze generates two-digit years in some output filenames.
However, those files are placed in a subdirectory containing the year in
four digits, which makes all output filenames fully Year 2000 compliant.
The date format in the -I and -E options allows specification of a year
using only two digits. http-analyze interprets values greater than or
equal to 69 as years in the 1900s and values lower than 69 as years in
the 2000s. This way, the analyzer covers the whole range of the time
representation in modern operating systems. However, any year can always
be specified unambiguously by using four digits.
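The two-digit rule described above can be illustrated with a small shell
function. This is only a sketch of the stated pivot at 69, not the
analyzer's own code; the name expand_year is hypothetical, and passing
four-digit values through unchanged is an assumption made for the
example.

```shell
#!/bin/sh
# Hypothetical helper: expand a two-digit year using the pivot at 69.
expand_year() {
    y=$1
    # Strip leading zeros so that values like 08 are not parsed as octal.
    while [ "${#y}" -gt 1 ] && [ "${y#0}" != "$y" ]; do
        y=${y#0}
    done
    if [ "$y" -ge 100 ]; then
        echo "$y"                 # already unambiguous (assumed pass-through)
    elif [ "$y" -ge 69 ]; then
        echo $((1900 + y))        # 69..99 are taken as 1969..1999
    else
        echo $((2000 + y))        # 00..68 are taken as 2000..2068
    fi
}

expand_year 99    # prints 1999
expand_year 00    # prints 2000
```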
DATE USAGE IN THE OPERATING SYSTEM
Rumor has it that some systems don't recognize the Year 2000 as a leap
year. Although http-analyze itself computes leap years correctly, it
maps dates into weekdays using the localtime(3) function, which might
fail if the OS doesn't recognize the Year 2000 as a leap year.
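For reference, the Gregorian leap year rule that makes 2000 a leap year
can be written as a short shell test. This is a sketch of the general
rule only (divisible by 4, except century years not divisible by 400);
the function name is_leap_year is hypothetical and the code is not taken
from http-analyze.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the year is a leap year.
is_leap_year() {
    y=$1
    [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }
}

for year in 1900 1996 1999 2000; do
    if is_leap_year "$year"; then
        echo "$year is a leap year"
    else
        echo "$year is not a leap year"
    fi
done
```

A system that applies only the "century years are not leap years" part
of the rule would wrongly treat 2000 like 1900.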
Actually, there is a date-related function in modern operating systems,
which may cause problems after the year 2037. For those interested in the
technical details, here's why:
In operating systems the date is often represented in seconds since a
certain date. For example, in Unix systems the date is represented as
seconds since the birth of the OS on January 1st, 1970. This value is
stored in a signed long (4-byte) data object, so it can represent about
2147483648 (2^31) seconds, which equals 35791394 minutes = 596523 hours =
24855 days, or about 68 years. Therefore, most clocks in traditional Unix
systems will overflow on January 19th, 2038 if the OS is not updated
before this date. Since http-analyze uses several data structures
depending on the operating system's idea of the time (for example, the
tm_year variable contains the years since 1900), the software also has to
be updated before the year 2038 in order to take advantage of the time
representation in future OS versions.
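The arithmetic in the preceding paragraph can be reproduced with integer
division in the shell. This sketch assumes a shell whose arithmetic uses
at least 64-bit integers (typical on current systems) and, like the text,
ignores leap days by using 365-day years.

```shell
#!/bin/sh
# Reproduce the conversion of 2^31 seconds into larger units.
max_seconds=2147483648            # 2^31, the limit of a signed 4-byte value
minutes=$((max_seconds / 60))
hours=$((minutes / 60))
days=$((hours / 24))
years=$((days / 365))             # rough figure, leap days ignored

echo "$minutes minutes, $hours hours, $days days, about $years years"
echo "Overflow year: $((1970 + years))"
```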
ENVIRONMENT VARIABLES
Environment variables might work only in the Unix version of
http-analyze.
HA_LIBDIR   name of the library directory (default: /usr/local/lib/http-analyze)
HA_CONFIG   name of the configuration file for http-analyze (no default)
LANG        language to use if XPG4 MNLS support is compiled in (see -V)
HA_LANG     language to use if native MNLS support is compiled in (see -V)
FILES
The following required files are installed in the library directory as
defined by the environment variable HA_LIBDIR or the hard-coded default
defined at compile-time. See also the section Statistics Report above
for the names of the HTML output files.
btn/*.png         button files used in the statistics report
TLD               list of all top-level-domains
ha2.0*.png        http-analyze logos for your web site (for black and white bg)
logfmt.[cde]lf    sample logfiles in CLF, DLF and ELF format
3D*               required files for VRML model
SEE ALSO
rotate-httpd                                  shell script to rotate the web server's logfiles
http://www.netstore.de/Supply/http-analyze/   homepage of http-analyze
http://support.netstore.de/                   support site of http-analyze
NOTES
Logfile entries must be sorted in chronological order (ascending date)
when fed into the analyzer. If http-analyze detects logfile entries
from an older month between newer ones, it prints a warning and skips all
entries up to the date of the last entry processed. To sort the data
from several different logfiles into a chronologically sorted data
stream, we provide a utility ha-sort to our Commercial Service Licensees.
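Before feeding merged logfiles into the analyzer, the ordering can be
checked with standard tools. The sketch below is not ha-sort and does not
reproduce the analyzer's own check; the function name check_order is
hypothetical, it assumes entries carry a timestamp of the form
[dd/Mon/yyyy:hh:mm:ss zone] as in the NCSA Common Logfile Format, and it
ignores differences in the time zone field.

```shell
#!/bin/sh
# Hypothetical helper: report logfile entries that are older than the
# entry before them. Exit status is non-zero if any were found.
check_order() {
    awk '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++) month[name[i]] = sprintf("%02d", i)
    }
    {
        # The timestamp follows the first "[": dd/Mon/yyyy:hh:mm:ss
        start = index($0, "[")
        if (start == 0) next
        ts = substr($0, start + 1, 20)
        split(ts, p, "[/:]")
        # Build a fixed-width sortable key: yyyymmddhhmmss
        key = p[3] month[p[2]] p[1] p[4] p[5] p[6]
        if (key < prev) {
            printf "line %d is out of order: %s\n", NR, ts
            bad = 1
        }
        prev = key
    }
    END { exit bad }
    ' "$@"
}

# Demonstration with two made-up entries in the wrong order.
printf '%s\n' \
    'host - - [02/Jan/1999:16:39:39 +0100] "GET /abc/ HTTP/1.0" 200 910' \
    'host - - [01/Jan/1999:16:37:25 +0100] "GET / HTTP/1.0" 200 280' |
    check_order || echo "logfile needs sorting"
```

With file arguments, as in check_order access.01 access.02, the files are
checked as one concatenated stream.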
To improve the response time of web servers, DNS lookups are often
disabled. In this case http-analyze does not see any hostnames, but only
numerical IP addresses. To resolve the IP addresses into hostnames, we
provide a very fast DNS resolver ipresolve to our Commercial Service
Licensees, which does negative caching and saves all data in a history
file.
Please visit our support site at http://support.netstore.de/ for more
information about the available helper applications.
COPYRIGHT
Copyright (C) 1996-1999 by Stefan Stapelberg, RENT-A-GURU(R),
<stefan@rent-a-guru.de>
Please see the file LICENSE included in the distribution for the license
terms under which this program is made available to you in the free,
non-commercial version.
RENT-A-GURU(R) is a registered trademark of Martin Weitzel, Stefan
Stapelberg, and Walter Mecky.
Netstore(R) is a registered trademark of Stefan Stapelberg.
CREDITS
Thanks to the numerous users of http-analyze for their valuable
feedback. Special thanks to Lars-Owe Ivarsson for his suggestions to
optimize the parser algorithm and for the code he provided as an example.
Many thanks also to Thomas Boutell (http://www.boutell.com/) for his
great GD library for fast image creation, without which http-analyze
couldn't produce such fancy graphics in the statistics report.