Lucene Change Log
$Id: CHANGES.txt 629178 2008-02-19 18:27:34Z buschmi $
======================= Release 2.3.1 2008-02-22 =======================
Bug fixes
1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
documents have mixed term vectors (Suresh Guvvala via Mike
McCandless).
2. LUCENE-1171: Fixed some cases where OOM errors could cause
deadlock in IndexWriter (Mike McCandless).
3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
merging of stored fields is used (Yonik via Mike McCandless).
4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
offset, int len) that was ignoring offset and thus giving the
wrong answer. (Thomas Peuss via Mike McCandless)
5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
many merges at the end. (Mike McCandless)
6. LUCENE-1176: Fix corruption case when documents with no term
vector fields are added before documents with term vector fields.
(Mike McCandless)
7. LUCENE-1179: Fixed assert statement that was incorrectly
preventing Fields with empty-string field name from working.
(Sergey Kabashnyuk via Mike McCandless)
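To illustrate the kind of bug described in item 4 above (LUCENE-1163), here is a minimal standalone sketch. It is not Lucene's implementation; the class and method names are hypothetical. It shows that a comparison over a slice of a char[] has to start reading at offset, and contrasts that with a variant that ignores offset.

```java
// Hypothetical helper, not part of Lucene: compares a slice of a char[]
// against a stored entry.
final class CharSliceMatch {
    /** Correct version: reads buffer[offset .. offset+len). */
    static boolean matches(char[] entry, char[] buffer, int offset, int len) {
        if (entry.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (entry[i] != buffer[offset + i]) {
                return false;
            }
        }
        return true;
    }

    /** Buggy version for contrast: ignores offset and always reads from index 0. */
    static boolean matchesIgnoringOffset(char[] entry, char[] buffer, int offset, int len) {
        if (entry.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (entry[i] != buffer[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the buffer "xxfoo", offset 2 and length 3, the first method finds "foo" while the second compares "xxf" and reports a miss.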
======================= Release 2.3.0 2008-01-23 =======================
Changes in runtime behavior
1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
out-of-the-box indexing speed. First, IndexWriter now flushes by
RAM usage (16 MB by default) instead of a fixed doc count (call
IndexWriter.setMaxBufferedDocs to get backwards compatible
behavior). Second, ConcurrentMergeScheduler is used to run merges
using background threads (call IndexWriter.setMergeScheduler(new
SerialMergeScheduler()) to get backwards compatible behavior).
Third, merges are chosen based on size in bytes of each segment
rather than document count of each segment (call
IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
backwards compatible behavior).
NOTE: users of ParallelReader must change back all of these
defaults in order to ensure the docIDs "align" across all parallel
indices.
(Mike McCandless)
2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
the field type for sorting automatically, numbers used to be
interpreted as int, then as float, if parsing the number as an int
failed. Now the detection checks for int, then for long,
then for float. (Daniel Naber)
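The detection order described in item 2 above (LUCENE-1045) can be sketched as follows. This is a standalone illustration, not Lucene's code: the class name is hypothetical, and the final fallback to "string" is an assumption about what happens when no numeric type parses.

```java
// Hypothetical sketch, not Lucene's code: picks the first numeric type
// that can parse the text, trying int, then long, then float.
final class AutoTypeDetect {
    static String detect(String text) {
        try {
            Integer.parseInt(text);
            return "int";
        } catch (NumberFormatException notAnInt) {
            // fall through to the next candidate type
        }
        try {
            Long.parseLong(text);
            return "long";
        } catch (NumberFormatException notALong) {
            // fall through to the next candidate type
        }
        try {
            Float.parseFloat(text);
            return "float";
        } catch (NumberFormatException notAFloat) {
            return "string"; // assumed fallback, not stated in the changelog entry
        }
    }
}
```

A value such as "4294967296" does not fit in an int, so under the old int-then-float order it would have been treated as a float; with the long check in between it is detected as a long.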
API Changes
1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
IndexWriter flush whenever the buffered documents are using more
than the specified amount of RAM. Also added new APIs to Token
that allow one to set a char[] plus offset and length to specify a
token (to avoid creating a new String() for each Token). (Mike
McCandless)
2. LUCENE-963: Add setters to Field to allow for re-using a single
Field instance during indexing. This is a sizable performance
gain, especially for small documents. (Mike McCandless)
3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
permit re-using of Token and TokenStream instances during
indexing. Changed Token to use a char[] as the store for the
termText instead of String. This gives faster tokenization
performance (~10-15%). (Mike McCandless)
4. LUCENE-847: Factored MergePolicy, which determines which merges
should take place and when, as well as MergeScheduler, which
determines when the selected merges should actually run, out of
IndexWriter. The default merge policy is now
LogByteSizeMergePolicy (see LUCENE-845) and the default merge
scheduler is now ConcurrentMergeScheduler (see
LUCENE-870). (Steven Parkes via Mike McCandless)
5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
that allows you to reduce memory usage of the termInfos by further
sub-sampling (over the termIndexInterval that was used during
indexing) which terms are loaded into memory. (Chuck Williams,
Doug Cutting via Mike McCandless)
6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
existing IndexReader (see New features -> 8.) (Michael Busch)
7. LUCENE-1062: Add setData(byte[] data),
setData(byte[] data, int offset, int length), getData(), getOffset()
and clone() methods to o.a.l.index.Payload. Also add the field name
as arg to Similarity.scorePayload(). (Michael Busch)
8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
"partially optimize" an index down to maxNumSegments segments.
(Mike McCandless)
9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
10. LUCENE-1064: Changed TopDocs constructor to be public.
(Shai Erera via Michael Busch)
11. LUCENE-1079: DocValues cleanup: constructor now has no params,
and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
the Object (if any) that was bumped from the queue to allow
re-use. (Shai Erera via Mike McCandless)
13. LUCENE-1101: Token reuse 'contract' (defined in LUCENE-969)
modified so it is token producer's responsibility
to call Token.clear(). (Doron Cohen)
14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
255 characters) tokens. You can increase this limit by calling
StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)
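The idea behind insertWithOverflow in item 12 above (LUCENE-1089) can be sketched with a small bounded heap. This is a hypothetical stand-in built on java.util.PriorityQueue, not Lucene's PriorityQueue class. Returning the rejected element itself when it is too small to enter the queue is an assumption of this sketch.

```java
// Hypothetical stand-in, not Lucene's PriorityQueue: a bounded min-heap whose
// insertWithOverflow returns whichever element did not stay in the queue.
final class BoundedQueue<T extends Comparable<T>> {
    private final int maxSize;
    private final java.util.PriorityQueue<T> heap = new java.util.PriorityQueue<T>();

    BoundedQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    /**
     * Returns null if the queue still had room, otherwise the element that
     * was bumped from the queue or (assumption) the rejected argument itself.
     */
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T least = heap.peek();
        if (least != null && element.compareTo(least) > 0) {
            heap.poll();
            heap.add(element);
            return least;   // bumped from the queue; the caller may re-use it
        }
        return element;     // never entered the queue
    }

    int size() {
        return heap.size();
    }
}
```

Because the displaced object is handed back, a caller collecting the top N hits can overwrite its fields and submit it again instead of allocating a new object for every candidate.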
Bug fixes
1. LUCENE-933: QueryParser fixed to not produce empty sub
BooleanQueries "()" even if the Analyzer produced no
tokens for input. (Doron Cohen)
2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
first term in the dictionary. (Michael Busch)
3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
that was thrown after a call of TermPositions.seek().
(Rich Johnson via Michael Busch)
4. LUCENE-938: Fixed cases where an unhandled exception in
IndexWriter's methods could cause deletes to be lost.
(Steven Parkes via Mike McCandless)
5. LUCENE-962: Fixed case where an unhandled exception in
IndexWriter.addDocument or IndexWriter.updateDocument could cause
unreferenced files in the index to not be deleted
(Steven Parkes via Mike McCandless)
6. LUCENE-957: RAMDirectory fixed to properly handle directories
larger than Integer.MAX_VALUE. (Doron Cohen)
7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
isOptimized() or getVersion() is called. Separated MultiReader
into two classes: MultiSegmentReader extends IndexReader, is
package-protected and is created automatically by IndexReader.open()
in case the index has multiple segments. The public MultiReader
now extends MultiSegmentReader and is intended to be used by users
who want to add their own subreaders. (Daniel Naber, Michael Busch)
8. LUCENE-970: FilterIndexReader now implements isOptimized(). Previously,
a call to isOptimized() would throw an NPE. (Michael Busch)
9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
isOptimized() or getVersion() is called. (Michael Busch)
10. LUCENE-948: Fix FileNotFoundException caused by stale NFS client
directory listing caches when writers on different machines are
sharing an index over NFS and using a custom deletion policy (Mike
McCandless)
11. LUCENE-978: Ensure TermInfosReader and FieldsReader
close any streams they had opened if an exception is hit in the
constructor. (Ning Li via Mike McCandless)
12. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
we now throw an IllegalArgumentException saying the term is too
long, instead of cryptic ArrayIndexOutOfBoundsException. (Karl
Wettin via Mike McCandless)
13. LUCENE-991: The explain() method of BoostingTermQuery had errors
when no payloads were present on a document. (Peter Keegan via
Grant Ingersoll)
14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
(this was broken by LUCENE-843). (Ning Li via Mike McCandless)
15. LUCENE-1008: Fixed corruption case when document with no term
vector fields is added after documents with term vector fields.
This bug was introduced with LUCENE-843. (Grant Ingersoll via
Mike McCandless)
16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
length quoted string.) (yonik)
17. LUCENE-1010: Fixed corruption case when document with no term
vector fields is added after documents with term vector fields.
This case is hit during merge and would cause an EOFException.
This bug was introduced with LUCENE-984. (Andi Vajda via Mike
McCandless)
19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
autoCommit=false and documents are using stored fields and/or term
vectors. (Mark Miller via Mike McCandless)
20. LUCENE-1011: Fixed corruption case when two or more machines,
sharing an index over NFS, can be writers in quick succession.
(Patrick Kimber via Mike McCandless)
21. LUCENE-1028: Fixed Weight serialization for a few queries:
DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
Serialization check added for all queries.
(Kyle Maxwell via Doron Cohen)
22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
timeout argument is very large (eg Long.MAX_VALUE). Also added
Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout. (Nikolay
Diakov via Mike McCandless)
23. LUCENE-1050: Throw LockReleaseFailedException in
Simple/NativeFSLockFactory if we fail to delete the lock file when
releasing the lock. (Nikolay Diakov via Mike McCandless)
24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
the merged segment. (Michael Busch)
25. LUCENE-1042: Remove throwing of IOException in
getTermFreqVector(int, String, TermVectorMapper) to be consistent
with other getTermFreqVector calls. Also removed the throwing of
the other IOException in that method to be consistent. (Karl Wettin
via Grant Ingersoll)
26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
while iterating the hits. Deleting docs already retrieved
now works seamlessly. If docs not yet retrieved are deleted
(e.g. from another thread), and then, relying on the initial
Hits.length(), an application attempts to retrieve more hits
than actually exist, a ConcurrentModificationException
is thrown. (Doron Cohen)
27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
the type of some tokens incorrectly. This is done by adding a new flag named
replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
this flag to true fixes the problem. This flag is a temporary fix and is already
marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
LUCENE-1140: Fixed NPE caused by 1068 (Alexei Dets via Grant Ingersoll)
28. LUCENE-749: ChainedFilter behavior fixed when logic of
first filter is ANDNOT. (Antonio Bruno via Doron Cohen)
29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
term) after next() returns false. (Steven Tamm via Mike
McCandless)
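The behavior described in item 12 above (LUCENE-985) amounts to validating the term length up front. The following is a minimal standalone sketch, not Lucene's code: the class name, method signature and message text are hypothetical, and only the 16383-character limit comes from the entry itself.

```java
// Hypothetical sketch, not Lucene's code: reject over-long terms early with a
// descriptive exception instead of letting an array index fail later.
final class TermLengthCheck {
    // Limit taken from the changelog entry: terms longer than this are rejected.
    static final int MAX_TERM_LENGTH = 16383;

    static void check(String field, int termLength) {
        if (termLength > MAX_TERM_LENGTH) {
            throw new IllegalArgumentException(
                "term in field \"" + field + "\" is too long: " + termLength
                + " chars (limit " + MAX_TERM_LENGTH + ")");
        }
    }
}
```

Failing at the point of validation names the field and the length, which is far easier to act on than an ArrayIndexOutOfBoundsException raised deep inside the indexing code.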
New features
1. LUCENE-906: Elision filter for French.
(Mathieu Lecarme via Otis Gospodnetic)
2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
not only filtering, but knowing where in a Document a Filter matches
(Grant Ingersoll)
3. LUCENE-868: Added new Term Vector access features. New callback
mechanism allows application to define how and where to read Term
Vectors from disk. This implementation contains several extensions
of the new abstract TermVectorMapper class. The new API should be
back-compatible. No changes in the actual storage of Term Vectors
have taken place.
3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
to provide information about what document is being accessed.
(Karl Wettin via Grant Ingersoll)
4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
position based lookup of term vector information.
See item #3 above (LUCENE-868).
5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
to verify that locking is working properly. LockVerifyServer runs
a separate server to verify locks. LockStressTest runs a simple
tool that rapidly obtains and releases locks.
VerifyingLockFactory is a LockFactory that wraps any other
LockFactory and consults the LockVerifyServer whenever a lock is
obtained or released, throwing an exception if an illegal lock
obtain occurred. (Patrick Kimber via Mike McCandless)
6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
support doubles and longs. Added support into SortField for sorting
on doubles and longs as well. (Grant Ingersoll)
7. LUCENE-1020: Created basic index checking & repair tool
(o.a.l.index.CheckIndex). When run without -fix it does a
detailed test of all segments in the index and reports summary
information and any errors it hit. With -fix it will remove
segments that had errors. (Mike McCandless)
8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
existing IndexReader by only loading those portions of an index
that have changed since the reader was (re)opened. reopen() can
be significantly faster than open(), depending on the amount of
index changes. SegmentReader, MultiSegmentReader, MultiReader,
and ParallelReader implement reopen(). (Michael Busch)
9. LUCENE-1040: Added CharArraySet, useful for efficiently checking
set membership of text specified by char[]. (yonik)
10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
live backup of an index without pausing indexing. (Mike
McCandless)
11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
ValueSource queries. (Kyle Maxwell via Doron Cohen)
12. LUCENE-1095: Added an option to StopFilter to increase
positionIncrement of the token succeeding a stopped token.
Disabled by default. Similar option added to QueryParser
to consider token positions when creating PhraseQuery
and MultiPhraseQuery. Disabled by default (so by default
the query parser ignores position increments).
(Doron Cohen)
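The StopFilter option in item 12 above (LUCENE-1095) can be pictured with a small standalone sketch. It is not Lucene's StopFilter; the class and method names are hypothetical, and it assumes every incoming token starts with a position increment of 1.

```java
// Hypothetical sketch, not Lucene's StopFilter: computes the position
// increment of each surviving token after stop words are dropped.
final class StopIncrements {
    static int[] increments(String[] tokens, java.util.Set<String> stopWords,
                            boolean enablePositionIncrements) {
        // First pass: count the tokens that survive filtering.
        int kept = 0;
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept++;
            }
        }
        int[] result = new int[kept];
        int skipped = 0; // stop words dropped since the last surviving token
        int out = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            // When enabled, the gap left by dropped stop words is added to
            // the next surviving token; when disabled, the gap is ignored.
            result[out++] = enablePositionIncrements ? 1 + skipped : 1;
            skipped = 0;
        }
        return result;
    }
}
```

For the tokens "the quick and the fox" with stop words "the" and "and", the surviving tokens "quick" and "fox" get increments 2 and 3 when the option is enabled, and 1 and 1 when it is disabled.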
Optimizations
1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
Tokens that are cached in the LinkedList. This increases performance
significantly, especially when the number of Tokens is large.
(Mark Miller via Michael Busch)
2. LUCENE-843: Substantial optimizations to improve how IndexWriter
uses RAM for buffering documents and to speed up indexing (2X-8X
faster). A single shared hash table now records the in-memory
postings per unique term and is directly flushed into a single
segment. (Mike McCandless)
3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
takes place when using compound files. (Mike McCandless)
4. LUCENE-959: Remove synchronization in Document (yonik)
5. LUCENE-963: Add setters to Field to allow for re-using a single
Field instance during indexing. This is a sizable performance
gain, especially for small documents. (Mike McCandless)
6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
and don't rely on exceptions. (Michael Busch)
7. LUCENE-966: Very substantial speedups (~6X faster) for
StandardTokenizer (StandardAnalyzer) by using JFlex instead of
JavaCC to generate the tokenizer.
(Stanislaw Osinski via Mike McCandless)
8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
TokenStream instances when possible to improve tokenization
performance (~10-15%). (Mike McCandless)
9. LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
McCandless)
10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
now extend DirectoryIndexReader and are the only IndexReader
implementations that use SegmentInfos to access an index and
acquire a write lock for index modifications. (Michael Busch)
11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
either RAM usage or document count or both (whichever comes
first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
one of the flush triggers. (Ning Li via Mike McCandless)
12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
raw bytes for each contiguous range of non-deleted documents.
(Robert Engels via Mike McCandless)
13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
documents, and a slight performance increase for top level
conjunctions. (yonik)
14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
and final. (Nathan Beyer via Michael Busch)
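The bulk-copy idea in item 12 above (LUCENE-1043) depends on finding each contiguous run of non-deleted documents. The following standalone sketch shows only that step. It is not Lucene's merge code; the class name and the boolean[] representation of deletions are assumptions made for illustration.

```java
// Hypothetical sketch, not Lucene's merger: finds each contiguous run of
// non-deleted documents so that a run can be copied in one bulk operation.
final class LiveRanges {
    /** Returns {start, endExclusive} pairs covering all non-deleted doc IDs. */
    static java.util.List<int[]> find(boolean[] deleted) {
        java.util.List<int[]> ranges = new java.util.ArrayList<int[]>();
        int i = 0;
        while (i < deleted.length) {
            if (deleted[i]) {
                i++;
                continue;
            }
            int start = i;
            // Extend the run until the next deleted doc or the end of the segment.
            while (i < deleted.length && !deleted[i]) {
                i++;
            }
            ranges.add(new int[] { start, i });
        }
        return ranges;
    }
}
```

A segment with few deletions yields a handful of long ranges, so most of the stored-field bytes can be moved with a few large copies rather than one copy per document.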
Documentation
1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
classes, as well as a unified view. Also add an appropriate menu
structure to the website. (Michael Busch)
2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
(Ronnie Kolehmainen via Michael Busch)
Build
1. LUCENE-908: Improvements and simplifications for how the MANIFEST
file and the META-INF dir are created. (Michael Busch)
2. LUCENE-935: Various improvements for the maven artifacts. Now the
artifacts also include the sources as .jar files. (Michael Busch)
3. Added apply-patch target to top-level build. Defaults to looking for
a patch in ${basedir}/../patches with name specified by -Dpatch.name.
Can also specify any location by -Dpatch.file property on the command
line. This should be helpful for easy application of patches, but it
is also a step towards integrating automatic patch application with
JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)
4. LUCENE-935: Defined property "m2.repository.url" to allow setting
the url to a maven remote repository to deploy to. (Michael Busch)
5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
6. LUCENE-1055: Remove gdata-server from build files and its sources
from trunk. (Michael Busch)
7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
via scp and ssh authentication. (Michael Busch)
8. LUCENE-1123: Allow overriding the specification version for
MANIFEST.MF (Michael Busch)
Test Cases
1. LUCENE-766: Test adding two fields with the same name but different
term vector setting. (Nicolas Lalevée via Doron Cohen)
======================= Release 2.2.0 2007-06-19 =======================
Changes in runtime behavior
API Changes
1. LUCENE-793: created new exceptions and added them to throws clause
for many methods (all subclasses of IOException for backwards
compatibility): index.StaleReaderException,
index.CorruptIndexException, store.LockObtainFailedException.
This was done to better call out the possible root causes of an
IOException from these methods. (Mike McCandless)
2. LUCENE-811: make SegmentInfos class, plus a few methods from related
classes, package-private again (they were unnecessarily made public
as part of LUCENE-701). (Mike McCandless)
3. LUCENE-710: added optional autoCommit boolean to IndexWriter
constructors. When this is false, index changes are not committed
until the writer is closed. This gives explicit control over when
a reader will see the changes. Also added optional custom
deletion policy to explicitly control when prior commits are
removed from the index. This is intended to allow applications to
share an index over NFS by customizing when prior commits are
deleted. (Mike McCandless)
4. LUCENE-818: changed most public methods of IndexWriter,
IndexReader (and its subclasses), FieldsReader and RAMDirectory to
throw AlreadyClosedException if they are accessed after being
closed. (Mike McCandless)
5. LUCENE-834: Changed some access levels for certain Span classes to allow them
to be overridden. They have been marked expert only and not for public
consumption. (Grant Ingersoll)
6. LUCENE-796: Removed calls to super.* from various get*Query methods in
MultiFieldQueryParser, in order to allow sub-classes to override them.
(Steven Parkes via Otis Gospodnetic)
7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
combination when caching is desired.
(Chris Hostetter, Otis Gospodnetic)
8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
to enable extensibility of these classes. (Michael Busch)
9. LUCENE-580: Added the public method reset() to TokenStream. This method does
nothing by default, but may be overridden by subclasses to support consuming
the TokenStream more than once. (Michael Busch)
10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
argument, available as tokenStreamValue(). This is useful to avoid the need for
"dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
11. LUCENE-730: Added the new methods setAllowDocsOutOfOrder() and
getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods setUseScorer14() and
getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
improves performance for certain queries but results in scoring out of docid
order. This patch reverses that change, so now by default hit docs are scored
in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
This patch also re-enables the tests in QueryUtils that check for docid
order. (Paul Elschot, Doron Cohen, Michael Busch)
12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
to optionally specify the size of the read buffer. Also added
BufferedIndexInput.setBufferSize(int) to change the buffer size.
(Mike McCandless)
13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
to be public because it implements the public interface TermPositionVector.
(Michael Busch)
Bug fixes
1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)
2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
Query parser modified to create a prefix query only for the case
that there is a single trailing wildcard (and no additional wildcard
or '?' in the query text). (Doron Cohen)
3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
and SimpleFSLockFactory. This enables all 4 builtin LockFactory
implementations to be specified via the System property
org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)
4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
failed to reduce the number of open descriptors since it was still
opened once per field with norms. (yonik)
5. LUCENE-823: Make sure internal file handles are closed when
hitting an exception (eg disk full) while flushing deletes in
IndexWriter's mergeSegments, and also during
IndexWriter.addIndexes. (Mike McCandless)
6. LUCENE-825: If directory is removed after
FSDirectory.getDirectory() but before IndexReader.open you now get
a FileNotFoundException like Lucene pre-2.1 (before this fix you
got an NPE). (Mike McCandless)
7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
because the backslash is the escape character. Also changed the ESCAPED_CHAR
list to contain all possible characters, because every character that
follows a backslash should be considered as escaped. (Michael Busch)
8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
is consumed. Now a ParseException is thrown if a query contains too many
closing parentheses. (Andreas Neumann via Michael Busch)
9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
Now also deleting all javacc generated files before calling javacc.
(Steven Parkes, Doron Cohen)
10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
11. LUCENE-828: Minor fix for Term's equals().
(Paul Cowan via Otis Gospodnetic)
12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
and you call addIndexes, and hit an exception (eg disk full) then
when IndexWriter rolls back its internal state this could corrupt
the instance of IndexWriter (but, not the index itself) by
referencing already deleted segments. This bug was only present
in 2.2 (trunk), ie was never released. (Mike McCandless)
13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
Note that as before this fix, creating a multiSearcher from Searchers for whom custom similarity
was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
has written the postings. Then the resources associated with the
TokenStreams can safely be released. (Michael Busch)
16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
won't insert terms twice anymore. (Daniel Naber)
17. LUCENE-881: QueryParser.escape() now also escapes the characters
'|' and '&' which are part of the queryparser syntax. (Michael Busch)
18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed to
STDERR and ignored, but are re-thrown instead. Some javadoc improvements.
(Daniel Naber)
19. LUCENE-698: FilteredQuery now takes the query boost into account for
scoring. (Michael Busch)
20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
enumeration. (Christian Mallwitz via Daniel Naber)
21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
Explanation tests now "deep" check the explanation details.
(Chris Hostetter, Doron Cohen)
22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
skip target param and ends up at the first match.
(Sudaakeran B. via Chris Hostetter & Doron Cohen)
23. LUCENE-913: Two consecutive score() calls return different
scores for Boolean Queries. (Michael Busch, Doron Cohen)
24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
box", again, by moving set/getMaxMergeDocs up from
LogDocMergePolicy into LogMergePolicy. This fixes the API
breakage (non backwards compatible change) caused by LUCENE-994.
(Yonik Seeley via Mike McCandless)
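The rule stated in item 2 above (LUCENE-813) can be sketched as follows.
This is a minimal illustration only, not the query parser's actual code:
the class and method names are hypothetical, backslash escapes are
ignored, and treating a lone "*" as a non-prefix query is an assumption.

```java
// Hypothetical helper illustrating the LUCENE-813 rule; not Lucene code.
final class PrefixRuleSketch {

    /**
     * Returns true only when the text ends with a single '*' and contains
     * no other '*' or '?' anywhere. Escaped characters are not considered.
     */
    static boolean isPrefixQuery(String text) {
        int firstStar = text.indexOf('*');
        boolean hasQuestionMark = text.indexOf('?') >= 0;
        // If the first '*' is the last character, it is also the only '*'.
        boolean singleTrailingStar = firstStar >= 0 && firstStar == text.length() - 1;
        // Assumption: a bare "*" has no prefix, so it is not a prefix query here.
        return singleTrailingStar && !hasQuestionMark && text.length() > 1;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"luc*", "*ene*", "lu?ene*", "lucene"}) {
            System.out.println(text + " -> prefix query: " + isPrefixQuery(text));
        }
    }
}
```

Under this sketch, a text with a leading and a trailing wildcard stays a
wildcard query instead of being reduced to a prefix query.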
New features
1. LUCENE-759: Added two n-gram-producing TokenFilters.
(Otis Gospodnetic)
2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
These metadata are called Payloads. For every position of a Token one Payload in the form
of a variable length byte array can be stored in the prox file.
Remark: The APIs introduced with this feature are in experimental state and thus
contain appropriate warnings in the javadocs.
(Michael Busch)
4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
values of a payload (see #3 above.) (Grant Ingersoll)
5. LUCENE-834: Similarity has a new method for scoring payloads called
scorePayloads that can be overridden to take advantage of payload
storage (see #3 above)
6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
implemented it in the appropriate places (Grant Ingersoll)
7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
on the remote side of the RMI connection.
(Matt Ericson via Otis Gospodnetic)
8. LUCENE-446: Added Solr's search.function for scores based on field
values, plus CustomScoreQuery for simple score (post) customization.
(Yonik Seeley, Doron Cohen)
9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and
SinkTokenizer which can be used to share tokens between two or more Fields
such that the other Fields do not have to go through the whole Analysis
process over again. For instance, if you have two Fields that share all the
same analysis steps except one lowercases tokens and the other does not, you
can coordinate the operations between the two using the TeeTokenFilter and
the SinkTokenizer. See TeeSinkTokenTest.java for examples.
(Grant Ingersoll, Michael Busch, Yonik Seeley)
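For readers unfamiliar with the n-grams mentioned in item 1 above
(LUCENE-759), the following sketch shows what character n-grams and
front-edge n-grams of a single term look like. It is plain Java and
does not use the Lucene TokenFilter API; the class and method names
are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-gram generation; not the Lucene filters.
final class NGramSketch {

    /** All contiguous character n-grams of length n, left to right. */
    static List<String> ngrams(String term, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= term.length(); start++) {
            grams.add(term.substring(start, start + n));
        }
        return grams;
    }

    /** Prefixes of the term with lengths minGram..maxGram (front-edge n-grams). */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= term.length(); len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 2));        // [lu, uc, ce, en, ne]
        System.out.println(edgeNGrams("lucene", 1, 3)); // [l, lu, luc]
    }
}
```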
Optimizations
1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
when nextPosition() is called for the first time. This allows using instances
of SegmentTermPositions instead of SegmentTermDocs without additional costs.
(Michael Busch)
2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
IndexOutput directly now. This avoids further buffering and thus avoids
unnecessary array copies. (Michael Busch)
3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
cases and possibly improve scoring performance. Documents can now be
delivered out-of-order as they are scored (e.g. to HitCollector).
N.B. A bit of code had to be disabled in QueryUtils in order for
TestBoolean2 test to keep passing.
(Paul Elschot via Otis Gospodnetic)
4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
them to keep the spell index small. (Daniel Naber)
5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
Together with LUCENE-888 this allows the buffer size to be adjusted
dynamically. (Paul Elschot, Michael Busch)
6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
BufferedIndexOutput. Also increase buffer size in
BufferedIndexInput, but only when used during merging. Together,
these increases yield 10-18% overall performance gain vs the
previous 1K defaults. (Mike McCandless)
7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
up most queries that use skipTo(), especially on big indexes with large posting
lists. For average AND queries the speedup is about 20%, for queries that
contain very frequent and very rare terms the speedup can be over 80%.
(Michael Busch)
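The idea behind item 7 above (LUCENE-866) can be sketched in a few
lines. This is not Lucene's implementation: Lucene stores skip data in
the index files, whereas this sketch keeps the postings in an int[]
and treats every interval-th entry (and every interval^2-th entry,
and so on) as an implicit skip point. The class name and the interval
and levels parameters are assumptions made for illustration.

```java
// Hypothetical in-memory illustration of multi-level skipping; not Lucene code.
final class MultiLevelSkipSketch {
    private final int[] docs;    // sorted doc ids of one posting list
    private final int interval;  // assumed skip interval, >= 2
    private final int levels;    // assumed number of skip levels, >= 1

    MultiLevelSkipSketch(int[] sortedDocs, int interval, int levels) {
        if (interval < 2 || levels < 1) {
            throw new IllegalArgumentException("interval must be >= 2 and levels >= 1");
        }
        this.docs = sortedDocs.clone();
        this.interval = interval;
        this.levels = levels;
    }

    /** Index of the first posting whose doc id is >= target, or docs.length if none. */
    int skipTo(int target) {
        int pos = 0;
        // Start with the coarsest level and refine with finer levels.
        for (int level = levels - 1; level >= 0; level--) {
            long step = stepAt(level);
            while (pos + step < docs.length && docs[(int) (pos + step)] < target) {
                pos += step;
            }
        }
        // Finish with a short linear scan inside the last interval.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos;
    }

    /** Step width at a level: interval^(level + 1), capped once it exceeds the list. */
    private long stepAt(int level) {
        long step = interval;
        for (int i = 0; i < level; i++) {
            step *= interval;
            if (step > docs.length) {
                break; // a step this large can never be taken
            }
        }
        return step;
    }
}
```

With a single level, a long posting list still needs many skip steps to
reach a distant target; each additional level divides that number by
roughly the skip interval, which is why the gain is largest on big
posting lists.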
Documentation
1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
http://wiki.apache.org/lucene-java/ Updated the links in the docs and
wherever else I found references. (Grant Ingersoll, Joe Schaefer)
2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
consistent with java.util.Comparator.compare(): Any integer is allowed to
be returned instead of only -1/0/1.
(Paul Cowan via Michael Busch)
3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
Solved javadoc errors under jdk5 (jars in path for gdata).
Made "javadocs" target depend on "build-contrib" for first downloading
contrib jars configured for dynamic download. (Note: when running
behind a firewall, a firewall prompt might pop up) (Doron Cohen)
4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
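The contract referred to in item 2 above (LUCENE-807) means callers
should test only the sign of a comparison result. The sketch below
uses java.util.Comparator with String.compareTo to show a result
other than -1/0/1; the class and method names are hypothetical and
ScoreDocComparator itself is not used.

```java
import java.util.Comparator;

// Hypothetical illustration of the compare() contract; not Lucene code.
final class CompareSignSketch {
    static final Comparator<String> BY_TEXT = String::compareTo;

    /** Correct: relies only on the sign of the result. */
    static boolean isBefore(String a, String b) {
        return BY_TEXT.compare(a, b) < 0;
    }

    /** Fragile: assumes "less than" is always reported as exactly -1. */
    static boolean isBeforeFragile(String a, String b) {
        return BY_TEXT.compare(a, b) == -1;
    }

    public static void main(String[] args) {
        System.out.println(BY_TEXT.compare("a", "c"));  // -2, not -1
        System.out.println(isBefore("a", "c"));         // true
        System.out.println(isBeforeFragile("a", "c"));  // false
    }
}
```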
Build
1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
(Steven Parkes via Michael Busch)
2. LUCENE-885: "ant test" now includes all contrib tests. The new
"ant test-core" target can be used to run only the Core (non
contrib) tests.
(Chris Hostetter)
3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
(Doron Cohen)
4. LUCENE-894: Add custom build file for binary distributions that includes
targets to build the demos. (Chris Hostetter, Michael Busch)
5. LUCENE-904: The "package" targets in build.xml now also generate .md5
checksum files. (Chris Hostetter, Michael Busch)
6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
demo war, demo jar, and the contrib jars. (Michael Busch)
7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
(Chris Hostetter, Michael Busch)
9. LUCENE-930: Various contrib building improvements to ensure contrib
dependencies are met, and test compilation errors fail the build.
(Steven Parkes, Chris Hostetter)
10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
of the Lucene core and the contrib modules.
(Sami Siren, Karl Wettin, Michael Busch)
======================= Release 2.1.0 2007-02-14 =======================
Changes in runtime behavior
1. 's' and 't' have been removed from the list of default stopwords
in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
as a stopword meant that 's-class' led to the same results as 'class'.
Note that this problem still exists for 'a', e.g. in 'a-class' as
'a' continues to be a stopword.
(Daniel Naber)
2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
(now split into CJ and K) in StandardAnalyzer. (John Wang and
Steven Rowe via Otis Gospodnetic)
3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
and added a few more of them to increase CJK character coverage.
Also documented some of the ranges.
(Otis Gospodnetic)
4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
QueryParser. Default is to disallow them, as before.
(Steven Parkes via Otis Gospodnetic)
5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
for range queries. Added useOldRangeQuery property to QueryParser to allow
selection of old RangeQuery class if required.
(Mark Harwood)
6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
does not contain a wildcard character (? or *), when previously a
StringIndexOutOfBoundsException was thrown.
(Michael Busch via Erik Hatcher)
7. LUCENE-726: Removed the use of deprecated doc.fields() method and
Enumeration.
(Michael Busch via Otis Gospodnetic)
8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
and added a call to enumerators.remove() in TermInfosReader.close().
The finalize() overrides were added to help with a pre-1.4.2 JVM bug
that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
(Otis Gospodnetic)
9. LUCENE-771: The default location of the write lock is now the
index directory, and is named simply "write.lock" (without a big
digest prefix). The system properties "org.apache.lucene.lockDir"
and "java.io.tmpdir" are no longer used as the global directory
for storing lock files, and the LOCK_DIR field of FSDirectory is
now deprecated. (Mike McCandless)
New features
1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
(Samphan Raruenrom via Chris Hostetter)
2. LUCENE-545: New FieldSelector API and associated changes to
IndexReader and implementations. New Fieldable interface for use
with the lazy field loading mechanism. (Grant Ingersoll and Chuck
Williams via Grant Ingersoll)
3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
Smolsky, Yonik Seeley)
4. LUCENE-678: Added NativeFSLockFactory, which implements locking
using OS native locking (via java.nio.*). (Michael McCandless via
Yonik Seeley)
5. LUCENE-544: Added the ability to specify different boosts for
different fields when using MultiFieldQueryParser (Matt Ericson
via Otis Gospodnetic)
6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
optimize the index when adding new segments, only performing
merges as needed. (Ning Li via Yonik Seeley)
7. LUCENE-573: QueryParser now allows backslash escaping in
quoted terms and phrases. (Michael Busch via Yonik Seeley)
8. LUCENE-716: QueryParser now allows specification of Unicode
characters in terms via a unicode escape of the form \uXXXX
(Michael Busch via Yonik Seeley)
9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
and IndexWriter.flushRamSegments(), allowing applications to
control the amount of memory used to buffer documents.
(Chuck Williams via Yonik Seeley)
10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
(Yonik Seeley)
11. LUCENE-741: Command-line utility for modifying or removing norms
on fields in an existing index. This is mostly based on LUCENE-496
and lives in contrib/miscellaneous.
(Chris Hostetter, Otis Gospodnetic)
12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
their passing unit tests.
(Otis Gospodnetic)
13. LUCENE-565: Added methods to IndexWriter to more efficiently
handle updating documents (the "delete then add" use case). This
is intended to be an eventual replacement for the existing
IndexModifier. Added IndexWriter.flush() (renamed from
flushRamSegments()) to flush all pending updates (held in RAM), to
the Directory. (Ning Li via Mike McCandless)
14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
which allow one to retrieve the size of a field without retrieving the
actual field. (Chuck Williams via Grant Ingersoll)
15. LUCENE-799: Properly handle lazy, compressed fields.
(Mike Klaas via Grant Ingersoll)
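The escape forms described in items 7 and 8 above (LUCENE-573,
LUCENE-716) can be illustrated with a small decoder. This is a sketch
under stated assumptions, not the QueryParser grammar: the class and
method names are hypothetical, and it only handles a backslash
followed by a single character or by 'u' and four hex digits.

```java
// Hypothetical decoder for backslash escapes; not the QueryParser implementation.
final class EscapeSketch {

    /** Replaces backslash escapes in the input with the characters they denote. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            if (i + 1 >= input.length()) {
                throw new IllegalArgumentException("dangling backslash at end of input");
            }
            char next = input.charAt(i + 1);
            if (next == 'u') {
                // Unicode form: backslash, 'u', then exactly four hex digits.
                if (i + 6 > input.length()) {
                    throw new IllegalArgumentException("truncated unicode escape");
                }
                int code = 0;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(input.charAt(k), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("invalid hex digit in unicode escape");
                    }
                    code = code * 16 + digit;
                }
                out.append((char) code);
                i += 6;
            } else {
                // Any other escaped character stands for itself.
                out.append(next);
                i += 2;
            }
        }
        return out.toString();
    }
}
```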
API Changes
1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
changing of termText via setTermText(). (Yonik Seeley)
2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
and is supposed to be replaced with the WordlistLoader class in
package org.apache.lucene.analysis (Daniel Naber)
3. LUCENE-609: Revert return type of Document.getField(s) to Field
for backward compatibility, added new Document.getFieldable(s)
for access to new lazy loaded fields. (Yonik Seeley)
4. LUCENE-608: Document.fields() has been deprecated and a new method
Document.getFields() has been added that returns a List instead of
an Enumeration (Daniel Naber)
5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
subclass allows explain methods to produce Explanations which model
"matching" independent of having a positive value.
(Chris Hostetter)
6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
and IndexWriter.setDefaultCommitLockTimeout for overriding default
timeout values for all future instances of IndexWriter (as well
as for any other classes that may reference the static values,
ie: IndexReader).
(Michael McCandless via Chris Hostetter)
7. LUCENE-638: FSDirectory.list() now only returns the directory's
Lucene-related files. Thanks to this change one can now construct
a RAMDirectory from a file system directory that contains files
not related to Lucene.
(Simon Willnauer via Daniel Naber)
8. LUCENE-635: Decoupling locking implementation from Directory
implementation. Added set/getLockFactory to Directory and moved
all locking code into subclasses of abstract class LockFactory.
FSDirectory and RAMDirectory still default to their prior locking
implementations, but now you can mix & match, for example using
SingleInstanceLockFactory (ie, in memory locking) with an
FSDirectory. Note that now you must call setDisableLocks before
instantiating an FSDirectory if you wish to disable locking
for that Directory.
(Michael McCandless, Jeff Patterson via Yonik Seeley)
9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
(Steven Parkes via Otis Gospodnetic)
10. LUCENE-701: Lockless commits: a commit lock is no longer required
when a writer commits and a reader opens the index. This includes
a change to the index file format (see docs/fileformats.html for
details). It also removes all APIs associated with the commit
lock & its timeout. Readers are now truly read-only and do not
block one another on startup. This is the first step to getting
Lucene to work correctly over NFS (second step is
LUCENE-710). (Mike McCandless)
11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
in Similarity's MoreLikeThis class. The misspelling has been
replaced by the correct spelling.
(Andi Vajda via Daniel Naber)
12. LUCENE-738: Reduce the size of the file that keeps track of which
documents are deleted when the number of deleted documents is
small. This changes the index file format and cannot be
read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)
13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
number of open files and file descriptors for the non-compound index
format. This changes the index file format, but maintains the
ability to read and update older indices. The first segment merge
on an older format index will create a single .nrm file for the new
segment. (Doron Cohen via Yonik Seeley)
14. LUCENE-732: DateTools support has been added to QueryParser, with
setters for both the default Resolution, and per-field Resolution.
For backwards compatibility, DateField is still used if no Resolutions
are specified. (Michael Busch via Chris Hostetter)
15. Added isOptimized() method to IndexReader.
(Otis Gospodnetic)
16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
take a boolean "create" argument. Instead you should use
IndexWriter's "create" argument to create a new index.
(Mike McCandless)
17. LUCENE-780: Add a static Directory.copy() method to copy files
from one Directory to another. (Jiri Kuhn via Mike McCandless)
18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
remove an old lock. The default implementation is to ask the
lockFactory (if non null) to clear the lock. (Mike McCandless)
19. LUCENE-795: Directory.renameFile() has been deprecated as it is
not used anymore inside Lucene. (Daniel Naber)
Bug fixes
1. Fixed the web application demo (built with "ant war-demo") which
didn't work because it used a QueryParser method that had
been removed (Daniel Naber)
2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
(Yonik Seeley)
3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
(Karl Wettin via Yonik Seeley)
4. LUCENE-587: Explanation.toHtml was producing malformed HTML
(Chris Hostetter)
5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
6. LUCENE-601: RAMDirectory and RAMFile made Serializable
(Karl Wettin via Otis Gospodnetic)
7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
Explanations match up with the real scores.
(Chris Hostetter)
8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
disambiguate inner class scorer's use of doc() in BooleanScorer2,
other test code changes. (DM Smith via Yonik Seeley)
10. LUCENE-451: All core query types now use ComplexExplanations so that
boosts of zero don't confuse the BooleanWeight explain method.
(Chris Hostetter)
11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
(Kåre Fiedler Christiansen via Otis Gospodnetic)
12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
(Daniel Naber)
13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
has no value.
(Oliver Hutchison via Chris Hostetter)
15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
(Yonik Seeley)
16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
lock to be shared between different directories.
(Michael McCandless via Yonik Seeley)
17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
(Yonik Seeley)
18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
called on it before next(). (Yonik Seeley)
19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
to recognize ordered spans if they overlapped with unordered spans.
(Paul Elschot via Chris Hostetter)
20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
21. LUCENE-715: Fixed private constructor in IndexWriter.java to
properly release the acquired write lock if there is an
IOException after acquiring the write lock but before finishing
instantiation. (Matthew Bogosian via Mike McCandless)
22. LUCENE-651: Multiple different threads requesting the same
FieldCache entry (often for Sorting by a field) at the same
time caused multiple generations of that entry, which was
detrimental to performance and memory use.
(Oliver Hutchison via Otis Gospodnetic)
23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
(Doron Cohen via Otis Gospodnetic)
24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
classes from contrib/similarity, as their new home is under
contrib/queries.
(Otis Gospodnetic)
25. LUCENE-669: Do not double-close the RandomAccessFile in
FSIndexInput/Output during finalize(). Besides sending an
IOException up to the GC, this may also be the cause of intermittent
"The handle is invalid" IOExceptions on Windows when trying to
close readers or writers. (Michael Busch via Mike McCandless)
26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
on any exceptions (eg disk full). The semantics of these methods
is now transactional: either all indices are merged or none are.
Also fixed IndexWriter.mergeSegments (called outside of
addIndexes(*) by addDocument, optimize, flushRamSegments) and
IndexReader.commit() (called by close) to clean up and keep the
instance state consistent with what's actually in the index (Mike
McCandless).
27. LUCENE-129: Change finalizers to do "try {...} finally
{super.finalize();}" to make sure we don't miss finalizers in
classes above us. (Esmond Pitt via Mike McCandless)
28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
IndexReaders to hang around forever, in addition to not
fixing the original FieldCache performance problem.
(Chris Hostetter, Yonik Seeley)
29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
correctly raise ArrayIndexOutOfBoundsException when docNum is too
large. Previously, if docNum was only slightly too large (within
the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
exception would be raised and instead the index would become
silently corrupted. The corruption then only appears much later,
in mergeSegments, when the corrupted segment is merged with
segment(s) after it. (Mike McCandless)
30. LUCENE-768: Fix case where an Exception during deleteDocument,
undeleteAll or setNorm in IndexReader could leave the reader in a
state where close() fails to release the write lock.
(Mike McCandless)
31. Remove "tvp" from known index file extensions because it is
never used. (Nicolas Lalevée via Bernhard Messer)
32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
rely on file length check and instead use the SegmentInfo's
docCount that's already stored explicitly in the index. This is a
defensive bug fix (ie, there is no known problem seen "in real
life" due to this, just a possible future problem). (Chuck
Williams via Mike McCandless)
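Why a doc number that is only slightly too large could go unnoticed
(item 29 above, LUCENE-140) is easy to see with a bit set packed into
bytes: the backing array is rounded up to a whole byte, so bits past
the logical size but inside the last byte are still addressable. The
sketch below is an assumption-laden illustration with hypothetical
names, not Lucene's deleted-docs code.

```java
// Hypothetical byte-packed bit set showing the bounds problem; not Lucene code.
final class DeletedDocsSketch {
    private final byte[] bits;
    private final int size;  // number of valid doc numbers, i.e. maxDoc

    DeletedDocsSketch(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) / 8];  // rounded up to whole bytes
    }

    /** Relies on the array access alone to reject bad input. */
    void setUnchecked(int docNum) {
        bits[docNum >> 3] |= 1 << (docNum & 7);
    }

    /** Validates docNum against size before touching the array. */
    void setChecked(int docNum) {
        if (docNum < 0 || docNum >= size) {
            throw new ArrayIndexOutOfBoundsException(docNum);
        }
        setUnchecked(docNum);
    }

    boolean get(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }
}
```

With size 10 the array holds two bytes, so setUnchecked(12) succeeds
silently even though 12 is not a valid doc number, while setChecked(12)
raises ArrayIndexOutOfBoundsException.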
Optimizations
1. LUCENE-586: TermDocs.skipTo() is now more efficient for
multi-segment indexes. This will improve the performance of many
types of queries against a non-optimized index. (Andrew Hudson
via Yonik Seeley)
2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
internal "files", allowing them to be GCed even if references to the
RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
3. LUCENE-629: Compressed fields are no longer uncompressed and
recompressed during segment merges (e.g. during indexing or
optimizing), thus improving performance. (Michael Busch via Otis
Gospodnetic)
4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
large by keeping a count of buffered documents rather than
counting after each document addition. (Doron Cohen, Paul Smith,
Yonik Seeley)
5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
looping through docs. (Grant Ingersoll)
6. LUCENE-672: New indexing segment merge policy flushes all
buffered docs to their own segment and delays a merge until
mergeFactor segments of a certain level have been accumulated.
This increases indexing performance in the presence of deleted
docs or partially full segments as well as enabling future
optimizations.
NOTE: this also fixes an "under-merging" bug whereby it is
possible to get far too many segments in your index (which will
drastically slow down search, risks exhausting file descriptor
limit, etc.). This can happen when the number of buffered docs
at close, plus the number of docs in the last non-ram segment is
greater than mergeFactor. (Ning Li, Yonik Seeley)
7. Lazy loaded fields unnecessarily retained an extra copy of loaded
String data. (Yonik Seeley)
8. LUCENE-443: ConjunctionScorer performance increase. Speed up
any BooleanQuery with more than one mandatory clause.
(Abdul Chaudhry, Paul Elschot via Yonik Seeley)
9. LUCENE-365: DisjunctionSumScorer performance increase of
~30%. Speeds up queries with optional clauses. (Paul Elschot via
Yonik Seeley)
10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
size buffers, which will speed up merging and retrieving binary
and compressed fields. (Nadav Har'El via Yonik Seeley)
11. LUCENE-687: Lazy skipping on proximity file speeds up most
queries involving term positions, including phrase queries.
(Michael Busch via Yonik Seeley)
12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
with calls to System.arraycopy instead, in DocumentWriter.java.
(Nicolas Lalevee via Mike McCandless)
13. LUCENE-729: Non-recursive skipTo and next implementation of
TermDocs for a MultiReader. The old implementation could
recurse up to the number of segments in the index. (Yonik Seeley)
14. LUCENE-739: Improve segment merging performance by reusing
the norm array across different fields and doing bulk writes
of norms of segments with no deleted docs.
(Michael Busch via Yonik Seeley)
15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
to the List of clauses and replaced the internal synchronized Vector
with an unsynchronized List. (Yonik Seeley)
16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
FSIndexInput finalizer to the actual file so all clones don't
register a new finalizer. (Yonik Seeley)
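The kind of replacement described in item 12 above (LUCENE-714) looks
roughly like the following. The class and method names are
hypothetical and the arrays are plain int[]; the actual code in
DocumentWriter.java is not reproduced here.

```java
// Hypothetical before/after of a manual copy loop; not the DocumentWriter code.
final class ArrayCopySketch {

    /** Manual element-by-element copy. */
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    /** Same result using a single bulk copy call. */
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```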
Test Cases
1. Added TestTermScorer.java (Grant Ingersoll)
2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
3. LUCENE-744: Append the user.name property onto the temporary directory
that is created so it doesn't interfere with other users. (Grant Ingersoll)
Documentation
1. Added style sheet to xdocs named lucene.css and included in the
Anakia VSL descriptor. (Grant Ingersoll)
2. Added scoring.xml document into xdocs. Updated Similarity.java
scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
Issue 664.
3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
Issue 707. Site now builds using Forrest, just like the other Lucene
siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
6. LUCENE-713: Updated the Term Vector section of File Formats to include
documentation on how Offset and Position info are stored in the TVF file.
(Grant Ingersoll, Samir Abdou)
7. Added in link to Clover Test Code Coverage Reports under the Develop
section in Resources (Grant Ingersoll)
8. LUCENE-748: Added details for semantics of IndexWriter.close on
hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
9. Added some text about what is contained in releases.
(Eric Haszlakiewicz via Grant Ingersoll)
10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
makes a full copy of the starting Directory. (Mike McCandless)
11. LUCENE-764: Fix javadocs to detail temporary space requirements
for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
methods. (Mike McCandless)
Build
1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
To enable clover code coverage, you must have clover.jar in the ANT
classpath and specify -Drun.clover=true on the command line.
(Michael Busch and Grant Ingersoll)
2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
${build.dir}/test just like the tempDir sysproperty.
3. LUCENE-757: Added new target named init-dist that does setup for
distribution of both binary and source distributions. Called by package
and package-*-src
======================= Release 2.0.0 2006-05-26 =======================
API Changes
1. All deprecated methods and fields have been removed, except
DateField, which will still be supported for some time
so Lucene can read its date fields from old indexes
(Yonik Seeley & Grant Ingersoll)
2. DisjunctionSumScorer is no longer public.
(Paul Elschot via Otis Gospodnetic)
3. Creating a Field with both an empty name and an empty value
now throws an IllegalArgumentException
(Daniel Naber)
4. LUCENE-301: Added new IndexWriter({String,File,Directory},
Analyzer) constructors that do not take a boolean "create"
argument. These new constructors will create a new index if
necessary, else append to the existing one. (Dan Armbrust via
Mike McCandless)
New features
1. LUCENE-496: Command line tool for modifying the field norms of an
existing index; added to contrib/miscellaneous. (Chris Hostetter)
2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
(Chris Hostetter)
Bug fixes
1. LUCENE-330: Fix issue of FilteredQuery not working properly within
BooleanQuery. (Paul Elschot via Erik Hatcher)
2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
3. Added methods to get/set writeLockTimeout and commitLockTimeout in
IndexWriter. These could be set in Lucene 1.4 using a system property.
This feature had been removed without adding the corresponding
getter/setter methods. (Daniel Naber)
4. LUCENE-413: Fixed ArrayIndexOutOfBoundsExceptions thrown
when using SpanQueries. (Paul Elschot via Yonik Seeley)
5. Implemented FilterIndexReader.getVersion() and isCurrent()
(Yonik Seeley)
6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
that sometimes caused the index order of documents to change.
(Yonik Seeley)
7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
subsequent String sorts with different locales to sort identically.
(Paul Cowan via Yonik Seeley)
8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
(Stefan Will via Yonik Seeley)
9. LUCENE-514: Added getTermArrays() and extractTerms() to
MultiPhraseQuery (Eric Jain & Yonik Seeley)
10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
(frederic via Yonik)
11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
NullPointerException when "exclude" query was not a SpanTermQuery.
(Chris Hostetter)
12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
(Chris Hostetter)
13. LUCENE-561: Fixed some ParallelReader bugs: a NullPointerException was thrown if the
reader didn't know about the field yet, the reader didn't keep track of whether it had
deletions, and deleteDocument calls could circumvent synchronization on the subreaders.
(Chuck Williams via Yonik Seeley)
14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
ConstantScoreQuery in order to allow their use with a MultiSearcher.
(Yonik Seeley)
15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
(Peter Royal, Michael Chan, Yonik Seeley)
16. LUCENE-485: Don't hold commit lock while removing obsolete index
files. (Luc Vanlerberghe via cutting)
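Note on item 12 above: the general pattern behind that kind of fix is to
fold every field compared by equals() into hashCode(). The following is a
minimal sketch of that pattern only. The class IncludeExcludeKey and its
String fields are hypothetical stand-ins and are not taken from
SpanNotQuery.

```java
import java.util.Objects;

// Hypothetical stand-in for a query that has an "include" and an "exclude" clause.
final class IncludeExcludeKey {
    private final String include;
    private final String exclude;

    IncludeExcludeKey(String include, String exclude) {
        this.include = Objects.requireNonNull(include);
        this.exclude = Objects.requireNonNull(exclude);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IncludeExcludeKey)) return false;
        IncludeExcludeKey other = (IncludeExcludeKey) o;
        return include.equals(other.include) && exclude.equals(other.exclude);
    }

    @Override
    public int hashCode() {
        // Both clauses contribute. Dropping "exclude" here would still be legal,
        // but keys differing only in their exclude clause would always collide.
        return 31 * include.hashCode() + exclude.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(new IncludeExcludeKey("a", "b").hashCode());
        System.out.println(new IncludeExcludeKey("a", "c").hashCode());
    }
}
```

With both fields hashed, two keys that share an include clause but differ in
the exclude clause normally land in different hash buckets.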
1.9.1
Bug fixes
1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)
1.9 final
Note that this release is mostly but not 100% source compatible with
the previous release of Lucene (1.4.3). In other words, you should
make sure your application compiles with this version of Lucene before
you replace the old Lucene JAR with the new one. Many methods have
been deprecated in anticipation of release 2.0, so deprecation
warnings are to be expected when upgrading from 1.4.3 to 1.9.
Bug fixes
1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
effects on indexing performance and has thus been reverted. The
argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
an exception is thrown. (Daniel Naber)
Optimizations
1. Optimized BufferedIndexOutput.writeBytes() to use
System.arraycopy() in more cases, rather than copying byte-by-byte.
(Lukas Zapletal via Cutting)
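Note on the optimization above: copying into a fixed-size buffer in bulk
with System.arraycopy(), instead of one byte per call, can be sketched as
follows. This is an illustration only. BufferedBytesSketch and its
in-memory sink are hypothetical and do not reproduce BufferedIndexOutput.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical buffered writer that flushes into an in-memory sink.
final class BufferedBytesSketch {
    private final byte[] buffer;
    private int used = 0;
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    BufferedBytesSketch(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        buffer = new byte[bufferSize];
    }

    // Copies length bytes from src, starting at offset, in buffer-sized chunks.
    void writeBytes(byte[] src, int offset, int length) {
        while (length > 0) {
            if (used == buffer.length) {
                flush();
            }
            int chunk = Math.min(length, buffer.length - used);
            System.arraycopy(src, offset, buffer, used, chunk); // bulk copy
            used += chunk;
            offset += chunk;
            length -= chunk;
        }
    }

    // Moves the buffered bytes to the sink and empties the buffer.
    void flush() {
        sink.write(buffer, 0, used);
        used = 0;
    }

    byte[] toByteArray() {
        flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        BufferedBytesSketch out = new BufferedBytesSketch(4);
        out.writeBytes(new byte[] {1, 2, 3, 4, 5, 6}, 0, 6);
        System.out.println(out.toByteArray().length);
    }
}
```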
1.9 RC1
Requirements
1. To compile and use Lucene you now need Java 1.4 or later.
Changes in runtime behavior
1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
FuzzyQuery expands to more than BooleanQuery.maxClauseCount
terms only the BooleanQuery.maxClauseCount most similar terms
go into the rewritten query and thus the exception is avoided.
(Christoph)
2. Changed system property from "org.apache.lucene.lockdir" to
"org.apache.lucene.lockDir", so that its casing follows the existing
pattern used in other Lucene system properties. (Bernhard)
3. The terms of RangeQueries and FuzzyQueries are now converted to
lowercase by default (as has been the case for PrefixQueries
and WildcardQueries before). Use setLowercaseExpandedTerms(false)
to disable that behavior but note that this also affects
PrefixQueries and WildcardQueries. (Daniel Naber)
4. Document frequency that is computed when MultiSearcher is used is now
computed correctly and "globally" across subsearchers and indices, while
before it used to be computed locally to each index, which caused
ranking across multiple indices not to be equivalent.
(Chuck Williams, Wolf Siberski via Otis, bug #31841)
5. When opening an IndexWriter with create=true, Lucene now only deletes
its own files from the index directory (looking at the file name suffixes
to decide if a file belongs to Lucene). The old behavior was to delete
all files. (Daniel Naber and Bernhard Messer, bug #34695)
6. The version of an IndexReader, as returned by getCurrentVersion()
and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
is now initialized by the system time in milliseconds.
(Bernhard Messer via Daniel Naber)
7. Several default values cannot be set via system properties anymore, as
this has been considered inappropriate for a library like Lucene. For
most properties there are set/get methods available in IndexWriter which
you should use instead. This affects the following properties:
See IndexWriter for getter/setter methods:
org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
org.apache.lucene.mergeFactor,
See BooleanQuery for getter/setter methods:
org.apache.lucene.maxClauseCount
See FSDirectory for getter/setter methods:
disableLuceneLocks
(Daniel Naber)
8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
instead of using Integer and Float classes for parsing.
(Yonik Seeley via Otis Gospodnetic)
9. Expert level search routines returning TopDocs and TopFieldDocs
no longer normalize scores. This also fixes bugs related to
MultiSearchers and score sorting/normalization.
(Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
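Note on item 4 above: the difference between "local" and "global" document
frequency can be sketched by summing the per-index statistics before
computing an inverse document frequency. The class name and the idf formula
ln(numDocs / (docFreq + 1)) + 1 are assumptions chosen for illustration.
The entry itself does not state which formula is used.

```java
// Hypothetical helper comparing per-index and combined term statistics.
final class GlobalDocFreqSketch {

    // Assumed idf formula, for illustration only.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Sums docFreq and numDocs over all indexes, then computes one shared idf.
    static double globalIdf(int[] docFreqs, int[] numDocs) {
        if (docFreqs.length != numDocs.length) {
            throw new IllegalArgumentException("one docFreq per index is required");
        }
        int totalDocFreq = 0;
        int totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        // The term is rare in the first index and common in the second.
        System.out.println("local idf, index 1: " + idf(1, 100));
        System.out.println("local idf, index 2: " + idf(99, 100));
        System.out.println("global idf:         "
            + globalIdf(new int[] {1, 99}, new int[] {100, 100}));
    }
}
```

With local statistics the same term gets a different weight in each index,
so scores from the two indexes are not comparable. With the summed
statistics every subsearcher uses the same weight.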
New features
1. Added support for stored compressed fields (patch #31149)
(Bernhard Messer via Christoph)
2. Added support for binary stored fields (patch #29370)
(Drew Farris and Bernhard Messer via Christoph)
3. Added support for position and offset information in term vectors
(patch #18927). (Grant Ingersoll & Christoph)
4. A new class DateTools has been added. It allows you to format dates
in a readable format adequate for indexing. Unlike the existing
DateField class DateTools can cope with dates before 1970 and it
forces you to specify the desired date resolution (e.g. month, day,
second, ...) which can make RangeQuerys on those fields more efficient.
(Daniel Naber)
5. QueryParser now correctly works with Analyzers that can return more
than one token per position. For example, a query "+fast +car"
would be parsed as "+fast +(car automobile)" if the Analyzer
returns "car" and "automobile" at the same position whenever it
finds "car" (Patch #23307).
(Pierrick Brihaye, Daniel Naber)
6. Permit unbuffered Directory implementations (e.g., using mmap).
InputStream is replaced by the new classes IndexInput and
BufferedIndexInput. OutputStream is replaced by the new classes
IndexOutput and BufferedIndexOutput. InputStream and OutputStream
are now deprecated and FSDirectory is now subclassable. (cutting)
7. Add native Directory and TermDocs implementations that work under
GCJ. These require GCC 3.4.0 or later and have only been tested
on Linux. Use 'ant gcj' to build demo applications. (cutting)
8. Add MMapDirectory, which uses nio to mmap input files. This is
still somewhat slower than FSDirectory. However it uses less
memory per query term, since a new buffer is not allocated per
term, which may help applications which use, e.g., wildcard
queries. It may also someday be faster. (cutting & Paul Elschot)
9. Added javadocs-internal to build.xml - bug #30360
(Paul Elschot via Otis)
10. Added RangeFilter, a more generically useful filter than DateFilter.
(Chris M Hostetter via Erik)
11. Added NumberTools, a utility class for indexing numeric fields.
(adapted from code contributed by Matt Quail; committed by Erik)
12. Added public static IndexReader.main(String[] args) method.
IndexReader can now be used directly at command line level
to list and optionally extract the individual files from an existing
compound index file.
(adapted from code contributed by Garrett Rooney; committed by Bernhard)
13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
(Doug Cutting)
14. Added LucenePackage, whose static get() method returns java.util.Package,
which lets the caller get the Lucene version information specified in
the Lucene Jar.
(Doug Cutting via Otis)
15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
This provides standard java.util.Iterator iteration over Hits.
Each call to the iterator's next() method returns a Hit object.
(Jeremy Rayner via Erik)
16. Add ParallelReader, an IndexReader that combines separate indexes
over different fields into a single virtual index. (Doug Cutting)
17. Add IntParser and FloatParser interfaces to FieldCache, so that
fields in arbitrary formats can be cached as ints and floats.
(Doug Cutting)
18. Added class org.apache.lucene.index.IndexModifier which combines
IndexWriter and IndexReader, so you can add and delete documents without
worrying about synchronization/locking issues.
(Daniel Naber)
19. Lucene can now be used inside an unsigned applet, as Lucene's access
to system properties will not cause a SecurityException anymore.
(Jon Schuster via Daniel Naber, bug #34359)
20. Added a new class MatchAllDocsQuery that matches all documents.
(John Wang via Daniel Naber, bug #34946)
21. Added ability to omit norms on a per field basis to decrease
index size and memory consumption when there are many indexed fields.
See Field.setOmitNorms()
(Yonik Seeley, LUCENE-448)
22. Added NullFragmenter to contrib/highlighter, which is useful for
highlighting entire documents or fields.
(Erik Hatcher)
23. Added regular expression queries, RegexQuery and SpanRegexQuery.
Note the same term enumeration caveats apply with these queries as
apply to WildcardQuery and other term expanding queries.
These two new queries are not currently supported via QueryParser.
(Erik Hatcher)
24. Added ConstantScoreQuery which wraps a filter and produces a score
equal to the query boost for every matching document.
(Yonik Seeley, LUCENE-383)
25. Added ConstantScoreRangeQuery which produces a constant score for
every document in the range. One advantage over a normal RangeQuery
is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
number of terms the range can cover. Both endpoints may also be open.
(Yonik Seeley, LUCENE-383)
26. Added ability to specify a minimum number of optional clauses that
must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
(Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
It's very useful for searching across multiple fields.
(Chuck Williams via Yonik Seeley, LUCENE-323)
28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
Latin 1 character set by their unaccented equivalent.
(Sven Duzont via Erik Hatcher)
29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
This is useful for data like zip codes, ids, and some product names.
(Erik Hatcher)
30. Copied LengthFilter from contrib area to core. Removes words that are too
long or too short from the stream.
(David Spencer via Otis and Daniel)
31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
custom analyzers to put gaps between Field instances with the same field
name, preventing phrase or span queries crossing these boundaries. The
default implementation issues a gap of 0, allowing the default token
position increment of 1 to put the next field's first token into a
successive position.
(Erik Hatcher, with advice from Yonik)
32. StopFilter can now ignore case when checking for stop words.
(Grant Ingersoll via Yonik, LUCENE-248)
33. Add TopDocCollector and TopFieldDocCollector. These simplify the
implementation of hit collectors that collect only the
top-scoring or top-sorting hits.
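Note on item 31 above: the effect of a position increment gap between Field
instances with the same name can be sketched by assigning positions to
tokens by hand. PositionGapSketch is a hypothetical helper that does not use
the Analyzer API. It assumes that every token has the default increment of 1
and that the gap is added once before each instance after the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that numbers token positions across field instances.
final class PositionGapSketch {

    // fieldInstances[i] holds the tokens of the i-th instance of one field name.
    static List<Integer> positions(String[][] fieldInstances, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (int i = 0; i < fieldInstances.length; i++) {
            if (i > 0) {
                position += gap; // extra distance between consecutive instances
            }
            for (int t = 0; t < fieldInstances[i].length; t++) {
                position += 1;   // assumed default increment per token
                result.add(position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] authors = { {"john", "smith"}, {"jane", "doe"} };
        System.out.println(positions(authors, 0));
        System.out.println(positions(authors, 100));
    }
}
```

With a gap of 0 the last token of one instance and the first token of the
next occupy adjacent positions, so a phrase query could match across the
boundary. A larger gap pushes them apart.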
API Changes
1. Several methods and fields have been deprecated. The API documentation
contains information about the recommended replacements. It is planned
that most of the deprecated methods and fields will be removed in
Lucene 2.0. (Daniel Naber)
2. The Russian and the German analyzers have been moved to contrib/analyzers.
Also, the WordlistLoader class has been moved one level up in the
hierarchy and is now org.apache.lucene.analysis.WordlistLoader
(Daniel Naber)
3. The API contained methods that were declared to throw an IOException
but never did so. These declarations have been removed. If
your code tries to catch these exceptions you might need to remove
those catch clauses to avoid compile errors. (Daniel Naber)
4. Add a serializable Parameter Class to standardize parameter enum
classes in BooleanClause and Field. (Christoph)
5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
This allows custom SpanQuery subclasses that rewrite (for term expansion, for
example) to nest within the built-in SpanQuery classes successfully.
Bug fixes
1. The JSP demo page (src/jsp/results.jsp) now properly closes the
IndexSearcher it opens. (Daniel Naber)
2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
prevented deletion of obsolete segments. (Christoph Goller)
3. Fix in FieldInfos to avoid the return of an extra blank field in
IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
PhrasePrefixQuery) could provoke UnsupportedOperationException
(bug #33161). (Rhett Sutphin via Daniel Naber)
5. Fixed a small bug in ConjunctionScorer.skipTo() that caused a NullPointerException
if skipTo() was called without a prior call to next(). (Christoph)
6. Disable Similarity.coord() in the scoring of most automatically
generated boolean queries. The coord() score factor is
appropriate when clauses are independently specified by a user,
but is usually not appropriate when clauses are generated
automatically, e.g., by a fuzzy, wildcard or range query. Matches
on such automatically generated queries are no longer penalized
for not matching all terms. (Doug Cutting, Patch #33472)
7. Getting a lock file with Lock.obtain(long) was supposed to wait for
a given amount of milliseconds, but this didn't work.
(John Wang via Daniel Naber, Bug #33799)
8. Fix FSDirectory.createOutput() to always create new files.
Previously, existing files were overwritten, and an index could be
corrupted when the old version of a file was longer than the new.
Now any existing file is first removed. (Doug Cutting)
9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
could return an incorrect number of hits.
(Reece Wilton via Erik Hatcher, Bug #35157)
10. Fix NullPointerException that could occur with a MultiPhraseQuery
inside a BooleanQuery.
(Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
11. Fixed SnowballFilter to pass through the position increment from
the original token.
(Yonik Seeley via Erik Hatcher, LUCENE-437)
12. Added Unicode range of Korean characters to StandardTokenizer,
grouping contiguous characters into a token rather than one token
per character. This change also changes the token type to "<CJ>"
for Chinese and Japanese character tokens (previously it was "<CJK>").
(Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
FieldInfo.storePositionWithTermVector and creates the Field with
correct TermVector parameter.
(Frank Steinmann via Bernhard, LUCENE-455)
14. Fixed WildcardQuery to prevent "cat" matching "ca??".
(Xiaozheng Ma via Bernhard, LUCENE-306)
15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
change the sort order when sorting by string for documents without
a value for the sort field.
(Luc Vanlerberghe via Yonik, LUCENE-453)
16. Fixed a sorting problem with MultiSearchers that can lead to
missing or duplicate docs due to equal docs sorting in an arbitrary order.
(Yonik Seeley, LUCENE-456)
17. A single hit using the expert level sorted search methods
resulted in the score not being normalized.
(Yonik Seeley, LUCENE-462)
18. Fixed inefficient memory usage when loading an index into RAMDirectory.
(Volodymyr Bychkoviak via Bernhard, LUCENE-475)
19. Corrected term offsets returned by ChineseTokenizer.
(Ray Tsang via Erik Hatcher, LUCENE-324)
20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
(Robert Kirchgessner via Doug Cutting, LUCENE-479)
21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
fixed by acquiring the commit lock.
(Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
this has now been fixed. (Daniel Naber)
23. Fixed QueryParser when called with a date in local form like
"[1/16/2000 TO 1/18/2000]". This query did not include the documents
of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
24. Removed sorting constraint that threw an exception if there were
not yet any values for the sort field (Yonik Seeley, LUCENE-374)
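Note on item 14 above: the intended wildcard semantics, where "?" stands for
exactly one character and "*" for any run of characters, can be sketched
with a small recursive matcher. WildcardSketch is a hypothetical standalone
class and is not the WildcardQuery implementation.

```java
// Hypothetical matcher: '?' = exactly one character, '*' = zero or more characters.
final class WildcardSketch {

    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        while (pi < p.length()) {
            char c = p.charAt(pi);
            if (c == '*') {
                // Try every possible length for the run consumed by '*'.
                for (int k = ti; k <= t.length(); k++) {
                    if (matches(p, pi + 1, t, k)) {
                        return true;
                    }
                }
                return false;
            }
            if (ti >= t.length()) {
                return false; // the pattern needs a character but the text has ended
            }
            if (c != '?' && c != t.charAt(ti)) {
                return false;
            }
            pi++;
            ti++;
        }
        return ti == t.length(); // the whole text must be consumed
    }

    public static void main(String[] args) {
        System.out.println(matches("ca??", "cat"));
        System.out.println(matches("ca??", "cats"));
    }
}
```

The check for an exhausted text before each "?" is what keeps a three-letter
term such as "cat" from matching the four-character pattern "ca??".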
Optimizations
1. Disk usage (peak requirements during indexing and optimization)
in case of compound file format has been improved.
(Bernhard, Dmitry, and Christoph)
2. Optimize the performance of certain uses of BooleanScorer,
TermScorer and IndexSearcher. In particular, a BooleanQuery
composed of TermQuery, with not all terms required, that returns a
TopDocs (e.g., through a Hits with no Sort specified) runs much
faster. (cutting)
3. Removed synchronization from reading of term vectors with an
IndexReader (Patch #30736). (Bernhard Messer via Christoph)
4. Optimize term-dictionary lookup to allocate far fewer terms when
scanning for the matching term. This speeds searches involving
low-frequency terms, where the cost of dictionary lookup can be
significant. (cutting)
5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
of 0 now run 20-50% faster (Patch #31882).
(Jonathan Hager via Daniel Naber)
6. Added a version of BooleanScorer (BooleanScorer2) that delivers
documents in increasing order and implements skipTo. For queries
with required or forbidden clauses it may be faster than the old
BooleanScorer; for BooleanQueries consisting only of optional
clauses it is probably slower. The new BooleanScorer is now the
default. (Patch 31785 by Paul Elschot via Christoph)
7. Use uncached access to norms when merging to reduce RAM usage.
(Bug #32847). (Doug Cutting)
8. Don't read term index when random-access is not required. This
reduces time to open IndexReaders and they use less memory when
random access is not required, e.g., when merging segments. The
term index is now read into memory lazily at the first
random-access. (Doug Cutting)
9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
added indexes is larger than mergeFactor. Previously this could
result in quadratic performance. Now performance is n log(n).
(Doug Cutting)
10. Speed up the creation of TermEnum for indices with multiple
segments and deleted documents, and thus speed up PrefixQuery,
RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
and sorting the first time on a field.
(Yonik Seeley, LUCENE-454)
11. Optimized and generalized 32 bit floating point to byte
(custom 8 bit floating point) conversions. Increased the speed of
Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
(Yonik Seeley, LUCENE-467)
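Note on item 11 above: one way to pack a 32 bit float into a single byte is
to keep only a window of the high-order bits of its IEEE 754 representation.
The following is a sketch of that idea under stated assumptions. The class
name, the 21 bit shift and the exponent offset of 48 are choices made for
this illustration and are not claimed to be the LUCENE-467 code.

```java
// Hypothetical lossy float-to-byte codec based on bit truncation.
final class SmallFloatSketch {
    private static final int SHIFT = 21;            // low 21 bits of the float are dropped
    private static final int ZERO_OFFSET = 48 << 3; // rebases the exponent so results fit in 8 bits

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> SHIFT;
        if (small <= ZERO_OFFSET) {
            // Zero and negative values map to 0; tiny positive values map to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_OFFSET + 0x100) {
            return (byte) 0xFF; // overflow maps to the largest code
        }
        return (byte) (small - ZERO_OFFSET);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xFF) << SHIFT;
        bits += 48 << 24; // same offset as ZERO_OFFSET, moved back into float position
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, 0.3f, 0.5f, 1.0f, 1000.0f};
        for (float f : samples) {
            System.out.println(f + " -> " + decode(encode(f)));
        }
    }
}
```

The encoding is lossy: values that fall between two representable steps
decode to the lower step, while exact powers of two in range survive the
round trip unchanged.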
Infrastructure
1. Lucene's source code repository has been converted from CVS to
Subversion. The new repository is at
http://svn.apache.org/repos/asf/lucene/java/trunk
2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
The old issues are still available at
http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
(use the bug number instead of xxxx)
1.4.3
1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
messages which might contain user input (e.g. error messages about
query parsing). If you used that page as a starting point for your
own code please make sure your code also properly escapes HTML
characters from user input in order to avoid so-called cross site
scripting attacks. (Daniel Naber)
2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
API is supported again. (Christoph)
1.4.2
1. Fixed bug #31241: Sorting could lead to incorrect results (documents
missing, others duplicated) if the sort keys were not unique and there
were more than 100 matches. (Daniel Naber)
2. Memory leak in Sort code (bug #31240) eliminated.
(Rafal Krzewski via Christoph and Daniel)
3. FuzzyQuery now takes an additional parameter that specifies the
minimum similarity that is required for a term to match the query.
The QueryParser syntax for this is term~x, where x is a floating
point number >= 0 and < 1 (a bigger number means that a higher
similarity is required). Furthermore, a prefix can be specified
for FuzzyQuerys so that only those terms are considered similar that
start with this prefix. This can speed up FuzzyQuery greatly.
(Daniel Naber, Christoph Goller)
4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
of relative positions. (Christoph Goller)
5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
(patch #9110); some unused method parameters removed; the ability
to specify a minimum similarity for FuzzyQuery has been added.
(Christoph Goller)
6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
for every non-zero-scoring hit. This makes 'OR' queries that
contain common terms substantially faster. (cutting)
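Note on item 3 above: a minimum similarity combined with a required prefix
can be sketched with a plain edit distance. FuzzyMatchSketch is
hypothetical, and the normalization 1 - distance / length of the longer term
is an assumption made for this illustration. The entry does not state the
formula FuzzyQuery uses.

```java
// Hypothetical fuzzy matcher: prefix check first, then a similarity threshold.
final class FuzzyMatchSketch {

    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical terms, 0.0 for nothing in common.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / longest;
    }

    static boolean matches(String term, String candidate, double minSimilarity, int prefixLength) {
        String prefix = term.substring(0, Math.min(prefixLength, term.length()));
        if (!candidate.startsWith(prefix)) {
            return false; // cheap rejection, no edit distance needed
        }
        return similarity(term, candidate) >= minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucent", 0.8, 3));
        System.out.println(matches("lucene", "nucene", 0.5, 1));
    }
}
```

The prefix test is what makes the speed-up plausible: candidates that do not
share the prefix are discarded before any edit distance is computed.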
1.4.1
1. Fixed a performance bug in hit sorting code, where values were not
correctly cached. (Aviran via cutting)
2. Fixed errors in file format documentation. (Daniel Naber)
1.4 final
1. Added "an" to the list of stop words in StopAnalyzer, to complement
the existing "a" there. Fix for bug 28960
(http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
2. Added new class FieldCache to manage in-memory caches of field term
values. (Tim Jones)
3. Added overloaded getFieldQuery method to QueryParser which
accepts the slop factor specified for the phrase (or the default
phrase slop for the QueryParser instance). This allows overriding
methods to replace a PhraseQuery with a SpanNearQuery instead,
keeping the proper slop factor. (Erik Hatcher)
4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
UTF-8 and changed the build encoding to UTF-8, to make changed files
compile. (Otis Gospodnetic)
5. Removed synchronization from term lookup under IndexReader methods
termFreq(), termDocs() or termPositions() to improve
multi-threaded performance. (cutting)
6. Fix a bug where obsolete segment files were not deleted on Win32.
1.4 RC3
1. Fixed several search bugs introduced by the skipTo() changes in
release 1.4RC1. The index file format was changed a bit, so
collections must be re-indexed to take advantage of the skipTo()
optimizations. (Christoph Goller)
2. Added new Document methods, removeField() and removeFields().
(Christoph Goller)
3. Fixed inconsistencies with index closing. Indexes and directories
are now only closed automatically by Lucene when Lucene opened
them automatically. (Christoph Goller)
4. Added new class: FilteredQuery. (Tim Jones)
5. Added a new SortField type for custom comparators. (Tim Jones)
6. Lock obtain timed out message now displays the full path to the lock
file. (Daniel Naber via Erik)
7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
8. Fixed so that FSDirectory's locks still work when the
java.io.tmpdir system property is null. (cutting)
9. Changed FilteredTermEnum's constructor to take no parameters,
as the parameters were ignored anyway (bug #28858)
1.4 RC2
1. GermanAnalyzer now throws an exception if the stopword file
cannot be found (bug #27987). It now uses LowerCaseFilter
(bug #18410) (Daniel Naber via Otis, Erik)
2. Fixed a few bugs in the file format documentation. (cutting)
1.4 RC1
1. Changed the format of the .tis file, so that:
- it has a format version number, which makes it easier to
back-compatibly change file formats in the future.
- the term count is now stored as a long. This was the one aspect
of Lucene's file formats which limited index size.
- a few internal index parameters are now stored in the index, so
that they can (in theory) now be changed from index to index,
although there is not yet an API to do so.
These changes are back compatible. The new code can read old
indexes. But old code will not be able to read new indexes. (cutting)
2. Added an optimized implementation of TermDocs.skipTo(). A skip
table is now stored for each term in the .frq file. This only
adds a percent or two to overall index size, but can substantially
speed up many searches. (cutting)
3. Restructured the Scorer API and all Scorer implementations to take
advantage of an optimized TermDocs.skipTo() implementation. In
particular, PhraseQuerys and conjunctive BooleanQuerys are
faster when one clause has substantially fewer matches than the
others. (A conjunctive BooleanQuery is a BooleanQuery where all
clauses are required.) (cutting)
4. Added new class ParallelMultiSearcher. Combined with
RemoteSearchable this makes it easy to implement distributed
search systems. (Jean-Francois Halleux via cutting)
5. Added support for hit sorting. Results may now be sorted by any
indexed field. For details see the javadoc for
Searcher#search(Query, Sort). (Tim Jones via Cutting)
6. Changed FSDirectory to auto-create a full directory tree that it
needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
7. Added a new span-based query API. This implements, among other
things, nested phrases. See javadocs for details. (Doug Cutting)
8. Added new method Query.getSimilarity(Searcher), and changed
scorers to use it. This permits one to subclass a Query class so
that it can specify its own Similarity implementation, perhaps
one that delegates through that of the Searcher. (Julien Nioche
via Cutting)
9. Added MultiReader, an IndexReader that combines multiple other
IndexReaders. (Cutting)
10. Added support for term vectors. See Field#isTermVectorStored().
(Grant Ingersoll, Cutting & Dmitry)
11. Fixed the old bug with escaping of special characters in query
strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
(Jean-Francois Halleux via Otis)
12. Added support for overriding default values for the following,
using system properties:
- default commit lock timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default minMergeDocs
- default write lock timeout
(Otis)
13. Changed QueryParser.jj to allow '-' and '+' within tokens:
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
(Morus Walter via Otis)
14. Changed so that the compound index format is used by default.
This makes indexing a bit slower, but vastly reduces the chances
of file handle problems. (Cutting)
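Note on items 2 and 3 above: why a conjunctive query benefits from skipTo()
when one clause has far fewer matches can be sketched with two sorted lists
of document numbers. SkipToSketch and its Cursor class are hypothetical. A
binary search over an in-memory array stands in here for the skip table
stored in the .frq file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical intersection of two sorted document lists using skipTo().
final class SkipToSketch {

    // Cursor over a sorted array of document numbers.
    static final class Cursor {
        private final int[] docs;
        private int index = 0;

        Cursor(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        boolean exhausted() { return index >= docs.length; }
        int doc() { return docs[index]; }
        void next() { index++; }

        // Moves to the first remaining entry >= target without visiting each entry.
        void skipTo(int target) {
            int lo = index;
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            index = lo;
        }
    }

    // Returns the documents present in both lists.
    static List<Integer> intersect(int[] a, int[] b) {
        Cursor x = new Cursor(a);
        Cursor y = new Cursor(b);
        List<Integer> hits = new ArrayList<>();
        while (!x.exhausted() && !y.exhausted()) {
            if (x.doc() == y.doc()) {
                hits.add(x.doc());
                x.next();
                y.next();
            } else if (x.doc() < y.doc()) {
                x.skipTo(y.doc()); // leap over everything that cannot match
            } else {
                y.skipTo(x.doc());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] rare = {7, 900};
        int[] common = {1, 2, 3, 7, 8, 900, 901};
        System.out.println(intersect(rare, common));
    }
}
```

The rare clause drives the loop: each of its documents triggers one skip in
the common clause, so the work grows with the shorter list rather than the
longer one.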
1.3 final
1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
throw ParseException instead. (Erik Hatcher)
2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
3. Added a new method IndexReader.setNorm(), that permits one to
alter the boosting of fields after an index is created.
4. Distinguish between the final position and length when indexing a
field. The length is now defined as the total number of tokens,
instead of the final position, as it was previously. Length is
used for score normalization (Similarity.lengthNorm()) and for
controlling memory usage (IndexWriter.maxFieldLength). In both of
these cases, the total number of tokens is a better value to use
than the final token position. Position is used in phrase
searching (see PhraseQuery and Token.setPositionIncrement()).
5. Fix StandardTokenizer's handling of CJK characters (Chinese,
Japanese and Korean ideograms). Previously contiguous sequences
were combined in a single token, which is not very useful. Now
each ideogram generates a separate token, which is more useful.
1.3 RC3
1. Added minMergeDocs in IndexWriter. This can be raised to speed
indexing without altering the number of files, but only using more
memory. (Julien Nioche via Otis)
2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
3. Fix bug #16952, in demo HTML parser, skip comments in
javascript. (Christoph Goller)
4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
output (Daniel Naber via Christoph Goller)
5. Fix bug #24301, in demo HTML parser, long titles no longer
hang things. (Christoph Goller)
6. Fix bug #23534, Replace use of file timestamp of segments file
with an index version number stored in the segments file. This
resolves problems when running on file systems with low-resolution
timestamps, e.g., HFS under MacOS X. (Christoph Goller)
7. Fix QueryParser so that TokenMgrError is not thrown, only
ParseException. (Erik Hatcher)
8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
10. Cleaned up some build stuff. (Erik Hatcher)
1.3 RC2
1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
SegmentsReader. (Julien Nioche via otis)
2. Changed file locking to place lock files in
System.getProperty("java.io.tmpdir"), where all users are
permitted to write files. This way folks can open and correctly
lock indexes which are read-only to them.
3. IndexWriter: added a new method, addDocument(Document, Analyzer),
permitting one to easily use different analyzers for different
documents in the same index.
4. Minor enhancements to FuzzyTermEnum.
(Christoph Goller via Otis)
5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
and MultiIndexSearcher to use it.
(Christoph Goller via Otis)
6. Fixed a bug in IndexWriter that returned incorrect docCount().
(Christoph Goller via Otis)
7. Fixed SegmentsReader to eliminate the confusing and slightly different
behaviour of TermEnum when dealing with an enumeration of all terms,
versus an enumeration starting from a specific term.
This patch also fixes incorrect term document frequencies when the same term
is present in multiple segments.
(Christoph Goller via Otis)
8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
9. Added support for the new "compound file" index format (Dmitry
Serebrennikov)
10. Added Locale setting to QueryParser, for use by date range parsing.
11. Changed IndexReader so that it can be subclassed by classes
outside of its package. Previously it had package-private
abstract methods. Also modified the index merging code so that it
can work on an arbitrary IndexReader implementation, and added a
new method, IndexWriter.addIndexes(IndexReader[]), to take
advantage of this. (cutting)
12. Added a limit to the number of clauses which may be added to a
BooleanQuery. The default limit is 1024 clauses. This should
stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
queries which run amok. (cutting)
13. Add new method: IndexReader.undeleteAll(). This undeletes all
deleted documents which still remain in the index. (cutting)
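Note on item 12 above: a hard cap on the number of clauses can be sketched
as a container that refuses to grow past a limit. BoundedClauseListSketch
and its exception type are hypothetical and only illustrate the idea. They
are not the BooleanQuery API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical clause container with an upper bound on its size.
final class BoundedClauseListSketch {

    // Thrown when adding a clause would exceed the configured limit.
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    private final int maxClauses;
    private final List<String> clauses = new ArrayList<>();

    BoundedClauseListSketch(int maxClauses) {
        if (maxClauses <= 0) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        this.maxClauses = maxClauses;
    }

    void add(String clause) {
        if (clauses.size() >= maxClauses) {
            // Fail fast instead of letting an expansion consume the heap.
            throw new TooManyClausesException(maxClauses);
        }
        clauses.add(clause);
    }

    int size() {
        return clauses.size();
    }

    public static void main(String[] args) {
        BoundedClauseListSketch query = new BoundedClauseListSketch(1024);
        try {
            for (int i = 0; i < 5000; i++) {
                query.add("term" + i); // e.g. terms produced by a wildcard expansion
            }
        } catch (TooManyClausesException e) {
            System.out.println("stopped at " + query.size() + " clauses");
        }
    }
}
```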
1.3 RC1
1. Fixed PriorityQueue's clear() method.
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
(Matthijs Bomhoff via otis)
2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
(Dale Anson via otis)
3. Added the ability to disable lock creation by using disableLuceneLocks
system property. This is useful for read-only media, such as CD-ROMs.
(otis)
4. Added id method to Hits to be able to access the index global id.
Required for sorting options.
(carlson)
5. Added support for new range query syntax to QueryParser.jj.
(briangoetz)
6. Added the ability to retrieve HTML documents' META tag values to
HTMLParser.jj.
(Mark Harwood via otis)
7. Modified QueryParser to make it possible to programmatically specify the
default Boolean operator (OR or AND).
(Péter Halácsy via otis)
8. Made many search methods and classes non-final, per requests.
This includes IndexWriter and IndexSearcher, among others.
(cutting)
9. Added class RemoteSearchable, providing support for remote
searching via RMI. The test class RemoteSearchableTest.java
provides an example of how this can be used. (cutting)
10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
test class TestPhrasePrefixQuery provides the usage example.
(Anders Nielsen via otis)
11. Changed the German stemming algorithm to ignore case while
stripping. The new algorithm is faster and more often produces the
same stem for nouns and verbs derived from the same word.
(gschwarz)
12. Added support for boosting the score of documents and fields via
the new methods Document.setBoost(float) and Field.setBoost(float).
Note: This changes the encoding of an indexed value. Indexes
should be re-created from scratch in order for search scores to
be correct. With the new code and an old index, searches will
yield very large scores for shorter fields, and very small scores
for longer fields. Once the index is re-created, scores will be
as before. (cutting)
13. Added new method Token.setPositionIncrement().
This permits, for the purpose of phrase searching, placing
multiple terms in a single position. This is useful with
stemmers that produce multiple possible stems for a word.
This also permits the introduction of gaps between terms, so that
terms which are adjacent in a token stream will not be matched by
an exact phrase query. This makes it possible, e.g., to build
an analyzer where phrases are not matched over stop words which
have been removed.
Finally, repeating a token with an increment of zero can also be
used to boost scores of matches on that token. (cutting)
14. Added new Filter class, QueryFilter. This constrains search
results to only match those which also match a provided query.
Results are cached, so that searches after the first on the same
index using this filter are very fast.
This could be used, for example, with a RangeQuery on a formatted
date field to implement date filtering. One could re-use a
single QueryFilter that matches, e.g., only documents modified
within the last week. The QueryFilter and RangeQuery would only
need to be reconstructed once per day. (cutting)
15. Added a new IndexWriter method, getAnalyzer(). This returns the
analyzer used when adding documents to this index. (cutting)
16. Fixed a bug with IndexReader.lastModified(). Before, document
deletion did not update this. Now it does. (cutting)
17. Added Russian Analyzer.
(Boris Okner via otis)
18. Added a public, extensible scoring API. For details, see the
javadoc for org.apache.lucene.search.Similarity.
19. Changed the return type of Hits.id() from float to int. (Terry Steichen via Peter).
20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
(Peter Mularien via otis)
21. Added getFields(String) and getValues(String) methods.
Contributed by Rasik Pandey on 2002-10-09
(Rasik Pandey via otis)
22. Revised internal search APIs. Changes include:
a. Queries are no longer modified during a search. This makes
it possible, e.g., to reuse the same query instance with
multiple indexes from multiple threads.
b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
etc.) now work correctly with MultiSearcher, fixing bugs 12619
and 12667.
c. Boosting a BooleanQuery now works, and is supported by the
query parser (problem reported by Lee Mallabone). Thus a query
like "(+foo +bar)^2 +baz" is now supported and equivalent to
"(+foo^2 +bar^2) +baz".
d. New method: Query.rewrite(IndexReader). This permits a
query to re-write itself as an alternate, more primitive query.
Most of the term-expanding query classes (PrefixQuery,
WildcardQuery, etc.) are now implemented using this method.
e. New method: Searchable.explain(Query q, int doc). This
returns an Explanation instance that describes how a particular
document is scored against a query. An explanation can be
displayed as either plain text, with the toString() method, or
as HTML, with the toHtml() method. Note that computing an
explanation is as expensive as executing the query over the
entire index. This is intended to be used in developing
Similarity implementations, and, for good performance, should
not be displayed with every hit.
f. Scorer and Weight are public, not package protected. It is now
possible for someone to write a Scorer implementation that is
not in the org.apache.lucene.search package. This is still
fairly advanced programming, and I don't expect anyone to do
this anytime soon, but at least now it is possible.
g. Added public accessors to the primitive query classes
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
their terms and clauses.
Caution: These are extensive changes and they have not yet been
tested extensively. Bug reports are appreciated.
(cutting)
23. Added convenience RAMDirectory constructors taking File and String
arguments, for easy FSDirectory to RAMDirectory conversion.
(otis)
24. Added code for manual renaming of files in FSDirectory, since it
has been reported that java.io.File's renameTo(File) method sometimes
fails on Windows JVMs.
(Matt Tucker via otis)
25. Refactored QueryParser to make it easier for people to extend it.
Added the ability to automatically lower-case Wildcard terms in
the QueryParser.
(Tatu Saloranta via otis)
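Item 13 above describes how Token.setPositionIncrement() affects phrase matching. The following sketch illustrates the arithmetic only, in plain Java with no Lucene dependency. The Tok class and both helper methods are hypothetical, and the choice to start counting at -1 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionIncrementSketch {
    /** Hypothetical token: text plus the increment relative to the previous token. */
    static final class Tok {
        final String text;
        final int increment;

        Tok(String text, int increment) {
            this.text = text;
            this.increment = increment;
        }
    }

    /** Turns relative increments into absolute positions. */
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> out = new ArrayList<>();
        // Start at -1 so a first token with increment 1 lands at position 0.
        int position = -1;
        for (Tok t : tokens) {
            position += t.increment;
            out.add(position);
        }
        return out;
    }

    /** True if some occurrence of b sits exactly one position after some occurrence of a. */
    static boolean adjacent(List<Tok> tokens, String a, String b) {
        List<Integer> pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).text.equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).text.equals(b) && pos.get(j) == pos.get(i) + 1) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With tokens quick(1), fast(0), fox(1), both "quick fox" and "fast fox" are adjacent because the increment of zero stacks "fast" on the same position as "quick". With tokens day(1), night(2), standing for "day and night" after the stop word is removed, "day night" is not adjacent because the increment of 2 leaves a gap.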
1.2 RC6
1. Changed QueryParser.jj to make "?" a special character, which
allows it to be used in wildcard terms. Also updated the TestWildcard
unit test. (Ralf Hettesheimer via carlson)
1.2 RC5
1. Renamed build.properties to default.properties and updated
the BUILD.txt document to describe how to override the
default.properties settings without having to edit the file. This
brings the build process closer to Scarab's build process.
(jon)
2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
3. Updated "powered by" links. (otis)
4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
5. FSDirectory now throws an exception if it cannot create the directory
- Bug #6914 (Eugene Gluzberg via otis)
6. Updated MultiSearcher, MultiFieldParse, Constants, DateFilter,
LowerCaseTokenizer javadoc (otis)
7. Added fix to avoid NullPointerException in results.jsp
(Mark Hayes via otis)
8. Changed wildcard search to match zero or more characters instead of one or more
(Lee Mallabone, via otis)
9. Fixed an offset error in GermanStemFilter - Bug #7412
(Rodrigo Reyes, via otis)
10. Added unit tests for wildcard search and DateFilter (otis)
11. Allow co-existence of indexed and non-indexed fields with the same name
(cutting/casper, via otis)
12. Add escape character to query parser.
(briangoetz)
13. Applied a patch that ensures that searches that use DateFilter
don't throw an exception when no matches are found. (David Smiley, via
otis)
14. Fixed bugs in DateFilter and WildcardQuery unit tests. (cutting, otis, carlson)
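Item 8 above changed wildcard matching so that "*" matches zero or more characters. A minimal sketch of such a matcher follows, in plain Java; WildcardSketch is a hypothetical name, and this is not the code Lucene uses. It also treats "?" as matching exactly one character, which is the usual meaning of that wildcard.

```java
final class WildcardSketch {
    /** '*' matches zero or more characters, '?' matches exactly one. */
    static boolean matches(String pattern, String text) {
        int p = 0;      // index into pattern
        int t = 0;      // index into text
        int star = -1;  // index of the most recent '*' in pattern, or -1
        int mark = 0;   // text index where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Tentatively let '*' match nothing and remember where we were.
                star = p++;
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                // Backtrack: let the last '*' absorb one more character.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Any trailing '*' can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Under the zero-or-more rule, matches("foo*", "foo") is true; under a one-or-more rule it would have been false.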
1.2 RC4
1. Updated contributions section of website.
Add XML Document #3 implementation to Document Section.
Also added Term Highlighting to Misc Section. (carlson)
2. Fixed NullPointerException for phrase searches containing
unindexed terms, introduced in 1.2RC3. (cutting)
3. Changed document deletion code to obtain the index write lock,
enforcing the fact that document addition and deletion cannot be
performed concurrently. (cutting)
4. Various documentation cleanups. (otis, acoliver)
5. Updated "powered by" links. (cutting, jon)
6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
7. Changed Term and Query to implement Serializable. (scottganyo)
8. Fixed to never delete indexes added with IndexWriter.addIndexes().
(cutting)
9. Upgraded to JUnit 3.7. (otis)
1.2 RC3
1. IndexWriter: fixed a bug where adding an optimized index to an
empty index failed. This was encountered using addIndexes to copy
a RAMDirectory index to an FSDirectory.
2. RAMDirectory: fixed a bug where RAMInputStream could not read
across more than a single buffer boundary.
3. Fix query parser so it accepts queries with unicode characters.
(briangoetz)
4. Fix query parser so that PrefixQuery is used in preference to
WildcardQuery when there's only an asterisk at the end of the
term. Previously PrefixQuery would never be used.
5. Fix tests so they compile; fix ant file so it compiles tests
properly. Added test cases for Analyzers and PriorityQueue.
6. Updated demos, added Getting Started documentation. (acoliver)
7. Added 'contributions' section to website & docs. (carlson)
8. Removed JavaCC from source distribution for copyright reasons.
Folks must now download this separately from metamata in order to
compile Lucene. (cutting)
9. Substantially improved the performance of DateFilter by adding the
ability to reuse TermDocs objects. (cutting)
10. Added IndexReader methods:
public static boolean indexExists(String directory);
public static boolean indexExists(File directory);
public static boolean indexExists(Directory directory);
public static boolean isLocked(Directory directory);
public static void unlock(Directory directory);
(cutting, otis)
11. Fixed bugs in GermanAnalyzer (gschwarz)
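Item 4 above says the query parser prefers PrefixQuery when the only asterisk is at the end of the term. A sketch of that decision follows, in plain Java; TermKindSketch and its Kind enum are hypothetical and are not part of the query parser. Treating "?" as a wildcard character is an assumption of this sketch.

```java
final class TermKindSketch {
    enum Kind { TERM, PREFIX, WILDCARD }

    /** Picks the cheapest query kind that can represent the term. */
    static Kind classify(String term) {
        int firstStar = term.indexOf('*');
        boolean hasQuestion = term.indexOf('?') >= 0;
        if (firstStar < 0 && !hasQuestion) {
            return Kind.TERM;
        }
        // The first asterisk is also the last character, so it is the only
        // asterisk; with no '?' and a non-empty prefix, a prefix query suffices.
        if (!hasQuestion && firstStar == term.length() - 1 && term.length() > 1) {
            return Kind.PREFIX;
        }
        return Kind.WILDCARD;
    }
}
```

Here "foo*" classifies as PREFIX, while "f*o" and "f*o*" fall back to WILDCARD.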
1.2 RC2, 19 October 2001:
- added sources to distribution
- removed broken build scripts and libraries from distribution
- SegmentsReader: fixed potential race condition
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
erases the directory contents, even when the directory
has already been accessed in this JVM.
- RangeQuery: Fix issue where an inclusive range query would
include the nearest term in the index above a nonexistent
specified upper term.
- SegmentTermEnum: Fix NullPointerException in clone() method
when the Term is null.
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
since they rely on a feature added in JDK 1.2.
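The RangeQuery entry above concerns an inclusive range whose upper term is not in the index. A sketch of the intended behaviour follows, using a sorted set of strings as a stand-in for the term dictionary. RangeEnumSketch is a hypothetical name and this is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class RangeEnumSketch {
    /** Returns every term t with lower <= t <= upper, whether or not upper exists. */
    static List<String> inclusiveRange(TreeSet<String> terms, String lower, String upper) {
        List<String> out = new ArrayList<>();
        // tailSet(lower, true) starts at the first term >= lower.
        for (String term : terms.tailSet(lower, true)) {
            // Compare against the bound itself, so the nearest term above a
            // missing upper term is excluded.
            if (term.compareTo(upper) > 0) {
                break;
            }
            out.add(term);
        }
        return out;
    }
}
```

With terms apple, banana, cherry and date, the range from "apple" to "c" yields apple and banana; "cherry" is the nearest term above "c" and is left out.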
1.2 RC1 (first Apache release), 2 October 2001:
- packages renamed from com.lucene to org.apache.lucene
- license switched from LGPL to Apache
- ant-only build -- no more makefiles
- addition of lock files--now fully thread & process safe
- addition of German stemmer
- MultiSearcher now supports low-level search API
- added RangeQuery, for term-range searching
- Analyzers can choose tokenizer based on field name
- misc bug fixes.
1.01b (last Sourceforge release), 2 July 2001
. a few bug fixes
. new Query Parser
. new prefix query (search for "foo*" matches "food")
1.0, 2000-10-04
This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.
0.04 2000-04-19
Lucene now includes a grammar-based tokenizer, StandardTokenizer.
The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters. The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.
StandardTokenizer serves two purposes:
1. It is a much better, general purpose tokenizer for use by
applications as is.
The easiest way for applications to start using
StandardTokenizer is to use StandardAnalyzer.
2. It provides a good example of grammar-based tokenization.
If an application has special tokenization requirements, it can
implement a custom tokenizer by copying the directory containing
the new tokenizer into the application and modifying it
accordingly.
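To illustrate what identifying classes of terms with a regular-expression grammar can look like, here is a small sketch built on java.util.regex. It is not the StandardTokenizer grammar: the four patterns, the type names and the GrammarTokenizerSketch class are simplified assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GrammarTokenizerSketch {
    // Alternatives are tried left to right, so the more specific classes come first.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<EMAIL>[A-Za-z0-9._]+@[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+)"
        + "|(?<ACRONYM>(?:[A-Za-z]\\.){2,})"
        + "|(?<NUM>[0-9]+(?:\\.[0-9]+)?)"
        + "|(?<ALPHANUM>[A-Za-z0-9]+)");

    /** Returns tokens as "TYPE:text" strings, skipping anything no rule matches. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String type = m.group("EMAIL") != null ? "EMAIL"
                : m.group("ACRONYM") != null ? "ACRONYM"
                : m.group("NUM") != null ? "NUM"
                : "ALPHANUM";
            out.add(type + ":" + m.group());
        }
        return out;
    }
}
```

A letters-only tokenizer would split "doug@example.com" into three terms and drop "3.14" entirely; a grammar with dedicated rules keeps each as a single typed token.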
0.01, 2000-03-30
First open source release.
The code has been re-organized into a new package and directory
structure for this release. It builds OK, but has not been tested
beyond that since the re-organization.