Changelog
---------
[3.12.2] - 2025-03-02
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added wheels for pypy 3.11
Changed
~~~~~~~
* upgrade to ``Cython==3.0.12``
[3.12.1] - 2025-01-30
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix version number
[3.12.0] - 2025-01-16
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* generate code for fallback imports so that it is easier to parse for tools bundling Python
applications into a single binary (for example cx-freeze and pyinstaller)
Added
~~~~~
* added support for taskflow 3.9.0
[3.11.0] - 2024-12-17
^^^^^^^^^^^^^^^^^^^^^
Performance
~~~~~~~~~~~
* improve calculation of min score inside partial_ratio so it can skip more alignments
Added
~~~~~
* added build support for emscripten
[3.10.1] - 2024-10-24
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix compilation on clang-19
* fix incorrect results in simd optimized implementation of Levenshtein and OSA on 32bit targets
Added
~~~~~
* added support for taskflow 3.8.0
[3.10.0] - 2024-09-21
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* drop support for Python 3.8
* switch build system to ``scikit-build-core``
[3.9.7] - 2024-09-02
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix crash in ``cdist`` due to Visual Studio upgrade
[3.9.6] - 2024-08-06
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade to ``Cython==3.0.11``
* add python 3.13 wheels
[3.9.5] - 2024-07-29
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* include simd binaries in pyinstaller builds
* fix builds with setuptools 72 by upgrading ``scikit-build``
[3.9.4] - 2024-07-02
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix bug in ``Levenshtein.editops`` and ``Levenshtein.opcodes`` which could lead
to incorrect results and crashes for some inputs
[3.9.3] - 2024-05-31
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix None handling for queries in ``process.cdist`` for scorers not supporting SIMD
[3.9.2] - 2024-05-28
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix supported versions of taskflow in cmake to be in the range v3.3 - v3.7
[3.9.1] - 2024-05-19
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* disable AVX2 on macOS since it led to illegal instructions being generated
[3.9.0] - 2024-05-02
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* significantly improve type hints for the library
Fixed
~~~~~
* fix cmake version parsing
[3.8.1] - 2024-04-07
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* use the correct version of ``rapidfuzz-cpp`` when building against a system installed version
[3.8.0] - 2024-04-06
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added ``process.cpdist`` which allows pairwise comparison of two collections of inputs
Fixed
~~~~~
* fix some minor errors in the type hints
* fix potentially incorrect results of JaroWinkler when using high prefix weights
[3.7.0] - 2024-03-21
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* reduce import time
[3.6.2] - 2024-03-05
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade to ``Cython==3.0.9``
Fixed
~~~~~
* upgrade ``rapidfuzz-cpp`` which includes a fix for build issues on some compilers
* fix some issues with the sphinx config
[3.6.1] - 2023-12-28
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix overflow error on systems with ``sizeof(size_t) < 8``
[3.6.0] - 2023-12-26
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix pure python fallback implementation of ``fuzz.token_set_ratio``
* properly link with ``-latomic`` if ``std::atomic<uint64_t>`` is not natively supported
Performance
~~~~~~~~~~~
* add banded implementation of LCS / Indel. This improves the runtime from ``O((|s1|/64) * |s2|)`` to ``O((score_cutoff/64) * |s2|)``
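The ``|s1|/64`` factor in the bounds above comes from bit-parallel processing, where one machine word covers 64 positions of ``s1``. The following is a minimal pure Python sketch of an unbanded bit-parallel LCS, assuming a single arbitrary-precision integer in place of an array of 64-bit words. It is not the RapidFuzz implementation and does not include the banding; the function names ``lcs_length`` and ``indel_distance`` are hypothetical.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence via bit-parallelism."""
    # One bit mask per distinct element: bit i is set if s1[i] == ch.
    masks = {}
    for i, ch in enumerate(s1):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << len(s1)) - 1  # keeps the state limited to len(s1) bits
    state = full               # all ones: nothing matched yet
    for ch in s2:
        matches = state & masks.get(ch, 0)
        # One addition and one subtraction update every position of s1 at once.
        state = ((state + matches) | (state - matches)) & full
    # Every zero bit in the final state contributes one to the LCS length.
    return len(s1) - bin(state).count("1")


def indel_distance(s1, s2):
    """Indel distance (insertions and deletions only) derived from the LCS."""
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)


print(lcs_length("kitten", "sitting"))      # 4
print(indel_distance("kitten", "sitting"))  # 5
```

A banded variant only needs to track the words of the state that can still lead to a result within ``score_cutoff``, which is where the ``score_cutoff/64`` factor comes from.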
Changed
~~~~~~~
* upgrade to ``Cython==3.0.7``
* cdist for many metrics now returns a matrix of ``uint32`` instead of ``int32`` by default
[3.5.2] - 2023-11-02
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* use _mm_malloc/_mm_free on macOS if aligned_alloc is unsupported
[3.5.1] - 2023-10-31
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix compilation failure on macOS
[3.5.0] - 2023-10-31
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* skip pandas ``pd.NA`` similar to ``None``
* add ``score_multiplier`` argument to ``process.cdist`` which allows multiplying the end result scores
with a constant factor.
* drop support for Python 3.7
Performance
~~~~~~~~~~~
* improve performance of simd implementation for ``LCS`` / ``Indel`` / ``Jaro`` / ``JaroWinkler``
* improve performance of Jaro and Jaro Winkler for long sequences
* implement ``process.extract`` with ``limit=1`` using ``process.extractOne`` which can be faster
Fixed
~~~~~
* the preprocessing function was always called through Python due to a broken C-API version check
* fix wraparound issue in simd implementation of Jaro and Jaro Winkler
[3.4.0] - 2023-10-09
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade to ``Cython==3.0.3``
* add simd implementation for Jaro and Jaro Winkler
[3.3.1] - 2023-09-25
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* add missing tag for python 3.12 support
[3.3.0] - 2023-09-11
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade to ``Cython==3.0.2``
* implement the remaining missing features from the C++ implementation in the pure Python implementation
Added
~~~~~
* added support for Python 3.12
[3.2.0] - 2023-08-02
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* build x86 with sse2/avx2 runtime detection
[3.1.2] - 2023-07-19
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade to ``Cython==3.0.0``
[3.1.1] - 2023-06-06
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade to ``taskflow==3.6``
Fixed
~~~~~
* replace usage of ``isnan`` with ``std::isnan`` which fixes the build on NetBSD
[3.1.0] - 2023-06-02
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* added keyword argument ``pad`` to Hamming distance. This controls whether sequences of different
length should be padded or lead to a ``ValueError``
* improve consistency of exception messages between the C++ and pure Python implementation
* upgrade required Cython version to ``Cython==3.0.0b3``
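The behaviour of the ``pad`` keyword for the Hamming distance can be sketched in pure Python as follows. This is an illustration of the semantics described above, not the library code; the function name ``hamming_distance`` and the default ``pad=True`` are assumptions made for the example.

```python
def hamming_distance(s1, s2, pad=True):
    """Count the positions at which two sequences differ.

    pad=True: the shorter sequence is treated as padded, so every surplus
    element of the longer sequence counts as one difference.
    pad=False: sequences of different length raise a ValueError.
    """
    if len(s1) != len(s2) and not pad:
        raise ValueError("Sequences are not the same length.")
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches + abs(len(s1) - len(s2))


print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("abc", "abcde"))        # 2
```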
Fixed
~~~~~
* fix missing GIL restore when an exception is thrown inside ``process.cdist``
* fix incorrect type hints for the ``process`` module
[3.0.0] - 2023-04-16
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* allow the usage of ``Hamming`` for different string lengths. Length differences are handled as
insertions / deletions
* remove support for boolean preprocessor functions in ``rapidfuzz.fuzz`` and ``rapidfuzz.process``.
The processor argument is now always a callable or ``None``.
* update defaults of the processor argument to be ``None`` everywhere. For affected functions this can change results, since strings are no longer preprocessed.
To get back the old behaviour pass ``processor=utils.default_process`` to these functions.
The following functions are affected by this:
* ``process.extract``, ``process.extract_iter``, ``process.extractOne``
* ``fuzz.token_sort_ratio``, ``fuzz.token_set_ratio``, ``fuzz.token_ratio``, ``fuzz.partial_token_sort_ratio``, ``fuzz.partial_token_set_ratio``, ``fuzz.partial_token_ratio``, ``fuzz.WRatio``, ``fuzz.QRatio``
* ``rapidfuzz.process`` no longer calls scorers with ``processor=None``. For this reason user provided scorers no longer require this argument.
* remove option to pass keyword arguments to scorer via ``**kwargs`` in ``rapidfuzz.process``. They can be passed
via a ``scorer_kwargs`` argument now. This ensures this does not break when extending function parameters and
prevents naming clashes.
* remove ``rapidfuzz.string_metric`` module. Replacements for all functions are available in ``rapidfuzz.distance``
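To see why the new ``processor=None`` default can change results, the sketch below mimics the kind of normalization the old default applied, based on the documented behaviour of ``utils.default_process`` (lowercasing, replacing non-alphanumeric characters with whitespace, trimming). The helper ``approx_default_process`` is a hypothetical stand-in written for this example, not the library function.

```python
def approx_default_process(text):
    """Rough stand-in for the old default preprocessing (hypothetical helper).

    Replaces every non-alphanumeric character with a space, trims the
    surrounding whitespace and lowercases the result.
    """
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return cleaned.strip().lower()


a, b = "Foo-Bar!", "foo bar"
# Without preprocessing the raw strings differ, so a scorer sees a mismatch.
print(a == b)                                                  # False
# After preprocessing both inputs are identical.
print(approx_default_process(a) == approx_default_process(b))  # True
```

Code that relied on the old behaviour would pass ``processor=utils.default_process`` explicitly, as described above.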
Added
~~~~~
* added support for arbitrary hashable sequence in the pure Python fallback implementation of all functions in ``rapidfuzz.distance``
* added support for ``None`` and ``float("nan")`` in ``process.cdist`` as long as the underlying scorer supports it.
This is the case for all scorers returning normalized results.
Fixed
~~~~~
* fix division by zero in simd implementation of normalized metrics leading to incorrect results
[2.15.1] - 2023-04-11
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix incorrect tag dispatching implementation leading to AVX2 instructions in the SSE2 code path
Added
~~~~~
* add wheels for windows arm64
[2.15.0] - 2023-04-01
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* allow the usage of finite generators as choices in ``process.extract``
[2.14.0] - 2023-03-31
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* upgrade required Cython version to ``Cython==3.0.0b2``
Fixed
~~~~~
* fix handling of non symmetric scorers in pure python version of ``process.cdist``
* fix default dtype handling when using ``process.cdist`` with pure python scorers
[2.13.7] - 2022-12-20
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix function signature of ``get_requires_for_build_wheel``
[2.13.6] - 2022-12-11
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* reformat changelog as restructured text to get rid of ``m2r2`` dependency
[2.13.5] - 2022-12-11
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added docs to sdist
Fixed
~~~~~
* fix two cases of undefined behavior in ``process.cdist``
[2.13.4] - 2022-12-08
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* handle ``float("nan")`` similar to ``None`` for query / choice, since this is common for
non-existent data in tools like numpy
Fixed
~~~~~
* fix handling of ``None``\ /\ ``float("nan")`` in ``process.distance``
* use absolute imports inside tests
[2.13.3] - 2022-12-03
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* improve handling of functions wrapped using ``functools.wraps``
* fix broken fallback to Python implementation when an ``ImportError`` occurs on import.
This can e.g. occur when the binary has a dependency on libatomic, but it is unavailable on
the system
* define ``CMAKE_C_COMPILER_AR``\ /\ ``CMAKE_CXX_COMPILER_AR``\ /\ ``CMAKE_C_COMPILER_RANLIB``\ /\ ``CMAKE_CXX_COMPILER_RANLIB``
if they are not defined yet
[2.13.2] - 2022-11-05
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix incorrect results in ``Hamming.normalized_similarity``
* fix incorrect score_cutoff handling in pure python implementation of
``Postfix.normalized_distance`` and ``Prefix.normalized_distance``
* fix ``Levenshtein.normalized_similarity`` and ``Levenshtein.normalized_distance``
when used in combination with the process module
* ``fuzz.partial_ratio`` was not always symmetric when ``len(s1) == len(s2)``
[2.13.1] - 2022-11-02
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix bug in ``normalized_similarity`` of most scorers,
leading to incorrect results when used in combination with the process module
* fix sse2 support
* fix bug in ``JaroWinkler`` and ``Jaro`` when used in the pure python process module
* forward kwargs in pure Python implementation of ``process.extract``
[2.13.0] - 2022-10-30
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix bug in ``Levenshtein.editops`` leading to crashes when used with ``score_hint``
Changed
~~~~~~~
* moved capi from ``rapidfuzz_capi`` into ``rapidfuzz``\ , since the installation will always
succeed now that there is a pure Python mode
* add ``score_hint`` argument to process module
* add ``score_hint`` argument to Levenshtein module
[2.12.0] - 2022-10-24
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* drop support for Python 3.6
Added
~~~~~
* added ``Prefix``\ /\ ``Suffix`` similarity
Fixed
~~~~~
* fixed packaging with pyinstaller
[2.11.1] - 2022-10-05
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix segmentation fault in ``process.cdist`` when used with an empty query sequence
[2.11.0] - 2022-10-02
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* move jarowinkler dependency into rapidfuzz to simplify maintenance
Performance
~~~~~~~~~~~
* add SIMD implementation for ``fuzz.ratio``\ /\ ``fuzz.QRatio``\ /\ ``Levenshtein``\ /\ ``Indel``\ /\ ``LCSseq``\ /\ ``OSA`` to improve
performance for short strings in cdist
[2.10.3] - 2022-09-30
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* use ``scikit-build=0.14.1`` on Linux, since ``scikit-build=0.15.0`` fails to find the Python Interpreter
* work around a gcc bug in template type deduction
[2.10.2] - 2022-09-27
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix support for cmake versions below 3.17
[2.10.1] - 2022-09-25
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* modernize cmake build to fix most conda-forge builds
[2.10.0] - 2022-09-18
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* add editops to hamming distance
Performance
~~~~~~~~~~~
* strip common affix in osa distance
Fixed
~~~~~
* ignore missing pandas in Python 3.11 tests
[2.9.0] - 2022-09-16
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* add optimal string alignment (OSA)
[2.8.0] - 2022-09-11
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* ``fuzz.partial_ratio`` did not find the optimal alignment in some edge cases (#219)
Performance
~~~~~~~~~~~
* improve performance of ``fuzz.partial_ratio``
Changed
~~~~~~~
* increased minimum C++ version to C++17 (see #255)
[2.7.0] - 2022-09-11
^^^^^^^^^^^^^^^^^^^^
Performance
~~~~~~~~~~~
* improve performance of ``Levenshtein.distance``\ /\ ``Levenshtein.editops`` for
long sequences.
Added
~~~~~
* add ``score_hint`` parameter to ``Levenshtein.editops`` which allows the use of a
faster implementation
Changed
~~~~~~~
* all functions in the ``string_metric`` module now raise a deprecation warning.
They are now only wrappers for their replacement functions, which makes them slower
when used with the process module
[2.6.1] - 2022-09-03
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix incorrect results of partial_ratio for long needles (#257)
[2.6.0] - 2022-08-20
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix hashing for custom classes
Added
~~~~~
* add support for slicing in ``Editops.__getitem__``\ /\ ``Editops.__delitem__``
* add ``DamerauLevenshtein`` module
[2.5.0] - 2022-08-14
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added support for KeyboardInterrupt in the process module.
It might still take a bit until the KeyboardInterrupt is registered, but the remaining
text comparisons are no longer run after pressing ``Ctrl + C``
Fixed
~~~~~
* fix default scorer used by cdist to use C++ implementation if possible
[2.4.4] - 2022-08-12
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* Added support for Python 3.11
[2.4.3] - 2022-08-08
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix value range of ``jaro_similarity``\ /\ ``jaro_winkler_similarity`` in the pure Python mode
for the string_metric module
* fix missing atomic symbol on arm 32 bit
[2.4.2] - 2022-07-30
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* add missing symbol to pure Python which made the usage impossible
[2.4.1] - 2022-07-29
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix version number
[2.4.0] - 2022-07-29
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix banded Levenshtein implementation
Performance
~~~~~~~~~~~
* improve performance and memory usage of ``Levenshtein.editops``
* memory usage is reduced from O(NM) to O(N)
* performance is improved for long sequences
[2.3.0] - 2022-07-23
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* add ``as_matching_blocks`` to ``Editops``\ /\ ``Opcodes``
* add support for deletions from ``Editops``
* add ``Editops.apply``\ /\ ``Opcodes.apply``
* add ``Editops.remove_subsequence``
Changed
~~~~~~~
* merge adjacent similar blocks in ``Opcodes``
Fixed
~~~~~
* fix usage of ``eval(repr(Editop))``\ , ``eval(repr(Editops))``\ , ``eval(repr(Opcode))`` and ``eval(repr(Opcodes))``
* fix opcode conversion for empty source sequence
* fix validation for empty Opcode list passed into ``Opcodes.__init__``
[2.2.0] - 2022-07-19
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* added in-tree build backend to install cmake and ninja only when it is not installed yet
and only when wheels are available
[2.1.4] - 2022-07-17
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* changed internal implementation of cdist to remove build dependency to numpy
Added
~~~~~
* added wheels for musllinux and manylinux ppc64le, s390x
[2.1.3] - 2022-07-09
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix missing type stubs
[2.1.2] - 2022-07-04
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* change src layout to make package import from root directory possible
[2.1.1] - 2022-06-30
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* allow installation without the C++ extension if it fails to compile
* allow selection of implementation via the environment variable ``RAPIDFUZZ_IMPLEMENTATION``
which can be set to "cpp" or "python"
[2.1.0] - 2022-06-29
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added pure python fallback for all implementations with the following exceptions:
* no support for sequences of hashables. Only strings supported so far
* ``\*.editops`` / ``\*.opcodes`` functions not implemented yet
* process.cdist does not support multithreading
Fixed
~~~~~
* fuzz.partial_ratio_alignment ignored the score_cutoff
* fix implementation of Hamming.normalized_similarity
* fix default score_cutoff of Hamming.similarity
* fix implementation of LCSseq.distance when used in the process module
* treat hash for -1 and -2 as different
[2.0.15] - 2022-06-24
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix integer wraparound in partial_ratio/partial_ratio_alignment
[2.0.14] - 2022-06-23
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix unlimited recursion in LCSseq when used in combination with the process module
Changed
~~~~~~~
* add fallback implementations of ``taskflow``\ , ``rapidfuzz-cpp`` and ``jarowinkler-cpp``
back to wheel, since some package building systems like piwheels can't clone sources
[2.0.13] - 2022-06-22
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* use system version of cmake on arm platforms, since the cmake package fails to compile
[2.0.12] - 2022-06-22
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* add tests to sdist
* remove cython dependency for sdist
[2.0.11] - 2022-04-23
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* relax version requirements of dependencies to simplify packaging
[2.0.10] - 2022-04-17
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Do not include installations of jaro_winkler in wheels (regression from 2.0.7)
Changed
~~~~~~~
* Allow installation from system installed versions of ``rapidfuzz-cpp``\ , ``jarowinkler-cpp``
and ``taskflow``
Added
~~~~~
* Added PyPy3.9 wheels on Linux
[2.0.9] - 2022-04-07
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Add missing Cython code in sdist
* consider float imprecision in score_cutoff (see #210)
[2.0.8] - 2022-04-07
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix incorrect score_cutoff handling in token_set_ratio and token_ratio
Added
~~~~~
* add longest common subsequence
[2.0.7] - 2022-03-13
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Do not include installations of jaro_winkler and taskflow in wheels
[2.0.6] - 2022-03-06
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix incorrect population of sys.modules which led to submodules overshadowing
other imports
Changed
~~~~~~~
* moved JaroWinkler and Jaro into a separate package
[2.0.5] - 2022-02-25
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix signed integer overflow inside hashmap implementation
[2.0.4] - 2022-02-21
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix binary size increase due to debug symbols
* fix segmentation fault in ``Levenshtein.editops``
[2.0.3] - 2022-02-18
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* Added fuzz.partial_ratio_alignment, which returns the result of fuzz.partial_ratio
combined with the alignment this result stems from
Fixed
~~~~~
* Fix Indel distance returning incorrect result when using score_cutoff=1, when the strings
are not equal. This affected other scorers like fuzz.WRatio, which use the Indel distance
as well.
[2.0.2] - 2022-02-12
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix type hints
* Add back transpiled cython files to the sdist to simplify builds in package builders
like FreeBSD port build or conda-forge
[2.0.1] - 2022-02-11
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix type hints
* Indel.normalized_similarity mistakenly used the implementation of Indel.normalized_distance
[2.0.0] - 2022-02-09
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added C-Api which can be used to extend RapidFuzz from different Python modules using any
programming language which allows the usage of C-Apis (C/C++/Rust)
* added new scorers in ``rapidfuzz.distance.*``
* port existing distances to this new api
* add Indel distance along with the corresponding editops function
Changed
~~~~~~~
* when the result of ``string_metric.levenshtein`` or ``string_metric.hamming`` exceeds ``max``
they now return ``max + 1`` instead of -1
* Build system moved from setuptools to scikit-build
* Stop including all modules in __init__.py, since they significantly slowed down import time
Removed
~~~~~~~
* remove the ``rapidfuzz.levenshtein`` module which was deprecated in v1.0.0 and scheduled for removal in v2.0.0
* dropped support for Python2.7 and Python3.5
Deprecated
~~~~~~~~~~
* deprecate support to specify processor in form of a boolean (will be removed in v3.0.0)
* new functions will not get support for this in the first place
* deprecate ``rapidfuzz.string_metric`` (will be removed in v3.0.0). Similar scorers are available
in ``rapidfuzz.distance.*``
Fixed
~~~~~
* process.cdist raised an exception when used with a pure python scorer
Performance
~~~~~~~~~~~
* improve performance and memory usage of ``rapidfuzz.string_metric.levenshtein_editops``
* memory usage is reduced by 33%
* performance is improved by around 10%-20%
* significantly improve performance of ``rapidfuzz.string_metric.levenshtein`` for ``max <= 31``
using a banded implementation
[1.9.1] - 2021-12-13
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix bug in new editops implementation, causing it to SegFault on some inputs (see qurator-spk/dinglehopper#64)
[1.9.0] - 2021-12-11
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix some issues in the type annotations (see #163)
Performance
~~~~~~~~~~~
* improve performance and memory usage of ``rapidfuzz.string_metric.levenshtein_editops``
* memory usage is reduced by 10x
* performance is improved from ``O(N * M)`` to ``O([N / 64] * M)``
[1.8.3] - 2021-11-19
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* Added missing wheels for Python3.6 on macOS and Windows (see #159)
[1.8.2] - 2021-10-27
^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* Add wheels for Python 3.10 on macOS
[1.8.1] - 2021-10-22
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix incorrect editops results (See #148)
[1.8.0] - 2021-10-20
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* Add wheels for Python3.10 on all platforms except macOS (see #141)
* Improve performance of ``string_metric.jaro_similarity`` and ``string_metric.jaro_winkler_similarity`` for strings with a length <= 64
[1.7.1] - 2021-10-02
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fixed incorrect results of fuzz.partial_ratio for long needles (see #138)
[1.7.0] - 2021-09-27
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* Added typing for process.cdist
* Added multithreading support to ``process.cdist`` using the argument ``workers``
* Add dtype argument to ``process.cdist`` to set the dtype of the result numpy array (see #132)
* Use a better hash collision strategy in the internal hashmap, which improves the worst case performance
[1.6.2] - 2021-09-15
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* improved performance of fuzz.ratio
* only import process.cdist when numpy is available
[1.6.1] - 2021-09-11
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* Add back wheels for Python2.7
[1.6.0] - 2021-09-10
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* fuzz.partial_ratio uses a new implementation for short needles (<= 64). This implementation is:
* more accurate than the current implementation (it is guaranteed to find the optimal alignment)
* significantly faster
* Add process.cdist to compare all elements of two lists (see #51)
[1.5.1] - 2021-09-01
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix out of bounds access in levenshtein_editops
[1.5.0] - 2021-08-21
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* all scorers now support similarity/distance calculations between any sequence of hashables. So it is possible to calculate e.g. the WER as:

  .. code-block:: python

     >>> string_metric.levenshtein(["word1", "word2"], ["word1", "word3"])
     1
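
Turning the word-level edit distance from the example above into a word error rate only needs a division by the length of the reference. The following is a minimal pure Python sketch under that assumption; the ``levenshtein`` function here is a plain dynamic programming version written for illustration, not the library scorer, and the example sentences are made up.

```python
def levenshtein(seq1, seq2):
    """Uniform Levenshtein distance between two sequences of hashables."""
    previous = list(range(len(seq2) + 1))
    for i, item1 in enumerate(seq1, start=1):
        current = [i]
        for j, item2 in enumerate(seq2, start=1):
            cost = 0 if item1 == item2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


reference = "the quick brown fox".split()
hypothesis = "the quick brown dog jumps".split()

errors = levenshtein(reference, hypothesis)  # one substitution, one insertion
wer = errors / len(reference)
print(errors, wer)  # 2 0.5
```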
Added
~~~~~
* Added type stub files for all functions
* added jaro similarity in ``string_metric.jaro_similarity``
* added jaro winkler similarity in ``string_metric.jaro_winkler_similarity``
* added Levenshtein editops in ``string_metric.levenshtein_editops``
Fixed
~~~~~
* Fixed support for set objects in ``process.extract``
* Fixed inconsistent handling of empty strings
[1.4.1] - 2021-03-30
^^^^^^^^^^^^^^^^^^^^
Performance
~~~~~~~~~~~
* improved performance of result creation in process.extract
Fixed
~~~~~
* Cython ABI stability issue (#95)
* fix missing decref in case of exceptions in process.extract
[1.4.0] - 2021-03-29
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* added processor support to ``levenshtein`` and ``hamming``
* added distance support to extract/extractOne/extract_iter
Fixed
~~~~~
* incorrect results of ``normalized_hamming`` and ``normalized_levenshtein`` when used with ``utils.default_process`` as processor
[1.3.3] - 2021-03-20
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix a bug in the mbleven implementation of the uniform Levenshtein distance and cover it with fuzz tests
[1.3.2] - 2021-03-20
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* some of the newly activated warnings caused build failures in the conda-forge build
[1.3.1] - 2021-03-20
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fixed issue in LCS calculation for partial_ratio (see #90)
* Fixed incorrect results for normalized_hamming and normalized_levenshtein when the processor ``utils.default_process`` is used
* Fix many compiler warnings
[1.3.0] - 2021-03-16
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* add wheels for a lot of new platforms
* drop support for Python 2.7
Performance
~~~~~~~~~~~
* use ``is`` instead of ``==`` to compare functions directly by address
Fixed
~~~~~
* Fix another ref counting issue
* Fix some issues in the Levenshtein distance algorithm (see #92)
[1.2.1] - 2021-03-08
^^^^^^^^^^^^^^^^^^^^
Performance
~~~~~~~~~~~
* further improve bitparallel implementation of uniform Levenshtein distance for strings with a length > 64 (in many cases more than 50% faster)
[1.2.0] - 2021-03-07
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* add more benchmarks to documentation
Performance
~~~~~~~~~~~
* add bitparallel implementation to InDel Distance (Levenshtein with the weights 1,1,2) for strings with a length > 64
* improve bitparallel implementation of uniform Levenshtein distance for strings with a length > 64
* use the InDel Distance and uniform Levenshtein distance in more cases instead of the generic implementation
* Directly use the Levenshtein implementation in C++ instead of using it through Python in process.*
[1.1.2] - 2021-03-03
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix reference counting in process.extract (see #81)
[1.1.1] - 2021-02-23
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Fix result conversion in process.extract (see #79)
[1.1.0] - 2021-02-21
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* string_metric.normalized_levenshtein now supports all weights
* when different weights are used for Insertion and Deletion the strings are not swapped inside the Levenshtein implementation anymore. So different weights for Insertion and Deletion are now supported.
* replace C++ implementation with a Cython implementation. This has the following advantages:
* The implementation is less error prone, since a lot of the complex things are done by Cython
* slightly faster than the current implementation (up to 10% for some parts)
* about 33% smaller binary size
* reduced compile time
* Added \*\*kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
* Add max argument to hamming distance
* Add support for whole Unicode range to utils.default_process
Performance
~~~~~~~~~~~
* replaced Wagner-Fischer usage in the normal Levenshtein distance with a bitparallel implementation
[1.0.2] - 2021-02-19
^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* The bitparallel LCS algorithm in fuzz.partial_ratio did not find the longest common substring properly in some cases.
The old algorithm is used again until this bug is fixed.
[1.0.1] - 2021-02-17
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* string_metric.normalized_levenshtein now supports the weights (1, 1, N) with N >= 1
Performance
~~~~~~~~~~~
* The Levenshtein distance with the weights (1, 1, >2) now uses the same implementation as the weights (1, 1, 2), since
``Substitution > Insertion + Deletion`` has no effect: a substitution can always be replaced by a deletion followed by an insertion
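The reason a substitution weight above 2 changes nothing can be checked with a small Wagner-Fischer sketch. The function name and the weight order ``(insertion, deletion, substitution)`` are assumptions made for this illustration, and the generic dynamic-programming approach shown here is not the optimized implementation the entry refers to.

```python
def weighted_levenshtein(s1, s2, weights=(1, 1, 1)):
    """Generic Wagner-Fischer edit distance with per-operation weights."""
    insert_cost, delete_cost, replace_cost = weights  # assumed order
    # Row 0: building each prefix of s2 from the empty string by insertions.
    prev = [j * insert_cost for j in range(len(s2) + 1)]
    for i, ch1 in enumerate(s1, start=1):
        # Column 0: deleting the first i characters of s1.
        cur = [i * delete_cost]
        for j, ch2 in enumerate(s2, start=1):
            substitute = prev[j - 1] + (0 if ch1 == ch2 else replace_cost)
            delete = prev[j] + delete_cost
            insert = cur[j - 1] + insert_cost
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


# A substitution costing more than one deletion plus one insertion is never
# chosen, so both calls return the same distance.
same = weighted_levenshtein("kitten", "sitting", (1, 1, 2)) == \
    weighted_levenshtein("kitten", "sitting", (1, 1, 7))
```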
Fixed
~~~~~
* fix uninitialized variable in bitparallel Levenshtein distance with the weight (1, 1, 1)
[1.0.0] - 2021-02-12
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* all normalized string_metrics can now be used as scorer for process.extract/extractOne
* Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
* increased test coverage, which already helped to fix some bugs and will help to prevent regressions in the future
* improved docstrings of functions
Performance
~~~~~~~~~~~
* Added bit-parallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
* Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bit-parallel implementation.
* Improved performance of ``fuzz.partial_ratio``
-> Since ``fuzz.ratio`` and ``fuzz.partial_ratio`` are used in most scorers, this improves the overall performance.
* Improved performance of ``process.extract`` and ``process.extractOne``
Deprecated
~~~~~~~~~~
* the ``rapidfuzz.levenshtein`` module is now deprecated and will be removed in v2.0.0
These functions are now placed in ``rapidfuzz.string_metric``. ``distance``\ , ``normalized_distance``\ , ``weighted_distance`` and ``weighted_normalized_distance`` are combined into ``levenshtein`` and ``normalized_levenshtein``.
Added
~~~~~
* added normalized version of the hamming distance in ``string_metric.normalized_hamming``
* process.extract_iter as a generator that yields the similarity of all elements that have a similarity >= score_cutoff
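The generator behaviour described for process.extract_iter can be pictured with the following sketch. It does not call RapidFuzz: ``extract_iter_sketch`` and ``simple_ratio`` are hypothetical names, the scorer is built on the standard library's ``difflib``, and the ``(choice, score)`` tuple shape is an assumption.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer returning a similarity between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_iter_sketch(query, choices, scorer=simple_ratio, score_cutoff=0.0):
    """Lazily yield (choice, score) for every choice scoring >= score_cutoff."""
    for choice in choices:
        score = scorer(query, choice)
        if score >= score_cutoff:
            yield choice, score


# Results are produced one at a time, so the caller can stop early.
matches = list(extract_iter_sketch("apple", ["apple", "zzz"], score_cutoff=90))
```

Because nothing is computed until the generator is consumed, a caller that only needs the first few matches never pays for scoring the remaining choices.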
Fixed
~~~~~
* multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
* fixed bug in ``token_ratio``
* fixed bug in result normalization causing zero division
[0.14.2] - 2020-12-31
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* UTF-8 usage in the copyright header caused problems with Python 2.7 on some platforms (see #70)
[0.14.1] - 2020-12-13
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* when a custom processor like ``lambda s: s`` was used with any of the methods inside fuzz.*, it always returned a score of 100. This release fixes this and adds better test coverage to prevent this bug in the future.
[0.14.0] - 2020-12-09
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added hamming distance metric in the levenshtein module
Performance
~~~~~~~~~~~
* improved performance of default_process by using lookup table
[0.13.4] - 2020-11-30
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Add missing virtual destructor that caused a segmentation fault on macOS
[0.13.3] - 2020-11-21
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* C++11 Support
* manylinux wheels
[0.13.2] - 2020-11-21
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* Levenshtein was not imported from __init__
* The reference count of a Python object inside process.extractOne was decremented too early
[0.13.1] - 2020-11-17
^^^^^^^^^^^^^^^^^^^^^
Performance
~~~~~~~~~~~
* process.extractOne exits early when a score of 100 is found. This way the other strings do not have to be preprocessed anymore.
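The early exit can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: ``extract_one_sketch`` and ``simple_ratio`` are hypothetical names, the scorer comes from ``difflib``, and the default processor is simply ``str.lower``.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer returning a similarity between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_one_sketch(query, choices, scorer=simple_ratio, processor=str.lower):
    """Return the best (choice, score) pair, stopping at the first perfect score."""
    query = processor(query)
    best = None
    for choice in choices:
        # Each choice is preprocessed only when it is actually reached.
        score = scorer(query, processor(choice))
        if best is None or score > best[1]:
            best = (choice, score)
        if score >= 100:
            break  # nothing can beat a perfect match, skip the remaining choices
    return best
```

Since preprocessing happens inside the loop, every choice after the first perfect match is never preprocessed or scored.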
[0.13.0] - 2020-11-16
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* objects passed to scorers had to be strings even before preprocessing them. This was changed, so they only have to be strings after preprocessing, similar to process.extract/process.extractOne
Performance
~~~~~~~~~~~
* process.extractOne is now implemented in C++ making it a lot faster
* When token_sort_ratio or partial_token_sort_ratio is used in process.extractOne, the words in the query are only sorted once to improve the runtime
Changed
~~~~~~~
* process.extractOne/process.extract now return the index of the match when the choices are a list.
Removed
~~~~~~~
* process.extractIndices got removed, since the indices are now already returned by process.extractOne/process.extract
[0.12.5] - 2020-10-26
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix documentation of process.extractOne (see #48)
[0.12.4] - 2020-10-22
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* Added wheels for:
  * CPython 2.7 on Windows 64 bit
  * CPython 2.7 on Windows 32 bit
  * PyPy 2.7 on Windows 32 bit
[0.12.3] - 2020-10-09
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix bug in partial_ratio (see #43)
[0.12.2] - 2020-10-01
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix inconsistency with fuzzywuzzy in partial_ratio when using strings of equal length
[0.12.1] - 2020-09-30
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* MSVC has a bug and therefore crashed on some of the templates used. This release simplifies the templates so compiling with MSVC works again
[0.12.0] - 2020-09-30
^^^^^^^^^^^^^^^^^^^^^
Performance
~~~~~~~~~~~
* partial_ratio is using the Levenshtein distance now, which is a lot faster. Since many of the other algorithms use partial_ratio, this helps to improve the overall performance
[0.11.3] - 2020-09-22
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* fix partial_token_set_ratio returning 100 all the time
[0.11.2] - 2020-09-12
^^^^^^^^^^^^^^^^^^^^^
Added
~~~~~
* added rapidfuzz.__author__, rapidfuzz.__license__ and rapidfuzz.__version__
[0.11.1] - 2020-09-01
^^^^^^^^^^^^^^^^^^^^^
Fixed
~~~~~
* do not use auto junk when searching the optimal alignment for partial_ratio
[0.11.0] - 2020-08-22
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* add support for Python 2.7 (see #40)
* add wheels for Python 2.7 (both PyPy and CPython) on macOS and Linux
[0.10.0] - 2020-08-17
^^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
* added wheels for Python 3.9
Fixed
~~~~~
* tuple scores in process.extractOne are now supported (see #39)