# Changelog
## 1.8.7 (2025-03-27)
* [BUGFIX] Fixed build warnings.
* [BUGFIX] #512: Fixed PROTECT stack imbalance in `stri_encode_from_marked`.
## 1.8.4 (2024-05-06)
* [BUILD TIME] [BUGFIX] #508: Fixed build errors on Windows
(thanks to @jeroen and @kalibera).
## 1.8.3 (2023-12-10)
* [BUILD TIME] [BUGFIX] Fixed the *format string is not a string literal
(potentially insecure)* warnings.
## 1.8.2 (2023-11-22)
* [BUILD TIME] [BUGFIX] #501: Fixed failing build on 32-bit Windows
(Windows API `ResolveLocaleName` function not available).
* [BUILD TIME] [BUGFIX] #502: `PKG_CPPFLAGS` are now considered
before other `CPPFLAGS` (and likewise for other flag types) in
the `configure` script, for consistency with how `Makevars` handles them.
* [BUILD TIME] [BUGFIX] Support for ICU's `double` conversion on Loongarch
has been restored (see #463).
## 1.8.1 (2023-11-09)
* [GENERAL] ICU bundle updated to version 74.1 (Unicode 15.1, CLDR 44).
* [BACKWARD INCOMPATIBILITY] [BUILD TIME] Support for Solaris has now been
dropped. The package is no longer shipped with the very outdated ICU55 bundle.
A compiler supporting at least C++11 and ICU >= 61 are both now required.
* [BACKWARD INCOMPATIBILITY] #469: Missing date-time fields in
`stri_datetime_parse` and `stri_datetime_create` now default to today's
midnight local time.
* [BACKWARD INCOMPATIBILITY] Removed the long-deprecated and defunct
`fallback_encoding` parameter of `stri_read_lines` and the ellipsis
parameter of `stri_opts_collator`, `stri_opts_regex`, `stri_opts_fixed`,
and `stri_opts_brkiter`.
* [BUILD TIME] As per the suggestion of Prof. Brian Ripley, `icudt74l`
(ICU data - little endian) is now included in the source tarball (compressed
with xz to save space). This allows for building **`stringi`** on systems with
no internet access.
* [NEW FEATURE] #476: In break iterator-, date-time-, and collator-based
operations (e.g., `stri_sort`), a warning is emitted when the *root* ICU
resource bundle is returned for an *explicitly* requested locale.
This might happen when an 'unknown' `locale` argument is passed to these
functions. Note that no warning is emitted when relying on the default
`locale=NULL` argument. In such a case, it is recommended to check
whether the default locale, as returned by `stri_locale_get`, is amongst
those listed in `stri_locale_list`.
* [NEW FEATURE] The `C` locale identifier now resolves to `en_US_POSIX`.
* [BUGFIX] #469: `stri_datetime_parse` did not reset the `Calendar`
object when parsing multiple dates.
* [BUGFIX] #487: Some functions did not accept ASCII strings longer than
858993457 characters on input.
## 1.7.12 (2023-01-09)
* [BUGFIX] Fixed a few issues reported by `rchk`.
* [NOTE] [BACKWARD INCOMPATIBLE CHANGE IF ICU >= 72]
If building against ICU >= 72, note a backward incompatible change:
`@` is no longer considered a word break; for more details, see
<https://github.com/unicode-org/cldr/pull/2256>.
## 1.7.8 (2022-07-11)
* [DOCUMENTATION] Paper on **`stringi`** has been published in
the *Journal of Statistical Software*;
see <https://doi.org/10.18637/jss.v103.i02>.
* [BUGFIX] #473, #397: Fixed buffer overflow in `stri_dup`; also,
`stri_dup`, `stri_paste`, ... fail more gracefully on attempts to
generate strings of length >= 2^31 each.
* [BUILD TIME] #480: Using `Rf_isNull` instead of `isNull`.
* [DOCUMENTATION] #462: The manual now mentions that the `numeric=TRUE`
collator does not handle negative numbers correctly.
## 1.7.6 (2021-11-29)
* [BUILD TIME] #463: Added Loongarch support in ICU's double conversion
(@liuxiang88).
* [BUGFIX] #467: The UCRT build on Windows was not marking strings as `latin1`.
## 1.7.5 (2021-10-04)
* [DOCUMENTATION] Paper on **`stringi`** has been accepted for
publication in the *Journal of Statistical Software*,
see <https://stringi.gagolewski.com/_static/vignette/stringi.pdf>
for a draft version.
* [DOCUMENTATION] The **`stringi`** website at <https://stringi.gagolewski.com/>
now features a comprehensive tutorial based on the aforementioned paper.
* [DOCUMENTATION] The *ICU* Project site has been moved to
<https://icu.unicode.org/>.
* [BUILD TIME] #457: The obsolete `autoconf` macros `AC_LANG_CPLUSPLUS`
and `AC_TRY_COMPILE` are no longer used.
* [BUGFIX] #458: Passing ALTREP objects no longer yields
'embedded nul in string' errors.
## 1.7.4 (2021-08-12)
* [BUGFIX] #449: Fixed segfaults generated by `stri_sprintf`.
* [BUILD TIME] No longer defining `USE_RINTERNALS` and `R_NO_REMAP`.
## 1.7.3 (2021-07-15)
* [BUGFIX] Fixed the previous patch of ICU55 causing a build failure on,
amongst others, CRAN's Solaris-based target.
## 1.7.2 (2021-07-14)
* [BUGFIX] Workaround for a bug in `tools::checkFF` failing
when `NA_character_` is passed to `.Call`.
## 1.7.1 (2021-07-14)
* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use the new `stri_sprintf`
(see below) function instead of `base::sprintf`.
* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub<-` and `stri_sub_all<-`,
a negative `length` no longer results in the corresponding
input string being altered.
* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub` and `stri_sub_all`,
negative `length` results in the corresponding output being `NA`
or not extracted at all, depending on the setting of the new argument
`ignore_negative_length`.
* [BACKWARD INCOMPATIBILITY, BUGFIX, NEW FEATURE] In `stri_subset*`
and their replacement versions, `pattern` and `value` cannot be longer
than `str` (but now they are recycled if necessary).
* [BACKWARD INCOMPATIBILITY, NEW FEATURE] `stri_sub*` now accept the
`from` argument being a matrix like `cbind(from, length=length)`.
Unnamed columns or any other names are still interpreted as `cbind(from, to)`.
Also, the new argument `use_matrix` can be used to disable
the special treatment of such matrices.
* [DOCUMENTATION] It has been clarified that the syntax of `*_charclass`
(e.g., used in `stri_trim*`) differs slightly from regex character
classes.
* [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
is a Unicode-aware replacement for and enhancement of the base `sprintf`:
it adds customised handling of `NA`s (on demand), computes field sizes
based on code point widths, outputs substrings of at most a given width,
supports variable width and precision (both at the same time), etc. Moreover,
`stri_printf` can be used to display formatted strings conveniently.
* [NEW FEATURE] #153: `stri_match_*_regex` now extract capture group names.
* [NEW FEATURE] #25: `stri_locate_*_regex` now have a new argument,
`capture_groups`, which allows for extracting positions of matches
to parenthesised subexpressions.
* [NEW FEATURE] `stri_locate_*` now have a new argument, `get_length`,
whose setting may result in generating *from-length* matrices
(instead of *from-to* ones).
* [NEW FEATURE] #438: `stri_trans_general` now supports rule-based
as well as reverse-direction transliteration.
* [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse`
are now vectorised also with respect to the `format` argument.
* [NEW FEATURE] `stri_datetime_fstr` has a new argument, `ignore_special`,
which defaults to `TRUE` for backward compatibility.
* [NEW FEATURE] `stri_datetime_format`, `stri_datetime_add`, and
`stri_datetime_fields` now call `as.POSIXct` more eagerly.
* [NEW FEATURE] `stri_trim*` now have a new argument, `negate`.
* [NEW FEATURE] `stri_replace_rstr` converts `gsub`-style replacement strings
to `stri_replace`-style.
* [INTERNAL] `stri_prepare_arg*` have been refactored; buffer overruns
in the exception handling subsystem are now avoided.
* [BUGFIX] A few functions (`stri_length`, `stri_enc_toutf32`, etc.)
did not throw an exception on an invalid UTF-8
byte sequence (and merely issued a warning instead).
* [BUGFIX] `stri_datetime_fstr` did not honour `NA_character_`
and did not parse format strings such as `"%Y%m%d"` correctly.
It has now been completely rewritten (in C).
* [BUGFIX] `stri_wrap` did not recognise the width of certain Unicode sequences
correctly.
## 1.6.2 (2021-05-14)
* [BACKWARD INCOMPATIBILITY] In `stri_enc_list()`,
`simplify` now defaults to `TRUE`.
* [NEW FEATURE] #425: The outputs of `stri_enc_list()`, `stri_locale_list()`,
`stri_timezone_list()`, and `stri_trans_list()` are now sorted.
* [NEW FEATURE] #428: In `stri_flatten`, `na_empty=NA` now omits missing values.
* [BUILD TIME] #431: Pre-4.9.0 GCC has `::max_align_t`
but not `std::max_align_t`; a (possible) workaround has been added, see
the `INSTALL` file.
* [BUGFIX] #429: `stri_width()` misclassified the width of certain
code points (including grave accent, Eszett, etc.);
General category *Sk* (Symbol, modifier) is no longer of width 0,
`UCHAR_EAST_ASIAN_WIDTH` of `U_EA_AMBIGUOUS` is no longer of width 2.
* [BUGFIX] #354: `ALTREP` `CHARSXP`s were not copied, and thus could have been
garbage collected in the so-called meanwhile (with thanks to @jimhester).
## 1.6.1 (2021-05-05)
* [GENERAL] #401: stringi is now bundled with ICU4C 69.1 (upgraded from 61.1),
which is used on most Windows and OS X builds as well as on *nix systems
not equipped with system ICU. However, if the C++11 support is disabled,
stringi will be built against the battle-tested ICU4C 55.1.
The update to ICU brings Unicode 13.0 and CLDR 39 support.
* [DOCUMENTATION] A draft version of a paper on **`stringi`** is now available
at <https://stringi.gagolewski.com/_static/vignette/stringi.pdf>.
* [GENERAL] stringi now requires R >= 3.1 (`CXX_STD` of `CXX11` or `CXX1X`).
* [NEW FEATURE] #408: `stri_trans_casefold()` performs case folding;
this is different from case mapping, which is locale-dependent.
Folding makes two pieces of text that differ only in case identical.
This can come in handy when comparing strings.
* [NEW FEATURE] #421: `stri_rank()` ranks strings in a character vector
(e.g., for ordering data frames with regard to multiple criteria,
the ranks can be passed to `order()`, see #219).
* [NEW FEATURE] #266: `stri_width()` now supports emojis.
* [NEW FEATURE] `%s$%` and `%stri$%` are now vectorised with respect to
both arguments.
* [BUGFIX] `stri_sort_key()` now outputs `bytes`-encoded strings.
* [BUGFIX] #415: `locale=''` was not equivalent to `locale=NULL`
in `stri_opts_collator()`.
* [INTERNAL] #414: Using the `LEVELS(x)` macro instead of accessing `(x)->sxpinfo.gp`
directly (@lukaszdaniel).
## 1.5.3 (2020-09-04)
* [DOCUMENTATION] stringi home page has moved to
<https://stringi.gagolewski.com/> and now includes a comprehensive reference
manual.
* [NEW FEATURE] #400: `%s$%` and `%stri$%` are now binary operators
that call base R's `sprintf()`.
* [NEW FEATURE] #399: The `%s*%` and `%stri*%` operators can be used
in addition to `stri_dup()`, for the very same purpose.
* [NEW FEATURE] #355: `stri_opts_regex()` now accepts the `time_limit` and
`stack_limit` options so as to prevent malformed or malicious regexes
from running for too long.
* [NEW FEATURE] #345: `stri_startswith()` and `stri_endswith()` are now equipped
with the `negate` parameter.
* [NEW FEATURE] #382: Incorrect regexes are now reported to ease debugging.
* [DEPRECATION WARNING] #347: Any unknown option passed to `stri_opts_fixed()`,
`stri_opts_regex()`, `stri_opts_coll()`, and `stri_opts_brkiter()` now
generates a warning. In the future, the `...` parameter will be removed,
and passing such options will then result in an error.
* [DEPRECATION WARNING] `stri_duplicated()`'s `fromLast` argument
has been renamed `from_last`. `fromLast` is now its alias scheduled
for removal in a future version of the package.
* [DEPRECATION WARNING] `stri_enc_detect2()`
is scheduled for removal in a future version of the package.
Use `stri_enc_detect()` or the more targeted `stri_enc_isutf8()`,
`stri_enc_isascii()`, etc., instead.
* [DEPRECATION WARNING] `stri_read_lines()`, `stri_write_lines()`,
`stri_read_raw()`: use `con` argument instead of `fname` now.
The argument `fallback_encoding` is scheduled for removal and is no longer
used. `stri_read_lines()` does not support `encoding="auto"` anymore.
* [DEPRECATION WARNING] `nparagraphs` in `stri_rand_lipsum()` has been renamed
`n_paragraphs`.
* [NEW FEATURE] #398: Alternative, British spelling of function parameters
has been introduced, e.g., `stri_opts_coll()` now supports both
`normalization` and `normalisation`.
* [NEW FEATURE] #393: `stri_read_bin()`, `stri_read_lines()`, and
`stri_write_lines()` are no longer marked as draft API.
* [NEW FEATURE] #187: `stri_read_bin()`, `stri_read_lines()`, and
`stri_write_lines()` now support connection objects as well.
* [NEW FEATURE] #386: New function `stri_sort_key()` for generating
locale-dependent sort keys, which can be ordered at the byte level and
yield an ordering equivalent to that of the original strings (@DavisVaughan).
* [BUGFIX] #138: `stri_encode()` and `stri_rand_strings()`
can now generate much longer strings.
* [BUGFIX] `stri_wrap()` did not honour `indent` correctly when
`use_width` was `TRUE`.
## 1.4.6 (2020-02-17)
* [BACKWARD INCOMPATIBILITY] #369: `stri_c()` now returns an empty string
when input is empty and `collapse` is set.
* [BUGFIX] #370: fixed an issue in `stri_prepare_arg_POSIXct()`
reported by rchk.
* [DOCUMENTATION] #372: Fixed the *documented arguments not in `\usage`*
check warning concerning the `...` argument of `stri_datetime_format`.
## 1.4.5 (2020-01-11)
* [BUGFIX] #366: fix for #363 required ICU >= 55.
## 1.4.4 (2020-01-06)
* [BUGFIX] #348: Avoid copying 0 bytes to a nil-buffer in `stri_sub_all()`.
* [BUGFIX] #362: Removed `configure` variable `CXXCPP` as it is now deprecated.
* [BUGFIX] #318: PROTECTing objects from garbage collection, as reported by `rchk`.
* [BUGFIX] #344, #364: Removed compiler warnings in `icu61/common/cstring.h`.
* [BUGFIX] #363: Status of `RegexMatcher` is now checked after its use.
## 1.4.3 (2019-03-12)
* [NEW FEATURE] #30: New function `stri_sub_all()` - a version of
`stri_sub()` accepting list `from`/`to`/`length` arguments for extracting
multiple substrings from each string in a character vector.
* [NEW FEATURE] #30: New function `stri_sub_all<-()` (and its `%>%`-friendly
version, `stri_sub_replace_all()`) - for replacing multiple substrings
with corresponding replacement strings.
* [NEW FEATURE] In `stri_sub_replace()`, `value` parameter
has a new alias, `replacement`.
* [NEW FEATURE] New convenience functions based on `stri_remove_empty()`:
`stri_omit_empty_na()`, `stri_remove_empty_na()`, `stri_omit_empty()`,
and also `stri_remove_na()`, `stri_omit_na()`.
* [BUGFIX] #343: `stri_trans_char()` did not yield correct results
for overlapping pattern and replacement strings.
* [WARNFIX] #205: `configure.ac` is now included in the source bundle.
## 1.3.1 (2019-02-10)
* [BACKWARD INCOMPATIBILITY] #335: A fix to #314 prevented (by design) the use
of the system ICU if the library had been compiled with `U_CHARSET_IS_UTF8=1`.
However, this is the default setting in `libicu` >= 61. From now on, in such
cases the system ICU is used more eagerly, but `stri_enc_set()` issues
a warning stating that the default (UTF-8) encoding cannot be changed.
* [NEW FEATURE] #232: All `stri_detect_*` functions now have the `max_count`
argument that allows for, e.g., stopping at the first pattern occurrence.
* [NEW FEATURE] #338: `stri_sub_replace()` is now an alias for `stri_sub<-()`
which makes it much more easily pipable (@yutannihilation, @BastienFR).
* [NEW FEATURE] #334: Added missing `icudt61b.dat` to support big-endian
platforms (thanks to Dimitri John Ledkov @xnox).
* [BUGFIX] #296: Out-of-the-box build used to fail on CentOS 6; `configure`
now resorts to `--disable-cxx11` more eagerly, at an early stage.
* [BUGFIX] #341: Fixed possible buffer overflows when calling `strncpy()`
from within ICU 61.
* [BUGFIX] #325: Made `configure` more portable so that it works
under `/bin/dash` now.
* [BUGFIX] #319: Fixed overflow in `stri_rand_shuffle()`.
* [BUGFIX] #337: Search functions (e.g.,
`stri_split_regex()` and `stri_count_fixed()`) used to raise
too many warnings on empty search patterns.
## 1.2.4 (2018-07-20)
* [BUGFIX] #314: Testing `U_CHARSET_IS_UTF8` in `configure` when
using `pkg-build`.
* [BUILD TIME] #317: Included `icudt61l.zip` in the source bundle to solve
the frequent `icudt download failed` error (also on CRAN's `windows-release`
and `windows-oldrel`). (Reverted in version 1.3.1: the `winbuilder`
errors were caused by a build chain bug.)
## 1.2.3 (2018-05-16)
* [BUGFIX] #296: Fixed the behaviour of the `configure` script on CentOS 6.
* [BUGFIX] Fixed broken Windows build by updating the `icudt` mirror list.
## 1.2.2 (2018-05-01)
* [GENERAL] #193: stringi is now bundled with ICU4C 61.1,
which is used on most Windows and OS X builds as well as on *nix systems
not equipped with ICU. However, if the C++11 support is disabled,
stringi will be built against ICU4C 55.1. The update to ICU brings
Unicode 10.0 support, including new emoji characters.
* [BUGFIX] #288: `stri_match()` did not return the correct number of columns
when input was empty.
* [NEW FEATURE] #188: `stri_enc_detect()` now returns a list of data frames.
* [NEW FEATURE] #289: `stri_flatten()` now has `na_empty` and `omit_empty`
arguments.
* [NEW FEATURE] New functions: `stri_remove_empty()`, `stri_na2empty()`.
* [NEW FEATURE] #285: Coercion from a non-trivial list (one that does not consist
solely of atomic vectors, each of length 1) to an atomic vector now issues a warning.
* [WARN] Removed `-Wparentheses` warnings in `icu55/common/cstring.h:38:63`
and `icu55/i18n/windtfmt.cpp` in the ICU4C 55.1 bundle.
## 1.1.7 (2018-03-06)
* [BUGFIX] Fixed ICU4C 55.1 generating some *significant warnings*
(`icu55/i18n/winnmfmt.cpp`) and *suppressing important diagnostics*
(`src/icu55/i18n/decNumber.c`).
## 1.1.6 (2017-11-10)
* [WINDOWS SPECIFIC] #270: Strings marked with `latin1` encoding
are now converted internally to UTF-8 using the WINDOWS-1252 codec.
This fixes, among others, problems with displaying the Euro sign.
* [NEW FEATURE] #263: Added support for custom rule-based break iteration,
see `?stri_opts_brkiter`.
* [NEW FEATURE] #267: `omit_na=TRUE` in `stri_sub<-()` now ignores missing
values in any of the arguments provided.
* [BUGFIX] Fixed unPROTECTed variable names and stack imbalances
as reported by `rchk`.
## 1.1.5 (2017-04-07)
* [GENERAL] stringi now requires ICU4C >= 52.
* [BUGFIX] Fixed errors pointed out by `clang-UBSAN` in `stri_brkiter.h`.
* [GENERAL] stringi now requires R >= 2.14.
* [BUILD TIME] #238, #220: Now trying *standard* ICU4C build flags if a call
to `pkg-config` fails.
* [BUILD TIME] #258: Use `CXX11` instead of `CXX1X` on R >= 3.4.
* [BUILD TIME, BUGFIX] #254: `dir.exists()` is only available in R >= 3.2.
## 1.1.3 (2017-03-21)
* [REMOVE DEPRECATED] `stri_install_check()` and `stri_install_icudt()`,
marked as deprecated in stringi 0.5-5, are no longer exported.
* [BUGFIX] #227: Incorrect behaviour of `stri_sub()` and `stri_sub<-()`
if the empty string was the result.
* [BUILD TIME] #231: The `configure` (Linux/Unix only) script now reads the
following environment variables: `STRINGI_CFLAGS`, `STRINGI_CPPFLAGS`,
`STRINGI_CXXFLAGS`, `STRINGI_LDFLAGS`, `STRINGI_LIBS`,
`STRINGI_DISABLE_CXX11`, `STRINGI_DISABLE_ICU_BUNDLE`,
`STRINGI_DISABLE_PKG_CONFIG`, `PKG_CONFIG`,
see `INSTALL` for more information.
* [BUILD TIME] #253: Call to `R_useDynamicSymbols()` added.
* [BUILD TIME] #230: `icudt` is now being downloaded by
`configure` (*NIX only) *before* building.
* [BUILD TIME] #242: `_COUNT/_LIMIT` enum constants have been deprecated
as of ICU 58.2; stringi code has been upgraded accordingly.
## 1.1.2 (2016-09-30)
* [BUGFIX] `round()` and `snprintf()` are not part of C++98.
## 1.1.1 (2016-05-25)
* [BUGFIX] #214: Allow a regex pattern like `.*` to match an empty string.
* [BUGFIX] #210: `stri_replace_all_fixed(c("1", "NULL"), "NULL", NA)`
now results in `c("1", NA)`.
* [NEW FEATURE] #199: `stri_sub<-()` now allows for ignoring `NA` locations
(a new `omit_na` argument added).
* [NEW FEATURE] #207: `stri_sub<-()` now allows for substring insertions
(via `length=0`).
* [NEW FUNCTION] #124: `stri_subset<-()` functions added.
* [NEW FEATURE] #216: `stri_detect()`, `stri_subset()`, `stri_subset<-()`
now all have the `negate` argument.
* [NEW FUNCTION] #175: `stri_join_list()` concatenates all strings
in a list of character vectors. Useful in conjunction with, e.g.,
`stri_extract_all_regex()`, `stri_extract_all_words()`, etc.
## 1.0-1 (2015-10-22)
* [GENERAL] #88: C API is now available for use in, e.g., Rcpp packages, see
<https://github.com/gagolews/ExampleRcppStringi> for an example.
* [BUGFIX] #183: Floating point exception raised in `stri_sub()` and
`stri_sub<-()` when `to` or `length` was a zero-length numeric vector.
* [BUGFIX] #180: `stri_c()` warned incorrectly (recycling rule) when using more
than two elements.
## 0.5-5 (2015-06-28)
* [BACKWARD INCOMPATIBILITY] `stri_install_check()` and `stri_install_icudt()`
are now deprecated. From now on they are supposed to be used only
by the stringi installer.
* [BUGFIX] #176: A patch for `sys/feature_tests.h` no longer included
(the original file was copyrighted by Sun Microsystems); fixed the *Compiler
or options invalid for pre-Unix 03 X/Open applications and pre-2001 POSIX
applications* error by forcing (conditionally) `_XPG6` conformance.
* [BUGFIX] #174: `stri_paste()` did not generate any warning when
the recycling rule was violated and `sep==""`.
* [BUGFIX] #170: `icu::setDataDirectory` is no longer called if our ICU
source bundle is not used (this used to cause build problems on openSUSE).
* [BUILD TIME] #169: `configure` now tries to switch to the *standard*
C++ compiler if a C++11 one is not configured correctly.
* [BUILD TIME] `configure.win` (`Biarch: TRUE`) now mimics `autoconf`'s
`AC_SUBST` and `AC_CONFIG_FILES` so that the build process is now
more similar across different platforms.
* [NEW FEATURE] `stri_info()` now also gives information about which version
of ICU4C is in use (system or bundle).
## 0.5-2 (2015-06-21)
* [BACKWARD INCOMPATIBILITY] The second argument to `stri_pad_*()` has
been renamed `width`.
* [GENERAL] #69: stringi is now bundled with ICU4C 55.1.
* [NEW FUNCTIONS] `stri_extract_*_boundaries()` extract text between text
boundaries.
* [NEW FUNCTION] #46: `stri_trans_char()` is a stringi-flavoured
`chartr()` equivalent.
* [NEW FUNCTION] #8: `stri_width()` approximates the *width* of a string
in a more Unicode-ish fashion than `nchar(..., "width")`.
* [NEW FEATURE] #149: `stri_pad()` and `stri_wrap()` are now (by default)
based on code point widths instead of the number of code points.
Moreover, the default behaviour of `stri_wrap()` is now such that it
does not get rid of non-breaking, zero width, etc., spaces.
* [NEW FEATURE] #133: `stri_wrap()` silently allows for `width <= 0`
(for compatibility with `strwrap()`).
* [NEW FEATURE] #139: `stri_wrap()` gained a new argument: `whitespace_only`.
* [NEW FUNCTIONS] #137: Date-time formatting/parsing:
* `stri_timezone_list()` - lists all known time zone identifiers;
* `stri_timezone_set()`, `stri_timezone_get()` - manage the current
default time zone;
* `stri_timezone_info()` - basic information on a given time zone;
* `stri_datetime_symbols()` - gives localizable date-time formatting data;
* `stri_datetime_fstr()` - converts a `strptime`-like format string
to an ICU date/time format string;
* `stri_datetime_format()` - converts date/time to string;
* `stri_datetime_parse()` - converts string to date/time object;
* `stri_datetime_create()` - constructs date-time objects
from numeric representations;
* `stri_datetime_now()` - returns current date-time;
* `stri_datetime_fields()` - returns date-time fields' values;
* `stri_datetime_add()` - adds specific number of date-time units
to a date-time object.
* [GENERAL] #144: Performance improvements in handling ASCII strings
(these affect `stri_sub()`, `stri_locate()` and other string index-based
operations).
* [GENERAL] #143: Searching for short fixed patterns (`stri_*_fixed()`) now
relies on the current `libC`'s implementation of `strchr()` and `strstr()`.
This is very fast, e.g., on `glibc` using the `SSE2/3/4` instruction set.
* [BUILD TIME] #141: A local copy of `icudt*.zip` may be used on package
install; see the `INSTALL` file for more information.
* [BUILD TIME] #165: The `configure` option `--disable-icu-bundle`
forces the use of system ICU when building the package.
* [BUGFIX] Locale specifiers are now normalized in a more intelligent way:
e.g., `@calendar=gregorian` expands to `DEFAULT_LOCALE@calendar=gregorian`.
* [BUGFIX] #134: `stri_extract_all_words()` did not accept `simplify=NA`.
* [BUGFIX] #132: Incorrect behaviour in `stri_locate_regex()` for matches
of zero lengths.
* [BUGFIX] stringr/#73: `stri_wrap()` returned `CHARSXP` instead of `STRSXP`
on empty string input with `simplify=FALSE` argument.
* [BUGFIX] #164: Using `libicu-dev` failed on Ubuntu
(`LIBS` shall be passed after `LDFLAGS` and the list of `.o` files).
* [BUGFIX] #168: Build now fails if `icudt` is not available.
* [BUGFIX] #135: C++11 is now used by default (see the `INSTALL` file,
however) to build stringi from sources. This is because ICU4C uses the
`long long` type which is not part of the C++98 standard.
* [BUGFIX] #154: Dates and other objects with a custom class attribute
were not coerced to the character type correctly.
* [BUGFIX] An ICU `u_init()` call is now forced when the stringi dynamic library is loaded.
* [BUGFIX] #157: Many overfull `hbox`es in the package PDF manual have been
corrected.
## 0.4-1 (2014-12-11)
* [IMPORTANT CHANGE] `n_max` argument in `stri_split_*()` has been renamed `n`.
* [IMPORTANT CHANGE] `simplify=FALSE` in `stri_extract_all_*()` and
`stri_split_*()` now calls `stri_list2matrix()` with `fill=""`.
`fill=NA_character_` may be obtained by using `simplify=NA`.
* [IMPORTANT CHANGE, NEW FUNCTIONS] #120: `stri_extract_words()` has been
renamed `stri_extract_all_words()`, `stri_locate_boundaries()` has been renamed
`stri_locate_all_boundaries()`, and `stri_locate_words()` has been renamed
`stri_locate_all_words()`. New functions are now available:
`stri_locate_first_boundaries()`, `stri_locate_last_boundaries()`,
`stri_locate_first_words()`, `stri_locate_last_words()`,
`stri_extract_first_words()`, `stri_extract_last_words()`.
* [IMPORTANT CHANGE] #111: `opts_regex`, `opts_collator`, `opts_fixed`, and
`opts_brkiter` can now be supplied individually via `...`.
In other words, you may now simply call, e.g.,
`stri_detect_regex(str, pattern, case_insensitive=TRUE)` instead of
`stri_detect_regex(str, pattern,
opts_regex=stri_opts_regex(case_insensitive=TRUE))`.
* [NEW FEATURE] #110: Fixed pattern search engine's settings can
now be supplied via `opts_fixed` argument in `stri_*_fixed()`,
see `stri_opts_fixed()`. A simple (not suitable for natural language
processing) yet very fast `case_insensitive` pattern matching can be
performed now. `stri_extract_*_fixed()` is again available.
* [NEW FEATURE] #23: `stri_extract_all_fixed()`, `stri_count()`, and
`stri_locate_all_fixed()` may now also look for overlapping pattern
matches, see `?stri_opts_fixed`.
* [NEW FEATURE] #129: `stri_match_*_regex()` gained a `cg_missing` argument.
* [NEW FEATURE] #117: `stri_extract_all_*()`, `stri_locate_all_*()`,
`stri_match_all_*()` gained a new argument: `omit_no_match`.
Setting it to `TRUE` makes these functions compatible with their
**`stringr`** equivalents.
* [NEW FEATURE] #118: `stri_wrap()` gained `indent`, `exdent`, `initial`,
and `prefix` arguments. Moreover, Knuth's dynamic word wrapping algorithm
now assumes that the cost of printing the last line is zero, see #128.
* [NEW FEATURE] #122: `stri_subset()` gained an `omit_na` argument.
* [NEW FEATURE] `stri_list2matrix()` gained an `n_min` argument.
* [NEW FEATURE] #126: `stri_split()` is now also able to act
just like `stringr::str_split_fixed()`.
* [NEW FEATURE] #119: `stri_split_boundaries()` now has
`n`, `tokens_only`, and `simplify` arguments. Additionally,
`stri_extract_all_words()` is now equipped with the `simplify` argument.
* [NEW FEATURE] #116: `stri_paste()` gained a new argument:
`ignore_null`. Setting it to `TRUE` makes this function more compatible
with `paste()`.
* [OTHER] #123: `useDynLib` is used to speed up symbol look-up in
the compiled dynamic library.
* [BUGFIX] #114: `stri_paste()` could return results in an incorrect order.
* [BUGFIX] #94: Run-time errors on Solaris caused by setting
`-DU_DISABLE_RENAMING=1`: memory allocation errors in, among others,
the ICU `UnicodeString`. This setting also caused some `ASAN` sanity check
failures within ICU code.
## 0.3-1 (2014-11-06)
* [IMPORTANT CHANGE] #87: `%>%` overlapped with the pipe operator from
the `magrittr` package; now every such operator has been renamed,
e.g., `%>%` is now `%s>%`.
* [IMPORTANT CHANGE] #108: The `BreakIterator` (for text boundary analysis)
may now be more easily controlled via `stri_opts_brkiter()` (see the options
`type` and `locale`, which replace the now-removed `boundary` and `locale`
parameters of `stri_locate_boundaries()`, `stri_split_boundaries()`,
`stri_trans_totitle()`, `stri_extract_words()`, and `stri_locate_words()`).
* [NEW FUNCTIONS] #109: `stri_count_boundaries()` and `stri_count_words()`
count the number of text boundaries in a string.
* [NEW FUNCTIONS] #41: `stri_startswith_*()` and `stri_endswith_*()`
determine whether a string starts or ends with a given pattern.
* [NEW FEATURE] #102: `stri_replace_all_*()` now all have the `vectorize_all`
parameter, which defaults to `TRUE` for backward compatibility.
* [NEW FUNCTION] #91: Added `stri_subset_*()`, a convenient and more efficient
substitute for `str[stri_detect_*(str, ...)]`.
* [NEW FEATURE] #100: `stri_split_fixed()`, `stri_split_charclass()`,
`stri_split_regex()`, `stri_split_coll()` gained a `tokens_only` parameter,
which defaults to `FALSE` for backward compatibility.
* [NEW FUNCTION] #105: `stri_list2matrix()` converts lists of atomic vectors
to character matrices, useful in conjunction with `stri_split()`
and `stri_extract()`.
* [NEW FEATURE] #107: `stri_split_*()` now allow
setting an `omit_empty=NA` argument.
* [NEW FEATURE] #106: `stri_split()` and `stri_extract_all()`
gained a `simplify` argument
(if `TRUE`, then `stri_list2matrix(..., byrow=TRUE)`
is called on the resulting list).
* [NEW FUNCTION] #77: `stri_rand_lipsum()` generates
a (pseudo)random dummy *lorem ipsum* text.
* [NEW FEATURE] #98: `stri_trans_totitle()` gained an `opts_brkiter`
parameter; it indicates which ICU `BreakIterator` should be used when
performing case mapping.
* [NEW FEATURE] `stri_wrap()` gained a new parameter: `normalize`.
* [BUGFIX] #86: `stri_*_fixed()`, `stri_*_coll()`, and `stri_*_regex()` could
give incorrect results if one of the search strings was of length 0.
* [BUGFIX] #99: `stri_replace_all()` did not use the `replacement` arg.
* [BUGFIX] #112: Some of the objects were not PROTECTed from
garbage collection; this could have led to spontaneous segfaults.
* [BUGFIX] Some collator options were not passed correctly to ICU services.
* [BUGFIX] Memory leaks as detected by
`valgrind --tool=memcheck --leak-check=full` have been removed.
* [DOCUMENTATION] Significant extensions/clean ups in the stringi manual.
## 0.2-5 (2014-05-16)
* Some examples are no longer run if `icudt` is not available
(this was reverted in a later version).
## 0.2-4 (2014-05-15)
* [BUGFIX] Fixed issues with loading of misaligned addresses
in `stri_*_fixed()`.
## 0.2-3 (2014-05-14)
* [IMPORTANT CHANGE] `stri_cmp*()` no longer allow passing
`opts_collator=NA`. From now on, `stri_cmp_eq()`, `stri_cmp_neq()`,
and the new operators `%===%`, `%!==%`, `%stri===%`, and `%stri!==%`
are locale-independent operations based on code point comparisons.
New functions `stri_cmp_equiv()` and `stri_cmp_nequiv()`
(and from now on also `%==%`, `%!=%`, `%stri==%`, and `%stri!=%`)
test for canonical equivalence.
* [IMPORTANT CHANGE] `stri_*_fixed()` search functions now perform
a locale-independent exact (byte-wise, of course after conversion to UTF-8)
pattern search. All the `Collator`-based, locale-dependent search routines
are now available via `stri_*_coll()`. The reason behind this is that
ICU's `USearch` currently has very poor performance. Moreover,
in many search tasks exact pattern matching is sufficient anyway.
* [GENERAL] `stri_*_fixed` now use a tweaked Knuth-Morris-Pratt search
algorithm which improves the search performance drastically.
* [IMPORTANT CHANGE] `stri_enc_nf*()` and `stri_enc_isnf*()` function families
have been renamed `stri_trans_nf*()` and `stri_trans_isnf*()`,
respectively, because they deal with text transformation
and not with character encoding. Note that all of these operations may
also be performed by ICU's `Transliterator` (see below).
* [NEW FUNCTION] `stri_trans_general()` and `stri_trans_list()` give access
to ICU's `Transliterator`: they may be used to perform some generic
text transforms, like Unicode normalisation, case folding, etc.
* [NEW FUNCTION] `stri_split_boundaries()` uses ICU's `BreakIterator`
to split strings at specific text boundaries. Moreover,
`stri_locate_boundaries()` indicates positions of these boundaries.
* [NEW FUNCTION] `stri_extract_words()` uses ICU's `BreakIterator` to
extract all words from a text. Additionally, `stri_locate_words()`
locates start and end positions of words in a text.
* [NEW FUNCTION] `stri_pad()`, `stri_pad_left()`, `stri_pad_right()`,
and `stri_pad_both()` pad a string with a specific code point.
* [NEW FUNCTION] `stri_wrap()` breaks paragraphs of text into lines.
Two algorithms (greedy and minimal raggedness) are available.
* [IMPORTANT CHANGE] `stri_*_charclass()` search functions now
rely solely on ICU's `UnicodeSet` patterns. All the previously accepted
charclass identifiers became invalid. However, new patterns
should now be more familiar to users (they are regex-like).
Moreover, we observe a significant performance gain.
* [IMPORTANT CHANGE] `stri_sort()` no longer includes `NA`s
in output vectors by default, for compatibility with `sort()`.
Moreover, currently none of the input vector's attributes are preserved.
* [NEW FUNCTION] `stri_unique()` extracts unique elements from
a character vector.
* [NEW FUNCTIONS] `stri_duplicated()` and `stri_duplicated_any()`
determine duplicate elements in a character vector.
* [NEW FUNCTION] `stri_replace_na()` replaces `NA`s in a character vector
with a given string, useful for emulating, e.g., R's `paste()` behaviour.
* [NEW FUNCTION] `stri_rand_shuffle()` generates a random permutation
of code points in a string.
* [NEW FUNCTION] `stri_rand_strings()` generates random strings.
* [NEW FUNCTIONS] New functions and binary operators for string comparison:
`stri_cmp_eq()`, `stri_cmp_neq()`, `stri_cmp_lt()`, `stri_cmp_le()`,
`stri_cmp_gt()`, `stri_cmp_ge()`, `%==%`, `%!=%`, `%<%`, `%<=%`,
`%>%`, `%>=%`.
* [NEW FUNCTION] `stri_enc_mark()` reads declared encodings of character
strings as seen by stringi.
* [NEW FUNCTION] `stri_enc_tonative(str)` is an alias to
`stri_encode(str, NULL, NULL)`.
* [NEW FEATURE] `stri_order()` and `stri_sort()` now have an additional
argument `na_last` (defaults to `TRUE` and `NA`, respectively).
* [NEW FEATURE] `stri_replace_all_charclass()`, `stri_extract_all_charclass()`,
and `stri_locate_all_charclass()` now have a new argument, `merge`
(defaults to `FALSE` for backward compatibility). It may be used
to, e.g., replace sequences of white spaces with a single space.
* [NEW FEATURE] `stri_enc_toutf8()` now has a new `validate` argument
(which defaults to `FALSE` for backward compatibility). It may be used
in a (rare) case where a user wants to fix an invalid UTF-8 byte sequence.
`stri_length()` (among others) now detects invalid UTF-8 byte sequences.
* [NEW FEATURE] All binary operators `%???%` now also have aliases `%stri???%`.
* [GENERAL] Performance improvements in `StriContainerUTF8`
and `StriContainerUTF16` (they affect most other functions).
* [GENERAL] Significant performance improvements in `stri_join()`,
`stri_flatten()`, `stri_cmp()`, `stri_trans_to*()`, and others.
* [GENERAL] Added a third mirror site for our `icudt` binary distribution.
* `U_MISSING_RESOURCE_ERROR` message in `StriException` now suggests
calling `stri_install_check()`.
* [BUGFIX] UTF-8 BOMs are now silently removed from input strings.
* [BUGFIX] `StriContainerUTF8` no longer attempts to re-encode UTF-8-encoded
strings if the native encoding is UTF-8.
* [BUGFIX] Possible memory leaks when throwing errors via `Rf_error()`.
* [BUGFIX] `stri_order()` and `stri_cmp()` could return incorrect results
for `opts_collator=NA`.
* [BUGFIX] `stri_sort()` did not guarantee to return strings in UTF-8.
## 0.1-25 (2014-03-12)
* LICENSE tweaks.
* First CRAN release.
## 0.1-24 (2014-03-11)
* Fixed bugs detected with `ASAN` and `UBSAN`,
e.g., fixed `CharClass::gcmask` type (`enum` -> `uint32_t`)
(reported by `UBSAN`).
* Fixed array over-runs detected with `valgrind` in `string8.h`.
* Fixed uninitialised class fields in `StriContainerUTF8`
(reported by `valgrind`).
## 0.1-23 (2014-03-11)
* License changed to BSD-3-clause, COPYRIGHTS updated.
* `icudt` is not shipped with stringi anymore;
it is now downloaded in `install.libs.R` from one of our servers.
* New functions: `stri_install_check()`, `stri_install_icudt()`.
## 0.1-22 (2014-02-20)
* The system ICU is used on systems that have one (version >= 50 is required).
ICU is auto-detected with `pkg-config` in `configure`.
Pass `'--disable-pkg-config'` to `configure` to force building
ICU from sources.
* `icudt52b` (custom subset) is now shipped with stringi
(for big-endian, ASCII systems).
## 0.1-21 (2014-02-19)
* Fixed some issues on Solaris while preparing stringi
for CRAN submission.
## 0.1-20 (2014-02-17)
* ICU4C 52.1 sources included (common, i18n, stubdata + `icu52dt.dat`
loaded dynamically). Compilation via Makevars.
* stringi does not depend on any external libraries anymore.
## 0.1-11 (2013-11-16)
* ICU4C is now statically linked on Windows.
* First OS X binary build.
* The package is being intensively tested by our students at Warsaw
University of Technology.
## 0.1-10 (2013-11-13)
* Using `pkg-config` via `configure` to look for ICU4C libs.
## 0.1-6 (2013-07-05)
* First Windows binary build.
* Compilation passed with the Oracle Sun Studio compiler collection.
* By now we have implemented most of the functionality
scheduled for milestone 0.1.
## 0.1-1 (2013-01-05)
* The stringi project has been started.