File: README.md

Iterators which split strings on Grapheme Cluster, Word, or Sentence boundaries,
according to the [Unicode Standard Annex #29](http://www.unicode.org/reports/tr29/) rules.

[![Build Status](https://github.com/unicode-rs/unicode-segmentation/actions/workflows/rust.yml/badge.svg)](https://github.com/unicode-rs/unicode-segmentation/actions/workflows/rust.yml)

[Documentation](https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/index.html)

```rust
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "a̐éö̲\r\n";
    let g = s.graphemes(true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    let s = "The quick (\"brown\")  fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", "  ", "fox"];
    assert_eq!(w, b);
}
```
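
For contrast, here is a minimal sketch using only the standard library. `str::chars` iterates over Unicode scalar values, not grapheme clusters, so the first sample string above (four clusters according to `graphemes(true)`) yields more `char`s than clusters. The helper name `scalar_count` is hypothetical, and the escape sequences are an assumption about how the sample string is encoded (each combining mark written as a separate code point); a precomposed `é` would give a different count.

```rust
/// Counts Unicode scalar values, which is what `str::chars` iterates over.
/// (Hypothetical helper, not part of unicode-segmentation.)
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Assumed encoding of the sample string: every base letter is followed
    // by its combining marks as separate code points.
    let s = "a\u{310}e\u{301}o\u{308}\u{332}\r\n";

    // Byte length and scalar-value count both differ from the number of
    // user-perceived characters.
    println!("bytes: {}", s.len());
    println!("scalar values: {}", scalar_count(s));
}
```

Under that assumed encoding the string has 13 bytes and 9 scalar values, which is why indexing or truncating by `char` can cut a user-perceived character in half.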

# no_std

unicode-segmentation does not depend on libstd, so it can be used in crates
with the `#![no_std]` attribute.

# crates.io

You can use this package in your project by adding the following
to your `Cargo.toml`:

```toml
[dependencies]
unicode-segmentation = "1.11.0"
```

# Change Log

## 1.11.0
* [#124](https://github.com/unicode-rs/unicode-segmentation/pull/124) Update data to Unicode 15.1
* [#128](https://github.com/unicode-rs/unicode-segmentation/pull/128) Add `size_hint` to iterators

## 1.10.1
* [#113](https://github.com/unicode-rs/unicode-segmentation/pull/113) Use criterion.rs for word benchmarks
* [#112](https://github.com/unicode-rs/unicode-segmentation/pull/112) Improve table search speed through lookups

## 1.10.0
* [#107](https://github.com/unicode-rs/unicode-segmentation/pull/107) Upgrade to Unicode 15.0.0
* [#104](https://github.com/unicode-rs/unicode-segmentation/pull/104) Supersedes and fixes [#75](https://github.com/unicode-rs/unicode-segmentation/pull/75)

## 1.9.0
* [#101](https://github.com/unicode-rs/unicode-segmentation/pull/101) Upgrade to Unicode 14.0.0

## 1.8.0
* [#100](https://github.com/unicode-rs/unicode-segmentation/pull/100) Increase `#[inline]` opportunities, resulting in 15-40% performance improvement
* [#95](https://github.com/unicode-rs/unicode-segmentation/pull/98) Implement `Debug` for `Graphemes`
* [#94](https://github.com/unicode-rs/unicode-segmentation/pull/94) Add initial fuzzer for oss-fuzz integration
* [#93](https://github.com/unicode-rs/unicode-segmentation/pull/93) Fix unused imports and deprecated pattern warnings
* [#91](https://github.com/unicode-rs/unicode-segmentation/pull/92) Make local variable immutable by moving it into loop
* [#91](https://github.com/unicode-rs/unicode-segmentation/pull/91) Add new iterator [UnicodeWordIndices](https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/struct.UnicodeWordIndices.html) and [unicode_word_indices](https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/trait.UnicodeSegmentation.html#tymethod.unicode_word_indices)

## 1.7.1

* Update docs on version number

## 1.7.0

* [#87](https://github.com/unicode-rs/unicode-segmentation/pull/87) Upgrade to Unicode 13
* [#79](https://github.com/unicode-rs/unicode-segmentation/pull/79) Implement a special-case lookup for ascii grapheme categories
* [#77](https://github.com/unicode-rs/unicode-segmentation/pull/77) Optimization for grapheme iteration

## 1.6.0

* [#72](https://github.com/unicode-rs/unicode-segmentation/pull/72) Upgrade to Unicode 12

## 1.5.0

* [#68](https://github.com/unicode-rs/unicode-segmentation/pull/68) Upgrade to Unicode 11

## 1.4.0

* [#56](https://github.com/unicode-rs/unicode-segmentation/pull/56) Upgrade to Unicode 10

## 1.3.0

* [#24](https://github.com/unicode-rs/unicode-segmentation/pull/24) Add support for sentence boundaries
* [#44](https://github.com/unicode-rs/unicode-segmentation/pull/44) Treat `gc=No` as a subset of `gc=N`

## 1.2.1

* [#37](https://github.com/unicode-rs/unicode-segmentation/pull/37):
  Fix panic in `provide_context`.
* [#40](https://github.com/unicode-rs/unicode-segmentation/pull/40):
  Fix crash in `prev_boundary`.

## 1.2.0

* New `GraphemeCursor` API allows random access and bidirectional iteration.
* Fixed incorrect splitting of certain emoji modifier sequences.

## 1.1.0

* Add `as_str` methods to the iterator types.

## 1.0.3

* Code cleanup and additional tests.

## 1.0.1

* Fix a bug affecting some grapheme clusters containing Prepend characters.

## 1.0.0

* Upgrade to Unicode 9.0.0.