File: README.md

package info (click to toggle)
rust-secular 1.0.1-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 1,024 kB
  • sloc: makefile: 4
file content (81 lines) | stat: -rw-r--r-- 2,276 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81

# Secular

[![MIT][s2]][l2] [![Latest Version][s1]][l1] [![Chat on Miaou][s3]][l3]

[s1]: https://img.shields.io/crates/v/secular.svg
[l1]: https://crates.io/crates/secular

[s2]: https://img.shields.io/badge/license-MIT-blue.svg
[l2]: LICENSE

[s3]: https://miaou.dystroy.org/static/shields/room.svg
[l3]: https://miaou.dystroy.org/3?broot

Provide a lowercased diacritics-free version of a character or a string.

For example return `e` for `é`.

Secular's char lookup is an inlined lookup of a static table, which means it's possible to use it in performance sensitive code.

Secular also performs (optionally) Unicode normalization.

A common use case for the removal of diacritics and some unicode arterfacts is to ease searches:

![broot search](doc/broot-search.png)

(diacritics ignoring normalized search in [broot](https://dystroy.org/broot): the user typed `rève`)

## Declaration

By default, diacritics removal is only done on ascii chars, so to include a smaller table.

If you want to handle the whole BMP, use the "bmp" feature" (the downside is that the binary is bigger as it includes a big map).

Default import:

	[dependencies]
	secular = "0.3"

For more characters (the BMP):

	[dependencies]
	secular = { version="0.3", features=["bmp"] }

With Unicode normalization functions (using the unicode-normalization crate):

	[dependencies]
	secular = { version="0.3", features=["normalization"] }

or

	[dependencies]
	secular = { version="0.3", features=["bmp","normalization"] }

This feature is optional so that you can avoid importing the unicode-normalization crate (note that it's used in many other crates so it's possible your text processing application already uses it).

## Usage

On characters:

```rust
use secular::*;
let s = "Comunicações"; // normalized string (length=12)
let chars: Vec<char> = s.chars().collect();
assert_eq!(chars.len(), 12);
assert_eq!(chars[0], 'C');
assert_eq!(lower_lay_char(chars[0]), 'c');
assert_eq!(chars[8], 'ç');
assert_eq!(lower_lay_char(chars[8]), 'c');
```

On strings:

```rust
use secular::*;
let s = "Comunicações"; // unnormalized string (length=14)
assert_eq!(s.chars().count(), 14);
let s = normalized_lower_lay_string(s);
assert_eq!(s.chars().count(), 12);
assert_eq!(s, "comunicacoes");
```