File: README.md

package info (click to toggle)
erlang-idna 6.1.1-3~bpo10+1
  • links: PTS, VCS
  • area: main
  • in suites: buster-backports
  • size: 5,388 kB
  • sloc: erlang: 47,646; makefile: 7; sh: 3
file content (71 lines) | stat: -rw-r--r-- 3,137 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
## erlang-idna

A pure Erlang IDNA implementation that folllow the [RFC5891](https://tools.ietf.org/html/rfc5891).

* support IDNA 2008 and IDNA 2003.
* label validation:
    - [x] **check NFC**: Label must be in Normalization Form C
    - [x] **check hyphen**: The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
    the third and fourth character positions and MUST NOT start or end
    with a "-" (hyphen).
    - [x]  **Leading Combining Marks**: The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [Unicode](https://tools.ietf.org/html/rfc5891#ref-Unicode) for an  exact definition).
    - [x] **Contextual Rules**: The Unicode string MUST NOT contain any characters whose validity is
    context-dependent, unless the validity is positively confirmed by a contextual rule.  To check this, each code point identified as  CONTEXTJ or CONTEXTO in the Tables document [RFC5892](https://tools.ietf.org/html/rfc5892#section-2.7) MUST have a  non-null rule.  If such a code point is missing a rule, the label is  invalid.  If the rule exists but the result of applying the rule is  negative or inconclusive, the proposed label is invalid.
    - [x] **check BIDI**: label contains any characters from scripts that are
    written from right to left, it MUST meet the Bidi criteria  [rfc5893](https://tools.ietf.org/html/rfc5893)




## Usage



`idna:encode/{1,2}` and `idna:decode/{1, 2}` functions are used to encode or decode an Internationalized Domain
Names using IDNA protocol.

Input can be mapped to unicode using [uts46](https://unicode.org/reports/tr46/#Introduction)
by setting  the `uts46` flag to true (default is false). If transition from IDNA 2003 to
IDNA 2008 is needed, the flag `transitional` can be set to `true`, (`default` is false). If
conformance to STD3 is needed, the flag `std3_rules` can be set to true. (default is `false`).

example:

```erlang
1> idna:encode("日本語。JP", [uts46]).
"xn--wgv71a119e.xn--jp-"
2> idna:encode("日本語.JP", [uts46]).
"xn--wgv71a119e.xn--jp-"
...
```


Legacy support of IDNA 2003 is also available with  `to_ascii` and `to_unicode` functions:


```erlang
1> Domain = "www.詹姆斯.com".
[119,119,119,46,35449,22982,26031,46,99,111,109]
2> Encoded =  idna:to_ascii("www.詹姆斯.com").
"www.xn--8ws00zhy3a.com"
3> idna:to_unicode(Encoded).
[119,119,119,46,35449,22982,26031,46,99,111,109]
```



Update Unicode data

wget -O test/IdnaTestV2.txt https://www.unicode.org/Public/idna/latest/IdnaTestV2.txt
wget -O uc_spec/ArabicShaping.txt https://www.unicode.org/Public/UNIDATA/ArabicShaping.txt
wget -O uc_spec/IdnaMappingTable.txt https://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt
wget -O uc_spec/Scripts.txt https://www.unicode.org/Public/UNIDATA/Scripts.txt
wget -O uc_spec/UnicodeData.txt https://www.unicode.org/Public/UNIDATA/UnicodeData.txt

git clone https://github.com/kjd/idna.git
./idna/tools/idna-data make-table --version 13.0.0 > uc_spec/idna-table.txt

cd uc_spec
./gen_idnadata_mod.escript
./gen_idna_table_mod.escript
./gen_idna_mapping_mod.escript