PRECIS-i18n: Internationalized Usernames and Passwords
======================================================

|MIT licensed| |Build Status| |codecov.io|

If you want your application to accept Unicode user names and passwords,
you must be careful in how you validate and compare them. The PRECIS
framework makes internationalized user names and passwords safer for use
by applications. PRECIS profiles transform Unicode strings into a
canonical form, suitable for comparison.
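
As a small illustration of why a plain ``==`` is not enough, the sketch
below uses only the standard library ``unicodedata`` module. It is not the
PRECIS algorithm, only a demonstration that strings which look identical can
differ code point by code point. The variable names are illustrative.

```python
import unicodedata

ascii_name = "Kevin"
kelvin_name = "\u212Aevin"      # starts with KELVIN SIGN
fullwidth_name = "\uFF2Bevin"   # starts with FULLWIDTH LATIN CAPITAL LETTER K

# The strings look alike but are different sequences of code points,
# so a naive comparison treats them as different user names.
naive_equal = (ascii_name == kelvin_name, ascii_name == fullwidth_name)

# Normalization removes these particular differences: the Kelvin sign has a
# canonical mapping to "K" (handled by NFC), while the fullwidth K only has
# a compatibility mapping (handled by NFKC, not by NFC).
nfc_kelvin = unicodedata.normalize("NFC", kelvin_name)
nfc_fullwidth = unicodedata.normalize("NFC", fullwidth_name)
nfkc_fullwidth = unicodedata.normalize("NFKC", fullwidth_name)
```

Normalization alone does not reject disallowed characters or apply the
context rules, which is the gap a PRECIS profile is meant to fill.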

This module implements the PRECIS Framework as described in:

-  PRECIS Framework: Preparation, Enforcement, and Comparison of
   Internationalized Strings in Application Protocols (`RFC
   8264 <https://tools.ietf.org/html/rfc8264>`__)
-  Preparation, Enforcement, and Comparison of Internationalized Strings
   Representing Usernames and Passwords (`RFC
   8265 <https://tools.ietf.org/html/rfc8265>`__)
-  Preparation, Enforcement, and Comparison of Internationalized Strings
   Representing Nicknames (`RFC
   8266 <https://tools.ietf.org/html/rfc8266>`__)

Requires Python 3.3 or later.

Usage
-----

Use the ``get_profile`` function to obtain a profile object, then use
its ``enforce`` method. The ``enforce`` method returns a Unicode string.

::


    >>> from precis_i18n import get_profile
    >>> username = get_profile('UsernameCaseMapped')
    >>> username.enforce('Kevin')
    'kevin'
    >>> username.enforce('\u212Aevin')
    'kevin'
    >>> username.enforce('\uFF2Bevin')
    'kevin'
    >>> username.enforce('\U0001F17Aevin')
    Traceback (most recent call last):
        ...
    UnicodeEncodeError: 'UsernameCaseMapped' codec can't encode character '\U0001f17a' in position 0: DISALLOWED/symbols
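
A common use of ``enforce`` is comparing a stored user name with one typed
at login. The helper below is a minimal sketch and not part of this package:
``usernames_match`` is a hypothetical name, and ``profile`` is assumed to be
any object with an ``enforce`` method, such as the one returned by
``get_profile('UsernameCaseMapped')``. Treating a disallowed string as a
non-match is a design choice made here for illustration.

```python
def usernames_match(profile, stored, candidate):
    """Return True if both names enforce to the same canonical string.

    `profile` is assumed to be an object with an `enforce(str) -> str`
    method, e.g. the result of precis_i18n.get_profile('UsernameCaseMapped').
    """
    try:
        return profile.enforce(stored) == profile.enforce(candidate)
    except UnicodeEncodeError:
        # A disallowed string cannot be a valid user name, so it never matches.
        return False
```

With the ``username`` profile from the example above,
``usernames_match(username, 'Kevin', '\u212Aevin')`` returns ``True``,
because both strings enforce to ``'kevin'``.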

Alternatively, you can use the Python ``str.encode`` API. Import the
``precis_i18n.codec`` module to register the PRECIS codec names; you can
then call ``str.encode`` on any Unicode string. The method returns a
UTF-8 encoded byte string, or raises a ``UnicodeEncodeError`` if the
string is disallowed.

::


    >>> import precis_i18n.codec
    >>> 'Kevin'.encode('UsernameCasePreserved')
    b'Kevin'
    >>> '\u212Aevin'.encode('UsernameCasePreserved')
    b'Kevin'
    >>> '\uFF2Bevin'.encode('UsernameCasePreserved')
    b'Kevin'
    >>> '\u212Aevin'.encode('UsernameCaseMapped')
    b'kevin'
    >>> '\uFF2Bevin'.encode('OpaqueString')
    b'\xef\xbc\xabevin'
    >>> '\U0001F17Aevin'.encode('UsernameCasePreserved')
    Traceback (most recent call last):
        ...
    UnicodeEncodeError: 'UsernameCasePreserved' codec can't encode character '\U0001f17a' in position 0: DISALLOWED/symbols

Supported Profiles and Codecs
-----------------------------

Each PRECIS profile has a corresponding codec name. The ``CaseMapped``
variant converts the string to lower case for implementing
case-insensitive comparison.

-  UsernameCasePreserved
-  UsernameCaseMapped
-  OpaqueString
-  NicknameCasePreserved
-  NicknameCaseMapped

The ``CaseMapped`` profiles use Unicode ``ToLower`` per the latest RFC. Previous
versions of this package used Unicode Default Case Folding. ``CaseMapped`` variants
that select a specific case transformation also exist, but these profile names are
deprecated:

-  UsernameCaseMapped:ToLower
-  UsernameCaseMapped:CaseFold
-  NicknameCaseMapped:ToLower
-  NicknameCaseMapped:CaseFold

The PRECIS base string classes are also available as codecs:

-  IdentifierClass
-  FreeFormClass

Error Messages
--------------

A PRECIS profile raises a ``UnicodeEncodeError`` exception if a string
is disallowed. The ``reason`` field specifies the kind of error.

+------------------------------+---------------------------------------------+
| Reason                       | Explanation                                 |
+==============================+=============================================+
| DISALLOWED/arabic\_indic     | Arabic-Indic digits cannot be mixed with    |
|                              | Extended Arabic-Indic Digits. (Context)     |
+------------------------------+---------------------------------------------+
| DISALLOWED/bidi\_rule        | Right-to-left string cannot contain         |
|                              | left-to-right characters due to the "Bidi"  |
|                              | rule. (Context)                             |
+------------------------------+---------------------------------------------+
| DISALLOWED/controls          | Control character is not allowed.           |
+------------------------------+---------------------------------------------+
| DISALLOWED/empty             | After applying the profile, the result      |
|                              | cannot be empty.                            |
+------------------------------+---------------------------------------------+
| DISALLOWED/exceptions        | Exception character is not allowed.         |
+------------------------------+---------------------------------------------+
| DISALLOWED/extended\_arabic\ | Extended Arabic-Indic digits cannot be      |
| _indic                       | mixed with Arabic-Indic Digits. (Context)   |
+------------------------------+---------------------------------------------+
| DISALLOWED/greek\_keraia     | Greek keraia must be followed by a Greek    |
|                              | character. (Context)                        |
+------------------------------+---------------------------------------------+
| DISALLOWED/has\_compat       | Compatibility characters are not allowed.   |
+------------------------------+---------------------------------------------+
| DISALLOWED/hebrew\           | Hebrew punctuation geresh or gershayim must |
| _punctuation                 | be preceded by Hebrew character. (Context)  |
+------------------------------+---------------------------------------------+
| DISALLOWED/katakana\_middle\ | Katakana middle dot must be accompanied by  |
| _dot                         | a Hiragana, Katakana, or Han character.     |
|                              | (Context)                                   |
+------------------------------+---------------------------------------------+
| DISALLOWED/middle\_dot       | Middle dot must be surrounded by the letter |
|                              | 'l'. (Context)                              |
+------------------------------+---------------------------------------------+
| DISALLOWED/not\_idempotent   | After reapplying the profile, the result is |
|                              | not stable.                                 |
+------------------------------+---------------------------------------------+
| DISALLOWED/old\_hangul\_jamo | Conjoining Hangul Jamo is not allowed.      |
+------------------------------+---------------------------------------------+
| DISALLOWED/other             | Other character is not allowed.             |
+------------------------------+---------------------------------------------+
| DISALLOWED/other\_letter\    | Non-traditional letter or digit is not      |
| _digits                      | allowed.                                    |
+------------------------------+---------------------------------------------+
| DISALLOWED/precis\           | Default ignorable or non-character is not   |
| _ignorable\_properties       | allowed.                                    |
+------------------------------+---------------------------------------------+
| DISALLOWED/punctuation       | Non-ASCII punctuation character is not      |
|                              | allowed.                                    |
+------------------------------+---------------------------------------------+
| DISALLOWED/spaces            | Space character is not allowed.             |
+------------------------------+---------------------------------------------+
| DISALLOWED/symbols           | Non-ASCII symbol character is not allowed.  |
+------------------------------+---------------------------------------------+
| DISALLOWED/unassigned        | Unassigned Unicode character is not         |
|                              | allowed.                                    |
+------------------------------+---------------------------------------------+
| DISALLOWED/zero\_width\      | Zero width joiner must immediately follow a |
| _joiner                      | combining virama. (Context)                 |
+------------------------------+---------------------------------------------+
| DISALLOWED/zero\_width\      | Zero width non-joiner must immediately      |
| _nonjoiner                   | follow a combining virama, or appear where  |
|                              | it breaks a cursive connection in a         |
|                              | formally cursive script. (Context)          |
+------------------------------+---------------------------------------------+
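
Because the error is a standard ``UnicodeEncodeError``, an application can
catch it and read its attributes to build a user-facing message. The sketch
below is illustrative only: ``describe_rejection`` is a hypothetical helper,
and the exception is constructed by hand to mirror the traceback shown in the
Usage section, so the block runs without the package installed.

```python
def describe_rejection(exc):
    """Summarize a UnicodeEncodeError using its standard attributes."""
    return {
        "profile": exc.encoding,    # codec/profile name, e.g. 'UsernameCaseMapped'
        "reason": exc.reason,       # e.g. 'DISALLOWED/symbols'
        "position": exc.start,      # index of the first offending character
        "characters": exc.object[exc.start:exc.end],
    }


# Hand-built exception with the same fields as the traceback shown under Usage.
example = UnicodeEncodeError(
    "UsernameCaseMapped", "\U0001F17Aevin", 0, 1, "DISALLOWED/symbols"
)
summary = describe_rejection(example)
```

In an application, the same helper would be called from an
``except UnicodeEncodeError as exc:`` clause around ``enforce`` or
``str.encode``.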

.. |MIT licensed| image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://raw.githubusercontent.com/byllyfish/precis_i18n/master/LICENSE.txt
.. |Build Status| image:: https://travis-ci.org/byllyfish/precis_i18n.svg?branch=master
   :target: https://travis-ci.org/byllyfish/precis_i18n
.. |codecov.io| image:: https://codecov.io/gh/byllyfish/precis_i18n/coverage.svg?branch=master
   :target: https://codecov.io/gh/byllyfish/precis_i18n?branch=master