<pre>Internet Engineering Task Force (IETF) M. Davis
Request for Comments: 6067 Google
Category: Informational A. Phillips
ISSN: 2070-1721 Lab126
Y. Umaoka
IBM
December 2010
<span class="h1">BCP 47 Extension U</span>
Abstract
This document specifies an Extension to <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a> that provides subtags
that specify language and/or locale-based behavior or refinements to
language tags, according to work done by the Unicode Consortium.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet
Standard; see <a href="./rfc5741#section-2">Section 2 of RFC 5741</a>.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
<a href="http://www.rfc-editor.org/info/rfc6067">http://www.rfc-editor.org/info/rfc6067</a>.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and the IETF Trust's Legal
Provisions Relating to IETF Documents
(<a href="http://trustee.ietf.org/license-info">http://trustee.ietf.org/license-info</a>) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
<span class="grey">Davis, et al. Informational [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc6067">RFC 6067</a> <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a> Unicode Locale Extension December 2010</span>
Table of Contents
<a href="#section-1">1</a>. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-2">2</a>
<a href="#section-1.1">1.1</a>. Requirements Language . . . . . . . . . . . . . . . . . . . <a href="#page-2">2</a>
<a href="#section-2">2</a>. <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a> Required Information . . . . . . . . . . . . . . . . . . <a href="#page-2">2</a>
<a href="#section-2.1">2.1</a>. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-4">4</a>
<a href="#section-2.1.1">2.1.1</a>. Canonicalization . . . . . . . . . . . . . . . . . . . <a href="#page-5">5</a>
<a href="#section-2.2">2.2</a>. Registration Form . . . . . . . . . . . . . . . . . . . . . <a href="#page-6">6</a>
<a href="#section-3">3</a>. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-6">6</a>
<a href="#section-4">4</a>. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . <a href="#page-6">6</a>
<a href="#section-5">5</a>. Security Considerations . . . . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<a href="#section-6">6</a>. References . . . . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<a href="#section-6.1">6.1</a>. Normative References . . . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<a href="#section-6.2">6.2</a>. Informative References . . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
[<a id="ref-BCP47">BCP47</a>] permits the definition and registration of language tag
extensions "that contain a language component and are compatible with
applications that understand language tags". This document defines
an extension for identifying Unicode locale-based variations using
language tags. The "singleton" identifier for this extension is 'u'.
<span class="h3"><a class="selflink" id="section-1.1" href="#section-1.1">1.1</a>. Requirements Language</span>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <a href="./rfc2119">RFC 2119</a> [<a href="./rfc2119" title=""Key words for use in RFCs to Indicate Requirement Levels"">RFC2119</a>].
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a> Required Information</span>
Language tags, as defined by [<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>], are useful for identifying the
language of content. They are also used as locale identifiers (or
can be mapped to locales) in many operating environments and APIs.
However, many locale identifiers also require additional "tailorings"
or options for specific values within a language, culture, region, or
other variation. This extension provides a mechanism for using these
additional tailorings within language tags for general interchange.
The Unicode Consortium defines a standardized, structured set of
locale data and identifiers for locale data in the "Common Locale
Data Repository" or "CLDR". The maintaining authority for the
extension defined by this document is the Unicode Consortium:
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
+---------------+---------------------------------------------------+
| Item          | Value                                             |
+---------------+---------------------------------------------------+
| Name          | Unicode Consortium                                |
| Contact Email | cldr-contact@unicode.org                          |
| Discussion    | cldr-users@unicode.org                            |
| List Email    |                                                   |
| URL Location  | cldr.unicode.org                                  |
| Specification | Unicode Technical Standard #35 Unicode Locale     |
|               | Data Markup Language (LDML),                      |
|               | <a href="http://unicode.org/reports/tr35/">http://unicode.org/reports/tr35/</a>                  |
| Section       | Section 3 Unicode Language and Locale Identifiers |
+---------------+---------------------------------------------------+
The specification of extension subtags is provided by <a href="http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers">Section 3</a>, Key
Type Definitions of Unicode Technical Standard #35: Unicode Locale
Data Markup Language [<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>]. As required by <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a>, subtags follow
the language tag ABNF and other rules for the formation of language
tags and subtags, are restricted to the ASCII letters and digits, are
not case sensitive, and do not exceed eight characters in length.
Note that any "well-formed" language tag (see <a href="./rfc5646#section-2.2.9">RFC 5646, Section 2.2.9</a>
[<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>]) is also a well-formed locale identifier.
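The constraints above can be sketched as a small check. This is a
minimal illustration, not part of the specification: the function
names are hypothetical, and the lower bound of two characters is taken
from the extension ABNF in RFC 5646 (2*8alphanum), not from the
paragraph above.

```python
import re

# Shape of a subtag inside an extension: ASCII letters or digits,
# two to eight characters. Case is not significant.
EXTENSION_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")


def is_wellformed_subtag(subtag: str) -> bool:
    """Return True if `subtag` fits the extension subtag shape."""
    return EXTENSION_SUBTAG.fullmatch(subtag) is not None


def same_subtag(a: str, b: str) -> bool:
    """Compare two subtags without regard to case."""
    return a.lower() == b.lower()
```

For instance, is_wellformed_subtag("phonebk") holds, while a
nine-character subtag or one containing a non-ASCII letter is rejected.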
LDML [<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>] specifies a canonical representation. LDML is available
over the Internet and at no cost, and is available via a royalty-free
license at <a href="http://unicode.org/copyright.html">http://unicode.org/copyright.html</a>. LDML is versioned, and
each version of LDML is numbered, dated, and stable. Extension
subtags, once defined by LDML, are never retracted and never change
in meaning in a substantial way.
The structure of the Unicode locale extension is determined by the
Unicode CLDR Technical Committee, in accordance with the policies and
procedures in <a href="http://www.unicode.org/consortium/tc-procedures.html">http://www.unicode.org/consortium/tc-procedures.html</a>,
and subject to the Unicode Consortium Policies on
<a href="http://www.unicode.org/policies/policies.html">http://www.unicode.org/policies/policies.html</a>.
Changes that can be made by successive versions of LDML [<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>] by
the Unicode Consortium without requiring a new RFC include: the
allocation of new attributes, keywords, and types; clarifications or
non-material changes to an existing attribute, keyword, or type; and
compatible extensions to the overall syntactic structure of
attributes, keywords, and types. A new RFC would be required for
material changes to an existing attribute, keyword, or type, or an
incompatible change to the overall syntactic structure of attributes,
keywords, and types; however, such a change would be contrary to the
policies of the Unicode Consortium, and thus is not anticipated.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
<span class="h3"><a class="selflink" id="section-2.1" href="#section-2.1">2.1</a>. Summary</span>
The subtags available for use in the 'u' extension consist of a set
of attributes, keys, and types. Attributes, keys, types, and their
respective meanings are defined in <a href="http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers">Section 3</a> (Unicode Language and
Locale Identifiers) of [<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>]. The following is a summary of that
definition:
o An 'attribute' is a subtag with a length of three to eight
characters following the singleton and preceding any 'keyword'
sequences. No attributes were defined at the time of this
document's publication.
o A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
followed by zero or more 'type' subtags (so a 'key' might appear
alone and not be accompanied by a 'type' subtag). A 'key' MUST
NOT appear more than once in a language tag's extension string.
The order of the 'type' subtags within a 'keyword' is sometimes
significant to their interpretation.
A. A 'key' is a subtag with a length of exactly two characters.
Each 'key' is followed by zero or more 'type' subtags.
B. A 'type' is a subtag with a length of three to eight
characters following a 'key'. 'Type' subtags are specific to
a particular 'key' and the order of the 'type' subtags MAY be
significant to the interpretation of the 'keyword'.
For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
o The base language tag "de-DE" (German as used in Germany), exactly
as defined by [<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>] using subtags from the IANA Language Subtag
Registry.
o The singleton 'u', identifying this extension.
o The attribute 'attr', which is an example for illustration (no
attributes were defined at the time this document was published).
o The keyword 'co-phonebk', consisting of the key 'co' (Collation)
and the type 'phonebk' (Phonebook collation order).
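The breakdown above can be reproduced with a short parser. This is a
simplified sketch under stated assumptions: the function name is
hypothetical, subtags are classified by length alone (two characters
is a key, three to eight is an attribute or type), and private-use
sequences such as "x-u-..." are not handled.

```python
def parse_u_extension(tag: str):
    """Split a language tag into (base, attributes, keywords).

    `keywords` is a list of (key, [types]) pairs in tag order.
    Extension subtags are returned in lowercase.
    """
    subtags = tag.split("-")
    lowered = [subtag.lower() for subtag in subtags]
    if "u" not in lowered:
        return tag, [], []
    start = lowered.index("u")
    base = "-".join(subtags[:start])
    attributes, keywords = [], []
    for subtag in lowered[start + 1:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if len(subtag) == 2:
            keywords.append((subtag, []))    # a new key
        elif keywords:
            keywords[-1][1].append(subtag)   # a type for the latest key
        else:
            attributes.append(subtag)        # precedes any keyword
    return base, attributes, keywords


print(parse_u_extension("de-DE-u-attr-co-phonebk"))
# ('de-DE', ['attr'], [('co', ['phonebk'])])
```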
Only the first occurrence of an attribute or key conveys meaning in a
language tag. When interpreting tags containing the Unicode locale
extension, duplicate attributes or keywords are ignored in the
following way: ignore any attribute that has already appeared in the
tag and ignore any keyword whose key has already occurred in the tag.
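The duplicate-handling rule can be sketched as follows. The function
name and the input shape (a list of attributes plus a list of
(key, [types]) pairs, already lowercased and in tag order) are
assumptions made for illustration.

```python
def drop_duplicates(attributes, keywords):
    """Keep only the first occurrence of each attribute and each key."""
    seen_attributes, seen_keys = set(), set()
    kept_attributes, kept_keywords = [], []
    for attribute in attributes:
        if attribute not in seen_attributes:
            seen_attributes.add(attribute)
            kept_attributes.append(attribute)
    for key, types in keywords:
        if key not in seen_keys:
            seen_keys.add(key)
            kept_keywords.append((key, types))
    return kept_attributes, kept_keywords
```

With this sketch, a tag carrying both 'co-phonebk' and a later
'co-pinyin' is interpreted using 'co-phonebk' only.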
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
Successive versions of [<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>] could define additional attributes,
keys, and types. Once defined, attributes, keys, and types will
never be removed.
Beginning with CLDR version 1.7.2, machine-readable files are
available listing the valid attributes, keys, and types for each
successive version of [<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>]. These releases are listed on
<a href="http://cldr.unicode.org/index/downloads">http://cldr.unicode.org/index/downloads</a>. Each release has an
associated data directory of the form
"http://unicode.org/Public/cldr/<version>", where "<version>" is
replaced by the release number. For example, for version 1.7.2, the
"core.zip" file is located at
<a href="http://unicode.org/Public/cldr/1.7.2/core.zip">http://unicode.org/Public/cldr/1.7.2/core.zip</a>. Inside the "core.zip"
file, the path "common/bcp47" contains the data files defining the
valid attributes, keys, and types. The most recent version is always
identified by the version "latest" and can be accessed by the URL in
<a href="#section-2.2">Section 2.2</a>.
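The release layout described above can be sketched in code. The
helper names are hypothetical; the URL pattern and the "common/bcp47"
path come from the text, and the sketch assumes the archive has
already been downloaded.

```python
import zipfile

# Base of the per-release data directories named in the text.
CLDR_PUBLIC = "http://unicode.org/Public/cldr"


def core_zip_url(version: str) -> str:
    """Build the core.zip URL for a release number such as "1.7.2"."""
    return f"{CLDR_PUBLIC}/{version}/core.zip"


def bcp47_files(core_zip) -> list:
    """List the data files under common/bcp47 in a core.zip archive.

    `core_zip` is a filesystem path or a binary file object.
    """
    with zipfile.ZipFile(core_zip) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("common/bcp47/") and not name.endswith("/")
        )
```

For example, core_zip_url("1.7.2") yields the 1.7.2 address quoted
above.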
To get the version information in XML when working with the data
files, the XML parser must be validating. When the 'core.zip' file
is unzipped, the 'dtd' directory will be at the same level as the
'bcp47' directory; this is required for correct validation. For each
release after CLDR 1.8, types introduced in that release are also
marked in the data files by the XML attribute "since", such as in the
following example:
<type name="adp" since="1.9"/>
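Reading the "since" attribute can be sketched with Python's standard
ElementTree parser. Two caveats apply: ElementTree is not a validating
parser, so it sees only attributes written literally in the file and
not values supplied by the DTD; and the enclosing "key" element and
the second "type" element in the sample are assumptions added for
illustration.

```python
import xml.etree.ElementTree as ET


def types_introduced_in(xml_text: str, version: str) -> list:
    """Return names of <type> elements whose "since" equals `version`."""
    root = ET.fromstring(xml_text)
    return [
        element.get("name")
        for element in root.iter("type")
        if element.get("since") == version
    ]


# Hypothetical fragment built around the example element above.
sample = '<key name="cu"><type name="adp" since="1.9"/><type name="usd"/></key>'
print(types_introduced_in(sample, "1.9"))  # ['adp']
```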
The data is also currently maintained in a source code repository,
with each release tagged, for viewing directly without unzipping.
For example, see:
o <a href="http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/">http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/</a>
o <a href="http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/">http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/</a>
Some data in the CLDR data files might require reference to LDML
[<a href="#ref-UTS35" title=""Unicode Technical Standard #35: Locale Data Markup Language (LDML)"">UTS35</a>]. For specific information, see <a href="http://unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data">Appendix Q</a> in that document.
For example, LDML reserves the type 'codepoints' to define specific
code point ranges in Unicode for specific purposes.
<span class="h4"><a class="selflink" id="section-2.1.1" href="#section-2.1.1">2.1.1</a>. Canonicalization</span>
As required by [<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>], the use of uppercase or lowercase letters is
not significant in the subtags used in this extension. The canonical
form for all subtags in the extension is lowercase. The canonical
order of attributes is in [<a href="#ref-US-ASCII" title=""ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange."">US-ASCII</a>] order (that is, numbers before
letters, with letters sorted as lowercase US-ASCII code points). The
canonical order of keywords is in [<a href="#ref-US-ASCII" title=""ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange."">US-ASCII</a>] order by key. The order
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
of subtags within a keyword is significant; the meaning of this
extension is altered if those subtags are rearranged. Thus, the
canonical form of the extension never reorders the subtags within a
keyword.
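The canonicalization rules can be sketched as follows. The function
name and input shape (a list of attributes plus a list of
(key, [types]) pairs) are assumptions; the sketch relies on Python's
default string ordering, which for lowercase ASCII places digits
before letters.

```python
def canonicalize_u_extension(attributes, keywords):
    """Return the canonical 'u' extension string.

    Subtags are lowercased, attributes are sorted, keywords are
    sorted by key, and the order of types within each keyword is
    left untouched.
    """
    sorted_attributes = sorted(a.lower() for a in attributes)
    sorted_keywords = sorted(
        ((key.lower(), [t.lower() for t in types]) for key, types in keywords),
        key=lambda keyword: keyword[0],
    )
    parts = ["u"] + sorted_attributes
    for key, types in sorted_keywords:
        parts.append(key)
        parts.extend(types)
    return "-".join(parts)


print(canonicalize_u_extension(
    ["zzz", "1abc"],
    [("CO", ["Phonebk"]), ("ca", ["islamic", "civil"])],
))
# u-1abc-zzz-ca-islamic-civil-co-phonebk
```

Sorting by key alone, instead of sorting whole keywords, is what keeps
'islamic' ahead of 'civil' in the output.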
<span class="h3"><a class="selflink" id="section-2.2" href="#section-2.2">2.2</a>. Registration Form</span>
Per <a href="./rfc5646#section-3.7">RFC 5646, Section 3.7</a> [<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>]:
%%
Identifier: u
Description: Unicode Locale
Comments: Subtags for the identification of language and cultural
variations. Used to set behavior in locale APIs. Data is
located in the "common/bcp47" directory inside the referenced
URL. Unicode Technical Standard #35 (LDML) provides additional
reference material defining the keys and values.
For more details please see
<<a href="http://cldr.unicode.org/index/bcp47-extension">http://cldr.unicode.org/index/bcp47-extension</a>>.
Added: 2010-09-02
RFC: <a href="./rfc6067">RFC 6067</a>
Authority: Unicode Consortium
Contact_Email: cldr-contact@unicode.org
Mailing_List: cldr-users@unicode.org
URL: <a href="http://www.unicode.org/Public/cldr/latest/core.zip">http://www.unicode.org/Public/cldr/latest/core.zip</a>
%%
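A record in this form can be read with a few lines of code. This is a
sketch only: the function name is hypothetical, and it assumes the
record-jar layout used by the IANA registries, in which each field is
"Name: value" and continuation lines begin with whitespace.

```python
def parse_registry_record(text: str) -> dict:
    """Parse one '%%'-delimited record into a dict of field values."""
    record, field = {}, None
    for line in text.splitlines():
        if not line.strip() or line.strip() == "%%":
            continue  # skip blank lines and record separators
        if line[0] in " \t" and field is not None:
            # Continuation line: fold it into the previous field.
            record[field] += " " + line.strip()
        else:
            field, _, value = line.partition(":")
            field = field.strip()
            record[field] = value.strip()
    return record


sample_record = """%%
Identifier: u
Description: Unicode Locale
Comments: Subtags for the identification of language and cultural
  variations.
Added: 2010-09-02
%%"""
print(parse_registry_record(sample_record)["Identifier"])  # u
```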
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Acknowledgements</span>
Thanks to John Emmons and the rest of the Unicode CLDR Technical
Committee for their work in developing the <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a> subtags for LDML.
Thanks also to Doug Ewell, for his many suggestions for improvements
to this document.
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. IANA Considerations</span>
IANA has inserted the record in <a href="#section-2.2">Section 2.2</a> of this document into the
Language Extensions Registry, in accordance with <a href="./rfc5646#section-3.7">Section 3.7</a>
(Extensions and the Extensions Registry) of [<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>],
"Tags for Identifying Languages". Per <a href="https://www.rfc-editor.org/bcp/bcp47#section-5.2">Section 5.2 of [BCP47]</a>, there
might be occasional (rare) requests by the Unicode Consortium (the
"Authority" listed in the record) for maintenance of this record.
Changes that can be submitted to IANA without the publication of a
new RFC are limited to modification of the Comments, Contact_Email,
Mailing_List, and URL fields. Any such requested changes MUST use
the domain 'unicode.org' in any new addresses or URIs, MUST
explicitly cite this document (so that IANA can reference these
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
requirements), and MUST originate from the 'unicode.org' domain. The
domain or authority can only be changed via a new RFC.
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Security Considerations</span>
The security considerations for this extension are the same as those
for [<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>]. See <a href="./rfc5646#section-6">RFC 5646, Section 6</a>, Security Considerations
[<a href="#ref-BCP47" title=""Tags for Identifying Languages"">BCP47</a>].
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. References</span>
<span class="h3"><a class="selflink" id="section-6.1" href="#section-6.1">6.1</a>. Normative References</span>
[<a id="ref-BCP47">BCP47</a>] Phillips, A., Ed. and M. Davis, Ed., "Tags for
Identifying Languages", <a href="https://www.rfc-editor.org/bcp/bcp47">BCP 47</a>, <a href="./rfc5646">RFC 5646</a>,
September 2009.
[<a id="ref-RFC2119">RFC2119</a>] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
[<a id="ref-US-ASCII">US-ASCII</a>] International Organization for Standardization,
"ISO/IEC 646:1991, Information technology -- ISO
7-bit coded character set for information
interchange.", 1991.
[<a id="ref-UTS35">UTS35</a>] Davis, M., "Unicode Technical Standard #35: Locale
Data Markup Language (LDML)", December 2007,
<<a href="http://www.unicode.org/reports/tr35/">http://www.unicode.org/reports/tr35/</a>>.
Section 3: <a href="http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers">http://unicode.org/reports/</a>
<a href="http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers">tr35/#Unicode_Language_and_Locale_Identifiers</a>
Appendix Q: <a href="http://unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data">http://unicode.org/reports/</a>
<a href="http://unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data">tr35/#Locale_Extension_Key_and_Type_Data</a>
<span class="h3"><a class="selflink" id="section-6.2" href="#section-6.2">6.2</a>. Informative References</span>
[<a id="ref-ldml-registry">ldml-registry</a>] "Registry for Common Locale Data Repository tag
elements", September 2009.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
Authors' Addresses
Mark Davis
Google
EMail: mark@macchiato.com
Addison Phillips
Lab126
EMail: addison@lab126.com
Yoshito Umaoka
IBM
EMail: yoshito_umaoka@us.ibm.com
</pre>