1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563
|
Network Working Group N. Freed
Request for Comments: 2278 Innosoft
BCP: 19 J. Postel
Category: Best Current Practice ISI
January 1998
IANA Charset
Registration Procedures
Status of this Memo
This document specifies an Internet Best Current Practices for the
Internet Community, and requests discussion and suggestions for
improvements. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1998). All Rights Reserved.
1. Abstract
MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
modern Internet protocols are capable of using many different
charsets. This in turn means that the ability to label different
charsets is essential. This registration procedure exists solely to
associate a specific name or names with a given charset and to give
an indication of whether or not a given charset can be used in MIME
text objects. In particular, the general applicability and
appropriateness of a given registered charset is a protocol issue,
not a registration issue, and is not dealt with by this registration
procedure.
2. Definitions and Notation
The following sections define various terms used in this document.
2.1. Requirements Notation
This document occasionally uses terms that appear in capital letters.
When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
appear capitalized, they are being used to indicate particular
requirements of this specification. A discussion of the meanings of
these terms appears in [RFC-2119].
Freed & Postel Best Current Practice [Page 1]
RFC 2278 Charset Registration January 1998
2.2. Character
A member of a set of elements used for the organisation, control, or
representation of data.
2.3. Charset
The term "charset" (see historical note below) is used here to refer
to a method of converting a sequence of octets into a sequence of
characters. This conversion may also optionally produce additional
control information such as directionality indicators.
Note that unconditional and unambiguous conversion in the other
direction is not required, in that not all characters may be
representable by a given charset and a charset may provide more than
one sequence of octets to represent a particular sequence of
characters.
This definition is intended to allow charsets to be defined in a
variety of different ways, from simple single-table mappings such as
US-ASCII to complex table switching methods such as those that use
ISO 2022's techniques, to be used as charsets. However, the
definition associated with a charset name must fully specify the
mapping to be performed. In particular, use of external profiling
information to determine the exact mapping is not permitted.
HISTORICAL NOTE: The term "character set" was originally used in MIME
to describe such straightforward schemes as US-ASCII and ISO-8859-1
which consist of a small set of characters and a simple one-to-one
mapping from single octets to single characters. Multi-octet
character encoding schemes and switching techniques make the
situation much more complex. As such, the definition of this term was
revised to emphasize both the conversion aspect of the process, and
the term itself has been changed to "charset" to emphasize that it is
not, after all, just a set of characters. A discussion of these
issues as well as specification of standard terminology for use in
the IETF appears in RFC 2130.
2.4. Coded Character Set
A Coded Character Set (CCS) is a mapping from a set of abstract
characters to a set of integers. Examples of coded character sets are
ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859 series
[ISO-8859].
Freed & Postel Best Current Practice [Page 2]
RFC 2278 Charset Registration January 1998
2.5. Character Encoding Scheme
A Character Encoding Scheme (CES) is a mapping from a Coded Character
Set or several coded character sets to a set of octets. A given CES
is typically associated with a single CCS; for example, UTF-8 applies
only to ISO 10646.
3. Registration Requirements
Registered charsets are expected to conform to a number of
requirements as described below.
3.1. Required Characteristics
Registered charsets MUST conform to the definition of a "charset"
given above. In addition, charsets intended for use in MIME content
types under the "text" top-level type must conform to the
restrictions on that type described in RFC 2045. All registered
charsets MUST note whether or not they are suitable for use in MIME.
All charsets which are constructed as a composition of a CCS and a
CES MUST either include the CCS and CES they are based on in their
registration or else cite a definition of their CCS and CES that
appears elsewhere.
All registered charsets MUST be specified in a stable, openly
available specification. Registration of charsets whose
specifications aren't stable and openly available is forbidden.
3.2. New Charsets
This registration mechanism is not intended to be a vehicle for the
definition of entirely new charsets. This is due to the fact that the
registration process does NOT contain adequate review mechanisims for
such undertakings.
As such, only charsets defined by other processes and standards
bodies, or specific profiles of such charsets, are eligible for
registration.
3.3. Naming Requirements
One or more names MUST be assigned to all registered charsets.
Multiple names for the same charset are permitted, but if multiple
names are assigned a single primary name for the charset MUST be
identified. All other names are considered to be aliases for the
primary name and use of the primary name is preferred over use of any
of the aliases.
Freed & Postel Best Current Practice [Page 3]
RFC 2278 Charset Registration January 1998
Each assigned name MUST uniquely identify a single charset. All
charset names MUST be suitable for use as the value of a MIME content
type charset parameter and hence MUST conform to MIME parameter value
syntax. This applies even if the specific charset being registered is
not suitable for use with the "text" media type.
Finally, charsets being registered for use with the "text" media type
MUST have a primary name that conforms to the more restrictive syntax
of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
MIME extended parameter values [RFC-2184]. A combined ABNF definition
for such names is as follows:
mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
cspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
<"> / "/" / "[" / "]" / "?" / "." / "=" / "*"
CHAR = <any ASCII character> ; ( 0-177, 0.-127.)
SPACE = <ASCII SP, space> ; ( 40, 32.)
CTL = <any ASCII control ; ( 0- 37, 0.- 31.)
character and DEL> ; ( 177, 127.)
3.4. Functionality Requirement
Charsets must function as actual charsets: Registration of things
that are better thought of as a transfer encoding, as a media type,
or as a collection of separate entities of another type, is not
allowed. For example, although HTML could theoretically be thought
of as a charset, it is really better thought of as a media type and
as such it cannot be registered as a charset.
3.5. Usage and Implementation Requirements
Use of a large number of charsets in a given protocol may hamper
interoperability. However, the use of a large number of undocumented
and/or unlabelled charsets hampers interoperability even more.
A charset should therefore be registered ONLY if it adds significant
functionality that is valuable to a large community, OR if it
documents existing practice in a large community. Note that charsets
registered for the second reason should be explicitly marked as being
of limited or specialized use and should only be used in Internet
messages with prior bilateral agreement.
3.6. Publication Requirements
Charset registrations can be published in RFCs, however, RFC
publication is not required to register a new charset.
Freed & Postel Best Current Practice [Page 4]
RFC 2278 Charset Registration January 1998
The registration of a charset does not imply endorsement, approval,
or recommendation by the IANA, IESG, or IETF, or even certification
that the specification is adequate. It is expected that applicability
statements for particular applications will be published from time to
time that recommend implementation of, and support for, charsets that
have proven particularly useful in those contexts.
3.7. MIBenum Requirements
Each registered charset MUST also be assigned a unique enumerated
integer value. These "MIBenum" values are defined by and used in the
Printer MIB [RFC-1759].
A MIBenum value for each charset will be assigned by IANA at the time
of registration.
4. Registration Procedure
The following procedure has been implemented by the IANA for review
and approval of new charsets. This is not a formal standards
process, but rather an administrative procedure intended to allow
community comment and sanity checking without excessive time delay.
4.1. Present the Charset to the Community
Send the proposed charset registration to the "ietf-
charsets@iana.org" mailing list. This mailing list has been
established for the sole purpose of reviewing proposed charset
registrations. Proposed charsets are not formally registered and must
not be used; the "x-" prefix specified in RFC 2045 can be used until
registration is complete.
The intent of the public posting is to solicit comments and feedback
on the definition of the charset and the name chosen for it over a
two week period.
4.2. Charset Reviewer
When the two week period has passed and the registration proposer is
convinced that consensus has been achieved, the registration
application should be submitted to IANA and the charset reviewer. The
charset reviewer, who is appointed by the IETF Applications Area
Director(s), either approves the request for registration or rejects
it. Rejection may occur because of significant objections raised on
the list or objections raised externally. If the charset reviewer
considers the registration sufficiently important and controversial,
a last call for comments may be issued to the full IETF. The charset
Freed & Postel Best Current Practice [Page 5]
RFC 2278 Charset Registration January 1998
reviewer may also recommend standards track processing (before or
after registration) when that appears appropriate and the level of
specification of the charset is adequate.
Decisions made by the reviewer must be posted to the ietf-charsets
mailing list within 14 days. Decisions made by the reviewer may be
appealed to the IESG.
4.3. IANA Registration
Provided that the charset registration has either passed review or
has been successfully appealed to the IESG, the IANA will register
the charset, assign a MIBenum value, and make its registration
available to the community.
5. Location of Registered Charset List
Charset registrations will be posted in the anonymous FTP file
"ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets" and all
registered charsets will be listed in the periodically issued
"Assigned Numbers" RFC [currently RFC-1700]. The description of the
charset may also be published as an Informational RFC by sending it
to "rfc-editor@isi.edu" (please follow the instructions to RFC
authors [RFC-2223]).
6. Registration Template
To: ietf-charsets@iana.org
Subject: Registration of new charset
Charset name(s):
(All names must be suitable for use as the value of a MIME content-
type parameter.)
Published specification(s):
(A specification for the charset must be openly available that
accurately describes what is being registered. If a charset is
defined as a composition of a CCS and a CES then these defintions
must either be included or referenced.)
Person & email address to contact for further information:
Freed & Postel Best Current Practice [Page 6]
RFC 2278 Charset Registration January 1998
7. Security Considerations
This registration procedure is not known to raise any sort of
security considerations that are appreciably different from those
already existing in the protocols that employ registered charsets.
8. References
[ISO-2022]
International Standard -- Information Processing -- Character
Code Structure and Extension Techniques, ISO/IEC 2022:1994, 4th
ed.
[ISO-8859]
International Standard -- Information Processing -- 8-bit
Single-Byte Coded Graphic Character Sets
- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed.
- Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed.
- Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed.
- Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed.
- Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st
ed.
- Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed.
- Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
- Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed.
- Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st
ed.
International Standard -- Information Technology -- 8-bit
Single-Byte Coded Graphic Character Sets
- Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992,
1st ed.
[ISO-10646]
ISO/IEC 10646-1:1993(E), "Information technology --
Universal Multiple-Octet Coded Character Set (UCS) --
Part 1: Architecture and Basic Multilingual Plane",
JTC1/SC2, 1993.
[RFC-2048]
Freed, N., Klensin, J., and J. Postel, "Multipurpose Internet
Mail Extensions (MIME) Part Four: Registration Procedures", RFC
2048, November 1996.
[RFC-1700]
Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994.
Freed & Postel Best Current Practice [Page 7]
RFC 2278 Charset Registration January 1998
[RFC-1759]
Smith, R., Wright, F., Hastings, T., Zilles, S., and J.
Gyllenskog, "Printer MIB", RFC 1759, March 1995.
[RFC-2045]
Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC 2045, November 1996.
[RFC-2046]
Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046, November
1996.
[RFC-2047]
Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
Three: Representation of Non-Ascii Text in Internet Message
Headers", RFC 2047, November 1996.
[RFC-2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[RFC-2130]
Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
R., Crispin, M., and P. Svanberg, "Report from the IAB Character
Set Workshop", RFC 2130, April 1997.
[RFC-2184]
Freed, N., and K. Moore, "MIME Parameter Value and Encoded Word
Extensions: Character Sets, Languages, and Continuations", RFC
2184, August 1997.
[US-ASCII]
Coded Character Set -- 7-Bit American Standard Code for
Information Interchange, ANSI X3.4-1986.
Freed & Postel Best Current Practice [Page 8]
RFC 2278 Charset Registration January 1998
9. Authors' Addresses
Ned Freed
Innosoft International, Inc.
1050 Lakes Drive
West Covina, CA 91790
USA
Phone: +1 626 919 3600
Fax: +1 626 919 3614
EMail: ned.freed@innosoft.com
Jon Postel
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292
USA
Phone: +1 310 822 1511
Fax: +1 310 823 6714
EMail: Postel@ISI.EDU
Freed & Postel Best Current Practice [Page 9]
RFC 2278 Charset Registration January 1998
Full Copyright Statement
Copyright (C) The Internet Society (1998). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Freed & Postel Best Current Practice [Page 10]
|