<pre>Network Working Group E. Wilde
Request for Comments: 5147 UC Berkeley
Updates: <a href="./rfc2046">2046</a> M. Duerst
Category: Standards Track Aoyama Gakuin University
April 2008
<span class="h1">URI Fragment Identifiers for the text/plain Media Type</span>
Status of This Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Abstract
This memo defines URI fragment identifiers for text/plain MIME
entities. These fragment identifiers make it possible to refer to
parts of a text/plain MIME entity, either identified by character
position or range, or by line position or range. Fragment
identifiers may also contain information for integrity checks to make
them more robust.
<span class="grey">Wilde & Duerst Standards Track [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
Table of Contents
<a href="#section-1">1</a>. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-3">3</a>
<a href="#section-1.1">1.1</a>. What Is text/plain? . . . . . . . . . . . . . . . . . . . <a href="#page-3">3</a>
<a href="#section-1.2">1.2</a>. What Is a URI Fragment Identifier? . . . . . . . . . . . . <a href="#page-4">4</a>
<a href="#section-1.3">1.3</a>. Why text/plain Fragment Identifiers? . . . . . . . . . . . <a href="#page-4">4</a>
<a href="#section-1.4">1.4</a>. Incremental Deployment . . . . . . . . . . . . . . . . . . <a href="#page-5">5</a>
<a href="#section-1.5">1.5</a>. Notation Used in This Memo . . . . . . . . . . . . . . . . <a href="#page-5">5</a>
<a href="#section-2">2</a>. Fragment Identification Methods . . . . . . . . . . . . . . . <a href="#page-5">5</a>
<a href="#section-2.1">2.1</a>. Fragment Identification Principles . . . . . . . . . . . . <a href="#page-6">6</a>
<a href="#section-2.1.1">2.1.1</a>. Positions and Ranges . . . . . . . . . . . . . . . . . <a href="#page-6">6</a>
<a href="#section-2.1.2">2.1.2</a>. Characters and Lines . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<a href="#section-2.2">2.2</a>. Combining the Principles . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<a href="#section-2.2.1">2.2.1</a>. Character Position . . . . . . . . . . . . . . . . . . <a href="#page-7">7</a>
<a href="#section-2.2.2">2.2.2</a>. Character Range . . . . . . . . . . . . . . . . . . . <a href="#page-8">8</a>
<a href="#section-2.2.3">2.2.3</a>. Line Position . . . . . . . . . . . . . . . . . . . . <a href="#page-8">8</a>
<a href="#section-2.2.4">2.2.4</a>. Line Range . . . . . . . . . . . . . . . . . . . . . . <a href="#page-8">8</a>
<a href="#section-2.3">2.3</a>. Fragment Identifier Robustness . . . . . . . . . . . . . . <a href="#page-8">8</a>
<a href="#section-3">3</a>. Fragment Identification Syntax . . . . . . . . . . . . . . . . <a href="#page-9">9</a>
<a href="#section-3.1">3.1</a>. Integrity Checks . . . . . . . . . . . . . . . . . . . . . <a href="#page-9">9</a>
<a href="#section-4">4</a>. Fragment Identifier Processing . . . . . . . . . . . . . . . . <a href="#page-10">10</a>
<a href="#section-4.1">4.1</a>. Handling of Line Endings in text/plain MIME Entities . . . <a href="#page-10">10</a>
<a href="#section-4.2">4.2</a>. Handling of Position Values . . . . . . . . . . . . . . . <a href="#page-11">11</a>
<a href="#section-4.3">4.3</a>. Handling of Integrity Checks . . . . . . . . . . . . . . . <a href="#page-11">11</a>
<a href="#section-4.4">4.4</a>. Syntax Errors in Fragment Identifiers . . . . . . . . . . <a href="#page-12">12</a>
<a href="#section-5">5</a>. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-12">12</a>
<a href="#section-6">6</a>. IANA Considerations . . . . . . . . . . . . . . . . . . . . . <a href="#page-13">13</a>
<a href="#section-7">7</a>. Security Considerations . . . . . . . . . . . . . . . . . . . <a href="#page-13">13</a>
<a href="#section-8">8</a>. References . . . . . . . . . . . . . . . . . . . . . . . . . . <a href="#page-14">14</a>
<a href="#section-8.1">8.1</a>. Normative References . . . . . . . . . . . . . . . . . . . <a href="#page-14">14</a>
<a href="#section-8.2">8.2</a>. Informative References . . . . . . . . . . . . . . . . . . <a href="#page-14">14</a>
<a href="#appendix-A">Appendix A</a>. Acknowledgements . . . . . . . . . . . . . . . . . . <a href="#page-16">16</a>
<span class="grey">Wilde & Duerst Standards Track [Page 2]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
This memo updates the text/plain media type defined in <a href="./rfc2046">RFC 2046</a> [<a href="#ref-3" title="&quot;Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types&quot;">3</a>]
by defining URI fragment identifiers for text/plain MIME entities.
This makes it possible to refer to parts of a text/plain MIME entity.
Such parts can be identified by either character position or range,
or by line position or range. Integrity checking information can be
added to a fragment identifier to make it more robust, enabling
applications to detect changes of the entity.
This section gives an introduction to the general concepts of text/
plain MIME entities and URI fragment identifiers, and it discusses
the need for fragment identifiers for text/plain and deployment
issues. <a href="#section-2">Section 2</a> discusses the principles and methods on which this
memo is based. <a href="#section-3">Section 3</a> defines the syntax, and <a href="#section-4">Section 4</a> discusses
processing of text/plain fragment identifiers. <a href="#section-5">Section 5</a> shows some
examples.
<span class="h3"><a class="selflink" id="section-1.1" href="#section-1.1">1.1</a>. What Is text/plain?</span>
Internet Media Types (often referred to as "MIME types"), as defined
in <a href="./rfc2045">RFC 2045</a> [<a href="#ref-2" title=""Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"">2</a>] and <a href="./rfc2046">RFC 2046</a> [<a href="#ref-3" title=""Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"">3</a>], are used to identify different
types and sub-types of media. <a href="./rfc2046">RFC 2046</a> [<a href="#ref-3" title=""Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"">3</a>] and <a href="./rfc3676">RFC 3676</a> [<a href="#ref-6" title=""The Text/Plain Format and DelSp Parameters"">6</a>] specify
the text/plain media type, which is used for simple, unformatted
text. Quoting from <a href="./rfc2046">RFC 2046</a> [<a href="#ref-3" title=""Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"">3</a>]: "Plain text does not provide for or
allow formatting commands, font attribute specifications, processing
instructions, interpretation directives, or content markup. Plain
text is seen simply as a linear sequence of characters, possibly
interrupted by line breaks or page breaks".
The text/plain media type does not restrict the character encoding;
any character encoding may be used. In the absence of an explicit
character encoding declaration, US-ASCII [<a href="#ref-13" title="&quot;Coded Character Set - 7-Bit American National Standard Code for Information Interchange&quot;">13</a>] is assumed as the
default character encoding. This variability of the character
encoding makes it impossible to count characters in a text/plain MIME
entity without taking the character encoding into account, because
there are many character encodings using more than one octet per
character.
The biggest advantage of text/plain MIME entities is their ease of
use and their portability among different platforms. As long as they
use popular character encodings (such as US-ASCII or UTF-8 [<a href="#ref-12" title="&quot;UTF-8, a transformation format of ISO 10646&quot;">12</a>]),
they can be displayed and processed on virtually every computer
system. The only remaining interoperability issue is the
representation of line endings, which is discussed in <a href="#section-4.1">Section 4.1</a>.
<span class="grey">Wilde & Duerst Standards Track [Page 3]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
<span class="h3"><a class="selflink" id="section-1.2" href="#section-1.2">1.2</a>. What Is a URI Fragment Identifier?</span>
URIs are the identification mechanism for resources on the Web. The
URI syntax specified in <a href="./rfc3986">RFC 3986</a> [<a href="#ref-7" title="&quot;Uniform Resource Identifier (URI): Generic Syntax&quot;">7</a>] optionally includes a so-called
"fragment identifier", separated by a number sign ('#'). The
fragment identifier consists of additional reference information to
be interpreted by the user agent after the retrieval action has been
successfully completed. The semantics of a fragment identifier are a
property of the data resulting from a retrieval action, regardless of
the type of URI used in the reference. Therefore, the format and
interpretation of fragment identifiers is dependent on the media type
of the retrieval result.
The most popular fragment identifier is defined for text/html
(defined in <a href="./rfc2854">RFC 2854</a> [<a href="#ref-10" title=""The 'text/html' Media Type"">10</a>]) and makes it possible to refer to a
specific element (identified by the value of a 'name' or 'id'
attribute) of an HTML document. This makes it possible to reference
a specific part of a Web page, rather than a Web page as a whole.
<span class="h3"><a class="selflink" id="section-1.3" href="#section-1.3">1.3</a>. Why text/plain Fragment Identifiers?</span>
Referring to specific parts of a resource can be very useful because
it enables users and applications to create more specific references.
Users can create references to the part they really are interested in
or want to talk about, rather than always pointing to a complete
resource. Even though it is suggested that fragment identification
methods are specified in a media type's MIME registration (see [<a href="#ref-15" title="&quot;Media Type Specifications and Registration Procedures&quot;">15</a>]),
many media types do not have fragment identification methods
associated with them.
Fragment identifiers are only useful if supported by the client,
because they are only interpreted by the client. Therefore, a new
fragment identification method will require some time to be adopted
by clients, and older clients will not support it. However, because
the URI still works even if the fragment identifier is not supported
(the resource is retrieved, but the fragment identifier is not
interpreted), rapid adoption is not highly critical to ensure the
success of a new fragment identification method.
Fragment identifiers for text/plain, as defined in this memo, make it
possible to refer to specific parts of a text/plain MIME entity,
using concepts of positions and ranges, which may be applied to
characters and lines. Thus, text/plain fragment identifiers enable
users to exchange information more specifically, thereby reducing the
time and effort that is necessary to manually search for the relevant
part of a text/plain MIME entity.
<span class="grey">Wilde & Duerst Standards Track [Page 4]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
The text/plain format does not support the embedding of links, so in
most environments, text/plain resources can only serve as targets for
links, and not as sources. However, when combining the text/plain
fragment identifiers specified in this memo with out-of-line linking
mechanisms such as XLink [<a href="#ref-14" title="&quot;XML Linking Language (XLink) Version 1.0&quot;">14</a>], it becomes possible to "bind" link
resources to text/plain resources and thereby "embed" links into
text/plain resources. Thus, the text/plain fragment identifiers
specified in this memo open a path for text/plain files to become
bidirectionally navigable resources in hypermedia systems such as the
Web.
<span class="h3"><a class="selflink" id="section-1.4" href="#section-1.4">1.4</a>. Incremental Deployment</span>
As long as text/plain fragment identifiers are not supported
universally, it is important to consider the implications of
incremental deployment. Clients (for example, Web browsers) not
supporting the text/plain fragment identifier described in this memo
will work with URI references to text/plain MIME entities, but they
will fail to locate the sub-resource identified by the fragment
identifier. This is a reasonable fallback behavior, and in general,
users should take into account the possibility that a program
interpreting a given URI will fail to interpret the fragment
identifier part. Since fragment identifier evaluation is local to
the client (and happens after retrieving the MIME entity), there is
no reliable way for a server to determine whether a requesting client
is using a URI containing a fragment identifier.
<span class="h3"><a class="selflink" id="section-1.5" href="#section-1.5">1.5</a>. Notation Used in This Memo</span>
The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in <a href="./rfc2119">RFC</a>
<a href="./rfc2119">2119</a> [<a href="#ref-4" title=""Key words for use in RFCs to Indicate Requirement Levels"">4</a>].
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Fragment Identification Methods</span>
The identification of fragments of text/plain MIME entities can be
based on different foundations. Since it is not possible to insert
explicit, invisible identifiers into a text/plain MIME entity (for
example, as used in HTML documents, implemented through dedicated
attributes), fragment identification has to rely on certain inherent
properties of the MIME entity. This memo specifies fragment
identification using four different methods, which are character
positions and ranges, and line positions and ranges, augmented by an
integrity check mechanism for improving the robustness of fragment
identifiers.
<span class="grey">Wilde & Duerst Standards Track [Page 5]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
When interpreting character or line numbers, implementations MUST
take the character encoding of the MIME entity into account, because
character count and octet count may differ for the character encoding
being used. For example, a MIME entity using the UTF-16 encoding (as
specified in <a href="./rfc2781">RFC 2781</a> [<a href="#ref-11" title=""UTF-16, an encoding of ISO 10646"">11</a>]) uses two octets per character in most
cases, and sometimes four octets per character. It can also have a
leading BOM (Byte-Order Mark), which does not count as a character
and thus also affects the mapping from a simple octet count to a
character count.
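The following Python fragment is an informal illustration only, not
part of this specification. It counts characters (code points) rather
than octets, and it assumes that the charset label is a codec name
that Python's bytes.decode() accepts; line endings are not normalized
here (see Section 4.1). The function name count_characters is
hypothetical.

```python
def count_characters(body: bytes, charset: str = "us-ascii") -> int:
    """Count the characters (code points) of a text/plain body.

    Hypothetical helper: assumes `charset` is a codec name Python knows.
    Line endings are NOT normalized here; see Section 4.1.
    """
    text = body.decode(charset)  # the "utf-16" codec consumes a leading BOM
    if text.startswith("\ufeff"):
        text = text[1:]          # e.g. a UTF-8 BOM survives decoding; drop it
    return len(text)             # len() of a str counts code points


sample = "a\U0001F600b"          # 'a', one non-BMP character, 'b'
encoded = sample.encode("utf-16")
print(len(encoded), count_characters(encoded, "utf-16"))  # 10 3
```

For the UTF-16 sample, the 10 octets consist of a 2-octet BOM, 2
octets each for 'a' and 'b', and a 4-octet surrogate pair, yet only 3
characters are counted.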
<span class="h3"><a class="selflink" id="section-2.1" href="#section-2.1">2.1</a>. Fragment Identification Principles</span>
Fragment identification can be done by combining two orthogonal
principles, which are positions and ranges, and characters and lines.
This section describes the principles themselves, while <a href="#section-2.2">Section 2.2</a>
describes the combination of the principles.
<span class="h4"><a class="selflink" id="section-2.1.1" href="#section-2.1.1">2.1.1</a>. Positions and Ranges</span>
A position does not identify an actual fragment of the MIME entity,
but a position inside the MIME entity, which can be regarded as a
fragment of length zero. The use case for positions is to provide
pointers for applications that may use them to implement
functionalities such as "insert some text here", which needs a
position rather than a fragment. Positions are counted from zero,
with position zero being before the first character or line of a
text/plain MIME entity. Thus, a text/plain MIME entity having one
character has two positions: one before the first character (position
zero), and one after the first character (position 1).
Since positions are fragments of length zero, applications SHOULD use
methods other than highlighting to indicate positions, the most
obvious way being the positioning of a cursor (if the application
supports the concept of a cursor).
Ranges, on the other hand, identify fragments of a MIME entity that
have a length that may be greater than zero. As a general principle
for ranges, they specify both a lower and an upper bound. The start
or the end of a range specification may be omitted, defaulting to the
first or last position of the MIME entity, respectively. The end of
a range must have a value greater than or equal to the start. A
range with identical start and end is legal and identifies a range of
length zero, which is equivalent to a position.
Applications that support a concept such as highlighting SHOULD use
such a concept to indicate fragments of lengths greater than zero to
the user.
<span class="grey">Wilde & Duerst Standards Track [Page 6]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
For positions and ranges, it is implicitly assumed that if a number
is greater than the actual number of elements in the MIME entity,
then it is referring to the last element of the MIME entity (see
<a href="#section-4">Section 4</a> for details).
<span class="h4"><a class="selflink" id="section-2.1.2" href="#section-2.1.2">2.1.2</a>. Characters and Lines</span>
The concept of positions and ranges can be applied to characters or
lines. In both cases, positions indicate points between these
entities, while ranges identify zero or more of these entities by
indicating positions.
Character positions are numbered starting with zero (ignoring initial
BOM marks or similar concepts that are not part of the actual textual
content of a text/plain MIME entity), and counting each character
separately, with the exception of line endings, which are always
counted as one character (see <a href="#section-4.1">Section 4.1</a> for details).
Line positions are numbered starting with zero (with line position
zero always being identical with character position zero);
<a href="#section-4.1">Section 4.1</a> describes how line endings are identified. Fragments
identified by lines include the line endings, so applications
identifying line-based fragments MUST include the line endings in the
fragment identification they are using (e.g., the highlighted
selection). If a MIME entity does not contain any line endings, then
it consists of a single (the first) line.
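A minimal Python sketch of line identification follows, assuming that
CR+LF, CR, and LF are all accepted as line endings (Section 4.1
discusses which conventions apply in a given context). Each line keeps
its line ending, matching the rule that line-based fragments include
their line endings. Treating a final line ending as terminating the
last line, rather than starting an additional empty line, is an
assumption of this sketch; split_lines is a hypothetical name.

```python
import re
from typing import List

# A line is a run of non-line-break characters followed by its line ending
# (CR+LF, CR, or LF); a trailing run without a line ending is also a line.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def split_lines(text: str) -> List[str]:
    """Split decoded text into lines, keeping each line's ending."""
    lines = _LINE.findall(text)
    # Text without line endings is a single line; for empty text the
    # regex finds nothing, so return that one (empty) line explicitly.
    return lines or [text]


print(split_lines("alpha\r\nbeta\ngamma"))
# ['alpha\r\n', 'beta\n', 'gamma']
```

A regular expression is used instead of str.splitlines(), which also
splits on characters such as form feed and U+2028 that are not among
the line-ending conventions listed in Section 4.1.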
<span class="h3"><a class="selflink" id="section-2.2" href="#section-2.2">2.2</a>. Combining the Principles</span>
In the following sections, the principles described in the preceding
section (positions/ranges and characters/lines) are combined,
resulting in four use cases. The schemes mentioned below refer to
the fragment identifier syntax, described in detail in <a href="#section-3">Section 3</a>.
<span class="h4"><a class="selflink" id="section-2.2.1" href="#section-2.2.1">2.2.1</a>. Character Position</span>
To identify a character position (i.e., a fragment of length zero
between two characters), the 'char' scheme followed by a single
number is used. This method identifies a position between two
characters (or before the first or after the last character), rather
than identifying a fragment consisting of a number of characters.
Character position counting starts with zero, so the character
position before the first character of a text/plain MIME entity has
the character position zero, and a MIME entity containing n distinct
characters has n+1 distinct character positions, the last one having
the character position n.
<span class="grey">Wilde & Duerst Standards Track [Page 7]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
<span class="h4"><a class="selflink" id="section-2.2.2" href="#section-2.2.2">2.2.2</a>. Character Range</span>
To identify a fragment of one or more characters (a character range),
the 'char' scheme followed by a range specification is used. A
character range is a consecutive region of the MIME entity that
extends from the starting character position of the range to the
ending character position of the range.
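Character positions (Section 2.2.1) and character ranges (Section
2.2.2) map directly onto string indexing. The Python sketch below
assumes the entity has already been decoded and that every line
ending occupies exactly one character in the string (for example,
after replacing CR+LF with LF); char_position and char_range are
hypothetical names.

```python
from typing import Tuple


def char_position(text: str, pos: int) -> Tuple[str, str]:
    """Split text at a character position (a fragment of length zero)."""
    pos = min(pos, len(text))        # positions run from 0 to len(text)
    return text[:pos], text[pos:]


def char_range(text: str, start: int, end: int) -> str:
    """Return the characters from position `start` up to position `end`."""
    n = len(text)
    return text[min(start, n):min(end, n)]


text = "Hello, world"
print(char_position(text, 5))        # ('Hello', ', world')
print(char_range(text, 7, 12))       # world
```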
<span class="h4"><a class="selflink" id="section-2.2.3" href="#section-2.2.3">2.2.3</a>. Line Position</span>
To identify a line position (i.e., a fragment of length zero between
two lines), the 'line' scheme followed by a single number is used.
This method identifies a position between two lines (or before the
first or after the last line), rather than identifying a fragment
consisting of a number of lines. Line position counting starts with
zero, so the line position before the first line of a text/plain MIME
entity has the line position zero, and a MIME entity containing n
distinct lines has n+1 distinct line positions, the last one having
the line position n.
<span class="h4"><a class="selflink" id="section-2.2.4" href="#section-2.2.4">2.2.4</a>. Line Range</span>
To identify a fragment of one or more lines (a line range), the
'line' scheme followed by a range specification is used. A line
range is a consecutive region of the MIME entity that extends from
the starting line position of the range to the ending line position
of the range.
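The following Python sketch maps line positions onto character
positions, which also illustrates that line position zero coincides
with character position zero. It counts each line ending as one
character and assumes that CR+LF, CR, and LF may all occur; a line
position is the case where start equals end. The function names are
hypothetical.

```python
import re
from typing import Tuple

# Lines keep their endings: CR+LF, CR, or LF (assumed conventions).
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def char_length(line: str) -> int:
    """Length of a line, counting its line ending as a single character."""
    return len(line) - 1 if line.endswith("\r\n") else len(line)


def line_range_to_chars(text: str, start: int, end: int) -> Tuple[int, int]:
    """Map line positions start..end to character positions."""
    lines = _LINE.findall(text) or [text]
    start, end = min(start, len(lines)), min(end, len(lines))
    offsets = [0]                      # offsets[i] = char position of line i
    for line in lines:
        offsets.append(offsets[-1] + char_length(line))
    return offsets[start], offsets[end]


print(line_range_to_chars("ab\r\ncd\nef", 1, 2))   # (3, 6)
```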
<span class="h3"><a class="selflink" id="section-2.3" href="#section-2.3">2.3</a>. Fragment Identifier Robustness</span>
A modification of the referenced resource can easily break a
fragment identifier. If applications want to create
more robust fragment identifiers, they may do so by adding integrity-
check information to fragment identifiers. Such information is used
to detect changes in the resource. Applications can then warn users
about the possibility that a fragment identifier might have been
broken by a modification of the resource.
Fragment identifiers are interpreted by clients, and therefore
integrity-check information is defined on MIME entities rather than
on the resource itself. This means that the integrity-check
information is specific to a certain entity. Specifically, content
encodings and/or content transfer encodings must be removed before
using integrity-check information.
Integrity-check information may specify the character encoding that
has been used when creating the information, and if such a
specification is present, clients MUST check whether the character
<span class="grey">Wilde & Duerst Standards Track [Page 8]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-9" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
encoding specified and the character encoding of the retrieved MIME
entity are equal, and clients MUST NOT use the integrity check
information if these values differ. However, clients MAY choose to
transcode the retrieved MIME entity in the case of differing
character encodings, and after doing so, apply integrity checks.
Please note that this method is inherently unreliable because certain
characters or character sequences may have been lost or normalized
due to restrictions in one of the character encodings used.
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Fragment Identification Syntax</span>
The syntax for the text/plain fragment identifiers is
straightforward. The syntax defines four schemes: 'char' and 'line',
plus the two integrity-check schemes 'length' and 'md5'. The 'char'
and 'line' schemes can be used in two different variants: either the
position variant (with a single number), or the range variant (with
two comma-separated numbers, either of which, but not both, may be
omitted). An integrity check can use either the 'length' or the 'md5'
scheme to specify a value. 'length' in this case serves as a very
weak but easy-to-calculate integrity check.
The following syntax definition uses ABNF as defined in <a href="./rfc5234">RFC 5234</a> [<a href="#ref-9" title="&quot;Augmented BNF for Syntax Specifications: ABNF&quot;">9</a>],
including the rules DIGIT and HEXDIG. The mime-charset rule is
defined in <a href="./rfc2978">RFC 2978</a> [<a href="#ref-5" title="&quot;IANA Charset Registration Procedures&quot;">5</a>].
NOTE: In the descriptions that follow, specified text values MUST be
used exactly as given, using exactly the indicated lower-case
letters. In this respect, the ABNF usage differs from [<a href="#ref-9" title="&quot;Augmented BNF for Syntax Specifications: ABNF&quot;">9</a>].
   text-fragment   =  text-scheme 0*( ";" integrity-check )
   text-scheme     =  ( char-scheme / line-scheme )
   char-scheme     =  "char=" ( position / range )
   line-scheme     =  "line=" ( position / range )
   integrity-check =  ( length-scheme / md5-scheme )
                        [ "," mime-charset ]
   position        =  number
   range           =  ( position "," [ position ] ) / ( "," position )
   number          =  1*( DIGIT )
   length-scheme   =  "length=" number
   md5-scheme      =  "md5=" md5-value
   md5-value       =  32HEXDIG
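As an informal aid to implementers, the ABNF above can be transcribed
into regular expressions. The Python sketch below is hypothetical
(parse_text_fragment is not defined by this memo). Unrecognized
integrity checks of the form name=value, with a name other than
'length' or 'md5', are skipped, anticipating the rule in Section 3.1;
rejecting everything else with ValueError is an assumption of the
sketch, and the normative handling of syntax errors is described in
Section 4.4.

```python
import re
from typing import List, Optional, Tuple

# Patterns transcribed from the ABNF above (ASCII digits only, as DIGIT
# requires). The mime-charset character set follows RFC 2978.
_SCHEME = re.compile(r"(char|line)=(?:([0-9]+)|([0-9]+),([0-9]*)|,([0-9]+))")
_CHECK = re.compile(
    r"(?:length=([0-9]+)|md5=([0-9A-Fa-f]{32}))"
    r"(?:,([A-Za-z0-9!#$%&'+^_`{}~-]+))?")

Check = Tuple[str, object, Optional[str]]  # (kind, value, charset or None)


def parse_text_fragment(
        frag: str) -> Tuple[str, Optional[int], Optional[int], List[Check]]:
    """Parse a text/plain fragment identifier such as 'line=10,20;length=9'.

    Returns (scheme, start, end, checks). A position yields start == end;
    an omitted range bound is None. Raises ValueError on syntax errors.
    """
    head, *rest = frag.split(";")
    m = _SCHEME.fullmatch(head)
    if m is None:
        raise ValueError("invalid text-scheme: " + head)
    scheme, pos, r_start, r_end, r_end_only = m.groups()
    if pos is not None:
        start = end = int(pos)
    elif r_start is not None:
        start, end = int(r_start), (int(r_end) if r_end else None)
    else:
        start, end = None, int(r_end_only)
    checks: List[Check] = []
    for part in rest:
        c = _CHECK.fullmatch(part)
        if c is None:
            name, sep, _ = part.partition("=")
            if not sep or name in ("length", "md5"):
                raise ValueError("invalid integrity-check: " + part)
            continue  # other check names are ignored (see Section 3.1)
        length, md5, charset = c.groups()
        if length is not None:
            checks.append(("length", int(length), charset))
        else:
            checks.append(("md5", md5.lower(), charset))
    return scheme, start, end, checks


print(parse_text_fragment("line=10,20;length=9876,UTF-8"))
# ('line', 10, 20, [('length', 9876, 'UTF-8')])
```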
<span class="h3"><a class="selflink" id="section-3.1" href="#section-3.1">3.1</a>. Integrity Checks</span>
An integrity check can either specify a MIME entity's length, or its
MD5 fingerprint. In both cases, it can optionally specify the
character encoding that has been used when calculating the integrity
<span class="grey">Wilde & Duerst Standards Track [Page 9]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-10" ></span>
<span class="grey"><a href="./rfc5147">RFC 5147</a> text/plain Fragment Identifiers April 2008</span>
check, so that clients interpreting the fragment identifier may check
whether they are using the same character encoding for their
calculations. For lengths, the character encoding can be necessary
because it can influence the character count. As an example, Unicode
includes precomposed characters for writing Vietnamese, but in the
windows-1258 encoding, also used for writing Vietnamese, some
characters have to be encoded with separate diacritics, which means
that two characters will be counted. Applying Unicode terminology,
this means that the length of a text/plain MIME entity is computed
based on its "code points". For MD5 fingerprints, the character
encoding is necessary because the MD5 algorithm works on the binary
representation of the text/plain resource.
To allow future changes to this specification to address developments
in cryptography, implementations MUST ignore new types of integrity
checks, with names other than 'length' and 'md5'. If several
integrity checks are present, an application can use whatever
integrity checks it understands, and among these, those integrity
checks that provide an appropriate trade-off between performance and
the need for integrity checking. Please see <a href="#section-4.3">Section 4.3</a> for further
details.
The length of a text/plain MIME entity is calculated by using the
principles defined in <a href="#section-2.1.2">Section 2.1.2</a>. The MD5 fingerprint of a text/
plain MIME entity is calculated by using the algorithm presented in
[<a href="#ref-1" title=""The MD5 Message-Digest Algorithm"">1</a>], encoding the result in 32 hexadecimal digits (using uppercase or
lowercase letters) as a representation of the 128 bits that are the
result of the MD5 algorithm. Calculation of integrity checks is done
after stripping any potential content-encodings or content-transfer-
encodings of the transport mechanism.
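The following minimal Python sketch computes both integrity values for a retrieved entity. It assumes the caller has already removed any content-coding (gzip appears below only as an example) and knows the entity's charset. The function name integrity_values is hypothetical. Counting each CR+LF, CR, or LF as one character follows Section 4.1. Computing MD5 over the decoded transfer body bytes, without normalizing line endings, is our own assumption.

```python
import gzip
import hashlib
import re

def integrity_values(body: bytes, charset: str) -> tuple:
    """Return (length, md5_hex) for an entity body.

    `body` must already be free of content-encodings and
    content-transfer-encodings; `charset` is the entity's encoding.
    """
    text = body.decode(charset)
    # Each CR+LF, CR, or LF line ending counts as a single character.
    length = len(re.sub(r"\r\n|\r|\n", "\n", text))
    # The 128-bit digest as 32 hex digits. hexdigest() emits lowercase;
    # uppercase is equally valid, so compare case-insensitively.
    md5_hex = hashlib.md5(body).hexdigest()
    return length, md5_hex

# Strip the (example) gzip content-coding first, then compute the checks.
wire = gzip.compress(b"line one\r\nline two\r\n")
length, digest = integrity_values(gzip.decompress(wire), "utf-8")
print(length)  # 18
```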
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. Fragment Identifier Processing</span>
Applications implementing support for the mechanism described in this
memo MUST behave as described in the following sections.
<span class="h3"><a class="selflink" id="section-4.1" href="#section-4.1">4.1</a>. Handling of Line Endings in text/plain MIME Entities</span>
In Internet messages, line endings in text/plain MIME entities are
represented by CR+LF character sequences (see <a href="./rfc2046">RFC 2046</a> [<a href="#ref-3" title="&quot;Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types&quot;">3</a>] and <a href="./rfc3676">RFC</a>
<a href="./rfc3676">3676</a> [<a href="#ref-6" title="&quot;The Text/Plain Format and DelSp Parameters&quot;">6</a>]). However, some protocols (such as HTTP) additionally allow
other conventions for line endings. Also, some operating systems
store text/plain entities locally with different line endings (in
most cases, Unix uses LF, MacOS traditionally uses CR, and Windows
uses CR+LF).
Independent of the number of bytes or characters used to represent a
line ending, each line ending MUST be counted as one single
character.
<span class="grey">Wilde & Duerst Standards Track [Page 10]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-11" ></span>
Implementations interpreting text/plain fragment identifiers MUST
take into account the line ending conventions of the protocols and
other contexts that they work in.
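The counting rule can be sketched in Python as a character counter whose accepted line-ending sequences are chosen by the caller. This reflects the fact that the conventions depend on the protocol or storage context. The function name and its default set (CR+LF, CR, LF) are our own assumptions, not requirements of this memo.

```python
import re

def char_count(text: str, line_endings=("\r\n", "\r", "\n")) -> int:
    """Count characters, treating each accepted line ending as one."""
    # Try longer sequences first so CR+LF is not split into CR and LF.
    ordered = sorted(line_endings, key=len, reverse=True)
    pattern = "|".join(re.escape(ending) for ending in ordered)
    return len(re.sub(pattern, "\n", text))

print(char_count("ab\r\ncd"))                         # 5
print(char_count("ab\r\ncd", line_endings=("\n",)))   # 6
```

The second call shows what happens when a context does not recognize CR+LF. The CR is then counted as an ordinary character.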
As an example, an implementation working in the context of a Web
browser supporting http: URIs has to support the various line ending
conventions permitted by HTTP. As another example, an implementation
used on local files (e.g., with the file: URI scheme) has to support
the conventions used for local storage. All implementations SHOULD
support the Internet-wide CR+LF line ending convention, and MAY
support additional conventions not related to the protocols or
systems they work with.
Implementers should be aware that line endings in plain text
entities can be represented by characters or character sequences
other than CR+LF. Besides the above-mentioned CR and LF, there are
also NEL and CR+NEL. In general, the encoding of line endings can
also depend on the character encoding of the MIME entity, and
implementations have to take this into account where necessary.
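For the line scheme, an entity can be split into lines with a single regular expression that also recognizes NEL (U+0085) and CR+NEL. This Python sketch assumes the entity has already been decoded to a str, so the byte-level encoding of NEL no longer matters. It also assumes that a final line ending terminates the last line rather than starting an empty one. Both the helper name and that last choice are our own assumptions.

```python
import re

# Two-character sequences come first so CR+LF and CR+NEL are not split.
LINE_END = re.compile("\r\n|\r\u0085|\r|\n|\u0085")

def split_lines(text: str) -> list:
    """Split decoded text into lines, dropping the line-ending sequences."""
    lines = LINE_END.split(text)
    # Assumption: a trailing line ending does not start an empty extra line.
    if lines[-1] == "":
        lines.pop()
    return lines

print(split_lines("a\r\nb\u0085c\r\u0085d\n"))  # ['a', 'b', 'c', 'd']
```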
<span class="h3"><a class="selflink" id="section-4.2" href="#section-4.2">4.2</a>. Handling of Position Values</span>
If any position value (as a position or as part of a range) is
greater than the length of the actual MIME entity, then it identifies
the last character position or line position of the MIME entity. If
the first position value in a range is not present, then the range
extends from the start of the MIME entity. If the second position
value in a range is not present, then the range extends to the end of
the MIME entity. If a range scheme's positions are not properly
ordered (i.e., the first number is greater than the second), then the
fragment identifier MUST be ignored.
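The four rules above can be combined into one small Python function. This is a sketch with the following conventions:

- Positions are plain ints, counted in whatever unit the scheme uses (characters or lines).
- None stands for an omitted position.
- A return value of None means the fragment identifier is ignored.

Treating equal positions as properly ordered is our own reading; the text does not state it.

```python
from typing import Optional, Tuple

def resolve_range(first: Optional[int], second: Optional[int],
                  size: int) -> Optional[Tuple[int, int]]:
    """Resolve a char= or line= range against an entity of `size` units."""
    # Misordered ranges cause the whole fragment identifier to be ignored.
    if first is not None and second is not None and first > second:
        return None
    # A missing first position means "from the start", a missing second
    # position means "to the end", and values past the end are clamped.
    start = 0 if first is None else min(first, size)
    end = size if second is None else min(second, size)
    return start, end

print(resolve_range(10, 20, 100))   # (10, 20)
print(resolve_range(None, 1, 100))  # (0, 1)
print(resolve_range(90, None, 50))  # (50, 50)
print(resolve_range(20, 10, 100))   # None
```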
<span class="h3"><a class="selflink" id="section-4.3" href="#section-4.3">4.3</a>. Handling of Integrity Checks</span>
Clients are not required to implement the handling of integrity
checks, so they MAY choose to ignore integrity check information
altogether. However, if they do implement integrity checking, the
following applies:
If a fragment identifier contains one or more integrity checks, and a
client retrieves a MIME entity and, using some integrity check(s),
detects that the entity has changed (observing the character encoding
specification as described in <a href="#section-3.1">Section 3.1</a>, if present), then the
client SHOULD NOT interpret the text/plain fragment identifier. A
client MAY signal this situation to the user.
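For a client that does implement integrity checking, the decision can be sketched as below. The tuple layout and the function name are hypothetical. The sketch compares charsets case-insensitively but does not resolve charset aliases. When a check names a different charset from the entity's, the sketch skips it rather than transcoding; a fuller implementation might transcode locally instead.

```python
def should_interpret(checks, length: int, md5_hex: str, charset: str) -> bool:
    """Return False if any understood integrity check detects a change.

    `checks` holds (name, value, charset-or-None) tuples from the fragment;
    `length`, `md5_hex`, and `charset` describe the retrieved entity.
    """
    for name, value, check_charset in checks:
        # Skip checks stated for a different encoding (no transcoding here).
        if check_charset is not None and check_charset.lower() != charset.lower():
            continue
        if name == "length" and int(value) != length:
            return False
        if name == "md5" and value.lower() != md5_hex.lower():
            return False
        # Unknown check names fall through and are ignored.
    return True

entity = (42, "0123456789abcdef0123456789abcdef", "utf-8")
print(should_interpret([("length", "42", "UTF-8")], *entity))  # True
print(should_interpret([("length", "41", None)], *entity))     # False
```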
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-12" ></span>
<span class="h3"><a class="selflink" id="section-4.4" href="#section-4.4">4.4</a>. Syntax Errors in Fragment Identifiers</span>
If a fragment identifier contains a syntax error (i.e., does not
conform to the syntax specified in <a href="#section-3">Section 3</a>), then it MUST be
ignored by clients. Clients MUST NOT make any attempt to correct or
guess fragment identifiers. Syntax errors MAY be reported by
clients.
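A strict parser that rejects, rather than repairs, malformed input can be sketched with one regular expression. The pattern below approximates the Section 3 grammar as we read it. It accepts:

- a char= or line= scheme, with either a position or a range;
- optional ";length=" or ";md5=" integrity checks, each followed by an optional ",charset".

The pattern accepts only those two check names. How to treat unknown names that future updates might add (see the rule on ignoring them in Section 3.1) is left open here. Returning a single position as first == second is a representational choice of this sketch, and the function name is hypothetical.

```python
import re

# Approximation of the Section 3 grammar (our reading, not a transcription).
_POS = r"(?:\d+|\d+,\d*|,\d+)"
_CHARSET = r"[A-Za-z0-9!#$%\x26'+\-^_`{}~]+"
_CHECK = rf"(?:length=\d+|md5=[0-9A-Fa-f]{{32}})(?:,{_CHARSET})?"
_FRAGMENT = re.compile(rf"(char|line)=({_POS})((?:;{_CHECK})*)")

def parse_fragment(fragment: str):
    """Return (scheme, first, second, checks_text), or None on a syntax error."""
    m = _FRAGMENT.fullmatch(fragment)
    if m is None:
        return None  # never try to correct or guess a broken identifier
    scheme, pos, checks = m.groups()
    if "," in pos:
        a, b = pos.split(",")
        first = int(a) if a else None
        second = int(b) if b else None
    else:
        first = second = int(pos)
    return scheme, first, second, checks

print(parse_fragment("line=10,20;length=9876,UTF-8"))
print(parse_fragment("line=10-20"))  # None
```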
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Examples</span>
The following examples show some usages for the fragment identifiers
defined in this memo.
http://example.com/text.txt#char=100
This URI identifies the position after the 100th character of the
text.txt MIME entity. It should be noted that it is not clear which
octet(s) of the MIME entity this will be without retrieving the MIME
entity and thus knowing which character encoding it is using (in the
case of HTTP, this information will be given in the Content-Type header of
the response). If the MIME entity has fewer than 100 characters, the
URI identifies the position after the MIME entity's last character.
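The first example can be traced with the standard library's urllib.parse.urlsplit, which separates the fragment from the rest of the URI. This Python sketch handles only the single-position form char=N. It assumes the body has already been decoded using the charset from the Content-Type header, and the helper name is hypothetical.

```python
import re
from urllib.parse import urlsplit

def char_position(uri: str, text: str):
    """Resolve a char=N fragment to an offset into the decoded text."""
    m = re.fullmatch(r"char=([0-9]+)", urlsplit(uri).fragment)
    if m is None:
        return None  # not a single char= position; out of scope here
    # A position past the end identifies the position after the last character.
    return min(int(m.group(1)), len(text))

uri = "http://example.com/text.txt#char=100"
print(char_position(uri, "x" * 250))  # 100
print(char_position(uri, "x" * 40))   # 40
```

Given the result pos, text[:pos] is everything before the identified position.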
http://example.com/text.txt#line=10,20
This URI identifies lines 11 to 20 of the text.txt MIME entity. If
the MIME entity has fewer than 11 lines, it identifies the position
after the last line. If the MIME entity has fewer than 20 but at
least 11 lines, it identifies the range from line 11 to the last line
of the MIME entity.
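Line positions count the boundaries between lines, with position 0 before the first line, so the range 10,20 covers the eleventh through twentieth lines. The following Python sketch maps that range to 1-based line numbers and applies the clipping described above. The helper name is hypothetical.

```python
def lines_for_range(first: int, second: int, line_count: int) -> list:
    """1-based numbers of the lines covered by line=first,second."""
    # Position p sits just after line p, so the range spans lines
    # first+1 .. second, clipped to the number of lines available.
    start = min(first, line_count)
    end = min(second, line_count)
    return list(range(start + 1, end + 1))

covered = lines_for_range(10, 20, 100)
print(covered[0], covered[-1])      # 11 20
print(lines_for_range(10, 20, 15))  # [11, 12, 13, 14, 15]
print(lines_for_range(10, 20, 5))   # [] (only the position after the last line)
```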
https://example.com/text.txt#line=,1
This URI identifies the first line. Please note that the URI scheme
has been changed to https.
ftp://example.com/text.txt#line=10,20;length=9876,UTF-8
As in the second example, this URI identifies lines 11 to 20 of the
text.txt MIME entity. The additional length integrity check
specifies that the MIME entity has a length of 9876 characters when
encoded in UTF-8. If the client supports the length scheme, it may
test the retrieved MIME entity for its length, but only if the
retrieved MIME entity uses the UTF-8 encoding or has been locally
transcoded into this encoding.
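The length=9876,UTF-8 check does not require the entity to be re-encoded. Once the body is decoded to a Python str, counting code points gives the same count the entity would have after transcoding to UTF-8. This sketch relies on that property, which holds for UTF-8 but not necessarily for every target encoding (compare the windows-1258 example earlier in this memo). The helper name is hypothetical.

```python
import re

def utf8_length_matches(body: bytes, entity_charset: str, expected: int) -> bool:
    """Check a ';length=N,UTF-8' integrity check against a retrieved entity."""
    # Decoding yields code points; transcoding them to UTF-8 would not
    # change their number, so no explicit re-encoding is needed.
    text = body.decode(entity_charset)
    # Each line ending still counts as a single character.
    text = re.sub(r"\r\n|\r|\n", "\n", text)
    return len(text) == expected

body = "caf\u00e9\r\n".encode("iso-8859-1")  # an ISO-8859-1 entity
print(utf8_length_matches(body, "iso-8859-1", 5))  # True
```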
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-13" ></span>
Please note that the FTP protocol, like some other protocols
underlying other URI schemes, does not provide explicit
information about the media type of the resource being retrieved.
Using fragment identifiers with such URI schemes is therefore
inherently unreliable. Current user agents use various heuristics to
infer some media type for further processing. Processing of the
fragment identifier according to this memo is only appropriate if the
inferred media type is text/plain.
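The text leaves media type inference to each user agent, so a sketch can only show the final gate: interpret the fragment identifier only when the explicit or inferred media type is text/plain. The helper name is hypothetical. None stands for "no media type could be determined", and treating that case as "do not interpret" is our own assumption.

```python
from typing import Optional

def is_text_plain(media_type: Optional[str]) -> bool:
    """True if an explicit or inferred Content-Type value is text/plain."""
    if not media_type:
        return False  # nothing known or inferred: do not interpret
    # Media type names are case-insensitive; parameters such as
    # charset follow the first ";" and do not matter here.
    return media_type.split(";", 1)[0].strip().lower() == "text/plain"

print(is_text_plain("Text/Plain; charset=UTF-8"))  # True
print(is_text_plain("text/html"))                  # False
print(is_text_plain(None))                         # False
```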
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. IANA Considerations</span>
IANA has added a reference to this specification in the text/plain
Media Type registration.
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Security Considerations</span>
The fact that software implementing fragment identifiers for plain
text and software not implementing them differ in behavior, and the
fact that different software may show documents or fragments to users
in different ways, can lead to misunderstandings on the part of
users. Such misunderstandings might be exploited in a way similar to
spoofing or phishing.
In particular, care has to be taken if fragment identifiers are used
together with a mechanism that allows showing only the part of a
document identified by a fragment. One scenario may be the use of a
fragment identifier to hide small-print legal text. Another scenario
may be the inclusion of site-key-like material, which may give the
user the impression of using the real site rather than a fake site;
other scenarios may also be possible. Possible countermeasures may
include but are not limited to displaying the included content within
clearly visible boundaries and limiting inclusion to material from
the same security realm or from realms that give explicit permission
to be included in another realm.
Please note that the above issues all apply to the client side;
fragment identifiers are not used when resolving a URI to retrieve
the representation of a resource, but are only applied on the client
side.
Implementers and users of fragment identifiers for plain text should
also be aware of the security considerations in <a href="./rfc3986">RFC 3986</a> [<a href="#ref-7" title="&quot;Uniform Resource Identifier (URI): Generic Syntax&quot;">7</a>] and <a href="./rfc3987">RFC</a>
<a href="./rfc3987">3987</a> [<a href="#ref-8" title="&quot;Internationalized Resource Identifiers (IRI)&quot;">8</a>].
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-14" ></span>
<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. References</span>
<span class="h3"><a class="selflink" id="section-8.1" href="#section-8.1">8.1</a>. Normative References</span>
[<a id="ref-1">1</a>] Rivest, R., "The MD5 Message-Digest Algorithm", <a href="./rfc1321">RFC 1321</a>,
April 1992.
[<a id="ref-2">2</a>] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
<a href="./rfc2045">RFC 2045</a>, November 1996.
[<a id="ref-3">3</a>] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", <a href="./rfc2046">RFC 2046</a>,
November 1996.
[<a id="ref-4">4</a>] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
[<a id="ref-5">5</a>] Freed, N. and J. Postel, "IANA Charset Registration
Procedures", <a href="https://www.rfc-editor.org/bcp/bcp19">BCP 19</a>, <a href="./rfc2978">RFC 2978</a>, October 2000.
[<a id="ref-6">6</a>] Gellens, R., "The Text/Plain Format and DelSp Parameters",
<a href="./rfc3676">RFC 3676</a>, February 2004.
[<a id="ref-7">7</a>] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, <a href="./rfc3986">RFC 3986</a>,
January 2005.
[<a id="ref-8">8</a>] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRI)", <a href="./rfc3987">RFC 3987</a>, January 2005.
[<a id="ref-9">9</a>] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, <a href="./rfc5234">RFC 5234</a>, January 2008.
<span class="h3"><a class="selflink" id="section-8.2" href="#section-8.2">8.2</a>. Informative References</span>
[<a id="ref-10">10</a>] Connolly, D. and L. Masinter, "The 'text/html' Media Type",
<a href="./rfc2854">RFC 2854</a>, June 2000.
[<a id="ref-11">11</a>] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
<a href="./rfc2781">RFC 2781</a>, February 2000.
[<a id="ref-12">12</a>] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
STD 63, <a href="./rfc3629">RFC 3629</a>, November 2003.
[<a id="ref-13">13</a>] ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
Standard Code for Information Interchange", 1986.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-15" ></span>
[<a id="ref-14">14</a>] DeRose, S., Maler, E., and D. Orchard, "XML Linking Language
(XLink) Version 1.0", World Wide Web Consortium Recommendation,
June 2001, &lt;<a href="http://www.w3.org/TR/xlink/">http://www.w3.org/TR/xlink/</a>&gt;.
[<a id="ref-15">15</a>] Freed, N. and J. Klensin, "Media Type Specifications and
Registration Procedures", <a href="https://www.rfc-editor.org/bcp/bcp13">BCP 13</a>, <a href="./rfc4288">RFC 4288</a>, December 2005.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-16" ></span>
<span class="h2"><a class="selflink" id="appendix-A" href="#appendix-A">Appendix A</a>. Acknowledgements</span>
Thanks for comments and suggestions provided by Marcel Baschnagel,
Stephane Bortzmeyer, Tim Bray, Iain Calder, John Cowan, Spencer
Dawkins, Lisa Dusseault, Benja Fallenstein, Ted Hardie, Sam Hartman,
Sandro Hawke, Jeffrey Hutzelman, Cullen Jennings, Graham Klyne, Dan
Kohn, Henrik Levkowetz, Chris Newman, Mark Nottingham, Conrad Parker,
and Tim Polk.
Authors' Addresses
Erik Wilde
UC Berkeley
School of Information, 311 South Hall
Berkeley, CA 94720-4600
U.S.A.
Phone: +1-510-6432253
EMail: dret@berkeley.edu
URI: <a href="http://dret.net/netdret/">http://dret.net/netdret/</a>
Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
possible, for example as "D&#252;rst" in XML and HTML.)
Aoyama Gakuin University
5-10-1 Fuchinobe
Sagamihara, Kanagawa 229-8558
Japan
Phone: +81 42 759 6329
Fax: +81 42 759 6495
EMail: duerst@it.aoyama.ac.jp
URI: <a href="http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/">http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/</a>
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-17" ></span>
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a>, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and <a href="https://www.rfc-editor.org/bcp/bcp79">BCP 79</a>.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
<a href="http://www.ietf.org/ipr">http://www.ietf.org/ipr</a>.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
</pre>