1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445
|
<pre>Internet Engineering Task Force (IETF) M. Ohye
Request for Comments: 6596 J. Kupke
Category: Informational April 2012
ISSN: 2070-1721
<span class="h1">The Canonical Link Relation</span>
Abstract
<a href="./rfc5988">RFC 5988</a> specifies a way to define relationships between links on the
web. This document describes a new type of such a relationship,
"canonical", to designate an Internationalized Resource Identifier
(IRI) as preferred over resources with duplicative content.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet
Standard; see <a href="./rfc5741#section-2">Section 2 of RFC 5741</a>.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
<a href="http://www.rfc-editor.org/info/rfc6596">http://www.rfc-editor.org/info/rfc6596</a>.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and the IETF Trust's Legal
Provisions Relating to IETF Documents
(<a href="http://trustee.ietf.org/license-info">http://trustee.ietf.org/license-info</a>) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
<span class="grey">Ohye & Kupke Informational [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
The canonical link relation specifies the preferred IRI from
resources with duplicative content. Common implementations of the
canonical link relation are to specify the preferred version of an
IRI from duplicate pages created with the addition of IRI parameters
(e.g., session IDs) or to specify the single-page version as
preferred over the same content separated on multiple component
pages.
In regard to the link relation type, "canonical" can be described
informally as the author's preferred version of a resource. More
formally, the canonical link relation specifies the preferred IRI
from a set of resources that return the context IRI's content in
duplicated form. Once specified, applications such as search engines
can focus processing on the canonical, and references to the context
(referring) IRI can be updated to reference the target (canonical)
IRI.
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Notational Conventions</span>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [<a href="./rfc2119" title=""Key words for use in RFCs to Indicate Requirement Levels"">RFC2119</a>].
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. The Canonical Link Relation</span>
The target (canonical) IRI MUST identify content that is either
duplicative or a superset of the content at the context (referring)
IRI. Authors who declare the canonical link relation ought to
anticipate that applications such as search engines can:
o Index content only from the target IRI (i.e., content from the
context IRIs will be likely disregarded as duplicative).
o Consolidate IRI properties, such as link popularity, to the target
IRI.
o Display the target IRI as the representative IRI.
The target (canonical) IRI MAY:
o Specify a relative IRI (see <a href="./rfc3986#section-4.2">[RFC3986], Section 4.2</a>).
o Be self-referential (context IRI identical to target IRI).
o Exist on a different hostname or domain.
<span class="grey">Ohye & Kupke Informational [Page 2]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
o Have different scheme names, such as "http" to "https" or "gopher"
to "ftp".
o Be a superset of the content at the context IRI.
* As an example, each component page (e.g., page-1.html, page-
2.html) of a multi-page article MAY specify the "view-all"
version (e.g., page-all.html), the superset of their content,
as the target IRI. This is because the content from each
component page is contained within the view-all version. Given
this implementation, applications can mark page-1.html and
page-2.html as duplicates of page-all.html, process content
only from page-all.html, and disregard the component pages.
All references can then be made to the view-all version (page-
all.html, the target IRI), and no content will have been lost
in this process.
* Using the same example above, page-2.html SHOULD NOT designate
page-1.html as the target (canonical) IRI because this may
cause a loss of data. When page-2.html designates page-1.html
as the canonical, only content from the target IRI, page-
1.html, will be processed. page-2.html may be marked as a
duplicate of page-1.html and its content disregarded.
o Be the source IRI of a temporary redirect. For HTTP, this refers
to status codes 302, 303, or 307 (Sections <a href="#section-10.3.3">10.3.3</a>, <a href="#section-10.3.4">10.3.4</a>, and
10.3.8, respectively, of [<a href="./rfc2616" title=""Hypertext Transfer Protocol -- HTTP/1.1"">RFC2616</a>]).
To better ensure that applications properly handle the canonical link
relation, administrators ought to consider the following guidelines:
o Specify only one canonical link relation for a resource. (It
would be confusing to consider/label/designate more than one IRI
as authoritative.)
o Avoid designating the target (canonical) as:
* The source IRI of a permanent redirect (for HTTP, this refers
to 300 and 301 response codes, defined in Sections <a href="#section-10.3.1">10.3.1</a> and
10.3.2 of [<a href="./rfc2616" title=""Hypertext Transfer Protocol -- HTTP/1.1"">RFC2616</a>]).
* An IRI that also specifies a canonical link relation to an IRI
other than itself.
* An IRI that returns an error code, such as a 4xx response in
HTTP (<a href="./rfc2616#section-10.4">Section 10.4 of [RFC2616]</a>).
<span class="grey">Ohye & Kupke Informational [Page 3]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
* The first page of a multi-page article or multi-page listing of
items (since the first page is not duplicative or a superset of
the context IRI). For example, page-2.html and page-3.html of
an article SHOULD NOT specify page-1.html as the canonical.
This may cause a loss of data from page-2.html and page-3.html
as they will be marked duplicative of page-1.html with only
content from page-1.html being processed.
When the canonical link relation is declared improperly, such as
creating chained canonicals (i.e., target IRI specifies the source
IRI of a permanent redirect) or designating a target IRI that returns
a 4xx response, applications can use their own heuristics when
processing the resource. For instance, an application can choose to
ignore any improper canonical designation and continue to process the
remaining content on a page.
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. Examples</span>
The following example illustrates:
o Three IRIs that serve duplicate content.
o One IRI that is the canonical or "preferred version".
o Two IRIs with additional query parameters, making them the non-
preferred version of the content (duplicates). The canonical link
relation is therefore specified on these duplicates.
If the preferred version of a IRI and its content exists at:
http://www.example.com/page.php?item=purse
Then duplicate content IRIs such as:
http://www.example.com/page.php?item=purse&category=bags
http://www.example.com/page.php?item=purse&category=bags&sid=1234
may designate the canonical link relation in HTML as specified in
[<a href="#ref-REC-html401-19991224">REC-html401-19991224</a>]:
<link rel="canonical"
href="http://www.example.com/page.php?item=purse">
or as a relative IRI:
<link rel="canonical" href="page.php?item=purse">
<span class="grey">Ohye & Kupke Informational [Page 4]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
or alternatively, in the HTTP header field as specified in <a href="./rfc5988#section-5">Section 5
of [RFC5988]</a>:
Link: <http://www.example.com/page.php?item=purse>; rel="canonical"
This signals to applications, such as search engines, that these are
duplicates of the target (canonical) IRI:
http://www.example.com/page.php?item=purse.
Applications may then select the canonical value as the display IRI
(such as in search results), and additional IRI properties such as
indexing and ranking signals can be transferred as well.
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Recommendations</span>
Before adding the canonical link relation, verification of the
following is RECOMMENDED:
1. The content of the context IRI is duplicated within the content
of the target (canonical) IRI.
2. For HTTP, permanent HTTP redirects (<a href="./rfc2616#section-10.3.2">Section 10.3.2 of [RFC2616]</a>),
the traditional strong indicator that a IRI's content has been
permanently moved, could not be implemented in place of the
canonical link relation.
3. In the case where the target (canonical) IRI is a superset of
content from the context IRI (i.e., the case where page-1.html
and page-2.html designate page-all.html as the canonical), that
the user experience is strongly taken into consideration, both in
regard to possible increased load time and potential complexity
in navigation.
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. IANA Considerations</span>
IANA has registered the Canonical Link Relation below as per
[<a href="./rfc5988" title=""Web Linking"">RFC5988</a>].
Relation Name:
canonical
Description:
Designates the preferred version of a resource (the IRI and its
contents).
<span class="grey">Ohye & Kupke Informational [Page 5]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
Reference:
This specification.
Notes:
None.
Application Data:
None.
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Security Considerations</span>
When a site is compromised, the canonical link relation can be
implemented with malicious intent to designate the attacker's IRI as
the preferred version of the content. While this technique is
largely unnoticeable to humans, automated programs may cluster the
compromised resource as duplicative of the attacker's target IRI,
transferring properties such as link popularity away from the
compromised resource to the attacker's designated canonical.
(Naturally, even a site that is not compromised could provide
inaccurate or misleading information about which URI is canonical.)
<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. Internationalization Considerations</span>
Internationalization considerations for link relations are provided
in <a href="./rfc5988#section-8">Section 8 of [RFC5988]</a>.
<span class="h2"><a class="selflink" id="section-9" href="#section-9">9</a>. Normative References</span>
[<a id="ref-REC-html401-19991224">REC-html401-19991224</a>]
Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01
Specification", W3C Recommendation REC-html401-19991224,
December 1999,
<<a href="http://www.w3.org/TR/1999/REC-html401-19991224">http://www.w3.org/TR/1999/REC-html401-19991224</a>>.
Latest version available at
<<a href="http://www.w3.org/TR/html401">http://www.w3.org/TR/html401</a>>.
[<a id="ref-RFC2119">RFC2119</a>] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
[<a id="ref-RFC2616">RFC2616</a>] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", <a href="./rfc2616">RFC 2616</a>, June 1999.
<span class="grey">Ohye & Kupke Informational [Page 6]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
[<a id="ref-RFC3986">RFC3986</a>] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
<a href="./rfc3986">RFC 3986</a>, January 2005.
[<a id="ref-RFC5988">RFC5988</a>] Nottingham, M., "Web Linking", <a href="./rfc5988">RFC 5988</a>, October 2010.
<span class="grey">Ohye & Kupke Informational [Page 7]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
<span class="grey"><a href="./rfc6596">RFC 6596</a> The Canonical Link Relation April 2012</span>
<span class="h2"><a class="selflink" id="appendix-A" href="#appendix-A">Appendix A</a>. Implementations</span>
Automated programs that implement functionality with regard for the
canonical link relation include:
o Google, canonical link relation HTML and HTTP header support,
within the same domain and across domains:
* <<a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">http://googlewebmastercentral.blogspot.com/2009/02/</a>
<a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">specify-your-canonical.html</a>>
* <<a href="http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html">http://googlewebmastercentral.blogspot.com/2011/06/</a>
<a href="http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html">supporting-relcanonical-http-headers.html</a>>
* <<a href="http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html">http://googlewebmastercentral.blogspot.com/2009/12/</a>
<a href="http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html">handling-legitimate-cross-domain.html</a>>
o Yahoo, canonical link relation HTML support within the same
domain:
* <<a href="http://www.ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/">http://www.ysearchblog.com/2009/02/12/</a>
<a href="http://www.ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/">fighting-duplication-adding-more-arrows-to-your-quiver/</a>>
o Bing, canonical link relation HTML support within the same domain:
* <<a href="http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx">http://www.bing.com/community/site_blogs/b/webmaster/archive/</a>
<a href="http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx">2009/02/12/</a>
<a href="http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx">partnering-to-help-solve-duplicate-content-issues.aspx</a>>
Authors' Addresses
Maile Ohye
EMail: maileohye@gmail.com
URI: <a href="http://maileohye.com/">http://maileohye.com/</a>
Joachim Kupke
EMail: joachim@kupke.za.net
Ohye & Kupke Informational [Page 8]
</pre>
|