<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>
ht://Dig: Recognized META information in HTML documents
</title>
</head>
<body bgcolor="#eef7ff">
<h1>
Recognized META information in HTML documents
</h1>
<p>
ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
Please see the file <a href="COPYING">COPYING</a> for
license information.
</p>
<hr size="4" noshade>
<h2>
Introduction
</h2>
<p>
Because the <a href="index.html">ht://Dig</a> system indexes
all HTML pages on a system, individual page authors may
want to control some aspects of the indexing
operation. To this end, ht://Dig recognizes some special
&lt;META&gt; tag properties. The following things can be
controlled in this manner:
</p>
<ul>
<li>
Do not index the document
</li>
<li>
Notify a user that the document has expired
</li>
<li>
Set keywords for the document
</li>
</ul>
<hr>
<h2>
General &lt;META&gt; tag use
</h2>
<p>
In HTML, any number of &lt;META&gt; tags can appear between
the &lt;HEAD&gt; and &lt;/HEAD&gt; tags of a document. This
tag has three possible attributes, two of which are
recognized by ht://Dig:
</p>
<dl>
<dt>
NAME
</dt>
<dd>
Used to name a specific property.
</dd>
<dt>
CONTENT
</dt>
<dd>
Used to supply the value for a named property.
</dd>
</dl>
<p>
A document could start with something like the following:
</p>
<blockquote>
&lt;HTML&gt;<br>
&lt;HEAD&gt;<br>
&lt;META NAME="htdig-keywords" CONTENT="phone telephone
online electronic directory"&gt;<br>
&lt;META NAME="htdig-email"
CONTENT="pat.user@nowhere.net"&gt;<br>
&lt;TITLE&gt;Some document title&lt;/TITLE&gt;<br>
&lt;/HEAD&gt;<br>
&lt;BODY&gt;
<blockquote>
<em>Body of document</em>
</blockquote>
&lt;/BODY&gt;<br>
&lt;/HTML&gt;
</blockquote>
<hr>
<h2>
Recognized properties
</h2>
<p>
The following properties are recognized by ht://Dig:
</p>
<ul>
<li>
htdig-keywords
</li>
<li>
htdig-noindex
</li>
<li>
htdig-email
</li>
<li>
htdig-notification-date
</li>
<li>
htdig-email-subject
</li>
<li>
robots
</li>
<li>
keywords
</li>
<li>
description
</li>
<li>
author
</li>
</ul>
<p>
Detailed information about the <em>htdig-email</em>, <em>
htdig-notification-date</em>, and <em>
htdig-email-subject</em> properties can be found in the
<a href="notification.html">Email notification service</a>
document.
</p>
<p>
Descriptions of the properties and their values:
</p>
<dl>
<dt>
<strong>htdig-keywords</strong>
</dt>
<dd>
The value of this property should be a space-separated list
of keywords, which receive a very high weight when
searching. This helps work around the problem of common
synonyms for words in the document. For
example, if a document is a telephone directory, possible
keywords could be "telephone phone directory book list".
The document can then be found by searching for any of these
keywords, regardless of whether they actually appear in its text.
The weight that words in the CONTENT string carry in
search results is controlled by the
<a href="attrs.html#keywords_factor">
keywords_factor</a> attribute in your configuration.
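<p>
For instance, the telephone directory described above might carry
a tag like the following (the keyword list is only illustrative):
</p>
<blockquote>
&lt;META NAME="htdig-keywords" CONTENT="telephone phone
directory book list"&gt;
</blockquote>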
</dd>
<dt>
<strong>htdig-noindex</strong>
</dt>
<dd>
This property takes no value. If it is present, the document
will NOT be included in any search results.
Example uses include:
<ul>
<li>
A dynamic document, i.e. one whose contents change
continually.
</li>
<li>
A temporary document that is not yet officially available.
</li>
<li>
A document you just don't want to be found.
</li>
</ul>
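<p>
A minimal example: because the property takes no value, the tag
needs only the NAME attribute, placed in the document's
&lt;HEAD&gt; section like any other &lt;META&gt; tag.
</p>
<blockquote>
&lt;META NAME="htdig-noindex"&gt;
</blockquote>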
</dd>
<dt>
<strong>htdig-email</strong>
</dt>
<dd>
The value is the email address to which a notification message
should be sent. Multiple email addresses can be given,
separated by commas. If no email address is given, no
notification will be sent.<br>
(Please check the <a href="notification.html">Email
notification service</a> documentation for more details on
this.)
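<p>
For example, to send notifications to two people (the second
address is hypothetical), list both addresses separated by a comma:
</p>
<blockquote>
&lt;META NAME="htdig-email"
CONTENT="pat.user@nowhere.net,webmaster@nowhere.net"&gt;
</blockquote>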
</dd>
<dt>
<strong>htdig-notification-date</strong>
</dt>
<dd>
The value is the date on or after which the notification
should be sent. The format is simply <em>month / day /
year</em>, or, if the <a href="attrs.html#iso_8601">iso_8601</a>
attribute is set, <em>year - month - day</em>.
Make sure that the year includes the century:
use <em>1995</em>, not <em>95</em>.<br>
If no date is given, no notification will be sent. (Please
check the <a href="notification.html">Email notification
service</a> documentation for more details on this.)
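<p>
For example, a notification due on or after December 31, 2004
could be written as the first tag below in the default format, or
as the second if iso_8601 is set. A page should carry only one of
these tags, matching your configuration:
</p>
<blockquote>
&lt;META NAME="htdig-notification-date" CONTENT="12/31/2004"&gt;<br>
&lt;META NAME="htdig-notification-date" CONTENT="2004-12-31"&gt;
</blockquote>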
</dd>
<dt>
<strong>htdig-email-subject</strong>
</dt>
<dd>
The value specifies the subject of the notification message.
This is an optional property. (Please check the
<a href="notification.html">Email notification service</a>
documentation for more details on this.)
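<p>
Taken together, the three notification properties might look
like this on a single page; the address, date (in the default
month/day/year format), and subject text are illustrative only:
</p>
<blockquote>
&lt;META NAME="htdig-email" CONTENT="pat.user@nowhere.net"&gt;<br>
&lt;META NAME="htdig-notification-date" CONTENT="12/31/2004"&gt;<br>
&lt;META NAME="htdig-email-subject" CONTENT="Phone directory
needs review"&gt;
</blockquote>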
</dd>
<dt>
<a name="robots"><strong>robots</strong></a>
</dt>
<dd>
The value specifies restrictions that robots (including ht://Dig)
should observe for the current page. The value "noindex"
prevents the document from being indexed but still allows the
robot to follow links from the page, "nofollow" allows indexing
but prevents links from being followed, and "none" prevents
both. ht://Dig also supports the values "index",
"follow", and "all", which are the opposites of these
values and correspond to the default behavior. For more information on
META robots tags, check out the
<a href="http://www.robotstxt.org/wc/meta-user.html">
HTML Author's Guide to the Robots META tag</a>.
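<p>
For example, the first tag below keeps a page out of the index
while still letting its links be followed; the second blocks both
indexing and link-following. A page would normally use only one of them:
</p>
<blockquote>
&lt;META NAME="robots" CONTENT="noindex"&gt;<br>
&lt;META NAME="robots" CONTENT="none"&gt;
</blockquote>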
</dd>
<dt>
<strong>keywords</strong>
</dt>
<dd>
The value of this property should be a space-separated list
of keywords, just as for the htdig-keywords property, and
htdig treats the two properties as equivalent. Two separate
properties exist because the keywords property
is also used by other search engines, while the
htdig-keywords property can hold words you want
indexed only by htdig. You can make htdig treat other
property names as equivalent to htdig-keywords, or disable
the htdig-keywords or keywords properties, by changing the
<a href="attrs.html#keywords_meta_tag_names">
keywords_meta_tag_names</a> attribute in your configuration.
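<p>
For instance, assuming keywords_meta_tag_names still lists both
properties, a page could offer general keywords to every search
engine and reserve a few extra words for htdig alone (the words
themselves are illustrative):
</p>
<blockquote>
&lt;META NAME="keywords" CONTENT="telephone directory"&gt;<br>
&lt;META NAME="htdig-keywords" CONTENT="phonebook extensions"&gt;
</blockquote>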
</dd>
<dt>
<strong>description</strong>
</dt>
<dd>
The value allows you to specify an alternate excerpt
(description) of a page. If the configuration attribute
<a href="attrs.html#use_meta_description">
use_meta_description</a> is enabled, any document with a
description will show it instead of the automatically
generated excerpt.
The weight that words in the CONTENT string carry in
search results is controlled by the
<a href="attrs.html#meta_description_factor">
meta_description_factor</a> attribute in your configuration.
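<p>
A short example with illustrative text; when use_meta_description
is enabled, this sentence would be shown in search results in place
of an automatically generated excerpt:
</p>
<blockquote>
&lt;META NAME="description" CONTENT="Online telephone
directory for all staff, searchable by name and department."&gt;
</blockquote>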
</dd>
<dt>
<strong>author</strong>
</dt>
<dd>
The value specifies the name, email address and/or affiliation
of the creator or authoriser of a page.
The weight that words in the CONTENT string carry in
search results is controlled by the
<a href="attrs.html#author_factor">author_factor</a>
attribute in your configuration.
A search for "author:<em>name</em>" will
look only in these fields for the word <em>name</em>.
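<p>
For example, given the tag below (the name and affiliation are
illustrative), a search for "author:pat" should match this
page through its author field:
</p>
<blockquote>
&lt;META NAME="author" CONTENT="Pat User, The Example Group"&gt;
</blockquote>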
</dd>
</dl>
<hr size="4" noshade>
Last modified: $Date: 2004/05/28 13:15:19 $
</body>
</html>