<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>An Overview of the WN server</title>
<link rev="made" href="mailto:john@math.nwu.edu">
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="last-modified" content="Fri, 09 Oct 1998 18:18:09 GMT">
<meta http-equiv="keywords" content="WN overview">
</head>
<body bgcolor="#FFFFFF">
<p>
<a href="http://hopf.math.nwu.edu/"><img
src="images/powered.jpg"
border="0"
width="190"
height="41"
align="right"
alt="WN home page"
></a>
</p>
<strong>Version 2.0.3</strong>
<br>
<!-- pnuts --> <a href="manual.html">[Previous]</a> <a href="setup.html">[Next]</a> <a href="manual.html">[Up]</a> <a href="manual.html">[Top]</a> <a href="dosearch.html">[Search]</a> <a href="docindex.html">[Index]</a>
<br clear="right">
<hr size="4">
<!-- #start -->
<h2 align="center">An Overview of the <em>WN</em> Server</h2>
<blockquote>
An HTTP server should do more than just serve files. It should play an
active role in both navigation and presentation issues. It is my hope
that this server provides better tools for the creative
webmaster. <address>- John Franks</address>
</blockquote>
<hr size="4">
<p>
<em>WN</em> is a server for the Hypertext Transfer Protocol <a
href="http://www.w3c.org/Protocols/">HTTP/1.1</a>. Its primary design
goals are security, robustness, and flexibility, in that order. One of
its objectives is to provide functionality usually available only with
complex CGI programs without the necessity of writing or using these
programs. (Of course, <a
href="http://hoohoo.ncsa.uiuc.edu/cgi/">CGI/1.1</a> is fully supported
for those who want it.) Despite this extensive functionality, the
<em>WN</em> executable is substantially smaller than the <a
href="http://www.w3c.org/Daemon/">CERN httpd</a>, <a
href="http://hoohoo.ncsa.uiuc.edu/">NCSA httpd</a> or <a
href="http://www.apache.org">Apache</a> servers.
</p>
<p>
<em>WN</em> was planned with a focus on serving <a
href="http://www.w3c.org/MarkUp/">HTML</a> documents. This means such
things as enabling full text searching of a single logical <a
href="http://www.w3c.org/MarkUp/">HTML</a> document which may consist of
many files on the server, or allowing users to search all titles on the
server and obtain a menu of matching items, or allowing users to download
a total logical document for printing which, in fact, consists of many
linked files on the server. All of these are done in a way which is
transparent to the user <em>(and largely transparent to the
maintainer)</em>! The "<a href="manual.html">User's Guide for the WN
Server</a>", which this chapter is part of, provides a good example of
many of these features.
</p>
<p>
Another feature not found in many other servers is conditionally served
text. Often a server maintainer may wish to serve different versions of
a document to different clients. By adding simple <a
href="http://www.w3c.org/MarkUp/">HTML</a> comments to documents and
marking those documents to be "parsed" by the server, the maintainer can
arrange that different sections or entirely different documents are sent
to clients, based on such things as the client's domain name, IP address,
browser type, browser "Accept" header, "Cookie" header, etc. This
feature is described in more detail in the section "<a
href="parse.html#if">Conditional Text: If, Else, and Endif</a>" in this
guide.
</p>
<p>
But these are only examples of many new tools <em>WN</em> makes available
to webmasters.
</p>
<p>
The design and security mechanisms of <em>WN</em> differ substantially
from those of the httpd servers available from <a
href="http://www.w3c.org/Daemon/">CERN</a> and <a
href="http://hoohoo.ncsa.uiuc.edu/">NCSA</a> so a brief description of
how they work is useful.
</p>
<h3>1.1 <a name="how">How <em>WN</em> Works</a></h3>
<p>
Files served by an HTTP server may have many attributes relevant to their
serving. These attributes include content-type, optional title, optional
expiration date, optional keywords, whether the file should be parsed for
server-side includes, access restrictions, etc. Some servers try to
encode this information in <em>ad hoc</em> ways, in a file name suffix,
or in a global configuration file. The approach of <em>WN</em> is to
keep this information in small databases, one for each directory in the
document hierarchy.
</p>
<p>
The <em>WN</em> maintainer never needs to understand the format of these
database files (named <code>index.cache</code> by default), but this
format is very simple and a brief description will indicate how
<em>WN</em> works. When the server receives a request, say for
<code>/dir/foo.html</code>, it looks in the file
<code>/dir/index.cache</code> which contains lines like:
</p>
<blockquote>
<code>
file=foo.html&amp;content=text/html&amp;title=whatever...
</code>
</blockquote>
<p>
If the server finds a line starting with "<code>file=foo.html</code>"
then the file will be served. If such a line does not exist the file
will not be served (unless special permission to serve all files in the
directory has been granted). This is the basis of <em>WN</em> security.
Unlike other servers, the default action for <em>WN</em> is to deny
access to a file. A file can only be served if explicit permission to do
so has been granted by entering it in the <code>index.cache</code>
database or if explicit permission to serve all files in
<code>/dir</code> has been given in the <code>index.cache</code> file in
<code>/dir</code>. This database also provides other security functions.
For example, restricting the execution of <a
href="http://hoohoo.ncsa.uiuc.edu/cgi/">CGI/1.1</a> programs can be done
on the basis of the ownership (or group ownership) of their
<code>index.cache</code> files. There is no need to limit execution to
programs located in particular designated directories. The location of a
file in the data hierarchy should be orthogonal to security restrictions
on it and this is the case with the <em>WN</em> server.
</p>
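<p>
The deny-by-default lookup described above can be sketched in a few lines
of Python. This is only an illustration, not the server's actual code: the
names <code>parse_cache_line</code> and <code>lookup</code> and the
<code>serve_all</code> flag are hypothetical, and the sketch assumes that
fields are separated by an ampersand with no escaping, as in the sample
line shown earlier.
</p>
<pre>
```python
def parse_cache_line(line):
    """Split one 'key=value&key=value' record into a dict (no escaping handled)."""
    record = {}
    for field in line.strip().split("&"):
        key, sep, value = field.partition("=")
        if sep:
            record[key] = value
    return record


def lookup(cache_text, filename, serve_all=False):
    """Return the record for filename, or None when access must be denied."""
    for line in cache_text.splitlines():
        record = parse_cache_line(line)
        if record.get("file") == filename:
            return record
    # No matching line: deny, unless the directory grants blanket permission
    # (modelled here by the hypothetical serve_all flag).
    if serve_all:
        return {"file": filename}
    return None


# Sample data invented for this sketch.
CACHE = (
    "file=foo.html&content=text/html&title=Foo page\n"
    "file=clap.au&content=audio/basic&title=Sound of one hand clapping\n"
)

print(lookup(CACHE, "foo.html"))    # a record: the file may be served
print(lookup(CACHE, "secret.txt"))  # None: not listed, so not served
```
</pre>
<p>
The point of the sketch is the final branch: a file that is not listed
yields no record at all, so nothing is served unless permission was
explicitly given.
</p>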
<p>
The <code>index.cache</code> database file has a number of other
functions beyond its security role. Attributes of <code>foo.html</code>
which can be computed before it is served and which don't often change
are stored in the fields of the line starting <code>file=foo.html</code>.
For example, the MIME content type "<code>text/html</code>" must be
deduced from the filename suffix "<code>.html</code>". This is done once
at the time <code>index.cache</code> is created and need not be done
every time the file is served.
</p>
<p>
The title of a file is another example. With the <em>WN</em> server
every file served has a title (even binaries) and optionally has a list
of keywords, an expiration date, and other fields associated with it.
For an HTML document the title and the keywords are automatically
extracted from the header of the document and stored in fields of that
file's line in its <code>index.cache</code> file. These are used for the
built-in keyword and title searches which the server supports. The
maintainer also has the option of adding his own fields to this database
file. They could contain such things as document author, document id
number, etc. These user defined fields can be searched with the built-in
<em>WN</em> searches or their contents can be inserted into the document,
on the fly, as it is served.
</p>
<p>
So how are the <code>index.cache</code> databases created? Their format
is quite simple and a maintainer is free to create them any way she
chooses, but normally they are created by the utility <a
href="index_desc.html#wndex"><code>wndex</code></a> (pronounced
"windex"). This program, which is part of the <em>WN</em> distribution,
is designed to produce the <code>index.cache</code> file from a file with
a friendlier format with the default name "<code><a
href="index_desc.html#index">index</a></code>". A very simple <code><a
href="index_desc.html#index">index</a></code> file might look like:
</p>
<blockquote>
<code>
<a href="appendixB.html#fdir.file">File=</a>foo.html
<br>
<br>
<a href="appendixB.html#fdir.file">File=</a>clap.au
<br>
<a href="appendixB.html#fdir.title">Title=</a>Sound of one hand
clapping
<br>
<br>
<a href="appendixB.html#fdir.file">File=</a>hand
<br>
<a href="appendixB.html#fdir.title">Title=</a>Picture of one hand
clapping
<br>
<a href="appendixB.html#fdir.content-type">Content-type=</a>image/gif
</code>
</blockquote>
<p>
Of course if the file <code>hand</code> were named <code>hand.gif</code>
the content-type line would not be necessary as <a
href="index_desc.html#wndex"><code>wndex</code></a> could deduce the type
from the <code>.gif</code> suffix. Likewise it is not necessary to give
a title for <code>foo.html</code> because <a
href="index_desc.html#wndex"><code>wndex</code></a> will read the HTML
header from that file and extract the title and perhaps other things like
keywords and expiration date.
</p>
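<p>
As a rough illustration of the kind of work <code>wndex</code> does, the
following Python sketch reads records in the style of the
<code>index</code> file above and fills in a missing content type from the
filename suffix. It is not the <code>wndex</code> implementation: the
function names are hypothetical, Python's standard
<code>mimetypes</code> module stands in for whatever suffix table
<code>wndex</code> uses, the fallback type is an assumption, and the
explicit type for <code>hand</code> is written as <code>image/gif</code>,
the registered MIME type for GIF images.
</p>
<pre>
```python
import mimetypes


def parse_index(text):
    """Group 'Key=value' lines into one dict per blank-line-separated record."""
    records, current = [], {}
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if not line:
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition("=")
        current[key.strip().lower()] = value.strip()
    return records


def fill_content_type(record):
    """Keep an explicit Content-type; otherwise guess it from the suffix."""
    if "content-type" not in record:
        guessed, _ = mimetypes.guess_type(record["file"])
        # The fallback type is an assumption made for this sketch.
        record["content-type"] = guessed or "application/octet-stream"
    return record


INDEX = """File=foo.html

File=clap.au
Title=Sound of one hand clapping

File=hand
Title=Picture of one hand clapping
Content-type=image/gif
"""

records = [fill_content_type(r) for r in parse_index(INDEX)]
for record in records:
    print(record["file"], record["content-type"])
```
</pre>
<p>
Extracting a title from the HTML header of <code>foo.html</code> is left
out of the sketch; it only shows the suffix-based deduction that makes the
content-type line optional for most files.
</p>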
<h3>1.2 <a name="features">Features of <em>WN</em></a></h3>
<p>
The <em>WN</em> server has several features which are not available with
other servers or only available through the use of <a
href="http://hoohoo.ncsa.uiuc.edu/cgi/">CGI/1.1</a> programs.
</p>
<h4>1.2.1 <a name="features.searching">Searching</a></h4>
<p>
One of the design goals of <em>WN</em> is to provide the maintainer with
tools to create extensive navigational aids for the server. A variety of
<a href="search.html">search mechanisms</a> are available.
</p>
<dl>
<dt><a href="search.html#title">Title searches</a></dt>
<dd>
In response to the <a
href="http://linux-howto.com/rfc/rfc1500-1999/rfc1738.txt">URL</a>
<code>&lt;http://host/dir/search=title&gt;</code> the server will
provide an HTML form (automatically generated or prepared by the
maintainer) asking for a regular expression search term. When one is
supplied, the server will search the <code>index.cache</code> files in
<code>/dir</code> and designated subdirectories for items whose
titles contain a match for the search term. An HTML document with a
menu of these items is returned.
</dd>
<dt><a href="search.html#keyword">Keyword searches</a></dt>
<dd>
Like title searches except matches are sought in keywords instead of
titles. Keywords for HTML documents are automatically obtained from
<code>&lt;META&gt;</code> headers. For other documents (or HTML
documents) they can be manually supplied in the <code>index</code>
file.
</dd>
<dt><a href="search.html#title_keyword">Title/Keyword search</a></dt>
<dd>
Like the above except the match can be either in the keyword or the
title.
</dd>
<dt><a href="search.html#fielded">User supplied field searches</a></dt>
<dd>
Like keyword searches except matches are sought in user supplied
fields. The user supplied fields can contain any text and are attached
to a document by entering them in that document's record in the
<code>index</code> file. Their purpose is to include items like a
document id number, or document author in the <code>index.cache</code>
database. A field search could then produce all documents by a given
author, for example. Or, using regular expressions in the search term,
it could produce a list of all documents whose id numbers satisfy certain
criteria.
</dd>
<dt><a href="search.html#context">Context searches</a></dt>
<dd>
Unlike the title and keyword searches this is a full text search of all
<code>text/*</code> documents in one directory (not subdirectories).
The returned HTML document contains a list of all the titles of
documents containing a match together with a sublist of the lines from
those documents containing the match. This provides one line of
context for the match. For HTML documents the matched expression in
each of these lines will be a highlighted anchor. Selecting one takes
you to the document with your viewer focused on the matching location.
The primary intent of this feature is to provide full text searching
for an HTML "document" which might consist of a substantial number of
files.
</dd>
<dt>
<a href="search.html#grep">File context and <code>grep</code>
searches</a>
</dt>
<dd>
A file context search is just like a context search, except limited to
a single file. The file <code>grep</code> search returns a
<code>text/html</code> document containing the lines in the file
matching the regular expression.
</dd>
<dt><a href="search.html#list">List searches</a></dt>
<dd>
The server will search an HTML document looking for an unordered list
of anchors linking to Web objects. The contents of each anchor will be
searched for a match to the supplied regular expression. The search
returns an HTML document containing an unordered list of those anchors
with a match. This is quite useful with the <a
href="utility.html#wn_mkdigest"><code>wn_mkdigest</code></a> utility
which creates HTML documents to be searched in this way from files with
internal structure like mail or news digests, mailing lists, etc.
</dd>
<dt><a href="search.html#index">Index searches</a></dt>
<dd>
This is a mechanism by which arbitrary search engines can be linked to
<em>WN</em> through a <a href="module.html#isearch">search-module</a>.
The server will provide the search term to the search-module and
expects an HTML list of links to matching items to be returned.
</dd>
</dl>
<p>
All of the searching methods listed above except the index searches are
built into the server and require no additional effort for the
maintainer. They are simply referenced with <a
href="http://linux-howto.com/rfc/rfc1500-1999/rfc1738.txt">URLs</a> like
<code>&lt;http://host/dir/search=context&gt;</code> where
<code>/dir</code> is any directory containing files to be served and an
<code>index.cache</code> listing them. Of course search permission can
be denied for any directory or any file contained in that directory.
</p>
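<p>
The title, keyword, and user supplied field searches all follow the same
pattern: match a regular expression against selected fields of each
record. A minimal Python sketch of that idea follows. It assumes the
records have already been read into dictionaries; the function name
<code>field_search</code>, the field names, and the sample data are
hypothetical and are not taken from the <em>WN</em> sources.
</p>
<pre>
```python
import re


def field_search(records, pattern, fields=("title",)):
    """Return the files whose chosen fields contain a match for pattern."""
    regex = re.compile(pattern)
    hits = []
    for record in records:
        if any(regex.search(record.get(name, "")) for name in fields):
            hits.append(record["file"])
    return hits


# Sample records invented for this sketch.
RECORDS = [
    {"file": "foo.html", "title": "Foo page", "author": "J. Doe"},
    {"file": "clap.au", "title": "Sound of one hand clapping",
     "keywords": "zen, koan"},
    {"file": "hand", "title": "Picture of one hand clapping"},
]

print(field_search(RECORDS, "one hand"))                      # title search
print(field_search(RECORDS, "koan", fields=("keywords",)))    # keyword search
print(field_search(RECORDS, "^Foo|zen", fields=("title", "keywords")))
print(field_search(RECORDS, "Doe", fields=("author",)))       # user field
```
</pre>
<p>
The server would then turn the list of matches into an HTML menu of links;
that step is omitted here.
</p>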
<h4>1.2.2 <a name="features.parsed">Parsed Text, Server-Side Includes and
Wrappers</a></h4>
<p>
The <em>WN</em> server has extensive capabilities for <a
href="parse.html">automatically including files</a> in one which is being
served or "wrapping" a served file with another, i.e. prepending and
appending information to a file being served. The latter is useful
if you wish to place a standard message at the beginning or end (or both)
of a large collection of files. For security all files included in a
file or used as a wrapper for it are listed in that file's
<code>index.cache</code> file. This, combined with various available
security options, like requiring that a served file and all its includes
and wrappers have the same owner (or group owner) as the
<code>index.cache</code> file listing them, provides a safe and productive
Web environment.
</p>
<p>
One important application of wrappers is to customize the HTML documents
returned listing the successful search matches. If a search item is
given a wrapper the server assumes that it contains text describing the
search and it merely inserts an unordered list of links to the matching
items.
</p>
<p>
In addition to including files the output of programs may be inserted and
the value of any user defined field in the <code>index.cache</code>
database entry for a file may be inserted.
</p>
<p>
Also, parsed text may conditionally insert items with a simple <a
href="parse.html#if">if - else - endif construct</a> based on
<code>Accept</code> headers, <code>User-Agent</code> headers,
<code>Referer</code> headers, etc.
</p>
<h4>1.2.3 <a name="features.filters">Filters</a></h4>
<p>
An arbitrary <a href="filter.html">filter</a> can be assigned to any file
to be served. A filter is a program which reads the file; the program's
output is served in place of the content of the file. The name of
the filter is another field in the file's line in its
<code>index.cache</code> file. One common use of this feature is for
on-the-fly decompression. For example, a file can be stored in its
compressed form and assigned a filter like the UNIX <a
href="http://linux-howto.com/man/man1/zcat.1.html"><code>zcat(1)</code></a>
utility which uncompresses it. Then the client is served the
uncompressed file but only the compressed version is stored on disk. As
another example, you might use the UNIX <a
href="http://linux-howto.com/man/man1/nroff.1.html"><code>nroff(1)</code></a>
utility, "<code>nroff -man</code>", as a filter to process UNIX man files
before serving. There are many other interesting uses of filters. Be
creative!
</p>
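<p>
The filter idea can be illustrated with a short Python sketch: if a file
has a filter, the filter's output is sent instead of the file's bytes.
This is a sketch under stated assumptions, not a description of how
<em>WN</em> runs filters: the function name
<code>serve_with_filter</code> is hypothetical, and feeding the file to
the filter on standard input is an assumption made here.
</p>
<pre>
```python
import subprocess


def serve_with_filter(path, filter_argv=None):
    """Return the bytes to send: the raw file, or the filter's output."""
    if filter_argv is None:
        with open(path, "rb") as handle:
            return handle.read()
    # Assumption: the filter reads the file on standard input and writes
    # the content to be served on standard output.
    with open(path, "rb") as handle:
        done = subprocess.run(filter_argv, stdin=handle,
                              stdout=subprocess.PIPE, check=True)
    return done.stdout
```
</pre>
<p>
With this sketch, a call such as
<code>serve_with_filter("notes.txt.gz", ["zcat"])</code> would return the
uncompressed text while only the compressed file exists on disk. The file
name is made up for the example.
</p>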
<h4>1.2.4 <a name="features.ranges">Ranges</a></h4>
<p>
An arbitrary <a href="range.html">range</a> of a file can be served. If
the server is accessed via a <a
href="http://linux-howto.com/rfc/rfc1500-1999/rfc1738.txt">URL</a> like
<code>&lt;http://host/dir/foo;lines=20-30&gt;</code> and
<code>foo</code> is any <code>text/*</code> document, it will return a
<code>text/plain</code> document consisting of lines 20 through 30 of
file <code>foo</code>. This is very useful for structured text files
like address lists or digests of mail and news. A <em>WN</em> utility
called <a href="utility.html#wn_mkdigest"><code>wn_mkdigest</code></a>
will produce an HTML document with a list of links to separate sections
(line ranges) of the structured file. The <a
href="utility.html#wn_mkdigest"><code>wn_mkdigest</code></a> utility is
executed with two regular expressions as arguments: one to match the
section separator and the other to match the section title. For a mail
digest, for example, these could be "<code>^From</code>" and
"<code>^Subject:</code>" respectively. Then the sections of the virtual
documents would be delimited by a line starting with "<code>From</code>"
and would have the message subject as their title. A similar mechanism
provides byte ranges from files.
</p>
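<p>
Both ideas in this section, serving a line range and splitting a digest
into sections with two regular expressions, can be sketched in Python as
follows. The function names are hypothetical and the code is not taken
from <em>WN</em> or <code>wn_mkdigest</code>; in particular, this sketch
uses the whole matching line as the section title, which is a
simplification.
</p>
<pre>
```python
import re


def parse_line_range(url_path):
    """Split 'foo;lines=20-30' into ('foo', 20, 30)."""
    name, _, param = url_path.partition(";lines=")
    first, _, last = param.partition("-")
    return name, int(first), int(last)


def select_lines(text, first, last):
    """Return lines first..last (1-based, inclusive) as one string."""
    return "".join(text.splitlines(keepends=True)[first - 1:last])


def digest_sections(text, separator, title):
    """List (title_line, first_line, last_line) for each digest section."""
    sep_re, title_re = re.compile(separator), re.compile(title)
    lines = text.splitlines()
    starts = [n for n, ln in enumerate(lines, 1) if sep_re.search(ln)]
    sections = []
    for start, nxt in zip(starts, starts[1:] + [len(lines) + 1]):
        body = lines[start - 1:nxt - 1]
        heading = next((ln for ln in body if title_re.search(ln)), "")
        sections.append((heading, start, nxt - 1))
    return sections


# A tiny mail digest invented for this sketch.
MAIL = ("From alice\nSubject: one\nbody\n"
        "From bob\nSubject: two\nbody\nmore\n")

for heading, first, last in digest_sections(MAIL, "^From", "^Subject:"):
    print("%s -> foo;lines=%d-%d" % (heading, first, last))
```
</pre>
<p>
Each section found this way corresponds to one line-range URL, which is
how a single structured file can be presented as a list of separate
virtual documents.
</p>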
<!-- #end -->
<hr size="4">
<address>
<em>WN</em> version 2.0.3
<br>
Copyright © 1998 <a href="mailto:john@math.nwu.edu">John Franks
&lt;john@math.nwu.edu&gt;</a>
<br>
licensed under the <a href="http://www.opencontent.org/opl.html">
OpenContent Public License</a>
<br>
last-modified: Fri, 09 Oct 1998 18:18:09 GMT
</address>
</body>
</html>