File: overview.html

package info (click to toggle)
wn 2.0.5-3
links: PTS
area: main
in suites: slink
size: 2,208 kB
ctags: 1,499
sloc: ansic: 14,439; sh: 2,430; perl: 1,360; makefile: 291
file content (444 lines) | stat: -rw-r--r-- 20,568 bytes
<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">
<html>
  <head>
    <title>An Overview of the WN server</title>

    <link rev="made" href="mailto:john@math.nwu.edu">

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
    <meta http-equiv="last-modified" content="Fri, 09 Oct 1998 18:18:09 GMT">
    <meta http-equiv="keywords" content="WN overview">
  </head>

  <body bgcolor="#FFFFFF">
    <p>
      <a href="http://hopf.math.nwu.edu/"><img
        src="images/powered.jpg"
        border="0"
        width="190"
        height="41"
        align="right"
        alt="WN home page"
      ></a>
    </p>

    <strong>Version 2.0.3</strong>

    <br>

    <!-- pnuts --> <a href="manual.html">[Previous]</a> <a href="setup.html">[Next]</a> <a href="manual.html">[Up]</a> <a href="manual.html">[Top]</a> <a href="dosearch.html">[Search]</a> <a href="docindex.html">[Index]</a>

    <br clear="right">

    <hr size="4">
    <!-- #start -->

    <h2 align="center">An Overview of the <em>WN</em> Server</h2>

    <blockquote>
      An HTTP server should do more than just serve files.  It should play an
      active role in both navigation and presentation issues.  It is my hope
      that this server provides better tools for the creative
      webmaster. <address>- John Franks</address>
    </blockquote>

    <hr size="4">

    <p>
      <em>WN</em> is a server for the Hypertext Transfer Protocol <a
      href="http://www.w3c.org/Protocols/">HTTP/1.1</a>.  Its primary design
      goals are security, robustness, and flexibility, in that order.  One of
      its objectives is to provide functionality usually available only with
      complex CGI programs without the necessity of writing or using these
      programs. (Of course <a
      href="http://hoohoo.ncsa.uiuc.edu/cgi/">CGI/1.1</a> is fully supported
      for those who want it).  Despite this extensive functionality the
      <em>WN</em> executable is substantially smaller than the <a
      href="http://www.w3c.org/Daemon/">CERN httpd</a>, <a
      href="http://hoohoo.ncsa.uiuc.edu/">NCSA httpd</a> or <a
      href="http://www.apache.org">Apache</a> servers.
    </p>

    <p>
      <em>WN</em> was planned with a focus on serving <a
      href="http://www.w3c.org/MarkUp/">HTML</a> documents.  This means such
      things as enabling full text searching of a single logical <a
      href="http://www.w3c.org/MarkUp/">HTML</a> document which may consist of
      many files on the server, or allowing users to search all titles on the
      server and obtain a menu of matching items, or allowing users to download
      a total logical document for printing which, in fact, consists of many
      linked files on the server.  All of these are done in a way which is
      transparent to the user <em>(and largely transparent to the
      maintainer)</em>!  The "<a href="manual.html">User's Guide for the WN
      Server</a>", which this chapter is part of, provides a good example of
      many of these features.
    </p>

    <p>
      Another feature not found in many other servers is conditionally served
      text.  Often a server maintainer may wish to serve different versions of
      a document to different clients.  By adding simple <a
      href="http://www.w3c.org/MarkUp/">HTML</a> comments to documents and
      marking those documents to be "parsed" by the server, the maintainer can
      arrange that different sections or entirely different documents are sent
      to clients, based on such things as the client's domain name, IP address,
      browser type, browser "Accept" header, "Cookie header", etc.  This
      feature is described in more detail in the section "<a
      href="parse.html#if">Conditional Text: If, Else, and Endif</a>" in this
      guide.
    </p>

    <p>
      But these are only examples of many new tools <em>WN</em> makes available
      to webmasters.
    </p>

    <p>
      The design and security mechanisms of <em>WN</em> differ substantially
      from those of the httpd servers available from <a
      href="http://www.w3c.org/Daemon/">CERN</a> and <a
      href="http://hoohoo.ncsa.uiuc.edu/">NCSA</a> so a brief description of
      how they work is useful.
    </p>


    <h3>1.1 <a name="how">How <em>WN</em> Works</a></h3>

    <p>
      Files served by an HTTP server may have many attributes relevant to their
      serving.  These attributes include content-type, optional title, optional
      expiration date, optional keywords, whether the file should be parsed for
      server-side includes, access restrictions, etc.  Some servers try to
      encode this information in <em>ad hoc</em> ways, in a file name suffix,
      or in a global configuration file.  The approach of <em>WN</em> is to
      keep this information in small databases, one for each directory in the
      document hierarchy.
    </p>

    <p>
      The <em>WN</em> maintainer never needs to understand the format of these
      database files (named <code>index.cache</code> by default), but this
      format is very simple and a brief description will indicate how
      <em>WN</em> works.  When the server receives a request, say for
      <code>/dir/foo.html</code>, it looks in the file
      <code>/dir/index.cache</code> which contains lines like:
    </p>

    <blockquote>
      <code>
        file=foo.html&amp;content=text/html&amp;title=whatever...
      </code>
    </blockquote>

    <p>
      If the server finds a line starting with "<code>file=foo.html</code>"
      then the file will be served.  If such a line does not exist the file
      will not be served (unless special permission to serve all files in the
      directory has been granted).  This is the basis of <em>WN</em> security.
      Unlike other servers, the default action for <em>WN</em> is to deny
      access to a file. A file can only be served if explicit permission to do
      so has been granted by entering it in the <code>index.cache</code>
      database or if explicit permission to serve all files in
      <code>/dir</code> has been given in the <code>index.cache</code> file in
      <code>/dir</code>.  This database also provides other security functions.
      For example, restricting the execution of <a
      href="http://hoohoo.ncsa.uiuc.edu/cgi/">CGI/1.1</a> programs can be done
      on the basis of the ownership (or group ownership) of their
      <code>index.cache</code> files.  There is no need to limit execution to
      programs located in particular designated directories.  The location of a
      file in the data hierarchy should be orthogonal to security restrictions
      on it and this is the case with the <em>WN</em> server.
    </p>

    <p>
      The <code>index.cache</code> database file has a number of other
      functions beyond its security role.  Attributes of <code>foo.html</code>
      which can be computed before it is served and which don't often change
      are stored in the fields of the line starting <code>file=foo.html</code>.
      For example, the MIME content type "<code>text/html</code>" must be
      deduced from the filename suffix "<code>.html</code>".  This is done once
      at the time <code>index.cache</code> is created and need not be done
      every time the file is served.
    </p>

    <p>
      The title of a file is another example.  With the <em>WN</em> server
      every file served has a title (even binaries) and optionally has a list
      of keywords, an expiration date, and other fields associated with it.
      For an HTML document the title and the keywords are automatically
      extracted from the header of the document and stored in fields of that
      file's line in its <code>index.cache</code> file.  These are used for the
      built-in keyword and title searches which the server supports. The
      maintainer also has the option of adding his own fields to this database
      file.  They could contain such things as document author, document id
      number, etc.  These user defined fields can be searched with the built-in
      <em>WN</em> searches or their contents can be inserted into the document,
      on the fly, as it is served
    </p>

    <p>
      So how are the <code>index.cache</code> databases created?  Their format
      is quite simple and a maintainer is free to create them any way she
      chooses, but normally they are created by the utility <a
      href="index_desc.html#wndex"><code>wndex</code></a> (pronounced
      "windex").  This program, which is part of the <em>WN</em> distribution,
      is designed to produce the <code>index.cache</code> file from a file with
      a friendlier format with the default name "<code><a
      href="index_desc.html#index">index</a></code>".  A very simple <code><a
      href="index_desc.html#index">index</a></code> file might look like:
    </p>

    <blockquote>
      <code>
        <a href="appendixB.html#fdir.file">File=</a>foo.html
        <br>
        <br>
        <a href="appendixB.html#fdir.file">File=</a>clap.au
        <br>
        <a href="appendixB.html#fdir.title">Title=</a>Sound of one hand
        clapping
        <br>
        <br>
        <a href="appendixB.html#fdir.file">File=</a>hand
        <br>
        <a href="appendixB.html#fdir.title">Title=</a>Picture of one hand
        clapping
        <br>
        <a href="appendixB.html#fdir.content-type">Content-type=</a>text/gif
      </code>
    </blockquote>

    <p>
      Of course if the file <code>hand</code> were named <code>hand.gif</code>
      the content-type line would not be necessary as <a
      href="index_desc.html#wndex"><code>wndex</code></a> could deduce the type
      from the <code>.gif</code> suffix.  Likewise it is not necessary to give
      a title for <code>foo.html</code> because <a
      href="index_desc.html#wndex"><code>wndex</code></a> will read the HTML
      header from that file and extract the title and perhaps other things like
      keywords and expiration date.
    </p>


    <h3>1.2 <a name="features">Features of <em>WN</em></a></h3>

    <p>
      The <em>WN</em> server has several features which are not available with
      other servers or only available through the use of <a
      href="http://hoohoo.ncsa.uiuc.edu/cgi/">CGI/1.1</a> programs.
    </p>


    <h4>1.2.1 <a name="features.searching">Searching</a></h4>

    <p>
      One of the design goals of <em>WN</em> is to provide the maintainer with
      tools to create extensive navigational aids for the server. A variety of
      <a href="search.html">search mechanisms</a> are available.
    </p>

    <dl>
      <dt><a href="search.html#title">Title searches</a></dt>
      <dd>
        In response to the <a
        href="http://linux-howto.com/rfc/rfc1500-1999/rfc1738.txt">URL</a>
        <code>&lt;http://host/dir/search=title&gt;</code> the server will
        provide an HTML form (automatically generated or prepared by the
        maintainer) asking for a regular expression search term. When supplied
        the server will search the <code>index.cache</code> files in
        <code>/dir</code> and designated subdirectories for a items whose
        titles contain a match for the search term.  An HTML document with a
        menu of these items is returned.
      </dd>

      <dt><a href="search.html#keyword">Keyword searches</a></dt>
      <dd>
        Like title searches except matches are sought in keywords instead of
        titles.  Keywords for HTML documents are automatically obtained from
        <code>&lt;META&gt;</code> headers.  For other documents (or HTML
        documents) they can be manually supplied in the <code>index</code>
        file.
      </dd>

      <dt><a href="search.html#title_keyword">Title/Keyword search</a></dt>
      <dd>
        Like the above except the match can be either in the keyword or the
        title.
      </dd>

      <dt><a href="search.html#fielded">User supplied field searches</a></dt>
      <dd>
        Like keyword searches except matches are sought in user supplied
        fields.  The user supplied fields can contain any text and are attached
        to a document by entering them in that document's record in the
        <code>index</code> file.  Their purpose is to include items like a
        document id number, or document author in the <code>index.cache</code>
        database.  A field search could then produce all documents by a given
        author for example.  Or using regular expressions in the search term
        produce a list of all documents whose id number satisfy certain
        criteria.
      </dd>

      <dt><a href="search.html#context">Context searches</a></dt>
      <dd>
        Unlike the title and keyword searches this is a full text search of all
        <code>text/*</code> documents in one directory (not subdirectories).
        The returned HTML document contains a list of all the titles of
        documents containing a match together with a sublist of the lines from
        those documents containing the match.  This provides one line of
        context for the match.  For HTML documents the matched expression in
        each of these lines will be a highlighted anchor.  Selecting one takes
        you to the document with your viewer focused on the matching location.
        The primary intent of this feature is to provide full text searching
        for an HTML "document" which might consist of a substantial number of
        files.
      </dd>

      <dt>
        <a href="search.html#grep">File context and <code>grep</code>
        searches</a>
      </dt>
      <dd>
        A file context search is just like a context search, except limited to
        a single file.  The file <code>grep</code> search returns a
        <code>text/html</code> document containing the lines in the file
        matching matching the regular expression.
      </dd>

      <dt><a href="search.html#list">List searches</a></dt>
      <dd>
        The server will search an HTML document looking for an unordered list
        of anchors linking to Web objects.  The contents of each anchor will be
        searched for a match to the supplied regular expression.  The search
        returns an HTML document containing an unordered list of those anchors
        with a match.  This is quite useful with the <a
        href="utility.html#wn_mkdigest"><code>wn_mkdigest</code></a> utility
        which creates HTML documents to be searched in this way from files with
        internal structure like mail or news digests, mailing lists, etc.
      </dd>

      <dt><a href="search.html#index">Index searches</a></dt>
      <dd>
        This is a mechanism by which arbitrary search engines can be linked to
        <em>WN</em> through a <a href="module.html#isearch">search-module</a>.
        The server will provide the search term to the search-module and
        expects an HTML list of links to matching items to be returned.
      </dd>
    </dl>

    <p>
      All of the searching methods listed above except the index searches are
      built into the server and require no additional effort for the
      maintainer.  They are simply referenced with <a
      href="http://linux-howto.com/rfc/rfc1500-1999/rfc1738.txt">URLs</a> like
      <code>&lt;http://host/dir/search=context&gt;</code> where
      <code>/dir</code> is any directory containing files to be served and an
      <code>index.cache</code> listing them.  Of course search permission can
      be denied for any directory or any file contained in that directory.
    </p>


    <h4>1.2.2 <a name="features.parsed">Parsed Text, Server-Side Includes and
    Wrappers</a></h4>

    <p>
      The <em>WN</em> server has extensive capabilities for <a
      href="parse.html">automatically including files</a> in one which is being
      served or "wrapping" a served file with another, i.e. pre-pending and
      post-pending information to a file being served.  This latter is useful
      if you wish to place a standard message at the beginning or end (or both)
      of a large collection of files.  For security all files included in a
      file or used as a wrapper for it are listed in that file's
      <code>index.cache</code> file.  This combined with various available
      security options, like requiring that a served file and all its includes
      and wrappers have the same owner (or group owner) as the
      <code>index.cache</code> file listing them, provide a safe and productive
      Web environment.
    </p>

    <p>
      One important application of wrappers is to customize the HTML documents
      returned listing the successful search matches.  If a search item is
      given a wrapper the server assumes that it contains text describing the
      search and it merely inserts an unordered list of links to the matching
      items.
    </p>

    <p>
      In addition to including files the output of programs may be inserted and
      the value of any user defined field in the <code>index.cache</code>
      database entry for a file may be inserted.
    </p>

    <p>
      Also parsed text may conditionally insert items with a simple <a
      href="parse.html#if">if - else - endif construct</a>. based on
      <code>Accept</code> headers, <code>User-Agent</code> headers,
      <code>Referer</code> headers etc.
    </p>


    <h4>1.2.3 <a name="features.filters">Filters</a></h4>

    <p>
      An arbitrary <a href="filter.html">filter</a> can be assigned to any file
      to be served.  A filter is a program which reads the file and has the
      program output served rather than the content of the file.  The name of
      the filter is another field in the file's line in its
      <code>index.cache</code> file.  One common use of this feature is for
      on-the-fly decompression.  For, example, a file can be stored in its
      compressed form and assigned a filter like the UNIX <a
      href="http://linux-howto.com/man/man1/zcat.1.html"><code>zcat(1)</code></a>
      utility which uncompresses it.  Then the client is served the
      uncompressed file but only the compressed version is stored on disk.  As
      another example, you might use the UNIX <a
      href="http://linux-howto.com/man/man1/nroff.1.html"><code>nroff(1)</code></a>
      utility, "<code>nroff -man</code>", as a filter to process UNIX man files
      before serving.  There are many other interesting uses of filters.  Be
      creative!
    </p>


    <h4>1.2.4 <a name="features.ranges">Ranges</a></h4>

    <p>
      An arbitrary <a href="range.html">range</a> of a file can be served if
      the server is accessed via a <a
      href="http://linux-howto.com/rfc/rfc1500-1999/rfc1738.txt">URL</a> like
      <code>&lt;http://host/dir/foo;lines=20-30&gt;</code> and
      <code>file</code> is any <code>text/*</code> document it will return a
      <code>text/plain</code> document consisting of lines 20 through 30 of
      file <code>foo</code>.  This is very useful for structured text files
      like address lists or digests of mail and news.  A <em>WN</em> utility
      called <a href="utility.html#wn_mkdigest"><code>wn_mkdigest</code></a>
      will produce an HTML document with a list of links to separate sections
      (line ranges) of the structured file.  The <a
      href="utility.html#wn_mkdigest"><code>wn_mkdigest</code></a> utility is
      executed with two regular expressions as arguments: one to match the
      section separator and the other to match the section title.  For a mail
      digest, for example, these could be "<code>^From</code>" and
      "<code>^Subject:</code>" respectively.  Then the sections of the virtual
      documents would be delimited by a line starting with "<code>From</code>"
      and would have the message subject as their title.  A similar mechanism
      provides byte ranges from files.
    <p>


    <!-- #end -->
    <hr size="4">

    <address>
      <em>WN</em> version 2.0.3
      <br>
      Copyright &copy; 1998 <a href="mailto:john@math.nwu.edu">John Franks
      &lt;john@math.nwu.edu&gt;</a>
      <br>
      licensed under the <a href="http://www.opencontent.org/opl.html">
      OpenContent Public License</a>
      <br>
      last-modified: Fri, 09 Oct 1998 18:18:09 GMT
    </address>

    <!-- pnuts --> <a href="manual.html">[Previous]</a> <a href="setup.html">[Next]</a> <a href="manual.html">[Up]</a> <a href="manual.html">[Top]</a> <a href="dosearch.html">[Search]</a> <a href="docindex.html">[Index]</a>
  </body>
</html>