1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="description"
CONTENT="Some 'how to' tips for the Apache httpd server">
<META NAME="keywords" CONTENT="apache,redirect,robots,rotate,logfiles">
<TITLE>Apache HOWTO documentation</TITLE>
</HEAD>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<BODY
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#000080"
ALINK="#FF0000"
>
<DIV ALIGN="CENTER">
<IMG SRC="../images/sub.gif" ALT="[APACHE DOCUMENTATION]">
<H3>
Apache HTTP Server Version 1.3
</H3>
</DIV>
<H1 ALIGN="CENTER">Apache HOWTO documentation</H1>
How to:
<UL>
<LI><A HREF="#redirect">redirect an entire server or directory to a single
URL</A>
<LI><A HREF="#logreset">reset your log files</A>
<LI><A HREF="#stoprob">stop/restrict robots</A>
</UL>
<HR>
<H2><A NAME="redirect">How to redirect an entire server or directory to a
single URL</A></H2>
<P>There are two chief ways to redirect all requests for an entire
server to a single location: one which requires the use of
<CODE>mod_rewrite</CODE>, and another which uses a CGI script.
<P>First: if all you need to do is migrate a server from one name to
another, simply use the <CODE>Redirect</CODE> directive, as supplied
by <CODE>mod_alias</CODE>:
<BLOCKQUOTE><PRE>
Redirect / http://www.apache.org/
</PRE></BLOCKQUOTE>
<P>Since <CODE>Redirect</CODE> will forward along the complete path,
however, it may not be appropriate - for example, when the directory
structure has changed after the move, and you simply want to direct people
to the home page.
<P>The best option is to use the standard Apache module
<CODE>mod_rewrite</CODE>.
If that module is compiled in, the following lines
<BLOCKQUOTE><PRE>RewriteEngine On
RewriteRule /.* http://www.apache.org/ [R]
</PRE></BLOCKQUOTE>
This will send an HTTP 302 Redirect back to the client, and no matter
what they gave in the original URL, they'll be sent to
"http://www.apache.org".
The second option is to set up a <CODE>ScriptAlias</CODE> pointing to
a <STRONG>cgi script</STRONG> which outputs a 301 or 302 status and the
location
of the other server.</P>
<P>By using a <STRONG>cgi-script</STRONG> you can intercept various requests
and
treat them specially, e.g. you might want to intercept <STRONG>POST</STRONG>
requests, so that the client isn't redirected to a script on the other
server which expects POST information (a redirect will lose the POST
information.) You might also want to use a CGI script if you don't
want to compile mod_rewrite into your server.
<P>Here's how to redirect all requests to a script... In the server
configuration file,
<BLOCKQUOTE><PRE>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script</PRE>
</BLOCKQUOTE>
and here's a simple perl script to redirect requests:
<BLOCKQUOTE><PRE>
#!/usr/local/bin/perl
print "Status: 302 Moved Temporarily\r
Location: http://www.some.where.else.com/\r\n\r\n";
</PRE></BLOCKQUOTE></P>
<HR>
<H2><A NAME="logreset">How to reset your log files</A></H2>
<P>Sooner or later, you'll want to reset your log files (access_log and
error_log) because they are too big, or full of old information you don't
need.</P>
<P><CODE>access.log</CODE> typically grows by 1Mb for each 10,000 requests.</P>
<P>Most people's first attempt at replacing the logfile is to just move the
logfile or remove the logfile. This doesn't work.</P>
<P>Apache will continue writing to the logfile at the same offset as before the
logfile moved. This results in a new logfile being created which is just
as big as the old one, but it now contains thousands (or millions) of null
characters.</P>
<P>The correct procedure is to move the logfile, then signal Apache to tell
it to reopen the logfiles.</P>
<P>Apache is signaled using the <STRONG>SIGHUP</STRONG> (-1) signal. e.g.
<BLOCKQUOTE><CODE>
mv access_log access_log.old<BR>
kill -1 `cat httpd.pid`
</CODE></BLOCKQUOTE>
</P>
<P>Note: <CODE>httpd.pid</CODE> is a file containing the
<STRONG>p</STRONG>rocess <STRONG>id</STRONG>
of the Apache httpd daemon, Apache saves this in the same directory as the log
files.</P>
<P>Many people use this method to replace (and backup) their logfiles on a
nightly or weekly basis.</P>
<HR>
<H2><A NAME="stoprob">How to stop or restrict robots</A></H2>
<P>Ever wondered why so many clients are interested in a file called
<CODE>robots.txt</CODE> which you don't have, and never did have?</P>
<P>These clients are called <STRONG>robots</STRONG> (also known as crawlers,
spiders and other cute name) - special automated clients which
wander around the web looking for interesting resources.</P>
<P>Most robots are used to generate some kind of <EM>web index</EM> which
is then used by a <EM>search engine</EM> to help locate information.</P>
<P><CODE>robots.txt</CODE> provides a means to request that robots limit their
activities at the site, or more often than not, to leave the site alone.</P>
<P>When the first robots were developed, they had a bad reputation for
sending hundreds/thousands of requests to each site, often resulting
in the site being overloaded. Things have improved dramatically since
then, thanks to <A
HREF="http://info.webcrawler.com/mak/projects/robots/guidelines.html">
Guidelines for Robot Writers</A>, but even so, some robots may exhibit
unfriendly behavior which the webmaster isn't willing to tolerate, and
will want to stop.</P>
<P>Another reason some webmasters want to block access to robots, is to
stop them indexing dynamic information. Many search engines will use the
data collected from your pages for months to come - not much use if your
serving stock quotes, news, weather reports or anything else that will be
stale by the time people find it in a search engine.</P>
<P>If you decide to exclude robots completely, or just limit the areas
in which they can roam, create a <CODE>robots.txt</CODE> file; refer
to the <A HREF="http://info.webcrawler.com/mak/projects/robots/robots.html"
>robot information pages</A> provided by Martijn Koster for the syntax.</P>
<HR>
<H3 ALIGN="CENTER">
Apache HTTP Server Version 1.3
</H3>
<A HREF="./"><IMG SRC="../images/index.gif" ALT="Index"></A>
<A HREF="../"><IMG SRC="../images/home.gif" ALT="Home"></A>
</BODY>
</HTML>
|