Mathopd and CGI
How to adjust your configuration file to enable CGI
--------------------------------
There are two methods to enable Mathopd to run CGI scripts.
The first is the "CGI" specialty, the second is the "External"
keyword.
The 'old-fashioned' /cgi-bin method, where one directory is
dedicated to CGI scripts, can be implemented the following way:-
    Control {
        Alias /cgi-bin
        Location /var/www/cgi-bin
        Specials {
            CGI { * }
        }
    }
A variation on this theme, where instead of a /cgi-bin directory
we mark CGI scripts by the .cgi extension:-
    Control {
        Specials {
            CGI { cgi }
        }
    }
Usually, CGI scripts are really just interpreted lines of program text,
like a PHP or Perl script. Mathopd has another mechanism to deal with
these which is far more flexible than the above. It works like this:
    Control {
        External {
            /usr/bin/perl { pl }
        }
    }
If things are set up this way, any file with extension .pl will
automatically be treated as a CGI script, and /usr/bin/perl will be
launched to interpret it. A side effect of this is that the script in
question does not have to be executable, so no messing about with
chmod is necessary.
Some interpreters, like awk, require a command-line argument before
the script name, in which case you need to do some adjusting. See the
section 'Invocation' and the example below for more details.
CGI and security
--------------------------------
A malicious CGI script can do a lot of damage to the operation of a
webserver. One way for a script to do this is to send a signal to
the server process that stops or kills it. To avoid accidents like
this, it is recommended that you start mathopd as root (user-id 0),
and set it up like this:-
    User www
    StayRoot On
    Control {
        ScriptUser cgi
    }
This way, the server process runs with the (effective) user-id of
'www', but CGI programs and External programs run with the user-id
of 'cgi'.
It is also possible to have several ScriptUsers, for instance, one
for each virtual server. CGI can be disallowed altogether by not
specifying any ScriptUser.
An example.
    User www
    StayRoot On
    Virtual {
        Host www.an.example
        Control {
            Alias /
            Location /home/example/www
            ScriptUser example
        }
    }
    Virtual {
        Host www.another.example
        Control {
            Alias /
            Location /home/another/www
            ScriptUser another
        }
    }
    Virtual {
        Host www.athird.example
        Control {
            Alias /
            Location /home/athird/www
        }
    }
In the above setup, scripts from www.an.example will run as user
'example' and scripts from www.another.example will run as user
'another'. No scripts can run from www.athird.example, because no
ScriptUser is defined there.
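One way to check that a ScriptUser setting has taken effect is to have
a script report the user-id it runs under. The following is a minimal
sketch, not part of Mathopd. The file name whoami.py and the function
names are hypothetical, and it assumes a Python interpreter has been
made available to the server, for instance through an External entry
analogous to the perl one shown earlier.

```python
#!/usr/bin/env python3
"""Hypothetical whoami.py: report the user-id a CGI script runs under."""
import os
import pwd
import sys


def describe_user(uid):
    """Return 'uid=N (name)', or 'uid=N (unknown)' if uid has no passwd entry."""
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = "unknown"
    return "uid=%d (%s)" % (uid, name)


def render():
    """Build the complete CGI response: header, blank line, body."""
    lines = [
        "Content-Type: text/plain",
        "",
        "real " + describe_user(os.getuid()),
        "effective " + describe_user(os.geteuid()),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(render())
```

If the setup above works as described, a request for this script on
www.an.example should report the user 'example'.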
Invocation
--------------------------------
Normally, when a CGI program is invoked by Mathopd, its current
directory will be the directory that contains the program itself.
If CGI programs are invoked through the CGI specialty, the value of
argv[0] inside the main() routine of the CGI will be the full pathname
of the program.
If CGI programs are invoked through the "External" mechanism, the
name of the external program is split up at each space character,
and the resulting fragments are stored in the beginning of the argv
array. Then, the full pathname of the CGI script is appended.
If the Request-URI contains a query, and the query does not contain
the '=' character, the query is split up at each '+' character, any
"%" HEX HEX encoding in the fragments is decoded, and the query
fragments are then passed as command-line arguments to the CGI
program.
If the Request-URI contains a query, and the query contains an '='
character, no command-line arguments are passed to the CGI program.
Instead, the program should use the QUERY_STRING environment variable
to read the query. The query in QUERY_STRING is not hex-decoded.
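The rules above can be sketched as a small function. This is an
illustration of the described behaviour only, not Mathopd's actual
code; the name build_argv and its parameters are hypothetical, and
dropping empty fragments caused by repeated spaces is an assumption.

```python
from urllib.parse import unquote


def build_argv(external, script_path, query=None):
    """Sketch of the argument vector for a script run through External."""
    # The External program name is split up at each space character.
    # Assumption: empty fragments from repeated spaces are dropped.
    argv = [part for part in external.split(" ") if part]
    # The full pathname of the CGI script is appended.
    argv.append(script_path)
    # A query without '=' is split at '+' and each fragment is %HH-decoded.
    # A query with '=' adds nothing; the script reads QUERY_STRING instead.
    if query and "=" not in query:
        argv.extend(unquote(fragment) for fragment in query.split("+"))
    return argv


if __name__ == "__main__":
    print(build_argv("/usr/bin/awk -f", "/tmp/test.awk", "A+%42"))
```

With an interpreter entry of "/usr/bin/awk -f" and the query A+%42,
the sketch yields the interpreter, its -f flag, the script path, and
the two decoded fragments A and B.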
Environment Variables
--------------------------------
In addition to the variables defined by the CGI standard, Mathopd
sets the following variables.
SCRIPT_FILENAME
    this variable contains the physical pathname of the
    script that Mathopd invoked;
REQUEST_URI
    this variable contains the original Request-URI that
    was sent by the client;
REMOTE_PORT
    this variable contains the port number of the client's
    end of the TCP connection to the web server;
SERVER_ADDR
    this variable contains the IP address of the local end
    of the TCP connection to the client.
Note that variables that have a zero-length value are not passed
in the environment. For example, if a CGI script is invoked from
a Request-URI that did not contain a query, then there will be
no QUERY_STRING variable (as opposed to a QUERY_STRING variable
with an empty value).
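Because empty variables are absent rather than empty, a script should
not assume that a variable exists. A minimal sketch of a defensive
lookup follows; the names MATHOPD_VARS and read_cgi_vars are
hypothetical and chosen for illustration.

```python
import os

# The four variables Mathopd adds to the standard CGI set (listed above).
MATHOPD_VARS = ("SCRIPT_FILENAME", "REQUEST_URI", "REMOTE_PORT", "SERVER_ADDR")


def read_cgi_vars(environ=os.environ):
    """Return the Mathopd-specific variables plus QUERY_STRING as a dict.

    Variables with a zero-length value are not passed at all, so a
    missing name is mapped to the empty string instead of raising KeyError.
    """
    names = MATHOPD_VARS + ("QUERY_STRING",)
    return {name: environ.get(name, "") for name in names}


if __name__ == "__main__":
    for name, value in read_cgi_vars().items():
        print('%s="%s"' % (name, value))
```

A script that needs to tell "no query" apart from other cases can test
for the name itself, as the awk example at the end of this document
does with ENVIRON.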
Also note that, starting from version 1.5b10, Mathopd no longer
sets the REMOTE_HOST variable. Doing so would require DNS lookups,
which are no longer performed.
NPH Scripts and Buffering
--------------------------------
Prior to version 1.5b5, each CGI script launched by Mathopd had
to be an NPH script. That is, each script had a direct connection
to the client, and each script had to send an HTTP response-line.
From version 1.5b5 onwards, the situation is more or less the
opposite. NPH scripts are no longer possible, in the sense that
a CGI script is no longer directly connected to the client.
All output from CGI scripts must pass through the server.
Scripts that are written as NPH will continue to work though.
Any data that a CGI script sends to its standard output will be
sent to the client by Mathopd as soon as possible. So, if your
CGI application sends out a lot of very tiny chunks of data,
these chunks will reach the client at about the same rate at
which they are written. The downside is that Mathopd becomes
quite busy just copying data to and fro, and will steal precious
CPU cycles from other programs (including the CGI script itself).
Therefore it is recommended that CGI scripts buffer their output.
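One simple way to buffer is to build the complete response in memory
and write it out in a single call. The sketch below illustrates the
idea; the function name render_response and the sample rows are
hypothetical.

```python
import io
import sys


def render_response(rows):
    """Collect the whole response in memory and return it as one string."""
    buf = io.StringIO()
    buf.write("Content-Type: text/plain\n\n")
    for row in rows:
        # Each write lands in the in-memory buffer, not on the pipe to the server.
        buf.write(str(row) + "\n")
    return buf.getvalue()


if __name__ == "__main__":
    # A single write and flush, instead of one tiny chunk per row.
    sys.stdout.write(render_response(range(1000)))
    sys.stdout.flush()
```

For very large responses, holding everything in memory may not be
practical; writing in larger blocks is a reasonable middle ground.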
"Location:" headers
--------------------------------
The CGI specification states that if a CGI script outputs a
location header that contains a virtual path, that is, a line
like
    Location: /thisisthis.html
then the web server should restart the request as if the client
requested /thisisthis.html (see [1]). This is also called an
'internal redirect'. There are some problems with this approach:
- it leads to endless loops if a CGI script outputs a Location:
  header that (ultimately) points to itself; this will kill
  performance;
- it does not work if the location that is referred to is
  protected by access lists; a script cannot know in advance
  whether this will be the case or not;
- the document that is referred to in the Location header cannot
  know the location of the referer; therefore, that document
  cannot contain relative hyperlinks.
Therefore I have decided not to implement internal redirects. If
a CGI script sets a "Location:" header, then the server will
fabricate a '302 Moved' response, and will pass the value of
"Location:" unmodified to the client, regardless of whether it
is an absolute URI, a virtual path, or something else. Experience
has shown that this is not a problem with most web browsers.
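A script that wants to redirect therefore only needs to emit the
header and the blank line that ends the header block. The following
is a minimal sketch; the function name redirect_headers is
hypothetical, and the check for line breaks is an added precaution,
not something Mathopd requires.

```python
import sys


def redirect_headers(target):
    """Return the CGI header block for a redirect to target."""
    # Precaution (an assumption of this sketch): refuse values that
    # would inject extra header lines into the response.
    if "\r" in target or "\n" in target:
        raise ValueError("Location value must not contain line breaks")
    # The empty line terminates the header block; no body is sent.
    return "Location: %s\n\n" % target


if __name__ == "__main__":
    sys.stdout.write(redirect_headers("/thisisthis.html"))
```

As described above, the value is passed to the client as given, so
the browser resolves the virtual path itself.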
References:
[1] http://hoohoo.ncsa.uiuc.edu/cgi/
Example
--------------------------------
Suppose the configuration reads as follows:-
    Server {
        Port 8037
        Virtual {
            Host localhost
            Control {
                Alias /
                Location /tmp
                External {
                    "/usr/bin/awk -f" { .awk }
                }
            }
        }
    }
We place a file test.awk in /tmp that reads as follows:-
    BEGIN {
        print "Content-Type: text/plain"
        print ""
        print "ARGC=" ARGC
        for (i = 1; i < ARGC; i++)
            print "ARGV[" i "]=\"" ARGV[i] "\""
        if ("QUERY_STRING" in ENVIRON)
            print "QUERY_STRING=\"" ENVIRON["QUERY_STRING"] "\""
        else
            print "QUERY_STRING not set"
        exit
    }
Output from http://localhost:8037/test.awk
    ARGC=1
    QUERY_STRING not set
Output from http://localhost:8037/test.awk?A+%42
    ARGC=3
    ARGV[1]="A"
    ARGV[2]="B"
    QUERY_STRING="A+%42"
Output from http://localhost:8037/test.awk?A=%42
    ARGC=1
    QUERY_STRING="A=%42"