mod_vhost_hash_alias
====================
Overview
========
mod_vhost_hash_alias is a simple, fast and efficient way to
manage virtual hosting automatically.
It allows an administrator to run massive virtual web servers
without having to declare a document root for each virtual host.
It uses the server name extracted from the HTTP request to build
the path to the real document root: it computes a digest of the
server name and splits it according to a configurable directory
scheme.
Since digests can collide, mod_vhost_hash_alias appends the server
name, to distinguish between hosts sharing a hash value (and to let
humans do a reverse lookup on the directory path).
For a given server name, it produces a path such as:
/var/lib/www/6/ae/fa93/weuh.org/htdocs
mod_vhost_hash_alias uses libmhash to create the digest, so it
supports every hash algorithm libmhash2 supports.
A virtual host does not exist if its document root does not exist;
ideally the request would then be redirected to a default document
root, but this is not implemented.
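As an illustration, here is a minimal Python sketch of that
construction (assuming MD5 and base16 encoding; this is not the
module's code, and the real digest of weuh.org need not produce the
exact path shown above):

    import hashlib

    def vhost_path(servername, prefix="/var/lib/www", suffix="htdocs",
                   limit=7, split=(1, 2, 4)):
        # Digest the server name, truncate, split into chunks, then
        # append the server name itself so colliding hosts stay
        # distinguishable.
        digest = hashlib.md5(servername.encode()).hexdigest()[:limit]
        chunks, pos = [], 0
        for size in split:
            chunks.append(digest[pos:pos + size])
            pos += size
        size = split[-1]
        while pos < limit:          # the last chunk size repeats
            chunks.append(digest[pos:pos + size])
            pos += size
        return "/".join([prefix] + chunks + [servername, suffix])

    print(vhost_path("weuh.org"))
    # -> /var/lib/www/X/XX/XXXX/weuh.org/htdocs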
Author
======
Yann Droneaud <ydroneaud@meuh.org>
But check the AUTHORS file too.
License
=======
The module is covered by the LGPL (see COPYING.LGPL-2.1).
Tools developed for this project and shipped in the package are
under the GPL (see COPYING).
Other files are under their respective licenses.
Web site
========
http://weuh.org/projects/mod_vhost_hash_alias/
It is hosted by Tuxfamily.org[1], which is the primary user
of the module, through the VHFFS[2] hosting platform.
1. http://tuxfamily.org/
2. http://vhffs.org/
Configuration
=============
#
# Load the module
#
LoadModule vhost_hash_alias_module mod_vhost_hash_alias.so
#
# mod_vhost_hash_alias has to be enabled for each virtual host (catch-all)
#
HashEnable On
#
# Digest algorithm to use:
# CRC32, ADLER32, MD5, SHA1, SHA256, or any other type
# supported by the module and libmhash
#
HashType md5
#
# The output encoding (RFC 3548) of the hash result:
# hexa, base16_low, base16_up, base32_low, base32_up, base64_ufs, ...
#
HashEncoding base16
#
# Number of characters to use to build the document root
# The hash string is truncated to this length
#
HashLimit 8
#
# Splitting scheme
# Specify the size of each chunk of the digest string
# The last size is repeated until the end of the string is reached
#
HashSplit 1 1 3
#
# The base directory used to build the document root
# (mandatory)
#
HashDocumentRootPrefix /var/lib/www
#
# A directory appended to the built document root
# (optional)
#
HashDocumentRootSuffix htdocs
#
# A list of host name prefixes to strip
# e.g. this handles basic web aliasing:
# http://www.example.com/
# will point to the same document root as
# http://example.com/
#
# (optional)
#
HashAddAliasPrefix www ftp
#
# A prefix added to the server name before it is hashed
# Acts like a salt and must be kept secret to be effective
#
# (optional)
#
HashPrefix Web:
A stupid example
================
HashDigest md5
HashLimit 17
HashSplit 1 1 2 4
HashDocumentRootPrefix /var/lib/www
HashDocumentRootSuffix htdocs
http://localhost/hello.html
-> /var/lib/www/4/2/1a/a90e/079f/a326/b/localhost/htdocs/hello.html
Why is it stupid:
* You will never need a directory tree as deep as this one.
  This scheme allows
  16 * 16 * 16^2 * 16^4 * 16^4 * 16^4 * 16 = 16^17 =~ 2.95*10^20
  unique leaf directories (see the quick check after this list).
  If your filesystem allows only 65535 files per directory, you can
  still have up to 65535 web sites per leaf directory, for a total of
  roughly 1.9*10^25 web sites, though you will likely suffer from
  collisions.
  So if you need to host every web site of the Internet on a single
  web server, you could do it. Then, call Google, and run.
* HashSplit and HashLimit are not synchronized:
  there is no need for a lone character at the end of the path
  (1+1+2+4+4+4 covers only 16 of the 17 characters, leaving a lone
  trailing chunk); please keep things as clean as you can.
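A quick sanity check of the arithmetic above, in plain Python:

    leaves = 16 * 16 * 16**2 * 16**4 * 16**4 * 16**4 * 16  # = 16**17
    print(leaves)          # 295147905179352825856, about 2.95*10**20
    print(65535 * leaves)  # about 1.93*10**25 sites at 65535 per leaf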
Some good examples
==================
HashDigest md5
HashEncoding base64_ufs
HashLimit 4
HashSplit 1 1 2
HashPrefix web:
HashDocumentRootPrefix /data/www
HashDocumentRootSuffix htdocs
http://tuxfamily.org/
-> /data/www/m/B/gT/tuxfamily.org/htdocs/
http://meuh.org/index.html
-> /data/www/o/I/DS/meuh.org/htdocs/index.html
http://droneaud.com/cv/
-> /data/www/c/u/RG/droneaud.com/htdocs/cv/
http://common-criteria.net/hidden/movies/mp3/divx/warez/
-> /data/www/H/Y/ul/common-criteria.net/htdocs/hidden/movies/mp3/divx/warez/
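Out of curiosity, paths like these should be reproducible along the
following lines. This is a guess at how the module combines
HashPrefix with the digest and at what base64_ufs means exactly
(Python's urlsafe base64 uses the RFC 3548 URL/filename-safe
alphabet), so the chunks it prints may not match the listing above:

    import base64, hashlib

    # Hypothetical reconstruction, not the module's code.
    def b64ufs_chunks(servername, salt="web:", limit=4):
        digest = hashlib.md5((salt + servername).encode()).digest()
        enc = base64.urlsafe_b64encode(digest).decode()[:limit]
        return enc[0], enc[1], enc[2:4]   # HashSplit 1 1 2

    print(b64ufs_chunks("tuxfamily.org"))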
Tweaking
========
If you want speed, use the ADLER32 or CRC32 hash algorithms (but
creating the directories by hand will be harder: try to find an
ADLER32 command line tool; a Python stand-in is sketched below).
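Lacking a dedicated tool, Python's zlib module ships both checksums;
how the 32-bit value is then encoded into path characters still
depends on the HashEncoding in use:

    import zlib

    name = b"example.com"
    print("%08x" % zlib.adler32(name))  # ADLER32 of the name, base16
    print("%08x" % zlib.crc32(name))    # CRC32 of the name, base16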
The main problem is still how to keep the number of virtual hosts
per directory within acceptable limits, to get fast filesystem lookups.
With mod_vhost_hash_alias you already get good name-space usage,
which translates into a good distribution across directories too.
If you want to host no more than 256 virtual hosts, a configuration
like the one below will do the job (about 4 virtual hosts per directory):
HashDigest adler32
HashEncoding base64_ufs
HashLimit 1
Or, to minimize the risk of having too many virtual hosts in the same
directory (this allows 256 directories, though some values will
probably go unused due to collisions):
HashDigest adler32
HashEncoding base16
HashLimit 2
But 256 entries could be too many for a single directory, so try:
HashDigest adler32
HashEncoding base32
HashLimit 2
HashSplit 1
Remember, there is no need for a long hash string (don't try to get
a single virtual host per hash value, there will be collisions anyway):
doing so only lengthens the processing.
And try to keep the directory depth as small as possible, because
Apache will look for an .htaccess file at each level.
What's next?
============
The mod_vhost_hash_alias project is already dead: it just does its
job, as requested. IT'S PERFECT! I'm the king of the world!
(hmm, wait, let me open M-x doctor... fine, OK, let's go)
The only improvements left would be some tweaking of speed (see below)
and probably of memory usage.
But there's a new project.
The next project is to create a stackable rewrite module:
given a set of filters, the server name would be transformed
into a document root.
This would open the door to other kinds of rewriting,
for example reverse splitting of the server name:
http://www.meuh.eu.org/ -> /var/lib/www/org/eu/meuh/www/
This kind of transformation cannot be done with the current code.
People will say that mod_vhost_alias already does that, and it's true,
but it does it using regular expressions, and I think this could be
hard-wired for speed (don't forget fun and profit too).
And using the right stack of modules, the same result as
mod_vhost_hash_alias could be achieved, giving something compatible
but more powerful.
The secret project is to replace mod_rewrite with something more awful
and complex.
TODO
====
For people looking for adventure, I suggest adding some fast digest
algorithms such as "elf32" to cut down the computation: there is no
need to compute a 512-bit hash just to use 32 bits of it (which is
what 8 characters of base16 encoding amount to; I hope no one will
ever do such a thing outside of benchmarking and comparison).
A Python transcription of the classic ELF-32 hash follows as a
starting point.
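A minimal sketch, transcribing the well-known C routine used by the
ELF object file format (wiring it into the module and libmhash's API
is left as the actual adventure):

    def elf32(data: bytes) -> int:
        # The classic 32-bit ELF/PJW hash.
        h = 0
        for byte in data:
            h = ((h << 4) + byte) & 0xFFFFFFFF
            g = h & 0xF0000000
            if g:
                h ^= g >> 24
            h &= ~g & 0xFFFFFFFF
        return h

    print("%08x" % elf32(b"example.com"))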
See the TODO file for more details.