mod_vhost_hash_alias
====================

Overview
========

mod_vhost_hash_alias is a simple, fast and efficient way to
automatically manage virtual hosting.

It allows an administrator to run massive virtual web servers
without having to declare a document root for each virtual host.

It uses the server name extracted from the HTTP request to build
the path to the real document root: it computes a digest of the
server name and splits it according to a configurable directory
scheme.

Since digests can collide, mod_vhost_hash_alias appends the server
name to distinguish between colliding hash values (and to let humans
do reverse lookups on directory paths).

For a given server name, the resulting path looks like:
/var/lib/www/6/ae/fa93/weuh.org/htdocs

mod_vhost_hash_alias uses libmhash to create the digest, so it
supports all the hash algorithms libmhash2 supports.

A virtual host whose document root does not exist should ideally be
redirected to a default document root, but this is not implemented.

Author
======

Yann Droneaud <ydroneaud@meuh.org>

But check the AUTHORS file too.

License
=======

The module is covered by the LGPL (see COPYING.LGPL-2.1).
Tools developed for this project and shipped in the package are under
the GPL (see COPYING).

Other files are under their respective licenses.

Web site
========

http://weuh.org/projects/mod_vhost_hash_alias/

It is hosted by Tuxfamily.org[1], the primary user of the module,
through the VHFFS[2] hosting platform.

1. http://tuxfamily.org/
2. http://vhffs.org/

Configuration
=============

#
# Request the module
#
LoadModule vhost_hash_alias_module mod_vhost_hash_alias.so

#
# mod_vhost_hash_alias must be enabled in each (catch-all) virtual host
#
HashEnable On

#
# Digest algorithm to use:
# CRC32, ADLER32, MD5, SHA1, SHA256, or any other type
# supported by the module and libmhash
#
HashType md5

#
# The output encoding (RFC 3548) of the hash value:
# hexa, base16_low, base16_up, base32_low, base32_up, base64_ufs, ...
#
HashEncoding base16

#
# Number of characters to use to build the document root
# The hash string is truncated to this length
#
HashLimit 8

#
# Splitting scheme
# Specify the size of each chunk of the digest string
# The last count is repeated until the end of the string is reached
#
HashSplit 1 1 3

#
# The base directory used to build the document root
# (mandatory)
#
HashDocumentRootPrefix /var/lib/www

#
# A directory appended to the final built root
# (optional)
#
HashDocumentRootSuffix htdocs

#
# A list of host name prefixes to strip
# e.g. this handles basic web aliasing:
#   http://www.example.com/
# will point to the same document root as
#   http://example.com/
#
# (optional)
#
HashAddAliasPrefix www ftp

#
# A prefix added to the server name before it is hashed
# Acts like a salt and must be kept secret to be effective
#
# (optional)
#
HashPrefix Web:
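
Put together, a hypothetical catch-all virtual host using these
directives might look like the sketch below (server names, addresses
and paths are illustrative assumptions, not taken from this README;
check the module documentation for the exact context each directive
is allowed in):

```apache
<VirtualHost *:80>
    # Catch-all vhost: every otherwise-unmatched Host: header lands here
    ServerName catchall.example.org
    ServerAlias *

    HashEnable On
    HashType md5
    HashEncoding base16
    HashLimit 8
    HashSplit 1 1 3
    HashDocumentRootPrefix /var/lib/www
    HashDocumentRootSuffix htdocs
</VirtualHost>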

A stupid example
================

HashDigest md5
HashLimit 17
HashSplit 1 1 2 4
HashDocumentRootPrefix /var/lib/www
HashDocumentRootSuffix htdocs

http://localhost/hello.html
-> /var/lib/www/4/2/1a/a90e/079f/a326/b/localhost/htdocs/hello.html
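
The mapping above can be reproduced with a short sketch. Note the
module itself uses libmhash; Python's hashlib stands in for it here,
and the splitting logic follows the HashSplit description in this
README (the last count is repeated until the digest is exhausted):

```python
import hashlib

def hash_path(servername, prefix="/var/lib/www", suffix="htdocs",
              limit=17, split=(1, 1, 2, 4)):
    """Mimic the example: md5, HashLimit 17, HashSplit 1 1 2 4."""
    digest = hashlib.md5(servername.encode()).hexdigest()[:limit]
    parts, pos, counts = [], 0, list(split)
    while pos < len(digest):
        # the last count is repeated until the digest is exhausted
        n = counts.pop(0) if counts else split[-1]
        parts.append(digest[pos:pos + n])
        pos += n
    return "/".join([prefix] + parts + [servername, suffix])

print(hash_path("localhost"))
# -> /var/lib/www/4/2/1a/a90e/079f/a326/b/localhost/htdocs
```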

Why is it stupid:

* You will never need a directory tree as deep as this one.
  This scheme allows
  16 * 16 * 16^2 * 16^4 * 16^4 * 16^4 * 16 = 16^17 =~ 2.95*10^20
  unique leaf directories.
  If your filesystem allows only 65535 files per directory, you could
  have at most 65535 web sites per leaf directory, so a total of about
  1.9*10^25 web sites, but you would likely suffer from collisions.
  So if you needed to host every web site of the Internet on a single
  web server, you could do it. Then, call Google, and run.

* HashSplit and/or HashLimit are not synchronized:
  there is no need for a single character at the end of the path;
  please keep things as clean as you can.

Some good examples
==================

HashDigest md5
HashEncoding base64_ufs
HashLimit 4
HashSplit 1 1 2
HashPrefix web:
HashDocumentRootPrefix /data/www
HashDocumentRootSuffix htdocs

http://tuxfamily.org/
-> /data/www/m/B/gT/tuxfamily.org/htdocs/

http://meuh.org/index.html
-> /data/www/o/I/DS/meuh.org/htdocs/index.html

http://droneaud.com/cv/
-> /data/www/c/u/RG/droneaud.com/htdocs/cv/

http://common-criteria.net/hidden/movies/mp3/divx/warez/
-> /data/www/H/Y/ul/common-criteria.net/htdocs/hidden/movies/mp3/divx/warez/

Tweaking
========

If you want speed, use the ADLER32 or CRC32 hash algorithm (but
creating the directories by hand will be harder: try to find an
ADLER32 command line tool).

The main problem is still how to keep the number of virtual hosts
per directory within acceptable limits, to get fast filesystem lookups.

With mod_vhost_hash_alias you already get good use of the name space,
which translates into a good directory distribution too.

If you want to host no more than 256 virtual hosts, a configuration
like the one below would do the job (about 4 virtual hosts per
directory, since one base64 character gives 64 buckets):

HashDigest adler32
HashEncoding base64_ufs
HashLimit 1

Or, to minimize the risk of having too many virtual hosts in the same
directory (this will allow 256 directories, but some values will
probably remain unused due to collisions):

HashDigest adler32
HashEncoding base16
HashLimit 2

But 256 entries could be too many in a single directory, so try:

HashDigest adler32
HashEncoding base32
HashLimit 2
HashSplit 1
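
The effect of these settings can be simulated. The sketch below is an
illustration only: it uses Python's zlib.adler32 and plain hex output
in place of the module's libmhash digest and its base16/base32/base64
encoders, to show how truncating the encoded digest to HashLimit
characters selects a directory bucket:

```python
import zlib

def bucket(servername, limit=2):
    # adler32 digest, base16 (hex) encoded, truncated to HashLimit chars
    digest = format(zlib.adler32(servername.encode()) & 0xffffffff, "08x")
    return digest[:limit]

for host in ("tuxfamily.org", "meuh.org", "example.com"):
    print(host, "->", bucket(host))
```

With limit=2 and hex encoding there are at most 256 distinct buckets,
matching the second configuration above.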

Remember, there is no need for a long hash string (don't try to get a
single virtual host per hash value, because there will be collisions
anyway): doing so only lengthens the processing.

And try to keep your directory depth as short as possible, because
Apache will look for an .htaccess file at each level.


What's next ?
=============

The mod_vhost_hash_alias project is already done: it just does its
job, as requested. IT'S PERFECT ! I'm the king of the world !
(hum, wait, I'm opening M-x doctor, fine, ok, let's go)

The only improvements would be some tweaking on speed (see below)
and probably memory usage.

But there's a new project.

The next project is to create a stackable rewrite module:
given a set of filters, the server name would be transformed
into a document root.

This would open the door to other kinds of rewriting,
for example server name reverse splitting:
http://www.meuh.eu.org/ -> /var/lib/www/org/eu/meuh/www/
This kind of transformation cannot be done with the current code.
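
The reverse-splitting mapping is easy to express; a minimal sketch
(no such module exists, this only illustrates the transformation):

```python
def reverse_split_root(servername, prefix="/var/lib/www"):
    # www.meuh.eu.org -> /var/lib/www/org/eu/meuh/www/
    return prefix + "/" + "/".join(reversed(servername.split("."))) + "/"

print(reverse_split_root("www.meuh.eu.org"))
# -> /var/lib/www/org/eu/meuh/www/
```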

People will say that mod_vhost_alias already does that, and it's true,
but it does it using regular expressions, and I think this could be
hard-wired for speed (don't forget fun and profit too).

And with the right stack of modules, the same result as
mod_vhost_hash_alias could be achieved, giving something compatible
but more powerful.
 
The secret project is to replace mod_rewrite with something more awful
and complex.

TODO
====

For people searching for adventure, I suggest trying to add some
fast digest algorithms like "elf32" to speed up the computation:
there is no need to compute a 512-bit hash to use only 32 bits of it
(which is what happens when using 8 characters with base16 encoding;
I hope no one will ever do that, except for benchmarking and
comparison).

See the TODO file for more details.