=head1 NAME
mod_encoding - Apache module for non-ASCII filename interoperability
=head1 SYNOPSIS
    # in httpd.conf
    LoadModule headers_module libexec/mod_headers.so
    LoadModule encoding_module libexec/mod_encoding.so
    AddModule mod_headers.c
    AddModule mod_encoding.c
    <IfModule mod_headers.c>
        Header add MS-Author-Via "DAV"
    </IfModule>
    <IfModule mod_encoding.c>
        EncodingEngine on
        NormalizeUsername on
        SetServerEncoding UTF-8
        DefaultClientEncoding JA-AUTO-SJIS-MS SJIS
        AddClientEncoding "cadaver/" EUC-JP
    </IfModule>
=head1 DESCRIPTION
This module improves non-ASCII filename interoperability of
Apache (and mod_dav).
Many WebDAV clients appear to send filenames in their platform-local
encoding. But since mod_dav expects everything, even the HTTP request
line, to be in UTF-8, this causes an interoperability problem.
I believe standardizing the encoding used in the HTTP request line
and HTTP headers is a matter for a future specification (RFC?),
but life would be much easier if mod_dav (and others) could handle
the various encodings sent by clients TODAY. This module does just that.
=head1 REQUIREMENTS
This module requires iconv(3) support.
If your system doesn't have it (e.g. on a *BSD platform), try using
the iconv(3) implementation available from
http://clisp.cons.org/~haible/packages-libiconv.html
This worked for me on BSD/OS 4.1.
=head1 INSTALLATION
The standard procedure of
    $ ./configure --with-apxs=<path-to-apxs>
    $ make
    $ make install
should work. If "make install" fails, just copy the
created mod_encoding.so to where the standard Apache DSO modules reside.
Alternatively, you can use the Makefile.simple that comes with the package.
If you have problems with configure, it is recommended to manually
edit Makefile.simple and use apxs(1) directly, as that will make
things much simpler.
Now, if configure doesn't work and you don't have a working apxs,
you have a problem. In that case, you should probably consult
the Apache documentation to find out how to integrate a module
into the server.
=head1 CONFIGURATION
This module adds the following directives: EncodingEngine, SetServerEncoding,
AddClientEncoding, DefaultClientEncoding, and NormalizeUsername.
=over 4
=item EncodingEngine (on|off)
This directive either enables or disables this module.
=item SetServerEncoding <encoding>
This directive specifies the encoding used by the local filesystem.
Whenever mod_dav is requested to create a file or folder, its
name will be converted into this encoding.
However, since mod_dav does not (yet) support encodings other
than UTF-8 for the local filesystem, you should set this
to "UTF-8" unless you have applied the separately available patch
to mod_dav.
=item AddClientEncoding <agent-name> <encoding> [<encoding> ...]
This directive specifies the encoding(s) expected from each
client implementation.
Though WebDAV clients are expected (or at least recommended,
I believe) to send all data in UTF-8 or some other properly
detectable form, some (many?) clients seem to send data in a
non-auto-detectable, platform-local encoding, thus breaking
interoperability.
You can use an extended regexp to name the agent.
Note that you should never use ".*" as the agent name;
use DefaultClientEncoding instead for that case.
=item DefaultClientEncoding <encoding> [<encoding> ...]
This directive sets the default set of encoding(s) to expect
from clients in general. Note that you do not need
to specify "UTF-8", as that is the implicit default.
=item NormalizeUsername (on|off)
This directive was introduced to support the behavior of Windows XP
when accessing a password-protected resource. For some reason,
it prepends "hostname\" to the real username, and no server can
handle such an extension. Enabling this option strips off the "hostname\"
part, so only the "real" username is passed to the authentication module.
=back
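As a rough illustration of how these directives fit together, here is a minimal Python sketch (it is not the module's actual C code). The table contents mirror the SYNOPSIS; the names CLIENT_ENCODINGS, DEFAULT_ENCODINGS, candidate_encodings and normalize_username, the data layout, and the order in which candidates are combined are all assumptions made for this example.

```python
import re

# Hypothetical in-memory form of the AddClientEncoding lines from the SYNOPSIS.
# Python's "re" stands in for the extended regexp the directive accepts.
CLIENT_ENCODINGS = [
    (re.compile(r"cadaver/"), ["EUC-JP"]),
]

# Hypothetical in-memory form of the DefaultClientEncoding line.
DEFAULT_ENCODINGS = ["JA-AUTO-SJIS-MS", "SJIS"]


def candidate_encodings(user_agent):
    """Return the encodings to expect from a client, in the order to try them.

    UTF-8 comes first because it is the implicit default; the rest of the
    ordering is an assumption of this sketch.
    """
    candidates = ["UTF-8"]
    for pattern, encodings in CLIENT_ENCODINGS:
        if pattern.search(user_agent):
            candidates.extend(encodings)
    candidates.extend(DEFAULT_ENCODINGS)
    return candidates


def normalize_username(username):
    """Strip a leading 'hostname\\' part, keeping only the "real" username."""
    return username.rsplit("\\", 1)[-1]
```

With this sketch, a User-Agent beginning with "cadaver/" would get EUC-JP added to its candidates ahead of the defaults, and a username sent as "hostname\user" would reach the authentication step as "user".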
=head1 SUPPORTED ENCODINGS
This module supports all encodings supported by the underlying
iconv(3) implementation. You might want to try "iconv -l",
as it may give you the list of encoding names.
Also, if you have installed and linked the iconv_hook extension,
you should be able to use the following additional encoding names:
MSSJIS
- This is almost the same as SJIS, but is a Microsoft variant of it.
JA-AUTO-SJIS-MS
- This is a special converter which autodetects among
UTF-8/JIS/MSSJIS/SJIS/EUC-JP. By itself, it does not do conversion.
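One simple way to picture autodetection among several encodings is trial decoding: try each candidate strictly, in order, and accept the first that decodes without error. The Python sketch below shows only that general idea and is not how iconv_hook implements JA-AUTO-SJIS-MS. The function name, the candidate order, and the use of Python's "cp932" and "euc_jp" codecs as stand-ins for MSSJIS and EUC-JP are assumptions. 7-bit JIS is left out because its bytes also pass as valid UTF-8, so plain trial decoding cannot tell the two apart.

```python
def detect_and_decode(raw, candidates=("utf-8", "euc_jp", "cp932")):
    """Return (encoding, text) for the first candidate that decodes raw.

    Decoding is strict, so a candidate is rejected as soon as it meets a
    byte sequence that is invalid in that encoding.
    """
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")
```

Trial decoding can still guess wrong on short or ambiguous input, which is one reason listing the expected encodings per client is useful.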
=head1 INFORMATIONAL NOTES
This is an informational note for developers.
Today, as people around the world start to exchange information in
many languages, many protocols are now required to be i18n
(internationalization) compliant and are beginning to take this into
consideration. WebDAV is no exception.
WebDAV selected XML as its data exchange format, and XML is
one of the most well-defined formats with respect to this i18n issue.
However, because past standards/implementations (e.g. DNS, HTTP,
etc.) did not care much about this issue, many HTTP/WebDAV
clients seem to break when used in an i18n (= non-ASCII) environment.
For WebDAV, there is usually no problem with the XML content part.
But the HTTP header part seems to be broken in many implementations.
Here, I will describe several frequently observed problems along with
possible solution(s).
[Problem in PUT operation]
Consider the situation where a WebDAV client tries to save a
file which has a non-ASCII filename. First, it sends the filename
in the HTTP header:
    PUT /<non-ascii-filename> HTTP/1.1
The current standard only asks clients not to use non-ASCII encoding
in the HTTP header. So many clients simply url-escape the name in
%xx style and get away with it (some don't even bother to
escape, which is really broken).
Now, when the server receives the PUT request, it unescapes (if escaped)
the given filename and then saves the file under that name. This is the
first point of interoperability trouble.
As the server has no idea what encoding or charset this filename belongs
to, it cannot apply the proper conversion to make sure all filenames are
aligned to the encoding or charset supported by the server-side filesystem.
Without information on the filename's charset and encoding, even a simple
"ls" or "PROPFIND" is most likely to generate unreadable garbage.
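To make the ambiguity concrete, the following Python sketch url-escapes the same filename as two different clients might, then unescapes it as a server would. The helper names are hypothetical, and Python's "cp932" codec is used here as a stand-in for a Windows platform-local Japanese encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def escaped_path(name, client_encoding):
    """Build the request path as a client using client_encoding might."""
    return "/" + quote(name.encode(client_encoding))


def unescape(path):
    """Undo the %xx escaping, as the server does before saving the file."""
    return unquote_to_bytes(path)


def is_valid_utf8(raw):
    """Report whether the unescaped bytes happen to be well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For the name "日本語.txt", escaped_path(name, "cp932") gives "/%93%FA%96%7B%8C%EA.txt" while escaped_path(name, "utf-8") gives "/%E6%97%A5%E6%9C%AC%E8%AA%9E.txt". Both are pure ASCII on the wire, and nothing in the request says which encoding the bytes behind the escapes are in.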
[Problem in PROPFIND operation]
The next interoperability problem arises when the response to a PROPFIND
request is sent. As the server has no idea of the charset and encoding
used, all names will be included in the XML-formatted response
without (or with improper) encoding information.
As defined by the XML spec, this causes the XML parser used
on the client side to abort. Even if it didn't (which would mean a
non-compliant parser), the chance of the client showing/handling the
filename correctly is slim.
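The parser failure can be reproduced with a short Python sketch. The one-element fragment below is a simplified stand-in for a real PROPFIND response body, the helper names are hypothetical, and "cp932" again stands in for a platform-local encoding.

```python
import xml.etree.ElementTree as ET


def propfind_fragment(raw_name):
    """Embed raw filename bytes in XML that carries no encoding declaration."""
    return b'<D:href xmlns:D="DAV:">/' + raw_name + b"</D:href>"


def parses(document):
    """Report whether a conforming XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False
```

Without an encoding declaration the parser assumes UTF-8, so a fragment built from UTF-8 bytes is accepted, while one built from the cp932 bytes of the same name is rejected as not well-formed.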
[Problem in MOVE operation]
Another problem arises when a client tries to rename a file.
To rename a file, the client sends a request in the following format:
    MOVE /<old-non-ascii-filename> HTTP/1.1
    Destination: /<new-non-ascii-filename>
This is the same problem as in the PUT operation. The client simply
passes filenames in its platform-local encoding (or a url-escaped
string of that), and the server just cannot handle it.
[Possible Solution]
The solution to this interoperability problem is rather straightforward.
Practically, there are two ways to do it:
1. Always pass charset information. You can also use a charset
encoding scheme which contains such information by default.
With proper charset information, both client and server
can interoperate safely by converting encodings whenever
needed.
2. Always use a single charset encoding which can describe any
character in any language.
Though the current Unicode standard has many pitfalls and has been
criticized by many people working on i18n issues, this is
obviously the faster way because you don't have to know anything
about charset encodings.
While XML chose to support both schemes, many protocols (DNS,
HTTP, etc.) seem to be going with method #2.
So if you're implementing a WebDAV client/server and want a
quick hack to support i18n filenames, try
a. Always convert outgoing strings to UTF-8 (and url-escape
them whenever needed).
b. Always convert incoming strings to the platform-local encoding.
If both client and server do the same, they will be interoperable.
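Steps a and b can be sketched in a few lines of Python. This is only an illustration of the quick hack described above; the function names are hypothetical, and the local encodings are passed in explicitly because how a program learns its platform-local encoding is outside the scope of this note.

```python
from urllib.parse import quote, unquote_to_bytes


def outgoing(name_bytes, local_encoding):
    """Step a: platform-local bytes -> UTF-8 -> url-escaped string."""
    text = name_bytes.decode(local_encoding)
    return quote(text.encode("utf-8"))


def incoming(escaped, local_encoding):
    """Step b: url-escaped UTF-8 string -> platform-local bytes."""
    text = unquote_to_bytes(escaped).decode("utf-8")
    return text.encode(local_encoding)
```

A sender whose local encoding is cp932 and a receiver whose local encoding is EUC-JP then agree on the name: outgoing() always puts the same UTF-8 escapes on the wire, and incoming() turns them into whatever bytes the receiving filesystem expects.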
=cut