1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
|
[[utf-8-build]]
=== Build under UTF-8
The default locale of the build environment is *C*.
Some programs such as the *read* function of Python3 change their behavior depending on the locale.
Adding the following code to the *debian/rules* file ensures building the program under the *C.UTF-8* locale.
----
LC_ALL := C.UTF-8
export LC_ALL
----
[[utf-8-conv]]
=== UTF-8 conversion
If upstream documents are encoded in old encoding schemes, converting them to https://en.wikipedia.org/wiki/UTF-8[UTF-8] is a good idea.
Use the *iconv* command in the *libc-bin* package to convert encodings of plain text files.
----
$ iconv -f latin1 -t utf8 foo_in.txt > foo_out.txt
----
Use *w3m*(1) to convert from HTML files to UTF-8 plain text files. When you do this, make sure to execute it under UTF-8 locale.
----
$ LC_ALL=C.UTF-8 w3m -o display_charset=UTF-8 \
-cols 70 -dump -no-graph -T text/html \
< foo_in.html > foo_out.txt
----
Run these scripts in the *override_dh_** target of the *debian/rules* file.
|