=================================
Adding Support for a New Language
=================================
Scope
-----
This document describes how to add support for a new language to
language-env.
Most users will not need to read this document. It is intended for
developers who want to add a new language to language-env.
Introduction
------------
language-env is a package that sets up basic dotfiles for various
native languages. For example, it sets the LANG variable in .bashrc
and .cshrc, sets fonts in .Xresources so that native characters
can be displayed correctly, and sets up input servers, clients and XIM
interfaces so that native characters can be input from various
programs.
To add support for a new language, you have to write:
* support.<language>.pl,
* <language>/dot.* and/or <language>/dot.*.pl,
* documents, and
* an *.xbm file for tklanguage.
When a user invokes 'set-language-env', the script that sets up a
native language environment, a menu is displayed for selecting a
language. This selection cannot be automated because there is no
information, such as the LANG environment variable, that tells which
language the user wants. (Since setting LANG is set-language-env's own
responsibility, set-language-env cannot expect LANG to be already set
to a proper value.)
Features of language-env
------------------------
Before describing how to add support for a new language, this section
briefly explains the policy of language-env.
Unlike user-{es,de}, language-env does not touch the /etc directory.
There are a few reasons why language-env modifies personal dotfiles
instead of machine-wide /etc files:
1. Users from two or more countries may use one machine.
2. It is safer, since only the invoking user's own files are changed.
One weak point is that the settings do not take effect automatically.
Each user needs to invoke set-language-env to set up his/her own
environment, even if all users speak the same language or there is
only one user. However, the administrator can help by running
set-language-env for a user when executing adduser, and
set-language-env can even be invoked from
/usr/local/sbin/adduser.local. See adduser(8) for more details.
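As a sketch of the adduser.local approach: adduser(8) runs /usr/local/sbin/adduser.local, if it exists, with the arguments username, uid, gid and home directory. The Perl script below is a hypothetical example, not part of language-env. It assumes that running set-language-env interactively as the new user through su(1) is acceptable while adduser runs, and the helper name setup_command is made up for illustration.

```perl
#!/usr/bin/perl
# Hypothetical /usr/local/sbin/adduser.local (not shipped with language-env).
# adduser(8) passes: username uid gid home-directory.
use strict;
use warnings;

# Build the command that runs set-language-env as the new user
# (a login shell via su, so the user's own HOME and dotfiles are used).
sub setup_command {
    my ($user) = @_;
    return ('su', '-', $user, '-c', '/usr/bin/set-language-env');
}

if (@ARGV == 4) {
    my ($user) = @ARGV;
    # set-language-env is interactive, so this only makes sense when
    # adduser itself is run from a terminal.
    system(setup_command($user)) == 0
        or warn "set-language-env for $user failed (status $?)\n";
}
```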
Since 'set-language-env' runs with ordinary users' privileges
(without root privileges), it cannot install additional Debian
packages needed for the language environment (for example, the
locale-ja package for Japanese). This is another weak point of this
approach. However, 'set-language-env' shows a list of suggested
packages, so the user can ask the administrator of the machine
to install them.
Native Character Environment
----------------------------
During execution, set-language-env checks whether the characters used
in the specified language can be displayed. set-language-env outputs
many messages, and of course it is desirable for these messages to be
written in the native language rather than in English (for
non-English-speaking people; English-speaking people will not need
this software!).
However, if the native characters used in the language cannot be
displayed, messages should be displayed in an ASCII representation of
that language or in English. Consider what happens if set-language-env
displays native characters (such as kanji for Japanese, u with umlaut
for German, n with tilde for Spanish, and Cyrillic and Greek characters
for Russian and Greek) in an environment that cannot display them: a
meaningless string of characters is displayed. This situation is FAR
WORSE than English messages, because you can read English messages
with a dictionary, but you can NOT read a meaningless string of
characters AT ALL.
Here we call an environment where native characters can be displayed
a 'Native Character Environment'. For example, xterm can display
ISO-8859-1 characters and kterm can display Japanese characters.
support.<language>.pl has to contain a subroutine that handles this
environment (explained later).
Native Character Codeset for Display and for Source Code
--------------------------------------------------------
Some languages have multiple codesets for expressing the same text.
For example, Japanese text can be expressed in ISO-2022-JP, EUC-JP,
or Shift-JIS. For such languages, the native character codeset used
for display and the one used for source code may differ.
For example, the byte 0x5c ('\') may appear as part of a Japanese
character encoded in ISO-2022-JP or Shift-JIS, so these codesets
should be avoided for source code. EUC-JP, whose multibyte characters
use only bytes of 0x80 and above, is therefore suitable for source
code. ISO-2022-JP, on the other hand, is suitable for display because
it is the least likely to be detected wrongly.
A subroutine sourceset2displayset($) has to be supplied in
support.<language>.pl to convert strings from the source codeset to
the display codeset.
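For Japanese, with EUC-JP source files and ISO-2022-JP display, sourceset2displayset() could be sketched with Perl's Encode module as below. Encode is a modern choice used here only for illustration; it is not necessarily what language-env itself uses, and this is not the package's actual code.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical sourceset2displayset() for Japanese: support files are
# written in EUC-JP, messages are shown in ISO-2022-JP.
sub sourceset2displayset($) {
    my ($str) = @_;                           # work on a copy
    from_to($str, 'euc-jp', 'iso-2022-jp');   # converts $str in place
    return $str;
}
```

Pure ASCII strings pass through unchanged, while Japanese characters come out wrapped in ISO-2022-JP escape sequences, which contain no bytes of 0x80 or above.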
Files for language-env
----------------------
language-env consists of the following files.
* /usr/bin/set-language-env
The main script, written in Perl.
* /usr/share/language-env/general.pl
Contains subroutines which can be used in <language>/dot.*.pl.
* /usr/share/language-env/<language>/dot.*
Contents to be added to users' dotfiles. For example,
'korean/dot.bashrc' contains settings that should be written
to ~/.bashrc of a Korean user. This is the most important part
of the language-env package.
* /usr/share/language-env/<language>/dot.*.pl
Same as <language>/dot.* without .pl, but the contents are generated
by the output of a Perl script. For example, 'russian/dot.cshrc.pl'
outputs settings that should be written to ~/.cshrc of a Russian
user.
In the script, $DOTFILECONTENTS1 and $DOTFILECONTENTS2 can be
referred to. These variables hold the contents placed before and
after the contents generated by the script. They come from the
original contents of the dotfile.
* /usr/share/language-env/support.<language>.pl
Contains language-specific settings such as subroutines to detect
the Native Character Environment, translated messages, and so on.
If you want to add support for a new language, you have to write
this file using support.language.pl.template.
This file has to respect the guidelines given in the
'support.<language>.pl' section below.
* /usr/share/language-env/support.language.pl.template
A template for new support.<language>.pl files. This file is also
used to supply default settings when a support.<language>.pl lacks
some of them.
* /usr/bin/tklanguage
A wrapper for set-language-env written in Tcl/Tk.
* /usr/share/language-env/<language>.xbm
Images of language names written in native characters.
Most images are drawn using the 'etl' series fonts, which are
included in the xfonts-intl-* packages.
Packages
--------
language-env uses the following packages (in Perl's sense of the word).
* main::
Needless to say.
* Sub::
<language>/dot.*.pl files are executed in the Sub:: package. The
subroutines in general.pl are also defined in the Sub:: package, so
they can be used in <language>/dot.*.pl files. initialize() in
support.<language>.pl can set any variables in the Sub:: package for
use by <language>/dot.*.pl files.
* Lang::
support.<language>.pl is executed in the Lang:: package. This package
is used only to separate it from main:: and avoid name space
collisions. If you are adding support for a new language, you don't
need to care about this package.
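The following standalone Perl sketch imitates this arrangement: initialize(), running in Lang::, sets a variable in Sub::, and code running in Sub:: (standing in for a dot.*.pl script) reads it without qualification. The variable $Sub::inputmethod and the function dot_xsession_body are hypothetical; in language-env the dot.*.pl files are separate scripts, not subroutines.

```perl
use strict;
use warnings;

package Sub;
our $inputmethod;          # hypothetical variable read by dot.*.pl code

package Lang;
# As in support.<language>.pl: initialize() runs in Lang:: but may
# set variables in Sub:: for later use.
sub initialize {
    $Sub::inputmethod = 'kinput2';
}

package Sub;
# Stand-in for a dot.xsession.pl script, which runs in Sub:: and so
# can use $inputmethod unqualified.
sub dot_xsession_body {
    return "XMODIFIERS=\@im=$inputmethod\nexport XMODIFIERS\n";
}

package main;
Lang::initialize();
print Sub::dot_xsession_body();
```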
Subroutines supplied by general.pl
----------------------------------
The following subroutines are supplied by general.pl and defined in
the Sub:: package. They can be used from <language>/dot.*.pl
(directly) and from support.<language>.pl (with the Sub:: prefix).
* isinstalled($)
Check whether the specified Debian package is installed or not.
* addlist($)
Add the specified package name to the list of required packages.
The list is displayed at the end of set-language-env.
Specifications using '|' or virtual packages are not supported yet.
* disp($$)
Display a message according to the Native Character Environment.
The first parameter is a message written in ASCII.
The second parameter is a message written in the codeset for source code.
* select($$$$)
Display a message and read an integer. Read the source code for
details.
The first two parameters are messages written in ASCII and in the
native codeset. The third parameter is the maximum value that can be
entered and the fourth is the default value used when only the Enter
key is pressed.
* yesno($$)
Display a message and read yes or no. The default is yes.
The two parameters are messages written in ASCII and in the native codeset.
* noyes($$)
Display a message and read yes or no. The default is no.
The two parameters are messages written in ASCII and in the native codeset.
* ask($$)
Display a message and read a string.
The two parameters are messages written in ASCII and in the native codeset.
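yesno() and noyes() differ only in their default. A hypothetical helper like the one below could interpret one line of input using the $yes_upper, $yes_lower, $no_upper and $no_lower variables that support.<language>.pl defines (see the support.<language>.pl section). The French sample values and the name interpret_answer are assumptions; this is not general.pl's actual code.

```perl
use strict;
use warnings;

# Hypothetical values, as a French support.<language>.pl might set them.
our ($yes_upper, $yes_lower, $no_upper, $no_lower) = ('O', 'o', 'N', 'n');

# Interpret one input line for yesno() ($default = 1) or noyes() ($default = 0).
# Returns 1 for yes, 0 for no, or undef (ask again) for anything else.
sub interpret_answer {
    my ($answer, $default) = @_;
    $answer = '' unless defined $answer;
    $answer =~ s/^\s+|\s+$//g;             # drop the newline and stray blanks
    return $default if $answer eq '';      # only Enter was pressed
    my $first = substr($answer, 0, 1);
    return 1 if $first eq $yes_upper || $first eq $yes_lower;
    return 0 if $first eq $no_upper  || $first eq $no_lower;
    return undef;                          # unrecognized answer
}
```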
support.<language>.pl
---------------------
First, you have to write this file. <language> is the name of your
new language. This name has to be expressed in the ASCII codeset (NOT
ISO-8859-1!!!!) and can be any string you want.
The 'support.language.pl.template' file can be used as a template
for your support.<language>.pl file.
The first line of this file must specify the name of the language,
followed by "#". This name is used in the language list displayed
when set-language-env is invoked.
This file must contain the following subroutines:
* isNC($$$)
Check for the Native Character Environment.
Return 1 for a Native Character Environment and 0 for a non-Native
Character (ASCII) Environment.
isNC() has to try to establish a native character environment,
for example, by invoking 'kterm' or 'grxvt'.
* initialize()
Anything needed for initialization can be written here.
For example, initialize() can ask which input method the user
prefers, or add required Debian packages to the list using the
'addlist' subroutine. Any variables in the Sub:: package can be set
here for use in <language>/dot.*.pl.
* sourceset2displayset($)
Convert a string from the source codeset into the display codeset.
If your language doesn't need this distinction, this subroutine can
simply return its parameter.
* analcode($)
Check the codeset of the given string(s). If your language does
not use multiple codesets, this subroutine may do nothing.
You can choose the meaning of the return value, because the return
value is only used as a parameter for the convcode() subroutine.
* convcode($$)
Convert the given string (the first parameter) from its codeset
into the given codeset (the second parameter). If your language does
not use multiple codesets, this subroutine simply returns the given
string.
analcode() is used to detect the codeset of existing dotfiles, and
convcode() is used to convert the contents to be added into the same
codeset as the existing dotfiles.
You can use the Sub::* subroutines (supplied by general.pl) in these
subroutines.
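Putting the required subroutines together, a minimal support.<language>.pl for a language that uses a single codeset might look like the Perl sketch below. It is not the content of support.language.pl.template. The $TERM-based isNC() test is an assumption (a real isNC() may also try to start a suitable terminal, as described above), $Sub::charset is a made-up variable, the first line naming the language is omitted (follow the template for its format), and the %messages and yes/no variables described next are left out.

```perl
# Hypothetical minimal support file for a language written only in
# ISO-8859-1 (single codeset). Not taken from language-env.
use strict;
use warnings;

package Sub;
our $charset;    # hypothetical variable for <language>/dot.*.pl files

package Lang;

# Return 1 in a Native Character Environment, 0 otherwise.
# Assumption: a $TERM of xterm* or linux can show ISO-8859-1; the three
# parameters are ignored because their meaning is not described here.
sub isNC($$$) {
    my $term = defined $ENV{TERM} ? $ENV{TERM} : '';
    return $term =~ /^(?:xterm|linux)/ ? 1 : 0;
}

sub initialize() {
    $Sub::charset = 'ISO-8859-1';
}

# One codeset for both source and display: nothing to convert.
sub sourceset2displayset($) {
    return $_[0];
}

# Only one codeset exists, so the return value is a fixed tag.
sub analcode($) {
    return 'latin1';
}

# Nothing to convert: return the string unchanged.
sub convcode($$) {
    my ($str, $code) = @_;
    return $str;
}

1;    # true value, in case the file is loaded with require
```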
support.<language>.pl also has to contain the following variables.
* %messages
This hash contains translated messages for the set-language-env
script, like a 'catalog' file for gettext. The KEYs of this hash are
the messages written in English (like 'msgid' in gettext) and the
VALUEs are the translated messages. Each VALUE should contain two
messages, one written in the ASCII codeset and one in the native
codeset, separated by "\000". For example,
%messages = ( "Spanish" => "Espanol\000Espa\361ol" );
where "\361" is 'n' with tilde in ISO-8859-1. This displays 'n' with
tilde if the environment can display it and a plain 'n' if the
environment cannot display ISO-8859-1 (provided, of course, that
isNC() is implemented properly). The second message, written in the
native codeset, can be omitted if it is not needed. For example:
%messages = (
"French" => "Francais\000Fran\347ais" ,
"Hello" => "Bonjour"
);
A script that checks whether the translations are up to date is
supplied; run './check_translation'.
* $yes_upper, $yes_lower, $no_upper, $no_lower
Characters that users type to answer 'yes' and 'no'.
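To illustrate the "\000" convention in %messages, the sketch below picks the native half of a value when the Native Character Environment is available and the ASCII half otherwise. Falling back to the English key for an untranslated message is an assumption of this sketch, and pick_message is a made-up name; set-language-env's real lookup may differ.

```perl
use strict;
use warnings;

our %messages = (
    "French" => "Francais\000Fran\347ais",
    "Hello"  => "Bonjour",
);

# Hypothetical lookup: $native_ok would come from isNC().
sub pick_message {
    my ($msgid, $native_ok) = @_;
    # Untranslated message: fall back to the English key (assumption).
    my $value = exists $messages{$msgid} ? $messages{$msgid} : $msgid;
    my ($ascii, $native) = split /\000/, $value, 2;
    return ($native_ok && defined $native) ? $native : $ascii;
}
```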
<language>/dot.*, <language>/dot.*.pl
-------------------------------------
Next, you have to write <language>/dot.* and/or <language>/dot.*.pl.
'<language>/dot.*' files are templates for dotfiles.
When 'set-language-env' is invoked, the main part of each of these
files is added to the corresponding dotfile, surrounded by delimiter
lines.
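The exact delimiter lines are not specified in this document. The Perl sketch below uses made-up marker text (' language-env begin' and ' language-env end' after the comment character) to show one way to add or refresh such a block, placing a new block at the top or bottom as selected by the third control column described below. Treating the text before and after an existing block as the counterpart of $DOTFILECONTENTS1 and $DOTFILECONTENTS2 is also an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical marker text written after the comment character;
# the real delimiter lines used by set-language-env may differ.
my $BEGIN_MARK = ' language-env begin';
my $END_MARK   = ' language-env end';

# Split a dotfile around an existing block: returns (before, after),
# or an empty list if there is no block yet.
sub split_dotfile {
    my ($existing, $comment) = @_;
    my $begin = quotemeta($comment . $BEGIN_MARK);
    my $end   = quotemeta($comment . $END_MARK);
    return ($1, $2) if $existing =~ /\A(.*?)^$begin\n.*?^$end\n(.*)\z/ms;
    return;
}

# Add (or refresh) the block; $top is true for 's' in the third column.
sub update_dotfile {
    my ($existing, $comment, $body, $top) = @_;
    $body .= "\n" if length($body) && $body !~ /\n\z/;
    my $block = "$comment$BEGIN_MARK\n$body$comment$END_MARK\n";
    if (my ($before, $after) = split_dotfile($existing, $comment)) {
        return $before . $block . $after;    # re-run: replace the old block
    }
    return $block . $existing if $top;
    $existing .= "\n" if length($existing) && $existing !~ /\n\z/;
    return $existing . $block;
}
```

Replacing an existing block in place, instead of appending another one, keeps repeated runs of set-language-env from piling up duplicate settings.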
The first line of the file has three control fields, one per column.
The first column specifies the character used for comments in the
dotfile, for example '#' for .bashrc and ';' for .emacs.
The second column specifies whether the corresponding dotfile should
be executable: 'x' means executable and ' ' means non-executable.
The third column specifies whether the addition is placed at the top
or at the bottom of the dotfile: ' ' means bottom and 's' means top.
For example, the first line of '<language>/dot.Xresources' may be
'! s' and that of '<language>/dot.xsession' may be '#x '.
The following lines are a comment that is displayed when
set-language-env is invoked. The end of the comment is marked by a
line containing 'END'. The lines after it form a second comment,
which also ends with 'END'. The first comment is written in ASCII
characters and the second in native characters.
The remaining lines are the actual contents to be added to the
corresponding dotfile.
For example, '<language>/dot.Xresources' may look like this:
1: ! s
2: This is a setting of resources for X clients.
3: END
4: This is a setting of resources for X clients.
5: END
6: Emacs.Fontset-0: -*-fixed-medium-r-normal--14-*-*-*-*-*-fontset-16
7: Emacs.Font: fontset-16
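To make the layout concrete, here is a hypothetical Perl parser for such a template. It follows only the rules stated above (three control columns, two comments each ended by 'END', then the body); the name parse_template and its returned fields are made up, and set-language-env's own parser may handle edge cases differently. A control line whose trailing blank was lost is treated as if it were padded with ' '.

```perl
use strict;
use warnings;

# Hypothetical parser for a <language>/dot.* template.
sub parse_template {
    my @lines = @_;
    chomp @lines;
    my $first   = @lines ? shift @lines : '';
    my $control = sprintf '%-3s', $first;    # pad missing columns with ' '
    my %t = (
        comment    => substr($control, 0, 1),
        executable => substr($control, 1, 1) eq 'x',
        top        => substr($control, 2, 1) eq 's',
    );
    # Two comments (ASCII, then native), each terminated by 'END'.
    for my $key (qw(ascii_comment native_comment)) {
        my @text;
        while (@lines) {
            my $line = shift @lines;
            last if $line eq 'END';
            push @text, $line;
        }
        $t{$key} = join "\n", @text;
    }
    # Everything else is the body to be added to the dotfile.
    $t{body} = join '', map { "$_\n" } @lines;
    return \%t;
}
```

A template file could be passed in with open(my $fh, '<', $path) followed by parse_template(<$fh>).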
The .pl version (<language>/dot.*.pl) generates the actual contents
to be added with Perl. Since its control line and two comments are
the same as in the non-.pl version, a .pl file is not, strictly
speaking, a Perl script.
For example, '<language>/dot.Xresources.pl' may look like this:
1: ! s
2: This is a setting of resources for X clients.
3: END
4: This is a setting of resources for X clients.
5: END
6: print <<EOF;
7: Emacs.Fontset-0: -*-fixed-medium-r-normal--14-*-*-*-*-*-fontset-16
8: Emacs.Font: fontset-16
9: EOF
(This particular example could, of course, be written without Perl.)
In the .pl version, you can refer to the $DOTFILECONTENTS1 and
$DOTFILECONTENTS2 variables. These variables hold the parts of the
original dotfile that are kept unchanged: $DOTFILECONTENTS1 is placed
before the contents that the .pl script generates and
$DOTFILECONTENTS2 after them. Changing these variables doesn't affect
the result.
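As an example of reading these variables, the script part of a hypothetical spanish/dot.bashrc.pl (control line and comments omitted) could avoid overriding a LANG setting the user already wrote outside the block. The sample values and the plain `our` declarations are assumptions made so the sketch runs on its own; set-language-env provides the real values, and es_ES is only an illustrative locale name.

```perl
use strict;
use warnings;

# Stand-ins for values set-language-env supplies to dot.*.pl scripts.
our $DOTFILECONTENTS1 = "export EDITOR=vi\n";
our $DOTFILECONTENTS2 = "";

# --- script part of a hypothetical spanish/dot.bashrc.pl ---
# Respect a LANG setting the user already has elsewhere in the file.
my $original  = $DOTFILECONTENTS1 . $DOTFILECONTENTS2;
my $generated = $original =~ /^\s*(?:export\s+)?LANG=/m
    ? "# LANG is already set elsewhere in this file; not overriding it.\n"
    : "LANG=es_ES\nexport LANG\n";
print $generated;
```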
That's all. Put these files (support.<language>.pl and
<language>/dot.* and/or <language>/dot.*.pl) in the source directory
and run 'debuild'. Then you can try your language.
You can also write a menu entry for the Debian menu system. You
should also supply documents and manpages in your language.
XBM file
--------
To keep a consistent look, please make the XBM file for your language
using the -etl-fixed-medium-r-normal--24-* fonts if available.
Please Contact Me
-----------------
If you want to add support for a new language, please contact me.
You are always welcome.
---
Tomohiro KUBOTA <kubota@debian.org> Mon, 18 Oct 1999