File: README.i18n

=================================
Adding Support for a New Language
=================================


Scope
-----

This document describes how to add support for a new language to
language-env.

Most users will never need to read this document.  It is intended for
developers who want to add a new language to language-env.


Introduction
------------

language-env is a tool that sets up basic dotfiles for various native
languages.  For example, it sets the LANG variable in .bashrc and
.cshrc, sets fonts in .Xresources so that native characters can be
displayed correctly, and sets up the input server, clients, and XIM
interfaces so that native characters can be entered in various
applications.


To add support for a new language, you have to write

 * support.<language>.pl,
 * <language>/dot.* and/or <language>/dot.*.pl,
 * documents, and
 * a *.xbm file for tklanguage.

When a user invokes 'set-language-env', the script that sets up the
native language environment, a menu is displayed for selecting a
language.  This selection cannot be automated because nothing, such as
the LANG environment variable, tells the script which language the user
wants.  (Since it is set-language-env's responsibility to set the LANG
environment variable, set-language-env cannot expect LANG to be set
to a proper value already.)


Features of language-env
------------------------

Before describing how to add support for a new language, this section
briefly explains the policy of language-env.

Unlike user-{es,de}, language-env does not touch the /etc directory.
There are two reasons why language-env modifies personal dot-files
instead of machine-wide /etc files.

 1. Users from two or more countries may use one machine.

 2. It is safer.

One weak point is that the settings do not take effect automatically.
Each user needs to invoke set-language-env to establish his/her own
environment, even if all users speak the same language or if there is
only one user.  However, the administrator can help a user invoke
set-language-env when running adduser, or a call to set-language-env
can even be written in /usr/local/sbin/adduser.local.  Read adduser(8)
for more detail.

Since the 'set-language-env' script runs with ordinary user privileges
(without root privileges), it cannot install additional Debian packages
that the language environment needs (for example, the locale-ja package
for Japanese).  This is another weak point of this approach.  However,
'set-language-env' shows a list of suggested packages.  The user can
then ask the administrator of the machine to install these packages.


Native Character Environment
----------------------------

During execution, set-language-env takes care of whether the characters
used in the specified language can be displayed or not.  set-language-env
outputs many messages, and of course it is desirable that these messages
be written in the native language and not in English (for
non-English-speaking people; English-speaking people will not need this
software!).  However, if the native characters used in the language
cannot be displayed, messages should be displayed in an ASCII rendering
of that language or in English.  Consider what happens if
set-language-env displays native characters (such as kanji for Japanese,
u with umlaut for German, n with tilde for Spanish, or Cyrillic and Greek
characters for Russian and Greek) in an environment that cannot display
them: a meaningless row of characters appears.  This situation is
FAR WORSE than English messages, because you can read English messages
with a dictionary but you can NOT read a meaningless row of characters
AT ALL.

In this document, an environment where native characters can be
displayed is called a 'Native Character Environment'.  For example,
xterm can display ISO-8859-1 characters and kterm can display Japanese
characters.

support.<language>.pl has to have a subroutine to handle this
environment (explained later).


Native Character Codeset for Display and for Source Code
--------------------------------------------------------

Some languages have multiple codesets that express the same contents.
For example, the ISO-2022-JP, EUC-JP, and Shift-JIS codesets can all be
used to express Japanese text.  For such languages, the native
character codeset for display and the one for source code may differ.
For example, 0x5c ('\') may appear as the second byte of a Japanese
character in ISO-2022-JP and Shift-JIS, so these codesets should
be avoided for source code.  Thus EUC-JP is suitable for source
code.  However, ISO-2022-JP is suitable for display because it is
the codeset least likely to be detected wrongly.
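
The backslash problem can be checked directly.  The following is a
minimal sketch, assuming a Perl that ships the core Encode module
(language-env itself does not necessarily use Encode).  It encodes
katakana SO (U+30BD) in Shift-JIS and in EUC-JP and looks for the byte
0x5c.  The helper name has_backslash is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Katakana SO (U+30BD), a character well known for this problem.
my $char = "\x{30BD}";

my $sjis = encode('shiftjis', $char);   # bytes 0x83 0x5c
my $euc  = encode('euc-jp',   $char);   # bytes 0xa5 0xbd

# Report whether a byte string contains 0x5c ('\').
sub has_backslash {
    my ($bytes) = @_;
    return index($bytes, "\x5c") >= 0 ? 1 : 0;
}

printf "Shift-JIS: %s  contains 0x5c: %d\n",
    unpack('H*', $sjis), has_backslash($sjis);
printf "EUC-JP:    %s  contains 0x5c: %d\n",
    unpack('H*', $euc), has_backslash($euc);
```

In a Perl string literal the Shift-JIS form would end in an escape
character, which is why EUC-JP is the safer choice for source code.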

support.<language>.pl has to supply a subroutine
sourceset2displayset($) that converts from the source codeset to the
display codeset.
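
As an illustration, such a subroutine might look like the sketch below
for Japanese.  It assumes EUC-JP as the source codeset and ISO-2022-JP
as the display codeset, as suggested above, and it assumes that the
core Encode module is available.  It is not the code that language-env
actually ships.

```perl
use strict;
use warnings;
use Encode ();

# Convert a byte string from the source codeset (assumed EUC-JP here)
# to the display codeset (assumed ISO-2022-JP here).
sub sourceset2displayset($) {
    my ($text) = @_;
    # from_to() converts in place; $text is already a private copy.
    Encode::from_to($text, 'euc-jp', 'iso-2022-jp');
    return $text;
}

# Hiragana A is 0xa4 0xa2 in EUC-JP.
my $display = sourceset2displayset("\xa4\xa2");
print unpack('H*', $display), "\n";
```

The result contains only 7-bit bytes, with escape sequences that switch
between ASCII and the Japanese character set.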


Files for language-env
----------------------

language-env consists of the following files.

 * /usr/bin/set-language-env
   The main script file, written in Perl.

 * /usr/share/language-env/general.pl
   Contains subroutines which can be used in <language>/dot.*.pl.

 * /usr/share/language-env/<language>/dot.*
   Contents to be added to users' dot-files.  For example,
   'korean/dot.bashrc' will contain settings which should be written
   in ~/.bashrc of a Korean user.  This is the most important part 
   of language-env package.

 * /usr/share/language-env/<language>/dot.*.pl
   Same as <language>/dot.* without .pl, but the contents are given
   by the output of the Perl script.  For example, 'russian/dot.cshrc.pl'
   will output settings which should be written in ~/.cshrc of a Russian
   user.
   In the script, $DOTFILECONTENTS1 and $DOTFILECONTENTS2 can be
   referred to.  These variables hold the contents to be added before
   and after the contents generated by the script.  The contents come
   from the original contents of the dotfiles.

 * /usr/share/language-env/support.<language>.pl
   Contains language-specific settings, such as subroutines to
   detect the Native Character Environment, translated messages, and
   so on.  If you want to add support for a new language, you have to
   write this file, using support.language.pl.template as a base.
   This file has to respect the guidelines given in the
   'support.<language>.pl' section below.

 * /usr/share/language-env/support.language.pl.template
   A template file for a new support.<language>.pl.  This file is also
   used for default settings when support.<language>.pl lacks some
   settings.

 * /usr/bin/tklanguage
   A wrapper for set-language-env written in Tcl/Tk.

 * /usr/share/language-env/<language>.xbm
   Image files for language names written in native characters.
   Most images are drawn using the 'etl' series of fonts, which are
   included in the xfonts-intl-* packages.


Packages
--------

language-env uses the following packages, in the Perl sense of the word.

 * main::
   Needless to say.

 * Sub::
   <language>/dot.*.pl's are executed in Sub:: package.  Subroutines
   in general.pl are also defined in Sub:: package and these subroutines
   can be used in <language>/dot.*.pl's.  initialize() in 
   support.<language>.pl can set any variables in Sub:: package to be
   used by <language>/dot.*.pl's.

 * Lang::
   support.<language>.pl is executed in Lang:: package.  This package
   is used only to separate it from main:: and to avoid name space
   collisions.  If you are adding support for a new language, you
   don't need to care about this package.
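
The way a value travels from initialize() to a dot.*.pl file can be
sketched as follows.  This is only an illustration that runs on its
own: the variable name $INPUT_SERVER, the subroutine dot_bashrc_body,
and the generated line are made up, and the real set-language-env
loads the two files itself instead of having them in one script.

```perl
use strict;
use warnings;

# Stand-in for part of support.<language>.pl, which runs in Lang::.
package Lang;

sub initialize {
    # $Sub::INPUT_SERVER is a hypothetical variable name.
    $Sub::INPUT_SERVER = 'kinput2';
    return;
}

# Stand-in for the body of a <language>/dot.*.pl, which runs in Sub::.
package Sub;

our $INPUT_SERVER;

sub dot_bashrc_body {
    # Inside Sub:: the variable needs no package qualifier.
    return "export XMODIFIERS=\@im=$INPUT_SERVER\n";
}

package main;

Lang::initialize();
print Sub::dot_bashrc_body();
```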


Subroutines supplied by general.pl
----------------------------------

The following subroutines are supplied by general.pl and belong to the
Sub:: package.  They can be used from <language>/dot.*.pl (directly)
and from support.<language>.pl (with the Sub:: prefix).

 * isinstalled($)
   Check whether the specified Debian package is installed or not.
 * addlist($)
   Add the specified package name to the list of required packages.
   The list is displayed at the end of set-language-env.
   Specification using '|' or virtual packages is not supported yet.
 * disp($$)
   Display a message according to the native character environment.
   The 1st parameter is a message written in ASCII.
   The 2nd parameter is a message written in the codeset for source code.
 * select($$$$)
   Display a message and read an integer.  Read the source code for
   details.
   Two of the parameters are messages written in ASCII and in the
   native codeset.  The third parameter is the maximum value that can
   be entered and the fourth is the default value used when only the
   Enter key is pressed.
 * yesno($$)
   Display a message and read a yes-or-no answer.  Default is yes.
   The two parameters are messages written in ASCII and in the native
   codeset.
 * noyes($$)
   Display a message and read a yes-or-no answer.  Default is no.
   The two parameters are messages written in ASCII and in the native
   codeset.
 * ask($$)
   Display a message and read a string.
   The two parameters are messages written in ASCII and in the native
   codeset.
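
A typical calling pattern for isinstalled() and addlist() is sketched
below.  So that the example runs on its own, the two subroutines are
replaced by simple stand-ins; %INSTALLED and @REQUIRED are made-up
names, and the real implementations in general.pl differ.

```perl
use strict;
use warnings;

# Stand-ins for general.pl, so that the sketch runs on its own.
package Sub;

our %INSTALLED = (xterm => 1);   # hypothetical test data
our @REQUIRED;                   # hypothetical required package list

sub isinstalled($) { my ($pkg) = @_; return $INSTALLED{$pkg} ? 1 : 0; }
sub addlist($)     { my ($pkg) = @_; push @REQUIRED, $pkg; return; }

package main;

# Calling pattern as it could appear in support.<language>.pl:
# suggest every needed package that is not installed yet.
for my $pkg (qw(xterm kterm kinput2)) {
    Sub::addlist($pkg) unless Sub::isinstalled($pkg);
}
print "Suggested packages: @Sub::REQUIRED\n";
```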


support.<language>.pl
---------------------

First, you have to write this file.  <language> is a name for your
new language.  This name has to be written in the ASCII codeset (NOT
ISO-8859-1!!!!) and can be any string you want.
The 'support.language.pl.template' file can be used as a template
for your support.<language>.pl file.

The first line of this file must specify the name of the language,
followed by "#".  This name is used for the language list displayed
when set-language-env is invoked.

This file must contain the following subroutines:

 * isNC($$$)
   Check for a Native Character Environment.
   Return 1 for a Native Character Environment and 0 for a non-Native
   Character (ASCII) Environment.
   isNC() has to try to establish a native character environment,
   for example, by invoking 'kterm' or 'grxvt'.

 * initialize()
   Anything needed for initialization can be written here.
   For example, initialize() can ask which input method the user
   prefers or add required Debian packages to a list using 'addlist'
   subroutine.   Any variables in Sub:: package can be set here to 
   be used in <language>/dot.*.pl.

 * sourceset2displayset($)
   Convert a string from source codeset into display codeset.
   If your language doesn't need this distinction, this subroutine can
   return the parameter.

 * analcode($)
   Check the codeset of the given string(s).  If your language does
   not use multiple codesets, you may leave this subroutine empty.
   You are free to decide the meaning of the return value, because
   the return value is used as a parameter for the convcode()
   subroutine.

 * convcode($$)
   Convert the given string (the 1st parameter) from its codeset
   into the given codeset (the 2nd parameter).  If your language does
   not use multiple codesets, this subroutine returns the given string
   unchanged.
   analcode() is used to check the codeset of existing dot-files and
   convcode() is used to convert the contents to be added into the same
   codeset as the existing dot-files.

You can use Sub::* subroutines (supplied in general.pl) in these 
subroutines.
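
For a language that needs only one codeset, the five subroutines could
be as small as in the following sketch.  Every body is a placeholder:
in particular, this isNC() ignores its arguments and always reports an
ASCII-only environment, which a real implementation should not do.
The required first line of the file and the variables are left out.

```perl
use strict;
use warnings;

sub isNC($$$) {
    # The three arguments are ignored in this sketch.  A real isNC()
    # should try to establish a native character environment first.
    return 0;    # 0 = ASCII only, 1 = native characters can be shown
}

sub initialize() {
    # Ask questions or call Sub::addlist() here if needed.
    return;
}

sub sourceset2displayset($) {
    my ($text) = @_;
    return $text;    # source and display codesets are the same
}

sub analcode($) {
    return '';       # only one codeset, so nothing to detect
}

sub convcode($$) {
    my ($text, $codeset) = @_;
    return $text;    # only one codeset, so nothing to convert
}

# How the last two fit together: detect the codeset of the existing
# dot-file contents, then convert the addition to match it.
my $existing = "LANG=C\n";
my $addition = "export LANG\n";
print $existing, convcode($addition, analcode($existing));
```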

support.<language>.pl also has to contain the following variables.

 * %messages
   This hash contains translated messages for set-language-env script.  
   This is like 'catalog' file for gettext.  The KEYs of this hash 
   variable contain the messages written in English (like 'msgid' 
   for gettext) and the VALUEs of this hash contain the translated
   messages.  Each VALUE should contain two messages written in ASCII
   codeset and in native codeset.  "\000" is used to separate these
   two messages.  For example,

   %messages = ( "Spanish" => "Espanol\000Espa\361ol" );

   where "\361" is 'n' with tilde in ISO-8859-1.  This will display
   'n' with tilde if the environment can display it and an ordinary 'n'
   if the environment cannot display ISO-8859-1 (provided, of course,
   that isNC() is implemented properly).  The second message, written
   in the native codeset, can be omitted if you don't need it.  Thus
   the following is also valid:

   %messages = (
     "French" => "Francais\000Fran\347ais" ,
     "Hello"  => "Bonjour"
   );

   A script that checks whether the translation is up to date
   is supplied.  Type './check_translation'.

 * $yes_upper, $yes_lower, $no_upper, $no_lower
   Characters to be used for users' answer of 'yes' and 'no'.
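
The way a %messages value is split into its ASCII and native forms can
be sketched as follows.  The subroutine name pick_message and the
fallback to the English key for untranslated messages are assumptions
made for this example; set-language-env may handle these cases
differently.

```perl
use strict;
use warnings;

our %messages = (
    "Spanish" => "Espanol\000Espa\361ol",
    "Hello"   => "Hola",
);

# Pick the ASCII or the native form of a translated message.
# $native_ok stands for the result of isNC().
sub pick_message {
    my ($msgid, $native_ok) = @_;
    my $value = $messages{$msgid};
    # Assumption: an untranslated message is shown in English.
    return $msgid unless defined $value;
    my ($ascii, $native) = split /\000/, $value, 2;
    # Fall back to the ASCII form if no native form was given.
    return ($native_ok && defined $native) ? $native : $ascii;
}

print pick_message("Spanish", 0), "\n";
print pick_message("Hello", 1), "\n";
```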
   


<language>/dot.*, <language>/dot.*.pl
-------------------------------------

Next you have to write <language>/dot.* and/or <language>/dot.*.pl.

'<language>/dot.*' files are templates for dot-files.  
When 'set-language-env' is invoked, the main part of each file
is added to the corresponding dot-file, surrounded by delimiter
lines.

The first line of the file has three control fields.

The first column of the first line specifies the character used for
comments in the dot-file.  For example, '#' for .bashrc and ';'
for .emacs.

The second column of the first line specifies whether the corresponding
dot-file should be executable or not.  'x' means executable and ' '
means non-executable.

The third column of the first line specifies whether the addition
is made at the top or at the bottom of the dot-file.  ' ' means
bottom and 's' means top.

For example, the first line of '<language>/dot.Xresources' may be
'! s' and that of '<language>/dot.xsession' may be '#x '.

The lines after the first are a comment which is displayed when
set-language-env is invoked.  The end of the comment is marked by a line
containing only 'END'.  A second comment follows, which also ends
with 'END'.  The first comment is written in ASCII characters and the
second in native characters.

The remaining lines are the real contents to be added to the
corresponding dot-file.

For example, '<language>/dot.Xresources' may look like this (the
indentation is not part of the file):

   ! s
   This is a setting of resources for X clients.
   END
   This is a setting of resources for X clients.
   END
   Emacs.Fontset-0: -*-fixed-medium-r-normal--14-*-*-*-*-*-fontset-16
   Emacs.Font: fontset-16
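
To make the layout concrete, the sketch below splits such a template
into its control fields, its two comments, and its contents.  The
subroutine name parse_dot_template and the hash keys are made up for
this example; this is not the parser used by set-language-env.

```perl
use strict;
use warnings;

# Split the text of a <language>/dot.* template into its parts.
sub parse_dot_template {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;

    # Control line: pad it so that all three columns exist.
    my $control = shift @lines;
    $control = '' unless defined $control;
    $control .= '   ';

    my %part = (
        comment_char => substr($control, 0, 1),
        executable   => substr($control, 1, 1) eq 'x' ? 1 : 0,
        at_top       => substr($control, 2, 1) eq 's' ? 1 : 0,
    );

    # Two comments follow, each terminated by a line 'END'.
    for my $key (qw(comment_ascii comment_native)) {
        my @comment;
        while (@lines) {
            my $line = shift @lines;
            last if $line eq 'END';
            push @comment, $line;
        }
        $part{$key} = join "\n", @comment;
    }

    # Everything that remains is the text to add to the dot-file.
    $part{contents} = join "\n", @lines;
    return \%part;
}

my $template = <<'END_OF_TEMPLATE';
! s
This is a setting of resources for X clients.
END
This is a setting of resources for X clients.
END
Emacs.Fontset-0: -*-fixed-medium-r-normal--14-*-*-*-*-*-fontset-16
Emacs.Font: fontset-16
END_OF_TEMPLATE

my $t = parse_dot_template($template);
printf "comment char '%s', executable %d, top %d\n",
    $t->{comment_char}, $t->{executable}, $t->{at_top};
print $t->{contents};
```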

The .pl version (<language>/dot.*.pl) can generate the real contents
to be added.  Since the first control line and the two comments are
the same as in the non-.pl version, a .pl file is, strictly speaking,
not a Perl script.

For example, '<language>/dot.Xresources.pl' may look like this (the
indentation is not part of the file):

   ! s
   This is a setting of resources for X clients.
   END
   This is a setting of resources for X clients.
   END
   print <<EOF;
   Emacs.Fontset-0: -*-fixed-medium-r-normal--14-*-*-*-*-*-fontset-16
   Emacs.Font: fontset-16
   EOF

though this particular example could be written without Perl.

In the .pl version, you can refer to the $DOTFILECONTENTS1 and
$DOTFILECONTENTS2 variables.  These variables hold the contents of the
original dotfile, which are kept unchanged.  $DOTFILECONTENTS1 will be
added before the contents which the .pl script generates and
$DOTFILECONTENTS2 will be added after them.  Changing the variables
doesn't affect the results.
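
One possible use of these variables is to look at what the user already
has before deciding what to generate.  The sketch below is hypothetical:
the variables are given stand-in values so that it runs on its own, and
the Perl part is wrapped in a subroutine named generate only to make it
easy to test.  A real dot.*.pl would print its output directly.

```perl
use strict;
use warnings;

package Sub;

# Stand-ins: set-language-env supplies these when it runs the script.
our $DOTFILECONTENTS1 = "Emacs.Font: fixed\n";
our $DOTFILECONTENTS2 = "";

# Hypothetical body of a <language>/dot.Xresources.pl: emit the font
# setting only if the user's own part of the file has none.
sub generate {
    my $original = $DOTFILECONTENTS1 . $DOTFILECONTENTS2;
    my $out =
        "Emacs.Fontset-0: -*-fixed-medium-r-normal--14-*-*-*-*-*-fontset-16\n";
    $out .= "Emacs.Font: fontset-16\n"
        unless $original =~ /^Emacs\.Font:/m;
    return $out;
}

package main;

print Sub::generate();
```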


That's all.  Put these files (support.<language>.pl and
<language>/dot.* and/or <language>/dot.*.pl) in the source directory
and type 'debuild'.  Then you can try your language.

You can also write a menu item for the Debian menu system.  You should
also supply documents and manpages in your language.


XBM file
--------

To keep a consistent look, please make the XBM file for your language
using -etl-fixed-medium-r-normal--24-* fonts if available.


Please Contact Me
-----------------

If you want to add support for a new language, please contact me.
Contributions are always welcome.


---
Tomohiro KUBOTA <kubota@debian.org>  Mon, 18 Oct 1999