% \iffalse meta-comment
%
% Copyright 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004
% The LaTeX3 Project and any individual authors listed elsewhere
% in this file.
%
% This file is part of the LaTeX base system.
% -------------------------------------------
%
% It may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either version 1.3
% of this license or (at your option) any later version.
% The latest version of this license is in
% http://www.latex-project.org/lppl.txt
% and version 1.3 or later is part of all distributions of LaTeX
% version 2003/12/01 or later.
%
% This file has the LPPL maintenance status "maintained".
%
% The list of all files belonging to the LaTeX base distribution is
% given in the file `manifest.txt'. See also `legal.txt' for additional
% information.
%
% The list of derived (unpacked) files belonging to the distribution
% and covered by LPPL is defined by the unpacking scripts (with
% extension .ins) which are part of the distribution.
%
% \fi
\NeedsTeXFormat{LaTeX2e}[1995/12/01]
\documentclass{ltxguide}[1999/02/28]
\title{Cyrillic languages support in \LaTeX}
\author{\copyright~Copyright 1998--1999,\\ Vladimir Volovich,
Werner Lemberg and \LaTeX3 Project Team.\\ All rights reserved.}
\date{12 March 1999}
\begin{document}
\maketitle
\tableofcontents
\begin{abstract}
This document contains basic information on the Cyrillic setup for
\LaTeX{}: how to get the fonts, how to set them up, how to use
the interface, its interaction with \babel{}, etc. This is only a first
draft of the document and it will probably be modified in future; so
please send in comments on it via the \texttt{latexbug} system
(see below).
\end{abstract}
\section{Introduction}
Most Latin-based European languages were supported in \LaTeX{} by
introducing the~|T1|~font encoding and by using the \textsf{fontenc}
and \textsf{inputenc} packages; these use only standard \TeX{} means
to support any \mbox{8-bit} input encoding and this one standard font
encoding. The restriction to a single font encoding guarantees that
multiple languages can happily coexist in one document (\eg
hyphenation will be correct for all languages).

Starting with the December~1998 Release, \LaTeX{} finally supports
Cyrillic languages. This support is based on the new standard
Cyrillic \TeX{} font encodings---|T2A|, |T2B|, |T2C|, and~|X2|. The
first three of these satisfy some basic requirements for
\LaTeX{}~|T*|~encodings, and thus can be used in multi-lingual documents
with other languages based on standard font encodings.

Four different Cyrillic font encodings are needed because together they
support \emph{all} the Cyrillic languages that have been used during the
twentieth century (see Section~\ref{fontencs}). The number of Cyrillic
glyphs is too large to fit into the 128~character slots that a single
encoding can devote to them; the other (lower) 128~slots are reserved
for Latin letters and other invariant symbols that are needed for the
encoding to be a conformant \LaTeX{}~\texttt{T}~encoding.

There are some glyphs in the |T2*|~encodings which do not yet have
associated characters in \emph{Unicode}, the world-wide character
standard. Also, one more font encoding, |T2D|,~is planned for a
forthcoming release of \LaTeX{}. A lot of Cyrillic input encodings
are already supported (see Section~\ref{inputencs}), and additional
encodings could be added easily.
\subsection{Acknowledgments}
\label{sec:acks}
The work on |T2*|~encodings was carried out by the T2~Team, led by
Alexander Berdnikov (other members are Mikhail Kolodin and Andrew
Janishewskii). The LH~fonts were produced by Olga Lapko (with
A.~Khodulev). The \textsf{T2} bundle and \textsf{ruhyphen} package
were written by Werner Lemberg and Vladimir Volovich (except that the
concrete hyphenation patterns which are part of \textsf{ruhyphen} came
from individual authors). The support for the Ukrainian language was
prepared by Andrij Shvaika.
\section{Installation}
The \textsf{fontenc} and \textsf{inputenc} packages are installed
automatically in every base \LaTeX{} distribution.

All the necessary extra files to use with these packages for Cyrillic
are in the \textsf{cyrillic} bundle, which at present contains the
following: four font encoding definition files (|t2aenc.def|,
|t2benc.def|, |t2cenc.def|, |x2enc.def|); several input encoding
definition files (all the other |*.def| files), and font definition
files (|*.fd|).
The installation of these is described here.
\subsection{Fonts}
The default font families in \LaTeX{} are the Computer Modern
families, namely the CM~fonts (|OT1|~encoded) and the EC~fonts
(|T1|~encoded). The LH~fonts, which are now available, provide
Computer Modern fonts for all Cyrillic font encodings. They are
designed to be compatible with the EC~fonts, and they provide the same
font shapes and sizes; they are available at |CTAN:fonts/cyrillic/lh|
(the latest version is 3.20). The installation instructions for the
fonts are in the file |INSTALL| in the font distribution.

Other fonts, including Type~1 fonts, can also be used, provided that
their encoding (for \TeX{}) is \mbox{|T2|-compatible}. Some
ready-to-use packages supporting such fonts are also available, \eg at
\URL{ftp://ftp.vsu.ru/pub/tex} (they should soon be on \ctan). Currently,
you will find two packages there: \textsf{PsCyr}, which contains some
freely distributable Cyrillic Type~1 fonts with support for \LaTeX{};
and \textsf{c1fonts}, which contains virtual fonts similar to the
\textsf{AE}~fonts package using the BlueSky and BaKoMa fonts
available from \ctan{} (see the |README| file in that package for
detailed information). Further font packages are expected soon.
\subsection{Hyphenation patterns}
You can find a collection of hyphenation patterns for the Russian
language in the \textsf{ruhyphen} package at
|CTAN:language/hyphenation/ruhyphen|. These patterns support the
|T2*|~encodings, as well as other popular font encodings used for
Russian typesetting (including the Omega internal encoding).

Patterns for other Cyrillic languages should be adapted to work with
the |T2*|~encodings.
\subsection{\babel{} support for Russian and Ukrainian}
\label{bblrus}
Version~3.6k of \babel{} includes support for the |T2*|~encodings and
for typesetting both Russian and Ukrainian texts using the Cyrillic
letters. The temporary font encoding |LWN|, which was used in earlier
releases of \babel{}, will be withdrawn in the near future and replaced
by the |OT2| encoding.
\subsection{Getting pre-built packages}
Many of the major \TeX{} distributions, such as te\TeX{}, fp\TeX{} and
\TeX{}live, contain (or soon will) everything that is needed,
including the LH~fonts, \textsf{ruhyphen} and the latest version of
\babel{}. We hope that all \TeX{} distributions will soon include all
of these, so the chances are that you will not need to install
them yourself (although doing so is not difficult).

If you are using em\TeX, MiK\TeX, or fp\TeX, you
can download the \textsf{ruemtex} package from
\URL{ftp://ftp.vsu.ru/pub/tex}.
\section{Usage}
Support for Cyrillic is based on these standard \LaTeX{} mechanisms:
the \textsf{fontenc} and \textsf{inputenc} packages (and on \babel{}).
Thus the basic principles for its use are similar to those for other
European languages: you simply add, to your document preamble, lines
like the following.
\begin{verbatim}
\usepackage[T2A]{fontenc}
\usepackage[koi8-r]{inputenc}
\end{verbatim}
Here you can put any desired input encoding instead of
\mbox{\texttt{koi8-r}}: for example, it would be \texttt{cp866} if you are
using an MS-DOS text editor with this Cyrillic code page to prepare your
documents, or \texttt{cp1251} if you are an MS~Windows user with Cyrillic
support. A full list of the available Cyrillic encodings can be found in
Section~\ref{inputencs} and in the file |cyinpenc.dtx|.
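As a rough sketch of how these two lines fit into a complete file, the
following minimal document assumes that the LH~fonts are installed.
To keep the example readable in any editor, it spells a Russian word
with the encoding-independent |\cyr...| command names declared in
|t2aenc.def| rather than with \mbox{8-bit} characters; in a real
document you would normally type the letters directly in the input
encoding you have declared.
\begin{verbatim}
\documentclass{article}
\usepackage[T2A]{fontenc}      % Cyrillic font encoding
\usepackage[koi8-r]{inputenc}  % encoding of this source file
\begin{document}
% "Privet" (Hello), written with encoding-independent names:
\CYRP\cyrr\cyri\cyrv\cyre\cyrt!
\end{document}
\end{verbatim}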
Documents are, naturally, not restricted to a single font encoding;
this is essential for multi-lingual journals or documents. Such
changes can be made by using the |\fontencoding| command as part of a
font-change. However, it is best to access these font encodings via a
higher-level interface.
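For illustration only, a low-level switch might look like the
following sketch. It assumes that both |T1| and |T2A| have been
loaded with \textsf{fontenc} (the last encoding listed becomes the
document default); the macro name |\cyrfragment| is hypothetical,
chosen for this example, and is not defined by \LaTeX{} or by any of
the packages described here.
\begin{verbatim}
\documentclass{article}
\usepackage[T2A,T1]{fontenc}   % T1 is the default, T2A is also loaded
% Hypothetical helper: typeset its argument in the T2A encoding.
\newcommand{\cyrfragment}[1]{{\fontencoding{T2A}\selectfont #1}}
\begin{document}
The Russian word for `world' is \cyrfragment{\cyrm\cyri\cyrr}.
\end{document}
\end{verbatim}
The extra pair of braces in the definition keeps the encoding change
local to the argument.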
Since such changes are often closely related to other
language-dependent settings, it is often sensible to use the \babel{}
system, which provides further useful `localisation' and standardised
multi-lingual interfaces (for further details, see
Section~\ref{bblrus}). Then you can use lines like the following in
your document:
\begin{verbatim}
\usepackage[koi8-r]{inputenc}
\usepackage[russian]{babel}
\end{verbatim}
This will automatically choose the default font encoding for Russian,
which is |T2A|, if available. Documentation of the complete set of
font-encoding selection rules can be found in |cyrillic.dtx|, which is
part of |rusbabel|.
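A minimal sketch of a two-language document along these lines is shown
next. It loads |T2A| explicitly, which is an assumption made for the
example rather than a requirement, and it again uses the |\cyr...|
command names in place of \mbox{8-bit} input.
\begin{verbatim}
\documentclass{article}
\usepackage[T2A]{fontenc}            % explicit here for clarity
\usepackage[koi8-r]{inputenc}
\usepackage[english,russian]{babel}  % the last language is the main one
\begin{document}
% Main language is Russian: "Primer" (Example)
\CYRP\cyrr\cyri\cyrm\cyre\cyrr.

\foreignlanguage{english}{A short English fragment.}

\selectlanguage{english}
From here on the text is English, but
\foreignlanguage{russian}{\cyrm\cyri\cyrr} is still treated as Russian.
\end{document}
\end{verbatim}
Here |\foreignlanguage| is intended for short fragments, while
|\selectlanguage| changes the language for everything that follows.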
These \LaTeX{} interfaces are very convenient because they make your
documents completely portable, being based solely on standard \TeX{}
features. This means that your documents can be processed on any
\TeX{} system without any need for re-encoding to the `native'
encoding used on each platform, because the encoding of the
document is specified in the document itself.

Moreover, if necessary, more than one input encoding can be used
within a document; this could be useful if, for example, you need to
combine articles prepared by authors on different machines. Each part
of the document is then identified by a |\inputencoding| command,
which can therefore only be used between paragraphs.
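The following sketch shows one way such a combined document might be
organised. The file names |article-one| and |article-two| are
hypothetical, as are the encodings assumed for them, so the example
will only compile once files with those names exist.
\begin{verbatim}
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[koi8-r]{inputenc}   % encoding of the main file
\begin{document}
\inputencoding{cp1251}
\input{article-one}   % hypothetical file saved in cp1251

\inputencoding{cp866}
\input{article-two}   % hypothetical file saved in cp866

\inputencoding{koi8-r}          % back to the encoding of the main file
\end{document}
\end{verbatim}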
Please note that you must always use the two standard \LaTeX{}
commands, |\MakeUppercase| and |\MakeLowercase| to produce uppercase
or lowercase text in your documents. This is because |\uppercase| and
|\lowercase| will not work at all for Cyrillic (note that these latter
two commands are not, and never have been, available for use directly
in \LaTeX{} documents).
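As a small illustration, the following sketch typesets a Russian word
in lowercase and then lets \LaTeX{} convert it; it assumes only that
the |T2A|~encoding and suitable fonts are available.
\begin{verbatim}
\documentclass{article}
\usepackage[T2A]{fontenc}
\begin{document}
% "mir" in lowercase, followed by the same word converted by LaTeX:
\cyrm\cyri\cyrr{} $\rightarrow$ \MakeUppercase{\cyrm\cyri\cyrr}
\end{document}
\end{verbatim}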
\section{Font encodings for Cyrillic languages}
\label{fontencs}
The Cyrillic font encodings support the following languages. Note
that some languages can be properly typeset with more than one
encoding.
\begin{itemize}
\raggedright
\item[|T2A|:]
Abaza, Avar, Agul, Adyghei, Azerbaijani, Altai, Balkar, Bashkir,
Bulgarian, Buryat, Byelorussian, Gagauz, Dargin, Dungan, Ingush,
Kabardino-Cherkess, Kazakh, Kalmyk, Karakalpak, Karachaevskii,
Karelian, Kirghiz, Komi-Zyrian, Komi-Permyak, Kumyk, Lak, Lezghin,
Macedonian, Mari-Mountain, Mari-Valley, Moldavian, Mongolian,
Mordvin-Moksha, Mordvin-Erzya, Nogai, Oroch, Osetin, Russian, Rutul,
Serbian, Tabasaran, Tadzhik, Tatar, Tati, Teleut, Tofalar, Tuva,
Turkmen, Udmurt, Uzbek, Ukrainian, Hanty-Obskii, Hanty-Surgut,
Gipsi, Chechen, Chuvash, Crimean Tatar.
\item[|T2B|:]
Abaza, Avar, Agul, Adyghei, Aleut, Altai, Balkar, Byelorussian,
Bulgarian, Buryat, Gagauz, Dargin, Dolgan, Dungan, Ingush, Itelmen,
Kabardino-Cherkess, Kalmyk, Karakalpak, Karachaevskii, Karelian,
Ketskii, Kirghiz, Komi-Zyrian, Komi-Permyak, Koryak, Kumyk, Kurdian,
Lak, Lezghin, Mansi, Mari-Valley, Moldavian, Mongolian,
Mordvin-Moksha, Mordvin-Erzya, Nanai, Nganasan, Negidal, Nenets,
Nivh, Nogai, Oroch, Russian, Rutul, Selkup, Tabasaran, Tadzhik,
Tatar, Tati, Teleut, Tofalar, Tuva, Turkmen, Udyghei, Uigur, Ulch,
Khakass, Hanty-Vahovskii, Hanty-Kazymskii, Hanty-Obskii,
Hanty-Surgut, Hanty-Shurysharskii, Gipsi, Chechen, Chukcha, Shor,
Evenk, Even, Enets, Eskimo, Yukagir, Crimean Tatar, Yakut.
\item[|T2C|:]
Abkhazian, Bulgarian, Gagauz, Karelian, Komi-Zyrian, Komi-Permyak,
Kumyk, Mansi, Moldavian, Mordvin-Moksha, Mordvin-Erzya, Nanai,
Orok (Uilta), Negidal, Nogai, Oroch, Russian, Saam, Old-Bulgarian,
Old-Russian, Tati, Teleut, Hanty-Obskii, Hanty-Surgut, Evenk,
Crimean Tatar.
\end{itemize}
The |X2|~encoding was designed to support all the above languages.
Its name does not start with |T| because, for example, it contains no
Latin letters (it is purely a Cyrillic glyph container); it therefore
cannot be used in mixed-script documents along with the other |T*|
encodings. Please consult Section~6.4 \textit{Naming conventions} of
the file |fntguide.tex| in the base \LaTeX{} distribution for details
of the differences between \LaTeX{} font encodings and how they are
named.

There are two other \LaTeX{} Cyrillic font encodings, |OT2| and |LCY|,
that are not included in the base \LaTeX{} distribution. The first is
a \mbox{7-bit} encoding (hence the |O|) developed by the AMS; it is
useful for typesetting relatively small fragments of text in Cyrillic,
using a Latin transliteration scheme. The other, |LCY|, is an
\mbox{8-bit} Cyrillic encoding which is not compatible with the
requirements for \LaTeX{} |T*|~encodings (hence the |L|); thus it is not
suitable for typesetting multi-lingual documents, but it can be used in
Plain \TeX{}-based macro packages because it is an extension of |OT1|.
These two encodings are supported by \babel{} and by \textsf{ot2cyr}.
\section{Input encodings}
\label{inputencs}
Several Cyrillic code-pages are widely used. Currently, \LaTeX{}
contains support for 20~Cyrillic input encodings (some of which are
variants of each other).
\begin{itemize}
\item |cp855| --- the standard \mbox{MS-DOS} Cyrillic code-page.
\item |cp866| --- the standard \mbox{MS-DOS} Russian code-page.
Several code-pages very similar to this are also supported
(the differences are all in the range 242--254).
\begin{itemize}
\item |cp866av| -- the `Cyrillic Alternative' code-page (an
alternative variant of cp866);
\item |cp866mav| -- the `Modified Alternative Variant';
\item |cp866nav| -- the `New Alternative Variant';
\item |cp866tat| -- an experimental Tatar code-page.
\end{itemize}
\item |cp1251| --- the standard MS Windows Cyrillic code-page.
\item \mbox{\texttt{koi8-r}} --- a standard Cyrillic code-page, specified
in RFC~1489, that is widely used on UNIX-like systems for Russian
language support. The situation with \mbox{\texttt{koi8-r}} is
somewhat similar to that for |cp866|: there are several similar
code-pages which coincide for all Russian letters but add some other
Cyrillic letters. The following are supported:
\begin{itemize}
\item \mbox{\texttt{koi8-u}} -- for Ukrainian;
\item \mbox{\texttt{koi8-ru}} -- this is described in a draft RFC
document specifying a widely used character set for mail and news
exchange in the Ukrainian internet community, as well as for
presenting WWW information resources in the Ukrainian language;
\item |isoir111| -- the \mbox{ISO-IR-111 ECMA} Cyrillic Code Page.
\end{itemize}
\item |iso88595| --- the \mbox{ISO 8859-5} Cyrillic code-page (also called
\mbox{ISO-IR-144}).
\item |maccyr| --- the Apple Macintosh Cyrillic code-page (also known
as Microsoft cp10007) and |macukr|, the Apple Macintosh Ukrainian
code-page, very similar to the Cyrillic code-page.
\item The Mongolian code-pages: |ctt| |dbk| |mnk| |mos| |ncc| |mls|.
These code-pages were taken from Oliver Corff's `Mon\TeX' package
(available at |CTAN:language/mongolian/montex|). Since the |T2*|
encodings support the Mongolian Cyrillic script, it is convenient to
have support for Mongolian input encodings as well. Pointers to
documentation for these code-pages will be much appreciated.
\end{itemize}
\section{Reporting bugs}
If you find a bug and want to report it, please follow the
guidelines given in the file |bugs.txt| in the base \LaTeX{}
distribution. Note that there is a category specifically for
reporting any bugs that occur only when using Cyrillic fonts or
support packages.
\section{Miscellanea in the \textsf{T2} bundle}
\label{t2m}
The \textsf{T2}~bundle at |CTAN:macros/latex/contrib/supported/t2|
contains some other useful files, including support for Plain
\TeX{}-based macro packages, support for Bib\TeX{} and MakeIndex (see
also the \textsf{xindy} program and package---highly recommended for
making indices with Cyrillic), support for the \textsf{fontinst}
package, mapping tables relating these Cyrillic font encodings (and
input encodings) to the Unicode character names and slots (these are
in the subdirectory |enc-maps|), and more!

To produce documented source listings of the \textsf{T2}~package, run
\LaTeX{} on the |*.dtx| and |*.fdd| files therein.

When typesetting Cyrillic texts, there is a tradition of using
Cyrillic letters (in some situations) within math formul\ae, in
exactly the same way as most of the world uses Latin letters.
By default this does not work, because symbols declared with
|\DeclareTextSymbol| may not be used in math.

If you need to typeset glyphs declared in font encoding definition
files `transparently' within math, then you could try using the
experimental \textsf{mathtext} package, which is also in the
\textsf{T2}~bundle. Note that this package uses up at least one
additional math alphabet per font encoding. For this and other
reasons, the \LaTeX3 Project Team considers that this experimental
extension to \LaTeX{}'s glyph-handling mechanisms should be used with
caution; but please try it out and send us your opinions and ideas.
Note that it is not included in the core of \LaTeX{} because both the
coding and the interfaces are likely to change at some point in the
future.
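A tentative sketch of its use is given below. It assumes that
\textsf{mathtext} has to be loaded before \textsf{fontenc}, so that it
can see the symbol declarations as the encoding files are read; since
the package is experimental, please check its own documentation for
the current loading order and options.
\begin{verbatim}
\documentclass{article}
\usepackage{mathtext}          % assumed to be needed before fontenc
\usepackage[T2A]{fontenc}
\begin{document}
% A subscript written with Cyrillic letters (m-a-k-s, i.e. "max"):
$v_{\cyrm\cyra\cyrk\cyrs} = 2v_0$
\end{document}
\end{verbatim}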
Finally, here are some pointers to further information:
\begin{quote}
\URL{http://www.cemi.rssi.ru/cyrtug}\\
\URL{http://xtalk.price.ru/tex}
\end{quote}
\end{document}