# Template-Plugin-Gettext

Localization for the Template Toolkit 2

The POD version of this document exhibits errors.  Consider reading the markdown version instead at https://github.com/gflohr/Template-Plugin-Gettext.

## Description

This Perl library offers an end-to-end localization and internationalization solution for the Template Toolkit 2.
It consists of a plugin that offers translation functions inside templates
and a string extractor `xgettext-tt2` that extracts translatable strings
from templates and writes them to PO files (or rather a `.pot` file in PO
format).  The string extractor `xgettext-tt2` is fully
customizable and also usable for other i18n plugins or
frameworks for the Template Toolkit.

## Usage

The solution offered by this library is suitable for templates that have
a lot of markup (normally HTML) compared to text.  If the files contain
a lot of content, other solutions are probably more suitable.  One of them
is [xml2po](https://github.com/mate-desktop/mate-doc-utils/tree/master/xml2po),
especially if the input format is HTML.

If the input format is Markdown, for example for a static site generator,
a feasible approach may be to simply split the input into paragraphs, and
turn each paragraph into an entry of a PO file.

In the following, we will assume that you have decided to localize
templates with this library.

### Templates

The first step is to mark all translatable strings.  This serves
a double purpose.  First, strings are marked so that the extractor
`xgettext-tt2` can find them and write them into a translation file
in PO format.

Second, these markers are also valid functions or filters for the
Template Toolkit and will interpolate the translations for these
messages into the output when the template is rendered.  As a result,
your templates remain quite readable after localizing them.

In every source file in which you want to use translations, you have
to `USE` the plug-in:

    [% USE gtx = Gettext('com.mydomain.www', 'fr') %]

Do *not* forget to `USE` the plug-in in all templates!  The Template
Toolkit will not warn you when you forget it, but the translation
mechanism will not work!

The first argument is the so-called *textdomain*.  This is the
identifier for your message catalogs and also the basename of several
files.  In the example above, the translated message catalog would
be searched as *`LOCALEDIR`*`/fr/LC_MESSAGES/com.mydomain.www.mo`. The second parameter is the language.  This will normally come from
a variable instead of a hard-coded string.

A possible third argument (omitted in the example) is the character
set to use.  All following arguments are additional directories to
search first for translations.

The default list of directories is:

* `./locale`
* `/usr/share/locale`
* `/usr/local/share/locale`

The directory `./locale` is relative to the current working directory
from where you invoke the template processor.
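
The lookup rule can be spelled out in a few lines of shell.  This is only a sketch of the path scheme described above, not the plug-in's actual search code.  The variable names are made up for illustration, and the directories are simply the ones from the list above.

```shell
# Sketch: print the candidate catalog paths for one textdomain and language.
# Variable names are illustrative; the directories come from the list above.
textdomain='com.mydomain.www'
lang='fr'

for localedir in ./locale /usr/share/locale /usr/local/share/locale; do
    path="$localedir/$lang/LC_MESSAGES/$textdomain.mo"
    if [ -f "$path" ]; then
        echo "found:   $path"
    else
        echo "missing: $path"
    fi
done
```

Running this from the directory where you invoke the template processor shows quickly whether a catalog is where you expect it to be.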

#### Simple Translations With `gettext()`

The simplest and most common way of doing things is:

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

<title>[% gtx.gettext("World Of Themes") %]</title>
    
<h1>[% "Introduction" | gettext %]</h1>

<p>
[% FILTER gettext %]
The "World Of Themes" is the ultimate source of templates
for the Template Toolkit.
[% END %]
</p>
```

This shows three different ways of localizing strings.  You can
use the function `gtx.gettext()`, the filter `gettext` with pipe
syntax, or the same filter with block syntax.  The result is always
the same.  The string will be recognized as translatable by 
`xgettext-tt2` and it will be translated into the selected language,
when rendering the template.

#### Interpolating Strings Into Translations

One important thing to understand is that the argument to the
gettext functions or filters is the lookup key into the translation
database, when the template gets rendered.  That implies that this
key has to be invariable and must not use any interpolated variables.

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

[% gtx.gettext("Hello, $firstname $lastname!") %]
```

This template code is syntactically correct and will also render
correctly.  But `xgettext-tt2` will bail out on it with an error
message like

    templates.html:3: Illegal variable interpolation at "$"

The function `gettext()` will receive the interpolated string
as its argument, and that is not the same as the string that
the extractor program `xgettext-tt2` sees.  And that means that
the translation cannot be found.

The correct way to interpolate strings uses `xgettext()`:

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

[% gtx.xgettext("Hello, {first} {last}!",
                first => firstname, last => lastname) %]
[% "Hello, {first} {last}!" | xgettext(first => firstname, 
                                       last => lastname) %]
[% FILTER xgettext(first => firstname, last => lastname) %]
Hello, {first} {last}!
[% END %]
```

One additional benefit of this is that the extractor program
`xgettext-tt2` will also mark these strings with the flag
"perl-brace-format".  When the translation from the `.po`
file gets compiled into an `.mo` file, the compiler `msgfmt`
checks that the translated strings contain exactly the same
placeholders as the original.
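
To get a feeling for what such a check looks at, you can compare the placeholders of a message and its translation yourself.  The following is a rough imitation in shell, not the way `msgfmt` is implemented.  The helper `placeholders` and the French translation are invented for this example, and `grep -o` is assumed to be available (it is in GNU, BSD, and BusyBox grep).

```shell
# Sketch only: imitate the placeholder check for one msgid/msgstr pair.
# The French translation is an invented example.
msgid='Hello, {first} {last}!'
msgstr='Bonjour, {first} {last} !'

# List the {placeholders} of a string, one per line, sorted and de-duplicated.
placeholders() {
    printf '%s\n' "$1" | grep -o '{[^{}]*}' | sort -u
}

if [ "$(placeholders "$msgid")" = "$(placeholders "$msgstr")" ]; then
    result='placeholders match'
else
    result='placeholders differ'
fi
echo "$result"
```

A translation that misspells `{first}` as `{frist}` would make the two lists differ, which is the kind of mistake the real check is meant to catch.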

One thing that you should also avoid is to assemble strings
in the template source code.  Do *not*:

```html
[% gtx.gettext("Please contact") %] [% name %]
[% gtx.gettext("for help about the") %] [% package %]
[% gtx.gettext("software.") %]
```

This will result in three translatable text snippets
"Please contact", "for help about the", and "software." that
are hard to translate without context.  Besides, it makes
invalid assumptions about the word order in translated sentences.
Instead, use `xgettext()` and write in complete sentences with
placeholders.

By the way, the `x` in the function `xgettext()` stands for *eXpand*
while the `x` in the program `xgettext-tt2` or GNU Gettext's
`xgettext` program stands for *eXtract*.

#### Plural Forms

Do *not* write this:

```html
[% IF num != 1 %]
[% gtx.xgettext("{number} documents deleted!", number => num) %]
[% ELSE %]
[% gtx.gettext("One document deleted!") %]
[% END %]
```

This assumes that every language has one singular and one plural
form (and no other forms) and that the condition that selects the
correct form is always `COUNT != 1`.  But this is wrong for many
languages, for example Russian (two plural forms), Chinese (no plural),
French (different condition), and many more.

Write instead:

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

[% gtx.nxgettext("One document deleted.",
                 "{count} documents deleted.",
                 num,
                 count => num) %]
```

The function `nxgettext()` receives the singular and plural
form as the first and second argument, followed by the number
of items, followed by an arbitrary number of key/value pairs
for interpolating variables in the strings.
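
To see why a hard-coded `num != 1` test falls short, it helps to compute the index of the plural form the way a catalog's plural rule would.  The following is only an illustration.  The function names are invented, and the three rules are the ones commonly found in the `Plural-Forms` header of English, French, and Russian PO files; they are not defined by this library.

```shell
# Sketch: plural-form index for a count, using rules as commonly found in the
# Plural-Forms header of PO files.  Function names are made up for this example.
plural_en() { echo $(( $1 != 1 )); }    # two forms: singular only for 1
plural_fr() { echo $(( $1 > 1 )); }     # two forms: 0 and 1 are both singular
plural_ru() {                           # three forms in total
    n=$1
    echo $(( (n%10==1 && n%100!=11) ? 0 : ((n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20)) ? 1 : 2) ))
}

for count in 0 1 2 5 21; do
    echo "n=$count en=$(plural_en "$count") fr=$(plural_fr "$count") ru=$(plural_ru "$count")"
done
```

For a count of 0, the English rule selects the plural form while the French rule selects the singular, and the Russian rule treats 21 like 1.  This is the kind of decision that `nxgettext()` leaves to the translation catalog.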

There is also a function `ngettext()` that does not expand
placeholders in its first two arguments.  You will find that you
almost never need that function.

You can also use `nxgettext()` and `ngettext()` as filters.
But the necessary code is awkward, and their use is therefore
not recommended.

#### Ambiguous Strings (message contexts)

Sometimes an English string has different meanings in other
languages:

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

[% gtx.gettext("State:") %]
[% IF state == '1' %]
[% gtx.pgettext("state", "Open") %]
[% ELSE %]
[% gtx.gettext("Closed") %]
[% END %]
<a href="/action/open">[% gtx.pgettext("action", "Open") %]</a>
```

The function `pgettext()` works like `gettext()` but has one
extra argument preceding the string, the so-called
message context.  The string extractor `xgettext-tt2` will now
create two distinct messages "Open", one with the context "state",
the other one with the context "action".  The sole purpose of this
context is to disambiguate the string "Open" for languages where the
verb ("to open") and the adjective ("the door is *open*") have
two distinct translations.

You will normally use this function when a translator asks you
to do so, but not on your own initiative.

There is also a function `pxgettext()` that supports placeholder
interpolation, and `npxgettext()` that has the following semantics:

```perl
npxgettext(CONTEXT, SINGULAR, PLURAL, COUNT,
           KEY1 => VALUE1, KEY2 => VALUE2, ...)
```

#### More Esoteric Functions

The [API documentation](lib/Template/Plugin/Gettext.pod#user-content-FUNCTIONS) contains
some more functions and filters that are available for completeness.
You will never need them in normal projects.

#### Translator Hints

You can add comments to the source code that are copied into the
`.po` file as hints for the translators.  This will look like
this:

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

<!-- TRANSLATORS: This is the day of the week! -->
[% gtx.gettext("Sun") %]
```

In order to make that work, you have to invoke the extractor
program `xgettext-tt2` like this:

    xgettext-tt2 --add-comments=TRANSLATORS: t1.html t2.html ...

#### Modifying Flags

In rare situations, you may need the following:

```html
[% USE gtx = Gettext('com.mydomain.www', lang) %]

<!-- xgettext:no-perl-brace-format -->
[% gtx.xgettext("Value: {value}", value => whatever) %]
```

Normally, the argument of `xgettext()` will be flagged in
the `.po` file with "perl-brace-format", and a translation
will fail to compile if the translation does not contain exactly
the same placeholders as the original does.

You can override that default behavior for individual messages
by placing a comment containing the string "xgettext:" directly
in front of the string.

### Translation Workflow

The translation workflow is the standard workflow known from GNU 
Gettext.  All files relevant for translations are conventionally
kept in a subdirectory `po`.

You can save time if you use the seed project
[Template-Plugin-Gettext-Seed](https://github.com/gflohr/Template-Plugin-Gettext-Seed)
as a base.  It contains a directory `po` ready for use,
with --- at your choice --- a Makefile or a script `po-make.pl`
that automates the entire translation workflow.  It is also
prepared for extracting strings from other sources than
template files.  In that example, these are Perl source files,
but it will work in a similar fashion for other programming
languages.

But rolling your own version is also simple.  Just read on.

#### Extracting Strings With `xgettext-tt2`

Extracting translatable strings from templates for the Template
Toolkit 2 is as easy as:

```shell
$ xgettext-tt2 TEMPLATE...
```

This will scan all files given as arguments for translatable strings
and create a file `messages.po` with the strings found.

A typical invocation of `xgettext-tt2` is a little more
sophisticated:

```shell
$ xgettext-tt2 --files-from=POTFILES \
    --output=com.mydomain.www.pot \
    --add-comments=TRANSLATORS: --from-code=utf-8 \
    --force-po
```

You can, of course, write everything in one line and omit the backslashes.

Specifying all input files as arguments on the command-line can
quickly become unwieldy.  It is more common to put the list of input
files into a text file, each input file on one line, and instruct
`xgettext-tt2` to read it with the option `--files-from`.  The name
of the file is by convention `POTFILES`.
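
If your templates live in one directory tree, you do not have to maintain `POTFILES` by hand.  Here is a minimal sketch that generates it with `find`.  The directory `templates`, the `.html` suffix, and the scratch directory are assumptions made for the example; adapt them to your project layout.

```shell
# Sketch: build POTFILES from every .html template below ./templates.
# The scratch directory and the file names are invented for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p templates/partials
: > templates/index.html
: > templates/partials/footer.html
: > templates/notes.txt            # not a template, must not be listed

# One input file per line, in a stable order.
find templates -type f -name '*.html' | sort > POTFILES
cat POTFILES
```

In a real project only the `find` line is needed.  Sorting the list keeps the order of the entries in the generated `.pot` file stable between runs.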

The output file is normally a file `TEXTDOMAIN.pot`, where
`TEXTDOMAIN` is the identifier selected in the templates.  The
reverse hostname of the server serving the rendered templates
is a good choice.

If you want to be able to give hints to translators in the source
files, you have to specify the trigger string --- normally
"TRANSLATORS:" --- with the option `--add-comments`.  Specifying
an empty string (`--add-comments=''`) instructs `xgettext-tt2` 
to copy all comments into the `.pot` file.

If your templates contain characters outside of US-ASCII, you should
specify the character set of the template files with the option
`--from-code=CODESET`.

The option `--force-po` instructs `xgettext-tt2` to write an output
file even if no translatable strings have been found.  This
is a matter of taste; omit the option if you prefer.

`xgettext-tt2` has a lot more options.  They are mostly compatible
with the ones of `xgettext` from GNU gettext for C, Perl, and
a lot more languages.  See the [documentation for GNU Gettext's
xgettext](https://www.gnu.org/software/gettext/manual/html_node/xgettext-Invocation.html)
and the documentation for [Locale::XGettext](https://github.com/gflohr/Locale-XGettext/blob/master/lib/Locale/XGettext.pod)
for more information.

By the way, why is the output file a `.pot` file and not a `.po`
file?  It is the *template* for the `.po` files for the individual
languages.  You never edit that file but re-generate it whenever
the source files have changed.  Hence, it only contains the strings
in the original base language.

#### Creating Translation Files

For each supported language (except for the base language) you
should create a file `LL.po`, where `LL` is the two-letter
language code for that language, for example `fr.po`, `de.po`,
or `it.po`.  You can also specify the combination of language
and country like in `de_DE.po` or `pt_BR.po`.

One option for that is to simply copy the `.pot` file and
edit the header accordingly.  It is normally easier to do that
with the program `msginit`:

```shell
$ msginit --input=com.mydomain.www.pot --locale=fr
```

Replace `com.mydomain.www.pot` with the name of your `.pot` file, and
`fr` with the language in question.  This will prefill a lot
of fields in the `.po` file.
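
When you support several languages, a small loop saves some typing.  This is a sketch under the assumption that `msginit` from GNU gettext is installed.  `init_catalogs` is a made-up helper, not something shipped with this library.

```shell
# Sketch: create a starting .po file for each language that does not have one.
# init_catalogs is a made-up helper; it assumes msginit from GNU gettext.
init_catalogs() {
    pot=$1
    shift
    for lang in "$@"; do
        if [ -f "$lang.po" ]; then
            echo "$lang.po exists, skipping"
            continue
        fi
        # --no-translator keeps msginit from asking for an e-mail address.
        msginit --input="$pot" --locale="$lang" \
            --output-file="$lang.po" --no-translator
    done
}
```

You would call it as `init_catalogs com.mydomain.www.pot fr de it`.  Existing `.po` files are left alone so that translations are never overwritten.  Drop `--no-translator` if a human translator should be recorded in the header.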

#### Compiling Translation Files

The translated `.po` files are compiled with the program `msgfmt`:

```shell
$ msgfmt --check --statistics --verbose -o fr.mo fr.po
fr.po: 212 translated messages, 1 fuzzy translation, 3 untranslated messages.
```

This will compile the translation file `fr.po` into a binary
file `fr.mo`.  It also checks the translations for formal errors
and prints statistics about the number of translated and
untranslated strings.

#### Installing Translation Files

The plugin does not use `.po` files for looking up translations
but the binary `.mo` files.  But it has to find them.

You have to decide on one of the directories that
`Template::Plugin::Gettext` searches for translations.  The
default order is:

* `@INC/LocaleData`
* `/usr/share/locale`
* `/usr/local/share/locale`

The first line means that every directory `LocaleData` inside
Perl's include directories is searched for translation files.
Keep in mind that, for security reasons, the current directory
(`.`) is nowadays often *not* in Perl's `@INC`.

Let's assume that `/var/www/lib` is in Perl's `@INC`.  You would
then install the French translation file `fr.mo` as `/var/www/lib/LocaleData/fr/LC_MESSAGES/com.mydomain.www.mo`.
Here, `com.mydomain.www` is the textdomain you have selected,
whereas `LC_MESSAGES` is *not* a placeholder but a real
directory name.
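
The installation step itself is just a copy into the right directory.  Here is a minimal sketch.  `install_catalog` is a made-up helper, and the scratch directory only stands in for a real base directory such as `/var/www/lib/LocaleData`, so that the example can be run without special permissions.

```shell
# Sketch: copy a compiled catalog to BASEDIR/LANG/LC_MESSAGES/TEXTDOMAIN.mo.
# install_catalog is a made-up helper, not part of this library or GNU gettext.
install_catalog() {
    # $1 = compiled catalog, $2 = language, $3 = textdomain, $4 = base directory
    target_dir="$4/$2/LC_MESSAGES"
    mkdir -p "$target_dir" && cp "$1" "$target_dir/$3.mo"
}

# Demonstration in a scratch directory.
scratch=$(mktemp -d)
: > "$scratch/fr.mo"               # empty stand-in for the output of msgfmt
install_catalog "$scratch/fr.mo" fr com.mydomain.www "$scratch/lib/LocaleData"
find "$scratch/lib" -type f
```

Note that the file is renamed on the way: the language moves into the directory name, and the file name becomes the textdomain.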

That is good except for the fact that `/var/www/lib` is usually
not in Perl's `@INC`.  But you can change that where you invoke
the template processor:

```perl
BEGIN {
    unshift @INC, '/var/www/lib';
}

use Template;

Template->new->process('template.html', $data);
```

You can completely override the default search order in the
templates:

```html
[% USE gtx = Gettext('com.mydomain.www', lang, 'utf-8', 
                     '/var/www/locale', '/srv/www/locale') %]
```

Now, the French translation would be searched in
`/var/www/locale/fr/LC_MESSAGES/com.mydomain.www.mo` and
`/srv/www/locale/fr/LC_MESSAGES/com.mydomain.www.mo`.

#### Updating Translation Files

Translations may become obsolete when the source templates
change.  In this case, you have to merge the new set of
translatable strings into the existing translation files.
Fortunately, GNU Gettext makes this easy:

```shell
$ xgettext-tt2 --files-from=POTFILES \
    --output=com.mydomain.www.pot \
    --add-comments=TRANSLATORS: --from-code=utf-8 \
    --force-po
$ cp fr.po fr.old.po
$ msgmerge fr.old.po com.mydomain.www.pot -o fr.po
....... done
```

You first update the `.pot` file with `xgettext-tt2` so that it
contains the current set of translatable strings.  You then
make a backup of each `.po` file and then invoke the program
`msgmerge` for merging the current translations from `fr.old.po`
with the new set of strings from `com.mydomain.www.pot` into
the updated translation file `fr.po`.

The file `fr.po` will now contain the new strings as untranslated
entries.  Strings that have only slightly changed will retain their
translations but they will be marked as "fuzzy", so that they
can be reviewed by a translator.  Entries for strings that are
no longer present in the sources are obsoleted.
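
With more than one language, the backup and merge steps are usually wrapped in a loop.  The following sketch assumes that all `LL.po` files sit in the current directory and that `msgmerge` from GNU gettext is installed.  `update_catalogs` is a made-up helper name.

```shell
# Sketch: merge a fresh .pot file into every LL.po in the current directory.
# update_catalogs is a made-up helper; it assumes msgmerge from GNU gettext.
update_catalogs() {
    pot=$1
    for po in *.po; do
        [ -f "$po" ] || continue                  # no .po files at all
        case $po in *.old.po) continue ;; esac    # skip backups of earlier runs
        # Keep a backup, then merge it with the new set of strings.
        cp "$po" "${po%.po}.old.po" &&
            msgmerge "${po%.po}.old.po" "$pot" -o "$po"
    done
}
```

After regenerating the `.pot` file you would call `update_catalogs com.mydomain.www.pot`.  The backups are kept on purpose, so that you can compare the old and the new state before handing the files to the translators.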

#### Integrating With Other Programming Languages

The GNU Gettext framework is available for a lot of programming
languages and it is not uncommon that two or more of these 
languages are mixed in a project.  It is beneficial in these
cases to use a common translation base for all used 
technologies.

`xgettext-tt2` is based on [`Locale-XGettext`](https://github.com/gflohr/Locale-XGettext)
and therefore not only understands Template Toolkit templates
but also `.po` and `.pot` files as input.  GNU Gettext's xgettext
has the same feature.

Accumulating all translatable strings from the different
technologies is therefore very easy.  If you have a project
that uses Template Toolkit for rendering web pages and Perl
for the business logic you first extract strings from your 
Perl files --- as usual --- with `xgettext` from GNU gettext
into a temporary file, for example `plfiles.pot`.  Then you
extract the strings from the templates with `xgettext-tt2` 
from this library, but you specify `plfiles.pot` as an 
additional input file. And now the output file of `xgettext-tt2`
contains all the strings from the template files *plus* those
from the Perl files in `plfiles.pot`.
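
The two steps described above could look roughly like this.  Treat it as a sketch: `extract_all` is a made-up helper, and the file `PLFILES` (a list of Perl source files, one per line) is an assumption of this example.  The options used are the ones shown earlier in this document plus `--language` and `--files-from` of GNU `xgettext`.

```shell
# Sketch: two-step extraction for a project mixing Perl and TT2 templates.
# extract_all and the list file PLFILES are illustrative assumptions.
# Usage: extract_all TEXTDOMAIN TEMPLATE...
extract_all() {
    textdomain=$1
    shift
    # Step 1: Perl sources, with xgettext from GNU gettext.  --force-po makes
    # sure that plfiles.pot exists even if no strings were found.
    xgettext --language=Perl --from-code=utf-8 --force-po \
        --files-from=PLFILES --output=plfiles.pot || return 1
    # Step 2: templates, plus the strings collected in step 1.
    xgettext-tt2 --from-code=utf-8 --add-comments=TRANSLATORS: \
        --output="$textdomain.pot" "$@" plfiles.pot
}
```

A call like `extract_all com.mydomain.www templates/*.html` would then leave the combined strings in `com.mydomain.www.pot`, while `plfiles.pot` is only an intermediate file that you can delete afterwards.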

Of course, you can also do it the other way round, extract
with `xgettext-tt2` into `ttfiles.pot`, and then feed that as
an additional input file to GNU Gettext's `xgettext`.

You can use the seed project [Template-Plugin-Gettext-Seed](https://github.com/gflohr/Template-Plugin-Gettext-Seed)
as a fully functional starting point for such setups.

## Bugs

Please report bugs at [https://github.com/gflohr/Template-Plugin-Gettext/issues](https://github.com/gflohr/Template-Plugin-Gettext/issues)

## Author

Template-Plugin-Gettext was written by [Guido Flohr](http://www.guido-flohr.net/).