# README.md - Localize HTML Tidy

Thank you for your interest in helping us localize HTML Tidy and LibTidy. Users
throughout the world will thank you.

This document describes Tidy's localization philosophy and explains how you can
use standard `gettext` tools to generate language and region localizations that
will work with Tidy. Optional instructions are also included in case you want to
build Tidy with your new language.

## Contents:
- [Introduction](#introduction)
  - [PO and POT files](#po-and-pot-files)
  - [H files](#h-files)
  - [Differences for translators](#differences-for-translators)
  - [`poconvert.rb` versus `gettext`' tools](#poconvertrb-versus-gettext-tools)
- [How to Contribute](#how-to-contribute)
  - [Find or Create the Translation Files](#find-or-create-the-translation-files)
  - [Issue a Pull Request to HTML Tidy](#issue-a-pull-request-to-html-tidy)
  - [Using Git appropriately](#using-git-appropriately)
  - [Repository Notes](#repository-notes)
- [Adding Languages to Tidy](#adding-languages-to-tidy)
- [Best Practices](#best-practices)
  - [Language Inheritance](#language-inheritance)
  - [String Inheritance](#string-inheritance)
  - [Base Language First and Regional Variants](#base-language-first-and-regional-variants)
  - [Positional Parameters](#positional-parameters)
- [Testing](#testing)
  - [Command line option](#command-line-option)
  - [Changing your locale](#changing-your-locale)
  - [East Asian Languages](#east-asian-languages)
- [gettext](#gettext)
- [poconvert.rb](#poconvertrb)
  - [Create a new POT file](#create-a-new-pot-file)
  - [Create a new POT file with non-English `msgid` strings](#create-a-new-pot-file-with-non-english-msgid-strings)
  - [Convert an existing H to PO](#convert-an-existing-h-to-po)
  - [Convert an existing H to PO using a different `msgid` language](#convert-an-existing-h-to-po-using-a-different-msgid-language)
  - [Create a blank PO file for a particular region](#create-a-blank-po-file-for-a-particular-region)
  - [Create a Tidy Language Header H file](#create-a-tidy-language-header-h-file)
  - [Prepare your non-English PO for a PR](#prepare-your-non-english-po-for-a-pr)
  - [Update your PO to match the new POT](#update-your-po-to-match-the-new-pot)
- [Help Tidy Get Better](#help-tidy-get-better)

  
## Introduction

HTML Tidy is built around the localization file `language_en.h`; without this
file HTML Tidy will not work. As such, _all_ language localization work
originates from this single file.

Language localizations use header files that are structurally identical to
`language_en.h`; only the strings differ. For the convenience of translators,
though, Tidy's source code includes a Ruby script, `poconvert.rb`, that enables
_optional_ `gettext` PO/POT workflows, which some translators may find more
comfortable.


### PO and POT files
HTML Tidy provides PO and POT files for language translations. The file 
`tidy.pot` is the correct template to use as a basis for new translations. In a
typical `gettext` workflow a translator will use the `tidy.pot` file to create a
language translation PO file that contains original English strings and the
translated strings.

If a language has already been translated (or if the translation has begun),
then PO files may already exist. These files are named `language_ll.po` or
`language_ll_CC.po`, where `ll` represents the language code and the optional
`CC` represents the region code of the translation.
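
To make the naming convention concrete, here is a minimal C sketch that splits a PO file name into its language code and optional region code. The `po_locale` type and `parse_po_name` function are hypothetical helpers, not part of Tidy or `poconvert.rb`, and the sketch assumes two-letter codes only.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type, not part of Tidy. */
typedef struct {
    char lang[3];    /* two-letter language code, e.g. "es" */
    char region[3];  /* two-letter region code, or "" when absent */
} po_locale;

/* Split "language_ll.po" or "language_ll_CC.po" into its parts.
   Assumes two-letter codes; returns false for any other name. */
bool parse_po_name(const char *name, po_locale *out)
{
    static const char prefix[] = "language_";
    const size_t plen = sizeof prefix - 1;

    if (strncmp(name, prefix, plen) != 0)
        return false;

    const char *stem = name + plen;            /* "ll.po" or "ll_CC.po" */
    const char *ext = strrchr(stem, '.');
    if (ext == NULL || strcmp(ext, ".po") != 0)
        return false;

    size_t len = (size_t)(ext - stem);
    if (len != 2 && !(len == 5 && stem[2] == '_'))
        return false;

    memcpy(out->lang, stem, 2);
    out->lang[2] = '\0';
    if (len == 5) {
        memcpy(out->region, stem + 3, 2);      /* region case is kept as-is */
        out->region[2] = '\0';
    } else {
        out->region[0] = '\0';
    }
    return true;
}
```

For example, `language_en_gb.po` yields language `en` and region `gb`, while `tidy.pot` is rejected because it is a template rather than a translation.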

Tidy does not use MO files that `gettext` tools generate from PO files.

Please note that these PO and POT files are provided for translator convenience
only. Tidy's [header files](#h-files) constitute the true, controlled source
code for Tidy.


### H files

Tidy does not use `gettext` to display strings and so `gettext`-generated MO
files are not necessary. Instead translated PO files must be converted to Tidy's
language header H file format. Translators are not required to perform this
step, but we provide a tool to perform this function if desired.


### Differences for translators

Experienced users and translators of PO files may notice that we use the PO
file's `msgctxt` field in an unusual way. Rather than describing where or how a
string is used, it contains the string's identifier. Because the PO format does
not allow for arbitrary metadata, this is a requirement for generating our
header files.

If you're the type of translator who digs into the source code, the `msgctxt`
symbol is still useful to you, and it adds just one extra step to finding a
string in context: a symbol or string search for the `msgctxt` value will reveal
where the string is used in the source code.

Finally the `msgid` field is a throwaway; Tidy's language tools do not use this
value and so it's only for the translator's convenience. This fact makes it
convenient for translators to translate from languages other than English,
which is fully supported by our tools.


### `poconvert.rb` versus `gettext`' tools

Please don't use `gettext`'s tools with our PO and POT files (unless you are
using our strings for a different project). Instead, all workflows can be
accomplished with our `poconvert.rb` tool.

[More information about this tool](#poconvertrb) can be found below.


## How to Contribute

### Find or Create the Translation Files
If you haven't already cloned the HTML Tidy source code repository, that will be
your first step.

In the `localize\translations\` directory you can find existing languages, e.g.,

  - `tidy.pot` (Tidy's POT template for translations).
  - `language_en_gb.po` (British English variants for the built-in language)
  - …and perhaps more.
  
In the `src\` directory you can find the master files for existing languages,
e.g.,

 - `language_en.h` (Tidy's native, built-in language, mostly U.S. English)
 - `language_en_gb.h` (British English variants for the built-in language)
 - …and perhaps more.
 
Although the header files are the master files for HTML Tidy, we understand that
not all potential translators want to edit C files directly. Therefore, we offer
the following optional workflow using POT and PO files.

If the language that you want to work on is already present:

  - Simply open the file in your favorite PO editor and then get to work.
  - Note that although you can use a text editor, we recommend that you use a
    dedicated PO editor so that you don't accidentally make the file unreadable
    by our conversion utility.
    
If the language that you want to work on is _not_ already present:

  - You can open `tidy.pot` in your favorite PO editor and use its functions
    to begin a new translation into your desired language.
  - Note that although you can use a text editor, we recommend that you use a
    dedicated PO editor so that you don't accidentally make the file unreadable
    by our conversion utility.
  - To perform the work manually:
    - Copy `tidy.pot` to `language_ll.po` (for a non-regional variant, or base
      language), or to `language_ll_cc.po` (for a region-specific variant),
      where `ll` indicates the two-letter language code and `cc` indicates the
      two-letter region or country code.
    - Change the pertinent PO header section accordingly.
  - Use `poconvert.rb` to generate a PO:
    - `poconvert.rb msginit --locale ll`, where `ll` indicates the language
      code for the language you want to translate to. The tool recognizes the
      same languages as `gettext`' `msginit`. If your chosen language is not
      supported, then please see the manual method, above.
    - See also `poconvert.rb help` for more options.
  - See GNU's [The Format of PO Files](https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html)
    for more specific instructions and important information.

### Issue a Pull Request to HTML Tidy

Once your translation is complete, commit your changes, push them to your fork
on GitHub, and issue a pull request (PR) against the `master` branch. If your PR
is accepted, a friendly developer will convert your PO into Tidy's header
format, or, if you changed the header file directly, simply merge it.

You are also welcome to perform any conversions yourself, add new languages to
Tidy, and issue a PR for the whole change.


### Using Git appropriately

 1. Fork the repository to your GitHub account.
 2. Optionally create a **topical branch** - a branch whose name is succinct but
    explains what you're doing, such as "localize Portuguese".
 3. Make your changes, committing at logical breaks.
 4. Push your work to your personal account.
 5. [Create a pull request](https://help.github.com/articles/using-pull-requests).
 6. Watch for comments or acceptance.


### Repository Notes

If you are working with PO files then please **only** commit PO files with 
_English_ `msgid` fields. The `gettext` convention specifies only English 
`msgid`, and other translators may not understand the original strings.

Our `poconvert.rb` script can generate PO files using another language as
`msgid`. This can be very useful if it's easier for you to translate from
another language instead of English. It can also be useful for translating from
a base language to a regional variant, such as from Spanish to Mexican Spanish.

If you choose to work locally with a non-English PO, you can easily convert
your PO to a Tidy header file and back to an English-based PO using our
`poconvert.rb` script. See its documentation (`poconvert.rb help`) for
instructions.


## Adding Languages to Tidy

Although we don't require you to follow these steps to contribute a language
to Tidy, you may want to add the language to Tidy yourself to test the
translation, or to save the developer team a few extra steps.

  - Generate the header files if necessary:
    - Convert your PO file to a Tidy header file by executing
      `poconvert.rb msgfmt <path_to_your_file.po>`. Note that on Windows you
      will likely have to prefix this command with `ruby`.
    - The tool should generate a file named `language_ll_cc.h` in the working
      directory, where `ll_cc` will be replaced with the language/region of your
      translation.
    - Copy this `.h` file into `src\`.
  - Modify Tidy's source:
    - Edit the file `src\language.c` to ensure that the new `.h` file you added
      is in the `#include` section.
    - Look for the `static tidyLanguagesType tidyLanguages` structure starting
      near line 40, and look for the comment `These languages are installed.`.
      You can add your new language to the list along with the other languages
      present, following the same format.
  - Build Tidy:
    - Build Tidy per the usual instructions, and try it out using the `-lang`
      option.
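
The registration step above amounts to adding one more entry to a table of installed languages. The self-contained C sketch below models only that idea: `language_def`, `installed_languages`, and `find_installed` are hypothetical names, the definitions would normally come from the `#include`d `.h` files, and Tidy's real `tidyLanguages` structure has its own types and fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for what each language_ll_cc.h header defines. */
typedef struct {
    const char *name;    /* "en", "en_gb", "es", ... */
    const char *sample;  /* one translated string, for illustration */
} language_def;

/* In Tidy these would come from #include "language_xx.h" lines. */
static const language_def language_en    = { "en",    "file" };
static const language_def language_en_gb = { "en_gb", "file" };
static const language_def language_es    = { "es",    "archivo" };

/* These languages are installed. */
static const language_def *const installed_languages[] = {
    &language_en,
    &language_en_gb,
    &language_es,   /* the newly added language, same format as the others */
    NULL            /* sentinel marks the end of the list */
};

/* Return the installed definition for `name`, or NULL if it is absent. */
const language_def *find_installed(const char *name)
{
    for (size_t i = 0; installed_languages[i] != NULL; i++)
        if (strcmp(installed_languages[i]->name, name) == 0)
            return installed_languages[i];
    return NULL;
}
```

A sentinel-terminated table lets new languages be appended without updating a separate count.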
    

## Best Practices

### Language Inheritance

HTML Tidy will fall back from the specified language to the base language and
then finally to the default English as required. This means, for example, that
a programmer might set `libtidy` to use “es_mx”, and if it doesn’t exist, Tidy
will automatically use “es”. If that doesn’t exist either, `libtidy` will
continue to use whatever language it is currently using.
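
The language-level lookup order can be modeled in a few lines of C. This is a minimal sketch assuming a flat list of installed language names; `installed`, `lookup_installed`, and `choose_language` are hypothetical, not `libtidy`'s API, and case differences such as `es_MX` versus `es_mx` are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of installed languages. */
static const char *const installed[] = { "en", "en_gb", "es", NULL };

/* Find an installed name equal to the first `len` chars of `name`. */
static const char *lookup_installed(const char *name, size_t len)
{
    for (size_t i = 0; installed[i] != NULL; i++)
        if (strlen(installed[i]) == len && strncmp(installed[i], name, len) == 0)
            return installed[i];
    return NULL;
}

/* Try "ll_cc", then "ll"; if neither is installed, keep `current`. */
const char *choose_language(const char *requested, const char *current)
{
    const char *hit = lookup_installed(requested, strlen(requested));
    if (hit != NULL)
        return hit;                          /* exact match, e.g. "en_gb" */

    const char *sep = strchr(requested, '_');
    if (sep != NULL) {
        hit = lookup_installed(requested, (size_t)(sep - requested));
        if (hit != NULL)
            return hit;                      /* base language, e.g. "es" */
    }
    return current;                          /* no match: leave unchanged */
}
```

With this list, requesting `es_mx` selects `es`, while requesting `fr_ca` leaves the current language in place.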


### String Inheritance

HTML Tidy will also fall back for individual strings. For example, if `libtidy`
is set to use “es_mx” and a particular string is requested but not found, the
library will look for the string in “es”. If the string is not found there,
then the “en” string will be given.
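
Per-string fallback can be pictured as a search through a chain of tables, from most to least specific. The C sketch below is illustrative only: the `entry` type, the keys (`FILE`, `COMPUTER`, `HELP`), and `lookup_es_mx` are hypothetical, and Tidy's headers use their own identifiers and types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string table entry. */
typedef struct {
    const char *key;   /* string identifier (the PO msgctxt) */
    const char *text;  /* localized text */
} entry;

/* Illustrative tables; the regional variant holds only what differs. */
static const entry en[]    = { {"FILE", "file"}, {"COMPUTER", "computer"},
                               {"HELP", "help"}, {NULL, NULL} };
static const entry es[]    = { {"FILE", "archivo"}, {"COMPUTER", "ordenador"},
                               {NULL, NULL} };
static const entry es_mx[] = { {"COMPUTER", "computadora"}, {NULL, NULL} };

static const char *find_text(const entry *table, const char *key)
{
    for (; table->key != NULL; table++)
        if (strcmp(table->key, key) == 0)
            return table->text;
    return NULL;
}

/* Look a string up as if libtidy were set to "es_mx":
   es_mx first, then es, then en. */
const char *lookup_es_mx(const char *key)
{
    const entry *const chain[] = { es_mx, es, en };
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++) {
        const char *text = find_text(chain[i], key);
        if (text != NULL)
            return text;
    }
    return NULL;  /* key unknown in every table */
}
```

Here `COMPUTER` comes from the Mexican table, `FILE` is inherited from Spanish, and `HELP` falls through to English.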


### Base Language First and Regional Variants

Because of this inheritance, we prefer that base languages be localized first;
regional variants then need only the strings that differ. This helps keep
HTML Tidy and `libtidy` small.

If you are working on a regional variation (such as “es_mx”), please only
localize strings that are actually _different_ from the base language!


### Positional Parameters

Please note that HTML Tidy does not currently support positional parameters.
Due to the nature of most of Tidy's output, they are not expected to be
required. In any case, please translate strings so that substitution values
appear in the same order as in the original string.
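
Assuming Tidy's strings use printf-style substitutions such as `%d` and `%s`, this rule can be checked mechanically. The C sketch below (`same_order` is a hypothetical helper, not a Tidy tool) compares the sequence of conversion letters in an original string and its translation. It is deliberately simple and does not understand every printf feature.

```c
#include <stdbool.h>
#include <string.h>

/* Collect the conversion letters of a printf-style format, in order.
   "%%" is a literal percent sign and is skipped; flags, width, precision,
   and length modifiers are skipped too, so "%5ld" counts as 'd'.
   Returns the total count even if it exceeds `max`. */
static size_t conversions(const char *fmt, char *out, size_t max)
{
    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')                 /* "%%": not a substitution */
            continue;
        while (*p != '\0' && strchr("-+ #0123456789.hlLjzt", *p) != NULL)
            p++;
        if (*p == '\0')                /* dangling '%' at end of string */
            break;
        if (n < max)
            out[n] = *p;
        n++;
    }
    return n;
}

/* Hypothetical check: true when `translation` uses the same
   substitutions, in the same order, as `original`. */
bool same_order(const char *original, const char *translation)
{
    char a[32], b[32];
    size_t na = conversions(original, a, sizeof a);
    size_t nb = conversions(translation, b, sizeof b);
    if (na != nb || na > sizeof a)     /* different count, or too many */
        return false;
    return memcmp(a, b, na) == 0;
}
```

For instance, translating `"line %d column %d - %s"` as `"línea %d columna %d - %s"` passes, while `"%s en la línea %d, columna %d"` fails; rewording the sentence is the way around that, since positional forms are not supported.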


## Testing

We hope to develop a comprehensive test suite in the future, but in the meantime
you can test localized output like this.

### Command line option

Use the `-lang`/`-language` option and specify a POSIX or Windows language
name. This should be the first option you use, because the console application
parses and acts on options first-in, first-out.
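
A small model shows why the position of `-lang` matters when options are handled first-in, first-out. In this C sketch, `language_at` is hypothetical and far simpler than Tidy's real option parser; it only reports which language would be active while a given argument is being handled.

```c
#include <string.h>

/* Hypothetical model of a first-in, first-out option loop. It returns the
   language that would be active while argv[target] is being handled. */
const char *language_at(int argc, const char *argv[], int target)
{
    const char *lang = "en";   /* built-in default before any -lang */
    for (int i = 1; i < argc && i < target; i++) {
        int is_lang = strcmp(argv[i], "-lang") == 0 ||
                      strcmp(argv[i], "-language") == 0;
        if (is_lang && i + 1 < argc) {
            lang = argv[i + 1];   /* affects only options handled later */
            i++;                  /* skip over the language name itself */
        }
    }
    return lang;
}
```

In this model, for `tidy -lang es -q file.html` the `-q` option is handled while Spanish is active, whereas for `tidy -q -lang es file.html` it is handled while the default is still in effect.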

### Changing your locale

On Unix, Mac, and Linux operating systems you can temporarily change your
shell’s locale with:

`export LANG=en_GB`

`export LC_ALL=en_GB`

…substituting, of course, the language of your choice.
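
Setting `LC_ALL` as well as `LANG` matters because, under the usual POSIX rules, `LC_ALL` overrides the category-specific `LC_MESSAGES`, which in turn overrides `LANG`. The C sketch below resolves a message locale in that order and reduces a value like `en_GB.UTF-8` to a Tidy-style name like `en_gb`. The function names are hypothetical, and whether Tidy performs exactly these steps is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int is_set(const char *v) { return v != NULL && v[0] != '\0'; }

/* Pick the first non-empty of LC_ALL, LC_MESSAGES, and LANG (usual POSIX
   precedence), then reduce it to a Tidy-style name: drop any ".codeset" or
   "@modifier" suffix and lowercase the rest. `size` must be at least 3. */
void message_locale_from(const char *lc_all, const char *lc_messages,
                         const char *lang, char *out, size_t size)
{
    const char *value = is_set(lc_all)      ? lc_all
                      : is_set(lc_messages) ? lc_messages
                      : is_set(lang)        ? lang
                      : "C";

    size_t n = 0;
    while (value[n] != '\0' && value[n] != '.' && value[n] != '@' && n + 1 < size) {
        out[n] = (char)tolower((unsigned char)value[n]);
        n++;
    }
    out[n] = '\0';

    /* "C" and "POSIX" name no language; assume Tidy's default English. */
    if (strcmp(out, "c") == 0 || strcmp(out, "posix") == 0)
        memcpy(out, "en", 3);
}

/* Convenience wrapper that reads the real environment. */
void message_locale(char *out, size_t size)
{
    message_locale_from(getenv("LC_ALL"), getenv("LC_MESSAGES"),
                        getenv("LANG"), out, size);
}
```

Separating the pure resolution step from the `getenv` wrapper keeps the logic testable without modifying the process environment.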

### East Asian Languages

East Asian languages are completely supported and have been tested on Linux,
Mac OS X, and Windows, although Windows requires you to set your operating
system (not the console locale!) to an East Asian locale to enable this in
Windows Console and PowerShell. Note that PowerShell ISE always supports East
Asian languages without requiring you to change your operating system locale.


## gettext

Although HTML Tidy uses `gettext`-compatible tools and PO files for language
localization, Tidy itself does _not_ use `gettext`. Tidy's build philosophy is
"build it anywhere, and build it with anything." Because `gettext` is not
available on every platform under the sun, Tidy cannot count on it.

Instead, Tidy builds all translations into its library (and into the command
line executable, if built monolithically), and can run on virtually any
general-purpose computer with any operating system.

While this does not pose a significant problem for storage or execution space
on modern PCs, we understand that certain applications may still be space
critical. As such, it's trivial to build Tidy without this extra language
support by passing the `-DSUPPORT_LOCALIZATIONS=NO` switch to CMake.
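
The mechanism behind such a switch is ordinary conditional compilation. In the C sketch below, the macro name `SUPPORT_LOCALIZATIONS` mirrors the build switch, but how Tidy's build actually passes it to the compiler (and which code it guards) is an assumption, and the language list is illustrative.

```c
#include <stddef.h>

/* Assumption: the build switch reaches the C code as a macro that is 1 when
   localizations are wanted and 0 when they are not. Default here: enabled. */
#ifndef SUPPORT_LOCALIZATIONS
#define SUPPORT_LOCALIZATIONS 1
#endif

/* Illustrative list of compiled-in languages. */
static const char *const compiled_languages[] = {
    "en",        /* the built-in language is always present */
#if SUPPORT_LOCALIZATIONS
    "en_gb",     /* translations are compiled in only when enabled */
    "es",
#endif
    NULL         /* sentinel */
};

/* Count the languages compiled into this build. */
size_t language_count(void)
{
    size_t n = 0;
    while (compiled_languages[n] != NULL)
        n++;
    return n;
}
```

Compiling this sketch with `-DSUPPORT_LOCALIZATIONS=0` leaves only `"en"` in the table, so the translated strings never reach the binary.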


## poconvert.rb

Tidy's source code includes a Ruby script called `poconvert.rb`, which can be
used to generate POT, PO, and H files and convert them back and forth. It has
been designed to work similarly to `gettext`'s tools, and includes
conveniences that let translators work in different source languages. Please
use `poconvert.rb help` for complete information (`ruby poconvert.rb help` on
Windows).

Note that you must install Ruby on your system, as well as install the required
dependencies. These can be manually installed with `[sudo] gem install xxx`,
where `xxx` represents the packages listed in `Gemfile`. For convenience, if you
have the Bundler gem installed, you can `bundle install` for automated
dependency installation.

Also take note of these two important characteristics:

- `poconvert.rb` currently depends on its location. You can move it
  elsewhere, but you will then have to change the values of the
  `@@default_en` and `@@header_template` variables within the script.
- All files will be output in the current working directory. This will prevent
  accidental overwrites of important files while we all get used to the
  workflows.

Below are some sample workflows.


### Create a new POT file

Although we provide `tidy.pot` in the source, you can generate your own.

`./poconvert.rb xgettext`

This will put a fresh, new copy of `tidy.pot` in the working directory.


### Create a new POT file with non-English `msgid` strings

Although `gettext` officially recognizes English as the one, true source
language for PO and POT files, if you're more comfortable translating from a
non-English language, we can support you.

`./poconvert.rb xgettext <language_ll_cc.h>`

Here `<language_ll_cc.h>` is the path to an existing Tidy language header file.
This will produce a `tidy.pot` using the translated strings as `msgid`, using
English as a backup when translated strings are not present.

This can be valuable in producing regional variant translations, e.g., when
translating from `es` to `es_mx`.


### Convert an existing H to PO

In many cases you may want to have a fresh PO generated from a Tidy H file.
This can be accomplished with:

`./poconvert.rb msgunfmt <language_ll_cc.h>`


### Convert an existing H to PO using a different `msgid` language

If you want to generate a fresh PO file from a Tidy H file, but _also_ want to
have untranslated strings from a language other than English, try:

`./poconvert.rb msgunfmt <language_ll_cc.h> --baselang=<other-language_ll_cc.h>`


### Create a blank PO file for a particular region

`./poconvert.rb msginit`
or
`./poconvert.rb msginit --locale=LOCALE`

The first example will try to guess your current region, and the second will
use a region specified.

Tidy only knows about the same regions that `gettext` knows; if our `msginit`
does not recognize the region you specify, you will have to create a new PO
and modify the region settings yourself.

To create the blank PO using `msgid` strings from a different Tidy language,
you can use:

`./poconvert.rb msginit <language_ll_cc.h> [--locale=LOCALE]`


### Create a Tidy Language Header H file

When you're ready to include the language in Tidy, you can generate its header
file with:

`./poconvert.rb msgfmt <language_ll_cc.po>`

In the event you are creating a regional variant of a language, it's an
excellent idea to have Tidy exclude strings that are already present in the
parent language in order to reduce library and executable size. For example
if `es` already includes the string "archivo" there is no reason for your
translation to `es_mx` to include it, too. You can tell `poconvert.rb` to
exclude strings matching another localization like so:

`./poconvert.rb msgfmt <language_ll_cc.po> --baselang=<other-language_ll_cc.h>`
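
The idea behind `--baselang` is a simple filter: keep a regional string only if the base language lacks it or translates it differently. `poconvert.rb` itself is written in Ruby; the C sketch below only models the documented behavior, and `entry`, `keep_differences`, and the sample keys are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type. */
typedef struct {
    const char *key;   /* string identifier (msgctxt) */
    const char *text;  /* translated text */
} entry;

/* Illustrative data: the base language and a full regional translation. */
static const entry es[]    = { {"FILE", "archivo"}, {"COMPUTER", "ordenador"},
                               {NULL, NULL} };
static const entry es_mx[] = { {"FILE", "archivo"}, {"COMPUTER", "computadora"},
                               {NULL, NULL} };

static const char *find_text(const entry *table, const char *key)
{
    for (; table->key != NULL; table++)
        if (strcmp(table->key, key) == 0)
            return table->text;
    return NULL;
}

/* Keep only the variant's strings that differ from (or are missing in) the
   base language. Stores up to `max` pointers in `kept` and returns how many
   strings would be kept in total. */
size_t keep_differences(const entry *variant, const entry *base,
                        const entry **kept, size_t max)
{
    size_t n = 0;
    for (; variant->key != NULL; variant++) {
        const char *inherited = find_text(base, variant->key);
        if (inherited != NULL && strcmp(inherited, variant->text) == 0)
            continue;              /* identical to the base: inherit it */
        if (n < max)
            kept[n] = variant;
        n++;
    }
    return n;
}
```

For the sample data, only `COMPUTER` survives, because `FILE` ("archivo") is already provided by `es`.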


### Prepare your non-English PO for a PR

Although we have provided tools to allow you to work in languages other than
English, we can only accept POs in the repository that have English `msgid`
fields. It's easy to convert your PO back to English:

`./poconvert.rb msgfmt <language_ll_cc.po>`

`./poconvert.rb msgunfmt <language_ll_cc.h>`

The first command converts your non-standard PO into a Tidy Language Header
file, and the second will create a fresh, new PO file from the header that
you've just created.


### Update your PO to match the new POT

If Tidy's POT changes, e.g., new strings are added, new comments, etc., the
simplest way to update your PO is to convert it to a header (which normalizes
it to the latest Tidy standard), and then convert the header to a new PO again.

`./poconvert.rb msgfmt <language_ll_cc.po>`

`./poconvert.rb msgunfmt <language_ll_cc.h>`


## Help Tidy Get Better

It goes without saying **all help is appreciated**. We need to work together to
make Tidy better!