
.. _translate_toolkit_developers_guide:

Translate Toolkit Developers Guide
**********************************

The goal of the translate toolkit is to simplify and unify the process of
translation.

.. _developers#history:

History
=======

The initial toolkit was designed to convert Mozilla .dtd and .properties files
into Gettext PO format.  The reasoning was not that PO was in any way superior,
but that by simplifying the translation process, i.e. allowing a translator to
use one format and one tool, we could get more people involved and attract more
translators.

The tools have since evolved to include other formats such as OpenOffice.org.
The goal is still to migrate various formats to a common format: PO and, in the
future, XLIFF as more tools become available for that format.

We group these tools as converters.  Along the way we developed other tools
that allow us to manipulate PO files and check them for consistency.  As the
converter tools evolved, we also improved and abstracted the classes that read
the various file types.  In the future we hope to define these better so that
we have a more or less stable API for converters.

.. _developers#resources:

Resources
=========

.. _developers#git_access:

Git access
----------
Translate Toolkit uses Git as its version control system.  You can clone the
translate repository directly or fork it on GitHub.

::

  git clone https://github.com/translate/translate.git

.. _developers#issues:

Issues
------

* https://github.com/translate/translate/issues

.. _developers#communication:

Communication
-------------

* `Issue tracker <https://github.com/translate/translate/issues>`_ - bug reports, discussions, and questions
* `Discussions <https://github.com/translate/translate/discussions>`_ - general discussions and help


.. _developers#working_with_issues:

Working with GitHub Issues
===========================
When you close an issue, make sure that you give a description and the Git hash
of the fix.  This lets the reporter or code reviewer see your work and find
your fix easily.

Automated issue closing from commits
-------------------------------------

GitHub will automatically close issues when commit messages reference them
properly.

- Issues are recognised by keywords followed by the issue number::

    Fixes #123
    Closes #456
    Resolves #789

- Multiple issues can be referenced in a single commit::

    Fixes #123, closes #456

- Commits to the default branch (or in merged pull requests) will close the
  referenced issues automatically.

.. _developers#source_code_map:

Source code map
===============

The source code for the tools is hosted on `GitHub
<https://github.com/translate/translate>`_.  This rough map will allow you to
navigate the source code tree:

* convert -- convert between different formats and PO format
* filters -- :doc:`/commands/pofilter` and its helper functions (badly named,
  it is really a checking tool)
* storage -- all base file formats: XLIFF, .properties, OpenOffice.org, TMX,
  etc.
* misc -- various helper functions
* tools -- all PO manipulation programs: :doc:`/commands/pocount`,
  :doc:`/commands/pogrep`, etc.
* lang -- modules with data / tweaks for various languages
* search -- translation memory, terminology matching, and indexing / searching
* share -- data files

.. _developers#setup:

Setup
=====

**Prerequisites:**

* Git
* Python 3.10 or newer
* `uv <https://docs.astral.sh/uv/>`_ (recommended package manager)

**Installation:**

Clone the repository:

.. code-block:: sh

   git clone https://github.com/translate/translate.git
   cd translate

Install all dependencies (creates a virtual environment in ``.venv``):

.. code-block:: sh

   uv sync --all-extras --dev

Run the tests:

.. code-block:: sh

   make test

**Note:** ``uv sync`` automatically creates and manages a virtual environment for
you. All dependencies including development tools are installed using PEP 735
dependency groups defined in ``pyproject.toml``.

.. _developers#general_overview_of_the_programs:

General overview of the programs
================================

Each tool in the toolkit has both a core program and a command line wrapper.
For example the oo2po converter:

* oo2po -- the command line tool
* oo2po.py -- the core program

This is done so that the tools can be used from within the Pootle server, which
makes the toolkit easy to reuse.

.. _developers#command_line_options:

Command line options
--------------------

Getting lost with the command line options?  You might want to volunteer to
move some of them into configuration files.  In terms of programming, you
might also be confused as to where they are implemented.  Many of the command
line options are common to every tool: options such as :opt:`--progress` and
:opt:`--errorlevel` are used in each program, so they are abstracted in
**misc/optrecurse.py**.  Each tool's unique command line options are
implemented in its own **xxx.py**.
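
The split between shared and tool-specific options can be sketched as follows.
This is a minimal illustration built on the standard library's ``argparse``; it
is not the actual ``optrecurse`` API.  The names ``shared_parser`` and
``tool_parser``, the ``--marker`` option and all default values are
hypothetical.

.. code-block:: python

   import argparse


   def shared_parser():
       """Options every tool accepts (the role misc/optrecurse.py plays)."""
       parser = argparse.ArgumentParser(add_help=False)
       # Default values here are placeholders chosen for the sketch.
       parser.add_argument("--progress", default="bar")
       parser.add_argument("--errorlevel", default="message")
       return parser


   def tool_parser():
       """Options unique to one hypothetical tool, layered on the shared ones."""
       parser = argparse.ArgumentParser(prog="xxx", parents=[shared_parser()])
       # --marker is an invented option standing in for a tool-specific one.
       parser.add_argument("--marker", default="none")
       return parser


   options = tool_parser().parse_args(["--progress", "none", "--marker", "fuzzy"])

Here the tool only declares what is unique to it, while the shared options are
inherited unchanged.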

.. _developers#converters:

Converters
==========

The converters each have a class that handles the conversion from one format to
another.  This class has one important method, **convertfile**, which handles
the actual conversion.

A function **convertXXX** manages the conversion for the command line
equivalent and takes at least 3 parameters: inputfile, outputfile and
templatefile.  It in turn calls the conversion class to handle the conversion
of individual files.  Recursing through multiple files is handled by the
optrecurse.py logic.
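
The layering described above might look roughly like the sketch below.  It is
self-contained and does not use the toolkit's own classes: the ``txt2kv``
class, the ``converttxt`` function, the ``key=value`` output format and the
boolean return value are all assumptions made for illustration.

.. code-block:: python

   import io


   class txt2kv:
       """Hypothetical conversion class: plain text lines to key=value lines."""

       def convertfile(self, lines):
           """Turn each non-empty input line into a numbered key=value entry."""
           stripped = [line.strip() for line in lines if line.strip()]
           return [f"line{n}={text}" for n, text in enumerate(stripped, start=1)]


   def converttxt(inputfile, outputfile, templatefile=None):
       """Command line level function: read, convert, write.

       templatefile is accepted to mirror the three-parameter shape but is
       unused in this sketch.  Returns True when output was written.
       """
       entries = txt2kv().convertfile(inputfile.read().splitlines())
       if not entries:
           return False
       outputfile.write("\n".join(entries) + "\n")
       return True


   source = io.StringIO("Hello\n\nWorld\n")
   target = io.StringIO()
   converted = converttxt(source, target)

Because ``converttxt`` works on open file objects, code that recurses over a
directory tree only has to open each pair of files and call it.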

The converter's **main** function handles any unique command line options.

We are heading towards a level where the storage formats themselves are more
aware of themselves and their abilities.  The converter could then end up as
one program that accepts storage format plugins to convert from anything to
almost anything else, although our target localisation formats are PO and
XLIFF only.

If you want to create a new converter it is best to look at a simple instance
such as :doc:`/commands/csv2tbx` or :doc:`/commands/txt2po` and their
associated storage classes.  The :doc:`storage base class documentation
</api/storage>` will give you the information you need for the storage class
implementation.

.. _developers#tools:

Tools
=====

The tools in some ways copy the logic of the converters.  We have a class so
that we can reuse a lot of the functionality in Pootle.  We have a core
function that takes input, output and template files.  And we have a **main**
function to handle the command line version.

:doc:`/commands/pocount` should be converted to this structure but does not
yet follow these conventions.  In fact pocount should move the counting to the
storage formats to allow any format to return its own word count.

.. _developers#checks:

Checks
======

There's really only one, :doc:`/commands/pofilter`, but there are lots of
helper functions for it.  pofilter's main task is to check for errors in PO or
XLIFF files.  Here are the helper files and their descriptions.

* autocorrect.py -- when using :opt:`--autocorrect` it will attempt some basic
  corrections found in this file
* checks.py -- the heart.  This contains the actual checks and their error
  reports, and the defined variables and accelerators for e.g.
  :opt:`--mozilla`
* decorations.py -- various helper functions to identify accelerators,
  variables and markers
* helpers.py -- functions used by the tests
* prefilters.py -- functions to e.g. remove variables and accelerators before
  applying tests to the PO message
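
To illustrate why prefilters run before a check, here is a minimal,
self-contained sketch.  It does not use pofilter's real functions:
``remove_accelerators``, ``differs_from_source`` and ``run_check`` are
hypothetical names, and ``&`` is assumed to be the accelerator marker.

.. code-block:: python

   def remove_accelerators(text, marker="&"):
       """Prefilter: drop accelerator markers such as the '&' in '&File'."""
       return text.replace(marker, "")


   def differs_from_source(source, target):
       """Check: passes (True) when the translation is not a copy of the source."""
       return source != target


   def run_check(check, source, target, prefilters=()):
       """Apply each prefilter to both strings, then run the check."""
       for prefilter in prefilters:
           source = prefilter(source)
           target = prefilter(target)
       return check(source, target)


   # Without the prefilter, the moved accelerator hides the untranslated text.
   raw_result = run_check(differs_from_source, "&File", "F&ile")
   filtered_result = run_check(
       differs_from_source, "&File", "F&ile", prefilters=[remove_accelerators]
   )

In this sketch the unfiltered check passes because the two strings differ only
in where the accelerator sits, while the filtered check fails and flags the
message as untranslated.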

pofilter is now relatively mature.  The best areas for contributions are:

* more tests
* language specific configuration files
* tests for the tests -- so we don't break our good tests
* defining a configuration file scheme so that cool stuff can be done without
  command line options, e.g. globally enabling or disabling tests based on
  language
* some approach to retesting that would remove '# (pofilter)' failure markings
  if the test now passes.
* ability to mark false positives

The :doc:`API documentation </api/filters>` is a good start if you want to add
new tests.  To add a new language, have a look at a language you understand
amongst those already implemented.

.. _developers#storage:

Storage
=======

These are the heart of the converters.  Each destination storage format is
implemented in its own file.  Up until toolkit version 0.8, there was no
formally defined API (the tools have been evolving and only recently
stabilised), but they generally followed this structure.  These classes are
defined:

* XXelement -- handles the low level individual elements of the file format,
  e.g. PO messages, CSV records, DTD elements
* XXfile -- handles the document or file level of the format, e.g. a PO file,
  a CSV file, a DTD file

  * fromlines -- read in a file and initialise the various elements
  * tolines -- convert the elements stored in XXelements and portions in XXfile
    to a raw file in that format
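
The structure above can be sketched for an invented ``key=value`` format.  The
classes ``kvelement`` and ``kvfile`` are hypothetical and exist only to show
the shape; they are not part of the toolkit, and the format rules (``#``
comments, one entry per line) are assumptions of this example.

.. code-block:: python

   class kvelement:
       """One key=value entry (the XXelement level)."""

       def __init__(self, key="", value=""):
           self.key = key
           self.value = value

       def tolines(self):
           """Serialise this single entry back to raw lines."""
           return [f"{self.key}={self.value}\n"]


   class kvfile:
       """A whole key=value file (the XXfile level)."""

       def __init__(self):
           self.elements = []

       def fromlines(self, lines):
           """Parse raw lines and initialise one element per entry."""
           for line in lines:
               line = line.strip()
               if not line or line.startswith("#"):
                   continue  # blank lines and comments are dropped in this sketch
               key, _, value = line.partition("=")
               self.elements.append(kvelement(key.strip(), value.strip()))

       def tolines(self):
           """Serialise every element back to raw lines in file order."""
           lines = []
           for element in self.elements:
               lines.extend(element.tolines())
           return lines


   store = kvfile()
   store.fromlines(["# comment\n", "greeting = Hello\n", "\n", "farewell=Bye\n"])

A real format would normally also preserve comments and whitespace so that a
file survives a round trip unchanged; this sketch deliberately does not.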

In the XML-based formats, e.g. TMX, XLIFF and HTML, there is usually just an
extended parser to manage the file creation.

Within each storage format there are many helper functions for escaping and
managing the unique features of the actual format.

You can help by:

* abstracting more of the functions and documenting that so that we can get a
  better API
* adding other formats and converters e.g. .DOC, .ODF and others
* helping us move to a position where any format can convert to the base
  format (PO and, in the future, XLIFF) without having to create a specific
  converter wrapper
* ensuring that our formats :doc:`conform to the standards
  </formats/conformance>`

.. _developers#base_classes:

Base Classes
------------

From toolkit 0.9 onwards, we are moving towards basing all storage formats on a
set of :doc:`base classes </formats/base_classes>`, in the move to a universal
API.  We're also fixing things so that escaping is much more sane and is
handled within the class itself, not by the converters.

In the base classes we use different terminology:

* XXXunit = XXXelement
* XXXstore = XXXfile
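
As a rough illustration of the unit and store terminology, and of keeping
escaping inside the class rather than in the converters, consider the sketch
below.  ``kvunit`` and ``kvstore`` are hypothetical stand-ins written for this
example; they are not the toolkit's base classes, and the escaping rules are
invented.

.. code-block:: python

   class kvunit:
       """Hypothetical unit that owns the escaping rules of its format."""

       def __init__(self, source, target=""):
           self.source = source
           self.target = target

       @staticmethod
       def _escape(text):
           # Escape backslashes first so that added backslashes are not doubled.
           return text.replace("\\", "\\\\").replace("\n", "\\n")

       def __str__(self):
           return f"{self._escape(self.source)}={self._escape(self.target)}"


   class kvstore:
       """Hypothetical store: an ordered collection of units."""

       def __init__(self):
           self.units = []

       def addunit(self, unit):
           self.units.append(unit)

       def __str__(self):
           return "".join(f"{unit}\n" for unit in self.units)


   store = kvstore()
   store.addunit(kvunit("two\nlines", "twee\nlyne"))
   serialised = str(store)

A converter that fills such a store passes plain strings and never needs to
know how the format escapes newlines.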

We have also tried to unify terminology, and this has been carried over into
the old classes as far as possible.