# Lua-UCA hacking
You need the full installation from
[GitHub](https://github.com/michal-h21/lua-uca) to follow the steps described
in this section. The package distributed on CTAN doesn't contain all the
necessary files.
## Install
The package needs to download Unicode collation data and convert it to a Lua
table. This step depends on the `wget` and `unzip` utilities. All files can be
downloaded using Make:

    make

To install the package in the local TEXMF tree, run:

    make install

## New language support
To add a new language, add a new function to the `src/lua-uca/lua-uca-languages.lua`
file. The function name should be the short language code. An example function for
the Russian language:

    languages.ru = function(collator_obj)
      collator_obj:reorder{ "cyrillic" }
      return collator_obj
    end

The language function takes the Collator object as a parameter and returns it.
Methods shown in the *Change sorting rules* section can be used with this object.
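
The shape of a new language function can be checked even before the collation data is downloaded, by calling it with a stand-in object. The following is only a sketch under that assumption: `new_stub_collator` is a hypothetical helper that is not part of Lua-UCA, and it records method calls instead of changing any sorting rules.

```lua
-- Hypothetical stand-in for the Collator object: it only records which
-- methods were called and does not implement any collation.
local function new_stub_collator()
  local obj = { calls = {} }
  function obj:reorder(scripts)
    table.insert(self.calls, "reorder " .. table.concat(scripts, ","))
  end
  function obj:tailor_string(s)
    table.insert(self.calls, "tailor_string " .. s)
  end
  function obj:uppercase_first()
    table.insert(self.calls, "uppercase_first")
  end
  return obj
end

local languages = {}

-- the Russian example from this section
languages.ru = function(collator_obj)
  collator_obj:reorder{ "cyrillic" }
  return collator_obj
end

local stub = new_stub_collator()
local result = languages.ru(stub)

-- a language function has to return the object it was given
assert(result == stub, "language function must return the collator object")

-- list the recorded calls
for _, call in ipairs(stub.calls) do
  print(call)
end
```

With the real package, the object passed to the language function is a Collator object rather than this stub.
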

The `data/common/collation/` directory in the source repository contains files from the `CLDR` project.
They contain rules for many languages. The files need to be normalized to the
[NFC form](https://en.wikipedia.org/wiki/Unicode_equivalence), for example using:

    uconv -x any-nfc -o cs-nfc.xml cs.xml

The output goes to a new file here, because writing to a file that is still being
read can truncate it. The `uconv` utility is a part of the [ICU Project](http://userguide.icu-project.org/).

Sorting rules for a language are placed in the `<collation>` element. Multiple
`<collation>` elements may be present in the XML file. It is usually best to choose the one with the attribute
`type="standard"`.
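
Picking the rules out of the XML file can also be scripted. The following is a minimal sketch in plain Lua, not part of Lua-UCA: `standard_rules` is a hypothetical helper, the sample XML is a shortened, made-up fragment, and the sketch assumes that the rules sit in a CDATA section inside the `<collation>` element. Pattern matching is enough for a quick look, but it is not a real XML parser.

```lua
-- Shortened, made-up sample in the shape of a CLDR collation file.
local xml = [==[
<collations>
  <collation type="search">
    <cr><![CDATA[
      &a<<b
    ]]></cr>
  </collation>
  <collation type="standard">
    <cr><![CDATA[
      [caseFirst upper]
      &th<<<þ
    ]]></cr>
  </collation>
</collations>
]==]

-- Hypothetical helper: return the rule lines of the "standard" collation,
-- or nil when no such element is found.
local function standard_rules(xml_text)
  -- find the <collation> element with type="standard"
  local body = xml_text:match('<collation[^>]-type="standard"[^>]*>(.-)</collation>')
  if not body then return nil end
  -- assumption: the rules are wrapped in a CDATA section
  local cdata = body:match("<!%[CDATA%[(.-)%]%]>")
  if not cdata then return nil end
  local rules = {}
  for line in cdata:gmatch("[^\r\n]+") do
    -- trim surrounding whitespace and skip empty lines
    line = line:gsub("^%s+", ""):gsub("%s+$", "")
    if line ~= "" then
      rules[#rules + 1] = line
    end
  end
  return rules
end

for _, rule in ipairs(standard_rules(xml)) do
  print(rule)
end
```
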
The following example contains code from `da.xml`:

    [caseFirst upper]
    &D<<đ<<<Đ<<ð<<<Ð
    &th<<<þ
    &TH<<<Þ
    &Y<<ü<<<Ü<<ű<<<Ű
    &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA
    &oe<<œ<<<Œ

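
In this syntax, `&` selects an existing character as the starting point, and `<`, `<<` and `<<<` place the following characters after it with a primary, secondary or tertiary difference. The sketch below splits one rule into these steps to make the structure visible. It is an illustration only: `parse_rule` is a hypothetical helper that is not part of Lua-UCA, and it ignores other parts of the CLDR syntax, such as `=` or quoting.

```lua
local level_names = { "primary", "secondary", "tertiary" }

-- Hypothetical helper: split one rule into its starting point and a list
-- of { level = ..., chars = ... } steps.
local function parse_rule(rule)
  -- drop the leading "&" and an optional "[before N]" option
  local rest = rule:gsub("^&", ""):gsub("^%[before %d%]", "")
  -- everything up to the first "<" is the starting point
  local anchor, relations = rest:match("^([^<]*)(.*)$")
  local steps = {}
  -- the number of "<" characters gives the strength of the difference
  for op, chars in relations:gmatch("(<+)([^<]+)") do
    steps[#steps + 1] = {
      level = level_names[#op] or "other",
      chars = chars,
    }
  end
  return anchor, steps
end

local anchor, steps = parse_rule("&D<<đ<<<Đ<<ð<<<Ð")
print("starting point: " .. anchor)
for _, step in ipairs(steps) do
  print(step.level, step.chars)
end
```
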
This is translated to Lua code in `lua-uca-languages.lua` in the following way:

    languages.da = function(collator_obj)
      -- helper function for more readable tailoring definition
      local tailoring = function(s) collator_obj:tailor_string(s) end
      collator_obj:uppercase_first()
      tailoring("&D<<đ<<<Đ<<ð<<<Ð")
      tailoring("&th<<<þ")
      tailoring("&TH<<<Þ")
      tailoring("&Y<<ü<<<Ü<<ű<<<Ű")
      tailoring("&ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA")
      tailoring("&oe<<œ<<<Œ")
      return collator_obj
    end

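
The translation is mechanical enough that a first draft can be generated. The sketch below is an assumption about how such a helper could look, not a tool shipped with Lua-UCA: `generate_language` is a hypothetical function that handles only the constructs used in the Danish example. It maps `[caseFirst upper]` to `uppercase_first()`, drops the `[before 1]` option as the Danish translation does, and marks anything else for manual work.

```lua
-- Hypothetical helper: turn CLDR rule lines into the source code of a
-- language function. Only the constructs from the Danish example are handled.
local function generate_language(code, rules)
  local out = {
    string.format("languages.%s = function(collator_obj)", code),
    "  local tailoring = function(s) collator_obj:tailor_string(s) end",
  }
  for _, rule in ipairs(rules) do
    if rule == "[caseFirst upper]" then
      out[#out + 1] = "  collator_obj:uppercase_first()"
    elseif rule:sub(1, 1) == "&" then
      -- the Danish translation leaves out "[before 1]", so do the same
      local cleaned = rule:gsub("^&%[before %d%]", "&")
      -- escape backslashes and double quotes for the Lua string literal
      local escaped = cleaned:gsub('[\\"]', "\\%0")
      out[#out + 1] = '  tailoring("' .. escaped .. '")'
    else
      -- anything else needs a human decision
      out[#out + 1] = "  -- TODO: translate by hand: " .. rule
    end
  end
  out[#out + 1] = "  return collator_obj"
  out[#out + 1] = "end"
  return table.concat(out, "\n")
end

print(generate_language("da", {
  "[caseFirst upper]",
  "&th<<<þ",
  "&[before 1]ǀ<æ<<<Æ",
}))
```

The generated text is a starting point only. Review it against the original rules before adding it to `lua-uca-languages.lua`.
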
Pull requests with new language support are highly appreciated.
## Support files in the source distribution
The `xindex` directory contains some example configurations for `Xindex`, a Lua-based indexing system.
Run the `make xindex` command to compile them.
`Xindex` has built-in support for Lua-UCA since version `0.23`; it can be requested using the `-u` option.

The `tools/indexing-sample.lua` file provides a simple indexing processor, independent of any other tool.
## Testing
You can run unit tests using the following command:

    make test

Testing requires the [Busted](https://olivinelabs.com/busted/) testing framework to be installed on your system.
Tests are placed in the `spec` directory, and they provide more examples of the package usage.