File: NEWS

package info (click to toggle)
uctodata 0.11-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 380 kB
  • sloc: python: 39; sh: 39; makefile: 25
file content (92 lines) | stat: -rw-r--r-- 3,094 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92

0.11  26-04-2024
[Ko van der Sloot]
* allow NON_SPACING_MARKERs inside NUMBERs in all languages

[Maarten van Gompel]
* README.md: README: removed LaMachine reference

0.10  02-04-2024
[Ko van der Sloot]
* modernized configuration step
* French list of known abbreviations was not used at all!
  added that
* in English, for shouldn't be handled as an abbreviation, unless
  with a trailing dot.

0.9.1  21-07-2022
[Maarten van Gompel]
  - New English twitter config wasn't installed properly yet

0.9  21-07-2022
[Ko van der Sloot]
  - fix for PREFIX rules in french and italian
  - small fix to prevent loosing a character in the PREFIX rule. (see https://github.com/LanguageMachines/ucto/issues/87 ) This doesn't fix the unwanted splits though.
  - added SYMBOL, PICTOGRAM and EMOTICON to setdefinitions
  - relaxed the e-mail rule a bit.

[Piroska Lendvai]
  - Suggestions for German abbreviations

[Antal van den Bosch]
  - New config file for English Twitter data. Recognizes and retains #hastags and @mentions.

0.8  29-11-2018
[Ko van der Sloot]
  - separated .abr files from there main files for all Languages
  - updated italian data (thanks to Kilian Evang)

[Iris Hendricks]
  - updated abbrev files for Portuguese Turkish and French based on
    https://en.wiktionary.org/wiki/Category:Portuguese_abbreviations and
    https://en.wiktionary.org/wiki/Category:Turkish_initialisms.
  - added full list of French abbreviations.
  - added 'aub' to Dutch list

0.7.1  17-05-2018
[Ko van der Sloot]
Bug fix release:
 * install some datafiles originally provided by 'ucto'

0.7  16-05-2018
[Ko vd Sloot]
 - tokconfig-nld-historical: typo in rule
 - updated all languages with new ABBREVIATION and NUMBER-ORDINAL rules:
   = accomodate ABBREVIATIONS within brackets.
   = avoid needles backtracking in NUMBER-ORDINAL

[Maarten van Gompel]
 - Apparent bug in Italian config

0.6 [Ko vd Sloot] 04-04-2018
 * several fixes for problems addressed in
   https://github.com/LanguageMachines/ucto/issues/46
   Notes:
   - the suffix problems were already addressed in Release 0.5
   - the colon problem is not addressed. Do we need REVERSE-SMILEY?

0.5 [Ko vd Sloot] 18-10-2017
 * adding slightly adapted tokenizer configuration for historical dutch
    (a bit more conservative on splitting).
    INL/nederlab-linguistic-enrichment/#7

0.4 [Maarten van Gompel] 23-01-2017
 * Data files are moved to share/ instead of etc/
     (incompatible with ucto < 0.9.6)
 * DATE pattern expanded (LanguageMachines/ucto#16)

0.3.1 [Ko van der Sloot] 06-01-2017
Bug fix release:
 * fixed install problem in debian packaging using DESTDIR
 * cleaned all rules from 'empty' entries (which lead to warnings)

0.3 [Ko vd Sloot] 05-01-2017
  * new direcory structure an filenames based on ISO 693-3 language codes.

0.2 [Ko vd Sloot] 28-09-2016
  * New implementation of rules. Needs a recent ucto that supports recursive
    application of rules
  * lots of bug fixes. Most notably in CURRENCY, ABBREV, SUFFIX and PREFIX rules

0.1 [Ko vd Sloot] 30-06-2016
  * first version of a separate data package for ucto