File: CHANGES

Morfologik-stemming Change Log

For an up-to-date CHANGES file see 
https://github.com/morfologik/morfologik-stemming/blob/master/CHANGES

======================= morfologik-stemming 1.9.0 =======================

Changes in backwards compatibility policy

New Features

* Added capability to normalize input and output strings for dictionaries.
  This is useful for dictionaries that do not support ligatures, for example.
  To specify input conversion, use the property 'fsa.dict.input-conversion'
  in the .info file. The output conversion (for example, to use ligatures)
  is specified by 'fsa.dict.output-conversion'. Note that lengthy 
  conversion tables may negatively affect performance.
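
A minimal Java sketch of what such a conversion amounts to, assuming a
simple ordered table of (from, to) string pairs. The class and method names
are hypothetical; this is not the library's implementation and it does not
show the .info file syntax. Each pair costs one more pass over the string,
which illustrates why lengthy tables can slow things down.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: a naive replacement-table normalizer. */
final class ConversionSketch {
    /** Applies every (from, to) pair to the input, in table order. */
    static String convert(String input, Map<String, String> table) {
        String result = input;
        for (Map.Entry<String, String> pair : table.entrySet()) {
            // One full scan of the string per table entry.
            result = result.replace(pair.getKey(), pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input conversion: expand two ligatures.
        Map<String, String> inputConversion = new LinkedHashMap<>();
        inputConversion.put("\uFB01", "fi"); // LATIN SMALL LIGATURE FI
        inputConversion.put("\uFB02", "fl"); // LATIN SMALL LIGATURE FL
        System.out.println(convert("\uFB01nal", inputConversion)); // prints "final"
    }
}
```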

Bug Fixes

Optimizations

* The suggestion search for the speller is now performed directly by traversing
  the dictionary automaton, which makes it much more time-efficient (thanks
  to Jaume Ortolà).

* Suggestions are generated faster by avoiding unnecessary case conversions.

======================= morfologik-stemming 1.8.3 =======================

Bug Fixes

* Fixed a bug for spelling dictionaries in non-UTF encodings with 
  separators: strings with non-encodable characters might have been 
  accepted as spelled correctly even if they were missing in the 
  dictionary.

======================= morfologik-stemming 1.8.2 =======================

New Features

* Added the option of using frequencies of words for sorting spelling 
  replacements. It can be used in both spelling and tagging dictionaries.
  'fsa.dict.frequency-included=true' must be added to the .info file.
  For building the dictionary, add at the end of each entry a separator and 
  a character between A and Z (A: less frequently used words; 
  Z: more frequently used words). (Jaume Ortolà)
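
The A-Z frequency class can be read as a rank from 0 to 25. The sketch below
orders entries by that rank, most frequent first. It assumes "+" as the
separator, and the sample entries and all names are made up for
illustration; how the speller actually uses the value is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustration only: ranking entries by their trailing frequency class. */
final class FrequencySketch {
    /** Maps 'A'..'Z' to 0..25; a larger value means a more frequent word. */
    static int rank(char frequencyClass) {
        if (frequencyClass < 'A' || frequencyClass > 'Z') {
            throw new IllegalArgumentException("Expected A-Z: " + frequencyClass);
        }
        return frequencyClass - 'A';
    }

    /** Returns a copy of the entries ordered from most to least frequent. */
    static List<String> sortByFrequency(List<String> entries) {
        List<String> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt(
                (String entry) -> rank(entry.charAt(entry.length() - 1))).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical entries: word, assumed separator '+', frequency class.
        List<String> entries = Arrays.asList("doma+C", "dom+Z", "domy+M");
        System.out.println(sortByFrequency(entries)); // prints [dom+Z, domy+M, doma+C]
    }
}
```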

======================= morfologik-stemming 1.8.1 =======================

Changes in backwards compatibility policy

* MorphEncodingTool will *fail* if it detects data/lines that contain the 
  separator annotation byte. This is because such lines get encoded into
  something that the decoder cannot process. You can use \u0000 as the 
  annotation byte to avoid clashes with any existing data.
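
The kind of check described above can be sketched as a scan of each input
line for the annotation byte. The names below are hypothetical and the code
is not taken from MorphEncodingTool; it only shows why a byte such as 0 is
unlikely to clash with textual data.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: detecting the annotation byte inside input data. */
final class SeparatorCheckSketch {
    /** Returns true if the data contains the given byte anywhere. */
    static boolean containsByte(byte[] data, byte separator) {
        for (byte b : data) {
            if (b == separator) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] line = "C++".getBytes(StandardCharsets.UTF_8);
        System.out.println(containsByte(line, (byte) '+')); // prints "true": '+' would clash
        System.out.println(containsByte(line, (byte) 0));   // prints "false": byte 0 is safe here
    }
}
```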

======================= morfologik-stemming 1.8.0 =======================

Changes in backwards compatibility policy

* Command-line option changes to MorphEncodingTool - it now accepts an explicit
  name of the sequence encoder, not infix/suffix/prefix booleans.  

* Updated dependencies to their newest versions.

New Features

* Dictionary .info files can specify the sequence decoder explicitly:
  suffix, prefix, infix, none are supported. For backwards compatibility,
  fsa.dict.uses-prefixes, fsa.dict.uses-infixes and fsa.dict.uses-suffixes
  are still supported, but will be removed in the next major version.

* Rewritten implementation of tab-separated data files (tab2morph tool).
  The output should yield smaller files, especially for prefix encoding
  and infix encoding. This does *not* necessarily mean smaller automata
  but we're working on getting these as well.

  Example output before and after refactoring:
  
  Prefix coder:
  postmodernizm|modernizm|xyz => [before] postmodernizm+ANmodernizm+xyz
                              => [after ] postmodernizm+EA+xyz
  
  Infix coder:
  laquelle|lequel|D f s       => [before] laquelle+AAHequel+D f s
                              => [after ] laquelle+AGAquel+D f s

* Changed the default format of the Polish dictionary from infix
  encoded to prefix encoded (smaller output size).
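
The "[after]" prefix-coder output above (postmodernizm+EA+xyz) can be read
as: drop 'E' - 'A' = 4 characters from the start of the word, drop
'A' - 'A' = 0 characters from its end, then append whatever follows. The
decoder below is a sketch of that reading, inferred from the example alone;
the class and method names are hypothetical and the library's own decoder
may differ in details.

```java
/** Illustration only: decoding the prefix-coded form shown in the example. */
final class PrefixCodeSketch {
    /**
     * code.charAt(0) - 'A' characters are removed from the start of the word,
     * code.charAt(1) - 'A' characters are removed from its end, and the rest
     * of the code is appended as the new suffix.
     */
    static String decode(String word, String code) {
        int dropPrefix = code.charAt(0) - 'A';
        int dropSuffix = code.charAt(1) - 'A';
        String stem = word.substring(dropPrefix, word.length() - dropSuffix);
        return stem + code.substring(2);
    }

    public static void main(String[] args) {
        System.out.println(decode("postmodernizm", "EA")); // prints "modernizm"
    }
}
```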

Optimizations

* A number of internal implementation cleanups and refactorings.

======================= morfologik-stemming 1.7.2 =======================

* A quick fix for incorrect decoding of certain suffixes (long suffixes).

* Increased max. recursion level in Speller to 6 from 4. (Jaume Ortolà)

======================= morfologik-stemming 1.7.1 =======================

* Fixed a couple of bugs in morfologik-speller (Jaume Ortolà).

======================= morfologik-stemming 1.7.0 =======================

* Changed DictionaryMetadata API (access methods for encoder/decoder).

* Initial version of morfologik-speller component.

* Minor changes to the FSADumpTool: the header block is always UTF-8 
  encoded, regardless of the default platform encoding. This ensures 
  that attributes containing Unicode characters are dumped correctly.

* Metadata *.info files can now be encoded in UTF-8 to support text 
  attributes that otherwise would require text2ascii conversion.
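
For background, java.util.Properties.load(InputStream) assumes ISO 8859-1,
which is why non-ASCII values traditionally had to be escaped. Reading
through a UTF-8 Reader avoids that. The sketch below shows the general
technique only; the key "example.author" and the class name are
hypothetical, and this is not necessarily how the library reads .info files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Illustration only: loading a properties file encoded in UTF-8. */
final class InfoFileSketch {
    /** Parses the given bytes as a UTF-8 encoded properties file. */
    static Properties loadUtf8(byte[] fileContent) {
        Properties properties = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileContent), StandardCharsets.UTF_8)) {
            properties.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        // Hypothetical attribute with a non-ASCII character, stored unescaped.
        byte[] content = "example.author=Jaume Ortol\u00E0\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadUtf8(content).getProperty("example.author"));
    }
}
```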

======================= morfologik-stemming 1.6.0 =======================

* Update morfologik-polish data to Morfologik 2.0 PoliMorf (08.03.2013). 
  Deprecated DICTIONARY constants (unified dictionary only).
          
* Important! The format of encoding tags has changed and is now 
  multiple-tags-per-lemma. The value returned from WordData#getTag 
  may be a number of tags concatenated with a "+" character. Previously
  the same lemma/stem would be returned multiple times, each time with 
  a different tag.

* Moved code from SourceForge to GitHub.
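
Callers that relied on one tag per result may want to split the
concatenated value themselves. A minimal sketch, assuming the tag value is
available as a CharSequence; the sample tag string and the helper name are
made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: splitting a multiple-tags-per-lemma value. */
final class TagSplitSketch {
    /** Splits a tag value on the "+" character. */
    static List<String> splitTags(CharSequence tag) {
        // "+" is a regex metacharacter, so it has to be escaped for split().
        return Arrays.asList(tag.toString().split("\\+"));
    }

    public static void main(String[] args) {
        // Hypothetical tag value holding two tags for the same lemma.
        System.out.println(splitTags("subst:sg:nom:m3+subst:sg:acc:m3"));
        // prints [subst:sg:nom:m3, subst:sg:acc:m3]
    }
}
```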

======================= morfologik-stemming 1.5.5 =======================

* Made hppc an optional component of morfologik-fsa. It is required
  for constructing FSA automata only and causes problems with javac.
  http://stackoverflow.com/questions/3800462/can-i-prevent-javac-accessing-the-class-path-from-the-manifests-of-our-third-par

======================= morfologik-stemming 1.5.4 =======================

* Replaced byte-based speller with CharBasedSpeller.

* Warn about UTF-8 files with BOM.
 
* Fixed a typo in package name (speller).
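
A UTF-8 byte order mark is the three-byte sequence EF BB BF at the start of
a file. The sketch below shows one way to detect it before deciding whether
to warn; the class and method names are hypothetical and the code is not
taken from the library.

```java
/** Illustration only: detecting a UTF-8 byte order mark. */
final class BomSketch {
    /** Returns true if the data starts with the bytes EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] withoutBom = {'a'};
        System.out.println(startsWithUtf8Bom(withBom));    // prints "true"
        System.out.println(startsWithUtf8Bom(withoutBom)); // prints "false"
    }
}
```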

======================= morfologik-stemming 1.5.3 =======================

* Initial release of spelling correction submodule.

* Updated morfologik-polish data to morfologik 1.9 [12.06.2012]

* Updated morfologik-polish licensing info to BSD (yay).

======================= morfologik-stemming 1.5.2 =======================

* An alternative Polish dictionary added (BSD licensed): SGJP (Morfeusz). 
  PolishStemmer can now take an enum that selects which dictionary to 
  use, or combines both.

* Project split into modules. A single jar version (no external 
  dependencies) added by transforming via proguard.

* Enabled use of escaped special characters in the tab2morph tool.

* Added guards against the input term containing the separator character 
  (such input now returns an empty list of matches). Added 
  getSeparatorChar to DictionaryLookup so that one can check for this 
  condition manually, if needed.

======================= morfologik-stemming 1.5.1 =======================

* Build system switch to Maven (tested with Maven2).

======================= morfologik-stemming 1.5.0 =======================

* Major size saving improvements in CFSA2. Built-in Polish dictionary 
  size decreased from 2,811,345 to 1,806,661 (CFSA2 format).

* FSABuilder returns a ready-to-be-used FSA (ConstantArcSizeFSA). 
  Construction overhead for this automaton is a round zero (it is 
  immediately serialized in-memory).

* Polish dictionary updated to Morfologik 1.7. [19.11.2010]

* Added an option to serialize automaton to CFSA2 or FSA5 directly from 
  fsa_build.

* CFSA is now deprecated for serialization (the code still reads CFSA 
  automata, but will not be able to serialize them). Use CFSA2.

* Added immediate state interning. Speedup in automaton construction by 
  about 30%, memory use decreased significantly (did not perform exact 
  measurements, but incremental construction from presorted data should 
  consume way less memory).

* Added an option to build FSA from already sorted data (--sorted). 
  Avoids in-memory sorting. Pipe the input through shell sort if 
  building FSA from large data.

* Changed the default ordering from Java signed-byte to C-like unsigned 
  byte value. This lets one use GNU sort to sort the input using 
  'export LC_ALL=C; sort input'.  

* Added traversal routines to calculate perfect hashing based on 
  FSA with NUMBERS.

* Changed the order of serialized arcs in the binary serializer for FSA5 
  to lexicographic (consistent with the input). In other words, 
  depth-first traversal recreates the input.

* Removed character-based automata.

* Incompatible API changes to FSA builders (moved to morfologik.fsa).

* Incompatible API changes to FSATraversalHelper. Cleaned up match 
  types, added unit tests. 

* An external dependency, HPPC (High Performance Primitive Collections), 
  is now required.
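
The difference between Java signed-byte order and C-like unsigned-byte
order shows up for any byte above 0x7F, for example in multi-byte UTF-8
characters. The comparator below is a sketch of the unsigned ordering,
which is also what 'LC_ALL=C sort' produces; the class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: comparing byte sequences as unsigned values. */
final class UnsignedOrderSketch {
    /** Lexicographic comparison treating each byte as 0..255. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A sequence sorts before any longer sequence it is a prefix of.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] z = "z".getBytes(StandardCharsets.UTF_8);           // 0x7A
        byte[] zDot = "\u017C".getBytes(StandardCharsets.UTF_8);   // 0xC5 0xBC
        System.out.println(z[0] < zDot[0]);                // prints "false": as signed bytes, 0xC5 is negative
        System.out.println(compareUnsigned(z, zDot) < 0);  // prints "true": unsigned order puts "z" first
    }
}
```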

======================= morfologik-stemming 1.4.1 =======================

* Upgrade of the built-in Morfologik dictionary for Polish (in CFSA 
  format).

* Added options to define custom FILLER and ANNOT_SEPARATOR bytes in the 
  fsa_build tool.

* Corrected an inconsistency with the C fsa package -- FILLER and 
  ANNOT_SEPARATOR characters are now identical with the C version.
          
* Cleanups to the tools' launcher -- will complain about missing JARs, 
  if any.

======================= morfologik-stemming 1.4.0 =======================

* Added FSA5 construction in Java (on byte sequences). Added preliminary 
  support for character sequences. Added a command line tool for FSA5
  construction from unsorted data (sorting is done in-memory).

* Added a tool to encode tab-delimited dictionaries to the format 
  accepted by fsa_build and FSA5 construction tool.

* Added a new version of Morfologik dictionary for Polish (in CFSA format).

======================= morfologik-stemming 1.3.0 =======================

* Added runtime checking for tools availability so that unavailable tools 
  don't show up in the list.

* Recompressed the built-in Polish dictionary to CFSA. 

* Cleaned up FSA/Dictionary separation. FSAs don't store encoding any more 
  (because it does not make sense for them to do so). The FSA is a purely 
  abstract class pushing functionality to sub-classes. Input stream 
  reading cleaned up.

* Added initial code for CFSA (compressed FSA). Reduces automata size 
  about 10%. 

* Changes in the public API. Implementation classes renamed (FSAVer5Impl 
  into FSA5). Major tweaks and tunes to the API.

* Added support for version 5 automata built with NUMBERS flag (an extra 
  field stored for each node).

======================= morfologik-stemming 1.2.2 =======================

* License switch to plain BSD (removed the patent clause which did not 
  make much sense anyway).

* The build ZIP now includes licenses for individual JARs (prevents 
  confusion). 

======================= morfologik-stemming 1.2.1 =======================

* Fixed tool launching routines.

======================= morfologik-stemming 1.2.0 =======================

* Package hierarchy reorganized.

* Removed stempel (heuristic stemmer for Polish).

* Code updated to Java 1.5. 

* The API has changed in many places (enums instead of constants, 
  generics, iterables, removed explicit Arc and Node classes and replaced 
  by int pointers).

* FSA traversal in version 1.2 is implemented on top of primitive data 
  structures (int pointers) to keep memory usage minimal. The speed 
  boost gained from this is enormous and justifies less readable code. We
  strongly advise using the provided iterators and helper functions 
  for matching state sequences in the FSA.

* Tools updated. Dumping existing FSAs is much, much faster now.        

======================= morfologik-stemming 1.1.4 =======================

* Fixed a bug that caused UTF-8 dictionaries to be garbled. Now it 
  should be relatively safe to use UTF-8 dictionaries (note: separators 
  cannot be multibyte UTF-8 characters, yet this is probably a very 
  rare case).

======================= morfologik-stemming 1.1.3 =======================

* Fixed a bug causing NPE when the library is called with null context 
  class loader (happens when the JVM is invoked from a JNI-attached 
  thread). Thanks to Patrick Luby for report and detailed analysis.

* Updated the built-in dictionary to the newest version available. 
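
Thread.getContextClassLoader() may legitimately return null, so code that
loads resources through it needs a fallback. The sketch below shows one
common defensive pattern; it is not necessarily the fix applied in this
release, and the class and method names are hypothetical.

```java
/** Illustration only: choosing a class loader when the context loader is null. */
final class ClassLoaderSketch {
    /** Returns the context loader, or a non-null fallback if it is missing. */
    static ClassLoader resolveLoader(Class<?> fallbackOwner) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded the calling library's class.
            loader = fallbackOwner.getClassLoader();
        }
        if (loader == null) {
            // getClassLoader() is null for bootstrap classes; use the system loader.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader;
    }

    public static void main(String[] args) {
        Thread.currentThread().setContextClassLoader(null); // simulate the failing setup
        System.out.println(resolveLoader(ClassLoaderSketch.class) != null); // prints "true"
    }
}
```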

======================= morfologik-stemming 1.1.2 =======================

* Fixed a bug causing JAR file locking (by implementing a workaround).

* Fixed the build script (manifest file was broken).

======================= morfologik-stemming 1.1.1 =======================

* Distribution script fixes. The final JAR does not contain test classes 
  and resources. Size reduced by almost half compared to release 1.1.

* Updated the dump tool to accept dictionary metadata files.

======================= morfologik-stemming 1.1 =========================

* Introduced auxiliary "meta" information files for compressed 
  dictionaries. They include the delimiter symbol, encoding 
  and infix/prefix/postfix decoding info.

* The API has changed (repackaging). Some deprecated methods have been 
  removed. This is a major redesign/upgrade; you will have to adjust 
  your source code.

* Cleaned up APIs and interfaces.

* Added infrastructure for command-line tool launching.

* Cleaned up tests.

* Changed project name to morfologik-stemmers and ownership to 
  (c) Morfologik.

======================= morfologik-stemming 1.0.7 =======================

* Removed one bug in fsa 'compression' decoding.

======================= morfologik-stemming 1.0.6 =======================

* Customized version of stempel replaced with a standard distribution.

* Removed deprecated methods and classes.
          
* Added infix and prefix encoding support for fsa dictionaries.

======================= morfologik-stemming 1.0.5 =======================

* Added filler and separator char dumps to FSADump.
          
* A major bug in automaton traversal corrected. Upgrade when possible.
          
* Certain API changes were introduced; older methods are now deprecated
  and will be removed in the future.

======================= morfologik-stemming 1.0.4 =======================

* Licenses for full and no-dict versions.

======================= morfologik-stemming 1.0.3 =======================

* Project code moved to SourceForge (subproject of Morfologik).
  LICENSE CHANGED FROM PUBLIC DOMAIN TO BSD (doesn't change much, but 
  clarifies legal issues).

======================= morfologik-stemming 1.0.2 =======================

* Added a Lametyzator constructor which allows custom dictionary stream, 
  field delimiters and encoding. Added an option for building stand-alone 
  JAR that does not include the default Polish dictionary.

======================= morfologik-stemming 1.0.1 =======================

* Code cleanups. Added a method that returns the automaton's third column
  (form).

======================= morfologik-stemming 1.0 =========================

* Initial release