File: ChangeLog

17/09/2023 - 0.8.5:
- Add build-system.build-backend to pyproject.toml

16/09/2023 - 0.8.4:
- Fix LineBoxBuilder: Take into account headers and footers too, not just the
  body.
- switch from setup.py to pyproject.toml
- switch from tox to pytest

26/06/2022 - 0.8.3:
- Workaround https://github.com/pypa/setuptools_scm/issues/727

15/04/2022 - 0.8.2:
- Add support for Tesseract 5 + Linux
- Fix file descriptor leak (thanks to oda)

05/12/2021 - 0.8.1:
- Make the dependency on setuptools_scm optional

01/01/2020 - 0.8.0:
- Replaced libtesseract.image_to_pdf() with an object-oriented API that allows
  creating PDFs with more than 1 page (thanks to Matthias Kraus).
- Tesseract 4 + sys.frozen=True: Fix TESSDATA_PREFIX: starting with
  Tesseract 4, the path must include tessdata/
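
Sketch of the TESSDATA_PREFIX rule described above. This is not PyOCR's
actual code: the function name, its arguments and the exact (4, 0, 0) cutoff
are assumptions made for illustration.

```python
import os


def tessdata_prefix(bundle_dir, version):
    """Return the value to export as TESSDATA_PREFIX (hypothetical helper).

    bundle_dir: directory that contains the 'tessdata' folder.
    version: Tesseract version as a (major, minor, patch) tuple.
    """
    if version >= (4, 0, 0):
        # Tesseract 4+: the prefix must point at the tessdata/ folder itself.
        return os.path.join(bundle_dir, "tessdata")
    # Older versions: the prefix is the parent directory of tessdata/.
    return bundle_dir


os.environ["TESSDATA_PREFIX"] = tessdata_prefix("/opt/app", (4, 0, 0))
```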

22/06/2019 - 0.7.2:
- Fix setup.py on Windows

22/06/2019 - 0.7.1:
- tesseract.can_detect_orientation(): only returns True if 'osd' data files are
  installed
- setup.py: Fix installation in MSYS2

12/05/2019 - 0.7:
- Drop support for Python <= 2.7
- Fix: Make sure the builder objects can be used to parse box files
  even if Tesseract is not installed.
- PyOCR version is now automatically set in the module by setuptools_scm
  instead of PyOCR's Makefile (except on Windows)
- Tesseract: optim: keep the result of get_version() in memory instead of
  calling Tesseract every time (get_version() is called by psm_parameter(),
  which is called each time a box file is parsed ...)
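
Sketch of the caching idea in the last item, assuming a memoized function is
enough. PyOCR's real implementation may differ; _query_tesseract_version() is
a hypothetical stand-in for actually running Tesseract.

```python
import functools

CALLS = {"count": 0}


def _query_tesseract_version():
    # Hypothetical stand-in for spawning Tesseract and parsing its version.
    CALLS["count"] += 1
    return (4, 0, 0)


@functools.lru_cache(maxsize=None)
def get_version():
    # The expensive query runs once; later calls reuse the cached tuple.
    return _query_tesseract_version()


for _ in range(3):
    get_version()
print(CALLS["count"])  # 1
```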

18/02/2019 - 0.6:
- Complete rewrite of unit tests (thanks to Thomas Perret)
- Libtesseract 4.0: Fix segfault when running orientation detection
  (thanks to Marián Skrip)
- Libtesseract 4.0: Add a workaround: Tesseract needs the locale to be set to
  'C' (thanks to Thomas Perret)
- Libtesseract: Specify DPI of the image to Libtesseract (thanks to Thomas
  Perret)
- Tesseract 4.0: Improve Tesseract version parsing

09/04/2018 - 0.5.3:
- Really fix tesseract 4.0 support (thanks to David Martin)
- Tests: switch from nose to pytest (thanks to Elliott Sales de Andrade)

25/07/2018 - 0.5.2:
- Fix tesseract 4.0 support: Use option '--psm' instead of '-psm'
- tesseract.detect_orientation(): Fix exception generation
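
Sketch of the '--psm' / '-psm' selection. The helper names and the (4, 0, 0)
cutoff are assumptions taken from the wording of the entry above, not PyOCR's
actual code.

```python
def psm_flag(version):
    """Return the page-segmentation flag for a (major, minor, patch) tuple."""
    return "--psm" if version >= (4, 0, 0) else "-psm"


def build_command(version, image_path, output_base, psm=3):
    # tesseract <image> <outputbase> <psm flag> <mode>
    return ["tesseract", image_path, output_base, psm_flag(version), str(psm)]


print(build_command((4, 0, 0), "in.png", "out", psm=0))
```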

01/03/2017 - 0.5.1:
- libtesseract/Windows: Add possible DLL names for libtesseract
- libtesseract: Keep track of library-loading errors in
  pyocr.libtesseract.lib_load_errors (useful for debugging)
- Build method has been changed: Now use "make install" instead of
  "python3 ./setup.py install"
- cosmetic: builders/WordHTMLParser: Message "OCR confidence not found"
  floods the logs when working with old documents --> switch to debug
  instead of info.

14/12/2017 - 0.5:
- Tesseract/Libtesseract + LineBoxBuilder: Add confidence scores to
  every word box and to hOCR files (thanks to Adriano Pagano)
- Tesseract 4 (shell): Add '--oem 0' to specify legacy model when doing
  orientation detection as orientation detection does not work yet with
  Tesseract 4 (thanks to Adriano Pagano)
- Libtesseract: Fix multi-language support
- Tesseract (shell) + Windows: Never let the cmd window appear
- Libtesseract: Implement image_to_pdf() (thanks to Marian Skrip)
- Libtesseract: Hide debug messages (thanks to Ashish Kulkarni)

13/05/2017 - 0.4.7:
- Tesseract 4.00.00alpha:
  - Version parsing: Ignore suffix (so '4.00.00alpha' == (4, 0, 0))
  - Libtesseract: Load libtesseract.so.4 instead of libtesseract.so.3 if
    available
- Support for Tesseract 3.05.00:
  - Builders: Split field 'tess_conf' into 'tess_flags' and 'tess_conf'
  - Libtesseract: If available, use TessBaseAPIDetectOrientationScript()
    instead of TessBaseAPIDetectOS
- Libtesseract: Workaround: Prevent a possible segfault in image_to_string()
  when the target language is not available
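
Sketch of suffix-tolerant version parsing as described under "Tesseract
4.00.00alpha". The function name and the regular expression are illustrative
assumptions, not PyOCR's actual code.

```python
import re


def parse_version(raw):
    """Return (major, minor, patch), ignoring any trailing suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", raw.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % raw)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0
                 for part in match.groups())


print(parse_version("4.00.00alpha"))  # (4, 0, 0)
```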

26/01/2017 - 0.4.6:
- hOCR outputs: Generate valid XHTML files

10/01/2017 - 0.4.5:
- Clean up exceptions raised when OCR fails:
  - Now, all tools raise only exceptions inheriting from
    pyocr.PyocrException
  - There is now one and only one TesseractError (shared between
    pyocr.libtesseract and pyocr.tesseract)

08/12/2016 - 0.4.4:
- Fix Python 2.7 support (broken import)

06/12/2016 - 0.4.3:
- (temporary) Use tesseract-sh by default instead of libtesseract. Some
  people have reported crashes with Paperwork+libtesseract. It needs more
  stress-testing
- DigitBuilder is now available in 'pyocr.builders' (can be used
  with libtesseract and cuneiform)
- New builder: DigitLineBoxBuilder
- Windows: Fix pyinstaller packaging support: env variable TESSDATA_PREFIX
  wasn't set correctly
- Windows: Tesseract-sh: Prevent CMD windows from appearing

05/10/2016 - 0.4.2:
* Tesseract: orientation detection: Ignore errors printed by libleptonica
  on stderr (thanks to TeisD)
* Tesseract: Fix support of dev builds (thanks to Fjup)
* Libtesseract: Fix support of dev builds (thanks to Jakub Semerák)
* Tesseract: Use '--list-langs' to get the available languages instead of
  looking for the data directory (thanks to Bernhard Liebl)

06/04/2016 - 0.4.1:
* Libtesseract: Disable it with Tesseract <= 3.03. It tends to segfault.
  Note: the segfault may not actually be related to Libtesseract. It may be
  due to other things in Debian stable (jessie). Anyway, Paperwork cannot
  work on Debian stable because of that --> disabled just to be safe.


13/03/2013 - 0.4.0:
* New module: 'libtesseract'. Use the C API of Tesseract for OCR.
  This module is more efficient and cleaner than the old 'tesseract' module
  (no more fork + exec + sh, less image manipulation, etc).
  Note that with this module the images are just loaded and uncompressed
  by Pillow. With 'tesseract', they were loaded, uncompressed, re-compressed
  and saved by Pillow, then reloaded by Leptonica. So the results may
  vary slightly.
* Tesseract: Add support for Win32
* Tesseract: Fix orientation detection for version >= 3.04.01


0.3.1:
* tesseract.detect_orientation(): Use a temporary file instead of stdin
  to transmit the image to Tesseract. Tesseract 3.04 doesn't support
  stdin + "-psm 0" (regression ?)
* tesseract.detect_orientation(): Improve output parsing reliability
* optim: Avoid unnecessary conversion to RGB and allow using image formats
  other than PNG
* TextBuilder + Cuneiform: add extra settings for Cuneiform
  (cuneiform_dotmatrix, cuneiform_fax=False, cuneiform_singlecolumn)


0.3.0:
* New API: pyocr.<tool>.can_detect_orientation() and
  pyocr.<tool>.detect_orientation()


0.2.4:
* Tesseract: add digit-only support
* Tesseract: add support for Tesseract subsets of layout analysis (-psm)


0.2.3:
* Strip the alpha channel from images before running the OCR. It's basically
  useless and can prevent the tool from working correctly.
* Make hOCR parsing more resistant (handle extra data around box positions)
* Fix: Take into account that new versions of Tesseract use the file
  extension .hocr instead of .html


0.2.2:
* Fix Python 3 support
* Add support for Tesseract on Heroku


0.2.1:
* Make it possible to use 'import pyocr' instead of 'from pyocr import pyocr'.
  'from pyocr import pyocr' still works but is obsolete.
* Fix dependency list: depends on Pillow (it's untested with PIL)
* Fix pyocr.VERSION


0.2.0:
* Python 3.x support


0.1.2:
* Tesseract: Fix version parsing
* Tesseract: Fix Tesseract 3.02.01's hOCR format support


0.1.1:
* hOCR: Parse lines as well as words
* tesseract.get_available_languages(): Fix Fedora support
* Fix UTF-8 support