File: ChangeLog

package info (click to toggle)
paperwork 2.2.5-3
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 166,660 kB
  • sloc: python: 44,775; makefile: 992; sh: 625; xml: 135
file content (268 lines) | stat: -rw-r--r-- 12,899 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
2024/08/28 - 2.2.5:
- Fix import .doc/.xsl files

2024/08/27 - 2.2.4:
- No changes

2024/05/12 - 2.2.3:
- backend/guesswork/labels/sklearn: fix crash used by wrong use of
  scipy.sparse.hstack() + numpy.zeros() (see issue #1111)
- Page editor: fix race-condition
- Handle more gracefully work directory changes when the sync is in progress

2024/02/13 - 2.2.2:
- export: Add export pipe to export to PDF by automatically selecting the
  original PDF if available or generating one else
- docexport PDF: Workaround Cairo bug that causes occasional crashes when
  exporting to generated PDF.
- make it possible to edit help documents: it seems that this is actually
  often the first thing that people do try in Paperwork.
- sklearn: Fix stuck progress bar when switching work directories
- model.pdf: tolerate corrupted PDF page mapping.
- pagetracker: handle more gracefully invalid documents
- model.workdir: ignore files and directory with a name starting with ".".
- model.thumbnail: do not crash if a document is corrupted.
- model.thumbnail: when computing the size of the thumbnail,
  make sure neither the width or height ever end up at 0px.
- Support exporting corrupted PDFs. Some PDFs may declare more pages than they
  actually contain. It must not block a mass export.
- docexport/pdf: remove old workaround: call to time.sleep() isn't useful
  anymore

2023/09/17 - 2.2.1:
- Add build-system.build-backend to pyproject.toml
- datadirhandler: when removing a directory, don't move it to the trash.
  Instead delete it entirely

2023/09/16 - 2.2.0:
- By default, use PNG instead of JPEG (See issue #1021 for rationale).
- setup.py has been replaced by pyproject.toml
- drop dependency on python-Levenshtein.
- model.pdf: Fix: When moving an edited PDF page from one document to another,
  make sure to export the PDF page as the original image page (paper.X.png)
  instead of the edited one (paper.X.edited.png).
- cairo.pillow: optimization: use cairo.ImageSurface.create_from_data() now
  that it's available.
- beacon: if openpaper.work is unreachable, catch the exception to not bother
  the user uselessly
- guesswork.labels.sklearn: store the model on-disk to reload it more quickly
  as long as nothing has changed.
- Help documents: Help documents are read-only. Therefore if the user tries
  to add pages to them, redirect the new pages to a new document.
- model.thumbnail: handle gracefully corrupted thumbnails.
- cairo.pillow: do not assume that on-disk images are RGB/RGBA (they may have
  been optimized by scripts with 'convert' for instance).
- PDF: Handle gracefully UnicodeDecodeError that happen sometimes on Windows.
- Handle images too big for pillow as gracefully as possible: do not crash
  but replace them by a tile with a text explaining the image is too big.
- Libinsane: Make it possible to specify custom scan option values
  (thanks to auriocus on forum.openpaper.work)
- PDF: in case a page_map.csv is corrupted, log which one.
- PDF: Make the readign of page_map.csv more resilient.
- chkworkdir: in case page_map.csv is corrupted, fix the document by deleting
  the page_map.csv and moving the pages around

2023/01/08 - 2.1.2:
- Cairo renderers (Image/PDF): optimisation: add a cache
- Cairo image renderer: optimisation: cairo.ImageSurface.create_for_data() has been implemented and can now be used
- Cairo renderers: cleanup: Move the blurring code to a dedicated plugin
- autoselect_scanner: workaround: instead of looking for scanners when Paperwork starts, look for a scanner if and only if we failed to find the scanner currently defined in Paperwork's configuration. With some Sane backends, it's much safer to not try listing devices when starting.
- Fix: TIFF files may have the extension .tiff but also .tif
- ignore files at the root of the work directory
- libreoffice converter: fix file descriptor leak
- guesswork.label.sklearn: fix weird crash based on user logs
  (original cause is still unclear)
- fix test failure with recent pillow (thanks to Guillaume Girol)

2022/01/31 - 2.1.1:
- guesswork.label.sklearn: Fix: Handle gracefully documents without text
- model.pdf: take into account some PDF may be really really damaged

2021/12/05 - 2.1.0:
- Support for all document types supported by LibreOffice (requires LibreOffice
  to be installed)
- Support for password-protected PDF files
- Label guessing: replace simplebayes by sklearn (GaussianNB): require more
  resources but much more accurate
- Version data files: If the version changes, rebuild them all
- Fix page export: img: do not use page number in file names
- Cropping by scanner calibration: Never crop pages that already have text
  (fix issue where cropping was applied when changing document date / ID)
- API: rename transaction methods: add_obj() --> add_doc(),
  upd_obj() --> upd_doc(), del_obj() --> del_doc(),
  unchanged_obj() --> unchanged_doc()

2021/05/24 - 2.0.3:
- hOCR: fix page_get_text_by_url(): Do not return the hOCR title in the text
  (it's always "OCR output").
- Image loading (used for file import): file extension check must be
  case-insensitive (".jpeg" and ".JPEG" must be both accepted)
- OCR: Fix: By default, never run OCR on pages that already have text.
- Take into account Cairo image size limitations (dimensions can't be higher
  than 32k). Crop images accordingly if required.
- PDF: work around possible weird replies from LibPoppler regarding
  line/word boxes (avoid useless background exception)
- Swedish translations added
- Backend: openpaperwork_gtk.fs.gio has been removed from the minimum list of
  required plugins. fs.python is good enough to load the configuration.
- Backend: model.labels: Call to callbacks "on_all_labels_loaded" has been
  removed. It was redundant with the call to callbacks "on_label_loading_end"

2021/01/01 - 2.0.2:
- beacon.sysinfo: report some extra infos to openpaper.work:
  CPU max frequency, number of CPU cores, amount of memory, version of Python.
- add dependency on psutil
- PyOCR: Fix for people who seem to have no locale configured (?!)
- PDF: Fix: Write page mapping in the order of original page
  indexes, as expected when we read it back later.
  (otherwise, we may get weird behaviours)
- Labels: When removing labels, don't add extra empty lines

2020/11/15 - 2.0.1:
- Model: Fix: When the user move a page, they may actually creating a new document.
- Libinsane + bug report: the file to attach to the bug report should be
  called 'scanner_*.json', not 'statistics_*.json'
- Import: Don't use the same name for recursive importer and single importer.
- Import: If the single file importer has matched a file to import, make sure
  the recursive one doesn't match it too.
- Windows: poppler.memory: work around suspected memory leak regarding
  Gio.MemoryInput.new_from_data()
- When thumbnail are deleted by user action, never send them to trash, really
  delete them instead.
- Include tests in Pypi package (thanks to Elliott Sales de Andrade)

2020/10/17 - 2.0:
- Full rewrite
- Use of plugin system of openpaperwork_core to split features
- PDF can be edited
- Pages can be reinitialized to their initial states (reset)
- Multiple languages can be used for OCR
- Automated tests have been added
- Features that could be reused in other applications have been move to
  openpaperwork_core and openpaperwork_gtk
- Thumbnails are slightly smaller (they will be resized automatically)

2019/12/20 - 1.3.1:
- Backend: Check if thumbnail file is writable before updating it (thanks to
  Gregor Godbersen)
- Backend: Make indexation more resilient to errors (corrupted PDFs, etc).
- Backend: chkdeps: look for Libinsane (no known package yet)

2019/08/17 - 1.3.0:
- PDF export: PDFs can now be regenerated when exporting. Regenerated versions
  will include words from the OCR, but some metadata may be lost.
- Optimization: Speed up conversions from PIL image to GdkPixbuf (used for
  export previews and thumbnail display)
- Disable the use of a dedicated process for index operations: it
  prevents debugging
- New dependency: Do not use platform.dist() or platform.linux_distribution()
  anymore: It's deprecated and will be removed in Python 3.8. Use instead the
  module 'distro'.
- paperwork-shell: Add name and label arguments to command 'import' (thanks to
  Stéphane Brunner)
- Backend: Fix importing PNG files with transparency (thanks to Balló György)
- Fix warnings related to regexes escaping + various other cleanups (thanks
  to Elliott Sales de Andrade)

2018/03/01 - 1.2.4:
- Import: Remove ambiguity: Importers designed for import of directory
  will not try to import individual files. They will just let the importers
  designed for importing single file take care of it.
- Label guessing: Fix the way bayesian filters are updated (will trigger an
  index rebuild).
- paperwork-shell/labels/guessing: return scores as well as labels
  (useful for testing/debuging)
- Optim: PDF: Keep in memory the page sizes. It's an information very often
  requested when rendering and it cannot change with PDFs

2018/02/01 - 1.2.3:
- Windows: Fix labels handling: Fix CSV file reading
- Fix global deletion of a label
- Flatpak: Fix deletion of documents
- PDF: Fix file descriptor leak
- Flatpak: Fix support on English systems

2017/11/14 - 1.2.2:
- PDF: Fix thumbnail sizes. Incorrect thumbnails will be automatically
  regenerated

2017/08/26 - 1.2.1:
- paperwork-shell: improve help string of 'paperwork-shell chkdeps'
- Fix label deletion / renaming
- Windows: Fix FS.safe() when used for PDF import
- Windows: Fix FS.unsafe() (used for PDF export)

2017/07/11 - 1.2.0:
- API: remove methods doc.drop_cache() and page.drop_cache()
- API: docsearch: add method close()
- paperwork-shell: Use JSON format for the output (except for 'paperwork-shell
  dump')
- Use GIO functions instead of Python functions (open(), read(), close(), etc)
- Use URIs instead of Unix file paths (file:///...)
- Index is now managed in a separate process (avoid Python GIL locking + UI
  freezes)
- Import: Make it possible to import image folder
- Importers: provide a list of supported file formats (mime types)
- Import: To figure out a file type, look at the file extensions but also the
  mime type in case the extension is not set
- Import: Make the importers able to handle multiple Files/URIs instead of just
  one
- paperwork-shell import: Run OCR on imported pages that have no words
- paperwork-shell: add command 'ocr'
- Configuration: [Global]:workdirectory is now an URI encoded in base64
  (base64 encoding was required due to limitations of Python's ConfigParser)
- DocSearch: When unable to open the index, destroy it and rebuild it from scratch
- Add a new document type: ExternalPdfDoc: Used to display PDF that are outside
  of the work directory (for instance application help manual)
- Configuration setting [OCR]:lang is now managed by the backend instead of
  the frontend

2017/02/09 - 1.1.2:
- PDF: When PdfDoc.drop_cache() is called, make sure *all* the references to
  the Poppler objects are dropped, including those to the pages of the document

2017/02/05 - 1.1.1:
- No change. Version created only to match Paperwork-gui version.

2017/01/30 - 1.1.0:
- Add methods doc.has_ocr() and page.has_ocr() indicating if OCR has already
  been run on a given doc/page or not yet.
  Used in GUI for the option "Redo OCR on all documents" as it must act only
  on documents where OCR has already been done in the past (ie not PDF with
  text included)
- Optim: Provides a method page.get_image() returning an already resized
  Pillow image (PDF rendering optimisation)
- Export: Report progression
- Optim: PDF thumbnail rendering: Keep a cached version of the first page only.
  The other pages can be rendered on the fly
- Fix: Label directory name use base64 encoding, and this encoding can result
  in strings containing '/'. Those characters must be replaced (by '_')
- Fix: util/find_language(): If the system locale is not set properly, pycountry
  may raise UnicodeDecodeError.
- paperwork-shell: Add commands 'search', 'dump', 'switch_workdir', 'rescan',
  'show', 'import', 'delete_doc', 'guess_labels', 'add_label', 'remove_label',
  'rename'
- Import: When importing a single PDF, don't import it if it was already
  previously imported
- Import: Provides detailed information and statistics regarding what has been
  imported (return value of Importer.import_doc() has changed)

1.0.6:
- No change. Version created only to match Paperwork-gui version.

1.0.5:
- Doc deletion: Drop cache and file descripts *before* deleting document
(optional on GNU/Linux, but required on Windows)

1.0.4:
- Windows: Fix image import

1.0.3:
- Windows: Fix import/export

1.0.2:
- No change. Version created only to match Paperwork-gui version.

1.0.1:
- util/find_language(): fix pycountry db lookup
- Windows: hide ~/.config instead of ~/.config/paperwork.conf