Package: pypdf2 / 1.26.0-4+deb11u1

Metadata

Package | Version | Patches format
pypdf2 | 1.26.0-4+deb11u1 | 3.0 (quilt)

Patch series

Patch | File delta | Description
Prevent_infinite_loop_in_readObject.patch | (download)

PyPDF2/generic.py | 4 4 + 0 - 0 !
1 file changed, 4 insertions(+)

 [patch] prevent infinite loop in readobject() function. patch by
 dhudson1. Closes mstamy2/PyPDF2#184


CVE 2022 24859.patch | (download)

PyPDF2/pdf.py | 32 22 + 10 - 0 !
1 file changed, 22 insertions(+), 10 deletions(-)

 cve-2022-24859

Bug-Debian: https://bugs.debian.org/1009879


0001 MAINT Quadratic runtime while parsing reduced to lin.patch | (download)

PyPDF2/pdf.py | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 maint: quadratic runtime while parsing reduced to linear  (#808)

When PdfFileReader tries to find the xref marker, the readNextEndLine method builds a so-called line by reading byte by byte. Every time a new byte is read, it is concatenated with the line read so far. Because Python strings (including byte strings) are immutable and must be copied on each concatenation, this leads to quadratic runtime, O(n^2), where n is the size of the file.
For files where the xref marker cannot be found at the end, this takes an enormous amount of time:

* 1 MB of zeros at the end: 45.54 seconds
* 2 MB of zeros at the end: 357.04 seconds
(measured on a laptop made in 2015)

This pull request changes the relevant section of the code to run in linear time, O(n), leading to a runtime of less than a second for both cases mentioned above. Furthermore, this PR adds a regression test.
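
The difference between the two approaches can be illustrated with a minimal sketch. This is not the code from the patch: the function names read_backwards_quadratic and read_backwards_linear are hypothetical, and the loop is only a simplified stand-in for what readNextEndLine does. Prepending one byte at a time copies the whole accumulated line on every step (1 + 2 + ... + n copies in total), whereas appending to a list and joining once copies each byte a constant number of times.

```python
import io


def read_backwards_quadratic(stream):
    """Hypothetical stand-in: read bytes from the end of the stream toward
    the start, prepending each byte to an immutable bytes object."""
    line = b""
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        x = stream.read(1)
        if x in (b"\n", b"\r"):
            break
        # bytes are immutable, so this allocates a new object and copies
        # everything read so far: O(len(line)) work per byte.
        line = x + line
    return line


def read_backwards_linear(stream):
    """Same result, but the bytes are collected in a list and joined once."""
    parts = []
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        x = stream.read(1)
        if x in (b"\n", b"\r"):
            break
        parts.append(x)  # amortized O(1) per byte
    parts.reverse()      # bytes were collected last-to-first
    return b"".join(parts)


# A small input with trailing zero bytes and no marker at the end,
# similar in shape (not in size) to the cases measured above.
data = b"%PDF-1.4\nstartxref\n" + b"\x00" * 1000
slow = read_backwards_quadratic(io.BytesIO(data))
fast = read_backwards_linear(io.BytesIO(data))
```

Both functions return the same bytes; only the cost of building the result differs, which becomes noticeable once the trailing run grows to megabytes.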