File: streaming-data.md

package info (click to toggle)

pypdf2 2.12.1-3%2Bdeb12u1

links: PTS, VCS
area: main
in suites: bookworm
size: 27,144 kB
sloc: python: 28,767; makefile: 119; sh: 2

file content (55 lines) | stat: -rw-r--r-- 1,380 bytes

# Streaming Data with PyPDF2

In some cases you might want to avoid saving things explicitly as a file
to disk, e.g. when you want to store the PDF in a database or AWS S3.

PyPDF2 supports streaming data to a file-like object and here is how.

```python
from io import BytesIO

# Prepare example
with open("example.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())

# Read from bytes_stream
reader = PdfReader(bytes_stream)

# Write to bytes_stream
writer = PdfWriter()
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
```

## Writing a PDF directly to AWS S3

Suppose you want to manipulate a PDF and write it directly to AWS S3 without having
to write the document to a file first. We have the original PDF in `raw_bytes_data` as `bytes`
and want to set `my-secret-password`:

```python
from io import BytesIO

import boto3
from PyPDF2 import PdfReader, PdfWriter


reader = PdfReader(BytesIO(raw_bytes_data))
writer = PdfWriter()

# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)

# Add a password to the new PDF
writer.encrypt("my-secret-password")

# Save the new PDF to a file
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    bytes_stream.seek(0)
    s3 = boto3.client("s3")
    s3.write_get_object_response(
        Body=bytes_stream, RequestRoute=request_route, RequestToken=request_token
    )
```