1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
|
# Streaming Data with PyPDF2
In some cases you might want to avoid saving things explicitly as a file
to disk, e.g. when you want to store the PDF in a database or AWS S3.
PyPDF2 supports streaming data to a file-like object and here is how.
```python
from io import BytesIO
# Prepare example
with open("example.pdf", "rb") as fh:
bytes_stream = BytesIO(fh.read())
# Read from bytes_stream
reader = PdfReader(bytes_stream)
# Write to bytes_stream
writer = PdfWriter()
with BytesIO() as bytes_stream:
writer.write(bytes_stream)
```
## Writing a PDF directly to AWS S3
Suppose you want to manipulate a PDF and write it directly to AWS S3 without having
to write the document to a file first. We have the original PDF in `raw_bytes_data` as `bytes`
and want to set `my-secret-password`:
```python
from io import BytesIO
import boto3
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader(BytesIO(raw_bytes_data))
writer = PdfWriter()
# Add all pages to the writer
for page in reader.pages:
writer.add_page(page)
# Add a password to the new PDF
writer.encrypt("my-secret-password")
# Save the new PDF to a file
with BytesIO() as bytes_stream:
writer.write(bytes_stream)
bytes_stream.seek(0)
s3 = boto3.client("s3")
s3.write_get_object_response(
Body=bytes_stream, RequestRoute=request_route, RequestToken=request_token
)
```
|