# Reading PDF Annotations
PDF 1.7 defines 25 different annotation types:
* Text
* Link
* FreeText
* Line, Square, Circle, Polygon, PolyLine, Highlight, Underline, Squiggly, StrikeOut
* Stamp, Caret, Ink
* Popup
* FileAttachment
* Sound, Movie
* Widget, Screen
* PrinterMark
* TrapNet
* Watermark
* 3D

In general, annotations can be read like this:
```python
from PyPDF2 import PdfReader

reader = PdfReader("commented.pdf")

for page in reader.pages:
    # Pages without annotations have no /Annots entry
    if "/Annots" in page:
        for annot in page["/Annots"]:
            # Entries in /Annots are usually indirect references; resolve them
            obj = annot.get_object()
            annotation = {"subtype": obj["/Subtype"], "location": obj["/Rect"]}
            print(annotation)
```
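To get a quick overview of a document, you can tally the collected annotations by subtype. The sketch below works on plain dicts shaped like the `annotation` dicts built in the loop above. PyPDF2 name objects are `str` subclasses, so they compare equal to strings like `"/Text"`. `PDF17_SUBTYPES` and `summarize_subtypes` are hypothetical helpers, not part of PyPDF2. The set only mirrors the 25 types listed above, so any other subtype a file uses is reported as "unlisted" rather than treated as an error.

```python
from collections import Counter

# Hypothetical helper data: the 25 subtypes listed above, as /Subtype names
PDF17_SUBTYPES = frozenset({
    "/Text", "/Link", "/FreeText", "/Line", "/Square", "/Circle",
    "/Polygon", "/PolyLine", "/Highlight", "/Underline", "/Squiggly",
    "/StrikeOut", "/Stamp", "/Caret", "/Ink", "/Popup",
    "/FileAttachment", "/Sound", "/Movie", "/Widget", "/Screen",
    "/PrinterMark", "/TrapNet", "/Watermark", "/3D",
})


def summarize_subtypes(annotations):
    """Count annotations per subtype (hypothetical helper).

    `annotations` is an iterable of dicts with a "subtype" key, like the
    `annotation` dicts built in the reading loop. Returns the counts and a
    sorted list of subtypes that are not in PDF17_SUBTYPES.
    """
    counts = Counter(a["subtype"] for a in annotations)
    unlisted = sorted(s for s in counts if s not in PDF17_SUBTYPES)
    return counts, unlisted
```

For example, collect the dicts into a list inside the loop instead of printing them, then call `summarize_subtypes(collected)` once all pages are processed.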
The following sections show how to read some of the most common annotation types.
## Text
```python
from PyPDF2 import PdfReader

reader = PdfReader("example.pdf")

for page in reader.pages:
    if "/Annots" in page:
        for annot in page["/Annots"]:
            obj = annot.get_object()
            if obj["/Subtype"] == "/Text":
                # /Contents is optional, so check before reading it
                if "/Contents" in obj:
                    print(obj["/Contents"])
```
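Text annotations are usually comments, so it can help to keep the comment's author alongside its text. The sketch below assumes, following the convention for markup annotations, that the `/T` entry holds the name of the user who added the note. Both `/T` and `/Contents` are optional, so missing values become empty strings. `collect_comments` is a hypothetical helper that accepts any mapping with PDF-style keys, such as the resolved `annot.get_object()` dictionaries from the loop above. It uses `in` plus `[]` rather than `.get()` so that PyPDF2 still resolves indirect values through its dictionary lookup.

```python
def collect_comments(annotation_dicts):
    """Return (author, text) pairs for /Text annotations (hypothetical helper).

    Assumes /T holds the author's name and /Contents the note text;
    either may be absent, in which case an empty string is used.
    """
    comments = []
    for obj in annotation_dicts:
        if "/Subtype" not in obj or obj["/Subtype"] != "/Text":
            continue
        author = obj["/T"] if "/T" in obj else ""
        text = obj["/Contents"] if "/Contents" in obj else ""
        # str() turns PyPDF2 string objects into plain Python strings
        comments.append((str(author), str(text)))
    return comments
```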
## Highlights
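```python
from PyPDF2 import PdfReader

reader = PdfReader("commented.pdf")

for page in reader.pages:
    if "/Annots" in page:
        for annot in page["/Annots"]:
            obj = annot.get_object()
            if obj["/Subtype"] == "/Highlight":
                coords = obj["/QuadPoints"]
                # /QuadPoints holds 8 numbers per quadrilateral; a highlight
                # spanning several lines has one quadrilateral per line
                for i in range(0, len(coords), 8):
                    x1, y1, x2, y2, x3, y3, x4, y4 = coords[i:i + 8]
                    print((x1, y1), (x2, y2), (x3, y3), (x4, y4))
```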
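PDF writers do not always store the four corners of each quadrilateral in the same order. An axis-aligned bounding box is often easier to work with, for example to find the text that sits under a highlight. The sketch below turns a flat `/QuadPoints` array into one `(left, bottom, right, top)` box per quadrilateral. It takes the min and max of the x and y values, so the corner order does not matter. `quads_to_boxes` is a hypothetical helper, not a PyPDF2 function. Calling `float()` covers PyPDF2's numeric objects as well as plain numbers.

```python
def quads_to_boxes(quad_points):
    """Convert a flat /QuadPoints array into bounding boxes (hypothetical helper).

    Each quadrilateral is 8 numbers: x1 y1 x2 y2 x3 y3 x4 y4. Returns a
    list of (left, bottom, right, top) tuples, one per quadrilateral.
    """
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    boxes = []
    for i in range(0, len(quad_points), 8):
        quad = [float(v) for v in quad_points[i:i + 8]]
        xs, ys = quad[0::2], quad[1::2]  # even indices are x, odd are y
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Inside the highlight loop above, `quads_to_boxes(coords)` would return one box per highlighted line.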
## Attachments
```python
from PyPDF2 import PdfReader

reader = PdfReader("example.pdf")

attachments = {}
for page in reader.pages:
    if "/Annots" in page:
        for annot in page["/Annots"]:
            obj = annot.get_object()
            if obj["/Subtype"] == "/FileAttachment":
                # /FS is the file specification dictionary
                fileobj = obj["/FS"]
                # /F is the file name; /EF /F is the embedded file stream
                attachments[fileobj["/F"]] = fileobj["/EF"]["/F"].get_data()
```
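Once `attachments` maps file names to bytes, you will often want to write the files to disk. The names come from the PDF and should be treated as untrusted. The sketch below therefore keeps only the base name, so an entry such as `"../escape.bin"` cannot write outside the target directory. `save_attachments` and the directory name in the usage line are hypothetical. Two attachments with the same base name would overwrite each other. A real tool might add a suffix to avoid that.

```python
import os


def save_attachments(attachments, out_dir):
    """Write a {filename: bytes} mapping into out_dir (hypothetical helper).

    Only the base name of each (untrusted) file name is used, so entries
    cannot escape out_dir. Returns the list of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, data in attachments.items():
        safe_name = os.path.basename(str(name))
        if safe_name in ("", ".", ".."):
            safe_name = "attachment"  # fallback for unusable names
        path = os.path.join(out_dir, safe_name)
        with open(path, "wb") as fh:
            fh.write(data)
        written.append(path)
    return written
```

Usage, after the loop above: `save_attachments(attachments, "extracted")`.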