# Reading PDF Annotations

PDF 2.0 defines the following annotation types:

* Text
* Link
* FreeText
* Line
* Square
* Circle
* Polygon
* PolyLine
* Highlight
* Underline
* Squiggly
* StrikeOut
* Caret
* Stamp
* Ink
* Popup
* FileAttachment
* Sound
* Movie
* Screen
* Widget
* PrinterMark
* TrapNet
* Watermark
* 3D
* Redact
* Projection
* RichMedia

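Annotations are stored in each page's `/Annots` array. Every entry is a dictionary whose `/Subtype` names the annotation type and whose `/Rect` gives its location on the page. In general, annotations can be read like this: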

```{testsetup}
pypdf_test_setup("user/reading-pdf-annotations", {
    "example.pdf": "../resources/example.pdf",
})
```

```{testcode}
from pypdf import PdfReader

reader = PdfReader("example.pdf")

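for page in reader.pages:
    # Pages without annotations have no "/Annots" key.
    if "/Annots" in page:
        for annotation in page["/Annots"]:
            # Array entries may be indirect references; resolve them first.
            obj = annotation.get_object()
            print({"subtype": obj["/Subtype"], "location": obj["/Rect"]})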
```

```{testoutput}
:hide:

{'subtype': '/Highlight', 'location': [376.771, 406.213, 413.78, 422.506]}
{'subtype': '/Popup', 'location': [531.053, 327.965, 715.198, 422.219]}
{'subtype': '/FileAttachment', 'location': [245.819, 223.288, 252.819, 240.288]}
{'subtype': '/Stamp', 'location': [68.7536, 187.259, 151.442, 254.124]}
{'subtype': '/Popup', 'location': [612, 631.925, 816, 745.925]}
{'subtype': '/Text', 'location': [176.9, 216.719, 200.9, 240.719]}
{'subtype': '/Popup', 'location': [596, 709.445, 780, 801.445]}
```
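
To get a quick overview of what a document contains, the same loop can tally annotations by subtype. The sketch below is a minimal illustration rather than part of pypdf: `count_subtypes` is a hypothetical helper, and the demo data consists of plain dictionaries that only mimic the mapping-like pages returned by `PdfReader`.

```python
from collections import Counter


def count_subtypes(pages):
    """Tally annotations by /Subtype (hypothetical helper, not a pypdf API)."""
    counts = Counter()
    for page in pages:
        if "/Annots" not in page:
            continue
        for annotation in page["/Annots"]:
            # pypdf array entries offer get_object(); the plain dictionaries
            # used in the demo below do not, so fall back to the entry itself.
            resolve = getattr(annotation, "get_object", None)
            obj = resolve() if resolve else annotation
            counts[str(obj["/Subtype"])] += 1
    return counts


# Stand-in data so the sketch runs without a PDF file.
demo_pages = [
    {"/Annots": [{"/Subtype": "/Highlight"}, {"/Subtype": "/Popup"}]},
    {},  # a page without annotations
    {"/Annots": [{"/Subtype": "/Popup"}]},
]
print(count_subtypes(demo_pages))
```

With a real document, the equivalent call would be `count_subtypes(PdfReader("example.pdf").pages)`.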

The following examples read three of the most common annotation types:

## Text

```{testcode}
from pypdf import PdfReader

reader = PdfReader("example.pdf")

for page in reader.pages:
    if "/Annots" in page:
        for annotation in page["/Annots"]:
            # Resolve the reference once and reuse the dictionary.
            obj = annotation.get_object()
            if obj["/Subtype"] == "/Text":
                print(obj["/Contents"])
```

```{testoutput}
:hide:

Text comment
```
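
Text annotations often carry more than `/Contents`: the PDF specification also defines the optional `/T` entry, which by convention identifies the author. Because both keys are optional, a direct lookup can raise `KeyError` on files that omit them. The sketch below reads them defensively. `text_comments` is a hypothetical helper, and the stand-in dictionaries only mimic what pypdf would return.

```python
def text_comments(pages):
    """Return (author, contents) pairs for /Text annotations.

    Hypothetical helper, not a pypdf API. Missing optional keys become None.
    """
    comments = []
    for page in pages:
        if "/Annots" not in page:
            continue
        for annotation in page["/Annots"]:
            # Fall back to the entry itself for plain dictionaries.
            resolve = getattr(annotation, "get_object", None)
            obj = resolve() if resolve else annotation
            if obj["/Subtype"] != "/Text":
                continue
            author = obj["/T"] if "/T" in obj else None
            contents = obj["/Contents"] if "/Contents" in obj else None
            comments.append((author, contents))
    return comments


# Stand-in data so the sketch runs without a PDF file.
demo_pages = [
    {
        "/Annots": [
            {"/Subtype": "/Text", "/T": "alice", "/Contents": "Text comment"},
            {"/Subtype": "/Text"},  # neither author nor contents
            {"/Subtype": "/Popup"},  # ignored: not a text annotation
        ]
    }
]
print(text_comments(demo_pages))
```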

## Highlights

```{testcode}
from pypdf import PdfReader

reader = PdfReader("example.pdf")

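for page in reader.pages:
    if "/Annots" in page:
        for annotation in page["/Annots"]:
            obj = annotation.get_object()
            if obj["/Subtype"] == "/Highlight":
                coords = obj["/QuadPoints"]
                # /QuadPoints holds 8 numbers per highlighted region, so a
                # highlight spanning several lines has more than one group.
                for i in range(0, len(coords), 8):
                    x1, y1, x2, y2, x3, y3, x4, y4 = coords[i : i + 8]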
```
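
`/QuadPoints` holds eight numbers for each highlighted quadrilateral, so a highlight that wraps across several lines contains a multiple of eight values. A minimal sketch of splitting the flat array into one bounding box per quadrilateral follows. `quads_to_boxes` is a hypothetical helper. Taking the minimum and maximum of each coordinate avoids relying on the corner order, which PDF producers are known to write differently.

```python
def quads_to_boxes(quad_points):
    """Split a flat /QuadPoints array into (x_min, y_min, x_max, y_max) boxes.

    Hypothetical helper, not a pypdf API.
    """
    values = [float(v) for v in quad_points]
    if len(values) % 8 != 0:
        raise ValueError("expected a multiple of 8 numbers")
    boxes = []
    for i in range(0, len(values), 8):
        quad = values[i : i + 8]
        xs, ys = quad[0::2], quad[1::2]  # even indices are x, odd are y
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


# Stand-in data: a highlight covering two lines of text.
two_lines = [
    10, 110, 90, 110, 10, 100, 90, 100,
    10, 98, 50, 98, 10, 88, 50, 88,
]
print(quads_to_boxes(two_lines))
```

In the loop above, `quads_to_boxes(coords)` would accept the `/QuadPoints` array directly, since its elements convert cleanly with `float()`.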

## Attachments

```{testcode}
from pypdf import PdfReader

reader = PdfReader("example.pdf")

attachments = {}
for page in reader.pages:
    if "/Annots" in page:
        for annotation in page["/Annots"]:
            obj = annotation.get_object()
            if obj["/Subtype"] == "/FileAttachment":
                # /FS is the file specification: /F is the file name and
                # /EF -> /F is the embedded file stream.
                fileobj = obj["/FS"]
                attachments[fileobj["/F"]] = fileobj["/EF"]["/F"].get_data()
```
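
Once collected, the attachment bytes are typically written to disk. The names come from inside the PDF and should be treated as untrusted, so the sketch below keeps only the final path component before writing. `save_attachments` is a hypothetical helper, and the dictionary passed in the demonstration is stand-in data with the same name-to-bytes shape as `attachments` above.

```python
import tempfile
from pathlib import Path


def save_attachments(attachments, directory):
    """Write name -> bytes pairs into directory and return the paths written.

    Hypothetical helper, not a pypdf API.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in attachments.items():
        # Drop any directory components so "../x" cannot escape the target.
        safe_name = Path(str(name).replace("\\", "/")).name
        if safe_name in ("", ".", ".."):
            safe_name = "attachment"
        path = target / safe_name
        path.write_bytes(data)
        written.append(path)
    return written


# Demonstration with in-memory data instead of a real PDF.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_attachments({"reports/summary.txt": b"hello"}, tmp)
    saved_names = [p.name for p in paths]
    saved_bytes = paths[0].read_bytes()

print(saved_names, saved_bytes)
```

Note that two attachments whose names reduce to the same final component would overwrite each other in this sketch; a real tool would need a policy for such collisions.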