File: intro.md

# Developer Intro

pypdf is a library and hence its users are developers. This document is not for
the users, but for people who want to work on pypdf itself.

## Installing Requirements

```
pip install -r requirements/dev.txt
```

## Running Tests

See [testing pypdf with pytest](testing.md).

## The sample-files git submodule

The `sample-files` submodule exists because we want to keep the pypdf
repository small while also having an extensive test suite. Those two goals
conflict with each other.

The `resources` folder should contain a select set of core examples that cover
most cases we typically want to test for. The `sample-files` submodule can
cover many more edge cases, such as the behavior with larger files and the
output of different PDF producers.

In order to get the sample-files folder, you need to execute:

```
git submodule update --init
```

## Tools: git and pre-commit

Git is a command line application for version control. If you don't know it,
you can [play ohmygit](https://ohmygit.org/) to learn it.

GitHub is the service where the pypdf project is hosted. While git is free and
open source, GitHub is a commercial service owned by Microsoft, though it is
free for many use cases.

[pre-commit](https://pypi.org/project/pre-commit/) is a command line application
that uses git hooks to automatically run checks. This helps you avoid
style issues and other code quality issues. After you have run `pre-commit install`
once in your local copy of pypdf, the checks are executed automatically whenever
you `git commit`.

## Commit Messages

Having a clean commit message helps people to quickly understand what the commit
is about, without actually looking at the changes. The first line of the
commit message is used to [auto-generate the CHANGELOG](https://github.com/py-pdf/pypdf/blob/main/make_release.py).
For this reason, the format should be:

```
PREFIX: DESCRIPTION

BODY
```

The `PREFIX` can be:

* `SEC`: Security improvements. Typically a fix for a possible infinite loop.
* `BUG`: A bug was fixed. Likely there are one or more related issues. Then write in
   the `BODY`: `Closes #123` where 123 is the issue number on GitHub.
   It would be absolutely amazing if you could write a regression test in those
   cases. That is a test that would fail without the fix.
   A bug is always an issue for pypdf users; test code or CI that was fixed is
   not considered a bug here.
* `ENH`: A new feature! Describe in the body what it can be used for.
* `DEP`: A deprecation. Either marking something as "this is going to be removed"
   or actually removing it.
* `PI`: A performance improvement. This could also be a reduction in the
        file size of PDF files generated by pypdf.
* `ROB`: A robustness change. Dealing better with broken PDF files.
* `DOC`: A documentation change.
* `TST`: Adding or adjusting tests.
* `DEV`: Developer experience improvements, e.g. pre-commit or setting up CI.
* `MAINT`: Quite a lot of different stuff. Performance improvements are for sure
           the most interesting changes in here. Refactorings as well.
* `STY`: A style change. Something that makes pypdf code more consistent.
         Typically a small change. It could also be better error messages for
         end users.

The prefix is used to generate the CHANGELOG. Every PR must have exactly one.
If you feel like several match, take the first one in this list that matches
your PR.
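
The rules above can be sketched as a small check. This is only an illustration, not the logic used by `make_release.py`: the names `PREFIXES`, `parse_first_line`, and `pick_prefix` are hypothetical, and the regular expression is an assumption about what a well-formed first line looks like.

```python
from __future__ import annotations

import re

# Prefixes in the order of the list above (assumption: top = highest priority).
PREFIXES = ["SEC", "BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY"]

# The first line is expected to look like "PREFIX: DESCRIPTION".
FIRST_LINE = re.compile(r"^(?P<prefix>[A-Z]+): (?P<description>\S.*)$")


def parse_first_line(message: str) -> tuple[str, str]:
    """Return (prefix, description) of a commit message, or raise ValueError."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = FIRST_LINE.match(first_line)
    if match is None:
        raise ValueError(f"expected 'PREFIX: DESCRIPTION', got {first_line!r}")
    prefix = match.group("prefix")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix {prefix!r}")
    return prefix, match.group("description")


def pick_prefix(candidates: set[str]) -> str:
    """Pick the candidate that appears first (topmost) in the prefix list."""
    for prefix in PREFIXES:
        if prefix in candidates:
            return prefix
    raise ValueError("no known prefix among the candidates")
```

For example, `parse_first_line("BUG: Fix crash on empty page\n\nCloses #123")` yields the prefix `BUG` and the description, and `pick_prefix({"DOC", "BUG"})` picks `BUG` because it is listed above `DOC`.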

## Pull Request Size

Smaller Pull Requests (PRs) are preferred as it's typically easier to merge
them. For example, if you have some typos, a few code-style changes, a new
feature, and a bug-fix, that could be 3 or 4 PRs.

A PR must be complete. That means if you introduce a new feature, it must be
finished within the PR and have a test for that feature.

## Benchmarks

We need to keep an eye on performance and thus we have a few benchmarks.

See [py-pdf.github.io/pypdf/dev/bench](https://py-pdf.github.io/pypdf/dev/bench/)