File: intro.md

# Developer Intro

pypdf is a library and hence its users are developers. This document is not for
the users, but for people who want to work on pypdf itself.

## Installing Requirements

```
pip install -r requirements/dev.txt
```

## Running Tests

See [testing pypdf with pytest](testing.md).

## The sample-files git submodule

The `sample-files` submodule exists because we want to keep the pypdf
repository small while still having an extensive test suite. Those two goals
conflict with each other.

The `resources` folder should contain a select set of core examples that cover
most cases we typically want to test for. The `sample-files` submodule can cover
many more edge cases, such as the behavior with larger files and with files
from different PDF producers.

To get the `sample-files` folder, run:

```
git submodule update --init
```
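
A script or test helper may want to detect whether the submodule has been checked out before relying on it. The following is a minimal sketch and not part of pypdf: the function name `sample_files_available` is hypothetical, and it assumes that an uninitialized submodule shows up as a missing or empty `sample-files` directory.

```python
from pathlib import Path


def sample_files_available(repo_root: Path) -> bool:
    """Return True if `sample-files` exists and contains at least one entry.

    Hypothetical helper: "non-empty directory" is used as a rough proxy
    for "the submodule has been initialized".
    """
    sample_dir = repo_root / "sample-files"
    return sample_dir.is_dir() and any(sample_dir.iterdir())
```

Calling `sample_files_available(Path("."))` from the repository root would then tell you whether `git submodule update --init` still needs to be run.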

## Tools: git and pre-commit

Git is a command line application for version control. If you don't know it,
you can [play ohmygit](https://ohmygit.org/) to learn it.

GitHub is the service where the pypdf project is hosted. While git is free and
open source, GitHub is a commercial service owned by Microsoft, although it is
free of charge in a lot of cases.

[pre-commit](https://pypi.org/project/pre-commit/) is a command line application
that uses git hooks to automatically execute code. This allows you to avoid
style issues and other code quality issues. After you have run
`pre-commit install` once in your local copy of pypdf, it is executed
automatically whenever you `git commit`.

## Commit Messages

Having a clean commit message helps people to quickly understand what the commit
was about, without actually looking at the changes. The first line of the
commit message is used to [auto-generate the CHANGELOG](https://github.com/py-pdf/pypdf/blob/main/make_changelog.py). For this reason, the format should be:

```
PREFIX: DESCRIPTION

BODY
```

The `PREFIX` can be:

* `BUG`: A bug was fixed. If there are one or more related issues, write
  `Closes #123` in the `BODY`, where 123 is the issue number on GitHub.
  It would be absolutely amazing if you could write a regression test in those
  cases, that is, a test that would fail without the fix.
* `ENH`: A new feature! Describe in the body what it can be used for.
* `DEP`: A deprecation - either marking something as "this is going to be removed"
  or actually removing it.
* `PI`: A performance improvement. This could also be a reduction in the
  file size of PDF files generated by pypdf.
* `ROB`: A robustness change. Dealing better with broken PDF files.
* `DOC`: A documentation change.
* `TST`: Adding / adjusting tests.
* `DEV`: Developer experience improvements - e.g. pre-commit or setting up CI.
* `MAINT`: Quite a lot of different stuff. Performance improvements are for sure
  the most interesting changes in here. Refactorings as well.
* `STY`: A style change. Something that makes pypdf code more consistent.
  Typically a small change.
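
The format above can be checked mechanically. The following is a minimal sketch, not the logic used by pypdf's changelog script: the function name `check_commit_message` and the exact pattern (prefix, colon, single space, non-empty description) are assumptions made for illustration.

```python
import re

# Prefixes taken from the list above.
PREFIXES = ("BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY")

# First line: a known prefix, a colon, one space, then a non-empty description.
_FIRST_LINE = re.compile(rf"^({'|'.join(PREFIXES)}): \S.*$")


def check_commit_message(message: str) -> bool:
    """Hypothetical check: True if the first line is `PREFIX: DESCRIPTION`."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return _FIRST_LINE.match(first_line) is not None
```

With this sketch, a message such as `"BUG: Handle empty page labels\n\nCloses #123"` passes, while `"fixed stuff"` or an unknown prefix such as `"FIX: ..."` is rejected.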

## Benchmarks

We need to keep an eye on performance and thus we have a few benchmarks.

See [py-pdf.github.io/pypdf/dev/bench](https://py-pdf.github.io/pypdf/dev/bench/)