# Developer Intro
pypdf is a library and hence its users are developers. This document is not for
the users, but for people who want to work on pypdf itself.
```{note}
Our CI (continuous integration) validates that your contribution meets the relevant standards.
Especially for regular contributors or larger changes, we highly recommend setting up your own development environment
so that you can already cover the most important checks locally. This greatly reduces noise compared to opening an untested
PR early and then using our CI for debugging and improvements. The maintainers usually receive a notification on every push
to a branch with an open PR, so frequent pushes can bury important notifications.
```
## Installing Requirements
```
pip install -r requirements/dev.txt
```
## Running Tests
See [testing pypdf with pytest](testing.md).
## The sample-files git submodule
The reason for having the submodule `sample-files` is that we want to keep
the size of the pypdf repository small while we also want to have an extensive
test suite. Those two goals conflict with each other.

The `resources` folder should contain a select set of core examples that cover
most cases we typically want to test for. The `sample-files` submodule can cover
many more edge cases, such as the behavior with bigger files or files created by
different PDF producers.
To get the sample-files folder, you need to execute:
```
git submodule update --init
```
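Tests that rely on `sample-files` cannot work if the submodule was never initialized. As a minimal sketch, assuming the submodule lives at `sample-files` in the repository root and that an uninitialized submodule shows up as a missing or empty directory, a hypothetical helper `sample_files_available` can detect that case:

```python
from pathlib import Path


def sample_files_available(repo_root: Path) -> bool:
    """Return True if the sample-files submodule appears to be checked out.

    Hypothetical helper. Assumption: before `git submodule update --init`,
    the submodule path is missing or an empty directory; afterwards it
    contains files.
    """
    sample_dir = repo_root / "sample-files"
    # any() stops at the first directory entry, so this stays cheap
    # even when the submodule holds many files.
    return sample_dir.is_dir() and any(sample_dir.iterdir())
```

Such a helper could be passed to `pytest.mark.skipif(...)` so that those tests are skipped with a clear reason instead of failing. This is only an illustration, not how pypdf's test suite is actually wired.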
## Tools: git and pre-commit
Git is a command line application for version control. If you don't know it,
you can learn it by playing [Oh My Git!](https://ohmygit.org/).

GitHub is the service where the pypdf project is hosted. While git is free and
open source, GitHub is a commercial service owned by Microsoft, but it is free
for many use cases.

[pre-commit](https://pypi.org/project/pre-commit/) is a command line application
that uses git hooks to automatically run checks such as linters and formatters.
This helps you avoid style issues and other code quality issues. After you run
`pre-commit install` once in your local copy of pypdf, it will run automatically
whenever you `git commit`.
## Commit Messages
Having a clean commit message helps people to quickly understand what the commit
is about, without actually looking at the changes. The first line of the
commit message is used to [auto-generate the CHANGELOG](https://github.com/py-pdf/pypdf/blob/main/make_release.py).
For this reason, the format should be:
```
PREFIX: DESCRIPTION

BODY
```
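To illustrate why a consistent first line matters, here is a minimal sketch that groups commit subjects by their `PREFIX`, which is the kind of grouping a CHANGELOG needs. The function `group_subjects_by_prefix` and its `OTHER` bucket are hypothetical; this is not the code in `make_release.py`.

```python
from __future__ import annotations

from collections import defaultdict


def group_subjects_by_prefix(subjects: list[str]) -> dict[str, list[str]]:
    """Group commit subject lines of the form "PREFIX: DESCRIPTION" by PREFIX.

    Hypothetical helper for illustration only. Subjects without a ": "
    separator are collected under "OTHER".
    """
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        # partition() splits at the first ": " only, so colons inside
        # the description are preserved.
        prefix, sep, description = subject.partition(": ")
        if sep:
            groups[prefix.strip()].append(description.strip())
        else:
            groups["OTHER"].append(subject.strip())
    return dict(groups)


subjects = [
    "BUG: Fix crash on empty page",
    "DOC: Fix typo",
    "BUG: Handle missing /Root",
]
print(group_subjects_by_prefix(subjects))
# {'BUG': ['Fix crash on empty page', 'Handle missing /Root'], 'DOC': ['Fix typo']}
```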
The `PREFIX` can be:
* `SEC`: Security improvements. Typically, the fix for a possible infinite loop.
* `BUG`: A bug was fixed. Usually there are one or more related issues. In that
case, write `Closes #123` in the `BODY`, where 123 is the issue number on GitHub.
It would be absolutely amazing if you could also write a regression test in
those cases, that is, a test that would fail without the fix.
A bug is always an issue for pypdf users; fixes to test code or CI are not
considered bugs here.
* `ENH`: A new feature! Describe in the body what it can be used for.
* `DEP`: Deprecation. Either marking something as "this is going to be removed"
or actually removing it.
* `PI`: A performance improvement. This could also be a reduction in the
file size of PDF files generated by pypdf.
* `ROB`: A robustness change. Dealing better with broken PDF files.
* `DOC`: A documentation change.
* `TST`: Adding or adjusting tests.
* `DEV`: Developer experience improvements, e.g., pre-commit or setting up CI.
* `MAINT`: Quite a lot of different stuff, most notably refactorings.
Performance improvements have their own `PI` prefix.
* `STY`: A style change. Something that makes pypdf code more consistent.
Typically, a small change. It could also be better error messages for
end users.

The prefix is used to generate the CHANGELOG. Every PR must have exactly one
prefix. If several seem to match, use the first one in this list that applies
to your PR.
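The rules above can be checked mechanically. The following sketch, with the hypothetical names `check_subject` and `pick_prefix`, validates that the first line follows `PREFIX: DESCRIPTION` with one of the listed prefixes and, when several prefixes fit, returns the one listed first. It is an illustration only; pypdf does not necessarily ship such a check.

```python
from __future__ import annotations

import re

# Prefixes in the order listed above; earlier entries take priority.
PREFIXES = ["SEC", "BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY"]

# Hypothetical pattern: uppercase prefix, ": ", then a non-empty description.
SUBJECT_RE = re.compile(r"^(?P<prefix>[A-Z]+): (?P<description>\S.*)$")


def check_subject(message: str) -> str:
    """Return the prefix of the message's first line or raise ValueError."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = SUBJECT_RE.match(first_line)
    if match is None:
        raise ValueError(f"expected 'PREFIX: DESCRIPTION', got {first_line!r}")
    prefix = match.group("prefix")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix {prefix!r}; use one of {PREFIXES}")
    return prefix


def pick_prefix(candidates: set[str]) -> str:
    """If several prefixes seem to fit, return the one listed first."""
    for prefix in PREFIXES:
        if prefix in candidates:
            return prefix
    raise ValueError("none of the candidates is a known prefix")


print(check_subject("BUG: Fix crash on empty page\n\nCloses #123"))  # BUG
print(pick_prefix({"MAINT", "PI"}))  # PI
```

Git passes the path of the commit message file as the first argument to a `commit-msg` hook, so a function like `check_subject` could be called from such a hook if you want a local safety net.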
## Pull Request Size
Smaller Pull Requests (PRs) are preferred as it's typically easier to merge
them. For example, if you have some typos, a few code-style changes, a new
feature, and a bug-fix, that could be three or four PRs.
A PR must be complete. That means if you introduce a new feature, it must be
finished within the PR and have a test for that feature.
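As an example of the regression test mentioned for `BUG` fixes, and of the infinite loops that `SEC` fixes typically address, consider a hypothetical helper `walk_chain` that follows `next` links between objects, similar in spirit to following the `/Next` entries of PDF outline items. The `seen` set is the fix; without it, the first test would never finish on the cyclic input.

```python
from __future__ import annotations


def walk_chain(objects: dict[int, dict], start: int) -> list[int]:
    """Follow "next" links from `start` and return the visited object ids.

    Hypothetical helper. The `seen` set is the "fix": without it, a
    malformed file whose links form a cycle would make this loop forever.
    """
    seen: set[int] = set()
    order: list[int] = []
    current: int | None = start
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at object {current}")
        seen.add(current)
        order.append(current)
        current = objects[current].get("next")
    return order


def test_walk_chain_detects_cycle() -> None:
    # Regression test: 1 -> 2 -> 1 would loop forever without the cycle check.
    objects = {1: {"next": 2}, 2: {"next": 1}}
    try:
        walk_chain(objects, 1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for a cyclic chain")


def test_walk_chain_follows_links() -> None:
    objects = {1: {"next": 2}, 2: {"next": 3}, 3: {}}
    assert walk_chain(objects, 1) == [1, 2, 3]
```

With pytest, the first test would more idiomatically use `with pytest.raises(ValueError):`; the plain `try`/`except` keeps this sketch runnable without pytest.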
## Benchmarks
We need to keep an eye on performance, and thus we have a few benchmarks.
See [py-pdf.github.io/pypdf/dev/bench](https://py-pdf.github.io/pypdf/dev/bench/).
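For a quick local comparison before a change lands in the benchmarks, the standard library's `timeit` is often enough. This sketch is not the project's benchmark setup; the helper `best_time` and the compared workloads (two `zlib` compression levels on a repetitive, content-stream-like byte string) are assumptions chosen for illustration. It reports the minimum over several repeats, which the `timeit` documentation suggests as the most meaningful number.

```python
import timeit
import zlib


def best_time(stmt, number: int = 100, repeat: int = 5) -> float:
    """Return the fastest per-call time in seconds over several runs.

    Taking the minimum of the repeats reduces noise caused by other
    processes running on the same machine.
    """
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(totals) / number


# Hypothetical workload: a repetitive byte string resembling a content stream.
data = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n" * 1000
fast = best_time(lambda: zlib.compress(data, 1))
small = best_time(lambda: zlib.compress(data, 9))
print(f"level 1: {fast * 1e6:.1f} us, level 9: {small * 1e6:.1f} us")
```

Timings depend on your machine, so no expected output is given.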