# Developer Intro
pypdf is a library and hence its users are developers. This document is not for
the users, but for people who want to work on pypdf itself.
## Installing Requirements
```
pip install -r requirements/dev.txt
```
## Running Tests
See [testing pypdf with pytest](testing.md).
## The sample-files git submodule
The reason for having the submodule `sample-files` is that we want to keep
the size of the pypdf repository small while we also want to have an extensive
test suite. Those two goals contradict each other.
The `resources` folder should contain a select set of core examples that cover
most cases we typically want to test for. The `sample-files` submodule can cover
many more edge cases, such as the behavior we get when file sizes get bigger and
the output of different PDF producers.
In order to get the sample-files folder, you need to execute:
```
git submodule update --init
```
## Tools: git and pre-commit
Git is a command line application for version control. If you don't know it,
you can [play ohmygit](https://ohmygit.org/) to learn it.
GitHub is the service where the pypdf project is hosted. While git is free and
open source, GitHub is a commercial service run by Microsoft, although it is
free of charge in a lot of cases.
[pre-commit](https://pypi.org/project/pre-commit/) is a command line application
that uses git hooks to automatically execute code. This allows you to avoid
style issues and other code quality issues. After you have run `pre-commit install`
once in your local copy of pypdf, pre-commit is executed automatically whenever
you `git commit`.
## Commit Messages
Having a clean commit message helps people to quickly understand what the commit
was about, without actually looking at the changes. The first line of the
commit message is used to [auto-generate the CHANGELOG](https://github.com/py-pdf/pypdf/blob/main/make_changelog.py). For this reason, the format should be:
```
PREFIX: DESCRIPTION
BODY
```
The `PREFIX` can be:
* `BUG`: A bug was fixed. Likely there are one or more related issues. In that
case, write `Closes #123` in the `BODY`, where 123 is the issue number on GitHub.
It would be absolutely amazing if you could write a regression test in those
cases, that is, a test that would fail without the fix.
* `ENH`: A new feature! Describe in the body what it can be used for.
* `DEP`: A deprecation - either marking something as "this is going to be removed"
or actually removing it.
* `PI`: A performance improvement. This could also be a reduction in the
file size of PDF files generated by pypdf.
* `ROB`: A robustness change. Dealing better with broken PDF files.
* `DOC`: A documentation change.
* `TST`: Adding / adjusting tests.
* `DEV`: Developer experience improvements, e.g. pre-commit or setting up CI.
* `MAINT`: Quite a lot of different stuff. Performance improvements are for sure
the most interesting changes in here. Refactorings as well.
* `STY`: A style change. Something that makes pypdf code more consistent.
Typically a small change.
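
To make the expected first-line format concrete, here is a minimal sketch of a check you could run locally. It is not part of pypdf and not how `make_changelog.py` works; the helper name `parse_first_line` and the regular expression are assumptions made for illustration, and only the prefixes come from the list above.

```python
import re

# Prefixes taken from the list above.
ALLOWED_PREFIXES = (
    "BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY",
)

# Assumed shape of the first line: "PREFIX: DESCRIPTION" with a single space
# after the colon. The real changelog script may be more lenient or stricter.
_FIRST_LINE = re.compile(r"^(?P<prefix>[A-Z]+): (?P<description>\S.*)$")


def parse_first_line(message: str) -> tuple[str, str]:
    """Hypothetical helper: return (prefix, description) of a commit message.

    Raises ValueError if the first line does not look like
    "PREFIX: DESCRIPTION" with one of the prefixes listed above.
    """
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = _FIRST_LINE.match(first_line)
    if match is None or match["prefix"] not in ALLOWED_PREFIXES:
        raise ValueError(f"Unexpected first line: {first_line!r}")
    return match["prefix"], match["description"]


if __name__ == "__main__":
    # Example commit message for a bug fix that closes an issue.
    print(parse_first_line("BUG: Handle empty page tree\n\nCloses #123"))
```

Only the first line is inspected here, because that is the line used for the CHANGELOG; the `BODY` stays free-form.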
## Benchmarks
We need to keep an eye on performance, so we maintain a few benchmarks.
See [py-pdf.github.io/pypdf/dev/bench](https://py-pdf.github.io/pypdf/dev/bench/).