1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
|
# html2text
[](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)
[](https://codecov.io/gh/Alir3z4/html2text)
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: `html2text [filename [encoding]]`
| Option | Description
|--------------------------------------------------------|---------------------------------------------------
| `--version` | Show program's version number and exit
| `-h`, `--help` | Show this help message and exit
| `--ignore-links` | Don't include any formatting for links
|`--escape-all` | Escape all special characters. Output is less readable, but avoids corner case formatting issues.
| `--reference-links` | Use reference links instead of links to create markdown
| `--mark-code` | Mark preformatted and code blocks with [code]...[/code]
For a complete list of options see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md)
Or you can use it from within `Python`:
```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
```
Or with some configuration options:
```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!")
Hello, world!
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!
>>> # Don't Ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
```
*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*
## How to install
`html2text` is available on pypi
https://pypi.org/project/html2text/
```shell
$ pip install html2text
```
## Development
### How to run unit tests
```shell
$ tox
```
To see the coverage results:
```shell
$ coverage html
```
then open the `./htmlcov/index.html` file in your browser.
### Code Quality & Pre Commit
The CI runs several linting steps, including:
- mypy
- Flake8
- Black
To make sure the code passes the CI linting steps, run:
```shell
$ tox -e pre-commit
```
## Documentation
Documentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md)
|