[![jugglerchris](https://circleci.com/gh/jugglerchris/rust-html2text.svg?branch=master&style=svg)](https://app.circleci.com/pipelines/github/jugglerchris/rust-html2text?filter=all)

# html2text

html2text is a [Rust](http://www.rust-lang.org/) crate which converts HTML to
plain text.

It makes use of the [Servo project](https://github.com/servo/servo)'s HTML
parser, [html5ever](https://github.com/servo/html5ever/), using the DOM to
generate text (which can optionally include annotations for some features such
as hyperlinks).

The project aims to do a reasonable job of rendering reasonable HTML in a
terminal or other places where HTML needs to be converted to text (for
example the text/plain fallback in HTML e-mails).

With the optional `css` Cargo feature enabled (see below), some CSS/colour
support is available.

## Examples

Simple functions such as `from_read()` return formatted text (in various
formats, including plain text).

```rust
use html2text::from_read;
let html = b"
       <ul>
         <li>Item one</li>
         <li>Item two</li>
         <li>Item three</li>
       </ul>";
assert_eq!(from_read(&html[..], 20),
           "\
* Item one
* Item two
* Item three
");
```

A lower-level API gives a bit more control.  This gives the same result, except
that errors are returned as a `Result` instead of causing a panic:

```rust
use html2text::config;

let html = b"
       <ul>
         <li>Item one</li>
         <li>Item two</li>
         <li>Item three</li>
       </ul>";

assert_eq!(
    config::plain()
           .string_from_read(&html[..], 20)
           .unwrap(),
    "\
* Item one
* Item two
* Item three
");
```

A couple of simple demonstration programs are included as examples:

### html2text

The simplest example uses `from_read` to convert HTML on stdin into plain
text:

```sh
$ cargo run --example html2text < foo.html
[...]
```

### html2term

A very simple example of using the rich interface (`from_read_rich`) for a
slightly interactive console HTML viewer is provided as `html2term`.

```sh
$ cargo run --example html2term foo.html
[...]
```

Note that this example takes the HTML file as a parameter so that it can
read keys from stdin.

## Cargo Features

|Feature| Description|
|-------|------------|
|css    | Limited handling of CSS, adding Coloured nodes to the render tree. |
|html\_trace| Add verbose internal logging (not recommended) |
|html\_trace\_bt| Add backtraces to the verbose internal logging |

### CSS support

When the `css` feature is enabled, some simple CSS handling is available.
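
Cargo features are switched on in the dependent crate's `Cargo.toml`. A minimal
sketch, assuming a `0.12` version requirement (an assumption for illustration;
use whichever release you actually depend on):

```toml
[dependencies]
# The version requirement is an assumption; only the feature name comes from the table above.
html2text = { version = "0.12", features = ["css"] }
```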

Style rules are taken from:
* If `Config::use_doc_css()` is called, then style from the document:
  * `<style>` elements
  * Inline `style` attributes (`<div style="...">`)
  * `<font color=...>`
* Independently of `use_doc_css`, extra rules can be added with `Config::add_css(...)`

The following CSS features are implemented:
* Basic selector matching (including child and descendant combinators, classes
  and element types).
* CSS colors (`color`/`background-color`) will add
  `Coloured(...)`/`BgColoured(...)` nodes to the render tree.
* Rules with `display: none` will cause matching elements to be removed from
  the render tree.

The CSS handling is expected to improve in future (PRs welcome), but not to
become a full-blown browser style system, which would be overkill for terminal
output.

There are two ways to make use of the colours:
* Use `from_read_rich()` or one of its variants.  One of the annotations you may get
  back is `Colour(..)`.
* Use `from_read_coloured()`.  This is similar to `from_read()`, but you provide
  a function to add terminal colours (or other styling) based on the same
  `RichAnnotation`s.  See `examples/html2text.rs` for an example using termion.
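
To illustrate the kind of function the second option calls for, here is a
minimal, self-contained sketch that turns a list of annotations plus a piece of
text into a string wrapped in ANSI escape sequences. `Annotation` is a
hypothetical stand-in for the crate's own annotation type (which has more
variants), `colour_map` is an illustrative name, and the escape sequences are
just one possible choice of styling; none of this is taken from the crate's
source.

```rust
// Hypothetical stand-in for the crate's rich annotation type, defined here
// only so that the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
enum Annotation {
    Strong,
    Emphasis,
    // Foreground colour as 24-bit RGB.
    Colour { r: u8, g: u8, b: u8 },
}

/// Wrap `text` in ANSI escape sequences chosen from its annotations.
/// The shape (annotations and text in, styled string out) follows the
/// description above; the exact signature is an assumption.
fn colour_map(annotations: &[Annotation], text: &str) -> String {
    let mut prefix = String::new();
    for annotation in annotations {
        match annotation {
            // Bold.
            Annotation::Strong => prefix.push_str("\x1b[1m"),
            // Italic.
            Annotation::Emphasis => prefix.push_str("\x1b[3m"),
            // 24-bit ("truecolor") foreground colour.
            Annotation::Colour { r, g, b } => {
                prefix.push_str(&format!("\x1b[38;2;{};{};{}m", r, g, b));
            }
        }
    }
    if prefix.is_empty() {
        // Unannotated text passes through unchanged.
        text.to_string()
    } else {
        // Reset all attributes after the styled span.
        format!("{}{}\x1b[0m", prefix, text)
    }
}
```

With the real crate, a function of this shape would match on the crate's own
annotation variants (such as `Colour(..)`) and be handed to
`from_read_coloured()`.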