File: README.md

package info (click to toggle)
rustc 1.85.0%2Bdfsg2-3
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 893,176 kB
  • sloc: xml: 158,127; python: 35,830; javascript: 19,497; cpp: 19,002; sh: 17,245; ansic: 13,127; asm: 4,376; makefile: 1,051; lisp: 29; perl: 29; ruby: 19; sql: 11
file content (92 lines) | stat: -rw-r--r-- 2,064 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
# Html parser

A simple and general purpose html/xhtml parser lib/bin, using [Pest](https://pest.rs/).

## Features

- Parse html & xhtml (not xml processing instructions)
- Parse html-documents
- Parse html-fragments
- Parse empty documents
- Parse with the same api for both documents and fragments
- Parse custom, non-standard, elements; `<cat/>`, `<Cat/>` and `<C4-t/>`
- Removes comments
- Removes dangling elements
- Iterate over all nodes in the dom three

## What is it not

- It's not a high-performance browser-grade parser
- It's not suitable for html validation
- It's not a parser that includes element selection or dom manipulation

If your requirements matches any of the above, then you're most likely looking for one of the crates below:

- [html5ever](https://crates.io/crates/html5ever)
- [kuchiki](https://crates.io/crates/kuchiki)
- [scraper](https://crates.io/crates/scraper)
- or other crates using the `html5ever` parser

## Examples bin

Parse html file

```shell
html_parser index.html

```

Parse stdin with pretty output

```shell
curl <website> | html_parser -p
```

## Examples lib

Parse html document

```rust
    use html_parser::Dom;

    fn main() {
        let html = r#"
            <!doctype html>
            <html lang="en">
                <head>
                    <meta charset="utf-8">
                    <title>Html parser</title>
                </head>
                <body>
                    <h1 id="a" class="b c">Hello world</h1>
                    </h1> <!-- comments & dangling elements are ignored -->
                </body>
            </html>"#;

        assert!(Dom::parse(html).is_ok());
    }
```

Parse html fragment

```rust
    use html_parser::Dom;

    fn main() {
        let html = "<div id=cat />";
        assert!(Dom::parse(html).is_ok());
    }
```

Print to json

```rust
    use html_parser::{Dom, Result};

    fn main() -> Result<()> {
        let html = "<div id=cat />";
        let json = Dom::parse(html)?.to_json_pretty()?;
        println!("{}", json);
        Ok(())
    }
```