# Html parser
A simple, general-purpose html/xhtml parser lib/bin, built with [Pest](https://pest.rs/).
## Features
- Parse html & xhtml (not xml processing instructions)
- Parse html-documents
- Parse html-fragments
- Parse empty documents
- Parse with the same api for both documents and fragments
- Parse custom, non-standard, elements; `<cat/>`, `<Cat/>` and `<C4-t/>`
- Removes comments
- Removes dangling elements
- Iterate over all nodes in the dom tree
## What is it not
- It's not a high-performance browser-grade parser
- It's not suitable for html validation
- It's not a parser that includes element selection or dom manipulation

If your requirements match any of the above, then you're most likely looking for one of the crates below:
- [html5ever](https://crates.io/crates/html5ever)
- [kuchiki](https://crates.io/crates/kuchiki)
- [scraper](https://crates.io/crates/scraper)
- or other crates using the `html5ever` parser
## Examples bin
Parse html file
```shell
html_parser index.html
```
Parse stdin with pretty output
```shell
curl <website> | html_parser -p
```
## Examples lib
Parse html document
```rust
use html_parser::Dom;

fn main() {
    let html = r#"
        <!doctype html>
        <html lang="en">
            <head>
                <meta charset="utf-8">
                <title>Html parser</title>
            </head>
            <body>
                <h1 id="a" class="b c">Hello world</h1>
                </h1> <!-- comments & dangling elements are ignored -->
            </body>
        </html>"#;

    assert!(Dom::parse(html).is_ok());
}
```
Parse html fragment
```rust
use html_parser::Dom;

fn main() {
    let html = "<div id=cat />";
    assert!(Dom::parse(html).is_ok());
}
```
Print to json
```rust
use html_parser::{Dom, Result};

fn main() -> Result<()> {
    let html = "<div id=cat />";
    let json = Dom::parse(html)?.to_json_pretty()?;
    println!("{}", json);
    Ok(())
}
```
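Iterate over nodes (illustrative sketch)

The feature list mentions iterating over all nodes in the dom tree. The block below is a minimal, std-only sketch of what a depth-first walk over such a tree can look like. The `Node` enum, `describe` and `sample_tree` are hypothetical names made up for this illustration; they are not this crate's types or API, so check the crate documentation for the real ones.

```rust
// Hypothetical node type for illustration only; NOT html_parser's own API.
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Walk the tree depth-first and describe every node in document order.
fn describe(nodes: &[Node]) -> Vec<String> {
    let mut out = Vec::new();
    // Explicit stack; items are pushed in reverse so they pop in document order.
    let mut stack: Vec<&Node> = nodes.iter().rev().collect();

    while let Some(node) = stack.pop() {
        match node {
            Node::Text(text) => out.push(format!("text: {}", text)),
            Node::Element { name, children } => {
                out.push(format!("element: {}", name));
                stack.extend(children.iter().rev());
            }
        }
    }
    out
}

/// A small hand-built tree, roughly `<body><h1>Hello world</h1><p/></body>`.
fn sample_tree() -> Vec<Node> {
    vec![Node::Element {
        name: "body".to_string(),
        children: vec![
            Node::Element {
                name: "h1".to_string(),
                children: vec![Node::Text("Hello world".to_string())],
            },
            Node::Element {
                name: "p".to_string(),
                children: vec![],
            },
        ],
    }]
}

fn main() {
    for line in describe(&sample_tree()) {
        println!("{}", line);
    }
}
```

The sketch uses an explicit stack instead of recursion, so a deeply nested document cannot overflow the call stack.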