<p align="center">
<img src="https://raw.github.com/pest-parser/pest/master/pest-logo.svg?sanitize=true" width="80%"/>
</p>

# pest. The Elegant Parser

<a href="https://blog.rust-lang.org/2021/11/01/Rust-1.61.0.html"><img alt="Rustc Version 1.61.0+" src="https://img.shields.io/badge/rustc-1.61.0%2B-lightgrey.svg"/></a>

pest is a general-purpose parser written in Rust with a focus on accessibility,
correctness, and performance. It uses parsing expression grammars
(or [PEG]) as input, which are similar in spirit to regular expressions, but
which offer the enhanced expressivity needed to parse complex languages.

[PEG]: https://en.wikipedia.org/wiki/Parsing_expression_grammar
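
One practical difference from regular expressions is that PEG choice (written `|` in pest) is ordered: alternatives are tried left to right, the first one that succeeds is kept, and the parser never goes back to try a later alternative. The sketch below hand-codes the rule `("a" | "ab") ~ "c"` using only the Rust standard library to illustrate this. The function names are hypothetical, and this is not how pest generates its parsers.

```rust
/// Matches the literal `text` at the start of `input`, returning the unconsumed rest.
fn lit<'a>(input: &'a str, text: &str) -> Option<&'a str> {
    input.strip_prefix(text)
}

/// PEG ordered choice `("a" | "ab")`: try each alternative in order and
/// commit to the first one that succeeds.
fn a_or_ab(input: &str) -> Option<&str> {
    lit(input, "a").or_else(|| lit(input, "ab"))
}

/// PEG sequence `("a" | "ab") ~ "c"`.
fn grammar(input: &str) -> Option<&str> {
    let rest = a_or_ab(input)?;
    lit(rest, "c")
}

fn main() {
    // "ac": the choice picks "a", then "c" matches, leaving nothing.
    // "abc": the choice commits to "a", "c" then fails against "b", and the
    // whole sequence fails instead of retrying with "ab".
    println!("{:?} {:?}", grammar("ac"), grammar("abc"));
}
```

A backtracking regex such as `^(a|ab)c$` does accept `"abc"`, which is why alternative order matters more in a PEG grammar than in a regex.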
## Getting started
The recommended way to start parsing with pest is to read the official [book].
Other helpful resources:
* API reference on [docs.rs]
* play with grammars and share them on our [fiddle]
* find previous common questions answered or ask questions on [GitHub Discussions]
* leave feedback, ask questions, or greet us on [Gitter] or [Discord]

[book]: https://pest.rs/book
[docs.rs]: https://docs.rs/pest
[fiddle]: https://pest.rs/#editor
[Gitter]: https://gitter.im/pest-parser/pest
[Discord]: https://discord.gg/XEGACtWpT2
[GitHub Discussions]: https://github.com/pest-parser/pest/discussions
## Example
The following is an example of a grammar for a list of alphanumeric identifiers
where no identifier starts with a digit:
```rust
alpha = { 'a'..'z' | 'A'..'Z' }
digit = { '0'..'9' }
ident = { !digit ~ (alpha | digit)+ }
ident_list = _{ ident ~ (" " ~ ident)* }
//           ^
// the ident_list rule is silent, which means it produces no tokens
```
Grammars are saved in separate `.pest` files, which are never mixed with
procedural code. This results in an always up-to-date formalization of a
language that is easy to read and maintain.
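
For readers new to PEG, here is a rough standard-library-only Rust sketch of what the `ident` and `ident_list` rules above accept. It assumes ASCII input, it is not the code pest generates, and the function names are hypothetical. It follows the grammar literally: `!digit` is a lookahead that consumes nothing, pest's `'a'..'z'` ranges are inclusive, and the repetition `(" " ~ ident)*` stops at the first iteration that does not match completely.

```rust
/// alpha = { 'a'..'z' | 'A'..'Z' }  (pest character ranges are inclusive)
fn is_alpha(c: char) -> bool {
    c.is_ascii_alphabetic()
}

/// digit = { '0'..'9' }
fn is_digit(c: char) -> bool {
    c.is_ascii_digit()
}

/// ident = { !digit ~ (alpha | digit)+ }
/// Returns how many bytes of `input` the rule matches, if it matches at all.
fn ident(input: &str) -> Option<usize> {
    // `!digit` is a negative lookahead: it checks the next char but consumes nothing.
    if input.chars().next().map_or(false, is_digit) {
        return None;
    }
    // `(alpha | digit)+` greedily consumes one or more alphanumeric chars.
    let len = input
        .char_indices()
        .find(|&(_, c)| !(is_alpha(c) || is_digit(c)))
        .map_or(input.len(), |(i, _)| i);
    if len == 0 { None } else { Some(len) }
}

/// ident_list = _{ ident ~ (" " ~ ident)* }
/// Returns the matched identifiers in order.
fn ident_list(input: &str) -> Option<Vec<&str>> {
    let first = ident(input)?;
    let mut idents = vec![&input[..first]];
    let mut pos = first;
    // Each iteration of `(" " ~ ident)*` must match completely; if the ident
    // after a space fails, that iteration is undone and the repetition stops.
    while let Some(rest) = input[pos..].strip_prefix(' ') {
        match ident(rest) {
            Some(len) => {
                idents.push(&rest[..len]);
                pos += 1 + len;
            }
            None => break,
        }
    }
    Some(idents)
}

fn main() {
    println!("{:?}", ident_list("a1 b2")); // Some(["a1", "b2"])
    println!("{:?}", ident_list("123")); // None: `!digit` fails at the first char
}
```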
## Meaningful error reporting
Based on the grammar definition, the parser also includes automatic error
reporting. For the example above, the input `"123"` will result in:
```
thread 'main' panicked at ' --> 1:1
  |
1 | 123
  | ^---
  |
  = unexpected digit', src/main.rs:12
```
while `"ab *"` will result in:
```
thread 'main' panicked at ' --> 1:4
  |
1 | ab *
  |    ^---
  |
  = expected ident', src/main.rs:12
```
These error messages can be obtained from their default `Display` implementation,
e.g. `panic!("{}", parser_result.unwrap_err())` or `println!("{}", e)`.
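
The `--> line:col` header points at the position where parsing failed, using 1-based line and column numbers, and the excerpt below it underlines that column. The following is a minimal standard-library sketch of that conversion and layout, assuming a single failure offset that falls on a character boundary. `line_col` and `render` are hypothetical helpers that only imitate the layout shown above; they are not pest's `Error` type, whose formatting handles more cases.

```rust
/// Converts a byte offset into a 1-based (line, column) pair, counting columns in chars.
/// Panics if `offset` is not on a char boundary.
fn line_col(input: &str, offset: usize) -> (usize, usize) {
    let before = &input[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = input[line_start..offset].chars().count() + 1;
    (line, col)
}

/// Renders an excerpt in the style above, with a caret under the failing column.
fn render(input: &str, offset: usize, message: &str) -> String {
    let (line, col) = line_col(input, offset);
    let text = input.lines().nth(line - 1).unwrap_or("");
    // Indent the gutter by the width of the line number.
    let pad = " ".repeat(line.to_string().len());
    format!(
        "{pad}--> {line}:{col}\n{pad} |\n{line} | {text}\n{pad} | {caret}^---\n{pad} |\n{pad} = {message}",
        caret = " ".repeat(col - 1)
    )
}

fn main() {
    // Offset 3 is the `*` in "ab *".
    println!("{}", render("ab *", 3, "expected ident"));
}
```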
## Pairs API
The grammar can be used to derive a `Parser` implementation automatically.
Parsing returns an iterator of nested token pairs:
```rust
use pest_derive::Parser;
use pest::Parser;

#[derive(Parser)]
#[grammar = "ident.pest"]
struct IdentParser;

fn main() {
    let pairs = IdentParser::parse(Rule::ident_list, "a1 b2").unwrap_or_else(|e| panic!("{}", e));

    // Because ident_list is silent, the iterator will contain idents
    for pair in pairs {
        // A pair is a combination of the rule which matched and a span of input
        println!("Rule: {:?}", pair.as_rule());
        println!("Span: {:?}", pair.as_span());
        println!("Text: {}", pair.as_str());

        // A pair can be converted to an iterator of the tokens which make it up:
        for inner_pair in pair.into_inner() {
            match inner_pair.as_rule() {
                Rule::alpha => println!("Letter: {}", inner_pair.as_str()),
                Rule::digit => println!("Digit: {}", inner_pair.as_str()),
                _ => unreachable!()
            };
        }
    }
}
```
This produces the following output:
```
Rule: ident
Span: Span { str: "a1", start: 0, end: 2 }
Text: a1
Letter: a
Digit: 1
Rule: ident
Span: Span { str: "b2", start: 3, end: 5 }
Text: b2
Letter: b
Digit: 2
```
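
Conceptually, each pair is a rule plus the start and end byte offsets of the input it matched, with nested pairs for the rules inside it. The sketch below models that with plain standard-library Rust to show how the output above follows from the tree. The `Rule` and `Pair` types here are hypothetical stand-ins, not pest's own `Pair`, and the tree for `"a1 b2"` is built by hand rather than produced by a parser.

```rust
// Lowercase variants mirror the grammar's rule names, so `{:?}` prints "ident".
#[allow(non_camel_case_types)]
#[derive(Debug)]
enum Rule {
    ident,
    alpha,
    digit,
}

/// A matched rule, the byte range of input it covers, and the pairs nested inside it.
struct Pair {
    rule: Rule,
    start: usize,
    end: usize,
    inner: Vec<Pair>,
}

impl Pair {
    /// alpha and digit each match exactly one ASCII character.
    fn leaf(rule: Rule, start: usize) -> Pair {
        Pair { rule, start, end: start + 1, inner: Vec::new() }
    }

    /// The matched text is just a slice of the original input.
    fn as_str<'i>(&self, input: &'i str) -> &'i str {
        &input[self.start..self.end]
    }
}

/// Hand-built tree for "a1 b2"; the silent ident_list contributes no pair of its own.
fn example_pairs() -> Vec<Pair> {
    vec![
        Pair { rule: Rule::ident, start: 0, end: 2, inner: vec![Pair::leaf(Rule::alpha, 0), Pair::leaf(Rule::digit, 1)] },
        Pair { rule: Rule::ident, start: 3, end: 5, inner: vec![Pair::leaf(Rule::alpha, 3), Pair::leaf(Rule::digit, 4)] },
    ]
}

/// Walks the top-level pairs and their children, producing lines like the output above.
fn describe(input: &str, pairs: &[Pair]) -> Vec<String> {
    let mut out = Vec::new();
    for pair in pairs {
        out.push(format!("Rule: {:?}", pair.rule));
        out.push(format!("Text: {}", pair.as_str(input)));
        for inner in &pair.inner {
            let label = match inner.rule {
                Rule::alpha => "Letter",
                Rule::digit => "Digit",
                Rule::ident => unreachable!(),
            };
            out.push(format!("{}: {}", label, inner.as_str(input)));
        }
    }
    out
}

fn main() {
    for line in describe("a1 b2", &example_pairs()) {
        println!("{}", line);
    }
}
```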
### Defining multiple parsers in a single file
Deriving `Parser` generates a `Rule` enum next to the parser struct, so
defining several structs that derive `Parser` in the same module makes their
`Rule` enums conflict. One way around this is to put each parser struct in its
own module:
```rust
mod a {
    use pest_derive::Parser;

    #[derive(Parser)]
    #[grammar = "a.pest"]
    pub struct ParserA;
}

mod b {
    use pest_derive::Parser;

    #[derive(Parser)]
    #[grammar = "b.pest"]
    pub struct ParserB;
}
```
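
This works because of ordinary Rust scoping: each derive emits its `Rule` enum into the module that contains the struct, so the two enums become `a::Rule` and `b::Rule`. The standard-library-only sketch below hand-writes two such enums to show that they coexist and can be imported side by side under aliases; the variant names `number` and `word` are hypothetical and not taken from any real grammar.

```rust
mod a {
    // Hypothetical stand-in for the `Rule` enum generated for `ParserA`.
    #[allow(non_camel_case_types)]
    #[derive(Debug, PartialEq)]
    pub enum Rule {
        number,
    }
}

mod b {
    // Hypothetical stand-in for the `Rule` enum generated for `ParserB`:
    // same name, different module, so no conflict.
    #[allow(non_camel_case_types)]
    #[derive(Debug, PartialEq)]
    pub enum Rule {
        word,
    }
}

// Aliasing is one way to bring both enums into the same scope without a clash.
use a::Rule as RuleA;
use b::Rule as RuleB;

/// Formats one rule from each grammar, showing both enums used together.
fn rule_names(ra: RuleA, rb: RuleB) -> String {
    format!("{:?} / {:?}", ra, rb)
}

fn main() {
    println!("{}", rule_names(a::Rule::number, b::Rule::word)); // number / word
}
```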
## Other features
* Precedence climbing
* Input handling
* Custom errors
* Runs on stable Rust
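
Precedence climbing parses infix expressions whose operators bind with different strengths without writing one grammar rule per precedence level. pest provides its own helper for this; the sketch below is not that API, only a rough standard-library-only illustration of the technique. It evaluates a flat token list of integers and left-associative `+ - * /` operators, and the `Token` type and precedence table are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy)]
enum Token {
    Num(i64),
    Op(char),
}

/// Binding strength of each binary operator; higher binds tighter.
fn precedence(op: char) -> u8 {
    match op {
        '+' | '-' => 1,
        '*' | '/' => 2,
        _ => panic!("unknown operator {op}"),
    }
}

fn apply(op: char, lhs: i64, rhs: i64) -> i64 {
    match op {
        '+' => lhs + rhs,
        '-' => lhs - rhs,
        '*' => lhs * rhs,
        '/' => lhs / rhs,
        _ => unreachable!(),
    }
}

/// Reads a number at `*pos`, then folds in every following operator whose
/// precedence is at least `min_prec`.
fn climb(tokens: &[Token], pos: &mut usize, min_prec: u8) -> i64 {
    let mut lhs = match tokens[*pos] {
        Token::Num(n) => n,
        Token::Op(op) => panic!("expected a number, found {op}"),
    };
    *pos += 1;
    while let Some(&Token::Op(op)) = tokens.get(*pos) {
        let prec = precedence(op);
        if prec < min_prec {
            break;
        }
        *pos += 1;
        // Left associativity: the right operand may only absorb tighter operators.
        let rhs = climb(tokens, pos, prec + 1);
        lhs = apply(op, lhs, rhs);
    }
    lhs
}

fn eval(tokens: &[Token]) -> i64 {
    let mut pos = 0;
    climb(tokens, &mut pos, 1)
}

fn main() {
    use Token::{Num, Op};
    // 1 + 2 * 3 - 4 is evaluated as (1 + (2 * 3)) - 4
    let tokens = [Num(1), Op('+'), Num(2), Op('*'), Num(3), Op('-'), Num(4)];
    println!("{}", eval(&tokens)); // 3
}
```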
## Projects using pest
You can find more projects and ecosystem tools in the [awesome-pest](https://github.com/pest-parser/awesome-pest) repo.
* [pest_meta](https://github.com/pest-parser/pest/blob/master/meta/src/grammar.pest) (bootstrapped)
* [AshPaper](https://github.com/shnewto/ashpaper)
* [brain](https://github.com/brain-lang/brain)
* [cicada](https://github.com/mitnk/cicada)
* [comrak](https://github.com/kivikakk/comrak)
* [elastic-rs](https://github.com/cch123/elastic-rs)
* [graphql-parser](https://github.com/Keats/graphql-parser)
* [handlebars-rust](https://github.com/sunng87/handlebars-rust)
* [hexdino](https://github.com/Luz/hexdino)
* [Huia](https://gitlab.com/jimsy/huia/)
* [insta](https://github.com/mitsuhiko/insta)
* [jql](https://github.com/yamafaktory/jql)
* [json5-rs](https://github.com/callum-oakley/json5-rs)
* [mt940](https://github.com/svenstaro/mt940-rs)
* [Myoxine](https://github.com/d3bate/myoxine)
* [py_literal](https://github.com/jturner314/py_literal)
* [rouler](https://github.com/jarcane/rouler)
* [RuSh](https://github.com/lwandrebeck/RuSh)
* [rs_pbrt](https://github.com/wahn/rs_pbrt)
* [stache](https://github.com/dgraham/stache)
* [tera](https://github.com/Keats/tera)
* [ui_gen](https://github.com/emoon/ui_gen)
* [ukhasnet-parser](https://github.com/adamgreig/ukhasnet-parser)
* [ZoKrates](https://github.com/ZoKrates/ZoKrates)
* [Vector](https://github.com/timberio/vector)
* [AutoCorrect](https://github.com/huacnlee/autocorrect)
* [yaml-peg](https://github.com/aofdev/yaml-peg)
* [qubit](https://github.com/abhimanyu003/qubit)
* [caith](https://github.com/Geobert/caith) (a dice roller crate)
* [Melody](https://github.com/yoav-lavi/melody)
* [json5-nodes](https://github.com/jlyonsmith/json5-nodes)
* [prisma](https://github.com/prisma/prisma)
## Minimum Supported Rust Version (MSRV)
This library should always compile with default features on **Rust 1.61.0**.
## no_std support
The `pest` and `pest_derive` crates can be built without the Rust standard
library and target embedded environments. To do so, you need to disable
their default features. In your `Cargo.toml`, you can specify it as follows:
```toml
[dependencies]
# ...
pest = { version = "2", default-features = false }
pest_derive = { version = "2", default-features = false }
```
If you want to build these crates in the pest repository's workspace, you can
pass the `--no-default-features` flag to `cargo` and specify these crates using
the `--package` (`-p`) flag. For example:
```bash
$ cargo build --target thumbv7em-none-eabihf --no-default-features -p pest
$ cargo bootstrap
$ cargo build --target thumbv7em-none-eabihf --no-default-features -p pest_derive
```
## Special thanks
A special round of applause goes to Prof. Marius Minea for his guidance and to
all pest contributors, some of whom are none other than my friends.