# nom, eating data byte by byte
nom is a parser combinator library written in Rust. Its goal is to provide tools
to build safe parsers without compromising speed or memory consumption. To
that end, it makes extensive use of Rust's *strong typing* and *memory safety* to produce
fast and correct parsers, and provides functions, macros and traits to abstract most of the
error-prone plumbing.

*nom will happily take a byte out of your files :)*
<!-- toc -->
- [Example](#example)
- [Documentation](#documentation)
- [Why use nom?](#why-use-nom)
  - [Binary format parsers](#binary-format-parsers)
  - [Text format parsers](#text-format-parsers)
  - [Programming language parsers](#programming-language-parsers)
  - [Streaming formats](#streaming-formats)
- [Parser combinators](#parser-combinators)
- [Technical features](#technical-features)
- [Rust version requirements](#rust-version-requirements-msrv)
- [Installation](#installation)
- [Related projects](#related-projects)
- [Parsers written with nom](#parsers-written-with-nom)
- [Contributors](#contributors)
<!-- tocstop -->
## Example
[Hexadecimal color](https://developer.mozilla.org/en-US/docs/Web/CSS/color) parser:
```rust
extern crate nom;
use nom::{
    IResult,
    bytes::complete::{tag, take_while_m_n},
    combinator::map_res,
    sequence::tuple
};

#[derive(Debug,PartialEq)]
pub struct Color {
    pub red: u8,
    pub green: u8,
    pub blue: u8,
}

fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
    u8::from_str_radix(input, 16)
}

fn is_hex_digit(c: char) -> bool {
    c.is_digit(16)
}

fn hex_primary(input: &str) -> IResult<&str, u8> {
    map_res(
        take_while_m_n(2, 2, is_hex_digit),
        from_hex
    )(input)
}

fn hex_color(input: &str) -> IResult<&str, Color> {
    let (input, _) = tag("#")(input)?;
    let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;

    Ok((input, Color { red, green, blue }))
}

fn main() {}

#[test]
fn parse_color() {
    assert_eq!(hex_color("#2F14DF"), Ok(("", Color {
        red: 47,
        green: 20,
        blue: 223,
    })));
}
```
## Documentation
- [Reference documentation](https://docs.rs/nom)
- [Various design documents and tutorials](https://github.com/Geal/nom/tree/main/doc)
- [List of combinators and their behaviour](https://github.com/Geal/nom/blob/main/doc/choosing_a_combinator.md)
If you need any help developing your parsers, please ping `geal` on IRC (libera, geeknode, oftc), join `#nom-parsers` on Libera IRC, or ask in the [Gitter chat room](https://gitter.im/Geal/nom).
## Why use nom
If you want to write:
### Binary format parsers
nom was designed to properly parse binary formats from the beginning. Compared
to the usual handwritten C parsers, nom parsers are just as fast, free from
buffer overflow vulnerabilities, and handle common patterns for you:
- [TLV](https://en.wikipedia.org/wiki/Type-length-value)
- Bit level parsing
- Hexadecimal viewer in the debugging macros for easy data analysis
- Streaming parsers for network formats and huge files
Example projects:
- [FLV parser](https://github.com/rust-av/flavors)
- [Matroska parser](https://github.com/rust-av/matroska)
- [tar parser](https://github.com/Keruspe/tar-parser.rs)
### Text format parsers
While nom was made for binary formats at first, it soon grew to work just as
well with text formats. From line-based formats like CSV, to more complex, nested
formats such as JSON, nom can manage it, and provides you with useful tools:
- Fast case-insensitive comparison
- Recognizers for escaped strings
- Regular expressions can be embedded in nom parsers to represent complex character patterns succinctly
- Special care has been given to managing non-ASCII characters properly
Example projects:
- [HTTP proxy](https://github.com/sozu-proxy/sozu/tree/main/lib/src/protocol/http/parser)
- [TOML parser](https://github.com/joelself/tomllib)
### Programming language parsers
While programming language parsers are usually written manually for more
flexibility and performance, nom can be (and has been successfully) used
as a prototyping parser for a language.
nom will get you started quickly with powerful custom error types that you
can leverage with [nom_locate](https://github.com/fflorent/nom_locate) to
pinpoint the exact line and column of the error. No need for separate
tokenizing, lexing and parsing phases: nom can automatically handle whitespace
parsing, and construct an AST in place.
Example projects:
- [PHP VM](https://github.com/tagua-vm/parser)
- eve language prototype
- [xshade shading language](https://github.com/xshade-lang/xshade/)
### Streaming formats
While a lot of formats (and the code handling them) assume that they can fit
the complete data in memory, there are formats for which we only get a part
of the data at once, like network formats, or huge files.
nom has been designed for a correct behaviour with partial data: If there is
not enough data to decide, nom will tell you it needs more instead of silently
returning a wrong result. Whether your data comes entirely or in chunks, the
result should be the same.
This behaviour lets you build powerful, deterministic state machines for your protocols.
Example projects:
- [HTTP proxy](https://github.com/sozu-proxy/sozu/tree/main/lib/src/protocol/http/parser)
- [Using nom with generators](https://github.com/Geal/generator_nom)
## Parser combinators
Parser combinators are an approach to parsers that is very different from
software like [lex](https://en.wikipedia.org/wiki/Lex_(software)) and
[yacc](https://en.wikipedia.org/wiki/Yacc). Instead of writing the grammar
in a separate file and generating the corresponding code, you use very
small functions with very specific purposes, like "take 5 bytes" or
"recognize the word 'HTTP'", and assemble them in meaningful patterns
like "recognize 'HTTP', then a space, then a version".
The resulting code is small, and looks like the grammar you would have
written with other parser approaches.
This has a few advantages:
- The parsers are small and easy to write
- Parser components are easy to reuse (if they're general enough, please add them to nom!)
- Parser components are easy to test separately (unit tests and property-based tests)
- The parser combination code looks close to the grammar you would have written
- You can build partial parsers, specific to the data you need at the moment, and ignore the rest
## Technical features
nom parsers are:
- [x] **byte-oriented**: The basic type is `&[u8]` and parsers will work as much as possible on byte array slices (but are not limited to them)
- [x] **bit-oriented**: nom can address a byte slice as a bit stream
- [x] **string-oriented**: The same kind of combinators can apply on UTF-8 strings as well
- [x] **zero-copy**: If a parser returns a subset of its input data, it will return a slice of that input, without copying
- [x] **streaming**: nom can work on partial data and detect when it needs more data to produce a correct result
- [x] **descriptive errors**: The parsers can aggregate a list of error codes with pointers to the offending input slice. Those error lists can be pattern matched to provide useful messages.
- [x] **custom error types**: You can provide a specific type to improve errors returned by parsers
- [x] **safe parsing**: nom leverages Rust's safe memory handling and powerful types, and parsers are routinely fuzzed and tested with real world data. So far, the only flaws found by fuzzing were in code written outside of nom
- [x] **speed**: Benchmarks have shown that nom parsers often outperform many parser combinator libraries like Parsec and attoparsec, some regular expression engines and even handwritten C parsers
Some benchmarks are available on [GitHub](https://github.com/Geal/nom_benchmarks).
## Rust version requirements (MSRV)
The 7.0 series of nom supports **Rustc version 1.48 or greater**. It is known to work properly on Rust 1.41.1, but there is no guarantee that this will remain the case throughout this major release.
The current policy is that this will only be updated in the next major nom release.
## Installation
nom is available on [crates.io](https://crates.io/crates/nom) and can be included in your Cargo-enabled project like this:
```toml
[dependencies]
nom = "7"
```
There are a few compilation features:
* `alloc`: (activated by default) if disabled, nom can work in `no_std` builds without memory allocators. If enabled, combinators that allocate (like `many0`) will be available
* `std`: (activated by default, activates `alloc` too) if disabled, nom can work in `no_std` builds
You can configure those features like this:
```toml
[dependencies.nom]
version = "7"
default-features = false
features = ["alloc"]
```
# Related projects
- [Get line and column info in nom's input type](https://github.com/fflorent/nom_locate)
- [Using nom as lexer and parser](https://github.com/Rydgel/monkey-rust)
# Parsers written with nom
Here is a (non-exhaustive) list of known projects using nom:
- Text file formats: [Ceph Crush](https://github.com/cholcombe973/crushtool),
[Cronenberg](https://github.com/ayrat555/cronenberg),
[XFS Runtime Stats](https://github.com/ChrisMacNaughton/xfs-rs),
[CSV](https://github.com/GuillaumeGomez/csv-parser),
[FASTA](https://github.com/TianyiShi2001/nom-fasta),
[FASTQ](https://github.com/elij/fastq.rs),
[INI](https://github.com/Geal/nom/blob/main/tests/ini.rs),
[ISO 8601 dates](https://github.com/badboy/iso8601),
[libconfig-like configuration file format](https://github.com/filipegoncalves/rust-config),
[Web archive](https://github.com/sbeckeriv/warc_nom_parser),
[PDB](https://github.com/TianyiShi2001/nom-pdb),
[proto files](https://github.com/tafia/protobuf-parser),
[Fountain screenplay markup](https://github.com/adamchalmers/fountain-rs),
[vimwiki](https://github.com/chipsenkbeil/vimwiki-server/tree/master/vimwiki) & [vimwiki_macros](https://github.com/chipsenkbeil/vimwiki-server/tree/master/vimwiki_macros)
- Programming languages:
[PHP](https://github.com/tagua-vm/parser),
[Basic Calculator](https://github.com/balajisivaraman/basic_calculator_rs),
[GLSL](https://github.com/phaazon/glsl),
[Lua](https://github.com/doomrobo/nom-lua53),
[Python](https://github.com/ProgVal/rust-python-parser),
[SQL](https://github.com/ms705/nom-sql),
[Elm](https://github.com/cout970/Elm-interpreter),
[SystemVerilog](https://github.com/dalance/sv-parser),
[Turtle](https://github.com/vandenoever/rome/tree/master/src/io/turtle),
[CSML](https://github.com/CSML-by-Clevy/csml-interpreter),
[Wasm](https://github.com/Strytyp/wasm-nom),
[Pseudocode](https://github.com/Gungy2/pseudocode),
[Filter for MeiliSearch](https://github.com/meilisearch/meilisearch)
- Interface definition formats: [Thrift](https://github.com/thehydroimpulse/thrust)
- Audio, video and image formats:
[GIF](https://github.com/Geal/gif.rs),
[MagicaVoxel .vox](https://github.com/davidedmonds/dot_vox),
[midi](https://github.com/derekdreery/nom-midi-rs),
[SWF](https://github.com/open-flash/swf-parser),
[WAVE](http://github.com/noise-Labs/wave),
[Matroska (MKV)](https://github.com/rust-av/matroska)
- Document formats:
[TAR](https://github.com/Keruspe/tar-parser.rs),
[GZ](https://github.com/nharward/nom-gzip),
[GDSII](https://github.com/erihsu/gds2-io)
- Cryptographic formats:
[X.509](https://github.com/rusticata/x509-parser)
- Network protocol formats:
[Bencode](https://github.com/jbaum98/bencode.rs),
[D-Bus](https://github.com/toshokan/misato),
[DHCP](https://github.com/rusticata/dhcp-parser),
[HTTP](https://github.com/sozu-proxy/sozu/tree/main/lib/src/protocol/http),
[URI](https://github.com/santifa/rrp/blob/master/src/uri.rs),
[IMAP](https://github.com/djc/tokio-imap),
[IRC](https://github.com/Detegr/RBot-parser),
[Pcap-NG](https://github.com/richo/pcapng-rs),
[Pcap](https://github.com/ithinuel/pcap-rs),
[Pcap + PcapNG](https://github.com/rusticata/pcap-parser),
[IKEv2](https://github.com/rusticata/ipsec-parser),
[NTP](https://github.com/rusticata/ntp-parser),
[SNMP](https://github.com/rusticata/snmp-parser),
[Kerberos v5](https://github.com/rusticata/kerberos-parser),
[DER](https://github.com/rusticata/der-parser),
[TLS](https://github.com/rusticata/tls-parser),
[IPFIX / Netflow v10](https://github.com/dominotree/rs-ipfix),
[GTP](https://github.com/fuerstenau/gorrosion-gtp),
[SIP](https://github.com/armatusmiles/sipcore/tree/master/crates/sipmsg),
[Prometheus](https://github.com/timberio/vector/blob/master/lib/prometheus-parser/src/line.rs)
- Language specifications:
[BNF](https://github.com/snewt/bnf)
- Misc formats:
[Gameboy ROM](https://github.com/MarkMcCaskey/gameboy-rom-parser),
[ANT FIT](https://github.com/stadelmanma/fitparse-rs),
[Version Numbers](https://github.com/fosskers/rs-versions),
[Telcordia/Bellcore SR-4731 SOR OTDR files](https://github.com/JamesHarrison/otdrs),
[MySQL binary log](https://github.com/PrivateRookie/boxercrab),
[URI](https://github.com/Skasselbard/nom-uri),
[Furigana](https://github.com/sachaarbonel/furigana.rs),
[Wordle Result](https://github.com/Fyko/wordle-stats/tree/main/parser)
Want to create a new parser using `nom`? A list of not yet implemented formats is available [here](https://github.com/Geal/nom/issues/14).
Want to add your parser here? Create a pull request for it!
# Contributors
nom is the fruit of the work of many contributors over the years, many thanks for your help!
<a href="https://github.com/geal/nom/graphs/contributors">
<img src="https://contributors-img.web.app/image?repo=geal/nom" />
</a>