File: README

package info (click to toggle)
storm-lang 0.7.4-1
links: PTS, VCS
area: main
in suites: forky
size: 52,004 kB
sloc: ansic: 261,462; cpp: 140,405; sh: 14,891; perl: 9,846; python: 2,525; lisp: 2,504; asm: 860; makefile: 678; pascal: 70; java: 52; xml: 37; awk: 12
file content (96 lines) | stat: -rw-r--r-- 4,889 bytes
parent folder | download | duplicates (2)
Parser Library
==============

The parser library (in the `parser` package) implements various parsers that are faster (but weaker)
than the compiler's parser. In general, they do not support all functionality of the compiler's
parser, and are not as flexible. On the other hand, they are pre-compiled (and have no startup time)
and are much faster. They are also able to parse text on any thread, and not locked to the Compiler
thread as the compiler's parser.

The limitations and characteristics of the parsers are outlined below. If a parser is fed grammar
that is incompatible with the parser, an error is generated at compile-time unless otherwise noted.
Another general difference between the compiler's parser and the parsers in this library is that
these parsers typically start evaluating transform functions before parsing is complete. As such,
some transform functions might be executed even if the string is not in the language described by
the grammar.

The following parsers are supported:

- **Recursive Descent** (names: `recursive`, `recursive descent`)

  The recursive descent parser accepts LL(1) grammars, which essentially means:
  - No left-recursion is allowed.
  - Multiple productions for a non-terminal needs to be distinguished by the first token present
    (either directly or indirectly).

  This parser transforms the parse tree directly, and as such it is not possible to capture parts of
  the parse tree in the syntax transformations. Due to how the parser works, it is also not possible
  to pass parameters to productions from tokens appearing later in the text.


- **Backtracking Recursive Descent** (names: `backtracking`, `backtracking recursive`, `backtracking recursive descent`)

  This is an extension of the recursive descent parser. It allows parsing grammars that are not LL(1)
  by utilizing backtracking. This means that the backtracking parser is potentially slower than the
  non-backtracking variant. It also means that side-effects from backtracking attempts might be
  visible to the actions specified in the grammar. For example, if the backtracking parser parses a
  production successfully but later determines that it does not match, the side-effects of the
  production will still be visible (e.g. object creation, member function calls).


Syntax
------

The Basic Storm syntax is described below.

To generate a parser, use the following syntax:

```
<name> : parser(<type>, [binary], [text and binary]) {
    start = <start rule>;
    <syntax>;
}
```

- `<name>` is the name of the generated parser. A function with the name `<name>` will be
  generated. It can be called as either `<name>(<input>)` or `<name>(<input>, <start>)`, where
  `<input>` is a string and `<start>` is an iterator.

- `<type>` is the type of the parser to use. It can be any of the ones mentioned above. For example
  `recursive` or `recursive descent` uses the recursive descent parser, or `backtracking` uses the
  backtracking recursive descent parser.

- `[binary]` generates a parser in binary mode if it is included. This means that the parser accepts
  its input as a `Buffer` instead of a `Str`, and iterators are `Nat` values, not `Str:Iter`. As a
  consequence of this, captured parts of the input evaluate to `Buffer` instead of `Str`. Therefore,
  you typically want to declare all syntax inside the parser block if you use the binary
  mode. Otherwise, you have to ensure that the syntax transforms type-check both for `Str` and
  `Buffer` (it is of course possible to do, but might be tedious).

- `[text and binary]` generates both a parser that accepts strings *and* a parser that accept
  `Buffer`s.

- `<start rule>` indicates the rule that shall be used as the start rule for the parser. The rule
  used here does not need to be defined inside the syntax inside the parser block.

- `<syntax>` is syntax as specified in the syntax language. The syntax will be placed in an
  anonymous sub-package of the current package. Both the anonymous sub-package and the current
  package will be visible by default. `use` statements can be used to import more grammar from other
  packages.


Special Rules
-------------

The package `parser.special` contains a set of special rules. These are special in the sense that
some parsers are able to implement them even if they are not possible to express in context-free
grammars. See the file `syntax.bnf` in the `parser.special` package for more details.

One example of rules that fall into this category are the rules `Bytes(Nat)` and `Chars(Nat)`. These
match a pre-determined number of bytes and codepoints. These are only available in the recursive
descent parsers at the moment.

Another example are the rules `Indent`, `MinIndent`, and `ExactIndent`. They can be used to parse
indent-sensitive languages (such as markdown). They match the indent specified by the supplied
parameters.