# Parser

[![Gem Version](https://badge.fury.io/rb/parser.svg)](https://badge.fury.io/rb/parser)
[![Tests](https://github.com/whitequark/parser/workflows/Tests/badge.svg?branch=master)](https://github.com/whitequark/parser/actions?query=workflow%3ATests+branch%3Amaster)

_Parser_ is a production-ready Ruby parser written in pure Ruby. It recognizes as
much or more code than Ripper, Melbourne, JRubyParser or ruby\_parser, and
is vastly more convenient to use.

You can also use [unparser](https://github.com/mbj/unparser) to produce
equivalent source code from Parser's ASTs.

Sponsored by [Evil Martians](http://evilmartians.com).
MacRuby and RubyMotion support sponsored by [CodeClimate](http://codeclimate.com).

> [!WARNING]
> The `parser` gem is only compatible with the syntax of Ruby 3.3 and lower. For Ruby 3.4 and later, please use [`Prism::Translation::Parser`](https://github.com/ruby/prism/blob/main/docs/parser_translation.md) instead.
> Starting in Ruby 3.4, Prism is the parser used in Ruby itself and can produce an AST that is identical to the output of the `parser` gem. If you only need to parse Ruby 3.3 (or greater) and don't require compatibility with the `parser` gem AST, also consider using the native Prism AST.
> See this [GitHub issue](https://github.com/whitequark/parser/issues/1046) for more details.
> For a guide on how to use `parser` for older versions and `prism` for newer ones, please see [this guide](./doc/PRISM_TRANSLATION.md).

## Installation

    $ gem install parser

## Usage

Load Parser (see the [backwards compatibility](#backwards-compatibility) section
below for an explanation of the `emit_*` calls):

```ruby
require 'parser/current'
# opt-in to most recent AST format:
Parser::Builders::Default.emit_lambda              = true
Parser::Builders::Default.emit_procarg0            = true
Parser::Builders::Default.emit_encoding            = true
Parser::Builders::Default.emit_index               = true
Parser::Builders::Default.emit_arg_inside_procarg0 = true
Parser::Builders::Default.emit_forward_arg         = true
Parser::Builders::Default.emit_kwargs              = true
Parser::Builders::Default.emit_match_pattern       = true
```

Parse a chunk of code:

```ruby
p Parser::CurrentRuby.parse("2 + 2")
# (send
#   (int 2) :+
#   (int 2))
```

Access the AST's source map:

```ruby
p Parser::CurrentRuby.parse("2 + 2").loc
# #<Parser::Source::Map::Send:0x007fe5a1ac2388
#   @dot=nil,
#   @begin=nil,
#   @end=nil,
#   @selector=#<Source::Range (string) 2...3>,
#   @expression=#<Source::Range (string) 0...5>>

p Parser::CurrentRuby.parse("2 + 2").loc.selector.source
# "+"
```

Traverse the AST: see the documentation for [gem ast](https://whitequark.github.io/ast/).

Parse a chunk of code and display all diagnostics:

```ruby
parser = Parser::CurrentRuby.new
parser.diagnostics.consumer = lambda do |diag|
  puts diag.render
end

buffer = Parser::Source::Buffer.new('(string)', source: "foo *bar")

p parser.parse(buffer)
# (string):1:5: warning: `*' interpreted as argument prefix
# foo *bar
#     ^
# (send nil :foo
#   (splat
#     (send nil :bar)))
```

If you reuse the same parser object for multiple `#parse` runs, you need to
`#reset` it between runs.

You can also use the `ruby-parse` utility (it's bundled with the gem) to play
with Parser:

    $ ruby-parse -L -e "2+2"
    (send
      (int 2) :+
      (int 2))
    2+2
     ~ selector
    ~~~ expression
    (int 2)
    2+2
    ~ expression
    (int 2)
    2+2

    $ ruby-parse -E -e "2+2"
    2+2
    ^ tINTEGER 2                                    expr_end     [0 <= cond] [0 <= cmdarg]
    2+2
     ^ tPLUS "+"                                    expr_beg     [0 <= cond] [0 <= cmdarg]
    2+2
      ^ tINTEGER 2                                  expr_end     [0 <= cond] [0 <= cmdarg]
    2+2
      ^ false "$eof"                                expr_end     [0 <= cond] [0 <= cmdarg]
    (send
      (int 2) :+
      (int 2))

## Features

* Precise source location reporting.
* [Documented](doc/AST_FORMAT.md) AST format which is convenient to work with.
* A simple interface and a powerful, tweakable one.
* Parses 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0, 3.1, 3.2, and 3.3 syntax with backwards-compatible
  AST formats.
* Parses MacRuby and RubyMotion syntax extensions.
* [Rewriting][rewriting] support.
* Parsing error recovery.
* Improved [clang-like][] diagnostic messages with location information.
* Written in pure Ruby, runs on MRI >=2.0.0, JRuby and Rubinius (and historically, all versions of Ruby since 1.8).
* Only one runtime dependency: the [ast][] gem.
* [Insane][insane-lexer] Ruby lexer rewritten from scratch in Ragel.
* 100% test coverage for Bison grammars (except error recovery).
* Readable, commented source code.

[clang-like]: http://clang.llvm.org/diagnostics.html
[ast]: https://rubygems.org/gems/ast
[insane-lexer]: http://web.archive.org/web/20210621201915/http://whitequark.org/blog/2013/04/01/ruby-hacking-guide-ch-11-finite-state-lexer/
[rewriting]: http://web.archive.org/web/20220123050223/http://whitequark.org/blog/2013/04/26/lets-play-with-ruby-code/

## Documentation

Documentation for Parser is available [online](https://whitequark.github.io/parser/).

### Node names

Several Parser nodes seem to be confusing enough to warrant a dedicated README section.

#### (block)

The `(block)` node passes a Ruby block, that is, a closure, to a method call represented by its first child, a `(send)`, `(super)` or `(zsuper)` node. To demonstrate:

```bash
$ ruby-parse -e 'foo { |x| x + 2 }'
(block
  (send nil :foo)
  (args
    (arg :x))
  (send
    (lvar :x) :+
    (int 2)))
```

#### (begin) and (kwbegin)

**TL;DR: Unless you perform rewriting, treat `(begin)` and `(kwbegin)` as the same node type.**

Both `(begin)` and `(kwbegin)` nodes represent compound statements, that is, several expressions which are executed sequentially, with the value of the last one being the value of the entire compound statement. They may take several forms in the source code:

  * `foo; bar`: without delimiters
  * `(foo; bar)`: parenthesized
  * `begin foo; bar; end`: grouped with `begin` keyword
  * `def x; foo; bar; end`: grouped inside a method definition

and so on.

```bash
$ ruby-parse -e '(foo; bar)'
(begin
  (send nil :foo)
  (send nil :bar))
$ ruby-parse -e 'def x; foo; bar end'
(def :x
  (args)
  (begin
    (send nil :foo)
    (send nil :bar)))
```

Note that, despite its name, the `kwbegin` node has only a tangential relation to the `begin` keyword. Normally, Parser AST is semantic, that is, if two constructs look different but behave identically, they get parsed to the same node. However, there exists a peculiar construct called post-loop in Ruby:

```ruby
begin
  body
end while condition
```

This specific syntactic construct, that is, a keyword `begin..end` block followed by a postfix `while`, [behaves][postloop] very unlike other similar constructs, e.g. `(body) while condition`. While the body itself is wrapped into a `while-post` node, Parser also supports rewriting, and in that context it is important to not accidentally convert one kind of loop into another.

  [postloop]: http://rosettacode.org/wiki/Loops/Do-while#Ruby

```bash
$ ruby-parse -e 'begin foo end while cond'
(while-post
  (send nil :cond)
  (kwbegin
    (send nil :foo)))
$ ruby-parse -e 'foo while cond'
(while
  (send nil :cond)
  (send nil :foo))
$ ruby-parse -e '(foo) while cond'
(while
  (send nil :cond)
  (begin
    (send nil :foo)))
```

(Parser also needs the `(kwbegin)` node type internally, and it is highly problematic to map it back to `(begin)`.)

## Backwards compatibility

Parser does _not_ use semantic versioning. Parser versions are structured as `x.y.z.t`,
where `x.y.z` indicates the most recent supported Ruby release (support for every
Ruby release that is chronologically earlier is implied), and `t` is a monotonically
increasing number.
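
As an illustration of this scheme, the sketch below splits a version string into its two parts. Both the `split_parser_version` helper and the version string are made up for the example.

```ruby
# Hypothetical helper: split an `x.y.z.t` Parser version into the supported
# Ruby release (`x.y.z`) and the monotonically increasing number (`t`).
def split_parser_version(version)
  *ruby_release, revision = version.split('.')
  { ruby_release: ruby_release.join('.'), revision: Integer(revision) }
end

info = split_parser_version('3.3.10.1') # illustrative version string
p info[:ruby_release] # "3.3.10"
p info[:revision]     # 1
```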

The public API of Parser as well as the AST format (as listed in the documentation)
are considered stable forever, although support for old Ruby versions may be removed
at some point.

Sometimes it is necessary to modify the format of AST nodes that are already being emitted
in a way that would break existing applications. To avoid such breakage, applications
must opt-in to these modifications; without explicit opt-in, Parser will continue to emit
the old AST node format. The most recent set of opt-ins is specified in
the [usage section](#usage) of this README.

## Compatibility with Ruby MRI

Unfortunately, Ruby MRI often changes syntax in patchlevel versions. This has happened, at least, for every release since 1.9; for example, commits [c5013452](https://github.com/ruby/ruby/commit/c501345218dc5fb0fae90d56a0c6fd19d38df5bb) and [04bb9d6b](https://github.com/ruby/ruby/commit/04bb9d6b75a55d4000700769eead5a5cb942c25b) were backported all the way from HEAD to 1.9. Moreover, there is no simple way to track these changes.

This policy makes it all but impossible to make Parser precisely compatible with the Ruby MRI parser. Indeed, as of September 2014, it would be necessary to maintain and update ten different parsers together with their lexer quirks in order to be able to emulate any given released Ruby MRI version.

As a result, Parser chooses a different path: the `parser/rubyXY` parsers recognize the syntax of the latest minor version of Ruby MRI X.Y at the time of the gem release.

## Compatibility with MacRuby and RubyMotion

Parser implements the MacRuby 0.12 and RubyMotion mid-2015 parsers precisely. However, the lexers of these have been forked off Ruby MRI and independently maintained for some time, and because of that, Parser may accept some code that these upstream implementations are unable to parse.

## Known issues

Adding support for the following Ruby MRI features in Parser would needlessly complicate it, and as they all are very specific and rarely occurring corner cases, this is not done.

Parser has been extensively tested; in particular, it parses almost the entire [Rubygems][rg] corpus. For every issue, a breakdown of affected gems is offered.

 [rg]: https://rubygems.org

### Void value expressions

Ruby MRI prohibits so-called "void value expressions". For a description
of what a void value expression is, see [this
gist](https://gist.github.com/JoshCheek/5625007) and [this Parser
issue](https://github.com/whitequark/parser/issues/72).

It is unknown whether any gems are affected by this issue.

### Syntax check of block exits

Similar to the "void value expression" check, Ruby MRI also checks for correct
usage of `break`, `next` and `redo`. Starting from 3.3.0, Ruby reports a syntax error
if one of them is used outside of a {break,next,redo}-able context. The `parser` gem
simply doesn't run this type of check.

It is unknown whether any gems are affected by this issue.

### Invalid characters inside comments and literals

Ruby MRI permits arbitrary non-7-bit byte sequences to appear in comments, as well as in string or symbol literals in the form of escape sequences, regardless of source encoding. Parser requires all source code, including the expanded escape sequences, to consist of valid byte sequences in the source encoding that are convertible to UTF-8.

As of 2013-07-25, there are about 180 affected gems.

### \u escape in 1.8 mode

Ruby MRI 1.8 permits a bare `\u` escape sequence in a string and treats it like `u`. Ruby MRI 1.9 and later treat `\u` as a prefix for a Unicode escape sequence and do not allow it to appear bare. Parser follows 1.9+ behavior.

As of 2013-07-25, affected gems are: activerdf, activerdf_net7, fastreader, gkellog-reddy.

### Dollar-dash

(This one is so obscure I couldn't even think of a saner name for this issue.) Pre-2.1 Ruby allows
a global variable named `$-`. Ruby 2.1 and later treat it as a syntax error. Parser
follows 2.1 behavior.

No known code is affected by this issue.

### EOF characters after embedded documents before 2.7

Code like `"=begin\n""=end\0"` is invalid for all versions of Ruby before 2.7. Ruby 2.7 and later parse it
normally. Parser follows 2.7 behavior.

It is unknown whether any gems are affected by this issue.

## Contributors

* Catherine [whitequark][]
* Markus Schirp ([mbj][])
* Yorick Peterse ([yorickpeterse][])
* Magnus Holm ([judofyr][])
* Bozhidar Batsov ([bbatsov][])

[whitequark]:     https://github.com/whitequark
[mbj]:            https://github.com/mbj
[yorickpeterse]:  https://github.com/yorickpeterse
[judofyr]:        https://github.com/judofyr
[bbatsov]:        https://github.com/bbatsov

## Acknowledgements

The lexer testsuite is derived from
[ruby\_parser](https://github.com/seattlerb/ruby_parser).

The Bison parser rules are derived from [Ruby MRI](https://github.com/ruby/ruby)
parse.y.

## Contributing

1. Make sure you have [Ragel ~> 6.7](http://www.colm.net/open-source/ragel/) installed
2. Fork it
3. Create your feature branch (`git checkout -b my-new-feature`)
4. Commit your changes (`git commit -am 'Add some feature'`)
5. Push to the branch (`git push origin my-new-feature`)
6. Create a new Pull Request