File: README.rdoc

package info (click to toggle)
ruby-kpeg 1.3.3-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 608 kB
  • sloc: ruby: 11,839; makefile: 10
file content (254 lines) | stat: -rw-r--r-- 7,169 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
= kpeg

home :: https://github.com/evanphx/kpeg
bugs :: https://github.com/evanphx/kpeg/issues

== Description

KPeg is a simple PEG library for Ruby. It provides an API as well as native
grammar to build the grammar.

KPeg strives to provide a simple, powerful API without being too exotic.

KPeg supports direct left recursion of rules via the
{OMeta memoization}[http://www.vpri.org/pdf/tr2008003_experimenting.pdf] trick.

== Writing your first grammar

=== Setting up your grammar

All grammars start with with the class/module name that will be your parser

  %% name = Example::Parser

After that a block of ruby code can be defined that will be added into the
class body of your parser. Attributes that are defined in this block can be
accessed within your parser as instance variables. Methods can also be defined
in this block and used in action blocks as well.

  %% {
    attr_accessor :something_cool
    
    def something_awesome
      # do something awesome
    end
  }

=== Defining literals

Literals are static declarations of characters or regular expressions designed for reuse in the grammar. These can be constants or variables. Literals can take strings, regular expressions or character ranges

  ALPHA = /[A-Za-z]/
  DIGIT = /[0-9]/
  period = "."
  string = "a string"
  regex = /(regexs?)+/
  char_range = [b-t]

Literals can also accept multiple definitions

  vowel = "a" | "e" | "i" | "o" | "u"
  alpha = /[A-Z]/ | /[a-z]/

=== Defining Rules for Values

Before you can start parsing a string you will need to define rules that you
will use to accept or reject that string. There are many different types of
rules available in kpeg 

The most basic of these rules is a string capture

  alpha = < /[A-Za-z]/ > { text }

While this looks very much like the ALPHA literal defined above it differs in
one important way, the text captured by the rule defined between the < and >
symbols will be set as the text variable in block that follows. You can also
explicitly define the variable that you would like but only with existing
rules or literals.

  letter = alpha:a { a }

Additionally blocks can return true or false values based upon an expression
within the block. To return true if a test passes do the following:

  match_greater_than_10 = < num:n > &{ n > 10 }

To test and return a false value if the test passes do the following:

  do_not_match_greater_than_10 = < num:n > !{ n > 10 }

Rules can also act like functions and take parameters. An example of this is
lifted from the {Email List
Validator}[https://github.com/larb/email_address_validator], where an ascii
value is passed in and the character is evaluated against it returning a true
if it matches

  d(num) = <.> &{ text[0] == num }

Rules support some regular expression syntax for matching

* maybe ?
* many +
* kleene *
* groupings ()

Examples:

  letters = alpha+
  words = alpha+ space* period?
  sentence = (letters+ | space+)+

Kpeg also allows a rule to define the acceptable number of matches in the form
of a range. In regular expressions this is often denoted with syntax like
{0,3}. Kpeg uses this syntax to accomplish match ranges [min, max].

  matches_3_to_5_times = letter[3,5]
  matches_3_to_any_times = letter[3,*]

=== Defining Actions

Illustrated above in some of the examples, kpeg allows you to perform actions
based upon a match that are described in block provided or in the rule
definition itself.

  num = /[1-9][0-9]*/
  sum = < num:n1 "+" num:n2 > { n1 + n2 }

As of version 0.8 an alternate syntax has been added for calling defined
methods as actions.

  %% {
    def add(n1, n2){
      n1 + n2
    }
  }
  num = /[1-9][0-9]*/
  sum = < num:n1 "+" num:n2 > ~add(n1, n2)

=== Referencing an external grammar

Kpeg allows you to run a rule that is defined in an external grammar. This is
useful if there is a defined set of rules that you would like to reuse in
another parser. To do this, create your grammar and generate a parser using
the kpeg command line tool.

  kpeg literals.kpeg

Once you have the generated parser, include that file into your new grammar

  %{
    require "literals.kpeg.rb"
  }

Then create a variable to hold to foreign interface and pass it the class name
of your parser. In this case my parser class name is Literal 

  %foreign_grammar = Literal

You can then use rules defined in the foreign grammar in the local grammar
file like so

  sentence = (%foreign_grammar.alpha %foreign_grammar.space*)+
             %foreign_grammar.period

=== Comments

Kpeg allows comments to be added to the grammar file by using the # symbol

  # This is a comment in my grammar

=== Variables

A variable looks like this:

  %% name = value

Kpeg allows the following variables that control the output parser:

name::
  The class name of the generated parser.
custom_initialize::
  When built as a standalone parser a default initialize method will not be
  included.

=== Directives

A directive looks like this:

  %% header {
    ...
  }

Kpeg allows the following directives:

header::
  Placed before any generated code
pre-class::
  Placed before the class definition to provide a class comment
footer::
  Placed after the end of the class (for requiring files dependent upon the
  parser's namespace

== Generating and running your parser

Before you can generate your parser you will need to define a root rule. This
will be the first rule run against the string provided to the parser

  root = sentence

To generate the parser run the kpeg command with the kpeg file(s) as an
argument. This will generate a ruby file with the same name as your grammar
file.

  kpeg example.kpeg

Include your generated parser file into an application that you want to use
the parser in and run it. Create a new instance of the parser and pass in the
string you want to evaluate. When parse is called on the parser instance it
will return a true if the sting is matched, or false if it doesn't. 

  require "example.kpeg.rb"

  parser = Example::Parser.new(string_to_evaluate)
  parser.parse

== Shortcuts and other techniques

Per vito, you can get the current line or current column in the following way 

  line = { current_line }
  column = { current_column }
  foo = line:line ... { # use line here }

== AST Generation

As of Kpeg 0.8 a parser can now generate an AST. To define an AST node use the
following syntax

  %% assign = ast Assignment(name, value)

Once you have a defined AST node, it can be used in your grammar like so

  assignment = identifier:i space* = space* value:v ~assign(i,v)

This will create a new Assign node that you can add into your AST.

For a good example of usage check out Talon[https://github.com/evanphx/talon]

== Examples

There are several examples available in the /examples directory. The upper
parser has a readme with a step by step description of the grammar.

== Projects

{Dang}[https://github.com/veganstraightedge/dang]

{Email Address Validator}[https://github.com/larb/email_address_validator]

{Callisto}[https://github.com/dwaite/Callisto]

{Doodle}[https://github.com/vito/doodle]

{Kanbanpad}[https://kanbanpad.com] (uses kpeg for parsing of the 'enter
something' bar)