File: node_pattern.adoc

package info (click to toggle)
ruby-rubocop-ast 0.3.0%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 892 kB
  • sloc: ruby: 10,886; makefile: 4
file content (533 lines) | stat: -rw-r--r-- 13,970 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
= Node Pattern

Node pattern is a DSL to help find specific nodes in the Abstract Syntax Tree
using a simple string.

It reminds the simplicity of regular expressions but used to find specific
nodes of Ruby code.

== History

The Node Pattern was introduced by https://github.com/alexdowad[Alex Dowad]
and solves a problem that RuboCop contributors were facing for a long time:

* Ability to declaratively define rules for node search, matching, and capture.

The code below belongs to https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Style/ArrayJoin[Style/ArrayJoin]
cop and it's in favor of `Array#join` over `Array#*`. Then it tries to find
code like `%w(one two three) * ", "` and suggest to use `#join` instead.

It can also be an array of integers, and the code doesn't check it. However,
it checks if the argument sent is a string.

[source,ruby]
----
def on_send(node)
  receiver_node, method_name, *arg_nodes = *node
  return unless receiver_node && receiver_node.array_type? &&
    method_name == :* && arg_nodes.first.str_type?

  add_offense(node, location: :selector)
end
----

This code was replaced in the cop defining a new matcher that does the same as the code above:

[source,ruby]
----
def_node_matcher :join_candidate?, '(send $array :* $str)'
----

And the `on_send` method is simplified to a method usage:

[source,ruby]
----
def on_send(node)
  join_candidate?(node) { add_offense(node, location: :selector) }
end
----

== `(` and `)` Navigate deeply with Parens

Parens delimits navigation inside node and its children.

A simple integer like `1` is represented by `(int 1)` in the AST.

[source,sh]
----
$ ruby-parse -e '1'
(int 1)
----

* `int` will match exactly the node, looking only the node type.
* `(int 1)` will match precisely the node

== `_` for any single node

`_` will check if there's something present in the specific position, no matter the
value:

* `(int _)` will match any number
* `(int _ _)` will not match because `int` types have just one child that
contains the value.

== `+...+` for several subsequent nodes

Where `_` matches any single node, `+...+` matches any number of nodes.

Say for example you want to find instances of calls to the method `sum` with any
number of arguments, be it `sum(1, 2)` or `sum(1, 2, 3, n)`.
First, let's check how it looks like in the AST:

[source,sh]
----
$ ruby-parse -e 'sum(1, 2)'
(send nil :sum
  (int 1)
  (int 2))
----

Or with more children:

[source,sh]
----
$ ruby-parse -e 'sum(1, 2, 3, n)'
(send nil :sum
  (int 1)
  (int 2)
  (int 3)
  (send nil :n))
----

The following expression would only match a call with 2 arguments:

----
(send nil? :sum _ _)
----

Instead, the following expression will any number of arguments (and thus both examples above):

----
(send nil? :sum ...)
----

Note that `+...+` can be appear anywhere in a sequence, for example `+(send nil? :sum ... int)+`
would no longer match the second example, as the last argument is not an integer.

Nesting `+...+` is also supported; the only limitation is that `+...+` and
other "variable length" patterns can only appear once within a sequence.
For example `+(send ... :sum ...)+` is not supported.

== `*`, `+`, `?` for repetitions

Another way to handle a variable number of nodes is by using `*`, `+`, `?` to signify
a particular pattern should match any number of times, at least once and at most once respectively.

Following on the previous example, to find sums of integer literals, we could use:

----
(send nil? :sum int*)
----

This would match our first example `sum(1, 2)` but not the other `sum(1, 2, 3, n)`

This pattern would also match a call to `sum` without any argument, which might not be desirable.

Using `+` would insure that only sums with at least one argument would be matched.

----
(send nil? :sum int+)
----

The `?` can limit the match only 0 or 1 nodes.
The following example would match any sum of three integer literals
optionally followed by a method call:

----
(send nil? :sum int int int send ?)
----

Note that we have to put a space between `send` and `?`,
since `send?` would be considered as a predicate (described below).

== `<>` for match in any order

You may not care about the exact order of the nodes you want to match.
In this case you can put the nodes without brackets:

----
(send nil? :sum <(int 2) int>)
----

This will match our first example (`sum(1, 2)`).

It won't match our second example though, as it specifies that there must be
exactly two arguments to the method call `sum`.

You can add `+...+` before the closing bracket to allow for additional parameters:

----
(send nil? :sum <(int 2) int ...>)
----

This will match both our examples, but not `sum(1.0, 2)` or `sum(2)`,
since the first node in the brackets is found, but not the second (`int`).

== `{}` for "OR"

Lets make it a bit more complex and introduce floats:

[source,sh]
----
$ ruby-parse -e '1'
(int 1)
$ ruby-parse -e '1.0'
(float 1.0)
----

* `({int float} _)` - int or float types, no matter the value

== `[]` for "AND"

Imagine you want to check if the number is `odd?` and also positive numbers:

`(int [odd? positive?])` - is an int and the value should be odd and positive.

== `$` for captures

You can capture elements or nodes along with your search, prefixing the expression
with `$`. For example, in a tuple like `(int 1)`, you can capture the value using `(int $_)`.

You can also capture multiple things like:

----
(${int float} $_)
----

The tuple can be entirely captured using the `$` before the open parens:

----
$({int float} _)
----

Or remove the parens and match directly from node head:

----
${int float}
----

All variable length patterns (`+...+`, `*`, `+`, `?`, `<>`) are captured as arrays.

The following pattern will have two captures, both arrays:

----
(send nil? $int+ (send $...))
----

== `^` for parent

One may use the `^` character to check against a parent.

For example, the following pattern would find any node with two children and
with a parent that is a hash:

----
(^hash _key $_value)
----

It is possible to use `^` somewhere else than the head of a sequnece; in that
case it is relative to that child (i.e. the current node). One case also use
multiple `^` to go up multiple levels.
For example, the previous example is basically the same as:

----
(pair ^^hash $_value)
----

== ``` for descendants

The ``` character can be used to search a node and all its descendants.
For example if looking for a `return` statement anywhere within a method definition,
we can write:

----
(def _method_name _args `return)
----

This would match both of these methods `foo` and `bar`, even though
these `return` for `foo` and `bar` are not at the same level.

----
def foo              # (def :foo
  return 42          #   (args)
end                  #   (return
                     #     (int 42)))

def bar              # (def :bar
  return 42 if foo   #   (args)
  nil                #   (begin
end                  #     (if
                     #       (send nil :foo)
                     #       (return
                     #         (int 42)) nil)
                     #     (nil)))
----

== Predicate methods

Words which end with a `?` are predicate methods, are called on the target
to see if it matches any Ruby method which the matched object supports can be
used.

Example:

* `int_type?` can be used herein replacement of `(int _)`.

And refactoring the expression to allow both int or float types:

* `{int_type? float_type?}` can be used herein replacement of `({int float} _)`

You can also use it at the node level, asking for each child:

* `(int odd?)` will match only with odd numbers, asking it to the current
number.

== `#` to call functions

Sometimes, we want to add extra logic. Let's imagine we're searching for
prime numbers, so we have a method to detect it:

[source,ruby]
----
def prime?(n)
  if n <= 1
    false
  elsif n == 2
    true
  else
    (2..n/2).none? { |i| n % i == 0 }
  end
end
----

We can use the `#prime?` function directly in the expression:

----
(int #prime?)
----

== Arguments for predicate and function calls

Arguments can be passed to predicates and function calls, like literals, parameters:

[source,ruby]
----
def divisible_by?(value, divisor)
  value % divisor == 0
end
----

Example patterns using this function:
----
(int #divisible_by?(42))
(send (int _value) :+ (int #divisible_by?(_value))
----

The arguments can be pattern themselves, in which case a matcher responding to `===` will be passed. This makes patterns composable:

```ruby
def_node_pattern :global_const?, '(const {nil? cbase} %1)'
def_node_pattern :class_creator, '(send #global_const?({:Class :Module}) :new ...)'
```

== Using node matcher macros

The RuboCop base includes two useful methods to use the node pattern with Ruby in a
simple way. You can use the macros to define methods. The basics are
https://www.rubydoc.info/gems/rubocop/RuboCop/NodePattern/Macros#def_node_matcher-instance_method[def_node_matcher]
and https://www.rubydoc.info/gems/rubocop/RuboCop/NodePattern/Macros#def_node_search-instance_method[def_node_search].

When you define a pattern, it creates a method that accepts a node and tries to match.

Lets create an example where we're trying to find the symbols `user` and
`current_user` in expressions like: `user: current_user` or
`current_user: User.first`, so the objective here is pick all keys:

[source,sh]
----
$ ruby-parse -e ':current_user'
(sym :current_user)
$ ruby-parse -e ':user'
(sym :user)
$ ruby-parse -e '{ user: current_user }'
(hash
  (pair
    (sym :user)
    (send nil :current_user)))
----

Our minimal matcher can get it in the simple node `sym`:

[source,ruby]
----
def_node_matcher :user_symbol?, '(sym {:current_user :user})'
----

=== Composing complex expressions with multiple matchers

Now let's go deeply combining the previous expression and also match if the
current symbol is being called from an initialization method, like:

[source,sh]
----
$ ruby-parse -e 'Comment.new(user: current_user)'
(send
  (const nil :Comment) :new
  (hash
    (pair
      (sym :user)
      (send nil :current_user))))
----

And we can also reuse this and check if it's a constructor:

[source,ruby]
----
def_node_matcher :initializing_with_user?, <<~PATTERN
  (send _ :new (hash (pair #user_symbol?)))
PATTERN
----

== `%` for arguments

Arguments can be passed to matchers, either as external method arguments,
or to be used to compare elements. An example of method argument:

[source,ruby]
----
def multiple_of?(n, factor)
  n % factor == 0
end

def_node_matcher :int_node_multiple?, '(int #multiple_of?(%1))'

# ...

int_node_multiple?(node, 10) # => true if node is an 'int' node with a multiple of 10
----

Arguments can be used to match nodes directly:

[source,ruby]
----
def_node_matcher :has_sensitive_data?, '(hash <(pair (_ %1) $_) ...>)'

# ...

has_user_data?(node, :password) # => true if node is a hash with a key +:password+

# matching uses ===, so to match strings or symbols, 'pass' or 'password' one can:
has_user_data?(node, /^pass(word)?$/i)

# one can also pass lambdas...
has_user_data?(node, ->(key) { # return true or false depending on key })
----

NOTE: `Array#===` will never match a single node element (so don't pass arrays),
but `Set#===` is an alias to `Set#include?` (Ruby 2.5+ only), and so can be
very useful to match within many possible literals / Nodes.

== `%param_name` for named parameters

Arguments can be passed as named parameters. They will be matched using `===`
(see `%` above).

Contrary to positional arguments, defaults values can be passed to
`def_node_matcher` and `def_node_search`:

[source,ruby]
----
def_node_matcher :interesting_call?, '(send _ %method ...)',
                 method: Set[:transform_values, :transform_keys,
                             :transform_values!, :transform_keys!,
                             :to_h].freeze

# Usage:

interesting_call?(node) # use the default methods
interesting_call?(node, method: /^transform/) # match anything starting with 'transform'
----

Named parameters as arguments to custom methods are also supported.

== `%CONST` for constants

Constants can be included in patterns. They will be matched using `===`, so
+Regexp+ / +Set+ / +Proc+ can be used in addition to literals and +Nodes+:

[source,ruby]
----
SOME_CALLS = Set[:transform_values, :transform_keys,
                 :transform_values!, :transform_keys!,
                 :to_h].freeze

def_node_matcher :interesting_call?, '(send _ %SOME_CALLS ...)'

----

Constants as arguments to custom methods are also supported.

== Comments

You may have comments in node patterns at the end of lines
by preceeding them with `'# '`:

[source,ruby]
----
def_node_matcher :complex_stuff, <<~PATTERN
  (send
    {#global_const?(:Kernel) nil?}  # check for explicit call like Kernel.p too
    {:p :pp}                        # let's consider `pp` also
    $...                            # capture all arguments
  )
PATTERN
----

== `nil` or `nil?`

Take a special attention to nil behavior:

[source,sh]
----
$ ruby-parse -e 'nil'
(nil)
----

In this case, the `nil` implicit matches with expressions like: `nil`, `(nil)`, or `nil_type?`.

But, nil is also used to represent a call from `nothing` from a simple method call:

[source,sh]
----
$ ruby-parse -e 'method'
(send nil :method)
----

Then, for such case you can use the predicate `nil?`. And the code can be
matched with an expression like:

----
(send nil? :method)
----

== More resources

Curious about how it works?

Check more details in the
https://www.rubydoc.info/gems/rubocop-ast/RuboCop/AST/NodePattern[documentation]
or browse the https://github.com/rubocop-hq/rubocop-ast/blob/master/lib/rubocop/ast/node_pattern.rb[source code]
directly. It's easy to read and hack on.

The https://github.com/rubocop-hq/rubocop-ast/blob/master/spec/rubocop/ast/node_pattern_spec.rb[specs]
are also very useful to comprehend each feature.