File: Recognizer.pod

package info (click to toggle)
libmarpa-r2-perl 12.000000-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 6,660 kB
  • sloc: perl: 42,628; ansic: 23,387; sh: 4,363; makefile: 157
file content (786 lines) | stat: -rw-r--r-- 22,373 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
# Copyright 2022 Jeffrey Kegler
# This file is part of Marpa::R2.  Marpa::R2 is free software: you can
# redistribute it and/or modify it under the terms of the GNU Lesser
# General Public License as published by the Free Software Foundation,
# either version 3 of the License, or (at your option) any later version.
#
# Marpa::R2 is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser
# General Public License along with Marpa::R2.  If not, see
# http://www.gnu.org/licenses/.

=head1 NAME

Marpa::R2::Deprecated::NAIF::Recognizer - NAIF recognizers

=head1 THE NAIF INTERFACE IS DEPRECATED

This document describes the NAIF interface,
which is deprecated.
PLEASE DO NOT USE IT FOR NEW DEVELOPMENT.

=head1 Synopsis

=for Marpa::R2::Display
name: Engine Synopsis Unambiguous Parse
partial: 1
normalize-whitespace: 1

    my $recce = Marpa::R2::Recognizer->new( { grammar => $grammar } );
    $recce->read( 'Number', 42 );
    $recce->read('Multiply');
    $recce->read( 'Number', 1 );
    $recce->read('Add');
    $recce->read( 'Number', 7 );

=for Marpa::R2::Display::End

=head1 Description

This document describes recognizers for Marpa's
named argument interface (NAIF).
If you are a beginner,
or are not sure which interface you are interested in,
or do not know what the NAIF interfaces is,
you probably are looking for
L<the document on recognizers for the SLIF
interface|Marpa::R2::Scanless::R>.

To create a recognizer object, use L<the C<new> method|/new()>.

To read input, use L<the C<read> method|/read()>.

To evaluate a parse tree, based on the input, use L<the C<value> method|/value()>.

=head2 Token streams

By default, Marpa uses the token-stream model of input.
The token-stream model is standard -- so standard the most documents about
parsing do not bother to describe it.
In the token-stream model, each read adds a token at the current location,
then advances the current location by one.
The location before any input is numbered 0
and if I<N> tokens are parsed,
they fill the locations from 1 to I<N>.

This document will describe only the token-stream model of input.
Marpa allows other models of the input, but their use
requires special method calls,
which are described in L<the
document on alternative input models|Marpa::R2::Deprecated::NAIF::Input_Models>.

=head1 Constructor

=head2 new()

=for Marpa::R2::Display
name: Engine Synopsis Unambiguous Parse
partial: 1
normalize-whitespace: 1

    my $recce = Marpa::R2::Recognizer->new( { grammar => $grammar } );

=for Marpa::R2::Display::End

The C<new> method creates a recognizer object.
The C<new> method either returns a new recognizer object or throws an exception.

The arguments to the C<new> method
are references to hashes of named
arguments.
In each key/value pair of these hashes, the key is the argument name,
and the hash value is the value of the argument.
The named arguments are described L<below|/"Named arguments">.

=head1 Accessors

=head2 check_terminal()

=for Marpa::R2::Display
name: Recognizer check_terminal Synopsis
normalize-whitespace: 1

    my $is_symbol_a_terminal = $recce->check_terminal('Document');

=for Marpa::R2::Display::End

Returns a Perl true when its argument is the name of a terminal symbol.
Otherwise, returns a Perl false.
Not often needed.

=head2 events()

=for Marpa::R2::Display
name: Recognizer events() Synopsis
normalize-whitespace: 1

    my @expected_symbols =
        map { $_->[1]; }
        grep { $_->[0] eq 'SYMBOL_EXPECTED' } @{ $recce->events() };

=for Marpa::R2::Display::End

Returns a reference to an array of the events
from the last L</read()> method call.
Each element of the array is a subarray or 1 or 2 elements.
The first element of the subarray is the name of an event type,
as described in L</"Recognizer events">.
The second element is the event value of the event,
where that is applicable.
For more detail, see
L</"Recognizer events">.

=head2 exhausted()

=for Marpa::R2::Display
name: Recognizer exhausted Synopsis
normalize-whitespace: 1

        $recce->exhausted() and die 'Recognizer exhausted';

=for Marpa::R2::Display::End

The C<exhausted> method returns a Perl true if parsing
in a recognizer is exhausted, and a Perl false
otherwise.
Parsing is exhausted when the recognizer will not accept
any further input.
By default, a recognizer event occurs if parsing
is exhausted.
An attempt to read input into an exhausted parser
causes an exception to be thrown.
The recognizer event and the exception are all that
many applications require,
but this method allows the recognizer's exhaustion
status to be discovered directly.

=head2 latest_earley_set()

=for Marpa::R2::Display
name: latest_earley_set() Synopsis
normalize-whitespace: 1

    my $latest_earley_set = $recce->latest_earley_set();

=for Marpa::R2::Display::End

Return the location of the latest (in other words,
the most recent)
Earley set.
In the places where it is most often needed,
the latest Earley set is the default,
and there is usually no need to request
the explicit value
of the latest Earley set.

=head2 progress()

Given the location (Earley set ID) as its argument,
returns an array that describes the parse progress
at that location.
Details on progress reports can be found in
L<their own document|Marpa::R2::Deprecated::NAIF::Progress>.

=head2 terminals_expected()

=for Marpa::R2::Display
name: Recognizer terminals_expected Synopsis
partial: 1
normalize-whitespace: 1

    my $terminals_expected = $recce->terminals_expected();

=for Marpa::R2::Display::End

Returns a reference to a list of strings,
where the strings are the
names of the terminals
acceptable at the current location.
In the default input model, the presence of a terminal
in this list means that terminal will be acceptable
in the next C<read> method call.
This is highly useful for Ruby Slippers parsing.

=head1 Mutators

=head2 expected_symbol_event_set()

=for Marpa::R2::Display
name: Recognizer expected_symbol_event_set() Synopsis
normalize-whitespace: 1

    $recce->expected_symbol_event_set( 'endmark', 1 );

=for Marpa::R2::Display::End

Marpa can generate a recognizer event when 
a symbol is expected at the current earleme.
This method takes a symbol name as its first argument,
and turns the expected-symbol event for that
symbol on or off,
according to whether its second argument is 1 or 0.
Always succeeds or throws an exception.

Events can occur at location 0 -- when the recognizer is first created.
However, the event setting of C<expected_symbol_event_set()>
cannot have an effect until after the first token is read --
after location 0.
In cases where this is an issue,
the L</event_if_expected> named argument of the
L</new()> method can be used to set an expected-symbol event.

=head2 read()

=for Marpa::R2::Display
name: Engine Synopsis Unambiguous Parse
partial: 1
normalize-whitespace: 1

    $recce->read( 'Number', 42 );
    $recce->read('Multiply');
    $recce->read( 'Number', 1 );
    $recce->read('Add');
    $recce->read( 'Number', 7 );

=for Marpa::R2::Display::End

The C<read> method reads one token at the current parse location.
It then advances the current location by 1.

C<read> takes two arguments: a B<token name> and a B<token value>.
The token name is required.
It must be the name of a valid terminal symbol.
The token value is optional.
It defaults to a Perl C<undef>.
For details about terminal symbols,
see L<Marpa::R2::Deprecated::NAIF::Grammar/"Terminal symbols">.

The parser may accept or reject the token.
If the parser accepted the token,
the C<read> method returns
the number of recognizer events that occurred during the
C<read>.
For more about events, see
L</"Recognizer events">.

Marpa may reject a token because it is not one of those
acceptable at the current location.
When this happens, C<read> returns a Perl C<undef>.
A rejected token need not end parsing --
it is perfectly possible to retry the C<read> call
with another token.
This is, in fact, an important technique in Ruby
Slippers parsing.
For details,
see L<the section on Ruby Slippers
parsing|/"Ruby Slippers parsing">.

For other failures,
including an attempt to C<read> a token
into an exhausted parser,
Marpa throws an exception.

=head2 set()

=for Marpa::R2::Display
name: Recognizer set Synopsis
normalize-whitespace: 1

    $recce->set( { max_parses => 10, } );

=for Marpa::R2::Display::End

The C<set> method's arguments are references to hashes of named
arguments.
The C<set> method
can be used to set or change named arguments after the recognizer
has been created.
Details of the named arguments are L<below|/"Named arguments">.

=head2 value()

=for Marpa::R2::Display
name: Engine Synopsis Unambiguous Parse
partial: 1
normalize-whitespace: 1

    my $value_ref = $recce->value;
    my $value = $value_ref ? ${$value_ref} : 'No Parse';

=for Marpa::R2::Display::End

Because Marpa parses ambiguous grammars, every parse
is a series of zero or more parse trees.
There are zero parse trees if there was no valid parse
of the input according to the grammar.

The C<value> method call evaluates the next parse tree
in the parse series,
and returns a reference to the parse result for that parse tree.
If there are no more parse trees,
the C<value> method returns C<undef>.

=head2 reset_evaluation()

=for Marpa::R2::Display
name: reset_evaluation Synopsis
normalize-whitespace: 1

        $recce->reset_evaluation();
        $recce->set( { end => $loc, max_parses => 999, } );

=for Marpa::R2::Display::End

The C<reset_evaluation()> method ends a parse series,
and starts another.
It can be used to "restart" the parse series.
Restarting the parse series with
the C<reset_evaluation()> method
allows the
application to specify new
values for
the C<closures>,
C<end>
and C<ranking_method> named arguments.
Once a parse series is underway,
these values cannot be changed.

The most common use for
C<reset_evaluation()> method
is to parse
a single input stream
at different end points.
This can also be done by creating a new recognizer
and re-reading the input
from the beginning,
but it is much more efficient to
evaluate a single recognizer run
several times,
using different parse end locations.
After the parse is restarted using
the C<reset_evaluation()> method,
the recognizer's C<set()>
method
and its C<end> named argument
can be used to change the
parse end location.

=head1 Trace accessors

=head2 show_progress()

=for Marpa::R2::Display
name: show_progress Synopsis
partial: 1
normalize-whitespace: 1

    print $recce->show_progress()
        or die "print failed: $ERRNO";

=for Marpa::R2::Display::End

Returns a string describing the progress of the parse.
With no arguments,
the string contains reports for
the current location.
With a single integer argument I<N>,
the string contains reports for location I<N>.
With two numeric arguments, I<N> and I<M>, the arguments are interpreted
as a range of locations and the returned string contains
reports for all locations in the range.
("Location" as referred to in this section,
and elsewhere
in this document,
is what is also called the Earley set ID.)

If an argument is negative,
I<-N>,
it indicates
the I<N>th location counting backward
from the furthest location of the parse.
For example, if 42 was the furthest location,
-1 would be location 42 and -2 would be location 41.
For example, the method call
C<< $recce->show_progress(-3, -1) >>
returns reports for the last three locations of the parse.
The method call C<< $recce->show_progress(0, -1) >>
will print progress reports for the entire parse.

C<show_progress> is
Marpa's most powerful tool for
debugging application grammars.
It can also be used to track the
progress of a parse or
to investigate how a parse works.
A much fuller description,
with an example,
is in
L<the document on debugging Marpa
grammars|Marpa::R2::Deprecated::NAIF::Progress>.

=head1 Named arguments

The recognizer's named arguments are
accepted by its
C<new> and C<set> methods.

=head2 closures

The value of C<closures> named argument
must be
a reference to a hash.
In each key/value pair of this hash,
the key must be an action name.
The hash value
must be a CODE ref.
The C<closures> named argument
is not allowed once evaluation has begun.

When an action name is a key in
the
C<closures> named argument,
the usual action resolution mechanism of the semantics
is bypassed.
One common use of
the C<closures> named argument is to
allow anonymous
subroutines to be semantic actions.
For more details, see L<the document on
semantics|Marpa::R2::Deprecated::NAIF::Semantics>.

=head2 end

The C<end> named argument
specifies the parse end location.
The default is for the parse to end where the input did,
so that the parse returned is of the entire input.
The C<end> named argument is not allowed
once evaluation has begun.
"Location" as referred to here and elsewhere
in this document is what is also called
an Earley set ID.

=head2 event_if_expected

The value of the 
C<event_if_expected> named argument
must be a reference to an array
of symbol names.
Expected-symbol events will be turned on for those symbol
names.
Expected-symbol events may be turned off (or back on)
using the L</expected_symbol_event_set()> method.
The advantage of the C<event_if_expected> named argument
is that it takes effect as soon as the recognizer is created,
while events set
using the L</expected_symbol_event_set()> method
cannot occur until after the first token is read.

=head2 grammar

The C<new> method is required to have
a C<grammar> named argument.  Its
value must be
a precomputed Marpa grammar object.
The C<grammar> named argument is not allowed anywhere else.

=head2 max_parses

If non-zero, causes a fatal error when that number
of parse results is exceeded.
C<max_parses> is useful to
limit CPU usage and output length when testing
and debugging.
Stable and production applications may
prefer to count the number of parses,
and take a less Draconian response when the
count is exceeded.

The value must be an integer.
If it is zero, there will be no
limit on the number of parse results returned.
The default is for
there to be no limit.

=head2 ranking_method

The value must be a string:
one of "C<none>",
"C<rule>",
or "C<high_rule_only>".
When the value is "C<none>", Marpa returns the parse results
in arbitrary order.
This is the default.
The C<ranking_method> named argument is not allowed
once evaluation has begun.

The "C<rule>"
and "C<high_rule_only>" ranking methods
allows the user
to control the order
in which parse results are returned by
the C<value> method,
and to exclude some parse results from the parse series.
For details, see L<the document
on parse order|Marpa::R2::Deprecated::NAIF::Semantics::Order>.

=head2 too_many_earley_items

The C<too_many_earley_items> argument is optional.
If specified, it sets the B<Earley item warning threshold>.
If an Earley set becomes larger than the
Earley item warning threshold,
a recognizer event is generated,
and
a warning is printed to the trace file handle.

Marpa parses from any BNF,
and can handle grammars and inputs which produce large
Earley sets.
But parsing that involves large Earley sets can be slow.
Large Earley sets
are something most applications can,
and will wish to, avoid.

By default, Marpa calculates
an Earley item warning threshold
based on the size of the
grammar.
The default threshold will never be less than 100.
If the Earley item warning threshold is set to 0,
no recognizer event is generated,
and
warnings about large Earley sets are turned off.

=head2 trace_actions

The
C<trace_actions> named argument
is a boolean.
If the boolean value is true, Marpa prints tracing information
as it resolves action names to
Perl closures.
A boolean value of false turns tracing off, which is the default.
Traces are written to the trace file handle.

=head2 trace_file_handle

The value is a file handle.
Traces and warning messages
go to the trace file handle.
By default the trace file handle is inherited
from the grammar used to create the recognizer.

=head2 trace_terminals

Very handy in debugging, and often useful
even when the problem is not in the lexing.
The value is a trace level.
When the trace level is 0,
tracing of terminals is off.
This is the default.

At a trace level of 1 or higher,
Marpa produces a trace message
for each terminal as it is accepted or rejected
by the recognizer.
At a trace level of 2 or higher,
the trace messages include, for
every location, a list of the
terminals expected.
In practical grammars, output from
trace level 2 can be voluminous.

=head2 trace_values

The C<trace_values> named argument
is a numeric trace level.
If the numeric trace level is 1, Marpa
prints tracing information
as values are computed in the evaluation stack.
A trace level of 0 turns value tracing off,
which is the default.
Traces are written to the trace file handle.

=head2 warnings

The value is a boolean.
Warnings are written to the trace file handle.
By default, the recognizer's warnings are on.
Usually, an application will want to leave them on.

=head1 Recognizer events

The recognizer's C<read()> method can generate events.
To access events, use the recognizer's L</events()> method.

The C<EARLEY_ITEM_THRESHOLD> and
The C<EXHAUSTED> events are enabled by default.
Events optionally have an "event value",
as specified in the description of each event.
The following events are possible.

=head2 EARLEY_ITEM_THRESHOLD

The Earley item threshold was exceeded.
For more about the 
Earley item warning threshold,
see L</too_many_earley_items>.
No event value is defined for this event.
This event is enabled by default.

=head2 EXHAUSTED

"Exhaustion"
means that the next C<read> call must fail,
because there is no token that will be acceptable to it.
More details on "exhaustion" are in L<a
section below|/"Parse exhaustion">.
No event value is defined for this event.
This event is enabled by default.

=head2 SYMBOL_EXPECTED

A "symbol expected" event means that a symbol is expected
at that point.
The event value of this event is the symbol
whose expectation caused the event.
This event is disabled by default.
For details, see L</expected_symbol_event_set()>.

=head1 Parse exhaustion

A parse is B<exhausted> when it will accept no more input.
An B<exhausted> parse is not necessarily a failed parse.
Grammars are often written so that once they "find what
they are looking for", no further input is acceptable.
Grammars of that kind become exhausted when they succeed.

By default,
a recognizer event occurs whenever the parse is
exhausted.
An application can also check for exhaustion
explicitly, using the recognizer's
L<C<exhausted> method|/exhausted()>.

=head1 Ruby Slippers parsing

=for Marpa::R2::Display
name: Engine Synopsis Interactive Parse
partial: 1
normalize-whitespace: 1

    $recce = Marpa::R2::Recognizer->new( { grammar => $grammar } );

    my @tokens = (
        [ 'Number', 42 ],
        ['Multiply'], [ 'Number', 1 ],
        ['Add'],      [ 'Number', 7 ],
    );

    TOKEN: for ( my $token_ix = 0; $token_ix <= $#tokens; $token_ix++ ) {
        defined $recce->read( @{ $tokens[$token_ix] } )
            or fix_things( $recce, $token_ix, \@tokens )
            or die q{Don't know how to fix things};
    }

=for Marpa::R2::Display::End

Marpa is able to tell the application
which symbols are acceptable as tokens at the next location
in the parse.
The L<C<terminals_expected> method|/terminals_expected()>
returns the list of tokens that B<will> be accepted by
the next C<read>.
The application can use this information to change the
input "on the fly"
so that it is acceptable to the parser.

An application can also take a "try it and see"
approach.
If an application is not sure whether a token is
acceptable or not, the application can
try to read the dubious token using
L<the C<read> method|/read()>.
If the token is rejected,
L<the C<read> method|/read()> call will return a
Perl C<undef>.
At that point,
the application can retry the C<read> with a different token.

=head2 An example

Marpa's HTML parser, L<Marpa::HTML>, is
an example of how Ruby Slippers parsing can help
with a non-trivial, real-life application.
When a token is rejected in L<Marpa::HTML>, it changes
the input to match
the parser's expectations by

=over

=item * Modifying existing tokens, and

=item * Creating new tokens.

=back

The second technique, the creation of
new "virtual" tokens,
is used
by L<Marpa::HTML>
to deal with omitted start and end tags.
The actual HTML grammar that
L<Marpa::HTML> uses takes
an oversimplified view of the HTML --
it assumes,
even when the HTML standards do not require it,
that start and end tags are always present.
For most HTML files of interest,
this assumption will be
contrary to fact.

Ruby Slippers parsing is used to make the grammar's
over-simplistic view of the world come true for it.
Whenever a token is rejected,
L<Marpa::HTML> looks at the expected tokens list.
If it sees that a start or end tag is expected,
L<Marpa::HTML> creates a token for it --
a completely new "virtual" token that gives the parser exactly what it expects.
L<Marpa::HTML> then resumes input at the point in the original input stream
where it left off.

=head1 Copyright and License

=for Marpa::R2::Display
ignore: 1

  Copyright 2022 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.

  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.

=for Marpa::R2::Display::End

=cut

# Local Variables:
#   mode: cperl
#   cperl-indent-level: 4
#   fill-column: 100
# End:
# vim: expandtab shiftwidth=4: