File: Parser.pm

package info (click to toggle)
pdl 2.005-4
  • links: PTS
  • area: main
  • in suites: potato
  • size: 4,200 kB
  • ctags: 3,301
  • sloc: perl: 14,876; ansic: 7,223; fortran: 3,417; makefile: 54; sh: 16
file content (1112 lines) | stat: -rw-r--r-- 36,823 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
#############################################################################
# Parser.pm -- package which defines a base class for parsing pod docs.
#
# Based on Tom Christiansen's Pod::Text module
# (with extensive modifications).
#
# Copyright (C) 1996 Tom Christiansen. All rights reserved.
# This file is part of "PodParser". PodParser is free software;
# you can redistribute it and/or modify it under the same terms
# as Perl itself.
#############################################################################

package PDL::Pod::Parser;

$VERSION = 1.00;   ## Current version of this package
require  5.002;    ## requires Perl version 5.002 or later

=head1 NAME

PDL::Pod::Parser - base class for creating pod filters and translators

=head1 SYNOPSIS

    use PDL::Pod::Parser;
    package PDL::MyParser;
    @ISA = qw(PDL::Pod::Parser);

    sub new {
        ## constructor code ...
    }

    ## implementation of appropriate subclass methods ...

    package main;
    $parser = new PDL::MyParser;
    @ARGV = ('-')  unless (@ARGV > 0);
    for (@ARGV) {
        $parser->parse_from_file($_);
    }

=head1 DESCRIPTION

B<PDL::Pod::Parser> is an abstract base class for implementing filters and/or
translators to parse pod documentation into other formats. It handles
most of the difficulty of parsing the pod sections in a file and leaves
it to the subclasses to override various methods to provide the actual
translation. The other thing that B<PDL::Pod::Parser> provides is the ability
to process only selected sections of pod documentation from the input.

=head2 SECTION SPECIFICATIONS

Certain methods and functions provided by B<PDL::Pod::Parser> may be given
one or more "section specifications" to restrict the text processed to
only the desired set of sections and their corresponding subsections.  A
section specification is a string containing one or more Perl-style
regular expressions separated by forward slashes ("/").  If you need to
use a forward slash literally within a section title you can escape it
with a backslash ("\/").

The formal syntax of a section specification is:

=over 4

=item
I<head1-title-regexp>/I<head2-title-regexp>/...

=back

Any omitted or empty regular expressions will default to ".*".
Please note that each regular expression given is implicitly
anchored by adding "^" and "$" to the beginning and end.  Also, if a
given regular expression starts with a "!" character, then the
expression is negated (so C<!foo> would match anything I<except>
C<foo>).

Some example section specifications follow.

=over 4

=item
Match the C<NAME> and C<SYNOPSIS> sections and all of their subsections:

C<NAME|SYNOPSIS>

=item
Match only the C<Question> and C<Answer> subsections of the C<DESCRIPTION>
section:

C<DESCRIPTION/Question|Answer>

=item
Match the C<Comments> subsection of I<all> sections:

C</Comments>

=item
Match all subsections of C<DESCRIPTION> I<except> for C<Comments>:

C<DESCRIPTION/!Comments>

=item
Match the C<DESCRIPTION> section but do I<not> match any of its subsections:

C<DESCRIPTION/!.+>

=item
Match all top level sections but none of their subsections:

C</!.+>

=back

=cut

#############################################################################

use strict;
use Carp;

## Maximum number of heading levels supported for '=headN' directives
$PDL::Pod::Parser::MAX_HEAD_LEVEL = 3;

##---------------------------------------------------------------------------

=head1 FUNCTIONS

B<PDL::Pod::Parser> provides the following functions (please note that these
are functions and I<not> methods, they do not take an object reference
as an implicit first parameter):

=cut

##---------------------------------
## Function definitions begin here
##---------------------------------

=head2 version()

Return the current version of this package.

=cut

sub version {
    no strict;
    return  $VERSION;
}

##---------------------------------------------------------------------------

=head1 INSTANCE METHODS

B<PDL::Pod::Parser> provides several methods, some of which should be
overridden by subclasses.  They are as follows:

=cut

##-------------------------------
## Method definitions begin here
##-------------------------------

=head2 new()

This is the the constructor for the base class. You should only use it
if you want to create an instance of a B<PDL::Pod::Parser> instead of one of
its subclasses. The constructor for this class and all of its subclasses
should return a blessed reference to an associative array (hash).

=cut

sub new {
    my $this = shift;
    my $class = ref($this) || $this;
    my %params = @_;
    my $self = {%params};
    bless $self, $class;
    $self->initialize();
    return $self;
}

=head2 initialize()

This method performs any necessary base class initialization.
It takes no arguments (other than the object instance of course).
If subclasses override this method then they I<must> be sure to
invoke the superclass' B<initialize()> method.

=cut

sub initialize {
    my $self = shift;
    return;
}

=head2 select($section_spec1, $section_spec2, ...)

This is the method that is used to select the particular sections and
subsections of pod documentation that are to be printed and/or
processed. If the I<first> I<argument> is the string "+", then the
remaining section specifications are I<added> to the current list of
selections; otherwise the given section specifications will I<replace>
the current list of selections.

Each of the C<$section_spec> arguments should be a section
specification as described in L<"SECTION SPECIFICATIONS">.  The section
specifications are parsed by this method and the resulting regular
expressions are stored in the array referenced by
C<$self-E<gt>{SELECTED}> (please see the description of this member
variable in L<"INSTANCE DATA">).

This method should I<not> normally be overridden by subclasses.

=cut

sub select {
    my $self = shift;
    my @sections = @_;
    my $add = ($sections[0] eq "+") ? shift(@sections) : "";
    ## reset the set of sections to use
    unless (@sections > 0) {
        undef $self->{SELECTED}  unless ($add);
        return;
    }
    $self->{SELECTED} = []  unless ($add  &&  defined $self->{SELECTED});
    local($_);
    my ($spec, $errors);
    my (@re, $i, $negated, %exprs);
    ## Parse each spec
    for $spec (@sections) {
        $_ = $spec;
        s|\\\\|\001|go;  ## handle escaped backward slashes
        s|\\/|\002|go;   ## handle escaped forward slashes
        ## Extract the regexps for the heading titles
        @re = split('/', $_, $PDL::Pod::Parser::MAX_HEAD_LEVEL);
        ## Modify the regexps as needed and check their syntax
        for ($i = 0; $i < $PDL::Pod::Parser::MAX_HEAD_LEVEL; ++$i) {
            $re[$i]  = '.*'  if ((! defined $re[$i]) || $re[$i] eq "");
            $re[$i] .= '.+'  if ($re[$i] eq '!');
            ## Put back any escape sequences we "handled"
            $re[$i] =~ s|\001|\\\\|go;
            $re[$i] =~ s|\002|\\/|go;
            ## Check for negation
            $negated = ($re[$i] =~ /^\!/o);
            $re[$i] = $'  if ($negated);
            ## Check regexp syntax
            eval "/$re[$i]/";
            if ($@) {
                ++$errors;
                carp "Invalid regular expression /$re[$i]/ in \"$spec\": $@\n";
            }
            else {
                ## Add the forward and rear anchors (and put the negator back)
                $re[$i] = '^' . $re[$i]  unless ($re[$i] =~ /^\^/o);
                $re[$i] = $re[$i] . '$'  unless ($re[$i] =~ /\$$/o);
                $re[$i] = '!' . $re[$i]  if ($negated);
            }
        }
        if ($errors) {
            carp "Ignoring section spec \"$spec\"!\n";
        }
        else {
            ## Store them in our sections array
            push(@{$self->{SELECTED}}, [ @re ]);
        }
    }
}

=head2 want_section($head1_title, $head2_title, ...)

Returns a value of true if the given section and subsection titles match
any of the section specifications passed to the B<select()> method (or
if no section specifications were given). Returns a value of false
otherwise. If C<$headN_title> is ommitted then it defaults to the current
C<headN> section title in the input.

This method should I<not> normally be overridden by subclasses.

=cut

sub want_section {
    my $self = shift;
    my (@heads) = @_;
    ## Return true if no restrictions were explicitly specified
    return  1  unless ((defined $self->{SELECTED})
                       && (@{$self->{SELECTED}} > 0));

    ## default any unspecified sections to the current one
    my $i;
    for ($i = 0; $i < $PDL::Pod::Parser::MAX_HEAD_LEVEL; ++$i) {
        $heads[$i] = $self->{HEADINGS}->[$i]  unless (defined $heads[$i]);
    }

    ## Look for a match against the specified section expressions
    my ($sec_spec, $regexp, $negated, $match);
    for $sec_spec (@{$self->{SELECTED}}) {
        $match = 1;
        for ($i = 0; $i < $PDL::Pod::Parser::MAX_HEAD_LEVEL; ++$i) {
            $regexp  = $sec_spec->[$i];
            $negated = ($regexp =~ /^\!/o);
            $regexp  = $'  if ($negated);
            $match  &= ($negated ? ($heads[$i] !~ /${regexp}/)
                                 : ($heads[$i] =~ /${regexp}/));
            last unless ($match);
        }
        return  1  if ($match);
    }
    return  0;  ## no match
}

=head2 begin_input()

This method is invoked by B<parse_from_filehandle()> immediately I<before>
processing input from a filehandle. The base class implementation does
nothing but subclasses may override it to perform any per-file
intializations.

=cut

sub begin_input {
    my $self = shift;
    #----------------------------------------------------
    # Subclasses may wish to make use of some of the
    # commented-out code below for initializing pragmas
    #----------------------------------------------------
    # $self->{PRAGMAS} = {
    #     FILL     => 'on',
    #     STYLE    => 'plain',
    #     INDENT   => 0,
    # };
    # ## Initialize all PREVIOUS_XXX pragma values
    # my ($name, $value);
    # for (($name, $value) = each %{$self->{PRAGMAS}}) {
    #     $self->{PRAGMAS}->{"PREVIOUS_${name}"} = $value;
    # }
    #----------------------------------------------------
    return;
}

=head2 end_input()

This method is invoked by B<parse_from_filehandle()> immediately I<after>
processing input from a filehandle. The base class implementation does
nothing but subclasses may override it to perform any per-file
cleanup actions.

=cut

sub end_input {
    my $self = shift;
    return;
}

=head2 preprocess_line($text)

This methods should be overridden by subclasses that wish to perform any
kind of preprocessing for each I<line> of input (I<before> it has been
determined whether or not it is part of a pod paragraph). The parameter
C<$text> is the input line and the value returned should correspond to
the new text to use in its place. If the empty string or an undefined
value is returned then no further process will be performed for this
line. If desired, this method can call the B<parse_paragraph()> method
directly with any preprocessed text and return an empty string (to
indicate that no further processing is needed).

Please note that the B<preprocess_line()> method is invoked I<before>
the B<preprocess_paragraph()> method. After all (possibly preprocessed)
lines in a paragraph have been assembled together and it has been
determined that the paragraph is part of the pod documentation from one
of the selected sections, then B<preprocess_paragraph()> is invoked.

The base class implementation of this method returns the given text.

=cut

sub preprocess_line($text) {
    my $self = shift;
    my $text = shift;
    return  $text;
}

=head2 preprocess_paragraph($text)

This method should be overridden by subclasses that wish to perform any
kind of preprocessing for each block (paragraph) of pod documentation
that appears in the input stream.  The parameter C<$text> is the pod
paragraph from the input file and the value returned should correspond
to the new text to use in its place.  If the empty string is returned or
an undefined value is returned, then the given C<$text> is ignored (not
processed).

This method is invoked by B<parse_paragraph()>. After it returns,
B<parse_paragraph()> examines the current cutting state (which is
stored in C<$self-E<gt>{CUTTING}>). If it evaluates to false then input text
(including the given C<$text>) is cut (not processed) until the next pod
directive is encountered.

Please note that the B<preprocess_line()> method is invoked I<before>
the B<preprocess_paragraph()> method. After all (possibly preprocessed)
lines in a paragraph have been assembled together and it has been
determined that the paragraph is part of the pod documentation from one
of the selected sections, then B<preprocess_paragraph()> is invoked.

The base class implementation of this method returns the given text.

=cut

sub preprocess_paragraph {
    my $self = shift;
    my $text = shift;
    return  $text;
}

=head2 parse_pragmas($cmd, $text, $sep)

This method is called when an C<=pod> directive is encountered. When
such a pod directive is seen in the input, this method is called and is
passed the command name C<$cmd> (which should be "pod") and the
remainder of the text paragraph C<$text> which appeared immediately
after the command name. If desired, the text which separated the C<=pod>
directive from its corresponding text may be found in C<$sep>.  Each
word in C<$text> is examined to see if it is a pragma specification.
Pragma specifications are of the form C<pragma_name=pragma_value>.

Unless the given object is an instance of the B<PDL::Pod::Parser> class, the
base class implementation of this method will invoke the B<pragma()> method for
each pragma specification in C<$text>.  I<If and only if> the given
object I<is> an instance of the B<PDL::Pod::Parser> class, the base class
version of this method will simply reproduce the C<=pod> command exactly
as it appeared in the input.

Derived classes should I<not> usually need to reimplement this method.

=cut

sub parse_pragmas {
    my $self = shift;
    my $cmd  = shift;
    my $text = shift;
    my $sep  = shift;
    $cmd  = ''  unless (defined $cmd);
    $text = ''  unless (defined $text);
    $sep  = ' ' unless (defined $sep);
    local($_);
    if (ref($self) eq 'PDL::Pod::Parser') {
        ## If and only if this is an instance of the base class, then
        ## dump the '=pod' paragraph exactly as it appeared in the input.
        $self->print("=${cmd}${sep}${text}");
    }
    else {
        ## otherwise invoke the pragma command for each word in $text
        my @pragmas = split(" ", $text);
        for (@pragmas) {
            $self->pragma(lc $`, $')  if (/=/o);
        }
    }
}

=head2 pragma($pragma_name, $pragma_value)

This method is invoked for each pragma encountered inside an C<=pod>
paragraph (see the description of the B<parse_pragmas()> method). The
pragma name is passed in C<$pragma_name> (which should always be
lowercase) and the corresponding value is C<$pragma_value>.

The base class implementation of this method does nothing.  Derived
class implementations of this method should be able to recognize at
least the following pragmas and take any necessary actions when they are
encountered:

=over 4

=item B<fill=value>

The argument I<value> should be one of C<on>, C<off>, or C<previous>.
Specifies that "filling-mode" should set to 1, 0, or its previous value
(respectively). If I<value> is omitted then the default is C<on>.
Derived classes may use this to decide whether or not to perform any
filling (wrapping) of subsequent text.

=item B<style=value>

The argument I<value> should be one of C<bold>, C<italic>, C<code>,
C<plain>, or C<previous>. Specifies that the current default paragraph
font should be set to C<bold>, C<italic>, C<code>, the empty string C<>,
or its previous value (respectively).  If I<value> is omitted then the
default is C<plain>.  Derived classes may use this to determine the
default font style to use for subsequent text.

=item B<indent=value>

The argument I<value> should be an integer value (with an optional
sign).  Specifies that the current indentation level should be reset to
the given value. If a plus (minus) sign precedes the number then the
indentation level should be incremented (decremented) by the given
number. If only a plus or minus sign is given (without a number) then
the current indentation level is incremented or decremented by some
default amount (to be determined by subclasses).

=back

The value returned will be 1 if the pragma name was recognized and 0 if
it wasnt (in which case the pragma was ignored).

Derived classes should override this method if they wish to implement
any pragmas. The base class implementation of this method does nothing
but it does contain some commented-out code which subclasses may want
to make use of when implementing pragmas.

=cut

sub pragma {
    my $self  = shift;
    ## convert remaining args to lowercase
    my $name  = lc shift;
    my $value = lc shift;
    my $rc = 1;
    local($_);
    #----------------------------------------------------
    # Subclasses may wish to make use of some of the
    # commented-out code below for processing pragmas
    #----------------------------------------------------
    # my ($abbrev, %abbrev_table);
    # if ($name eq 'fill') {
    #     %abbrev_table = ('on' => 'on',
    #                      'of' => 'off',
    #                      'p'  => 'previous');
    #     $value = 'on' unless ((defined $value) && ($value ne ''));
    #     return  $rc  unless ($value =~ /^(on|of|p)/io);
    #     $abbrev = $1;
    #     $value = $abbrev_table{$abbrev};
    #     if ($value eq 'previous') {
    #         $self->{PRAGMAS}->{FILL} = $self->{PRAGMAS}->{PREVIOUS_FILL};
    #     }
    #     else {
    #         $self->{PRAGMAS}->{PREVIOUS_FILL} = $self->{PRAGMAS}->{FILL};
    #         $self->{PRAGMAS}->{FILL} = $value;
    #     }
    # }
    # elsif ($name eq 'style') {
    #     %abbrev_table = ('b'  => 'bold',
    #                      'i'  => 'italic',
    #                      'c'  => 'code',
    #                      'pl' => 'plain',
    #                      'pr' => 'previous');
    #     $value = 'plain' unless ((defined $value) && ($value ne ''));
    #     return  $rc  unless ($value =~ /^(b|i|c|pl|pr)/io);
    #     $abbrev = $1;
    #     $value = $abbrev_table{$abbrev};
    #     if ($value eq 'previous') {
    #         $self->{PRAGMAS}->{STYLE} = $self->{PRAGMAS}->{PREVIOUS_STYLE};
    #     }
    #     else {
    #         $self->{PRAGMAS}->{PREVIOUS_STYLE} = $self->{PRAGMAS}->{STYLE};
    #         $self->{PRAGMAS}->{STYLE} = $value;
    #     }
    # }
    # elsif ($name eq 'indent') {
    #     return $rc unless ((defined $value) && ($value =~ /^([-+]?)(\d*)$/o));
    #     my ($sign, $number) = ($1, $2);
    #     $value .= "4"  unless ((defined $number) && ($number ne ''));
    #     $self->{PRAGMAS}->{PREVIOUS_INDENT} = $self->{PRAGMAS}->{INDENT};
    #     if ($sign) {
    #         $self->{PRAGMAS}->{INDENT} += (0 + $value);
    #     }
    #     else {
    #         $self->{PRAGMAS}->{INDENT} = $value;
    #     }
    # }
    # else {
    #     $rc = 0;
    # }
    #----------------------------------------------------
    return $rc;
}

=head2 command($cmd, $text, $sep)

This method should be overridden by subclasses to take the appropriate
action when a pod command paragraph (denoted by a line beginning with
"=") is encountered.  When such a pod directive is seen in the input,
this method is called and is passed the command name C<$cmd> and the
remainder of the text paragraph C<$text> which appears immediately after
the command name. If desired, the text which separated the command from
its corresponding text may be found in C<$sep>.  Note that this method
is I<not> called for C<=pod> paragraphs.

The base class implementation of this method simply prints the raw pod
command to the output filehandle and then invokes the B<textblock()>
method, passing it the C<$text> parameter.

=cut

sub command {
    my $self = shift;
    my $cmd  = shift;
    my $text = shift;
    my $sep  = shift;
    $cmd  = ''  unless (defined $cmd);
    $text = ''  unless (defined $text);
    $sep  = ' ' unless (defined $sep);
    $self->print("=${cmd}${sep}");
    $self->textblock($text);
}

=head2 verbatim($text)

This method may be overridden by subclasses to take the appropriate
action when a block of verbatim text is encountered. It is passed the
text block C<$text> as a parameter.

The base class implementation of this method simply prints the textblock
(unmodified) to the output filehandle.

=cut

sub verbatim {
    my $self = shift;
    my $text = shift;
    $self->print($text);
}

=head2 textblock($text)

This method may be overridden by subclasses to take the appropriate
action when a normal block of pod text is encountered (although the base
class method will usually do what you want). It is passed the text block
C<$text> as a parameter.

In order to process interior sequences, subclasses implementations of
this method will probably want invoke the B<interpolate()> method,
passing it the text block C<$text> as a parameter and then perform any
desired processing upon the returned result.

The base class implementation of this method simply prints the text block
as it occurred in the input stream).

=cut

sub textblock {
    my $self  = shift;
    my $text  = shift;
    local($_) = $self->interpolate($text);
    $self->print($_);
}

=head2 interior_sequence($seq_cmd, $seq_arg)

This method should be overridden by subclasses to take the appropriate
action when an interior sequence is encountered. An interior sequence is
an embedded command within a block of text which appears as a command
name (usually a single uppercase character) followed immediately by
a string of text which is enclosed in angle brackets. This method is
passed the sequence command C<$seq_cmd> and the corresponding text
$seq_arg and is invoked by the B<interpolate()> method for each
interior sequence that occurs in the string that it is passed.
It should return the desired text string to be used in place of
the interior sequence.

Subclass implementationss of this method may wish to examine the
the array referenced by C<$self-E<gt>{SEQUENCES}> which is a
stack of all the interior sequences that are currently being
processed (they may be nested). The current interior sequence
(the one given by C<$seq_cmdE<lt>$seq_argE<gt>>) should always
be at the top of this stack.

The base class implementation of the B<interior_sequence()> method simply
returns the raw text of the of the interior sequence (as it occurred in
the input) to the output filehandle.

=cut

sub interior_sequence {
    my $self = shift;
    my $seq_cmd = shift;
    my $seq_arg = shift;
    return  "${seq_cmd}<${seq_arg}>";
}

=head2 interpolate($text, $end_re)

This method will translate all text (including any embedded interior
sequences) in the given text string C<$text> and return the
interpolated result.  If a second argument is given, then it is taken to
be a regular expression that indicates when to quit interpolating the
string.  Upon return, the C<$text> parameter will have been modified to
contain only the un-processed portion of the given string (which will
I<not> contain any text matched by C<$end_re>).

This method should probably I<not> be overridden by subclasses.
It should be noted that this method invokes itself recursively
to handle any nested interior sequences.

=cut

sub interpolate {
    my $self = shift;
    my ($text, $end_re) = @_;
    $text   = ''   unless (defined $text);
    $end_re = '$'  unless ((defined $end_re) && ($end_re ne ''));
    local($_)  = $text;
    my $result = '';
    my ($seq_cmd, $seq_arg, $end) = ('', '', undef);
    while (($_ ne '') && /([A-Z])<|($end_re)/) {
        $result .= $`;  ## Append text before the match to the result
        $_ = $';        ## Only text after the match remains to be processed
        ## See if we matched an interior sequence or an end-expression
        ($seq_cmd, $end) = ($1, $2);
        last if (defined $end);  ## Saw the end - quit loop here
        ## At this point we have found an interior sequence,
        ## we need to obtain its argument
        push(@{$self->{SEQUENCES}}, $seq_cmd);
        $seq_arg = $self->interpolate($_, '>');
        ## Now process the interior sequence
        $result .= $self->interior_sequence($seq_cmd, $seq_arg);
        pop(@{$self->{SEQUENCES}});
    }
    ## Handle whatever is left if we didnt match the ending regexp
    unless ((defined $end) && ($end_re ne '$')) {
        $result .= $_;
        $_ = '';
    }
    ## Modify the input parameter to consume the text that was
    ## processed so far.
    $_[0] = $_;
    ## Return the processed-text
    return  $result;
}

=head2 parse_paragraph($text)

This method takes the text of a pod paragraph to be processed and
invokes the appropriate method (one of B<command()>, B<verbatim()>,
or B<textblock()>).

This method does I<not> usually need to be overridden by subclasses.

=cut

sub parse_paragraph {
    my $self = shift;
    local $_ = shift;
    my ($para);
    my ($cmd, $arg, $sep);
    ## Keep track of current paragraph number
    ++$self->{PARAGRAPH}  if ((defined $_)  &&  ($_ ne ''));
    ## Ignore up until next pod directive if we are cutting
    if ($self->{CUTTING}) {
        return  1  unless /^=/o;
        $self->{CUTTING} = 0;
    }
    ## Keep track of current sections and their titles
    my ($level, $title);
    if (/^=((?:sub)*)(?:head(?:ing)?|sec(?:tion)?)(\d*)\s+(.*)\s*$/o) {
        ## This is a section heading command
        ($level, $title) = ($2, $3);
        $level = 1 + (length($1) / 3)  if (($level eq '') || ($1 ne ''));
        ## Reset the current section heading at this level
        $self->{HEADINGS}->[$level - 1] = $title;
        ## Reset subsection headings of this one to empty
        my $i;
        for ($i = $level; $i < $PDL::Pod::Parser::MAX_HEAD_LEVEL; ++$i) {
            $self->{HEADINGS}->[$i] = '';
        }
    }
    ## Ignore this block if it isnt in one of the selected sections
    $self->{CUTTING} = 1   unless ($self->want_section());
    return  1  if ($self->{CUTTING});

    ## Perform any desired preprocessing
    $para = $self->preprocess_paragraph($_);
    next unless ((defined $para) && ($para ne ""));
    $_ = $para;
    return  1  if ($self->{CUTTING});

    ## Look for one of the three types of paragraphs
    if (s/^=//o) {
        ## Looks like a command paragraph
        ($cmd, $sep, $_) = /\s+/o ? ($`, $&, $') : ($_, '', '');
        if ($cmd eq 'cut') {
            $self->{CUTTING} = 1;
        }
        elsif ($cmd eq 'pod') {
            ## This is a pod pragma paragraph
            $self->parse_pragmas($cmd, $_, $sep);
        }
        else {
            ## Some other command
            $self->command($cmd, $_, $sep);
        }
    }
    elsif (/^\s+/o) {
        ## Indented text - must be a verbatim paragraph
        $self->verbatim($_);
    }
    else {
        ## Looks like an ordinary block of text
        $self->textblock($_);
    }
    return  1;
}

=head2 parse_from_filehandle($infilehandle, $outfilehandle)

This method takes a glob to a filehandle (which is assumed to already be
opened for reading) and reads the entire input stream looking for blocks
(paragraphs) of pod documentation to be processed. For each block of pod
documentation encountered it will call the B<parse_paragraph()> method.

If a second argument is given then it should be a filehandle glob where
output should be sent (otherwise the default output filehandle is
C<STDOUT>). If no first argument is given the default input filehandle
C<STDIN> is used.

The input filehandle that is currently in use is stored in the member
variable whose key is "INPUT" (e.g. C<$self-E<gt>{INPUT}>).

The output filehandle that is currently in use is stored in the member
variable whose key is "OUTPUT" (e.g. C<$self-E<gt>{OUTPUT}>).

Input is read line-by-line and assembled into paragraphs (which are
separated by lines containing nothing but whitespace). The current line
number is stored in the member variable whose key is "LINE" (e.g.
C<$self-E<gt>{LINE}>) and the current paragraph number is stored in the
member variable whose key is "PARAGRAPH" (e.g.  C<$self-E<gt>{PARAGRAPH}>).

This method does I<not> usually need to be overridden by subclasses.

=cut

sub parse_from_filehandle {
    my $self = shift;
    my $infilehandle = shift;
    my $outfilehandle = shift;
    local($_);

    $infilehandle  = \*STDIN   unless (defined $infilehandle);
    $outfilehandle = \*STDOUT  unless (defined $outfilehandle);

    ## Initialize stuff for this input stream
    $self->{INPUT}     = $infilehandle;
    $self->{OUTPUT}    = $outfilehandle;
    $self->{CUTTING}   = 1;   ## Keep track of when we are cutting
    $self->{SEQUENCES} = [];  ## Keep track of nested interior sequences
    ## Initialize section heading titles
    {
        $self->{HEADINGS}   = [];
        my $i;
        for ($i = 0; $i < $PDL::Pod::Parser::MAX_HEAD_LEVEL; ++$i) {
            $self->{HEADINGS}->[$i] = '';
        }
    }

    $self->begin_input();

    my $paragraph = '';
    while (<$infilehandle>) {
        ++$self->{LINE};  ## Keep track of current line number
        $_ = $self->preprocess_line($_);
        next  unless ((defined $_)  &&  ($_ ne ''));
        if (! /^\s*$/o) {
            ## Append this line to the current paragraph
            $paragraph .= $_;
            next;
        }
        ## This line is blank line and ends the current paragraph
        next  if ($paragraph eq '');
        $paragraph .= "\n";
        $self->parse_paragraph($paragraph);
        $paragraph = '';
    }
    ## Dont forget about the last paragraph in the file
    $self->parse_paragraph($paragraph)  unless ($paragraph eq '');

    $self->end_input();
}

=head2 parse_from_file($filename, $outfile)

This method takes a filename and does the following:

=over 4

=item *

opens the input and output files for reading
(creating the appropriate filehandles)

=item *

invokes the B<parse_from_filehandle()> method passing it the
corresponding input and output filehandles.

=item *

closes the input and output files.

=back

If the special input filename "-" or "<&STDIN" is given then the STDIN
filehandle is used for input (and no open or close is performed).  If no
input filename is specified then "-" is implied.  If a reference is
passed instead of a filename then it is assumed to be a glob-style
reference to a filehandle.


If a second argument is given then it should be the name of the desired
output file.  If the special output filename "-" or ">&STDOUT" is given
then the STDOUT filehandle is used for output (and no open or close is
performed). If the special output filename ">&STDERR" is given then the
STDERR filehandle is used for output (and no open or close is
performed).  If no output filename is specified then "-" is implied.
If a reference is passed instead of a filename then it is assumed to
be a glob-style reference to a filehandle.

The name of the input file that is currently being read is stored in the
member variable whose key is "INFILE" (e.g. C<$self-E<gt>{INFILE}>).

The name of the output file that is currently being written is stored in
the member variable whose key is "OUTFILE" (e.g.  C<$self-E<gt>{OUTFILE}>).

This method does I<not> usually need to be overridden by subclasses.

=cut

sub parse_from_file {
    my $self = shift;
    my $infile = shift;
    my $outfile = shift;
    my ($in_fh,  $in_fh_name)  = (undef, undef);
    my ($out_fh, $out_fh_name) = (undef, undef);

    $infile  = '-'  unless ((defined $infile)  && ($infile ne ''));
    $outfile = '-'  unless ((defined $outfile) && ($outfile ne ''));

    if (($infile  eq '-') || ($infile =~ /^<&(STDIN|0)$/o)) {
        $in_fh = \*STDIN;
        $self->{INFILE} = "<standard input>";
    }
    elsif (ref $infile) {
        $in_fh = $infile;
        $self->{INFILE} = ${$infile};;
    }
    else {
        $self->{INFILE} = $in_fh_name = $infile;
        $in_fh_name =~ s/\W/_/g;
        no strict "refs";
        open($in_fh_name, "<$infile") ||
             croak "Can't open $infile for reading: $!\n";
        $in_fh = \*$in_fh_name;
    }

    if (($outfile  eq '-') || ($outfile =~ /^>&?(STDOUT|1)$/o)) {
        $out_fh  = \*STDOUT;
        $self->{OUTFILE} = "<standard output>";
    }
    elsif ($outfile =~ /^>&(STDERR|2)$/o) {
        $out_fh  = \*STDERR;
        $self->{OUTFILE} = "<standard error>";
    }
    elsif (ref $outfile) {
        $out_fh = $outfile;
        $self->{OUTFILE} = ${$outfile};;
    }
    else {
        $self->{OUTFILE} = $out_fh_name = $outfile;
        $out_fh_name =~ s/\W/_/g;
        no strict "refs";
        open($out_fh_name, ">$outfile") ||
             croak "Can't open $outfile for writing: $!\n";
        $out_fh = \*$out_fh_name;
    }

    $self->parse_from_filehandle($in_fh, $out_fh);

    if (defined $in_fh_name) {
        close($in_fh) || croak "Can't close $infile after reading: $!\n";
    }
    if (defined $out_fh_name) {
        close($out_fh) || croak "Can't close $outfile after writing: $!\n";
    }
}

sub print {
  my $self = shift;
  if (ref($self->{OUTPUT}) eq 'GLOB') {
    my $out_fh = $self->{OUTPUT};
    print $out_fh @_;
  } else {
    $self->{OUTPUT}->print(@_);
  }
}

##---------------------------------------------------------------------------

=head1  INSTANCE DATA

B<PDL::Pod::Parser> uses the following data members for each of its
instances (where C<$self> is a reference to such an instance):

=head2 $self->{INPUT}

The current input filehandle.

=head2 $self->{OUTPUT}

The current output filehandle.

=head2 $self->{INFILE}

The name of the current input file.

=head2 $self->{OUTFILE}

The name of the current output file.

=head2 $self->{LINE}

The current line number from the input stream.

=head2 $self->{PARAGRAPH}

The current paragraph number from the input stream (which includes input
paragraphs that are I<not> part of the pod documentation).

=head2 $self->{HEADINGS}

A reference to an array of the current section heading titles for each
heading level (note that the first heading level title is at index 0).

=head2 $self->{SELECTED}

A reference to an array of references to arrays. Each subarray is a list
of anchored regular expressions (preceded by a "!" if the regexp is to be
negated). The index of the expression in the subarray should correspond
to the index of the heading title in B<$self-E<gt>{HEADINGS}> that it is
to be matched against.

=head2 $self->{CUTTING}

A boolean-valued scalar which evaluates to true if text from the
input file is currently being "cut".

=head2 $self->{SEQUENCES}

An array reference to the stack of interior sequence commands that are
currently in the middle of being processed.

=head1 NOTES

To create a pod translator to translate pod documentation to some other
format, you usually only need to create a subclass of B<PDL::Pod::Parser>
which overrides the base class implementation for the following methods:

=over 4

=item *

B<pragma()>

=item *

B<command()>

=item *

B<verbatim()>

=item *

B<textblock()>

=item *

B<interior_sequence()>

=back

You may also want to implement the B<begin_input()> and B<end_input()>
methods for your subclass (to perform any needed per-file intialization
or cleanup).

If you need to perform any preprocesssing of input before it is parsed
you may want to implement one or both of the B<preprocess_line()> and/or
B<preprocess_paragraph()> methods.

Also, don't forget to make sure your subclass constructor invokes the
base class' B<initialize()> method.

Sometimes it may be necessary to make more than one pass over the input
files. This isn't a problem as long as none of the input files correspond
to C<STDIN>. You can override either the B<parse_from_filehandle()>
method or the B<parse_from_file()> method to make the first pass yourself
to collect all the information you need and then invoke the base class
method to do the rest of the standard processing.

Feel free to add any member data fields you need to keep track of things
like current font, indentation, horizontal or vertical position, or
whatever else you like.

For the most part, the B<PDL::Pod::Parser> base class should be able to
do most of the input parsing for you and leave you free to worry about
how to intepret the commands and translate the result.

=head1 AUTHOR

Brad Appleton E<lt>Brad_Appleton-GBDA001@email.mot.comE<gt>

Based on code for B<Pod::Text> written by
Tom Christiansen E<lt>tchrist@mox.perl.comE<gt>

=cut

1;