File: developers_guide.rst

package info (click to toggle)
fparser 0.2.1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 3,084 kB
  • sloc: python: 28,555; f90: 70; makefile: 36
file content (1028 lines) | stat: -rw-r--r-- 42,995 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
..  Copyright (c) 2017-2023 Science and Technology Facilities Council.

    All rights reserved.

    Modifications made as part of the fparser project are distributed
    under the following license:

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are
    met:

    1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.

    3. Neither the name of the copyright holder nor the names of its
    contributors may be used to endorse or promote products derived from
    this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _developers:

Developer Guide
===============

Reading Fortran
---------------

A key part of the fparser package is support for reading Fortran code.
`fparser.common.readfortran.FortranFileReader` provides this functionality
for source files while `FortranStringReader` supports Fortran source
provided as a string. Both of these classes sub-class `FortranReaderBase`:

.. autoclass:: fparser.common.readfortran.FortranReaderBase

Note that the setting for `ignore_comments` provided here can be overridden
on a per-call basis by methods such as `get_single_line`.
The 'mode' of the reader is controlled by passing in a suitable instance of
the `FortranFormat` class:

.. autoclass:: fparser.common.sourceinfo.FortranFormat

Due to its origins in the f2py project, the reader contains support
for recognising `f2py` directives
(https://numpy.org/devdocs/f2py/signature-file.html). However, this
functionality is disabled by default.

A convenience script called read.py is provided in the scripts
directory which takes a filename as input and returns the file
reader's representation of that file. This could be useful for
debugging purposes.

Invalid input
-------------

The file reader uses :py:func:`open` to open a Fortran file. If
invalid input is found then Python raises a `UnicodeDecodeError`
exception by default. Since we typically wish to skip invalid
characters (on the principle that, for valid Fortran, they can only
occur in comments) while logging their presence, a bespoke error
handler named "fparser-logging" is implemented in
``fparser/__init__.py`` and registered using
:py:func:`codecs.register_error`.  This handler may be specified when
using :py:func:`open` to open a file by supplying the
``errors='fparser-logging'`` argument.

Fparser2
--------

Fparser2 supports Fortran2003 and is being extended to support
Fortran2008. Fparser2 is being actively developed and will fully
replace fparser1 in the future.

.. _rules:

Rules
+++++

Each version of the Fortran language is defined as a set of rules in a
specification document. The Fortran2003 rules are specified here
https://wg5-fortran.org/N1601-N1650/N1601.pdf and the Fortran2008
rules are specified here
https://j3-fortran.org/doc/year/10/10-007r1.pdf.

Each rule has a number, for example the Fortran2003 document includes
the following top level rules `R201` and `R202`
::

    R201 program is program-unit
                    [ program-unit ] ...

    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or block-data

It can be seen that the right hand side of these rules consist of more
rules. Note, `[]` means that the content is optional. At some point in
the rule hierarchy rules start to be defined by text. For example,
taking a look at the specification of a module
::

    R1104 module is module-stmt
                    [ specification-part ]
                    [ module-subprogram-part ]
                    end-module-stmt

    R1105 module-stmt is MODULE module-name

    R1106 end-module-stmt is END [ MODULE [ module-name ] ]

it can be seen that rules `R1105` and `R1106` specify the actual code to
write e.g. `MODULE`. Here `module-name` is a type of `name` which has
a rule specifying what is valid syntax (see the specification document
for more details).

Therefore Fortran is specified as rules which reference other rules,
or specify a particular syntax. The top level rule of this hierarchy
is rule `R201`, which defines a program, see above.


Classes
+++++++

In fparser2 each rule is implemented in a class with the class names
closely following the rule names. For example, `program` is
implemented by the `Program` class and `program-unit` is implemented
by the `Program_Unit` class. In general, the name of the class
corresponding to a given rule can be obtained by replacing '-' with
'_' and capitalising each word.

The Fortran2003 classes exist in the Fortran2003.py file and the
Fortran2008 classes exist in the Fortran2008.py file (see
:ref:`Fortran2008` section for Fortran2008-specific implementation
details).

The Fortran2003 and Fortran2008 classes can inherit from a set of
pre-existing base classes which implement certain rule patterns in a
generic way. The base classes are contained in the `utils.py` file.

The base classes and rule patterns are discussed more in the
:ref:`base-classes` section.

The primary components of classes i.e. the parts that developers
typically need to be concerned with are:

1) the `subclass_names` list
2) the `use_names` list
3) the static `match` method
4) the `tostr` method

A `subclass_names` list of classes should be provided when the rule is
a simple choice between classes. In this case the `Base` class ensures that each
child class is tested for a match and the one that matches is
returned. An example of a simple choice rule is `R202`. See the
:ref:`program-unit-class` section for a description of its
implementation.

The `use_names` list should contain any classes that are referenced by the
implementation of the current class. These lists of names are aggregated
(along with `subclass_names`) and used to ensure that all necessary `Scalar_`,
`_List` and `_Name` classes are generated (in code at the end of the
`Fortran2003` and `Fortran2008` modules - see :ref:`class-generation`).

When the rule is not a simple choice the developer needs to supply a
static `match` method. An example of this is rule `R201`. See the
:ref:`program-class` section for a description of its implementation.

.. note::

   A `tostr` description, explanation and example needs to be added.

Class Relationships
+++++++++++++++++++

When a rule is a simple choice, the class implementing this rule
provides a list of classes to be matched in the `subclass_names` list
(or potentially `use_names` list). These class names are provided as
strings, not references to the classes themselves.

In fparser2 these strings are used to create class references to allow
matching to be performed. The creation of class references is
implemented by the `create` method of the `ParserFactory` object.

The `create` method of the `ParserFactory` class also links to
appropriate classes to create parsers compliant to the specified
standard.

.. note::

   The ParserFactory implementation needs to be explained.

A parser conforming to a particular Fortran standard is created by a
ParserFactory object. For example::

    >>> from fparser.two.parser import ParserFactory
    >>> parser_f2003 = ParserFactory().create(std="f2003")

The `create` method returns a `Program` *class* (called `parser_f2003`
in the above example) which contains a `subclasses` dictionary
(declared in its base class - called `Base`) configured with *all* the
Fortran2003 class relationships specified by the `subclass_names` and
`use_names` lists in each class.

As all classes inherit from the `Base` class, the `subclasses`
dictionary is available to all classes. If, for example, we query the
dictionary for the `Program` class relationships we get an empty list
as it has no `subclass_names` or `use_names` entries specified (see
:ref:`program-class`). If however, we query the dictionary for the
`Program_unit` relationships we get the list of classes specified in
that classes `subclass_names` list (see :ref:`program-unit-class`)::

    >>> parser_f2003.__name__
    'Program'
    >>> parser_f2003.subclasses['Program']
    []
    >>> parser_f2003.subclasses['Program_Unit']
    [<class 'fparser.two.Fortran2003.Main_Program'>, <class 'fparser.two.Fortran2003.Function_Subprogram'>, <class 'fparser.two.Fortran2003.Subroutine_Subprogram'>, <class 'fparser.two.Fortran2003.Module'>, <class 'fparser.two.Fortran2003.Block_Data'>]

Symbol Table
++++++++++++

There are many situations when it is not possible to disambiguate the
precise form of the Fortran being parsed without additional type
information (e.g. whether code of the form `a(i,j)` is an array
access or a function call).  Therefore fparser2 contains a single,
global instance of a `SymbolTables` class, accessed as
`fparser.two.symbol_table.SYMBOL_TABLES`. As its name implies, this
holds a collection of symbol tables, one for each top-level scoping
unit (e.g. module or program unit). This is implemented as a
dictionary where the keys are the names of the scoping units e.g. the
name of the associated module, program, subroutine or function. The
corresponding dictionary entries are instances of the `SymbolTable`
class:

.. autoclass:: fparser.two.symbol_table.SymbolTable
   :members:

The entries in these tables are instances of the named tuple,
`SymbolTable.Symbol` which currently has the properties:

 * name
 * primitive_type

Both of these are stored as strings. In future, support for more
properties (e.g. kind, shape, visibility) will be added and strings
replaced with enumerations where it makes sense. Similarly, support
will be added for other types of symbols (e.g. those representing
program/subroutine names or reserved Fortran keywords).

Symbols available in the scoping region of a module may be made
available in another scoping region through one or more `USE` statements.
In a `SymbolTable` such uses are captured as instances of `ModuleUse`:

.. autoclass:: fparser.two.symbol_table.ModuleUse

These instances are created by calling:

.. automethod:: fparser.two.symbol_table.SymbolTable.add_use_symbols

Fortran has support for nested scopes - e.g. variables declared within
a module are in scope within any routines defined within that
module. Therefore, when searching for the definition a symbol, we
require the ability to search up through all symbol tables accessible
from the current scope. In order to support this functionality, each
`SymbolTable` instance therefore has a `parent` property. This holds a
reference to the table that contains the current table (if any).

Since fparser2 relies heavily upon recursion, it is important that the
current scoping unit always be available from any point in the code.
Therefore, the `SymbolTables` class has the `current_scope` property
which contains a reference to the current `SymbolTable`. Obviously,
this property must be updated as the parser enters and leaves scoping
units.  This is handled for all cases bar one within the `BlockBase`
base class since this is sub-classed by all classes which represent a
block of code and that therefore includes all those which define a
scoping region. The exception is the helper class
`Fortran2003.Main_Program0` which represents Program units that do not
include the (optional) program-stmt (see R1101 in the Fortran
standard).  The creation of a scoping unit for such a program is
handled within the `Fortran2003.Main_Program0.match()` method. Since
there is no name associated with such a program, the corresponding
symbol table is given the name "fparser2:main_program", chosen so as
to prevent any clashes with other Fortran names.

Those classes which define scoping regions must subclass the
`ScopingRegionMixin` class:

.. autoclass:: fparser.two.utils.ScopingRegionMixin


.. _class-generation:

Class Generation
++++++++++++++++

Some classes that are specified as strings in the `subclass_names` or
`use_names` variables do not require class implementations. There are 3
categories of these:

1) classes of the form '\*\_Name'
2) classes of the form '\*\_List'
3) classes of the form 'Scalar\_\*'

The reason for this is that such classes can be written in a generic,
boiler-plate way so it is simpler if these are generated rather than
them having to be hand written.

At the end of the Fortran2003.py and Fortran2008.py files there is
code that is executed when the file is imported. This code generates
the required classes described above in the local file.

.. note::

   The way this is implemented needs to be described.

As a practical example, consider rule `R1106`
::

   R1106 end-module-stmt is END [ MODULE [ module-name ] ]

which is implemented in the following way
::

    class End_Module_Stmt(EndStmtBase):  # R1106
        ''' <description> '''
        subclass_names = []
        use_names = ['Module_Name']

        @staticmethod
        def match(string):
            return EndStmtBase.match('MODULE', Module_Name, string)

It can be seen that the `Module_Name` class is specified as a string
in the `use_names` variable. The `Module_Name` class has no
implementation in the Fortran2003.py file, the class is
generated. This code generation is performed when the file is
imported.

.. note::

   At the moment the same code-generation code is replicated in both
   the Fortran2003.py and Fortran2008.py files. It would be better to
   import this code from a separate file if it is possible to do so.

.. _base-classes:

Base classes
++++++++++++

There are a number of base classes implemented to support matching
certain types of pattern in a rule. The two most commonly used are
given below. As mentioned earlier, the class `Base` supports a choice
between classes. The class `BlockBase` supports an initial and final
match with optional subclasses inbetween (useful for matching rules
such as programs, subroutines, if statements etc.).

.. autoclass:: fparser.two.utils.Base
               :members:
               :noindex:

.. autoclass:: fparser.two.utils.BlockBase
               :members:
               :noindex:

.. note::

   The `BlockBase` `match` method is complicated. One way to simplify this
   would be to create a `NamedBlockBase` which subclasses `BlockBase`. This
   would include the code associated with a block having a name.

.. _Fortran2008:

Fortran2008 implementation
++++++++++++++++++++++++++

As Fortran2008 is a superset of Fortran2003, the Fortran2008 classes
are implemented as extensions to the Fortran2003 classes where
possible. For example, the Fortran2003 rule for a program-unit is::
   
    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or block-data

and for Fortran2008 it is
::
   
    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or submodule
                         or block-data

Therefore to implement the Fortran2008 version of this class, the
Fortran2003 version needs to be extended with the `subclass_names`
list being extended to include a `Submodule` class as a string (of
course the `Submodule` class also needs to be implemented!)
::

    >>> from fparser.two.Fortran2003 import Program_Unit as Program_Unit_2003

    >>> class Program_Unit(Program_Unit_2003):  # R202
    >>>       ''' <description> '''
    >>>       subclass_names = Program_Unit_2003.subclass_names[:]
    >>>       subclass_names.append("Submodule")


.. _program-class:

Program Class (rule R201)
+++++++++++++++++++++++++

As discussed earlier, Fortran rule `R201` is the 'top level' Fortran
rule. There are no other rules that reference rule `R201`. The rule
looks like this::

    R201 program is program-unit
                    [ program-unit ] ...

which specifies that a Fortran program can consist of one or more program
units. Note, the above rule does not capture the fact that it is valid
to have an arbitrary number of comments before the first program-unit,
inbetween program-units and after the final program-unit.

As the above rule is not a simple choice between different rules a
static `match` method is required for the associated fparser2
`Program` class.

As discussed earlier there are a number of base classes implemented to
support matching certain types of pattern in a rule. The obvious one
to use here would be `BlockBase` as it supports a compulsory first
class, an arbitrary number of optional intermediate classes (provided
as a list) and a final class. Therefore, subclassing `BlockBase` and
setting the first class to `Program_Unit`, the intermediate classes to
`[Program_Unit]`, and the final class to `None` would seem to perform
the required functionality (and this was how it was implemented in
earlier versions of fparser2).

However, there is a problem using `BlockBase`. In the case where there
is no final class (which is the situation here) it is valid for the
first class to match and for an optional class to **fail** to
match. This is not the required behaviour for the `Program` class as, if an
optional `Program_Unit` exists then it must be a valid `Program_Unit`
or the code is invalid. For example, the following code is invalid as
there is a misspelling of `subroutine`::

    program test
    end
    subroutin broken
    end

To implement the required functionality for the `Program` class, the
static `match` method is written manually. A `while` loop is used to
ensure that there is no match if any `Program_Unit` is invalid.

There are also two contraints that must be adhered to by the `Program`
class:

1) Only one program unit may be a main program
2) Any name used by a program-unit (e.g. program fred) must be
   distinct from names used in other program-units.

At the moment neither of these two contraints are enforced in
fparser2. Therefore two xfailing tests `test_one_main1` and
`test_multiple_error1` have been added to the
`tests/fortran2003/test_program_r201.py` file to demonstrate these
limitations.

Further, in Fortran the `program` declaration is actually
optional. For example, the following is a valid (minimal) Fortran
program::

    end

fparser2 does not support the above syntax in its `Program_Unit`
class. Therefore as a workaround, a separate `Program_Unit0` class has
been implemented and added as a final test to the `Program` match
method. This does make use of `BlockBase` to match and therefore
requires the `Program` class to subclass `BlockBase`.

.. note::
   
   It would be much better if `Program_Unit` was coded to support
   optional program declarations and this option should be
   investigated.

The current implementation also has a limitation in that
multiple program-units with one of them not having a program
declaration are not supported. The xfailing test
`test_missing_prog_multi` has been added to the
`tests/fortran2003/test_program_r201.py` file to demonstrate this
limitation.

A final issue is that the line numbers and line information output is
incorrect in certain cases where there is a syntax error in the code
and there are 5 spaces before a statement. The xfailing tests
`test_single2` and `test_single3` have been added to the
`tests/fortran2003/test_program_r201.py` file to demonstrate this
error.

.. _program-unit-class:

Program_Unit Class (rule R202)
++++++++++++++++++++++++++++++

Fortran2003 rule `r202` is specified as
::

    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or block-data

As the above rule is a simple choice between different rules, the
appropriate matching code is already implemented in one of the base
classes (`Base`) and therefore does not need to be written.  Instead,
the rules on the right hand side can be provided as **strings** in the
`subclass_names` list. The `use_names` list should be empty and the
`tostr` method is not required (as there is no text to output because
this rule is simply used to decide what other rules to use).

.. note::

    it is currently unclear when to use `subclass_names` and when to use
    `use_names`. At the moment the pragmatic suggestion is to follow the
    way it is currently done.

Therefore to implement rule `R202` the following needs to
be specified
::
   
    class Program_Unit(Base):  # R202
        ''' <description> '''
        subclass_names = ['Comment', 'Main_Program', 'External_Subprogram',
                          'Module', 'Block_Data']

In this way fparser2 captures the `R202` rule hierarchy in its
`Program_Unit` class.

.. _exceptions:

Exceptions
++++++++++

There are 7 types of exception raised in fparser2: `NoMatchError`,
`FortranSyntaxError`, `ValueError`, `InternalError`, `AssertionError` and
`NotImplementedError`.

A baseclass `FparserException` is included which `NoMatchError`,
`FortranSyntaxError` and `InternalError` subclass. The reason for this
is to allow external tools to more simply manage fparser if it is used
as a library.

Each of the exceptions are now discussed in turn.

`NoMatchError` can be raised by a class when the text it is given does
not match the pattern for the class. A class can also return an empty
return value to indicate no match. It is currently unclear when it is
appropriate to do one or the other.

`NoMatchError` (or an empty return value) does not necessarily mean that
the text is invalid, just that the text does not match this class. For
example, it may be that some text should match one of a set of
rules. In this case all rules would fail to match except one. It is
only invalid text if none of the possible rules match.

Usually `NoMatchError` is raised by a class with no textual information
(a string provided as an argument to the exception), as textual
information is not required. When textual information is provided this
is ignored.

.. note::

   `NoMatchError` is the place where we can get context-specific
   information about a syntax error. The problem is that there are
   typically many `NoMatchError`s associated with invalid code. The
   reason for this is that every (relevant) rule needs to be matched
   with the associated invalid code. Each of these will return a
   `NoMatchError`. One option would be to always return
   context-specific information from `NoMatchError` and somehow
   aggregate this information until it is known that there is a syntax
   error. At this point a `FortranSyntaxError` is raised and the
   aggregated messages could be used to determine the correct
   message(s) to return. As a simple example, imagine parsing the
   following code: `us mymodule`.  This is probably meant to mean `use
   mymodule`. The associated rule might return a `NoMatchError` saying
   something like `use not found`. However, there might be a missing
   `=` and it could be that an assignment would would also return a
   `NoMatchError` saying something like `invalid assignment`. It is
   unclear which was the programmers intention. In general, it is
   probable that the further into a rule one gets the more likely it
   is a syntax error for that rule, so it may be possible to prune out
   many `NoMatchError`s. There may even be some rule about this
   i.e. if a hierarchy of rules is matched to a certain depth then it
   must be a syntax error associated with this rule. However, in
   general it will not be possible to prune `NoMatchError`s down to one.
   The first step could be to return context information from
   `NoMatchError` for all failures to match and then look at whether
   there is an obvious way to prune these when raising a
   `FortranSyntaxError`.

.. note::

   Need to add an explanation about when `NoMatchError` exceptions are
   used and when a null return is used.

A `FortranSyntaxError` exception should be raised if the parser does
not recognise the syntax. `FortranSyntaxError` takes two
arguments. The first argument is a reader object which allows the line
number and text of the line in question to be output. The second
argument is text which can be used to give details of the error.

Currently the main use of `FortranSyntaxError` is to catch either an
`InternalSyntaxError` exception or the final `NoMatchError` exception
and re-raise it with line number and the text of the line to be
output. These exceptions are caught and re-raised by overriding the
`Base` class `__new__` method in the top level `Program` class. A
limitation of the `NoMatchError` exception (but not the
`InternalSyntaxError` exception) is that it is not able to give any
details of the error, as it knows nothing about which rules failed to
match.

`FortranSyntaxError` should also be used when it is known that there
is a match, the match has a syntax error and the line number
information is available via the reader object. One issue is that when
`FortranSyntaxError` is raised from such a location, the `fparser2.py`
script may not be able to use the reader's fifo buffer to extract
position information. In this case, position information is not
provided in the output. It is possible that if the lines were pushed
back into the buffer in the parser code then this problem would not
occur.

.. note::

   more information about the error could be determined by inspecting
   the FortranReader object. In particular, a match can be over a
   number of lines and the first line could be returned as well as the
   last. At the moment the last line and the line number are returned.

An `InternalSyntaxError` exception should be raised when it is known
that there is a match and that a syntax error has occured but it is
not possible to use the `FortranSyntaxError` exception as the line
number information is not known (typically because the match is part
of a line rather than a full line so the input to the associated match
method is a string not a reader object). As mentioned earlier, this
exception is subsequently picked up and re-raised as a
`FortranSyntaxError` exception with line number information added.
   
A `ValueError` exception is raised if an invalid standard is passed to
the `create` method of the `ParserFactory` class.

An `InternalError` exception is raised when an unexpected condition is
found. Such errors currently specify where there error was, why it
happened and request that the authors are contacted.

.. note::
   
   An additional future idea would be to also wrap the whole code with
   a general exception handler which subsequently raised an
   InternalError. This would catch any additional unforseen errors
   e.g. errors due to the wrong type of data being passed. One
   implementation would be to have this as the the only place an
   InternalError is raised, however, it is considered better to check
   for exceptions where they might happen e.g. a dangling else clause,
   as appropriate contextual information can be given in the
   associated error message.

.. note::

   Information needs to be added about the use of
   `NotImplementedError` and `AssertionError` and/or the code needs to
   be modified. These exceptions come from pre-existing code and it is
   likely that we would want to remove the `AssertionError` from
   fparser. There has also been discussion about using a logger for
   messages, however, there are currently no known situations where it
   makes sense to output messages.

Object Hierarchy
++++++++++++++++

Fortran code is parsed by creating the `Program` object with a
`FortranReader` object as its argument. If the code is parsed
successfully then a hierarchy of objects is returned associated with
the structure of the original code. For example::

    >>> from fparser.common.readfortran import FortranStringReader
    >>> code = "program test\nend"
    >>> reader = FortranStringReader(code)
    >>> ast = parser_f2003(reader)
    >>> ast
    Program(Main_Program(Program_Stmt('PROGRAM', Name('test')), End_Program_Stmt('PROGRAM', None)))

Therefore the above example creates a `Program` object, which contains
a `Main_Program` object. The `Main_Program` object contains a
`Program_Stmt` object followed by an `End_Program_Stmt` object. The
`Program_Stmt` object contains the `PROGRAM` text and a `Name`
object. The `Name` object contains the name of the program
i.e. `test`. The `End_Program_Stmt` object contains the `PROGRAM` text
and a `None` for the name as it is not supplied in the original code.

As one might expect, the object hierarchy adheres to the Fortran rule
hierarchy presented in the associated Fortran specification document
(as each class implements a rule). If one were to manually follow the
rules in the specification document to confirm a code was compliant
and write down the rules visited on a piece of paper in a hierarchical
manner (i.e. also write down which rules triggered subsequent rules)
then there would be a one-to-one correspondance between the rules and
rule hierarchy written on paper and the objects and object hierarchy
returned by fparser2.

Extensions
++++++++++

Compilers often support extensions to the Fortran standard. fparser2
also does this in certain cases. The suggested way to support this in
fparser2 is to add an appropriate name to the `EXTENSIONS` list in
`utils.py` and then support this extension in the appropriate class if
the name is found in the `EXTENSIONS` list. This will allow this list
to be modified in the future (e.g. a `-std` option could force the
compiler to throw out any non-standard Fortran).

.. note::

   A number of extensions do not currently follow this convention and
   are always supported in fparser2 (e.g. support for `$` in
   names). At some point these need to be modified to use the new
   approach. Eventually, the concept of extensions is expected to be
   implemented as a configuration file rather than a static list.

Include files
+++++++++++++

fparser has been extended to support include files as part of the
Fortran syntax. This has been implemented in two new classes
`fparser.two.Fortran2003.Include_Stmt` and
`fparser.two.Fortran2003.Include_Filename`. This allows fparser to
parse code with unresolved include files.

The filename matching pattern implemented in fparser is that the
filename must start with a non-space character and end with a
non-space character. This is purposely a very loose restriction
because many characters can be used in filenames and different
characters may be valid in different operating systems. Note that
whilst the term filename is used here it can be a filepath.

The include statement rule is added to the start of the `BlockBase`
match method by integrating it with the `comments` rule in the
`add_c_and_i()` function. This means that any includes before a
BlockBase will be matched.

The include statement rule is also added to the subclasses to match in
the `BlockBase` match method by simply appending it to the existing
subclasses (the valid classes between the start and end classes) in
the same way that the Comments class is added. This means that any
includes within a `BlockBase` will be matched.

All Fortran rules that are responsible for matching whole line
statements (apart from the top level Program rule R201) make use of
the `BlockBase` match method. Therefore by adding support for includes
at the beginning and within a BlockBase class we support includes at
all possible locations (apart from after the very last statement).

The top level Program rule R201 supports includes at the level of
multiple program units by again making use of the `add_c_and_i()`
function before any 'program units', between 'program units' and after
any 'program units'. This completes all valid locations for include
statements, including the missing last statement mentioned in the
previous paragraph.

Preprocessing Directives
++++++++++++++++++++++++

fparser2 retains preprocessing directives as nodes in the parse tree
but does not interpret them. This has been implemented in
`C99Preprocessor.py` as a number of classes that have names with the
prefix `Cpp_`. This allows fparser2 to parse code successfully that
contains preprocessing directives but reduces to valid Fortran if the
directives are omitted.

Similarly to comments, the readers represent preprocessing directives
by a dedicated class `CppDirective`, which is a subclass of `Line`.
This allows directives to be detected early and matches to be limited
to source lines that are instances of `CppDirective`. Matching of directives
is performed in the same place as include statements to make sure that they
are recognized at all locations in a source file.

Most directives are implemented as subclasses of `WORDClsBase` or
`StringBase` (with the only exceptions being macro definition and
null directive).

Conditional inclusion directives (`#if...[#elif...]...#endif` or their
variants `#ifdef`/`#ifndef`) are represented as individual nodes by
classes `fparser.two.C99Preprocessor.Cpp_If_Stmt`,
`fparser.two.C99Preprocessor.Cpp_Elif_Stmt`,
`fparser.two.C99Preprocessor.Cpp_Else_Stmt`, and
`fparser.two.C99Preprocessor.Cpp_Endif_Stmt` but
currently not grouped together in any way since directives can appear
at any point in a file and thus the span of conditional inclusions may
be orthogonal to a Fortran block. In `#if(n)def` directives the
identifier is matched using
`fparser.two.C99Preprocessor.Cpp_Macro_Identifier`
and may contain only letters and underscore. In `#if` or `#elif`
directives the constant expression is matched very loosely by
`fparser.two.C99Preprocessor.Cpp_Pp_Tokens`
which accepts any non-empty string.

Include directives (`#include`) are handled similarly to Fortran
include statements with the matching of filenames being done by the
same class and therefore with the same (loose) restrictions.

Directives that define macro replacements (`#define`) contain a
macro identifier that is matched using `Cpp_Macro_Identifier`.
This is followed by an optional identifier list in parentheses
(and without white space separating identifier and opening
parenthesis) that defines parameters to the macro for use in the
replacement expression. The identifier list is matched by
`fparser.two.C99Preprocessor.Cpp_Macro_Identifier_List`
which, however, does not treat individual identifiers as separate
names but matches the entire list as a single string.
The replacement expression is matched and represented as
`Cpp_Pp_Tokens`.

The matching of `#undef` statements is implemented in class
`fparser.two.C99Preprocessor.Cpp_Undef_Stmt` with the identifier again
matched by `Cpp_Macro_Identifier`.

Directives `#line`, `#error`, and `#warning` are implemented in classes
`fparser.two.C99Preprocessor.Cpp_Line_Stmt`,
`fparser.two.C99Preprocessor.Cpp_Error_Stmt`, and
`fparser.two.C99Preprocessor.Cpp_Warning_Stmt` with the corresponding
right hand sides matched by `Cpp_Pp_Tokens`.

A single preprocessing directive token `#` without any directive is
a null statement and is matched by
`fparser.two.C99Preprocessor.Cpp_Null_Stmt`.

Utils
+++++

fparser2 includes a `utils.py` file. This file contains the base
classes (discussed in the :ref:`base-classes` section), the
fparser2-specific exceptions (discussed in the :ref:`exceptions`
section), a list of extensions (see previous section) and a tree-walk
utility that can be used to traverse the AST produced by fparser2 for
a valid Fortran program.

.. note::

   the tree-walk utility currently fails if the parent node of the
   tree is provided. The solution is to provide the parent's
   children. This should be fixed at some point.


.. skip
   # Constraints
   # +++++++++++
   # TBD
   # Comment Class
   # +++++++++++++
   # TBD

.. _tokenisation:

Tokenisation
++++++++++++

In order to simplify the problem of parsing code containing
potentially complex expressions, fparser2 performs some limited
tokenisation of a string before proceeding to attempt to match it.
Currently, this tokenisation replaces three different types of quantity with
simple names:

 1. the content of strings;
 2. expressions in parentheses;
 3. literal constants involving exponents (e.g. ``1.0d-3``)

This tokenisation is performed by the `string_replace_map` function:

.. autofunction:: fparser.common.splitline.string_replace_map

In turn, this function uses `splitquote` and `splitparen` (in the same
module) to split a supplied string into quanties within quotes or
parentheses, respectively. The matching for literal constants involving
exponents is implemented using a regular expression.

`string_replace_map` is used in the `match()` method of many of the classes
that implement the various language rules. Note that the tokenisation must
be undone before passing a given string on to a child class (or returning
it). This is performed using the reverse-map that `string_replace_map`
returns, e.g.::

    line, repmap = string_replace_map(string)
    ...
    type_spec = Declaration_Type_Spec(repmap(line[:i].rstrip()))

(The reverse map is an instance of `fparser.common.splitline.StringReplaceDict`
which subclasses`dict` and makes it callable.)

   
Expression matching
+++++++++++++++++++

The Fortran2003 rules specify a hierarchy of expressions (specified in
levels). In summary::

    R722 expr is [ expr defined-binary-op ] level-5-expr
    R717 level-5-expr is [ level-5-expr equiv-op ] equiv-operand
    R716 equiv-operand is [ equiv-operand or-op ] or-operand
    R715 or-operand is [ or-operand and-op ] and-operand
    R714 and-operand is [ not-op ] level-4-expr
    R712 level-4-expr is [ level-3-expr rel-op ] level-3-expr    
    R710 level-3-expr is [ level-3-expr concat-op ] level-2-expr
    R706 level-2-expr is [[level-2-expr] add_op ] add-operand
    R705 add-operand is [ add-operand mult-op ] mult-operand
    R704 mult-operand is level-1-expr [ power-op mult-operand ]
    R702 level-1-expr is [ defined-unary-op ] primary

As can hopefully be seen, the "top level" rule is `expr`, this depends
on a `level-5_expr`, which depends on an `equiv-operand` and so on in
a hierarchy in the order listed.

Fparser2 naturally follows this hierarchy, attempting to match in the
order specified. This works well apart from one case, which is the
matching of a Level-2 expression::

    R706 level-2-expr is [[level-2-expr] add_op ] add-operand

The problem is to do with falsely matching an exponent in a
literal. Take the following example::

    a - 1.0e-1

When searching for a match, the following pattern is a valid candidate
and will be the candidate used in fparser2 as fparser2 matches from the
right hand side of a string by default::

    level-2-expr = "a - 1.0e"
    add-op = "-"
    add-operand = "1"

As expected, this would fail to match, due to the level-2 expression
("a - 1.0e") being invalid. However, once R706 failed to match it
would not be called again as fparser2 follows the rule hierarchy
mentioned earlier. Therefore fparser2 would fail to match this string.

To solve this problem, fparser2 performs limited tokenisation of a string
before attempting to perform a match. Amongst other things, this tokenisation
replaces any numerical constants containing exponents with simple symbols
(see :ref:`tokenisation` for more details). For the example above this means
that the code being matched would now look like::

    a - F2PY_REAL_CONSTANT_1_

which is readily matched as a level-2 expression.

Continuous Integration
----------------------

GitHub Actions are used to run the test suite for a number of different
Python versions and the coverage reports are uploaded automatically to CodeCov
(https://codecov.io/gh/stfc/fparser). The configuration for this is in the
`.github/workflows/unit-tests.yml` file.

In addition, an Action is also used check that all of the code conforms
to Black (https://black.readthedocs.io) formatting. It is up to the developer
to ensure that this passes (e.g. by running `black` locally and committing
the results). Note that it is technically possibly to have the Action
actually make the changes and commit them but this was found to break
the Github review process since the automated commit is not permitted to
trigger further Actions and this then leaves GitHub thinking that the
various checks have not run.

Automatic Packaging
-------------------

A GitHub Action (https://github.com/pypa/gh-action-pypi-publish)
is also used to automate the process of uploading a new
release of fparser to the Python Package Index (pypi). This action is
configured in the `.github/workflows/python_publish.yml` file and is
triggered by the creation of a new release on GitHub.

Test Fixtures
-------------

Various pytest fixtures
(https://docs.pytest.org/en/stable/fixture.html) are provided so as to
aid in the mock-up of a suitable environment in which to run
tests. These are defined in `two/tests/conftest.py`:

=================== ======================= ===================================
Name                Returns                 Purpose
=================== ======================= ===================================
f2003_create        --                      Sets-up the class hierarchy for the
                                            Fortran2003 parser.
f2003_parser        `Fortran2003.Program`   Sets-up the class hierarchy for the
                                            Fortran2003 parser and returns the
                                            top-level Program object.
clear_symbol_table  --                      Removes all stored symbol tables.
fake_symbol_table   --                      Creates a fake scoping region and
                                            associated symbol table.
=================== ======================= ===================================


Performance Benchmark
---------------------

The fparser scripts folder contains a benchmarking script to assess the
performance of the parser by generating a synthetic Fortran file with
multiple subroutines and the associated subroutine calls. It can be executed
with the following command::

    ./src/fparser/scripts/fparser2_bench.py