1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
|
NAME
Makefile::DOM - Simple DOM parser for Makefiles
VERSION
This document describes Makefile::DOM 0.008 released on 18 November
2014.
DESCRIPTION
This library can serve as an advanced lexer for (GNU) makefiles. It
parses makefiles as "documents" and the parsing is lossless. The results
are data structures similar to DOM trees. The DOM trees hold every
single bit of the information in the original input files, including
white spaces, blank lines and makefile comments. That means it's
possible to reproduce the original makefiles from the DOM trees. In
addition, each node of the DOM trees is modifiable and so is the whole
tree, just like the PPI module used for Perl source parsing and the
HTML::TreeBuilder module used for parsing HTML source.
If you're looking for a true GNU make parser that generates an AST,
please see Makefile::Parser::GmakeDB instead.
The interface of "Makefile::DOM" mimics the API design of PPI. In fact,
I've directly stolen the source code and POD documentation of PPI::Node,
PPI::Element, and PPI::Dumper, with the full permission from the author
of PPI, Adam Kennedy.
"Makefile::DOM" tries to be independent of specific makefile's syntax.
The same set of DOM node types is supposed to get shared by different
makefile DOM generators. For example, MDOM::Document::Gmake parses GNU
makefiles and returns an instance of MDOM::Document, i.e., the root of
the DOM tree while the NMAKE makefile lexer in the future,
"MDOM::Document::Nmake", also returns instances of the MDOM::Document
class. Later, I'll also consider adding support for dmake and bsdmake.
Structure of the DOM
Makefile DOM (MDOM) is a structured set of a series of data types. They
provide a flexible document model conformed to the makefile syntax.
Below is a complete list of the 19 MDOM classes in the current
implementation where the indentation indicates the class inheritance
relationships.
MDOM::Element
MDOM::Node
MDOM::Unknown
MDOM::Assignment
MDOM::Command
MDOM::Directive
MDOM::Document
MDOM::Document::Gmake
MDOM::Rule
MDOM::Rule::Simple
MDOM::Rule::StaticPattern
MDOM::Token
MDOM::Token::Bare
MDOM::Token::Comment
MDOM::Token::Continuation
MDOM::Token::Interpolation
MDOM::Token::Modifier
MDOM::Token::Separator
MDOM::Token::Whitespace
It's not hard to see that all of the MDOM classes inherit from the
MDOM::Element class. MDOM::Token and MDOM::Node are its direct children.
The former represents a string token which is atomic from the
perspective of the lexer while the latter represents a structured node,
which usually has one or more children, and serves as the container for
other DOM::Element objects.
Next we'll show a few examples to demonstrate how to map DOM trees to
particular makefiles.
Case 1
Consider the following simple "hello, world" makefile:
all : ; echo "hello, world"
We can use the MDOM::Dumper class provided by Makefile::DOM to dump
out the internal structure of its corresponding MDOM tree:
MDOM::Document::Gmake
MDOM::Rule::Simple
MDOM::Token::Bare 'all'
MDOM::Token::Whitespace ' '
MDOM::Token::Separator ':'
MDOM::Token::Whitespace ' '
MDOM::Command
MDOM::Token::Separator ';'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'echo "hello, world"'
MDOM::Token::Whitespace '\n'
In this example, speparators ":" and ";" are all instances of the
MDOM::Token::Separator class while spaces and new line characters
are all represented as MDOM::Token::Whitespace. The other two leaf
nodes, "all" and "echo "hello, world"" both belong to
MDOM::Token::Bare.
It's worth mentioning that, the space characters in the rule command
"echo "hello, world"" were not represented as
MDOM::Token::Whitespace. That's because in makefiles, the spaces in
commands do not make any sense to "make" in syntax; those spaces are
usually sent to shell programs verbatim. Therefore, the DOM parser
does not try to recognize those spaces specifially so as to reduce
memory use and the number of nodes. However, leading spaces and
trailing new lines will still be recognized as
MDOM::Token::Whitespace.
On a higher level, it's a MDOM::Rule::Simple instance holding
several "Token" and one MDOM::Command. On the highest level, it's
the root node of the whole DOM tree, i.e., an instance of
MDOM::Document::Gmake.
Case 2
Below is a relatively complex example:
a: foo.c bar.h $(baz) # hello!
@echo ...
It's corresponding DOM structure is
MDOM::Document::Gmake
MDOM::Rule::Simple
MDOM::Token::Bare 'a'
MDOM::Token::Separator ':'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'foo.c'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'bar.h'
MDOM::Token::Whitespace '\t'
MDOM::Token::Interpolation '$(baz)'
MDOM::Token::Whitespace ' '
MDOM::Token::Comment '# hello!'
MDOM::Token::Whitespace '\n'
MDOM::Command
MDOM::Token::Separator '\t'
MDOM::Token::Modifier '@'
MDOM::Token::Bare 'echo ...'
MDOM::Token::Whitespace '\n'
Compared to the previous example, here appears several new node
types.
The variable interpolation "$(baz)" on the first line of the
original makefile corresponds to a MDOM::Token::Interpolation node
in its MDOM tree. Similarly, the comment "# hello" corresponds to a
MDOM::Token::Comment node.
On the second line, the rule command indented by a tab character is
still represented by a MDOM::Command object. Its first child node
(or its first element) is also an MDOM::Token::Seperator instance
corresponding to that tab. The command modifier "@" follows the
"Separator" immediately, which is of type MDOM::Token::Modifier.
Case 3
Now let's study a sample makefile with various global structures:
a: b
foo = bar
# hello!
Here on the top level, there are three language structures: one rule
""a: b"", one assignment statement "foo = bar", and one comment "#
hello!".
Its MDOM tree is shown below:
MDOM::Document::Gmake
MDOM::Rule::Simple
MDOM::Token::Bare 'a'
MDOM::Token::Separator ':'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'b'
MDOM::Token::Whitespace '\n'
MDOM::Assignment
MDOM::Token::Bare 'foo'
MDOM::Token::Whitespace ' '
MDOM::Token::Separator '='
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'bar'
MDOM::Token::Whitespace '\n'
MDOM::Token::Whitespace '\t'
MDOM::Token::Comment '# hello!'
MDOM::Token::Whitespace '\n'
We can see that below the root node MDOM::Document::Gmake, there are
MDOM::Rule::Simple, MDOM::Assignment, and MDOM::Comment three
elements, as well as two MDOM::Token::Whitespace objects.
It can be observed from the examples above that the MDOM representation
for the makefile's lexical elements is rather loose. It only provides
very limited structural representation instead of making a bad guess.
OPERATIONS FOR MDOM TREES
Generating an MDOM tree from a GNU makefile only requires two lines of
Perl code:
use MDOM::Document::Gmake;
my $dom = MDOM::Document::Gmake->new('Makefile');
If the makefile source code being parsed is already stored in a Perl
variable, say, $var, then we can construct an MDOM via the following
code:
my $dom = MDOM::Document::Gmake->new(\$var);
Now $dom becomes the reference to the root of the MDOM tree and its type
is now MDOM::Document::Gmake, which is also an instance of the
MDOM::Node class.
Just as mentioned above, "MDOM::Node" is the container for other
MDOM::Element instances. So we can retrieve some element node's value
via its "child" method:
$node = $dom->child(3);
# or $node = $dom->elements(0);
And we may also use the "elements" method to obtain the values of all
the nodes:
@elems = $dom->elements;
For every MDOM node, its corresponding makefile source can be generated
by invoking its "content" method.
BUGS AND TODO
The current implementation of the MDOM::Document::Gmake lexer is based
on a hand-written state machie. Although the efficiency of the engine is
not bad, the code is rather complicated and messy, which hurts both
extensibility and maintanabilty. So it's expected to rewrite the parser
using some grammatical tools like the Perl 6 regex engine
Pugs::Compiler::Rule or a yacc-style one like Parse::Yapp.
SOURCE REPOSITORY
You can always get the latest source code of this module from its GitHub
repository:
<http://github.com/agentzh/makefile-dom-pm>
If you want a commit bit, please let me know.
AUTHOR
Yichun "agentzh" Zhang (章亦春) <agentzh@gmail.com>
COPYRIGHT
Copyright 2006-2014 by Yichun "agentzh" Zhang (章亦春).
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
SEE ALSO
MDOM::Document, MDOM::Document::Gmake, PPI, Makefile::Parser::GmakeDB,
makesimple.
|