=head1 XML::DT a Perl XML down translate module

With XML::DT, I think that:

   . it is simple to do simple XML processing tasks :)
   . it is simple to keep a whole XML processor in a single variable
       (see example 4)
   . it is simple to translate XML into a user-controlled complex Perl
       structure with a compact "-type" definition (see the last section)

Feedback welcome -> jj@di.uminho.pt

This document is also available in HTML (pod2html'ized):
http://www.di.uminho.pt/~jj/perl/XML/XML-DT.readme.html

 . based on XML::Parser (tree mode).
 . designed for simple and compact translation/processing of XML documents
 . it borrows some features from OmniMark and sgmls.pm; functional approach
 . it includes functions to automatically build user-controlled complex Perl
       structures (see the "working with structures" section)
 . it was built to show my NLP Perl students that it is easy to work with XML
 . home page and download:  http://www.di.uminho.pt/~jj/perl/XML/DT.html

=head1 HOW IT WORKS:

 . the user defines a handler and calls the basic function:
      dt($filename,%handler) or dtstring($string,%handler)
 . the handler is a HASH mapping element names to functions. It can also
      have a "-default" function (used for elements with no entry of their
      own) and an "-end" function (applied to the final result)
 . to keep handlers small, each function receives its 3 arguments as
      global variables:
      $c - contents
      $q - element name
      %v - attribute values
 . the default "-default" function is the identity. The function "toxml"
      rebuilds the element's XML text from the current $c, $q and %v values.
 . see some advanced features in the last examples
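
The dispatch model just described can be sketched in a few lines of plain Perl. This is not XML::DT's code: "mini_dt", "mini_process" and "mini_toxml" are hypothetical names, and instead of parsing a file the sketch walks a tree already in XML::Parser's "Tree" style ([tag, [{attributes}, child_tag, child_content, ..., 0, "text"]]), built by hand so that no module is needed. Following the description above, we assume that elements without an entry go to "-default", that the identity (rebuilding the element, like toxml) is used when there is no "-default", and that "-end" post-processes the final string.

```perl
use strict;
use warnings;

# Globals seen by every handler, as in XML::DT:
# $c = contents, $q = element name, %v = attribute values.
our ($c, $q, %v);

# Identity handler: rebuild the element from $q, %v and $c
# (attribute values are not escaped, to keep the sketch short).
sub mini_toxml {
    my $attrs = join '', map { qq{ $_="$v{$_}"} } sort keys %v;
    return "<$q$attrs>$c</$q>";
}

# Process one element: its children first, then its own handler.
sub mini_process {
    my ($handler, $tag, $content) = @_;
    my ($attrs, @kids) = @$content;
    my $text = '';
    while (@kids) {
        my ($ktag, $kcontent) = splice(@kids, 0, 2);
        $text .= $ktag eq '0' ? $kcontent                    # text node
                              : mini_process($handler, $ktag, $kcontent);
    }
    local ($c, $q, %v) = ($text, $tag, %$attrs);
    my $f = $handler->{$tag} || $handler->{-default} || \&mini_toxml;
    return $f->();
}

# Entry point: process the root, then let -end (if any) rewrite the result.
sub mini_dt {
    my ($tree, %handler) = @_;
    my $result = mini_process(\%handler, @$tree);
    if (my $end = $handler{-end}) {
        local $c = $result;
        $result = $end->();
    }
    return $result;
}

# Usage, on the tree of <doc><e a="X" b="Y">hi</e><p>text</p></doc>
my $tree = ['doc', [{}, 'e', [{a => 'X', b => 'Y'}, 0, 'hi'],
                        'p', [{}, 0, 'text']]];
my %handler = ( e    => sub { $v{a} = lc $v{a}; mini_toxml() },
                -end => sub { "[$c]" } );
print mini_dt($tree, %handler), "\n";
# prints [<doc><e a="x" b="Y">hi</e><p>text</p></doc>]
```

Because the handlers read $c, $q and %v instead of receiving arguments, each entry in the handler hash stays a one-liner; "local" gives every element its own values while the children are processed first.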

=head1 SOME simple (naive) examples:

  INDEX:
  1. lowercase the value of the attribute named "a" in element "e"
  2. a better solution
  3. make some statistics and output the results in HTML (using side effects)
  4. in an HTML-like XML document, fill <contents/> with the real table
      of contents (a dirty solution...)
  5. a more realistic example: from the XML gcapaper DTD to LaTeX

  WORKING WITH STRUCTURES INSTEAD OF STRINGS...

  6. build the natural Perl structure of a document (HASH, ARRAY)
  7. Christmas card... (multimap on a repeated element: MMAPON)

=head2 1. lowercase the value of the attribute named "a" in element "e"

  use XML::DT;
  my $filename = shift;

  # rebuild <e> by hand, with a lowercased "a" attribute
  print dt($filename,
           ( e => sub{ "<e a='". lc($v{a}). "'>$c</e>" }));

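=head2 2. A better solution to the previous example

Example 1 drops any other attributes of element e, because it rebuilds the
element by hand. A better solution changes only %v and lets toxml() rebuild
the element with all its attributes:

  print dt($filename,
           ( e => sub{ $v{a} = lc($v{a});   # modify the attribute in place
                       toxml();}));         # rebuild the element from $q, %v, $c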

=head2 3. make some statistics and output results in HTML (using side effects)

  use XML::DT;
  my $filename = shift;

  # -default is called for every element: it only updates the counters
  # and returns "" so that dt() itself produces no output
  %handler = ( -default => sub{ $elem_counter++;
                                $elem_table{$q}++;   # $q is the element name
                                "" } );

  dt($filename,%handler);

  print "<H3>We have found $elem_counter elements in the document</H3>\n";
  print "<TABLE><TR><TH>ELEMENT<TH>OCCURS\n";
  foreach $elem (sort keys %elem_table)
     {print "<TR><TD>$elem<TD>$elem_table{$elem}\n";}
  print "</TABLE>\n";

=head2 4. In an HTML-like XML document, fill <contents/> with the real table of contents (a dirty solution...)

  use XML::DT;
  my $filename = shift;

  # collect the headings into $index while copying them unchanged (toxml),
  # mark the <contents> element, and insert the index at the end (-end)
  %handler=( h1 => sub{ $index .= "\n$c";     toxml();},
             h2 => sub{ $index .= "\n\t$c";   toxml();},
             h3 => sub{ $index .= "\n\t\t$c"; toxml();},
             contents => sub{ $c="__CLEAN__"; toxml();},
             -end => sub{ $c =~ s/__CLEAN__/$index/; $c});

  print dt($filename,%handler);

=head2 5. a more realistic example: from XML gcapaper DTD to latex

notes:

  . "TITLE" is processed in a context-dependent way!
  . output is in ISO-8859-1 (this is dirty, but my LaTeX doesn't support Unicode)
  . a stack of authors (@aut) was necessary because the LaTeX structure
      differs from the input structure...
  . this example was partially created by the function mkdtskel
        perl -MXML::DT -e 'mkdtskel "f.xml"' > f.pl
      and took me about one hour to tune into a real XML-to-LaTeX translator.

NAME gcapaper2tex.pl - a Perl script to translate documents using the XML gcapaper DTD into LaTeX

SYNOPSIS gcapaper2tex.pl mypaper.xml > mypaper.tex

  use XML::DT ;
  my $filename = shift;
  my $beginLatex = '\documentclass{article} \begin{document} ';
  my $endLatex = '\end{document}';
  
  %handler=(
      '-outputenc' => 'ISO-8859-1',
      '-default'   => sub{"$c"},
       'RANDLIST' => sub{"\\begin{itemize}$c\\end{itemize}"},
       'AFFIL' => sub{""},                              # delete affiliation
       'TITLE' => sub{
                    if(inctxt('SECTION')){"\\section{$c}"}
                 elsif(inctxt('SUBSEC1')){"\\subsection{$c}"}
                 else                    {"\\title{$c}"}
              },
       'GCAPAPER' => sub{"$beginLatex $c $endLatex"},
       'PARA' => sub{"$c\n\n"},
       'ADDRESS' => sub{"\\thanks{$c}"},
       'PUB' => sub{"} $c"},                  # closes the "{" left open by BIBITEM
       'EMAIL' => sub{"(\\texttt{$c}) "},
       'FRONT' => sub{"$c\n"},
       'AUTHOR' => sub{ push @aut, $c ; ""},
       'ABSTRACT' => sub{
          sprintf('\author{%s}\maketitle\begin{abstract}%s\end{abstract}',
                  join ('\and', @aut) ,
                  $c) },
       'CODE.BLOCK' => sub{"\\begin{verbatim}\n$c\\end{verbatim}\n"},
       'XREF' => sub{"\\cite{$v{REFLOC}}"},
       'LI' => sub{"\\item $c"},
       'BIBLIOG' =>sub{"\\begin{thebibliography}{1}$c\\end{thebibliography}\n"},
       'HIGHLIGHT' => sub{" \\emph{$c} "},
       'BIO' => sub{""},                                  #delete biography
       'SURNAME' => sub{" $c "},
       'CODE' => sub{"\\verb!$c!"},
       'BIBITEM' => sub{"\n\\bibitem{$c"},    # its "{" is closed by the PUB output
  );
  print dt($filename,%handler); 
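
The context-dependent TITLE handler can be illustrated without XML::DT by keeping a stack of the open element names while walking the tree. In this sketch "@open", "in_context", "title_to_latex" and "walk" are hypothetical, the input is a hand-built XML::Parser "Tree"-style structure, and only the immediate parent is tested, since a SUBSEC1 title presumably also lies inside a SECTION; the exact matching rules of XML::DT's own "inctxt" are not described here.

```perl
use strict;
use warnings;

# Names of the currently open elements, outermost first.
my @open;

# True if the parent of the current element is named $name
# (a simplified stand-in for a context test like inctxt).
sub in_context {
    my ($name) = @_;
    return @open >= 2 && $open[-2] eq $name;
}

# Choose the LaTeX command for a TITLE from its context.
sub title_to_latex {
    my ($text) = @_;
    return in_context('SECTION') ? "\\section{$text}"
         : in_context('SUBSEC1') ? "\\subsection{$text}"
         :                         "\\title{$text}";
}

# Walk an XML::Parser Tree-style structure; only TITLE is translated,
# every other element just passes its contents through.
sub walk {
    my ($tag, $content) = @_;
    push @open, $tag;
    my (undef, @kids) = @$content;
    my $out = '';
    while (my ($k, $v) = splice(@kids, 0, 2)) {
        $out .= $k eq '0' ? $v : walk($k, $v);
    }
    $out = title_to_latex($out) if $tag eq 'TITLE';
    pop @open;
    return $out;
}

my $paper = ['GCAPAPER', [{},
    'FRONT',   [{}, 'TITLE', [{}, 0, 'Paper']],
    'SECTION', [{}, 'TITLE', [{}, 0, 'Intro'],
                    'SUBSEC1', [{}, 'TITLE', [{}, 0, 'Details']]]]];
print walk(@$paper), "\n";
# prints \title{Paper}\section{Intro}\subsection{Details}
```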

=head1 WORKING WITH STRUCTURES INSTEAD OF STRINGS...

  The "-type" definition specifies, for each element, how the values
  returned by its sub-elements are combined:

   . "HASH" or "MAP" -> make a hash with the sub-elements;
        keys are the sub-element names; warns on repetitions;
        returns the hash reference.
   . "ARRAY" or "SEQ" -> make an ARRAY with the sub-elements;
        returns an array reference.
   . "MULTIMAP" -> makes a HASH of ARRAYs; keys are the sub-element names
   . MMAPON(name1, ...) -> similar to HASH but accepts repetitions of
        the sub-elements "name1"... (and makes an array with them)
   . STR -> (DEFAULT) concatenates the values returned by all the
        sub-elements; all the sub-elements should therefore return strings
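
The rules above can be sketched in plain Perl to show what each one builds. "combine" is a hypothetical helper, not part of XML::DT: it receives the type and the (sub-element name => value) pairs returned by an element's children. Representing MMAPON(name1, ...) as the array reference ['MMAPON', name1, ...] is our own assumption, as is keeping the last value for a repeated name in HASH mode after the warning.

```perl
use strict;
use warnings;

# Combine the (name => value) pairs returned by an element's children
# according to a -type rule. Hypothetical helper, not XML::DT code.
sub combine {
    my ($type, @pairs) = @_;
    $type = 'STR' unless defined $type;
    my @kids;                                       # list of [name, value]
    push @kids, [splice(@pairs, 0, 2)] while @pairs;

    if (ref($type) eq 'ARRAY' && $type->[0] eq 'MMAPON') {
        my %multi = map { $_ => 1 } @{$type}[1 .. $#$type];
        my %h;
        for my $kid (@kids) {
            my ($n, $val) = @$kid;
            if ($multi{$n}) { push @{ $h{$n} }, $val }   # repeatable: a list
            else            { $h{$n} = $val }            # otherwise as HASH
        }
        return \%h;
    }
    if ($type eq 'HASH' || $type eq 'MAP') {        # one value per name
        my %h;
        for my $kid (@kids) {
            my ($n, $val) = @$kid;
            warn "repeated sub-element '$n'\n" if exists $h{$n};
            $h{$n} = $val;                          # the last one is kept
        }
        return \%h;
    }
    if ($type eq 'ARRAY' || $type eq 'SEQ') {       # values in document order
        return [ map { $_->[1] } @kids ];
    }
    if ($type eq 'MULTIMAP') {                      # every name -> a list
        my %h;
        push @{ $h{ $_->[0] } }, $_->[1] for @kids;
        return \%h;
    }
    return join '', map { $_->[1] } @kids;          # STR: concatenate
}

# Structures like those of examples 6 and 7, built from hand-written pairs:
my $tels   = combine('ARRAY', item => '1111', item => '1112', item => '1113');
my $inst   = combine('HASH',  id => 'U.M.', tels => $tels, where => 'Portugal');
my $person = combine(['MMAPON', 'address'],
                     name => 'name0', address => 'address00', address => 'address01');
print "$inst->{tels}[2] $person->{address}[1]\n";   # prints 1113 address01
```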


=head2 6. Build the natural Perl structure of the following document

  <institution>
    <id>U.M.</id>
    <name>University of Minho</name>
    <tels>
      <item>1111</item> 
      <item>1112</item>
      <item>1113</item>
    </tels>
    <where>Portugal</where>
    <contacts>J.Joao; J.Rocha; J.Ramalho</contacts>
  </institution>

  use XML::DT;
  %handler = ( -default => sub{$c},
               -type    => { institution => 'HASH',
                             tels        => 'ARRAY' },
               contacts => sub{ [ split(";",$c)] },
             );
  
  my $inst = dt("ex10.2.xml", %handler);   # ex10.2.xml holds the document above


$inst is a reference to a HASH:

  { 'tels' => [ 1111, 1112, 1113 ],
    'name' => 'University of Minho',
    'where' => 'Portugal',
    'id' => 'U.M.',
    'contacts' => [ 'J.Joao', ' J.Rocha', ' J.Ramalho' ] };
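
The leading blanks in ' J.Rocha' and ' J.Ramalho' come from split(";",$c), which keeps the space after each semicolon. If trimmed names are preferred, the "contacts" handler could split on a pattern that also absorbs the surrounding whitespace; a small check in plain Perl:

```perl
use strict;
use warnings;

my $c = 'J.Joao; J.Rocha; J.Ramalho';        # contents of <contacts>

my @as_in_example = split(";", $c);          # 'J.Joao', ' J.Rocha', ' J.Ramalho'
my @trimmed       = split(/\s*;\s*/, $c);    # 'J.Joao', 'J.Rocha', 'J.Ramalho'

print join('|', @trimmed), "\n";             # prints J.Joao|J.Rocha|J.Ramalho
```

In the handler this would read contacts => sub{ [ split(/\s*;\s*/, $c) ] }; blanks at the very start or end of $c would still need a separate trim.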

=head2 7. Christmas card...

We have the following address book:

  <people>
    <person>
        <name> name0 </name>
        <address> address00 </address>    
        <address> address01 </address>
    </person>
    <person>
        <name> name1 </name>
        <address> address10 </address>    
        <address> address11 </address>
    </person>
  </people>

Now we are going to build a structure to store the address book and send a
Christmas card to the first address of each person:

  #!/usr/bin/perl
  use XML::DT;
  %handler = ( -default => sub{$c},
               person   => sub{ mkchristmascard($c); $c},  # $c is a hash ref here
               -type    => { people => 'ARRAY',
                             person => MMAPON('address')});

  $people = dt("ex11.1.xml", %handler);

  print $people->[0]{address}[1];     # prints  address01

  # print one card (name, first address, greeting) through lpr
  sub mkchristmascard{
    my $x = shift;
    open(my $lpr, '|-', 'lpr') or die "cannot run lpr: $!";
    print $lpr "  $x->{name}\n",
               "  $x->{address}[0]\n",
               "\n",
               "  Dear $x->{name}\n",
               "    Merry Christmas from Braga Perl mongers\n\n";
    close $lpr;
  }
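
To try the card text without a printer, the same logic can build a string instead of writing to lpr. This sketch assumes the structure built by MMAPON('address') is an array of hashes whose "name" is a string and whose "address" is an array reference, as the $people->[0]{address}[1] access above suggests; "card_text" and "trim" are hypothetical helpers, and the data are typed in by hand (keeping the blanks around the text of the sample file) rather than produced by dt().

```perl
use strict;
use warnings;

# Remove the blanks around a text value.
sub trim { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s }

# Text of one card: name, first address, greeting.
sub card_text {
    my ($x) = @_;                          # { name => ..., address => [...] }
    my $name = trim($x->{name});
    my $addr = trim($x->{address}[0]);     # only the first address is used
    return "$name\n$addr\n\nDear $name\n  Merry Christmas from Braga Perl mongers\n";
}

# Hand-typed stand-in for the structure returned by dt() above.
my $people = [
    { name => ' name0 ', address => [' address00 ', ' address01 '] },
    { name => ' name1 ', address => [' address10 ', ' address11 '] },
];
print card_text($_), "\n" for @$people;
```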