File: namespaceHandlers.xml

<?xml version="1.0"?>

<doc xmlns:r="http://www.r-project.org"
     xmlns:pmml="http://www.dmg.org/PMML-3_1">

 <r:model>
  This is an R model
 </r:model>
 <pmml:model>
  This node with the same name identifies a model in PMML.
 </pmml:model>

 <model>
  And this is a model element with no namespace.
 </model>

We want to handle the model nodes in R.
The usual approach would be to write
a handler such as the following.

<r:code>
model = function(node) {
           cat(xmlValue(node), "\n")
           node
        }
</r:code>
Then, we pass this handler to the DOM parser with
<r:code>
xmlRoot(xmlTreeParse("namespaceHandlers.xml", 
                     handlers = list(model = model), 
                     asTree = TRUE))
</r:code>

The output we get is 
<r:output>
This is an R model 
This node with the same name identifies a model in PMML. 
</r:output>
and then the resulting XML tree.

<para/>
We want to have different handlers,
one for the R model node and the other for the PMML model
node.
We will also add one for the model node that has no namespace prefix.
We can do this with
<r:code>
handlers =
  list("r:model" = function(node) {
                     cat("R handler: name = ", xmlName(node), ", value =", xmlValue(node), "\n")
                     node
                   },
       "pmml:model" = function(node) {
                        cat("A PMML node\n")
                        node
                      },
       model = function(node) {
                 cat("ordinary model node\n")
                 node
               })
</r:code>
We can then invoke these from R
using
<r:code>
xmlTreeParse("namespaceHandlers.xml", asTree = TRUE,
             handlers = namespaceNodeHandlers(.handlers = handlers))
</r:code>
The key step is the call to <r:func>namespaceNodeHandlers</r:func>,
which lets each handler be selected by the node's namespace prefix as well as its name.

<para/>
We'll add an additional node handler to show that
this is not limited to node names that appear in several namespaces,
but also handles ordinary node names.
<r:code>
handlers[["para"]] = function(node) {
                        cat("alignment for para", xmlGetAttr(node, "align"), "\n")
                        node
                     }
xmlRoot(xmlTreeParse("namespaceHandlers.xml", asTree = TRUE,
                     handlers = namespaceNodeHandlers(.handlers = handlers)))
</r:code>

<para/>

What we have done above relies on the fact that the same
namespace prefixes are used in the document and in the handler names.  It is
better and safer to be explicit: we define our own namespace
prefixes and map each to an explicit URI.  Then the URI
of a node's namespace can be compared to ours, and the handler is called only when the two match.
We do this by providing a named character
vector giving the prefix = URI pairs.
<r:code>
h = namespaceNodeHandlers(.handlers = handlers,
                          nsDefs = c(r = 'http://www.r-project.org',
                                     pmml = 'http://www.dmg.org/PMML-3_1'))

z = xmlTreeParse("namespaceHandlers.xml", asTree = TRUE, handlers = h, fullNamespaceInfo = TRUE)
</r:code>
Note that <r:arg>fullNamespaceInfo</r:arg> needs to be <r:true/>.
<r:func>xmlTreeParse</r:func> now arranges this for us automatically, so specifying it explicitly is optional.
We see that we get the same handling of the nodes,
since the namespace definitions correspond to the URIs in the document.

<para/>
We now illustrate that the namespace prefix in the handlers
does not have to be the same as that in the document
for the same URI.
We'll change the pmml prefix in the names of our handlers
to a simple 'm'.
And we use 'm' when specifying the namespace definitions in R
via the <r:arg>nsDefs</r:arg> argument.
When we run this code
<r:code>
names(handlers)[2] = 'm:model'
z = xmlTreeParse("namespaceHandlers.xml", asTree = TRUE, 
                 fullNamespaceInfo = TRUE,
                 handlers = namespaceNodeHandlers(.handlers = handlers,
                                                  nsDefs = c(r = 'http://www.r-project.org',
                                                             m = 'http://www.dmg.org/PMML-3_1')))
</r:code>
we get the same result, with all three handlers being called.


<para/>
It is also useful to verify that a handler is not matched
when the URIs differ.
So we'll provide a different URI for the 'r' prefix
<r:code>
z = xmlTreeParse("namespaceHandlers.xml", asTree = TRUE, 
                 fullNamespaceInfo = TRUE,
                 handlers = namespaceNodeHandlers(.handlers = handlers,
                                                  nsDefs = c(r = 'http://www.other.com',
                                                             m = 'http://www.dmg.org/PMML-3_1')))
</r:code>
The result is that the ordinary model node handler, rather than the r:model handler, is called for the
r:model XML node in the document, because the node's namespace URI no longer matches ours.
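
<para/>
The last two experiments can be summarized in a small sketch of the matching rule.
This is only an illustration under our own assumptions and is not the code used by
<r:func>namespaceNodeHandlers</r:func>; the function <r:func>handlerMatchesNode</r:func>
and its arguments are hypothetical names introduced here.
It translates the handler's prefix to a URI using the definitions given in R
and compares that URI to the node's, so the prefix used in the document is never consulted.
<r:code>
# Hypothetical helper for illustration only; it is not part of the XML package.
# handlerName: a name of the form "prefix:name", e.g. "m:model"
# nsDefs:      named character vector of prefix = URI pairs defined in R
# nodeName:    local name of the XML node, e.g. "model"
# nodeURI:     namespace URI of the XML node in the document
handlerMatchesNode = function(handlerName, nsDefs, nodeName, nodeURI) {
    parts = strsplit(handlerName, ":", fixed = TRUE)[[1]]
    if(length(parts) != 2)
        stop("expected a handler name of the form prefix:name")

    # Translate the handler's prefix to a URI using our own definitions.
    handlerURI = unname(nsDefs[parts[1]])
    if(is.na(handlerURI))
        return(FALSE)           # prefix not defined in nsDefs

    if(parts[2] != nodeName)
        return(FALSE)           # different local name

    handlerURI == nodeURI       # the prefix itself is never compared
}

nsDefs = c(r = 'http://www.r-project.org', m = 'http://www.dmg.org/PMML-3_1')

# Prefix 'm' differs from the document's 'pmml', but the URIs agree.
handlerMatchesNode("m:model", nsDefs, "model", "http://www.dmg.org/PMML-3_1")    # TRUE

# Prefix 'r' agrees with the document, but the URIs differ.
handlerMatchesNode("r:model", c(r = 'http://www.other.com'),
                   "model", "http://www.r-project.org")                          # FALSE
</r:code>
A node that fails this test for every prefixed handler is the case seen above,
where the handler without a prefix is called instead.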

<para/>
Note that one does not have to specify the handler functions
as a list, but can identify them separately via the
<r:dots/> parameter of <r:func>namespaceNodeHandlers</r:func>.
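A minimal sketch of this form follows. It assumes the XML package is installed and
that this file is in the working directory; the two handler bodies are our own
illustration and only print a short message.
<r:code>
library(XML)

# The handlers are given as named arguments rather than through .handlers.
h = namespaceNodeHandlers("r:model" = function(node) {
                                         cat("R model node\n")
                                         node
                                      },
                          model = function(node) {
                                     cat("ordinary model node\n")
                                     node
                                  },
                          nsDefs = c(r = 'http://www.r-project.org'))

z = xmlTreeParse("namespaceHandlers.xml", asTree = TRUE,
                 fullNamespaceInfo = TRUE, handlers = h)
</r:code>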

</doc>