File: modification.Rmd

package info (click to toggle)
r-cran-xml2 1.4.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 976 kB
  • sloc: cpp: 1,826; xml: 333; javascript: 238; ansic: 178; sh: 71; makefile: 6
file content (203 lines) | stat: -rw-r--r-- 5,145 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
---
title: "Node Modification"
author: "Jim Hester"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Node Modification}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(xml2)
library(magrittr)
```

# Modifying Existing XML

Modifying existing XML can be done in xml2 by using the replacement functions
of the accessors. They all have methods for both individual `xml_node` objects
as well as `xml_nodeset` objects. If a vector of values is provided it is
applied piecewise over the nodeset, otherwise the value is recycled.

## Text Modification ##

Text modification only happens on text nodes. If a given node has more than one
text node only the first will be affected. If you want to modify additional
text nodes you need to select them explicitly with `/text()`.

```{r}
x <- read_xml("<p>This is some <b>text</b>. This is more.</p>")
xml_text(x)

xml_text(x) <- "This is some other text."
xml_text(x)

# You can avoid this by explicitly selecting the text node.
x <- read_xml("<p>This is some text. This is <b>bold!</b></p>")
text_only <- xml_find_all(x, "//text()")

xml_text(text_only) <- c("This is some other text. ", "Still bold!")
xml_text(x)
xml_structure(x)
```

## Attribute and Namespace Definition Modification ##

Attributes and namespace definitions are modified one at a time with
`xml_attr()` or all at once with `xml_attrs()`. In both cases using `NULL` as
the value will remove the attribute completely.

```{r}
x <- read_xml("<a href='invalid!'>xml2</a>")
xml_attr(x, "href")

xml_attr(x, "href") <- "https://github.com/r-lib/xml2"
xml_attr(x, "href")

xml_attrs(x) <- c(id = "xml2", href = "https://github.com/r-lib/xml2")
xml_attrs(x)
x

xml_attrs(x) <- NULL
x

# Namespaces are added with as a xmlns or xmlns:prefix attribute
xml_attr(x, "xmlns") <- "http://foo"
x

xml_attr(x, "xmlns:bar") <- "http://bar"
x
```

## Name Modification ##

Node names are modified with `xml_name()`.

```{r}
x <- read_xml("<a><b/></a>")
x
xml_name(x)
xml_name(x) <- "c"
x
```

# Node modification #
All of these functions have a `.copy` argument. If this is set to `FALSE` they
will remove the new node from its location before inserting it into the new
location. Otherwise they make a copy of the node before insertion.

## Replacing existing nodes ##
```{r}
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child></parent>")
children <- xml_children(x)
t1 <- children[[1]]
t2 <- children[[2]]
t3 <- xml_children(children[[2]])[[1]]

xml_replace(t1, t3)
x
```

## Add a sibling ##
```{r}
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child></parent>")
children <- xml_children(x)
t1 <- children[[1]]
t2 <- children[[2]]
t3 <- xml_children(children[[2]])[[1]]

xml_add_sibling(t1, t3)
x

xml_add_sibling(t3, t1, where = "before")
x
```

## Add a child ##
```{r}
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child></parent>")
children <- xml_children(x)
t1 <- children[[1]]
t2 <- children[[2]]
t3 <- xml_children(children[[2]])[[1]]

xml_add_child(t1, t3)
x

xml_add_child(t1, read_xml("<test/>"))
x
```

## Removing nodes ##
The `xml_remove()` can be used to remove a node (and its children) from a
tree. The default behavior is to unlink the node from the tree, but does _not_
free the memory for the node, so R objects pointing to the node are still
valid.

This allows code like the following to work without crashing R

```{r}
x <- read_xml("<foo><bar><baz/></bar></foo>")
x1 <- x %>%
  xml_children() %>%
  .[[1]]
x2 <- x1 %>%
  xml_children() %>%
  .[[1]]

xml_remove(x1)
rm(x1)
gc()

x2
```
If you are not planning on referencing these nodes again this memory is wasted.
Calling `xml_remove(free = TRUE)` will remove the nodes _and_ free the memory
used to store them.  **Note** In this case _any_ node which previously pointed
to the node or its children will instead be pointing to free memory and may
cause R to crash. xml2 can't figure this out for you, so it's your
responsibility to remove any objects which are no longer valid.

In particular `xml_find_*()` results are easy to overlook, for example

```{r}
x <- read_xml("<a><b /><b><b /></b></a>")
bees <- xml_find_all(x, "//b")
xml_remove(xml_child(x), free = TRUE)
# bees[[1]] is no longer valid!!!
rm(bees)
gc()
```

## Namespaces ##

We want to construct a document with the following namespace layout. (From
<https://stackoverflow.com/questions/32939229/creating-xml-in-r-with-namespaces/32941524#32941524>).
```xml
<?xml version = "1.0" encoding="UTF-8"?>
<sld xmlns="http://www.opengis.net/sld"
     xmlns:ogc="http://www.opengis.net/ogc"
     xmlns:se="http://www.opengis.net/se"
     version="1.1.0" >
<layer>
<se:Name>My Layer</se:Name>
</layer>
</sld>
```

```{r}
d <- xml_new_root("sld",
  "xmlns" = "http://www.opengis.net/sld",
  "xmlns:ogc" = "http://www.opengis.net/ogc",
  "xmlns:se" = "http://www.opengis.net//se",
  version = "1.1.0"
) %>%
  xml_add_child("layer") %>%
  xml_add_child("se:Name", "My Layer") %>%
  xml_root()

d
```