-
We need a mechanism that lets us take xmlRoot() without freeing the
XMLInternalDocument and its nodes, yet still frees the document once
none of them is referenced in R.
The problem is that
top = xmlRoot(xmlInternalTreeParse(f))
assigns the node to top but arranges to free the document. If a
garbage collection happens at that point, the document is cleaned up
and its nodes are freed at the same time. If we could determine that
xmlRoot() was called with the parsing command inlined, we would know
we are in this situation. We could then detach the document from the
node, free the document, and carry on. Of course, we can also avoid
adding a finalizer by passing addFinalizer = FALSE in the call to
xmlInternalTreeParse(). We could also put a finalizer on the node
that jumps to the document and frees it when that variable is
garbage collected. But the general problem remains: we can extract
sub-nodes at will and assign them to R variables. If we free an
ancestor node, the C-level data structure beneath it is freed too
and the R variable will be pointing to garbage.
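One way to picture the goal stated at the top (free the document only
when neither it nor any of its nodes is referenced in R) is a count
kept on the document. The following is a minimal sketch, not the
package's implementation. Doc and Node are hypothetical stand-ins for
libxml2's document and node structures, and the acquire/release
functions are hypothetical names for what would happen when R creates
a handle and when that handle's finalizer runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the libxml2 document and node types. */
typedef struct Doc Doc;

typedef struct Node {
    Doc *doc;            /* owning document */
    struct Node *next;   /* next node in the document's flat node list */
} Node;

struct Doc {
    Node *nodes;   /* every node owned by this document */
    int refs;      /* live R-side handles to the doc or any of its nodes */
};

static int docs_freed = 0;   /* instrumentation for this example only */

Doc *doc_new(void) {
    Doc *d = calloc(1, sizeof *d);
    if (!d) abort();
    return d;
}

Node *doc_add_node(Doc *d) {
    Node *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->doc = d;
    n->next = d->nodes;
    d->nodes = n;
    return n;
}

/* Frees the document and all of its nodes in one go. */
static void doc_free(Doc *d) {
    Node *n = d->nodes;
    while (n) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    free(d);
    docs_freed++;
}

/* Assumed to be called when R receives a handle to the document. */
void doc_acquire(Doc *d) { d->refs++; }

/* Assumed to be called when a node is handed to R, e.g. by xmlRoot(). */
void node_acquire(Node *n) { n->doc->refs++; }

/* Assumed to be called from the finalizer of a document handle. */
void doc_release(Doc *d) {
    assert(d->refs > 0);
    if (--d->refs == 0)
        doc_free(d);
}

/* Assumed to be called from the finalizer of a node handle. */
void node_release(Node *n) { doc_release(n->doc); }
```

With this arrangement the temporary document in
top = xmlRoot(xmlInternalTreeParse(f)) could be collected without
harm: its release drops the count from 2 to 1, and the nodes survive
until top itself is collected. The sketch does not cover nodes that
are later moved to another document or removed from the tree.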
We might also try to put the same reference to the document as an
attribute on all extracted nodes. We could attach this SEXP to a
userData field in the tree. But how do we protect it? Via
R_PreserveObject(), and that causes problems too.
So how about we bring libxml2's memory management under R's and try
to handle the chains, etc.? It is not obvious how to do this and
still maintain the copy-on-modify semantics.
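As a rough illustration of what "handling the chains" might involve,
the sketch below keeps a count on every node and propagates it up the
ancestor chain, so that an ancestor can tell whether anything beneath
it is still referenced before it is freed. This is one possible
design under stated assumptions, not what libxml2 or the package
does. TNode and all tnode_* functions are hypothetical, the root node
stands in for the document, and copy-on-modify is not addressed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node; not libxml2's node structure. */
typedef struct TNode {
    struct TNode *parent;
    struct TNode *first_child;
    struct TNode *next_sibling;
    int refs;   /* R handles to this node or to any of its descendants */
} TNode;

static int nodes_freed = 0;   /* instrumentation for this example only */

TNode *tnode_new(TNode *parent) {
    TNode *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->parent = parent;
    if (parent) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
    return n;
}

/* A handle to n also pins every ancestor of n. */
void tnode_acquire(TNode *n) {
    for (; n; n = n->parent)
        n->refs++;
}

static void tnode_free_subtree(TNode *n) {
    TNode *c = n->first_child;
    while (c) {
        TNode *next = c->next_sibling;
        tnode_free_subtree(c);
        c = next;
    }
    free(n);
    nodes_freed++;
}

/* Removes n from its parent's child list. */
static void tnode_unlink(TNode *n) {
    if (!n->parent) return;
    TNode **pp = &n->parent->first_child;
    while (*pp && *pp != n)
        pp = &(*pp)->next_sibling;
    if (*pp) *pp = n->next_sibling;
    n->parent = NULL;
    n->next_sibling = NULL;
}

/* Frees the subtree at n only if nothing in it is referenced from R.
   Returns 1 if the subtree was freed, 0 if it was left alone. */
int tnode_try_free(TNode *n) {
    if (n->refs > 0) return 0;
    tnode_unlink(n);
    tnode_free_subtree(n);
    return 1;
}

/* Assumed to run from a node handle's finalizer: unpins the chain and
   frees the whole tree once nothing in it is referenced. */
void tnode_release(TNode *n) {
    TNode *root = n;
    for (TNode *p = n; p; p = p->parent) {
        assert(p->refs > 0);
        p->refs--;
        root = p;
    }
    if (root->refs == 0)
        tnode_free_subtree(root);
}
```

The cost of this design is that every acquire and release walks the
whole ancestor chain, and any operation that reparents a node would
have to move its count from the old chain to the new one.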