.. _intro_to_parsing:
======================
Loading and saving RDF
======================
Reading RDF files
-----------------
RDF data can be represented using various syntaxes (``turtle``, ``rdf/xml``, ``n3``, ``n-triples``,
``trix``, ``JSON-LD``, etc.). The simplest of these is N-Triples
(``ntriples``), a line-based format with one triple per line.
Create the file :file:`demo.nt` in the current directory with these two lines in it:

.. code-block:: Turtle

    <http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.com/drewp> <http://example.com/says> "Hello World" .

On line 1 this file says "drewp is a FOAF Person". On line 2 it says "drewp says 'Hello World'".
RDFLib can guess the format from the file ending (".nt" is commonly used for N-Triples), so you can just call
:meth:`~rdflib.graph.Graph.parse` to read in the file. If the file had a non-standard RDF file ending, you could set the
keyword parameter ``format`` to either an Internet Media Type or a format name (see the :doc:`list of available
parsers <plugin_parsers>`).
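
As a minimal sketch of the ``format`` parameter, the following writes the two demo triples to a file with a
non-standard ending and names the parser explicitly. The file name ``demo.txt`` and the temporary directory are
assumptions made purely for illustration.

.. code-block:: python

    import tempfile
    from pathlib import Path

    from rdflib import Graph

    # The same two statements as in demo.nt
    NT_DATA = (
        "<http://example.com/drewp> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://xmlns.com/foaf/0.1/Person> .\n"
        '<http://example.com/drewp> <http://example.com/says> "Hello World" .\n'
    )

    with tempfile.TemporaryDirectory() as tmp:
        # ".txt" gives RDFLib nothing to guess the format from
        path = Path(tmp) / "demo.txt"
        path.write_text(NT_DATA, encoding="utf-8")

        g = Graph()
        # so the parser is named explicitly
        g.parse(str(path), format="nt")

    print(len(g))
    # prints: 2
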
In an interactive Python interpreter, try this:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("demo.nt")

    print(len(g))
    # prints: 2

    import pprint
    for stmt in g:
        pprint.pprint(stmt)
    # prints:
    # (rdflib.term.URIRef('http://example.com/drewp'),
    #  rdflib.term.URIRef('http://example.com/says'),
    #  rdflib.term.Literal('Hello World'))
    # (rdflib.term.URIRef('http://example.com/drewp'),
    #  rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
    #  rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))

The final lines show how RDFLib represents the two statements in the
file: each statement is just a length-3 tuple (a "triple"), and the
subject, predicate, and object of each triple are all rdflib types.
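
Because a statement is an ordinary tuple, it can be unpacked like one. The sketch below is illustrative only: it
loads a single statement from a string rather than from :file:`demo.nt` so that it runs on its own, and the
variable names are arbitrary.

.. code-block:: python

    from rdflib import Graph, Literal, URIRef

    g = Graph()
    g.parse(
        data='<http://example.com/drewp> <http://example.com/says> "Hello World" .\n',
        format="nt",
    )

    # Unpack the only triple in the graph into its three terms
    subject, predicate, obj = next(iter(g))

    print(isinstance(subject, URIRef))  # the subject is a URIRef
    print(isinstance(obj, Literal))     # the object is a Literal

    # Both term types are str subclasses, so str() gives the plain text
    print(str(obj))
    # prints: Hello World
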
Reading remote RDF
------------------
Reading graphs from the Internet is easy:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("http://www.w3.org/People/Berners-Lee/card")
    print(len(g))
    # prints: 86

:meth:`rdflib.Graph.parse` can process local files, remote data via a URL (as in this example), or RDF data in a
string (using the ``data`` parameter).
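
A minimal sketch of parsing from a string with the ``data`` parameter follows. The Turtle snippet is made up for
illustration and mirrors the two statements from :file:`demo.nt`; ``format`` is given explicitly because a string
has no file ending to guess from.

.. code-block:: python

    from rdflib import Graph

    TURTLE_DATA = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <http://example.com/drewp> a foaf:Person ;
        <http://example.com/says> "Hello World" .
    """

    g = Graph()
    g.parse(data=TURTLE_DATA, format="turtle")

    print(len(g))
    # prints: 2
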
Saving RDF
----------
To store a graph in a file, use the :meth:`rdflib.Graph.serialize` method:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("http://www.w3.org/People/Berners-Lee/card")
    g.serialize(destination="tbl.ttl")

This parses data from http://www.w3.org/People/Berners-Lee/card and stores it in the file ``tbl.ttl`` in the current
directory using the Turtle format, which is the default RDF serialization (as of rdflib 6.0.0).
To read the same data and serialize it to an RDF/XML string stored in the variable ``v``, do this:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("http://www.w3.org/People/Berners-Lee/card")
    v = g.serialize(format="xml")

The following table lists the RDF formats you can serialize data to with rdflib, out of the box, and the
``format=KEYWORD`` keyword used to reference them within ``serialize()``:

.. csv-table::
    :header: "RDF Format", "Keyword", "Notes"

    "Turtle", "turtle, ttl or turtle2", "turtle2 is just turtle with more spacing & linebreaks"
    "RDF/XML", "xml or pretty-xml", "Was the default format, rdflib < 6.0.0"
    "JSON-LD", "json-ld", "There are further options for compact syntax and other JSON-LD variants"
    "N-Triples", "ntriples, nt or nt11", "nt11 is exactly like nt, only utf8 encoded"
    "Notation-3", "n3", "N3 is a superset of Turtle that also caters for rules and a few other things"
    "TriG", "trig", "Turtle-like format for RDF triples + context (RDF quads) and thus multiple graphs"
    "TriX", "trix", "RDF/XML-like format for RDF quads"
    "N-Quads", "nquads", "N-Triples-like format for RDF quads"

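
To illustrate a few of these keywords, the sketch below serializes one small graph to several formats and parses
the Turtle output back in. It assumes rdflib 6.0.0 or later, where ``serialize()`` returns a string when no
``destination`` is given; the sample statement and variable names are illustrative.

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse(
        data='<http://example.com/drewp> <http://example.com/says> "Hello World" .\n',
        format="nt",
    )

    # One serialization per keyword; each value is a str (assuming rdflib >= 6)
    outputs = {fmt: g.serialize(format=fmt) for fmt in ("turtle", "xml", "nt")}

    # Round trip: parse the Turtle string back into a fresh graph
    g2 = Graph()
    g2.parse(data=outputs["turtle"], format="turtle")

    # The data has no blank nodes, so the triples can be compared directly
    print(set(g2) == set(g))
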
Working with multi-graphs
-------------------------
To read and query multi-graphs, that is, RDF data that is context-aware, you need to use rdflib's
:class:`rdflib.Dataset` class. This is an extension of :class:`rdflib.Graph` that
knows all about quads (triples + graph IDs).
If you had this multi-graph data file (in the ``trig`` format, using the new-style ``PREFIX`` statement rather than
the older ``@prefix``):

.. code-block:: Turtle

    PREFIX eg: <http://example.com/person/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    eg:graph-1 {
        eg:drewp a foaf:Person .
        eg:drewp eg:says "Hello World" .
    }

    eg:graph-2 {
        eg:nick a foaf:Person .
        eg:nick eg:says "Hi World" .
    }

You could parse the file and query it like this:

.. code-block:: python

    from rdflib import Dataset
    from rdflib.namespace import RDF

    ds = Dataset()
    ds.parse("demo.trig")

    # Each quad is (subject, predicate, object, graph identifier)
    for s, p, o, graph_id in ds.quads((None, RDF.type, None, None)):
        print(s, graph_id)

This will print out:

.. code-block::

    http://example.com/person/drewp http://example.com/person/graph-1
    http://example.com/person/nick http://example.com/person/graph-2

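
A single named graph can also be pulled out of the dataset. The following is a sketch under stated assumptions: it
loads the same TriG data from a string so that it runs without :file:`demo.trig`, and it uses ``Dataset.graph()``
to obtain the graph identified by ``eg:graph-1``. The variable names are illustrative.

.. code-block:: python

    from rdflib import Dataset, URIRef
    from rdflib.namespace import RDF

    TRIG_DATA = """
    PREFIX eg: <http://example.com/person/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    eg:graph-1 {
        eg:drewp a foaf:Person .
        eg:drewp eg:says "Hello World" .
    }

    eg:graph-2 {
        eg:nick a foaf:Person .
        eg:nick eg:says "Hi World" .
    }
    """

    ds = Dataset()
    ds.parse(data=TRIG_DATA, format="trig")

    # Look up one named graph by its identifier and count its triples
    graph_1 = ds.graph(URIRef("http://example.com/person/graph-1"))
    print(len(graph_1))

    # Collect every subject typed in any graph, sorted for stable output
    people = sorted(str(s) for s, p, o, c in ds.quads((None, RDF.type, None, None)))
    print(people)
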