<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
 -
 -  This file is part of the OpenLink Software Virtuoso Open-Source (VOS)
 -  project.
 -
 -  Copyright (C) 1998-2006 OpenLink Software
 -
 -  This project is free software; you can redistribute it and/or modify it
 -  under the terms of the GNU General Public License as published by the
 -  Free Software Foundation; only version 2 of the License, dated June 1991.
 -
 -  This program is distributed in the hope that it will be useful, but
 -  WITHOUT ANY WARRANTY; without even the implied warranty of
 -  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 -  General Public License for more details.
 -
 -  You should have received a copy of the GNU General Public License along
 -  with this program; if not, write to the Free Software Foundation, Inc.,
 -  51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 -
 -
-->
<refentry id="RD-S-1">
  <refmeta>
    <refentrytitle>RDF Cartridges (RDF Mappers or RDFizers)</refentrytitle>
    <refmiscinfo>tutorial</refmiscinfo>
  </refmeta>
  <refnamediv>
    <refname>RDF Cartridges</refname>
    <refpurpose>This article explains how to develop and test a custom RDF Cartridge.</refpurpose>
  </refnamediv>
  <refsect1 id="RD-S-1a">
    <title>Concept</title>
    <para>
      RDF cartridges provide a modular approach to RDF-oriented entity extraction and ontology mapping
      as part of a Linked Data production pipeline. Typical sources include (X)HTML pages, images,
      Office documents and PDF documents, among others.
    </para>
    <para>
      Cartridges expose their functionality to service consumers via the Virtuoso Sponger ("Sponger"),
      a middleware layer that extracts and delivers RDF to other Virtuoso components such as the Web
      Crawler and the SPARQL Query Processor. In addition, the Sponger is directly exposed as a REST-style
      Web Service for external applications and services to use.
    </para>
    <para>
      A cartridge comprises an initialization PL procedure (the hook) and an entity extractor and mapper.
      The entity extractor and mapper can be developed in PL, C or any external language
      supported by Virtuoso via the Virtuoso Server Extensions APIs.
    </para>
    <para>
      Once the cartridge has been developed, it is plugged into the Virtuoso Sponger by adding a record
      to the table DB.DBA.SYS_RDF_MAPPERS.
    </para>
    <para><b>How Cartridges Work:</b></para>
  </refsect1>
  <refsect1 id="RD-S-1b">
    <title>SPARQL Query Processing</title>
    <para>
      When a SPARQL query is dispatched to Virtuoso, it invokes the Sponger while dereferencing a
      graph or resource URI, i.e. the Sponger actually crawls the Web, locates a resource via
      its URI/IRI (using a variety of RDF discovery heuristics), and then dereferences the data that
      the URI exposes. If RDF is discovered, the cartridges play no role. If RDF is not discovered,
      the Sponger looks in the DB.DBA.SYS_RDF_MAPPERS table (in RM_ID order) and, for every
      matching URI or MIME type pattern (depending on the RM_TYPE column value), invokes the
      associated cartridge by calling its hook procedure. If the hook returns zero, the
      next cartridge is tried. If the result is negative, the process stops and tells the
      SPARQL engine that nothing was retrieved. If the result is positive, the process also stops,
      this time telling the SPARQL engine that RDF data was successfully retrieved.
    </para>
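    <para>
      The lookup order and the meaning of the hook return values can be modelled outside Virtuoso.
      The following Python sketch is only an illustration of the rules described above, not the
      Sponger's code: the Mapper tuple is a simplified stand-in for a DB.DBA.SYS_RDF_MAPPERS row,
      the rm_pattern field and the "URL" and "MIME" type values are assumptions, and the hooks take
      a reduced two-argument signature.
    </para>
    <programlisting><![CDATA[
```python
import re
from typing import Callable, List, NamedTuple


class Mapper(NamedTuple):
    """One row of a stand-in for DB.DBA.SYS_RDF_MAPPERS (simplified)."""
    rm_id: int                         # rows are tried in ascending RM_ID order
    rm_type: str                       # "URL" or "MIME" (assumed values)
    rm_pattern: str                    # regular expression (assumed column)
    hook: Callable[[str, str], int]    # simplified hook: (uri, mime_type) -> int


def dispatch(mappers: List[Mapper], uri: str, mime_type: str) -> bool:
    """Return True if some hook reports that RDF data was retrieved."""
    for mapper in sorted(mappers, key=lambda m: m.rm_id):
        target = uri if mapper.rm_type == "URL" else mime_type
        if re.search(mapper.rm_pattern, target) is None:
            continue                   # pattern does not match: skip this cartridge
        result = mapper.hook(uri, mime_type)
        if result == 0:
            continue                   # hook declined: try the next cartridge
        return result > 0              # positive: retrieved; negative: stop, nothing retrieved
    return False                       # no cartridge handled the resource


calls = []                             # records which hooks were actually invoked


def declining_hook(uri, mime_type):
    calls.append("declining_hook")
    return 0


def text_hook(uri, mime_type):
    calls.append("text_hook")
    return 1


def late_hook(uri, mime_type):
    calls.append("late_hook")
    return 1


mappers = [
    Mapper(3, "MIME", r"^text/", late_hook),
    Mapper(1, "URL", r"\.txt$", declining_hook),
    Mapper(2, "MIME", r"^text/plain$", text_hook),
]
retrieved = dispatch(mappers, "http://example.org/notes.txt", "text/plain")
print(retrieved, calls)
```
    ]]></programlisting>
    <para>
      In this run the hook with RM_ID 1 declines by returning zero, the hook with RM_ID 2 returns a
      positive value and ends the search, and the hook with RM_ID 3 is never called.
    </para>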
  </refsect1>
  <refsect1 id="RD-S-1c">
    <title>PL hook requirements</title>
    <para>
      Every PL function used to plug a cartridge into the SPARQL engine must have the following parameter signature:
    </para>
    <itemizedlist mark="bullet">
      <listitem>
        in graph_iri varchar: the local storage graph IRI
      </listitem>
      <listitem>
        in new_origin_uri varchar: target information resource URI
      </listitem>
      <listitem>
        in destination varchar: the target graph IRI
      </listitem>
      <listitem>
        inout content any: the content of the information resource retrieved for dispatch to the Sponger
      </listitem>
      <listitem>
        inout async_queue any: an asynchronous queue, can be used for background processing (if required)
      </listitem>
      <listitem>
        inout ping_service any: the value of the PingService INI parameter in the [SPARQL] section of the Virtuoso instance's configuration, which enables RDF triple propagation and notification to pinger services such as http://pingthesemanticweb.com
      </listitem>
      <listitem>
        inout api_key any: a plain-text single key value or a serialized vector of keys; this is the value of the RM_KEY column of the DB.DBA.SYS_RDF_MAPPERS table, which is used to handle API keys for 3rd party Web Services.
      </listitem>
    </itemizedlist>
    <para>
      Note: the names of the parameters are not important, but their order and presence are vital.
    </para>
  </refsect1>
  <refsect1 id="RD-S-1d">
    <title>Implementation</title>
    <para>
      In the example script we implement a basic cartridge that maps the text/plain MIME type to an
      imaginary ontology, which extends the FOAF class Document with the properties 'txt:UniqueWords'
      and 'txt:Chars', where the prefix 'txt:' stands for 'urn:txt:v0.0:'.
    </para>
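    <para>
      To make the mapping concrete, here is a rough Python model of what such a hook computes. It is
      a sketch under stated assumptions rather than the tutorial's PL script: the tokenization rule,
      the in-memory STORE dictionary standing in for local storage, and the choice of graph and
      subject are all hypothetical. Only the seven-parameter order, the return value convention and
      the 'urn:txt:v0.0:' property names come from the text.
    </para>
    <programlisting><![CDATA[
```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"
TXT = "urn:txt:v0.0:"                  # the 'txt:' prefix from the text

# Hypothetical stand-in for local storage: graph IRI -> list of triples.
STORE = {}


def txt_cartridge_hook(graph_iri, new_origin_uri, destination, content,
                       async_queue, ping_service, api_key):
    """Model of a text/plain hook; the seven parameters keep the required order."""
    try:
        words = set(re.findall(r"\w+", content.lower()))   # assumed tokenization
        chars = len(content)
    except (AttributeError, TypeError):
        return 0                       # cannot handle this content: let the next cartridge try
    graph = destination or graph_iri   # assumption: prefer the target graph when given
    subject = new_origin_uri
    STORE.setdefault(graph, []).extend([
        (subject, RDF_TYPE, FOAF_DOCUMENT),
        (subject, TXT + "UniqueWords", len(words)),
        (subject, TXT + "Chars", chars),
    ])
    return 1                           # positive: RDF data was produced


status = txt_cartridge_hook(
    "http://example.org/notes.txt",    # graph_iri
    "http://example.org/notes.txt",    # new_origin_uri
    None,                              # destination
    "the cat and the hat",             # content
    None, None, None,                  # async_queue, ping_service, api_key
)
for triple in STORE["http://example.org/notes.txt"]:
    print(triple)
```
    ]]></programlisting>
    <para>
      The unused parameters are kept deliberately, since the hook contract depends on the presence
      and order of all seven.
    </para>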
    <para>
      To test the cartridge, use the /sparql endpoint with the option 'Retrieve remote RDF data for
      all missing source graphs' to execute:
    </para>
    <programlisting><![CDATA[
      select * from <URL-of-a-txt-file> where { ?s ?p ?o }
    ]]></programlisting>
    <para>
      The SPARQL_SPONGE role must be granted to the "SPARQL" user account (or any
      other account bound to SPARQL functionality) in order to enable persistence to local storage.
    </para>
    <para>
      If the above is set correctly then you can just hit <ulink url="/sparql?default-graph-uri=http%3A%2F%2Flocalhost%3A<?U server_http_port() ?>%2Ftutorial%2Fhosting%2Fho_s_30%2FWebCalendar%2Ftools%2Fsummary.txt&should-sponge=soft&query=select+*+where+%7B+%3Fs+%3Fp+%3Fo+.+%7D&format=text%2Fhtml">this link</ulink>.
    </para>
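    <para>
      The same request can be assembled programmatically. The Python sketch below rebuilds the query
      string used by the link, with the parameter names and values taken from it. The host name,
      port 8890 and the helper name sponge_test_url are assumptions for illustration; substitute the
      HTTP port of your own instance.
    </para>
    <programlisting><![CDATA[
```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sponge_test_url(host, port, source_url):
    """Build a /sparql request that sponges source_url and lists its triples."""
    params = {
        "default-graph-uri": source_url,
        "should-sponge": "soft",
        "query": "select * where { ?s ?p ?o . }",
        "format": "text/html",
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return "http://%s:%d/sparql?%s" % (host, port, urlencode(params))


# Assumed host and port; the path is the one used by the tutorial link.
source = "http://localhost:8890/tutorial/hosting/ho_s_30/WebCalendar/tools/summary.txt"
url = sponge_test_url("localhost", 8890, source)

# Decode the query string again to confirm that the parameters round-trip.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["default-graph-uri"][0])
```
    ]]></programlisting>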
    <para>
      More complex examples can be found in the rdf_cartridges package implementation.
    </para>
  </refsect1>
</refentry>