File: databases.html

package info (click to toggle)
semweb 1.05%2Bdfsg-4
  • links: PTS, VCS
  • area: main
  • in suites: squeeze
  • size: 4,000 kB
  • ctags: 2,832
  • sloc: cs: 14,483; makefile: 176; perl: 20; sh: 11; ansic: 7
file content (129 lines) | stat: -rw-r--r-- 7,760 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
<html>
	<head>
		<title>SemWeb: Docs: Using Databases</title>
		<link rel="stylesheet" type="text/css" href="stylesheet.css"/>
	</head>
	
	<body>

<p><a href="index.html">SemWeb Documentation</a></p>
<h1>SemWeb: Using Databases</h1>

<p>Besides the <tt>MemoryStore</tt>, SemWeb provides four SQL-backed persistent stores: for MySQL, PostgreSQL, SQL Server, and Sqlite.  Since Sqlite is the simplest to get going, I'll use it as an example.</p>

<h3>Overview of database storage</h3>

<p>Just as a brief note, the MySQL store is several times faster than the Sqlite store, at least for adding statements into the store.  Sqlite is easier to work with, so it's appropriate for small stores of data maybe up to 100,000 statements.  Using the Import method of the MySQL store, the store can load over ten million statements in as little as 45 minutes on modest hardware, which is about 4,500 statements per second.  The Sqlite store loads the same graph at 1,400 statements per second.  The speed depends a lot on the structure of data being loaded.  I haven't tested the PostgreSQL store myself, so I don't know how fast it is.</p>

<p>The SQL stores keep the RDF data in three tables, which have a common prefix provided by you.  The main table is the <i>prefix</i>_statements table, which has a row for each statement in the store.  The columns are numeric IDs for the subject, predicate, object, and meta.  The IDs are linked to URIs, for entities, and values, for Literals, in the _entities and _literals tables.</p>

<p>The _statements table is indexed many ways to make the performance of Select fast.  In the MySQL store, the table has a UNIQUE index over all of the columns to ensure that an assertion can be added only once.  This speeds up operations later on when they don't have to check for duplicate statements and doesn't hurt the performance of adding statements significantly.</p>

<h3>Getting set up</h3>

<p>First, prepare your environment for using Sqlite.  To do that, <a href="http://www.sqlite.org">download SQLite</a> and make it available for your program at run time.  In Windows, put the <tt>dll</tt> into your System32 directoy.  In Unix, install the RPM, or put the <tt>so</tt> in <tt>/usr/local/lib</tt> or else make sure the <tt>LD_LIBRARY_PATH</tt> environment variable contains a path containing the Sqlite library.</p>

<p>Second, make sure you have the Mono.Data.SqliteClient data adapter <a href="http://mono-project.com/Downloads">from Mono</a>.  If you're using Unix, be sure you have the mono-data-sqlite package installed.  If you're using Windows and don't have the adapter already, you can get it out of the <a href="http://mono.ximian.com/daily/monocharge-latest.tar.gz">latest daily MonoCharge</a> (either put it in the GAC, or put it along side the other SemWeb DLLs).</p>

<p>Last, download a RDF file to your hard disk, such as: http://www.mozilla.org/news.rdf.</p>

<h3>Moving data into a database</h3>

<p>We want to get going quickly, so we'll use SemWeb's <tt>rdfstorage.exe</tt> tool to move some data into a Sqlite database.  Go to a command prompt in the bin directory of the SemWeb package.  Then run:</p>

<pre class="code">
$ mono rdfstorage.exe news.rdf --out "sqlite:rdf:Uri=file:news.sqlite;version=3"
</pre>

(In Windows, you can leave off the 'mono' to use the Microsoft runtime.)  You'll get the following output (or something like it):

<pre class="code">
news.rdf  0m0s, 81 statements, 597 st/sec
Total Time: 0m0s, 81 statements, 505 st/sec
</pre>

<p>This command read the RDF/XML file and created a Sqlite version 3 database named news.sqlite.</p>

<p>The "out" argument is what I call a "spec" string.  It tells SemWeb how to open up a storage mechanism.  This spec string has three parts.  The first part is "sqlite" which tells SemWeb to use the Sqlite storage engine.  The second part is "rdf" which is the prefix for the tables to use in the database.  The third part, "Uri=file:news.sqlite;version=3", is a connection string used by Mono.Data.SqliteClient.  (For more info on the connection string, see <a href="http://mono-project.com/SQLite">the Mono website</a>.)</p>

<p>If you omit the "out" argument entirely, the program will dump the file to the console in Turtle format.</p>

<h3>Getting data out of the database</h3>

<p>To query the database, we'll write a program.</p>

<p>Create a store backed by the database using the following method:</p>

<pre class="code">
Store store = Store.Create("sqlite:rdf:Uri=file:news.sqlite;version=3");
</pre>

<p>The argument to <tt>Create</tt> is the same "spec" string as before.  This method can be used to create various types of data sources in the factory design pattern.</p>

<p>The actual implementation of the Sqlite store is in the <tt>SemWeb.SqliteStore.dll</tt> assembly, but by using <tt>Create</tt> you don't need to reference that assembly at compile time.  Just make sure it's available at run time.</p>

<p>Now you can select statements from it just as before.  It will be a little slower than the memory-backed store, but you haven't loaded the entire store into memory.</p>

<p>You can also add statements into the store, just as with the memory store.</p>

<p>Be sure to close the database at the end with <tt>Dispose</tt>.</p>

<p>Here's a complete program:</p>

<pre class="code" file="../examples/sqlite.cs">using System;
using SemWeb;

public class Sqlite {
    const string RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string FOAF = "http://xmlns.com/foaf/0.1/";
    
    static readonly Entity rdftype = RDF+"type";
    static readonly Entity foafPerson = FOAF+"Person";
    static readonly Entity foafname = FOAF+"name";

    public static void Main() {
        Store store = Store.Create("sqlite:rdf:Uri=file:foaf.sqlite;version=3");
        
        Entity newPerson = new Entity("http://www.example.org/me");
        store.Add(new Statement(newPerson, rdftype, foafPerson));
        store.Add(new Statement(newPerson, foafname, (Literal)"New Guy"));
        
        Console.WriteLine("These are the people in the file:");
        foreach (Entity s in store.SelectSubjects(rdftype, foafPerson)) {
            foreach (Resource r in store.SelectObjects(s, foafname))
                Console.WriteLine(r);
        }
        Console.WriteLine();
        
        store.Dispose();
    }
}
</pre>

<h3>Other Database Notes</h3>

<p>Besides Sqlite, MySQL, PostgreSQL, and SQL Server are also supported.  To use those databases,
just replace the &ldquo;spec&rdquo; string ("sqlite:rdf:...") with something like:</p>

<pre>"mysql:rdf:Database=rdf; . . ."</pre>
<pre>"postgresql:rdf:Database=rdf; . . ."</pre>
<pre>"sqlserver:rdf:Database=rdf; . . ."</pre>

<p>Everything after the second colon is a MySQL connection string to make the connection.
You'll have to consult the MySQL/Connector.Net documentation for more information
about the connection string.</p>

<p>For the SQL stores, the Import() method is optimized so that statements are
added to the tables in batch, where possible. Further, the MySQL store can
wrap the Import SQL statements with SQL commands that make the Import faster.
By default, the Import is wrapped in a BEGIN/COMMIT transaction. Set the
SEMWEB_MYSQL_IMPORT_MODE environment variable (or hack the source code) to
change the behavior. Setting this environment variable to LOCK will use
LOCK TABLES to create a write lock on the statements, entities, and literals
tables for the store, which improves performance significantly. Setting
the environment variable to DISABLEKEYS uses ALTER TABLE DISABLE KEYS
on the statements table, which improves performance while it adds statements,
but causes a lengthy rebuilding of the index at the end of the import.</p>

</body>
</html>