# Using bio-vcf with RDF

bio-vcf can output many formats. In this exercise we will load VCF data
into a triple store (4store) and run some queries against it.

## Install and start 4store

### On GNU Guix

See https://github.com/pjotrp/guix-notes/blob/master/packages/4store.org

### On Debian

Get root

```sh
su
apt-get install avahi-daemon
apt-get install raptor-utils
exit
```

As a normal user

```sh
guix package -i sparql-query curl
```

Initialize and start the server again as root (or another user)

```sh
su
export PATH=/home/user/.guix-profile/bin:$PATH
mkdir -p /var/lib/4store
dbname=test
4s-backend-setup $dbname
4s-backend $dbname
4s-httpd -p 8000 $dbname
```

Point a web browser at http://localhost:8000/status/ to check that the server is running.

Open a new terminal as a normal user.


Generate RDF with a bio-vcf template, saved as rdf_template.erb

```erb
=HEADER
@prefix : <http://biobeat.org/rdf/ns#> .
=BODY
<%
id = ['chr'+rec.chr,rec.pos,rec.alt].join('_')
%>
:<%= id %>
  :query_id "<%= id %>";
  :chr "<%= rec.chr %>" ;
  :alt "<%= rec.alt.join("") %>" ;
  :pos <%= rec.pos %> .


```

so it looks like

```
:chrX_134713855_A
  :query_id "chrX_134713855_A";
  :chr "X" ;
  :alt "A" ;
  :pos 134713855 .
```
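
To make the template's logic explicit, here is a minimal Python sketch that formats the same Turtle record from a chromosome, a position and a list of alternate alleles. The function name `turtle_record` is hypothetical and not part of bio-vcf; it only mirrors what the ERB template above does, assuming `rec.alt` is a list of strings.

```python
def turtle_record(chrom, pos, alt):
    """Format one variant as a Turtle record, mirroring the ERB template.

    chrom: chromosome name without the 'chr' prefix, e.g. "X"
    pos:   integer position
    alt:   list of alternate alleles, e.g. ["A"]
    """
    # Same id scheme as the template: the pieces are joined with '_'
    # (Ruby's Array#join flattens the nested alt list with the same separator)
    record_id = "_".join(["chr" + chrom, str(pos), *alt])
    alt_str = "".join(alt)
    return (
        f":{record_id}\n"
        f'  :query_id "{record_id}";\n'
        f'  :chr "{chrom}" ;\n'
        f'  :alt "{alt_str}" ;\n'
        f"  :pos {pos} .\n"
    )


if __name__ == "__main__":
    # Header line from the =HEADER section of the template
    print("@prefix : <http://biobeat.org/rdf/ns#> .")
    print(turtle_record("X", 134713855, ["A"]))
```

Running the script prints the prefix line followed by the `:chrX_134713855_A` record shown above.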

and test with rapper using [gatk_exome.vcf](https://github.com/pjotrp/bioruby-vcf/blob/master/test/data/input/gatk_exome.vcf)

```sh
cat gatk_exome.vcf |bio-vcf -v --template rdf_template.erb
cat gatk_exome.vcf |bio-vcf -v --template rdf_template.erb > my.rdf
rapper -i turtle my.rdf
```

Load into 4store (if rapper reports no errors)

```bash
rdf=my.rdf
uri=http://localhost:8000/data/http://biobeat.org/data/$rdf
# Remove any previous version of the graph, then upload the new one
curl -X DELETE "$uri"
curl -T "$rdf" -H 'Content-Type: application/x-turtle' "$uri"
# Expected response:
#   201 imported successfully
#   This is a 4store SPARQL server
```
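
The same two requests can be sketched in Python with only the standard library. The helper names `graph_uri` and `replace_graph` are hypothetical; the endpoint and graph namespace are the ones used in the curl commands above, and treating a failed DELETE as "graph did not exist yet" is an assumption.

```python
import urllib.error
import urllib.request

BASE = "http://localhost:8000/data/"    # 4store data endpoint used above
GRAPH_NS = "http://biobeat.org/data/"   # graph namespace used above


def graph_uri(filename, base=BASE, graph_ns=GRAPH_NS):
    """Return the URI that the curl commands address for a given file."""
    return base + graph_ns + filename


def replace_graph(path, uri):
    """DELETE the old graph, then PUT the Turtle file (like the two curl calls)."""
    delete = urllib.request.Request(uri, method="DELETE")
    try:
        urllib.request.urlopen(delete)
    except urllib.error.HTTPError:
        pass  # assumption: the graph was not loaded before

    with open(path, "rb") as fh:
        body = fh.read()
    put = urllib.request.Request(
        uri,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/x-turtle"},
    )
    with urllib.request.urlopen(put) as resp:
        return resp.status  # HTTP status code of the upload
```

With the server running, `replace_graph("my.rdf", graph_uri("my.rdf"))` would perform the upload; nothing is sent until that call is made.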

First SPARQL query, saved as sparql1.rq

```sparql
SELECT ?id
WHERE
{
  ?id   <http://biobeat.org/rdf/ns#chr>    "X".
}
```

```
cat sparql1.rq |sparql-query "http://localhost:8000/sparql/" -p
┌──────────────────────────────────────────────┐
│ ?id                                          │
├──────────────────────────────────────────────┤
│ <http://biobeat.org/rdf/ns#chrX_107911706_C> │
│ <http://biobeat.org/rdf/ns#chrX_55172537_A>  │
│ <http://biobeat.org/rdf/ns#chrX_134713855_A> │
└──────────────────────────────────────────────┘
```

A simple Python query may look like

```python
import requests

# SPARQL endpoint of the local 4store server started above
host = "http://localhost:8000/sparql/"

query = """
SELECT ?s ?p ?o WHERE {
    ?s ?p ?o .
} LIMIT 10
"""

# "output": "text" asks the server for plain-text results
r = requests.post(host, data={"query": query, "output": "text"})
# print(r.url)

print(r.text)
```

With the `?id` query from sparql1.rq substituted for the generic one, this renders

```
?id
<http://biobeat.org/rdf/ns#chrX_107911706_C>
<http://biobeat.org/rdf/ns#chrX_55172537_A>
<http://biobeat.org/rdf/ns#chrX_134713855_A>
```
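
The text output is easy to post-process. The sketch below splits it into a header and rows, then recovers chromosome, position and allele from each id. Both function names are hypothetical, the tab separator between columns is an assumption about the text format, and the id parsing relies on the `chr<chr>_<pos>_<alt>` scheme from the template.

```python
def parse_text_result(text):
    """Split SPARQL text output into a list of variable names and row dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return [], []
    # Assumption: columns are tab-separated; the header holds ?var names
    header = [name.lstrip("?") for name in lines[0].split("\t")]
    rows = [dict(zip(header, ln.split("\t"))) for ln in lines[1:]]
    return header, rows


def variant_from_uri(uri):
    """Recover (chr, pos, alt) from an id such as <...ns#chrX_134713855_A>."""
    local = uri.strip("<>").rsplit("#", 1)[-1]  # e.g. chrX_134713855_A
    chrom, pos, alt = local.split("_", 2)
    return chrom[len("chr"):], int(pos), alt


if __name__ == "__main__":
    sample = (
        "?id\n"
        "<http://biobeat.org/rdf/ns#chrX_107911706_C>\n"
        "<http://biobeat.org/rdf/ns#chrX_55172537_A>\n"
        "<http://biobeat.org/rdf/ns#chrX_134713855_A>\n"
    )
    header, rows = parse_text_result(sample)
    for row in rows:
        print(variant_from_uri(row["id"]))
```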

A working example if you are using the server
http://guix.genenetwork.org and the correct PREFIX:

```python
#!/usr/bin/env python3
import requests

# Remote SPARQL endpoint; note the PREFIX differs from the local example
host = "http://guix.genenetwork.org/sparql/"
query = """
PREFIX : <http://biobeat.org/rdf/pjotr/ns#>
SELECT ?id ?chr ?pos ?alt
WHERE
{
  { ?id   :chr      "X" . }
  UNION
  { ?id   :chr      "1" . }
  ?id   :chr    ?chr .
  ?id   :alt    ?alt .
  ?id   :pos    ?pos .
  FILTER (?pos > 107911705) .
}
"""
# "output": "text" asks the server for plain-text results
r = requests.post(host, data={"query": query, "output": "text"})
print(r.text)
```

## EBI


The EBI SPARQL endpoint has some advanced example queries, such as

```
# Endpoint: https://www.ebi.ac.uk/rdf/services/ensembl/sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX identifiers: <http://identifiers.org/>
PREFIX ensembl: <http://rdf.ebi.ac.uk/resource/ensembl/>
PREFIX ensembltranscript: <http://rdf.ebi.ac.uk/resource/ensembl.transcript/>
PREFIX ensemblexon: <http://rdf.ebi.ac.uk/resource/ensembl.exon/>
PREFIX ensemblprotein: <http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>

SELECT DISTINCT ?transcript ?id ?typeLabel ?reference ?begin ?end ?location {
  ?transcript obo:SO_transcribed_from ensembl:ENSG00000139618 ;
              a ?type;
              dc:identifier ?id .
  OPTIONAL {
    ?transcript faldo:location ?location .
    ?location faldo:begin [faldo:position ?begin] .
    ?location faldo:end [faldo:position ?end ] .
    ?location faldo:reference ?reference .
  }
  OPTIONAL {?type rdfs:label ?typeLabel}
}
```
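
Queries with `OPTIONAL` blocks like this one return rows in which some variables are unbound. The sketch below shows one way to flatten a result in the standard SPARQL 1.1 JSON results format into plain dictionaries. The function name `flatten_bindings` is hypothetical, the sample document is made-up illustration data rather than real Ensembl output, and whether a given endpoint returns JSON (typically requested with an `Accept: application/sparql-results+json` header) is an assumption to check.

```python
import json


def flatten_bindings(results):
    """Turn a SPARQL 1.1 JSON result document into a list of plain dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Variables from OPTIONAL blocks may be absent; map them to None
        rows.append(
            {var: binding[var]["value"] if var in binding else None
             for var in variables}
        )
    return rows


# Made-up sample in the standard result layout (not real Ensembl data)
SAMPLE = json.loads("""
{
  "head": {"vars": ["transcript", "id", "begin"]},
  "results": {"bindings": [
    {"transcript": {"type": "uri", "value": "http://example.org/transcript/T1"},
     "id": {"type": "literal", "value": "T1"}}
  ]}
}
""")

if __name__ == "__main__":
    for row in flatten_bindings(SAMPLE):
        print(row)
```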

See https://www.ebi.ac.uk/rdf/services/ensembl/sparql

# Exercise

Today's exercise is to create a graph using bio-vcf and/or a small program using
RDF triples and define a SPARQL query.

The more interesting the graph/SPARQL the better.