File: stpg.Rnw

package info (click to toggle)
r-cran-spacetime 1.2-8%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 3,300 kB
  • sloc: sh: 13; makefile: 2
file content (250 lines) | stat: -rw-r--r-- 8,262 bytes parent folder | download | duplicates (8)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
% Mon Jun 20 22:48:30 CEST 2011

\documentclass[nogin,a4paper]{article}

%\usepackage[OT1]{fontenc}

\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
\usepackage{Sweave}
\usepackage[utf8]{inputenc}
\newcommand{\code}[1]{{\tt #1}}

\title{\bf Spatio-temporal objects to proxy a PostgreSQL table } 

\author{
\includegraphics[width=5cm]{ifgi-logo_int} \hspace{.5cm}
\includegraphics[width=4cm]{logo52n} \\
\href{mailto:edzer.pebesma@uni-muenster.de}{Edzer Pebesma}
}
\date{\small \today }

\begin{document}
%\VignetteIndexEntry{ Spatio-temporal objects to proxy a PostgreSQL table }
\maketitle

\begin{abstract}
This vignette describes and implements a class that proxies data
sets in a PostgreSQL database with classes in the spacetime package.
This might allow access to data sets too large to fit into R memory.
\end{abstract}

\tableofcontents

\section{Introduction}

Massive data are difficult to analyze with R, because R objects
reside in memory. Spatio-temporal data easily become massive, either
because the spatial domain contains a lot of information (satellite
imagery), or many time steps are available (high resolution sensor
data), or both. This vignette shows how data residing in a data
base can be read into R using spatial or temporal selection.

In case the commands are not evaluated because CRAN packages cannot
access an external data base, a document with evaluated commands
is found \href{http://pebesma.staff.ifgi.de/stpg.pdf}{here}.

This vignette was run using the following libraries:
<<eval=FALSE>>=
library(RPostgreSQL)
@
<<>>=
library(sp)
library(spacetime)
@

\section{Setting up a database}

We will first set the characteristics of the database\footnote{It is
assumed that the database is {\em spatially enabled}, i.e. it
understands how simple features are stored. The standard for this
from the open geospatial consortium is described
\href{http://www.opengeospatial.org/standards/sfs}{here}.}
<<>>=
dbname = "postgis"
user = "edzer"
password = "pw"
#password = ""
@

Next, we will create a driver and connect to the database: 
<<eval=FALSE>>=
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname=dbname, user=user, password=password,
host='localhost', port='5432')
@
It should be noted that these first two commands are specific to
PostgreSQL; from here on, commands are generic and should work
for any database connector that uses the interface of package
\code{DBI}.

We now remove a set of tables (if present) so they can be created later on:
<<eval=FALSE>>=
dbRemoveTable(con, "rural_attr")
dbRemoveTable(con, "rural_space")
dbRemoveTable(con, "rural_time")
dbRemoveTable(con, "space_select")
@
%dbSendQuery(con, "drop index time_idx;")

Now we will create the table with spatial features (observation
locations). For this, we need the \code{rgdal} function \code{writeOGR},
which by default creates an index on the geometry:
<<eval=FALSE>>=
data(air)
rural = STFDF(stations, dates, data.frame(PM10 = as.vector(air)))
rural = as(rural, "STSDF")
p = rural@sp
sp = SpatialPointsDataFrame(p, data.frame(geom_id=1:length(p)))
library(rgdal)
OGRstring = paste("PG:dbname=", dbname, " user=", user, 
	" password=", password, " host=localhost", sep = "")
print(OGRstring)
writeOGR(sp, OGRstring, "rural_space", driver = "PostgreSQL")
@

In case you have problems replicating this, verify that your \code{rgdal}
installation privides the \code{PostgreSQL} driver, e.g. by checking that 
<<eval=FALSE>>=
subset(ogrDrivers(), name == "PostgreSQL")$write
@
prints a \code{TRUE}, and not a \code{logical(0)}.

Second, we will write the table with times to the database,
and create an index to time:
<<eval=FALSE>>=
df = data.frame(time = index(rural@time), time_id = 1:nrow(rural@time))
dbWriteTable(con, "rural_time", df)
idx = "create index time_idx on rural_time (time);"
dbSendQuery(con, idx)
@

Finally, we will write the full attribute data table to PosgreSQL,
along with its indexes to the spatial and temporal tables:
<<eval=FALSE>>=
idx = rural@index
names(rural@data) = "pm10" # lower case
df = cbind(data.frame(geom_id = idx[,1], time_id = idx[,2]), rural@data)
dbWriteTable(con, "rural_attr", df)
@

\section{A proxy class}
The following class has as components a spatial and temporal
data structure, but no spatio-temporal attributes (they are assumed
to be the most memory-hungry). The other slots refer to the according
tables in the PostGIS database, the name(s) of the attributes in the
attribute table, and the database connection.
<<keep.source=TRUE>>=
setClass("ST_PG", contains = "ST", 
	# slots = c(space_table = "character",
	representation(space_table = "character",
	time_table = "character",
	attr_table = "character",
	attr = "character",
	con = "PostgreSQLConnection"))
@
Next, we will create an instance of the new class:
<<eval=FALSE,keep.source=TRUE>>=
rural_proxy = new("ST_PG", 
	#ST(rural@sp, rural@time, rural@endTime),
	as(rural, "ST"),
	space_table = "rural_space",
	time_table = "rural_time",
	attr_table = "rural_attr",
	attr = "pm10",
	con = con)
@

\section{Selection based on time period and/or region }

The following two helper functions create a character string with
an SQL command that for a temporal or spatial selection:
<<keep.source=TRUE>>=
.SqlTime = function(x, j) {
	stopifnot(is.character(j))
	require(xts)
	t = .parseISO8601(j)
	t1 = paste("'", t$first.time, "'", sep = "")
	t2 = paste("'", t$last.time, "'", sep = "")
	what = paste("geom_id, time_id", paste(x@attr, collapse = ","), sep = ", ")
	paste("SELECT", what, "FROM", x@attr_table, "AS a JOIN", x@time_table,
		"AS b USING (time_id) WHERE b.time >= ", t1, "AND b.time <=", t2,";")
}
.SqlSpace = function(x, i) {
	stopifnot(is(i, "Spatial"))
	writeOGR(i, OGRstring, "space_select", driver = "PostgreSQL")
	what = paste("geom_id, time_id", paste(x@attr, collapse = ","), sep = ", ")
	paste("SELECT",  what, "FROM",  x@attr_table, 
		"AS a JOIN (SELECT p.wkb_geometry, p.geom_id FROM",
		x@space_table, " AS p, space_select AS q",
		"WHERE ST_Intersects(p.wkb_geometry, q.wkb_geometry))",
		"AS b USING (geom_id);")
}
@
The following selection method selects a time period only, as
defined by the methods in package \code{xts}. A time period is
defined as a valid ISO8601 string, e.g. 2005-05 is the full month
of May for 2005.
<<keep.source=TRUE>>=
setMethod("[", "ST_PG", function(x, i, j, ... , drop = TRUE) {
	stopifnot(missing(i) != missing(j)) # either of them present
	if (missing(j))
		sql = .SqlSpace(x,i)
	else
		sql = .SqlTime(x,j)
	print(sql)
	df = dbGetQuery(x@con, sql)
	STSDF(x@sp, x@time, df[x@attr], as.matrix(df[c("geom_id", "time_id")]))
})
@
<<eval=FALSE>>=
pm10_20050101 = rural_proxy[, "2005-01-01"]
summary(pm10_20050101)
summary(rural[,"2005-01-01"])

pm10_NRW = rural_proxy[DE_NUTS1[10,],]
summary(pm10_NRW)
summary(rural[DE_NUTS1[10,],])
@
Clearly, the temporal and spatial components are not subsetted, so do
not reflect the actual selection made; the attribute data however do;
the following selection step ``cleans'' the unused features/times:
<<eval=FALSE>>=
dim(pm10_NRW)
pm10_NRW = pm10_NRW[T,]
dim(pm10_NRW)
@
Comparing sizes, we see that the selected object is smaller:
<<eval=FALSE>>=
object.size(rural)
object.size(pm10_20050101)
object.size(pm10_NRW)
@

\section{Closing the database connection}
The following commands close the database connection and release
the driver resources:
<<eval=FALSE>>=
dbDisconnect(con)
dbUnloadDriver(drv)
@

\section{Limitations and alternatives}

The example code in this vignette is meant as an example and is not
meant as a full-fledged database access mechanism for spatio-temporal
data bases. In particular, the selection here can do only {\em
one} of spatial locations (entered as features) or time periods.
If database access is only based on time, a spatially enabled
database (such as PostGIS) would not be needed.

For massive databases, data would typically not be loaded into
the database from R first, but from somewhere else.

An alternative to access from R large, possibly massive
spatio-temporal data bases for the case where the data base is
accessible through a sensor observation
service (SOS) is provided by the R package
\href{https://cran.r-project.org/web/package=sos4R}{sos4R},
which is also on CRAN.

\end{document}