File: dbApply.Rd

package info (click to toggle)
rmysql 0.10.16-1
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 416 kB
  • sloc: ansic: 1,662; sh: 58; makefile: 5
file content (87 lines) | stat: -rw-r--r-- 3,350 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/extension.R
\docType{methods}
\name{dbApply}
\alias{dbApply}
\alias{dbApply,MySQLResult-method}
\title{Apply R/S-Plus functions to remote groups of DBMS rows (experimental)}
\usage{
dbApply(res, ...)

\S4method{dbApply}{MySQLResult}(res, INDEX, FUN = stop("must specify FUN"),
  begin = NULL, group.begin = NULL, new.record = NULL, end = NULL,
  batchSize = 100, maxBatch = 1e+06, ..., simplify = TRUE)
}
\arguments{
\item{res}{a result set (see \code{\link[DBI]{dbSendQuery}}).}

\item{...}{any additional arguments to be passed to \code{FUN}.}

\item{INDEX}{a character or integer specifying the field name or field
number that defines the various groups.}

\item{FUN}{a function to be invoked upon identifying the last row from every
group. This function will be passed a data frame holding the records of the
current group, a character string with the group label, plus any other
arguments passed to \code{dbApply} as \code{"..."}.}

\item{begin}{a function of no arguments to be invoked just prior to retrieve
the first row from the result set.}

\item{group.begin}{a function of one argument (the group label) to be
invoked upon identifying a row from a new group}

\item{new.record}{a function to be invoked as each individual record is
fetched.  The first argument to this function is a one-row data.frame
holding the new record.}

\item{end}{a function of no arguments to be invoked just after retrieving
the last row from the result set.}

\item{batchSize}{the default number of rows to bring from the remote result
set. If needed, this is automatically extended to hold groups bigger than
\code{batchSize}.}

\item{maxBatch}{the absolute maximum of rows per group that may be extracted
from the result set.}

\item{simplify}{Not yet implemented}
}
\value{
A list with as many elements as there were groups in the result set.
}
\description{
Applies R/S-Plus functions to groups of remote DBMS rows without bringing an
entire result set all at once.  The result set is expected to be sorted by
the grouping field.

The MySQL implementation allows us to register R
functions that get invoked when certain fetching events occur. These include
the ``begin'' event (no records have been yet fetched), ``begin.group'' (the
record just fetched belongs to a new group), ``new record'' (every fetched
record generates this event), ``group.end'' (the record just fetched was the
last row of the current group), ``end'' (the very last record from the
result set). Awk and perl programmers will find this paradigm very familiar
(although SAP's ABAP language is closer to what we're doing).
}
\details{
This function is meant to handle somewhat gracefully(?) large
amounts of data from the DBMS by bringing into R manageable chunks (about
\code{batchSize} records at a time, but not more than \code{maxBatch}); the
idea is that the data from individual groups can be handled by R, but not
all the groups at the same time.
}
\examples{
if (mysqlHasDefault()) {
con <- dbConnect(RMySQL::MySQL(), dbname = "test")

dbWriteTable(con, "mtcars", mtcars, overwrite = TRUE)
res <- dbSendQuery(con, "SELECT * FROM mtcars ORDER BY cyl")
dbApply(res, "cyl", function(x, grp) quantile(x$mpg, names=FALSE))

dbClearResult(res)
dbRemoveTable(con, "mtcars")
dbDisconnect(con)
}
}