File: tbl.src_dbi.Rd

package info (click to toggle)
r-cran-dbplyr 2.5.0%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: sid, trixie
  • size: 2,644 kB
  • sloc: sh: 13; makefile: 2
file content (95 lines) | stat: -rw-r--r-- 3,371 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/src_dbi.R
\name{tbl.src_dbi}
\alias{tbl.src_dbi}
\alias{tbl_dbi}
\title{Use dplyr verbs with a remote database table}
\usage{
\method{tbl}{src_dbi}(src, from, ...)
}
\arguments{
\item{src}{A \code{DBIConnection} object produced by \code{DBI::dbConnect()}.}

\item{from}{Either a table identifier or a literal \code{\link[=sql]{sql()}} string.

Use a string to identify a table in the current schema/catalog. We
recommend using \code{I()} to identify a table outside the default catalog or
schema, e.g. \code{I("schema.table")} or \code{I("catalog.schema.table")}. You can
also use \code{\link[=in_schema]{in_schema()}}/\code{\link[=in_catalog]{in_catalog()}} or \code{\link[DBI:Id]{DBI::Id()}}.}

\item{...}{Passed on to \code{\link[=tbl_sql]{tbl_sql()}}}
}
\description{
All data manipulation on SQL tbls are lazy: they will not actually
run the query or retrieve the data unless you ask for it: they all return
a new \code{tbl_dbi} object. Use \code{\link[=compute]{compute()}} to run the query and save the
results in a temporary in the database, or use \code{\link[=collect]{collect()}} to retrieve the
results to R. You can see the query with \code{\link[=show_query]{show_query()}}.
}
\details{
For best performance, the database should have an index on the variables
that you are grouping by. Use \code{\link[=explain]{explain()}} to check that the database is using
the indexes that you expect.

There is one verb that is not lazy: \code{\link[=do]{do()}} is eager because it must pull
the data into R.
}
\examples{
library(dplyr)

# Connect to a temporary in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Add some data
copy_to(con, mtcars)
DBI::dbListTables(con)

# To retrieve a single table from a source, use `tbl()`
con \%>\% tbl("mtcars")

# Use `I()` for qualified table names
con \%>\% tbl(I("temp.mtcars")) \%>\% head(1)

# You can also use pass raw SQL if you want a more sophisticated query
con \%>\% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8"))

# If you just want a temporary in-memory database, use src_memdb()
src2 <- src_memdb()

# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.

if (requireNamespace("Lahman", quietly = TRUE)) {
batting <- copy_to(con, Lahman::Batting)
batting

# Basic data manipulation verbs work in the same way as with a tibble
batting \%>\% filter(yearID > 2005, G > 130)
batting \%>\% select(playerID:lgID)
batting \%>\% arrange(playerID, desc(yearID))
batting \%>\% summarise(G = mean(G), n = n())

# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1 to fix the problem
batting \%>\%
  select(playerID:lgID, AB, R, G) \%>\%
  mutate(
   R_per_game1 = R / G,
   R_per_game2 = R * 1.0 / G
 )

# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))

# You can see the query that dplyr creates with show_query()
batting \%>\%
  filter(G > 0) \%>\%
  group_by(playerID) \%>\%
  summarise(n = n()) \%>\%
  show_query()
}
}