File: reprex.Rmd

package info (click to toggle)
r-cran-dbplyr 2.3.0%2Bdfsg-1
links: PTS, VCS
area: main
in suites: bookworm
size: 2,376 kB
sloc: sh: 13; makefile: 2
file content (83 lines) | stat: -rw-r--r-- 2,976 bytes
parent folder | download | duplicates (6)
---
title: "Reprexes for dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reprexes for dbplyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

If you're reporting a bug in dbplyr, it is much easier for me to help you if you can supply a [reprex](https://reprex.tidyverse.org) that I can run on my computer. Creating reprexes for dbplyr is particularly challenging because you are probably using a database that you can't share with me. Fortunately, in many cases you can still demonstrate the problem even if I don't have the complete dataset, or even access to the database system that you're using.

This vignette outlines three approaches for creating reprexes that will work anywhere:

* Use `memdb_frame()`/`tbl_memdb()` to easily create datasets that live in an 
  in-memory SQLite database.
  
* Use `lazy_frame()`/`tbl_lazy()` to simulate SQL generation of dplyr pipelines.

* Use `translate_sql()` to simulate SQL generation of columnar expression.

```{r setup, message = FALSE}
library(dplyr)
library(dbplyr)
```

## Using `memdb_frame()`

The first place to start is with SQLite. SQLite is particularly appealing because it's completely embedded instead an R package so doesn't have any external dependencies. SQLite is designed to be small and simple, so it can't demonstrate all problems, but it's easy to try out and a great place to start.

You can easily create a SQLite in-memory database table using `memdb_frame()`:

```{r}
mf <- memdb_frame(g = c(1, 1, 2, 2, 2), x = 1:5, y = 5:1)
mf

mf %>% 
  group_by(g) %>% 
  summarise_all(mean, na.rm = TRUE)
```

Reprexes are easiest to understand if you create very small custom data, but if you do want to use an existing data frame you can use `tbl_memdb()`:

```{r}
mtcars_db <- tbl_memdb(mtcars)
mtcars_db %>% 
  group_by(cyl) %>% 
  summarise(n = n()) %>% 
  show_query()
```

## Translating verbs

Many problems with dbplyr come down to incorrect SQL generation. Fortunately, it's possible to generate SQL without a database using `lazy_frame()` and `tbl_lazy()`. Both take an `con` argument which takes a database "simulator" like `simulate_postgres()`, `simulate_sqlite()`, etc.

```{r}
x <- c("abc", "def", "ghif")

lazy_frame(x = x, con = simulate_postgres()) %>% 
  head(5) %>% 
  show_query()

lazy_frame(x = x, con = simulate_mssql()) %>% 
  head(5) %>% 
  show_query()
```

If you isolate the problem to incorrect SQL generation, it would be very helpful if you could also suggest more appropriate SQL.

## Translating individual expressions

In some cases, you might be able to track the problem down to incorrect translation for a single column expression. In that case, you can make your reprex even simpler with `translate_sql()`:

```{r}
translate_sql(substr(x, 1, 2), con = simulate_postgres())
translate_sql(substr(x, 1, 2), con = simulate_sqlite())
```