
<!-- README.md is generated from README.Rmd. Please edit that file -->

# readr <a href="https://readr.tidyverse.org"><img src="man/figures/logo.png" align="right" height="139" /></a>

[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/readr)](https://cran.r-project.org/package=readr)
[![R build
status](https://github.com/tidyverse/readr/workflows/R-CMD-check/badge.svg)](https://github.com/tidyverse/readr)
[![Coverage
Status](https://codecov.io/gh/tidyverse/readr/coverage.svg?branch=master)](https://codecov.io/gh/tidyverse/readr?branch=master)

## Overview

The goal of readr is to provide a fast and friendly way to read
rectangular data (like csv, tsv, and fwf). It is designed to flexibly
parse many types of data found in the wild, while still cleanly failing
when data unexpectedly changes. If you are new to readr, the best place
to start is the [data import
chapter](https://r4ds.had.co.nz/data-import.html) in *R for Data Science*.

## Installation

``` r
# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just readr:
install.packages("readr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/readr")
```

## Cheatsheet

<a href="https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf"><img src="https://raw.githubusercontent.com/rstudio/cheatsheets/master/pngs/thumbnails/data-import-cheatsheet-thumbs.png" width="630" height="252"/></a>

## Usage

readr is part of the core tidyverse, so load it with:

``` r
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
#> ✔ ggplot2 3.3.2          ✔ purrr   0.3.4     
#> ✔ tibble  3.0.3          ✔ dplyr   1.0.2.9000
#> ✔ tidyr   1.1.2          ✔ stringr 1.4.0     
#> ✔ readr   1.3.1.9000     ✔ forcats 0.5.0
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
```

To accurately read a rectangular dataset with readr you combine two
pieces: a function that parses the overall file, and a column
specification. The column specification describes how each column should
be converted from a character vector to the most appropriate data type,
and in most cases it’s not necessary because readr will guess it for you
automatically.

readr supports six file formats with six `read_` functions:

  - `read_csv()`: comma separated (CSV) files
  - `read_tsv()`: tab separated files
  - `read_delim()`: general delimited files
  - `read_fwf()`: fixed width files
  - `read_table()`: tabular files where columns are separated by
    whitespace
  - `read_log()`: web log files

In many cases, these functions will just work: you supply the path to a
file and you get a tibble back. The following example loads a sample
file bundled with readr:

``` r
mtcars <- read_csv(readr_example("mtcars.csv"))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   mpg = col_double(),
#>   cyl = col_double(),
#>   disp = col_double(),
#>   hp = col_double(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_double(),
#>   am = col_double(),
#>   gear = col_double(),
#>   carb = col_double()
#> )
```

Note that readr prints the column specification. This is useful because
it allows you to check that the columns have been read in as you expect,
and if they haven’t, you can easily copy and paste into a new call:

``` r
mtcars <- read_csv(readr_example("mtcars.csv"), col_types = 
  cols(
    mpg = col_double(),
    cyl = col_integer(),
    disp = col_double(),
    hp = col_integer(),
    drat = col_double(),
    vs = col_integer(),
    wt = col_double(),
    qsec = col_double(),
    am = col_integer(),
    gear = col_integer(),
    carb = col_integer()
  )
)
```

`vignette("readr")` gives more detail on how readr guesses the column
types, how you can override the defaults, and provides some useful tools
for debugging parsing problems.
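
As a small illustration of those debugging tools, `problems()` returns
the parsing failures recorded while reading. The file below is invented
for the example and contains one value that cannot be parsed as an
integer:

``` r
library(readr)

# Hypothetical file with one bad value in an integer column
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "oops,b", "3,c"), path)

# The bad value becomes NA; the parsing warning is silenced here
df <- suppressWarnings(
  read_csv(path, col_types = cols(x = col_integer(), y = col_character()))
)

# One row per parsing failure, with what was expected and what was found
problems(df)
```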

## Alternatives

There are two main alternatives to readr: base R and data.table’s
`fread()`. The most important differences are discussed below.

### Base R

Compared to the corresponding base functions, readr functions:

  - Use a consistent naming scheme for the parameters (e.g. `col_names`
    and `col_types` not `header` and `colClasses`).

  - Are much faster (up to 10x).

  - Leave strings as is by default, and automatically parse common
    date/time formats.

  - Have a helpful progress bar if loading is going to take a while.

  - Work exactly the same way regardless of the current locale. To
    override the US-centric defaults, use `locale()`.
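
A brief sketch of overriding those defaults with `locale()`. The
variable names and sample values here are illustrative only:

``` r
library(readr)

# A locale where "," is the decimal mark and "." groups thousands
de <- locale(decimal_mark = ",", grouping_mark = ".")

# Default locale: "." is the decimal mark
x_us <- parse_double("1.23")

# The same number written with a decimal comma
x_de <- parse_double("1,23", locale = de)

# parse_number() also drops the grouping mark
n_de <- parse_number("1.234,5", locale = de)
```

The same `locale` argument can be passed to the `read_` functions, so
the setting travels with the call rather than with the machine the code
runs on.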

### data.table and `fread()`

[data.table](https://github.com/Rdatatable/data.table) has a function
similar to `read_csv()` called `fread()`. Compared to `fread()`, readr
functions:

  - Are slower (currently \~1.2-2x slower). If you want absolutely the
    best performance, use `data.table::fread()`.

  - Use a slightly more sophisticated parser.

  - Force you to supply all parameters, where `fread()` saves you work
    by automatically guessing the delimiter, whether or not the file has
    a header, and how many lines to skip.

  - Are built on a different underlying infrastructure. readr functions
    are designed to be quite general, which makes it easier to add
    support for new rectangular data formats. `fread()` is designed to
    be as fast as possible.
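
To make the point about supplying parameters concrete, here is a sketch
with `read_delim()` in which the delimiter, the number of lines to skip,
and the header handling are all stated explicitly. The file and its
contents are invented for the example:

``` r
library(readr)

# Hypothetical pipe-delimited file with one comment line above the header
path <- tempfile(fileext = ".txt")
writeLines(
  c("# exported for illustration", "id|score", "1|9.5", "2|7.25"),
  path
)

scores <- read_delim(
  path,
  delim = "|",       # the delimiter is stated, not guessed
  skip = 1,          # skip the leading comment line
  col_names = TRUE,  # the first remaining line is the header
  col_types = cols(id = col_integer(), score = col_double())
)
scores
```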

## Acknowledgements

Thanks to:

  - [Joe Cheng](https://github.com/jcheng5) for showing me the beauty of
    deterministic finite automata for parsing, and for teaching me why I
    should write a tokenizer.

  - [JJ Allaire](https://github.com/jjallaire) for helping me come up
    with a design that makes very few copies, and is easy to extend.

  - [Dirk Eddelbuettel](http://dirk.eddelbuettel.com) for coming up with
    the name\!

## Code of Conduct

Please note that the readr project is released with a [Contributor Code
of Conduct](https://readr.tidyverse.org/CONDUCT.html). By contributing
to this project, you agree to abide by its terms.