<!-- README.md is generated from README.Rmd. Please edit that file -->

# stringr <a href='https://stringr.tidyverse.org'><img src='man/figures/logo.png' align="right" height="139" /></a>

<!-- badges: start -->

[![CRAN
status](https://www.r-pkg.org/badges/version/stringr)](https://cran.r-project.org/package=stringr)
[![R-CMD-check](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml)
[![Codecov test
coverage](https://codecov.io/gh/tidyverse/stringr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/stringr?branch=main)
[![Lifecycle:
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
<!-- badges: end -->

## Overview

Strings are not glamorous, high-profile components of R, but they do
play a big role in many data cleaning and preparation tasks. The stringr
package provides a cohesive set of functions designed to make working
with strings as easy as possible. If you’re not familiar with strings,
the best place to start is the [chapter on
strings](https://r4ds.hadley.nz/strings) in R for Data Science.

stringr is built on top of
[stringi](https://github.com/gagolews/stringi), which uses the
[ICU](https://icu.unicode.org) C library to provide fast, correct
implementations of common string manipulations. stringr focusses on the
most important and commonly used string manipulation functions whereas
stringi provides a comprehensive set covering almost anything you can
imagine. If you find that stringr is missing a function that you need,
try looking in stringi. Both packages share similar conventions, so once
you’ve mastered stringr, you should find stringi similarly easy to use.
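As a rough illustration of how the two packages line up, the sketch below calls a few stringr functions next to stringi counterparts. The vector `fruit_names` is made up for this example, and the pairing shown is only a small sample, not a complete mapping between the packages.

``` r
library(stringr)
library(stringi)

# Hypothetical example data
fruit_names <- c("apple", "kiwi", "banana")

# Number of characters in each string
str_length(fruit_names)
stri_length(fruit_names)

# Does each string contain a vowel followed by "n"?
str_detect(fruit_names, "[aeiou]n")
stri_detect_regex(fruit_names, "[aeiou]n")

# First two characters of each string
str_sub(fruit_names, 1, 2)
stri_sub(fruit_names, 1, 2)
```

Each pair should return the same values; the main visible difference is that stringi names the matching engine in the function name (`_regex`), while stringr selects it through the pattern argument.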

## Installation

``` r
# The easiest way to get stringr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just stringr:
install.packages("stringr")
```

## Cheatsheet

<a href="https://github.com/rstudio/cheatsheets/blob/main/strings.pdf"><img src="https://raw.githubusercontent.com/rstudio/cheatsheets/main/pngs/thumbnails/strings-cheatsheet-thumbs.png" width="630" height="242"/></a>

## Usage

All functions in stringr start with `str_` and take a vector of strings
as the first argument:

``` r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x)
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"
```

Most string functions work with regular expressions, a concise language
for describing patterns of text. For example, the regular expression
`"[aeiou]"` matches any single character that is a vowel:

``` r
str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4
```

There are eight main verbs that work with patterns:

- `str_detect(x, pattern)` tells you if there’s any match to the
  pattern:

  ``` r
  str_detect(x, "[aeiou]")
  #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
  ```

- `str_count(x, pattern)` counts the number of matches:

  ``` r
  str_count(x, "[aeiou]")
  #> [1] 0 3 1 2 2 4
  ```

- `str_subset(x, pattern)` extracts the matching components:

  ``` r
  str_subset(x, "[aeiou]")
  #> [1] "video"     "cross"     "extra"     "deal"      "authority"
  ```

- `str_locate(x, pattern)` gives the position of the first match:

  ``` r
  str_locate(x, "[aeiou]")
  #>      start end
  #> [1,]    NA  NA
  #> [2,]     2   2
  #> [3,]     3   3
  #> [4,]     1   1
  #> [5,]     2   2
  #> [6,]     1   1
  ```

- `str_extract(x, pattern)` extracts the text of the first match:

  ``` r
  str_extract(x, "[aeiou]")
  #> [1] NA  "i" "o" "e" "e" "a"
  ```

- `str_match(x, pattern)` extracts parts of the match defined by
  parentheses:

  ``` r
  # extract the characters on either side of the vowel
  str_match(x, "(.)[aeiou](.)")
  #>      [,1]  [,2] [,3]
  #> [1,] NA    NA   NA  
  #> [2,] "vid" "v"  "d" 
  #> [3,] "ros" "r"  "s" 
  #> [4,] NA    NA   NA  
  #> [5,] "dea" "d"  "a" 
  #> [6,] "aut" "a"  "t"
  ```

- `str_replace(x, pattern, replacement)` replaces the first match with
  new text:

  ``` r
  str_replace(x, "[aeiou]", "?")
  #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
  ```

- `str_split(x, pattern)` splits up a string into multiple pieces:

  ``` r
  str_split(c("a,b", "c,d,e"), ",")
  #> [[1]]
  #> [1] "a" "b"
  #> 
  #> [[2]]
  #> [1] "c" "d" "e"
  ```

As well as regular expressions (the default), there are three other
pattern matching engines:

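- `fixed()`: compares literal bytes, so regular expression
  metacharacters have no special meaning
- `coll()`: compares strings using locale-aware collation rules,
  matching letters the way a human reader would
- `boundary()`: matches boundaries between characters, words, lines, or
  sentences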
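
A minimal sketch of how each engine changes matching, assuming stringr is attached. The input strings are made up for illustration.

``` r
library(stringr)

# Default regex engine: "." means "any character"
str_detect(c("a.b", "ab"), ".")
#> [1] TRUE TRUE

# fixed(): "." is treated as a literal dot
str_detect(c("a.b", "ab"), fixed("."))
#> [1]  TRUE FALSE

# coll(): comparison follows collation rules; here, ignoring case
str_detect(c("Apple", "kiwi"), coll("a", ignore_case = TRUE))
#> [1]  TRUE FALSE

# boundary(): split at word boundaries rather than on a pattern
str_split("The quick brown fox", boundary("word"))
#> [[1]]
#> [1] "The"   "quick" "brown" "fox"
```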

## RStudio Addin

The [RegExplain RStudio
addin](https://www.garrickadenbuie.com/project/regexplain/) provides a
friendly interface for working with regular expressions and functions
from stringr. This addin allows you to interactively build your regexp,
check the output of common string matching functions, consult the
interactive help pages, or use the included resources to learn regular
expressions.

This addin can easily be installed with devtools:

``` r
# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")
```

## Compared to base R

R provides a solid set of string operations, but because they have grown
organically over time, they can be inconsistent and a little hard to
learn. Additionally, they lag behind the string operations in other
programming languages, so that some things that are easy to do in
languages like Ruby or Python are rather hard to do in R. By contrast,
stringr:

- Uses consistent function and argument names. The first argument is
  always the vector of strings to modify, which makes stringr work
  particularly well in conjunction with the pipe:

  ``` r
  letters %>%
    .[1:10] %>% 
    str_pad(3, "right") %>%
    str_c(letters[2:11])
  #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
  ```

- Simplifies string operations by eliminating options that you don’t
  need 95% of the time.

- Produces outputs that can easily be used as inputs. This includes
  ensuring that missing inputs result in missing outputs, and zero
  length inputs result in zero length outputs.
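
The handling of missing and zero-length inputs can be sketched as follows. The comparison with base R's `paste()` is added here for illustration and is not taken from the text above.

``` r
library(stringr)

# Missing inputs give missing outputs
str_length(c("a", NA))
#> [1]  1 NA
str_c(c("a", NA), "-d")
#> [1] "a-d" NA

# For comparison, base R's paste() turns NA into the string "NA"
paste(c("a", NA), "-d")
#> [1] "a -d"  "NA -d"

# Zero-length inputs give zero-length outputs
str_length(character())
#> integer(0)
str_detect(character(), "a")
#> logical(0)
```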

Learn more in `vignette("from-base")`.