File: utf8_nchar.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utf8.R
\name{utf8_nchar}
\alias{utf8_nchar}
\title{Count the number of characters in a character vector}
\usage{
utf8_nchar(x, type = c("chars", "bytes", "width", "graphemes", "codepoints"))
}
\arguments{
\item{x}{Character vector; it is converted to UTF-8.}

\item{type}{What to count: graphemes (\code{"chars"} or
\code{"graphemes"}), code points (\code{"codepoints"}), bytes
(\code{"bytes"}), or the display width of the string (\code{"width"}).}
}
\value{
Numeric vector with one element per string in \code{x}, giving the
length of that string in the unit selected by \code{type}.
}
\description{
By default, \code{utf8_nchar()} counts Unicode grapheme clusters rather
than code points.
}
\examples{
# Grapheme example, emoji with combining characters. This is a single
# grapheme, consisting of five Unicode code points:
# * `\U0001f477` is the construction worker emoji
# * `\U0001f3fb` is an emoji modifier that changes the skin color
# * `\u200d` is the zero width joiner
# * `\u2640` is the female sign
# * `\ufe0f` is variation selector 16, requesting an emoji style glyph
emo <- "\U0001f477\U0001f3fb\u200d\u2640\ufe0f"
cat(emo)

utf8_nchar(emo, "chars") # = graphemes
utf8_nchar(emo, "bytes")
utf8_nchar(emo, "width")
utf8_nchar(emo, "codepoints")

# For comparison, base R; the output for width depends on the R version used:
nchar(emo, "chars")
nchar(emo, "bytes")
nchar(emo, "width")
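
# A further sketch, not part of the original examples: the same word
# spelled with a precomposed e-acute and with "e" followed by a combining
# acute accent (U+0301). The two strings should have the same number of
# graphemes but differ in code points and bytes. `x` and `cp_base` are
# illustrative names only.
x <- c("caf\u00e9", "cafe\u0301")

utf8_nchar(x, "graphemes")
utf8_nchar(x, "codepoints")
utf8_nchar(x, "bytes")

# Cross-check of the code point counts with base R, assuming the strings
# are valid UTF-8 (utf8ToInt() handles one string at a time):
cp_base <- lengths(lapply(x, utf8ToInt))
cp_base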
}
\seealso{
Other UTF-8 string manipulation: 
\code{\link{utf8_graphemes}()},
\code{\link{utf8_substr}()}
}
\concept{UTF-8 string manipulation}