% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/encoding_detection.R
\name{stri_enc_isutf8}
\alias{stri_enc_isutf8}
\title{Check If a Data Stream Is Possibly in UTF-8}
\usage{
stri_enc_isutf8(str)
}
\arguments{
\item{str}{character vector, a raw vector, or
a list of \code{raw} vectors}
}
\value{
Returns a logical vector.
Its i-th element indicates whether the i-th string
corresponds to a valid UTF-8 byte sequence.
}
\description{
The function checks whether each given sequence of bytes forms
a proper UTF-8 string.
}
\details{
\code{FALSE} means that a string is certainly not valid UTF-8.
However, false positives are possible. For instance,
\code{(c4,85)} represents ('a with ogonek') in UTF-8
as well as ('A umlaut', 'Ellipsis') in WINDOWS-1250.
Also note that UTF-8, like most 8-bit encodings, extends ASCII,
so whenever \code{\link{stri_enc_isascii}} returns \code{TRUE},
\code{\link{stri_enc_isutf8}} returns \code{TRUE} as well.

However, the longer the sequence,
the more likely it is that a \code{TRUE} result
indeed indicates UTF-8 text, because not all sequences of bytes
are valid UTF-8.

This function is independent of the way \R marks encodings in
character strings (see \link{Encoding} and \link{stringi-encoding}).
}
\examples{
stri_enc_isutf8(letters[1:3])
stri_enc_isutf8('\u0105\u0104')
stri_enc_isutf8('\u1234\u0222')
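
# A minimal sketch of the raw-input forms listed under Arguments.
# The variable names below are illustrative only and not part of stringi.
ogonek_bytes <- as.raw(c(0xc4, 0x85))  # the byte pair discussed in Details
truncated <- as.raw(0xc4)              # a lead byte without its continuation byte
never_utf8 <- as.raw(0xff)             # the byte 0xff cannot occur in UTF-8
# a single raw vector is treated as one byte sequence
stri_enc_isutf8(ogonek_bytes)
# a list of raw vectors yields one logical value per element
stri_enc_isutf8(list(ogonek_bytes, truncated, never_utf8))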

}
\seealso{
The official online manual of \pkg{stringi} at \url{https://stringi.gagolewski.com/}

Gagolewski M., \pkg{stringi}: Fast and portable character string processing in R, \emph{Journal of Statistical Software} 103(2), 2022, 1-59, \doi{10.18637/jss.v103.i02}

Other encoding_detection: 
\code{\link{about_encoding}},
\code{\link{stri_enc_detect2}()},
\code{\link{stri_enc_detect}()},
\code{\link{stri_enc_isascii}()},
\code{\link{stri_enc_isutf16be}()}
}
\concept{encoding_detection}
\author{
\href{https://www.gagolewski.com/}{Marek Gagolewski} and other contributors
}