% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/trans_transliterate.R
\name{stri_trans_general}
\alias{stri_trans_general}
\title{General Text Transforms, Including Transliteration}
\usage{
stri_trans_general(str, id, rules = FALSE, forward = TRUE)
}
\arguments{
\item{str}{character vector}

\item{id}{a single string with a transform identifier
(see \code{\link{stri_trans_list}}) or custom transliteration rules}

\item{rules}{if \code{TRUE}, treat \code{id} as a string with
semicolon-separated transliteration rules (see the \pkg{ICU} manual)}

\item{forward}{transliteration direction (\code{TRUE} for forward,
\code{FALSE} for reverse)}
}
\value{
Returns a character vector.
}
\description{
\pkg{ICU} General Transforms provide different ways
of processing Unicode text. They are useful for a variety
of tasks, including:
\itemize{
\item    Upper Case, Lower Case, Title Case, Full/Halfwidth conversions,
\item    Normalization,
\item    Hex and Character Name conversions,
\item    Script to Script conversion/transliteration.
}
}
\details{
\pkg{ICU} Transforms were mainly designed to transliterate characters
from one script to another (for example, from Greek to Latin,
or Japanese Katakana to Latin).
However, these services are also capable of handling a much
broader range of tasks.
In particular, the Transforms include pre-built transformations
for case conversions, for normalization conversions, for the removal
of given characters, and also for a variety of language and script
transliterations. Transforms can be chained together to perform
a series of operations, and each step of the process can use a
UnicodeSet to restrict the characters that are affected.

To get the list of available transforms,
call \code{\link{stri_trans_list}}.

Note that transliterators are often combined in sequence
to achieve a desired transformation.
This is analogous to the composition of mathematical functions.
For example, given a transform that converts lowercase ASCII characters
from Latin script to Katakana script, it is convenient to first
(1) separate base characters from accents, and then (2)
convert uppercase to lowercase.
To achieve this, a compound transform can be specified as follows:
\code{NFKD; Lower; Latin-Katakana;} (with the default \code{rules=FALSE}).

Custom rule-based transliteration is also supported; see the \pkg{ICU}
manual and the examples below.
}
\examples{
stri_trans_general('gro\u00df', 'latin-ascii')
stri_trans_general('stringi', 'latin-greek')
stri_trans_general('stringi', 'latin-cyrillic')
stri_trans_general('stringi', 'upper') # see stri_trans_toupper
stri_trans_general('\u0104', 'nfd; lower') # compound id; see stri_trans_nfd
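# a longer compound id, mirroring the one discussed in Details: decompose,
# lowercase, then transliterate; the input string is an illustrative assumption
stringi::stri_trans_general('Sushi', 'NFKD; Lower; Latin-Katakana')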
stri_trans_general('Marek G\u0105golewski', 'pl-pl_FONIPA')
stri_trans_general('\u2620', 'any-name') # character name
stri_trans_general('\\\\N{latin small letter a}', 'name-any') # decode name
stri_trans_general('\u2620', 'hex/c') # to hex
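# a UnicodeSet filter restricts which characters a step may affect; this is
# a sketch of accent stripping, where Remove is limited to nonspacing marks
# (category Mn) exposed by NFD; the input string is an illustrative assumption
stringi::stri_trans_general('G\u0105golewski', 'NFD; [:Mn:] Remove; NFC')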
stri_trans_general("\u201C\u2026\u201D \u0105\u015B\u0107\u017C",
    "NFKD; NFC; [^\\\\p{L}] latin-ascii")

x <- "\uC885\uB85C\uAD6C \uC0AC\uC9C1\uB3D9"
stringi::stri_trans_general(x, "Hangul-Latin")
# Deviate from the ICU rules of romanisation of Korean,
# see https://en.wikipedia.org/wiki/Romanization_of_Korean
id <- "
    :: NFD;
    \u11A8 > k;
    \u11AE > t;
    \u11B8 > p;
    \u1105 > r;
    :: Hangul-Latin;
"
stringi::stri_trans_general(x, id, rules=TRUE)
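
# forward=FALSE requests the reverse direction of a transform; the call below
# is assumed to act like Greek-Latin (the Greek input is only illustrative)
stringi::stri_trans_general('\u03c3\u03c4\u03c1\u03b9\u03bd\u03b3\u03b9',
    'Latin-Greek', forward=FALSE)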


}
\references{
\emph{General Transforms} -- ICU User Guide,
\url{https://unicode-org.github.io/icu/userguide/transforms/general/}
}
\seealso{
The official online manual of \pkg{stringi} at \url{https://stringi.gagolewski.com/}

Gagolewski M., \pkg{stringi}: Fast and portable character string processing in R, \emph{Journal of Statistical Software} 103(2), 2022, 1-59, \doi{10.18637/jss.v103.i02}

Other transform: 
\code{\link{stri_trans_char}()},
\code{\link{stri_trans_list}()},
\code{\link{stri_trans_nfc}()},
\code{\link{stri_trans_tolower}()}
}
\concept{transform}
\author{
\href{https://www.gagolewski.com/}{Marek Gagolewski} and other contributors
}