% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/locale.R
\name{dplyr-locale}
\alias{dplyr-locale}
\title{Locale used by \code{arrange()}}
\description{
This page documents details about the locale used by \code{\link[=arrange]{arrange()}} when
ordering character vectors.
\subsection{Default locale}{
The default locale used by \code{arrange()} is the C locale. This is used when
\code{.locale = NULL} unless the \code{dplyr.legacy_locale} global option is set to
\code{TRUE}. You can also force the C locale to be used unconditionally with
\code{.locale = "C"}.

The C locale is not exactly the same as English locales, such as \code{"en"}. The
main difference is that the C locale groups the English alphabet by \emph{case},
while most English locales group the alphabet by \emph{letter}. For example,
\code{c("a", "b", "C", "B", "c")} will sort as \code{c("B", "C", "a", "b", "c")} in the
C locale, with all uppercase letters coming before lowercase letters, but
will sort as \code{c("a", "b", "B", "c", "C")} in an English locale. This often
makes little practical difference during data analysis, because both return
identical results when case is consistent between observations.
}
\subsection{Reproducibility}{
The C locale has the benefit of being completely reproducible across all
supported R versions and operating systems with no extra effort.

If you set \code{.locale} to an option from \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}}, then
stringi must be installed by anyone who wants to run your code. If you
use such a locale in a package, then stringi should be listed in the
\code{Imports} field of the package's DESCRIPTION file.
}
\subsection{Legacy behavior}{
Prior to dplyr 1.1.0, character columns were ordered in the system locale. If
you need to temporarily revert to this behavior, you can set the global
option \code{dplyr.legacy_locale} to \code{TRUE}. Use this option sparingly, and
expect it to be removed in a future version of dplyr; it is better to
update existing code to explicitly use \code{.locale} instead. Note
that setting \code{dplyr.legacy_locale} will also force calls to \code{\link[=group_by]{group_by()}} to
use the system locale when internally ordering the groups.

Setting \code{.locale} will override any usage of \code{dplyr.legacy_locale}.
}
}
\examples{
\dontshow{if (dplyr:::has_minimum_stringi()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
df <- tibble(x = c("a", "b", "C", "B", "c"))
df
# Default locale is C, which groups the English alphabet by case, placing
# uppercase letters before lowercase letters.
arrange(df, x)
# The American English locale groups the alphabet by letter.
# Explicitly override `.locale` with `"en"` for this ordering.
arrange(df, x, .locale = "en")
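# A small sketch of the note in the description: when case is consistent
# between observations, the C locale and `"en"` are expected to agree.
# `df_lower` is a hypothetical name used only for this illustration.
df_lower <- tibble(x = c("b", "c", "a"))
identical(arrange(df_lower, x), arrange(df_lower, x, .locale = "en"))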
# This Danish letter is expected to sort after `z`
df <- tibble(x = c("o", "p", "\u00F8", "z"))
df
# The American English locale sorts it right after `o`
arrange(df, x, .locale = "en")
# Using `"da"` for Danish ordering gives the expected result
arrange(df, x, .locale = "da")
# If you need the legacy behavior of `arrange()`, which respected the
# system locale, then you can set the global option `dplyr.legacy_locale`,
# but expect this to be removed in the future. We recommend that you use
# the `.locale` argument instead.
rlang::with_options(dplyr.legacy_locale = TRUE, {
  arrange(df, x)
})
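# A sketch based on the description above: an explicit `.locale` is
# documented as overriding `dplyr.legacy_locale`, so this call should
# order by the C locale even while the legacy option is set.
rlang::with_options(dplyr.legacy_locale = TRUE, {
  arrange(df, x, .locale = "C")
})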
\dontshow{\}) # examplesIf}
}
\keyword{internal}