% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/locale.R
\name{dplyr-locale}
\alias{dplyr-locale}
\title{Locale used by \code{arrange()}}
\description{
This page documents details about the locale used by \code{\link[=arrange]{arrange()}} when
ordering character vectors.
\subsection{Default locale}{

The default locale used by \code{arrange()} is the C locale. This is used when
\code{.locale = NULL} unless the \code{dplyr.legacy_locale} global option is set to
\code{TRUE}. You can also force the C locale to be used unconditionally with
\code{.locale = "C"}.

The C locale is not exactly the same as English locales, such as \code{"en"}. The
main difference is that the C locale groups the English alphabet by \emph{case},
while most English locales group the alphabet by \emph{letter}. For example,
\code{c("a", "b", "C", "B", "c")} will sort as \code{c("B", "C", "a", "b", "c")} in the
C locale, with all uppercase letters coming before lowercase letters, but
will sort as \code{c("a", "b", "B", "c", "C")} in an English locale. This often
makes little practical difference during data analysis, because both locales
produce identical orderings when case is consistent between observations.
}

\subsection{Reproducibility}{

The C locale has the benefit of being completely reproducible across all
supported R versions and operating systems with no extra effort.

If you set \code{.locale} to an option from \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}}, then
stringi must be installed by anyone who wants to run your code. If you
use this in a package, then stringi should be listed in \code{Imports}.
}

\subsection{Legacy behavior}{

Prior to dplyr 1.1.0, character columns were ordered in the system locale. If
you need to temporarily revert to this behavior, you can set the global
option \code{dplyr.legacy_locale} to \code{TRUE}. Use this sparingly, and expect
the option to be removed in a future version of dplyr; it is better to update
existing code to explicitly use \code{.locale} instead. Note that setting
\code{dplyr.legacy_locale} will also force calls to \code{\link[=group_by]{group_by()}} to
use the system locale when internally ordering the groups.

Setting \code{.locale} will override any usage of \code{dplyr.legacy_locale}.
}
}
\examples{
\dontshow{if (dplyr:::has_minimum_stringi()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
df <- tibble(x = c("a", "b", "C", "B", "c"))
df

# Default locale is C, which groups the English alphabet by case, placing
# uppercase letters before lowercase letters.
arrange(df, x)

# The American English locale groups the alphabet by letter.
# Explicitly override `.locale` with `"en"` for this ordering.
arrange(df, x, .locale = "en")
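
# A minimal sketch of the point about consistent case: once case is
# normalized, the C and `"en"` orderings are expected to agree. Using
# base R's `tolower()` here is an illustrative choice, not a recommendation,
# and `df_lower` is a name introduced only for this example.
df_lower <- tibble(x = tolower(c("a", "b", "C", "B", "c")))
arrange(df_lower, x)
arrange(df_lower, x, .locale = "en")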

# This Danish letter is expected to sort after `z`
df <- tibble(x = c("o", "p", "\u00F8", "z"))
df

# The American English locale sorts it right after `o`
arrange(df, x, .locale = "en")

# Using `"da"` for Danish ordering gives the expected result
arrange(df, x, .locale = "da")
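
# Locale identifiers come from `stringi::stri_locale_list()`. As a rough
# sketch, you can check that an identifier is available before relying on
# it; this assumes stringi is installed, as the rest of these examples do.
is.element("da", stringi::stri_locale_list())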

# If you need the legacy behavior of `arrange()`, which respected the
# system locale, then you can set the global option `dplyr.legacy_locale`,
# but expect this to be removed in the future. We recommend that you use
# the `.locale` argument instead.
rlang::with_options(dplyr.legacy_locale = TRUE, {
  arrange(df, x)
})
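
# An explicit `.locale` overrides `dplyr.legacy_locale`, so this call is
# expected to use the C locale even while the legacy option is set.
# It reuses the Danish `df` defined above.
rlang::with_options(dplyr.legacy_locale = TRUE, {
  arrange(df, x, .locale = "C")
})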
\dontshow{\}) # examplesIf}
}
\keyword{internal}