% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/group_var.R
\name{group_var}
\alias{group_var}
\alias{group_var_if}
\alias{group_labels}
\alias{group_labels_if}
\title{Recode numeric variables into equal-ranged groups}
\usage{
group_var(
  x,
  ...,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_var_if(
  x,
  predicate,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_labels(x, ..., size = 5, right.interval = FALSE, n = 30)

group_labels_if(x, predicate, size = 5, right.interval = FALSE, n = 30)
}
\arguments{
\item{x}{A vector or data frame.}

\item{...}{Optional, unquoted names of variables that should be selected for
further processing. Required if \code{x} is a data frame (rather than a
vector) and only selected variables from \code{x} should be processed.
You may also use functions like \code{:} or tidyselect's
select-helpers.
See 'Examples' or \href{../doc/design_philosophy.html}{package-vignette}.}

\item{size}{Numeric; group size, i.e. the range for grouping. By default,
a new group is defined for every 5 categories of \code{x}, i.e. \code{size = 5}.
Use \code{size = "auto"} to automatically resize a variable into a maximum
of 30 groups (which is the ggplot-default grouping when plotting
histograms). Use \code{n} to change this maximum number of groups.}

\item{as.num}{Logical, if \code{TRUE}, return value will be numeric, not a factor.}

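\item{right.interval}{Logical; determines which interval boundary is included
when grouping. If \code{FALSE} (the default), grouping starts with the lower
bound of \code{size}; if \code{TRUE}, with the upper bound. See 'Details'.}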

\item{n}{Sets the maximum number of groups that are defined when auto-grouping is on
(\code{size = "auto"}). Default is 30. If \code{size} is not set to \code{"auto"},
this argument will be ignored.}

\item{append}{Logical, if \code{TRUE} (the default) and \code{x} is a data frame,
\code{x} including the new variables as additional columns is returned;
if \code{FALSE}, only the new variables are returned.}

\item{suffix}{String that is appended to the names of the new, grouped
variables (columns) if \code{x} is a data frame. The default is \code{"_gr"}.}

\item{predicate}{A predicate function to be applied to the columns. The
variables for which \code{predicate} returns \code{TRUE} are selected.}
}
\value{
\itemize{
    \item For \code{group_var()}, a grouped variable, either as numeric or as factor (see argument \code{as.num}). If \code{x} is a data frame, a data frame is returned that contains either \code{x} plus the grouped variables or only the grouped variables (see argument \code{append}).
    \item For \code{group_labels()}, a string vector or a list of string vectors containing labels based on the grouped categories of \code{x}, formatted as "from lower bound to upper bound", e.g. \code{"10-19"  "20-29"  "30-39"} etc. See 'Examples'.
  }
}
\description{
Recode numeric variables into equal-ranged groups,
  i.e. a variable is cut into a smaller number of groups, where each group
  has the same value range. \code{group_labels()} creates the related value
  labels. \code{group_var_if()} and \code{group_labels_if()} are scoped
  variants of \code{group_var()} and \code{group_labels()}, where grouping
  will be applied only to those variables that match the logical condition
  of \code{predicate}.
}
\details{
If \code{size} is set to a specific value, the variable is recoded
  into several groups, where each group has a maximum range of \code{size}.
  Hence, the number of groups differs depending on the range of \code{x}.
  \cr \cr
  If \code{size = "auto"}, the variable is recoded into a maximum of
  \code{n} groups. Hence, the number of groups does not depend on the
  range of \code{x}; instead, the range within each group differs
  (depending on \code{x}'s range).
  \cr \cr
  \code{right.interval} determines which boundary values to include when
  grouping is done. If \code{FALSE} (default), grouping starts with the
  \strong{lower bound} of \code{size}. For example, having a variable
  ranging from 50 to 80, groups cover the ranges 50-54, 55-59, 60-64 etc.
  If \code{TRUE}, grouping starts with the \strong{upper bound}
  of \code{size}. In this case, groups cover the ranges
  46-50, 51-55, 56-60, 61-65 etc. \strong{Note:} This will cover
  a range from 46-50 as first group, even if values from 46 to 49
  are not present. See 'Examples'.
  \cr \cr
  If you want to split a variable into a certain number of equal-sized
  groups (instead of having groups where values all have the same
  range), use the \code{\link{split_var}} function.
  \cr \cr
  \code{group_var()} also works on grouped data frames (see \code{\link[dplyr]{group_by}}).
  In this case, grouping is applied to the subsets of variables
  in \code{x}. See 'Examples'.
}
\note{
Variable label attributes (see, for instance,
  \code{\link[sjlabelled]{set_label}}) are preserved. Usually you should use
  the same values for \code{size} and \code{right.interval} in
  \code{group_labels()} as used in the \code{group_var} function if you want
  matching labels for the related recoded variable.
}
\examples{
age <- abs(round(rnorm(100, 65, 20)))
age.grp <- group_var(age, size = 10)
hist(age)
hist(age.grp)

age.grpvar <- group_labels(age, size = 10)
table(age.grp)
print(age.grpvar)

# histogram with EUROFAMCARE sample dataset
# variable not grouped
library(sjlabelled)
data(efc)
hist(efc$e17age, main = get_label(efc$e17age))

# bar plot with EUROFAMCARE sample dataset
# grouped variable
ageGrp <- group_var(efc$e17age)
ageGrpLab <- group_labels(efc$e17age)
barplot(table(ageGrp), main = get_label(efc$e17age), names.arg = ageGrpLab)

# within a pipe-chain
library(dplyr)
efc \%>\%
  select(e17age, c12hour, c160age) \%>\%
  group_var(size = 20)
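
# sketch of the append argument, using the built-in mtcars data:
# with append = TRUE (the default), the grouped column is added to x;
# with append = FALSE, only the grouped column is returned
# (size = 50 is an arbitrary choice for illustration)
disp.appended <- group_var(mtcars, disp, size = 50)
disp.only <- group_var(mtcars, disp, size = 50, append = FALSE)
ncol(disp.appended)
ncol(disp.only)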

# create vector with values from 50 to 80
dummy <- round(runif(200, 50, 80))
# labels with grouping starting at lower bound
group_labels(dummy)
# labels with grouping starting at upper bound
group_labels(dummy, right.interval = TRUE)
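
# sketch of automatic grouping: with size = "auto", the group range is
# derived from the range of the data and n (n = 5 is an arbitrary choice)
dummy.auto <- group_var(dummy, size = "auto", n = 5)
table(dummy.auto)
group_labels(dummy, size = "auto", n = 5)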

# also works with grouped data frames
mtcars \%>\%
  group_var(disp, size = 4, append = FALSE) \%>\%
  table()

mtcars \%>\%
  group_by(cyl) \%>\%
  group_var(disp, size = 4, append = FALSE) \%>\%
  table()
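
# sketch of the scoped variant: only columns for which the predicate
# returns TRUE are grouped; is.numeric skips the factor column Species
# of the built-in iris data (size = 2 is an arbitrary choice)
iris.gr <- group_var_if(iris, is.numeric, size = 2, append = FALSE)
head(iris.gr)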
}
\seealso{
\code{\link{split_var}} to split variables into equal-sized groups,
  \code{\link{group_str}} for grouping string vectors or
  \code{\link{rec_pattern}} and \code{\link{rec}} for another convenient
  way of recoding variables into smaller groups.
}