% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/group_var.R
\name{group_var}
\alias{group_var}
\alias{group_var_if}
\alias{group_labels}
\alias{group_labels_if}
\title{Recode numeric variables into equal-ranged groups}
\usage{
group_var(
x,
...,
size = 5,
as.num = TRUE,
right.interval = FALSE,
n = 30,
append = TRUE,
suffix = "_gr"
)
group_var_if(
x,
predicate,
size = 5,
as.num = TRUE,
right.interval = FALSE,
n = 30,
append = TRUE,
suffix = "_gr"
)
group_labels(x, ..., size = 5, right.interval = FALSE, n = 30)
group_labels_if(x, predicate, size = 5, right.interval = FALSE, n = 30)
}
\arguments{
\item{x}{A vector or data frame.}
\item{...}{Optional, unquoted names of variables that should be selected for
further processing. Required if \code{x} is a data frame (and not a
vector) and only selected variables from \code{x} should be processed.
You may also use functions like \code{:} or tidyselect's
select-helpers.
See 'Examples' or \href{../doc/design_philosophy.html}{package-vignette}.}
\item{size}{Numeric; group-size, i.e. the range for grouping. By default,
a new group is defined for each 5 categories of \code{x}, i.e. \code{size = 5}.
Use \code{size = "auto"} to automatically resize a variable into a maximum
of 30 groups (which is the ggplot-default grouping when plotting
histograms). Use \code{n} to set the maximum number of groups.}
\item{as.num}{Logical, if \code{TRUE}, return value will be numeric, not a factor.}
\item{right.interval}{Logical; if \code{TRUE}, grouping starts with the lower
bound of \code{size}. See 'Details'.}
\item{n}{Sets the maximum number of groups that are defined when auto-grouping is on
(\code{size = "auto"}). Default is 30. If \code{size} is not set to \code{"auto"},
this argument will be ignored.}
\item{append}{Logical, if \code{TRUE} (the default) and \code{x} is a data frame,
\code{x} including the new variables as additional columns is returned;
if \code{FALSE}, only the new variables are returned.}
\item{suffix}{Indicates which suffix will be added to each dummy variable.
Use \code{"numeric"} to number dummy variables, e.g. \emph{x_1},
\emph{x_2}, \emph{x_3} etc. Use \code{"label"} to add value label,
e.g. \emph{x_low}, \emph{x_mid}, \emph{x_high}. May be abbreviated.}
\item{predicate}{A predicate function to be applied to the columns. The
variables for which \code{predicate} returns \code{TRUE} are selected.}
}
\value{
\itemize{
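\item For \code{group_var()}, a grouped variable, either as numeric or as factor (see parameter \code{as.num}). If \code{x} is a data frame, the return value depends on \code{append}: either \code{x} including the grouped variables as additional columns, or only the grouped variables.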
\item For \code{group_labels()}, a string vector or a list of string vectors containing labels based on the grouped categories of \code{x}, formatted as "from lower bound to upper bound", e.g. \code{"10-19" "20-29" "30-39"} etc. See 'Examples'.
}
}
\description{
Recode numeric variables into equal-ranged, grouped factors,
i.e. a variable is cut into a smaller number of groups, where each group
has the same value range. \code{group_labels()} creates the related value
labels. \code{group_var_if()} and \code{group_labels_if()} are scoped
variants of \code{group_var()} and \code{group_labels()}, where grouping
will be applied only to those variables that match the logical condition
of \code{predicate}.
}
\details{
If \code{size} is set to a specific value, the variable is recoded
into several groups, where each group has a maximum range of \code{size}.
Hence, the number of groups differs depending on the range of \code{x}.
\cr \cr
If \code{size = "auto"}, the variable is recoded into a maximum of
\code{n} groups. Hence, independent of the range of
\code{x}, at most \code{n} groups are created, so the range
within each group differs (depending on \code{x}'s range).
\cr \cr
\code{right.interval} determines which boundary values to include when
grouping is done. If \code{TRUE}, grouping starts with the \strong{lower
bound} of \code{size}. For example, for a variable ranging from
50 to 80, groups cover the ranges 50-54, 55-59, 60-64 etc.
If \code{FALSE} (default), grouping starts with the \strong{upper bound}
of \code{size}. In this case, groups cover the ranges
46-50, 51-55, 56-60, 61-65 etc. \strong{Note:} This will cover
a range from 46-50 as first group, even if values from 46 to 49
are not present. See 'Examples'.
\cr \cr
If you want to split a variable into a certain number of equal-sized
groups (instead of having groups where values all have the same
range), use the \code{\link{split_var}} function.
\cr \cr
\code{group_var()} also works on grouped data frames (see \code{\link[dplyr]{group_by}}).
In this case, grouping is applied to the subsets of variables
in \code{x}. See 'Examples'.
}
\note{
Variable label attributes (see, for instance,
\code{\link[sjlabelled]{set_label}}) are preserved. Usually you should use
the same values for \code{size} and \code{right.interval} in
\code{group_labels()} as used in \code{group_var()} if you want
matching labels for the related recoded variable.
}
\examples{
age <- abs(round(rnorm(100, 65, 20)))
age.grp <- group_var(age, size = 10)
hist(age)
hist(age.grp)
age.grpvar <- group_labels(age, size = 10)
table(age.grp)
print(age.grpvar)
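# a minimal sketch of auto-grouping, using a hypothetical variable "income":
# with size = "auto", the group range is derived from the range of the
# variable and from n, instead of being fixed via size
income <- round(runif(200, 0, 1000))
income.auto <- group_var(income, size = "auto", n = 5)
table(income.auto)
# as.num = FALSE returns the grouped variable as factor instead of numeric
income.fac <- group_var(income, size = 250, as.num = FALSE)
table(income.fac)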
# histogram with EUROFAMCARE sample dataset
# variable not grouped
library(sjlabelled)
data(efc)
hist(efc$e17age, main = get_label(efc$e17age))
# bar plot with EUROFAMCARE sample dataset
# grouped variable
ageGrp <- group_var(efc$e17age)
ageGrpLab <- group_labels(efc$e17age)
barplot(table(ageGrp), main = get_label(efc$e17age), names.arg = ageGrpLab)
# within a pipe-chain
library(dplyr)
efc \%>\%
select(e17age, c12hour, c160age) \%>\%
group_var(size = 20)
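# a minimal sketch of the scoped variant: only columns for which the
# predicate returns TRUE are grouped (here, the four numeric columns of the
# iris data; the factor Species is not selected). The choice of iris and of
# size = 2 is for illustration only.
iris.grp <- group_var_if(iris, is.numeric, size = 2, append = FALSE)
head(iris.grp)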
# create vector with values from 50 to 80
dummy <- round(runif(200, 50, 80))
# labels with grouping starting at lower bound
group_labels(dummy)
# labels with grouping starting at upper bound
group_labels(dummy, right.interval = TRUE)
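# illustration only, using cut() from base R rather than group_var() itself
# (an analogy for demonstration, not necessarily the internal method): the
# two groupings described in the Details section correspond to left-closed
# and right-closed intervals of width 5
vals <- 50:80
# left-closed intervals [50,55), [55,60), ...: groups 50-54, 55-59, ...
lower <- cut(vals, breaks = seq(50, 85, by = 5), right = FALSE)
# right-closed intervals (45,50], (50,55], ...: groups 46-50, 51-55, ...
upper <- cut(vals, breaks = seq(45, 80, by = 5), right = TRUE)
table(lower)
table(upper)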
# also works with grouped data frames
mtcars \%>\%
group_var(disp, size = 4, append = FALSE) \%>\%
table()
mtcars \%>\%
group_by(cyl) \%>\%
group_var(disp, size = 4, append = FALSE) \%>\%
table()
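# a rough contrast with split_var(), using a hypothetical skewed variable:
# group_var() creates groups of equal value range, so the number of cases
# per group may differ a lot, whereas split_var() aims at groups with a
# similar number of cases
skewed <- round(rexp(300, rate = 0.05))
table(group_var(skewed, size = 20))
table(split_var(skewed, n = 4))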
}
\seealso{
\code{\link{split_var}} to split variables into equal-sized groups,
\code{\link{group_str}} for grouping string vectors or
\code{\link{rec_pattern}} and \code{\link{rec}} for another convenient
way of recoding variables into smaller groups.
}