File: group_var.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/group_var.R
\name{group_var}
\alias{group_var}
\alias{group_var_if}
\alias{group_labels}
\alias{group_labels_if}
\title{Recode numeric variables into equal-ranged groups}
\usage{
group_var(
  x,
  ...,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_var_if(
  x,
  predicate,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_labels(x, ..., size = 5, right.interval = FALSE, n = 30)

group_labels_if(x, predicate, size = 5, right.interval = FALSE, n = 30)
}
\arguments{
\item{x}{A vector or data frame.}

\item{...}{Optional, unquoted names of variables that should be selected for
further processing. Required if \code{x} is a data frame (not a vector)
and only selected variables from \code{x} should be processed.
You may also use functions like \code{:} or tidyselect's
select-helpers.
See 'Examples' or \href{../doc/design_philosophy.html}{package-vignette}.}

\item{size}{Numeric; group size, i.e. the value range covered by each group.
By default, a new group is defined for every 5 values of \code{x},
i.e. \code{size = 5}. Use \code{size = "auto"} to automatically resize a
variable into a maximum of 30 groups (which is the ggplot-default grouping
when plotting histograms). Use \code{n} to set the maximum number of groups.}

\item{as.num}{Logical, if \code{TRUE}, return value will be numeric, not a factor.}

\item{right.interval}{Logical; if \code{TRUE}, grouping starts with the lower
bound of \code{size}. See 'Details'.}

\item{n}{Sets the maximum number of groups that are defined when auto-grouping is on
(\code{size = "auto"}). Default is 30. If \code{size} is not set to \code{"auto"},
this argument will be ignored.}

\item{append}{Logical, if \code{TRUE} (the default) and \code{x} is a data frame,
\code{x} including the new variables as additional columns is returned;
if \code{FALSE}, only the new variables are returned.}

\item{suffix}{Indicates which suffix will be added to each dummy variable.
Use \code{"numeric"} to number dummy variables, e.g. \emph{x_1},
\emph{x_2}, \emph{x_3} etc. Use \code{"label"} to add value label,
e.g. \emph{x_low}, \emph{x_mid}, \emph{x_high}. May be abbreviated.}

\item{predicate}{A predicate function to be applied to the columns. The
variables for which \code{predicate} returns \code{TRUE} are selected.}
}
\value{
\itemize{
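    \item For \code{group_var()}, a grouped variable, either as numeric or as factor (see parameter \code{as.num}). If \code{x} is a data frame, a data frame is returned; whether it contains all of \code{x} plus the grouped variables or only the grouped variables depends on \code{append}.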
    \item For \code{group_labels()}, a string vector or a list of string vectors containing labels based on the grouped categories of \code{x}, formatted as "from lower bound to upper bound", e.g. \code{"10-19"  "20-29"  "30-39"} etc. See 'Examples'.
  }
}
\description{
Recode numeric variables into equal-ranged, grouped factors,
  i.e. a variable is cut into a smaller number of groups, where each group
  has the same value range. \code{group_labels()} creates the related value
  labels. \code{group_var_if()} and \code{group_labels_if()} are scoped
  variants of \code{group_var()} and \code{group_labels()}, where grouping
  will be applied only to those variables that match the logical condition
  of \code{predicate}.
}
\details{
If \code{size} is set to a specific value, the variable is recoded
  into several groups, where each group has a maximum range of \code{size}.
  Hence, the number of groups differs depending on the range of \code{x}.
  \cr \cr
  If \code{size = "auto"}, the variable is recoded into a maximum of
  \code{n} groups. Hence, the number of groups does not depend on the
  range of \code{x}; instead, the range within each group differs
  (depending on \code{x}'s range).
  \cr \cr
  \code{right.interval} determines which boundary values to include when
  grouping is done. If \code{TRUE}, grouping starts with the \strong{lower
  bound} of \code{size}. For example, having a variable ranging from
  50 to 80, groups cover the ranges from 50-54, 55-59, 60-64 etc.
  If \code{FALSE} (default), grouping starts with the \strong{upper bound}
  of \code{size}. In this case, groups cover the ranges from
  46-50, 51-55, 56-60, 61-65 etc. \strong{Note:} This will cover
  a range from 46-50 as first group, even if values from 46 to 49
  are not present. See 'Examples'.
  \cr \cr
  If you want to split a variable into a certain number of equal-sized
  groups (instead of groups that all cover the same value range),
  use the \code{\link{split_var}} function.
  \cr \cr
  \code{group_var()} also works on grouped data frames (see \code{\link[dplyr]{group_by}}).
  In this case, grouping is applied to the subsets of variables
  in \code{x}. See 'Examples'.
}
\note{
Variable label attributes (see, for instance,
  \code{\link[sjlabelled]{set_label}}) are preserved. If you want labels
  that match the recoded variable, use the same values for \code{size}
  and \code{right.interval} in \code{group_labels()} as in
  \code{group_var()}.
}
\examples{
age <- abs(round(rnorm(100, 65, 20)))
age.grp <- group_var(age, size = 10)
hist(age)
hist(age.grp)

age.grpvar <- group_labels(age, size = 10)
table(age.grp)
print(age.grpvar)

# histogram with EUROFAMCARE sample dataset
# variable not grouped
library(sjlabelled)
data(efc)
hist(efc$e17age, main = get_label(efc$e17age))

# bar plot with EUROFAMCARE sample dataset
# grouped variable
ageGrp <- group_var(efc$e17age)
ageGrpLab <- group_labels(efc$e17age)
barplot(table(ageGrp), main = get_label(efc$e17age), names.arg = ageGrpLab)
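
# auto-grouping, added as a small illustrative sketch: it assumes that
# size = "auto" together with n caps the number of groups, as described
# in 'Details'. The object name age_auto is only illustrative, and no
# particular output is claimed here.
age_auto <- group_var(efc$e17age, size = "auto", n = 5)
table(age_auto)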

# within a pipe-chain
library(dplyr)
efc \%>\%
  select(e17age, c12hour, c160age) \%>\%
  group_var(size = 20)

# create vector with values from 50 to 80
dummy <- round(runif(200, 50, 80))
# labels with grouping starting at lower bound
group_labels(dummy)
# labels with grouping starting at upper bound
group_labels(dummy, right.interval = TRUE)
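
# A rough base-R sketch of equal-range grouping, added for illustration.
# It is not necessarily how group_var() works internally: the helper name
# sketch_group() and the choice of breaks at multiples of size are
# assumptions. cut()'s 'right' argument is used only to show the two
# boundary conventions (left-closed vs. right-closed intervals).
sketch_group <- function(x, size = 5, right = FALSE) {
  # pad the breaks by one interval on each side so that no value is lost,
  # whichever side of the interval is closed
  lower <- floor(min(x) / size) * size - size
  upper <- ceiling(max(x) / size) * size + size
  # drop the empty padding groups afterwards
  droplevels(cut(x, breaks = seq(lower, upper, by = size), right = right))
}
v <- c(50, 54, 55, 61, 80)
# left-closed: 50 and 54 share [50,55), while 55 opens [55,60)
table(sketch_group(v))
# right-closed: 50 falls into (45,50], while 54 and 55 share (50,55]
table(sketch_group(v, right = TRUE))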

# also works with grouped data frames
mtcars \%>\%
  group_var(disp, size = 4, append = FALSE) \%>\%
  table()

mtcars \%>\%
  group_by(cyl) \%>\%
  group_var(disp, size = 4, append = FALSE) \%>\%
  table()
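
# scoped variant, added as a hedged sketch: is.numeric is assumed to be a
# suitable predicate here, so that only columns for which it returns TRUE
# are grouped. The object name mtcars_gr and size = 50 are illustrative.
mtcars_gr <- group_var_if(mtcars, is.numeric, size = 50, append = FALSE)
head(mtcars_gr)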
}
\seealso{
\code{\link{split_var}} to split variables into equal-sized groups,
  \code{\link{group_str}} for grouping string vectors or
  \code{\link{rec_pattern}} and \code{\link{rec}} for another convenient
  way of recoding variables into smaller groups.
}