% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cutFancy.R
\name{cutFancy}
\alias{cutFancy}
\title{Create an ordinal variable by grouping numeric data input.}
\usage{
cutFancy(y, cutpoints = "quantile", probs, categories)
}
\arguments{
\item{y}{The input data from which the categorized variable will
be created.}

\item{cutpoints}{Optional parameter, a vector of thresholds at
which to cut the data. If it is not supplied, the default
value \code{cutpoints="quantile"} takes effect. Users can
supplement it with \code{probs} and/or \code{categories}, as shown
in the examples.}

\item{probs}{This is an optional parameter, relevant only when the
R function \code{\link{quantile}} is used to
calculate cutpoints. The length should be the number of desired
categories PLUS ONE, as in \code{c(0, .3, .6, 1)}. That will
create three categories: 1) below the .3 quantile, 2) between the
.3 and .6 quantiles, and 3) above the .6 quantile.  A common user
error is to specify only the internal divider values, such as
\code{probs = c(.3, .6)}. To anticipate and correct that error,
this function will insert the lower limit of 0 and the upper
limit of 1 if they are not already present in \code{probs}.}

\item{categories}{Can be a single number that designates the number
of sub-groups to create, or a vector of names to be used for the
categories that are created. If \code{cutpoints} and \code{probs}
are not specified, \code{categories} should be an integer that
specifies how many data groups to create. It is required if
\code{cutpoints="quantile"} and \code{probs} is not specified.
If category names are not provided, the names for the
ordinal variable will be the midpoints of the numeric ranges
from which they are constructed.}
}
\value{
an ordinal vector with attributes "cutpoints" and "props"
    (proportions)
}
\description{
This is a convenience function for using R's \code{cut}
function. Users can specify cutpoints, category labels, or
desired proportions of groups in various ways. In that way, it has
a more flexible interface than \code{cut}. It also tries to notice
and correct some common user errors, such as omitting the outer
boundaries from the probs argument. The returned values are
labeled by their midpoints, rather than cut's usual boundaries.
}
\details{
The dividing points, thought of as "thresholds" or "cutpoints",
can be specified in several ways.  \code{cutFancy} will
automatically create equally-sized sets of observations for a
given number of categories if neither \code{probs} nor
\code{cutpoints} is specified. The bare minimum input needed is
\code{categories=5}, for example, to ask for 5 equally sized
groups. More user control can be had by specifying either
\code{cutpoints} or \code{probs}. If \code{cutpoints} is not
specified at all, or if \code{cutpoints="quantile"}, then
\code{probs} is used to specify the proportions of the data
points that are to fall within each range. Alternatively,
\code{cutpoints} can be a numeric vector of thresholds, as in
\code{cutpoints = c(30, 40, 50)}.

If \code{categories} is not specified, the category names will be
created automatically. Names for ordinal categories will be the
numerical midpoints for the outcomes.  Perhaps this will deviate
from your expectation, which might be ordinal categories named
"0", "1", "2", and so forth.  The numerically labeled values we
provide can be
used in various ways during the analysis process. Read "?factor"
to learn ways to convert the ordinal output to other
formats. Examples include various ways of converting the ordinal
output to numeric.

The \code{categories} parameter works together with
\code{cutpoints}.  \code{cutpoints} accepts the character string
"quantile". If \code{cutpoints} is not specified, or if the user
specifies \code{cutpoints="quantile"}, then \code{probs} is used
to determine the cutpoints.  However, if \code{probs} is not
specified, then the \code{categories} argument can be used
instead. In that case:

\itemize{
\item if \code{categories} is one integer, then it is interpreted
as the number of "equally sized" categories to be created, or

\item \code{categories} can be a vector of names. The length
of the vector is used to determine the number of categories, and
the values are put to use as factor labels.
}
}
\examples{
set.seed(234234)
y <- rnorm(1000, mean = 35, sd = 14)
## Explicit numeric thresholds
yord <- cutFancy(y, cutpoints = c(30, 40, 50))
table(yord)
attr(yord, "props")
attr(yord, "cutpoints")
## Four equally sized groups, cut at quantiles
yord <- cutFancy(y, categories = 4L)
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
## Unequal proportions with user-supplied category names
yord <- cutFancy(y, probs = c(0, .1, .3, .7, .9, 1.0),
                  categories = c("A", "B", "C", "D", "E"))
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
## Same proportions, default midpoint labels
yord <- cutFancy(y, probs = c(0, .1, .3, .7, .9, 1.0))
table(yord, exclude = NULL)
attr(yord, "props")
attr(yord, "cutpoints")
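## Convert the ordinal output to integer category codes
yasinteger <- as.integer(yord)
table(yasinteger, yord)
## Convert the ordinal output to the numeric midpoint labels
yasnumeric <- as.numeric(levels(yord))[yord]
table(yasnumeric, yord)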
barplot(attr(yord, "props"))
hist(yasnumeric)
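## A hedged base-R sketch of the midpoint labeling described in Details.
## This illustrates the general idea only; it is an assumption, not the
## package's actual source. The names ydemo, cuts_demo, mids_demo, and
## f_demo are hypothetical and exist only for this illustration.
ydemo <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
cuts_demo <- c(0, 4, 8, 12)
## Midpoint of each adjacent pair of cutpoints
mids_demo <- (head(cuts_demo, -1) + tail(cuts_demo, -1)) / 2
## Use the midpoints as labels for an ordered factor
f_demo <- cut(ydemo, breaks = cuts_demo, labels = mids_demo,
              ordered_result = TRUE)
table(f_demo)
## Recover the numeric midpoints from the factor labels
ydemo_mid <- as.numeric(levels(f_demo))[f_demo]
ydemo_mid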
X1a <-
   genCorrelatedData3("y ~ 1.1 + 2.1 * x1 + 3 * x2 + 3.5 * x3 + 1.1 * x1:x3",
                       N = 10000, means = c(x1 = 1, x2 = -1, x3 = 3),
                       sds = 1, rho = 0.4)
## Create cutpoints from quantiles. Only the internal dividers are given;
## cutFancy inserts the outer limits 0 and 1 into probs.
probs <- c(.3, .6)
X1a$yord <- cutFancy(X1a$y, probs = probs)
attributes(X1a$yord)
table(X1a$yord, exclude = NULL)
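## A minimal base-R sketch of the probs handling described in Arguments:
## pad the internal dividers with 0 and 1, convert them to quantile
## cutpoints, then cut. This is an assumption about the general approach,
## not cutFancy's actual source. The names ysketch, probs_in, probs_full,
## cuts_sketch, and ybase are hypothetical.
set.seed(234234)
ysketch <- rnorm(1000, mean = 35, sd = 14)
probs_in <- c(.3, .6)
## Insert the outer limits if they are missing
probs_full <- sort(unique(c(0, probs_in, 1)))
## Quantile-based cutpoints, one more than the number of categories
cuts_sketch <- quantile(ysketch, probs = probs_full)
## include.lowest = TRUE keeps the minimum value in the first group
ybase <- cut(ysketch, breaks = cuts_sketch, include.lowest = TRUE,
             ordered_result = TRUE)
table(ybase, exclude = NULL)
prop.table(table(ybase))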
}