% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/combine.levels.r
\name{combine.levels}
\alias{combine.levels}
\title{combine.levels}
\usage{
combine.levels(
  x,
  minlev = 0.05,
  m,
  ord = is.ordered(x),
  plevels = FALSE,
  sep = ","
)
}
\arguments{
\item{x}{a factor, `ordered` factor, or numeric or character variable that will be turned into a `factor`}

\item{minlev}{the minimum proportion of observations a cell must contain to avoid being combined with one or more other cells.  If more than one cell has fewer than `minlev*n` observations, all such cells are combined into a new cell labeled `"OTHER"`.  Otherwise, the lowest frequency cell is combined with the next lowest frequency cell, and the new level name is the combination of the two old level names.  When `ord=TRUE`, combinations happen only for consecutive levels.}

\item{m}{an alternative to `minlev`: the minimum number of observations in a cell before it will be combined with others}

\item{ord}{set to `TRUE` to treat `x` as if it were an ordered factor, which allows only consecutive levels to be combined}

\item{plevels}{by default `combine.levels` pools low-frequency levels into a category named `OTHER` when `x` is not ordered and `ord=FALSE`.  To instead name this category the concatenation of all the pooled level names, separated by `sep` (a comma by default), set `plevels=TRUE`.}

\item{sep}{the separator for concatenating levels when `plevels=TRUE`}
}
\value{
a factor variable or, if `ord=TRUE`, an ordered factor variable
}
\description{
Combine Infrequent Levels of a Categorical Variable
}
\details{
After turning `x` into a `factor` if it is not one already, combines
levels of `x` whose frequency falls below a specified relative frequency
`minlev` or absolute count `m`.  When `x` is not treated as ordered, all of
the low-frequency levels are combined into `"OTHER"`, unless `plevels=TRUE`.
When `ord=TRUE` or `x` is an ordered factor, only consecutive levels
are combined.  New levels are constructed by concatenating the levels with
`sep` as a separator.  This is useful when comparing ordinal regression
with polytomous (multinomial) regression and there are too many
categories for polytomous regression.  `combine.levels` is also useful
when assumptions of ordinal models are being checked empirically by
computing exceedance probabilities for various cutoffs of the
dependent variable.
}
\examples{
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D',1), rep('E',1))
combine.levels(x, m=3)
combine.levels(x, m=3, plevels=TRUE)
combine.levels(x, ord=TRUE, m=3)
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D',1), rep('E',1),
       rep('F',1))
combine.levels(x, ord=TRUE, m=3)
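# A further sketch, not part of the original examples: select levels by
# relative frequency (minlev) rather than by absolute count (m), and use a
# non-default separator when pooled level names are concatenated.
# The variable names y and z are illustrative only.
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D',1), rep('E',1))
table(x) / length(x)   # observed proportions, to compare against minlev
y <- combine.levels(x, minlev=0.2)
levels(y)
z <- combine.levels(x, minlev=0.2, plevels=TRUE, sep='+')
levels(z)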
}
\author{
Frank Harrell
}