% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/combine.levels.r
\name{combine.levels}
\alias{combine.levels}
\title{combine.levels}
\usage{
combine.levels(
x,
minlev = 0.05,
m,
ord = is.ordered(x),
plevels = FALSE,
sep = ","
)
}
\arguments{
\item{x}{a factor, `ordered` factor, or numeric or character variable that will be turned into a `factor`}
\item{minlev}{the minimum proportion of observations a cell must contain; cells below this proportion are combined with one or more other cells. If more than one cell has fewer than `minlev*n` observations, all such cells are combined into a new cell labeled `"OTHER"`. Otherwise, the lowest frequency cell is combined with the next lowest frequency cell, and the new level name is the combination of the two old level names. When `ord=TRUE`, combinations happen only for consecutive levels.}
\item{m}{alternative to `minlev`: the minimum number of observations in a cell before it will be combined with others}
\item{ord}{set to `TRUE` to treat `x` as if it were an ordered factor, which allows only consecutive levels to be combined}
\item{plevels}{by default `combine.levels` pools low-frequency levels into a category named `OTHER` when `x` is not ordered and `ord=FALSE`. To instead name this category the concatenation of all the pooled level names, separated by `sep` (a comma by default), set `plevels=TRUE`.}
\item{sep}{the separator for concatenating levels when `plevels=TRUE`}
}
\value{
a factor variable, or if `ord=TRUE` an ordered factor variable
}
\description{
Combine Infrequent Levels of a Categorical Variable
}
\details{
After turning `x` into a `factor` if it is not one already, combines
levels of `x` whose frequency falls below a specified relative frequency `minlev` or absolute count `m`. When `x` is not treated as ordered, all of the
low-frequency levels are combined into `"OTHER"`, unless `plevels=TRUE`.
When `ord=TRUE` or `x` is an ordered factor, only consecutive levels
are combined. New levels are constructed by concatenating the levels with
`sep` as a separator. This is useful when comparing ordinal regression
with polytomous (multinomial) regression and there are too many
categories for polytomous regression. `combine.levels` is also useful
when assumptions of ordinal models are being checked empirically by
computing exceedance probabilities for various cutoffs of the
dependent variable.
}
\examples{
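# Ten observations: A, D, and E each occur once, B three times, C four times
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
# Unordered: levels with fewer than m=3 observations are pooled
combine.levels(x, m=3)
# Same pooling, but the new level is named from the pooled level names
combine.levels(x, m=3, plevels=TRUE)
# Ordered: only consecutive levels are combined
combine.levels(x, ord=TRUE, m=3)
# Add a sixth level, F, with a single observation
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1),
       rep('F', 1))
combine.levels(x, ord=TRUE, m=3)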
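# A further sketch (not among the original examples): the same pooling
# requested through the proportion threshold `minlev` instead of `m`, and a
# custom separator. It assumes that, with 10 observations, minlev=0.25
# corresponds to a cutoff of 2.5 observations, so A, D, and E (one
# observation each) fall below it; the separator "+" is an arbitrary choice.
y <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D', 1), rep('E', 1))
combine.levels(y, minlev=0.25)
combine.levels(y, m=3, plevels=TRUE, sep="+")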
}
\author{
Frank Harrell
}