% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/winsorize.R
\name{winsorize}
\alias{winsorize}
\alias{winsorize.numeric}
\title{Winsorize data}
\usage{
winsorize(data, ...)
\method{winsorize}{numeric}(
data,
threshold = 0.2,
method = "percentile",
robust = FALSE,
verbose = TRUE,
...
)
}
\arguments{
\item{data}{A data frame or a vector.}
\item{...}{Currently not used.}
\item{threshold}{The amount of winsorization, which depends on the value of \code{method}:
\itemize{
\item For \code{method = "percentile"}: the amount to winsorize from \emph{each} tail.
The value of \code{threshold} must be between 0 and 0.5 and of length 1.
\item For \code{method = "zscore"}: the number of standard deviations (\emph{SD}) from the
\emph{mean} or, if \code{robust = TRUE}, of median absolute deviations (\emph{MAD}) from the
\emph{median}. The value of \code{threshold} must be greater than 0 and of length 1.
\item For \code{method = "raw"}: a vector of length 2 with the lower and upper bound
for winsorization.
}}
\item{method}{One of \code{"percentile"} (default), \code{"zscore"}, or \code{"raw"}.}
\item{robust}{Logical. If \code{TRUE}, winsorizing with the \code{"zscore"} method
uses the median and the median absolute deviation (MAD); if \code{FALSE}, it
uses the mean and the standard deviation.}
\item{verbose}{Not used anymore since \code{datawizard} 0.6.6.}
}
\value{
A data frame with winsorized columns or a winsorized vector.
}
\description{
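Winsorize data: values beyond a lower or upper bound are replaced by that
bound, which limits the influence of extreme values.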
}
\details{
Winsorizing or winsorization is the transformation of statistics by limiting
extreme values in the statistical data to reduce the effect of possibly
spurious outliers. The distribution of many statistics can be heavily
influenced by outliers. A typical strategy is to set all outliers (values
beyond a certain threshold) to a specified percentile of the data; for
example, a 90\% winsorization would see all data below the 5th percentile set
to the 5th percentile, and data above the 95th percentile set to the 95th
percentile. Winsorized estimators are usually more robust to outliers than
their more standard forms.
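
As a rough illustration of the 90\% example above, the following base R sketch
clamps a vector at its 5th and 95th percentiles. It is not the implementation
used by \code{winsorize()}, whose cut-offs may be computed differently, and the
helper name \code{clamp_to_quantiles()} is hypothetical.
\preformatted{
# Hypothetical helper, for illustration only: clamp x at the p-th and
# (1 - p)-th sample quantiles as computed by stats::quantile().
clamp_to_quantiles <- function(x, p = 0.05) {
  bounds <- stats::quantile(x, probs = c(p, 1 - p), na.rm = TRUE, names = FALSE)
  # Values below the lower bound are raised to it, values above the
  # upper bound are lowered to it; everything in between is unchanged.
  pmin(pmax(x, bounds[1]), bounds[2])
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
clamp_to_quantiles(x)
}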
}
\examples{
hist(iris$Sepal.Length, main = "Original data")
hist(winsorize(iris$Sepal.Length, threshold = 0.2),
xlim = c(4, 8), main = "Percentile Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore"),
xlim = c(4, 8), main = "Mean (+/- SD) Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore", robust = TRUE),
xlim = c(4, 8), main = "Median (+/- MAD) Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = c(5, 7.5), method = "raw"),
xlim = c(4, 8), main = "Raw Thresholds"
)
# Also works on a data frame:
winsorize(iris, threshold = 0.2)
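
# A base R sketch of how the "zscore" bounds can be read (an illustration
# only, not necessarily what winsorize() does internally): values further
# than `threshold` SDs from the mean are pulled back to the nearest bound.
# The names `lower`, `upper` and `clamped` are introduced here.
x <- iris$Sepal.Length
lower <- mean(x) - 1.5 * stats::sd(x)
upper <- mean(x) + 1.5 * stats::sd(x)
clamped <- pmin(pmax(x, lower), upper)
range(clamped)
# Compare with: range(winsorize(x, threshold = 1.5, method = "zscore"))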
}
\seealso{
\itemize{
\item Add a prefix or suffix to column names: \code{\link[=data_addprefix]{data_addprefix()}}, \code{\link[=data_addsuffix]{data_addsuffix()}}
\item Functions to reorder or remove columns: \code{\link[=data_reorder]{data_reorder()}}, \code{\link[=data_relocate]{data_relocate()}},
\code{\link[=data_remove]{data_remove()}}
\item Functions to reshape, pivot or rotate data frames: \code{\link[=data_to_long]{data_to_long()}},
\code{\link[=data_to_wide]{data_to_wide()}}, \code{\link[=data_rotate]{data_rotate()}}
\item Functions to recode data: \code{\link[=rescale]{rescale()}}, \code{\link[=reverse]{reverse()}}, \code{\link[=categorize]{categorize()}},
\code{\link[=recode_values]{recode_values()}}, \code{\link[=slide]{slide()}}
\item Functions to standardize, normalize, rank-transform: \code{\link[=center]{center()}}, \code{\link[=standardize]{standardize()}},
\code{\link[=normalize]{normalize()}}, \code{\link[=ranktransform]{ranktransform()}}, \code{\link[=winsorize]{winsorize()}}
\item Split and merge data frames: \code{\link[=data_partition]{data_partition()}}, \code{\link[=data_merge]{data_merge()}}
\item Functions to find or select columns: \code{\link[=data_select]{data_select()}}, \code{\link[=extract_column_names]{extract_column_names()}}
\item Functions to filter rows: \code{\link[=data_match]{data_match()}}, \code{\link[=data_filter]{data_filter()}}
}
}