File: weightedMedian.Rd

package info (click to toggle)
r-cran-matrixstats 1.5.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 2,104 kB
  • sloc: ansic: 7,300; sh: 11; makefile: 2
file content (150 lines) | stat: -rw-r--r-- 5,484 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/weightedMedian.R
\name{weightedMedian}
\alias{weightedMedian}
\title{Weighted Median Value}
\usage{
weightedMedian(x, w = NULL, idxs = NULL, na.rm = FALSE,
  interpolate = is.null(ties), ties = NULL, ...)
}
\arguments{
\item{x}{\code{\link[base]{vector}} of type \code{\link[base]{integer}},
\code{\link[base]{numeric}}, or \code{\link[base]{logical}}.}

\item{w}{a vector of weights the same length as \code{x} giving the weights
to use for each element of \code{x}. Negative weights are treated as zero
weights. Default value is equal weight to all values.}

\item{idxs}{A \code{\link[base]{vector}} indicating subset of elements to
operate over. If \code{\link[base]{NULL}}, no subsetting is done.}

\item{na.rm}{a logical value indicating whether \code{\link[base]{NA}}
values in \code{x} should be stripped before the computation proceeds, or
not.  If \code{\link[base]{NA}}, no check at all for \code{\link[base]{NA}}s
is done.}

\item{interpolate}{If \code{\link[base:logical]{TRUE}}, linear interpolation
is used to get a consistent estimate of the weighted median.}

\item{ties}{If \code{interpolate == FALSE}, a character string specifying
how to solve ties between two \code{x}'s that are satisfying the weighted
median criteria.  Note that at most two values can satisfy the criteria.
When \code{ties} is \code{"min"} ("lower weighted median"), the smaller
value of the two is returned and when it is \code{"max"} ("upper weighted
median"), the larger value is returned.  If \code{ties}
is \code{"mean"}, the mean of the two values is returned.  Finally, if
\code{ties} is \code{"weighted"} (or \code{\link[base]{NULL}}) a weighted
average of the two are returned, where the weights are weights of all values
\code{x[i] <= x[k]} and \code{x[i] >= x[k]}, respectively.}

\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{numeric}} scalar.

For the \code{n} elements \code{x = c(x[1], x[2], ..., x[n])} with positive
weights \code{w = c(w[1], w[2], ..., w[n])} such that \code{sum(w) = S}, the
\emph{weighted median} is defined as the element \code{x[k]} for which the
total weight of all elements \code{x[i] < x[k]} is less or equal to
\code{S/2} and for which the total weight of all elements \code{x[i] > x[k]}
is less or equal to \code{S/2} (c.f. [1]).

When using linear interpolation, the weighted mean of \code{x[k-1]} and
\code{x[k]} with weights \code{S[k-1]} and \code{S[k]} corresponding to the
cumulative weights of those two elements is used as an estimate.

If \code{w} is missing then all elements of \code{x} are given the same
positive weight. If all weights are zero, \code{\link[base:NA]{NA_real_}} is
returned.

If one or more weights are \code{Inf}, it is the same as these weights have
the same weight and the others have zero. This makes things easier for cases
where the weights are result of a division with zero.

If there are missing values in \code{w} that are part of the calculation
(after subsetting and dropping missing values in \code{x}), then the final
result is always \code{NA} of the same type as \code{x}.

The weighted median solves the following optimization problem:

\deqn{\alpha^* = \arg_\alpha \min \sum_{i = 1}^{n} w_i |x_i-\alpha|} where
\eqn{x = (x_1, x_2, \ldots, x_n)} are scalars and
\eqn{w = (w_1, w_2, \ldots, w_n)} are the corresponding "weights" for each
individual \eqn{x} value.
}
\description{
Computes a weighted median of a numeric vector.
}
\examples{
x <- 1:10
n <- length(x)

m1 <- median(x)                             # 5.5
m2 <- weightedMedian(x)                     # 5.5
stopifnot(identical(m1, m2))

w <- rep(1, times = n)
m1 <- weightedMedian(x, w)                  # 5.5 (default)
m2 <- weightedMedian(x, ties = "weighted")  # 5.5 (default)
m3 <- weightedMedian(x, ties = "min")       # 5
m4 <- weightedMedian(x, ties = "max")       # 6
stopifnot(identical(m1, m2))

# Pull the median towards zero
w[1] <- 5
m1 <- weightedMedian(x, w)                  # 3.5
y <- c(rep(0, times = w[1]), x[-1])         # Only possible for integer weights
m2 <- median(y)                             # 3.5
stopifnot(identical(m1, m2))

# Put even more weight on the zero
w[1] <- 8.5
weightedMedian(x, w)                # 2

# All weight on the first value
w[1] <- Inf
weightedMedian(x, w)                # 1

# All weight on the last value
w[1] <- 1
w[n] <- Inf
weightedMedian(x, w)                # 10

# All weights set to zero
w <- rep(0, times = n)
weightedMedian(x, w)                # NA

# Simple benchmarking
bench <- function(N = 1e5, K = 10) {
  x <- rnorm(N)
  gc()
  t <- c()
  t[1] <- system.time(for (k in 1:K) median(x))[3]
  t[2] <- system.time(for (k in 1:K) weightedMedian(x))[3]
  t <- t / t[1]
  names(t) <- c("median", "weightedMedian")
  t
}

print(bench(N =     5, K = 100))
print(bench(N =    50, K = 100))
print(bench(N =   200, K = 100))
print(bench(N =  1000, K = 100))
print(bench(N =  10e3, K =  20))
print(bench(N = 100e3, K =  20))
}
\references{
[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to
Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.
}
\seealso{
\code{\link[stats]{median}}, \code{\link[base]{mean}}() and
\code{\link{weightedMean}}().
}
\author{
Henrik Bengtsson and Ola Hossjer, Centre for Mathematical Sciences,
Lund University.  Thanks to Roger Koenker, Econometrics, University of
Illinois, for the initial ideas.
}
\keyword{robust}
\keyword{univar}