% File: mice.impute.midastouch.Rd (r-cran-mice 3.17.0-1)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mice.impute.midastouch.R
\name{mice.impute.midastouch}
\alias{mice.impute.midastouch}
\title{Imputation by predictive mean matching with distance-aided donor selection}
\usage{
mice.impute.midastouch(
  y,
  ry,
  x,
  wy = NULL,
  ridge = 1e-05,
  midas.kappa = NULL,
  outout = TRUE,
  neff = NULL,
  debug = NULL,
  ...
)
}
\arguments{
\item{y}{Vector to be imputed}

\item{ry}{Logical vector of length \code{length(y)} indicating the
subset \code{y[ry]} of elements in \code{y} to which the imputation
model is fitted. \code{ry} generally distinguishes the observed
(\code{TRUE}) and missing (\code{FALSE}) values in \code{y}.}

\item{x}{Numeric design matrix with \code{length(y)} rows with predictors for
\code{y}. Matrix \code{x} must not have missing values.}

\item{wy}{Logical vector of length \code{length(y)}. A \code{TRUE} value
indicates locations in \code{y} for which imputations are created.}

\item{ridge}{The ridge penalty used in \code{.norm.draw()} to prevent
problems with multicollinearity. The default is \code{ridge = 1e-05},
which means that 0.001 percent of the diagonal is added to the cross-product.
Larger ridges may result in more biased estimates. For highly noisy data
(e.g., many junk variables), set \code{ridge = 1e-06} or even lower to
reduce bias. For highly collinear data, set \code{ridge = 1e-04} or higher.}

\item{midas.kappa}{Scalar. If \code{NULL} (default), the
optimal \code{kappa} is selected automatically. Alternatively, the user
may specify a scalar. Siddique and Belin (2008) find \code{midas.kappa = 3}
to be sensible.}

\item{outout}{Logical. If \code{TRUE} (default), one model is estimated
for each donor (leave-one-out principle). For a speedup, choose
\code{outout = FALSE}, which estimates one model for all observations,
leading to in-sample predictions for the donors and out-of-sample
predictions for the recipients. Note that this treats donors and
recipients inconsistently and is therefore less appropriate.}

\item{neff}{FOR EXPERTS. \code{NULL} or a character string giving the name of an
existing environment into which the effective sample size of the donors is
written for each loop (chained-equation iterations times multiple imputations).
The effective sample size is needed to compute the correction for the
total variance originally suggested by Parzen, Lipsitz and
Fitzmaurice (2005). The object name is \code{midastouch.neff}.}

\item{debug}{FOR EXPERTS. \code{NULL} or a character string giving the name of an
existing environment into which the input is written. The object name
is \code{midastouch.inputlist}.}

\item{...}{Other named arguments.}
}
\value{
Vector with imputed data, same type as \code{y}, and of
length \code{sum(wy)}
}
\description{
Imputes univariate missing data using predictive mean matching with
distance-aided donor selection.
}
\details{
Imputation of \code{y} by predictive mean matching, based on
Rubin (1987, p. 168, formulas a and b) and Siddique and Belin (2008).
The procedure is as follows:
\enumerate{
\item Draw a bootstrap sample from the donor pool.
\item Estimate a beta matrix on the bootstrap sample by the leave-one-out principle.
\item Compute type II predicted values for \code{yobs} (nobs x 1) and \code{ymis} (nmis x nobs).
\item Calculate the distances between all \code{yobs} and the corresponding \code{ymis}.
\item Convert the distances into drawing probabilities.
\item For each recipient, draw a donor from the entire pool, taking these drawing probabilities into account.
\item Take its observed value in \code{y} as the imputation.
}
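
The steps above can be illustrated with the following simplified sketch in
plain R. It is written for illustration only and is not the implementation
used by \code{mice.impute.midastouch()}. The helper name \code{midas_sketch}
is hypothetical. The sketch assumes that an intercept is added to \code{x},
represents the bootstrap sample by frequency weights, adds \code{ridge} times
the diagonal to the weighted cross-product, uses a fixed \code{kappa = 3}
instead of the automatic selection, and draws donors with probabilities
proportional to the distance raised to the power \code{-kappa}, following the
distance-based donor selection of Siddique and Belin (2008).
\preformatted{
## Simplified sketch (hypothetical helper, not the mice implementation)
midas_sketch <- function(y, ry, x, kappa = 3, ridge = 1e-05) {
  X <- cbind(1, as.matrix(x))        # assumption: add an intercept column
  Xobs <- X[ry, , drop = FALSE]
  Xmis <- X[!ry, , drop = FALSE]
  yobs <- y[ry]
  nobs <- sum(ry)
  # 1. bootstrap sample of the donors, stored as frequency weights
  w <- tabulate(sample.int(nobs, nobs, replace = TRUE), nbins = nobs)
  # 2. leave-one-out: column j of beta is fitted without donor j
  beta <- sapply(seq_len(nobs), function(j) {
    wj <- w
    wj[j] <- 0
    xtx <- crossprod(Xobs, wj * Xobs)                      # X'WX
    xtx <- xtx + diag(ridge * diag(xtx), nrow = ncol(X))   # ridge penalty
    solve(xtx, crossprod(Xobs, wj * yobs))                 # (X'WX)^-1 X'Wy
  })
  # 3. type II predictions: each donor with its own leave-one-out fit,
  #    each recipient with every donor's fit (nmis x nobs)
  yhat_obs <- colSums(t(Xobs) * beta)
  yhat_mis <- crossprod(t(Xmis), beta)
  # 4. distances between recipient and donor predictions
  d <- abs(sweep(yhat_mis, 2, yhat_obs))
  # 5. drawing probabilities proportional to d^(-kappa); the floor
  #    avoids division by zero for identical predictions
  prob <- pmax(d, 1e-10)^(-kappa)
  # 6. one donor per recipient, drawn from the entire pool
  donor <- apply(prob, 1, function(p) sample.int(nobs, 1, prob = p))
  # 7. the donor's observed value is the imputation
  yobs[donor]
}

library(mice)
set.seed(1)
bmi.sketch <- midas_sketch(nhanes$bmi, !is.na(nhanes$bmi), nhanes$age)
}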
}
\examples{
# do default multiple imputation on a numeric data frame
imp <- mice(nhanes, method = "midastouch")
imp

# list the actual imputations for BMI
imp$imp$bmi
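
# the method can also be called directly. This sketch assumes that the
# default wy = NULL means "impute all entries with ry == FALSE", as in
# other mice methods. The result then has one value per missing bmi, and
# each value is an observed bmi taken from a donor.
y <- nhanes$bmi
ry <- !is.na(y)
bmi.imp <- mice.impute.midastouch(y, ry,
  x = as.matrix(nhanes[, "age", drop = FALSE]))
length(bmi.imp) == sum(!ry)
all(bmi.imp \%in\% y[ry])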

# first completed data matrix
complete(imp)

# imputation on mixed data with a different method per column
mice(nhanes2, method = c("sample", "midastouch", "logreg", "norm"))
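
# arguments of mice.impute.midastouch() can be passed through the ...
# argument of mice(); here the faster single-model fit (outout = FALSE)
# and a larger ridge penalty for collinear data are requested
imp2 <- mice(nhanes, method = "midastouch", outout = FALSE, ridge = 1e-04,
  printFlag = FALSE, seed = 1)
imp2$imp$bmi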
}
\references{
Gaffert, P., Meinfelder, F., Bosch, V. (2018), Towards an MI-proper
Predictive Mean Matching, JSM 2018. Discussion Paper.

Little, R.J.A. (1988), Missing-data adjustments in large
surveys (with discussion), \emph{Journal of Business and Economic
Statistics}, \bold{6}, 287--301.

Parzen, M., Lipsitz, S. R., Fitzmaurice, G. M. (2005), A note on reducing
the bias of the approximate Bayesian bootstrap imputation variance estimator.
\emph{Biometrika}, \bold{92}, 4, 971--974.

Rubin, D.B. (1987), Multiple imputation for nonresponse in surveys. New York: Wiley.

Siddique, J., Belin, T.R. (2008), Multiple imputation using an iterative
hot-deck with distance-based donor selection. \emph{Statistics in Medicine},
\bold{27}, 1, 83--102.

Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006),
Fully conditional specification in multivariate imputation.
\emph{Journal of Statistical Computation and Simulation}, \bold{76}, 12,
1049--1064.

Van Buuren, S., Groothuis-Oudshoorn, K. (2011), \code{mice}: Multivariate
Imputation by Chained Equations in \code{R}. \emph{Journal of
Statistical Software}, \bold{45}, 3, 1--67. \doi{10.18637/jss.v045.i03}
}
\seealso{
Other univariate imputation functions: 
\code{\link{mice.impute.cart}()},
\code{\link{mice.impute.lasso.logreg}()},
\code{\link{mice.impute.lasso.norm}()},
\code{\link{mice.impute.lasso.select.logreg}()},
\code{\link{mice.impute.lasso.select.norm}()},
\code{\link{mice.impute.lda}()},
\code{\link{mice.impute.logreg}()},
\code{\link{mice.impute.logreg.boot}()},
\code{\link{mice.impute.mean}()},
\code{\link{mice.impute.mnar.logreg}()},
\code{\link{mice.impute.mpmm}()},
\code{\link{mice.impute.norm}()},
\code{\link{mice.impute.norm.boot}()},
\code{\link{mice.impute.norm.nob}()},
\code{\link{mice.impute.norm.predict}()},
\code{\link{mice.impute.pmm}()},
\code{\link{mice.impute.polr}()},
\code{\link{mice.impute.polyreg}()},
\code{\link{mice.impute.quadratic}()},
\code{\link{mice.impute.rf}()},
\code{\link{mice.impute.ri}()}
}
\author{
Philipp Gaffert, Florian Meinfelder, Volker Bosch 2015
}
\concept{univariate imputation functions}
\keyword{datagen}