\name{02missing_data.frame}
\Rdversion{1.1}
\docType{class}
\alias{02missing_data.frame}
\alias{missing_data.frame-class}
\alias{missing_data.frame}
\title{Class "missing_data.frame"}
\description{
This class is similar to a \code{\link{data.frame}} but is customized for the situation in 
which variables with missing data are being modeled for multiple imputation. This class primarily 
consists of a list of \code{\link{missing_variable}}s plus slots containing metadata indicating how the
\code{\link{missing_variable}}s relate to each other. Most operations that work for a
\code{\link{data.frame}} also work for a missing_data.frame.
}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("missing_data.frame", ...)}.
However, useRs almost always will pass a \code{\link{data.frame}} to the 
missing_data.frame constructor function to produce an object of missing_data.frame class.
}
\usage{
missing_data.frame(y, ...)
## Hidden arguments not included in the signature
## favor_ordered = TRUE, favor_positive = FALSE, 
## subclass = NA_character_,
## include_missingness = TRUE, skip_correlation_check = FALSE
}
\arguments{
  \item{y}{Usually a \code{\link{data.frame}}, possibly a numeric matrix, 
    possibly a list of \code{\link{missing_variable}}s.}
  \item{\dots}{Hidden arguments. The \code{favor_ordered} and \code{favor_positive}
    arguments are passed to the \code{\link{missing_variable}} function and are 
    documented under the \code{type} argument. Briefly, they affect the heuristics
    that are used to guess what class a variable should be coerced to. The 
    \code{subclass} argument defaults to \code{\link{NA}} and can be used to specify
    that the resulting object should inherit from the missing_data.frame class
    rather than be an object of \code{missing_data.frame} class.

    Any further arguments are passed to the \code{\link{initialize-methods}} for
    a missing_data.frame. They currently are \code{include_missingness}, which 
    defaults to \code{TRUE} and indicates that the missingness pattern of the other
    variables should be included when modeling a particular \code{\link{missing_variable}}, 
    and \code{skip_correlation_check}, which defaults to \code{FALSE} and indicates whether
    to skip the default check for whether the observed values of each pair of \code{\link{missing_variable}}s 
    have a perfect absolute Spearman \code{\link{cor}}relation.
}
}
\section{Slots}{
  This section is primarily aimed at developeRs. A missing_data.frame inherits from
  \code{\link{data.frame}} but has the following additional slots:
  \describe{
    \item{\code{variables}:}{Object of class \code{"list"} and each list element
      is an object that inherits from the \code{\link{missing_variable-class}} }
    \item{\code{no_missing}:}{Object of class \code{"logical"}, which is a vector
      whose length is the same as the length of the \bold{variables} slot indicating 
      whether the corresponding \code{\link{missing_variable}} is fully observed }
    \item{\code{patterns}:}{Object of class \code{\link{factor}} whose length is equal
      to the number of observations and whose elements indicate the missingness pattern
      for that observation}
    \item{\code{DIM}:}{Object of class \code{"integer"} of length two indicating
      first the number of observations and second the length of the \bold{variables}
      slot }
    \item{\code{DIMNAMES}:}{Object of class \code{"list"} of length two providing
      the appropriate number of row names and column names }
    \item{\code{postprocess}:}{Object of class \code{"function"} used to create
      additional variables from existing variables, such as interactions between
      two \code{\link{missing_variable}}s once their missing values have been
      imputed. This slot does not work at the moment.}
    \item{\code{index}:}{Object of class \code{"list"} whose length is equal to 
      the number of \code{\link{missing_variable}}s with some missing values. Each
      list element is an integer vector indicating which columns of the \bold{X}
      slot must be dropped when modeling the corresponding \code{\link{missing_variable}} }
    \item{\code{X}:}{Object of \code{\link{MatrixTypeThing-class}} whose number of rows equals the
      number of observations and which is loosely related to a \code{\link{model.matrix}}. Rather 
      than repeatedly parsing a \code{\link{formula}} during the multiple imputation process,
      this \bold{X} matrix is created once and some of its columns are dropped when
      modeling a \code{\link{missing_variable}} utilizing the \bold{index} slot.
      The columns of the \bold{X} matrix consist of numeric representations of the 
      \code{\link{missing_variable}}s plus (by default) the unique missingness patterns }
    \item{\code{weights}:}{Object of class \code{"list"} whose length is equal to one
       or the number of \code{\link{missing_variable}}s with some missing values. Each 
       list element is passed to the corresponding argument of \code{\link[arm]{bayesglm}} 
       and similar functions. In particular, some observations can be given a weight
       of zero, which should drop them when modeling some \code{\link{missing_variable}}s}
    \item{\code{priors}:}{Object of class \code{"list"} whose length is equal to the number
       of \code{\link{missing_variable}}s and whose elements give appropriate values for
       the priors used by the model fitting function wrapped by the \code{\link{fit_model-methods}}; 
       see, e.g., \code{\link[arm]{bayesglm}}}
    \item{\code{correlations}:}{Object of class \code{"matrix"} with rows and
        columns equal to the length of the \bold{variables} slot. Its strict upper
        triangle contains Spearman \code{\link{cor}}relations between pairs of
        variables (ignoring missing values), and its strict lower triangle contains
        Squared Multiple Correlations (SMCs) between a variable and all other
        variables (ignoring missing values). If either a Spearman correlation or
        an SMC is very close to unity, there may be difficulty or error messages
        during the multiple imputation process.}
    \item{\code{done}:}{Object of class \code{"logical"} of length one indicating
        whether the missing values have been imputed}
    \item{\code{workpath}:}{Object of class \code{\link{character}} of length one indicating
        the path to a working directory that is used to store some objects}
  }
}
\details{
In most cases, the first step of an analysis is for a useR to call the 
\code{missing_data.frame} function on a \code{\link{data.frame}} whose variables
have some \code{\link{NA}} values, which will call the \code{\link{missing_variable}}
function on each column of the \code{\link{data.frame}} and return the \code{\link{list}}
that fills the \bold{variables} slot. The classes of the list elements will depend on the
nature of the column of the \code{\link{data.frame}} and various fallible heuristics. The
success rate can be enhanced by making sure that columns of the original 
\code{\link{data.frame}} that are intended to be categorical variables are 
(ordered if appropriate) \code{\link{factor}}s with labels. Even in the best case
scenario, it will often be necessary to utilize the \code{\link{change}} function to 
modify various discretionary aspects of the \code{\link{missing_variable}}s in the 
\bold{variables} slot of the missing_data.frame. The \code{\link{show}} method for
a missing_data.frame should be utilized to get a quick overview of the 
\code{\link{missing_variable}}s in a missing_data.frame and recognize what needs
to be \code{\link{change}}d.
}
\section{Methods}{
  There are many methods that are defined for a missing_data.frame, although some
  are primarily intended for developers. The most relevant ones for users are:
  \describe{
    \item{change}{\code{signature(data = "missing_data.frame", y = "ANY", what = "character", to = "ANY")}
      which is used to change discretionary aspects of the \code{\link{missing_variable}}s
      in the \bold{variables} slot of a missing_data.frame}
    \item{hist}{\code{signature(x = "missing_data.frame")} which shows histograms
      of the observed variables that have missingness}
    \item{image}{\code{signature(x = "missing_data.frame")} which plots 
      an image of the \bold{missingness} slot to visualize the pattern of missingness
      when \code{grayscale = FALSE} or the pattern of missingness in light of the
      observed values (\code{grayscale = TRUE}, the default)}
    \item{mi}{\code{signature(y = "missing_data.frame", model = "missing")} which 
      multiply imputes the missing values}
    \item{show}{\code{signature(object = "missing_data.frame")} which gives an overview
      of the salient characteristics of the \code{\link{missing_variable}}s in the 
      \bold{variables} slot of a missing_data.frame }
    \item{summary}{\code{signature(object = "missing_data.frame")} which produces the same
      result as the \code{\link{summary}} method for a \code{\link{data.frame}}}
  }
  There are also S3 methods for the \code{\link{dim}}, \code{\link{dimnames}}, and \code{\link{names}}
  generics, which allow functions like \code{\link{nrow}}, \code{\link{ncol}}, \code{\link{rownames}},
  \code{\link{colnames}}, etc. to work as expected on \code{missing_data.frame}s. Also, accessing
  and changing elements for a \code{missing_data.frame} mostly works the same way as for a
  \code{\link{data.frame}}.
}
\value{
The \code{missing_data.frame} constructor function returns an object of class \code{missing_data.frame} 
or that inherits from the \code{missing_data.frame} class.
}
\author{
Ben Goodrich and Jonathan Kropko, for this version, based on earlier versions written by Yu-Sung Su, Masanao Yajima,
Maria Grazia Pittau, Jennifer Hill, and Andrew Gelman.
}
\seealso{
\code{\link{change}}, \code{\link{missing_variable}}, \code{\link{mi}},
\code{\link{experiment_missing_data.frame}}, \code{\link{multilevel_missing_data.frame}}
}
\examples{
# STEP 0: Get data
data(CHAIN, package = "mi")

# STEP 1: Convert to a missing_data.frame
mdf <- missing_data.frame(CHAIN) # warnings about missingness patterns
show(mdf)

# STEP 2: change things
mdf <- change(mdf, y = "log_virus", what = "transformation", to = "identity")

# STEP 3: look deeper
summary(mdf)
hist(mdf)
image(mdf)
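
# Optional: a brief sketch of inspecting the object directly. It assumes the
# S3 methods and the slots behave as described in the Methods and Slots
# sections, and it relies on the mdf object created in STEP 1.
dim(mdf)             # number of observations, then number of variables
head(colnames(mdf))  # names of the first few missing_variables
mdf@no_missing       # TRUE where a variable is fully observed
table(mdf@patterns)  # number of observations per missingness pattern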

# STEP 4: impute
\dontrun{
imputations <- mi(mdf)
}

## An example with subsetting on a fully observed variable
data(nlsyV, package = "mi")
mdfs <- missing_data.frame(nlsyV, favor_positive = TRUE, favor_ordered = FALSE, by = "first")
mdfs <- change(mdfs, y = "momed", what = "type", to = "ord")
show(mdfs)
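
## A rough base-R sketch (NOT the package's internal code) of two ideas from
## the Slots section: a per-observation missingness pattern and the pairwise
## Spearman correlation check. The toy data and all object names below are
## hypothetical and chosen only for illustration.
toy <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),
                  x2 = c(2, 4, 6, NA, 10, 12),
                  x3 = c(6, 5, 4, 3, 2, 1))

# One label per row, naming the columns that are NA in that row
miss <- is.na(toy)
toy_patterns <- factor(apply(miss, 1, function(r) {
  if (any(r)) paste(colnames(toy)[r], collapse = ", ") else "nothing"
}))
table(toy_patterns)

# Spearman correlations on pairwise-complete observations; an absolute value
# of one flags a pair of variables that are perfectly monotonically related,
# which is the situation the default correlation check is meant to catch
rho <- cor(toy, method = "spearman", use = "pairwise.complete.obs")
perfect <- abs(rho) > 1 - 1e-8 & upper.tri(rho)
which(perfect, arr.ind = TRUE)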

}
\keyword{classes}
\keyword{manip}
\keyword{AimedAtUseRs}