% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/quark.R
\name{quark}
\alias{quark}
\title{Quark}
\usage{
quark(data, id, order = 1, silent = FALSE, ...)
}
\arguments{
\item{data}{A data frame (required).  For \code{quark} to process it, the
data frame must not contain any factors or text-based variables; all
variables must be numeric.  Identifiers and dates can be left in the data,
but they must be listed in the \code{id} argument.}

\item{id}{A vector of column numbers or variable names that flags the
identifiers and dates in the dataset, which \code{quark} cannot process.
\code{quark} temporarily removes the flagged columns from the data to
complete its main processes.  Leaving identifiers and dates unflagged can
cause problems with imputation, with the product and polynomial effects,
and with the principal component analysis.}

\item{order}{An optional argument that can be used when the imputation
procedures in \pkg{mice} fail.  Under some circumstances, \pkg{mice} cannot
impute missing values because of extreme missingness.  If an error reports a
failure due to not having any columns selected, set \code{order = 2} to
reorder the imputation procedures.  Otherwise, use the default
\code{order = 1}.}

\item{silent}{If \code{FALSE}, the details of the \code{quark} process are
printed.}

\item{\dots}{additional arguments to pass to \code{\link[mice:mice]{mice::mice()}}.}
}
\value{
A list with 7 components:
\item{ID Columns}{A vector of the identifier columns entered when running
\code{quark}.}
\item{ID Variables}{A subset of the dataset that contains the identifiers
flagged when running \code{quark}.}
\item{Used Data}{A matrix or data frame of the data provided by the user as
the basis for \code{quark} to process.}
\item{Imputed Data}{A matrix or data frame of the data after the multiple
method imputation process.}
\item{Big Matrix}{The expanded product and polynomial matrix.}
\item{Principal Components}{The entire data frame of principal components
for the dataset.  This dataset has the same number of rows as the big
matrix, but one fewer column (as is the case with principal component
analyses).}
\item{Percent Variance Explained}{A vector of the percent variance
explained by each column of principal components.}
}
\description{
The \code{quark} function lets researchers calculate and include component
scores that account for the variance in the original dataset and for all of
the interaction and polynomial effects of the variables in the dataset.
}
\details{
The \code{quark} function calculates these component scores by first filling
in the data by means of multiple imputation methods and then expanding the
dataset with the means of the polynomial effects and the means of the
non-overlapping interaction effects between variables.  The imputation
methods include iterative sampling, group mean substitution, and multiple
imputation using a polytomous regression algorithm (\pkg{mice}).  During the
expansion process, the dataset is expanded to three times its original
width.  The first third of the expanded dataset contains all of the original
data after imputation, the second third contains the means of the polynomial
effects (squares and cubes), and the final third contains the means of the
non-overlapping interaction effects.  A full principal component analysis is
then conducted and the individual components are retained.  The subsequent
\code{\link[=combinequark]{combinequark()}} function gives researchers
control over how many components to extract and retain.  That function
returns the dataset as submitted (with missing values) together with the
requested component scores, for a more accurate multiple imputation in
subsequent steps.
}
\examples{

set.seed(123321)

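## Take the nine test-score variables and set a random subset of cells
## (each cell with probability 0.3) to missing
dat <- HolzingerSwineford1939[,7:15]
misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
dat[misspat] <- NA
## Put the first three columns of the original data back in front;
## columns 1 and 2 are flagged as identifiers in the calls below
dat <- cbind(HolzingerSwineford1939[,1:3], dat)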
\donttest{
quark.list <- quark(data = dat, id = c(1, 2))

final.data <- combinequark(quark = quark.list, percent = 80)
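
## Illustrative addition (not part of the original example): inspect the
## objects created above.  The component names are read from the returned
## list itself rather than assumed from the Value section.
str(quark.list, max.level = 1)
dim(final.data)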

## Example to rerun quark after imputation failure:
quark.list <- quark(data = dat, id = c(1, 2), order = 2)
}
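
## A rough base-R sketch of the polynomial part of the expansion described
## in Details.  This is an illustration under stated assumptions, NOT the
## internal code of quark(): it uses the complete (non-missing) data so no
## imputation is needed, it standardizes the variables first (an assumption),
## it omits the interaction-effect block, and stats::prcomp() stands in for
## the PCA routine that quark() uses.
X <- scale(as.matrix(HolzingerSwineford1939[,7:15]))
poly.means <- (X^2 + X^3) / 2   # mean of each variable's square and cube
colnames(poly.means) <- paste0(colnames(X), ".poly")
expanded <- cbind(X, poly.means)
pca <- prcomp(expanded)
## Cumulative percent of variance explained by the leading components
pve <- 100 * pca$sdev^2 / sum(pca$sdev^2)
head(round(cumsum(pve), 1))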

}
\references{
Howard, W. J., Rhemtulla, M., & Little, T. D. (2015). Using
Principal Components as Auxiliary Variables in Missing Data Estimation.
\emph{Multivariate Behavioral Research, 50}(3), 285--299.
\doi{10.1080/00273171.2014.999267}
}
\seealso{
\code{\link[=combinequark]{combinequark()}}
}
\author{
Steven R. Chesnut (University of Southern Mississippi;
\email{Steven.Chesnut@usm.edu})

Danny Squire (Texas Tech University)

Terrence D. Jorgensen (University of Amsterdam)

The PCA code is copied and modified from the \code{FactoMineR} package.
}