% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MxCompute.R
\name{mxComputeLoadData}
\alias{mxComputeLoadData}
\alias{MxComputeLoadData-class}
\title{Load columns into an MxData object}
\usage{
mxComputeLoadData(
dest,
column,
method = c("csv", "data.frame"),
...,
path = c(),
originalDataIsIndexOne = FALSE,
byrow = TRUE,
row.names = c(),
col.names = c(),
skip.rows = 0,
skip.cols = 0,
verbose = 0L,
cacheSize = 100L,
checkpointMetadata = TRUE,
na.strings = c("NA"),
observed = NULL,
rowFilter = c()
)
}
\arguments{
\item{dest}{the name of the model where the columns will be loaded}
\item{column}{a character vector. The column names to replace.}
\item{method}{name of the conduit used to load the columns.}
\item{...}{Not used. Forces remaining arguments to be specified by name.}
\item{path}{the path to the file containing the data}
\item{originalDataIsIndexOne}{logical. Whether to use the initial data for index 1}
\item{byrow}{logical. Whether the data columns are stored in rows.}
\item{row.names}{optional integer. Column containing the row names.}
\item{col.names}{optional integer. Row containing the column names.}
\item{skip.rows}{integer. Number of rows to skip before reading data.}
\item{skip.cols}{integer. Number of columns to skip before reading data.}
\item{verbose}{integer. Level of run-time diagnostic output. Set to zero to disable diagnostics.}
\item{cacheSize}{integer. How many columns to cache per
scan through the data. Only used when byrow=FALSE.}
\item{checkpointMetadata}{logical. Whether to add per-record metadata to the checkpoint.}
\item{na.strings}{character vector. A vector of strings that denote a missing value.}
\item{observed}{data frame. The reservoir of data for \code{method='data.frame'}.}
\item{rowFilter}{logical vector. For each source row, whether to skip that row.}
}
\description{
\lifecycle{experimental}
}
\details{
The purpose of this compute step is to help quickly perform many
similar analyses. For example, given a sample of people with a few
million single-nucleotide polymorphisms (SNPs) per person, we could
fit a separate model for each SNP by iterating over the SNP data.

The column names given in the \code{column} parameter must already
exist in the model's MxData object. Pre-existing data is assumed to be
a placeholder and is not used unless
\code{originalDataIsIndexOne} is set to \code{TRUE}.

For \code{method='csv'}, the highest-performance arrangement is
\code{byrow=TRUE} because each column of observed data is stored as a
single chunk (row) on the disk and can be loaded easily. For
\code{byrow=FALSE}, the data require transposition: to load a
single column of observed data, it is necessary to read through
the whole file, which can be slow for large files. To amortize the
cost of transposition, \code{cacheSize} columns are loaded on
every pass through the file.
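
As a rough illustration of this amortization, the following base R
sketch estimates how many full passes through the file are needed
when \code{byrow=FALSE}. It assumes that columns are requested in
sequential order so that each pass fills the cache with the next
\code{cacheSize} columns. The helper \code{passesNeeded} is
hypothetical and is not part of OpenMx.

\preformatted{
# Hypothetical helper (not an OpenMx function): estimated number of
# full passes through the file when byrow=FALSE, assuming columns are
# requested sequentially and each pass caches 'cacheSize' columns.
passesNeeded <- function(nColumns, cacheSize = 100L) {
  ceiling(nColumns / cacheSize)
}

passesNeeded(1e6)          # 10000 passes with the default cacheSize
passesNeeded(1e6, 1000L)   # 1000 passes with a larger cache
}

Under this assumption, a larger \code{cacheSize} trades memory for
fewer passes through the file.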

After \code{mxRun} returns, the MxData object in \code{dest} will
contain the most recently loaded data. Hence, any single analysis
of a series can be reproduced by issuing \code{mxComputeLoadData}
with the single index associated with a particular dataset,
replacing the compute plan with something like
\code{omxDefaultComputePlan}, and then passing the model back
through \code{mxRun}. This approach can be helpful when
investigating unexpected results.
}
\seealso{
\link{mxComputeLoadMatrix}, \link{mxComputeCheckpoint}, \link{mxRun}, \link{omxDefaultComputePlan}
}