% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/MxCompute.R
\name{mxComputeLoadData}
\alias{mxComputeLoadData}
\alias{MxComputeLoadData-class}
\title{Load columns into an MxData object}
\usage{
mxComputeLoadData(
  dest,
  column,
  method = c("csv", "data.frame"),
  ...,
  path = c(),
  originalDataIsIndexOne = FALSE,
  byrow = TRUE,
  row.names = c(),
  col.names = c(),
  skip.rows = 0,
  skip.cols = 0,
  verbose = 0L,
  cacheSize = 100L,
  checkpointMetadata = TRUE,
  na.strings = c("NA"),
  observed = NULL,
  rowFilter = c()
)
}
\arguments{
\item{dest}{the name of the model where the columns will be loaded}

\item{column}{a character vector. The column names to replace.}

\item{method}{name of the conduit used to load the columns.}

\item{...}{Not used.  Forces remaining arguments to be specified by name.}

\item{path}{the path to the file containing the data}

\item{originalDataIsIndexOne}{logical. Whether to use the initial data for index 1.}

\item{byrow}{logical. Whether the data columns are stored in rows.}

\item{row.names}{optional integer. Column containing the row names.}

\item{col.names}{optional integer. Row containing the column names.}

\item{skip.rows}{integer. Number of rows to skip before reading data.}

\item{skip.cols}{integer. Number of columns to skip before reading data.}

\item{verbose}{integer. Level of run-time diagnostic output. Set to zero to disable.}

\item{cacheSize}{integer. How many columns to cache per
scan through the data. Only used when byrow=FALSE.}

\item{checkpointMetadata}{logical. Whether to add per-record metadata to the checkpoint.}

\item{na.strings}{character vector. A vector of strings that denote a missing value.}

\item{observed}{data frame. The reservoir of data for \code{method='data.frame'}.}

\item{rowFilter}{logical vector. Whether to skip the source row.}
}
\description{
\lifecycle{experimental}
}
\details{
The purpose of this compute step is to help quickly perform many
similar analyses. For example, given a sample of people with a few
million SNPs (single-nucleotide polymorphisms) per person, we could
fit a separate model for each SNP by iterating over the SNP data.

The column names given in the \code{column} parameter must already
exist in the model's MxData object. Pre-existing data in these columns
is assumed to be a placeholder and is not used unless
\code{originalDataIsIndexOne} is set to \code{TRUE}, in which case
it serves as the data for index 1.
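
As a rough sketch of this setup (not taken from the package sources),
the model below carries a placeholder column \code{snp} that the
compute plan overwrites on each iteration. The model name \code{gwas},
the variable names, the file names \code{snps.csv} and \code{gwas.log},
and the number of iterations are all hypothetical. The file is assumed
to hold one SNP per row, matching \code{byrow=TRUE}.

\preformatted{
library(OpenMx)

# Placeholder data: 'snp' only reserves the column name and is
# replaced by mxComputeLoadData before each fit.
placeholder <- data.frame(pheno = rnorm(100), snp = rnorm(100))

model <- mxModel(
  "gwas", type = "RAM",
  manifestVars = c("pheno", "snp"),
  mxData(placeholder, type = "raw"),
  mxPath("one", c("pheno", "snp")),                  # means
  mxPath(c("pheno", "snp"), arrows = 2, values = 1), # variances
  mxPath("snp", "pheno", values = 0),                # regression
  mxComputeLoop(list(
    mxComputeLoadData("gwas", column = "snp",
                      path = "snps.csv", byrow = TRUE),
    mxComputeGradientDescent(),
    mxComputeCheckpoint(path = "gwas.log")
  ), i = 1:1000)
)

# Requires the hypothetical file 'snps.csv' to exist:
# fit <- mxRun(model)
}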

For \code{method='csv'}, the highest-performance arrangement is
\code{byrow=TRUE} because each column of observed data is stored as a
single chunk (a row) on the disk and can be loaded directly. With
\code{byrow=FALSE}, the data must be transposed: loading even a
single column of observed data requires reading through
the whole file, which can be slow for large files. To amortize the
cost of transposition, \code{cacheSize} columns are loaded on
every pass through the file.
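
To illustrate the amortization, the sketch below estimates how many
scans of the file a sequential run over all columns might need,
assuming each scan yields exactly \code{cacheSize} columns. The column
count, the model name \code{gwas}, and the file name
\code{snps_by_column.csv} are hypothetical. The remark about memory is
an inference, not a documented guarantee.

\preformatted{
library(OpenMx)

# Illustrative numbers only.
nColumns <- 1e6
cacheSize <- 100L   # the default shown in the usage section

# Approximate number of scans if every pass caches 'cacheSize' columns,
# compared with one scan per column without a cache.
ceiling(nColumns / cacheSize)

# A larger cache presumably trades memory for fewer scans.
loader <- mxComputeLoadData(
  "gwas", column = "snp",
  path = "snps_by_column.csv",
  byrow = FALSE, cacheSize = 1000L
)
}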

After \code{mxRun} returns, the \code{dest} mxData object will
contain the most recently loaded data. Hence, any single analysis
of a series can be reproduced by issuing \code{mxComputeLoadData}
with the single index associated with a particular dataset,
replacing the compute plan with something like
\code{omxDefaultComputePlan}, and then passing the model back
through \code{mxRun}. This can be a helpful approach when
investigating unexpected results.
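
The steps above might look roughly like the following. This is a
sketch under assumptions: \code{fit} stands for a model returned by an
earlier \code{mxRun} call, the model name \code{gwas}, the column
\code{snp}, the file \code{snps.csv}, and index 42 are hypothetical,
and wrapping the load step in \code{mxComputeLoop} with a single
\code{i} is one assumed way of selecting the index.

\preformatted{
library(OpenMx)

# First run: only load the dataset of interest into the model.
reload <- mxModel(fit, mxComputeLoop(list(
  mxComputeLoadData("gwas", column = "snp",
                    path = "snps.csv", byrow = TRUE)
), i = 42))
reload <- mxRun(reload)

# Second run: swap in a default compute plan and refit on the
# data that was just loaded.
single <- mxModel(reload, omxDefaultComputePlan())
single <- mxRun(single)
summary(single)
}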
}
\seealso{
\link{mxComputeLoadMatrix}, \link{mxComputeCheckpoint}, \link{mxRun}, \link{omxDefaultComputePlan}
}