%% $Id$
\encoding{UTF-8}
\name{crossval}
\alias{crossval}
\title{Cross-validation of PLSR and PCR models}
\description{
A \dQuote{stand alone} cross-validation function for \code{mvr} objects.
}
\usage{
crossval(object, segments = 10,
segment.type = c("random", "consecutive", "interleaved"),
length.seg, jackknife = FALSE, trace = 15, \dots)
}
\arguments{
\item{object}{an \code{mvr} object; the regression to cross-validate.}
\item{segments}{the number of segments to use, or a list with segments
(see below).}
\item{segment.type}{the type of segments to use. Ignored if
\code{segments} is a list.}
\item{length.seg}{Positive integer. The length of the segments to
use. If specified, it overrides \code{segments} unless
\code{segments} is a list.}
\item{jackknife}{logical. Whether jackknifing of regression
coefficients should be performed.}
\item{trace}{if \code{TRUE}, tracing is turned on. If numeric, it
denotes a time limit (in seconds). If the estimated total time of
the cross-validation exceeds this limit, tracing is turned on.}
\item{\dots}{additional arguments, sent to the underlying fit function.}
}
\details{
This function performs cross-validation on a model fit by \code{mvr}.
It can handle models such as \code{plsr(y ~ msc(X), \dots)} or other
models where the predictor variables need to be recalculated for each
segment. When recalculation is not needed, the result of
\code{crossval(mvr(\dots))} is identical to \code{mvr(\dots,
validation = "CV")}, although \code{crossval} is slower.

Note that to use \code{crossval}, the data \emph{must} be specified
with a \code{data} argument when fitting \code{object}.

If \code{segments} is a list, the arguments \code{segment.type} and
\code{length.seg} are ignored. The elements of the list should be
integer vectors specifying the indices of the segments. See
\code{\link{cvsegments}} for details.

Otherwise, segments of type \code{segment.type} are generated. The
number of segments to generate is selected either by specifying the
number of segments in \code{segments}, or by giving the segment length
in \code{length.seg}. If both are specified, \code{segments} is
ignored.

If \code{jackknife} is \code{TRUE}, jackknifed regression coefficients
are returned, which can be used for variance estimation
(\code{\link{var.jack}}) or hypothesis testing (\code{\link{jack.test}}).

When tracing is turned on, the segment number is printed for each segment.

By default, the cross-validation will be performed serially. However,
it can be done in parallel using functionality in the
\code{\link{parallel}} package by setting the option \code{parallel} in
\code{\link{pls.options}}. See \code{\link{pls.options}} for the
different ways to specify the parallelism. See also Examples below.
}
\value{
The supplied \code{object} is returned, with an additional component
\code{validation}, which is a list with components
\item{method}{equals \code{"CV"} for cross-validation.}
\item{pred}{an array with the cross-validated predictions.}
\item{coefficients}{(only if \code{jackknife} is \code{TRUE}) an array
with the jackknifed regression coefficients. The dimensions
correspond to the predictors, responses, number of components, and
segments, respectively.}
\item{PRESS0}{a vector of PRESS values (one for each response variable)
for a model with zero components, i.e., only the intercept.}
\item{PRESS}{a matrix of PRESS values for models with 1, \ldots,
\code{ncomp} components. Each row corresponds to one response variable.}
\item{adj}{a matrix of adjustment values for calculating bias
corrected MSEP. \code{MSEP} uses this.}
\item{segments}{the list of segments used in the cross-validation.}
\item{ncomp}{the number of components.}
\item{gammas}{if method \code{cppls} is used, gamma values for the
powers of each CV segment are returned.}
}
\references{
Mevik, B.-H., Cederkvist, H. R. (2004) Mean Squared Error of
Prediction (MSEP) Estimates for Principal Component Regression (PCR)
and Partial Least Squares Regression (PLSR).
\emph{Journal of Chemometrics}, \bold{18}(9), 422--429.
}
\author{Ron Wehrens and Bjørn-Helge Mevik}
\note{
\code{PRESS0} is always cross-validated using leave-one-out
cross-validation, whichever segments are used for the other
components. This usually makes little difference in practice,
but should be fixed for correctness.

The current implementation of the jackknife stores all
jackknife replicates of the regression coefficients, which can be very
costly for large matrices. This might change in a future version.
}
\seealso{
\code{\link{mvr}},
\code{\link{mvrCv}},
\code{\link{cvsegments}},
\code{\link{MSEP}},
\code{\link{var.jack}},
\code{\link{jack.test}}
}
\examples{
data(yarn)
yarn.pcr <- pcr(density ~ msc(NIR), 6, data = yarn)
yarn.cv <- crossval(yarn.pcr, segments = 10)
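## A quick look at what crossval() added to the fit. This is only an
## illustrative sketch; the component names are those listed under Value.
names(yarn.cv$validation)
dim(yarn.cv$validation$pred)
## PRESS has one row per response and one column per number of
## components, so this picks the number of components with the lowest
## PRESS for the first (here the only) response:
which.min(yarn.cv$validation$PRESS[1, ])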
\dontrun{plot(MSEP(yarn.cv))}
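## Supplying the segments explicitly as a list, instead of letting
## crossval() generate them. A sketch only: the choice of 7 interleaved
## segments and the object names are arbitrary, for illustration.
yarn.segs <- cvsegments(nrow(yarn), k = 7, type = "interleaved")
yarn.cv.seg <- crossval(yarn.pcr, segments = yarn.segs)
## The segments actually used are stored in the validation component:
length(yarn.cv.seg$validation$segments)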
\dontrun{
## Parallelised cross-validation, using a transient cluster.
## The two calls below are alternatives; the second overrides the first.
pls.options(parallel = 4) # use mclapply (not available on Windows)
pls.options(parallel = quote(parallel::makeCluster(4, type = "PSOCK"))) # parLapply
## A new cluster is created and stopped for each cross-validation:
yarn.cv <- crossval(yarn.pcr)
yarn.loocv <- crossval(yarn.pcr, length.seg = 1)
## Parallelised cross-validation, using persistent cluster:
library(parallel)
## This creates the cluster (use only one of the two calls below):
pls.options(parallel = makeCluster(4, type = "FORK")) # not available on Windows
pls.options(parallel = makeCluster(4, type = "PSOCK"))
## The cluster can be used several times:
yarn.cv <- crossval(yarn.pcr)
yarn.loocv <- crossval(yarn.pcr, length.seg = 1)
## The cluster should be stopped manually afterwards:
stopCluster(pls.options()$parallel)
## Parallelised cross-validation, using persistent MPI cluster:
## This requires the packages snow and Rmpi to be installed
library(parallel)
## This creates the cluster:
pls.options(parallel = makeCluster(4, type = "MPI"))
## The cluster can be used several times:
yarn.cv <- crossval(yarn.pcr)
yarn.loocv <- crossval(yarn.pcr, length.seg = 1)
## The cluster should be stopped manually afterwards:
stopCluster(pls.options()$parallel)
## It is good practice to call mpi.exit() or mpi.quit() afterwards:
mpi.exit()
}
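## Jackknifing of the regression coefficients. A sketch only: using
## leave-one-out segments and 3 components are arbitrary choices here,
## and the object names are for illustration.
yarn.jack <- crossval(yarn.pcr, length.seg = 1, jackknife = TRUE)
## Predictors x responses x components x segments:
dim(yarn.jack$validation$coefficients)
## Jackknife variance estimates and approximate t tests:
yarn.var <- var.jack(yarn.jack, ncomp = 3)
yarn.jt <- jack.test(yarn.jack, ncomp = 3)
summary(c(yarn.jt$pvalues))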
}
\keyword{regression}
\keyword{multivariate}