% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plsda.R
\name{plsda}
\alias{plsda}
\alias{plsda.default}
\alias{predict.plsda}
\alias{splsda.default}
\alias{predict.splsda}
\alias{splsda}
\title{Partial Least Squares and Sparse Partial Least Squares Discriminant Analysis}
\usage{
plsda(x, ...)

\method{predict}{plsda}(object, newdata = NULL, ncomp = NULL,
  type = "class", ...)

\method{plsda}{default}(x, y, ncomp = 2, probMethod = "softmax",
  prior = NULL, ...)
}
\arguments{
\item{x}{a matrix or data frame of predictors}

\item{\dots}{arguments to pass to \code{\link[pls:mvr]{plsr}} or
\code{\link[spls]{spls}}. For \code{splsda}, this is the mechanism for passing
tuning parameter specifications (e.g. \code{K}, \code{eta} or \code{kappa})}

\item{object}{an object produced by \code{plsda}}

\item{newdata}{a matrix or data frame of predictors}

\item{ncomp}{the number of components to include in the model. Predictions
can also be made for models with fewer than \code{ncomp} components.}

\item{type}{either \code{"class"}, \code{"prob"} or \code{"raw"} to produce
the predicted class, class probabilities or the raw model scores,
respectively.}

\item{y}{a factor or indicator matrix for the discrete outcome. If a matrix,
the entries must be either 0 or 1 and rows must sum to one}

\item{probMethod}{either \code{"softmax"} or \code{"Bayes"} (see Details)}

\item{prior}{a vector of prior probabilities for the classes (only used for
\code{probMethod = "Bayes"})}
}
\value{
For \code{plsda}, an object of class \code{"plsda"} and \code{"mvr"}. For
\code{splsda}, an object of class \code{"splsda"}.

The predict methods produce either a vector, matrix or three-dimensional
array, depending on the values of \code{type} and \code{ncomp}. For example,
specifying more than one value of \code{ncomp} with \code{type = "class"}
will produce a three-dimensional array but the default specification would
produce a factor vector.
}
\description{
\code{plsda} is used to fit standard PLS models for classification while
\code{splsda} performs sparse PLS that embeds feature selection and
regularization for the same purpose.
}
\details{
If a factor is supplied, the appropriate indicator matrix is created.

A multivariate PLS model is fit to the indicator matrix using the
\code{\link[pls:mvr]{plsr}} or \code{\link[spls]{spls}} function.

Two prediction methods can be used.

The \bold{softmax function} transforms the model predictions to
"probability-like" values (i.e. values on [0, 1] that sum to 1). The class
with the largest class probability is the predicted class.

Also, \bold{Bayes rule} can be applied to the model predictions to form
posterior probabilities. Here, the model predictions for the training set
are used along with the training set outcomes to create conditional
distributions for each class. When new samples are predicted, the raw model
predictions are run through these conditional distributions to produce a
posterior probability for each class (along with the prior). This process is
repeated \code{ncomp} times, once for every possible PLS model. The
\code{\link[klaR]{NaiveBayes}} function is used with \code{usekernel = TRUE}
for the posterior probability calculations.
}
\examples{

\dontrun{
data(mdrr)
set.seed(1)
inTrain <- sample(seq_along(mdrrClass), 450)

nzv <- nearZeroVar(mdrrDescr)
filteredDescr <- mdrrDescr[, -nzv]

training <- filteredDescr[inTrain,]
test <- filteredDescr[-inTrain,]
trainMDRR <- mdrrClass[inTrain]
testMDRR <- mdrrClass[-inTrain]

preProcValues <- preProcess(training)

trainDescr <- predict(preProcValues, training)
testDescr <- predict(preProcValues, test)

useBayes   <- plsda(trainDescr, trainMDRR, ncomp = 5,
                    probMethod = "Bayes")
useSoftmax <- plsda(trainDescr, trainMDRR, ncomp = 5)

confusionMatrix(predict(useBayes, testDescr),
                testMDRR)

confusionMatrix(predict(useSoftmax, testDescr),
                testMDRR)

histogram(~predict(useBayes, testDescr, type = "prob")[,"Active",]
          | testMDRR, xlab = "Active Prob", xlim = c(-.1,1.1))
histogram(~predict(useSoftmax, testDescr, type = "prob")[,"Active",]
          | testMDRR, xlab = "Active Prob", xlim = c(-.1,1.1))


## different sized objects are returned
length(predict(useBayes, testDescr))
dim(predict(useBayes, testDescr, ncomp = 1:3))
dim(predict(useBayes, testDescr, type = "prob"))
dim(predict(useBayes, testDescr, type = "prob", ncomp = 1:3))
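
## A minimal sketch of the softmax transformation described in Details.
## This is an illustration only and is not the internal caret code; the
## helper softmaxSketch and the toy matrix rawScores are hypothetical.
softmaxSketch <- function(scores) {
  ## subtract each row maximum before exponentiating, for numerical stability
  shifted <- scores - apply(scores, 1, max)
  expScores <- exp(shifted)
  ## normalize so that every row sums to one
  expScores / rowSums(expScores)
}

rawScores <- matrix(c(0.8, 0.2,
                      0.1, 0.9,
                      0.6, 0.4),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("Active", "Inactive")))
probs <- softmaxSketch(rawScores)
## each row sums to one
rowSums(probs)
## the predicted class is the column with the largest value in each row
colnames(probs)[apply(probs, 1, which.max)]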

## Using spls:
## (As of 11/09, the spls package now has a similar function with
## the same name. To avoid conflicts, use caret:::splsda to
## get this version)

splsFit <- caret:::splsda(trainDescr, trainMDRR,
                          K = 5, eta = .9,
                          probMethod = "Bayes")

confusionMatrix(caret:::predict.splsda(splsFit, testDescr),
                testMDRR)
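
## A rough sketch of the Bayes rule approach described in Details, shown on
## simulated raw scores rather than on a fitted PLS model. It illustrates the
## idea only and is not the internal caret code. It assumes the klaR package
## is installed; simClass, simScores, nbSketch, newScores and post are
## hypothetical names.
library(klaR)
set.seed(2)
simClass <- factor(rep(c("Active", "Inactive"), each = 50))
simScores <- data.frame(score1 = c(rnorm(50, 0.7, 0.2), rnorm(50, 0.3, 0.2)),
                        score2 = rnorm(100, 0.5, 0.2))
## class-conditional kernel density estimates of the training scores
nbSketch <- NaiveBayes(simScores, simClass, usekernel = TRUE)
## posterior class probabilities for new scores
newScores <- data.frame(score1 = c(0.2, 0.5, 0.8),
                        score2 = c(0.5, 0.4, 0.6))
post <- predict(nbSketch, newScores)$posterior
post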
}

}
\seealso{
\code{\link[pls:mvr]{plsr}}, \code{\link[spls]{spls}}
}
\keyword{models}