File: mda.Rd

\name{mda}
\alias{mda}
\alias{print.mda}
\title{Mixture Discriminant Analysis}
\description{
  Mixture discriminant analysis.
}
\usage{
mda(formula, data, subclasses, sub.df, tot.df, dimension, eps,
    iter, weights, method, keep.fitted, trace, \dots)
}
\arguments{
  \item{formula}{of the form \code{y~x}; it describes the response and
    the predictors.  The formula can be more complicated, such as
    \code{y~log(x)+z} (see \code{\link{formula}} for more details).
    The response should be a factor, or any vector that can be coerced
    to one (such as a logical variable).}
  \item{data}{data frame containing the variables in the formula
    (optional).}
  \item{subclasses}{Number of subclasses per class; the default is 3.
    Can be a vector giving a number for each class (see the example
    calls under \sQuote{Details}).}
  \item{sub.df}{If subclass centroid shrinking is performed, the
    effective degrees of freedom of the centroids per class.  Can be a
    scalar, in which case the same number is used for each class, or
    else a vector.}
  \item{tot.df}{The total degrees of freedom for all the centroids,
    specified as a single number rather than separately per class.}
  \item{dimension}{The dimension of the reduced model.  If we know our
    final model will be confined to a discriminant subspace (of the
    subclass centroids), we can specify this in advance and have the EM
    algorithm operate in this subspace.}
  \item{eps}{A numerical threshold for automatically truncating the
    dimension.}
  \item{iter}{A limit on the total number of iterations; the default is 5.}
  \item{weights}{\emph{NOT} observation weights!  This is a special
    weight structure, which for each class assigns a weight (prior
    probability) to each of the observations in that class of belonging
    to one of the subclasses.  The default is provided by a call to
    \code{mda.start(x, g, subclasses, trace, \dots)} (by this time
    \code{x} and \code{g} are known).  See the help for
    \code{\link{mda.start}}.  Arguments for \code{mda.start} can be
    provided via the \code{\dots} argument to \code{mda}, so the
    \code{weights} argument need never be supplied directly.  A
    previously fitted \code{mda} object can also be supplied, in which
    case its final subclass \code{responsibility} weights are used for
    \code{weights}.  This allows the iterations from a previous fit to
    be continued (see the continuation sketch under \sQuote{Examples}).}
  \item{method}{regression method used in optimal scaling.  The default
    is linear regression via the function \code{polyreg}, resulting in
    the usual mixture model.  Other possibilities are \code{mars} and
    \code{bruto}.  For penalized mixture discriminant models,
    \code{gen.ridge} is appropriate.}
  \item{keep.fitted}{a logical variable, which determines whether the
    (sometimes large) component \code{"fitted.values"} of the \code{fit}
    component of the returned \code{mda} object should be kept.  The
    default is \code{TRUE} if \code{n * dimension < 5000}.} 
  \item{trace}{if \code{TRUE}, iteration information is printed.  Note
    that the deviance reported is for the posterior class likelihood,
    and not the full likelihood, which is used to drive the EM algorithm
    under \code{mda}.  In general the latter is not available.}
  \item{\dots}{additional arguments to \code{mda.start} and to
    \code{method}.}
}
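\details{
  A few illustrative call patterns follow; \code{y} and \code{df} are
  placeholder names for a factor response and a data frame of
  predictors, not objects supplied by this package.
\preformatted{
## a different number of subclasses for each of three classes
fit1 <- mda(y ~ ., data = df, subclasses = c(2, 3, 2))

## shrink the subclass centroids to about 2 effective df per class
fit2 <- mda(y ~ ., data = df, sub.df = 2)

## restrict the EM algorithm to a 2-dimensional discriminant subspace
fit3 <- mda(y ~ ., data = df, dimension = 2)

## use MARS rather than linear regression in the optimal-scaling step
fit4 <- mda(y ~ ., data = df, method = mars)
}
}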
\value{
  An object of class \code{c("mda", "fda")}.  The most useful extractor
  is \code{predict}, which can make many types of predictions from this
  object.  It can also be plotted, and functions that work on
  \code{"fda"} objects, such as \code{confusion} and \code{coef}, work
  here too.

  The object has the following components:
  \item{percent.explained}{the percent of between-group variance
    explained by each dimension (relative to the total explained).}
  \item{values}{optimal scaling regression sum-of-squares for each
    dimension (see reference).}
  \item{means}{subclass means in the discriminant space.  These are also
    scaled versions of the final \code{theta} values, or class scores,
    and can be used in a subsequent call to \code{mda} (this only makes
    sense if some columns of \code{theta} are omitted; see the
    references).}
  \item{theta.mod}{(internal) a class scoring matrix which allows
    \code{predict} to work properly.}
  \item{dimension}{dimension of discriminant space.}
  \item{sub.prior}{subclass membership priors, computed in the fit.  No
    effort is currently spent in trying to keep these above a threshold.}
  \item{prior}{class proportions for the training data.}
  \item{fit}{fit object returned by \code{method}.}
  \item{call}{the call that created this object (allowing it to be
    \code{update}-able).}
  \item{confusion}{confusion matrix when classifying the training data.}
  \item{weights}{the subclass membership probabilities for each member
    of the training set; see the \code{weights} argument.}
  \item{assign.theta}{a pointer list which identifies which elements of
    certain lists belong to individual classes.}
  \item{deviance}{The multinomial log-likelihood of the fit.  Even though
    the full log-likelihood drives the iterations, we cannot in general
    compute it because of the flexibility of the \code{method} used.
    The deviance can increase with the iterations, but generally does not.}

  The \code{method} functions are required to take arguments \code{x}
  and \code{y}, where both can be matrices, and should produce a matrix
  of \code{fitted.values} the same size as \code{y}.  They can take an
  additional \code{weights} argument, and should all have a \code{\dots}
  argument for safety's sake.  Any arguments to \code{method} can be
  passed on via the \code{\dots} argument of \code{mda}.  The default
  method \code{polyreg} has a \code{degree} argument which allows
  polynomial regression of the required total degree.  See the
  documentation for \code{\link{predict.fda}} for further requirements
  of \code{method}.  The package \pkg{earth} is suggested by this
  package; \code{earth} is a more detailed implementation of the MARS
  model, and works as a \code{method} argument (see the illustrative
  sketch under \sQuote{Examples}).
  
  The function \code{mda.start} creates the starting weights; it takes
  additional arguments which can be passed in via the \code{\dots}
  argument to \code{mda}.  See the documentation for \code{mda.start}.
}
\author{
  Trevor Hastie and Robert Tibshirani
}
\seealso{
  \code{\link{predict.mda}},
  \code{\link{mars}},
  \code{\link{bruto}},
  \code{\link{polyreg}},
  \code{\link{gen.ridge}},
  \code{\link{softmax}},
  \code{\link{confusion}}
  %%\code{\link{coef.fda}},
  %%\code{\link{plot.fda}}
}
\references{
  ``Flexible Discriminant Analysis by Optimal Scoring'' by Hastie,
  Tibshirani and Buja, 1994, JASA, 1255-1270.

  ``Penalized Discriminant Analysis'' by Hastie, Buja and Tibshirani,
  1995, Annals of Statistics, 73-102.
    
  ``Discriminant Analysis by Gaussian Mixtures'' by Hastie and
  Tibshirani, 1996, JRSS-B, 155-176.
  
  ``The Elements of Statistical Learning: Data Mining, Inference and
  Prediction'' (2nd edition, Chapter 12) by Hastie, Tibshirani and
  Friedman, 2009, Springer.
}
\examples{
data(iris)
irisfit <- mda(Species ~ ., data = iris)
irisfit
## Call:
## mda(formula = Species ~ ., data = iris)
##
## Dimension: 4
##
## Percent Between-Group Variance Explained:
##     v1     v2     v3     v4
##  96.02  98.55  99.90 100.00
##
## Degrees of Freedom (per dimension): 5
##
## Training Misclassification Error: 0.02 ( N = 150 )
##
## Deviance: 15.102
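
## The sketches below are illustrative and not run by default; they
## assume the 'irisfit' object created above.
\dontrun{
## continue the EM iterations of a previous fit by supplying it as the
## 'weights' argument (its final subclass responsibilities are reused)
irisfit2 <- mda(Species ~ ., data = iris, weights = irisfit, iter = 10)

## the fit is also an "fda" object, so fda extractors apply
plot(irisfit)
coef(irisfit)
}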

data(glass)
# a fixed random sample of size 100
samp <- c(1, 3, 4, 11, 12, 13, 14, 16, 17, 18, 19, 20, 27, 28, 31,
          38, 42, 46, 47, 48, 49, 52, 53, 54, 55, 57, 62, 63, 64, 65,
          67, 68, 69, 70, 72, 73, 78, 79, 83, 84, 85, 87, 91, 92, 94,
          99, 100, 106, 107, 108, 111, 112, 113, 115, 118, 121, 123,
          124, 125, 126, 129, 131, 133, 136, 139, 142, 143, 145, 147,
          152, 153, 156, 159, 160, 161, 164, 165, 166, 168, 169, 171,
          172, 173, 174, 175, 177, 178, 181, 182, 185, 188, 189, 192,
          195, 197, 203, 205, 211, 212, 214) 
glass.train <- glass[samp,]
glass.test <- glass[-samp,]
glass.mda <- mda(Type ~ ., data = glass.train)
predict(glass.mda, glass.test, type="post") # abbreviations are allowed
confusion(glass.mda, glass.test)
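
\dontrun{
## illustrative sketch: the suggested package 'earth' provides a more
## detailed implementation of the MARS model and can also be supplied
## as the regression method
library(earth)
glass.earth <- mda(Type ~ ., data = glass.train, method = earth)
confusion(glass.earth, glass.test)
}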
}
\keyword{classif}
% Converted by Sd2Rd version 0.3-3.