.\"						-*- nroff -*-
.\" This manual page was written by Jim Van Zandt <jrv@debian.org>
.\" and is hereby placed in the public domain.
.TH MULTIMIX 1 "December 10, 2001"
.SH NAME
multimix, multimix-prep \- automatically discover classes in data
.SH SYNOPSIS
.B multimix
.sp
.B multimix-prep
.SH DESCRIPTION
\fBmultimix\fP fits a mixture of multivariate distributions to a set
of observations using the EM algorithm. The data file may contain both
categorical and continuous variables.
.P
\fBmultimix\fP prompts for the names of the data and parameter files.
.P
The assignment of the observations to groups and the posterior
probabilities are written to \fIGROUPS.OUT\fP.  Parameter estimates,
convergence information, and group assignment probabilities are
written to \fIGENERAL.OUT\fP.
.P
If \fBmultimix\fP does not converge after \fIITER\fP=200 iterations,
the estimates of the parameters will be written to
\fIEMPARAMEST.OUT\fP. This file can then be used as the parameter
input file for \fBmultimix\fP if desired.
.P
\fBmultimix\fP is limited to a maximum of
.nf
       1500 observations (\fIIOB\fP=1500)
       6 groups (\fIIK6\fP=6)
       15 attributes and partition cells (\fIIP15\fP=15)
       10 levels of categories (\fIIM10\fP=10)
       200 iterations to convergence (\fIITER\fP=200)
.fi
Recompilation is required to change these parameters.
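.PP
The limits above can be checked before a run.  The following Python
sketch is illustrative only and is not part of the package; the
function name \fIcheck_limits\fP and its argument names are
assumptions, while the numeric limits are the compiled-in values
listed above.
.nf
```python
# Compiled-in limits quoted in this manual page (IOB, IK6, IP15, IM10).
MAX_OBS = 1500
MAX_GROUPS = 6
MAX_ATTR_OR_CELLS = 15
MAX_LEVELS = 10


def check_limits(nobs, ng, nvar, npar, ncat):
    """Return one message per exceeded limit (empty list if all fit).

    ncat holds the number of categories of each attribute
    (0 for continuous attributes).
    """
    checks = [
        ("observations", nobs, MAX_OBS),
        ("groups", ng, MAX_GROUPS),
        ("attributes", nvar, MAX_ATTR_OR_CELLS),
        ("partition cells", npar, MAX_ATTR_OR_CELLS),
        ("category levels", max(ncat, default=0), MAX_LEVELS),
    ]
    return ["too many %s: %d > %d" % (name, value, limit)
            for name, value, limit in checks if value > limit]
```
.fi
.PP
An empty result means the problem fits within the compiled-in limits;
otherwise each message names the limit that would need to be raised
before recompiling.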
.SH "DATA FILE"
The data file has one line for each observation.  Each line has one
entry for each variable.  Only the first \fINVAR\fP entries on each
line are read.
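.PP
As a rough illustration of this layout, the Python sketch below
parses observation lines and keeps only the first \fINVAR\fP entries
of each.  It assumes whitespace-separated numeric entries, which this
page does not state explicitly; the function name
\fIread_observations\fP is hypothetical.
.nf
```python
def read_observations(lines, nvar):
    """Parse observations, keeping only the first nvar entries per line."""
    observations = []
    for line in lines:
        fields = line.split()  # assumption: entries are whitespace-separated
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) < nvar:
            raise ValueError(
                "expected at least %d entries, got %d" % (nvar, len(fields)))
        observations.append([float(x) for x in fields[:nvar]])
    return observations


# Two observations with four entries each; only the first three are kept.
sample = ["1 2.5 0.3 99", "2 1.5 0.7 98"]
rows = read_observations(sample, 3)
```
.fi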
.SH "PARAMETER FILE"
The parameter file contains free field values which describe the data
and the fitting models.  \fBmultimix-prep\fP will ask the user a
series of questions and write a suitable parameter file.  If the
starting point for the fit is given by specifying initial group
assignments for the observations, then the user should prepare the
file of group assignments before starting \fBmultimix-prep\fP.  The
file format is simple: the \fII\fPth line of the file contains an
integer between 1 and \fING\fP giving the group number of the
\fII\fPth observation.  (The experienced user finds it faster to edit
old parameter files into new ones.)
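.PP
A file of initial group assignments in the format just described
could be produced along the following lines.  This Python sketch is
only an illustration; the function names are hypothetical, and the
range check reflects the requirement that each entry lie between 1 and
\fING\fP.
.nf
```python
def group_file_lines(groups, ng):
    """Return one text line per observation: line I holds its group."""
    lines = []
    for i, group in enumerate(groups, start=1):
        if not 1 <= group <= ng:
            raise ValueError(
                "observation %d: group %d is not in 1..%d" % (i, group, ng))
        lines.append(str(group))
    return lines


def write_group_file(path, groups, ng):
    """Write the group assignments to path, one integer per line."""
    with open(path, "w") as handle:
        for line in group_file_lines(groups, ng):
            print(line, file=handle)
```
.fi
.PP
For example, write_group_file("groups.txt", [1, 3, 2, 1], 3) would
write four lines; the file name here is arbitrary.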
.P
\fBmultimix\fP requires the variables in a partition cell to be stored
contiguously.  The data are therefore rearranged as they are read in,
with the column order given by \fIJP\fP(\fIJ\fP).
\fIIVARTYPE\fP(\fIJ\fP) and \fINCAT\fP(\fIJ\fP) both refer to the
rearranged data.
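.PP
The rearrangement can be pictured with a short Python sketch.  It
assumes, following the definition of \fIJP\fP given below, that
\fIJP\fP(\fIJ\fP) is the 1-based column in which the \fIJ\fPth entry
of each data line is stored; the function name \fIreorder\fP is
hypothetical and the code is not taken from \fBmultimix\fP.
.nf
```python
def reorder(row, jp):
    """Store the J-th entry of row in column JP(J) (both 1-based)."""
    if len(row) != len(jp):
        raise ValueError("row and JP must have the same length")
    if sorted(jp) != list(range(1, len(jp) + 1)):
        raise ValueError("JP must be a permutation of 1..NVAR")
    stored = [None] * len(jp)
    for j, column in enumerate(jp):
        stored[column - 1] = row[j]
    return stored


# The JP example from this page: attributes 3, 4 and 7 come first,
# followed by attributes 1, 2, 5 and 6.
jp = [4, 5, 1, 2, 6, 7, 3]
row = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
stored = reorder(row, jp)
# stored is ["a3", "a4", "a7", "a1", "a2", "a5", "a6"]
```
.fi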
.P
The first five values are:
.TP
.I NG
The number of groups (distributions) in the finite mixture to be fitted.
.TP
.I NOBS
The number of observations.
.TP
.I NVAR
The number of attributes.
.TP
.I NPAR
The number of partition cells (sets of attributes associated within
each distribution).
.TP
.I ISPEC
Flag indicating how the starting point is specified for the fit:
.RS
\fB       1\fP   Initial parameter estimates are specified.
.P
\fB       2\fP   Observations are assigned to groups.
.RE
Next come eight arrays of data:
.TP
.I JP
.IR JP ( J )
is the column of the data array into which the
\fIJ\fPth attribute of the data file will be stored, where \fIJ\fP
varies from 1 to \fINVAR\fP.  For example, suppose we want the third
attribute in the first column, attribute 4 in the second column,
attribute 7 in the 3rd column, and then attributes 1, 2, 5, and 6.
Then JP(J) = 4 5 1 2 6 7 3, for J=1,...,7.
.TP
.I IP
.IR IP ( L )
is the number of attributes in the \fIL\fPth
partition cell, \fIL\fP=1,...,\fINPAR\fP.
.TP
.I IPC
.IR IPC ( L )
is the number of continuous attributes in the
\fIL\fPth partition cell.
.TP
.I ISV
.IR ISV ( L )
gives the index \fIJ\fP of the start of partition
cell \fIL\fP.  E.g. if attributes 6, 7, and 8 are in the same
partition cell \fIL\fP, then ISV(L)=6 and IEV(L)=8.
.TP
.I IEV
.IR IEV ( L )
gives the index \fIJ\fP of the end of partition cell \fIL\fP.
.TP
.I IPARTYPE
.IR IPARTYPE ( L )
is an indicator giving the type of model for partition \fIL\fP:
.RS
\fB       1\fP   for a categorical model.
.P
\fB       2\fP   for a multivariate normal model.
.P
\fB       3\fP   for a location model.
.RE
.TP
.I IVARTYPE
.IR IVARTYPE ( J )
is an indicator for the type of attribute
\fIJ\fP:
.RS
\fB       1\fP   for a categorical attribute.
.P
\fB       2\fP   for a multivariate normal attribute.
.P
\fB       3\fP   for a categorical attribute in a location model.
.P
\fB       4\fP   for a multivariate normal attribute in a location model.
.RE
.TP
.I NCAT
.IR NCAT ( J )
is the number of categories for the \fIJ\fPth categorical attribute.
For continuous attributes, \fINCAT\fP(\fIJ\fP) should be 0.
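.PP
Because the attributes of a partition cell are stored contiguously,
\fIISV\fP and \fIIEV\fP can be derived from the cell sizes \fIIP\fP.
The Python sketch below assumes that the cells occupy consecutive
columns in the order \fIL\fP=1,...,\fINPAR\fP, starting at column 1;
that layout is an assumption of this example, and the function name
\fIcell_bounds\fP is hypothetical.
.nf
```python
def cell_bounds(ip):
    """Return 1-based (ISV, IEV) lists for partition cell sizes ip."""
    isv, iev = [], []
    start = 1
    for size in ip:
        if size < 1:
            raise ValueError("each partition cell needs at least one attribute")
        isv.append(start)
        iev.append(start + size - 1)
        start += size
    return isv, iev


# Three cells holding 2, 3 and 3 attributes: the last cell covers
# columns 6 to 8, as in the ISV/IEV example on this page.
isv, iev = cell_bounds([2, 3, 3])
```
.fi
.PP
Under the same assumption, the cell sizes should add up to \fINVAR\fP,
which gives a quick consistency check on a hand-edited parameter file.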
.PP
If observations are assigned to groups (\fIISPEC\fP=2), then those
assignments are next:
.TP
.I IGRP
.IR IGRP ( I )
is the index of the group that observation \fII\fP
is in.
.PP
If observations are not assigned to groups (\fIISPEC\fP=1), then
estimates of the parameters are next:
.TP
.I PI
.IR PI ( K )
is the estimated mixing proportion for group \fIK\fP
.RI ( K "=1,...," NG ).
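.PP
The mixing proportions of a finite mixture are nonnegative and sum
to 1.  One simple way to obtain starting values, shown here purely as
an illustration and not necessarily what \fBmultimix\fP does
internally, is to use the relative group frequencies of a tentative
assignment.  The function name \fImixing_proportions\fP is
hypothetical.
.nf
```python
from collections import Counter


def mixing_proportions(groups, ng):
    """Return the fraction of observations in each group 1..ng."""
    if not groups:
        raise ValueError("at least one observation is required")
    counts = Counter(groups)
    return [counts.get(k, 0) / len(groups) for k in range(1, ng + 1)]


# Eight observations tentatively assigned to three groups.
pi = mixing_proportions([1, 1, 2, 3, 3, 3, 2, 1], 3)
```
.fi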
.PP
The parameters for each group depend on the type of attribute:
.TP
.I THETA
.IR THETA ( K , J , M )
is the estimated probability that the \fIJ\fPth categorical attribute
is at level \fIM\fP, given that the observation is in group \fIK\fP.
Repeat for each attribute,
.IR J = ISV ( L ),..., IEV ( L ).
\fBcategorical attributes only\fP
.TP
.I EMU
.IR EMU ( K , L , J )
is the estimated mean vector for group \fIK\fP, partition cell \fIL\fP
and attribute \fIJ\fP.
\fBmultivariate normal model only\fP
.TP
.I THETA
.IR THETA ( K , J , M )
is the estimated probability that the \fIJ\fPth categorical attribute
in the location model is at level \fIM\fP, given that the observation
is in group \fIK\fP.
\fBcategorical attributes only\fP
.TP
.I EMUL
.IR EMUL ( K , L , J , M )
is the estimated mean vector for group \fIK\fP, partition cell \fIL\fP
and attribute \fIJ\fP, at the \fIM\fPth level of the categorical
attribute in the location model.
\fBmultivariate normal model only\fP
.TP
.I VARIX
.RI (( VARIX ( K , L , I , J ), J =1, IPC ( L )), " I" =1, IPC ( L ))
An entry in \fIVARIX\fP is the estimated covariance between attributes
\fII\fP and \fIJ\fP for group \fIK\fP, partition cell \fIL\fP, where 
.IR I =1,..., IPC ( L "), and " J =1,..., IPC ( L ).
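.PP
The index notation used for \fIVARIX\fP follows the Fortran
implied-DO convention, in which the inner index \fIJ\fP varies
fastest.  Assuming the values are supplied in that order, a covariance
matrix is listed row by row.  The Python sketch below shows the
ordering only; the function name \fIvarix_order\fP is hypothetical.
.nf
```python
def varix_order(cov):
    """Flatten a square covariance matrix with the column index fastest."""
    n = len(cov)
    if any(len(row) != n for row in cov):
        raise ValueError("covariance matrix must be square")
    return [cov[i][j] for i in range(n) for j in range(n)]


# A 2 x 2 covariance matrix for one group and one partition cell.
cov = [[4.0, 1.0],
       [1.0, 9.0]]
values = varix_order(cov)
# values is [4.0, 1.0, 1.0, 9.0]
```
.fi
.PP
Since a covariance matrix is symmetric, listing it row by row or
column by column yields the same sequence of values.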
.PP
The required parameters are read in for each partition cell,
.IR L =1,..., \fINPAR\fP .
For example, if the attributes within the partition cell are all
categorical, that is,
.IR IPARTYPE ( L )=1,
then
.IR THETA ( K , J , M ),
for
.IR M =1,..., NCAT ( J ),
is required for each attribute \fIJ\fP in that partition cell.
.PP
If the attributes within the partition cell are continuous
multivariate normal attributes, that is,
.IR IPARTYPE ( L )=2,
then estimates of 
.IR EMU ( K , L , J )
are required for each attribute.
.PP
If the attributes within the partition cell follow the location model,
that is,
.IR IPARTYPE ( L )=3,
then
.IR THETA ( K , J , M ), M =1,..., NCAT ( J )
is required for the categorical attribute, and
.IR EMUL ( K , L , J , M ), M =1,..., IM ( L )
is required for each continuous multivariate normal attribute.  (Note
that
.IR IM ( L )
is the number of categories of the categorical attribute associated
with the location model.)
.PP
The estimates are read in first for group 1, then for group 2, etc.
.SH EXAMPLES
See \fI/usr/share/doc/multimix/examples\fP.
.SH FILES
\fIGROUPS.OUT\fP \fBmultimix\fP output: the assignment of the
observations to groups and the posterior probabilities.  If
observations were initially assigned to groups (\fIISPEC\fP=2), these
assignments may be different.  Some are likely to be different if the
fitting distributions overlap.
.P
\fIGENERAL.OUT\fP \fBmultimix\fP output: parameter estimates,
convergence information, and group assignment probabilities.
.P
\fIEMPARAMEST.OUT\fP \fBmultimix\fP output on failure to converge:
current parameter estimates.  This file can then be used as the
parameter input file for \fBmultimix\fP if desired.
.SH AUTHORS
Lynette A. Hunt <lah@waikato.ac.nz> and Murray Jorgensen
<maj@waikato.ac.nz>.
.\" This manual page was written by James R. Van Zandt
.\" <jrv@debian.org>, for the Debian GNU/Linux system (but may be
.\" used by others).
.SH "SEE ALSO"
.nf
.I /usr/share/doc/multimix/paper.ps.gz
.I /usr/share/doc/multimix/talk.ps.gz
.I /usr/share/doc/multimix/notes.ps.gz
.I /usr/share/doc/multimix/PPAPER.ps.gz
.I /usr/share/doc/multimix/alltables.ps.gz
.BR autoclass (1).
.fi