File: reports-c.text


	Contents
	--------
	Generating Reports
        Generating Sigma Contour Values
        Reports Params File Keywords
	Changing Filenames In A Saved Results File


------------------------------------------------------------------------------------
	GENERATING REPORTS
------------------------------------------------------------------------------------

AutoClass provides three standard reports, generated from a "results" file by
invoking:

% autoclass -reports <.results[-bin] file path> <.search file path> 
         <.r-params file path> 
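
When reports are generated from a script, the invocation above can be
assembled programmatically.  The following is a minimal Python sketch, assuming
the autoclass executable is on the search path.  The helper name
build_reports_command is hypothetical and not part of AutoClass.

```python
import shlex

def build_reports_command(results_path, search_path, rparams_path,
                          program="autoclass"):
    """Return the argument list for an `autoclass -reports` invocation.

    The argument order follows the usage line: results file, search file,
    then the reports params file.
    """
    return [program, "-reports", results_path, search_path, rparams_path]

cmd = build_reports_command(
    "sample/imports-85c.results-bin",
    "sample/imports-85c.search",
    "sample/imports-85c.r-params",
)
# Show the command as it would be typed at a shell prompt.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```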

The standard reports are 

	1) attribute influence values: presents the relative influence or 
	   significance of the data's attributes both globally (averaged over
	   all classes), and locally (specifically for each class). A heuristic
	   for relative class strength is also listed;

	2) cross-reference by case (datum) number: lists the primary class 
	   probability for each datum, ordered by case number.  When
           report_mode = "data", additional lesser class probabilities 
           (greater than or equal to 0.001) are listed for each datum;

	3) cross-reference by class number: for each class the primary class
	   probability and any lesser class probabilities (greater than or
           equal to 0.001) are listed for each datum in the class, ordered by 
           case number.  It is also possible to list, for each datum, the
           values of attributes that you select (see xref_class_report_att_list).
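
The selection of probabilities listed for each datum in the cross-reference
reports can be pictured as follows.  This is only an illustrative Python
sketch, not the AutoClass C code: the function name and the sample
probabilities are made up, the 0.001 cutoff is the one quoted above, and the
limit of 5 entries mirrors the default of the max_num_xref_class_probs keyword.

```python
def xref_class_probs(class_probs, max_num=5, threshold=0.001):
    """Pick the class probabilities to list for one datum.

    class_probs maps class number -> membership probability.
    Returns [(class, prob), ...] with the primary (most probable) class
    first, followed by lesser classes whose probability is >= threshold,
    limited to max_num entries in total.
    """
    ranked = sorted(class_probs.items(), key=lambda cp: cp[1], reverse=True)
    primary, lesser = ranked[0], ranked[1:]
    kept = [cp for cp in lesser if cp[1] >= threshold]
    return [primary] + kept[: max_num - 1]

# Hypothetical membership probabilities for a single datum.
probs = {0: 0.9125, 1: 0.0860, 2: 0.0011, 3: 0.0004}
print(xref_class_probs(probs))
```

In this example class 3 is dropped because its probability falls below the
cutoff, so three entries are listed.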

The attribute influence values report attempts to provide relative measures of
the "influence" of the data attributes on the classes found by the classification.
The normalized class strengths, the normalized attribute influence values summed
over all classes, and the individual influence values (I[jkl]) are all only
relative measures.  They carry more meaning than a rank ordering, but should
not be interpreted as anything approaching absolute values.

The reports are output to files whose names and pathnames are taken
from the ".r-params" file pathname.  The report file types (extensions)
are:

	influence values report:  "influ-o-text-n" or "influ-no-text-n"
	cross-reference by case:  "case-text-n"
	cross-reference by class: "class-text-n" 

or, if report_mode is overridden to "data":

	influence values report:  "influ-o-data-n" or "influ-no-data-n"
	cross-reference by case:  "case-data-n"
	cross-reference by class: "class-data-n" 

where n is the classification number from the "results" file.  The first
or best classification is numbered 1, the next best 2, etc.  The default
is to generate reports only for the best classification in the "results"
file.  You can produce reports for other saved classifications by using
report params keywords n_clsfs and clsf_n_list.  The "influ-o-text-n"
file type is the default (order_attributes_by_influence_p = true), and lists
each class's attributes in descending order of attribute influence value.
If the value of order_attributes_by_influence_p is overridden to be false
in the <...>.r-params file, then each class's attributes will be listed
in ascending order by attribute number.  The extension of the file
generated will be "influ-no-text-n".  This method of listing facilitates the
visual comparison of attribute values between classes.
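
The naming scheme described above can be summarized in a few lines of Python.
This is a sketch under stated assumptions: the function name is hypothetical,
and the file type is assumed to be joined to the ".r-params" pathname root with
a ".", as the sample report names suggest.

```python
import os

def report_file_names(rparams_path, n=1, report_mode="text",
                      order_by_influence=True):
    """Derive the three report pathnames from the .r-params pathname.

    n is the classification number (1 = best), report_mode is "text" or
    "data", and order_by_influence mirrors order_attributes_by_influence_p.
    """
    root, _ext = os.path.splitext(rparams_path)   # drop ".r-params"
    influ = "influ-o" if order_by_influence else "influ-no"
    return {
        "influence_values": f"{root}.{influ}-{report_mode}-{n}",
        "xref_case": f"{root}.case-{report_mode}-{n}",
        "xref_class": f"{root}.class-{report_mode}-{n}",
    }

for kind, path in report_file_names("sample/imports-85c.r-params").items():
    print(kind, path)
```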

See sample reports in directory ....autoclass-c/sample/:    
                                                                 
	"imports-85.influ-o-text-1"
	"imports-85.case-text-1"
	"imports-85.class-text-1"

which were generated by a command of the form:

% autoclass -reports sample/imports-85c.results-bin sample/imports-85c.search 
        sample/imports-85c.r-params

with xref_class_report_att_list = 2, 5, 6 in the ".r-params" file.

Logging messages will be written to a ".rlog" file, a separate file from that
used to log messages during search runs (".log").

-------------------------------------------------------------------------------
        GENERATING SIGMA CONTOUR VALUES 
-------------------------------------------------------------------------------

The AutoClass C reports provide the capability to compute sigma class contour 
values for specified pairs of real valued attributes, when generating the 
influence values report with the data option (report_mode = "data").  Note
that sigma class contours are not generated for discrete type attributes.

The sigma contours are the two dimensional equivalent of n-sigma error bars 
in one dimension.  Specifically, for two independent attributes the n-sigma 
contour is defined as the ellipse where 

       ((x - xMean) / xSigma)^2 + ((y - yMean) / ySigma)^2 == n^2

With covariant attributes, the n-sigma contours are defined identically, in 
the rotated coordinate system of the distribution's principal axes.  Thus 
independent attributes give ellipses oriented parallel with the attribute 
axes, while the axes of sigma contours of covariant attributes are rotated 
about the center determined by the means.  In either case the sigma contour 
represents a line where the class probability is constant, irrespective of 
any other class probabilities.
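
For the independent case, points on such a contour can be generated directly.
The Python sketch below is illustrative only and is not taken from the
AutoClass sources; the function names are hypothetical.  It places each point
n standard deviations from the mean in the scaled coordinates, so the
left-hand side of the expression above evaluates to n^2 (which is 1 for the
single sigma contour).

```python
import math

def independent_sigma_contour(x_mean, x_sigma, y_mean, y_sigma,
                              n=1.0, num_points=8):
    """Return points on the axis-aligned n-sigma ellipse of two
    independent attributes."""
    points = []
    for i in range(num_points):
        t = 2.0 * math.pi * i / num_points
        points.append((x_mean + n * x_sigma * math.cos(t),
                       y_mean + n * y_sigma * math.sin(t)))
    return points

def normalized_radius_sq(x, y, x_mean, x_sigma, y_mean, y_sigma):
    """Left-hand side of the contour expression for a point (x, y)."""
    return ((x - x_mean) / x_sigma) ** 2 + ((y - y_mean) / y_sigma) ** 2

# Every generated point gives the same value, n^2.
for x, y in independent_sigma_contour(10.0, 2.0, -3.0, 0.5, n=2.0):
    print(round(normalized_radius_sq(x, y, 10.0, 2.0, -3.0, 0.5), 9))
```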

With three or more attributes the n-sigma contours become k-dimensional
ellipsoidal surfaces.  This code takes advantage of the fact that the 
parallel projection of a k-dimensional ellipsoid, onto any 2-dim plane, 
is an ellipse.  In this simplified case of projecting the single 
sigma ellipsoid onto the coordinate planes, it is also true that the
2-dim covariances of this ellipse are equal to the corresponding
elements of the k-dim ellipsoid's covariances.  The Eigen-system of
the 2-dim covariance then gives the variances w.r.t. the principal
axes of the ellipse, and the rotation that aligns it with the
data.  This represents the best way to display a distribution in the 
marginal plane.
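
The projection and Eigen-system steps can be sketched in Python as follows.
This is a closed-form solution for a symmetric 2x2 matrix written for
illustration; it is not the routine used by AutoClass C, the function names
and the example covariance matrix are hypothetical, and the convention of
measuring the rotation counter-clockwise from the first attribute's axis is an
assumption.

```python
import math

def marginal_covariance(cov, i, j):
    """Pick the 2x2 covariance of attributes i and j out of the full
    covariance matrix (projection onto a coordinate plane)."""
    return [[cov[i][i], cov[i][j]],
            [cov[j][i], cov[j][j]]]

def ellipse_from_covariance(c2):
    """Return (sigma_major, sigma_minor, rotation_radians) of the single
    sigma ellipse for a symmetric 2x2 covariance matrix."""
    a, b, d = c2[0][0], c2[0][1], c2[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    var_major = mean + radius            # larger eigenvalue
    var_minor = mean - radius            # smaller eigenvalue
    rotation = 0.5 * math.atan2(2.0 * b, a - d)   # angle of the major axis
    return math.sqrt(var_major), math.sqrt(max(var_minor, 0.0)), rotation

# Hypothetical covariance of three real valued attributes.
full = [[4.0, 0.0, 1.5],
        [0.0, 9.0, 0.0],
        [1.5, 0.0, 1.0]]

print(ellipse_from_covariance(marginal_covariance(full, 0, 1)))  # independent pair
print(ellipse_from_covariance(marginal_covariance(full, 0, 2)))  # covariant pair
```

For the independent pair the off-diagonal element is zero, so the ellipse axes
stay parallel to the attribute axes; for the covariant pair the rotation is
non-trivial.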


-------------------------------------------------------------------------------
        REPORTS PARAMS FILE KEYWORDS
-------------------------------------------------------------------------------

# PARAMETERS TO AUTOCLASS-REPORTS -- AutoClass C
# ---------------------------------------------------------------
# as the first character makes the line a comment, or
! as the first character makes the line a comment, or
; as the first character makes the line a comment, or
;;; '\n' as the first character (empty line) makes the line a comment.

# to override the following default parameters,
# enter below the line => #!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;
# <parameter_name> = <parameter_value>, or
# <parameter_name> <parameter_value>, or      # separator is a space
# <parameter_name>\tab<parameter_value>.
# note: blanks/spaces are ignored if '=', or '\tab' are separators;
# note: no trailing ';'s.
# ---------------------------------------------------------------
#  DEFAULT PARAMETERS
# ---------------------------------------------------------------
# n_clsfs = 1
!       number of clsfs in the .results file for which to generate reports,
!       starting with the first or "best".

# clsf_n_list = 
!       if specified, this is a one-based index list of clsfs in the clsf
!       sequence read from the .results file.  It overrides "n_clsfs".
!       For example: clsf_n_list = 1, 2 
!           will produce the same output as
!                    n_clsfs = 2
!           but
!                    clsf_n_list = 2
!           will only output the "second best" classification report.

# report_type = "all"
!       type of reports to generate: "all", "influence_values", "xref_case", or
!       "xref_class".

# report_mode = "text"
!       mode of reports to generate. "text" is formatted text layout.  "data"
!       is numerical -- suitable for further processing.

# comment_data_headers_p = false
!       the default value does not insert # in column 1 of most 
!       report_mode = "data" header lines.  If specified as true, the comment 
!       character will be inserted in most header lines.

#  num_atts_to_list = 
!       if specified, the number of attributes to list in influence values report.
!       if not specified, *all* attributes will be listed. 
!       (e.g. num_atts_to_list = 5)

# xref_class_report_att_list = 
!       if specified, a list of attribute numbers (zero-based), whose values will 
!       be output in the "xref_class" report along with the case probabilities.  
!       if not specified, no attribute values will be output. 
!       (e.g. xref_class_report_att_list = 1, 2, 3)

# order_attributes_by_influence_p = true
!       The default value lists each class's attributes in descending order of
!       attribute influence value, and uses ".influ-o-text-n" as the
!       influence values report file type.  If specified as false, then each 
!       class's attributes will be listed in ascending order by attribute number.  
!       The extension of the file generated will be "influ-no-text-n".

# break_on_warnings_p = true
!       The default value asks the user whether to continue or not when data
!       definition warnings are found.  If specified as false, then AutoClass
!       will continue, despite warnings -- the warning will continue to be
!       output to the terminal.

# free_storage_p = true
!       The default value tells AutoClass to free the majority of its allocated
!       storage.  This is not required, and on DEC Alphas it causes a
!       core dump.  If specified as false, AutoClass will not attempt to free
!       storage.

# max_num_xref_class_probs = 5
!       Determines how many lesser class probabilities will be printed for the 
!       case and class cross-reference reports.  The default is to print the
!       most probable class probability value and up to 4 lesser class
!       probabilities.  Note this is true for both the "text" and "data" class
!       cross-reference reports, but only true for the "data" case cross-
!       reference report.  The "text" case cross-reference report only has the
!       most probable class probability.

# sigma_contours_att_list = 
!       If specified, a list of real valued attribute indices (from .hd2 file) 
!       that will be used to compute sigma class contour values, when generating 
!       influence values report with the data option (report_mode = "data"). 
!       If not specified, there will be no sigma class contour output.
!       (e.g. sigma_contours_att_list = 3, 4, 5, 8, 15)



#!#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;
# OVERRIDE PARAMETERS
#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;
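
The comment and separator rules of the params file, and the way clsf_n_list
overrides n_clsfs, can be illustrated with a small Python reader.  This is a
sketch based only on the rules stated in the file header above; it is not the
AutoClass C parser, the function names are hypothetical, and values are kept
as uninterpreted strings.

```python
COMMENT_CHARS = "#!;"

def parse_override_line(line):
    """Return (name, value) for a parameter line, or None for a comment
    or empty line."""
    if not line.strip() or line[0] in COMMENT_CHARS:
        return None
    body = line.strip()
    if "=" in body:                       # <name> = <value>
        name, _, value = body.partition("=")
    elif "\t" in body:                    # <name>\tab<value>
        name, _, value = body.partition("\t")
    else:                                 # <name> <value>
        name, _, value = body.partition(" ")
    return name.strip(), value.strip()

def parse_overrides(text):
    """Collect all parameter lines of a params file into a dict."""
    params = {}
    for line in text.splitlines():
        parsed = parse_override_line(line)
        if parsed is not None:
            params[parsed[0]] = parsed[1]
    return params

def selected_clsfs(params):
    """One-based classification numbers to report; clsf_n_list, if given,
    overrides n_clsfs (default 1)."""
    if params.get("clsf_n_list"):
        return [int(tok) for tok in params["clsf_n_list"].split(",")]
    return list(range(1, int(params.get("n_clsfs", "1")) + 1))

text = '''# a comment
! another comment
; and another

report_mode = "data"
n_clsfs 2
clsf_n_list\t1, 2
'''
params = parse_overrides(text)
print(params)
print(selected_clsfs(params))
```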



-------------------------------------------------------------------------------
	CHANGING FILENAMES IN A SAVED RESULTS FILE
-------------------------------------------------------------------------------

AutoClass caches the data, header, and model file pathnames in the saved
classification structure in the ascii ".results" file.  If the ".results"
and ".search" files are moved to a different directory location, the 
search cannot be successfully restarted if you use absolute pathnames.
Thus it is advantageous to invoke AutoClass in a parent directory
of the data, header, and model files, so that relative pathnames can
be used.  The pathnames cached will then be relative, and the files
can be moved to a different host or file system and restarted.

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------