Contents
--------
Generating Reports
Generating Sigma Contour Values
Reports Params File Keywords
Changing Filenames In A Saved Results File
------------------------------------------------------------------------------------
GENERATING REPORTS
------------------------------------------------------------------------------------
AutoClass provides three standard reports, generated from "results" files by
invoking:
% autoclass -reports <.results[-bin] file path> <.search file path> <.r-params file path>
The standard reports are:
1) attribute influence values: presents the relative influence or
significance of the data's attributes both globally (averaged over
all classes), and locally (specifically for each class). A heuristic
for relative class strength is also listed;
2) cross-reference by case (datum) number: lists the primary class
probability for each datum, ordered by case number. When
report_mode = "data", additional lesser class probabilities
(greater than or equal to 0.001) are listed for each datum;
3) cross-reference by class number: for each class the primary class
probability and any lesser class probabilities (greater than or
equal to 0.001) are listed for each datum in the class, ordered by
case number. For each datum, the values of attributes you select can
also be listed.
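The probability listing rule described above (the primary class first, then lesser classes at or above 0.001) can be sketched as follows. This is a minimal Python illustration, not AutoClass's own C code: the function name and the dictionary input are hypothetical, and the cap stands in for the max_num_xref_class_probs keyword (default 5) described in the params file section.

```python
from typing import Dict, List, Tuple

# Lesser class probabilities below this threshold are not listed.
MIN_LESSER_PROB = 0.001


def xref_class_probs(class_probs: Dict[int, float],
                     max_num_xref_class_probs: int = 5) -> List[Tuple[int, float]]:
    """Return (class_number, probability) pairs for one datum (hypothetical helper).

    The first pair is the most probable class. It is followed by up to
    max_num_xref_class_probs - 1 lesser classes whose probability is
    >= MIN_LESSER_PROB, in descending order of probability.
    """
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    primary, lesser = ranked[0], ranked[1:]
    kept = [kv for kv in lesser if kv[1] >= MIN_LESSER_PROB]
    return [primary] + kept[:max_num_xref_class_probs - 1]


if __name__ == "__main__":
    # Class 0 falls below the 0.001 threshold and is dropped.
    print(xref_class_probs({0: 0.0005, 1: 0.90, 2: 0.0945, 3: 0.005}))
    # prints [(1, 0.9), (2, 0.0945), (3, 0.005)]
```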
The attribute influence values report attempts to provide relative measures of
the "influence" of the data attributes on the classes found by the classification.
The normalized class strengths, the normalized attribute influence values summed
over all classes, and the individual influence values (I[jkl]) are all only
relative measures. They carry more meaning than a simple rank ordering, but
should not be interpreted as anything approaching absolute values.
The reports are output to files whose names and pathnames are taken
from the ".r-params" file pathname. The report file types (extensions)
are:
influence values report: "influ-o-text-n" or "influ-no-text-n"
cross-reference by case: "case-text-n"
cross-reference by class: "class-text-n"
or, if report_mode is overridden to "data":
influence values report: "influ-o-data-n" or "influ-no-data-n"
cross-reference by case: "case-data-n"
cross-reference by class: "class-data-n"
where n is the classification number from the "results" file. The first
or best classification is numbered 1, the next best 2, etc. The default
is to generate reports only for the best classification in the "results"
file. You can produce reports for other saved classifications by using the
report params keywords n_clsfs and clsf_n_list. The "influ-o-text-n"
file type is the default (order_attributes_by_influence_p = true), and lists
each class's attributes in descending order of attribute influence value.
If the value of order_attributes_by_influence_p is overridden to be false
in the <...>.r-params file, then each class's attributes will be listed
in ascending order by attribute number. The extension of the file
generated will be "influ-no-text-n". This method of listing facilitates the
visual comparison of attribute values between classes.
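The naming scheme above can be summarized in a short Python sketch. It assumes that each report's base name is the ".r-params" pathname with its ".r-params" extension removed; that assumption, and the function names, are illustrative rather than taken from AutoClass's code. The sketch also ignores the report_type keyword and always lists all three report types.

```python
import os
from typing import List, Optional


def clsf_numbers(n_clsfs: int = 1, clsf_n_list: Optional[List[int]] = None) -> List[int]:
    """One-based classification numbers to report; clsf_n_list overrides n_clsfs."""
    if clsf_n_list:
        return list(clsf_n_list)
    return list(range(1, n_clsfs + 1))


def report_filenames(r_params_path: str, report_mode: str = "text",
                     order_attributes_by_influence_p: bool = True,
                     n_clsfs: int = 1,
                     clsf_n_list: Optional[List[int]] = None) -> List[str]:
    """Hypothetical helper listing the report files for the selected classifications."""
    # ASSUMPTION: base name = .r-params path minus its extension.
    base, _ext = os.path.splitext(r_params_path)
    influ = "influ-o" if order_attributes_by_influence_p else "influ-no"
    names = []
    for n in clsf_numbers(n_clsfs, clsf_n_list):
        for kind in (influ, "case", "class"):
            # e.g. <base>.influ-o-text-1, <base>.case-data-2
            names.append(f"{base}.{kind}-{report_mode}-{n}")
    return names
```

For instance, report_filenames("x.r-params", report_mode="data", clsf_n_list=[2]) lists "x.influ-o-data-2", "x.case-data-2" and "x.class-data-2".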
See sample reports in directory ....autoclass-c/sample/:
"imports-85.influ-o-text-1"
"imports-85.case-text-1"
"imports-85.class-text-1"
which were generated by the command:
% autoclass -reports sample/imports-85c.results-bin sample/imports-85c.search sample/imports-85c.r-params
with xref_class_report_att_list = 2, 5, 6 in the ".r-params" file.
Logging messages will be written to a ".rlog" file, a separate file from that
used to log messages during search runs (".log").
-------------------------------------------------------------------------------
GENERATING SIGMA CONTOUR VALUES
-------------------------------------------------------------------------------
The AutoClass C reports can compute sigma class contour values for specified
pairs of real-valued attributes when generating the influence values report
with the data option (report_mode = "data"). Note that sigma class contours
are not generated for discrete attributes.
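As an illustration, override lines such as the following in the ".r-params" file would request sigma contour output. The attribute indices 3 and 4 are hypothetical placeholders; substitute the indices of real-valued attributes from your own ".hd2" file.

```text
report_mode = "data"
sigma_contours_att_list = 3, 4
```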
The sigma contours are the two-dimensional equivalent of n-sigma error bars
in one dimension. Specifically, for two independent attributes the n-sigma
contour is defined as the ellipse where
((x - xMean) / xSigma)^2 + ((y - yMean) / ySigma)^2 == n^2
which reduces to |x - xMean| == n * xSigma along each attribute axis.
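For the single sigma case (n = 1), the contour of two independent attributes can be traced by parametrizing the ellipse with an angle. This Python sketch is illustrative only; the function names and the number of sample points are hypothetical choices, not AutoClass output settings.

```python
import math
from typing import List, Tuple


def sigma_contour_independent(x_mean: float, x_sigma: float,
                              y_mean: float, y_sigma: float,
                              num_points: int = 8) -> List[Tuple[float, float]]:
    """Sample points on the single sigma contour of two independent attributes.

    Every returned point satisfies
    ((x - x_mean) / x_sigma)**2 + ((y - y_mean) / y_sigma)**2 == 1
    up to floating-point rounding.
    """
    points = []
    for i in range(num_points):
        t = 2.0 * math.pi * i / num_points
        points.append((x_mean + x_sigma * math.cos(t),
                       y_mean + y_sigma * math.sin(t)))
    return points


def contour_value(x: float, y: float, x_mean: float, x_sigma: float,
                  y_mean: float, y_sigma: float) -> float:
    """Left-hand side of the contour equation for the point (x, y)."""
    return ((x - x_mean) / x_sigma) ** 2 + ((y - y_mean) / y_sigma) ** 2
```

Because the axes are independent, the ellipse's semi-axes are simply xSigma and ySigma, aligned with the attribute axes.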
With covariant attributes, the n-sigma contours are defined identically, in
the rotated coordinate system of the distribution's principal axes. Thus
independent attributes give ellipses oriented parallel with the attribute
axes, while the axes of sigma contours of covariant attributes are rotated
about the center determined by the means. In either case the sigma contour
is a curve along which the class probability density is constant,
irrespective of any other class probabilities.
With three or more attributes the n-sigma contours become k-dimensional
ellipsoidal surfaces, where k is the number of attributes. This code takes
advantage of the fact that the parallel projection of a k-dimensional
ellipsoid onto any 2-dim plane is an ellipse. In the simplified case of
projecting the single sigma ellipsoid onto the coordinate planes, it is
also true that the 2-dim covariances of this ellipse are equal to the
corresponding elements of the k-dim ellipsoid's covariances. The
eigensystem of the 2-dim covariance then gives the variances w.r.t. the
principal axes of the ellipse, and the rotation that aligns it with the
data. This is the best way to display a distribution in the
marginal plane.
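The projection and eigensystem step for one attribute pair can be sketched in Python as follows. The sketch assumes the class's k-dimensional covariance matrix is available as a nested list (a hypothetical input format, not AutoClass's internal representation) and uses the closed-form eigen-solution of a symmetric 2x2 matrix in place of a general eigen-solver.

```python
import math
from typing import List, Tuple


def marginal_sigma_ellipse(cov: List[List[float]], i: int, j: int) -> Tuple[float, float, float]:
    """Return (semi_major, semi_minor, angle) of the single sigma ellipse
    obtained by projecting a k-dim ellipsoid onto attributes i and j.

    angle is in radians, measured from attribute i's axis to the major axis.
    """
    # The projected ellipse's covariance is the 2x2 sub-block of the k-dim covariance.
    a, b, d = cov[i][i], cov[i][j], cov[j][j]
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]].
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam_major, lam_minor = mean + radius, mean - radius
    # Rotation of the major axis: tan(2 * angle) = 2b / (a - d).
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    # Semi-axes are the standard deviations along the principal axes.
    return math.sqrt(lam_major), math.sqrt(max(lam_minor, 0.0)), angle
```

With cov = [[1.0, 0.5], [0.5, 1.0]] the major axis lies at 45 degrees (pi / 4) with variances 1.5 and 0.5 along the principal axes.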
-------------------------------------------------------------------------------
REPORTS PARAMS FILE KEYWORDS
-------------------------------------------------------------------------------
# PARAMETERS TO AUTOCLASS-REPORTS -- AutoClass C
# ---------------------------------------------------------------
# as the first character makes the line a comment, or
! as the first character makes the line a comment, or
; as the first character makes the line a comment, or
;;; '\n' as the first character (empty line) makes the line a comment.
# to override the following default parameters,
# enter below the line => #!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;
# <parameter_name> = <parameter_value>, or
# <parameter_name> <parameter_value>, or # separator is a space
# <parameter_name>\tab<parameter_value>.
# note: blanks/spaces are ignored if '=', or '\tab' are separators;
# note: no trailing ';'s.
# ---------------------------------------------------------------
# DEFAULT PARAMETERS
# ---------------------------------------------------------------
# n_clsfs = 1
! number of clsfs in the .results file for which to generate reports,
! starting with the first or "best".
# clsf_n_list =
! if specified, this is a one-based index list of clsfs in the clsf
! sequence read from the .results file. It overrides "n_clsfs".
! For example: clsf_n_list = 1, 2
! will produce the same output as
! n_clsfs = 2
! but
! clsf_n_list = 2
! will only output the "second best" classification report.
# report_type = "all"
! type of reports to generate: "all", "influence_values", "xref_case", or
! "xref_class".
# report_mode = "text"
! mode of reports to generate. "text" is formatted text layout. "data"
! is numerical -- suitable for further processing.
# comment_data_headers_p = false
! the default value does not insert # in column 1 of most
! report_mode = "data" header lines. If specified as true, the comment
! character will be inserted in most header lines.
# num_atts_to_list =
! if specified, the number of attributes to list in influence values report.
! if not specified, *all* attributes will be listed.
! (e.g. num_atts_to_list = 5)
# xref_class_report_att_list =
! if specified, a list of attribute numbers (zero-based), whose values will
! be output in the "xref_class" report along with the case probabilities.
! if not specified, no attribute values will be output.
! (e.g. xref_class_report_att_list = 1, 2, 3)
# order_attributes_by_influence_p = true
! The default value lists each class's attributes in descending order of
! attribute influence value, and uses ".influ-o-text-n" as the
! influence values report file type. If specified as false, then each
! class's attributes will be listed in ascending order by attribute number.
! The extension of the file generated will be "influ-no-text-n".
# break_on_warnings_p = true
! The default value asks the user whether to continue or not when data
! definition warnings are found. If specified as false, then AutoClass
! will continue despite warnings; the warnings will still be
! output to the terminal.
# free_storage_p = true
! The default value tells AutoClass to free the majority of its allocated
! storage. This is not required, and on DEC Alphas it causes a
! core dump. If specified as false, AutoClass will not attempt to free
! storage.
# max_num_xref_class_probs = 5
! Determines how many class probabilities will be printed for the
! case and class cross-reference reports. The default is to print the
! most probable class probability value and up to 4 lesser class
! probabilities. Note this is true for both the "text" and "data" class
! cross-reference reports, but only true for the "data" case cross-
! reference report. The "text" case cross-reference report only has the
! most probable class probability.
# sigma_contours_att_list =
! If specified, a list of real-valued attribute indices (from .hd2 file)
! will be used to compute sigma class contour values, when generating
! influence values report with the data option (report_mode = "data").
! If not specified, there will be no sigma class contour output.
! (e.g. sigma_contours_att_list = 3, 4, 5, 8, 15)
#!#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;
# OVERRIDE PARAMETERS
#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;#!;
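The line syntax described at the top of this params file can be sketched as a small parser. This is a hypothetical Python illustration of the stated rules, not AutoClass's own parser (which is written in C). It returns values as raw strings without interpreting quotes or comma-separated lists, and it strips leading whitespace before checking the first character.

```python
from typing import Dict

# A line whose first character is one of these is a comment.
COMMENT_CHARS = ("#", "!", ";")


def parse_r_params(text: str) -> Dict[str, str]:
    """Parse .r-params lines into {parameter_name: raw_value_string}."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Empty lines and comment lines are ignored.
        if not line or line.startswith(COMMENT_CHARS):
            continue
        # Separators, in order of preference: '=', tab, then a single space.
        if "=" in line:
            name, value = line.split("=", 1)
        elif "\t" in line:
            name, value = line.split("\t", 1)
        else:
            name, _, value = line.partition(" ")
        params[name.strip()] = value.strip()
    return params
```

For example, the three lines report_mode = "data", n_clsfs 2, and clsf_n_list<tab>1, 2 yield report_mode, n_clsfs and clsf_n_list entries with the raw values '"data"', '2' and '1, 2'.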
-------------------------------------------------------------------------------
CHANGING FILENAMES IN A SAVED RESULTS FILE
-------------------------------------------------------------------------------
AutoClass caches the data, header, and model file pathnames in the saved
classification structure in the ASCII ".results" file. If the ".results"
and ".search" files are moved to a different directory location, the
search cannot be successfully restarted if absolute pathnames were used.
Thus it is advantageous to invoke AutoClass in a parent directory
of the data, header, and model files, so that relative pathnames can
be used. The cached pathnames will then be relative, and the files
can be moved to a different host or file system and the search restarted.
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------