AUTOCLASS C VERSION 3.2 NOTES
======================================================================
======================================================================
Documentation:
------------------------------
1. autoclass-c/doc/search-c.text -
Added a new section: 14.0 How to get AutoClass C to Produce
Repeatable Results.
Added information about running AutoClass C with more than 1000
attributes in section: 10.0 Do I Have Enough Memory and Disk Space?
Changed the behavior of search parameter force_new_search_p in
order to prevent search trials from being inadvertently lost:
if TRUE, will ignore any previous search results, discarding the
existing .search & .results[-bin] files after confirmation by the
user; if FALSE, will continue the search using the existing
.search & .results[-bin] files. The default value of
force_new_search_p is now true.
2. autoclass-c/doc/interpretation-c.text -
Added section headings and a new section entitled: Comparing
Influence Report Class Weights And Class/Case Report Assignments
3. autoclass-c/doc/preparation-c.text -
Added more to section: 1.2.1 SINGLE_NORMAL_CN/CM and
MULTI_NORMAL_CN Models
4. autoclass-c/doc/reports-c.text -
Improved the last paragraph of Generating Sigma Contour Values.
Replace parameters start_sigma_contours_att and
stop_sigma_contours_att with sigma_contours_att_list, to allow
non-contiguous groups of attributes to be specified.
Programming:
------------------------------
1. autoclass-c/prog/globals.c -
Update "G_ac_version" to 3.2.
2. autoclass-c/prog/intf-reports.c -
In INFLUENCE_VALUES_HEADER, change
`fprintf( influence_report_fp, header);' to
`fprintf( influence_report_fp, header, "");', and in
CLASS_WEIGHTS_AND_STRENGTHS and CLASS_DIVERGENCES add args to
output_title fprintf for new page -- this prevents
segmentation faults, when the number of attributes exceeds one
page, while in report_mode = "text".
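
The fprintf change in item 2 can be illustrated with a minimal
sketch. It assumes the header text contains a single %s conversion,
which these notes do not state. The header string and the function
name print_header are hypothetical and are not taken from
intf-reports.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical header text with one %s conversion (an assumption;
   the real header string in intf-reports.c is not reproduced here). */
static const char *header = "%sINFLUENCE VALUES\n";

/* Writes the header and supplies an argument for the %s conversion.
   Calling fprintf(fp, header) with no argument would be undefined
   behavior, because fprintf would fetch an argument that was never
   passed. Returns the number of characters written. */
int print_header(FILE *fp, const char *prefix)
{
    assert(fp != NULL && prefix != NULL);
    return fprintf(fp, header, prefix);
}
```

Passing "" as the prefix, as the fix does, satisfies the conversion
without changing the printed text.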
3. autoclass-c/prog/intf-sigma-contours.c -
In COMPUTE_SIGMA_CONTOUR_FOR_2_ATTS, corrected initialization
of *rotation. This corrects erroneous values of the contour's
rotation.
4. autoclass-c/prog/struct-class.c -
Correct compiler warning "struct-class.c:239: warning:
unused variable `database'".
5. autoclass-c/prog/struct-data.c, globals.h, globals.c, search-control.c -
In EXPAND_DATABASE, use comp_database->n_data rather than
G_s_params_n_data, since G_s_params_n_data does not do the right thing
when expand_database is called during report generation (it reads
the whole file, not just n_data cases). Remove references to
G_s_params_n_data from the 2nd to 4th files.
6. autoclass-c/prog/intf-reports.c -
In XREF_GET_DATA, allocate more storage for instance class
probabilities if there are more than MAX_NUM_XREF_CLASS_PROBS, and
only save for printing a maximum of MAX_NUM_XREF_CLASS_PROBS classes.
IMPORTANT NOTE: This bug fix means that for any previous reports
generated by AutoClass C, any data base instance
which has five class probability entries in the class cross-reference
report, and 1.0 minus the sum of the five probabilities is greater
than the largest of them, is in the WRONG CLASS! Re-run the reports
with this version!
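
The condition in the IMPORTANT NOTE above can be written as a small
check, which may help when screening old cross-reference reports.
This is only a sketch of the stated rule. The function name
xref_entry_is_suspect is hypothetical and is not part of AutoClass C.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if an entry from an old class cross-reference report
   matches the rule in the note: exactly five class probabilities are
   listed, and 1.0 minus their sum exceeds the largest of them.
   Returns 0 otherwise. */
int xref_entry_is_suspect(const double *probs, size_t n_probs)
{
    double sum = 0.0;
    double largest;
    size_t i;

    assert(probs != NULL);
    if (n_probs != 5)
        return 0;

    largest = probs[0];
    for (i = 0; i < n_probs; i++) {
        sum += probs[i];
        if (probs[i] > largest)
            largest = probs[i];
    }
    /* Unlisted probability mass larger than any listed class. */
    return (1.0 - sum) > largest;
}
```

For example, five entries of 0.15, 0.12, 0.10, 0.08 and 0.05 leave
0.50 unaccounted for, which exceeds 0.15, so that instance would be
flagged.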
7. autoclass-c/prog/autoclass.c -
Print the AutoClass C version when the user invokes AutoClass
with no arguments: % autoclass
8. autoclass-c/load-ac -
Specified define flags for SunOS gcc and Solaris gcc compilations
to prevent compiler warnings. Added IRIX 6.4 compatibility.
9. autoclass-c/prog/autoclass.h -
For gcc under SunOS, include function prototypes for *rand48 functions,
to prevent compiler warnings.
10. autoclass-c/prog/intf-reports.c -
Add descriptive text for each influence value class parameter for
reports with parameter report_mode = "text".
11. autoclass-c/prog/autoclass.make.solaris.cc -
Corrected optimization flag.
12. autoclass-c/prog/intf-reports.c -
In FORMAT_REAL_ATTRIBUTE, correct correlation matrices print-out
for non-contiguous model term attributes, and print matrices only once,
after all class attributes are listed.
13. autoclass-c/prog/search-control.c -
In AUTOCLASS_SEARCH, if force_new_search_p is false, exit if there
is no <...>.results[-bin] file. Make TRUE the default for
force_new_search_p.
14. autoclass-c/prog/intf-reports.c -
In PRINT_ATTRIBUTE_HEADER, remove references to INTEGER attribute type.
15. autoclass-c/prog/getparams.c -
In GETPARAMS, correct logic so that missing "line feed" on last line
of the file will be read properly, rather than getting:
ERROR: line read exceeds 100 characters: <.....>.
In GETPARAMS, correct logic so that an empty integer list (e.g.
start_j_list =) may be entered in the .s-params file. This is needed
for a restart search situation when it is necessary to peel off as many
classes from the start_j_list as were already done by the previous run.
If all of the start_j_list was done already, then an empty list is
required.
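
One way to read a final line that lacks a line feed is sketched
below. It is not the GETPARAMS code; the function name
read_param_line and its return codes are assumptions made for
illustration. The idea is to treat a missing newline as "line too
long" only when more input actually follows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf and strips the trailing newline if present.
   Returns 1 when a line was read, 0 at end of file, and -1 when the
   line does not fit in buf. */
int read_param_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;

    assert(fp != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline was stored: either this is the last line of the file
       (acceptable) or the line is longer than the buffer. */
    c = getc(fp);
    if (c == EOF)
        return 1;
    ungetc(c, fp);
    return -1;
}
```

With this approach a file ending in "start_j_list =" and no line feed
yields that text as an ordinary line rather than an error.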
16. autoclass-c/prog/io-read-data.c, io-results.c, io-results-bin.c -
In READ_DATA, EXPAND_CLSF_WTS, and LOAD_CLASS_DS_S add checks for
"out of memory" returns from malloc and realloc.
17. autoclass-c/prog/io-results.c -
In MAKE_AND_VALIDATE_PATHNAME, VALIDATE_RESULTS_PATHNAME,
VALIDATE_DATA_PATHNAME, and GET_CLSF_SEQ change strchr to strrchr
to handle `../filename.extension'
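
The difference between strchr and strrchr for such pathnames can be
seen in a short sketch. The helper find_extension is hypothetical and
is not one of the io-results.c functions; it only shows why searching
from the end of the string finds the extension when the path begins
with `../'.

```c
#include <assert.h>
#include <string.h>

/* Returns a pointer to the extension of path, including the dot, or
   NULL if the file name has none. strrchr finds the LAST dot, so the
   dots in a leading "../" are not mistaken for the extension. */
const char *find_extension(const char *path)
{
    const char *dot;
    const char *slash;

    assert(path != NULL);
    dot = strrchr(path, '.');
    slash = strrchr(path, '/');

    /* A dot located before the last slash belongs to a directory
       component such as "..", not to the file name. */
    if (dot == NULL || (slash != NULL && dot < slash))
        return NULL;
    return dot;
}
```

For `../filename.extension', strchr would stop at the first character
of the path, while the sketch above returns `.extension'.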
18. autoclass-c/prog/autoclass.h, predictions.c, search-basic.c, &
search-control.c -
Notify the user with a warning message and an option to exit from
an initial classification run, if the data set size is greater than
1000. The message is "WARNING: the default start_j_list may not
find the correct number of classes in your data set!".
19. autoclass-c/prog/autoclass.h, autoclass.c, & intf-reports.c -
Write -reports option screen output to log file.
20. autoclass-c/prog/io-read-data.c -
In FIND_DISCRETE_STATS, when the number of discrete value
translators is less than attribute definition range, reduce the
range and output an advisory, rather than outputting warning
message and asking the user whether to proceed or not.
The above change was REMOVED, since it caused an incompatibility with
previous results files: "ERROR: expand_database found unmatched common
attributes defs in <.results[-bin] file> and ........".
21. autoclass-c/prog/globals.h, globals.c, search-control-2.c, & search-control.c -
Warn user of search trials which do not converge, which means that
their number of try cycles reached the value of the "max_cycles" search
parameter. Do this by printing a warning message after the trial completes.
Also after the "SUMMARY OF n BEST RESULTS" at the conclusion of each
run, print "SUMMARY OF TRY CONVERGENCE" for the n best results.
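
The convergence test described in item 21 amounts to comparing each
trial's cycle count with max_cycles. A minimal sketch follows. The
struct try_summary, the function name, and the per-trial line format
are all assumptions made for illustration; only the
"SUMMARY OF TRY CONVERGENCE" title comes from these notes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record of one search trial; not an AutoClass C type. */
struct try_summary {
    int try_number;  /* index of the trial within the run */
    int n_cycles;    /* cycles the trial actually used */
};

/* Prints one line per trial and returns the number of trials whose
   cycle count reached max_cycles, that is, which did not converge. */
int print_convergence_summary(FILE *fp, const struct try_summary *tries,
                              int n_tries, int max_cycles)
{
    int n_unconverged = 0;
    int i;

    assert(fp != NULL && tries != NULL && n_tries >= 0);
    fprintf(fp, "SUMMARY OF TRY CONVERGENCE\n");
    for (i = 0; i < n_tries; i++) {
        int converged = tries[i].n_cycles < max_cycles;
        if (!converged)
            n_unconverged++;
        fprintf(fp, "try %d: %d cycles, %s\n", tries[i].try_number,
                tries[i].n_cycles,
                converged ? "converged" : "did NOT converge");
    }
    return n_unconverged;
}
```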
22. autoclass-c/prog/model-multi-normal-cn.c -
It was recently brought to our attention that the multi-normal
model, with more than about 10 attributes and several thousand
instances, would consistently run to the max_duration or
max_n_tries limit, regardless of how large those limits were.
Suitably instrumented experiments showed that EM (expectation
maximization) was actually oscillating. The problem was traced
to a conceptual error in the underflow limiting code that
constrains the estimation of empirical standard deviations.
This has been corrected. However, users should be alert for,
and report, any further problems of this nature.
23. autoclass-c/prog/autoclass.h, intf-reports.c -
For MNcn attributes, do not sort them within their model term
when order_attributes_by_influence_p = false. The outputting of
MNcn correlation matrices after last class attribute, instead of
after each term, is now done by a call to
GENERATE_MNCN_CORRELATION_MATRICES from
AUTOCLASS_CLASS_INFLUENCE_VALUES_REPORT.
24. autoclass-c/prog/intf-reports.c, intf-sigma-contours.c -
Replace report parameters start_sigma_contours_att and
stop_sigma_contours_att with sigma_contours_att_list, to allow
non-contiguous groups of attributes to be specified.
Check for attribute indices of reports parameter
sigma_contours_att_list which are declared "ignore" by the .model file.
Prevents segmentation fault.
Correct erroneous rotations for non-covariant pairs of attributes
modeled in two different covariant normal terms (the rotations in these
cases should be 0.0).
25. autoclass-c/prog/intf-reports.c -
Previously when specifying report_type = "xref_case" or
report_type = "xref_class" along with n_clsfs > 1 or clsf_n_list with
more than 1 list element, the .case-text-n or .class-text-n data would
be identical. Sometimes segmentation faults would occur. This has
been corrected. This was not a problem for report_type = "all"
(the default). Also when using the default for report_type ("all"),
previously the memory allocated for each classification's cross
reference was not deallocated after each classification was processed.
It is now properly deallocated.
======================================================================