File: prediction-c.text

		PREDICTION USING CLASSIFICATIONS

    Classifications can be used to predict class membership for new cases.
In addition to giving you some insight into the structure behind your data,
AutoClass can therefore be used directly to make predictions, and its results
can be compared with those of other learning systems.

    This technique for predicting class probabilities is applicable to all
attributes, regardless of data type/sub_type or likelihood model term type.

    If the class membership of a data case does not exceed 0.0099999 for any
of the "training" classes, the following message will appear in the screen
output for that case:
        xref_get_data: case_num xxx => class 9999
Class 9999 members will appear in the "case" and "class" cross-reference 
reports with a class membership of 1.0.
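
    The rule above can be sketched in a few lines of Python.  This is only an
illustration of the behavior described in this file, not AutoClass code: the
function name "reported_memberships" and the dictionary representation of a
case's class memberships are assumptions made for the example.

```python
# Values quoted in the text above.
MIN_MEMBERSHIP = 0.0099999   # a membership must exceed this to count
UNASSIGNED_CLASS = 9999      # class number shown in the screen message


def reported_memberships(memberships):
    """Apply the class 9999 rule to one test case.

    `memberships` maps a training class number to the case's membership
    in that class (hypothetical representation, for illustration only).
    """
    if any(p > MIN_MEMBERSHIP for p in memberships.values()):
        # At least one training class qualifies; nothing special happens.
        return dict(memberships)
    # No training class exceeds the threshold: the case is reported as a
    # member of class 9999 with a class membership of 1.0.
    return {UNASSIGNED_CLASS: 1.0}
```

A case with memberships {0: 0.005, 1: 0.001} would be reported under class
9999, while {0: 0.7, 1: 0.3} would be left unchanged.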


Cautionary Points:

    The usual way of using AutoClass is to put all of your data in a data_file,
describe that data with model and header files, and run "autoclass -search".
For prediction, instead of one data_file you will have two: a
training_data_file and a test_data_file.

    It is most important that both databases have the same AutoClass internal
representation.  If they do not, AutoClass will exit or, in some situations,
crash.  The prediction mode is designed to guide the user into conforming to
this requirement.


Preparation:

    Prediction requires a training classification and a test database.
The training classification is generated by running "autoclass -search" 
on the training data_file ("data/soybean/soyc.db2"), for example:

    % autoclass -search data/soybean/soyc.db2 data/soybean/soyc.hd2 \
        data/soybean/soyc.model data/soybean/soyc.s-params

This will produce "soyc.results-bin" and "soyc.search".  Then create a 
"reports" parameter file, such as "soyc.r-params" (see "reports-c.text"),
and run AutoClass in "reports" mode, such as:

    % autoclass -reports data/soybean/soyc.results-bin \
        data/soybean/soyc.search data/soybean/soyc.r-params

This will generate class and case cross-reference files, and an influence
values file.  The file names are based on the ".r-params" file name:

        data/soybean/soyc.class-text-1
        data/soybean/soyc.case-text-1
        data/soybean/soyc.influ-text-1
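
    As a small illustration of this naming scheme, the sketch below derives
the three report file names from the ".r-params" file name.  The helper
"report_files" is hypothetical, and treating the trailing "-1" as a parameter
is an assumption; only the names listed above come from this file.

```python
from pathlib import PurePosixPath


def report_files(r_params, number=1):
    """Return the report file names derived from an ".r-params" file name.

    `number` is the trailing digit seen in the listing above; exposing it
    as a parameter is an assumption made for this sketch.
    """
    # Drop the ".r-params" extension, keeping the directory and base name.
    base = PurePosixPath(r_params).with_suffix("")
    return [f"{base}.{kind}-text-{number}" for kind in ("class", "case", "influ")]


if __name__ == "__main__":
    for name in report_files("data/soybean/soyc.r-params"):
        print(name)
```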

These files describe the classes found in the training_data_file.
This classification can now be used to predict the probabilistic class
membership of the test_data_file cases ("data/soybean/soyc-predict.db2")
in the training_data_file classes:

    % autoclass -predict data/soybean/soyc-predict.db2 \
        data/soybean/soyc.results-bin data/soybean/soyc.search \
        data/soybean/soyc.r-params

This will generate class and case cross-reference files for the
test_data_file cases, predicting their probabilistic class memberships in
the training_data_file classes.  The file names are based on the ".db2"
file name:

        data/soybean/soyc-predict.class-text-1
        data/soybean/soyc-predict.case-text-1
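
    The three steps above (search, reports, predict) can be collected into one
place.  The Python sketch below only builds and prints the command lines; it
does not run AutoClass.  It assumes that all training files share one path
stem, as they do in the soybean example, and the function name
"autoclass_commands" is hypothetical.

```python
import shlex


def autoclass_commands(stem, test_db):
    """Build the search, reports and predict command lines, in order.

    `stem` is the shared path prefix of the training files (an assumption
    that holds for the soybean example); `test_db` is the test_data_file.
    """
    search = ["autoclass", "-search",
              f"{stem}.db2", f"{stem}.hd2", f"{stem}.model", f"{stem}.s-params"]
    reports = ["autoclass", "-reports",
               f"{stem}.results-bin", f"{stem}.search", f"{stem}.r-params"]
    predict = ["autoclass", "-predict", test_db,
               f"{stem}.results-bin", f"{stem}.search", f"{stem}.r-params"]
    return [search, reports, predict]


if __name__ == "__main__":
    commands = autoclass_commands("data/soybean/soyc",
                                  "data/soybean/soyc-predict.db2")
    for argv in commands:
        # Print each command on one line, quoted for a shell.
        print(shlex.join(argv))
```

Each list could be passed to "subprocess.run" to execute the step, provided
the "autoclass" executable is on the search path.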

--------------------------------------------------------------------------------