.\" Copyright (c) 2004 William S\&. Yerazunis\&. Manpage typesetting by Joost van Baal and Shalendra Chhabra
.TH "cssutil" 1 "19 Aug 2004" "cssutil 20040816\&.BlameClockworkOrange-auto\&.3" "CRM114"
.po 2m
.de ZI
.\" Zoem Indent/Itemize macro I.
.br
'in +\\$1
.nr xa 0
.nr xa -\\$1
.nr xb \\$1
.nr xb -\\w'\\$2'
\h'|\\n(xau'\\$2\h'\\n(xbu'\\
..
.de ZJ
.br
.\" Zoem Indent/Itemize macro II.
'in +\\$1
'in +\\$2
.nr xa 0
.nr xa -\\$2
.nr xa -\\w'\\$3'
.nr xb \\$2
\h'|\\n(xau'\\$3\h'\\n(xbu'\\
..
.if n .ll -2m
.am SH
.ie n .in 4m
.el .in 8m
..
.SH NAME
\fBcssutil\fP \- utility to measure and manipulate CRM114 statistics files\&.
.SH SYNOPSIS

\fBcssutil\fP
[\&.css file]
[OPTIONS]
.SH WARNING
This man page is taken from an older CRM114 version.  It is provided as a
convenience to Debian users and may not be up-to-date.  If you would like to
update it, please send appropriate patches to the Debian bug tracking system.
.SH OPTIONS

.ZI 3m "\fB-h\fP"
\&
.br
print basic help
.in -3m

.ZI 3m "\fB-b\fP"
\&
.br
brief - print only a summary of the statistics of the
\&.css file (otherwise, prints a full list of how many bins are in each counter
state)
.in -3m

.ZI 3m "\fB-q\fP"
\&
.br
quiet mode; no warning messages
.in -3m

.ZI 3m "\fB-r\fP"
\&
.br
report then exit (no menu)\&. The default if -r is not
specified is to drop into a command-menu based system\&.
.in -3m

.ZI 3m "\fB-s\fP"
\&
.br
if no css file is found, create a new one with this
many buckets\&. The default is 1 million + 1 buckets\&.
.in -3m

.ZI 3m "\fB-S\fP"
\&
.br
same as -s, but round up to next 2^n + 1 boundary\&.
.in -3m

.ZI 3m "\fB-v\fP"
\&
.br
print version and exit
.in -3m

.ZI 3m "\fB-D\fP"
\&
.br
dump css file to stdout in the architecture-independent
CSV format, suitable for reloading with -R on any architecture\&. (Note that \&.css
files themselves are in a hardware-architecture-dependent format\&.)
.in -3m

.ZI 3m "\fB-R\fP"
\&
.br
create and restore css from the
hardware-architecture independent CSV format file (reads from stdin if csv-file
is not supplied)\&.
.in -3m
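
The rounding described for -S can be illustrated with a short Python sketch\&.
It follows one reading of the wording above (the smallest value of the form
2^n + 1 that is not below the requested size) and is not taken from the
cssutil source, so the actual rounding may differ\&. The function name is
hypothetical\&.

.nf \fC
```python
def round_up_to_pow2_plus_1(requested):
    """Return the smallest value of the form 2**n + 1 that is >= requested.

    Assumption: this mirrors the man page wording only, not cssutil's code.
    """
    size = 1
    # Double the power of two until size + 1 reaches the requested count.
    while size + 1 < requested:
        size *= 2
    return size + 1


print(round_up_to_pow2_plus_1(1000001))
```
.fi \fR

Under this reading, a request that is already of the form 2^n + 1 is left unchanged\&.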
.SH THE COMMAND MENU
If -r is not supplied, a menu appears with the following options\&. Note that
all of these operations are "in place" and surgical; there is NO undo
functionality\&. Wise users will make a backup copy of all \&.css files before
using cssutil to alter values\&.

.ZI 3m "\fB-Z\fP"
\&
.br
zero all bins at or below a value\&. This is useful for
deleting all small-count features from the \&.css statistics files leaving
higher-count features untouched\&.
.in -3m

.ZI 3m "\fB-S\fP"
\&
.br
subtract a constant from all bins - this rolls all
features back a constant amount\&.
.in -3m

.ZI 3m "\fB-D\fP"
\&
.br
divide all bins by a constant - this rolls features back
proportionally, rather than by a fixed amount\&.
.in -3m

.ZI 3m "\fB-R\fP"
\&
.br
rescan - regenerate the statistics output that was
initially printed\&.
.in -3m

.ZI 3m "\fB-P\fP"
\&
.br
pack - re-slot features to optimize access time\&.
.in -3m

.ZI 3m "\fB-Q\fP"
\&
.br
quit - gracefully exit, saving changes\&. (Note that since these
operations are in-place and surgical, there is no option to exit without saving
changes\&.)
.in -3m
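
The effect of the zero, subtract and divide operations on bin counts can be
sketched in Python on a plain list of integers\&. This is only an illustration:
it does not read or write the \&.css format, it returns new lists where cssutil
works in place, and the function names are hypothetical\&. Clamping at zero
when subtracting and using integer division when dividing are assumptions\&.

.nf \fC
```python
def zero_at_or_below(bins, threshold):
    """Zero: clear every bin whose count is at or below threshold."""
    return [0 if count <= threshold else count for count in bins]


def subtract_constant(bins, amount):
    """Subtract: remove a constant from every bin.

    Assumption: counts are clamped at zero rather than going negative.
    """
    return [max(count - amount, 0) for count in bins]


def divide_by_constant(bins, divisor):
    """Divide: scale every bin down by a constant.

    Assumption: integer (floor) division is used.
    """
    return [count // divisor for count in bins]


bins = [1, 2, 5, 40]
print(zero_at_or_below(bins, 2))    # [0, 0, 5, 40]
print(subtract_constant(bins, 2))   # [0, 0, 3, 38]
print(divide_by_constant(bins, 2))  # [0, 1, 2, 20]
```
.fi \fR

In this sketch, zeroing leaves the high-count bins untouched, subtracting lowers
every bin by the same amount, and dividing shrinks large bins by more than small ones\&.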
.SH DESCRIPTION
\fBcssutil\fP is a general utility to manipulate and measure the \&.css format
statistics files used by CRM114\&'s Markovian and OSB classifiers\&. The biggest
uses are to check the available space remaining in a \&.css file, to selectively
groom a \&.css file, and to port architecture-dependent \&.css files to and from an
ASCII CSV format, which is architecture independent\&.
The \fBcssutil\fP program can be used to create information-less
\&.css files:

.di ZV
.in 0
.nf \fC
     cssutil -b -r spam\&.css
     cssutil -b -r nonspam\&.css
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

This creates the full-size files \&./spam\&.css and \&./nonspam\&.css,
holding no information\&.
The \fBcssutil\fP program can be used to check that the \&.css files are reasonable\&.
Invoke \fBcssutil\fP as:

.di ZV
.in 0
.nf \fC
    cssutil -b -r spam\&.css
    cssutil -b -r nonspam\&.css
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

You should get back a report something like this:

.di ZV
.in 0
.nf \fC
     Sparse spectra file spam\&.css statistics:

     Total available buckets          :      1048576
     Total buckets in use             :       506987
     Total hashed datums in file      :      1605968
     Average datums per bucket        :         3\&.17
     Maximum length of overflow chain :           39
     Average length of overflow chain :         1\&.84
     Average packing density          :         0\&.48
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

Note that the packing density is 0\&.48; this means that this \&.css file is about
half full of features\&. Once the packing density gets above about 0\&.9, you will
notice that CRM114 will take longer to process text\&. The penalty is small
at packing densities below about 0\&.95 and only about a factor of 2 at 0\&.97\&.
It is best to keep the density below 0\&.7 to 0\&.8\&.
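
The ratios in the sample report can be checked by hand\&. The Python sketch below
assumes that the packing density is buckets in use divided by available buckets,
and that datums per bucket is hashed datums divided by buckets in use; the sample
figures are consistent with both, but this is not taken from the cssutil source\&.
The variable names are hypothetical\&.

.nf \fC
```python
# Figures copied from the sample report above.
total_buckets = 1048576
buckets_in_use = 506987
hashed_datums = 1605968

# Assumed definitions of the two ratios shown in the report.
packing_density = buckets_in_use / total_buckets
datums_per_bucket = hashed_datums / buckets_in_use

print(f"{packing_density:.2f}")    # 0.48
print(f"{datums_per_bucket:.2f}")  # 3.17
```
.fi \fR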
.SH SHORTCOMINGS
Note that \fBcssutil\fP as of version 20040816 is NOT capable of dealing with the
CRM114 Winnow classifier\&'s floating-point \&.cow files\&. Worse, \fBcssutil\fP is
unaware of its shortcomings, and will try anyway\&. The only recourse is to be
aware of this issue and not use \fBcssutil\fP on a Winnow classifier floating-point
\&.cow format file\&.
.SH HOMEPAGE AND REPORTING BUGS
http://crm114\&.sourceforge\&.net/
.SH VERSION
This manpage: $Id: cssutil\&.azm,v 1\&.4 2004/08/19 09:23:24 vanbaal Exp $
This manpage describes cssutil as shipped with crm114 version
20040816\&.BlameClockworkOrange\&.
.SH AUTHOR
William S\&. Yerazunis\&. Manpage typesetting by Joost van Baal and Shalendra Chhabra
.SH COPYRIGHT
Copyright (C) 2001, 2002, 2003, 2004 William S\&. Yerazunis\&. This is free
software, copyrighted under the FSF\&'s GPL\&. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. See the file COPYING for
more details\&.
.SH SEE ALSO
\fBcssmerge(1)\fP, \fBcssdiff(1)\fP,
\fBcrm(1)\fP