<html>
<head>
<!-- This file has been generated by unroff 1.0, 03/01/02 09:13:44. -->
<!-- Do not edit! -->
<STYLE TYPE="text/css">
<!--
        A:link{text-decoration:none}
        A:visited{text-decoration:none}
        A:active{text-decoration:none}
-->
</STYLE>
<title>ploticus: proc tabulate</title>
<body bgcolor=D0D0EE vlink=0000FF>
<br>
<br>
<center>
<table cellpadding=2 bgcolor=FFFFFF width=550 ><tr>
<td>
  <table cellpadding=2 width=550><tr>
  <td><br><h2>proc tabulate</h2></td>
  <td align=right>
  <small>
  <a href="../doc/Welcome.html"><img src="../doc/ploticus.gif" border=0></a><br>
  <a href="../doc/Welcome.html">Welcome</a> &nbsp; &nbsp;
  <a href="../gallery/index.html">Gallery</a> &nbsp; &nbsp;
  <a href="../doc/Contents.html">Handbook</a> 
  <td></tr></table>
</td></tr>
<td>
<br>
<br>

</head>
<body>
 

<h2>DESCRIPTION</h2>
<b>proc tabulate</b> may be used to compute a one-way distribution on one 
data field, or a two-way distribution using two data fields.<tt> </tt>
The results are then considered the "current"
data set for plotting.  
<p>
<b>proc tabulate</b> has a capacity of 200 result rows and 
60 result columns.<tt> </tt>
Data does not have to be ordered in any particular way.<tt> </tt>
See also the <b>proc processdata</b> <tt>count</tt> action, which requires
ordered data but has no upper limit on the number of "bins".<tt> </tt>
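<p>
The trade-off between the two approaches can be illustrated with a minimal Python sketch. It is an illustration only, not code from ploticus, and the name <tt>count_runs</tt> is hypothetical. Counting runs of equal adjacent values needs no table of bins, so nothing limits the number of bins, but each distinct value ends up in a single bin only when the input is ordered:

```python
from itertools import groupby


def count_runs(values):
    # One pass over the data: count each run of equal, adjacent values.
    # No table of bins is kept, so there is no capacity limit, but a value
    # that reappears later starts a new run (and therefore a new "bin").
    return [(value, len(list(run))) for value, run in groupby(values)]


print(count_runs(["a", "a", "b", "c", "c"]))  # [('a', 2), ('b', 1), ('c', 2)]
print(count_runs(["a", "b", "a"]))            # [('a', 1), ('b', 1), ('a', 1)]
```

A tabulation that keeps a table of bins (as <b>proc tabulate</b> does, up to its stated capacity) does not have this ordering requirement.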
<p>
The <tt>savetable</tt> feature is recommended so that you can see
what the data set will look like when developing a plot.<tt> </tt>

<h2>FEATURES</h2>
Can tabulate to produce one- or two-way distributions.  
Bins may be based on the values that occur naturally in the data, or on preset lists or ranges.  
Percents may be calculated.  Ordering of results may be controlled.<tt> </tt>
Occurrences may be counted, or accumulations may be done.<tt> </tt>
The resulting text table may be displayed on screen or saved for other uses.<tt> </tt>

<h2>EXAMPLES</h2>
See the Gallery examples
<a href="../gallery/distrib.htm">
distrib
</a>
and 
<a href="../gallery/vermonth.htm">
vermonth
</a>

<h2>PREREQUISITES</h2>
<b>proc getdata</b> must first be executed to define or access some data.<tt> </tt>

<h2>VARIABLES SET</h2>
<p>
<b>NRECORDS</b> = Number of rows in the data result.<tt> </tt>
<p>
<b>NFIELDS</b> = Number of fields per row in the data result.<tt> </tt>
<p>
Thus, if a one-way distribution is being done and there are 7 distinct values,
NRECORDS will hold 7, which may then be used (e.g. <tt>xrange: @NRECORDS+1</tt>) 
to automatically set scaling for a bargraph.<tt> </tt>


<h2>MODES</h2>
<b>proc tabulate</b> operates in either one-dimensional or two-dimensional mode.<tt> </tt>
<p>
If <tt>datafield1</tt> is specified but <tt>datafield2</tt> is not,
a <b>one-dimensional</b> distribution will be computed.<tt> </tt>
The result will have two data fields: the first field
will be the value and the second field will be the number of
instances.  The number of records in the result will be
the number of bins.<tt> </tt>
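<p>
As an illustration only, the one-dimensional result layout can be modeled in Python as below. This is not ploticus's C implementation; the function name <tt>tabulate_1way</tt> is hypothetical, fields are numbered from 1 as in <tt>datafield1</tt>, and bins are assumed to appear in order of first occurrence:

```python
def tabulate_1way(rows, field):
    # One bin per distinct value of the given 1-based field. Bins are kept
    # in order of first occurrence (an assumption about default ordering).
    counts = {}
    for row in rows:
        value = row[field - 1]
        counts[value] = counts.get(value, 0) + 1
    # One result record per bin: [value, number of instances].
    return [[value, n] for value, n in counts.items()]


data = [["red", 3], ["blue", 1], ["red", 7], ["green", 2], ["red", 1]]
result = tabulate_1way(data, 1)
print(result)       # [['red', 3], ['blue', 1], ['green', 1]]
print(len(result))  # 3, the number of bins
```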
<p>
If both <tt>datafield1</tt> and <tt>datafield2</tt> are specified
then a <b>two-dimensional</b> distribution will be computed.<tt> </tt>
The bins of datafield1 will run downward (one row per bin) and the bins
of datafield2 will run across (one column per bin).  
The result's column headings will be usable as field names
(the first column is always named <tt>rowname</tt>).<tt> </tt>
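<p>
The two-dimensional layout can be sketched the same way. Again this is a hypothetical Python model (the name <tt>tabulate_2way</tt> is invented, and first-occurrence ordering of rows and columns is an assumption), showing values of datafield1 as rows, values of datafield2 as columns, and a first column named <tt>rowname</tt>:

```python
def tabulate_2way(rows, field1, field2):
    # Distinct values of field1 become result rows; distinct values of
    # field2 become result columns. Both keep first-occurrence order.
    row_keys, col_keys, counts = [], [], {}
    for row in rows:
        r, c = row[field1 - 1], row[field2 - 1]
        if r not in row_keys:
            row_keys.append(r)
        if c not in col_keys:
            col_keys.append(c)
        counts[(r, c)] = counts.get((r, c), 0) + 1
    # Column headings: "rowname" first, then one heading per field2 value.
    header = ["rowname"] + col_keys
    table = [[r] + [counts.get((r, c), 0) for c in col_keys] for r in row_keys]
    return header, table


data = [["jan", "A"], ["jan", "B"], ["feb", "A"], ["jan", "A"]]
header, table = tabulate_2way(data, 1, 2)
print(header)  # ['rowname', 'A', 'B']
print(table)   # [['jan', 2, 1], ['feb', 1, 0]]
```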
<p>
Many of the attributes are named with either 1 or 2
to correspond with either the distribution on
datafield1, or datafield2.<tt> </tt>

<h2>NOTE</h2>
After <b>proc tabulate</b> executes, all subsequent plotting
procedures in the script file will access its results 
for plotting.  However,
the original data is still in memory.  If later it is
necessary to plot the original data, <b>proc originaldata</b>
may be invoked.<tt> </tt>



<h2>MANDATORY ATTRIBUTES</h2>
The <tt>datafield1</tt> attribute must be specified.<tt> </tt>
<p>
If <tt>valuelist1</tt> (or <tt>valuelist2</tt>) is not specified, every distinct value
encountered in the corresponding field will get its own bin in the distribution.<tt> </tt>


<h2>ATTRIBUTES</h2>
<p>
<b>datafield1</b> 
<a href="attributetypes.html#dfield">
<i> dfield </i>
</a>
<dl>
<dt><dd><p>
Compute a distribution on this data field.<tt> </tt>
The value will be in result data field 1 and N (the number of instances)
will be in result data field 2.<tt> </tt>
Example: <tt>datafield1: 1</tt>

</dl>
<p>
<b>datafield2</b> 
<a href="attributetypes.html#dfield">
<i> dfield </i>
</a>
<dl>
<dt><dd><p>
Compute a two-way distribution on datafield1 and this field.<tt> </tt>
Distribution on datafield2 will be horizontal.<tt> </tt>
See also MODES above.<tt> </tt>
Example: <tt>datafield2: 5</tt>

</dl>
<p>
<b>axis1</b> <tt>x | y</tt>
<dl>
<dt><dd><p>
The axis to associate with the distribution on datafield1.<tt> </tt>
This needs to be specified when working with 
data that is to be scaled using units such as date or time.<tt> </tt>
Otherwise it does not need to be specified.<tt> </tt>

</dl>
<p>
<b>axis2</b> <tt>x | y</tt>
<dl>
<dt><dd><p>
Same as <tt>axis1</tt>, but for datafield2.<tt> </tt>

</dl>
<p>
<b>valuelist1</b> <i>stringlist</i>
<dl>
<dt><dd><p>
Define a set of values that will be included in the distribution
of datafield1.<tt> </tt>
The ordering of this set determines the order that categories 
are presented in the result.  
This is a space- or comma-delimited list of values.  
<br>
Example: <tt>valuelist1: red green blue</tt>
<dt><dd><p>
If ranges are being used (<tt>doranges1: yes</tt>), then this attribute
may be used to explicitly define the ranges.  See the following example for
the syntax; by default, dash (-) is used to separate the low and high
values in a range, with no embedded spaces allowed.<tt> </tt>
<dt><dd><p>
As a convenience, the letter "C" may be used in 
place of a low value in a range to
indicate "continuous"; its effect is for the previous high value to
be copied and taken as the next low value.<tt> </tt>
This saves the tedium, and the risk of error, of having to enter
each boundary value twice.<tt> </tt>
<br>
Example: <tt>valuelist1: 0-2.5 C-5 C-7.5 C-10</tt>
<br>
This would be equivalent to <tt>valuelist1: 0-2.5 2.5-5 5-7.5 7.5-10</tt>.<tt> </tt>
Either way, a value of 2.5 would end up in the 2nd bin, because where ranges overlap the higher bin takes precedence.<tt> </tt>
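<p>
The "C" shorthand can be expanded mechanically. The following is a minimal Python sketch, not ploticus code; it assumes the default dash separator and that every token has the form low-high, and the name <tt>expand_valuelist</tt> is hypothetical:

```python
def expand_valuelist(spec):
    # The list may be space- or comma-delimited; normalize commas to spaces.
    tokens = spec.replace(",", " ").split()
    ranges = []
    prev_high = None
    for token in tokens:
        low, high = token.split("-", 1)  # default range separator: dash
        if low == "C":
            # "C" (continuous): reuse the previous range's high value.
            if prev_high is None:
                raise ValueError("'C' used with no preceding range")
            low = prev_high
        ranges.append(f"{low}-{high}")
        prev_high = high
    return ranges


print(" ".join(expand_valuelist("0-2.5 C-5 C-7.5 C-10")))
# 0-2.5 2.5-5 5-7.5 7.5-10
```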

</dl>
<p>
<b>valuelist2</b>
<dl>
<dt><dd><p>
Value list for datafield2.  (see valuelist1)

</dl>
<p>
<b>doranges1</b> <tt>yes | no</tt>
<dl>
<dt><dd><p>
If <tt>yes</tt>, distribution on datafield1 will use ranges rather
than values.  If defined ranges overlap, the higher bin takes precedence.<tt> </tt>
The ranges may be defined using either <tt>valuelist1</tt>, or 
<tt>rangespec1</tt>.<tt> </tt>
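<p>
The precedence rule can be modeled by scanning the ranges from the last one backwards and taking the first match. This Python sketch is an illustration under stated assumptions: the ranges are listed in ascending order, both ends of a range are treated as inclusive (so a shared endpoint matches two bins), and the name <tt>find_bin</tt> is hypothetical:

```python
def find_bin(value, ranges):
    # ranges: (low, high) pairs in ascending order; both ends are treated
    # as inclusive (an assumption). Scanning from the last bin backwards
    # means that where two ranges both contain the value, the higher wins.
    for index in reversed(range(len(ranges))):
        low, high = ranges[index]
        if low <= value <= high:
            return index
    return None  # the value falls outside every range


bins = [(0, 2.5), (2.5, 5), (5, 7.5), (7.5, 10)]
print(find_bin(2.5, bins))  # 1, i.e. the 2nd bin
print(find_bin(12, bins))   # None
```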

</dl>
<p>
<b>doranges2</b> <tt>yes | no</tt>
<dl>
<dt><dd><p>
If <tt>yes</tt>, distribution on datafield2 will use ranges rather
than values.  
The ranges may be defined using either <tt>valuelist2</tt>, or 
<tt>rangespec2</tt>.<tt> </tt>

</dl>
<p>
<b>rangespec1</b>  <i>lowval  binsize</i>  [<i>hival</i>]
<dl>
<dt><dd><p>
If doing ranges, this attribute may be used instead of <tt>valuelist1</tt>
if ranges of uniform size are to be used when tabulating.<tt> </tt>
Ranges will begin at <i>lowval</i> and be of size <i>binsize</i>.<tt> </tt>
Ranges will end when <i>hival</i> is passed, or when the high end
of the axis is passed (if an axis has been defined).<tt> </tt>
<i>lowval</i> and <i>hival</i> should be 
<a href="attributetypes.html#plotvalue">
plotvalues
</a>.  Implies <tt>doranges1: yes</tt>.<tt> </tt>
<br>
Example: <tt>rangespec1: 0 5 39</tt>
<br>
This would set up ranges 0-5, 5-10, 10-15, and so on, up to
35-40 (remember that where ranges overlap, the higher bin takes precedence).<tt> </tt>
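<p>
A minimal Python sketch of how <tt>rangespec1: 0 5 39</tt> unfolds into uniform ranges. The stopping rule used here (start a new range only while its low end has not passed <i>hival</i>) is an assumption chosen to reproduce the example; stopping at the end of a defined axis is not modeled, and the name <tt>rangespec</tt> is hypothetical:

```python
def rangespec(lowval, binsize, hival):
    # Uniform ranges of width binsize starting at lowval. A new range is
    # started only while its low end has not passed hival (assumption).
    ranges = []
    low = lowval
    while low <= hival:
        ranges.append((low, low + binsize))
        low += binsize
    return ranges


r = rangespec(0, 5, 39)
print(len(r), r[0], r[-1])  # 8 (0, 5) (35, 40)
```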

</dl>
<p>
<b>rangespec2</b>  <i>lowval  binsize</i>  [<i>hival</i>]
<dl>
<dt><dd><p>
Same as <tt>rangespec1</tt>, but for datafield2.<tt> </tt>

</dl>
<p>
<b>accumfield</b> 
<a href="attributetypes.html#dfield">
<i> dfield </i>
</a>
<dl>
<dt><dd><p>
Normally, proc tabulate works by counting occurrences.  However,
if <tt>accumfield</tt> is specified, then instead of counting, the values
of the specified field will be accumulated (summed) for each bin.<tt> </tt>
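<p>
The difference between counting and accumulating can be sketched in Python: instead of adding 1 per record, the value in the accumulation field is added to the bin's total. This is a hypothetical model (the name <tt>tabulate_accum</tt> is invented, and fields are assumed to be 1-based and numeric in the accumulation field), not ploticus's implementation:

```python
def tabulate_accum(rows, field, accumfield):
    # Like a one-way tabulation, but each record adds the numeric value of
    # accumfield to its bin instead of adding 1 (both fields are 1-based).
    totals = {}
    for row in rows:
        key = row[field - 1]
        totals[key] = totals.get(key, 0) + float(row[accumfield - 1])
    return [[key, total] for key, total in totals.items()]


data = [["east", "10"], ["west", "4"], ["east", "2.5"]]
print(tabulate_accum(data, 1, 2))  # [['east', 12.5], ['west', 4.0]]
```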

</dl>
<p>
<b>order1</b> <tt>natural | magnitude | reversemagnitude</tt>
<dl>
<dt><dd><p>
Specifies the order in which categories are presented in the result.<tt> </tt>

</dl>
<p>
<b>order2</b> <tt>natural | magnitude | reversemagnitude</tt>
<dl>
<dt><dd><p>
Same as <tt>order1</tt>, but for datafield2.<tt> </tt>

</dl>
<p>
<b>percents</b>  <tt>yes</tt> | <tt>no</tt>
<dl>
<dt><dd><p>
If <tt>yes</tt>, each tabulation column will be accompanied by a column of percents.<tt> </tt>
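<p>
A hedged Python sketch of what a percents column could contain. It assumes each percentage is taken relative to the total of its own tabulation column (the text above does not say whether the column total or the grand total is used), and the name <tt>with_percents</tt> is hypothetical:

```python
def with_percents(counts):
    # Pair each count with its percentage of the column total
    # (assumption: the column total, not the grand total, is the base).
    total = sum(counts)
    return [(n, 100.0 * n / total if total else 0.0) for n in counts]


print(with_percents([3, 1, 4]))  # [(3, 37.5), (1, 12.5), (4, 50.0)]
```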

</dl>
<p>
<b>savetable</b> <i>filename</i> | <tt>stdout</tt> | <tt>stderr</tt>
<dl>
<dt><dd><p>
If specified, write tabulation results to the
given file, standard output or standard error.<tt> </tt>

</dl>
<p>
<b>select</b> 
<a href="condex.html">
<i> conditional-expression </i>
</a>
<dl>
<dt><dd><p>
<i>conditional-expression</i> 
is applied to each data record (row).<tt> </tt>
If specified and if the expression evaluates to true, the
data is included; otherwise it is excluded.<tt> </tt>
Data fields are referenced by preceding them with
two at-signs (@@).<tt> </tt>
<br>
Example: <tt>select: @@4 = G</tt>
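<p>
The effect of the example condition above (keep a record only when data field 4 equals <tt>G</tt>) can be modeled in Python as a per-record filter. This sketch handles only that single equality test on a 1-based field; real conditional expressions support much more, and the name <tt>select_equal</tt> is hypothetical:

```python
def select_equal(rows, field, wanted):
    # Keep only records whose 1-based field equals the wanted string;
    # all other records are excluded before tabulation.
    return [row for row in rows if row[field - 1] == wanted]


data = [["a", "1", "x", "G"], ["b", "2", "y", "R"], ["c", "3", "z", "G"]]
print(select_equal(data, 4, "G"))
# [['a', '1', 'x', 'G'], ['c', '3', 'z', 'G']]
```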

</dl>
<p>
<b>rangesepchar</b> <i>char</i>
<dl>
<dt><dd><p>
Specifies the range separator character 
(the character used
to separate the low and high values of a range in the valuelists).<tt> </tt>
The default range separator character is dash (-).<tt> </tt>
Example: <tt>rangesepchar: ,</tt>
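<p>
One situation where a different separator could help (an assumption, not stated above) is a range with negative values, where the leading minus sign collides with the default dash. The hypothetical Python helper <tt>parse_range</tt> below splits a token at the first separator to show the effect; it is not ploticus's parser:

```python
def parse_range(token, sep="-"):
    # Split one range token at the first occurrence of sep and convert
    # both halves to numbers (hypothetical helper, not ploticus code).
    low, high = token.split(sep, 1)
    return float(low), float(high)


print(parse_range("0-2.5"))            # (0.0, 2.5)
print(parse_range("-10,-5", sep=","))  # (-10.0, -5.0)
# parse_range("-10--5") raises ValueError: the leading minus sign is
# taken as the separator, leaving an empty low value.
```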

</dl>
<p>
<b>showrange</b> <tt>low</tt>  |  <tt>avg</tt>
<dl>
<dt><dd><p>
If specified, controls the content of row or column labels 
when ranges are being used.  Normally the
label is formatted <i>lowend</i><tt> - </tt><i>hiend</i>.<tt> </tt>
If this attribute is <tt>low</tt>, only the low value will be given.<tt> </tt>
If this attribute is <tt>avg</tt>, the average of the low and high values
will be given.<tt> </tt>
This attribute is useful when the range bins in the result will 
be plotted by location, e.g. for a histogram (the 1st result data
field can then be used as the bar location).<tt> </tt>
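<p>
The three label styles can be sketched in Python. The number formatting (Python's general <tt>g</tt> format) is an assumption made for illustration, and the name <tt>range_label</tt> is hypothetical:

```python
def range_label(low, high, showrange=None):
    # Default: "low - high". "low": the low end only. "avg": the midpoint,
    # which can double as a bar location when plotting a histogram.
    if showrange == "low":
        return f"{low:g}"
    if showrange == "avg":
        return f"{(low + high) / 2:g}"
    return f"{low:g} - {high:g}"


for mode in (None, "low", "avg"):
    print(range_label(0, 5, mode))
# 0 - 5
# 0
# 2.5
```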


</dl>
<p>
<b>showrangelowonly</b> <tt>yes | no</tt>
<dl>
<dt><dd><p>
[Superseded by the <tt>showrange</tt> attribute.]
If <tt>yes</tt> and if ranges are being used, show only the low 
part of the range in the results.  


<br>
<br>
</td></tr>
<td align=right>
<a href="../doc/Welcome.html">
<img src="../doc/ploticus.gif" border=0></a><br><small>data display engine &nbsp; <br>
<a href="../doc/Copyright.html">Copyright Steve Grubb</a>
<br>
<br>
<center>
<img src="../gallery/all.gif">
</center>
</td></tr>
</table>
</dl>
<p><hr>
Markup created by <em>unroff</em> 1.0,&#160;<tt> </tt>&#160;<tt> </tt>March 01, 2002.
</body>
</html>