<html>
<head>
<!-- This file has been generated by unroff 1.0, 02/01/02 12:57:11. -->
<!-- Do not edit! -->
<STYLE TYPE="text/css">
<!--
A:link{text-decoration:none}
A:visited{text-decoration:none}
A:active{text-decoration:none}
-->
</STYLE>
<title>ploticus: input data formats</title>
<body bgcolor=D0D0EE vlink=0000FF>
<br>
<br>
<center>
<table cellpadding=2 bgcolor=FFFFFF width=550 ><tr>
<td>
<table cellpadding=2 width=550><tr>
<td><br><h2>Input data formats</h2></td>
<td align=right>
<small>
<a href="../doc/Welcome.html"><img src="../doc/ploticus.gif" border=0></a><br>
<a href="../doc/Welcome.html">Welcome</a>
<a href="../gallery/index.html">Gallery</a>
<a href="../doc/Contents.html">Handbook</a>
<td></tr></table>
</td></tr>
<td>
<br>
<br>
<title>Manual page for Input_data_formats(PL)</title>
</head>
<body>
<p>
<a href="getdata.html">
proc getdata
</a>
is used to read or specify plotting data.<tt> </tt>
<a href="trailer.html">
proc trailer
</a>
may be used to place larger amounts of embedded plot data
at the end of the script file, to get it out of the way.<tt> </tt>
Ploticus can read tabular data from files, from command results, or data may be
embedded in the ploticus script.<tt> </tt>
<br><br><br>
<h2>Plotting from data fields</h2>
<p>
Plotting and data display operations are done
using fields. Taking a look at the first example data set below,
we might draw a bar graph using the values in field 2,
and draw error bars using the values in field 3.<tt> </tt>
The bars could be labeled with the values in field 4, or
perhaps field 1.<tt> </tt>
<p>
If your data needs further processing before it can be displayed the way
you want, you may be able to manipulate it after ploticus has read it, using
<a href="processdata.html">
proc processdata
</a>
to perform accumulation, tabulation and counting, rewriting
as percents, computation of totals, reversal of record order,
rotation of a row/column matrix, break processing, etc.<tt> </tt>
<br><br><br>
<h2>Recognized data formats</h2>
Data files or streams should be plain ASCII text, not binary, and should be organized as a
collection of rows having one or more fields.<tt> </tt>
Fields may have numeric or alphanumeric content and may be delimited in one of these ways:
<br><br><br>
<ul>
<li>
<b>spacequoted</b>
<br>
<pre>
F1 2.43 0.47 "Jane Doe" PF7955
F2 2.79 0.28 "John Smith" PT2705
F3 2.62 0.37 "Ken Brown" PB2702
F4 "" "" "Bud Flippner" PX7205
</pre>
Fields are delimited by one or more spaces or tabs.<tt> </tt>
Fields may be enclosed in double quotes ("), and such fields may have
embedded white space. Blank fields may be represented as shown.<tt> </tt>
<br><br><br>
<li>
<b>whitespace</b>
<br>
<pre>
F1 2.43 0.47 Jane_Doe PF7955
F2 2.79 0.28 John_Smith PT2705
F3 2.62 0.37 Ken_Brown PB2702
F4 - - Bud_Flippner PX7205
...
</pre>
Fields are delimited by one or more spaces or tabs.<tt> </tt>
No quote processing is done.<tt> </tt>
Blank fields must be represented using a code, and
alphanumeric fields cannot contain white space.<tt> </tt>
Parsing of <tt>whitespace</tt> data is faster than processing
of <tt>spacequoted</tt> data.<tt> </tt>
<br><br><br>
<li>
<b>tab delimited</b>
<br>
<pre>
F1 2.43 0.47 Jane Doe
F2 2.79 0.28 John Smith
F3 2.62 0.37 Ken Brown
F4 Bud Flippner
...
</pre>
Fields are separated by a single tab.
Zero length fields are taken to be blank.<tt> </tt>
Data fields cannot have embedded tabs.<tt> </tt>
The first field must start at the very beginning of the line.<tt> </tt>
The last field in a row may be terminated by a tab or not.<tt> </tt>
<br><br><br>
<li>
<b>comma delimited</b>
<pre>
"F1",2.43,0.47,"Jane Doe"
"F2",2.79,0.28,"John Smith"
"F3",2.62,0.37,"Ken Brown"
"F4",,,"Hello""world"
...
</pre>
Also known as comma-quote delimited or CSV. Fields are separated by commas.
Alphanumeric fields are conventionally enclosed in double quotes, although ploticus
requires the quotes only when a field contains embedded whitespace or comma
characters.<tt> </tt>
Zero length fields and fields containing "" are taken to be blank.<tt> </tt>
An embedded double quote is represented using ("") as seen in row F4 above.<tt> </tt>
No whitespace is allowed before or after fields (although this
apparently is tolerated in the CSV spec).<tt> </tt>
<br><br><br>
</ul>
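<p>
The four delimitation rules above can be approximated in a few lines of Python.
This is only a sketch for checking how a data file would split into fields;
it is not ploticus's own parser, and the function names are hypothetical.

```python
import csv
import re

# A quoted field (which may be empty or contain spaces) or a run of non-blanks.
_SPACEQUOTED = re.compile(r'"([^"]*)"|(\S+)')


def parse_spacequoted(line):
    """Split on spaces/tabs; double-quoted fields may hold white space."""
    fields = []
    for m in _SPACEQUOTED.finditer(line):
        quoted = m.group(1)
        fields.append(quoted if quoted is not None else m.group(2))
    return fields


def parse_whitespace(line):
    """Split on runs of spaces/tabs; no quote processing at all."""
    return line.split()


def parse_tab(line):
    """Split on single tabs; one trailing tab only terminates the last field."""
    if line.endswith("\t"):
        line = line[:-1]
    return line.split("\t")


def parse_comma(line):
    """Split a CSV row; a doubled quote inside quotes is one literal quote."""
    return next(csv.reader([line]))
```

For instance, <tt>parse_spacequoted('F4 "" "" "Bud Flippner" PX7205')</tt> yields five
fields, two of them blank, while <tt>parse_whitespace</tt> needs a code such as <tt>-</tt>
in the blank positions to keep the field count the same.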
<p>
<b>Notes:</b>
<p>
Data that is specified within a script is subject to script processing: leading white space
is stripped off and the script interpreter will attempt to evaluate constructs that look like
operators or variables.<tt> </tt>
<p>
Empty rows and commented rows are ignored (the comment marker may be specified via
<a href="getdata.html">
proc getdata
</a>).<tt> </tt>
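<p>
As a rough illustration of that filtering, the Python sketch below drops empty
and commented rows before any field splitting takes place. The function name is
hypothetical, and the <tt>//</tt> default marker is an assumption made for the example;
check the proc getdata page for the marker your version actually uses.

```python
def usable_rows(lines, comment_marker="//"):
    """Yield only rows that are neither empty nor commented out.

    comment_marker is an assumed default for illustration only.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_marker):
            continue  # empty or commented row: ignored
        yield line.rstrip("\n")
```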
<p>
Data sets with a variable number of fields may be accommodated by specifying
<a href="getdata.html">
proc getdata
</a>
attribute <tt>nfields</tt>.<tt> </tt>
Otherwise, the first usable row will dictate the expected number of fields per record.<tt> </tt>
If a row has <b>more</b> than the expected number of fields, extra fields are silently ignored.<tt> </tt>
If a row has <b>fewer</b> than the expected number of fields, blank fields are silently added
until the record has the same number of fields as other records.<tt> </tt>
<tt>nfields</tt> may also be used to read only the first few fields on every row, and ignore the rest.<tt> </tt>
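<p>
The truncate-or-pad behavior described above can be sketched in Python as follows.
This only mirrors the rules as stated in this section; the function name is
hypothetical and the code is not taken from ploticus.

```python
def normalize_rows(rows, nfields=None):
    """Give every record the same number of fields.

    If nfields is None, the first row dictates the expected count.
    Extra fields are dropped; missing fields are filled with blanks.
    """
    expected = nfields
    result = []
    for fields in rows:
        if expected is None:
            expected = len(fields)
        # Pad with blanks, then cut back to the expected length.
        result.append((list(fields) + [""] * expected)[:expected])
    return result
```

Passing a small <tt>nfields</tt> value corresponds to reading only the first few fields
of every row and ignoring the rest.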
<p>
Leading white space is allowed when using <tt>spacequoted</tt> or <tt>whitespace</tt> delimitation.<tt> </tt>
It is not allowed with the other delimitation types.<tt> </tt>
<p>
Each row, including the last one, should be terminated with the standard line terminator
for your system. For unix systems this is the newline character.<tt> </tt>
For Win32 it is CR/LF; these are handled properly by MinGW builds but not by unix builds.<tt> </tt>
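<p>
If a data file written on Win32 has to be read by a unix build, one option is to
convert the line terminators beforehand. The Python sketch below shows the idea;
the function name is hypothetical, and it also terminates the last row if the
file ends without a newline.

```python
def prepare_for_unix(data):
    """Convert CR/LF to newline and make sure the last row is terminated."""
    data = data.replace(b"\r\n", b"\n")
    if data and not data.endswith(b"\n"):
        data += b"\n"
    return data
```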
<p>
The data parser was improved for version 2.02; earlier versions did not support zero-length
fields or data sets with a variable number of fields.<tt> </tt>
<br><br><br>
<h2>Missing data</h2>
Missing data values may be represented using a code or by a zero-length field, if the
specific delimitation method allows them.<tt> </tt>
When plotting,
missing values are generally skipped over, but exactly what occurs depends on
what kind of plot operation is being done. The individual plotting
proc manual pages give details.<tt> </tt>
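<p>
As an illustration of skipping missing values, the Python sketch below treats a
zero-length field or the code <tt>-</tt> (the code used in the <tt>whitespace</tt> example above)
as missing. The function name and the choice of codes are assumptions made for
the example, and individual plotting procs may behave differently.

```python
def numeric_or_missing(field, missing_codes=("", "-")):
    """Return the field as a float, or None if it marks a missing value."""
    if field in missing_codes:
        return None
    return float(field)


# Field 2 of the whitespace example, with the last value missing.
values = [numeric_or_missing(f) for f in ["2.43", "2.79", "2.62", "-"]]

# Missing values are simply skipped over before plotting.
plottable = [v for v in values if v is not None]
```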
<br><br><br>
<h2>Embedded #set statements</h2>
Data files may contain embedded <tt>#set</tt> statements for setting ploticus
variables directly from the data file. The syntax is:
<dl>
<dt><dd><p>
<tt>#set VARIABLE = value</tt>.<tt> </tt>
<br><br><br>
</dl>
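<p>
To see which variables a data file would set, the embedded statements can be
picked out with a short Python sketch like the one below. It follows only the
syntax shown above; the function name and the <tt>TITLE</tt> variable in the usage
line are hypothetical.

```python
import re

# Matches lines of the form: #set VARIABLE = value
SET_STATEMENT = re.compile(r"^#set\s+(\w+)\s*=\s*(.*)$")


def split_set_statements(lines):
    """Separate embedded #set statements from ordinary data rows."""
    variables = {}
    data_rows = []
    for line in lines:
        m = SET_STATEMENT.match(line.strip())
        if m:
            variables[m.group(1)] = m.group(2).strip()
        else:
            data_rows.append(line)
    return variables, data_rows
```

For example, <tt>split_set_statements(["#set TITLE = Weekly totals", "F1 2.43 0.47"])</tt>
separates the one variable from the one data row.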
<h2>Examples</h2>
Gallery examples include:
<br>
<a href="../gallery/scat7.dat">
scat7.dat
</a>
(white-space delimited)
<br>
<a href="../gallery/stock.csv">
stock.csv
</a>
(comma delimited)
<br>
<a href="../gallery/timeline3.htm">
timeline3
</a>
(data specified within script)
<br>
<a href="../gallery/km2.htm">
km2
</a>
(data specified within script).<tt> </tt>
<br>
<br>
</td></tr>
<td align=right>
<a href="Welcome.html">
<img src="../doc/ploticus.gif" border=0></a><br><small>data display engine <br>
<a href="../doc/Copyright.html">Copyright Steve Grubb</a>
<br>
<br>
<center>
<img src="../gallery/all.gif">
</center>
</td></tr>
</table>
<p><hr>
Markup created by <em>unroff</em> 1.0, <tt> </tt> <tt> </tt>February 01, 2002.
</body>
</html>