.\"
.\" $Id: fields.3,v 1.3 1994/01/05 20:13:43 geoff Exp $
.\"
.\" $Log: fields.3,v $
.\" Revision 1.3 1994/01/05 20:13:43 geoff
.\" Add the maxf parameter
.\"
.\" Revision 1.2 1994/01/04 02:40:16 geoff
.\" Add descriptions of field_line_inc, field_field_inc, and the
.\" FLD_NOSHRINK flag.
.\"
.\" Revision 1.1 1993/09/09 01:09:44 geoff
.\" Initial revision
.\"
.\"
.TH FIELDS 3 local
.SH NAME
fieldread, fieldmake, fieldwrite, fieldfree \- field access package
.SH SYNTAX
.nf
#include "fields.h"
typedef struct {
int nfields;
int hadnl;
char *linebuf;
char **fields;
} field_t;
#define FLD_RUNS 0x0001
#define FLD_SNGLQUOTES 0x0002
#define FLD_BACKQUOTES 0x0004
#define FLD_DBLQUOTES 0x0008
#define FLD_SHQUOTES 0x0010
#define FLD_STRIPQUOTES 0x0020
#define FLD_BACKSLASH 0x0040
extern field_t *fieldread (FILE * file, char * delims,
int flags, int maxf);
extern field_t *fieldmake (char * line, int allocated,
char * delims, int flags, int maxf);
extern int fieldwrite (FILE * file, field_t * fieldp,
int delim);
extern void fieldfree (field_t * fieldp);
extern unsigned int field_line_inc;
extern unsigned int field_field_inc;
.fi
.SH DESCRIPTION
.PP
The fields access package eases the common task of parsing and
accessing information which is separated into fields by whitespace or
other delimiters. Various options can be specified to handle many
common cases, including selectable delimiters, runs of delimiters, and
quoting.
.PP
.I fieldread
reads one line from a file, parses it into fields as specified by the
parameters, and returns a
.B field_t
structure describing the result.
.I fieldmake
performs the same process on a buffer already in memory.
.I fieldwrite
creates an output line from a
.B field_t
structure and writes it to an output file.
.I fieldfree
frees a
.B field_t
structure and any associated memory allocated by the package.
.PP
The
.B field_t
structure describes the fields in a parsed line.
A well-behaved program should only access the
.BR nfields ,
.BR fields ,
and
.B hadnl
elements;
all other elements are used internally by the package and are not
guaranteed to remain the same even though they are documented here.
.B Nfields
gives the number of fields in the parsed line, just like the
.B argc
argument to a C program;
.B fields
is a pointer to an array of string pointers, just like the
.B argv
argument to a C program.
As in C, the last field pointer is followed by a null pointer,
although the field count is the preferred method of accessing fields.
The user may alter
.B nfields
by decreasing it, and may replace any pointer in
.B fields
without harm.
This is often useful in replacing a single field with a calculated
value preparatory to output.
The
.B hadnl
element is nonzero if the original line was terminated with a newline
when it was parsed;
this is used to accurately reproduce the input when
.I fieldwrite
is called.
.PP
The
.B linebuf
element contains a pointer to an internal buffer allocated by
.I fieldread
or provided to
.IR fieldmake .
This buffer is
.I not
guaranteed to contain anything sensible, although in the current
implementation all of the field contents can be found therein.
.PP
.I fieldread
reads a single line of arbitrary length from
.BR file ,
allocating as much memory as necessary to hold it, and then parses the
line according to its remaining arguments.
A pointer to the parsed
.B field_t
structure is returned, with
.B NULL
returned if an error occurs or if
.B EOF
is reached on the input file.
Fields in the input line are considered to be separated by any of the
delimiters in the
.B delims
parameter.
For example, if delimiters of ":.;" are specified, a line containing
"a:b;c.d" would be considered to have four fields.
.PP
The default parsing of fields considers each delimiter to indicate a
separate field, and does not allow any quoting. This is similar to
the parsing done by
.IR cut (1).
This behavior can be modified by specifying flags.
Multiple flags may be OR'ed together.
The available flags are:
.IP \fBFLD_RUNS\fP
Consider runs of delimiters to be the same as a single delimiter,
suppressing all null fields.
This is similar to the way utilities like
.IR awk (1)
and
.IR sort (1)
treat whitespace, but it is not limited to whitespace.
A run does not have to consist of a single type of delimiter; if both
semicolon and colon are delimiters, ";::;" is a run.
.IP \fBFLD_SNGLQUOTES\fP
Allow field contents to be quoted with single quotes.
Delimiters and other quotes appearing within single quotes are ignored.
This may appear in combination with other quote options.
.IP \fBFLD_BACKQUOTES\fP
Allow field contents to be quoted with reverse single quotes.
Delimiters and other quotes appearing within reverse single quotes are ignored.
This may appear in combination with other quote options.
.IP \fBFLD_DBLQUOTES\fP
Allow field contents to be quoted with double quotes.
Delimiters and other quotes appearing within double quotes are ignored.
This may appear in combination with other quote options.
.IP \fBFLD_SHQUOTES\fP
Allow shell-style quoting.
In the absence of this option, quotes are only recognized at the
beginning of a field, and characters following the close quote are
removed from the field (and are thus lost from the input line).
If this option is specified, quotes may appear within a field, in the
same way as they are handled by
.IR sh (1).
Multiple quoting styles may be used in the same field.
If none of
.BR FLD_SNGLQUOTES ,
.BR FLD_BACKQUOTES ,
or
.B FLD_DBLQUOTES
is specified with
.BR FLD_SHQUOTES ,
all three options are implied.
.IP \fBFLD_STRIPQUOTES\fP
Remove quotes and backslash sequences from the field while parsing,
converting backslash sequences to their proper ASCII equivalent.
The C sequences \ea, \eb, \ef, \en, \er, \ev, \ex\fInn\fP, and \e\fInnn\fP are
supported.
Any other sequence is simply converted to the backslashed character,
as in
.IR sh (1).
.IP \fBFLD_BACKSLASH\fP
Accept standard C-style backslash sequences.
The sequence will be converted to an ASCII equivalent if
.B FLD_STRIPQUOTES
is specified (q.v.).
.IP \fBFLD_NOSHRINK\fP
Don't shrink allocated memory using
.IR realloc (3)
before returning.
This option can have a significant effect on performance, especially
when
.I fieldfree
is going to be called soon after
.I fieldread
or
.IR fieldmake .
The disadvantage is that slightly more memory will be occupied until
the field structure is freed.
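.PP
The difference between the default rule and
.B FLD_RUNS
can be illustrated with a minimal sketch.
The two helper functions below are hypothetical and are not part of the
package; they only count fields according to the descriptions above,
ignore all quoting options, and assume that a line always has at least
one field under the default rule.
.nf
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count fields when every delimiter ends a
 * field (the default rule), so adjacent delimiters give null fields.
 */
size_t count_fields_default(const char *line, const char *delims)
{
    size_t n = 1;               /* assumed: at least one field */

    for (; *line != 0; line++) {
        if (strchr(delims, *line) != NULL)
            n++;                /* each delimiter starts a new field */
    }
    return n;
}

/*
 * Hypothetical helper: count fields when any run of delimiters is
 * treated as one separator, so no null fields are produced.
 */
size_t count_fields_runs(const char *line, const char *delims)
{
    size_t n = 0;

    for (;;) {
        line += strspn(line, delims);   /* skip a run of delimiters */
        if (*line == 0)
            break;
        n++;                            /* found a non-empty field */
        line += strcspn(line, delims);  /* skip the field itself */
    }
    return n;
}
```
.fi
.PP
With delimiters of ";:", the line "a;::;b" has five fields under the
default rule (three of them null) and two fields under the run rule.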
.PP
The
.I maxf
parameter, if nonzero, specifies the maximum number of fields to be
generated.
This may enhance performance if only the first few fields of a long
line are of interest to the caller.
The actual number of fields returned is one greater than
.IR maxf ,
because the remainder of the line will be returned as a single
contiguous (and uninterpreted, even if
.B FLD_STRIPQUOTES
or
.B FLD_BACKSLASH
is specified) field.
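.PP
The effect of
.I maxf
can be sketched as follows.
The function is a hypothetical stand-in, not the package's code; it
assumes the default delimiter rule with no flags, and its name and its
.B out
parameter are invented for illustration.
.nf
```c
#include <string.h>

/*
 * Hypothetical helper: split line in place into at most maxf
 * delimited fields, then store whatever is left over as one final,
 * unsplit field.  out must have room for maxf + 1 pointers.
 * Returns the number of pointers stored.
 */
int split_maxf(char *line, const char *delims, int maxf, char **out)
{
    int n = 0;

    while (n < maxf) {
        char *end = line + strcspn(line, delims);

        out[n++] = line;
        if (*end == 0)
            return n;           /* line ended before maxf was reached */
        *end = 0;               /* terminate this field in place */
        line = end + 1;
    }
    out[n++] = line;            /* remainder, delimiters and all */
    return n;
}
```
.fi
.PP
Splitting "a:b:c:d:e" on ":" with a limit of 2 stores three pointers:
"a", "b", and the untouched remainder "c:d:e".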
.PP
.I fieldmake
operates exactly like
.IR fieldread ,
except that the line parsed is provided by the caller rather than
being read from a file.
If the
.I allocated
parameter is nonzero, the memory pointed to by the
.I line
parameter will automatically be freed when
.I fieldfree
is called;
otherwise this memory is the caller's responsibility.
The memory pointed to by
.I line
is destroyed by
.IR fieldmake .
All other parameters are the same as for
.IR fieldread .
.PP
.I fieldwrite
writes a set of fields to the specified
.IR file ,
separating them with the delimiter character
.I delim
(note that this is a character, not a string), and appending a newline
if specified by the
.I hadnl
element of the structure.
The field structure is not freed.
.I fieldwrite
will return nonzero if an I/O error is detected.
.PP
.I fieldfree
frees the
.B field_t
structure passed to it, along with any associated auxiliary memory
allocated by the package (or passed to
.IR fieldmake ).
The structure must not be accessed after
.I fieldfree
is called.
.PP
.B field_line_inc
(default 512) and
.B field_field_inc
(default 20) describe the increments to use when expanding lines as
they are read in and parsed.
.I fieldread
initially allocates a buffer of
.B field_line_inc
bytes and, if the input line is larger than that, expands the buffer
in increments of the same amount until it is large enough.
If input lines are known to consistently reach a certain size,
performance will be improved by setting
.B field_line_inc
to a value larger than that size (larger because there must be room
for a null byte).
.B field_field_inc
serves the same purpose in both
.I fieldread
and
.IR fieldmake ,
except that it is related to the number of fields in the line rather
than to the line length.
If the number of fields is known, performance will be improved by
setting
.B field_field_inc
to at least one more than that number.
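.PP
The arithmetic behind the advice on
.B field_line_inc
can be sketched as follows.
The helpers are hypothetical and only model the growth rule described
above (start at one increment, grow by one increment until the line and
its null byte fit); the actual implementation may differ in detail.
.nf
```c
#include <stddef.h>

/*
 * Hypothetical model: final buffer size for a line of len stored
 * bytes when the buffer starts at inc bytes and grows by inc bytes
 * at a time.  One extra byte is needed for the terminating null.
 * inc must be greater than zero.
 */
size_t grown_size(size_t len, size_t inc)
{
    size_t size = inc;

    while (size < len + 1)
        size += inc;
    return size;
}

/* Number of expansions needed after the initial allocation. */
size_t expansions(size_t len, size_t inc)
{
    return grown_size(len, inc) / inc - 1;
}
```
.fi
.PP
Under this model, with the default increment of 512, a 511-byte line
fits in the initial buffer, while a 512-byte line forces one expansion
to 1024 bytes.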
.SH RETURN VALUES
.I fieldread
and
.I fieldmake
return
.B NULL
if an error occurs or if
.B EOF
is reached on the input file.
.I fieldwrite
returns nonzero if an output error occurs.
.SH BUGS
Thanks to the vagaries of ANSI C, the
.B fields.h
header file defines an auxiliary macro named
.BR P .
If the user needs a similarly-named macro, this macro must be
undefined first, and the user's macro must be defined after
.B fields.h
is included.