1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370
|
<html>
<HEAD>
<title>An Example of noweb</title></HEAD>
<BODY>
<H1>An Example of noweb</H1>
<address>Norman Ramsey<br>
Dept. of Computer Science, Princeton University<br>
Princeton, NJ 08544
</address>
<h2><a name="contents">Contents</a></h2>
<ul>
<li><a href="#intro">Introduction</a>
<li><a href="#counting">Counting words</a>
<li><a href="#index">Index</a>
</ul>
<h2><a name="intro">Introduction</a></h2>
The following short program illustrates the use of <tt>noweb</tt>,
a low-tech tool for literate programming.
The purpose of the program is to provide a basis for
comparing <tt>WEB</tt> and <tt>noweb</tt>, so I have used a program that has
been published before;
the text, code, and presentation are taken
from Chapter 12 of D. E. Knuth,
<cite>Literate Programming</cite>
(volume 27 of <cite>Center for the Study of
Language and Information Lecture Notes</cite>,
Stanford Univ., 1992).<p>
The notable differences are:
<ul>
<li>
When displaying source code,
<tt>noweb</tt> uses different typography.
In particular, <tt>WEB</tt> makes good use of multiple fonts
and the ablity to typeset mathematics, and it may use
mathematical symbols in place of C symbols (e.g.
a logical ``and'' symbol for ``[[&&]]'').
<tt>noweb</tt> uses a single fixed-width font for code.
<li>
<tt>noweb</tt> can work with HTML, and I have used HTML in this example.
<li>
<tt>noweb</tt> has no numbered ``sections.''
When cross-referencing is needed, <tt>noweb</tt> uses hypertext links or page
numbers.
<li>
<tt>noweb</tt> has no special support for macros.
In the sample program, I have used
a ``Definitions'' chunk to hold
macro definitions.
<li>
<tt>noweb</tt>'s index of identifiers is less accurate than <tt>WEB</tt>'s,
because it uses a language-independent heuristic to find identifiers.
This heuristic may erroneously find ``uses'' of identifiers
in string literals or comments.
Although <tt>noweb</tt> does have a language-dependent algorithm for finding
definitions of identifiers, that algorithm is less reliable than <tt>CWEB</tt>'s,
because <tt>noweb</tt> does not really parse C code.
<li>
The <tt>CWEB</tt> version of this program has semicolons following most uses
of <...>.
<tt>WEB</tt> needs the semicolon or its equivalent to make
its prettyprinting come out right.
Because it does not attempt prettyprinting, <tt>noweb</tt> needs no semicolons.
</ul>
<h2><a name="counting">Counting words</a></h2>
This example, based on a program by Klaus Guntermann and
Joachim Schrod (`<tt>WEB</tt> adapted to C.'
<cite>TUGboat</cite> <b>7</b>(3):134-7, Oct. 1986)
and a program by Silvio Levy and
D. E. Knuth (Ch. 12 of <cite>Literate Programming</cite>),
presents the ``word count''
program from Unix, rewritten in <tt>noweb</tt> to demonstrate
literate programming using <tt>noweb</tt>.
The level of detail in this document is intentionally high, for
didactic purposes; many of the things spelled out here don't need to
be explained in other programs.<p>
The purpose of <tt>wc</tt> is to count lines, words, and/or characters in
a list of files.
The number of lines in a file is the number of newline characters it
contains.
The number of characters is the file length in bytes.
A ``word'' is a maximal sequence of consecutive characters other than
newline, space, or tab, containing at least one visible ASCII code.
(We assume that the standard ASCII code is in use.)<p>
Most literate C programs share a common structure.
It's probably a good idea to state the overall structure explicitly at
the outset, even though the various parts could all be introduced in
chunks named <*> if we wanted to add them piecemeal.<p>
Here, then, is an overview of the file <tt>wc.c</tt> that is defined by
the <tt>noweb</tt> program <tt>wc.nw</tt>:
<<*>>=
<<Header files to include>>
<<Definitions>>
<<Global variables>>
<<Functions>>
<<The main program>>
@
We must include the standard I/O definitions, since we want to send
formatted output to [[stdout]] and [[stderr]].
<<Header files to include>>=
#include <stdio.h>
@
The [[status]] variable will tell the operating system if the run was
successful or not, and [[prog_name]] is used in case there's an error
message to be printed.
<<Definitions>>=
#define OK 0
/* status code for successful run */
#define usage_error 1
/* status code for improper syntax */
#define cannot_open_file 2
/* status code for file access error */
@ %def OK usage_error cannot_open_file
<<Global variables>>=
int status = OK;
/* exit status of command, initially OK */
char *prog_name;
/* who we are */
@ %def status prog_name
Now we come to the general layout of the [[main]]
function.
<<The main program>>=
main(argc, argv)
int argc;
/* number of arguments on UNIX command line */
char **argv;
/* the arguments, an array of strings */
{
<<Variables local to [[main]]>>
prog_name = argv[0];
<<Set up option selection>>
<<Process all the files>>
<<Print the grand totals if there were multiple files>>
exit(status);
}
@ %def main argc argv
If the first argument begins with a `[[-]]', the
user is choosing the desired counts and specifying
the order in which they should be displayed.
Each selection is given by the
initial character (lines, words, or characters).
For example, `[[-cl]]' would cause just the
number of characters and the number of lines to
be printed, in that order.<p>
We do not process this string now; we simply remember where it is.
It will be used to control the formatting at output time.
<<Variables local to [[main]]>>=
int file_count;
/* how many files there are */
char *which;
/* which counts to print */
@ %def file_count which
<<Set up option selection>>=
which = "lwc";
/* if no option is given, print 3 values */
if (argc > 1 && *argv[1] == '-') {
which = argv[1] + 1;
argc--;
argv++;
}
file_count = argc - 1;
@
Now we scan the remaining arguments and try to open a file, if possible.
The file is processed and its statistics are given.
We use a [[do ... while]] loop because we should read from the standard
input if no file name is given.
<<Process all the files>>=
argc--;
do {
<<If a file is given, try to open [[*(++argv)]]; [[continue]] if unsuccessful>>
<<Initialize pointers and counters>>
<<Scan file>>
<<Write statistics for file>>
<<Close file>>
<<Update grand totals>>
/* even if there is only one file */
} while (--argc > 0);
@
Here's the code to open the file. A special trick allows us to handle
input from [[stdin]] when no name is given.
Recall that the file descriptor to [[stdin]] is 0; that's what we use
as the default initial value.
<<Variables local to [[main]]>>=
int fd = 0;
/* file descriptor, initialized to stdin */
@ %def fd
<<Definitions>>=
#define READ_ONLY 0
/* read access code for system open */
@ %def READ_ONLY
<<If a file is given, try to open [[*(++argv)]]; [[continue]] if unsuccessful>>=
if (file_count > 0
&& (fd = open(*(++argv), READ_ONLY)) < 0) {
fprintf(stderr,
"%s: cannot open file %s\n",
prog_name, *argv);
status |= cannot_open_file;
file_count--;
continue;
}
<<Close file>>=
close(fd);
@
We will do some homemade buffering in order to speed things up:
Characters will be read into the [[buffer]] array before we process
them.
To do this we set up appropriate pointers and counters.
<<Definitions>>=
#define buf_size BUFSIZ
/* stdio.h BUFSIZ chosen for efficiency */
@ %def buf_size
<<Variables local to [[main]]>>=
char buffer[buf_size];
/* we read the input into this array */
register char *ptr;
/* first unprocessed character in buffer */
register char *buf_end;
/* the first unused position in buffer */
register int c;
/* current char, or # of chars just read */
int in_word;
/* are we within a word? */
long word_count, line_count, char_count;
/* # of words, lines, and chars so far */
@ %def buffer ptr buf_end in_word word_count line_count char_count
<<Initialize pointers and counters>>=
ptr = buf_end = buffer;
line_count = word_count = char_count = 0;
in_word = 0;
@
The grand totals must be initialized to zero at the beginning of the
program.
If we made these variables local to [[main]], we would have to do this
initialization explicitly; however, C's globals are automatically
zeroed. (Or rather, ``statically zeroed.'') (Get it?)
<<Global variables>>=
long tot_word_count, tot_line_count,
tot_char_count;
/* total number of words, lines, chars */
@
The present chunk, which does the counting that is <tt>wc</tt>'s
<i>raison d'etre</i>, was actually one of the simplest to write.
We look at each character and change state if it begins or ends a word.
<<Scan file>>=
while (1) {
<<Fill [[buffer]] if it is empty; [[break]] at end of file>>
c = *ptr++;
if (c > ' ' && c < 0177) {
/* visible ASCII codes */
if (!in_word) {
word_count++;
in_word = 1;
}
continue;
}
if (c == '\n') line_count++;
else if (c != ' ' && c != '\t') continue;
in_word = 0;
/* c is newline, space, or tab */
}
@
Buffered I/O allows us to count the number of characters almost for
free.
<<Fill [[buffer]] if it is empty; [[break]] at end of file>>=
if (ptr >= buf_end) {
ptr = buffer;
c = read(fd, ptr, buf_size);
if (c <= 0) break;
char_count += c;
buf_end = buffer + c;
}
@
It's convenient to output the statistics by defining a new function
[[wc_print]]; then the same function can be used for the totals.
Additionally we must decide here if we know the name of the file we have
processed or if it was just [[stdin]].
<<Write statistics for file>>=
wc_print(which, char_count, word_count,
line_count);
if (file_count)
printf(" %s\n", *argv); /* not stdin */
else
printf("\n"); /* stdin */
@
<<Update grand totals>>=
tot_line_count += line_count;
tot_word_count += word_count;
tot_char_count += char_count;
@
We might as well improve a bit on Unix's <tt>wc</tt> by displaying
the number of files too.
<<Print the grand totals if there were multiple files>>=
if (file_count > 1) {
wc_print(which, tot_char_count,
tot_word_count, tot_line_count);
printf(" total in %d files\n", file_count);
}
@
Here now is the function that prints the values according to the
specified options.
The calling routine is supposed to supply a newline.
If an invalid option character is found we inform the user about proper
usage of the command.
Counts are printed in 8-digit fields so that they will line up in
columns.
<<Definitions>>=
#define print_count(n) printf("%8ld", n)
@ %def print_count
<<Functions>>=
wc_print(which, char_count, word_count, line_count)
char *which; /* which counts to print */
long char_count, word_count, line_count;
/* given totals */
{
while (*which)
switch (*which++) {
case 'l': print_count(line_count);
break;
case 'w': print_count(word_count);
break;
case 'c': print_count(char_count);
break;
default:
if ((status & usage_error) == 0) {
fprintf(stderr,
"\nUsage: %s [-lwc] [filename ...]\n",
prog_name);
status |= usage_error;
}
}
}
@ %def wc_print
Incidentally, a test of this program against the system <tt>wc</tt>
command on a SPARCstation showed that the ``official'' <tt>wc</tt> was
slightly slower.
Furthermore, although that <tt>wc</tt> gave an appropriate error message
for the options `[[-abc]]', it made no complaints about the options
`[[-labc]]'!
Dare we suggest that the system routine might have been better if its
programmer had used a more literate approach?
<h2><a name=index>Index</a></h2>
<h3>Chunks
<nowebchunks>
<h3>Indifiers
<nowebindex>
</body>
</html>
|