% This program by D. E. Knuth is not copyrighted and can be used freely.
% Version 1 was implemented in June 1982.
% Slight changes were made in October, 1982, for version 0.6 of TeX.
% Version 2 (July 1983) is consistent with TeX version 0.999.
% Version 3 (September 1989) is consistent with 8-bit TeX.
% Here is TeX material that gets inserted after \input webmac
\def\hang{\hangindent 3em\indent\ignorespaces}
\font\ninerm=cmr9
\let\mc=\ninerm % medium caps for names like SAIL
\def\PASCAL{Pascal}
\def\(#1){} % this is used to make section names sort themselves better
\def\9#1{} % this is used for sort keys in the index
\def\title{POOL\lowercase{type}}
\def\contentspagenumber{101}
\def\topofcontents{\null
\def\titlepage{F} % include headline on the contents page
\def\rheader{\mainfont\hfil \contentspagenumber}
\vfill
\centerline{\titlefont The {\ttitlefont POOLtype} processor}
\vskip 15pt
\centerline{(Version 3, September 1989)}
\vfill}
\def\botofcontents{\vfill
\centerline{\hsize 5in\baselineskip9pt
\vbox{\ninerm\noindent
The preparation of this report
was supported in part by the National Science
Foundation under grants IST-8201926 and MCS-8300984,
and by the System Development Foundation. `\TeX' is a
trademark of the American Mathematical Society.}}}
\pageno=\contentspagenumber \advance\pageno by 1
@* Introduction.
The \.{POOLtype} utility program converts string pool files output
by \.{TANGLE} into a slightly more symbolic format that may be useful
when \.{TANGLE}d programs are being debugged.

It's a pretty trivial routine, but people may want to try transporting
this program before they get up enough courage to tackle \TeX\ itself.
The first 256 strings are treated as \TeX\ treats them, using routines
copied from \TeX82.
@ \.{POOLtype} is written entirely in standard \PASCAL, except that it has
to do some slightly system-dependent character code conversion on input
and output. The input is read from |pool_file|, and the output is written
on |output|. If the input is erroneous, the |output| file will describe
the error.
@^system dependencies@>
@p program POOLtype(@!pool_file,@!output);
label 9999; {this labels the end of the program}
type @<Types in the outer block@>@/
var @<Globals in the outer block@>@/
procedure initialize; {this procedure gets things started properly}
  var @<Local variables for initialization@>@;
  begin @<Set initial values of key variables@>@/
  end;
@ Here are some macros for common programming idioms.
@d incr(#) == #:=#+1 {increase a variable by unity}
@d decr(#) == #:=#-1 {decrease a variable by unity}
@d do_nothing == {empty statement}
@* The character set.
(The following material is copied verbatim from \TeX82.
Thus, the same system-dependent changes should be made to both programs.)

In order to make \TeX\ readily portable between a wide variety of
computers, all of its input text is converted to an internal eight-bit
code that includes standard ASCII, the ``American Standard Code for
Information Interchange.'' This conversion is done immediately when each
character is read in. Conversely, characters are converted from ASCII to
the user's external representation just before they are output to a
text file.

Such an internal code is relevant to users of \TeX\ primarily because it
governs the positions of characters in the fonts. For example, the
character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets
this letter it specifies character number 65 in the current font.
If that font actually has `\.A' in a different position, \TeX\ doesn't
know what the real position is; the program that does the actual printing from
\TeX's device-independent files is responsible for converting from ASCII to
a particular font encoding.
@^ASCII code@>

\TeX's internal code is relevant also with respect to constants
that begin with a reverse apostrophe; and it provides an index to the
\.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode}
tables.
@ Characters of text that have been converted to \TeX's internal form
are said to be of type |ASCII_code|, which is a subrange of the integers.
@<Types...@>=
@!ASCII_code=0..255; {eight-bit numbers}
@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
character sets were common, so it did not make provision for lowercase
letters. Nowadays, of course, we need to deal with both capital and small
letters in a convenient way, especially in a program for typesetting;
so the present specification of \TeX\ has been written under the assumption
that the \PASCAL\ compiler and run-time system permit the use of text files
with more than 64 distinguishable characters. More precisely, we assume that
the character set contains at least the letters and symbols associated
with ASCII codes @'40 through @'176; all of these characters are now
available on most computer terminals.

Since we are dealing with more characters than were present in the first
\PASCAL\ compilers, we have to decide what to call the associated data
type. Some \PASCAL s use the original name |char| for the
characters in text files, even though there now are more than 64 such
characters, while other \PASCAL s consider |char| to be a 64-element
subrange of a larger data type that has some other name.
In order to accommodate this difference, we shall use the name |text_char|
to stand for the data type of the characters that are converted to and
from |ASCII_code| when they are input and output. We shall also assume
that |text_char| consists of the elements |chr(first_text_char)| through
|chr(last_text_char)|, inclusive. The following definitions should be
adjusted if necessary.
@^system dependencies@>
@d text_char == char {the data type of characters in text files}
@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
@d last_text_char=255 {ordinal number of the largest element of |text_char|}
@<Local variables for init...@>=
@!i:integer;
@ The \TeX\ processor converts between ASCII code and
the user's external character set by means of arrays |xord| and |xchr|
that are analogous to \PASCAL's |ord| and |chr| functions.
@<Glob...@>=
@!xord: array [text_char] of ASCII_code;
{specifies conversion of input characters}
@!xchr: array [ASCII_code] of text_char;
{specifies conversion of output characters}
@ Since we are assuming that our \PASCAL\ system is able to read and
write the visible characters of standard ASCII (although not
necessarily using the ASCII codes to represent them), the following
assignment statements initialize the standard part of the |xchr| array
properly, without needing any system-dependent changes. On the other
hand, it is possible to implement \TeX\ with less complete character
sets, and in such cases it will be necessary to change something here.
@^system dependencies@>
@<Set init...@>=
xchr[@'40]:=' ';
xchr[@'41]:='!';
xchr[@'42]:='"';
xchr[@'43]:='#';
xchr[@'44]:='$';
xchr[@'45]:='%';
xchr[@'46]:='&';
xchr[@'47]:='''';@/
xchr[@'50]:='(';
xchr[@'51]:=')';
xchr[@'52]:='*';
xchr[@'53]:='+';
xchr[@'54]:=',';
xchr[@'55]:='-';
xchr[@'56]:='.';
xchr[@'57]:='/';@/
xchr[@'60]:='0';
xchr[@'61]:='1';
xchr[@'62]:='2';
xchr[@'63]:='3';
xchr[@'64]:='4';
xchr[@'65]:='5';
xchr[@'66]:='6';
xchr[@'67]:='7';@/
xchr[@'70]:='8';
xchr[@'71]:='9';
xchr[@'72]:=':';
xchr[@'73]:=';';
xchr[@'74]:='<';
xchr[@'75]:='=';
xchr[@'76]:='>';
xchr[@'77]:='?';@/
xchr[@'100]:='@@';
xchr[@'101]:='A';
xchr[@'102]:='B';
xchr[@'103]:='C';
xchr[@'104]:='D';
xchr[@'105]:='E';
xchr[@'106]:='F';
xchr[@'107]:='G';@/
xchr[@'110]:='H';
xchr[@'111]:='I';
xchr[@'112]:='J';
xchr[@'113]:='K';
xchr[@'114]:='L';
xchr[@'115]:='M';
xchr[@'116]:='N';
xchr[@'117]:='O';@/
xchr[@'120]:='P';
xchr[@'121]:='Q';
xchr[@'122]:='R';
xchr[@'123]:='S';
xchr[@'124]:='T';
xchr[@'125]:='U';
xchr[@'126]:='V';
xchr[@'127]:='W';@/
xchr[@'130]:='X';
xchr[@'131]:='Y';
xchr[@'132]:='Z';
xchr[@'133]:='[';
xchr[@'134]:='\';
xchr[@'135]:=']';
xchr[@'136]:='^';
xchr[@'137]:='_';@/
xchr[@'140]:='`';
xchr[@'141]:='a';
xchr[@'142]:='b';
xchr[@'143]:='c';
xchr[@'144]:='d';
xchr[@'145]:='e';
xchr[@'146]:='f';
xchr[@'147]:='g';@/
xchr[@'150]:='h';
xchr[@'151]:='i';
xchr[@'152]:='j';
xchr[@'153]:='k';
xchr[@'154]:='l';
xchr[@'155]:='m';
xchr[@'156]:='n';
xchr[@'157]:='o';@/
xchr[@'160]:='p';
xchr[@'161]:='q';
xchr[@'162]:='r';
xchr[@'163]:='s';
xchr[@'164]:='t';
xchr[@'165]:='u';
xchr[@'166]:='v';
xchr[@'167]:='w';@/
xchr[@'170]:='x';
xchr[@'171]:='y';
xchr[@'172]:='z';
xchr[@'173]:='{';
xchr[@'174]:='|';
xchr[@'175]:='}';
xchr[@'176]:='~';@/
@ Some of the ASCII codes without visible characters have been given symbolic
names in this program because they are used with a special meaning.
@d null_code=@'0 {ASCII code that might disappear}
@d carriage_return=@'15 {ASCII code used at end of line}
@d invalid_code=@'177 {ASCII code that many systems prohibit in text files}
@ The ASCII code is ``standard'' only to a certain extent, since many
computer installations have found it advantageous to have ready access
to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/}
gives a complete specification of the intended correspondence between
characters and \TeX's internal representation.
@:TeXbook}{\sl The \TeX book@>

If \TeX\ is being used
on a garden-variety \PASCAL\ for which only standard ASCII
codes will appear in the input and output files, it doesn't really matter
what codes are specified in |xchr[0..@'37]|, but the safest policy is to
blank everything out by using the code shown below.

However, other settings of |xchr| will make \TeX\ more friendly on
computers that have an extended character set, so that users can type things
like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can
assign codes arbitrarily, giving an |xchr| equivalent to whatever
characters the users of \TeX\ are allowed to have in their input files.
It is best to make the codes correspond to the intended interpretations as
shown in Appendix~C whenever possible; but this is not necessary. For
example, in countries with an alphabet of more than 26 letters, it is
usually best to map the additional letters into codes less than~@'40.
To get the most ``permissive'' character set, change |' '| on the
right of these assignment statements to |chr(i)|.
@^character set dependencies@>
@^system dependencies@>
@<Set init...@>=
for i:=0 to @'37 do xchr[i]:=' ';
for i:=@'177 to @'377 do xchr[i]:=' ';
@ The following system-independent code makes the |xord| array contain a
suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]|
where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be
|j| or more; hence, standard ASCII code numbers will be used instead of
codes below @'40 in case there is a coincidence.
@<Set init...@>=
for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code;
for i:=@'200 to @'377 do xord[xchr[i]]:=i;
for i:=0 to @'176 do xord[xchr[i]]:=i;
@* String handling.
(The following material is copied from the \\{get\_strings\_started} procedure
of \TeX82, with slight changes.)
@<Glob...@>=
@!k,@!l:0..255; {small indices or counters}
@!m,@!n:text_char; {characters input from |pool_file|}
@!s:integer; {number of strings treated so far}
@ The global variable |count| keeps track of the total number of characters
in strings.
@<Glob...@>=
@!count:integer; {how long the string pool is, so far}
@ @<Set init...@>=
count:=0;
@ This is the main program, where \.{POOLtype} starts and ends.
@d abort(#)==begin write_ln(#); goto 9999;
  end
@p begin initialize;@/
@<Make the first 256 strings@>;
s:=256;@/
@<Read the other strings from the \.{POOL} file,
or give an error message and abort@>;
write_ln('(',count:1,' characters in all.)');
9999:end.
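@ The macro |lc_hex| sets |l| to the ASCII code of the lowercase hexadecimal
digit that represents its argument, which should lie between 0 and~15.
For example, code @'233 (decimal 155) is printed as \.{\^\^9b}.
@d lc_hex(#)==l:=#;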
if l<10 then l:=l+"0" @+else l:=l-10+"a"
@<Make the first 256...@>=
for k:=0 to 255 do
  begin write(k:3,': "'); l:=k;
  if (@<Character |k| cannot be printed@>) then
    begin write(xchr["^"],xchr["^"]);
    if k<@'100 then l:=k+@'100
    else if k<@'200 then l:=k-@'100
    else begin lc_hex(k div 16); write(xchr[l]); lc_hex(k mod 16); incr(count);
      end;
    count:=count+2;
    end;
  if l="""" then write(xchr[l],xchr[l])
  else write(xchr[l]);
  incr(count); write_ln('"');
  end
@ The first 128 strings will contain 95 standard ASCII characters, and the
other 33 characters will be printed in three-symbol form like `\.{\^\^A}'
unless a system-dependent change is made here. Installations that have
an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|,
would like string @'32 to be the single character @'32 instead of the
three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand,
even people with an extended character set will want to represent string
@'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to
produce visible strings instead of tabs or line-feeds or carriage-returns
or bell-rings or characters that are treated anomalously in text files.

Unprintable characters of codes 128--255 are, similarly, rendered
\.{\^\^80}--\.{\^\^ff}.

The boolean expression defined here should be |true| unless \TeX\
internal code number~|k| corresponds to a non-troublesome visible
symbol in the local character set. An appropriate formula for the
extended character set recommended in {\sl The \TeX book\/} would, for
example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'.
If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or
|k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46,
@'60..@'71, @'141..@'146, @'160..@'171]| must be printable.
Thus, at least 80 printable characters are needed.
@:TeXbook}{\sl The \TeX book@>
@^character set dependencies@>
@^system dependencies@>
@<Character |k| cannot be printed@>=
(k<" ")or(k>"~")
@ When the \.{WEB} system program called \.{TANGLE} processes a source file,
it outputs a \PASCAL\ program and also a string pool file. The present
program reads the latter file, where each string appears as a two-digit decimal
length followed by the string itself, and the information is output with its
associated index number. The strings are surrounded by double-quote marks;
double-quotes in the string itself are repeated.
@<Glob...@>=
@!pool_file:packed file of text_char;
{the string-pool file output by \.{TANGLE}}
@!xsum:boolean; {has the check sum been found?}
@ @<Read the other strings...@>=
reset(pool_file); xsum:=false;
if eof(pool_file) then abort('! I can''t read the POOL file.');
repeat @<Read one string, but abort if there are problems@>;
until xsum;
if not eof(pool_file) then abort('! There''s junk after the check sum')
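@ Each string line begins with two decimal digits |m| and |n| giving its
length, which is |(xord[m]-"0")*10+(xord[n]-"0")|; the formula below is the
same expression with the two subtractions of |"0"| combined. For example,
the digits \.{42} yield $52\cdot10+50-48\cdot11=42$. A line whose first
character is `\.{*}' contains the check sum and marks the end of the strings.
@<Read one string...@>=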
if eof(pool_file) then abort('! POOL file contained no check sum');
read(pool_file,m,n); {read two digits of string length}
if m<>'*' then
  begin if (xord[m]<"0")or(xord[m]>"9")or(xord[n]<"0")or(xord[n]>"9") then
    abort('! POOL line doesn''t begin with two digits');
  l:=xord[m]*10+xord[n]-"0"*11; {compute the length}
  write(s:3,': "'); count:=count+l;
  for k:=1 to l do
    begin if eoln(pool_file) then
      begin write_ln('"'); abort('! That POOL line was too short');
      end;
    read(pool_file,m); write(xchr[xord[m]]);
    if xord[m]="""" then write(xchr[""""]);
    end;
  write_ln('"'); incr(s);
  end
else xsum:=true;
read_ln(pool_file)
@* System-dependent changes.
This section should be replaced, if necessary, by changes to the program
that are necessary to make \.{POOLtype} work at a particular installation.
It is usually best to design your change file so that all changes to
previous sections preserve the section numbering; then everybody's version
will be consistent with the printed program. More extensive changes,
which introduce new sections, can be inserted here; then only the index
itself will get a new section number.
@^system dependencies@>
@* Index.
Indications of system dependencies appear here together with the section numbers
where each ident\-i\-fier is used.