File: makarov.html

package info (click to toggle)
lg-issue47 2-6
  • links: PTS
  • area: main
  • in suites: woody
  • size: 3,768 kB
  • ctags: 237
  • sloc: ansic: 232; sh: 122; makefile: 36; perl: 9
file content (733 lines) | stat: -rw-r--r-- 30,687 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
<!--startcut BEGIN header ==============================================-->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Programming in Dino LG #47</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!--endcut ============================================================-->

<H4>
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>

<P> <HR> <P> 
<!--===================================================================-->

<center>
<H1><font color="maroon">Programming in Dino</font></H1>
<H4>By <a href="mailto:vmakarov@fnmail.com">Vladimir N. Makarov</a></H4>
</center>
<P> <HR> <P>  

<!-- END header -->




<BR><I>Dino</I> is a high-level, dynamic scripting language that
has been designed for simplicity, uniformity, and expressiveness. 
Dino is similar to such well known scripting languages as <I>Perl</I>,
<I>TCL</I>, and <I>Python</I>. 
As most programmers know the C language, Dino resembles C where possible.

<p>Dino is an extensible, object oriented language that has garbage collection.  
It supports parallelism description, exception handling, and dynamic loading 
of libraries written on other languages.  
Although Dino is a multiplatform language, its main platform is Linux.

<P>This document is a concise introduction to the new Dino scripting language, 
but is not a programmer's manual. 

<H2>

<HR WIDTH="100%"></H2>

<H2>
1. Some History</H2>

Originally, Dino was designed and implemented by the Russian graphics
company <A HREF="http://www.animatek.com">ANIMATEK</A> to describe the
movement of dinosaurs in an animation project. (This is origin of
the language's name.) At that time it worked in only 64Kb memory. It has
been considerably redesigned and reimplemented with the aid of the <A
HREF="http://www.linuxstart.com/~vladimir_makarov/download.html">COCOM
toolset</A>.

<H2>2. Let's Begin</H2> 

The best way to get the feel of a programming language is to see a program 
written in it. Because I have worked in the compiler field for the last 18 years, 
I'll write a small assembler in Dino. 

<P>
Most of us do not
remember how programmers wrote programs for old computers that had
only a few kilobytes of memory. Long ago I read about an Algol 60 compiler 
that worked on a  computer that had only 420 20-bits words.  
In an old book "Compiler Construction for Digital
Computers", Gries describes an Algol compiler working on 1024 42-bits
words. How did they achieve this?  One of the ways is to use an interpreter
for a specialized language; a program in a specialized language is
usually smaller.  Let's implement an assembler for syntactic parsers.  
The assembler output will be a syntactic parser
interpreter in C.  The assembler instructions have the following format:

<BLOCKQUOTE>
<BLOCKQUOTE>
<PRE><TT>[label:] [code [operand]]</TT></PRE>
</BLOCKQUOTE>
</BLOCKQUOTE>
Here, the constructions in brackets are optional. 
For convenience we will allow comments that start with <B>;</B> and
finish at the end of the line.
<P> 
The assembler will have the following instructions:
<BR> 
<CENTER><TABLE WIDTH="80%" NOSAVE >
<TR ALIGN=LEFT VALIGN=TOP NOSAVE>
<TD NOSAVE><TT><FONT SIZE=+1>goto <i>label</i></FONT></TT></TD>

<TD NOSAVE>Transfer control to the instruction marked <i>label</i>.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP NOSAVE>
<TD><TT><FONT SIZE=+1>gosub <i>label</i></FONT></TT></TD>

<TD NOSAVE>Transfer control to the instruction marked <i>label</i> and save
the next instruction address.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP NOSAVE>
<TD NOSAVE><TT><FONT SIZE=+1>return</FONT></TT></TD>

<TD NOSAVE>Transfer control to the instruction following the
latest executed <TT>gosub</TT> instruction.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP NOSAVE>
<TD><TT><FONT SIZE=+1>skipif symbol</FONT></TT></TD>

<TD ALIGN=LEFT VALIGN=TOP NOSAVE>If the current token is <TT>symbol</TT>,
the following instruction is skipped. Otherwise, transfer control 
to the following instruction.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP NOSAVE>
<TD ALIGN=LEFT VALIGN=TOP NOSAVE><TT><FONT SIZE=+1>match symbol</FONT></TT></TD>

<TD NOSAVE>The current token should be <TT>symbol</TT>, otherwise a 
syntax error is set.  After matching, the next token is read and
become the current token.</TD> </TR>

<TR ALIGN=LEFT VALIGN=TOP NOSAVE>
<TD><TT><FONT SIZE=+1>next</FONT></TT></TD>

<TD NOSAVE>The next token is read and become the current token.</TD>
</TR>
</TABLE></CENTER>
<P>
The following assembler fragment recognizes Pascal designators. 
<BLOCKQUOTE>
<PRE><TT>;
; Designator = Ident { "." Ident | "[" { expr / ","} "]" | "@" }
;
start:
Designator:
        match   Ident
point:  skipif  Point
        goto    lbrack
        next    ; skip .
        match   Ident
        goto    point
lbrack: skipif  LBracket
        goto    at
        next    ; skip [
next:   gosub   expr
        skipif  Comma
        goto    rbrack
        next    ; skip ,
        goto    next
rbrack: match   RBracket
        goto    point
at:     skipif  At
        return
        next    ; skip @
        goto    point</TT></PRE>
</BLOCKQUOTE>

<H3>2.1. Overall structure of the assembler.</H3>

As a rule, assemblers work in two passes. Therefore, we need to have some
internal representation (IR) to store the program between the passes. 
We will create the following Dino files:
<UL>
<LI>
The code that describes the IR and the auxiliary functions will be in the file <A
HREF="misc/makarov/ir.d">ir.d</A>.
</LI><LI>  
The code for reading the
assembler program and for generating the IR will be in the file <A
HREF="misc/makarov/input.d">input.d</A>.  
</LI><LI>
The code for checking the IR will be in the file
<A HREF="misc/makarov/check.d">check.d</A>.
</LI><LI>The code for generating the interpreter
in C will be in the file <A HREF="misc/makarov/gen.d">gen.d</A>
</LI><LI>
The top level code will be in the file <A HREF="misc/makarov/sas.d">sas.d</A>.
</LI>
</UL>
<P>These files are described in detail below.</P>

<H3>2.2. File <A HREF="misc/makarov/ir.d">ir.d</A></H3>

This file contains the description of the IR in Dino and also some auxiliary
functions.  Dino has dynamic variables. In other words, a Dino
variable may contain a value of any Dino type. The major Dino types are:
<UL>
<LI>
<i>characters</i>
</LI><LI>
<i>integers</i> (32-bit)
</LI><LI>
<i>floating point
numbers</i> (IEEE double floating point numbers)
</LI><LI>
<i>heterogeneous vectors</i> (that is, vectors that may contain elements of different
types. A typical example of vector is a string, a vector whose
values can only be characters)
</LI><LI>
<i>associative tables</i>
</LI><LI>
<i>objects</i>
</LI></UL>
The values of the last three types are <i>shared</i>.
That means that if a variable value is assigned to another variable, any
changes to the shared value through the first variable are
reflected in the value of the second variable.  In general, working 
with shared values is analogous to working with pointers in C, but with fewer risks.

<P>Line 1 describes an abstract node of an IR. A node of such class has the 
variable <TT>lno</TT> (which is the source line of the corresponding assembler
instruction).  The variable is also a class parameter.  That means
that you should define its value when creating a class instance or
object (see line 7).  Inside class <TT>irn</TT>, classes describing
each assembler instruction are defined.  Each Dino object (and not
only objects) stores information about its <i>context</i>.  So if you
create an object of class <TT>next</TT> (see line 40 in file <A
HREF="misc/makarov/input.d">input.d</A>) by calling a class that is accessed through an
object of class <TT>irn</TT>, and then you take value of the variable
<tt>lno</tt> through the object of the class <tt>next</tt>, you actually
take the value of the variable of the object of the class <TT>irn</TT>. This is a 
more simple and more general mechanism of implementing a single
inheritance.

<P>An object of the class <TT>ir</TT> (lines 9-13) contains information about
the entire program:
<UL>
<li><TT>ns</TT>, which is initialized by an empty vector, will
contain a vector of IR nodes that correspond to all instructions in the source
program.</li>
<li><TT>l2i</TT>, which is initialized by an empty associative table,
will contain a table for transforming label names into an index of the 
node in <TT>ns</TT>.  This node will represent assembler instruction marked
by the label.</li>
<li><TT>i2l</TT>, which is initialized by an empty associative
table, will contain a table for transforming the index of the node in <tt>ns</tt>
into a vector of label names. A node with such an index in <tt>ns</tt> will
represent assembler instruction marked by the labels in the vector.</li>
<li><TT>ss</TT>, which is initialized by an empty associative table, will
 contain a table of all symbols in the assembler instructions
 <TT>match</TT> and <TT>skipif</TT>.</li>
<li><TT>mind</TT> and <TT>maxd</TT> will contain the minimum and maximum
displacements of labels in the source program.</li>
</UL>

<BLOCKQUOTE>
<PRE><TT> 1. class irn (lno) {
 2.   class goto (lab)     {}     class skipif (sym)    {}
 3.   class match (sym)    {}     class gosub (lab)     {}
 4.   class next ()        {}     class ret ()          {}
 5. }
 6.
 7. var an = irn (0);
 8. 
 9. class ir () {
10.   // all ir nodes, label->node index, node index -> vector of labels.
11.   var ns = [], l2i = {}, i2l = {};
12.   var ss = {}, mind, maxd;
13. }
14.
15. func err (...) {
16.   var i;
17.   fput (stderr, argv[0], ": ");
18.   for (i = 0; i ? #args; i++)
19.     if (args[i] != nil)
20.       fput (stderr, args[i]);
21.   fputln (stderr);
22.   exit (1);
23. }
24.
25. func tab2vect (tab) {
26.   var i, vect = [#tab:""];
27.   for (i in tab)
28.     vect [tab {i}] = i;
29.   return vect;
30. }</TT></PRE>
</BLOCKQUOTE>

Lines 15-23 contain a function to output errors.  The function accepts a 
variable number of parameters whose values will be elements of
the vector in the implicitly defined variable <TT>args</TT>.  Any function or
class can be called with any number of actual parameters.  If the
number of formal parameters is more than the number of actual
parameters, the rest of formal parameters will have the default value
<TT>nil</TT>.  If the number of actual parameters is more than the
number of formal parameters, the rest of the actual parameters will be
ignored unless the last formal parameter is "<TT>...</TT>".
<P>
The other elements are:
<UL>
<LI>The <tt>err</tt> function, which outputs all parameters into the standard error stream.
</LI><LI>
The <TT>fput</TT> function, which outputs strings, characters, integers, or floating
point numbers
</LI><LI>
The <TT>fputln</TT> function, which is the same as <TT>fput</TT>, but 
additionally outputs a new line)
</LI><LI>
The <TT>exit</TT> function, which finishes the Dino program with given
code.
</LI><LI>
The variables <TT>argv</TT>, which are all command line arguments of the Dino
program. So <TT>argv[0]</TT> will be an assembler program file name.
</LI><LI>
<TT>stderr</TT> (standard error stream), which is predefined in Dino.  
</LI>
</UL>
<P>There are many other predefined functions, classes, and variables in Dino.  On
line 18 you can see the operator <TT>#</TT>, which returns the number of
elements in a vector or an associative table.

<P>Lines 25-30 contain a function that transforms a table into a vector.
The table's keys are a sequence of integers that start with 0.
The result is a vector whose elements are the table
elements placed in the vector according their keys.  
First we create a vector of the table size and initialize it with empty strings (line 26). 
Then we fill each element of the vector, iterating by the keys of the table
(lines 27-28).


<H3>2.3. File <A HREF="misc/makarov/input.d">input.d</A></H3>

This file contains the function <TT>get_ir</TT>, which reads the file given as
its parameter, performs some checks on the file, and generates the IR of the
source program.

<P>The first line contains an <i>include-clause</i> that specifies a source file 
without the suffix <B>.d</B> (all Dino source files should have this
suffix). The file is given as a string in the clause; this means that
the entire file is inserted in place of the clause.  As result, we could
check the file by calling the Dino interpreter with <TT>input.d</TT> on
a command line.  There are several rules that define which directories 
are searched for the included file.  One such directory is the
directory of the file that contains the include-clause. Thus, we can place all
the assembler files in that one directory and forget about the other rules.  

<P>The
file is inserted only once in a given <i>block</i> (this is the construction that 
starts with <TT>{</TT> and finishes with <TT>}</TT>).  This is
important for our program because each file will contain an inclusion of
the file <TT>ir.d</TT>, and eventually all the files will be included into the 
main program file.  Unconditional inclusion in this case would result
in many error messages about repeated definitions.  By the way, there is
also special form of the include-clause that permits unconditional file
inclusion.

<P>On lines 6-13 we define some variables. We use regular expressions 
to assign them strings that describe the correct assembler lines. 
The regular expressions are extended regular expressions that are described in POSIX
10003.2.  To concatenate the strings (vectors), we use the operator <TT>@</TT>.

<P>Lines 16 to 52, 53 form a <i>try-block</i> that is used to process
<i>exceptional</i> situations in the Dino program. The Dino interpreter can
generate a lot of predefined exceptions. A Dino programmer can also 
describe and generate other exceptions.  The exceptions are objects of the 
predefined class <TT>except</TT>, or they are objects of a class defined inside
the class <TT>except</TT>. Dino has special constructions (extension
blocks) to add something into a class, and functions when the class or the
function is already defined.  In our example, the exception we catch is "reaching the
end of a file", which is generated by the predefined function <TT>fgetln</TT>
(reading a new line from a file).  If we do not catch the exception, the
program finishes with a diagnostic about reaching the end of the file. In the clause
<tt>catch</tt>, we write a class of exceptions that we want to catch.
The value of the predefined variable <TT>invcalls</TT> is the class
<TT>invcall</TT>, in which class <TT>eof</TT> is defined.  In turn, the 
class <TT>invcall</TT> is defined inside the class <TT>except</TT>.  If
the exception is of a class given in the <i>catch-clause</i> or of a class
defined somewhere inside a class given in the catch-clause, a block
corresponding to the catch-clause is executed. The variable <TT>e</TT> is
implicitly defined in the block that contains the exception.  The exception
is propagated further, unless the catch-clause corresponding to the
exception is found.

<P>The predefined function <TT>fgetln</TT> returns the next line from the
file.  After this, the line is matched with the pattern on line 20.
The predefined function <TT>match</TT> returns the value <TT>nil</TT> if the
input line does not correspond to the pattern, otherwise it returns a 
vector of integer pairs.  The first pair is the first and the last
character indexes in the line.  The first pair defines the substring that
corresponds to the whole pattern.  The following pairs of indexes
correspond to constructions in parentheses in the pattern.  They
define substrings that are matched to the constructions in the
parentheses.  If a construction is not matched (for example, because an
alternative is rejected), the indexes have the value -1.

<P>The statement on line 23 extracts a label. The predefined function <TT>subv</TT>
is used to extract the sub-vectors (sub-strings).

<P>On lines 24 and 25, we use an empty vector to initialize a table element 
that corresponds to the current assembler instruction.  On lines 26-31, we
process a label, if it is defined on the line.  On lines 27-28, we check
that the label is not defined repeatedly.  On line 29, we define how to 
map the label name into number of the assembler instruction to
which the label is bound.  We make that mapping with the aid of associative
table <TT>pr.l2i</TT>.  On line 30, we add the label name to the
vector that is the element of associative table <TT>pr.i2l</TT> that has a key
equal to the number of the assembler instruction.  Predefined function
<TT>ins</TT> (insertion of element into vector) is used with index -1, which 
means addition of the element at the vector end.  Dino has
extensible vectors.  There are also predefined functions to delete
elements in vectors (and associative tables).

<BLOCKQUOTE>
<PRE><TT> 1. include "ir";
 2.
 3. func get_ir (f) {
 4.   var ln, lno = 0, code, lab, op, v;
 5.   // Patterns
 6.   var p_sp = "[ \t]*";
 7.   var p_code = p_sp @ "(goto|skipif|gosub|match|return|next)";
 8.   var p_id = "[a-zA-Z][a-zA-Z0-9]*";
 9.   var p_lab = p_sp @ "((" @ p_id @ "):)?";
10.   var p_str = "\"[^\"]*\"";
11.   var p_op = p_sp @ "(" @ p_id @ "|" @ p_str @ ")?";
12.   var p_comment = p_sp @ "(;.*)?";
13.   var pattern = "^" @ p_lab @ "(" @ p_code @ p_op @ ")?" @ p_comment @ "$";
14.
15.   var pr = ir ();
16.   try {
17.     for (;;) {
18.       ln = fgetln (f);
19.       lno++;
20.       v = match (pattern, ln);
21.       if (v == nil)
22.         err ("syntax error on line ", lno);
23.       lab = (v[4] >= 0 ? subv (ln, v[4], v[5] - v[4]) : nil);
24.       if (!(#pr.ns in pr.i2l))
25.         pr.i2l {#pr.ns} = [];
26.       if (lab != nil) {
27.         if (lab in pr.l2i)
28.           err ("redefinition lab ", lab, " on line ", lno);
29.         pr.l2i {lab} = #pr.ns;
30.         ins (pr.i2l {#pr.ns}, lab, -1);
31.       }
32.       code = (v[8] >= 0 ? subv (ln, v[8], v[9] - v[8]) : nil);
33.       if (code == nil)
34.         continue;  // skip comment or absent code
35.       op = (v[10] >= 0 ? subv (ln, v[10], v[11] - v[10]) : nil);
36.       var node = irn (lno);
37.       if (code == "goto" || code == "gosub") {
38.         if (op == nil || match (p_id, op) == nil)
39.           err ("invalid or absent lab `", op, "' on line ", lno);
40.         node = (code == "goto" ? node.goto (op) :  node.gosub (op));
41.       } else if (code == "skipif" || code == "match") {
42.         if (op == nil || match (p_id, op) == nil)
43.           err ("invalid or absent name `", op, "' on line ", lno);
44.         node = (code == "skipif" ? node.skipif (op) : node.match (op));
45.       } else if (code == "return" || code == "next") {
46.         if (op != nil)
47.           err (" non empty operand `", op, "' on line ", lno);
48.         node = (code == "next" ? node.next (op) : node.ret ());
49.       }
50.       ins (pr.ns, node, -1);
51.     }
52.   } catch (invcalls.eof) {
53.   }
54.   return pr;
55. }</TT></PRE>
</BLOCKQUOTE>

On lines 36-49 we check the current assembler instruction and create
the corresponding IR node (an object of a class inside the class <TT>ir</TT>
-- see file <TT>ir.d</TT>).  And finally, we insert the node at the end
of the vector <TT>pr.ns</TT> (line 50).

<H3>2.4. File <A HREF="misc/makarov/check.d">check.d</A></H3>

After processing all assembler instructions in the file <TT>input.d</TT>,
we can check that all labels are defined (lines 7-9) and we can evaluate the 
maximum and minimum displacements of <TT>goto</TT> and <TT>gosub</TT>
instructions from the corresponding label definition (lines 10-13).
The function <TT>check</TT> makes this work.  It also forms an associative
table <TT>pr.ss</TT> of all symbols given in the instructions
<TT>match</TT> and <TT>skipif</TT>, and enumerates the symbols (lines
16-17).  Here the function <TT>inside</TT> (lines 6 and 14) is used to
define that an object is of a given class, or of a class defined
somewhere in a given class.

<BLOCKQUOTE>
<PRE><TT> 1. include "ir";
 2.
 3. func check (pr) {
 4.   var i;
 5.   for (i = 0; i ? #pr.ns; i++) {
 6.     if (inside (pr.ns[i], an.goto) || inside (pr.ns[i], an.gosub)) {
 7.       if (!(pr.ns[i].lab in pr.l2i))
 8.         err ("undefined label `", pr.ns[i].lab, "' on line ",
 9.              pr.ns[i].lno);
10.       if (pr.maxd == nil || pr.maxd ? pr.l2i {pr.ns[i].lab} - i)
11.         pr.maxd = pr.l2i {pr.ns[i].lab} - i;
12.       if (pr.mind == nil || pr.mind > pr.l2i {pr.ns[i].lab} - i)
13.         pr.mind = pr.l2i {pr.ns[i].lab} - i;
14.     } else if (inside (pr.ns[i], an.match)
15.                || inside (pr.ns[i], an.skipif)) {
16.       if (!(pr.ns[i].sym in pr.ss))
17.         pr.ss {pr.ns[i].sym} = #pr.ss;
18.     }
19.   }
20. }</TT></PRE>
</BLOCKQUOTE>

<H3>2.5. File <A HREF="misc/makarov/gen.d">gen.d</A></H3>

The biggest assembler source file is the interpreter generator.  We
generates two files: a <B><TT>.h</TT></B> file (the interface of the interpreter)
and a <B><TT>.c</TT></B> file (the interpreter itself).  We create the files
on line 4.  The parameter <TT>bname</TT> of the function <TT>gen</TT> is a 
base name of the generated files.  The interface file contains
definitions of codes of tokens in <TT>match</TT> and <TT>skipif</TT>
instructions as C macros (lines 6-9) and definition of function
<TT>yyparse</TT> (line 35).  Function <TT>yyparse</TT> is a main
interpreter function.  It returns 0 if the source program is correct,
and nonzero otherwise.

<P>The generated interpreter requires the external functions <TT>yylex</TT>
and <TT>yyerror</TT> (line 34). The function <TT>yylex</TT> is used by
the interpreter to read and to get the code of the current token.
Function <TT>yyerror</TT> should output error diagnostics. (The
interface is a simplified version of the Yacc Unix Utility interface.)

<P>The compiled assembler program is presented by a C array of chars or
short integers with the name <tt>program</tt>.  Each element of the array
is an encoded instruction of the source program.  On lines 11-15, we
evaluate the start code for each kind of assembler instruction and define the 
type of array elements.  On lines 16-33, we output the array
<tt>program</tt>. On lines 37-61, we output the function <TT>yyparse</TT>.
Finally, on lines 62-63 we close the two output files with the aid of the
predefined function <TT>close</TT>.

<BLOCKQUOTE>
<PRE><TT> 1. include "ir";
 2. 
 3. func gen (pr, bname) {
 4.   var h = open (bname @ ".h", "w"), c = open (bname @ ".c", "w");
 5.   var i, vect;
 6.   vect = tab2vect (pr.ss);
 7.   for (i = 0; i ? #vect; i++)
 8.     fputln (h, "#define ", vect[i], "\t", i + 1);
 9.   fputln (h);
10.   fputln (c, "#include \"", bname, ".h\"\n\n");
11.   var match_start = 3, skipif_start = match_start + #pr.ss,
12.       goto_start = skipif_start + #pr.ss,
13.       gosub_start = goto_start + (pr.maxd - pr.mind) + 1,
14.       max_code = gosub_start + (pr.maxd - pr.mind);
15.   var t = (max_code ? 256 ? "unsigned char" : "unsigned short");
16.   fputln (c, "\nstatic ", t, " program [] = {");
17.   for (i = 0; i ? #pr.ns; i++) {
18.     if (inside (pr.ns[i], an.goto))
19.       fput (c, " ", goto_start + pr.l2i{pr.ns[i].lab} - i - pr.mind, ",");
20.     else if (inside (pr.ns[i], an.match))
21.       fput (c, " ", match_start + pr.ss{pr.ns[i].sym}, ",");
22.     else if (inside (pr.ns[i], an.next))
23.       fput (c, " 1,");
24.     else if (inside (pr.ns[i], an.ret))
25.       fput (c, " 2,");
26.     else if (inside (pr.ns[i], an.skipif))
27.       fput (c, " ", skipif_start + pr.ss{pr.ns[i].sym}, ",");
28.     else if (inside (pr.ns[i], an.gosub))
29.       fput (c, " ", gosub_start + pr.l2i{pr.ns[i].lab} - i - pr.mind, ",");
30.     if ((i + 1) % 20 == 0)
31.       fputln (c);
32.   }
33.   fputln (c, " 0, 0\n};\n\n");
34.   fputln (h, "extern int yylex ();\nextern int yyerror ();\n");
35.   fputln (h, "\nextern int yyparse ();\n");
36.   fputln (h, "#ifndef YYSTACK_SIZE\n#define YYSTACK_SIZE 50\n#endif");
37.   fputln (c, "\nint yyparse () {\n  int yychar=yylex (), pc=0, code;\n  ",
38.           t, " call_stack [YYSTACK_SIZE];\n  ", t, " *free=call_stack;");
39.   fputln (c, "\n  *free++=sizeof (program) / sizeof (program [0]) - 1;");
40.   fputln (c, "  while ((code=program [pc]) != 0 ?? yychar > 0) {");
41.   fputln (c, "    pc++;\n    if (code == 1)\n      yychar=yylex ();");
42.   fputln (c, "    else if (code == 2) /*return*/\n      pc=*--free;");
43.   fputln (c, "    else if ((code -= 2) ? ", #pr.ss, ") {/*match*/");
44.   fputln (c, "      if (yychar == code)\n        pc++;\n      else {");
45.   fputln (c, "        yyerror (\"Syntax error\");");
46.   fputln (c, "        return 1;\n      }");
47.   fputln (c, "    } else if ((code -= ", #pr.ss, ") ? ", #pr.ss, ") {");
48.   fputln (c, "      if (yychar == code)\n        pc++; /*skipif*/");
49.   fputln (c, "    } else if ((code -= ", #pr.ss, ") ?= ", pr.maxd-pr.mind,
50.           ") /*goto*/\n      pc += code + ", pr.mind, ";");
51.   fputln (c, "    else if ((code -= ", pr.maxd - pr.mind + 1, ") ?= ",
52.           pr.maxd - pr.mind, ") { /*gosub*/");
53.   fputln (c, "      if (free >= call_stack + YYSTACK_SIZE) {");
54.   fputln (c, "        yyerror (\"Call stack overflow\");");
55.   fputln (c, "        return 1;\n      }\n      pc += code + ", pr.mind,
56.       ";\n      *free++=pc;\n    } else {");
57.   fputln (c, "      yyerror(\"Internal error\");\n      return 1;\n    }");
58.   fputln (c, "  }\n  if (code != 0 || yychar > 0) {");
59.   fputln (c, "    if (code != 0)\n      yyerror (\"Unexpected EOF\");");
60.   fputln (c, "    else\n      yyerror(\"Garbage after end of program\");");
61.   fputln (c, "    return 1;\n  }\n  return 0;\n}");
62.   close (h);
63.   close (c);
64. }</TT></PRE>
</BLOCKQUOTE>

<H3>2.6. File <A HREF="misc/makarov/sas.d">sas.d</A></H3>

This is the main assembler file.  Lines 1-4 are include-clauses for the
inclusion of the previous files.  Line 6-7 checks that the argument is
given on the command line.  On line 9 we open the file given on the
command line, and call the function for reading and generating the IR of
the program.  If the file does not exist or cannot be opened for
reading, an exception is generated.  The exception results in the output
of standard diagnostics and finishes the program. We could catch the
exception and do something else, but the standard diagnostics will be
sufficient here.  On line 10, we check the IR.  And finally on line 11,
we generate the interpreter of the program.  To get the base name of
the assembler file, we use the predefined function <TT>sub</TT>, deleting
all directories and suffixes from the file name and returning the
result.

<BLOCKQUOTE>
<PRE><TT> 1. include "ir";
 2. include "input";
 3. include "check";
 4. include "gen";
 5.
 6. if (#argv != 1)
 7.   err ("Usage: sas file");
 8.
 9. var pr = get_ir (open (argv[0], "r"));
10. check (pr);
11. gen (pr, sub ("^(.*/)?([^.]*)(\\..*)?$", argv[0], "\\2"));</TT></PRE>
</BLOCKQUOTE>

<H3>2.7. Results</H3>

So we've written the assembler (this is 200 lines in Dino).  As a 
test, we will use <A
HREF="http://www.edm2.com/0608/oberon2.html">Oberon-2</A> language
grammar.  You can look at Oberon-2 parser in the file <A
HREF="misc/makarov/oberon2.sas">oberon2.sas</A>. After

<UL>
<UL><TT>dino sas.d oberon2.sas</TT></UL>
</UL>

we will get two files <A HREF="misc/makarov/oberon2.h">oberon2.h</A> and <A
HREF="misc/makarov/oberon2.c">oberon2.c</A>.  Let's look at the size of generated
IAX-32 code:

<BLOCKQUOTE>
<PRE><TT>    gcc -c -Os oberon2.c; size oberon2.o

    text        data    bss     dec     hex     filename
    382         934     0       1316    524     oberon2.o</TT></PRE>
</BLOCKQUOTE>

For comparison, we would have about 15Kb for a YACC generated parser.
Not bad.  But we could make the parser even less than 1Kb by using 
short and long <TT>goto</TT> and <TT>gosub</TT> instructions.
Actually, what we generate is not a parser, it is only a recognizer.
But the assembler language could be easily extended to write parsers.
Just add the instructions:

<BLOCKQUOTE>
<PRE><TT>   call C-function-name</TT></PRE>
</BLOCKQUOTE>

to call semantic functions for the generation of parsed code.  In
any case, most of a compiler's code would be in C.  To further decrease the
compiler size (not only its parser), an interpreter of C that is
specialized to the compiler could be written.

<P>Of course, it is not easy work to write a parser on the assembler. 
So we could generate assembler code from a high-level syntax description,
for example, from a Backus-Naur form.  Another area for improvement is the implementation
of error recovery, but this was not our purpose.  Our goal
was just to demonstrate the Dino language.

<H2>3. Some last comments</H2>

What Dino's features were missed in this introduction?  Many details, of
course, but we also have not described the following major parts of Dino
language:

<UL>
<LI>Extensions</LI>

<LI>Threads</LI>

<LI>Public and private variables</LI>

<LI>Mutable and immutable values</LI>

<LI>Functions, classes, threads, and types as values</LI>

<LI>Interface with C language and writing external dynamic libraries</LI>
</UL>

The Dino interpreter is distributed under GNU Public license.  You can
find it on

<P> <A HREF="http://www.linuxstart.com/~vladimir_makarov/dinoload.html">http://www.linuxstart.com/~vladimir_makarov/dinoload.html</A>
<BR> <A HREF="http://www.freespeech.org/vmakarov/dinoload.html">http://www.freespeech.org/vmakarov/dinoload.html</A>
<BR>

<P> Special thanks to Michael Behm (a member of the Cygnus documentation group)
for his help in editing this article.
</BODY>
</HTML>



<!-- BEGIN copyright ==================================================-->
<P> <hr> <P> 
<H5 ALIGN=center>

Copyright &copy; 1999, Vladimir N. Makarov<BR> 
Published in Issue 47 of <i>Linux Gazette</i>, November 1999</H5>
<!-- END copyright ===================================================-->





<!--startcut footer ===================================================-->
<P> <hr> <P> 
<A HREF="lg_toc47.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif" 
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../lg_frontpage.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="lukas.html"><IMG SRC="../gx/back2.gif"
ALT=" Back "></A>
<A HREF="../lg_faq.html"
	><IMG SRC="./../gx/dennis/faq.gif"
              ALT="[ Linux Gazette FAQ ]"></A>
<A HREF="mcgowan.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P> 
</BODY></HTML>
<!--endcut ============================================================-->