File: TUNING

pfe 0.9.14-5
TUNING THE PORTABLE FORTH ENVIRONMENT			-*- indented-text -*-
#####################################

1) Loop unrolling in the inner interpreter
==========================================

The most time-critical piece of code in pfe is the inner interpreter,
a tight loop calling all primitives compiled into a high-level
definition. You will find it in the file support.c, in the function
run_forth().

On some CPUs it saves significant time when the code of the inner
interpreter is unrolled several times, so that it need not jump back
to the start of the loop after every primitive is executed. On other
CPUs it doesn't help or even makes things slower.

For example, the benchmark performance of pfe on a 486 is about 15%
better with unrolled NEXT, while the performance on a Pentium becomes
slightly worse.

You'll have to try which is better on your machine. To enable the
feature, add the following compiler option in the Makefile:

	 -DUNROLL_NEXT
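
To illustrate the idea, here is a minimal sketch of an inner
interpreter with an optionally unrolled NEXT. It is not pfe's actual
run_forth(): the names prim_t, run(), run_demo() and the four toy
primitives are made up for this example, and leaving the loop through
longjmp() is just an assumption that keeps the sketch short.

```c
#include <setjmp.h>

typedef void (*prim_t)(void);      /* assumed: a primitive is a C function */

static prim_t *ip;                 /* virtual instruction pointer */
static long stack[16], *sp;        /* toy data stack, grows downwards */
static jmp_buf done;               /* used to leave the endless loop */

static void lit5(void) { *--sp = 5; }                 /* push 5 */
static void dupe(void) { --sp; sp[0] = sp[1]; }       /* DUP */
static void plus(void) { sp[1] += sp[0]; ++sp; }      /* + */
static void halt(void) { longjmp(done, 1); }          /* stop interpreting */

/* NEXT: fetch the next primitive, advance ip, execute it. */
#define NEXT (*ip++)()

static void run(prim_t *code)
{
    ip = code;
    if (setjmp(done))
        return;                    /* halt() lands here */
    for (;;) {
#ifdef UNROLL_NEXT
        NEXT; NEXT; NEXT; NEXT;    /* one jump back per four primitives */
#else
        NEXT;                      /* one jump back per primitive */
#endif
    }
}

/* Runs the sequence "5 DUP +" and returns the top of the stack. */
long run_demo(void)
{
    prim_t prog[] = { lit5, dupe, plus, halt };
    sp = stack + 16;
    run(prog);
    return sp[0];
}
```

Compiling this with and without -DUNROLL_NEXT gives the same result;
only the number of jumps back to the top of the loop differs.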


2) Using global register variables
==================================

pfe is designed for best portability. This means it can be compiled
with a variety of compilers on many systems. Obviously this prevented
me from squeezing the last bit of performance out of any special
system.

Fortunately there's a way to tune it up significantly with little
effort, provided you have GNU-C at hand.

Let me explain: As most of you probably know, a Forth-interpreter
traditionally contains a so-called virtual machine. PFE does. This
virtual machine consists of several virtual registers and a basic set
of operations. The virtual registers are:

	ip	an instruction pointer
	sp	the data stack pointer
	rp	the return stack pointer
	w	an auxiliary register

In pfe there are additionally:

	lp	pointer to local variables
	fp	floating point stack pointer

In a traditional assembler-based Forth implementation these virtual
registers would be mapped to physical registers of the CPU at hand.
How efficient such an implementation is depends heavily on how
cleverly this mapping is done.

pfe has no other choice than to declare C-language global variables to
represent these virtual registers. These variables are accessed *very*
frequently.
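
A minimal sketch of what such declarations might look like. The types
cell and prim_t, the stack size and the helper functions are
assumptions made for illustration, not pfe's actual source.

```c
typedef long cell;               /* assumed: one stack cell */
typedef void (*prim_t)(void);    /* assumed: a compiled primitive */

/* The virtual registers as ordinary C global variables. */
prim_t *ip;                      /* instruction pointer */
cell   *sp;                      /* data stack pointer */
cell   *rp;                      /* return stack pointer */
prim_t  w;                       /* auxiliary register */
cell   *lp;                      /* pointer to local variables */
double *fp;                      /* floating point stack pointer */

static cell data_stack[64];      /* toy data stack, grows downwards */

void vm_reset(void) { sp = data_stack + 64; }
void push(cell x)   { *--sp = x; }
cell pop(void)      { return *sp++; }

/* SWAP: every use of sp here is a load or store of a global in memory. */
void swap_(void)
{
    cell t = sp[0];
    sp[0] = sp[1];
    sp[1] = t;
}
```

Each primitive has to load sp from memory and store it back, which is
exactly the cost the following sections try to avoid.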

Now GNU-C allows us to put global variables in registers! Obviously
the number of registers in a CPU is limited and the use of registers
by library functions and the compiler itself interferes.

In spite of these restrictions it is possible, even on an i386, to
find room for the two most important virtual registers, resulting in
a performance boost of about 50%. (Just one more detail that shows
what a great job the GNU-C developers did.)


If your system is one of those known by the config-script then all
provisions to use global register variables have already been made.
You can enable and disable the usage of global register variables in
`src/makefile' by specifying the command line option '-DUSE_REGS'
(default) or removing it.

If your system isn't known by the config script, then first make sure
you have a stable port according to the instructions in the file
`INSTALL'. Then read the next section to enable the usage of register
variables on your system. If all works well please send me your
changes.


Warning:

Current versions of gcc (<= 2.6.0) seem to generate incorrect code in
very special situations when global register variables are used. This
has been reported and is fixed in later gcc versions.

When you find something not working that worked in previous versions
of pfe, then please check if it works again after recompiling pfe
without -DUSE_REGS. Please inform me of such cases:
duz@roxi.rz.fht-mannheim.de <Dirk Zoller>


Choosing registers to use
=========================

When you use global register variables in GNU-C then you have to
state explicitly which machine register to use for each global
variable declared "register". The syntax is like this:

	register type variable_name asm ("machine register name");

instead of just

	type variable_name;

As far as I can see, choosing machine registers to use for global
register variables is just a matter of trial and error.

First find out how registers are named on your machine. Not how the
CPU manufacturer names them, but how the assembler used by gcc (as or
gas, depending on the configuration of gcc) names them.  It's easy:
simply use gcc to compile one of the C files with option -S.
I changed the `makefile' so that you can do this by simply typing
`make core.s'.

Then look at `core.s': You don't have to know much of assembly
language programming and even less of the particular CPU. All you are
interested in is: what are the registers? In `core.s' search for the
label `dupe_' i.e. the compiled function that does the work of the
Forth word `DUP'.  The C-source for dupe is:

	Code (dupe)
	{
  	  --sp;
  	  sp[0] = sp[1];
	}


On an RS/6000 (where you won't have to do this because I did it
already) using gcc you'd find the following assembler lines generated
for dupe_:

.dupe_:
        l 11,LC..106(2)
        l 9,0(11)
        cal 0,-4(9)
        st 0,0(11)
        l 0,0(9)
        st 0,-4(9)
        br

Reading more of the generated assembler source led me to guess that
 - Gcc talks to the assembler about registers by their numbers only.
 - Gcc never uses registers with numbers around 16, although the CPU
   seems to have 32 such registers.

Next edit the file `src/virtual.h'. Add a system specific section of
preprocessor definitions naming CPU registers to use for virtual
machine registers like this:

	...
	#elif AIX3

	#  define REGIP "13"
	#  define REGSP "14"
	#  define REGRP "15"
	#  define REGW  "16"
	#  define REGLP "17"
	#  define REGFP "18"

	#elif...

Ok, the full set needed a little more experimentation. Maybe start
with only REGSP or REGIP.
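
How such definitions might be consumed can be sketched as follows,
starting with REGSP only. This is an illustration, not the contents
of `src/virtual.h': the type cell, the helper functions and the
x86-64 register name "r15" are assumptions made for this example.
Without -DUSE_REGS the sketch falls back to a plain global.

```c
typedef long cell;               /* assumed: one stack cell */

/* Hypothetical system-specific section. The register name is an
   assumption for x86-64 with gcc; other systems need other names. */
#if defined(USE_REGS) && defined(__GNUC__) && !defined(__clang__) \
    && defined(__x86_64__)
#  define REGSP "r15"
#endif

/* Map a virtual register to a CPU register only if a name was given,
   so the mapping can be tried one register at a time. */
#ifdef REGSP
register cell *sp __asm__ (REGSP);   /* __asm__ is gcc's spelling of asm */
#else
cell *sp;                            /* portable fallback: plain global */
#endif

static cell data_stack[64];          /* toy data stack, grows downwards */

/* Global register variables cannot have initializers,
   so sp is set up at run time. */
void vm_reset(void) { sp = data_stack + 64; }
void push(cell x)   { *--sp = x; }
cell pop(void)      { return *sp++; }

/* DUP, written like the C source shown above. */
void dupe(void) { --sp; sp[0] = sp[1]; }
```

The same pattern would be repeated for REGIP, REGRP and so on, each
guarded by its own #ifdef, so that a register that causes trouble can
be left as a plain global.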

After enabling these declarations with the -DUSE_REGS command line
option another `make core.s' yields the following translation for DUP:

.dupe_:
	cal 14,-4(14)
	l 0,4(14)
	st 0,0(14)
	br

Quite a difference!

If your CPU has different types of registers for data and for pointers
then pfe needs the pointer registers. (On the M68k, the Ax registers,
not the Dx.)

If you don't have enough free registers in your CPU then assign the
virtual registers at the top of the above list first. They are ordered
by importance.

Then do a `make new' with option -DUSE_REGS. If you get compiler
errors and warnings about `spilled' or `clobbered' registers then
change the mapping until it compiles quietly. There's a good chance
that it still runs now, and if it does, it runs significantly faster
than before.

Good luck!

Dirk