<html>
<head>
<title>netrik hacker's manual</title>
</head>
<body>

<h1 align="center">netrik hacker's manual<br />&gt;========================&lt;</h1>

<p>
The manual contains detailed documentation on the netrik code. It should be of
interest to you if you want to hack on netrik, but also if you are only curious
how it works.
</p>

<p>
The manual includes:
<ul><li>
 hacking.* (this file): general notes, module overview
</li><li>
 <a href="hacking-load.html">hacking-load.*</a>: loading HTML files from local disk or HTTP connections
</li><li>
 <a href="hacking-page.html">hacking-page.*</a>: page (history) handling
</li><li>
 <a href="hacking-layout.html">hacking-layout.*</a>: the layouting/rendering module
</li><li>
 <a href="hacking-pager.html">hacking-pager.*</a>: the interactive viewing module
</li><li>
 <a href="hacking-links.html">hacking-links.*</a>: hyperlink handling
</li><li>
 <a href="hacking-search.html">hacking-search.*</a>: text search
</li><li>
 <a href="hacking-terminal.html">hacking-terminal.*</a>: terminal input/output and color handling
</li></ul>
("*" means txt or html -- choose the one you prefer...)
</p>

<a name="notes" id="notes">

<h2>0. General notes</h2>

<p>
The program (except for the layout engine) presently doesn't follow any specific
structure. It can be roughly divided into a couple of big modules: layout
engine, file loading, and pager; the page handling, which is another part of
the loading mechanism, may also be considered a module of its own. However,
these modules aren't separated clearly; they intersect at many places. The
loading is tightly coupled to the layouting -- the page handling part invokes
certain layouting passes, while the file loader offers its service to feed the
data into the layout engine. However, these two parts of the loading mechanism
need to exchange much information (URLs, error conditions etc.), which can only
be passed through the layout engine. The pager naturally has to make use of the
layout engine to get the screen content; but it also uses some of its data
structures and helper functions for link handling. The main program basically
makes calls to both the pager and the page handling (which in turn invokes the
layouting), but also has to use data structures and functions directly;
actually, it contains part of the page handling. Finally, the loading module
affects and depends on some data used by the pager.
</p>

<p>
A problem arises from the fact that all control passing is done by direct
function calls. This was perfectly OK at the time when netrik wasn't much more
than a layouting engine; with all the link and anchor handling, various loading
mechanisms (including error handling, user breaks etc.), form handling, and
other functionality, this has gotten quite chaotic -- it's very hard to retrace
the various intermingled calls between the modules or the data dependencies.
</p>

<p>
This probably could be helped somewhat by using a kind of object-oriented
approach, i.e. trying to put all operations on a given data structure in one
place. (Encapsulation.) However, I don't think it would be possible to make
great progress without much hassle and considerable efficiency loss.
</p>

<p>
I've devised a much nicer, clearer solution instead (well, at least it looks
nice in my imagination ;-) ), which involves a fundamental change in program
structure. Basically it's about passing flow control to a main dispatcher,
using state handles and an event queue.
</p>

<p>
We will probably implement that concept in one of the next major releases. It
is inevitable for multitasking (displaying partially loaded pages,
multiwindowing) anyway, as we want to stick to explicit multitasking (real
multithreading/multiprocessing is much more complicated, and terribly
inefficient), and only this concept allows resuming and fine-grained
scheduling. But again, it should also simplify and clarify the overall program
structure, so we will probably implement it even before it's absolutely
necessary.
</p>

<p>
As mentioned before, the layout engine, contrary to the other modules, has had
a fairly clear concept from the beginning. The main idea was to split HTML
processing into a couple of independent passes. As the individual passes are
quite simple, this makes understanding and altering the code quite easy.
</p>

<p>
On the other hand, this isn't terribly efficient the way it is done now. We
think, however, that in the early stage netrik is in now, it's more important
to keep the code as simple as possible to facilitate fast development.
</p>

<p>
Also, splitting up the processing is necessary to allow rendering of not
completely loaded pages. As this is to be one of the major features of netrik
(it's not implemented yet), we pay much attention to that.
</p>

<p>
However, the splitting into passes isn't presently optimal by any means. It's
more or less the first one that came to mind. In the new parser planned for the
next major release, the segmentation will be somewhat different.
</p>

<p>
Most of the processing steps of the layouting work recursively due to
the nature of SGML/XML. They aren't implemented recursively, though. This makes
understanding a bit harder, but there are a couple of reasons for that.
Efficiency is one of them, but not that important presently. The main reason is
that a real recursive implementation wouldn't work with the planned
multitasking system (see above), as it doesn't allow resuming.
</p>
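<p>
To illustrate why a non-recursive walk can be resumed, here is a minimal
sketch. It is not netrik's code: the node layout (parent, first child and next
sibling pointers) and all names are assumptions made for the example. The whole
traversal state is a single "current node" pointer, so processing can stop
after any node and continue later.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; netrik's real structures differ. */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
    int id;
};

/* The complete traversal state: it can be stored and picked up again. */
struct walker {
    struct node *root;
    struct node *cur;   /* next node to visit, NULL when finished */
};

static void walker_init(struct walker *w, struct node *root)
{
    w->root = root;
    w->cur = root;
}

/* Return the node following n in document order, without recursion. */
static struct node *next_in_order(const struct node *root, struct node *n)
{
    if (n->first_child)
        return n->first_child;
    while (n != root) {
        if (n->next_sibling)
            return n->next_sibling;
        n = n->parent;          /* climb until a sibling is found */
        assert(n != NULL);      /* every non-root node needs a parent */
    }
    return NULL;                /* back at the root: traversal finished */
}

/* Visit at most `budget` nodes, store their ids, and return how many
 * were visited. Calling it again continues where the last call stopped. */
static int walk_slice(struct walker *w, int budget, int *out_ids)
{
    int visited = 0;
    while (w->cur && visited < budget) {
        out_ids[visited++] = w->cur->id;
        w->cur = next_in_order(w->root, w->cur);
    }
    return visited;
}
```

<p>
A recursive version would keep this state implicitly on the C call stack, which
cannot be left and re-entered in the middle of a subtree.
</p>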

</a>    <!-- notes -->

<h2>1. Module Overview</h2>

<p>
Here is an overview of how the different modules work together. Detailed
descriptions of the modules are located in the various hacking-... files.
</p>

<a name="main" id="main">

<h3>main.c</h3>

<p>
The main program is responsible for two things: Program initialization, and the
main (pager) loop.
</p>

<p>
Initialization presently mostly comprises reading command line and config file
options.
</p>

<p>
This is somewhat tricky, but not too complicated when you get the idea: The
config file (~/.netrikrc) just contains options identical to the command line
options, one per line. They are simply prepended to the command line options,
and processed with getopt_long() together.
</p>

<p>
For that, netrik first reads the file. This is done by determining the file
name (using the HOME environment variable), then using stat() to get the file
size; this also tells us whether the file exists at all. Then it is opened and
read into a memory buffer, all at once. The next step is the interesting one:
The file contents are scanned to extract the individual options. Every
newline encountered is replaced by a '\0' (string end), and a pointer to the
beginning of that line (option) is stored in the "argv_all" array. "argv_all"
and "argc_all" are identical to "argv" and "argc", only we also store the
config file options here. argv_all[0] is not filled, as argv[0] is always the
command that started the program; the real arguments start with 1. After
extracting all config file options, the real command line options from "argv"
are appended to "argv_all". This way, defaults can be set in the config file,
but still overridden by command line options.
</p>
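<p>
The splitting step can be sketched roughly as follows. This is an illustration
under assumptions, not the code from main.c: the function name merge_args() is
made up, empty lines are skipped, a last line without a trailing newline is
ignored, and (unlike the description above) slot 0 is filled with argv[0] for
convenience.
</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split `buf` (the config file contents, modified in place) at newlines and
 * build one argument vector: argv[0], the config file options, then the real
 * command line arguments. Returns a malloc()ed, NULL-terminated array and
 * stores the number of entries in *argc_all; returns NULL if malloc() fails.
 * The entries point into `buf` and `argv`, so both must outlive the result. */
static char **merge_args(char *buf, size_t len, int argc, char **argv,
                         int *argc_all)
{
    assert(argc >= 1);

    /* Count newlines to get an upper bound on the number of options. */
    size_t lines = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            lines++;

    char **all = malloc((lines + (size_t)argc + 1) * sizeof *all);
    if (!all)
        return NULL;

    int n = 0;
    all[n++] = argv[0];

    /* Terminate every line with '\0' and remember where it started. */
    char *start = buf;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            if (*start != '\0')     /* assumption: skip empty lines */
                all[n++] = start;
            start = buf + i + 1;
        }
    }

    /* The real command line comes last, so it can override the defaults. */
    for (int i = 1; i < argc; i++)
        all[n++] = argv[i];

    all[n] = NULL;
    *argc_all = n;
    return all;
}
```

<p>
No option string is copied; only pointers are stored. This is why the config
file buffer has to stay allocated as long as the merged vector is in use.
</p>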

<p>
Having this, the arguments are scanned with getopt_long() in config_cmdln(),
where options are extracted and stored in the global "cfg" struct, and
non-option arguments singled out. (They are moved to the last argv-entries by
getopt_long().) The last argument is used as the resource name for the document
to load on startup; the other ones are ignored. (This allows setting a "home
page" in the config file, which will be overwritten if some URL is specified on
the command line.)
</p>
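<p>
A minimal sketch of this scanning step is shown below. It is not the real
config_cmdln(): the struct, the function name parse_args() and the option
table are assumptions, and only the "--dump" and "--debug" options mentioned in
this manual are handled. It relies on getopt_long() moving non-option arguments
to the end of the vector, which is the default behaviour of the GNU
implementation.
</p>

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the global "cfg" struct. */
struct config {
    int dump;
    int debug;
};

/* Scan the merged argument vector. Options are stored in *cfg; *doc is set to
 * the last non-option argument, or NULL if there is none.
 * Returns 0 on success and -1 if an unknown option was found. */
static int parse_args(int argc, char **argv, struct config *cfg,
                      const char **doc)
{
    static const struct option longopts[] = {
        {"dump",  no_argument, NULL, 'd'},
        {"debug", no_argument, NULL, 'g'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(argc >= 1);
    optind = 1;
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 'd': cfg->dump = 1; break;
        case 'g': cfg->debug = 1; break;
        default:  return -1;        /* unknown option */
        }
    }

    /* All non-option arguments now sit at argv[optind..argc-1]; only the
     * last one is used, so a command line URL beats a config file URL. */
    *doc = (optind < argc) ? argv[argc - 1] : NULL;
    return 0;
}
```

<p>
With a vector such as "netrik http://home.example/ --dump page.html", the
sketch sets the dump flag and selects "page.html", because the command line
arguments follow the config file entries in the merged vector.
</p>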

<p>
After scanning the arguments, the color map is initialized to the desired color
scheme. (See <a href="hacking-terminal.html#colorSchemes">Color
Schemes</a> in hacking-terminal.* for details.)
</p>

<p>
Now the first document is loaded and, unless netrik was invoked with the
"--dump" option, the pager loop is entered.
</p>

<p>
At the beginning of each iteration,
<a href="hacking-pager.html#display">display()</a> is called to start
the pager. This one performs all pure viewing operations, including scrolling
and link selection; it returns only when some page handling and/or leaving
fullscreen mode is required. main() then performs the desired action (indicated
by the return value of display()), before restarting the pager in the next
iteration.
</p>
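<p>
The shape of this loop can be sketched as follows. The return codes, the
counters and the scripted stand-in for display() are all hypothetical; the real
loop in main.c calls the actual pager and does real work in each branch.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; the real ones belong to netrik's pager. */
enum pager_ret { RET_QUIT, RET_LINK, RET_SHOW_URL, RET_WINCH };

/* Counts what the loop would have done, instead of doing it. */
struct counters {
    int loads;
    int infos;
    int resizes;
};

/* Stand-in for the pager: replays a fixed list of return values. */
struct script {
    const enum pager_ret *steps;
    int pos;
};

static enum pager_ret scripted_display(void *ctx)
{
    struct script *s = ctx;
    assert(s->steps != NULL);
    return s->steps[s->pos++];
}

/* The main loop: start the pager, act on its return value, repeat.
 * Returns the number of iterations, including the one that quits. */
static int main_loop(enum pager_ret (*display)(void *), void *ctx,
                     struct counters *cnt)
{
    int iterations = 0;

    for (;;) {
        enum pager_ret ret = display(ctx);  /* returns only when main() has to act */
        iterations++;

        switch (ret) {
        case RET_QUIT:
            return iterations;
        case RET_LINK:
            cnt->loads++;       /* real code: load the new page */
            break;
        case RET_SHOW_URL:
            cnt->infos++;       /* real code: print some information */
            break;
        case RET_WINCH:
            cnt->resizes++;     /* real code: adapt the page to the new width */
            break;
        }
    }
}
```

<p>
Passing the pager as a function pointer is only done here so the loop can be
exercised without a terminal.
</p>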

<p>
Most of those actions involve loading some new page -- actually, the main loop
can be considered part of the <a href="#page">Page Loading</a>
mechanism.
</p>

<p>
The new page (as well as the startup page, before entering the loop) is always
loaded by calling
<a href="hacking-page.html#loadPage">load_page()</a> in one mode or
the other. The mode varies depending on the exact command: following a link (or
a form submit button), opening a new URL on the command prompt (see below), or
history commands. The precise actions are described in detail in
<a href="hacking-load.html">hacking-load.*</a>.
</p>

<p>
There are also situations that do not require loading a new page; in this case,
the viewer is simply restarted with the same page in the next iteration.
</p>

<p>
For one, this includes handling of non-submit form items. This is described
under <a href="hacking-links.html#manipulating">Manipulating</a>
in hacking-links.*. Note that all form elements, including submit buttons, and
also normal links are handled by one pager return value; they are distinguished
by the type of the active link item, which is handled in another switch by
main().
</p>

<p>
The second group of commands that do not involve loading a new page are the URL
displaying commands ('u', 'U', 'c'), which just print some information on the
screen before returning to the pager. Apart from that, they are also related to
the page handling, and thus described in
<a href="hacking-page.html">hacking-page.*</a>.
</p>

<p>
The command prompt is also somewhat special, as it doesn't cause loading some
new page immediately. readline() is used to get a command line from the user
instead, and the action taken depends on that command. However, presently only
the ":e" and ":E" commands are implemented, which also load a new page; they
are described in <a href="hacking-page.html">hacking-page.*</a> as well.
</p>

<p>
Finally, if the pager returned because the terminal size changed (SIGWINCH
received), main() calls
<a href="hacking-layout.html#resize">resize()</a> (described in
hacking-layout.*) to prepare the page for the new screen width, and then
restarts the pager as usual. This is detailed under
<a href="hacking-terminal.html#winch">SIGWINCH Handling</a> in
hacking-terminal.*.
</p>

<p>
A somewhat non-obvious aspect is switching between curses fullscreen mode and
normal scrolling mode. The viewer always operates in fullscreen mode, while
everything in main() is done in scrolling mode.
</p>

<p>
Curses fullscreen mode is initialized once, just before entering the main loop.
The program doesn't switch to fullscreen mode immediately, however; this is
done only on the first refresh, which is induced by the getch() inside
display() -- after rendering the screen content.
</p>

<p>
Fullscreen mode is turned off using endwin() before display() returns, and will
be reactivated only when getch() is called again from display() in the next
iteration.
</p>

<p>
Some commands need to display some information to the user before returning to
the pager. This applies particularly to the URL show commands, but also to
loading if some error occurs or if in --debug mode. This is accomplished by the
"pager_wait" flag, which is set by any command that needs to display something.
If the flag was set during an iteration, main() waits for a keypress before
proceeding with the next one -- which will reactivate the pager.
</p>

<p>
Quitting netrik is also handled in the main loop: When the user types 'q' in
the pager, display() returns a special value, and the loop terminates. A few
cleanup actions follow, after which main() returns.
</p>

</a>    <!-- main -->

<a name="loading" id="loading">

<h3>File Loading</h3>

<p>
The loading module consists of two parts: One is responsible for actually
loading an HTML document from a local file or from an HTTP server. It consists
of the files load.[ch], http.[ch], http-parse-header.[ch], the URL handling
functions in url.[ch], and interrupt.[ch] for user break handling. Some
functions from forms.c are necessary also, when submitting HTML forms. This
part is invoked from
<a href="hacking-layout.html#layout">layout()</a>, as part of the <a
href="#layout">Layouting</a> process.
</p>

<p>
The other part of the loading mechanism is the <a href="#page">Page
Loading</a> system.
</p>

<p>
The main file of the file loader is load.c, which contains the init_load() and
load() functions.
</p>

<p>
The loading is initialized by a call to
<a href="hacking-load.html#initLoad">init_load()</a> with the desired
URL as argument. A base URL is also passed, which is merged with the target URL
to create the effective URL, if the target URL is a relative one. (Following
links etc.) init_load() then decides whether it is a local file or an HTTP URL,
and initializes the loading. (Opens file or establishes HTTP connection.)
</p>

<p>
The loading itself is done by calling
<a href="hacking-load.html#load">load()</a>. This function fills a
buffer, which then can be processed. After the buffer is processed, load()
has to be called again, loading the next chunk.
</p>
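<p>
The calling pattern can be sketched like this. The names sketch_init_load() and
sketch_load(), the struct and the chunk size are assumptions made for the
example; an in-memory string stands in for the local file or HTTP connection,
and counting '&lt;' characters stands in for the real processing.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 8     /* deliberately tiny, so the chunking is visible */

/* Hypothetical resource descriptor: data source plus one chunk buffer. */
struct source {
    const char *data;   /* stands in for an open file or connection */
    size_t size;
    size_t pos;         /* how much of the data was already delivered */
    char buf[CHUNK];    /* the buffer handed to the consumer */
    size_t len;         /* number of valid bytes in buf */
};

/* Prepare loading; no data is read yet. */
static void sketch_init_load(struct source *s, const char *data)
{
    assert(data != NULL);
    s->data = data;
    s->size = strlen(data);
    s->pos = 0;
    s->len = 0;
}

/* Fill the buffer with the next chunk.
 * Returns 1 if the buffer contains data, 0 at the end of the input. */
static int sketch_load(struct source *s)
{
    size_t left = s->size - s->pos;

    s->len = left < CHUNK ? left : CHUNK;
    memcpy(s->buf, s->data + s->pos, s->len);
    s->pos += s->len;
    return s->len > 0;
}

/* Consumer: process one buffer, then ask for the next one. */
static size_t count_tag_starts(struct source *s)
{
    size_t count = 0;

    while (sketch_load(s))
        for (size_t i = 0; i < s->len; i++)
            if (s->buf[i] == '<')
                count++;
    return count;
}
```

<p>
The consumer never sees the document as a whole; it only works on one buffer
at a time and requests the next chunk when it is done.
</p>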

<p>
The loading module is described in detail in
<a href="hacking-load.html">hacking-load.*</a>.
</p>

</a>    <!-- loading -->

<a name="page" id="page">

<h3>Page Loading</h3>

<p>
The second part of the loading mechanism is the page handling system. This one
is responsible for getting a new document upon request, and preparing it so it
can be displayed in the <a href="#viewer">Viewer</a>. It's also
responsible for history handling, and everything else connected with handling
the <a href="hacking-page.html#pageList">Page List</a> (see
hacking-page.*).
</p>

<p>
The page loading mechanism only has the page.c and page.h files for itself;
however, part of it is also located in <a href="#main">main()</a>.
</p>

<p>
The main function is
<a href="hacking-page.html#loadPage">load_page()</a>, with the basic
functionality of loading a new document and preparing it so it can (and will)
be displayed in the <a href="#viewer">Viewer</a>, as soon as
<a href="hacking-pager.html#display">display()</a> (see
hacking-pager.*) is called from <a href="#main">main()</a>. For
this, a new page list entry is created, and
<a href="hacking-layout.html#layout">layout()</a> (see
hacking-layout.*) is used to load the page and prepare it for rendering.
</p>

<p>
load_page() also has other modes, depending on what kind of page is to be
displayed: If the new page uses the same HTML document, and the page only has
to jump to an anchor, the document isn't reloaded; a new page descriptor is
created, but the layout data is taken from the old page.
</p>

<p>
If a page from the page history is to be reloaded, no new page descriptor is
created; the old one is simply reactivated. The document is reloaded using
layout(), if necessary.
</p>

<p>
load_page() is called from <a href="#main">main()</a> to load the
start page given on the command line, and then in various modes from the main
loop, when the user requests various functions in the
<a href="#viewer">Viewer</a> (loading new URL, following link,
going back/forward in page history).
</p>

<p>
The page loading mechanism is described in detail in
<a href="hacking-page.html">hacking-page.*</a>.
</p>

</a>    <!-- page -->

<a name="layout" id="layout">

<h3>Layouting</h3>

<p>
The layout engine consists of the files parse-syntax.c, syntax.h, facilities.c,
dump-tree.c, parse_elements.c, sgml.c, parse-struct.c, items.c, items.h,
pre-render.c, render.c, render.h, layout.c, and layout.h.
</p>

<p>
As mentioned before, layouting is done in several passes.
</p>

<p>
The first passes are parsing syntax
(<a href="hacking-layout.html#parseSyntax">parse_syntax()</a>),
looking up the element and attribute names
(<a href="hacking-layout.html#parseElements">parse_elements()</a>),
(optionally) fixing a broken tree created by SGML documents
(<a href="hacking-layout.html#sgmlRework">sgml_rework()</a>),
interpreting the elements
(<a href="hacking-layout.html#parseStruct">parse_struct()</a>),
and assigning positions and sizes to all items of the output page
(<a href="hacking-layout.html#preRender">pre_render()</a>).
</p>

<p>
make_link_list() and make_anchor_list() are also called after parse_struct(),
creating the <a href="hacking-links.html#linkList">link list</a> and
<a href="hacking-links.html#anchorList">anchor list</a> data
structures necessary for link handling. (See
<a href="hacking-links.html">hacking-links.*</a>.)
</p>

<p>
All these passes are applied from layout(), immediately after the file is
opened; <a href="#loading">load()</a> is called from within
parse_syntax().
</p>

<p>
After these processing steps, the page is ready for rendering. The actual rendering is
done just in time by <a href="hacking-layout.html#render">render()</a>,
which is called from the viewer, each time some new region of the output page
needs to be displayed.
</p>

<p>
The layout engine is described in detail in
<a href="hacking-layout.html">hacking-layout.*</a>.
</p>

</a>    <!-- layout -->

<a name="viewer" id="viewer">

<h3>Viewer</h3>

<p>
The viewer module consists of the files pager.c and pager.h.
</p>

<p>
The pager uses curses to display the layouted page in an interactive manner.
</p>

<p>
Every time the visible output page region changes,
<a href="hacking-layout.html#render">render()</a> is called to display
the new region.
</p>

<p>
The viewer module is described in detail in
<a href="hacking-pager.html">hacking-pager.*</a>.
</p>

</a>    <!-- viewer -->

<h3>Link Handling</h3>

<p>
Hyperlink (and anchor) handling is not a module in a classical sense; there is
no central place, no source files specific for that. There are the links.c and
links.h files, but they contain only some helper functions; most of the code
necessary for handling links is distributed among almost all of the other
modules.
</p>

<p>
The layout engine needs to extract all links and anchors while parsing the
page, and assign coordinates to them while pre-rendering. The pager needs to
highlight the selected link; provide commands for selecting and following
links; and inform the main program about the link following. The main program
needs to initiate loading of the link. The file loader needs to construct the
target URL from the current page URL and the link URL.
</p>

<p>
All necessary steps are described in
<a href="hacking-links.html">hacking-links.*</a>, or else pointers to
the specific module documentation are given.
</p>

<h3>Form Handling</h3>

<p>
HTML forms are very similar to links, and mostly they are handled together. Some
special handling is required at certain places, though. These are mentioned in
<a href="hacking-links.html">hacking-links.*</a>. Some additional
functions necessary for form handling reside in forms.c and form-file.c; these
are also covered in hacking-links.*.
</p>

<a name="search" id="search">

<h3>Text Search</h3>

<p>
The search command presently doesn't reside in a file of its own; search.c only
defines the global "search" struct, which is used to pass information between
the search handling inside the pager and in main(), as well as keep information
between multiple searches. Searching itself is performed inside main().
</p>

<p>
The search framework is very similar to the command prompt (see
<a href="#main">main.c</a>). When typing the search character,
display() quits with a special status. main() then requests a search string
(via readline) and performs the search, resulting in a new cursor position;
display() is then restarted in the next main loop iteration, and knows to jump
to the match position by the state stored in "search.type".
</p>

<p>
Details on how the search itself works can be found in
<a href="hacking-search.html">hacking-search.*</a>.
</p>

</a>    <!-- search -->

<h3>Terminal Handling</h3>

<p>
netrik uses the curses library for all terminal handling.
</p>

<p>
There are two distinct modes in which netrik operates: Full screen mode (where
all output is managed by curses), and scrolling mode, where curses (more
exactly, the terminfo handling functions of curses) is used to get some control
strings (color setting etc.) and for setting text attributes like bold, but
output is printed to the screen directly.
</p>

<p>
Other terminal related issues like keyboard input and signal handling are also
done using curses.
</p>

<p>
Most input/output tasks (reading a single character from keyboard, printing a
text string etc.) are trivial, and the curses functions are called directly
from the functions needing them (mostly in pager.c and render.c). There are
however a number of more complicated tasks like initialising the terminal or
setting colors, which can't be done in a single function call. Those are
handled by helper functions, which are located in
<a href="hacking-terminal.html#screenC">screen.c</a> (documented
in <a href="hacking-terminal.html">hacking-terminal.*</a>).
</p>

<p>
Another issue related to output (though only indirectly to terminal handling)
is handling of text colors inside netrik. This is also described in
hacking-terminal.*, under <a href="hacking-terminal.html#color">Color
Handling</a>.
</p>

<p>
Last but not least, handling of SIGWINCH is an important issue of terminal
handling -- reacting to terminal size changes. Handling the signal itself is
controlled by the helper functions in winch.c, called from pager.c. The real
resizing action necessary to adapt to the new screen size is handled by
<a href="#main">main.c</a>, which in turn uses the
<a href="hacking-layout.html#resize">resize()</a> function from
pre-render.c. (Described in
<a href="hacking-pager.html">hacking-pager.*</a>.) 
</p>

<p>
SIGWINCH handling is described in more detail under
<a href="hacking-terminal.html#winch">SIGWINCH Handling</a>,
again in hacking-terminal.*.
</p>

</body>
</html>