netrik hacker's manual
[This file contains a description of the hyperlink mechanism. See hacking.txt
or hacking.html for an overview of the manual.]
In contrast to most other aspects of functionality, there is no seperate module
for link handling. The components forming the link mechanism are spread over
several other modules instead.
There is a links.c file (and a related links.h, of course), but it contains
only some simple helper functions -- at least for now.
There are also a number of specific files for form handling (forms.[ch],
form-file.[ch]), which is also part of the link handling module. (See _Forms_
The first step is done already in the layout module: The links need to be
extracted while parsing the structure of the page to generate the item tree.
This is described under _Links_ in hacking-layout.*.
The start and end position of a link, and the link's destination (extracted
from the "href" attribute) are stored in "string->link".
This is a list of "struct Link", containing the data of all links and form
controls inside the current text item:
- The ints "start" and "end" store the start and end postitions of the link
inside the string
- The page coordinates of the link are stored in "x_start", "x_end", "y_start"
- The "value" contains the link's destination URL for normal links, or the form
data for form controls. "value" is of type "struct Data_string", which is a
Pascal-like string structure with a data area (dynamic array of char) and an
explicitly stored string length. This is necessary for binary form data, as
this can contain NUL bytes. For normal links, the URL is stored as a normal C
string in the "data" component, and the "size" component is unused.
- "form" is an "enum Form_control", telling to what kind of link or form control
the link structure refers
- "name" is a string storing the value of the "name" or "id" attribute
identifying a form control
- The "enabled" flag indicates whether a form element is to be submitted to the
server; for conditional elements (checkboxes etc.) this depends on the
To allow easy access to all links on the page without having to scan every text
item, another structure is created. This structure doesn't itself store any
data; it only stores pointers to the entries inside the text items' link lists.
The "Link_list" structure (defined in items.h) contains:
- The number of strings on the page, and
- An array of "struct Link_ptr", containing for every link:
- The item containing the link
- The number of the link inside the item's link list
This structure is created by make_link_list() (just after parse_struct()). This
function traverses all items, and for every encountered text item stores
pointers to all of its links. The only exception are links representing
"hidden" form input fields: These aren't stored in the link list, so they
won't be selectable.
When assigning coordinates to all items in the various sub-functions of
pre_render() (as another part of the layouting process), the page coordinates
of the links also have to be calculated. This is done in assign_ywidth(),
and described under _Link Coordinates_ in hacking-layout.*.
After the links are extracted during layouting, they can be activated and
followed in the pager. (Which is described in hacking-pager.*)
Links can be activated in the pager using the various link selection commands,
or the cursor movement comands if the cursor is moved over a link.
How these commands decide which link to activate, is described under _Link
Selection Commands_ in hacking-pager.*.
Actually activating (or deactivating) a selected link is done with the
activate_link() function. This function is explicitly called by the
link selection commands, but can be also called by _scroll_to()_ (see
hacking-pager.*) automatically, if the active link is scrolled out of the
valid screen area.
As a special case, the current link is deactivated by calling activate_link
with -1 as argument, so no link is active afterwards.
Before activating the requested link, the page is scrolled so that the link
is actually in the valid screen area. (This isn't done when called with -1, as
links can be deactivated even if they are no longer on the screen.) Scrolling
is done using _scroll_to()_ .
Next step is deactivating a previously active link, if any. This is done by
recursively calling active_link() (with -1 as argument). Of course this is
also done only when activating a new link -- if we are only to deactivate
the current active one, we won't call ourself to do so...
Now the affected link is highlighted (or unhighlighted, if it is to be
deactivated) using the highligh_link() function from links.c. This function
takes the page descriptor (for the item tree and the link list), the link
number, and a flag indicating whether to highlight or unhighlight as arguments.
The higlighting is done by finding the first div of the associated string
that belongs to the link text, and changing the background color of it (and
all other divs before the link end). This is done by or-ing the color with
"color_map[TM_ACTIVE]"; deactivating is done by clearing the bits.
Afterwards, the screen area affected by the link activation is re-rendered.
The coordinates of the affected page area normally are determined by the
coordinates assigned to the link in _pre_render()_ ; if the link spans
multiple lines (contains line wraps) however, the x-coordinates of the item
containing the link are taken instead, so all lines containing parts or the
link are repainted as a whole. (It would be too complicated to determine the
exact affected line parts, and it doesn't make much of a difference anyhow.)
The coordinates are truncated to the part visible on the screen, and the area
Finally, "active_link" is set to the newly activated link, and the cursor
position is updated. If a new link is actually activated, not the old one
deactivated, the cursor is set to the beginning of the link with _set_cursor()_
(see hacking-pager.*); it will keep this position as long as the link is active.
When activating a new link, some additional variables are assigned also: Hashes
of the link URL and contained text are copied from the link structure. These
are necessary to be able to activate the right link when the page is revisited
later, but the page content has changed in the meantime. This is explained
under _Reactivating Link_ in hacking-page.*.
If some link is active, it can be followed in the pager by pressing <return>.
The pager immediatly exits in this case, with a return value indicating that
a link was followed; everything else has to be done by the caller. (s.b.)
Following the links itself is done in main(). When it sees that a link was
followed (the pager returned "RET_LINK"), first the "active_link" URL has to
get_link() retrieves the link structure of the desired link (by the page's
link list) inside the item tree, and returns a pointer to it, so all link
data can be accessed.
The URL stored in the link structure ist copied to the local "url".
Having this, the old page is not needed any longer, and is freed using
_free_layout()_ ; afterwards, the new page is loaded using _load_page()_
with the current page's URL as base.
If the link only references a local anchor (the URL starts with '#'), the
old page isn't freed; instead, its descriptor is passed as the "reference"
parameter to load_page(), so the page data will be reused and only the anchor
activated, instead of reloading the page.
The full URL of the new page to be loaded needs to be determined using the
link URL and the current page's URL. That is done in _init_load()_ (called
from load_page()). This process is described in detail in hacking-load.*.
HTML forms are handled together with links. The "form" element of the _Link
Struct_ indicates that the link list entry refers to some form control,
not to a normal link.
forms.c contains some helper functions for handling forms.
set_form() writes the value of a form control (given by its link structure as
argument) to a string in the item tree, so it will be displayed on the output
page. It is called directly to set the initial value just after the form element
was extracted during structure parsing, when the link list doesn't exist yet.
(It is presently also used instead of update_form() in one place -- which is
a hack; see below under _Manipulating_ .)
First it has to find out where to store the data. For this, the string
containing the element is searched for the first div *after* the link start.
(The first div of a form link, starting at the link start, is the form
indicator, e.g. '['; the following div then contains the value.)
Having this, the value can be stored to the string. How this is done depends
on the type of the form control.
For text, password, and hidden input fields, as much of the "value" as the
div length is copied to the string div. If there is less than the div length,
the remaining space is padded with '_' characters. (Hidden input fields are
displayed like text fields, only they are "dim", and can't be selected.)
Textareas are also handled like normal single-lined text input fields for now.
(The linefeeds will be simply replaced by spaces, just like any other control
For radio buttons and checkboxes, either an '*' is put in, or an nbsp. This
is determined by the "enabled" flag of the form link. (It does *not* depend
on the "value"!)
For <select> options, the character introducing the option (the one *before*
the first char of the option text) is set either to '+' or to '-', also
depending on whether "enabled" is set.
update_form() is very similar; the difference is that it takes the whole page
structure and a link number from the link list as arguments; the string and
the link structure are then extracted from the list, and set_form() is called
to actually set the data.
get_form_item() is used to find out to which form a form control belongs. (This
is necessary for submit buttons.) It takes the link number of the form control,
retrieves the right form item in the item tree, and returns a pointer to it.
The form item is found by starting with the item containing the form control
(determined by the _Link List_ ), and going up in the item tree until
an item of type ITEM_FORM is found -- it contains the item with the button,
so it belongs to the form in which the button is located.
form_next() (together with form_start()) is used to get all form elements
of some form from the item tree. (This is necessary when submitting, or
when activating "radio" input elements.) Depending on the "filter" flag,
it returns either all form elements, or only the ones that are "successful"
(have a name, a value, and are enabled), and should be submitted to the server.
Every time it is called, it returns one link pointer, belonging to a form
control. When there are no more elements in the form NULL is returned.
form_next() takes/returns a "struct Form_handle" as parameter, which stores
the form item, the item of the last returned link, the link number of the last
returned link, and the "filter" flag. This is used to keep track of which
form elements have already been returned in previous calls. The handle has
to be initialized by form_start(), which takes the form item and the "filter"
flag as paramters, and returns the handle. (Not a pointer!)
form_next() traverses all items below the form item, using the "list_next"
pointers. In every text item found, it scans all of its links. Both loops work
directly with the handle values as loop variables. They are not initialized
on entering, so that they continue right where the last call to form_next()
To ensure all items inside the form are scanned, form_start() has to find the
first item inside the form, so form_next() will start scanning there. This
is done by repeatedly descending to "first_child", until a childless item is
found -- this is always the first one in the form.
+------+ start --> +------+
| text |-. ,->| form |
| | x ^
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +
| x ++++++++++++++++++++++++++++++++++++
| v + + |
| +-----+ +-----+ |
| 1. descend --> ,->| box |-. ,->| box |-'
| | +-----+=|===========|=>+-----+==>NULL
| | x ^ | | x ^
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx + | xxxxxxxxxxxxx +
| x +++++++++++++++++++++++++++++++++++++++++++++++ | x +++++++++++++
| v + + + | | v + |
| +------+ +------+ +-----+ | | +------+ |
`->| text |-->| text |-. ,->| box |-' `->| text |-'
| x x | | x ^ x
2. descend v v | xxxxxxxxxxxxx + v
(goal) NULL NULL | x +++++++++++++ NULL
| v + |
| +------+ |
`->| text |-'
(See _Structure Tree_ in hacking-layout.* for a description of the item tree.)
url_encode() is responsible for creating the string that will be submitted to
the server as part of the URL when a form is posted using the "GET" method,
or in the HTTP request body when using the "POST" method with URL encoding.
This function retrieves all (successful) form controls from the given form,
using form_start()/form_next(). For each form control, the name and the value
are stored, separated by an '=' and followed by an '&'. In both name and value,
certain characters have to be escaped, which is done in the encode_string()
Spaces are replaced by a '+'. Special characters, including all
non-ASCII-characters as well as a "real" '+' and some other characters
with special meaning inside the URL ('&', '=', '%' and '#') are replaced
by a hexadecimal representation, introduced with a '%' char. ("%HH") Other
characters are stored directly.
For "file" form controls, the "value" isn't submitted directly; the
*contents* of the file named in "value" is loaded into memory instead (using
_form_read_file()_, see below) and encoded.
The memory for the created string is allocated in chunks, for efficency
reasons. This is done inside encode_string(). As soon as the currently
allocated size of the string doesn't suffice to hold the next character,
it is grown by one chunk. The test is very crude, to keep it simple: The
array is considered too small if there is space for less then 5 characters,
as up to 5 characters may be stored before the next test, in the worst case:
Up to 3 for the currently processed char (if it has to be hex-escaped),
plus the following '=' if we are at the last character in the name, plus the
'&' if the value for this form control is empty. (No checking is done before
storing the '=' and '&', so we have to reserve the place for them here!)
The return value is a "struct Data_string" -- this is not really necessary, as
URL encoded strings never contain a NUL character, and thus could be passed as
a normal C string; but we do this anyways to get consistency with mime_encode().
mime_encode() does the same as url_encode, except that the data is encoded
using multipart MIME format.
This is much simpler: There is no encoding necessary to the data itself;
only the mime header information have to be stored along with the data itself.
As the size of a data block can be easily calculated here, the string is
first resized to fit the new block in each iteration (for each name/value
pair), and then the data is stored with a simple sprintf() and a few memcpy()
calls. This should be much more efficient than the approach in url_encode
for bigger data blocks, especially files.
The size is estimated by adding the size of the data strings ("name" and
"value") to the size of the format string -- this is not exact (the format
specifiers ("%s") are counted, though they are replaced in the resulting
string), but the few bytes too much shouldn't matter. (Especially as they
will be used up later anyways...)
Just like in _url_encode_ , "file" controls need a special handling. As in
mime encoding the file name is submitted explicitly, we need an additional
"file_name" parameter. This makes the handling more complicated than in
Note that, to reduce the amount of extra code, the "file_name" parameter is
also used with other form controls; it is simply set to the an empty string.
(And a dummy "%s" is passed to snprintf() so the parameter will be consumed --
the empty string won't affect the output.)
mime_encode() has to operate on and return a "struct Data_string" (see _Link
Structure_ above for a description, under "value"), as the form may contain
binary data with NUL characters -- we can't use normal C strings here.
Reading the files of "file" form controls into memory is done with
form_read_file(), which is located in form-file.c. This function takes the
name of a file to read as parameter, and returns a "struct Data_string"
(not a pointer!) with the contents of the named file.
If reading the file failed for some reason, an error message is printed, and a
"Data_string" with -1 as "size" (and NULL as "data") is returned, indicating
that an errer occured and the string doesn't contain valid data. (The caller
has to handle this.)
form_read_file() actually doesn't do very much -- the real loading is done
in the local read_file() function (also used for _edit_textarea()_);
form_read_file() is just a wrapper.
The form controls are extracted in parse_struct() (described in
hacking-layout.*), just like normal links. The only difference (besides of
storing the "value" and "enabled" variables) is that the initial value/state
of a form control has to be displayed on the page, which is done by set_form().
When <return> is pressed on a selected form control (display() returns RET_LINK,
and "form" of "active_link" is not FORM_NO), special action is taken in main().
On a form control, this generally involves getting a new value (either implicit
by the link activation, or by explicit input), which is then saved either in
"link->value" or in "link->enabled" (depending on the control type), and also
stored to the item tree with update_form(). (So the new value will show up
on the output page.)
For text and password input fields, the user is prompted for a new value,
which is stored in "link->value".
File input fields are also handled here. There is one additional test however:
If some file name is actually specified (not an empty value), we test whether
the file exists and is readable, so the user will see immediately if a wrong
file name was entered.
For checkboxes, simply the "link->enabled" flag is toggled.
For radio buttons, "link->enabled" is set for the activated button, and reset
for all other radio buttons in the form which have the same "name". This is
done by iterating through all form items (with form_start()/form_next() without
"filter"), and every time a control with the same name is found, disabling it.
Reflecting the disabling in the item tree can't be done with update_form()
however, as the link number in the link list isn't known here. (It's not
returned nor used by form_next().) Thus, we grab the current string item from
the handle used by form_next(), and call set_form() with that. This is a hack
(it depends on the implementation of "struct Form_handle", which it shouldn't);
however, I don't know any simple and reasonable way to implement that with
the current link storing mechanism...
For <select> options, the behaviour depends on whether the select has the
"multiple" attribute. (Which is coded in the link type: Either "FORM_OPTION"
or "FORM_MULTIOPTION") If yes, they behave like checkboxes, otherwise like
On a submit button, a page load is performed (using _load_page()_ as usual);
the target URL is taken from the "data.form->url" field of the item representing
the form to which the button belongs, retrieved with _get_form_item()_
. The "form" parameter is set to the form item itself; load_page() passes
this on to init_load(), where it is handled appropriately. (See hacking-load.*)
<textarea> fields are more tricky. They require an own function to do the work,
which is located in form-file.c.
The textarea fields aren't manipulated in netrik directly; instead, an external
editor is invoked. (Determined the EDITOR environment variable, falling back
to "vi".) The form data is passed to that editor in a temporary file; after
the editor quits, the new content of the file is read and stored in the form.
The major difficulty is the heavy use of system functions, inducing a multitude
of possible error conditions which have to be checked for. On each error
condition, edit_textarea() exits returning a string describing the error;
this is then printed in main().
When init_load() is called with a "form" argument, the form data is extracted
and encoded, and passed to the HTTP server.
For forms using the "GET" submit method, this is done in init_load() itself.
The data is extracted and encoded using _url_encode()_ ; the resulting CGI
parameter string is then passed to merge_urls(), where it is stored as part
of the resulting target URL (in place of any other CGI paramters). As the
form data is now part of the URL, no other special handling is necessary --
it will be passed to the server with the URL.
For "POST" forms, the form item is simply passed on to http_init_load() and
from there to get_http_cmd(), where either _url_encode()_ or _mime_encode()_
is used to get the encoded form data, which is then stored in the body of
the HTTP request and passed to the server.
If the loaded link contains a fragment identifier, "active_anchor" is set
to the desired anchor after loading the page. This is done in _load_page()_
(described in hacking-layout.*), after all other steps of the loading process
Similar to links, anchors are extracted while parsing the page structure.
However, they are stored in another way: Every anchor has an own item, either
of type ITEM_BLOCK_ANCHOR or ITEM_INLINE_ANCHOR, depending on the anchor type.
Both types are _Virtual Items_ ; this is described in hacking-layout.*.
After extracting the anchors, the "anchor_list" data structure is created
(using make_anchor_list()), which is very similar to the "link_list". It
consists only of the total anchor count, and an array containing pointers to
every anchor item.
Jumping to an anchor primarily implies retrieving the matching anchor
number. This is done (inside load_page()) by comparing the fragment identifier
given in the URL with the names of all anchors from the anchor list. When
the right anchor is found, its entry number in the anchor list is stored in
Displaying the anchor itself is done in the pager. When _display()_
(described in hacking-pager.*) finds an "anctive_anchor" while starting up,
it calls activate_anchor() to jump to and show the anchor position.
This function first calculates the position to scroll the page to. This
position depends on the setting of the "anchor_offset" config variable:
If this has a nonzero value, the page is scrolled so that the anchor will
start at the reciprocal value of "anchor_offset" of the screen height; when
"anchor_offset" is five for example (current default), we will scroll to the
anchor start position minus one fifth of LINES -- the anchor start will appear
one fifth of the screen height below the screen top. If "anchor_offset" is
zero, the anchor will be shown "link_margin" lines below the screen top instead.
With this calculated "optimal" scroll position scroll_to() is called, and
returns the actual pager position -- this may differ from the requested when
the anchor is near the screen top or bottom. Having this, the actual screen
positions of the anchor start and end are calculated. If these are identical
(empty block anchor), the end position (pointing, as always, *after* the last
anchor line), is incremented by one to cause a mark being displayed in the
next line. The theoretical start and end positions are then truncated to the
Finally, a mark (the string "cfg.anchor_mark") is printed in the rightmost
column of every screen line in the area spanned by the link, as calculated
before. These marks are printed directly to the curses screen; they aren't
stored in the item tree.
The cursor is moved to the exact anchor start position using _set_cursor()_
When activate_anchor() is called with -1 as anchor number, the previously
activated anchor is deactivated. The page isn't scrolled in this case, but the
screen position of the anchor is calculated as well; the marks are cleared
by re-rendering the area where the marks were drawn. (Calling _render()_
(described in hacking-layout.*) with the "overpaint" flag.)
The deactivating is done by the pager after the first keypress. (*After*
the associated function was performed.)