Libwhisker 2.x forms walkthrough --------------------------------------------------------------------------- This document discusses and demonstrates how to use the form parsing functionality contained in Libwhisker 2.x. This document assumes a Libwhisker 2.x version of at least 2.2. First and foremost, using the Libwhisker forms function requires a comfortable working knowledge of Perl anonymous structures and references. I recommend reading the first chapter of "Advanced Perl Programming" by O'Reilly to brush up on your anonymous data storage and reference concepts. Alright, let's start. --------------------------------------------------------------------------- Libwhisker stores all form data in a hash. Let's call this hash %FORM for now. The keys of the %FORM hash are one of three things: - The name of the element, as specified by 'name="..."' attribute - The value "unknown#" (where # is an incrementing number starting at 0) for any elements which do not specify a 'name="..."' attribute - The value "\0" (NULL) for the
tag data Let's look at an example form, and see how this maps:
If we were to pass this form data to the Libwhisker, the resulting hash would look like: $FORM = ( "\0" => ..., # data for
tag "first-input" => ..., # data for first input/text tag with # value 'one' "second-input" => ..., # data for second input/text tag with # value 'two' "unknown0" => ..., # data for third input/text tag with # value 'three' "first-check" => ..., # data for first checkbox with value # 'mycheck' "the-radio" => ..., # data for all the radio boxes named # 'the-radio' "areatext" => ..., # data for the first textarea tag "unknown1" => ... # data for first input/submit tag ); Right now we're not concerned with the actual data being stored--we're only focusing on the hash key names. The notable highlights to point out: - The form data is under the key "\0" (NULL) - Since no 'name=""' attribute was given for the third input/text and the input/submit elements, they were assigned names of "unknown0" and "unknown1", respectively - Elements which use multiple tags under the same name (select/option, radio, etc) all appear under a single entry of that name; in this case, the three "the-radio" radio buttons will all be contained under a single "the-radio" hash key Methodically parsing a Libwhisker form structure is a simple matter of first handling the special key "\0", and then iterating over the hash and handling each key (except "\0") as necessary. --------------------------------------------------------------------------- Let's move on to the next level of storage. For every key in the %FORM hash, Libwhisker creates an anonymous array. This array contains one entry for every tag with the given name. For many entries, which will lead to an array with only a single entry; however there will likely be many entries for radio buttons and select/options. There will also be multiple entries for tags having the same name. Let's look at another example form:
If we were to pass this form data to the Libwhisker, the resulting hash would look like: $FORM = ( "\0" => ..., # data for the
tag, to be discussed # later "first-input" => [ ... # data for the first input box with value # "one" ], "unknown0" => [ ... # data for the second input box with value # "two" ], "the-radio" => [ ..., # data for radio box with value "1" ... # data for radio box with value "2" ], "the-select" => [ ..., # data for the tag ], "unknown1" => [ ... # data for the first submit element ] ); Note: At this point, the 'data' stored for each element (represented by '...') is still arbitrary and will be discussed later. Right now, we're just looking at how many sets of data are stored. Some highlights to note from this hash dump: - The "\0" key does not follow the same format as the rest of the keys (it is not a reference to an anonymous array); the "\0" key should always be processed by itself, and skipped for all other element processing - The "first-input", "unknown0", and "unknown1" keys have anonymous arrays with only a single element of data - The "the-radio" key is an anonymous array of two data elements, one for each encountered radio box - The "the-select" key is an anonymous array with a data set that contains an entry for the tag Hopefully this is still easy to understand thus far. All elements of the same name are put under the same hash key. The hash key points to an anonymous array which contains one entry for each tag/element encountered for that given element name. There is a corner case worth noting. Imagine the following HTML:
The tricky thing here is that all the elements are named "bar". How this is actually interpreted/handled is browser dependant; however, Libwhisker handles it by creating a single hash key "bar" and putting all the elements in the anonymous array, like so: $FORM = ( "\0" => ..., # data for
tag "bar" => [ ..., # data for input/text ..., # data for textarea ..., # data for checkbox ... # data for submit ] ); It should now be easy to see how multiple elements are stored under a single key. Let's move on. --------------------------------------------------------------------------- Up until this point, we've been using '...' as a placeholder for the tag data. It's time now for us to define exactly what this is. All tags/elements, except for the "\0" entry, have a set of data associated with them. This set of data takes the form of an anonymous array. The anonymous array has three entries: the type of the tag, the value of the tag, and a reference to an anonymous array containing additional tag attributes (we will discuss this a bit later, below). Thus a 'data set' looks like: [ $type, $value, [ ] ] First we'll discuss the $type. This is a string value of 'select', '/select', 'option', 'textarea', or an input value of the form 'input-?', where '?' is the actual input type (such as "input-submit", "input-text", "input-checkbox", etc.). One important thing to note: the 'input-?' value uses the value specified by the 'type' attribute, with no sanity checking or case adjustment. E.g.: # type is "input-text" # type is "input-TEXT" # type is "input-foobar" # type is "input-foo Bar bAZ" When processing the $type, you should always lowercase the value and then compare it to valid known types. The next data set entry is the $value. This is simply the value of the tag/element, or undef if the tag/element doesn't have a value attribute or when a value doesn't otherwise apply. The actual value contained in $value can vary depending on the specific HTML tag being parsed. Let's do a quick run-through of possible values: ---- ---- The value will be data in the 'value="..."' attribute; if a value attribute doesn't exist, then the value will be undef. # $value equals "foobar" # $value equals "" # $value is undef ---- ---- This is identical to input/text. ---- ---- This is identical to input/text; however, in order for radio boxes to work, you generally always need a value attribute. The trick here is that there will be one data set for every radio box encountered, meaning every possible value will be enumerated. You can tell which one is actually selected by default by looking at the optional attributes (discussed later). ---- ---- Checkboxes don't necessary need a value in order to work; the browser will typically submit a value of 'on' or '1' if the box is checked and a value is not specified. Therefore many times you'll find the value to be undef, but if a value attribute is specified, then the value will be set accordingly. Like the radio box, you can tell if the box is checked by consulting the optional attributes (again, discussed later). ---- ---- Submit buttons may or may not have an actual value assigned to them. In general, it's the same as input/text. ---- tags. If the area between the tags is empty, then the value will be an empty string (""). If a tag, then the value will be undef. ---- and will always be undef, since these tags do not carry any actual value. The values of subsequent