<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Language" content="en" />
<title>execline: value transformation</title>
<meta name="Description" content="execline: value transformation" />
<meta name="Keywords" content="execline value transformation el_transform crunch chomp split" />
<!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
</head>
<body>
<p>
<a href="index.html">execline</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>
<h1> Value transformation </h1>
<p>
You can apply three kinds of transformations to a value that is to be
<a href="el_substitute.html">substituted</a> for a variable:
crunching, chomping and splitting. They
always occur in that order.
</p>
<a name="delim">
<h2> Delimiters </h2>
</a>
<p>
The transformations work around <em>delimiters</em>. Delimiters are
the semantic bounds of the "words" in your value.
You can use any character (except the null character, which you cannot
use in execline scripts) as a delimiter, by giving a string consisting
of all the delimiters you want as the argument to the <tt>-d</tt> option
used by substitution commands. By default, the string "<tt> \n\r\t</tt>"
is used, which means that the default delimiters are spaces, newlines,
carriage returns and tabs.
</p>
<p>
(The <a href="forstdin.html">forstdin</a> command is a small exception:
by default, it only recognizes newlines as delimiters.)
</p>
<a name="crunch">
<h2> Crunching </h2>
</a>
<p>
You can tell the substitution command to merge sets of consecutive
delimiters into a single delimiter: for instance, to replace
three consecutive spaces, or a space and 4 tab characters, with a
single space. This is called <em>crunching</em>, and it is done
by giving the <tt>-C</tt> switch to the substitution command. The
remaining delimiter will always be the first in the sequence.
Crunching is <em>off</em> by default; the <tt>-c</tt> switch turns it
off explicitly.
</p>
<p>
Crunching is mainly useful when also <a href="#split">splitting</a>.
</p>
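<p>
As a rough illustration only, crunching can be modelled in Python as
below. The function name <tt>crunch</tt> and its <tt>delims</tt> parameter
are hypothetical; this is a sketch of the behaviour described above,
not execline's actual code.
</p>

```python
# Illustrative model of crunching; not execline's implementation.
# "crunch" and "delims" are hypothetical names chosen for this sketch.
DEFAULT_DELIMS = " \n\r\t"  # the default delimiter string from the text


def crunch(value, delims=DEFAULT_DELIMS):
    """Merge each run of consecutive delimiters into its first delimiter."""
    out = []
    prev_was_delim = False
    for ch in value:
        is_delim = ch in delims
        if is_delim and prev_was_delim:
            # Inside a run of delimiters: only the first one is kept.
            continue
        out.append(ch)
        prev_was_delim = is_delim
    return "".join(out)


if __name__ == "__main__":
    print(repr(crunch("a   b")))          # 'a b'
    print(repr(crunch("a \t\t\t\tb")))    # 'a b'  (the space came first)
    print(repr(crunch("a\t  b")))         # 'a\tb' (the tab came first)
```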
<a name="chomp">
<h2> Chomping </h2>
</a>
<p>
Sometimes you don't want the last delimiter in a value.
<em>Chomping</em> deletes the last character of a value if it is a
delimiter. It is requested by giving the <tt>-n</tt> switch to the
substitution command. You can turn it off by giving the <tt>-N</tt>
switch. It is off by default unless mentioned in the documentation
page of specific binaries. Note that chomping always happens <em>after</em>
crunching, which means you can use crunching+chomping to ignore, for
instance, a set of trailing spaces.
</p>
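<p>
A minimal Python sketch of chomping follows. The name <tt>chomp</tt>
and the <tt>delims</tt> parameter are assumptions made for illustration;
the sketch only mirrors the rule stated above, that at most one trailing
delimiter is removed.
</p>

```python
# Illustrative model of chomping; not execline's implementation.
# "chomp" and "delims" are hypothetical names chosen for this sketch.
DEFAULT_DELIMS = " \n\r\t"  # the default delimiter string from the text


def chomp(value, delims=DEFAULT_DELIMS):
    """Delete the last character of value if, and only if, it is a delimiter."""
    if value and value[-1] in delims:
        return value[:-1]
    return value


if __name__ == "__main__":
    print(repr(chomp("foo\n")))   # 'foo'
    print(repr(chomp("foo")))     # 'foo'  (nothing to remove)
    print(repr(chomp("foo  ")))   # 'foo ' (only one delimiter is removed)
```

<p>
The last call shows why crunching matters here: without it, only one of
several trailing delimiters disappears. If the value had been crunched
first, the trailing run would already be a single delimiter, and chomping
would remove it entirely.
</p>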
<a name="split">
<h2> Splitting </h2>
</a>
<p>
In a shell, when you write
</p>
<pre>
$ A='foo bar' ; echo $A
</pre>
<p>
the <tt>echo</tt> command is given two arguments, <tt>foo</tt>
and <tt>bar</tt>. The <tt>$A</tt> value has been <em>split</em>,
and the space between <tt>foo</tt> and <tt>bar</tt> acted as a
<em>delimiter</em>.
</p>
<p>
If you want to avoid splitting, you must write something like
</p>
<pre>
$ A='foo bar' ; echo "$A"
</pre>
<p>
The double quotes "protect" the spaces. Unfortunately, it's easy
to forget them and perform unwanted splits during script execution:
countless bugs happen because of the shell's splitting behaviour.
</p>
<p>
<tt>execline</tt> provides a <em>splitting</em> facility, with
several advantages over the shell's:
</p>
<ul>
<li> Splitting is off by default, which means that substitutions
are performed as is, without interpreting the characters in the
value. In execline, splitting has to be explicitly requested
by specifying the <tt>-s</tt> option to commands that perform
<a href="el_substitute.html">substitution</a>. </li>
<li> Positional parameters are never split, so that execline
scripts can handle arguments the way the user intended. To
split <tt>$1</tt>, for instance, you have to ask for it
specifically:
<pre>
#!/command/<a href="execlineb.html">execlineb</a> -S1
<a href="define.html">define</a> -sd" " ARG1S $1
blah $ARG1S
</pre>
and <tt>$ARG1S</tt> will be split using the space character as the only delimiter.
</li>
<li> Any character can be a delimiter. </li>
</ul>
<h3> How it works </h3>
<ul>
<li> A substitution command can request that the substitution value
be split, via the <tt>-s</tt> switch. </li>
<li> The splitting function parses the value, looking for delimiters.
It fills up a structure, marking the split points, and the number
<em>n</em> of words the value is to be split into.
<ul>
<li> A word is a sequence of characters in the value <em>terminated
by a delimiter</em>. The delimiter is not included in the word. </li>
<li> If the value begins with <em>x</em> delimiters, the word list
will begin with <em>x</em> empty words. </li>
<li> The last sequence of characters in the value will be recognized
as a word even if it is not terminated by a delimiter. The one exception
is when you have requested <a href="#chomp">chomping</a> and there was no
delimiter at the end of the value <em>before</em> the chomp operation:
in that case, the last sequence will not appear at all. </li>
</ul> </li>
<li> The substitution rewrites the argv. A non-split value will
be written as one word in the argv; a split value will be written
as <em>n</em> separate words. </li>
<li> Substitution of split values is
<a href="el_substitute.html#recursive">performed recursively</a>. </li>
</ul>
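<p>
The word rules listed above can be sketched in Python as follows. This is
a simplified model under stated assumptions, not execline's code: the
function name <tt>split_words</tt> and its parameters are hypothetical,
crunching is left out, and the result is a plain list rather than a
rewritten argv.
</p>

```python
# Illustrative model of the splitting rules; not execline's implementation.
# "split_words", "delims" and "chomp" are hypothetical names for this sketch.
DEFAULT_DELIMS = " \n\r\t"  # the default delimiter string from the text


def split_words(value, delims=DEFAULT_DELIMS, chomp=False):
    """Return the list of words that value would be split into."""
    # Chomping happens before splitting; remember whether it removed anything.
    chomped = False
    if chomp and value and value[-1] in delims:
        value = value[:-1]
        chomped = True

    words = []
    current = []
    for ch in value:
        if ch in delims:
            # A delimiter terminates the current word (possibly an empty one).
            words.append("".join(current))
            current = []
        else:
            current.append(ch)

    if current:
        # Unterminated final sequence: kept, unless chomping was requested
        # and the original value did not end with a delimiter.
        if not chomp or chomped:
            words.append("".join(current))
    return words


if __name__ == "__main__":
    print(split_words("foo bar"))              # ['foo', 'bar']
    print(split_words("  a"))                  # ['', '', 'a']
    print(split_words("a b\n", chomp=True))    # ['a', 'b']
    print(split_words("a b", chomp=True))      # ['a']
```

<p>
In this model, two consecutive delimiters produce an empty word between
them, which is one reason to combine splitting with
<a href="#crunch">crunching</a>.
</p>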
<a name="netstrings">
<h3> Decoding netstrings </h3>
</a>
<p>
<a href="https://cr.yp.to/proto/netstrings.txt">Netstrings</a> are
a way to reliably encode strings containing arbitrary characters.
<tt>execline</tt> takes advantage of this to offer a completely safe
splitting mechanism. If a substitution command is given an empty
delimiter string (by use of the <tt>-d ""</tt> option), the
splitting function will try to interpret the value as a sequence
of netstrings, every netstring representing a word. For instance,
in the following command line:
</p>
<pre>
$ define -s -d "" A '1:a,2:bb,0:,7:xyz 123,1: ,' echo '$A'
</pre>
<p>
the <tt>echo</tt> command will be given five arguments:
</p>
<ul>
<li> the "<tt>a</tt>" string </li>
<li> the "<tt>bb</tt>" string </li>
<li> the empty string </li>
<li> the "<tt>xyz 123</tt>" string </li>
<li> the "<tt> </tt>" string (a single space) </li>
</ul>
<p>
However, if the value is not a valid sequence of netstrings, the
substitution command will die with an error message.
</p>
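<p>
To make the encoding concrete, here is a small Python sketch of a
netstring sequence decoder. It is not execline's code: the name
<tt>decode_netstrings</tt> is hypothetical, and the sketch counts
characters rather than bytes, which is only equivalent for ASCII values
such as the one in the example above.
</p>

```python
# Illustrative netstring sequence decoder; not execline's implementation.
# "decode_netstrings" is a hypothetical name chosen for this sketch.
# A netstring has the form  LENGTH ":" DATA ","  with LENGTH in decimal.


def decode_netstrings(value):
    """Decode a concatenation of netstrings into a list of words.

    Raises ValueError if value is not a valid sequence of netstrings.
    Lengths are counted in characters here (an assumption of this sketch;
    real netstrings count bytes).
    """
    words = []
    i = 0
    while i != len(value):
        colon = value.find(":", i)
        if colon == -1:
            raise ValueError("missing ':' after length")
        length_text = value[i:colon]
        if not (length_text.isascii() and length_text.isdigit()):
            raise ValueError("invalid length field")
        start = colon + 1
        end = start + int(length_text)
        # The data must be followed by a comma; this also catches
        # a length that runs past the end of the value.
        if value[end:end + 1] != ",":
            raise ValueError("missing ',' after data")
        words.append(value[start:end])
        i = end + 1
    return words


if __name__ == "__main__":
    print(decode_netstrings("1:a,2:bb,0:,7:xyz 123,1: ,"))
    # ['a', 'bb', '', 'xyz 123', ' ']
```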
<p>
The <a href="dollarat.html">dollarat</a> command, for instance,
can produce a sequence of netstrings (encoding all the arguments
given to an execline script), meant to be decoded by a substitution
command with the <tt>-d ""</tt> option.
</p>
</body>
</html>