1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194
|
==========
Background
==========
Command-line invocation interfaces are surprisingly complicated, and Invoke's
needs are a little on the unique side. This document outlines the various
possible approaches to this problem, and why Invoke ended up with the mechanism
it currently provides.
.. note::
This is purely design documentation; you don't need to understand it in
order to use the software!
Basics
======
Flags/options, and arguments given to them, are pretty universal::
$ invoke --boolean
$ invoke --takes-arg value
$ invoke --takes-arg=value
$ invoke -s
$ invoke -s svalue
Using "standard" GNU style long and short flags is a no-brainer. A less common
syntax is "Java style" where verbose/long flags can be given with a single
dash::
$ java -jar path/to/jar
We're not a fan of this style so it was never under consideration. (That said,
Invoke's parser mechanism is generic enough to support it with only minor
tweaks.)
Short flag peculiarities
------------------------
Implementation of short flags tends to vary, re: whether an equals sign is
allowed (``program -f=bar``), and sometimes even whether a space is required
(``tail -n5``).
There's also combined short flags (``tar -xzv`` standing in for ``tar -x -z
-v``) which only work for boolean options as they'd be ambiguous otherwise (are
the 2nd through Nth characters other combined short flags, or are they a value
intended for the 1st short flag?)
Invoke supports both of these features: equals signs may be used with short
flags just as with long flags (but are not required), and Boolean short flags
may be combined.
The multiple tasks vs positional args problem
=============================================
This is the meat. Invoke really wanted to have multiple tasks in the spirit of
Fabric 1.x, which supported invocations like::
$ fab task1:arg1,arg2=val2 task2:args
though we didn't want to continue using that specific invoke style. Instead we
wanted something that looked like this::
$ invoke --global-opts task1 --task1-opts task2 --task2-opts [...]
Unfortunately ``argparse`` and friends are unable to do this, because they:
* have no way of "partitioning" input, so all flags/options get lumped
together; and
* consider any non-flag input to be "positional arguments" and toss them into a
simple list with no way to tell where they showed up in the original
invocation.
That leaves us on our own.
The naive approach
------------------
The main hurdle here is determining when one task's "chunk" of the input
ends and another begins. In a really simple use case, we could just say
"anything that's not a flag is a task name". Our earlier example, repeated
below, is exactly that simple::
$ invoke --global-opts task1 --task1-opts task2 --task2-opts [...]
Using the "not a flag" heuristic above gives us nicely delineated chunks to
perhaps hand off to some ``argparse`` based sub-parsers.
Unfortunately, this breaks down as soon as you get into non-boolean options
(i.e. options that take values/arguments.) E.g. what if ``task1`` can be
parameterized by some ``kwarg`` that takes a value::
$ invoke --global-opts task1 --kwarg value task2
In this case, our naive parser will think ``value`` is a task name, when it's
actually supposed to be the value given to ``kwarg``. Not good!
Telling tasks apart
-------------------
There are multiple ways to solve this problem and tell tasks apart from flag
arguments, each with pluses/minuses:
* Give the parser deep knowledge of the flags involved so that when it
encounters ``--flag value task`` it knows that ``--flag`` is supposed to have
a value after it. This can be done explicitly (think ``argparse`` style
argument definitions) or can be inferred from the task definition.
* Unfortunately, we can't use ``argparse`` or ``optparse`` to do this as
they are unable to cope with various edge cases that arise in a
multi-task situation.
* For example, while ``optparse`` can be told to stop at the first unknown
positional argument (in our case, the next task name) and could thus be
used in a loop, it still thinks about the *flags* it encounters globally.
Two tasks using the same flag name would make using ``optparse``
impossible. Requiring globally unique flag names would solve this, but is
too user-hostile.
* Force users to avoid putting spaces between flags and their arguments, i.e.
always doing ``--flag=value`` or ``-fvalue``. This makes task name tokens
unambiguous, at the cost of cutting out a very commonly used invocation style
(``--flag value``). Even worse, such an implementation falls into the trap of
looking very similar to, but not behaving like, all other programs using
"normal" flag syntax.
* Add special syntax to denote task names, e.g.::
$ invoke --global-opts task1: --kwarg value task2: --task2-opts
where the trailing colon is a sign that that particular "word" is a task
name. This is unorthodox but not terrifically ugly and could ease parsing.
* Alternately, use standalone sentinel characters in-between the task/arg
"chunks" which have no special shell meaning, e.g.::
$ invoke --global-opts , task1 --kwarg value , task2 --task2-opts , [...]
While this would be the simplest to parse, it's also pretty unappealing.
Per-task positional args
~~~~~~~~~~~~~~~~~~~~~~~~
An additional benefit of the second two approaches above is that they would
enable support for arbitrary per-task positional arguments, e.g.::
$ invoke --global-opts task1: arg1 arg2 --kwarg=value task2: --kwarg2=value2
The first approach -- using knowledge about the tasks involved to determine
where boundaries start and begin -- requires more complexity in the parser
engine to handle positional arguments. Even then, it is not possible to have
truly arbitrary positional args (meaning an undetermined until runtime number)
due to the ambiguity between "the next positional arg for this task" and "the
next task name".
Invoke's approach
~~~~~~~~~~~~~~~~~
At this point in time, Invoke has gone with the first option: relying on rich
task & argument objects being available to the parser, which uses them to
determine whether a given token is a flag, a value/argument to a flag, a
positional argument, or a task name.
This preserves "traditional" style invocations (no special sigils denoting task
boundaries) with the only tradeoff being that arbitrary numbers of positional
arguments are not possible.
Space-delimited flag values that look like flags themselves
-----------------------------------------------------------
This is an interesting oddity that arises in all types of task/argument
parsing. Consider this invocation::
$ invoke --takes-a-value --some-other-valid-flag
It can be interpreted in two ways:
* ``--takes-a-value`` having its value set to ``"--some-other-flag"``
* Pluses: allows specifying flag-like values, which would otherwise have to
be escaped in some fashion.
* Minuses: can obscure user error.
* ``--some-other-valid-flag`` being interpreted as an actual flag, and an error
being generated because ``--takes-a-value`` is then missing a value.
* Has the inverse tradeoff to the above: fast-fails on user error, but
would require escaping for actual flag-like values to be treated as flag
arguments.
A related issue is the possibility of **invalid** flag-like values, e.g.::
$ invoke --takes-a-value --not-even-a-valid-flag
This doesn't even make sense in the 2nd approach above, because now we've both
got a "missing value" error *and* a "unknown flag" error, whereas the 1st
approach still works as the user probably intended.
|