File: background.rst

package info (click to toggle)
python-invoke 0.11.1%2Bdfsg1-1
links: PTS, VCS
area: main
in suites: buster, stretch
size: 1,136 kB
ctags: 1,702
sloc: python: 5,614; makefile: 37; sh: 36
file content (194 lines) | stat: -rw-r--r-- 7,728 bytes
==========
Background
==========

Command-line invocation interfaces are surprisingly complicated, and Invoke's
needs are a little on the unique side. This document outlines the various
possible approaches to this problem, and why Invoke ended up with the mechanism
it currently provides.

.. note::
    This is purely design documentation; you don't need to understand it in
    order to use the software!

Basics
======

Flags/options, and arguments given to them, are pretty universal::

    $ invoke --boolean
    $ invoke --takes-arg value
    $ invoke --takes-arg=value
    $ invoke -s
    $ invoke -s svalue

Using "standard" GNU style long and short flags is a no-brainer. A less common
syntax is "Java style" where verbose/long flags can be given with a single
dash::

    $ java -jar path/to/jar

We're not a fan of this style so it was never under consideration. (That said,
Invoke's parser mechanism is generic enough to support it with only minor
tweaks.)

Short flag peculiarities
------------------------

Implementation of short flags tends to vary, re: whether an equals sign is
allowed (``program -f=bar``), and sometimes even whether a space is required
(``tail -n5``).

There's also combined short flags (``tar -xzv`` standing in for ``tar -x -z
-v``) which only work for boolean options as they'd be ambiguous otherwise (are
the 2nd through Nth characters other combined short flags, or are they a value
intended for the 1st short flag?)

Invoke supports both of these features: equals signs may be used with short
flags just as with long flags (but are not required), and Boolean short flags
may be combined.


The multiple tasks vs positional args problem
=============================================

This is the meat. Invoke really wanted to have multiple tasks in the spirit of
Fabric 1.x, which supported invocations like::

    $ fab task1:arg1,arg2=val2 task2:args

though we didn't want to continue using that specific invoke style. Instead we
wanted something that looked like this::

    $ invoke --global-opts task1 --task1-opts task2 --task2-opts [...]

Unfortunately ``argparse`` and friends are unable to do this, because they:

* have no way of "partitioning" input, so all flags/options get lumped
  together; and
* consider any non-flag input to be "positional arguments" and toss them into a
  simple list with no way to tell where they showed up in the original
  invocation.

That leaves us on our own.

The naive approach
------------------

The main hurdle here is determining when one task's "chunk" of the input
ends and another begins. In a really simple use case, we could just say
"anything that's not a flag is a task name". Our earlier example, repeated
below, is exactly that simple::

    $ invoke --global-opts task1 --task1-opts task2 --task2-opts [...]

Using the "not a flag" heuristic above gives us nicely delineated chunks to
perhaps hand off to some ``argparse`` based sub-parsers.

Unfortunately, this breaks down as soon as you get into non-boolean options
(i.e. options that take values/arguments.) E.g. what if ``task1`` can be
parameterized by some ``kwarg`` that takes a value::

    $ invoke --global-opts task1 --kwarg value task2

In this case, our naive parser will think ``value`` is a task name, when it's
actually supposed to be the value given to ``kwarg``. Not good!

Telling tasks apart
-------------------

There are multiple ways to solve this problem and tell tasks apart from flag
arguments, each with pluses/minuses:

* Give the parser deep knowledge of the flags involved so that when it
  encounters ``--flag value task`` it knows that ``--flag`` is supposed to have
  a value after it. This can be done explicitly (think ``argparse`` style
  argument definitions) or can be inferred from the task definition.
  
    * Unfortunately, we can't use ``argparse`` or ``optparse`` to do this as
      they are unable to cope with various edge cases that arise in a
      multi-task situation.
    * For example, while ``optparse`` can be told to stop at the first unknown
      positional argument (in our case, the next task name) and could thus be
      used in a loop, it still thinks about the *flags* it encounters globally.
      Two tasks using the same flag name would make using ``optparse``
      impossible. Requiring globally unique flag names would solve this, but is
      too user-hostile.

* Force users to avoid putting spaces between flags and their arguments, i.e.
  always doing ``--flag=value`` or ``-fvalue``. This makes task name tokens
  unambiguous, at the cost of cutting out a very commonly used invocation style
  (``--flag value``). Even worse, such an implementation falls into the trap of
  looking very similar to, but not behaving like, all other programs using
  "normal" flag syntax.
* Add special syntax to denote task names, e.g.::

    $ invoke --global-opts task1: --kwarg value task2: --task2-opts

  where the trailing colon is a sign that that particular "word" is a task
  name. This is unorthodox but not terrifically ugly and could ease parsing.
* Alternately, use standalone sentinel characters in-between the task/arg
  "chunks" which have no special shell meaning, e.g.::

    $ invoke --global-opts , task1 --kwarg value , task2 --task2-opts , [...]

  While this would be the simplest to parse, it's also pretty unappealing.

Per-task positional args
~~~~~~~~~~~~~~~~~~~~~~~~

An additional benefit of the second two approaches above is that they would
enable support for arbitrary per-task positional arguments, e.g.::

    $ invoke --global-opts task1: arg1 arg2 --kwarg=value task2: --kwarg2=value2

The first approach -- using knowledge about the tasks involved to determine
where boundaries start and begin -- requires more complexity in the parser
engine to handle positional arguments. Even then, it is not possible to have
truly arbitrary positional args (meaning an undetermined until runtime number)
due to the ambiguity between "the next positional arg for this task" and "the
next task name".

Invoke's approach
~~~~~~~~~~~~~~~~~

At this point in time, Invoke has gone with the first option: relying on rich
task & argument objects being available to the parser, which uses them to
determine whether a given token is a flag, a value/argument to a flag, a
positional argument, or a task name.

This preserves "traditional" style invocations (no special sigils denoting task
boundaries) with the only tradeoff being that arbitrary numbers of positional
arguments are not possible.


Space-delimited flag values that look like flags themselves
-----------------------------------------------------------

This is an interesting oddity that arises in all types of task/argument
parsing. Consider this invocation::

    $ invoke --takes-a-value --some-other-valid-flag

It can be interpreted in two ways:

* ``--takes-a-value`` having its value set to ``"--some-other-flag"``

    * Pluses: allows specifying flag-like values, which would otherwise have to
      be escaped in some fashion.
    * Minuses: can obscure user error.

* ``--some-other-valid-flag`` being interpreted as an actual flag, and an error
  being generated because ``--takes-a-value`` is then missing a value.

    * Has the inverse tradeoff to the above: fast-fails on user error, but
      would require escaping for actual flag-like values to be treated as flag
      arguments.

A related issue is the possibility of **invalid** flag-like values, e.g.::

    $ invoke --takes-a-value --not-even-a-valid-flag

This doesn't even make sense in the 2nd approach above, because now we've both
got a "missing value" error *and* a "unknown flag" error, whereas the 1st
approach still works as the user probably intended.