nullprogram.com/blog/2020/08/01/
This article was discussed on Hacker News and critiqued on
Wandering Thoughts (2, 3).
Command line interfaces have varied throughout their brief history but
have largely converged to some common, sound conventions. The core
originates from unix, and the Linux ecosystem extended it,
particularly via the GNU project. Unfortunately some tools initially
appear to follow the conventions, but subtly get them wrong, usually
for no practical benefit. I believe in many cases the authors simply
didn’t know any better, so I’d like to review the conventions.
Short Options
The simplest case is the short option flag. An option is a hyphen —
specifically HYPHEN-MINUS U+002D — followed by one alphanumeric
character. Capital letters are acceptable. The letters themselves have
conventional meanings and are worth following if possible.
Flags can be grouped together into one program argument. This is both
convenient and unambiguous. It’s also one of those often missed details
when programs use hand-coded argument parsers, and the lack of support
irritates me.
program -abc
program -acb
The next simplest case are short options that take arguments. The
argument follows the option.
program -i input.txt -o output.txt
The space is optional, so the option and argument can be packed together
into one program argument. Since the argument is required, this is still
unambiguous. This is another often-missed feature in hand-coded parsers.
program -iinput.txt -ooutput.txt
This does not prohibit grouping. When grouped, the option accepting an
argument must be last.
program -abco output.txt
program -abcooutput.txt
This technique is used to create another category, optional option
arguments. The option’s argument can be optional but still unambiguous
so long as the space is always omitted when the argument is present.
program -c # omitted
program -cblue # provided
program -c blue # omitted (blue is a new argument)
program -c -x # two separate flags
program -c-x # -c with argument "-x"
Optional option arguments should be used judiciously since they can be
surprising, but they have their uses.
Options can typically appear in any order — something parsers often
achieve via permutation — but non-options typically follow options.
program -a -b foo bar
program -b -a foo bar
GNU-style programs usually allow options and non-options to be mixed,
though I don’t consider this to be essential.
program -a foo -b bar
program foo -a -b bar
program foo bar -a -b
If a non-option looks like an option because it starts with a hyphen,
use --
to demarcate options from non-options.
program -a -b -- -x foo bar
An advantage of requiring that non-options follow options is that the
first non-option demarcates the two groups, so --
is less often
needed.
# note: without argument permutation
program -a -b foo -x bar # 2 options, 3 non-options
Long options
Since short options can be cryptic, and there are such a limited number
of them, more complex programs support long options. A long option
starts with two hyphens followed by one or more alphanumeric, lowercase
words. Hyphens separate words. Using two hyphens prevents long options
from being confused for grouped short options.
program --reverse --ignore-backups
Occasionally flags are paired with a mutually exclusive inverse flag
that begins with --no-
. This avoids a future flag day where the
default is changed in the release that also adds the flag implementing
the original behavior.
program --sort
program --no-sort
Long options can similarly accept arguments.
program --output output.txt --block-size 1024
These may optionally be connected to the argument with an equals sign
=
, much like omitting the space for a short option argument.
program --output=output.txt --block-size=1024
Like before, this opens up the doors for optional option arguments. Due
to the required =
this is still unambiguous.
program --color --reverse
program --color=never --reverse
The --
retains its original behavior of disambiguating option-like
non-option arguments:
program --reverse -- --foo bar
Subcommands
Some programs, such as Git, have subcommands each with their own
options. The main program itself may still have its own options distinct
from subcommand options. The program’s options come before the
subcommand and subcommand options follow the subcommand. Options are
never permuted around the subcommand.
program -a -b -c subcommand -x -y -z
program -abc subcommand -xyz
Above, the -a
, -b
, and -c
options are for program
, and the
others are for subcommand
. So, really, the subcommand is another
command line of its own.
Option parsing libraries
There’s little excuse for not getting these conventions right assuming
you’re interested in following the conventions. Short options can be
parsed correctly in just ~60 lines of C code. Long options are
just slightly more complex.
GNU’s getopt_long()
supports long option abbreviation — with no way to
disable it (!) — but this should be avoided.
Go’s flag package intentionally deviates from the conventions.
It only supports long option semantics, via a single hyphen. This makes
it impossible to support grouping even if all options are only one
letter. Also, the only way to combine option and argument into a single
command line argument is with =
. It’s sound, but I miss both features
every time I write programs in Go. That’s why I wrote my own argument
parser. Not only does it have a nicer feature set, I like the API a
lot more, too.
Python’s primary option parsing library is argparse
, and I just can’t
stand it. Despite appearing to follow convention, it actually breaks
convention and its behavior is unsound. For instance, the following
program has two options, --foo
and --bar
. The --foo
option accepts
an optional argument, and the --bar
option is a simple flag.
import argparse
import sys
parser = argparse.ArgumentParser()
parser.add_argument('--foo', type=str, nargs='?', default='X')
parser.add_argument('--bar', action='store_true')
print(parser.parse_args(sys.argv[1:]))
Here are some example runs:
$ python parse.py
Namespace(bar=False, foo='X')
$ python parse.py --foo
Namespace(bar=False, foo=None)
$ python parse.py --foo=arg
Namespace(bar=False, foo='arg')
$ python parse.py --bar --foo
Namespace(bar=True, foo=None)
$ python parse.py --foo arg
Namespace(bar=False, foo='arg')
Everything looks good except the last. If the --foo
argument is
optional then why did it consume arg
? What happens if I follow it with
--bar
? Will it consume it as the argument?
$ python parse.py --foo --bar
Namespace(bar=True, foo=None)
Nope! Unlike arg
, it left --bar
alone, so instead of following the
unambiguous conventions, it has its own ambiguous semantics and attempts
to remedy them with a “smart” heuristic: “If an optional argument looks
like an option, then it must be an option!” Non-option arguments can
never follow an option with an optional argument, which makes that
feature pretty useless. Since argparse
does not properly support --
,
that does not help.
$ python parse.py --foo -- arg
usage: parse.py [-h] [--foo [FOO]] [--bar]
parse.py: error: unrecognized arguments: -- arg
Please, stick to the conventions unless you have really good reasons
to break them!