<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged python at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/python/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/python/feed/"/>
  <updated>2026-04-07T03:24:16Z</updated>
  <id>urn:uuid:7a5665a7-6c04-451e-a43c-f7284c9cfcca</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Assertions should be more debugger-oriented</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/06/26/"/>
    <id>urn:uuid:22ae914c-971b-4cee-ba48-a189db1b6df6</id>
    <updated>2022-06-26T18:51:04Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="python"/><category term="java"/>
    <content type="html">
      <![CDATA[<p>Prompted by <a href="https://www.youtube.com/watch?v=r9eQth4Q5jg">a 20 minute video</a>, over the past month I’ve improved my
debugger skills. I’d shamefully acquired a bad habit: avoiding a debugger
until exhausting dumber, insufficient methods. My <em>first</em> choice should be
a debugger, but I had allowed a bit of friction to dissuade me. With some
thoughtful practice and deliberate effort clearing the path, my bad habit
is finally broken — at least when a good debugger is available. It feels
like I’ve leveled up and, <a href="/blog/2017/04/01/">like touch typing</a>, this was a skill I’d
neglected far too long. One friction point was the less-than-optimal
<code class="language-plaintext highlighter-rouge">assert</code> feature in basically every programming language implementation.
It ought to work better with debuggers.</p>

<p>An assertion verifies a program invariant, and so if one fails then
there’s undoubtedly a defect in the program. In other words, assertions
make programs more sensitive to defects, allowing problems to be caught
more quickly and accurately. Counter-intuitively, crashing early and often
makes for more robust and reliable software in the long run. For exactly
this reason, assertions go especially well with <a href="/blog/2019/01/25/">fuzzing</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">);</span>   <span class="c1">// bounds check</span>
<span class="n">assert</span><span class="p">((</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">size</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>  <span class="c1">// suspicious size_t</span>
<span class="n">assert</span><span class="p">(</span><span class="n">cur</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">!=</span> <span class="n">cur</span><span class="p">);</span>    <span class="c1">// circular reference?</span>
</code></pre></div></div>

<p>They’re sometimes abused for error handling, which is a reason they’ve
also been (wrongfully) discouraged at times. For example, failing to open
a file is an error, not a defect, so an assertion is inappropriate.</p>
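<p>As a sketch of the distinction (the function and file names here are hypothetical): a missing file is an expected error to be handled, while a caller violating the function’s contract is a defect worth asserting.</p>

```python
# Hypothetical sketch: error vs. defect. A missing file is an error
# we handle; a non-string path is a caller defect we assert.
def load_config(path):
    assert isinstance(path, str)  # defect: caller broke the contract
    try:
        with open(path) as f:     # error: expected, handled gracefully
            return f.read()
    except OSError:
        return None               # e.g. report it and fall back to defaults
```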

<p>Normal programs have implicit assertions all over, even if we don’t
usually think of them as assertions. In some cases they’re checked by the
hardware. Examples of implicit assertion failures:</p>

<ul>
  <li>Out-of-bounds indexing</li>
  <li>Dereferencing null/nil/None</li>
  <li>Dividing by zero</li>
  <li>Certain kinds of integer overflow (e.g. <code class="language-plaintext highlighter-rouge">-ftrapv</code>)</li>
</ul>

<p>Programs are generally not intended to recover from these situations
because, had they been anticipated, the invalid operation wouldn’t have
been attempted in the first place. The program simply crashes because
there’s no better alternative. Sanitizers, including Address Sanitizer
(ASan) and Undefined Behavior Sanitizer (UBSan), are in essence
additional, implicit assertions, checking invariants that aren’t normally
checked.</p>

<p>Ideally a failing assertion should have these two effects:</p>

<ul>
  <li>
    <p>Execution should <em>immediately</em> stop. The program is in an unknown state,
so it’s neither safe to “clean up” nor attempt to recover. Additional
execution will only make debugging more difficult, and may obscure the
defect.</p>
  </li>
  <li>
    <p>When run under a debugger — or visited as a core dump — it should break
exactly at the failed assertion, ready for inspection. I should not need
to dig around the call stack to figure out where the failure occurred. I
certainly shouldn’t need to manually set a breakpoint and restart the
program hoping to fail the assertion a second time. The whole reason for
using a debugger is to save time, so if it’s wasting my time then it’s
failing at its primary job.</p>
  </li>
</ul>

<p>I examined standard <code class="language-plaintext highlighter-rouge">assert</code> features across various language
implementations, and none strictly meet the criteria. Fortunately, in some
cases, it’s trivial to build a better assertion, and you can substitute
your own definition. First, let’s discuss the way assertions disappoint.</p>

<h3 id="a-test-assertion">A test assertion</h3>

<p>My test for C and C++ is minimal but establishes some state and gives me a
variable to inspect:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;assert.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="mi">5</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then I compile and debug in the most straightforward way:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -g -o test test.c
$ gdb test
(gdb) r
(gdb) bt
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">r</code> in GDB stands for <code class="language-plaintext highlighter-rouge">run</code>, which immediately breaks because of the
<code class="language-plaintext highlighter-rouge">assert</code>. The <code class="language-plaintext highlighter-rouge">bt</code> prints a backtrace. On a typical Linux distribution
that shows this backtrace:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __GI_raise
#1  __GI_abort
#2  __assert_fail_base
#3  __GI___assert_fail
#4  main
</code></pre></div></div>

<p>Well, actually, it’s much messier than this, but I manually cleaned it up:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linu
x/raise.c:50
#1  0x00007ffff7df4537 in __GI_abort () at abort.c:79
#2  0x00007ffff7df440f in __assert_fail_base (fmt=0x7ffff7f5d
128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x
55555555600b "i &lt; 5", file=0x555555556004 "test.c", line=6, f
unction=&lt;optimized out&gt;) at assert.c:92
#3  0x00007ffff7e03662 in __GI___assert_fail (assertion=0x555
55555600b "i &lt; 5", file=0x555555556004 "test.c", line=6, func
tion=0x555555556011 &lt;__PRETTY_FUNCTION__.0&gt; "main") at assert
.c:101
#4  0x0000555555555178 in main () at test.c:6
</code></pre></div></div>

<p>That’s a lot to take in at a glance, and about 95% of it is noise that
will never contain useful information. Most notably, GDB didn’t stop at
the failing assertion. Instead there’s <em>four stack frames</em> of libc junk I
have to navigate before I can even begin debugging.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) up
(gdb) up
(gdb) up
(gdb) up
</code></pre></div></div>

<p>I must wade through this for every assertion failure. This is some of the
friction that made me avoid the debugger in the first place. glibc loves
indirection, so maybe the other libc implementations do better? How about
musl?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  setjmp
#1  raise
#2  ??
#3  ??
#4  ??
#5  ??
#6  ??
#7  ??
#8  ??
#9  ??
#10 ??
#11 ??
</code></pre></div></div>

<p>Oops, without musl debugging symbols I can’t debug assertions at all
because GDB can’t read the stack, so it’s lost. If you’re on Alpine you
can install <code class="language-plaintext highlighter-rouge">musl-dbg</code>, but otherwise you’ll probably need to build your
own from source. With debugging symbols, musl is no better than glibc:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __restore_sigs
#1  raise
#2  abort
#3  __assert_fail
#4  main
</code></pre></div></div>

<p>Same with FreeBSD:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  thr_kill
#1  in raise
#2  in abort
#3  __assert
#4  main
</code></pre></div></div>

<p>OpenBSD has one fewer frame:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  thrkill
#1  _libc_abort
#2  _libc___assert2
#3  main
</code></pre></div></div>

<p>How about on Windows with Mingw-w64?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Inferior 1 (process 7864) exited with code 03]
</code></pre></div></div>

<p>Oops, on Windows GDB doesn’t break at all on <code class="language-plaintext highlighter-rouge">assert</code>. You must first set
a breakpoint on <code class="language-plaintext highlighter-rouge">abort</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) b abort
</code></pre></div></div>

<p>Besides that, it’s the most straightforward so far:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 msvcrt!abort
#1 msvcrt!_assert
#2 main
</code></pre></div></div>

<p>With MSVC (default CRT) I get something slightly different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 abort
#1 common_assert_to_stderr
#2 _wassert
#3 main
#4 __scrt_common_main_seh
</code></pre></div></div>

<p>RemedyBG leaves me at the <code class="language-plaintext highlighter-rouge">abort</code> like GDB does elsewhere. Visual Studio
recognizes that I don’t care about its stack frames and instead puts the
focus on the assertion, ready for debugging. The other stack frames are
there, but basically invisible. It’s the only case that practically meets
all my criteria!</p>

<p>I can’t entirely blame these implementations. The C standard requires that
<code class="language-plaintext highlighter-rouge">assert</code> print a diagnostic and call <code class="language-plaintext highlighter-rouge">abort</code>, and that <code class="language-plaintext highlighter-rouge">abort</code> raise
<code class="language-plaintext highlighter-rouge">SIGABRT</code>. There’s not much implementations can do, and it’s up to the
debugger to be smarter about it.</p>

<h3 id="sanitizers">Sanitizers</h3>

<p>ASan doesn’t break GDB on assertion failures, which is yet another source
of friction. You can work around this with an environment variable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export ASAN_OPTIONS=abort_on_error=1:print_legend=0
</code></pre></div></div>

<p>This works, but it’s the worst case of all: I get 7 junk stack frames on
top of the failed assertion. It’s also very noisy when it traps, so the
<code class="language-plaintext highlighter-rouge">print_legend=0</code> helps to cut it down a bit. I want this variable so often
that I set it in my shell’s <code class="language-plaintext highlighter-rouge">.profile</code> so that it’s always set.</p>

<p>With UBSan you can use <code class="language-plaintext highlighter-rouge">-fsanitize-undefined-trap-on-error</code>, which behaves
like the improved assertion. It traps directly on the defect with no junk
frames, though it prints no diagnostic. As a bonus, it also means you
don’t need to link <code class="language-plaintext highlighter-rouge">libubsan</code>. Thanks to the bonus, it fully supplants
<code class="language-plaintext highlighter-rouge">-ftrapv</code> for me on all platforms.</p>
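<p>As a concrete command line (reusing the earlier test program’s name), compiling with both flags makes any detected undefined behavior trap in place, directly under GDB:</p>

```
$ cc -g -fsanitize=undefined -fsanitize-undefined-trap-on-error -o test test.c
$ gdb test
(gdb) r
```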

<p><strong>Update November 2022</strong>: This “stop” hook eliminates ASan friction by
popping runtime frames — functions with the reserved <code class="language-plaintext highlighter-rouge">__</code> prefix — from
the call stack so that they’re not in the way when GDB takes control. It
requires Python support, which is the purpose of the feature-sniff outer
condition.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if !$_isvoid($_any_caller_matches)
    define hook-stop
        while $_thread &amp;&amp; $_any_caller_matches("^__")
            up-silently
        end
    end
end
</code></pre></div></div>

<p>This is now part of my <code class="language-plaintext highlighter-rouge">.gdbinit</code>.</p>

<h3 id="a-better-assertion">A better assertion</h3>

<p>At least when under a debugger, here’s a much better assertion macro for
GCC and Clang:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define assert(c) if (!(c)) __builtin_trap()
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">__builtin_trap</code> inserts a trap instruction — a built-in breakpoint. By
not calling a function to raise a signal, there are no junk stack frames
and no need to breakpoint on <code class="language-plaintext highlighter-rouge">abort</code>. It stops exactly where it should as
quickly as possible. This definition works reliably with GCC across all
platforms, too. On MSVC the equivalent is <code class="language-plaintext highlighter-rouge">__debugbreak</code>. If you’re really
in a pinch then do whatever it takes to trigger a fault, like
dereferencing a null pointer. A more complete definition might be:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef DEBUG
#  if __GNUC__
#    define assert(c) if (!(c)) __builtin_trap()
#  elif _MSC_VER
#    define assert(c) if (!(c)) __debugbreak()
#  else
#    define assert(c) if (!(c)) *(volatile int *)0 = 0
#  endif
#else
#  define assert(c)
#endif
</span></code></pre></div></div>

<p>None of these print a diagnostic, but that’s unnecessary when a debugger
is involved.</p>

<h3 id="other-languages">Other languages</h3>

<p>Unfortunately the situation <a href="https://github.com/rust-lang/rust/issues/21102">mostly gets worse</a> with other language
implementations, and it’s generally not possible to build a better
assertion. Assertions typically have exception-like semantics, if not
literally just another exception, and so they are far less reliable. If a
failed assertion raises an exception, then the program won’t stop until
it’s unwound the stack — running destructors and such along the way — all
the way to the top level looking for a handler. It only knows there’s a
problem when nobody was there to catch it.</p>

<p><a href="https://go.dev/doc/faq#assertions">Go officially doesn’t have assertions</a>, though panics are a kind of
assertion. However, panics have exception-like semantics, and so suffer
the problems of exceptions. A Go version of my test:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">defer</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"DEFER"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="m">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="m">5</span> <span class="p">{</span>
            <span class="nb">panic</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If I run this under Go’s premier debugger, <a href="https://github.com/go-delve/delve">Delve</a>, the unrecovered
panic causes it to break. So far so good. However, I get two junk frames:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 runtime.fatalpanic
#1 runtime.gopanic
#2 main.main
#3 runtime.main
#4 runtime.goexit
</code></pre></div></div>

<p>It only knows to stop because the Go runtime called <code class="language-plaintext highlighter-rouge">fatalpanic</code>, but the
backtrace is a fiction: The program continued to run after the panic,
enough to run all the registered defers (including printing “DEFER”),
unwinding the stack to the top level, and only then did it <code class="language-plaintext highlighter-rouge">fatalpanic</code>.
Fortunately it’s still possible to inspect all those stack frames even if
some variables may have changed while unwinding, but it’s more like
inspecting a core dump than a paused process.</p>

<p>The situation in Python is similar: <code class="language-plaintext highlighter-rouge">assert</code> raises AssertionError — a
plain old exception — and <code class="language-plaintext highlighter-rouge">pdb</code> won’t break until the stack has unwound,
exiting context managers and such. Only once the exception reaches the top
level does it enter “post mortem debugging,” like a core dump. At least
there are no junk stack frames on top. If you’re using asyncio then your
program may continue running for quite a while before the right tasks are
scheduled and the exception finally propagates to the top level, if ever.</p>
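<p>A Python version of my test, mirroring the C one (the script layout is my own; the behavior described is standard <code class="language-plaintext highlighter-rouge">pdb</code>):</p>

```python
# test.py: a Python version of the earlier test. The assertion
# fails once i reaches 5, raising a plain AssertionError.
def main():
    for i in range(10):
        assert i < 5, i
```

<p>With a <code class="language-plaintext highlighter-rouge">main()</code> call at the bottom, running it as <code class="language-plaintext highlighter-rouge">python -m pdb test.py</code> drops into post-mortem debugging only after the exception has unwound to the top level.</p>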

<p>The worst offender of all is Java. First, <code class="language-plaintext highlighter-rouge">jdb</code> never breaks for unhandled
exceptions. It’s up to you to set a breakpoint before the exception is
thrown. But it gets worse: assertions are disabled under <code class="language-plaintext highlighter-rouge">jdb</code>. The Java
<code class="language-plaintext highlighter-rouge">assert</code> statement is worse than useless.</p>

<h3 id="addendum-dont-exit-the-debugger">Addendum: Don’t exit the debugger</h3>

<p>The largest friction-reducing change I made is never exiting the debugger.
Previously I would enter GDB, run my program, exit, edit/rebuild, repeat.
However, there’s no reason to exit GDB! It automatically and reliably
reloads symbols and updates breakpoints on symbols. It remembers your run
configuration, so re-running is just <code class="language-plaintext highlighter-rouge">r</code> rather than interacting with
shell history.</p>

<p>My workflow on all platforms (<a href="/blog/2020/05/15/">including Windows</a>) is a vertically
maximized Vim window and a vertically maximized terminal window. The new
part for me: The terminal runs a long-term GDB session exclusively, with
<code class="language-plaintext highlighter-rouge">file</code> set to the program I’m writing, usually set by the initial command
line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb myprogram
gdb&gt;
</code></pre></div></div>

<p>Alternatively, use <code class="language-plaintext highlighter-rouge">file</code> after starting GDB. This is occasionally useful when my
project has multiple binaries and I want to examine a different program.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; file myprogram
</code></pre></div></div>

<p>I use <code class="language-plaintext highlighter-rouge">make</code> and Vim’s <code class="language-plaintext highlighter-rouge">:mak</code> command for building from within the editor,
so I don’t need to change context to build. The quickfix list takes me
straight to warnings/errors. Often I’m writing something that takes input
from standard input. So I use the <code class="language-plaintext highlighter-rouge">run</code> (<code class="language-plaintext highlighter-rouge">r</code>) command to set this up
(along with any command line arguments).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r &lt;test.txt
</code></pre></div></div>

<p>You can redirect standard output as well. It remembers these settings for
plain <code class="language-plaintext highlighter-rouge">run</code> later, so I can test my program by entering <code class="language-plaintext highlighter-rouge">r</code> and nothing
else.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r
</code></pre></div></div>

<p>My usual workflow is edit, <code class="language-plaintext highlighter-rouge">:mak</code>, <code class="language-plaintext highlighter-rouge">r</code>, repeat. If I want to test a
different input or use different options, change the run configuration
using <code class="language-plaintext highlighter-rouge">run</code> again:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r -a -b -c &lt;test2.txt
</code></pre></div></div>

<p>On Windows you cannot recompile while the program is running. If GDB is
sitting on a breakpoint but I want to build, use <code class="language-plaintext highlighter-rouge">kill</code> (<code class="language-plaintext highlighter-rouge">k</code>) to stop it
without exiting GDB.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; k
</code></pre></div></div>

<p>GDB has an annoying, flow-breaking yes/no prompt for this, so I recommend
<code class="language-plaintext highlighter-rouge">set confirm no</code> in your <code class="language-plaintext highlighter-rouge">.gdbinit</code> to disable it.</p>

<p>Sometimes a program is stuck in a loop and I need it to break in the
debugger. I try to avoid CTRL-C in the terminal since it can confuse
GDB. A safer option is to signal the process from Vim with <code class="language-plaintext highlighter-rouge">pkill</code>, which
GDB will catch (except on Windows):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>:!pkill myprogram
</code></pre></div></div>

<p>I suspect many people don’t know this, but if you’re on Windows and
<a href="/blog/2021/03/11/">developing a graphical application</a>, you can <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registerhotkey">press F12</a> in the
debuggee’s window to immediately break the program in the attached
debugger. This is a general platform feature and works with any native
debugger. I’ve been using it quite a lot.</p>

<p>On that note, you can run commands from GDB with <code class="language-plaintext highlighter-rouge">!</code>, which is another way
to avoid having an extra terminal window around:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; !git diff
</code></pre></div></div>

<p>In any case, GDB will re-read the binary on the next <code class="language-plaintext highlighter-rouge">run</code> and update
breakpoints, so it’s mostly seamless. If there’s a function I want to
debug, I set a breakpoint on it, then run.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; b somefunc
gdb&gt; r
</code></pre></div></div>

<p>Alternatively I’ll use a line number, which I read from Vim, though GDB,
not being involved in the editing process, cannot track how that line
moves between builds.</p>

<p>An empty command repeats the last command, so once I’m at a breakpoint,
I’ll type <code class="language-plaintext highlighter-rouge">next</code> (<code class="language-plaintext highlighter-rouge">n</code>) — or <code class="language-plaintext highlighter-rouge">step</code> (<code class="language-plaintext highlighter-rouge">s</code>) to enter function calls — then
press enter each time I want to advance a line, often with my eye on the
context in Vim in the other window:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; n
gdb&gt;
gdb&gt;
</code></pre></div></div>

<p>(<del>I wish GDB could print a source listing around the breakpoint as
context, like Delve, but no such feature exists. The woeful <code class="language-plaintext highlighter-rouge">list</code> command
is inadequate.</del> <strong>Update</strong>: GDB’s TUI is a reasonable compromise for GUI
applications or terminal applications running under a separate tty/console
with either <code class="language-plaintext highlighter-rouge">tty</code> or <code class="language-plaintext highlighter-rouge">set new-console</code>. I can access it everywhere since
w64devkit now supports GDB TUI.)</p>

<p>If I want to advance to the next breakpoint, I use <code class="language-plaintext highlighter-rouge">continue</code> (<code class="language-plaintext highlighter-rouge">c</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; c
</code></pre></div></div>

<p>If I’m walking through a loop, I want to see how variables change, but
it’s tedious to keep <code class="language-plaintext highlighter-rouge">print</code>ing (<code class="language-plaintext highlighter-rouge">p</code>) the same variables again and again.
So I use <code class="language-plaintext highlighter-rouge">display</code> (<code class="language-plaintext highlighter-rouge">disp</code>) to display an expression with each prompt,
much like the “watch” window in Visual Studio. For example, if my loop
variable is <code class="language-plaintext highlighter-rouge">i</code> over some string <code class="language-plaintext highlighter-rouge">str</code>, this will show me the current
character in character format (<code class="language-plaintext highlighter-rouge">/c</code>).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; disp/c str[i]
</code></pre></div></div>

<p>You can accumulate multiple expressions. Use <code class="language-plaintext highlighter-rouge">undisplay</code> to remove them.</p>

<p>Too many breakpoints? Use <code class="language-plaintext highlighter-rouge">info breakpoints</code> (<code class="language-plaintext highlighter-rouge">i b</code>) to list them, then
<code class="language-plaintext highlighter-rouge">delete</code> (<code class="language-plaintext highlighter-rouge">d</code>) the unwanted ones by ID.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; i b
gdb&gt; d 3 5 8
</code></pre></div></div>

<p>GDB has many more features than this, but 10 commands cover 99% of use
cases: <code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">n</code>, <code class="language-plaintext highlighter-rouge">s</code>, <code class="language-plaintext highlighter-rouge">disp</code>, <code class="language-plaintext highlighter-rouge">k</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">i</code>, <code class="language-plaintext highlighter-rouge">d</code>, <code class="language-plaintext highlighter-rouge">p</code>.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Compressing and embedding a Wordle word list</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/03/07/"/>
    <id>urn:uuid:95e1a2c2-c1b6-4472-9954-7bc76b4bab10</id>
    <updated>2022-03-07T03:22:41Z</updated>
    <category term="c"/><category term="python"/><category term="compression"/>
    <content type="html">
      <![CDATA[<p><a href="https://en.wikipedia.org/wiki/Wordle">Wordle</a> is all the rage, resulting in an explosion of hobbyist clones,
with new ones appearing every day. At the current rate I estimate by the
end of 2022 that 99% of all new software releases will be Wordle clones.
That’s no surprise since the rules are simple, it’s more fun to implement
and study than to actually play, and the hard part is building a decent
user interface. Such implementations go back <a href="https://www.youtube.com/watch?v=Yi2mTMWC4BM&amp;t=1270s">at least 30 years</a>.
Implementers get to decide on a platform, language, and the particular
subject of this article: how to handle the word list. Is it a separate
file/database or <a href="/blog/2016/11/15/">embedded in the program</a>? If embedded, is it
worth compressing? In this article I’ll present a simple, tailored Wordle
list compression strategy that beats general purpose compressors.</p>

<p>Last week one particular <a href="/blog/2020/11/17/">QuickBASIC</a> clone, <a href="http://grahamdowney.com/software/WorDOSle/WorDOSle.htm">WorDOSle</a>, caught my
eye. It embeds its word list despite the dire constraints of its 16-bit
platform. The original Wordle list (<a href="https://gist.github.com/cfreshman/cdcdf777450c5b5301e439061d29694c">1</a>, <a href="https://gist.github.com/cfreshman/a03ef2cba789d8cf00c08f767e0fad7b">2</a>) has 12,972 words which,
naively stored, would consume 77,832 bytes (5 letters, plus newline).
Sadly this exceeds a 16-bit address space. Eliminating the redundant
newline delimiter brings it down to 64,860 bytes — just small enough to
fit in an 8086 segment, but probably still difficult to manage from
QuickBASIC.</p>

<p>The author made a trade-off, reducing the word list to a more manageable,
if meager, 2,318 words, wisely excluding delimiters. Otherwise no further
effort was made towards reducing the size. The list is sorted, and the program
cleverly tests words against the list in place using a binary search.</p>
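
<p>That lookup is easy to sketch in Python (my own illustration, not the
WorDOSle source; <code class="language-plaintext highlighter-rouge">bisect</code> supplies the binary search):</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import bisect

# Membership test against a sorted, delimiter-free word list, in the
# spirit of WorDOSle's in-place binary search (not its actual code).
def is_word(words, guess):
    i = bisect.bisect_left(words, guess)
    return i != len(words) and words[i] == guess

assert is_word(["apple", "baker", "crane"], "baker")
assert not is_word(["apple", "baker", "crane"], "zzzzz")
</code></pre></div></div>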

<h3 id="compaction-baseline">Compaction baseline</h3>

<p>Before getting into any real compression technologies, there’s low-hanging
fruit to investigate. Each word is exactly five case-insensitive English
letters: a–z. To illustrate, here are the first 100 5-letter words from a
short Wordle word list.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>abbey acute agile album alloy ample apron array attic awful
abide adapt aging alert alone angel arbor arrow audio babes
about added agree algae along anger areas ashes audit backs
above admit ahead alias aloud angle arena aside autos bacon
abuse adobe aided alien alpha angry argue asked avail badge
acids adopt aides align altar ankle arise aspen avoid badly
acorn adult aimed alike alter annex armed asses await baked
acres after aired alive amber apart armor asset awake baker
acted again aisle alley amend apple aroma atlas award balls
actor agent alarm allow among apply arose atoms aware bands
</code></pre></div></div>

<p>In ASCII/UTF-8 form it’s 8 bits per letter, 5 bytes per word, but I only
need 5 bits per letter, or more specifically, ~4.7 bits (<code class="language-plaintext highlighter-rouge">log2(26)</code>) per
letter. If I instead treat each word as a base-26 number, I can pack each
word into 3 bytes (<code class="language-plaintext highlighter-rouge">26**5</code> needs only ~23.5 bits). That’s a 40% savings
just from a smarter representation.</p>

<p>With 12,972 words, that’s <strong>38,916 bytes</strong> for the whole list. Any
compression I apply must at least beat this size in order to be worth
using.</p>
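
<p>For illustration, here’s what such a base-26 packer might look like (my
own sketch, names mine; the byte order is an arbitrary choice):</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Treat "aaaaa".."zzzzz" as a base-26 number. 26**5 is about 11.9
# million, which always fits in 3 bytes (2**24 is about 16.8 million).
def pack3(word):
    n = 0
    for c in word:
        n = n*26 + ord(c) - ord('a')
    return n.to_bytes(3, "little")

assert pack3("aaaaa") == b"\x00\x00\x00"
assert int.from_bytes(pack3("zzzzz"), "little") == 26**5 - 1
</code></pre></div></div>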

<h3 id="letter-frequency">Letter frequency</h3>

<p>Not all letters occur at the same frequency. Here’s the letter frequency
for the original Wordle word list:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a:5990  e:6662  i:3759  m:1976  q: 112  u:2511  y:2074
b:1627  f:1115  j: 291  n:2952  r:4158  v: 694  z: 434
c:2028  g:1644  k:1505  o:4438  s:6665  w:1039
d:2453  h:1760  l:3371  p:2019  t:3295  x: 288
</code></pre></div></div>

<p>When encoding a word, I can save space by spending fewer bits on frequent
letters like <code class="language-plaintext highlighter-rouge">e</code> at the cost of spending more bits on infrequent letters
like <code class="language-plaintext highlighter-rouge">q</code>. There are multiple approaches, but the simplest is <a href="https://en.wikipedia.org/wiki/Huffman_coding">Huffman
coding</a>. It’s not the most efficient, but it’s so easy I can
almost code it in my sleep.</p>

<p>While my ultimate target is C, I did the frequency analysis, explored the
problem space, and implemented my compressors in Python. I don’t normally
like to use Python, but it <em>is</em> good for one-shot, disposable data
science-y stuff like this. The decompressor will be implemented in C,
partially via meta-programming: Python code generating my C code. Here’s
my letter histogram code:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">words</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">sys</span><span class="p">.</span><span class="n">stdin</span><span class="p">]</span>
<span class="n">hist</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">defaultdict</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">itertools</span><span class="p">.</span><span class="n">chain</span><span class="p">(</span><span class="o">*</span><span class="n">words</span><span class="p">):</span>
    <span class="n">hist</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
</code></pre></div></div>

<p>To build a Huffman coding tree, I’ll need a min-heap (priority queue)
initially filled with nodes representing each letter and its frequency.
While the heap has more than one element, I pop off the two lowest
frequency nodes, create a new parent node with the sum of their
frequencies, and push it into the heap. When the heap has one element, the
remaining element is the root of the Huffman coding tree.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">huffman</span><span class="p">(</span><span class="n">hist</span><span class="p">):</span>
    <span class="n">heap</span> <span class="o">=</span> <span class="p">[(</span><span class="n">n</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">hist</span><span class="p">.</span><span class="n">items</span><span class="p">()]</span>
    <span class="n">heapq</span><span class="p">.</span><span class="n">heapify</span><span class="p">(</span><span class="n">heap</span><span class="p">)</span>
    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">heap</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">heapq</span><span class="p">.</span><span class="n">heappop</span><span class="p">(</span><span class="n">heap</span><span class="p">),</span> <span class="n">heapq</span><span class="p">.</span><span class="n">heappop</span><span class="p">(</span><span class="n">heap</span><span class="p">)</span>
        <span class="n">node</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">b</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
        <span class="n">heapq</span><span class="p">.</span><span class="n">heappush</span><span class="p">(</span><span class="n">heap</span><span class="p">,</span> <span class="n">node</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">heap</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>

<span class="n">tree</span> <span class="o">=</span> <span class="n">huffman</span><span class="p">(</span><span class="n">hist</span><span class="p">)</span>
</code></pre></div></div>

<p>(By the way, I love that <code class="language-plaintext highlighter-rouge">heapq</code> operates directly on a plain <code class="language-plaintext highlighter-rouge">list</code>
rather than being its own data structure.) This produces the following
Huffman coding tree (via <code class="language-plaintext highlighter-rouge">pprint</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((('e', 's'),
  (('t', 'l'), (('g', ('v', 'w')), ('h', 'm')))),
 ((('i', ('p', 'c')),
   ('r', ('y', ('f', ('z', ('j', ('q', 'x'))))))),
  (('o', ('d', 'u')), ('a', ('n', ('k', 'b'))))))
</code></pre></div></div>

<p>It would be more useful to actually see the encodings.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">flatten</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="n">prefix</span><span class="o">=</span><span class="s">""</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">flatten</span><span class="p">(</span><span class="n">tree</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">prefix</span><span class="o">+</span><span class="s">"0"</span><span class="p">)</span> <span class="o">+</span> \
               <span class="n">flatten</span><span class="p">(</span><span class="n">tree</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">prefix</span><span class="o">+</span><span class="s">"1"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="p">[(</span><span class="n">tree</span><span class="p">,</span> <span class="n">prefix</span><span class="p">)]</span>
</code></pre></div></div>

<p>I used <code class="language-plaintext highlighter-rouge">isinstance</code> to distinguish leaves (<code class="language-plaintext highlighter-rouge">str</code>) from internal nodes
(<code class="language-plaintext highlighter-rouge">tuple</code>). With <code class="language-plaintext highlighter-rouge">sorted(flatten(tree))</code>, I get something like Morse Code:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[('a', '1110'),       ('j', '10111110'),   ('s', '001'),
 ('b', '111111'),     ('k', '111110'),     ('t', '0100'),
 ('c', '10011'),      ('l', '0101'),       ('u', '11011'),
 ('d', '11010'),      ('m', '01111'),      ('v', '011010'),
 ('e', '000'),        ('n', '11110'),      ('w', '011011'),
 ('f', '101110'),     ('o', '1100'),       ('x', '101111111'),
 ('g', '01100'),      ('p', '10010'),      ('y', '10110'),
 ('h', '01110'),      ('q', '101111110'),  ('z', '1011110'),
 ('i', '1000'),       ('r', '1010')]
</code></pre></div></div>

<p>In terms of encoded bit length, what is the shortest and longest?</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">codes</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">flatten</span><span class="p">(</span><span class="n">tree</span><span class="p">))</span>
<span class="n">lengths</span> <span class="o">=</span> <span class="p">[(</span><span class="nb">sum</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">codes</span><span class="p">[</span><span class="n">c</span><span class="p">])</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">w</span><span class="p">),</span> <span class="n">w</span><span class="p">)</span> <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">words</span><span class="p">]</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">min(lengths)</code> is “esses” at 15 bits, and <code class="language-plaintext highlighter-rouge">max(lengths)</code> is “qajaq” at 34
bits. In other words, the worst case is worse than the compact, 24-bit
representation! However, the total is better: <code class="language-plaintext highlighter-rouge">sum(w[0] for w in lengths)</code>
reports 281,956 bits, or 35,245 bytes. Packed appropriately, that shaves
off ~3.5kB, though it comes at the cost of losing random access, and
therefore binary search.</p>

<p>Speaking of bit packing, I’m ready to compress the entire word list into a
bit stream:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bits</span> <span class="o">=</span> <span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">codes</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">w</span><span class="p">)</span> <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">words</span><span class="p">)</span>
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">bits</code> begins with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>11101110011100001101011101110010110001000111011101...
</code></pre></div></div>

<p>On the C side I’ll pack these into 32-bit integers, least significant bit
first. I abused <code class="language-plaintext highlighter-rouge">textwrap</code> to dice it up, and I also need to reverse each
set of bits before converting to an integer.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">u32</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">b</span><span class="p">[::</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="mi">2</span><span class="p">)</span> <span class="k">for</span> <span class="n">b</span> <span class="ow">in</span> <span class="n">textwrap</span><span class="p">.</span><span class="n">wrap</span><span class="p">(</span><span class="n">bits</span><span class="p">,</span> <span class="n">width</span><span class="o">=</span><span class="mi">32</span><span class="p">)]</span>
</code></pre></div></div>

<p>I now have my compressed data as a sequence of 32-bit integers. Next, some
meta-programming:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"static const uint32_t words[</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">u32</span><span class="p">)</span><span class="si">}</span><span class="s">] ="</span><span class="p">,</span> <span class="s">"{"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">u</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">u32</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">6</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">    "</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"0x</span><span class="si">{</span><span class="n">u</span><span class="si">:</span><span class="mi">08</span><span class="n">x</span><span class="si">}</span><span class="s">,"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">};"</span><span class="p">)</span>
</code></pre></div></div>

<p>That produces a C table, the beginnings of my decompressor. The array
length isn’t necessary since the C compiler can figure it out, but being
explicit allows human readers to know the size at a glance, too. Observe
how the final 32-bit integer isn’t entirely filled.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">words</span><span class="p">[</span><span class="mi">8812</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="mh">0x4eeb0e77</span><span class="p">,</span><span class="mh">0xb8caee23</span><span class="p">,</span><span class="mh">0xffb892bb</span><span class="p">,</span><span class="mh">0x397fddf2</span><span class="p">,</span><span class="mh">0xddfcbfee</span><span class="p">,</span><span class="mh">0x5ff7997f</span><span class="p">,</span>
    <span class="c1">// ...</span>
    <span class="mh">0x7b4e66bd</span><span class="p">,</span><span class="mh">0x35ebcccd</span><span class="p">,</span><span class="mh">0x8f9af60f</span><span class="p">,</span><span class="mh">0x0000000c</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>
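
<p>As a sanity check, the sizes tie together (using the 281,956-bit total
from earlier):</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nbits = 281956             # total Huffman-coded bits, all 12,972 words
nwords32 = -(-nbits // 32) # ceiling division: 32-bit integers needed
assert nwords32 == 8812    # matches the array length above
assert nbits % 32 == 4     # only 4 bits used in the final integer
</code></pre></div></div>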

<p>Now, how to go about building the rest of the decompressor? I have a
Huffman coding tree, which is <em>an awful lot</em> <a href="/blog/2020/12/31/">like a state machine</a>,
eh? I can even have Python generate a state transition table from the
Huffman tree:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">transitions</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="n">states</span><span class="p">,</span> <span class="n">state</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">):</span>
        <span class="n">child</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">states</span><span class="p">)</span>
        <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="n">child</span>
        <span class="n">states</span><span class="p">.</span><span class="n">extend</span><span class="p">((</span><span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">))</span>
        <span class="n">transitions</span><span class="p">(</span><span class="n">tree</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">states</span><span class="p">,</span> <span class="n">child</span><span class="o">+</span><span class="mi">0</span><span class="p">)</span>
        <span class="n">transitions</span><span class="p">(</span><span class="n">tree</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">states</span><span class="p">,</span> <span class="n">child</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">]</span> <span class="o">=</span> <span class="nb">ord</span><span class="p">(</span><span class="n">tree</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">states</span>

<span class="n">states</span> <span class="o">=</span> <span class="n">transitions</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="p">[</span><span class="bp">None</span><span class="p">],</span> <span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>

<p>The central idea: positive entries are leaves, and negative entries are
internal nodes. The negated value is the index of the left child, with the
right child immediately following. In <code class="language-plaintext highlighter-rouge">transitions</code>, the caller reserves
space in the state table for callees, hence starting with <code class="language-plaintext highlighter-rouge">[None]</code>. I’ll
show the actual table in C form after some more meta-programming:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"static const int8_t states[</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">states</span><span class="p">)</span><span class="si">}</span><span class="s">] ="</span><span class="p">,</span> <span class="s">"{"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">states</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">12</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">    "</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">s</span><span class="si">:</span><span class="mi">4</span><span class="si">}</span><span class="s">,"</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s">""</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">};"</span><span class="p">)</span>
</code></pre></div></div>

<p>I chose <code class="language-plaintext highlighter-rouge">int8_t</code> since I know these values will all fit in an octet, and
it must be signed because of the negatives. The result:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">const</span> <span class="kt">int8_t</span> <span class="n">states</span><span class="p">[</span><span class="mi">51</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
      <span class="o">-</span><span class="mi">1</span><span class="p">,</span>  <span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">19</span><span class="p">,</span>  <span class="o">-</span><span class="mi">5</span><span class="p">,</span>  <span class="o">-</span><span class="mi">7</span><span class="p">,</span> <span class="mi">101</span><span class="p">,</span> <span class="mi">115</span><span class="p">,</span>  <span class="o">-</span><span class="mi">9</span><span class="p">,</span> <span class="o">-</span><span class="mi">11</span><span class="p">,</span> <span class="mi">116</span><span class="p">,</span> <span class="mi">108</span><span class="p">,</span> <span class="o">-</span><span class="mi">13</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">17</span><span class="p">,</span> <span class="mi">103</span><span class="p">,</span> <span class="o">-</span><span class="mi">15</span><span class="p">,</span> <span class="mi">118</span><span class="p">,</span> <span class="mi">119</span><span class="p">,</span> <span class="mi">104</span><span class="p">,</span> <span class="mi">109</span><span class="p">,</span> <span class="o">-</span><span class="mi">21</span><span class="p">,</span> <span class="o">-</span><span class="mi">39</span><span class="p">,</span> <span class="o">-</span><span class="mi">23</span><span class="p">,</span> <span class="o">-</span><span class="mi">27</span><span class="p">,</span> <span class="mi">105</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">25</span><span class="p">,</span> <span class="mi">112</span><span class="p">,</span>  <span class="mi">99</span><span class="p">,</span> <span class="mi">114</span><span class="p">,</span> <span class="o">-</span><span class="mi">29</span><span class="p">,</span> <span class="mi">121</span><span class="p">,</span> <span class="o">-</span><span class="mi">31</span><span class="p">,</span> <span class="mi">102</span><span class="p">,</span> <span class="o">-</span><span class="mi">33</span><span class="p">,</span> <span class="mi">122</span><span class="p">,</span> <span class="o">-</span><span class="mi">35</span><span class="p">,</span> <span class="mi">106</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">37</span><span class="p">,</span> <span class="mi">113</span><span class="p">,</span> <span class="mi">120</span><span class="p">,</span> <span class="o">-</span><span class="mi">41</span><span class="p">,</span> <span class="o">-</span><span class="mi">45</span><span class="p">,</span> <span class="mi">111</span><span class="p">,</span> <span class="o">-</span><span class="mi">43</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">117</span><span class="p">,</span>  <span class="mi">97</span><span class="p">,</span> <span class="o">-</span><span class="mi">47</span><span class="p">,</span> <span class="mi">110</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">49</span><span class="p">,</span> <span class="mi">107</span><span class="p">,</span>  <span class="mi">98</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The first entry is -1, meaning: on a 0 bit transition to state 1, and on
a 1 bit to state 2 (i.e. the entry immediately following state 1). The
decompressor reads one bit at a time, walking the state table until it
hits a positive value, which is an ASCII code. I’ve decided on this
function prototype:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int32_t</span> <span class="nf">next</span><span class="p">(</span><span class="kt">char</span> <span class="n">word</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span> <span class="kt">int32_t</span> <span class="n">n</span><span class="p">);</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">n</code> is the bit index, which starts at zero. The function decodes the
word at the given index, then returns the bit index for the next word.
Callers can iterate the entire word list without decompressing the whole
list at once. Finally the decompressor code:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int32_t</span> <span class="nf">next</span><span class="p">(</span><span class="kt">char</span> <span class="n">word</span><span class="p">[</span><span class="mi">5</span><span class="p">],</span> <span class="kt">int32_t</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">5</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(;</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">]</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">;</span> <span class="n">n</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="kt">int</span> <span class="n">b</span> <span class="o">=</span> <span class="n">words</span><span class="p">[</span><span class="n">n</span><span class="o">&gt;&gt;</span><span class="mi">5</span><span class="p">]</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">n</span><span class="o">&amp;</span><span class="mi">31</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// next bit</span>
            <span class="n">state</span> <span class="o">=</span> <span class="n">b</span> <span class="o">-</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">];</span>
        <span class="p">}</span>
        <span class="n">word</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">n</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When compiled, this is about 80 bytes of instructions on both x86-64 and
ARM64. This, along with the 51 bytes for the state table, should be
counted against the compression size. That’s roughly 35,376 bytes total
(35,245 + 51 + 80).</p>

<p>Trying it out, this program indeed reproduces the original word list:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int32_t</span> <span class="n">state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">word</span><span class="p">[]</span> <span class="o">=</span> <span class="s">".....</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">12972</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">state</span> <span class="o">=</span> <span class="n">next</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
        <span class="n">fwrite</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Searching 12,972 words linearly isn’t too bad, even for an old 16-bit
machine. However, if you really need to speed it up, you could build a
little run time index to track various bit positions in the list. For
example, the first word starting with <code class="language-plaintext highlighter-rouge">b</code> is at bit offset 15,743. If the
word I’m looking up begins with <code class="language-plaintext highlighter-rouge">b</code> then I can start there and stop at the
first <code class="language-plaintext highlighter-rouge">c</code>, decompressing just 909 words.</p>
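
<p>Such an index is cheap to build in one pass. Here’s a sketch, assuming a
hypothetical <code class="language-plaintext highlighter-rouge">next_word(n)</code> that decodes the word at bit offset
<code class="language-plaintext highlighter-rouge">n</code> and returns it along with the next offset, mirroring the C
<code class="language-plaintext highlighter-rouge">next()</code> above:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def first_letter_index(next_word, nwords):
    index = [None] * 26    # bit offset of first word per initial letter
    n = 0
    for _ in range(nwords):
        word, n2 = next_word(n)
        c = ord(word[0]) - ord('a')
        if index[c] is None:
            index[c] = n   # first occurrence of this initial letter
        n = n2
    return index
</code></pre></div></div>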

<h3 id="taking-it-to-the-next-level-run-length-encoding">Taking it to the next level: run-length encoding</h3>

<p>Here’s the 100-word word list sample again. The sorting is deliberate:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>abbey acute agile album alloy ample apron array attic awful
abide adapt aging alert alone angel arbor arrow audio babes
about added agree algae along anger areas ashes audit backs
above admit ahead alias aloud angle arena aside autos bacon
abuse adobe aided alien alpha angry argue asked avail badge
acids adopt aides align altar ankle arise aspen avoid badly
acorn adult aimed alike alter annex armed asses await baked
acres after aired alive amber apart armor asset awake baker
acted again aisle alley amend apple aroma atlas award balls
actor agent alarm allow among apply arose atoms aware bands
</code></pre></div></div>

<p>If I look at words column-wise, I see a long run of <code class="language-plaintext highlighter-rouge">a</code>, then a long run
of <code class="language-plaintext highlighter-rouge">b</code>, etc. Even the second column has long runs. I should really exploit
this somehow. The first scheme would have worked equally well on a
shuffled list as on a sorted one, which indicates that it's storing
unnecessary information, namely the word list order. (Rule of thumb:
Compression should work better on sorted inputs.)</p>
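<p>That rule of thumb is easy to check empirically. A quick illustration using a synthetic word list and zlib, rather than this post's coder:</p>

```python
import itertools
import random
import zlib

# A synthetic word list that is born sorted: every 5-letter string over
# a small alphabet. Not the Wordle list.
words = ["".join(p) for p in itertools.product("abcdef", repeat=5)]

shuffled = words[:]
random.seed(1)
random.shuffle(shuffled)

sorted_size = len(zlib.compress("".join(words).encode()))
shuffled_size = len(zlib.compress("".join(shuffled).encode()))
# The sorted order compresses smaller: shuffling adds information
# (the list order) that the compressor has to pay for.
assert sorted_size < shuffled_size
```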

<p>For this second scheme, I’ll pivot the whole list so that I can encode it
in column-order. (This is roughly how one part of bzip2 works, by the
way.) I’ll use run-length encoding (RLE) to communicate “91 ‘a’, 135 ‘b’,
etc.”, then I’ll encode these RLE tokens using Huffman coding, per the
first scheme, since there will be lots of repeated tokens.</p>

<p>First, pivot the word list:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pivot</span> <span class="o">=</span> <span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">w</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">words</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
</code></pre></div></div>
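<p>On a three-word sample the effect of the pivot is easy to see:</p>

```python
words = ["abbey", "abide", "about"]
pivot = "".join("".join(w[i] for w in words) for i in range(5))
assert pivot == "aaabbbbioeduyet"  # columns: aaa bbb bio edu yet
```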

<p>Next compute the RLE token stream. The stream works in pairs, first
indicating a letter (1–26), then the run length.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">offset</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">pivot</span><span class="p">):</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">pivot</span><span class="p">[</span><span class="n">offset</span><span class="p">]</span>
    <span class="n">start</span> <span class="o">=</span> <span class="n">offset</span>
    <span class="k">while</span> <span class="n">offset</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">pivot</span><span class="p">)</span> <span class="ow">and</span> <span class="n">pivot</span><span class="p">[</span><span class="n">offset</span><span class="p">]</span> <span class="o">==</span> <span class="n">c</span><span class="p">:</span>
        <span class="n">offset</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="o">-</span> <span class="nb">ord</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">offset</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
</code></pre></div></div>

<p>I’ve biased the letter representation by 1 — i.e. 1–26 instead of 0–25 —
since I’m going to encode all the tokens using the same Huffman tree.
(Exercise for the reader: Does compression improve with two distinct
Huffman trees, one for letters and the other for runs?) There are no
zero-length runs, and I want there to be as few unique tokens as possible.</p>

<p><code class="language-plaintext highlighter-rouge">tokens</code> looks like so (e.g. 737 ‘a’, 909 ‘b’, …):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[1, 737, 2, 909, 3, 922, 4, 685, 5, 303, 6, 598, ...]
</code></pre></div></div>

<p>The original Wordle list results in 139 unique tokens. A few tokens appear
many times, but most of them appear only once. Reusing my Huffman coding tree
builder from before:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tree</span> <span class="o">=</span> <span class="n">huffman</span><span class="p">(</span><span class="n">collections</span><span class="p">.</span><span class="n">Counter</span><span class="p">(</span><span class="n">tokens</span><span class="p">))</span>
</code></pre></div></div>
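<p>That <code class="language-plaintext highlighter-rouge">huffman</code> builder, and the <code class="language-plaintext highlighter-rouge">flatten</code> used below, came from the first scheme earlier in the post. For anyone following along in isolation, a minimal stand-in that produces the same nested-tuple trees might look like this (a sketch, not the original):</p>

```python
import collections
import heapq
import itertools

def huffman(freq):
    # Build a Huffman tree as nested 2-tuples; leaves are the tokens.
    count = itertools.count()  # tiebreaker so heapq never compares trees
    heap = [(n, next(count), token) for token, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(count), (left, right)))
    return heap[0][2]

def flatten(tree, prefix=""):
    # Yield (token, code) pairs: '0' descends left, '1' descends right.
    if isinstance(tree, tuple):
        yield from flatten(tree[0], prefix + "0")
        yield from flatten(tree[1], prefix + "1")
    else:
        yield (tree, prefix)
```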

<p>This makes for a more complex and interesting tree:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(1,
 ((((18, 20), (25, (((10, 24), (26, 22)), 8))),
   (5,
    ((11,
      ((23,
        ((17,
          (((35, (46, 76)), ((82, 93), (104, 111))),
           (((165, 168), 27), (28, (((30, 39), 31), 38))))),
         ((((((40, 41), ((44, 48), 45)),
             ((53, (54, 56)), 55)),
            ((((57, 59), 58), ((60, 61), (62, 63))),
             ((64, (65, 66)), ((67, 70), 68)))),
           (((((71, 75), 74), (77, (78, 79))),
             (((80, 85), 87), 81)),
            ((((90, 91), (92, 97)), (96, (99, 100))),
             (((101, 103), 102),
              ((105, 106), (109, 110)))))),
          ((((((113, 114), 117), ((120, 121), (125, 129))),
             (((130, 133), (137, 139)), (138, (140, 142)))),
            ((((144, 145), (147, 153)), (148, (166, 175))),
             (((181, 183), (187, 189)),
              ((193, 202), (220, 242))))),
           (((((262, 303), (325, 376)),
              ((413, 489), (577, 598))),
             (((628, 638), (685, 693)),
              ((737, 815), (859, 909)))),
            ((((922, 1565), 29), 32), (34, (33, 43)))))))),
       6)),
     3))),
  ((19, 2),
   ((4, (15, (21, 16))), ((14, 9), (12, (13, 7)))))))
</code></pre></div></div>

<p>Peeking at the first 21 elements of <code class="language-plaintext highlighter-rouge">sorted(flatten(tree))</code>, which chops
off the long tail of large-valued, single-occurrence tokens:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[(1, '0'),            (8, '100111'),       (15, '111010'),
 (2, '1101'),         (9, '111101'),       (16, '1110111'),
 (3, '10111'),        (10, '10011000'),    (17, '1011010100'),
 (4, '11100'),        (11, '101100'),      (18, '10000'),
 (5, '1010'),         (12, '111110'),      (19, '1100'),
 (6, '1011011'),      (13, '1111110'),     (20, '10001'),
 (7, '1111111'),      (14, '111100'),      (21, '1110110')]
</code></pre></div></div>

<p>Huffman-encoding the RLE stream is more straightforward:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">codes</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">flatten</span><span class="p">(</span><span class="n">tree</span><span class="p">))</span>
<span class="n">bits</span> <span class="o">=</span> <span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">codes</span><span class="p">[</span><span class="n">token</span><span class="p">]</span> <span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">tokens</span><span class="p">)</span>
</code></pre></div></div>
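<p>Before generating any C it's worth sanity-checking the stream with a round trip. A tiny tree-walking decoder (my own helper, not from the post):</p>

```python
def hdecode(bits, tree):
    # Walk the nested-tuple Huffman tree: '0' goes left, '1' goes right,
    # emitting a token and restarting at the root on every leaf.
    out, node = [], tree
    for b in bits:
        node = node[int(b)]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return out

# With the real data: hdecode(bits, tree) == tokens.
# Toy check against a hand-built tree where 1='0', 2='10', 3='11':
assert hdecode("01011", (1, (2, 3))) == [1, 2, 3]
```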

<p>This time <code class="language-plaintext highlighter-rouge">len(bits)</code> is 164,958, or 20,620 bytes! A huge difference,
around 40% additional savings!</p>

<p>Slicing and dicing 32-bit integers and printing the table works the same
as before. However, this time the state table has larger values (e.g. that
run of 909), and so its elements will be <code class="language-plaintext highlighter-rouge">int16_t</code>. I copy-pasted the
original meta-programming code and made the appropriate adjustments:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">const</span> <span class="kt">int16_t</span> <span class="n">states</span><span class="p">[</span><span class="mi">277</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
      <span class="o">-</span><span class="mi">1</span><span class="p">,</span>   <span class="mi">1</span><span class="p">,</span>  <span class="o">-</span><span class="mi">3</span><span class="p">,</span>  <span class="o">-</span><span class="mi">5</span><span class="p">,</span><span class="o">-</span><span class="mi">257</span><span class="p">,</span>  <span class="o">-</span><span class="mi">7</span><span class="p">,</span> <span class="o">-</span><span class="mi">21</span><span class="p">,</span>  <span class="o">-</span><span class="mi">9</span><span class="p">,</span> <span class="o">-</span><span class="mi">11</span><span class="p">,</span>  <span class="mi">18</span><span class="p">,</span>  <span class="mi">20</span><span class="p">,</span>  <span class="mi">25</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">13</span><span class="p">,</span> <span class="o">-</span><span class="mi">15</span><span class="p">,</span>   <span class="mi">8</span><span class="p">,</span> <span class="o">-</span><span class="mi">17</span><span class="p">,</span> <span class="o">-</span><span class="mi">19</span><span class="p">,</span>  <span class="mi">10</span><span class="p">,</span>  <span class="mi">24</span><span class="p">,</span>  <span class="mi">26</span><span class="p">,</span>  <span class="mi">22</span><span class="p">,</span>   <span class="mi">5</span><span class="p">,</span> <span class="o">-</span><span class="mi">23</span><span class="p">,</span> <span class="o">-</span><span class="mi">25</span><span class="p">,</span>
       <span class="mi">3</span><span class="p">,</span>  <span class="mi">11</span><span class="p">,</span> <span class="o">-</span><span class="mi">27</span><span class="p">,</span> <span class="o">-</span><span class="mi">29</span><span class="p">,</span>   <span class="mi">6</span><span class="p">,</span>  <span class="mi">23</span><span class="p">,</span> <span class="o">-</span><span class="mi">31</span><span class="p">,</span> <span class="o">-</span><span class="mi">33</span><span class="p">,</span> <span class="o">-</span><span class="mi">63</span><span class="p">,</span>  <span class="mi">17</span><span class="p">,</span> <span class="o">-</span><span class="mi">35</span><span class="p">,</span> <span class="o">-</span><span class="mi">37</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">49</span><span class="p">,</span> <span class="o">-</span><span class="mi">39</span><span class="p">,</span> <span class="o">-</span><span class="mi">43</span><span class="p">,</span>  <span class="mi">35</span><span class="p">,</span> <span class="o">-</span><span class="mi">41</span><span class="p">,</span>  <span class="mi">46</span><span class="p">,</span>  <span class="mi">76</span><span class="p">,</span> <span class="o">-</span><span class="mi">45</span><span class="p">,</span> <span class="o">-</span><span class="mi">47</span><span class="p">,</span>  <span class="mi">82</span><span class="p">,</span>  <span class="mi">93</span><span class="p">,</span> <span class="mi">104</span><span class="p">,</span>
     <span class="mi">111</span><span class="p">,</span> <span class="o">-</span><span class="mi">51</span><span class="p">,</span> <span class="o">-</span><span class="mi">55</span><span class="p">,</span> <span class="o">-</span><span class="mi">53</span><span class="p">,</span>  <span class="mi">27</span><span class="p">,</span> <span class="mi">165</span><span class="p">,</span> <span class="mi">168</span><span class="p">,</span>  <span class="mi">28</span><span class="p">,</span> <span class="o">-</span><span class="mi">57</span><span class="p">,</span> <span class="o">-</span><span class="mi">59</span><span class="p">,</span>  <span class="mi">38</span><span class="p">,</span> <span class="o">-</span><span class="mi">61</span><span class="p">,</span>
      <span class="mi">31</span><span class="p">,</span>  <span class="mi">30</span><span class="p">,</span>  <span class="mi">39</span><span class="p">,</span> <span class="o">-</span><span class="mi">65</span><span class="p">,</span><span class="o">-</span><span class="mi">155</span><span class="p">,</span> <span class="o">-</span><span class="mi">67</span><span class="p">,</span><span class="o">-</span><span class="mi">109</span><span class="p">,</span> <span class="o">-</span><span class="mi">69</span><span class="p">,</span> <span class="o">-</span><span class="mi">85</span><span class="p">,</span> <span class="o">-</span><span class="mi">71</span><span class="p">,</span> <span class="o">-</span><span class="mi">79</span><span class="p">,</span> <span class="o">-</span><span class="mi">73</span><span class="p">,</span>
     <span class="o">-</span><span class="mi">75</span><span class="p">,</span>  <span class="mi">40</span><span class="p">,</span>  <span class="mi">41</span><span class="p">,</span> <span class="o">-</span><span class="mi">77</span><span class="p">,</span>  <span class="mi">45</span><span class="p">,</span>  <span class="mi">44</span><span class="p">,</span>  <span class="mi">48</span><span class="p">,</span> <span class="o">-</span><span class="mi">81</span><span class="p">,</span>  <span class="mi">55</span><span class="p">,</span>  <span class="mi">53</span><span class="p">,</span> <span class="o">-</span><span class="mi">83</span><span class="p">,</span>  <span class="mi">54</span><span class="p">,</span>
      <span class="mi">56</span><span class="p">,</span> <span class="o">-</span><span class="mi">87</span><span class="p">,</span> <span class="o">-</span><span class="mi">99</span><span class="p">,</span> <span class="o">-</span><span class="mi">89</span><span class="p">,</span> <span class="o">-</span><span class="mi">93</span><span class="p">,</span> <span class="o">-</span><span class="mi">91</span><span class="p">,</span>  <span class="mi">58</span><span class="p">,</span>  <span class="mi">57</span><span class="p">,</span>  <span class="mi">59</span><span class="p">,</span> <span class="o">-</span><span class="mi">95</span><span class="p">,</span> <span class="o">-</span><span class="mi">97</span><span class="p">,</span>  <span class="mi">60</span><span class="p">,</span>
      <span class="mi">61</span><span class="p">,</span>  <span class="mi">62</span><span class="p">,</span>  <span class="mi">63</span><span class="p">,</span><span class="o">-</span><span class="mi">101</span><span class="p">,</span><span class="o">-</span><span class="mi">105</span><span class="p">,</span>  <span class="mi">64</span><span class="p">,</span><span class="o">-</span><span class="mi">103</span><span class="p">,</span>  <span class="mi">65</span><span class="p">,</span>  <span class="mi">66</span><span class="p">,</span><span class="o">-</span><span class="mi">107</span><span class="p">,</span>  <span class="mi">68</span><span class="p">,</span>  <span class="mi">67</span><span class="p">,</span>
      <span class="mi">70</span><span class="p">,</span><span class="o">-</span><span class="mi">111</span><span class="p">,</span><span class="o">-</span><span class="mi">129</span><span class="p">,</span><span class="o">-</span><span class="mi">113</span><span class="p">,</span><span class="o">-</span><span class="mi">123</span><span class="p">,</span><span class="o">-</span><span class="mi">115</span><span class="p">,</span><span class="o">-</span><span class="mi">119</span><span class="p">,</span><span class="o">-</span><span class="mi">117</span><span class="p">,</span>  <span class="mi">74</span><span class="p">,</span>  <span class="mi">71</span><span class="p">,</span>  <span class="mi">75</span><span class="p">,</span>  <span class="mi">77</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">121</span><span class="p">,</span>  <span class="mi">78</span><span class="p">,</span>  <span class="mi">79</span><span class="p">,</span><span class="o">-</span><span class="mi">125</span><span class="p">,</span>  <span class="mi">81</span><span class="p">,</span><span class="o">-</span><span class="mi">127</span><span class="p">,</span>  <span class="mi">87</span><span class="p">,</span>  <span class="mi">80</span><span class="p">,</span>  <span class="mi">85</span><span class="p">,</span><span class="o">-</span><span class="mi">131</span><span class="p">,</span><span class="o">-</span><span class="mi">143</span><span class="p">,</span><span class="o">-</span><span class="mi">133</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">139</span><span class="p">,</span><span class="o">-</span><span class="mi">135</span><span class="p">,</span><span class="o">-</span><span class="mi">137</span><span class="p">,</span>  <span class="mi">90</span><span class="p">,</span>  <span class="mi">91</span><span class="p">,</span>  <span class="mi">92</span><span class="p">,</span>  <span class="mi">97</span><span class="p">,</span>  <span class="mi">96</span><span class="p">,</span><span class="o">-</span><span class="mi">141</span><span class="p">,</span>  <span class="mi">99</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span><span class="o">-</span><span class="mi">145</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">149</span><span class="p">,</span><span class="o">-</span><span class="mi">147</span><span class="p">,</span> <span class="mi">102</span><span class="p">,</span> <span class="mi">101</span><span class="p">,</span> <span class="mi">103</span><span class="p">,</span><span class="o">-</span><span class="mi">151</span><span class="p">,</span><span class="o">-</span><span class="mi">153</span><span class="p">,</span> <span class="mi">105</span><span class="p">,</span> <span class="mi">106</span><span class="p">,</span> <span class="mi">109</span><span class="p">,</span> <span class="mi">110</span><span class="p">,</span><span class="o">-</span><span class="mi">157</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">213</span><span class="p">,</span><span class="o">-</span><span class="mi">159</span><span class="p">,</span><span class="o">-</span><span class="mi">185</span><span class="p">,</span><span class="o">-</span><span class="mi">161</span><span class="p">,</span><span class="o">-</span><span class="mi">173</span><span class="p">,</span><span class="o">-</span><span class="mi">163</span><span class="p">,</span><span class="o">-</span><span class="mi">167</span><span class="p">,</span><span class="o">-</span><span class="mi">165</span><span class="p">,</span> <span class="mi">117</span><span class="p">,</span> <span class="mi">113</span><span class="p">,</span> <span class="mi">114</span><span class="p">,</span><span class="o">-</span><span class="mi">169</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">171</span><span class="p">,</span> <span class="mi">120</span><span class="p">,</span> <span class="mi">121</span><span class="p">,</span> <span class="mi">125</span><span class="p">,</span> <span class="mi">129</span><span class="p">,</span><span class="o">-</span><span class="mi">175</span><span class="p">,</span><span class="o">-</span><span class="mi">181</span><span class="p">,</span><span class="o">-</span><span class="mi">177</span><span class="p">,</span><span class="o">-</span><span class="mi">179</span><span class="p">,</span> <span class="mi">130</span><span class="p">,</span> <span class="mi">133</span><span class="p">,</span> <span class="mi">137</span><span class="p">,</span>
     <span class="mi">139</span><span class="p">,</span> <span class="mi">138</span><span class="p">,</span><span class="o">-</span><span class="mi">183</span><span class="p">,</span> <span class="mi">140</span><span class="p">,</span> <span class="mi">142</span><span class="p">,</span><span class="o">-</span><span class="mi">187</span><span class="p">,</span><span class="o">-</span><span class="mi">199</span><span class="p">,</span><span class="o">-</span><span class="mi">189</span><span class="p">,</span><span class="o">-</span><span class="mi">195</span><span class="p">,</span><span class="o">-</span><span class="mi">191</span><span class="p">,</span><span class="o">-</span><span class="mi">193</span><span class="p">,</span> <span class="mi">144</span><span class="p">,</span>
     <span class="mi">145</span><span class="p">,</span> <span class="mi">147</span><span class="p">,</span> <span class="mi">153</span><span class="p">,</span> <span class="mi">148</span><span class="p">,</span><span class="o">-</span><span class="mi">197</span><span class="p">,</span> <span class="mi">166</span><span class="p">,</span> <span class="mi">175</span><span class="p">,</span><span class="o">-</span><span class="mi">201</span><span class="p">,</span><span class="o">-</span><span class="mi">207</span><span class="p">,</span><span class="o">-</span><span class="mi">203</span><span class="p">,</span><span class="o">-</span><span class="mi">205</span><span class="p">,</span> <span class="mi">181</span><span class="p">,</span>
     <span class="mi">183</span><span class="p">,</span> <span class="mi">187</span><span class="p">,</span> <span class="mi">189</span><span class="p">,</span><span class="o">-</span><span class="mi">209</span><span class="p">,</span><span class="o">-</span><span class="mi">211</span><span class="p">,</span> <span class="mi">193</span><span class="p">,</span> <span class="mi">202</span><span class="p">,</span> <span class="mi">220</span><span class="p">,</span> <span class="mi">242</span><span class="p">,</span><span class="o">-</span><span class="mi">215</span><span class="p">,</span><span class="o">-</span><span class="mi">245</span><span class="p">,</span><span class="o">-</span><span class="mi">217</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">231</span><span class="p">,</span><span class="o">-</span><span class="mi">219</span><span class="p">,</span><span class="o">-</span><span class="mi">225</span><span class="p">,</span><span class="o">-</span><span class="mi">221</span><span class="p">,</span><span class="o">-</span><span class="mi">223</span><span class="p">,</span> <span class="mi">262</span><span class="p">,</span> <span class="mi">303</span><span class="p">,</span> <span class="mi">325</span><span class="p">,</span> <span class="mi">376</span><span class="p">,</span><span class="o">-</span><span class="mi">227</span><span class="p">,</span><span class="o">-</span><span class="mi">229</span><span class="p">,</span> <span class="mi">413</span><span class="p">,</span>
     <span class="mi">489</span><span class="p">,</span> <span class="mi">577</span><span class="p">,</span> <span class="mi">598</span><span class="p">,</span><span class="o">-</span><span class="mi">233</span><span class="p">,</span><span class="o">-</span><span class="mi">239</span><span class="p">,</span><span class="o">-</span><span class="mi">235</span><span class="p">,</span><span class="o">-</span><span class="mi">237</span><span class="p">,</span> <span class="mi">628</span><span class="p">,</span> <span class="mi">638</span><span class="p">,</span> <span class="mi">685</span><span class="p">,</span> <span class="mi">693</span><span class="p">,</span><span class="o">-</span><span class="mi">241</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">243</span><span class="p">,</span> <span class="mi">737</span><span class="p">,</span> <span class="mi">815</span><span class="p">,</span> <span class="mi">859</span><span class="p">,</span> <span class="mi">909</span><span class="p">,</span><span class="o">-</span><span class="mi">247</span><span class="p">,</span><span class="o">-</span><span class="mi">253</span><span class="p">,</span><span class="o">-</span><span class="mi">249</span><span class="p">,</span>  <span class="mi">32</span><span class="p">,</span><span class="o">-</span><span class="mi">251</span><span class="p">,</span>  <span class="mi">29</span><span class="p">,</span> <span class="mi">922</span><span class="p">,</span>
    <span class="mi">1565</span><span class="p">,</span>  <span class="mi">34</span><span class="p">,</span><span class="o">-</span><span class="mi">255</span><span class="p">,</span>  <span class="mi">33</span><span class="p">,</span>  <span class="mi">43</span><span class="p">,</span><span class="o">-</span><span class="mi">259</span><span class="p">,</span><span class="o">-</span><span class="mi">261</span><span class="p">,</span>  <span class="mi">19</span><span class="p">,</span>   <span class="mi">2</span><span class="p">,</span><span class="o">-</span><span class="mi">263</span><span class="p">,</span><span class="o">-</span><span class="mi">269</span><span class="p">,</span>   <span class="mi">4</span><span class="p">,</span>
    <span class="o">-</span><span class="mi">265</span><span class="p">,</span>  <span class="mi">15</span><span class="p">,</span><span class="o">-</span><span class="mi">267</span><span class="p">,</span>  <span class="mi">21</span><span class="p">,</span>  <span class="mi">16</span><span class="p">,</span><span class="o">-</span><span class="mi">271</span><span class="p">,</span><span class="o">-</span><span class="mi">273</span><span class="p">,</span>  <span class="mi">14</span><span class="p">,</span>   <span class="mi">9</span><span class="p">,</span>  <span class="mi">12</span><span class="p">,</span><span class="o">-</span><span class="mi">275</span><span class="p">,</span>  <span class="mi">13</span><span class="p">,</span>
       <span class="mi">7</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>

<p>(Since 277 is prime it will never wrap to a nice rectangle no matter what
width I plug in. Ugh.)</p>
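<p>The adjusted generator itself isn't shown, but the table layout is straightforward to reproduce: a leaf slot holds its token, while an internal slot holds the negated index of its left child, with the right child in the next slot. A stand-in serializer (my reconstruction, not the original code):</p>

```python
def serialize(tree):
    # Depth-first flattening of a nested-tuple Huffman tree into the
    # states[] layout: negative entries are internal nodes (negated
    # left-child index), non-negative entries are leaf tokens.
    states = [None]
    def visit(node, i):
        if isinstance(node, tuple):
            j = len(states)
            states[i] = -j              # children land at j and j+1
            states.extend([None, None])
            visit(node[0], j)
            visit(node[1], j + 1)
        else:
            states[i] = node
    visit(tree, 0)
    return states

assert serialize((1, (2, 3))) == [-1, 1, -3, 2, 3]
```

<p>Run on the tree above, this reproduces the table's opening entries (-1, 1, -3, -5, -257, …): that -257 link jumps past the entire left subtree to reach its sibling's child pair.</p>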

<p>With column-wise compression it’s not possible to iterate a word at a
time. The entire list must be decompressed at once. The interface now
looks like so, where the caller supplies a <code class="language-plaintext highlighter-rouge">12972*5</code>-byte buffer to be
filled:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">decompress</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>Exercise for the reader: Modify this to decompress into the 24-bit compact
form, so the caller only needs a <code class="language-plaintext highlighter-rouge">12972*3</code>-byte buffer.</p>

<p>Here’s my decoder, much like before:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">decompress</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int32_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">164958</span><span class="p">;)</span> <span class="p">{</span>
        <span class="c1">// Decode letter</span>
        <span class="kt">int</span> <span class="n">state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(;</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">]</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="kt">int</span> <span class="n">b</span> <span class="o">=</span> <span class="n">words</span><span class="p">[</span><span class="n">i</span><span class="o">&gt;&gt;</span><span class="mi">5</span><span class="p">]</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">i</span><span class="o">&amp;</span><span class="mi">31</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>
            <span class="n">state</span> <span class="o">=</span> <span class="n">b</span> <span class="o">-</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">];</span>
        <span class="p">}</span>
        <span class="kt">int</span> <span class="n">c</span> <span class="o">=</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">]</span> <span class="o">+</span> <span class="mi">96</span><span class="p">;</span>

        <span class="c1">// Decode run-length</span>
        <span class="n">state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(;</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">]</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="kt">int</span> <span class="n">b</span> <span class="o">=</span> <span class="n">words</span><span class="p">[</span><span class="n">i</span><span class="o">&gt;&gt;</span><span class="mi">5</span><span class="p">]</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">i</span><span class="o">&amp;</span><span class="mi">31</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>
            <span class="n">state</span> <span class="o">=</span> <span class="n">b</span> <span class="o">-</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">];</span>
        <span class="p">}</span>
        <span class="kt">int</span> <span class="n">len</span> <span class="o">=</span> <span class="n">states</span><span class="p">[</span><span class="n">state</span><span class="p">];</span>

        <span class="c1">// Fill columns</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">;</span> <span class="n">n</span><span class="o">++</span><span class="p">,</span> <span class="n">y</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">buf</span><span class="p">[</span><span class="n">y</span><span class="o">*</span><span class="mi">5</span><span class="o">+</span><span class="n">x</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">y</span> <span class="o">==</span> <span class="mi">12972</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
            <span class="n">x</span><span class="o">++</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And my new test exactly reproduces the original list:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">12972</span><span class="o">*</span><span class="mi">5L</span><span class="p">];</span>
    <span class="n">decompress</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>

    <span class="kt">char</span> <span class="n">word</span><span class="p">[]</span> <span class="o">=</span> <span class="s">".....</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">12972</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">buf</span><span class="o">+</span><span class="n">i</span><span class="o">*</span><span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">);</span>
        <span class="n">fwrite</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Totalling it up:</p>

<ul>
  <li>Compressed data is 20,620 bytes</li>
  <li>State table is 554 bytes</li>
  <li>Decompressor is about 200 bytes</li>
</ul>

<p>That’s a total of 21,374 bytes. Surprisingly, this beats general-purpose
compressors!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PROGRAM     VERSION   SIZE
bzip2 -9    1.0.8     33,752
gzip -9     1.10      30,338
zstd -19    1.4.8     27,098
brotli -9   1.0.9     26,031
xz -9e      5.2.5     16,656
lzip -9     1.22      16,608
</code></pre></div></div>

<p>Only <code class="language-plaintext highlighter-rouge">xz</code> and <code class="language-plaintext highlighter-rouge">lzip</code> come out ahead on the raw compressed data, but they lose
after accounting for an embedded decompressor (on the order of 10kB). Clearly
there’s an advantage to customizing compression to a particular dataset.</p>

<p><em>Update</em>: <a href="https://lists.sr.ht/~skeeto/public-inbox/%3CCAKF7Hnc4nVKS%3D2adUjyiRb5yBZUdw5z0K_Fb9kFbaW5S6i7POw%40mail.gmail.com%3E">Johannes Rudolph has pointed out</a> a compression scheme for
a Game Boy Wordle clone last month that gets it <a href="http://alexanderpruss.blogspot.com/2022/02/game-boy-wordle-how-to-compress-12972.html">down to 17,871 bytes,
<em>and</em> supports iteration</a>. I improved on this scheme to <a href="https://github.com/skeeto/scratch/blob/master/misc/wordle.c">further
reduce it to 16,659 bytes</a>.</p>

]]>
    </content>
  </entry>

  <entry>
    <title>OpenBSD's pledge and unveil from Python</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/09/15/"/>
    <id>urn:uuid:cd3857dd-270c-430e-824d-6512688687a3</id>
    <updated>2021-09-15T02:46:56Z</updated>
    <category term="bsd"/><category term="c"/><category term="python"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=28535255">on Hacker News</a>.</em></p>

<p>Years ago, OpenBSD gained two new security system calls, <a href="https://man.openbsd.org/pledge.2"><code class="language-plaintext highlighter-rouge">pledge(2)</code></a>
(originally <a href="https://www.openbsd.org/papers/tame-fsec2015/mgp00001.html"><code class="language-plaintext highlighter-rouge">tame(2)</code></a>) and <a href="https://man.openbsd.org/unveil.2"><code class="language-plaintext highlighter-rouge">unveil(2)</code></a>. With both, an application
surrenders capabilities at run-time. The idea is to perform initialization
like usual, then drop capabilities before handling untrusted input,
limiting unwanted side effects. This feature is applicable even where type
safety isn’t an issue, such as in Python, where a program might still get
tricked into accessing sensitive files or making network connections when
it shouldn’t. So how can a Python program access these system calls?</p>

<p>As <a href="/blog/2021/06/29/">discussed previously</a>, it’s quite easy to access C APIs from
Python through its <a href="https://docs.python.org/3/library/ctypes.html"><code class="language-plaintext highlighter-rouge">ctypes</code></a> package, and this is no exception.
In this article I show how to do it. Here’s the full source if you want to
dive in: <a href="https://github.com/skeeto/scratch/tree/master/misc/openbsd.py"><strong><code class="language-plaintext highlighter-rouge">openbsd.py</code></strong></a>.</p>

<!--more-->

<p>I’ve chosen these extra constraints:</p>

<ul>
  <li>
    <p>These functions are extra safety features, not necessary for
correctness, so attempts to call them on systems where they don’t exist
will silently do nothing, as though they succeeded. They’re provided as a
best effort.</p>
  </li>
  <li>
    <p>Systems other than OpenBSD may support these functions, now or in the
future, and it would be nice to automatically make use of them when
available. This means no checking for OpenBSD specifically but instead
<em>feature sniffing</em> for their presence.</p>
  </li>
  <li>
    <p>The interfaces should be Pythonic as though they were implemented in
Python itself. Raise exceptions for errors, and accept strings since
they’re more convenient than bytes.</p>
  </li>
</ul>

<p>For reference, here are the function prototypes:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">pledge</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">promises</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">execpromises</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">unveil</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">path</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">permissions</span><span class="p">);</span>
</code></pre></div></div>

<p>The <a href="https://flak.tedunangst.com/post/string-interfaces">string-oriented interface of <code class="language-plaintext highlighter-rouge">pledge</code></a> will make this a whole
lot easier to implement.</p>

<h3 id="finding-the-functions">Finding the functions</h3>

<p>The first step is to grab functions through <code class="language-plaintext highlighter-rouge">ctypes</code>. Like a lot of Python
documentation, this area is frustratingly imprecise and under-documented.
I want to grab a handle to the already-linked libc and search for either
function. However, getting that handle is a little different on each
platform, and in the process I saw four different exceptions, only one of
which is documented.</p>

<p>I came up with passing None to <code class="language-plaintext highlighter-rouge">ctypes.CDLL</code>, which ultimately just passes
<code class="language-plaintext highlighter-rouge">NULL</code> to <a href="https://man.openbsd.org/dlopen.3"><code class="language-plaintext highlighter-rouge">dlopen(3)</code></a>. That’s really all I wanted. Currently on
Windows this is a TypeError. Once the handle is in hand, try to access the
<code class="language-plaintext highlighter-rouge">pledge</code> attribute, which will fail with AttributeError if it doesn’t
exist. In the event of any exception, just assume the behavior isn’t
available. If found, I also define the function prototype for <code class="language-plaintext highlighter-rouge">ctypes</code>.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_pledge</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">_pledge</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">CDLL</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">use_errno</span><span class="o">=</span><span class="bp">True</span><span class="p">).</span><span class="n">pledge</span>
    <span class="n">_pledge</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_int</span>
    <span class="n">_pledge</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_char_p</span><span class="p">,</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_char_p</span>
<span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
    <span class="n">_pledge</span> <span class="o">=</span> <span class="bp">None</span>
</code></pre></div></div>

<p>Catching a broad Exception isn’t great, but it’s the best we can do since
the documentation is incomplete. From this block I’ve seen TypeError,
AttributeError, FileNotFoundError, and OSError. I wouldn’t be surprised if
there are more possibilities, and I don’t want to risk missing them.</p>

<p>Note that I’m catching Exception rather than using a bare <code class="language-plaintext highlighter-rouge">except</code>. My
code will not catch KeyboardInterrupt nor SystemExit. This is deliberate,
and I never want to catch these.</p>

<p>The same story for <code class="language-plaintext highlighter-rouge">unveil</code>:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_unveil</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">_unveil</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">CDLL</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">use_errno</span><span class="o">=</span><span class="bp">True</span><span class="p">).</span><span class="n">unveil</span>
    <span class="n">_unveil</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_int</span>
    <span class="n">_unveil</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_char_p</span><span class="p">,</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_char_p</span>
<span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
    <span class="n">_unveil</span> <span class="o">=</span> <span class="bp">None</span>
</code></pre></div></div>

<h3 id="pythonic-wrappers">Pythonic wrappers</h3>

<p>The next and final step is to wrap the low-level calls in interfaces that
hides their C and <code class="language-plaintext highlighter-rouge">ctypes</code> nature.</p>

<p>Python strings must be encoded to bytes before they can be passed to C
functions. Rather than make the caller worry about this, we’ll let them
pass friendly strings and have the wrapper do the conversion. Either may
also be <code class="language-plaintext highlighter-rouge">NULL</code>, so None is allowed.</p>
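<p>This encode-or-pass-through logic is small enough to capture in a helper
(an illustrative sketch; the article’s wrappers inline it instead):</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def _encode(s):
    # Encode str to bytes for the C call; pass None (NULL) and bytes through.
    return s.encode() if isinstance(s, str) else s
</code></pre></div></div>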

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">pledge</span><span class="p">(</span><span class="n">promises</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">],</span> <span class="n">execpromises</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">_pledge</span><span class="p">:</span>
        <span class="k">return</span>  <span class="c1"># unimplemented
</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">_pledge</span><span class="p">(</span><span class="bp">None</span> <span class="k">if</span> <span class="n">promises</span> <span class="ow">is</span> <span class="bp">None</span> <span class="k">else</span> <span class="n">promises</span><span class="p">.</span><span class="n">encode</span><span class="p">(),</span>
                <span class="bp">None</span> <span class="k">if</span> <span class="n">execpromises</span> <span class="ow">is</span> <span class="bp">None</span> <span class="k">else</span> <span class="n">execpromises</span><span class="p">.</span><span class="n">encode</span><span class="p">())</span>
    <span class="k">if</span> <span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
        <span class="n">errno</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">get_errno</span><span class="p">()</span>
        <span class="k">raise</span> <span class="nb">OSError</span><span class="p">(</span><span class="n">errno</span><span class="p">,</span> <span class="n">os</span><span class="p">.</span><span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">))</span>
</code></pre></div></div>

<p>As usual, a return of -1 means there was an error, in which case we fetch
<code class="language-plaintext highlighter-rouge">errno</code> and raise the appropriate OSError.</p>
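<p>Both wrappers end with this same error convention, which could be factored
into a tiny helper (a sketch, not part of the article’s code):</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ctypes
import os

def _raise_errno():
    # Fetch the errno recorded via use_errno and raise the matching OSError.
    errno = ctypes.get_errno()
    raise OSError(errno, os.strerror(errno))
</code></pre></div></div>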

<p><code class="language-plaintext highlighter-rouge">unveil</code> works a little differently since the first argument is a path.
Python functions that accept paths, such as <code class="language-plaintext highlighter-rouge">open</code>, generally accept
either strings or bytes. On unix-like systems, <a href="https://simonsapin.github.io/wtf-8/">paths are fundamentally
bytestrings</a> and not necessarily Unicode, so it’s necessary to accept
bytes. Since strings are nearly always more convenient, these functions accept both.
The <code class="language-plaintext highlighter-rouge">unveil</code> wrapper here will do the same. If it’s a string, encode it,
otherwise pass it straight through.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">unveil</span><span class="p">(</span><span class="n">path</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">bytes</span><span class="p">,</span> <span class="bp">None</span><span class="p">],</span> <span class="n">permissions</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">_unveil</span><span class="p">:</span>
        <span class="k">return</span>  <span class="c1"># unimplemented
</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">_unveil</span><span class="p">(</span><span class="n">path</span><span class="p">.</span><span class="n">encode</span><span class="p">()</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span> <span class="k">else</span> <span class="n">path</span><span class="p">,</span>
                <span class="bp">None</span> <span class="k">if</span> <span class="n">permissions</span> <span class="ow">is</span> <span class="bp">None</span> <span class="k">else</span> <span class="n">permissions</span><span class="p">.</span><span class="n">encode</span><span class="p">())</span>
    <span class="k">if</span> <span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
        <span class="n">errno</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">get_errno</span><span class="p">()</span>
        <span class="k">raise</span> <span class="nb">OSError</span><span class="p">(</span><span class="n">errno</span><span class="p">,</span> <span class="n">os</span><span class="p">.</span><span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">))</span>
</code></pre></div></div>

<p>That’s it!</p>

<h3 id="trying-it-out">Trying it out</h3>

<p>Let’s start with <code class="language-plaintext highlighter-rouge">unveil</code>. Initially a process has access to the whole
file system with the usual restrictions. On the first call to <code class="language-plaintext highlighter-rouge">unveil</code>
it’s immediately restricted to some subset of the tree. Each call reveals
a little more until a final <code class="language-plaintext highlighter-rouge">NULL</code> call locks it in place for the rest of
the process’s existence.</p>

<p>Suppose a program has been tricked into accessing your shell history,
perhaps by mishandling a path:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">hackme</span><span class="p">():</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">pathlib</span><span class="p">.</span><span class="n">Path</span><span class="p">.</span><span class="n">home</span><span class="p">()</span> <span class="o">/</span> <span class="s">".bash_history"</span><span class="p">):</span>
            <span class="k">print</span><span class="p">(</span><span class="s">"You've been hacked!"</span><span class="p">)</span>
    <span class="k">except</span> <span class="nb">FileNotFoundError</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"Blocked by unveil."</span><span class="p">)</span>

<span class="n">hackme</span><span class="p">()</span>
</code></pre></div></div>

<p>If you’re a Bash user, this prints:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You've been hacked!
</code></pre></div></div>

<p>Using our new feature to restrict the program’s access first:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># restrict access to static program data
</span><span class="n">unveil</span><span class="p">(</span><span class="s">"/usr/share"</span><span class="p">,</span> <span class="s">"r"</span><span class="p">)</span>
<span class="n">unveil</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>

<span class="n">hackme</span><span class="p">()</span>
</code></pre></div></div>

<p>On OpenBSD this now prints:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Blocked by unveil.
</code></pre></div></div>

<p>Working just as it should!</p>

<p>With <code class="language-plaintext highlighter-rouge">pledge</code> we declare what abilities we’d like to keep by supplying a
list of promises, <em>pledging</em> to use only those abilities afterward. A
common case is the <code class="language-plaintext highlighter-rouge">stdio</code> promise which allows reading and writing of
open files, but not <em>opening</em> files. A program might open its log file,
then drop the ability to open files while retaining the ability to write
to its log.</p>

<p>An invalid or unknown promise is an error. Does that work?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; pledge("doesntexist", None)
OSError: [Errno 22] Invalid argument
</code></pre></div></div>

<p>So far so good. How about the functionality itself?</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pledge</span><span class="p">(</span><span class="s">"stdio"</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="n">hackme</span><span class="p">()</span>
</code></pre></div></div>

<p>The program is instantly killed when making the disallowed system call:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Abort trap (core dumped)
</code></pre></div></div>

<p>If you want something a little softer, include the <code class="language-plaintext highlighter-rouge">error</code> promise:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pledge</span><span class="p">(</span><span class="s">"stdio error"</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="n">hackme</span><span class="p">()</span>
</code></pre></div></div>

<p>Instead it’s an exception, which is a lot easier to debug from Python:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OSError: [Errno 78] Function not implemented
</code></pre></div></div>

<p>The core dump isn’t going to be much help to a Python program, so you
probably always want to use this promise. In general you need to be extra
careful about <code class="language-plaintext highlighter-rouge">pledge</code> in complex runtimes like Python’s which may
reasonably need to do many arbitrary, undocumented things at any time.</p>

]]>
    </content>
  </entry>

  <entry>
    <title>State machines are wonderful tools</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/12/31/"/>
    <id>urn:uuid:c93d7a7b-6ae0-4b7e-afa6-424ef40b9d9c</id>
    <updated>2020-12-31T22:48:13Z</updated>
    <category term="compsci"/><category term="c"/><category term="python"/><category term="lua"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=25601821">on Hacker News</a>.</em></p>

<p>I love when my current problem can be solved with a state machine. They’re
fun to design and implement, and I have high confidence about correctness.
They tend to:</p>

<ol>
  <li>Present <a href="/blog/2018/06/10/">minimal, tidy interfaces</a></li>
  <li>Require few, fixed resources</li>
  <li>Hold no opinions about input and output</li>
  <li>Have a compact, concise implementation</li>
  <li>Be easy to reason about</li>
</ol>

<p>State machines are perhaps one of those concepts you heard about in
college but never put into practice. Maybe you use them often.
Regardless, you certainly run into them all the time, from <a href="https://swtch.com/~rsc/regexp/">regular
expressions</a> to traffic lights.</p>
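
<p>Even the traffic light fits the model: a few states and a fixed transition
table (an illustrative sketch, not from the article):</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>_NEXT = {"green": "yellow", "yellow": "red", "red": "green"}

def traffic_step(state):
    # Advance the light one step through its fixed cycle.
    return _NEXT[state]
</code></pre></div></div>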

<!--more-->

<h3 id="morse-code-decoder-state-machine">Morse code decoder state machine</h3>

<p>Inspired by <a href="https://possiblywrong.wordpress.com/2020/11/21/among-us-morse-code-puzzle/">a puzzle</a>, I came up with this deterministic state
machine for decoding <a href="https://en.wikipedia.org/wiki/Morse_code">Morse code</a>. It accepts a dot (<code class="language-plaintext highlighter-rouge">'.'</code>), dash
(<code class="language-plaintext highlighter-rouge">'-'</code>), or terminator (0) one at a time, advancing through a state
machine step by step:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">morse_decode</span><span class="p">(</span><span class="kt">int</span> <span class="n">state</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">t</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="mh">0x03</span><span class="p">,</span> <span class="mh">0x3f</span><span class="p">,</span> <span class="mh">0x7b</span><span class="p">,</span> <span class="mh">0x4f</span><span class="p">,</span> <span class="mh">0x2f</span><span class="p">,</span> <span class="mh">0x63</span><span class="p">,</span> <span class="mh">0x5f</span><span class="p">,</span> <span class="mh">0x77</span><span class="p">,</span> <span class="mh">0x7f</span><span class="p">,</span> <span class="mh">0x72</span><span class="p">,</span>
        <span class="mh">0x87</span><span class="p">,</span> <span class="mh">0x3b</span><span class="p">,</span> <span class="mh">0x57</span><span class="p">,</span> <span class="mh">0x47</span><span class="p">,</span> <span class="mh">0x67</span><span class="p">,</span> <span class="mh">0x4b</span><span class="p">,</span> <span class="mh">0x81</span><span class="p">,</span> <span class="mh">0x40</span><span class="p">,</span> <span class="mh">0x01</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x68</span><span class="p">,</span> <span class="mh">0x51</span><span class="p">,</span> <span class="mh">0x32</span><span class="p">,</span> <span class="mh">0x88</span><span class="p">,</span> <span class="mh">0x34</span><span class="p">,</span> <span class="mh">0x8c</span><span class="p">,</span> <span class="mh">0x92</span><span class="p">,</span> <span class="mh">0x6c</span><span class="p">,</span> <span class="mh">0x02</span><span class="p">,</span>
        <span class="mh">0x03</span><span class="p">,</span> <span class="mh">0x18</span><span class="p">,</span> <span class="mh">0x14</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x0c</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x08</span><span class="p">,</span> <span class="mh">0x1c</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x20</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x28</span><span class="p">,</span> <span class="mh">0x04</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x30</span><span class="p">,</span> <span class="mh">0x31</span><span class="p">,</span> <span class="mh">0x32</span><span class="p">,</span> <span class="mh">0x33</span><span class="p">,</span> <span class="mh">0x34</span><span class="p">,</span> <span class="mh">0x35</span><span class="p">,</span>
        <span class="mh">0x36</span><span class="p">,</span> <span class="mh">0x37</span><span class="p">,</span> <span class="mh">0x38</span><span class="p">,</span> <span class="mh">0x39</span><span class="p">,</span> <span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x42</span><span class="p">,</span> <span class="mh">0x43</span><span class="p">,</span> <span class="mh">0x44</span><span class="p">,</span> <span class="mh">0x45</span><span class="p">,</span> <span class="mh">0x46</span><span class="p">,</span>
        <span class="mh">0x47</span><span class="p">,</span> <span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0x4a</span><span class="p">,</span> <span class="mh">0x4b</span><span class="p">,</span> <span class="mh">0x4c</span><span class="p">,</span> <span class="mh">0x4d</span><span class="p">,</span> <span class="mh">0x4e</span><span class="p">,</span> <span class="mh">0x4f</span><span class="p">,</span> <span class="mh">0x50</span><span class="p">,</span>
        <span class="mh">0x51</span><span class="p">,</span> <span class="mh">0x52</span><span class="p">,</span> <span class="mh">0x53</span><span class="p">,</span> <span class="mh">0x54</span><span class="p">,</span> <span class="mh">0x55</span><span class="p">,</span> <span class="mh">0x56</span><span class="p">,</span> <span class="mh">0x57</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span> <span class="mh">0x59</span><span class="p">,</span> <span class="mh">0x5a</span>
    <span class="p">};</span>
    <span class="kt">int</span> <span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="p">[</span><span class="o">-</span><span class="n">state</span><span class="p">];</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mh">0x00</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&gt;&gt;</span> <span class="mi">2</span> <span class="o">?</span> <span class="n">t</span><span class="p">[(</span><span class="n">v</span> <span class="o">&gt;&gt;</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">63</span><span class="p">]</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">case</span> <span class="mh">0x2e</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&amp;</span>  <span class="mi">2</span> <span class="o">?</span> <span class="n">state</span><span class="o">*</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">case</span> <span class="mh">0x2d</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&amp;</span>  <span class="mi">1</span> <span class="o">?</span> <span class="n">state</span><span class="o">*</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="nl">default:</span>   <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It typically compiles to under 200 bytes (table included), requires only a
few bytes of memory to operate, and will fit on even the smallest of
microcontrollers. The full source listing, documentation, and
comprehensive test suite:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/parsers/morsecode.c">https://github.com/skeeto/scratch/blob/master/parsers/morsecode.c</a></p>

<p>The state machine is trie-shaped, and the 100-byte table <code class="language-plaintext highlighter-rouge">t</code> is the static
<a href="/blog/2016/11/15/">encoding of the Morse code trie</a>:</p>

<p><a href="/img/diagram/morse.dot"><img src="/img/diagram/morse.svg" alt="" /></a></p>

<p>Dots traverse left, dashes right, and terminals emit the character at the
current node (the terminal state). Stopping on a red node, or attempting
to take an unlisted edge, is an error (invalid input).</p>

<p>Each node in the trie is a byte in the table. Dot and dash each have a bit
indicating if their edge exists. The remaining bits index into a 1-based
character table (at the end of <code class="language-plaintext highlighter-rouge">t</code>), and a 0 “index” indicates an empty
(red) node. The nodes themselves are laid out as <a href="https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation">a binary heap in an
array</a>: the left and right children of the node at <code class="language-plaintext highlighter-rouge">i</code> are found at
<code class="language-plaintext highlighter-rouge">i*2+1</code> and <code class="language-plaintext highlighter-rouge">i*2+2</code>. No need to <a href="/blog/2020/10/19/#minimax-costs">waste memory storing edges</a>!</p>
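<p>To make the indexing concrete, here’s a minimal sketch (in Python, using
plain characters rather than the packed byte encoding above) of walking a
Morse trie stored as a binary heap in an array:</p>

```python
# Heap-in-array Morse trie: the children of node i live at i*2+1 (dot)
# and i*2+2 (dash). Empty strings mark red (invalid) nodes; index 0 is
# the root. Only the first three levels are filled in here.
TRIE = ['', 'E', 'T', 'I', 'A', 'N', 'M',
        'S', 'U', 'R', 'W', 'D', 'K', 'G', 'O']

def decode(morse):
    i = 0
    for symbol in morse:
        i = i*2 + (1 if symbol == '.' else 2)
        if i >= len(TRIE) or not TRIE[i]:
            return None  # unlisted edge: invalid input
    return TRIE[i] or None  # the root and red nodes emit nothing

decode('.-')  # 'A' (dot goes left, dash goes right)
decode('--')  # 'M'
```

No edges are stored anywhere; the child positions fall out of the index
arithmetic alone.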

<p>Since C sadly does not have multiple return values, I’m using the sign bit
of the return value to create a kind of sum type. A negative return value
is a state — which is why the state is negated internally before use. A
positive result is a character output. If zero, the input was invalid.
Only the initial state is non-negative (zero), which is fine since it’s,
by definition, not possible to traverse to the initial state. No <code class="language-plaintext highlighter-rouge">c</code> input
will produce a bad state.</p>

<p>In the original problem the terminals were missing. Despite being a <em>state
machine</em>, <code class="language-plaintext highlighter-rouge">morse_decode</code> is a pure function. The caller can save their
position in the trie by saving the state integer and trying different
inputs from that state.</p>
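<p>Here’s a hypothetical Python analogue of that convention (not the C code
above, and with a toy two-level trie): a negative return is a resumable
state, a positive return is a character code, and zero is an error:</p>

```python
# Sign-bit "sum type" sketch: states are negated trie node indices,
# so only the initial state is zero. None marks the end of a letter.
TRIE = ['', 'E', 'T', 'I', 'A', 'N', 'M']

def morse_decode(state, c):
    i = -state  # recover the node index from the negated state
    if c == '.':
        j = i*2 + 1
    elif c == '-':
        j = i*2 + 2
    elif c is None:  # end of letter: emit the current node's character
        return ord(TRIE[i]) if i and TRIE[i] else 0
    else:
        return 0
    return -j if j < len(TRIE) and TRIE[j] else 0

state = morse_decode(0, '.')       # negative: a saved, resumable state
state = morse_decode(state, '-')   # still mid-letter
morse_decode(state, None)          # positive: ord('A')
```

Because the function is pure, the caller can stash any intermediate state
and later resume from it with different inputs.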

<h3 id="utf-8-decoder-state-machine">UTF-8 decoder state machine</h3>

<p>The classic UTF-8 decoder state machine is <a href="https://bjoern.hoehrmann.de/utf-8/decoder/dfa/">Bjoern Hoehrmann’s Flexible
and Economical UTF-8 Decoder</a>. It packs the entire state machine into
a relatively small table using clever tricks. It’s easily my favorite
UTF-8 decoder.</p>

<p>I wanted to try my own hand at it, so I re-derived the same canonical
UTF-8 automaton:</p>

<p><a href="/img/diagram/utf8.dot"><img src="/img/diagram/utf8.svg" alt="" /></a></p>

<p>Then I encoded this diagram directly into a much larger (2,064-byte), less
elegant table, too large to display inline here:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/parsers/utf8_decode.c">https://github.com/skeeto/scratch/blob/master/parsers/utf8_decode.c</a></p>

<p>However, the trade-off is that the executable code is smaller, faster, and
<a href="/blog/2017/10/06/">branchless again</a> (by accident, I swear!):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">utf8_decode</span><span class="p">(</span><span class="kt">int</span> <span class="n">state</span><span class="p">,</span> <span class="kt">long</span> <span class="o">*</span><span class="n">cp</span><span class="p">,</span> <span class="kt">int</span> <span class="n">byte</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">signed</span> <span class="kt">char</span> <span class="n">table</span><span class="p">[</span><span class="mi">8</span><span class="p">][</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">masks</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
    <span class="kt">int</span> <span class="n">next</span> <span class="o">=</span> <span class="n">table</span><span class="p">[</span><span class="n">state</span><span class="p">][</span><span class="n">byte</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cp</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">cp</span> <span class="o">&lt;&lt;</span> <span class="mi">6</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">byte</span> <span class="o">&amp;</span> <span class="n">masks</span><span class="p">[</span><span class="o">!</span><span class="n">state</span><span class="p">][</span><span class="n">next</span><span class="o">&amp;</span><span class="mi">7</span><span class="p">]);</span>
    <span class="k">return</span> <span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Like Bjoern’s decoder, there’s a code point accumulator. The <em>real</em> state
machine has 1,109,950 terminal states, and many more edges and nodes. The
accumulator is an optimization to track exactly which edge was taken to
which node without having to represent such a monstrosity.</p>

<p>Despite the huge table I’m pretty happy with it.</p>

<h3 id="word-count-state-machine">Word count state machine</h3>

<p>Here’s another state machine I came up with a while back for counting words
one Unicode code point at a time while accounting for Unicode’s various
kinds of whitespace. If your input is bytes, then plug this into the above
UTF-8 state machine to convert bytes to code points! This one uses a
switch instead of a lookup table since the table would be sparse (i.e.
<a href="/blog/2019/12/09/">let the compiler figure it out</a>).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* State machine counting words in a sequence of code points.
 *
 * The current word count is the absolute value of the state, so
 * the initial state is zero. Code points are fed into the state
 * machine one at a time, each call returning the next state.
 */</span>
<span class="kt">long</span> <span class="nf">word_count</span><span class="p">(</span><span class="kt">long</span> <span class="n">state</span><span class="p">,</span> <span class="kt">long</span> <span class="n">codepoint</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">codepoint</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mh">0x0009</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000a</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000b</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000c</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000d</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x0020</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x0085</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x00a0</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x1680</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2000</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2001</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2002</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2003</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2004</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2005</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2006</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2007</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2008</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2009</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x200a</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2028</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2029</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x202f</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x205f</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x3000</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">state</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="o">-</span><span class="n">state</span> <span class="o">:</span> <span class="n">state</span><span class="p">;</span>
    <span class="nl">default:</span>
        <span class="k">return</span> <span class="n">state</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="n">state</span> <span class="o">:</span> <span class="o">-</span><span class="mi">1</span> <span class="o">-</span> <span class="n">state</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’m particularly happy with the <em>edge-triggered</em> state transition
mechanism. The sign of the state tracks whether the “signal” is “high”
(inside of a word) or “low” (outside of a word), and so it counts rising
edges.</p>

<p><a href="/img/diagram/wordcount.dot"><img src="/img/diagram/wordcount.svg" alt="" /></a></p>

<p>The counter is not <em>technically</em> part of the state machine — though it
eventually overflows for practical reasons, it isn’t really “finite” — but
is rather an external count of the times the state machine transitions
from low to high, which is the actual, useful output.</p>
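<p>Driving the machine looks like this (a hypothetical Python harness with a
truncated, ASCII-only whitespace set; the real machine uses the full list
of code points above):</p>

```python
# ASCII whitespace only, for brevity.
WHITESPACE = {0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20}

def word_count(state, codepoint):
    # Mirrors the C function: abs(state) is the count, and a negative
    # state means the signal is high (inside a word).
    if codepoint in WHITESPACE:
        return -state if state < 0 else state
    return state if state < 0 else -1 - state  # rising edge: count += 1

def count_words(text):
    state = 0
    for ch in text:
        state = word_count(state, ord(ch))
    return abs(state)

count_words('hello  state machine world')  # 4
```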

<p><em>Reader challenge</em>: Find a slick, efficient way to encode all those code
points as a table rather than rely on whatever the compiler generates for
the <code class="language-plaintext highlighter-rouge">switch</code> (chain of branches, jump table?).</p>
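<p>One candidate answer, sketched in Python: store the code points as a
sorted array and binary-search it, which is compact and O(log n) per
lookup regardless of how the compiler would have lowered the switch:</p>

```python
import bisect

# Sorted table of the whitespace code points from the switch above.
WS = [0x0009, 0x000a, 0x000b, 0x000c, 0x000d,
      0x0020, 0x0085, 0x00a0, 0x1680, 0x2000,
      0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
      0x2006, 0x2007, 0x2008, 0x2009, 0x200a,
      0x2028, 0x2029, 0x202f, 0x205f, 0x3000]

def is_space(cp):
    # Binary search for cp; membership iff we land exactly on it.
    i = bisect.bisect_left(WS, cp)
    return i < len(WS) and WS[i] == cp
```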

<h3 id="coroutines-and-generators-as-state-machines">Coroutines and generators as state machines</h3>

<p>In languages that support them, state machines can be implemented using
coroutines, including generators. I do particularly like the idea of
<a href="/blog/2018/05/31/">compiler-synthesized coroutines</a> as state machines, though this is a
rare treat. The state is implicit in the coroutine at each yield, so the
programmer doesn’t have to manage it explicitly. (Though often that
explicit control is powerful!)</p>

<p>Unfortunately in practice it always feels clunky. The following implements
the word count state machine (albeit in a rather un-Pythonic way). The
generator returns the current count and is continued by sending it another
code point:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WHITESPACE</span> <span class="o">=</span> <span class="p">{</span>
    <span class="mh">0x0009</span><span class="p">,</span> <span class="mh">0x000a</span><span class="p">,</span> <span class="mh">0x000b</span><span class="p">,</span> <span class="mh">0x000c</span><span class="p">,</span> <span class="mh">0x000d</span><span class="p">,</span>
    <span class="mh">0x0020</span><span class="p">,</span> <span class="mh">0x0085</span><span class="p">,</span> <span class="mh">0x00a0</span><span class="p">,</span> <span class="mh">0x1680</span><span class="p">,</span> <span class="mh">0x2000</span><span class="p">,</span>
    <span class="mh">0x2001</span><span class="p">,</span> <span class="mh">0x2002</span><span class="p">,</span> <span class="mh">0x2003</span><span class="p">,</span> <span class="mh">0x2004</span><span class="p">,</span> <span class="mh">0x2005</span><span class="p">,</span>
    <span class="mh">0x2006</span><span class="p">,</span> <span class="mh">0x2007</span><span class="p">,</span> <span class="mh">0x2008</span><span class="p">,</span> <span class="mh">0x2009</span><span class="p">,</span> <span class="mh">0x200a</span><span class="p">,</span>
    <span class="mh">0x2028</span><span class="p">,</span> <span class="mh">0x2029</span><span class="p">,</span> <span class="mh">0x202f</span><span class="p">,</span> <span class="mh">0x205f</span><span class="p">,</span> <span class="mh">0x3000</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">def</span> <span class="nf">wordcount</span><span class="p">():</span>
    <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="c1"># low signal
</span>            <span class="n">codepoint</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">count</span>
            <span class="k">if</span> <span class="n">codepoint</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">WHITESPACE</span><span class="p">:</span>
                <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
                <span class="k">break</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="c1"># high signal
</span>            <span class="n">codepoint</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">count</span>
            <span class="k">if</span> <span class="n">codepoint</span> <span class="ow">in</span> <span class="n">WHITESPACE</span><span class="p">:</span>
                <span class="k">break</span>
</code></pre></div></div>

<p>However, the generator ceremony dominates the interface, so you’d probably
want to wrap it in something nicer — at which point there’s really no
reason to use the generator in the first place:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wc</span> <span class="o">=</span> <span class="n">wordcount</span><span class="p">()</span>
<span class="nb">next</span><span class="p">(</span><span class="n">wc</span><span class="p">)</span>  <span class="c1"># prime the generator
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'A'</span><span class="p">))</span>  <span class="c1"># =&gt; 1
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>  <span class="c1"># =&gt; 1
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'B'</span><span class="p">))</span>  <span class="c1"># =&gt; 2
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>  <span class="c1"># =&gt; 2
</span></code></pre></div></div>
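<p>For comparison, here’s a sketch of the non-generator alternative such a
wrapper would collapse into, with the state held in two explicit fields
and no priming step (again with a truncated whitespace set):</p>

```python
WHITESPACE = {0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20}  # ASCII subset

class WordCounter:
    # Explicit-state alternative to the generator: in_word is the
    # high/low signal, count is the number of rising edges seen.
    def __init__(self):
        self.count = 0
        self.in_word = False

    def feed(self, codepoint):
        if codepoint in WHITESPACE:
            self.in_word = False
        elif not self.in_word:
            self.in_word = True
            self.count += 1  # rising edge: a new word began
        return self.count

wc = WordCounter()
[wc.feed(ord(c)) for c in 'A B']  # final count: 2
```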

<p>Same idea in Lua, which famously has full coroutines:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">WHITESPACE</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mh">0x0009</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000a</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000b</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000c</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x000d</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x0020</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x0085</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x00a0</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x1680</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2000</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2001</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2002</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2003</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2004</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2005</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2006</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2007</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2008</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2009</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x200a</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2028</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2029</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x202f</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x205f</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x3000</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span>
<span class="p">}</span>

<span class="k">function</span> <span class="nf">wordcount</span><span class="p">()</span>
    <span class="kd">local</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
        <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
            <span class="c1">-- low signal</span>
            <span class="kd">local</span> <span class="n">codepoint</span> <span class="o">=</span> <span class="nb">coroutine.yield</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="n">WHITESPACE</span><span class="p">[</span><span class="n">codepoint</span><span class="p">]</span> <span class="k">then</span>
                <span class="n">count</span> <span class="o">=</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">1</span>
                <span class="k">break</span>
            <span class="k">end</span>
        <span class="k">end</span>
        <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
            <span class="c1">-- high signal</span>
            <span class="kd">local</span> <span class="n">codepoint</span> <span class="o">=</span> <span class="nb">coroutine.yield</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">WHITESPACE</span><span class="p">[</span><span class="n">codepoint</span><span class="p">]</span> <span class="k">then</span>
                <span class="k">break</span>
            <span class="k">end</span>
        <span class="k">end</span>
    <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Except for initially priming the coroutine, at least <code class="language-plaintext highlighter-rouge">coroutine.wrap()</code>
hides the fact that it’s a coroutine.</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wc</span> <span class="o">=</span> <span class="nb">coroutine.wrap</span><span class="p">(</span><span class="n">wordcount</span><span class="p">)</span>
<span class="n">wc</span><span class="p">()</span>  <span class="c1">-- prime the coroutine</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">'A'</span><span class="p">))</span>  <span class="c1">-- =&gt; 1</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>  <span class="c1">-- =&gt; 1</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">'B'</span><span class="p">))</span>  <span class="c1">-- =&gt; 2</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>  <span class="c1">-- =&gt; 2</span>
</code></pre></div></div>

<h3 id="extra-examples">Extra examples</h3>

<p>Finally, a couple more examples not worth describing in detail here. First
a Unicode case folding state machine:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/misc/casefold.c">https://github.com/skeeto/scratch/blob/master/misc/casefold.c</a></p>

<p>It’s just an interface to do a lookup into the <a href="https://www.unicode.org/Public/13.0.0/ucd/CaseFolding.txt">official case folding
table</a>. It was an experiment, and I <em>probably</em> wouldn’t use it in a
real program.</p>

<p>Second, I’ve mentioned <a href="https://github.com/skeeto/utf-7">my UTF-7 encoder and decoder</a> before. It’s
not obvious from the interface, but internally it’s just a state machine
for both encoder and decoder, which is what allows it to “pause”
between any pair of input/output bytes.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Asynchronously Opening and Closing Files in Asyncio</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/09/04/"/>
    <id>urn:uuid:ae94da45-f65d-4c72-a10e-9e421ea843ec</id>
    <updated>2020-09-04T01:36:20Z</updated>
    <category term="c"/><category term="linux"/><category term="python"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p>Python <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a> has support for asynchronous networking,
subprocesses, and interprocess communication. However, it has nothing
for asynchronous file operations — opening, reading, writing, or
closing. This is likely in part because operating systems themselves
also lack these facilities. If a file operation takes a long time,
perhaps because the file is on a network mount, then the entire Python
process will hang. It’s possible to work around this, so let’s build a
utility that can asynchronously open and close files.</p>

<p>The usual way to work around the lack of operating system support for a
particular asynchronous operation is to <a href="http://docs.libuv.org/en/v1.x/design.html#file-i-o">dedicate threads to waiting on
those operations</a>. By using a thread pool, we can even avoid the
overhead of spawning threads when we need them. Plus asyncio is designed
to play nicely with thread pools anyway.</p>
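<p>The core of the trick fits in a few lines. A hypothetical sketch using
asyncio’s default executor (not the <code class="language-plaintext highlighter-rouge">aopen()</code> built out below):</p>

```python
import asyncio

async def open_in_thread(path, mode='r'):
    # Run the blocking open() on the default thread pool so the event
    # loop keeps servicing other tasks while the call is stuck.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, open, path, mode)
```

Any other slow file operation — read, write, close — can be pushed
through <code class="language-plaintext highlighter-rouge">run_in_executor()</code> the same way.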

<h3 id="test-setup">Test setup</h3>

<p>Before we get started, we’ll need some way to test that it’s working. We
need a slow file system. One thought is to <a href="/blog/2018/06/23/">use ptrace to intercept the
relevant system calls</a>, though this isn’t quite so simple. The
other threads need to continue running while the thread waiting on
<code class="language-plaintext highlighter-rouge">open(2)</code> is paused, but ptrace pauses the whole process. Fortunately
there’s a simpler solution anyway: <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>.</p>

<p>Setting the <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> environment variable to the name of a shared
object will cause the loader to load this shared object ahead of
everything else, allowing that shared object to override other
libraries. I’m on x86-64 Linux (Debian), and so I’m looking to override
<code class="language-plaintext highlighter-rouge">open64(2)</code> in glibc. Here’s my <code class="language-plaintext highlighter-rouge">open64.c</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define _GNU_SOURCE
#include</span> <span class="cpf">&lt;dlfcn.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span>
<span class="nf">open64</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">path</span><span class="p">,</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">,</span> <span class="kt">int</span> <span class="n">mode</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strncmp</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s">"/tmp/"</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">f</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">)</span> <span class="o">=</span> <span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="s">"open64"</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">f</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">mode</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now Python must go through my C function when it opens files. If the
file resides anywhere under <code class="language-plaintext highlighter-rouge">/tmp/</code>, opening the file will be delayed by 3
seconds. Since I still want to actually open a file, I use <code class="language-plaintext highlighter-rouge">dlsym()</code> to
access the <em>real</em> <code class="language-plaintext highlighter-rouge">open64()</code> in glibc. I build it like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -shared -fPIC -o open64.so open64.c -ldl
</code></pre></div></div>

<p>And to test that it works with Python, let’s time how long it takes to
open <code class="language-plaintext highlighter-rouge">/tmp/x</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ touch /tmp/x
$ time LD_PRELOAD=./open64.so python3 -c 'open("/tmp/x")'

real    0m3.021s
user    0m0.014s
sys     0m0.005s
</code></pre></div></div>

<p>Perfect! (Note: It’s a little strange putting <code class="language-plaintext highlighter-rouge">time</code> <em>before</em> setting the
environment variable, but that’s because I’m using Bash, where <code class="language-plaintext highlighter-rouge">time</code> is
special: it’s the shell’s version of the command.)</p>

<h3 id="thread-pools">Thread pools</h3>

<p>Python’s standard <code class="language-plaintext highlighter-rouge">open()</code> is most commonly used as a <em>context manager</em>
so that the file is automatically closed no matter what happens.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'output.txt'</span><span class="p">,</span> <span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">out</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'hello world'</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">out</span><span class="p">)</span>
</code></pre></div></div>

<p>I’d like my asynchronous open to follow this pattern using <a href="https://www.python.org/dev/peps/pep-0492/"><code class="language-plaintext highlighter-rouge">async
with</code></a>. It’s like <code class="language-plaintext highlighter-rouge">with</code>, but the context manager is acquired and
released asynchronously. I’ll call my version <code class="language-plaintext highlighter-rouge">aopen()</code>:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">with</span> <span class="n">aopen</span><span class="p">(</span><span class="s">'output.txt'</span><span class="p">,</span> <span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">out</span><span class="p">:</span>
    <span class="p">...</span>
</code></pre></div></div>

<p>So <code class="language-plaintext highlighter-rouge">aopen()</code> will need to return an <em>asynchronous context manager</em>, an
object with methods <code class="language-plaintext highlighter-rouge">__aenter__</code> and <code class="language-plaintext highlighter-rouge">__aexit__</code> that both return
<a href="https://docs.python.org/3/glossary.html#term-awaitable"><em>awaitables</em></a>. Usually this is by virtue of these methods being
<a href="https://docs.python.org/3/glossary.html#term-coroutine-function"><em>coroutine functions</em></a>, but a normal function that directly returns
an awaitable also works, which is what I’ll be doing for <code class="language-plaintext highlighter-rouge">__aenter__</code>.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">_AsyncOpen</span><span class="p">():</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">):</span>
        <span class="p">...</span>

    <span class="k">def</span> <span class="nf">__aenter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="p">...</span>

    <span class="k">async</span> <span class="k">def</span> <span class="nf">__aexit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc_type</span><span class="p">,</span> <span class="n">exc</span><span class="p">,</span> <span class="n">tb</span><span class="p">):</span>
        <span class="p">...</span>
</code></pre></div></div>

<p>Ultimately we have to call <code class="language-plaintext highlighter-rouge">open()</code>. The arguments for <code class="language-plaintext highlighter-rouge">open()</code> will be
given to the constructor to be used later. This will make more sense
when you see the definition for <code class="language-plaintext highlighter-rouge">aopen()</code>.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_args</span> <span class="o">=</span> <span class="n">args</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_kwargs</span> <span class="o">=</span> <span class="n">kwargs</span>
</code></pre></div></div>

<p>When it’s time to actually open the file, Python will call <code class="language-plaintext highlighter-rouge">__aenter__</code>.
We can’t call <code class="language-plaintext highlighter-rouge">open()</code> directly since that will block, so we’ll use a
thread pool to wait on it. Rather than create a thread pool, we’ll use
the one that comes with the current event loop. The <code class="language-plaintext highlighter-rouge">run_in_executor()</code>
method runs a function in a thread pool — where <code class="language-plaintext highlighter-rouge">None</code> means use the
default pool — returning an asyncio future representing the future
result, in this case the opened file object.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">def</span> <span class="nf">__aenter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">def</span> <span class="nf">thread_open</span><span class="p">():</span>
            <span class="k">return</span> <span class="nb">open</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">_args</span><span class="p">,</span> <span class="o">**</span><span class="bp">self</span><span class="p">.</span><span class="n">_kwargs</span><span class="p">)</span>
        <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">get_event_loop</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_future</span> <span class="o">=</span> <span class="n">loop</span><span class="p">.</span><span class="n">run_in_executor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">thread_open</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">_future</span>
</code></pre></div></div>

<p>Since this <code class="language-plaintext highlighter-rouge">__aenter__</code> is not a coroutine function, it returns the
future directly as its awaitable result. The caller will await it.</p>

<p>The default thread pool is limited in size (as of Python 3.8, to
<code class="language-plaintext highlighter-rouge">min(32, os.cpu_count() + 4)</code> threads), which I suppose is a sensible
default for CPU-bound operations but not for I/O-bound operations like
these. In a real program we may want to use a larger thread pool.</p>
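<p>If the default pool is a poor fit, the loop’s default executor can be swapped out. A minimal sketch (the pool size 64 is an arbitrary illustration, not a value from the original program):</p>

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def main():
    # Install a larger pool as the loop's default executor so that
    # run_in_executor(None, ...) can service many blocking calls at once.
    loop = asyncio.get_event_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=64))
    # Any blocking function now rides on the bigger pool.
    return await loop.run_in_executor(None, lambda: sum(range(10)))

print(asyncio.run(main()))  # prints 45
```

<p>Every <code class="language-plaintext highlighter-rouge">run_in_executor(None, ...)</code> call in the program then shares this larger pool, with no other changes required.</p>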

<p>Closing a file may block, so we’ll do that in a thread pool as well.
First pull the file object <a href="/blog/2020/07/30/">from the future</a>, then close it in the
thread pool, waiting until the file has actually closed:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">async</span> <span class="k">def</span> <span class="nf">__aexit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc_type</span><span class="p">,</span> <span class="n">exc</span><span class="p">,</span> <span class="n">tb</span><span class="p">):</span>
        <span class="nb">file</span> <span class="o">=</span> <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">_future</span>
        <span class="k">def</span> <span class="nf">thread_close</span><span class="p">():</span>
            <span class="nb">file</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
        <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">get_event_loop</span><span class="p">()</span>
        <span class="k">await</span> <span class="n">loop</span><span class="p">.</span><span class="n">run_in_executor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">thread_close</span><span class="p">)</span>
</code></pre></div></div>

<p>The open and close are paired in this context manager, but it may be
concurrent with an arbitrary number of other <code class="language-plaintext highlighter-rouge">_AsyncOpen</code> context
managers. There will be some upper limit to the number of open files, so
<strong>we need to be careful not to use too many of these things
concurrently</strong>, something <a href="/blog/2020/05/24/">which easily happens when using unbounded
queues</a>. Lacking back pressure, all it takes is for tasks to be
opening files slightly faster than they close them.</p>
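<p>One way to add that back pressure is a counting semaphore capping how many tasks may hold a file open at once. A self-contained sketch, with a deliberately tiny cap and a sleep standing in for the slow open/write/close:</p>

```python
import asyncio

async def main():
    MAX_OPEN = 2  # small for demonstration; tune for your descriptor limit
    sem = asyncio.Semaphore(MAX_OPEN)
    state = {'active': 0, 'peak': 0}

    async def task(i):
        async with sem:  # wait for a free slot before "opening"
            state['active'] += 1
            state['peak'] = max(state['peak'], state['active'])
            await asyncio.sleep(0.01)  # stands in for slow open/write/close
            state['active'] -= 1

    await asyncio.gather(*(task(i) for i in range(10)))
    print('peak concurrency:', state['peak'])  # prints: peak concurrency: 2

asyncio.run(main())
```

<p>In the real program the <code class="language-plaintext highlighter-rouge">async with sem:</code> would wrap the <code class="language-plaintext highlighter-rouge">async with aopen(...)</code> block, so excess tasks queue on the semaphore instead of exhausting file descriptors.</p>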

<p>With all the hard work done, the definition for <code class="language-plaintext highlighter-rouge">aopen()</code> is trivial:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">aopen</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">_AsyncOpen</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">)</span>
</code></pre></div></div>

<p>That’s it! Let’s try it out with the <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> test.</p>

<h3 id="a-test-drive">A test drive</h3>

<p>First define a “heartbeat” task that will tell us the asyncio loop is
still chugging away while we wait on opening the file.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">heartbeat</span><span class="p">():</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'HEARTBEAT'</span><span class="p">)</span>
</code></pre></div></div>

<p>Here’s a test function for <code class="language-plaintext highlighter-rouge">aopen()</code> that asynchronously opens a file
under <code class="language-plaintext highlighter-rouge">/tmp/</code> named by an integer, (synchronously) writes that integer
to the file, then asynchronously closes it.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">aopen</span><span class="p">(</span><span class="sa">f</span><span class="s">'/tmp/</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">'</span><span class="p">,</span> <span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">out</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">out</span><span class="p">)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">main()</code> function creates the heartbeat task and opens 4 files
concurrently through the intercepted file-opening routine:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">beat</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">heartbeat</span><span class="p">())</span>
    <span class="n">tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span>
    <span class="n">beat</span><span class="p">.</span><span class="n">cancel</span><span class="p">()</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>The result:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LD_PRELOAD=./open64.so python3 aopen.py
HEARTBEAT
HEARTBEAT
HEARTBEAT
HEARTBEAT
HEARTBEAT
HEARTBEAT
$ cat /tmp/{0,1,2,3}
0
1
2
3
</code></pre></div></div>

<p>As expected, 6 heartbeats corresponding to the 3 seconds that all 4 tasks
spent concurrently waiting on the intercepted <code class="language-plaintext highlighter-rouge">open()</code>. Here’s the full
source if you want to try it out for yourself:</p>

<p><a href="https://gist.github.com/skeeto/89af673a0a0d24de32ad19ee505c8dbd">https://gist.github.com/skeeto/89af673a0a0d24de32ad19ee505c8dbd</a></p>

<h3 id="caveat-no-asynchronous-reads-and-writes">Caveat: no asynchronous reads and writes</h3>

<p><em>Only</em> opening and closing the file is asynchronous. Reads and writes are
unchanged, still fully synchronous and blocking, so this is only a half
solution. A full solution is not nearly as simple because asyncio is
async/await. Asynchronous reads and writes would require all new APIs
<a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">with different coloring</a>. You’d need an <code class="language-plaintext highlighter-rouge">aprint()</code> to complement
<code class="language-plaintext highlighter-rouge">print()</code>, and so on, each returning an <code class="language-plaintext highlighter-rouge">awaitable</code> to be awaited.</p>
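<p>For illustration, here’s what such a wrapper might look like. <code class="language-plaintext highlighter-rouge">aprint()</code> is a hypothetical name, not a real API; it just pushes the blocking <code class="language-plaintext highlighter-rouge">print()</code> into the same default thread pool:</p>

```python
import asyncio

async def aprint(*args, **kwargs):
    # Hypothetical differently-colored print(): run the blocking
    # print() call in the event loop's default thread pool.
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, lambda: print(*args, **kwargs))

asyncio.run(aprint('hello world'))  # prints: hello world
```
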

<p>This is one of the unfortunate downsides of async/await. I strongly
prefer conventional, preemptive concurrency, <em>but</em> we don’t always have
that luxury.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Conventions for Command Line Options</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/08/01/"/>
    <id>urn:uuid:9be2ce0e-298e-4085-8789-49674aecfeeb</id>
    <updated>2020-08-01T00:34:23Z</updated>
    <category term="tutorial"/><category term="posix"/><category term="c"/><category term="python"/><category term="go"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=24020952">on Hacker News</a> and critiqued <a href="https://utcc.utoronto.ca/~cks/space/blog/unix/MyOptionsConventions">on
Wandering Thoughts</a> (<a href="https://utcc.utoronto.ca/~cks/space/blog/unix/UnixOptionsConventions">2</a>, <a href="https://utcc.utoronto.ca/~cks/space/blog/python/ArgparseSomeUnixNotes">3</a>).</em></p>

<p>Command line interfaces have varied throughout their brief history but
have largely converged to some common, sound conventions. The core
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html">originates from unix</a>, and the Linux ecosystem extended it,
particularly via the GNU project. Unfortunately some tools initially
<em>appear</em> to follow the conventions, but subtly get them wrong, usually
for no practical benefit. I believe in many cases the authors simply
didn’t know any better, so I’d like to review the conventions.</p>

<!--more-->

<h3 id="short-options">Short Options</h3>

<p>The simplest case is the <em>short option</em> flag. An option is a hyphen —
specifically HYPHEN-MINUS U+002D — followed by one alphanumeric
character. Capital letters are acceptable. The letters themselves <a href="http://www.catb.org/~esr/writings/taoup/html/ch10s05.html">have
conventional meanings</a> and are worth following if possible.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b -c
</code></pre></div></div>

<p>Flags can be grouped together into one program argument. This is both
convenient and unambiguous. It’s also one of those often missed details
when programs use hand-coded argument parsers, and the lack of support
irritates me.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -abc
program -acb
</code></pre></div></div>

<p>The next simplest case is short options that take arguments. The
argument follows the option.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -i input.txt -o output.txt
</code></pre></div></div>

<p>The space is optional, so the option and argument can be packed together
into one program argument. Since the argument is required, this is still
unambiguous. This is another often-missed feature in hand-coded parsers.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -iinput.txt -ooutput.txt
</code></pre></div></div>

<p>This does not prohibit grouping. When grouped, the option accepting an
argument must be last.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -abco output.txt
program -abcooutput.txt
</code></pre></div></div>

<p>This technique is used to create another category, <em>optional option
arguments</em>. The option’s argument can be optional but still unambiguous
so long as the space is always omitted when the argument is present.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -c       # omitted
program -cblue   # provided
program -c blue  # omitted (blue is a new argument)

program -c -x   # two separate flags
program -c-x    # -c with argument "-x"
</code></pre></div></div>

<p>Optional option arguments should be used judiciously since they can be
surprising, but they have their uses.</p>

<p>Options can typically appear in any order — something parsers often
achieve via <em>permutation</em> — but non-options typically follow options.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b foo bar
program -b -a foo bar
</code></pre></div></div>

<p>GNU-style programs usually allow options and non-options to be mixed,
though I don’t consider this to be essential.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a foo -b bar
program foo -a -b bar
program foo bar -a -b
</code></pre></div></div>

<p>If a non-option looks like an option because it starts with a hyphen,
use <code class="language-plaintext highlighter-rouge">--</code> to demarcate options from non-options.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b -- -x foo bar
</code></pre></div></div>

<p>An advantage of requiring that non-options follow options is that the
first non-option demarcates the two groups, so <code class="language-plaintext highlighter-rouge">--</code> is less often
needed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># note: without argument permutation
program -a -b foo -x bar  # 2 options, 3 non-options
</code></pre></div></div>
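<p>Python’s standard <code class="language-plaintext highlighter-rouge">getopt</code> module implements these short option rules, including grouping and packed arguments, so it’s a convenient way to experiment with them:</p>

```python
import getopt

# 'abco:' declares flags -a, -b, -c, plus -o which requires an argument.
opts, args = getopt.getopt(['-abco', 'output.txt', 'foo'], 'abco:')
print(opts)  # [('-a', ''), ('-b', ''), ('-c', ''), ('-o', 'output.txt')]
print(args)  # ['foo']
```
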

<h3 id="long-options">Long options</h3>

<p>Since short options can be cryptic, and there are such a limited number
of them, more complex programs support long options. A long option
starts with two hyphens followed by one or more alphanumeric, lowercase
words. Hyphens separate words. Using two hyphens prevents long options
from being confused for grouped short options.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --reverse --ignore-backups
</code></pre></div></div>

<p>Occasionally flags are paired with a mutually exclusive inverse flag
that begins with <code class="language-plaintext highlighter-rouge">--no-</code>. This avoids a future <em>flag day</em> where the
default is changed in the release that also adds the flag implementing
the original behavior.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --sort
program --no-sort
</code></pre></div></div>

<p>Long options can similarly accept arguments.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --output output.txt --block-size 1024
</code></pre></div></div>

<p>These may optionally be connected to the argument with an equals sign
<code class="language-plaintext highlighter-rouge">=</code>, much like omitting the space for a short option argument.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --output=output.txt --block-size=1024
</code></pre></div></div>

<p>Like before, this opens the door to optional option arguments. Due
to the required <code class="language-plaintext highlighter-rouge">=</code> this is still unambiguous.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --color --reverse
program --color=never --reverse
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--</code> retains its original behavior of disambiguating option-like
non-option arguments:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --reverse -- --foo bar
</code></pre></div></div>
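<p>The <code class="language-plaintext highlighter-rouge">getopt</code> module handles the long option conventions, too: the <code class="language-plaintext highlighter-rouge">=</code> form and the <code class="language-plaintext highlighter-rouge">--</code> terminator behave as described above:</p>

```python
import getopt

# A trailing '=' in a long option name marks a required argument.
longopts = ['reverse', 'output=']
argv = ['--output=out.txt', '--reverse', '--', '--foo', 'bar']
opts, args = getopt.getopt(argv, '', longopts)
print(opts)  # [('--output', 'out.txt'), ('--reverse', '')]
print(args)  # ['--foo', 'bar']
```
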

<h3 id="subcommands">Subcommands</h3>

<p>Some programs, such as Git, have subcommands each with their own
options. The main program itself may still have its own options distinct
from subcommand options. The program’s options come before the
subcommand and subcommand options follow the subcommand. Options are
never permuted around the subcommand.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b -c subcommand -x -y -z
program -abc subcommand -xyz
</code></pre></div></div>

<p>Above, the <code class="language-plaintext highlighter-rouge">-a</code>, <code class="language-plaintext highlighter-rouge">-b</code>, and <code class="language-plaintext highlighter-rouge">-c</code> options are for <code class="language-plaintext highlighter-rouge">program</code>, and the
others are for <code class="language-plaintext highlighter-rouge">subcommand</code>. So, really, the subcommand is another
command line of its own.</p>
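<p>A sketch of how a parser might honor this split (the helper below is hypothetical): parse the program’s own options, take the first non-option as the subcommand, and leave everything after it for the subcommand’s parser. <code class="language-plaintext highlighter-rouge">getopt</code> works well here because it stops at the first non-option rather than permuting:</p>

```python
import getopt

def split_command_line(argv):
    # getopt stops at the first non-option, which we treat as the
    # subcommand; the remainder is the subcommand's own command line,
    # to be parsed separately with its own option set.
    opts, rest = getopt.getopt(argv, 'abc')
    if not rest:
        raise SystemExit('missing subcommand')
    return opts, rest[0], rest[1:]

opts, sub, subargs = split_command_line(['-a', '-b', 'commit', '-m', 'msg'])
print(opts)     # [('-a', ''), ('-b', '')]
print(sub)      # commit
print(subargs)  # ['-m', 'msg']
```
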

<h3 id="option-parsing-libraries">Option parsing libraries</h3>

<p>There’s little excuse for not getting these conventions right assuming
you’re interested in following the conventions. Short options can be
parsed correctly in <a href="https://github.com/skeeto/getopt">just ~60 lines of C code</a>. Long options are
<a href="https://github.com/skeeto/optparse">just slightly more complex</a>.</p>

<p>GNU’s <code class="language-plaintext highlighter-rouge">getopt_long()</code> supports long option abbreviation — with no way to
disable it (!) — but <a href="https://utcc.utoronto.ca/~cks/space/blog/python/ArgparseAbbreviatedOptions">this should be avoided</a>.</p>

<p>Go’s <a href="https://golang.org/pkg/flag/">flag package</a> intentionally deviates from the conventions.
It only supports long option semantics, via a single hyphen. This makes
it impossible to support grouping even if all options are only one
letter. Also, the only way to combine option and argument into a single
command line argument is with <code class="language-plaintext highlighter-rouge">=</code>. It’s sound, but I miss both features
every time I write programs in Go. That’s why I <a href="https://github.com/skeeto/optparse-go">wrote my own argument
parser</a>. Not only does it have a nicer feature set, I like the API a
lot more, too.</p>

<p>Python’s primary option parsing library is <code class="language-plaintext highlighter-rouge">argparse</code>, and I just can’t
stand it. Despite appearing to follow convention, it actually breaks
convention <em>and</em> its behavior is unsound. For instance, the following
program has two options, <code class="language-plaintext highlighter-rouge">--foo</code> and <code class="language-plaintext highlighter-rouge">--bar</code>. The <code class="language-plaintext highlighter-rouge">--foo</code> option accepts
an optional argument, and the <code class="language-plaintext highlighter-rouge">--bar</code> option is a simple flag.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="p">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
<span class="n">parser</span><span class="p">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">'--foo'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=</span><span class="s">'?'</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="s">'X'</span><span class="p">)</span>
<span class="n">parser</span><span class="p">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">'--bar'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s">'store_true'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">parser</span><span class="p">.</span><span class="n">parse_args</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">:]))</span>
</code></pre></div></div>

<p>Here are some example runs:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python parse.py
Namespace(bar=False, foo='X')

$ python parse.py --foo
Namespace(bar=False, foo=None)

$ python parse.py --foo=arg
Namespace(bar=False, foo='arg')

$ python parse.py --bar --foo
Namespace(bar=True, foo=None)

$ python parse.py --foo arg
Namespace(bar=False, foo='arg')
</code></pre></div></div>

<p>Everything looks good except the last. If the <code class="language-plaintext highlighter-rouge">--foo</code> argument is
optional then why did it consume <code class="language-plaintext highlighter-rouge">arg</code>? What happens if I follow it with
<code class="language-plaintext highlighter-rouge">--bar</code>? Will it consume it as the argument?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python parse.py --foo --bar
Namespace(bar=True, foo=None)
</code></pre></div></div>

<p>Nope! Unlike <code class="language-plaintext highlighter-rouge">arg</code>, it left <code class="language-plaintext highlighter-rouge">--bar</code> alone, so instead of following the
unambiguous conventions, it has its own ambiguous semantics and attempts
to remedy them with a “smart” heuristic: “If an optional argument <em>looks
like</em> an option, then it must be an option!” Non-option arguments can
never follow an option with an optional argument, which makes that
feature pretty useless. Since <code class="language-plaintext highlighter-rouge">argparse</code> does not properly support <code class="language-plaintext highlighter-rouge">--</code>,
that does not help.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python parse.py --foo -- arg
usage: parse.py [-h] [--foo [FOO]] [--bar]
parse.py: error: unrecognized arguments: -- arg
</code></pre></div></div>

<p>Please, stick to the conventions unless you have <em>really</em> good reasons
to break them!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Exactly-Once Initialization in Asynchronous Python</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/07/30/"/>
    <id>urn:uuid:c6796958-9178-47be-8411-8f48c2c85d83</id>
    <updated>2020-07-30T23:39:12Z</updated>
    <category term="python"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=24007354">on Hacker News</a>.</em></p>

<p>A common situation in <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a> Python programs is asynchronous
initialization. Some resource must be initialized exactly once before it
can be used, but the initialization itself is asynchronous — such as an
<a href="https://github.com/MagicStack/asyncpg">asyncpg</a> database. Let’s talk about a couple of solutions.</p>

<!--more-->

<p>The naive “solution” would be to track the initialization state in a
variable:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">initialized</span> <span class="o">=</span> <span class="bp">False</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">one_time_setup</span><span class="p">():</span>
    <span class="s">"Do not call more than once!"</span>
    <span class="p">...</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">maybe_initialize</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">initialized</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">initialized</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">one_time_setup</span><span class="p">()</span>
        <span class="n">initialized</span> <span class="o">=</span> <span class="bp">True</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">initialized</code> flag exists because we expect the function to be
called more than once. However, if it might be called from concurrent
tasks there’s a <em>race condition</em>. If the second caller arrives while the
first is awaiting <code class="language-plaintext highlighter-rouge">one_time_setup()</code>, the function will be called a
second time.</p>
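<p>The race is easy to demonstrate. In this self-contained sketch, two concurrent tasks call the naive <code class="language-plaintext highlighter-rouge">maybe_initialize()</code> while <code class="language-plaintext highlighter-rouge">one_time_setup()</code> is slow, and the setup runs twice:</p>

```python
import asyncio

initialized = False
calls = 0

async def one_time_setup():
    global calls
    calls += 1
    await asyncio.sleep(0.01)  # slow, asynchronous initialization

async def maybe_initialize():
    global initialized
    if not initialized:
        await one_time_setup()  # a second caller can arrive during this await
        initialized = True

async def main():
    await asyncio.gather(maybe_initialize(), maybe_initialize())
    print('one_time_setup() calls:', calls)  # prints: one_time_setup() calls: 2

asyncio.run(main())
```
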

<p>Switching the order of the call and the assignment won’t help:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">maybe_initialize</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">initialized</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">initialized</span><span class="p">:</span>
        <span class="n">initialized</span> <span class="o">=</span> <span class="bp">True</span>
        <span class="k">await</span> <span class="n">one_time_setup</span><span class="p">()</span>
</code></pre></div></div>

<p>Since asyncio is cooperative, the first caller doesn’t give up control
to other tasks until the <code class="language-plaintext highlighter-rouge">await</code>, meaning <code class="language-plaintext highlighter-rouge">one_time_setup()</code> will
never be called twice. However, the second caller may return before
<code class="language-plaintext highlighter-rouge">one_time_setup()</code> has completed. What we want is for <code class="language-plaintext highlighter-rouge">one_time_setup()</code>
to be called exactly once, but for no caller to return until that call
has completed.</p>

<h3 id="mutual-exclusion">Mutual exclusion</h3>

<p>My first thought was to use a <a href="https://docs.python.org/3/library/asyncio-sync.html#lock">mutex lock</a>. This will protect the
variable <em>and</em> prevent follow-up callers from progressing too soon. Tasks
arriving while <code class="language-plaintext highlighter-rouge">one_time_setup()</code> is still running will block on the
lock.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">initialized</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">initialized_lock</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">Lock</span><span class="p">()</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">maybe_initialize</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">initialized</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">initialized_lock</span><span class="p">:</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">initialized</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">one_time_setup</span><span class="p">()</span>
            <span class="n">initialized</span> <span class="o">=</span> <span class="bp">True</span>
</code></pre></div></div>

<p>Unfortunately this has a serious downside: <strong>asyncio locks are
associated with the <a href="https://docs.python.org/3/library/asyncio-eventloop.html">loop</a> where they were created</strong>. Since the
lock variable is global, <code class="language-plaintext highlighter-rouge">maybe_initialize()</code> can only be called from
the same loop that loaded the module. <code class="language-plaintext highlighter-rouge">asyncio.run()</code> creates a new loop
so it’s incompatible.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># create a loop: always an error
</span><span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">maybe_initialize</span><span class="p">())</span>

<span class="c1"># reuse the loop: maybe an error
</span><span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">get_event_loop</span><span class="p">()</span>
</span><span class="n">loop</span><span class="p">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">maybe_initialize</span><span class="p">())</span>
</code></pre></div></div>

<p>(IMHO, it was a mistake for the asyncio API to include explicit loop
objects. It’s a low-level concept that unavoidably leaks through most
high-level abstractions.)</p>

<p>A workaround is to create the lock lazily. Thank goodness creating a
lock isn’t itself asynchronous!</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">initialized</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">initialized_lock</span> <span class="o">=</span> <span class="bp">None</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">maybe_initialize</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">initialized</span><span class="p">,</span> <span class="n">initialized_lock</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">initialized_lock</span><span class="p">:</span>
        <span class="n">initialized_lock</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">Lock</span><span class="p">()</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">initialized_lock</span><span class="p">:</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">initialized</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">one_time_setup</span><span class="p">()</span>
            <span class="n">initialized</span> <span class="o">=</span> <span class="bp">True</span>
</code></pre></div></div>

<p>This is better, but <code class="language-plaintext highlighter-rouge">maybe_initialize()</code> can still only ever be called
from a single loop.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">maybe_initialize</span><span class="p">())</span> <span class="c1"># ok
</span><span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">maybe_initialize</span><span class="p">())</span> <span class="c1"># error!
</span></code></pre></div></div>

<h3 id="once">Once</h3>

<p>The pthreads API provides <a href="https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_once.html"><code class="language-plaintext highlighter-rouge">pthread_once</code></a> to solve this problem.
C++11 similarly has <a href="https://en.cppreference.com/w/cpp/thread/call_once"><code class="language-plaintext highlighter-rouge">std::call_once</code></a>. We can build something
similar using a future-like object.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">future</span> <span class="o">=</span> <span class="bp">None</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">maybe_initialize</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">future</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">future</span><span class="p">:</span>
        <span class="n">future</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">one_time_setup</span><span class="p">())</span>
    <span class="k">await</span> <span class="n">future</span>
</code></pre></div></div>

<p>Awaiting a coroutine more than once is an error, but <a href="https://docs.python.org/3/library/asyncio-task.html#task-object">tasks</a> are
future-like objects and can be awaited more than once. At least on
CPython, they can also be awaited in other loops! So not only is this
simpler, it also solves the loop problem!</p>
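<p>The difference is easy to demonstrate. This sketch (names are mine,
not from the text above) awaits the same bare coroutine twice, which
raises, then awaits the same task twice, which simply hands back the
cached result:</p>

```python
import asyncio

async def setup():
    await asyncio.sleep(0)
    return 42

async def coroutine_twice():
    coro = setup()
    await coro
    try:
        await coro  # a bare coroutine cannot be awaited again
    except RuntimeError as e:
        return str(e)

async def task_twice():
    task = asyncio.create_task(setup())
    a = await task  # runs the coroutine to completion
    b = await task  # a finished task just returns its stored result
    return a, b

print(asyncio.run(coroutine_twice()))
print(asyncio.run(task_twice()))  # (42, 42)
```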

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">maybe_initialize</span><span class="p">())</span> <span class="c1"># ok
</span><span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">maybe_initialize</span><span class="p">())</span> <span class="c1"># still ok
</span></code></pre></div></div>

<p>This can be tidied up nicely in a <code class="language-plaintext highlighter-rouge">@once</code> decorator:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">once</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
    <span class="n">future</span> <span class="o">=</span> <span class="bp">None</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">once_wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">nonlocal</span> <span class="n">future</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">future</span><span class="p">:</span>
            <span class="n">future</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">))</span>
        <span class="k">return</span> <span class="k">await</span> <span class="n">future</span>
    <span class="k">return</span> <span class="n">once_wrapper</span>
</code></pre></div></div>

<p>No more need for <code class="language-plaintext highlighter-rouge">maybe_initialize()</code>, just decorate the original
<code class="language-plaintext highlighter-rouge">one_time_setup()</code>:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">once</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">one_time_setup</span><span class="p">():</span>
    <span class="p">...</span>
</code></pre></div></div>
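<p>A quick check that the decorator behaves as intended — this repeats
the <code class="language-plaintext highlighter-rouge">@once</code> definition from above and applies it to a hypothetical
counting setup function, then hits it from ten concurrent callers:</p>

```python
import asyncio

def once(func):
    future = None
    async def once_wrapper(*args, **kwargs):
        nonlocal future
        if not future:
            future = asyncio.create_task(func(*args, **kwargs))
        return await future
    return once_wrapper

call_count = 0

@once
async def one_time_setup():
    global call_count
    call_count += 1
    await asyncio.sleep(0.01)  # simulated async initialization

async def main():
    # Ten concurrent callers: none return before setup completes,
    # and the body runs exactly once.
    await asyncio.gather(*[one_time_setup() for _ in range(10)])

asyncio.run(main())
print(call_count)  # 1
```

One caveat worth noting: only the first call’s arguments matter, since
later calls just await the task created by the first.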

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Latency in Asynchronous Python</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/05/24/"/>
    <id>urn:uuid:529e2382-d4ec-47a9-93a8-f450311e5a05</id>
    <updated>2020-05-24T02:44:50Z</updated>
    <category term="python"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p>This week I was debugging a misbehaving Python program that makes
significant use of <a href="https://docs.python.org/3/library/asyncio.html">Python’s asyncio</a>. The program would
eventually take very long periods of time to respond to network
requests. My first suspicion was a CPU-heavy coroutine hogging the
thread, preventing the socket coroutines from running, but an
inspection with <code class="language-plaintext highlighter-rouge">pdb</code> showed this wasn’t the case. Instead, the
program’s author had made a couple of fundamental mistakes using
asyncio. Let’s discuss them using small examples.</p>

<p>Setting the stage: There’s a heartbeat coroutine that “beats” once per
second. A real program would send out a packet as the heartbeat, but
here it just prints how late it was scheduled.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">heartbeat</span><span class="p">():</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
        <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">delay</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span> <span class="o">-</span> <span class="mi">1</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'heartbeat delay = </span><span class="si">{</span><span class="n">delay</span><span class="si">:</span><span class="p">.</span><span class="mi">3</span><span class="n">f</span><span class="si">}</span><span class="s">s'</span><span class="p">)</span>
</code></pre></div></div>

<p>Running this with <code class="language-plaintext highlighter-rouge">asyncio.run(heartbeat())</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>heartbeat delay = 0.001s
heartbeat delay = 0.001s
heartbeat delay = 0.001s
</code></pre></div></div>

<p>It’s consistently 1ms late, but good enough, especially considering
what’s to come. A program that <em>only</em> sends a heartbeat is pretty
useless, so a real program will be busy working on other things
concurrently. In this example, we have little 10ms payloads of work to
do, which are represented by this <code class="language-plaintext highlighter-rouge">process()</code> function:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">JOB_DURATION</span> <span class="o">=</span> <span class="mf">0.01</span>  <span class="c1"># 10ms
</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">process</span><span class="p">():</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">JOB_DURATION</span><span class="p">)</span> <span class="c1"># simulate CPU time
</span></code></pre></div></div>

<p>That’s a synchronous sleep because it’s standing in for actual CPU work.
Maybe it’s parsing JSON in a loop or crunching numbers in NumPy. Use
your imagination. During this 10ms no other coroutines can be scheduled
because this is, after all, still <a href="https://rachelbythebay.com/w/2020/03/07/costly/">just a single-threaded program</a>.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">JOB_COUNT</span> <span class="o">=</span> <span class="mi">200</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">heartbeat</span><span class="p">())</span>

    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">2.5</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="s">'begin processing'</span><span class="p">)</span>
    <span class="n">count</span> <span class="o">=</span> <span class="n">JOB_COUNT</span>
    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">JOB_COUNT</span><span class="p">):</span>
        <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">process</span><span class="p">())</span>

    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>

<p>This program starts the heartbeat coroutine in a task. A coroutine
doesn’t make progress unless someone is waiting on it, and that someone
can be a task. So it will continue along independently without
prodding.</p>

<p>The arbitrary 2.5 second sleep simulates waiting, say, for a network
request. In the output we’ll see the heartbeat tick a couple of times,
then it will create and process 200 jobs concurrently. In a real program
we’d have some way to collect the results, but we can ignore that part
for now. They’re <em>only</em> 10ms, so the effect on the heartbeat should be
pretty small, right?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>heartbeat delay = 0.001s
heartbeat delay = 0.001s
begin processing
heartbeat delay = 1.534s
heartbeat delay = 0.001s
heartbeat delay = 0.001s
</code></pre></div></div>

<p>The heartbeat was delayed for 1.5 seconds by a mere 200 tasks doing
only 10ms of work each. What happened?</p>

<p>Python calls the object that schedules tasks a <em>loop</em>, and this is no
coincidence. Everything to be scheduled gets put into a loop and is
scheduled round robin, one after another. The 200 tasks got scheduled
ahead of the heartbeat, and so it doesn’t get scheduled again until each
of those tasks either yields (<code class="language-plaintext highlighter-rouge">await</code>) or completes.</p>
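<p>The round-robin, first-in-first-out behavior is easy to observe in
isolation. In this small sketch (my own example, not from the
misbehaving program), five tasks are created and then the creator
yields with <code class="language-plaintext highlighter-rouge">sleep(0)</code>, landing at the back of the line behind all of
them:</p>

```python
import asyncio

order = []

async def job(i):
    # Completes without ever awaiting, so it finishes in one slot.
    order.append(i)

async def main():
    for i in range(5):
        asyncio.create_task(job(i))
    # sleep(0) reschedules main behind the five tasks just queued.
    await asyncio.sleep(0)

asyncio.run(main())
print(order)  # [0, 1, 2, 3, 4]: tasks run in creation order
```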

<p>It really didn’t take much to significantly hamper the heartbeat, and,
with a <a href="/blog/2019/02/24/">dumb bytecode compiler</a>, 10ms may not be much work at all.
The lesson here is to avoid spawning many tasks if latency is an
important consideration.</p>

<h3 id="a-semaphore-is-not-the-answer">A semaphore is not the answer</h3>

<p>My first idea at a solution: What if we used a semaphore to limit the
number of “active” tasks at a time? Then perhaps the heartbeat wouldn’t
have to compete with so many other tasks for time.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WORKER_COUNT</span> <span class="o">=</span> <span class="mi">4</span>  <span class="c1"># max "active" jobs at a time
</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">main_with_semaphore</span><span class="p">():</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">heartbeat</span><span class="p">())</span>

    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">2.5</span><span class="p">)</span>

    <span class="n">sem</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">Semaphore</span><span class="p">(</span><span class="n">WORKER_COUNT</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">process</span><span class="p">():</span>
        <span class="k">await</span> <span class="n">sem</span><span class="p">.</span><span class="n">acquire</span><span class="p">()</span>
        <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">JOB_DURATION</span><span class="p">)</span>
        <span class="n">sem</span><span class="p">.</span><span class="n">release</span><span class="p">()</span>

    <span class="k">print</span><span class="p">(</span><span class="s">'begin processing'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">JOB_COUNT</span><span class="p">):</span>
        <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">process</span><span class="p">())</span>

    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>

<p>When the heartbeat sleep completes, about half the jobs will be complete
and the other half blocked on the semaphore. So perhaps the heartbeat
gets to skip ahead of all the blocked tasks since they’re not yet ready
to run?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>heartbeat delay = 0.001s
heartbeat delay = 0.001s
begin processing
heartbeat delay = 1.537s
heartbeat delay = 0.001s
heartbeat delay = 0.001s
</code></pre></div></div>

<p>It made no difference whatsoever because the tasks each “held their
place” in line in the loop! Even reducing <code class="language-plaintext highlighter-rouge">WORKER_COUNT</code> to 1 would have
no effect. As soon as a task completes, it frees the task waiting next
in line. The semaphore does practically nothing here.</p>

<h3 id="solving-it-with-a-job-queue">Solving it with a job queue</h3>

<p>Here’s what does work: a <a href="https://docs.python.org/3/library/asyncio-queue.html">job queue</a>. Create a queue to be populated
with coroutines (not tasks), and have a small number of tasks run jobs
from the queue. Since this is a real solution, I’ve made this example
more complete.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">main_with_queue</span><span class="p">():</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">heartbeat</span><span class="p">())</span>

    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">2.5</span><span class="p">)</span>

    <span class="n">queue</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">Queue</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">def</span> <span class="nf">worker</span><span class="p">():</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="n">coro</span> <span class="o">=</span> <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">get</span><span class="p">()</span>
            <span class="k">await</span> <span class="n">coro</span>  <span class="c1"># consider using try/except
</span>            <span class="n">queue</span><span class="p">.</span><span class="n">task_done</span><span class="p">()</span>
    <span class="n">workers</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">worker</span><span class="p">())</span>
                   <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WORKER_COUNT</span><span class="p">)]</span>

    <span class="k">print</span><span class="p">(</span><span class="s">'begin processing'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">JOB_COUNT</span><span class="p">):</span>
        <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">process</span><span class="p">())</span>
    <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">join</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'end processing'</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">workers</span><span class="p">:</span>
        <span class="n">w</span><span class="p">.</span><span class="n">cancel</span><span class="p">()</span>

    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">task_done()</code> and <code class="language-plaintext highlighter-rouge">join()</code> methods make it trivial to
synchronize on full job completion. I also take the time to destroy the
worker tasks. It’s harmless to leave them blocked on the queue. They’ll
be garbage collected, so it’s not a resource leak. However, CPython
complains about garbage collecting running tasks because it looks like a
mistake — and it usually is.</p>

<p>If you read carefully you might have noticed the queue’s maximum size is
set to 1: not much of a “queue”! <a href="https://golang.org/">Go</a> developers will recognize this
as being (nearly) an <em>unbuffered channel</em>, the default and most common
kind of channel. So it’s more a synchronized rendezvous between producer
(<code class="language-plaintext highlighter-rouge">put()</code>) and consumer (<code class="language-plaintext highlighter-rouge">get()</code>). The producer waits at the queue with a
job until a task is free to come take it. A task waits at the queue
until a producer arrives with a job for it.</p>
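<p>The rendezvous is visible in a stripped-down sketch (names are
illustrative): with <code class="language-plaintext highlighter-rouge">maxsize=1</code>, the producer’s <code class="language-plaintext highlighter-rouge">put()</code> blocks until
the consumer has drained the previous item, so the two proceed in near
lock step:</p>

```python
import asyncio

async def demo():
    queue = asyncio.Queue(maxsize=1)
    events = []

    async def producer():
        for i in range(3):
            await queue.put(i)        # blocks while the queue is full
            events.append(f'put {i}')

    async def consumer():
        for _ in range(3):
            i = await queue.get()
            events.append(f'got {i}')
            queue.task_done()

    await asyncio.gather(producer(), consumer())
    return events

print(asyncio.run(demo()))
```

Each <code class="language-plaintext highlighter-rouge">put</code> after the first cannot complete until the matching
<code class="language-plaintext highlighter-rouge">get</code> has made room.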

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>heartbeat delay = 0.001s
heartbeat delay = 0.001s
begin processing
heartbeat delay = 0.014s
heartbeat delay = 0.020s
end processing
heartbeat delay = 0.002s
heartbeat delay = 0.001s
</code></pre></div></div>

<p>The output shows that the impact to the heartbeat was modest — about
the best we could hope for from async/await — and the heartbeat
continued while jobs were running. The more concurrency — the more
worker tasks running on the queue — the greater the latency.</p>

<p>Note: Increasing the <code class="language-plaintext highlighter-rouge">WORKER_COUNT</code> in this toy example won’t have an
impact on latency since the jobs aren’t actually concurrent. They start,
run, and complete before another worker task can draw from the queue.
Putting a couple awaits in <code class="language-plaintext highlighter-rouge">process()</code> allows for concurrency:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WORKER_COUNT</span> <span class="o">=</span> <span class="mi">200</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">process</span><span class="p">():</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.01</span><span class="p">)</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">JOB_DURATION</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.01</span><span class="p">)</span>
</code></pre></div></div>

<p>Since there are so many worker tasks, this is back to the initial
problem:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>heartbeat delay = 0.001s
heartbeat delay = 0.001s
begin processing
heartbeat delay = 1.655s
end processing
heartbeat delay = 0.001s
heartbeat delay = 0.001s
</code></pre></div></div>

<p>As <code class="language-plaintext highlighter-rouge">WORKER_COUNT</code> decreases, so does heartbeat latency.</p>

<h3 id="unbounded-queues">Unbounded queues</h3>

<p>Here’s another defect from the same program. Create an unbounded queue,
a producer, and a consumer. The consumer prints the queue size so we can
see what’s happening:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">producer_consumer</span><span class="p">():</span>
    <span class="n">queue</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">Queue</span><span class="p">()</span>
    <span class="n">done</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">Condition</span><span class="p">()</span>

    <span class="k">async</span> <span class="k">def</span> <span class="nf">producer</span><span class="p">():</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100_000</span><span class="p">):</span>
            <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">join</span><span class="p">()</span>
        <span class="k">async</span> <span class="k">with</span> <span class="n">done</span><span class="p">:</span>
            <span class="n">done</span><span class="p">.</span><span class="n">notify</span><span class="p">()</span>

    <span class="k">async</span> <span class="k">def</span> <span class="nf">consumer</span><span class="p">():</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">get</span><span class="p">()</span>
            <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'qsize = </span><span class="si">{</span><span class="n">queue</span><span class="p">.</span><span class="n">qsize</span><span class="p">()</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
            <span class="n">queue</span><span class="p">.</span><span class="n">task_done</span><span class="p">()</span>

    <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">producer</span><span class="p">())</span>
    <span class="n">asyncio</span><span class="p">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">consumer</span><span class="p">())</span>

    <span class="k">async</span> <span class="k">with</span> <span class="n">done</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">done</span><span class="p">.</span><span class="n">wait</span><span class="p">()</span>
</code></pre></div></div>

<p>The output of this program begins:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qsize = 99999
qsize = 99998
qsize = 99997
qsize = 99996
...
</code></pre></div></div>

<p>So the entire queue is populated before the consumer does anything at
all: tons of latency for whatever is being consumed. Since the queue is
unbounded, the producer never needs to yield. You might be tempted to
use <code class="language-plaintext highlighter-rouge">asyncio.sleep(0)</code> in the producer to yield explicitly:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">async</span> <span class="k">def</span> <span class="nf">producer</span><span class="p">():</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100_000</span><span class="p">):</span>
            <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># yield
</span>        <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">join</span><span class="p">()</span>
        <span class="k">async</span> <span class="k">with</span> <span class="n">done</span><span class="p">:</span>
            <span class="n">done</span><span class="p">.</span><span class="n">notify</span><span class="p">()</span>
</code></pre></div></div>

<p>This even seems to work! The output looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qsize = 0
qsize = 0
qsize = 0
qsize = 0
</code></pre></div></div>

<p>However, this is fragile and not a real solution. If the consumer yields
just two times in its own loop, it’s nearly back to where we started:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">async</span> <span class="k">def</span> <span class="nf">consumer</span><span class="p">():</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="n">get</span><span class="p">()</span>
            <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'qsize = </span><span class="si">{</span><span class="n">queue</span><span class="p">.</span><span class="n">qsize</span><span class="p">()</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
            <span class="n">queue</span><span class="p">.</span><span class="n">task_done</span><span class="p">()</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>

<p>The output shows that the producer gradually creeps ahead of the
consumer. On each consumer iteration, the producer iterates twice:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qsize = 0
qsize = 1
qsize = 2
qsize = 3
...
</code></pre></div></div>

<p>There’s a really simple solution to this: <a href="https://lucumr.pocoo.org/2020/1/1/async-pressure/">Never, ever use unbounded
queues.</a> In fact <strong>every unbounded <code class="language-plaintext highlighter-rouge">asyncio.Queue()</code> is a bug</strong>.
It’s a serious API defect that asyncio allows unbounded queues to be
created at all. The default <code class="language-plaintext highlighter-rouge">maxsize</code> should have been <em>actually</em> zero
(unbuffered), not infinite. Because unbounded is the default, virtually
every example of <code class="language-plaintext highlighter-rouge">asyncio.Queue</code> — online, offline, and even the
official documentation — is broken in some way.</p>
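<p>To make the fix concrete, here is a minimal sketch in plain Python (the item count and the <code class="language-plaintext highlighter-rouge">None</code> sentinel are illustrative, not from the program above): with <code class="language-plaintext highlighter-rouge">maxsize=1</code>, <code class="language-plaintext highlighter-rouge">put()</code> suspends until the consumer catches up, so the producer can never run ahead and latency stays bounded.</p>

```python
import asyncio

# Sketch of the bounded fix: maxsize=1 makes put() apply backpressure,
# so the producer suspends whenever it gets ahead of the consumer.
async def main():
    queue = asyncio.Queue(maxsize=1)  # bounded: put() now awaits
    sizes = []

    async def producer():
        for i in range(100):
            await queue.put(i)  # suspends until the consumer makes room
        await queue.put(None)   # illustrative sentinel: stop the consumer

    async def consumer():
        while True:
            item = await queue.get()
            if item is None:
                return
            sizes.append(queue.qsize())

    await asyncio.gather(producer(), consumer())
    return sizes

sizes = asyncio.run(main())
```

<p>Running this, every observed <code class="language-plaintext highlighter-rouge">qsize</code> stays at most 1, no matter how either task yields.</p>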

<h3 id="important-takeaways">Important takeaways</h3>

<ol>
  <li>The default <code class="language-plaintext highlighter-rouge">asyncio.Queue()</code> is <em>always</em> wrong.</li>
  <li><code class="language-plaintext highlighter-rouge">asyncio.sleep(0)</code> is <em>nearly always</em> used incorrectly.</li>
  <li>Use a <code class="language-plaintext highlighter-rouge">maxsize=1</code> job queue instead of spawning many identical tasks.</li>
</ol>

<p>Python linters should be updated to warn about 1 and 2 by default.</p>

<p>Update: A couple of people have pointed out <a href="https://trio.readthedocs.io/en/stable/reference-core.html#buffering-in-channels">an argument in the Trio
documentation for unbounded queues</a>. This argument conflates two
different concepts: data structure queues and concurrent communication
infrastructure queues. To distinguish, the latter is often called a
channel. An unbounded <em>queue</em> (<code class="language-plaintext highlighter-rouge">collections.deque</code>) is necessary, but
an unbounded <em>channel</em> (<code class="language-plaintext highlighter-rouge">asyncio.Queue</code>) is always wrong. The Trio
documentation describes a web crawler, which is fundamentally a
breadth-first search (read: queue-oriented) of a graph. So this is a
plain old BFS queue, not a channel, which is why it’s reasonable for it
to be unbounded.</p>
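<p>A tiny sketch of that distinction (the graph here is mine, purely for illustration): a breadth-first-search frontier is a plain data-structure queue, private to one task. Nothing communicates through it, so it is fine for it to grow without bound.</p>

```python
from collections import deque

# A BFS frontier is an unbounded *queue*, not a channel: it lives
# inside one task and no concurrent peer is pushing into it.
def bfs(start, neighbors):
    seen = {start}
    frontier = deque([start])  # unbounded, and that's fine here
    order = []
    while frontier:
        node = frontier.popleft()
        order.append(node)
        for n in neighbors(node):
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return order
```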

]]>
    </content>
  </entry>
  <entry>
    <title>Endlessh: an SSH Tarpit</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/03/22/"/>
    <id>urn:uuid:5429ee15-3d42-4af2-8690-f7f402870dd0</id>
    <updated>2019-03-22T17:26:45Z</updated>
    <category term="netsec"/><category term="python"/><category term="c"/><category term="posix"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=19465967">on Hacker News</a> (<a href="https://news.ycombinator.com/item?id=24491453">later</a>), <a href="https://old.reddit.com/r/programming/comments/b4iq00/endlessh_an_ssh_tarpit/">on
reddit</a> (<a href="https://old.reddit.com/r/netsec/comments/b4dwjl/endlessh_an_ssh_tarpit/">also</a>), featured in <a href="https://www.youtube.com/watch?v=bM65iyRRW0A&amp;t=3m52s">BSD Now 294</a>.
Also check out <a href="https://github.com/bediger4000/ssh-tarpit-behavior">this Endlessh analysis</a>.</em></p>

<p>I’m a big fan of tarpits: a network service that intentionally inserts
delays in its protocol, slowing down clients by forcing them to wait.
This arrests the speed at which a bad actor can attack or probe the
host system, and it ties up some of the attacker’s resources that
might otherwise be spent attacking another host. When done well, a
tarpit imposes more cost on the attacker than the defender.</p>

<!--more-->

<p>The Internet is a very hostile place, and anyone who’s ever stood up
an Internet-facing IPv4 host has witnessed the immediate and
continuous attacks against their server. I’ve maintained <a href="/blog/2017/06/15/">such a
server</a> for nearly six years now, and more than 99% of my
incoming traffic has ill intent. One part of my defenses has been
tarpits in various forms. The latest addition is an SSH tarpit I wrote
a couple of months ago:</p>

<p><a href="https://github.com/skeeto/endlessh"><strong>Endlessh: an SSH tarpit</strong></a></p>

<p>This program opens a socket and pretends to be an SSH server. However,
it actually just ties up SSH clients with false promises indefinitely
— or at least until the client eventually gives up. After cloning the
repository, here’s how you can try it out for yourself (default port
2222):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make
$ ./endlessh &amp;
$ ssh -p2222 localhost
</code></pre></div></div>

<p>Your SSH client will hang there and wait for at least several days
before finally giving up. Like a mammoth in the La Brea Tar Pits, it
got itself stuck and can’t get itself out. As I write, my
Internet-facing SSH tarpit currently has 27 clients trapped in it. A
few of these have been connected for weeks. In one particular spike it
had 1,378 clients trapped at once, lasting about 20 hours.</p>

<p>My Internet-facing Endlessh server listens on port 22, which is the
standard SSH port. I long ago moved my real SSH server off to another
port where it sees a whole lot less SSH traffic — essentially none.
This makes the logs a whole lot more manageable. And (hopefully)
Endlessh convinces attackers not to look around for an SSH server on
another port.</p>

<p>How does it work? Endlessh exploits <a href="https://tools.ietf.org/html/rfc4253#section-4.2">a little paragraph in RFC
4253</a>, the SSH protocol specification. Immediately after the TCP
connection is established, and before negotiating the cryptography,
both ends send an identification string:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SSH-protoversion-softwareversion SP comments CR LF
</code></pre></div></div>

<p>The RFC also notes:</p>

<blockquote>
  <p>The server MAY send other lines of data before sending the version
string.</p>
</blockquote>

<p>There is no limit on the number of lines, just that these lines must
not begin with “SSH-” since that would be ambiguous with the
identification string, and lines must not be longer than 255
characters including CRLF. So <strong>Endlessh sends an <em>endless</em> stream of
randomly-generated “other lines of data”</strong> without ever intending to
send a version string. By default it waits 10 seconds between each
line. This slows down the protocol, but prevents it from actually
timing out.</p>
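<p>A hypothetical helper (not Endlessh’s actual generator) shows how easy it is to satisfy the RFC 4253 constraints: a short line of hex digits can never spell out “SSH-” and stays well under the 255-byte limit, CRLF included.</p>

```python
import random

# Generate one bogus pre-banner line: must not start with "SSH-" and
# must be at most 255 bytes including the trailing CRLF (RFC 4253).
def bogus_banner_line():
    line = b'%x\r\n' % random.getrandbits(32)  # hex digits can't spell "SSH-"
    assert not line.startswith(b'SSH-') and len(line) <= 255
    return line
```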

<p>This means Endlessh need not know anything about cryptography or the
vast majority of the SSH protocol. It’s dead simple.</p>

<h3 id="implementation-strategies">Implementation strategies</h3>

<p>Ideally the tarpit’s resource footprint should be as small as
possible. It’s just a security tool, and the server does have an
actual purpose that doesn’t include being a tarpit. It should tie up
the attacker’s resources, not the server’s, and should generally be
unnoticeable. (Take note all those who write the awful “security”
products I have to tolerate at my day job.)</p>

<p>Even when many clients have been trapped, Endlessh spends more than
99.999% of its time waiting around, doing nothing. It wouldn’t even be
accurate to call it I/O-bound. If anything, it’s <em>timer-bound</em>,
waiting around before sending off the next line of data. <strong>The most
precious resource to conserve is <em>memory</em>.</strong></p>

<h4 id="processes">Processes</h4>

<p>The most straightforward way to implement something like Endlessh is a
fork server: accept a connection, fork, and the child simply alternates
between <code class="language-plaintext highlighter-rouge">sleep(3)</code> and <code class="language-plaintext highlighter-rouge">write(2)</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
    <span class="kt">ssize_t</span> <span class="n">r</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">line</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>

    <span class="n">sleep</span><span class="p">(</span><span class="n">DELAY</span><span class="p">);</span>
    <span class="n">generate_line</span><span class="p">(</span><span class="n">line</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">line</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">line</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">!=</span> <span class="n">EINTR</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A process per connection is a lot of overhead when connections are
expected to be up hours or even weeks at a time. An attacker who knows
about this could exhaust the server’s resources with little effort by
opening up lots of connections.</p>

<h4 id="threads">Threads</h4>

<p>A better option is, instead of processes, to create a thread per
connection. On Linux <a href="/blog/2015/05/15/">this is practically the same thing</a>, but it’s
still better. However, you still have to allocate a stack for the thread
and the kernel will have to spend some resources managing the thread.</p>
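<p>A hypothetical Python sketch of the thread-per-connection strategy (the delay and line contents are placeholders, and listener setup is omitted): each accepted client gets a thread that alternates between sleeping and writing one line, mirroring the fork-server loop above.</p>

```python
import random
import socket
import threading
import time

# One thread per trapped client: sleep, write a bogus line, repeat,
# until the client finally hangs up and the write fails.
def trap(conn, delay=10.0):
    try:
        while True:
            time.sleep(delay)
            conn.sendall(b'%x\r\n' % random.getrandbits(32))
    except OSError:
        pass  # client gave up
    finally:
        conn.close()

def serve(listener, delay=10.0):
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=trap, args=(conn, delay), daemon=True).start()
```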

<h4 id="poll">Poll</h4>

<p>For Endlessh I went for an even more lightweight version: a
single-threaded <code class="language-plaintext highlighter-rouge">poll(2)</code> server, analogous to stackless green threads.
The overhead per connection is about as low as it gets.</p>

<p>Clients that are being delayed are not registered in <code class="language-plaintext highlighter-rouge">poll(2)</code>. Their
only overhead is the socket object in the kernel, and another 78 bytes
to track them in Endlessh. Most of those bytes are used only for
accurate logging. Only those clients that are overdue for a new line
are registered for <code class="language-plaintext highlighter-rouge">poll(2)</code>.</p>

<p>When clients are waiting, but no clients are overdue, <code class="language-plaintext highlighter-rouge">poll(2)</code> is
essentially used in place of <code class="language-plaintext highlighter-rouge">sleep(3)</code>. Though since it still needs
to manage the <em>accept</em> server socket, it (almost) never actually waits
on <em>nothing</em>.</p>
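<p>A small sketch of that timeout computation (the data layout here is hypothetical, not Endlessh’s actual structures): with a next-line deadline per waiting client, the <code class="language-plaintext highlighter-rouge">poll(2)</code> timeout is simply the time until the soonest deadline, so the server sleeps inside <code class="language-plaintext highlighter-rouge">poll(2)</code> rather than <code class="language-plaintext highlighter-rouge">sleep(3)</code>.</p>

```python
# Given each waiting client's next-send deadline (seconds), compute the
# poll(2) timeout in milliseconds: time until the soonest deadline,
# clamped at zero for clients that are already overdue.
def poll_timeout_ms(deadlines, now):
    if not deadlines:
        return -1  # no one waiting: block on the accept socket alone
    return max(0, int((min(deadlines) - now) * 1000))
```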

<p>There’s an option to limit the total number of client connections so
that it doesn’t get out of hand. In this case it will stop polling the
accept socket until a client disconnects. I probably shouldn’t have
bothered with this option and instead relied on <code class="language-plaintext highlighter-rouge">ulimit</code>, a feature
already provided by the operating system.</p>

<p>I could have used epoll (Linux) or kqueue (BSD), which would be much
more efficient than <code class="language-plaintext highlighter-rouge">poll(2)</code>. The problem with <code class="language-plaintext highlighter-rouge">poll(2)</code> is that it’s
constantly registering and unregistering Endlessh on each of the
overdue sockets each time around the main loop. This is by far the
most CPU-intensive part of Endlessh, and it’s all inflicted on the
kernel. Most of the time, even with thousands of clients trapped in
the tarpit, only a small number of them are polled at once, so I opted
for better portability instead.</p>

<p>One consequence of not polling connections that are waiting is that
disconnections aren’t noticed in a timely fashion. This makes the logs
less accurate than I like, but otherwise it’s pretty harmless.
Unfortunately even if I wanted to fix this, the <code class="language-plaintext highlighter-rouge">poll(2)</code> interface
isn’t quite equipped for it anyway.</p>

<h4 id="raw-sockets">Raw sockets</h4>

<p>With a <code class="language-plaintext highlighter-rouge">poll(2)</code> server, the biggest overhead remaining is in the
kernel, where it allocates send and receive buffers for each client
and manages the proper TCP state. The next step to reducing this
overhead is Endlessh opening a <em>raw socket</em> and speaking TCP itself,
bypassing most of the operating system’s TCP/IP stack.</p>

<p>Much of the TCP connection state doesn’t matter to Endlessh and doesn’t
need to be tracked. For example, it doesn’t care about any data sent by
the client, so no receive buffer is needed, and any data that arrives
could be dropped on the floor.</p>

<p>Even more, raw sockets would allow for some even nastier tarpit tricks.
Despite the long delays between data lines, the kernel itself responds
very quickly on the TCP layer and below. ACKs are sent back quickly and
so on. An astute attacker could detect that the delay is artificial,
imposed above the TCP layer by an application.</p>

<p>If Endlessh worked at the TCP layer, it could <a href="https://nyman.re/super-simple-ssh-tarpit/">tarpit the TCP protocol
itself</a>. It could introduce artificial “noise” to the connection
that requires packet retransmissions, delay ACKs, etc. It would look a
lot more like network problems than a tarpit.</p>

<p>I haven’t taken Endlessh this far, nor do I plan to do so. At the
moment attackers either have a hard timeout, so this wouldn’t matter,
or they’re pretty dumb and Endlessh already works well enough.</p>

<h3 id="asyncio-and-other-tarpits">asyncio and other tarpits</h3>

<p>Since writing Endlessh <a href="/blog/2019/03/10/">I’ve learned about Python’s <code class="language-plaintext highlighter-rouge">asyncio</code></a>, and
it’s actually a near perfect fit for this problem. I should have just
used it in the first place. The hard part is already implemented within
<code class="language-plaintext highlighter-rouge">asyncio</code>, and the problem isn’t CPU-bound, so being written in Python
<a href="/blog/2019/02/24/">doesn’t matter</a>.</p>

<p>Here’s a simplified (no logging, no configuration, etc.) version of
Endlessh implemented in about 20 lines of Python 3.7:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">import</span> <span class="nn">random</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">_reader</span><span class="p">,</span> <span class="n">writer</span><span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
            <span class="n">writer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s">'%x</span><span class="se">\r\n</span><span class="s">'</span> <span class="o">%</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">**</span><span class="mi">32</span><span class="p">))</span>
            <span class="k">await</span> <span class="n">writer</span><span class="p">.</span><span class="n">drain</span><span class="p">()</span>
    <span class="k">except</span> <span class="nb">ConnectionResetError</span><span class="p">:</span>
        <span class="k">pass</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">server</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">start_server</span><span class="p">(</span><span class="n">handler</span><span class="p">,</span> <span class="s">'0.0.0.0'</span><span class="p">,</span> <span class="mi">2222</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">server</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">server</span><span class="p">.</span><span class="n">serve_forever</span><span class="p">()</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>Since Python coroutines are stackless, the per-connection memory
overhead is comparable to the C version. So it seems asyncio is
perfectly suited for writing tarpits! Here’s an HTTP tarpit to trip up
attackers trying to exploit HTTP servers. It slowly sends a random,
endless HTTP header:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">import</span> <span class="nn">random</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">_reader</span><span class="p">,</span> <span class="n">writer</span><span class="p">):</span>
    <span class="n">writer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s">'HTTP/1.1 200 OK</span><span class="se">\r\n</span><span class="s">'</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
            <span class="n">header</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">**</span><span class="mi">32</span><span class="p">)</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">**</span><span class="mi">32</span><span class="p">)</span>
            <span class="n">writer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s">'X-%x: %x</span><span class="se">\r\n</span><span class="s">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">header</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span>
            <span class="k">await</span> <span class="n">writer</span><span class="p">.</span><span class="n">drain</span><span class="p">()</span>
    <span class="k">except</span> <span class="nb">ConnectionResetError</span><span class="p">:</span>
        <span class="k">pass</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">server</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">start_server</span><span class="p">(</span><span class="n">handler</span><span class="p">,</span> <span class="s">'0.0.0.0'</span><span class="p">,</span> <span class="mi">8080</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">server</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">server</span><span class="p">.</span><span class="n">serve_forever</span><span class="p">()</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>Try it out for yourself. Firefox and Chrome will spin on that server
for hours before giving up. I have yet to see curl actually time out on
its own with its default settings (<code class="language-plaintext highlighter-rouge">--max-time</code>/<code class="language-plaintext highlighter-rouge">-m</code> does work
correctly, though).</p>

<p>Parting exercise for the reader: Using the examples above as a starting
point, implement an SMTP tarpit using asyncio. Bonus points for using
TLS connections and testing it against real spammers.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>An Async / Await Library for Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/03/10/"/>
    <id>urn:uuid:5d1462fa-a30d-432e-9a4f-827eb67862b2</id>
    <updated>2019-03-10T20:57:03Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="python"/><category term="javascript"/><category term="lang"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p>As part of <a href="/blog/2019/02/24/">building my Python proficiency</a>, I’ve learned how to
use <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a>. This new language feature <a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-492">first appeared in
Python 3.5</a> (<a href="https://www.python.org/dev/peps/pep-0492/">PEP 492</a>, September 2015). JavaScript grew <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function">a
nearly identical feature</a> in ES2017 (June 2017). An async function
can pause to await on an asynchronously computed result, much like a
generator pausing when it yields a value.</p>

<p>In fact, both Python and JavaScript async functions are essentially just
fancy generator functions with some specialized syntax and semantics.
That is, they’re <a href="https://blog.varunramesh.net/posts/stackless-vs-stackful-coroutines/">stackless coroutines</a>. Both languages already had
generators, so their generator-like async functions are a natural
extension that — unlike <a href="/blog/2017/06/21/"><em>stackful</em> coroutines</a> — do not require
significant, new runtime plumbing.</p>
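<p>A toy Python illustration of just how generator-like async functions are (this is mine, not from <code class="language-plaintext highlighter-rouge">aio</code>): a coroutine object can be driven by hand with <code class="language-plaintext highlighter-rouge">send()</code>, exactly the way an event loop drives it, and its final value rides out on <code class="language-plaintext highlighter-rouge">StopIteration</code>.</p>

```python
# Drive a coroutine manually, the way an event loop would.
async def add_later(a, b):
    return a + b

coro = add_later(1, 2)
try:
    coro.send(None)          # start it; with nothing to await, it runs to the end
except StopIteration as stop:
    result = stop.value      # the return value is delivered via the exception
```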

<p>Emacs <a href="/blog/2018/05/31/">officially got generators in 25.1</a> (September 2016),
though, unlike Python and JavaScript, it didn’t require any additional
support from the compiler or runtime. It’s implemented entirely using
Lisp macros. In other words, it’s just another library, not a core
language feature. In theory, the generator library could be easily
backported to the first Emacs release to <a href="/blog/2016/12/22/">properly support lexical
closures</a>, Emacs 24.1 (June 2012).</p>

<p>For the same reason, stackless async/await coroutines can also be
implemented as a library. So that’s what I did, letting Emacs’ generator
library do most of the heavy lifting. The package is called <code class="language-plaintext highlighter-rouge">aio</code>:</p>

<ul>
  <li><strong><a href="https://github.com/skeeto/emacs-aio">https://github.com/skeeto/emacs-aio</a></strong></li>
</ul>

<p>It’s modeled more closely on JavaScript’s async functions than Python’s
asyncio, with the core representation being <em>promises</em> rather than a
coroutine objects. I just have an easier time reasoning about promises
than coroutines.</p>

<p>I’m definitely <a href="https://github.com/chuntaro/emacs-async-await">not the first person to realize this was
possible</a>, and was beaten to the punch by two years. Wanting to
<a href="http://www.winestockwebdesign.com/Essays/Lisp_Curse.html">avoid fragmentation</a>, I set aside all formality in my first
iteration on the idea, not even bothering with namespacing my
identifiers. It was to be only an educational exercise. However, I got
quite attached to my little toy. Once I got my head wrapped around the
problem, everything just sort of clicked into place so nicely.</p>

<p>In this article I will show step-by-step one way to build async/await
on top of generators, laying out one concept at a time and then
building upon each. But first, some examples to illustrate the desired
final result.</p>

<h3 id="aio-example">aio example</h3>

<p>Ignoring <a href="/blog/2016/06/16/">all its problems</a> for a moment, suppose you want to use
<code class="language-plaintext highlighter-rouge">url-retrieve</code> to fetch some content from a URL and return it. To keep
this simple, I’m going to omit error handling. Also assume that
<code class="language-plaintext highlighter-rouge">lexical-binding</code> is <code class="language-plaintext highlighter-rouge">t</code> for all examples. Besides, lexical scope
is required by the generator library, and therefore also by <code class="language-plaintext highlighter-rouge">aio</code>.</p>

<p>The most naive approach is to fetch the content synchronously:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fetch-fortune-1</span> <span class="p">(</span><span class="nv">url</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">url-retrieve-synchronously</span> <span class="nv">url</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The result is returned directly, and errors are communicated by an error
signal (i.e. Emacs’ version of exceptions). This is convenient, but the
function will block the main thread, locking up Emacs until the result
has arrived. This is obviously very undesirable, so, in practice,
everyone nearly always uses the asynchronous version:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fetch-fortune-2</span> <span class="p">(</span><span class="nv">url</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="nv">url</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_status</span><span class="p">)</span>
                      <span class="p">(</span><span class="nb">funcall</span> <span class="nv">callback</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The main thread no longer blocks, but it’s a whole lot less
convenient. The result isn’t returned to the caller, and instead the
caller supplies a callback function. The result, whether success or
failure, will be delivered via callback, so the caller must split
itself into two pieces: the part before the callback and the callback
itself. Errors cannot be delivered using an error signal because of the
inverted flow control.</p>

<p>The situation gets worse if, say, you need to fetch results from two
different URLs. You either fetch results one at a time (inefficient),
or you manage two different callbacks that could be invoked in any
order, and therefore have to coordinate.</p>

<p><em>Wouldn’t it be nice for the function to work like the first example,
but be asynchronous like the second example?</em> Enter async/await:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">fetch-fortune-3</span> <span class="p">(</span><span class="nv">url</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-url-retrieve</span> <span class="nv">url</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>A function defined with <code class="language-plaintext highlighter-rouge">aio-defun</code> is just like <code class="language-plaintext highlighter-rouge">defun</code> except that
it can use <code class="language-plaintext highlighter-rouge">aio-await</code> to pause and wait on any other function defined
with <code class="language-plaintext highlighter-rouge">aio-defun</code> — or, more specifically, any function that returns a
promise. Borrowing Python parlance: Returning a promise makes a
function <em>awaitable</em>. If there’s an error, it’s delivered as an error
signal from <code class="language-plaintext highlighter-rouge">aio-url-retrieve</code>, just like the first example. When
called, this function returns immediately with a promise object that
represents a future result. The caller might look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defcustom</span> <span class="nv">fortune-url</span> <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">display-fortune</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"%s"</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">fetch-fortune-3</span> <span class="nv">fortune-url</span><span class="p">))))</span>
</code></pre></div></div>

<p>How wonderfully clean that looks! And, yes, it even works with
<code class="language-plaintext highlighter-rouge">interactive</code> like that. I can <code class="language-plaintext highlighter-rouge">M-x display-fortune</code> and a fortune is
printed in the minibuffer as soon as the result arrives from the
server. In the meantime Emacs doesn’t block and I can continue my
work.</p>

<p>You can’t do anything you couldn’t already do before. It’s just a
nicer way to organize the same callbacks: <em>implicit</em> rather than
<em>explicit</em>.</p>

<h3 id="promises-simplified">Promises, simplified</h3>

<p>The core object at play is the <em>promise</em>. Promises are already a
rather simple concept, but <code class="language-plaintext highlighter-rouge">aio</code> promises have been distilled to their
essence, as they’re only needed for this singular purpose. More on
this later.</p>

<p>As I said, a promise represents a future result. In practical terms, a
promise is just an object to which one can subscribe with a callback.
When the result is ready, the callbacks are invoked. Another way to
put it is that <em>promises <a href="https://en.wikipedia.org/wiki/Reification_(computer_science)">reify</a> the concept of callbacks</em>. A
callback is no longer just the idea of an extra argument to a function.
It’s a first-class <em>thing</em> that itself can be passed around as a
value.</p>

<p>Promises have two slots: the final promise <em>result</em> and a list of
<em>subscribers</em>. A <code class="language-plaintext highlighter-rouge">nil</code> result means the result hasn’t been computed
yet. It’s so simple I’m not even <a href="/blog/2018/02/14/">bothering with <code class="language-plaintext highlighter-rouge">cl-struct</code></a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-promise</span> <span class="p">()</span>
  <span class="s">"Create a new promise object."</span>
  <span class="p">(</span><span class="nv">record</span> <span class="ss">'aio-promise</span> <span class="no">nil</span> <span class="p">()))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">aio-promise-p</span> <span class="p">(</span><span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'aio-promise</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">object</span><span class="p">))</span>
       <span class="p">(</span><span class="nb">=</span> <span class="mi">3</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">object</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">aio-result</span> <span class="p">(</span><span class="nv">promise</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>To subscribe to a promise, use <code class="language-plaintext highlighter-rouge">aio-listen</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-listen</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">aio-result</span> <span class="nv">promise</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="nv">result</span>
        <span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="no">nil</span> <span class="nv">callback</span> <span class="nv">result</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">push</span> <span class="nv">callback</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">)))))</span>
</code></pre></div></div>

<p>If the result isn’t ready yet, add the callback to the list of
subscribers. If the result is ready, <em>call the callback in the next
event loop turn</em> using <code class="language-plaintext highlighter-rouge">run-at-time</code>. This is important because it
keeps all the asynchronous components isolated from one another. They
won’t see each others’ frames on the call stack, nor frames from
<code class="language-plaintext highlighter-rouge">aio</code>. This is so important that the <a href="https://promisesaplus.com/">Promises/A+ specification</a>
is explicit about it.</p>

<p>The other half of the equation is resolving a promise, which is done
with <code class="language-plaintext highlighter-rouge">aio-resolve</code>. Unlike other promises, <code class="language-plaintext highlighter-rouge">aio</code> promises don’t care
whether the promise is being <em>fulfilled</em> (success) or <em>rejected</em>
(error). Instead a promise is resolved using a <em>value function</em> — or,
usually, a <em>value closure</em>. Subscribers receive this value function
and extract the value by invoking it with no arguments.</p>

<p>Why? This lets the promise’s resolver decide the semantics of the
result. Rather than returning a value, this function can signal an
error, propagating the error signal that terminated an async function.
Because of this, the promise doesn’t need to know how it’s being
resolved.</p>
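
<p>Concretely, the resolver builds one of two kinds of closure. Both
calls below are purely illustrative (<code class="language-plaintext highlighter-rouge">my-error</code> is a made-up error
symbol):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Success: subscribers invoking the value function get a value
(aio-resolve promise (lambda () "some result"))

;; Failure: subscribers invoking the value function get an error signal
(aio-resolve promise (lambda () (signal 'my-error nil)))
</code></pre></div></div>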

<p>When a promise is resolved, subscribers are each scheduled in their own
event loop turns in the same order that they subscribed. If a promise
has already been resolved, nothing happens. (Thought: Perhaps this
should be an error in order to catch API misuse?)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-resolve</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">value-function</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">aio-result</span> <span class="nv">promise</span><span class="p">)</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">callbacks</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">))))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">value-function</span>
            <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">)</span> <span class="p">())</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">callback</span> <span class="nv">callbacks</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="no">nil</span> <span class="nv">callback</span> <span class="nv">value-function</span><span class="p">)))))</span>
</code></pre></div></div>

<p>If you’re not an async function, you might subscribe to a promise like
so:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">promise</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">message</span> <span class="s">"%s"</span> <span class="p">(</span><span class="nb">funcall</span> <span class="nv">v</span><span class="p">))))</span>
</code></pre></div></div>

<p>The simplest example of a non-async function that creates and delivers
on a promise is a “sleep” function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-sleep</span> <span class="p">(</span><span class="nv">seconds</span> <span class="k">&amp;optional</span> <span class="nv">result</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">value-function</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
      <span class="p">(</span><span class="nv">run-at-time</span> <span class="nv">seconds</span> <span class="no">nil</span>
                   <span class="nf">#'</span><span class="nv">aio-resolve</span> <span class="nv">promise</span> <span class="nv">value-function</span><span class="p">))))</span>
</code></pre></div></div>

<p>Similarly, here’s a “timeout” promise that delivers a special timeout
error signal at a given time in the future.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-timeout</span> <span class="p">(</span><span class="nv">seconds</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">value-function</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nb">signal</span> <span class="ss">'aio-timeout</span> <span class="no">nil</span><span class="p">))))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
      <span class="p">(</span><span class="nv">run-at-time</span> <span class="nv">seconds</span> <span class="no">nil</span>
                   <span class="nf">#'</span><span class="nv">aio-resolve</span> <span class="nv">promise</span> <span class="nv">value-function</span><span class="p">))))</span>
</code></pre></div></div>

<p>That’s all there is to promises.</p>
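
<p>Putting the pieces together, here’s the complete lifecycle of a
promise as a toy example — create, subscribe, then resolve:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((promise (aio-promise)))
  ;; Subscribe before the result exists...
  (aio-listen promise
              (lambda (value-function)
                (message "result: %S" (funcall value-function))))
  ;; ...then resolve. The callback runs on a later event loop turn.
  (aio-resolve promise (lambda () 42)))
</code></pre></div></div>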

<h3 id="evaluate-in-the-context-of-a-promise">Evaluate in the context of a promise</h3>

<p>Before we get into pausing functions, let’s deal with the slightly
simpler matter of delivering their return values using a promise. What
we need is a way to evaluate a “body” and capture its result in a
promise. If the body exits due to a signal, we want to capture that as
well.</p>

<p>Here’s a macro that does just this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-with-promise</span> <span class="p">(</span><span class="nv">promise</span> <span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">aio-resolve</span> <span class="o">,</span><span class="nv">promise</span>
                <span class="p">(</span><span class="nv">condition-case</span> <span class="nb">error</span>
                    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="k">progn</span> <span class="o">,@</span><span class="nv">body</span><span class="p">)))</span>
                      <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">))</span>
                  <span class="p">(</span><span class="nb">error</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
                           <span class="p">(</span><span class="nb">signal</span> <span class="p">(</span><span class="nb">car</span> <span class="nb">error</span><span class="p">)</span> <span class="c1">; rethrow</span>
                                   <span class="p">(</span><span class="nb">cdr</span> <span class="nb">error</span><span class="p">)))))))</span>
</code></pre></div></div>

<p>The body result is captured in a closure and delivered to the promise.
If there’s an error signal, it’s “<em>rethrown</em>” into subscribers by the
promise’s value function.</p>

<p>This is where Emacs Lisp has a serious weak spot. There’s not really a
concept of rethrowing a signal. Unlike a language with explicit
exception objects that can capture a snapshot of the backtrace, the
original backtrace is completely lost where the signal is caught.
There’s no way to “reattach” it to the signal when it’s rethrown. This
is unfortunate because it would greatly help debugging if you got to see
the full backtrace on the other side of the promise.</p>

<h3 id="async-functions">Async functions</h3>

<p>So we have promises and we want to pause a function on a promise.
Generators have <code class="language-plaintext highlighter-rouge">iter-yield</code> for pausing an iterator’s execution. To
tackle this problem:</p>

<ol>
  <li>Yield the promise to pause the iterator.</li>
  <li>Subscribe a callback on the promise that continues the generator
(<code class="language-plaintext highlighter-rouge">iter-next</code>) with the promise’s result as the yield result.</li>
</ol>

<p>All the hard work is done on either side of the yield, so <code class="language-plaintext highlighter-rouge">aio-await</code> is
just a simple wrapper around <code class="language-plaintext highlighter-rouge">iter-yield</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-await</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">iter-yield</span> <span class="o">,</span><span class="nv">expr</span><span class="p">)))</span>
</code></pre></div></div>

<p>Remember, that <code class="language-plaintext highlighter-rouge">funcall</code> is here to extract the promise value from the
value function. If it signals an error, this propagates directly into
the iterator just as if it had been a direct call — minus an accurate
backtrace.</p>

<p>So <code class="language-plaintext highlighter-rouge">aio-lambda</code> / <code class="language-plaintext highlighter-rouge">aio-defun</code> needs to wrap the body in a generator function
(<code class="language-plaintext highlighter-rouge">iter-lambda</code>), invoke it to produce an iterator, then drive that
iterator using callbacks. Here’s a simplified, unhygienic definition of
<code class="language-plaintext highlighter-rouge">aio-lambda</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-lambda</span> <span class="p">(</span><span class="nv">arglist</span> <span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
           <span class="p">(</span><span class="nv">iter</span> <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="nv">iter-lambda</span> <span class="o">,</span><span class="nv">arglist</span>
                          <span class="p">(</span><span class="nv">aio-with-promise</span> <span class="nv">promise</span>
                            <span class="o">,@</span><span class="nv">body</span><span class="p">))</span>
                        <span class="nv">args</span><span class="p">)))</span>
       <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
         <span class="p">(</span><span class="nv">aio--step</span> <span class="nv">iter</span> <span class="nv">promise</span> <span class="no">nil</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The body is evaluated inside <code class="language-plaintext highlighter-rouge">aio-with-promise</code> with the result
delivered to the promise returned directly by the async function.</p>

<p>Before returning, the iterator is handed to <code class="language-plaintext highlighter-rouge">aio--step</code>, which drives
the iterator forward until it delivers its first promise. When the
iterator yields a promise, <code class="language-plaintext highlighter-rouge">aio--step</code> attaches a callback back to
itself on the promise as described above. Immediately driving the
iterator up to the first yielded promise “primes” it, which is
important for getting the ball rolling on any asynchronous operations.</p>

<p>If the iterator ever yields something other than a promise, it’s
delivered right back into the iterator.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio--step</span> <span class="p">(</span><span class="nv">iter</span> <span class="nv">promise</span> <span class="nv">yield-result</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">condition-case</span> <span class="nv">_</span>
      <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">result</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">iter-next</span> <span class="nv">iter</span> <span class="nv">yield-result</span><span class="p">)</span>
               <span class="nv">then</span> <span class="p">(</span><span class="nv">iter-next</span> <span class="nv">iter</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">))</span>
               <span class="nv">until</span> <span class="p">(</span><span class="nv">aio-promise-p</span> <span class="nv">result</span><span class="p">)</span>
               <span class="nv">finally</span> <span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">result</span>
                                   <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">value</span><span class="p">)</span>
                                     <span class="p">(</span><span class="nv">aio--step</span> <span class="nv">iter</span> <span class="nv">promise</span> <span class="nv">value</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">iter-end-of-sequence</span><span class="p">)))</span>
</code></pre></div></div>

<p>When the iterator is done, nothing more needs to happen since the
iterator resolves its own return value promise.</p>

<p>The definition of <code class="language-plaintext highlighter-rouge">aio-defun</code> just uses <code class="language-plaintext highlighter-rouge">aio-lambda</code> with <code class="language-plaintext highlighter-rouge">defalias</code>.
There’s nothing to it.</p>
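
<p>That is, stripped of details like docstrings and declarations, the
macro could be sketched as:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Simplified sketch of aio-defun
(defmacro aio-defun (name arglist &amp;rest body)
  `(defalias ',name (aio-lambda ,arglist ,@body)))
</code></pre></div></div>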

<p>That’s everything you need! Everything else in the package is merely
useful, awaitable functions like <code class="language-plaintext highlighter-rouge">aio-sleep</code> and <code class="language-plaintext highlighter-rouge">aio-timeout</code>.</p>

<h3 id="composing-promises">Composing promises</h3>

<p>Unfortunately <code class="language-plaintext highlighter-rouge">url-retrieve</code> doesn’t support timeouts. We can work
around this by composing two promises: a <code class="language-plaintext highlighter-rouge">url-retrieve</code> promise and
<code class="language-plaintext highlighter-rouge">aio-timeout</code> promise. First define a promise-returning function,
<code class="language-plaintext highlighter-rouge">aio-select</code> that takes a list of promises and returns (as another
promise) the first promise to resolve:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-select</span> <span class="p">(</span><span class="nv">promises</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">result</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">promises</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">promise</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span>
                              <span class="p">(</span><span class="nv">aio-resolve</span>
                               <span class="nv">result</span>
                               <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">promise</span><span class="p">))))))))</span>
</code></pre></div></div>

<p>We give <code class="language-plaintext highlighter-rouge">aio-select</code> both our <code class="language-plaintext highlighter-rouge">aio-url-retrieve</code> and <code class="language-plaintext highlighter-rouge">aio-timeout</code> promises,
and it tells us which resolved first:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">fetch-fortune-4</span> <span class="p">(</span><span class="nv">url</span> <span class="nv">timeout</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">promises</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">aio-url-retrieve</span> <span class="nv">url</span><span class="p">)</span>
                         <span class="p">(</span><span class="nv">aio-timeout</span> <span class="nv">timeout</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">fastest</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-select</span> <span class="nv">promises</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="nv">fastest</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Cool! Note: this will not actually cancel the URL request. It only
lets the async function resume sooner, without ever seeing the
result.</p>

<h3 id="threads">Threads</h3>

<p>Despite <code class="language-plaintext highlighter-rouge">aio</code> being entirely about managing concurrent, asynchronous
operations, it has nothing at all to do with threads — as in Emacs 26’s
support for kernel threads. All async functions and promise callbacks
are expected to run <em>only</em> on the main thread. That’s not to say an
async function can’t await on a result from another thread. It just must
be <a href="/blog/2017/02/14/">done very carefully</a>.</p>

<h3 id="processes">Processes</h3>

<p>The package also includes two functions for realizing promises on
processes, whether they be subprocesses or network sockets.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">aio-process-filter</code></li>
  <li><code class="language-plaintext highlighter-rouge">aio-process-sentinel</code></li>
</ul>

<p>For example, this function loops over each chunk of output (typically
4kB) from the process, as delivered to a filter function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">process-chunks</span> <span class="p">(</span><span class="nv">process</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">chunk</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-process-filter</span> <span class="nv">process</span><span class="p">))</span>
           <span class="nv">while</span> <span class="nv">chunk</span>
           <span class="nb">do</span> <span class="p">(</span><span class="o">...</span> <span class="nv">process</span> <span class="nv">chunk</span> <span class="o">...</span><span class="p">)))</span>
</code></pre></div></div>

<p>Exercise for the reader: Write an awaitable function that returns a line
at a time rather than a chunk at a time. You can build it on top of
<code class="language-plaintext highlighter-rouge">aio-process-filter</code>.</p>

<p>I considered wrapping functions like <code class="language-plaintext highlighter-rouge">start-process</code> so that their <code class="language-plaintext highlighter-rouge">aio</code>
versions would return a promise representing some kind of result from
the process. However there are <em>so</em> many different ways to create and
configure processes that I would have ended up duplicating all the
process functions. Focusing on the filter and sentinel, and letting the
caller create and configure the process is much cleaner.</p>

<p>Unfortunately Emacs has no asynchronous API for writing output to a
process. Both <code class="language-plaintext highlighter-rouge">process-send-string</code> and <code class="language-plaintext highlighter-rouge">process-send-region</code> will block
if the pipe or socket is full. There is no callback, so you cannot await
on writing output. Maybe there’s a way to do it with a dedicated thread?</p>

<p>Another issue is that the <code class="language-plaintext highlighter-rouge">process-send-*</code> functions <a href="/blog/2013/01/14/">are
preemptible</a>, made necessary because they block. The
<code class="language-plaintext highlighter-rouge">aio-process-*</code> functions leave a gap (i.e. between filter awaits)
where no filter or sentinel function is attached. It’s a consequence
of promises being single-fire. The gap is harmless so long as the
async function doesn’t await something else or get preempted. This
needs some more thought.</p>

<p><strong><em>Update</em></strong>: These process functions no longer exist and have been
replaced by a small framework for building chains of promises. See
<code class="language-plaintext highlighter-rouge">aio-make-callback</code>.</p>

<h3 id="testing-aio">Testing aio</h3>

<p>The test suite for <code class="language-plaintext highlighter-rouge">aio</code> is a bit unusual. Emacs’ built-in test suite,
ERT, doesn’t support asynchronous tests. Furthermore, tests are
generally run in batch mode, where Emacs invokes a single function and
then exits, rather than pumping an event loop. Batch mode can only handle
asynchronous process I/O, not the async functions of <code class="language-plaintext highlighter-rouge">aio</code>. So it’s
not possible to run the tests in batch mode.</p>

<p>Instead I hacked together a really crude callback-based test suite. It
runs in non-batch mode and writes the test results into a buffer
(run with <code class="language-plaintext highlighter-rouge">make check</code>). Not ideal, but it works.</p>

<p>One of the tests is a sleep sort (with reasonable tolerances). It’s a
pretty neat demonstration of what you can do with <code class="language-plaintext highlighter-rouge">aio</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">sleep-sort</span> <span class="p">(</span><span class="nb">values</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promises</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nv">aio-sleep</span> <span class="nv">v</span> <span class="nv">v</span><span class="p">))</span> <span class="nb">values</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">while</span> <span class="nv">promises</span>
             <span class="nv">for</span> <span class="nv">next</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-select</span> <span class="nv">promises</span><span class="p">))</span>
             <span class="nb">do</span> <span class="p">(</span><span class="nb">setf</span> <span class="nv">promises</span> <span class="p">(</span><span class="nv">delq</span> <span class="nv">next</span> <span class="nv">promises</span><span class="p">))</span>
             <span class="nv">collect</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="nv">next</span><span class="p">))))</span>
</code></pre></div></div>

<p>To see it in action (<code class="language-plaintext highlighter-rouge">M-x sleep-sort-demo</code>):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">sleep-sort-demo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">values</span> <span class="o">'</span><span class="p">(</span><span class="mf">0.1</span> <span class="mf">0.4</span> <span class="mf">1.1</span> <span class="mf">0.2</span> <span class="mf">0.8</span> <span class="mf">0.6</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">message</span> <span class="s">"%S"</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">sleep-sort</span> <span class="nb">values</span><span class="p">)))))</span>
</code></pre></div></div>

<h3 id="asyncawait-is-pretty-awesome">Async/await is pretty awesome</h3>

<p>I’m quite happy with how this all came together. Once I had the
concepts straight — particularly resolving to value functions —
everything made sense and all the parts fit together well, and mostly
by accident. That feels good.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Python Decorators: Syntactic Artificial Sweetener</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/03/08/"/>
    <id>urn:uuid:588255e6-70a2-4733-bf58-ca9a857930f3</id>
    <updated>2019-03-08T23:00:49Z</updated>
    <category term="python"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>Python has a feature called <em>function decorators</em>. With a little bit of
syntax, the behavior of a function or class can be modified in useful
ways. Python comes with a few decorators, but most of the useful ones
are found in third-party libraries.</p>

<p><a href="https://www.python.org/dev/peps/pep-0318/">PEP 318</a> suggests a very simple but practical decorator called
<code class="language-plaintext highlighter-rouge">synchronized</code>, though it doesn’t provide a concrete example. Consider
this function that increments a global counter:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span>

<span class="k">def</span> <span class="nf">increment</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">counter</span>
    <span class="n">counter</span> <span class="o">=</span> <span class="n">counter</span> <span class="o">+</span> <span class="mi">1</span>
</code></pre></div></div>

<p>If this function is called from multiple threads, there’s a <em>race
condition</em> — though, at least for CPython, it’s <a href="https://blog.regehr.org/archives/490">not a <em>data
race</em></a> thanks to the Global Interpreter Lock (GIL). Incrementing
the counter is not an atomic operation, as illustrated by <a href="/blog/2019/02/24/">its byte
code</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 0 LOAD_GLOBAL              0 (counter)
 3 LOAD_CONST               1 (1)
 6 BINARY_ADD
 7 STORE_GLOBAL             0 (counter)
10 LOAD_CONST               0 (None)
13 RETURN_VALUE
</code></pre></div></div>

<p>The variable is loaded, operated upon, and stored. Another thread
could be scheduled between any of these instructions and cause an
undesired result. It’s easy to see that in practice:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">threading</span> <span class="kn">import</span> <span class="n">Thread</span>

<span class="k">def</span> <span class="nf">worker</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">200000</span><span class="p">):</span>
        <span class="n">increment</span><span class="p">()</span>

<span class="n">threads</span> <span class="o">=</span> <span class="p">[</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">worker</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">8</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">thread</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
    <span class="n">thread</span><span class="p">.</span><span class="n">start</span><span class="p">()</span>
<span class="k">for</span> <span class="n">thread</span> <span class="ow">in</span> <span class="n">threads</span><span class="p">:</span>
    <span class="n">thread</span><span class="p">.</span><span class="n">join</span><span class="p">()</span>

<span class="k">print</span><span class="p">(</span><span class="n">counter</span><span class="p">)</span>
</code></pre></div></div>

<p>The increment function is called exactly 1.6 million times, but on my
system I get different results on each run:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 example.py 
1306205
$ python3 example.py 
1162418
$ python3 example.py 
1076801
</code></pre></div></div>

<p>I could change the definition of <code class="language-plaintext highlighter-rouge">increment()</code> to use synchronization,
but wouldn’t it be nice if I could just tell Python to synchronize this
function? This is where a function decorator shines:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">threading</span> <span class="kn">import</span> <span class="n">Lock</span>

<span class="k">def</span> <span class="nf">synchronized</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
    <span class="n">lock</span> <span class="o">=</span> <span class="n">Lock</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">():</span>
        <span class="k">with</span> <span class="n">lock</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">f</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">wrapper</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">synchronized</code> function is a higher order function that accepts a
function and returns a function — or, more specifically, a <em>callable</em>.
The purpose is to wrap and <em>decorate</em> the function it’s given. In this
case the function is wrapped in a mutual exclusion lock. Note: This
implementation is very simple and only works for functions that accept
no arguments.</p>

<p>To use it, I just add a single line to <code class="language-plaintext highlighter-rouge">increment</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">synchronized</span>
<span class="k">def</span> <span class="nf">increment</span><span class="p">():</span>
    <span class="k">global</span> <span class="n">counter</span>
    <span class="n">counter</span> <span class="o">=</span> <span class="n">counter</span> <span class="o">+</span> <span class="mi">1</span>
</code></pre></div></div>

<p>With this change my program now always prints 1600000.</p>

<h3 id="syntactic-sugar">Syntactic “sugar”</h3>

<p>Everyone is quick to point out that this is just syntactic sugar, and
that you can accomplish this without the <code class="language-plaintext highlighter-rouge">@</code> syntax. For example, the
last definition of <code class="language-plaintext highlighter-rouge">increment</code> is equivalent to:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">increment</span><span class="p">():</span>
    <span class="p">...</span>

<span class="n">increment</span> <span class="o">=</span> <span class="n">synchronized</span><span class="p">(</span><span class="n">increment</span><span class="p">)</span>
</code></pre></div></div>

<p>Decorators can also be parameterized. For example, Python’s
<code class="language-plaintext highlighter-rouge">functools</code> module has an <code class="language-plaintext highlighter-rouge">lru_cache</code> decorator for memoizing a
function:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">lru_cache</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">expensive</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="p">...</span>
</code></pre></div></div>

<p>Which is equivalent to this very direct source transformation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">expensive</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="p">...</span>

<span class="n">expensive</span> <span class="o">=</span> <span class="n">lru_cache</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">32</span><span class="p">)(</span><span class="n">expensive</span><span class="p">)</span>
</code></pre></div></div>
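<p>Writing a parameterized decorator yourself takes one more layer of
nesting: the outer function receives the parameters and returns the
actual decorator. For illustration, a hypothetical <code class="language-plaintext highlighter-rouge">repeat</code>
decorator (my own example, not from the article):</p>

```python
from functools import wraps

def repeat(times):
    """Decorator factory: repeat(times) runs first, and its
    return value is the decorator actually applied."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = f(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(times=3)
def greet(name):
    return "Hello, " + name + "!"
```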

<p>So what comes after the <code class="language-plaintext highlighter-rouge">@</code> isn’t just a name. In fact, it <em>looks</em>
like it can be any kind of expression that evaluates to a function
decorator. Or is it?</p>

<h3 id="syntactic-artificial-sweetener">Syntactic artificial sweetener</h3>

<p>Reality is often disappointing. Let’s try using an “identity” decorator
defined using <code class="language-plaintext highlighter-rouge">lambda</code>. This decorator will accomplish nothing, but it
will test if we can decorate a function using a lambda expression.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="k">lambda</span> <span class="n">f</span><span class="p">:</span> <span class="n">f</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>But Python complains:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    @lambda f: f
          ^
SyntaxError: invalid syntax
</code></pre></div></div>

<p>Maybe Python is absolutely literal about the syntax sugar thing, and
it’s more like a kind of macro replacement. Let’s try wrapping it in
parentheses:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="p">(</span><span class="k">lambda</span> <span class="n">f</span><span class="p">:</span> <span class="n">f</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>Nope, same error, but now pointing at the opening parenthesis. Getting
desperate now:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="p">[</span><span class="n">synchronized</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>Again, syntax error. What’s going on?</p>

<h3 id="pattern-matching">Pattern matching</h3>

<p>The problem is that Python’s grammar doesn’t parse an arbitrary
expression after <code class="language-plaintext highlighter-rouge">@</code>. It <a href="https://docs.python.org/3/reference/compound_stmts.html#function-definitions">matches a very specific pattern</a> that
just so happens to <em>look</em> like a Python expression. It’s not syntactic
sugar, it’s syntactic artificial sweetener!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>decorator ::= "@" dotted_name ["(" [argument_list [","]] ")"] NEWLINE
</code></pre></div></div>
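<p>Since the pattern does accept a <code class="language-plaintext highlighter-rouge">dotted_name</code>, an arbitrary
expression can still be smuggled in by binding it to a name first — a
trivial workaround:</p>

```python
# The decorator position accepts only a (dotted) name with an optional
# call, so bind the arbitrary expression to a plain name first.
identity = lambda f: f

@identity  # a bare name satisfies the restricted grammar
def foo():
    return 42
```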

<p>In a way, this puts Python in the ranks of PHP 5 and Matlab: two
languages with completely screwed up grammars that can only parse
specific constructions that the developers had anticipated. For
example, in PHP 5 (fixed in PHP 7):</p>

<div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="n">foo</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">function</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>

<span class="nf">foo</span><span class="p">()();</span>
</code></pre></div></div>

<p>That is a syntax error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PHP Parse error:  syntax error, unexpected '(', expecting ',' or ';'
</code></pre></div></div>

<p>Or <a href="/blog/2008/08/29/">in any version of Matlab</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    magic(4)(:)
</code></pre></div></div>

<p>That is a syntax error:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Unbalanced or unexpected parenthesis or bracket
</code></pre></div></div>

<p>In Python’s defense, this strange, limited syntax is only in a single
place rather than everywhere, but I still wonder why it was defined
that way.</p>

<p>Update: Clément Pit-Claudel pointed out the explanation in the PEP,
which references <a href="https://mail.python.org/pipermail/python-dev/2004-August/046711.html">a 2004 email by Guido van Rossum</a>:</p>

<blockquote>
  <p>I have a gut feeling about this one.  I’m not sure where it comes
from, but I have it.  It may be that I want the compiler to be able to
recognize certain decorators.</p>

  <p>So while it would be quite easy to change the syntax to @test in the
future, I’d like to stick with the more restricted form unless a real
use case is presented where allowing @test would increase readability.
(@foo().bar() doesn’t count because I don’t expect you’ll ever need
that).</p>
</blockquote>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>The CPython Bytecode Compiler is Dumb</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/02/24/"/>
    <id>urn:uuid:4348d611-858b-4f48-a6f5-6e4b93f71a34</id>
    <updated>2019-02-24T21:56:35Z</updated>
    <category term="python"/><category term="lua"/><category term="lang"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was <a href="https://news.ycombinator.com/item?id=19241545">discussed on Hacker News</a>.</em></p>

<p>Due to sheer coincidence of several unrelated tasks converging on
Python at work, I recently needed to brush up on my Python skills. So
far for me, Python has been little more than <a href="/blog/2017/05/15/">a fancy extension
language for BeautifulSoup</a>, though I also used it to participate
in the recent tradition of <a href="https://github.com/skeeto/qualbum">writing one’s own static site
generator</a>, in this case for <a href="http://photo.nullprogram.com/">my wife’s photo blog</a>.
I’ve been reading through <em>Fluent Python</em> by Luciano Ramalho, and it’s
been quite effective at getting me up to speed.</p>

<!--more-->

<p>As I write Python, <a href="/blog/2014/01/04/">like with Emacs Lisp</a>, I can’t help but
consider what exactly is happening inside the interpreter. I wonder if
the code I’m writing is putting undue constraints on the bytecode
compiler and limiting its options. Ultimately I’d like the code I
write <a href="/blog/2017/01/30/">to drive the interpreter efficiently and effectively</a>.
<a href="https://www.python.org/dev/peps/pep-0020/">The Zen of Python</a> says there should be “only one obvious way to do
it,” but in practice there’s a lot of room for expression. Given
multiple ways to express the same algorithm or idea, I tend to prefer
the one that compiles to the more efficient bytecode.</p>

<p>Fortunately CPython, the main and most widely used implementation of
Python, is very transparent about its bytecode, which makes it easy to
inspect and reason about. The disassembly listing is readable enough
that I can always follow it without consulting the
documentation. This contrasts sharply with modern JavaScript engines
and their opaque use of JIT compilation, where performance is guided
by obeying certain patterns (<a href="https://www.youtube.com/watch?v=UJPdhx5zTaw">hidden classes</a>, etc.), helping the
compiler <a href="https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/">understand my program’s types</a>, and being careful
not to unnecessarily constrain the compiler.</p>

<p>So, besides just catching up with Python the language, I’ve been
studying the bytecode disassembly of the functions that I write. One
fact has become quite apparent: <strong>the CPython bytecode compiler is
pretty dumb</strong>. With a few exceptions, it’s a very literal translation
of a Python program, and there is almost <a href="https://legacy.python.org/workshops/1998-11/proceedings/papers/montanaro/montanaro.html">no optimization</a>.
Below I’ll demonstrate a case where it’s possible to detect one of the
missed optimizations without inspecting the bytecode disassembly
thanks to a small abstraction leak in the optimizer.</p>

<p>To be clear: This isn’t to say CPython is bad, or even that it should
necessarily change. In fact, as I’ll show, <strong>dumb bytecode compilers
are par for the course</strong>. In the past I’ve lamented how the Emacs Lisp
compiler could do a better job, but CPython and Lua are operating at
the same level. There are benefits to a dumb and straightforward
bytecode compiler: the compiler itself is simpler, easier to maintain,
and more amenable to modification (e.g. as Python continues to
evolve). It’s also easier to debug Python (<code class="language-plaintext highlighter-rouge">pdb</code>) because it’s such a
close match to the source listing.</p>

<p><em>Update</em>: <a href="https://codewords.recurse.com/issues/seven/dragon-taming-with-tailbiter-a-bytecode-compiler">Darius Bacon points out</a> that Guido van Rossum
himself said, “<a href="https://books.google.com/books?id=bIxWAgAAQBAJ&amp;pg=PA26&amp;lpg=PA26&amp;dq=%22Python+is+about+having+the+simplest,+dumbest+compiler+imaginable.%22&amp;source=bl&amp;ots=2OfDoWX321&amp;sig=ACfU3U32jKZBE3VkJ0gvkKbxRRgD0bnoRg&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjZ1quO89bgAhWpm-AKHfckAxUQ6AEwAHoECAkQAQ#v=onepage&amp;q=%22Python%20is%20about%20having%20the%20simplest%2C%20dumbest%20compiler%20imaginable.%22&amp;f=false">Python is about having the simplest, dumbest compiler
imaginable.</a>” So this is all very much by design.</p>

<p>The consensus seems to be that if you want or need better performance,
use something other than Python. (And if you can’t do that, at least use
<a href="https://pypy.org/">PyPy</a>.) That’s a fairly reasonable and healthy goal. Still, if
I’m writing Python, I’d like to do the best I can, which means
exploiting the optimizations that <em>are</em> available when possible.</p>

<h3 id="disassembly-examples">Disassembly examples</h3>

<p>I’m going to compare three bytecode compilers in this article: CPython
3.7, Lua 5.3, and Emacs 26.1. Each of these languages is dynamically
typed, is primarily executed on a bytecode virtual machine, and makes
its disassembly listing easy to access. One caveat: CPython and Emacs
use a stack-based virtual machine while Lua uses a register-based
virtual machine.</p>

<p>For CPython I’ll be using the <code class="language-plaintext highlighter-rouge">dis</code> module. For Emacs Lisp I’ll use <code class="language-plaintext highlighter-rouge">M-x
disassemble</code>, and all code will use lexical scoping. In Lua I’ll use
<code class="language-plaintext highlighter-rouge">lua -l</code> on the command line.</p>
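<p>For example, the CPython listings below can be reproduced with a
couple of lines:</p>

```python
import dis

def foo():
    x = 0
    y = 1
    return x

# Prints the bytecode listing to stdout; the exact opcode names
# vary somewhat between CPython versions (the article uses 3.7).
dis.dis(foo)
```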

<h3 id="local-variable-elimination">Local variable elimination</h3>

<p>Will the bytecode compiler eliminate local variables? Keeping the
variable around potentially involves allocating memory for it, assigning
to it, and accessing it. Take this example:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>

<p>This function is equivalent to:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="mi">0</span>
</code></pre></div></div>

<p>Despite this, CPython completely misses this optimization for both <code class="language-plaintext highlighter-rouge">x</code>
and <code class="language-plaintext highlighter-rouge">y</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (0)
              2 STORE_FAST               0 (x)
  3           4 LOAD_CONST               2 (1)
              6 STORE_FAST               1 (y)
  4           8 LOAD_FAST                0 (x)
             10 RETURN_VALUE
</code></pre></div></div>

<p>It assigns both variables, and even loads again from <code class="language-plaintext highlighter-rouge">x</code> for the return.
These are missed optimizations, but, as I said, keeping these variables
around makes debugging more straightforward: users can always inspect them.</p>

<p>How about Lua?</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="kd">local</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="kd">local</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">x</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It also misses this optimization, though it matters a little less due to
its architecture (the return instruction references a register
regardless of whether or not that register is allocated to a local
variable):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     LOADK           0 -1    ; 0
        2       [3]     LOADK           1 -2    ; 1
        3       [4]     RETURN          0 2
        4       [5]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp also misses it:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">y</span> <span class="mi">1</span><span class="p">))</span>
    <span class="nv">x</span><span class="p">))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  0
1	constant  1
2	stack-ref 1
3	return
</code></pre></div></div>

<p>All three are on the same page.</p>

<h3 id="constant-folding">Constant folding</h3>

<p>Does the bytecode compiler evaluate simple constant expressions at
compile time? This is simple and everyone does it.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">/</span> <span class="mi">4</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (2.5)
              2 RETURN_VALUE
</code></pre></div></div>

<p>Lua:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">/</span> <span class="mi">4</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     LOADK           0 -1    ; 2.5
        2       [2]     RETURN          0 2
        3       [3]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)</span> <span class="mf">4.0</span><span class="p">))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  2.5
1	return
</code></pre></div></div>

<p>That’s something we can count on so long as the operands are all
numeric literals (or also, for Python, string literals) that are
visible to the compiler. Don’t count on your operator overloads to
work here, though.</p>
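<p>The folding is visible without even reading the disassembly: the
computed value lands in the function’s constant pool, which can be
checked directly (a quick test of my own, not from the article):</p>

```python
def foo():
    return 1 + 2 * 3 / 4

# The folded result is stored as a compile-time constant.
assert 2.5 in foo.__code__.co_consts
assert foo() == 2.5
```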

<h3 id="allocation-optimization">Allocation optimization</h3>

<p>Optimizers often perform <em>escape analysis</em>, to determine if objects
allocated in a function ever become visible outside of that function. If
they don’t then these objects could potentially be stack-allocated
(instead of heap-allocated) or even be eliminated entirely.</p>

<p>None of the bytecode compilers are this sophisticated. However, CPython
does have a trick up its sleeve: tuple optimization. Since tuples are
immutable, in certain circumstances CPython will reuse them and avoid
both the constructor and the allocation.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>

<p>Check it out, the tuple is used as a constant:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 ((1, 2, 3))
              2 RETURN_VALUE
</code></pre></div></div>

<p>Which we can detect by evaluating <code class="language-plaintext highlighter-rouge">foo() is foo()</code>, which is <code class="language-plaintext highlighter-rouge">True</code>.
Deviate from this too much, though, and the optimization is disabled.
Remember how CPython can’t optimize away variables, and that they
break constant folding? They break this, too:

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (x)
  3           4 LOAD_FAST                0 (x)
              6 LOAD_CONST               2 (2)
              8 LOAD_CONST               3 (3)
             10 BUILD_TUPLE              3
             12 RETURN_VALUE
</code></pre></div></div>

<p>This function might document that it always returns a simple tuple,
but we can tell whether it’s being optimized using <code class="language-plaintext highlighter-rouge">is</code> like before:
<code class="language-plaintext highlighter-rouge">foo() is foo()</code> is now <code class="language-plaintext highlighter-rouge">False</code>! In some future version of Python with
a cleverer bytecode compiler, that expression might evaluate to
<code class="language-plaintext highlighter-rouge">True</code>. (Unless the <a href="https://docs.python.org/3/reference/">Python language specification</a> is specific
about this case, which I didn’t check.)</p>
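<p>Both behaviors are easy to check side by side (verified against the
CPython behavior described above; a future, cleverer compiler could
change the second result):</p>

```python
def constant_tuple():
    return (1, 2, 3)

def built_tuple():
    x = 1
    return (x, 2, 3)

# The literal tuple is stored as a constant and reused on every call,
assert constant_tuple() is constant_tuple()
# while the variable forces BUILD_TUPLE to allocate a fresh object.
assert built_tuple() is not built_tuple()
```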

<p>Note: Curiously PyPy replicates this exact behavior when examined with
<code class="language-plaintext highlighter-rouge">is</code>. Was that deliberate? I’m impressed that PyPy matches CPython’s
semantics so closely here.</p>

<p>Putting a mutable value, such as a list, in the tuple will also break
this optimization. But that’s not the compiler being dumb. That’s a
hard constraint on the compiler: the caller might change the mutable
component of the tuple, so it must always return a fresh copy.</p>

<p>Neither Lua nor Emacs Lisp has a language-level equivalent of an
immutable tuple, so there’s nothing to compare.</p>

<p>Other than the tuples situation in CPython, none of the bytecode
compilers eliminate unnecessary intermediate objects.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">[</span><span class="mi">1024</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (1024)
              2 BUILD_LIST               1
              4 LOAD_CONST               2 (0)
              6 BINARY_SUBSCR
              8 RETURN_VALUE
</code></pre></div></div>
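
<p>A sketch to confirm the intermediate list is genuinely constructed at
run time (opcode names are a CPython detail and vary slightly across
versions):</p>

```python
import dis

def foo():
    return [1024][0]

# BUILD_LIST in the instruction stream shows the throwaway list
# is really created and indexed, not folded away by the compiler.
ops = [ins.opname for ins in dis.get_instructions(foo)]
print(ops)
assert 'BUILD_LIST' in ops
```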

<p>Lua:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="k">return</span> <span class="p">({</span><span class="mi">1024</span><span class="p">})[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     NEWTABLE        0 1 0
        2       [2]     LOADK           1 -1    ; 1024
        3       [2]     SETLIST         0 1 1   ; 1
        4       [2]     GETTABLE        0 0 -2  ; 1
        5       [2]     RETURN          0 2
        6       [3]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1024</span><span class="p">)))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  1024
1	list1
2	car
3	return
</code></pre></div></div>

<h3 id="dont-expect-too-much">Don’t expect too much</h3>

<p>I could go on with many more examples, looking at loop optimizations
and so on, and in each case the result is almost certainly unoptimized.
The general rule of thumb is simply not to expect much from these
bytecode compilers. They’re very literal in their translation.</p>

<p>Working so much in C has put me in the habit of expecting all obvious
optimizations from the compiler. This frees me to be more expressive
in my code. Lots of things are cost-free thanks to these
optimizations, such as breaking a complex expression up into several
variables, naming my constants, or not using a local variable to
manually cache memory accesses. I’m confident the compiler will
optimize away my expressiveness. The catch is that <a href="/blog/2018/05/01/">clever compilers
can take things too far</a>, so I’ve got to be mindful of how it might
undermine my intentions — i.e. when I’m doing something unusual or not
strictly permitted.</p>

<p>These bytecode compilers will never truly surprise me. The cost is
that being more expressive in Python, Lua, or Emacs Lisp may reduce
performance at run time, because the extra work shows up in the
bytecode. Usually this doesn’t matter, but sometimes it does.</p>
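
<p>One concrete consequence (a CPython-specific sketch): the attribute
lookup inside a loop body is not hoisted for you, so the old trick of
caching a bound method in a local is still a genuine, if small,
optimization:</p>

```python
import dis

def plain(data):
    out = []
    for x in data:
        out.append(x)       # attribute lookup repeated on every iteration
    return out

def cached(data):
    out = []
    append = out.append     # hoisted by hand; the compiler won't do it
    for x in data:
        append(x)
    return out

# The repeated lookup is visible in plain()'s bytecode.
ops = [ins.opname for ins in dis.get_instructions(plain)]
assert 'LOAD_METHOD' in ops or 'LOAD_ATTR' in ops
assert plain([1, 2, 3]) == cached([1, 2, 3]) == [1, 2, 3]
```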

]]>
    </content>
  </entry>
  <entry>
    <title>Web Scraping into an E-book with BeautifulSoup and Pandoc</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/05/15/"/>
    <id>urn:uuid:8e05a4a5-4601-3717-d1ef-c03ea2413025</id>
    <updated>2017-05-15T02:39:20Z</updated>
    <category term="python"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>I recently learned how to use <a href="https://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>, a Python library for
manipulating HTML and XML parse trees, and it’s been a fantastic
addition to my virtual toolbelt. In the past when I’ve needed to process
raw HTML, I’ve tried nasty hacks with Unix pipes, or <a href="/blog/2013/01/24/">routing the
content through a web browser</a> so that I could manipulate it via
the DOM API. None of that worked very well, but now I finally have
BeautifulSoup to fill that gap. It’s got a selector interface and,
except for rendering, it’s basically as comfortable with HTML as
JavaScript.</p>

<p>Today’s problem was that I wanted to read <a href="http://daviddfriedman.blogspot.com/2017/05/something-different-or-maybe-not.html">a recommended</a> online
book called <a href="https://banter-latte.com/portfolio/interviewing-leather/"><em>Interviewing Leather</em></a>, a story set “in a world where
caped heroes fight dastardly villains on an everyday basis.” I say
“online book” because the 39,403-word story is distributed as a series
of 14 blog posts. I’d rather not read it on the website in a browser,
instead preferring it in e-book form where it’s more comfortable. The
<a href="/blog/2015/09/03/">last time I did this</a>, I manually scraped the entire book into
Markdown, spent a couple of weeks editing it for mistakes, and finally
sent the Markdown to <a href="http://pandoc.org/">Pandoc</a> to convert into an e-book.</p>

<p>For this book, I just want a quick-and-dirty scrape in order to shift
formats. I’ve never read it and I may not even like it (<em>update</em>: I
enjoyed it), so I definitely don’t want to spend much time on the
conversion. Despite <a href="/blog/2017/04/01/">having fun with typing lately</a>, I’d also
prefer to keep all the formatting — italics, etc. — without re-entering
it all manually.</p>

<p>Fortunately Pandoc can consume HTML as input, so, in theory, I can feed
it the original HTML and preserve all of the original markup. The
challenge is that the HTML is spread across 14 pages surrounded by all
the expected blog cruft. I need some way to extract the book content
from each page, concatenate it together along with chapter headings, and
send the result to Pandoc. Enter BeautifulSoup.</p>

<p>First, I need to construct the skeleton HTML document. Rather than code
my own HTML, I’m going to build it with BeautifulSoup. I start by
creating a completely empty document and adding a doctype to it.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span><span class="p">,</span> <span class="n">Doctype</span>

<span class="n">doc</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">()</span>
<span class="n">doc</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">Doctype</span><span class="p">(</span><span class="s">'html'</span><span class="p">))</span>
</code></pre></div></div>

<p>Next I create the <code class="language-plaintext highlighter-rouge">html</code> root element, then add the <code class="language-plaintext highlighter-rouge">head</code> and <code class="language-plaintext highlighter-rouge">body</code>
elements. I also add a <code class="language-plaintext highlighter-rouge">title</code> element. The original content has fancy
Unicode markup — left and right quotation marks, em dash, etc. — so it’s
important to declare the page as UTF-8, since otherwise these characters
are likely to be interpreted incorrectly. It always feels odd declaring
the encoding within the content being encoded, but that’s just the way
things are.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">html</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'html'</span><span class="p">,</span> <span class="n">lang</span><span class="o">=</span><span class="s">'en-US'</span><span class="p">)</span>
<span class="n">doc</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">html</span><span class="p">)</span>
<span class="n">head</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'head'</span><span class="p">)</span>
<span class="n">html</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">head</span><span class="p">)</span>
<span class="n">meta</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'meta'</span><span class="p">,</span> <span class="n">charset</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span>
<span class="n">head</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">meta</span><span class="p">)</span>
<span class="n">title</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'title'</span><span class="p">)</span>
<span class="n">title</span><span class="p">.</span><span class="n">string</span> <span class="o">=</span> <span class="s">'Interviewing Leather'</span>
<span class="n">head</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
<span class="n">body</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'body'</span><span class="p">)</span>
<span class="n">html</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">body</span><span class="p">)</span>
</code></pre></div></div>

<p>If I <code class="language-plaintext highlighter-rouge">print(doc.prettify())</code> then I see the skeleton I want:</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;!DOCTYPE html&gt;</span>
<span class="nt">&lt;html</span> <span class="na">lang=</span><span class="s">"en-US"</span><span class="nt">&gt;</span>
 <span class="nt">&lt;head&gt;</span>
  <span class="nt">&lt;meta</span> <span class="na">charset=</span><span class="s">"utf-8"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;title&gt;</span>
   Interviewing Leather
  <span class="nt">&lt;/title&gt;</span>
 <span class="nt">&lt;/head&gt;</span>
 <span class="nt">&lt;body&gt;</span>
 <span class="nt">&lt;/body&gt;</span>
<span class="nt">&lt;/html&gt;</span>
</code></pre></div></div>

<p>Next, I assemble a list of the individual blog posts. When I was
actually writing the script, I first downloaded them locally with <a href="/blog/2016/06/16/">my
favorite download tool</a>, curl, and ran the script against local
copies. I didn’t want to hit the web server each time I tested. (Note:
I’ve truncated these URLs to fit in this article.)</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">chapters</span> <span class="o">=</span> <span class="p">[</span>
    <span class="s">"https://banter-latte.com/2007/06/26/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/07/03/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/07/10/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/07/17/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/07/24/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/07/31/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/08/07/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/08/14/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/08/21/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/08/28/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/09/04/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/09/20/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/09/25/..."</span><span class="p">,</span>
    <span class="s">"https://banter-latte.com/2007/10/02/..."</span>
<span class="p">]</span>
</code></pre></div></div>

<p>I visit a few of these pages in my browser to determine which part of
the page I want to extract. I want to look closely enough to see what
I’m doing, but not <em>too</em> closely, so as not to spoil myself! Right clicking
the content in the browser and selecting “Inspect Element” (Firefox) or
“Inspect” (Chrome) pops up a pane to structurally navigate the page.
“View Page Source” would work, too, especially since this is static
content, but I find the developer pane easier to read. Plus it hides
most of the content, revealing only the structure.</p>

<p>The content is contained in a <code class="language-plaintext highlighter-rouge">div</code> with the class <code class="language-plaintext highlighter-rouge">entry-content</code>. I
can use a selector to isolate this element and extract its child <code class="language-plaintext highlighter-rouge">p</code>
elements. However, it’s not quite so simple. Each chapter starts with a
bit of commentary that’s not part of the book and that I don’t want to
include in my extract. It’s separated from the real content by an <code class="language-plaintext highlighter-rouge">hr</code>
element. There’s also a footer below another <code class="language-plaintext highlighter-rouge">hr</code> element, likely put
there by someone who wasn’t paying attention to the page structure. It’s
not quite the shining example of semantic markup, but it’s regular
enough I can manage.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;body&gt;</span>
  <span class="nt">&lt;main</span> <span class="na">class=</span><span class="s">"site-main"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"entry-body"</span><span class="nt">&gt;</span>
      <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"entry-content"</span><span class="nt">&gt;</span>
        <span class="nt">&lt;p&gt;</span>A little intro.<span class="nt">&lt;/p&gt;</span>
        <span class="nt">&lt;p&gt;</span>Some more intro.<span class="nt">&lt;/p&gt;</span>
        <span class="nt">&lt;hr/&gt;</span>
        <span class="nt">&lt;p&gt;</span>Actual book content.<span class="nt">&lt;/p&gt;</span>
        <span class="nt">&lt;p&gt;</span>More content.<span class="nt">&lt;/p&gt;</span>
        <span class="nt">&lt;hr/&gt;</span>
        <span class="nt">&lt;p&gt;</span>Footer navigation junk.<span class="nt">&lt;/p&gt;</span>
      <span class="nt">&lt;/div&gt;</span>
    <span class="nt">&lt;/div&gt;</span>
  <span class="nt">&lt;/main&gt;</span>
<span class="nt">&lt;/body&gt;</span>
</code></pre></div></div>
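
<p>The hr-counting extraction this structure calls for can be sketched
with stdlib-only stand-ins for the child elements (hypothetical data
mirroring the markup above):</p>

```python
# Hypothetical (name, text) pairs standing in for the children of
# div.entry-content in the structure shown above.
children = [
    ('p', 'A little intro.'),
    ('p', 'Some more intro.'),
    ('hr', None),
    ('p', 'Actual book content.'),
    ('p', 'More content.'),
    ('hr', None),
    ('p', 'Footer navigation junk.'),
]

# Keep only the paragraphs between the first and second hr.
hr_count = 0
kept = []
for name, text in children:
    if name == 'hr':
        hr_count += 1
    elif name == 'p' and hr_count == 1:
        kept.append(text)

print(kept)   # ['Actual book content.', 'More content.']
```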

<p>The next step is visiting each of these pages. I use <code class="language-plaintext highlighter-rouge">enumerate</code> since I
want the chapter numbers when inserting <code class="language-plaintext highlighter-rouge">h1</code> chapter elements. Pandoc
will use these to build the table of contents.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">chapter</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">chapters</span><span class="p">):</span>
    <span class="c1"># Construct h1 for the chapter
</span>    <span class="n">header</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'h1'</span><span class="p">)</span>
    <span class="n">header</span><span class="p">.</span><span class="n">string</span> <span class="o">=</span> <span class="s">'Chapter %d'</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,)</span>
    <span class="n">body</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">header</span><span class="p">)</span>
</code></pre></div></div>

<p>Next grab the page content using <code class="language-plaintext highlighter-rouge">urllib</code> and parse it with
BeautifulSoup. I’m using a selector to locate the <code class="language-plaintext highlighter-rouge">div</code> with the
book content.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># Load chapter content
</span>    <span class="k">with</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">chapter</span><span class="p">)</span> <span class="k">as</span> <span class="n">url</span><span class="p">:</span>
        <span class="n">page</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">page</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">'.entry-content'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>

<p>Finally I iterate over the child elements of the <code class="language-plaintext highlighter-rouge">div.entry-content</code>
element. I keep a running count of <code class="language-plaintext highlighter-rouge">hr</code> elements and only extract
content while I’ve seen exactly one <code class="language-plaintext highlighter-rouge">hr</code> element.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># Append content between hr elements
</span>    <span class="n">hr_count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">content</span><span class="p">.</span><span class="n">children</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">child</span><span class="p">.</span><span class="n">name</span> <span class="o">==</span> <span class="s">'hr'</span><span class="p">:</span>
            <span class="n">hr_count</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">elif</span> <span class="n">child</span><span class="p">.</span><span class="n">name</span> <span class="o">==</span> <span class="s">'p'</span> <span class="ow">and</span> <span class="n">hr_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="n">child</span><span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{}</span>
            <span class="k">if</span> <span class="n">child</span><span class="p">.</span><span class="n">string</span> <span class="o">==</span> <span class="s">'#'</span><span class="p">:</span>
                <span class="n">body</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">doc</span><span class="p">.</span><span class="n">new_tag</span><span class="p">(</span><span class="s">'hr'</span><span class="p">))</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">body</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>
</code></pre></div></div>

<p>If it’s a <code class="language-plaintext highlighter-rouge">p</code> element, I copy it into the output document, taking a
moment to strip away any attributes present on the <code class="language-plaintext highlighter-rouge">p</code> tag, since, for
some reason, some of these elements have old-fashioned alignment
attributes in the original content.</p>

<p>The original content also uses the text “<code class="language-plaintext highlighter-rouge">#</code>” by itself in a <code class="language-plaintext highlighter-rouge">p</code> to
separate sections rather than using the appropriate markup. Despite
being semantically incorrect, I’m thankful for this since more <code class="language-plaintext highlighter-rouge">hr</code>
elements would have complicated matters further. I convert these to the
correct markup for the final document.</p>

<p>Finally I pretty print the result:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">doc</span><span class="p">.</span><span class="n">prettify</span><span class="p">())</span>
</code></pre></div></div>

<p>Alternatively I could pipe it through <a href="http://tidy.sourceforge.net/">tidy</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 extract.py | tidy -indent -utf8 &gt; output.html
</code></pre></div></div>

<p>A brief inspection with a browser indicates that everything seems to
have come out correctly. I won’t know for sure, though, until I actually
read through the whole book. Finally I have Pandoc perform the
conversion.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pandoc -t epub3 -o output.epub output.html
</code></pre></div></div>

<p>And that’s it! It’s ready to read offline in my e-book reader of
choice. The crude version of my script took around 15–20 minutes to
write and test, so I had an e-book conversion in under 30 minutes.
That’s about as long as I was willing to spend to get it. Tidying the
script up for this article took a lot longer.</p>

<p>I don’t have permission to share the resulting e-book, but I can share
my script so that you can generate your own, at least as long as it’s
hosted at the same place with the same structure.</p>

<ul>
  <li><a href="/download/leather/extract.py" class="download">extract.py</a></li>
</ul>

]]>
    </content>
  </entry>
  <entry>
    <title>The Adversarial Implementation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/05/03/"/>
    <id>urn:uuid:e6f370f9-1d35-3295-3bd5-74ae20c52a0e</id>
    <updated>2017-05-03T17:51:53Z</updated>
    <category term="c"/><category term="python"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>When <a href="/blog/2017/03/30/">coding against a standard</a>, whether it’s a programming
language specification or an open API with multiple vendors, a common
concern is the conformity of a particular construct to the standard.
This cannot be determined simply by experimentation, since a piece of
code may work correctly due only to the specifics of a particular
implementation. It works <em>today</em> with <em>this</em> implementation, but it
may not work <em>tomorrow</em> or with a <em>different</em> implementation.
Sometimes an implementation will warn about the use of non-standard
behavior, but this isn’t always the case.</p>

<p>When I’m reasoning about whether or not something is allowed, I like to
imagine an <em>adversarial implementation</em>. If the standard allows some
freedom, this implementation takes an imaginative or unique approach. It
chooses <a href="/blog/2016/05/30/">non-obvious interpretations</a> with possibly unexpected,
but valid, results. This is nearly the opposite of <a href="https://groups.google.com/forum/m/#!msg/boring-crypto/48qa1kWignU/o8GGp2K1DAAJ">djb’s hypothetical
boringcc</a>, though some of the ideas are similar.</p>

<p>Many argue that <a href="http://yarchive.net/comp/linux/gcc.html">this is already the case</a> with modern C and C++
optimizing compilers. Compiler writers are already <a href="http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html">creative with the
standard</a> in order to squeeze out more performance, even if it’s
at odds with the programmer’s actual intentions. The most prominent
example in C and C++ is <em>strict aliasing</em>, where the optimizer is
deliberately blinded to certain kinds of aliasing because the standard
allows it to be, eliminating some (possibly important) loads. This
happens despite the compiler’s ability to trivially prove that two
particular objects really do alias.</p>

<p>I want to be clear that I’m not talking about the <a href="http://www.catb.org/jargon/html/N/nasal-demons.html">nasal daemon</a>
kind of creativity. That’s not a helpful thought experiment. What I
mean is this: <strong>Can I imagine a conforming implementation that breaks
any assumptions made by the code?</strong></p>

<p>In practice, compilers typically have to bridge multiple
specifications: the language standard, the <a href="/blog/2016/11/17/">platform ABI</a>, and
the operating system interface (process startup, syscalls, etc.). This
really ties their hands on how creative they can be with any one of the
specifications. Depending on the situation, the imaginary adversarial
implementation isn’t necessarily running on any particular platform.
If our program is expected to have a long life, useful for many years
to come, we should avoid making too many assumptions about future
computers and imagine an adversarial compiler with few limitations.</p>

<h3 id="c-example">C example</h3>

<p>Take this bit of C:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">printf</span><span class="p">(</span><span class="s">"%d"</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">foo</span><span class="p">));</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">printf</code> function is variadic, and it relies entirely on the format
string in order to correctly handle all its arguments. The <code class="language-plaintext highlighter-rouge">%d</code>
specifier means that its matching argument is of type <code class="language-plaintext highlighter-rouge">int</code>. The result
of the <code class="language-plaintext highlighter-rouge">sizeof</code> operator is an integer of type <code class="language-plaintext highlighter-rouge">size_t</code>, which has a
different sign and may even be a different size.</p>

<p>Typically this code will work just fine. An <code class="language-plaintext highlighter-rouge">int</code> and <code class="language-plaintext highlighter-rouge">size_t</code> are
generally passed the same way, the actual value probably fits in an
<code class="language-plaintext highlighter-rouge">int</code>, and two’s complement means the signedness isn’t an issue due to
the value being positive. From the <code class="language-plaintext highlighter-rouge">printf</code> point of view, it
typically can’t detect that the type is wrong, so everything works by
chance. In fact, it’s hard to imagine a real situation where this
wouldn’t work fine.</p>

<p>However, this is still undefined behavior — a scenario where a creative
adversarial implementation can break things. (The portable fix is the
<code class="language-plaintext highlighter-rouge">%zu</code> specifier, or an explicit cast to <code class="language-plaintext highlighter-rouge">int</code>.) In this case there are a
few options for an adversarial implementation:</p>

<ol>
  <li>Arguments of type <code class="language-plaintext highlighter-rouge">int</code> and <code class="language-plaintext highlighter-rouge">size_t</code> are passed differently, so
<code class="language-plaintext highlighter-rouge">printf</code> will load the argument from the wrong place.</li>
  <li>The implementation doesn’t use two’s complement and even small
positive values have different bit representations.</li>
  <li>The type of <code class="language-plaintext highlighter-rouge">foo</code> is given crazy padding for arbitrary reasons that
makes it so large it doesn’t fit in an <code class="language-plaintext highlighter-rouge">int</code>.</li>
</ol>

<p>What’s interesting about #1 is that <em>this has actually happened</em>. For
example, here’s a C source file.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">float</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">float</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">);</span>

<span class="kt">float</span>
<span class="nf">bar</span><span class="p">(</span><span class="kt">int</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And in another source file:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">float</span>
<span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">x</span><span class="p">;</span>  <span class="c1">// ignore x</span>
    <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="mi">2</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The type of argument <code class="language-plaintext highlighter-rouge">x</code> differs between the prototype and the
definition, which is undefined behavior. However, since this argument
is ignored, this code will still work correctly on many different
real-world computers, particularly where <code class="language-plaintext highlighter-rouge">float</code> and <code class="language-plaintext highlighter-rouge">int</code> arguments
are passed the same way (i.e. on the stack).</p>

<p>However, in 2003 the x86-64 CPU arrived with its new System V ABI.
Floating point and integer arguments were now passed differently, and
the types of preceding arguments mattered when deciding which register
to use. Some constructs that worked fine, by chance, prior to 2003 would
soon stop working due to what may have seemed like an adversarial
implementation years before.</p>

<h3 id="python-example">Python example</h3>

<p>Let’s look at some Python. This snippet opens a file a million times
without closing any handles.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">):</span>
    <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">"/dev/null"</span><span class="p">,</span> <span class="s">"r"</span><span class="p">)</span>
</code></pre></div></div>

<p>Assuming you have a <code class="language-plaintext highlighter-rouge">/dev/null</code>, this code will work fine without
throwing any exceptions on CPython, the most widely used Python
implementation. CPython uses a deterministic reference counting scheme,
and the handle is automatically closed as soon as its variable falls out
of scope. It’s like having an invisible <code class="language-plaintext highlighter-rouge">f.close()</code> at the end of the
block.</p>
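<p>You can watch this deterministic cleanup happen. A minimal sketch, not from
the original post, that uses <code>weakref</code> to observe the file object being
finalized the instant its last reference is dropped (true on CPython, but not
guaranteed by the language specification):</p>

```python
import os
import tempfile
import weakref

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path)
ref = weakref.ref(f)

f = None  # drop the only strong reference
# On CPython the reference count hits zero immediately, so the file
# object is finalized -- and the handle closed -- right here, with no
# garbage collection pause. On PyPy, ref() could still be live.
print(ref() is None)

os.remove(path)
```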

<p>However, this code is incorrect. The deterministic handle closing is
an implementation behavior, <a href="https://docs.python.org/3/reference/datamodel.html">not part of the specification</a>. The
operating system limits the number of files a process can have open at
once, and there’s a risk that this resource will run out even though
none of those handles are reachable. Imagine an adversarial Python
implementation trying to break this code. It could sufficiently delay
garbage collection, or even <a href="https://web.archive.org/web/0/https://blogs.msdn.microsoft.com/oldnewthing/20100809-00/?p=13203">have infinite memory</a>, omitting
garbage collection altogether.</p>

<p>Like before, such an implementation eventually did come about: PyPy, a
Python implementation written in Python with a JIT compiler. It uses (by
default) something closer to mark-and-sweep, not reference counting, and
those handles <a href="https://utcc.utoronto.ca/~cks/space/blog/programming/NondeterministicGCII">are left open</a> until the next collection.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt;&gt; for i in range(1, 1000000):
....     f = open("/dev/null", "r")
.... 
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 2, in &lt;module&gt;
IOError: [Errno 24] Too many open files: '/dev/null'
</code></pre></div></div>
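<p>The portable fix is to close each handle explicitly rather than lean on the
collector. A minimal sketch (not from the original post) using a <code>with</code>
statement, which closes the file deterministically on every implementation:</p>

```python
import os

# Same loop as before, but each handle is closed before the next open,
# so the loop cannot exhaust the process's file descriptor limit.
for i in range(1, 1000000):
    with open(os.devnull, "r") as f:
        pass  # f.close() runs when the block exits, even on PyPy
```

<p>In Python the loop variable and <code>f</code> survive the loop, so afterwards
<code>f.closed</code> is <code>True</code> regardless of when (or whether) the
collector runs.</p>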

<h3 id="a-tool-for-understanding-specifications">A tool for understanding specifications</h3>

<p>This fits right in with a broader method of self-improvement:
Occasionally put yourself in the implementor’s shoes. Think about what
it would take to correctly implement the code that you write, either
as a language or the APIs that you call. On reflection, you may find
that some of those things that <em>seem</em> cheap may not be. Your
assumptions may be reasonable, but not guaranteed. (Though it may be
that “reasonable” is perfectly sufficient for your situation.)</p>

<p>An adversarial implementation is one that challenges an assumption
you’ve taken for granted by turning it on its head.</p>

]]>
    </content>
  </entry>

</feed>
