<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged compsci at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/compsci/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/compsci/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:73aad984-a0c4-4870-b941-b92ebee6efda</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Unix "find" expressions compiled to bytecode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/12/23/"/>
    <id>urn:uuid:bbe2671b-378d-40b1-9564-c3a3b798dfb4</id>
    <updated>2025-12-23T04:20:22Z</updated>
    <category term="c"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>In preparation for a future project, I was thinking about the <a href="https://pubs.opengroup.org/onlinepubs/9799919799/utilities/find.html">unix
<code class="language-plaintext highlighter-rouge">find</code> utility</a>. It operates on file system hierarchies, with basic
operations selected and filtered using a specialized expression language.
Users compose operations using unary and binary operators, grouping with
parentheses for precedence. <code class="language-plaintext highlighter-rouge">find</code> may apply the expression to a great
many files, so compiling it into a bytecode, resolving as much as possible
ahead of time, and minimizing the per-element work, seems like a prudent
implementation strategy. With some thought, I worked out a technique to do
so, which was simpler than I expected, and I’m pleased with the results. I
was later surprised that all the real-world <code class="language-plaintext highlighter-rouge">find</code> implementations I examined
use <a href="https://craftinginterpreters.com/a-tree-walk-interpreter.html">tree-walk interpreters</a> instead. This article describes how my
compiler works, with a runnable example, and lists ideas for improvements.</p>

<p>For a quick overview, the syntax looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find [-H|-L] path... [expression...]
</code></pre></div></div>

<p>Technically at least one path is required, but most implementations imply
<code class="language-plaintext highlighter-rouge">.</code> when none are provided. If no expression is supplied, the default is
<code class="language-plaintext highlighter-rouge">-print</code>, i.e. print everything under each listed path. This prints the
whole tree, including directories, under the current directory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find .
</code></pre></div></div>

<p>To only print files, we could use <code class="language-plaintext highlighter-rouge">-type f</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f -a -print
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">-a</code> is the logical AND binary operator. <code class="language-plaintext highlighter-rouge">-print</code> always evaluates
to true. It’s never necessary to write <code class="language-plaintext highlighter-rouge">-a</code>, and adjacent operations are
implicitly joined with <code class="language-plaintext highlighter-rouge">-a</code>. We can keep chaining them, such as finding
all executable files:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f -executable -print
</code></pre></div></div>

<p>If no <code class="language-plaintext highlighter-rouge">-exec</code>, <code class="language-plaintext highlighter-rouge">-ok</code>, or <code class="language-plaintext highlighter-rouge">-print</code> (or similar side-effect extensions like
<code class="language-plaintext highlighter-rouge">-print0</code> or <code class="language-plaintext highlighter-rouge">-delete</code>) are present, the whole expression is wrapped in an
implicit <code class="language-plaintext highlighter-rouge">( expr ) -print</code>. So we could also write this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f -executable
</code></pre></div></div>

<p>Use <code class="language-plaintext highlighter-rouge">-o</code> for logical OR. To print all files with the executable bit <em>or</em>
with a <code class="language-plaintext highlighter-rouge">.exe</code> extension:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f \( -executable -o -name '*.exe' \)
</code></pre></div></div>

<p>I needed parentheses because <code class="language-plaintext highlighter-rouge">-o</code> has lower precedence than <code class="language-plaintext highlighter-rouge">-a</code>, and
because parentheses are shell metacharacters I also needed to escape them
for the shell. It’s a shame <code class="language-plaintext highlighter-rouge">find</code> didn’t use <code class="language-plaintext highlighter-rouge">[</code> and <code class="language-plaintext highlighter-rouge">]</code> instead! There’s
also a unary logical NOT operator, <code class="language-plaintext highlighter-rouge">!</code>. To print all non-executable files:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f ! -executable
</code></pre></div></div>

<p>Binary operators are short-circuiting, so this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find -type d -a -exec du -sh {} +
</code></pre></div></div>

<p>Only lists the sizes of directories: for anything else <code class="language-plaintext highlighter-rouge">-type d</code> fails, causing the
whole expression to evaluate to false without evaluating <code class="language-plaintext highlighter-rouge">-exec</code>. Or
equivalently with <code class="language-plaintext highlighter-rouge">-o</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find ! -type d -o -exec du -sh {} +
</code></pre></div></div>

<p>If it’s not a directory then the left-hand side evaluates to true, and the
right-hand side is not evaluated. All three implementations I examined
(GNU, BSD, BusyBox) have a <code class="language-plaintext highlighter-rouge">-regex</code> extension, and eagerly compile the
regular expression even if the operation is never evaluated:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -print -o -regex [
find: bad regex '[': Invalid regular expression
</code></pre></div></div>

<p>I was surprised by this because it doesn’t seem to be in the spirit of the
original utility (“The second expression shall not be evaluated if the
first expression is true.”), and I’m used to the idea of short-circuit
validation for the right-hand side of a logical expression. Recompiling
for each evaluation would be unwise, but it could happen lazily such that
an invalid regular expression only causes an error if it’s actually used.
No big deal, just a curiosity.</p>
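
<p>A minimal sketch of the lazy approach, using POSIX <code class="language-plaintext highlighter-rouge">regcomp</code> and <code class="language-plaintext highlighter-rouge">regexec</code>. The <code class="language-plaintext highlighter-rouge">LazyRegex</code> struct and <code class="language-plaintext highlighter-rouge">lazy_match</code> function are hypothetical names for illustration, not part of any of the implementations examined:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;regex.h&gt;
#include &lt;stdbool.h&gt;

// Hypothetical wrapper: the pattern is stored up front, compiled on demand.
typedef struct {
    const char *pattern;
    regex_t     re;
    bool        compiled;
} LazyRegex;

// Returns 1 on a match, 0 on no match, and -1 if the pattern is invalid.
// The pattern is compiled at most once, on the first evaluation.
int lazy_match(LazyRegex *r, const char *path)
{
    if (!r-&gt;compiled) {
        if (regcomp(&amp;r-&gt;re, r-&gt;pattern, REG_NOSUB)) {
            return -1;
        }
        r-&gt;compiled = true;
    }
    return regexec(&amp;r-&gt;re, path, 0, 0, 0) == 0;
}
</code></pre></div></div>

<p>With this arrangement an invalid pattern such as <code class="language-plaintext highlighter-rouge">[</code> is only reported if its operation is actually reached.</p>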

<h3 id="bytecode-design">Bytecode design</h3>

<p>A bytecode interpreter needs to track just one result at a time, making it
a single register machine, with a 1-bit register at that. I came up with
these five opcodes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>halt
not
braf   LABEL
brat   LABEL
action NAME [ARGS...]
</code></pre></div></div>

<p>Obviously <code class="language-plaintext highlighter-rouge">halt</code> stops the program. While I could just let it “run off the
end” it’s useful to have an actual instruction so that I can attach a
label and jump to it. The <code class="language-plaintext highlighter-rouge">not</code> opcode negates the register. <code class="language-plaintext highlighter-rouge">braf</code> is
“branch if false”, jumping (via relative immediate) to the labeled (in
printed form) instruction if the register is false. <code class="language-plaintext highlighter-rouge">brat</code> is “branch if
true”. Together they implement the <code class="language-plaintext highlighter-rouge">-a</code> and <code class="language-plaintext highlighter-rouge">-o</code> operators. In practice
there are no loops and jumps are always forward: <code class="language-plaintext highlighter-rouge">find</code> is <a href="/blog/2016/04/30/">not Turing
complete</a>.</p>
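
<p>A minimal sketch of how such a single-register machine might be interpreted. The <code class="language-plaintext highlighter-rouge">Insn</code> layout, the <code class="language-plaintext highlighter-rouge">run</code> function, and the <code class="language-plaintext highlighter-rouge">toy_action</code> callback standing in for real actions are assumptions for illustration, not taken from any existing implementation:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

// Hypothetical encoding, for illustration only.
typedef enum { OP_HALT, OP_NOT, OP_BRAF, OP_BRAT, OP_ACTION } Op;

typedef struct {
    Op  op;
    int arg;  // branch: offset from the next instruction; action: an ID
} Insn;

// Stand-in for evaluating one action (-type, -print, ...) on one path.
typedef bool (*ActionFn)(int id, void *ctx);

// Run a program against one path, returning the final register value.
bool run(const Insn *prog, ActionFn act, void *ctx)
{
    bool reg = true;  // the 1-bit register
    for (size_t pc = 0;;) {
        Insn in = prog[pc++];
        switch (in.op) {
        case OP_HALT:   return reg;
        case OP_NOT:    reg = !reg;              break;
        case OP_BRAF:   if (!reg) pc += in.arg;  break;
        case OP_BRAT:   if (reg)  pc += in.arg;  break;
        case OP_ACTION: reg = act(in.arg, ctx);  break;
        }
    }
}

// Toy action: bit "id" of truth is the result, and called records which
// actions were actually evaluated.
typedef struct { unsigned truth, called; } Toy;

bool toy_action(int id, void *ctx)
{
    Toy *t = ctx;
    t-&gt;called |= 1u &lt;&lt; id;
    return (t-&gt;truth &gt;&gt; id) &amp; 1u;
}
</code></pre></div></div>

<p>Branches never modify the register, so a taken <code class="language-plaintext highlighter-rouge">braf</code> leaves it false for whatever instruction it lands on.</p>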

<p>In a real implementation each possible action (<code class="language-plaintext highlighter-rouge">-name</code>, <code class="language-plaintext highlighter-rouge">-ok</code>, <code class="language-plaintext highlighter-rouge">-print</code>,
<code class="language-plaintext highlighter-rouge">-type</code>, etc.) would get a dedicated opcode. This requires implementing
each operator, at least in part, in order to correctly parse the whole
<code class="language-plaintext highlighter-rouge">find</code> expression. For now I’m just focused on the bytecode compiler, so
this opcode is a stand-in, and it kind of pretends based on looks. Each
action sets the register, and actions like <code class="language-plaintext highlighter-rouge">-print</code> always set it to true.
My compiler is <a href="https://github.com/skeeto/scratch/blob/c142e729/parsers/findc.c">called <strong><code class="language-plaintext highlighter-rouge">findc</code> (“find compiler”)</strong></a>.</p>

<p><strong>Update</strong>: Or try <a href="https://nullprogram.com/scratch/findc/">the <strong>online demo</strong></a> via Wasm! This version
includes a <a href="https://github.com/skeeto/scratch/commit/2c0a4b8f">peephole optimizer</a> I wrote after publishing this
article.</p>

<p>I assume readers of this program are familiar with the <a href="/blog/2025/01/19/"><code class="language-plaintext highlighter-rouge">push</code> macro</a>
and the <a href="/blog/2025/06/26/"><code class="language-plaintext highlighter-rouge">Slice</code> macro</a>. Because of the latter it requires a very
recent C compiler, like GCC 15 (e.g. via <a href="https://github.com/skeeto/w64devkit">w64devkit</a>) or Clang 22. Try
out some <code class="language-plaintext highlighter-rouge">find</code> commands and see how they appear as bytecode. The simplest
case is also optimal:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc
// path: .
        action  -print
        halt
</code></pre></div></div>

<p>Print the path then halt. Simple. Stepping it up:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc -type f -executable
// path: .
        action  -type f
        braf    L1
        action  -executable
L1:     braf    L2
        action  -print
L2:     halt
</code></pre></div></div>

<p>If the path is not a file, it skips over the rest of the program by way of
the second branch instruction. It’s correct, but already we can see room
for improvement. This would be better:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        action  -type f
        braf    L1
        action  -executable
        braf    L1
        action  -print
L1:     halt
</code></pre></div></div>

<p>More complex still:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc -type f \( -executable -o -name '*.exe' \)
// path: .
        action  -type f
        braf    L1
        action  -executable
        brat    L1
        action  -name *.exe
L1:     braf    L2
        action  -print
L2:     halt
</code></pre></div></div>

<p>Inside the parentheses, if <code class="language-plaintext highlighter-rouge">-executable</code> succeeds, the right-hand side is
skipped. Though the <code class="language-plaintext highlighter-rouge">brat</code> jumps straight to a <code class="language-plaintext highlighter-rouge">braf</code>. It would be better
to jump ahead one more instruction:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        action  -type f
        braf    L2
        action  -executable
        brat    L1
        action  -name *.exe
        braf    L2
L1:     action  -print
L2:     halt
</code></pre></div></div>

<p>Silly things aren’t optimized either:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc ! ! -executable
// path: .
        action  -executable
        not
        not
        braf    L1
        action  -print
L1:     halt
</code></pre></div></div>

<p>Two <code class="language-plaintext highlighter-rouge">not</code> in a row cancel out, and so these instructions could be
eliminated. Overall this compiler could benefit from a <a href="https://en.wikipedia.org/wiki/Peephole_optimization">peephole
optimizer</a>, scanning over the program repeatedly, making small
improvements until no more can be made:</p>

<ul>
  <li>Delete <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">not</code>.</li>
  <li>A <code class="language-plaintext highlighter-rouge">brat</code> to a <code class="language-plaintext highlighter-rouge">braf</code> re-targets ahead one instruction, and vice versa.</li>
  <li>Jumping onto an identical jump adopts its target for itself.</li>
  <li>A <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">braf</code> might convert to a <code class="language-plaintext highlighter-rouge">brat</code>, and vice versa.</li>
  <li>Delete side-effect-free instructions before <code class="language-plaintext highlighter-rouge">halt</code> (e.g. <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">halt</code>).</li>
  <li>Exploit always-true actions, e.g. <code class="language-plaintext highlighter-rouge">-print</code>-<code class="language-plaintext highlighter-rouge">braf</code> can drop the branch.</li>
</ul>
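
<p>Two of these rules only change branch offsets, so they need no fix-ups elsewhere in the program. A minimal sketch of those two, assuming a hypothetical <code class="language-plaintext highlighter-rouge">Insn</code> encoding where a branch stores its offset relative to the following instruction, and assuming every branch target is a valid instruction:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;

// Hypothetical encoding, for illustration only.
typedef enum { OP_HALT, OP_NOT, OP_BRAF, OP_BRAT, OP_ACTION } Op;
typedef struct { Op op; int arg; } Insn;  // arg: offset from next instruction

// One pass over the program. Returns true if anything changed, so the
// caller can repeat until no more improvements are found.
bool retarget(Insn *prog, int len)
{
    bool changed = false;
    for (int i = 0; i &lt; len; i++) {
        if (prog[i].op != OP_BRAF &amp;&amp; prog[i].op != OP_BRAT) {
            continue;
        }
        Insn t = prog[i + 1 + prog[i].arg];  // instruction at the target
        if (t.op != OP_BRAF &amp;&amp; t.op != OP_BRAT) {
            continue;
        }
        if (t.op == prog[i].op) {
            // Identical jump: it would be taken too, so adopt its target.
            prog[i].arg += 1 + t.arg;
        } else {
            // Opposite jump: it cannot be taken, so land just past it.
            prog[i].arg += 1;
        }
        changed = true;
    }
    return changed;
}
</code></pre></div></div>

<p>Applied to the <code class="language-plaintext highlighter-rouge">-type f \( -executable -o -name '*.exe' \)</code> program above, a single pass produces the hand-improved version: the first <code class="language-plaintext highlighter-rouge">braf</code> goes straight to <code class="language-plaintext highlighter-rouge">halt</code> and the <code class="language-plaintext highlighter-rouge">brat</code> lands on <code class="language-plaintext highlighter-rouge">-print</code>.</p>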

<p>Writing a bunch of peephole pattern matchers sounds kind of fun. Though my
compiler would first need a slightly richer representation in order to
detect and fix up changes to branches. One more for the road:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc -type f ! \( -executable -o -name '*.exe' \)
// path: .
        action  -type f
        braf    L1
        action  -executable
        brat    L2
        action  -name *.exe
L2:     not
L1:     braf    L3
        action  -print
L3:     halt
</code></pre></div></div>

<p>The suboptimal jumps hint at my compiler’s structure. If you’re feeling up
for a challenge, pause here to consider how you’d build this compiler, and
how it might produce these particular artifacts.</p>

<h3 id="parsing-and-compiling">Parsing and compiling</h3>

<p>Before I even considered the shape of the bytecode I knew I needed to
convert <code class="language-plaintext highlighter-rouge">find</code> infix into a compiler-friendly postfix. That is, this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-type f -a ! ( -executable -o -name *.exe )
</code></pre></div></div>

<p>Becomes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-type f -executable -name *.exe -o ! -a
</code></pre></div></div>

<p>Which, importantly, erases the parentheses. This comes in as an <code class="language-plaintext highlighter-rouge">argv</code>
array, so it’s already tokenized for us by the shell <a href="/blog/2022/02/18/">or runtime</a>. The
classic <a href="https://en.wikipedia.org/wiki/Shunting_yard_algorithm">shunting-yard algorithm</a> solves this problem easily enough.
We have an output queue that goes into the compiler, and a token stack for
tracking <code class="language-plaintext highlighter-rouge">-a</code>, <code class="language-plaintext highlighter-rouge">-o</code>, <code class="language-plaintext highlighter-rouge">!</code>, and <code class="language-plaintext highlighter-rouge">(</code>. Then we walk <code class="language-plaintext highlighter-rouge">argv</code> in order:</p>

<ul>
  <li>
    <p>Actions go straight into the output queue.</p>
  </li>
  <li>
    <p>If we see one of the special stack tokens we push it onto the stack,
first popping operators with greater precedence into the queue, stopping
at <code class="language-plaintext highlighter-rouge">(</code>.</p>
  </li>
  <li>
    <p>If we see <code class="language-plaintext highlighter-rouge">)</code> we pop the stack into the output queue until we see <code class="language-plaintext highlighter-rouge">(</code>.</p>
  </li>
</ul>

<p>When we’re out of tokens, pop the remaining stack into the queue. My
parser synthesizes <code class="language-plaintext highlighter-rouge">-a</code> where it’s implied, so the compiler always sees
logical AND. If the expression contains no <code class="language-plaintext highlighter-rouge">-exec</code>, <code class="language-plaintext highlighter-rouge">-ok</code>, or <code class="language-plaintext highlighter-rouge">-print</code>,
after processing is complete the parser puts <code class="language-plaintext highlighter-rouge">-print</code> then <code class="language-plaintext highlighter-rouge">-a</code> into the
queue, which effectively wraps the whole expression in <code class="language-plaintext highlighter-rouge">( expr ) -print</code>.
By clearing the stack first, the real expression is effectively wrapped in
parentheses, so no parenthesis tokens need to be synthesized.</p>
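
<p>The parsing steps above can be sketched roughly as follows. This is a simplified illustration, not the <code class="language-plaintext highlighter-rouge">findc</code> source: it assumes every action is a single token with no arguments, it omits the implicit <code class="language-plaintext highlighter-rouge">-print</code>, and the names <code class="language-plaintext highlighter-rouge">Yard</code>, <code class="language-plaintext highlighter-rouge">prec</code>, <code class="language-plaintext highlighter-rouge">push_op</code>, and <code class="language-plaintext highlighter-rouge">to_postfix</code> are hypothetical:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;string.h&gt;

typedef struct {
    const char **out;   int nout;  // output queue
    const char **stack; int top;   // operator stack
} Yard;

// Precedence of an operator token, or 0 for anything else, including "(".
static int prec(const char *tok)
{
    if (!strcmp(tok, "!"))  return 3;
    if (!strcmp(tok, "-a")) return 2;
    if (!strcmp(tok, "-o")) return 1;
    return 0;
}

// Push an operator, first popping operators of greater precedence into the
// queue. A "(" has precedence 0, so popping stops there.
static void push_op(Yard *y, const char *op)
{
    while (y-&gt;top &amp;&amp; prec(y-&gt;stack[y-&gt;top-1]) &gt; prec(op)) {
        y-&gt;out[y-&gt;nout++] = y-&gt;stack[--y-&gt;top];
    }
    y-&gt;stack[y-&gt;top++] = op;
}

// Convert n infix tokens to postfix. Returns the output length, or -1 on
// mismatched parentheses. Both out[] and stack[] need room for 2*n
// entries because -a tokens may be synthesized.
int to_postfix(int n, char **in, const char **out, const char **stack)
{
    Yard y = {out, 0, stack, 0};
    int operand = 0;  // was the previous token an action or ")"?
    for (int i = 0; i &lt; n; i++) {
        const char *tok = in[i];
        if (!strcmp(tok, ")")) {
            while (y.top &amp;&amp; strcmp(y.stack[y.top-1], "(")) {
                y.out[y.nout++] = y.stack[--y.top];
            }
            if (!y.top) return -1;  // no matching "("
            y.top--;                // discard the "("
            operand = 1;
        } else if (prec(tok) == 1 || prec(tok) == 2) {
            push_op(&amp;y, tok);
            operand = 0;
        } else {
            if (operand) push_op(&amp;y, "-a");  // implied AND
            if (!strcmp(tok, "(")) {
                y.stack[y.top++] = tok;
                operand = 0;
            } else if (prec(tok) == 3) {
                push_op(&amp;y, tok);
                operand = 0;
            } else {
                y.out[y.nout++] = tok;  // an action
                operand = 1;
            }
        }
    }
    while (y.top) {
        const char *tok = y.stack[--y.top];
        if (!strcmp(tok, "(")) return -1;  // unclosed "("
        y.out[y.nout++] = tok;
    }
    return y.nout;
}
</code></pre></div></div>

<p>Given the tokens <code class="language-plaintext highlighter-rouge">-nouser ! ( -nogroup -o -prune )</code>, this sketch emits <code class="language-plaintext highlighter-rouge">-nouser -nogroup -prune -o ! -a</code>, the same shape as the example above.</p>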

<p>I’ve used the shunting-yard algorithm many times before, so this part was
easy. The new part was coming up with an algorithm to convert a series of
postfix tokens into bytecode. My solution is the compiler <strong>maintains a
stack of bytecode fragments</strong>. That is, each stack element is a sequence
of one or more bytecode instructions. Branches use relative addresses, so
they’re position-independent, and I can concatenate code fragments without
any branch fix-ups. It takes the following actions from queue tokens:</p>

<ul>
  <li>
    <p>For an action token, create an <code class="language-plaintext highlighter-rouge">action</code> instruction, and push it onto
the fragment stack as a new fragment.</p>
  </li>
  <li>
    <p>For a <code class="language-plaintext highlighter-rouge">!</code> token, pop the top fragment, append a <code class="language-plaintext highlighter-rouge">not</code> instruction, and
push it back onto the stack.</p>
  </li>
  <li>
    <p>For a <code class="language-plaintext highlighter-rouge">-a</code> token, pop the top two fragments, join them with a <code class="language-plaintext highlighter-rouge">braf</code> in
the middle which jumps just beyond the second fragment. That is, if the
first fragment evaluates to false, skip over the second fragment into
whatever follows.</p>
  </li>
  <li>
    <p>For a <code class="language-plaintext highlighter-rouge">-o</code> token, just like <code class="language-plaintext highlighter-rouge">-a</code> but use <code class="language-plaintext highlighter-rouge">brat</code>. If the first fragment
is true, we skip over the second fragment.</p>
  </li>
</ul>

<p>If the expression is valid, at the end of this process the stack contains
exactly one fragment. Append a <code class="language-plaintext highlighter-rouge">halt</code> instruction to this fragment, and
that’s our program! If the final fragment contained a branch just beyond
its end, this <code class="language-plaintext highlighter-rouge">halt</code> is that branch target. With a few peephole
optimizations, this could probably be an optimal program for this instruction set.</p>
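
<p>A minimal sketch of the fragment-stack idea, under one simplifying assumption that may differ from <code class="language-plaintext highlighter-rouge">findc</code>: fragments on the stack always sit next to each other in a single instruction array, so the stack only records where each fragment starts. The <code class="language-plaintext highlighter-rouge">Insn</code> layout and the <code class="language-plaintext highlighter-rouge">compile</code> function are hypothetical:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;string.h&gt;

// Hypothetical encoding, for illustration only.
typedef enum { OP_HALT, OP_NOT, OP_BRAF, OP_BRAT, OP_ACTION } Op;

typedef struct {
    Op          op;
    int         off;   // branches: offset from the next instruction
    const char *name;  // actions: the token itself
} Insn;

// Compile n postfix tokens into prog[], which needs room for n+1
// instructions. starts[] needs room for n entries. Returns the instruction
// count, or -1 if the expression is invalid.
int compile(int n, const char **postfix, Insn *prog, int *starts)
{
    int len = 0, depth = 0;
    for (int i = 0; i &lt; n; i++) {
        const char *tok = postfix[i];
        if (!strcmp(tok, "!")) {
            if (depth &lt; 1) return -1;
            prog[len++] = (Insn){OP_NOT, 0, 0};  // extends the top fragment
        } else if (!strcmp(tok, "-a") || !strcmp(tok, "-o")) {
            if (depth &lt; 2) return -1;
            int second = starts[--depth];  // top fragment starts here
            // Open a one-instruction gap between the two fragments.
            // Relative branches in the moved fragment remain valid.
            memmove(prog+second+1, prog+second, (len-second)*sizeof(Insn));
            Op op = tok[1] == 'a' ? OP_BRAF : OP_BRAT;
            prog[second] = (Insn){op, len-second, 0};  // skip second fragment
            len++;
            // The merged fragment keeps the first fragment's start.
        } else {
            starts[depth++] = len;  // a new single-instruction fragment
            prog[len++] = (Insn){OP_ACTION, 0, tok};
        }
    }
    if (depth != 1) return -1;
    prog[len++] = (Insn){OP_HALT, 0, 0};
    return len;
}
</code></pre></div></div>

<p>For the postfix form of <code class="language-plaintext highlighter-rouge">-type f -executable -print</code>, this reproduces the six-instruction program shown earlier, including the <code class="language-plaintext highlighter-rouge">braf</code> that lands on another <code class="language-plaintext highlighter-rouge">braf</code>: a branch that targets the end of its fragment ends up pointing at whatever instruction is placed there next.</p>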

]]>
    </content>
  </entry>
  <entry>
    <title>State machines are wonderful tools</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/12/31/"/>
    <id>urn:uuid:c93d7a7b-6ae0-4b7e-afa6-424ef40b9d9c</id>
    <updated>2020-12-31T22:48:13Z</updated>
    <category term="compsci"/><category term="c"/><category term="python"/><category term="lua"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=25601821">on Hacker News</a>.</em></p>

<p>I love when my current problem can be solved with a state machine. They’re
fun to design and implement, and I have high confidence about correctness.
They tend to:</p>

<ol>
  <li>Present <a href="/blog/2018/06/10/">minimal, tidy interfaces</a></li>
  <li>Require few, fixed resources</li>
  <li>Hold no opinions about input and output</li>
  <li>Have a compact, concise implementation</li>
  <li>Be easy to reason about</li>
</ol>

<p>State machines are perhaps one of those concepts you heard about in
college but never put into practice. Maybe you use them regularly.
Regardless, you certainly run into them regularly, from <a href="https://swtch.com/~rsc/regexp/">regular
expressions</a> to traffic lights.</p>

<!--more-->

<h3 id="morse-code-decoder-state-machine">Morse code decoder state machine</h3>

<p>Inspired by <a href="https://possiblywrong.wordpress.com/2020/11/21/among-us-morse-code-puzzle/">a puzzle</a>, I came up with this deterministic state
machine for decoding <a href="https://en.wikipedia.org/wiki/Morse_code">Morse code</a>. It accepts a dot (<code class="language-plaintext highlighter-rouge">'.'</code>), dash
(<code class="language-plaintext highlighter-rouge">'-'</code>), or terminator (0) one at a time, advancing through a state
machine step by step:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">morse_decode</span><span class="p">(</span><span class="kt">int</span> <span class="n">state</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">t</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="mh">0x03</span><span class="p">,</span> <span class="mh">0x3f</span><span class="p">,</span> <span class="mh">0x7b</span><span class="p">,</span> <span class="mh">0x4f</span><span class="p">,</span> <span class="mh">0x2f</span><span class="p">,</span> <span class="mh">0x63</span><span class="p">,</span> <span class="mh">0x5f</span><span class="p">,</span> <span class="mh">0x77</span><span class="p">,</span> <span class="mh">0x7f</span><span class="p">,</span> <span class="mh">0x72</span><span class="p">,</span>
        <span class="mh">0x87</span><span class="p">,</span> <span class="mh">0x3b</span><span class="p">,</span> <span class="mh">0x57</span><span class="p">,</span> <span class="mh">0x47</span><span class="p">,</span> <span class="mh">0x67</span><span class="p">,</span> <span class="mh">0x4b</span><span class="p">,</span> <span class="mh">0x81</span><span class="p">,</span> <span class="mh">0x40</span><span class="p">,</span> <span class="mh">0x01</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x68</span><span class="p">,</span> <span class="mh">0x51</span><span class="p">,</span> <span class="mh">0x32</span><span class="p">,</span> <span class="mh">0x88</span><span class="p">,</span> <span class="mh">0x34</span><span class="p">,</span> <span class="mh">0x8c</span><span class="p">,</span> <span class="mh">0x92</span><span class="p">,</span> <span class="mh">0x6c</span><span class="p">,</span> <span class="mh">0x02</span><span class="p">,</span>
        <span class="mh">0x03</span><span class="p">,</span> <span class="mh">0x18</span><span class="p">,</span> <span class="mh">0x14</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x0c</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x08</span><span class="p">,</span> <span class="mh">0x1c</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x20</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x28</span><span class="p">,</span> <span class="mh">0x04</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x30</span><span class="p">,</span> <span class="mh">0x31</span><span class="p">,</span> <span class="mh">0x32</span><span class="p">,</span> <span class="mh">0x33</span><span class="p">,</span> <span class="mh">0x34</span><span class="p">,</span> <span class="mh">0x35</span><span class="p">,</span>
        <span class="mh">0x36</span><span class="p">,</span> <span class="mh">0x37</span><span class="p">,</span> <span class="mh">0x38</span><span class="p">,</span> <span class="mh">0x39</span><span class="p">,</span> <span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x42</span><span class="p">,</span> <span class="mh">0x43</span><span class="p">,</span> <span class="mh">0x44</span><span class="p">,</span> <span class="mh">0x45</span><span class="p">,</span> <span class="mh">0x46</span><span class="p">,</span>
        <span class="mh">0x47</span><span class="p">,</span> <span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0x4a</span><span class="p">,</span> <span class="mh">0x4b</span><span class="p">,</span> <span class="mh">0x4c</span><span class="p">,</span> <span class="mh">0x4d</span><span class="p">,</span> <span class="mh">0x4e</span><span class="p">,</span> <span class="mh">0x4f</span><span class="p">,</span> <span class="mh">0x50</span><span class="p">,</span>
        <span class="mh">0x51</span><span class="p">,</span> <span class="mh">0x52</span><span class="p">,</span> <span class="mh">0x53</span><span class="p">,</span> <span class="mh">0x54</span><span class="p">,</span> <span class="mh">0x55</span><span class="p">,</span> <span class="mh">0x56</span><span class="p">,</span> <span class="mh">0x57</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span> <span class="mh">0x59</span><span class="p">,</span> <span class="mh">0x5a</span>
    <span class="p">};</span>
    <span class="kt">int</span> <span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="p">[</span><span class="o">-</span><span class="n">state</span><span class="p">];</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mh">0x00</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&gt;&gt;</span> <span class="mi">2</span> <span class="o">?</span> <span class="n">t</span><span class="p">[(</span><span class="n">v</span> <span class="o">&gt;&gt;</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">63</span><span class="p">]</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">case</span> <span class="mh">0x2e</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&amp;</span>  <span class="mi">2</span> <span class="o">?</span> <span class="n">state</span><span class="o">*</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">case</span> <span class="mh">0x2d</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&amp;</span>  <span class="mi">1</span> <span class="o">?</span> <span class="n">state</span><span class="o">*</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="nl">default:</span>   <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It typically compiles to under 200 bytes (table included), requires only a
few bytes of memory to operate, and will fit on even the smallest of
microcontrollers. The full source listing, documentation, and
comprehensive test suite:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/parsers/morsecode.c">https://github.com/skeeto/scratch/blob/master/parsers/morsecode.c</a></p>

<p>The state machine is trie-shaped, and the 100-byte table <code class="language-plaintext highlighter-rouge">t</code> is the static
<a href="/blog/2016/11/15/">encoding of the Morse code trie</a>:</p>

<p><a href="/img/diagram/morse.dot"><img src="/img/diagram/morse.svg" alt="" /></a></p>

<p>Dots traverse left, dashes right, terminals emit the character at the
current node (terminal state). Stopping on red nodes, or attempting to
take an unlisted edge is an error (invalid input).</p>

<p>Each node in the trie is a byte in the table. Dot and dash each have a bit
indicating if their edge exists. The remaining bits index into a 1-based
character table (at the end of <code class="language-plaintext highlighter-rouge">t</code>), and a 0 “index” indicates an empty
(red) node. The nodes themselves are laid out as <a href="https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation">a binary heap in an
array</a>: the left and right children of the node at <code class="language-plaintext highlighter-rouge">i</code> are found at
<code class="language-plaintext highlighter-rouge">i*2+1</code> and <code class="language-plaintext highlighter-rouge">i*2+2</code>. No need to <a href="/blog/2020/10/19/#minimax-costs">waste memory storing edges</a>!</p>
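
<p>To make the layout concrete, here is a minimal sketch of the same idea
on a seven-node slice of the trie (E, T, I, A, N, M). The bit positions,
the <code class="language-plaintext highlighter-rouge">NODE</code> macro, and the <code class="language-plaintext highlighter-rouge">lookup</code> function are assumptions made for
illustration, and are not necessarily how <code class="language-plaintext highlighter-rouge">t</code> is packed:</p>

```c
/* Assumed byte layout for this sketch (not necessarily the layout of t):
 *   bit 0     dash edge exists
 *   bit 1     dot edge exists
 *   bits 2+   1-based index into chars[], 0 meaning an empty node
 */
#define NODE(idx, dot, dash) ((idx) << 2 | (dot) << 1 | (dash))

static const unsigned char nodes[] = {
    NODE(0, 1, 1),  /* 0: root, empty */
    NODE(1, 1, 1),  /* 1: E  .        */
    NODE(2, 1, 1),  /* 2: T  -        */
    NODE(3, 0, 0),  /* 3: I  ..       */
    NODE(4, 0, 0),  /* 4: A  .-       */
    NODE(5, 0, 0),  /* 5: N  -.       */
    NODE(6, 0, 0),  /* 6: M  --       */
};
static const char chars[] = "ETIANM";

/* Walk the heap-ordered array; return the character, or 0 if invalid. */
static int lookup(const char *code)
{
    int i = 0;
    for (; *code; code++) {
        int dash = *code == '-';
        if (*code != '.' && !dash) {
            return 0;  /* not a Morse symbol */
        }
        int shift = dash ? 0 : 1;
        if (!((nodes[i] >> shift) & 1)) {
            return 0;  /* edge does not exist */
        }
        i = i*2 + 1 + dash;  /* left child for dot, right for dash */
    }
    int idx = nodes[i] >> 2;
    return idx ? chars[idx - 1] : 0;
}
```

<p>With this slice, <code class="language-plaintext highlighter-rouge">lookup(".-")</code> lands on node 4 and returns <code class="language-plaintext highlighter-rouge">'A'</code>, while
<code class="language-plaintext highlighter-rouge">lookup("...")</code> returns 0 because node 3 has no dot edge here.</p>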

<p>Since C sadly does not have multiple return values, I’m using the sign bit
of the return value to create a kind of sum type. A negative return value
is a state — which is why the state is negated internally before use. A
positive result is a character output. If zero, the input was invalid.
Only the initial state is non-negative (zero), which is fine since it’s,
by definition, not possible to traverse to the initial state. No <code class="language-plaintext highlighter-rouge">c</code> input
will produce a bad state.</p>
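
<p>From the caller's side, the convention looks something like the
following sketch. <code class="language-plaintext highlighter-rouge">toy_decode</code> is a hypothetical stand-in that follows
the same return convention but only knows six letters, and
<code class="language-plaintext highlighter-rouge">decode_letter</code> is an illustrative driver loop, not part of the original
source:</p>

```c
/* Toy step function using the same return convention:
 *   < 0   next state (pass it back in)
 *   > 0   decoded character
 *   == 0  invalid input
 * It only covers E, T, I, A, N, M. The input c is '.', '-', or 0 to
 * terminate the current letter.
 */
static int toy_decode(int state, int c)
{
    static const char out[] = {0, 'E', 'T', 'I', 'A', 'N', 'M'};
    if (state > 0 || state < -6) {
        return 0;  /* not a state this function ever returned */
    }
    int i = -state;  /* states arrive negated */
    switch (c) {
    case 0:   return out[i];  /* the root holds 0: nothing decoded */
    case '.': i = i*2 + 1; break;
    case '-': i = i*2 + 2; break;
    default:  return 0;
    }
    return i < 7 ? -i : 0;
}

/* Feed one letter's worth of dots and dashes, then the terminator. */
static int decode_letter(const char *s)
{
    int state = 0;
    for (;; s++) {
        state = toy_decode(state, *s);
        if (state >= 0) {
            return state;  /* a character, or 0 on error */
        }
    }
}
```

<p>The loop never has to ask which kind of value it received beyond
checking the sign, and it cannot run past the terminator since a 0 input
always produces a non-negative result.</p>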

<p>In the original problem the terminals were missing. Despite being a <em>state
machine</em>, <code class="language-plaintext highlighter-rouge">morse_decode</code> is a pure function. The caller can save their
position in the trie by saving the state integer and trying different
inputs from that state.</p>

<h3 id="utf-8-decoder-state-machine">UTF-8 decoder state machine</h3>

<p>The classic UTF-8 decoder state machine is <a href="https://bjoern.hoehrmann.de/utf-8/decoder/dfa/">Bjoern Hoehrmann’s Flexible
and Economical UTF-8 Decoder</a>. It packs the entire state machine into
a relatively small table using clever tricks. It’s easily my favorite
UTF-8 decoder.</p>

<p>I wanted to try my own hand at it, so I re-derived the same canonical
UTF-8 automaton:</p>

<p><a href="/img/diagram/utf8.dot"><img src="/img/diagram/utf8.svg" alt="" /></a></p>

<p>Then I encoded this diagram directly into a much larger (2,064-byte), less
elegant table, too large to display inline here:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/parsers/utf8_decode.c">https://github.com/skeeto/scratch/blob/master/parsers/utf8_decode.c</a></p>

<p>However, the trade-off is that the executable code is smaller, faster, and
<a href="/blog/2017/10/06/">branchless again</a> (by accident, I swear!):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">utf8_decode</span><span class="p">(</span><span class="kt">int</span> <span class="n">state</span><span class="p">,</span> <span class="kt">long</span> <span class="o">*</span><span class="n">cp</span><span class="p">,</span> <span class="kt">int</span> <span class="n">byte</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">signed</span> <span class="kt">char</span> <span class="n">table</span><span class="p">[</span><span class="mi">8</span><span class="p">][</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">masks</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
    <span class="kt">int</span> <span class="n">next</span> <span class="o">=</span> <span class="n">table</span><span class="p">[</span><span class="n">state</span><span class="p">][</span><span class="n">byte</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cp</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">cp</span> <span class="o">&lt;&lt;</span> <span class="mi">6</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">byte</span> <span class="o">&amp;</span> <span class="n">masks</span><span class="p">[</span><span class="o">!</span><span class="n">state</span><span class="p">][</span><span class="n">next</span><span class="o">&amp;</span><span class="mi">7</span><span class="p">]);</span>
    <span class="k">return</span> <span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Like Bjoern’s decoder, there’s a code point accumulator. The <em>real</em> state
machine has 1,109,950 terminal states, and many more edges and nodes. The
accumulator is an optimization to track exactly which edge was taken to
which node without having to represent such a monstrosity.</p>
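
<p>As a worked example of just the accumulator step, here is a sketch that
folds the three bytes of U+20AC into a code point. The helper names and
the hard-coded masks are assumptions for illustration. The real decoder
gets its masks from the <code class="language-plaintext highlighter-rouge">masks</code> table and validates the bytes through
<code class="language-plaintext highlighter-rouge">table</code>, neither of which is reproduced here:</p>

```c
/* Fold one byte into the code point accumulator. The mask selects the
 * payload bits: 0x3f for a continuation byte, fewer for a lead byte.
 * This performs no validation at all.
 */
static long accumulate(long cp, int byte, int mask)
{
    return (cp << 6) | (byte & mask);
}

/* Accumulate the UTF-8 encoding of U+20AC: e2 82 ac. */
static long decode_euro(void)
{
    long cp = 0;
    cp = accumulate(cp, 0xe2, 0x0f);  /* 3-byte lead: 4 payload bits */
    cp = accumulate(cp, 0x82, 0x3f);  /* continuation: 6 payload bits */
    cp = accumulate(cp, 0xac, 0x3f);  /* continuation: 6 payload bits */
    return cp;
}
```

<p>The result is <code class="language-plaintext highlighter-rouge">0x20ac</code>: 2, then <code class="language-plaintext highlighter-rouge">0x82</code>, then <code class="language-plaintext highlighter-rouge">0x20ac</code> after each
shift-and-or.</p>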

<p>Despite the huge table I’m pretty happy with it.</p>

<h3 id="word-count-state-machine">Word count state machine</h3>

<p>Here’s another state machine I came up with a while back for counting words
one Unicode code point at a time while accounting for Unicode’s various
kinds of whitespace. If your input is bytes, then plug this into the above
UTF-8 state machine to convert bytes to code points! This one uses a
switch instead of a lookup table since the table would be sparse (i.e.
<a href="/blog/2019/12/09/">let the compiler figure it out</a>).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* State machine counting words in a sequence of code points.
 *
 * The current word count is the absolute value of the state, so
 * the initial state is zero. Code points are fed into the state
 * machine one at a time, each call returning the next state.
 */</span>
<span class="kt">long</span> <span class="nf">word_count</span><span class="p">(</span><span class="kt">long</span> <span class="n">state</span><span class="p">,</span> <span class="kt">long</span> <span class="n">codepoint</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">codepoint</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mh">0x0009</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000a</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000b</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000c</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000d</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x0020</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x0085</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x00a0</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x1680</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2000</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2001</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2002</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2003</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2004</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2005</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2006</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2007</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2008</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2009</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x200a</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2028</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2029</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x202f</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x205f</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x3000</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">state</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="o">-</span><span class="n">state</span> <span class="o">:</span> <span class="n">state</span><span class="p">;</span>
    <span class="nl">default:</span>
        <span class="k">return</span> <span class="n">state</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="n">state</span> <span class="o">:</span> <span class="o">-</span><span class="mi">1</span> <span class="o">-</span> <span class="n">state</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’m particularly happy with the <em>edge-triggered</em> state transition
mechanism. The sign of the state tracks whether the “signal” is “high”
(inside of a word) or “low” (outside of a word), and so it counts rising
edges.</p>
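
<p>Driving it is just a loop plus one absolute value at the end. In this
sketch <code class="language-plaintext highlighter-rouge">step</code> is a cut-down copy of <code class="language-plaintext highlighter-rouge">word_count</code> that, as a
simplifying assumption, only treats tab, newline, and space as
whitespace, and <code class="language-plaintext highlighter-rouge">count_words</code> is a hypothetical helper that treats each
byte as a code point (fine for ASCII only):</p>

```c
/* Same sign trick as word_count, but only tab, newline, and space
 * count as whitespace in this simplified sketch.
 */
static long step(long state, long codepoint)
{
    switch (codepoint) {
    case 0x09: case 0x0a: case 0x20:
        return state < 0 ? -state : state;      /* signal goes low */
    default:
        return state < 0 ? state : -1 - state;  /* rising edge: +1 */
    }
}

/* Count words in an ASCII string, one byte per code point. */
static long count_words(const char *s)
{
    long state = 0;
    for (; *s; s++) {
        state = step(state, (unsigned char)*s);
    }
    return state < 0 ? -state : state;  /* word count is |state| */
}
```

<p>For <code class="language-plaintext highlighter-rouge">"hello  world"</code> the state goes to -1 on <code class="language-plaintext highlighter-rouge">h</code>, flips to 1 at the
first space, stays there for the second, and drops to -2 on <code class="language-plaintext highlighter-rouge">w</code>, for a
final count of 2.</p>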

<p><a href="/img/diagram/wordcount.dot"><img src="/img/diagram/wordcount.svg" alt="" /></a></p>

<p>The counter is not <em>technically</em> part of the state machine — though it
eventually overflows for practical reasons, it isn’t really “finite” — but
is rather an external count of the times the state machine transitions
from low to high, which is the actual, useful output.</p>

<p><em>Reader challenge</em>: Find a slick, efficient way to encode all those code
points as a table rather than rely on whatever the compiler generates for
the <code class="language-plaintext highlighter-rouge">switch</code> (chain of branches, jump table?).</p>
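
<p>As a starting point, and certainly not the slickest answer, the 25 code
points collapse into ten inclusive ranges that fit in a 40-byte table.
The name <code class="language-plaintext highlighter-rouge">is_space</code> and the linear scan are choices made for this sketch:</p>

```c
/* Return 1 if cp is one of the whitespace code points, else 0. */
static int is_space(long cp)
{
    /* Inclusive [first, last] ranges, in ascending order. */
    static const unsigned short ranges[][2] = {
        {0x0009, 0x000d}, {0x0020, 0x0020}, {0x0085, 0x0085},
        {0x00a0, 0x00a0}, {0x1680, 0x1680}, {0x2000, 0x200a},
        {0x2028, 0x2029}, {0x202f, 0x202f}, {0x205f, 0x205f},
        {0x3000, 0x3000},
    };
    int n = sizeof(ranges) / sizeof(ranges[0]);
    for (int i = 0; i < n; i++) {
        if (cp >= ranges[i][0] && cp <= ranges[i][1]) {
            return 1;
        }
    }
    return 0;
}
```

<p>Since the ranges are sorted, an early exit or a binary search would be
an easy refinement, though with ten entries it is unlikely to matter
much.</p>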

<h3 id="coroutines-and-generators-as-state-machines">Coroutines and generators as state machines</h3>

<p>In languages that support them, state machines can be implemented using
coroutines, including generators. I do particularly like the idea of
<a href="/blog/2018/05/31/">compiler-synthesized coroutines</a> as state machines, though this is a
rare treat. The state is implicit in the coroutine at each yield, so the
programmer doesn’t have to manage it explicitly. (Though often that
explicit control is powerful!)</p>

<p>Unfortunately in practice it always feels clunky. The following implements
the word count state machine (albeit in a rather un-Pythonic way). The
generator returns the current count and is continued by sending it another
code point:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WHITESPACE</span> <span class="o">=</span> <span class="p">{</span>
    <span class="mh">0x0009</span><span class="p">,</span> <span class="mh">0x000a</span><span class="p">,</span> <span class="mh">0x000b</span><span class="p">,</span> <span class="mh">0x000c</span><span class="p">,</span> <span class="mh">0x000d</span><span class="p">,</span>
    <span class="mh">0x0020</span><span class="p">,</span> <span class="mh">0x0085</span><span class="p">,</span> <span class="mh">0x00a0</span><span class="p">,</span> <span class="mh">0x1680</span><span class="p">,</span> <span class="mh">0x2000</span><span class="p">,</span>
    <span class="mh">0x2001</span><span class="p">,</span> <span class="mh">0x2002</span><span class="p">,</span> <span class="mh">0x2003</span><span class="p">,</span> <span class="mh">0x2004</span><span class="p">,</span> <span class="mh">0x2005</span><span class="p">,</span>
    <span class="mh">0x2006</span><span class="p">,</span> <span class="mh">0x2007</span><span class="p">,</span> <span class="mh">0x2008</span><span class="p">,</span> <span class="mh">0x2009</span><span class="p">,</span> <span class="mh">0x200a</span><span class="p">,</span>
    <span class="mh">0x2028</span><span class="p">,</span> <span class="mh">0x2029</span><span class="p">,</span> <span class="mh">0x202f</span><span class="p">,</span> <span class="mh">0x205f</span><span class="p">,</span> <span class="mh">0x3000</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">def</span> <span class="nf">wordcount</span><span class="p">():</span>
    <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="c1"># low signal
</span>            <span class="n">codepoint</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">count</span>
            <span class="k">if</span> <span class="n">codepoint</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">WHITESPACE</span><span class="p">:</span>
                <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
                <span class="k">break</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="c1"># high signal
</span>            <span class="n">codepoint</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">count</span>
            <span class="k">if</span> <span class="n">codepoint</span> <span class="ow">in</span> <span class="n">WHITESPACE</span><span class="p">:</span>
                <span class="k">break</span>
</code></pre></div></div>

<p>However, the generator ceremony dominates the interface, so you’d probably
want to wrap it in something nicer — at which point there’s really no
reason to use the generator in the first place:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wc</span> <span class="o">=</span> <span class="n">wordcount</span><span class="p">()</span>
<span class="nb">next</span><span class="p">(</span><span class="n">wc</span><span class="p">)</span>  <span class="c1"># prime the generator
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'A'</span><span class="p">))</span>  <span class="c1"># =&gt; 1
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>  <span class="c1"># =&gt; 1
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'B'</span><span class="p">))</span>  <span class="c1"># =&gt; 2
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>  <span class="c1"># =&gt; 2
</span></code></pre></div></div>

<p>Same idea in Lua, which famously has full coroutines:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">WHITESPACE</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mh">0x0009</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000a</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000b</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000c</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x000d</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x0020</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x0085</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x00a0</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x1680</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2000</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2001</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2002</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2003</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2004</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2005</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2006</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2007</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2008</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2009</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x200a</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2028</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2029</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x202f</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x205f</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x3000</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span>
<span class="p">}</span>

<span class="k">function</span> <span class="nf">wordcount</span><span class="p">()</span>
    <span class="kd">local</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
        <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
            <span class="c1">-- low signal</span>
            <span class="kd">local</span> <span class="n">codepoint</span> <span class="o">=</span> <span class="nb">coroutine.yield</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="n">WHITESPACE</span><span class="p">[</span><span class="n">codepoint</span><span class="p">]</span> <span class="k">then</span>
                <span class="n">count</span> <span class="o">=</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">1</span>
                <span class="k">break</span>
            <span class="k">end</span>
        <span class="k">end</span>
        <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
            <span class="c1">-- high signal</span>
            <span class="kd">local</span> <span class="n">codepoint</span> <span class="o">=</span> <span class="nb">coroutine.yield</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">WHITESPACE</span><span class="p">[</span><span class="n">codepoint</span><span class="p">]</span> <span class="k">then</span>
                <span class="k">break</span>
            <span class="k">end</span>
        <span class="k">end</span>
    <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>At least <code class="language-plaintext highlighter-rouge">coroutine.wrap()</code> hides the fact that it’s a coroutine,
aside from the initial call to prime it.</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wc</span> <span class="o">=</span> <span class="nb">coroutine.wrap</span><span class="p">(</span><span class="n">wordcount</span><span class="p">)</span>
<span class="n">wc</span><span class="p">()</span>  <span class="c1">-- prime the coroutine</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">'A'</span><span class="p">))</span>  <span class="c1">-- =&gt; 1</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>  <span class="c1">-- =&gt; 1</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">'B'</span><span class="p">))</span>  <span class="c1">-- =&gt; 2</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>  <span class="c1">-- =&gt; 2</span>
</code></pre></div></div>

<h3 id="extra-examples">Extra examples</h3>

<p>Finally, a couple more examples not worth describing in detail here. First
a Unicode case folding state machine:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/misc/casefold.c">https://github.com/skeeto/scratch/blob/master/misc/casefold.c</a></p>

<p>It’s just an interface to do a lookup into the <a href="https://www.unicode.org/Public/13.0.0/ucd/CaseFolding.txt">official case folding
table</a>. It was an experiment, and I <em>probably</em> wouldn’t use it in a
real program.</p>

<p>Second, I’ve mentioned <a href="https://github.com/skeeto/utf-7">my UTF-7 encoder and decoder</a> before. It’s
not obvious from the interface, but internally it’s just a state machine
for both encoder and decoder, which is what allows it to “pause”
between any pair of input/output bytes.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>You might not need machine learning</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/11/24/"/>
    <id>urn:uuid:91aa121d-c796-4c11-99d4-41c707637672</id>
    <updated>2020-11-24T04:04:36Z</updated>
    <category term="ai"/><category term="c"/><category term="media"/><category term="compsci"/><category term="video"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=25196574">on Hacker News</a>.</em></p>

<p>Machine learning is a trendy topic, so naturally it’s often used for
inappropriate purposes where a simpler, more efficient, and more reliable
solution suffices. The other day I saw an illustrative and fun example of
this: <a href="https://www.youtube.com/watch?v=-sg-GgoFCP0">Neural Network Cars and Genetic Algorithms</a>. The video
demonstrates 2D cars driven by a neural network with weights determined by
a genetic algorithm. However, the entire scheme can be replaced by a
first-degree polynomial without any loss in capability. The machine
learning part is overkill.</p>

<p><a href="https://nullprogram.com/video/?v=racetrack"><img src="/img/screenshot/racetrack.jpg" alt="" /></a></p>

<!--more-->

<p>The above demonstrates my implementation using a polynomial to drive the cars.
My wife drew the background. There’s no path-finding; these cars are just
feeling their way along the track, “following the rails” so to speak.</p>

<p>My intention is not to pick on this project in particular. The likely
motivation in the first place was a desire to apply a neural network to
<em>something</em>. Many of my own projects are little more than a vehicle to try
something new, so I can sympathize. A professional setting is different,
though: there, machine learning should be viewed with a more skeptical
eye than it usually is. For instance, don’t use active learning to
select sample distribution when a <a href="http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/">quasirandom sequence</a> will do.</p>

<p>In the video, the car has a limited turn radius, and minimum and maximum
speeds. (I’ve retained these constraints in my own simulation.) There are
five sensors — forward, forward-diagonals, and sides — each sensing the
distance to the nearest wall. These are fed into a 3-layer neural network,
and the outputs determine throttle and steering. Sounds pretty cool!</p>

<p><img src="/img/diagram/racecar.svg" alt="" /></p>

<p>A key feature of neural networks is that the outputs are a nonlinear
function of the inputs. However, steering a 2D car is simple enough that
<strong>a linear function is more than sufficient</strong>, and neural networks are
unnecessary. Here are my equations:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>steering = C0*input1 - C0*input3
throttle = C1*input2
</code></pre></div></div>

<p>I only need three of the original inputs — forward for throttle, and
diagonals for steering — and the driver has just two parameters, <code class="language-plaintext highlighter-rouge">C0</code> and
<code class="language-plaintext highlighter-rouge">C1</code>, the polynomial coefficients. Optimal values depend on the track
layout and car configuration, but for my simulation, most values above 0
and below 1 are good enough in most cases. It’s less a matter of crashing
and more about navigating the course quickly.</p>
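
<p>In C the whole driver is a couple of multiplications. This is a sketch
rather than an excerpt from my simulation: <code class="language-plaintext highlighter-rouge">struct controls</code> and <code class="language-plaintext highlighter-rouge">drive</code>
are names invented here, and the inputs keep the numbering from the
equations above:</p>

```c
struct controls {
    double steering;
    double throttle;
};

/* Linear driver: input2 is the forward sensor, input1 and input3 are
 * the two diagonal sensors. c0 and c1 are the two coefficients.
 */
static struct controls drive(double c0, double c1,
                             double input1, double input2, double input3)
{
    struct controls out;
    out.steering = c0*input1 - c0*input3;  /* steer by the imbalance */
    out.throttle = c1*input2;              /* more room ahead: faster */
    return out;
}
```

<p>When both diagonals read the same distance the steering term cancels to
zero, so the car only turns when one side has more room than the other.</p>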

<p>The lengths of the red lines below are the driver’s three inputs:</p>

<video src="/vid/racecar.mp4" width="530" height="330" loop="" muted="" autoplay="" controls="">
</video>

<p>These polynomials are obviously much faster than a neural network, but
they’re also easy to understand and debug. I can confidently reason about
the entire range of possible inputs rather than worry about a trained
neural network <a href="https://arxiv.org/abs/1903.06638">responding strangely</a> to untested inputs.</p>

<p>Instead of doing anything fancy, my program generates the coefficients at
random to explore the space. If I wanted to generate a good driver for a
course, I’d run a few thousand of these and pick the coefficients that
complete the course in the shortest time. For instance, these coefficients
make for a fast, capable driver for the course featured at the top of the
article:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C0 = 0.896336973, C1 = 0.0354805067
</code></pre></div></div>

<p>Many constants can complete the track, but some will be faster than
others. If I were developing a racing game using this as the AI, I’d not
just pick constants that successfully complete the track, but the ones
that do it quickly. Here’s what the spread can look like:</p>

<video src="/vid/racecars.mp4" width="530" height="330" loop="" muted="" autoplay="" controls="">
</video>

<p>If you want to play around with this yourself, here’s my C source code
that implements this driving AI and <a href="/blog/2017/11/03/">generates the videos and images
above</a>:</p>

<p><strong><a href="https://github.com/skeeto/scratch/blob/master/aidrivers/aidrivers.c">aidrivers.c</a></strong></p>

<p>Racetracks are just images drawn in your favorite image editing program
using the colors documented in the source header.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Unintuitive JSON Parsing</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/12/28/"/>
    <id>urn:uuid:721eda6d-a78a-41e1-9d78-db3666208a71</id>
    <updated>2019-12-28T17:23:09Z</updated>
    <category term="javascript"/><category term="compsci"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=21900715">on Hacker News</a> and <a href="https://old.reddit.com/r/programming/comments/egvq11/unintuitive_json_parsing/">on reddit</a>.</em></p>

<p>Despite the goal of JSON being a subset of JavaScript — which <a href="http://seriot.ch/parsing_json.php">it failed
to achieve</a> (update: <a href="https://github.com/tc39/proposal-json-superset">this was fixed</a>) — parsing JSON is
quite unlike parsing a programming language. For invalid inputs, the
specific cause of error is often counter-intuitive. Normally this
doesn’t matter, but I recently <a href="https://github.com/skeeto/pdjson/pull/19/commits/1500ca73f2ed44ed8a6129fd1fa164bd7e326874#diff-eb030bc5ad128fc13160acab7d06f3a0R702">ran into a case where it does</a>.</p>

<!--more-->

<p>Consider this invalid input to a JSON parser:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[01]
</code></pre></div></div>

<p>To a human this might be interpreted as an array containing a number.
Either the leading zero is ignored, or it indicates octal, as it does in
many languages, including JavaScript. In either case the number in the
array would be 1.</p>

<p>However, JSON does not support leading zeros, neither ignoring them nor
supporting octal notation. Here’s the railroad diagram for numbers <a href="https://www.json.org/json-en.html">from
the JSON specification</a>:</p>

<p><img src="/img/diagram/json-number.png" alt="" />
<!-- Copyright (C) 2017 Ecma International --></p>

<p>Or in regular expression form:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?
</code></pre></div></div>
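
<p>The same grammar can be written as a small hand-rolled scanner. The
following is a sketch with made-up names (<code class="language-plaintext highlighter-rouge">number_len</code>, <code class="language-plaintext highlighter-rouge">digit</code>) that
returns the length of the longest prefix matching the expression above.
A real lexer might instead choose to report an error on a dangling <code class="language-plaintext highlighter-rouge">.</code>
or <code class="language-plaintext highlighter-rouge">e</code> rather than stop short of it:</p>

```c
#include <stddef.h>

static int digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Return the length of the longest JSON number token at the start of
 * s, or 0 if s does not start with one.
 */
static size_t number_len(const char *s)
{
    size_t i = 0;
    if (s[i] == '-') {
        i++;
    }
    if (s[i] == '0') {
        i++;  /* a leading zero is a complete integer part */
    } else if (digit(s[i])) {
        while (digit(s[i])) i++;  /* [1-9][0-9]* */
    } else {
        return 0;
    }
    if (s[i] == '.' && digit(s[i+1])) {
        i++;
        while (digit(s[i])) i++;  /* fraction */
    }
    if (s[i] == 'e' || s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') {
            j++;
        }
        if (digit(s[j])) {
            while (digit(s[j])) j++;
            i = j;  /* commit only if the exponent has digits */
        }
    }
    return i;
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">number_len("-0.5e+3")</code> consumes all seven characters.</p>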

<p>If a token starts with <code class="language-plaintext highlighter-rouge">0</code> then it can only be followed by <code class="language-plaintext highlighter-rouge">.</code>, <code class="language-plaintext highlighter-rouge">e</code>,
or <code class="language-plaintext highlighter-rouge">E</code>. It cannot be followed by a digit. So, the natural human
response to mentally parsing <code class="language-plaintext highlighter-rouge">[01]</code> is: This input is invalid because
it contains a number with a leading zero, and leading zeros are not
accepted. <em>But this is not actually why parsing fails!</em></p>

<p>A simple model of the parser is that it consumes tokens from a lexer. The
lexer’s job is to read individual code points (characters) from the
input and group them into tokens. The possible tokens are string,
number, left brace, right brace, left bracket, right bracket, comma,
colon, true, false, and null. The lexer skips over insignificant whitespace,
and it doesn’t care about structure, like matching braces and
brackets. That’s the parser’s job.</p>

<p>In some instances the lexer can fail to parse a token. For example, if
while looking for a new token the lexer reads the character <code class="language-plaintext highlighter-rouge">%</code>, then
the input must be invalid. No token starts with this character. So in
some cases invalid input will be detected by the lexer.</p>

<p>The parser consumes tokens from the lexer and, using some state, ensures
the sequence of tokens is valid. For example, arrays must be a well
formed sequence of left bracket, value, comma, value, comma, etc., right
bracket. One way to reject input with trailing garbage is for the lexer
to also produce an EOF (end of file/input) token when there are no more
tokens, and the parser could specifically check for that token before
accepting the input as valid.</p>

<p>Getting back to the input <code class="language-plaintext highlighter-rouge">[01]</code>, a JSON parser receives a left bracket
token, then updates its bookkeeping to track that it’s parsing an array.
When looking for the next token, the lexer sees the character <code class="language-plaintext highlighter-rouge">0</code>
followed by <code class="language-plaintext highlighter-rouge">1</code>. According to the railroad diagram, this is a number
token (starts with <code class="language-plaintext highlighter-rouge">0</code>), but <code class="language-plaintext highlighter-rouge">1</code> cannot be part of this token, so it
produces a number token with the contents “0”. Everything is still fine.</p>

<p>Next the lexer sees <code class="language-plaintext highlighter-rouge">1</code> followed by <code class="language-plaintext highlighter-rouge">]</code>. Since <code class="language-plaintext highlighter-rouge">]</code> cannot be part of a
number, it produces another number token with the contents “1”. The
parser receives this token but, since it’s parsing an array, it expects
either a comma token or a right bracket. Since this is neither, the
parser fails with an error about an unexpected number. <strong>The parser will
not complain about leading zeros because JSON has no concept of leading
zeros.</strong> Human intuition is right, but for the wrong reasons.</p>
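
<p>To make that sequence of events concrete, here’s a minimal sketch of the lexer and parser model in JavaScript. It isn’t how any of the parsers discussed here are implemented. Strings and objects are omitted, the lexer runs to completion before the parser starts rather than producing tokens on demand, and the names <code class="language-plaintext highlighter-rouge">lex</code> and <code class="language-plaintext highlighter-rouge">parse</code> are assumptions for illustration.</p>

```javascript
// Token patterns, tried in order at the current position (sticky flag).
const PATTERNS = [
  ["number", /-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?/y],
  ["true", /true/y],
  ["false", /false/y],
  ["null", /null/y],
  ["[", /\[/y],
  ["]", /\]/y],
  [",", /,/y],
];
const WHITESPACE = /[ \t\n\r]*/y;

// Lexer: group characters into tokens, knowing nothing about structure.
function lex(input) {
  const tokens = [];
  let pos = 0;
  for (;;) {
    WHITESPACE.lastIndex = pos;
    pos += WHITESPACE.exec(input)[0].length;
    if (pos === input.length) {
      tokens.push({ type: "EOF", text: "" });
      return tokens;
    }
    let match = null;
    for (const [type, re] of PATTERNS) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m !== null) {
        match = { type, text: m[0] };
        break;
      }
    }
    if (match === null) {
      throw new SyntaxError("lexer: no token starts with " + input[pos]);
    }
    tokens.push(match);
    pos += match.text.length;
  }
}

// Parser: check that the token sequence is one value followed by EOF.
function parse(input) {
  const tokens = lex(input);
  let i = 0;
  function value() {
    const tok = tokens[i++];
    switch (tok.type) {
      case "number": return Number(tok.text);
      case "true": return true;
      case "false": return false;
      case "null": return null;
      case "[": return array();
      default: throw new SyntaxError("parser: unexpected " + tok.type);
    }
  }
  function array() {
    const items = [];
    if (tokens[i].type === "]") { i++; return items; }
    for (;;) {
      items.push(value());
      const tok = tokens[i++];
      if (tok.type === "]") return items;
      if (tok.type !== ",") {
        throw new SyntaxError("parser: expected , or ] but got " + tok.type);
      }
    }
  }
  const result = value();
  if (tokens[i].type !== "EOF") {
    throw new SyntaxError("parser: trailing garbage");
  }
  return result;
}

// The lexer is perfectly happy with this input.
console.log(lex("[01]").map(function (t) { return t.text || t.type; }).join(" "));
// prints: [ 0 1 ] EOF

// It's the parser that rejects it.
try {
  parse("[01]");
} catch (e) {
  console.log(e.message); // parser: expected , or ] but got number
}
```

<p>Nothing in this sketch knows what a leading zero is, yet the input is still rejected.</p>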

<p>Try this for yourself in your favorite JSON parser. Or even just pop up
the JavaScript console in your browser and try it out:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>JSON.parse('[01]');
</code></pre></div></div>

<p>Firefox reports:</p>

<blockquote>
  <p>SyntaxError: JSON.parse: expected ‘,’ or ‘]’ after array element</p>
</blockquote>

<p>Chromium reports:</p>

<blockquote>
  <p>SyntaxError: Unexpected number in JSON</p>
</blockquote>

<p>Edge reports (note it says “number” not “digit”):</p>

<blockquote>
  <p>Error: Invalid number at position:3</p>
</blockquote>

<p>In all cases the parsers accepted a zero as the first array element,
then rejected the input after the second number token for being a bad
sequence of tokens. In other words, this is a parser error, not the
lexer error a human might intuit.</p>

<p><a href="https://github.com/skeeto/pdjson">My JSON parser</a> comes with a testing tool that shows the token
stream up until the parser rejects the input, useful for understanding
these situations:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo '[01]' | tests/stream
struct expect seq[] = {
    {JSON_ARRAY},
    {JSON_NUMBER, "0"},
    {JSON_ERROR},
};
</code></pre></div></div>

<p>There’s an argument to be made here that perhaps the human readable
error message <em>should</em> mention leading zeros, since that’s likely the
cause of the invalid input. That is, a human probably thought JSON
allowed leading zeros, and so the clearer message would tell the human
that JSON does not allow leading zeros. This is the “more art than
science” part of parsing.</p>

<p>It’s the same story with this invalid input:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[truefalse]
</code></pre></div></div>

<p>From this input, the lexer <em>unambiguously</em> produces left bracket,
true, false, right bracket. It’s still up to the parser to reject this
input. The only reason we never see <code class="language-plaintext highlighter-rouge">truefalse</code> in valid JSON is that
the overall structure never allows these tokens to be adjacent, not
because they’d be ambiguous. Programming languages have identifiers,
and in a programming language this would parse as the identifier
<code class="language-plaintext highlighter-rouge">truefalse</code> rather than <code class="language-plaintext highlighter-rouge">true</code> followed by <code class="language-plaintext highlighter-rouge">false</code>. From this point of
view, JSON seems quite strange.</p>

<p>Just as before, Firefox reports:</p>

<blockquote>
  <p>SyntaxError: JSON.parse: expected ‘,’ or ‘]’ after array element</p>
</blockquote>

<p>Chromium reports the same error as it does for <code class="language-plaintext highlighter-rouge">[true false]</code>:</p>

<blockquote>
  <p>SyntaxError: Unexpected token f in JSON</p>
</blockquote>

<p>Edge’s message is probably a minor bug in their JSON parser:</p>

<blockquote>
  <p>Error: Expected ‘]’ at position:10</p>
</blockquote>

<p>Position 10 is the last character in <code class="language-plaintext highlighter-rouge">false</code>. The lexer consumed <code class="language-plaintext highlighter-rouge">false</code>
from the input, produced a “false” token, then the parser rejected the
input. When it reported the error, it chose the <em>end</em> of the invalid
token as the error position rather than the start, despite the fact that
the only two valid tokens (comma, right bracket) are both a single
character. It should also say “Expected ‘]’ or ‘,’” (as Firefox does)
rather than just “]”.</p>

<h3 id="concatenated-json">Concatenated JSON</h3>

<p>That’s all pretty academic. Except for producing nice error messages,
nobody really cares so much <em>why</em> the input was rejected. The mismatch
between intuition and reality isn’t important.</p>

<p>However, it <em>does</em> come up with concatenated JSON. Some parsers,
including mine, will optionally consume multiple JSON values, one after
another, from the same input. Here’s an example from <a href="/blog/2016/09/15/">one of my
favorite</a> command line tools, <a href="https://stedolan.github.io/jq/">jq</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo '{"x":0,"y":1}{"x":2,"y":3}{"x":4,"y":5}' | jq '.x + .y'
1
5
9
</code></pre></div></div>

<p>The input contains three unambiguously-concatenated JSON objects, so
the parser produces three distinct objects. Now consider this input,
this time outside of the context of an array:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>01
</code></pre></div></div>

<p>Is this invalid, one number, or two numbers? According to the lexer and
parser model described above, this is valid and unambiguously two
concatenated numbers. Here’s what my parser says:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo '01' | tests/stream
struct expect seq[] = {
    {JSON_NUMBER, "0"},
    {JSON_DONE},
    {JSON_NUMBER, "1"},
    {JSON_DONE},
    {JSON_ERROR},
};
</code></pre></div></div>

<p>Note: The <code class="language-plaintext highlighter-rouge">JSON_DONE</code> “token” indicates acceptance, and the <code class="language-plaintext highlighter-rouge">JSON_ERROR</code>
token is an EOF indicator, not a hard error. Since jq allows leading
zeros in its JSON input, it’s ambiguous and parses this as the number 1,
so asking its opinion on this input isn’t so interesting. I surveyed
some other JSON parsers that accept concatenated JSON:</p>

<ul>
  <li><a href="https://github.com/FasterXML/jackson">Jackson</a>: Reject as leading zero.</li>
  <li><a href="https://github.com/yonik/noggit">Noggit</a>: Reject as leading zero.</li>
  <li><a href="https://lloyd.github.io/yajl/">yajl</a>: Accept as two numbers.</li>
</ul>

<p>For my parser it’s the same story for <code class="language-plaintext highlighter-rouge">truefalse</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo 'truefalse' | tests/stream
struct expect seq[] = {
    {JSON_TRUE, "true"},
    {JSON_DONE},
    {JSON_FALSE, "false"},
    {JSON_DONE},
    {JSON_ERROR},
};
</code></pre></div></div>

<p>Neither rejecting nor accepting this input is wrong, per se.
Concatenated JSON is outside of the scope of JSON itself, and
concatenating arbitrary JSON objects without a whitespace delimiter can
lead to weird and ill-formed input. This is all a great argument in
favor of <a href="http://ndjson.org/">Newline Delimited JSON</a>, and its two simple rules:</p>

<ol>
  <li>Line separator is <code class="language-plaintext highlighter-rouge">'\n'</code></li>
  <li>Each line is a valid JSON value</li>
</ol>

<p>This solves the concatenation issue, and, even more, it works well with
parsers not supporting concatenation: Split the input on newlines and
pass each line to your JSON parser.</p>
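
<p>Here’s a minimal sketch of that approach in JavaScript, using the built-in <code class="language-plaintext highlighter-rouge">JSON.parse</code>, which accepts only a single value per input. The function name <code class="language-plaintext highlighter-rouge">parseNDJSON</code> is hypothetical, and skipping blank lines (so that a trailing newline doesn’t produce an empty final “line”) is my own assumption rather than one of the two rules above.</p>

```javascript
// Hypothetical helper: parse newline-delimited JSON with a parser that
// only accepts one value per input.
function parseNDJSON(text) {
  return text
    .split("\n")
    // Assumption: ignore blank lines, such as the one after a final newline.
    .filter(function (line) { return line.trim() !== ""; })
    .map(function (line) { return JSON.parse(line); });
}

const input = '{"x":0,"y":1}\n{"x":2,"y":3}\n{"x":4,"y":5}\n';
for (const obj of parseNDJSON(input)) {
  console.log(obj.x + obj.y); // 1, then 5, then 9
}
```

<p>Since each line goes through an ordinary single-value parser, a line like <code class="language-plaintext highlighter-rouge">01</code> is simply rejected, and the question of one number versus two never comes up.</p>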

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>What's in an Emacs Lambda</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/12/14/"/>
    <id>urn:uuid:efcc8cf7-11d3-3bd3-9fc9-a23e80f7bf33</id>
    <updated>2017-12-14T18:18:57Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="compsci"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>There was recently some <a href="https://old.reddit.com/r/emacs/comments/7h23ed/dynamically_construct_a_lambda_function/">interesting discussion</a> about correctly
using backquotes to express a mixture of data and code. Since lambda
expressions <em>seem</em> to evaluate to themselves, what’s the difference?
For example, an association list of operations:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">'</span><span class="p">((</span><span class="nv">add</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">sub</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">mul</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">div</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
</code></pre></div></div>

<p>It looks like it would work, and indeed it does work in this case.
However, there are good reasons to actually evaluate those lambda
expressions. Eventually invoking the lambda expressions in the quoted
form above is equivalent to using <code class="language-plaintext highlighter-rouge">eval</code>. So, instead, prefer the
backquote form:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">`</span><span class="p">((</span><span class="nv">add</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">sub</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">mul</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">div</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
</code></pre></div></div>

<p>There are a lot of interesting things to say about this, but let’s
first reduce it to two very simple cases:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
</code></pre></div></div>

<p>What’s the difference between these two forms? The first is a lambda
expression, and it evaluates to a function object. The other is a quoted
list that <em>looks like</em> a lambda expression, and it evaluates to a list —
a piece of data.</p>

<p>A naive evaluation of these expressions in <code class="language-plaintext highlighter-rouge">*scratch*</code> (<code class="language-plaintext highlighter-rouge">C-x C-e</code>)
suggests they are identical, and so it would seem that quoting a
lambda expression doesn’t really matter:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>However, there are two common situations where this is not the case:
<strong>byte compilation</strong> and <strong>lexical scope</strong>.</p>

<h3 id="lambda-under-byte-compilation">Lambda under byte compilation</h3>

<p>It’s a little trickier to evaluate these forms byte compiled in the
scratch buffer since that doesn’t happen automatically. But if it did,
it would look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: nil; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; #[(x) "\010\207" [x] 1]</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">#[...]</code> is the syntax for a byte-code function object. As
discussed in detail in <a href="/blog/2014/01/04/">my byte-code internals article</a>, it’s a
special vector object that contains byte-code, and other metadata, for
evaluation by Emacs’ virtual stack machine. Elisp is one of very few
languages with <a href="/blog/2013/12/30/">readable function objects</a>, and this feature is
core to its ahead-of-time byte compilation.</p>

<p>The quote, by definition, prevents evaluation, and so inhibits byte
compilation of the lambda expression. It’s vital that the byte compiler
does not try to guess the programmer’s intent and compile the expression
anyway, since that would interfere with lists that just so happen to
look like lambda expressions — i.e. any list containing the <code class="language-plaintext highlighter-rouge">lambda</code>
symbol.</p>

<p>There are three reasons you want your lambda expressions to get byte
compiled:</p>

<ul>
  <li>
    <p>Byte-compiled functions are significantly faster. That’s the main
purpose for byte compilation after all.</p>
  </li>
  <li>
    <p>The compiler performs static checks, producing warnings and errors
ahead of time. This lets you spot certain classes of problems before
they occur. The static analysis is even better under lexical scope due
to its tighter semantics.</p>
  </li>
  <li>
    <p>Under lexical scope, byte-compiled closures may use less memory. More
specifically, they won’t accidentally keep objects alive longer than
necessary. I’ve never seen a name for this implementation issue, but I
call it <em>overcapturing</em>. More on this later.</p>
  </li>
</ul>

<p>While it’s common for personal configurations to skip byte compilation,
Elisp should still generally be written as if it were going to be byte
compiled. General rule of thumb: <strong>Ensure your lambda expressions are
actually evaluated.</strong></p>

<h3 id="lambda-in-lexical-scope">Lambda in lexical scope</h3>

<p>As I’ve stressed many times, <a href="/blog/2016/12/22/">you should <em>always</em> use lexical
scope</a>. There’s no practical disadvantage or trade-off involved.
Just do it.</p>

<p>Once lexical scope is enabled, the two expressions diverge even without
byte compilation:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure (t) (x) x)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>Under lexical scope, lambda expressions evaluate to <em>closures</em>.
Closures capture their lexical environment in their closure object —
nothing in this particular case. It’s a type of function object,
making it a valid first argument to <code class="language-plaintext highlighter-rouge">funcall</code>.</p>

<p>Since the quote prevents the second expression from being evaluated,
semantically it evaluates to a list that just so happens to look like
a (non-closure) function object. <strong>Invoking a <em>data</em> object as a
function is like using <code class="language-plaintext highlighter-rouge">eval</code></strong> — i.e. executing data as code.
Everyone already knows <code class="language-plaintext highlighter-rouge">eval</code> should not be used lightly.</p>

<p>It’s a little more interesting to look at a closure that actually
captures a variable, so here’s a definition for <code class="language-plaintext highlighter-rouge">constantly</code>, a
higher-order function that returns a closure that accepts any number of
arguments and returns a particular constant:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">constantly</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="nv">x</span><span class="p">))</span>
</code></pre></div></div>

<p>Without byte compiling it, here’s an example of its return value:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((x . :foo) t) (&amp;rest _) x)</span>
</code></pre></div></div>

<p>The environment has been captured as an association list (with a
trailing <code class="language-plaintext highlighter-rouge">t</code>), and we can plainly see that the variable <code class="language-plaintext highlighter-rouge">x</code> is bound to
the symbol <code class="language-plaintext highlighter-rouge">:foo</code> in this closure. Consider that we could manipulate
this data structure (e.g. <code class="language-plaintext highlighter-rouge">setcdr</code> or <code class="language-plaintext highlighter-rouge">setf</code>) to change the binding of
<code class="language-plaintext highlighter-rouge">x</code> for this closure. <em>This is essentially how closures mutate their own
environment.</em> Moreover, closures from the same environment share
structure, so such mutations are also shared. More on this later.</p>

<p>Semantically, closures are distinct objects (via <code class="language-plaintext highlighter-rouge">eq</code>), even if the
variables they close over are bound to the same value. This is because
they each have a distinct environment attached to them, even if in
some invisible way.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Without byte compilation, this is true <em>even when there’s no lexical
environment to capture</em>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">dummy</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">dummy</span><span class="p">)</span> <span class="p">(</span><span class="nv">dummy</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>The byte compiler is smart, though. <a href="/blog/2017/01/30/">As an optimization</a>, the
same closure object is reused when possible, avoiding unnecessary
work, including multiple object allocations. Though this is a bit of
an abstraction leak. A function can (ab)use this to introspect whether
it’s been byte compiled:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">have-i-been-compiled-p</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">funcs</span> <span class="p">(</span><span class="nb">vector</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">2</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="nv">i</span><span class="p">)</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
    <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="mi">1</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">have-i-been-compiled-p</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'have-i-been-compiled-p</span><span class="p">)</span>

<span class="p">(</span><span class="nv">have-i-been-compiled-p</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>The trick here is to evaluate the exact same non-capturing lambda
expression twice, which requires a loop (or at least some sort of
branch). <em>Semantically</em> we should think of these closures as being
distinct objects, but, if we squint our eyes a bit, we can see the
effects of the behind-the-scenes optimization.</p>

<p>Don’t actually do this in practice, of course. That’s what
<code class="language-plaintext highlighter-rouge">byte-code-function-p</code> is for, which won’t rely on a subtle
implementation detail.</p>

<h3 id="overcapturing">Overcapturing</h3>

<p>I mentioned before that one of the potential gotchas of not byte
compiling your lambda expressions is overcapturing closure variables in
the interpreter.</p>

<p>To evaluate lisp code, Emacs has both an interpreter and a virtual
machine. The interpreter evaluates code in list form: cons cells,
numbers, symbols, etc. The byte compiler is like the interpreter, but
instead of directly executing those forms, it emits byte-code that, when
evaluated by the virtual machine, produces identical visible results to
the interpreter — <em>in theory</em>.</p>

<p>What this means is that <strong>Emacs contains two different implementations
of Emacs Lisp</strong>, one in the interpreter and one in the byte compiler.
The Emacs developers have been maintaining and expanding these
implementations side-by-side for decades. A pitfall to this approach
is that the <em>implementations can, and do, diverge in their behavior</em>.
We saw this above with that introspective function, and it <a href="/blog/2013/01/22/">comes up
in practice with advice</a>.</p>

<p>Another way they diverge is in closure variable capture. For example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">overcapture</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">when</span> <span class="nv">y</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">x</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">overcapture</span> <span class="ss">:x</span> <span class="ss">:some-big-value</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((y . :some-big-value) (x . :x) t) nil x)</span>
</code></pre></div></div>

<p>Notice that the closure captured <code class="language-plaintext highlighter-rouge">y</code> even though it’s unnecessary.
This is because the interpreter doesn’t, and shouldn’t, take the time
to analyze the body of the lambda to determine which variables should
be captured. That would need to happen at run-time each time the
lambda is evaluated, which would make the interpreter much slower.
Overcapturing can get pretty messy if macros are introducing their own
hidden variables.</p>

<p>On the other hand, the byte compiler can do this analysis just once at
compile-time. And it’s already doing the analysis as part of its job.
It can avoid this problem easily:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">overcapture</span> <span class="ss">:x</span> <span class="ss">:some-big-value</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" [:x] 1]</span>
</code></pre></div></div>

<p>It’s clear that <code class="language-plaintext highlighter-rouge">:some-big-value</code> isn’t present in the closure.</p>

<p>But… how does this work?</p>

<h3 id="how-byte-compiled-closures-are-constructed">How byte compiled closures are constructed</h3>

<p>Recall from the <a href="/blog/2014/01/04/">internals article</a> that the four core elements of a
byte-code function object are:</p>

<ol>
  <li>Parameter specification</li>
  <li>Byte-code string (opcodes)</li>
  <li>Constants vector</li>
  <li>Maximum stack usage</li>
</ol>

<p>While a closure <em>seems</em> like compiling a whole new function each time
the lambda expression is evaluated, there’s actually not that much to
it! Namely, <a href="/blog/2017/01/08/">the <em>behavior</em> of the function remains the same</a>. Only
the closed-over environment changes.</p>

<p>What this means is that closures produced by a common lambda
expression can all share the same byte-code string (second element).
Their bodies are identical, so they compile to the same byte-code.
Where they differ is in their constants vector (third element), which
gets filled out according to the closed over environment. It’s clear
just from examining the outputs:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">constantly</span> <span class="ss">:a</span><span class="p">)</span>
<span class="c1">;; =&gt; #[128 "\300\207" [:a] 2]</span>

<span class="p">(</span><span class="nb">constantly</span> <span class="ss">:b</span><span class="p">)</span>
<span class="c1">;; =&gt; #[128 "\300\207" [:b] 2]</span>

</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">constantly</code> has three of the four components of the closure in its own
constant pool. Its job is to construct the constants vector, and then
assemble the whole thing into a byte-code function object (<code class="language-plaintext highlighter-rouge">#[...]</code>).
Here it is with <code class="language-plaintext highlighter-rouge">M-x disassemble</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  make-byte-code
1       constant  128
2       constant  "\300\207"
4       constant  vector
5       stack-ref 4
6       call      1
7       constant  2
8       call      4
9       return
</code></pre></div></div>

<p>(Note: since the byte compiler doesn’t produce perfectly optimal code, I’ve
simplified it for this discussion.)</p>

<p>It pushes most of its constants on the stack. Then the <code class="language-plaintext highlighter-rouge">stack-ref 4</code> (5)
puts <code class="language-plaintext highlighter-rouge">x</code> on the stack. Then it calls <code class="language-plaintext highlighter-rouge">vector</code> to create the constants
vector (6). Finally, it constructs the function object (<code class="language-plaintext highlighter-rouge">#[...]</code>) by
calling <code class="language-plaintext highlighter-rouge">make-byte-code</code> (8).</p>

<p>Since this might be clearer, here’s the same thing expressed back in
terms of Elisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">constantly</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">128</span> <span class="s">"\300\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">x</span><span class="p">)</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p>To see the disassembly of the closure’s byte-code:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:x</span><span class="p">))</span>
</code></pre></div></div>

<p>The result isn’t very surprising:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  :x
1       return
</code></pre></div></div>

<p>Things get a little more interesting when mutation is involved. Consider
this adder closure generator, which mutates its environment every time
it’s called:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">total</span> <span class="mi">0</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">cl-incf</span> <span class="nv">total</span><span class="p">))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">count</span> <span class="p">(</span><span class="nv">adder</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">))</span>
<span class="c1">;; =&gt; 3</span>

<span class="p">(</span><span class="nv">adder</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\211\242T\240\207" [(0)] 2]</span>
</code></pre></div></div>

<p>The adder essentially works like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\211\242T\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">0</span><span class="p">))</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p><em>In theory</em>, this closure could operate by mutating its constants vector
directly. But that wouldn’t be much of a <em>constants</em> vector, now would
it!? Instead, mutated variables are <em>boxed</em> inside a cons cell. Closures
don’t share constant vectors, so the main reason for boxing is to share
variables between closures from the same environment. That is, they have
the same cons in each of their constant vectors.</p>

<p>There’s no equivalent Elisp for the closure in <code class="language-plaintext highlighter-rouge">adder</code>, so here’s the
disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  (0)
1       dup
2       car-safe
3       add1
4       setcar
5       return
</code></pre></div></div>

<p>It puts two references to the boxed integer on the stack (<code class="language-plaintext highlighter-rouge">constant</code>,
<code class="language-plaintext highlighter-rouge">dup</code>), unboxes the top one (<code class="language-plaintext highlighter-rouge">car-safe</code>), increments that unboxed
integer, stores it back in the box (<code class="language-plaintext highlighter-rouge">setcar</code>) via the bottom reference,
leaving the incremented value behind to be returned.</p>

<p>This all gets a little more interesting when closures interact:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fancy-adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">total</span> <span class="mi">0</span><span class="p">))</span>
    <span class="o">`</span><span class="p">(</span><span class="ss">:add</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">cl-incf</span> <span class="nv">total</span><span class="p">))</span>
      <span class="ss">:set</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">setf</span> <span class="nv">total</span> <span class="nv">v</span><span class="p">))</span>
      <span class="ss">:get</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">total</span><span class="p">))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">counter</span> <span class="p">(</span><span class="nv">fancy-adder</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:set</span><span class="p">)</span> <span class="mi">100</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:add</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:add</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:get</span><span class="p">)))</span>
<span class="c1">;; =&gt; 102</span>

<span class="p">(</span><span class="nv">fancy-adder</span><span class="p">)</span>
<span class="c1">;; =&gt; (:add #[0 "\300\211\242T\240\207" [(0)] 2]</span>
<span class="c1">;;     :set #[257 "\300\001\240\207" [(0)] 3]</span>
<span class="c1">;;     :get #[0 "\300\242\207" [(0)] 1])</span>
</code></pre></div></div>

<p>This is starting to resemble object oriented programming, with methods
acting upon fields stored in a common, closed-over environment.</p>

<p>All three closures share a common variable, <code class="language-plaintext highlighter-rouge">total</code>. Since I didn’t
use <code class="language-plaintext highlighter-rouge">print-circle</code>, this isn’t obvious from the last result, but each
of those <code class="language-plaintext highlighter-rouge">(0)</code> conses is the same object. When one closure mutates
the box, they all see the change. Here’s essentially how <code class="language-plaintext highlighter-rouge">fancy-adder</code>
is transformed by the byte compiler:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fancy-adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">box</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">0</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">list</span> <span class="ss">:add</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\211\242T\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span>
          <span class="ss">:set</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">257</span> <span class="s">"\300\001\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">3</span><span class="p">)</span>
          <span class="ss">:get</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\242\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">1</span><span class="p">))))</span>
</code></pre></div></div>

<p>The backquote in the original <code class="language-plaintext highlighter-rouge">fancy-adder</code> brings this article full
circle. This final example wouldn’t work correctly if those lambdas
weren’t evaluated properly.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Finding the Best 64-bit Simulation PRNG</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/09/21/"/>
    <id>urn:uuid:637af55f-6e33-31e5-25fa-edb590a16d44</id>
    <updated>2017-09-21T21:25:00Z</updated>
    <category term="c"/><category term="compsci"/><category term="x86"/><category term="crypto"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><strong>August 2018 Update</strong>: <em>xoroshiro128+ fails <a href="http://pracrand.sourceforge.net/">PractRand</a> very
badly. Since this article was published, its authors have supplanted it
with <strong>xoshiro256**</strong>. It has essentially the same performance, but
better statistical properties. xoshiro256** is now my preferred PRNG.</em></p>

<p>I use pseudo-random number generators (PRNGs) a whole lot. They’re an
essential component in lots of algorithms and processes.</p>

<ul>
  <li>
    <p><strong>Monte Carlo simulations</strong>, where PRNGs are used to <a href="https://possiblywrong.wordpress.com/2015/09/15/kanoodle-iq-fit-and-dancing-links/">compute
numeric estimates</a> for problems that are difficult or impossible
to solve analytically.</p>
  </li>
  <li>
    <p><a href="/blog/2017/04/27/"><strong>Monte Carlo tree search AI</strong></a>, where massive numbers of games
are played out randomly in search of an optimal move. This is a
specific application of the last item.</p>
  </li>
  <li>
    <p><a href="https://github.com/skeeto/carpet-fractal-genetics"><strong>Genetic algorithms</strong></a>, where a PRNG creates the initial
population, and then later guides in mutation and breeding of selected
solutions.</p>
  </li>
  <li>
    <p><a href="https://blog.cr.yp.to/20140205-entropy.html"><strong>Cryptography</strong></a>, where cryptographically-secure PRNGs
(CSPRNGs) produce output that is predictable for recipients who know
a particular secret, but not for anyone else. This article is only
concerned with plain PRNGs.</p>
  </li>
</ul>

<p>For the first three “simulation” uses, there are two primary factors
that drive the selection of a PRNG. These factors can be at odds with
each other:</p>

<ol>
  <li>
    <p>The PRNG should be <em>very</em> fast. The application should spend its
time running the actual algorithms, not generating random numbers.</p>
  </li>
  <li>
    <p>PRNG output should have robust statistical qualities. Bits should
appear to be independent and the output should closely follow the
desired distribution. Poor quality output will negatively affect
the algorithms using it. Also just as important is <a href="http://mumble.net/~campbell/2014/04/28/uniform-random-float">how you use
it</a>, but this article will focus only on generating bits.</p>
  </li>
</ol>

<p>In other situations, such as in cryptography or online gambling,
another important property is that an observer can’t learn anything
meaningful about the PRNG’s internal state from its output. For the
three simulation cases I care about, this is not a concern. Only speed
and quality properties matter.</p>

<p>Depending on the programming language, the PRNGs found in various
standard libraries may be of dubious quality. They’re slower than they
need to be, or have poorer quality than required. In some cases, such
as <code class="language-plaintext highlighter-rouge">rand()</code> in C, the algorithm isn’t specified, and you can’t rely on
it for anything outside of trivial examples. In other cases the
algorithm and behavior <em>is</em> specified, but you could easily do better
yourself.</p>

<p>My preference is to BYOPRNG: <em>Bring Your Own Pseudo-random Number
Generator</em>. You get reliable, identical output everywhere. Also, in
the case of C and C++ — and if you do it right — by embedding the PRNG
in your project, it will get inlined and unrolled, making it far more
efficient than a <a href="/blog/2016/10/27/">slow call into a dynamic library</a>.</p>

<p>A fast PRNG is going to be small, making it a great candidate for
embedding as, say, a header library. That leaves just one important
question, “Can the PRNG be small <em>and</em> have high quality output?” In
the 21st century, the answer to this question is an emphatic “yes!”</p>

<p>For the past few years my main go-to for a drop-in PRNG has been
<a href="https://en.wikipedia.org/wiki/Xorshift">xorshift*</a>. The body of the function is 6 lines of C, and its
entire state is a 64-bit integer, directly seeded. However, there are a
number of choices here, including other variants of Xorshift. How do I
know which one is best? The only way to know is to test it, hence my
64-bit PRNG shootout:</p>

<ul>
  <li><a href="https://github.com/skeeto/prng64-shootout"><strong>64-bit PRNG Shootout</strong></a></li>
</ul>

<p>Sure, there <a href="http://xoroshiro.di.unimi.it/">are other such shootouts</a>, but they’re all missing
something I want to measure. I also want to test in an environment very
close to how I’d use these PRNGs myself.</p>
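
<p>For a sense of how small these generators are, here is a minimal sketch of
a xorshift64* step as it might appear in a header library. The shift triple
(12, 25, 27) and the multiplier are the commonly published xorshift64*
constants rather than something taken from the shootout code, and the name
<code class="language-plaintext highlighter-rouge">xorshift64star</code> is hypothetical. The state must be seeded with a
nonzero value.</p>

```c
#include <stdint.h>

/* Sketch of xorshift64*: one 64-bit word of state, seeded directly.
 * The state must never be zero, or every output will be zero.
 * Constants are the commonly published ones (an assumption here). */
static inline uint64_t
xorshift64star(uint64_t s[1])
{
    uint64_t x = *s;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *s = x;                         /* the xorshift result is the new state */
    return x * 0x2545f4914f6cdd1d;  /* the "*" step scrambles the output */
}
```

<p>Declaring it <code class="language-plaintext highlighter-rouge">static inline</code> in a header is what lets the compiler
inline and unroll it at each call site.</p>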

<h3 id="shootout-results">Shootout results</h3>

<p>Before getting into the details of the benchmark and each generator,
here are the results. These tests were run on an i7-6700 (Skylake)
running Linux 4.9.0.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                               Speed (MB/s)
PRNG           FAIL  WEAK  gcc-6.3.0 clang-3.8.1
------------------------------------------------
baseline          X     X      15000       13100
blowfishcbc16     0     1        169         157
blowfishcbc4      0     5        725         676
blowfishctr16     1     3        187         184
blowfishctr4      1     5        890        1000
mt64              1     7       1700        1970
pcg64             0     4       4150        3290
rc4               0     5        366         185
spcg64            0     8       5140        4960
xoroshiro128+     0     6       8100        7720
xorshift128+      0     2       7660        6530
xorshift64*       0     3       4990        5060
</code></pre></div></div>

<p><strong>The clear winner is <a href="http://xoroshiro.di.unimi.it/">xoroshiro128+</a></strong>, with a function body of
just 7 lines of C. It’s clearly the fastest, and the output had no
observed statistical failures. However, that’s not the whole story. A
couple of the other PRNGs have advantages that situationally make
them better suited than xoroshiro128+. I’ll go over these in the
discussion below.</p>

<p>These two versions of GCC and Clang were chosen because these are the
latest available in Debian 9 “Stretch.” It’s easy to build and run the
benchmark yourself if you want to try a different version.</p>

<h3 id="speed-benchmark">Speed benchmark</h3>

<p>In the speed benchmark, the PRNG is initialized, a 1-second <code class="language-plaintext highlighter-rouge">alarm(1)</code>
is set, then the PRNG fills a large <code class="language-plaintext highlighter-rouge">volatile</code> buffer of 64-bit unsigned
integers again and again as quickly as possible until the alarm fires.
The amount of memory written is measured as the PRNG’s speed.</p>

<p>The baseline “PRNG” writes zeros into the buffer. This represents the
absolute speed limit that no PRNG can exceed.</p>

<p>The purpose for making the buffer <code class="language-plaintext highlighter-rouge">volatile</code> is to force the entire
output to actually be “consumed” as far as the compiler is concerned.
Otherwise the compiler plays nasty tricks to make the program do as
little work as possible. Another way to deal with this would be to
<code class="language-plaintext highlighter-rouge">write(2)</code> the buffer, but of course I didn’t want to introduce
unnecessary I/O into a benchmark.</p>

<p>On Linux, SIGALRM was impressively consistent between runs, meaning it
was perfectly suitable for this benchmark. To account for any process
scheduling wonkiness, the benchmark was run 8 times and only the
fastest time was kept.</p>

<p>The SIGALRM handler sets a <code class="language-plaintext highlighter-rouge">volatile</code> global variable that tells the
generator to stop. The PRNG call was unrolled 8 times to keep the
alarm check from significantly impacting the benchmark. You can see
the effect for yourself by changing <code class="language-plaintext highlighter-rouge">UNROLL</code> to 1 (i.e. “don’t
unroll”) in the code. Unrolling beyond 8 times had no measurable
effect in my tests.</p>

<p>Due to the PRNGs being inlined, this unrolling makes the benchmark
less realistic, and it shows in the results. Using <code class="language-plaintext highlighter-rouge">volatile</code> for the
buffer helped to counter this effect and reground the results. This is
a fuzzy problem, and there’s not really any way to avoid it, but I
will also discuss this below.</p>
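
<p>The pieces described in this section (the alarm, the stop flag, the
<code class="language-plaintext highlighter-rouge">volatile</code> buffer, and the 8x unroll) fit together roughly as follows. This
is a sketch under assumptions, not the shootout’s actual code: the names
<code class="language-plaintext highlighter-rouge">bench</code>, <code class="language-plaintext highlighter-rouge">on_alarm</code>, and <code class="language-plaintext highlighter-rouge">BUFLEN</code> are hypothetical, the unroll is
written out by hand instead of using an <code class="language-plaintext highlighter-rouge">UNROLL</code> macro, and the generator
plugged in is the zero-writing baseline.</p>

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define BUFLEN (1 << 16)  /* assumed buffer size in 64-bit words */

static volatile sig_atomic_t running;
static volatile uint64_t buffer[BUFLEN];

/* SIGALRM handler: only sets the stop flag. */
static void
on_alarm(int signum)
{
    (void)signum;
    running = 0;
}

/* The baseline "PRNG": ignores its state and returns zero. */
static inline uint64_t
baseline(uint64_t s[1])
{
    (void)s;
    return 0;
}

/* Fill the buffer over and over until the alarm fires, then return
 * the number of bytes written. seconds must be nonzero. */
static uint64_t
bench(unsigned seconds)
{
    uint64_t state[1] = {0};
    uint64_t bytes = 0;
    size_t i = 0;
    running = 1;
    signal(SIGALRM, on_alarm);
    alarm(seconds);
    while (running) {  /* one flag check per 8 outputs */
        buffer[i + 0] = baseline(state);
        buffer[i + 1] = baseline(state);
        buffer[i + 2] = baseline(state);
        buffer[i + 3] = baseline(state);
        buffer[i + 4] = baseline(state);
        buffer[i + 5] = baseline(state);
        buffer[i + 6] = baseline(state);
        buffer[i + 7] = baseline(state);
        i = (i + 8) % BUFLEN;
        bytes += 8 * sizeof(buffer[0]);
    }
    return bytes;
}
```

<p>Dividing the return value by the number of seconds gives a throughput
figure. Swapping <code class="language-plaintext highlighter-rouge">baseline</code> for a real generator, and keeping the fastest
of several runs, is all that remains.</p>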

<h3 id="statistical-benchmark">Statistical benchmark</h3>

<p>To measure the statistical quality of each PRNG — mostly as a sanity
check — the raw binary output was run through <a href="http://webhome.phy.duke.edu/~rgb/General/dieharder.php">dieharder</a> 3.31.1:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>prng | dieharder -g200 -a -m4
</code></pre></div></div>

<p>This statistical analysis has no timing characteristics and the
results should be the same everywhere. You would only need to re-run
it to test with a different version of dieharder, or a different
analysis tool.</p>

<p>There’s not much information to glean from this part of the shootout.
It mostly confirms that all of these PRNGs would work fine for
simulation purposes. The WEAK results are not very significant and are
only useful for breaking ties. Even a true RNG will get some WEAK
results. For example, the <a href="https://en.wikipedia.org/wiki/RdRand">x86 RDRAND</a> instruction (not
included in actual shootout) got 7 WEAK results in my tests.</p>

<p>The FAIL results are more significant, but a single failure doesn’t
mean much. A non-failing PRNG should be preferred to an otherwise
equal PRNG with a failure.</p>

<h3 id="individual-prngs">Individual PRNGs</h3>

<p>Admittedly the definition for “64-bit PRNG” is rather vague. My high
performance targets are all 64-bit platforms, so the highest PRNG
throughput will be built on 64-bit operations (<a href="/blog/2015/07/10/">if not wider</a>).
The original plan was to focus on PRNGs built from 64-bit operations.</p>

<p>Curiosity got the best of me, so I included some PRNGs that don’t use
<em>any</em> 64-bit operations. I just wanted to see how they stacked up.</p>

<h4 id="blowfish">Blowfish</h4>

<p>One of the reasons I <a href="/blog/2017/09/15/">wrote a Blowfish implementation</a> was to
evaluate its performance and statistical qualities, so naturally I
included it in the benchmark. It only uses 32-bit addition and 32-bit
XOR. It has a 64-bit block size, so it’s naturally producing a 64-bit
integer. There are two different properties that combine to make four
variants in the benchmark: number of rounds and block mode.</p>

<p>Blowfish normally uses 16 rounds. This makes it a lot slower than a
non-cryptographic PRNG but gives it a <em>security margin</em>. I don’t care
about the security margin, so I included a 4-round variant. As
expected, it’s about four times faster.</p>

<p>The other feature I tested is the block mode: <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CBC">Cipher Block
Chaining</a> (CBC) versus <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29">Counter</a> (CTR) mode. In CBC mode it
encrypts zeros as plaintext. This just means it’s encrypting its last
output. The ciphertext is the PRNG’s output.</p>

<p>In CTR mode the PRNG is encrypting a 64-bit counter. It’s 11% faster
than CBC in the 16-round variant and 23% faster in the 4-round variant.
The reason is simple, and it’s in part an artifact of unrolling the
generation loop in the benchmark.</p>

<p>In CBC mode, each output depends on the previous, but in CTR mode all
blocks are independent. Work can begin on the next output before the
previous output is complete. The x86 architecture uses out-of-order
execution to achieve many of its performance gains: Instructions may
be executed in a different order than they appear in the program,
though their observable effects must <a href="http://preshing.com/20120515/memory-reordering-caught-in-the-act/">generally be ordered
correctly</a>. Breaking dependencies between instructions allows
out-of-order execution to be fully exercised. It also gives the
compiler more freedom in instruction scheduling, though the <code class="language-plaintext highlighter-rouge">volatile</code>
accesses cannot be reordered with respect to each other (hence it
helping to reground the benchmark).</p>
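
<p>To make the dependency difference concrete, here is a sketch of both modes
used as PRNGs. The <code class="language-plaintext highlighter-rouge">encrypt</code> function is a hypothetical stand-in for a
64-bit block cipher under a fixed key. It is <em>not</em> Blowfish, just a small
invertible mixer so the example stays self-contained. The names
<code class="language-plaintext highlighter-rouge">cbc_next</code> and <code class="language-plaintext highlighter-rouge">ctr_next</code> are likewise assumptions.</p>

```c
#include <stdint.h>

/* Hypothetical stand-in for a keyed 64-bit block cipher. NOT Blowfish:
 * an xorshift and an odd multiply, both invertible, for illustration. */
static uint64_t
encrypt(uint64_t block)
{
    block ^= block >> 32;
    block *= 0x9b60933458e17d7d;
    block ^= block >> 29;
    return block;
}

/* CBC with all-zero plaintext: the input block is zero XOR the previous
 * ciphertext, so each output must wait for the one before it. */
static uint64_t
cbc_next(uint64_t *prev)
{
    *prev = encrypt(0 ^ *prev);
    return *prev;
}

/* CTR: each output encrypts the next counter value. Only the cheap
 * increment is carried from one call to the next. */
static uint64_t
ctr_next(uint64_t *counter)
{
    return encrypt((*counter)++);
}
```

<p>In an unrolled loop, consecutive <code class="language-plaintext highlighter-rouge">ctr_next</code> calls share nothing but the
counter increment, so several encryptions can be in flight at once. Each
<code class="language-plaintext highlighter-rouge">cbc_next</code> call has to finish before the next can start.</p>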

<p>Statistically, the 4-round cipher was not significantly worse than the
16-round cipher. For simulation purposes the 4-round cipher would be
perfectly sufficient, though xoroshiro128+ is still more than 9 times
faster without sacrificing quality.</p>

<p>On the other hand, CTR mode had a single failure in both the 4-round
(dab_filltree2) and 16-round (dab_filltree) variants. At least for
Blowfish, is there something that makes CTR mode less suitable than CBC
mode as a PRNG?</p>

<p>In the end Blowfish is too slow and too complicated to serve as a
simulation PRNG. This was entirely expected, but it’s interesting to see
how it stacks up.</p>

<h4 id="mersenne-twister-mt19937-64">Mersenne Twister (MT19937-64)</h4>

<p>Nobody ever got fired for choosing <a href="https://en.wikipedia.org/wiki/Mersenne_Twister">Mersenne Twister</a>. It’s the
classical choice for simulations, and is still usually recommended to
this day. However, Mersenne Twister’s best days are behind it. I
tested the 64-bit variant, MT19937-64, and there are four problems:</p>

<ul>
  <li>
    <p>It’s between 1/4 and 1/5 the speed of xoroshiro128+.</p>
  </li>
  <li>
    <p>It’s got a large state: 2,500 bytes, versus xoroshiro128+’s 16 bytes.</p>
  </li>
  <li>
    <p>Its implementation is three times bigger than xoroshiro128+, and much
more complicated.</p>
  </li>
  <li>
    <p>It had one statistical failure (dab_filltree2).</p>
  </li>
</ul>

<p>Curiously my implementation is 16% faster with Clang than GCC. Since
Mersenne Twister isn’t seriously in the running, I didn’t take time to
dig into why.</p>

<p>Ultimately I would never choose Mersenne Twister for anything anymore.
This was also not surprising.</p>

<h4 id="permuted-congruential-generator-pcg">Permuted Congruential Generator (PCG)</h4>

<p>The <a href="http://www.pcg-random.org/">Permuted Congruential Generator</a> (PCG) has some really
interesting history behind it, particularly with its somewhat <a href="http://www.pcg-random.org/paper.html">unusual
paper</a>, controversial for both its excessive length (58 pages)
and informal style. It’s in close competition with Xorshift and
xoroshiro128+. I was really interested in seeing how it stacked up.</p>

<p>PCG is really just a Linear Congruential Generator (LCG) that doesn’t
output the lowest bits (too poor quality), and has an extra
permutation step to make up for the LCG’s other weaknesses. I included
two variants in my benchmark: the official PCG and a “simplified” PCG
(sPCG) with a simple permutation step. sPCG is just the first PCG
presented in the paper (34 pages in!).</p>

<p>Here’s essentially what the simplified version looks like:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">spcg32</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">m</span> <span class="o">=</span> <span class="mh">0x9b60933458e17d7d</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a</span> <span class="o">=</span> <span class="mh">0xd737232eeccdf7ed</span><span class="p">;</span>
    <span class="o">*</span><span class="n">s</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">shift</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="o">*</span><span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">*</span><span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="n">shift</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The third line with the modular multiplication and addition is the
LCG. The bit shift is the permutation. This PCG uses the most
significant three bits of the result to determine which 32 bits to
output. That’s <em>the</em> novel component of PCG.</p>
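
<p>To see exactly which bits the permutation selects, the shift can be pulled
out into its own function. This is a sketch with hypothetical helper names
(<code class="language-plaintext highlighter-rouge">spcg32_shift</code>, <code class="language-plaintext highlighter-rouge">spcg32_output</code>) that mirrors the last two lines of
<code class="language-plaintext highlighter-rouge">spcg32</code> without advancing the state.</p>

```c
#include <stdint.h>

/* Shift chosen by the top three bits of the state: a selector of 0..7
 * maps to a shift of 29 down to 22. */
static int
spcg32_shift(uint64_t s)
{
    return 29 - (int)(s >> 61);
}

/* The 32-bit output window: bits [shift, shift + 32) of the state.
 * Even at the largest shift (29) the window covers bits 29..60, so it
 * never includes the three selector bits (61..63). */
static uint32_t
spcg32_output(uint64_t s)
{
    return (uint32_t)(s >> spcg32_shift(s));
}
```

<p>The window always sits in the upper part of the state, which is how the
weak low bits of the LCG are kept out of the output.</p>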

<p>The two constants are entirely my own devising. It’s two 64-bit primes
generated using Emacs’ <code class="language-plaintext highlighter-rouge">M-x calc</code>: <code class="language-plaintext highlighter-rouge">2 64 ^ k r k n k p k p k p</code>.</p>

<p>Heck, that’s so simple that I could easily memorize this and code it
from scratch on demand. Key takeaway: This is <strong>one way that PCG is
situationally better than xoroshiro128+</strong>. In a pinch I could use Emacs
to generate a couple of primes and code the rest from memory. If you
participate in coding competitions, take note.</p>

<p>However, you probably also noticed PCG only generates 32-bit integers
despite using 64-bit operations. To properly generate a 64-bit value
we’d need 128-bit operations, which would need to be implemented in
software.</p>

<p>Instead, I doubled up on everything to run two PRNGs in parallel.
Despite the doubling in state size, the period doesn’t get any larger
since the PRNGs don’t interact with each other. We get something in
return, though. Remember what I said about out-of-order execution?
Except for the last step combining their results, since the two PRNGs
are independent, doubling up shouldn’t <em>quite</em> halve the performance,
particularly with the benchmark loop unrolling business.</p>

<p>Here’s my doubled-up version:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">spcg64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">m</span>  <span class="o">=</span> <span class="mh">0x9b60933458e17d7d</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a0</span> <span class="o">=</span> <span class="mh">0xd737232eeccdf7ed</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a1</span> <span class="o">=</span> <span class="mh">0x8b260b70b8e98891</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">p0</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">p1</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">p0</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a0</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">p1</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a1</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r0</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="n">p0</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">r1</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="n">p1</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="kt">uint64_t</span> <span class="n">high</span> <span class="o">=</span> <span class="n">p0</span> <span class="o">&gt;&gt;</span> <span class="n">r0</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">low</span>  <span class="o">=</span> <span class="n">p1</span> <span class="o">&gt;&gt;</span> <span class="n">r1</span><span class="p">;</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">high</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">low</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The “full” PCG has some extra shifts that make it 25% (GCC) to 50%
(Clang) slower than the “simplified” PCG, but it does halve the WEAK
results.</p>

<p>In this 64-bit form, both are significantly slower than xoroshiro128+.
However, if you find yourself only needing 32 bits at a time (always
throwing away the high 32 bits from a 64-bit PRNG), 32-bit PCG is
faster than using xoroshiro128+ and throwing away half its output.</p>

<h4 id="rc4">RC4</h4>

<p>This is another CSPRNG where I was curious how it would stack up. It
only uses 8-bit operations, and it generates a 64-bit integer one byte
at a time. It’s the slowest after 16-round Blowfish and generally not
useful as a simulation PRNG.</p>

<h4 id="xoroshiro128">xoroshiro128+</h4>

<p>xoroshiro128+ is the obvious winner in this benchmark and it seems to be
the best 64-bit simulation PRNG available. If you need a fast, quality
PRNG, just drop these 11 lines into your C or C++ program:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">xoroshiro128plus</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">s0</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">s1</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">s0</span> <span class="o">+</span> <span class="n">s1</span><span class="p">;</span>
    <span class="n">s1</span> <span class="o">^=</span> <span class="n">s0</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">s0</span> <span class="o">&lt;&lt;</span> <span class="mi">55</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">s0</span> <span class="o">&gt;&gt;</span> <span class="mi">9</span><span class="p">))</span> <span class="o">^</span> <span class="n">s1</span> <span class="o">^</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&lt;&lt;</span> <span class="mi">14</span><span class="p">);</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&lt;&lt;</span> <span class="mi">36</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&gt;&gt;</span> <span class="mi">28</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There’s one important caveat: <strong>That 16-byte state must be
well-seeded.</strong> Having lots of zero bytes will lead to <em>terrible</em> initial
output until the generator mixes it all up. Having all zero bytes will
completely break the generator. If you’re going to seed from, say, the
unix epoch, then XOR it with 16 static random bytes.</p>
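
<p>As a rough sketch of that last suggestion, here is one way the seeding could look. The helper name <code class="language-plaintext highlighter-rouge">xoroshiro128plus_seed</code> and both constants are placeholders of my own choosing for illustration, not part of any library, and you should substitute your own random bytes.</p>

```c
#include <stdint.h>

/* Hypothetical helper: fill the 16-byte xoroshiro128+ state from a
 * timestamp XORed with 16 fixed bytes. The two constants below are
 * arbitrary placeholders. Because they differ from each other, no
 * timestamp can make both halves of the state zero at once. */
void
xoroshiro128plus_seed(uint64_t s[2], uint64_t epoch)
{
    s[0] = epoch ^ UINT64_C(0x9e3779b97f4a7c15);
    s[1] = epoch ^ UINT64_C(0xbf58476d1ce4e5b9);
}
```

<p>A caller would pass something like <code class="language-plaintext highlighter-rouge">(uint64_t)time(0)</code> as the second argument. This only guards against a mostly-zero state. It does nothing to make the seed unpredictable.</p>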

<h4 id="xorshift128-and-xorshift64">xorshift128+ and xorshift64*</h4>

<p>These generators are closely related and, like I said, xorshift64* was
what I used for years. Looks like it’s time to retire it.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">xorshift64star</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">x</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">12</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">25</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">27</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x2545f4914f6cdd1d</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However, unlike both xoroshiro128+ and xorshift128+, xorshift64* will
tolerate weak seeding so long as it’s not literally zero. Zero will also
break this generator.</p>
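
<p>The zero case is easy to see: shifting and XORing zero only ever produces zero, so the state never leaves it and every output is zero. Below is a minimal sketch of a guard against that. The generator is repeated so the block stands alone, and <code class="language-plaintext highlighter-rouge">xorshift64star_seed</code> is a hypothetical helper name, not an established API.</p>

```c
#include <stdint.h>

/* Same generator as above, repeated so this block is self-contained. */
uint64_t
xorshift64star(uint64_t s[1])
{
    uint64_t x = s[0];
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    s[0] = x;
    return x * UINT64_C(0x2545f4914f6cdd1d);
}

/* Hypothetical guard: any nonzero seed is accepted as-is, and a zero
 * seed is replaced with 1 (an arbitrary nonzero choice). */
void
xorshift64star_seed(uint64_t s[1], uint64_t seed)
{
    s[0] = seed ? seed : 1;
}
```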

<p>If it weren’t for xoroshiro128+, then xorshift128+ would have been the
winner of the benchmark and my new favorite choice.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">xorshift128plus</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">x</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">y</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">23</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span> <span class="o">^</span> <span class="n">y</span> <span class="o">^</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">y</span> <span class="o">&gt;&gt;</span> <span class="mi">26</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s a lot like xoroshiro128+, including the need to be well-seeded,
but it’s just slow enough to lose out. There’s no reason to use
xorshift128+ instead of xoroshiro128+.</p>

<h3 id="conclusion">Conclusion</h3>

<p>My own takeaway (until I re-evaluate some years in the future):</p>

<ul>
  <li>The best 64-bit simulation PRNG is xoroshiro128+.</li>
  <li>“Simplified” PCG can be useful in a pinch.</li>
  <li>When only 32-bit integers are necessary, use PCG.</li>
</ul>

<p>Things can change significantly between platforms, though. Here’s the
shootout on an ARM Cortex-A53:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                    Speed (MB/s)
PRNG         gcc-5.4.0   clang-3.8.0
------------------------------------
baseline          2560        2400
blowfishcbc16       36.5        45.4
blowfishcbc4       135         173
blowfishctr16       36.4        45.2
blowfishctr4       133         168
mt64               207         254
pcg64              980         712
rc4                 96.6        44.0
spcg64            1021         948
xoroshiro128+     2560        1570
xorshift128+      2560        1520
xorshift64*       1360        1080
</code></pre></div></div>

<p>LLVM is not as mature on this platform, but, with GCC, both
xoroshiro128+ and xorshift128+ matched the baseline! It seems memory
is the bottleneck.</p>

<p>So don’t necessarily take my word for it. You can run this shootout in
your own environment — perhaps even tossing in more PRNGs — to find
what’s appropriate for your own situation.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Some Performance Advantages of Lexical Scope</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/22/"/>
    <id>urn:uuid:21bc4afa-caa8-37ed-a912-a35f35d0e432</id>
    <updated>2016-12-22T02:33:36Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>I recently had a discussion with <a href="http://ergoemacs.org/">Xah Lee</a> about lexical scope in
Emacs Lisp. The topic was why <code class="language-plaintext highlighter-rouge">lexical-binding</code> exists at a file-level
when there was already <code class="language-plaintext highlighter-rouge">lexical-let</code> (from <code class="language-plaintext highlighter-rouge">cl</code>), prompted by my
previous article on <a href="/blog/2016/12/11/">JIT byte-code compilation</a>. The specific
context is Emacs Lisp, but these concepts apply to language design in
general.</p>

<p>Until Emacs 24.1 (June 2012), Elisp only had dynamically scoped
variables — a feature, mostly by accident, common to old lisp
dialects. While dynamic scope has some selective uses, it’s widely
regarded as a mistake for local variables, and virtually no other
languages have adopted it.</p>

<p>Way back in 1993, Dave Gillespie’s deviously clever <code class="language-plaintext highlighter-rouge">lexical-let</code>
macro <a href="http://git.savannah.gnu.org/cgit/emacs.git/commit/?h=fcd73769&amp;id=fcd737693e8e320acd70f91ec8e0728563244805">was committed</a> to the <code class="language-plaintext highlighter-rouge">cl</code> package, providing a rudimentary
form of opt-in lexical scope. The macro walks its body replacing local
variable names with guaranteed-unique gensym names: the exact same
technique used in macros to create “hygienic” bindings that aren’t
visible to the macro body. It essentially “fakes” lexical scope within
Elisp’s dynamic scope by preventing variable name collisions.</p>

<p>For example, here’s one of the consequences of dynamic scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">inner</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">setq</span> <span class="nv">v</span> <span class="ss">:inner</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">outer</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="ss">:outer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">inner</span><span class="p">)</span>
    <span class="nv">v</span><span class="p">))</span>

<span class="p">(</span><span class="nv">outer</span><span class="p">)</span>
<span class="c1">;; =&gt; :inner</span>
</code></pre></div></div>

<p>The “local” variable <code class="language-plaintext highlighter-rouge">v</code> in <code class="language-plaintext highlighter-rouge">outer</code> is visible to its callee, <code class="language-plaintext highlighter-rouge">inner</code>,
which can access and manipulate it. The meaning of the <em>free variable</em>
<code class="language-plaintext highlighter-rouge">v</code> in <code class="language-plaintext highlighter-rouge">inner</code> depends entirely on the run-time call stack. It might
be a global variable, or it might be a local variable for a caller,
direct or indirect.</p>

<p>Using <code class="language-plaintext highlighter-rouge">lexical-let</code> deconflicts these names, giving the effect of
lexical scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">v</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">lexical-outer</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">v</span> <span class="ss">:outer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">inner</span><span class="p">)</span>
    <span class="nv">v</span><span class="p">))</span>

<span class="p">(</span><span class="nv">lexical-outer</span><span class="p">)</span>
<span class="c1">;; =&gt; :outer</span>
</code></pre></div></div>

<p>But there’s more to lexical scope than this. Closures only make sense
in the context of lexical scope, and the most useful feature of
<code class="language-plaintext highlighter-rouge">lexical-let</code> is that lambda expressions evaluate to closures. The
macro implements this using a technique called <a href="https://en.wikipedia.org/wiki/Lambda_lifting"><em>closure
conversion</em></a>. Additional parameters are added to the original
lambda function, one for each lexical variable (and not just each
closed-over variable), and the whole thing is wrapped in <em>another</em>
lambda function that invokes the original lambda function with the
additional parameters filled with the closed-over variables. Yes, the
variables themselves (i.e. symbols), <em>not</em> just their values, much like
pass-by-reference. This last point means different closures can
properly close over the same variables, and they can bind new values.</p>

<p>To roughly illustrate how this works, the first lambda expression
below, which closes over the lexical variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code>, would be
converted into the latter by <code class="language-plaintext highlighter-rouge">lexical-let</code>. The <code class="language-plaintext highlighter-rouge">#:</code> is Elisp’s syntax
for uninterned symbols. So <code class="language-plaintext highlighter-rouge">#:x</code> is <em>a</em> symbol <code class="language-plaintext highlighter-rouge">x</code>, but not <em>the</em>
symbol <code class="language-plaintext highlighter-rouge">x</code> (see <code class="language-plaintext highlighter-rouge">print-gensym</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Before conversion:</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">))</span>

<span class="c1">;; After conversion:</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
           <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">symbol-value</span> <span class="nv">x</span><span class="p">)</span>
              <span class="p">(</span><span class="nb">symbol-value</span> <span class="nv">y</span><span class="p">)))</span>
         <span class="o">'</span><span class="ss">#:x</span> <span class="o">'</span><span class="ss">#:y</span> <span class="nv">args</span><span class="p">))</span>
</code></pre></div></div>

<p>I’ve said on multiple occasions that <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> has
significant advantages, both in performance and static analysis, and
so it should be used for all future Elisp code. The only reason it’s
not the default is because it breaks some old (badly written) code.
However, <strong><code class="language-plaintext highlighter-rouge">lexical-let</code> doesn’t realize any of these advantages</strong>! In
fact, it has worse performance than straightforward dynamic scope with
<code class="language-plaintext highlighter-rouge">let</code>.</p>

<ol>
  <li>
    <p>New symbol objects are allocated and initialized (<code class="language-plaintext highlighter-rouge">make-symbol</code>) on
each run-time evaluation, one per lexical variable.</p>
  </li>
  <li>
    <p>Since it’s just faking it, <code class="language-plaintext highlighter-rouge">lexical-let</code> still uses dynamic
bindings, which are more expensive than lexical bindings. It varies
depending on the C compiler that built Emacs, but dynamic variable
accesses (opcode <code class="language-plaintext highlighter-rouge">varref</code>) take around 30% longer than lexical
variable accesses (opcode <code class="language-plaintext highlighter-rouge">stack-ref</code>). Assignment is far worse,
where dynamic variable assignment (<code class="language-plaintext highlighter-rouge">varset</code>) takes 650% longer than
lexical variable assignment (<code class="language-plaintext highlighter-rouge">stack-set</code>). How I measured all this
is a topic for another article.</p>
  </li>
  <li>
    <p>The “lexical” variables are accessed using <code class="language-plaintext highlighter-rouge">symbol-value</code>, a full
function call, so they’re even slower than normal dynamic
variables.</p>
  </li>
  <li>
    <p>Because converted lambda expressions are constructed dynamically at
run-time within the body of <code class="language-plaintext highlighter-rouge">lexical-let</code>, the resulting closure is
only partially byte-compiled even if the code as a whole has been
byte-compiled. In contrast, <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> closures are fully
compiled. How this works is worth <a href="/blog/2017/12/14/">its own article</a>.</p>
  </li>
  <li>
    <p>Converted lambda expressions include the additional internal
function invocation, making them slower.</p>
  </li>
</ol>

<p>While <code class="language-plaintext highlighter-rouge">lexical-let</code> is clever, and occasionally useful prior to Emacs
24, it may come at a hefty performance cost if evaluated frequently.
There’s no reason to use it anymore.</p>

<h3 id="constraints-on-code-generation">Constraints on code generation</h3>

<p>Another reason to be wary of dynamic scope is that it puts needless
constraints on the compiler, preventing a number of important
optimization opportunities. For example, consider the following
function, <code class="language-plaintext highlighter-rouge">bar</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">bar</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">y</span> <span class="mi">2</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">foo</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
</code></pre></div></div>

<p>Byte-compile this function under dynamic scope (<code class="language-plaintext highlighter-rouge">lexical-binding:
nil</code>) and <a href="/blog/2014/01/04/">disassemble it</a> to see what it looks like.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">bar</span><span class="p">)</span>
<span class="p">(</span><span class="nb">disassemble</span> <span class="nf">#'</span><span class="nv">bar</span><span class="p">)</span>
</code></pre></div></div>

<p>That pops up a buffer with the disassembly listing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  1
1       constant  2
2       varbind   y
3       varbind   x
4       constant  foo
5       call      0
6       discard
7       varref    x
8       varref    y
9       plus
10      unbind    2
11      return
</code></pre></div></div>

<p>It’s 12 instructions, 5 of which deal with dynamic bindings. The
byte-compiler doesn’t always produce optimal byte-code, but this just
so happens to be <em>nearly</em> optimal byte-code. The <code class="language-plaintext highlighter-rouge">discard</code> (a very
fast instruction) isn’t necessary, but otherwise no more compiler
smarts can improve on this. Since the variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are
visible to <code class="language-plaintext highlighter-rouge">foo</code>, they must be bound before the call and <a href="/blog/2016/07/25/">loaded after
the call</a>. While generally this function will return 3, the
compiler cannot assume so since it ultimately depends on the behavior of
<code class="language-plaintext highlighter-rouge">foo</code>. Its hands are tied.</p>

<p>Compare this to the lexical scope version (<code class="language-plaintext highlighter-rouge">lexical-binding: t</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  1
1       constant  2
2       constant  foo
3       call      0
4       discard
5       stack-ref 1
6       stack-ref 1
7       plus
8       return
</code></pre></div></div>

<p>It’s only 9 instructions, none of which are expensive dynamic variable
instructions. And this isn’t even close to the optimal byte-code. In
fact, as of Emacs 25.1 the byte-compiler often doesn’t produce the
optimal byte-code for lexical scope code and still needs some work.
<strong>Despite not firing on all cylinders, lexical scope still manages to
beat dynamic scope in performance benchmarks.</strong></p>

<p>Here’s the optimal byte-code, should the byte-compiler become smarter
someday:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  foo
1       call      0
2       constant  3
3       return
</code></pre></div></div>

<p>It’s down to 4 instructions due to computing the math operation at
compile time. Emacs’ byte-compiler only has rudimentary constant
folding, so it doesn’t notice that <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are constants and
misses this optimization. I speculate this is due to its roots
compiling under dynamic scope. Since <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are no longer exposed
to <code class="language-plaintext highlighter-rouge">foo</code>, the compiler has the opportunity to optimize them out of
existence. I haven’t measured it, but I would expect this to be
significantly faster than the dynamic scope version of this function.</p>

<h3 id="optional-dynamic-scope">Optional dynamic scope</h3>

<p>You might be thinking, “What if I really <em>do</em> want <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> to be
dynamically bound for <code class="language-plaintext highlighter-rouge">foo</code>?” This is often useful. Many of Emacs’ own
functions are designed to have certain variables dynamically bound
around them. For example, the print family of functions use the global
variable <code class="language-plaintext highlighter-rouge">standard-output</code> to determine where to send output by
default.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">princ</span> <span class="s">"value = "</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">prin1</span> <span class="nv">value</span><span class="p">))</span>
</code></pre></div></div>

<p>Have no fear: <strong>With <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> you can have your cake and
eat it too.</strong> Variables declared with <code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defconst</code>, or
<code class="language-plaintext highlighter-rouge">defvaralias</code> are marked as “special” with an internal bit flag
(<code class="language-plaintext highlighter-rouge">declared_special</code> in C). When the compiler detects one of these
variables (<code class="language-plaintext highlighter-rouge">special-variable-p</code>), it uses a classical dynamic binding.</p>

<p>Declaring both <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> as special restores the original semantics,
reverting <code class="language-plaintext highlighter-rouge">bar</code> back to its old byte-code definition (next time it’s
compiled, that is). But it would be poor form to mark <code class="language-plaintext highlighter-rouge">x</code> or <code class="language-plaintext highlighter-rouge">y</code> as
special: You’d de-optimize all code (compiled <em>after</em> the declaration)
anywhere in Emacs that uses these names. As a package author, only do
this with the namespace-prefixed variables that belong to you.</p>

<p>The only way to unmark a special variable is with the undocumented
function <code class="language-plaintext highlighter-rouge">internal-make-var-non-special</code>. I expected <code class="language-plaintext highlighter-rouge">makunbound</code> to
do this, but as of Emacs 25.1 it does not. This could possibly be
considered a bug.</p>

<h3 id="accidental-closures">Accidental closures</h3>

<p>I’ve said there are absolutely no advantages to <code class="language-plaintext highlighter-rouge">lexical-binding: nil</code>.
It’s only the default for the sake of backwards-compatibility. However,
there <em>is</em> one case where <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> introduces a subtle issue
that would otherwise not exist. Take this code for example (and
never mind <code class="language-plaintext highlighter-rouge">prin1-to-string</code> for a moment):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">function-as-string</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nb">prin1</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="ss">:example</span><span class="p">)</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
</code></pre></div></div>

<p>This creates and serializes a closure, which is <a href="/blog/2013/12/30/">one of Elisp’s unique
features</a>. It doesn’t close over any variables, so it should be
pretty simple. However, this function will only work correctly under
<code class="language-plaintext highlighter-rouge">lexical-binding: t</code> when byte-compiled.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">function-as-string</span><span class="p">)</span>
<span class="c1">;; =&gt; "(closure ((temp-buffer . #&lt;buffer  *temp*&gt;) t) nil :example)"</span>
</code></pre></div></div>

<p>The interpreter doesn’t analyze the closure, so it just closes over
everything. This includes the hidden variable <code class="language-plaintext highlighter-rouge">temp-buffer</code> created by
the <code class="language-plaintext highlighter-rouge">with-temp-buffer</code> macro, resulting in an abstraction leak.
Buffers aren’t readable, so this will signal an error if an attempt is
made to read this function back into an s-expression. The
byte-compiler fixes this by noticing <code class="language-plaintext highlighter-rouge">temp-buffer</code> isn’t actually
closed over and so doesn’t include it in the closure, making it work
correctly.</p>

<p>Under <code class="language-plaintext highlighter-rouge">lexical-binding: nil</code> it works correctly either way:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">function-as-string</span><span class="p">)</span>
<span class="c1">;; =&gt; "(lambda nil :example)"</span>
</code></pre></div></div>

<p>This may seem contrived — it’s certainly unlikely — but <a href="https://github.com/jwiegley/emacs-async/issues/17">it has come
up in practice</a>. Still, it’s no reason to avoid <code class="language-plaintext highlighter-rouge">lexical-binding: t</code>.</p>

<h3 id="use-lexical-scope-in-all-new-code">Use lexical scope in all new code</h3>

<p>As I’ve said again and again, always use <code class="language-plaintext highlighter-rouge">lexical-binding: t</code>. Use
dynamic variables judiciously. And <code class="language-plaintext highlighter-rouge">lexical-let</code> is no replacement. It
has virtually none of the benefits, performs <em>worse</em>, and it only
applies to <code class="language-plaintext highlighter-rouge">let</code>, not any of the other places bindings are created:
function parameters, <code class="language-plaintext highlighter-rouge">dotimes</code>, <code class="language-plaintext highlighter-rouge">dolist</code>, and <code class="language-plaintext highlighter-rouge">condition-case</code>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Zero-allocation Trie Traversal</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/11/13/"/>
    <id>urn:uuid:38dd798b-9e27-3109-590c-3a8482f634a7</id>
    <updated>2016-11-13T06:03:24Z</updated>
    <category term="c"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>As part of a demonstration in <a href="/blog/2016/11/15/">an upcoming article</a>, I wrote a
simple <a href="https://en.wikipedia.org/wiki/Trie">trie</a> implementation. A trie is a search tree where each
key is a sequence of symbols (i.e. a string). Strings with a common
prefix share an initial path down the trie, and the keys themselves
are stored implicitly by the structure of the trie. It’s commonly used
as a sorted set or, when values are associated with nodes, an
associative array.</p>

<p>This wasn’t my first time writing a trie. The curse of programming in
C is rewriting the same data structures and algorithms over and over.
It’s the problem C++ templates are intended to solve. This rewriting
isn’t always bad since each implementation is typically customized for
its specific use, often resulting in greater performance and a smaller
resource footprint.</p>

<p>Every time I’ve rewritten a trie, my implementation is a little bit
better than the last. This time around I discovered an approach for
traversing, both depth-first and breadth-first, an arbitrarily-sized
trie without memory allocation. I’m definitely not the first to
discover something like this. There’s <a href="https://xlinux.nist.gov/dads/HTML/SchorrWaiteGraphMarking.html">Deutsch-Schorr-Waite pointer
reversal</a> for binary graphs (1965) — which I originally learned
from reading the <a href="http://t3x.org/s9fes/">Scheme 9 from Outer Space</a> garbage collector
source — and <a href="http://www.geeksforgeeks.org/morris-traversal-for-preorder/">Morris in-order traversal</a> (1979) for binary
trees. The former requires two extra tag bits per node and the latter
requires no changes to the node layout at all.</p>

<h3 id="whats-a-trie">What’s a trie?</h3>

<p>But before I go further, some background. A trie can come in many
shapes and sizes, but in the simple case each node of a trie has as
many pointers as its alphabet. For illustration purposes, imagine a
trie for strings of only four characters: A, B, C, and D. Each node is
essentially four pointers.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define TRIE_ALPHABET_SIZE  4
#define TRIE_STATIC_INIT    {.flags = 0}
#define TRIE_TERMINAL_FLAG  (1U &lt;&lt; 0)
</span>
<span class="k">struct</span> <span class="n">trie</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">next</span><span class="p">[</span><span class="n">TRIE_ALPHABET_SIZE</span><span class="p">];</span>
    <span class="kt">unsigned</span> <span class="n">flags</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>It includes a <code class="language-plaintext highlighter-rouge">flags</code> field, where a single bit tracks whether or not
a node is terminal, meaning a key ends at this node. Terminal nodes
are not necessarily leaf nodes: a terminal node has children whenever
one key is a prefix of another key. I could instead have used a 1-bit
bit-field (e.g. <code class="language-plaintext highlighter-rouge">int is_terminal : 1;</code>) but I don’t like bit-fields.</p>

<p>A trie with the following keys, inserted in any order:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AAAAA
ABCD
CAA
CAD
CDBD
</code></pre></div></div>

<p>Looks like this (terminal nodes illustrated as small black squares):</p>

<p><img src="/img/trie/trie.svg" alt="" /></p>

<p>The root of the trie is the empty string, and each child represents a
trie prefixed with one of the symbols from the alphabet. This is a
nice recursive definition, and it’s tempting to write recursive
functions to process it. For example, here’s a recursive insertion
function.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">trie_insert_recursive</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!*</span><span class="n">s</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">t</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">|=</span> <span class="n">TRIE_TERMINAL_FLAG</span><span class="p">;</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span> <span class="o">-</span> <span class="sc">'A'</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
        <span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">]));</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
        <span class="o">*</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">trie</span><span class="p">)</span><span class="n">TRIE_STATIC_INIT</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">trie_insert_recursive</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">s</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If the string is empty (<code class="language-plaintext highlighter-rouge">!*s</code>), mark the current node as terminal.
Otherwise recursively insert the substring under the appropriate
child. That’s a tail call, and any optimizing compiler would optimize
this call into a jump back to the beginning of the function
(tail-call optimization), reusing the stack frame as if it were a
simple loop.</p>

<p>If that’s not good enough, such as when optimization is disabled for
debugging and the recursive definition is blowing the stack, this is
trivial to convert to a safe, iterative function. I prefer this
version anyway.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">trie_insert</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="o">*</span><span class="n">s</span><span class="p">;</span> <span class="n">s</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span> <span class="o">-</span> <span class="sc">'A'</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
            <span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">]));</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
                <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
            <span class="o">*</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">trie</span><span class="p">)</span><span class="n">TRIE_STATIC_INIT</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="n">t</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">|=</span> <span class="n">TRIE_TERMINAL_FLAG</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Finding a particular prefix in the trie iteratively is also easy. This
would be used to narrow the trie to a chosen prefix before iterating
over the keys (e.g. find all strings matching a prefix).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span>
<span class="nf">trie_find</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="o">*</span><span class="n">s</span><span class="p">;</span> <span class="n">s</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span> <span class="o">-</span> <span class="sc">'A'</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
            <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">t</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
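
<p>Note that <code class="language-plaintext highlighter-rouge">trie_find</code> succeeds for any prefix, not just for complete
keys, so an exact membership test would also need to check the terminal
flag. Here’s a minimal sketch of that idea. The name <code class="language-plaintext highlighter-rouge">trie_contains</code> is
hypothetical, not part of the code above, and the added <code class="language-plaintext highlighter-rouge">assert</code> guards
assume keys only use the letters A through D.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TRIE_ALPHABET_SIZE  4
#define TRIE_STATIC_INIT    {.flags = 0}
#define TRIE_TERMINAL_FLAG  (1U << 0)

struct trie {
    struct trie *next[TRIE_ALPHABET_SIZE];
    unsigned flags;
};

/* Iterative insertion as above, plus a bounds check on the symbol. */
int
trie_insert(struct trie *t, const char *s)
{
    for (; *s; s++) {
        int i = *s - 'A';
        assert(i >= 0 && i < TRIE_ALPHABET_SIZE);
        if (!t->next[i]) {
            t->next[i] = malloc(sizeof(*t->next[i]));
            if (!t->next[i])
                return 0;
            *t->next[i] = (struct trie)TRIE_STATIC_INIT;
        }
        t = t->next[i];
    }
    t->flags |= TRIE_TERMINAL_FLAG;
    return 1;
}

/* Prefix lookup as above: returns the node for prefix s, or NULL. */
struct trie *
trie_find(struct trie *t, const char *s)
{
    for (; *s; s++) {
        int i = *s - 'A';
        assert(i >= 0 && i < TRIE_ALPHABET_SIZE);
        if (!t->next[i])
            return NULL;
        t = t->next[i];
    }
    return t;
}

/* Hypothetical helper: nonzero only if s was inserted as a whole key. */
int
trie_contains(struct trie *t, const char *s)
{
    struct trie *node = trie_find(t, s);
    return node && (node->flags & TRIE_TERMINAL_FLAG);
}
```

<p>With the keys CAA and CAD inserted, <code class="language-plaintext highlighter-rouge">trie_find</code> returns a node for CA,
but <code class="language-plaintext highlighter-rouge">trie_contains</code> reports that CA itself is not a key.</p>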

<p>Depth-first traversal is <em>stack-oriented</em>. The stack represents the
path through the graph, and each new vertex is pushed into this stack
as it’s visited. A recursive traversal function can implicitly use the
call stack for storing this information, so no additional data
structure is needed.</p>

<p>The downside is that the call is no longer tail-recursive, so the
stack grows with key length and a sufficiently deep trie will blow it.
Also, the caller needs to provide a callback function because the stack
cannot unwind to return a value: the stack has important state on it.
Here’s a typedef for the callback.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="nf">void</span> <span class="p">(</span><span class="o">*</span><span class="n">trie_visitor</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">key</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">);</span>
</code></pre></div></div>

<p>And here’s the recursive depth-first traversal function. The top-level
caller passes the same buffer for <code class="language-plaintext highlighter-rouge">buf</code> and <code class="language-plaintext highlighter-rouge">bufend</code>, which must have
room for the longest key plus its null terminator. The visited key will
be written to this buffer and passed to the visitor.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">trie_dfs_recursive</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span>
                   <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span>
                   <span class="kt">char</span> <span class="o">*</span><span class="n">bufend</span><span class="p">,</span>
                   <span class="n">trie_visitor</span> <span class="n">v</span><span class="p">,</span>
                   <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">&amp;</span> <span class="n">TRIE_TERMINAL_FLAG</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">bufend</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="n">v</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">arg</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">TRIE_ALPHABET_SIZE</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
            <span class="o">*</span><span class="n">bufend</span> <span class="o">=</span> <span class="sc">'A'</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
            <span class="n">trie_dfs_recursive</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">buf</span><span class="p">,</span> <span class="n">bufend</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="n">arg</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
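
<p>To show how the <code class="language-plaintext highlighter-rouge">arg</code> pointer carries state into the callback, here’s a
rough sketch of a visitor that collects keys into a list. The names
<code class="language-plaintext highlighter-rouge">key_list</code>, <code class="language-plaintext highlighter-rouge">collect_key</code>, and <code class="language-plaintext highlighter-rouge">trie_collect</code>, and the <code class="language-plaintext highlighter-rouge">KEY_MAX</code> and
<code class="language-plaintext highlighter-rouge">LIST_MAX</code> limits, are all made up for illustration.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TRIE_ALPHABET_SIZE  4
#define TRIE_STATIC_INIT    {.flags = 0}
#define TRIE_TERMINAL_FLAG  (1U << 0)
#define KEY_MAX             15  /* assumed longest key, in characters */
#define LIST_MAX            8   /* assumed capacity of the demo list */

struct trie {
    struct trie *next[TRIE_ALPHABET_SIZE];
    unsigned flags;
};

typedef void (*trie_visitor)(const char *key, void *arg);

/* Iterative insertion, as above. */
int
trie_insert(struct trie *t, const char *s)
{
    for (; *s; s++) {
        int i = *s - 'A';
        if (!t->next[i]) {
            t->next[i] = malloc(sizeof(*t->next[i]));
            if (!t->next[i])
                return 0;
            *t->next[i] = (struct trie)TRIE_STATIC_INIT;
        }
        t = t->next[i];
    }
    t->flags |= TRIE_TERMINAL_FLAG;
    return 1;
}

/* Recursive depth-first traversal, as above. */
void
trie_dfs_recursive(struct trie *t, char *buf, char *bufend,
                   trie_visitor v, void *arg)
{
    if (t->flags & TRIE_TERMINAL_FLAG) {
        *bufend = 0;
        v(buf, arg);
    }
    for (int i = 0; i < TRIE_ALPHABET_SIZE; i++) {
        if (t->next[i]) {
            *bufend = 'A' + i;
            trie_dfs_recursive(t->next[i], buf, bufend + 1, v, arg);
        }
    }
}

/* Hypothetical accumulator handed to the visitor through arg. */
struct key_list {
    char keys[LIST_MAX][KEY_MAX + 1];
    int count;
};

/* Visitor: copy each visited key into the list. */
static void
collect_key(const char *key, void *arg)
{
    struct key_list *list = arg;
    assert(list->count < LIST_MAX && strlen(key) <= KEY_MAX);
    strcpy(list->keys[list->count++], key);
}

/* Hypothetical wrapper: gather every key under root into list. */
void
trie_collect(struct trie *root, struct key_list *list)
{
    char buf[KEY_MAX + 1];  /* longest key plus null terminator */
    list->count = 0;
    trie_dfs_recursive(root, buf, buf, collect_key, list);
}
```

<p>Since children are visited in index order, the keys arrive sorted
regardless of insertion order, which is what makes the trie usable as a
sorted set.</p>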

<h3 id="heap-allocated-traversal-stack">Heap-allocated Traversal Stack</h3>

<p>Moving the traversal stack to the heap would eliminate the stack
overflow problem and it would allow control to return to the caller.
This is going to be a lot of code for an article, but bear with me.</p>

<p>First define an iterator object. The stack will need two pieces of
information: which node did we come from (<code class="language-plaintext highlighter-rouge">p</code>) and through which
pointer (<code class="language-plaintext highlighter-rouge">i</code>). When a node has been exhausted, this will allow return
to the parent. The <code class="language-plaintext highlighter-rouge">root</code> field tracks when traversal is complete.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">trie_iter</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">root</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">bufend</span><span class="p">;</span>
    <span class="k">struct</span> <span class="p">{</span>
        <span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
        <span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
    <span class="p">}</span> <span class="o">*</span><span class="n">stack</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>A special value of -1 in <code class="language-plaintext highlighter-rouge">i</code> means it’s the first visit for this node,
so its key should be returned to the caller if the node is terminal.</p>

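<p>The iterator is initialized with <code class="language-plaintext highlighter-rouge">trie_iter_init</code>. The <code class="language-plaintext highlighter-rouge">max</code> argument
is the maximum key length counting the null terminator: a key of <em>n</em>
characters needs <em>n</em> + 1 stack entries (one extra for the root) and
<em>n</em> + 1 buffer bytes. A more elaborate implementation could
automatically grow the stack to accommodate (e.g. <code class="language-plaintext highlighter-rouge">realloc()</code>), but I’m
keeping it as simple as possible.</p>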

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">trie_iter_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie_iter</span> <span class="o">*</span><span class="n">it</span><span class="p">,</span> <span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">max</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">root</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">)</span> <span class="o">*</span> <span class="n">max</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">)</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">bufend</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">max</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">-&gt;</span><span class="n">p</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">-&gt;</span><span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span>
<span class="nf">trie_iter_destroy</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie_iter</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">);</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">);</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And finally the complicated part. This uses the allocated stack to
explore the trie in a loop until it hits a terminal, at which point it
returns. A further call continues the traversal from where it left
off. It’s like a hand-coded <a href="https://en.wikipedia.org/wiki/Generator_(computer_programming)">generator</a>. With the way it’s
written, the caller is obligated to follow through with the entire
iteration before destroying the iterator, but this would be easy to
correct.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">trie_iter_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie_iter</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
        <span class="k">struct</span> <span class="n">trie</span> <span class="o">*</span><span class="n">current</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">-&gt;</span><span class="n">p</span><span class="p">;</span>
        <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">-&gt;</span><span class="n">i</span><span class="o">++</span><span class="p">;</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
            <span class="cm">/* Return result if terminal node. */</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">&amp;</span> <span class="n">TRIE_TERMINAL_FLAG</span><span class="p">)</span> <span class="p">{</span>
                <span class="o">*</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">bufend</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
                <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="n">TRIE_ALPHABET_SIZE</span><span class="p">)</span> <span class="p">{</span>
            <span class="cm">/* End of current node. */</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">current</span> <span class="o">==</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">root</span><span class="p">)</span>
                <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// back at root, done</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">--</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">bufend</span><span class="o">--</span><span class="p">;</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
            <span class="cm">/* Push on next child node. */</span>
            <span class="o">*</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">bufend</span> <span class="o">=</span> <span class="sc">'A'</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">++</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">bufend</span><span class="o">++</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">-&gt;</span><span class="n">p</span> <span class="o">=</span> <span class="n">current</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="o">-&gt;</span><span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is <em>much</em> nicer for the caller since there’s no inversion of control.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">trie_iter</span> <span class="n">it</span><span class="p">;</span>
<span class="n">trie_iter_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">trie_root</span><span class="p">,</span> <span class="n">KEY_MAX</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">trie_iter_next</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="p">))</span> <span class="p">{</span>
    <span class="c1">// ... do something with it.buf ...</span>
<span class="p">}</span>
<span class="n">trie_iter_destroy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="p">);</span>
</code></pre></div></div>

<p>There are a few downsides to this:</p>

<ol>
  <li>
    <p>Initialization could fail (not checked in the example) since it
allocates memory.</p>
  </li>
  <li>
    <p>Either the caller has to keep track of the maximum key length, or
the iterator grows the stack automatically, which would mean
iteration could fail at any point in the middle.</p>
  </li>
  <li>
    <p>In order to destroy the trie, it needs to be traversed: Freeing
memory first requires allocating memory. If the program is out of
memory, it cannot destroy the trie to clean up before handling the
situation, nor to make more memory available. It’s not good for
resilience.</p>
  </li>
</ol>

<p>Wouldn’t it be nice to traverse the trie without memory allocation?</p>

<h3 id="modifying-the-trie">Modifying the Trie</h3>

<p>Rather than allocate a separate stack, the stack can be allocated
across the individual nodes of the trie. Remember those <code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">i</code>
fields from before? Put them on the trie.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">trie_v2</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">next</span><span class="p">[</span><span class="n">TRIE_ALPHABET_SIZE</span><span class="p">];</span>
    <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="n">flags</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p><img src="/img/trie/trie_v2.svg" alt="" /></p>

<p>This automatically scales with the size of the trie, so there will
always be enough of this stack. With the stack “pre-allocated” like
this, traversal requires no additional memory allocation.</p>
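
<p>The flip side is that every node pays for its stack slot whether or not
a traversal is in progress. As a rough sketch, the per-node cost can be
measured by comparing the two structs. The helper names
<code class="language-plaintext highlighter-rouge">trie_v2_overhead</code> and <code class="language-plaintext highlighter-rouge">trie_v2_total_overhead</code> are hypothetical, and the
actual figure depends on the platform’s pointer size and struct padding.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIE_ALPHABET_SIZE  4

struct trie {
    struct trie *next[TRIE_ALPHABET_SIZE];
    unsigned flags;
};

struct trie_v2 {
    struct trie_v2 *next[TRIE_ALPHABET_SIZE];
    struct trie_v2 *p;
    int i;
    unsigned flags;
};

/* Hypothetical helper: extra bytes each node pays for the embedded
 * stack slot (the p and i fields, after padding). */
size_t
trie_v2_overhead(void)
{
    return sizeof(struct trie_v2) - sizeof(struct trie);
}

/* Hypothetical helper: total extra bytes for a trie of `nodes` nodes. */
size_t
trie_v2_total_overhead(size_t nodes)
{
    size_t per_node = trie_v2_overhead();
    assert(per_node > 0);
    assert(nodes <= SIZE_MAX / per_node);  /* guard against overflow */
    return nodes * per_node;
}
```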

<p>The iterator itself becomes a little simpler. It cannot fail and it
doesn’t need a destructor.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">trie_v2_iter</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">current</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">void</span>
<span class="nf">trie_v2_iter_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie_v2_iter</span> <span class="o">*</span><span class="n">it</span><span class="p">,</span> <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">t</span><span class="o">-&gt;</span><span class="n">p</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
    <span class="n">t</span><span class="o">-&gt;</span><span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">=</span> <span class="n">buf</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The iteration function itself is almost identical to before. Rather
than increment a stack pointer, it uses <code class="language-plaintext highlighter-rouge">p</code> to chain the nodes as a
linked list.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">trie_v2_iter_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie_v2_iter</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
        <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">current</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span><span class="p">;</span>
        <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">i</span><span class="o">++</span><span class="p">;</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
            <span class="cm">/* Return result if terminal node. */</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">&amp;</span> <span class="n">TRIE_TERMINAL_FLAG</span><span class="p">)</span> <span class="p">{</span>
                <span class="o">*</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
                <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="n">TRIE_ALPHABET_SIZE</span><span class="p">)</span> <span class="p">{</span>
            <span class="cm">/* End of current node. */</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">p</span><span class="p">)</span>
                <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span> <span class="o">=</span> <span class="n">current</span><span class="o">-&gt;</span><span class="n">p</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span><span class="o">--</span><span class="p">;</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
            <span class="cm">/* Push on next child node. */</span>
            <span class="o">*</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">=</span> <span class="sc">'A'</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">buf</span><span class="o">++</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span> <span class="o">=</span> <span class="n">current</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">p</span> <span class="o">=</span> <span class="n">current</span><span class="p">;</span>
            <span class="n">it</span><span class="o">-&gt;</span><span class="n">current</span><span class="o">-&gt;</span><span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
        <span class="p">}</span>

    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>During traversal the iteration pointers look something like this:</p>

<p><img src="/img/trie/trie_v2-dfs.svg" alt="" /></p>

<p>This is not without its downsides:</p>

<ol>
  <li>
    <p>Traversal is neither re-entrant nor thread-safe. It’s not possible to
run multiple in-place iterators side by side on the same trie since
they’ll clobber each other.</p>
  </li>
  <li>
    <p>It uses more memory — O(n) rather than O(max-key-length) — and sits
on this extra memory for its entire lifetime.</p>
  </li>
</ol>
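
<p>To make the in-place iterator concrete, here’s a self-contained sketch
that builds a small trie and walks it. The <code class="language-plaintext highlighter-rouge">struct trie_v2</code> layout, the
<code class="language-plaintext highlighter-rouge">sketch_insert</code> and <code class="language-plaintext highlighter-rouge">sketch_collect</code> helpers, the uppercase A-Z alphabet,
and the 64-byte key buffer are all assumptions made for illustration.
Only the field names and the two iterator functions come from the code
above.</p>

```c
#include <stdlib.h>
#include <string.h>

#define TRIE_ALPHABET_SIZE 26
#define TRIE_TERMINAL_FLAG (1U << 0)

struct trie_v2 {                 /* assumed layout, field names as above */
    struct trie_v2 *next[TRIE_ALPHABET_SIZE];
    struct trie_v2 *p;           /* traversal link to the parent */
    int i;                       /* next child to visit, -1 = fresh node */
    unsigned flags;
};

struct trie_v2_iter {
    struct trie_v2 *current;
    char *buf;
};

/* Hypothetical helper: insert an A-Z key; returns 0 if allocation fails. */
int
sketch_insert(struct trie_v2 *t, const char *key)
{
    for (; *key; key++) {
        int c = *key - 'A';
        if (!t->next[c]) {
            t->next[c] = calloc(1, sizeof(*t));
            if (!t->next[c])
                return 0;
        }
        t = t->next[c];
    }
    t->flags |= TRIE_TERMINAL_FLAG;
    return 1;
}

void
trie_v2_iter_init(struct trie_v2_iter *it, struct trie_v2 *t, char *buf)
{
    t->p = NULL;
    t->i = -1;
    it->current = t;
    it->buf = buf;
}

int
trie_v2_iter_next(struct trie_v2_iter *it)
{
    for (;;) {
        struct trie_v2 *current = it->current;
        int i = it->current->i++;
        if (i == -1) {
            if (current->flags & TRIE_TERMINAL_FLAG) {
                *it->buf = 0;
                return 1;
            }
            continue;
        }
        if (i == TRIE_ALPHABET_SIZE) {
            if (!current->p)
                return 0;
            it->current = current->p;
            it->buf--;
            continue;
        }
        if (current->next[i]) {
            *it->buf = 'A' + i;
            it->buf++;
            it->current = current->next[i];
            it->current->p = current;
            it->current->i = -1;
        }
    }
}

/* Hypothetical helper: join every key with commas; returns the key count. */
size_t
sketch_collect(struct trie_v2 *root, char *out, size_t outlen)
{
    char key[64];                /* assumes keys shorter than 64 bytes */
    struct trie_v2_iter it;
    size_t count = 0;
    out[0] = 0;
    trie_v2_iter_init(&it, root, key);
    while (trie_v2_iter_next(&it)) {
        size_t used = strlen(out);
        if (used + strlen(key) + 2 > outlen)
            break;               /* no room for comma, key, terminator */
        if (used)
            out[used++] = ',';
        strcpy(out + used, key);
        count++;
    }
    return count;
}
```

<p>After inserting <code class="language-plaintext highlighter-rouge">CAT</code>, <code class="language-plaintext highlighter-rouge">CAR</code>, and <code class="language-plaintext highlighter-rouge">DO</code> into a zeroed root node,
<code class="language-plaintext highlighter-rouge">sketch_collect</code> produces <code class="language-plaintext highlighter-rouge">CAR,CAT,DO</code>. Note that the caller keeps its
own pointer to the start of the key buffer, since the iterator moves
<code class="language-plaintext highlighter-rouge">it-&gt;buf</code> as it descends.</p>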

<h4 id="breadth-first-traversal">Breadth-first Traversal</h4>

<p>The same technique can be used for breadth-first search, which is
<em>queue-oriented</em> rather than stack-oriented. The <code class="language-plaintext highlighter-rouge">p</code> pointers are
instead chained into a queue, with a <code class="language-plaintext highlighter-rouge">head</code> and <code class="language-plaintext highlighter-rouge">tail</code> pointer
variable for each end. As each node is visited, its children are
pushed into the queue linked list.</p>

<p>This isn’t good for visiting keys by name. <code class="language-plaintext highlighter-rouge">buf</code> was itself a stack
and played nicely with depth-first traversal, but there’s no easy way
to build up a key in a buffer breadth-first. So instead here’s a
function to destroy a trie breadth-first.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">trie_v2_destroy</span><span class="p">(</span><span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">head</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">tail</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">head</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">TRIE_ALPHABET_SIZE</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">next</span> <span class="o">=</span> <span class="n">head</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">next</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">next</span><span class="o">-&gt;</span><span class="n">p</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
                <span class="n">tail</span><span class="o">-&gt;</span><span class="n">p</span> <span class="o">=</span> <span class="n">next</span><span class="p">;</span>
                <span class="n">tail</span> <span class="o">=</span> <span class="n">next</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
        <span class="k">struct</span> <span class="n">trie_v2</span> <span class="o">*</span><span class="n">dead</span> <span class="o">=</span> <span class="n">head</span><span class="p">;</span>
        <span class="n">head</span> <span class="o">=</span> <span class="n">head</span><span class="o">-&gt;</span><span class="n">p</span><span class="p">;</span>
        <span class="n">free</span><span class="p">(</span><span class="n">dead</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>During its traversal the <code class="language-plaintext highlighter-rouge">p</code> pointers link up like so:</p>

<p><img src="/img/trie/trie_v2-bfs.svg" alt="" /></p>
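
<p>The same queue can be used without freeing anything. As a rough
sketch, the function below records the edge letter of each node as it
is pushed, which makes the breadth-first order visible. The struct
layout, <code class="language-plaintext highlighter-rouge">sketch_insert</code>, and <code class="language-plaintext highlighter-rouge">sketch_bfs_letters</code> are assumptions for
illustration, not part of my real code.</p>

```c
#include <stdlib.h>

#define TRIE_ALPHABET_SIZE 26
#define TRIE_TERMINAL_FLAG (1U << 0)

struct trie_v2 {                 /* assumed layout, field names as above */
    struct trie_v2 *next[TRIE_ALPHABET_SIZE];
    struct trie_v2 *p;           /* here: link to the next node in the queue */
    int i;
    unsigned flags;
};

/* Hypothetical helper: insert an A-Z key; returns 0 if allocation fails. */
int
sketch_insert(struct trie_v2 *t, const char *key)
{
    for (; *key; key++) {
        int c = *key - 'A';
        if (!t->next[c]) {
            t->next[c] = calloc(1, sizeof(*t));
            if (!t->next[c])
                return 0;
        }
        t = t->next[c];
    }
    t->flags |= TRIE_TERMINAL_FLAG;
    return 1;
}

/* Visit every node breadth-first, writing the letter of each edge in the
 * order its child is queued. Returns the number of nodes visited. */
size_t
sketch_bfs_letters(struct trie_v2 *t, char *out, size_t outlen)
{
    struct trie_v2 *head = t;
    struct trie_v2 *tail = t;
    size_t n = 0;                /* letters written so far */
    size_t nodes = 0;
    t->p = NULL;                 /* the queue starts with just the root */
    while (head) {
        nodes++;
        for (int i = 0; i < TRIE_ALPHABET_SIZE; i++) {
            struct trie_v2 *next = head->next[i];
            if (next) {
                if (n + 1 < outlen)
                    out[n++] = 'A' + i;
                next->p = NULL;  /* append the child to the tail */
                tail->p = next;
                tail = next;
            }
        }
        head = head->p;          /* pop the head; nothing is freed */
    }
    if (outlen)
        out[n] = 0;
    return nodes;
}
```

<p>For a trie holding <code class="language-plaintext highlighter-rouge">CAT</code>, <code class="language-plaintext highlighter-rouge">CAR</code>, and <code class="language-plaintext highlighter-rouge">DO</code> this visits 7 nodes and
writes <code class="language-plaintext highlighter-rouge">CDAORT</code>: both first-level letters, then both second-level
letters, then the third level.</p>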

<h3 id="further-research">Further Research</h3>

<p>In my real code there’s also a flag to indicate the node’s allocation
type: static or heap. This allows a trie to be composed of nodes from
both kinds of allocations while remaining safe to destroy. It might also
be useful to pack a reference counter into this space so that a node
could be shared by more than one trie.</p>

<p>For a production implementation it may be worth packing <code class="language-plaintext highlighter-rouge">i</code> into the
<code class="language-plaintext highlighter-rouge">flags</code> field since it only needs a few bits, even with larger
alphabets. Also, I bet, as in Deutsch-Schorr-Waite, the <code class="language-plaintext highlighter-rouge">p</code> field
could be eliminated and instead one of the child pointers is
temporarily reversed. With these changes, this technique would fit
into the original <code class="language-plaintext highlighter-rouge">struct trie</code> without changes, eliminating the extra
memory usage.</p>
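
<p>As a sketch of the first idea, the child index could live in a few
spare bits of <code class="language-plaintext highlighter-rouge">flags</code>. The bit positions, the 5-bit width, and the
<code class="language-plaintext highlighter-rouge">sketch_</code> names below are assumptions. The index is stored biased by one
so that -1 fits in an unsigned field.</p>

```c
#define TRIE_TERMINAL_FLAG  (1U << 0)

/* Assumed placement: 5 bits starting at bit 8, enough for -1..30. */
#define SKETCH_INDEX_SHIFT  8
#define SKETCH_INDEX_BITS   5
#define SKETCH_INDEX_MASK \
    (((1U << SKETCH_INDEX_BITS) - 1) << SKETCH_INDEX_SHIFT)

/* Extract the traversal index, undoing the +1 bias. */
int
sketch_get_index(unsigned flags)
{
    return (int)((flags & SKETCH_INDEX_MASK) >> SKETCH_INDEX_SHIFT) - 1;
}

/* Return flags with the traversal index replaced, other bits untouched. */
unsigned
sketch_set_index(unsigned flags, int i)
{
    unsigned biased = (unsigned)(i + 1) << SKETCH_INDEX_SHIFT;
    return (flags & ~SKETCH_INDEX_MASK) | (biased & SKETCH_INDEX_MASK);
}
```

<p>A larger alphabet would only need a wider field, for example 9 bits for
a 256-symbol alphabet.</p>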

<p>Update: Over on Hacker News, <a href="https://news.ycombinator.com/item?id=12943339">psi-squared has interesting
suggestions</a> such as leaving the traversal pointers intact,
particularly in the case of a breadth-first search, which, until the
next trie modification, allows for concurrent follow-up traversals.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Makefile Assignments are Turing-Complete</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/04/30/"/>
    <id>urn:uuid:49f54bce-b7da-374e-1e0e-1724b92e3e1f</id>
    <updated>2016-04-30T03:01:22Z</updated>
    <category term="lang"/><category term="compsci"/><category term="posix"/>
    <content type="html">
      <![CDATA[<p>For over a decade now, GNU Make has almost exclusively been my build
system of choice, either directly or indirectly. Unfortunately this
means I unnecessarily depend on some GNU extensions — an annoyance when
porting to the BSDs. In an effort to increase the portability of my
Makefiles, I recently read <a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html">the POSIX make specification</a>. I
learned two important things: 1) <del>POSIX make is so barren it’s not
really worth striving for</del> (<em>update</em>: I’ve <a href="/blog/2017/08/20/">changed my mind</a>),
and 2) <strong>make’s macro assignment mechanism is Turing-complete</strong>.</p>

<p>If you want to see it in action for yourself before reading further,
here’s a Makefile that implements Conway’s Game of Life (40x40) using
only macro assignments.</p>

<ul>
  <li><a href="/download/life.mak"><strong>life.mak</strong></a> (174kB) [<a href="https://github.com/skeeto/makefile-game-of-life">or generate your own</a>]</li>
</ul>

<p>Run it with any make program in an ANSI terminal. It <em>must</em> literally
be named <code class="language-plaintext highlighter-rouge">life.mak</code>. Beware: if you run it longer than a few minutes,
your computer may begin thrashing.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make -f life.mak
</code></pre></div></div>

<p>It’s 100% POSIX-compatible except for the <code class="language-plaintext highlighter-rouge">sleep 0.1</code> (fractional
sleep), which is only needed for visual effect.</p>

<h3 id="a-posix-workaround">A POSIX workaround</h3>

<p>Unlike virtually every real world implementation, POSIX make doesn’t
support conditional parts. For example, you might want your Makefile’s
behavior to change depending on the value of certain variables. In GNU
Make it looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ifdef USE_FOO
    EXTRA_FLAGS = -ffoo -lfoo
else
    EXTRA_FLAGS = -Wbar
endif
</code></pre></div></div>

<p>Or BSD-style:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.ifdef USE_FOO
    EXTRA_FLAGS = -ffoo -lfoo
.else
    EXTRA_FLAGS = -Wbar
.endif
</code></pre></div></div>

<p>If the goal is to write a strictly POSIX Makefile, how could I work
around the lack of conditional parts and maintain a similar interface?
The macro/variable to evaluate can be selected dynamically, allowing
for some useful tricks. First define the option’s
default:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>USE_FOO = 0
</code></pre></div></div>

<p>Then define both sets of flags:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>EXTRA_FLAGS_0 = -Wbar
EXTRA_FLAGS_1 = -ffoo -lfoo
</code></pre></div></div>

<p>Now dynamically select one of these macros for assignment to
<code class="language-plaintext highlighter-rouge">EXTRA_FLAGS</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>EXTRA_FLAGS = $(EXTRA_FLAGS_$(USE_FOO))
</code></pre></div></div>

<p>The assignment on the command line overrides the assignment in the
Makefile, so the user gets to override <code class="language-plaintext highlighter-rouge">USE_FOO</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make              # EXTRA_FLAGS = -Wbar
$ make USE_FOO=0    # EXTRA_FLAGS = -Wbar
$ make USE_FOO=1    # EXTRA_FLAGS = -ffoo -lfoo
</code></pre></div></div>

<p>Before reading the POSIX specification, I didn’t realize that the
<em>left</em> side of an assignment can get the same treatment. If I really
want the “if defined” behavior back, I can use the macro to mangle the
left-hand side. For example,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>EXTRA_FLAGS = -O0 -g3
EXTRA_FLAGS$(DEBUG) = -O3 -DNDEBUG
</code></pre></div></div>

<p>Caveat: If <code class="language-plaintext highlighter-rouge">DEBUG</code> is set to empty, it may still result in true for
<code class="language-plaintext highlighter-rouge">ifdef</code> depending on which make flavor you’re using, but will always
<em>appear</em> to be unset in this hack.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make             # EXTRA_FLAGS = -O3 -DNDEBUG
$ make DEBUG=yes   # EXTRA_FLAGS = -O0 -g3
</code></pre></div></div>

<p>This last case had me thinking: This is very similar to the (ab)use of
the x86 <code class="language-plaintext highlighter-rouge">mov</code> instruction in <a href="https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf">mov is Turing-complete</a>. These
macro assignments alone should be enough to compute <em>any</em> algorithm.</p>

<h3 id="macro-operations">Macro Operations</h3>

<p>Macro names are just keys to a global associative array. This can be
used to build lookup tables. Here’s a Makefile to “compute” the square
root of integers between 0 and 10.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sqrt_0  = 0.000000
sqrt_1  = 1.000000
sqrt_2  = 1.414214
sqrt_3  = 1.732051
sqrt_4  = 2.000000
sqrt_5  = 2.236068
sqrt_6  = 2.449490
sqrt_7  = 2.645751
sqrt_8  = 2.828427
sqrt_9  = 3.000000
sqrt_10 = 3.162278
result := $(sqrt_$(n))
</code></pre></div></div>

<p>The BSD flavors of make have a <code class="language-plaintext highlighter-rouge">-V</code> option for printing variables,
which is an easy way to retrieve output. I used an “immediate”
assignment (<code class="language-plaintext highlighter-rouge">:=</code>) for <code class="language-plaintext highlighter-rouge">result</code> since some versions of make won’t
evaluate the expression before <code class="language-plaintext highlighter-rouge">-V</code> printing.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make -f sqrt.mak -V result n=8
2.828427
</code></pre></div></div>

<p>Without <code class="language-plaintext highlighter-rouge">-V</code>, a default target could be used instead:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>output :
        @printf "$(result)\n"
</code></pre></div></div>
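
<p>For readers who think better in C, here’s a rough model of what the
nested expansion <code class="language-plaintext highlighter-rouge">$(sqrt_$(n))</code> does: build a macro name from a string,
then look it up in one global table. The <code class="language-plaintext highlighter-rouge">macro_get</code> and <code class="language-plaintext highlighter-rouge">sketch_result</code>
names are made up for this sketch, and it models only the lookup, not
make’s actual parser.</p>

```c
#include <stdio.h>
#include <string.h>

struct macro {
    const char *name;
    const char *value;
};

/* The global associative array: macro name -> value. */
static const struct macro macros[] = {
    {"sqrt_0", "0.000000"}, {"sqrt_1", "1.000000"},
    {"sqrt_2", "1.414214"}, {"sqrt_3", "1.732051"},
    {"sqrt_4", "2.000000"}, {"sqrt_5", "2.236068"},
    {"sqrt_6", "2.449490"}, {"sqrt_7", "2.645751"},
    {"sqrt_8", "2.828427"}, {"sqrt_9", "3.000000"},
    {"sqrt_10", "3.162278"},
};

/* Like $(name): an undefined macro expands to the empty string. */
const char *
macro_get(const char *name)
{
    for (size_t i = 0; i < sizeof(macros) / sizeof(macros[0]); i++)
        if (strcmp(macros[i].name, name) == 0)
            return macros[i].value;
    return "";
}

/* Like $(sqrt_$(n)): expand the inner macro, then look up the result. */
const char *
sketch_result(const char *n)
{
    char name[64];
    snprintf(name, sizeof(name), "sqrt_%s", n);
    return macro_get(name);
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">sketch_result("8")</code> returns <code class="language-plaintext highlighter-rouge">2.828427</code>, and an input outside the
table, such as <code class="language-plaintext highlighter-rouge">"11"</code>, returns the empty string.</p>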

<p>There are no math operators, so performing arithmetic <a href="/blog/2008/03/15/">requires some
creativity</a>. For example, integers could be represented as a
series of x characters. The number 4 is <code class="language-plaintext highlighter-rouge">xxxx</code>, the number 6 is
<code class="language-plaintext highlighter-rouge">xxxxxx</code>, etc. Addition is concatenation (note: macros can have <code class="language-plaintext highlighter-rouge">+</code> in
their names):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A      = xxx
B      = xxxx
A+B    = $(A)$(B)
</code></pre></div></div>

<p>However, since there’s no way to “slice” a value, subtraction isn’t
possible. A more realistic approach to arithmetic would require lookup
tables.</p>

<h3 id="branching">Branching</h3>

<p>Branching could be achieved through more lookup tables. For example,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>square_0  = 0
square_1  = 1
square_2  = 4
# ...
result := $($(op)_$(n))
</code></pre></div></div>

<p>And called as:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make n=5 op=sqrt    # 2.236068
$ make n=5 op=square  # 25
</code></pre></div></div>

<p>Or using the <code class="language-plaintext highlighter-rouge">DEBUG</code> trick above, use the condition to mask out the
results of the unwanted branch. This is similar to the <code class="language-plaintext highlighter-rouge">mov</code> paper.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>result           := $(op)($(n)) = $($(op)_$(n))
result$(verbose) := $($(op)_$(n))
</code></pre></div></div>

<p>And its usage:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make n=5 op=square             # 25
$ make n=5 op=square verbose=1   # square(5) = 25
</code></pre></div></div>

<h3 id="what-about-loops">What about loops?</h3>

<p>Looping is a tricky problem. However, one of the most common build
(<a href="http://aegis.sourceforge.net/auug97.pdf">anti</a>?)patterns is the recursive Makefile. Borrowing from the
<code class="language-plaintext highlighter-rouge">mov</code> paper, which used an unconditional jump to restart the program
from the beginning, for Makefile Turing-completeness I can invoke
the Makefile recursively, restarting the program with a new set of
inputs.</p>

<p>Remember the print target above? I can loop by invoking make again
with new inputs in this target,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>output :
    @printf "$(result)\n"
    @$(MAKE) $(args)
</code></pre></div></div>

<p>Before going any further, now that loops have been added, the natural
next question is halting. In reality, the operating system will take
care of that after some millions of make processes have carelessly
been invoked by this horribly inefficient scheme. However, we can do
better. The program can clobber the <code class="language-plaintext highlighter-rouge">MAKE</code> variable when it’s ready to
halt. Let’s formalize it.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loop = $(MAKE) $(args)
output :
    @printf "$(result)\n"
    @$(loop)
</code></pre></div></div>

<p>To halt, the program just needs to clear <code class="language-plaintext highlighter-rouge">loop</code>.</p>

<p>Suppose we want to count down to 0. There will be an initial count:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>count = 6
</code></pre></div></div>

<p>A decrement table:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>6  = 5
5  = 4
4  = 3
3  = 2
2  = 1
1  = 0
0  = loop
</code></pre></div></div>

<p>The last line will be used to halt by clearing the name on the right
side. This is <a href="http://c2.com/cgi/wiki?ThreeStarProgrammer">three star</a> territory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$($($(count))) =
</code></pre></div></div>

<p>The result (the current iteration’s loop value) is computed from the
lookup table.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>result = $($(count))
</code></pre></div></div>

<p>The next loop value is passed via <code class="language-plaintext highlighter-rouge">args</code>. If <code class="language-plaintext highlighter-rouge">loop</code> was cleared above,
this result will be discarded.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>args = count=$(result)
</code></pre></div></div>

<p>With all that in place, invoking the Makefile will print a countdown
from 5 to 0 and quit. This is the general structure for the Game of
Life macro program.</p>
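
<p>The triple indirection is easier to follow when simulated. Below is a
rough C model of the countdown, where each pass through the loop stands
in for one fresh make invocation. The <code class="language-plaintext highlighter-rouge">env_get</code>, <code class="language-plaintext highlighter-rouge">env_set</code>, and
<code class="language-plaintext highlighter-rouge">sketch_countdown</code> names, the fixed table sizes, and the <code class="language-plaintext highlighter-rouge">limit</code> guard
are assumptions of the sketch. The string <code class="language-plaintext highlighter-rouge">make</code> is a stand-in for
<code class="language-plaintext highlighter-rouge">$(MAKE) $(args)</code>.</p>

```c
#include <stdio.h>
#include <string.h>

#define NMACROS 16

struct macro { char name[16]; char value[32]; };
struct env   { struct macro m[NMACROS]; int n; };

/* Like $(name): an undefined macro expands to the empty string. */
const char *
env_get(const struct env *e, const char *name)
{
    for (int i = 0; i < e->n; i++)
        if (strcmp(e->m[i].name, name) == 0)
            return e->m[i].value;
    return "";
}

/* Like "name = value": overwrite an existing macro or add a new one. */
void
env_set(struct env *e, const char *name, const char *value)
{
    int i;
    for (i = 0; i < e->n; i++)
        if (strcmp(e->m[i].name, name) == 0)
            break;
    if (i == NMACROS)
        return;                  /* table full, ignored in this sketch */
    if (i == e->n) {
        snprintf(e->m[i].name, sizeof(e->m[i].name), "%s", name);
        e->n++;
    }
    snprintf(e->m[i].value, sizeof(e->m[i].value), "%s", value);
}

/* Run the countdown; returns the number of simulated make invocations. */
int
sketch_countdown(const char *start, char *out, size_t outlen, int limit)
{
    static const char *table[][2] = {
        {"6", "5"}, {"5", "4"}, {"4", "3"}, {"3", "2"},
        {"2", "1"}, {"1", "0"}, {"0", "loop"},
    };
    char count[32];
    int runs = 0;
    snprintf(count, sizeof(count), "%s", start);
    out[0] = 0;
    while (runs < limit) {
        struct env e;
        memset(&e, 0, sizeof(e));
        env_set(&e, "loop", "make");        /* loop = $(MAKE) $(args) */
        env_set(&e, "count", count);        /* command line sets count */
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            env_set(&e, table[i][0], table[i][1]);

        /* $($($(count))) =   clears "loop" only when count is 1 */
        env_set(&e, env_get(&e, env_get(&e, env_get(&e, "count"))), "");

        /* result = $($(count)) */
        const char *result = env_get(&e, env_get(&e, "count"));
        size_t used = strlen(out);
        snprintf(out + used, outlen - used, "%s%s", used ? " " : "", result);
        runs++;

        if (!*env_get(&e, "loop"))
            break;                          /* loop was cleared: halt */
        snprintf(count, sizeof(count), "%s", result);  /* count=$(result) */
    }
    return runs;
}
```

<p>Starting from <code class="language-plaintext highlighter-rouge">"6"</code> this runs six times and collects <code class="language-plaintext highlighter-rouge">5 4 3 2 1 0</code>. On
every pass but the last, the three-level expansion clears some other
table entry, which is harmless because each invocation starts over with
a fresh table.</p>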

<h3 id="game-of-life">Game of Life</h3>

<p>A universal Turing machine has <a href="http://rendell-attic.org/gol/tm.htm">been implemented in Conway’s Game of
Life</a>. With all that heavy lifting done, one of the easiest
methods today to prove a language’s Turing-completeness is to
implement Conway’s Game of Life. Ignoring the criminal inefficiency of
it, the Game of Life Turing machine could be run on the Game of Life
simulation running on make’s macro assignments.</p>

<p>In the Game of Life program — the one linked at the top of this
article — each cell is stored in a macro named xxyy, after its
position. The top-leftmost cell is named 0000, then going left to
right, 0100, 0200, etc. Providing input is a matter of assigning each
of these macros. I chose <code class="language-plaintext highlighter-rouge">X</code> for alive and <code class="language-plaintext highlighter-rouge">-</code> for dead, but, as
you’ll see, any two characters permitted in macro names would work as
well.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make 0000=X 0100=- 0200=- 0300=X ...
</code></pre></div></div>

<p>The next part should be no surprise: The rules of the Game of Life are
encoded as a 512-entry lookup table. The key is formed by
concatenating the cell’s value along with all its neighbors, with
itself in the center.</p>

<p><img src="/img/diagram/make-gol.png" alt="" /></p>
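
<p>Nobody wants to type 512 entries by hand. Here’s a minimal sketch of a
generator for such a table. The function names are hypothetical, and the
ordering (lowest bit maps to the first character) is an assumption
chosen to match the first rows shown in this article. The cell itself is
the fifth character, at index 4.</p>

```c
#include <stdio.h>

/* Fill key with the 9-character neighborhood for table entry k (0..511)
 * and return the cell's next state: 'X' for alive, '-' for dead. */
char
sketch_rule(int k, char key[10])
{
    int neighbors = 0;
    for (int j = 0; j < 9; j++) {
        int alive = (k >> j) & 1;
        key[j] = alive ? 'X' : '-';
        if (j != 4)              /* index 4 is the cell itself */
            neighbors += alive;
    }
    key[9] = 0;

    int center = (k >> 4) & 1;
    int next = center ? (neighbors == 2 || neighbors == 3)
                      : (neighbors == 3);
    return next ? 'X' : '-';
}

/* Write all 512 macro assignments, one per line. */
void
sketch_print_table(FILE *f)
{
    for (int k = 0; k < 512; k++) {
        char key[10];
        char value = sketch_rule(k, key);
        fprintf(f, "%s = %c\n", key, value);
    }
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">sketch_print_table(stdout)</code> emits the whole table. Of the 512
entries, 140 map to <code class="language-plaintext highlighter-rouge">X</code>: 56 births plus 84 survivals.</p>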

<p>The “beginning” of the table looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--------- = -
X-------- = -
-X------- = -
XX------- = -
--X------ = -
X-X------ = -
-XX------ = -
XXX------ = X
---X----- = -
X--X----- = -
-X-X----- = -
XX-X----- = X
# ...
</code></pre></div></div>

<p>Note: The two right-hand <code class="language-plaintext highlighter-rouge">X</code> values here are the cell coming to life
(exactly three living neighbors). Computing the <em>next</em> value (n0101)
for 0101 is done like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>n0101 = $($(0000)$(0100)$(0200)$(0001)$(0101)$(0201)$(0002)$(0102)$(0202))
</code></pre></div></div>

<p>Given these results, constructing the input to the next loop is
simple:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>args = 0000=$(n0000) 0100=$(n0100) 0200=$(n0200) ...
</code></pre></div></div>

<p>The display output, to be given to <code class="language-plaintext highlighter-rouge">printf</code>, is built similarly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>output = $(n0000)$(n0100)$(n0200)$(n0300)...
</code></pre></div></div>

<p>In the real version, this is decorated with an ANSI escape code that
clears the terminal. The <code class="language-plaintext highlighter-rouge">printf</code> interprets the escape byte (<code class="language-plaintext highlighter-rouge">\033</code>)
so that it doesn’t need to appear literally in the source.</p>

<p>And that’s all there is to it: Conway’s Game of Life running in a
Makefile. <a href="https://www.youtube.com/watch?v=dMjQ3hA9mEA">Life, uh, finds a way</a>.</p>

<!-- Obviously the following image is not public domain. -->
<p><img src="/img/humor/life-finds-a-way.jpg" alt="" /></p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Duck Typing vs. Type Erasure</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/04/01/"/>
    <id>urn:uuid:d01a1d2e-2752-35f4-949a-ff69d7f78e22</id>
    <updated>2014-04-01T21:07:31Z</updated>
    <category term="java"/><category term="cpp"/><category term="lang"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>Consider the following C++ class.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Caller</span> <span class="p">{</span>
  <span class="k">const</span> <span class="n">T</span> <span class="n">callee_</span><span class="p">;</span>
  <span class="n">Caller</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span> <span class="n">callee</span><span class="p">)</span> <span class="o">:</span> <span class="n">callee_</span><span class="p">(</span><span class="n">callee</span><span class="p">)</span> <span class="p">{}</span>
  <span class="kt">void</span> <span class="n">go</span><span class="p">()</span> <span class="p">{</span> <span class="n">callee_</span><span class="p">.</span><span class="n">call</span><span class="p">();</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Caller can be parameterized to <em>any</em> type so long as it has a <code class="language-plaintext highlighter-rouge">call()</code>
method. For example, introduce two types, Foo and Bar.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Foo</span> <span class="p">{</span>
  <span class="kt">void</span> <span class="n">call</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Foo"</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">Bar</span> <span class="p">{</span>
  <span class="kt">void</span> <span class="n">call</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Bar"</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">Caller</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;</span> <span class="n">foo</span><span class="p">{</span><span class="n">Foo</span><span class="p">()};</span>
  <span class="n">Caller</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;</span> <span class="n">bar</span><span class="p">{</span><span class="n">Bar</span><span class="p">()};</span>
  <span class="n">foo</span><span class="p">.</span><span class="n">go</span><span class="p">();</span>
  <span class="n">bar</span><span class="p">.</span><span class="n">go</span><span class="p">();</span>
  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This code compiles cleanly and, when run, emits “FooBar”. This is an
example of <em>duck typing</em> — i.e., “If it looks like a duck, swims like
a duck, and quacks like a duck, then it probably is a duck.” Foo and
Bar are unrelated types. They have no common inheritance, but by
providing the expected interface, they both work with Caller.
This is a special case of <em>polymorphism</em>.</p>

<p>Duck typing is normally only found in dynamically typed languages.
Thanks to templates, a statically, strongly typed language like C++
can have duck typing without sacrificing any type safety.</p>

<h3 id="java-duck-typing">Java Duck Typing</h3>

<p>Let’s try the same thing in Java using generics.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">Caller</span><span class="o">&lt;</span><span class="no">T</span><span class="o">&gt;</span> <span class="o">{</span>
    <span class="kd">final</span> <span class="no">T</span> <span class="n">callee</span><span class="o">;</span>
    <span class="nc">Caller</span><span class="o">(</span><span class="no">T</span> <span class="n">callee</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">this</span><span class="o">.</span><span class="na">callee</span> <span class="o">=</span> <span class="n">callee</span><span class="o">;</span>
    <span class="o">}</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">go</span><span class="o">()</span> <span class="o">{</span>
        <span class="n">callee</span><span class="o">.</span><span class="na">call</span><span class="o">();</span>  <span class="c1">// compiler error: cannot find symbol call</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="kd">class</span> <span class="nc">Foo</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">call</span><span class="o">()</span> <span class="o">{</span> <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">print</span><span class="o">(</span><span class="s">"Foo"</span><span class="o">);</span> <span class="o">}</span>
<span class="o">}</span>

<span class="kd">class</span> <span class="nc">Bar</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">call</span><span class="o">()</span> <span class="o">{</span> <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">print</span><span class="o">(</span><span class="s">"Bar"</span><span class="o">);</span> <span class="o">}</span>
<span class="o">}</span>

<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Main</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="nc">String</span> <span class="n">args</span><span class="o">[])</span> <span class="o">{</span>
        <span class="nc">Caller</span><span class="o">&lt;</span><span class="nc">Foo</span><span class="o">&gt;</span> <span class="n">f</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Caller</span><span class="o">&lt;&gt;(</span><span class="k">new</span> <span class="nc">Foo</span><span class="o">());</span>
        <span class="nc">Caller</span><span class="o">&lt;</span><span class="nc">Bar</span><span class="o">&gt;</span> <span class="n">b</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Caller</span><span class="o">&lt;&gt;(</span><span class="k">new</span> <span class="nc">Bar</span><span class="o">());</span>
        <span class="n">f</span><span class="o">.</span><span class="na">go</span><span class="o">();</span>
        <span class="n">b</span><span class="o">.</span><span class="na">go</span><span class="o">();</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">();</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The program is practically identical, but this will fail with a
compile-time error. This is the result of <em>type erasure</em>. Unlike C++’s
templates, there will only ever be one compiled version of Caller, and
T will become Object. Since Object has no <code class="language-plaintext highlighter-rouge">call()</code> method, compilation
fails. The generic type is only for enabling additional compiler
checks later on.</p>

<p>C++ templates behave like macros, expanded by the compiler once for
each distinct type argument. The <code class="language-plaintext highlighter-rouge">call</code> symbol is looked
up later, after the type has been fully realized, not when the
template is defined.</p>

<p>To fix this, Foo and Bar need a common ancestry. Let’s make this
<code class="language-plaintext highlighter-rouge">Callee</code>.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">interface</span> <span class="nc">Callee</span> <span class="o">{</span>
    <span class="kt">void</span> <span class="nf">call</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Caller needs to be redefined such that T is a subtype of Callee.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">Caller</span><span class="o">&lt;</span><span class="no">T</span> <span class="kd">extends</span> <span class="nc">Callee</span><span class="o">&gt;</span> <span class="o">{</span>
    <span class="c1">// ...</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This now compiles cleanly because <code class="language-plaintext highlighter-rouge">call()</code> will be found in <code class="language-plaintext highlighter-rouge">Callee</code>.
Finally, implement Callee.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">Foo</span> <span class="kd">implements</span> <span class="nc">Callee</span> <span class="o">{</span>
    <span class="c1">// ...</span>
<span class="o">}</span>

<span class="kd">class</span> <span class="nc">Bar</span> <span class="kd">implements</span> <span class="nc">Callee</span> <span class="o">{</span>
    <span class="c1">// ...</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This is no longer duck typing, just plain old polymorphism. Type
erasure prohibits duck typing in Java (outside of dirty reflection
hacks).</p>

<h3 id="signals-and-slots-and-events-oh-my">Signals and Slots and Events! Oh My!</h3>

<p>Duck typing is useful for implementing the observer pattern without as
much boilerplate. A class can participate in the observer pattern
without <a href="http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html">inheriting from some specialized class</a> or interface.
For example, see <a href="http://en.wikipedia.org/wiki/Signals_and_slots">the various signals and slots systems for C++</a>.
In contrast, Java <a href="http://docs.oracle.com/javase/7/docs/api/java/util/EventListener.html">has an EventListener type for everything</a>:</p>

<ul>
  <li>KeyListener</li>
  <li>MouseListener</li>
  <li>MouseMotionListener</li>
  <li>FocusListener</li>
  <li>ActionListener, etc.</li>
</ul>

<p>A class concerned with many different kinds of events, such as an
event logger, would need to implement a large number of interfaces.</p>
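<p>As a rough sketch of the duck-typed alternative, here is a tiny signal in Python. The names <code>Signal</code>, <code>connect</code>, <code>emit</code>, and <code>EventLogger</code> are hypothetical and not taken from any particular signals and slots library. A slot is just a callable, so the logger observes two kinds of events without implementing any listener interface.</p>

```python
class Signal:
    """A tiny signal: slots are plain callables, with no listener interface."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        # Call every connected slot with the event's arguments.
        for slot in self._slots:
            slot(*args)


class EventLogger:
    """Observes several kinds of events without implementing any interface."""

    def __init__(self):
        self.lines = []

    def on_key(self, key):
        self.lines.append("key " + key)

    def on_click(self, x, y):
        self.lines.append("click {},{}".format(x, y))


key_pressed = Signal()
mouse_clicked = Signal()
logger = EventLogger()
key_pressed.connect(logger.on_key)
mouse_clicked.connect(logger.on_click)

key_pressed.emit("a")
mouse_clicked.emit(3, 4)
print(logger.lines)
```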

]]>
    </content>
  </entry>
  <entry>
    <title>The Physical Analog for Encryption is the Hyperdrive</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/06/"/>
    <id>urn:uuid:89cd4041-90bf-3536-3bda-c1fe56e26383</id>
    <updated>2012-08-06T00:00:00Z</updated>
    <category term="crypto"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>I was recently
<a href="http://www.youtube.com/playlist?list=PL80F8C1F2AE9B29DD">watching GetDaved play</a>
through
<a href="http://en.wikipedia.org/wiki/Star_Wars:_X-Wing_Alliance">X-Wing Alliance</a>,
a game I myself played in college. I have a lot of nostalgia for it,
especially because
<a href="http://en.wikipedia.org/wiki/Star_Wars:_TIE_Fighter">TIE Fighter</a> was
one of the first games I ever invested a lot of time into playing. Just
hearing the sounds and music brings back relaxing memories.</p>

<p>In one of the early missions the player travels through hyperspace
(which ain’t like dusting crops)
<a href="http://youtu.be/SeB1sn_6Zhk">to a storage area</a> located in deep
space. It’s a family business and the player is out there to take
inventory of storage containers. Like when I
<a href="/blog/2008/12/16/">saw the wormhole minefield in Deep Space 9</a>, it
got me thinking, “<em>Why?</em>” Why keep all these storage containers in
deep space? There’s no defense or security out there to stop someone
from stealing containers. It seems like it would be better to store
those at the home base where they can be protected.</p>

<p><img src="/img/misc/deep-space.jpg" alt="" /></p>

<p>Storing items at random locations in deep space is actually <em>very</em>
secure, more so than any lock! Space is <em>huge</em>. Even with
faster-than-light travel, searching a galaxy for a storage location
would be impractical. It would be as impractical as using brute force
to find an encryption key, which is another huge search space. Also, if the
storage location has been in use for <em>X</em> years, you’d need to come
within <em>X</em> light-years of it, at least, in order to find it, since
even gravity itself is limited by the speed of light.</p>
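<p>To put a rough number on the key search, here is a back-of-the-envelope sketch in Python. The 128-bit key size and the rate of one trillion guesses per second are assumptions picked for illustration, not figures for any real cipher or attacker.</p>

```python
KEY_BITS = 128                 # assumed key size
GUESSES_PER_SECOND = 10**12    # assumed attacker speed
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# On average a brute-force search succeeds after trying half of all keys.
expected_guesses = 2**KEY_BITS // 2
expected_years = expected_guesses / GUESSES_PER_SECOND / SECONDS_PER_YEAR
print("about {:.1e} years".format(expected_years))
```

<p>Under those assumptions the expected search time is on the order of 10<sup>18</sup> years, hundreds of millions of times the age of the universe.</p>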

<p>Physical locks are usually described as the physical analogy of
cryptography. Honestly, it’s not a very good analogy. The brute-force
method for bypassing a lock isn’t to keep trying different keys or
combinations until it works. No, it’s to just smash something (a
window, the lock) or pick the lock. When translated back into the
crypto world that’s like breaking a cipher, which isn’t a practical
attack in modern cryptography.</p>

<p>No, the physical analogy for cryptography is deep space storage. The
only practical way to access deep space items is to learn the
coordinates of the storage location, which is the equivalent of the
encryption key. If the coordinates are lost or forgotten, the items
are as good as destroyed, just like data.</p>

<p>There are actually some advantages of physical “encryption.”
Ciphertext can be decrypted offline without being detected. It’s not
possible to visit deep space storage without having a physical
presence, which is certainly more detectable than offline
decryption. There’s also the advantage that it’s somewhat easier to
tell when the key (location) generation algorithm is busted or you’re
just bad at picking passphrases: someone else’s stuff will already be
there. A <em>literal</em> collision.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>A Fractran Short Story</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/03/09/"/>
    <id>urn:uuid:8f7a913f-5d94-3fc0-b692-f536c10ad8f1</id>
    <updated>2010-03-09T00:00:00Z</updated>
    <category term="story"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<!-- 9 March 2010 -->
<p>
<a href="http://en.wikipedia.org/wiki/Fractran">Fractran</a> is a
Turing-complete esoteric programming language. A Fractran program is
just an ordered list of positive, irreducible fractions. The program's
output for an input <i>n</i> is the output of the program run
on <i>n</i> multiplied by the first fraction in the list that results
in an integer. If no such multiplication results in an integer, the
output is the input <i>n</i>. Variables are encoded in the exponents
of the prime factorization of the input and output.
</p>
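<p>
To make the execution rule concrete, here is a minimal interpreter sketch
in Python. The function name <code>fractran</code> and the one-fraction
adder program are illustrative choices, not taken from any existing
implementation, and the loop assumes the program eventually halts.
</p>

```python
from fractions import Fraction


def fractran(program, n):
    """Run a Fractran program (a list of Fractions) on the integer n."""
    while True:
        for f in program:
            product = n * f
            if product.denominator == 1:
                # First fraction giving an integer: continue from there.
                n = product.numerator
                break
        else:
            # No fraction produced an integer, so n is the output.
            return n


# With input 2**a * 3**b, this one-fraction program leaves 3**(a + b).
adder = [Fraction(3, 2)]
result = fractran(adder, 2**3 * 3**2)
print(result)
```

<p>
Each step moves one unit from the exponent of 2 to the exponent of 3, so
the input 72 (2<sup>3</sup> times 3<sup>2</sup>) ends as 243
(3<sup>5</sup>).
</p>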
<p>
Some time ago I thought up an idea for a short story involving
Fractran. A mathematician accidentally creates a Fractran program that
can trivially factor large composites. Think something like
O(log <i>n</i>). It's just the right magical string of, say, 31
fractions.
</p>
<p>
The story would be a first-person narrative of the mathematician's
thoughts during a short time after the discovery, considering many of
the consequences of the program. For example, it would render much of
cryptography, which plays an essential role in the modern world,
useless. He would also wonder whether mankind deserves such a
discovery, considering how accidental it was.
</p>
<p>
This whole idea vanished once I realized that this Fractran program is
actually completely trivial. It even runs in O(1) time. It's so
trivial as to be worthless. Remember that Fractran stores its data in
the number's prime factorization? The Fractran program that can factor
any number in constant time is the identity function. To decode the
output, which matches the input, all you need to do is factor it!
</p>
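<p>
The decoding step is where all of the real work hides. As a sketch, here
is naive trial division in Python. The helper name <code>factorize</code>
is hypothetical, and its running time grows with the size of the prime
factors, which is exactly the cost that the "constant time" program leaves
to whoever reads the output.
</p>

```python
def factorize(n):
    """Return {prime: exponent} for a positive integer, by trial division."""
    exponents = {}
    p = 2
    while n != 1:
        if n % p == 0:
            # p divides n: record one more power of p and divide it out.
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        else:
            p += 1
    return exponents


# Decoding a Fractran value means reading the exponents of its primes.
registers = factorize(2**3 * 3**2)
print(registers)
```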
<p>
Interestingly, it doesn't seem to actually be possible to implement
the identity function in Fractran (<i>But somehow it's
Turing-complete? Hmmm... more investigation needed.</i>), unless you
can define your program in terms of its input. For example, the
program <code>1/(n+1)</code> is the identity function for
input <i>n</i>.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Unorderable Sets</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/09/27/"/>
    <id>urn:uuid:616a14f1-ad6b-334b-f1c3-d3aadba6dc44</id>
    <updated>2009-09-27T00:00:00Z</updated>
    <category term="math"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<!-- 27 September 2009 -->
<p>
<img src="/img/diagram/dag.png" alt=""
     title="Directed acyclic graph (DAG)" class="right"/>

At <a href="http://devrand.org/">Gavin</a>'s suggestion, I've been
watching <a href="http://en.wikipedia.org/wiki/The_Prisoner">The
Prisoner</a>, a 1960s British television show. The main character is
an ex-spy held prisoner in "the Village", an Orwellian, isolated,
enclosed town. No one in the Village has a name; each resident is instead
assigned a number. The main character's number is 6.
</p>
<p>
As far as I can tell, after number 2 the order of the numbers is not
important. Number 56 is no more important than number 12. Using
numbers to name things implies an ordering, even if the
ordering is insignificant. It could be misleading to a newcomer.
</p>
<p>
Is there an unordered set that could be used to name things? More
specifically, is there a set that <i>cannot</i> be ordered? If it is
unorderable then there is no implicit ordering to cause
confusion. It's easy to have an unorderable set in theory, but I think
it is difficult to have in practice.
</p>
<p>
Using letters is obviously out, as the alphabet has an order. Words
and names made of letters can be sorted according to the alphabet.
However, the ability to order words is almost never used outside of
indexing. If words are used to name things, a newcomer is unlikely to
assume relationships based on ordering. No one will assume Alan is
more important than Bob.
</p>
<p>
Large numbers also tend to lack an assumed order. I don't think anyone
assumes a larger or smaller social security number has meaning, or a
larger or smaller phone number. However, these values are also known
to be handed out in some semi-random way.
</p>
<p>
But can we do better? For at least English speakers, is it possible to
create an unorderable set? If the items in the set have a vocal
pronunciation, then they can probably be ordered by their
phonetics. That could be avoided by using non-standard phonetic
components, like clicks and pops, which won't have a standard ordering
(in English, anyway).
</p>
<p>
A set has an order if there is a <a
href="http://en.wikipedia.org/wiki/Total_relation"> total</a>, <a
href="http://en.wikipedia.org/wiki/Transitive_relation">transitive</a>,
<a href="http://en.wikipedia.org/wiki/Relational_operator"> relational
operator</a> for the set. If such an operator does not exist then the
set isn't linearly ordered. I want a set that can't easily have such
an operator.
</p>
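<p>
For a small finite set, whether a given relation is such an operator can
be checked by brute force. The sketch below is in Python, and the names
<code>is_linear_order</code> and <code>beats_or_ties</code> are made up
for illustration. It also checks antisymmetry, which the usual definition
of a linear order requires in addition to the two properties named above.
</p>

```python
from itertools import product
from operator import le


def is_linear_order(items, rel):
    """Check whether rel(a, b) linearly orders the finite collection items."""
    items = list(items)
    # Total: every pair is comparable in at least one direction.
    total = all(rel(a, b) or rel(b, a) for a, b in product(items, repeat=2))
    # Antisymmetric: two distinct items are never related both ways.
    antisymmetric = all(
        a == b or not (rel(a, b) and rel(b, a))
        for a, b in product(items, repeat=2)
    )
    # Transitive: rel(a, b) and rel(b, c) together imply rel(a, c).
    transitive = all(
        rel(a, c) or not (rel(a, b) and rel(b, c))
        for a, b, c in product(items, repeat=3)
    )
    return total and antisymmetric and transitive


# Rock-paper-scissors: every pair is comparable, but the relation is cyclic.
BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def beats_or_ties(a, b):
    return a == b or (a, b) in BEATS


numbers_ok = is_linear_order([6, 2, 56, 12], le)
rps_ok = is_linear_order(["rock", "paper", "scissors"], beats_or_ties)
print(numbers_ok, rps_ok)
```

<p>
The Village numbers pass under the usual numeric comparison, while
rock-paper-scissors fails because its relation is not transitive.
</p>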
<p>
If a set of symbols were created, how might they be presented so as to
show no ordering? The order of the symbols in the original
presentation might be considered the ordering, like how the alphabet
is always presented in order. A circle could be used, but this is
circularly ordered. I think there is also the issue of memorization. A
human will have a much better time memorizing the symbols if memorized
in some order. For example, try naming all the letters of the alphabet
at random, without repeats. Or US states.
</p>
<p>
Thanks to modern day technology, with dynamic content, the set could
be displayed in a random order each time it is viewed. For a web page,
the server could select a random order, or a JavaScript program could
reorder the images at random.
</p>
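<p>
On the server side this could be as small as the following Python sketch.
The symbol names and the function name <code>presentation_order</code>
are made up for illustration.
</p>

```python
import random

SYMBOLS = ["triangle", "spiral", "wave", "knot", "star"]  # made-up names


def presentation_order(symbols):
    """Return the symbols in a fresh random order, leaving the input intact."""
    return random.sample(symbols, len(symbols))


# Each call yields an independent ordering, so no canonical order emerges.
shown = presentation_order(SYMBOLS)
print(shown)
```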
<p>
There could be partially ordered sets, like hierarchies and <a
href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAGs</a>. The
ordering in The Prisoner is one of these. There is number 1, then
number 2, then everyone else. Is there a partially ordered set in use
that has unique names at the same level?
</p>
<p>
The penalties incurred by intentionally prohibiting order would likely
outweigh the benefit of the set. If it's not orderable, we can't index
it, and it's difficult to deal with. I expect it's much easier to just
use numbers and tell people that the order isn't important, or just
use an obviously unordered set.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Lisp Number Representations</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/03/15/"/>
    <id>urn:uuid:32c3088b-0e23-3299-0efe-03bbcdd5ce52</id>
    <updated>2008-03-15T00:00:00Z</updated>
    <category term="lisp"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<!-- 15 March 2008 -->
<p>
This exercise partly comes from a couple of different chapters in the
book <a href="http://www.ccs.neu.edu/home/matthias/BTLS/"><i>The
Little Schemer</i></a>. The book is an introduction to the Scheme
programming language, a dialect of Lisp. The purpose is to teach basic
programming concepts in a way that anyone can follow along just as
well as someone with a degree in, say, computer science. It is still
very useful for us programmer types because you get some good
practice from reading and playing along.
</p>
<p>
First of all, Lisp is famous (infamous?) for lacking syntax. Any Lisp
program is just
an <a href="http://en.wikipedia.org/wiki/S-expression">S-expression</a>:
put simply, a list of lists. There is no operator precedence because
operators are treated just like functions. This leads to prefix
notation for mathematical expressions,
</p>
<pre>
(+ 4 5)
=&gt; 9
</pre>
<p>
where the <code>=&gt;</code> indicates the result of evaluating the
expression. We can apply as many operands as we want,
</p>
<pre>
(+ 2 3 4 5 10)
=&gt; 24
</pre>
<p>
We can put another list right in there as an operand,
</p>
<pre>
(+ 3 (* 2 5) 4)
=&gt; 17
</pre>
<p>
You get the idea. In a function, the value of the last expression is
the return value. For example, here is the <code>square</code>
function in Scheme, which squares its input,
</p>
<pre>
(define (square x)
  (* x x))
</pre>
<p>
Then we can use it,
</p>
<pre>
(+ (square 2) (square 5))
=&gt; 29
</pre>
<p>
There are three important list operators to understand as
well: <code>car</code>, <code>cdr</code>,
and <code>cons</code>. <code>car</code> returns the first element in a
list. In the example below, the <code>'</code>, a single quote, tells
the interpreter or compiler that the list is to be treated as data and
not to be executed. This is shorthand, or syntactic sugar, for
the <code>quote</code> operator: <code>(quote (stallman
moglen))</code> is the same as <code>'(stallman moglen)</code>.
</p>
<pre>
(car '(stallman moglen lessig))
=&gt; stallman
</pre>
<p>
<code>cdr</code> returns the "rest" of a list (everything <i>but</i>
the <code>car</code> of the list). When passing a list with only one
element <code>cdr</code> returns the empty list: <code>()</code>.
</p>
<pre>
(cdr '(stallman moglen lessig))
=&gt; (moglen lessig)
(cdr '(stallman))
=&gt; ()
</pre>
<p>
We can ask if a list is empty or not
with <code>null?</code>. <code>#t</code> and <code>#f</code> are true
and false.
</p>
<pre>
(null? '(stallman moglen lessig))
=&gt; #f
(null? '())
=&gt; #t
</pre>
<p>
And finally, for lists, we have <code>cons</code>. This function
allows us to build a list. It glues the first argument to the front of
the list in the second argument,
</p>
<pre>
(cons 'stallman '(moglen lessig))
=&gt; (stallman moglen lessig)
(cons 'stallman '())
=&gt; (stallman)
</pre>
<p>
And one last function you need to know: <code>eq?</code>. It
determines whether two atoms are the same atom,
</p>
<pre>
(eq? 'stallman 'moglen)
=&gt; #f
(eq? 'stallman 'stallman)
=&gt; #t
</pre>
<p>
Now, for this exercise we will pretend that the basic arithmetic
functions have not been defined for us. Instead all we have
is <code>add1</code> and <code>sub1</code>, which add 1 to and
subtract 1 from their argument, respectively.
</p>
<pre>
(add1 5)
=&gt; 6
(sub1 5)
=&gt; 4
</pre>
<p>
Oh, I almost forgot. We also have the <code>zero?</code> function
defined for us, which tells us if its argument is 0 or not. Notice
that functions that return true or false, called predicates, have
a <code>?</code> on the end.
</p>
<pre>
(zero? 2)
=&gt; #f
(zero? 0)
=&gt; #t
</pre>
<p>
To make things simple, these definitions will only consider non-negative
integers. We can define the <code>+</code> function (for only two
arguments) in terms of the three basic functions shown above. It might
be interesting to try to write this yourself before you look any
further. (Hint: define it recursively!)
</p>
<pre>
;; Adds together n and m
(define (+ n m)
  (if (zero? m) n
      (add1 (+ n (sub1 m)))))
</pre>
<p>
If the second argument is 0 we are done and simply return the first
argument. If not, we add 1 to <code>n + (m -
1)</code>. The <code>-</code> function is defined similarly.
</p>
<pre>
;; Subtracts m from n
(define (- n m)
  (if (zero? m) n
      (sub1 (- n (sub1 m)))))
</pre>
<p>
Multiplication is the act of performing addition many times. We can go
on defining it in terms of addition,
</p>
<pre>
(define (* n m)
  (if (zero? m) 0
      (+ n (* n (sub1 m)))))
</pre>
<p>
(We'll leave division as an exercise for the reader as it gets a
little more complicated than I need to go in order to get my overall
point across.)
</p>
<p>
We will leave math behind for a moment and take a look at
<a href="http://www.paulgraham.com/rootsofLisp.html">The Roots of
Lisp</a>. In that link is an excellent paper written by Paul Graham
about John McCarthy, the inventor (or perhaps discoverer?) of Lisp,
and how Lisp came to be. It turns out that in order to have a fully
functional Lisp engine we only need seven primitive operators:
operators defined outside of the language itself as building blocks
for the language. For Lisp these seven operators are (Scheme-ized for
our purposes): <code>eq?</code>, <code>atom?</code>, <code>car</code>,
<code>cdr</code>, <code>cons</code>, <code>quote</code>, and
<code>if</code>.
</p>
<p>
Notice how none of these are math operators. You may wonder how we can
possibly perform mathematical operations when we lack these
facilities. The answer: we have to define our own representation for
numbers! Let's try this: define a number as a list of empty lists. So,
the number 3 is,
</p>
<pre>
'(() () ())
</pre>
<p>
And here are 0, 2, and 4,
</p>
<pre>
'()
'(() ())
'(() () () ())
</pre>
<p>
See how that works? Before, when we wanted to define addition and
subtraction, we needed three other
functions: <code>zero?</code>, <code>add1</code>,
and <code>sub1</code>. With our number representation, how could we
define <code>add1</code> with our seven primitive operators? Our
numbers are defined as lists, so we can use our list operators. To add
1 to a number, we add another empty list to it. Hey, that sounds a lot
like <code>cons</code>!
</p>
<pre>
(define (add1 n)
  (cons '() n))
</pre>
<p>
Subtracting 1 is removing an element from the list, which sounds a lot
like <code>cdr</code>,
</p>
<pre>
(define (sub1 n)
  (cdr n))
</pre>
<p>
And to define <code>zero?</code> we need to check for an empty
list. Notice this will also be the definition for <code>null?</code>.
</p>
<pre>
(define (zero? n)
  (eq? '() n))
</pre>
<p>
And now we are back where we started. In fact, you can reuse the
definitions above for <code>+</code>, <code>-</code>,
and <code>*</code>, with one change: the <code>0</code> returned
by <code>*</code> must become <code>'()</code>, our new zero. Our entire
number representation depends only on how we
define <code>add1</code>, <code>sub1</code>,
and <code>zero?</code>. Let's try it out,
</p>
<pre>
;; 3 + 4
(+ '(() () ()) '(() () () ()))
=&gt; (() () () () () () ())

;; 5 - 2
(- '(() () () () ()) '(() ()))
=&gt; (() () ())

;; 2 * 2
(* '(() ()) '(() ()))
=&gt; (() () () ())

;; 3 + 4 * 2   bolded for clarity
(+ (* '(<b>() () () ()</b>) '(<b>() ()</b>)) '(<b>() () ()</b>))
=&gt; (() () () () () () () () () () ())
</pre>
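<p>
The same experiment can be sketched in Python to show that the arithmetic
really does depend only on the primitives. The helper names
<code>make_arithmetic</code> and <code>to_list_number</code> are
hypothetical. The zero value is passed in explicitly, since the integer 0
is not a number in the list representation.
</p>

```python
def make_arithmetic(zero, is_zero, add1, sub1):
    """Build +, - and * from a zero value and the three primitives."""

    def add(n, m):
        return n if is_zero(m) else add1(add(n, sub1(m)))

    def sub(n, m):
        return n if is_zero(m) else sub1(sub(n, sub1(m)))

    def mul(n, m):
        return zero if is_zero(m) else add(n, mul(n, sub1(m)))

    return add, sub, mul


def to_list_number(k):
    """Represent the non-negative integer k as a list of k empty lists."""
    return [[] for _ in range(k)]


# A number is a list of empty lists: add1 conses one on, sub1 drops one.
add, sub, mul = make_arithmetic(
    zero=[],
    is_zero=lambda n: n == [],
    add1=lambda n: [[]] + n,
    sub1=lambda n: n[1:],
)

# 3 + 4 * 2, written as in the Scheme example above.
result = add(mul(to_list_number(4), to_list_number(2)), to_list_number(3))
print(len(result))
```

<p>
Swapping in ordinary integers, with <code>zero=0</code> and the obvious
three primitives, yields the same arithmetic without touching
<code>add</code>, <code>sub</code>, or <code>mul</code>.
</p>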
<p>
Pretty cool, huh? We just added arithmetic (albeit extremely simple)
to our basic Lisp engine. With some modifications we should be able to
define and operate on negative integers and even define any rational
number (limited by how much memory your computer's hardware can
provide).
</p>
<p>
Now, thank goodness this isn't how real Lisp implementations actually
handle numbers. It would be incredibly slow and impractical, not to
mention annoying to read. Normally, numbers and math operators are
primitive so that they are fast.
</p>
]]>
    </content>
  </entry>

</feed>
