<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged lua at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/lua/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/lua/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:8681fa33-3a8a-46e2-ba71-b2d6bb236a5f</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>State machines are wonderful tools</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/12/31/"/>
    <id>urn:uuid:c93d7a7b-6ae0-4b7e-afa6-424ef40b9d9c</id>
    <updated>2020-12-31T22:48:13Z</updated>
    <category term="compsci"/><category term="c"/><category term="python"/><category term="lua"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=25601821">on Hacker News</a>.</em></p>

<p>I love when my current problem can be solved with a state machine. They’re
fun to design and implement, and I have high confidence about correctness.
They tend to:</p>

<ol>
  <li>Present <a href="/blog/2018/06/10/">minimal, tidy interfaces</a></li>
  <li>Require few, fixed resources</li>
  <li>Hold no opinions about input and output</li>
  <li>Have a compact, concise implementation</li>
  <li>Be easy to reason about</li>
</ol>

<p>State machines are perhaps one of those concepts you heard about in
college but never put into practice. Maybe you use them regularly.
Regardless, you certainly run into them regularly, from <a href="https://swtch.com/~rsc/regexp/">regular
expressions</a> to traffic lights.</p>

<!--more-->

<h3 id="morse-code-decoder-state-machine">Morse code decoder state machine</h3>

<p>Inspired by <a href="https://possiblywrong.wordpress.com/2020/11/21/among-us-morse-code-puzzle/">a puzzle</a>, I came up with this deterministic state
machine for decoding <a href="https://en.wikipedia.org/wiki/Morse_code">Morse code</a>. It accepts a dot (<code class="language-plaintext highlighter-rouge">'.'</code>), dash
(<code class="language-plaintext highlighter-rouge">'-'</code>), or terminator (0) one at a time, advancing through a state
machine step by step:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">morse_decode</span><span class="p">(</span><span class="kt">int</span> <span class="n">state</span><span class="p">,</span> <span class="kt">int</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">t</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="mh">0x03</span><span class="p">,</span> <span class="mh">0x3f</span><span class="p">,</span> <span class="mh">0x7b</span><span class="p">,</span> <span class="mh">0x4f</span><span class="p">,</span> <span class="mh">0x2f</span><span class="p">,</span> <span class="mh">0x63</span><span class="p">,</span> <span class="mh">0x5f</span><span class="p">,</span> <span class="mh">0x77</span><span class="p">,</span> <span class="mh">0x7f</span><span class="p">,</span> <span class="mh">0x72</span><span class="p">,</span>
        <span class="mh">0x87</span><span class="p">,</span> <span class="mh">0x3b</span><span class="p">,</span> <span class="mh">0x57</span><span class="p">,</span> <span class="mh">0x47</span><span class="p">,</span> <span class="mh">0x67</span><span class="p">,</span> <span class="mh">0x4b</span><span class="p">,</span> <span class="mh">0x81</span><span class="p">,</span> <span class="mh">0x40</span><span class="p">,</span> <span class="mh">0x01</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x68</span><span class="p">,</span> <span class="mh">0x51</span><span class="p">,</span> <span class="mh">0x32</span><span class="p">,</span> <span class="mh">0x88</span><span class="p">,</span> <span class="mh">0x34</span><span class="p">,</span> <span class="mh">0x8c</span><span class="p">,</span> <span class="mh">0x92</span><span class="p">,</span> <span class="mh">0x6c</span><span class="p">,</span> <span class="mh">0x02</span><span class="p">,</span>
        <span class="mh">0x03</span><span class="p">,</span> <span class="mh">0x18</span><span class="p">,</span> <span class="mh">0x14</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x0c</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x08</span><span class="p">,</span> <span class="mh">0x1c</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x20</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span>
        <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x28</span><span class="p">,</span> <span class="mh">0x04</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x30</span><span class="p">,</span> <span class="mh">0x31</span><span class="p">,</span> <span class="mh">0x32</span><span class="p">,</span> <span class="mh">0x33</span><span class="p">,</span> <span class="mh">0x34</span><span class="p">,</span> <span class="mh">0x35</span><span class="p">,</span>
        <span class="mh">0x36</span><span class="p">,</span> <span class="mh">0x37</span><span class="p">,</span> <span class="mh">0x38</span><span class="p">,</span> <span class="mh">0x39</span><span class="p">,</span> <span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x42</span><span class="p">,</span> <span class="mh">0x43</span><span class="p">,</span> <span class="mh">0x44</span><span class="p">,</span> <span class="mh">0x45</span><span class="p">,</span> <span class="mh">0x46</span><span class="p">,</span>
        <span class="mh">0x47</span><span class="p">,</span> <span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0x4a</span><span class="p">,</span> <span class="mh">0x4b</span><span class="p">,</span> <span class="mh">0x4c</span><span class="p">,</span> <span class="mh">0x4d</span><span class="p">,</span> <span class="mh">0x4e</span><span class="p">,</span> <span class="mh">0x4f</span><span class="p">,</span> <span class="mh">0x50</span><span class="p">,</span>
        <span class="mh">0x51</span><span class="p">,</span> <span class="mh">0x52</span><span class="p">,</span> <span class="mh">0x53</span><span class="p">,</span> <span class="mh">0x54</span><span class="p">,</span> <span class="mh">0x55</span><span class="p">,</span> <span class="mh">0x56</span><span class="p">,</span> <span class="mh">0x57</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span> <span class="mh">0x59</span><span class="p">,</span> <span class="mh">0x5a</span>
    <span class="p">};</span>
    <span class="kt">int</span> <span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="p">[</span><span class="o">-</span><span class="n">state</span><span class="p">];</span>  <span class="cm">/* node byte; states are negated indices */</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mh">0x00</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&gt;&gt;</span> <span class="mi">2</span> <span class="o">?</span> <span class="n">t</span><span class="p">[(</span><span class="n">v</span> <span class="o">&gt;&gt;</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="mi">63</span><span class="p">]</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>  <span class="cm">/* terminator: emit character */</span>
    <span class="k">case</span> <span class="mh">0x2e</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&amp;</span>  <span class="mi">2</span> <span class="o">?</span> <span class="n">state</span><span class="o">*</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>  <span class="cm">/* dot: left child */</span>
    <span class="k">case</span> <span class="mh">0x2d</span><span class="p">:</span> <span class="k">return</span> <span class="n">v</span> <span class="o">&amp;</span>  <span class="mi">1</span> <span class="o">?</span> <span class="n">state</span><span class="o">*</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>  <span class="cm">/* dash: right child */</span>
    <span class="nl">default:</span>   <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>  <span class="cm">/* anything else is invalid */</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It typically compiles to under 200 bytes (table included), requires only a
few bytes of memory to operate, and will fit on even the smallest of
microcontrollers. The full source listing, documentation, and
comprehensive test suite:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/parsers/morsecode.c">https://github.com/skeeto/scratch/blob/master/parsers/morsecode.c</a></p>

<p>The state machine is trie-shaped, and the 100-byte table <code class="language-plaintext highlighter-rouge">t</code> is the static
<a href="/blog/2016/11/15/">encoding of the Morse code trie</a>:</p>

<p><a href="/img/diagram/morse.dot"><img src="/img/diagram/morse.svg" alt="" /></a></p>

<p>Dots traverse left, dashes traverse right, and a terminator emits the
character at the current node (terminal state). Stopping on a red node or
attempting to take an unlisted edge is an error (invalid input).</p>

<p>Each node in the trie is a byte in the table. Dot and dash each have a bit
indicating if their edge exists. The remaining bits index into a 1-based
character table (at the end of <code class="language-plaintext highlighter-rouge">t</code>), and a 0 “index” indicates an empty
(red) node. The nodes themselves are laid out as <a href="https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation">a binary heap in an
array</a>: the left and right children of the node at <code class="language-plaintext highlighter-rouge">i</code> are found at
<code class="language-plaintext highlighter-rouge">i*2+1</code> and <code class="language-plaintext highlighter-rouge">i*2+2</code>. No need to <a href="/blog/2020/10/19/#minimax-costs">waste memory storing edges</a>!</p>
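
<p>To make the layout concrete, here is a small Python sketch of how one node
byte might be unpacked and how child positions fall out of the heap
arithmetic. The helper names <code class="language-plaintext highlighter-rouge">unpack_node</code> and <code class="language-plaintext highlighter-rouge">children</code> are
illustrative assumptions, not part of the original source. The sample bytes
are the first three entries of <code class="language-plaintext highlighter-rouge">t</code>.</p>

```python
def unpack_node(v):
    """Split a node byte into (has_dot_edge, has_dash_edge, char_index)."""
    has_dot = bool(v & 2)    # bit 1: a dot edge exists
    has_dash = bool(v & 1)   # bit 0: a dash edge exists
    char_index = v >> 2      # upper six bits: 1-based character index, 0 if empty
    return has_dot, has_dash, char_index


def children(i):
    """Binary-heap layout: the children of node i sit at i*2+1 and i*2+2."""
    return i*2 + 1, i*2 + 2


# First three node bytes of the table: the root, then its dot and dash children.
for i, v in enumerate((0x03, 0x3f, 0x7b)):
    print(i, unpack_node(v), children(i))
```

<p>The root byte <code class="language-plaintext highlighter-rouge">0x03</code> has both edges and a zero index, so it is an empty
node. The byte <code class="language-plaintext highlighter-rouge">0x3f</code> carries index 15, which lands on <code class="language-plaintext highlighter-rouge">E</code> in a character
table ordered as ten digits followed by the alphabet.</p>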

<p>Since C sadly does not have multiple return values, I’m using the sign bit
of the return value to create a kind of sum type. A negative return value
is a state — which is why the state is negated internally before use. A
positive result is a character output. If zero, the input was invalid.
Only the initial state is non-negative (zero), which is fine since it’s,
by definition, not possible to traverse to the initial state. No <code class="language-plaintext highlighter-rouge">c</code> input
will produce a bad state.</p>

<p>In the original problem the terminators were missing, which is where
purity helps: despite being a <em>state machine</em>, <code class="language-plaintext highlighter-rouge">morse_decode</code> is a
pure function. The caller can save their position in the trie by saving
the state integer, then try different inputs from that state.</p>
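
<p>As a rough illustration of saving and reusing the state, the following is
a Python port of <code class="language-plaintext highlighter-rouge">morse_decode</code> plus a backtracking search over input that
has no terminators. The table bytes are copied from the C listing. The
generator <code class="language-plaintext highlighter-rouge">decodings</code> is a hypothetical helper written for this sketch, and
it is not necessarily how the original puzzle was solved.</p>

```python
# Python port of the 100-byte table from the C listing.
T = bytes.fromhex(
    "033f7b4f2f635f777f72"
    "873b5747674b81400158"
    "0068513288348c926c02"
    "03181400100000000c00"
    "000000000000081c0000"
    "00000000002000000024"
    "00280400303132333435"
    "36373839414243444546"
    "4748494a4b4c4d4e4f50"
    "5152535455565758595a"
)


def morse_decode(state, c):
    """Negative result: next state. Positive: a character. Zero: invalid."""
    v = T[-state]
    if c == 0:                      # terminator
        return T[(v >> 2) + 63] if v >> 2 else 0
    if c == ord('.'):               # dot edge
        return state*2 - 1 if v & 2 else 0
    if c == ord('-'):               # dash edge
        return state*2 - 2 if v & 1 else 0
    return 0


def decodings(signal, state=0, prefix=""):
    """Yield every reading of a dot/dash string that lacks terminators."""
    if signal:
        # Option 1: stay inside the current letter and consume one symbol.
        nxt = morse_decode(state, ord(signal[0]))
        if nxt:
            yield from decodings(signal[1:], nxt, prefix)
    # Option 2: end the current letter here, reusing the same saved state.
    letter = morse_decode(state, 0)
    if letter:
        if signal:
            yield from decodings(signal, 0, prefix + chr(letter))
        else:
            yield prefix + chr(letter)


print(list(decodings(".-")))
```

<p>Because the function has no hidden state, both options branch from the
same <code class="language-plaintext highlighter-rouge">state</code> value without any copying or undo step.</p>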

<h3 id="utf-8-decoder-state-machine">UTF-8 decoder state machine</h3>

<p>The classic UTF-8 decoder state machine is <a href="https://bjoern.hoehrmann.de/utf-8/decoder/dfa/">Bjoern Hoehrmann’s Flexible
and Economical UTF-8 Decoder</a>. It packs the entire state machine into
a relatively small table using clever tricks. It’s easily my favorite
UTF-8 decoder.</p>

<p>I wanted to try my own hand at it, so I re-derived the same canonical
UTF-8 automaton:</p>

<p><a href="/img/diagram/utf8.dot"><img src="/img/diagram/utf8.svg" alt="" /></a></p>

<p>Then I encoded this diagram directly into a much larger (2,064-byte), less
elegant table, too large to display inline here:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/parsers/utf8_decode.c">https://github.com/skeeto/scratch/blob/master/parsers/utf8_decode.c</a></p>

<p>However, the trade-off is that the executable code is smaller, faster, and
<a href="/blog/2017/10/06/">branchless again</a> (by accident, I swear!):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">utf8_decode</span><span class="p">(</span><span class="kt">int</span> <span class="n">state</span><span class="p">,</span> <span class="kt">long</span> <span class="o">*</span><span class="n">cp</span><span class="p">,</span> <span class="kt">int</span> <span class="n">byte</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">signed</span> <span class="kt">char</span> <span class="n">table</span><span class="p">[</span><span class="mi">8</span><span class="p">][</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
    <span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">masks</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
    <span class="kt">int</span> <span class="n">next</span> <span class="o">=</span> <span class="n">table</span><span class="p">[</span><span class="n">state</span><span class="p">][</span><span class="n">byte</span><span class="p">];</span>
    <span class="o">*</span><span class="n">cp</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">cp</span> <span class="o">&lt;&lt;</span> <span class="mi">6</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">byte</span> <span class="o">&amp;</span> <span class="n">masks</span><span class="p">[</span><span class="o">!</span><span class="n">state</span><span class="p">][</span><span class="n">next</span><span class="o">&amp;</span><span class="mi">7</span><span class="p">]);</span>
    <span class="k">return</span> <span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Like Bjoern’s decoder, there’s a code point accumulator. The <em>real</em> state
machine has 1,109,950 terminal states, and many more edges and nodes. The
accumulator is an optimization to track exactly which edge was taken to
which node without having to represent such a monstrosity.</p>
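
<p>The accumulator step on its own can be sketched in a few lines of Python.
This is only the shift-and-mask arithmetic: the masks below are chosen by
hand from the UTF-8 bit layout rather than read from the <code class="language-plaintext highlighter-rouge">masks</code> table
(whose contents are not shown here), and the helper names are assumptions.
No validation is performed, since that is the transition table’s job.</p>

```python
def accumulate(cp, byte, mask):
    """One accumulator step: shift in six bits, then OR in the masked byte."""
    return (cp << 6) | (byte & mask)


def accumulate_sequence(lead_mask, seq):
    """Fold one well-formed UTF-8 sequence into a code point (no validation)."""
    cp = accumulate(0, seq[0], lead_mask)
    for byte in seq[1:]:
        cp = accumulate(cp, byte, 0x3F)   # continuation bytes carry six bits
    return cp


# Lead-byte masks: 0x1F for two-byte, 0x0F for three-byte sequences.
print(hex(accumulate_sequence(0x1F, b"\xc3\xa9")))      # U+00E9
print(hex(accumulate_sequence(0x0F, b"\xe2\x82\xac")))  # U+20AC
```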

<p>Despite the huge table I’m pretty happy with it.</p>

<h3 id="word-count-state-machine">Word count state machine</h3>

<p>Here’s another state machine I came up with a while back for counting words
one Unicode code point at a time while accounting for Unicode’s various
kinds of whitespace. If your input is bytes, then plug this into the above
UTF-8 state machine to convert bytes to code points! This one uses a
switch instead of a lookup table since the table would be sparse (i.e.
<a href="/blog/2019/12/09/">let the compiler figure it out</a>).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* State machine counting words in a sequence of code points.
 *
 * The current word count is the absolute value of the state, so
 * the initial state is zero. Code points are fed into the state
 * machine one at a time, each call returning the next state.
 */</span>
<span class="kt">long</span> <span class="nf">word_count</span><span class="p">(</span><span class="kt">long</span> <span class="n">state</span><span class="p">,</span> <span class="kt">long</span> <span class="n">codepoint</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">codepoint</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mh">0x0009</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000a</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000b</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000c</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x000d</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x0020</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x0085</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x00a0</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x1680</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2000</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2001</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2002</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2003</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2004</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2005</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2006</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2007</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2008</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2009</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x200a</span><span class="p">:</span>
    <span class="k">case</span> <span class="mh">0x2028</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x2029</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x202f</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x205f</span><span class="p">:</span> <span class="k">case</span> <span class="mh">0x3000</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">state</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="o">-</span><span class="n">state</span> <span class="o">:</span> <span class="n">state</span><span class="p">;</span>
    <span class="nl">default:</span>
        <span class="k">return</span> <span class="n">state</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">?</span> <span class="n">state</span> <span class="o">:</span> <span class="o">-</span><span class="mi">1</span> <span class="o">-</span> <span class="n">state</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’m particularly happy with the <em>edge-triggered</em> state transition
mechanism. The sign of the state tracks whether the “signal” is “high”
(inside of a word) or “low” (outside of a word), and so it counts rising
edges.</p>
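
<p>To see the sign trick in motion, here is a Python transcription of the
same function with a small driver loop. The driver <code class="language-plaintext highlighter-rouge">count_words</code> is a
hypothetical helper added for this sketch; it assumes the input is already
a sequence of code points, which a Python string is.</p>

```python
WHITESPACE = frozenset({
    0x0009, 0x000a, 0x000b, 0x000c, 0x000d,
    0x0020, 0x0085, 0x00a0, 0x1680, 0x2000,
    0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200a,
    0x2028, 0x2029, 0x202f, 0x205f, 0x3000,
})


def word_count(state, codepoint):
    """Return the next state; abs(state) is the running word count."""
    if codepoint in WHITESPACE:
        return abs(state)      # signal goes (or stays) low
    if state >= 0:
        return -1 - state      # rising edge: one more word
    return state               # already inside a word


def count_words(text):
    """Feed every code point of text through the state machine."""
    state = 0
    for ch in text:
        state = word_count(state, ord(ch))
    return abs(state)


print(count_words("hello  world"))
```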

<p><a href="/img/diagram/wordcount.dot"><img src="/img/diagram/wordcount.svg" alt="" /></a></p>

<p>The counter is not <em>technically</em> part of the state machine. Though it
eventually overflows for practical reasons, it isn’t really “finite”. It
is rather an external count of the times the state machine transitions
from low to high, which is the actual, useful output.</p>

<p><em>Reader challenge</em>: Find a slick, efficient way to encode all those code
points as a table rather than rely on whatever the compiler generates for
the <code class="language-plaintext highlighter-rouge">switch</code> (chain of branches, jump table?).</p>
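
<p>One possible direction, offered only as a sketch and not as the intended
answer: the 25 code points collapse into ten inclusive ranges, which can be
stored as two short sorted tables and searched with a binary search. The
names <code class="language-plaintext highlighter-rouge">STARTS</code>, <code class="language-plaintext highlighter-rouge">ENDS</code>, and <code class="language-plaintext highlighter-rouge">is_whitespace</code> are assumptions made for this
example.</p>

```python
from bisect import bisect_right

# Inclusive ranges covering the same code points as the switch.
STARTS = (0x0009, 0x0020, 0x0085, 0x00a0, 0x1680,
          0x2000, 0x2028, 0x202f, 0x205f, 0x3000)
ENDS = (0x000d, 0x0020, 0x0085, 0x00a0, 0x1680,
        0x200a, 0x2029, 0x202f, 0x205f, 0x3000)


def is_whitespace(cp):
    """True if cp falls inside one of the ranges above."""
    i = bisect_right(STARTS, cp) - 1   # last range starting at or before cp
    return i >= 0 and ENDS[i] >= cp


print(is_whitespace(0x2003), is_whitespace(ord("A")))
```

<p>Whether this beats the compiler’s output for the <code class="language-plaintext highlighter-rouge">switch</code> would need
measuring.</p>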

<h3 id="coroutines-and-generators-as-state-machines">Coroutines and generators as state machines</h3>

<p>In languages that support them, state machines can be implemented using
coroutines, including generators. I do particularly like the idea of
<a href="/blog/2018/05/31/">compiler-synthesized coroutines</a> as state machines, though this is a
rare treat. The state is implicit in the coroutine at each yield, so the
programmer doesn’t have to manage it explicitly. (Though often that
explicit control is powerful!)</p>

<p>Unfortunately, in practice it always feels clunky. The following implements
the word count state machine (albeit in a rather un-Pythonic way). The
generator yields the current count and is resumed by sending it another
code point:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WHITESPACE</span> <span class="o">=</span> <span class="p">{</span>
    <span class="mh">0x0009</span><span class="p">,</span> <span class="mh">0x000a</span><span class="p">,</span> <span class="mh">0x000b</span><span class="p">,</span> <span class="mh">0x000c</span><span class="p">,</span> <span class="mh">0x000d</span><span class="p">,</span>
    <span class="mh">0x0020</span><span class="p">,</span> <span class="mh">0x0085</span><span class="p">,</span> <span class="mh">0x00a0</span><span class="p">,</span> <span class="mh">0x1680</span><span class="p">,</span> <span class="mh">0x2000</span><span class="p">,</span>
    <span class="mh">0x2001</span><span class="p">,</span> <span class="mh">0x2002</span><span class="p">,</span> <span class="mh">0x2003</span><span class="p">,</span> <span class="mh">0x2004</span><span class="p">,</span> <span class="mh">0x2005</span><span class="p">,</span>
    <span class="mh">0x2006</span><span class="p">,</span> <span class="mh">0x2007</span><span class="p">,</span> <span class="mh">0x2008</span><span class="p">,</span> <span class="mh">0x2009</span><span class="p">,</span> <span class="mh">0x200a</span><span class="p">,</span>
    <span class="mh">0x2028</span><span class="p">,</span> <span class="mh">0x2029</span><span class="p">,</span> <span class="mh">0x202f</span><span class="p">,</span> <span class="mh">0x205f</span><span class="p">,</span> <span class="mh">0x3000</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">def</span> <span class="nf">wordcount</span><span class="p">():</span>
    <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="c1"># low signal
</span>            <span class="n">codepoint</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">count</span>
            <span class="k">if</span> <span class="n">codepoint</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">WHITESPACE</span><span class="p">:</span>
                <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
                <span class="k">break</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="c1"># high signal
</span>            <span class="n">codepoint</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">count</span>
            <span class="k">if</span> <span class="n">codepoint</span> <span class="ow">in</span> <span class="n">WHITESPACE</span><span class="p">:</span>
                <span class="k">break</span>
</code></pre></div></div>

<p>However, the generator ceremony dominates the interface, so you’d probably
want to wrap it in something nicer — at which point there’s really no
reason to use the generator in the first place:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wc</span> <span class="o">=</span> <span class="n">wordcount</span><span class="p">()</span>
<span class="nb">next</span><span class="p">(</span><span class="n">wc</span><span class="p">)</span>  <span class="c1"># prime the generator
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'A'</span><span class="p">))</span>  <span class="c1"># =&gt; 1
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>  <span class="c1"># =&gt; 1
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'B'</span><span class="p">))</span>  <span class="c1"># =&gt; 2
</span><span class="n">wc</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>  <span class="c1"># =&gt; 2
</span></code></pre></div></div>

<p>Same idea in Lua, which famously has full coroutines:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">WHITESPACE</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mh">0x0009</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000a</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000b</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x000c</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x000d</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x0020</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x0085</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x00a0</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x1680</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2000</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2001</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2002</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2003</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2004</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2005</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2006</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2007</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2008</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2009</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x200a</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x2028</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x2029</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x202f</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,[</span><span class="mh">0x205f</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span><span class="p">,</span>
    <span class="p">[</span><span class="mh">0x3000</span><span class="p">]</span><span class="o">=</span><span class="kc">true</span>
<span class="p">}</span>

<span class="k">function</span> <span class="nf">wordcount</span><span class="p">()</span>
    <span class="kd">local</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
        <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
            <span class="c1">-- low signal</span>
            <span class="kd">local</span> <span class="n">codepoint</span> <span class="o">=</span> <span class="nb">coroutine.yield</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
            <span class="k">if</span> <span class="ow">not</span> <span class="n">WHITESPACE</span><span class="p">[</span><span class="n">codepoint</span><span class="p">]</span> <span class="k">then</span>
                <span class="n">count</span> <span class="o">=</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">1</span>
                <span class="k">break</span>
            <span class="k">end</span>
        <span class="k">end</span>
        <span class="k">while</span> <span class="kc">true</span> <span class="k">do</span>
            <span class="c1">-- high signal</span>
            <span class="kd">local</span> <span class="n">codepoint</span> <span class="o">=</span> <span class="nb">coroutine.yield</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">WHITESPACE</span><span class="p">[</span><span class="n">codepoint</span><span class="p">]</span> <span class="k">then</span>
                <span class="k">break</span>
            <span class="k">end</span>
        <span class="k">end</span>
    <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Except for initially priming the coroutine, at least <code class="language-plaintext highlighter-rouge">coroutine.wrap()</code>
hides the fact that it’s a coroutine.</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wc</span> <span class="o">=</span> <span class="nb">coroutine.wrap</span><span class="p">(</span><span class="n">wordcount</span><span class="p">)</span>
<span class="n">wc</span><span class="p">()</span>  <span class="c1">-- prime the coroutine</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">'A'</span><span class="p">))</span>  <span class="c1">-- =&gt; 1</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>  <span class="c1">-- =&gt; 1</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">'B'</span><span class="p">))</span>  <span class="c1">-- =&gt; 2</span>
<span class="n">wc</span><span class="p">(</span><span class="nb">string.byte</span><span class="p">(</span><span class="s1">' '</span><span class="p">))</span>  <span class="c1">-- =&gt; 2</span>
</code></pre></div></div>

<h3 id="extra-examples">Extra examples</h3>

<p>Finally, a couple more examples not worth describing in detail here. First
a Unicode case folding state machine:</p>

<p><a href="https://github.com/skeeto/scratch/blob/master/misc/casefold.c">https://github.com/skeeto/scratch/blob/master/misc/casefold.c</a></p>

<p>It’s just an interface to do a lookup into the <a href="https://www.unicode.org/Public/13.0.0/ucd/CaseFolding.txt">official case folding
table</a>. It was an experiment, and I <em>probably</em> wouldn’t use it in a
real program.</p>

<p>Second, I’ve mentioned <a href="https://github.com/skeeto/utf-7">my UTF-7 encoder and decoder</a> before. It’s
not obvious from the interface, but internally it’s just a state machine
for both encoder and decoder, which is what allows it to “pause”
between any pair of input/output bytes.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Looking for Entropy in All the Wrong Places</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/04/30/"/>
    <id>urn:uuid:67da1a72-1103-4e12-a646-8a57443619eb</id>
    <updated>2019-04-30T22:50:09Z</updated>
    <category term="c"/><category term="lua"/><category term="crypto"/>
    <content type="html">
      <![CDATA[<p>Imagine we’re writing a C program and we need some random numbers. Maybe
it’s for a game, or for a Monte Carlo simulation, or for cryptography.
The standard library has a <code class="language-plaintext highlighter-rouge">rand()</code> function for some of these purposes.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">rand</span><span class="p">();</span>
</code></pre></div></div>

<p>There are some problems with this. Typically the implementation is a
rather poor PRNG, and <a href="/blog/2017/09/21/">we can do much better</a>. It’s a poor choice
for Monte Carlo simulations, and outright dangerous for cryptography.
Furthermore, it’s usually a dynamic function call, which <a href="/blog/2018/05/27/">has a high
overhead</a> compared to how little the function actually does. In
glibc, it’s also synchronized, adding even more overhead.</p>

<p>But, more importantly, this function returns the same sequences of
values each time the program runs. If we want different numbers each
time the program runs, it needs to be seeded — but seeded with <em>what</em>?
Regardless of what PRNG we ultimately use, we need inputs unique to this
particular execution.</p>

<h3 id="the-right-places">The right places</h3>

<p>On any modern unix-like system, the classical approach is to open
<code class="language-plaintext highlighter-rouge">/dev/urandom</code> and read some bytes. It’s not part of POSIX but it is a
<em>de facto</em> standard. These random bits are seeded from the physical
world by the operating system, making them highly unpredictable and
uncorrelated. They’re suitable for keying a CSPRNG and, from
there, <a href="https://blog.cr.yp.to/20140205-entropy.html">generating all the secure random bits you will ever
need</a> (perhaps with <a href="https://blog.cr.yp.to/20170723-random.html">fast-key-erasure</a>). Why not
<code class="language-plaintext highlighter-rouge">/dev/random</code>? Because on Linux <a href="https://www.2uo.de/myths-about-urandom/">it’s pointlessly
superstitious</a>, which has basically ruined that path for
everyone.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Returns zero on failure. */</span>
<span class="kt">int</span>
<span class="nf">getbits</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">FILE</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/urandom"</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">fread</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span>
        <span class="n">fclose</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="n">seed</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">getbits</span><span class="p">(</span><span class="o">&amp;</span><span class="n">seed</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">seed</span><span class="p">)))</span> <span class="p">{</span>
        <span class="n">srand</span><span class="p">(</span><span class="n">seed</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">die</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="cm">/* ... */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note how there are two different places <code class="language-plaintext highlighter-rouge">getbits()</code> could fail, with
multiple potential causes.</p>

<ul>
  <li>
    <p>It could fail to open the file. Perhaps the program isn’t running on a
modern unix-like system. Perhaps it’s running in a chroot and
<code class="language-plaintext highlighter-rouge">/dev/urandom</code> wasn’t created. Perhaps there are too many file
descriptors already open. Perhaps there isn’t enough memory available
to open a file. Perhaps the file permissions disallow it or it’s
blocked by Mandatory Access Control (MAC).</p>
  </li>
  <li>
    <p>It could fail to read the file. This essentially can’t happen unless
the system is severely misconfigured, in which case a successful
read would be suspect anyway. In this case it’s probably still a
good idea to check the result.</p>
  </li>
</ul>
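
<p>To make the two failure points concrete, here is a minimal sketch of a variant that reports which step failed. The names <code class="language-plaintext highlighter-rouge">getbits_from()</code> and <code class="language-plaintext highlighter-rouge">GETBITS_*</code> are hypothetical, and the path is a parameter only so that an open failure is easy to provoke:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

/* Hypothetical status codes, one per failure point. */
enum getbits_status {
    GETBITS_OK,
    GETBITS_OPEN_FAILED,  /* missing file, no free descriptors, MAC, ... */
    GETBITS_READ_FAILED   /* opened, but fewer than len bytes came back */
};

/* Note: len must be nonzero, or fread() reports zero items read. */
enum getbits_status
getbits_from(const char *path, void *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return GETBITS_OPEN_FAILED;
    size_t n = fread(buf, len, 1, f);  /* 1 only if all len bytes arrived */
    fclose(f);
    return n == 1 ? GETBITS_OK : GETBITS_READ_FAILED;
}
</code></pre></div></div>

<p>A caller would pass <code class="language-plaintext highlighter-rouge">"/dev/urandom"</code> as the path. On a unix-like system, a path that doesn’t exist exercises the first failure, and an empty source such as <code class="language-plaintext highlighter-rouge">/dev/null</code> exercises the second.</p>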

<p>The need for creating a file descriptor is a serious issue for libraries.
Libraries that quietly create and close file descriptors can interfere
with the main program, especially if it’s asynchronous. The main program
might rely on file descriptors being consecutive, predictable, or
monotonic (<a href="https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html">example</a>). File descriptors are also a limited resource,
so it may exhaust a file descriptor slot needed for the main program.
For a network service, a remote attacker could perhaps open enough
sockets to deny a file descriptor to <code class="language-plaintext highlighter-rouge">getbits()</code>, blocking the program
from gathering entropy.</p>

<p><code class="language-plaintext highlighter-rouge">/dev/urandom</code> is simple, but it’s not an ideal API.</p>

<h4 id="getentropy2">getentropy(2)</h4>

<p>Wouldn’t it be nicer if our program could just directly ask the
operating system to fill a buffer with random bits? That’s what the
OpenBSD folks thought, so they introduced a <a href="https://man.openbsd.org/getentropy.2"><code class="language-plaintext highlighter-rouge">getentropy(2)</code></a>
system call. When called correctly <em>it cannot fail</em>!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">getentropy</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">buflen</span><span class="p">);</span>
</code></pre></div></div>

<p>Other operating systems followed suit, <a href="https://lwn.net/Articles/711013/">including Linux</a>, though
on Linux <code class="language-plaintext highlighter-rouge">getentropy(2)</code> is a library function implemented using
<a href="http://man7.org/linux/man-pages/man2/getrandom.2.html"><code class="language-plaintext highlighter-rouge">getrandom(2)</code></a>, the actual system call. It’s been in the Linux
kernel since version 3.17 (October 2014), but the libc wrapper didn’t
appear in glibc until version 2.25 (February 2017). So as of this
writing, there are still many systems where it’s not practical to use
even if their kernel is new enough.</p>

<p>For now on Linux you may still want to check, and have a strategy in
place, for an <code class="language-plaintext highlighter-rouge">ENOSYS</code> result. Some systems are still running kernels
that are 5 years old, or older.</p>
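
<p>One such strategy, as a rough Linux-only sketch: call the system call directly, and fall back to <code class="language-plaintext highlighter-rouge">/dev/urandom</code> only when the kernel reports <code class="language-plaintext highlighter-rouge">ENOSYS</code>. The name <code class="language-plaintext highlighter-rouge">fill_random()</code> is hypothetical, and the 256-byte limit is borrowed from <code class="language-plaintext highlighter-rouge">getentropy(2)</code> so that a short read from <code class="language-plaintext highlighter-rouge">getrandom(2)</code> can be treated as an error rather than retried:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define _GNU_SOURCE  /* for syscall() on glibc */
#include &lt;errno.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/syscall.h&gt;
#include &lt;unistd.h&gt;

/* Hypothetical helper: fill buf with len random bytes (len &lt;= 256).
 * Returns 0 on success and -1 on failure, like getentropy(2). */
int
fill_random(void *buf, size_t len)
{
    if (len &gt; 256)
        return -1;  /* same limit as getentropy(2) */
#ifdef SYS_getrandom
    long r = syscall(SYS_getrandom, buf, len, 0);
    if (r == (long)len)
        return 0;
    if (r &gt;= 0 || errno != ENOSYS)
        return -1;  /* short read, or an error other than ENOSYS */
#endif
    /* The kernel predates getrandom(2): fall back to /dev/urandom. */
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, len, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
</code></pre></div></div>

<p>The fallback inherits every <code class="language-plaintext highlighter-rouge">/dev/urandom</code> failure mode described earlier, so the return value still has to be checked.</p>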

<p>OpenBSD also has another trick up its trick-filled sleeves: the
<a href="https://github.com/openbsd/src/blob/master/libexec/ld.so/SPECS.randomdata"><code class="language-plaintext highlighter-rouge">.openbsd.randomdata</code></a> section. Just as the <code class="language-plaintext highlighter-rouge">.bss</code> section is
filled with zeros, the <code class="language-plaintext highlighter-rouge">.openbsd.randomdata</code> section is filled with
securely-generated random bits. You could put your PRNG state in this
section and it will be seeded as part of loading the program. Cool!</p>

<h4 id="rtlgenrandom">RtlGenRandom()</h4>

<p>Windows doesn’t have <code class="language-plaintext highlighter-rouge">/dev/urandom</code>. Instead it has:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">CryptGenRandom()</code></li>
  <li><code class="language-plaintext highlighter-rouge">CryptAcquireContext()</code></li>
  <li><code class="language-plaintext highlighter-rouge">CryptReleaseContext()</code></li>
</ul>

<p>Though in typical Win32 fashion, the API is ugly, overly-complicated,
and has multiple possible failure points. It’s essentially impossible
to use without referencing documentation. Ugh.</p>

<p>However, <a href="/blog/2018/04/13/">Windows 98 and later</a> has <a href="https://docs.microsoft.com/en-us/windows/desktop/api/ntsecapi/nf-ntsecapi-rtlgenrandom"><code class="language-plaintext highlighter-rouge">RtlGenRandom()</code></a>,
which has a much more reasonable interface. Looks an awful lot like
<code class="language-plaintext highlighter-rouge">getentropy(2)</code>, eh?</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">BOOLEAN</span> <span class="nf">RtlGenRandom</span><span class="p">(</span>
  <span class="n">PVOID</span> <span class="n">RandomBuffer</span><span class="p">,</span>
  <span class="n">ULONG</span> <span class="n">RandomBufferLength</span>
<span class="p">);</span>
</code></pre></div></div>

<p>The problem is that it’s not quite an official API, and no promises
are made about it. In practice, so much software now depends on it
that the API is unlikely to ever break. Despite the prototype
above, this function is <em>actually</em> named <code class="language-plaintext highlighter-rouge">SystemFunction036()</code>, and
you have to supply your own prototype. Here’s my little drop-in
snippet that turns it nearly into <code class="language-plaintext highlighter-rouge">getentropy(2)</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef _WIN32
#  define WIN32_LEAN_AND_MEAN
#  include &lt;windows.h&gt;
#  pragma comment(lib, "advapi32.lib")
</span>   <span class="n">BOOLEAN</span> <span class="n">NTAPI</span> <span class="nf">SystemFunction036</span><span class="p">(</span><span class="n">PVOID</span><span class="p">,</span> <span class="n">ULONG</span><span class="p">);</span>
<span class="cp">#  define getentropy(buf, len) (SystemFunction036(buf, len) ? 0 : -1)
#endif
</span></code></pre></div></div>

<p>It works in Wine, too, where, at least in my version, it reads from
<code class="language-plaintext highlighter-rouge">/dev/urandom</code>.</p>

<h3 id="the-wrong-places">The wrong places</h3>

<p>That’s all well and good, but suppose we’re masochists. We want our
program to be <a href="/blog/2017/03/30/">maximally portable</a> so we’re sticking strictly to
functionality found in the standard C library. That means no
<code class="language-plaintext highlighter-rouge">getentropy(2)</code> and no <code class="language-plaintext highlighter-rouge">RtlGenRandom()</code>. We can still try to open
<code class="language-plaintext highlighter-rouge">/dev/urandom</code>, but it might fail, or it might not actually be useful,
so we’ll want a backup.</p>

<p>The usual approach found in a thousand tutorials is <code class="language-plaintext highlighter-rouge">time(3)</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">srand</span><span class="p">(</span><span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">));</span>
</code></pre></div></div>

<p>It would be better to <a href="/blog/2018/07/31/">use an integer hash function</a> to mix up the
result from <code class="language-plaintext highlighter-rouge">time(0)</code> before using it as a seed. Otherwise two programs
started close in time may have similar initial sequences.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">srand</span><span class="p">(</span><span class="n">triple32</span><span class="p">(</span><span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">)));</span>
</code></pre></div></div>
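
<p>For reference, here is a sketch of what <code class="language-plaintext highlighter-rouge">triple32()</code> might look like. It assumes the linked integer hash is the same xorshift-multiply sequence that appears as the finalizer of <code class="language-plaintext highlighter-rouge">hash32s()</code> later in this article, and <code class="language-plaintext highlighter-rouge">seed_from_time()</code> is a hypothetical wrapper added for illustration:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;time.h&gt;

/* Assumed to match the finalizer of hash32s() in this article. */
uint32_t
triple32(uint32_t x)
{
    x ^= x &gt;&gt; 17;
    x *= UINT32_C(0xed5ad4bb);
    x ^= x &gt;&gt; 11;
    x *= UINT32_C(0xac4c1b51);
    x ^= x &gt;&gt; 15;
    x *= UINT32_C(0x31848bab);
    x ^= x &gt;&gt; 14;
    return x;
}

/* Hypothetical wrapper: seed rand() from the mixed timestamp. */
void
seed_from_time(void)
{
    srand(triple32((uint32_t)time(NULL)));
}
</code></pre></div></div>

<p>Each step is invertible (a xorshift, or a multiplication by an odd constant), so two different timestamps can never collapse to the same seed. The mixing spreads neighboring timestamps apart, but it adds no entropy: the seed is exactly as guessable as the time it came from.</p>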

<p>The more pressing issue is that <code class="language-plaintext highlighter-rouge">time(3)</code> has a resolution of one
second. If the program is run twice inside of a second, they’ll both
have the same sequence of numbers. It would be better to use a higher
resolution clock, but, <strong>standard C doesn’t provide a clock with greater
than one second resolution</strong>. That normally requires calling into POSIX
or Win32.</p>

<p>So, we need to find some other sources of entropy unique to each
execution of the program.</p>

<h4 id="quick-and-dirty-string-hash-function">Quick and dirty “string” hash function</h4>

<p>Before we get into that, we need a way to mix these different sources
together. Here’s a <a href="/blog/2018/06/10/">small</a>, 32-bit “string” hash function. The loop
is the same algorithm as Java’s <code class="language-plaintext highlighter-rouge">hashCode()</code>, and I appended <a href="/blog/2018/07/31/">my own
integer hash</a> as a finalizer for much better diffusion.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">hash32s</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">h</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">buf</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mi">31</span> <span class="o">+</span> <span class="n">p</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0xed5ad4bb</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">11</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0xac4c1b51</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0x31848bab</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">14</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">h</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It accepts a starting hash value, which is essentially a “context” for
the digest that allows different inputs to be appended together. The
finalizer acts as an implicit “stop” symbol in between inputs.</p>

<p>I used fixed-width integers, but it could be written nearly as concisely
using only <code class="language-plaintext highlighter-rouge">unsigned long</code> and some masking to truncate to 32-bits. I
leave this as an exercise to the reader.</p>
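
<p>One way the exercise might come out, as a sketch: the hypothetical <code class="language-plaintext highlighter-rouge">hash32s_ul()</code> below keeps the state in an <code class="language-plaintext highlighter-rouge">unsigned long</code> and masks after every step that could carry into bits above the low 32. It assumes only that <code class="language-plaintext highlighter-rouge">unsigned long</code> is at least 32 bits wide, which the C standard guarantees.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;

#define MASK32 0xffffffffUL

/* Hypothetical unsigned long variant of hash32s(). */
unsigned long
hash32s_ul(const void *buf, size_t len, unsigned long h)
{
    const unsigned char *p = buf;
    size_t i;
    h &amp;= MASK32;  /* discard any high bits in the starting value */
    for (i = 0; i &lt; len; i++)
        h = (h * 31 + p[i]) &amp; MASK32;
    h ^= h &gt;&gt; 17;  /* shifts of a masked value stay within 32 bits */
    h = (h * 0xed5ad4bbUL) &amp; MASK32;
    h ^= h &gt;&gt; 11;
    h = (h * 0xac4c1b51UL) &amp; MASK32;
    h ^= h &gt;&gt; 15;
    h = (h * 0x31848babUL) &amp; MASK32;
    h ^= h &gt;&gt; 14;
    return h;
}
</code></pre></div></div>

<p>Only the multiplications and the running sum need masking. Where <code class="language-plaintext highlighter-rouge">unsigned long</code> is exactly 32 bits the masks are no-ops, and where it is wider they discard the bits that <code class="language-plaintext highlighter-rouge">uint32_t</code> arithmetic would have dropped.</p>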

<p>Some of the values to be mixed in will be pointers themselves. These
could instead be cast to integers and passed through an integer hash
function, but using the string hash avoids <a href="/blog/2016/05/30/">various caveats</a>. Besides,
one of the inputs will be a string, so we’ll need this function anyway.</p>

<h4 id="randomized-pointers-aslr-random-stack-gap-etc">Randomized pointers (ASLR, random stack gap, etc.)</h4>

<p>Attackers can use predictability to their advantage, so modern systems
use unpredictability to improve security. Memory addresses for various
objects and executable code are randomized since some attacks require
an attacker to know their addresses. We can skim entropy from these
pointers to seed our PRNG.</p>

<p>Address Space Layout Randomization (ASLR) is when executable code and
its associated data are loaded at a random offset by the loader. Code
designed for this is called Position Independent Code (PIC). This has
long been used when loading dynamic libraries so that all of the
libraries on a system don’t have to coordinate with each other to
avoid overlapping.</p>

<p>To improve security, it has more recently been extended to programs
themselves. On both modern unix-like systems and Windows,
position-independent executables (PIE) are now the default.</p>

<p>To skim entropy from ASLR, we just need the address of one of our
functions. All the functions in our program will have the same relative
offset, so there’s no reason to use more than one. An obvious choice is
<code class="language-plaintext highlighter-rouge">main()</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">uint32_t</span> <span class="n">h</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="cm">/* initial hash value */</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">mainptr</span><span class="p">)()</span> <span class="o">=</span> <span class="n">main</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mainptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">mainptr</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>Notice I had to store the address of <code class="language-plaintext highlighter-rouge">main()</code> in a variable, and then
treat <em>the pointer itself</em> as a buffer for the hash function? It’s not
hashing the machine code behind <code class="language-plaintext highlighter-rouge">main</code>, just its address. The symbol
<code class="language-plaintext highlighter-rouge">main</code> doesn’t store an address, so it can’t be given to the hash
function to represent its address. This is analogous to an array
versus a pointer.</p>

<p>On a typical x86-64 Linux system, and when this is a PIE, that’s about
3 bytes worth of entropy. On 32-bit systems, virtual memory is so
tight that it’s worth a lot less. We might want more entropy than
that, and we want to cover the case where the program isn’t compiled
as a PIE.</p>

<p>On unix-like systems, programs are typically dynamically linked against
the C library, libc. Each shared object gets its own ASLR offset, so we
can skim more entropy from each shared object by picking a function or
variable from each. Let’s do <code class="language-plaintext highlighter-rouge">malloc(3)</code> for libc ASLR:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">mallocptr</span><span class="p">)()</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mallocptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">mallocptr</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>Allocators themselves often randomize the addresses they return so that
data objects are stored at unpredictable addresses. In particular, glibc
uses different strategies for small (<code class="language-plaintext highlighter-rouge">brk(2)</code>) versus big (<code class="language-plaintext highlighter-rouge">mmap(2)</code>)
allocations. That’s two different sources of entropy:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="o">*</span><span class="n">small</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>        <span class="cm">/* 1 byte */</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">small</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">small</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">small</span><span class="p">);</span>

    <span class="kt">void</span> <span class="o">*</span><span class="n">big</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">1UL</span> <span class="o">&lt;&lt;</span> <span class="mi">20</span><span class="p">);</span>  <span class="cm">/* 1 MB */</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">big</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">big</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">big</span><span class="p">);</span>
</code></pre></div></div>

<p>Finally the stack itself is often mapped at a random address, or at
least started with a random gap, so that local variable addresses are
also randomized.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">ptr</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ptr</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<h4 id="time-sources">Time sources</h4>

<p>We haven’t used <code class="language-plaintext highlighter-rouge">time(3)</code> yet! Let’s still do that, using the full
width of <code class="language-plaintext highlighter-rouge">time_t</code> this time around:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">time_t</span> <span class="n">t</span> <span class="o">=</span> <span class="n">time</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">t</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">t</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>We do have another time source to consider: <code class="language-plaintext highlighter-rouge">clock(3)</code>. It returns an
approximation of the processor time used by the program. There’s a
tiny bit of noise and inconsistency between repeated calls. We can use
this to extract a little bit of entropy over many repeated calls.</p>

<p>Naively we might try to use it like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="cm">/* Note: don't use this */</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">1000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">clock_t</span> <span class="n">c</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">c</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The problem is that the resolution for <code class="language-plaintext highlighter-rouge">clock()</code> is typically rough
enough that modern computers can execute multiple instructions between
ticks. On Windows, where <code class="language-plaintext highlighter-rouge">CLOCKS_PER_SEC</code> is low, that entire loop
will typically complete before the result from <code class="language-plaintext highlighter-rouge">clock()</code> increments
even once. With that arrangement we’re hardly getting anything from
it! So here’s a better version:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">1000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="kt">clock_t</span> <span class="n">start</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">clock</span><span class="p">()</span> <span class="o">==</span> <span class="n">start</span><span class="p">)</span>
            <span class="n">counter</span><span class="o">++</span><span class="p">;</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">start</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">start</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">counter</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">counter</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The counter makes the resolution of the clock no longer important. If
it’s low resolution, then we’ll get lots of noise from the counter. If
it’s high resolution, then we get noise from the clock value itself.
Running the hash function an extra time between overall <code class="language-plaintext highlighter-rouge">clock(3)</code>
samples also helps with noise.</p>

<h4 id="a-legitimate-use-of-tmpnam3">A legitimate use of tmpnam(3)</h4>

<p>We’ve got one more source of entropy available: <code class="language-plaintext highlighter-rouge">tmpnam(3)</code>. This
function generates a unique, temporary file name. It’s dangerous to
use as intended because it doesn’t actually create the file. There’s a
race between generating the name for the file and actually creating
it.</p>

<p>Fortunately we don’t actually care about the name as a filename. We’re
using this to sample entropy not directly available to us. In an attempt to
get a unique name, the standard C library draws on its own sources of
entropy.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">L_tmpnam</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="n">tmpnam</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>The rather unfortunate downside is that lots of modern systems produce
a <em>linker</em> warning when they see <code class="language-plaintext highlighter-rouge">tmpnam(3)</code> being linked, even though in
this case it’s completely harmless.</p>

<p>So what goes into a temporary filename? It depends on the
implementation.</p>

<h5 id="glibc-and-musl">glibc and musl</h5>

<p>Both get a high resolution timestamp and generate the filename directly
from the timestamp (no hashing, etc.). Unfortunately glibc does a very
poor job of also mixing <code class="language-plaintext highlighter-rouge">getpid(2)</code> into the timestamp before using it,
and probably makes things worse by doing so.</p>

<p>On these platforms, this is a way to sample a high resolution
timestamp without calling anything non-standard.</p>

<h5 id="dietlibc">dietlibc</h5>

<p>In the latest release as of this writing it uses <code class="language-plaintext highlighter-rouge">rand(3)</code>, which makes
this useless. It’s also a bug since the C library isn’t allowed to
affect the state of <code class="language-plaintext highlighter-rouge">rand(3)</code> outside of <code class="language-plaintext highlighter-rouge">rand(3)</code> and <code class="language-plaintext highlighter-rouge">srand(3)</code>. I
submitted a bug report and this has <a href="https://github.com/ensc/dietlibc/commit/8c8df9579962dc7449fe1f3205fd19eec461aa23">since been fixed</a>.</p>

<p>In the next release it will use a generator seeded by the <a href="https://lwn.net/Articles/301798/">ELF
<code class="language-plaintext highlighter-rouge">AT_RANDOM</code></a> value if available, or ASLR otherwise. This makes
it moderately useful.</p>

<h5 id="libiberty">libiberty</h5>

<p>Generated from <code class="language-plaintext highlighter-rouge">getpid(2)</code> alone, with a counter to handle multiple
calls. It’s basically a way to sample the process ID without actually
calling <code class="language-plaintext highlighter-rouge">getpid(2)</code>.</p>

<h5 id="bsd-libc--bionic-android">BSD libc / Bionic (Android)</h5>

<p>Actually gathers real entropy from the operating system (via
<code class="language-plaintext highlighter-rouge">arc4random(3)</code>), which means we’re getting a lot of mileage out of this
one.</p>

<h5 id="uclibc">uclibc</h5>

<p>Its implementation is obviously forked from glibc. However, it first
tries to read entropy from <code class="language-plaintext highlighter-rouge">/dev/urandom</code>, and only if that fails does
it fall back to glibc’s original high resolution clock XOR <code class="language-plaintext highlighter-rouge">getpid(2)</code>
method (still not hashing it).</p>

<h4 id="finishing-touches">Finishing touches</h4>

<p>Finally, still use <code class="language-plaintext highlighter-rouge">/dev/urandom</code> if it’s available. This doesn’t
require us to trust that the output is anything useful since it’s just
being mixed into the other inputs.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="n">rnd</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
    <span class="kt">FILE</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/urandom"</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">fread</span><span class="p">(</span><span class="n">rnd</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">rnd</span><span class="p">),</span> <span class="mi">1</span><span class="p">,</span> <span class="n">f</span><span class="p">))</span>
            <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="n">rnd</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">rnd</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
        <span class="n">fclose</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>When we’re all done gathering entropy, set the seed from the result.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">srand</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>   <span class="cm">/* or whatever you're seeding */</span>
</code></pre></div></div>

<p>That’s bound to find <em>some</em> entropy on just about any host. Though
definitely don’t rely on the results for cryptography.</p>

<h3 id="lua">Lua</h3>

<p>I recently tackled this problem in Lua. It has a no-batteries-included
design, demanding very little of its host platform: nothing more than an
ANSI C implementation. Because of this, a Lua program has even fewer
options for gathering entropy than C. But it’s still not impossible!</p>

<p>To further complicate things, Lua code is often run in a sandbox with
some features removed. For example, Lua has <code class="language-plaintext highlighter-rouge">os.time()</code> and <code class="language-plaintext highlighter-rouge">os.clock()</code>
wrapping the C equivalents, allowing for the same sorts of entropy
sampling. When run in a sandbox, <code class="language-plaintext highlighter-rouge">os</code> might not be available. Similarly,
<code class="language-plaintext highlighter-rouge">io</code> might not be available for accessing <code class="language-plaintext highlighter-rouge">/dev/urandom</code>.</p>

<p>Have you ever printed a table, though? Or a function? It evaluates to
a string containing the object’s address.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ lua -e 'print(math)'
table: 0x559577668a30
$ lua -e 'print(math)'
table: 0x55e4a3679a30
</code></pre></div></div>

<p>Since the raw pointer values are leaked to Lua, we can skim allocator
entropy like before. Here’s the same hash function in Lua 5.3:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="k">function</span> <span class="nf">hash32s</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">h</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="o">#</span><span class="n">buf</span> <span class="k">do</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mi">31</span> <span class="o">+</span> <span class="n">buf</span><span class="p">:</span><span class="n">byte</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="k">end</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mh">0xed5ad4bb</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">11</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mh">0xac4c1b51</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mh">0x31848bab</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">14</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">h</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now hash a bunch of pointers in the global environment:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">({}),</span> <span class="mi">0</span><span class="p">)</span>  <span class="c1">-- hash a new table's address</span>
<span class="k">for</span> <span class="n">varname</span><span class="p">,</span> <span class="n">value</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">_G</span><span class="p">)</span> <span class="k">do</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="n">varname</span><span class="p">,</span> <span class="n">h</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">(</span><span class="n">value</span><span class="p">),</span> <span class="n">h</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'table'</span> <span class="k">then</span>
        <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="k">do</span>
            <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">(</span><span class="n">k</span><span class="p">),</span> <span class="n">h</span><span class="p">)</span>
            <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">(</span><span class="n">v</span><span class="p">),</span> <span class="n">h</span><span class="p">)</span>
        <span class="k">end</span>
    <span class="k">end</span>
<span class="k">end</span>

<span class="nb">math.randomseed</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>
</code></pre></div></div>

<p>Unfortunately this doesn’t actually work well on one platform I tested:
Cygwin. Cygwin has few security features, notably lacking ASLR, and
having a largely deterministic allocator.</p>

<h3 id="when-to-use-it">When to use it</h3>

<p>In practice it’s not really necessary to use these sorts of tricks of
gathering entropy from odd places. It’s something that comes up more
in coding challenges and exercises than in real programs. I’m probably
already making platform-specific calls in programs substantial enough
to need it anyway.</p>

<p>On a few occasions I have thought about these things when debugging.
ASLR makes return pointers on the stack slightly randomized on each
run, which can change the behavior of some kinds of bugs. Allocator
and stack randomization do similar things to most of your pointers.
GDB tries to disable some of these features during debugging, but it
doesn’t get everything.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The CPython Bytecode Compiler is Dumb</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/02/24/"/>
    <id>urn:uuid:4348d611-858b-4f48-a6f5-6e4b93f71a34</id>
    <updated>2019-02-24T21:56:35Z</updated>
    <category term="python"/><category term="lua"/><category term="lang"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was <a href="https://news.ycombinator.com/item?id=19241545">discussed on Hacker News</a>.</em></p>

<p>Due to sheer coincidence of several unrelated tasks converging on
Python at work, I recently needed to brush up on my Python skills. So
far for me, Python has been little more than <a href="/blog/2017/05/15/">a fancy extension
language for BeautifulSoup</a>, though I also used it to participate
in the recent tradition of <a href="https://github.com/skeeto/qualbum">writing one’s own static site
generator</a>, in this case for <a href="http://photo.nullprogram.com/">my wife’s photo blog</a>.
I’ve been reading through <em>Fluent Python</em> by Luciano Ramalho, and it’s
been quite effective at getting me up to speed.</p>

<!--more-->

<p>As I write Python, <a href="/blog/2014/01/04/">like with Emacs Lisp</a>, I can’t help but
consider what exactly is happening inside the interpreter. I wonder if
the code I’m writing is putting undue constraints on the bytecode
compiler and limiting its options. Ultimately I’d like the code I
write <a href="/blog/2017/01/30/">to drive the interpreter efficiently and effectively</a>.
<a href="https://www.python.org/dev/peps/pep-0020/">The Zen of Python</a> says there should be “only one obvious way to do
it,” but in practice there’s a lot of room for expression. Given
multiple ways to express the same algorithm or idea, I tend to prefer
the one that compiles to the more efficient bytecode.</p>

<p>Fortunately CPython, the main and most widely used implementation of
Python, is very transparent about its bytecode. It’s easy to inspect
and reason about its bytecode. The disassembly listing is easy to read
and understand, and I can always follow it without consulting the
documentation. This contrasts sharply with modern JavaScript engines
and their opaque use of JIT compilation, where performance is guided
by obeying certain patterns (<a href="https://www.youtube.com/watch?v=UJPdhx5zTaw">hidden classes</a>, etc.), helping the
compiler <a href="https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/">understand my program’s types</a>, and being careful
not to unnecessarily constrain the compiler.</p>

<p>So, besides just catching up with Python the language, I’ve been
studying the bytecode disassembly of the functions that I write. One
fact has become quite apparent: <strong>the CPython bytecode compiler is
pretty dumb</strong>. With a few exceptions, it’s a very literal translation
of a Python program, and there is almost <a href="https://legacy.python.org/workshops/1998-11/proceedings/papers/montanaro/montanaro.html">no optimization</a>.
Below I’ll demonstrate a case where it’s possible to detect one of the
missed optimizations without inspecting the bytecode disassembly
thanks to a small abstraction leak in the optimizer.</p>

<p>To be clear: This isn’t to say CPython is bad, or even that it should
necessarily change. In fact, as I’ll show, <strong>dumb bytecode compilers
are par for the course</strong>. In the past I’ve lamented how the Emacs Lisp
compiler could do a better job, but CPython and Lua are operating at
the same level. There are benefits to a dumb and straightforward
bytecode compiler: the compiler itself is simpler, easier to maintain,
and more amenable to modification (e.g. as Python continues to
evolve). It’s also easier to debug Python (<code class="language-plaintext highlighter-rouge">pdb</code>) because it’s such a
close match to the source listing.</p>

<p><em>Update</em>: <a href="https://codewords.recurse.com/issues/seven/dragon-taming-with-tailbiter-a-bytecode-compiler">Darius Bacon points out</a> that Guido van Rossum
himself said, “<a href="https://books.google.com/books?id=bIxWAgAAQBAJ&amp;pg=PA26&amp;lpg=PA26&amp;dq=%22Python+is+about+having+the+simplest,+dumbest+compiler+imaginable.%22&amp;source=bl&amp;ots=2OfDoWX321&amp;sig=ACfU3U32jKZBE3VkJ0gvkKbxRRgD0bnoRg&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjZ1quO89bgAhWpm-AKHfckAxUQ6AEwAHoECAkQAQ#v=onepage&amp;q=%22Python%20is%20about%20having%20the%20simplest%2C%20dumbest%20compiler%20imaginable.%22&amp;f=false">Python is about having the simplest, dumbest compiler
imaginable.</a>” So this is all very much by design.</p>

<p>The consensus seems to be that if you want or need better performance,
use something other than Python. (And if you can’t do that, at least use
<a href="https://pypy.org/">PyPy</a>.) That’s a fairly reasonable and healthy goal. Still, if
I’m writing Python, I’d like to do the best I can, which means
exploiting the optimizations that <em>are</em> available when possible.</p>

<h3 id="disassembly-examples">Disassembly examples</h3>

<p>I’m going to compare three bytecode compilers in this article: CPython
3.7, Lua 5.3, and Emacs 26.1. Each of these languages is dynamically
typed and primarily executed on a bytecode virtual machine, and it’s
easy to access their disassembly listings. One caveat: CPython and Emacs
use a stack-based virtual machine while Lua uses a register-based
virtual machine.</p>

<p>For CPython I’ll be using the <code class="language-plaintext highlighter-rouge">dis</code> module. For Emacs Lisp I’ll use <code class="language-plaintext highlighter-rouge">M-x
disassemble</code>, and all code will use lexical scoping. In Lua I’ll use
<code class="language-plaintext highlighter-rouge">luac -l</code> on the command line.</p>

<h3 id="local-variable-elimination">Local variable elimination</h3>

<p>Will the bytecode compiler eliminate local variables? Keeping the
variable around potentially involves allocating memory for it, assigning
to it, and accessing it. Take this example:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>

<p>This function is equivalent to:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="mi">0</span>
</code></pre></div></div>

<p>Despite this, CPython completely misses this optimization for both <code class="language-plaintext highlighter-rouge">x</code>
and <code class="language-plaintext highlighter-rouge">y</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (0)
              2 STORE_FAST               0 (x)
  3           4 LOAD_CONST               2 (1)
              6 STORE_FAST               1 (y)
  4           8 LOAD_FAST                0 (x)
             10 RETURN_VALUE
</code></pre></div></div>

<p>It assigns both variables, and even loads again from <code class="language-plaintext highlighter-rouge">x</code> for the return.
Missed optimizations, but, as I said, by keeping these variables around,
debugging is more straightforward. Users can always inspect variables.</p>
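
<p>A minimal sketch of examining a listing like the one above programmatically
with the standard <code class="language-plaintext highlighter-rouge">dis</code> module. The helper name <code class="language-plaintext highlighter-rouge">opnames</code> is made up for
illustration, and exact opcode names vary between CPython releases, so it
only looks for the stores to local variables:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import dis

def foo():
    x = 0
    y = 1
    return x

def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

# Collect the stores to local variables. A compiler that eliminated
# x and y would emit none of these.
stores = [name for name in opnames(foo) if name.startswith("STORE_FAST")]
print(len(stores))

# Print the human-readable listing for comparison.
dis.dis(foo)
</code></pre></div></div>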

<p>How about Lua?</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="kd">local</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="kd">local</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">x</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It also misses this optimization, though it matters a little less due to
its architecture (the return instruction references a register
regardless of whether or not that register is allocated to a local
variable):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     LOADK           0 -1    ; 0
        2       [3]     LOADK           1 -2    ; 1
        3       [4]     RETURN          0 2
        4       [5]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp also misses it:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">y</span> <span class="mi">1</span><span class="p">))</span>
    <span class="nv">x</span><span class="p">))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  0
1	constant  1
2	stack-ref 1
3	return
</code></pre></div></div>

<p>All three are on the same page.</p>

<h3 id="constant-folding">Constant folding</h3>

<p>Does the bytecode compiler evaluate simple constant expressions at
compile time? This is simple and everyone does it.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">/</span> <span class="mi">4</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (2.5)
              2 RETURN_VALUE
</code></pre></div></div>

<p>Lua:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">/</span> <span class="mi">4</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     LOADK           0 -1    ; 2.5
        2       [2]     RETURN          0 2
        3       [3]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)</span> <span class="mf">4.0</span><span class="p">))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  2.5
1	return
</code></pre></div></div>

<p>That’s something we can count on so long as the operands are all
numeric literals (or also, for Python, string literals) that are
visible to the compiler. Don’t count on your operator overloads to
work here, though.</p>
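
<p>Folding can also be checked without reading a listing by looking at the
constants stored on the code object. This is only a sketch: it assumes
the folded result appears in <code class="language-plaintext highlighter-rouge">__code__.co_consts</code>, and the function
names are hypothetical:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def folded():
    return 1 + 2 * 3 / 4

def not_folded():
    x = 2
    return 1 + x * 3 / 4

# When the expression is folded, its result is a constant of the function.
print(2.5 in folded.__code__.co_consts)

# Routing one operand through a local variable leaves the arithmetic
# for run time, so the result is not among the constants.
print(2.5 in not_folded.__code__.co_consts)
</code></pre></div></div>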

<h3 id="allocation-optimization">Allocation optimization</h3>

<p>Optimizers often perform <em>escape analysis</em>, to determine if objects
allocated in a function ever become visible outside of that function. If
they don’t then these objects could potentially be stack-allocated
(instead of heap-allocated) or even be eliminated entirely.</p>

<p>None of the bytecode compilers are this sophisticated. However CPython
does have a trick up its sleeve: tuple optimization. Since tuples are
immutable, in certain circumstances CPython will reuse them and avoid
both the constructor and the allocation.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>

<p>Check it out, the tuple is used as a constant:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 ((1, 2, 3))
              2 RETURN_VALUE
</code></pre></div></div>

<p>Which we can detect by evaluating <code class="language-plaintext highlighter-rouge">foo() is foo()</code>, which is <code class="language-plaintext highlighter-rouge">True</code>.
Though deviate from this too much and the optimization is disabled.
Remember how CPython can’t optimize away variables, and that they
break constant folding? They break this, too:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (x)
  3           4 LOAD_FAST                0 (x)
              6 LOAD_CONST               2 (2)
              8 LOAD_CONST               3 (3)
             10 BUILD_TUPLE              3
             12 RETURN_VALUE
</code></pre></div></div>

<p>This function might document that it always returns a simple tuple,
but we can tell if it’s being optimized or not using <code class="language-plaintext highlighter-rouge">is</code> like before:
<code class="language-plaintext highlighter-rouge">foo() is foo()</code> is now <code class="language-plaintext highlighter-rouge">False</code>! In some future version of Python with
a cleverer bytecode compiler, that expression might evaluate to
<code class="language-plaintext highlighter-rouge">True</code>. (Unless the <a href="https://docs.python.org/3/reference/">Python language specification</a> is specific
about this case, which I didn’t check.)</p>
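
<p>The two cases can be put side by side. This is a sketch with
hypothetical function names. The identity results are a CPython
implementation detail, as discussed, and not something to rely on in a
program:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def constant_tuple():
    return (1, 2, 3)

def built_tuple():
    x = 1
    return (x, 2, 3)

# When the tuple is a constant, both calls hand back the same object.
print(constant_tuple() is constant_tuple())

# When the tuple is built at run time, each call makes a new object
# while the previous one is still alive.
print(built_tuple() is built_tuple())

# Compared by value, the two functions are indistinguishable.
print(constant_tuple() == built_tuple())
</code></pre></div></div>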

<p>Note: Curiously PyPy replicates this exact behavior when examined with
<code class="language-plaintext highlighter-rouge">is</code>. Was that deliberate? I’m impressed that PyPy matches CPython’s
semantics so closely here.</p>

<p>Putting a mutable value, such as a list, in the tuple will also break
this optimization. But that’s not the compiler being dumb. That’s a
hard constraint on the compiler: the caller might change the mutable
component of the tuple, so it must always return a fresh copy.</p>

<p>Neither Lua nor Emacs Lisp has a language-level equivalent of
an immutable tuple, so there’s nothing to compare.</p>

<p>Other than the tuples situation in CPython, none of the bytecode
compilers eliminate unnecessary intermediate objects.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">[</span><span class="mi">1024</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (1024)
              2 BUILD_LIST               1
              4 LOAD_CONST               2 (0)
              6 BINARY_SUBSCR
              8 RETURN_VALUE
</code></pre></div></div>

<p>Lua:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="k">return</span> <span class="p">({</span><span class="mi">1024</span><span class="p">})[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     NEWTABLE        0 1 0
        2       [2]     LOADK           1 -1    ; 1024
        3       [2]     SETLIST         0 1 1   ; 1
        4       [2]     GETTABLE        0 0 -2  ; 1
        5       [2]     RETURN          0 2
        6       [3]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1024</span><span class="p">)))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  1024
1	list1
2	car
3	return
</code></pre></div></div>
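
<p>The Python case can also be checked programmatically rather than by
reading a listing. This is a rough sketch using the standard
<code class="language-plaintext highlighter-rouge">dis</code> module; opcode names vary between CPython releases, so I’m
assuming only that a surviving list allocation appears as
<code class="language-plaintext highlighter-rouge">BUILD_LIST</code>, as in the disassembly above:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import dis

def foo():
    return [1024][0]

# Collect the opcode names the compiler emitted for foo().
opnames = [ins.opname for ins in dis.get_instructions(foo)]
print(opnames)

# If the temporary list survived compilation, BUILD_LIST is present.
print("BUILD_LIST" in opnames)
</code></pre></div></div>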

<h3 id="dont-expect-too-much">Don’t expect too much</h3>

<p>I could go on with lots of examples, looking at loop optimizations and
so on, and each case is almost certainly unoptimized. The general rule
of thumb is to simply not expect much from these bytecode compilers.
They’re very literal in their translation.</p>

<p>Working so much in C has put me in the habit of expecting all obvious
optimizations from the compiler. This frees me to be more expressive
in my code. Lots of things are cost-free thanks to these
optimizations, such as breaking a complex expression up into several
variables, naming my constants, or not using a local variable to
manually cache memory accesses. I’m confident the compiler will
optimize away my expressiveness. The catch is that <a href="/blog/2018/05/01/">clever compilers
can take things too far</a>, so I’ve got to be mindful of how they might
undermine my intentions, i.e. when I’m doing something unusual or not
strictly permitted.</p>

<p>These bytecode compilers will never truly surprise me. The cost is
that being more expressive in Python, Lua, or Emacs Lisp may reduce
performance at run time because it shows in the bytecode. Usually this
doesn’t matter, but sometimes it does.</p>
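
<p>One way to see that cost, sketched under the assumption of CPython
and with two throwaway functions of my own: name an intermediate value
and count the instructions. The exact counts depend on the CPython
version, so none are quoted here, but the named variant should not come
out shorter.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import dis

def direct(x):
    return x * 2 + 1

def named(x):
    doubled = x * 2   # same computation with a named intermediate
    return doubled + 1

def count_instructions(func):
    # Number of bytecode instructions in the compiled function.
    return sum(1 for _ in dis.get_instructions(func))

print(count_instructions(direct), count_instructions(named))
</code></pre></div></div>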

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The 3n + 1 Conjecture</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/01/29/"/>
    <id>urn:uuid:147c1170-2d8b-38b3-bfbe-0652dc2e5e9a</id>
    <updated>2008-01-29T00:00:00Z</updated>
    <category term="c"/><category term="lua"/>
    <content type="html">
      <![CDATA[<!-- 29 January 2008 -->
<p>
The 3n + 1 conjecture, also known as
the <a href="http://en.wikipedia.org/wiki/Collatz_conjecture">Collatz
conjecture</a>, is based around this recursive function,
</p>
<p class="center">
  <img src="/img/misc/collatz.png" alt=""/>
</p>
<p>
The conjecture is this,
</p>
<blockquote>
  <p>
    This process will eventually reach the number 1, regardless of which
    positive integer is chosen initially.
  </p>
</blockquote>
<p>
The way I am defining this may not be entirely accurate, as I took a
shortcut to make it a bit simpler. I am not a mathematician (IANAM) —
but sometimes I pretend to be one. For a really solid definition,
click through to the Wikipedia article in the link above.
</p>
<p>
A sample run, starting at 7, would look like this: <code>7, 22, 11,
34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1</code>. The sequence
starting at 7 contains 17 numbers. So 7 has a <i>cycle-length</i> of
17. Currently, there is no known positive integer that does not
eventually lead to 1. If the conjecture is true, then none exists to
be found.
</p>
<p>
I first found out about the problem when I saw it
on <a href="http://icpcres.ecs.baylor.edu/onlinejudge/">UVa Online
Judge</a>. UVa Online Judge is a system that has a couple thousand
programming problems to do. Users can submit solution programs written
in C, C++, Java, or Pascal. For normal submissions, the fastest
program wins.
</p>
<p>
Anyway, the way UVa Online Judge runs this problem is by providing the
solution program pairs of integers on <code>stdin</code> as text. The
integers define an inclusive range, and the program must return the
longest Collatz cycle-length among the integers inside that range.
They don't tell you which ranges they are
checking, except that all integers will be less than 1,000,000 and the
sequences will never overflow a 32-bit integer (allowing shortcuts to
be made to increase performance).
</p>
<p>
The simple approach would be defining a function that returns the
cycle length (<a href="http://www.lua.org/">Lua</a> programming
language),
</p>
<pre>
function collatz_len (n)
   local c = 1

   while n > 1 do
      c = c + 1
      if n % 2 == 0 then
         n = n / 2
      else
         n = 3 * n + 1
      end
   end

   return c
end
</pre>
<p>
Then we have a function to check over a range (assuming n &lt;= m here),
</p>
<pre>
function check_range (n, m)
   local largest = 0

   for i = n, m do
      local len = collatz_len (i)

      if len > largest then
         largest = len
      end

   end

   return largest
end
</pre>
<p>
And top it off with the i/o. (I am just learning Lua, so I hope I did
this part properly!)
</p>
<pre>
-- io.stdin has no "eof" field, so loop until read() returns nil
while true do
   local n, m = io.stdin:read("*number", "*number")

   -- check for eof
   if n == nil or m == nil then
      break
   end

   print (n .. " " .. m .. " " .. check_range(n, m))
end
</pre>
<p>
Notice anything extremely inefficient? We are doing the same work over
and over again! Take, for example, this range: 7, 22. When we start
with 7, we get the sequence shown above: <code>7, 22, 11, 34, 17, 52,
26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1</code>. Besides 7 itself, eight
of these numbers are part of the range that we are looking at. When we
get up to 22, we are going to walk down the same sequence again, less
the 7. To make things more efficient, we apply
some <a href="http://en.wikipedia.org/wiki/Dynamic_programming">
dynamic programming</a> and store previously calculated cycle-lengths in
an array. Once we get to a value we already calculated, we just look
it up.
</p>
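<p>
As a rough sketch of that idea, here it is in Python rather than Lua or
the C of my submission. The names <code>cache</code>,
<code>cycle_length</code>, and <code>check_range</code> are chosen for
illustration, and a dictionary stands in for the array.
</p>
<pre>
cache = {1: 1}  # the sequence starting at 1 contains one number

def cycle_length(n):
    """Return the Collatz cycle-length of n, remembering every result."""
    if n not in cache:
        nxt = n // 2 if n % 2 == 0 else 3 * n + 1
        cache[n] = cycle_length(nxt) + 1
    return cache[n]

def check_range(lo, hi):
    """Longest cycle-length among the integers lo..hi inclusive."""
    return max(cycle_length(i) for i in range(lo, hi + 1))

print(cycle_length(7))     # 17
print(check_range(1, 10))  # 20
</pre>
<p>
Each number's cycle-length is computed once. Every later lookup, from
any starting point, is a single dictionary access.
</p>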
<p>
I used dynamic programming in my submission, which I wrote up in
C. You can grab my
source <a href="/download/collatz/collatz.c">
here</a>. It fills in a large array (1000000 entries) as values are
found, so no cycle-length is calculated twice. When I submitted this
program, it ranked 60 out of about 300,000 entries. There are probably
a number of tweaks that can increase performance, such as increasing
the size of the array, but I didn't care much about inching closer to
the top. I would bet that the very top entries did some
trial-and-error and determined what ranges are tested, using the
results to seed their program accordingly. You could take my code and
submit it yourself, but that wouldn't be very honest, would it?
</p>
<p>
So why am I going through all of this describing such a simple
problem? Well, it is because of this neat feature of Lua that applies
well to this problem. Lua is kind of like Lisp. In Lisp, everything is
a list ("list processing" --> Lis<i>p</i>). In Lua, (almost)
everything is an associative array (Maybe they should have called it
Assp? Or Hashp?  I am kidding.) An object is a hash with fields
containing function references. There is even
some <a href="http://en.wikipedia.org/wiki/Syntactic_sugar"> syntactic
sugar</a> to help this along.
</p>
<p>
The cool thing is that we can create a hash with default entries that
reference a function that calculates the Collatz cycle-length of its
key. Once the cycle-length is calculated, the function reference is
replaced with the value, so the function is never called again from
that point. The function only actually determines the next integer,
then references the hash to get the cycle-length of that next integer.
</p>
<p>
Now this hash looks like it is infinitely large. This is really a form
of <a href="http://en.wikipedia.org/wiki/Lazy_evaluation"> lazy
evaluation</a>: no values are calculated until they are needed (this
is one of my favorite things about <a href="http://www.haskell.org/">
Haskell</a>). We don't need to explicitly ask for it to be calculated,
either. We just go along looking up values in the array as if they
were always there. Here is how you do it,
</p>
<pre>
collatz_len = { 1 }

setmetatable (collatz_len, {
   __index = function (name, n)
      -- called only for keys that are not stored in the table yet
      if n % 2 == 0 then
         name[n] = name[n/2] + 1
      else
         name[n] = name[3 * n + 1] + 1
      end
      return name[n]
   end
})
</pre>
<p>
So we replace the <code>collatz_len</code> function with this array
(and replace the call with an array reference) and we have applied
dynamic programming to our old program. If I run the two programs with
this sample input,
</p>
<pre>
10 1000
1000 3000
300 500
</pre>
<p>
and look at average running times, the dynamic programming version
runs 87% faster than the original.
</p>
<p>
One problem with this, though, is the use of recursion. In Lua, it is
really easy to hit recursion limits. For example, accessing element
10000 will cause the program to crash. This will probably get fixed
someday, or in some implementation of Lua.
</p>
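<p>
The recursion can also be sidestepped entirely. Here is a rough sketch
in Python rather than Lua. The function name and the plain dictionary
cache are assumptions for illustration, not something taken from
collatz.lua. It walks forward until a known value turns up, then fills
in the cache on the way back.
</p>
<pre>
def cycle_length_iter(n, cache):
    """Collatz cycle-length of n, computed without recursion."""
    path = []
    # Walk forward until we reach a number whose length is known.
    while n not in cache:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = cache[n]
    # Fill in the cache for every number visited, nearest first.
    for m in reversed(path):
        length += 1
        cache[m] = length
    return length

cache = {1: 1}
print(cycle_length_iter(27, cache))  # 112
</pre>
<p>
The explicit <code>path</code> list takes the place of the call stack,
so the depth of a chain is limited by memory and not by a recursion
limit.
</p>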
<p>
I thought there might be a way to do this in Perl, by changing the
default hash value from <code>undef</code> to something else, but I
was mildly disappointed to find out that this is not true.
</p>
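<p>
For comparison, Python does have a hook of this kind: a
<code>dict</code> subclass can define <code>__missing__</code>, which
is called when a key is not found. A minimal sketch of the same lazy
table follows (the class name <code>CollatzTable</code> is made up for
this example):
</p>
<pre>
class CollatzTable(dict):
    """Dict that computes missing Collatz cycle-lengths on first lookup."""

    def __missing__(self, n):
        # dict calls this only when n is not already a key.
        nxt = n // 2 if n % 2 == 0 else 3 * n + 1
        self[n] = self[nxt] + 1
        return self[n]

collatz_len = CollatzTable({1: 1})
print(collatz_len[7])    # 17
print(len(collatz_len))  # 17 entries, all filled in by that one lookup
</pre>
<p>
It has the same weakness as the Lua version: each missing key recurses
into the next, so a long enough chain will hit Python's recursion
limit.
</p>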
<p>
Here is the source for the original program and the one with dynamic
programming (BSD licensed):
<a href="/download/collatz/collatz_simple.lua">
  collatz_simple.lua</a> and
<a href="/download/collatz/collatz.lua">
  collatz.lua</a>
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  

</feed>
