<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged go at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/go/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/go/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:51c2a18d-6ae7-4e0c-b14e-a1424eb21dba</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Guidelines for computing sizes and subscripts</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/05/24/"/>
    <id>urn:uuid:df6214e0-e408-4254-bd65-49d64e06a93e</id>
    <updated>2024-05-24T22:25:10Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p>Occasionally we need to compute the size of an object that does not yet
exist, or a subscript <a href="https://research.google/blog/extra-extra-read-all-about-it-nearly-all-binary-searches-and-mergesorts-are-broken/">that may fall out of bounds</a>. It’s easy to miss
the edge cases where results overflow, creating a nasty, subtle bug, <a href="https://blog.carlana.net/post/2024/golang-slices-concat/">even
in the presence of type safety</a>. Ideally such computations happen in
specialized code, such as <em>inside</em> an allocator (<code class="language-plaintext highlighter-rouge">calloc</code>, <code class="language-plaintext highlighter-rouge">reallocarray</code>)
and not <em>outside</em> by the allocatee (i.e. <code class="language-plaintext highlighter-rouge">malloc</code>). Mitigations exist with
different trade-offs: arbitrary precision, or using a wider fixed integer
— i.e. 128-bit integers on 64-bit hosts. In the typical case, working only
with fixed size-type integers, I’ve come up with a set of guidelines to
avoid overflows in the edge cases.</p>

<ol>
  <li>Range check <em>before</em> computing a result. No exceptions.</li>
  <li>Do not cast unless you know <em>a priori</em> the operand is in range.</li>
  <li>Never mix unsigned and signed operands. <a href="https://www.youtube.com/watch?v=wvtFGa6XJDU">Prefer signed.</a> If you
need to convert an operand, see (2).</li>
  <li>Do not add unless you know <em>a priori</em> the result is in range.</li>
  <li>Do not multiply unless you know <em>a priori</em> the result is in range.</li>
  <li>Do not subtract unless you know <em>a priori</em> both signed operands
are non-negative. For unsigned, that the second operand is not larger
than the first (treat it like (4)).</li>
  <li>Do not divide unless you know <em>a priori</em> the denominator is positive.</li>
  <li>Make it correct first. Make it fast later, if needed.</li>
</ol>

<p>These guidelines are also useful when <em>reviewing</em> code, tracking in your
mind whether the invariants hold at each step. If not, you’ve likely
found a bug. If in doubt, use assertions to document and check invariants.
I compiled this list during code review, so for me that’s where it’s most
useful.</p>

<h3 id="range-check-then-compute">Range check, then compute</h3>

<p>Checking first is not strictly necessary when overflow is well-defined, i.e.
wraparound, but it’s like defensive driving. It’s simpler and clearer to check
with basic arithmetic rather than reason from a wraparound, e.g. a negative result.
Checked math functions are fine, too, if you check the overflow boolean
before accessing the result.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// bad
len++;
if (len &lt;= 0) error();

// good
if (len == MAX) error();
len++;
</code></pre></div></div>
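
<p>As a rough sketch of the checked-math variant, assuming GCC or Clang,
whose <code class="language-plaintext highlighter-rouge">__builtin_add_overflow</code> reports overflow through its return
value. The <code class="language-plaintext highlighter-rouge">checked_increment</code> name is made up for illustration:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, assuming a GCC/Clang compiler builtin.
// Returns 1 and increments *len, or returns 0 and leaves *len untouched.
int checked_increment(ptrdiff_t *len)
{
    assert(*len >= 0);  // a negative length is a bug upstream
    ptrdiff_t next;
    if (__builtin_add_overflow(*len, (ptrdiff_t)1, &next)) {
        return 0;  // overflow flag is checked before the result is used
    }
    *len = next;
    return 1;
}
```

<p>The result is only stored once the overflow flag has been inspected, so
the caller never observes a wrapped value.</p>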

<h3 id="casting">Casting</h3>

<p>When casting from signed to unsigned, it’s as simple as knowing the value is
non-negative, which is likely if you’re following (1). If a negative size
has appeared, there’s already been a bug earlier in the program, and the
only reasonable course of action is to abort, not handle it like an error.</p>

<h3 id="addition">Addition</h3>

<p>To check if addition will overflow, subtract one of the operands from the
maximum value.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (b &gt; MAX - a) error();
r = a + b;
</code></pre></div></div>
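
<p>Wrapped up as a function, the same check might look like the sketch
below. It assumes signed <code class="language-plaintext highlighter-rouge">ptrdiff_t</code> sizes with <code class="language-plaintext highlighter-rouge">PTRDIFF_MAX</code> standing in
for <code class="language-plaintext highlighter-rouge">MAX</code>, and <code class="language-plaintext highlighter-rouge">checked_add</code> is a hypothetical name:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: adds two non-negative sizes.
// Returns 1 and stores the sum in *r, or returns 0 if it would overflow.
int checked_add(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *r)
{
    assert(a >= 0);  // so that PTRDIFF_MAX - a cannot itself overflow
    assert(b >= 0);
    if (b > PTRDIFF_MAX - a) {
        return 0;  // a + b would exceed PTRDIFF_MAX
    }
    *r = a + b;  // known to be in range
    return 1;
}
```

<p>The assertions document the non-negative precondition that makes the
subtraction itself safe.</p>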

<p>In pointer arithmetic addition, it’s a common mistake to compute the
result pointer then compare it to the bounds. If the check failed, then
the pointer <em>already</em> overflowed, i.e. undefined behavior. Major pieces of
software, <a href="https://sourcegraph.com/search?q=context:global+%22%3E+outend%22+repo:%5Egithub%5C.com/bminor/glibc%24+&amp;patternType=keyword&amp;sm=0">like glibc</a>, are riddled with such pointer overflows.
(Now that you’re aware of it, you’ll start noticing it everywhere. Sorry.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// bad: never do this
beg += size;
if (beg &gt; end) error();
</code></pre></div></div>

<p>To do this correctly, <strong>check integers not pointers</strong>. Like before,
subtract before adding.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>available = end - beg;
if (size &gt; available) error();
beg += size;
</code></pre></div></div>
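
<p>For a more complete picture, here is a minimal sketch of that check as a
function. The <code class="language-plaintext highlighter-rouge">Region</code> type and <code class="language-plaintext highlighter-rouge">region_take</code> are hypothetical names, and
returning <code class="language-plaintext highlighter-rouge">NULL</code> stands in for <code class="language-plaintext highlighter-rouge">error()</code>:</p>

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical bump region: [beg, end) is the unused part.
typedef struct {
    char *beg;
    char *end;
} Region;

// Returns the old beg and advances it by size, or returns NULL when
// fewer than size bytes remain. The pointer never leaves the region,
// not even temporarily.
char *region_take(Region *r, ptrdiff_t size)
{
    assert(size >= 0);
    ptrdiff_t available = r->end - r->beg;  // integer, both sides signed
    if (size > available) {
        return NULL;
    }
    char *p = r->beg;
    r->beg += size;  // known to land at or before end
    return p;
}
```

<p>Because both <code class="language-plaintext highlighter-rouge">size</code> and <code class="language-plaintext highlighter-rouge">available</code> are signed, the comparison does not
mix signedness.</p>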

<p>Mind mixing signed and unsigned operands for the comparison operator (3),
e.g. an unsigned size on the left and signed difference on the right.</p>

<h3 id="multiplication-and-division">Multiplication and division</h3>

<p>If you’re working this out on your own, multiplication seems tricky until
you’ve internalized a simple pattern. Just as we subtracted before adding,
we need to divide before multiplying. Divide the maximum value by one of
the operands:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (a&gt;0 &amp;&amp; b&gt;MAX/a) error();
r = a * b;
</code></pre></div></div>

<p>It’s often permitted for one or both to be zero, so mind divide-by-zero,
which is handled above by the first condition. Sometimes size must be
positive, e.g. the result of the <code class="language-plaintext highlighter-rouge">sizeof</code> operator in C, in which case we
should prefer it as the denominator.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>assert(size  &gt;  0);
assert(count &gt;= 0);
if (count &gt; MAX/size) error();
total = count * size;
</code></pre></div></div>
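
<p>A sketch of the same pattern as a reusable function, again assuming
<code class="language-plaintext highlighter-rouge">ptrdiff_t</code> sizes and <code class="language-plaintext highlighter-rouge">PTRDIFF_MAX</code> as <code class="language-plaintext highlighter-rouge">MAX</code>. The name <code class="language-plaintext highlighter-rouge">checked_total</code> is
hypothetical:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: bytes needed for count elements of size bytes.
// Returns 1 and stores the product in *total, or 0 if it would overflow.
int checked_total(ptrdiff_t count, ptrdiff_t size, ptrdiff_t *total)
{
    assert(size > 0);    // positive, so it is a safe denominator
    assert(count >= 0);
    if (count > PTRDIFF_MAX/size) {
        return 0;  // count * size would exceed PTRDIFF_MAX
    }
    *total = count * size;  // known to be in range
    return 1;
}
```

<p>A zero <code class="language-plaintext highlighter-rouge">count</code> passes the check and yields a zero total, with no special
case needed.</p>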

<p>With <a href="/blog/2023/09/27/">arena allocation</a> there are usually two concerns. First, will
it overflow when computing the total size, i.e. <code class="language-plaintext highlighter-rouge">count * size</code>? Second, is
the total size within the arena capacity? Naively that’s two checks, but
we can kill two birds with one stone: Check both at once by using the
current arena capacity as the maximum value when considering overflow.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (count &gt; (end - beg)/size) error();
total = count * size;
</code></pre></div></div>

<p>One condition pulling double duty.</p>

<h3 id="subtraction">Subtraction</h3>

<p>With signed sizes, the negative range is a long “runway” allowing a single
unchecked subtraction before overflow might occur. In essence, we were
exploiting this in order to check addition. The most common mistake with
unsigned subtraction is not accounting for overflow when going below zero.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// note: signed "i" only
for (i = end - stride; i &gt;= beg; i -= stride) ...
</code></pre></div></div>

<p>This loop will go awry if <code class="language-plaintext highlighter-rouge">i</code> is unsigned and <code class="language-plaintext highlighter-rouge">beg &lt;= stride</code>.</p>

<p>In special cases we can get away with a second subtraction without an
overflow check if we know some properties of our operands. For example, my
arena allocators look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>padding = -beg &amp; (align - 1);
if (count &gt;= (end - beg - padding)/size) error();
</code></pre></div></div>

<p>That’s two subtractions in a row. However, <code class="language-plaintext highlighter-rouge">end - beg</code> describes the size
of a realized object, and <code class="language-plaintext highlighter-rouge">align</code> is a small constant (e.g. 2^(0–6)). It
could only overflow if the entirety of memory was occupied by the arena.</p>

<p>Bonus, advanced note: This check is actually pulling <em>triple duty</em>. Notice
that I used <code class="language-plaintext highlighter-rouge">&gt;=</code> instead of <code class="language-plaintext highlighter-rouge">&gt;</code>. The arena can’t fill exactly to the brim,
but it handles the extreme edge case where <code class="language-plaintext highlighter-rouge">count</code> is zero, the arena is
nearly full, but the bump pointer is unaligned. The result of subtracting
<code class="language-plaintext highlighter-rouge">padding</code> is negative, which rounds to zero by integer division, and would
pass a <code class="language-plaintext highlighter-rouge">&gt;</code> check. That wouldn’t be a problem except that aligning the bump
pointer would break the invariant <code class="language-plaintext highlighter-rouge">beg &lt;= end</code>.</p>
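
<p>To see the pieces in context, here is a minimal sketch of an allocator
built around that check. The <code class="language-plaintext highlighter-rouge">Arena</code> layout, the <code class="language-plaintext highlighter-rouge">arena_alloc</code> name, and
returning <code class="language-plaintext highlighter-rouge">NULL</code> in place of <code class="language-plaintext highlighter-rouge">error()</code> are assumptions for illustration, not
a definitive implementation:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical arena: [beg, end) is the unused part.
typedef struct {
    char *beg;
    char *end;
} Arena;

// Allocates count objects of size bytes each at the given alignment.
// Returns NULL when the request overflows or does not fit.
void *arena_alloc(Arena *a, ptrdiff_t count, ptrdiff_t size, ptrdiff_t align)
{
    assert(count >= 0);
    assert(size > 0);
    assert(align > 0 && (align & (align - 1)) == 0);  // power of two
    // Bytes needed to round beg up to the next multiple of align.
    ptrdiff_t padding = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    // One condition: rejects overflow, lack of space, and the
    // count == 0 case where padding alone would pass the end.
    if (count >= (a->end - a->beg - padding)/size) {
        return NULL;
    }
    void *p = a->beg + padding;
    a->beg += padding + count*size;  // known to stay before end
    return p;
}
```

<p>With <code class="language-plaintext highlighter-rouge">&gt;=</code>, a request that would fill the arena exactly is rejected, which
is the price of folding the unaligned zero-count case into the same
condition.</p>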

<h3 id="try-it-for-yourself">Try it for yourself</h3>

<p>Next time you’re reviewing code that computes sizes or subscripts, bring
the list up and see how well it follows the guidelines. If it misses one,
try to contrive an input that causes an overflow. If it follows guidelines
and you can still contrive such an input, then perhaps the list could use
another item!</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Solving "Two Sum" in C with a tiny hash table</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2023/06/26/"/>
    <id>urn:uuid:5d15318f-6915-4f72-8690-74a84d43d2f7</id>
    <updated>2023-06-26T19:38:18Z</updated>
    <category term="c"/><category term="go"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>I came across a question: How does one efficiently solve <a href="https://leetcode.com/problems/two-sum/">Two Sum</a> in C?
There’s a naive quadratic time solution, but also an amortized linear time
solution using a hash table. Without a built-in or standard library hash
table, the latter sounds onerous. However, a <a href="/blog/2022/08/08/">mask-step-index table</a>,
a hash table construction suitable for many problems, requires only a few
lines of code. This approach is useful even when a standard hash table is
available, because by <a href="https://vimeo.com/644068002">exploiting the known problem constraints</a>, it
beats typical generic hash table performance by an order of magnitude
(<a href="https://gist.github.com/skeeto/7119cf683662deae717c0d4e79ebf605">demo</a>).</p>

<p>The Two Sum exercise, restated:</p>

<blockquote>
  <p>Given an integer array and target, return the distinct indices of two
elements that sum to the target.</p>
</blockquote>

<p>In particular, the solution doesn’t find elements, but their indices. The
exercise also constrains input ranges — important but easy to overlook:</p>

<ul>
  <li>2 &lt;= <code class="language-plaintext highlighter-rouge">count</code> &lt;= 10<sup>4</sup></li>
  <li>-10<sup>9</sup> &lt;= <code class="language-plaintext highlighter-rouge">nums[i]</code> &lt;= 10<sup>9</sup></li>
  <li>-10<sup>9</sup> &lt;= <code class="language-plaintext highlighter-rouge">target</code> &lt;= 10<sup>9</sup></li>
</ul>

<p>Notably, indices fit in a 16-bit integer with lots of room to spare. In
fact, they fit in a 14-bit address space (16,384) with still plenty of
overhead. Elements fit in a signed 32-bit integer, and we can add and
subtract elements without overflow, if just barely. The last constraint
isn’t redundant, but it’s not readily exploitable either.</p>

<p>The naive solution is to linearly search the array for the complement.
With nested loops, it’s obviously quadratic time. At 10k elements, we
expect an abysmal 25M comparisons on average.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int16_t</span> <span class="n">count</span> <span class="o">=</span> <span class="p">...;</span>
<span class="kt">int32_t</span> <span class="o">*</span><span class="n">nums</span> <span class="o">=</span> <span class="p">...;</span>

<span class="k">for</span> <span class="p">(</span><span class="kt">int16_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int16_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">nums</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">+</span><span class="n">nums</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="n">target</span><span class="p">)</span> <span class="p">{</span>
            <span class="c1">// found</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">nums</code> array is “keyed” by index. It would be better to also have the
inverse mapping: key on elements to obtain the <code class="language-plaintext highlighter-rouge">nums</code> index. Then for each
element we could compute the complement and find its index, if any, using
this second mapping.</p>

<p>The input range is finite, so an inverse map is simple. Allocate an array,
one element per integer in range, and store the index there. However, the
input range is 2 billion, and even with 16-bit indices that’s a 4GB array.
Feasible on 64-bit hosts, but wasteful. The exercise is certainly designed
to make it so. This array would be very sparse, with at most 0.0005% of
its elements populated. That’s a hint: Associative arrays are
far more appropriate for representing such sparse mappings. That is, a
hash table.</p>

<p>Using Go’s built-in hash table:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">TwoSumWithMap</span><span class="p">(</span><span class="n">nums</span> <span class="p">[]</span><span class="kt">int32</span><span class="p">,</span> <span class="n">target</span> <span class="kt">int32</span><span class="p">)</span> <span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">bool</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">seen</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="k">map</span><span class="p">[</span><span class="kt">int32</span><span class="p">]</span><span class="kt">int16</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">num</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">nums</span> <span class="p">{</span>
        <span class="n">complement</span> <span class="o">:=</span> <span class="n">target</span> <span class="o">-</span> <span class="n">num</span>
        <span class="k">if</span> <span class="n">j</span><span class="p">,</span> <span class="n">ok</span> <span class="o">:=</span> <span class="n">seen</span><span class="p">[</span><span class="n">complement</span><span class="p">];</span> <span class="n">ok</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">int</span><span class="p">(</span><span class="n">j</span><span class="p">),</span> <span class="n">i</span><span class="p">,</span> <span class="no">true</span>
        <span class="p">}</span>
        <span class="n">seen</span><span class="p">[</span><span class="n">num</span><span class="p">]</span> <span class="o">=</span> <span class="kt">int16</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="no">false</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In essence, the hash table folds the sparse 2 billion element array onto a
smaller array, with collision resolution when elements inevitably land in
the same slot. For this exercise, that small array could be as small as
10,000 elements because that’s the most we’d ever need to track. For
folding the large key space onto the smaller, we could use modulo. For
collision resolution, we could keep walking the table.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int16_t</span> <span class="n">seen</span><span class="p">[</span><span class="mi">10000</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>

<span class="c1">// Find or insert nums[index].</span>
<span class="kt">int16_t</span> <span class="nf">lookup</span><span class="p">(</span><span class="kt">int32_t</span> <span class="o">*</span><span class="n">nums</span><span class="p">,</span> <span class="kt">int16_t</span> <span class="n">index</span><span class="p">)</span>
<span class="p">{</span>
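    <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">nums</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="o">%</span><span class="mi">10000</span> <span class="o">+</span> <span class="mi">10000</span><span class="p">)</span> <span class="o">%</span> <span class="mi">10000</span><span class="p">;</span>  <span class="c1">// non-negative even for negative elements</span>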
    <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
        <span class="kt">int16_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// unbias</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>  <span class="c1">// empty slot</span>
            <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// insert biased index</span>
            <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
        <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">nums</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="n">nums</span><span class="p">[</span><span class="n">index</span><span class="p">])</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">j</span><span class="p">;</span>  <span class="c1">// match found</span>
        <span class="p">}</span>
        <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">10000</span><span class="p">;</span>  <span class="c1">// keep looking</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Take note of a few details:</p>

<ol>
  <li>
    <p>An empty slot is zero, and an empty table is a zero-initialized array.
Since zero is a valid value, and all values are non-negative, it biases
values by 1 in the table.</p>
  </li>
  <li>
    <p>The <code class="language-plaintext highlighter-rouge">nums</code> array is part of the table structure, necessary for lookups.
<strong>The two mappings — element-by-index and index-by-element — share
structure.</strong></p>
  </li>
  <li>
    <p>It uses <em>open addressing</em> with <em>linear probing</em>, and so walks the table
until it either finds the element or hits an empty slot.</p>
  </li>
  <li>
    <p>The “hash” function is modulo. If inputs are not random, they’ll tend
to bunch up in the table. Combined with linear probing, that makes for lots
of collisions. For the worst case, imagine sequentially ordered inputs.</p>
  </li>
  <li>
    <p>Sometimes the table will almost completely fill, and lookups will be no
better than the linear scans of the naive solution.</p>
  </li>
  <li>
    <p>Most subtle of all: This hash table is not enough for the exercise. The
keyed-on element may not even be in <code class="language-plaintext highlighter-rouge">nums</code>, and when lookup fails, that
element is not inserted in the table. Instead, a different element is
inserted. The conventional solution has at least two hash table
lookups. <strong>In the Go code, it’s <code class="language-plaintext highlighter-rouge">seen[complement]</code> for lookups and
<code class="language-plaintext highlighter-rouge">seen[num]</code> for inserts.</strong></p>
  </li>
</ol>

<p>To solve (4) we’ll use a hash function to more uniformly distribute
elements in the table. We’ll also probe the table in a random-ish order
that depends on the key. In practice there will be little bunching even
for non-random inputs.</p>

<p>To solve (5) we’ll use a larger table: 2<sup>14</sup> or 16,384 elements.
This has breathing room, and with a power of two we can use a fast mask
instead of a slow division (though in practice, compilers usually
implement division by a constant denominator with modular multiplication).</p>

<p>To solve (6) we’ll key complements together under the same key. It looks
for the complement, but on failure it inserts the current element in the
empty slot. In other words, <strong>this solution will only need a single hash
table lookup per element!</strong></p>

<p>Laying down some groundwork:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">int16_t</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">;</span>
    <span class="kt">_Bool</span> <span class="n">ok</span><span class="p">;</span>
<span class="p">}</span> <span class="n">TwoSum</span><span class="p">;</span>

<span class="n">TwoSum</span> <span class="nf">twosum</span><span class="p">(</span><span class="kt">int32_t</span> <span class="o">*</span><span class="n">nums</span><span class="p">,</span> <span class="kt">int16_t</span> <span class="n">count</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">target</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">TwoSum</span> <span class="n">r</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="kt">int16_t</span> <span class="n">seen</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">14</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int16_t</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">n</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">seen</code> array is a 32KiB hash table large enough for all inputs, small
enough that it can be a local variable. In the loop:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="kt">int32_t</span> <span class="n">complement</span> <span class="o">=</span> <span class="n">target</span> <span class="o">-</span> <span class="n">nums</span><span class="p">[</span><span class="n">n</span><span class="p">];</span>
        <span class="kt">int32_t</span> <span class="n">key</span> <span class="o">=</span> <span class="n">complement</span><span class="o">&gt;</span><span class="n">nums</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">?</span> <span class="n">complement</span> <span class="o">:</span> <span class="n">nums</span><span class="p">[</span><span class="n">n</span><span class="p">];</span>
        <span class="kt">uint32_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">key</span> <span class="o">*</span> <span class="mi">489183053u</span><span class="p">;</span>
        <span class="kt">unsigned</span> <span class="n">mask</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">seen</span><span class="p">)</span><span class="o">/</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">seen</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
        <span class="kt">unsigned</span> <span class="n">step</span> <span class="o">=</span> <span class="n">hash</span><span class="o">&gt;&gt;</span><span class="mi">13</span> <span class="o">|</span> <span class="mi">1</span><span class="p">;</span>
</code></pre></div></div>

<p>Compute the complement, then apply a “max” operation to derive a key. Any
commutative operation works, though obviously addition would be a poor
choice. XOR is similar enough to cause many collisions. Multiplication
works well, and is probably better if the ternary produces a branch.</p>
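
<p>To make the symmetry concrete, here is a small sketch of the key
derivation on its own. The <code class="language-plaintext highlighter-rouge">pair_key</code> name is hypothetical, and it relies
on the exercise’s input ranges so that the subtraction cannot overflow:</p>

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: the shared key for an element and its complement.
// Assumes num and target are within the exercise's range of -10^9 to
// 10^9, so target - num fits in int32_t.
int32_t pair_key(int32_t num, int32_t target)
{
    int32_t complement = target - num;
    return complement > num ? complement : num;  // "max" is commutative
}
```

<p>For a target of 10, both 3 and 7 produce the key 7, so the two halves of
a solution hash identically and follow the same probe sequence.</p>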

<p>The hash function is multiplication with <a href="/blog/2019/11/19/">a randomly-chosen prime</a>.
As we’ll see in a moment, <code class="language-plaintext highlighter-rouge">step</code> will also add-shift the hash before use.
The initial index will be the bottom 14 bits of this hash. For <code class="language-plaintext highlighter-rouge">step</code>,
recall from the MSI article that it must be odd so that every slot is
eventually probed. I shift out 13 bits and then override the 14th bit, so
<code class="language-plaintext highlighter-rouge">step</code> effectively skips over the 14 bits used for the initial table
index.</p>
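
<p>One way to convince yourself of the odd-step property is to count the
slots a probe sequence touches. This sketch assumes the same 2<sup>14</sup>-slot
table and step derivation as above. The <code class="language-plaintext highlighter-rouge">probe_coverage</code> helper is
hypothetical and not part of the solution:</p>

```c
#include <assert.h>
#include <stdint.h>

#define EXP 14  // table has 1<<EXP slots

// Hypothetical helper: follows the probe sequence for hash through
// exactly 1<<EXP probes and returns how many distinct slots it visited.
int probe_coverage(uint32_t hash)
{
    static unsigned char visited[1 << EXP];
    unsigned mask = (1u << EXP) - 1;
    unsigned step = hash >> 13 | 1;  // forced odd
    for (unsigned k = 0; k <= mask; k++) {
        visited[k] = 0;
    }
    int distinct = 0;
    unsigned i = hash;
    for (unsigned k = 0; k <= mask; k++) {
        i = (i + step) & mask;  // step first, then mask to the table
        distinct += !visited[i];
        visited[i] = 1;
    }
    return distinct;
}
```

<p>An odd step is coprime with the power-of-two table size, so the sequence
only repeats after visiting every slot.</p>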

<p>I used <code class="language-plaintext highlighter-rouge">unsigned</code> because I don’t really care about the width of the hash
table index, but more importantly, I want defined overflow from all the
bit twiddling, even in the face of implicit promotion. As a bonus, it can
help in reasoning about indirection: <code class="language-plaintext highlighter-rouge">seen</code> indices are <code class="language-plaintext highlighter-rouge">unsigned</code>, <code class="language-plaintext highlighter-rouge">nums</code>
indices are <code class="language-plaintext highlighter-rouge">int16_t</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="k">for</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="n">i</span> <span class="o">=</span> <span class="n">hash</span><span class="p">;;)</span> <span class="p">{</span>
            <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">step</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">;</span>
            <span class="kt">int16_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// unbias</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// bias and insert</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">nums</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="n">complement</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">r</span><span class="p">.</span><span class="n">i</span> <span class="o">=</span> <span class="n">j</span><span class="p">;</span>
                <span class="n">r</span><span class="p">.</span><span class="n">j</span> <span class="o">=</span> <span class="n">n</span><span class="p">;</span>
                <span class="n">r</span><span class="p">.</span><span class="n">ok</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
                <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
</code></pre></div></div>

<p>The step is added before using the index the first time, helping to
scatter the start point and reduce collisions. If it’s an empty slot,
insert the <em>current</em> element, not the complement — which wouldn’t be
possible anyway. Unlike conventional solutions, this doesn’t require
another hash and lookup. If it finds the complement, problem solved,
otherwise keep going.</p>

<p>Putting it all together, it’s only slightly longer than solutions using a
generic hash table:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">TwoSum</span> <span class="nf">twosum</span><span class="p">(</span><span class="kt">int32_t</span> <span class="o">*</span><span class="n">nums</span><span class="p">,</span> <span class="kt">int16_t</span> <span class="n">count</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">target</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">TwoSum</span> <span class="n">r</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="kt">int16_t</span> <span class="n">seen</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">14</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int16_t</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">n</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int32_t</span> <span class="n">complement</span> <span class="o">=</span> <span class="n">target</span> <span class="o">-</span> <span class="n">nums</span><span class="p">[</span><span class="n">n</span><span class="p">];</span>
        <span class="kt">int32_t</span> <span class="n">key</span> <span class="o">=</span> <span class="n">complement</span><span class="o">&gt;</span><span class="n">nums</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">?</span> <span class="n">complement</span> <span class="o">:</span> <span class="n">nums</span><span class="p">[</span><span class="n">n</span><span class="p">];</span>
        <span class="kt">uint32_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">key</span> <span class="o">*</span> <span class="mi">489183053u</span><span class="p">;</span>
        <span class="kt">unsigned</span> <span class="n">mask</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">seen</span><span class="p">)</span><span class="o">/</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">seen</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
        <span class="kt">unsigned</span> <span class="n">step</span> <span class="o">=</span> <span class="n">hash</span><span class="o">&gt;&gt;</span><span class="mi">13</span> <span class="o">|</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="n">i</span> <span class="o">=</span> <span class="n">hash</span><span class="p">;;)</span> <span class="p">{</span>
            <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">step</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">;</span>
            <span class="kt">int16_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// unbias</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// bias and insert</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">nums</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="n">complement</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">r</span><span class="p">.</span><span class="n">i</span> <span class="o">=</span> <span class="n">j</span><span class="p">;</span>
                <span class="n">r</span><span class="p">.</span><span class="n">j</span> <span class="o">=</span> <span class="n">n</span><span class="p">;</span>
                <span class="n">r</span><span class="p">.</span><span class="n">ok</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
                <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Applying this technique to Go:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">TwoSumWithBespoke</span><span class="p">(</span><span class="n">nums</span> <span class="p">[]</span><span class="kt">int32</span><span class="p">,</span> <span class="n">target</span> <span class="kt">int32</span><span class="p">)</span> <span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">bool</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">var</span> <span class="n">seen</span> <span class="p">[</span><span class="m">1</span> <span class="o">&lt;&lt;</span> <span class="m">14</span><span class="p">]</span><span class="kt">int16</span>
    <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">num</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">nums</span> <span class="p">{</span>
        <span class="n">complement</span> <span class="o">:=</span> <span class="n">target</span> <span class="o">-</span> <span class="n">num</span>
        <span class="n">hash</span> <span class="o">:=</span> <span class="kt">int</span><span class="p">(</span><span class="n">num</span> <span class="o">*</span> <span class="n">complement</span> <span class="o">*</span> <span class="m">489183053</span><span class="p">)</span>
        <span class="n">mask</span> <span class="o">:=</span> <span class="nb">len</span><span class="p">(</span><span class="n">seen</span><span class="p">)</span> <span class="o">-</span> <span class="m">1</span>
        <span class="n">step</span> <span class="o">:=</span> <span class="n">hash</span><span class="o">&gt;&gt;</span><span class="m">13</span> <span class="o">|</span> <span class="m">1</span>
        <span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="n">hash</span><span class="p">;</span> <span class="p">;</span> <span class="p">{</span>
            <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">step</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span>
            <span class="n">j</span> <span class="o">:=</span> <span class="kt">int</span><span class="p">(</span><span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="m">1</span><span class="p">)</span> <span class="c">// unbias</span>
            <span class="k">if</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="m">0</span> <span class="p">{</span>
                <span class="n">seen</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="kt">int16</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">+</span> <span class="m">1</span> <span class="c">// bias</span>
                <span class="k">break</span>
            <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="n">nums</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="n">complement</span> <span class="p">{</span>
                <span class="k">return</span> <span class="n">j</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="no">true</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="no">false</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With Go 1.20 this is an order of magnitude faster than <code class="language-plaintext highlighter-rouge">map[int32]int16</code>,
which isn’t surprising. I used multiplication as the key operator because,
in my first take, Go produced a branch for the “max” operation — at a 25%
performance penalty on random inputs.</p>
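
<p>Multiplication can stand in for “max” because it is commutative: an
element and its complement produce the same key in either order. Here is a
minimal sketch of that property. The name <code class="language-plaintext highlighter-rouge">pairKey</code> is hypothetical and
not part of the code above.</p>

```go
package main

import "fmt"

// pairKey is a hypothetical helper. It computes the multiplicative key:
// the product of an element and its complement. Go's int32 overflow
// wraps, and wraps identically in both orders, so the key stays symmetric.
func pairKey(num, target int32) int32 {
    complement := target - num
    return num * complement
}

func main() {
    const target = 10
    // 3 and 7 are complements for target 10, so they share one key and
    // therefore probe the same sequence of slots.
    fmt.Println(pairKey(3, target) == pairKey(7, target))
}
```

<p>Unlike “max”, the product needs no comparison, so there is nothing for
the compiler to turn into a branch.</p>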

<p>A full-featured, generic hash table may be overkill for your problem, and
a bit of hashed indexing with collision resolution over a small array
might be sufficient. The problem constraints might open up such shortcuts.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Assertions should be more debugger-oriented</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/06/26/"/>
    <id>urn:uuid:22ae914c-971b-4cee-ba48-a189db1b6df6</id>
    <updated>2022-06-26T18:51:04Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="python"/><category term="java"/>
    <content type="html">
      <![CDATA[<p>Prompted by <a href="https://www.youtube.com/watch?v=r9eQth4Q5jg">a 20-minute video</a>, over the past month I’ve improved my
debugger skills. I’d shamefully acquired a bad habit: avoiding a debugger
until exhausting dumber, insufficient methods. My <em>first</em> choice should be
a debugger, but I had allowed a bit of friction to dissuade me. With some
thoughtful practice and deliberate effort clearing the path, my bad habit
is finally broken — at least when a good debugger is available. It feels
like I’ve leveled up and, <a href="/blog/2017/04/01/">like touch typing</a>, this was a skill I’d
neglected far too long. One friction point was the less-than-optimal
<code class="language-plaintext highlighter-rouge">assert</code> feature in basically every programming language implementation.
It ought to work better with debuggers.</p>

<p>An assertion verifies a program invariant, and so if one fails then
there’s undoubtedly a defect in the program. In other words, assertions
make programs more sensitive to defects, allowing problems to be caught
more quickly and accurately. Counter-intuitively, crashing early and often
makes for more robust and reliable software in the long run. For exactly
this reason, assertions go especially well with <a href="/blog/2019/01/25/">fuzzing</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">);</span>   <span class="c1">// bounds check</span>
<span class="n">assert</span><span class="p">((</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">size</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>  <span class="c1">// suspicious size_t</span>
<span class="n">assert</span><span class="p">(</span><span class="n">cur</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">!=</span> <span class="n">cur</span><span class="p">);</span>    <span class="c1">// circular reference?</span>
</code></pre></div></div>

<p>They’re sometimes abused for error handling, which is a reason they’ve
also been (wrongfully) discouraged at times. For example, failing to open
a file is an error, not a defect, so an assertion is inappropriate.</p>
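
<p>To make the distinction concrete, here is a rough sketch in Go. The
<code class="language-plaintext highlighter-rouge">assert</code>, <code class="language-plaintext highlighter-rouge">readConfig</code>, and <code class="language-plaintext highlighter-rouge">at</code> names are hypothetical, as is the
file path, which I assume does not exist. A missing file comes back as an
ordinary error value, while a bad index is a defect in the caller.</p>

```go
package main

import (
    "errors"
    "fmt"
    "io/fs"
    "os"
)

// assert is a hypothetical helper: Go has no built-in assertion.
func assert(ok bool, msg string) {
    if !ok {
        panic("assertion failed: " + msg)
    }
}

// readConfig treats a missing file as an error to report, not a defect.
func readConfig(path string) ([]byte, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, fmt.Errorf("read config: %w", err)
    }
    return data, nil
}

// at returns element i. An out-of-range i means the caller has a bug.
func at(nums []int, i int) int {
    assert(i >= 0 && i < len(nums), "index in range")
    return nums[i]
}

func main() {
    // Assumed-nonexistent path, for illustration only.
    _, err := readConfig("/nonexistent/example.conf")
    fmt.Println(errors.Is(err, fs.ErrNotExist))
    fmt.Println(at([]int{10, 20, 30}, 1))
}
```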

<p>Normal programs have implicit assertions all over, even if we don’t
usually think of them as assertions. In some cases they’re checked by the
hardware. Examples of implicit assertion failures:</p>

<ul>
  <li>Out-of-bounds indexing</li>
  <li>Dereferencing null/nil/None</li>
  <li>Dividing by zero</li>
  <li>Certain kinds of integer overflow (e.g. <code class="language-plaintext highlighter-rouge">-ftrapv</code>)</li>
</ul>
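
<p>In Go, the runtime checks the first three of these itself. A minimal
sketch, where <code class="language-plaintext highlighter-rouge">trapped</code> is a hypothetical helper that recovers only so
that one program can show all three. Recovering like this is not a
recommendation. Integer overflow is the exception: in Go it wraps silently
rather than trapping.</p>

```go
package main

import "fmt"

// trapped runs f and reports either its result or the runtime panic it
// triggered. Hypothetical helper, for demonstration only.
func trapped(f func() int) (result string) {
    defer func() {
        if r := recover(); r != nil {
            result = fmt.Sprint("trapped: ", r)
        }
    }()
    return fmt.Sprint("ok: ", f())
}

func main() {
    nums := []int{1, 2, 3}
    var p *int
    zero := 0
    i := 5

    fmt.Println(trapped(func() int { return nums[i] }))  // out-of-bounds index
    fmt.Println(trapped(func() int { return *p }))       // nil dereference
    fmt.Println(trapped(func() int { return 1 / zero })) // division by zero
    fmt.Println(trapped(func() int { return nums[0] }))  // no defect
}
```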

<p>Programs are generally not intended to recover from these situations
because, had they been anticipated, the invalid operation wouldn’t have
been attempted in the first place. The program simply crashes because
there’s no better alternative. Sanitizers, including Address Sanitizer
(ASan) and Undefined Behavior Sanitizer (UBSan), are in essence
additional, implicit assertions, checking invariants that aren’t normally
checked.</p>

<p>Ideally a failing assertion should have these two effects:</p>

<ul>
  <li>
    <p>Execution should <em>immediately</em> stop. The program is in an unknown state,
so it’s neither safe to “clean up” nor attempt to recover. Additional
execution will only make debugging more difficult, and may obscure the
defect.</p>
  </li>
  <li>
    <p>When run under a debugger — or visited as a core dump — it should break
exactly at the failed assertion, ready for inspection. I should not need
to dig around the call stack to figure out where the failure occurred. I
certainly shouldn’t need to manually set a breakpoint and restart the
program hoping to fail the assertion a second time. The whole reason for
using a debugger is to save time, so if it’s wasting my time then it’s
failing at its primary job.</p>
  </li>
</ul>

<p>I examined standard <code class="language-plaintext highlighter-rouge">assert</code> features across various language
implementations, and none strictly meet the criteria. Fortunately, in some
cases, it’s trivial to build a better assertion, and you can substitute
your own definition. First, let’s discuss the way assertions disappoint.</p>

<h3 id="a-test-assertion">A test assertion</h3>

<p>My test for C and C++ is minimal but establishes some state and gives me a
variable to inspect:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;assert.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="mi">5</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then I compile and debug in the most straightforward way:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -g -o test test.c
$ gdb test
(gdb) r
(gdb) bt
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">r</code> in GDB stands for <code class="language-plaintext highlighter-rouge">run</code>, which immediately breaks because of the
<code class="language-plaintext highlighter-rouge">assert</code>. The <code class="language-plaintext highlighter-rouge">bt</code> prints a backtrace. On a typical Linux distribution
that shows this backtrace:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __GI_raise
#1  __GI_abort
#2  __assert_fail_base
#3  __GI___assert_fail
#4  main
</code></pre></div></div>

<p>Well, actually, that’s after I cleaned it up by hand. The raw output is
much messier:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linu
x/raise.c:50
#1  0x00007ffff7df4537 in __GI_abort () at abort.c:79
#2  0x00007ffff7df440f in __assert_fail_base (fmt=0x7ffff7f5d
128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x
55555555600b "i &lt; 5", file=0x555555556004 "test.c", line=6, f
unction=&lt;optimized out&gt;) at assert.c:92
#3  0x00007ffff7e03662 in __GI___assert_fail (assertion=0x555
55555600b "i &lt; 5", file=0x555555556004 "test.c", line=6, func
tion=0x555555556011 &lt;__PRETTY_FUNCTION__.0&gt; "main") at assert
.c:101
#4  0x0000555555555178 in main () at test.c:6
</code></pre></div></div>

<p>That’s a lot to take in at a glance, and about 95% of it is noise that
will never contain useful information. Most notably, GDB didn’t stop at
the failing assertion. Instead there are <em>four stack frames</em> of libc junk I
have to navigate before I can even begin debugging.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) up
(gdb) up
(gdb) up
(gdb) up
</code></pre></div></div>

<p>I must wade through this for every assertion failure. This is some of the
friction that made me avoid the debugger in the first place. glibc loves
indirection, so maybe the other libc implementations do better? How about
musl?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  setjmp
#1  raise
#2  ??
#3  ??
#4  ??
#5  ??
#6  ??
#7  ??
#8  ??
#9  ??
#10 ??
#11 ??
</code></pre></div></div>

<p>Oops, without musl debugging symbols I can’t debug assertions at all
because GDB can’t read the stack, so it’s lost. If you’re on Alpine you
can install <code class="language-plaintext highlighter-rouge">musl-dbg</code>, but otherwise you’ll probably need to build your
own from source. With debugging symbols, musl is no better than glibc:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __restore_sigs
#1  raise
#2  abort
#3  __assert_fail
#4  main
</code></pre></div></div>

<p>Same with FreeBSD:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  thr_kill
#1  raise
#2  abort
#3  __assert
#4  main
</code></pre></div></div>

<p>OpenBSD has one fewer frame:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  thrkill
#1  _libc_abort
#2  _libc___assert2
#3  main
</code></pre></div></div>

<p>How about on Windows with Mingw-w64?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Inferior 1 (process 7864) exited with code 03]
</code></pre></div></div>

<p>Oops, on Windows GDB doesn’t break at all on <code class="language-plaintext highlighter-rouge">assert</code>. You must first set
a breakpoint on <code class="language-plaintext highlighter-rouge">abort</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) b abort
</code></pre></div></div>

<p>Besides that, it’s the most straightforward so far:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 msvcrt!abort
#1 msvcrt!_assert
#2 main
</code></pre></div></div>

<p>With MSVC (default CRT) I get something slightly different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 abort
#1 common_assert_to_stderr
#2 _wassert
#3 main
#4 __scrt_common_main_seh
</code></pre></div></div>

<p>RemedyBG leaves me at the <code class="language-plaintext highlighter-rouge">abort</code> like GDB does elsewhere. Visual Studio
recognizes that I don’t care about its stack frames and instead puts the
focus on the assertion, ready for debugging. The other stack frames are
there, but basically invisible. It’s the only case that practically meets
all my criteria!</p>

<p>I can’t entirely blame these implementations. The C standard requires that
<code class="language-plaintext highlighter-rouge">assert</code> print a diagnostic and call <code class="language-plaintext highlighter-rouge">abort</code>, and that <code class="language-plaintext highlighter-rouge">abort</code> raises
<code class="language-plaintext highlighter-rouge">SIGABRT</code>. There’s not much implementations can do, and it’s up to the
debugger to be smarter about it.</p>

<h3 id="sanitizers">Sanitizers</h3>

<p>ASan doesn’t break GDB on assertion failures, which is yet another source
of friction. You can work around this with an environment variable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export ASAN_OPTIONS=abort_on_error=1:print_legend=0
</code></pre></div></div>

<p>This works, but it’s the worst case of all: I get 7 junk stack frames on
top of the failed assertion. It’s also very noisy when it traps, so the
<code class="language-plaintext highlighter-rouge">print_legend=0</code> helps to cut it down a bit. I want this variable so often
that I set it in my shell’s <code class="language-plaintext highlighter-rouge">.profile</code> so that it’s always set.</p>

<p>With UBSan you can use <code class="language-plaintext highlighter-rouge">-fsanitize-undefined-trap-on-error</code>, which behaves
like the improved assertion. It traps directly on the defect with no junk
frames, though it prints no diagnostic. As a bonus, it also means you
don’t need to link <code class="language-plaintext highlighter-rouge">libubsan</code>. Thanks to the bonus, it fully supplants
<code class="language-plaintext highlighter-rouge">-ftrapv</code> for me on all platforms.</p>

<p><strong>Update November 2022</strong>: This “stop” hook eliminates ASan friction by
popping runtime frames — functions with the reserved <code class="language-plaintext highlighter-rouge">__</code> prefix — from
the call stack so that they’re not in the way when GDB takes control. It
requires Python support, which is the purpose of the feature-sniff outer
condition.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if !$_isvoid($_any_caller_matches)
    define hook-stop
        while $_thread &amp;&amp; $_any_caller_matches("^__")
            up-silently
        end
    end
end
</code></pre></div></div>

<p>This is now part of my <code class="language-plaintext highlighter-rouge">.gdbinit</code>.</p>

<h3 id="a-better-assertion">A better assertion</h3>

<p>At least when under a debugger, here’s a much better assertion macro for
GCC and Clang:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define assert(c) if (!(c)) __builtin_trap()
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">__builtin_trap</code> inserts a trap instruction — a built-in breakpoint. By
not calling a function to raise a signal, there are no junk stack frames
and no need to breakpoint on <code class="language-plaintext highlighter-rouge">abort</code>. It stops exactly where it should as
quickly as possible. This definition works reliably with GCC across all
platforms, too. On MSVC the equivalent is <code class="language-plaintext highlighter-rouge">__debugbreak</code>. If you’re really
in a pinch then do whatever it takes to trigger a fault, like
dereferencing a null pointer. A more complete definition might be:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef DEBUG
#  if __GNUC__
#    define assert(c) if (!(c)) __builtin_trap()
#  elif _MSC_VER
#    define assert(c) if (!(c)) __debugbreak()
#  else
#    define assert(c) if (!(c)) *(volatile int *)0 = 0
#  endif
#else
#  define assert(c)
#endif
</span></code></pre></div></div>

<p>None of these print a diagnostic, but that’s unnecessary when a debugger
is involved.</p>

<h3 id="other-languages">Other languages</h3>

<p>Unfortunately the situation <a href="https://github.com/rust-lang/rust/issues/21102">mostly gets worse</a> with other language
implementations, and it’s generally not possible to build a better
assertion. Assertions typically have exception-like semantics, if not
literally just another exception, and so they are far less reliable. If a
failed assertion raises an exception, then the program won’t stop until
it’s unwound the stack — running destructors and such along the way — all
the way to the top level looking for a handler. It only knows there’s a
problem when nobody was there to catch it.</p>

<p><a href="https://go.dev/doc/faq#assertions">Go officially doesn’t have assertions</a>, though panics are a kind of
assertion. However, panics have exception-like semantics, and so suffer
the problems of exceptions. A Go version of my test:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">defer</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"DEFER"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="m">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="m">5</span> <span class="p">{</span>
            <span class="nb">panic</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If I run this under Go’s premier debugger, <a href="https://github.com/go-delve/delve">Delve</a>, the unrecovered
panic causes it to break. So far so good. However, I get two junk frames:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 runtime.fatalpanic
#1 runtime.gopanic
#2 main.main
#3 runtime.main
#4 runtime.goexit
</code></pre></div></div>

<p>It only knows to stop because the Go runtime called <code class="language-plaintext highlighter-rouge">fatalpanic</code>, but the
backtrace is a fiction: The program continued to run after the panic,
enough to run all the registered defers (including printing “DEFER”),
unwinding the stack to the top level, and only then did it <code class="language-plaintext highlighter-rouge">fatalpanic</code>.
Fortunately it’s still possible to inspect all those stack frames even if
some variables may have changed while unwinding, but it’s more like
inspecting a core dump than a paused process.</p>

<p>The situation in Python is similar: <code class="language-plaintext highlighter-rouge">assert</code> raises AssertionError — a
plain old exception — and <code class="language-plaintext highlighter-rouge">pdb</code> won’t break until the stack has unwound,
exiting context managers and such. Only once the exception reaches the top
level does it enter “post mortem debugging,” like a core dump. At least
there are no junk stack frames on top. If you’re using asyncio then your
program may continue running for quite a while before the right tasks are
scheduled and the exception finally propagates to the top level, if ever.</p>

<p>The worst offender of all is Java. First, <code class="language-plaintext highlighter-rouge">jdb</code> never breaks for unhandled
exceptions. It’s up to you to set a breakpoint before the exception is
thrown. But it gets worse: assertions are disabled under <code class="language-plaintext highlighter-rouge">jdb</code>. The Java
<code class="language-plaintext highlighter-rouge">assert</code> statement is worse than useless.</p>

<h3 id="addendum-dont-exit-the-debugger">Addendum: Don’t exit the debugger</h3>

<p>The largest friction-reducing change I made is never exiting the debugger.
Previously I would enter GDB, run my program, exit, edit/rebuild, repeat.
However, there’s no reason to exit GDB! It automatically and reliably
reloads symbols and updates breakpoints on symbols. It remembers your run
configuration, so re-running is just <code class="language-plaintext highlighter-rouge">r</code> rather than interacting with
shell history.</p>

<p>My workflow on all platforms (<a href="/blog/2020/05/15/">including Windows</a>) is a vertically
maximized Vim window and a vertically maximized terminal window. The new
part for me: The terminal runs a long-term GDB session exclusively, with
<code class="language-plaintext highlighter-rouge">file</code> set to the program I’m writing, usually set by the initial command
line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb myprogram
gdb&gt;
</code></pre></div></div>

<p>Alternatively, use <code class="language-plaintext highlighter-rouge">file</code> after starting GDB. This is occasionally
useful if my project has multiple binaries and I want to examine a
different program.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; file myprogram
</code></pre></div></div>

<p>I use <code class="language-plaintext highlighter-rouge">make</code> and Vim’s <code class="language-plaintext highlighter-rouge">:mak</code> command for building from within the editor,
so I don’t need to change context to build. The quickfix list takes me
straight to warnings/errors. Often I’m writing something that takes input
from standard input. So I use the <code class="language-plaintext highlighter-rouge">run</code> (<code class="language-plaintext highlighter-rouge">r</code>) command to set this up
(along with any command line arguments).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r &lt;test.txt
</code></pre></div></div>

<p>You can redirect standard output as well. It remembers these settings for
plain <code class="language-plaintext highlighter-rouge">run</code> later, so I can test my program by entering <code class="language-plaintext highlighter-rouge">r</code> and nothing
else.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r
</code></pre></div></div>

<p>My usual workflow is edit, <code class="language-plaintext highlighter-rouge">:mak</code>, <code class="language-plaintext highlighter-rouge">r</code>, repeat. If I want to test a
different input or use different options, change the run configuration
using <code class="language-plaintext highlighter-rouge">run</code> again:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r -a -b -c &lt;test2.txt
</code></pre></div></div>

<p>On Windows you cannot recompile while the program is running. If GDB is
sitting on a breakpoint but I want to build, I use <code class="language-plaintext highlighter-rouge">kill</code> (<code class="language-plaintext highlighter-rouge">k</code>) to stop it
without exiting GDB.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; k
</code></pre></div></div>

<p>GDB has an annoying, flow-breaking yes/no prompt for this, so I recommend
<code class="language-plaintext highlighter-rouge">set confirm no</code> in your <code class="language-plaintext highlighter-rouge">.gdbinit</code> to disable it.</p>

<p>Sometimes a program is stuck in a loop and I need it to break in the
debugger. I try to avoid CTRL-C in the terminal since it can confuse
GDB. A safer option is to signal the process from Vim with <code class="language-plaintext highlighter-rouge">pkill</code>, which
GDB will catch (except on Windows):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>:!pkill myprogram
</code></pre></div></div>

<p>I suspect many people don’t know this, but if you’re on Windows and
<a href="/blog/2021/03/11/">developing a graphical application</a>, you can <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registerhotkey">press F12</a> in the
debuggee’s window to immediately break the program in the attached
debugger. This is a general platform feature and works with any native
debugger. I’ve been using it quite a lot.</p>

<p>On that note, you can run commands from GDB with <code class="language-plaintext highlighter-rouge">!</code>, which is another way
to avoid having an extra terminal window around:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; !git diff
</code></pre></div></div>

<p>In any case, GDB will re-read the binary on the next <code class="language-plaintext highlighter-rouge">run</code> and update
breakpoints, so it’s mostly seamless. If there’s a function I want to
debug, I set a breakpoint on it, then run.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; b somefunc
gdb&gt; r
</code></pre></div></div>

<p>Alternatively I’ll use a line number, which I read from Vim. Though GDB,
not being involved in the editing process, cannot track how that line
moves between builds.</p>

<p>An empty command repeats the last command, so once I’m at a breakpoint,
I’ll type <code class="language-plaintext highlighter-rouge">next</code> (<code class="language-plaintext highlighter-rouge">n</code>) — or <code class="language-plaintext highlighter-rouge">step</code> (<code class="language-plaintext highlighter-rouge">s</code>) to enter function calls — then
press enter each time I want to advance a line, often with my eye on the
context in Vim in the other window:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; n
gdb&gt;
gdb&gt;
</code></pre></div></div>

<p>(<del>I wish GDB could print a source listing around the breakpoint as
context, like Delve, but no such feature exists. The woeful <code class="language-plaintext highlighter-rouge">list</code> command
is inadequate.</del> <strong>Update</strong>: GDB’s TUI is a reasonable compromise for GUI
applications or terminal applications running under a separate tty/console
with either <code class="language-plaintext highlighter-rouge">tty</code> or <code class="language-plaintext highlighter-rouge">set new-console</code>. I can access it everywhere since
w64devkit now supports GDB TUI.)</p>

<p>If I want to advance to the next breakpoint, I use <code class="language-plaintext highlighter-rouge">continue</code> (<code class="language-plaintext highlighter-rouge">c</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; c
</code></pre></div></div>

<p>If I’m walking through a loop, I want to see how variables change, but
it’s tedious to keep <code class="language-plaintext highlighter-rouge">print</code>ing (<code class="language-plaintext highlighter-rouge">p</code>) the same variables again and again.
So I use <code class="language-plaintext highlighter-rouge">display</code> (<code class="language-plaintext highlighter-rouge">disp</code>) to display an expression with each prompt,
much like the “watch” window in Visual Studio. For example, if my loop
variable is <code class="language-plaintext highlighter-rouge">i</code> over some string <code class="language-plaintext highlighter-rouge">str</code>, this will show me the current
character in character format (<code class="language-plaintext highlighter-rouge">/c</code>).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; disp/c str[i]
</code></pre></div></div>

<p>You can accumulate multiple expressions. Use <code class="language-plaintext highlighter-rouge">undisplay</code> to remove them.</p>

<p>Too many breakpoints? Use <code class="language-plaintext highlighter-rouge">info breakpoints</code> (<code class="language-plaintext highlighter-rouge">i b</code>) to list them, then
<code class="language-plaintext highlighter-rouge">delete</code> (<code class="language-plaintext highlighter-rouge">d</code>) the unwanted ones by ID.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; i b
gdb&gt; d 3 5 8
</code></pre></div></div>

<p>GDB has many more features than this, but 10 commands cover 99% of use
cases: <code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">n</code>, <code class="language-plaintext highlighter-rouge">s</code>, <code class="language-plaintext highlighter-rouge">disp</code>, <code class="language-plaintext highlighter-rouge">k</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">i</code>, <code class="language-plaintext highlighter-rouge">d</code>, <code class="language-plaintext highlighter-rouge">p</code>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A flexible, lightweight, spin-lock barrier</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/03/13/"/>
    <id>urn:uuid:5a72d27a-60f4-4b52-a4c2-f1c3b72e6c85</id>
    <updated>2022-03-13T23:55:08Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="x86"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=30671979">on Hacker News</a>.</em></p>

<p>The other day I wanted to try the famous <a href="https://preshing.com/20120515/memory-reordering-caught-in-the-act/">memory reordering experiment</a>
for myself. It’s the double-slit experiment of concurrency, where a
program can observe an <a href="https://research.swtch.com/hwmm">“impossible” result</a> on common hardware, as
though a thread had time-traveled. While getting thread timing as tight as
possible, I designed a possibly-novel thread barrier. It’s purely
spin-locked, the entire footprint is a zero-initialized integer, it
automatically resets, it can be used across processes, and the entire
implementation is just three to four lines of code.</p>

<!--more-->

<p>Here’s the entire barrier implementation for two threads in C11.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Spin-lock barrier for two threads. Initialize *barrier to zero.</span>
<span class="kt">void</span> <span class="nf">barrier_wait</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">v</span> <span class="o">=</span> <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;=</span> <span class="mi">2</span><span class="p">;</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="mi">2</span><span class="p">)</span> <span class="o">==</span> <span class="n">v</span><span class="p">;);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Or in Go:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">BarrierWait</span><span class="p">(</span><span class="n">barrier</span> <span class="o">*</span><span class="kt">uint32</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">v</span> <span class="o">:=</span> <span class="n">atomic</span><span class="o">.</span><span class="n">AddUint32</span><span class="p">(</span><span class="n">barrier</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">v</span><span class="o">&amp;</span><span class="m">1</span> <span class="o">==</span> <span class="m">1</span> <span class="p">{</span>
        <span class="n">v</span> <span class="o">&amp;=</span> <span class="m">2</span>
        <span class="k">for</span> <span class="n">atomic</span><span class="o">.</span><span class="n">LoadUint32</span><span class="p">(</span><span class="n">barrier</span><span class="p">)</span><span class="o">&amp;</span><span class="m">2</span> <span class="o">==</span> <span class="n">v</span> <span class="p">{</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Even more, these two implementations are compatible with each other. C
threads and Go goroutines can synchronize on a common barrier using these
functions. Also note how it only uses two bits.</p>
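
<p>As a usage sketch, here is one way the Go function might be driven by two
goroutines running in lockstep. The <code class="language-plaintext highlighter-rouge">lockstep</code> helper, the <code class="language-plaintext highlighter-rouge">slots</code> array,
and the round count are assumptions made up for this illustration; only
<code class="language-plaintext highlighter-rouge">BarrierWait</code> comes from the listing above. Each goroutine publishes a
value, waits, reads its partner’s value, then waits again before the next
round.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// BarrierWait is the two-thread barrier from the article, unchanged.
func BarrierWait(barrier *uint32) {
	v := atomic.AddUint32(barrier, 1)
	if v&1 == 1 {
		v &= 2
		for atomic.LoadUint32(barrier)&2 == v {
		}
	}
}

// lockstep is a hypothetical helper: two goroutines each write their own
// slot, wait, read the other slot, and wait again. It returns how many
// reads saw a value from the wrong round.
func lockstep(rounds int) int {
	var barrier uint32 // zero-initialized, as the barrier requires
	var slots [2]int
	var stale [2]int
	var wg sync.WaitGroup
	for id := 0; id < 2; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for round := 1; round <= rounds; round++ {
				slots[id] = round
				BarrierWait(&barrier) // both writes are complete
				if slots[1-id] != round {
					stale[id]++
				}
				BarrierWait(&barrier) // both reads are complete
			}
		}(id)
	}
	wg.Wait()
	return stale[0] + stale[1]
}

func main() {
	// The round count is kept small: with a single available core a
	// spinning goroutine only gives way when the runtime preempts it.
	fmt.Println("stale reads:", lockstep(200))
}
```

<p>The second wait in each round matters as much as the first: without it,
a fast goroutine could overwrite its slot for the next round before its
partner has read the current one.</p>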

<p>When I was done with my experiment, I did a quick search online for other
spin-lock barriers to see if anyone came up with the same idea. I found a
couple of <a href="https://web.archive.org/web/20151109230817/https://stackoverflow.com/questions/33598686/spinning-thread-barrier-using-atomic-builtins">subtly-incorrect</a> spin-lock barriers, and some
straightforward barrier constructions using a mutex spin-lock.</p>

<p>Before diving into how this works, and how to generalize it, let’s discuss
the circumstances that led to its design.</p>

<h3 id="experiment">Experiment</h3>

<p>Here’s the setup for the memory reordering experiment, where <code class="language-plaintext highlighter-rouge">w0</code> and <code class="language-plaintext highlighter-rouge">w1</code>
are initialized to zero.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>thread#1    thread#2
w0 = 1      w1 = 1
r1 = w1     r0 = w0
</code></pre></div></div>

<p>Considering all the possible orderings, it would seem that at least one of
<code class="language-plaintext highlighter-rouge">r0</code> or <code class="language-plaintext highlighter-rouge">r1</code> is 1. There seems to be no ordering where <code class="language-plaintext highlighter-rouge">r0</code> and <code class="language-plaintext highlighter-rouge">r1</code> could
both be 0. However, if raced precisely, this is a frequent or possibly
even majority occurrence on common hardware, including x86 and ARM.</p>
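
<p>The claim about orderings can be checked by brute force. The sketch below
assumes sequential consistency, meaning each thread’s two instructions run
in program order and take effect immediately, and it enumerates all six
interleavings. The <code class="language-plaintext highlighter-rouge">outcomes</code> function is a hypothetical name for this
illustration; it simulates the experiment with plain variables rather than
running real threads.</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// outcomes walks every interleaving of the two threads that preserves
// program order, executes it against plain variables (that is, under
// sequential consistency), and counts each (r0, r1) result.
func outcomes() map[[2]int]int {
	counts := make(map[[2]int]int)
	// A 4-bit mask with exactly two set bits marks which of the four
	// time slots belong to thread#1; the rest belong to thread#2.
	for mask := uint(0); mask < 16; mask++ {
		if bits.OnesCount(mask) != 2 {
			continue
		}
		w0, w1, r0, r1 := 0, 0, 0, 0
		done1, done2 := 0, 0 // instructions already executed per thread
		for slot := uint(0); slot < 4; slot++ {
			if (mask>>slot)&1 == 1 {
				if done1 == 0 {
					w0 = 1 // thread#1: w0 = 1
				} else {
					r1 = w1 // thread#1: r1 = w1
				}
				done1++
			} else {
				if done2 == 0 {
					w1 = 1 // thread#2: w1 = 1
				} else {
					r0 = w0 // thread#2: r0 = w0
				}
				done2++
			}
		}
		counts[[2]int{r0, r1}]++
	}
	return counts
}

func main() {
	for result, n := range outcomes() {
		fmt.Printf("r0=%d r1=%d: %d interleavings\n", result[0], result[1], n)
	}
}
```

<p>Four of the six interleavings produce <code class="language-plaintext highlighter-rouge">r0 = r1 = 1</code>, one each produces
a single zero, and none produces two zeros. Observing two zeros on real
hardware therefore means the hardware did not behave like any of these
interleavings.</p>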

<p>How to go about running this experiment? These are concurrent loads and
stores, so it’s tempting to use <code class="language-plaintext highlighter-rouge">volatile</code> for <code class="language-plaintext highlighter-rouge">w0</code> and <code class="language-plaintext highlighter-rouge">w1</code>. However,
this would constitute a data race — undefined behavior in at least C and
C++ — and so we couldn’t really reason much about the results, at least
not without first verifying the compiler’s assembly. These are variables
in a high-level language, not architecture-level stores/loads, even with
<code class="language-plaintext highlighter-rouge">volatile</code>.</p>

<p>So my first idea was to use a bit of inline assembly for all accesses that
would otherwise be data races. x86-64:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">experiment</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">w0</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">r1</span><span class="p">;</span>
    <span class="kr">__asm</span> <span class="k">volatile</span> <span class="p">(</span>
        <span class="s">"movl  $1, %1</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"movl  %2, %0</span><span class="se">\n</span><span class="s">"</span>
        <span class="o">:</span> <span class="s">"=r"</span><span class="p">(</span><span class="n">r1</span><span class="p">),</span> <span class="s">"=m"</span><span class="p">(</span><span class="o">*</span><span class="n">w0</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"m"</span><span class="p">(</span><span class="o">*</span><span class="n">w1</span><span class="p">)</span>
    <span class="p">);</span>
    <span class="k">return</span> <span class="n">r1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>ARM64 (to try on my Raspberry Pi):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">experiment</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">w0</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">r1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="kr">__asm</span> <span class="k">volatile</span> <span class="p">(</span>
        <span class="s">"str  %w0, %1</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"ldr  %w0, %2</span><span class="se">\n</span><span class="s">"</span>
        <span class="o">:</span> <span class="s">"+r"</span><span class="p">(</span><span class="n">r1</span><span class="p">),</span> <span class="s">"=m"</span><span class="p">(</span><span class="o">*</span><span class="n">w0</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"m"</span><span class="p">(</span><span class="o">*</span><span class="n">w1</span><span class="p">)</span>
    <span class="p">);</span>
    <span class="k">return</span> <span class="n">r1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is from the point-of-view of thread#1, but I can swap the arguments
for thread#2. I’m expecting this to be inlined, and encouraging it with
<code class="language-plaintext highlighter-rouge">static</code>.</p>

<p>Alternatively, I could use C11 atomics with a relaxed memory order:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">experiment</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w0</span><span class="p">,</span> <span class="k">_Atomic</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">atomic_store_explicit</span><span class="p">(</span><span class="n">w0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">atomic_load_explicit</span><span class="p">(</span><span class="n">w1</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Since this is a <em>race</em> and I want both threads to run their two experiment
instructions as simultaneously as possible, it would be wise to use some
sort of <em>starting barrier</em>… exactly the purpose of a thread barrier! It
will hold the threads back until they’re both ready.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">w0</span><span class="p">,</span> <span class="n">w1</span><span class="p">,</span> <span class="n">r0</span><span class="p">,</span> <span class="n">r1</span><span class="p">;</span>

<span class="c1">// thread#1                   // thread#2</span>
<span class="n">w0</span> <span class="o">=</span> <span class="n">w1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">BARRIER</span><span class="p">;</span>                      <span class="n">BARRIER</span><span class="p">;</span>
<span class="n">r1</span> <span class="o">=</span> <span class="n">experiment</span><span class="p">(</span><span class="o">&amp;</span><span class="n">w0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">w1</span><span class="p">);</span>    <span class="n">r0</span> <span class="o">=</span> <span class="n">experiment</span><span class="p">(</span><span class="o">&amp;</span><span class="n">w1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">w0</span><span class="p">);</span>
<span class="n">BARRIER</span><span class="p">;</span>                      <span class="n">BARRIER</span><span class="p">;</span>

<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">r0</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">r1</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">puts</span><span class="p">(</span><span class="s">"impossible!"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The second thread goes straight into the barrier, but the first thread
does a little more work to initialize the experiment and a little more at
the end to check the result. The second barrier ensures they’re both done
before checking.</p>

<p>Running this only once isn’t so useful, so each thread loops a few million
times, hence the re-initialization in thread#1. The barriers keep them in
lockstep.</p>

<h3 id="barrier-selection">Barrier selection</h3>

<p>On my first attempt, I made the obvious decision for the barrier: I used
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_wait.html"><code class="language-plaintext highlighter-rouge">pthread_barrier_t</code></a>. I was already using pthreads for spawning the
extra thread, including <a href="/blog/2020/05/15/">on Windows</a>, so this was convenient.</p>

<p>However, my initial results were disappointing. I only observed an
“impossible” result around one in a million trials. With some debugging I
determined that the pthreads barrier was just too damn slow, throwing off
the timing. This was especially true with winpthreads, bundled with
Mingw-w64, which in addition to the per-barrier mutex, grabs a <em>global</em>
lock <em>twice</em> per wait to manage the barrier’s reference counter.</p>

<p>All pthreads implementations I used were quick to yield to the system
scheduler. The first thread to arrive at the barrier would go to sleep,
the second thread would wake it up, and it was rare they’d actually race
on the experiment. This is perfectly reasonable for a pthreads barrier
designed for the general case, but I really needed a <em>spin-lock barrier</em>.
That is, the first thread to arrive spins in a loop until the second
thread arrives, and it never interacts with the scheduler. This happens so
frequently and quickly that it should only spin for a few iterations.</p>

<h3 id="barrier-design">Barrier design</h3>

<p>Spin locking means atomics. By default, atomics have sequentially
consistent ordering and will provide the necessary synchronization for the
non-atomic experiment variables. Stores (e.g. to <code class="language-plaintext highlighter-rouge">w0</code>, <code class="language-plaintext highlighter-rouge">w1</code>) made before
the barrier will be visible to all other threads upon passing through the
barrier. In other words, the initialization will propagate before either
thread exits the first barrier, and results propagate before either thread
exits the second barrier.</p>

<p>I know statically that there are only two threads, simplifying the
implementation. The plan: When threads arrive, they atomically increment a
shared variable to indicate such. The first to arrive will see an odd
number, telling it to atomically read the variable in a loop until the
other thread changes it to an even number.</p>

<p>At first it might seem that, with just two threads, a single bit would
suffice. If the bit is set, the other thread hasn’t arrived. If clear,
both threads have arrived.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">broken_wait1</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Or to avoid an extra load, use the result directly:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">broken_wait2</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">++*</span><span class="n">barrier</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Neither of these works correctly, and the other mutex-free barriers I found
all have the same defect. Consider the broader picture: Between atomic
loads in the first thread’s spin-lock loop, suppose the second thread
arrives, passes through the barrier, does its work, hits the next barrier,
and increments the counter. Both threads see an odd counter simultaneously
and deadlock. No good.</p>
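
<p>That schedule is easy to replay by hand. The sketch below does so on a
single thread, with an ordinary variable standing in for the shared
counter. The name <code class="language-plaintext highlighter-rouge">replayBroken</code> and the labels A and B for the two
threads are assumptions for illustration. It models only the one schedule
described above, not the barrier in general.</p>

```go
package main

import "fmt"

// replayBroken (hypothetical) replays the schedule from the text against
// the one-bit barrier and reports whether thread A and thread B would
// each keep spinning once the schedule has played out.
func replayBroken() (aSpins, bSpins bool) {
	var counter uint32

	counter++ // A arrives at the first barrier: 1, odd, so A spins
	counter++ // B arrives at the first barrier: 2, even, so B passes
	// A has not loaded the counter since B arrived.
	counter++ // B arrives at the second barrier: 3, odd, so B spins

	aSpins = counter&1 == 1 // A's next load in its spin loop
	bSpins = counter&1 == 1 // B's load in its spin loop
	return aSpins, bSpins
}

func main() {
	a, b := replayBroken()
	fmt.Println("A still spinning:", a)
	fmt.Println("B still spinning:", b)
}
```

<p>Both results are true. Neither thread will increment the counter again
until it leaves its loop, so the counter stays odd forever.</p>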

<p>To fix this, the wait function must also track the <em>phase</em>. The first
barrier is the first phase, the second barrier is the second phase, etc.
Conveniently <strong>the rest of the integer acts like a phase counter</strong>!
Writing this out more explicitly:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">barrier_wait</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="n">observed</span> <span class="o">=</span> <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="n">thread_count</span> <span class="o">=</span> <span class="n">observed</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">thread_count</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// not last arrival, watch for phase change</span>
        <span class="kt">unsigned</span> <span class="n">init_phase</span> <span class="o">=</span> <span class="n">observed</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
            <span class="kt">unsigned</span> <span class="n">current_phase</span> <span class="o">=</span> <span class="o">*</span><span class="n">barrier</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">;</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">current_phase</span> <span class="o">!=</span> <span class="n">init_phase</span><span class="p">)</span> <span class="p">{</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The key: When the last thread arrives, it overflows the thread counter to
zero and increments the phase counter in one operation.</p>

<p>By the way, I’m using <code class="language-plaintext highlighter-rouge">unsigned</code> since it may eventually overflow, and
even <code class="language-plaintext highlighter-rouge">_Atomic int</code> overflow is undefined for the <code class="language-plaintext highlighter-rouge">++</code> operator. However,
if you use <code class="language-plaintext highlighter-rouge">atomic_fetch_add</code> or C++ <code class="language-plaintext highlighter-rouge">std::atomic</code> then overflow is
defined and you can use <code class="language-plaintext highlighter-rouge">int</code>.</p>

<p>Threads can never be more than one phase apart by definition, so only one
bit is needed for the phase counter, making this effectively a two-phase,
two-bit barrier. In my final implementation, rather than shift (<code class="language-plaintext highlighter-rouge">&gt;&gt;</code>), I
mask (<code class="language-plaintext highlighter-rouge">&amp;</code>) the phase bit with 2.</p>
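
<p>To see the phase bit at work, the sketch below replays a troublesome
schedule for a one-bit barrier: one thread passes the first barrier and
reaches the second before the other thread has looked at the counter
again. The <code class="language-plaintext highlighter-rouge">replay</code> function and the thread labels A and B are
hypothetical names for this illustration. It also starts one run just
below the 32-bit limit to show that unsigned wraparound leaves the two low
bits consistent, since 2<sup>32</sup> is a multiple of 4.</p>

```go
package main

import "fmt"

// replay (hypothetical) runs the schedule on a single thread, starting
// from an even counter value, and applies the two-bit rule: a waiter
// keeps spinning only while the phase bit (mask 2) still matches the
// phase bit it observed on arrival.
func replay(start uint32) (aSpins, bSpins bool) {
	counter := start

	counter++
	a := counter // A arrives first at barrier 1 and sees an odd value
	counter++    // B arrives last at barrier 1: even, so B passes
	counter++
	b := counter // B arrives first at barrier 2 and sees an odd value

	aSpins = counter&2 == a&2 // A's next load: has the phase changed?
	bSpins = counter&2 == b&2 // B's load: still waiting for A
	return aSpins, bSpins
}

func main() {
	for _, start := range []uint32{0, 2, 0xFFFFFFFE} {
		a, b := replay(start)
		fmt.Printf("start=%08x A spins=%v B spins=%v\n", start, a, b)
	}
}
```

<p>In every run A is released, because the phase bit flipped when B arrived
at the first barrier, while B correctly waits at the second barrier until
A catches up.</p>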

<p>With this spin-lock barrier, the experiment observes <code class="language-plaintext highlighter-rouge">r0 = r1 = 0</code> in ~10%
of trials on my x86 machines and ~75% of trials on my Raspberry Pi 4.</p>

<h3 id="generalizing-to-more-threads">Generalizing to more threads</h3>

<p>Two threads required two bits. This generalizes to <code class="language-plaintext highlighter-rouge">log2(n)+1</code> bits for
<code class="language-plaintext highlighter-rouge">n</code> threads, where <code class="language-plaintext highlighter-rouge">n</code> is a power of two. You may have already figured out
how to support more threads: spend more bits on the thread counter.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Spin-lock barrier for n threads, where n is a power of two.</span>
<span class="c1">// Initialize *barrier to zero.</span>
<span class="kt">void</span> <span class="nf">barrier_waitn</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="n">v</span> <span class="o">=</span> <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;=</span> <span class="n">n</span><span class="p">;</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="n">n</span><span class="p">)</span> <span class="o">==</span> <span class="n">v</span><span class="p">;);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note: <strong>It never makes sense for <code class="language-plaintext highlighter-rouge">n</code> to exceed the logical core count!</strong>
If it does, then at least one thread must not be actively running. The
spin-lock ensures it does not get scheduled promptly, and the barrier will
waste lots of resources doing nothing in the meantime.</p>
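
<p>For comparison with the earlier Go listing, here is a sketch of the same
generalization in Go. <code class="language-plaintext highlighter-rouge">BarrierWaitN</code> is my transliteration of
<code class="language-plaintext highlighter-rouge">barrier_waitn</code>, not code from the original experiment, and <code class="language-plaintext highlighter-rouge">runRounds</code> is
a hypothetical test harness. The harness counts arrivals with a separate
atomic counter and checks that nobody gets past the barrier before all <code class="language-plaintext highlighter-rouge">n</code>
arrivals of that round have been counted.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// BarrierWaitN is a transliteration of barrier_waitn.
// n must be a power of two, and *barrier must start at zero.
func BarrierWaitN(barrier *uint32, n uint32) {
	v := atomic.AddUint32(barrier, 1)
	if v&(n-1) != 0 {
		v &= n
		for atomic.LoadUint32(barrier)&n == v {
		}
	}
}

// runRounds (hypothetical) sends n goroutines through the barrier twice
// per round. It returns how often a goroutine passed the first barrier
// of a round before all n arrivals of that round had been counted.
func runRounds(n uint32, rounds int) int64 {
	var barrier uint32
	var arrivals, early int64
	var wg sync.WaitGroup
	for i := uint32(0); i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for round := 1; round <= rounds; round++ {
				atomic.AddInt64(&arrivals, 1)
				BarrierWaitN(&barrier, n)
				if atomic.LoadInt64(&arrivals) != int64(n)*int64(round) {
					atomic.AddInt64(&early, 1)
				}
				BarrierWaitN(&barrier, n) // hold the next round back
			}
		}()
	}
	wg.Wait()
	return early
}

func main() {
	// Few rounds on purpose: with fewer than four free cores the
	// spinners only make progress when the Go runtime preempts them.
	fmt.Println("early passes:", runRounds(4, 50))
}
```

<p>Running this with fewer logical cores than goroutines still completes,
but noticeably slower, which is the wasted spinning the note above warns
about.</p>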

<p>If the barrier is used little enough that you won’t overflow the overall
barrier integer — maybe just use a <code class="language-plaintext highlighter-rouge">uint64_t</code> — an implementation could
support arbitrary thread counts with the same principle using modular
division instead of the <code class="language-plaintext highlighter-rouge">&amp;</code> operator. The denominator is ideally a
compile-time constant in order to avoid paying for division in the
spin-lock loop.</p>
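
<p>One possible reading of that idea, sketched in Go: the remainder plays the
role of the thread counter and the quotient plays the role of the phase.
Everything here is an assumption on my part (<code class="language-plaintext highlighter-rouge">BarrierWaitMod</code>, the <code class="language-plaintext highlighter-rouge">workers</code>
constant, and the <code class="language-plaintext highlighter-rouge">runRounds</code> harness), and it only holds up as long as the
64-bit counter never wraps.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workers is the thread count. Keeping it a compile-time constant gives
// the compiler the chance to replace the division with cheaper
// instructions.
const workers = 3

// BarrierWaitMod (hypothetical) is a modular variant of the barrier for
// a thread count that is not a power of two. *barrier must start at
// zero and must never overflow.
func BarrierWaitMod(barrier *uint64) {
	v := atomic.AddUint64(barrier, 1)
	if v%workers != 0 {
		// Not the last arrival: wait for the quotient (phase) to change.
		phase := v / workers
		for atomic.LoadUint64(barrier)/workers == phase {
		}
	}
}

// runRounds (hypothetical) sends all workers through the barrier twice
// per round and returns how often one got through the first barrier
// before every arrival of that round had been counted.
func runRounds(rounds int) int64 {
	var barrier uint64
	var arrivals, early int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for round := 1; round <= rounds; round++ {
				atomic.AddInt64(&arrivals, 1)
				BarrierWaitMod(&barrier)
				if atomic.LoadInt64(&arrivals) != int64(workers)*int64(round) {
					atomic.AddInt64(&early, 1)
				}
				BarrierWaitMod(&barrier) // hold the next round back
			}
		}()
	}
	wg.Wait()
	return early
}

func main() {
	fmt.Println("early passes:", runRounds(50))
}
```

<p>Unlike the masked version, this one compares the whole quotient rather
than a single phase bit, so a wrapped counter would break it when the
thread count does not divide 2<sup>64</sup>.</p>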

<p>While C11 <code class="language-plaintext highlighter-rouge">_Atomic</code> seems like it would be useful, unsurprisingly it is
not supported by one major, <a href="/blog/2021/12/30/">stubborn</a> implementation. If you’re
using C++11 or later, then go ahead and use <code class="language-plaintext highlighter-rouge">std::atomic&lt;int&gt;</code> since it’s
well-supported. In real, practical C programs, I will continue using dual
implementations: interlocked functions on MSVC, and GCC built-ins (also
supported by Clang) everywhere else.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if __GNUC__
#  define BARRIER_INC(x) __atomic_add_fetch(x, 1, __ATOMIC_SEQ_CST)
#  define BARRIER_GET(x) __atomic_load_n(x, __ATOMIC_SEQ_CST)
#elif _MSC_VER
#  define BARRIER_INC(x) _InterlockedIncrement(x)
#  define BARRIER_GET(x) _InterlockedOr(x, 0)
#endif
</span>
<span class="c1">// Spin-lock barrier for n threads, where n is a power of two.</span>
<span class="c1">// Initialize *barrier to zero.</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">barrier_wait</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">barrier</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">v</span> <span class="o">=</span> <span class="n">BARRIER_INC</span><span class="p">(</span><span class="n">barrier</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;=</span> <span class="n">n</span><span class="p">;</span> <span class="p">(</span><span class="n">BARRIER_GET</span><span class="p">(</span><span class="n">barrier</span><span class="p">)</span><span class="o">&amp;</span><span class="n">n</span><span class="p">)</span> <span class="o">==</span> <span class="n">v</span><span class="p">;);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This has the nice bonus that the interface does not have the <code class="language-plaintext highlighter-rouge">_Atomic</code>
qualifier, nor <code class="language-plaintext highlighter-rouge">std::atomic</code> template. It’s just a plain old <code class="language-plaintext highlighter-rouge">int</code>, making
the interface simpler and easier to use. It’s something I’ve grown to
appreciate from Go.</p>

<p>If you’d like to try the experiment yourself: <a href="https://gist.github.com/skeeto/c63b9ddf2c599eeca86356325b93f3a7"><code class="language-plaintext highlighter-rouge">reorder.c</code></a>. If
you’d like to see a test of Go and C sharing a thread barrier:
<a href="https://gist.github.com/skeeto/bdb5a0d2aa36b68b6f66ca39989e1444"><code class="language-plaintext highlighter-rouge">coop.go</code></a>.</p>

<p>I’m intentionally not providing the spin-lock barrier as a library. First,
it’s too trivial and small for that, and second, I believe <a href="https://vimeo.com/644068002">context is
everything</a>. Now that you understand the principle, you can whip up
your own, custom-tailored implementation when the situation calls for it,
just as the one in my experiment is hard-coded for exactly two threads.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Test cross-architecture without leaving home</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/08/21/"/>
    <id>urn:uuid:ac34f8a0-af73-4301-b21b-5a47d48e3069</id>
    <updated>2021-08-21T23:59:33Z</updated>
    <category term="c"/><category term="go"/><category term="debian"/><category term="trick"/>
    <content type="html">
      <![CDATA[<p>I like to test my software across different environments, on <a href="/blog/2020/05/15/">strange
platforms</a>, and with <a href="/blog/2018/04/13/">alternative implementations</a>. Each has its
own quirks and oddities that can shake bugs out earlier. C is particularly
good at this since it has such a wide selection of compilers and runs on
everything. For instance I count at least 7 distinct C compilers in Debian
alone. One advantage of <a href="/blog/2017/03/30/">writing portable software</a> is access to a
broader testing environment, and it’s one reason I prefer to target
standards rather than specific platforms.</p>

<p>However, I’ve long struggled with architecture diversity. My work and
testing have been almost entirely on x86, with ARM as a distant second
(Raspberry Pi and friends). Big endian hosts are particularly rare.
Recently, though, I learned a trick for quickly and conveniently accessing
many different architectures without even leaving my laptop: <a href="https://wiki.debian.org/QemuUserEmulation">QEMU User
Emulation</a>. Debian and its derivatives support this very well and
require almost no setup or configuration.</p>

<!--more-->

<h3 id="cross-compilation-example">Cross-compilation Example</h3>

<p>While there are many options, my main cross-testing architecture has been
PowerPC. It’s 32-bit big endian, while I’m generally working on 64-bit
little endian, which is exactly the sort of mismatch I’m going for. I use
a Debian-supplied cross-compiler and qemu-user tools. The <a href="https://en.wikipedia.org/wiki/Binfmt_misc">binfmt</a>
support is especially slick, so that’s how I usually use it.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># apt install gcc-powerpc-linux-gnu qemu-user-binfmt
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">binfmt_misc</code> is a kernel module that teaches Linux how to recognize
arbitrary binary formats. For instance, there’s a Wine binfmt so that
Linux programs can transparently <code class="language-plaintext highlighter-rouge">exec(3)</code> Windows <code class="language-plaintext highlighter-rouge">.exe</code> binaries. In the
case of QEMU User Mode, binaries for foreign architectures are loaded into
a QEMU virtual machine configured in user mode. In user mode there’s no
guest operating system, and instead the virtual machine translates guest
system calls to the host operating system.</p>

<p>The first package gives me <code class="language-plaintext highlighter-rouge">powerpc-linux-gnu-gcc</code>. The prefix is the
<a href="https://wiki.debian.org/Multiarch/Tuples">architecture tuple</a> describing the instruction set and system ABI.
To try this out, I have a little test program that inspects its execution
environment:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">w</span> <span class="o">=</span> <span class="s">"?"</span><span class="p">;</span>
    <span class="k">switch</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">))</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">w</span> <span class="o">=</span> <span class="s">"8"</span><span class="p">;</span>  <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="n">w</span> <span class="o">=</span> <span class="s">"16"</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="mi">4</span><span class="p">:</span> <span class="n">w</span> <span class="o">=</span> <span class="s">"32"</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="mi">8</span><span class="p">:</span> <span class="n">w</span> <span class="o">=</span> <span class="s">"64"</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">char</span> <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="s">"?"</span><span class="p">;</span>
    <span class="k">switch</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)(</span><span class="kt">int</span> <span class="p">[]){</span><span class="mi">1</span><span class="p">})</span> <span class="p">{</span>
    <span class="k">case</span> <span class="mi">0</span><span class="p">:</span> <span class="n">b</span> <span class="o">=</span> <span class="s">"big"</span><span class="p">;</span>    <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">b</span> <span class="o">=</span> <span class="s">"little"</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">printf</span><span class="p">(</span><span class="s">"%s-bit, %s endian</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When I run this natively on x86-64:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc test.c
$ ./a.out
64-bit, little endian
</code></pre></div></div>

<p>Running it on PowerPC via QEMU:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ powerpc-linux-gnu-gcc -static test.c
$ ./a.out
32-bit, big endian
</code></pre></div></div>

<p>Thanks to binfmt, I could execute it as though the PowerPC binary were a
native binary. With just a couple of environment variables in the right
place, I could pretend I’m developing on PowerPC — aside from emulation
performance penalties of course.</p>

<p>However, you might have noticed I pulled a sneaky on ya: <code class="language-plaintext highlighter-rouge">-static</code>. So far
what I’ve shown only works with static binaries. There’s no dynamic loader
available to run dynamically-linked binaries. Fortunately this is easy to
fix in two steps. The first step is to install the dynamic linker for
PowerPC:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># apt install libc6-powerpc-cross
</code></pre></div></div>

<p>The second is to tell QEMU where to find it since, unfortunately, it
cannot currently do so on its own.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export QEMU_LD_PREFIX=/usr/powerpc-linux-gnu
</code></pre></div></div>

<p>Now I can leave out the <code class="language-plaintext highlighter-rouge">-static</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ powerpc-linux-gnu-gcc test.c
$ ./a.out
32-bit, big endian
</code></pre></div></div>

<p>A practical example: Remember <a href="https://github.com/skeeto/binitools">binitools</a>? I’m now ready to run its
<a href="/blog/2019/01/25/">fuzz-generated test suite</a> on this cross-testing platform.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/skeeto/binitools
$ cd binitools/
$ make check CC=powerpc-linux-gnu-gcc
...
PASS: 668/668
</code></pre></div></div>

<p>Or if I’m going to be running <code class="language-plaintext highlighter-rouge">make</code> often:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export CC=powerpc-linux-gnu-gcc
$ make -e check
</code></pre></div></div>

<p>Recall: <a href="/blog/2017/08/20/">make’s <code class="language-plaintext highlighter-rouge">-e</code> flag</a> passes the environment through, so I
don’t need to pass <code class="language-plaintext highlighter-rouge">CC=...</code> on the command line each time.</p>

<p>When setting up a test suite for your own programs, consider how difficult
it would be to run the tests under customized circumstances like this. The
easier it is to run your tests, the more they’re going to be run. I’ve run
into many projects with such overly-complex test builds that even enabling
sanitizers in the test suite was a pain, let alone cross-architecture
testing.</p>

<p>Dependencies? There might be a way to use <a href="https://wiki.debian.org/Multiarch/HOWTO">Debian’s multiarch support</a>
to install foreign-architecture library packages, but I haven’t been able
to figure it out. You likely need to build dependencies yourself using the
cross compiler.</p>

<h3 id="testing-with-go">Testing with Go</h3>

<p>None of this is limited to C (or even C++). I’ve also successfully used
this to test Go libraries and programs cross-architecture. This isn’t
nearly as important since it’s harder to write unportable Go than C — e.g.
<a href="https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html">dumb pointer tricks</a> are literally labeled “unsafe”. However, Go
(gc) trivializes cross-compilation and is statically compiled, so it’s
incredibly simple. Once you’ve installed <code class="language-plaintext highlighter-rouge">qemu-user-binfmt</code> it’s entirely
transparent:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ GOARCH=mips64 go test
</code></pre></div></div>

<p>That’s all there is to cross-architecture testing. If for some reason binfmt
doesn’t work (WSL) or you don’t want to install it, there’s just one extra
step (package named <code class="language-plaintext highlighter-rouge">example</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ GOARCH=mips64 go test -c
$ qemu-mips64-static example.test
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">-c</code> option builds a test binary but doesn’t run it, instead allowing
you to choose where and how to run it.</p>

<p>It even works <a href="/blog/2021/06/29/">with cgo</a> — if you’re willing to jump through the same
hoops as with C of course:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="c">// #include &lt;stdint.h&gt;</span>
<span class="c">// uint16_t v = 0x1234;</span>
<span class="c">// char *hi = (char *)&amp;v + 0;</span>
<span class="c">// char *lo = (char *)&amp;v + 1;</span>
<span class="k">import</span> <span class="s">"C"</span>
<span class="k">import</span> <span class="s">"fmt"</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"%02x %02x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">*</span><span class="n">C</span><span class="o">.</span><span class="n">hi</span><span class="p">,</span> <span class="o">*</span><span class="n">C</span><span class="o">.</span><span class="n">lo</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">go run</code> on x86-64:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ CGO_ENABLED=1 go run example.go
34 12
</code></pre></div></div>

<p>Via QEMU User Mode:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export CGO_ENABLED=1
$ export GOARCH=mips64
$ export CC=mips64-linux-gnuabi64-gcc
$ export QEMU_LD_PREFIX=/usr/mips64-linux-gnuabi64
$ go run example.go
12 34
</code></pre></div></div>

<p>I was pleasantly surprised how well this all works.</p>

<h3 id="one-dimension">One dimension</h3>

<p>Despite the variety, all these architectures are still “running” the same
operating system, Linux, and so they only vary on one dimension. For most
programs primarily targeting x86-64 Linux, PowerPC Linux is practically
the same thing, while x86-64 OpenBSD is foreign territory despite sharing
an architecture and ABI (<a href="/blog/2016/11/17/">System V</a>). Testing across operating
systems still requires spending the time to install, configure, and
maintain these extra hosts. That’s an article for another time.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>More DLL fun with w64devkit: Go, assembly, and Python</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/06/29/"/>
    <id>urn:uuid:b2c53451-b12a-4f1a-a475-6c81096c9b5a</id>
    <updated>2021-06-29T21:50:30Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="win32"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p>My previous article explained <a href="/blog/2021/05/31/">how to work with dynamic-link libraries
(DLLs) using w64devkit</a>. These techniques also apply to other
circumstances, including with languages and ecosystems outside of C and
C++. In particular, <a href="/blog/2020/05/15/">w64devkit</a> is a great complement to Go and reliably
fulfills all the needs of <a href="https://golang.org/cmd/cgo/">cgo</a> — Go’s C interop — and can even
bootstrap Go itself. As before, this article is in large part an exercise
in capturing practical information I’ve picked up over time.</p>

<h3 id="go-bootstrap-and-cgo">Go: bootstrap and cgo</h3>

<p>The primary Go implementation, confusingly <a href="https://golang.org/doc/faq#What_compiler_technology_is_used_to_build_the_compilers">named “gc”</a>, is an
<a href="/blog/2020/01/21/">incredible piece of software engineering</a>. This is apparent when
building the Go toolchain itself, a process that is fast, reliable, easy,
and simple. It was originally written in C, but was re-written in Go
starting with Go 1.5. The C compiler in w64devkit can build the original C
implementation which then can be used to bootstrap any more recent
version. It’s so easy that I personally never use official binary releases
and always bootstrap from source.</p>

<p>You will need the Go 1.4 source, <a href="https://dl.google.com/go/go1.4-bootstrap-20171003.tar.gz">go1.4-bootstrap-20171003.tar.gz</a>.
This “bootstrap” tarball is the last Go 1.4 release plus a few additional
bugfixes. You will also need the source of the actual version of Go you
want to use, such as Go 1.16.5 (latest version as of this writing).</p>

<p>Start by building Go 1.4 using w64devkit. On Windows, Go is built using a
batch script and no special build system is needed. Since it shouldn’t be
invoked with the BusyBox ash shell, I use <a href="/blog/2021/02/08/"><code class="language-plaintext highlighter-rouge">cmd.exe</code></a> explicitly.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar xf go1.4-bootstrap-20171003.tar.gz
$ mv go/ bootstrap
$ (cd bootstrap/src/ &amp;&amp; cmd /c make)
</code></pre></div></div>

<p>In about 30 seconds you’ll have a fully-working Go 1.4 toolchain. Next use
it to build the desired toolchain. You can move this new toolchain after
it’s built if necessary.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export GOROOT_BOOTSTRAP="$PWD/bootstrap"
$ tar xf go1.16.5.src.tar.gz
$ (cd go/src/ &amp;&amp; cmd /c make)
</code></pre></div></div>

<p>At this point you can delete the bootstrap toolchain. You probably also
want to put Go on your PATH.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -rf bootstrap/
$ printf 'PATH="$PATH;%s/go/bin"\n' "$PWD" &gt;&gt;~/.profile
$ source ~/.profile
</code></pre></div></div>

<p>Not only is Go now available, so is the full power of cgo. (Including <a href="https://dave.cheney.net/2016/01/18/cgo-is-not-go">its
costs</a> if used.)</p>

<h3 id="vim-suggestions">Vim suggestions</h3>

<p>Since w64devkit is oriented so much around Vim, here’s my personal Vim
configuration for Go. I don’t need or want fancy plugins, just access to
<code class="language-plaintext highlighter-rouge">goimports</code> and a couple of corrections to Vim’s built-in Go support (<code class="language-plaintext highlighter-rouge">[[</code>
and <code class="language-plaintext highlighter-rouge">]]</code> navigation). The included <code class="language-plaintext highlighter-rouge">ctags</code> understands Go, so tags
navigation works the same as it does with C. <code class="language-plaintext highlighter-rouge">\i</code> saves the current
buffer, runs <code class="language-plaintext highlighter-rouge">goimports</code>, and populates the quickfix list with any errors.
Similarly <code class="language-plaintext highlighter-rouge">:make</code> invokes <code class="language-plaintext highlighter-rouge">go build</code> and, as expected, populates the
quickfix list.</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code>autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="k">setlocal</span> <span class="nb">makeprg</span><span class="p">=</span><span class="k">go</span>\ build
autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="nb">map</span> <span class="p">&lt;</span><span class="k">silent</span><span class="p">&gt;</span> <span class="p">&lt;</span><span class="k">buffer</span><span class="p">&gt;</span> <span class="p">&lt;</span>leader<span class="p">&gt;</span><span class="k">i</span>
<span class="se">    \</span> <span class="p">:</span><span class="k">update</span> \<span class="p">|</span>
<span class="se">    \</span> <span class="p">:</span><span class="k">cexpr</span> <span class="nb">system</span><span class="p">(</span><span class="s2">"goimports -w "</span> <span class="p">.</span> <span class="nb">expand</span><span class="p">(</span><span class="s2">"%"</span><span class="p">))</span> \<span class="p">|</span>
<span class="se">    \</span> <span class="p">:</span><span class="k">silent</span> <span class="k">edit</span><span class="p">&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="nb">map</span> <span class="p">&lt;</span><span class="k">buffer</span><span class="p">&gt;</span> <span class="p">[[</span>
<span class="se">    \</span> ?^\<span class="p">(</span>func\\<span class="p">|</span>var\\<span class="p">|</span><span class="nb">type</span>\\<span class="p">|</span><span class="k">import</span>\\<span class="p">|</span>package\<span class="p">)</span>\<span class="p">&gt;&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="nb">map</span> <span class="p">&lt;</span><span class="k">buffer</span><span class="p">&gt;</span> <span class="p">]]</span>
<span class="se">    \</span> /^\<span class="p">(</span>func\\<span class="p">|</span>var\\<span class="p">|</span><span class="nb">type</span>\\<span class="p">|</span><span class="k">import</span>\\<span class="p">|</span>package\<span class="p">)</span>\<span class="p">&gt;&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
</code></pre></div></div>

<p>Go only comes with <code class="language-plaintext highlighter-rouge">gofmt</code> but <code class="language-plaintext highlighter-rouge">goimports</code> is just one command away, so
there’s little excuse not to have it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go install golang.org/x/tools/cmd/goimports@latest
</code></pre></div></div>

<p>Thanks to GOPROXY, all Go dependencies are accessible without (or before)
installing Git, so this tool installation works with nothing more than
w64devkit and a bootstrapped Go toolchain.</p>

<h3 id="cgo-dlls">cgo DLLs</h3>

<p>The intricacies of cgo are beyond the scope of this article, but the gist
is that a Go source file contains C source in a comment followed by
<code class="language-plaintext highlighter-rouge">import "C"</code>. The imported <code class="language-plaintext highlighter-rouge">C</code> object provides access to C types and
functions. Go functions marked with an <code class="language-plaintext highlighter-rouge">//export</code> comment, as well as the
commented C code, are accessible to C. That second direction means we can use Go to
implement a C interface in a DLL, and the caller will have no idea they’re
actually talking to Go.</p>

<p>To illustrate, here’s a little C interface. To keep it simple, I’ve
specifically sidestepped some more complicated issues, particularly
involving memory management.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Which DLL am I running?</span>
<span class="kt">int</span> <span class="nf">version</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>

<span class="c1">// Generate 64 bits from a CSPRNG.</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="nf">rand64</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>

<span class="c1">// Compute the Euclidean norm.</span>
<span class="kt">float</span> <span class="nf">dist</span><span class="p">(</span><span class="kt">float</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">);</span>
</code></pre></div></div>

<p>Here’s a C implementation which I’m calling “version 1”.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;math.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;windows.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;ntsecapi.h&gt;</span><span class="cp">
</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">int</span>
<span class="nf">version</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>

<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span>
<span class="nf">rand64</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="n">x</span><span class="p">;</span>
    <span class="n">RtlGenRandom</span><span class="p">(</span><span class="o">&amp;</span><span class="n">x</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">x</span><span class="p">));</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>

<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">float</span>
<span class="nf">dist</span><span class="p">(</span><span class="kt">float</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">sqrtf</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As discussed in the previous article, each function is exported using
<code class="language-plaintext highlighter-rouge">__declspec</code> so that they’re available for import. As before:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -shared -Os -s -o hello1.dll hello1.c
</code></pre></div></div>

<p>Side note: This could be trivially converted into a C++ implementation
just by adding <code class="language-plaintext highlighter-rouge">extern "C"</code> to each declaration. It disables C++ features
like name mangling, and follows the C ABI so that the C++ functions appear
as C functions. Compiling the C++ DLL is exactly the same.</p>

<p>Suppose we wanted to implement this in Go instead of C. We already have
all the tools needed to do so. Here’s a Go implementation, “version 2”:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="s">"C"</span>
<span class="k">import</span> <span class="p">(</span>
	<span class="s">"crypto/rand"</span>
	<span class="s">"encoding/binary"</span>
	<span class="s">"math"</span>
<span class="p">)</span>

<span class="c">//export version</span>
<span class="k">func</span> <span class="n">version</span><span class="p">()</span> <span class="n">C</span><span class="o">.</span><span class="kt">int</span> <span class="p">{</span>
	<span class="k">return</span> <span class="m">2</span>
<span class="p">}</span>

<span class="c">//export rand64</span>
<span class="k">func</span> <span class="n">rand64</span><span class="p">()</span> <span class="n">C</span><span class="o">.</span><span class="n">ulonglong</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">buf</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">byte</span>
	<span class="n">rand</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
	<span class="n">r</span> <span class="o">:=</span> <span class="n">binary</span><span class="o">.</span><span class="n">LittleEndian</span><span class="o">.</span><span class="n">Uint64</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
	<span class="k">return</span> <span class="n">C</span><span class="o">.</span><span class="n">ulonglong</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="p">}</span>

<span class="c">//export dist</span>
<span class="k">func</span> <span class="n">dist</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="n">C</span><span class="o">.</span><span class="n">float</span><span class="p">)</span> <span class="n">C</span><span class="o">.</span><span class="n">float</span> <span class="p">{</span>
	<span class="k">return</span> <span class="n">C</span><span class="o">.</span><span class="n">float</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">Sqrt</span><span class="p">(</span><span class="kt">float64</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">)))</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note the use of C types for all arguments and return values. The <code class="language-plaintext highlighter-rouge">main</code>
function is required since this is the main package, but it will never be
called. The DLL is built like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go build -buildmode=c-shared -o hello2.dll hello2.go
</code></pre></div></div>

<p>Without the <code class="language-plaintext highlighter-rouge">-o</code> option, the DLL will lack an extension. This works fine
since the extension is mostly just convention on Windows, but its absence
may be confusing.</p>

<p>What if we need an import library? This will be required when linking with
the MSVC toolchain. In the previous article we asked Binutils to generate
one using <code class="language-plaintext highlighter-rouge">--out-implib</code>. For Go we have to handle this ourselves via
<code class="language-plaintext highlighter-rouge">gendef</code> and <code class="language-plaintext highlighter-rouge">dlltool</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gendef hello2.dll
$ dlltool -l hello2.lib -d hello2.def
</code></pre></div></div>

<p>The only way anyone upgrading would know version 2 was implemented in Go
is that the DLL is a lot bigger (a few MB vs. a few kB) since it now
contains an entire Go runtime.</p>

<h3 id="nasm-assembly-dll">NASM assembly DLL</h3>

<p>We could also go the other direction and implement the DLL using plain
assembly. It won’t even require linking against a C runtime.</p>

<p>w64devkit includes two assemblers: GAS (Binutils) which is used by GCC,
and NASM which has <a href="https://elronnd.net/writ/2021-02-13_att-asm.html">friendlier syntax</a>. I prefer the latter whenever
possible — exactly why I included NASM in the distribution. So here’s how
I implemented “version 3” in NASM assembly.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">bits</span> <span class="mi">64</span>

<span class="nf">section</span> <span class="nv">.text</span>

<span class="nf">global</span> <span class="nb">Dl</span><span class="nv">lMainCRTStartup</span>
<span class="nf">export</span> <span class="nb">Dl</span><span class="nv">lMainCRTStartup</span>
<span class="nl">DllMainCRTStartup:</span>
	<span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="mi">1</span>
	<span class="nf">ret</span>

<span class="nf">global</span> <span class="nv">version</span>
<span class="nf">export</span> <span class="nv">version</span>
<span class="nl">version:</span>
	<span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="mi">3</span>
	<span class="nf">ret</span>

<span class="nf">global</span> <span class="nv">rand64</span>
<span class="nf">export</span> <span class="nv">rand64</span>
<span class="nl">rand64:</span>
	<span class="nf">rdrand</span> <span class="nb">rax</span>
	<span class="nf">ret</span>

<span class="nf">global</span> <span class="nb">di</span><span class="nv">st</span>
<span class="nf">export</span> <span class="nb">di</span><span class="nv">st</span>
<span class="nl">dist:</span>
	<span class="nf">mulss</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm0</span>
	<span class="nf">mulss</span>  <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm1</span>
	<span class="nf">addss</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm1</span>
	<span class="nf">sqrtss</span> <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm0</span>
	<span class="nf">ret</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">global</code> directive is common in NASM assembly and causes the named
symbol to have the external linkage needed when linking the DLL. The
<code class="language-plaintext highlighter-rouge">export</code> directive is Windows-specific and is equivalent to <code class="language-plaintext highlighter-rouge">dllexport</code> in
C.</p>

<p>Every DLL must have an entrypoint, usually named <code class="language-plaintext highlighter-rouge">DllMainCRTStartup</code>. The
return value indicates if the DLL successfully loaded. So far this has
been handled automatically by the C implementation, but at this low level
we must define it explicitly.</p>

<p>Here’s how to assemble and link the DLL:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nasm -fwin64 -o hello3.o hello3.s
$ ld -shared -s -o hello3.dll hello3.o
</code></pre></div></div>

<h3 id="call-the-dlls-from-python">Call the DLLs from Python</h3>

<p>Python has a nice, built-in C interop, <code class="language-plaintext highlighter-rouge">ctypes</code>, that allows Python to
call arbitrary C functions in shared libraries, including DLLs, without
writing C to glue it together. To tie this all off, here’s a Python
program that loads all of the DLLs above and invokes each of the
functions:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">ctypes</span>

<span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="n">version</span><span class="p">):</span>
    <span class="n">hello</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">CDLL</span><span class="p">(</span><span class="sa">f</span><span class="s">"./hello</span><span class="si">{</span><span class="n">version</span><span class="si">}</span><span class="s">.dll"</span><span class="p">)</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">version</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_int</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">version</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">()</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">dist</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_float</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">dist</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">(</span><span class="n">ctypes</span><span class="p">.</span><span class="n">c_float</span><span class="p">,</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_float</span><span class="p">)</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">rand64</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_ulonglong</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">rand64</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">()</span>
    <span class="k">return</span> <span class="n">hello</span>

<span class="k">for</span> <span class="n">hello</span> <span class="ow">in</span> <span class="n">load</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">load</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span> <span class="n">load</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"version"</span><span class="p">,</span> <span class="n">hello</span><span class="p">.</span><span class="n">version</span><span class="p">())</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"rand   "</span><span class="p">,</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">hello</span><span class="p">.</span><span class="n">rand64</span><span class="p">()</span><span class="si">:</span><span class="mi">016</span><span class="n">x</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"dist   "</span><span class="p">,</span> <span class="n">hello</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
</code></pre></div></div>

<p>After loading the DLL with <code class="language-plaintext highlighter-rouge">CDLL</code> the program defines each function
prototype so that Python knows how to call it. Unfortunately it’s not
possible to build Python with w64devkit, so you’ll also need to install
the standard CPython distribution in order to run it. Here’s the output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python finale.py
version 1
rand    b011ea9bdbde4bdf
dist    5.0
version 2
rand    f7c86ff06ae3d1a2
dist    5.0
version 3
rand    2a35a05b0482c898
dist    5.0
</code></pre></div></div>
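
<p>The prototypes in <code class="language-plaintext highlighter-rouge">load</code> are not just decoration. When no <code class="language-plaintext highlighter-rouge">restype</code> is
set, <code class="language-plaintext highlighter-rouge">ctypes</code> assumes the function returns a C <code class="language-plaintext highlighter-rouge">int</code>, which would cut a
<code class="language-plaintext highlighter-rouge">rand64</code> result down to its low bits (32 of them on the usual Windows and
Linux targets). Here’s a small sketch of that effect that needs none of the
DLLs. It reuses the first <code class="language-plaintext highlighter-rouge">rand64</code> value from the output above, and the
helper names <code class="language-plaintext highlighter-rouge">as_c_int</code> and <code class="language-plaintext highlighter-rouge">as_c_ulonglong</code> are made up for illustration:</p>

```python
import ctypes

# First rand64 value from the sample output above.
SAMPLE = 0xb011ea9bdbde4bdf

def as_c_int(value):
    """Show value as it would look through the default c_int result type."""
    # ctypes integer types do no overflow checking: extra bits are dropped.
    return ctypes.c_int(value).value

def as_c_ulonglong(value):
    """Keep all 64 bits, matching the restype declared for rand64."""
    return ctypes.c_ulonglong(value).value

print(f"{as_c_ulonglong(SAMPLE):016x}")  # all 64 bits survive
print(as_c_int(SAMPLE))                  # only the low bits, read as signed
```

<p>This only imitates the conversion with <code class="language-plaintext highlighter-rouge">ctypes</code> value types. It does
not call into a DLL, so treat it as an illustration rather than a test of
the libraries themselves.</p>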

<p>That output is the result of four different languages interfacing in one
process: C, Go, x86-64 assembly, and Python. Pretty neat if you ask me!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Conventions for Command Line Options</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/08/01/"/>
    <id>urn:uuid:9be2ce0e-298e-4085-8789-49674aecfeeb</id>
    <updated>2020-08-01T00:34:23Z</updated>
    <category term="tutorial"/><category term="posix"/><category term="c"/><category term="python"/><category term="go"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=24020952">on Hacker News</a> and critiqued <a href="https://utcc.utoronto.ca/~cks/space/blog/unix/MyOptionsConventions">on
Wandering Thoughts</a> (<a href="https://utcc.utoronto.ca/~cks/space/blog/unix/UnixOptionsConventions">2</a>, <a href="https://utcc.utoronto.ca/~cks/space/blog/python/ArgparseSomeUnixNotes">3</a>).</em></p>

<p>Command line interfaces have varied throughout their brief history but
have largely converged to some common, sound conventions. The core
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html">originates from unix</a>, and the Linux ecosystem extended it,
particularly via the GNU project. Unfortunately some tools initially
<em>appear</em> to follow the conventions, but subtly get them wrong, usually
for no practical benefit. I believe in many cases the authors simply
didn’t know any better, so I’d like to review the conventions.</p>

<!--more-->

<h3 id="short-options">Short Options</h3>

<p>The simplest case is the <em>short option</em> flag. An option is a hyphen —
specifically HYPHEN-MINUS U+002D — followed by one alphanumeric
character. Capital letters are acceptable. The letters themselves <a href="http://www.catb.org/~esr/writings/taoup/html/ch10s05.html">have
conventional meanings</a> and are worth following if possible.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b -c
</code></pre></div></div>

<p>Flags can be grouped together into one program argument. This is both
convenient and unambiguous. It’s also one of those often missed details
when programs use hand-coded argument parsers, and the lack of support
irritates me.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -abc
program -acb
</code></pre></div></div>

<p>The next simplest case is a short option that takes an argument. The
argument follows the option.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -i input.txt -o output.txt
</code></pre></div></div>

<p>The space is optional, so the option and argument can be packed together
into one program argument. Since the argument is required, this is still
unambiguous. This is another often-missed feature in hand-coded parsers.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -iinput.txt -ooutput.txt
</code></pre></div></div>

<p>This does not prohibit grouping. When grouped, the option accepting an
argument must be last.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -abco output.txt
program -abcooutput.txt
</code></pre></div></div>
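
<p>To make the grouping rules concrete, here’s a minimal sketch of a short
option parser in Python. It is not the API of any real library:
<code class="language-plaintext highlighter-rouge">parse_short</code> and its parameters are hypothetical names, and it assumes
that options come before non-options.</p>

```python
def parse_short(argv, flags, with_arg):
    """Parse short options and return (options, non_options).

    flags:    letters that take no argument, e.g. "abc"
    with_arg: letters that require an argument, e.g. "io"
    """
    options = []
    args = list(argv)
    while args:
        arg = args[0]
        # A lone "-" or anything not starting with "-" is a non-option.
        if arg == "-" or not arg.startswith("-"):
            break
        args.pop(0)
        if arg == "--":
            break
        group = arg[1:]
        while group:
            letter, group = group[0], group[1:]
            if letter in flags:
                options.append((letter, None))
            elif letter in with_arg:
                # The rest of the group is the argument, else the next argv.
                if group:
                    value = group
                elif args:
                    value = args.pop(0)
                else:
                    raise ValueError(f"option -{letter} requires an argument")
                options.append((letter, value))
                group = ""
            else:
                raise ValueError(f"unknown option -{letter}")
    return options, args

print(parse_short(["-abco", "output.txt"], "abc", "o"))
print(parse_short(["-abcooutput.txt"], "abc", "o"))
```

<p>With <code class="language-plaintext highlighter-rouge">flags="abc"</code> and <code class="language-plaintext highlighter-rouge">with_arg="o"</code>, both spellings yield the same four
options, because whatever follows <code class="language-plaintext highlighter-rouge">o</code> in the group is taken as its
argument.</p>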

<p>This technique is used to create another category, <em>optional option
arguments</em>. The option’s argument can be optional but still unambiguous
so long as the space is always omitted when the argument is present.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -c       # omitted
program -cblue   # provided
program -c blue  # omitted (blue is a new argument)

program -c -x   # two separate flags
program -c-x    # -c with argument "-x"
</code></pre></div></div>

<p>Optional option arguments should be used judiciously since they can be
surprising, but they have their uses.</p>
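
<p>The rule is simple enough to sketch in a few lines of Python. The function
below is a hypothetical helper, not part of any library, and it only looks
at a single leading option. The point is that it never peeks at the
following argument:</p>

```python
def parse_optional(argv, letter):
    """Return (present, value, rest) for one leading option "-<letter>".

    The value is taken only when it is attached to the option itself.
    """
    prefix = "-" + letter
    if not argv or not argv[0].startswith(prefix):
        return False, None, list(argv)
    attached = argv[0][len(prefix):]
    # An empty remainder means the argument was omitted.
    return True, attached or None, list(argv[1:])

for example in (["-c"], ["-cblue"], ["-c", "blue"], ["-c-x"]):
    print(example, parse_optional(example, "c"))
```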

<p>Options can typically appear in any order — something parsers often
achieve via <em>permutation</em> — but non-options typically follow options.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b foo bar
program -b -a foo bar
</code></pre></div></div>

<p>GNU-style programs usually allow options and non-options to be mixed,
though I don’t consider this to be essential.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a foo -b bar
program foo -a -b bar
program foo bar -a -b
</code></pre></div></div>

<p>If a non-option looks like an option because it starts with a hyphen,
use <code class="language-plaintext highlighter-rouge">--</code> to demarcate options from non-options.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b -- -x foo bar
</code></pre></div></div>

<p>An advantage of requiring that non-options follow options is that the
first non-option demarcates the two groups, so <code class="language-plaintext highlighter-rouge">--</code> is less often
needed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># note: without argument permutation
program -a -b foo -x bar  # 2 options, 3 non-options
</code></pre></div></div>
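
<p>Python’s standard <code class="language-plaintext highlighter-rouge">getopt</code> module offers both behaviors, which makes for a
quick comparison. This sketch assumes <code class="language-plaintext highlighter-rouge">POSIXLY_CORRECT</code> is not set in the
environment, since <code class="language-plaintext highlighter-rouge">gnu_getopt</code> stops permuting when it is:</p>

```python
import getopt

argv = ["-a", "foo", "-b", "bar"]

# POSIX style: parsing stops at the first non-option.
posix_opts, posix_args = getopt.getopt(argv, "ab")

# GNU style: options and non-options may be mixed (permutation).
gnu_opts, gnu_args = getopt.gnu_getopt(argv, "ab")

# "--" ends option parsing, so "-b" is left as a non-option here.
dd_opts, dd_args = getopt.gnu_getopt(["-a", "--", "-b", "foo"], "ab")

print(posix_opts, posix_args)
print(gnu_opts, gnu_args)
print(dd_opts, dd_args)
```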

<h3 id="long-options">Long options</h3>

<p>Since short options can be cryptic, and there are such a limited number
of them, more complex programs support long options. A long option
starts with two hyphens followed by one or more alphanumeric, lowercase
words. Hyphens separate words. Using two hyphens prevents long options
from being confused for grouped short options.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --reverse --ignore-backups
</code></pre></div></div>

<p>Occasionally flags are paired with a mutually exclusive inverse flag
that begins with <code class="language-plaintext highlighter-rouge">--no-</code>. This avoids a future <em>flag day</em> where the
default is changed in the release that also adds the flag implementing
the original behavior.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --sort
program --no-sort
</code></pre></div></div>

<p>Long options can similarly accept arguments.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --output output.txt --block-size 1024
</code></pre></div></div>

<p>These may optionally be connected to the argument with an equals sign
<code class="language-plaintext highlighter-rouge">=</code>, much like omitting the space for a short option argument.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --output=output.txt --block-size=1024
</code></pre></div></div>

<p>Like before, this opens up the doors for optional option arguments. Due
to the required <code class="language-plaintext highlighter-rouge">=</code> this is still unambiguous.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --color --reverse
program --color=never --reverse
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--</code> retains its original behavior of disambiguating option-like
non-option arguments:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program --reverse -- --foo bar
</code></pre></div></div>
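
<p>The long option rules can be sketched the same way. As before, this is an
illustration and not a real library: <code class="language-plaintext highlighter-rouge">parse_long</code> and the
<code class="language-plaintext highlighter-rouge">"none"</code>/<code class="language-plaintext highlighter-rouge">"required"</code>/<code class="language-plaintext highlighter-rouge">"optional"</code> labels are hypothetical, and it handles
only long options placed before the non-options.</p>

```python
def parse_long(argv, spec):
    """Parse long options and return (options, non_options).

    spec maps an option name to "none", "required", or "optional".
    """
    options = {}
    args = list(argv)
    while args:
        arg = args[0]
        if not arg.startswith("--"):
            break
        args.pop(0)
        if arg == "--":
            break
        name, eq, value = arg[2:].partition("=")
        kind = spec.get(name)
        if kind is None:
            raise ValueError(f"unknown option --{name}")
        if kind == "none":
            if eq:
                raise ValueError(f"option --{name} takes no argument")
            options[name] = True
        elif kind == "required":
            if not eq:
                # No "=", so the argument is the next program argument.
                if not args:
                    raise ValueError(f"option --{name} requires an argument")
                value = args.pop(0)
            options[name] = value
        else:
            # Optional: only "=" attaches an argument; never consume argv.
            options[name] = value if eq else None
    return options, args

spec = {"color": "optional", "reverse": "none", "output": "required"}
print(parse_long(["--color", "--reverse"], spec))
print(parse_long(["--color=never", "--output", "output.txt"], spec))
print(parse_long(["--reverse", "--", "--foo", "bar"], spec))
```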

<h3 id="subcommands">Subcommands</h3>

<p>Some programs, such as Git, have subcommands each with their own
options. The main program itself may still have its own options distinct
from subcommand options. The program’s options come before the
subcommand and subcommand options follow the subcommand. Options are
never permuted around the subcommand.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>program -a -b -c subcommand -x -y -z
program -abc subcommand -xyz
</code></pre></div></div>

<p>Above, the <code class="language-plaintext highlighter-rouge">-a</code>, <code class="language-plaintext highlighter-rouge">-b</code>, and <code class="language-plaintext highlighter-rouge">-c</code> options are for <code class="language-plaintext highlighter-rouge">program</code>, and the
others are for <code class="language-plaintext highlighter-rouge">subcommand</code>. So, really, the subcommand is another
command line of its own.</p>
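
<p>One way to picture this is two parsing passes that meet at the subcommand
name. Here’s a rough sketch using Python’s <code class="language-plaintext highlighter-rouge">getopt.getopt</code>, which stops at
the first non-option. The function name <code class="language-plaintext highlighter-rouge">split_command_line</code> is made up for
this example:</p>

```python
import getopt

def split_command_line(argv, program_opts, subcommand_opts):
    """Parse program options, then the subcommand and its own options."""
    # getopt.getopt stops at the first non-option: the subcommand name.
    opts, rest = getopt.getopt(argv, program_opts)
    if not rest:
        raise ValueError("missing subcommand")
    subcommand, subargv = rest[0], rest[1:]
    # Everything after the subcommand is a command line of its own.
    subopts, subargs = getopt.getopt(subargv, subcommand_opts)
    return opts, subcommand, subopts, subargs

print(split_command_line(["-abc", "subcommand", "-xyz"], "abc", "xyz"))
```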

<h3 id="option-parsing-libraries">Option parsing libraries</h3>

<p>There’s little excuse for not getting these conventions right assuming
you’re interested in following the conventions. Short options can be
parsed correctly in <a href="https://github.com/skeeto/getopt">just ~60 lines of C code</a>. Long options are
<a href="https://github.com/skeeto/optparse">just slightly more complex</a>.</p>

<p>GNU’s <code class="language-plaintext highlighter-rouge">getopt_long()</code> supports long option abbreviation — with no way to
disable it (!) — but <a href="https://utcc.utoronto.ca/~cks/space/blog/python/ArgparseAbbreviatedOptions">this should be avoided</a>.</p>

<p>Go’s <a href="https://golang.org/pkg/flag/">flag package</a> intentionally deviates from the conventions.
It only supports long option semantics, via a single hyphen. This makes
it impossible to support grouping even if all options are only one
letter. Also, the only way to combine option and argument into a single
command line argument is with <code class="language-plaintext highlighter-rouge">=</code>. It’s sound, but I miss both features
every time I write programs in Go. That’s why I <a href="https://github.com/skeeto/optparse-go">wrote my own argument
parser</a>. Not only does it have a nicer feature set, I like the API a
lot more, too.</p>

<p>Python’s primary option parsing library is <code class="language-plaintext highlighter-rouge">argparse</code>, and I just can’t
stand it. Despite appearing to follow convention, it actually breaks
convention <em>and</em> its behavior is unsound. For instance, the following
program has two options, <code class="language-plaintext highlighter-rouge">--foo</code> and <code class="language-plaintext highlighter-rouge">--bar</code>. The <code class="language-plaintext highlighter-rouge">--foo</code> option accepts
an optional argument, and the <code class="language-plaintext highlighter-rouge">--bar</code> option is a simple flag.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="p">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
<span class="n">parser</span><span class="p">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">'--foo'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=</span><span class="s">'?'</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="s">'X'</span><span class="p">)</span>
<span class="n">parser</span><span class="p">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">'--bar'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s">'store_true'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">parser</span><span class="p">.</span><span class="n">parse_args</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">:]))</span>
</code></pre></div></div>

<p>Here are some example runs:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python parse.py
Namespace(bar=False, foo='X')

$ python parse.py --foo
Namespace(bar=False, foo=None)

$ python parse.py --foo=arg
Namespace(bar=False, foo='arg')

$ python parse.py --bar --foo
Namespace(bar=True, foo=None)

$ python parse.py --foo arg
Namespace(bar=False, foo='arg')
</code></pre></div></div>

<p>Everything looks good except the last. If the <code class="language-plaintext highlighter-rouge">--foo</code> argument is
optional then why did it consume <code class="language-plaintext highlighter-rouge">arg</code>? What happens if I follow it with
<code class="language-plaintext highlighter-rouge">--bar</code>? Will it consume it as the argument?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python parse.py --foo --bar
Namespace(bar=True, foo=None)
</code></pre></div></div>

<p>Nope! Unlike <code class="language-plaintext highlighter-rouge">arg</code>, it left <code class="language-plaintext highlighter-rouge">--bar</code> alone, so instead of following the
unambiguous conventions, it has its own ambiguous semantics and attempts
to remedy them with a “smart” heuristic: “If an optional argument <em>looks
like</em> an option, then it must be an option!” Non-option arguments can
never follow an option with an optional argument, which makes that
feature pretty useless. Since <code class="language-plaintext highlighter-rouge">argparse</code> does not properly support <code class="language-plaintext highlighter-rouge">--</code>,
that does not help.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python parse.py --foo -- arg
usage: parse.py [-h] [--foo [FOO]] [--bar]
parse.py: error: unrecognized arguments: -- arg
</code></pre></div></div>

<p>Please, stick to the conventions unless you have <em>really</em> good reasons
to break them!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A Go Module Testbed</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/02/13/"/>
    <id>urn:uuid:838f3d56-f5d0-4422-be45-277a175e5daf</id>
    <updated>2020-02-13T01:03:24Z</updated>
    <category term="go"/>
    <content type="html">
      <![CDATA[<p>I had <a href="/blog/2020/01/21/">recently lamented</a> that due to Go’s strict module security
policy it was unreasonably difficult to experiment and practice with
modules. Modules can only be fetched from servers with valid TLS
certificates, including both the module path and repository servers.
Setting up a small, local experiment meant creating a certificate
authority, generating and signing certificates, and installing these all
in the right places. I’d much rather relax Go’s security policy for the
experiment.</p>

<p>As a result of that complaint, <a href="https://github.com/golang/go/issues/36746">I learned</a> that the upcoming Go
1.14 has a new feature: <a href="https://golang.org/doc/go1.14#go-env-vars"><code class="language-plaintext highlighter-rouge">GOINSECURE</code></a>. It’s like the old
<code class="language-plaintext highlighter-rouge">-insecure</code> option, but safer due to being finer grained: a whitelist of
exceptions. It’s exactly what I needed. Since then I’ve been using it to
run small module experiments. It started as some scripts, but I
eventually formalized it into its own little project.</p>

<p><strong><a href="https://github.com/skeeto/go-module-testbed">https://github.com/skeeto/go-module-testbed</a></strong> [requires Go 1.14]</p>

<!--more-->

<p>It’s first and foremost a shell script, and the Go source is only there
as a server. The interface is like a <a href="https://docs.python.org/3/tutorial/venv.html">Python virtual environment</a>
where “activating” the environment in a shell allows Go run from that
shell to interact with the testbed servers. The script establishes the
testbed environment and starts both servers in that environment. It
optionally accepts a testbed directory as an argument, defaulting to the
working directory as the testbed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./go-module-testbed
</code></pre></div></div>

<p>In addition to running the servers in the foreground, the script
populates the testbed directory with an <code class="language-plaintext highlighter-rouge">activate</code> script, <code class="language-plaintext highlighter-rouge">src/</code>
containing module Git repositories, and <code class="language-plaintext highlighter-rouge">www/</code> containing the static web
server contents. These are initialized with a module named
<code class="language-plaintext highlighter-rouge">127.0.0.1/example</code> at v1.0.0. Why not <code class="language-plaintext highlighter-rouge">localhost</code> as the domain? The
domain part of a module path must contain at least one dot, and IP
addresses are acceptable.</p>

<p>The server logs requests to standard output so you can see each request
Go makes to the server. This has been an important part of learning
what exactly Go is requesting from the web server hosting the module
path.</p>

<p>There’s one giant caveat: Modules <em>must</em> be hosted on a privileged port.
Normally that’s 443 (HTTPS), though in this case it’s 80 (HTTP). Since
it’s a privileged port, you’ll need to do some system configuration. On
Linux it’s easy enough just to temporarily forward the testbed port 8001
to port 80.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># iptables -t nat -I OUTPUT -p tcp -d 127.0.0.1 \
           --dport 80 -j REDIRECT --to-ports 8001
</code></pre></div></div>

<p>Unfortunately this means, outside of doing something with namespaces or
containers, there can only be one testbed per host at a time. My goal
is just to run small, local, temporary experiments, so this isn’t a big
deal for me, but I wish it could be better.</p>

<h3 id="activating-the-environment">Activating the environment</h3>

<p>With the server running and the port forwarding configured, source the
<code class="language-plaintext highlighter-rouge">activate</code> script from a shell:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ source activate
</code></pre></div></div>

<p>This sets up an isolated, disposable <code class="language-plaintext highlighter-rouge">GOPATH</code> so that the testbed is
completely isolated from your normal development. It also updates
<code class="language-plaintext highlighter-rouge">PATH</code>, unconditionally enables modules (<code class="language-plaintext highlighter-rouge">GO111MODULE=on</code>), whitelists
the testbed servers in <code class="language-plaintext highlighter-rouge">GOINSECURE</code>, and sets <code class="language-plaintext highlighter-rouge">GOPRIVATE</code> so that the
testbed modules don’t leak anywhere outside the testbed environment.</p>

<p>To ensure that it’s all working, try installing the <code class="language-plaintext highlighter-rouge">demo</code> command
from the example module:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go get 127.0.0.1/example/cmd/demo
go: downloading 127.0.0.1/example v1.0.0
go: found 127.0.0.1/example/cmd/demo in 127.0.0.1/example v1.0.0
$ demo
Example v1.0.0
</code></pre></div></div>

<p>Non-testbed modules are still accessible like normal, though all fetched
and built artifacts are isolated in the testbed environment:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go get nullprogram.com/x/passphrase2pgp
$ go get golang.org/x/tools/cmd/goimports
</code></pre></div></div>

<p>So you can mix your experiments and practice with real modules.</p>

<h3 id="running-experiments">Running experiments</h3>

<p>From here you could practice creating a new minor version of the example
module, and see how it appears to the module’s users.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sed -i s/v1.0.0/v1.1.0/ src/example/example.go 
$ git -C src/example/ commit -a -m 'Bump to v1.1.0'
[master 7a3cf82] Bump to v1.1.0
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git -C src/example/ tag -a v1.1.0 -m v1.1.0
$ go get 127.0.0.1/example/cmd/demo
go: downloading 127.0.0.1/example v1.1.0
go: found 127.0.0.1/example/cmd/demo in 127.0.0.1/example v1.1.0
$ demo
Example v1.1.0
</code></pre></div></div>

<p>Or try something more challenging: Release a v2.0.0, which requires changing the
module path.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd src/example/
$ go mod edit -module 127.0.0.1/example/v2 go.mod
$ sed -i s/v1.0.0/v2.0.0/ example.go 
$ git commit -a -m 'Bump to v2.0.0'
[master bf5c4cf] Bump to v2.0.0
 2 files changed, 2 insertions(+), 2 deletions(-)
$ git tag -a v2.0.0 -m v2.0.0
$ cd ../../www/example/
$ mkdir v2
$ sed 's#e git#e/v2 git#' index.html &gt;v2/index.html
$ cd ../../
$ go get 127.0.0.1/example/v2/cmd/demo
go: downloading 127.0.0.1/example/v2 v2.0.0
go: downloading 127.0.0.1/example v1.0.0
go: found 127.0.0.1/example/v2/cmd/demo in 127.0.0.1/example/v2 v2.0.0
go: finding module for package 127.0.0.1/example
go: found 127.0.0.1/example in 127.0.0.1/example v1.0.0
</code></pre></div></div>

<p>I was able to figure this all out specifically because of my testbed.
Adding a <code class="language-plaintext highlighter-rouge">/v2</code> module path on the web server was not obvious, and it’s
glossed over in the tutorials.</p>

<h3 id="nested-modules">Nested modules</h3>

<p>One of the under-documented corners of Go modules is <em>nested modules</em>.
That is, repositories that contain more than one module. (Note: These
are not called <em>submodules</em> since that would be confusing in the context
of Git.) The Go module testbed is a great place to try them out, and to
learn why they should never be used. Even if I never plan to use them, I
still want to understand them since I might need to debug them someday.</p>

<p>There are two tricky parts to nested modules: the version tag and the
module path. Neither is documented as far as I’ve seen, so I had to
figure them out from official examples.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir src/example/nested
$ cd src/example/nested/
$ go mod init 127.0.0.1/example/nested
go: creating new go.mod: module 127.0.0.1/example/nested
$ echo package nested &gt;nested.go
$ git add .
$ git commit -m 'Add a nested module'
[master c5b1a29] Add a nested module
 2 files changed, 4 insertions(+)
 create mode 100644 nested/go.mod
 create mode 100644 nested/nested.go
$ git tag -a nested/v1.2.3 -m v1.2.3
$ cd ../../../
$ mkdir www/example/nested
$ cp www/example/index.html www/example/nested/
$ go get 127.0.0.1/example/nested
go: downloading 127.0.0.1/example v1.0.0
go: downloading 127.0.0.1/example/nested v1.2.3
go: 127.0.0.1/example/nested upgrade =&gt; v1.2.3
</code></pre></div></div>

<p>Module versions are derived from the Git tag, which is global to the
repository. So how are nested module versions indicated? They get
namespaced tags, as shown above with <code class="language-plaintext highlighter-rouge">nested/v1.2.3</code>. If I didn’t create
this tag, it would be as if I didn’t tag any version of that module.</p>
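
<p>The tag naming rule fits in a couple of lines. This is only a sketch of the
pattern seen above, with a made-up function name, and it ignores the extra
wrinkles of major versions beyond v1:</p>

```python
def version_tag(module_dir, version):
    """Return the Git tag for a module located at module_dir in the repo.

    module_dir is relative to the repository root ("" for the root module).
    """
    module_dir = module_dir.strip("/")
    return f"{module_dir}/{version}" if module_dir else version

print(version_tag("", "v1.1.0"))
print(version_tag("nested", "v1.2.3"))
```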

<p>The second unintuitive part is the web server’s response to <code class="language-plaintext highlighter-rouge">?go-get=1</code>.
At the nested module path, the response <em>must</em> indicate the containing
module and where to get it. In other words, it’s the same response as
the containing module, which is why I merely copied <code class="language-plaintext highlighter-rouge">index.html</code>.
Returning a 404 for the module path is no good — another thing I’ve
learned from the module testbed.</p>
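
<p>To make that concrete, here’s a minimal sketch of a handler that gives the
same <code class="language-plaintext highlighter-rouge">go-import</code> answer for the containing module and for every path
beneath it. The repository URL and the function names are assumptions for
illustration, not what my testbed actually serves, and the request is
simulated with <code class="language-plaintext highlighter-rouge">httptest</code> rather than a real HTTPS server.</p>

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hypothetical values: the import prefix of the containing module and an
// assumed location for its Git repository.
const (
	modulePrefix = "127.0.0.1/example"
	repoURL      = "https://127.0.0.1/example.git"
)

// goImportPage returns the HTML that answers a ?go-get=1 request. The meta
// content has the form "import-prefix vcs repo-root".
func goImportPage(prefix, repo string) string {
	return fmt.Sprintf("<!DOCTYPE html>\n<meta name=\"go-import\" content=\"%s git %s\">\n",
		html.EscapeString(prefix), html.EscapeString(repo))
}

// covers reports whether host+path is the module root or somewhere below
// it, such as a nested module.
func covers(prefix, host, path string) bool {
	full := host + strings.TrimSuffix(path, "/")
	return full == prefix || strings.HasPrefix(full, prefix+"/")
}

// handler answers with the containing module's page for any covered path
// and with a 404 for everything else.
func handler(w http.ResponseWriter, r *http.Request) {
	if !covers(modulePrefix, r.Host, r.URL.Path) {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, goImportPage(modulePrefix, repoURL))
}

func main() {
	// Simulate the request made for the nested module path.
	req := httptest.NewRequest("GET", "https://127.0.0.1/example/nested?go-get=1", nil)
	rec := httptest.NewRecorder()
	handler(rec, req)
	fmt.Print(rec.Body.String())
}
```

<p>The point of the sketch is that the nested path gets the containing
module’s prefix in the response, not its own.</p>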

<p>There are still many things I have yet to try or practice in my module
testbed. It’s great that now when I have a niggling question about
modules or <code class="language-plaintext highlighter-rouge">go get</code> behavior, I can get an answer within a minute or so
without needing to dig through useless online search results.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Go's Tooling is an Undervalued Technology</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/01/21/"/>
    <id>urn:uuid:fdb2d255-a6d0-4b91-86c7-013dc86ddade</id>
    <updated>2020-01-21T23:59:59Z</updated>
    <category term="go"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=22113827">on Hacker News</a>, <a href="https://old.reddit.com/r/golang/comments/es621w/gos_tooling_is_an_undervalued_technology/">on reddit</a>, and
<a href="https://lobste.rs/s/vycqgn/go_s_tooling_is_undervalued_technology">on Lobsters</a>.</em></p>

<p>Regardless of your opinions of the <a href="https://golang.org/">Go programming language</a>, the
primary implementation of Go, gc, is an incredible piece of software
engineering. Everyone ought to be blown away by it! Yet not only is it
undervalued in general, even the Go community itself doesn’t fully
appreciate it. It’s not perfect, but it has unique features never before
seen in a toolchain.</p>

<!--more-->

<p>In this article, when I say “Go” I’m referring to the gc compiler.</p>

<h3 id="building-go">Building Go</h3>

<p>Since Go 1.5, Go is implemented in Go. It also has no external
dependencies, so to build Go only a Go compiler is required. On my
laptop, building the latest version of Go takes only 43 seconds. That
includes the compiler, linker, all cross-compilers, and the standard
library.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar xzf go1.13.6.src.tar.gz
$ cd go/src/
$ ./make.bash
</code></pre></div></div>

<p>Cross-compiling Go — as in to build a Go toolchain for another platform
supported by Go — only requires setting a couple of environment
variables (<code class="language-plaintext highlighter-rouge">GOOS</code>, <code class="language-plaintext highlighter-rouge">GOARCH</code>). So, in a mere 43 seconds I can compile an
<em>entire toolchain</em> for <em>any supported host</em>! If you already have a Go
compiler on your system, there’s no reason to bother with binary
releases. Just grab the source and build it. All can manage their own
toolchain with ease!</p>

<p>Anyone who’s ever built a GCC or Clang+LLVM toolchain, <em>especially</em>
anyone who’s built cross-compiler toolchains, should find this situation
totally bonkers. How could it possibly be so easy and so fast? GCC’s
configure script <a href="/blog/2017/03/30/">wouldn’t even finish</a> before Go was already
built.</p>

<p>Yes, this comparison is a bit apples and oranges. Both GCC and LLVM are
more advanced compilers and produce much more efficient code, so of
course there’s more to them, and of course they take longer to build.
But does that completely justify the difference? This goes double for
GCC and LLVM cross-compiler toolchains, which are, for the most part,
very complex and difficult to build.</p>

<p>If you don’t already have Go, all you need is a C compiler and the Go
1.4 source code. <a href="/blog/2016/11/17/">Bootstrapping</a> through Go 1.4 is easy, and
I’ve done it a number of times. I keep a copy of the Go 1.4 tarball just
for this reason.</p>

<p>How Go could improve: <a href="http://golang.org/s/better-linker">The linker could be better</a>. Binaries are
already too big, and getting bigger with each release. This problem is
acknowledged by the Go developers:</p>

<blockquote>
  <p>The original linker was also simpler than it is now and its
implementation fit in <a href="https://en.wikipedia.org/wiki/Ken_Thompson">one Turing award winner’s head</a>, so there’s
little abstraction or modularity. Unfortunately, as the linker grew
and evolved, it retained its lack of structure, and our sole Turing
award winner retired.</p>
</blockquote>

<p>The story for native interop (cgo) <a href="https://dave.cheney.net/2016/01/18/cgo-is-not-go">isn’t great either</a> and
requires trading away Go’s biggest strengths.</p>

<h3 id="package-management">Package Management</h3>

<p>Go has decentralized package management — or, more accurately, <em>module</em>
management. There’s no central module manager or module registration. To
use a Go module, it need only be hosted on a reachable network with a
valid HTTPS certificate. Modules are named by a module path that
includes its network location. This means there’s no land grab for
<a href="https://github.com/dateutil/dateutil/issues/984">popular module names</a>.</p>

<p>An organization using Go does not need to trust an external package
repository (PyPI, etc.), nor do they need to run an internal package
repository for their own internal packages. In general it’s sufficient
just to leverage the organization’s already-existing source control
system.</p>

<p>Dependencies are locked to a particular version cryptographically. The
upstream source cannot change a published module for those that already
depend on it. They <em>could</em> still publish a new version with <a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident">hostile
changes</a>, but one should be cautious about updating dependencies —
a deliberate action — <a href="https://research.swtch.com/deps">or even having dependencies in the first
place</a> (<a href="https://feeding.cloud.geek.nz/posts/outsourcing-webapp-maintenance-to-debian/">also</a>).</p>

<p>With decentralized module management, you might think that each
dependency host is a single point of failure — and you would be exactly
right. If any dependency disappears, you can no longer build in a fresh
checkout. Go has a solution for this: a <em>module proxy</em>. Before fetching
the dependency directly, Go (optionally, configured via <code class="language-plaintext highlighter-rouge">GOPROXY</code>)
checks with a module proxy that may have cached the dependency. This
eliminates the single point of failure. Google hosts a free module proxy
service for the internet, but organizations should probably run their
own module proxy internally, at least for external dependencies. This
neatly solves <a href="https://lwn.net/Articles/681410/">the left-pad problem</a>.</p>

<p>Honestly, this is a breath of fresh air. Decentralized modules are a great
idea and avoid most of the issues of centralized package repositories.</p>

<p>How Go could improve: Go’s module management is a little <em>too</em> gung-ho
about HTTPS and certificates. The module documentation is still
incomplete, and the only way to get answers to some of my questions was
either to find the relevant source code in Go or to simply experiment.</p>

<p>Normally I could experiment using my local system, but Go refuses to do
anything with modules unless I go through HTTPS with valid certificates.
Needing to do a bunch of pointless configuration — creating a dummy CA,
dummy localhost certificates, and setting it all up — really kills my
momentum and motivation, and it delayed me in learning the new module
system. Before modules, Go supported an <code class="language-plaintext highlighter-rouge">-insecure</code> flag, which was
great for this sort of experimentation, but they removed it out of fear
of misuse. I’ll decide my own risks, thank you very much.</p>

<p>An example of a question without a documented answer: If my module path
is <code class="language-plaintext highlighter-rouge">example.com/foo</code> but my web server 301 redirects this request to
<code class="language-plaintext highlighter-rouge">example.com/foo/</code>, will Go follow this redirect <em>and</em> re-append
<code class="language-plaintext highlighter-rouge">?go-get=1</code>? (Yes.) Did I want to configure an HTTPS server just to test
this? (No.)</p>

<p><strong>Update</strong>: <a href="https://github.com/golang/go/issues/36746">I’ve been alerted</a> that <strong>Go 1.14 will introduce
<code class="language-plaintext highlighter-rouge">GOINSECURE</code></strong> as a finer-grained form of the old <code class="language-plaintext highlighter-rouge">-insecure</code> option.
This nicely solves my experimentation issue!</p>

<h4 id="vendoring">Vendoring</h4>

<p>I still haven’t even gotten to one of the most powerful and unique
module features — a feature which the Go developers initially didn’t
want to include. If you have a <code class="language-plaintext highlighter-rouge">vendor/</code> directory at the root of your
module, and you use <code class="language-plaintext highlighter-rouge">-mod=vendor</code> when compiling, Go will look in that
directory for modules. Go’s build system before modules (<code class="language-plaintext highlighter-rouge">GOPATH</code>) had a
similar mechanism.</p>

<p>This is called <em>vendoring</em> and the practice pre-dates Go itself. Just
check your dependency sources directly into source control alongside
your own sources and hook them into your build. Organizations will often
use this internally to lock down dependencies and to avoid depending on
external resources. Typically, vendoring is a lot of work. The project’s
build system must cooperate with the dependency’s build system. Then
eventually you may want to update a vendored dependency, which may
require more build changes.</p>

<p>These issues have led to the rise of <a href="https://github.com/nothings/stb"><em>header libraries</em></a> and
<a href="https://www.sqlite.org/amalgamation.html">amalgamations</a> in C and C++: libraries that are trivial to
integrate into any project.</p>

<p><strong>Go’s module system fully automates vendoring</strong>, which it can do
because it already orchestrates builds. A single command populates the
<code class="language-plaintext highlighter-rouge">vendor/</code> directory with all of the module’s current dependencies:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go mod vendor
</code></pre></div></div>

<p>Normally you might follow this up by checking it into source control,
but that’s not the only way it’s useful. Instead a project could merely
include the <code class="language-plaintext highlighter-rouge">vendor/</code> directory in its release source tarball. That
tarball would be the <em>entire</em>, standalone source for that release. With
all external dependencies packed into the tarball, the program could be
built entirely offline on any system with a Go compiler. This is
incredibly useful for me personally.</p>

<p>Some open source projects <em>not</em> written in Go have dependencies-included
releases like this (<a href="https://crawl.develz.org/">example</a>), but it’s a ton of work. So, of
course, it’s usually not done. However, any Go project (not using cgo)
can accomplish this <em>trivially</em> without even thinking about it. This is
such a big deal, and nobody’s talking about it!</p>

<p>There’s lots of discussion about Go the programming language, but I
hardly see discussion about the amazing engineering that’s gone into Go
itself. It’s an under-appreciated piece of technology!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Long Key ID Collider</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/07/22/"/>
    <id>urn:uuid:e688fdea-0699-42dd-89ac-564d0b2b65bc</id>
    <updated>2019-07-22T21:27:02Z</updated>
    <category term="crypto"/><category term="openpgp"/><category term="go"/>
    <content type="html">
      <![CDATA[<p>Over the last couple weeks I’ve spent a lot more time working with
OpenPGP keys. It’s a consequence of polishing my <a href="/blog/2019/07/10/">passphrase-derived
PGP key generator</a>. I’ve tightened up the internals, and it’s
enabled me to explore the corners of the format, try interesting
things, and observe how various OpenPGP implementations respond to
weird inputs.</p>

<p>For one particularly cool trick, take a look at these two (private)
keys I generated yesterday. Here’s the first:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-----BEGIN PGP PRIVATE KEY BLOCK-----

xVgEXTU3gxYJKwYBBAHaRw8BAQdAjJgvdh3N2pegXPEuMe25nJ3gI7k8gEgQvCor
AExppm4AAQC0TNsuIRHkxaGjLNN6hQowRMxLXAMrkZfMcp1DTG8GBg1TzQ9udWxs
cHJvZ3JhbS5jb23CXgQTFggAEAUCXTU3gwkQmSpe7h0QSfoAAGq0APwOtCFVCxpv
d/gzKUg0SkdygmriV1UmrQ+KYx9dhzC6xwEAqwDGsSgSbCqPdkwqi/tOn+MwZ5N9
jYxy48PZGZ2V3ws=
=bBGR
-----END PGP PRIVATE KEY BLOCK-----
</code></pre></div></div>

<p>And the second:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-----BEGIN PGP PRIVATE KEY BLOCK-----

xVgEXTU3gxYJKwYBBAHaRw8BAQdAzjSPKjpOuJoLP6G0z7pptx4sBNiqmgEI0xiH
Z4Xb16kAAP0Qyon06UB2/gOeV/KjAjCi91MeoUd7lsA5yn82RR5bOxAkzQ9udWxs
cHJvZ3JhbS5jb23CXgQTFggAEAUCXTU3gwkQmSpe7h0QSfoAAEv4AQDLRqx10v3M
bwVnJ8BDASAOzrPw+Rz1tKbjG9r45iE7NQEAhm9QVtFd8SN337kIWcq8wXA6j1tY
+UeEsjg+SHzkqA4=
=QnLn
-----END PGP PRIVATE KEY BLOCK-----
</code></pre></div></div>

<p>Concatenate these and then import them into GnuPG to have a look at
them. To avoid littering in your actual keyring, especially with
private keys, use the <code class="language-plaintext highlighter-rouge">--homedir</code> option to set up a temporary
keyring. I’m going to omit that option in the examples.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --import &lt; keys.asc
gpg: key 992A5EEE1D1049FA: public key "nullprogram.com" imported
gpg: key 992A5EEE1D1049FA: secret key imported
gpg: key 992A5EEE1D1049FA: public key "nullprogram.com" imported
gpg: key 992A5EEE1D1049FA: secret key imported
gpg: Total number processed: 2
gpg:               imported: 2
gpg:       secret keys read: 2
gpg:   secret keys imported: 2
</code></pre></div></div>

<p>The user ID is “nullprogram.com” since I made these and that’s me
taking credit. “992A5EEE1D1049FA” is called the <em>long key ID</em>: a
64-bit value that identifies the key. It’s the lowest 64 bits of the
full key ID, a 160-bit SHA-1 hash. In the old days everyone used a
<em>short key ID</em> to identify keys, which was the lowest 32 bits of the
full key ID. For these keys, that would be “1D1049FA”. However, this was
deemed <em>way</em> too short, and everyone has since switched to long key
IDs, or even the full 160-bit key ID.</p>

<p>The key ID is nothing more than a SHA-1 hash of the key creation date —
unsigned 32-bit unix epoch seconds — and the public key material. So
secret keys have the same key ID as their associated public key. This
makes sense since they’re a key <em>pair</em> and they go together.</p>
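
<p>As a rough sketch of that computation for an Ed25519 key, based on my
reading of RFC 4880 section 12.2 and the EdDSA packet layout: the hash
covers a <code class="language-plaintext highlighter-rouge">0x99</code> octet, a two-octet length, and then the public key packet
body. The function names and the all-zero demo seed are made up for
illustration.</p>

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// OID 1.3.6.1.4.1.11591.15.1 (Ed25519) as encoded in the key packet.
var ed25519OID = []byte{0x2b, 0x06, 0x01, 0x04, 0x01, 0xda, 0x47, 0x0f, 0x01}

// fingerprint computes the 160-bit full key ID of an Ed25519 public key
// with the given creation date (unsigned 32-bit unix epoch seconds).
func fingerprint(pub ed25519.PublicKey, created uint32) [20]byte {
	var date [4]byte
	binary.BigEndian.PutUint32(date[:], created)

	// Public key packet body.
	body := []byte{4} // packet version
	body = append(body, date[:]...)
	body = append(body, 22) // public key algorithm: EdDSA
	body = append(body, byte(len(ed25519OID)))
	body = append(body, ed25519OID...)
	body = append(body, 0x01, 0x07) // MPI length in bits (263)
	body = append(body, 0x40)       // prefix octet for the native point format
	body = append(body, pub...)

	h := sha1.New()
	h.Write([]byte{0x99, byte(len(body) >> 8), byte(len(body))})
	h.Write(body)
	var fp [20]byte
	copy(fp[:], h.Sum(nil))
	return fp
}

// longKeyID returns the lowest 64 bits of the full key ID.
func longKeyID(fp [20]byte) uint64 {
	return binary.BigEndian.Uint64(fp[12:20])
}

// demoKey derives a public key from an all-zero seed, for demonstration only.
func demoKey() ed25519.PublicKey {
	seed := make([]byte, ed25519.SeedSize)
	return ed25519.NewKeyFromSeed(seed).Public().(ed25519.PublicKey)
}

func main() {
	created := uint32(1563768707) // arbitrary creation date
	fp := fingerprint(demoKey(), created)
	fmt.Printf("full key ID: %X\n", fp[:])
	fmt.Printf("long key ID: %016X\n", longKeyID(fp))
}
```

<p>Nothing from the secret key goes into the hash, and changing only the
creation date changes the key ID.</p>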

<p>Look closely and you’ll notice that both keypairs have the same long
key ID. If you hadn’t already guessed from the title of this article,
these are two different keys with the same long key ID. In other
words, <strong>I’ve created a long key ID collision</strong>. The GnuPG
<code class="language-plaintext highlighter-rouge">--list-keys</code> command prints the full key ID since it’s so important:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --list-keys
---------------------
pub   ed25519 2019-07-22 [SCA]
      A422F8B0E1BF89802521ECB2992A5EEE1D1049FA
uid           [ unknown] nullprogram.com

pub   ed25519 2019-07-22 [SCA]
      F43BC80C4FC2603904E7BE02992A5EEE1D1049FA
uid           [ unknown] nullprogram.com
</code></pre></div></div>

<p>I was only targeting the lower 64 bits, but I actually managed to
collide the lowest 68 bits by chance. So a long key ID still isn’t
enough to truly identify any particular key.</p>

<p>This isn’t news, of course. Nor am I the first person to create a long
key ID collision. In 2013, <a href="https://mailarchive.ietf.org/arch/msg/openpgp/Al8DzxTH2KT7vtFAgZ1q17Nub_g">David Leon Gil published a long key ID
collision for two 4096-bit RSA public keys</a>. However, that is the
only other example I was able to find. He did not include the private
keys and did not elaborate on how he did it. I know he <em>did</em> generate
viable keys, not just garbage for the public key portions, since they’re
both self-signed.</p>

<p>Creating these keys was trickier than I had anticipated, and there’s an
old, clever trick that makes it work. Building atop the work I did for
passphrase2pgp, I created a standalone tool that will create a long key
ID collision and print the two keypairs to standard output:</p>

<ul>
  <li><strong><a href="https://github.com/skeeto/pgpcollider">https://github.com/skeeto/pgpcollider</a></strong></li>
</ul>

<p>Example usage:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go get -u github.com/skeeto/pgpcollider
$ pgpcollider --verbose &gt; keys.asc
</code></pre></div></div>

<p>This can take up to a day to complete when run like this. The tool can
optionally coordinate many machines — see the <code class="language-plaintext highlighter-rouge">--server</code> / <code class="language-plaintext highlighter-rouge">-S</code> and
<code class="language-plaintext highlighter-rouge">--client</code> / <code class="language-plaintext highlighter-rouge">-C</code> options — to work together, greatly reducing the total
time. It took around 4 hours to create the keys above on a single
machine, generating around 1 billion extra keys in the process. As
discussed below, I actually got lucky that it only took 1 billion. If
you modify the program to do short key ID collisions, it only takes a
few seconds.</p>

<p>The rest of this article is about how it works.</p>

<h3 id="birthday-attacks">Birthday Attacks</h3>

<p>An important detail is that <strong>this technique doesn’t target any specific
key ID</strong>. Cloning someone’s long key ID is still very expensive. No,
this is a <a href="https://en.wikipedia.org/wiki/Birthday_attack"><em>birthday attack</em></a>. To find a collision in a space of
2^64, on average I only need to generate 2^32 samples — the square root
of that space. That’s perfectly feasible on a regular desktop computer.
To collide long key IDs, I need only generate about 4 billion IDs and
efficiently do membership tests on that set as I go.</p>

<p>That last step is easier said than done. Naively, that might look like
this (pseudo-code):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>seen := map of long key IDs to keys
loop forever {
    key := generateKey()
    longID := key.ID[12:20]
    if longID in seen {
        output seen[longID]
        output key
        break
    } else {
        seen[longID] = key
    }
}
</code></pre></div></div>

<p>Consider the size of that map. Each long ID is 8 bytes, and we expect
to store around 2^32 of them. That’s <em>at minimum</em> 32 GB of storage
just to track all the long IDs. The map itself is going to have some
overhead, too. Since these are literally random lookups, this all
mostly needs to be in RAM or else lookups are going to be <em>very</em> slow
and impractical.</p>

<p>And I haven’t even counted the keys yet. As a saving grace, these are
Ed25519 keys, so that’s 32 bytes for the public key and 32 bytes for the
private key, which I’ll need if I want to make a self-signature. (The
signature itself will be larger than the secret key.) That’s around
256GB more storage, though at least this can be stored on the hard
drive. However, to address these from the map I’d need at least 38 bits,
plus some more in case it goes over. Just call it another 8 bytes.</p>

<p>So that’s, at a bare minimum, 64GB of RAM plus 256GB of other storage.
Since nothing is ideal, we’ll need more than this. This is all still
feasible, but will require expensive hardware. We can do a lot better.</p>
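
<p>The estimates in this section can be double-checked with a few lines of
Go. This is only a sketch under the usual birthday approximation, where the
chance of a collision after n samples from a space of size N is about
1 − exp(−n²/2N). The constant and function names are mine.</p>

```go
package main

import (
	"fmt"
	"math"
)

const (
	samples  = 1 << 32 // keys we expect to generate
	idBytes  = 8       // one long key ID
	refBytes = 8       // map value referencing the stored key
	keyBytes = 32 + 32 // Ed25519 public key plus private key
	gib      = 1 << 30
)

// collisionProbability approximates the chance that n uniformly random
// samples from a space of the given size contain at least one repeat.
func collisionProbability(n, space float64) float64 {
	return -math.Expm1(-n * n / (2 * space))
}

// expectedSamples approximates the mean number of samples drawn before
// the first repeat: sqrt(pi/2 * space).
func expectedSamples(space float64) float64 {
	return math.Sqrt(math.Pi / 2 * space)
}

func idsGiB() int  { return samples * idBytes / gib }              // IDs alone
func mapGiB() int  { return samples * (idBytes + refBytes) / gib } // IDs plus references
func keysGiB() int { return samples * keyBytes / gib }             // key material

func main() {
	space := math.Pow(2, 64)
	fmt.Printf("p(collision) after 2^32 samples: %.3f\n",
		collisionProbability(math.Pow(2, 32), space))
	fmt.Printf("p(collision) after 1e9 samples:  %.3f\n",
		collisionProbability(1e9, space))
	fmt.Printf("mean samples to first collision: %.2e\n", expectedSamples(space))
	fmt.Printf("IDs: %d GiB, map: %d GiB, keys: %d GiB\n",
		idsGiB(), mapGiB(), keysGiB())
}
```

<p>Under that approximation, 2^32 samples give roughly a 39% chance of a
collision, the mean is closer to 5.4 billion samples, and a collision within
the first billion has odds of about 3%. So 2^32 is the right order of
magnitude, and the storage figures come out to 32, 64, and 256 in binary
gigabytes.</p>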

<h4 id="keys-from-seeds">Keys from seeds</h4>

<p>The first thing you might notice is that we can jettison that 256GB of
storage by being a little more clever about how we generate keys. Since
we don’t actually care about the security of these keys, we can generate
each key from a seed much smaller than the key itself. Instead of using
8 bytes to reference a key in storage, just use those 8 bytes to store
the seed used to make the key.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>counter := rand64()
seen := map of long key IDs to 64-bit seeds
loop forever {
    seed := counter
    counter++
    key := generateKey(seed)
    longID := key.ID[12:20]
    if longID in seen {
        output generateKey(seen[longID])
        output key
        break
    } else {
        seen[longID] = seed
    }
}
</code></pre></div></div>

<p>I’m incrementing a counter to generate the seeds because I don’t want
the birthday paradox to apply to my seeds. Each really must
be unique. I’m using SplitMix64 for the PRNG <a href="https://github.com/skeeto/rng-go">since I learned it’s the
fastest</a> for Go, so a simple increment to generate seeds <a href="/blog/2018/07/31/">is
perfectly fine</a>.</p>

<p>Ultimately, this still uses utterly excessive amounts of memory.
Wouldn’t it be crazy if we could somehow get this 64GB map down to just
a few MBs of RAM? Well, we can!</p>

<h4 id="rainbow-tables">Rainbow tables</h4>

<p>For decades, password crackers have faced a similar problem. They want
to precompute the hashes for billions of popular passwords so that they
can efficiently reverse those password hashes later. However, storing
all those hashes would be unnecessarily expensive, or even infeasible.</p>

<p>So they don’t. Instead they use <a href="https://en.wikipedia.org/wiki/Rainbow_table"><em>rainbow tables</em></a>. Password hashes
are chained together into a hash chain, where a password hash leads to a
new password, then to a hash, and so on. Then only store the beginning
and the end of each chain.</p>

<p>To look up a hash in the rainbow table, run the hash chain algorithm
starting from the target hash and, for each hash, check if it matches
the end of one of the chains. If so, recompute that chain and note the
step just before the target hash value. That’s the corresponding
password.</p>

<p>For example, suppose the password “foo” hashes to <code class="language-plaintext highlighter-rouge">9bfe98eb</code>, and we
have a <em>reduction function</em> that maps a hash to some password. In this
case, it maps <code class="language-plaintext highlighter-rouge">9bfe98eb</code> to “bar”. A trivial reduction function could
just be an index into a list of passwords. A hash chain starting from
“foo” might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo -&gt; 9bfe98eb -&gt; bar -&gt; 27af0841 -&gt; baz -&gt; d9d4bbcb
</code></pre></div></div>

<p>In reality a chain would be a lot longer. Another chain starting from
“apple” might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apple -&gt; 7bbc06bc -&gt; candle -&gt; 82a46a63 -&gt; dog -&gt; 98c85d0a
</code></pre></div></div>

<p>We only store the tuples (foo, <code class="language-plaintext highlighter-rouge">d9d4bbcb</code>) and (apple, <code class="language-plaintext highlighter-rouge">98c85d0a</code>) in
our database. If the chains had been one million hashes long, we’d
still only store those two tuples. That’s literally a 1:1000000
compression ratio!</p>

<p>Later on we’re faced with reversing the hash <code class="language-plaintext highlighter-rouge">27af0841</code>, which isn’t
listed directly in the database. So we run the chain forward from that
hash until either we hit the maximum chain length (i.e. password not in
the table), or we recognize a hash:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>27af0841 -&gt; baz -&gt; d9d4bbcb
</code></pre></div></div>

<p>That <code class="language-plaintext highlighter-rouge">d9d4bbcb</code> hash is listed as being in the “foo” hash chain. So we
regenerate that hash chain to discover that “bar” leads to <code class="language-plaintext highlighter-rouge">27af0841</code>.
Password cracked!</p>
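
<p>The whole lookup fits in a small toy program. This is a sketch, not how
real cracking tools are written: the hash is 32-bit FNV-1a standing in for a
password hash, the reduction function is a plain index into a tiny password
list, chains are only three hashes long, and all of the names are made up.
Chains can share an end point, so the table keeps every start for each end.</p>

```go
package main

import (
	"fmt"
	"hash/fnv"
)

var passwords = []string{"foo", "bar", "baz", "apple", "candle", "dog", "egg", "fig"}

const chainLen = 3 // hashes per chain; real chains are far longer

// hash is a stand-in for a real password hash.
func hash(password string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(password))
	return h.Sum32()
}

// reduce maps a hash to some password by indexing into the list.
func reduce(h uint32) string {
	return passwords[h%uint32(len(passwords))]
}

// chainEnd follows the chain from start and returns its final hash.
func chainEnd(start string) uint32 {
	h := hash(start)
	for i := 1; i < chainLen; i++ {
		h = hash(reduce(h))
	}
	return h
}

// buildTable stores only (end, start) for each chain.
func buildTable(starts []string) map[uint32][]string {
	table := make(map[uint32][]string)
	for _, s := range starts {
		end := chainEnd(s)
		table[end] = append(table[end], s)
	}
	return table
}

// lookup runs the chain forward from target until it reaches a known end,
// then recomputes that chain to find the password just before target.
func lookup(table map[uint32][]string, target uint32) (string, bool) {
	h := target
	for i := 0; i < chainLen; i++ {
		for _, start := range table[h] {
			p := start
			for j := 0; j < chainLen; j++ {
				if hash(p) == target {
					return p, true
				}
				p = reduce(hash(p))
			}
			// False alarm: this chain merged in after target.
		}
		h = hash(reduce(h))
	}
	return "", false
}

func main() {
	table := buildTable([]string{"foo", "apple"})
	target := hash(reduce(hash("foo"))) // second hash in the "foo" chain
	if password, ok := lookup(table, target); ok {
		fmt.Printf("%08x is the hash of %q\n", target, password)
	}
}
```

<p>Only two entries are stored, yet any hash that appears in either chain
can be reversed.</p>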

<h4 id="collider-rainbow-table">Collider rainbow table</h4>

<p>My collider works very similarly. A hash chain works like this: Start
with a 64-bit seed as before, generate a key, get the long key ID,
<strong>then use the long key ID as the seed for the next key</strong>.</p>

<p><img src="/img/diagram/collider-chain.svg" alt="" /></p>

<p>There’s one big difference. In the rainbow table the purpose is to run
the hash function backwards by looking at the previous step in the
chain. For the collider, I want to know if any of the hash chains
collide. So long as each chain starts from a unique seed, it would mean
we’ve found <strong>two different seeds that lead to the same long key ID</strong>.</p>

<p>Alternatively, it could be two different seeds that lead to the same
key, which wouldn’t be useful, but that’s trivial to avoid.</p>

<p>A simple and efficient way to check if two chains contain the same
sequence is to stop them at the same place in that sequence. Rather than
run the hash chains for some fixed number of steps, they stop when they
reach a <em>distinguishing point</em>. In my collider a distinguishing point is
where the long key ID ends with at least N 0 bits, where N determines
the average chain length. I chose 17 bits.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>func computeChain(seed) {
    loop forever {
        key := generateKey(seed)
        longID := key.ID[12:20]
        if distinguished(longID) {
            return longID
        }
        seed = longID
    }
}
</code></pre></div></div>

<p>If two different hash chains end on the same distinguishing point,
they’re guaranteed to have collided somewhere in the middle.</p>

<p><img src="/img/diagram/collision.svg" alt="" /></p>

<p>To determine where two chains collided, regenerate each chain and find
the first long key ID that they have in common. The keys at the step just
before it are the colliding keys.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>counter := rand64()
seen := map of long key IDs to 64-bit seeds
loop forever {
    seed := counter
    counter++
    longID := computeChain(seed)
    if longID in seen {
        output findCollision(seed, seen[longID])
        break
    } else {
        seen[longID] = seed
    }
}
</code></pre></div></div>

<p>Hash chain computation is embarrassingly parallel, so the load can be
spread efficiently across CPU cores. With these rainbow(-like) tables,
my tool can generate and track billions of keys in mere megabytes of
memory. The additional computational cost is the time it takes to
generate a couple more chains than otherwise necessary.</p>
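
<p>To see all the pieces working together, here’s a scaled-down sketch that
runs in well under a second. It is not the code from pgpcollider: instead of
generating keys it uses the low 32 bits of a SHA-1 hash of the seed as a
stand-in “long key ID”, the distinguishing point is 8 zero bits instead of
17, and <code class="language-plaintext highlighter-rouge">findCollision</code> is just one straightforward way to replay two
chains.</p>

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

const (
	idBits   = 32      // toy ID size; the real target is 64 bits
	dpBits   = 8       // distinguishing point: this many trailing zero bits
	maxSteps = 1 << 16 // give up on chains that never reach one
)

// toyID stands in for "generate a key from seed, return its long key ID".
func toyID(seed uint64) uint64 {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], seed)
	sum := sha1.Sum(buf[:])
	return binary.BigEndian.Uint64(sum[12:20]) & (1<<idBits - 1)
}

func distinguished(id uint64) bool {
	return id&(1<<dpBits-1) == 0
}

// computeChain returns the distinguishing point that ends the chain, or
// false if the chain ran too long (most likely stuck in a cycle).
func computeChain(seed uint64) (uint64, bool) {
	for i := 0; i < maxSteps; i++ {
		id := toyID(seed)
		if distinguished(id) {
			return id, true
		}
		seed = id
	}
	return 0, false
}

// findCollision replays two chains that end on the same point and returns
// the two seeds just before the first ID they have in common.
func findCollision(seedA, seedB uint64) (uint64, uint64) {
	prev := make(map[uint64]uint64) // ID in chain A -> seed that produced it
	for s := seedA; ; {
		id := toyID(s)
		prev[id] = s
		if distinguished(id) {
			break
		}
		s = id
	}
	for s := seedB; ; {
		id := toyID(s)
		if a, ok := prev[id]; ok {
			return a, s
		}
		s = id
	}
}

func collide() (uint64, uint64) {
	seen := make(map[uint64]uint64) // distinguishing point -> starting seed
	// Start above the ID space so that no starting seed is itself an ID.
	for counter := uint64(1) << idBits; ; counter++ {
		end, ok := computeChain(counter)
		if !ok {
			continue
		}
		if other, found := seen[end]; found {
			return findCollision(other, counter)
		}
		seen[end] = counter
	}
}

func main() {
	a, b := collide()
	fmt.Printf("seeds %#x and %#x share toy ID %08x\n", a, b, toyID(a))
}
```

<p>Starting the counter above the ID space is one way to rule out the
useless case where a starting seed lands in the middle of another chain. In
this toy version the map holds one entry per chain rather than one per
ID.</p>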

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Predictable, Passphrase-Derived PGP Keys</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/07/10/"/>
    <id>urn:uuid:cae3111c-2887-404a-bc0e-80b8c45a2d06</id>
    <updated>2019-07-10T04:18:29Z</updated>
    <category term="crypto"/><category term="openpgp"/><category term="go"/>
    <content type="html">
      <![CDATA[<p><em>tl;dr</em>: <strong><a href="https://github.com/skeeto/passphrase2pgp">passphrase2pgp</a></strong>.</p>

<p>One of my long-term concerns has been losing my core cryptographic keys,
or just not having access to them when I need them. I keep my important
data backed up, and if that data is private then I store it encrypted.
My keys are private, but how am I supposed to encrypt them? The chicken
or the egg?</p>

<p>The OpenPGP solution is to (optionally) encrypt secret keys using a key
derived from a passphrase. GnuPG prompts the user for this passphrase
when generating keys and when using secret keys. This protects the keys
at rest, and, with some caution, they can be included as part of regular
backups. The <a href="https://tools.ietf.org/html/rfc4880">OpenPGP specification, RFC 4880</a> has many options
for deriving a key from this passphrase, called <em>String-to-Key</em>, or S2K,
algorithms. None of the options are great.</p>

<p>In 2012, I selected the strongest S2K configuration at the time and,
along with a very strong passphrase, put my GnuPG keyring on the
internet as part of <a href="/blog/2012/06/23/">my public dotfiles repository</a>. It was a
kind of super-backup that would guarantee their availability anywhere
I’d need them.</p>

<p>My timing was bad because, with the release of GnuPG 2.1 in 2014, GnuPG
fundamentally changed its secret keyring format. <a href="https://dev.gnupg.org/T1800">S2K options are now
(quietly!) ignored</a> when deriving the protection keys. Instead it
auto-calibrates to much weaker settings. With this new version of GnuPG,
I could no longer update the keyring in my dotfiles repository without
significantly downgrading its protection.</p>

<p>By 2017 I was pretty irritated with the whole situation. I let my
OpenPGP keys expire, and then <a href="/blog/2017/03/12/">I wrote my own tool</a> to replace
the only feature of GnuPG I was actively using: encrypting my backups
with asymmetric encryption. One of its core features is that the
asymmetric keypair can be derived from a passphrase using a memory-hard
key derivation function (KDF). Attackers must commit a significant
quantity of memory (expensive) when attempting to crack the passphrase,
making the passphrase that much more effective.</p>

<p>Since the asymmetric keys themselves, not just the keys protecting them,
are derived from a passphrase, I never need to back them up! They’re
also always available whenever I need them. <strong>My keys are essentially
stored entirely in my brain</strong> as if I was a character in a William
Gibson story.</p>

<h3 id="tackling-openpgp-key-generation">Tackling OpenPGP key generation</h3>

<p>At the time I had expressed my interest in having this feature for
OpenPGP keys. It’s something I’ve wanted for a long time. I first <a href="https://github.com/skeeto/passphrase2pgp/tree/old-version">took
a crack at it in 2013</a> (now the <code class="language-plaintext highlighter-rouge">old-version</code> branch) for
generating RSA keys. <a href="/blog/2015/10/30/">RSA isn’t that complicated</a> but <a href="https://blog.trailofbits.com/2019/07/08/fuck-rsa/">it’s very
easy to screw up</a>. Since I was rolling it from scratch, I didn’t
really trust myself not to subtly get it wrong. Plus I never figured out
how to self-sign the key. GnuPG doesn’t accept secret keys that aren’t
self-signed, so it was never useful.</p>

<p>I took another crack at it in 2018 with a much more brute force
approach. When a program needs to generate keys, it will either read
from <code class="language-plaintext highlighter-rouge">/dev/u?random</code> or, on more modern systems, call <code class="language-plaintext highlighter-rouge">getentropy(3)</code>.
These are all ultimately system calls, and <a href="/blog/2018/06/23/">I know how to intercept
those with Ptrace</a>. If I want to control key generation for <em>any</em>
program, not just GnuPG, I could intercept these inputs and replace them
with the output of a CSPRNG keyed by a passphrase.</p>

<p><strong><a href="https://github.com/skeeto/keyed">Keyed</a>: Linux Entropy Interception</strong></p>

<p>In practice this doesn’t work at all. Real programs like GnuPG and
OpenSSH’s <code class="language-plaintext highlighter-rouge">ssh-keygen</code> don’t rely solely on these entropy inputs. They
<a href="/blog/2019/04/30/">also grab entropy from other places</a>, like <code class="language-plaintext highlighter-rouge">getpid(2)</code>,
<code class="language-plaintext highlighter-rouge">gettimeofday(2)</code>, and even extract their own scheduler and execution
time noise. Without modifying these programs I couldn’t realistically
control their key generation.</p>

<p>Besides, even if it <em>did</em> work, it would still be fragile and unreliable
since these programs could always change how they use the inputs. So,
ultimately, it was more of an experiment than something practical.</p>

<h3 id="passphrase2pgp">passphrase2pgp</h3>

<p>For regular readers, it’s probably obvious that I <a href="/tags/go/">recently learned
Go</a>. While searching for good project ideas to cut my teeth on, I
noticed that <a href="https://golang.org/x/">Go’s “extended” standard library</a> has a lot of useful
cryptographic support, so the idea of generating the keys myself may be
worth revisiting.</p>

<p>Something else also happened since my previous attempt: The OpenPGP
ecosystem now has widespread support for elliptic curve cryptography. So
instead of RSA, I could generate a Curve25519 keypair, which, by design,
is basically impossible to screw up. <strong>Not only would I be generating
keys on my own terms, I’d be doing it <em>in style</em>, baby.</strong></p>

<p>There are two different ways to use Curve25519:</p>

<ol>
  <li>Digital signatures: Ed25519 (EdDSA)</li>
  <li>Diffie–Hellman (encryption): X25519 (ECDH)</li>
</ol>

<p>In GnuPG terms, the first would be a “sign only” key and the second is
an “encrypt only” key. But can’t you usually do both after you generate
a new OpenPGP key? If you’ve used GnuPG, you’ve probably seen the terms
“primary key” and “subkey”, but you probably haven’t had to think about
them since it’s all usually automated.</p>

<p>The <em>primary key</em> is the one associated directly with your identity.
It’s always a signature key. The OpenPGP specification says this is a
signature key only by convention, but, practically speaking, it really
must be since signatures are what hold everything together. Like
packaging tape.</p>

<p>If you want to use encryption, independently generate an encryption key,
then sign that key with the primary key, binding that key as a <em>subkey</em>
to the primary key. This all happens automatically with GnuPG.</p>

<p>Fun fact: Two different primary keys can have the same subkey. Anyone
could even bind any of your subkeys to their primary key! They only need
to sign the public key! Though, of course, they couldn’t actually use
your key since they’d lack the secret key. It would just be really
confusing, and could, perhaps in certain situations, even cause some
OpenPGP clients to malfunction. (Note to self: This demands
investigation!)</p>

<p>It’s also possible to have signature subkeys. What good is that?
Paranoid folks will keep their primary key only on a secure, air-gapped
system, then use only subkeys on regular systems. The subkeys can be revoked and
replaced independently of the primary key if something were to go wrong.</p>

<p>In Go, generating an X25519 key pair is this simple (yes, it actually
takes array pointers, <a href="https://github.com/golang/go/issues/32670">which is rather weird</a>):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"crypto/rand"</span>
	<span class="s">"fmt"</span>

	<span class="s">"golang.org/x/crypto/curve25519"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">seckey</span><span class="p">,</span> <span class="n">pubkey</span> <span class="p">[</span><span class="m">32</span><span class="p">]</span><span class="kt">byte</span>
	<span class="n">rand</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">seckey</span><span class="p">[</span><span class="o">:</span><span class="p">])</span> <span class="c">// FIXME: check for error</span>
	<span class="n">seckey</span><span class="p">[</span><span class="m">0</span><span class="p">]</span> <span class="o">&amp;=</span> <span class="m">248</span>
	<span class="n">seckey</span><span class="p">[</span><span class="m">31</span><span class="p">]</span> <span class="o">&amp;=</span> <span class="m">127</span>
	<span class="n">seckey</span><span class="p">[</span><span class="m">31</span><span class="p">]</span> <span class="o">|=</span> <span class="m">64</span>
	<span class="n">curve25519</span><span class="o">.</span><span class="n">ScalarBaseMult</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pubkey</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">seckey</span><span class="p">)</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"pub %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pubkey</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"sec %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">seckey</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The three bitwise operations are optional since <code class="language-plaintext highlighter-rouge">ScalarBaseMult()</code> will
do these internally, but doing them explicitly ensures that the secret key is in its canonical form.
The actual Diffie–Hellman exchange requires just one more function call:
<code class="language-plaintext highlighter-rouge">curve25519.ScalarMult()</code>.</p>
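
<p>As a rough sketch of that exchange, the following uses the standard
library’s <code class="language-plaintext highlighter-rouge">crypto/ecdh</code> package (Go 1.20 and later) rather than
<code class="language-plaintext highlighter-rouge">curve25519.ScalarMult()</code>. That substitution, and the helper name
<code class="language-plaintext highlighter-rouge">exchange</code>, are choices made for this illustration. Each side combines
its own secret key with the other’s public key, and both should arrive at
the same 32-byte shared secret.</p>

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// exchange generates two X25519 key pairs and returns the shared secret
// that each side computes from its own secret key and the peer's public
// key. The name and shape of this helper are illustrative only.
func exchange() (a, b []byte, err error) {
	curve := ecdh.X25519()
	alice, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	bob, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Alice's view of the exchange.
	if a, err = alice.ECDH(bob.PublicKey()); err != nil {
		return nil, nil, err
	}
	// Bob's view of the exchange.
	if b, err = bob.ECDH(alice.PublicKey()); err != nil {
		return nil, nil, err
	}
	return a, b, nil
}

func main() {
	a, b, err := exchange()
	if err != nil {
		panic(err)
	}
	fmt.Println("secrets match:", bytes.Equal(a, b))
}
```

<p>In a real protocol the shared secret would normally be fed through a
KDF before being used as a symmetric key.</p>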

<p>For Ed25519, the API is higher-level:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"crypto/rand"</span>
	<span class="s">"fmt"</span>

	<span class="s">"golang.org/x/crypto/ed25519"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">seed</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="n">ed25519</span><span class="o">.</span><span class="n">SeedSize</span><span class="p">)</span>
	<span class="n">rand</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span> <span class="c">// FIXME: check for error</span>
	<span class="n">key</span> <span class="o">:=</span> <span class="n">ed25519</span><span class="o">.</span><span class="n">NewKeyFromSeed</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"pub %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">key</span><span class="p">[</span><span class="m">32</span><span class="o">:</span><span class="p">])</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"sec %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">key</span><span class="p">[</span><span class="o">:</span><span class="m">32</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Signing a message with this key is just one function call:
<code class="language-plaintext highlighter-rouge">ed25519.Sign()</code>.</p>
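
<p>For illustration, here is a minimal sign-and-verify round trip. It uses
the standard library’s <code class="language-plaintext highlighter-rouge">crypto/ed25519</code>, which offers the same
functions as the <code class="language-plaintext highlighter-rouge">golang.org/x/crypto</code> package imported above. The helper
<code class="language-plaintext highlighter-rouge">signAndVerify</code> and the all-zero seed are assumptions made for the
example, not part of passphrase2pgp.</p>

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// signAndVerify derives a key from seed, signs msg, and reports whether
// that signature verifies against check under the derived public key.
func signAndVerify(seed, msg, check []byte) bool {
	key := ed25519.NewKeyFromSeed(seed) // panics unless len(seed) == ed25519.SeedSize
	sig := ed25519.Sign(key, msg)
	pub := key.Public().(ed25519.PublicKey)
	return ed25519.Verify(pub, check, sig)
}

func main() {
	seed := make([]byte, ed25519.SeedSize) // all-zero seed, for illustration only
	msg := []byte("hello")
	fmt.Println(signAndVerify(seed, msg, msg))               // true
	fmt.Println(signAndVerify(seed, msg, []byte("goodbye"))) // false
}
```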

<p>Unfortunately that’s the easy part. The other 400 lines of the real
program are concerned only with encoding these values in the complex
OpenPGP format. That’s the hard part. GnuPG’s <code class="language-plaintext highlighter-rouge">--list-packets</code> option
was really useful for debugging this part.</p>

<h3 id="openpgp-specification">OpenPGP specification</h3>

<p>(Feel free to skip this section if the OpenPGP wire format isn’t
interesting to you.)</p>

<p>Following the specification was a real challenge, especially since many
of the details for Curve25519 only appear in still incomplete (and still
erroneous) updates to the specification. I certainly don’t envy the
people who have to parse arbitrary OpenPGP packets. It’s finicky and has
arbitrary parts that don’t seem to serve any purpose, such as redundant
prefix and suffix bytes on signature inputs. Fortunately I only had to
worry about the subset that represents an unencrypted secret key export.</p>

<p>OpenPGP data is broken up into <em>packets</em>. Each packet begins with a tag
identifying its type, followed by a length, which itself is a variable
length. All the packets produced by passphrase2pgp are short, so I could
pretend lengths were all a single byte long.</p>
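
<p>To make the variable-length encoding concrete, here is a sketch of a
packet header encoder following the “new format” rules as I understand
them from RFC 4880, section 4.2.2. The function name
<code class="language-plaintext highlighter-rouge">packetHeader</code> is hypothetical, and passphrase2pgp itself may well
use a different header format. Only the first case is needed when every
packet is short.</p>

```go
package main

import "fmt"

// packetHeader returns a new-format OpenPGP packet header for the given
// packet tag and body length. Partial body lengths are not handled.
func packetHeader(tag byte, n uint32) []byte {
	h := []byte{0xc0 | tag}
	switch {
	case n > 8383: // five-octet length: 0xff, then big-endian uint32
		return append(h, 0xff, byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	case n > 191: // two-octet length
		n -= 192
		return append(h, byte(n>>8)+192, byte(n))
	default: // one-octet length
		return append(h, byte(n))
	}
}

func main() {
	fmt.Printf("%x\n", packetHeader(5, 88))   // tag 5 (Secret-Key), short body
	fmt.Printf("%x\n", packetHeader(5, 1000)) // body needing a two-octet length
}
```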

<p>For a secret key export with one subkey, we need the following packets
in this order:</p>

<ol>
  <li>Secret-Key: Public-Key packet with secret key appended</li>
  <li>User ID: just a length-prefixed, UTF-8 string</li>
  <li>Signature: binds Public-Key packet (1) and User ID packet (2)</li>
  <li>Secret-Subkey: Public-Subkey packet with secret subkey appended</li>
  <li>Signature: binds Public-Key packet (1) and Public-Subkey packet (4)</li>
</ol>

<p>A Public-Key packet contains the creation date, key type, and public key
data. A Secret-Key packet is the same, but with the secret key literally
appended on the end and a different tag. The Key ID is (essentially) a
SHA-1 hash of the Public-Key packet, meaning <strong>the creation date is part
of the Key ID</strong>. That’s important for later.</p>
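
<p>The following sketch shows why. It follows my reading of RFC 4880,
section 12.2, where a version 4 fingerprint is the SHA-1 hash of the
octet <code class="language-plaintext highlighter-rouge">0x99</code>, a two-octet length, and the Public-Key packet body, and
the Key ID is the low 64 bits of that fingerprint. The helper names and
the placeholder key material are assumptions for the example, and the
body built here is not a valid key encoding.</p>

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// keyID returns the low 64 bits of the version 4 fingerprint of a
// Public-Key packet body.
func keyID(body []byte) []byte {
	h := sha1.New()
	h.Write([]byte{0x99, byte(len(body) >> 8), byte(len(body))})
	h.Write(body)
	sum := h.Sum(nil)
	return sum[len(sum)-8:]
}

// publicKeyBody assembles version (4), creation time, algorithm ID
// (22, EdDSA), and the given key material into a packet body.
func publicKeyBody(created uint32, material []byte) []byte {
	body := []byte{4, 0, 0, 0, 0, 22}
	binary.BigEndian.PutUint32(body[1:5], created)
	return append(body, material...)
}

func main() {
	material := make([]byte, 45) // placeholder bytes, not a real key
	fmt.Printf("%X\n", keyID(publicKeyBody(0, material)))
	fmt.Printf("%X\n", keyID(publicKeyBody(1, material)))
}
```

<p>The two lines printed differ even though the key material is identical,
since only the creation time changed.</p>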

<p>I had wondered if the <a href="https://shattered.io/">SHAttered</a> attack could be used to create
two different keys with the same full Key ID. However, there’s no slack
space anywhere in the input, so I doubt it.</p>

<p>User IDs are usually an RFC 2822 name and email address, but that’s only
convention. It can literally be an empty string, though that wouldn’t be
useful. OpenPGP clients that require anything more than an empty string,
such as GnuPG during key generation, are adding artificial restrictions.</p>

<p>The first Signature packet indicates the signature date, the signature
issuer’s Key ID, and then optional metadata about how the primary key is
to be used and the capabilities of the key owner’s client. The signature
itself is formed by appending the Public-Key packet portion of the
Secret-Key packet, the User ID packet, and the previously described
contents of the signature packet. The concatenation is hashed, the hash
is signed, and the signature is appended to the packet. Since the
options are included in the signature, they can’t be changed by another
person.</p>

<p>In theory the signature is redundant. A client could accept the
Secret-Key packet and User ID packet and consider the key imported. It
would then create its own self-signature since it has everything it
needs. However, my primary target for passphrase2pgp is GnuPG, and it
will not accept secret keys that are not self-signed.</p>

<p>The Secret-Subkey packet is exactly the same as the Secret-Key packet
except that it uses a different tag to indicate it’s a subkey.</p>

<p>The second Signature packet is constructed the same as the previous
signature packet. However, it signs the concatenation of the Public-Key
and Public-Subkey packets, binding the subkey to that primary key. This
key may similarly have its own options.</p>

<p>To create a public key export from this input, a client need only chop
off the secret keys and fix up the packet tags and lengths. The
signatures remain untouched since they didn’t include the secret keys.
That’s essentially what other people will receive about your key.</p>

<p>If someone else were to create a Signature packet binding your
Public-Subkey packet with their Public-Key packet, they could set their
own options on their version of the key. So my question is: Do clients
properly track this separate set of options and separate owner for the
same key? If not, they have a problem!</p>

<p>The format may not sound so complex from this description, but there are
a ton of little details that all need to be correct. To make matters
worse, the feedback is usually just a binary “valid” or “invalid”. The
world could use an OpenPGP format debugger.</p>

<h3 id="usage">Usage</h3>

<p>There is one required argument: either <code class="language-plaintext highlighter-rouge">--uid</code> (<code class="language-plaintext highlighter-rouge">-u</code>) or <code class="language-plaintext highlighter-rouge">--load</code>
(<code class="language-plaintext highlighter-rouge">-l</code>). The former specifies a User ID since a key with an empty User ID
is pretty useless. It’s my own artificial restriction on the User ID.
The latter loads a previously-generated key which will come with a User
ID.</p>

<p>To generate a key for use in GnuPG, just pipe the output straight into
GnuPG:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp --uid "Foo &lt;foo@example.com&gt;" | gpg --import
</code></pre></div></div>

<p>You will be prompted for a passphrase. That passphrase is run through
<a href="https://github.com/P-H-C/phc-winner-argon2">Argon2id</a>, a memory-hard KDF, with the User ID as the salt.
Deriving the key requires 8 passes over 1GB of state, which takes my
current computers around 8 seconds. With the <code class="language-plaintext highlighter-rouge">--paranoid</code> (<code class="language-plaintext highlighter-rouge">-x</code>) option
enabled, that becomes 16 passes over 2GB (perhaps not paranoid enough?).
The output is 64 bytes: 32 bytes to seed the primary key and 32 bytes to
seed the subkey.</p>

<p>Despite the aggressive KDF settings, you will still need to choose a
strong passphrase. Anyone who has your public key can mount an offline
attack. A 10-word Diceware or <a href="/blog/2017/07/27/">Pokerware</a> passphrase is more than
sufficient (~128 bits) while also being quite reasonable to memorize.</p>
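
<p>The estimate is simple arithmetic: each word drawn uniformly from a list
contributes log2(list size) bits. A quick sketch, assuming the standard
7776-word Diceware list (the function name <code class="language-plaintext highlighter-rouge">entropyBits</code> is just for
illustration):</p>

```go
package main

import (
	"fmt"
	"math"
)

// entropyBits returns the entropy, in bits, of a passphrase made of n
// words drawn uniformly and independently from a list of listSize words.
func entropyBits(n, listSize int) float64 {
	return float64(n) * math.Log2(float64(listSize))
}

func main() {
	// The standard Diceware list has 6^5 = 7776 words.
	fmt.Printf("%.1f bits\n", entropyBits(10, 7776)) // 129.2 bits
}
```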

<p>Since the User ID is the salt, an attacker couldn’t build a single
rainbow table to attack passphrases for different people. (Though your
passphrase really should be strong enough that this won’t matter!) The
cost is that you’ll need to use exactly the same User ID again to
reproduce the key. <em>In theory</em> you could change the User ID afterward to
whatever you like without affecting the Key ID, though it will require a
new self-signature.</p>

<p>The keys are not encrypted (no S2K), and there are few options you can
choose when generating the keys. If you want to change any of this, use
GnuPG’s <code class="language-plaintext highlighter-rouge">--edit-key</code> tool after importing. For example, to set a
protection passphrase:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --edit-key Foo
gpg&gt; passwd
</code></pre></div></div>

<p>There’s a lot that can be configured from this interface.</p>

<p>If you just need the public key to publish or share, the <code class="language-plaintext highlighter-rouge">--public</code>
(<code class="language-plaintext highlighter-rouge">-p</code>) option will suppress the private parts and output only a public
key. It works well in combination with ASCII armor, <code class="language-plaintext highlighter-rouge">--armor</code> (<code class="language-plaintext highlighter-rouge">-a</code>).
For example, to put your public key on the clipboard:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp -u '...' -ap | xclip
</code></pre></div></div>

<p>The tool can create detached signatures (<code class="language-plaintext highlighter-rouge">--sign</code>, <code class="language-plaintext highlighter-rouge">-S</code>) entirely on its
own, too, so you don’t need to import the keys into GnuPG just to make
signatures:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp --sign --uid '...' program.exe
</code></pre></div></div>

<p>This would create a file named <code class="language-plaintext highlighter-rouge">program.exe.sig</code> with the detached
signature, ready to be verified by another OpenPGP implementation. In
fact, you can hook it directly up to Git for signing your tags and
commits without GnuPG:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git config --global gpg.program passphrase2pgp
</code></pre></div></div>

<p>This only works for signing, and it cannot verify (<code class="language-plaintext highlighter-rouge">verify-tag</code> or
<code class="language-plaintext highlighter-rouge">verify-commit</code>).</p>

<p>It’s pretty tedious to enter the <code class="language-plaintext highlighter-rouge">--uid</code> option all the time, so, if
omitted, passphrase2pgp will infer the User ID from the environment
variables REALNAME and EMAIL. Combined with the KEYID environment
variable (see the README for details), you can easily get away with
<em>never</em> storing your keys: only generate them on demand when needed.</p>

<p>That’s how I intend to use passphrase2pgp. When I want to sign a file,
I’ll only need one option, one passphrase prompt, and a few seconds of
patience:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp -S path/to/file
</code></pre></div></div>

<h4 id="january-1-1970">January 1, 1970</h4>

<p>The first time you run the tool you might notice one offensive aspect of
its output: Your key will be dated January 1, 1970 — i.e. unix epoch
zero. This predates PGP itself by more than two decades, so it might
alarm people who receive your key.</p>

<p>Why do this? As I noted before, the creation date is part of the Key ID.
Use a different date, and, as far as OpenPGP is concerned, you have a
different key. Since users probably don’t want to remember a specific
datetime, <em>at seconds resolution</em>, in addition to their passphrase,
passphrase2pgp uses the same hard-coded date by default. A date of
January 1, 1970 is like NULL in a database: no data.</p>

<p>If you don’t like this, you can override it with the <code class="language-plaintext highlighter-rouge">--time</code> (<code class="language-plaintext highlighter-rouge">-t</code>) or
<code class="language-plaintext highlighter-rouge">--now</code> (<code class="language-plaintext highlighter-rouge">-n</code>) options, but it’s up to you to remain consistent.</p>

<h4 id="vanity-keys">Vanity Keys</h4>

<p>If you’re interested in vanity keys — e.g. where the Key ID spells out
words or looks unusual — it wouldn’t take much work to hack up the
passphrase2pgp source into generating your preferred vanity keys. It
would easily beat anything else I could find online.</p>

<h3 id="reconsidering-limited-openpgp">Reconsidering limited OpenPGP</h3>

<p>Initially my intention was <em>never</em> to output an encryption subkey, and
passphrase2pgp would only be useful for signatures. By default it still
only produces a signing key, but you can get an encryption subkey
with the <code class="language-plaintext highlighter-rouge">--subkey</code> (<code class="language-plaintext highlighter-rouge">-s</code>) option. I figured it might be useful to
generate an encryption key, even if it’s not output by default. Users
can always ask for it later if they have a need for it.</p>

<p>Why only a signing key? Nobody should be using OpenPGP for encryption
anymore. Use better tools instead and retire the <a href="https://blog.cryptographyengineering.com/2014/08/13/whats-matter-with-pgp/">20th century
cryptography</a>. If you don’t have an encryption subkey, nobody can
send you OpenPGP-encrypted messages.</p>

<p>In contrast, OpenPGP signatures are still kind of useful and lack a
practical alternative. The Web of Trust failed to reach critical mass,
but that doesn’t seem to matter much in practice. Important OpenPGP keys
can be bootstrapped off TLS by strategically publishing them on HTTPS
servers. Keybase.io has done interesting things in this area.</p>

<p>Further, <a href="https://github.blog/2016-04-05-gpg-signature-verification/">GitHub officially supports OpenPGP signatures</a>, and I
believe GitLab does too. This is another way to establish trust for a
key. IMHO, there’s generally too much emphasis on binding a person’s
legal identity to their OpenPGP key (e.g. the idea behind key-signing
parties). I suppose that’s useful for holding a person legally
accountable if they do something wrong. I’d prefer to trust a key that has
an established history of valuable community contributions, even if done
so <a href="https://en.wikipedia.org/wiki/Why_the_lucky_stiff">only under a pseudonym</a>.</p>

<p>So sometime in the future I may again advertise an OpenPGP public key.
If I do, those keys would certainly be generated with passphrase2pgp. I
may not even store the secret keys on a keyring, and instead generate
them on the fly only when I occasionally need them.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Go Slices are Fat Pointers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/06/30/"/>
    <id>urn:uuid:5ba40d47-11e4-4f82-b805-f5e7825df44c</id>
    <updated>2019-06-30T21:27:19Z</updated>
    <category term="c"/><category term="go"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=20321116">on Hacker News</a>.</em></p>

<p>One of the frequent challenges in C is that pointers are nothing but a
memory address. A callee who is passed a pointer doesn’t truly know
anything other than the type of object being pointed at, which says some
things about alignment and how that pointer can be used… maybe. If it’s
a pointer to void (<code class="language-plaintext highlighter-rouge">void *</code>) then not even that much is known.</p>

<!--more-->

<p>The number of consecutive elements being pointed at is also not known.
It could be as few as zero, so dereferencing would be illegal. This can
be true even when the pointer is not null. Pointers can go one past the
end of an array, at which point it points to zero elements. For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">);</span>

<span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">array</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
    <span class="n">foo</span><span class="p">(</span><span class="n">array</span> <span class="o">+</span> <span class="mi">4</span><span class="p">);</span>  <span class="c1">// pointer one past the end</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In some situations, the number of elements <em>is</em> known, at least to the
programmer. For example, the function might have a contract that says it
must be passed <em>at least</em> N elements, or <em>exactly</em> N elements. This
could be communicated through documentation.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/** Foo accepts 4 int values. */</span>
<span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>Or it could be implied by the function’s prototype. Despite the
following function appearing to accept an array, that’s actually a
pointer, and the “4” isn’t relevant to the prototype.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span><span class="p">[</span><span class="mi">4</span><span class="p">]);</span>
</code></pre></div></div>

<p>C99 introduced a feature to make this a formal part of the prototype,
though, unfortunately, I’ve never seen a compiler actually use this
information.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span><span class="p">[</span><span class="k">static</span> <span class="mi">4</span><span class="p">]);</span>  <span class="c1">// &gt;= 4 elements, cannot be null</span>
</code></pre></div></div>

<p>Another common pattern is for the callee to accept a count parameter.
For example, the POSIX <code class="language-plaintext highlighter-rouge">write()</code> function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">ssize_t</span> <span class="nf">write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">count</span><span class="p">);</span>
</code></pre></div></div>

<p>The necessary information describing the buffer is split across two
arguments. That can become tedious, and it’s also a source of serious
bugs if the two parameters aren’t in agreement (buffer overflow,
<a href="/blog/2017/07/19/">information disclosure</a>, etc.). Wouldn’t it be nice if this
information was packed into the pointer itself? That’s essentially the
definition of a <em>fat pointer</em>.</p>

<h3 id="fat-pointers-via-bit-hacks">Fat pointers via bit hacks</h3>

<p>If we assume some things about the target platform, we can encode fat
pointers inside a plain pointer with <a href="/blog/2016/05/30/">some dirty pointer
tricks</a>, exploiting unused bits in the pointer value. For
example, currently on x86-64, only the lower 48 bits of a pointer are
actually used. The other 16 bits could carefully be used for other
information, like communicating the number of elements or bytes:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// NOTE: x86-64 only!</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1000</span><span class="p">];</span>
<span class="kt">uintptr_t</span> <span class="n">addr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">buf</span> <span class="o">&amp;</span> <span class="mh">0xffffffffffff</span><span class="p">;</span>
<span class="kt">uintptr_t</span> <span class="n">pack</span> <span class="o">=</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">48</span><span class="p">)</span> <span class="o">|</span> <span class="n">addr</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">fatptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">pack</span><span class="p">;</span>
</code></pre></div></div>

<p>The other side can unpack this to get the components back out. Obviously
16 bits for the count will often be insufficient, so this would more
likely be used for <a href="https://www.usenix.org/legacy/event/sec09/tech/full_papers/akritidis.pdf">baggy bounds checks</a>.</p>

<p>Further, if we know something about the alignment — say, that it’s
16-byte aligned — then we can also encode information in the least
significant bits, such as a type tag.</p>

<h3 id="fat-pointers-via-a-struct">Fat pointers via a struct</h3>

<p>That’s all fragile, non-portable, and rather limited. A more robust
approach is to lift pointers up into a richer, heavier type, like a
structure.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">fatptr</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">len</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Functions accepting these fat pointers no longer need to accept a count
parameter, and they’d generally accept the fat pointer by value.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fatptr_write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="k">struct</span> <span class="n">fatptr</span><span class="p">);</span>
</code></pre></div></div>

<p>In typical C implementations, the structure fields would be passed
practically, if not exactly, the same way as the individual parameters would
have been passed, so it’s really no less efficient. (<strong>Update June 2024</strong>:
Pengji Zhang pointed out that this <a href="https://lists.sr.ht/~skeeto/public-inbox/%3CCANOCUiz9ZjRi06pvSDmKsXcHcTiWfAJCeKQUn3EYCh7Tv0poVA@mail.gmail.com%3E">applies only to the 2-element <code class="language-plaintext highlighter-rouge">struct
fatptr</code></a>, and not to 3-element slice headers discussed below.)</p>

<p>To help keep this straight, we might employ some macros:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define COUNTOF(array) \
    (sizeof(array) / sizeof(array[0]))
</span>
<span class="cp">#define FATPTR(ptr, count) \
    (struct fatptr){ptr, count}
</span>
<span class="cp">#define ARRAYPTR(array) \
    FATPTR(array, COUNTOF(array))
</span>
<span class="cm">/* ... */</span>

<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">40</span><span class="p">];</span>
<span class="n">fatptr_write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">ARRAYPTR</span><span class="p">(</span><span class="n">buf</span><span class="p">));</span>
</code></pre></div></div>

<p>There are obvious disadvantages of this approach, like type confusion
due to that void pointer, the inability to use <code class="language-plaintext highlighter-rouge">const</code>, and just being
weird for C. I wouldn’t use it in a real program, but bear with me.</p>

<p>Before I move on, I want to add one more field to that fat pointer
struct: capacity.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">fatptr</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">len</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">cap</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>This communicates not how many elements are present (<code class="language-plaintext highlighter-rouge">len</code>), but how
many the buffer can hold in total, and so how much additional space is
left (<code class="language-plaintext highlighter-rouge">cap - len</code>). This allows callees to know how much room is left
for, say, appending new elements.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Fill the remainder of an int buffer with a value.</span>
<span class="kt">void</span>
<span class="nf">fill</span><span class="p">(</span><span class="k">struct</span> <span class="n">fatptr</span> <span class="n">ptr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">ptr</span><span class="p">.</span><span class="n">ptr</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">ptr</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ptr</span><span class="p">.</span><span class="n">cap</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">buf</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Since the callee modifies the fat pointer, it should be returned:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">fatptr</span>
<span class="nf">fill</span><span class="p">(</span><span class="k">struct</span> <span class="n">fatptr</span> <span class="n">ptr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">ptr</span><span class="p">.</span><span class="n">ptr</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">ptr</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ptr</span><span class="p">.</span><span class="n">cap</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">buf</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">ptr</span><span class="p">.</span><span class="n">len</span> <span class="o">=</span> <span class="n">ptr</span><span class="p">.</span><span class="n">cap</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">ptr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Congratulations, you’ve got slices! Except that in Go they’re a proper
part of the language and so don’t rely on hazardous hacks or tedious
bookkeeping. The <code class="language-plaintext highlighter-rouge">fatptr_write()</code> function above is nearly functionally
equivalent to the <code class="language-plaintext highlighter-rouge">Writer.Write()</code> method in Go, which accepts a slice:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">Writer</span> <span class="k">interface</span> <span class="p">{</span>
	<span class="n">Write</span><span class="p">(</span><span class="n">p</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="p">(</span><span class="n">n</span> <span class="kt">int</span><span class="p">,</span> <span class="n">err</span> <span class="kt">error</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">buf</code> and <code class="language-plaintext highlighter-rouge">count</code> parameters are packed together as a slice, and
the <code class="language-plaintext highlighter-rouge">fd</code> parameter is instead the <em>receiver</em> (the object being acted upon by
the method).</p>

<h3 id="go-slices">Go slices</h3>

<p>Go famously has pointers, including <em>internal pointers</em>, but not pointer
arithmetic. You can take the address of (<a href="https://utcc.utoronto.ca/~cks/space/blog/programming/GoAddressableValues">nearly</a>) anything, but
you can’t make that pointer point at anything else, even if you took the
address of an array element. Pointer arithmetic would undermine Go’s
type safety, so it can only be done through special mechanisms in the
<code class="language-plaintext highlighter-rouge">unsafe</code> package.</p>

<p>But pointer arithmetic is really useful! It’s handy to take an address
of an array element, pass it to a function, and allow that function to
modify a <em>slice</em> (wink, wink) of the array. <strong>Slices are pointers that
support exactly this sort of pointer arithmetic, but safely.</strong> Unlike
the <code class="language-plaintext highlighter-rouge">&amp;</code> operator which creates a simple pointer, the slice operator
derives a fat pointer.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">fill</span><span class="p">([]</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">)</span> <span class="p">[]</span><span class="kt">int</span>

<span class="k">var</span> <span class="n">array</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">int</span>

<span class="c">// len == 0, cap == 8, like &amp;array[0]</span>
<span class="n">fill</span><span class="p">(</span><span class="n">array</span><span class="p">[</span><span class="o">:</span><span class="m">0</span><span class="p">],</span> <span class="m">1</span><span class="p">)</span>
<span class="c">// array is [1, 1, 1, 1, 1, 1, 1, 1]</span>

<span class="c">// len == 0, cap == 4, like &amp;array[4]</span>
<span class="n">fill</span><span class="p">(</span><span class="n">array</span><span class="p">[</span><span class="m">4</span><span class="o">:</span><span class="m">4</span><span class="p">],</span> <span class="m">2</span><span class="p">)</span>
<span class="c">// array is [1, 1, 1, 1, 2, 2, 2, 2]</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">fill</code> function could take a slice of the slice, effectively moving
the pointer around with pointer arithmetic, but without violating memory
safety due to the additional “fat pointer” information. In other words,
fat pointers can be derived from other fat pointers.</p>
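<p>One way <code class="language-plaintext highlighter-rouge">fill</code> might be written, as a sketch rather than the only
possible implementation: it reslices its argument up to the capacity,
which is exactly a fat pointer derived from another fat pointer.</p>

```go
package main

import "fmt"

// fill sets every element between len(s) and cap(s) to value and
// returns the slice extended to its full capacity.
func fill(s []int, value int) []int {
	n := len(s)
	s = s[:cap(s)] // reslice: a fat pointer derived from a fat pointer
	for i := n; i < len(s); i++ {
		s[i] = value
	}
	return s
}

func main() {
	var array [8]int
	fill(array[:0], 1)
	fill(array[4:4], 2)
	fmt.Println(array) // [1 1 1 1 2 2 2 2]
}
```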

<p>Slices aren’t as universal as pointers, at least at the moment. You can
take the address of any variable using <code class="language-plaintext highlighter-rouge">&amp;</code>, but you can’t take a <em>slice</em>
of any variable, even if it would be logically sound.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">foo</span> <span class="kt">int</span>

<span class="c">// attempt to make len = 1, cap = 1 slice backed by foo</span>
<span class="k">var</span> <span class="n">fooslice</span> <span class="p">[]</span><span class="kt">int</span> <span class="o">=</span> <span class="n">foo</span><span class="p">[</span><span class="o">:</span><span class="p">]</span>   <span class="c">// compile-time error!</span>
</code></pre></div></div>

<p>That wouldn’t be very useful anyway. However, if you <em>really</em> wanted to
do this, the <code class="language-plaintext highlighter-rouge">unsafe</code> package can accomplish it. I believe the resulting
slice would be perfectly safe to use:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Convert to one-element array, then slice</span>
<span class="n">fooslice</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="kt">int</span><span class="p">)(</span><span class="n">unsafe</span><span class="o">.</span><span class="n">Pointer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">foo</span><span class="p">))[</span><span class="o">:</span><span class="p">]</span>
</code></pre></div></div>

<p>Update: <a href="https://utcc.utoronto.ca/~cks/space/blog/programming/GoVariableToArrayConversion">Chris Siebenmann speculated about why this requires
<code class="language-plaintext highlighter-rouge">unsafe</code></a>.</p>
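<p>Since Go 1.17 the <code class="language-plaintext highlighter-rouge">unsafe</code> package also provides <code class="language-plaintext highlighter-rouge">unsafe.Slice</code>, which
builds a slice from a pointer and a length without the array
conversion. A minimal sketch, where the helper name <code class="language-plaintext highlighter-rouge">sliceOf</code> is
hypothetical:</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceOf returns a len = 1, cap = 1 slice backed by *p.
func sliceOf(p *int) []int {
	return unsafe.Slice(p, 1)
}

func main() {
	var foo int
	fooslice := sliceOf(&foo)
	fooslice[0] = 42 // writes through to foo
	fmt.Println(foo, len(fooslice), cap(fooslice)) // 42 1 1
}
```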

<p>Of course, slices are super flexible and have many more uses that look
less like fat pointers, but this is still how I tend to reason about
slices when I write Go.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>UTF-8 String Indexing Strategies</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/05/29/"/>
    <id>urn:uuid:12e9ed44-b5c1-495f-8750-dfaf1ab008e2</id>
    <updated>2019-05-29T21:52:06Z</updated>
    <category term="elisp"/><category term="emacs"/><category term="go"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=20049491">on Hacker News</a>.</em></p>

<p>When designing or, in some cases, implementing a programming language
with built-in support for Unicode strings, an important decision must be
made about how to represent or encode those strings in memory. Not all
representations are equal, and there are trade-offs between different
choices.</p>

<!--more-->

<p>One issue to consider is that strings typically feature random access
indexing of code points with a time complexity resembling constant
time (<code class="language-plaintext highlighter-rouge">O(1)</code>). However, not all string representations actually
support this well. Strings using variable length encoding, such as
UTF-8 or UTF-16, have <code class="language-plaintext highlighter-rouge">O(n)</code> time complexity indexing, ignoring
special cases (discussed below). The most obvious choice to achieve
<code class="language-plaintext highlighter-rouge">O(1)</code> time complexity — an array of 32-bit values, as in UCS-4 —
makes very inefficient use of memory, especially with typical strings.</p>
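<p>To make the <code class="language-plaintext highlighter-rouge">O(n)</code> cost concrete, here is a minimal Go sketch of
indexing by code point into UTF-8 data. The function name <code class="language-plaintext highlighter-rouge">runeAt</code> is
hypothetical; it has to decode every code point before the one
requested, so its running time grows with the index.</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt returns the n-th code point (zero-based) of a UTF-8 string by
// decoding from the start, so its cost grows with n. The second result
// is false when the string has fewer than n+1 code points.
func runeAt(s string, n int) (rune, bool) {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if n == 0 {
			return r, true
		}
		n--
		i += size
	}
	return 0, false
}

func main() {
	r, ok := runeAt("π ≈ 3.14", 2)
	fmt.Printf("%c %v\n", r, ok) // ≈ true
}
```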

<p>Despite this, UTF-8 is still chosen in a number of programming
languages, or at least in their implementations. In this article I’ll
discuss three examples — Emacs Lisp, Julia, and Go — and how each takes a
slightly different approach.</p>

<h3 id="emacs-lisp">Emacs Lisp</h3>

<p>Emacs Lisp has two different types of strings that generally can be used
interchangeably: <em>unibyte</em> and <em>multibyte</em>. In fact, the difference
between them is so subtle that I bet that most people writing Emacs Lisp
don’t even realize there are two kinds of strings.</p>

<p>Emacs Lisp uses UTF-8 internally to encode all “multibyte” strings and
buffers. To fully support arbitrary sequences of bytes in the files
being edited, Emacs uses <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html">its own extension of Unicode</a> to
precisely and unambiguously represent raw bytes intermixed with text.
Any arbitrary sequence of bytes can be decoded into Emacs’ internal
representation, then losslessly re-encoded back into the exact same
sequence of bytes.</p>

<p>Unibyte strings and buffers are really just byte-strings. In practice,
they’re essentially ISO/IEC 8859-1, a.k.a. <em>Latin-1</em>. It’s a Unicode
string where all code points are below 256. Emacs prefers the smallest
and simplest string representation when possible, <a href="https://www.python.org/dev/peps/pep-0393/">similar to CPython
3.3+</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="s">"hello"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="s">"π ≈ 3.14"</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>Emacs Lisp strings are mutable, and therein lies the kicker: As soon as
you insert a code point above 255, Emacs quietly converts the string to
multibyte.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">fish</span> <span class="s">"fish"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="nv">fish</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">fish</span> <span class="mi">2</span><span class="p">)</span> <span class="nv">?</span><span class="err">ŝ</span>
      <span class="p">(</span><span class="nb">aref</span> <span class="nv">fish</span> <span class="mi">3</span><span class="p">)</span> <span class="nv">?o</span><span class="p">)</span>

<span class="nv">fish</span>
<span class="c1">;; =&gt; "fiŝo"</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="nv">fish</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>Constant time indexing into unibyte strings is straightforward, and
Emacs does the obvious thing. It helps that most strings in Emacs are
probably unibyte, even when the user isn’t working in English.</p>

<p>Most buffers are multibyte, even if those buffers are generally just
ASCII. Since <a href="/blog/2017/09/07/">Emacs uses gap buffers</a> this rarely matters:
nearly all accesses are tightly clustered around the point, so O(n)
indexing is seldom a problem.</p>

<p>That leaves multibyte strings. Consider these idioms for iterating
across a string in Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="p">(</span><span class="nb">length</span> <span class="nb">string</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">c</span> <span class="p">(</span><span class="nb">aref</span> <span class="nb">string</span> <span class="nv">i</span><span class="p">)))</span>
    <span class="o">...</span><span class="p">))</span>

<span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">c</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nb">string</span>
         <span class="o">...</span><span class="p">)</span>
</code></pre></div></div>

<p>The latter expands into essentially the same as the former: An
incrementing index that uses <code class="language-plaintext highlighter-rouge">aref</code> to index to that code point. So is
iterating over a multibyte string — a common operation — an O(n^2)
operation?</p>

<p>The good news is that, at least in this case, no! It’s essentially just
as efficient as iterating over a unibyte string. Before going over why,
consider this little puzzle. Here’s a little string comparison function
that compares two strings a code point at a time, returning their first
difference:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">compare</span> <span class="p">(</span><span class="nv">string-a</span> <span class="nv">string-b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">a</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nv">string-a</span>
           <span class="nv">for</span> <span class="nv">b</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nv">string-b</span>
           <span class="nb">unless</span> <span class="p">(</span><span class="nb">eql</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
           <span class="nb">return</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>Let’s examine benchmarks with some long strings (100,000 code points):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (0.012568031 0 0.0)</span>
</code></pre></div></div>

<p>Using two zeroed unibyte strings, it takes 13ms. How about changing
the last code point in one of them to 256, converting it to a multibyte
string?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">a</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">a</span><span class="p">)))</span> <span class="mi">256</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (0.012680513 0 0.0)</span>
</code></pre></div></div>

<p>Same running time, so that multibyte string cost nothing more to iterate
across. Let’s try making them both multibyte:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">a</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">a</span><span class="p">)))</span> <span class="mi">256</span>
            <span class="p">(</span><span class="nb">aref</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">b</span><span class="p">)))</span> <span class="mi">256</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (2.327959762 0 0.0)</span>
</code></pre></div></div>

<p>That took 2.3 seconds: about 180x longer to run! Iterating over two
multibyte strings concurrently seems to have broken an optimization.
Can you reason about what’s happened?</p>

<p>To avoid the O(n) cost on this common indexing operation, Emacs keeps
a “bookmark” for the last indexing location into a multibyte string.
If the next access is nearby, it can start looking from this
bookmark, forwards or backwards. Like a gap buffer, this gives a big
advantage to clustered accesses, including iteration.</p>

<p>However, this string bookmark is <em>global</em>, one per Emacs instance, not
one per string. In the last benchmark, the two multibyte strings are
constantly fighting over a single string bookmark, so every access pays
the full O(n) indexing cost and the comparison function degrades to
O(n^2) time complexity.</p>

<p>So, Emacs <em>pretends</em> it has constant time access into its UTF-8 text
data, but it’s only faking it with some simple optimizations. This
usually works out just fine.</p>

<h3 id="julia">Julia</h3>

<p>Another approach is to not pretend at all, and to make this limitation
of UTF-8 explicit in the interface. Julia took this approach, and it
<a href="/blog/2014/03/06/">was one of my complaints about the language</a>. I don’t think
this is necessarily a bad choice, but I do still think it’s
inappropriate considering Julia’s target audience (i.e. Matlab users).</p>

<p>Julia strings are explicitly byte strings containing valid UTF-8 data.
All indexing occurs on bytes, which is trivially constant time, and
always decodes the multibyte code point starting at that byte. <em>But</em>
it is an error to index to a byte that doesn’t begin a code point.
That error is also trivially checked in constant time.</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="s">"π"</span>

<span class="n">s</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
<span class="c"># =&gt; 'π'</span>

<span class="n">s</span><span class="x">[</span><span class="mi">2</span><span class="x">]</span>
<span class="c"># ERROR: UnicodeError: invalid character index</span>
<span class="c">#  in getindex at ./strings/basic.jl:37</span>
</code></pre></div></div>
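<p>The constant-time check can be sketched in Go, the language used for
the added examples in this article. This is an assumption-laden
imitation of the rule, not Julia’s implementation: <code class="language-plaintext highlighter-rouge">charAt</code> and
<code class="language-plaintext highlighter-rouge">errInvalidIndex</code> are hypothetical names, and the byte index is
zero-based, unlike Julia’s one-based indices.</p>

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

var errInvalidIndex = errors.New("invalid character index")

// charAt decodes the code point that starts at byte index i. It is an
// error if i is out of range or lands in the middle of a code point.
// Both the check and the decode examine a bounded number of bytes.
func charAt(s string, i int) (rune, error) {
	if i < 0 || i >= len(s) || !utf8.RuneStart(s[i]) {
		return 0, errInvalidIndex
	}
	r, _ := utf8.DecodeRuneInString(s[i:])
	return r, nil
}

func main() {
	s := "π"

	r, err := charAt(s, 0)
	fmt.Printf("%c %v\n", r, err) // π <nil>

	_, err = charAt(s, 1)
	fmt.Println(err) // invalid character index
}
```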

<p>Slices are still over bytes, but they “round up” to the end of the
current code point:</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span><span class="x">[</span><span class="mi">1</span><span class="o">:</span><span class="mi">1</span><span class="x">]</span>
<span class="c"># =&gt; "π"</span>
</code></pre></div></div>

<p>Iterating over a string requires helper functions which keep an internal
“bookmark” so that each access is constant time:</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">eachindex</span><span class="x">(</span><span class="n">string</span><span class="x">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">string</span><span class="x">[</span><span class="n">i</span><span class="x">]</span>
    <span class="c"># ...</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So Julia doesn’t pretend, it makes the problem explicit.</p>

<h3 id="go">Go</h3>

<p>Go is very similar to Julia, but takes an even more explicit view of
strings. All strings are byte strings and there are no restrictions on
their contents. Conventionally strings contain UTF-8 encoded text, but
this is not strictly required. There’s a <code class="language-plaintext highlighter-rouge">unicode/utf8</code> package for
working with strings containing UTF-8 data.</p>

<p>Beyond convention, the <code class="language-plaintext highlighter-rouge">range</code> clause also assumes the string contains
UTF-8 data, and it’s not an error if it does not. Bytes not containing
valid UTF-8 data appear as a <code class="language-plaintext highlighter-rouge">REPLACEMENT CHARACTER</code> (U+FFFD).</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">:=</span> <span class="s">"π</span><span class="se">\xff</span><span class="s">"</span>
    <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">r</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">s</span> <span class="p">{</span>
        <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c">// U+03c0</span>
<span class="c">// U+fffd</span>
</code></pre></div></div>

<p>A further case of the language favoring UTF-8 is that converting a string
to <code class="language-plaintext highlighter-rouge">[]rune</code> decodes strings into code points, like UCS-4, again using
<code class="language-plaintext highlighter-rouge">REPLACEMENT CHARACTER</code>:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">:=</span> <span class="s">"π</span><span class="se">\xff</span><span class="s">"</span>
    <span class="n">r</span> <span class="o">:=</span> <span class="p">[]</span><span class="kt">rune</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="m">0</span><span class="p">])</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="m">1</span><span class="p">])</span>
<span class="p">}</span>

<span class="c">// U+03c0</span>
<span class="c">// U+fffd</span>
</code></pre></div></div>

<p>So, like Julia, there’s no pretending, and the programmer explicitly
must consider the problem.</p>
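<p>A short sketch of what that looks like in practice, using the same
string as above. The helper names <code class="language-plaintext highlighter-rouge">summarize</code> and <code class="language-plaintext highlighter-rouge">offsets</code> are
hypothetical: lengths and <code class="language-plaintext highlighter-rouge">range</code> indices are in bytes, and counting
code points is a separate, explicit step.</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// summarize reports the byte length, code point count, and UTF-8
// validity of s.
func summarize(s string) (nbytes, nrunes int, valid bool) {
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s)
}

// offsets returns the byte offset at which range begins each code point.
func offsets(s string) []int {
	var out []int
	for i := range s {
		out = append(out, i)
	}
	return out
}

func main() {
	s := "π\xff"
	fmt.Println(summarize(s)) // 3 2 false
	fmt.Println(offsets(s))   // [0 2]
	fmt.Printf("%#x\n", s[0]) // 0xcf: indexing yields a byte, not a code point
}
```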

<h3 id="preferences">Preferences</h3>

<p>All-in-all I probably prefer how Julia and Go are explicit with
UTF-8’s limitations, rather than Emacs Lisp’s attempt to cover it up
with an internal optimization. Since the abstraction is leaky, it may
as well be made explicit.</p>

]]>
    </content>
  </entry>

</feed>
