<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>null program</title>
  <link rel="alternate" type="text/html" href="https://nullprogram.com"/>
  <link rel="self" type="application/atom+xml" href="https://nullprogram.com/feed/"/>
  <updated>2026-01-19T21:57:56Z</updated>
  <id>urn:uuid:f8b65823-4ec5-3a70-efc8-2b713aa63091</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com/</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  
  <entry>
    <title>Frankenwine: Multiple personas in a Wine process</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2026/01/19/"/>
    <id>urn:uuid:d2b53f8d-88a6-400b-a748-693a758741c5</id>
    <updated>2026-01-19T21:51:38Z</updated>
    <category term="c"/><category term="win32"/><category term="linux"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p>I came across a recent article on <a href="https://gpfault.net/posts/drunk-exe.html">making Linux system calls from a Wine
process</a>. Windows programs running under Wine are still normal Linux
processes and may interact with the Linux kernel like any other process.
None of this was surprising, and the demonstration works just as I expect.
Still, it got the wheels spinning and I realized an <em>almost</em> practical
application: build <a href="/blog/2023/01/18/">my pkg-config implementation</a> such that on Windows
<code class="language-plaintext highlighter-rouge">pkg-config.exe</code> behaves as a native pkg-config, but when run under Wine
this same binary takes the persona of a Linux program and becomes a cross
toolchain pkg-config, bypassing Win32 and talking directly with the Linux
kernel. <a href="https://justine.lol/cosmopolitan/">Cosmopolitan Libc</a> cleverly does this out of the box, but
in this article we’ll mash together a couple existing sources with a bit
of glue.</p>

<p>The results are in <a href="https://github.com/skeeto/u-config/commit/e0008d7e">the merge-demo branch</a> of u-config, and took
hardly any work:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git show --stat
...
 main_linux_amd64.c |   8 ++---
 main_wine.c        | 101 +++++++++++++++++++++++++++++++++++++++++
 src/linux_noarch.c |  16 ++++-----
 src/u-config.c     |   1 +
 4 files changed, 114 insertions(+), 12 deletions(-)
</code></pre></div></div>

<p>A platform layer, <code class="language-plaintext highlighter-rouge">main_wine.c</code>, is a merge of two existing platform
layers, one of which required unavoidable tweaks. We’ll get to those
details in a moment. First we’ll need to detect if we’re running under
Wine, and <a href="https://web.archive.org/web/20250923061634/https://stackoverflow.com/questions/7372388/determine-whether-a-program-is-running-under-wine-at-runtime/42333249#42333249">the best solution I found</a> was to locate
<code class="language-plaintext highlighter-rouge">ntdll!wine_get_version</code>. If this function exists, we’re in Wine. That
works out to a pretty one-liner because <code class="language-plaintext highlighter-rouge">ntdll.dll</code> is already loaded:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bool</span> <span class="nf">running_on_wine</span><span class="p">()</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">GetProcAddress</span><span class="p">(</span><span class="n">GetModuleHandleA</span><span class="p">(</span><span class="s">"ntdll"</span><span class="p">),</span> <span class="s">"wine_get_version"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>An x86-64 Linux syscall wrapper with <a href="/blog/2024/12/20/">thorough inline assembly</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">ptrdiff_t</span> <span class="nf">syscall3</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">b</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">r</span><span class="p">;</span>
    <span class="n">asm</span> <span class="k">volatile</span> <span class="p">(</span>
        <span class="s">"syscall"</span>
        <span class="o">:</span> <span class="s">"=a"</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"a"</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="s">"D"</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="s">"S"</span><span class="p">(</span><span class="n">b</span><span class="p">),</span> <span class="s">"d"</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"rcx"</span><span class="p">,</span> <span class="s">"r11"</span><span class="p">,</span> <span class="s">"memory"</span>
    <span class="p">);</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">ptrdiff_t</span> <span class="nf">write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">syscall3</span><span class="p">(</span><span class="n">SYS_write</span><span class="p">,</span> <span class="n">fd</span><span class="p">,</span> <span class="p">(</span><span class="kt">ptrdiff_t</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’d normally use <code class="language-plaintext highlighter-rouge">long</code> for all these integers because Linux is <a href="https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models">LP64</a>
(<code class="language-plaintext highlighter-rouge">long</code> is pointer-sized), but Windows is LLP64 (only <code class="language-plaintext highlighter-rouge">long long</code> is 64
bits). It’s so bizarre to interface with Linux from LLP64, and this will
have consequences later. With these pieces we can see the basic shape of a
split personality program:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">running_on_wine</span><span class="p">())</span> <span class="p">{</span>
        <span class="n">write</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">"hello, wine</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="mi">12</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">HANDLE</span> <span class="n">h</span> <span class="o">=</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="n">STD_OUTPUT_HANDLE</span><span class="p">);</span>
        <span class="n">WriteFile</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="s">"hello, windows</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>We can cram two programs into this binary and select which program at run
time depending on what we see. In typical programs locating and calling
into glibc would be a challenge, particularly with the incompatible ABIs
involved. We’re avoiding it here by interfacing directly with the kernel.</p>

<h3 id="application-to-u-config">Application to u-config</h3>

<p>Luckily u-config has completely-optional platform layers implemented with
Linux system calls. The POSIX platform layer works fine, and that’s what
distributions should generally use, but these bonus platforms are unhosted
and do not require libc. That means we can shove it into a Windows build
with relatively little trouble.</p>

<p>Before we do that, let’s think about what we’re doing. <a href="/blog/2021/08/21/">Debian has great
cross toolchain support</a>, including Mingw-w64. There are even a few
Windows libraries in the Debian package repository, <a href="https://packages.debian.org/trixie/x32/libz-mingw-w64">such as zlib</a>, and
we can build Windows programs against them. If you’re cross-building and
using pkg-config, you ought to use the cross toolchain pkg-config, which
in GNU ecosystems gets an architecture prefix like the other cross tools.
Debian cross toolchains each include a cross pkg-config, and it sometimes
<em>almost</em> works correctly! Here’s what I get on Debian 13:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ x86_64-w64-mingw32-pkg-config --cflags --libs zlib
-I/usr/x86_64-w64-mingw32/include -L/usr/x86_64-w64-mingw32/lib -lz
</code></pre></div></div>

<p>Note the architecture in the <code class="language-plaintext highlighter-rouge">-I</code> and <code class="language-plaintext highlighter-rouge">-L</code> options. It really is querying
the <a href="https://peter0x44.github.io/posts/cross-compilers/">cross sysroot</a>. However, these paths are inside the cross sysroot,
where the cross compiler already searches by default, so pkg-config
should not list them. It’s suboptimal and indicates
this pkg-config is probably misconfigured. In other cases it’s far from
correct:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ x86_64-w64-mingw32-pkg-config --variable pc_path pkg-config
/usr/local/lib/x86_64-linux-gnu/pkgconfig:...
</code></pre></div></div>

<p>A tool prefixed <code class="language-plaintext highlighter-rouge">x86_64-w64-mingw32-</code> should not produce paths containing
<code class="language-plaintext highlighter-rouge">x86_64-linux-gnu</code> (the host architecture in this case). Our version won’t
have these issues.</p>

<p>The u-config platform interface is five functions:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">filemap</span> <span class="nf">os_mapfile</span><span class="p">(</span><span class="n">os</span> <span class="o">*</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">s8</span> <span class="n">path</span><span class="p">);</span>  <span class="c1">// read whole files</span>
<span class="n">s8node</span> <span class="o">*</span><span class="nf">os_listing</span><span class="p">(</span><span class="n">os</span> <span class="o">*</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">s8</span> <span class="n">path</span><span class="p">);</span>  <span class="c1">// list directories</span>
<span class="kt">void</span>    <span class="nf">os_write</span><span class="p">(</span><span class="n">os</span> <span class="o">*</span><span class="p">,</span> <span class="n">i32</span> <span class="n">fd</span><span class="p">,</span> <span class="n">s8</span><span class="p">);</span>          <span class="c1">// standard out/err</span>
<span class="kt">void</span>    <span class="nf">os_fail</span><span class="p">(</span><span class="n">os</span> <span class="o">*</span><span class="p">);</span>                       <span class="c1">// non-zero exit</span>

<span class="kt">void</span> <span class="nf">uconfig</span><span class="p">(</span><span class="n">config</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>Platforms implement the first four functions, and call <code class="language-plaintext highlighter-rouge">uconfig()</code> with
the platform’s configuration, context pointer (<code class="language-plaintext highlighter-rouge">os *</code>), command line
arguments, environment, and some memory (all in the <code class="language-plaintext highlighter-rouge">config</code> object). My
strategy is to link two platforms into the binary, and the first challenge
is they both define <code class="language-plaintext highlighter-rouge">os_write</code>, etc. I did not plan nor intend for one
binary to contain more than one platform layer. Unity builds offer a fix
without changing a single line of code:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define os_fail     win32_fail
#define os_listing  win32_listing
#define os_mapfile  win32_mapfile
#define os_write    win32_write
#include</span> <span class="cpf">"main_windows.c"</span><span class="cp">
#undef os_write
#undef os_mapfile
#undef os_listing
#undef os_fail
</span>
<span class="cp">#define os_fail     linux_fail
#define os_listing  linux_listing
#define os_mapfile  linux_mapfile
#define os_write    linux_write
#include</span> <span class="cpf">"main_linux_amd64.c"</span><span class="cp">
#undef os_write
#undef os_mapfile
#undef os_listing
#undef os_fail
</span></code></pre></div></div>

<p>This dirty but effective trick <a href="/blog/2025/02/05/">may look familiar</a>. It also doesn’t
interfere with the other builds. Next I define the real platform functions
as a dispatch based on our run-time situation:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b32</span> <span class="n">wine_detected</span><span class="p">;</span>

<span class="n">filemap</span> <span class="nf">os_mapfile</span><span class="p">(</span><span class="n">os</span> <span class="o">*</span><span class="n">ctx</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">s8</span> <span class="n">path</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">wine_detected</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">linux_mapfile</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">path</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">win32_mapfile</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">path</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If I were serious about keeping this experiment, I’d lift <code class="language-plaintext highlighter-rouge">os</code> as I did
the functions (as <code class="language-plaintext highlighter-rouge">win32_os</code>, <code class="language-plaintext highlighter-rouge">linux_os</code>) and include <code class="language-plaintext highlighter-rouge">wine_detected</code> in
the context, eliminating this global variable. That cannot be done with
simple hacks and macros.</p>

<p>The next challenge is that I wrote the Linux platform layer assuming LP64,
and so it uses <code class="language-plaintext highlighter-rouge">long</code> instead of an equivalent platform-agnostic type like
<code class="language-plaintext highlighter-rouge">ptrdiff_t</code>. I never thought this would be an issue because this source
literally contains <code class="language-plaintext highlighter-rouge">asm</code> blocks and no conditional compilation, yet here
we are. Lesson learned. I wanted to try an extremely janky <code class="language-plaintext highlighter-rouge">#define</code> on
<code class="language-plaintext highlighter-rouge">long</code> to fix it, but this source file has a couple <code class="language-plaintext highlighter-rouge">long long</code> that won’t
play along. These multi-token type names of C are antithetical to its
preprocessor! So I adjusted the source manually instead.</p>
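
<p>The data model difference is easy to observe without a C compiler. The
following is a minimal sketch in Python using the standard <code class="language-plaintext highlighter-rouge">ctypes</code>
module, not anything from u-config. The helper <code class="language-plaintext highlighter-rouge">c_type_sizes</code> is a
hypothetical name, and <code class="language-plaintext highlighter-rouge">c_ssize_t</code> stands in for <code class="language-plaintext highlighter-rouge">ptrdiff_t</code>, which
<code class="language-plaintext highlighter-rouge">ctypes</code> does not expose:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ctypes

def c_type_sizes():
    """Byte sizes of the C types behind the LP64/LLP64 split."""
    return {
        "long":      ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "ssize_t":   ctypes.sizeof(ctypes.c_ssize_t),  # stand-in for ptrdiff_t
        "void *":    ctypes.sizeof(ctypes.c_void_p),
    }

for name, size in c_type_sizes().items():
    print(f"{name:10} {size}")
</code></pre></div></div>

<p>Under LP64 the <code class="language-plaintext highlighter-rouge">long</code> row matches the pointer row, and under LLP64
it does not. Only the pointer-sized types are safe for system call
arguments in both worlds.</p>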

<p>The Windows and Linux platform entry points are completely different, both
in name and form, and so co-exist naturally. The merged platform layer is
a new entry point that will pass control to the appropriate entry point:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">entrypoint</span><span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="o">*</span><span class="n">stack</span><span class="p">);</span>  <span class="c1">// Linux</span>
<span class="kt">void</span> <span class="kr">__stdcall</span> <span class="nf">mainCRTStartup</span><span class="p">();</span>    <span class="c1">// Windows</span>
</code></pre></div></div>

<p>On Linux <code class="language-plaintext highlighter-rouge">stack</code> is <a href="/blog/2025/03/06/">the initial value of the stack pointer</a>, which
<a href="https://articles.manugarg.com/aboutelfauxiliaryvectors">points to <code class="language-plaintext highlighter-rouge">argc</code>, <code class="language-plaintext highlighter-rouge">argv</code>, <code class="language-plaintext highlighter-rouge">envp</code>, and <code class="language-plaintext highlighter-rouge">auxv</code></a>. We’ll need to construct
an artificial “stack” for the Linux platform layer to harvest. On Windows
this is <a href="/blog/2023/02/15/">the process entry point</a>, and it will find the rest on its
own as a normal Windows process. Ultimately this ended up simpler than I
expected:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="kr">__stdcall</span> <span class="nf">merge_entrypoint</span><span class="p">()</span>
<span class="p">{</span>
    <span class="n">wine_detected</span> <span class="o">=</span> <span class="n">running_on_wine</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">wine_detected</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">u8</span> <span class="o">*</span><span class="n">fakestack</span><span class="p">[</span><span class="n">CMDLINE_ARGV_MAX</span><span class="o">+</span><span class="mi">1</span><span class="p">];</span>
        <span class="n">c16</span> <span class="o">*</span><span class="n">cmd</span> <span class="o">=</span> <span class="n">GetCommandLineW</span><span class="p">();</span>
        <span class="n">fakestack</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">u8</span> <span class="o">*</span><span class="p">)(</span><span class="n">iz</span><span class="p">)</span><span class="n">cmdline_to_argv8</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">fakestack</span><span class="o">+</span><span class="mi">1</span><span class="p">);</span>
        <span class="c1">// TODO: append envp to the fake stack</span>
        <span class="n">entrypoint</span><span class="p">((</span><span class="n">iz</span> <span class="o">*</span><span class="p">)</span><span class="n">fakestack</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">mainCRTStartup</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Where <a href="/blog/2022/02/18/"><code class="language-plaintext highlighter-rouge">cmdline_to_argv8</code> is my Windows argument parser</a>, already
used by u-config, and I reserve one element at the front to store <code class="language-plaintext highlighter-rouge">argc</code>.
Since this is just a proof-of-concept I didn’t bother fabricating and
pushing <code class="language-plaintext highlighter-rouge">envp</code> onto the fake stack. The Linux entry point doesn’t need
<code class="language-plaintext highlighter-rouge">auxv</code>, so it can be omitted. Once in the Linux entry point it’s essentially
a Linux process from then on, except that the x64 calling convention is still in
use internally.</p>

<p>Finally, I configure the Linux platform layer for Debian’s cross sysroot:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define PKG_CONFIG_LIBDIR "/usr/x86_64-w64-mingw32/lib/pkgconfig"
#define PKG_CONFIG_SYSTEM_INCLUDE_PATH "/usr/x86_64-w64-mingw32/include</span><span class="cpf">"
#define PKG_CONFIG_SYSTEM_LIBRARY_PATH "</span><span class="c1">/usr/x86_64-w64-mingw32/lib"</span><span class="cp">
</span></code></pre></div></div>

<p>And that’s it! We have our platform merge. Build (<a href="https://github.com/skeeto/w64devkit">w64devkit</a>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -nostartfiles -e merge_entrypoint -o pkg-config.exe main_wine.c
</code></pre></div></div>

<p>On Debian use <code class="language-plaintext highlighter-rouge">x86_64-w64-mingw32-gcc</code> for <code class="language-plaintext highlighter-rouge">cc</code>. The <code class="language-plaintext highlighter-rouge">-e</code> linker option
selects the new, higher level entry point. After installing <a href="https://packages.debian.org/trixie/wine-binfmt">Wine
binfmt</a>, here’s how it looks on Debian:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./pkg-config.exe --cflags --libs zlib
-lz
</code></pre></div></div>

<p>That’s the correct output, but is it using the cross sysroot? Ask it to
include the <code class="language-plaintext highlighter-rouge">-I</code> argument despite it being in the cross sysroot:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./pkg-config.exe --cflags --libs --keep-system-cflags zlib
-I/usr/x86_64-w64-mingw32/include -lz
</code></pre></div></div>

<p>Looking good! It passes the <code class="language-plaintext highlighter-rouge">pc_path</code> test, too:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./pkg-config.exe --variable pc_path pkg-config
/usr/x86_64-w64-mingw32/lib/pkgconfig
</code></pre></div></div>

<p>Running <em>this same binary</em> on Windows after installing zlib in w64devkit:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./pkg-config.exe --cflags --libs --keep-system-cflags zlib
-IC:/w64devkit/include -lz
</code></pre></div></div>

<p>Also:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./pkg-config.exe --variable pc_path pkg-config
C:/w64devkit/lib/pkgconfig;C:/w64devkit/share/pkgconfig
</code></pre></div></div>

<p>My Frankenwine is a success!</p>

]]>
    </content>
  </entry>
  
  <entry>
    <title>WebAssembly as a Python extension platform</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2026/01/01/"/>
    <id>urn:uuid:91e7555d-950f-47c6-84b8-bee0070f61a9</id>
    <updated>2026-01-01T21:21:19Z</updated>
    <category term="c"/>
    <content type="html">
      <![CDATA[<p>Software above some complexity level tends to sport an extension language,
becoming a kind of software platform itself. Lua fills this role well, and
of course there’s JavaScript for web technologies. <a href="/blog/2025/04/04/">WebAssembly</a>
generalizes this, and any Wasm-targeting programming language can extend a
Wasm-hosting application. It has more friction than supplying a script in
a text file, but extension authors can write in their language of choice,
and use more polished development tools (debugging, <a href="/blog/2025/02/05/">testing</a>, etc.)
than are typically available for an extension language. Python is
traditionally extended through native code behind a C interface, but it’s
recently become practical to extend Python with Wasm. That is, we can ship
an architecture-independent Wasm blob inside a Python library, and use it
without requiring a native toolchain on the host system. Let’s discuss two
different use cases and their pitfalls.</p>

<p>Normally we’d extend Python in order to access an external interface that
Python cannot access on its own. Wasm runs in a sandbox with no access to
the outside world whatsoever, so it obviously isn’t useful for that case.
Extensions may also grant Python more speed, which is one of Wasm’s main
selling points. We can also use Wasm to access <em>embeddable capabilities</em>
written in a different programming language which do not require external
access.</p>

<p>My preferred non-WASI Wasm runtime is Volodymyr Shymanskyy’s <a href="https://github.com/wasm3/wasm3">wasm3</a>.
It’s plain old C and very friendly to embedding in the same way as, say,
SQLite. Performance is middling, though a C program running on wasm3 is
still quite a bit faster than an equivalent Python program. It has Python
bindings, <a href="https://github.com/wasm3/pywasm3">pywasm3</a>, but it’s distributed only in source code form. That
is, the host machine must have a C toolchain in order to use pywasm3,
which defeats my purposes here. If there’s a C toolchain, I might as well
just use that instead of going through Wasm.</p>

<p>For the use cases in this article, the best option is <a href="https://github.com/bytecodealliance/wasmtime-py">wasmtime-py</a>. The
distribution includes binaries for Windows, macOS, and Linux on x86-64 and
ARM64, which covers nearly all Python installations. Hosts require nothing
more than a Python interpreter, no native toolchains. It’s almost as good
as having Wasm built into Python itself. In my tests it’s 3x–10x faster
than wasm3, so for my first use case the situation is even better. The
catch is that it currently weighs ~18MiB (installed), and in the future
will likely rival the Python interpreter itself. The API also breaks on a
monthly basis, so you’re signing up for the upgrade treadmill lest your
own program perish to bitrot after a couple of years. This article is
about version 40.</p>

<h3 id="usage-examples-and-gotchas">Usage examples and gotchas</h3>

<p>The <a href="https://github.com/bytecodealliance/wasmtime-py/tree/main/examples">official examples</a> don’t do anything non-trivial or interesting,
and so to figure things out I had to study <a href="https://bytecodealliance.github.io/wasmtime-py/">the documentation</a>,
which does not offer many hints. Basic setup looks like this:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">functools</span>
<span class="kn">import</span> <span class="nn">wasmtime</span>

<span class="n">store</span>    <span class="o">=</span> <span class="n">wasmtime</span><span class="p">.</span><span class="n">Store</span><span class="p">()</span>
<span class="n">module</span>   <span class="o">=</span> <span class="n">wasmtime</span><span class="p">.</span><span class="n">Module</span><span class="p">.</span><span class="n">from_file</span><span class="p">(</span><span class="n">store</span><span class="p">.</span><span class="n">engine</span><span class="p">,</span> <span class="s">"example.wasm"</span><span class="p">)</span>
<span class="n">instance</span> <span class="o">=</span> <span class="n">wasmtime</span><span class="p">.</span><span class="n">Instance</span><span class="p">(</span><span class="n">store</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="p">())</span>
<span class="n">exports</span>  <span class="o">=</span> <span class="n">instance</span><span class="p">.</span><span class="n">exports</span><span class="p">(</span><span class="n">store</span><span class="p">)</span>

<span class="n">memory</span> <span class="o">=</span> <span class="n">exports</span><span class="p">[</span><span class="s">"memory"</span><span class="p">].</span><span class="n">get_buffer_ptr</span><span class="p">(</span><span class="n">store</span><span class="p">)</span>
<span class="n">func1</span>  <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"func1"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
<span class="n">func2</span>  <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"func2"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
<span class="n">func3</span>  <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"func3"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
</code></pre></div></div>

<p>A store is an allocation region from which we allocate all Wasm objects.
It is not possible to free individual objects except to discard the whole
store. Quite sensible, honestly. What’s <em>not</em> sensible is how often I have
to repeat myself, passing the store back into every object in order to use
it. These objects are associated with exactly one store and cannot be used
with different stores. <a href="https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#cross-store-usage-of-items">Use the wrong store and it panics</a>: It’s
already keeping track internally! I do not understand why the interface
works this way. So to make things simpler, I use <code class="language-plaintext highlighter-rouge">functools.partial</code> to
bind the <code class="language-plaintext highlighter-rouge">store</code> parameter and so get the interface I expect.</p>
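
<p>With more than a few exports, the binding can be done in one pass. The
following is a minimal sketch that runs without wasmtime installed:
<code class="language-plaintext highlighter-rouge">FakeStore</code>, <code class="language-plaintext highlighter-rouge">FakeFunc</code>, and <code class="language-plaintext highlighter-rouge">bind_store</code> are hypothetical names, and
the exports are modeled as a plain dictionary, which is an assumption
about the shape of the real exports object:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import functools

class FakeStore:
    """Stand-in for wasmtime.Store; counts calls routed through it."""
    def __init__(self):
        self.calls = 0

class FakeFunc:
    """Stand-in for an exported function: the store is always argument one."""
    def __init__(self, impl):
        self.impl = impl

    def __call__(self, store, *args):
        store.calls += 1
        return self.impl(*args)

def bind_store(exports, store):
    """Bind the store to every callable export, skipping memories etc."""
    return {
        name: functools.partial(item, store)
        for name, item in exports.items()
        if callable(item)
    }

store = FakeStore()
exports = {"add": FakeFunc(lambda a, b: a + b), "memory": bytearray(16)}
funcs = bind_store(exports, store)
print(funcs["add"](2, 3))
</code></pre></div></div>

<p>Non-callable exports such as the memory are left out, since their methods
take the store in the same position and can be bound individually.</p>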

<p>The object returned by <code class="language-plaintext highlighter-rouge">get_buffer_ptr</code> is a buffer protocol object, and if you’re
moving anything other than bytes that’s probably what you want to use to
access memory. The usual caveats apply for this object: If you <a href="/blog/2025/04/19/">change the
memory size</a> you probably want to grab a fresh buffer object. For
bytes (e.g. buffers and strings) I prefer the <code class="language-plaintext highlighter-rouge">read</code> and <code class="language-plaintext highlighter-rouge">write</code> methods.</p>
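
<p>As an illustration of moving something other than bytes, here is a minimal
sketch that stores and loads a 32-bit integer in little endian byte order,
as Wasm linear memory is laid out. A <code class="language-plaintext highlighter-rouge">bytearray</code> stands in for the real
buffer object, and <code class="language-plaintext highlighter-rouge">store_i32</code> and <code class="language-plaintext highlighter-rouge">load_i32</code> are hypothetical helpers.
Whether the real object accepts slice assignment the same way is an
assumption:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>memory = bytearray(64 * 1024)  # stand-in for one 64KiB page of linear memory

def store_i32(mem, addr, value):
    """Write a signed 32-bit integer at addr, little endian like Wasm."""
    mem[addr:addr + 4] = value.to_bytes(4, "little", signed=True)

def load_i32(mem, addr):
    """Read a signed 32-bit integer from addr."""
    return int.from_bytes(mem[addr:addr + 4], "little", signed=True)

store_i32(memory, 16, -2)
print(load_i32(memory, 16), bytes(memory[16:20]))
</code></pre></div></div>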

<p>Because <a href="https://github.com/WebAssembly/multi-value/blob/master/proposals/multi-value/Overview.md">multi-value</a> is still in an experimental state in the Wasm
ecosystem, you will likely not pass structs with Wasm. Anything more
complicated than scalars will require pointers and copying data in and out
of Wasm linear memory. This involves the usual trap that catches nearly
everyone: Wasm interfaces make no distinction between pointers and
integers, and Wasm runtimes generally interpret all integers as
signed. What that means is <strong>your pointers are signed unless you take
action</strong>. Addresses start at 0, so this is bad, bad news.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">malloc</span> <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"func1"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>

<span class="n">hello</span> <span class="o">=</span> <span class="sa">b</span><span class="s">"hello"</span>
<span class="n">pointer</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">hello</span><span class="p">))</span>
<span class="k">assert</span> <span class="n">pointer</span>
<span class="n">memory</span> <span class="o">=</span> <span class="n">exports</span><span class="p">[</span><span class="s">"memory"</span><span class="p">].</span><span class="n">write</span><span class="p">(</span><span class="n">store</span><span class="p">,</span> <span class="n">hello</span><span class="p">,</span> <span class="n">pointer</span><span class="p">)</span>  <span class="c1"># WRONG!
</span></code></pre></div></div>

<p>To make matters worse, wasmtime-py adds its own footgun: The <code class="language-plaintext highlighter-rouge">read</code> and
<code class="language-plaintext highlighter-rouge">write</code> methods adopt the questionable Python convention of negative
indices acting from the end. If <code class="language-plaintext highlighter-rouge">malloc</code> returns a pointer in the upper
half of memory, the negative pointer will pass the bounds check inside
<code class="language-plaintext highlighter-rouge">write</code> because negative is valid, then quietly store to the wrong
address! Doh!</p>

<p>I wondered how common this error is, so I searched online. I could find only
one non-trivial wasmtime-py use in the wild, in a sandboxed PDF reader. It
falls into the negative pointer trap as I expected. Not only that, it’s <a href="https://github.com/paulocoutinhox/pdfium-lib/blob/139d5037/modules/wasm.py#L601-L606">a
buffer overflow into Python’s memory space</a>:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            <span class="n">buf_ptr</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">store</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">pdf_data</span><span class="p">))</span>
            <span class="n">mem_data</span> <span class="o">=</span> <span class="n">memory</span><span class="p">.</span><span class="n">data_ptr</span><span class="p">(</span><span class="n">store</span><span class="p">)</span>

            <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">byte</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">pdf_data</span><span class="p">):</span>
                <span class="n">mem_data</span><span class="p">[</span><span class="n">buf_ptr</span> <span class="o">+</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">byte</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">data_ptr</code> method returns a non-bounds-checked raw <code class="language-plaintext highlighter-rouge">ctypes</code> pointer,
so this is actually a double mistake. First, it shouldn’t trust pointers
coming out of Wasm if it cares at all about sandboxing. The second is the
potential negative pointer, which in this case would write outside of the
Wasm memory and in Python’s memory, hopefully seg-faulting.</p>

<p>What’s one to do? <strong>Every pointer coming out of Wasm must be truncated</strong>
with a mask:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pointer</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(...)</span> <span class="o">&amp;</span> <span class="mh">0xffffffff</span>   <span class="c1"># correct for wasm32!
</span></code></pre></div></div>

<p>This interprets the result as unsigned. 64-bit Wasm needs a 64-bit mask,
though in practice you will never get a valid negative pointer from 64-bit
Wasm. This rule applies to JavaScript as well, where the idiom is:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">let</span> <span class="nx">pointer</span> <span class="o">=</span> <span class="nx">malloc</span><span class="p">(...)</span> <span class="o">&gt;&gt;&gt;</span> <span class="mi">0</span>
</code></pre></div></div>

<p>Wasm runtimes cannot help — they lack the necessary information — and this
is perhaps a fundamental flaw in Wasm’s design. Once you know about it you
see this mistake happening everywhere.</p>
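
<p>One way to make the mask harder to forget is to route every pointer
through a single small helper. A sketch, where <code class="language-plaintext highlighter-rouge">wasm_address</code> is a
hypothetical name and the inputs are made-up values standing in for what a
guest function might return:</p>

```python
def wasm_address(value, bits=32):
    # Reinterpret a possibly-negative integer from Wasm as an unsigned address
    return value & ((1 << bits) - 1)

low  = wasm_address(16)             # small pointers pass through unchanged
high = wasm_address(-2147483632)    # upper-half wasm32 pointer, as received
wide = wasm_address(-1, bits=64)    # wasm64 needs the 64-bit mask
```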

<p>Now that you have a proper address, you can apply it to a buffer protocol
view of memory. If you’re using NumPy there are various ways to interact
with this memory by wrapping it in NumPy types, though only if you’re on a
little endian host. (If you’re on a big endian machine, just give up on
running Wasm anyway.) The first use case I have in mind typically involves
copying plain Python values in and out. The <a href="https://docs.python.org/3/library/struct.html"><code class="language-plaintext highlighter-rouge">struct</code> package</a> is
quite handy here:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vec2</span>   <span class="o">=</span> <span class="n">malloc</span><span class="p">(...)</span> <span class="o">&amp;</span> <span class="mh">0xffffffff</span>
<span class="n">memory</span> <span class="o">=</span> <span class="n">exports</span><span class="p">[</span><span class="s">"memory"</span><span class="p">].</span><span class="n">get_buffer_ptr</span><span class="p">(</span><span class="n">store</span><span class="p">)</span>
<span class="n">struct</span><span class="p">.</span><span class="n">pack_into</span><span class="p">(</span><span class="s">"&lt;ii"</span><span class="p">,</span> <span class="n">memory</span><span class="p">,</span> <span class="n">vec2</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
</code></pre></div></div>

<p>It fills a similar role to <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DataView">JavaScript <code class="language-plaintext highlighter-rouge">DataView</code></a>. If you’re copying
lots of numbers, with CPython it’s faster to construct a custom format
string rather than use a loop:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">nums</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="p">...</span>
<span class="n">struct</span><span class="p">.</span><span class="n">pack_into</span><span class="p">(</span><span class="sa">f</span><span class="s">"&lt;</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span><span class="si">}</span><span class="s">i"</span><span class="p">,</span> <span class="n">memory</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="o">*</span><span class="n">nums</span><span class="p">)</span>
</code></pre></div></div>

<p>To copy structures back out, use <code class="language-plaintext highlighter-rouge">struct.unpack_from</code>. If you’re moving
strings, you’ll need to <code class="language-plaintext highlighter-rouge">.encode()</code> and <code class="language-plaintext highlighter-rouge">.decode()</code> to convert to and from
<code class="language-plaintext highlighter-rouge">bytes</code>, which are well-suited to <code class="language-plaintext highlighter-rouge">read</code> and <code class="language-plaintext highlighter-rouge">write</code>.</p>
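
<p>Here’s a round trip sketched against a plain <code class="language-plaintext highlighter-rouge">bytearray</code>, which stands
in for the buffer protocol view of Wasm memory. The addresses are
hypothetical, as if they had come from the guest allocator and were already
masked:</p>

```python
import struct

memory = bytearray(256)   # stand-in for the view of linear memory

# A two-field struct in, then back out
vec2 = 16                 # hypothetical, already-masked address
struct.pack_into("<ii", memory, vec2, 3, -4)
x, y = struct.unpack_from("<ii", memory, vec2)

# A whole array of numbers with one custom format string each way
nums = [10, 20, 30, 40]
buf  = 32                 # hypothetical address
fmt  = f"<{len(nums)}i"
struct.pack_into(fmt, memory, buf, *nums)
out = list(struct.unpack_from(fmt, memory, buf))

# Strings travel as bytes: encode on the way in, decode on the way out
text   = "héllo"
raw    = text.encode()    # UTF-8, so len(raw) may exceed len(text)
strptr = 64               # hypothetical address
memory[strptr:strptr + len(raw)] = raw
decoded = bytes(memory[strptr:strptr + len(raw)]).decode()
```

<p>Note that allocation sizes must come from the encoded length, not the
length of the original string.</p>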

<p>In practice with real Wasm programs you’re going to be interacting with
the “guest” allocator from the outside, to request memory into which you
copy inputs for a function. In my examples I’ve used <code class="language-plaintext highlighter-rouge">malloc</code> because it
requires no elaboration, but as usual <a href="/blog/2023/09/27/">a bump allocator</a> solves
this so much better, especially because it doesn’t require stuffing a
whole general purpose allocator inside the Wasm program. Have one global
arena (no other threads will be sharing that Wasm instance), rapid fire a
bunch of allocations as needed without any concern for memory management
in the “host”, call the function, which might allocate a result from that
arena, then reset the arena to clean up. In essence a stack for passing
values in and out.</p>
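
<p>To make the allocate, call, reset rhythm concrete, here’s a toy model of
such an arena in pure Python. It’s only an illustration under my own
assumptions: a real one lives inside the guest, and the <code class="language-plaintext highlighter-rouge">BumpArena</code> name,
the base address of 16, and the 8-byte alignment are all hypothetical:</p>

```python
class BumpArena:
    """Toy model of a guest bump allocator over a fixed block of memory."""

    def __init__(self, size, base=16):
        self.memory = bytearray(size)
        self.base = base          # first usable address; 0 stays "null"
        self.used = base

    def alloc(self, size, align=8):
        # Round up to the alignment, which must be a power of two
        start = (self.used + align - 1) & -align
        if size < 0 or start + size > len(self.memory):
            return 0              # out of memory
        self.used = start + size
        return start

    def reset(self):
        # Wipe everything handed out, then rewind to the base
        self.memory[self.base:self.used] = bytes(self.used - self.base)
        self.used = self.base

arena = BumpArena(128)
a = arena.alloc(5)                # input buffer
arena.memory[a:a + 5] = b"hello"
b = arena.alloc(4)                # space for a result
full = arena.alloc(1000)          # too large: reports failure with 0
arena.reset()                     # one call frees it all
c = arena.alloc(5)                # same address is handed out again
```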

<h3 id="webassembly-as-faster-python">WebAssembly as faster Python</h3>

<p>Suppose we noticed a computational hot spot in our Python program in a
pure Python function (i.e. not calling out to an extension). Optimizing
this function would be wise. Based on my experiments if I re-implement
that function in C, compile it to Wasm, then run that bit of Wasm in place
of the original function, I can expect around a 10x speed-up. In general C
is more like 100x faster than Python, and the overhead of interfacing with
Wasm — copying stuff in and out, etc. — can be high, but not so high as to
not be profitable. This improves further if I can change the interface,
e.g. require callers to use the buffer protocol.</p>

<p>Thanks to wasmtime-py, I could introduce this change without fussing with
cross-compilers to build distribution binaries, nor require a toolchain on
the target, just a hefty Python package. Might be worth it.</p>

<p>My <a href="https://github.com/skeeto/scratch/tree/master/wasm-bench">main experimental benchmark</a> is a variation on <a href="/blog/2023/06/26/">my solution to
the “Two Sum” problem</a>, which I originally wrote for JavaScript, then
extended to pywasm3 and later wasmtime-py. It’s simple, just interesting
enough, and representative of the sort of Wasm drop-in I have in mind. It
has the same interface, but implements it with Wasm.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Original Pythonic interface
</span><span class="k">def</span> <span class="nf">twosum</span><span class="p">(</span><span class="n">nums</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">target</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">]</span> <span class="o">|</span> <span class="bp">None</span><span class="p">:</span>
    <span class="p">...</span>

<span class="c1"># Stateful Wasm interface
</span><span class="k">class</span> <span class="nc">TwoSumWasm</span><span class="p">():</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">store</span>    <span class="o">=</span> <span class="n">wasmtime</span><span class="p">.</span><span class="n">Store</span><span class="p">()</span>
        <span class="n">module</span>   <span class="o">=</span> <span class="n">wasmtime</span><span class="p">.</span><span class="n">Module</span><span class="p">.</span><span class="n">from_file</span><span class="p">(</span><span class="n">store</span><span class="p">.</span><span class="n">engine</span><span class="p">,</span> <span class="p">...)</span>
        <span class="n">instance</span> <span class="o">=</span> <span class="n">wasmtime</span><span class="p">.</span><span class="n">Instance</span><span class="p">(</span><span class="n">store</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="p">())</span>
        <span class="p">...</span>

    <span class="k">def</span> <span class="nf">twosum</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nums</span><span class="p">,</span> <span class="n">target</span><span class="p">):</span>
        <span class="c1"># ... use wasm instance ...
</span></code></pre></div></div>

<p>There’s some state to it with the Wasm instance in tow. If you hide that
by making it global you’ll need to synchronize your threads around it. In
a multi-threaded program perhaps these would be lazily-constructed thread
locals. I haven’t had to solve this yet.</p>
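
<p>For what it’s worth, the thread-local idea might look something like
this. It’s an untested-in-anger sketch: <code class="language-plaintext highlighter-rouge">thread_instance</code> is a hypothetical
helper, and <code class="language-plaintext highlighter-rouge">FakeTwoSum</code> is a stand-in for a class that owns a store and a
Wasm instance:</p>

```python
import threading

_local = threading.local()

def thread_instance(factory):
    # Construct this thread's instance on first use, then reuse it
    instance = getattr(_local, "instance", None)
    if instance is None:
        instance = _local.instance = factory()
    return instance

class FakeTwoSum:
    # Hypothetical stand-in for a class wrapping a store and Wasm instance
    pass

seen = []

def worker():
    seen.append(thread_instance(FakeTwoSum))

first  = thread_instance(FakeTwoSum)   # constructed here
second = thread_instance(FakeTwoSum)   # reused, same thread
t = threading.Thread(target=worker)
t.start()
t.join()                               # the worker built its own instance
```

<p>Each thread pays the construction cost once, and no locking is needed
because no instance is ever shared.</p>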

<p>However, the weakness of the wasmtime “store” really shows: Notice how
compilation and instantiation are bound together in one store? <del>I cannot
compile once and then create disposable instances on the fly</del>, e.g. as
required for each run of a WASI program. Every instance permanently
extends the compilation store. In practice we must wastefully re-compile
the Wasm program for each disposable instance. Despite appearances,
compilation and instantiation are not actually distinct steps, as they are
in JavaScript’s Wasm API. <code class="language-plaintext highlighter-rouge">wasmtime.Instance</code> accepts a store as its first
argument, <em>suggesting</em> use of a different store for instantiation. That
would solve this problem, but as of this writing it <em>must</em> be the same
store used to compile the module. <del>This is a fatal flaw for certain real
use cases, particularly WASI.</del></p>

<p><strong>Update</strong>: Wolfgang Meier points out the <code class="language-plaintext highlighter-rouge">serialize</code> and <code class="language-plaintext highlighter-rouge">deserialize</code>
methods, which detach a compiled module from its store, allowing for
independent instantiations. I tried it, and it’s a practical workaround.
Overhead is low; no validation when deserializing. My benchmark now does
it for future reference, as I expect it to be my typical use case.</p>

<h3 id="webassembly-as-embedded-capabilities">WebAssembly as embedded capabilities</h3>

<p>Loup Vaillant’s <a href="https://monocypher.org/">Monocypher</a> is a wonderful cryptography library.
Lean, efficient, and embedding-friendly, so much so it’s distributed in
amalgamated form. It requires no libc or runtime, so we can compile it
straight to Wasm with almost any Clang toolchain:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ clang --target=wasm32 -nostdlib -O2 -Wl,--no-entry -Wl,--export-all
        -o monocypher.wasm monocypher.c
</code></pre></div></div>

<p>It’s not “Wasm-aware” so I need <code class="language-plaintext highlighter-rouge">--export-all</code> to expose the interface.
This is swell because, as a single translation unit, anything with external
linkage is the interface. Though remember what I said about interacting
with the guest allocator? This has no allocator, nor should it. It’s not
so usable in this form because we’d need to manage memory from the
outside. Do-able, but it’s easy to improve by adding a couple more
functions, sticking to a single translation unit:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"monocypher.c"</span><span class="cp">
</span>
<span class="k">extern</span> <span class="kt">char</span>  <span class="n">__heap_base</span><span class="p">[];</span>
<span class="k">static</span> <span class="kt">char</span> <span class="o">*</span><span class="n">heap_used</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">char</span> <span class="o">*</span><span class="n">heap_high</span><span class="p">;</span>

<span class="kt">void</span> <span class="o">*</span><span class="nf">bump_alloc</span><span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">size</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// ...</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">bump_reset</span><span class="p">()</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span> <span class="o">=</span> <span class="n">heap_used</span> <span class="o">-</span> <span class="n">__heap_base</span><span class="p">;</span>
    <span class="n">__builtin_memset</span><span class="p">(</span><span class="n">__heap_base</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>  <span class="c1">// wipe keys, etc.</span>
    <span class="n">heap_used</span> <span class="o">=</span> <span class="n">__heap_base</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’ve <a href="/blog/2025/04/19/">discussed <code class="language-plaintext highlighter-rouge">__heap_base</code> before</a>, which is part of the ABI.
We’ll push keys, inputs, etc. onto this “stack”, run our cryptography
routine, copy out the result, then reset the bump allocator, which wipes
out all sensitive data. Often <code class="language-plaintext highlighter-rouge">memset</code> is insufficient — typically it’s
zero-then-free, and compilers see the <a href="/blog/2025/09/30/">lifetime</a> about to end — but no
lifetime ends here, and stores to this “heap” memory are externally observable
as far as the abstract machine can tell. (Otherwise we couldn’t reliably
copy out our results!)</p>

<p>There’s a lot to this API, but I’m only going to look at <a href="https://monocypher.org/manual/aead">the AEAD
interface</a>. We “lock” up some data in an encrypted box, write any
unencrypted label we’d like on the outside. Then later we can unlock the
box, which will only open for us if neither the contents of the box nor
the label were tampered with. That’s some solid API design:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">crypto_aead_lock</span><span class="p">(</span><span class="kt">uint8_t</span>       <span class="o">*</span><span class="n">cipher_text</span><span class="p">,</span>
                      <span class="kt">uint8_t</span>        <span class="n">mac</span>  <span class="p">[</span><span class="mi">16</span><span class="p">],</span>
                      <span class="k">const</span> <span class="kt">uint8_t</span>  <span class="n">key</span>  <span class="p">[</span><span class="mi">32</span><span class="p">],</span>
                      <span class="k">const</span> <span class="kt">uint8_t</span>  <span class="n">nonce</span><span class="p">[</span><span class="mi">24</span><span class="p">],</span>
                      <span class="k">const</span> <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">ad</span><span class="p">,</span>         <span class="kt">size_t</span> <span class="n">ad_size</span><span class="p">,</span>
                      <span class="k">const</span> <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">plain_text</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">text_size</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">crypto_aead_unlock</span><span class="p">(</span><span class="kt">uint8_t</span>       <span class="o">*</span><span class="n">plain_text</span><span class="p">,</span>
                       <span class="k">const</span> <span class="kt">uint8_t</span>  <span class="n">mac</span>  <span class="p">[</span><span class="mi">16</span><span class="p">],</span>
                       <span class="k">const</span> <span class="kt">uint8_t</span>  <span class="n">key</span>  <span class="p">[</span><span class="mi">32</span><span class="p">],</span>
                       <span class="k">const</span> <span class="kt">uint8_t</span>  <span class="n">nonce</span><span class="p">[</span><span class="mi">24</span><span class="p">],</span>
                       <span class="k">const</span> <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">ad</span><span class="p">,</span>          <span class="kt">size_t</span> <span class="n">ad_size</span><span class="p">,</span>
                       <span class="k">const</span> <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">cipher_text</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">text_size</span><span class="p">);</span>
</code></pre></div></div>

<p>By compiling to Wasm we can access this functionality from Python almost
like it was pure Python, and interact with other systems using Monocypher.</p>

<p>Since Monocypher does not interact with the outside world on its own, it
relies on callers to use their system’s CSPRNG to create those nonces and
keys, which we’ll do using <a href="https://docs.python.org/3/library/secrets.html">the <code class="language-plaintext highlighter-rouge">secrets</code> built-in package</a>:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Monocypher</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="p">...</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_read</span>   <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">memory</span><span class="p">.</span><span class="n">read</span><span class="p">,</span> <span class="n">store</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_write</span>  <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">memory</span><span class="p">.</span><span class="n">write</span><span class="p">,</span> <span class="n">store</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">__alloc</span> <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"bump_alloc"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_reset</span>  <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"bump_reset"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_lock</span>   <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"crypto_aead_lock"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_unlock</span> <span class="o">=</span> <span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">exports</span><span class="p">[</span><span class="s">"crypto_aead_unlock"</span><span class="p">],</span> <span class="n">store</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_csprng</span> <span class="o">=</span> <span class="n">secrets</span><span class="p">.</span><span class="n">SystemRandom</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">_alloc</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">__alloc</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xffffffff</span>

    <span class="k">def</span> <span class="nf">generate_key</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">_csprng</span><span class="p">.</span><span class="n">randbytes</span><span class="p">(</span><span class="mi">32</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">generate_nonce</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">_csprng</span><span class="p">.</span><span class="n">randbytes</span><span class="p">(</span><span class="mi">24</span><span class="p">)</span>

    <span class="p">...</span>
</code></pre></div></div>

<p>With a solid foundation, all that follows comes easily. A <code class="language-plaintext highlighter-rouge">finally</code>
guarantees secrets are always removed from Wasm memory, and the rest is
just about copying bytes around:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">def</span> <span class="nf">aead_lock</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">ad</span> <span class="o">=</span> <span class="sa">b</span><span class="s">""</span><span class="p">):</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">key</span><span class="p">)</span> <span class="o">==</span> <span class="mi">32</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">macptr</span>   <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span>
            <span class="n">keyptr</span>   <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="mi">32</span><span class="p">)</span>
            <span class="n">nonceptr</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="mi">24</span><span class="p">)</span>
            <span class="n">adptr</span>    <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">ad</span><span class="p">))</span>
            <span class="n">textptr</span>  <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">))</span>

            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">keyptr</span><span class="p">)</span>
            <span class="n">nonce</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">generate_nonce</span><span class="p">()</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">nonce</span><span class="p">,</span> <span class="n">nonceptr</span><span class="p">)</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">ad</span><span class="p">,</span>    <span class="n">adptr</span><span class="p">)</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">text</span><span class="p">,</span>  <span class="n">textptr</span><span class="p">)</span>

            <span class="bp">self</span><span class="p">.</span><span class="n">_lock</span><span class="p">(</span>
                <span class="n">textptr</span><span class="p">,</span>
                <span class="n">macptr</span><span class="p">,</span>
                <span class="n">keyptr</span><span class="p">,</span>
                <span class="n">nonceptr</span><span class="p">,</span>
                <span class="n">adptr</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">ad</span><span class="p">),</span>
                <span class="n">textptr</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">),</span>
            <span class="p">)</span>
            <span class="k">return</span> <span class="p">(</span>
                <span class="bp">self</span><span class="p">.</span><span class="n">_read</span><span class="p">(</span><span class="n">macptr</span><span class="p">,</span> <span class="n">macptr</span><span class="o">+</span><span class="mi">16</span><span class="p">),</span>
                <span class="n">nonce</span><span class="p">,</span>
                <span class="bp">self</span><span class="p">.</span><span class="n">_read</span><span class="p">(</span><span class="n">textptr</span><span class="p">,</span> <span class="n">textptr</span><span class="o">+</span><span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">)),</span>
            <span class="p">)</span>
        <span class="k">finally</span><span class="p">:</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_reset</span><span class="p">()</span>
</code></pre></div></div>

<p>And <code class="language-plaintext highlighter-rouge">aead_unlock</code> is basically the same in reverse, but raises if the box
fails to unlock, perhaps due to tampering:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">def</span> <span class="nf">aead_unlock</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">mac</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">nonce</span><span class="p">,</span> <span class="n">ad</span> <span class="o">=</span> <span class="sa">b</span><span class="s">""</span><span class="p">):</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">mac</span><span class="p">)</span> <span class="o">==</span> <span class="mi">16</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">key</span><span class="p">)</span> <span class="o">==</span> <span class="mi">32</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">nonce</span><span class="p">)</span> <span class="o">==</span> <span class="mi">24</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">macptr</span>   <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span>
            <span class="n">keyptr</span>   <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="mi">32</span><span class="p">)</span>
            <span class="n">nonceptr</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="mi">24</span><span class="p">)</span>
            <span class="n">adptr</span>    <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">ad</span><span class="p">))</span>
            <span class="n">textptr</span>  <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_alloc</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">))</span>

            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">mac</span><span class="p">,</span> <span class="n">macptr</span><span class="p">)</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">keyptr</span><span class="p">)</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">nonce</span><span class="p">,</span> <span class="n">nonceptr</span><span class="p">)</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">ad</span><span class="p">,</span> <span class="n">adptr</span><span class="p">)</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_write</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">textptr</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">_unlock</span><span class="p">(</span>
                <span class="n">textptr</span><span class="p">,</span>
                <span class="n">macptr</span><span class="p">,</span>
                <span class="n">keyptr</span><span class="p">,</span>
                <span class="n">nonceptr</span><span class="p">,</span>
                <span class="n">adptr</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">ad</span><span class="p">),</span>
                <span class="n">textptr</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">),</span>
            <span class="p">):</span>
                <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">"AEAD mismatch"</span><span class="p">)</span>
            <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">_read</span><span class="p">(</span><span class="n">textptr</span><span class="p">,</span> <span class="n">textptr</span><span class="o">+</span><span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">))</span>
        <span class="k">finally</span><span class="p">:</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_reset</span><span class="p">()</span>
</code></pre></div></div>

<p>Usage:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mc</span> <span class="o">=</span> <span class="n">Monocypher</span><span class="p">()</span>
<span class="n">key</span> <span class="o">=</span> <span class="n">mc</span><span class="p">.</span><span class="n">generate_key</span><span class="p">()</span>
<span class="n">message</span> <span class="o">=</span> <span class="s">"Hello, world!"</span>
<span class="n">mac</span><span class="p">,</span> <span class="n">nonce</span><span class="p">,</span> <span class="n">encrypted</span> <span class="o">=</span> <span class="n">mc</span><span class="p">.</span><span class="n">aead_lock</span><span class="p">(</span><span class="n">message</span><span class="p">.</span><span class="n">encode</span><span class="p">(),</span> <span class="n">key</span><span class="p">)</span>
</code></pre></div></div>

<p>Transmit <code class="language-plaintext highlighter-rouge">mac</code>, <code class="language-plaintext highlighter-rouge">nonce</code>, and <code class="language-plaintext highlighter-rouge">encrypted</code> to the other party (or your
future self), who already has the <code class="language-plaintext highlighter-rouge">key</code>:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">decrypted</span> <span class="o">=</span> <span class="n">mc</span><span class="p">.</span><span class="n">aead_unlock</span><span class="p">(</span><span class="n">encrypted</span><span class="p">,</span> <span class="n">mac</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">nonce</span><span class="p">)</span>
</code></pre></div></div>

<p>Find the <strong>complete source <a href="https://github.com/skeeto/scratch/tree/master/wasm-monocypher">in my scratch repository</a></strong>.</p>

<p>While I have a few reservations about wasmtime-py, it fascinates me how
well this all works. It’s been my hammer in search of a nail for some time
now.</p>

]]>
    </content>
  </entry>
  
  <entry>
    <title>Freestyle linked lists tricks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/12/31/"/>
    <id>urn:uuid:355dfc03-0e7c-4bae-92fe-5b52174de325</id>
    <updated>2025-12-31T11:59:59Z</updated>
    <category term="c"/>
    <content type="html">
      <![CDATA[<p>Linked lists are a basic data structure building block, with especially
flexible allocation behavior. They’re not just a useful starting point,
but sometimes a sound foundation for future growth. I’m going to start
with the beginner stuff, then <em>without disrupting the original linked
list</em>, enhance it with new capabilities.</p>

<h3 id="linked-list-basics">Linked list basics</h3>

<p>For the sake of an interesting example, I’ll demonstrate with the same
concept as <a href="/blog/2025/01/19/">last time I talked about data structures</a>: a collection
of key/value strings, in the form of environment variables. This time
in linked list form:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">char</span>     <span class="o">*</span><span class="n">data</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Str</span><span class="p">;</span>

<span class="kt">uint64_t</span> <span class="nf">hash64</span><span class="p">(</span><span class="n">Str</span><span class="p">);</span>
<span class="n">bool</span>     <span class="nf">equals</span><span class="p">(</span><span class="n">Str</span><span class="p">,</span> <span class="n">Str</span><span class="p">);</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="n">Env</span> <span class="n">Env</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">Env</span> <span class="p">{</span>
    <span class="n">Env</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
    <span class="n">Str</span>  <span class="n">key</span><span class="p">;</span>
    <span class="n">Str</span>  <span class="n">value</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>It will be sourced from some string, formatted like the <code class="language-plaintext highlighter-rouge">env</code> program:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">Str</span> <span class="n">input</span> <span class="o">=</span> <span class="n">S</span><span class="p">(</span>
        <span class="s">"EDITOR=vim</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"HOME=/home/user</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"PATH=/bin:/usr/bin</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"SHELL=/bin/bash</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"TERM=xterm-256color</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"USER=user</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"SHELL=/bin/sh</span><span class="se">\n</span><span class="s">"</span>   <span class="c1">// &lt;- repeated entry</span>
    <span class="p">);</span>
</code></pre></div></div>

<p>And all the parser heavy lifting will be done by <a href="/blog/2025/03/02/">our ever-handy <code class="language-plaintext highlighter-rouge">cut</code>
function</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">Str</span> <span class="n">tail</span><span class="p">;</span>
    <span class="n">Str</span> <span class="n">head</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Cut</span><span class="p">;</span>

<span class="n">Cut</span> <span class="nf">cut</span><span class="p">(</span><span class="n">Str</span><span class="p">,</span> <span class="kt">char</span><span class="p">);</span>
</code></pre></div></div>
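
<p>For readers without the linked article at hand, here is a minimal sketch
of what such a <code class="language-plaintext highlighter-rouge">cut</code> might look like. It is an assumption based only on
the declaration above, not necessarily the original implementation: it
splits at the first delimiter, and when the delimiter is missing the whole
input lands in <code class="language-plaintext highlighter-rouge">head</code> with an empty <code class="language-plaintext highlighter-rouge">tail</code>.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char     *data;
    ptrdiff_t len;
} Str;

typedef struct {
    Str tail;  // remainder after the delimiter; first so that {s} seeds it
    Str head;  // everything before the delimiter
} Cut;

// Split s at the first occurrence of c. If c does not occur, head is
// all of s and tail is empty (zeroed).
Cut cut(Str s, char c)
{
    assert(s.len >= 0);
    ptrdiff_t i = 0;
    while (i < s.len && s.data[i] != c) {
        i++;
    }
    Cut r = {0};
    r.head.data = s.data;
    r.head.len  = i;
    if (i < s.len) {
        r.tail.data = s.data + i + 1;  // skip the delimiter itself
        r.tail.len  = s.len - i - 1;
    }
    return r;
}
```

<p>Because <code class="language-plaintext highlighter-rouge">tail</code> is the first member, <code class="language-plaintext highlighter-rouge">Cut line = {s}</code> starts a loop with
the entire input as the remaining tail.</p>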

<p>The simplest way to build up a linked list is like a stack, pushing
objects onto the front: start with a zero-initialized <code class="language-plaintext highlighter-rouge">head</code> pointer, point the new
node at it, then make that node the new <code class="language-plaintext highlighter-rouge">head</code> element:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Env</span> <span class="o">*</span><span class="nf">parse_reversed</span><span class="p">(</span><span class="n">Str</span> <span class="n">s</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Env</span> <span class="o">*</span><span class="n">head</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// 1</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">Cut</span> <span class="n">line</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">};</span> <span class="n">line</span><span class="p">.</span><span class="n">tail</span><span class="p">.</span><span class="n">len</span><span class="p">;)</span> <span class="p">{</span>
        <span class="n">line</span> <span class="o">=</span> <span class="n">cut</span><span class="p">(</span><span class="n">line</span><span class="p">.</span><span class="n">tail</span><span class="p">,</span> <span class="sc">'\n'</span><span class="p">);</span>
        <span class="n">Cut</span>  <span class="n">pair</span>  <span class="o">=</span> <span class="n">cut</span><span class="p">(</span><span class="n">line</span><span class="p">.</span><span class="n">head</span><span class="p">,</span> <span class="sc">'='</span><span class="p">);</span>
        <span class="n">Env</span> <span class="o">*</span><span class="n">env</span>   <span class="o">=</span> <span class="n">new</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">Env</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">key</span>   <span class="o">=</span> <span class="n">pair</span><span class="p">.</span><span class="n">head</span><span class="p">;</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="n">pair</span><span class="p">.</span><span class="n">tail</span><span class="p">;</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">next</span>  <span class="o">=</span> <span class="n">head</span><span class="p">;</span>  <span class="c1">// 2</span>
        <span class="n">head</span> <span class="o">=</span> <span class="n">env</span><span class="p">;</span>  <span class="c1">// 3</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">head</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s it, a complete linked list implementation in three lines of code.
No big deal. Because of the bump allocator, nodes are packed in order in
memory, so the usual cache objections for linked lists do not apply. LIFO
semantics mean the linked list is in reverse order from the source order.
If we’re doing a linear scan through the linked list, the last entry in
the source wins, which may be what you wanted:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Str</span> <span class="nf">lookup_linear</span><span class="p">(</span><span class="n">Env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="n">Str</span> <span class="n">key</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">Env</span> <span class="o">*</span><span class="n">var</span> <span class="o">=</span> <span class="n">env</span><span class="p">;</span> <span class="n">var</span><span class="p">;</span> <span class="n">var</span> <span class="o">=</span> <span class="n">var</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">equals</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">var</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">var</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">Str</span><span class="p">){};</span>
<span class="p">}</span>

    <span class="c1">// ...</span>
    <span class="n">Env</span> <span class="o">*</span><span class="n">env</span>  <span class="o">=</span> <span class="n">parse_reversed</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
    <span class="n">Str</span> <span class="n">value</span> <span class="o">=</span> <span class="n">lookup_linear</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">S</span><span class="p">(</span><span class="s">"SHELL"</span><span class="p">));</span>  <span class="c1">// &lt;- "/bin/sh"</span>
</code></pre></div></div>
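
<p>To try this end to end, the snippets need a few helpers that the article
takes for granted. The sketch below fills them in under my own assumptions:
the <code class="language-plaintext highlighter-rouge">Arena</code> layout, the <code class="language-plaintext highlighter-rouge">alloc</code> function behind <code class="language-plaintext highlighter-rouge">new</code>, the <code class="language-plaintext highlighter-rouge">S</code> macro, and the
bodies of <code class="language-plaintext highlighter-rouge">equals</code> and <code class="language-plaintext highlighter-rouge">cut</code> are all stand-ins, and <code class="language-plaintext highlighter-rouge">demo_lookup</code> is a
hypothetical driver.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { char *data; ptrdiff_t len; } Str;
#define S(s) (Str){s, sizeof(s)-1}  // assumed string-literal helper

typedef struct { Str tail; Str head; } Cut;

typedef struct Env Env;
struct Env { Env *next; Str key; Str value; };

// Assumed bump allocator: hands out zeroed memory from a fixed region.
typedef struct { char *beg; char *end; } Arena;

static void *alloc(Arena *a, ptrdiff_t count, ptrdiff_t size, ptrdiff_t align)
{
    ptrdiff_t pad = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    assert(count >= 0 && count <= (a->end - a->beg - pad)/size);  // out of memory
    char *r = a->beg + pad;
    a->beg += pad + count*size;
    return memset(r, 0, (size_t)(count*size));
}
#define new(a, n, t) (t *)alloc(a, n, sizeof(t), _Alignof(t))

static bool equals(Str a, Str b)
{
    return a.len == b.len && (!a.len || !memcmp(a.data, b.data, (size_t)a.len));
}

// Split at the first c; head gets the prefix, tail the remainder.
static Cut cut(Str s, char c)
{
    ptrdiff_t i = 0;
    while (i < s.len && s.data[i] != c) i++;
    Cut r = {0};
    r.head = (Str){s.data, i};
    if (i < s.len) r.tail = (Str){s.data + i + 1, s.len - i - 1};
    return r;
}

static Env *parse_reversed(Str s, Arena *a)
{
    Env *head = 0;
    for (Cut line = {s}; line.tail.len;) {
        line = cut(line.tail, '\n');
        Cut  pair  = cut(line.head, '=');
        Env *env   = new(a, 1, Env);
        env->key   = pair.head;
        env->value = pair.tail;
        env->next  = head;  // push onto the front
        head = env;
    }
    return head;
}

static Str lookup_linear(Env *env, Str key)
{
    for (Env *var = env; var; var = var->next) {
        if (equals(key, var->key)) return var->value;
    }
    return (Str){0};
}

// Hypothetical driver: the repeated SHELL entry parsed last wins.
Str demo_lookup(Arena *scratch)
{
    Str input = S("SHELL=/bin/bash\nUSER=user\nSHELL=/bin/sh\n");
    Env *env  = parse_reversed(input, scratch);
    return lookup_linear(env, S("SHELL"));
}
```

<p>Call <code class="language-plaintext highlighter-rouge">demo_lookup</code> with an arena over any sufficiently large buffer, such as
a static <code class="language-plaintext highlighter-rouge">char</code> array of a few kilobytes.</p>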

<p>It’s just one more line of code to maintain the original order, using a
very simple double-pointer technique:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Env</span> <span class="o">*</span><span class="nf">parse_ordered</span><span class="p">(</span><span class="n">Str</span> <span class="n">s</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Env</span>  <span class="o">*</span><span class="n">head</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// 1</span>
    <span class="n">Env</span> <span class="o">**</span><span class="n">tail</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">head</span><span class="p">;</span>  <span class="c1">// 2</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">Cut</span> <span class="n">line</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">};</span> <span class="n">line</span><span class="p">.</span><span class="n">tail</span><span class="p">.</span><span class="n">len</span><span class="p">;)</span> <span class="p">{</span>
        <span class="c1">// ...</span>
        <span class="o">*</span><span class="n">tail</span> <span class="o">=</span> <span class="n">env</span><span class="p">;</span>  <span class="c1">// 3</span>
        <span class="n">tail</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>  <span class="c1">// 4</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">head</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>No branches necessary, nor dummy nodes. A pointer to the last pointer in
the list works even for empty lists. The <code class="language-plaintext highlighter-rouge">tail</code> pointer is unneeded once
the list is complete. This form has queue behavior.</p>
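
<p>The empty-list claim is easy to check in isolation. Here is a small
sketch of the same technique on a throwaway integer list. The <code class="language-plaintext highlighter-rouge">Node</code> type
and <code class="language-plaintext highlighter-rouge">link_ordered</code> are hypothetical names invented for this illustration,
and the nodes come from a caller-provided array rather than an arena.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;
struct Node {
    Node *next;
    int   value;
};

// Link nodes[0..n) in array order and return the head. With n == 0 the
// loop never runs and head stays null, so no special case is needed.
Node *link_ordered(Node *nodes, ptrdiff_t n)
{
    assert(n >= 0);
    Node  *head = 0;
    Node **tail = &head;         // the pointer to overwrite next
    for (ptrdiff_t i = 0; i < n; i++) {
        nodes[i].next = 0;
        *tail = &nodes[i];       // fills head first, then each next field
        tail  = &nodes[i].next;
    }
    return head;
}
```

<p>The first pass through the loop writes to <code class="language-plaintext highlighter-rouge">head</code> itself, and every later
pass writes to the previous node’s <code class="language-plaintext highlighter-rouge">next</code>, which is why no branch is
required.</p>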

<h3 id="faster-look-up-with-a-tree">Faster look-up with a tree</h3>

<p>If you’re doing many look-ups, or if the list is long, those linear scans
to find items in the list are not ideal. We can introduce an intrusive
hash map, in the form of <a href="/blog/2023/09/30/">a hash trie</a>, by adding two more pointers
to the linked list:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">Env</span> <span class="n">Env</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">Env</span> <span class="p">{</span>
    <span class="n">Env</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
    <span class="n">Env</span> <span class="o">*</span><span class="n">child</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>  <span class="c1">// &lt;- hash map linkage</span>
    <span class="n">Str</span>  <span class="n">key</span><span class="p">;</span>
    <span class="n">Str</span>  <span class="n">value</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>I’ve found it’s simplest to construct a node into the hash map, then link
it onto the list tail. That constructor looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Env</span> <span class="o">*</span><span class="nf">new_env</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">Env</span> <span class="o">**</span><span class="n">env</span><span class="p">,</span> <span class="n">Str</span> <span class="n">key</span><span class="p">,</span> <span class="n">Str</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">h</span> <span class="o">=</span> <span class="n">hash64</span><span class="p">(</span><span class="n">key</span><span class="p">);</span> <span class="o">*</span><span class="n">env</span><span class="p">;</span> <span class="n">h</span> <span class="o">&lt;&lt;=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">env</span> <span class="o">=</span> <span class="o">&amp;</span><span class="p">(</span><span class="o">*</span><span class="n">env</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">child</span><span class="p">[</span><span class="n">h</span><span class="o">&gt;&gt;</span><span class="mi">63</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="n">new</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">Env</span><span class="p">);</span>
    <span class="p">(</span><span class="o">*</span><span class="n">env</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">key</span> <span class="o">=</span> <span class="n">key</span><span class="p">;</span>
    <span class="p">(</span><span class="o">*</span><span class="n">env</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">*</span><span class="n">env</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then we swap that into the <code class="language-plaintext highlighter-rouge">head</code>/<code class="language-plaintext highlighter-rouge">tail</code> version in place of the original
<code class="language-plaintext highlighter-rouge">new</code> macro call:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Env</span> <span class="o">*</span><span class="nf">parse_mapped</span><span class="p">(</span><span class="n">Str</span> <span class="n">s</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Env</span>  <span class="o">*</span><span class="n">head</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">Env</span> <span class="o">**</span><span class="n">tail</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">head</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">Cut</span> <span class="n">line</span> <span class="o">=</span> <span class="p">{</span><span class="n">s</span><span class="p">};</span> <span class="n">line</span><span class="p">.</span><span class="n">tail</span><span class="p">.</span><span class="n">len</span><span class="p">;)</span> <span class="p">{</span>
        <span class="c1">// ...</span>
        <span class="n">Env</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="n">new_env</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">head</span><span class="p">,</span> <span class="n">pair</span><span class="p">.</span><span class="n">head</span><span class="p">,</span> <span class="n">pair</span><span class="p">.</span><span class="n">tail</span><span class="p">);</span>
        <span class="o">*</span><span class="n">tail</span> <span class="o">=</span> <span class="n">env</span><span class="p">;</span>
        <span class="n">tail</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">head</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is now a linked list and a hash map at the same time, built up piece
by piece without any resizing. We still have the original linked list, but
we can now search it in log time. The look-up function resembles the
constructor:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Str</span> <span class="nf">lookup_logn</span><span class="p">(</span><span class="n">Env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="n">Str</span> <span class="n">key</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">h</span> <span class="o">=</span> <span class="n">hash64</span><span class="p">(</span><span class="n">key</span><span class="p">);</span> <span class="n">env</span><span class="p">;</span> <span class="n">h</span> <span class="o">&lt;&lt;=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">equals</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">env</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">child</span><span class="p">[</span><span class="n">h</span><span class="o">&gt;&gt;</span><span class="mi">63</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">Str</span><span class="p">){};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Because insertion is FIFO and earlier entries sit closer to the root, it finds the first match in the source:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">Env</span> <span class="o">*</span><span class="n">env</span>   <span class="o">=</span> <span class="n">parse_mapped</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
    <span class="n">Str</span>  <span class="n">value</span> <span class="o">=</span> <span class="n">lookup_logn</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">S</span><span class="p">(</span><span class="s">"SHELL"</span><span class="p">));</span>  <span class="c1">// &lt;- /bin/bash</span>
</code></pre></div></div>

<p>The other matches are also in the tree, and we can find those as well by
continuing traversal. That is, it’s already a multi-map. This particular
interface can’t pick up where it left off, but we can build one that does
using an iterator/cursor:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">hash</span><span class="p">;</span>
    <span class="n">Str</span>      <span class="n">key</span><span class="p">;</span>
    <span class="n">Env</span>     <span class="o">*</span><span class="n">env</span><span class="p">;</span>
<span class="p">}</span> <span class="n">EnvIter</span><span class="p">;</span>

<span class="n">EnvIter</span> <span class="nf">new_enviter</span><span class="p">(</span><span class="n">Env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="n">Str</span> <span class="n">key</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">EnvIter</span><span class="p">){</span><span class="n">hash64</span><span class="p">(</span><span class="n">key</span><span class="p">),</span> <span class="n">key</span><span class="p">,</span> <span class="n">env</span><span class="p">};</span>
<span class="p">}</span>

<span class="n">Str</span> <span class="nf">enviter_next</span><span class="p">(</span><span class="n">EnvIter</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">env</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">Env</span> <span class="o">*</span><span class="n">cur</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">env</span><span class="p">;</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">env</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">child</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">hash</span><span class="o">&gt;&gt;</span><span class="mi">63</span><span class="p">];</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">hash</span> <span class="o">&lt;&lt;=</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">equals</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">,</span> <span class="n">cur</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">cur</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">Str</span><span class="p">){};</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Update</strong>: Thanks to <a href="https://lists.sr.ht/~skeeto/public-inbox/%3CSJ2PR12MB79208563F4485DCAA27D5776A2BAA@SJ2PR12MB7920.namprd12.prod.outlook.com%3E">Daniel Kareh for a correction</a>.</p>

<p>Then we can use a loop to visit every match in source order:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">Env</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="n">parse_mapped</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">EnvIter</span> <span class="n">it</span> <span class="o">=</span> <span class="n">new_enviter</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">S</span><span class="p">(</span><span class="s">"SHELL"</span><span class="p">));;)</span> <span class="p">{</span>
        <span class="n">Str</span> <span class="n">value</span> <span class="o">=</span> <span class="n">enviter_next</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">value</span><span class="p">.</span><span class="n">data</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
</code></pre></div></div>

<h3 id="faster-look-up-with-an-index-table">Faster look-up with an index table</h3>

<p>If the list is static once constructed, or if look-ups happen much more
frequently than the list grows, we can find list items even faster by
constructing an index table over the list: <a href="/blog/2022/08/08/">an MSI hash table</a>. This
table avoids redundancy by <em>sharing structure with the list</em>. Because it’s
a flat table, if we keep adding to the list then eventually we’ll need to
reconstruct a larger table when it becomes overloaded.</p>

<p>The table itself has a very simple structure, just an array and its size,
expressed as a power-of-two exponent:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">Env</span> <span class="o">**</span><span class="n">slots</span><span class="p">;</span>
    <span class="kt">int</span>   <span class="n">exp</span><span class="p">;</span>
<span class="p">}</span> <span class="n">EnvTable</span><span class="p">;</span>
</code></pre></div></div>

<p>We do not need the <code class="language-plaintext highlighter-rouge">child</code> nodes, and so linked list nodes are untouched.
That is, it’s not intrusive. In fact, we can build any arbitrary number of
tables over a list, perhaps indexing different properties for different
sorts of queries. The idea is that we build the list first, then create
the table:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">EnvTable</span> <span class="nf">new_table</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">Env</span> <span class="o">*</span><span class="n">env</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Compute list length</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">Env</span> <span class="o">*</span><span class="n">var</span> <span class="o">=</span> <span class="n">env</span><span class="p">;</span> <span class="n">var</span><span class="p">;</span> <span class="n">var</span> <span class="o">=</span> <span class="n">var</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">len</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Then compute an appropriate table size</span>
    <span class="n">EnvTable</span> <span class="n">table</span> <span class="o">=</span> <span class="p">{};</span>
    <span class="n">table</span><span class="p">.</span><span class="n">exp</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">one</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="p">(</span><span class="n">one</span><span class="o">&lt;&lt;</span><span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">one</span><span class="o">&lt;&lt;</span><span class="p">(</span><span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="o">-</span><span class="mi">3</span><span class="p">))</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">;</span> <span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="o">++</span><span class="p">)</span> <span class="p">{}</span>
    <span class="n">table</span><span class="p">.</span><span class="n">slots</span> <span class="o">=</span> <span class="n">new</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">one</span><span class="o">&lt;&lt;</span><span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">,</span> <span class="n">Env</span> <span class="o">*</span><span class="p">);</span>

    <span class="c1">// Then insert linked list items into the table</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">Env</span> <span class="o">*</span><span class="n">var</span> <span class="o">=</span> <span class="n">env</span><span class="p">;</span> <span class="n">var</span><span class="p">;</span> <span class="n">var</span> <span class="o">=</span> <span class="n">var</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">uint64_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">hash64</span><span class="p">(</span><span class="n">var</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">);</span>
        <span class="kt">size_t</span>   <span class="n">mask</span> <span class="o">=</span> <span class="p">((</span><span class="kt">size_t</span><span class="p">)</span><span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
        <span class="kt">size_t</span>   <span class="n">step</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">hash</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="mi">64</span> <span class="o">-</span> <span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">))</span> <span class="o">|</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">hash</span><span class="p">;;)</span> <span class="p">{</span>
            <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">step</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">;</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">table</span><span class="p">.</span><span class="n">slots</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
                <span class="n">table</span><span class="p">.</span><span class="n">slots</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">var</span><span class="p">;</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">table</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note how it only searches for an empty slot, not for a matching entry. That’s
because this too is a multi-map, also with elements in insertion order.
Look-ups are constant time on average:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Str</span> <span class="nf">lookup_constant</span><span class="p">(</span><span class="n">EnvTable</span> <span class="n">table</span><span class="p">,</span> <span class="n">Str</span> <span class="n">key</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">hash64</span><span class="p">(</span><span class="n">key</span><span class="p">);</span>
    <span class="kt">size_t</span>   <span class="n">mask</span> <span class="o">=</span> <span class="p">((</span><span class="kt">size_t</span><span class="p">)</span><span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
    <span class="kt">size_t</span>   <span class="n">step</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">hash</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="mi">64</span> <span class="o">-</span> <span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">))</span> <span class="o">|</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">hash</span><span class="p">;;)</span> <span class="p">{</span>
        <span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">step</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">table</span><span class="p">.</span><span class="n">slots</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
            <span class="k">return</span> <span class="p">(</span><span class="n">Str</span><span class="p">){};</span>
        <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">equals</span><span class="p">(</span><span class="n">table</span><span class="p">.</span><span class="n">slots</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">,</span> <span class="n">key</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">table</span><span class="p">.</span><span class="n">slots</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It finds the earliest match in the list, meaning an index over the
“reverse” list will find the last entry in the source. The indexed-over
property is the input to <code class="language-plaintext highlighter-rouge">hash64</code> and <code class="language-plaintext highlighter-rouge">equals</code>. By using a different input
to these functions we could build another table on, say, value length if
that’s a property on which we needed to find elements efficiently. Again,
for multi-map iteration we need some kind of iterator or cursor:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">EnvTable</span> <span class="n">table</span><span class="p">;</span>
    <span class="n">Str</span>      <span class="n">key</span><span class="p">;</span>
    <span class="kt">size_t</span>   <span class="n">step</span><span class="p">;</span>
    <span class="kt">size_t</span>   <span class="n">i</span><span class="p">;</span>
<span class="p">}</span> <span class="n">TableIter</span><span class="p">;</span>

<span class="n">TableIter</span> <span class="nf">new_tableiter</span><span class="p">(</span><span class="n">EnvTable</span> <span class="n">table</span><span class="p">,</span> <span class="n">Str</span> <span class="n">key</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">hash64</span><span class="p">(</span><span class="n">key</span><span class="p">);</span>
    <span class="kt">size_t</span>   <span class="n">step</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">hash</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="mi">64</span> <span class="o">-</span> <span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">))</span> <span class="o">|</span> <span class="mi">1</span><span class="p">;</span>
    <span class="kt">size_t</span>   <span class="n">idx</span>  <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">hash</span><span class="p">;</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">TableIter</span><span class="p">){</span><span class="n">table</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">step</span><span class="p">,</span> <span class="n">idx</span><span class="p">};</span>
<span class="p">}</span>

<span class="n">Str</span> <span class="nf">table_next</span><span class="p">(</span><span class="n">TableIter</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">size_t</span> <span class="n">mask</span>  <span class="o">=</span> <span class="p">((</span><span class="kt">size_t</span><span class="p">)</span><span class="mi">1</span> <span class="o">&lt;&lt;</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">table</span><span class="p">.</span><span class="n">exp</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
    <span class="n">Env</span>  <span class="o">**</span><span class="n">slots</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">table</span><span class="p">.</span><span class="n">slots</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">i</span> <span class="o">+</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">step</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">slots</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
            <span class="k">return</span> <span class="p">(</span><span class="n">Str</span><span class="p">){};</span>
        <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">equals</span><span class="p">(</span><span class="n">slots</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">,</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">slots</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Its usage looks just like the other multi-map:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">Env</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="n">parse_ordered</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
    <span class="n">EnvTable</span> <span class="n">table</span> <span class="o">=</span> <span class="n">new_table</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">env</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">TableIter</span> <span class="n">it</span> <span class="o">=</span> <span class="n">new_tableiter</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">S</span><span class="p">(</span><span class="s">"SHELL"</span><span class="p">));;)</span> <span class="p">{</span>
        <span class="n">Str</span> <span class="n">value</span> <span class="o">=</span> <span class="n">table_next</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">value</span><span class="p">.</span><span class="n">data</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
</code></pre></div></div>
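
<p>Two details of <code class="language-plaintext highlighter-rouge">new_table</code> are easier to see in isolation: the sizing
loop, which keeps at least one eighth of the slots empty, and the odd probe
step, which guarantees a probe eventually visits every slot of a
power-of-two table. The sketch below pulls both out into small functions.
The names <code class="language-plaintext highlighter-rouge">table_exp</code> and <code class="language-plaintext highlighter-rouge">probe_coverage</code> are hypothetical helpers for
illustration and are not part of <code class="language-plaintext highlighter-rouge">list.c</code>.</p>

```c
#include <stddef.h>
#include <stdint.h>

// Smallest exponent (at least 3) whose table holds len entries while
// leaving at least 1/8 of the slots empty, mirroring new_table.
int table_exp(ptrdiff_t len)
{
    int exp = 3;
    ptrdiff_t one = 1;
    while ((one<<exp) - (one<<(exp-3)) < len) {
        exp++;
    }
    return exp;
}

// Count the slots visited by one full probe cycle, starting from the
// slot selected by the low hash bits. Since the step is forced odd and
// the table size is a power of two, the cycle covers the whole table.
// Assumes 3 <= exp < 64, as produced by table_exp.
size_t probe_coverage(uint64_t hash, int exp)
{
    size_t mask  = ((size_t)1 << exp) - 1;
    size_t step  = (size_t)(hash >> (64 - exp)) | 1;
    size_t start = (size_t)hash & mask;
    size_t i     = start;
    size_t count = 0;
    do {
        i = (i + step) & mask;
        count++;
    } while (i != start);
    return count;
}
```

<p>Seven entries still fit an 8-slot table, while an eighth entry pushes the
exponent to 4. Because a probe covers every slot and at least one slot is
always empty, both the insertion loop and a failed look-up must terminate.</p>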

<p>With these techniques at hand, I can start with linked lists when they are
convenient, and later add needed features without fundamentally changing
the underlying data structure. None of this requires runtime support, and
so it fits comfortably on embedded systems, tiny WebAssembly programs,
etc.  All the above code is available ready to run: <a href="https://gist.github.com/skeeto/493823d5956dfdc1d95d8c390c2b0e1d"><code class="language-plaintext highlighter-rouge">list.c</code></a>.</p>

]]>
    </content>
  </entry>
  
  <entry>
    <title>Unix "find" expressions compiled to bytecode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/12/23/"/>
    <id>urn:uuid:bbe2671b-378d-40b1-9564-c3a3b798dfb4</id>
    <updated>2025-12-23T04:20:22Z</updated>
    <category term="c"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>In preparation for a future project, I was thinking about the <a href="https://pubs.opengroup.org/onlinepubs/9799919799/utilities/find.html">Unix
<code class="language-plaintext highlighter-rouge">find</code> utility</a>. It operates on file system hierarchies, with basic
operations selected and filtered using a specialized expression language.
Users compose operations using unary and binary operators, grouping with
parentheses for precedence. <code class="language-plaintext highlighter-rouge">find</code> may apply the expression to a great
many files, so compiling it into a bytecode, resolving as much as possible
ahead of time, and minimizing the per-element work, seems like a prudent
implementation strategy. With some thought, I worked out a technique to do
so, which was simpler than I expected, and I’m pleased with the results. I
was later surprised all the real world <code class="language-plaintext highlighter-rouge">find</code> implementations I examined
use <a href="https://craftinginterpreters.com/a-tree-walk-interpreter.html">tree-walk interpreters</a> instead. This article describes how my
compiler works, with a runnable example, and lists ideas for improvements.</p>

<p>For a quick overview, the syntax looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find [-H|-L] path... [expression...]
</code></pre></div></div>

<p>Technically at least one path is required, but most implementations imply
<code class="language-plaintext highlighter-rouge">.</code> when none are provided. If no expression is supplied, the default is
<code class="language-plaintext highlighter-rouge">-print</code>, i.e. print everything under each listed path. This prints the
whole tree, including directories, under the current directory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find .
</code></pre></div></div>

<p>To only print files, we could use <code class="language-plaintext highlighter-rouge">-type f</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f -a -print
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">-a</code> is the logical AND binary operator. <code class="language-plaintext highlighter-rouge">-print</code> always evaluates
to true. It’s never necessary to write <code class="language-plaintext highlighter-rouge">-a</code>, and adjacent operations are
implicitly joined with <code class="language-plaintext highlighter-rouge">-a</code>. We can keep chaining them, such as finding
all executable files:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f -executable -print
</code></pre></div></div>

<p>If no <code class="language-plaintext highlighter-rouge">-exec</code>, <code class="language-plaintext highlighter-rouge">-ok</code>, or <code class="language-plaintext highlighter-rouge">-print</code> (or similar side-effect extensions like
<code class="language-plaintext highlighter-rouge">-print0</code> or <code class="language-plaintext highlighter-rouge">-delete</code>) are present, the whole expression is wrapped in an
implicit <code class="language-plaintext highlighter-rouge">( expr ) -print</code>. So we could also write this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f -executable
</code></pre></div></div>

<p>Use <code class="language-plaintext highlighter-rouge">-o</code> for logical OR. To print all files with the executable bit <em>or</em>
with a <code class="language-plaintext highlighter-rouge">.exe</code> extension:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f \( -executable -o -name '*.exe' \)
</code></pre></div></div>

<p>I needed parentheses because <code class="language-plaintext highlighter-rouge">-o</code> has lower precedence than <code class="language-plaintext highlighter-rouge">-a</code>, and
because parentheses are shell metacharacters I also needed to escape them
for the shell. It’s a shame <code class="language-plaintext highlighter-rouge">find</code> didn’t use <code class="language-plaintext highlighter-rouge">[</code> and <code class="language-plaintext highlighter-rouge">]</code> instead! There’s
also a unary logical NOT operator, <code class="language-plaintext highlighter-rouge">!</code>. To print all non-executable files:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -type f ! -executable
</code></pre></div></div>

<p>Binary operators are short-circuiting, so this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find -type d -a -exec du -sh {} +
</code></pre></div></div>

<p>Only lists the sizes of directories: when <code class="language-plaintext highlighter-rouge">-type d</code> fails, the
whole expression evaluates to false without evaluating <code class="language-plaintext highlighter-rouge">-exec</code>. Or
equivalently with <code class="language-plaintext highlighter-rouge">-o</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find ! -type d -o -exec du -sh {} +
</code></pre></div></div>

<p>If it’s not a directory then the left-hand side evaluates to true, and the
right-hand side is not evaluated. All three implementations I examined
(GNU, BSD, BusyBox) have a <code class="language-plaintext highlighter-rouge">-regex</code> extension, and eagerly compile the
regular expression even if the operation is never evaluated:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ find . -print -o -regex [
find: bad regex '[': Invalid regular expression
</code></pre></div></div>

<p>I was surprised by this because it doesn’t seem to be in the spirit of the
original utility (“The second expression shall not be evaluated if the
first expression is true.”), and I’m used to the idea of short-circuit
validation for the right-hand side of a logical expression. Recompiling
for each evaluation would be unwise, but it could happen lazily such that
an invalid regular expression only causes an error if it’s actually used.
No big deal, just a curiosity.</p>
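
<p>As a rough sketch of what lazy compilation could look like, here is a
pattern holder built on POSIX <code class="language-plaintext highlighter-rouge">regcomp</code> and <code class="language-plaintext highlighter-rouge">regexec</code>. The <code class="language-plaintext highlighter-rouge">LazyRegex</code>
type and its functions are hypothetical, not taken from any of the three
implementations, and for brevity it searches within the path rather than
matching the whole path as a real <code class="language-plaintext highlighter-rouge">-regex</code> would.</p>

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical operand for a -regex action: the pattern is stored at
// parse time but not compiled until the action first runs.
typedef struct {
    const char *pattern;
    regex_t     compiled;
    int         state;  // 0 = not yet compiled, 1 = ready, -1 = invalid
} LazyRegex;

// Returns 1 on match, 0 on no match, -1 if the pattern is invalid.
// Compilation happens at most once, on the first call.
int lazy_match(LazyRegex *r, const char *path)
{
    if (r->state == 0) {
        int err = regcomp(&r->compiled, r->pattern, REG_NOSUB);
        r->state = err ? -1 : 1;
    }
    if (r->state < 0) {
        return -1;
    }
    return regexec(&r->compiled, path, 0, NULL, 0) == 0;
}

// Release the compiled pattern, if one was ever compiled.
void lazy_free(LazyRegex *r)
{
    if (r->state == 1) {
        regfree(&r->compiled);
    }
    r->state = 0;
}
```

<p>With this arrangement a malformed pattern such as <code class="language-plaintext highlighter-rouge">[</code> is only reported
when <code class="language-plaintext highlighter-rouge">lazy_match</code> is first called, so a short-circuited right-hand side
never trips over it.</p>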

<h3 id="bytecode-design">Bytecode design</h3>

<p>A bytecode interpreter needs to track just one result at a time, making it
a single register machine, with a 1-bit register at that. I came up with
these five opcodes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>halt
not
braf   LABEL
brat   LABEL
action NAME [ARGS...]
</code></pre></div></div>

<p>Obviously <code class="language-plaintext highlighter-rouge">halt</code> stops the program. While I could just let it “run off the
end” it’s useful to have an actual instruction so that I can attach a
label and jump to it. The <code class="language-plaintext highlighter-rouge">not</code> opcode negates the register. <code class="language-plaintext highlighter-rouge">braf</code> is
“branch if false”, jumping (via relative immediate) to the labeled (in
printed form) instruction if the register is false. <code class="language-plaintext highlighter-rouge">brat</code> is “branch if
true”. Together they implement the <code class="language-plaintext highlighter-rouge">-a</code> and <code class="language-plaintext highlighter-rouge">-o</code> operators. In practice
there are no loops and jumps are always forward: <code class="language-plaintext highlighter-rouge">find</code> is <a href="/blog/2016/04/30/">not Turing
complete</a>.</p>
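
<p>To make the semantics concrete, here is a minimal interpreter sketch for
these five opcodes. The instruction encoding, the callback-based <code class="language-plaintext highlighter-rouge">action</code>,
the initial register value, and the two stand-in actions are all
assumptions for illustration, not how <code class="language-plaintext highlighter-rouge">findc</code> represents programs.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { OP_HALT, OP_NOT, OP_BRAF, OP_BRAT, OP_ACTION } Op;

// Assumed encoding: branches hold a forward offset measured from the
// following instruction, and actions hold a callback.
typedef struct {
    Op   op;
    int  rel;                      // OP_BRAF, OP_BRAT
    bool (*action)(const char *);  // OP_ACTION
} Insn;

// Run the program against one path and return the final register.
bool run(const Insn *prog, const char *path)
{
    bool reg = true;  // assumed initial value
    for (size_t pc = 0;;) {
        const Insn *insn = &prog[pc++];
        switch (insn->op) {
        case OP_HALT:   return reg;
        case OP_NOT:    reg = !reg;                           break;
        case OP_BRAF:   if (!reg) pc += (size_t)insn->rel;    break;
        case OP_BRAT:   if (reg)  pc += (size_t)insn->rel;    break;
        case OP_ACTION: reg = insn->action(path);             break;
        }
    }
}

int printed;  // number of -print evaluations so far

// Stand-in for -name '*.exe': a plain suffix test.
bool act_name_exe(const char *path)
{
    size_t len = strlen(path);
    return len >= 4 && !strcmp(path + len - 4, ".exe");
}

// Stand-in for -print: always true.
bool act_print(const char *path)
{
    puts(path);
    printed++;
    return true;
}

// -name '*.exe' followed by the implicit -print.
const Insn example[] = {
    {.op = OP_ACTION, .action = act_name_exe},
    {.op = OP_BRAF,   .rel = 1},  // skip -print when the name fails
    {.op = OP_ACTION, .action = act_print},
    {.op = OP_HALT},
};
```

<p>Running <code class="language-plaintext highlighter-rouge">run(example, "a.exe")</code> prints the path and leaves the register
true, while <code class="language-plaintext highlighter-rouge">run(example, "a.txt")</code> branches straight to <code class="language-plaintext highlighter-rouge">halt</code> with the
register false.</p>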

<p>In a real implementation each possible action (<code class="language-plaintext highlighter-rouge">-name</code>, <code class="language-plaintext highlighter-rouge">-ok</code>, <code class="language-plaintext highlighter-rouge">-print</code>,
<code class="language-plaintext highlighter-rouge">-type</code>, etc.) would get a dedicated opcode. This requires implementing
each operator, at least in part, in order to correctly parse the whole
<code class="language-plaintext highlighter-rouge">find</code> expression. For now I’m just focused on the bytecode compiler, so
this opcode is a stand-in, and it kind of pretends based on looks. Each
action sets the register, and actions like <code class="language-plaintext highlighter-rouge">-print</code> always set it to true.
My compiler is <a href="https://github.com/skeeto/scratch/blob/c142e729/parsers/findc.c">called <strong><code class="language-plaintext highlighter-rouge">findc</code> (“find compiler”)</strong></a>.</p>

<p><strong>Update</strong>: Or try <a href="https://nullprogram.com/scratch/findc/">the <strong>online demo</strong></a> via Wasm! This version
includes a peephole optimizer I wrote after publishing this article.</p>

<p>I assume readers of this program are familiar with my <a href="/blog/2025/01/19/"><code class="language-plaintext highlighter-rouge">push</code> macro</a>
and <a href="/blog/2025/06/26/"><code class="language-plaintext highlighter-rouge">Slice</code> macro</a>. Because of the latter it requires a very
recent C compiler, like GCC 15 (e.g. via <a href="https://github.com/skeeto/w64devkit">w64devkit</a>) or Clang 22. Try
out some <code class="language-plaintext highlighter-rouge">find</code> commands and see how they appear as bytecode. The simplest
case is also optimal:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc
// path: .
        action  -print
        halt
</code></pre></div></div>

<p>Print the path then halt. Simple. Stepping it up:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc -type f -executable
// path: .
        action  -type f
        braf    L1
        action  -executable
L1:     braf    L2
        action  -print
L2:     halt
</code></pre></div></div>

<p>If the path is not a file, it skips over the rest of the program by way of
the second branch instruction. It’s correct, but already we can see room
for improvement. This would be better:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        action  -type f
        braf    L1
        action  -executable
        braf    L1
        action  -print
L1:     halt
</code></pre></div></div>

<p>More complex still:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc -type f \( -executable -o -name '*.exe' \)
// path: .
        action  -type f
        braf    L1
        action  -executable
        brat    L1
        action  -name *.exe
L1:     braf    L2
        action  -print
L2:     halt
</code></pre></div></div>

<p>Inside the parentheses, if <code class="language-plaintext highlighter-rouge">-executable</code> succeeds, the right-hand side is
skipped. Though the <code class="language-plaintext highlighter-rouge">brat</code> jumps straight to a <code class="language-plaintext highlighter-rouge">braf</code>. It would be better
to jump ahead one more instruction:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        action  -type f
        braf    L2
        action  -executable
        brat    L1
        action  -name *.exe
        braf    L2
L1:     action  -print
L2:     halt
</code></pre></div></div>

<p>Silly things aren’t optimized either:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc ! ! -executable
// path: .
        action  -executable
        not
        not
        braf    L1
        action  -print
L1:     halt
</code></pre></div></div>

<p>Two <code class="language-plaintext highlighter-rouge">not</code> in a row cancel out, and so these instructions could be
eliminated. Overall this compiler could benefit from a <a href="https://en.wikipedia.org/wiki/Peephole_optimization">peephole
optimizer</a>, scanning over the program repeatedly, making small
improvements until no more can be made:</p>

<ul>
  <li>Delete <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">not</code>.</li>
  <li>A <code class="language-plaintext highlighter-rouge">brat</code> to a <code class="language-plaintext highlighter-rouge">braf</code> re-targets ahead one instruction, and vice versa.</li>
  <li>Jumping onto an identical jump adopts its target for itself.</li>
  <li>A <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">braf</code> might convert to a <code class="language-plaintext highlighter-rouge">brat</code>, and vice versa.</li>
  <li>Delete side-effect-free instructions before <code class="language-plaintext highlighter-rouge">halt</code> (e.g. <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">halt</code>).</li>
  <li>Exploit always-true actions, e.g. <code class="language-plaintext highlighter-rouge">-print</code>-<code class="language-plaintext highlighter-rouge">braf</code> can drop the branch.</li>
</ul>
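
<p>As a rough illustration of the first rule, here’s a minimal sketch of a <code class="language-plaintext highlighter-rouge">not</code>-<code class="language-plaintext highlighter-rouge">not</code> deletion pass. It assumes a hypothetical representation in which branches store absolute instruction indices, and the names (<code class="language-plaintext highlighter-rouge">Insn</code>, <code class="language-plaintext highlighter-rouge">delete_notnot</code>) are invented for this example rather than taken from the actual compiler:</p>

```c
#include <assert.h>
#include <string.h>

typedef enum { OP_ACTION, OP_NOT, OP_BRAF, OP_BRAT, OP_HALT } Op;

typedef struct {
    Op          op;
    int         target;  // branches only: absolute instruction index (assumed)
    const char *arg;     // actions only: the action text
} Insn;

static int is_branch(Op op)
{
    return op == OP_BRAF || op == OP_BRAT;
}

// Does any branch land on instruction i?
static int is_target(const Insn *prog, int len, int i)
{
    for (int j = 0; j < len; j++) {
        if (is_branch(prog[j].op) && prog[j].target == i) return 1;
    }
    return 0;
}

// Delete not-not pairs in place, returning the new program length.
static int delete_notnot(Insn *prog, int len)
{
    assert(len >= 0);
    for (int i = 0; i+1 < len;) {
        int pair = prog[i].op == OP_NOT && prog[i+1].op == OP_NOT;
        // A branch landing between the pair would observe only one not,
        // so such a pair must be left alone.
        if (!pair || is_target(prog, len, i+1)) {
            i++;
            continue;
        }
        memmove(prog+i, prog+i+2, (size_t)(len-i-2)*sizeof(*prog));
        len -= 2;
        for (int j = 0; j < len; j++) {
            if (is_branch(prog[j].op) && prog[j].target > i) {
                prog[j].target -= 2;  // fix up branches past the deletion
            }
        }
        // Stay at i: another pair may now begin here.
    }
    return len;
}
```

<p>Run on the <code class="language-plaintext highlighter-rouge">! ! -executable</code> program above, this leaves four instructions, with the <code class="language-plaintext highlighter-rouge">braf</code> re-pointed at the <code class="language-plaintext highlighter-rouge">halt</code>. A full optimizer would repeat passes like this one until nothing changes.</p>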

<p>Writing a bunch of peephole pattern matchers sounds kind of fun. Though my
compiler would first need a slightly richer representation in order to
detect and fix up changes to branches. One more for the road:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ findc -type f ! \( -executable -o -name '*.exe' \)
// path: .
        action  -type f
        braf    L1
        action  -executable
        brat    L2
        action  -name *.exe
L2:     not
L1:     braf    L3
        action  -print
L3:     halt
</code></pre></div></div>

<p>The suboptimal jumps hint at my compiler’s structure. If you’re feeling up
for a challenge, pause here to consider how you’d build this compiler, and
how it might produce these particular artifacts.</p>

<h3 id="parsing-and-compiling">Parsing and compiling</h3>

<p>Before I even considered the shape of the bytecode I knew I needed to
convert <code class="language-plaintext highlighter-rouge">find</code> infix into a compiler-friendly postfix. That is, this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-type f -a ! ( -executable -o -name *.exe )
</code></pre></div></div>

<p>Becomes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-type f -executable -name *.exe -o ! -a
</code></pre></div></div>

<p>Which, importantly, erases the parentheses. This comes in as an <code class="language-plaintext highlighter-rouge">argv</code>
array, so it’s already tokenized for us by the shell <a href="/blog/2022/02/18/">or runtime</a>. The
classic <a href="https://en.wikipedia.org/wiki/Shunting_yard_algorithm">shunting-yard algorithm</a> solves this problem easily enough.
We have an output queue that goes into the compiler, and a token stack for
tracking <code class="language-plaintext highlighter-rouge">-a</code>, <code class="language-plaintext highlighter-rouge">-o</code>, <code class="language-plaintext highlighter-rouge">!</code>, and <code class="language-plaintext highlighter-rouge">(</code>. Then we walk <code class="language-plaintext highlighter-rouge">argv</code> in order:</p>

<ul>
  <li>
    <p>Actions go straight into the output queue.</p>
  </li>
  <li>
    <p>If we see one of the special stack tokens we push it onto the stack,
first popping operators with greater precedence into the queue, stopping
at <code class="language-plaintext highlighter-rouge">(</code>.</p>
  </li>
  <li>
    <p>If we see <code class="language-plaintext highlighter-rouge">)</code> we pop the stack into the output queue until we see <code class="language-plaintext highlighter-rouge">(</code>.</p>
  </li>
</ul>

<p>When we’re out of tokens, pop the remaining stack into the queue. My
parser synthesizes <code class="language-plaintext highlighter-rouge">-a</code> where it’s implied, so the compiler always sees
logical AND. If the expression contains no <code class="language-plaintext highlighter-rouge">-exec</code>, <code class="language-plaintext highlighter-rouge">-ok</code>, or <code class="language-plaintext highlighter-rouge">-print</code>,
after processing is complete the parser puts <code class="language-plaintext highlighter-rouge">-print</code> then <code class="language-plaintext highlighter-rouge">-a</code> into the
queue, which effectively wraps the whole expression in <code class="language-plaintext highlighter-rouge">( expr ) -print</code>.
By clearing the stack first, the real expression is effectively wrapped in
parentheses, so no parenthesis tokens need to be synthesized.</p>
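
<p>The steps above can be sketched roughly as follows. This is not the actual parser: the fixed-size <code class="language-plaintext highlighter-rouge">Tokens</code> buffer, the tiny action table in <code class="language-plaintext highlighter-rouge">nargs</code>, and the function names are all assumptions made for illustration, and it only recognizes <code class="language-plaintext highlighter-rouge">-print</code> when deciding whether to append the default action:</p>

```c
#include <assert.h>
#include <string.h>

enum { MAXTOK = 64 };

typedef struct {
    const char *tok[MAXTOK];
    int         len;
} Tokens;

static void push(Tokens *t, const char *s)
{
    assert(t->len < MAXTOK);
    t->tok[t->len++] = s;
}

// Precedence of stack tokens. "(" is lowest so nothing pops through it.
static int prec(const char *s)
{
    if (!strcmp(s, "!"))  return 3;
    if (!strcmp(s, "-a")) return 2;
    if (!strcmp(s, "-o")) return 1;
    return 0;  // "("
}

// Hypothetical action table: only these two take an argument.
static int nargs(const char *s)
{
    return !strcmp(s, "-type") || !strcmp(s, "-name");
}

// Push a stack token, first popping greater-precedence operators.
static void push_op(Tokens *stack, Tokens *out, const char *s)
{
    if (strcmp(s, "(")) {  // "(" itself never pops anything
        while (stack->len && prec(stack->tok[stack->len-1]) > prec(s)) {
            push(out, stack->tok[--stack->len]);
        }
    }
    push(stack, s);
}

// Convert infix find tokens to postfix. Returns 0 on a syntax error.
static int to_postfix(int argc, const char **argv, Tokens *out)
{
    Tokens stack = {0};
    int operand = 0;  // did the previous token end an operand?
    int printer = 0;  // saw -print? (-exec and -ok omitted in this sketch)
    out->len = 0;

    for (int i = 0; i < argc; i++) {
        const char *s = argv[i];
        int isclose  = !strcmp(s, ")");
        int isbinary = !strcmp(s, "-a") || !strcmp(s, "-o");

        if (operand && !isclose && !isbinary) {
            push_op(&stack, out, "-a");  // synthesize the implied AND
        }

        if (isclose) {
            while (stack.len && strcmp(stack.tok[stack.len-1], "(")) {
                push(out, stack.tok[--stack.len]);
            }
            if (!stack.len) return 0;    // unbalanced ")"
            stack.len--;                 // discard the "("
            operand = 1;
        } else if (isbinary || !strcmp(s, "!") || !strcmp(s, "(")) {
            push_op(&stack, out, s);
            operand = 0;
        } else {
            push(out, s);                // actions go straight to the queue
            if (nargs(s)) {
                if (++i == argc) return 0;  // missing argument
                push(out, argv[i]);
            }
            printer |= !strcmp(s, "-print");
            operand = 1;
        }
    }

    while (stack.len) {
        const char *s = stack.tok[--stack.len];
        if (!strcmp(s, "(")) return 0;   // unbalanced "("
        push(out, s);
    }
    if (!printer) {
        int empty = !out->len;
        push(out, "-print");             // default action
        if (!empty) push(out, "-a");
    }
    return 1;
}
```

<p>Fed the tokens of <code class="language-plaintext highlighter-rouge">-type f -a ! ( -executable -o -name *.exe )</code>, this produces the postfix sequence shown earlier followed by <code class="language-plaintext highlighter-rouge">-print -a</code>.</p>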

<p>I’ve used the shunting-yard algorithm many times before, so this part was
easy. The new part was coming up with an algorithm to convert a series of
postfix tokens into bytecode. My solution is the compiler <strong>maintains a
stack of bytecode fragments</strong>. That is, each stack element is a sequence
of one or more bytecode instructions. Branches use relative addresses, so
they’re position-independent, and I can concatenate code fragments without
any branch fix-ups. It takes the following actions from queue tokens:</p>

<ul>
  <li>
    <p>For an action token, create an <code class="language-plaintext highlighter-rouge">action</code> instruction, and push it onto
the fragment stack as a new fragment.</p>
  </li>
  <li>
    <p>For a <code class="language-plaintext highlighter-rouge">!</code> token, pop the top fragment, append a <code class="language-plaintext highlighter-rouge">not</code> instruction, and
push it back onto the stack.</p>
  </li>
  <li>
    <p>For a <code class="language-plaintext highlighter-rouge">-a</code> token, pop the top two fragments, join them with a <code class="language-plaintext highlighter-rouge">braf</code> in
the middle which jumps just beyond the second fragment. That is, if the
first fragment evaluates to false, skip over the second fragment into
whatever follows.</p>
  </li>
  <li>
    <p>For a <code class="language-plaintext highlighter-rouge">-o</code> token, just like <code class="language-plaintext highlighter-rouge">-a</code> but use <code class="language-plaintext highlighter-rouge">brat</code>. If the first fragment
is true, we skip over the second fragment.</p>
  </li>
</ul>

<p>If the expression is valid, at the end of this process the stack contains
exactly one fragment. Append a <code class="language-plaintext highlighter-rouge">halt</code> instruction to this fragment, and
that’s our program! If the final fragment contained a branch just beyond
its end, this <code class="language-plaintext highlighter-rouge">halt</code> is that branch target. A few peephole optimizations
and it could probably be an optimal program for this instruction set.</p>
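
<p>Here’s a minimal sketch of that fragment stack. It’s a reconstruction from the description above rather than the real compiler. To stay short it assumes each action arrives as a single token with its arguments already attached (e.g. <code class="language-plaintext highlighter-rouge">"-type f"</code>), uses fixed-size buffers, and its type and function names are made up for the example:</p>

```c
#include <assert.h>
#include <string.h>

typedef enum { OP_ACTION, OP_NOT, OP_BRAF, OP_BRAT, OP_HALT } Op;

typedef struct {
    Op          op;
    int         rel;  // branches: offset relative to the next instruction
    const char *arg;  // actions: the action token
} Insn;

enum { MAXINSN = 64, MAXFRAG = 16 };

typedef struct {
    Insn insn[MAXINSN];
    int  len;
} Frag;

static void emit(Frag *f, Op op, int rel, const char *arg)
{
    assert(f->len < MAXINSN);
    f->insn[f->len++] = (Insn){op, rel, arg};
}

// Compile postfix tokens into prog. Returns 0 on a malformed expression.
static int compile(int ntok, const char **tok, Frag *prog)
{
    Frag stack[MAXFRAG];
    int  top = 0;

    for (int i = 0; i < ntok; i++) {
        const char *s = tok[i];
        if (!strcmp(s, "!")) {
            if (top < 1) return 0;
            emit(&stack[top-1], OP_NOT, 0, 0);  // append to top fragment
        } else if (!strcmp(s, "-a") || !strcmp(s, "-o")) {
            if (top < 2) return 0;
            Frag *lhs = &stack[top-2];
            Frag *rhs = &stack[top-1];
            // Branch just beyond the right-hand fragment.
            emit(lhs, s[1]=='a' ? OP_BRAF : OP_BRAT, rhs->len, 0);
            // Relative branches inside rhs need no fix-up when copied.
            for (int j = 0; j < rhs->len; j++) {
                Insn *in = &rhs->insn[j];
                emit(lhs, in->op, in->rel, in->arg);
            }
            top--;
        } else {
            if (top == MAXFRAG) return 0;
            stack[top].len = 0;
            emit(&stack[top++], OP_ACTION, 0, s);  // new one-insn fragment
        }
    }

    if (top != 1) return 0;  // valid expressions leave exactly one fragment
    *prog = stack[0];
    emit(prog, OP_HALT, 0, 0);
    return 1;
}
```

<p>Given the postfix form of the last <code class="language-plaintext highlighter-rouge">findc</code> example, this emits the same nine instructions, including the <code class="language-plaintext highlighter-rouge">brat</code> that lands on a <code class="language-plaintext highlighter-rouge">not</code> and the <code class="language-plaintext highlighter-rouge">braf</code> that lands on another <code class="language-plaintext highlighter-rouge">braf</code>.</p>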

]]>
    </content>
  </entry>
  
  <entry>
    <title>Closures as Win32 window procedures</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/12/12/"/>
    <id>urn:uuid:7bf46ec6-a8b2-4ffa-857a-86c040357702</id>
    <updated>2025-12-12T19:52:10Z</updated>
    <category term="c"/><category term="win32"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p>Back in 2017 I wrote <a href="/blog/2017/01/08/">about a technique for creating closures in C</a>
using a <a href="/blog/2015/03/19/">JIT-compiled</a> wrapper. It’s neat, though rarely necessary in
real programs, so I don’t think about it often. I applied it to <code class="language-plaintext highlighter-rouge">qsort</code>,
which <a href="/blog/2023/02/11/">sadly</a> accepts no context pointer. More practical would be
working around <a href="/blog/2023/12/17/">insufficient custom allocator interfaces</a>, to
create allocation functions at run-time bound to a particular allocation
region. I’ve learned a lot since I last wrote about this subject, and <a href="https://lowkpro.com/blog/creating-c-closures-from-lua-closures.html">a
recent article</a> had me thinking about it again, and how I could do
better than before. In this article I will enhance Win32 window procedure
callbacks with a fifth argument, allowing us to more directly pass extra
context. I’m using <a href="https://github.com/skeeto/w64devkit">w64devkit</a> on x64, but everything here should
work out-of-the-box with any x64 toolchain that speaks GNU assembly.</p>

<p>A <a href="https://learn.microsoft.com/en-us/windows/win32/api/winuser/nc-winuser-wndproc">window procedure</a> has this prototype:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">LRESULT</span> <span class="nf">Wndproc</span><span class="p">(</span>
  <span class="n">HWND</span> <span class="n">hWnd</span><span class="p">,</span>
  <span class="n">UINT</span> <span class="n">Msg</span><span class="p">,</span>
  <span class="n">WPARAM</span> <span class="n">wParam</span><span class="p">,</span>
  <span class="n">LPARAM</span> <span class="n">lParam</span>
<span class="p">);</span>
</code></pre></div></div>

<p>To create a window we must first register a class with <code class="language-plaintext highlighter-rouge">RegisterClass</code>,
which accepts a set of properties describing a window class, including a
pointer to one of these functions.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">MyState</span> <span class="o">*</span><span class="n">state</span> <span class="o">=</span> <span class="p">...;</span>

    <span class="n">RegisterClassA</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">WNDCLASSA</span><span class="p">){</span>
        <span class="c1">// ...</span>
        <span class="p">.</span><span class="n">lpfnWndProc</span>   <span class="o">=</span> <span class="n">my_wndproc</span><span class="p">,</span>
        <span class="p">.</span><span class="n">lpszClassName</span> <span class="o">=</span> <span class="s">"my_class"</span><span class="p">,</span>
        <span class="c1">// ...</span>
    <span class="p">});</span>

    <span class="n">HWND</span> <span class="n">hwnd</span> <span class="o">=</span> <span class="n">CreateWindowExA</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s">"my_class"</span><span class="p">,</span> <span class="p">...,</span> <span class="n">state</span><span class="p">);</span>
</code></pre></div></div>

<p>The thread drives a message pump with events from the operating system,
dispatching them to this procedure, which then manipulates the program
state in response:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">for</span> <span class="p">(</span><span class="n">MSG</span> <span class="n">msg</span><span class="p">;</span> <span class="n">GetMessageW</span><span class="p">(</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);)</span> <span class="p">{</span>
        <span class="n">TranslateMessage</span><span class="p">(</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">);</span>
        <span class="n">DispatchMessageW</span><span class="p">(</span><span class="o">&amp;</span><span class="n">msg</span><span class="p">);</span>  <span class="c1">// calls the window procedure</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>All four <code class="language-plaintext highlighter-rouge">WNDPROC</code> parameters are determined by Win32. There is no context
pointer argument. So how does this procedure access the program state? We
generally have two options:</p>

<ol>
  <li>Global variables. Yucky but easy. Frequently seen in tutorials.</li>
  <li>A <code class="language-plaintext highlighter-rouge">GWLP_USERDATA</code> pointer attached to the window.</li>
</ol>

<p>The second option takes some setup. Win32 passes the last <code class="language-plaintext highlighter-rouge">CreateWindowEx</code>
argument to the window procedure when the window is created, via <code class="language-plaintext highlighter-rouge">WM_CREATE</code>.
The procedure attaches the pointer to its window as <code class="language-plaintext highlighter-rouge">GWLP_USERDATA</code>. This
pointer is passed indirectly, through a <code class="language-plaintext highlighter-rouge">CREATESTRUCT</code>. So ultimately it
looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">case</span> <span class="n">WM_CREATE</span><span class="p">:</span>
        <span class="n">CREATESTRUCT</span> <span class="o">*</span><span class="n">cs</span> <span class="o">=</span> <span class="p">(</span><span class="n">CREATESTRUCT</span> <span class="o">*</span><span class="p">)</span><span class="n">lParam</span><span class="p">;</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span> <span class="o">=</span> <span class="n">cs</span><span class="o">-&gt;</span><span class="n">lpCreateParams</span><span class="p">;</span>
        <span class="n">SetWindowLongPtr</span><span class="p">(</span><span class="n">hwnd</span><span class="p">,</span> <span class="n">GWLP_USERDATA</span><span class="p">,</span> <span class="p">(</span><span class="n">LONG_PTR</span><span class="p">)</span><span class="n">arg</span><span class="p">);</span>
        <span class="c1">// ...</span>
</code></pre></div></div>

<p>In future messages we can retrieve it with <code class="language-plaintext highlighter-rouge">GetWindowLongPtr</code>. Every time
I go through this I wish there was a better way. What if there was a fifth
window procedure parameter through which we could pass a context?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>typedef LRESULT Wndproc5(HWND, UINT, WPARAM, LPARAM, void *);
</code></pre></div></div>

<p>We’ll build just this as a trampoline. The <a href="https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention">x64 calling convention</a>
passes the first four arguments in registers, and the rest are pushed on
the stack, including this new parameter. Our trampoline cannot just stuff
the extra parameter in a register, but will actually have to build a
stack frame. Slightly more complicated, but barely so.</p>

<h3 id="allocating-executable-memory">Allocating executable memory</h3>

<p>In previous articles, and in the programs where I’ve applied techniques
like this, I’ve allocated executable memory with <code class="language-plaintext highlighter-rouge">VirtualAlloc</code> (or <code class="language-plaintext highlighter-rouge">mmap</code>
elsewhere). This introduces a small challenge for solving the problem
generally: Allocations may be arbitrarily far from our code and data, out
of reach of relative addressing. If they’re further than 2G apart, we need
to encode absolute addresses, and in the simple case would just assume
they’re always too far apart.</p>

<p>These days I’ve more experience with executable formats, and allocation,
and I immediately see a better solution: Request a block of writable,
executable memory from the loader, then allocate our trampolines from it.
Other than being executable, this memory isn’t special, and <a href="/blog/2025/01/19/">allocation
works the usual way</a>, using functions unaware it’s executable. By
allocating through the loader, this memory will be part of our loaded
image, guaranteed to be close to our other code and data, allowing our JIT
compiler to assume <a href="https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models#small-code-model">a small code model</a>.</p>

<p>There are a number of ways to do this, and here’s one way to do it with
GNU-styled toolchains targeting COFF:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="nf">.section</span> <span class="nv">.exebuf</span><span class="p">,</span><span class="s">"bwx"</span>
        <span class="nf">.globl</span> <span class="nv">exebuf</span>
<span class="nl">exebuf:</span>	<span class="nf">.space</span> <span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">21</span>
</code></pre></div></div>

<p>This assembly program defines a new section named <code class="language-plaintext highlighter-rouge">.exebuf</code> containing 2M
of writable (<code class="language-plaintext highlighter-rouge">"w"</code>), executable (<code class="language-plaintext highlighter-rouge">"x"</code>) memory, allocated at run time just
like <code class="language-plaintext highlighter-rouge">.bss</code> (<code class="language-plaintext highlighter-rouge">"b"</code>). We’ll treat this like an arena out of which we can
allocate all trampolines we’ll probably ever need. With careful use of
<code class="language-plaintext highlighter-rouge">.pushsection</code> this could be basic inline assembly, but I’ve left it as a
separate source. On the C side I retrieve this like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Arena</span><span class="p">;</span>

<span class="n">Arena</span> <span class="nf">get_exebuf</span><span class="p">()</span>
<span class="p">{</span>
    <span class="k">extern</span> <span class="kt">char</span> <span class="n">exebuf</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">21</span><span class="p">];</span>
    <span class="n">Arena</span> <span class="n">r</span> <span class="o">=</span> <span class="p">{</span><span class="n">exebuf</span><span class="p">,</span> <span class="n">exebuf</span><span class="o">+</span><span class="k">sizeof</span><span class="p">(</span><span class="n">exebuf</span><span class="p">)};</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Unfortunately I have to repeat myself on the size. There are different
ways to deal with this, but this is simple enough for now. I would have
loved to define the array in C with the GCC <a href="https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Variable-Attributes.html"><code class="language-plaintext highlighter-rouge">section</code> attribute</a>,
but as is usually the case with this attribute, it’s not up to the task,
lacking the ability to set section flags. Besides, by not relying on the
attribute, any C compiler could compile this source, and we only need a
GNU-style toolchain to create the tiny COFF object containing <code class="language-plaintext highlighter-rouge">exebuf</code>.</p>

<p>While we’re at it, a reminder of some other basic definitions we’ll need:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define S(s)            (Str){s, sizeof(s)-1}
#define new(a, n, t)    (t *)alloc(a, n, sizeof(t), _Alignof(t))
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">char</span>     <span class="o">*</span><span class="n">data</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Str</span><span class="p">;</span>

<span class="n">Str</span> <span class="nf">clone</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">Str</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Str</span> <span class="n">r</span> <span class="o">=</span> <span class="n">s</span><span class="p">;</span>
    <span class="n">r</span><span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">new</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">r</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="kt">char</span><span class="p">);</span>
    <span class="n">memcpy</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="n">s</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">r</span><span class="p">.</span><span class="n">len</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Which have been discussed at length in previous articles.</p>
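
<p>For readers who haven’t seen those articles, here’s a minimal sketch of the kind of <code class="language-plaintext highlighter-rouge">alloc</code> that the <code class="language-plaintext highlighter-rouge">new</code> macro expects. The exact signature and the abort-on-exhaustion policy are assumptions for illustration, not necessarily what the linked example uses:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *beg;
    char *end;
} Arena;

// Bump-allocate count zeroed objects of the given size and alignment.
// align must be a power of two. Aborts when the arena is exhausted.
void *alloc(Arena *a, ptrdiff_t count, ptrdiff_t size, ptrdiff_t align)
{
    assert(count >= 0 && size > 0 && align > 0);
    // Bytes needed to round beg up to the next multiple of align.
    ptrdiff_t pad   = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    ptrdiff_t avail = a->end - a->beg - pad;
    if (avail < 0 || count > avail/size) {
        abort();  // out of memory; a real program may want another policy
    }
    char *r = a->beg + pad;
    a->beg += pad + count*size;
    return memset(r, 0, (size_t)(count*size));
}
```

<p>Nothing here cares whether the underlying memory is executable, which is why the same allocator can hand out trampolines from <code class="language-plaintext highlighter-rouge">exebuf</code>.</p>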

<h3 id="trampoline-compiler">Trampoline compiler</h3>

<p>From here the plan is to create a function that accepts a <code class="language-plaintext highlighter-rouge">Wndproc5</code> and a
context pointer to bind, and returns a classic <code class="language-plaintext highlighter-rouge">WNDPROC</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WNDPROC</span> <span class="nf">make_wndproc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">Wndproc5</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">);</span>
</code></pre></div></div>

<p>Our window procedure now gets a fifth argument with the program state:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">LRESULT</span> <span class="nf">my_wndproc</span><span class="p">(</span><span class="n">HWND</span><span class="p">,</span> <span class="n">UINT</span><span class="p">,</span> <span class="n">WPARAM</span><span class="p">,</span> <span class="n">LPARAM</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">MyState</span> <span class="o">*</span><span class="n">state</span> <span class="o">=</span> <span class="n">arg</span><span class="p">;</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When registering the class we wrap it in a trampoline compatible with
<code class="language-plaintext highlighter-rouge">RegisterClass</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">RegisterClassA</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">WNDCLASSA</span><span class="p">){</span>
        <span class="c1">// ...</span>
        <span class="p">.</span><span class="n">lpfnWndProc</span>   <span class="o">=</span> <span class="n">make_wndproc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">my_wndproc</span><span class="p">,</span> <span class="n">state</span><span class="p">),</span>
        <span class="p">.</span><span class="n">lpszClassName</span> <span class="o">=</span> <span class="s">"my_class"</span><span class="p">,</span>
        <span class="c1">// ...</span>
    <span class="p">});</span>
</code></pre></div></div>

<p>All windows using this class will readily have access to this state object
through their fifth parameter. It turns out setting up <code class="language-plaintext highlighter-rouge">exebuf</code> was the
more complicated part, and <code class="language-plaintext highlighter-rouge">make_wndproc</code> is quite simple!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WNDPROC</span> <span class="nf">make_wndproc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">Wndproc5</span> <span class="n">proc</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Str</span> <span class="n">thunk</span> <span class="o">=</span> <span class="n">S</span><span class="p">(</span>
        <span class="s">"</span><span class="se">\x48\x83\xec\x28</span><span class="s">"</span>      <span class="c1">// sub   $40, %rsp</span>
        <span class="s">"</span><span class="se">\x48\xb8</span><span class="s">........"</span>      <span class="c1">// movq  $arg, %rax</span>
        <span class="s">"</span><span class="se">\x48\x89\x44\x24\x20</span><span class="s">"</span>  <span class="c1">// mov   %rax, 32(%rsp)</span>
        <span class="s">"</span><span class="se">\xe8</span><span class="s">...."</span>              <span class="c1">// call  proc</span>
        <span class="s">"</span><span class="se">\x48\x83\xc4\x28</span><span class="s">"</span>      <span class="c1">// add   $40, %rsp</span>
        <span class="s">"</span><span class="se">\xc3</span><span class="s">"</span>                  <span class="c1">// ret</span>
    <span class="p">);</span>
    <span class="n">Str</span> <span class="n">r</span>   <span class="o">=</span> <span class="n">clone</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">thunk</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">rel</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)((</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">proc</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)(</span><span class="n">r</span><span class="p">.</span><span class="n">data</span> <span class="o">+</span> <span class="mi">24</span><span class="p">));</span>
    <span class="n">memcpy</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">data</span><span class="o">+</span> <span class="mi">6</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">arg</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">arg</span><span class="p">));</span>
    <span class="n">memcpy</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">data</span><span class="o">+</span><span class="mi">20</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">rel</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">rel</span><span class="p">));</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">WNDPROC</span><span class="p">)</span><span class="n">r</span><span class="p">.</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The assembly allocates a new stack frame, with callee shadow space, and
with room for the new argument, which also happens to re-align the stack.
It stores the new argument for the <code class="language-plaintext highlighter-rouge">Wndproc5</code> just above the shadow space.
Then calls into the <code class="language-plaintext highlighter-rouge">Wndproc5</code> without touching other parameters. There
are two “patches” to fill out, which I’ve initially filled with dots: the
context pointer itself, and a 32-bit signed relative address for the call.
The trampoline is going to be very near the callee. The only thing I don’t like about
this function is that I’ve manually worked out the patch offsets.</p>
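
<p>One way to avoid counting bytes by hand, sketched here as an alternative rather than what the example actually does, is to split the thunk at its patch sites and let <code class="language-plaintext highlighter-rouge">sizeof</code> derive the offsets. The macro and function names below are hypothetical. The sketch assumes a 64-bit little-endian host, only builds the bytes without executing them, and adds a range check on the relative address that <code class="language-plaintext highlighter-rouge">make_wndproc</code> omits:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// The thunk, split at the two patch sites.
#define HEAD "\x48\x83\xec\x28" "\x48\xb8"  // sub $40,%rsp; movq $imm64,%rax
#define MID  "\x48\x89\x44\x24\x20" "\xe8"  // mov %rax,32(%rsp); call rel32
#define TAIL "\x48\x83\xc4\x28" "\xc3"      // add $40,%rsp; ret

enum {
    ARG_OFF   = sizeof(HEAD) - 1,               // start of the imm64
    REL_OFF   = ARG_OFF + 8 + sizeof(MID) - 1,  // start of the rel32
    CALL_END  = REL_OFF + 4,                    // rel32 is relative to here
    THUNK_LEN = CALL_END + sizeof(TAIL) - 1,
};

// Write a patched thunk into dst, which must hold THUNK_LEN bytes.
// Returns 0 if proc is out of reach of a 32-bit relative call.
static int build_thunk(unsigned char *dst, uintptr_t proc, uintptr_t arg)
{
    uintptr_t next = (uintptr_t)dst + CALL_END;
    int64_t   diff = (int64_t)(proc - next);
    if (diff < INT32_MIN || diff > INT32_MAX) {
        return 0;
    }
    int32_t  rel = (int32_t)diff;
    uint64_t imm = arg;

    memcpy(dst,               HEAD, sizeof(HEAD) - 1);
    memcpy(dst + ARG_OFF,     &imm, 8);
    memcpy(dst + ARG_OFF + 8, MID,  sizeof(MID) - 1);
    memcpy(dst + REL_OFF,     &rel, 4);
    memcpy(dst + CALL_END,    TAIL, sizeof(TAIL) - 1);
    return 1;
}
```

<p>The derived constants come out to the same 6, 20, and 24 used above, so the two versions produce identical bytes.</p>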

<p>It’s probably not useful, but it’s easy to update the context pointer at
any time if you hold onto the trampoline pointer:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">set_wndproc_arg</span><span class="p">(</span><span class="n">WNDPROC</span> <span class="n">p</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">memcpy</span><span class="p">((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">p</span><span class="o">+</span><span class="mi">6</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">arg</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">arg</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So, for instance:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">MyState</span> <span class="o">*</span><span class="n">state</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">...;</span>  <span class="c1">// multiple states</span>
    <span class="n">WNDPROC</span> <span class="n">proc</span> <span class="o">=</span> <span class="n">make_wndproc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">my_wndproc</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="c1">// ...</span>
    <span class="n">set_wndproc_arg</span><span class="p">(</span><span class="n">proc</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>  <span class="c1">// switch states</span>
</code></pre></div></div>

<p>Though I expect the most common case is just creating multiple procedures:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">WNDPROC</span> <span class="n">procs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="n">make_wndproc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">my_wndproc</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span>
        <span class="n">make_wndproc</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">my_wndproc</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span>
    <span class="p">};</span>
</code></pre></div></div>

<p>To my slight surprise these trampolines still work with an active <a href="https://learn.microsoft.com/en-us/windows/win32/secbp/control-flow-guard">Control
Flow Guard</a> system policy. Trampolines do not have stack unwind
entries, and I thought Windows might refuse to pass control to them.</p>

<p>Here’s a complete, runnable example if you’d like to try it yourself:
<a href="https://gist.github.com/skeeto/13363b78489b26bed7485ec0d6b2c7f8"><code class="language-plaintext highlighter-rouge">main.c</code> and <code class="language-plaintext highlighter-rouge">exebuf.s</code></a></p>

<h3 id="better-cases">Better cases</h3>

<p>This is more work than going through <code class="language-plaintext highlighter-rouge">GWLP_USERDATA</code>, and real programs
have a small, fixed number of window procedures — typically one — so this
isn’t the best example, but I wanted to illustrate with a real interface.
Again, perhaps the best real use is a library with a weak custom allocator
interface:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">malloc</span><span class="p">)(</span><span class="kt">size_t</span><span class="p">);</span>   <span class="c1">// no context pointer!</span>
    <span class="kt">void</span>  <span class="p">(</span><span class="o">*</span><span class="n">free</span><span class="p">)(</span><span class="kt">void</span> <span class="o">*</span><span class="p">);</span>     <span class="c1">// "</span>
<span class="p">}</span> <span class="n">Allocator</span><span class="p">;</span>

<span class="kt">void</span> <span class="o">*</span><span class="nf">arena_malloc</span><span class="p">(</span><span class="kt">size_t</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>

<span class="c1">// ...</span>

    <span class="n">Allocator</span> <span class="n">perm_allocator</span> <span class="o">=</span> <span class="p">{</span>
        <span class="p">.</span><span class="n">malloc</span> <span class="o">=</span> <span class="n">make_trampoline</span><span class="p">(</span><span class="n">exearena</span><span class="p">,</span> <span class="n">arena_malloc</span><span class="p">,</span> <span class="n">perm</span><span class="p">),</span>
        <span class="p">.</span><span class="n">free</span>   <span class="o">=</span> <span class="n">noop_free</span><span class="p">,</span>
    <span class="p">};</span>
    <span class="n">Allocator</span> <span class="n">scratch_allocator</span> <span class="o">=</span> <span class="p">{</span>
        <span class="p">.</span><span class="n">malloc</span> <span class="o">=</span> <span class="n">make_trampoline</span><span class="p">(</span><span class="n">exearena</span><span class="p">,</span> <span class="n">arena_malloc</span><span class="p">,</span> <span class="n">scratch</span><span class="p">),</span>
        <span class="p">.</span><span class="n">free</span>   <span class="o">=</span> <span class="n">noop_free</span><span class="p">,</span>
    <span class="p">};</span>
</code></pre></div></div>

<p>Something to keep in my back pocket for the future.</p>

]]>
    </content>
  </entry>
  
  <entry>
    <title>Speculations on arenas and non-trivial destructors</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/10/16/"/>
    <id>urn:uuid:102e0e39-0078-4698-b2d2-b9454dfe5545</id>
    <updated>2025-10-16T20:11:22Z</updated>
    <category term="cpp"/>
    <content type="html">
      <![CDATA[<p>As I <a href="/blog/2025/09/30/">continue to reflect</a> on arenas and lifetimes in C++, I realized
that dealing with destructors is not so onerous. In fact, it does not even
impact <a href="/blog/2025/01/19/">my established arena usage</a>! That is, implicit RAII-style
deallocation at scope termination, which works even in plain old C. With a
small change we can safely place resource-managing objects in arenas, such
as those owning file handles, sockets, threads, etc. (Though the ideal
remains <a href="/blog/2024/10/03/">resource management avoidance</a> when possible.) We can also
place traditional, memory-managing C++ objects in arenas. Their own
allocations won’t come from the arena — either because they <a href="/blog/2024/09/04/">lack the
interfaces</a> to do so, or they’re simply ineffective at it (<a href="https://en.cppreference.com/w/cpp/memory/polymorphic.html">pmr</a>) —
but they will reliably clean up after themselves. It’s all exception-safe,
too. In this article I’ll update my arena allocator with this new feature.
The change requires one additional arena pointer member, a bit of overhead
for objects with non-trivial destructors, and has no impact on other objects.</p>

<p>I continue to title this “speculations” because, unlike arenas in C, I
have not (yet?) put these C++ techniques into practice in real software. I
haven’t refined them through use. Even ignoring its standard library as I
do here, C++ is an enormously complex programming language — far more so
than C — and I’m less confident that I’m not breaking a rule by accident.
I only want to break rules with intention!</p>

<p>As a reminder here’s where we left things off:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="n">raw_alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>  <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">count</span> <span class="o">&gt;=</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">throw</span> <span class="n">std</span><span class="o">::</span><span class="n">bad_alloc</span><span class="p">{};</span>  <span class="c1">// OOM policy</span>
    <span class="p">}</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">pad</span><span class="p">;</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">pad</span> <span class="o">+</span> <span class="n">count</span><span class="o">*</span><span class="n">size</span><span class="p">;</span>
    <span class="k">return</span> <span class="k">new</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="n">T</span><span class="p">[</span><span class="n">count</span><span class="p">]{};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I used <code class="language-plaintext highlighter-rouge">throw</code> when out of memory mainly to emphasize that this works, but
you’re free to pick whatever is appropriate for your program. Remember,
that’s the entire allocator, including implicit deallocation, sufficient
to fulfill the allocation needs for most programs, though they must be
designed for it. Also note that it’s now <code class="language-plaintext highlighter-rouge">raw_alloc</code>, as we’ll be writing
a new, enhanced <code class="language-plaintext highlighter-rouge">alloc</code> that builds upon this one.</p>
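
<p>The <code class="language-plaintext highlighter-rouge">pad</code> expression is the least obvious line, so here is a small sketch of the arithmetic in isolation. The <code class="language-plaintext highlighter-rouge">padding</code> helper is hypothetical, introduced only for illustration, and it assumes the alignment is a power of two, which holds for any <code class="language-plaintext highlighter-rouge">alignof(T)</code>:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the allocator: the number of bytes
// needed to round addr up to the next multiple of align. Assumes align
// is a power of two. Negation happens in unsigned arithmetic, so it is
// well-defined, and the mask keeps only the low bits.
static ptrdiff_t padding(uintptr_t addr, uintptr_t align)
{
    return (ptrdiff_t)(-addr & (align - 1));
}

// A few spot checks of the arithmetic.
static void check_padding()
{
    assert(padding(0x1000, 8) == 0);   // already aligned
    assert(padding(0x1001, 8) == 7);   // 7 bytes up to 0x1008
    assert(padding(0x1005, 4) == 3);   // 3 bytes up to 0x1008
    assert(padding(0x1007, 1) == 0);   // alignment 1 never pads
    assert((0x1005 + padding(0x1005, 16)) % 16 == 0);
}
```

<p>Masking the negated address gives the distance to the next aligned address without a branch or a division.</p>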

<p>As a reminder on usage, I’ll draw on an old example, updated for C++:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">wchar_t</span>   <span class="o">*</span><span class="nf">towidechar</span><span class="p">(</span><span class="n">Str</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>   <span class="c1">// convert to UTF-16</span>
<span class="n">Str</span>        <span class="nf">slurpfile</span><span class="p">(</span><span class="kt">wchar_t</span> <span class="o">*</span><span class="n">path</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>  <span class="c1">// read an entire file</span>
<span class="n">Slice</span><span class="o">&lt;</span><span class="n">Str</span><span class="o">&gt;</span> <span class="n">split</span><span class="p">(</span><span class="n">Str</span><span class="p">,</span> <span class="kt">char</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>  <span class="c1">// split on delimiter</span>

<span class="n">Slice</span><span class="o">&lt;</span><span class="n">Str</span><span class="o">&gt;</span> <span class="n">getlines</span><span class="p">(</span><span class="n">Str</span> <span class="n">path</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">perm</span><span class="p">,</span> <span class="n">Arena</span> <span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Use scratch for path conversion, auto-free on return</span>
    <span class="kt">wchar_t</span> <span class="o">*</span><span class="n">wpath</span> <span class="o">=</span> <span class="n">towidechar</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>

    <span class="c1">// Use perm for file contents, which are returned</span>
    <span class="n">Str</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">slurpfile</span><span class="p">(</span><span class="n">wpath</span><span class="p">,</span> <span class="n">perm</span><span class="p">);</span>

    <span class="c1">// Use perm for the slice, pointing into buf</span>
    <span class="k">return</span> <span class="n">split</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="sc">'\n'</span><span class="p">,</span> <span class="n">perm</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Changes to <code class="language-plaintext highlighter-rouge">scratch</code> do not persist after <code class="language-plaintext highlighter-rouge">getlines</code> returns, so objects
allocated from that arena are automatically freed on return. So far this
doesn’t rely on C++ RAII features, just simple value semantics. It works
well because all the objects in question have trivial destructors. But
suppose there’s a resource to manage:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">TcpSocket</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">socket</span> <span class="o">=</span> <span class="o">::</span><span class="n">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">TcpSocket</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
    <span class="n">TcpSocket</span><span class="p">(</span><span class="n">TcpSocket</span> <span class="o">&amp;</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
    <span class="kt">void</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">TcpSocket</span> <span class="o">&amp;</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
    <span class="c1">// TODO: move ctor/operator</span>
    <span class="o">~</span><span class="n">TcpSocket</span><span class="p">()</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">socket</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">close</span><span class="p">(</span><span class="n">socket</span><span class="p">);</span> <span class="p">}</span>
    <span class="k">operator</span> <span class="kt">int</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">socket</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>If we allocate a TcpSocket in an arena, including as a member of another
object, the destructor will never run unless we call it manually. To deal
with this we’ll need to keep track of objects requiring destruction, which
we’ll do with a linked list of destructors, forming a LIFO stack:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Dtor</span> <span class="p">{</span>
    <span class="n">Dtor</span>     <span class="o">*</span><span class="n">next</span><span class="p">;</span>
    <span class="kt">void</span>     <span class="o">*</span><span class="n">objects</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">;</span>
    <span class="kt">void</span>     <span class="p">(</span><span class="o">*</span><span class="n">dtor</span><span class="p">)(</span><span class="kt">void</span> <span class="o">*</span><span class="n">objects</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Each Dtor holds a pointer to a homogeneous array, a count (typically one),
and a pointer to a function that knows how to destroy these objects. The
linked list itself is heterogeneous, with dynamic type. The function pointer
acts as a kind of type tag. The <code class="language-plaintext highlighter-rouge">dtor</code> functions will be generated using a
template function:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="kt">void</span> <span class="nf">destroy</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">T</span> <span class="o">*</span><span class="n">objects</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">count</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">objects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="o">~</span><span class="n">T</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice it destroys end-to-beginning, the reverse of the order in which
placement <code class="language-plaintext highlighter-rouge">new[]</code> constructs these objects. It’s essentially a placement
<code class="language-plaintext highlighter-rouge">delete[]</code>. An arena initializes with an empty list of Dtors as a new
member:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
    <span class="n">Dtor</span> <span class="o">*</span><span class="n">dtors</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// ...</span>

<span class="p">};</span>
</code></pre></div></div>

<p>There are two different ways to construct an arena: over a block of raw
memory (unowned), or from an existing arena to borrow a scratch arena over
its free space. So that’s two constructors:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="c1">// ...</span>

    <span class="n">Arena</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">mem</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">)</span> <span class="o">:</span> <span class="n">beg</span><span class="p">{</span><span class="n">mem</span><span class="p">},</span> <span class="n">end</span><span class="p">{</span><span class="n">mem</span><span class="o">+</span><span class="n">len</span><span class="p">}</span> <span class="p">{}</span>
    <span class="n">Arena</span><span class="p">(</span><span class="n">Arena</span> <span class="o">&amp;</span><span class="n">a</span><span class="p">)</span> <span class="o">:</span> <span class="n">beg</span><span class="p">{</span><span class="n">a</span><span class="p">.</span><span class="n">beg</span><span class="p">},</span> <span class="n">end</span><span class="p">{</span><span class="n">a</span><span class="p">.</span><span class="n">end</span><span class="p">}</span> <span class="p">{}</span>

    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Finally a destructor that pops the Dtor linked list until empty, which
runs the destructors in reverse order when the arena is destroyed:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="c1">// ...</span>

    <span class="kt">void</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">Arena</span> <span class="o">&amp;</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>  <span class="c1">// rule of three</span>

    <span class="o">~</span><span class="n">Arena</span><span class="p">()</span>
    <span class="p">{</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">dtors</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">Dtor</span> <span class="o">*</span><span class="n">dead</span> <span class="o">=</span> <span class="n">dtors</span><span class="p">;</span>
            <span class="n">dtors</span> <span class="o">=</span> <span class="n">dead</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>
            <span class="n">dead</span><span class="o">-&gt;</span><span class="n">dtor</span><span class="p">(</span><span class="n">dead</span><span class="o">-&gt;</span><span class="n">objects</span><span class="p">,</span> <span class="n">dead</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>(Note: This should probably use a local variable instead of manipulating
the <code class="language-plaintext highlighter-rouge">dtors</code> member directly. Updates to <code class="language-plaintext highlighter-rouge">dtors</code> are potentially visible to
destructors, inhibiting optimization.) The new, enhanced <code class="language-plaintext highlighter-rouge">alloc</code> building
upon <code class="language-plaintext highlighter-rouge">raw_alloc</code>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="nf">alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">__has_trivial_destructor</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">count</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">raw_alloc</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">count</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">Dtor</span> <span class="o">*</span><span class="n">dtor</span>    <span class="o">=</span> <span class="n">raw_alloc</span><span class="o">&lt;</span><span class="n">Dtor</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>  <span class="c1">// allocate first</span>
    <span class="n">T</span>    <span class="o">*</span><span class="n">r</span>       <span class="o">=</span> <span class="n">raw_alloc</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">count</span><span class="p">);</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">next</span>    <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">dtors</span><span class="p">;</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">objects</span> <span class="o">=</span> <span class="n">r</span><span class="p">;</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">count</span>   <span class="o">=</span> <span class="n">count</span><span class="p">;</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">dtor</span>    <span class="o">=</span> <span class="n">destroy</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="n">a</span><span class="o">-&gt;</span><span class="n">dtors</span> <span class="o">=</span> <span class="n">dtor</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’m using the non-standard <code class="language-plaintext highlighter-rouge">__has_trivial_destructor</code> built-in supported
by all major C++ implementations, meaning we still don’t need the C++
standard library, but <a href="https://en.cppreference.com/w/cpp/types/is_destructible.html"><code class="language-plaintext highlighter-rouge">std::is_trivially_destructible</code></a> is the usual
tool here. <a href="https://clang.llvm.org/docs/LanguageExtensions.html#:~:text=__has_trivial_destructor">LLVM is pushing <code class="language-plaintext highlighter-rouge">__is_trivially_destructible</code></a> instead,
but it’s not supported by GCC <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107600">until GCC 16</a>.</p>
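
<p>To get a feel for which types take the fast path, here is a minimal sketch using the standard trait rather than the built-in. It assumes C++17 for the <code class="language-plaintext highlighter-rouge">_v</code> shorthand, and the three types are hypothetical. Note that a non-trivial member makes the enclosing type non-trivial as well:</p>

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types for illustration only.
struct Point  { int x, y; };             // no destructor: trivial
struct Handle { int fd; ~Handle() {} };  // user-provided: non-trivial
struct Owner  { Handle h; };             // non-trivial through its member

static_assert( std::is_trivially_destructible_v<Point>);
static_assert( std::is_trivially_destructible_v<int[4]>);
static_assert(!std::is_trivially_destructible_v<Handle>);
static_assert(!std::is_trivially_destructible_v<Owner>);
```

<p>Only <code class="language-plaintext highlighter-rouge">Handle</code> and <code class="language-plaintext highlighter-rouge">Owner</code> would need a Dtor entry. The others go straight to <code class="language-plaintext highlighter-rouge">raw_alloc</code>.</p>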

<p>Since it’s simple to do, a zero count skips the non-trivial destruction
path entirely, as there’s nothing to destroy. Things get more interesting
for a non-zero number of non-trivially destructible objects. Allocate the
Dtor first: if its allocation came second and failed, the already-constructed
objects would leak, with no Dtor entry in place to destroy them. Then
allocate the array, attach it to the Dtor, and attach the Dtor to the arena,
registering the objects for cleanup.</p>

<p>If a constructor throws, placement <code class="language-plaintext highlighter-rouge">new[]</code> automatically destroys the
objects created so far (in effect the real placement <code class="language-plaintext highlighter-rouge">delete[]</code>) before the
exception propagates, so that case was already covered at the start.</p>
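
<p>That cleanup is easy to observe in isolation. The following is a sketch with a hypothetical <code class="language-plaintext highlighter-rouge">Flaky</code> type that counts live instances and throws while constructing its third element. It assumes C++17 for the inline static members, and the buffer is deliberately oversized in case an implementation adds array bookkeeping:</p>

```cpp
#include <cassert>
#include <new>

// Hypothetical type for illustration: tracks live instances and throws
// while constructing the third element of an array.
struct Flaky {
    static inline int live    = 0;
    static inline int created = 0;
    Flaky()  { if (created == 2) throw 42; created++; live++; }
    ~Flaky() { live--; }
};

// Attempts to construct five elements with placement new[], then
// returns how many Flaky objects remain alive after the failure.
static int live_after_failure()
{
    Flaky::live = Flaky::created = 0;
    alignas(16) static char buf[64];  // oversized on purpose
    try {
        new(buf) Flaky[5]{};
    } catch (int) {
        // Elements 0 and 1 were already destroyed on the way here.
    }
    return Flaky::live;
}
```

<p>Two objects are constructed, the third throws, and the count of live objects is back to zero by the time the handler runs.</p>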

<p>With a little more cleverness we could omit the <code class="language-plaintext highlighter-rouge">objects</code> pointer and
discover the array using pointer arithmetic off the Dtor object itself.
That’s tricky (consider alignment), and generally unnecessary, so I didn’t
worry about it. With arenas, allocator overhead is already well below that
of conventional allocation, so slack is plentiful. Chances are we will
also never need an <em>array</em> of non-trivially destructible objects, and so
we could probably omit <code class="language-plaintext highlighter-rouge">count</code>, then write a single-object allocator that
forwards constructor arguments (e.g. a handle to the resource to be
managed). That involves no new concepts, and I leave it as an exercise for
the reader.</p>
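
<p>For the curious, one possible shape for that exercise follows. This is a sketch, not a refined implementation: <code class="language-plaintext highlighter-rouge">make</code>, <code class="language-plaintext highlighter-rouge">raw_bytes</code>, and <code class="language-plaintext highlighter-rouge">Guard</code> are hypothetical names, the Dtor drops <code class="language-plaintext highlighter-rouge">count</code>, and for brevity it leans on <code class="language-plaintext highlighter-rouge">std::forward</code> and the standard trait instead of avoiding the standard library:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>

struct Dtor {               // simplified: one object, so no count
    Dtor *next;
    void *object;
    void (*dtor)(void *);
};

struct Arena {
    char *beg;
    char *end;
    Dtor *dtors = 0;
    Arena(char *mem, ptrdiff_t len) : beg{mem}, end{mem+len} {}
    Arena(Arena &a) : beg{a.beg}, end{a.end} {}
    void operator=(Arena &) = delete;
    ~Arena()
    {
        while (dtors) {
            Dtor *dead = dtors;
            dtors = dead->next;
            dead->dtor(dead->object);
        }
    }
};

// Hypothetical helper: reserve aligned bytes without constructing.
static void *raw_bytes(Arena *a, ptrdiff_t size, ptrdiff_t align)
{
    ptrdiff_t pad = -(uintptr_t)a->beg & (align - 1);
    if (size > a->end - a->beg - pad) {
        throw std::bad_alloc{};
    }
    void *r = a->beg + pad;
    a->beg += pad + size;
    return r;
}

// Hypothetical single-object allocator forwarding constructor arguments.
template<typename T, typename... Args>
T *make(Arena *a, Args &&...args)
{
    if constexpr (std::is_trivially_destructible_v<T>) {
        return new(raw_bytes(a, sizeof(T), alignof(T)))
            T(std::forward<Args>(args)...);
    }
    // Dtor first, as before, so a failure cannot strand a live object.
    Dtor *d = new(raw_bytes(a, sizeof(Dtor), alignof(Dtor))) Dtor{};
    T *r = new(raw_bytes(a, sizeof(T), alignof(T)))
        T(std::forward<Args>(args)...);
    d->next   = a->dtors;
    d->object = r;
    d->dtor   = [](void *p) { static_cast<T *>(p)->~T(); };
    a->dtors  = d;
    return r;
}

// Hypothetical resource: bumps a counter while it is alive.
struct Guard {
    int *open;
    explicit Guard(int *open) : open{open} { ++*open; }
    Guard(Guard &) = delete;
    ~Guard() { --*open; }
};

// Returns the counter as seen while the arena is still alive.
static int demo(int *open)
{
    alignas(16) static char mem[1<<10];
    Arena arena(mem, sizeof(mem));
    make<Guard>(&arena, open);
    make<Guard>(&arena, open);
    return *open;  // read before arena's destructor releases both
}
```

<p>Called with a counter starting at zero, <code class="language-plaintext highlighter-rouge">demo</code> sees two live guards, and the counter is back to zero once the arena is gone.</p>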

<p>With that in place, we could now allocate an array of TcpSockets:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="n">Arena</span> <span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">TcpSocket</span> <span class="o">*</span><span class="n">sockets</span> <span class="o">=</span> <span class="n">alloc</span><span class="o">&lt;</span><span class="n">TcpSocket</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>These sockets will all be closed when <code class="language-plaintext highlighter-rouge">example</code> exits via their single
Dtor entry on <code class="language-plaintext highlighter-rouge">scratch</code>. When calling this <code class="language-plaintext highlighter-rouge">example</code> with an arena:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">caller</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">perm</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">example</span><span class="p">(</span><span class="o">*</span><span class="n">perm</span><span class="p">);</span>  <span class="c1">// creates a scratch arena</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This invokes the copy constructor, creating a scratch arena with an empty
<code class="language-plaintext highlighter-rouge">dtors</code> list to be passed into <code class="language-plaintext highlighter-rouge">example</code>. Objects existing in <code class="language-plaintext highlighter-rouge">*perm</code> will
not be destroyed by <code class="language-plaintext highlighter-rouge">example</code> because <code class="language-plaintext highlighter-rouge">dtors</code> isn’t passed in. If we had
passed a <em>pointer to an arena</em>, the Arena constructor wouldn’t be invoked,
so the callee would use the caller’s arena, pushing its Dtors onto the
caller’s list.</p>

<p>In other words, the interface hasn’t changed! That’s the most exciting
part for me. This by-copy, by-pointer interfacing has really grown on me
the past two years.</p>
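
<p>To make the two calling conventions concrete, here is a sketch that assembles the pieces above into one file and counts live objects at each step. <code class="language-plaintext highlighter-rouge">Res</code>, <code class="language-plaintext highlighter-rouge">by_copy</code>, <code class="language-plaintext highlighter-rouge">by_pointer</code>, and <code class="language-plaintext highlighter-rouge">demo</code> are hypothetical names, and the standard trait stands in for the built-in:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

struct Dtor {
    Dtor     *next;
    void     *objects;
    ptrdiff_t count;
    void     (*dtor)(void *objects, ptrdiff_t count);
};

struct Arena {
    char *beg;
    char *end;
    Dtor *dtors = 0;
    Arena(char *mem, ptrdiff_t len) : beg{mem}, end{mem+len} {}
    Arena(Arena &a) : beg{a.beg}, end{a.end} {}  // scratch: empty dtors
    void operator=(Arena &) = delete;
    ~Arena()
    {
        while (dtors) {
            Dtor *dead = dtors;
            dtors = dead->next;
            dead->dtor(dead->objects, dead->count);
        }
    }
};

template<typename T>
T *raw_alloc(Arena *a, ptrdiff_t count = 1)
{
    ptrdiff_t size = sizeof(T);
    ptrdiff_t pad  = -(uintptr_t)a->beg & (alignof(T) - 1);
    if (count >= (a->end - a->beg - pad)/size) {
        throw std::bad_alloc{};
    }
    void *r = a->beg + pad;
    a->beg += pad + count*size;
    return new(r) T[count]{};
}

template<typename T>
void destroy(void *ptr, ptrdiff_t count)
{
    T *objects = (T *)ptr;
    for (ptrdiff_t i = count-1; i >= 0; i--) {
        objects[i].~T();
    }
}

template<typename T>
T *alloc(Arena *a, ptrdiff_t count = 1)
{
    // Standard trait swapped in for __has_trivial_destructor
    if (std::is_trivially_destructible_v<T> || !count) {
        return raw_alloc<T>(a, count);
    }
    Dtor *dtor    = raw_alloc<Dtor>(a);
    T    *r       = raw_alloc<T>(a, count);
    dtor->next    = a->dtors;
    dtor->objects = r;
    dtor->count   = count;
    dtor->dtor    = destroy<T>;
    a->dtors = dtor;
    return r;
}

// Hypothetical resource that counts its live instances
struct Res {
    static inline int live = 0;
    Res()  { live++; }
    ~Res() { live--; }
};

static void by_copy(Arena scratch)  { alloc<Res>(&scratch, 3); }
static void by_pointer(Arena *perm) { alloc<Res>(perm, 2); }

struct Counts { int after_copy, after_pointer; };

static Counts demo()
{
    alignas(16) static char mem[1<<12];
    Counts c = {};
    Arena perm(mem, sizeof(mem));
    by_copy(perm);               // 3 created, destroyed with the scratch copy
    c.after_copy = Res::live;
    by_pointer(&perm);           // 2 created, registered on perm's list
    c.after_pointer = Res::live;
    return c;                    // perm's destructor then destroys those 2
}
```

<p>The by-copy call leaves nothing alive behind it, while the by-pointer call leaves two objects that live exactly as long as the caller’s arena.</p>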

]]>
    </content>
  </entry>
  
  <entry>
    <title>More speculations on arenas in C++</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/09/30/"/>
    <id>urn:uuid:ffce917f-c757-42e7-a4d1-55e8d80c5051</id>
    <updated>2025-09-30T11:46:16Z</updated>
    <category term="cpp"/>
    <content type="html">
      <![CDATA[<p><em>Update October 2025: <a href="/blog/2025/10/16/">further enhancements</a></em>.</p>

<p>Patrice Roy’s new book, <a href="https://www.packtpub.com/en-us/product/c-memory-management-9781805129806"><em>C++ Memory Management</em></a>, has made me more
conscious of object lifetimes. C++ is stricter than C about lifetimes, and
common, textbook memory management that’s sound in C is less so in C++ —
<em>more than I realized</em>. The book also presents a form of arena allocation
so watered down as to enjoy none of the benefits. (Despite its precision
otherwise, the second half is also littered with <a href="https://github.com/PacktPublishing/C-Plus-Plus-Memory-Management/blob/9e4c4ea7/chapter12/Vector-better.cpp#L45">integer overflows</a>
lacking <a href="/blog/2024/05/24/">the appropriate checks</a>, and near the end has some <a href="https://github.com/PacktPublishing/C-Plus-Plus-Memory-Management/blob/9e4c4ea7/chapter14/Vector_with_allocator_cpp23.cpp#L118-L119">pointer
overflows</a> invalidating the check.) However, I’m grateful for the new
insights, and it’s made me revisit <a href="/blog/2024/04/14/">my own C++ arena allocation</a>. In
this new light I see I got it subtly wrong myself!</p>

<!--more-->

<p>Surprising to most C++ programmers, but not language lawyers, <a href="https://wg21.link/P0593#idiomatic-c-code-as-c">idiomatic C
memory allocation was undefined behavior in C++ until recently</a>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="o">*</span><span class="nf">newint</span><span class="p">(</span><span class="kt">int</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">r</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">v</span><span class="p">;</span>  <span class="c1">// &lt;-- undefined behavior before C++20</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This program allocates memory for an object but never starts a lifetime.
Assignment without a lifetime is invalid. Pointer casts are that much more
suspicious in C++, and due to lifetime semantics, in many cases indicate
incorrect code. (To be clear, I’m not arguing in favor of these semantics,
but reasoning about the facts on the ground.) C++20 carved out special
exceptions for <code class="language-plaintext highlighter-rouge">malloc</code> and friends, but addressing this kind of thing in
general is the purpose of the brand new <a href="https://en.cppreference.com/w/cpp/memory/start_lifetime_as.html"><code class="language-plaintext highlighter-rouge">start_lifetime_as</code></a> (and
similar), the slightly older <a href="https://en.cppreference.com/w/cpp/memory/construct_at.html"><code class="language-plaintext highlighter-rouge">construct_at</code></a>, or a classic placement
new. They all start lifetimes. The last looks like:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="o">*</span><span class="nf">newint</span><span class="p">(</span><span class="kt">int</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="k">new</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="kt">int</span><span class="p">{</span><span class="n">v</span><span class="p">};</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s no good as a C/C++ polyglot, though per the differing old semantics
that was impossible anyway without macros. Which is basically cheating. An
important detail: The corrected version has no casts, and it returns the
result of <code class="language-plaintext highlighter-rouge">new</code>. That’s important because only the pointer returned by
<code class="language-plaintext highlighter-rouge">new</code> is imbued as a pointer to the new lifetime, <em>not</em> <code class="language-plaintext highlighter-rouge">r</code>. There are no
side effects affecting the provenance of <code class="language-plaintext highlighter-rouge">r</code>, which still points to raw
memory as far as the language is concerned.</p>

<p>With that in mind let’s revisit my arena from last time, which does not
necessarily benefit from the recent changes, not being one of the special
case C standard library functions:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="n">alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>  <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">);</span>  <span class="c1">// OOM policy</span>
    <span class="n">T</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="p">)(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">pad</span><span class="p">);</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">pad</span> <span class="o">+</span> <span class="n">count</span><span class="o">*</span><span class="n">size</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">new</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="n">T</span><span class="p">{};</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Hey, look, placement new! I did that to produce a nicer interface, but I
lucked out also starting lifetimes appropriately. Except it returns the
wrong pointer. This allocator discards the pointer blessed with the new
lifetime. Both pointers have the same address but different provenance.
That matters. But I’m calling <code class="language-plaintext highlighter-rouge">new</code> many times, so how do I fix this?
Array new, duh.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="nf">alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>  <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">);</span>  <span class="c1">// OOM policy</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">pad</span><span class="p">;</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">pad</span> <span class="o">+</span> <span class="n">count</span><span class="o">*</span><span class="n">size</span><span class="p">;</span>
    <span class="k">return</span> <span class="k">new</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="n">T</span><span class="p">[</span><span class="n">count</span><span class="p">]{};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Wow… that’s actually much better anyway. No explicit casts, no loop. Why
didn’t I think of this in the first place? The catch is I can’t forward
constructor arguments, emplace-style — the part that gave me the trouble
with perfect forwarding — but that’s for the best. Forwarding more than
once was unsound, made more obvious by <code class="language-plaintext highlighter-rouge">new[]</code>.</p>

<p>Caveat: This only works starting in C++20, and strictly with <code class="language-plaintext highlighter-rouge">operator
new[](size_t, void *)</code>. Any other placement <code class="language-plaintext highlighter-rouge">new[]</code> may require <em>array
overhead</em> — e.g. it prepends an array size so that <code class="language-plaintext highlighter-rouge">delete[]</code> can run
non-trivial destructors — which is unknowable and therefore impossible to
provide or align correctly. Overhead for placement <code class="language-plaintext highlighter-rouge">new[]</code> is nonsense, of
course, but as of this writing, <em>all three major C++ compilers do it</em> and
essentially have broken custom placement <code class="language-plaintext highlighter-rouge">new[]</code>.</p>

<p>Since I’m thinking about lifetimes, what about the other end? My arena
does not call destructors, by design, and starts new lifetimes on top of
objects that are technically still alive. Is that undefined behavior? As
far as I can tell <a href="https://en.cppreference.com/w/cpp/language/lifetime.html#Storage_reuse">this is allowed</a>, even for non-trivial destructors,
with the caveat that it might leak resources. In this case the resource is
memory managed by the arena, so that’s fine of course.</p>

<p>So addressing pointer provenance also produced a nicer definition. What a
great result from reading that book! While researching, I noticed Jonathan
Müller, who personally gave me great advice and feedback on my previous
article, <a href="https://www.youtube.com/watch?v=oZyhq4D-QL4">talked about lifetimes</a> just a couple weeks later. I
recommend both.</p>

]]>
    </content>
  </entry>
  
  <entry>
    <title>Hierarchical field sort with string interning</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/09/24/"/>
    <id>urn:uuid:30d4b889-d14b-4b32-b389-858fb3dde34b</id>
    <updated>2025-09-24T17:11:32Z</updated>
    <category term="c"/>
    <content type="html">
      <![CDATA[<p>In a recent, real-world problem I needed to load a heterogeneous sequence
of records from a buffer. Record layout is defined in a header before the
sequence. Each field is numeric, with a unique name composed of non-empty
alphanumeric period-delimited segments, where segments signify nested
structure. Field names are a comma-delimited list, in order of the record
layout. The catch motivating this article is that nested structures are
not necessarily contiguous. In my transformed representation I needed
nested structures to be contiguous. For illustrative purposes here, it
will be for JSON output. I came up with what I think is an interesting
solution, which I’ve implemented in C using <a href="/blog/2025/01/19/">techniques previously
discussed</a>.</p>

<p>The above description is probably confusing on its own, and an example is
worth a thousand words, so here’s a listing naming 7 fields:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>timestamp,point.x,point.y,foo.bar.z,point.z,foo.bar.y,foo.bar.x
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">point</code> is a substructure, as are <code class="language-plaintext highlighter-rouge">foo</code> and <code class="language-plaintext highlighter-rouge">bar</code>, but note they’re
interleaved in the record. So if a record contains these values:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{1758158348, 1.23, 4.56, -100, 7.89, -200, -300}
</code></pre></div></div>

<p>The JSON representation would look like:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1758158348</span><span class="p">,</span><span class="w">
  </span><span class="nl">"point"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"x"</span><span class="p">:</span><span class="w"> </span><span class="mf">1.23</span><span class="p">,</span><span class="w">
    </span><span class="nl">"y"</span><span class="p">:</span><span class="w"> </span><span class="mf">4.56</span><span class="p">,</span><span class="w">
    </span><span class="nl">"z"</span><span class="p">:</span><span class="w"> </span><span class="mf">7.89</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"foo"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"bar"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"z"</span><span class="p">:</span><span class="w"> </span><span class="mi">-100</span><span class="p">,</span><span class="w">
      </span><span class="nl">"y"</span><span class="p">:</span><span class="w"> </span><span class="mi">-200</span><span class="p">,</span><span class="w">
      </span><span class="nl">"x"</span><span class="p">:</span><span class="w"> </span><span class="mi">-300</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Notice <code class="language-plaintext highlighter-rouge">point.z</code> moved up and <code class="language-plaintext highlighter-rouge">foo.bar.z</code> down, so that substructures are
contiguous in this representation as required for JSON. Sorting the field
names lexicographically would group them together as a simple solution.
However, as an additional constraint I want to retain the original field
order as much as possible. For example, <code class="language-plaintext highlighter-rouge">timestamp</code> is first in both the
original and JSON representations, but sorting would put it last. If all
substructures are already contiguous, nothing should change.</p>

<h3 id="solution-with-string-interning">Solution with string interning</h3>

<p>My solution is to intern the segment strings, assigning each a unique,
monotonic integral token in the order they’re observed. In my program,
zero is reserved as a special “root” token, and so the first string has
the value 1. The concrete values aren’t important, only that they’re
assigned monotonically.</p>

<p>The trick is that a string is always interned in the “namespace” of a
previous token. That is, we’re building a <code class="language-plaintext highlighter-rouge">(token, string) -&gt; token</code> map.
For our segments that namespace is the token for the parent structure, and
the top-level fields are interned in the reserved “root” namespace. When
applied to the example, we get the token sequences:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>timestamp  -&gt; 1
point.x    -&gt; 2 3
point.y    -&gt; 2 4
foo.bar.z  -&gt; 5 6 7
point.z    -&gt; 2 8
foo.bar.y  -&gt; 5 6 9
foo.bar.x  -&gt; 5 6 10
</code></pre></div></div>

<p>And our map looks like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{0, "timestamp"} -&gt; 1
{0, "point"}     -&gt; 2
{2, "x"}         -&gt; 3
{2, "y"}         -&gt; 4
{0, "foo"}       -&gt; 5
{5, "bar"}       -&gt; 6
{6, "z"}         -&gt; 7
{2, "z"}         -&gt; 8
{6, "y"}         -&gt; 9
{6, "x"}         -&gt; 10
</code></pre></div></div>

<p>Notice how <code class="language-plaintext highlighter-rouge">"x"</code> is assigned 3 and 10 due to different namespaces. That’s
important because otherwise the fields of <code class="language-plaintext highlighter-rouge">foo.bar</code> would sort in the same
order as <code class="language-plaintext highlighter-rouge">point</code>. Namespace gives these fields unique identities.</p>
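
<p>As a rough illustration of the namespace idea, here is a minimal sketch
that reproduces the token sequences above. It assumes a small, fixed-size
table searched linearly, which is only a stand-in for a real map and not
the data structure used later in this article. The names <code class="language-plaintext highlighter-rouge">Entry</code>,
<code class="language-plaintext highlighter-rouge">Table</code>, <code class="language-plaintext highlighter-rouge">intern</code>, and <code class="language-plaintext highlighter-rouge">tokenize</code> are hypothetical.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

// Assumed limits for this sketch only.
enum { MAXENTRIES = 64, MAXDEPTH = 8 };

typedef struct {
    ptrdiff_t   namespace;  // token of the parent structure, 0 for root
    const char *segment;    // points into the caller's string
    ptrdiff_t   len;
    ptrdiff_t   token;
} Entry;

typedef struct {
    Entry     entries[MAXENTRIES];
    ptrdiff_t count;        // also the most recently assigned token
} Table;

// Return the token for (namespace, segment), assigning the next token in
// observation order if this pair has not been seen before.
static ptrdiff_t intern(Table *t, ptrdiff_t namespace, const char *seg, ptrdiff_t len)
{
    for (ptrdiff_t i = 0; i &lt; t-&gt;count; i++) {
        Entry *e = &amp;t-&gt;entries[i];
        if (e-&gt;namespace == namespace &amp;&amp; e-&gt;len == len &amp;&amp;
            !memcmp(e-&gt;segment, seg, (size_t)len)) {
            return e-&gt;token;
        }
    }
    assert(t-&gt;count &lt; MAXENTRIES);  // sketch-only capacity policy
    Entry *e = &amp;t-&gt;entries[t-&gt;count++];
    e-&gt;namespace = namespace;
    e-&gt;segment   = seg;
    e-&gt;len       = len;
    e-&gt;token     = t-&gt;count;  // first token is 1, since 0 is the root
    return e-&gt;token;
}

// Split a field name on '.' and write its token sequence into out (which
// must hold MAXDEPTH tokens). Returns the number of tokens written.
static ptrdiff_t tokenize(Table *t, const char *name, ptrdiff_t *out)
{
    ptrdiff_t n = 0;
    ptrdiff_t namespace = 0;  // top-level fields live in the root namespace
    for (;;) {
        const char *dot = strchr(name, '.');
        ptrdiff_t   len = dot ? dot - name : (ptrdiff_t)strlen(name);
        assert(n &lt; MAXDEPTH);
        namespace = intern(t, namespace, name, len);  // token becomes next namespace
        out[n++] = namespace;
        if (!dot) {
            return n;
        }
        name = dot + 1;
    }
}
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">tokenize</code> on the seven example names in header order, with a
zero-initialized <code class="language-plaintext highlighter-rouge">Table</code>, yields the sequences listed above. The strings
must outlive the table because entries point into them.</p>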

<p>Once we have the token representation, sort lexicographically <em>by token</em>.
That pulls <code class="language-plaintext highlighter-rouge">point.z</code> up to its siblings.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>timestamp  -&gt; 1
point.x    -&gt; 2 3
point.y    -&gt; 2 4
point.z    -&gt; 2 8
foo.bar.z  -&gt; 5 6 7
foo.bar.y  -&gt; 5 6 9
foo.bar.x  -&gt; 5 6 10
</code></pre></div></div>

<p>Now we have the “output” order with minimal re-ordering. If substructures
were already contiguous, nothing changes. Assuming a reasonable map, this
is <code class="language-plaintext highlighter-rouge">O(n log n)</code>, primarily due to sorting.</p>
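
<p>The sort step can be tried in isolation. This sketch hard-codes token
sequences into a hypothetical <code class="language-plaintext highlighter-rouge">Row</code> type with an assumed maximum depth of
four and hands them to libc <code class="language-plaintext highlighter-rouge">qsort</code>. It is an illustration of the ordering
rule, not the sort used in my program.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdlib.h&gt;

typedef struct {
    const char *name;
    ptrdiff_t   tokens[4];  // assumed maximum depth for this sketch
    ptrdiff_t   len;
} Row;

// Lexicographic order on token sequences: the first differing token
// decides, and a sequence sorts before any longer one it prefixes.
static int row_compare(const void *pa, const void *pb)
{
    const Row *a = pa;
    const Row *b = pb;
    ptrdiff_t len = a-&gt;len &lt; b-&gt;len ? a-&gt;len : b-&gt;len;
    for (ptrdiff_t i = 0; i &lt; len; i++) {
        if (a-&gt;tokens[i] != b-&gt;tokens[i]) {
            return a-&gt;tokens[i] &lt; b-&gt;tokens[i] ? -1 : +1;
        }
    }
    return (a-&gt;len &gt; b-&gt;len) - (a-&gt;len &lt; b-&gt;len);
}

static void sort_rows(Row *rows, ptrdiff_t count)
{
    qsort(rows, (size_t)count, sizeof(*rows), row_compare);
}
</code></pre></div></div>

<p>Although <code class="language-plaintext highlighter-rouge">qsort</code> is not guaranteed stable, that does not matter here
because no two fields share a token sequence.</p>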

<h4 id="alternatives">Alternatives</h4>

<p>Before I thought of namespaces, my initial idea was to intern the whole
prefix of a segment. The sequence of look-ups would be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"timestamp"    -&gt; 1  -&gt; {1}
"point"        -&gt; 2
"point.x"      -&gt; 3  -&gt; {2, 3}
"point"        -&gt; 2
"point.y"      -&gt; 4  -&gt; {2, 4}
"foo"          -&gt; 5
"foo.bar"      -&gt; 6
"foo.bar.z"    -&gt; 7  -&gt; {5, 6, 7}
"point"        -&gt; 2
"point.z"      -&gt; 8  -&gt; {2, 8}
"foo"          -&gt; 5
"foo.bar"      -&gt; 6
"foo.bar.y"    -&gt; 9  -&gt; {5, 6, 9}
"foo"          -&gt; 5
"foo.bar"      -&gt; 6
"foo.bar.x"    -&gt; 10 -&gt; {5, 6, 10}
</code></pre></div></div>

<p>Ultimately it produces the same tokens, and this is a more straightforward
<code class="language-plaintext highlighter-rouge">string -&gt; token</code> map. The prefixes are acting as namespaces. However, I
wrote it this way as a kind of visual proof: Notice the right triangle
shape formed by the strings for each field. From the area we can see that
processing prefixes as strings is <code class="language-plaintext highlighter-rouge">O(n^2)</code> quadratic time on the number of
segments! In my real problem the inputs were never large enough for this
to matter, but I hate <a href="https://randomascii.wordpress.com/category/quadratic/">leaving behind avoidable quadratic algorithms</a>.
Using a token as a namespace flattens the prefix to a constant size.</p>
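
<p>To put numbers on that triangle, this small sketch counts how many bytes
each approach would have to hash for one field name. The functions
<code class="language-plaintext highlighter-rouge">prefix_cost</code> and <code class="language-plaintext highlighter-rouge">segment_cost</code> are hypothetical helpers for illustration
and count only string bytes, ignoring the constant-size namespace token.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

// Bytes hashed when every prefix of the name is interned as a string:
// "foo", then "foo.bar", then "foo.bar.z".
static ptrdiff_t prefix_cost(const char *name)
{
    ptrdiff_t total = 0;
    ptrdiff_t len   = (ptrdiff_t)strlen(name);
    for (ptrdiff_t i = 0; i &lt;= len; i++) {
        if (i == len || name[i] == '.') {
            total += i;  // the prefix ending here is i bytes long
        }
    }
    return total;
}

// Bytes hashed when each segment is interned by itself within a
// namespace token: "foo", then "bar", then "z".
static ptrdiff_t segment_cost(const char *name)
{
    ptrdiff_t total = 0;
    for (; *name; name++) {
        total += *name != '.';  // every byte except the delimiters
    }
    return total;
}
</code></pre></div></div>

<p>For <code class="language-plaintext highlighter-rouge">foo.bar.z</code> that is 19 bytes by prefix versus 7 by segment, and the
gap widens as nesting gets deeper.</p>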

<p>Another option is a different map for each namespace. So for <code class="language-plaintext highlighter-rouge">foo.bar.z</code>
look up the <code class="language-plaintext highlighter-rouge">"foo"</code> map <code class="language-plaintext highlighter-rouge">(string -&gt; map)</code> in the root <code class="language-plaintext highlighter-rouge">(string -&gt; map)</code>,
then within that look up the <code class="language-plaintext highlighter-rouge">"bar"</code> table <code class="language-plaintext highlighter-rouge">(string -&gt; token)</code> (since this
is the penultimate segment), then intern <code class="language-plaintext highlighter-rouge">"z"</code> within that to get its
token. That wouldn’t have quadratic time complexity, but it seems quite a
bit more complicated than a single, flat <code class="language-plaintext highlighter-rouge">(token, string) -&gt; token</code> map.</p>

<h3 id="implementation-in-c">Implementation in C</h3>

<p>Because <a href="/blog/2023/02/11/">the standard library has little useful for us</a>, I am
building on <a href="/blog/2025/01/19/"><strong>previously-established definitions</strong></a>, so refer to
that article for basic definitions like <code class="language-plaintext highlighter-rouge">Str</code>. To start off, tokens will
be a size-typed integer so we never need to worry about overflowing the
token counter. We’d run out of memory first:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="kt">ptrdiff_t</span> <span class="n">Token</span><span class="p">;</span>
</code></pre></div></div>

<p>We’re building a <code class="language-plaintext highlighter-rouge">(token, string) -&gt; token</code> map, so we’ll need a hash
function for such keys:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">hash</span><span class="p">(</span><span class="n">Token</span> <span class="n">t</span><span class="p">,</span> <span class="n">Str</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">t</span> <span class="o">&lt;&lt;</span> <span class="mi">8</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">r</span> <span class="o">^=</span> <span class="n">s</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
        <span class="n">r</span> <span class="o">*=</span> <span class="mi">1111111111111111111u</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The map itself is a forever-useful <a href="/blog/2023/09/30/">hash trie</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">Map</span> <span class="n">Map</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">Map</span> <span class="p">{</span>
    <span class="n">Map</span>  <span class="o">*</span><span class="n">child</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
    <span class="n">Token</span> <span class="n">namespace</span><span class="p">;</span>
    <span class="n">Str</span>   <span class="n">segment</span><span class="p">;</span>
    <span class="n">Token</span> <span class="n">token</span><span class="p">;</span>
<span class="p">};</span>

<span class="n">Token</span> <span class="o">*</span><span class="nf">upsert</span><span class="p">(</span><span class="n">Map</span> <span class="o">**</span><span class="n">m</span><span class="p">,</span> <span class="n">Token</span> <span class="n">namespace</span><span class="p">,</span> <span class="n">Str</span> <span class="n">segment</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">h</span> <span class="o">=</span> <span class="n">hash</span><span class="p">(</span><span class="n">namespace</span><span class="p">,</span> <span class="n">segment</span><span class="p">);</span> <span class="o">*</span><span class="n">m</span><span class="p">;</span> <span class="n">h</span> <span class="o">&lt;&lt;=</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">namespace</span><span class="o">==</span><span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">namespace</span> <span class="o">&amp;&amp;</span> <span class="n">equals</span><span class="p">(</span><span class="n">segment</span><span class="p">,</span> <span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">segment</span><span class="p">))</span> <span class="p">{</span>
            <span class="k">return</span> <span class="o">&amp;</span><span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">token</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">m</span> <span class="o">=</span> <span class="o">&amp;</span><span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">child</span><span class="p">[</span><span class="n">h</span><span class="o">&gt;&gt;</span><span class="mi">62</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="o">*</span><span class="n">m</span> <span class="o">=</span> <span class="n">new</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">Map</span><span class="p">);</span>
    <span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">namespace</span> <span class="o">=</span> <span class="n">namespace</span><span class="p">;</span>
    <span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">segment</span> <span class="o">=</span> <span class="n">segment</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">&amp;</span><span class="p">(</span><span class="o">*</span><span class="n">m</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">token</span><span class="p">;</span>  <span class="c1">// caller will assign</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We’ll use this map to convert a string naming a field into a sequence of
tokens, so we’ll <a href="/blog/2025/06/26/">need a slice</a>. Each field also has an offset
within the record and a type, both determined by its position in the
original ordering, so I’ll track that position with an <code class="language-plaintext highlighter-rouge">index</code> field
(e.g. into the original header). Also track the original name.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">Str</span>          <span class="n">name</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span>    <span class="n">index</span><span class="p">;</span>
    <span class="n">Slice</span><span class="p">(</span><span class="n">Token</span><span class="p">)</span> <span class="n">tokens</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Field</span><span class="p">;</span>
</code></pre></div></div>

<p>To sort fields we’ll need a comparator:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">ptrdiff_t</span> <span class="nf">field_compare</span><span class="p">(</span><span class="n">Field</span> <span class="n">a</span><span class="p">,</span> <span class="n">Field</span> <span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">Token</span> <span class="n">d</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">tokens</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">b</span><span class="p">.</span><span class="n">tokens</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">d</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">return</span> <span class="n">d</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">a</span><span class="p">.</span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span> <span class="o">-</span> <span class="n">b</span><span class="p">.</span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Because field names are unique, each token sequence is unique, and so we
need not use <code class="language-plaintext highlighter-rouge">index</code> in the comparator.</p>
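
<p>If the sort ends up delegating to libc <code class="language-plaintext highlighter-rouge">qsort</code>, the comparator needs a
thin adapter because <code class="language-plaintext highlighter-rouge">qsort</code> expects an <code class="language-plaintext highlighter-rouge">int</code> result from two <code class="language-plaintext highlighter-rouge">const void *</code>
arguments. One possible shape is sketched here. The <code class="language-plaintext highlighter-rouge">Tokens</code> and <code class="language-plaintext highlighter-rouge">Field</code>
types are simplified stand-ins for the definitions above, and
<code class="language-plaintext highlighter-rouge">field_compare_adapter</code> and <code class="language-plaintext highlighter-rouge">field_sort_qsort</code> are hypothetical names.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdlib.h&gt;

typedef ptrdiff_t Token;

// Simplified stand-ins for Slice(Token) and Field (name omitted).
typedef struct { Token *data; ptrdiff_t len; } Tokens;
typedef struct { ptrdiff_t index; Tokens tokens; } Field;

// Same logic as the comparator above, repeated so the sketch compiles.
static ptrdiff_t field_compare(Field a, Field b)
{
    ptrdiff_t len = a.tokens.len &lt; b.tokens.len ? a.tokens.len : b.tokens.len;
    for (ptrdiff_t i = 0; i &lt; len; i++) {
        Token d = a.tokens.data[i] - b.tokens.data[i];
        if (d) {
            return d;
        }
    }
    return a.tokens.len - b.tokens.len;
}

// qsort wants an int. Narrowing a wide difference directly to int may
// not preserve its sign, so reduce it to -1, 0, or +1 first.
static int field_compare_adapter(const void *pa, const void *pb)
{
    ptrdiff_t d = field_compare(*(const Field *)pa, *(const Field *)pb);
    return (d &gt; 0) - (d &lt; 0);
}

static void field_sort_qsort(Field *fields, ptrdiff_t count)
{
    qsort(fields, (size_t)count, sizeof(*fields), field_compare_adapter);
}
</code></pre></div></div>

<p>With token counts this small the narrowing would be harmless in practice,
but returning only the sign costs nothing and removes the question.</p>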

<p>Finally down to business: <a href="/blog/2025/03/02/">cut up the list</a> and build the token
sequences with the established <code class="language-plaintext highlighter-rouge">push</code> macro. The sort function isn’t
interesting, and could be as simple as libc <code class="language-plaintext highlighter-rouge">qsort</code> with the above
comparator (and adapter), so I’m only listing the prototype.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">field_sort</span><span class="p">(</span><span class="n">Slice</span><span class="p">(</span><span class="n">Field</span><span class="p">),</span> <span class="n">Arena</span> <span class="n">scratch</span><span class="p">);</span>

<span class="n">Slice</span><span class="p">(</span><span class="n">Field</span><span class="p">)</span> <span class="n">parse_fields</span><span class="p">(</span><span class="n">Str</span> <span class="n">fieldlist</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Slice</span><span class="p">(</span><span class="n">Field</span><span class="p">)</span> <span class="n">fields</span>  <span class="o">=</span> <span class="p">{};</span>
    <span class="n">Map</span>         <span class="o">*</span><span class="n">strtab</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span>    <span class="n">ntokens</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="k">for</span> <span class="p">(</span><span class="n">Cut</span> <span class="n">c</span> <span class="o">=</span> <span class="p">{.</span><span class="n">tail</span><span class="o">=</span><span class="n">fieldlist</span><span class="p">,</span> <span class="p">.</span><span class="n">ok</span><span class="o">=</span><span class="nb">true</span><span class="p">};</span> <span class="n">c</span><span class="p">.</span><span class="n">ok</span><span class="p">;)</span> <span class="p">{</span>
        <span class="n">c</span> <span class="o">=</span> <span class="n">cut</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">tail</span><span class="p">,</span> <span class="sc">','</span><span class="p">);</span>
        <span class="n">Field</span> <span class="n">field</span> <span class="o">=</span> <span class="p">{};</span>
        <span class="n">field</span><span class="p">.</span><span class="n">name</span>  <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">head</span><span class="p">;</span>
        <span class="n">field</span><span class="p">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">fields</span><span class="p">.</span><span class="n">len</span><span class="p">;</span>

        <span class="n">Token</span> <span class="n">prev</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">Cut</span> <span class="n">f</span> <span class="o">=</span> <span class="p">{.</span><span class="n">tail</span><span class="o">=</span><span class="n">field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="p">.</span><span class="n">ok</span><span class="o">=</span><span class="nb">true</span><span class="p">};</span> <span class="n">f</span><span class="p">.</span><span class="n">ok</span><span class="p">;)</span> <span class="p">{</span>
            <span class="n">f</span> <span class="o">=</span> <span class="n">cut</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">tail</span><span class="p">,</span> <span class="sc">'.'</span><span class="p">);</span>
            <span class="n">Token</span> <span class="o">*</span><span class="n">token</span> <span class="o">=</span> <span class="n">upsert</span><span class="p">(</span><span class="o">&amp;</span><span class="n">strtab</span><span class="p">,</span> <span class="n">prev</span><span class="p">,</span> <span class="n">f</span><span class="p">.</span><span class="n">head</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">!*</span><span class="n">token</span><span class="p">)</span> <span class="p">{</span>
                <span class="o">*</span><span class="n">token</span> <span class="o">=</span> <span class="o">++</span><span class="n">ntokens</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="o">*</span><span class="n">push</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">field</span><span class="p">.</span><span class="n">tokens</span><span class="p">)</span> <span class="o">=</span> <span class="o">*</span><span class="n">token</span><span class="p">;</span>
            <span class="n">prev</span> <span class="o">=</span> <span class="o">*</span><span class="n">token</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="o">*</span><span class="n">push</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">fields</span><span class="p">)</span> <span class="o">=</span> <span class="n">field</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">field_sort</span><span class="p">(</span><span class="n">fields</span><span class="p">,</span> <span class="o">*</span><span class="n">a</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">fields</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
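
<p>For the sort left as a prototype, here is a minimal sketch of the libc
<code class="language-plaintext highlighter-rouge">qsort</code> route with an adapter. The type definitions are stand-ins that I
assume resemble the ones used in this article, and the names <code class="language-plaintext highlighter-rouge">field_compare</code>,
<code class="language-plaintext highlighter-rouge">field_compare_adapter</code>, and <code class="language-plaintext highlighter-rouge">field_sort_qsort</code> are hypothetical. It takes a
pointer and length instead of a slice, and it has no use for a scratch
arena.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Stand-in types, assumed to resemble the article's definitions.
typedef int Token;
typedef struct { Token *data; ptrdiff_t len; } Tokens;
typedef struct { const char *name; ptrdiff_t index; Tokens tokens; } Field;

// Lexicographic order over token sequences; a shorter prefix sorts first.
static int field_compare(Field a, Field b)
{
    ptrdiff_t len = a.tokens.len < b.tokens.len ? a.tokens.len : b.tokens.len;
    for (ptrdiff_t i = 0; i < len; i++) {
        Token x = a.tokens.data[i];
        Token y = b.tokens.data[i];
        if (x != y) {
            return x < y ? -1 : +1;
        }
    }
    return (a.tokens.len > b.tokens.len) - (a.tokens.len < b.tokens.len);
}

// Adapter: qsort hands the comparator pointers to elements.
static int field_compare_adapter(const void *a, const void *b)
{
    return field_compare(*(const Field *)a, *(const Field *)b);
}

// Hypothetical qsort-backed sort over a plain array of fields.
static void field_sort_qsort(Field *fields, ptrdiff_t len)
{
    assert(len >= 0);
    if (len > 1) {  // also avoids calling qsort on an empty, null array
        qsort(fields, (size_t)len, sizeof(*fields), field_compare_adapter);
    }
}
```

<p>The comparator returns only the sign rather than a raw difference, which
sidesteps any narrowing when the result is converted to <code class="language-plaintext highlighter-rouge">int</code> for
<code class="language-plaintext highlighter-rouge">qsort</code>. Since <code class="language-plaintext highlighter-rouge">qsort</code> is not guaranteed stable, this leans on the token
sequences being unique.</p>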

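<p>The loops in <code class="language-plaintext highlighter-rouge">parse_fields</code> suggest that <code class="language-plaintext highlighter-rouge">Cut::ok</code> should be inverted to
<code class="language-plaintext highlighter-rouge">Cut::done</code> so that it better zero-initializes: the loop initializers would
no longer need <code class="language-plaintext highlighter-rouge">.ok=true</code>. Something I’ll need to consider. Because it’s all
allocated from an arena, there’s no need for destructors or anything like
that, so <code class="language-plaintext highlighter-rouge">parse_fields</code> is the complete implementation. Back to the
example:</p>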

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">Str</span> <span class="n">fieldlist</span> <span class="o">=</span> <span class="n">S</span><span class="p">(</span>
        <span class="s">"timestamp,"</span>
        <span class="s">"point.x,"</span>
        <span class="s">"point.y,"</span>
        <span class="s">"foo.bar.z,"</span>
        <span class="s">"point.z,"</span>
        <span class="s">"foo.bar.y,"</span>
        <span class="s">"foo.bar.x"</span>
    <span class="p">);</span>
    <span class="n">Slice</span><span class="p">(</span><span class="n">Field</span><span class="p">)</span> <span class="n">fields</span> <span class="o">=</span> <span class="n">parse_fields</span><span class="p">(</span><span class="n">fieldlist</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">fields</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">Str</span> <span class="n">name</span> <span class="o">=</span> <span class="n">fields</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">name</span><span class="p">;</span>
        <span class="n">fwrite</span><span class="p">(</span><span class="n">name</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">name</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>
        <span class="n">putchar</span><span class="p">(</span><span class="sc">'\n'</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

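<p>This program prints the fields in their proper output order:
<code class="language-plaintext highlighter-rouge">timestamp</code>, <code class="language-plaintext highlighter-rouge">point.x</code>, <code class="language-plaintext highlighter-rouge">point.y</code>, <code class="language-plaintext highlighter-rouge">point.z</code>, <code class="language-plaintext highlighter-rouge">foo.bar.z</code>,
<code class="language-plaintext highlighter-rouge">foo.bar.y</code>, <code class="language-plaintext highlighter-rouge">foo.bar.x</code>. In a real program
we’d hold onto the string table, define an inverse lookup to translate
tokens back into strings, and use it when producing output. I do just
that in my exploratory program, <a href="https://github.com/skeeto/scratch/blob/master/rec2json/rec2json.c"><strong><code class="language-plaintext highlighter-rouge">rec2json.c</code></strong></a>, written a little
differently than presented above. It uses the sorted tokens to compile a
simple bytecode program that, when run against a record, produces its JSON
representation. It compiles the example to:</p>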

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OPEN          # print '{'
KEY     1     # print token 1 as a key, i.e. "timestamp":
READ    0     # print double at record offset 0
COMMA         # print ','
KEY     2     # print token 2 as a key, i.e. "point":
OPEN
KEY     3
READ    8     # print double at record offset 8
COMMA
KEY     4
READ    16
COMMA
KEY     8
READ    32
CLOSE         # print '}'
COMMA
KEY     5
OPEN
KEY     6
OPEN
KEY     7
READ    24
COMMA
KEY     9
READ    40
COMMA
KEY     10
READ    48
CLOSE
CLOSE
CLOSE
</code></pre></div></div>

<p>Seeing it written out, I notice more room for improvement. An optimization
pass could coalesce instructions so that, for instance, <code class="language-plaintext highlighter-rouge">OPEN</code> then <code class="language-plaintext highlighter-rouge">KEY</code>
<a href="/blog/2024/05/25/">concatenate</a> to a single string at compile time so that it only needs
one instruction. Merging each run of instructions around the <code class="language-plaintext highlighter-rouge">READ</code>s this
way leaves 8 string instructions plus the 7 <code class="language-plaintext highlighter-rouge">READ</code>s, so this program could
be 15 instructions instead of 31. In
my real case I didn’t need anything quite this sophisticated, but it was
fun to explore.</p>
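
<p>As a rough sketch of that coalescing pass, and not how <code class="language-plaintext highlighter-rouge">rec2json.c</code> does
it, the following merges each run of non-<code class="language-plaintext highlighter-rouge">READ</code> instructions into one
literal string and then runs the result against a record. The <code class="language-plaintext highlighter-rouge">Insn</code> and
<code class="language-plaintext highlighter-rouge">Fused</code> types, the fixed 32-byte literal buffer, and the hard-coded <code class="language-plaintext highlighter-rouge">names</code>
table are all assumptions made for illustration.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Opcodes as they appear in the listing above.
typedef enum { OPEN, CLOSE, COMMA, KEY, READ } Op;
typedef struct { Op op; int arg; } Insn;

// Hypothetical coalesced form: literal text, or a double load at an offset.
typedef struct { int is_read; int offset; char text[32]; } Fused;

// Token-to-string table for the example; index 0 is unused.
static const char *const names[] = {
    0, "timestamp", "point", "x", "y", "foo", "bar", "z", "z", "y", "x",
};

// The 31-instruction program from the listing.
static const Insn example[] = {
    {OPEN, 0}, {KEY, 1}, {READ, 0}, {COMMA, 0},
    {KEY, 2}, {OPEN, 0},
    {KEY, 3}, {READ, 8}, {COMMA, 0},
    {KEY, 4}, {READ, 16}, {COMMA, 0},
    {KEY, 8}, {READ, 32}, {CLOSE, 0}, {COMMA, 0},
    {KEY, 5}, {OPEN, 0}, {KEY, 6}, {OPEN, 0},
    {KEY, 7}, {READ, 24}, {COMMA, 0},
    {KEY, 9}, {READ, 40}, {COMMA, 0},
    {KEY, 10}, {READ, 48}, {CLOSE, 0}, {CLOSE, 0}, {CLOSE, 0},
};

// Merge every run of OPEN/CLOSE/COMMA/KEY into one literal; copy READs as-is.
// dst needs room for n entries. Returns the coalesced instruction count.
static int coalesce(const Insn *src, int n, Fused *dst)
{
    int len = 0;
    for (int i = 0; i < n; i++) {
        if (src[i].op == READ) {
            dst[len++] = (Fused){.is_read = 1, .offset = src[i].arg};
            continue;
        }
        if (len == 0 || dst[len-1].is_read) {
            dst[len++] = (Fused){0};  // start a new, empty literal
        }
        size_t used = strlen(dst[len-1].text);
        char  *end  = dst[len-1].text + used;
        size_t cap  = sizeof(dst->text) - used;
        int    r    = 0;
        switch (src[i].op) {
        case OPEN:  r = snprintf(end, cap, "{"); break;
        case CLOSE: r = snprintf(end, cap, "}"); break;
        case COMMA: r = snprintf(end, cap, ","); break;
        default:    r = snprintf(end, cap, "\"%s\":", names[src[i].arg]); break;
        }
        assert(r >= 0 && (size_t)r < cap);  // literal must fit in text[]
    }
    return len;
}

// Run the coalesced program against a record of doubles, writing JSON to out.
static void run(const Fused *prog, int len, const void *record,
                char *out, size_t cap)
{
    assert(cap > 0);
    out[0] = 0;
    size_t used = 0;
    for (int i = 0; i < len; i++) {
        int r;
        if (prog[i].is_read) {
            double v;
            memcpy(&v, (const char *)record + prog[i].offset, sizeof(v));
            r = snprintf(out+used, cap-used, "%.17g", v);
        } else {
            r = snprintf(out+used, cap-used, "%s", prog[i].text);
        }
        assert(r >= 0 && (size_t)r < cap-used);  // output must fit
        used += (size_t)r;
    }
}
```

<p>Passing <code class="language-plaintext highlighter-rouge">example</code> through <code class="language-plaintext highlighter-rouge">coalesce</code> yields 15 instructions: 8 literals and
7 loads. A real version would size the literals dynamically, probably out
of the arena, rather than trusting a fixed buffer.</p>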

]]>
    </content>
  </entry>
  

</feed>
