<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged cpp at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/cpp/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/cpp/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:bdc867e4-b8f7-4cf0-8437-adafc8297129</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  
    
  <entry>
    <title>dcmake: a new CMake debugger UI</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2026/04/07/"/>
    <id>urn:uuid:eb448519-0a55-4c1c-bc55-17a65634224f</id>
    <updated>2026-04-07T03:04:02Z</updated>
    <category term="cpp"/>
    <content type="html">
      <![CDATA[<p>CMake has had a <a href="https://cmake.org/cmake/help/latest/manual/cmake.1.html#cmdoption-cmake-debugger"><code class="language-plaintext highlighter-rouge">--debugger</code> mode</a> since <a href="https://cmake.org/cmake/help/latest/release/3.27.html#debugger">3.27</a> (July 2023),
allowing software to manipulate it interactively through the <a href="https://microsoft.github.io/debug-adapter-protocol/">Debug
Adapter Protocol</a> (DAP), an HTTP-like protocol passing JSON messages.
Debugger front-ends can start, stop, step, breakpoint, query variables,
etc. a live CMake. When I came across this mode, I immediately conceived a
project putting it to use. Thanks to <a href="/blog/2026/03/29/">recent leaps in software engineering
productivity</a>, I had a working prototype in 30 minutes, and by the
end of that same day, a complete, multi-platform, native, GUI application.
I named it <strong><a href="https://github.com/skeeto/dcmake">dcmake</a></strong> (“debugger for CMake”). I’ve tested it on macOS,
Windows, and Linux. Despite being only a couple of days old, it’s one of the
coolest things I’ve ever built. Prior to 2026, I estimate it would have
taken me a month to get the tool to this point.</p>
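
<p>To make “HTTP-like” concrete: in the DAP base protocol, each message is a
<code class="language-plaintext highlighter-rouge">Content-Length</code> header, a blank line, then a JSON body of exactly that
many bytes. The following C++ is only a rough sketch of that framing, not
code from dcmake. The names <code class="language-plaintext highlighter-rouge">dap_frame</code> and <code class="language-plaintext highlighter-rouge">dap_body</code> are made up for
illustration, and the parser assumes one complete message with a single
header and a length small enough not to overflow.</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Frame a JSON body: "Content-Length: N\r\n\r\n" followed by N bytes.
std::string dap_frame(std::string_view json)
{
    std::string msg = "Content-Length: " + std::to_string(json.size());
    msg += "\r\n\r\n";
    msg += json;
    return msg;
}

// Return the body of one framed message, or an empty view if the
// header is malformed or the body is shorter than advertised.
std::string_view dap_body(std::string_view msg)
{
    constexpr std::string_view key = "Content-Length: ";
    if (msg.substr(0, key.size()) != key) {
        return {};
    }
    std::size_t eoh = msg.find("\r\n\r\n");  // end of header
    if (eoh == std::string_view::npos) {
        return {};
    }
    std::size_t len = 0;
    for (char c : msg.substr(key.size(), eoh - key.size())) {
        if (c < '0' || c > '9') {
            return {};
        }
        len = len*10 + (c - '0');
    }
    std::string_view rest = msg.substr(eoh + 4);
    return len <= rest.size() ? rest.substr(0, len) : std::string_view{};
}
```

<p>A front-end stepping over a line would frame a <code class="language-plaintext highlighter-rouge">next</code> request this way
and send it over its connection to CMake, then read framed responses and
events back with the inverse operation.</p>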

<p><a href="/img/dcmake/dcmake.png"><img src="/img/dcmake/dcmake-thumb.png" alt="" /></a></p>

<p>It has a <a href="https://github.com/ocornut/imgui">Dear ImGui</a> interface, which I’ve experienced as a user but
never built on myself before. Specifically the <a href="https://github.com/ocornut/imgui/wiki/Docking">docking branch</a>. In a
sense it’s a toolkit for building debuggers, so it’s playing an enormous
role in how quickly I put this project together. All of the “windows” tear
out and may be free-floating or docked wherever you like, closely matching
the classic Visual Studio UI. I borrowed all the same keybindings: F10 to
step over, F11 to step in, F5 to start/continue, shift+F5 to stop. Click
on line numbers to toggle breakpoints, right click to run-to-line, hover
over variables with the mouse to see their values. Nearly every UI
state persists across sessions, and it opens nearly instantly.</p>

<video src="/vid/dcmake.mp4" loop="" muted="" autoplay=""></video>

<p>This is just one of many situations where I’ve used AI over the past month for UI
development, and it’s been shockingly effective. I can describe roughly
the interface I want, and the AI makes it happen in a matter of minutes.
It understands what I mean, filling in the details, sometimes anticipating
what I’ll ask for next. If I’m unsure how I want a UI to work, it also
offers good advice. If I need simple icons and such, it can draw those,
too. It’s all incredibly empowering.</p>

<p>On macOS and Linux it runs on top of GLFW with OpenGL 3 rendering, and on
Windows it uses native Win32 windowing and DirectX 11 rendering.</p>

<p>Program arguments given to dcmake populate the top-left arguments text
input, which go straight into CMake on start. So you can prepend <code class="language-plaintext highlighter-rouge">d</code> to
your CMake configuration command to run it inside the debugger. Passing no
arguments sets it up for “standard” <code class="language-plaintext highlighter-rouge">-B build</code> configuration.</p>

<p>In general, if you don’t have anywhere in particular to look, likely the
first thing to do after starting dcmake (in a project) is press F10. It
starts CMake paused on the first line of <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code>, or whatever
script you’re debugging. If you’re trying out dcmake for the first time,
that’s a good place to start. Keep pressing F10 to step through that
script, watching it run through its configuration. If you F11 through the
script then you’ll dive deeper and deeper into CMake itself, which can be
insightful.</p>

<p>There is no point in trying to debug <code class="language-plaintext highlighter-rouge">--build</code> invocations. It’s just a
uniform interface to the underlying build tool, and there is no CMake left
to debug at that point. However, it <em>does</em> work with <code class="language-plaintext highlighter-rouge">-P</code> <a href="https://cmake.org/cmake/help/latest/manual/cmake.1.html#cmdoption-cmake-P">script mode</a>
invocations. CMake can operate as a <a href="https://claude.ai/public/artifacts/06b50c8f-ff71-4562-8ab5-80adaddff9b7">platform-agnostic shell script-like
tool</a>, but unlike shell scripts you can step through them with a
debugger like dcmake.</p>

<p>On Windows it supports Unicode paths all the way through, without <a href="/blog/2021/12/30/">a UTF-8
manifest</a>. This took some <a href="/blog/2022/02/18/">special care</a>, in particular
avoiding any C++ standard library I/O functionality. Current frontier AI
cannot handle this detail on their own. The macOS platform required a bit
of Objective-C, as it often does, and I’m happy I didn’t have to figure
that part out myself.</p>

<p>The next release of <a href="https://github.com/skeeto/w64devkit">w64devkit</a> will include dcmake, complementing its
recent addition of CMake. This new tool has already proven useful in its
own development.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>2026 has been the most pivotal year in my career… and it's only March</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2026/03/29/"/>
    <id>urn:uuid:91d679b3-4f07-4b61-b359-5890695ad621</id>
    <updated>2026-03-29T21:38:22Z</updated>
    <category term="ai"/><category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>In February I left my employer after nearly two decades of service. In the
moment I was optimistic, yet unsure I made the right choice. Dust settled,
I’m now absolutely sure I chose correctly. I’m happier and better for it.
There were multiple factors, but it’s not mere chance it coincides with
these early months of <a href="https://shumer.dev/something-big-is-happening">the automation of software engineering</a>. I
left an employer that is <em>years behind</em> in adopting AI for one actively
supporting and encouraging it. As of March, in my professional capacity
<strong>I no longer write code myself</strong>. My current situation was unimaginable
to me only a year ago. Like it or not, this is the future of software
engineering. Turns out I like it, and having tasted the future I don’t
want to go back to the old ways.</p>

<p>In case you’re worried, this is still me. These are my own words. <a href="https://paulgraham.com/writes.html">Writing
is thinking</a>, and it would defeat the purpose for an AI to write
in my place on my personal blog. That’s not going to change.</p>

<p>I still spend much time reading and understanding code, and using most of
the same development tools. It’s more like being a manager, orchestrating
a nebulous team of inhumanly-fast, nameless assistants. Instead of dicing
the vegetables, I conjure a helper to do it while I continue to run the
kitchen. I haven’t managed people in some 20 years now, but I can feel
those old muscles being put to use again as I improve at this new role.
Will these kitchens still need human chefs like me by the end of the
decade? Unclear, and it’s something we all need to prepare for.</p>

<p>My situation gave me an experience onboarding with AI assistance — a fast
process given a near-instant, infinitely-patient helper answering any
question about the code. By the second week I was making substantial, wide
contributions to the large C++ code base. It’s difficult to attach a
quantifiable factor like 2x, 5x, 10x, etc. faster, but I can say for
certain this wouldn’t have been possible without AI. The bottlenecks have
shifted from producing code, which now takes relatively no time at all, to
other points, and we’re all still trying to figure it out.</p>

<p>My personal programming has transformed as well. Everything <a href="/blog/2024/11/10/">I said about
AI in late 2024</a> is, as I predicted, utterly obsolete. There’s a
huge, growing gap between open weight models and the frontier. Models you
can run yourself are toys. In general, almost any AI product or service
worth your attention costs money. The free stuff is, at minimum, months
behind. Most people only use limited, free services, so there’s a broad
unawareness of just how far AI has advanced. AI is <em>now highly skilled at
programming</em>, and better than me at almost every programming task, with
inhumanly-low defect rates. The remaining issues are mainly steering
problems: If AI code doesn’t do what I need, likely the AI writing it
didn’t understand what I needed.</p>

<p>I’ll still write code myself from time to time for fun — <a href="/blog/2018/06/10/">minimalist</a>,
with my <a href="/blog/2023/10/08/">style</a> and <a href="/blog/2025/01/19/">techniques</a> — the same way I play <a href="https://en.wikipedia.org/wiki/Shogi">shogi</a> on
the weekends for fun. However, artisan production is uneconomical in the
presence of industrialization. AI makes programming so cheap that only the
rich will write code by hand.</p>

<p>A small part of me is sad at what is lost. A bigger part is excited about
the possibilities of the future. I’ve always had more ideas than time or
energy to pursue them. With AI at my command, the problem changes shape. I
can comfortably take on complexity from which I previously shied away, and
I can take a shot at any idea sufficiently formed in my mind to prompt an
AI — a whole skill of its own that I’m actively developing.</p>

<p>For instance, a couple weeks ago I <a href="https://github.com/skeeto/w64devkit/pull/357">put AI to work on a problem</a>,
and it produced a working solution for me after ~12 hours of continuous,
autonomous work, literally while I slept. The past month <a href="https://github.com/skeeto/w64devkit">w64devkit</a> has
burst with activity, almost entirely AI-driven. Some of it architectural
changes I’ve wanted for years, but would require hours of tedious work,
and so I never got around to it. AI knocked it out in minutes, with the
new architecture opening new opportunities. It’s also taken on most of the
cognitive load of maintenance.</p>

<h3 id="quiltcpp">Quilt.cpp</h3>

<p>So far my biggest successful undertaking is <strong><a href="https://github.com/skeeto/quilt.cpp">Quilt.cpp</a></strong>, a C++
clone of <a href="https://savannah.nongnu.org/projects/quilt">Quilt</a>, an early, actively-used source control system for
patch management. Git is a glaring omission from the <a href="/blog/2020/09/25/">almost</a> complete
w64devkit, due to platform and build issues. I’ve thought Quilt could fill
<em>some</em> of that source control hole, except the original is written in
Bash, Perl, and GNU Coreutils — even more of a challenge than Git. Since
Quilt is conceptually simple, and I could lean on <a href="https://frippery.org/busybox/">busybox-w32</a> <code class="language-plaintext highlighter-rouge">diff</code>
and <code class="language-plaintext highlighter-rouge">patch</code>, I’ve considered writing my own implementation, just <a href="/blog/2023/01/18/">as I did
pkg-config</a>, but I never found the energy to do it.</p>

<p>Then I got good enough with AI to knock out a near feature-complete clone
in about four days, including a built-in <code class="language-plaintext highlighter-rouge">diff</code> and <code class="language-plaintext highlighter-rouge">patch</code> so it doesn’t
actually depend on external tools (except invoking <code class="language-plaintext highlighter-rouge">$EDITOR</code>). On Windows
it’s a ~1.6MB standalone EXE, to be included in future w64devkit releases.
The source is distributed as an amalgamation, a single file <code class="language-plaintext highlighter-rouge">quilt.cpp</code>
per its namesake:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ c++ -std=c++20 -O2 -s -o quilt.exe quilt.cpp
$ ./quilt.exe --help
Usage: quilt [--quiltrc file] &lt;command&gt; [options] [args]

Commands:
  new        Create a new empty patch
  add        Add files to the topmost patch
  push       Apply patches to the source tree
  pop        Remove applied patches from the stack
  refresh    Regenerate a patch from working tree changes
  diff       Show the diff of the topmost or a specified patch
  series     List all patches in the series
  applied    List applied patches
  unapplied  List patches not yet applied
  top        Show the topmost applied patch
  next       Show the next patch after the top or a given patch
  previous   Show the patch before the top or a given patch
  delete     Remove a patch from the series
  rename     Rename a patch
  import     Import an external patch into the series
  header     Print or modify a patch header
  files      List files modified by a patch
  patches    List patches that modify a given file
  edit       Add files to the topmost patch and open an editor
  revert     Discard working tree changes to files in a patch
  remove     Remove files from the topmost patch
  fold       Fold a diff from stdin into the topmost patch
  fork       Create a copy of the topmost patch under a new name
  annotate   Show which patch modified each line of a file
  graph      Print a dot dependency graph of applied patches
  mail       Generate an mbox file from a range of patches
  grep       Search source files (not implemented)
  setup      Set up a source tree from a series file (not implemented)
  shell      Open a subshell (not implemented)
  snapshot   Save a snapshot of the working tree for later diff
  upgrade    Upgrade quilt metadata to the current format
  init       Initialize quilt metadata in the current directory

Use "quilt &lt;command&gt; --help" for details on a specific command.
</code></pre></div></div>

<p>It supports Windows and POSIX, and runs ~5x faster than the original. AI
developed it on Windows, Linux, and macOS: It’s best when the AI can close
the debug loop and tackle problems autonomously without involving a human
slowpoke. The handful of “not implemented” parts aren’t because they’re
too hard — each would probably take an AI ~10 minutes — but deliberate
decisions of taste.</p>

<p>There’s an irony that the reason I could produce Quilt.cpp with such ease
is also a reason I don’t really need it anymore.</p>

<p>I changed the output of <code class="language-plaintext highlighter-rouge">quilt mail</code> to be more Git-compatible. The mbox
produced by Quilt.cpp can be imported into Git with a plain <code class="language-plaintext highlighter-rouge">git am</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ quilt mail --mbox feature-branch.mbox
$ git am feature-branch.mbox
</code></pre></div></div>

<p>The idea being that I could work on a machine without Git (e.g. Windows
XP), and copy/mail the mbox to another machine where Git can absorb it as
though it were in Git the whole time. <code class="language-plaintext highlighter-rouge">git format-patch</code> to <code class="language-plaintext highlighter-rouge">quilt import</code>
sends commits in the opposite direction, useful for manually testing
Quilt.cpp on real change sets.</p>

<p>To be clear, I could not have done this if the original Quilt did not
exist as a working program. I began with an AI generating a <a href="https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/">conformance
suite</a> based on the original, its documentation, and other online
documentation, validating that suite against the original implementation
(see <code class="language-plaintext highlighter-rouge">-DQUILT_TEST_EXECUTABLE</code>). Then I had another AI code to the tests, on
architectural guidance from me, with <code class="language-plaintext highlighter-rouge">-D_GLIBCXX_DEBUG</code> and sanitizers as
guardrails. That was day one. The next three days were lots of refining
and iteration as I discovered the gaps in the test suite. I’d prompt AI to
compare Quilt.cpp to the original Quilt man page, add tests for missing
features, validate the new tests against the original Quilt, then run
several agents to fix the tests. While they worked I’d try the latest
build and note any bugs. As of this writing, the result is about equal
parts test and non-test, ~9KLoC each.</p>

<p>I’m likely to use this technique to clone other tools with implementations
unsuitable for my purposes. I learned quite a bit from this first attempt.</p>

<p>Why C++ instead of my usual choice of C? As we know, <a href="/blog/2023/02/11/">conventional C is
highly error-prone</a>. Even AI has trouble with it. In the ~9k lines
of C++ that is Quilt.cpp, I am only aware of three memory safety errors by
the AI. Two were null-terminated string issues with <code class="language-plaintext highlighter-rouge">strtol</code>, where the AI
was essentially writing C instead of C++, after which I directed the AI to
use <code class="language-plaintext highlighter-rouge">std::from_chars</code> and drop as much direct libc use as possible. (The
other was an unlikely branch with <code class="language-plaintext highlighter-rouge">std::vector::back</code> on an empty vector.)
We can rescue C with better techniques like arena allocation, counted
strings, and slices, but while (current) state of the art AI understands
these things, it cannot work effectively with them in C. I’ve tried. So I
picked C++, and from my professional work I know AI is better at C++ than
me.</p>
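
<p>For readers unfamiliar with the difference: <code class="language-plaintext highlighter-rouge">strtol</code> requires a
null-terminated string, so pointing it at a slice of a larger buffer lets
it read past the end of the slice, while <code class="language-plaintext highlighter-rouge">std::from_chars</code> takes an explicit
end pointer. This is a minimal sketch of the latter, not code from
Quilt.cpp. The helper name <code class="language-plaintext highlighter-rouge">parse_long</code> and its reject-trailing-junk policy
are my own choices for illustration.</p>

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a base-10 long from exactly the bytes of s. No terminator is
// required, no locale is consulted, and no whitespace is skipped.
std::optional<long> parse_long(std::string_view s)
{
    long value = 0;
    const char *first = s.data();
    const char *last  = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // empty, not a number, overflow, or junk
    }
    return value;
}
```

<p>Given a view over <code class="language-plaintext highlighter-rouge">"12345"</code>, parsing the three-byte slice starting at
offset one yields 234. The parse never looks at the bytes outside the
slice, which is the property <code class="language-plaintext highlighter-rouge">strtol</code> cannot offer.</p>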

<p>Also like a manager, I have not read most of the code, and instead focused
on results, so you might say this was “vibe-coded.” It <em>is</em> thoroughly
tested, though I’m sure there are still bugs to be ironed out, especially
on the more esoteric features I haven’t tried by hand yet.</p>

<h3 id="lets-discuss-tools">Let’s discuss tools</h3>

<p>After opposing CMake for years, you may have noticed the latest w64devkit
now includes CMake and Ninja. What happened? Preparing for my anticipated
employment change, this past December I read <a href="https://crascit.com/professional-cmake/"><em>Professional CMake</em></a>.
I realized that my practical problems with CMake were that nearly everyone
uses it incorrectly. Most CMake builds are a disaster, but my new-found
knowledge allows me to navigate the common mistakes. Only high profile
open source projects manage to put together proper CMake builds. Otherwise
the internet is loaded with CMake misinformation. Similar to AI, if you’re
not paying for CMake knowledge then it’s likely wrong or misleading. So I
highly recommend that book!</p>

<p>Frontier AI is <em>very good</em> with CMake. When a project has a CMake build
that isn’t <em>too</em> badly broken, just tell AI to fix it, <em>without any
specifics</em>, and build problems disappear in mere minutes without having to
think about it. It’s awesome. Combine it with the previous discussion
about tests making AI so much more effective, and that it <em>also</em> knows
CTest well, and you’ve got a killer formula. I’m more effective with CTest
myself merely from observing how AI uses it. AI (currently) cannot use
debuggers, so putting powerful, familiar testing tools in its hands helps
a lot, versus the usual bespoke, debugger-friendly solutions I prefer.</p>

<p>Similar to solving CMake problems: Have a hairy merge conflict? Just ask
AI to resolve it. It’s like magic. I no longer fear merge conflicts.</p>

<p>So part of my motivation for adding CMake to w64devkit was anticipation of
projects like Quilt.cpp, where they’d be available to AI, or at least so I
could use the tools the AI used to build/test myself. It’s already paid
for itself, and there’s more to come.</p>

<p>For agent software, on personal projects I’m using Claude Code. It’s a
great value, cheaper than paying API rates but requires working around
5-hour limit windows. I started with Pro (US$20/mo), but I’m getting so
much out of it that as of this writing I’m on 5x Max (US$100/mo) simply to
have enough to explore all my ideas. Be warned: <strong>Anthropic software is
quite buggy, more so than industry average</strong>, and it’s obvious that they
never even <em>start</em>, let alone test, some of their released software on
disfavored platforms (Windows, Android). Don’t expect to use Claude Code
effectively for native Windows platform development, which sadly includes
w64devkit. Hopefully that’s fixed someday. I suspect Anthropic hit a
bottleneck on QA, and unable to fit AI in that role they don’t bother. You
can theoretically report bugs on GitHub, but they’re just ignored and
closed. (Why don’t they have AI agents jumping on this wealth of bug
reports?)</p>

<p>At work I’m using Cursor where I get a choice of models. My favorite for
March has been GPT-5.4, which in my experience beats Opus 4.6 on Claude
Code by a small margin. It’s immediately obvious that Cursor is better
agent software than Claude Code. It’s more robust, more featureful, and
with a clearer UI than Claude Code. It has no trouble on Windows and can
drive w64devkit flawlessly. It’s also more expensive than Claude Code. My
employer currently spends ~US$250/mo on my AI tokens, dirt cheap
considering what they’re getting out of it. I have bottlenecks elsewhere
that keep me from spending even more.</p>

<p>As a general rule, for software engineering always use the smartest model
available. The cheaper, dumber models cost more in the long run. It takes
more tokens to achieve worse results, which costs more human time to sort
out.</p>

<p>Neither Cursor nor Claude Code is open source, so what are the purists to
do, even if they’re willing to pay API rates for tokens? Sadly I have no
answers for you. I haven’t gotten any open source agent software actually
working, and it seems they may lack the necessary secret sauce.</p>

<p>Update: Several folks suggested I give <a href="https://opencode.ai/">OpenCode</a> another shot, and this
time I got over the configuration hurdle. Single executable, slick
interface, and unlike Claude Code, I observed no bugs in my brief trial.
Give that a shot if you’re looking for an open source client.</p>

<p>The future is going to be weird. My experience is only a peek at what’s to
come, and my head is still spinning. However, the more I adapt to the
changes, the better I feel. If you’re feeling anxious like I was, don’t
flinch from improving your own AI knowledge and experience.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Speculations on arenas and non-trivial destructors</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/10/16/"/>
    <id>urn:uuid:102e0e39-0078-4698-b2d2-b9454dfe5545</id>
    <updated>2025-10-16T20:11:22Z</updated>
    <category term="cpp"/>
    <content type="html">
      <![CDATA[<p>As I <a href="/blog/2025/09/30/">continue to reflect</a> on arenas and lifetimes in C++, I realized
that dealing with destructors is not so onerous. In fact, it does not even
impact <a href="/blog/2025/01/19/">my established arena usage</a>! That is, implicit RAII-style
deallocation at scope termination, which works even in plain old C. With a
small change we can safely place resource-managing objects in arenas, such
as those owning file handles, sockets, threads, etc. (Though the ideal
remains <a href="/blog/2024/10/03/">resource management avoidance</a> when possible.) We can also
place traditional, memory-managing C++ objects in arenas. Their own
allocations won’t come from the arena — either because they <a href="/blog/2024/09/04/">lack the
interfaces</a> to do so, or they’re simply ineffective at it (<a href="https://en.cppreference.com/w/cpp/memory/polymorphic.html">pmr</a>) —
but they will reliably clean up after themselves. It’s all exception-safe,
too. In this article I’ll update my arena allocator with this new feature.
The change requires one additional arena pointer member, a bit of overhead
for objects with non-trivial destructors, and no impact for other objects.</p>

<p>I continue to title this “speculations” because, unlike arenas in C, I
have not (yet?) put these C++ techniques into practice in real software. I
haven’t refined them through use. Even ignoring its standard library as I
do here, C++ is an enormously complex programming language — far more so
than C — and I’m less confident that I’m not breaking a rule by accident.
I only want to break rules with intention!</p>

<p>As a reminder here’s where we left things off:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="n">raw_alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>  <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">count</span> <span class="o">&gt;=</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">throw</span> <span class="n">std</span><span class="o">::</span><span class="n">bad_alloc</span><span class="p">{};</span>  <span class="c1">// OOM policy</span>
    <span class="p">}</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">pad</span><span class="p">;</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">pad</span> <span class="o">+</span> <span class="n">count</span><span class="o">*</span><span class="n">size</span><span class="p">;</span>
    <span class="k">return</span> <span class="k">new</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="n">T</span><span class="p">[</span><span class="n">count</span><span class="p">]{};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I used <code class="language-plaintext highlighter-rouge">throw</code> when out of memory mainly to emphasize that this works, but
you’re free to pick whatever is appropriate for your program. Remember,
that’s the entire allocator, including implicit deallocation, sufficient
to fulfill the allocation needs for most programs, though they must be
designed for it. Also note that it’s now <code class="language-plaintext highlighter-rouge">raw_alloc</code>, as we’ll be writing
a new, enhanced <code class="language-plaintext highlighter-rouge">alloc</code> that builds upon this one.</p>

<p>As a reminder on usage, I’ll draw on an old example, updated for C++:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">wchar_t</span>   <span class="o">*</span><span class="nf">towidechar</span><span class="p">(</span><span class="n">Str</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>   <span class="c1">// convert to UTF-16</span>
<span class="n">Str</span>        <span class="nf">slurpfile</span><span class="p">(</span><span class="kt">wchar_t</span> <span class="o">*</span><span class="n">path</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>   <span class="c1">// read an entire file</span>
<span class="n">Slice</span><span class="o">&lt;</span><span class="n">Str</span><span class="o">&gt;</span> <span class="n">split</span><span class="p">(</span><span class="n">Str</span><span class="p">,</span> <span class="kt">char</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="p">);</span>  <span class="c1">// split on delimiter</span>

<span class="n">Slice</span><span class="o">&lt;</span><span class="n">Str</span><span class="o">&gt;</span> <span class="n">getlines</span><span class="p">(</span><span class="n">Str</span> <span class="n">path</span><span class="p">,</span> <span class="n">Arena</span> <span class="o">*</span><span class="n">perm</span><span class="p">,</span> <span class="n">Arena</span> <span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Use scratch for path conversion, auto-free on return</span>
    <span class="kt">wchar_t</span> <span class="o">*</span><span class="n">wpath</span> <span class="o">=</span> <span class="n">towidechar</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>

    <span class="c1">// Use perm for file contents, which are returned</span>
    <span class="n">Str</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">slurpfile</span><span class="p">(</span><span class="n">wpath</span><span class="p">,</span> <span class="n">perm</span><span class="p">);</span>

    <span class="c1">// Use perm for the slice, pointing into buf</span>
    <span class="k">return</span> <span class="n">split</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="sc">'\n'</span><span class="p">,</span> <span class="n">perm</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
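
<p>The by-value versus by-pointer distinction in <code class="language-plaintext highlighter-rouge">getlines</code> can be tried in
isolation. Below is a small, self-contained sketch that repeats the
<code class="language-plaintext highlighter-rouge">raw_alloc</code> from above so it compiles on its own. The two helpers,
<code class="language-plaintext highlighter-rouge">sum_squares</code> and <code class="language-plaintext highlighter-rouge">squares</code>, are hypothetical stand-ins I made up for a
function with temporary allocations and a function returning an
allocation.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <new>

struct Arena {
    char *beg;
    char *end;
};

// Same allocator as above, repeated so this sketch stands alone.
template<typename T>
T *raw_alloc(Arena *a, ptrdiff_t count = 1)
{
    ptrdiff_t size = sizeof(T);
    ptrdiff_t pad  = -(uintptr_t)a->beg & (alignof(T) - 1);
    if (count >= (a->end - a->beg - pad)/size) {
        throw std::bad_alloc{};  // OOM policy
    }
    void *r = a->beg + pad;
    a->beg += pad + count*size;
    return new(r) T[count]{};
}

// Hypothetical helper: scratch is passed by value, so bumping its beg
// pointer only changes this function's private copy of the arena.
int sum_squares(int n, Arena scratch)
{
    assert(n >= 0);
    int *tmp = raw_alloc<int>(&scratch, n);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        tmp[i] = i * i;
        sum += tmp[i];
    }
    return sum;  // tmp is implicitly freed: the caller's arena never moved
}

// Hypothetical helper: perm is passed by pointer, so the allocation
// persists in the caller's arena after the function returns.
int *squares(int n, Arena *perm)
{
    assert(n >= 0);
    int *r = raw_alloc<int>(perm, n);
    for (int i = 0; i < n; i++) {
        r[i] = i * i;
    }
    return r;
}
```

<p>After calling <code class="language-plaintext highlighter-rouge">sum_squares(4, arena)</code> the caller’s <code class="language-plaintext highlighter-rouge">arena.beg</code> is unchanged,
while <code class="language-plaintext highlighter-rouge">squares(4, &amp;arena)</code> advances it past the returned array.</p>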

<p>Changes to <code class="language-plaintext highlighter-rouge">scratch</code> do not persist after <code class="language-plaintext highlighter-rouge">getlines</code> returns, so objects
allocated from that arena are automatically freed on return. So far this
doesn’t rely on C++ RAII features, just simple value semantics. It works
well because all the objects in question have trivial destructors. But
suppose there’s a resource to manage:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">TcpSocket</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">socket</span> <span class="o">=</span> <span class="o">::</span><span class="n">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">TcpSocket</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
    <span class="n">TcpSocket</span><span class="p">(</span><span class="n">TcpSocket</span> <span class="o">&amp;</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
    <span class="kt">void</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">TcpSocket</span> <span class="o">&amp;</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
    <span class="c1">// TODO: move ctor/operator</span>
    <span class="o">~</span><span class="n">TcpSocket</span><span class="p">()</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">socket</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">close</span><span class="p">(</span><span class="n">socket</span><span class="p">);</span> <span class="p">}</span>
    <span class="k">operator</span> <span class="kt">int</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">socket</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>If we allocate a TcpSocket in an arena, including as a member of another
object, the destructor will never run unless we call it manually. To deal
with this we’ll need to keep track of objects requiring destruction, which
we’ll do with a linked list of destructors, forming a LIFO stack:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Dtor</span> <span class="p">{</span>
    <span class="n">Dtor</span>     <span class="o">*</span><span class="n">next</span><span class="p">;</span>
    <span class="kt">void</span>     <span class="o">*</span><span class="n">objects</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">;</span>
    <span class="kt">void</span>     <span class="p">(</span><span class="o">*</span><span class="n">dtor</span><span class="p">)(</span><span class="kt">void</span> <span class="o">*</span><span class="n">objects</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Each Dtor holds a pointer to a homogeneous array, its count (typically
one), and a pointer to a function that knows how to destroy these objects.
The linked list itself is heterogeneous, with dynamic type. The function
pointer acts as a kind of type tag. The <code class="language-plaintext highlighter-rouge">dtor</code> functions will be generated
from a function template:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="kt">void</span> <span class="nf">destroy</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">T</span> <span class="o">*</span><span class="n">objects</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">count</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">objects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="o">~</span><span class="n">T</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice it destroys end-to-beginning, the reverse of the order in which
placement <code class="language-plaintext highlighter-rouge">new[]</code> would construct these objects. It’s essentially a placement
<code class="language-plaintext highlighter-rouge">delete[]</code>. An arena initializes with an empty list of Dtors as a new
member:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
    <span class="n">Dtor</span> <span class="o">*</span><span class="n">dtors</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// ...</span>

<span class="p">};</span>
</code></pre></div></div>

<p>There are two different ways to construct an arena: over a block of raw
memory (unowned), or from an existing arena to borrow a scratch arena over
its free space. So that’s two constructors:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="c1">// ...</span>

    <span class="n">Arena</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">mem</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">)</span> <span class="o">:</span> <span class="n">beg</span><span class="p">{</span><span class="n">mem</span><span class="p">},</span> <span class="n">end</span><span class="p">{</span><span class="n">mem</span><span class="o">+</span><span class="n">len</span><span class="p">}</span> <span class="p">{}</span>
    <span class="n">Arena</span><span class="p">(</span><span class="n">Arena</span> <span class="o">&amp;</span><span class="n">a</span><span class="p">)</span> <span class="o">:</span> <span class="n">beg</span><span class="p">{</span><span class="n">a</span><span class="p">.</span><span class="n">beg</span><span class="p">},</span> <span class="n">end</span><span class="p">{</span><span class="n">a</span><span class="p">.</span><span class="n">end</span><span class="p">}</span> <span class="p">{}</span>

    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Finally a destructor that pops the Dtor linked list until empty, which
runs the destructors in reverse order when the arena is destroyed:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="c1">// ...</span>

    <span class="kt">void</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">Arena</span> <span class="o">&amp;</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>  <span class="c1">// rule of three</span>

    <span class="o">~</span><span class="n">Arena</span><span class="p">()</span>
    <span class="p">{</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">dtors</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">Dtor</span> <span class="o">*</span><span class="n">dead</span> <span class="o">=</span> <span class="n">dtors</span><span class="p">;</span>
            <span class="n">dtors</span> <span class="o">=</span> <span class="n">dead</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span>
            <span class="n">dead</span><span class="o">-&gt;</span><span class="n">dtor</span><span class="p">(</span><span class="n">dead</span><span class="o">-&gt;</span><span class="n">objects</span><span class="p">,</span> <span class="n">dead</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>(Note: This should probably use a local variable instead of manipulating
the <code class="language-plaintext highlighter-rouge">dtors</code> member directly. Updates to <code class="language-plaintext highlighter-rouge">dtors</code> are potentially visible to
destructors, inhibiting optimization.) The new, enhanced <code class="language-plaintext highlighter-rouge">alloc</code> building
upon <code class="language-plaintext highlighter-rouge">raw_alloc</code>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="nf">alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">__has_trivial_destructor</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">count</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">raw_alloc</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">count</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">Dtor</span> <span class="o">*</span><span class="n">dtor</span>    <span class="o">=</span> <span class="n">raw_alloc</span><span class="o">&lt;</span><span class="n">Dtor</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>  <span class="c1">// allocate first</span>
    <span class="n">T</span>    <span class="o">*</span><span class="n">r</span>       <span class="o">=</span> <span class="n">raw_alloc</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">count</span><span class="p">);</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">next</span>    <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">dtors</span><span class="p">;</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">objects</span> <span class="o">=</span> <span class="n">r</span><span class="p">;</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">count</span>   <span class="o">=</span> <span class="n">count</span><span class="p">;</span>
    <span class="n">dtor</span><span class="o">-&gt;</span><span class="n">dtor</span>    <span class="o">=</span> <span class="n">destroy</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="n">a</span><span class="o">-&gt;</span><span class="n">dtors</span> <span class="o">=</span> <span class="n">dtor</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’m using the non-standard <code class="language-plaintext highlighter-rouge">__has_trivial_destructor</code> built-in supported
by all major C++ implementations, meaning we still don’t need the C++
standard library, but <a href="https://en.cppreference.com/w/cpp/types/is_destructible.html"><code class="language-plaintext highlighter-rouge">std::is_trivially_destructible</code></a> is the usual
tool here. <a href="https://clang.llvm.org/docs/LanguageExtensions.html#:~:text=__has_trivial_destructor">LLVM is pushing <code class="language-plaintext highlighter-rouge">__is_trivially_destructible</code></a> instead,
but it’s not supported by GCC <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107600">until GCC 16</a>.</p>

<p>If the count is zero there’s nothing to destroy, so it’s simple to skip
the non-trivial destruction path entirely. Things get more interesting for
a non-zero number of non-trivially destructible objects. Allocate the Dtor
first. The order is important: if the array came first and the Dtor
allocation then failed, the objects would leak with no Dtor entry in place.
Then allocate the array, attach it to the Dtor, and attach the Dtor to the
arena, registering the objects for cleanup.</p>

<p>If a constructor throws, placement <code class="language-plaintext highlighter-rouge">new[]</code> will automatically destroy
the objects constructed so far (in effect the real placement <code class="language-plaintext highlighter-rouge">delete[]</code>)
before the exception propagates, so that case was already covered at the start.</p>
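
<p>That cleanup can be observed in isolation. What follows is a minimal
sketch, assuming C++20, with a hypothetical <code class="language-plaintext highlighter-rouge">Probe</code> type whose third
constructor call throws. Neither <code class="language-plaintext highlighter-rouge">Probe</code> nor <code class="language-plaintext highlighter-rouge">try_construct</code> is part of
the arena; they exist only for illustration:</p>

```cpp
#include <new>

// Hypothetical probe type: counts constructions and destructions, and
// throws from what would be its third successful construction.
struct Probe {
    static inline int constructed = 0;
    static inline int destroyed   = 0;

    Probe()
    {
        if (constructed == 2) {
            throw 1;
        }
        constructed++;
    }

    ~Probe() { destroyed++; }
};

// Attempts to construct five Probes in buf using placement new[].
// Returns true if construction threw.
bool try_construct(void *buf)
{
    try {
        new(buf) Probe[5]{};
        return false;
    } catch (int) {
        return true;
    }
}
```

<p>Called on a suitably aligned buffer large enough for five Probes,
<code class="language-plaintext highlighter-rouge">try_construct</code> reports a throw and leaves both counters at 2: the two
elements built before the failure were destroyed by the new-expression
itself, with no Dtor entry involved.</p>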

<p>With a little more cleverness we could omit the <code class="language-plaintext highlighter-rouge">objects</code> pointer and
discover the array using pointer arithmetic off the Dtor object itself.
That’s tricky (consider alignment), and generally unnecessary, so I didn’t
worry about it. With arenas, allocator overhead is already well below that
of conventional allocation, so slack is plentiful. Chances are we will
also never need an <em>array</em> of non-trivially destructible objects, and so
we could probably omit <code class="language-plaintext highlighter-rouge">count</code>, then write a single-object allocator that
forwards constructor arguments (e.g. a handle to the resource to be
managed). That involves no new concepts, and I leave it as an exercise for
the reader.</p>
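
<p>For readers who want a starting point, here is one possible shape for
that exercise. It is a sketch under assumptions, not a definitive
implementation: <code class="language-plaintext highlighter-rouge">bump</code>, <code class="language-plaintext highlighter-rouge">destroy_one</code>, <code class="language-plaintext highlighter-rouge">make</code>, <code class="language-plaintext highlighter-rouge">Handle</code>, and <code class="language-plaintext highlighter-rouge">demo</code> are
all hypothetical names, <code class="language-plaintext highlighter-rouge">bump</code> stands in for <code class="language-plaintext highlighter-rouge">raw_alloc</code>, and since
forwarding pulls in the standard library anyway it uses
<code class="language-plaintext highlighter-rouge">std::is_trivially_destructible_v</code> in place of the built-in. The arena
destructor walks the list through a local variable, as the earlier note
suggests:</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <new>
#include <type_traits>
#include <utility>

// Single-object Dtor: no count, exactly one object per entry.
struct Dtor {
    Dtor  *next;
    void  *object;
    void (*dtor)(void *object);
};

struct Arena {
    char *beg;
    char *end;
    Dtor *dtors = 0;

    Arena(char *mem, ptrdiff_t len) : beg{mem}, end{mem+len} {}
    Arena(Arena &a) : beg{a.beg}, end{a.end} {}
    void operator=(Arena &) = delete;

    ~Arena()
    {
        Dtor *list = dtors;  // local copy, not the member
        while (list) {
            Dtor *dead = list;
            list = dead->next;
            dead->dtor(dead->object);
        }
    }
};

// Hypothetical stand-in for raw_alloc: aligned, untyped storage.
inline void *bump(Arena *a, ptrdiff_t size, ptrdiff_t align)
{
    ptrdiff_t pad = -(uintptr_t)a->beg & (align - 1);
    assert(size <= a->end - a->beg - pad);  // OOM policy
    void *r = a->beg + pad;
    a->beg += pad + size;
    return r;
}

template<typename T>
void destroy_one(void *ptr)
{
    ((T *)ptr)->~T();
}

// Allocates one T, forwarding constructor arguments exactly once.
template<typename T, typename ...Args>
T *make(Arena *a, Args &&...args)
{
    if constexpr (std::is_trivially_destructible_v<T>) {
        return new(bump(a, sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    } else {
        Dtor *d = new(bump(a, sizeof(Dtor), alignof(Dtor))) Dtor{};  // first
        T    *r = new(bump(a, sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
        d->next   = a->dtors;  // link only after construction succeeded
        d->object = r;
        d->dtor   = destroy_one<T>;
        a->dtors  = d;
        return r;
    }
}

// Hypothetical resource that logs the order in which handles are released.
struct Handle {
    static inline int log[8];
    static inline int nlog = 0;
    int id;
    explicit Handle(int n) : id{n} {}
    Handle(Handle &) = delete;
    ~Handle() { log[nlog++] = id; }
};

void demo(Arena scratch)
{
    make<Handle>(&scratch, 1);
    make<Handle>(&scratch, 2);
}
```

<p>Passing an arena to <code class="language-plaintext highlighter-rouge">demo</code> by copy releases handle 2 and then handle 1
when it returns, in LIFO order, and leaves the caller’s arena untouched.</p>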

<p>With that in place, we could now allocate an array of TcpSockets:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="n">Arena</span> <span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">TcpSocket</span> <span class="o">*</span><span class="n">sockets</span> <span class="o">=</span> <span class="n">alloc</span><span class="o">&lt;</span><span class="n">TcpSocket</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>These sockets will all be closed when <code class="language-plaintext highlighter-rouge">example</code> returns, via their single
Dtor entry on <code class="language-plaintext highlighter-rouge">scratch</code>. When calling this <code class="language-plaintext highlighter-rouge">example</code> with an arena:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">caller</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">perm</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">example</span><span class="p">(</span><span class="o">*</span><span class="n">perm</span><span class="p">);</span>  <span class="c1">// creates a scratch arena</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This invokes the copy constructor, creating a scratch arena with an empty
<code class="language-plaintext highlighter-rouge">dtors</code> list to be passed into <code class="language-plaintext highlighter-rouge">example</code>. Objects existing in <code class="language-plaintext highlighter-rouge">*perm</code> will
not be destroyed by <code class="language-plaintext highlighter-rouge">example</code> because <code class="language-plaintext highlighter-rouge">dtors</code> isn’t passed in. If we had
instead passed a <em>pointer to an arena</em>, the Arena constructor wouldn’t be
invoked, so the callee would use the caller’s arena, pushing its Dtors onto
the caller’s list.</p>

<p>In other words, the interface hasn’t changed! That’s the most exciting
part for me. This by-copy, by-pointer interfacing has really grown on me
over the past two years.</p>
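
<p>The difference between the two calling conventions can be observed
directly. This is a sketch under assumptions: it condenses the definitions
above, guesses the shape of <code class="language-plaintext highlighter-rouge">raw_alloc</code> as a C++20 placement <code class="language-plaintext highlighter-rouge">new[]</code> bump
allocator, swaps in <code class="language-plaintext highlighter-rouge">std::is_trivially_destructible_v</code> for the built-in,
and adds a hypothetical <code class="language-plaintext highlighter-rouge">Probe</code> type that counts destructor calls:</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <new>
#include <type_traits>

struct Dtor {
    Dtor     *next;
    void     *objects;
    ptrdiff_t count;
    void    (*dtor)(void *objects, ptrdiff_t count);
};

struct Arena {
    char *beg;
    char *end;
    Dtor *dtors = 0;

    Arena(char *mem, ptrdiff_t len) : beg{mem}, end{mem+len} {}
    Arena(Arena &a) : beg{a.beg}, end{a.end} {}  // scratch: empty dtors
    void operator=(Arena &) = delete;

    ~Arena()
    {
        while (dtors) {
            Dtor *dead = dtors;
            dtors = dead->next;
            dead->dtor(dead->objects, dead->count);
        }
    }
};

// Assumed shape of raw_alloc: bump allocation plus placement new[].
template<typename T>
T *raw_alloc(Arena *a, ptrdiff_t count = 1)
{
    ptrdiff_t size = sizeof(T);
    ptrdiff_t pad  = -(uintptr_t)a->beg & (alignof(T) - 1);
    assert(count < (a->end - a->beg - pad)/size);  // OOM policy
    void *r = a->beg + pad;
    a->beg += pad + count*size;
    return new(r) T[count]{};
}

template<typename T>
void destroy(void *ptr, ptrdiff_t count)
{
    T *objects = (T *)ptr;
    for (ptrdiff_t i = count-1; i >= 0; i--) {
        objects[i].~T();
    }
}

template<typename T>
T *alloc(Arena *a, ptrdiff_t count = 1)
{
    if (std::is_trivially_destructible_v<T> || !count) {
        return raw_alloc<T>(a, count);
    }
    Dtor *dtor    = raw_alloc<Dtor>(a);  // allocate first
    T    *r       = raw_alloc<T>(a, count);
    dtor->next    = a->dtors;
    dtor->objects = r;
    dtor->count   = count;
    dtor->dtor    = destroy<T>;
    a->dtors      = dtor;
    return r;
}

// Hypothetical probe type: counts destructor calls.
struct Probe {
    static inline int destroyed = 0;
    ~Probe() { destroyed++; }
};

void by_copy(Arena scratch) { alloc<Probe>(&scratch, 3); }  // freed on return
void by_pointer(Arena *a)   { alloc<Probe>(a, 2); }         // outlives the call
```

<p>Given an arena <code class="language-plaintext highlighter-rouge">perm</code> over some buffer, <code class="language-plaintext highlighter-rouge">by_copy(perm)</code> leaves
<code class="language-plaintext highlighter-rouge">Probe::destroyed</code> at 3 on return with <code class="language-plaintext highlighter-rouge">perm.beg</code> and <code class="language-plaintext highlighter-rouge">perm.dtors</code>
untouched, while <code class="language-plaintext highlighter-rouge">by_pointer(&amp;perm)</code> defers its two destructor calls until
<code class="language-plaintext highlighter-rouge">perm</code> itself is destroyed.</p>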

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>More speculations on arenas in C++</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/09/30/"/>
    <id>urn:uuid:ffce917f-c757-42e7-a4d1-55e8d80c5051</id>
    <updated>2025-09-30T11:46:16Z</updated>
    <category term="cpp"/>
    <content type="html">
      <![CDATA[<p><em>Update October 2025: <a href="/blog/2025/10/16/">further enhancements</a></em>.</p>

<p>Patrice Roy’s new book, <a href="https://www.packtpub.com/en-us/product/c-memory-management-9781805129806"><em>C++ Memory Management</em></a>, has made me more
conscious of object lifetimes. C++ is stricter than C about lifetimes, and
common, textbook memory management that’s sound in C is less so in C++ —
<em>more than I realized</em>. The book also presents a form of arena allocation
so watered down as to enjoy none of the benefits. (Despite its precision
otherwise, the second half is also littered with <a href="https://github.com/PacktPublishing/C-Plus-Plus-Memory-Management/blob/9e4c4ea7/chapter12/Vector-better.cpp#L45">integer overflows</a>
lacking <a href="/blog/2024/05/24/">the appropriate checks</a>, and near the end has some <a href="https://github.com/PacktPublishing/C-Plus-Plus-Memory-Management/blob/9e4c4ea7/chapter14/Vector_with_allocator_cpp23.cpp#L118-L119">pointer
overflows</a> invalidating the check.) However, I’m grateful for the new
insights, and it’s made me revisit <a href="/blog/2024/04/14/">my own C++ arena allocation</a>. In
this new light I see I got it subtly wrong myself!</p>

<!--more-->

<p>Surprising to most C++ programmers, but not language lawyers, <a href="https://wg21.link/P0593#idiomatic-c-code-as-c">idiomatic C
memory allocation had undefined behavior in C++ until recently</a>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="o">*</span><span class="nf">newint</span><span class="p">(</span><span class="kt">int</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">r</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">v</span><span class="p">;</span>  <span class="c1">// &lt;-- undefined behavior before C++20</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This function allocates memory for an object but never starts a lifetime.
Assignment without a lifetime is invalid. Pointer casts are that much more
suspicious in C++, and due to lifetime semantics, in many cases indicate
incorrect code. (To be clear, I’m not arguing in favor of these semantics,
but reasoning about the facts on the ground.) C++20 carved out special
exceptions for <code class="language-plaintext highlighter-rouge">malloc</code> and friends, but addressing this kind of thing in
general is the purpose of the brand new C++23 <a href="https://en.cppreference.com/w/cpp/memory/start_lifetime_as.html"><code class="language-plaintext highlighter-rouge">start_lifetime_as</code></a> (and
similar), the slightly older C++20 <a href="https://en.cppreference.com/w/cpp/memory/construct_at.html"><code class="language-plaintext highlighter-rouge">construct_at</code></a>, or a classic placement
new. They all start lifetimes. The last looks like:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="o">*</span><span class="nf">newint</span><span class="p">(</span><span class="kt">int</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="k">new</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="kt">int</span><span class="p">{</span><span class="n">v</span><span class="p">};</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s no good as a C/C++ polyglot, though per the differing old semantics
that was impossible anyway without macros, which is basically cheating. An
important detail: The corrected version has no casts, and it returns the
result of <code class="language-plaintext highlighter-rouge">new</code>. That matters because only the pointer returned by
<code class="language-plaintext highlighter-rouge">new</code> is imbued as a pointer to the new lifetime, <em>not</em> <code class="language-plaintext highlighter-rouge">r</code>. There are no
side effects affecting the provenance of <code class="language-plaintext highlighter-rouge">r</code>, which still points to raw
memory as far as the language is concerned.</p>

<p>With that in mind let’s revisit my arena from last time, which does not
necessarily benefit from the recent changes, not being one of the special
case C standard library functions:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="n">alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>  <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">);</span>  <span class="c1">// OOM policy</span>
    <span class="n">T</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="p">)(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">pad</span><span class="p">);</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">pad</span> <span class="o">+</span> <span class="n">count</span><span class="o">*</span><span class="n">size</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">new</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="n">T</span><span class="p">{};</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Hey, look, placement new! I did that to produce a nicer interface, but I
lucked out by also starting lifetimes appropriately. Except it returns the
wrong pointer. This allocator discards the pointer blessed with the new
lifetime. Both pointers have the same address but different provenance.
That matters. But I’m calling <code class="language-plaintext highlighter-rouge">new</code> many times, so how do I fix this?
Array new, duh.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="nf">alloc</span><span class="p">(</span><span class="n">Arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>  <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">);</span>  <span class="c1">// OOM policy</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">pad</span><span class="p">;</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">pad</span> <span class="o">+</span> <span class="n">count</span><span class="o">*</span><span class="n">size</span><span class="p">;</span>
    <span class="k">return</span> <span class="k">new</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="n">T</span><span class="p">[</span><span class="n">count</span><span class="p">]{};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Wow… that’s actually much better anyway. No explicit casts, no loop. Why
didn’t I think of this in the first place? The catch is I can’t forward
constructor arguments, emplace-style — the part that gave me the trouble
with perfect forwarding — but that’s for the best. Forwarding more than
once was unsound, made more obvious by <code class="language-plaintext highlighter-rouge">new[]</code>.</p>

<p>Caveat: This only works starting in C++20, and strictly with <code class="language-plaintext highlighter-rouge">operator
new[](size_t, void *)</code>. Any other placement <code class="language-plaintext highlighter-rouge">new[]</code> may require <em>array
overhead</em> — e.g. it prepends an array size so that <code class="language-plaintext highlighter-rouge">delete[]</code> can run
non-trivial destructors — which is unknowable and therefore impossible to
provide or align correctly. Overhead for placement <code class="language-plaintext highlighter-rouge">new[]</code> is nonsense, of
course, but as of this writing, <em>all three major C++ compilers do it</em> and
essentially have broken custom placement <code class="language-plaintext highlighter-rouge">new[]</code>.</p>
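
<p>The overhead is easy to measure. Here is a minimal sketch, assuming a
hypothetical <code class="language-plaintext highlighter-rouge">Tag</code> type to select a custom placement form and a static
buffer standing in for real storage. The names <code class="language-plaintext highlighter-rouge">Tag</code>, <code class="language-plaintext highlighter-rouge">storage</code>,
<code class="language-plaintext highlighter-rouge">requested</code>, <code class="language-plaintext highlighter-rouge">Plain</code>, <code class="language-plaintext highlighter-rouge">Owner</code>, and <code class="language-plaintext highlighter-rouge">overhead</code> are all made up for
illustration:</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <new>

struct Tag {};  // hypothetical tag selecting the custom placement form

alignas(16) static char storage[256];
static size_t requested;  // bytes asked for by the most recent new[]

// Custom placement new[]: records the request, hands back the buffer.
void *operator new[](size_t size, Tag)
{
    assert(size <= sizeof(storage));
    requested = size;
    return storage;
}

// Matching placement delete[], called only if a constructor throws.
void operator delete[](void *, Tag) {}

struct Plain { int x; };              // trivial destructor
struct Owner { int x; ~Owner() {} };  // non-trivial destructor

// Bytes requested beyond count*sizeof(T) by the custom placement new[].
template<typename T>
size_t overhead(size_t count)
{
    T *p = new(Tag{}) T[count];
    (void)p;  // when overhead is nonzero, p points past it into storage
    return requested - count*sizeof(T);
}
```

<p>The standard leaves the amount unspecified, so the interesting comparison
is <code class="language-plaintext highlighter-rouge">overhead&lt;Plain&gt;(4)</code> against <code class="language-plaintext highlighter-rouge">overhead&lt;Owner&gt;(4)</code> on your own
compiler. Where a count is prepended for the non-trivially destructible
type, the second is nonzero and the returned pointer is offset into the
buffer by that amount.</p>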

<p>Since I’m thinking about lifetimes, what about the other end? My arena
does not call destructors, by design, and starts new lifetimes on top of
objects that are technically still alive. Is that undefined behavior? As
far as I can tell <a href="https://en.cppreference.com/w/cpp/language/lifetime.html#Storage_reuse">this is allowed</a>, even for non-trivial destructors,
with the caveat that it might leak resources. In this case the resource is
memory managed by the arena, so that’s fine of course.</p>

<p>So addressing pointer provenance also produced a nicer definition. What a
great result from reading that book! While researching, I noticed Jonathan
Müller, who personally gave me great advice and feedback on my previous
article, <a href="https://www.youtube.com/watch?v=oZyhq4D-QL4">talked about lifetimes</a> just a couple weeks later. I
recommend both.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Tips for more effective fuzz testing with AFL++</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2025/02/05/"/>
    <id>urn:uuid:eff3b773-99ee-4c38-9f9c-f51294a1b9e0</id>
    <updated>2025-02-05T18:03:55Z</updated>
    <category term="c"/><category term="cpp"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p>Fuzz testing is incredibly effective for mechanically discovering software
defects, yet remains underused and neglected. Pick any program that must
gracefully accept complex input, written <em>in any language</em>, which has not
yet been fuzzed, and fuzz testing usually reveals at least one bug.
At least one program currently installed on your own computer certainly
qualifies. Perhaps even most of them. <a href="https://danluu.com/everything-is-broken/">Everything is broken</a> and
low-hanging fruit is everywhere. After fuzz testing ~1,000 projects <a href="/blog/2019/01/25/">over
the past six years</a>, I’ve accumulated tips for picking that fruit.
The checklist format has worked well in the past (<a href="/blog/2024/12/20/">1</a>, <a href="/blog/2023/01/08/">2</a>), so
I’ll use it again. This article discusses <a href="https://aflplus.plus/">AFL++</a> on source-available
C and C++ targets, running on glibc-based Linux distributions, currently
the <em>indisputable</em> best fuzzing platform for C and C++.</p>

<p>My tips complement the official, upstream documentation, so consult them,
too:</p>

<ul>
  <li><a href="https://afl-1.readthedocs.io/en/latest/tips.html">Performance Tips</a> on the AFL++ website</li>
  <li><a href="https://lcamtuf.coredump.cx/afl/technical_details.txt">Technical “whitepaper” for afl-fuzz</a></li>
</ul>

<p>Even if a program has been fuzz tested, applying the techniques in this
article may reveal defects missed by previous fuzz testing.</p>

<h3 id="1-configure-sanitizers-and-assertions">(1) Configure sanitizers and assertions</h3>

<p>More assertions means more effective fuzzing, and sanitizers are a kind of
automatically-inserted assertions. By default, fuzz with both Address
Sanitizer (ASan) and Undefined Behavior Sanitizer (UBSan):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ afl-gcc-fast -g3 -fsanitize=address,undefined ...
</code></pre></div></div>

<p>ASan’s default configuration is not ideal, and should be adjusted via the
<code class="language-plaintext highlighter-rouge">ASAN_OPTIONS</code> environment variable. If customized at all, AFL++ requires
at least these options:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export ASAN_OPTIONS="abort_on_error=1:halt_on_error=1:symbolize=0"
</code></pre></div></div>

<p>Except <code class="language-plaintext highlighter-rouge">symbolize=0</code>, <a href="/blog/2022/06/26/">this <em>ought to be</em> the ASan default</a>. When
debugging a discovered crash, you’ll want UBSan set up the same way so
that it behaves under a debugger. To improve fuzzing, make ASan even
more sensitive to defects by detecting use-after-return bugs. It slows
fuzzing slightly, but it’s well worth the cost:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ASAN_OPTIONS+=":detect_stack_use_after_return=1"
</code></pre></div></div>

<p>By default ASan fills the first 4KiB of fresh allocations with a pattern,
to help detect use-after-free bugs. That’s not nearly enough for fuzzing.
Crank it up to completely fill virtually all allocations with a pattern:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ASAN_OPTIONS+=":max_malloc_fill_size=$((1&lt;&lt;30))"
</code></pre></div></div>

<p>In the default configuration, if a program allocates more than 4KiB with
<code class="language-plaintext highlighter-rouge">malloc</code> then, say, uses <code class="language-plaintext highlighter-rouge">strlen</code> on the uninitialized memory, no bug will
be detected. There’s almost certainly a zero somewhere after 4KiB. Until I
noticed it, the 4KiB limit hid a number of bugs from my fuzz testing. Per
(4), fully filling allocations with a pattern better isolates tests when
using persistent mode.</p>

<p>When fuzzing C++ and linking GCC’s libstdc++, consider <code class="language-plaintext highlighter-rouge">-D_GLIBCXX_DEBUG</code>.
ASan cannot “see” out-of-bounds accesses within a container’s capacity,
and the extra assertions fill in the gaps. Mind that it changes the ABI,
though fuzz testing will instantly highlight such mismatches.</p>

<h3 id="2-prefer-the-persistent-mode">(2) Prefer the persistent mode</h3>

<p>While AFL++ can fuzz many programs in-place without writing a single line
of code (<code class="language-plaintext highlighter-rouge">afl-gcc</code>, <code class="language-plaintext highlighter-rouge">afl-clang</code>), prefer AFL++’s <a href="https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md">persistent mode</a>
(<code class="language-plaintext highlighter-rouge">afl-gcc-fast</code>, <code class="language-plaintext highlighter-rouge">afl-clang-fast</code>). It’s typically an order of magnitude
faster and worth the effort. Though it also has pitfalls (see (4), (5)). I
keep a file on hand, <code class="language-plaintext highlighter-rouge">fuzztmpl.c</code> — the progenitor of all my fuzz testers:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span>
<span class="n">__AFL_FUZZ_INIT</span><span class="p">();</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">__AFL_INIT</span><span class="p">();</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">src</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">__AFL_FUZZ_TESTCASE_BUF</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">__AFL_LOOP</span><span class="p">(</span><span class="mi">10000</span><span class="p">))</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">len</span> <span class="o">=</span> <span class="n">__AFL_FUZZ_TESTCASE_LEN</span><span class="p">;</span>
        <span class="n">src</span> <span class="o">=</span> <span class="n">realloc</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
        <span class="c1">// ... send src to target ...</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I <a href="https://vimhelp.org/insert.txt.html#%3Aread"><code class="language-plaintext highlighter-rouge">:r</code></a> this into my Vim buffer, then modify as needed. It’s a
stripped and improved version of the official template, which itself has a
serious flaw (see (5)). There are unstated constraints about the position
of <code class="language-plaintext highlighter-rouge">buf</code> and <code class="language-plaintext highlighter-rouge">len</code> in the code, so if in doubt, refer to the original
template.</p>

<h3 id="3-include-source-files-not-header-files">(3) Include source files, not header files</h3>

<p>We’re well into the 21st century. Nobody is compiling software on 16-bit
machines anymore. Don’t get hung up on the one translation unit (TU) per
source file mindset. When fuzz testing, we need at most two TUs: One TU
for instrumented code and one TU for uninstrumented code. In most cases
the latter takes the form of a library (libc, libstdc++, etc.) and we
don’t need to think about it.</p>

<p>Fuzz testing typically requires only a subset of the program. Including
just those sources straight in the template is both effective and simple.
In my template I put includes just <em>above</em> <code class="language-plaintext highlighter-rouge">unistd.h</code> so that the header
isn’t visible to the sources unless they include it themselves.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"src/utils.c"</span><span class="cp">
#include</span> <span class="cpf">"src/parser.c"</span><span class="cp">
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span></code></pre></div></div>

<p>I know, if you’ve never seen this before it looks bonkers. This isn’t what
they taught you in college. Trust me, <a href="https://en.wikipedia.org/wiki/Unity_build">this simple technique</a> will
save you a thousand lines of build configuration. Otherwise you’ll need to
manage different object files between fuzz testing and otherwise.</p>

<p>Perhaps more importantly, you can now fuzz test <em>any arbitrary function</em>
in the program, including static functions! They’re all right there in the
same TU. You’re not limited to public-facing interfaces. Perhaps you can
skip (7) and test against a better internal interface. It also gives you
direct access to static variables so that you can clear/reset them between
tests, per (4).</p>

<p>Programs are often not designed for fuzz testing, or testing generally,
and it may be difficult to tease apart tightly-coupled components. Many of
the programs I’ve fuzz tested look like this. This technique lets you take
a hacksaw to the program and substitute troublesome symbols just for fuzz
testing without modifying a single original source line. For example, if
the source I’m testing contains a <code class="language-plaintext highlighter-rouge">main</code> function, I can remove it:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define main oldmain
#  include "src/utils.c"
#  include "src/parser.c"
#undef main
#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp">
</span></code></pre></div></div>

<p>Sure, better to improve the program so that such hacks are unnecessary,
but in most cases I’m fuzz testing as part of a drive-by review of some open
source project. It allows me to quickly discover defects in the original,
unmodified program, and produces simpler bug reports like, “Compile with
ASan, open this 50-byte file, and then the program will crash.”</p>

<h3 id="4-isolate-fuzz-tests-from-each-other">(4) Isolate fuzz tests from each other</h3>

<p>Tests should be unaffected by previous tests. This is challenging in
persistent mode, sometimes even impractical. That means resetting all
global state, even something like the internal <code class="language-plaintext highlighter-rouge">strtok</code> buffer if that
function is used. Add fuzz testing to your list of reasons to eschew
global variables.</p>

<p>It’s mitigated by (1), but otherwise uninitialized heap memory may hold
contents from previous tests, breaking isolation. Besides interference
with fuzzing instrumentation, bugs found this way are wickedly difficult
to reproduce.</p>

<p>Don’t pass uninitialized memory into a test, e.g. an output parameter
allocated on the stack. Zero-initialize or fill it with a pattern. If it
accepts an arena, fill it with a pattern before each test.</p>
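
<p>A minimal sketch of that advice, assuming a hypothetical target with an
output parameter. The names <code class="language-plaintext highlighter-rouge">struct result</code> and <code class="language-plaintext highlighter-rouge">poison</code> are mine for
illustration, not from any real program:</p>

```c
#include <stddef.h>
#include <string.h>

// Hypothetical output parameter for some target under test.
struct result {
    int  code;
    char message[32];
};

// Overwrite an object with a fixed, non-zero byte pattern so that stale
// contents from a previous test cannot leak into the next one.
static void poison(void *p, size_t len)
{
    memset(p, 0xa5, len);
}
```

<p>Inside the <code class="language-plaintext highlighter-rouge">__AFL_LOOP</code> body, something like <code class="language-plaintext highlighter-rouge">poison(&amp;r, sizeof(r))</code> would
run on the stack object just before handing it to the target.</p>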

<p>Typically you have little control over heap addresses, which likely vary
across tests and depend on the behavior of previous tests. If the program
<a href="/blog/2025/01/19/#hash-hardening-bonus">depends on address values</a>, this may affect the results and make
reproduction difficult, so watch for that.</p>

<h3 id="5-do-not-test-directly-on-the-fuzz-test-buffer">(5) Do not test directly on the fuzz test buffer</h3>

<p>Passing <code class="language-plaintext highlighter-rouge">buf</code> and <code class="language-plaintext highlighter-rouge">len</code> straight into the target is the most common
mistake, especially when fuzzing better-designed C programs, and
particularly because the official template encourages it.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">myprogram</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>  <span class="c1">// BAD!</span>
</code></pre></div></div>

<p>While it’s a great sign the program doesn’t depend on null termination, it
creates a subtle trap. The underlying buffer allocated by AFL++ is larger
than <code class="language-plaintext highlighter-rouge">len</code>, and ASan will not detect read overflows on inputs! Instead
pass a copy sized to fit, which is the purpose of <code class="language-plaintext highlighter-rouge">src</code> in my template.
Adjust the type of <code class="language-plaintext highlighter-rouge">src</code> as needed.</p>

<p>If the program expects null-terminated input then you’ll need to do this
anyway in order to append the null byte. If it accepts an “owning” type
like <code class="language-plaintext highlighter-rouge">std::string</code>, then it’s also already done on your behalf. With
“non-owning” views like <code class="language-plaintext highlighter-rouge">std::string_view</code> you’ll still want your own
size-fit copy.</p>
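
<p>For the null-terminated case, a size-fit copy might look like the
following sketch. The helper name <code class="language-plaintext highlighter-rouge">fit_copy</code> is hypothetical, and it
allocates fresh each time rather than reusing a buffer like my template:</p>

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: return a heap copy of buf[0..len) sized to exactly
// len+1 bytes, with a null terminator in the final byte. Because the
// allocation fits the data, ASan's redzone begins right after the
// terminator and catches reads past it. Returns null on allocation failure.
static char *fit_copy(const unsigned char *buf, size_t len)
{
    char *s = malloc(len + 1);
    if (!s) {
        return 0;
    }
    if (len) {
        memcpy(s, buf, len);  // skipped when empty: buf may be null
    }
    s[len] = 0;
    return s;
}
```

<p>Per (6), a harness built this way would not necessarily bother freeing
the copy.</p>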

<p>If you see a program’s checked-in fuzz test using <code class="language-plaintext highlighter-rouge">buf</code> directly, make
this change and see if anything new pops out. It’s worked for me on a
number of occasions.</p>

<h3 id="6-dont-bother-freeing-memory">(6) Don’t bother freeing memory</h3>

<p>In general, avoid doing work irrelevant to the fuzz test. The official
tips say to “use a simpler target” and “instrument just what you need,”
and keeping destructors out of the tests helps in both cases. Unless the
program is especially memory-hungry, you won’t run out of memory before
AFL++ resets the target process.</p>

<p>If not for (1), it also helps with isolation (4), as different tests are
less likely contaminated with uninitialized memory from previous tests.</p>

<p>As an exception, if you want your destructor included in the fuzz test,
then use it in the test. Also, it’s easy to exhaust non-memory resources,
particularly file descriptors, and you may need to <a href="https://man7.org/linux/man-pages/man2/close_range.2.html">clean those up</a>
in order to fuzz test reliably.</p>

<p>Of course, if the target uses <a href="/blog/2023/09/27/">arena allocation</a> then none of this
matters! It also makes for perfect isolation, as even addresses won’t vary
between tests.</p>

<h3 id="7-use-a-memory-file-descriptor-to-back-named-paths">(7) Use a memory file descriptor to back named paths</h3>

<p>Many interfaces are, shall we say, <em>not so well-designed</em> and only accept
input from a named file system path, insisting on opening and reading the
file themselves. Testing such interfaces presents challenges, especially
if you’re interested in parallel fuzzing. Fortunately there’s usually an
easy out: Create a memory file descriptor and use its <code class="language-plaintext highlighter-rouge">/proc</code> name.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">memfd_create</span><span class="p">(</span><span class="s">"fuzz"</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="mi">3</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(...)</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="n">ftruncate</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">pwrite</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">myprogram</span><span class="p">(</span><span class="s">"/proc/self/fd/3"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With standard input as 0, output as 1, and error as 2, I’ve assumed the
memory file descriptor will land on 3, which makes the test code a little
simpler. If it’s not 3 then something’s probably gone wrong anyway, and
aborting is the best option. If you don’t want to assume, use <code class="language-plaintext highlighter-rouge">snprintf</code>
or whatever to construct the path name from <code class="language-plaintext highlighter-rouge">fd</code>.</p>
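
<p>If you’d rather not assume, constructing the path could look like this
sketch. The helper name <code class="language-plaintext highlighter-rouge">fdpath</code> is hypothetical:</p>

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: write the /proc path naming descriptor fd into dst.
// Returns 1 on success, 0 if the path did not fit in cap bytes.
static int fdpath(char *dst, size_t cap, int fd)
{
    int n = snprintf(dst, cap, "/proc/self/fd/%d", fd);
    return n > 0 && (size_t)n < cap;
}
```

<p>Build the path once, right after <code class="language-plaintext highlighter-rouge">memfd_create</code>, then pass the same
string to the target on every iteration.</p>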

<p>Using <code class="language-plaintext highlighter-rouge">pwrite</code> (instead of <code class="language-plaintext highlighter-rouge">write</code>) leaves the file description offset at
the beginning of the file.</p>

<p>Thanks to the memory file descriptor, fuzz test data doesn’t land in
permanent storage, so less wear and tear on your SSD from the occasional
flush. Because of <code class="language-plaintext highlighter-rouge">/proc</code>, the file is unique to the process despite the
common path name, so no problems parallel fuzzing. No cleanup needed,
either.</p>

<p>If the program wants a file descriptor — i.e. it wants a socket because
you’re fuzzing some internal function — pass the file descriptor directly:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">myprogram</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
</code></pre></div></div>

<p>If it accepts a <code class="language-plaintext highlighter-rouge">FILE *</code>, you <em>could</em> <code class="language-plaintext highlighter-rouge">fopen</code> the <code class="language-plaintext highlighter-rouge">/proc</code> path, but better
to use <code class="language-plaintext highlighter-rouge">fmemopen</code> to create a <code class="language-plaintext highlighter-rouge">FILE *</code> on the buffer:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">myprogram</span><span class="p">(</span><span class="n">fmemopen</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">));</span>
</code></pre></div></div>

<p>Note how, per (6), we don’t need to bother with <code class="language-plaintext highlighter-rouge">fclose</code> because it’s not
associated with a file descriptor.</p>
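
<p>Here is a slightly fuller sketch of the stream case. The target
<code class="language-plaintext highlighter-rouge">count_lines</code> and the wrapper <code class="language-plaintext highlighter-rouge">test_one</code> are hypothetical stand-ins, and
in line with (5) the stream wraps a size-fit copy rather than the fuzz
buffer itself:</p>

```c
#define _POSIX_C_SOURCE 200809L  // for fmemopen
#include <stdio.h>

// Hypothetical stand-in for a target that reads its input from a stream:
// counts the newline bytes in f.
static long count_lines(FILE *f)
{
    long n = 0;
    for (int c; (c = fgetc(f)) != EOF;) {
        n += c == '\n';
    }
    return n;
}

// Wrap a size-fit buffer in a read-only stream and pass it to the target.
// Returns -1 if the stream could not be created.
static long test_one(void *src, size_t len)
{
    FILE *f = fmemopen(src, len, "rb");
    if (!f) {
        return -1;
    }
    long r = count_lines(f);
    fclose(f);  // optional per (6); shown for completeness
    return r;
}
```

<p>Check for a null return, since some libc versions may reject a
zero-length buffer, and the fuzzer will certainly try empty inputs.</p>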

<h3 id="8-configure-the-target-for-smaller-buffers">(8) Configure the target for smaller buffers</h3>

<p>A common sight in <a href="http://catb.org/jargon/html/C/C-Programmers-Disease.html">diseased programs</a> are “generous” fixed buffer
sizes:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define MY_MAX_BUFFER_LENGTH 65536
</span>
<span class="kt">void</span> <span class="nf">example</span><span class="p">(...)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">path</span><span class="p">[</span><span class="n">PATH_MAX</span><span class="p">];</span>  <span class="c1">// typically 4,096</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">MY_MAX_BUFFER_LENGTH</span><span class="p">];</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>These huge buffers tend to hide bugs. Turn those stones over! It takes a
lot of fuzzing time to max them out and excite the unhappy paths — or the
super-unhappy paths, overflows. Better if the fuzz test can reach worst
case conditions quickly and explore the execution paths out of it.</p>

<p>So when you see these, cut them way down, possibly using (3). Change 65536
to, say, 16 and see what happens. If fuzzing finds a crash on the short
buffer, typically extending the input to crash on the original buffer size
is straightforward, e.g. repeat one of the bytes even more than it already
repeats.</p>

<h3 id="conclusion-and-samples">Conclusion and samples</h3>

<p>Hopefully something here will help you catch a defect that would have
otherwise gone unnoticed. Even better, perhaps awareness of these fuzzing
techniques will prevent the bug in the first place. Thanks to my template,
some solid tooling, and the know-how in this article, I can whip up a fuzz
test in a couple of minutes. But that ease means I discard it just as
casually, and so I don’t take time to capture and catalog most. If you’d
like to see some samples, <a href="https://old.reddit.com/r/C_Programming/comments/15wouat/_/jx2ld4a/">I do have an old, short list</a>. Perhaps
after another kiloproject of fuzz testing I’ll pick up more techniques.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Rules to avoid common extended inline assembly mistakes</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/12/20/"/>
    <id>urn:uuid:594e546f-15c7-4834-bece-9c9f24122a01</id>
    <updated>2024-12-20T19:46:48Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>GCC and Clang inline assembly is an interface between high and low level
programming languages. It is subtle and treacherous. Many are ensnared in
its traps, usually unknowingly. As such, the <code class="language-plaintext highlighter-rouge">asm</code> keyword is essentially
the <code class="language-plaintext highlighter-rouge">unsafe</code> keyword of C and C++. Nearly every inline assembly tutorial,
including <a href="https://web.archive.org/web/20241216071150/https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html">the awful ibiblio page</a> at the top of search engines for
decades, propagates fundamental, serious mistakes, and <em>most examples are
incorrect</em>. The dangerous part is that the examples <em>usually</em> produce the
expected results! The situation is dire. This article isn’t a tutorial,
but basic rules to avoid the most common mistakes, or to spot them in code
review.</p>

<p><strong>The focus is entirely <em>extended assembly</em>, and not <em>basic assembly</em></strong>,
which has different rules. The former is any inline assembly statement
with constraints or clobbers. That is, there’s a colon <code class="language-plaintext highlighter-rouge">:</code> token between
the <code class="language-plaintext highlighter-rouge">asm</code> parentheses. Basic assembly is blunt and has fewer uses, mostly
at the top level or in <a href="/blog/2023/03/23/">“naked” functions</a>, making misuse less
likely.</p>

<h3 id="1-avoid-inline-assembly-if-possible">(1) Avoid inline assembly if possible</h3>

<p>Because it’s so treacherous, the first rule is to avoid it if at all
possible. Modern compilers are loaded with intrinsics and built-ins that
replace nearly all the old inline assembly use cases. They allow access to
low level features from the high level language. No need to bridge the gap
between low and high yourself when there’s an intrinsic.</p>

<p>Compilers do not have built-ins for system calls, and occasionally <a href="/blog/2024/01/28/">lack a
useful intrinsic</a>. Other times you might be building <a href="https://github.com/skeeto/scratch/blob/fbd3260e/misc/buddy.c#L594-#L616">foundational
infrastructure</a>. These remaining cases are mostly about interacting
with external interfaces, not optimization nor performance.</p>

<h3 id="2-it-should-nearly-always-be-volatile">(2) It should nearly always be volatile</h3>

<p>Falling right out of rule (1), the remaining inline assembly cases nearly
always have side effects beyond output constraints. That includes memory
accesses, and it certainly includes system calls. Because of this, inline
assembly should usually have the <code class="language-plaintext highlighter-rouge">volatile</code> qualifier.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asm</span> <span class="nf">volatile</span> <span class="p">(</span> <span class="p">...</span> <span class="p">);</span>
</code></pre></div></div>

<p>This prevents compilers from eliding or re-ordering the assembly. As a
special rule, inline assembly lacking output constraints is implicitly
volatile. Despite this, <em>please use <code class="language-plaintext highlighter-rouge">volatile</code> anyway!</em> When I do not see
<code class="language-plaintext highlighter-rouge">volatile</code> it’s likely a defect. Stopping to consider if it’s this special
case slows understanding and impedes code review.</p>

<p>Tutorials often use <code class="language-plaintext highlighter-rouge">__volatile__</code>. Do not do this. It is an ancient alias
keyword to support pre-standard compilers lacking the <code class="language-plaintext highlighter-rouge">volatile</code> keyword.
This is not your situation. When I see <code class="language-plaintext highlighter-rouge">__volatile__</code> it likely means you
copy-pasted the inline assembly from somewhere without understanding it.
It’s a red flag that draws my attention for even more careful review.</p>

<p>Side note: <code class="language-plaintext highlighter-rouge">__asm</code> or <code class="language-plaintext highlighter-rouge">__asm__</code> is fine, and even required in some cases
(e.g. <code class="language-plaintext highlighter-rouge">-std=cXX</code>). I usually write it <code class="language-plaintext highlighter-rouge">asm</code>.</p>

<h3 id="3-it-probably-needs-a-memory-clobber">(3) It probably needs a memory clobber</h3>

<p>The <code class="language-plaintext highlighter-rouge">"memory"</code> clobber is orthogonal to <code class="language-plaintext highlighter-rouge">volatile</code>, each serving different
purposes. It’s less often needed than <code class="language-plaintext highlighter-rouge">volatile</code>, but typical remaining
inline assembly cases require it. If memory is accessed in any way while
executing the assembly, you need a memory clobber. This includes most
system calls, and definitely a generic <code class="language-plaintext highlighter-rouge">syscall</code> wrapper.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">asm</span> <span class="nf">volatile</span> <span class="p">(...</span> <span class="o">:</span> <span class="s">"memory"</span><span class="p">);</span>
</code></pre></div></div>

<p>In code review, if you do not see a <code class="language-plaintext highlighter-rouge">"memory"</code> clobber, give it extra
scrutiny. It’s probably missing. If it’s truly unnecessary, I suggest
documenting such in a comment so that reviewers know the omission is
considered and intentional.</p>

<p>The clobber prevents compilers from re-ordering loads and stores around
the assembly. It would be disastrous, for example, if a <code class="language-plaintext highlighter-rouge">write(2)</code> system
call occurred before the program populated the output buffer! In this
case, <code class="language-plaintext highlighter-rouge">volatile</code> would prevent followup <code class="language-plaintext highlighter-rouge">write(2)</code> from being optimized
out while <code class="language-plaintext highlighter-rouge">"memory"</code> forces memory stores to occur before the system call.</p>

<h3 id="4-never-modify-input-constraints">(4) Never modify input constraints</h3>

<p>It’s easy not to modify inputs, so this is mostly about ignorance, but
this rule is broken with shocking frequency. Most of the time you can get
away with it, right up until certain configurations have a heisenbug. In
most cases this can be fixed by changing an input into a read-write output
constraint with <code class="language-plaintext highlighter-rouge">"+"</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asm</span> <span class="nf">volatile</span> <span class="p">(</span><span class="s">"..."</span> <span class="o">::</span> <span class="s">"r"</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">:</span> <span class="p">...);</span>  <span class="c1">// before</span>
<span class="n">asm</span> <span class="nf">volatile</span> <span class="p">(</span><span class="s">"..."</span> <span class="o">:</span> <span class="s">"+r"</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">:</span> <span class="p">...);</span>  <span class="c1">// after</span>
</code></pre></div></div>

<p>If you hadn’t been using <code class="language-plaintext highlighter-rouge">volatile</code> (in violation of rule 2) then now
suddenly you’d need it because there’s an output constraint. This happens
often.</p>

<h3 id="5-never-call-functions-from-inline-assembly">(5) Never call functions from inline assembly</h3>

<p>Many things can go wrong because the semantics cannot be expressed using
inline assembly constraints. The stack may not be aligned, and you’ll
clobber the redzone. (Yes, there’s a <code class="language-plaintext highlighter-rouge">"redzone"</code> clobber, but it’s
insufficient to actually make a function call.) Do not do it. Tutorials
like to show it because it makes for a simple demonstration, but all those
examples are littered with defects.</p>

<p>System calls are fine. Basic assembly may call functions when used outside
of non-naked functions. The <code class="language-plaintext highlighter-rouge">goto</code> qualifier, used correctly, allows jumps
to be safely expressed to the compiler. Just don’t use <code class="language-plaintext highlighter-rouge">call</code> in extended
assembly.</p>

<h3 id="6-do-not-define-absolute-assembly-labels">(6) Do not define absolute assembly labels</h3>

<p>That is, if you need to jump within your assembly block, such as for a
loop, do not write a named label:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>myloop:
    ...
    jz myloop
</code></pre></div></div>

<p>Your inline assembly is part of a function, and that function may be
cloned or inlined, in which case there will be <em>multiple copies of your
assembly block</em> in the translation unit. The assembler will see duplicate
label names and reject the program. Until that function is inlined,
perhaps at a high optimization level, this will likely work as expected.
On the plus side it’s a loud compile time error when it doesn’t work.</p>

<p>In inline assembly you can have the compiler generate a unique label with
<code class="language-plaintext highlighter-rouge">%=</code>, but my preferred solution is the <a href="https://sourceware.org/binutils/docs/as/Symbol-Names.html">local labels</a> feature of the
assembler:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0:
    ...
    jz 0b
</code></pre></div></div>

<p>In this case the assembler generates unique labels, and the number <code class="language-plaintext highlighter-rouge">0</code>
isn’t the literal label name. <code class="language-plaintext highlighter-rouge">0b</code> (“backward”) refers to the previous <code class="language-plaintext highlighter-rouge">0</code>
label, and <code class="language-plaintext highlighter-rouge">0f</code> (“forward”) would refer to the next <code class="language-plaintext highlighter-rouge">0</code> label. Perfectly
unambiguous.</p>
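
<p>As a sketch of why this matters, the function below writes a loop in inline assembly using a local label. Because it’s <code class="language-plaintext highlighter-rouge">static inline</code> and called from two places, the compiler may emit the block twice, which is harmless with <code class="language-plaintext highlighter-rouge">0:</code> but would be a duplicate symbol with a named label. It assumes x86-64 with GCC or Clang in the default AT&amp;T syntax, with a plain loop as the fallback elsewhere. The function names are hypothetical.</p>

```cpp
#include <cassert>

// Sum n + (n-1) + ... + 1 with the loop written in assembly.
static inline long sum_down(long n)
{
    long sum = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    if (n > 0) {
        asm volatile (
            "0:\n"
            "    add %1, %0\n"   // sum += n
            "    dec %1\n"       // n--, sets ZF when n reaches zero
            "    jnz 0b\n"       // back to the nearest previous 0:
            : "+r"(sum), "+r"(n)
            :
            : "cc", "memory"
        );
    }
#else
    for (; n > 0; n--) {  // fallback for other targets
        sum += n;
    }
#endif
    return sum;
}

// Two call sites, so inlining can duplicate the assembly block.
static long sum_twice(long a, long b)
{
    return sum_down(a) + sum_down(b);
}
```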

<h3 id="naturally-occurring-practice-problems">Naturally occurring practice problems</h3>

<p>Now that you’ve made it this far, here’s an exercise for practice: Search
online for “inline assembly tutorial” and count the defects you find by
applying my 6 rules. You’ll likely find at least one per result that isn’t
<a href="https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html">official compiler documentation</a>. Besides tutorials and reviewing
real programs, you could <a href="/blog/2024/11/10/">ask an LLM to generate inline assembly</a>, as
they’ve been trained to produce these common defects.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Slim Reader/Writer Locks are neato</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/10/03/"/>
    <id>urn:uuid:0bbd925e-c012-4711-b513-b34cd0357bfa</id>
    <updated>2024-10-03T22:40:13Z</updated>
    <category term="win32"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>I’m 18 years late, but <a href="https://learn.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks">Slim Reader/Writer Locks</a> have a fantastic
interface: pointer-sized (“slim”), zero-initialized, and non-allocating.
Lacking cleanup, they compose naturally with <a href="/blog/2023/09/27/">arena allocation</a>.
Sounds like a futex? That’s because they’re built on futexes introduced at
the same time. They’re also complemented by <a href="https://learn.microsoft.com/en-us/windows/win32/sync/condition-variables">condition variables</a>
with the same desirable properties. My only quibble is that slim locks
<a href="/blog/2022/10/05/">could easily have been 32-bit objects</a>, but it hardly matters. This
article, while treating <a href="/blog/2023/05/31/">Win32 as a foreign interface</a>, discusses a
paper-thin C++ wrapper interface around lock and condition variables, in
<a href="/blog/2024/04/14/">my own style</a>.</p>

<p>If you’d like to see/try a complete, working demonstration before diving
into the details: <a href="https://gist.github.com/skeeto/42adc0c90a156d4457422e034be697e8"><code class="language-plaintext highlighter-rouge">demo.cpp</code></a>. We’re going to build this from the
ground up, so let’s establish a few primitive integer definitions:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">b32</span> <span class="o">=</span> <span class="kt">signed</span><span class="p">;</span>
<span class="k">using</span> <span class="n">i32</span> <span class="o">=</span> <span class="kt">signed</span><span class="p">;</span>
<span class="k">using</span> <span class="n">uz</span>  <span class="o">=</span> <span class="k">decltype</span><span class="p">(</span><span class="mi">0u</span><span class="n">z</span><span class="p">);</span>
</code></pre></div></div>

<p>Think of <code class="language-plaintext highlighter-rouge">uz</code> as being like <code class="language-plaintext highlighter-rouge">uintptr_t</code>. This implementation will support both
32-bit and 64-bit targets, and we’ll need it as the basis for locks and
condition variables:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="n">Lock</span> <span class="o">:</span> <span class="n">uz</span><span class="p">;</span>
<span class="k">enum</span> <span class="n">Cond</span> <span class="o">:</span> <span class="n">uz</span><span class="p">;</span>
</code></pre></div></div>

<p>Opaque enums provide additional type safety: They have the properties of
an integer, including trivial destruction, but are distinct types which
compilers forbid mixing with other integers. We can’t, say, accidentally
cross condition variable and lock parameters — my main concern. Aside from
zero-initialization, we do not actually care about the values of these
variables, so enumerators are unnecessary. (Caveat: GDB cannot display
opaque enums, which is slightly irritating.)</p>
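
<p>A minimal sketch of that type safety, checked at compile time. Here <code class="language-plaintext highlighter-rouge">std::size_t</code> stands in for <code class="language-plaintext highlighter-rouge">decltype(0uz)</code> so that it builds without C++23, and <code class="language-plaintext highlighter-rouge">which</code> is a hypothetical overload set, not part of the wrapper.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

using uz = std::size_t;  // assumption: stand-in for decltype(0uz)

enum Lock : uz;
enum Cond : uz;

// Distinct types, each with the size and triviality of an integer.
static_assert(!std::is_same<Lock, Cond>::value, "distinct types");
static_assert(sizeof(Lock) == sizeof(uz), "integer-sized");
static_assert(std::is_trivially_destructible<Lock>::value, "no cleanup");

// No implicit conversion from integers, nor between the pointer types.
static_assert(!std::is_convertible<uz, Lock>::value, "no int to Lock");
static_assert(!std::is_convertible<Lock *, Cond *>::value, "no crossing");

// Overloads can therefore select on the kind of object.
static int which(Lock *) { return 1; }
static int which(Cond *) { return 2; }
```

<p>Passing a <code class="language-plaintext highlighter-rouge">Cond *</code> where a <code class="language-plaintext highlighter-rouge">Lock *</code> is expected fails to compile, which is the point.</p>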

<p>The documentation doesn’t explicitly mention zero initialization, but the
official <code class="language-plaintext highlighter-rouge">*_INIT</code> constants are defined as zero. That locks in zero at the
ABI level, so we can count on it.</p>

<p>All the functions we’ll need are exported by <code class="language-plaintext highlighter-rouge">kernel32.dll</code>. Locks have
two variations on lock/unlock: “exclusive” (write) and “shared” (read).
There are also “try” versions, but I won’t be using them.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define W32(r, p) extern "C" __declspec(dllimport) r __stdcall p noexcept
</span><span class="n">W32</span><span class="p">(</span><span class="kt">void</span><span class="p">,</span> <span class="n">AcquireSRWLockExclusive</span><span class="p">(</span><span class="n">Lock</span> <span class="o">*</span><span class="p">));</span>
<span class="n">W32</span><span class="p">(</span><span class="kt">void</span><span class="p">,</span> <span class="n">AcquireSRWLockShared</span><span class="p">(</span><span class="n">Lock</span> <span class="o">*</span><span class="p">));</span>
<span class="n">W32</span><span class="p">(</span><span class="kt">void</span><span class="p">,</span> <span class="n">ReleaseSRWLockExclusive</span><span class="p">(</span><span class="n">Lock</span> <span class="o">*</span><span class="p">));</span>
<span class="n">W32</span><span class="p">(</span><span class="kt">void</span><span class="p">,</span> <span class="n">ReleaseSRWLockShared</span><span class="p">(</span><span class="n">Lock</span> <span class="o">*</span><span class="p">));</span>
</code></pre></div></div>

<p>Declaring Win32 functions in C++ is a mouthful, and everything must be
written in just the right order, but it’s mostly tucked away in a macro.
Usually there’s a stack discipline to these locks, so an RAII scoped guard
is in order:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Guard</span> <span class="p">{</span>
    <span class="n">Lock</span> <span class="o">*</span><span class="n">l</span><span class="p">;</span>
    <span class="n">Guard</span><span class="p">(</span><span class="n">Lock</span> <span class="o">*</span><span class="n">l</span><span class="p">)</span> <span class="o">:</span> <span class="n">l</span><span class="p">{</span><span class="n">l</span><span class="p">}</span> <span class="p">{</span> <span class="n">AcquireSRWLockExclusive</span><span class="p">(</span><span class="n">l</span><span class="p">);</span> <span class="p">}</span>
    <span class="o">~</span><span class="n">Guard</span><span class="p">()</span>              <span class="p">{</span> <span class="n">ReleaseSRWLockExclusive</span><span class="p">(</span><span class="n">l</span><span class="p">);</span> <span class="p">}</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">RGuard</span> <span class="p">{</span>
    <span class="n">Lock</span> <span class="o">*</span><span class="n">l</span><span class="p">;</span>
    <span class="n">RGuard</span><span class="p">(</span><span class="n">Lock</span> <span class="o">*</span><span class="n">l</span><span class="p">)</span> <span class="o">:</span> <span class="n">l</span><span class="p">{</span><span class="n">l</span><span class="p">}</span> <span class="p">{</span> <span class="n">AcquireSRWLockShared</span><span class="p">(</span><span class="n">l</span><span class="p">);</span> <span class="p">}</span>
    <span class="o">~</span><span class="n">RGuard</span><span class="p">()</span>              <span class="p">{</span> <span class="n">ReleaseSRWLockShared</span><span class="p">(</span><span class="n">l</span><span class="p">);</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Dead simple. (What about <a href="https://en.cppreference.com/w/cpp/language/rule_of_three">rule of three</a>? Instead of working around
this language design flaw, <a href="https://quuxplusone.github.io/blog/2023/05/05/deprecated-copy-with-dtor/">reach into the distant future</a> where
it’s been fixed: <code class="language-plaintext highlighter-rouge">-Werror=deprecated-copy-dtor</code>.) Usage might look like:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Example</span> <span class="p">{</span>
    <span class="n">Lock</span> <span class="n">lock</span> <span class="o">=</span> <span class="p">{};</span>
    <span class="n">i32</span>  <span class="n">value</span><span class="p">;</span>
<span class="p">};</span>

<span class="n">i32</span> <span class="n">incr</span><span class="p">(</span><span class="n">Example</span> <span class="o">*</span><span class="n">e</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Guard</span> <span class="n">g</span><span class="p">(</span><span class="o">&amp;</span><span class="n">e</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">++</span><span class="n">e</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note the <code class="language-plaintext highlighter-rouge">= {}</code> to guarantee the lock is always ready for use. It gets
more interesting with condition variables in the mix. That’s three more
functions:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">W32</span><span class="p">(</span><span class="n">b32</span><span class="p">,</span>  <span class="n">SleepConditionVariableSRW</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="p">,</span> <span class="n">Lock</span> <span class="o">*</span><span class="p">,</span> <span class="n">i32</span><span class="p">,</span> <span class="n">b32</span><span class="p">));</span>
<span class="n">W32</span><span class="p">(</span><span class="kt">void</span><span class="p">,</span> <span class="n">WakeAllConditionVariable</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="p">));</span>
<span class="n">W32</span><span class="p">(</span><span class="kt">void</span><span class="p">,</span> <span class="n">WakeConditionVariable</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="p">));</span>
</code></pre></div></div>

<p>The last parameter on <a href="https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleepconditionvariablesrw">SleepConditionVariableSRW</a> indicates if the
lock was acquired shared. Why do locks have distinct acquire and release
functions while condition variables use a flag for the same purpose? Beats
me. I’ll unfold it into two functions, selected by type, with a default
infinite wait:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b32</span> <span class="nf">wait</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">Guard</span> <span class="o">*</span><span class="n">g</span><span class="p">,</span> <span class="n">i32</span> <span class="n">ms</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">SleepConditionVariableSRW</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">l</span><span class="p">,</span> <span class="n">ms</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="n">b32</span> <span class="n">wait</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="n">c</span><span class="p">,</span> <span class="n">RGuard</span> <span class="o">*</span><span class="n">g</span><span class="p">,</span> <span class="n">i32</span> <span class="n">ms</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">SleepConditionVariableSRW</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">l</span><span class="p">,</span> <span class="n">ms</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Usage might look like:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="n">RGuard</span> <span class="nf">g</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span> <span class="n">remaining</span><span class="p">;)</span> <span class="p">{</span>
    <span class="n">wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">done</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">g</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The other side is nothing more than a rename (but could also be
<a href="/blog/2023/08/27/">accomplished through linking</a>):</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">signal</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">WakeConditionVariable</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">broadcast</span><span class="p">(</span><span class="n">Cond</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">WakeAllConditionVariable</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And a couple examples of its usage:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">Guard</span> <span class="nf">g</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span> <span class="o">!--</span><span class="n">remaining</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">signal</span><span class="p">(</span><span class="o">&amp;</span><span class="n">done</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Or:</span>

<span class="n">Guard</span> <span class="n">g</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>
<span class="n">ready</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">broadcast</span><span class="p">(</span><span class="o">&amp;</span><span class="n">init</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">remaining</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">done</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">g</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
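
<p>For readers without a Windows toolchain at hand, the same countdown can be sketched with the C++17 standard library. This is a portable analog, not the wrapper above: <code class="language-plaintext highlighter-rouge">std::shared_mutex</code> and <code class="language-plaintext highlighter-rouge">std::condition_variable_any</code> are neither pointer-sized nor zero-initialized, and the names <code class="language-plaintext highlighter-rouge">Countdown</code>, <code class="language-plaintext highlighter-rouge">finish</code>, <code class="language-plaintext highlighter-rouge">wait_done</code>, and <code class="language-plaintext highlighter-rouge">run_countdown</code> are invented for the example.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

struct Countdown {
    std::shared_mutex           lock;
    std::condition_variable_any done;
    int                         remaining = 0;
};

// Worker side: exclusive lock, in the role of Guard plus signal().
static void finish(Countdown *c)
{
    std::unique_lock<std::shared_mutex> g(c->lock);
    if (!--c->remaining) {
        c->done.notify_one();
    }
}

// Waiting side: shared lock, in the role of RGuard plus wait().
static int wait_done(Countdown *c)
{
    std::shared_lock<std::shared_mutex> g(c->lock);
    while (c->remaining) {
        c->done.wait(g);  // releases g while asleep, reacquires after
    }
    return c->remaining;
}

// Start nthreads workers, wait for all of them, return the final count.
static int run_countdown(int nthreads)
{
    Countdown c;
    c.remaining = nthreads;
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; i++) {
        threads.emplace_back(finish, &c);
    }
    int result = wait_done(&c);
    for (auto &t : threads) {
        t.join();
    }
    return result;
}
```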

<p>A satisfying, powerful synchronization interface with hardly any code!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Giving C++ std::regex a C makeover</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/09/04/"/>
    <id>urn:uuid:83fb81ed-290e-4bc7-87bd-d0bbc6c01d25</id>
    <updated>2024-09-04T17:15:07Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p>Suppose you’re working in C using one of the major toolchains — that is,
it’s mainly a C++ implementation — and you need regular expressions. You
could integrate a library, but there’s a regex implementation in the C++
standard library included with your compiler, just within reach. As a
resourceful engineer, using an asset already in hand seems prudent. But
it’s a C++ interface, and you’re using C instead of C++ for a reason,
perhaps <em>to avoid dealing with C++</em>. Have no worries. This article is
about wrapping <a href="https://en.cppreference.com/w/cpp/regex"><code class="language-plaintext highlighter-rouge">std::regex</code></a> in a tidy C interface which not only
hides all the C++ machinery, but <em>utterly tames it</em>. It’s not so much
practical as a potpourri of interesting techniques.</p>

<p>If you’d like to skip ahead, here’s the full source up front. Tested with
<a href="https://github.com/skeeto/w64devkit">w64devkit</a>, MSVC <code class="language-plaintext highlighter-rouge">cl</code>, and <code class="language-plaintext highlighter-rouge">clang-cl</code>: <strong><a href="https://github.com/skeeto/scratch/tree/master/regex-wrap">scratch/regex-wrap</a></strong></p>

<h3 id="interface-design">Interface design</h3>

<p>The C interface I came up with, <code class="language-plaintext highlighter-rouge">regex.h</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#pragma once
#include</span> <span class="cpf">&lt;stddef.h&gt;</span><span class="cp">
</span>
<span class="cp">#define S(s) (str){s, sizeof(s)-1}
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">char</span>     <span class="o">*</span><span class="n">data</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span> <span class="n">str</span><span class="p">;</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">}</span> <span class="n">arena</span><span class="p">;</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="n">regex</span> <span class="n">regex</span><span class="p">;</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">str</span>      <span class="o">*</span><span class="n">data</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span> <span class="n">strlist</span><span class="p">;</span>

<span class="n">regex</span>  <span class="o">*</span><span class="nf">regex_new</span><span class="p">(</span><span class="n">str</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="p">);</span>
<span class="n">strlist</span> <span class="nf">regex_match</span><span class="p">(</span><span class="n">regex</span> <span class="o">*</span><span class="p">,</span> <span class="n">str</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>Longtime readers will find it familiar: <a href="/blog/2023/10/08/">my favorite</a> non-owning,
counted strings in place of null-terminated strings (similar to C++
<code class="language-plaintext highlighter-rouge">std::string_view</code>) and <a href="/blog/2023/09/27/">arena allocation</a>. Yes, such fundamental
types wouldn’t “belong” to a regex library like this, but imagine they’re
standardized by the project or whatever. Also, this is purely a C header,
not a C/C++ polyglot, and will not be used by the C++ portion.</p>

<p>In particular note the lack of “free” functions. <strong>The regex engine
allocates everything in the arena</strong>, including all temporary working
memory used while compiling, matching, etc. So in a sense, it could be
called <a href="/blog/2018/06/10/">a <em>non-allocating library</em></a>. This requires a bit of C++
abuse: I will not call some C++ regex destructors. It shouldn’t matter
because they only redundantly manage memory in the arena.  (If regex
objects are holding file handles or something else unnecessary then the
implementation is so poor as to not be worth using, and we should just use a
better regex library.)</p>

<p>Now’s a good time to mention a caveat: In order to pull this off the regex
library lives in its own Dynamic-Link Library with its own copy of the C++
standard library, i.e. statically linked. My demo is Windows-only, but
this concept theoretically extends to shared objects on Linux. Since it’s
a C interface that doesn’t expose standard library objects, the DLL can be
used by programs compiled with different toolchains. Though that wouldn’t
apply to my inciting hypothetical.</p>

<p>Example usage:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">regex</span>  <span class="o">*</span><span class="n">re</span> <span class="o">=</span> <span class="n">regex_new</span><span class="p">(</span><span class="n">S</span><span class="p">(</span><span class="s">"(</span><span class="se">\\</span><span class="s">w+)"</span><span class="p">),</span> <span class="n">perm</span><span class="p">);</span>
<span class="n">str</span>     <span class="n">s</span>  <span class="o">=</span> <span class="n">S</span><span class="p">(</span><span class="s">"Hello, world! This is a test."</span><span class="p">);</span>
<span class="n">strlist</span> <span class="n">m</span>  <span class="o">=</span> <span class="n">regex_match</span><span class="p">(</span><span class="n">re</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">perm</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">m</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%2td = %.*s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">m</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">len</span><span class="p">,</span> <span class="n">m</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">data</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This program prints:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 0 = Hello
 1 = world
 2 = This
 3 = is
 4 = a
 5 = test
</code></pre></div></div>

<p>If matching lots of source strings, scope the arena to the loop and then
the results, and any regex working memory, are automatically freed in O(1)
at the end of each iteration:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">ninputs</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">arena</span>   <span class="n">scratch</span> <span class="o">=</span> <span class="o">*</span><span class="n">perm</span><span class="p">;</span>
    <span class="n">strlist</span> <span class="n">matches</span> <span class="o">=</span> <span class="n">regex_match</span><span class="p">(</span><span class="n">re</span><span class="p">,</span> <span class="n">inputs</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
    <span class="c1">// ... consume matches ...</span>
<span class="p">}</span>
</code></pre></div></div>
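
<p>This works because an arena is just a pair of pointers, so copying it by value creates a scratch view whose allocations vanish when the copy goes out of scope. A small sketch, with a hypothetical byte-granular <code class="language-plaintext highlighter-rouge">take</code> standing in for the real allocator:</p>

```cpp
#include <cassert>
#include <cstddef>

struct arena {
    char *beg;
    char *end;
};

// Hypothetical helper: carve size bytes off the end, no alignment.
static char *take(arena *a, std::ptrdiff_t size)
{
    if (size < 0 || size > a->end - a->beg) {
        return nullptr;  // out of memory
    }
    return a->end -= size;
}

// Allocate from a by-value copy on each iteration. Returns how many
// bytes remain in the original arena afterward.
static std::ptrdiff_t loop_with_scratch(arena *perm, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        arena scratch = *perm;   // O(1) "reset": just two pointers
        char *p = take(&scratch, 16);
        if (p) {
            p[0] = (char)i;      // use the temporary memory
        }
    }                            // scratch discarded, perm untouched
    return perm->end - perm->beg;
}
```

<p>However many iterations run, the original arena has exactly as much room afterward as it had before the loop.</p>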

<h3 id="c-implementation">C++ implementation</h3>

<p>On the C++ side the first thing I do is replace <code class="language-plaintext highlighter-rouge">new</code> and <code class="language-plaintext highlighter-rouge">delete</code>, which
is how I force it to allocate from the arena. This replaces <code class="language-plaintext highlighter-rouge">new</code>/<code class="language-plaintext highlighter-rouge">delete</code>
<em>globally</em>, but recall that the regex library has its own, private C++
implementation. Replacements apply only to itself even if there’s other
C++ present in the process. If this is the only C++ in the process then it
doesn’t require such careful isolation.</p>

<p>I can’t tell <code class="language-plaintext highlighter-rouge">std::regex</code> about the arena — it calls <code class="language-plaintext highlighter-rouge">operator new</code> the
usual way, without extra arguments — so I have to smuggle it in through a
thread-local variable:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">thread_local</span> <span class="n">arena</span> <span class="o">*</span><span class="n">perm</span><span class="p">;</span>
</code></pre></div></div>

<p>If I’m sure the library is only used by a single thread then I can omit
<code class="language-plaintext highlighter-rouge">thread_local</code>, but it’s useful here to demonstrate and measure. Using it
in my operator replacements:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="o">*</span><span class="k">operator</span> <span class="nf">new</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">size</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">align_val_t</span> <span class="n">align</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">arena</span>    <span class="o">*</span><span class="n">a</span>     <span class="o">=</span> <span class="n">perm</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">ssize</span> <span class="o">=</span> <span class="n">size</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">pad</span>   <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">&amp;</span> <span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ssize</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">ssize</span> <span class="o">&gt;</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">throw</span> <span class="n">std</span><span class="o">::</span><span class="n">bad_alloc</span><span class="p">{};</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-=</span> <span class="n">size</span> <span class="o">+</span> <span class="n">pad</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="o">*</span><span class="k">operator</span> <span class="k">new</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">size</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="k">operator</span> <span class="k">new</span><span class="p">(</span>
        <span class="n">size</span><span class="p">,</span>
        <span class="n">std</span><span class="o">::</span><span class="n">align_val_t</span><span class="p">(</span><span class="n">__STDCPP_DEFAULT_NEW_ALIGNMENT__</span><span class="p">)</span>
    <span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Starting in C++17, replacing the global allocator requires definitions for
both plain <code class="language-plaintext highlighter-rouge">new</code>/<code class="language-plaintext highlighter-rouge">delete</code> and aligned <code class="language-plaintext highlighter-rouge">new</code>/<code class="language-plaintext highlighter-rouge">delete</code>. The <a href="https://en.cppreference.com/w/cpp/memory/new/operator_new">many other
variants</a>, including arrays, call these four and so may be skipped.
Allocating over-aligned objects isn’t a special case for arenas, so I
implemented plain <code class="language-plaintext highlighter-rouge">new</code> by calling aligned <code class="language-plaintext highlighter-rouge">new</code>. I’d prefer to <a href="/blog/2024/04/14/">allocate
through a template</a> so that I can “see” the type, but that’s not an
option in this case.</p>

<p>After converting to signed sizes <a href="/blog/2024/05/24/">because they’re simpler</a>, it’s the
usual from-the-end allocation. I prefer <code class="language-plaintext highlighter-rouge">-fno-exceptions</code> but <code class="language-plaintext highlighter-rouge">std::regex</code>
is inherently <em>exceptional</em> — and I mean that in at least two bad ways —
so they’re required. The good news is this library gracefully and reliably
handles out-of-memory errors. (The arena makes this trivial to test, so
try it for yourself!)</p>

<p>I added a little extra flair replacing <code class="language-plaintext highlighter-rouge">delete</code>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="k">operator</span> <span class="k">delete</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span> <span class="k">noexcept</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="k">operator</span> <span class="k">delete</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">align_val_t</span><span class="p">)</span> <span class="k">noexcept</span> <span class="p">{}</span>

<span class="kt">void</span> <span class="k">operator</span> <span class="k">delete</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">)</span> <span class="k">noexcept</span>
<span class="p">{</span>
    <span class="n">arena</span> <span class="o">*</span><span class="n">a</span> <span class="o">=</span> <span class="n">perm</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">==</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">+=</span> <span class="n">size</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The two mandatory replacements are no-ops because that’s simply how arenas
work. We don’t free individual objects, but many at once. It’s <em>completely
optional</em>, but I also replaced sized <code class="language-plaintext highlighter-rouge">delete</code> for little other reason than
<a href="/blog/2023/12/17/">sized deallocation is cool</a>. C++ destructs in reverse order, so
this is likely to work out. At least with GCC libstdc++, it freed about a
third of the workspace memory before returning to C. I’d rather it didn’t
try to free anything at all, but since it’s going to call <code class="language-plaintext highlighter-rouge">delete</code> anyway
I can get some use out of it.</p>
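
<p>To see why reverse-order destruction matters here, consider a minimal
sketch of the same “free only if it’s the most recent allocation” rule. The
names <code class="language-plaintext highlighter-rouge">alloc</code> and <code class="language-plaintext highlighter-rouge">release</code> are hypothetical stand-ins, not part of
the wrapper, and I’m assuming no alignment padding to keep it short:</p>

```cpp
#include <cstddef>

struct arena {
    char *beg;
    char *end;
};

// Allocate size bytes from the end of the arena; null when out of memory.
// Assumption: alignment padding is left out to keep the sketch short.
static void *alloc(arena *a, ptrdiff_t size)
{
    if (size > a->end - a->beg) {
        return nullptr;
    }
    return a->end -= size;
}

// Reclaim p only if it is the most recent allocation, otherwise do nothing.
static void release(arena *a, void *p, ptrdiff_t size)
{
    if (a->end == (char *)p) {
        a->end += size;
    }
}
```

<p>Releasing in reverse allocation order restores the arena completely. Any
other order reclaims only whatever happens to sit at the end when it is
released, and the rest waits for the arena reset.</p>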

<p>Interesting side note: In a rough benchmark these replacements made MSVC
<code class="language-plaintext highlighter-rouge">std::regex</code> matching four times faster! I expected a <em>small</em> speedup, but
not that. In the typical case it appears to be wasting most of its time on
allocation. On the other hand, libstdc++ <code class="language-plaintext highlighter-rouge">std::regex</code> is overall quite a
bit slower than MSVC, and my replacements had no performance effect. It’s
spending its time elsewhere, and the small gains are lost interacting with
the thread-local.</p>

<p>Finally the meat:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="s">"C"</span> <span class="n">std</span><span class="o">::</span><span class="n">regex</span> <span class="o">*</span><span class="nf">regex_new</span><span class="p">(</span><span class="n">str</span> <span class="n">re</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">perm</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
    <span class="k">try</span> <span class="p">{</span>
        <span class="k">return</span> <span class="k">new</span> <span class="n">std</span><span class="o">::</span><span class="n">regex</span><span class="p">(</span><span class="n">re</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="n">re</span><span class="p">.</span><span class="n">data</span><span class="o">+</span><span class="n">re</span><span class="p">.</span><span class="n">len</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">(...)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="p">{};</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It sets the thread-local to the arena, then constructs with “iterators” at
each end of the input. All exceptions are caught and turned into a null
return. Depending on need, we may want to indicate <em>why</em> it failed — out
of memory, invalid regex, etc. — by returning an error value of some sort.
An exercise for the reader.</p>
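
<p>One possible shape for that exercise, as a rough sketch rather than part
of the wrapper above: catch the specific exception types and report a
status through an out parameter. The <code class="language-plaintext highlighter-rouge">regex_status</code> enum and the
<code class="language-plaintext highlighter-rouge">compile</code> function are hypothetical names, and this version skips the
arena plumbing entirely:</p>

```cpp
#include <new>
#include <regex>

// Hypothetical status codes; the wrapper in this article has none.
enum regex_status { REGEX_OK, REGEX_OOM, REGEX_INVALID, REGEX_OTHER };

// Compile the pattern in [beg, end). Returns null on failure, with the
// reason stored in *status.
static std::regex *compile(const char *beg, const char *end, regex_status *status)
{
    try {
        std::regex *r = new std::regex(beg, end);
        *status = REGEX_OK;
        return r;
    } catch (const std::bad_alloc &) {
        *status = REGEX_OOM;      // allocator ran out of memory
    } catch (const std::regex_error &) {
        *status = REGEX_INVALID;  // malformed pattern
    } catch (...) {
        *status = REGEX_OTHER;    // anything else
    }
    return nullptr;
}
```

<p>An unbalanced pattern like <code class="language-plaintext highlighter-rouge">(</code> should land in the
<code class="language-plaintext highlighter-rouge">REGEX_INVALID</code> branch, while an exhausted arena would surface as
<code class="language-plaintext highlighter-rouge">REGEX_OOM</code>.</p>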

<p>The matcher is a little more complicated:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="s">"C"</span> <span class="n">strlist</span> <span class="nf">regex_match</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">regex</span> <span class="o">*</span><span class="n">re</span><span class="p">,</span> <span class="n">str</span> <span class="n">s</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">perm</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
    <span class="k">try</span> <span class="p">{</span>
        <span class="n">std</span><span class="o">::</span><span class="n">cregex_iterator</span> <span class="n">it</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="n">s</span><span class="p">.</span><span class="n">data</span><span class="o">+</span><span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="o">*</span><span class="n">re</span><span class="p">);</span>
        <span class="n">std</span><span class="o">::</span><span class="n">cregex_iterator</span> <span class="n">end</span><span class="p">;</span>

        <span class="n">strlist</span> <span class="n">r</span> <span class="o">=</span> <span class="p">{};</span>
        <span class="n">r</span><span class="p">.</span><span class="n">len</span>  <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">distance</span><span class="p">(</span><span class="n">it</span><span class="p">,</span> <span class="n">end</span><span class="p">);</span>
        <span class="n">r</span><span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="k">new</span> <span class="n">str</span><span class="p">[</span><span class="n">r</span><span class="p">.</span><span class="n">len</span><span class="p">]();</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">it</span> <span class="o">!=</span> <span class="n">end</span><span class="p">;</span> <span class="n">it</span><span class="o">++</span><span class="p">,</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">r</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">data</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">data</span> <span class="o">+</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">position</span><span class="p">();</span>
            <span class="n">r</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">len</span>  <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">length</span><span class="p">();</span>
        <span class="p">}</span>
        <span class="k">return</span> <span class="n">r</span><span class="p">;</span>

    <span class="p">}</span> <span class="k">catch</span> <span class="p">(...)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="p">{};</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I create a <code class="language-plaintext highlighter-rouge">char *</code> “cregex” iterator, again giving it each end of the
input. I hope it’s not just making a copy (MSVC <code class="language-plaintext highlighter-rouge">std::regex</code> does <em>grumble
grumble</em>). The result is allocated out of the arena. As before, exceptions
convert to a null return. Callers can distinguish errors because no-match
results have a non-null pointer. The iterator, being a local variable, is
destroyed before returning, uselessly calling <code class="language-plaintext highlighter-rouge">delete</code>. I could avoid this
by allocating it with <code class="language-plaintext highlighter-rouge">new</code>, but in practice it doesn’t matter.</p>

<p>You might have noticed the lack of <code class="language-plaintext highlighter-rouge">__declspec(dllexport)</code>. <a href="/blog/2023/08/27/">DEF files are
great</a>, and I’ve come to appreciate and prefer them. GCC and MSVC
accept them as another input on the command line, and the source need not
be aware of exports. My <code class="language-plaintext highlighter-rouge">regex.def</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LIBRARY regex
EXPORTS
regex_new
regex_match
</code></pre></div></div>

<p>In w64devkit, the command to build the DLL:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ g++ -shared -std=c++17 -o regex.dll regex.cpp regex.def
</code></pre></div></div>

<p>The MSVC command almost maps 1:1 to the GCC command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /LD /std:c++17 /EHsc regex.cpp regex.def
</code></pre></div></div>

<p>In either case only the C interface is exported (via <a href="/blog/2024/06/30/">peports</a>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ peports -e regex.dll
EXPORTS
        1       regex_match
        2       regex_new
</code></pre></div></div>

<h3 id="reasons-against">Reasons against</h3>

<p>Though this library is conveniently on hand, and my minimalist C wrapper
interface is nicer than a typical C regex library interface, and even
hides some <code class="language-plaintext highlighter-rouge">std::regex</code> problems, trade-offs must be considered:</p>

<ul>
  <li>No Unicode support, particularly UTF-8</li>
  <li><code class="language-plaintext highlighter-rouge">std::regex</code> implementations are universally poor and slow</li>
  <li>libstdc++ <code class="language-plaintext highlighter-rouge">std::regex</code> is especially slow to compile</li>
  <li>Isolating in a DLL (if needed) is inconvenient</li>
  <li>DLL is 200K (MSVC) to 700K (GCC) or so</li>
</ul>

<p>Depending on what I’m doing, some of these may have me looking elsewhere.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Arenas and the almighty concatenation operator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/05/25/"/>
    <id>urn:uuid:e88784ce-08fb-40d2-b6ad-c3d9af3cf5bc</id>
    <updated>2024-05-25T00:00:00Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>I continue to streamline <a href="/blog/2023/09/27/">an arena-based paradigm</a>, and stumbled
upon a concise technique for dynamic growth — an efficient, generic
“concatenate anything to anything” within an arena built atop a core of
9-ish lines of code. The key insight originated from a reader suggestion
about <a href="/blog/2023/10/05/">dynamic arrays</a>. The subject of concatenation can be a string,
dynamic array, or even something else. The “system” is extensible, and
especially useful for path handling.</p>

<p>Continuing <a href="/blog/2024/04/14/">from last time</a>, the examples are in light, C-style C++.
I chose it because templates and function overloading express the concepts
succinctly. It uses no standard library functionality, so converting to C,
or similar, should be straightforward. The core concatenation “operator”:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="nf">concat</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">T</span> <span class="n">head</span><span class="p">,</span> <span class="n">T</span> <span class="n">tail</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)(</span><span class="n">head</span><span class="p">.</span><span class="n">data</span><span class="o">+</span><span class="n">head</span><span class="p">.</span><span class="n">len</span><span class="p">)</span> <span class="o">!=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">head</span> <span class="o">=</span> <span class="n">T</span><span class="p">{</span><span class="n">a</span><span class="p">,</span> <span class="n">head</span><span class="p">};</span>
    <span class="p">}</span>
    <span class="n">head</span><span class="p">.</span><span class="n">len</span> <span class="o">+=</span> <span class="n">T</span><span class="p">{</span><span class="n">a</span><span class="p">,</span> <span class="n">tail</span><span class="p">}.</span><span class="n">len</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">head</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This concatenates two objects of the same type in the arena, and does so
<em>in place</em> if possible. That is, we can efficiently build a value piece by
piece. The type <code class="language-plaintext highlighter-rouge">T</code> must have <code class="language-plaintext highlighter-rouge">data</code> and <code class="language-plaintext highlighter-rouge">len</code> members, and a “copy”
constructor that makes a copy of the given object at <em>the front of the
arena</em>. Size integer overflows and out-of-memory errors are, as usual,
handled by the arena. In particular, note that the <code class="language-plaintext highlighter-rouge">len</code> addition happens
after allocation.</p>

<p>Since the front-of-the-arena business is implicit, consider <code class="language-plaintext highlighter-rouge">assert</code>ing it if
you’re worried. I’ve also considered declaring a <code class="language-plaintext highlighter-rouge">clone</code> “operator” where
that behavior is an explicit part of its interface.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Make a copy of the object at the front of the arena.</span>
<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span> <span class="n">T</span> <span class="nf">clone</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">T</span><span class="p">);</span>

<span class="c1">// In concat, replace the T{} constructors with clone:</span>
    <span class="n">head</span> <span class="o">=</span> <span class="n">clone</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">head</span><span class="p">);</span>
    <span class="n">head</span><span class="p">.</span><span class="n">len</span> <span class="o">+=</span> <span class="n">clone</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">tail</span><span class="p">).</span><span class="n">len</span><span class="p">;</span>
</code></pre></div></div>

<p>Strings are perhaps the most interesting subject of concatenation. Here’s
a compatible string, <code class="language-plaintext highlighter-rouge">str</code>, definition from my previous article:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">str</span> <span class="p">{</span>
    <span class="k">union</span> <span class="p">{</span>
        <span class="kt">uint8_t</span>    <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="kt">char</span> <span class="k">const</span> <span class="o">*</span><span class="n">cdata</span><span class="p">;</span>
    <span class="p">};</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">str</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

    <span class="n">str</span><span class="p">(</span><span class="kt">uint8_t</span> <span class="o">*</span><span class="n">beg</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">end</span><span class="p">)</span> <span class="o">:</span> <span class="n">data</span><span class="p">{</span><span class="n">beg</span><span class="p">},</span> <span class="n">len</span><span class="p">{</span><span class="n">end</span><span class="o">-</span><span class="n">beg</span><span class="p">}</span> <span class="p">{}</span>

    <span class="k">template</span><span class="o">&lt;</span><span class="kt">ptrdiff_t</span> <span class="n">N</span><span class="p">&gt;</span>
    <span class="k">constexpr</span> <span class="n">str</span><span class="p">(</span><span class="kt">char</span> <span class="k">const</span> <span class="p">(</span><span class="o">&amp;</span><span class="n">s</span><span class="p">)[</span><span class="n">N</span><span class="p">])</span> <span class="o">:</span> <span class="n">cdata</span><span class="p">{</span><span class="n">s</span><span class="p">},</span> <span class="n">len</span><span class="p">{</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="p">}</span> <span class="p">{}</span>

    <span class="n">str</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">str</span><span class="p">);</span>  <span class="c1">// TODO</span>

    <span class="kt">uint8_t</span> <span class="o">&amp;</span><span class="k">operator</span><span class="p">[](</span><span class="kt">ptrdiff_t</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>This has <code class="language-plaintext highlighter-rouge">data</code>, <code class="language-plaintext highlighter-rouge">len</code>, and the necessary constructor declaration. Before
showing the constructor definition, here’s an arena following the usual
formula, which should be familiar to those who’ve been following along:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">arena</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="o">...</span><span class="nc">A</span><span class="p">&gt;</span>
<span class="n">T</span> <span class="o">*</span><span class="n">makefront</span><span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">count</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">A</span> <span class="p">...</span><span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">ptrdiff_t</span> <span class="n">size</span>  <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="kt">ptrdiff_t</span> <span class="n">align</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">&amp;</span> <span class="p">(</span><span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">align</span><span class="p">)</span><span class="o">/</span><span class="n">size</span><span class="p">);</span>  <span class="c1">// OOM</span>
    <span class="n">T</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="p">)(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+</span> <span class="n">align</span><span class="p">);</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">+=</span> <span class="n">align</span> <span class="o">+</span> <span class="n">size</span><span class="o">*</span><span class="n">count</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">new</span> <span class="p">(</span><span class="n">r</span><span class="o">+</span><span class="n">i</span><span class="p">)</span> <span class="n">T</span><span class="p">(</span><span class="n">args</span><span class="p">...);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note how it bumps <code class="language-plaintext highlighter-rouge">beg</code>, not <code class="language-plaintext highlighter-rouge">end</code>, because it’s allocated at the front.
That opens the end of the object for concatenation. When it returns, <code class="language-plaintext highlighter-rouge">beg</code>
points just past the end of the new object, aligned to it. Later, <code class="language-plaintext highlighter-rouge">concat</code>
inspects <code class="language-plaintext highlighter-rouge">beg</code> to see if it can <em>extend in place</em>. That will be true if
nothing else has been allocated <em>at the front</em> in the meantime. That is,
we can allocate objects <em>at the end</em> — such as <a href="/blog/2023/09/30/">hash map nodes</a> —
while efficiently growing an object at the front through concatenation. If
it’s not true for whatever reason, concatenation still works, just with
reduced efficiency.</p>

<p>With that out of the way, the “copy” constructor is simple:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span><span class="o">::</span><span class="n">str</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">str</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">makefront</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
    <span class="n">len</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s everything we need to put it into action. For example, here’s a
function that deletes a file at a path following a path template:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="o">*</span><span class="nf">tocstr</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">str</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">concat</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">str</span><span class="p">{</span><span class="s">"</span><span class="se">\0</span><span class="s">"</span><span class="p">}).</span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">bool</span> <span class="n">removeconfig</span><span class="p">(</span><span class="n">str</span> <span class="n">home</span><span class="p">,</span> <span class="n">str</span> <span class="n">program</span><span class="p">,</span> <span class="n">arena</span> <span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">str</span> <span class="n">path</span> <span class="o">=</span> <span class="p">{};</span>
    <span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">home</span><span class="p">);</span>
    <span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">str</span><span class="p">{</span><span class="s">"/.config/"</span><span class="p">});</span>
    <span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">program</span><span class="p">);</span>
    <span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">str</span><span class="p">{</span><span class="s">"/rc"</span><span class="p">});</span>
    <span class="k">return</span> <span class="o">!</span><span class="n">unlink</span><span class="p">(</span><span class="n">tocstr</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>First, <code class="language-plaintext highlighter-rouge">concat</code> does all the heavy lifting in a null-terminated “C string”
conversion function that operates in place if possible. In <code class="language-plaintext highlighter-rouge">removeconfig</code>
I construct a path from path components, starting from a zero-initialized
<em>null string</em>. In the first <code class="language-plaintext highlighter-rouge">concat</code>, this null string is “copied” into
the arena, laying a foundation for additional concatenations. Each path
component is copied in place, so unlike <a href="/blog/2021/07/30/">a dumb <code class="language-plaintext highlighter-rouge">strcat</code></a>, it’s not
quadratic.</p>
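
<p>The in-place behavior is easy to check in isolation. Below is a
stripped-down sketch, not the code above: it drops the template, uses a
plain aggregate <code class="language-plaintext highlighter-rouge">str</code>, leans on <code class="language-plaintext highlighter-rouge">memcpy</code> and <code class="language-plaintext highlighter-rouge">strlen</code> for brevity, and
aborts on out-of-memory. The <code class="language-plaintext highlighter-rouge">lit</code> helper is a hypothetical convenience for
wrapping string literals:</p>

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct arena {
    char *beg;
    char *end;
};

struct str {
    char     *data;
    ptrdiff_t len;
};

// Hypothetical helper: view a C string as a str (read-only use only).
static str lit(const char *s)
{
    return {(char *)s, (ptrdiff_t)strlen(s)};
}

// Copy s to the front of the arena. Assumption: abort on out-of-memory.
static str clone(arena *a, str s)
{
    if (s.len > a->end - a->beg) {
        abort();
    }
    str r = {a->beg, s.len};
    if (s.len) {
        memcpy(r.data, s.data, s.len);
    }
    a->beg += s.len;
    return r;
}

// Append tail to head, extending in place when head already ends at the
// front of the arena.
static str concat(arena *a, str head, str tail)
{
    if (head.data + head.len != a->beg) {
        head = clone(a, head);
    }
    head.len += clone(a, tail).len;
    return head;
}
```

<p>Starting from a null string, only the first <code class="language-plaintext highlighter-rouge">concat</code> relocates the head.
Every later call finds the head ending exactly at <code class="language-plaintext highlighter-rouge">beg</code>, so the <code class="language-plaintext highlighter-rouge">data</code>
pointer stays put and only the tail bytes are copied.</p>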

<p>Even more, notice it supports arbitrary path lengths. There’s no <code class="language-plaintext highlighter-rouge">PATH_MAX</code>,
<code class="language-plaintext highlighter-rouge">MAX_PATH</code>, etc.; it grows into the arena as needed. No <a href="/blog/2024/02/05/">huge stack
variables</a> necessary, and the scratch arena automatically frees
the path on return. Fancier yet, imagine a variadic function that glues
path components together with the proper path delimiter, and it wouldn’t
involve <a href="/blog/2024/05/24/">a single, error-prone size calculation</a>.</p>

<p>The <code class="language-plaintext highlighter-rouge">str{}</code> business is unfortunate. The <code class="language-plaintext highlighter-rouge">char</code> array constructor normally
kicks in in these situations, but compilers can’t resolve the template
without an explicit <code class="language-plaintext highlighter-rouge">str</code> object. Perhaps there’s a workaround, but I’m
not yet savvy enough with C++ to figure it out. In the C version you’d
always need to wrap those literals in the string macro.</p>
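
<p>One candidate workaround, offered as an untested-in-this-codebase sketch:
template argument deduction ignores implicit conversions, so the literal
fails to deduce <code class="language-plaintext highlighter-rouge">T</code>. Placing the second parameter in a non-deduced context
lets <code class="language-plaintext highlighter-rouge">T</code> come from the first argument alone, after which the literal
converts through the array constructor. The <code class="language-plaintext highlighter-rouge">identity</code> helper and the
<code class="language-plaintext highlighter-rouge">totallen</code> stand-in for <code class="language-plaintext highlighter-rouge">concat</code> are hypothetical, and <code class="language-plaintext highlighter-rouge">str</code> is cut down
to the essentials:</p>

```cpp
#include <cstddef>

// Cut-down str with only the literal constructor from the article.
struct str {
    char const *data = 0;
    ptrdiff_t   len  = 0;

    str() = default;

    template<ptrdiff_t N>
    constexpr str(char const (&s)[N]) : data{s}, len{N-1} {}
};

// Hypothetical helper: naming T through a nested type blocks deduction.
template<typename T> struct identity { using type = T; };

// Stand-in for concat that only sums lengths. T is deduced from head;
// tail is then converted to T implicitly.
template<typename T>
ptrdiff_t totallen(T head, typename identity<T>::type tail)
{
    return head.len + tail.len;
}
```

<p>With this signature, <code class="language-plaintext highlighter-rouge">totallen(s, "/rc")</code> compiles without an explicit
<code class="language-plaintext highlighter-rouge">str{}</code> wrapper. C++20 offers <code class="language-plaintext highlighter-rouge">std::type_identity_t</code> for the same purpose,
though that brings in the standard library.</p>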

<h3 id="extending-concatenation">Extending concatenation</h3>

<p>The “operator” can be extended by defining more overloads. For example, to
concatenate 32-bit integers to a string:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span> <span class="nf">concat</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">str</span> <span class="n">s</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint8_t</span>  <span class="n">buf</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
    <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">end</span> <span class="o">=</span> <span class="n">buf</span> <span class="o">+</span> <span class="n">countof</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
    <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">beg</span> <span class="o">=</span> <span class="n">end</span><span class="p">;</span>
    <span class="kt">int32_t</span>  <span class="n">neg</span> <span class="o">=</span> <span class="n">x</span><span class="o">&lt;</span><span class="mi">0</span> <span class="o">?</span> <span class="n">x</span> <span class="o">:</span> <span class="o">-</span><span class="n">x</span><span class="p">;</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="o">*--</span><span class="n">beg</span> <span class="o">=</span> <span class="sc">'0'</span> <span class="o">-</span> <span class="n">neg</span><span class="o">%</span><span class="mi">10</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">neg</span> <span class="o">/=</span> <span class="mi">10</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*--</span><span class="n">beg</span> <span class="o">=</span> <span class="sc">'-'</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">concat</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="p">{</span><span class="n">beg</span><span class="p">,</span> <span class="n">end</span><span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>
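
<p>The negative accumulation is what keeps this correct for every input: <code class="language-plaintext highlighter-rouge">-x</code> overflows when <code class="language-plaintext highlighter-rouge">x</code> is <code class="language-plaintext highlighter-rouge">INT32_MIN</code>, while negating a non-negative value never does. A minimal sketch of the same loop in isolation, assuming a hypothetical <code class="language-plaintext highlighter-rouge">format_i32</code> that returns a <code class="language-plaintext highlighter-rouge">std::string</code> so it can be tried without an arena:</p>

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the article's code): formats x with
// the same negative-accumulation loop as the concat overload, but
// returns a std::string so that no arena is needed.
std::string format_i32(int32_t x)
{
    char  buf[16];
    char *end = buf + sizeof(buf);
    char *beg = end;
    // Work in the negative range: -x overflows for INT32_MIN, but any
    // non-negative int32_t can be negated safely.
    int32_t neg = x<0 ? x : -x;
    do {
        // neg%10 lies in [-9, 0], so subtracting it adds the digit.
        *--beg = (char)('0' - neg%10);
    } while (neg /= 10);
    if (x < 0) {
        *--beg = '-';
    }
    return std::string(beg, end);
}
```

<p>The longest output, for <code class="language-plaintext highlighter-rouge">INT32_MIN</code>, is 11 characters, so the 16-byte buffer has room to spare.</p>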

<p>Now we can, say, construct a randomly-generated temporary path:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span> <span class="n">path</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">tempdir</span><span class="p">);</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">str</span><span class="p">{</span><span class="s">"/temp"</span><span class="p">});</span>
<span class="kt">int32_t</span> <span class="n">id</span> <span class="o">=</span> <span class="n">rand32</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rng</span><span class="p">);</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
</code></pre></div></div>

<p>Keep adding more definitions like this and you’ll have something like, or
complementing, <a href="/blog/2023/02/13/">buffered output</a>. It doesn’t stop there. Code points
concatenated as UTF-8:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span> <span class="nf">concat</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">str</span> <span class="n">s</span><span class="p">,</span> <span class="kt">char32_t</span> <span class="n">rune</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">enum</span> <span class="p">{</span> <span class="n">REPLACEMENT_CHARACTER</span> <span class="o">=</span> <span class="mh">0xfffd</span> <span class="p">};</span>
    <span class="k">if</span> <span class="p">((</span><span class="n">rune</span><span class="o">&gt;=</span><span class="mh">0xd800</span> <span class="o">&amp;&amp;</span> <span class="n">rune</span><span class="o">&lt;=</span><span class="mh">0xdfff</span><span class="p">)</span> <span class="o">||</span> <span class="n">rune</span><span class="o">&gt;</span><span class="mh">0x10ffff</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">rune</span> <span class="o">=</span> <span class="n">REPLACEMENT_CHARACTER</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">uint8_t</span>  <span class="n">buf</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
    <span class="kt">uint8_t</span> <span class="o">*</span><span class="n">end</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">rune</span> <span class="o">&lt;</span> <span class="mh">0x80</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">rune</span><span class="p">;</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">buf</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">rune</span> <span class="o">&lt;</span> <span class="mh">0x800</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span>  <span class="p">(</span><span class="n">rune</span> <span class="o">&gt;&gt;</span>  <span class="mi">6</span><span class="p">)</span>         <span class="o">|</span> <span class="mh">0xc0</span><span class="p">;</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">rune</span> <span class="o">&gt;&gt;</span>  <span class="mi">0</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x3f</span><span class="p">)</span> <span class="o">|</span> <span class="mh">0x80</span><span class="p">;</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">buf</span> <span class="o">+</span> <span class="mi">2</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">rune</span> <span class="o">&lt;</span> <span class="mh">0x10000</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span>  <span class="p">(</span><span class="n">rune</span> <span class="o">&gt;&gt;</span> <span class="mi">12</span><span class="p">)</span>         <span class="o">|</span> <span class="mh">0xe0</span><span class="p">;</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">rune</span> <span class="o">&gt;&gt;</span>  <span class="mi">6</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x3f</span><span class="p">)</span> <span class="o">|</span> <span class="mh">0x80</span><span class="p">;</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">rune</span> <span class="o">&gt;&gt;</span>  <span class="mi">0</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x3f</span><span class="p">)</span> <span class="o">|</span> <span class="mh">0x80</span><span class="p">;</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">buf</span> <span class="o">+</span> <span class="mi">3</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span>  <span class="p">(</span><span class="n">rune</span> <span class="o">&gt;&gt;</span> <span class="mi">18</span><span class="p">)</span>         <span class="o">|</span> <span class="mh">0xf0</span><span class="p">;</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">rune</span> <span class="o">&gt;&gt;</span> <span class="mi">12</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x3f</span><span class="p">)</span> <span class="o">|</span> <span class="mh">0x80</span><span class="p">;</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">rune</span> <span class="o">&gt;&gt;</span>  <span class="mi">6</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x3f</span><span class="p">)</span> <span class="o">|</span> <span class="mh">0x80</span><span class="p">;</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">rune</span> <span class="o">&gt;&gt;</span>  <span class="mi">0</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x3f</span><span class="p">)</span> <span class="o">|</span> <span class="mh">0x80</span><span class="p">;</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">buf</span> <span class="o">+</span> <span class="mi">4</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">concat</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="p">{</span><span class="n">buf</span><span class="p">,</span> <span class="n">end</span><span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That composes well for general UTF-8 handling. For example, to ingest
Win32 strings (arguments, paths, etc.):</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span> <span class="nf">convert</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">perm</span><span class="p">,</span> <span class="kt">char16_t</span> <span class="o">*</span><span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">str</span> <span class="n">r</span> <span class="o">=</span> <span class="p">{};</span>
    <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">s</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">char32_t</span> <span class="n">rune</span> <span class="o">=</span> <span class="n">decode</span><span class="p">(</span><span class="o">&amp;</span><span class="n">s</span><span class="p">);</span>
        <span class="n">r</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="n">perm</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">rune</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
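
<p>The <code class="language-plaintext highlighter-rouge">decode</code> function is not shown here. As a rough sketch, assuming it takes a pointer to the cursor, advances the cursor past one code point, and is only called when the cursor is not at the terminator, a UTF-16 decoder might look like the following. Replacing unpaired surrogates with U+FFFD is one possible policy, not necessarily the one used in practice:</p>

```cpp
// Hypothetical decoder (an assumption, not the article's definition):
// reads one code point from a null-terminated UTF-16 string and
// advances *s past it. Must not be called when **s is the terminator.
char32_t decode(char16_t **s)
{
    char16_t *p  = *s;
    char32_t  hi = *p++;
    char32_t  r  = hi;
    if (hi>=0xd800 && hi<=0xdbff) {
        // High surrogate: peek at the next unit. If it is the null
        // terminator it fails the range test and is left unconsumed.
        char32_t lo = *p;
        if (lo>=0xdc00 && lo<=0xdfff) {
            p++;
            r = 0x10000 + ((hi - 0xd800)<<10) + (lo - 0xdc00);
        } else {
            r = 0xfffd;  // high surrogate without a partner
        }
    } else if (hi>=0xdc00 && hi<=0xdfff) {
        r = 0xfffd;  // stray low surrogate
    }
    *s = p;
    return r;
}
```

<p>Since every result is either a valid scalar value or U+FFFD, the UTF-8 <code class="language-plaintext highlighter-rouge">concat</code> overload never sees a surrogate from this path.</p>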

<h3 id="beyond-strings">Beyond strings</h3>

<p>One of my most useful C++ templates has been a span structure:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">span</span> <span class="p">{</span>
    <span class="n">T</span>        <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">span</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

    <span class="n">span</span><span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="n">beg</span><span class="p">,</span> <span class="n">T</span> <span class="o">*</span><span class="n">end</span><span class="p">)</span> <span class="o">:</span> <span class="n">data</span><span class="p">{</span><span class="n">beg</span><span class="p">},</span> <span class="n">len</span><span class="p">{</span><span class="n">end</span><span class="o">-</span><span class="n">beg</span><span class="p">}</span> <span class="p">{}</span>

    <span class="n">span</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">span</span><span class="p">);</span>  <span class="c1">// for concat</span>

    <span class="n">T</span> <span class="o">&amp;</span><span class="k">operator</span><span class="p">[](</span><span class="kt">ptrdiff_t</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">span::span</code> definition looks exactly like <code class="language-plaintext highlighter-rouge">str::str</code>. In fact, we
could nearly define strings as <code class="language-plaintext highlighter-rouge">uint8_t</code> spans:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="n">span</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">&gt;</span> <span class="n">str</span><span class="p">;</span>  <span class="c1">// hypothetical</span>
</code></pre></div></div>

<p>Though I’ve found strings to be just special enough not to be worth it.</p>

<p>This <code class="language-plaintext highlighter-rouge">span</code> definition is now fleshed out sufficiently to use <code class="language-plaintext highlighter-rouge">concat</code>
with no additional definitions! However, outside of strings, concatenating
spans is unusual. More often we want to append individual elements. Again,
we can build on that core <code class="language-plaintext highlighter-rouge">concat</code> template:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">span</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">concat</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">span</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">s</span><span class="p">,</span> <span class="n">T</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">concat</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">span</span><span class="p">{</span><span class="o">&amp;</span><span class="n">v</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">v</span><span class="o">+</span><span class="mi">1</span><span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now <code class="language-plaintext highlighter-rouge">span</code> is ready for 99% of its use cases. For example:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">span</span><span class="o">&lt;</span><span class="kt">int32_t</span><span class="o">&gt;</span> <span class="n">squares</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="mi">1000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">squares</span> <span class="o">=</span> <span class="n">concat</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">squares</span><span class="p">,</span> <span class="n">i</span><span class="o">*</span><span class="n">i</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>It’s often good enough, but it’s not ideal as a general purpose dynamic
array. Each append makes a trip through arena allocation, and this span
cannot efficiently shrink and then grow again. Sometimes we’d like to
track capacity, covering both those cases.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">list</span> <span class="p">{</span>
    <span class="n">T</span>        <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">len</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">ptrdiff_t</span> <span class="n">cap</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">list</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

    <span class="n">list</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">list</span><span class="p">);</span>  <span class="c1">// for concat</span>

    <span class="n">T</span> <span class="o">&amp;</span><span class="k">operator</span><span class="p">[](</span><span class="kt">ptrdiff_t</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Unfortunately <code class="language-plaintext highlighter-rouge">cap</code> is a curve ball that the core template can’t handle,
requiring a slightly more complex definition. Since concatenating whole
<code class="language-plaintext highlighter-rouge">list</code> objects is unusual, a definition for appending single elements:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="n">list</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">concat</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">list</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">s</span><span class="p">,</span> <span class="n">T</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">len</span> <span class="o">==</span> <span class="n">s</span><span class="p">.</span><span class="n">cap</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)(</span><span class="n">s</span><span class="p">.</span><span class="n">data</span><span class="o">+</span><span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">)</span> <span class="o">!=</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">s</span> <span class="o">=</span> <span class="n">list</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">{</span><span class="n">a</span><span class="p">,</span> <span class="n">s</span><span class="p">};</span>
        <span class="p">}</span>
        <span class="kt">ptrdiff_t</span> <span class="n">extend</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">cap</span> <span class="o">?</span> <span class="n">s</span><span class="p">.</span><span class="n">cap</span> <span class="o">:</span> <span class="mi">4</span><span class="p">;</span>
        <span class="n">makefront</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">extend</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
        <span class="n">s</span><span class="p">.</span><span class="n">cap</span> <span class="o">+=</span> <span class="n">extend</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">v</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
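
<p>To try this out in isolation, here is a self-contained sketch. The <code class="language-plaintext highlighter-rouge">arena</code> struct, <code class="language-plaintext highlighter-rouge">makefront</code>, and the body of the copying constructor are simplified assumptions standing in for the real definitions, and only trivial element types are considered:</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed minimal arena: a bump allocator over the region [beg, end).
struct arena {
    char *beg;
    char *end;
};

// Assumed helper: reserve count objects of T at the front of the arena,
// aborting when out of memory. Storage is left uninitialized, which is
// acceptable for the trivial element types used in this sketch.
template<typename T>
T *makefront(ptrdiff_t count, arena *a)
{
    ptrdiff_t size = sizeof(T);
    ptrdiff_t pad  = -(uintptr_t)a->beg & (alignof(T) - 1);
    if (count >= (a->end - a->beg - pad)/size) {
        abort();
    }
    T *r = (T *)(a->beg + pad);
    a->beg += pad + count*size;
    return r;
}

template<typename T>
struct list {
    T        *data = 0;
    ptrdiff_t len  = 0;
    ptrdiff_t cap  = 0;

    list() = default;

    // Copy s to the front of the arena with no spare capacity, so the
    // copy ends exactly at a->beg and can then be extended in place.
    list(arena *a, list s)
    {
        data = makefront<T>(s.len, a);
        for (ptrdiff_t i = 0; i < s.len; i++) {
            data[i] = s.data[i];
        }
        len = cap = s.len;
    }

    T &operator[](ptrdiff_t i) { return data[i]; }
};

// Same append as above: double the capacity, in place when the list
// already sits at the front of the arena, otherwise after relocating.
template<typename T>
list<T> concat(arena *a, list<T> s, T v)
{
    if (s.len == s.cap) {
        if ((char *)(s.data+s.len) != a->beg) {
            s = list<T>{a, s};
        }
        ptrdiff_t extend = s.cap ? s.cap : 4;
        makefront<T>(extend, a);
        s.cap += extend;
    }
    s[s.len++] = v;
    return s;
}
```

<p>Under these assumptions, appending ten elements to an empty <code class="language-plaintext highlighter-rouge">list&lt;int&gt;</code> leaves <code class="language-plaintext highlighter-rouge">cap</code> at 16, and as long as nothing else allocates from the arena in between, each growth step extends in place.</p>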

<p>Note how inside the <code class="language-plaintext highlighter-rouge">if</code> it’s basically the same core definition. As
before, this definition extends in place if possible, but otherwise
handles it correctly anyway. In addition to the above concerns, this <code class="language-plaintext highlighter-rouge">list</code>
is more suited to having multiple “open” dynamic arrays at once.</p>

<p>This concatenative concept has been a useful way to think about a variety
of situations in order to solve them effectively with arena allocation.</p>

<p><strong>Update</strong>: NRK <a href="https://lists.sr.ht/~skeeto/public-inbox/%3Cane2ee7fpnyn3qxslygprmjw2yrvzppxuim25jvf7e6f5jgxbd@p7y6own2j3it%3E#%3C2qzyqky3jtv6w64vicwnkrwa7nb52uohuu625bc3zrkaoor6ml@v57pb72uozpy%3E">sharply points out</a> that “extend in place” as
expressed in <code class="language-plaintext highlighter-rouge">concat</code> is incompatible with the <a href="https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html"><code class="language-plaintext highlighter-rouge">alloc_size</code> and <code class="language-plaintext highlighter-rouge">malloc</code>
GCC function attributes</a>, which I’ve suggested in the past. While
considering how to mitigate this, we’ve also discovered that <code class="language-plaintext highlighter-rouge">alloc_size</code>
has always been fundamentally broken in GCC. Correct use is impossible,
and so <em>it must not be used</em>.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Guidelines for computing sizes and subscripts</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/05/24/"/>
    <id>urn:uuid:df6214e0-e408-4254-bd65-49d64e06a93e</id>
    <updated>2024-05-24T22:25:10Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p>Occasionally we need to compute the size of an object that does not yet
exist, or a subscript <a href="https://research.google/blog/extra-extra-read-all-about-it-nearly-all-binary-searches-and-mergesorts-are-broken/">that may fall out of bounds</a>. It’s easy to miss
the edge cases where results overflow, creating a nasty, subtle bug, <a href="https://blog.carlana.net/post/2024/golang-slices-concat/">even
in the presence of type safety</a>. Ideally such computations happen in
specialized code, such as <em>inside</em> an allocator (<code class="language-plaintext highlighter-rouge">calloc</code>, <code class="language-plaintext highlighter-rouge">reallocarray</code>)
and not <em>outside</em> by the allocatee (i.e. <code class="language-plaintext highlighter-rouge">malloc</code>). Mitigations exist with
different trade-offs: arbitrary precision, or using a wider fixed integer
— i.e. 128-bit integers on 64-bit hosts. In the typical case, working only
with fixed size-type integers, I’ve come up with a set of guidelines to
avoid overflows in the edge cases.</p>

<ol>
  <li>Range check <em>before</em> computing a result. No exceptions.</li>
  <li>Do not cast unless you know <em>a priori</em> the operand is in range.</li>
  <li>Never mix unsigned and signed operands. <a href="https://www.youtube.com/watch?v=wvtFGa6XJDU">Prefer signed.</a> If you
need to convert an operand, see (2).</li>
  <li>Do not add unless you know <em>a priori</em> the result is in range.</li>
  <li>Do not multiply unless you know <em>a priori</em> the result is in range.</li>
  <li>Do not subtract unless you know <em>a priori</em> both signed operands
are non-negative. For unsigned, that the second operand is not larger
than the first (treat it like (4)).</li>
  <li>Do not divide unless you know <em>a priori</em> the denominator is positive.</li>
  <li>Make it correct first. Make it fast later, if needed.</li>
</ol>

<p>These guidelines are also useful when <em>reviewing</em> code, tracking in your
mind whether the invariants are held at each step. If not, you’ve likely
found a bug. If in doubt, use assertions to document and check invariants.
I compiled this list during code review, so for me that’s where it’s most
useful.</p>

<h3 id="range-check-then-compute">Range check, then compute</h3>

<p>Not strictly necessary when overflow is well-defined, i.e. wraparound, but
it’s like defensive driving. It’s simpler and clearer to check with basic
arithmetic rather than reason from a wraparound, i.e. a negative result.
Checked math functions are fine, too, if you check the overflow boolean
before accessing the result.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// bad
len++;
if (len &lt;= 0) error();

// good
if (len == MAX) error();
len++;
</code></pre></div></div>

<h3 id="casting">Casting</h3>

<p>Casting from signed to unsigned, it’s as simple as knowing the value is
non-negative, which is likely if you’re following (1). If a negative size
has appeared, there’s already been a bug earlier in the program, and the
only reasonable course of action is to abort, not handle it like an error.</p>

<h3 id="addition">Addition</h3>

<p>To check if addition will overflow, subtract one of the operands from the
maximum value.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (b &gt; MAX - a) error();
r = a + b;
</code></pre></div></div>
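
<p>Wrapped up as a C++ function, with <code class="language-plaintext highlighter-rouge">PTRDIFF_MAX</code> standing in for <code class="language-plaintext highlighter-rouge">MAX</code>. The name <code class="language-plaintext highlighter-rouge">add_size</code> and its out-parameter interface are illustrative assumptions:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: stores a + b in *r and returns true, or returns
// false without computing anything if the sum would exceed PTRDIFF_MAX.
// Both operands must be non-negative, as sizes are.
bool add_size(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *r)
{
    assert(a >= 0);
    assert(b >= 0);
    if (b > PTRDIFF_MAX - a) {  // cannot overflow because a >= 0
        return false;
    }
    *r = a + b;
    return true;
}
```

<p>The subtraction in the check is itself safe only because <code class="language-plaintext highlighter-rouge">a</code> is known to be non-negative, which is why the assertions come first.</p>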

<p>In pointer arithmetic addition, it’s a common mistake to compute the
result pointer then compare it to the bounds. If the check failed, then
the pointer <em>already</em> overflowed, i.e. undefined behavior. Major pieces of
software, <a href="https://sourcegraph.com/search?q=context:global+%22%3E+outend%22+repo:%5Egithub%5C.com/bminor/glibc%24+&amp;patternType=keyword&amp;sm=0">like glibc</a>, are riddled with such pointer overflows.
(Now that you’re aware of it, you’ll start noticing it everywhere. Sorry.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// bad: never do this
beg += size;
if (beg &gt; end) error();
</code></pre></div></div>

<p>To do this correctly, <strong>check integers not pointers</strong>. Like before,
subtract before adding.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>available = end - beg;
if (size &gt; available) error();
beg += size;
</code></pre></div></div>
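
<p>As a concrete sketch, here is that check inside a small bump function. The name <code class="language-plaintext highlighter-rouge">bump</code> and its signature are assumptions for illustration. Note that the pointer only moves after the integer comparison has passed:</p>

```cpp
#include <cstddef>

// Hypothetical bump allocation over [*beg, end): returns the old *beg
// and advances it by size, or returns null, leaving *beg untouched, if
// size bytes are not available.
char *bump(char **beg, char *end, ptrdiff_t size)
{
    ptrdiff_t available = end - *beg;  // integer, always in range
    if (size < 0 || size > available) {
        return nullptr;
    }
    char *r = *beg;
    *beg += size;  // known to stay within [*beg, end]
    return r;
}
```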

<p>Mind mixing signed and unsigned operands for the comparison operator (3),
e.g. an unsigned size on the left and signed difference on the right.</p>

<h3 id="multiplication-and-division">Multiplication and division</h3>

<p>If you’re working this out on your own, multiplication seems tricky until
you’ve internalized a simple pattern. Just as we subtracted before adding,
we need to divide before multiplying. Divide the maximum value by one of
the operands:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (a&gt;0 &amp;&amp; b&gt;MAX/a) error();
r = a * b;
</code></pre></div></div>

<p>It’s often permitted for one or both to be zero, so mind divide-by-zero,
which is handled above by the first condition. Sometimes size must be
positive, e.g. the result of the <code class="language-plaintext highlighter-rouge">sizeof</code> operator in C, in which case we
should prefer it as the denominator.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>assert(size  &gt;  0);
assert(count &gt;= 0);
if (count &gt; MAX/size) error();
total = count * size;
</code></pre></div></div>
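
<p>The same pattern as a C++ function, again with <code class="language-plaintext highlighter-rouge">PTRDIFF_MAX</code> as the maximum. The name <code class="language-plaintext highlighter-rouge">mul_size</code> and the out-parameter are illustrative assumptions:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: stores count * size in *total and returns true,
// or returns false if the product would exceed PTRDIFF_MAX. The size is
// the denominator because it is known to be positive.
bool mul_size(ptrdiff_t count, ptrdiff_t size, ptrdiff_t *total)
{
    assert(size  >  0);
    assert(count >= 0);
    if (count > PTRDIFF_MAX/size) {
        return false;
    }
    *total = count * size;
    return true;
}
```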

<p>With <a href="/blog/2023/09/27/">arena allocation</a> there are usually two concerns. First, will
it overflow when computing the total size, i.e. <code class="language-plaintext highlighter-rouge">count * size</code>? Second, is
the total size within the arena capacity? Naively that’s two checks, but
we can kill two birds with one stone: Check both at once by using the
current arena capacity as the maximum value when considering overflow.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (count &gt; (end - beg)/size) error();
total = count * size;
</code></pre></div></div>

<p>One condition pulling double duty.</p>

<h3 id="subtraction">Subtraction</h3>

<p>With signed sizes, the negative range is a long “runway” allowing a single
unchecked subtraction before overflow might occur. In essence, we were
exploiting this in order to check addition. The most common mistake with
unsigned subtraction is not accounting for overflow when going below zero.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// note: signed "i" only
for (i = end - stride; i &gt;= beg; i -= stride) ...
</code></pre></div></div>

<p>This loop will go awry if <code class="language-plaintext highlighter-rouge">i</code> is unsigned and <code class="language-plaintext highlighter-rouge">beg &lt;= stride</code>.</p>
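
<p>To make the failure concrete, here is a sketch using fixed-width 32-bit
types (an assumption, chosen so the wraparound value is predictable). The
helper names are hypothetical. The first function is the signed loop; the
second isolates the single step that goes wrong with an unsigned index.</p>

```cpp
#include <assert.h>
#include <stdint.h>

// Signed index: the loop ends once i drops below beg, even when that
// means going negative.
static int32_t steps_signed(int32_t beg, int32_t end, int32_t stride)
{
    int32_t n = 0;
    for (int32_t i = end - stride; i >= beg; i -= stride) {
        n++;
    }
    return n;
}

// The same step with an unsigned index wraps around instead of going
// negative, so a following i >= beg test would keep passing.
static uint32_t step_unsigned(uint32_t i, uint32_t stride)
{
    return i - stride;  // modular arithmetic, well-defined but huge
}
```

<p>With <code class="language-plaintext highlighter-rouge">beg = 0</code>, <code class="language-plaintext highlighter-rouge">end = 10</code>, <code class="language-plaintext highlighter-rouge">stride = 4</code>, the signed loop visits 6 and 2, then
stops at -2. The unsigned step from 2 lands near the top of the range
instead, which still compares greater than or equal to <code class="language-plaintext highlighter-rouge">beg</code>.</p>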

<p>In special cases we can get away with a second subtraction without an
overflow check if we know some properties of our operands. For example, my
arena allocators look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>padding = -beg &amp; (align - 1);
if (count &gt;= (end - beg - padding)/size) error();
</code></pre></div></div>

<p>That’s two subtractions in a row. However, <code class="language-plaintext highlighter-rouge">end - beg</code> describes the size
of a realized object, and <code class="language-plaintext highlighter-rouge">align</code> is a small constant (e.g. 2^(0–6)). It
could only overflow if the entirety of memory was occupied by the arena.</p>

<p>Bonus, advanced note: This check is actually pulling <em>triple duty</em>. Notice
that I used <code class="language-plaintext highlighter-rouge">&gt;=</code> instead of <code class="language-plaintext highlighter-rouge">&gt;</code>. The arena can’t fill exactly to the brim,
but it handles the extreme edge case where <code class="language-plaintext highlighter-rouge">count</code> is zero, the arena is
nearly full, but the bump pointer is unaligned. The result of subtracting
<code class="language-plaintext highlighter-rouge">padding</code> is negative, which rounds to zero by integer division, and would
pass a <code class="language-plaintext highlighter-rouge">&gt;</code> check. That wouldn’t be a problem except that aligning the bump
pointer would break the invariant <code class="language-plaintext highlighter-rouge">beg &lt;= end</code>.</p>
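
<p>The edge case is easier to see with numbers. The sketch below models
addresses as plain integers, which is an assumption made purely to keep the
arithmetic visible; <code class="language-plaintext highlighter-rouge">rejects</code> and its <code class="language-plaintext highlighter-rouge">strict</code> flag are hypothetical names
that switch between the <code class="language-plaintext highlighter-rouge">&gt;=</code> and <code class="language-plaintext highlighter-rouge">&gt;</code> forms of the check.</p>

```cpp
#include <assert.h>
#include <stdint.h>

// Addresses modeled as plain integers. Returns true when the request
// must be rejected. strict selects ">=" (true) or ">" (false).
static bool rejects(int64_t beg, int64_t end, int64_t align,
                    int64_t size, int64_t count, bool strict)
{
    int64_t padding   = -beg & (align - 1);
    int64_t available = (end - beg - padding)/size;  // truncates toward zero
    return strict ? count >= available : count > available;
}
```

<p>Take <code class="language-plaintext highlighter-rouge">beg = 13</code>, <code class="language-plaintext highlighter-rouge">end = 14</code>, <code class="language-plaintext highlighter-rouge">align = 8</code>, <code class="language-plaintext highlighter-rouge">size = 8</code>, <code class="language-plaintext highlighter-rouge">count = 0</code>. The padding
is 3, so the numerator is -2, which divides to zero. A <code class="language-plaintext highlighter-rouge">&gt;</code> check accepts
the request, and aligning would then move <code class="language-plaintext highlighter-rouge">beg</code> to 16, past <code class="language-plaintext highlighter-rouge">end</code>. The
<code class="language-plaintext highlighter-rouge">&gt;=</code> check rejects it.</p>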

<h3 id="try-it-for-yourself">Try it for yourself</h3>

<p>Next time you’re reviewing code that computes sizes or subscripts, bring
the list up and see how well it follows the guidelines. If it misses one,
try to contrive an input that causes an overflow. If it follows guidelines
and you can still contrive such an input, then perhaps the list could use
another item!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Speculations on arenas and custom strings in C++</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/04/14/"/>
    <id>urn:uuid:6b07a406-b303-4c2b-8afd-3e589b26eaa1</id>
    <updated>2024-04-14T00:39:18Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p><em>Update September 2025: This article <a href="/blog/2025/09/30/">has a followup</a> with
corrections.</em></p>

<p>My techniques with <a href="/blog/2023/09/27/">arena allocation</a> and <a href="/blog/2023/10/08/">strings</a> are
oriented around C. I’m always looking for a better way, and lately I’ve
been experimenting with building them using C++ features. What are the
trade-offs? Are the benefits worth the costs? In this article I lay out my
goals, review implementation possibilities, and discuss my findings.
Following along will require familiarity with those previous two articles.</p>

<!--more-->

<p>Some of C++ is beyond my mental capabilities, and so I cannot wield those
parts effectively. Other parts I <em>can</em> wrap my head around, but it
requires substantial effort and the inevitable mistakes are difficult to
debug. So a general goal is to minimize contact with that complexity, only
touching a few higher-value features that I can use confidently.</p>

<p>Existing practice is unimportant. I’ve seen where that goes. <a href="/blog/2023/02/11/">Like the C
standard library</a>, the C++ standard library offers me little. Its
concepts regarding ownership and memory management are irreconcilable
(move semantics, smart pointers, etc.), so I have to build from scratch
anyway. So absolutely no including C++ headers. The most valuable features
are built right into the language, so I won’t need to include library
definitions.</p>

<p>No <a href="https://www.youtube.com/watch?v=uHSLHvWFkto&amp;t=4386s"><code class="language-plaintext highlighter-rouge">public</code> or <code class="language-plaintext highlighter-rouge">private</code></a>. Still no <code class="language-plaintext highlighter-rouge">const</code> beyond what is required
to access certain features. This means I can toss out a bunch of keywords
like <code class="language-plaintext highlighter-rouge">class</code>, <code class="language-plaintext highlighter-rouge">friend</code>, etc. It eliminates noisy, repetitive code and
interfaces — getters, setters, separate <code class="language-plaintext highlighter-rouge">const</code> and non-<code class="language-plaintext highlighter-rouge">const</code> — which in
my experience means fewer defects.</p>

<p>No references beyond mandatory cases. References hide addresses being
taken (or merely imply it, when it’s actually an expensive copy), which
is an annoying experience when reading unfamiliar C++. After all, for
arenas the explicit address-taking (permanent) or copying (scratch) is a
critical part of communicating the interfaces.</p>

<p>In theory <code class="language-plaintext highlighter-rouge">constexpr</code> could be useful, but it keeps falling short when I
try it out, so I’m ignoring it. I’ll elaborate in a moment.</p>

<p>Minimal template use. They blow up compile times and code size, they’re
noisy, and in practice they make debug builds (i.e. <code class="language-plaintext highlighter-rouge">-O0</code>) much slower
(typically ~10x) because there’s no optimization to clean up the mess.
I’ll only use them for a few foundational purposes, such as allocation.
(Though this article <em>is</em> about the fundamental stuff.)</p>

<p>No methods aside from limited use of operator overloads. I want to keep a
C style, plus methods just look ugly without references: <code class="language-plaintext highlighter-rouge">obj-&gt;func()</code> vs.
<code class="language-plaintext highlighter-rouge">func(obj)</code>. (Why are we still writing <code class="language-plaintext highlighter-rouge">-&gt;</code> in the 21st century?) Function
overloading can instead differentiate “methods.” Overloads are acceptable
in moderation, especially because I’m paying for it (symbol decoration)
whether or not I take advantage.</p>

<p>Finally, no exceptions of course. I assume <code class="language-plaintext highlighter-rouge">-fno-exceptions</code>, or the local
equivalent, is active.</p>

<h3 id="allocation">Allocation</h3>

<p>Let’s start with allocation. Since writing that previous article, I’ve
streamlined arena allocation in C:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define new(a, t, n)  (t *)alloc(a, sizeof(t), _Alignof(t), n)
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">byte</span> <span class="o">*</span><span class="n">beg</span><span class="p">;</span>
    <span class="n">byte</span> <span class="o">*</span><span class="n">end</span><span class="p">;</span>
<span class="p">}</span> <span class="n">arena</span><span class="p">;</span>

<span class="k">static</span> <span class="n">byte</span> <span class="o">*</span><span class="nf">alloc</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">size</span> <span class="n">objsize</span><span class="p">,</span> <span class="n">size</span> <span class="n">align</span><span class="p">,</span> <span class="n">size</span> <span class="n">count</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">size</span> <span class="n">pad</span> <span class="o">=</span> <span class="p">(</span><span class="n">uptr</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">objsize</span><span class="p">);</span>  <span class="c1">// oom</span>
    <span class="k">return</span> <span class="n">memset</span><span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-=</span> <span class="n">objsize</span><span class="o">*</span><span class="n">count</span> <span class="o">+</span> <span class="n">pad</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">objsize</span><span class="o">*</span><span class="n">count</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>(As needed, replace the second <code class="language-plaintext highlighter-rouge">assert</code> with whatever out of memory policy
is appropriate.) Then allocating, say, a <a href="/blog/2023/06/26/">10k-element hash table</a>
(i.e. to keep it <a href="/blog/2024/02/05/">off the stack</a>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">i16</span> <span class="o">*</span><span class="n">seen</span> <span class="o">=</span> <span class="n">new</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="n">i16</span><span class="p">,</span> <span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">14</span><span class="p">);</span>
</code></pre></div></div>

<p>With C++, I initially tried <a href="https://en.cppreference.com/w/cpp/language/new#Placement_new">placement new</a> with the arena as the
“place” for the allocation:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="o">*</span><span class="k">operator</span> <span class="nf">new</span><span class="p">(</span><span class="kt">size_t</span><span class="p">,</span> <span class="n">arena</span> <span class="o">*</span><span class="p">);</span>  <span class="c1">// avoid this</span>
</code></pre></div></div>

<p>Then to create a single object:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">object</span> <span class="o">*</span><span class="n">o</span> <span class="o">=</span> <span class="k">new</span> <span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">)</span> <span class="n">object</span><span class="p">{};</span>
</code></pre></div></div>

<p>This exposes the constructor, but everything else about it is poor. It
relies on complex, finicky rules governing <code class="language-plaintext highlighter-rouge">new</code> overloads, especially for
alignment handling. It’s difficult to tell what’s happening, and it’s too
easy to make mistakes that compile. That doesn’t even count the mess that
is array <code class="language-plaintext highlighter-rouge">new[]</code>.</p>

<p>I soon learned it’s better to replace the <code class="language-plaintext highlighter-rouge">new</code> macro with a template,
which can actually see what it’s doing. I can’t call it <code class="language-plaintext highlighter-rouge">new</code> in C++, so I
settled on <code class="language-plaintext highlighter-rouge">make</code> instead:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">T</span> <span class="o">*</span><span class="nf">make</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">size</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">size</span> <span class="n">objsize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="n">size</span> <span class="n">align</span>   <span class="o">=</span> <span class="k">alignof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span>
    <span class="n">size</span> <span class="n">pad</span>     <span class="o">=</span> <span class="p">(</span><span class="n">uptr</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">count</span> <span class="o">&lt;</span> <span class="p">(</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-</span> <span class="n">a</span><span class="o">-&gt;</span><span class="n">beg</span> <span class="o">-</span> <span class="n">pad</span><span class="p">)</span><span class="o">/</span><span class="n">objsize</span><span class="p">);</span>  <span class="c1">// oom</span>
    <span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span> <span class="o">-=</span> <span class="n">objsize</span><span class="o">*</span><span class="n">count</span> <span class="o">+</span> <span class="n">pad</span><span class="p">;</span>
    <span class="n">T</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="p">)</span><span class="n">a</span><span class="o">-&gt;</span><span class="n">end</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">size</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">new</span> <span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="n">T</span><span class="p">{};</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then allocating that hash table becomes:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">i16</span> <span class="o">*</span><span class="n">seen</span> <span class="o">=</span> <span class="n">make</span><span class="o">&lt;</span><span class="n">i16</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">,</span> <span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">14</span><span class="p">);</span>
</code></pre></div></div>

<p>Or a single object, relying on the default argument:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">object</span> <span class="o">*</span><span class="n">o</span> <span class="o">=</span> <span class="n">make</span><span class="o">&lt;</span><span class="n">object</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">scratch</span><span class="p">);</span>
</code></pre></div></div>

<p>Due to placement new, merely for invoking the constructor, these objects
aren’t just zero-initialized, but value-initialized. It can only construct
objects that define an empty initializer, but in exchange unlocks some
interesting possibilities:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">mat3</span> <span class="p">{</span>
    <span class="n">f32</span> <span class="n">data</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
        <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span>
    <span class="p">};</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">list</span> <span class="p">{</span>
    <span class="n">node</span>  <span class="o">*</span><span class="n">head</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">node</span> <span class="o">**</span><span class="n">tail</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">head</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>When a zero-initialized state isn’t ideal, objects can still initialize to
a more useful state straight out of the arena. The second case is even
self-referencing, which is specifically supported through placement new.
Otherwise you’d need a special-written copy or move constructor.</p>
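
<p>Putting the pieces together, a self-contained sketch of the
self-referencing case might look like the following. The typedefs, the
<code class="language-plaintext highlighter-rouge">node</code> layout, and the <code class="language-plaintext highlighter-rouge">push</code> helper are assumptions for illustration, and
<code class="language-plaintext highlighter-rouge">&lt;new&gt;</code> plus the C headers are included only so the sketch compiles on its
own.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <new>  // placement new, included here only for brevity

typedef ptrdiff_t size;
typedef uintptr_t uptr;
typedef char      byte;

struct arena {
    byte *beg;
    byte *end;
};

template<typename T>
static T *make(arena *a, size count = 1)
{
    assert(count >= 0);
    size objsize = sizeof(T);
    size align   = alignof(T);
    size pad     = (uptr)a->end & (align - 1);
    assert(count < (a->end - a->beg - pad)/objsize);  // oom
    a->end -= objsize*count + pad;
    T *r = (T *)a->end;
    for (size i = 0; i < count; i++) {
        new ((void *)&r[i]) T{};  // value-initialize in place
    }
    return r;
}

// Hypothetical node type for the list below.
struct node {
    node *next  = 0;
    int   value = 0;
};

struct list {
    node  *head = 0;
    node **tail = &head;  // points into the object itself
};

// Hypothetical helper: append by writing through the tail pointer.
static void push(list *l, arena *a, int value)
{
    node *n = make<node>(a);
    n->value = value;
    *l->tail = n;
    l->tail = &n->next;
}
```

<p>Each <code class="language-plaintext highlighter-rouge">list</code> is constructed at its final address, so <code class="language-plaintext highlighter-rouge">tail</code> points at that
object’s own <code class="language-plaintext highlighter-rouge">head</code>, including for every element of an array allocation.</p>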

<p><code class="language-plaintext highlighter-rouge">make</code> could accept constructor arguments and perfect forward them to a
constructor. However, that’s too far into the dark arts for my comfort,
plus it requires a correct definition of <code class="language-plaintext highlighter-rouge">std::forward</code>. In practice that
means <code class="language-plaintext highlighter-rouge">#include</code>-ing it, and whatever comes in with it. Or ask an expert
capable of writing such a definition from scratch, though both are
probably too busy.</p>

<p><strong>Update 1</strong>: One of those experts, Jonathan Müller, kindly reached out to
say that <a href="https://www.foonathan.net/2020/09/move-forward/">a static cast is sufficient</a>. This is easy to do:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="o">...</span><span class="nc">A</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">T</span> <span class="o">*</span><span class="nf">make</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">size</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">A</span> <span class="o">&amp;&amp;</span><span class="p">...</span><span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// ...</span>
        <span class="k">new</span> <span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="n">T</span><span class="p">{(</span><span class="n">A</span> <span class="o">&amp;&amp;</span><span class="p">)</span><span class="n">args</span><span class="p">...};</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Update 2</strong>: I later realized that because I do not care about copy or
move semantics, I also don’t care about perfect forwarding. I can simply
expand the parameter pack without casting or <code class="language-plaintext highlighter-rouge">&amp;&amp;</code>. I also don’t want the
extra restrictions on braced initializer conversions, so better to use
parentheses with <code class="language-plaintext highlighter-rouge">new</code>.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="o">...</span><span class="nc">A</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">T</span> <span class="o">*</span><span class="nf">make</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="n">size</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">A</span> <span class="p">...</span><span class="n">args</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// ...</span>
        <span class="k">new</span> <span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="n">T</span><span class="p">(</span><span class="n">args</span><span class="p">...);</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
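
<p>Filled out, the Update 2 version might look like this sketch. The elided
body is assumed to be the same as the earlier <code class="language-plaintext highlighter-rouge">make</code>; the typedefs and the
<code class="language-plaintext highlighter-rouge">vec2</code> type are hypothetical, and <code class="language-plaintext highlighter-rouge">&lt;new&gt;</code> is included only to keep the
example short.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <new>  // placement new, included here only for brevity

typedef ptrdiff_t size;
typedef uintptr_t uptr;
typedef char      byte;

struct arena {
    byte *beg;
    byte *end;
};

template<typename T, typename ...A>
static T *make(arena *a, size count = 1, A ...args)
{
    assert(count >= 0);
    size objsize = sizeof(T);
    size align   = alignof(T);
    size pad     = (uptr)a->end & (align - 1);
    assert(count < (a->end - a->beg - pad)/objsize);  // oom
    a->end -= objsize*count + pad;
    T *r = (T *)a->end;
    for (size i = 0; i < count; i++) {
        new ((void *)&r[i]) T(args...);  // every element gets the same args
    }
    return r;
}

// Hypothetical type with no default constructor.
struct vec2 {
    float x, y;
    vec2(float x0, float y0) : x{x0}, y{y0} {}
};
```

<p>A call such as <code class="language-plaintext highlighter-rouge">make&lt;vec2&gt;(&amp;scratch, 3, 1.0f, 2.0f)</code> constructs three identical
elements. Note that <code class="language-plaintext highlighter-rouge">count</code> must be spelled out whenever constructor arguments
follow it, since the default only applies when nothing comes after the
arena. With an empty pack the expression becomes <code class="language-plaintext highlighter-rouge">T()</code>, which still
value-initializes.</p>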

<p>One small gotcha: placement new doesn’t work out of the box, and you need
to provide a definition. That means including <code class="language-plaintext highlighter-rouge">&lt;new&gt;</code> or writing one out.
Fortunately it’s trivial, but the prototype must exactly match, including
<code class="language-plaintext highlighter-rouge">size_t</code>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="o">*</span><span class="k">operator</span> <span class="nf">new</span><span class="p">(</span><span class="kt">size_t</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">p</span><span class="p">;</span> <span class="p">}</span>
</code></pre></div></div>

<p>Overall I feel the template is a small improvement over the macro.</p>

<h3 id="strings">Strings</h3>

<p>Recall my basic C string type, with a macro to wrap literals:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define countof(a)  (size)(sizeof(a) / sizeof(*(a)))
#define s8(s)       (s8){(u8 *)s, countof(s)-1}
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">u8</span>  <span class="o">*</span><span class="n">data</span><span class="p">;</span>
    <span class="n">size</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span> <span class="n">s8</span><span class="p">;</span>
</code></pre></div></div>

<p>Since it doesn’t own the underlying buffer — region-based allocation has
already solved the ownership problem — this is what C++ long-windedly
calls a <code class="language-plaintext highlighter-rouge">std::string_view</code>. In C++ we won’t need the <code class="language-plaintext highlighter-rouge">countof</code> macro for
strings, but it’s still generally useful. Converting it to a template is
<em>theoretically</em> more robust (it rejects pointers), but comes with <a href="https://vittorioromeo.info/index/blog/debug_performance_cpp.html">a
non-zero cost</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">template</span><span class="o">&lt;</span><span class="kr">typename</span> <span class="n">T</span><span class="p">,</span> <span class="n">size</span> <span class="n">N</span><span class="o">&gt;</span>
<span class="n">size</span> <span class="nf">countof</span><span class="p">(</span><span class="n">T</span> <span class="p">(</span><span class="o">&amp;</span><span class="p">)[</span><span class="n">N</span><span class="p">])</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">N</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
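
<p>For a quick check of how the template behaves, here is a self-contained
sketch. The <code class="language-plaintext highlighter-rouge">size</code> typedef is assumed to be <code class="language-plaintext highlighter-rouge">ptrdiff_t</code>, matching the signed
sizes used throughout.</p>

```cpp
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t size;  // assumption: signed size type

template<typename T, size N>
size countof(T (&)[N])
{
    return N;  // N is deduced from the array bound
}
```

<p>Given <code class="language-plaintext highlighter-rouge">int primes[] = {2, 3, 5, 7}</code>, <code class="language-plaintext highlighter-rouge">countof(primes)</code> is 4, and a string
literal counts its null terminator. Passing an <code class="language-plaintext highlighter-rouge">int *</code> fails to compile
because a pointer cannot bind to a reference to an array, whereas the macro
would quietly compute a meaningless result.</p>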

<p>The reference — here a reference to an array — is unavoidable, so it’s one
of the rare cases. The same concept applies as an <code class="language-plaintext highlighter-rouge">s8</code> constructor to
replace the macro:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">s8</span> <span class="p">{</span>
    <span class="n">u8</span>  <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">size</span> <span class="n">len</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">s8</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

    <span class="n">template</span><span class="o">&lt;</span><span class="n">size</span> <span class="n">N</span><span class="o">&gt;</span>
    <span class="n">s8</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="p">(</span><span class="o">&amp;</span><span class="n">s</span><span class="p">)[</span><span class="n">N</span><span class="p">])</span> <span class="o">:</span> <span class="n">data</span><span class="p">{(</span><span class="n">u8</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="p">},</span> <span class="n">len</span><span class="p">{</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="p">}</span> <span class="p">{}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>I’ve explicitly asked to keep a default zero-initialized (empty) string
since it’s useful — and necessary to directly allocate strings using
<code class="language-plaintext highlighter-rouge">make</code>, e.g. an array of strings. <code class="language-plaintext highlighter-rouge">const</code> is required because string
literals are <code class="language-plaintext highlighter-rouge">const</code> in C++, but it’s immediately stripped off for the
sake of simplicity. The new constructor allows:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">s8</span> <span class="n">version</span> <span class="o">=</span> <span class="s">"1.2.3"</span><span class="p">;</span>
</code></pre></div></div>

<p>Or even <a href="/blog/2023/02/13/">more usefully</a>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="nf">print</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">s8</span><span class="p">);</span>
    <span class="c1">// ...</span>
    <span class="n">print</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="s">"hello world</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
</code></pre></div></div>

<p>Define <code class="language-plaintext highlighter-rouge">operator==</code> and it’s more useful yet:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">b32</span> <span class="k">operator</span><span class="o">==</span><span class="p">(</span><span class="n">s8</span> <span class="n">s</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">return</span> <span class="n">len</span><span class="o">==</span><span class="n">s</span><span class="p">.</span><span class="n">len</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="o">!</span><span class="n">len</span> <span class="o">||</span> <span class="o">!</span><span class="n">memcmp</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">s</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="n">len</span><span class="p">));</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Now this works, and it’s cheap and fast even in debug builds:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">s8</span> <span class="n">key</span> <span class="o">=</span> <span class="p">...;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">key</span> <span class="o">==</span> <span class="s">"HOME"</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
</code></pre></div></div>
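
<p>Assembled into one piece, the literal constructor and <code class="language-plaintext highlighter-rouge">operator==</code> can be
tried out with a sketch like this one. The typedefs and the <code class="language-plaintext highlighter-rouge">lookup</code>
function are hypothetical, and the C headers are there only so it compiles
standalone.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef ptrdiff_t size;
typedef uint8_t   u8;
typedef int32_t   b32;

struct s8 {
    u8  *data = 0;
    size len  = 0;

    s8() = default;

    // Wrap a string literal, dropping the null terminator from the length.
    template<size N>
    s8(const char (&s)[N]) : data{(u8 *)s}, len{N-1} {}

    b32 operator==(s8 s)
    {
        return len==s.len && (!len || !memcmp(data, s.data, len));
    }
};

// Hypothetical lookup, for illustration only.
static int lookup(s8 key)
{
    if (key == "HOME") return 1;
    if (key == "PATH") return 2;
    return 0;
}
```

<p>Both the argument in <code class="language-plaintext highlighter-rouge">lookup("HOME")</code> and the right-hand side of each
comparison go through the same implicit conversion from a literal.</p>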

<p>That’s more ergonomic than the macro and comparison function. <code class="language-plaintext highlighter-rouge">operator[]</code>
also improves ergonomics, to subscript a string without going through the
<code class="language-plaintext highlighter-rouge">data</code> member:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">u8</span> <span class="o">&amp;</span><span class="k">operator</span><span class="p">[](</span><span class="n">size</span> <span class="n">i</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The reference is again necessary to make subscripts assignable. Since
<code class="language-plaintext highlighter-rouge">s8span</code> — make a string spanning two pointers — so often appears in my
programs, a constructor seems appropriate, too:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">s8</span><span class="p">(</span><span class="n">u8</span> <span class="o">*</span><span class="n">beg</span><span class="p">,</span> <span class="n">u8</span> <span class="o">*</span><span class="n">end</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">beg</span> <span class="o">&lt;=</span> <span class="n">end</span><span class="p">);</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">beg</span><span class="p">;</span>
        <span class="n">len</span> <span class="o">=</span> <span class="n">end</span> <span class="o">-</span> <span class="n">beg</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>
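
<p>Putting the pieces so far together, here is a minimal, self-contained
sketch of the string type. The typedefs and the string literal constructor
are assumptions standing in for definitions given earlier, so treat it as
an illustration rather than the final word:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char  u8;    // assumed typedef
typedef std::ptrdiff_t size;  // assumed typedef

struct s8 {
    u8  *data = 0;
    size len  = 0;

    s8() = default;

    // Assumed string literal constructor; N counts the null terminator.
    template<size N>
    s8(const char (&s)[N]) : data{(u8 *)s}, len{N-1} {}

    // Span constructor: a string covering [beg, end).
    s8(u8 *beg, u8 *end)
    {
        assert(beg <= end);
        data = beg;
        len = end - beg;
    }

    // Equal lengths and equal bytes; memcmp is skipped for empty strings.
    bool operator==(s8 s)
    {
        return len==s.len && (!len || !std::memcmp(data, s.data, len));
    }

    // Bounds-checked subscript; the reference makes it assignable.
    u8 &operator[](size i)
    {
        assert(i >= 0);
        assert(i < len);
        return data[i];
    }
};
```

<p>A literal on the right-hand side of <code class="language-plaintext highlighter-rouge">==</code> converts through the
template constructor, so no macro is involved in the comparison.</p>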

<p>By the way, these assertions I’ve been using are great for catching
mistakes quickly and early, and they complement <a href="/blog/2019/01/25/">fuzz testing</a>.</p>

<p>I’m not sold on it, but an idea for the future: C++23’s multi-index
<code class="language-plaintext highlighter-rouge">operator[]</code> as a slice operator:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">s8</span> <span class="k">operator</span><span class="p">[](</span><span class="n">size</span> <span class="n">beg</span><span class="p">,</span> <span class="n">size</span> <span class="n">end</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">beg</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">beg</span> <span class="o">&lt;=</span> <span class="n">end</span><span class="p">);</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">end</span> <span class="o">&lt;=</span> <span class="n">len</span><span class="p">);</span>
        <span class="k">return</span> <span class="p">{</span><span class="n">data</span><span class="o">+</span><span class="n">beg</span><span class="p">,</span> <span class="n">data</span><span class="o">+</span><span class="n">end</span><span class="p">};</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Then:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">s8</span> <span class="n">msg</span> <span class="o">=</span> <span class="s">"foo bar baz"</span><span class="p">;</span>
    <span class="n">msg</span> <span class="o">=</span> <span class="n">msg</span><span class="p">[</span><span class="mi">4</span><span class="p">,</span><span class="mi">7</span><span class="p">];</span>  <span class="c1">// msg = "bar"</span>
</code></pre></div></div>

<p>I could keep going with, say, iterators and such, but each will be more
specialized and less useful. (I don’t care about range-based <code class="language-plaintext highlighter-rouge">for</code> loops.)</p>

<h3 id="downside-static-initialization">Downside: static initialization</h3>

<p>The new string stuff is neat, but I hit a wall trying it out: These fancy
constructors do not reliably construct at compile time, <em>not even with a
<code class="language-plaintext highlighter-rouge">constexpr</code> qualifier</em> in two of the three major C++ implementations. A
static lookup table that contains a string is likely constructed at run
time in at least some builds. For example, this table:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">s8</span> <span class="n">keys</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="s">"foo"</span><span class="p">,</span> <span class="s">"bar"</span><span class="p">,</span> <span class="s">"baz"</span><span class="p">};</span>
</code></pre></div></div>

<p>Requires run-time construction in real world cases I care about, requiring
C++ magic and linking runtime gunk. The constructor is therefore a strict
downgrade from the macro, which works perfectly in these lookup tables.
Once a non-default constructor is defined, I’ve been unable to find an
escape hatch back to the original, dumb, reliable behavior.</p>

<p><strong>Update</strong>: Jonathan Müller points out the reinterpret cast is forbidden
in a <code class="language-plaintext highlighter-rouge">constexpr</code> function, so it’s not required to happen at compile time.
After some thought, I’ve figured out a workaround using a union:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">s8</span> <span class="p">{</span>
    <span class="k">union</span> <span class="p">{</span>
        <span class="n">u8</span>         <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">cdata</span><span class="p">;</span>
    <span class="p">};</span>
    <span class="n">size</span> <span class="n">len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="k">template</span><span class="o">&lt;</span><span class="n">size</span> <span class="n">N</span><span class="p">&gt;</span>
    <span class="k">constexpr</span> <span class="n">s8</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="p">(</span><span class="o">&amp;</span><span class="n">s</span><span class="p">)[</span><span class="n">N</span><span class="p">])</span> <span class="o">:</span> <span class="n">cdata</span><span class="p">{</span><span class="n">s</span><span class="p">},</span> <span class="n">len</span><span class="p">{</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="p">}</span> <span class="p">{}</span>

    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p>In all three C++ implementations, in all configurations, this reliably
constructs strings at compile time. The other semantics are unchanged.</p>
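
<p>One way to check this claim mechanically is to declare the table
<code class="language-plaintext highlighter-rouge">constexpr</code>, which refuses to compile if the constructor needs run-time
work. A small sketch, where the typedefs and the <code class="language-plaintext highlighter-rouge">total_len</code> helper are
assumptions added only for illustration:</p>

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char  u8;    // assumed typedef
typedef std::ptrdiff_t size;  // assumed typedef

struct s8 {
    union {
        u8         *data = 0;
        const char *cdata;
    };
    size len = 0;

    s8() = default;

    // No cast: the literal is stored through the const char member.
    template<size N>
    constexpr s8(const char (&s)[N]) : cdata{s}, len{N-1} {}
};

// constexpr demands constant initialization, so this line is a compile
// error if any element would have to be constructed at run time.
static constexpr s8 keys[] = {"foo", "bar", "baz"};
static_assert(keys[0].len == 3, "table built at compile time");
static_assert(sizeof(keys)/sizeof(keys[0]) == 3, "three entries");

// Hypothetical helper: sum of the key lengths, read at run time.
size total_len()
{
    size n = 0;
    for (size i = 0; i < (size)(sizeof(keys)/sizeof(keys[0])); i++) {
        n += keys[i].len;
    }
    return n;
}
```

<p>This makes the table immutable, which the original table was not. For a
mutable table, C++20 <code class="language-plaintext highlighter-rouge">constinit</code> performs the same check.</p>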

<h3 id="other-features">Other features</h3>

<p>Having a generic dynamic array would be handy, and more ergonomic than <a href="/blog/2023/10/05/">my
dynamic array macro</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">template</span><span class="o">&lt;</span><span class="kr">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">struct</span> <span class="n">slice</span> <span class="p">{</span>
    <span class="n">T</span>   <span class="o">*</span><span class="n">data</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">size</span> <span class="n">len</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">size</span> <span class="n">cap</span>  <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">slice</span><span class="p">()</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

    <span class="n">template</span><span class="o">&lt;</span><span class="n">size</span> <span class="n">N</span><span class="o">&gt;</span>
    <span class="n">slice</span><span class="p">(</span><span class="n">T</span> <span class="p">(</span><span class="o">&amp;</span><span class="n">a</span><span class="p">)[</span><span class="n">N</span><span class="p">])</span> <span class="o">:</span> <span class="n">data</span><span class="p">{</span><span class="n">a</span><span class="p">},</span> <span class="n">len</span><span class="p">{</span><span class="n">N</span><span class="p">},</span> <span class="n">cap</span><span class="p">{</span><span class="n">N</span><span class="p">}</span> <span class="p">{}</span>

    <span class="n">T</span> <span class="o">&amp;</span><span class="n">operator</span><span class="p">[](</span><span class="n">size</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
<span class="p">};</span>

<span class="n">template</span><span class="o">&lt;</span><span class="kr">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="n">slice</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">append</span><span class="p">(</span><span class="n">arena</span> <span class="o">*</span><span class="p">,</span> <span class="n">slice</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">T</span><span class="p">);</span>
</code></pre></div></div>
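
<p>To make the idea concrete, here is one possible <code class="language-plaintext highlighter-rouge">append</code>. The <code class="language-plaintext highlighter-rouge">arena</code>
layout and the <code class="language-plaintext highlighter-rouge">alloc</code> helper are hypothetical stand-ins, and growth is the
simplest thing that could work: double the capacity and copy into fresh
arena memory. It is a sketch under those assumptions, not a tuned
implementation:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

typedef std::ptrdiff_t size;  // assumed typedef

// Hypothetical bump allocator standing in for the arena.
struct arena {
    char *beg;
    char *end;
};

// Hypothetical helper: carve count uninitialized Ts out of the arena.
template<typename T>
T *alloc(arena *a, size count)
{
    size avail = a->end - a->beg;
    size pad   = (size)(-(std::uintptr_t)a->beg & (alignof(T) - 1));
    assert(pad <= avail);
    assert(count >= 0 && count <= (avail - pad)/(size)sizeof(T));
    T *r = (T *)(a->beg + pad);
    a->beg += pad + count*(size)sizeof(T);
    return r;
}

template<typename T>
struct slice {
    T   *data = 0;
    size len  = 0;
    size cap  = 0;

    slice() = default;

    template<size N>
    slice(T (&a)[N]) : data{a}, len{N}, cap{N} {}

    T &operator[](size i)
    {
        assert(i >= 0);
        assert(i < len);
        return data[i];
    }
};

// When full, doubles the capacity and copies into new arena memory.
template<typename T>
slice<T> append(arena *a, slice<T> s, T v)
{
    static_assert(std::is_trivially_copyable<T>::value, "copied with memcpy");
    if (s.len == s.cap) {
        size cap  = s.cap ? 2*s.cap : 4;
        T   *data = alloc<T>(a, cap);
        if (s.len) {
            std::memcpy(data, s.data, s.len*sizeof(T));
        }
        s.data = data;
        s.cap  = cap;
    }
    s.data[s.len++] = v;
    return s;
}
```

<p>Because a slice built from an array starts out full, the first append
copies the elements into the arena and leaves the original array alone.</p>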

<p>On the other hand, <a href="/blog/2023/09/30/">hash maps are mostly solved</a>, so I wouldn’t
bother with a generic map.</p>

<p>Function overloads would simplify naming. For example, this in C:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prints8</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">s8</span><span class="p">);</span>
<span class="n">printi32</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">i32</span><span class="p">);</span>
<span class="n">printf64</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">f64</span><span class="p">);</span>
<span class="n">printvec3</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">vec3</span><span class="p">);</span>
</code></pre></div></div>

<p>Would hide that stuff behind the scenes in the symbol decoration:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">print</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">s8</span><span class="p">);</span>
<span class="n">print</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">i32</span><span class="p">);</span>
<span class="n">print</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">f64</span><span class="p">);</span>
<span class="n">print</span><span class="p">(</span><span class="n">bufout</span> <span class="o">*</span><span class="p">,</span> <span class="n">vec3</span><span class="p">);</span>
</code></pre></div></div>

<p>Same goes for a <code class="language-plaintext highlighter-rouge">hash()</code> function on different types.</p>
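
<p>A rough sketch of what that could look like. The simplified <code class="language-plaintext highlighter-rouge">s8</code>, the
typedefs, and the choice of hash functions (FNV-1a for strings, a
multiply-and-fold for integers) are all assumptions picked for
illustration:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef unsigned char  u8;    // assumed typedefs
typedef std::int32_t   i32;
typedef std::uint64_t  u64;
typedef std::ptrdiff_t size;

// Simplified stand-in for the string type.
struct s8 {
    const u8 *data;
    size      len;
};

// 64-bit FNV-1a over the string's bytes.
u64 hash(s8 s)
{
    u64 h = 0xcbf29ce484222325u;
    for (size i = 0; i < s.len; i++) {
        h ^= s.data[i];
        h *= 0x100000001b3u;
    }
    return h;
}

// Multiply by an odd constant, then fold the high bits down.
u64 hash(i32 x)
{
    u64 h = (u64)(std::uint32_t)x * 0x9e3779b97f4a7c15u;
    return h ^ (h >> 32);
}
```

<p>Callers write <code class="language-plaintext highlighter-rouge">hash(key)</code> regardless of the key type, and the compiler
picks the definition from the argument.</p>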

<p>C++ has better null pointer semantics than C. Addition or subtraction of
zero with a null pointer produces a null pointer, and subtracting null
pointers results in zero. This eliminates some boneheaded special case
checks required in C, though not all: <code class="language-plaintext highlighter-rouge">memcpy</code>, for instance, arbitrarily
still does not accept null pointers even in C++.</p>
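
<p>A small sketch of where the special cases go away and where one
remains. The function names and typedefs are hypothetical:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char  u8;    // assumed typedef
typedef std::ptrdiff_t size;  // assumed typedef

// Length of [beg, end). In C++ the subtraction is defined even when
// both pointers are null, so no special case is needed.
size span_len(u8 *beg, u8 *end)
{
    return end - beg;
}

// Copies len bytes and returns the end of the destination. The guard
// stays because memcpy may not be given null pointers, even for a
// zero-length copy.
u8 *copy(u8 *dst, u8 *src, size len)
{
    if (len) {
        std::memcpy(dst, src, len);
    }
    return dst + len;  // null + 0 is null in C++
}
```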

<h3 id="ultimately-worth-it">Ultimately worth it?</h3>

<p>The static data problem is a real bummer, but perhaps it’s worth it for
the other features. I still need to put it all to the test in a real,
sizable project.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>An improved chkstk function on Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/02/05/"/>
    <id>urn:uuid:381be450-559c-4521-911a-ba524dca7b64</id>
    <updated>2024-02-05T17:56:05Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p>If you’ve spent much time developing with Mingw-w64 you’ve likely seen the
symbol <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>, perhaps in an error message. It’s a little piece of
runtime provided by GCC via libgcc which ensures enough of the stack is
committed for the caller’s stack frame. The “function” uses a custom ABI
and is implemented in assembly. So is the subject of this article, a
slightly improved implementation soon to be included in <a href="/blog/2020/05/15/">w64devkit</a> as
libchkstk (<code class="language-plaintext highlighter-rouge">-lchkstk</code>).</p>

<p>The MSVC toolchain has an identical (x64) or similar (x86) function named
<code class="language-plaintext highlighter-rouge">__chkstk</code>. We’ll discuss that as well, and w64devkit will include x86 and
x64 implementations, useful when linking with MSVC object files. The new
x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> in particular is also better than the MSVC definition.</p>

<p>A note on spelling: <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> is spelled with three underscores, and
<code class="language-plaintext highlighter-rouge">__chkstk</code> is spelled with two. On x86, <a href="https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names#FormatC"><code class="language-plaintext highlighter-rouge">cdecl</code> functions</a> are
decorated with a leading underscore, and so may be rendered, e.g. in error
messages, with one fewer underscore. The true name is undecorated, and the
raw symbol name is identical on x86 and x64. Further complicating matters,
libgcc defines a <code class="language-plaintext highlighter-rouge">___chkstk</code> with three underscores. As far as I can tell,
this spelling arose from confusion regarding name decoration, but nobody’s
noticed for the past 28 years. libgcc’s x64 <code class="language-plaintext highlighter-rouge">___chkstk</code> is obviously and
badly broken, so I’m sure nobody has ever used it anyway, not even by
accident thanks to the misspelling. I’ll touch on that below.</p>

<p>When referring to a particular instance, I will use a specific spelling.
Otherwise the term “chkstk” refers to the family. If you’d like to skip
ahead to the source for libchkstk: <strong><a href="https://github.com/skeeto/w64devkit/blob/master/src/libchkstk.S"><code class="language-plaintext highlighter-rouge">libchkstk.S</code></a></strong>.</p>

<h3 id="a-gradually-committed-stack">A gradually committed stack</h3>

<p>The header of a Windows executable lists two stack sizes: a <em>reserve</em> size
and an initial <em>commit</em> size. The first is the largest the main thread
stack can grow, and the second is the amount <a href="https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc">committed</a> when the
program starts. A program gradually commits stack pages <em>as needed</em> up to
the reserve size. Binutils <code class="language-plaintext highlighter-rouge">objdump</code> option <code class="language-plaintext highlighter-rouge">-p</code> lists the sizes. Typical
output for a Mingw-w64 program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ objdump -p example.exe | grep SizeOfStack
SizeOfStackReserve      0000000000200000
SizeOfStackCommit       0000000000001000
</code></pre></div></div>

<p>The values are in hexadecimal, and this indicates 2MiB reserved and 4KiB
initially committed. With the Binutils linker, <code class="language-plaintext highlighter-rouge">ld</code>, you can set them at
link time using <code class="language-plaintext highlighter-rouge">--stack</code>. Via <code class="language-plaintext highlighter-rouge">gcc</code>, use <code class="language-plaintext highlighter-rouge">-Xlinker</code>. For example, to
reserve an 8MiB stack and commit half of it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Xlinker --stack=$((8&lt;&lt;20)),$((4&lt;&lt;20)) ...
</code></pre></div></div>

<p>MSVC <code class="language-plaintext highlighter-rouge">link.exe</code> similarly has <a href="https://learn.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations"><code class="language-plaintext highlighter-rouge">/stack</code></a>.</p>

<p>The purpose of this mechanism is to avoid paying the <em>commit charge</em> for
unused stack. It made sense 30 years ago when stacks were a potentially
large portion of physical memory. These days it’s a rounding error and
silly we’re still dealing with it. Using the above options you can choose
to commit the entire stack up front, at which point a chkstk helper is no
longer needed (<a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59532"><code class="language-plaintext highlighter-rouge">-mno-stack-arg-probe</code></a>, <a href="https://learn.microsoft.com/en-us/cpp/build/reference/gs-control-stack-checking-calls"><code class="language-plaintext highlighter-rouge">/Gs2147483647</code></a>). This
requires link-time control of the main module, which isn’t always an
option, like when supplying a DLL for someone else to run.</p>

<p>The program grows the stack by touching the singular <a href="https://devblogs.microsoft.com/oldnewthing/20220203-00/?p=106215">guard page</a>
mapped between the committed and uncommitted portions of the stack. This
action triggers a page fault, and the default fault handler commits the
guard page and maps a new guard page just below. In other words, the stack
grows one page at a time, in order.</p>

<p>In most cases nothing special needs to happen. The guard page mechanism is
transparent and in the background. However, if a function stack frame
exceeds the page size then there’s a chance that it might leap over the
guard page, crashing the program. To prevent this, compilers insert a
chkstk call in the function prologue. Before local variable allocation,
chkstk walks down the stack — that is, towards lower addresses — nudging
the guard page with each step. (As a side effect it provides <a href="/blog/2017/06/21/">stack clash
protection</a> — the only security aspect of chkstk.) For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">callee</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">);</span>

<span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">large</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">20</span><span class="p">];</span>
    <span class="n">callee</span><span class="p">(</span><span class="n">large</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Compiled with 64-bit <code class="language-plaintext highlighter-rouge">gcc -O</code>:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">example:</span>
    <span class="nf">movl</span>    <span class="kc">$</span><span class="mi">1048616</span><span class="p">,</span> <span class="o">%</span><span class="nb">eax</span>
    <span class="nf">call</span>    <span class="nv">___chkstk_ms</span>
    <span class="nf">subq</span>    <span class="o">%</span><span class="nb">rax</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
    <span class="nf">leaq</span>    <span class="mi">32</span><span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">),</span> <span class="o">%</span><span class="nb">rcx</span>
    <span class="nf">call</span>    <span class="nv">callee</span>
    <span class="nf">addq</span>    <span class="kc">$</span><span class="mi">1048616</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
    <span class="nf">ret</span>
</code></pre></div></div>

<p>I used GCC, but this is practically identical to the code generated by
MSVC and Clang. Note the call to <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> in the function prologue
before allocating the stack frame (<code class="language-plaintext highlighter-rouge">subq</code>). Also note that it sets <code class="language-plaintext highlighter-rouge">eax</code>.
As a volatile register, this would normally accomplish nothing because
it’s done just before a function call, but recall that <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> has
a custom ABI. That’s the argument to chkstk. Further note that it uses
<code class="language-plaintext highlighter-rouge">rax</code> on the return. That’s not a value returned by chkstk. Rather,
x64 <em>chkstk preserves all registers</em>, so the argument is still there.</p>

<p>Well, maybe. The official documentation says that registers <a href="https://learn.microsoft.com/en-us/cpp/build/prolog-and-epilog">r10 and r11
are volatile</a>, but that information conflicts with Microsoft’s own
implementation. Just in case, I choose a conservative interpretation that
all registers are preserved.</p>

<h3 id="implementing-chkstk">Implementing chkstk</h3>

<p>In a high level language, chkstk might look something like so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// NOTE: hypothetical implementation</span>
<span class="kt">void</span> <span class="nf">___chkstk_ms</span><span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">frame_size</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">volatile</span> <span class="kt">char</span> <span class="n">frame</span><span class="p">[</span><span class="n">frame_size</span><span class="p">];</span>  <span class="c1">// NOTE: variable-length array</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">ptrdiff_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">frame_size</span> <span class="o">-</span> <span class="n">PAGE_SIZE</span><span class="p">;</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">-=</span> <span class="n">PAGE_SIZE</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">frame</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// touch the guard page</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This wouldn’t work for a number of reasons, but if it did, <code class="language-plaintext highlighter-rouge">volatile</code>
would serve two purposes. First, forcing the side effect to occur. The
second is more subtle: The loop must happen in exactly this order, from
high to low. Without <code class="language-plaintext highlighter-rouge">volatile</code>, loop iterations would be independent — as
there are no dependencies between iterations — and so a compiler could
reverse the loop direction.</p>

<p>The store can happen anywhere within the guard page, so it’s not necessary
to align <code class="language-plaintext highlighter-rouge">frame</code> to the page. Simply touching at least one byte per page
is enough. This is essentially the definition of libgcc <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>.</p>

<p>How many iterations occur? In <code class="language-plaintext highlighter-rouge">example</code> above, the stack frame will be
around 1MiB (2<sup>20</sup>). With pages of 4KiB (2<sup>12</sup>) that’s
256 iterations. The loop happens unconditionally, meaning <em>every function
call</em> requires 256 iterations of this loop. Wouldn’t it be better if the
loop ran only as needed, i.e. the first time? MSVC x64 <code class="language-plaintext highlighter-rouge">__chkstk</code> skips
iterations if possible, and the same goes for my new <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>. Much
like <a href="/blog/2022/02/18/#my-getcommandlinew">the command line string</a>, the low address of the current
thread’s guard page is accessible through the <a href="https://en.wikipedia.org/wiki/Win32_Thread_Information_Block">Thread Information
Block</a> (TIB). A chkstk can cheaply query this address, only looping
during initialization or so. (<a href="/blog/2023/03/23/">In contrast to Linux</a>, a thread’s
stack is fundamentally managed by the operating system.)</p>

<p>Taking that into account, an improved algorithm:</p>

<ol>
  <li>Push registers that will be used</li>
  <li>Compute the low address of the new stack frame (F)</li>
  <li>Retrieve the low address of the committed stack (C)</li>
  <li>Go to 7</li>
  <li>Subtract the page size from C</li>
  <li>Touch memory at C</li>
  <li>If C &gt; F, go to 5</li>
  <li>Pop registers to restore them and return</li>
</ol>
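
<p>The loop in steps 2–7 can be modeled in C++ with plain integers in
place of real addresses. This is only a model for counting page faults,
not a working chkstk, and every name in it is hypothetical:</p>

```cpp
#include <cassert>
#include <cstdint>

typedef std::uintptr_t uptr;

static const uptr page_size = 0x1000;

struct probe_result {
    uptr committed;  // new low address of the committed stack (C)
    int  touches;    // guard page hits, i.e. page faults
};

// Integer model of steps 2-7; sp and committed are pretend addresses.
probe_result probe(uptr sp, uptr frame_size, uptr committed)
{
    uptr frame = sp - frame_size;  // 2. low address of the new frame (F)
    int touches = 0;
    while (committed > frame) {    // 7. reached first via the jump in 4
        committed -= page_size;    // 5.
        touches++;                 // 6. the real thing touches memory at C
    }
    return {committed, touches};
}
```

<p>In this model, a stack pointer at 2MiB with one page committed and the
1MiB frame from <code class="language-plaintext highlighter-rouge">example</code> takes 255 touches on the first call. A second
call at the same depth takes none, which is the fast path.</p>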

<p>An unconditional forward jump is a little unusual in pseudo-code, but
this closely matches my assembly. The loop causes page faults, and it’s
the slow, uncommon path. The common, fast path never executes 5–6. I’ve
also chosen smaller instructions in order to keep the function small and
reduce instruction cache pressure. My x64 implementation as of this
writing:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">___chkstk_ms:</span>
    <span class="nf">push</span> <span class="o">%</span><span class="nb">rax</span>              <span class="o">//</span> <span class="mi">1</span><span class="nv">.</span>
    <span class="nf">push</span> <span class="o">%</span><span class="nb">rcx</span>              <span class="o">//</span> <span class="mi">1</span><span class="nv">.</span>
    <span class="nf">neg</span>  <span class="o">%</span><span class="nb">rax</span>              <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span> <span class="nb">rax</span> <span class="err">=</span> <span class="nv">frame</span> <span class="nv">low</span> <span class="nv">address</span>
    <span class="nf">add</span>  <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rax</span>        <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span> <span class="err">"</span>
    <span class="nf">mov</span>  <span class="o">%</span><span class="nb">gs</span><span class="p">:(</span><span class="mh">0x10</span><span class="p">),</span> <span class="o">%</span><span class="nb">rcx</span>  <span class="o">//</span> <span class="mi">3</span><span class="nv">.</span> <span class="nb">rcx</span> <span class="err">=</span> <span class="nv">stack</span> <span class="nv">low</span> <span class="nv">address</span>
    <span class="nf">jmp</span>  <span class="mi">1</span><span class="nv">f</span>                <span class="o">//</span> <span class="mi">4</span><span class="nv">.</span>
<span class="err">0:</span>  <span class="nf">sub</span>  <span class="kc">$</span><span class="mh">0x1000</span><span class="p">,</span> <span class="o">%</span><span class="nb">rcx</span>     <span class="o">//</span> <span class="mi">5</span><span class="nv">.</span>
    <span class="nf">test</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">rcx</span><span class="p">)</span>      <span class="o">//</span> <span class="mi">6</span><span class="nv">.</span> <span class="nv">page</span> <span class="nv">fault</span> <span class="p">(</span><span class="nv">very</span> <span class="nv">slow</span><span class="err">!</span><span class="p">)</span>
<span class="err">1:</span>  <span class="nf">cmp</span>  <span class="o">%</span><span class="nb">rax</span><span class="p">,</span> <span class="o">%</span><span class="nb">rcx</span>        <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">ja</span>   <span class="mb">0b</span>                <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">pop</span>  <span class="o">%</span><span class="nb">rcx</span>              <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
    <span class="nf">pop</span>  <span class="o">%</span><span class="nb">rax</span>              <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
    <span class="nf">ret</span>                    <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
</code></pre></div></div>

<p>I’ve labeled each instruction with its corresponding pseudo-code. Step 6
is unusual among chkstk implementations: It’s not a <em>store</em>, but a <em>load</em>,
still sufficient to fault the page. That <code class="language-plaintext highlighter-rouge">test</code> instruction is just two
bytes, and unlike other two-byte options, doesn’t write garbage onto the
stack — which <em>would</em> be allowed — nor use an extra register. I searched
through single byte instructions that can page fault, all of which involve
implicit addressing through <code class="language-plaintext highlighter-rouge">rdi</code> or <code class="language-plaintext highlighter-rouge">rsi</code>, but they increment <code class="language-plaintext highlighter-rouge">rdi</code> or
<code class="language-plaintext highlighter-rouge">rsi</code>, and would require another instruction to correct it.</p>

<p>Because of the return address and two <code class="language-plaintext highlighter-rouge">push</code> operations, the low stack
frame address is technically <em>too low</em> by 24 bytes. That’s fine. If this
exhausts the stack, the program is really cutting it close and the stack
is too small anyway. I could be more precise — which, as we’ll soon see,
is required for x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> — but it would cost an extra instruction
byte.</p>

<p>On x64, <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> and <code class="language-plaintext highlighter-rouge">__chkstk</code> have identical semantics, so name it
<code class="language-plaintext highlighter-rouge">__chkstk</code> — which I’ve done in libchkstk — and it works with MSVC. The
only practical difference between my chkstk and MSVC <code class="language-plaintext highlighter-rouge">__chkstk</code> is that
mine is smaller: 36 bytes versus 48 bytes. Largest of all, despite lacking
the optimization, is libgcc <code class="language-plaintext highlighter-rouge">___chkstk_ms</code>, weighing 50 bytes, or in
practice, due to an unfortunate Binutils default of padding sections, 64
bytes.</p>

<p>I’m no assembly guru, and I bet this can be even smaller without hurting
the fast path, but this is the best I could come up with at this time.</p>

<p><strong>Update</strong>: Stefan Kanthak, who has <a href="https://skanthak.homepage.t-online.de/msvcrt.html">extensively explored this
topic</a>, points out that large stack frame requests might overflow
my low frame address calculation at (2), effectively disabling the probe.
Such requests might occur from alloca calls or variable-length arrays
(VLAs) with untrusted sizes. As far as I’m concerned, such programs are
already broken, but it only cost a two-byte instruction to deal with it. I
have not changed this article, but the source in w64devkit <a href="https://github.com/skeeto/w64devkit/commit/50b343db">has been
updated</a>.</p>
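
<p>To see why a wrapped calculation disables the probe, here is a rough C model. It is only a sketch under stated assumptions: <code class="language-plaintext highlighter-rouge">would_probe</code> and <code class="language-plaintext highlighter-rouge">would_probe_clamped</code> are hypothetical names, the addresses are made up, and the clamp is just one possible guard, not necessarily the change made in w64devkit.</p>

```c
/* Hypothetical model of the low frame address check, not the real chkstk.
 * sp is the stack pointer, size the requested frame, and limit the lowest
 * committed stack address. Returns 1 if the probe loop would run at all. */
static int would_probe(unsigned long long sp, unsigned long long size,
                       unsigned long long limit)
{
    unsigned long long low = sp - size;  /* wraps around when size > sp */
    return limit > low;
}

/* Same check with an assumed guard: treat a wrapped result as address 0. */
static int would_probe_clamped(unsigned long long sp, unsigned long long size,
                               unsigned long long limit)
{
    unsigned long long low = sp - size;
    if (low > sp) {
        low = 0;  /* the subtraction wrapped */
    }
    return limit > low;
}
```

<p>With an absurd request, the unguarded version computes a "low" address <em>above</em> the stack pointer, concludes everything is already committed, and skips the loop. The clamped version still enters the loop.</p>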

<h3 id="32-bit-chkstk">32-bit chkstk</h3>

<p>On x86 <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> has identical semantics to x64. Mine is a copy-paste
of my x64 chkstk but with 32-bit registers and an updated TIB lookup. GCC
was ahead of the curve on this design.</p>

<p>However, x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> is <em>bonkers</em>. It not only commits the stack, but
also allocates the stack frame. That is, it returns with a different stack
pointer. The return pointer is initially <em>inside the new stack frame</em>, so
chkstk must retrieve it and return by other means. It must also precisely
compute the low frame address.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">__chkstk:</span>
    <span class="nf">push</span> <span class="o">%</span><span class="nb">ecx</span>               <span class="o">//</span> <span class="mi">1</span><span class="nv">.</span>
    <span class="nf">neg</span>  <span class="o">%</span><span class="nb">eax</span>               <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span>
    <span class="nf">lea</span>  <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">,</span><span class="o">%</span><span class="nb">eax</span><span class="p">),</span> <span class="o">%</span><span class="nb">eax</span> <span class="o">//</span> <span class="mi">2</span><span class="nv">.</span>
    <span class="nf">mov</span>  <span class="o">%</span><span class="nb">fs</span><span class="p">:(</span><span class="mh">0x08</span><span class="p">),</span> <span class="o">%</span><span class="nb">ecx</span>   <span class="o">//</span> <span class="mi">3</span><span class="nv">.</span>
    <span class="nf">jmp</span>  <span class="mi">1</span><span class="nv">f</span>                 <span class="o">//</span> <span class="mi">4</span><span class="nv">.</span>
<span class="err">0:</span>  <span class="nf">sub</span>  <span class="kc">$</span><span class="mh">0x1000</span><span class="p">,</span> <span class="o">%</span><span class="nb">ecx</span>      <span class="o">//</span> <span class="mi">5</span><span class="nv">.</span>
    <span class="nf">test</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">ecx</span><span class="p">)</span>       <span class="o">//</span> <span class="mi">6</span><span class="nv">.</span> <span class="nv">page</span> <span class="nv">fault</span> <span class="p">(</span><span class="nv">very</span> <span class="nv">slow</span><span class="err">!</span><span class="p">)</span>
<span class="err">1:</span>  <span class="nf">cmp</span>  <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="o">%</span><span class="nb">ecx</span>         <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">ja</span>   <span class="mb">0b</span>                 <span class="o">//</span> <span class="mi">7</span><span class="nv">.</span>
    <span class="nf">pop</span>  <span class="o">%</span><span class="nb">ecx</span>               <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span>
    <span class="nf">xchg</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="o">%</span><span class="nb">esp</span>         <span class="o">//</span> <span class="nv">?.</span> <span class="nb">al</span><span class="nv">locate</span> <span class="nv">frame</span>
    <span class="nf">jmp</span>  <span class="o">*</span><span class="p">(</span><span class="o">%</span><span class="nb">eax</span><span class="p">)</span>            <span class="o">//</span> <span class="mi">8</span><span class="nv">.</span> <span class="nv">return</span>
</code></pre></div></div>

<p>The main differences are:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">eax</code> is treated as volatile, so it is not saved</li>
  <li>The low frame address is precisely computed with <code class="language-plaintext highlighter-rouge">lea</code> (2)</li>
  <li>The frame is allocated at step (?) by swapping F and the stack pointer</li>
  <li>Post-swap F now points at the return address, so jump through it</li>
</ul>

<p>MSVC x86 <code class="language-plaintext highlighter-rouge">__chkstk</code> does not query the TIB (3), and so unconditionally
runs the loop. So there’s an advantage to my implementation besides size.</p>
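
<p>The effect of querying the stack limit can be sketched in C. This is a model of the loop in steps (5) through (7), not real stack code: <code class="language-plaintext highlighter-rouge">count_probes</code> is a hypothetical helper, the addresses are invented, and the "unconditional" case is assumed to start walking from the current stack pointer.</p>

```c
/* Hypothetical model of the probe loop: walk down one 4KiB page at a time
 * from start until reaching the low frame address, counting page touches. */
static int count_probes(unsigned start, unsigned low)
{
    int touches = 0;
    while (start > low) {   /* cmp + ja */
        start -= 0x1000;    /* sub $0x1000 */
        touches++;          /* stands in for the test instruction */
    }
    return touches;
}
```

<p>Starting from a committed limit of <code class="language-plaintext highlighter-rouge">0x100000</code> with a low frame address of <code class="language-plaintext highlighter-rouge">0x110000</code>, the loop body never runs. Starting instead from a stack pointer of <code class="language-plaintext highlighter-rouge">0x120000</code>, the same frame costs 16 touches even though every page is already committed.</p>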

<p>libgcc x86 <code class="language-plaintext highlighter-rouge">___chkstk</code> has this behavior, and so it’s also a suitable
<code class="language-plaintext highlighter-rouge">__chkstk</code> aside from the misspelling. Strangely, libgcc x64 <code class="language-plaintext highlighter-rouge">___chkstk</code>
<em>also</em> allocates the stack frame, which is never how chkstk was supposed
to work on x64. I can only conclude it’s never been used.</p>

<h3 id="optimization-in-practice">Optimization in practice</h3>

<p>Does the skip-the-loop optimization matter in practice? Consider a
function using a large-ish, stack-allocated array, perhaps to process
<a href="/blog/2023/08/23/">environment variables</a> or <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation">long paths</a>, each of which max out
around 64KiB.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">_Bool</span> <span class="nf">path_contains</span><span class="p">(</span><span class="kt">wchar_t</span> <span class="o">*</span><span class="n">name</span><span class="p">,</span> <span class="kt">wchar_t</span> <span class="o">*</span><span class="n">path</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">wchar_t</span> <span class="n">var</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">15</span><span class="p">];</span>
    <span class="n">GetEnvironmentVariableW</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">var</span><span class="p">,</span> <span class="n">countof</span><span class="p">(</span><span class="n">var</span><span class="p">));</span>
    <span class="c1">// ... search for path in var ...</span>
<span class="p">}</span>

<span class="kt">int64_t</span> <span class="nf">getfilesize</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">path</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">wchar_t</span> <span class="n">wide</span><span class="p">[</span><span class="mi">1</span><span class="o">&lt;&lt;</span><span class="mi">15</span><span class="p">];</span>
    <span class="n">MultiByteToWideChar</span><span class="p">(</span><span class="n">CP_UTF8</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">wide</span><span class="p">,</span> <span class="n">countof</span><span class="p">(</span><span class="n">wide</span><span class="p">));</span>
    <span class="c1">// ... look up file size via wide path ...</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">path_contains</span><span class="p">(</span><span class="s">L"PATH"</span><span class="p">,</span> <span class="s">L"c:</span><span class="se">\\</span><span class="s">windows</span><span class="se">\\</span><span class="s">system32"</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>

    <span class="kt">int64_t</span> <span class="n">size</span> <span class="o">=</span> <span class="n">getfilesize</span><span class="p">(</span><span class="s">"π.txt"</span><span class="p">);</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Each call to these functions with such large local arrays is also a call
to chkstk. Though with a 64KiB frame, that’s only 16 iterations; barely
detectable in a benchmark. If the function touches the file system, which
is likely when processing paths, then chkstk doesn’t matter at all. My
starting example had a 1MiB array, or 256 chkstk iterations. That starts
to become measurable, though it’s also pushing the limits. At that point
you <a href="/blog/2023/09/27/">ought to be using a scratch arena</a>.</p>
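
<p>The iteration counts above are just the frame size divided by the 4KiB page size, rounded up. As a quick check of the arithmetic, with <code class="language-plaintext highlighter-rouge">worst_case_probes</code> being a hypothetical helper and the worst case meaning that none of the frame is committed yet:</p>

```c
/* Hypothetical helper: upper bound on probe loop iterations for a frame,
 * assuming 4KiB pages and that no page of the frame is committed yet. */
static unsigned long worst_case_probes(unsigned long frame_bytes)
{
    return (frame_bytes + 4095) / 4096;
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">worst_case_probes(64UL * 1024)</code> is 16 and <code class="language-plaintext highlighter-rouge">worst_case_probes(1024UL * 1024)</code> is 256.</p>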

<p>So ultimately after writing an improved <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> I could only
measure a tiny difference in contrived programs, and none in any real
application. Though there’s still one more benefit I haven’t yet
mentioned…</p>

<h3 id="the-first-thing-we-do-lets-kill-all-the-lawyers">“The first thing we do, let’s <a href="/blog/2023/06/22/#119-henry-vi">kill all the lawyers</a>”.</h3>

<p>My original motivation for this project wasn’t the optimization — which I
didn’t even discover until after I had started — but <em>licensing</em>. I hate
software licenses, and the <a href="/blog/2023/01/18/">tools I’ve written for w64devkit</a>
are dedicated to the public domain. Both source <em>and</em> binaries (as
distributed). I can do so because <a href="/blog/2023/02/15/">I don’t link runtime components</a>,
not even libgcc. Not <a href="/blog/2023/05/31/">even header files</a>. Every byte of code in those
binaries is my work or the work of my collaborators.</p>

<p>Every once in a while <code class="language-plaintext highlighter-rouge">___chkstk_ms</code> rears its ugly head, and I have to
make a decision. Do I re-work my code to avoid it? Do I take the reins of
the linker and disable stack probes? I haven’t necessarily allocated a
large local array: A bit of luck with function inlining can combine
several smaller stack frames into one that’s just large enough to require
chkstk.</p>

<p>Since libgcc falls under the <a href="https://www.gnu.org/licenses/gcc-exception-3.1.html">GCC Runtime Library Exception</a>, if it’s
linked into my program through an “Eligible Compilation Process” — which I
believe includes w64devkit — then the GPL-licensed functions embedded in
my binary are legally siloed and the GPL doesn’t infect the rest of the
program. These bits are still GPL in isolation, and if someone were to
copy them out of the program then they’d be normal GPL code again. In
other words, it’s not a 100% public domain binary if libgcc was linked!</p>

<p>(If some FSF lawyer says I’m wrong, then this is an escape hatch through
which anyone can scrub the GPL from GCC runtime code, and then ignore the
runtime exception entirely.)</p>

<p>MSVC is worse. Hardly anyone follows its license, but fortunately for most
the license is practically unenforced. Its chkstk, which currently resides
in a loose <code class="language-plaintext highlighter-rouge">chkstk.obj</code>, falls into what Microsoft calls “Distributable
Code.” Its license requires “external end users to agree to terms that
protect the Distributable Code.” In other words, if you compile a program
with MSVC, you’re required to have a EULA including the relevant terms
from the Visual Studio license. You’re not legally permitted to distribute
software in the manner of w64devkit — no installer, just a portable zip
distribution — if that software has been built with MSVC.  At least not
without special care which nobody does. (Don’t worry, I won’t tell.)</p>

<h3 id="how-to-use-libchkstk">How to use libchkstk</h3>

<p>To avoid libgcc entirely you need <code class="language-plaintext highlighter-rouge">-nostdlib</code>. Otherwise it’s implicitly
offered to the linker, and you’d need to manually check if it picked up
code from libgcc. If <code class="language-plaintext highlighter-rouge">ld</code> complains about a missing chkstk, use <code class="language-plaintext highlighter-rouge">-lchkstk</code>
to get a definition. If you use <code class="language-plaintext highlighter-rouge">-lchkstk</code> when it’s not needed, nothing
happens, so it’s safe to always include.</p>

<p>I also recently added <a href="https://github.com/skeeto/w64devkit/blob/master/src/libmemory.c">a libmemory</a> to w64devkit, providing tiny,
public domain definitions of <code class="language-plaintext highlighter-rouge">memset</code>, <code class="language-plaintext highlighter-rouge">memcpy</code>, <code class="language-plaintext highlighter-rouge">memmove</code>, <code class="language-plaintext highlighter-rouge">memcmp</code>, and
<code class="language-plaintext highlighter-rouge">strlen</code>. All compilers fabricate calls to these five functions even if
you don’t call them yourself, which is how they were selected. (Not
because I like them. <a href="/blog/2023/02/11/">I really don’t.</a>) If a <code class="language-plaintext highlighter-rouge">-nostdlib</code> build
complains about these, too, then add <code class="language-plaintext highlighter-rouge">-lmemory</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -nostdlib ... -lchkstk -lmemory
</code></pre></div></div>

<p>In MSVC the equivalent option is <code class="language-plaintext highlighter-rouge">/nodefaultlib</code>, after which you may see
missing chkstk errors, and perhaps more. <code class="language-plaintext highlighter-rouge">libchkstk.a</code> is compatible with
MSVC, and <code class="language-plaintext highlighter-rouge">link.exe</code> doesn’t care that the extension is <code class="language-plaintext highlighter-rouge">.a</code> rather than
<code class="language-plaintext highlighter-rouge">.lib</code>, so supply it at link time. Same goes for <code class="language-plaintext highlighter-rouge">libmemory.a</code> if you need
any of those, too.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl ... /link /nodefaultlib libchkstk.a libmemory.a
</code></pre></div></div>

<p>While I despise licenses, I still take them seriously in the software I
distribute. With libchkstk I have another tool to get it under control.</p>

<hr />

<p>Big thanks to Felipe Garcia for reviewing and correcting mistakes in this
article before it was published!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Two handy GDB breakpoint tricks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2024/01/28/"/>
    <id>urn:uuid:e56cce3b-8e70-497b-a13a-e609bacdde88</id>
    <updated>2024-01-28T21:56:07Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>Over the past couple months I’ve discovered a couple of handy tricks for
working with GDB breakpoints. I figured these out on my own, and I’ve not
seen either discussed elsewhere, so I really ought to share them.</p>

<h3 id="continuable-assertions">Continuable assertions</h3>

<p>The <code class="language-plaintext highlighter-rouge">assert</code> macro in typical C implementations <a href="/blog/2022/06/26/">leaves a lot to be
desired</a>, as do <code class="language-plaintext highlighter-rouge">raise</code> and <code class="language-plaintext highlighter-rouge">abort</code>, so I’ve suggested
alternative definitions that behave better under debuggers:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define assert(c)  while (!(c)) __builtin_trap()
#define assert(c)  while (!(c)) __builtin_unreachable()
#define assert(c)  while (!(c)) *(volatile int *)0 = 0
</span></code></pre></div></div>

<p>Each serves a slightly <a href="/blog/2023/10/08/#macros">different purpose</a> but still has the most
important property: Immediately halt the program <em>directly on the defect</em>.
None have an occasionally useful secondary property: Optionally allow the
program to continue through the defect. If the program reaches the body of
any of these macros then there is no reliable continuation. Even manually
nudging the instruction pointer over the assertion isn’t enough. Compilers
assume that the program cannot continue through the condition and generate
code accordingly.</p>

<p>The MSVC ecosystem has a solution for this on x86: <code class="language-plaintext highlighter-rouge">int3</code>. The portable
name is <code class="language-plaintext highlighter-rouge">__debugbreak</code>, a name <a href="/blog/2022/07/31/">I’ve borrowed elsewhere</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define assert(c)  do if (!(c)) __debugbreak(); while (0)
</span></code></pre></div></div>

<p>On x86 it inserts an <code class="language-plaintext highlighter-rouge">int3</code> instruction, which fires an interrupt,
trapping in the attached debugger, or otherwise abnormally terminating the
program. Because it’s an interrupt, it’s expected that the program might
continue. It even leaves the instruction pointer on the next instruction.
As of this writing, GCC has no matching intrinsic, but Clang recently
added <code class="language-plaintext highlighter-rouge">__builtin_debugtrap</code>. In GCC you need some less portable inline
assembly: <code class="language-plaintext highlighter-rouge">asm ("int3")</code>.</p>

<p>However, regardless of how you get an <code class="language-plaintext highlighter-rouge">int3</code> in your program, GDB does not
currently understand it. The problem is that feature I mentioned: The
instruction pointer does not point at the <code class="language-plaintext highlighter-rouge">int3</code> but the next instruction.
This confuses GDB, causing it to break in the wrong places, possibly even
in the wrong scope. For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="n">int3_assert</span><span class="p">(...);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">int3</code> at the very end of the loop, GDB will break at the <em>top</em> of
the next loop iteration, because that’s where the instruction pointer
lands by the time GDB is involved. It’s a similar story when placed at the
end of a function, leaving GDB to break in the caller. To resolve this, we
need the instruction pointer to still be “inside” the breakpoint after the
interrupt fires. Easy! Add a <code class="language-plaintext highlighter-rouge">nop</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define breakpoint()  asm ("int3; nop")
</span></code></pre></div></div>

<p>This behaves beautifully, eliminating all the problems GDB has with a
plain <code class="language-plaintext highlighter-rouge">int3</code>. Not only is this a solid basis for a continuable assertion,
it’s also useful as a fast conditional breakpoint, where conventional
conditional breakpoints are far too slow.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">1000000000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="cm">/* rare condition */</span><span class="p">)</span> <span class="n">breakpoint</span><span class="p">();</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Could GDB handle <code class="language-plaintext highlighter-rouge">int3</code> better? Yes! Visual Studio, for instance, does not
require the <code class="language-plaintext highlighter-rouge">nop</code> instruction. As far as I know there is no ARM equivalent
compatible with GDB (or even LLDB). The closest instruction, <code class="language-plaintext highlighter-rouge">brk #0x1</code>,
does not behave as needed.</p>
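
<p>Pulling the pieces of this section together, a continuable assertion might be sketched like so. The names <code class="language-plaintext highlighter-rouge">BREAKPOINT</code>, <code class="language-plaintext highlighter-rouge">ASSERT</code>, and <code class="language-plaintext highlighter-rouge">checked_div</code> are hypothetical, chosen only for illustration, and the fallback branches assume a GCC-compatible compiler. It uses the <code class="language-plaintext highlighter-rouge">__asm__ volatile</code> spelling so that it also builds in strict ISO modes.</p>

```c
/* Hypothetical macro names; a sketch, not a drop-in library. */
#if defined(_MSC_VER)
#  define BREAKPOINT()  __debugbreak()
#elif defined(__i386__) || defined(__x86_64__)
#  define BREAKPOINT()  __asm__ volatile ("int3; nop")  /* nop keeps GDB in place */
#elif defined(__clang__)
#  define BREAKPOINT()  __builtin_debugtrap()
#else
#  define BREAKPOINT()  __builtin_trap()  /* fallback: halts, not continuable */
#endif

/* Trap in the debugger on failure, but allow continuing past it. */
#define ASSERT(c)  do if (!(c)) BREAKPOINT(); while (0)

static int checked_div(int a, int b)
{
    ASSERT(b != 0);         /* traps here when b is zero */
    return b ? a / b : 0;   /* still well-defined if the user continues */
}
```

<p>Since the program may continue past a failed assertion, the code after it should not rely on the condition for its own safety, hence the extra zero check before dividing.</p>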

<h3 id="named-positions">Named positions</h3>

<p>GDB’s built-in user interface understands three classes of breakpoint
positions: symbols, context-free line numbers, and absolute addresses.
When you set some breakpoints and (re)start a program under GDB, each kind
of breakpoint is handled differently:</p>

<ul>
  <li>
    <p>Resolve each symbol, placing a breakpoint on its run-time address.</p>
  </li>
  <li>
    <p>Map each file+lineno tuple to a run-time address, and place a breakpoint
on that address. If the line does not exist (i.e. the file is shorter),
skip it.</p>
  </li>
  <li>
    <p>Place breakpoints exactly on each absolute address. If it’s not a mapped
address, don’t start the program.</p>
  </li>
</ul>

<p>The first is the best case because it adapts to program changes. Modify
the code, recompile, and the breakpoint generally remains where you want
it.</p>

<p>The third is the least useful. These breakpoints rarely survive across
rebuilds, and sometimes not even across reruns.</p>

<p>The second is in the middle between useful and useless. If you edit the
source file which has the breakpoint — likely, because you placed the
breakpoint there for a reason — chances are high that the line number is
no longer correct. Instead it drifts, requiring manual replacement. This
is tedious and GDB ought to do better. Think that’s unreasonable? The
Visual Studio debugger does exactly that <a href="https://lists.sr.ht/~skeeto/public-inbox/%3C2d3d7662a361ddd049f7dc65b94cecdd%40disroot.org%3E#%3C20240112210447.mxhvo7bg4mjp4jyz@nullprogram.com%3E">quite effectively</a> through
external code edits! GDB front ends tend to handle it better, especially
when they’re also the code editor and so directly observe all edits.</p>

<p>As a workaround we can get the first kind by temporarily <em>naming</em> a line
number. This requires editing the source, but remember, the very reason we
need it is because the source in question is actively changing. How to
name a line? C and C++ labels give a name to a program position:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">example</span><span class="p">(</span><span class="kt">double</span> <span class="o">*</span><span class="n">nums</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">,</span> <span class="p">...)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="nl">loop:</span>  <span class="c1">// named position at the start of the loop</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The name <code class="language-plaintext highlighter-rouge">loop</code> is local to <code class="language-plaintext highlighter-rouge">example</code>, but the qualified <code class="language-plaintext highlighter-rouge">example:loop</code> is
a global name, as suitable as any other symbol. I could, say, reliably
trace the progress of this loop despite changes to its position in the
source.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) dprintf example:loop,"nums[%d] = %g\n",i,nums[i]
</code></pre></div></div>

<p>One downside is dealing with <code class="language-plaintext highlighter-rouge">-Wunused-label</code> (enabled by <code class="language-plaintext highlighter-rouge">-Wall</code>), and so
I’ve considered disabling the warning in <a href="/blog/2023/04/29/">my defaults</a>. <strong>Update</strong>:
Matthew Fernandez pointed out that the <code class="language-plaintext highlighter-rouge">unused</code> label attribute eliminates
the warning, solving my problem:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="nl">loop:</span> <span class="n">__attribute</span><span class="p">((</span><span class="n">unused</span><span class="p">))</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>More often I use an assembly label, usually named <code class="language-plaintext highlighter-rouge">b</code> for convenience:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">asm</span> <span class="p">(</span><span class="s">"b:"</span><span class="p">);</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Like <code class="language-plaintext highlighter-rouge">int3</code>, sometimes it’s necessary to give it a <code class="language-plaintext highlighter-rouge">nop</code> so that GDB has
something on which to break. “Enabling” it at any time is quick:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) b b
</code></pre></div></div>

<p>Because it’s not <a href="https://sourceware.org/binutils/docs/as/Global.html"><code class="language-plaintext highlighter-rouge">.globl</code></a>, it’s a local symbol, and I can place up to
one per translation unit, all covered by the same GDB breakpoint item
(less useful than it sounds). I haven’t actually checked, but I probably
more often use <code class="language-plaintext highlighter-rouge">dprintf</code> with such named lines than actual breakpoints.</p>

<p>If you have similar tips and tricks of your own, I’d like to learn about
them!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Hand-written Windows API prototypes: fast, flexible, and tedious</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2023/05/31/"/>
    <id>urn:uuid:35b44114-7ad2-422b-9eaf-dc37e7eaaf97</id>
    <updated>2023-05-31T01:38:31Z</updated>
    <category term="win32"/><category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>I love fast builds, and for years I’ve been bothered by the build penalty
for translation units including <code class="language-plaintext highlighter-rouge">windows.h</code>. This header has an enormous
number of definitions and declarations and so, for C programs, it tends to
dominate the build time of those translation units. Most programs,
especially systems software, only need a tiny portion of it. For example,
when compiling <a href="/blog/2023/01/18/">u-config</a> with GCC, two thirds of the debug build was
spent processing <code class="language-plaintext highlighter-rouge">windows.h</code> just for <a href="https://github.com/skeeto/u-config/blob/e6ebb9b/miniwin32.h">4 types, 16 definitions, and 16
prototypes</a>.</p>

<p>To give a sense of the numbers, here’s <code class="language-plaintext highlighter-rouge">empty.c</code>, which does nothing but
include <code class="language-plaintext highlighter-rouge">windows.h</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;windows.h&gt;</span><span class="cp">
</span></code></pre></div></div>

<p>With the current Mingw-w64 headers, that’s ~82kLOC (non-blank):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -E empty.c | grep -vc '^$'
82041
</code></pre></div></div>

<p>With <a href="https://github.com/skeeto/w64devkit">w64devkit</a> this takes my system ~450ms to compile with GCC:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ time gcc -c empty.c
real    0m 0.45s
user    0m 0.00s
sys     0m 0.00s
</code></pre></div></div>

<p>Compiling an actually empty source file takes ~10ms, so it really is
spending practically all that time processing headers. MSVC is a faster
compiler, and this extends to processing an even larger <code class="language-plaintext highlighter-rouge">windows.h</code> that
crosses over 100kLOC (VS2022). It clocks in at 120ms on the same system:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /nologo /E empty.c | grep -vc '^$'
empty.c
100944
$ time cl /nologo /c empty.c
empty.c
real    0m 0.12s
user    0m 0.09s
sys     0m 0.01s
</code></pre></div></div>

<p>That’s just low enough to be tolerable, but I’d like the situation with
GCC to be better. Defining <code class="language-plaintext highlighter-rouge">WIN32_LEAN_AND_MEAN</code> reduces the number of
included headers, which has a significant effect:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -E -DWIN32_LEAN_AND_MEAN empty.c | grep -vc '^$'
55025
$ time gcc -c -DWIN32_LEAN_AND_MEAN empty.c
real    0m 0.30s
user    0m 0.00s
sys     0m 0.00s

$ cl /nologo /E /DWIN32_LEAN_AND_MEAN empty.c | grep -vc '^$'
empty.c
41436
$ time cl /nologo /c /DWIN32_LEAN_AND_MEAN empty.c
empty.c
real    0m 0.07s
user    0m 0.01s
sys     0m 0.01s
</code></pre></div></div>

<h3 id="precompiled-headers">Precompiled headers</h3>

<p>The official solution is precompiled headers. Put all the system header
includes, <a href="/blog/2023/01/08/">or similar</a>, into a dedicated header, then compile that
header into a special format. For example, <code class="language-plaintext highlighter-rouge">headers.h</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define WIN32_LEAN_AND_MEAN
#include</span> <span class="cpf">&lt;windows.h&gt;</span><span class="cp">
</span></code></pre></div></div>

<p>Then <code class="language-plaintext highlighter-rouge">main.c</code> includes <code class="language-plaintext highlighter-rouge">windows.h</code> through this header:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"headers.h"</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">mainCRTStartup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If I ask <a href="https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html">GCC to compile <code class="language-plaintext highlighter-rouge">headers.h</code></a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc headers.h
</code></pre></div></div>

<p>It produces <code class="language-plaintext highlighter-rouge">headers.h.gch</code>. When a source includes <code class="language-plaintext highlighter-rouge">headers.h</code>, GCC first
searches for an appropriate <code class="language-plaintext highlighter-rouge">.gch</code>. Not only must the name match, but so
must all the definitions at the moment of inclusion: <code class="language-plaintext highlighter-rouge">headers.h</code> should
always be the first included header, otherwise it may not work. Now when I
compile <code class="language-plaintext highlighter-rouge">main.c</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ time gcc -c main.c
real    0m 0.04s
user    0m 0.00s
sys     0m 0.00s
</code></pre></div></div>

<p>Much better! MSVC has a conventional name for this header recognizable to
every Visual Studio user: <code class="language-plaintext highlighter-rouge">stdafx.h</code>. It works a bit differently, and I’ve
never used it myself, but I trust it has similar results.</p>

<p>Precompiled headers require some extra steps that vary by toolchain. Can
we do better? That depends on your definition of “better!”</p>

<h3 id="artisan-handcrafted-prototypes">Artisan, handcrafted prototypes</h3>

<p>As mentioned, systems software tends to need only a few declarations:
open, read, write, stat, etc. What if I wrote these out manually? A bit
tedious, but it doesn’t require special precompiled header handling. It
also creates some new possibilities. To illustrate, a <a href="/blog/2023/02/15/">CRT-free</a>
“hello world” program:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;windows.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">mainCRTStartup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">HANDLE</span> <span class="n">stdout</span> <span class="o">=</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="n">STD_OUTPUT_HANDLE</span><span class="p">);</span>
    <span class="kt">char</span> <span class="n">message</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"Hello, world!</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
    <span class="n">DWORD</span> <span class="n">len</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">!</span><span class="n">WriteFile</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="n">message</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">message</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">len</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This takes my system half a second to compile — quite long to produce just
26 assembly instructions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ time cc -nostartfiles -o hello.exe hello.c
real    0m 0.50s
user    0m 0.00s
sys     0m 0.00s
$ ./hello.exe
Hello, world!
</code></pre></div></div>

<p>The program requires prototypes only for GetStdHandle and WriteFile, a
definition for <code class="language-plaintext highlighter-rouge">STD_OUTPUT_HANDLE</code>, and some typedefs. Starting with the
easy stuff, the definition and <a href="https://learn.microsoft.com/en-us/windows/win32/winprog/windows-data-types">types look like this</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define STD_OUTPUT_HANDLE ((DWORD)-11)
</span>
<span class="k">typedef</span> <span class="kt">int</span> <span class="n">BOOL</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">void</span> <span class="o">*</span><span class="n">HANDLE</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">DWORD</span><span class="p">;</span>
</code></pre></div></div>

<p>By the way, here’s a cheat code for quickly finding preprocessor
definitions, faster than looking them up elsewhere:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo '#include &lt;windows.h&gt;' | gcc -E -dM - | grep 'STD_\w*_HANDLE'
#define STD_INPUT_HANDLE ((DWORD)-10)
#define STD_ERROR_HANDLE ((DWORD)-12)
#define STD_OUTPUT_HANDLE ((DWORD)-11)
</code></pre></div></div>

<p>Did you catch the pattern? It’s <code class="language-plaintext highlighter-rouge">-10 - fd</code>, where <code class="language-plaintext highlighter-rouge">fd</code> is the conventional
unix file descriptor number: a kind of mnemonic.</p>
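<p>As a minimal sketch of that mnemonic, the mapping can be written as a
one-line function. The name <code class="language-plaintext highlighter-rouge">std_handle_id</code> is hypothetical, not part of
the Windows API, and <code class="language-plaintext highlighter-rouge">uint32_t</code> stands in for <code class="language-plaintext highlighter-rouge">DWORD</code> so that it compiles
on any host:</p>

```c
#include <stdint.h>

/* Hypothetical helper, not part of the Windows API: maps a unix-style
 * file descriptor (0, 1, 2) to the value of the matching STD_*_HANDLE
 * constant. uint32_t stands in for DWORD. */
uint32_t std_handle_id(int fd)
{
    return (uint32_t)(-10 - fd);
}
```

<p>So <code class="language-plaintext highlighter-rouge">std_handle_id(1)</code> has the same value as <code class="language-plaintext highlighter-rouge">STD_OUTPUT_HANDLE</code>.</p>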

<p>Prototypes are a little trickier, especially if you care about 32-bit. The
Windows API uses the “stdcall” calling convention, which is distinct from
the “cdecl” calling convention on x86, though the same on x64. Of course,
you must already be aware of this merely using the API, as your own
callbacks must usually be stdcall themselves. Further, API functions are
<a href="/blog/2021/05/31/">DLL imports</a> and should be declared as such. Putting it together,
here’s GetStdHandle:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">__declspec</span><span class="p">(</span><span class="n">dllimport</span><span class="p">)</span>
<span class="n">HANDLE</span> <span class="kr">__stdcall</span> <span class="nf">GetStdHandle</span><span class="p">(</span><span class="n">DWORD</span><span class="p">);</span>
</code></pre></div></div>

<p>This works with both Mingw-w64 and MSVC. MSVC requires <code class="language-plaintext highlighter-rouge">__stdcall</code> between
the return type and function name, so don’t get clever about it. If you
only care about GCC then you can declare both at once using attributes:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">HANDLE</span> <span class="nf">GetStdHandle</span><span class="p">(</span><span class="n">DWORD</span><span class="p">)</span>
    <span class="n">__attribute__</span><span class="p">((</span><span class="n">dllimport</span><span class="p">,</span><span class="n">stdcall</span><span class="p">));</span>
</code></pre></div></div>

<p>I like to hide all this behind a macro, with a “table” of all my imports
listed just below:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define W32(r) __declspec(dllimport) r __stdcall
</span><span class="n">W32</span><span class="p">(</span><span class="n">HANDLE</span><span class="p">)</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="n">DWORD</span><span class="p">);</span>
<span class="n">W32</span><span class="p">(</span><span class="n">BOOL</span><span class="p">)</span>   <span class="n">WriteFile</span><span class="p">(</span><span class="n">HANDLE</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="n">DWORD</span><span class="p">,</span> <span class="n">DWORD</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>In WriteFile you may have noticed I’m taking shortcuts. The “official”
definition uses an ugly pointer typedef, <code class="language-plaintext highlighter-rouge">LPCVOID</code>, instead of pointer
syntax, but I skipped that type definition. I also replaced the last
argument, an <code class="language-plaintext highlighter-rouge">OVERLAPPED</code> pointer, with a generic pointer. I only need to
pass null. I can keep sanding it down to something more ergonomic:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">W32</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span>    <span class="n">WriteFile</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>That’s how I typically write these prototypes. I dropped the <code class="language-plaintext highlighter-rouge">const</code>
because it doesn’t help me. I used signed sizes because I like them better
and it’s <a href="/blog/2023/02/13/">what I’m usually holding</a> at the call site. But doesn’t
changing the signedness potentially break compatibility? It makes no
difference to any practical ABI: It’s passed the same way. In general,
signedness is a matter for <em>operators</em>, and only some of them — mainly
comparisons (<code class="language-plaintext highlighter-rouge">&lt;</code>, <code class="language-plaintext highlighter-rouge">&gt;</code>, etc.) and division. It’s a similar story for
pointers starting with the 32-bit era, so I can choose whatever pointer
types are convenient.</p>
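<p>To illustrate the point about operators, here is a small sketch with
hypothetical function names. Addition yields the same bits either way,
while comparison depends on how those bits are interpreted:</p>

```c
#include <stdint.h>

/* Addition produces the same bit pattern whether the operands are
 * viewed as signed or unsigned. */
uint32_t add_bits_unsigned(uint32_t a, uint32_t b)
{
    return a + b;
}

uint32_t add_bits_signed(int32_t a, int32_t b)
{
    return (uint32_t)(a + b);  /* assumes the sum does not overflow */
}

/* Comparison is one of the operators where interpretation matters. */
int less_signed(int32_t a, int32_t b)
{
    return a < b;
}

int less_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}
```

<p>Passing <code class="language-plaintext highlighter-rouge">-1</code> and <code class="language-plaintext highlighter-rouge">1</code> to the two comparison functions gives different
answers, while the two additions agree bit for bit.</p>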

<p>In general, I can do anything I want so long as I know my compiler will
produce an appropriate function call. These are not standard functions,
like <code class="language-plaintext highlighter-rouge">printf</code> or <code class="language-plaintext highlighter-rouge">memcpy</code>, which are implemented in part by the compiler
itself, but foreign functions. It’s no different than teaching <a href="/blog/2018/05/27/">an
FFI</a> how to make a call. This is also, in essence, how OpenGL and
Vulkan work, with applications <a href="https://www.khronos.org/opengl/wiki/OpenGL_Loading_Library">defining the API for themselves</a>.</p>

<p>Considering all this, my new hello world:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define W32(r) __declspec(dllimport) r __stdcall
</span><span class="n">W32</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="kt">int</span><span class="p">);</span>
<span class="n">W32</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span>    <span class="n">WriteFile</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">);</span>

<span class="kt">int</span> <span class="nf">mainCRTStartup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">stdout</span> <span class="o">=</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="kt">char</span> <span class="n">message</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"Hello, world!</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">len</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">!</span><span class="n">WriteFile</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="n">message</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">message</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">len</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You know, there’s a kind of beauty to a program that requires no external
definitions. It builds quickly and produces a binary bit-for-bit identical
to the original:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ time cc -nostartfiles -o hello.exe hello.c
real    0m 0.04s
user    0m 0.00s
sys     0m 0.00s

$ time cl /nologo hello.c /link /subsystem:console kernel32.lib
hello.c
real    0m 0.03s
user    0m 0.00s
sys     0m 0.00s
</code></pre></div></div>

<p>I’ve also been using this to patch over API rough edges. For example,
<a href="https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsarecvfrom">WSARecvFrom</a> takes <a href="https://learn.microsoft.com/en-us/windows/win32/api/winsock2/ns-winsock2-wsaoverlapped">WSAOVERLAPPED</a>, but <a href="https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getqueuedcompletionstatus">GetQueuedCompletionStatus</a>
takes <a href="https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped">OVERLAPPED</a>. These types are explicitly compatible, and only
defined separately for annoying technical reasons. I must use the same
overlapped object with both APIs at once, meaning I would normally need
ugly pointer casts on my Winsock calls, or vice versa with I/O completion
ports. But because I’m writing all these definitions myself, I can define
a common overlapped structure for both!</p>
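<p>A rough sketch of what that might look like follows. The type names
<code class="language-plaintext highlighter-rouge">Overlapped</code> and <code class="language-plaintext highlighter-rouge">Buf</code>, the field names, and the simplified parameter
types are all my own assumptions for illustration; only the layout and
argument order are meant to follow the documented API. The fallback
<code class="language-plaintext highlighter-rouge">W32</code> definition exists only so the declarations parse on non-Windows
hosts:</p>

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#  define W32(r) __declspec(dllimport) r __stdcall
#else
#  define W32(r) r  /* assumption: lets the declarations parse elsewhere */
#endif

/* One hand-written structure standing in for both OVERLAPPED and
 * WSAOVERLAPPED. Field names are hypothetical; the layout is what counts. */
typedef struct {
    uintptr_t internal[2];
    int32_t   offset[2];
    void     *event;
} Overlapped;

/* Stand-in for WSABUF: a 32-bit length followed by a pointer. */
typedef struct {
    uint32_t len;
    char    *buf;
} Buf;

/* Both prototypes accept the same Overlapped type, so no casts are
 * needed when one object is shared between Winsock and completion ports. */
W32(int) WSARecvFrom(uintptr_t, Buf *, int, int *, int *, void *, int *,
                     Overlapped *, void *);
W32(int) GetQueuedCompletionStatus(void *, int *, uintptr_t *,
                                   Overlapped **, int);
```

<p>The same <code class="language-plaintext highlighter-rouge">Overlapped *</code> can then go straight into a receive call and
come back out of the completion port without a cast at either site.</p>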

<p>Perhaps you’re worried that this would be too fragile. Well, as a legacy
software aficionado, I enjoy <a href="/blog/2018/04/13/">building and running my programs on old
platforms</a>. So far these programs still work properly <a href="https://winworldpc.com/library/">going back
30 years</a> to Windows NT 3.5 and Visual C++ 4.2. When I do hit a snag,
it’s always been a bug (now long fixed) in the old operating system, not
in my programs or these prototypes. So, in effect, this technique has
worked well for the past 30 years!</p>

<p>Writing out these definitions is a bit of a chore, but after paying that
price I’ve been quite happy with the results. I will likely continue doing
it in the future, at least for non-graphical applications.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>My favorite C compiler flags during development</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2023/04/29/"/>
    <id>urn:uuid:a90f3f5b-b4c3-4153-ac8e-6cdbf235f44b</id>
    <updated>2023-04-29T22:55:25Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=35758898">on Hacker News</a> and <a href="https://old.reddit.com/r/C_Programming/comments/133bjlp">on reddit</a>.</em></p>

<p>The major compilers have an <a href="https://man7.org/linux/man-pages/man1/gcc.1.html">enormous number of knobs</a>. Most are
highly specialized, but others are generally useful even if uncommon. For
warnings, the venerable <code class="language-plaintext highlighter-rouge">-﻿Wall -﻿Wextra</code> is a good start, but
circumstances improve by tweaking this warning set. This article covers
high-hitting development-time options in GCC, Clang, and MSVC that ought
to get more consideration.</p>

<!--more-->

<p>There’s an irony that the more you use these options, the less useful they
become. Given a reasonable workflow, they are a harsh mistress in a fast,
tight feedback loop, quickly breaking the habits that cause warnings and
errors. It’s a kind of self-improvement, where eventually most findings
will be false positives. With heuristics internalized, you will be able to
spot the same issues just reading code — a handy skill during code review.</p>

<h3 id="static-warnings">Static warnings</h3>

<p>Traditionally, C and C++ compilers are by default conservative with
warnings. Unless configured otherwise, they only warn about the most
egregious issues where they’re highly confident. That’s too conservative. For
<code class="language-plaintext highlighter-rouge">gcc</code> and <code class="language-plaintext highlighter-rouge">clang</code>, the first order of business is turning on more warnings
with <strong><code class="language-plaintext highlighter-rouge">-﻿Wall</code></strong>. Despite the name, this doesn’t actually enable all
warnings. (<code class="language-plaintext highlighter-rouge">clang</code> has <code class="language-plaintext highlighter-rouge">-﻿Weverything</code> which does literally this, but
trust me, you don’t want it.) However, that still falls short, and you’re
better served enabling <em>extra</em> warnings with <strong><code class="language-plaintext highlighter-rouge">-﻿Wextra</code></strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -Wall -Wextra ...
</code></pre></div></div>

<p>That should be the baseline on any new project, and closer to what these
compilers should do by default. Not using these means leaving value on the
table. If you come across such a project, there’s a good chance you can
find bugs statically just by using this baseline. Some warnings only occur
at higher <a href="https://www.openwall.com/lists/musl/2023/05/22/2/1">optimization levels</a>, so leave these on for your release
builds, too.</p>

<p>For MSVC, including <code class="language-plaintext highlighter-rouge">clang-cl</code>, a similar baseline is <strong><code class="language-plaintext highlighter-rouge">/W4</code></strong>. Though it
goes a bit far, warning about use of unary minus on unsigned types
(C4146), and sign conversions (C4245). If you’re <a href="/blog/2023/02/15/">using a CRT</a>, also
disable the bogus and irresponsible “security” warnings. Putting it
together, the warning baseline becomes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /W4 /wd4146 /wd4245 /D_CRT_SECURE_NO_WARNINGS ...
</code></pre></div></div>

<p>As for <code class="language-plaintext highlighter-rouge">gcc</code> and <code class="language-plaintext highlighter-rouge">clang</code>, I dislike unused parameter warnings, so I often
turn them off, at least while I’m working: <strong><code class="language-plaintext highlighter-rouge">-﻿Wno-unused-parameter</code></strong>.
Rarely is it a defect to not use a parameter. It’s common for a function
to fit a fixed prototype but not need all its parameters (e.g. <code class="language-plaintext highlighter-rouge">WinMain</code>).
Were it up to me, this would not be part of <code class="language-plaintext highlighter-rouge">-﻿Wextra</code>.</p>

<p>I also dislike unused function warnings: <strong><code class="language-plaintext highlighter-rouge">-﻿Wno-unused-function</code></strong>.
I can’t say this is wrong for the baseline since, in most cases, ultimately
I do want to know if there are unused functions, e.g. to be deleted. But
while I’m working it’s usually noise.</p>

<p>If I’m <a href="/blog/2017/03/01/">working with OpenMP</a>, I may also disable warnings about
unknown pragmas: <strong><code class="language-plaintext highlighter-rouge">-﻿Wno-unknown-pragmas</code></strong>. One cool feature of
OpenMP is that the typical case gracefully degrades to single-threaded
behavior when not enabled. That is, compiling without <code class="language-plaintext highlighter-rouge">-﻿fopenmp</code>.
I’ll test both ways to ensure I get deterministic results, or just to ease
debugging, and I don’t want warnings when it’s disabled. It’s fine for the
baseline to have this warning, but sometimes it’s a poor match.</p>
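<p>A minimal sketch of that graceful degradation, using a hypothetical
<code class="language-plaintext highlighter-rouge">sum_below</code> function. The same source builds with or without OpenMP
enabled and should produce the same result either way:</p>

```c
/* Sums the integers in [0, n). When OpenMP is enabled the loop runs in
 * parallel with a reduction; otherwise the pragma is ignored and the
 * loop runs serially. */
long long sum_below(int n)
{
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

<p>Without OpenMP enabled, this is the unknown pragma that the baseline
warns about.</p>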

<p>When working with single-precision floats, perhaps on games or graphics,
it’s easy to accidentally introduce promotion to double precision, which
can hurt performance. It could be neglecting an <code class="language-plaintext highlighter-rouge">f</code> suffix on a constant
or using <code class="language-plaintext highlighter-rouge">sin</code> instead of <code class="language-plaintext highlighter-rouge">sinf</code>. Use <strong><code class="language-plaintext highlighter-rouge">-﻿Wdouble-promotion</code></strong> to
catch such mistakes. Honestly, this is important enough that it should go
into the baseline.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define PI 3.141592653589793
</span><span class="kt">float</span> <span class="n">degs</span> <span class="o">=</span> <span class="p">...;</span>
<span class="kt">float</span> <span class="n">rads</span> <span class="o">=</span> <span class="n">degs</span> <span class="o">*</span> <span class="n">PI</span> <span class="o">/</span> <span class="mi">180</span><span class="p">;</span>  <span class="c1">// warns about promotion</span>
</code></pre></div></div>

<p>It can be awkward around variadic functions, particularly <code class="language-plaintext highlighter-rouge">printf</code>, which
cannot receive <code class="language-plaintext highlighter-rouge">float</code> arguments, and so implicitly converts. You’ll need
an explicit cast to disable the warning. I imagine this is the main reason
the warning is not part of <code class="language-plaintext highlighter-rouge">-﻿Wextra</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">float</span> <span class="n">x</span> <span class="o">=</span> <span class="p">...;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%.17g</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">x</span><span class="p">);</span>
</code></pre></div></div>
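<p>Returning to the degrees-to-radians example, one way to stay in single
precision is to give every constant an <code class="language-plaintext highlighter-rouge">f</code> suffix. This is only a sketch,
and <code class="language-plaintext highlighter-rouge">PI_F</code> and <code class="language-plaintext highlighter-rouge">deg_to_rad</code> are hypothetical names:</p>

```c
#define PI_F 3.14159265f  /* hypothetical single-precision constant */

/* Every operand is float, so nothing is promoted to double and
 * -Wdouble-promotion has nothing to report. */
float deg_to_rad(float degs)
{
    return degs * PI_F / 180.0f;
}
```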

<p>Finally, an advanced option: <strong><code class="language-plaintext highlighter-rouge">-﻿Wconversion -Wno-sign-conversion</code></strong>.
It warns about implicit conversions that may result in data loss. Sign
conversions do not have data loss, the implicit conversions are useful,
and in my experience they’re not a source of defects, so I disable that
part using the second flag (like MSVC <code class="language-plaintext highlighter-rouge">/wd4245</code>). The important warning
here is truncation of size values, warning about unsound uses of sizes and
subscripts. For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// NOTE: would be declared/defined via windows.h</span>
<span class="k">typedef</span> <span class="kt">uint32_t</span> <span class="n">DWORD</span><span class="p">;</span>
<span class="n">BOOL</span> <span class="nf">WriteFile</span><span class="p">(</span><span class="n">HANDLE</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="n">DWORD</span><span class="p">,</span> <span class="n">DWORD</span> <span class="o">*</span><span class="p">,</span> <span class="n">OVERLAPPED</span> <span class="o">*</span><span class="p">);</span>

<span class="kt">void</span> <span class="nf">logmsg</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">msg</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">HANDLE</span> <span class="n">err</span> <span class="o">=</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="n">STD_ERROR_HANDLE</span><span class="p">);</span>
    <span class="n">DWORD</span> <span class="n">out</span><span class="p">;</span>
    <span class="n">WriteFile</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">out</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>  <span class="c1">// len truncation warning</span>
<span class="p">}</span>
</code></pre></div></div>

<p>On 64-bit targets, it will warn about truncating the 64-bit <code class="language-plaintext highlighter-rouge">len</code> for the
32-bit parameter. To dismiss the warning, you must either address it by
using a loop to <a href="/blog/2023/02/13/">call <code class="language-plaintext highlighter-rouge">WriteFile</code> multiple times</a>, or acknowledge the
truncation with an explicit cast and accept the consequences. In this case
I may know from context it’s impossible for the program to even construct
such a large message, so I’d use an assertion and truncate.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">logmsg</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">msg</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">HANDLE</span> <span class="n">err</span> <span class="o">=</span> <span class="n">GetStdHandle</span><span class="p">(</span><span class="n">STD_ERROR_HANDLE</span><span class="p">);</span>
    <span class="n">DWORD</span> <span class="n">out</span><span class="p">;</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">len</span> <span class="o">&lt;=</span> <span class="mh">0xffffffff</span><span class="p">);</span>
    <span class="n">WriteFile</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="p">(</span><span class="n">DWORD</span><span class="p">)</span><span class="n">len</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">out</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
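<p>For the other option, the loop, here is a portable sketch under some
assumptions. The <code class="language-plaintext highlighter-rouge">write32_fn</code> callback is a hypothetical stand-in for a
<code class="language-plaintext highlighter-rouge">WriteFile</code>-like call limited to 32-bit lengths, and <code class="language-plaintext highlighter-rouge">max_chunk</code> is a
parameter so the chunking can be exercised with small inputs. The
counting sink exists only for demonstration:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WriteFile-like call: accepts at most a
 * 32-bit length, stores the count written, returns nonzero on success. */
typedef int (*write32_fn)(void *ctx, const char *buf, uint32_t len,
                          uint32_t *written);

/* Writes all len bytes using calls no larger than max_chunk. Returns 1
 * on success, 0 if the writer fails, makes no progress, or misreports. */
int write_all(write32_fn fn, void *ctx, const char *buf, size_t len,
              uint32_t max_chunk)
{
    while (len > 0) {
        uint32_t want = len < max_chunk ? (uint32_t)len : max_chunk;
        uint32_t got = 0;
        if (!fn(ctx, buf, want, &got) || got == 0 || got > want) {
            return 0;
        }
        buf += got;
        len -= got;
    }
    return 1;
}

/* Demonstration sink: counts bytes and calls, accepting everything. */
typedef struct {
    size_t bytes;
    int    calls;
} CountSink;

int count_write(void *ctx, const char *buf, uint32_t len, uint32_t *written)
{
    CountSink *sink = ctx;
    (void)buf;
    sink->bytes += len;
    sink->calls++;
    *written = len;
    return 1;
}
```

<p>In real use <code class="language-plaintext highlighter-rouge">max_chunk</code> would be <code class="language-plaintext highlighter-rouge">0xffffffff</code>, and the only cast left is
the one guarded by the comparison just before it.</p>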

<p>You might consider changing the interface instead:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">logmsg</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">msg</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">len</span><span class="p">);</span>
</code></pre></div></div>

<p>That probably passes the buck and doesn’t solve the underlying problem.
The caller may be holding a <code class="language-plaintext highlighter-rouge">size_t</code> length, so the truncation happens
there instead. Or maybe you keep propagating this change backwards until
it, say, dissipates on a known constant. <code class="language-plaintext highlighter-rouge">-﻿Wconversion</code> leads to
these ripple effects that improve the overall program, which is why I
like it.</p>

<p>The catch is that the above warning only happens for 64-bit targets. So
you might miss it. The inverse is true in other cases. This is one area
where <a href="/blog/2021/08/21/">cross-architecture testing</a> can pay off.</p>

<p>Unfortunately since this warning is off the beaten path, it seems like it
doesn’t quite get the attention it could use. It warns even in simple cases
where truncation has been explicitly handled/avoided. For example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="p">...;</span>
<span class="kt">char</span> <span class="n">digit</span> <span class="o">=</span> <span class="sc">'0'</span> <span class="o">+</span> <span class="n">x</span><span class="o">%</span><span class="mi">10</span><span class="p">;</span>  <span class="c1">// false warning</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">'0'</code> is a known constant. The operation <code class="language-plaintext highlighter-rouge">x%10</code> has a known range (-9
to 9). Therefore the addition result has a known range, and all results
can be represented in a <code class="language-plaintext highlighter-rouge">char</code>. Yet it still warns. This often comes up
dealing with character data like this.</p>
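<p>The usual way out is an explicit cast stating that the narrowing is
intended. A small sketch, with <code class="language-plaintext highlighter-rouge">last_digit</code> as a hypothetical name and
assuming a non-negative input:</p>

```c
/* Returns the character for the last decimal digit of a non-negative x.
 * The cast tells the compiler the narrowing from int to char is
 * deliberate, which quiets the conversion warning. */
char last_digit(int x)
{
    return (char)('0' + x%10);
}
```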

<p>In my <code class="language-plaintext highlighter-rouge">logmsg</code> fix I had used an assertion to check that no truncation
actually occurred. But wouldn’t it be nice if the compiler could generate
that for us somehow? That brings us to dynamic checks.</p>

<h3 id="dynamic-run-time-checks">Dynamic run-time checks</h3>

<p>Sanitizers have been around for nearly a decade but are still criminally
underused. They insert run-time assertions into programs at the flip of a
switch typically at a modest performance cost — less than the cost of a
debug build. All three major compilers support at least one sanitizer on
all targets. In most cases, failing to use them is practically the same as
not even trying to find defects. Every beginner tutorial ought to be using
sanitizers <em>from page 1</em> where they teach how to compile a program with
<code class="language-plaintext highlighter-rouge">gcc</code>. (That this is universally <em>not</em> the case, and that these same
tutorials also do not begin with teaching a debugger, is a major, on-going
education failure.)</p>

<p>There are multiple different sanitizers with lots of overlap, but Address
Sanitizer (ASan) and Undefined Behavior Sanitizer (UBSan) are the most
general. They are compatible with each other and form a solid, general
baseline. To use address sanitizer, at both compile and link time do:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc ... -fsanitize=address ...
</code></pre></div></div>

<p>It’s even spelled the same way in MSVC. It’s needed at link time because
it includes a runtime component. When working properly it’s aware of all
allocations and checks all memory accesses that might be out of bounds,
producing a run-time error if that occurs. It’s not always appropriate,
but most projects that can use it probably should.</p>

<p>UBSan is enabled similarly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc ... -fsanitize=undefined ...
</code></pre></div></div>

<p>It adds checks around operations that might be undefined, emitting a
run-time error if it occurs. It has an optional runtime component to
produce a helpful diagnostic. You can instead insert a trap instruction,
which is how I prefer to use it: <strong><code class="language-plaintext highlighter-rouge">-﻿fsanitize-trap=undefined</code></strong>.
(Until recently it was <strong><code class="language-plaintext highlighter-rouge">-﻿fsanitize-undefined-trap-on-error</code></strong>.)
This works on platforms where the UBSan runtime is unsupported. Some
instrumentation is only inserted at higher optimization levels.</p>

<p>For me, the most useful UBSan check is signed overflow — e.g. computing
the wrong result — and it’s instrumentation I miss when not working in C.
In programs where this might be an issue, combine it <a href="/blog/2019/01/25/">with a fuzzer</a>
to search for inputs that cause overflows. This is yet another argument in
favor of <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf">signed sizes</a>, as UBSan can detect such overflows. (Yes,
UBSan optionally instruments unsigned overflow, too, but then you must
somehow distinguish <a href="/blog/2019/11/19/">intentional</a> from <a href="/blog/2017/07/19/">unintentional</a>
overflow.)</p>
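
<p>As a minimal sketch of the kind of arithmetic this check guards, consider a
size computation with signed operands. The function name <code class="language-plaintext highlighter-rouge">total_bytes</code> is
hypothetical, chosen only for illustration. Built with
<code class="language-plaintext highlighter-rouge">-fsanitize=undefined</code>, an overflowing multiply here should be caught at run
time instead of producing a wrong size:</p>

```c
#include <stddef.h>

// Hypothetical helper: bytes needed for count elements of size bytes each.
// Both operands are signed, so overflow in the multiply is undefined
// behavior, which is what UBSan's signed overflow check instruments.
// With unsigned operands (e.g. size_t) the product would, by default,
// wrap around without any report.
ptrdiff_t total_bytes(ptrdiff_t count, ptrdiff_t size)
{
    return count * size;
}
```

<p>Ordinary inputs pass through untouched. Feeding it something like
<code class="language-plaintext highlighter-rouge">total_bytes(PTRDIFF_MAX, 2)</code> in a UBSan build is one way to watch the check
fire.</p>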

<p>On Linux, ASan and UBSan strangely do not have <a href="/blog/2022/06/26/">debugger-oriented
defaults</a>. Fortunately that’s easy to address with a couple of
environment variables, which cause them to break on error instead of
uselessly exiting:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ASAN_OPTIONS</span><span class="o">=</span><span class="nv">abort_on_error</span><span class="o">=</span>1:halt_on_error<span class="o">=</span>1
<span class="nb">export </span><span class="nv">UBSAN_OPTIONS</span><span class="o">=</span><span class="nv">abort_on_error</span><span class="o">=</span>1:halt_on_error<span class="o">=</span>1
</code></pre></div></div>

<p>Also, when compiling you can combine sanitizers like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc ... -fsanitize=address,undefined ...
</code></pre></div></div>

<p>As of this writing, MSVC does not have UBSan, but it does have a similar
feature, <a href="https://learn.microsoft.com/en-us/cpp/build/reference/rtc-run-time-error-checks">run-time error checks</a>. Three sub-flags (<code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">s</code>, <code class="language-plaintext highlighter-rouge">u</code>)
enable different checks, and <strong><code class="language-plaintext highlighter-rouge">/RTCcsu</code></strong> turns them all on. The <code class="language-plaintext highlighter-rouge">c</code> flag
generates the assertion I had manually written with <code class="language-plaintext highlighter-rouge">-﻿Wconversion</code>,
and traps any truncation at run time. There’s nothing quite like this in
UBSan! It’s so extreme that it’s compatible with neither standard runtime
libraries (fortunately <a href="/blog/2023/02/11/">not a big deal</a>) nor with ASan.</p>

<p>Caveat: Explicit casts aren’t enough; you must actually truncate variables
using a mask in order to pass the check. For example, to accept truncation
in the <code class="language-plaintext highlighter-rouge">logmsg</code> function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">WriteFile</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">len</span><span class="o">&amp;</span><span class="mh">0xffffffff</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">out</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>
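
<p>The same idea in isolation, as a small sketch. The helper name
<code class="language-plaintext highlighter-rouge">low_byte</code> is hypothetical. The point is only that the mask makes the value
fit the destination type before the conversion happens:</p>

```c
// Hypothetical helper: keep only the low 8 bits of x.
// Under /RTCc a plain (unsigned char)x is reported at run time whenever x
// does not fit in the destination. Masking first guarantees the value
// fits, so the conversion passes the check.
unsigned char low_byte(unsigned x)
{
    return (unsigned char)(x & 0xff);
}
```

<p>With other compilers the mask is redundant but harmless, so code written
this way stays portable.</p>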

<p>Thread Sanitizer (TSan) is occasionally useful for finding — or, more
often, <em>proving</em> the presence of — data races. It has a runtime component
and so must be used at compile time and link time.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc ... -fsanitize=thread ...
</code></pre></div></div>

<p>Unfortunately it only works in a narrow context. The target must use
pthreads, not C11 threads, OpenMP, nor <a href="/blog/2023/03/23/">direct cloning</a>. It must
only synchronize through code that was compiled with TSan. That means no
synchronization <a href="/blog/2022/10/03/">through system calls</a>, especially no futexes. Most
non-trivial programs do not meet the criteria.</p>

<h3 id="debug-information">Debug information</h3>

<p>Another common mistake in tutorials is using plain old <code class="language-plaintext highlighter-rouge">-﻿g</code> instead
of <strong><code class="language-plaintext highlighter-rouge">-﻿g3</code></strong> (read: “debug level 3”). That’s like using <code class="language-plaintext highlighter-rouge">-﻿O</code>
instead of <code class="language-plaintext highlighter-rouge">-﻿O3</code>. It adds a lot more debug information to the
output, particularly enums and macros. The extra information is useful and
you’re better off having it!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc ... -g3 ...
</code></pre></div></div>
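
<p>To see what the extra level buys, here is a small sketch whose constants
exist only as macros. All names are hypothetical:</p>

```c
// Hypothetical ring buffer constants. These are macros, so they leave no
// trace in the debug information unless macro definitions are recorded,
// which is what -g3 adds over -g.
#define RING_SIZE (1 << 4)
#define RING_MASK (RING_SIZE - 1)

// Map an ever-increasing counter onto a slot in the ring.
int ring_index(int i)
{
    return i & RING_MASK;
}
```

<p>When stopped inside <code class="language-plaintext highlighter-rouge">ring_index</code> in a <code class="language-plaintext highlighter-rouge">-g3</code> build, GDB can evaluate
<code class="language-plaintext highlighter-rouge">print RING_MASK</code> or show <code class="language-plaintext highlighter-rouge">info macro RING_MASK</code>. In a plain <code class="language-plaintext highlighter-rouge">-g</code> build it
has no record of either macro.</p>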

<p>All the major build systems — CMake, Autotools, Meson, etc. — get this
wrong in their standard debug configurations. Producing a fully-featured
debug build from these systems is a constant battle for me. Often it’s
easier to ignore the build system entirely and <code class="language-plaintext highlighter-rouge">cc -g3 **/*.c</code> (plus
sanitizers, etc.).</p>

<p>(Short-term note: GCC 11, released in April 2021, switched to DWARF5 by
default. However, GDB could not access the extra <code class="language-plaintext highlighter-rouge">-﻿g3</code> debug
information in DWARF5 until GDB 13, released February 2023. If you have a
toolchain from that two year window — except <a href="https://github.com/skeeto/w64devkit">mine</a> because I patched
it — then you may also need <code class="language-plaintext highlighter-rouge">-﻿gdwarf-4</code> to switch back to DWARF4.)</p>

<p>What about <code class="language-plaintext highlighter-rouge">-﻿Og</code>? In theory it enables optimizations that do not
interfere with debugging, and potentially some additional warnings. In
practice I still get far too many “optimized out” messages from GDB when I
use it, so I don’t bother. Fortunately C is such a simple language that
debug builds are nearly as fast as release builds anyway.</p>

<p>On MSVC I like having debug information embedded in binaries, as GCC does,
which is done using <strong><code class="language-plaintext highlighter-rouge">/Z7</code></strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl ... /Z7 ...
</code></pre></div></div>

<p>Though I certainly understand the value of separate debug information,
<code class="language-plaintext highlighter-rouge">/Zi</code>, in some cases. Sometimes I wish the GNU toolchain made this easier.</p>

<h3 id="summary">Summary</h3>

<p>My personal rigorous baseline for development using <code class="language-plaintext highlighter-rouge">gcc</code> and <code class="language-plaintext highlighter-rouge">clang</code>
looks like this (all platforms):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -g3 -Wall -Wextra -Wconversion -Wdouble-promotion \
     -Wno-unused-parameter -Wno-unused-function -Wno-sign-conversion \
     -fsanitize=undefined -fsanitize-trap ...
</code></pre></div></div>

<p>While ASan is great for quickly reviewing and evaluating other people’s
projects, I don’t find it useful for my own programs. I avoid that class
of defects through smarter paradigms (region-based allocation, no null
terminated strings, etc.). I also prefer the behavior of trap instruction
UBSan versus a diagnostic, as it behaves better under debuggers.</p>

<p>For <code class="language-plaintext highlighter-rouge">cl</code> and <code class="language-plaintext highlighter-rouge">clang-cl</code>, my personal baseline looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /Z7 /W4 /wd4146 /wd4245 /RTCcsu ...
</code></pre></div></div>

<p>I don’t normally need <code class="language-plaintext highlighter-rouge">/D_CRT_SECURE_NO_WARNINGS</code> since I don’t use a CRT
anyway.</p>

<p><strong>Update</strong>: Peter0x44 points out <code class="language-plaintext highlighter-rouge">-D_GLIBCXX_DEBUG</code> if you’re working in
C++ with libstdc++, including on Windows with Mingw-w64. I agree, this is
an excellent option! ASan does not “see” C++ containers, and it fills in
some of those gaps.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>My new debugbreak command</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/07/31/"/>
    <id>urn:uuid:c333d1ab-86b5-4389-b2b7-325d0eb90987</id>
    <updated>2022-07-31T12:59:59Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>I <a href="/blog/2022/06/26/">previously mentioned</a> the Windows feature where <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registerhotkey">pressing
F12</a> in a debuggee window causes it to break in the debugger. It
works with any debugger — GDB, RemedyBG, Visual Studio, etc. — since the
hotkey simply raises a breakpoint <a href="https://docs.microsoft.com/en-us/cpp/cpp/structured-exception-handling-c-cpp">structured exception</a>. It’s been
surprisingly useful, and I’ve wanted it available in more contexts, such
as console programs or even on Linux. The result is a new <a href="https://github.com/skeeto/w64devkit/blob/4282797/src/debugbreak.c"><code class="language-plaintext highlighter-rouge">debugbreak</code>
command</a>, now included in <a href="/blog/2020/05/15/">w64devkit</a>. Though, of course, you
already have <a href="/blog/2020/09/25/">everything you need</a> to build it and try it out right
now. I’ve also worked out a Linux implementation.</p>

<p>It’s named after an <a href="https://docs.microsoft.com/en-us/visualstudio/debugger/debugbreak-and-debugbreak">MSVC intrinsic and Win32 function</a>. It takes no
arguments, and its operation is indiscriminate: It raises a breakpoint
exception in <em>all</em> debuggee processes system-wide. Reckless? Perhaps, but
certainly convenient. You don’t need to tell it which process you want to
pause. It just works, and a good debugging experience is one of ease and
convenience.</p>

<p>The linchpin is <a href="https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-debugbreakprocess">DebugBreakProcess</a>. The command walks the process
list and fires this function at each process. Nothing happens for programs
without a debugger attached, so it doesn’t even bother checking if it’s a
debuggee. It couldn’t be simpler. I’ve used it on everything from Windows
XP to Windows 11, and it’s worked flawlessly.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">HANDLE</span> <span class="n">s</span> <span class="o">=</span> <span class="n">CreateToolhelp32Snapshot</span><span class="p">(</span><span class="n">TH32CS_SNAPPROCESS</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">PROCESSENTRY32W</span> <span class="n">p</span> <span class="o">=</span> <span class="p">{</span><span class="k">sizeof</span><span class="p">(</span><span class="n">p</span><span class="p">)};</span>
<span class="k">for</span> <span class="p">(</span><span class="n">BOOL</span> <span class="n">r</span> <span class="o">=</span> <span class="n">Process32FirstW</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">p</span><span class="p">);</span> <span class="n">r</span><span class="p">;</span> <span class="n">r</span> <span class="o">=</span> <span class="n">Process32NextW</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">p</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">HANDLE</span> <span class="n">h</span> <span class="o">=</span> <span class="n">OpenProcess</span><span class="p">(</span><span class="n">PROCESS_ALL_ACCESS</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">th32ProcessID</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">h</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">DebugBreakProcess</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
        <span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I use it almost exclusively from Vim, where I’ve given it a <a href="https://learnvimscriptthehardway.stevelosh.com/chapters/06.html">leader
mapping</a>. With the editor focused, I can type backslash then
<kbd>d</kbd> to pause the debuggee.</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">map</span> <span class="p">&lt;</span>leader<span class="p">&gt;</span><span class="k">d</span> <span class="p">:</span><span class="k">call</span> <span class="nb">system</span><span class="p">(</span><span class="s2">"debugbreak"</span><span class="p">)&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
</code></pre></div></div>

<p>With the debuggee paused, I’m free to add new breakpoints or watchpoints,
or print the call stack to see what the heck it’s busy doing. The
mechanism behind DebugBreakProcess is to create a new thread in the
target, with that thread raising the breakpoint exception. The debugger
will be stopped in this new thread. In GDB you can use the <code class="language-plaintext highlighter-rouge">thread</code>
command to switch over to the thread that actually matters, usually <code class="language-plaintext highlighter-rouge">thr
1</code>.</p>

<h3 id="debugbreak-on-linux">debugbreak on Linux</h3>

<p>On unix-like systems the equivalent of a breakpoint exception is a
<code class="language-plaintext highlighter-rouge">SIGTRAP</code>. There’s already a standard command for sending signals,
<a href="https://man7.org/linux/man-pages/man1/kill.1.html"><code class="language-plaintext highlighter-rouge">kill</code></a>, so a <code class="language-plaintext highlighter-rouge">debugbreak</code> command can be built using nothing more
than a few lines of shell script. However, unlike DebugBreakProcess,
signaling every process with <code class="language-plaintext highlighter-rouge">SIGTRAP</code> will only end in tears. The script
will need a way to determine which processes are debuggees.</p>

<p>Linux exposes processes in the file system as virtual files under <code class="language-plaintext highlighter-rouge">/proc</code>,
where each process appears as a directory. Its <code class="language-plaintext highlighter-rouge">status</code> file includes a
<code class="language-plaintext highlighter-rouge">TracerPid</code> field, which will be non-zero for debuggees. The script
inspects this field, and if non-zero sends a <code class="language-plaintext highlighter-rouge">SIGTRAP</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/sh</span>
<span class="nb">set</span> <span class="nt">-e</span>
<span class="k">for </span>pid <span class="k">in</span> <span class="si">$(</span>find /proc <span class="nt">-maxdepth</span> 1 <span class="nt">-printf</span> <span class="s1">'%f\n'</span> | <span class="nb">grep</span> <span class="s1">'^[0-9]\+$'</span><span class="si">)</span><span class="p">;</span> <span class="k">do
    </span><span class="nb">grep</span> <span class="nt">-q</span> <span class="s1">'^TracerPid:\s[^0]'</span> /proc/<span class="nv">$pid</span>/status 2&gt;/dev/null <span class="o">&amp;&amp;</span>
        <span class="nb">kill</span> <span class="nt">-TRAP</span> <span class="nv">$pid</span>
<span class="k">done</span>
</code></pre></div></div>

<p>This script, now part of <a href="/blog/2012/06/23/">my dotfiles</a>, has worked very well so
far, and effectively smooths over some debugging differences between
Windows and Linux, reducing my context switching mental load. There’s
probably a better way to express this script, but that’s the best I could
do so far. On the BSDs you’d need to parse the output of <code class="language-plaintext highlighter-rouge">ps</code>, though each
system seems to do its own thing for distinguishing debuggees.</p>
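
<p>The <code class="language-plaintext highlighter-rouge">TracerPid</code> test could also be expressed in C. The following is a
minimal sketch, not part of the script above. The function name
<code class="language-plaintext highlighter-rouge">tracer_pid</code> is hypothetical, and it assumes the caller has already read the
status file into a null-terminated buffer:</p>

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: given the full text of a /proc/PID/status file,
// return the TracerPid value, or -1 if the field is missing. A non-zero
// result means another process is tracing this one, i.e. it's a debuggee.
long tracer_pid(const char *status)
{
    static const char key[] = "TracerPid:";
    for (const char *line = status; line;) {
        if (!strncmp(line, key, sizeof(key) - 1)) {
            // strtol skips the tab separating the key from the number
            return strtol(line + sizeof(key) - 1, 0, 10);
        }
        line = strchr(line, '\n');
        if (line) line++;  // step past the newline to the next line
    }
    return -1;
}
```

<p>A caller would then send the signal with <code class="language-plaintext highlighter-rouge">kill(pid, SIGTRAP)</code> whenever the
result is greater than zero.</p>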

<h3 id="a-missing-feature">A missing feature</h3>

<p>I had originally planned for one flag, <code class="language-plaintext highlighter-rouge">-k</code>. Rather than breakpoint
debuggees, it would terminate all debuggee processes. This is especially
important on Windows where debuggee processes block builds due to file
locking shenanigans. I’d just run <code class="language-plaintext highlighter-rouge">debugbreak -k</code> as part of the build.
However, it’s not possible to terminate debuggees paused in the debugger —
the common situation. I’ve given up on this for now.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Assertions should be more debugger-oriented</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/06/26/"/>
    <id>urn:uuid:22ae914c-971b-4cee-ba48-a189db1b6df6</id>
    <updated>2022-06-26T18:51:04Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="python"/><category term="java"/>
    <content type="html">
      <![CDATA[<p>Prompted by <a href="https://www.youtube.com/watch?v=r9eQth4Q5jg">a 20 minute video</a>, over the past month I’ve improved my
debugger skills. I’d shamefully acquired a bad habit: avoiding a debugger
until exhausting dumber, insufficient methods. My <em>first</em> choice should be
a debugger, but I had allowed a bit of friction to dissuade me. With some
thoughtful practice and deliberate effort clearing the path, my bad habit
is finally broken — at least when a good debugger is available. It feels
like I’ve leveled up and, <a href="/blog/2017/04/01/">like touch typing</a>, this was a skill I’d
neglected far too long. One friction point was the less-than-optimal
<code class="language-plaintext highlighter-rouge">assert</code> feature in basically every programming language implementation.
It ought to work better with debuggers.</p>

<p>An assertion verifies a program invariant, and so if one fails then
there’s undoubtedly a defect in the program. In other words, assertions
make programs more sensitive to defects, allowing problems to be caught
more quickly and accurately. Counter-intuitively, crashing early and often
makes for more robust and reliable software in the long run. For exactly
this reason, assertions go especially well with <a href="/blog/2019/01/25/">fuzzing</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">);</span>   <span class="c1">// bounds check</span>
<span class="n">assert</span><span class="p">((</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">size</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">);</span>  <span class="c1">// suspicious size_t</span>
<span class="n">assert</span><span class="p">(</span><span class="n">cur</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">!=</span> <span class="n">cur</span><span class="p">);</span>    <span class="c1">// circular reference?</span>
</code></pre></div></div>

<p>They’re sometimes abused for error handling, which is a reason they’ve
also been (wrongfully) discouraged at times. For example, failing to open
a file is an error, not a defect, so an assertion is inappropriate.</p>
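
<p>A short sketch of the distinction, using a hypothetical helper named
<code class="language-plaintext highlighter-rouge">count_bytes</code>. The assertion covers something only a defective caller could
do, while the failed open is reported through the return value:</p>

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: count the bytes in a file, or return -1 on error.
long count_bytes(const char *path)
{
    assert(path);  // a null path is a defect in the caller: assert

    FILE *f = fopen(path, "rb");
    if (!f) {
        return -1;  // a missing file is an ordinary error: report it
    }

    long n = 0;
    while (fgetc(f) != EOF) {
        n++;
    }
    fclose(f);
    return n;
}
```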

<p>Normal programs have implicit assertions all over, even if we don’t
usually think of them as assertions. In some cases they’re checked by the
hardware. Examples of implicit assertion failures:</p>

<ul>
  <li>Out-of-bounds indexing</li>
  <li>Dereferencing null/nil/None</li>
  <li>Dividing by zero</li>
  <li>Certain kinds of integer overflow (e.g. <code class="language-plaintext highlighter-rouge">-ftrapv</code>)</li>
</ul>

<p>Programs are generally not intended to recover from these situations
because, had they been anticipated, the invalid operation wouldn’t have
been attempted in the first place. The program simply crashes because
there’s no better alternative. Sanitizers, including Address Sanitizer
(ASan) and Undefined Behavior Sanitizer (UBSan), are in essence
additional, implicit assertions, checking invariants that aren’t normally
checked.</p>

<p>Ideally a failing assertion should have these two effects:</p>

<ul>
  <li>
    <p>Execution should <em>immediately</em> stop. The program is in an unknown state,
so it’s neither safe to “clean up” nor attempt to recover. Additional
execution will only make debugging more difficult, and may obscure the
defect.</p>
  </li>
  <li>
    <p>When run under a debugger — or visited as a core dump — it should break
exactly at the failed assertion, ready for inspection. I should not need
to dig around the call stack to figure out where the failure occurred. I
certainly shouldn’t need to manually set a breakpoint and restart the
program hoping to fail the assertion a second time. The whole reason for
using a debugger is to save time, so if it’s wasting my time then it’s
failing at its primary job.</p>
  </li>
</ul>

<p>I examined standard <code class="language-plaintext highlighter-rouge">assert</code> features across various language
implementations, and none strictly meet the criteria. Fortunately, in some
cases, it’s trivial to build a better assertion, and you can substitute
your own definition. First, let’s discuss the way assertions disappoint.</p>

<h3 id="a-test-assertion">A test assertion</h3>

<p>My test for C and C++ is minimal but establishes some state and gives me a
variable to inspect:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;assert.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="mi">5</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then I compile and debug in the most straightforward way:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -g -o test test.c
$ gdb test
(gdb) r
(gdb) bt
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">r</code> in GDB stands for <code class="language-plaintext highlighter-rouge">run</code>, which immediately breaks because of the
<code class="language-plaintext highlighter-rouge">assert</code>. The <code class="language-plaintext highlighter-rouge">bt</code> prints a backtrace. On a typical Linux distribution
that shows this backtrace:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __GI_raise
#1  __GI_abort
#2  __assert_fail_base
#3  __GI___assert_fail
#4  main
</code></pre></div></div>

<p>Well, actually, it’s much messier than this, but I manually cleaned it up:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linu
x/raise.c:50
#1  0x00007ffff7df4537 in __GI_abort () at abort.c:79
#2  0x00007ffff7df440f in __assert_fail_base (fmt=0x7ffff7f5d
128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x
55555555600b "i &lt; 5", file=0x555555556004 "test.c", line=6, f
unction=&lt;optimized out&gt;) at assert.c:92
#3  0x00007ffff7e03662 in __GI___assert_fail (assertion=0x555
55555600b "i &lt; 5", file=0x555555556004 "test.c", line=6, func
tion=0x555555556011 &lt;__PRETTY_FUNCTION__.0&gt; "main") at assert
.c:101
#4  0x0000555555555178 in main () at test.c:6
</code></pre></div></div>

<p>That’s a lot to take in at a glance, and about 95% of it is noise that
will never contain useful information. Most notably, GDB didn’t stop at
the failing assertion. Instead there are <em>four stack frames</em> of libc junk I
have to navigate before I can even begin debugging.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) up
(gdb) up
(gdb) up
(gdb) up
</code></pre></div></div>

<p>I must wade through this for every assertion failure. This is some of the
friction that made me avoid the debugger in the first place. glibc loves
indirection, so maybe the other libc implementations do better? How about
musl?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  setjmp
#1  raise
#2  ??
#3  ??
#4  ??
#5  ??
#6  ??
#7  ??
#8  ??
#9  ??
#10 ??
#11 ??
</code></pre></div></div>

<p>Oops, without musl debugging symbols I can’t debug assertions at all
because GDB can’t read the stack, so it’s lost. If you’re on Alpine you
can install <code class="language-plaintext highlighter-rouge">musl-dbg</code>, but otherwise you’ll probably need to build your
own from source. With debugging symbols, musl is no better than glibc:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  __restore_sigs
#1  raise
#2  abort
#3  __assert_fail
#4  main
</code></pre></div></div>

<p>Same with FreeBSD:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  thr_kill
#1  raise
#2  abort
#3  __assert
#4  main
</code></pre></div></div>

<p>OpenBSD has one fewer frame:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0  thrkill
#1  _libc_abort
#2  _libc___assert2
#3  main
</code></pre></div></div>

<p>How about on Windows with Mingw-w64?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Inferior 1 (process 7864) exited with code 03]
</code></pre></div></div>

<p>Oops, on Windows GDB doesn’t break at all on <code class="language-plaintext highlighter-rouge">assert</code>. You must first set
a breakpoint on <code class="language-plaintext highlighter-rouge">abort</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) b abort
</code></pre></div></div>

<p>Besides that, it’s the most straightforward so far:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 msvcrt!abort
#1 msvcrt!_assert
#2 main
</code></pre></div></div>

<p>With MSVC (default CRT) I get something slightly different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 abort
#1 common_assert_to_stderr
#2 _wassert
#3 main
#4 __scrt_common_main_seh
</code></pre></div></div>

<p>RemedyBG leaves me at the <code class="language-plaintext highlighter-rouge">abort</code> like GDB does elsewhere. Visual Studio
recognizes that I don’t care about its stack frames and instead puts the
focus on the assertion, ready for debugging. The other stack frames are
there, but basically invisible. It’s the only case that practically meets
all my criteria!</p>

<p>I can’t entirely blame these implementations. The C standard requires that
<code class="language-plaintext highlighter-rouge">assert</code> print a diagnostic and call <code class="language-plaintext highlighter-rouge">abort</code>, and that <code class="language-plaintext highlighter-rouge">abort</code> raise
<code class="language-plaintext highlighter-rouge">SIGABRT</code>. There’s not much implementations can do, and it’s up to the
debugger to be smarter about it.</p>

<h3 id="sanitizers">Sanitizers</h3>

<p>ASan doesn’t break GDB on assertion failures, which is yet another source
of friction. You can work around this with an environment variable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export ASAN_OPTIONS=abort_on_error=1:print_legend=0
</code></pre></div></div>

<p>This works, but it’s the worst case of all: I get 7 junk stack frames on
top of the failed assertion. It’s also very noisy when it traps, so the
<code class="language-plaintext highlighter-rouge">print_legend=0</code> helps to cut it down a bit. I want this variable so often
that I set it in my shell’s <code class="language-plaintext highlighter-rouge">.profile</code> so that it’s always set.</p>

<p>With UBSan you can use <code class="language-plaintext highlighter-rouge">-fsanitize-undefined-trap-on-error</code>, which behaves
like the improved assertion. It traps directly on the defect with no junk
frames, though it prints no diagnostic. As a bonus, it also means you
don’t need to link <code class="language-plaintext highlighter-rouge">libubsan</code>. Thanks to the bonus, it fully supplants
<code class="language-plaintext highlighter-rouge">-ftrapv</code> for me on all platforms.</p>

<p><strong>Update November 2022</strong>: This “stop” hook eliminates ASan friction by
popping runtime frames — functions with the reserved <code class="language-plaintext highlighter-rouge">__</code> prefix — from
the call stack so that they’re not in the way when GDB takes control. It
requires Python support, which is the purpose of the feature-sniff outer
condition.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if !$_isvoid($_any_caller_matches)
    define hook-stop
        while $_thread &amp;&amp; $_any_caller_matches("^__")
            up-silently
        end
    end
end
</code></pre></div></div>

<p>This is now part of my <code class="language-plaintext highlighter-rouge">.gdbinit</code>.</p>

<h3 id="a-better-assertion">A better assertion</h3>

<p>At least when under a debugger, here’s a much better assertion macro for
GCC and Clang:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define assert(c) if (!(c)) __builtin_trap()
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">__builtin_trap</code> inserts a trap instruction — a built-in breakpoint. By
not calling a function to raise a signal, there are no junk stack frames
and no need to breakpoint on <code class="language-plaintext highlighter-rouge">abort</code>. It stops exactly where it should as
quickly as possible. This definition works reliably with GCC across all
platforms, too. On MSVC the equivalent is <code class="language-plaintext highlighter-rouge">__debugbreak</code>. If you’re really
in a pinch then do whatever it takes to trigger a fault, like
dereferencing a null pointer. A more complete definition might be:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef DEBUG
#  if __GNUC__
#    define assert(c) if (!(c)) __builtin_trap()
#  elif _MSC_VER
#    define assert(c) if (!(c)) __debugbreak()
#  else
#    define assert(c) if (!(c)) *(volatile int *)0 = 0
#  endif
#else
#  define assert(c)
#endif
</span></code></pre></div></div>

<p>None of these print a diagnostic, but that’s unnecessary when a debugger
is involved.</p>
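
<p>One caveat with a bare <code class="language-plaintext highlighter-rouge">if</code> in a macro is that it can capture a following
<code class="language-plaintext highlighter-rouge">else</code>. The sketch below is a variation on the definitions above, not a
replacement for them. It wraps the check in <code class="language-plaintext highlighter-rouge">do { } while (0)</code> and uses the
hypothetical name <code class="language-plaintext highlighter-rouge">ASSERT</code> so that it cannot collide with <code class="language-plaintext highlighter-rouge">assert.h</code>. The
function <code class="language-plaintext highlighter-rouge">checked_div</code> is made up for illustration:</p>

```c
// ASSERT is a hypothetical name chosen to avoid clashing with <assert.h>.
// GCC and Clang only; other compilers would need their own trap.
#define ASSERT(c) do { if (!(c)) __builtin_trap(); } while (0)

// Integer division that treats a zero divisor as "no result" (returns 0).
// The assertion sits directly under an if that has an else. With a bare
// "if (!(c)) __builtin_trap()" expansion, that else would attach to the
// macro's hidden if instead of the b != 0 test.
int checked_div(int a, int b)
{
    if (b != 0)
        ASSERT(a >= 0);
    else
        return 0;
    return a / b;
}
```

<p>With the wrapper, the macro expands to a single statement, so it behaves
like a function call wherever a statement is expected.</p>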

<h3 id="other-languages">Other languages</h3>

<p>Unfortunately the situation <a href="https://github.com/rust-lang/rust/issues/21102">mostly gets worse</a> with other language
implementations, and it’s generally not possible to build a better
assertion. Assertions typically have exception-like semantics, if not
literally just another exception, and so they are far less reliable. If a
failed assertion raises an exception, then the program won’t stop until
it’s unwound the stack — running destructors and such along the way — all
the way to the top level looking for a handler. It only knows there’s a
problem when nobody was there to catch it.</p>

<p><a href="https://go.dev/doc/faq#assertions">Go officially doesn’t have assertions</a>, though panics are a kind of
assertion. However, panics have exception-like semantics, and so suffer
the problems of exceptions. A Go version of my test:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">defer</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"DEFER"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="m">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="m">5</span> <span class="p">{</span>
            <span class="nb">panic</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If I run this under Go’s premier debugger, <a href="https://github.com/go-delve/delve">Delve</a>, the unrecovered
panic causes it to break. So far so good. However, I get two junk frames:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#0 runtime.fatalpanic
#1 runtime.gopanic
#2 main.main
#3 runtime.main
#4 runtime.goexit
</code></pre></div></div>

<p>It only knows to stop because the Go runtime called <code class="language-plaintext highlighter-rouge">fatalpanic</code>, but the
backtrace is a fiction: The program continued to run after the panic,
enough to run all the registered defers (including printing “DEFER”),
unwinding the stack to the top level, and only then did it <code class="language-plaintext highlighter-rouge">fatalpanic</code>.
Fortunately it’s still possible to inspect all those stack frames even if
some variables may have changed while unwinding, but it’s more like
inspecting a core dump than a paused process.</p>

<p>The situation in Python is similar: <code class="language-plaintext highlighter-rouge">assert</code> raises AssertionError — a
plain old exception — and <code class="language-plaintext highlighter-rouge">pdb</code> won’t break until the stack has unwound,
exiting context managers and such. Only once the exception reaches the top
level does it enter “post mortem debugging,” like a core dump. At least
there are no junk stack frames on top. If you’re using asyncio then your
program may continue running for quite a while before the right tasks are
scheduled and the exception finally propagates to the top level, if ever.</p>

<p>The worst offender of all is Java. First, <code class="language-plaintext highlighter-rouge">jdb</code> never breaks for unhandled
exceptions. It’s up to you to set a breakpoint before the exception is
thrown. But it gets worse: assertions are disabled under <code class="language-plaintext highlighter-rouge">jdb</code>. The Java
<code class="language-plaintext highlighter-rouge">assert</code> statement is worse than useless.</p>

<h3 id="addendum-dont-exit-the-debugger">Addendum: Don’t exit the debugger</h3>

<p>The largest friction-reducing change I made is never exiting the debugger.
Previously I would enter GDB, run my program, exit, edit/rebuild, repeat.
However, there’s no reason to exit GDB! It automatically and reliably
reloads symbols and updates breakpoints on symbols. It remembers your run
configuration, so re-running is just <code class="language-plaintext highlighter-rouge">r</code> rather than interacting with
shell history.</p>

<p>My workflow on all platforms (<a href="/blog/2020/05/15/">including Windows</a>) is a vertically
maximized Vim window and a vertically maximized terminal window. The new
part for me: The terminal runs a long-term GDB session exclusively, with
<code class="language-plaintext highlighter-rouge">file</code> set to the program I’m writing, usually set by the initial command
line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb myprogram
gdb&gt;
</code></pre></div></div>

<p>Alternatively, use <code class="language-plaintext highlighter-rouge">file</code> after starting GDB. This is occasionally useful if
my project has multiple binaries and I want to examine a different program.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; file myprogram
</code></pre></div></div>

<p>I use <code class="language-plaintext highlighter-rouge">make</code> and Vim’s <code class="language-plaintext highlighter-rouge">:mak</code> command for building from within the editor,
so I don’t need to change context to build. The quickfix list takes me
straight to warnings/errors. Often I’m writing something that takes input
from standard input. So I use the <code class="language-plaintext highlighter-rouge">run</code> (<code class="language-plaintext highlighter-rouge">r</code>) command to set this up
(along with any command line arguments).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r &lt;test.txt
</code></pre></div></div>

<p>You can redirect standard output as well. It remembers these settings for
plain <code class="language-plaintext highlighter-rouge">run</code> later, so I can test my program by entering <code class="language-plaintext highlighter-rouge">r</code> and nothing
else.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r
</code></pre></div></div>

<p>My usual workflow is edit, <code class="language-plaintext highlighter-rouge">:mak</code>, <code class="language-plaintext highlighter-rouge">r</code>, repeat. If I want to test a
different input or use different options, change the run configuration
using <code class="language-plaintext highlighter-rouge">run</code> again:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; r -a -b -c &lt;test2.txt
</code></pre></div></div>

<p>On Windows you cannot recompile while the program is running. If GDB is
sitting on a breakpoint but I want to build, use <code class="language-plaintext highlighter-rouge">kill</code> (<code class="language-plaintext highlighter-rouge">k</code>) to stop it
without exiting GDB.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; k
</code></pre></div></div>

<p>GDB has an annoying, flow-breaking yes/no prompt for this, so I recommend
<code class="language-plaintext highlighter-rouge">set confirm no</code> in your <code class="language-plaintext highlighter-rouge">.gdbinit</code> to disable it.</p>

<p>Sometimes a program is stuck in a loop and I need it to break in the
debugger. I try to avoid CTRL-C in the terminal since it can confuse
GDB. A safer option is to signal the process from Vim with <code class="language-plaintext highlighter-rouge">pkill</code>, which
GDB will catch (except on Windows):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>:!pkill myprogram
</code></pre></div></div>

<p>I suspect many people don’t know this, but if you’re on Windows and
<a href="/blog/2021/03/11/">developing a graphical application</a>, you can <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registerhotkey">press F12</a> in the
debuggee’s window to immediately break the program in the attached
debugger. This is a general platform feature and works with any native
debugger. I’ve been using it quite a lot.</p>

<p>On that note, you can run commands from GDB with <code class="language-plaintext highlighter-rouge">!</code>, which is another way
to avoid having an extra terminal window around:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; !git diff
</code></pre></div></div>

<p>In any case, GDB will re-read the binary on the next <code class="language-plaintext highlighter-rouge">run</code> and update
breakpoints, so it’s mostly seamless. If there’s a function I want to
debug, I set a breakpoint on it, then run.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; b somefunc
gdb&gt; r
</code></pre></div></div>

<p>Alternatively I’ll use a line number, which I read from Vim. Though GDB,
not being involved in the editing process, cannot track how that line
moves between builds.</p>

<p>An empty command repeats the last command, so once I’m at a breakpoint,
I’ll type <code class="language-plaintext highlighter-rouge">next</code> (<code class="language-plaintext highlighter-rouge">n</code>) — or <code class="language-plaintext highlighter-rouge">step</code> (<code class="language-plaintext highlighter-rouge">s</code>) to enter function calls — then
press enter each time I want to advance a line, often with my eye on the
context in Vim in the other window:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; n
gdb&gt;
gdb&gt;
</code></pre></div></div>

<p>(<del>I wish GDB could print a source listing around the breakpoint as
context, like Delve, but no such feature exists. The woeful <code class="language-plaintext highlighter-rouge">list</code> command
is inadequate.</del> <strong>Update</strong>: GDB’s TUI is a reasonable compromise for GUI
applications or terminal applications running under a separate tty/console
with either <code class="language-plaintext highlighter-rouge">tty</code> or <code class="language-plaintext highlighter-rouge">set new-console</code>. I can access it everywhere since
w64devkit now supports GDB TUI.)</p>

<p>If I want to advance to the next breakpoint, I use <code class="language-plaintext highlighter-rouge">continue</code> (<code class="language-plaintext highlighter-rouge">c</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; c
</code></pre></div></div>

<p>If I’m walking through a loop, I want to see how variables change, but
it’s tedious to keep <code class="language-plaintext highlighter-rouge">print</code>ing (<code class="language-plaintext highlighter-rouge">p</code>) the same variables again and again.
So I use <code class="language-plaintext highlighter-rouge">display</code> (<code class="language-plaintext highlighter-rouge">disp</code>) to display an expression with each prompt,
much like the “watch” window in Visual Studio. For example, if my loop
variable is <code class="language-plaintext highlighter-rouge">i</code> over some string <code class="language-plaintext highlighter-rouge">str</code>, this will show me the current
character in character format (<code class="language-plaintext highlighter-rouge">/c</code>).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; disp/c str[i]
</code></pre></div></div>

<p>You can accumulate multiple expressions. Use <code class="language-plaintext highlighter-rouge">undisplay</code> to remove them.</p>

<p>Too many breakpoints? Use <code class="language-plaintext highlighter-rouge">info breakpoints</code> (<code class="language-plaintext highlighter-rouge">i b</code>) to list them, then
<code class="language-plaintext highlighter-rouge">delete</code> (<code class="language-plaintext highlighter-rouge">d</code>) the unwanted ones by ID.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; i b
gdb&gt; d 3 5 8
</code></pre></div></div>

<p>GDB has many more features than this, but 10 commands cover 99% of use
cases: <code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">n</code>, <code class="language-plaintext highlighter-rouge">s</code>, <code class="language-plaintext highlighter-rouge">disp</code>, <code class="language-plaintext highlighter-rouge">k</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">i</code>, <code class="language-plaintext highlighter-rouge">d</code>, <code class="language-plaintext highlighter-rouge">p</code>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A flexible, lightweight, spin-lock barrier</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2022/03/13/"/>
    <id>urn:uuid:5a72d27a-60f4-4b52-a4c2-f1c3b72e6c85</id>
    <updated>2022-03-13T23:55:08Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="x86"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=30671979">on Hacker News</a>.</em></p>

<p>The other day I wanted to try the famous <a href="https://preshing.com/20120515/memory-reordering-caught-in-the-act/">memory reordering experiment</a>
for myself. It’s the double-slit experiment of concurrency, where a
program can observe an <a href="https://research.swtch.com/hwmm">“impossible” result</a> on common hardware, as
though a thread had time-traveled. While getting thread timing as tight as
possible, I designed a possibly-novel thread barrier. It’s purely
spin-locked, the entire footprint is a zero-initialized integer, it
automatically resets, it can be used across processes, and the entire
implementation is just three to four lines of code.</p>

<!--more-->

<p>Here’s the entire barrier implementation for two threads in C11.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Spin-lock barrier for two threads. Initialize *barrier to zero.</span>
<span class="kt">void</span> <span class="nf">barrier_wait</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">v</span> <span class="o">=</span> <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;=</span> <span class="mi">2</span><span class="p">;</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="mi">2</span><span class="p">)</span> <span class="o">==</span> <span class="n">v</span><span class="p">;);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Or in Go:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">BarrierWait</span><span class="p">(</span><span class="n">barrier</span> <span class="o">*</span><span class="kt">uint32</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">v</span> <span class="o">:=</span> <span class="n">atomic</span><span class="o">.</span><span class="n">AddUint32</span><span class="p">(</span><span class="n">barrier</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">v</span><span class="o">&amp;</span><span class="m">1</span> <span class="o">==</span> <span class="m">1</span> <span class="p">{</span>
        <span class="n">v</span> <span class="o">&amp;=</span> <span class="m">2</span>
        <span class="k">for</span> <span class="n">atomic</span><span class="o">.</span><span class="n">LoadUint32</span><span class="p">(</span><span class="n">barrier</span><span class="p">)</span><span class="o">&amp;</span><span class="m">2</span> <span class="o">==</span> <span class="n">v</span> <span class="p">{</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Even more, these two implementations are compatible with each other. C
threads and Go goroutines can synchronize on a common barrier using these
functions. Also note how it only uses two bits.</p>
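
<p>To illustrate the roles of those two bits, here is a single-threaded
sketch that only inspects counter values and does no actual waiting. The
helper names are hypothetical and not part of the barrier itself:</p>

```c
#include <stdint.h>

// Bit 0: set while exactly one of the two threads has arrived.
int barrier_waiting(uint32_t v)
{
    return v & 1;
}

// Bit 1: flips each time the second thread arrives at the barrier.
int barrier_phase_bit(uint32_t v)
{
    return (v >> 1) & 1;
}

// A waiter that observed the odd value "seen" on arrival may leave once
// bit 1 of the current counter value "now" differs from bit 1 of "seen".
// This mirrors the spin condition in barrier_wait.
int barrier_released(uint32_t seen, uint32_t now)
{
    return (now & 2) != (seen & 2);
}
```

<p>For example, a waiter that saw 1 is held at 1 and released at 2, and it is
still released at 3 if the other thread has already raced ahead to the next
barrier. Since only the low two bits are examined, the counter wrapping
around from <code class="language-plaintext highlighter-rouge">0xffffffff</code> to 0 does not disturb this check.</p>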

<p>When I was done with my experiment, I did a quick search online for other
spin-lock barriers to see if anyone came up with the same idea. I found a
couple of <a href="https://web.archive.org/web/20151109230817/https://stackoverflow.com/questions/33598686/spinning-thread-barrier-using-atomic-builtins">subtly-incorrect</a> spin-lock barriers, and some
straightforward barrier constructions using a mutex spin-lock.</p>

<p>Before diving into how this works, and how to generalize it, let’s discuss
the circumstances that led to its design.</p>

<h3 id="experiment">Experiment</h3>

<p>Here’s the setup for the memory reordering experiment, where <code class="language-plaintext highlighter-rouge">w0</code> and <code class="language-plaintext highlighter-rouge">w1</code>
are initialized to zero.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>thread#1    thread#2
w0 = 1      w1 = 1
r1 = w1     r0 = w0
</code></pre></div></div>

<p>Considering all the possible orderings, it would seem that at least one of
<code class="language-plaintext highlighter-rouge">r0</code> or <code class="language-plaintext highlighter-rouge">r1</code> is 1. There seems to be no ordering where <code class="language-plaintext highlighter-rouge">r0</code> and <code class="language-plaintext highlighter-rouge">r1</code> could
both be 0. However, if raced precisely, this is a frequent or possibly
even majority occurrence on common hardware, including x86 and ARM.</p>
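
<p>The claim about orderings can be checked by brute force. This is a minimal
sketch that assumes sequential consistency, meaning each of the four
operations takes effect all at once in some global order that respects each
thread’s program order. The function name <code class="language-plaintext highlighter-rouge">count_both_zero</code> is made up for
illustration:</p>

```c
// Counts interleavings of the four operations that leave r0 == 0 and
// r1 == 0. Stores the number of interleavings tried in *total.
int count_both_zero(int *total)
{
    int both_zero = 0;
    *total = 0;
    // Each of the 4 time slots goes to thread#1 (bit set) or thread#2
    // (bit clear). Only masks with exactly two bits set are valid.
    for (int mask = 0; mask < 16; mask++) {
        int bits = 0;
        for (int s = 0; s < 4; s++) {
            bits += (mask >> s) & 1;
        }
        if (bits != 2) {
            continue;
        }

        int w0 = 0, w1 = 0, r0 = -1, r1 = -1;
        int step1 = 0, step2 = 0;  // program order within each thread
        for (int s = 0; s < 4; s++) {
            if ((mask >> s) & 1) {
                if (step1++ == 0) w0 = 1; else r1 = w1;  // thread#1
            } else {
                if (step2++ == 0) w1 = 1; else r0 = w0;  // thread#2
            }
        }
        both_zero += !r0 && !r1;
        ++*total;
    }
    return both_zero;
}
```

<p>It tries all six interleavings and finds none where both reads return
zero. The surprise in the experiment is that real hardware does not promise
this model for plain loads and stores.</p>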

<p>How to go about running this experiment? These are concurrent loads and
stores, so it’s tempting to use <code class="language-plaintext highlighter-rouge">volatile</code> for <code class="language-plaintext highlighter-rouge">w0</code> and <code class="language-plaintext highlighter-rouge">w1</code>. However,
this would constitute a data race — undefined behavior in at least C and
C++ — and so we couldn’t really reason much about the results, at least
not without first verifying the compiler’s assembly. These are variables
in a high-level language, not architecture-level stores/loads, even with
<code class="language-plaintext highlighter-rouge">volatile</code>.</p>

<p>So my first idea was to use a bit of inline assembly for all accesses that
would otherwise be data races. x86-64:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">experiment</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">w0</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">r1</span><span class="p">;</span>
    <span class="kr">__asm</span> <span class="k">volatile</span> <span class="p">(</span>
        <span class="s">"movl  $1, %1</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"movl  %2, %0</span><span class="se">\n</span><span class="s">"</span>
        <span class="o">:</span> <span class="s">"=r"</span><span class="p">(</span><span class="n">r1</span><span class="p">),</span> <span class="s">"=m"</span><span class="p">(</span><span class="o">*</span><span class="n">w0</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"m"</span><span class="p">(</span><span class="o">*</span><span class="n">w1</span><span class="p">)</span>
    <span class="p">);</span>
    <span class="k">return</span> <span class="n">r1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>ARM64 (to try on my Raspberry Pi):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">experiment</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">w0</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">r1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="kr">__asm</span> <span class="k">volatile</span> <span class="p">(</span>
        <span class="s">"str  %w0, %1</span><span class="se">\n</span><span class="s">"</span>
        <span class="s">"ldr  %w0, %2</span><span class="se">\n</span><span class="s">"</span>
        <span class="o">:</span> <span class="s">"+r"</span><span class="p">(</span><span class="n">r1</span><span class="p">),</span> <span class="s">"=m"</span><span class="p">(</span><span class="o">*</span><span class="n">w0</span><span class="p">)</span>
        <span class="o">:</span> <span class="s">"m"</span><span class="p">(</span><span class="o">*</span><span class="n">w1</span><span class="p">)</span>
    <span class="p">);</span>
    <span class="k">return</span> <span class="n">r1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is from the point-of-view of thread#1, but I can swap the arguments
for thread#2. I’m expecting this to be inlined, and encouraging it with
<code class="language-plaintext highlighter-rouge">static</code>.</p>

<p>Alternatively, I could use C11 atomics with a relaxed memory order:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">experiment</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w0</span><span class="p">,</span> <span class="k">_Atomic</span> <span class="kt">int</span> <span class="o">*</span><span class="n">w1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">atomic_store_explicit</span><span class="p">(</span><span class="n">w0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">atomic_load_explicit</span><span class="p">(</span><span class="n">w1</span><span class="p">,</span> <span class="n">memory_order_relaxed</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Since this is a <em>race</em> and I want both threads to run their two experiment
instructions as simultaneously as possible, it would be wise to use some
sort of <em>starting barrier</em>… exactly the purpose of a thread barrier! It
will hold the threads back until they’re both ready.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">w0</span><span class="p">,</span> <span class="n">w1</span><span class="p">,</span> <span class="n">r0</span><span class="p">,</span> <span class="n">r1</span><span class="p">;</span>

<span class="c1">// thread#1                   // thread#2</span>
<span class="n">w0</span> <span class="o">=</span> <span class="n">w1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">BARRIER</span><span class="p">;</span>                      <span class="n">BARRIER</span><span class="p">;</span>
<span class="n">r1</span> <span class="o">=</span> <span class="n">experiment</span><span class="p">(</span><span class="o">&amp;</span><span class="n">w0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">w1</span><span class="p">);</span>    <span class="n">r0</span> <span class="o">=</span> <span class="n">experiment</span><span class="p">(</span><span class="o">&amp;</span><span class="n">w1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">w0</span><span class="p">);</span>
<span class="n">BARRIER</span><span class="p">;</span>                      <span class="n">BARRIER</span><span class="p">;</span>

<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">r0</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">r1</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">puts</span><span class="p">(</span><span class="s">"impossible!"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The second thread goes straight into the barrier, but the first thread
does a little more work to initialize the experiment and a little more at
the end to check the result. The second barrier ensures they’re both done
before checking.</p>

<p>Running this only once isn’t so useful, so each thread loops a few million
times, hence the re-initialization in thread#1. The barriers keep them
in lockstep.</p>

<h3 id="barrier-selection">Barrier selection</h3>

<p>On my first attempt, I made the obvious decision for the barrier: I used
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_wait.html"><code class="language-plaintext highlighter-rouge">pthread_barrier_t</code></a>. I was already using pthreads for spawning the
extra thread, including <a href="/blog/2020/05/15/">on Windows</a>, so this was convenient.</p>

<p>However, my initial results were disappointing. I only observed an
“impossible” result around one in a million trials. With some debugging I
determined that the pthreads barrier was just too damn slow, throwing off
the timing. This was especially true with winpthreads, bundled with
Mingw-w64, which in addition to the per-barrier mutex, grabs a <em>global</em>
lock <em>twice</em> per wait to manage the barrier’s reference counter.</p>

<p>All pthreads implementations I used were quick to yield to the system
scheduler. The first thread to arrive at the barrier would go to sleep,
the second thread would wake it up, and it was rare they’d actually race
on the experiment. This is perfectly reasonable for a pthreads barrier
designed for the general case, but I really needed a <em>spin-lock barrier</em>.
That is, the first thread to arrive spins in a loop until the second
thread arrives, and it never interacts with the scheduler. This happens so
frequently and quickly that it should only spin for a few iterations.</p>

<h3 id="barrier-design">Barrier design</h3>

<p>Spin locking means atomics. By default, atomics have sequentially
consistent ordering and will provide the necessary synchronization for the
non-atomic experiment variables. Stores (e.g. to <code class="language-plaintext highlighter-rouge">w0</code>, <code class="language-plaintext highlighter-rouge">w1</code>) made before
the barrier will be visible to all other threads upon passing through the
barrier. In other words, the initialization will propagate before either
thread exits the first barrier, and results propagate before either thread
exits the second barrier.</p>

<p>I know statically that there are only two threads, simplifying the
implementation. The plan: When threads arrive, they atomically increment a
shared variable to indicate such. The first to arrive will see an odd
number, telling it to atomically read the variable in a loop until the
other thread changes it to an even number.</p>

<p>At first with just two threads this might seem like a single bit would
suffice. If the bit is set, the other thread hasn’t arrived. If clear,
both threads have arrived.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">broken_wait1</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Or to avoid an extra load, use the result directly:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">broken_wait2</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">++*</span><span class="n">barrier</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Neither of these works correctly, and the other mutex-free barriers I found
all have the same defect. Consider the broader picture: Between atomic
loads in the first thread’s spin-lock loop, suppose the second thread
arrives, passes through the barrier, does its work, hits the next barrier,
and increments the counter. Both threads see an odd counter simultaneously
and deadlock. No good.</p>

<p>To fix this, the wait function must also track the <em>phase</em>. The first
barrier is the first phase, the second barrier is the second phase, etc.
Conveniently <strong>the rest of the integer acts like a phase counter</strong>!
Writing this out more explicitly:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">barrier_wait</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="n">observed</span> <span class="o">=</span> <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="n">thread_count</span> <span class="o">=</span> <span class="n">observed</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">thread_count</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// not last arrival, watch for phase change</span>
        <span class="kt">unsigned</span> <span class="n">init_phase</span> <span class="o">=</span> <span class="n">observed</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
            <span class="kt">unsigned</span> <span class="n">current_phase</span> <span class="o">=</span> <span class="o">*</span><span class="n">barrier</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">;</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">current_phase</span> <span class="o">!=</span> <span class="n">init_phase</span><span class="p">)</span> <span class="p">{</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The key: When the last thread arrives, it overflows the thread counter to
zero and increments the phase counter in one operation.</p>

<p>By the way, I’m using <code class="language-plaintext highlighter-rouge">unsigned</code> since it may eventually overflow, and
even <code class="language-plaintext highlighter-rouge">_Atomic int</code> overflow is undefined for the <code class="language-plaintext highlighter-rouge">++</code> operator. However,
if you use <code class="language-plaintext highlighter-rouge">atomic_fetch_add</code> or C++ <code class="language-plaintext highlighter-rouge">std::atomic</code> then overflow is
defined and you can use <code class="language-plaintext highlighter-rouge">int</code>.</p>
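
<p>As a minimal sketch of that alternative, assuming a signed counter and a
made-up helper name (<code class="language-plaintext highlighter-rouge">counter_inc</code>), the increment can go through
<code class="language-plaintext highlighter-rouge">atomic_fetch_add</code>, which wraps around silently instead of overflowing.
Note that it returns the <em>previous</em> value, and adding one to that in plain
<code class="language-plaintext highlighter-rouge">int</code> arithmetic could itself overflow.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;limits.h&gt;
#include &lt;stdatomic.h&gt;

// Atomically increments *counter and returns the previous value.
// Unlike ++ on an _Atomic int, the addition wraps on overflow.
static int counter_inc(_Atomic int *counter)
{
    return atomic_fetch_add(counter, 1);
}
</code></pre></div></div>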

<p>Threads can never be more than one phase apart by definition, so only one
bit is needed for the phase counter, making this effectively a two-phase,
two-bit barrier. In my final implementation, rather than shift (<code class="language-plaintext highlighter-rouge">&gt;&gt;</code>), I
mask (<code class="language-plaintext highlighter-rouge">&amp;</code>) the phase bit with 2.</p>
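
<p>A rough sketch of that masked, two-bit variant follows. It is not the exact
code from the experiment: the self-check around it (<code class="language-plaintext highlighter-rouge">worker</code>,
<code class="language-plaintext highlighter-rouge">barrier_demo</code>, and the globals) is hypothetical scaffolding, and it
assumes POSIX threads are available.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;

static _Atomic unsigned barrier2;  // bit 0: arrival count, bit 1: phase
static _Atomic int progress[2];    // barriers each thread has reached
static _Atomic int violations;     // times a thread got through early
static int nphases;

// Two-thread barrier masking the phase bit instead of shifting.
static void barrier_wait2(_Atomic unsigned *barrier)
{
    unsigned v = ++*barrier;
    if (v &amp; 1) {
        // first to arrive: spin until the phase bit flips
        for (v &amp;= 2; (*barrier &amp; 2) == v;);
    }
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 1; i &lt;= nphases; i++) {
        progress[id] = i;
        barrier_wait2(&amp;barrier2);
        if (progress[!id] &lt; i) {
            violations++;  // passed before the other thread arrived
        }
    }
    return 0;
}

// Hypothetical self-check: runs two threads through n barriers and
// returns the number of violations observed (zero if the barrier holds).
static int barrier_demo(int n)
{
    pthread_t thr[2];
    int ids[2] = {0, 1};
    barrier2 = 0;
    violations = 0;
    progress[0] = 0;
    progress[1] = 0;
    nphases = n;
    for (int i = 0; i &lt; 2; i++) {
        pthread_create(&amp;thr[i], 0, worker, &amp;ids[i]);
    }
    for (int i = 0; i &lt; 2; i++) {
        pthread_join(thr[i], 0);
    }
    return violations;
}
</code></pre></div></div>

<p>Each thread records its progress before waiting, so a thread that leaves
barrier <code class="language-plaintext highlighter-rouge">i</code> before the other has reached it counts as a violation.</p>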

<p>With this spin-lock barrier, the experiment observes <code class="language-plaintext highlighter-rouge">r0 = r1 = 0</code> in ~10%
of trials on my x86 machines and ~75% of trials on my Raspberry Pi 4.</p>

<h3 id="generalizing-to-more-threads">Generalizing to more threads</h3>

<p>Two threads required two bits. This generalizes to <code class="language-plaintext highlighter-rouge">log2(n)+1</code> bits for
<code class="language-plaintext highlighter-rouge">n</code> threads, where <code class="language-plaintext highlighter-rouge">n</code> is a power of two. You may have already figured out
how to support more threads: spend more bits on the thread counter.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Spin-lock barrier for n threads, where n is a power of two.</span>
<span class="c1">// Initialize *barrier to zero.</span>
<span class="kt">void</span> <span class="nf">barrier_waitn</span><span class="p">(</span><span class="k">_Atomic</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">barrier</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="n">v</span> <span class="o">=</span> <span class="o">++*</span><span class="n">barrier</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;=</span> <span class="n">n</span><span class="p">;</span> <span class="p">(</span><span class="o">*</span><span class="n">barrier</span><span class="o">&amp;</span><span class="n">n</span><span class="p">)</span> <span class="o">==</span> <span class="n">v</span><span class="p">;);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note: <strong>It never makes sense for <code class="language-plaintext highlighter-rouge">n</code> to exceed the logical core count!</strong>
If it does, then at least one thread must not be actively running. The
spin-lock ensures it does not get scheduled promptly, and the barrier will
waste lots of resources doing nothing in the meantime.</p>

<p>If the barrier is used little enough that you won’t overflow the overall
barrier integer — maybe just use a <code class="language-plaintext highlighter-rouge">uint64_t</code> — an implementation could
support arbitrary thread counts with the same principle using modular
division instead of the <code class="language-plaintext highlighter-rouge">&amp;</code> operator. The denominator is ideally a
compile-time constant in order to avoid paying for division in the
spin-lock loop.</p>
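
<p>Here is a sketch of how that might look, under stated assumptions: the
thread count is a compile-time constant (<code class="language-plaintext highlighter-rouge">NTHREADS</code>, chosen arbitrarily as
three), the 64-bit counter never wraps, and POSIX threads are available. The
names and the self-check harness are mine, not from the experiment.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stdint.h&gt;

#define NTHREADS 3    // hypothetical count, not a power of two
#define NPHASES  100

static _Atomic uint64_t barrier;
static _Atomic int arrived[NPHASES];
static _Atomic int violations;

// Spin-lock barrier for exactly NTHREADS threads. Initialize *b to zero.
// Assumes the 64-bit counter never wraps around.
static void barrier_waitmod(_Atomic uint64_t *b)
{
    uint64_t v = ++*b;
    if (v % NTHREADS) {
        // not last arrival: spin until the quotient (the phase) changes
        for (v /= NTHREADS; *b / NTHREADS == v;);
    }
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i &lt; NPHASES; i++) {
        arrived[i]++;
        barrier_waitmod(&amp;barrier);
        if (arrived[i] != NTHREADS) {
            violations++;  // released before everyone arrived
        }
    }
    return 0;
}

// Hypothetical self-check: returns the number of early releases observed.
static int barrier_demo(void)
{
    pthread_t thr[NTHREADS];
    barrier = 0;
    violations = 0;
    for (int i = 0; i &lt; NPHASES; i++) {
        arrived[i] = 0;
    }
    for (int i = 0; i &lt; NTHREADS; i++) {
        pthread_create(&amp;thr[i], 0, worker, 0);
    }
    for (int i = 0; i &lt; NTHREADS; i++) {
        pthread_join(thr[i], 0);
    }
    return violations;
}
</code></pre></div></div>

<p>The remainder plays the role of the thread counter and the quotient plays
the role of the phase counter. Because <code class="language-plaintext highlighter-rouge">NTHREADS</code> is a constant, the compiler
is free to replace the division in the loop with cheaper operations.</p>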

<p>While C11 <code class="language-plaintext highlighter-rouge">_Atomic</code> seems like it would be useful, unsurprisingly it is
not supported by one major, <a href="/blog/2021/12/30/">stubborn</a> implementation. If you’re
using C++11 or later, then go ahead and use <code class="language-plaintext highlighter-rouge">std::atomic&lt;int&gt;</code> since it’s
well-supported. In real, practical C programs, I will continue using dual
implementations: interlocked functions on MSVC, and GCC built-ins (also
supported by Clang) everywhere else.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if __GNUC__
#  define BARRIER_INC(x) __atomic_add_fetch(x, 1, __ATOMIC_SEQ_CST)
#  define BARRIER_GET(x) __atomic_load_n(x, __ATOMIC_SEQ_CST)
#elif _MSC_VER
#  define BARRIER_INC(x) _InterlockedIncrement(x)
#  define BARRIER_GET(x) _InterlockedOr(x, 0)
#endif
</span>
<span class="c1">// Spin-lock barrier for n threads, where n is a power of two.</span>
<span class="c1">// Initialize *barrier to zero.</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">barrier_wait</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">barrier</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">v</span> <span class="o">=</span> <span class="n">BARRIER_INC</span><span class="p">(</span><span class="n">barrier</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">&amp;=</span> <span class="n">n</span><span class="p">;</span> <span class="p">(</span><span class="n">BARRIER_GET</span><span class="p">(</span><span class="n">barrier</span><span class="p">)</span><span class="o">&amp;</span><span class="n">n</span><span class="p">)</span> <span class="o">==</span> <span class="n">v</span><span class="p">;);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This has the nice bonus that the interface does not have the <code class="language-plaintext highlighter-rouge">_Atomic</code>
qualifier, nor <code class="language-plaintext highlighter-rouge">std::atomic</code> template. It’s just a plain old <code class="language-plaintext highlighter-rouge">int</code>, making
the interface simpler and easier to use. It’s something I’ve grown to
appreciate from Go.</p>

<p>If you’d like to try the experiment yourself: <a href="https://gist.github.com/skeeto/c63b9ddf2c599eeca86356325b93f3a7"><code class="language-plaintext highlighter-rouge">reorder.c</code></a>. If
you’d like to see a test of Go and C sharing a thread barrier:
<a href="https://gist.github.com/skeeto/bdb5a0d2aa36b68b6f66ca39989e1444"><code class="language-plaintext highlighter-rouge">coop.go</code></a>.</p>

<p>I’m intentionally not providing the spin-lock barrier as a library. First,
it’s too trivial and small for that, and second, I believe <a href="https://vimeo.com/644068002">context is
everything</a>. Now that you understand the principle, you can whip up
your own, custom-tailored implementation when the situation calls for it,
just as the one in my experiment is hard-coded for exactly two threads.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Some sanity for C and C++ development on Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/12/30/"/>
    <id>urn:uuid:2e417030-915f-4897-99ff-2a0dafd0ac89</id>
    <updated>2021-12-30T23:25:53Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p>A hard reality of C and C++ software development on Windows is that there
has never been a good, native C or C++ standard library implementation for
the platform. A standard library should abstract over the underlying host
facilities in order to ease portable software development. On Windows, C
and C++ are so poorly hooked up to operating system interfaces that most
portable or mostly-portable software (programs which work perfectly
elsewhere) is subtly broken on Windows, particularly outside of the
English-speaking world. The reasons are almost certainly political,
originally motivated by vendor lock-in, rather than technical, which adds insult
to injury. This article is about what’s wrong, how it’s wrong, and some
easy techniques to deal with it in portable software.</p>

<p>There are <a href="/blog/2016/06/13/">multiple C implementations</a>, so how could they all be
bad, even the <a href="/blog/2018/04/13/">early ones</a>? Microsoft’s C runtime has defined how
the standard library should work on the platform, and everyone else
followed along for the sake of compatibility. I’m excluding <a href="https://www.cygwin.com/">Cygwin</a> and
its major fork, <a href="https://www.msys2.org/">MSYS2</a>, even though they inherit none of these flaws. They
change so much that they’re effectively whole new platforms, not truly
“native” to Windows.</p>

<p>In practice, C++ standard libraries are implemented on top of a C standard
library, which is why C++ shares the same problems. CPython dodges these
issues: Though written in C, on Windows it bypasses the broken C standard
library and directly calls the proprietary interfaces. Other language
implementations, such as “gc” Go, simply aren’t built on C at all, and
instead do things correctly in the first place — the behaviors the C
runtimes should have had all along.</p>

<p>If you’re just working on one large project, bypassing the C runtime isn’t
such a big deal, and you’re likely already doing so to access important
platform functionality. You don’t really even need a C runtime. However,
if you write many small programs, <a href="https://github.com/skeeto/scratch">as I do</a>, writing the same
special Windows support for each one ends up being most of the work, and
honestly makes properly supporting Windows not worth the trouble. I end up
just accepting the broken defaults most of the time.</p>

<p>Before diving into the details, if you’re looking for a quick-and-easy
solution for the Mingw-w64 toolchain, <a href="/blog/2020/05/15/">including w64devkit</a>, which
magically makes your C and C++ console programs behave well on Windows,
I’ve put together a “library” named <strong><a href="https://github.com/skeeto/scratch/tree/master/libwinsane">libwinsane</a></strong>. It solves all
problems discussed in this article, except for one. No source changes
required, simply link it into your program.</p>

<h3 id="what-exactly-is-broken">What exactly is broken?</h3>

<p>The Windows API comes in two flavors: narrow with an “A” (“ANSI”) suffix,
and wide (Unicode, UTF-16) with a “W” suffix. The former is the legacy
API, where an active <em>code page</em> maps 256 bytes onto (up to) 256 specific
characters. On typical machines configured for European languages, this
means <a href="https://en.wikipedia.org/wiki/Windows-1252">code page 1252</a>. <a href="http://simonsapin.github.io/wtf-8/">Roughly speaking</a>, Windows
internally uses UTF-16, and calls through the narrow interface use the
active code page to translate the narrow strings to wide strings. The
result is that calls through the narrow API have limited access to the
system.</p>

<p>The UTF-8 encoding was invented in 1992 and standardized by January 1993.
UTF-8 was adopted by the unix world over the following years due to <a href="/blog/2017/10/06/#what-is-utf-8">its
backwards-compatibility</a> with its existing interfaces. Programs
could read and write Unicode data, access Unicode paths, pass Unicode
arguments, and get and set Unicode environment variables without needing
to change anything. Today UTF-8 has become the dominant text encoding
format in the world, in large part due to the world wide web.</p>

<p>In July 1993, Microsoft introduced the wide Windows API with the release
of Windows NT 3.1, placing all their bets on UCS-2 (later UTF-16) rather
than UTF-8. This turned out to be a mistake, since <a href="http://utf8everywhere.org/">UTF-16 is inferior to
UTF-8 in practically every way</a>, though admittedly some problems
weren’t so obvious at the time.</p>

<p>The major problem: <strong>The C and C++ standard libraries only hook up to the
narrow Windows interfaces</strong>. The standard library, and therefore typical
portable software on Windows, cannot handle anything but ASCII. The
effective result is that these programs:</p>

<ul>
  <li>Cannot accept non-ASCII arguments</li>
  <li>Cannot get/set non-ASCII environment variables</li>
  <li>Cannot access non-ASCII paths</li>
  <li>Cannot read and write non-ASCII on a console</li>
</ul>

<p>Doing any of these requires calling proprietary functions, treating
Windows as a special target. It’s part of what makes correctly porting
software to Windows so painful.</p>

<p>The sensible solution would have been for the C runtime to speak UTF-8 and
connect to the wide API. Alternatively, the narrow API could have been
changed over to UTF-8, phasing out the old code page concept. In theory
this is what the UTF-8 “code page” is about, though it doesn’t always
work. There would have been compatibility problems with abruptly making
such a change, but until very recently, <em>this wasn’t even an option</em>. Why
couldn’t there be a switch I could flip to get sane behavior that works
like every other platform?</p>

<h3 id="how-to-mostly-fix-unicode-support">How to mostly fix Unicode support</h3>

<p>In 2019, Microsoft introduced a feature to allow programs to <a href="https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page">request
UTF-8 as their active code page on start</a>, along with supporting
UTF-8 on more narrow API functions. This is like the magic switch I
wanted, except that it involves embedding some ugly XML into your binary
in a particular way. At least it’s now an option.</p>

<p>For Mingw-w64, that means writing a resource file like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;winuser.h&gt;
CREATEPROCESS_MANIFEST_RESOURCE_ID RT_MANIFEST "utf8.xml"
</code></pre></div></div>

<p>Compiling it with <code class="language-plaintext highlighter-rouge">windres</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ windres -o manifest.o manifest.rc
</code></pre></div></div>

<p>Then linking that into your program. Amazingly it mostly works! Programs
can access Unicode arguments, Unicode environment variables, and Unicode
paths, including with <code class="language-plaintext highlighter-rouge">fopen</code>, just as it’s worked on other platforms for
decades. Since the active code page is set at load time, it happens before
<code class="language-plaintext highlighter-rouge">argv</code> is constructed (from <code class="language-plaintext highlighter-rouge">GetCommandLineA</code>), which is why that works
out.</p>

<p>Alternatively you could create a “side-by-side assembly” placing that XML
in a file with the same name as your EXE but with <code class="language-plaintext highlighter-rouge">.manifest</code> suffix
(after the <code class="language-plaintext highlighter-rouge">.exe</code> suffix), then placing that next to your EXE. Just be
mindful that there’s a “side-by-side” cache (WinSxS), and so it might not
immediately pick up your changes.</p>

<p>What <em>doesn’t</em> work is console input and output since the console is
external to the process, and so isn’t covered by the process’s active code
page. It must be configured separately using a proprietary call:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SetConsoleOutputCP</span><span class="p">(</span><span class="n">CP_UTF8</span><span class="p">);</span>
</code></pre></div></div>

<p>Annoying, but at least it’s not <em>that</em> painful. This only covers output,
though, meaning programs can only print UTF-8. Unfortunately <a href="https://github.com/microsoft/terminal/issues/4551#issuecomment-585487802">UTF-8 input
still doesn’t work</a>, and setting the input code page doesn’t do
anything despite reporting success:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SetConsoleCP</span><span class="p">(</span><span class="n">CP_UTF8</span><span class="p">);</span>  <span class="c1">// doesn't work</span>
</code></pre></div></div>

<p>If you care about reading interactive Unicode input, you’re <a href="/blog/2020/05/04/">stuck
bypassing the C runtime</a> since it’s still broken.</p>

<h3 id="text-stream-translation">Text stream translation</h3>

<p>Another long-standing issue is that C and C++ on Windows have distinct
“text” and “binary” streams, which they inherited from DOS. Mainly this
means automatic newline conversion between CRLF and LF. The C standard
explicitly allows for this, though unix-like platforms have never actually
distinguished between text and binary streams.</p>

<p>The standard also specifies that standard input, output, and error are all
opened as text streams, and there’s no portable method to change the stream
mode to binary — a serious deficiency with the standard. On unix-likes
this doesn’t matter, but on Windows it means programs can’t read or write
binary data on standard streams without calling a non-standard function.
It also means reading and writing standard streams is slow, <a href="/blog/2021/12/04/">frequently a
bottleneck</a> unless I route around it.</p>

<p>Personally, I like <a href="/blog/2020/06/29/">writing binary data to standard output</a>,
<a href="/blog/2020/11/24/">including video</a>, and sometimes <a href="/blog/2017/07/02/">binary filters</a> that also read
binary input. I do it so often that in probably half my C programs I have
this snippet in <code class="language-plaintext highlighter-rouge">main</code> just so they work correctly on Windows:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="cp">#ifdef _WIN32
</span>    <span class="kt">int</span> <span class="nf">_setmode</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">);</span>
    <span class="n">_setmode</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mh">0x8000</span><span class="p">);</span>
    <span class="n">_setmode</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mh">0x8000</span><span class="p">);</span>
    <span class="cp">#endif
</span></code></pre></div></div>

<p>That incantation sets standard input and output in the C runtime to binary
mode without the need to include a header, making it compact, simple, and
self-contained.</p>

<p>This built-in newline translation, along with the Windows standard text
editor, Notepad, <a href="https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/">lagging decades behind</a>, meant that many other
programs, including Git, grew their own, annoying, newline conversion
<a href="https://github.com/skeeto/w64devkit/issues/10">misfeatures</a> that cause <a href="https://github.com/skeeto/binitools/commit/2efd690c3983856c9633b0be66d57483491d1e10">other problems</a>.</p>

<h3 id="libwinsane">libwinsane</h3>

<p>I introduced libwinsane at the beginning of the article, which fixes all
this simply by being linked into a program. It includes the magic XML
manifest <code class="language-plaintext highlighter-rouge">.rsrc</code> section, configures the console for UTF-8 output, and
sets standard streams to binary before <code class="language-plaintext highlighter-rouge">main</code> (via a GCC constructor). I
called it a “library”, but it’s actually a single object file. It can’t be
a static library since it must be linked into the program despite not
actually being referenced by the program.</p>
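
<p>To illustrate the mechanism, here is a minimal sketch of such a
constructor. It is not libwinsane’s actual source: the function name and the
<code class="language-plaintext highlighter-rouge">configured</code> flag are made up for illustration, and the Windows branch
just combines the calls discussed above, using <code class="language-plaintext highlighter-rouge">_O_BINARY</code> from
<code class="language-plaintext highlighter-rouge">&lt;fcntl.h&gt;</code> in place of the literal <code class="language-plaintext highlighter-rouge">0x8000</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ifdef _WIN32
#  include &lt;windows.h&gt;
#  include &lt;fcntl.h&gt;
#  include &lt;io.h&gt;
#endif

static int configured;  // hypothetical flag, only for illustration

// GCC/Clang extension: called automatically before main, even though
// nothing in the program references it.
__attribute__((constructor))
static void configure(void)
{
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);  // console output is UTF-8
    _setmode(0, _O_BINARY);       // standard input to binary
    _setmode(1, _O_BINARY);       // standard output to binary
#endif
    configured = 1;
}
</code></pre></div></div>

<p>By the time <code class="language-plaintext highlighter-rouge">main</code> starts, the constructor has already run. On other
platforms the Windows branch compiles away and only the flag is set.</p>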

<p>So normally this program:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">arg</span> <span class="o">=</span> <span class="n">argv</span><span class="p">[</span><span class="n">argc</span><span class="o">-</span><span class="mi">1</span><span class="p">];</span>
    <span class="kt">size_t</span> <span class="n">len</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">arg</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%zu %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="n">arg</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Compiled and run:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C:\&gt;cc -o example example.c
C:\&gt;example π
1 p
</code></pre></div></div>

<p>As usual, the Unicode argument is silently mangled into one byte. Linked
with libwinsane, it just works like everywhere else:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C:\&gt;gcc -o example example.c libwinsane.o
C:\&gt;example π
2 π
</code></pre></div></div>

<p>If you’re maintaining a substantial program, you probably want to copy and
integrate the necessary parts of libwinsane into your project and build,
rather than always link against this loose object file. This is more for
convenience and for succinctly capturing the concept. You may even want to
<a href="https://github.com/skeeto/hastyhex/blob/f03b6e0f/hastyhex.c#L298-L309">enable ANSI escape processing</a> in your version.</p>

<p><strong>Update December 2024</strong>: Pavel Galkin <a href="https://lists.sr.ht/~skeeto/public-inbox/%3Cdf749edc-0413-4735-9cf2-c77db202cc6e@app.fastmail.com%3E">demonstrates how <code class="language-plaintext highlighter-rouge">libwinsane.o</code>
changes the console state</a>, which affects all processes associated
with the terminal. This is mostly unavoidable, and it’s one reason I’ve
since concluded that UTF-8 manifests are a poor solution. Better to <a href="/blog/2023/01/18/">solve
the problem using a platform layer</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>More DLL fun with w64devkit: Go, assembly, and Python</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/06/29/"/>
    <id>urn:uuid:b2c53451-b12a-4f1a-a475-6c81096c9b5a</id>
    <updated>2021-06-29T21:50:30Z</updated>
    <category term="c"/><category term="cpp"/><category term="go"/><category term="win32"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p>My previous article explained <a href="/blog/2021/05/31/">how to work with dynamic-link libraries
(DLLs) using w64devkit</a>. These techniques also apply to other
circumstances, including with languages and ecosystems outside of C and
C++. In particular, <a href="/blog/2020/05/15/">w64devkit</a> is a great complement to Go and reliably
fulfills all the needs of <a href="https://golang.org/cmd/cgo/">cgo</a> (Go’s C interop) and can even
bootstrap Go itself. As before, this article is in large part an exercise
in capturing practical information I’ve picked up over time.</p>

<h3 id="go-bootstrap-and-cgo">Go: bootstrap and cgo</h3>

<p>The primary Go implementation, confusingly <a href="https://golang.org/doc/faq#What_compiler_technology_is_used_to_build_the_compilers">named “gc”</a>, is an
<a href="/blog/2020/01/21/">incredible piece of software engineering</a>. This is apparent when
building the Go toolchain itself, a process that is fast, reliable, easy,
and simple. It was originally written in C, but was re-written in Go
starting with Go 1.5. The C compiler in w64devkit can build the original C
implementation which then can be used to bootstrap any more recent
version. It’s so easy that I personally never use official binary releases
and always bootstrap from source.</p>

<p>You will need the Go 1.4 source, <a href="https://dl.google.com/go/go1.4-bootstrap-20171003.tar.gz">go1.4-bootstrap-20171003.tar.gz</a>.
This “bootstrap” tarball is the last Go 1.4 release plus a few additional
bugfixes. You will also need the source of the actual version of Go you
want to use, such as Go 1.16.5 (latest version as of this writing).</p>

<p>Start by building Go 1.4 using w64devkit. On Windows, Go is built using a
batch script and no special build system is needed. Since it shouldn’t be
invoked with the BusyBox ash shell, I use <a href="/blog/2021/02/08/"><code class="language-plaintext highlighter-rouge">cmd.exe</code></a> explicitly.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar xf go1.4-bootstrap-20171003.tar.gz
$ mv go/ bootstrap
$ (cd bootstrap/src/ &amp;&amp; cmd /c make)
</code></pre></div></div>

<p>In about 30 seconds you’ll have a fully-working Go 1.4 toolchain. Next use
it to build the desired toolchain. You can move this new toolchain after
it’s built if necessary.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export GOROOT_BOOTSTRAP="$PWD/bootstrap"
$ tar xf go1.16.5.src.tar.gz
$ (cd go/src/ &amp;&amp; cmd /c make)
</code></pre></div></div>

<p>At this point you can delete the bootstrap toolchain. You probably also
want to put Go on your PATH.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm -rf bootstrap/
$ printf 'PATH="$PATH;%s/go/bin"\n' "$PWD" &gt;&gt;~/.profile
$ source ~/.profile
</code></pre></div></div>

<p>Not only is Go now available, so is the full power of cgo. (Including <a href="https://dave.cheney.net/2016/01/18/cgo-is-not-go">its
costs</a> if used.)</p>

<h3 id="vim-suggestions">Vim suggestions</h3>

<p>Since w64devkit is oriented so much around Vim, here’s my personal Vim
configuration for Go. I don’t need or want fancy plugins, just access to
<code class="language-plaintext highlighter-rouge">goimports</code> and a couple of corrections to Vim’s built-in Go support (<code class="language-plaintext highlighter-rouge">[[</code>
and <code class="language-plaintext highlighter-rouge">]]</code> navigation). The included <code class="language-plaintext highlighter-rouge">ctags</code> understands Go, so tags
navigation works the same as it does with C. <code class="language-plaintext highlighter-rouge">\i</code> saves the current
buffer, runs <code class="language-plaintext highlighter-rouge">goimports</code>, and populates the quickfix list with any errors.
Similarly <code class="language-plaintext highlighter-rouge">:make</code> invokes <code class="language-plaintext highlighter-rouge">go build</code> and, as expected, populates the
quickfix list.</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code>autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="k">setlocal</span> <span class="nb">makeprg</span><span class="p">=</span><span class="k">go</span>\ build
autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="nb">map</span> <span class="p">&lt;</span><span class="k">silent</span><span class="p">&gt;</span> <span class="p">&lt;</span><span class="k">buffer</span><span class="p">&gt;</span> <span class="p">&lt;</span>leader<span class="p">&gt;</span><span class="k">i</span>
<span class="se">    \</span> <span class="p">:</span><span class="k">update</span> \<span class="p">|</span>
<span class="se">    \</span> <span class="p">:</span><span class="k">cexpr</span> <span class="nb">system</span><span class="p">(</span><span class="s2">"goimports -w "</span> <span class="p">.</span> <span class="nb">expand</span><span class="p">(</span><span class="s2">"%"</span><span class="p">))</span> \<span class="p">|</span>
<span class="se">    \</span> <span class="p">:</span><span class="k">silent</span> <span class="k">edit</span><span class="p">&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="nb">map</span> <span class="p">&lt;</span><span class="k">buffer</span><span class="p">&gt;</span> <span class="p">[[</span>
<span class="se">    \</span> ?^\<span class="p">(</span>func\\<span class="p">|</span>var\\<span class="p">|</span><span class="nb">type</span>\\<span class="p">|</span><span class="k">import</span>\\<span class="p">|</span>package\<span class="p">)</span>\<span class="p">&gt;&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
autocmd <span class="nb">FileType</span> <span class="k">go</span> <span class="nb">map</span> <span class="p">&lt;</span><span class="k">buffer</span><span class="p">&gt;</span> <span class="p">]]</span>
<span class="se">    \</span> /^\<span class="p">(</span>func\\<span class="p">|</span>var\\<span class="p">|</span><span class="nb">type</span>\\<span class="p">|</span><span class="k">import</span>\\<span class="p">|</span>package\<span class="p">)</span>\<span class="p">&gt;&lt;</span><span class="k">cr</span><span class="p">&gt;</span>
</code></pre></div></div>

<p>Go only comes with <code class="language-plaintext highlighter-rouge">gofmt</code> but <code class="language-plaintext highlighter-rouge">goimports</code> is just one command away, so
there’s little excuse not to have it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go install golang.org/x/tools/cmd/goimports@latest
</code></pre></div></div>

<p>Thanks to GOPROXY, all Go dependencies are accessible without (or before)
installing Git, so this tool installation works with nothing more than
w64devkit and a bootstrapped Go toolchain.</p>

<h3 id="cgo-dlls">cgo DLLs</h3>

<p>The intricacies of cgo are beyond the scope of this article, but the gist
is that a Go source file contains C source in a comment followed by
<code class="language-plaintext highlighter-rouge">import "C"</code>. The imported <code class="language-plaintext highlighter-rouge">C</code> object provides access to C types and
functions. Go functions marked with an <code class="language-plaintext highlighter-rouge">//export</code> comment, as well as the
commented C code, are accessible to C. Because exported Go functions are
callable from C, we can use Go to implement a C interface in a DLL, and the
caller will have no idea they’re actually talking to Go.</p>

<p>To illustrate, here’s a little C interface. To keep it simple, I’ve
specifically sidestepped some more complicated issues, particularly
involving memory management.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Which DLL am I running?</span>
<span class="kt">int</span> <span class="nf">version</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>

<span class="c1">// Generate 64 bits from a CSPRNG.</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="nf">rand64</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>

<span class="c1">// Compute the Euclidean norm.</span>
<span class="kt">float</span> <span class="nf">dist</span><span class="p">(</span><span class="kt">float</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">);</span>
</code></pre></div></div>

<p>Here’s a C implementation which I’m calling “version 1”.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;math.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;windows.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;ntsecapi.h&gt;</span><span class="cp">
</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">int</span>
<span class="nf">version</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>

<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span>
<span class="nf">rand64</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="n">x</span><span class="p">;</span>
    <span class="n">RtlGenRandom</span><span class="p">(</span><span class="o">&amp;</span><span class="n">x</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">x</span><span class="p">));</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>

<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">float</span>
<span class="nf">dist</span><span class="p">(</span><span class="kt">float</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">sqrtf</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As discussed in the previous article, each function is exported using
<code class="language-plaintext highlighter-rouge">__declspec</code> so that it’s available for import. The build command is the
same as before:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -shared -Os -s -o hello1.dll hello1.c
</code></pre></div></div>

<p>Side note: This could be trivially converted into a C++ implementation
just by adding <code class="language-plaintext highlighter-rouge">extern "C"</code> to each declaration. It disables C++ features
like name mangling, and follows the C ABI so that the C++ functions appear
as C functions. Compiling the C++ DLL is exactly the same.</p>
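
<p>For a source file that has to compile as either C or C++, a common idiom is
to guard the <code class="language-plaintext highlighter-rouge">extern "C"</code> so that only a C++ compiler sees it. The following
is a minimal sketch of that idiom, not part of the DLLs above. To stay
portable it leaves out the <code class="language-plaintext highlighter-rouge">__declspec(dllexport)</code> that a real DLL
build would keep in place:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Sketch: compiles unchanged as C or as C++. */
#ifdef __cplusplus
extern "C" {    /* C++ only: C linkage, so no name mangling */
#endif

/* A real DLL would keep __declspec(dllexport) on this definition. */
int version(void)
{
    return 1;
}

#ifdef __cplusplus
}               /* closes the extern "C" block */
#endif
</code></pre></div></div>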

<p>Suppose we wanted to implement this in Go instead of C. We already have
all the tools needed to do so. Here’s a Go implementation, “version 2”:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="s">"C"</span>
<span class="k">import</span> <span class="p">(</span>
	<span class="s">"crypto/rand"</span>
	<span class="s">"encoding/binary"</span>
	<span class="s">"math"</span>
<span class="p">)</span>

<span class="c">//export version</span>
<span class="k">func</span> <span class="n">version</span><span class="p">()</span> <span class="n">C</span><span class="o">.</span><span class="kt">int</span> <span class="p">{</span>
	<span class="k">return</span> <span class="m">2</span>
<span class="p">}</span>

<span class="c">//export rand64</span>
<span class="k">func</span> <span class="n">rand64</span><span class="p">()</span> <span class="n">C</span><span class="o">.</span><span class="n">ulonglong</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">buf</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">byte</span>
	<span class="n">rand</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
	<span class="n">r</span> <span class="o">:=</span> <span class="n">binary</span><span class="o">.</span><span class="n">LittleEndian</span><span class="o">.</span><span class="n">Uint64</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
	<span class="k">return</span> <span class="n">C</span><span class="o">.</span><span class="n">ulonglong</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="p">}</span>

<span class="c">//export dist</span>
<span class="k">func</span> <span class="n">dist</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="n">C</span><span class="o">.</span><span class="n">float</span><span class="p">)</span> <span class="n">C</span><span class="o">.</span><span class="n">float</span> <span class="p">{</span>
	<span class="k">return</span> <span class="n">C</span><span class="o">.</span><span class="n">float</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">Sqrt</span><span class="p">(</span><span class="kt">float64</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">)))</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note the use of C types for all arguments and return values. The <code class="language-plaintext highlighter-rouge">main</code>
function is required since this is the main package, but it will never be
called. The DLL is built like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go build -buildmode=c-shared -o hello2.dll hello2.go
</code></pre></div></div>

<p>Without the <code class="language-plaintext highlighter-rouge">-o</code> option, the DLL will lack an extension. It still works,
since the extension is mostly just a convention on Windows, but the bare
name may be confusing.</p>

<p>What if we need an import library? This will be required when linking with
the MSVC toolchain. In the previous article we asked Binutils to generate
one using <code class="language-plaintext highlighter-rouge">--out-implib</code>. For Go we have to handle this ourselves via
<code class="language-plaintext highlighter-rouge">gendef</code> and <code class="language-plaintext highlighter-rouge">dlltool</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gendef hello2.dll
$ dlltool -l hello2.lib -d hello2.def
</code></pre></div></div>

<p>The only way anyone upgrading would know version 2 was implemented in Go
is that the DLL is a lot bigger (a few MB vs. a few kB) since it now
contains an entire Go runtime.</p>

<h3 id="nasm-assembly-dll">NASM assembly DLL</h3>

<p>We could also go the other direction and implement the DLL using plain
assembly. It won’t even require linking against a C runtime.</p>

<p>w64devkit includes two assemblers: GAS (Binutils) which is used by GCC,
and NASM which has <a href="https://elronnd.net/writ/2021-02-13_att-asm.html">friendlier syntax</a>. I prefer the latter whenever
possible — exactly why I included NASM in the distribution. So here’s how
I implemented “version 3” in NASM assembly.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">bits</span> <span class="mi">64</span>

<span class="nf">section</span> <span class="nv">.text</span>

<span class="nf">global</span> <span class="nb">Dl</span><span class="nv">lMainCRTStartup</span>
<span class="nf">export</span> <span class="nb">Dl</span><span class="nv">lMainCRTStartup</span>
<span class="nl">DllMainCRTStartup:</span>
	<span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="mi">1</span>
	<span class="nf">ret</span>

<span class="nf">global</span> <span class="nv">version</span>
<span class="nf">export</span> <span class="nv">version</span>
<span class="nl">version:</span>
	<span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="mi">3</span>
	<span class="nf">ret</span>

<span class="nf">global</span> <span class="nv">rand64</span>
<span class="nf">export</span> <span class="nv">rand64</span>
<span class="nl">rand64:</span>
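	<span class="nf">rdrand</span> <span class="nb">rax</span>
	<span class="nf">jnc</span> <span class="nv">rand64</span> <span class="c1">; CF=0: no random value was ready, try again</span>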
	<span class="nf">ret</span>

<span class="nf">global</span> <span class="nb">di</span><span class="nv">st</span>
<span class="nf">export</span> <span class="nb">di</span><span class="nv">st</span>
<span class="nl">dist:</span>
	<span class="nf">mulss</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm0</span>
	<span class="nf">mulss</span>  <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm1</span>
	<span class="nf">addss</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm1</span>
	<span class="nf">sqrtss</span> <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm0</span>
	<span class="nf">ret</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">global</code> directive is common in NASM assembly and causes the named
symbol to have the external linkage needed when linking the DLL. The
<code class="language-plaintext highlighter-rouge">export</code> directive is Windows-specific and is equivalent to <code class="language-plaintext highlighter-rouge">dllexport</code> in
C.</p>

<p>Every DLL must have an entrypoint, usually named <code class="language-plaintext highlighter-rouge">DllMainCRTStartup</code>. The
return value indicates if the DLL successfully loaded. So far this has
been handled automatically by the C implementation, but at this low level
we must define it explicitly.</p>

<p>Here’s how to assemble and link the DLL:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nasm -fwin64 -o hello3.o hello3.s
$ ld -shared -s -o hello3.dll hello3.o
</code></pre></div></div>

<h3 id="call-the-dlls-from-python">Call the DLLs from Python</h3>

<p>Python has a nice, built-in C interop, <code class="language-plaintext highlighter-rouge">ctypes</code>, that allows Python to
call arbitrary C functions in shared libraries, including DLLs, without
writing C to glue it together. To tie this all off, here’s a Python
program that loads all of the DLLs above and invokes each of the
functions:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">ctypes</span>

<span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="n">version</span><span class="p">):</span>
    <span class="n">hello</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">CDLL</span><span class="p">(</span><span class="sa">f</span><span class="s">"./hello</span><span class="si">{</span><span class="n">version</span><span class="si">}</span><span class="s">.dll"</span><span class="p">)</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">version</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_int</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">version</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">()</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">dist</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_float</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">dist</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">(</span><span class="n">ctypes</span><span class="p">.</span><span class="n">c_float</span><span class="p">,</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_float</span><span class="p">)</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">rand64</span><span class="p">.</span><span class="n">restype</span> <span class="o">=</span> <span class="n">ctypes</span><span class="p">.</span><span class="n">c_ulonglong</span>
    <span class="n">hello</span><span class="p">.</span><span class="n">rand64</span><span class="p">.</span><span class="n">argtypes</span> <span class="o">=</span> <span class="p">()</span>
    <span class="k">return</span> <span class="n">hello</span>

<span class="k">for</span> <span class="n">hello</span> <span class="ow">in</span> <span class="n">load</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">load</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span> <span class="n">load</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"version"</span><span class="p">,</span> <span class="n">hello</span><span class="p">.</span><span class="n">version</span><span class="p">())</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"rand   "</span><span class="p">,</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">hello</span><span class="p">.</span><span class="n">rand64</span><span class="p">()</span><span class="si">:</span><span class="mi">016</span><span class="n">x</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"dist   "</span><span class="p">,</span> <span class="n">hello</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
</code></pre></div></div>

<p>After loading the DLL with <code class="language-plaintext highlighter-rouge">CDLL</code> the program defines each function
prototype so that Python knows how to call it. Unfortunately it’s not
possible to build Python with w64devkit, so you’ll also need to install
the standard CPython distribution in order to run it. Here’s the output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python finale.py
version 1
rand    b011ea9bdbde4bdf
dist    5.0
version 2
rand    f7c86ff06ae3d1a2
dist    5.0
version 3
rand    2a35a05b0482c898
dist    5.0
</code></pre></div></div>

<p>That output is the result of four different languages interfacing in one
process: C, Go, x86-64 assembly, and Python. Pretty neat if you ask me!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>How to build and use DLLs on Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/05/31/"/>
    <id>urn:uuid:6b64024a-6945-4bff-8226-33b9357babda</id>
    <updated>2021-05-31T02:13:40Z</updated>
    <category term="win32"/><category term="c"/><category term="cpp"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>I’ve recently been involved with a couple of discussions about Windows’
dynamic linking. One was with <a href="https://begriffs.com/">Joe Nelson</a>, who was considering how to make
<a href="https://github.com/begriffs/libderp">libderp</a> accessible on Windows, and the other was about <a href="/blog/2020/05/15/">w64devkit</a>,
my Mingw-w64 distribution. I use these techniques so infrequently that I
need to figure it all out again each time I need it. Unfortunately there’s
a whole lot of outdated and incorrect information online which gets in the
way every time this happens. While it’s all fresh in my head, I will now
document what I know works.</p>

<p>In this article, all commands and examples are being run in the context of
w64devkit (1.8.0).</p>

<h3 id="mingw-w64">Mingw-w64</h3>

<p>If all you care about is the GNU toolchain then DLLs are straightforward,
working mostly like shared objects on other platforms. To illustrate,
let’s build a “square” library with one “exported” function, <code class="language-plaintext highlighter-rouge">square</code>,
that returns the square of its input (<code class="language-plaintext highlighter-rouge">square.c</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">long</span> <span class="nf">square</span><span class="p">(</span><span class="kt">long</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The header file (<code class="language-plaintext highlighter-rouge">square.h</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifndef SQUARE_H
#define SQUARE_H
</span>
<span class="kt">long</span> <span class="nf">square</span><span class="p">(</span><span class="kt">long</span><span class="p">);</span>

<span class="cp">#endif
</span></code></pre></div></div>

<p>To build a stripped, size-optimized DLL, <code class="language-plaintext highlighter-rouge">square.dll</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -shared -Os -s -o square.dll square.c
</code></pre></div></div>

<p>Now a test program to link against it (<code class="language-plaintext highlighter-rouge">main.c</code>), which “imports” <code class="language-plaintext highlighter-rouge">square</code>
from <code class="language-plaintext highlighter-rouge">square.dll</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">"square.h"</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%ld</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">square</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Linking and testing it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -Os -s main.c square.dll
$ ./a
4
</code></pre></div></div>

<p>It’s that simple. Or more traditionally, using the <code class="language-plaintext highlighter-rouge">-l</code> flag:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -Os -s -L. main.c -lsquare
</code></pre></div></div>

<p>Given <code class="language-plaintext highlighter-rouge">-lxyz</code> GCC will look for <code class="language-plaintext highlighter-rouge">xyz.dll</code> in the library path.</p>

<h4 id="viewing-exported-symbols">Viewing exported symbols</h4>

<p>Given a DLL, printing a list of its exported functions is not so
straightforward. For ELF shared objects there’s <code class="language-plaintext highlighter-rouge">nm -D</code>, but despite what
the internet will tell you, this tool does not support DLLs. <code class="language-plaintext highlighter-rouge">objdump</code>
will print the exports as part of the “private” headers (<code class="language-plaintext highlighter-rouge">-p</code>). A bit of
<code class="language-plaintext highlighter-rouge">awk</code> can cut this down to just a list of exports. Since we’ll need this a
few times, here’s a script, <code class="language-plaintext highlighter-rouge">exports.sh</code>, that composes <code class="language-plaintext highlighter-rouge">objdump</code> and
<code class="language-plaintext highlighter-rouge">awk</code> into the tool I want:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/sh</span>
<span class="nb">set</span> <span class="nt">-e</span>
<span class="nb">printf</span> <span class="s1">'LIBRARY %s\nEXPORTS\n'</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
objdump <span class="nt">-p</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> | <span class="nb">awk</span> <span class="s1">'/^$/{t=0} {if(t)print$NF} /^\[O/{t=1}'</span>
</code></pre></div></div>

<p>Running this on <code class="language-plaintext highlighter-rouge">square.dll</code> above:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./exports.sh square.dll
LIBRARY square.dll
EXPORTS
square
</code></pre></div></div>

<p>This can be helpful when debugging. It also works outside of Windows, such
as on Linux. By the way, the output format is no accident: This is the
<a href="https://sourceware.org/binutils/docs/binutils/def-file-format.html"><code class="language-plaintext highlighter-rouge">.def</code> file format</a> (<a href="https://www.willus.com/mingw/yongweiwu_stdcall.html">also</a>), which will be particularly
useful in a moment.</p>

<p>Mingw-w64 has a <code class="language-plaintext highlighter-rouge">gendef</code> tool to produce the above output, and this tool
is now included in w64devkit. To print the exports to standard output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gendef - square.dll
LIBRARY "square.dll"
EXPORTS
square
</code></pre></div></div>

<p>Alternatively Visual Studio provides <code class="language-plaintext highlighter-rouge">dumpbin</code>. It’s not as concise as
<code class="language-plaintext highlighter-rouge">exports.sh</code> but it’s a lot less verbose than <code class="language-plaintext highlighter-rouge">objdump -p</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ dumpbin /nologo /exports square.dll
...
          1    0 000012B0 square
...
</code></pre></div></div>

<h4 id="mingw-w64-improved">Mingw-w64 (improved)</h4>

<p>You can get by without knowing anything more, which is usually enough for
those looking to support Windows as a secondary platform, even just as a
cross-compilation target. However, with a bit more work we can do better.
Imagine doing the above with a non-trivial program. GCC doesn’t know which
functions are part of the API and which are not. Obviously static
functions should not be exported, but what about non-static functions
visible between translation units (i.e. object files)?</p>

<p>For instance, suppose <code class="language-plaintext highlighter-rouge">square.c</code> also has this function which is not part
of its API but may be called by another translation unit.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">internal_func</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>

<p>Now when I rebuild and list the exports:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./exports.sh square.dll
LIBRARY square.dll
EXPORTS
internal_func
square
</code></pre></div></div>

<p>On the other side, when I build <code class="language-plaintext highlighter-rouge">main.c</code> how does GCC know which functions
are imported from a DLL and which will be found in another translation
unit? GCC makes it work regardless, but it can generate more efficient
code if it knows at compile time (vs. link time).</p>

<p>On Windows both are solved by adding <code class="language-plaintext highlighter-rouge">__declspec</code> notation on both sides.
In <code class="language-plaintext highlighter-rouge">square.c</code> the exports are marked as <code class="language-plaintext highlighter-rouge">dllexport</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span>
<span class="kt">long</span> <span class="nf">square</span><span class="p">(</span><span class="kt">long</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">internal_func</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>

<p>In the header, it’s marked as an import:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">__declspec</span><span class="p">(</span><span class="n">dllimport</span><span class="p">)</span>
<span class="kt">long</span> <span class="nf">square</span><span class="p">(</span><span class="kt">long</span><span class="p">);</span>
</code></pre></div></div>

<p>The mere presence of <code class="language-plaintext highlighter-rouge">dllexport</code> tells the linker to only export those
functions marked as exports, and so <code class="language-plaintext highlighter-rouge">internal_func</code> disappears from the
exports list. Convenient!</p>

<p>On the import side, during compilation of the original program, GCC
assumed <code class="language-plaintext highlighter-rouge">square</code> wasn’t an import and generated a local function call.
When the linker later resolved the symbol to the DLL, it generated a
trampoline to fill in as that local function (like a <a href="https://www.airs.com/blog/archives/41">PLT</a>). With
<code class="language-plaintext highlighter-rouge">dllimport</code>, GCC knows it’s an imported function and so doesn’t go through
a trampoline.</p>
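
<p>The difference between the two call paths can be modeled in plain C. This
is only a portable sketch of the idea: the real import slot and trampoline
are produced by the linker and loader, and every name below is a stand-in
invented for illustration:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Stand-in for the function that lives inside the DLL. */
static long dll_square(long x)
{
    return x * x;
}

/* Stand-in for the import slot that the loader fills in with the
 * function's address when the DLL is loaded. */
static long (*import_slot)(long) = dll_square;

/* Without dllimport: the compiler emits a direct call to what it believes
 * is a local function. The linker supplies this trampoline, which
 * forwards through the import slot. */
static long square_trampoline(long x)
{
    return import_slot(x);
}

long call_without_dllimport(long x)
{
    return square_trampoline(x);  /* direct call, then an indirect jump */
}

/* With dllimport: the compiler knows about the slot and calls through it
 * directly, skipping the trampoline. */
long call_with_dllimport(long x)
{
    return import_slot(x);        /* a single indirect call */
}
</code></pre></div></div>

<p>Both functions return the same result. The second just gets there with one
fewer hop.</p>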

<p>While generally unnecessary for the GNU toolchain, it’s good hygiene to
use <code class="language-plaintext highlighter-rouge">__declspec</code>. It’s also mandatory when using <a href="https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B">MSVC</a>, in case you
care about that as well.</p>
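
<p>A common way to put both annotations in one header is a macro that expands
to <code class="language-plaintext highlighter-rouge">dllexport</code> while the DLL itself is being compiled and to <code class="language-plaintext highlighter-rouge">dllimport</code> for
everyone else. Here’s a minimal sketch with the header and source folded
into one listing. The names <code class="language-plaintext highlighter-rouge">SQUARE_API</code> and <code class="language-plaintext highlighter-rouge">SQUARE_BUILD</code> are placeholders
invented for this example, and the empty fallback outside of Windows is an
assumption made so the same code builds elsewhere:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* square.c would define this before including the header, or the build
 * would pass -DSQUARE_BUILD. Callers leave it undefined. */
#define SQUARE_BUILD

/* ---- square.h ---- */
#ifndef SQUARE_H
#define SQUARE_H

#if defined(_WIN32) &amp;&amp; defined(SQUARE_BUILD)
#  define SQUARE_API __declspec(dllexport)  /* building the DLL */
#elif defined(_WIN32)
#  define SQUARE_API __declspec(dllimport)  /* using the DLL */
#else
#  define SQUARE_API                        /* elsewhere: no annotation */
#endif

SQUARE_API long square(long);

#endif

/* ---- square.c ---- */
SQUARE_API long square(long x)
{
    return x * x;
}
</code></pre></div></div>

<p>With this arrangement <code class="language-plaintext highlighter-rouge">square.c</code> and <code class="language-plaintext highlighter-rouge">main.c</code> include the same <code class="language-plaintext highlighter-rouge">square.h</code>,
and only the DLL build defines <code class="language-plaintext highlighter-rouge">SQUARE_BUILD</code>.</p>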

<h3 id="msvc">MSVC</h3>

<p>Mingw-w64-compiled DLLs will work with <code class="language-plaintext highlighter-rouge">LoadLibrary</code> out of the box, which
is sufficient in many cases, such as for dynamically-loaded plugins. For
example (<code class="language-plaintext highlighter-rouge">loadlib.c</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;windows.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">HMODULE</span> <span class="n">h</span> <span class="o">=</span> <span class="n">LoadLibrary</span><span class="p">(</span><span class="s">"square.dll"</span><span class="p">);</span>
    <span class="kt">long</span> <span class="p">(</span><span class="o">*</span><span class="n">square</span><span class="p">)(</span><span class="kt">long</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="s">"square"</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%ld</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">square</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Compiled with MSVC <code class="language-plaintext highlighter-rouge">cl</code> (via <a href="/blog/2016/06/13/#visual-c"><code class="language-plaintext highlighter-rouge">vcvars.bat</code></a>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /nologo loadlib.c
$ ./loadlib
4
</code></pre></div></div>

<p>However, the MSVC linker, unlike Binutils <code class="language-plaintext highlighter-rouge">ld</code>, cannot link directly with
DLLs. It requires an <em>import library</em>. Conventionally this matches the DLL
name but has a <code class="language-plaintext highlighter-rouge">.lib</code> extension — <code class="language-plaintext highlighter-rouge">square.lib</code> in this case. The Mingw-w64
ecosystem conventionally uses <code class="language-plaintext highlighter-rouge">.dll.a</code>, as in <code class="language-plaintext highlighter-rouge">square.dll.a</code>, in order to
distinguish it from a static library, but it’s the same format. The most
convenient way to get an import library is to ask GCC to generate one at
link-time via <code class="language-plaintext highlighter-rouge">--out-implib</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -shared -Wl,--out-implib,square.lib -o square.dll square.c
</code></pre></div></div>

<p>Back to <code class="language-plaintext highlighter-rouge">cl</code>, just add <code class="language-plaintext highlighter-rouge">square.lib</code> as another input. You don’t actually
need <code class="language-plaintext highlighter-rouge">square.dll</code> present at link time.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /nologo /Os main.c square.lib
$ ./main
4
</code></pre></div></div>

<p>What if you already have the DLL and you just need an import library? GNU
Binutils’ <code class="language-plaintext highlighter-rouge">dlltool</code> can do this, though not without help. It cannot
generate an import library from a DLL alone since it requires a <code class="language-plaintext highlighter-rouge">.def</code>
file enumerating the exports. (Why?) What luck that we have a tool for
this!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./exports.sh square.dll &gt;square.def
$ dlltool --input-def square.def --output-lib square.lib
</code></pre></div></div>
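
<p>If the export list is already known, the <code class="language-plaintext highlighter-rouge">.def</code> file can also be written by
hand. A minimal sketch, assuming the DLL exports only <code class="language-plaintext highlighter-rouge">square</code>; the result
feeds the same <code class="language-plaintext highlighter-rouge">dlltool</code> command as above:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hand-written module definition file: the DLL name, then one export per line
printf 'LIBRARY square.dll\nEXPORTS\n    square\n' &gt;square.def
cat square.def
</code></pre></div></div>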

<h3 id="reversing-directions">Reversing directions</h3>

<p>Going the other way, building a DLL with MSVC and linking it with
Mingw-w64, is nearly as easy as the pure Mingw-w64 case, though it
requires that all exports are tagged with <code class="language-plaintext highlighter-rouge">dllexport</code>. The <code class="language-plaintext highlighter-rouge">/LD</code> option
(case sensitive) is just like GCC’s <code class="language-plaintext highlighter-rouge">-shared</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cl /nologo /LD /Os square.c
$ cc -Os -s main.c square.dll
$ ./a
4
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">cl</code> outputs three files: <code class="language-plaintext highlighter-rouge">square.dll</code>, <code class="language-plaintext highlighter-rouge">square.lib</code>, and <code class="language-plaintext highlighter-rouge">square.exp</code>.
The last can be discarded, and the second will be needed if linking with
MSVC, but as before, Mingw-w64 requires only the first.</p>

<p>This all demonstrates that Mingw-w64 and MSVC are quite interoperable — at
least for C interfaces that <a href="/blog/2023/08/27/">don’t share CRT objects</a>.</p>

<h3 id="tying-it-all-together">Tying it all together</h3>

<p>If your program is designed to be portable, those <code class="language-plaintext highlighter-rouge">__declspec</code> will get in
the way. That can be tidied up with some macros, but even better, those
macros can be used to control ELF symbol visibility so that the library
has good hygiene on, say, Linux as well.</p>

<p>The strategy will be to mark all API functions with <code class="language-plaintext highlighter-rouge">SQUARE_API</code> and
expand that to whatever is necessary at the time. When building a library,
it will expand to <code class="language-plaintext highlighter-rouge">dllexport</code>, or default visibility on unix-likes. When
consuming a library it will expand to <code class="language-plaintext highlighter-rouge">dllimport</code>, or nothing outside of
Windows. The new <code class="language-plaintext highlighter-rouge">square.h</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifndef SQUARE_H
#define SQUARE_H
</span>
<span class="cp">#if defined(SQUARE_BUILD)
#  if defined(_WIN32)
#    define SQUARE_API __declspec(dllexport)
#  elif defined(__ELF__)
#    define SQUARE_API __attribute__ ((visibility ("default")))
#  else
#    define SQUARE_API
#  endif
#else
#  if defined(_WIN32)
#    define SQUARE_API __declspec(dllimport)
#  else
#    define SQUARE_API
#  endif
#endif
</span>
<span class="n">SQUARE_API</span>
<span class="kt">long</span> <span class="nf">square</span><span class="p">(</span><span class="kt">long</span><span class="p">);</span>

<span class="cp">#endif
</span></code></pre></div></div>

<p>The new <code class="language-plaintext highlighter-rouge">square.c</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SQUARE_BUILD
#include</span> <span class="cpf">"square.h"</span><span class="cp">
</span>
<span class="n">SQUARE_API</span>
<span class="kt">long</span> <span class="nf">square</span><span class="p">(</span><span class="kt">long</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">main.c</code> remains the same. When compiling on unix-like systems, add
<code class="language-plaintext highlighter-rouge">-fvisibility=hidden</code> to hide all symbols by default so that only those
tagged with this macro are revealed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -shared -Os -fvisibility=hidden -s -o libsquare.so square.c
$ cc -Os -s main.c ./libsquare.so
$ ./a.out
4
</code></pre></div></div>
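
<p>To check that the hiding actually took effect, list the library’s dynamic
symbols. The following is only a sketch: it assumes an ELF system with a
GNU-style <code class="language-plaintext highlighter-rouge">cc</code> and <code class="language-plaintext highlighter-rouge">nm</code>, and the file names, the <code class="language-plaintext highlighter-rouge">API</code> macro, and the
<code class="language-plaintext highlighter-rouge">helper</code> function are made up for the demonstration:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build a tiny library with one tagged and one untagged function
dir=$(mktemp -d)

cat &gt;"$dir/vis.c" &lt;&lt;'EOF'
#define API __attribute__ ((visibility ("default")))
long helper(long x) { return x * x; }          /* untagged: hidden */
API long square(long x) { return helper(x); }  /* tagged: exported */
EOF

# -fPIC is added here for portability across ELF targets
cc -shared -fPIC -Os -fvisibility=hidden -o "$dir/libvis.so" "$dir/vis.c"

# List defined dynamic symbols: square should appear, helper should not
nm -D --defined-only "$dir/libvis.so"
</code></pre></div></div>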

<h3 id="makefile-ideas">Makefile ideas</h3>

<p>While Mingw-w64 hides a lot of the differences between Windows and
unix-like systems, when it comes to dynamic libraries it can only do so
much, especially if you care about import libraries. If I were maintaining
a dynamic library — unlikely since I strongly prefer embedding or static
linking — I’d probably just use different <a href="/blog/2017/08/20/">Makefiles</a> per toolchain
and target. Aside from the <code class="language-plaintext highlighter-rouge">SQUARE_API</code> type of macros, the source code
can fortunately remain fairly agnostic about it.</p>

<p>Here’s what I might use as <code class="language-plaintext highlighter-rouge">NMakefile</code> for MSVC <code class="language-plaintext highlighter-rouge">nmake</code>:</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CC</span>     <span class="o">=</span> cl /nologo
<span class="nv">CFLAGS</span> <span class="o">=</span> /Os

<span class="nl">all</span><span class="o">:</span> <span class="nf">main.exe square.dll square.lib</span>

<span class="nl">main.exe</span><span class="o">:</span> <span class="nf">main.c square.h square.lib</span>
	<span class="nv">$(CC)</span> <span class="nv">$(CFLAGS)</span> main.c square.lib

<span class="nl">square.dll</span><span class="o">:</span> <span class="nf">square.c square.h</span>
	<span class="nv">$(CC)</span> /LD <span class="nv">$(CFLAGS)</span> square.c

<span class="nl">square.lib</span><span class="o">:</span> <span class="nf">square.dll</span>

<span class="nl">clean</span><span class="o">:</span>
	<span class="p">-</span>del /f main.exe square.dll square.lib square.exp
</code></pre></div></div>

<p>Usage:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nmake /nologo /f NMakefile
</code></pre></div></div>

<p>For w64devkit and cross-compiling, <code class="language-plaintext highlighter-rouge">Makefile.w64</code>, which includes
import library generation for the sake of MSVC consumers:</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CC</span>      <span class="o">=</span> cc
<span class="nv">CFLAGS</span>  <span class="o">=</span> <span class="nt">-Os</span>
<span class="nv">LDFLAGS</span> <span class="o">=</span> <span class="nt">-s</span>
<span class="nv">LDLIBS</span>  <span class="o">=</span>

<span class="nl">all</span><span class="o">:</span> <span class="nf">main.exe square.dll square.lib</span>

<span class="nl">main.exe</span><span class="o">:</span> <span class="nf">main.c square.dll square.h</span>
	<span class="nv">$(CC)</span> <span class="nv">$(CFLAGS)</span> <span class="nv">$(LDFLAGS)</span> <span class="nt">-o</span> <span class="nv">$@</span> main.c square.dll <span class="nv">$(LDLIBS)</span>

<span class="nl">square.dll</span><span class="o">:</span> <span class="nf">square.c square.h</span>
	<span class="nv">$(CC)</span> <span class="nt">-shared</span> <span class="nt">-Wl</span>,--out-implib,<span class="err">$</span><span class="o">(</span>@:dll<span class="o">=</span>lib<span class="o">)</span> <span class="se">\</span>
	    <span class="nv">$(CFLAGS)</span> <span class="nv">$(LDFLAGS)</span> <span class="nt">-o</span> <span class="nv">$@</span> square.c <span class="nv">$(LDLIBS)</span>

<span class="nl">square.lib</span><span class="o">:</span> <span class="nf">square.dll</span>

<span class="nl">clean</span><span class="o">:</span>
	<span class="nb">rm</span> <span class="nt">-f</span> main.exe square.dll square.lib
</code></pre></div></div>

<p>Usage:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make -f Makefile.w64
</code></pre></div></div>

<p>And a <code class="language-plaintext highlighter-rouge">Makefile</code> for everyone else:</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CC</span>      <span class="o">=</span> cc
<span class="nv">CFLAGS</span>  <span class="o">=</span> <span class="nt">-Os</span> <span class="nt">-fvisibility</span><span class="o">=</span>hidden
<span class="nv">LDFLAGS</span> <span class="o">=</span> <span class="nt">-s</span>
<span class="nv">LDLIBS</span>  <span class="o">=</span>

<span class="nl">all</span><span class="o">:</span> <span class="nf">main libsquare.so</span>

<span class="nl">main</span><span class="o">:</span> <span class="nf">main.c libsquare.so square.h</span>
	<span class="nv">$(CC)</span> <span class="nv">$(CFLAGS)</span> <span class="nv">$(LDFLAGS)</span> <span class="nt">-o</span> <span class="nv">$@</span> main.c ./libsquare.so <span class="nv">$(LDLIBS)</span>

<span class="nl">libsquare.so</span><span class="o">:</span> <span class="nf">square.c square.h</span>
	<span class="nv">$(CC)</span> <span class="nt">-shared</span> <span class="nv">$(CFLAGS)</span> <span class="nv">$(LDFLAGS)</span> <span class="nt">-o</span> <span class="nv">$@</span> square.c <span class="nv">$(LDLIBS)</span>

<span class="nl">clean</span><span class="o">:</span>
	<span class="nb">rm</span> <span class="nt">-f</span> main libsquare.so
</code></pre></div></div>

<p>Now that I have this article, I’m glad I won’t have to figure this all out
again next time I need it!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>A guide to Windows application development using w64devkit</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/03/11/"/>
    <id>urn:uuid:b04dbe3d-2e79-4afd-ad20-6ce0b232242e</id>
    <updated>2021-03-11T01:40:31Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p>There’s a trend of building services where a monolithic application is
better suited, or using JavaScript and Python then being stumped by their
troublesome deployment story. This leads to solutions like <a href="https://deftly.net/posts/2017-06-01-measuring-the-weight-of-an-electron.html">bundling an
entire web browser</a> with an application, or using containers to
circumscribe <a href="https://research.swtch.com/deps">a sprawling dependency tree made of mystery meat</a>.</p>

<p>My <a href="/blog/2020/05/15/">small development distribution</a> for Windows, <a href="https://github.com/skeeto/w64devkit">w64devkit</a>,
is my own little way of pushing back against this trend where it affects
me most. Following in the footsteps of projects like <a href="https://handmadehero.org/">Handmade Hero</a>
and <a href="https://www.youtube.com/playlist?list=PLlaINRtydtNWuRfd4Ra3KeD6L9FP_tDE7">Making a Video Game from Scratch</a>, this is my guide to
no-nonsense software development using my development kit. It’s an
overview of the tooling and development workflow, and I’ve tried not to
assume too much knowledge of the reader. Being a guide rather than a manual,
it is incomplete on its own, and I link to substantial external resources
to fill in the gaps. The guide is capped with a small game I wrote
entirely using my development kit, serving as a demonstration of what
sorts of things are not only possible, but quite reasonably attainable.</p>

<!--more-->

<video src="https://nullprogram.s3.amazonaws.com/asteroids/asteroids.mp4" width="600" height="600" controls="">
</video>

<p>Game repository: <a href="https://github.com/skeeto/asteroids-demo">https://github.com/skeeto/asteroids-demo</a><br />
Guide to source: <a href="https://idle.nprescott.com/2021/understanding-asteroids.html">Understanding Asteroids</a></p>

<h3 id="initial-setup">Initial setup</h3>

<p>Of course you cannot use the development kit if you don’t have it yet. Go
to the <a href="https://github.com/skeeto/w64devkit/releases">releases section</a> and download the latest release. It will be
a .zip file named <code class="language-plaintext highlighter-rouge">w64devkit-x.y.z.zip</code> where <code class="language-plaintext highlighter-rouge">x.y.z</code> is the version.</p>

<p>You will need to unzip the development kit before using it. Windows has
built-in support for .zip files, so you can either right-click to access
“Extract All…” or navigate into it as a folder then drag-and-drop the
<code class="language-plaintext highlighter-rouge">w64devkit</code> directory somewhere outside the .zip file. It doesn’t care
where it’s unzipped (aka it’s “portable”), so put it wherever is
convenient: your desktop, user profile directory, a thumb drive, etc. You
can move it later if you change your mind just so long as you’re not
actively running it. If you decide you don’t need it anymore then delete
it.</p>

<h3 id="entering-the-development-environment">Entering the development environment</h3>

<p>There is a <code class="language-plaintext highlighter-rouge">w64devkit.exe</code> in the unzipped <code class="language-plaintext highlighter-rouge">w64devkit</code> directory. This is
the easiest way to enter the development environment, and will not require
system configuration changes. This program puts the kit’s programs in the
<code class="language-plaintext highlighter-rouge">PATH</code> environment variable then runs a Bourne shell — the standard unix
shell. Aside from the text editor, this is the primary interface for
developing software. In time you may even extend this environment with
your own tools.</p>

<p>If you want an additional “terminal” window, run <code class="language-plaintext highlighter-rouge">w64devkit.exe</code> again. If
you use it a lot, you may want to create a shortcut and even pin it to
your task bar.</p>

<p>Whether on Windows or unix-like systems, when you type a command into the
system shell it uses the <code class="language-plaintext highlighter-rouge">PATH</code> environment variable to locate the actual
program to run for that command. In practice, the <code class="language-plaintext highlighter-rouge">PATH</code> variable is a
concatenation of multiple directories, and the shell searches these
directories in order. On unix-like systems, <code class="language-plaintext highlighter-rouge">PATH</code> elements are separated
by colons. However, Windows uses colons to delimit drive letters, so its
<code class="language-plaintext highlighter-rouge">PATH</code> elements are separated by semicolons.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Prepending to PATH on unix</span>
<span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/bin:</span><span class="nv">$PATH</span><span class="s2">"</span>

<span class="c"># Prepending to PATH on Windows (w64devkit)</span>
<span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/bin;</span><span class="nv">$PATH</span><span class="s2">"</span>
</code></pre></div></div>
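
<p>The lookup itself can be sketched in a few lines of shell. The function
name <code class="language-plaintext highlighter-rouge">find_in_path</code> is hypothetical, and this version assumes the unix
separator; under w64devkit the separator would be a semicolon. It also
ignores details a real shell handles, such as empty <code class="language-plaintext highlighter-rouge">PATH</code> elements and
Windows executable extensions like <code class="language-plaintext highlighter-rouge">.exe</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># find_in_path NAME: print the first match for NAME among the PATH
# directories, searching them in order as the shell would
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split PATH on colons (unix convention)
    for dir in $PATH; do
        if [ -f "$dir/$name" ] &amp;&amp; [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                    # not found in any PATH directory
}

find_in_path sh
</code></pre></div></div>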

<p>For more advanced users: Rather than use <code class="language-plaintext highlighter-rouge">w64devkit.exe</code>, you could “Edit
environment variables for your account” and manually add w64devkit’s <code class="language-plaintext highlighter-rouge">bin</code>
directory to your <code class="language-plaintext highlighter-rouge">PATH</code>, making the tools generally available everywhere
on your system. If you’ve gone this route, you can start a Bourne shell at
any time with <code class="language-plaintext highlighter-rouge">sh -l</code>. (The <code class="language-plaintext highlighter-rouge">-l</code> option requests a login shell.)</p>

<p>Also borrowed from the unix world is the concept of a <em>home directory</em>,
specified by the <code class="language-plaintext highlighter-rouge">HOME</code> environment variable. By default this will be your
user profile directory, typically <code class="language-plaintext highlighter-rouge">C:/Users/$USER</code>. Login shells always
start in the home directory. This directory is often indicated by tilde
(<code class="language-plaintext highlighter-rouge">~</code>), and many programs automatically expand a leading tilde to the home
directory.</p>

<h3 id="shell-basics">Shell basics</h3>

<p>The shell is a command interpreter. It’s named such because <a href="https://www.youtube.com/watch?v=tc4ROCJYbm0&amp;t=4m57s">it was
originally a <em>shell</em> around the operating system kernel</a> — the user
interface to the kernel. Your system’s graphical interface — Windows
Explorer, or <code class="language-plaintext highlighter-rouge">Explorer.exe</code> — is really just a kind of shell, too. That
shell is oriented around the mouse and graphics. This is fine for some
tasks, but a keyboard-oriented command shell is far better suited for
development tasks. It’s more efficient, but more importantly its features
are composable: Complex operations and processes can be <a href="https://www.youtube.com/watch?v=bKzonnwoR2I">constructed
from</a> simple, easy-to-understand tools. Embrace it!</p>

<p>In the shell you can navigate between directories with <code class="language-plaintext highlighter-rouge">cd</code>, make
directories with <code class="language-plaintext highlighter-rouge">mkdir</code>, remove files with <code class="language-plaintext highlighter-rouge">rm</code>, run regular expression text
searches with <code class="language-plaintext highlighter-rouge">grep</code>, etc. Run <code class="language-plaintext highlighter-rouge">busybox</code> to see a listing of the available
standard commands. Unfortunately there are no manual pages, but you can
access basic usage information for any command with <code class="language-plaintext highlighter-rouge">busybox CMD --help</code>.</p>

<p>Windows’ standard command shell is <code class="language-plaintext highlighter-rouge">cmd.exe</code>. Unfortunately this shell is
terrible and exists mostly for legacy compatibility. The intended
replacement is PowerShell for users who regularly use a shell. However,
PowerShell is fundamentally broken, does virtually everything incorrectly,
and manages to be even worse than <code class="language-plaintext highlighter-rouge">cmd.exe</code>. Besides, sticking to POSIX
shell conventions significantly improves build portability, and unix tool
knowledge is transferable to basically every other operating system.</p>

<p>Unix’s standard shell was the Bourne shell, <code class="language-plaintext highlighter-rouge">sh</code>. The shells in use today
are Bourne shell clones with a superset of its features. The most popular
interactive shells are Bash and Zsh. On Linux, dash (Debian Almquist
shell) has become popular for non-interactive use (scripting). The shell
included with w64devkit is the BusyBox fork of the Almquist shell (<code class="language-plaintext highlighter-rouge">ash</code>),
closely related to dash. The Almquist shell has almost no non-interactive
features beyond the standard Bourne shell, and so as far as scripts are
concerned can be regarded as a plain Bourne shell clone. That’s why I
typically refer to it by the name <code class="language-plaintext highlighter-rouge">sh</code>.</p>

<p>However, BusyBox’s Almquist shell has interactive features much like Bash,
and Bash users should be quite comfortable. It’s not just tab-completion
but a slew of Emacs-like keybindings:</p>

<ul>
  <li><kbd>Ctrl-r</kbd>: search backwards in history</li>
  <li><kbd>Ctrl-s</kbd>: search forwards in history</li>
  <li><kbd>Ctrl-p</kbd>: previous command (Up)</li>
  <li><kbd>Ctrl-n</kbd>: next command (Down)</li>
  <li><kbd>Ctrl-a</kbd>: cursor to the beginning of line (Home)</li>
  <li><kbd>Ctrl-e</kbd>: cursor to the end of line (End)</li>
  <li><kbd>Alt-b</kbd>: cursor back one word</li>
  <li><kbd>Alt-f</kbd>: cursor forward one word</li>
  <li><kbd>Ctrl-l</kbd>: clear the screen</li>
  <li><kbd>Alt-d</kbd>: delete word after the cursor</li>
  <li><kbd>Ctrl-w</kbd>: delete the word before the cursor</li>
  <li><kbd>Ctrl-k</kbd>: delete to the end of the line</li>
  <li><kbd>Ctrl-u</kbd>: delete to the beginning of the line</li>
  <li><kbd>Ctrl-f</kbd>: cursor forward one character (Right)</li>
  <li><kbd>Ctrl-b</kbd>: cursor backward one character (Left)</li>
  <li><kbd>Ctrl-d</kbd>: delete character under the cursor (Delete)</li>
  <li><kbd>Ctrl-h</kbd>: delete character before the cursor (Backspace)</li>
</ul>

<p>Take special note of Ctrl-r, which is the most important and powerful
shortcut of the bunch. Frequent use is a good habit. Don’t mash the up
arrow to search through the command history.</p>

<p>Special note for Cygwin and MSYS2 users: the shell is aware of Windows
paths and does not present a virtual unix file system scheme. This has
important consequences for scripting, both good and bad. The shell even
supports backslash as a directory separator, though you should of course
prefer forward slashes.</p>

<h4 id="shell-customization">Shell customization</h4>

<p>Login shells (<code class="language-plaintext highlighter-rouge">-l</code>) evaluate the contents of <code class="language-plaintext highlighter-rouge">~/.profile</code> on startup. This
is your chance to customize the shell configuration, such as setting
environment variables or defining aliases and functions. For instance, if
you wanted the prompt to show the working directory in green you’d set
<code class="language-plaintext highlighter-rouge">PS1</code> in your <code class="language-plaintext highlighter-rouge">~/.profile</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">PS1</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span><span class="nb">printf</span> <span class="s1">'\x1b[32;1m\\w\x1b[0m$ '</span><span class="si">)</span><span class="s2">"</span>
</code></pre></div></div>
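
<p>Aliases and functions go in the same file. A small sketch; the names <code class="language-plaintext highlighter-rouge">ll</code>
and <code class="language-plaintext highlighter-rouge">mkcd</code> are made up for illustration:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical additions to ~/.profile
alias ll='ls -l'                # shorthand for a long listing

# mkcd DIR: create DIR (and any missing parents), then change into it
mkcd() {
    mkdir -p "$1" &amp;&amp; cd "$1"
}
</code></pre></div></div>

<p>A function is needed for <code class="language-plaintext highlighter-rouge">mkcd</code> because a script runs in its own process
and cannot change the working directory of the calling shell.</p>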

<p>If you find yourself using the same command sequences or set of options
again and again, you might consider putting those commands into a script,
and then installing that script somewhere on your <code class="language-plaintext highlighter-rouge">PATH</code> so that you can
run it as a new command. First make a directory to hold your scripts, say
in <code class="language-plaintext highlighter-rouge">~/bin</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> ~/bin
</code></pre></div></div>

<p>In <code class="language-plaintext highlighter-rouge">~/.profile</code> prepend it to your <code class="language-plaintext highlighter-rouge">PATH</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/bin;</span><span class="nv">$PATH</span><span class="s2">"</span>
</code></pre></div></div>

<p>If you don’t want to start a fresh shell to try it out, then load the new
configuration in your current shell:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">source</span> ~/.profile
</code></pre></div></div>

<p>Suppose you keep getting the <code class="language-plaintext highlighter-rouge">tar</code> switches mixed up and you’d like to
just have an <code class="language-plaintext highlighter-rouge">untar</code> command that does the right thing. Create a file
named <code class="language-plaintext highlighter-rouge">untar</code> or <code class="language-plaintext highlighter-rouge">untar.sh</code> in <code class="language-plaintext highlighter-rouge">~/bin</code> with these contents:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/sh</span>
<span class="nb">set</span> <span class="nt">-e</span>
<span class="nb">tar</span> <span class="nt">-xaf</span> <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
</code></pre></div></div>

<p>Now a command like <code class="language-plaintext highlighter-rouge">untar something.tar.gz</code> will extract the archive
contents.</p>

<p>To learn more about Bourne shell scripting, the POSIX <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html">shell command
language specification</a> is a good reference. All of the features
listed in that document are available to your shell scripts.</p>

<h3 id="text-editing">Text editing</h3>

<p>The development kit includes the powerful and popular text editor
<a href="https://www.vim.org/">Vim</a>. It takes effort to learn, but is well worth the investment.
It’s packed with features, but since you only need a small number of them
on a regular basis it’s not as daunting as it might appear. Using Vim
effectively, you will write and edit text so much more quickly than
before. That includes not just code, but prose: READMEs, documentation,
etc.</p>

<p>(The catch: Non-modal editing will forever feel frustratingly inefficient.
That’s not because you will become unpracticed at it, or even have trouble
code switching between input styles, but because you’ll now be aware how
bad it is. Ignorance is bliss.)</p>

<p>Vim includes its own tutorial for absolute beginners which you can access
with the <code class="language-plaintext highlighter-rouge">vimtutor</code> command. It will run in the console window and guide
you through the basics in about half an hour. Do not be afraid to return
to the tutorial at any time since this is the stuff you need to know by
heart.</p>

<p>When it comes time to actually use Vim to write code, you can continue
writing code via the terminal interface (<code class="language-plaintext highlighter-rouge">vim</code>), or you can run the
graphical interface (<code class="language-plaintext highlighter-rouge">gvim</code>). The latter is recommended since it has some
nice quality-of-life features, but it’s not strictly necessary. When
starting the GUI, put an ampersand (<code class="language-plaintext highlighter-rouge">&amp;</code>) on the command so that it runs in
the background. For instance this brings up the editor with two files open
but leaves the shell running in the foreground so you can continue using
it while you edit:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gvim main.c Makefile &amp;
</code></pre></div></div>

<p>Vim’s defaults are good but imperfect. Before getting started with
actually editing code you should establish at least the following minimal
configuration in <code class="language-plaintext highlighter-rouge">~/_vimrc</code>. (To understand these better, use <code class="language-plaintext highlighter-rouge">:help</code> to
jump to the built-in documentation.)</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">set</span> <span class="nb">hidden</span> <span class="nb">encoding</span><span class="p">=</span>utf<span class="m">-8</span> <span class="nb">shellslash</span>
<span class="k">filetype</span> plugin <span class="nb">indent</span> <span class="k">on</span>
<span class="nb">syntax</span> <span class="k">on</span>
</code></pre></div></div>

<p>The graphical interface defaults to a white background. Many people prefer
“dark mode” when editing code, so inverting this is simply a matter of
choosing a dark color scheme. Vim comes with a handful of color schemes,
around half of which have dark backgrounds. Use <code class="language-plaintext highlighter-rouge">:colorscheme</code> to change
it, and put it in your <code class="language-plaintext highlighter-rouge">~/_vimrc</code> to persist it.</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">colorscheme</span> slate
</code></pre></div></div>

<p>The default graphical interface includes a menu bar and tool bar. There
are better ways to accomplish all these operations, none of which require
touching the mouse, so consider removing all that junk:</p>

<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">set</span> <span class="nb">guioptions</span><span class="p">=</span>ac
</code></pre></div></div>

<p>Finally, since the development kit is oriented around C and C++, here’s my
own entire Vim configuration for C which makes it obey my own style:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set cinoptions+=t0,l1,:0 cinkeys-=0#
</code></pre></div></div>

<p>Once you’re comfortable with the basics, the best next step is to read
<a href="https://pragprog.com/titles/dnvim2/practical-vim-second-edition/"><em>Practical Vim: Edit Text at the Speed of Thought</em></a> by Drew Neil.
It’s an opinionated guide to Vim that instills good habits. If you want
something cost-free to whet your appetite, check out <a href="https://www.moolenaar.net/habits.html"><em>Seven habits of
effective text editing</em></a>.</p>

<h3 id="writing-an-application">Writing an application</h3>

<p>We’ve established a shell and text editor. Next is the development
workflow for writing an actual application. Ultimately you will invoke a
compiler from within Vim, which will parse compiler messages and take you
directly to the parts of your source code that need attention. Before we
get that far, let’s start with the basics.</p>

<p>The classic example is the “hello world” program, which we’ll suppose is
in a file called <code class="language-plaintext highlighter-rouge">hello.c</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">puts</span><span class="p">(</span><span class="s">"Hello, world!"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>While this development kit provides a version of the GNU compiler, <code class="language-plaintext highlighter-rouge">gcc</code>,
this guide mostly speaks of it in terms of the generic unix C compiler
name, <code class="language-plaintext highlighter-rouge">cc</code>. Unix-like systems install <code class="language-plaintext highlighter-rouge">cc</code> as an alias for the system’s
default C compiler, and w64devkit is no exception.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cc <span class="nt">-o</span> hello.exe hello.c
</code></pre></div></div>

<p>This command creates <code class="language-plaintext highlighter-rouge">hello.exe</code> from <code class="language-plaintext highlighter-rouge">hello.c</code>. Since this is not (yet?)
on your <code class="language-plaintext highlighter-rouge">PATH</code>, you must invoke it via a path name (i.e. the command must
include a slash). Otherwise the shell will search for it via the
<code class="language-plaintext highlighter-rouge">PATH</code> variable. Typically this means putting <code class="language-plaintext highlighter-rouge">./</code> in front of the program
name, meaning “run the program in the current directory”. As a convenience
you do not need to include the <code class="language-plaintext highlighter-rouge">.exe</code> extension:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./hello
</code></pre></div></div>

<p>Unlike the <code class="language-plaintext highlighter-rouge">untar</code> shell script from before, this <code class="language-plaintext highlighter-rouge">hello.exe</code> is entirely
independent of w64devkit. You can share it with anyone running Windows and
they’ll be able to execute it. There’s a little bit of runtime embedded in
the executable, but the bulk of the runtime is in the operating system
itself. I want to highlight this point because <em>most programming languages
don’t work like this</em>, or at least doing so is unnatural with lots of
compromises. The users of your software do not need to install a runtime
or other supporting software. They just run the executable you give them!</p>

<p>That executable is probably pretty small, less than 50kB — basically a
miracle by today’s standards. Sure, it’s hardly doing anything right now,
but you can add a whole lot more functionality without that executable
getting much bigger. In fact, it’s entirely unoptimized right now and
could be even smaller. Passing the <code class="language-plaintext highlighter-rouge">-Os</code> flag tells the compiler to
optimize for size, and the <code class="language-plaintext highlighter-rouge">-s</code> flag tells the linker to strip out unneeded
information.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cc <span class="nt">-Os</span> <span class="nt">-s</span> <span class="nt">-o</span> hello.exe hello.c
</code></pre></div></div>

<p>That cuts the program down to around a third of its previous size. If
necessary you can still do even better than this, but that’s outside the
scope of this guide.</p>

<p>A program can be valid enough to compile yet still contain obvious
mistakes. The compiler can warn about many of them, so it’s always worth
enabling these warnings. This requires two flags:
<code class="language-plaintext highlighter-rouge">-Wall</code> (“all” warnings) and <code class="language-plaintext highlighter-rouge">-Wextra</code> (extra warnings).</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-o</span> hello.exe hello.c
</code></pre></div></div>
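
<p>To give a feel for what these flags catch, here’s a minimal sketch. The
function names are made up for illustration. The program as written is
clean, and the comments describe the slip that each flag would report.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

/* Writing "if (x = 0)" here still compiles, but -Wall reports it
 * (-Wparentheses: assignment used as a truth value). */
static int is_zero(int x)
{
    if (x == 0) {
        return 1;
    }
    return 0;
}

/* The second parameter is deliberately ignored. Without the cast to
 * void, -Wextra reports it (-Wunused-parameter). */
static int add_one(int x, int unused)
{
    (void)unused;
    return x + 1;
}

int main(void)
{
    printf("%d %d\n", is_zero(0), add_one(41, 0));
}
</code></pre></div></div>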

<p>When you’re working on a program, you often don’t want optimization
enabled since it makes it more difficult to debug. However, some warnings
aren’t fired unless optimization is enabled. Fortunately there’s an
optimization level to resolve this, <code class="language-plaintext highlighter-rouge">-Og</code> (optimize for debugging).
Combine this with <code class="language-plaintext highlighter-rouge">-g3</code> to embed debug information in the program. This
will be handy later.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-Og</span> <span class="nt">-g3</span> <span class="nt">-o</span> hello.exe hello.c
</code></pre></div></div>

<p>These are the compiler flags you typically want to enable while developing
your software. When you distribute it, you’d use either <code class="language-plaintext highlighter-rouge">-Os -s</code> (optimize
for size) or <code class="language-plaintext highlighter-rouge">-O3 -s</code> (optimize for speed).</p>

<h4 id="makefiles">Makefiles</h4>

<p>I mentioned running the compiler from Vim. This isn’t done directly but
via a special build script called a Makefile. You invoke the <code class="language-plaintext highlighter-rouge">make</code> program
from Vim, which invokes the compiler as above. The simplest Makefile would
look like this, in a file literally named <code class="language-plaintext highlighter-rouge">Makefile</code>:</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">hello.exe</span><span class="o">:</span> <span class="nf">hello.c</span>
    <span class="err">cc</span> <span class="err">-Wall</span> <span class="err">-Wextra</span> <span class="err">-Og</span> <span class="err">-g3</span> <span class="err">-o</span> <span class="err">hello.exe</span> <span class="err">hello.c</span>
</code></pre></div></div>

<p>This tells <code class="language-plaintext highlighter-rouge">make</code> that the file named <code class="language-plaintext highlighter-rouge">hello.exe</code> is derived from another
file called <code class="language-plaintext highlighter-rouge">hello.c</code>, and the tab-indented line is the recipe for doing
so. Running the <code class="language-plaintext highlighter-rouge">make</code> command will run the compiler command only if
<code class="language-plaintext highlighter-rouge">hello.exe</code> is missing or <code class="language-plaintext highlighter-rouge">hello.c</code> is newer than <code class="language-plaintext highlighter-rouge">hello.exe</code>.</p>
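
<p>The decision <code class="language-plaintext highlighter-rouge">make</code> takes for this rule can be sketched in C. This is
only an approximation under simple assumptions: the helper name
<code class="language-plaintext highlighter-rouge">needs_rebuild</code> is hypothetical, it compares whole-second modification
times via <code class="language-plaintext highlighter-rouge">stat</code>, and a real <code class="language-plaintext highlighter-rouge">make</code> handles many more cases.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;
#include &lt;sys/stat.h&gt;

/* Returns 1 if target must be rebuilt from source: the target is
 * missing, or the source was modified more recently. Returns 0 if the
 * target is up to date, and -1 if the source cannot be examined. */
static int needs_rebuild(const char *target, const char *source)
{
    struct stat t, s;
    if (stat(source, &amp;s) != 0) {
        return -1;
    }
    if (stat(target, &amp;t) != 0) {
        return 1;
    }
    return s.st_mtime &gt; t.st_mtime;
}

int main(void)
{
    switch (needs_rebuild("hello.exe", "hello.c")) {
    case -1: puts("no hello.c to build from"); break;
    case 0:  puts("hello.exe is up to date");  break;
    default: puts("run the recipe");           break;
    }
}
</code></pre></div></div>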

<p>To run <code class="language-plaintext highlighter-rouge">make</code> from Vim, use the <code class="language-plaintext highlighter-rouge">:make</code> command inside Vim. It will not
only run <code class="language-plaintext highlighter-rouge">make</code> but also capture its output in an internal buffer called
the <em>quickfix list</em>. If there is any warning or error, Vim will jump to
it. Use <code class="language-plaintext highlighter-rouge">:cn</code> (next) and <code class="language-plaintext highlighter-rouge">:cp</code> (prev) to move between issues and correct
them, or <code class="language-plaintext highlighter-rouge">:cc</code> to re-display the current issue. When you’re done fixing
the issues, run <code class="language-plaintext highlighter-rouge">:make</code> again to start the cycle over.</p>

<p>Try that now by changing the printed message and recompiling from within
Vim. Intentionally create an error (bad syntax, too many arguments, etc.)
and see what happens.</p>

<p>Makefiles are a powerful and conventional way to build C and C++ software.
Since the development kit includes the standard set of unix utilities,
it’s very easy to write portable Makefiles that work across a variety of
operating systems and environments. Your software isn’t necessarily tied
to Windows just because you’re using a Windows-based development
environment. If you want to learn how Makefiles work and how to use them
effectively, read <a href="/blog/2017/08/20/"><em>A Tutorial on Portable Makefiles</em></a>. From here on
I’ll assume you’ve read that tutorial.</p>

<p>Ultimately I’d probably write my “hello world” Makefile like so:</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.POSIX</span><span class="o">:</span>
<span class="nv">CC</span>      <span class="o">=</span> cc
<span class="nv">CFLAGS</span>  <span class="o">=</span> <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-Og</span> <span class="nt">-g3</span>
<span class="nv">LDFLAGS</span> <span class="o">=</span>
<span class="nv">LDLIBS</span>  <span class="o">=</span>
<span class="nv">EXE</span>     <span class="o">=</span> .exe

<span class="nl">hello$(EXE)</span><span class="o">:</span> <span class="nf">hello.c</span>
    <span class="err">$(CC)</span> <span class="err">$(CFLAGS)</span> <span class="err">$(LDFLAGS)</span> <span class="err">-o</span> <span class="err">$@</span> <span class="err">hello.c</span> <span class="err">$(LDLIBS)</span>
</code></pre></div></div>

<p>When building a release, optimize for size or speed:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">CFLAGS</span><span class="o">=</span><span class="nt">-Os</span> <span class="nv">LDFLAGS</span><span class="o">=</span><span class="nt">-s</span>
</code></pre></div></div>

<p>This is very much a Windows-first style of Makefile, but still allows it
to be comfortably used on other systems. On Linux this <code class="language-plaintext highlighter-rouge">make</code> invocation
strips away the <code class="language-plaintext highlighter-rouge">.exe</code> extension:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">EXE</span><span class="o">=</span>
</code></pre></div></div>

<p>For a Windows-second Makefile, remove the line with <code class="language-plaintext highlighter-rouge">EXE = .exe</code>. This
allows <code class="language-plaintext highlighter-rouge">EXE</code> to come from the environment. So, for instance, I already
define the <code class="language-plaintext highlighter-rouge">EXE</code> environment variable in my w64devkit <code class="language-plaintext highlighter-rouge">~/.profile</code>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">EXE</span><span class="o">=</span>.exe
</code></pre></div></div>

<p>On Linux running <code class="language-plaintext highlighter-rouge">make</code> does the right thing, as does running <code class="language-plaintext highlighter-rouge">make</code> on
Windows. No special configuration required.</p>

<p>If my software is truly limited to Windows, I’m likely still interested in
supporting cross-compilation. A common convention for GNU toolchains is a
<code class="language-plaintext highlighter-rouge">CROSS</code> Makefile macro. For example:</p>

<div class="language-makefile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.POSIX</span><span class="o">:</span>
<span class="nv">CROSS</span>   <span class="o">=</span>
<span class="nv">CC</span>      <span class="o">=</span> <span class="nv">$(CROSS)</span>gcc
<span class="nv">CFLAGS</span>  <span class="o">=</span> <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-Og</span> <span class="nt">-g3</span>
<span class="nv">LDFLAGS</span> <span class="o">=</span>
<span class="nv">LDLIBS</span>  <span class="o">=</span>

<span class="nl">hello.exe</span><span class="o">:</span> <span class="nf">hello.c</span>
    <span class="err">$(CC)</span> <span class="err">$(CFLAGS)</span> <span class="err">$(LDFLAGS)</span> <span class="err">-o</span> <span class="err">$@</span> <span class="err">hello.c</span> <span class="err">$(LDLIBS)</span>
</code></pre></div></div>

<p>On Windows I just run <code class="language-plaintext highlighter-rouge">make</code>, but on Linux I’d set <code class="language-plaintext highlighter-rouge">CROSS</code> appropriately.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">CROSS</span><span class="o">=</span>x86_64-w64-mingw32-
</code></pre></div></div>

<h4 id="navigating">Navigating</h4>

<p>What happens if you’re working on a larger program and you need to jump to
the definition of a function, macro, or variable? It would be tedious to
use <code class="language-plaintext highlighter-rouge">grep</code> all the time to find definitions. The development kit includes
a solid implementation of <code class="language-plaintext highlighter-rouge">ctags</code> for building a <em>tags database</em> that lists the
locations for various kinds of definitions, and Vim knows how to read this
database. Most often you’ll want to run it recursively like so:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctags <span class="nt">-R</span>
</code></pre></div></div>

<p>You can of course do this from Vim, too: <code class="language-plaintext highlighter-rouge">:!ctags -R</code></p>

<p>With the cursor over an identifier, press <code class="language-plaintext highlighter-rouge">CTRL-]</code> to jump to a definition
for that name. Use <code class="language-plaintext highlighter-rouge">:tn</code> and <code class="language-plaintext highlighter-rouge">:tp</code> to move between different definitions
(e.g. when the name is overloaded). Or if you have a tag in mind rather
than a name listed in the buffer, use the <code class="language-plaintext highlighter-rouge">:tag</code> command to jump by name.
Vim maintains a tag stack and jump list for going back and forth, like the
backward and forward buttons in a browser.</p>

<h4 id="debugging">Debugging</h4>

<p>I had mentioned that the <code class="language-plaintext highlighter-rouge">-g3</code> option embeds extra information in the
executable. This is for debuggers, and the development kit includes the
GNU Debugger, <code class="language-plaintext highlighter-rouge">gdb</code>, to help you debug your programs. To use it, invoke
GDB on your executable:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb hello.exe
</code></pre></div></div>

<p>From here you can set breakpoints and such, then run the program with
<code class="language-plaintext highlighter-rouge">start</code> or <code class="language-plaintext highlighter-rouge">run</code>, then <code class="language-plaintext highlighter-rouge">step</code> through it line by line. See <a href="https://beej.us/guide/bggdb/"><em>Beej’s Quick
Guide to GDB</em></a> for a guide. During development, always run your
program through GDB, and never exit GDB. See also: <a href="/blog/2022/06/26/"><em>Assertions should be
more debugger-oriented</em></a>.</p>
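
<p>As something to practice on, here’s a small sketch that pairs well with
running under GDB. The <code class="language-plaintext highlighter-rouge">ASSERT</code> macro and the <code class="language-plaintext highlighter-rouge">mean</code> function are
hypothetical names for this example, and <code class="language-plaintext highlighter-rouge">__builtin_trap</code> is a GCC built-in
rather than standard C. A failed check traps on the spot, so the debugger
stops right at the offending line.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

/* Traps at the point of failure so a debugger stops on that line. */
#define ASSERT(c) do { if (!(c)) __builtin_trap(); } while (0)

/* Integer mean of the n values in v. Requires n &gt; 0. */
static int mean(const int *v, int n)
{
    ASSERT(n &gt; 0);  /* otherwise the division below is a bug */
    long sum = 0;
    for (int i = 0; i &lt; n; i++) {
        sum += v[i];
    }
    return (int)(sum / n);
}

int main(void)
{
    int v[] = {2, 4, 9};
    printf("%d\n", mean(v, 3));
}
</code></pre></div></div>

<p>Try setting a breakpoint on <code class="language-plaintext highlighter-rouge">mean</code>, then <code class="language-plaintext highlighter-rouge">step</code> through the loop.</p>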

<h4 id="learning-c-and-c">Learning C and C++</h4>

<p>So far this guide hasn’t actually assumed any C knowledge. One of the best
ways to learn C is by reading the highly-regarded <a href="https://en.wikipedia.org/wiki/The_C_Programming_Language"><em>The C Programming
Language</em></a> and doing the exercises. Alternatively, cost-free options
are <a href="http://beej.us/guide/bgc/"><em>Beej’s Guide to C Programming</em></a> and <a href="https://modernc.gforge.inria.fr/"><em>Modern C</em></a> (more
advanced). You can use the development kit to go through any of these.</p>

<p>I’ve focused on C, but everything above also applies to C++. To learn C++,
<a href="https://www.stroustrup.com/tour2.html"><em>A Tour of C++</em></a> is a safe bet.</p>

<h3 id="demonstration">Demonstration</h3>

<p>To illustrate how much you can do with nothing more than this 76MB
development kit, here’s a taste in the form of a weekend project: an
<a href="https://github.com/skeeto/asteroids-demo">Asteroids Clone for Windows</a>. That’s the game in the video at the
top of this guide.</p>

<p>The development kit doesn’t include Git so you’d need to install it
separately in order to clone the repository, but you could at least skip
that and download a .zip snapshot of the source. It has no third-party
dependencies yet it includes hardware-accelerated graphics, real-time
sound mixing, and gamepad input. Building a larger and more complex game
is much less about tooling and more about time and skill. That’s what I
mean about w64devkit being <a href="/blog/2020/09/25/">(almost) everything you need</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Well-behaved alias commands on Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/02/08/"/>
    <id>urn:uuid:d1c90d96-3696-4183-a52b-b10598a630c7</id>
    <updated>2021-02-08T20:32:45Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="trick"/>
    <content type="html">
      <![CDATA[<p>Since its inception I’ve faced a dilemma with <a href="https://github.com/skeeto/w64devkit">w64devkit</a>, my
<a href="/blog/2020/09/25/">all-in-one</a> Mingw-w64 toolchain and <a href="/blog/2020/05/15/">development environment
distribution for Windows</a>. A major goal of the project is no
installation: unzip anywhere and it’s ready to go as-is. However, full
functionality requires alias commands, particularly for BusyBox applets,
and the usual solutions are neither available nor viable. It seemed that
an installer was needed to assemble this last puzzle piece. This past
weekend I finally discovered a tidy and complete solution that solves this
problem for good.</p>

<p>That solution is a small C source file, <a href="https://github.com/skeeto/w64devkit/blob/master/src/alias.c"><code class="language-plaintext highlighter-rouge">alias.c</code></a>. This article is
about why it’s necessary and how it works.</p>

<h3 id="hard-and-symbolic-links">Hard and symbolic links</h3>

<p>Some alias commands are for convenience, such as a <code class="language-plaintext highlighter-rouge">cc</code> alias for <code class="language-plaintext highlighter-rouge">gcc</code> so
that build systems need not assume any particular C compiler. Others are
essential, such as an <code class="language-plaintext highlighter-rouge">sh</code> alias for “<code class="language-plaintext highlighter-rouge">busybox sh</code>” so that it’s available
as a shell for <code class="language-plaintext highlighter-rouge">make</code>. These aliases are usually created with links, hard
or symbolic. A GCC installation might include (roughly) a symbolic link
created like so:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln</span> <span class="nt">-s</span> gcc cc
</code></pre></div></div>

<p>BusyBox looks at its <code class="language-plaintext highlighter-rouge">argv[0]</code> on startup, and if it names an applet
(<code class="language-plaintext highlighter-rouge">ls</code>, <code class="language-plaintext highlighter-rouge">sh</code>, <code class="language-plaintext highlighter-rouge">awk</code>, etc.), it behaves like that applet. Typically BusyBox
aliases are installed as hard links to the original binary, and there’s
even a <code class="language-plaintext highlighter-rouge">busybox --install</code> to set these up. Both kinds of aliases are
cheap and effective.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln </span>busybox sh
<span class="nb">ln </span>busybox <span class="nb">ls
ln </span>busybox <span class="nb">awk</span>
</code></pre></div></div>
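
<p>The <code class="language-plaintext highlighter-rouge">argv[0]</code> dispatch can be sketched in a few lines of C. This is not
BusyBox’s actual code, just an illustration of the idea: the two-entry
applet table and the helper names are assumptions, and only a lowercase
<code class="language-plaintext highlighter-rouge">.exe</code> suffix is stripped.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Copies the applet name for argv0 into buf: the part after the last
 * slash or backslash, minus a trailing ".exe". len must be nonzero. */
static const char *applet_name(const char *argv0, char *buf, size_t len)
{
    const char *base = argv0;
    for (const char *p = argv0; *p; p++) {
        if (*p == '/' || *p == '\\') {
            base = p + 1;
        }
    }
    size_t n = strlen(base);
    if (n &gt;= 4 &amp;&amp; strcmp(base + n - 4, ".exe") == 0) {
        n -= 4;
    }
    if (n &gt;= len) {
        n = len - 1;
    }
    memcpy(buf, base, n);
    buf[n] = 0;
    return buf;
}

static int applet_true(void)  { return 0; }
static int applet_false(void) { return 1; }

/* Hypothetical applet table. A real multi-call binary has many more. */
static const struct {
    const char *name;
    int (*run)(void);
} applets[] = {
    {"true",  applet_true},
    {"false", applet_false},
};

/* Runs the named applet, or returns 127 if there is no such applet. */
static int dispatch(const char *name)
{
    for (size_t i = 0; i &lt; sizeof(applets)/sizeof(*applets); i++) {
        if (strcmp(applets[i].name, name) == 0) {
            return applets[i].run();
        }
    }
    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}

int main(int argc, char **argv)
{
    char name[64];
    if (argc &lt; 1) {
        return 127;
    }
    return dispatch(applet_name(argv[0], name, sizeof(name)));
}
</code></pre></div></div>

<p>Linking or copying the resulting binary under the name <code class="language-plaintext highlighter-rouge">true</code> or <code class="language-plaintext highlighter-rouge">false</code>
selects the matching behavior without any arguments.</p>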

<p>Unfortunately links are not supported by .zip files on Windows. They’d
need to be created by a dedicated installer. As a result, I’ve strongly
recommended that users run “<code class="language-plaintext highlighter-rouge">busybox --install</code>” at some point to
establish the BusyBox alias commands. While w64devkit works without them,
it works better with them. Still, that’s an installation step!</p>

<p>An alternative option is to simply include a full copy of the BusyBox
binary for each applet — all 150 of them — simulating hard links. BusyBox
is small, around 4kB per applet on average, but it’s not quite <em>that</em>
small. Since the .zip format doesn’t use block compression — files are
compressed individually — this duplication will appear in the .zip itself.
My 573kB BusyBox build duplicated 150 times would double the distribution
size and increase the installation footprint by 25%. It’s not worth the
cost.</p>

<p>Since .zip is so limited, perhaps I should use a different distribution
format that supports links. However, another w64devkit goal is making no
assumptions about what other tools are installed. Windows natively
supports .zip, even if that support isn’t so great (poor performance, low
composability, missing features, etc.). With nothing more than the
w64devkit .zip on a fresh, offline Windows installation, you can begin
efficiently developing professional, native applications in under a
minute.</p>

<h3 id="scripts-as-aliases">Scripts as aliases</h3>

<p>With links off the table, the next best option is a shell script. On
unix-like systems shell scripts are an effective tool for creating complex
alias commands. Unlike links, they can manipulate the argument list. For
instance, w64devkit includes a <code class="language-plaintext highlighter-rouge">c99</code> alias to invoke the C compiler
configured to use the C99 standard. To do this with a shell script:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/sh</span>
<span class="nb">exec </span>cc <span class="nt">-std</span><span class="o">=</span>c99 <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
</code></pre></div></div>

<p>This prepends <code class="language-plaintext highlighter-rouge">-std=c99</code> to the argument list and passes through the rest
untouched via the Bourne shell’s special case <code class="language-plaintext highlighter-rouge">"$@"</code>. Because I used
<code class="language-plaintext highlighter-rouge">exec</code>, the shell process <em>becomes</em> the compiler in place. The shell
doesn’t hang around in the background. It’s just gone. This is really quite
elegant and powerful.</p>
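
<p>For comparison, here’s roughly what that script does, written in C for a
unix-like system. It’s a sketch that assumes POSIX <code class="language-plaintext highlighter-rouge">execvp</code>, and the
helper name <code class="language-plaintext highlighter-rouge">prepend_flag</code> is hypothetical. It is not part of w64devkit.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;unistd.h&gt;

/* Builds a new NULL-terminated argument vector: "cc", flag, then each
 * original argument after argv[0]. Returns NULL if allocation fails.
 * The caller frees the array itself, not the strings in it. */
static char **prepend_flag(int argc, char **argv, char *flag)
{
    char **out = malloc(sizeof(*out) * ((size_t)argc + 3));
    if (!out) {
        return NULL;
    }
    int n = 0;
    out[n++] = "cc";
    out[n++] = flag;
    for (int i = 1; i &lt; argc; i++) {
        out[n++] = argv[i];
    }
    out[n] = NULL;
    return out;
}

int main(int argc, char **argv)
{
    char **args = prepend_flag(argc, argv, "-std=c99");
    if (!args) {
        return 1;
    }
    execvp(args[0], args);  /* on success this process becomes cc */
    perror("cc");           /* only reached if the exec failed */
    return 127;
}
</code></pre></div></div>

<p>As with the script, nothing after a successful <code class="language-plaintext highlighter-rouge">execvp</code> ever runs, since
the wrapper has been replaced by the compiler.</p>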

<p>The closest available on Windows is a .bat batch file. However, like some
other parts of DOS and Windows, the Batch language was designed as though
its designer once glanced at someone using a unix shell, perhaps looking
over their shoulder, then copied some of the ideas without understanding
them. As a result, it’s not nearly as useful or powerful. Here’s the Batch
equivalent:</p>

<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@cc <span class="na">-std</span><span class="o">=</span><span class="kd">c99</span> <span class="err">%</span><span class="o">*</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">@</code> is necessary because Batch prints its commands by default (Bourne
shell’s <code class="language-plaintext highlighter-rouge">-x</code> option), and <code class="language-plaintext highlighter-rouge">@</code> disables it. Windows lacks the concept of
<code class="language-plaintext highlighter-rouge">exec(3)</code>, so the Batch file interpreter <code class="language-plaintext highlighter-rouge">cmd.exe</code> continues running alongside
the compiler. A little wasteful but that hardly matters. What does matter
though is that <code class="language-plaintext highlighter-rouge">cmd.exe</code> doesn’t behave itself! If you, say, Ctrl+C to
cancel compilation, you will get the infamous “Terminate batch job (Y/N)?”
prompt which interferes with other programs running in the same console.
The so-called “batch” script isn’t a batch job at all: It’s interactive.</p>

<p>I tried to use Batch files for BusyBox applets, but this issue came up
constantly and made this approach impractical. Nearly all BusyBox applets
are non-interactive, and lots of things break when they aren’t. Worst of
all, you can easily end up with layers of <code class="language-plaintext highlighter-rouge">cmd.exe</code> clobbering each other
to ask if they should terminate. It was frustrating.</p>

<p>The prompt is hardcoded in <code class="language-plaintext highlighter-rouge">cmd.exe</code> and cannot be disabled. Since so much
depends on <code class="language-plaintext highlighter-rouge">cmd.exe</code> remaining exactly the way it is, Microsoft will never
alter this behavior either. After all, that’s why they made PowerShell a
new, separate tool.</p>

<p>Speaking of PowerShell, could we use that instead? Unfortunately not:</p>

<ol>
  <li>
    <p>It’s installed by default on Windows, but is not necessarily enabled.
One of my own use cases for w64devkit involves systems where PowerShell
is disabled by policy. A common policy is it can be used interactively
but not run scripts (“Running scripts is disabled on this system”).</p>
  </li>
  <li>
    <p>PowerShell is not a first class citizen on Windows, and will likely
never be. Even under the friendliest policy it’s not normally possible
to put a PowerShell script on the <code class="language-plaintext highlighter-rouge">PATH</code> and run it by name. (I’m sure
there are ways to make this work via system-wide configuration, but
that’s off the table.)</p>
  </li>
  <li>
    <p>Everything in PowerShell is broken. For example, it does not support
input redirection with files, and instead you must use the <code class="language-plaintext highlighter-rouge">cat</code>-like
command, <code class="language-plaintext highlighter-rouge">Get-Content</code>, to pipe file contents. However, <code class="language-plaintext highlighter-rouge">Get-Content</code>
translates its input and quietly damages your data. There is no way to
disable this “feature” in the version of PowerShell that ships with
Windows, meaning it cannot accomplish the simplest of tasks. This is
just one of many ways that PowerShell is broken beyond usefulness.</p>
  </li>
</ol>

<p>Item (2) also affects w64devkit. It has a Bourne shell, but shell scripts
are still not first class citizens since Windows doesn’t know what to do
with them. Fixing that would require system-wide configuration, antithetical to
the philosophy of the project.</p>

<h3 id="solution-compiled-shell-scripts">Solution: compiled shell “scripts”</h3>

<p>My working solution is inspired by an insanely clever hack used by my
favorite media player, <a href="https://mpv.io/">mpv</a>. The Windows build is strange at first
glance, containing two binaries, <code class="language-plaintext highlighter-rouge">mpv.exe</code> (large) and <code class="language-plaintext highlighter-rouge">mpv.com</code> (tiny).
Is that COM as in <a href="/blog/2014/12/09/">an old-school 16-bit DOS binary</a>? No, that’s just
a trick that works around a Windows limitation.</p>

<p>The Windows technology is broken up into subsystems. Console programs run
in the Console subsystem. Graphical programs run in the Windows subsystem.
<a href="/blog/2017/11/30/">The original WSL</a> was a subsystem. Unfortunately this design means
that a program must statically pick a subsystem, hardcoded into the binary
image. The program cannot select a subsystem dynamically. For example,
this is why Java installations have both <code class="language-plaintext highlighter-rouge">java.exe</code> and <code class="language-plaintext highlighter-rouge">javaw.exe</code>, and
Emacs has <code class="language-plaintext highlighter-rouge">emacs.exe</code> and <code class="language-plaintext highlighter-rouge">runemacs.exe</code>. Different binaries for different
subsystems.</p>

<p>On Linux, a program that wants to do graphics just talks to the Xorg
server or Wayland compositor. It can dynamically choose to be a terminal
application or a graphical application. Or even both at once. This is
exactly the behavior of <code class="language-plaintext highlighter-rouge">mpv</code>, and it faces a dilemma on Windows: With
subsystems, how can it be both?</p>

<p>The trick is based on the environment variable <code class="language-plaintext highlighter-rouge">PATHEXT</code> which tells
Windows how to prioritize executables with the same base name but
different file extensions. If I type <code class="language-plaintext highlighter-rouge">mpv</code> and it finds both <code class="language-plaintext highlighter-rouge">mpv.exe</code> and
<code class="language-plaintext highlighter-rouge">mpv.com</code>, which binary will run? It will be the first listed in
<code class="language-plaintext highlighter-rouge">PATHEXT</code>, and by default that starts with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PATHEXT=.COM;.EXE;.BAT;...
</code></pre></div></div>
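
<p>The priority rule can be modeled with a short C sketch. It’s a
simplification that looks at a single directory listing, and the function
names <code class="language-plaintext highlighter-rouge">pick</code> and <code class="language-plaintext highlighter-rouge">equal_nocase</code> are hypothetical. The real lookup on
Windows involves more than this.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;ctype.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Case-insensitive comparison of the first n characters of a and b. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i &lt; n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Given a PATHEXT-style list and the file names in one directory,
 * returns the index of the file that would run for the command "base",
 * or -1 if none matches. Extensions are tried in list order, so
 * earlier entries win. */
static int pick(const char *pathext, const char *base,
                const char **files, int nfiles)
{
    size_t blen = strlen(base);
    while (*pathext) {
        size_t elen = strcspn(pathext, ";");
        for (int i = 0; elen &amp;&amp; i &lt; nfiles; i++) {
            if (strlen(files[i]) == blen + elen &amp;&amp;
                equal_nocase(files[i], base, blen) &amp;&amp;
                equal_nocase(files[i] + blen, pathext, elen)) {
                return i;
            }
        }
        pathext += elen;
        if (*pathext == ';') {
            pathext++;
        }
    }
    return -1;
}

int main(void)
{
    const char *files[] = {"mpv.exe", "mpv.com"};
    int i = pick(".COM;.EXE;.BAT", "mpv", files, 2);
    printf("%s\n", i &lt; 0 ? "(not found)" : files[i]);
}
</code></pre></div></div>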

<p>So it will run <code class="language-plaintext highlighter-rouge">mpv.com</code>, which is actually a plain old <a href="https://wiki.osdev.org/PE">PE+</a> <code class="language-plaintext highlighter-rouge">.exe</code>
in disguise. The Windows subsystem <code class="language-plaintext highlighter-rouge">mpv.exe</code> gets the shortcut and file
associations while Console subsystem <code class="language-plaintext highlighter-rouge">mpv.com</code> catches command line
invocations and serves as console liaison as it invokes the real
<code class="language-plaintext highlighter-rouge">mpv.exe</code>. Ingenious!</p>

<p>I realized I can pull a similar trick to create command aliases — not the
<code class="language-plaintext highlighter-rouge">.com</code> trick, but the miniature flagger program. If only I could compile
each of those Batch files to tiny, well-behaved <code class="language-plaintext highlighter-rouge">.exe</code> files so that it
wouldn’t rely on the badly-behaved <code class="language-plaintext highlighter-rouge">cmd.exe</code>…</p>

<h4 id="tiny-c-programs">Tiny C programs</h4>

<p>Years ago <a href="/blog/2016/01/31/">I wrote about tiny, freestanding Windows executables</a>.
That research paid off here since that’s exactly what I want. The alias
command program need only manipulate its command line, invoke another
program, then wait for it to finish. This doesn’t require the C library,
just a handful of <code class="language-plaintext highlighter-rouge">kernel32.dll</code> calls. My alias command programs can be
so small that it would no longer matter that I have 150 of them, and I get
complete control over their behavior.</p>

<p>To compile, I use <code class="language-plaintext highlighter-rouge">-nostdlib</code> and <code class="language-plaintext highlighter-rouge">-ffreestanding</code> to disable all system
libraries, <code class="language-plaintext highlighter-rouge">-lkernel32</code> to pull that one back in, <code class="language-plaintext highlighter-rouge">-Os</code> (optimize for
size), and <code class="language-plaintext highlighter-rouge">-s</code> (strip) all to make the result as small as possible.</p>

<p>I don’t want to write a little program for each alias command. Instead
I’ll use a couple of C defines, <code class="language-plaintext highlighter-rouge">EXE</code> and <code class="language-plaintext highlighter-rouge">CMD</code>, to inject the target
command at compile time. So this Batch file:</p>

<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@target <span class="kd">arg1</span> <span class="kd">arg2</span> <span class="err">%</span><span class="o">*</span>
</code></pre></div></div>

<p>Is equivalent to this alias compilation:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc <span class="nt">-DEXE</span><span class="o">=</span><span class="s2">"target.exe"</span> <span class="nt">-DCMD</span><span class="o">=</span><span class="s2">"target arg1 arg2"</span> <span class="se">\</span>
    <span class="nt">-s</span> <span class="nt">-Os</span> <span class="nt">-nostdlib</span> <span class="nt">-ffreestanding</span> <span class="nt">-o</span> alias.exe alias.c <span class="nt">-lkernel32</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">EXE</code> string is the actual <em>module</em> name, so the <code class="language-plaintext highlighter-rouge">.exe</code> extension is
required. The <code class="language-plaintext highlighter-rouge">CMD</code> string replaces the first complete token of the
command line string (think <code class="language-plaintext highlighter-rouge">argv[0]</code>) and may contain arbitrary additional
arguments (e.g. <code class="language-plaintext highlighter-rouge">-std=c99</code>). Both are handled as wide strings (<code class="language-plaintext highlighter-rouge">L"..."</code>)
since the alias program uses the wide Win32 API in order to be fully
transparent. Though unfortunately at this time it makes no difference: All
currently aliased programs use the “ANSI” API since the underlying C and
C++ standard libraries only use the ANSI API. (As far as I know, nobody
has ever written fully-functional C and C++ standard libraries for
Windows, not even Microsoft.)</p>
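<p>As a rough illustration of how bare <code class="language-plaintext highlighter-rouge">-D</code> defines could become wide string literals, here is a minimal sketch. It is an assumption for illustration, not necessarily how <code class="language-plaintext highlighter-rouge">alias.c</code> does it. The helper macros <code class="language-plaintext highlighter-rouge">STR</code> and <code class="language-plaintext highlighter-rouge">WIDEN</code> and the variable names are hypothetical, and the fallback defines exist only so the snippet compiles on its own.</p>

```c
#include <assert.h>
#include <wchar.h>

/* Fallbacks for illustration only; normally supplied on the command
 * line, e.g. -DEXE="target.exe" -DCMD="target arg1 arg2" (the shell
 * strips the quotes, so the macros expand to bare tokens). */
#ifndef EXE
#  define EXE target.exe
#endif
#ifndef CMD
#  define CMD target arg1 arg2
#endif

/* Two levels each, so EXE and CMD are macro-expanded before they are
 * stringified (#) and before the L prefix is pasted on (##). */
#define STR2(s)   #s
#define STR(s)    STR2(s)
#define WIDEN2(s) L ## s
#define WIDEN(s)  WIDEN2(s)

/* Hypothetical names: the module name and the replacement for argv[0]. */
static const wchar_t exe_name[] = WIDEN(STR(EXE));
static const wchar_t cmd_head[] = WIDEN(STR(CMD));
```

<p>Stringifying tokens this way is fragile: a command containing commas or quotes would need more care than this sketch takes.</p>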

<p>You might wonder why the heck I’m gluing strings together for the
arguments. These will need to be parsed (word split, etc.) by someone
else, so shouldn’t I construct an argv array instead? That’s not how it
works on Windows: Programs receive a flat command string and are expected
to parse it themselves following <a href="https://docs.microsoft.com/en-us/previous-versions/17w5ykft(v=vs.85)">the format specification</a>. When
you write a C program, the C runtime does this for you to provide the
usual argv array.</p>

<p>This is upside down. The caller creating the process already has arguments
split into an argv array — or something like it — but Win32 requires the
caller to encode the argv array as a string following a special format so
that the recipient can immediately decode it. Why marshal rather than
pass structured data in the first place? Why does Win32 only supply a
decoder (<a href="https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw"><code class="language-plaintext highlighter-rouge">CommandLineToArgvW</code></a>) and not an encoder (e.g. the missing
<code class="language-plaintext highlighter-rouge">ArgvToCommandLine</code>)? Hey, I don’t make the rules; I just have to live
with them.</p>
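<p>To make the encoding side concrete, here is a minimal sketch of what such an encoder might look like, following the quoting convention used by the Microsoft C runtime: backslashes are literal unless they precede a double quote, in which case they must be doubled. The names <code class="language-plaintext highlighter-rouge">argv_to_cmdline</code>, <code class="language-plaintext highlighter-rouge">quote_arg</code>, and <code class="language-plaintext highlighter-rouge">put</code> are hypothetical, and it uses narrow strings so that it compiles anywhere; a real Win32 version would presumably use wide strings.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bounded output buffer; one slot is always reserved for the terminator. */
struct buf { char *p; size_t len, cap; int err; };

/* Append n copies of c, flagging an error instead of overflowing. */
static void put(struct buf *b, char c, size_t n)
{
    for (; n; n--) {
        if (b->len + 1 < b->cap) b->p[b->len++] = c;
        else b->err = 1;
    }
}

/* Append one argument, quoting only when necessary. */
static void quote_arg(struct buf *b, const char *arg)
{
    if (*arg && !strpbrk(arg, " \t\n\v\"")) {
        for (; *arg; arg++) put(b, *arg, 1);
        return;
    }
    put(b, '"', 1);
    for (;;) {
        size_t nslash = 0;
        while (*arg == '\\') { nslash++; arg++; }
        if (!*arg) {
            put(b, '\\', nslash * 2);      /* precede the closing quote */
            break;
        } else if (*arg == '"') {
            put(b, '\\', nslash * 2 + 1);  /* escape the quote itself */
            put(b, '"', 1);
        } else {
            put(b, '\\', nslash);          /* backslashes stay literal */
            put(b, *arg, 1);
        }
        arg++;
    }
    put(b, '"', 1);
}

/* Hypothetical encoder: join argv into one command line string.
 * Returns 1 on success, 0 if dst was too small (output truncated). */
static int argv_to_cmdline(char *dst, size_t cap, int argc, char **argv)
{
    struct buf b = {dst, 0, cap, 0};
    assert(cap > 0);
    for (int i = 0; i < argc; i++) {
        if (i) put(&b, ' ', 1);
        quote_arg(&b, argv[i]);
    }
    dst[b.len] = 0;
    return !b.err;
}
```

<p>Under this sketch, the two arguments <code class="language-plaintext highlighter-rouge">echo</code> and <code class="language-plaintext highlighter-rouge">hello world</code> encode to <code class="language-plaintext highlighter-rouge">echo "hello world"</code>. Programs that parse their command line by some other convention would not necessarily decode the result the same way.</p>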

<p>You can look at the original source for the details, but the summary is
that I supply my own <code class="language-plaintext highlighter-rouge">xstrlen()</code>, <code class="language-plaintext highlighter-rouge">xmemcpy()</code>, and a partial Win32 command
line parser — just enough to identify the first token, even if that token
is quoted. It glues the strings together, calls <code class="language-plaintext highlighter-rouge">CreateProcessW</code>, waits
for it to exit (<code class="language-plaintext highlighter-rouge">WaitForSingleObject</code>), retrieves the exit code
(<code class="language-plaintext highlighter-rouge">GetExitCodeProcess</code>), and exits with the same status. (The stuff that
comes for free with <code class="language-plaintext highlighter-rouge">exec(3)</code>.)</p>
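<p>The string handling described above can be sketched roughly as follows. This is a simplified illustration, not the actual <code class="language-plaintext highlighter-rouge">alias.c</code>: the function names <code class="language-plaintext highlighter-rouge">skip_first</code> and <code class="language-plaintext highlighter-rouge">splice</code> are hypothetical, and the token rule is an assumption (a double quote toggles quoting, and the token ends at the first space or tab outside quotes). The Win32 calls are left out so that the sketch compiles on any platform.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Freestanding stand-in for wcslen(). */
static size_t xstrlen(const wchar_t *s)
{
    size_t n = 0;
    while (s[n]) n++;
    return n;
}

/* Freestanding stand-in for memcpy(); len is in bytes. */
static void xmemcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < len; i++) d[i] = s[i];
}

/* Return a pointer just past the first token of a command line,
 * even when that token is quoted. */
static const wchar_t *skip_first(const wchar_t *s)
{
    int quoted = 0;
    for (; *s; s++) {
        if (*s == L'"') {
            quoted = !quoted;
        } else if (!quoted && (*s == L' ' || *s == L'\t')) {
            break;
        }
    }
    return s;
}

/* Write cmd followed by everything after the first token of orig.
 * cap counts wchar_t elements. Returns 1 on success, 0 if dst is
 * too small (dst is left untouched in that case). */
static int splice(wchar_t *dst, size_t cap,
                  const wchar_t *cmd, const wchar_t *orig)
{
    const wchar_t *tail = skip_first(orig);
    size_t cmdlen = xstrlen(cmd);
    size_t taillen = xstrlen(tail);
    if (cmdlen >= cap || taillen >= cap - cmdlen) return 0;
    xmemcpy(dst, cmd, cmdlen * sizeof(*dst));
    xmemcpy(dst + cmdlen, tail, (taillen + 1) * sizeof(*dst));
    return 1;
}
```

<p>In a real alias program the original string would come from <code class="language-plaintext highlighter-rouge">GetCommandLineW</code>, and the spliced result would be handed to <code class="language-plaintext highlighter-rouge">CreateProcessW</code>. Note that the tail keeps its leading whitespace, so no separator needs to be added when gluing.</p>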

<p>This all compiles to a 4kB executable, mostly padding, which is small
enough for my purposes. These compress to an acceptable 1kB each in the
.zip file. Smaller would be nicer, but this would require at minimum a
custom linker script, and even smaller would require hand-crafted
assembly.</p>

<p>This lingering issue solved, w64devkit now works better than ever. The
<code class="language-plaintext highlighter-rouge">alias.c</code> source is included in the kit in case you need to make any of
your own well-behaved alias commands.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>w64devkit: (Almost) Everything You Need</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/09/25/"/>
    <id>urn:uuid:e594c82d-a2e1-4035-8527-1b998045ceeb</id>
    <updated>2020-09-25T00:04:11Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="rant"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=24586556">on Hacker News</a>.</em></p>

<p><a href="/blog/2020/05/15/">This past May</a> I put together my own C and C++ development
distribution for Windows called <a href="https://github.com/skeeto/w64devkit"><strong>w64devkit</strong></a>. The <em>entire</em>
release weighs under 80MB and requires no installation. Unzip and run it
in-place anywhere. It’s also entirely offline. It will never
automatically update, or even touch the network. In mere seconds any
Windows system can become a reliable development machine. (To further
increase reliability, <a href="https://jacquesmattheij.com/why-johnny-wont-upgrade/">disconnect it from the internet</a>.) Despite
its simple nature and small packaging, w64devkit is <em>almost</em> everything
you need to develop <em>any</em> professional desktop application, from a
command line utility to a AAA game.</p>

<!--more-->

<p>I don’t mean this in some <a href="/blog/2016/04/30/">useless Turing-complete sense</a>, but in
a practical, <em>get-stuff-done</em> sense. It’s much more a matter of
<em>know-how</em> than of tools or libraries. So then what is this “almost”
about?</p>

<ul>
  <li>
    <p>The distribution does not have WinAPI documentation. It’s notoriously
<a href="http://laurencejackson.com/win32/">difficult to obtain</a> and, besides, unfriendly to redistribution.
It’s essential for interfacing with the operating system and difficult
to work without. Even a dead tree reference book would suffice.</p>
  </li>
  <li>
    <p>Depending on what you’re building, you may still need specialized
tools. For instance, game development requires <a href="https://www.blender.org/">tools for editing art
assets</a>.</p>
  </li>
  <li>
    <p>There is no formal source control system. Git is excluded per the
issues noted in the announcement, and my next option, <a href="https://wiki.debian.org/UsingQuilt">Quilt</a>,
has similar limitations. However, <code class="language-plaintext highlighter-rouge">diff</code> and <code class="language-plaintext highlighter-rouge">patch</code> <em>are</em> included,
and are sufficient for a kind of old-school, patch-based source
control. I’ve used it successfully when dogfooding w64devkit in a
fresh Windows installation.</p>
  </li>
</ul>

<h3 id="everything-else">Everything else</h3>

<p>As I said in my announcement, w64devkit includes a powerful text editor
that fulfills all text editing needs, from code to documentation. The
editor includes a tutorial (<code class="language-plaintext highlighter-rouge">vimtutor</code>) and complete, built-in manual
(<code class="language-plaintext highlighter-rouge">:help</code>) in case you’re not yet familiar with it.</p>

<p>What about navigation? Use the included <a href="https://github.com/universal-ctags/ctags">ctags</a> to generate a
tags database (<code class="language-plaintext highlighter-rouge">ctags -R</code>), then <a href="http://vimdoc.sourceforge.net/htmldoc/tagsrch.html#tagsrch.txt">jump instantly</a> to any
definition at any time. No need for <a href="https://old.reddit.com/r/vim/comments/b3yzq4/a_lsp_client_maintainers_view_of_the_lsp_protocol/">that Language Server Protocol
rubbish</a>. This does not mean you must laboriously type identifiers
as you work. Use <a href="https://georgebrock.github.io/talks/vim-completion/">built-in completion</a>!</p>

<p>Build system? That’s also covered, via a Windows-aware unix-like
environment that includes <code class="language-plaintext highlighter-rouge">make</code>. <a href="/blog/2017/08/20/">Learning how to use it</a> is a
breeze. Software is by its nature unavoidably complicated, so <a href="/blog/2017/03/30/">don’t
make it more complicated than necessary</a>.</p>

<p>What about debugging? Use the debugger, GDB. Performance problems? Use
the profiler, gprof. Inspect compiler output either by asking for it
(<code class="language-plaintext highlighter-rouge">-S</code>) or via the disassembler (<code class="language-plaintext highlighter-rouge">objdump -d</code>). No need to go online for
the <a href="https://godbolt.org/">Godbolt Compiler Explorer</a>, as slick as it is. If the compiler
output is insufficient, use <a href="/blog/2015/07/10/">SIMD intrinsics</a>. In the worst case
there are two different assemblers available. Real time graphics? Use an
operating system API like OpenGL, DirectX, or Vulkan.</p>

<p>w64devkit <em>really is</em> nearly everything you need in a <a href="https://www.youtube.com/watch?v=W3ml7cO96F0&amp;t=1h25m50s">single, no
nonsense, fully-<em>offline</em> package</a>! It’s difficult to emphasize this
point as much as I’d like. When interacting with the broader software
ecosystem, I often despair that <a href="https://www.youtube.com/watch?v=ZSRHeXYDLko">software development has lost its
way</a>. This distribution is my way of carving out an escape from some
of the insanity. As a C and C++ toolchain, w64devkit by default produces
lean, sane, trivially-distributable, offline-friendly artifacts. All
runtime components in the distribution are <a href="https://drewdevault.com/dynlib">static link only</a>,
so no need to distribute DLLs with your application either.</p>

<h3 id="customize-the-distribution-own-the-toolchain">Customize the distribution, own the toolchain</h3>

<p>While most users would likely stick to my published releases, building
w64devkit is a two-step process with a single build dependency, Docker.
Anyone can easily customize it for their own needs. Don’t care about
C++? Toss it to shave 20% off the distribution. Need to tune the runtime
for a specific microarchitecture? Tweak the compiler flags.</p>

<p>One of the intended strengths of open source is users can modify
software to suit their needs. With w64devkit, you <em>own the toolchain</em>
itself. It is <a href="https://research.swtch.com/deps">one of your dependencies</a> after all. Unfortunately
the build initially requires an internet connection even when working
from source tarballs, but at least it’s a one-time event.</p>

<p>If you choose to <a href="https://github.com/nothings/stb">take on dependencies</a>, and you build those
dependencies using w64devkit, all the better! You can tweak them to your
needs and choose precisely how they’re built. You won’t be relying on
the goodwill of internet randos nor the generosity of a free package
registry.</p>

<h3 id="customization-examples">Customization examples</h3>

<p>Building existing software using w64devkit is probably easier than
expected, particularly since much of it has already been “ported” to
MinGW and Mingw-w64. Just don’t bother with GNU Autoconf configure
scripts. They never work in w64devkit despite having everything they
technically need. That caveat aside, here’s a demonstration of building
some popular software.</p>

<p>One of <a href="/blog/2016/09/02/">my coworkers</a> uses his own version of <a href="https://www.chiark.greenend.org.uk/~sgtatham/putty/">PuTTY</a>
patched to play more nicely with Emacs. If you wanted to do the same,
grab the source tarball, unpack it using the provided tools, then in the
unpacked source:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make -C windows -f Makefile.mgw
</code></pre></div></div>

<p>You’ll have a custom-built putty.exe, as well as the other tools. If you
have any patches, apply those first!</p>

<p>Would you like to embed an extension language in your application? Lua
is a solid choice, in part because it’s such a well-behaved dependency.
After unpacking the source tarball:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make PLAT=mingw
</code></pre></div></div>

<p>This produces a complete Lua compiler, runtime, and library. It’s not
even necessary to use the Makefile, as it’s nearly as simple as “<code class="language-plaintext highlighter-rouge">cc
*.c</code>” — painless to integrate or embed into any project.</p>

<p>Do you enjoy NetHack? Perhaps you’d like to <a href="https://bilious.alt.org/">try a few of the custom
patches</a>. This one is a little more complicated, but I was able to
build NetHack 3.6.6 like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sys/winnt/nhsetup.bat
$ make -C src -f Makefile.gcc cc="cc -fcommon" link="cc"
</code></pre></div></div>

<p>NetHack has <a href="https://wiki.gentoo.org/wiki/Gcc_10_porting_notes/fno_common">a bug necessitating <code class="language-plaintext highlighter-rouge">-fcommon</code></a>. If you have any
patches, apply them with <code class="language-plaintext highlighter-rouge">patch</code> before the last step. I won’t belabor it
here, but with just a little more effort I was also able to produce a
NetHack binary with curses support via <a href="https://pdcurses.org/">PDCurses</a> — statically-linked
of course.</p>

<p>How about my archive encryption tool, <a href="https://github.com/skeeto/enchive">Enchive</a>? The one that
<a href="/blog/2018/04/13/">even works with 16-bit DOS compilers</a>. It requires nothing special
at all!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make
</code></pre></div></div>

<p>w64devkit can also host parts of itself: Universal Ctags, Vim, and NASM.
This means you can modify and recompile these tools without going
through the Docker build. Sadly <a href="https://frippery.org/busybox/">busybox-w32</a> cannot host itself,
though it’s close. I’d <em>love</em> it if w64devkit could fully host itself, and
so Docker — and therefore an internet connection and such — would only
be needed to bootstrap, but unfortunately that’s not realistic given the
state of the GNU components.</p>

<h3 id="offline-and-reliable">Offline and reliable</h3>

<p>Software development has increasingly become <a href="https://deftly.net/posts/2017-06-01-measuring-the-weight-of-an-electron.html">dependent on a constant
internet connection</a>. Robust, offline tooling and development is
undervalued.</p>

<p>Consider: Does your current project depend on an external service? Do
you pay for this service to ensure that it remains up? If you pull your
dependencies from a repository, how much do you trust those who maintain
the packages? <a href="https://drewdevault.com/2020/02/06/Dependencies-and-maintainers.html">Do you even know their names?</a> What would be your
project’s fate if that service went down permanently? It will someday,
though hopefully only after your project is dead and forgotten. If you
have the ability to work permanently offline, then you already have
happy answers to all these questions.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>w64devkit: a Portable C and C++ Development Kit for Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/05/15/"/>
    <id>urn:uuid:d600d846-3692-474f-adbf-45db63079581</id>
    <updated>2020-05-15T03:43:04Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=23292161">on Hacker News</a>.</em></p>

<p>As a computer engineer, my job is to use computers to solve important
problems. Ideally my solutions will be efficient, and typically that
means making the best use of the resources at hand. Quite often these
resources are machines running Windows and, despite my misgivings about
the platform, there is much to be gained by properly and effectively
leveraging it.</p>

<p>Sometimes <a href="/blog/2018/11/15/">targeting Windows while working from another platform</a>
is sufficient, but other times I must work on the platform itself. There
<a href="/blog/2016/06/13/">are various options available</a> for C development, and I’ve
finally formalized my own development kit: <a href="https://github.com/skeeto/w64devkit"><strong>w64devkit</strong></a>.</p>

<!--more-->

<p>For most users, the value is in the <strong>78MiB .zip</strong> available in the
“Releases” on GitHub. This (relatively) small package includes a
state-of-the-art C and C++ compiler (<a href="http://mingw-w64.org/">latest GCC</a>), a <a href="https://www.vim.org/">powerful
text editor</a>, <a href="https://www.gnu.org/software/gdb/">debugger</a>, a <a href="https://www.nasm.us/">complete x86 assembler</a>,
and <a href="https://frippery.org/busybox/">miniature unix environment</a>. It’s “portable” in that there’s no
installation. Just unzip it and start using it in place. With w64devkit,
it literally takes a few seconds on any Windows system to get up and running
with a fully-featured, fully-equipped, first-class <a href="https://sanctum.geek.nz/arabesque/unix-as-ide-introduction/">development
environment</a>.</p>

<p>The development kit is cross-compiled entirely from source using Docker,
though Docker is not needed to actually use it. The repository is just a
Dockerfile and some documentation. The only build dependency is Docker
itself. It’s also easy to customize it for your own personal use, or to
audit and build your own if, for whatever reason, you didn’t trust my
distribution. This is in stark contrast to Windows builds of most open
source software where the build process is typically undocumented,
under-documented, obtuse, or very complicated.</p>

<h3 id="from-script-to-docker">From script to Docker</h3>

<p>Publishing this is not necessarily a commitment to always keep w64devkit
up to date, but this Dockerfile <em>is</em> derived from (and replaces) a shell
script I’ve been using continuously <a href="/blog/2018/04/13/#a-better-alternative">for over two years now</a>. In
this period, every time GCC has made a release, I’ve built myself a new
development kit, so I’m already in the habit.</p>

<p>I’ve been using Docker on and off for about 18 months now. It’s an
oddball in that it’s something I learned on the job rather than on my own
time. I formed an early impression that still basically holds: <strong>The
main purpose of Docker is to contain and isolate misbehaved software to
improve its reliability</strong>. Well-behaved, well-designed software benefits
little from containers.</p>

<p>My unusual application of Docker here is no exception. <a href="/blog/2017/03/30/">Most software
builds are needlessly complicated and fragile</a>, especially
Autoconf-based builds. Ironically, the worst configure scripts I’ve
dealt with come from GNU projects. They waste time on superfluous checks
(“Does your compiler define <code class="language-plaintext highlighter-rouge">size_t</code>?”) then produce a build that
doesn’t work anyway because you’re doing something slightly unusual.
Worst of all, despite my best efforts, the build will be contaminated by
the state of the system doing the build.</p>

<p>My original build script was fragile by extension. It would work on one
system, but not another due to some subtle environment change — a
slightly different system header that reveals a build system bug
(<a href="https://gcc.gnu.org/legacy-ml/gcc/2017-05/msg00219.html">example in GCC</a>), or the system doesn’t have a file at a certain
hard-coded absolute path that shouldn’t be hard-coded. Converting my
script to a Dockerfile locks these problems in place and makes builds
much more reliable and repeatable. The misbehavior is contained and
isolated by Docker.</p>

<p>Unfortunately it’s not <em>completely</em> contained. In each case I use make’s
<code class="language-plaintext highlighter-rouge">-j</code> option to parallelize the build since otherwise it would take
hours. Some of the builds have subtle race conditions, and some bad luck
in timing can cause a build to fail. Docker is good about picking up
where it left off, so it’s just a matter of trying again.</p>

<p>In one case a build failed because Bison and flex were not installed
even though they’re not normally needed. Some dependency isn’t expressed
correctly, and unlucky ordering leads to an unused <code class="language-plaintext highlighter-rouge">.y</code> file having the
wrong timestamp. Ugh. I’ve had this happen a lot more in Docker than
out, probably because file system operations are slow inside Docker and
it creates greater timing variance.</p>

<h3 id="other-tools">Other tools</h3>

<p>The README explains some of my decisions, but I’ll summarize a few here:</p>

<ul>
  <li>
    <p>Git. Important and useful, so I’d love to have it. But it has a weird
installation (many <a href="https://github.com/skeeto/w64devkit/issues/1">.zip-unfriendly symlinks</a>) tightly-coupled
with msys2, and its build system does not support cross-compilation.
I’d love to see a clean, straightforward rewrite of Git in a single,
appropriate implementation language. Imagine installing the latest Git
with <code class="language-plaintext highlighter-rouge">go get git-scm.com/git</code>. (<em>Update</em>: <a href="https://github.com/libgit2/libgit2/pull/5507">libgit2 is working on
it</a>!)</p>
  </li>
  <li>
    <p>Bash. It’s a much nicer interactive shell than BusyBox-w32 <code class="language-plaintext highlighter-rouge">ash</code>. But
the build system doesn’t support cross-compilation, and I’m not sure
it supports Windows without some sort of compatibility layer anyway.</p>
  </li>
  <li>
    <p>Emacs. Another powerful editor. But the build system doesn’t support
cross-compilation. It’s also <em>way</em> too big.</p>
  </li>
  <li>
    <p>Go. Tempting to toss it in, but <a href="/blog/2020/01/21/">Go already does this all correctly
and effectively</a>. It simply doesn’t require a specialized
distribution. It’s trivial to manage a complete Go toolchain with
nothing but Go itself on any system. People may say its language
design comes from the 1970s, but the tooling is decades ahead of
everyone else.</p>
  </li>
</ul>

<h3 id="alternatives">Alternatives</h3>

<p>For a long, long time Cygwin filled this role for me. However, I never
liked its bulky nature, the complete opposite of portable. Cygwin
processes always felt second-class on Windows, particularly in that it
has its own view of the file system compared to other Windows processes.
They could never fully cooperate. I also don’t like that there’s no
toolchain for cross-compiling with Cygwin as a target — e.g. compile
Cygwin binaries from Linux. Finally, <a href="/blog/2017/11/30/">it’s been essentially obsoleted by
WSL</a> which matches or surpasses it on every front.</p>

<p>There’s msys and <a href="https://www.msys2.org/">msys2</a>, which are a bit lighter. However, I’m
still in an isolated, second-class environment with weird path
translation issues. These tools <em>do</em> have important uses, and they’re the
only way to compile most open source software natively on Windows. For
those builds that don’t support cross-compilation, they’re <em>the</em> only path
for producing Windows builds. It’s just not what I’m looking for when
developing my own software.</p>

<p><em>Update</em>: <a href="https://github.com/mstorsjo/llvm-mingw">llvm-mingw</a> is an eerily similar project using Docker
the same way, but instead builds LLVM.</p>

<h3 id="using-docker-for-other-builds">Using Docker for other builds</h3>

<p>I also <a href="https://github.com/skeeto/gnupg-windows-build">converted my GnuPG build script</a> to a Dockerfile. Of
course I don’t plan to actually <em>use</em> GnuPG on Windows. I just need it
<a href="/blog/2019/07/10/">for passphrase2pgp</a>, which I test against GnuPG. This tests the
Windows build.</p>

<p>In the future I may extend this idea to a few other tools I don’t intend
to include with w64devkit. If you have something in mind, you could use
my Dockerfiles as a kind of starter template.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Chunking Optimizations: Let the Knife Do the Work</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/12/09/"/>
    <id>urn:uuid:961086fa-46af-42d4-bd69-6f4a326a1505</id>
    <updated>2019-12-09T22:37:55Z</updated>
    <category term="c"/><category term="cpp"/><category term="optimization"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p>There’s an old saying, <a href="https://www.youtube.com/watch?v=bTee6dKpDB0"><em>let the knife do the work</em></a>. Whether
preparing food in the kitchen or whittling a piece of wood, don’t push
your weight into the knife. Not only is it tiring, you’re much more
likely to hurt yourself. Use the tool properly and little force will be
required.</p>

<p>The same advice also often applies to compilers.</p>

<p>Suppose you need to XOR two non-overlapping 64-byte (512-bit) blocks of
data. The simplest approach would be to do it a byte at a time:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* XOR src into dst */</span>
<span class="kt">void</span>
<span class="nf">xor512a</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">dst</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">src</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pd</span> <span class="o">=</span> <span class="n">dst</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps</span> <span class="o">=</span> <span class="n">src</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">pd</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="n">ps</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Maybe you benchmark it or you look at the assembly output, and the
results are disappointing. Your compiler did <em>exactly</em> what you asked
of it and produced code that performs 64 single-byte XOR operations
(GCC 9.2.0, x86-64, <code class="language-plaintext highlighter-rouge">-Os</code>):</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">xor512a:</span>
        <span class="nf">xor</span>    <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nl">.L0:</span>    <span class="nf">mov</span>    <span class="nb">cl</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="nb">rax</span><span class="p">]</span>
        <span class="nf">xor</span>    <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="nb">rax</span><span class="p">],</span> <span class="nb">cl</span>
        <span class="nf">inc</span>    <span class="nb">rax</span>
        <span class="nf">cmp</span>    <span class="nb">rax</span><span class="p">,</span> <span class="mi">64</span>
        <span class="nf">jne</span>    <span class="nv">.L0</span>
        <span class="nf">ret</span>
</code></pre></div></div>

<p>The target architecture has wide registers so it could be doing <em>at
least</em> 8 bytes at a time. Since your compiler isn’t doing it, you
decide to chunk the work into 8-byte blocks yourself in an attempt to
manually implement a <em>chunking operation</em>. Here’s some <a href="https://old.reddit.com/r/C_Programming/comments/e83jzk/strange_gcc_compiler_bug_when_using_o2_or_higher/">real world
code</a> that does so:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* WARNING: Broken, do not use! */</span>
<span class="kt">void</span>
<span class="nf">xor512b</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">dst</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">src</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">pd</span> <span class="o">=</span> <span class="n">dst</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">ps</span> <span class="o">=</span> <span class="n">src</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">pd</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="n">ps</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You check the assembly output of this function, and it looks much
better. It’s now processing 8 bytes at a time, so it should be about 8
times faster than before.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">xor512b:</span>
        <span class="nf">xor</span>    <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nl">.L0:</span>    <span class="nf">mov</span>    <span class="nb">rcx</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="nb">rax</span><span class="o">*</span><span class="mi">8</span><span class="p">]</span>
        <span class="nf">xor</span>    <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="nb">rax</span><span class="o">*</span><span class="mi">8</span><span class="p">],</span> <span class="nb">rcx</span>
        <span class="nf">inc</span>    <span class="nb">rax</span>
        <span class="nf">cmp</span>    <span class="nb">rax</span><span class="p">,</span> <span class="mi">8</span>
        <span class="nf">jne</span>    <span class="nv">.L0</span>
        <span class="nf">ret</span>
</code></pre></div></div>

<p>Still, this machine has 16-byte wide registers (SSE2 <code class="language-plaintext highlighter-rouge">xmm</code>), so there
could be another doubling in speed. Oh well, this is good enough, so you
plug it into your program. But something strange happens: <strong>The output
is now wrong!</strong></p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">dst</span><span class="p">[</span><span class="mi">32</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span>
        <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">14</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">16</span>
    <span class="p">};</span>
    <span class="kt">uint32_t</span> <span class="n">src</span><span class="p">[</span><span class="mi">32</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">36</span><span class="p">,</span> <span class="mi">49</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span>
        <span class="mi">81</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">121</span><span class="p">,</span> <span class="mi">144</span><span class="p">,</span> <span class="mi">169</span><span class="p">,</span> <span class="mi">196</span><span class="p">,</span> <span class="mi">225</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span>
    <span class="p">};</span>
    <span class="n">xor512b</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="n">src</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">dst</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Your program prints 1..16 as if <code class="language-plaintext highlighter-rouge">xor512b()</code> was never called. You check
over everything a dozen times, and you can’t find anything wrong. Even
crazier, if you disable optimizations then the bug goes away. It must be
some kind of compiler bug!</p>

<p>Investigating a bit more, you learn that the <code class="language-plaintext highlighter-rouge">-fno-strict-aliasing</code>
option also fixes the bug. That’s because this program violates C strict
aliasing rules. An array of <code class="language-plaintext highlighter-rouge">uint32_t</code> was accessed as a <code class="language-plaintext highlighter-rouge">uint64_t</code>. As
an <a href="/blog/2018/07/20/#strict-aliasing">important optimization</a>, compilers are allowed to assume such
variables do not alias and generate code accordingly. Otherwise every
memory store could potentially modify any variable, which limits the
compiler’s ability to produce decent code.</p>

<p>The original version is fine because <code class="language-plaintext highlighter-rouge">char *</code>, including both <code class="language-plaintext highlighter-rouge">signed</code>
and <code class="language-plaintext highlighter-rouge">unsigned</code>, has a special exemption and may alias with anything. For
the same reason, using <code class="language-plaintext highlighter-rouge">char *</code> unnecessarily can also make your
programs slower.</p>

<p>What could you do to keep the chunking operation while not running afoul
of strict aliasing? Counter-intuitively, you could use <code class="language-plaintext highlighter-rouge">memcpy()</code>. Copy
the chunks into legitimate, local <code class="language-plaintext highlighter-rouge">uint64_t</code> variables, do the work, and
copy the result back out.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">xor512c</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">dst</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">src</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">uint64_t</span> <span class="n">buf</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span> <span class="o">+</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">dst</span> <span class="o">+</span> <span class="n">i</span><span class="o">*</span><span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">src</span> <span class="o">+</span> <span class="n">i</span><span class="o">*</span><span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
        <span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">^=</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
        <span class="n">memcpy</span><span class="p">((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">dst</span> <span class="o">+</span> <span class="n">i</span><span class="o">*</span><span class="mi">8</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">memcpy()</code> is a built-in function, your compiler knows its
semantics and can ultimately elide all that copying. The assembly
listing for <code class="language-plaintext highlighter-rouge">xor512c</code> is identical to <code class="language-plaintext highlighter-rouge">xor512b</code>, but it won’t go haywire
when integrated into a real program.</p>

<p>It works and it’s correct, but you can still do much better than this!</p>

<h3 id="letting-your-compiler-do-the-work">Letting your compiler do the work</h3>

<p>The problem is you’re forcing the knife and not letting it do the work.
There’s a constraint on your compiler that hasn’t been considered: It
must work correctly for overlapping inputs.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">74</span><span class="p">]</span> <span class="o">=</span> <span class="p">{...};</span>
<span class="n">xor512a</span><span class="p">(</span><span class="n">buf</span> <span class="o">+</span> <span class="mi">10</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
</code></pre></div></div>

<p>In this situation, the byte-by-byte version reads source bytes that its
own earlier stores have already modified, while a version working in
chunks wider than the 10-byte offset loads those bytes before they
change, so the two produce different results. That’s exactly why your
compiler can’t do the chunking operation itself. However, <em>you don’t
care about this situation</em> because the inputs never overlap.</p>

<p>Let’s revisit the first, simple implementation, but this time being
smarter about it. The <code class="language-plaintext highlighter-rouge">restrict</code> keyword indicates that the inputs
will not overlap, freeing your compiler of this unwanted concern.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">xor512d</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">dst</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">src</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pd</span> <span class="o">=</span> <span class="n">dst</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">ps</span> <span class="o">=</span> <span class="n">src</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">pd</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">^=</span> <span class="n">ps</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>(Side note: Adding <code class="language-plaintext highlighter-rouge">restrict</code> to the manually chunked function,
<code class="language-plaintext highlighter-rouge">xor512b()</code>, will not fix it. Using <code class="language-plaintext highlighter-rouge">restrict</code> can never make an
incorrect program correct.)</p>

<p>Compiled with GCC 9.2.0 and <code class="language-plaintext highlighter-rouge">-O3</code>, the resulting unrolled code
processes 16-byte chunks at a time (<code class="language-plaintext highlighter-rouge">pxor</code>):</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">xor512d:</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x00</span><span class="p">]</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm1</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="mh">0x00</span><span class="p">]</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm2</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="mh">0x10</span><span class="p">]</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm3</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="mh">0x20</span><span class="p">]</span>
        <span class="nf">pxor</span>    <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm1</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm4</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x30</span><span class="p">]</span>
        <span class="nf">movups</span>  <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x00</span><span class="p">],</span> <span class="nv">xmm0</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x10</span><span class="p">]</span>
        <span class="nf">pxor</span>    <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm2</span>
        <span class="nf">movups</span>  <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x10</span><span class="p">],</span> <span class="nv">xmm0</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x20</span><span class="p">]</span>
        <span class="nf">pxor</span>    <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm3</span>
        <span class="nf">movups</span>  <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x20</span><span class="p">],</span> <span class="nv">xmm0</span>
        <span class="nf">movdqu</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="o">+</span><span class="mh">0x30</span><span class="p">]</span>
        <span class="nf">pxor</span>    <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm4</span>
        <span class="nf">movups</span>  <span class="p">[</span><span class="nb">rdi</span><span class="o">+</span><span class="mh">0x30</span><span class="p">],</span> <span class="nv">xmm0</span>
        <span class="nf">ret</span>
</code></pre></div></div>

<p>Compiled with Clang 9.0.0 with AVX-512 enabled in the target
(<code class="language-plaintext highlighter-rouge">-mavx512bw</code>), <em>it does the entire operation in a single, big chunk!</em></p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">xor512d:</span>
        <span class="nf">vmovdqu64</span>   <span class="nv">zmm0</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdi</span><span class="p">]</span>
        <span class="nf">vpxorq</span>      <span class="nv">zmm0</span><span class="p">,</span> <span class="nv">zmm0</span><span class="p">,</span> <span class="p">[</span><span class="nb">rsi</span><span class="p">]</span>
        <span class="nf">vmovdqu64</span>   <span class="p">[</span><span class="nb">rdi</span><span class="p">],</span> <span class="nv">zmm0</span>
        <span class="nf">vzeroupper</span>
        <span class="nf">ret</span>
</code></pre></div></div>

<p>“Letting the knife do the work” means writing a correct program and
lifting unnecessary constraints so that the compiler can use whatever
chunk size is appropriate for the target.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Day I Fell in Love with Fuzzing</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/01/25/"/>
    <id>urn:uuid:9ab4d645-222e-37f6-0d41-6db1e5c126c6</id>
    <updated>2019-01-25T21:52:45Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p><em>Follow-up: <a href="/blog/2025/02/05/">Tips for more effective fuzz testing with AFL++</a></em></p>

<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=19019048">on Hacker News</a> and <a href="https://old.reddit.com/r/programming/comments/akrcyp/the_day_i_fell_in_love_with_fuzzing/">on reddit</a>.</em></p>

<p>In 2007 I wrote a pair of modding tools, <a href="https://github.com/skeeto/binitools">binitools</a>, for a space
trading and combat simulation game named <a href="https://en.wikipedia.org/wiki/Freelancer_(video_game)"><em>Freelancer</em></a>. The game
stores its non-art assets in the format of “binary INI” files, or “BINI”
files. The motivation for the binary format over traditional INI files
was probably performance: it’s faster to load and read these files than
it is to parse arbitrary text in INI format.</p>

<!--more-->

<p>Much of the in-game content can be changed simply by modifying these
files: changing item names, editing commodity prices, tweaking ship
statistics, or even adding new ships to the game. The binary nature
makes them unsuitable for in-place modification, so the natural approach
is to convert them to text INI files, make the desired modifications
using a text editor, then convert back to the BINI format and replace
the file in the game’s installation.</p>

<p>I didn’t reverse engineer the BINI format, nor was I the first person
to create tools to edit them. The existing tools weren’t to my tastes,
and I had my own vision for how they should work — an interface more
closely following <a href="http://www.catb.org/esr/writings/taoup/html/">the Unix tradition</a> despite the target being a
Windows game.</p>

<p>When I got started, I had just learned how to use <a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html">yacc</a> (really
<a href="https://www.gnu.org/software/bison/">Bison</a>) and <a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html">lex</a> (really <a href="https://github.com/westes/flex">flex</a>), as well as
Autoconf, so I went all-in with these newly-discovered tools. It was
exciting to try them out in a real-world situation, though I slavishly
aped the practices of other open source projects without really
understanding why things were the way they were. Due to the use of
yacc/lex and the configure script build, compiling the project required
a full, Unix-like environment. This is all visible in <a href="https://github.com/skeeto/binitools/tree/original">the original
version of the source</a>.</p>

<p>The project was moderately successful in two ways. First, I was able to
use the tools to modify the game. Second, other people were using the
tools, since the binaries I built show up in various collections of
Freelancer modding tools online.</p>

<h3 id="the-rewrite">The Rewrite</h3>

<p>That’s the way things were until mid-2018 when I revisited the project.
Ever look at your own old code and wonder what the heck you were
thinking? My INI format was far more rigid and strict than necessary, I
was doing questionable things when writing out binary data, and the
build wasn’t even working correctly.</p>

<p>With an additional decade of experience under my belt, I knew I could do
<em>way</em> better if I were to rewrite these tools today. So, over the course
of a few days, I did, from scratch. That’s what’s visible in the master
branch today.</p>

<p><a href="/blog/2017/03/30/">I like to keep things simple</a> which meant no more Autoconf, and
instead <a href="/blog/2017/08/20/">a simple, portable Makefile</a>. No more yacc or lex, and
instead a hand-coded parser. Using only conforming, portable C. The
result was so simple that I can <a href="/blog/2016/06/13/">build using Visual Studio</a> in a
single, short command, so the Makefile isn’t all that necessary. With
one small tweak (replace <code class="language-plaintext highlighter-rouge">stdint.h</code> with a <code class="language-plaintext highlighter-rouge">typedef</code>), I can even <a href="/blog/2018/04/13/">build
and run binitools in DOS</a>.</p>

<p>The new version is faster, leaner, cleaner, and simpler. It’s far more
flexible about its INI input, so it’s easier to use. But is it more
correct?</p>

<h3 id="fuzzing">Fuzzing</h3>

<p>I’ve been interested in <a href="https://labs.mwrinfosecurity.com/blog/what-the-fuzz/">fuzzing</a> for years, especially
<a href="http://lcamtuf.coredump.cx/afl/">american fuzzy lop</a>, or <em>afl</em>. However, I wasn’t having success
with it. I’d fuzz some of the tools I use regularly, and it wouldn’t
find anything of note, at least not before I gave up. I fuzzed <a href="https://github.com/skeeto/pdjson">my
JSON library</a>, and somehow it turned up nothing. Surely my
JSON parser couldn’t be <em>that</em> robust already, could it? Fuzzing just
wasn’t accomplishing anything for me. (As it turns out, my JSON
library <em>is</em> quite robust, thanks in large part to various
contributors!)</p>

<p>So I’ve got this relatively new INI parser, and while it can
successfully parse and correctly re-assemble the game’s original set of
BINI files, it hasn’t <em>really</em> been exercised that much. Surely there’s
something in here for a fuzzer to find. Plus I don’t even have to write
a line of code in order to run afl against it. The tools already read
from standard input by default, which is perfect.</p>

<p>Assuming you’ve got the necessary tools installed (make, gcc, afl),
here’s how easy it is to start fuzzing binitools:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make CC=afl-gcc
$ mkdir in out
$ echo '[x]' &gt; in/empty
$ afl-fuzz -i in -o out -- ./bini
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">bini</code> utility takes INI as input and produces BINI as output, so
it’s far more interesting to fuzz than its inverse, <code class="language-plaintext highlighter-rouge">unbini</code>. Since
<code class="language-plaintext highlighter-rouge">unbini</code> parses relatively simple binary data, there are (probably) no
bugs for the fuzzer to find. I did try anyway just in case.</p>

<p><img src="/img/screenshot/afl.png" alt="" /></p>

<p>In my example above, I swapped out the default compiler for afl’s GCC
wrapper (<code class="language-plaintext highlighter-rouge">CC=afl-gcc</code>). It calls GCC in the background, but in doing so
adds its own instrumentation to the binary. When fuzzing, <code class="language-plaintext highlighter-rouge">afl-fuzz</code>
uses that instrumentation to monitor the program’s execution path. The
<a href="http://lcamtuf.coredump.cx/afl/technical_details.txt">afl whitepaper</a> explains the technical details.</p>

<p>I also created input and output directories, placing a minimal, working
example into the input directory, which gives afl a starting point. As
afl runs, it mutates a queue of inputs and observes the changes on the
program’s execution. The output directory contains the results and, more
importantly, a corpus of inputs that cause unique execution paths. In
other words, the fuzzer output will be lots of inputs that exercise many
different edge cases.</p>

<p>The most exciting and dreaded result is a crash. The first time I ran it
against binitools, <code class="language-plaintext highlighter-rouge">bini</code> had <em>many</em> such crashes. Within minutes, afl
was finding a number of subtle and interesting bugs in my program, which
was <em>incredibly</em> useful. It even discovered an unlikely <a href="https://github.com/skeeto/binitools/commit/b695aec7d0021299cbd83c8c6983055f16d11507">stale pointer
bug</a> by exercising different orderings for various memory
allocations. This particular bug was the turning point that made me
realize the value of fuzzing.</p>

<p>Not all the bugs it found led to crashes. I also combed through the
outputs to see what sorts of inputs were succeeding, what was failing,
and observe how my program handled various edge cases. It was rejecting
some inputs I thought should be valid, accepting some I thought should
be invalid, and interpreting some in ways I hadn’t intended. So even
after I fixed the crashing inputs, I still made tweaks to the parser to
fix each of these troublesome inputs.</p>

<h3 id="building-a-test-suite">Building a test suite</h3>

<p>Once I combed out all the fuzzer-discovered bugs, and I agreed with the
parser on how all the various edge cases should be handled, I turned the
fuzzer’s corpus into a test suite — though not directly.</p>

<p>I had run the fuzzer in parallel — a process that is explained in the
afl documentation — so I had lots of redundant inputs. By redundant I
mean that the inputs are different but have the same execution path.
Fortunately afl has a tool to deal with this: <code class="language-plaintext highlighter-rouge">afl-cmin</code>, the corpus
minimization tool. It eliminates all the redundant inputs.</p>

<p>On top of that, many of these inputs were longer than necessary in order to
invoke their unique execution path. There’s <code class="language-plaintext highlighter-rouge">afl-tmin</code>, the test case
minimizer, which I used to further shrink my test corpus.</p>

<p>I sorted the valid from invalid inputs and checked them into the
repository. Have a look at all the wacky inputs <a href="https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html">invented by</a> the
fuzzer starting from my single, minimal input:</p>

<ul>
  <li><a href="https://github.com/skeeto/binitools/tree/master/tests/valid">valid inputs</a></li>
  <li><a href="https://github.com/skeeto/binitools/tree/master/tests/invalid">invalid inputs</a></li>
</ul>

<p>This essentially locks down the parser, and the test suite ensures a
particular build behaves in a <em>very</em> specific way. This is most useful
for ensuring that builds on other platforms and by other compilers are
indeed behaving identically with respect to their outputs. My test suite
even revealed a bug in diet libc, as binitools doesn’t pass the tests
when linked against it. If I were to make non-trivial changes to the
parser, I’d essentially need to scrap the current test suite and start
over, having afl generate an entire new corpus for the new parser.</p>

<p>Fuzzing has certainly proven itself to be a powerful technique. It found
a number of bugs that I likely wouldn’t have otherwise discovered on my
own. I’ve since gotten more savvy on its use and have used it on other
software — not just software I’ve written myself — and discovered more
bugs. It’s got a permanent slot on my software developer toolbelt.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Value of Undefined Behavior</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/07/20/"/>
    <id>urn:uuid:9758e9ea-46b6-3904-5166-52c7e6922892</id>
    <updated>2018-07-20T21:31:18Z</updated>
    <category term="c"/><category term="cpp"/><category term="x86"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>In several places, the C and C++ language specifications use a
curious, and fairly controversial, phrase: <em>undefined behavior</em>. For
certain program constructs, the specification prescribes no specific
behavior, instead allowing <a href="http://www.catb.org/jargon/html/N/nasal-demons.html">anything to happen</a>. Such constructs
are considered erroneous, and so the result depends on the particulars
of the platform and implementation. The original purpose of undefined
behavior was for implementation flexibility. In other words, it’s
slack that allows a compiler to produce appropriate and efficient code
for its target platform.</p>

<p>Specifying a particular behavior would have put unnecessary burden on
implementations — especially in the earlier days of computing — making
for inefficient programs on some platforms. For example, if the result
of dereferencing a null pointer was defined to trap — to cause the
program to halt with an error — then platforms that do not have
hardware trapping, such as those without virtual memory, would be
required to instrument, in software, each pointer dereference.</p>

<p>In the 21st century, undefined behavior has taken on a somewhat
different meaning. Optimizers use it — or <em>ab</em>use it depending on your
point of view — to lift <a href="/blog/2016/12/22/">constraints</a> that would otherwise
inhibit more aggressive optimizations. It’s not so much a
fundamentally different application of undefined behavior, but it does
take the concept to an extreme.</p>

<p>The reasoning works like this: A program that evaluates a construct
whose behavior is undefined cannot, by definition, have any meaningful
behavior, and so that program would be useless. As a result,
<a href="http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html">compilers assume programs never invoke undefined behavior</a> and
use those assumptions to prove their optimizations valid.</p>

<p>Under this newer interpretation, mistakes involving undefined behavior
are more <a href="https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html">punishing</a> and <a href="/blog/2018/05/01/">surprising</a> than before. Programs
that <em>seem</em> to make some sense when run on a particular architecture may
actually compile into a binary with a security vulnerability due to
conclusions reached from an analysis of its undefined behavior.</p>

<p>This can be frustrating if your programs are intended to run on a very
specific platform. In this situation, all behavior really <em>could</em> be
locked down and specified in a reasonable, predictable way. Such a
language would be like an extended, less portable version of C or C++.
But your toolchain still insists on running your program on the
<em>abstract machine</em> rather than the hardware you actually care about.
However, <strong>even in this situation undefined behavior can still be
desirable</strong>. I will provide a couple of examples in this article.</p>

<h3 id="signed-integer-overflow">Signed integer overflow</h3>

<p>To start things off, let’s look at one of my all-time favorite examples
of useful undefined behavior, a situation involving signed integer
overflow. The result of a signed integer overflow isn’t just
unspecified, it’s undefined behavior. Full stop.</p>

<p>This goes beyond a simple matter of whether or not the underlying
machine uses a two’s complement representation. From the perspective of
the abstract machine, just the act of a signed integer overflowing is
enough to throw everything out the window, even if the overflowed result
is never actually used in the program.</p>

<p>On the other hand, unsigned integer overflow is defined — or, more
accurately, defined to wrap, <em>not</em> overflow. Both the undefined signed
overflow and defined unsigned overflow are useful in different
situations.</p>
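
<p>As a minimal sketch of the difference, the two helpers below (both names are hypothetical, not from any library) show defined unsigned wraparound next to a signed addition that checks its operands <em>before</em> adding, since inspecting the result after an overflow would already be too late:</p>

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
unsigned
wrapping_inc(unsigned x)
{
    return x + 1;  /* for x == UINT_MAX the result is 0, by definition */
}

/* Signed overflow is undefined, so test the operands first. Stores the
 * sum and returns true only when a + b is representable as an int. */
bool
checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;  /* would overflow: no addition is performed */
    *sum = a + b;
    return true;
}
```

<p>A caller that cannot rule out overflow would use something like <code class="language-plaintext highlighter-rouge">checked_add()</code>; a caller with a narrow contract can skip the check and let the compiler assume the overflow never happens.</p>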

<p>For example, here’s a fairly common situation, much like what <a href="https://www.youtube.com/watch?v=yG1OZ69H_-o&amp;t=38m18s">actually
happened in bzip2</a>. Consider this function that does substring
comparison:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">cmp_signed</span><span class="p">(</span><span class="kt">int</span> <span class="n">i1</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i2</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">c1</span> <span class="o">=</span> <span class="n">buf</span><span class="p">[</span><span class="n">i1</span><span class="p">];</span>
        <span class="kt">int</span> <span class="n">c2</span> <span class="o">=</span> <span class="n">buf</span><span class="p">[</span><span class="n">i2</span><span class="p">];</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">c1</span> <span class="o">!=</span> <span class="n">c2</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">c1</span> <span class="o">-</span> <span class="n">c2</span><span class="p">;</span>
        <span class="n">i1</span><span class="o">++</span><span class="p">;</span>
        <span class="n">i2</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">int</span>
<span class="nf">cmp_unsigned</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">i1</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="n">i2</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">c1</span> <span class="o">=</span> <span class="n">buf</span><span class="p">[</span><span class="n">i1</span><span class="p">];</span>
        <span class="kt">int</span> <span class="n">c2</span> <span class="o">=</span> <span class="n">buf</span><span class="p">[</span><span class="n">i2</span><span class="p">];</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">c1</span> <span class="o">!=</span> <span class="n">c2</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">c1</span> <span class="o">-</span> <span class="n">c2</span><span class="p">;</span>
        <span class="n">i1</span><span class="o">++</span><span class="p">;</span>
        <span class="n">i2</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In both functions, the indices <code class="language-plaintext highlighter-rouge">i1</code> and <code class="language-plaintext highlighter-rouge">i2</code> will always be small,
non-negative values. Since they’re non-negative, they should be <code class="language-plaintext highlighter-rouge">unsigned</code>,
right? Not necessarily. That puts an extra constraint on code generation
and, at least on x86-64, makes for a less efficient function. Most of
the time you actually <em>don’t</em> want overflow to be defined, and instead
allow the compiler to assume it just doesn’t happen.</p>

<p>The constraint is that <strong>the behavior of <code class="language-plaintext highlighter-rouge">i1</code> or <code class="language-plaintext highlighter-rouge">i2</code> overflowing as an
unsigned integer is defined, and the compiler is obligated to implement
that behavior.</strong> On x86-64, where <code class="language-plaintext highlighter-rouge">int</code> is 32 bits, the result of the
operation must be truncated to 32 bits one way or another, requiring
extra instructions inside the loop.</p>

<p>In the signed case, incrementing the integers cannot overflow since that
would be undefined behavior. This permits the compiler to perform the
increment only in 64-bit precision without truncation if it would be
more efficient, which, in this case, it is.</p>

<p>Here’s the output of Clang 6.0.0 with <code class="language-plaintext highlighter-rouge">-Os</code> on x86-64. Pay close
attention to the main loop, which I named <code class="language-plaintext highlighter-rouge">.loop</code>:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">cmp_signed:</span>
        <span class="nf">movsxd</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">edi</span>             <span class="c1">; use i1 as a 64-bit integer</span>
        <span class="nf">mov</span>    <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rdi</span><span class="p">]</span>
        <span class="nf">movsxd</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">esi</span>             <span class="c1">; use i2 as a 64-bit integer</span>
        <span class="nf">mov</span>    <span class="nb">cl</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rsi</span><span class="p">]</span>
        <span class="nf">jmp</span>    <span class="nv">.check</span>

<span class="nl">.loop:</span>  <span class="nf">mov</span>    <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rdi</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
        <span class="nf">mov</span>    <span class="nb">cl</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rsi</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
        <span class="nf">inc</span>    <span class="nb">rdx</span>                  <span class="c1">; increment only the base pointer</span>
<span class="nl">.check:</span> <span class="nf">cmp</span>    <span class="nb">al</span><span class="p">,</span> <span class="nb">cl</span>
        <span class="nf">je</span>     <span class="nv">.loop</span>

        <span class="nf">movzx</span>  <span class="nb">eax</span><span class="p">,</span> <span class="nb">al</span>
        <span class="nf">movzx</span>  <span class="nb">ecx</span><span class="p">,</span> <span class="nb">cl</span>
        <span class="nf">sub</span>    <span class="nb">eax</span><span class="p">,</span> <span class="nb">ecx</span>             <span class="c1">; return c1 - c2</span>
        <span class="nf">ret</span>

<span class="nl">cmp_unsigned:</span>
        <span class="nf">mov</span>    <span class="nb">eax</span><span class="p">,</span> <span class="nb">edi</span>
        <span class="nf">mov</span>    <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rax</span><span class="p">]</span>
        <span class="nf">mov</span>    <span class="nb">ecx</span><span class="p">,</span> <span class="nb">esi</span>
        <span class="nf">mov</span>    <span class="nb">cl</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rcx</span><span class="p">]</span>
        <span class="nf">cmp</span>    <span class="nb">al</span><span class="p">,</span> <span class="nb">cl</span>
        <span class="nf">jne</span>    <span class="nv">.ret</span>
        <span class="nf">inc</span>    <span class="nb">edi</span>
        <span class="nf">inc</span>    <span class="nb">esi</span>

<span class="nl">.loop:</span>  <span class="nf">mov</span>    <span class="nb">eax</span><span class="p">,</span> <span class="nb">edi</span>             <span class="c1">; truncated i1 overflow</span>
        <span class="nf">mov</span>    <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rax</span><span class="p">]</span>
        <span class="nf">mov</span>    <span class="nb">ecx</span><span class="p">,</span> <span class="nb">esi</span>             <span class="c1">; truncated i2 overflow</span>
        <span class="nf">mov</span>    <span class="nb">cl</span><span class="p">,</span> <span class="p">[</span><span class="nb">rdx</span> <span class="o">+</span> <span class="nb">rcx</span><span class="p">]</span>
        <span class="nf">inc</span>    <span class="nb">edi</span>                  <span class="c1">; increment i1</span>
        <span class="nf">inc</span>    <span class="nb">esi</span>                  <span class="c1">; increment i2</span>
        <span class="nf">cmp</span>    <span class="nb">al</span><span class="p">,</span> <span class="nb">cl</span>
        <span class="nf">je</span>     <span class="nv">.loop</span>

<span class="nl">.ret:</span>   <span class="nf">movzx</span>  <span class="nb">eax</span><span class="p">,</span> <span class="nb">al</span>
        <span class="nf">movzx</span>  <span class="nb">ecx</span><span class="p">,</span> <span class="nb">cl</span>
        <span class="nf">sub</span>    <span class="nb">eax</span><span class="p">,</span> <span class="nb">ecx</span>
        <span class="nf">ret</span>
</code></pre></div></div>

<p>As unsigned values, <code class="language-plaintext highlighter-rouge">i1</code> and <code class="language-plaintext highlighter-rouge">i2</code> can overflow independently, so they
have to be handled as independent 32-bit unsigned integers. As signed
values they can’t overflow, so they’re treated as if they were 64-bit
integers and, instead, the pointer, <code class="language-plaintext highlighter-rouge">buf</code>, is incremented without
concern for overflow. The signed loop is much more efficient (5
instructions versus 8).</p>

<p>The signed integer helps to communicate the <em>narrow contract</em> of the
function — the limited range of <code class="language-plaintext highlighter-rouge">i1</code> and <code class="language-plaintext highlighter-rouge">i2</code> — to the compiler. In a
variant of C where signed integer overflow is defined (i.e. <code class="language-plaintext highlighter-rouge">-fwrapv</code>),
this capability is lost. In fact, using <code class="language-plaintext highlighter-rouge">-fwrapv</code> deoptimizes the signed
version of this function.</p>

<p>Side note: Using <code class="language-plaintext highlighter-rouge">size_t</code> (an unsigned integer) is even better on x86-64
for this example since it’s already 64 bits and the function doesn’t
need the initial sign/zero extension. However, this might simply move
the sign extension out to the caller.</p>
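
<p>For reference, a <code class="language-plaintext highlighter-rouge">size_t</code> variant might look like the sketch below. The name <code class="language-plaintext highlighter-rouge">cmp_size</code> is hypothetical, and I haven’t included compiler output for it here. Like the two versions above, it only stops at the first mismatch, so the caller must guarantee one exists.</p>

```c
#include <stddef.h>

/* Same loop as cmp_signed/cmp_unsigned, but with indices that are
 * already pointer-sized on x86-64, so no widening is needed on entry. */
int
cmp_size(size_t i1, size_t i2, const unsigned char *buf)
{
    for (;;) {
        int c1 = buf[i1];
        int c2 = buf[i2];
        if (c1 != c2)
            return c1 - c2;
        i1++;
        i2++;
    }
}
```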

<h3 id="strict-aliasing">Strict aliasing</h3>

<p>Another controversial undefined behavior is <a href="https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8"><em>strict aliasing</em></a>.
This particular term doesn’t actually appear anywhere in the C
specification, but it’s the popular name for C’s aliasing rules. In
short, variables with types that aren’t compatible are not allowed to
alias through pointers.</p>

<p>Here’s the classic example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>    <span class="c1">// store</span>
    <span class="o">*</span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>    <span class="c1">// store</span>
    <span class="k">return</span> <span class="o">*</span><span class="n">b</span><span class="p">;</span> <span class="c1">// load</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Naively one might assume the <code class="language-plaintext highlighter-rouge">return *b</code> could be optimized to a simple
<code class="language-plaintext highlighter-rouge">return 0</code>. However, since <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> have the same type, the compiler
must consider the possibility that they alias — that they point to the
same place in memory — and must generate code that works correctly under
these conditions.</p>

<p>If <code class="language-plaintext highlighter-rouge">foo</code> has a narrow contract that forbids <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> to alias, we
have a couple of options for helping our compiler.</p>

<p>First, we could manually resolve the aliasing issue by returning 0
explicitly. In more complicated functions this might mean making local
copies of values, working only with those local copies, then storing the
results back before returning. Then aliasing would no longer matter.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="o">*</span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Second, C99 introduced a <code class="language-plaintext highlighter-rouge">restrict</code> qualifier to communicate to the
compiler that pointers passed to functions cannot alias. For example,
the pointer parameters of <code class="language-plaintext highlighter-rouge">memcpy()</code> are qualified with <code class="language-plaintext highlighter-rouge">restrict</code> as of C99.
Passing aliasing pointers through <code class="language-plaintext highlighter-rouge">restrict</code> parameters is undefined
behavior, i.e. this never happens as far as the compiler is
concerned.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">b</span><span class="p">);</span>
</code></pre></div></div>
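
<p>A full definition under that prototype could look like the following sketch. I’ve named it <code class="language-plaintext highlighter-rouge">foo_restrict</code> only to keep it apart from the earlier versions. The body is unchanged from the original; the qualifier alone is what permits the compiler to skip the reload.</p>

```c
/* With restrict, the caller promises that a and b never refer to the
 * same object, so the compiler is free to return the constant 0
 * without reloading *b. Calling this with a == b is undefined. */
int
foo_restrict(int *restrict a, int *restrict b)
{
    *b = 0;    /* store */
    *a = 1;    /* store */
    return *b; /* may be folded to 0 */
}
```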

<p>The third option is to design an interface that uses incompatible
types, exploiting strict aliasing. This happens all the time, usually
by accident. For example, <code class="language-plaintext highlighter-rouge">int</code> and <code class="language-plaintext highlighter-rouge">long</code> are never compatible even
when they have the same representation.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">long</span> <span class="o">*</span><span class="n">b</span><span class="p">);</span>
</code></pre></div></div>

<p>If you use an extended or modified version of C without strict
aliasing (<code class="language-plaintext highlighter-rouge">-fno-strict-aliasing</code>), then the compiler must assume
everything aliases all the time, generating a lot more precautionary
loads than necessary.</p>

<p>What <a href="https://lkml.org/lkml/2003/2/26/158">irritates</a> a lot of people is that compilers will still
apply the strict aliasing rule even when it’s trivial for the compiler
to prove that aliasing is occurring:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* note: forbidden */</span>
<span class="kt">long</span> <span class="n">a</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">a</span><span class="p">;</span>
</code></pre></div></div>

<p>It’s not just a simple matter of making exceptions for these cases.
The language specification would need to define all the rules about
when and where incompatible types are permitted to alias, and
developers would have to understand all these rules if they wanted to
take advantage of the exceptions. It can’t just come down to trusting
that the compiler is smart enough to see the aliasing when it’s
sufficiently simple. It would need to be carefully defined.</p>

<p>Besides, there are probably <a href="https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html">conforming, portable solutions</a>
that, with contemporary compilers, will safely compile to the efficient
code you actually want anyway.</p>
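
<p>Two common shapes of such solutions are sketched below, with hypothetical function names. The first decodes a little-endian integer from bytes using shifts rather than a pointer cast. The second copies an object’s representation with <code class="language-plaintext highlighter-rouge">memcpy()</code>, and assumes <code class="language-plaintext highlighter-rouge">float</code> is 32 bits wide.</p>

```c
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit little-endian value one byte at a time. This works
 * regardless of host byte order and involves no pointer casts. */
uint32_t
load_u32le(const unsigned char *p)
{
    return (uint32_t)p[0]       | (uint32_t)p[1] <<  8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read the bits of a float through memcpy() instead of an aliasing
 * cast such as *(uint32_t *)&f. Assumes sizeof(float) == 4. */
uint32_t
float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}
```

<p>Neither function violates the aliasing rules, and optimizing compilers generally have no trouble recognizing both patterns.</p>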

<p>There <em>is</em> one special exception for strict aliasing: <code class="language-plaintext highlighter-rouge">char *</code> is
allowed to alias with anything. This is important to keep in mind both
when you intentionally want aliasing, but also when you want to avoid
it. Writing through a <code class="language-plaintext highlighter-rouge">char *</code> pointer could force the compiler to
generate additional, unnecessary loads.</p>
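
<p>Here is a small sketch of that second effect, using hypothetical function names. In the first function, the stores go through a character pointer, which is permitted to alias <code class="language-plaintext highlighter-rouge">*len</code>, so the compiler has to allow for the length changing on every iteration. The second uses the local-copy technique mentioned earlier to sidestep the question.</p>

```c
#include <stddef.h>

/* Each store through dst may legally modify *len, so the compiler
 * must assume it can and re-read *len on every iteration. */
void
fill_reload(unsigned char *dst, const size_t *len)
{
    for (size_t i = 0; i < *len; i++)
        dst[i] = 0;
}

/* A local whose address is never taken cannot be aliased by dst, so
 * the length is loaded exactly once. */
void
fill_local(unsigned char *dst, const size_t *len)
{
    size_t n = *len;
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;
}
```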

<p>In fact, there’s a whole dimension to strict aliasing that, even today,
no compiler yet exploits: <code class="language-plaintext highlighter-rouge">uint8_t</code> is not necessarily <code class="language-plaintext highlighter-rouge">unsigned char</code>.
That’s just one possible <code class="language-plaintext highlighter-rouge">typedef</code> definition for it. It could instead
<code class="language-plaintext highlighter-rouge">typedef</code> to, say, some internal <code class="language-plaintext highlighter-rouge">__byte</code> type.</p>

<p>In other words, technically speaking, <code class="language-plaintext highlighter-rouge">uint8_t</code> does not have the strict
aliasing exemption. If you wanted to write bytes to a buffer without
worrying the compiler about aliasing issues with other pointers, this
would be the tool to accomplish it. Unfortunately there’s so much
existing code that violates this part of strict aliasing that no
toolchain is <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110">willing to exploit it</a> for optimization purposes.</p>

<h3 id="other-undefined-behaviors">Other undefined behaviors</h3>

<p>Some kinds of undefined behavior don’t have performance or portability
benefits. They’re only there to make the compiler’s job a little
simpler. Today, most of these are caught trivially at compile time as
syntax or semantic issues (e.g. a pointer cast to a float).</p>

<p>Some others are obvious about their performance benefits and don’t
require much explanation. For example, it’s undefined behavior to
index out of bounds (with some special exceptions for one past the
end), meaning compilers are not obligated to generate those checks,
instead relying on the programmer to arrange, by whatever means, that
it doesn’t happen.</p>
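
<p>When the programmer can’t arrange that, the check has to be written out by hand. A minimal sketch, with a hypothetical accessor name and signature:</p>

```c
#include <stddef.h>

/* Returns 1 and stores the element in *out when i is in bounds;
 * otherwise returns 0 and leaves *out untouched. */
int
get_checked(const int *array, size_t len, size_t i, int *out)
{
    if (i >= len)
        return 0;
    *out = array[i];
    return 1;
}
```

<p>A plain <code class="language-plaintext highlighter-rouge">array[i]</code> carries no such branch, which is exactly the cost the language lets you avoid when the index is known to be valid.</p>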

<p>Undefined behavior is like nitro, a dangerous, volatile substance that
makes things go really, really fast. You could argue that it’s <em>too</em>
dangerous to use in practice, but the aggressive use of undefined
behavior is <a href="http://thoughtmesh.net/publish/367.php">not without merit</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Building and Installing Software in $HOME</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/06/19/"/>
    <id>urn:uuid:ae490550-a3b8-3b8f-4338-c2aba7306c8f</id>
    <updated>2017-06-19T02:34:39Z</updated>
    <category term="linux"/><category term="tutorial"/><category term="debian"/><category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>For more than 5 years now I’ve kept a private “root” filesystem within
my home directory under <code class="language-plaintext highlighter-rouge">$HOME/.local/</code>. Within are the standard
<code class="language-plaintext highlighter-rouge">/usr</code> directories, such as <code class="language-plaintext highlighter-rouge">bin/</code>, <code class="language-plaintext highlighter-rouge">include/</code>, <code class="language-plaintext highlighter-rouge">lib/</code>, etc.,
containing my own software, libraries, and man pages. These are
first-class citizens, indistinguishable from the system-installed
programs and libraries. With one exception (setuid programs), none of
this requires root privileges.</p>

<p>Installing software in $HOME serves two important purposes, both of
which are indispensable to me on a regular basis.</p>

<ul>
  <li><strong>No root access</strong>: Sometimes I’m using a system administered by
someone else, and I don’t have root access.</li>
</ul>

<p>This prevents me from installing packaged software myself through the
system’s package manager. Building and installing the software myself in
my home directory, without involvement from the system administrator,
neatly works around this issue. As a software developer, it’s already
perfectly normal for me to build and run custom software, and this is
just an extension of that behavior.</p>

<p>In the most desperate situation, all I need from the sysadmin is a
decent C compiler and at least a minimal POSIX environment. I can
<a href="/blog/2016/11/17/">bootstrap anything I might need</a>, both libraries and
programs, including a better C compiler along the way. This is one
major strength of open source software.</p>

<p>I have noticed one alarming trend: Both GCC (since 4.8) and Clang are
written in C++, so it’s becoming less and less reasonable to bootstrap
a C++ compiler from a C compiler, or even from a C++ compiler that’s
more than a few years old. So you may also need your sysadmin to
supply a fairly recent C++ compiler if you want to bootstrap an
environment that includes C++. I’ve had to avoid some C++ software
(such as CMake) for this reason.</p>

<ul>
  <li><strong>Custom software builds</strong>: Even if I <em>am</em> root, I may still want to
install software not available through the package manager, a version
not available in the package manager, or a version with custom
patches.</li>
</ul>

<p>In theory this is what <code class="language-plaintext highlighter-rouge">/usr/local</code> is all about. It’s typically the
location for software not managed by the system’s package manager.
However, I think it’s cleaner to put this in <code class="language-plaintext highlighter-rouge">$HOME/.local</code>, so long
as other system users don’t need it.</p>

<p>For example, I have an installation of each version of Emacs from
24.3 (the oldest version worth supporting) through the latest stable
release, each suffixed with its version number, under <code class="language-plaintext highlighter-rouge">$HOME/.local</code>.
This is useful for quickly running a test suite under different
releases.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/skeeto/elfeed
$ cd elfeed/
$ make EMACS=emacs24.3 clean test
...
$ make EMACS=emacs25.2 clean test
...
</code></pre></div></div>

<p>Another example is NetHack, which I prefer to play with a couple of
custom patches (<a href="https://bilious.alt.org/?11">Menucolors</a>, <a href="https://gist.github.com/skeeto/11fed852dbfe9889a5fce80e9f6576ac">wchar</a>). The install to
<code class="language-plaintext highlighter-rouge">$HOME/.local</code> <a href="https://gist.github.com/skeeto/5cb9d5e774ce62655aff3507cb806981">is also captured as a patch</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar xzf nethack-343-src.tar.gz
$ cd nethack-3.4.3/
$ patch -p1 &lt; ~/nh343-menucolor.diff
$ patch -p1 &lt; ~/nh343-wchar.diff
$ patch -p1 &lt; ~/nh343-home-install.diff
$ sh sys/unix/setup.sh
$ make -j$(nproc) install
</code></pre></div></div>

<p>Normally NetHack wants to be setuid (e.g. run as the “games” user) in
order to restrict access to high scores, saves, and bones — saved levels
where a player died, to be inserted randomly into other players’ games.
This prevents cheating, but requires root to set up. Fortunately, when I
install NetHack in my home directory, this isn’t a feature I actually
care about, so I can ignore it.</p>

<p><a href="/blog/2017/06/15/">Mutt</a> is in a similar situation, since it wants to install a
special setgid program (<code class="language-plaintext highlighter-rouge">mutt_dotlock</code>) that synchronizes mailbox
access. All MUAs need something like this.</p>

<p>Everything described below is relevant to basically any modern
unix-like system: Linux, BSD, etc. I personally install software in
$HOME across a variety of systems and, fortunately, it mostly works
the same way everywhere. This is probably in large part due to
everyone standardizing around the GCC and GNU binutils interfaces,
even if the system compiler is actually LLVM/Clang.</p>

<h3 id="configuring-for-home-installs">Configuring for $HOME installs</h3>

<p>Out of the box, installing things in <code class="language-plaintext highlighter-rouge">$HOME/.local</code> won’t do anything
useful. You need to set up some environment variables in your shell
configuration (e.g. <code class="language-plaintext highlighter-rouge">.profile</code> or <code class="language-plaintext highlighter-rouge">.bashrc</code>) to tell various
programs, such as your shell, about it. The most obvious variable is
$PATH:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/bin:<span class="nv">$PATH</span>
</code></pre></div></div>

<p>Notice I put it in the front of the list. This is because I want my
home directory programs to override system programs with the same
name. For what other reason would I install a program with the same
name if not to override the system program?</p>

<p>In the simplest situation this is good enough, but in practice you’ll
probably need to set a few more things. If you install libraries in
your home directory and expect to use them just as if they were
installed on the system, you’ll need to tell the compiler where else
to look for those headers and libraries, both for C and C++.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">C_INCLUDE_PATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/include
<span class="nb">export </span><span class="nv">CPLUS_INCLUDE_PATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/include
<span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/lib
</code></pre></div></div>

<p>The first two are like the <code class="language-plaintext highlighter-rouge">-I</code> compiler option and the third is like the
<code class="language-plaintext highlighter-rouge">-L</code> linker option, except you <em>usually</em> won’t need to use them
explicitly. Unfortunately <code class="language-plaintext highlighter-rouge">LIBRARY_PATH</code> doesn’t override the system
library paths, so in some cases, you will need to explicitly set
<code class="language-plaintext highlighter-rouge">-L</code>. Otherwise you will still end up linking against the system library
rather than the custom packaged version. I really wish GCC and Clang
didn’t behave this way.</p>

<p>Some software uses <code class="language-plaintext highlighter-rouge">pkg-config</code> to determine its compiler and linker
flags, and your home directory will contain some of the needed
information. So set that up too:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/lib/pkgconfig
</code></pre></div></div>

<h4 id="run-time-linker">Run-time linker</h4>

<p>Finally, when you install libraries in your home directory, the run-time
dynamic linker will need to know where to find them. There are three
ways to deal with this:</p>

<ol>
  <li>The <a href="https://web.archive.org/web/20090312014334/http://blogs.sun.com/rie/entry/tt_ld_library_path_tt">crude, easy way</a>: <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code>.</li>
  <li>The elegant, difficult way: ELF runpath.</li>
  <li>Screw it, just statically link the bugger. (Not always possible.)</li>
</ol>

<p>For the crude way, point the run-time linker at your <code class="language-plaintext highlighter-rouge">lib/</code> and you’re
done:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">LD_LIBRARY_PATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/lib
</code></pre></div></div>

<p>However, this is like using a shotgun to kill a fly. If you install a
library in your home directory that is also installed on the system,
and then run a system program, it may be linked against <em>your</em> library
rather than the library installed on the system as was originally
intended. This could have detrimental effects.</p>

<p>The precision method is to set the ELF “runpath” value. It’s like a
per-binary <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code>. The run-time linker searches this path
before the system default locations (though after <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code>, if
that is set), and it only has an effect on that particular
program/library. This also applies to <code class="language-plaintext highlighter-rouge">dlopen()</code>.</p>

<p>Some software will configure the runpath by default in their build
system, but often you need to configure this yourself. The simplest way
is to set the <code class="language-plaintext highlighter-rouge">LD_RUN_PATH</code> environment variable when building software.
Another option is to manually pass <code class="language-plaintext highlighter-rouge">-rpath</code> options to the linker via
<code class="language-plaintext highlighter-rouge">LDFLAGS</code>. It’s used directly like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Wl,-rpath=$HOME/.local/lib -o foo bar.o baz.o -lquux
</code></pre></div></div>

<p>Verify with <code class="language-plaintext highlighter-rouge">readelf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d foo | grep runpath
Library runpath: [/home/username/.local/lib]
</code></pre></div></div>

<p>ELF supports a special <code class="language-plaintext highlighter-rouge">$ORIGIN</code> “variable” set to the binary’s
location. This allows the program and associated libraries to be
installed anywhere without changes, so long as they have the same
relative position to each other. (Note the quotes to prevent shell
interpolation.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Wl,-rpath='$ORIGIN/../lib' -o foo bar.o baz.o -lquux
</code></pre></div></div>

<p>There is one situation where <code class="language-plaintext highlighter-rouge">runpath</code> won’t work: when you want a
system-installed program to find a home directory library with
<code class="language-plaintext highlighter-rouge">dlopen()</code> — e.g. as an extension to that program. You either need to
ensure it uses a relative or absolute path (i.e. the argument to
<code class="language-plaintext highlighter-rouge">dlopen()</code> contains a slash) or you must use <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code>.</p>

<p>Personally, I always use the <a href="https://www.jwz.org/doc/worse-is-better.html">Worse is Better</a> <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code>
shotgun. Occasionally it’s caused some annoying issues, but the vast
majority of the time it gets the job done with little fuss. This is
just my personal development environment, after all, not a production
server.</p>

<h4 id="manual-pages">Manual pages</h4>

<p>Another potentially tricky issue is man pages. When a program or
library installs a man page in your home directory, it would certainly
be nice to access it with <code class="language-plaintext highlighter-rouge">man &lt;topic&gt;</code> just like it was installed on
the system. Fortunately, Debian and Debian-derived systems, using a
mechanism I haven’t yet figured out, discover home directory man pages
automatically without any assistance. No configuration needed.</p>

<p>It’s more complicated on other systems, such as the BSDs. You’ll need to
set the <code class="language-plaintext highlighter-rouge">MANPATH</code> variable to include <code class="language-plaintext highlighter-rouge">$HOME/.local/share/man</code>. It’s
unset by default and it overrides the system settings, which means you
need to manually include the system paths. The <code class="language-plaintext highlighter-rouge">manpath</code> program can
help with this … if it’s available.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">MANPATH</span><span class="o">=</span><span class="nv">$HOME</span>/.local/share/man:<span class="si">$(</span>manpath<span class="si">)</span>
</code></pre></div></div>

<p>I haven’t figured out a portable way to deal with this issue, so I
mostly ignore it.</p>

<h3 id="how-to-install-software-in-home">How to install software in $HOME</h3>

<p>While I’ve <a href="/blog/2017/03/30/">pooh-poohed autoconf</a> in the past, the standard
<code class="language-plaintext highlighter-rouge">configure</code> script usually makes it trivial to build and install
software in $HOME. The key ingredient is the <code class="language-plaintext highlighter-rouge">--prefix</code> option:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar xzf name-version.tar.gz
$ cd name-version/
$ ./configure --prefix=$HOME/.local
$ make -j$(nproc)
$ make install
</code></pre></div></div>

<p>Most of the time it’s that simple! If you’re linking against your own
libraries and want to use <code class="language-plaintext highlighter-rouge">runpath</code>, it’s a little more complicated:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./configure --prefix=$HOME/.local \
              LDFLAGS="-Wl,-rpath=$HOME/.local/lib"
</code></pre></div></div>

<p>For <a href="https://cmake.org/">CMake</a>, there’s <code class="language-plaintext highlighter-rouge">CMAKE_INSTALL_PREFIX</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cmake -DCMAKE_INSTALL_PREFIX=$HOME/.local ..
</code></pre></div></div>

<p>The CMake builds I’ve seen use ELF runpath by default, and no further
configuration may be required to make that work. I’m sure that’s not
always the case, though.</p>

<p>Some software is just a single, static, standalone binary with
<a href="/blog/2016/11/15/">everything baked in</a>. It doesn’t need to be given a prefix, and
installation is as simple as copying the binary into place. For example,
<a href="https://github.com/skeeto/enchive">Enchive</a> works like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/skeeto/enchive
$ cd enchive/
$ make
$ cp enchive ~/.local/bin
</code></pre></div></div>

<p>Some software uses its own unique configuration interface. I can respect
that, but it does add some friction for users who now have something
additional and non-transferable to learn. I demonstrated a NetHack build
above, which has a configuration much more involved than it really
should be. Another example is LuaJIT, which uses <code class="language-plaintext highlighter-rouge">make</code> variables that
must be provided consistently on every invocation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar xzf LuaJIT-2.0.5.tar.gz
$ cd LuaJIT-2.0.5/
$ make -j$(nproc) PREFIX=$HOME/.local
$ make PREFIX=$HOME/.local install
</code></pre></div></div>

<p>(You <em>can</em> use the “install” target to both build and install, but I
wanted to illustrate the repetition of <code class="language-plaintext highlighter-rouge">PREFIX</code>.)</p>

<p>Some libraries aren’t so smart about <code class="language-plaintext highlighter-rouge">pkg-config</code> and need some
handholding — for example, <a href="https://www.gnu.org/software/ncurses/">ncurses</a>. I mention it because
it’s required for both Vim and Emacs, among many others, so I’m often
building it myself. It ignores <code class="language-plaintext highlighter-rouge">--prefix</code> and needs to be told a
second time where to install things:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./configure --prefix=$HOME/.local \
              --enable-pc-files \
              --with-pkg-config-libdir=$PKG_CONFIG_PATH
</code></pre></div></div>

<p>Another issue is that a whole lot of software has been hardcoded for
ncurses 5.x (i.e. <code class="language-plaintext highlighter-rouge">ncurses5-config</code>), and it requires hacks/patching
to make it behave properly with ncurses 6.x. I’ve avoided ncurses 6.x
for this reason.</p>

<h3 id="learning-through-experience">Learning through experience</h3>

<p>I could go on and on like this, discussing the quirks for the various
libraries and programs that I use. Over the years I’ve gotten used to
many of these issues, committing the solutions to memory.
Unfortunately, even within the same version of a piece of software,
the quirks can change <a href="https://www.debian.org/News/2017/20170617.en.html">between major operating system
releases</a>, so I’m continuously learning my way around new
issues. It’s really given me an appreciation for all the hard work
that package maintainers put into customizing and maintaining software
builds to <a href="https://www.debian.org/doc/manuals/maint-guide/">fit properly into a larger ecosystem</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Vulgarness of Abbreviated Function Templates</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/10/02/"/>
    <id>urn:uuid:048f746a-de7f-3357-8409-cfd531363726</id>
    <updated>2016-10-02T23:59:59Z</updated>
    <category term="c"/><category term="cpp"/><category term="rant"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>The <code class="language-plaintext highlighter-rouge">auto</code> keyword has been a part of C and C++ since the very
beginning, originally as one of the four <em>storage class specifiers</em>:
<code class="language-plaintext highlighter-rouge">auto</code>, <code class="language-plaintext highlighter-rouge">register</code>, <code class="language-plaintext highlighter-rouge">static</code>, and <code class="language-plaintext highlighter-rouge">extern</code>. An <code class="language-plaintext highlighter-rouge">auto</code> variable has
“automatic storage duration,” meaning it is automatically allocated at
the beginning of its scope and deallocated at the end. It’s the
default storage class for any variable without external linkage or
without <code class="language-plaintext highlighter-rouge">static</code> storage, so the vast majority of variables in a
typical C program are automatic.</p>

<p>In C and C++ <em>prior to C++11</em>, the following definitions are
equivalent because the <code class="language-plaintext highlighter-rouge">auto</code> is implied.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">square</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span>
<span class="nf">square</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="kt">int</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As a holdover from <em>really</em> old school C, unspecified types in C are
implicitly <code class="language-plaintext highlighter-rouge">int</code>, and even today you can get away with weird stuff
like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* C only */</span>
<span class="n">square</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>By “get away with” I mean in terms of the compiler accepting this as
valid input. Your co-workers, on the other hand, may become violent.</p>

<p>Like <code class="language-plaintext highlighter-rouge">register</code>, as a storage class <code class="language-plaintext highlighter-rouge">auto</code> is an historical artifact
without direct practical use in modern code. However, as a <em>concept</em>
it’s indispensable for the specification. In practice, automatic
storage means the variable lives on “the” stack (or <a href="http://clang.llvm.org/docs/SafeStack.html">one of the
stacks</a>), but the specifications make no mention of a
stack. In fact, the word “stack” doesn’t appear even once. Instead
it’s all described in terms of “automatic storage,” rightfully leaving
the details to the implementations. A stack is the most sensible
approach the vast majority of the time, particularly because it’s both
thread-safe and re-entrant.</p>

<h3 id="c11-type-inference">C++11 Type Inference</h3>

<p>One of the major changes in C++11 was repurposing the <code class="language-plaintext highlighter-rouge">auto</code> keyword,
moving it from a storage class specifier to a <em>type specifier</em>. In
C++11, the compiler <strong>infers the type of an <code class="language-plaintext highlighter-rouge">auto</code> variable from its
initializer</strong>. In C++14, it’s also permitted for a function’s return
type, inferred from the <code class="language-plaintext highlighter-rouge">return</code> statement.</p>

<p>This new specifier is very useful in idiomatic C++ with its
ridiculously complex types. Transient variables, such as variables
bound to iterators in a loop, don’t need a redundant type
specification. It keeps code <em>DRY</em> (“Don’t Repeat Yourself”). Also,
templates are easier to write, since it makes the compiler do more of the
work. The necessary type information is already semantically present,
and the compiler is a lot better at dealing with it.</p>

<p>With this change, the following is valid in both C and C++11, and, by
<em>sheer coincidence</em>, has the same meaning, but for entirely different
reasons.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">square</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">auto</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In C the type is implied as <code class="language-plaintext highlighter-rouge">int</code>, and in C++11 the type is inferred
from the type of <code class="language-plaintext highlighter-rouge">x * x</code>, which, in this case, is <code class="language-plaintext highlighter-rouge">int</code>. The prior
example with <code class="language-plaintext highlighter-rouge">auto int x2</code>, valid in C++98 and C++03, is no longer
valid in C++11 since <code class="language-plaintext highlighter-rouge">auto</code> and <code class="language-plaintext highlighter-rouge">int</code> are redundant type specifiers.</p>

<p>Occasionally I wish I had something like <code class="language-plaintext highlighter-rouge">auto</code> in C. If I’m writing a
<code class="language-plaintext highlighter-rouge">for</code> loop from 0 to <code class="language-plaintext highlighter-rouge">n</code>, I’d like the loop variable to be the same
type as <code class="language-plaintext highlighter-rouge">n</code>, even if I decide to change the type of <code class="language-plaintext highlighter-rouge">n</code> in the future.
For example,</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">foo</span> <span class="o">=</span> <span class="n">foo_create</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">foo</span><span class="o">-&gt;</span><span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
    <span class="cm">/* ... */</span><span class="p">;</span>
</code></pre></div></div>

<p>The loop variable <code class="language-plaintext highlighter-rouge">i</code> should be the same type as <code class="language-plaintext highlighter-rouge">foo-&gt;n</code>. If I decide
to change the type of <code class="language-plaintext highlighter-rouge">foo-&gt;n</code> in the struct definition, I’d have to
find and update every loop. The idiomatic C solution is to <code class="language-plaintext highlighter-rouge">typedef</code>
the integer, using the new type both in the struct and in loops, but I
don’t think that’s much better.</p>
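
<p>For illustration, here’s a minimal sketch of that <code class="language-plaintext highlighter-rouge">typedef</code> idiom. The
names <code class="language-plaintext highlighter-rouge">foo_index</code> and <code class="language-plaintext highlighter-rouge">foo_sum</code> are made up for this example. The
point is only that the loop variable’s type follows the struct field.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

/* Hypothetical index type: change it here and every loop follows. */
typedef int foo_index;

struct foo {
    foo_index n;
};

/* Sums 0..n-1 with a loop variable of the same type as foo-&gt;n. */
static long
foo_sum(const struct foo *foo)
{
    long sum = 0;
    for (foo_index i = 0; i &lt; foo-&gt;n; i++)
        sum += i;
    return sum;
}

int
main(void)
{
    struct foo foo = {10};
    printf("%ld\n", foo_sum(&amp;foo));
    return 0;
}
</code></pre></div></div>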

<h3 id="abbreviated-function-templates">Abbreviated Function Templates</h3>

<p>Why is all this important? Well, I was recently reviewing some C++ and
came across this odd specimen. I’d never seen anything like it before.
Notice the use of <code class="language-plaintext highlighter-rouge">auto</code> for the parameter types.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">set_odd</span><span class="p">(</span><span class="k">auto</span> <span class="n">first</span><span class="p">,</span> <span class="k">auto</span> <span class="n">last</span><span class="p">,</span> <span class="k">const</span> <span class="k">auto</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">bool</span> <span class="n">toggle</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="n">first</span> <span class="o">!=</span> <span class="n">last</span><span class="p">;</span> <span class="n">first</span><span class="o">++</span><span class="p">,</span> <span class="n">toggle</span> <span class="o">=</span> <span class="o">!</span><span class="n">toggle</span><span class="p">)</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">toggle</span><span class="p">)</span>
            <span class="o">*</span><span class="n">first</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Given the other uses of <code class="language-plaintext highlighter-rouge">auto</code> as a type specifier, this kind of makes
sense, right? The compiler infers the type from the input argument.
But, as you should often do, put yourself in the compiler’s shoes for
a moment. Given this function definition in isolation, can you
generate any code? Nope. The compiler needs to see the call site
before it can infer the type. Even more, different call sites may use
different types. That <strong>sounds an awful lot like a template</strong>, eh?</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">V</span><span class="p">&gt;</span>
<span class="kt">void</span>
<span class="nf">set_odd</span><span class="p">(</span><span class="n">T</span> <span class="n">first</span><span class="p">,</span> <span class="n">T</span> <span class="n">last</span><span class="p">,</span> <span class="k">const</span> <span class="n">V</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">bool</span> <span class="n">toggle</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(;</span> <span class="n">first</span> <span class="o">!=</span> <span class="n">last</span><span class="p">;</span> <span class="n">first</span><span class="o">++</span><span class="p">,</span> <span class="n">toggle</span> <span class="o">=</span> <span class="o">!</span><span class="n">toggle</span><span class="p">)</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">toggle</span><span class="p">)</span>
            <span class="o">*</span><span class="n">first</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is <strong>a proposed feature called <em>abbreviated function
templates</em></strong>, part of <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4361.pdf"><em>C++ Extensions for Concepts</em></a>. It’s
intended to be shorthand for the template version of the function. GCC
4.9 implements it as an extension, which is why the author was unaware
of its unofficial status. In March 2016 it was established that
<a href="http://honermann.net/blog/2016/03/06/why-concepts-didnt-make-cxx17/">abbreviated function templates <strong>would <em>not</em> be part of
C++17</strong></a>, but may still appear in a future revision.</p>

<p>Personally, I find this use of <code class="language-plaintext highlighter-rouge">auto</code> to be vulgar. It overloads the
keyword with a third definition. This isn’t unheard of — <code class="language-plaintext highlighter-rouge">static</code> also
serves a number of unrelated purposes — but while similar to the
second form of <code class="language-plaintext highlighter-rouge">auto</code> (type inference), this proposed third form is
very different in its semantics (far more complex) and overhead
(potentially very costly). I’m glad it’s been rejected so far.
Templates better reflect the nature of this sort of code.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Automatic Deletion of Incomplete Output Files</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/08/07/"/>
    <id>urn:uuid:431fafe9-6630-363e-4596-85eb3a289ec2</id>
    <updated>2016-08-07T02:00:37Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>Conventionally, a program that creates an output file will delete its
incomplete output should an error occur while writing the file. It’s
risky to leave behind a file that the user may rightfully confuse for
a valid file. They might not have noticed the error.</p>

<p>For example, compression programs such as gzip, bzip2, and xz, when
given a compressed file as an argument, will create a new file with the
compression extension removed. They write to this file as the
compressed input is being processed. If the compressed stream contains
an error in the middle, the partially-completed output is removed.</p>

<p>There are exceptions of course, such as programs that download files
over a network. The partial result has value, especially if the
transfer can be <a href="https://tools.ietf.org/html/rfc7233">continued from where it left off</a>. The
convention is to append another extension, such as “.part”, to
indicate a partial output.</p>

<p>The straightforward solution is to always delete the file as part of
error handling. A non-interactive program would report the error on
standard error, delete the file, and exit with an error code. However,
there are at least two situations where error handling would be unable
to operate: unhandled signals (usually including a segmentation fault)
and power failures. A partial or corrupted output file will be left
behind, possibly looking like a valid file.</p>

<p>A common, more complex approach is to name the file differently from
its final name while being written. If written successfully, the
completed file is renamed into place. This is already <a href="http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/">required for
durable replacement</a>, so it’s basically free for many
applications. In the worst case, where the program is unable to clean
up, the obviously incomplete file is left behind only wasting space.</p>
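
<p>As a rough sketch of the rename approach in portable C: the function
name <code class="language-plaintext highlighter-rouge">save_file</code> and the <code class="language-plaintext highlighter-rouge">.tmp</code> suffix are assumptions for this
example, and it skips the <code class="language-plaintext highlighter-rouge">fsync()</code> calls that durable replacement
would require.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;

/* Writes buf to the temporary name, then renames it to its final name.
 * Returns 0 on success, -1 on failure. On failure the temporary file
 * is removed on a best-effort basis. */
static int
save_file(const char *path, const char *tmp, const void *buf, size_t len)
{
    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    int ok = fwrite(buf, 1, len, f) == len;
    if (fclose(f) != 0)  /* buffered write errors surface here */
        ok = 0;
    if (ok &amp;&amp; rename(tmp, path) == 0)
        return 0;
    remove(tmp);
    return -1;
}

int
main(void)
{
    const char msg[] = "hello\n";
    if (save_file("out", "out.tmp", msg, sizeof(msg) - 1) != 0) {
        fputs("save failed\n", stderr);
        return 1;
    }
    return 0;
}
</code></pre></div></div>

<p>On POSIX systems <code class="language-plaintext highlighter-rouge">rename()</code> atomically replaces an existing
destination. The Windows C runtime’s <code class="language-plaintext highlighter-rouge">rename()</code> fails if the
destination already exists, so a real program would need something
different there.</p>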

<p>Looking to be more robust, I had the following misguided idea: <strong>Rely
completely on the operating system to perform cleanup in the case of a
failure.</strong> Initially the file would be configured to be automatically
deleted when the final handle is closed. This takes care of all
abnormal exits, and possibly even power failures. The program can just
exit on error without deleting the file. Once written successfully,
the automatic-delete indicator is cleared so that the file survives.</p>

<p>The target application for this technique supports both Linux and
Windows, so I would need to figure it out for both systems. On
Windows, there’s the flag <code class="language-plaintext highlighter-rouge">FILE_FLAG_DELETE_ON_CLOSE</code>. I’d just need
to find a way to clear it. On POSIX, the file would be unlinked while
being written, and linked into the filesystem on success. The latter
turns out to be a lot harder than I expected.</p>

<h3 id="solution-for-windows">Solution for Windows</h3>

<p>I’ll start with Windows since the technique actually works fairly well
here — ignoring the usual, dumb Win32 filesystem caveats. This is a
little surprising, since it’s usually Win32 that makes these things
far more difficult than they should be.</p>

<p>The primary Win32 function for opening and creating files is
<a href="https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx">CreateFile</a>. There are many options, but the key is
<code class="language-plaintext highlighter-rouge">FILE_FLAG_DELETE_ON_CLOSE</code>. Here’s how an application might typically
open a file for output.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DWORD</span> <span class="n">access</span> <span class="o">=</span> <span class="n">GENERIC_WRITE</span><span class="p">;</span>
<span class="n">DWORD</span> <span class="n">create</span> <span class="o">=</span> <span class="n">CREATE_ALWAYS</span><span class="p">;</span>
<span class="n">DWORD</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">FILE_FLAG_DELETE_ON_CLOSE</span><span class="p">;</span>
<span class="n">HANDLE</span> <span class="n">f</span> <span class="o">=</span> <span class="n">CreateFile</span><span class="p">(</span><span class="s">"out.tmp"</span><span class="p">,</span> <span class="n">access</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">create</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>

<p>This special flag asks Windows to delete the file as soon as the last
handle to the <em>file object</em> is closed. Notice I said file object, not
file, since <a href="https://web.archive.org/web/0/https://blogs.msdn.microsoft.com/oldnewthing/20160108-00/?p=92821">these are different things</a>. The catch: This flag
is a property of the file object, not the file, and cannot be removed.</p>

<p>However, the solution is simple. Create a new link to the file so that
it survives deletion. This even works for files residing on network
shares.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">CreateHardLink</span><span class="p">(</span><span class="s">"out"</span><span class="p">,</span> <span class="s">"out.tmp"</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>  <span class="c1">// deletes out.tmp file</span>
</code></pre></div></div>

<p>The gotcha is that the underlying filesystem must be NTFS. FAT32
doesn’t support hard links. Unfortunately, since FAT32 remains the
least common denominator and is still widely used for removable media,
depending on the application, your users may expect support for saving
files to FAT32. A workaround is probably required.</p>

<h3 id="solution-for-linux">Solution for Linux</h3>

<p>This is where things really fall apart. It’s just <em>barely</em> possible on
Linux, it’s messy, and it’s not portable anywhere else. There’s no way
to do this for POSIX in general.</p>

<p>My initial thought was to create a file then unlink it. Unlike the
situation on Windows, files can be unlinked while they’re currently
open by a process. These files are finally deleted when the last file
descriptor (the last reference) is closed. Unfortunately, using
unlink(2) to remove the last link to a file prevents that file from
being linked again.</p>

<p>Instead, the solution is to use the relatively new (since Linux 3.11),
Linux-specific <code class="language-plaintext highlighter-rouge">O_TMPFILE</code> flag when creating the file. Instead of a
filename, this variation of open(2) takes a directory and creates an
unnamed, temporary file in it. These files are special in that they’re
permitted to be given a name in the filesystem at some future point.</p>

<p>For this example, I’ll assume the output is relative to the current
working directory. If it’s not, you’ll need to open an additional file
descriptor for the parent directory, and also use openat(2) to avoid
possible race conditions (since paths can change from under you). The
number of ways this can fail is already rapidly multiplying.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"."</span><span class="p">,</span> <span class="n">O_TMPFILE</span><span class="o">|</span><span class="n">O_WRONLY</span><span class="p">,</span> <span class="mo">0600</span><span class="p">);</span>
</code></pre></div></div>

<p>The catch is that only a handful of filesystems support <code class="language-plaintext highlighter-rouge">O_TMPFILE</code>.
It’s like the FAT32 problem above, but worse. You could easily end up
in a situation where it’s not supported, and will almost certainly
require a workaround.</p>

<p>Linking a file from a file descriptor is where things get messier. The
file descriptor must be linked with linkat(2) from its name on the
/proc virtual filesystem, constructed as a string. The following
snippet comes straight from the Linux open(2) manpage.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
<span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"/proc/self/fd/%d"</span><span class="p">,</span> <span class="n">fd</span><span class="p">);</span>
<span class="n">linkat</span><span class="p">(</span><span class="n">AT_FDCWD</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">AT_FDCWD</span><span class="p">,</span> <span class="s">"out"</span><span class="p">,</span> <span class="n">AT_SYMLINK_FOLLOW</span><span class="p">);</span>
</code></pre></div></div>
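
<p>Putting the two steps together with error checks might look like the
following sketch. The function name <code class="language-plaintext highlighter-rouge">write_linked</code> is hypothetical, and
it assumes the destination is on the same filesystem as the directory
passed to open(2). Any failure (no <code class="language-plaintext highlighter-rouge">O_TMPFILE</code> support, no /proc, an
existing destination) simply returns -1, leaving the caller to fall
back on another method.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define _GNU_SOURCE
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;

/* Creates an unnamed file in dir, writes buf to it, then links it into
 * the filesystem at name. Returns 0 on success, -1 on any failure.
 * A short write is treated as a failure. */
static int
write_linked(const char *dir, const char *name, const void *buf, size_t len)
{
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd == -1)
        return -1;
    char proc[64];
    snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
    int ok = write(fd, buf, len) == (ssize_t)len;
    if (ok)
        ok = linkat(AT_FDCWD, proc, AT_FDCWD, name, AT_SYMLINK_FOLLOW) == 0;
    close(fd);  /* if never linked, the file vanishes here */
    return ok ? 0 : -1;
}

int
main(void)
{
    if (write_linked(".", "out", "hello\n", 6) != 0) {
        fputs("O_TMPFILE path failed; fall back to rename\n", stderr);
        return 1;
    }
    return 0;
}
</code></pre></div></div>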

<p>Even on Linux, /proc isn’t always available, such as within a chroot
or a container, so this part can fail as well. In theory there’s a way
to do this with the Linux-specific <code class="language-plaintext highlighter-rouge">AT_EMPTY_PATH</code> and avoid /proc,
but I couldn’t get it to work.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Note: this doesn't actually work for me.</span>
<span class="n">linkat</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="s">""</span><span class="p">,</span> <span class="n">AT_FDCWD</span><span class="p">,</span> <span class="s">"out"</span><span class="p">,</span> <span class="n">AT_EMPTY_PATH</span><span class="p">);</span>
</code></pre></div></div>
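
<p>For reference, here is a minimal sketch of the whole /proc-based
sequence with error checking. The name <code class="language-plaintext highlighter-rouge">publish_via_tmpfile</code> and its
parameters are made up for illustration, and it assumes glibc on Linux
with /proc mounted. Any failure is reported as -1 so that the caller can
switch to a workaround. Note that linkat(2) refuses to replace an
existing name, so unlike rename(2) this cannot overwrite an older
version of the file.</p>

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* O_TMPFILE is only exposed by glibc with this set */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes into an unnamed file created in
 * dir, then give it the name path. Both must be on the same filesystem.
 * Returns 0 on success, -1 if any step fails. */
static int
publish_via_tmpfile(const char *dir, const char *path,
                    const void *data, size_t len)
{
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd == -1)
        return -1;  /* e.g. the filesystem lacks O_TMPFILE support */

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Link the descriptor into the filesystem by its /proc name. */
    char proc[64];
    snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
    int r = linkat(AT_FDCWD, proc, AT_FDCWD, path, AT_SYMLINK_FOLLOW);
    close(fd);
    return r == 0 ? 0 : -1;  /* fails if /proc is missing or path exists */
}
```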

<p>Given the poor portability (even within Linux), the number of ways
this can go wrong, and that a workaround is definitely needed anyway,
I’d say this technique is worthless. I’m going to stick with the
tried-and-true approach for this one.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Four Ways to Compile C for Windows</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/06/13/"/>
    <id>urn:uuid:1e99288c-0500-36f5-9fe7-262e6c6287c4</id>
    <updated>2016-06-13T04:13:25Z</updated>
    <category term="c"/><category term="cpp"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p><em>Update 2020: If you’re on Windows, just use <a href="https://github.com/skeeto/w64devkit"><strong>w64devkit</strong></a>.
It’s <a href="/blog/2020/05/15/">my own toolchain distribution</a>, and it’s the best option
available. <a href="/blog/2020/09/25/">Everything you need</a> is in one package.</em></p>

<p>I primarily work on and develop for unix-like operating systems —
Linux in particular. However, when it comes to desktop applications,
most potential users are on Windows. Rather than develop on Windows,
which I’d rather avoid, I’ll continue developing, testing, and
debugging on Linux while keeping portability in mind. Unfortunately
every option I’ve found for building Windows C programs has some
significant limitations. These limitations inform my approach to
portability and restrict the C language features used by the program
for all platforms.</p>

<p>As of this writing I’ve identified four different practical ways to
build C applications for Windows. This information will definitely
become further and further out of date as this article ages, so if
you’re visiting from the future take a moment to look at the date.
Except for LLVM shaking things up recently, development tooling on
unix-like systems has had the same basic form for the past 15 years
(i.e. dominated by GCC). While Visual C++ has been around for more
than two decades, the tooling on Windows has seen more churn by
comparison.</p>

<p>Before I get into the specifics, let me point out a glaring problem
common to all four: Unicode arguments and filenames. Microsoft jumped
the gun and adopted UTF-16 early. UTF-16 is a kludge, a worst of all
worlds, being a variable length encoding (surrogate pairs), backwards
incompatible (<a href="http://utf8everywhere.org/">unlike UTF-8</a>), and having byte-order issues (BOM).
Most Win32 functions that accept strings come in two flavors,
ANSI and UTF-16. The standard, portable C library functions wrap the
ANSI-flavored functions. This means <strong>portable C programs can’t interact
with Unicode filenames</strong>. (Update 2021: <a href="/blog/2021/12/30/">Now they can</a>.) They must
call the non-portable, Windows-specific versions. This includes <code class="language-plaintext highlighter-rouge">main</code>
itself, which is only handed ANSI-truncated arguments.</p>

<p>Compare this to unix-like systems, which generally adopted UTF-8, but
as a convention rather than a hard rule. The operating system
doesn’t know or care about Unicode. Program arguments and filenames
are just zero-terminated bytestrings. Implicitly decoding these as
UTF-8 <a href="https://utcc.utoronto.ca/~cks/space/blog/python/Python3UnicodeIssue">would be a mistake anyway</a>. What happens when the
encoding isn’t valid?</p>

<p>This doesn’t <em>have</em> to be a problem on Windows. A Windows standard C
library could connect to Windows’ Unicode-flavored functions and
encode to/from UTF-8 as needed, allowing portable programs to maintain
the bytestring illusion. It’s only that none of the existing standard
C libraries do it this way.</p>
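
<p>A program can approximate this for itself with a thin wrapper. The
sketch below is one possible shape, not what any particular C library
does: <code class="language-plaintext highlighter-rouge">fopen_utf8</code> is a hypothetical name, the fixed-size buffers are
an arbitrary simplification, and the non-Windows branch simply forwards
to <code class="language-plaintext highlighter-rouge">fopen</code>.</p>

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Treat path and mode as UTF-8, convert both to UTF-16, and call the
 * wide-character flavor of fopen(). Returns NULL on any failure. */
FILE *
fopen_utf8(const char *path, const char *mode)
{
    wchar_t wpath[MAX_PATH];
    wchar_t wmode[16];
    /* A length of -1 means the input is null-terminated. */
    if (!MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, MAX_PATH))
        return NULL;
    if (!MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16))
        return NULL;
    return _wfopen(wpath, wmode);
}
#else
/* Elsewhere filenames are already plain bytestrings. */
FILE *
fopen_utf8(const char *path, const char *mode)
{
    return fopen(path, mode);
}
#endif
```

<p>Paths longer than <code class="language-plaintext highlighter-rouge">MAX_PATH</code> would need dynamically sized buffers,
and the arguments to <code class="language-plaintext highlighter-rouge">main</code> would need similar treatment.</p>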

<h3 id="mingw-w64">Mingw-w64</h3>

<p>Of course my first natural choice is MinGW, specifically the
<a href="http://mingw-w64.org/doku.php">Mingw-w64</a> fork. It’s GCC ported to Windows. You can
continue relying on GCC-specific features when you need them. It’s got
all the core language features up through C11, plus the common
extensions. It’s probably packaged by your Linux distribution of
choice, making it trivial to cross-compile programs and libraries from
Linux — and with Wine you can even execute them on x86. Like regular
GCC, it outputs GDB-friendly DWARF debugging information, so you can
debug applications with GDB.</p>

<p>If I’m using Mingw-w64 on Windows, <del>I prefer to do so from inside
Cygwin</del>. Since it provides a complete POSIX environment, it maximizes
portability for the whole tool chain. This isn’t strictly required.</p>

<p>However, it has one big flaw. Unlike unix-like systems, Windows doesn’t
supply a system standard C library. That’s the compiler’s job. But
Mingw-w64 doesn’t have one. Instead it links against <code class="language-plaintext highlighter-rouge">msvcrt.dll</code>,
<del>which <a href="https://web.archive.org/web/0/https://blogs.msdn.microsoft.com/oldnewthing/20140411-00/?p=1273">isn’t officially supported by Microsoft</a>. It just
happens to exist on modern Windows installations. Since it’s not
supported,</del> it’s way out of date and doesn’t support much of C99. A lot
of these problems are patched over by the compiler, <del>but if you’re
relying on Mingw-w64, you still have to stick to some C89 library
features, such as limiting yourself to the C89 printf specifiers</del>.</p>

<p><del>Update: Mārtiņš Možeiko has pointed out <code class="language-plaintext highlighter-rouge">__USE_MINGW_ANSI_STDIO</code>, an
undocumented feature that fixes the printf family. I now use this by
default in all of my Mingw-w64 builds. It fixes most of the formatted
output issues, except that it’s incompatible with the <a href="https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bformat_007d-function-attribute-3318"><code class="language-plaintext highlighter-rouge">format</code> function
attribute</a>.</del> (Update 2021: Mingw-w64 now does the right thing
out of the box.)</p>

<p><del>Another problem is that <a href="http://thelinuxjedi.blogspot.com/2014/07/tripping-up-using-mingw.html">position-independent code generation is
broken</a>, and so ASLR is not an option. This means binaries produced
by Mingw-w64 are less secure than they should be. There are also a
number of <a href="https://gcc.gnu.org/ml/gcc-bugs/2015-05/msg02025.html">subtle code generation bugs</a> that might arise if you’re
doing something unusual.</del> (Update 2021: Mingw-w64 makes PIE mandatory.)</p>

<h3 id="visual-c">Visual C++</h3>

<p>The behemoth usually considered in this situation is Visual Studio and
the Visual C++ build tools. I strongly prefer open source development
tools, and Visual Studio is obviously the <em>least</em> open source option, but
at least it’s cost-free these days. Now, I have absolutely no interest
in Visual Studio, but fortunately the Visual C++ compiler and
associated build tools can be used standalone, supporting both C and
C++.</p>

<p>Included is a “vcvars” batch file — vcvars64.bat for x64. Execute that
batch file in a cmd.exe console and the Visual C++ command line build
tools will be made available in that console and in any programs
executed from it (your editor). It includes the compiler (cl.exe),
linker (link.exe), assembler (ml64.exe), disassembler (dumpbin.exe),
and more. It also includes a <a href="/blog/2016/04/30/">mostly POSIX-complete</a> make called
nmake.exe. All these tools are noisy and print a copyright banner on
every invocation, so get used to passing <code class="language-plaintext highlighter-rouge">-nologo</code> every time, which
suppresses some of it.</p>

<p>When I said behemoth, I meant it. In my experience it literally takes
<em>hours</em> (unattended) to install Visual Studio 2015. <del>The good news is you
don’t actually need it all anymore. The build tools <a href="http://landinghub.visualstudio.com/visual-cpp-build-tools">are available
standalone</a>. While it’s still a larger and slower installation
process than it really should be, it’s much more reasonable to
install. It’s good enough that I’d even say I’m comfortable relying on
it for Windows builds.</del> (Update: The build tools are unfortunately no
longer standalone.)</p>

<p>That being said, it’s not without its flaws. Microsoft has never
announced any plans to support C99. They only care about C++, with C as
a second class citizen. Since C++11 incorporated most of C99 and
Microsoft supports C++11, Visual Studio 2015 supports most of C99. The
only things missing as far as I can tell are variable length arrays
(VLAs), complex numbers, and C99’s array parameter declarators, since
none of these were adopted by C++. Some C99 features are considered
extensions (as they would be for C89), so you’ll also get warnings about
them, which can be disabled.</p>

<p>The command line interface (option flags, intermediates, etc.) isn’t
quite reconcilable with the unix-like ecosystem (i.e. GCC, Clang), so
<strong>you’ll need separate Makefiles</strong>, or you’ll need to use a build
system that generates Visual C++ Makefiles.</p>

<p><del>Debugging is a major problem.</del> (Update 2022: It’s actually quite good
once <a href="/blog/2022/06/26/">you know how to do it</a>.) Visual C++ outputs separate .pdb
<a href="https://en.wikipedia.org/wiki/Program_database">program database</a> files, which aren’t usable from GDB. Visual
Studio has a built-in debugger, though it’s not included in the
standalone Visual C++ build tools. <del>I’m still searching for a decent
debugging solution for this scenario. I tried WinDbg, but I can’t stand
it.</del> (Update 2022: <a href="https://www.youtube.com/watch?v=r9eQth4Q5jg">RemedyBG is amazing</a>.)</p>

<p>In general the output code performance is on par with GCC and Clang,
so you’re not really gaining or losing performance with Visual C++.</p>

<h3 id="clang">Clang</h3>

<p>Unsurprisingly, <a href="http://clang.llvm.org/">Clang</a> has been ported to Windows. It’s like
Mingw-w64 in that you get the same features and interface across
platforms.</p>

<p>Unlike Mingw-w64, it doesn’t link against msvcrt.dll. Instead <strong>it
relies directly on the official Windows SDK</strong>. You’ll basically need
to install the Visual C++ build tools as if you were going to build with
Visual C++. This means no practical cross-platform builds and you’re
still relying on the proprietary Microsoft toolchain. In the past you
even had to use Microsoft’s linker, but LLVM now provides its own.</p>

<p>It generates GDB-friendly DWARF debug information (in addition to
CodeView) so in theory <strong>you can debug with GDB</strong> again. I haven’t
given this a thorough evaluation yet.</p>

<h3 id="pelles-c">Pelles C</h3>

<p>Finally there’s <a href="http://www.smorgasbordet.com/pellesc/">Pelles C</a>. It’s cost-free but not open
source. It’s a reasonable, small install that includes a full IDE with
an integrated debugger and command line tools. It has its own C
library and Win32 SDK with the most complete C11 support around. It
also supports OpenMP 3.1. All in all it’s pretty nice and is something
I wouldn’t be afraid to rely upon for Windows builds.</p>

<p>Like Visual C++, it has a couple of “povars” batch files to set up the
right environment, which includes a C compiler, linker, assembler,
etc. The compiler interface mostly mimics cl.exe, though there are far
fewer code generation options. The make program, pomake.exe, mimics
nmake.exe, but is even less POSIX-complete. The compiler’s <strong>output
code performance is also noticeably poorer than GCC, Clang, and Visual
C++</strong>. It’s definitely a less mature compiler.</p>

<p>It outputs CodeView debugging information, so <strong>GDB is of no use</strong>.
The best solution is to simply use the debugger built into the IDE,
which can be invoked directly from the command line. You don’t
normally need to code from within the IDE just to use the debugger.</p>

<p>Like Visual C++, it’s Windows only, so cross-compilation isn’t really
in the picture.</p>

<p>If performance isn’t of high importance, and you don’t require
specific code generation options, then Pelles C is a nice choice for
Windows builds.</p>

<h3 id="other-options">Other Options</h3>

<p>I’m sure there are a few other options out there, and I’d like to hear
about them so I can try them out. I focused on these since they’re all
cost free and easy to download. If I have to register or pay, then
it’s not going to beat these options.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>You Can't Always Hash Pointers in C</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/05/30/"/>
    <id>urn:uuid:0fa3c99b-88ed-3a02-0342-4ee7536cc7ed</id>
    <updated>2016-05-30T23:59:46Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>Occasionally I’ve needed to key a hash table with C pointers. I don’t
care about the contents of the object itself — especially if it might
change — just its pointer identity. For example, suppose I’m using
null-terminated strings as keys and I know these strings will always
be interned in a common table. These strings can be compared directly
by their pointer values (<code class="language-plaintext highlighter-rouge">str_a == str_b</code>) rather than, more slowly,
by their contents (<code class="language-plaintext highlighter-rouge">strcmp(str_a, str_b) == 0</code>). The intern table
ensures that these expressions both have the same result.</p>

<p>As a key in a hash table, or other efficient map/dictionary data
structure, I’ll need to turn pointers into numerical values. However,
<strong>C pointers aren’t integers</strong>. Following certain rules it’s permitted
to cast pointers to integers and back, but doing so will reduce the
program’s portability. The most important consideration is that <strong>the
integer form isn’t guaranteed to have any meaningful or stable
value</strong>. In other words, even in a conforming implementation, the same
pointer might cast to two different integer values. This would break
any algorithm that isn’t keenly aware of the implementation details.</p>

<p>To show why this is, I’m going to be citing the relevant parts of the
C99 standard (ISO/IEC 9899:1999). The <a href="http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf">draft for C99</a> is freely
available (and what I use myself since I’m a cheapass). My purpose is
<em>not</em> to discourage you from casting pointers to integers and using
the result. The vast majority of the time this works fine and as you
would expect. I just think it’s an interesting part of the language,
and C/C++ programmers should be aware of the potential trade-offs.</p>

<h3 id="integer-to-pointer-casts">Integer to pointer casts</h3>

<p>What does the standard have to say about casting integers to pointers?
§6.3.2.3¶5:</p>

<blockquote>
  <p>An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might
not be correctly aligned, might not point to an entity of the
referenced type, and might be a trap representation.</p>
</blockquote>

<p>It also includes a footnote:</p>

<blockquote>
  <p>The mapping functions for converting a pointer to an integer or an
integer to a pointer are intended to be consistent with the
addressing structure of the execution environment.</p>
</blockquote>

<p>Casting an integer to a pointer depends entirely on the
implementation. This is intended for things like memory mapped
hardware. The programmer may need to access memory as a specific
physical address, which would be encoded in the source as an integer
constant and cast to a pointer of the appropriate type.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">read_sensor_voltage</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="mh">0x1ffc</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It may also be used by a loader and dynamic linker to compute the
virtual address of various functions and variables, then cast to a
pointer before use.</p>

<p>Both cases are already dependent on implementation defined behavior,
so there’s nothing lost in relying on these casts.</p>

<p>An integer constant expression of 0 is a special case. It casts to a
null pointer in all implementations (§6.3.2.3¶3). However, a null
pointer doesn’t necessarily point to address zero, nor is it
necessarily a zero bit pattern (i.e. beware <code class="language-plaintext highlighter-rouge">memset</code> and <code class="language-plaintext highlighter-rouge">calloc</code> on
memory with pointers). It’s just guaranteed never to compare equal
to a pointer to any object or function, and it is undefined behavior
to dereference.</p>
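
<p>In practice that means assigning null pointers explicitly when the
code is meant to be portable to such implementations. A small sketch,
using a made-up <code class="language-plaintext highlighter-rouge">struct node</code> and a hypothetical <code class="language-plaintext highlighter-rouge">nodes_alloc</code>:</p>

```c
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* Allocate n nodes and set every pointer member by assignment. Unlike
 * calloc() or memset(), assigning NULL yields a null pointer regardless
 * of how the implementation represents it. Overflow of n * sizeof is
 * not checked in this sketch. */
struct node *
nodes_alloc(size_t n)
{
    struct node *nodes = malloc(n * sizeof(*nodes));
    if (nodes) {
        for (size_t i = 0; i < n; i++) {
            nodes[i].next = NULL;
            nodes[i].value = 0;
        }
    }
    return nodes;
}
```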

<h3 id="pointer-to-integer-casts">Pointer to integer casts</h3>

<p>What about the other way around? §6.3.2.3¶6:</p>

<blockquote>
  <p>Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.</p>
</blockquote>

<p>Like before, it’s implementation defined. However, the negatives are a
little stronger: the cast itself may be undefined behavior. I
speculate this is tied to integer overflow. The last part makes
pointer to integer casts optional for an implementation. This is one
way that the hash table above would be less portable.</p>

<p>When the cast is always possible, an implementation can provide an
integer type wide enough to hold any pointer value. §7.18.1.4¶1:</p>

<blockquote>
  <p>The following type designates a signed integer type with the
property that any valid pointer to void can be converted to this
type, then converted back to pointer to void, and the result will
compare equal to the original pointer:</p>

  <p><code class="language-plaintext highlighter-rouge">intptr_t</code></p>

  <p>The following type designates an unsigned integer type with the
property that any valid pointer to void can be converted to this
type, then converted back to pointer to void, and the result will
compare equal to the original pointer:</p>

  <p><code class="language-plaintext highlighter-rouge">uintptr_t</code></p>

  <p>These types are optional.</p>
</blockquote>

<p>The take-away is that the integer has no meaningful value. The only
guarantee is that the integer can be cast back into a void pointer
that will compare equal to the original. It would be perfectly legal
for an implementation to pass these assertions (and still sometimes fail).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">example</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">ptr_a</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr_b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ptr_a</span> <span class="o">==</span> <span class="n">ptr_b</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">uintptr_t</span> <span class="n">int_a</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">ptr_a</span><span class="p">;</span>
        <span class="kt">uintptr_t</span> <span class="n">int_b</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">ptr_b</span><span class="p">;</span>
        <span class="n">assert</span><span class="p">(</span><span class="n">int_a</span> <span class="o">!=</span> <span class="n">int_b</span><span class="p">);</span>
        <span class="n">assert</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">int_a</span> <span class="o">==</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">int_b</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Since the bits don’t have any particular meaning, arithmetic
operations involving them will also have no meaning. When a pointer
might map to two different integers, the hash values might not match
up, breaking hash tables that rely on them. Even with <code class="language-plaintext highlighter-rouge">uintptr_t</code>
provided, casting pointers to integers isn’t useful without also
relying on implementation defined properties of the result.</p>
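
<p>For illustration, this is the kind of pointer hash in question. It’s a
sketch rather than a recommendation: <code class="language-plaintext highlighter-rouge">ptr_hash</code> and the mixing constant
are arbitrary choices, and the whole thing assumes that equal pointers
always convert to equal integers, which the standard doesn’t promise.</p>

```c
#include <stddef.h>
#include <stdint.h>

#ifndef UINTPTR_MAX
#error "uintptr_t is optional and this implementation does not provide it"
#endif

/* Hash a pointer by its integer form. Assumption beyond the standard:
 * the same pointer always converts to the same integer. */
static size_t
ptr_hash(const void *ptr)
{
    unsigned long long x = (uintptr_t)ptr;
    x *= 0x9e3779b97f4a7c15ULL;  /* arbitrary odd multiplier */
    x ^= x >> 32;                /* fold the high bits into the low bits */
    return (size_t)x;
}
```

<p>A table with a power-of-two number of slots would then pick a bucket
with something like <code class="language-plaintext highlighter-rouge">ptr_hash(key) &amp; (nslots - 1)</code>.</p>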

<h3 id="reasons-for-this-pointer-insanity">Reasons for this pointer insanity</h3>

<p>What purpose could such strange pointer-to-integer casts serve?</p>

<p>A security-conscious implementation may choose to annotate pointers
with additional information by setting unused bits. It might be for
<a href="https://www.usenix.org/legacy/event/sec09/tech/full_papers/akritidis.pdf">baggy bounds checks</a> or, someday, in an <a href="http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html">undefined behavior
sanitizer</a>. Before dereferencing annotated pointers, the
metadata bits would be checked for validity, and cleared/set before
use as an address. Or it may <a href="/blog/2016/04/10/">map the same object at multiple virtual
addresses</a> to avoid setting/clearing the metadata bits,
providing interoperability with code unaware of the annotations. When
pointers are compared, these bits would be ignored.</p>

<p>When these annotated pointers are cast to integers, the metadata bits
will be present, but a program using the integer wouldn’t know their
meaning without tying itself closely to that implementation.
Completely unused bits may even be filled with random garbage when
cast. It’s allowed.</p>
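
<p>Application code sometimes plays the same trick by hand, which makes
the effect easy to demonstrate. The sketch below stores a two-bit tag in
the low bits of a pointer. It assumes a flat address space where the
integer form of a pointer aligned to at least four bytes has zero low
bits, which is itself implementation defined. The <code class="language-plaintext highlighter-rouge">tag_*</code> names are
hypothetical.</p>

```c
#include <stdint.h>

#define TAG_MASK ((uintptr_t)3)  /* assumes at least 4-byte alignment */

/* Return ptr with the low two bits replaced by tag. */
static void *
tag_set(void *ptr, unsigned tag)
{
    uintptr_t bits = (uintptr_t)ptr & ~TAG_MASK;
    return (void *)(bits | (tag & TAG_MASK));
}

/* Extract the tag stored in the low two bits. */
static unsigned
tag_get(void *ptr)
{
    return (unsigned)((uintptr_t)ptr & TAG_MASK);
}

/* Strip the tag so the pointer can be dereferenced again. */
static void *
tag_clear(void *ptr)
{
    return (void *)((uintptr_t)ptr & ~TAG_MASK);
}
```

<p>A tagged and an untagged pointer to the same object convert to
different integers here, so hashing the integer form directly would
file one object under two different keys.</p>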

<p>You may have been thinking before about using a union or <code class="language-plaintext highlighter-rouge">char *</code> to
bypass the cast and access the raw pointer bytes, but you’d run into
the same problems on the same implementations.</p>

<h3 id="conforming-programs">Conforming programs</h3>

<p>The standard makes a distinction between <em>strictly conforming
programs</em> (§4¶5) and <em>conforming programs</em> (§4¶7). A strictly
conforming program must not produce output depending on implementation
defined behavior nor exceed minimum implementation limits. Very few
programs fit in this category, including any program using <code class="language-plaintext highlighter-rouge">uintptr_t</code>
since it’s optional. Here are more examples of code that isn’t
strictly conforming:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">printf</span><span class="p">(</span><span class="s">"%zu"</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span> <span class="c1">// §6.5.3.4</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%d"</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">);</span>      <span class="c1">// §6.5¶4</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%d"</span><span class="p">,</span> <span class="n">INT_MAX</span><span class="p">);</span>      <span class="c1">// §5.2.4.2.1</span>
</code></pre></div></div>

<p>On the other hand, a <em>conforming program</em> is allowed to depend on
implementation defined behavior. Relying on meaningful, stable values
for pointers cast to <code class="language-plaintext highlighter-rouge">uintptr_t</code>/<code class="language-plaintext highlighter-rouge">intptr_t</code> is conforming even if your
program may exhibit bugs on some implementations.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Counting Processor Cores in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/10/14/"/>
    <id>urn:uuid:dbfba1a0-b3af-356d-4d01-96917d622906</id>
    <updated>2015-10-14T03:17:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>One of the great advantages of dependency analysis is parallelization.
Modern processors reorder instructions whose results don’t affect each
other. Compilers reorder expressions and statements to improve
throughput. Build systems know which outputs are inputs for other
targets and can choose any arbitrary build order within that
constraint. This article involves the last case.</p>

<p>The build system I use most often is GNU Make, either directly or
indirectly (Autoconf, CMake). It’s far from perfect, but it does what
I need. I almost always invoke it from within Emacs rather than in a
terminal. In fact, I do it so often that I’ve wrapped Emacs’ <code class="language-plaintext highlighter-rouge">compile</code>
command for rapid invocation.</p>

<p>I recently helped a co-worker set this up for himself, so it had
me thinking about the problem again. The situation <a href="https://github.com/skeeto/.emacs.d">in my
config</a> is much more complicated than it needs to be, so I’ll
share a simplified version instead.</p>

<p>First bring in the usual goodies (we’re going to be making closures):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>
</code></pre></div></div>

<p>We need a couple of configuration variables.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">quick-compile-command</span> <span class="s">"make -k "</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">quick-compile-build-file</span> <span class="s">"Makefile"</span><span class="p">)</span>
</code></pre></div></div>

<p>Then a couple of interactive functions to set these on the fly. It’s
not strictly necessary, but I like giving each a key binding. I also
like having a history available via <code class="language-plaintext highlighter-rouge">read-string</code>, so I can switch
between a couple of different options with ease.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">quick-compile-set-command</span> <span class="p">(</span><span class="nv">command</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">read-string</span> <span class="s">"Command: "</span> <span class="nv">quick-compile-command</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-command</span> <span class="nv">command</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">quick-compile-set-build-file</span> <span class="p">(</span><span class="nv">build-file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">read-string</span> <span class="s">"Build file: "</span> <span class="nv">quick-compile-build-file</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-build-file</span> <span class="nv">build-file</span><span class="p">))</span>
</code></pre></div></div>

<p>Now finally to the good part. Below, <code class="language-plaintext highlighter-rouge">quick-compile</code> is a
non-interactive function that returns an interactive closure ready to
be bound to any key I desire. It takes an optional target. This means
I don’t use the above <code class="language-plaintext highlighter-rouge">quick-compile-set-command</code> to choose a target,
only for setting other options. That will make more sense in a moment.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">quick-compile</span> <span class="p">(</span><span class="k">&amp;optional</span> <span class="p">(</span><span class="nv">target</span> <span class="s">""</span><span class="p">))</span>
  <span class="s">"Return an interactive function that runs `compile' for TARGET."</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">save-buffer</span><span class="p">)</span>  <span class="c1">; so I don't get asked</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">default-directory</span>
            <span class="p">(</span><span class="nv">locate-dominating-file</span>
             <span class="nv">default-directory</span> <span class="nv">quick-compile-build-file</span><span class="p">)))</span>
      <span class="p">(</span><span class="k">if</span> <span class="nv">default-directory</span>
          <span class="p">(</span><span class="nb">compile</span> <span class="p">(</span><span class="nv">concat</span> <span class="nv">quick-compile-command</span> <span class="s">" "</span> <span class="nv">target</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">error</span> <span class="s">"Cannot find %s"</span> <span class="nv">quick-compile-build-file</span><span class="p">)))))</span>
</code></pre></div></div>

<p>It traverses up (down?) the directory hierarchy towards root looking
for a Makefile — or whatever is set for <code class="language-plaintext highlighter-rouge">quick-compile-build-file</code>
— then invokes the build system there. I <a href="http://aegis.sourceforge.net/auug97.pdf">don’t believe in recursive
<code class="language-plaintext highlighter-rouge">make</code></a>.</p>

<p>So how do I put this to use? I clobber some key bindings I don’t
otherwise care about. A better choice might be the F-keys, but my
muscle memory is already committed elsewhere.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x c"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span><span class="p">))</span> <span class="c1">; default target</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x C"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"clean"</span><span class="p">))</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x t"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"test"</span><span class="p">))</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x r"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"run"</span><span class="p">))</span>
</code></pre></div></div>

<p>Each of those invokes a different target without second-guessing me.
Let me tell you, having “clean” at the tip of my fingers is wonderful.</p>

<h3 id="parallel-builds">Parallel Builds</h3>

<p>An extension common to many different <code class="language-plaintext highlighter-rouge">make</code> programs is <code class="language-plaintext highlighter-rouge">-j</code>, which
asks <code class="language-plaintext highlighter-rouge">make</code> to build targets in parallel where possible. These days
where multi-core machines are the norm, you nearly always want to use
this option, ideally set to the number of logical processor cores on
your system. It’s a huge time-saver.</p>

<p>My recent revelation was that my default build command could be
better: <code class="language-plaintext highlighter-rouge">make -k</code> is minimal. It should at least include <code class="language-plaintext highlighter-rouge">-j</code>, but
choosing an argument (number of processor cores) is a problem. Today I
use different machines with 2, 4, or 8 cores, so most of the time any
given number will be wrong. I could use a per-system configuration,
but I’d rather not. Unfortunately GNU Make will not automatically
detect the number of cores. That leaves the matter up to Emacs Lisp.</p>

<p>Emacs doesn’t currently have a built-in function that returns the
number of processor cores. I’ll need to reach into the operating
system to figure it out. My usual development environments are Linux,
Windows, and OpenBSD, so my solution should work on each. I’ve ranked
them by order of importance.</p>

<h4 id="number-of-cores-on-linux">Number of cores on Linux</h4>

<p>Linux has the <code class="language-plaintext highlighter-rouge">/proc</code> virtual filesystem in the fashion of Plan 9,
allowing different aspects of the system to be explored through the
standard filesystem API. The relevant file here is <code class="language-plaintext highlighter-rouge">/proc/cpuinfo</code>,
listing useful information about each of the system’s processors. To
get the number of processors, count the number of processor entries in
this file. I’ve wrapped it in a <code class="language-plaintext highlighter-rouge">file-exists-p</code> check so that it returns
<code class="language-plaintext highlighter-rouge">nil</code> on other operating systems instead of throwing an error.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nv">file-exists-p</span> <span class="s">"/proc/cpuinfo"</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"/proc/cpuinfo"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">how-many</span> <span class="s">"^processor[[:space:]]+:"</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="number-of-cores-on-windows">Number of cores on Windows</h4>

<p>When I was first researching how to do this on Windows, I thought I
would need to invoke the <code class="language-plaintext highlighter-rouge">wmic</code> command line program and hope the
output could be parsed the same way on different versions of the
operating system and tool. However, it turns out the solution for
Windows is trivial. The environment variable <code class="language-plaintext highlighter-rouge">NUMBER_OF_PROCESSORS</code>
gives every process the answer for free. Being an environment
variable, it will need to be parsed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">number-of-processors</span> <span class="p">(</span><span class="nv">getenv</span> <span class="s">"NUMBER_OF_PROCESSORS"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">when</span> <span class="nv">number-of-processors</span>
    <span class="p">(</span><span class="nv">string-to-number</span> <span class="nv">number-of-processors</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="number-of-cores-on-bsd">Number of cores on BSD</h4>

<p>This seems to work the same across all the BSDs, including OS X,
though I haven’t yet tested it exhaustively. Invoke <code class="language-plaintext highlighter-rouge">sysctl</code>, which
returns an undecorated number to be parsed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nb">ignore-errors</span>
    <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">zerop</span> <span class="p">(</span><span class="nv">call-process</span> <span class="s">"sysctl"</span> <span class="no">nil</span> <span class="no">t</span> <span class="no">nil</span> <span class="s">"-n"</span> <span class="s">"hw.ncpu"</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">string-to-number</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Also not complicated, but it’s the heaviest solution of the three.</p>

<h3 id="putting-it-all-together">Putting it all together</h3>

<p>Join all these together with <code class="language-plaintext highlighter-rouge">or</code>, call it <code class="language-plaintext highlighter-rouge">numcores</code>, and ta-da.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-command</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"make -kj%d"</span> <span class="p">(</span><span class="nv">numcores</span><span class="p">)))</span>
</code></pre></div></div>

<p>Now <code class="language-plaintext highlighter-rouge">make</code> is invoked correctly on any system by default.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Recovering Live Data with GDB</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/09/15/"/>
    <id>urn:uuid:5fa83dc1-2d5c-3313-b2b9-f4fb73ef5d9e</id>
    <updated>2015-09-15T14:53:44Z</updated>
    <category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>I recently ran into a problem where <a href="https://github.com/skeeto/reddit-related">long-running program</a>
output was trapped in a C <code class="language-plaintext highlighter-rouge">FILE</code> buffer. The program had been running
for two days straight printing its results, but the last few kilobytes
of output were missing. It wouldn’t output these last bytes until the
program completed its day-long (or worse!) cleanup operation and
exited. This is easy to fix — and, honestly, the cleanup step was
unnecessary anyway — but I didn’t want to start all over and wait
two more days to recompute the result.</p>

<p>Here’s a minimal example of the situation. The first loop represents
the long-running computation and the infinite loop represents a
cleanup job that will never complete.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="cm">/* Compute output. */</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"%d/%d "</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">i</span> <span class="o">*</span> <span class="n">i</span><span class="p">);</span>
    <span class="n">putchar</span><span class="p">(</span><span class="sc">'\n'</span><span class="p">);</span>

    <span class="cm">/* "Slow" cleanup operation ... */</span>
    <span class="k">for</span> <span class="p">(;;)</span>
        <span class="p">;</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="buffered-output-review">Buffered Output Review</h3>

<p>Both <code class="language-plaintext highlighter-rouge">printf</code> and <code class="language-plaintext highlighter-rouge">putchar</code> are C library functions and are usually
buffered in some way. That is, each call to these functions doesn’t
necessarily send data out of the program. This is in contrast to the
POSIX functions <code class="language-plaintext highlighter-rouge">read</code> and <code class="language-plaintext highlighter-rouge">write</code>, which are unbuffered system calls.
Since system calls are relatively expensive, buffered input and output
is used to change a large number of system calls on small buffers into
a single system call on a single large buffer.</p>

<p>Typically, stdout is <em>line-buffered</em> if connected to a terminal. When
the program completes a line of output, the user probably wants to see
it immediately. So, if you compile the example program and run it at
your terminal you will probably see the output before the program
hangs on the infinite loop.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -std=c99 example.c
$ ./a.out
0/0 1/1 2/4 3/9 4/16 5/25 6/36 7/49 8/64 9/81
</code></pre></div></div>

<p>However, when stdout is connected to a file or pipe, it’s generally
buffered to something like 4kB. For this program, the output will
remain empty no matter how long you wait. It’s trapped in a <code class="language-plaintext highlighter-rouge">FILE</code>
buffer in process memory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./a.out &gt; output.txt
</code></pre></div></div>

<p>The primary way to fix this is to call the <code class="language-plaintext highlighter-rouge">fflush</code> function, forcing
the buffer empty before starting a long, non-output operation.
Unfortunately for me I didn’t think of this two days earlier.</p>
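
<p>As a minimal sketch of that fix, assume the results are written by a
hypothetical <code class="language-plaintext highlighter-rouge">emit_results()</code> helper (my name for illustration, not
something from the original program). The flush goes right after the
last write, before any cleanup begins.</p>

```c
#include <stdio.h>

/* Hypothetical helper: write the computed results to `out`, then flush
 * so nothing stays trapped in the FILE buffer during a long cleanup.
 * Returns 0 on success or EOF if the flush fails. */
int emit_results(FILE *out)
{
    for (int i = 0; i < 10; i++)
        fprintf(out, "%d/%d ", i, i * i);
    putc('\n', out);
    return fflush(out);  /* push buffered bytes out of the process */
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">emit_results(stdout)</code> ahead of the cleanup loop would make the
output visible in <code class="language-plaintext highlighter-rouge">output.txt</code> even while the program keeps running.</p>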

<h3 id="debugger-to-the-rescue">Debugger to the Rescue</h3>

<p>Fortunately there <em>is</em> a way to interrupt a running program and
manipulate its state: a debugger. First, find the process ID of the
running program (the one writing to <code class="language-plaintext highlighter-rouge">output.txt</code> above).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pgrep a.out
12934
</code></pre></div></div>

<p>Now attach GDB, which will pause the program’s execution.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gdb ./a.out
Reading symbols from ./a.out...(no debugging symbols found)...done.
gdb&gt; attach 12934
Attaching to program: /tmp/a.out, process 12934
... snip ...
0x0000000000400598 in main ()
gdb&gt;
</code></pre></div></div>

<p>From here I could examine the stdout <code class="language-plaintext highlighter-rouge">FILE</code> struct and try to extract
the buffer contents by hand. However, the easiest thing to do is
perform the call I forgot in the first place: <code class="language-plaintext highlighter-rouge">fflush(stdout)</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gdb&gt; call fflush(stdout)
$1 = 0
gdb&gt; quit
Detaching from program: /tmp/a.out, process 12934
</code></pre></div></div>

<p>The program is still running, but the output has been recovered.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat output.txt
0/0 1/1 2/4 3/9 4/16 5/25 6/36 7/49 8/64 9/81
</code></pre></div></div>

<h3 id="why-cleanup">Why Cleanup?</h3>

<p>As I said, in my case the cleanup operation was entirely unnecessary,
so it would be safe to just kill the program at this point. It was
taking a really long time to tear down a humongous data structure (on
the order of 50GB) one little node at a time with <code class="language-plaintext highlighter-rouge">free</code>. Obviously,
the memory would be freed much more quickly by the OS when the program
exited.</p>

<p>Freeing memory in the program was only to satisfy <a href="http://valgrind.org/">Valgrind</a>,
since it’s so incredibly useful for debugging. Not freeing the data
structure would hide actual memory leaks in Valgrind’s final report.
For the real “production” run, I should have disabled cleanup.</p>
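
<p>One way to sketch that switch, assuming a simple singly-linked list
in place of my real data structure: the names <code class="language-plaintext highlighter-rouge">struct node</code>,
<code class="language-plaintext highlighter-rouge">list_free()</code>, and <code class="language-plaintext highlighter-rouge">cleanup()</code> are all hypothetical. The node-by-node
teardown only runs when explicitly requested.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the real (much larger) data structure. */
struct node {
    struct node *next;
};

/* Free every node one at a time, returning how many were freed. */
size_t list_free(struct node *head)
{
    size_t count = 0;
    while (head) {
        struct node *next = head->next;
        free(head);
        head = next;
        count++;
    }
    return count;
}

/* Tear the list down only when asked to, e.g. for a Valgrind run.
 * Otherwise leave the memory for the OS to reclaim at exit. */
size_t cleanup(struct node *head, bool enabled)
{
    return enabled ? list_free(head) : 0;
}
```

<p>A Valgrind run would pass <code class="language-plaintext highlighter-rouge">true</code> so that real leaks stand out in the
report, while a production run would pass <code class="language-plaintext highlighter-rouge">false</code> and exit immediately.</p>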

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>C Object Oriented Programming</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/10/21/"/>
    <id>urn:uuid:3851ee30-1f9d-35af-e59f-e4be5023b2d5</id>
    <updated>2014-10-21T03:52:43Z</updated>
    <category term="c"/><category term="cpp"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p><del>Object oriented programming, polymorphism in particular, is
essential to nearly any large, complex software system. Without it,
decoupling different system components is difficult.</del> (<em>Update in
2017</em>: I no longer agree with this statement.) C doesn’t come with
object oriented capabilities, so large C programs tend to grow their
own out of C’s primitives. This includes huge C projects like the
Linux kernel, BSD kernels, and SQLite.</p>

<h3 id="starting-simple">Starting Simple</h3>

<p>Suppose you’re writing a function <code class="language-plaintext highlighter-rouge">pass_match()</code> that takes an input
stream, an output stream, and a pattern. It works sort of like grep.
It passes to the output each line of input that matches the pattern.
The pattern string contains a shell glob pattern to be handled by
<a href="http://man7.org/linux/man-pages/man3/fnmatch.3.html">POSIX <code class="language-plaintext highlighter-rouge">fnmatch()</code></a>. Here’s what the interface looks like.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">pass_match</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">in</span><span class="p">,</span> <span class="kt">FILE</span> <span class="o">*</span><span class="n">out</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">);</span>
</code></pre></div></div>

<p>Glob patterns are simple enough that pre-compilation, as would be done
for a regular expression, is unnecessary. The bare string is enough.</p>
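
<p>For reference, here’s a rough sketch of how this first version might
be implemented. The fixed 4096-byte line buffer is my assumption, and
longer lines would be split and matched in pieces, so treat it as an
illustration rather than a finished implementation.</p>

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Copy each line of `in` that matches the glob `pattern` to `out`. */
void pass_match(FILE *in, FILE *out, const char *pattern)
{
    char line[4096];  /* assumed maximum line length */
    while (fgets(line, sizeof(line), in)) {
        line[strcspn(line, "\n")] = '\0';  /* match without the newline */
        if (fnmatch(pattern, line, 0) == 0) {
            fputs(line, out);
            putc('\n', out);
        }
    }
}
```

<p>Something like <code class="language-plaintext highlighter-rouge">pass_match(stdin, stdout, "*.c")</code> would then pass along
only the lines ending in <code class="language-plaintext highlighter-rouge">.c</code>.</p>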

<p>Some time later the customer wants the program to support regular
expressions in addition to shell-style glob patterns. For efficiency’s
sake, regular expressions need to be pre-compiled and so will not be
passed to the function as a string. It will instead be a <a href="http://man7.org/linux/man-pages/man3/regexec.3.html">POSIX
<code class="language-plaintext highlighter-rouge">regex_t</code></a> object. A quick-and-dirty approach might be to
accept both and match whichever one isn’t NULL.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">pass_match</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">in</span><span class="p">,</span> <span class="kt">FILE</span> <span class="o">*</span><span class="n">out</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">,</span> <span class="n">regex_t</span> <span class="o">*</span><span class="n">re</span><span class="p">);</span>
</code></pre></div></div>

<p>Bleh. This is ugly and won’t scale well. What happens when more kinds
of filters are needed? It would be much better to accept a single
object that covers both cases, and possibly even another kind of
filter in the future.</p>

<h3 id="a-generalized-filter">A Generalized Filter</h3>

<p>One of the most common ways to customize the behavior of a
function in C is to pass a function pointer. For example, the final
argument to <a href="http://man7.org/linux/man-pages/man3/qsort.3.html"><code class="language-plaintext highlighter-rouge">qsort()</code></a> is a comparator that determines how
objects get sorted.</p>

<p>For <code class="language-plaintext highlighter-rouge">pass_match()</code>, this function would accept a string and return a
boolean value deciding if the string should be passed to the output
stream. It gets called once on each line of input.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">pass_match</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">in</span><span class="p">,</span> <span class="kt">FILE</span> <span class="o">*</span><span class="n">out</span><span class="p">,</span> <span class="n">bool</span> <span class="p">(</span><span class="o">*</span><span class="n">match</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">));</span>
</code></pre></div></div>

<p>However, this has one of the <a href="/blog/2014/08/29/">same problems as <code class="language-plaintext highlighter-rouge">qsort()</code></a>:
the passed function lacks context. It needs a pattern string or
<code class="language-plaintext highlighter-rouge">regex_t</code> object to operate on. In other languages these would be
attached to the function as a closure, but C doesn’t have closures. It
would need to be smuggled in via a global variable, <a href="/blog/2014/10/12/">which is not
good</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">regex_t</span> <span class="n">regex</span><span class="p">;</span>  <span class="c1">// BAD!!!</span>

<span class="n">bool</span> <span class="nf">regex_match</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">regexec</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Because of the global variable, in practice <code class="language-plaintext highlighter-rouge">pass_match()</code> would be
neither reentrant nor thread-safe. We could take a lesson from GNU’s
<code class="language-plaintext highlighter-rouge">qsort_r()</code> and accept a context to be passed to the filter function.
This simulates a closure.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">pass_match</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">in</span><span class="p">,</span> <span class="kt">FILE</span> <span class="o">*</span><span class="n">out</span><span class="p">,</span>
                <span class="n">bool</span> <span class="p">(</span><span class="o">*</span><span class="n">match</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">),</span> <span class="kt">void</span> <span class="o">*</span><span class="n">context</span><span class="p">);</span>
</code></pre></div></div>

<p>The provided context pointer would be passed to the filter function as
the second argument, and no global variables are needed. This would
probably be good enough for most purposes and it’s about as simple as
possible. The interface to <code class="language-plaintext highlighter-rouge">pass_match()</code> would cover any kind of
filter.</p>
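
<p>To make that concrete, here’s a sketch of two filter functions
written against this interface, along with a possible <code class="language-plaintext highlighter-rouge">pass_match()</code>
body. The names <code class="language-plaintext highlighter-rouge">glob_match()</code> and <code class="language-plaintext highlighter-rouge">regex_match()</code> and the 4096-byte
line buffer are my own assumptions for illustration.</p>

```c
#include <fnmatch.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Glob filter: the context is the pattern string. */
bool glob_match(const char *string, void *context)
{
    const char *pattern = context;
    return fnmatch(pattern, string, 0) == 0;
}

/* Regex filter: the context is a compiled regex_t. */
bool regex_match(const char *string, void *context)
{
    regex_t *regex = context;
    return regexec(regex, string, 0, NULL, 0) == 0;
}

/* Pass each matching line along, handing the context back to the filter. */
void pass_match(FILE *in, FILE *out,
                bool (*match)(const char *, void *), void *context)
{
    char line[4096];  /* assumed maximum line length */
    while (fgets(line, sizeof(line), in)) {
        line[strcspn(line, "\n")] = '\0';
        if (match(line, context)) {
            fputs(line, out);
            putc('\n', out);
        }
    }
}
```

<p>The caller picks the pairing: <code class="language-plaintext highlighter-rouge">glob_match</code> with a pattern string, or
<code class="language-plaintext highlighter-rouge">regex_match</code> with a pointer to a <code class="language-plaintext highlighter-rouge">regex_t</code> prepared by <code class="language-plaintext highlighter-rouge">regcomp()</code>.
Nothing stops a caller from mismatching the two, which is part of the
motivation for packaging them together.</p>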

<p>But wouldn’t it be nice to package the function and context together
as one object?</p>

<h3 id="more-abstraction">More Abstraction</h3>

<p>How about putting the context on a struct and making an interface out
of that? Here’s a tagged union that behaves as one or the other.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="n">filter_type</span> <span class="p">{</span> <span class="n">GLOB</span><span class="p">,</span> <span class="n">REGEX</span> <span class="p">};</span>

<span class="k">struct</span> <span class="n">filter</span> <span class="p">{</span>
    <span class="k">enum</span> <span class="n">filter_type</span> <span class="n">type</span><span class="p">;</span>
    <span class="k">union</span> <span class="p">{</span>
        <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">;</span>
        <span class="n">regex_t</span> <span class="n">regex</span><span class="p">;</span>
    <span class="p">}</span> <span class="n">context</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>There’s one function for interacting with this struct:
<code class="language-plaintext highlighter-rouge">filter_match()</code>. It checks the <code class="language-plaintext highlighter-rouge">type</code> member and calls the correct
function with the correct context.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bool</span> <span class="nf">filter_match</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">filter</span><span class="o">-&gt;</span><span class="n">type</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">GLOB</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">fnmatch</span><span class="p">(</span><span class="n">filter</span><span class="o">-&gt;</span><span class="n">context</span><span class="p">.</span><span class="n">pattern</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">case</span> <span class="n">REGEX</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">regexec</span><span class="p">(</span><span class="o">&amp;</span><span class="n">filter</span><span class="o">-&gt;</span><span class="n">context</span><span class="p">.</span><span class="n">regex</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">abort</span><span class="p">();</span> <span class="c1">// programmer error</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And the <code class="language-plaintext highlighter-rouge">pass_match()</code> API now looks like this. This will be the final
change to <code class="language-plaintext highlighter-rouge">pass_match()</code>, both in implementation and interface.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">pass_match</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">input</span><span class="p">,</span> <span class="kt">FILE</span> <span class="o">*</span><span class="n">output</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span><span class="p">);</span>
</code></pre></div></div>

<p>It still doesn’t care how the filter works, so it’s good enough to
cover all future cases. It just calls <code class="language-plaintext highlighter-rouge">filter_match()</code> on the pointer
it was given. However, the <code class="language-plaintext highlighter-rouge">switch</code> and tagged union aren’t friendly
to extension. Really, it’s outright hostile. We finally have some
degree of polymorphism, but it’s crude. It’s like building duct tape
into a design. Adding new behavior means adding another <code class="language-plaintext highlighter-rouge">switch</code> case.
This is a step backwards. We can do better.</p>

<h4 id="methods">Methods</h4>

<p>With the <code class="language-plaintext highlighter-rouge">switch</code> we’re no longer taking advantage of function
pointers. So what about putting a function pointer on the struct?</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter</span> <span class="p">{</span>
    <span class="n">bool</span> <span class="p">(</span><span class="o">*</span><span class="n">match</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The filter itself is passed as the first argument, providing context.
In object oriented languages, that’s the implicit <code class="language-plaintext highlighter-rouge">this</code> argument. To
avoid requiring the caller to worry about this detail, we’ll hide it
in a new <code class="language-plaintext highlighter-rouge">switch</code>-free version of <code class="language-plaintext highlighter-rouge">filter_match()</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bool</span> <span class="nf">filter_match</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">filter</span><span class="o">-&gt;</span><span class="n">match</span><span class="p">(</span><span class="n">filter</span><span class="p">,</span> <span class="n">string</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice we’re still lacking the actual context, the pattern string or
the regex object. Those will be different structs that embed the
filter struct.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter_regex</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="n">filter</span><span class="p">;</span>
    <span class="n">regex_t</span> <span class="n">regex</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">filter_glob</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="n">filter</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>In both, the original filter struct is the first member. This is
critical. We’re going to be using a trick called <em>type punning</em>. The
first member is guaranteed to be positioned at the beginning of the
struct, so a pointer to a <code class="language-plaintext highlighter-rouge">struct filter_glob</code> is also a pointer to a
<code class="language-plaintext highlighter-rouge">struct filter</code>. Notice any resemblance to inheritance?</p>

<p>Each type, glob and regex, needs its own match method.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="p">)</span> <span class="n">filter</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">regexec</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_glob</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_glob</span> <span class="o">*</span><span class="n">glob</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">filter_glob</span> <span class="o">*</span><span class="p">)</span> <span class="n">filter</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">fnmatch</span><span class="p">(</span><span class="n">glob</span><span class="o">-&gt;</span><span class="n">pattern</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’ve prefixed them with <code class="language-plaintext highlighter-rouge">method_</code> to indicate their intended usage. I
declared these <code class="language-plaintext highlighter-rouge">static</code> because they’re completely private. Other
parts of the program will only be accessing them through a function
pointer on the struct. This means we need some constructors in order
to set up those function pointers. (For simplicity, I’m not error
checking.)</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="nf">filter_regex_create</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">regex</span><span class="p">));</span>
    <span class="n">regcomp</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">,</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">REG_EXTENDED</span><span class="p">);</span>
    <span class="n">regex</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">.</span><span class="n">match</span> <span class="o">=</span> <span class="n">method_match_regex</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="nf">filter_glob_create</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_glob</span> <span class="o">*</span><span class="n">glob</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">glob</span><span class="p">));</span>
    <span class="n">glob</span><span class="o">-&gt;</span><span class="n">pattern</span> <span class="o">=</span> <span class="n">pattern</span><span class="p">;</span>
    <span class="n">glob</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">.</span><span class="n">match</span> <span class="o">=</span> <span class="n">method_match_glob</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">&amp;</span><span class="n">glob</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now this is real polymorphism. It’s really simple from the user’s
perspective. They call the correct constructor and get a filter object
that has the desired behavior. This object can be passed around
trivially, and no other part of the program worries about how it’s
implemented. Best of all, since each method is a separate function
rather than a <code class="language-plaintext highlighter-rouge">switch</code> case, new kinds of filter subtypes can be
defined independently. Users can create their own filter types that
work just as well as the two “built-in” filters.</p>
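
<p>To illustrate that last point, here’s a sketch of what a user-defined filter might look like. The prefix filter and all of its names (<code class="language-plaintext highlighter-rouge">filter_prefix</code>, <code class="language-plaintext highlighter-rouge">filter_prefix_create()</code>) are hypothetical and not part of the original program. The base struct and <code class="language-plaintext highlighter-rouge">filter_match()</code> are repeated so the snippet stands alone.</p>

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct filter {
    bool (*match)(struct filter *, const char *);
};

bool filter_match(struct filter *filter, const char *string)
{
    return filter->match(filter, string);
}

/* Hypothetical user-defined subtype: matches strings with a prefix. */
struct filter_prefix {
    struct filter filter;   /* first member, so the cast below is valid */
    const char *prefix;
    size_t length;
};

static bool
method_match_prefix(struct filter *filter, const char *string)
{
    struct filter_prefix *prefix = (struct filter_prefix *) filter;
    return strncmp(string, prefix->prefix, prefix->length) == 0;
}

struct filter *filter_prefix_create(const char *prefix)
{
    struct filter_prefix *f = malloc(sizeof(*f));
    if (!f)
        return NULL;
    f->prefix = prefix;     /* borrowed, not copied */
    f->length = strlen(prefix);
    f->filter.match = method_match_prefix;
    return &f->filter;
}
```

<p>Nothing in the “built-in” filters had to change, and callers still only see a <code class="language-plaintext highlighter-rouge">struct filter *</code>.</p>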

<h4 id="cleaning-up">Cleaning Up</h4>

<p>Oops, the regex filter needs to be cleaned up when it’s done, but the
user, by design, won’t know how to do it. Let’s add a <code class="language-plaintext highlighter-rouge">free()</code> method.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter</span> <span class="p">{</span>
    <span class="n">bool</span> <span class="p">(</span><span class="o">*</span><span class="n">match</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">);</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">free</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">);</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="nf">filter_free</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">filter</span><span class="o">-&gt;</span><span class="n">free</span><span class="p">(</span><span class="n">filter</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And the methods for each. These would also be assigned in the
constructor.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">method_free_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="p">)</span> <span class="n">f</span><span class="p">;</span>
    <span class="n">regfree</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">method_free_glob</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">free</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The glob constructor should perhaps <code class="language-plaintext highlighter-rouge">strdup()</code> its pattern as a
private copy, in which case it would be freed here.</p>
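
<p>A sketch of that variation, assuming POSIX <code class="language-plaintext highlighter-rouge">strdup()</code> is available. Unlike the constructors above, this one does check for allocation failure, which is my own addition.</p>

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <fnmatch.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct filter {
    bool (*match)(struct filter *, const char *);
    void (*free)(struct filter *);
};

struct filter_glob {
    struct filter filter;
    char *pattern;          /* private copy, owned by the filter */
};

static bool
method_match_glob(struct filter *f, const char *string)
{
    struct filter_glob *glob = (struct filter_glob *) f;
    return fnmatch(glob->pattern, string, 0) == 0;
}

static void
method_free_glob(struct filter *f)
{
    struct filter_glob *glob = (struct filter_glob *) f;
    free(glob->pattern);    /* release the private copy first */
    free(glob);
}

struct filter *filter_glob_create(const char *pattern)
{
    struct filter_glob *glob = malloc(sizeof(*glob));
    if (!glob)
        return NULL;
    glob->pattern = strdup(pattern);
    if (!glob->pattern) {
        free(glob);
        return NULL;
    }
    glob->filter.match = method_match_glob;
    glob->filter.free = method_free_glob;
    return &glob->filter;
}
```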

<h3 id="object-composition">Object Composition</h3>

<p>A good rule of thumb is to prefer composition over inheritance. Having
tidy filter objects opens up some interesting possibilities for
composition. Here’s an AND filter that composes two arbitrary filter
objects. It only matches when both its subfilters match. It supports
short circuiting, so put the faster, or most discriminating, filter
first in the constructor (user’s responsibility).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter_and</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="n">filter</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">sub</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="p">};</span>

<span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_and</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_and</span> <span class="o">*</span><span class="n">and</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">filter_and</span> <span class="o">*</span><span class="p">)</span> <span class="n">f</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">filter_match</span><span class="p">(</span><span class="n">and</span><span class="o">-&gt;</span><span class="n">sub</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">s</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="n">filter_match</span><span class="p">(</span><span class="n">and</span><span class="o">-&gt;</span><span class="n">sub</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">s</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">method_free_and</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_and</span> <span class="o">*</span><span class="n">and</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">filter_and</span> <span class="o">*</span><span class="p">)</span> <span class="n">f</span><span class="p">;</span>
    <span class="n">filter_free</span><span class="p">(</span><span class="n">and</span><span class="o">-&gt;</span><span class="n">sub</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="n">filter_free</span><span class="p">(</span><span class="n">and</span><span class="o">-&gt;</span><span class="n">sub</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="nf">filter_and</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_and</span> <span class="o">*</span><span class="n">and</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">and</span><span class="p">));</span>
    <span class="n">and</span><span class="o">-&gt;</span><span class="n">sub</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
    <span class="n">and</span><span class="o">-&gt;</span><span class="n">sub</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
    <span class="n">and</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">.</span><span class="n">match</span> <span class="o">=</span> <span class="n">method_match_and</span><span class="p">;</span>
    <span class="n">and</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">.</span><span class="n">free</span> <span class="o">=</span> <span class="n">method_free_and</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">&amp;</span><span class="n">and</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It can combine a regex filter and a glob filter, or two regex filters,
or two glob filters, or even other AND filters. It doesn’t care what
the subfilters are. Also, the <code class="language-plaintext highlighter-rouge">free()</code> method here frees its
subfilters. This means that the user doesn’t need to keep hold of
every filter created, just the “top” one in the composition.</p>

<p>To make composition filters easier to use, here are two “constant”
filters. These are statically allocated, shared, and are never
actually freed.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_any</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_none</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">method_free_noop</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">filter</span> <span class="n">FILTER_ANY</span>  <span class="o">=</span> <span class="p">{</span> <span class="n">method_match_any</span><span class="p">,</span>  <span class="n">method_free_noop</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">filter</span> <span class="n">FILTER_NONE</span> <span class="o">=</span> <span class="p">{</span> <span class="n">method_match_none</span><span class="p">,</span> <span class="n">method_free_noop</span> <span class="p">};</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">FILTER_NONE</code> filter will generally be used with a (theoretical)
<code class="language-plaintext highlighter-rouge">filter_or()</code> and <code class="language-plaintext highlighter-rouge">FILTER_ANY</code> will generally be used with the
previously defined <code class="language-plaintext highlighter-rouge">filter_and()</code>.</p>
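
<p>For completeness, here’s one way the theoretical <code class="language-plaintext highlighter-rouge">filter_or()</code> might be written. It’s a sketch that mirrors <code class="language-plaintext highlighter-rouge">filter_and()</code>, not code from the original program. The base definitions and constant filters are repeated so it stands alone, and the allocation check is my own addition.</p>

```c
#include <stdbool.h>
#include <stdlib.h>

struct filter {
    bool (*match)(struct filter *, const char *);
    void (*free)(struct filter *);
};

bool filter_match(struct filter *filter, const char *string)
{
    return filter->match(filter, string);
}

void filter_free(struct filter *filter)
{
    filter->free(filter);
}

static bool method_match_any(struct filter *f, const char *s)
{
    (void)f; (void)s;
    return true;
}

static bool method_match_none(struct filter *f, const char *s)
{
    (void)f; (void)s;
    return false;
}

static void method_free_noop(struct filter *f)
{
    (void)f;
}

struct filter FILTER_ANY  = { method_match_any,  method_free_noop };
struct filter FILTER_NONE = { method_match_none, method_free_noop };

struct filter_or {
    struct filter filter;
    struct filter *sub[2];
};

static bool
method_match_or(struct filter *f, const char *s)
{
    struct filter_or *self = (struct filter_or *) f;
    /* Short circuits: the second subfilter only runs if the first fails. */
    return filter_match(self->sub[0], s) || filter_match(self->sub[1], s);
}

static void
method_free_or(struct filter *f)
{
    struct filter_or *self = (struct filter_or *) f;
    filter_free(self->sub[0]);
    filter_free(self->sub[1]);
    free(self);
}

struct filter *filter_or(struct filter *a, struct filter *b)
{
    struct filter_or *self = malloc(sizeof(*self));
    if (!self)
        return NULL;
    self->sub[0] = a;
    self->sub[1] = b;
    self->filter.match = method_match_or;
    self->filter.free = method_free_or;
    return &self->filter;
}
```

<p><code class="language-plaintext highlighter-rouge">FILTER_NONE</code> is the identity for OR, just as <code class="language-plaintext highlighter-rouge">FILTER_ANY</code> is the identity for AND, which is why each makes a convenient starting value when folding a list of filters together.</p>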

<p>Here’s a simple program that composes multiple glob filters into a
single filter, one for each program argument.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">filter</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">FILTER_ANY</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">char</span> <span class="o">**</span><span class="n">p</span> <span class="o">=</span> <span class="n">argv</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span> <span class="n">p</span><span class="o">++</span><span class="p">)</span>
        <span class="n">filter</span> <span class="o">=</span> <span class="n">filter_and</span><span class="p">(</span><span class="n">filter_glob_create</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">),</span> <span class="n">filter</span><span class="p">);</span>
    <span class="n">pass_match</span><span class="p">(</span><span class="n">stdin</span><span class="p">,</span> <span class="n">stdout</span><span class="p">,</span> <span class="n">filter</span><span class="p">);</span>
    <span class="n">filter_free</span><span class="p">(</span><span class="n">filter</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice only one call to <code class="language-plaintext highlighter-rouge">filter_free()</code> is needed to clean up the
entire filter.</p>

<h3 id="multiple-inheritance">Multiple Inheritance</h3>

<p>As I mentioned before, the filter struct must be the first member of
filter subtype structs in order for type punning to work. If we want
to “inherit” from two different types like this, they would both need
to be in this position: a contradiction.</p>

<p>Fortunately, type punning can be generalized such that the
first-member constraint isn’t necessary. This is commonly done through
a <code class="language-plaintext highlighter-rouge">container_of()</code> macro. Here’s a C99-conforming definition.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stddef.h&gt;</span><span class="cp">
</span>
<span class="cp">#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
</span></code></pre></div></div>

<p>Given a pointer to a member of a struct, the <code class="language-plaintext highlighter-rouge">container_of()</code> macro
allows us to back out to the containing struct. Suppose the regex
struct was defined differently, so that the <code class="language-plaintext highlighter-rouge">regex_t</code> member came
first.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter_regex</span> <span class="p">{</span>
    <span class="n">regex_t</span> <span class="n">regex</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="n">filter</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The constructor remains unchanged. The casts in the methods change to
the macro.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">container_of</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter_regex</span><span class="p">,</span> <span class="n">filter</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">regexec</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">method_free_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">container_of</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter_regex</span><span class="p">,</span> <span class="n">filter</span><span class="p">);</span>
    <span class="n">regfree</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">regex</span><span class="p">);</span>  <span class="cm">/* free the containing struct, not the member */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s a constant, compile-time computed offset, so there should be no
practical performance impact. The filter can now participate freely in
other <em>intrusive</em> data structures, like linked lists and such. It’s
analogous to multiple inheritance.</p>
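
<p>Here’s a small sketch of that idea: one struct embedding both a filter and an intrusive list node, with neither in the first position. The <code class="language-plaintext highlighter-rouge">list_node</code> type, the length filter, and the helper functions are all hypothetical names invented for this illustration.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct filter {
    bool (*match)(struct filter *, const char *);
};

/* Hypothetical intrusive singly-linked list node. */
struct list_node {
    struct list_node *next;
};

/* Hypothetical subtype with two "bases", neither of them first. */
struct filter_length {
    size_t max;
    struct list_node node;
    struct filter filter;
};

/* Matches strings no longer than max bytes. */
static bool
method_match_length(struct filter *f, const char *string)
{
    struct filter_length *self = container_of(f, struct filter_length, filter);
    return strlen(string) <= self->max;
}

void filter_length_init(struct filter_length *self, size_t max,
                        struct list_node *next)
{
    self->max = max;
    self->node.next = next;
    self->filter.match = method_match_length;
}

/* Walk the list by its nodes, backing out to each container in turn,
 * and count how many of the filters accept the string. */
size_t list_count_matches(struct list_node *head, const char *string)
{
    size_t count = 0;
    for (struct list_node *n = head; n; n = n->next) {
        struct filter_length *self = container_of(n, struct filter_length, node);
        if (self->filter.match(&self->filter, string))
            count++;
    }
    return count;
}
```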

<h3 id="vtables">Vtables</h3>

<p>Say we want to add a third method, <code class="language-plaintext highlighter-rouge">clone()</code>, to the filter API, to
make an independent copy of a filter, one that will need to be
separately freed. It will be like the copy constructor in C++.
Each kind of filter will need to define an appropriate “method” for
it. As long as new methods like this are added at the end, this
doesn’t break the API, but it does break the ABI regardless.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter</span> <span class="p">{</span>
    <span class="n">bool</span> <span class="p">(</span><span class="o">*</span><span class="n">match</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">);</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">free</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">);</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">clone</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The filter object is starting to get big. It’s got three pointers,
24 bytes on typical 64-bit systems, and these pointers are the same between
all instances of the same type. That’s a lot of redundancy. Instead,
these pointers could be shared between instances in a common table
called a <em>virtual method table</em>, commonly known as a <em>vtable</em>.</p>

<p>Here’s a vtable version of the filter API. The overhead is now only
one pointer regardless of the number of methods in the interface.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_vtable</span> <span class="o">*</span><span class="n">vtable</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">filter_vtable</span> <span class="p">{</span>
    <span class="n">bool</span> <span class="p">(</span><span class="o">*</span><span class="n">match</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="p">);</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">free</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">);</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">clone</span><span class="p">)(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Each type creates its own vtable and links to it in the constructor.
Here’s the regex filter re-written for the new vtable API and clone
method. These are all the tricks in one basket for a big object-oriented
C finale!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="nf">filter_regex_create</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">);</span>

<span class="k">struct</span> <span class="n">filter_regex</span> <span class="p">{</span>
    <span class="n">regex_t</span> <span class="n">regex</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">filter</span> <span class="n">filter</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">static</span> <span class="n">bool</span>
<span class="nf">method_match_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">container_of</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter_regex</span><span class="p">,</span> <span class="n">filter</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">regexec</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">method_free_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">container_of</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter_regex</span><span class="p">,</span> <span class="n">filter</span><span class="p">);</span>
    <span class="n">regfree</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">regex</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">static</span> <span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span>
<span class="nf">method_clone_regex</span><span class="p">(</span><span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">container_of</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="k">struct</span> <span class="n">filter_regex</span><span class="p">,</span> <span class="n">filter</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">filter_regex_create</span><span class="p">(</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">pattern</span><span class="p">);</span>
<span class="p">}</span>

<span class="cm">/* vtable */</span>
<span class="k">struct</span> <span class="n">filter_vtable</span> <span class="n">filter_regex_vtable</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">method_match_regex</span><span class="p">,</span> <span class="n">method_free_regex</span><span class="p">,</span> <span class="n">method_clone_regex</span>
<span class="p">};</span>

<span class="cm">/* constructor */</span>
<span class="k">struct</span> <span class="n">filter</span> <span class="o">*</span><span class="nf">filter_regex_create</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pattern</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">filter_regex</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">regex</span><span class="p">));</span>
    <span class="n">regex</span><span class="o">-&gt;</span><span class="n">pattern</span> <span class="o">=</span> <span class="n">pattern</span><span class="p">;</span>
    <span class="n">regcomp</span><span class="p">(</span><span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">regex</span><span class="p">,</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">REG_EXTENDED</span><span class="p">);</span>
    <span class="n">regex</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">.</span><span class="n">vtable</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">filter_regex_vtable</span><span class="p">;</span>
    <span class="k">return</span> <span class="o">&amp;</span><span class="n">regex</span><span class="o">-&gt;</span><span class="n">filter</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is almost exactly what’s going on behind the scenes in C++. When
a method/function is declared <code class="language-plaintext highlighter-rouge">virtual</code>, and therefore dispatches
based on the run-time type of its left-most argument, it’s listed in
the vtables for classes that implement it. Otherwise it’s just a
normal function. This is why functions need to be declared <code class="language-plaintext highlighter-rouge">virtual</code>
ahead of time in C++.</p>
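
<p>As a rough illustration of that difference, here is a minimal C++ sketch. The class names <code class="language-plaintext highlighter-rouge">Filter</code> and <code class="language-plaintext highlighter-rouge">RegexFilter</code> and the helper <code class="language-plaintext highlighter-rouge">describe()</code> are hypothetical, chosen only to echo the C example above. One method is <code class="language-plaintext highlighter-rouge">virtual</code> and the other is not, and both are called through a base-class reference.</p>

```cpp
#include <string>

// Hypothetical classes for illustration; not part of the C program above.
struct Filter {
    virtual ~Filter() = default;
    // Virtual: resolved from the object's run-time type (via the vtable
    // in typical implementations).
    virtual std::string match() const { return "Filter::match"; }
    // Non-virtual: resolved from the static type at compile time.
    std::string name() const { return "Filter::name"; }
};

struct RegexFilter : Filter {
    std::string match() const override { return "RegexFilter::match"; }
    // Hides Filter::name rather than overriding it.
    std::string name() const { return "RegexFilter::name"; }
};

// Calls both methods through a reference to the base class.
std::string describe(const Filter &f) {
    return f.match() + " " + f.name();
}
```

<p>Passing a <code class="language-plaintext highlighter-rouge">RegexFilter</code> to <code class="language-plaintext highlighter-rouge">describe()</code> yields <code class="language-plaintext highlighter-rouge">RegexFilter::match Filter::name</code>: only the virtual method follows the run-time type.</p>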

<p>In conclusion, it’s relatively easy to get the core benefits of object
oriented programming in plain old C. It doesn’t require heavy use of
macros, nor do users of these systems need to know that underneath
it’s an object system, unless they want to extend it for themselves.</p>

<p>Here’s the whole example program in one place, if you’re interested in poking at it:</p>

<ul>
  <li><a href="https://gist.github.com/skeeto/5faa131b19673549d8ca">https://gist.github.com/skeeto/5faa131b19673549d8ca</a></li>
</ul>

]]>
    </content>
  </entry>
  <entry>
    <title>Duck Typing vs. Type Erasure</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/04/01/"/>
    <id>urn:uuid:d01a1d2e-2752-35f4-949a-ff69d7f78e22</id>
    <updated>2014-04-01T21:07:31Z</updated>
    <category term="java"/><category term="cpp"/><category term="lang"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>Consider the following C++ class.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">struct</span> <span class="nc">Caller</span> <span class="p">{</span>
  <span class="k">const</span> <span class="n">T</span> <span class="n">callee_</span><span class="p">;</span>
  <span class="n">Caller</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span> <span class="n">callee</span><span class="p">)</span> <span class="o">:</span> <span class="n">callee_</span><span class="p">(</span><span class="n">callee</span><span class="p">)</span> <span class="p">{}</span>
  <span class="kt">void</span> <span class="n">go</span><span class="p">()</span> <span class="p">{</span> <span class="n">callee_</span><span class="p">.</span><span class="n">call</span><span class="p">();</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Caller can be parameterized with <em>any</em> type so long as it has a <code class="language-plaintext highlighter-rouge">call()</code>
method that can be invoked on a const object. For example, introduce two types, Foo and Bar.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Foo</span> <span class="p">{</span>
  <span class="kt">void</span> <span class="n">call</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Foo"</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">Bar</span> <span class="p">{</span>
  <span class="kt">void</span> <span class="n">call</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Bar"</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">Caller</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;</span> <span class="n">foo</span><span class="p">{</span><span class="n">Foo</span><span class="p">()};</span>
  <span class="n">Caller</span><span class="o">&lt;</span><span class="n">Bar</span><span class="o">&gt;</span> <span class="n">bar</span><span class="p">{</span><span class="n">Bar</span><span class="p">()};</span>
  <span class="n">foo</span><span class="p">.</span><span class="n">go</span><span class="p">();</span>
  <span class="n">bar</span><span class="p">.</span><span class="n">go</span><span class="p">();</span>
  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This code compiles cleanly and, when run, emits “FooBar”. This is an
example of <em>duck typing</em> — i.e., “If it looks like a duck, swims like
a duck, and quacks like a duck, then it probably is a duck.” Foo and
Bar are unrelated types. They have no common inheritance, but by
providing the expected interface, they both work with Caller.
This is a special case of <em>polymorphism</em>.</p>

<p>Duck typing is normally only found in dynamically typed languages.
Thanks to templates, a statically, strongly typed language like C++
can have duck typing without sacrificing any type safety.</p>
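
<p>One way to see that the type checking really does happen at compile time is to ask the compiler whether a type provides the expected interface. The following is only a sketch, assuming C++17 for <code class="language-plaintext highlighter-rouge">std::void_t</code>. The trait name <code class="language-plaintext highlighter-rouge">has_call</code> and the type <code class="language-plaintext highlighter-rouge">Baz</code> are made up for this illustration.</p>

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: false unless the specialization below applies.
template <typename T, typename = void>
struct has_call : std::false_type {};

// Chosen only when `t.call()` is a valid expression for a const T.
template <typename T>
struct has_call<T, std::void_t<decltype(std::declval<const T &>().call())>>
    : std::true_type {};

struct Foo {
    void call() const {}
};

// Hypothetical type that does not quack like a duck.
struct Baz {
    void quack() const {}
};

// Both facts are established by the compiler, before the program runs.
static_assert(has_call<Foo>::value, "Foo provides call()");
static_assert(!has_call<Baz>::value, "Baz does not provide call()");
```

<p>Using <code class="language-plaintext highlighter-rouge">Baz</code> with Caller and calling <code class="language-plaintext highlighter-rouge">go()</code> would be rejected by the compiler for the same reason, rather than failing at run time as it would in a dynamically typed language.</p>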

<h3 id="java-duck-typing">Java Duck Typing</h3>

<p>Let’s try the same thing in Java using generics.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">Caller</span><span class="o">&lt;</span><span class="no">T</span><span class="o">&gt;</span> <span class="o">{</span>
    <span class="kd">final</span> <span class="no">T</span> <span class="n">callee</span><span class="o">;</span>
    <span class="nc">Caller</span><span class="o">(</span><span class="no">T</span> <span class="n">callee</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">this</span><span class="o">.</span><span class="na">callee</span> <span class="o">=</span> <span class="n">callee</span><span class="o">;</span>
    <span class="o">}</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">go</span><span class="o">()</span> <span class="o">{</span>
        <span class="n">callee</span><span class="o">.</span><span class="na">call</span><span class="o">();</span>  <span class="c1">// compiler error: cannot find symbol call</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="kd">class</span> <span class="nc">Foo</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">call</span><span class="o">()</span> <span class="o">{</span> <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">print</span><span class="o">(</span><span class="s">"Foo"</span><span class="o">);</span> <span class="o">}</span>
<span class="o">}</span>

<span class="kd">class</span> <span class="nc">Bar</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">call</span><span class="o">()</span> <span class="o">{</span> <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">print</span><span class="o">(</span><span class="s">"Bar"</span><span class="o">);</span> <span class="o">}</span>
<span class="o">}</span>

<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Main</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="nc">String</span> <span class="n">args</span><span class="o">[])</span> <span class="o">{</span>
        <span class="nc">Caller</span><span class="o">&lt;</span><span class="nc">Foo</span><span class="o">&gt;</span> <span class="n">f</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Caller</span><span class="o">&lt;&gt;(</span><span class="k">new</span> <span class="nc">Foo</span><span class="o">());</span>
        <span class="nc">Caller</span><span class="o">&lt;</span><span class="nc">Bar</span><span class="o">&gt;</span> <span class="n">b</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Caller</span><span class="o">&lt;&gt;(</span><span class="k">new</span> <span class="nc">Bar</span><span class="o">());</span>
        <span class="n">f</span><span class="o">.</span><span class="na">go</span><span class="o">();</span>
        <span class="n">b</span><span class="o">.</span><span class="na">go</span><span class="o">();</span>
        <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">();</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The program is practically identical, but this will fail with a
compile-time error. This is the result of <em>type erasure</em>. Unlike C++’s
templates, there will only ever be one compiled version of Caller, and
T will become Object. Since Object has no <code class="language-plaintext highlighter-rouge">call()</code> method, compilation
fails. The generic type is only for enabling additional compiler
checks later on.</p>

<p>C++ templates behave like macros, expanded by the compiler once for
each different type of applied parameter. The <code class="language-plaintext highlighter-rouge">call</code> symbol is looked
up later, after the type has been fully realized, not when the
template is defined.</p>
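
<p>A small sketch of this late lookup follows. It reuses Caller from above with one change made for illustration: <code class="language-plaintext highlighter-rouge">go()</code> returns a string instead of printing. The helper functions <code class="language-plaintext highlighter-rouge">held_value()</code> and <code class="language-plaintext highlighter-rouge">foo_result()</code> are hypothetical. Because member functions of a class template are only instantiated when used, <code class="language-plaintext highlighter-rouge">Caller&lt;int&gt;</code> is accepted so long as <code class="language-plaintext highlighter-rouge">go()</code> is never called on it.</p>

```cpp
#include <string>

template <typename T>
struct Caller {
    const T callee_;
    Caller(const T callee) : callee_(callee) {}
    // `call` is looked up only when go() is instantiated for a concrete T.
    std::string go() const { return callee_.call(); }
};

struct Foo {
    std::string call() const { return "Foo"; }
};

// int has no call() member, yet Caller<int> is usable here because
// go() is never instantiated for it.
inline int held_value() {
    Caller<int> c{42};
    return c.callee_;
}

// With Foo, go() is instantiated and call() is found.
inline std::string foo_result() {
    Caller<Foo> c{Foo()};
    return c.go();
}
```

<p>Adding a call to <code class="language-plaintext highlighter-rouge">go()</code> inside <code class="language-plaintext highlighter-rouge">held_value()</code> would turn this into a compile-time error, reported at the point where the template is instantiated for <code class="language-plaintext highlighter-rouge">int</code>.</p>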

<p>To fix this, Foo and Bar need a common ancestry. Let’s make this
<code class="language-plaintext highlighter-rouge">Callee</code>.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">interface</span> <span class="nc">Callee</span> <span class="o">{</span>
    <span class="kt">void</span> <span class="nf">call</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Caller needs to be redefined such that T is a subtype of Callee.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">Caller</span><span class="o">&lt;</span><span class="no">T</span> <span class="kd">extends</span> <span class="nc">Callee</span><span class="o">&gt;</span> <span class="o">{</span>
    <span class="c1">// ...</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This now compiles cleanly because <code class="language-plaintext highlighter-rouge">call()</code> will be found in <code class="language-plaintext highlighter-rouge">Callee</code>.
Finally, implement Callee.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">Foo</span> <span class="kd">implements</span> <span class="nc">Callee</span> <span class="o">{</span>
    <span class="c1">// ...</span>
<span class="o">}</span>

<span class="kd">class</span> <span class="nc">Bar</span> <span class="kd">implements</span> <span class="nc">Callee</span> <span class="o">{</span>
    <span class="c1">// ...</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This is no longer duck typing, just plain old polymorphism. Type
erasure prohibits duck typing in Java (outside of dirty reflection
hacks).</p>

<h3 id="signals-and-slots-and-events-oh-my">Signals and Slots and Events! Oh My!</h3>

<p>Duck typing is useful for implementing the observer pattern without as
much boilerplate. A class can participate in the observer pattern
without <a href="http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html">inheriting from some specialized class</a> or interface.
For example, see <a href="http://en.wikipedia.org/wiki/Signals_and_slots">the various signal and slots systems for C++</a>.
In contrast, Java <a href="http://docs.oracle.com/javase/7/docs/api/java/util/EventListener.html">has an EventListener type for everything</a>:</p>

<ul>
  <li>KeyListener</li>
  <li>MouseListener</li>
  <li>MouseMotionListener</li>
  <li>FocusListener</li>
  <li>ActionListener, etc.</li>
</ul>

<p>A class concerned with many different kinds of events, such as an
event logger, would need to implement a large number of interfaces.</p>
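
<p>For comparison, here is a minimal sketch of how such a logger might look with a duck-typed signal in C++. The <code class="language-plaintext highlighter-rouge">Signal</code> class, the <code class="language-plaintext highlighter-rouge">EventLogger</code> type, and <code class="language-plaintext highlighter-rouge">logger_demo()</code> are all hypothetical and not modeled on any particular signals-and-slots library. A slot is anything callable with the right arguments, so the logger inherits from nothing.</p>

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal signal; real libraries offer far more.
template <typename... Args>
class Signal {
    std::vector<std::function<void(Args...)>> slots_;

public:
    // Accepts anything callable with Args...; no base class required.
    template <typename F>
    void connect(F &&slot) {
        slots_.emplace_back(std::forward<F>(slot));
    }

    // Invokes every connected slot in connection order.
    void emit(Args... args) const {
        for (const auto &slot : slots_) slot(args...);
    }
};

// Listens to several kinds of events without any listener interface.
struct EventLogger {
    std::vector<std::string> log;
    void on_key(int code) { log.push_back("key " + std::to_string(code)); }
    void on_click(int x, int y) {
        log.push_back("click " + std::to_string(x) + "," + std::to_string(y));
    }
};

// Wires one logger to two unrelated signals and returns what it recorded.
inline std::vector<std::string> logger_demo() {
    Signal<int> key_pressed;
    Signal<int, int> mouse_clicked;
    EventLogger logger;
    key_pressed.connect([&logger](int code) { logger.on_key(code); });
    mouse_clicked.connect([&logger](int x, int y) { logger.on_click(x, y); });
    key_pressed.emit(65);
    mouse_clicked.emit(3, 4);
    return logger.log;
}
```

<p>Supporting another kind of event here means adding one more method and one more <code class="language-plaintext highlighter-rouge">connect()</code> call, with no change to the logger’s type hierarchy.</p>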

]]>
    </content>
  </entry>

</feed>
