<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged netsec at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/netsec/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/netsec/feed/"/>
  <updated>2026-04-26T00:45:29Z</updated>
  <id>urn:uuid:7c40636f-98cb-4dbf-b2a1-ed73f8712af7</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Infectious Executable Stacks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/11/15/"/>
    <id>urn:uuid:7266b2ea-f39e-4b9a-87c8-e4480374af41</id>
    <updated>2019-11-15T03:29:37Z</updated>
    <category term="c"/><category term="netsec"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=21553882">on Hacker News</a></em>.</p>

<p>In software development there are many concepts that at first glance
seem useful and sound, but, after considering the consequences of their
implementation and use, are actually horrifying. Examples include
<a href="https://lwn.net/Articles/683118/">thread cancellation</a>, <a href="/blog/2019/10/27/">variable length arrays</a>, and <a href="/blog/2018/07/20/#strict-aliasing">memory
aliasing</a>. GCC’s closure extension to C is another, and this
little feature compromises the entire GNU toolchain.</p>

<!--more-->

<h3 id="gnu-c-nested-functions">GNU C nested functions</h3>

<p>GCC has its own dialect of C called GNU C. One feature unique to GNU C
is <em>nested functions</em>, which allow C programs to define functions inside
other functions:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">intsort1</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">base</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">nmemb</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">cmp</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">return</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">a</span> <span class="o">-</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">b</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">qsort</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="n">nmemb</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">base</span><span class="p">),</span> <span class="n">cmp</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The nested function above is straightforward and harmless. It’s nothing
groundbreaking, and it is trivial for the compiler to implement. The
<code class="language-plaintext highlighter-rouge">cmp</code> function is really just a static function whose scope is limited
to the containing function, no different than a local static variable.</p>

<p>With one slight variation the nested function turns into a closure. This
is where things get interesting:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">intsort2</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">base</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">nmemb</span><span class="p">,</span> <span class="kt">_Bool</span> <span class="n">invert</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">cmp</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">b</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">a</span> <span class="o">-</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">b</span><span class="p">;</span>
        <span class="k">return</span> <span class="n">invert</span> <span class="o">?</span> <span class="o">-</span><span class="n">r</span> <span class="o">:</span> <span class="n">r</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">qsort</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="n">nmemb</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">base</span><span class="p">),</span> <span class="n">cmp</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">invert</code> variable from the outer scope is accessed from the inner
scope. This has <a href="/blog/2019/09/25/">clean, proper closure semantics</a> and works
correctly just as you’d expect. It fits quite well with traditional C
semantics. The closure itself is re-entrant and thread-safe. It’s
automatically (read: stack) allocated, and so it’s automatically freed
when the function returns, including when the stack is unwound via
<code class="language-plaintext highlighter-rouge">longjmp()</code>. It’s a natural progression to support closures like this
via nested functions. The eventual caller, <code class="language-plaintext highlighter-rouge">qsort</code>, doesn’t even know
it’s calling a closure!</p>

<p>While this seems so useful and easy, its implementation has serious
consequences that, in general, outweigh its benefits. In fact, in order
to make this work, the whole GNU toolchain has been specially rigged!</p>

<p>How does it work? The function pointer, <code class="language-plaintext highlighter-rouge">cmp</code>, passed to <code class="language-plaintext highlighter-rouge">qsort</code> must
somehow be associated with its lexical environment, specifically the
<code class="language-plaintext highlighter-rouge">invert</code> variable. A static address won’t do. When I <a href="/blog/2017/01/08/">implemented
closures as a toy library</a>, I talked about the function address for
each closure instance somehow needing to be unique.</p>

<p>GCC accomplishes this by constructing a trampoline on the stack. That
trampoline has access to the local variables stored adjacent to it, also
on the stack. GCC also generates a normal <code class="language-plaintext highlighter-rouge">cmp</code> function, like the
simple nested function before, that accepts <code class="language-plaintext highlighter-rouge">invert</code> as an additional
argument. The trampoline calls this function, passing the local variable
as this additional argument.</p>

<h3 id="trampoline-illustration">Trampoline illustration</h3>

<p>To illustrate this, I’ve manually implemented <code class="language-plaintext highlighter-rouge">intsort2()</code> below for
x86-64 (<a href="https://wiki.osdev.org/System_V_ABI">System V ABI</a>) without using GCC’s nested function
extension:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">cmp</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="kt">_Bool</span> <span class="n">invert</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">a</span> <span class="o">-</span> <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="p">)</span><span class="n">b</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">invert</span> <span class="o">?</span> <span class="o">-</span><span class="n">r</span> <span class="o">:</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">intsort3</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">base</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">nmemb</span><span class="p">,</span> <span class="kt">_Bool</span> <span class="n">invert</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">fp</span> <span class="o">=</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">cmp</span><span class="p">;</span>
    <span class="k">volatile</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="c1">// mov  edx, invert</span>
        <span class="mh">0xba</span><span class="p">,</span> <span class="n">invert</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
        <span class="c1">// mov  rax, cmp</span>
        <span class="mh">0x48</span><span class="p">,</span> <span class="mh">0xb8</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span>  <span class="mi">8</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span> <span class="mi">24</span><span class="p">,</span>
                    <span class="n">fp</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span> <span class="mi">40</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span> <span class="mi">48</span><span class="p">,</span> <span class="n">fp</span> <span class="o">&gt;&gt;</span> <span class="mi">56</span><span class="p">,</span>
        <span class="c1">// jmp  rax</span>
        <span class="mh">0xff</span><span class="p">,</span> <span class="mh">0xe0</span>
    <span class="p">};</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">trampoline</span><span class="p">)(</span><span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">;</span>
    <span class="n">qsort</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="n">nmemb</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">base</span><span class="p">),</span> <span class="n">trampoline</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Here’s a complete example you can try yourself on nearly any x86-64
unix-like system: <a href="/download/trampoline.c"><strong>trampoline.c</strong></a>. It even works with Clang. The
two notable systems where stack trampolines won’t work are
<a href="https://marc.info/?l=openbsd-cvs&amp;m=149606868308439&amp;w=2">OpenBSD</a> and <a href="https://github.com/microsoft/WSL/issues/286">WSL</a>.</p>

<p>(Note: The <code class="language-plaintext highlighter-rouge">volatile</code> is necessary because C compilers rightfully do
not see the contents of <code class="language-plaintext highlighter-rouge">buf</code> as being consumed. Execution of the
contents isn’t considered.)</p>

<p>In case you hadn’t already caught it, there’s a catch. The linker needs
to link a binary that asks the loader for an executable stack (<code class="language-plaintext highlighter-rouge">-z
execstack</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cc -std=c99 -Os -Wl,-z,execstack trampoline.c
</code></pre></div></div>

<p>That’s because <code class="language-plaintext highlighter-rouge">buf</code> contains x86 code implementing the trampoline:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">mov</span>  <span class="nb">edx</span><span class="p">,</span> <span class="nv">invert</span>    <span class="c1">; assign third argument</span>
<span class="nf">mov</span>  <span class="nb">rax</span><span class="p">,</span> <span class="nv">cmp</span>       <span class="c1">; store cmp address in RAX register</span>
<span class="nf">jmp</span>  <span class="nb">rax</span>            <span class="c1">; jump to cmp</span>
</code></pre></div></div>

<p>(Note: The absolute jump through a 64-bit register is necessary because
the trampoline on the stack and the jump target will be very far apart.
Further, these days the program will likely be compiled as a Position
Independent Executable (PIE), so <code class="language-plaintext highlighter-rouge">cmp</code> <a href="https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models">might itself have a high
address</a> rather than load into the lowest 32 bits of the address
space.)</p>

<p>However, executable stacks were phased out ~15 years ago because they
make buffer overflows so much more dangerous! Attackers can inject
and execute whatever code they like, typically <em>shellcode</em>. That’s why
we need this unusual linker option.</p>

<p>You can see that the stack will be executable using our old friend,
<code class="language-plaintext highlighter-rouge">readelf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -l a.out
...
  GNU_STACK  0x00000000 0x00000000 0x00000000
             0x00000000 0x00000000 RWE   0x10
...
</code></pre></div></div>

<p>Note the “RWE” at the bottom right, meaning read-write-execute. This is
a really bad sign in a real binary. Do any binaries installed on your
system right now have an executable stack? <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944817">I found one on mine</a>.
(Update: <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944971">A major one was found in the comments by Walter Misar</a>.)</p>
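
<p>The same check can be sketched in C. This is only an illustration, not
how <code class="language-plaintext highlighter-rouge">readelf</code> is implemented: it assumes a Linux system with
<code class="language-plaintext highlighter-rouge">&lt;elf.h&gt;</code>, a 64-bit ELF file, and program headers that have
already been read into memory. The function name <code class="language-plaintext highlighter-rouge">gnu_stack_state</code>
is made up for this example.</p>

```c
#include <elf.h>
#include <stddef.h>

/* Inspect ELF64 program headers (the table that `readelf -l` prints).
 * Returns 1 if the PT_GNU_STACK entry requests an executable stack,
 * 0 if it requests a non-executable stack, and -1 if the binary has
 * no PT_GNU_STACK entry at all, leaving the decision to the loader. */
int gnu_stack_state(const Elf64_Phdr *phdrs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == PT_GNU_STACK) {
            /* PF_X is the "E" in the RWE flags column. */
            return (phdrs[i].p_flags & PF_X) ? 1 : 0;
        }
    }
    return -1;
}
```

<p>Reading the headers out of a file is left out to keep the sketch short.</p>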

<p>When compiling the original version using a nested function there’s no
need for that special linker option. That’s because GCC saw that it
would need an executable stack and used this option automatically.</p>

<p>Or, more specifically, GCC <em>stopped</em> requesting a non-executable stack
in the object file it produced. For the GNU Binutils linker, <strong>the
default is an executable stack.</strong></p>

<h3 id="fail-open-design">Fail open design</h3>

<p>Since this is the default, the only way to get a non-executable stack is
if <em>every</em> object file input to the linker explicitly declares that it
does not need an executable stack. To request a non-executable stack, an
object file <a href="https://www.airs.com/blog/archives/518">must contain the (empty) section <code class="language-plaintext highlighter-rouge">.note.GNU-stack</code></a>.
If even a single object file fails to do this, then the final program
gets an executable stack.</p>
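
<p>As a rough model of that rule, and not the linker's actual code, the
decision amounts to an "any" over the inputs. The enum and function names
here are invented for illustration, and the sketch ignores explicit
<code class="language-plaintext highlighter-rouge">-z execstack</code> and <code class="language-plaintext highlighter-rouge">-z noexecstack</code> overrides.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* What one input object file says about the stack (illustrative names). */
enum stack_note {
    NOTE_MISSING,  /* no .note.GNU-stack section at all */
    NOTE_NOEXEC,   /* section present, not flagged executable */
    NOTE_EXEC      /* section present and flagged executable */
};

/* Fail-open default: the output gets a non-executable stack only if
 * every single input object explicitly asked for one. */
bool linked_stack_is_executable(const enum stack_note *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i] != NOTE_NOEXEC) {
            return true;  /* one missing or executable note taints the link */
        }
    }
    return false;
}
```

<p>A fail-closed design would instead treat <code class="language-plaintext highlighter-rouge">NOTE_MISSING</code> the same
as <code class="language-plaintext highlighter-rouge">NOTE_NOEXEC</code>.</p>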

<p>Not only does one contaminated object file infect the binary, everything
dynamically linked with it <em>also</em> gets an executable stack. Entire
processes are infected! This occurs even via <code class="language-plaintext highlighter-rouge">dlopen()</code>, where the stack
is dynamically made executable to accommodate the new shared object.</p>

<p>I’ve been bit myself. In <a href="/blog/2016/11/15/"><em>Baking Data with Serialization</em></a> I did
it completely by accident, and I didn’t notice my mistake until three
years later. The GNU linker outputs object files without the special
note by default even though the object file only contains data.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo hello world &gt;hello.txt
$ ld -r -b binary -o hello.o hello.txt
$ readelf -S hello.o | grep GNU-stack
$
</code></pre></div></div>

<p>This is fixed with <code class="language-plaintext highlighter-rouge">-z noexecstack</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ld -r -b binary -z noexecstack -o hello.o hello.txt
$ readelf -S hello.o | grep GNU-stack
  [ 2] .note.GNU-stack  PROGBITS  00000000  0000004c
$
</code></pre></div></div>

<p>This may happen any time you link object files not produced by GCC, such
as output <a href="/blog/2015/04/19/">from the NASM assembler</a> or <a href="/blog/2016/11/17/">hand-crafted object
files</a>.</p>

<p>Nested C closures are super slick, but they’re just not worth the risk
of an executable stack, and they’re certainly not worth an entire
toolchain being fail open about it.</p>

<p>Update: A <a href="http://verisimilitudes.net/2019-11-21">rebuttal</a>. My short response is that the issue
discussed in my article isn’t really about C the language but rather
about an egregious issue with one particular toolchain. The problem
doesn’t even arise if you use only C, but instead when linking in object
files specifically <em>not</em> derived from C code.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Endlessh: an SSH Tarpit</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/03/22/"/>
    <id>urn:uuid:5429ee15-3d42-4af2-8690-f7f402870dd0</id>
    <updated>2019-03-22T17:26:45Z</updated>
    <category term="netsec"/><category term="python"/><category term="c"/><category term="posix"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=19465967">on Hacker News</a> (<a href="https://news.ycombinator.com/item?id=24491453">later</a>), <a href="https://old.reddit.com/r/programming/comments/b4iq00/endlessh_an_ssh_tarpit/">on
reddit</a> (<a href="https://old.reddit.com/r/netsec/comments/b4dwjl/endlessh_an_ssh_tarpit/">also</a>), featured in <a href="https://www.youtube.com/watch?v=bM65iyRRW0A&amp;t=3m52s">BSD Now 294</a>.
Also check out <a href="https://github.com/bediger4000/ssh-tarpit-behavior">this Endlessh analysis</a>.</em></p>

<p>I’m a big fan of tarpits: a network service that intentionally inserts
delays in its protocol, slowing down clients by forcing them to wait.
This arrests the speed at which a bad actor can attack or probe the
host system, and it ties up some of the attacker’s resources that
might otherwise be spent attacking another host. When done well, a
tarpit imposes more cost on the attacker than the defender.</p>

<!--more-->

<p>The Internet is a very hostile place, and anyone who’s ever stood up
an Internet-facing IPv4 host has witnessed the immediate and
continuous attacks against their server. I’ve maintained <a href="/blog/2017/06/15/">such a
server</a> for nearly six years now, and more than 99% of my
incoming traffic has ill intent. One part of my defenses has been
tarpits in various forms. The latest addition is an SSH tarpit I wrote
a couple of months ago:</p>

<p><a href="https://github.com/skeeto/endlessh"><strong>Endlessh: an SSH tarpit</strong></a></p>

<p>This program opens a socket and pretends to be an SSH server. However,
it actually just ties up SSH clients with false promises indefinitely
— or at least until the client eventually gives up. After cloning the
repository, here’s how you can try it out for yourself (default port
2222):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make
$ ./endlessh &amp;
$ ssh -p2222 localhost
</code></pre></div></div>

<p>Your SSH client will hang there and wait for at least several days
before finally giving up. Like a mammoth in the La Brea Tar Pits, it
got itself stuck and can’t get itself out. As I write, my
Internet-facing SSH tarpit currently has 27 clients trapped in it. A
few of these have been connected for weeks. In one particular spike it
had 1,378 clients trapped at once, lasting about 20 hours.</p>

<p>My Internet-facing Endlessh server listens on port 22, which is the
standard SSH port. I long ago moved my real SSH server off to another
port where it sees a whole lot less SSH traffic — essentially none.
This makes the logs a whole lot more manageable. And (hopefully)
Endlessh convinces attackers not to look around for an SSH server on
another port.</p>

<p>How does it work? Endlessh exploits <a href="https://tools.ietf.org/html/rfc4253#section-4.2">a little paragraph in RFC
4253</a>, the SSH protocol specification. Immediately after the TCP
connection is established, and before negotiating the cryptography,
both ends send an identification string:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SSH-protoversion-softwareversion SP comments CR LF
</code></pre></div></div>

<p>The RFC also notes:</p>

<blockquote>
  <p>The server MAY send other lines of data before sending the version
string.</p>
</blockquote>

<p>There is no limit on the number of lines, just that these lines must
not begin with “SSH-” since that would be ambiguous with the
identification string, and lines must not be longer than 255
characters including CRLF. So <strong>Endlessh sends an <em>endless</em> stream of
randomly-generated “other lines of data”</strong> without ever intending to
send a version string. By default it waits 10 seconds between each
line. This slows down the protocol, but prevents it from actually
timing out.</p>
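
<p>One way to produce such lines is sketched below. This is not Endlessh's
actual generator: the name <code class="language-plaintext highlighter-rouge">make_banner_line</code> is hypothetical, and
<code class="language-plaintext highlighter-rouge">rand()</code> is assumed to be random enough for the job. It enforces the
two constraints above: a bounded length including CRLF, and no line starting
with “SSH-”.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fill buf with one random banner line of at most
 * maxlen bytes (3 <= maxlen <= 255), CRLF included, and return its
 * length. buf must hold maxlen bytes and is not NUL-terminated. */
size_t make_banner_line(char *buf, size_t maxlen)
{
    size_t len = 3 + (size_t)rand() % (maxlen - 2);  /* 3..maxlen */
    for (size_t i = 0; i < len - 2; i++) {
        buf[i] = (char)(32 + rand() % 95);  /* printable ASCII 0x20..0x7e */
    }
    buf[len - 2] = '\r';
    buf[len - 1] = '\n';
    /* A line starting with "SSH-" would be read as the version string. */
    if (len >= 4 && memcmp(buf, "SSH-", 4) == 0) {
        buf[0] = 'X';
    }
    return len;
}
```

<p>Because the function returns a length, the caller can pass it straight to
<code class="language-plaintext highlighter-rouge">write(2)</code> without calling <code class="language-plaintext highlighter-rouge">strlen</code>.</p>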

<p>This means Endlessh need not know anything about cryptography or the
vast majority of the SSH protocol. It’s dead simple.</p>

<h3 id="implementation-strategies">Implementation strategies</h3>

<p>Ideally the tarpit’s resource footprint should be as small as
possible. It’s just a security tool, and the server does have an
actual purpose that doesn’t include being a tarpit. It should tie up
the attacker’s resources, not the server’s, and should generally be
unnoticeable. (Take note all those who write the awful “security”
products I have to tolerate at my day job.)</p>

<p>Even when many clients have been trapped, Endlessh spends more than
99.999% of its time waiting around, doing nothing. It wouldn’t even be
accurate to call it I/O-bound. If anything, it’s <em>timer-bound</em>,
waiting around before sending off the next line of data. <strong>The most
precious resource to conserve is <em>memory</em>.</strong></p>

<h4 id="processes">Processes</h4>

<p>The most straightforward way to implement something like Endlessh is a
fork server: accept a connection, fork, and the child simply alternates
between <code class="language-plaintext highlighter-rouge">sleep(3)</code> and <code class="language-plaintext highlighter-rouge">write(2)</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
    <span class="kt">ssize_t</span> <span class="n">r</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">line</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>

    <span class="n">sleep</span><span class="p">(</span><span class="n">DELAY</span><span class="p">);</span>
    <span class="n">generate_line</span><span class="p">(</span><span class="n">line</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">line</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">line</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">!=</span> <span class="n">EINTR</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A process per connection is a lot of overhead when connections are
expected to be up for hours or even weeks at a time. An attacker who knows
about this could exhaust the server’s resources with little effort by
opening up lots of connections.</p>

<h4 id="threads">Threads</h4>

<p>A better option is, instead of processes, to create a thread per
connection. On Linux <a href="/blog/2015/05/15/">this is practically the same thing</a>, but it’s
still better. However, you still have to allocate a stack for the thread
and the kernel will have to spend some resources managing the thread.</p>

<h4 id="poll">Poll</h4>

<p>For Endlessh I went for an even more lightweight version: a
single-threaded <code class="language-plaintext highlighter-rouge">poll(2)</code> server, analogous to stackless green threads.
The overhead per connection is about as low as it gets.</p>

<p>Clients that are being delayed are not registered in <code class="language-plaintext highlighter-rouge">poll(2)</code>. Their
only overhead is the socket object in the kernel, and another 78 bytes
to track them in Endlessh. Most of those bytes are used only for
accurate logging. Only those clients that are overdue for a new line
are registered for <code class="language-plaintext highlighter-rouge">poll(2)</code>.</p>

<p>When clients are waiting, but no clients are overdue, <code class="language-plaintext highlighter-rouge">poll(2)</code> is
essentially used in place of <code class="language-plaintext highlighter-rouge">sleep(3)</code>. Though since it still needs
to manage the <em>accept</em> server socket, it (almost) never actually waits
on <em>nothing</em>.</p>
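
<p>The timeout computation behind this can be sketched as follows. The
<code class="language-plaintext highlighter-rouge">struct client</code> layout and the function name are hypothetical, not
taken from Endlessh, and the sketch assumes the clients sit in a queue ordered
by deadline, which holds for a FIFO when every client gets the same delay.</p>

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical client record; the real structure holds more fields. */
struct client {
    long long send_next;  /* time in milliseconds when the next line is due */
};

/* Pick the timeout argument for poll(2), given a queue ordered by
 * send_next (earliest first) and the current time in milliseconds. */
int poll_timeout_ms(const struct client *queue, size_t n, long long now)
{
    if (n == 0) {
        return -1;  /* nobody trapped: block until the accept socket fires */
    }
    if (queue[0].send_next <= now) {
        return 0;   /* head of the queue is overdue: do not sleep */
    }
    long long wait = queue[0].send_next - now;
    return wait > INT_MAX ? INT_MAX : (int)wait;
}
```

<p>Only the head of the queue needs to be examined, so the cost does not grow
with the number of trapped clients.</p>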

<p>There’s an option to limit the total number of client connections so
that it doesn’t get out of hand. In this case it will stop polling the
accept socket until a client disconnects. I probably shouldn’t have
bothered with this option and instead relied on <code class="language-plaintext highlighter-rouge">ulimit</code>, a feature
already provided by the operating system.</p>

<p>I could have used epoll (Linux) or kqueue (BSD), which would be much
more efficient than <code class="language-plaintext highlighter-rouge">poll(2)</code>. The problem with <code class="language-plaintext highlighter-rouge">poll(2)</code> is that it’s
constantly registering and unregistering Endlessh on each of the
overdue sockets each time around the main loop. This is by far the
most CPU-intensive part of Endlessh, and it’s all inflicted on the
kernel. Most of the time, even with thousands of clients trapped in
the tarpit, only a small number of them are polled at once, so I opted
for better portability instead.</p>

<p>One consequence of not polling connections that are waiting is that
disconnections aren’t noticed in a timely fashion. This makes the logs
less accurate than I like, but otherwise it’s pretty harmless.
Unfortunately even if I wanted to fix this, the <code class="language-plaintext highlighter-rouge">poll(2)</code> interface
isn’t quite equipped for it anyway.</p>

<h4 id="raw-sockets">Raw sockets</h4>

<p>With a <code class="language-plaintext highlighter-rouge">poll(2)</code> server, the biggest overhead remaining is in the
kernel, where it allocates send and receive buffers for each client
and manages the proper TCP state. The next step to reducing this
overhead is Endlessh opening a <em>raw socket</em> and speaking TCP itself,
bypassing most of the operating system’s TCP/IP stack.</p>

<p>Much of the TCP connection state doesn’t matter to Endlessh and doesn’t
need to be tracked. For example, it doesn’t care about any data sent by
the client, so no receive buffer is needed, and any data that arrives
could be dropped on the floor.</p>

<p>Even more, raw sockets would allow for some even nastier tarpit tricks.
Despite the long delays between data lines, the kernel itself responds
very quickly on the TCP layer and below. ACKs are sent back quickly and
so on. An astute attacker could detect that the delay is artificial,
imposed above the TCP layer by an application.</p>

<p>If Endlessh worked at the TCP layer, it could <a href="https://nyman.re/super-simple-ssh-tarpit/">tarpit the TCP protocol
itself</a>. It could introduce artificial “noise” to the connection
that requires packet retransmissions, delay ACKs, etc. It would look a
lot more like network problems than a tarpit.</p>

<p>I haven’t taken Endlessh this far, nor do I plan to do so. At the
moment attackers either have a hard timeout, so this wouldn’t matter,
or they’re pretty dumb and Endlessh already works well enough.</p>

<h3 id="asyncio-and-other-tarpits">asyncio and other tarpits</h3>

<p>Since writing Endlessh <a href="/blog/2019/03/10/">I’ve learned about Python’s <code class="language-plaintext highlighter-rouge">asyncio</code></a>, and
it’s actually a near perfect fit for this problem. I should have just
used it in the first place. The hard part is already implemented within
<code class="language-plaintext highlighter-rouge">asyncio</code>, and the problem isn’t CPU-bound, so being written in Python
<a href="/blog/2019/02/24/">doesn’t matter</a>.</p>

<p>Here’s a simplified (no logging, no configuration, etc.) version of
Endlessh implemented in about 20 lines of Python 3.7:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">import</span> <span class="nn">random</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">_reader</span><span class="p">,</span> <span class="n">writer</span><span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
            <span class="n">writer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s">'%x</span><span class="se">\r\n</span><span class="s">'</span> <span class="o">%</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">**</span><span class="mi">32</span><span class="p">))</span>
            <span class="k">await</span> <span class="n">writer</span><span class="p">.</span><span class="n">drain</span><span class="p">()</span>
    <span class="k">except</span> <span class="nb">ConnectionResetError</span><span class="p">:</span>
        <span class="k">pass</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">server</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">start_server</span><span class="p">(</span><span class="n">handler</span><span class="p">,</span> <span class="s">'0.0.0.0'</span><span class="p">,</span> <span class="mi">2222</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">server</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">server</span><span class="p">.</span><span class="n">serve_forever</span><span class="p">()</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>Since Python coroutines are stackless, the per-connection memory
overhead is comparable to the C version. So it seems asyncio is
perfectly suited for writing tarpits! Here’s an HTTP tarpit to trip up
attackers trying to exploit HTTP servers. It slowly sends a random,
endless HTTP header:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">import</span> <span class="nn">random</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">handler</span><span class="p">(</span><span class="n">_reader</span><span class="p">,</span> <span class="n">writer</span><span class="p">):</span>
    <span class="n">writer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s">'HTTP/1.1 200 OK</span><span class="se">\r\n</span><span class="s">'</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
            <span class="n">header</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">**</span><span class="mi">32</span><span class="p">)</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="o">**</span><span class="mi">32</span><span class="p">)</span>
            <span class="n">writer</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="sa">b</span><span class="s">'X-%x: %x</span><span class="se">\r\n</span><span class="s">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">header</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span>
            <span class="k">await</span> <span class="n">writer</span><span class="p">.</span><span class="n">drain</span><span class="p">()</span>
    <span class="k">except</span> <span class="nb">ConnectionResetError</span><span class="p">:</span>
        <span class="k">pass</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">server</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">start_server</span><span class="p">(</span><span class="n">handler</span><span class="p">,</span> <span class="s">'0.0.0.0'</span><span class="p">,</span> <span class="mi">8080</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">server</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">server</span><span class="p">.</span><span class="n">serve_forever</span><span class="p">()</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>Try it out for yourself. Firefox and Chrome will spin on that server
for hours before giving up. I have yet to see curl actually time out on
its own in the default settings (<code class="language-plaintext highlighter-rouge">--max-time</code>/<code class="language-plaintext highlighter-rouge">-m</code> does work
correctly, though).</p>
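
<p>For a scripted test, here is a minimal sketch of a client with a hard
deadline, in the spirit of curl's <code class="language-plaintext highlighter-rouge">--max-time</code>. The <code class="language-plaintext highlighter-rouge">probe</code>
function and its parameters are hypothetical names chosen for
illustration. It collects whatever header lines arrive before the
deadline and then hangs up:</p>

```python
import asyncio

async def probe(host, port, max_time):
    """Collect header lines from host:port, giving up after max_time seconds."""
    reader, writer = await asyncio.open_connection(host, port)
    lines = []

    async def collect():
        while True:
            line = await reader.readline()
            if not line:  # server closed the connection
                return
            lines.append(line)

    try:
        # Hard deadline: cancel the read loop once max_time has elapsed.
        await asyncio.wait_for(collect(), timeout=max_time)
    except asyncio.TimeoutError:
        pass
    finally:
        writer.close()
    return lines
```

<p>Assuming the HTTP tarpit above is listening locally,
<code class="language-plaintext highlighter-rouge">asyncio.run(probe('127.0.0.1', 8080, 30))</code> should return the status
line plus a handful of random headers. A client like this is the "hard
timeout" case: the tarpit costs it 30 seconds and no more.</p>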

<p>Parting exercise for the reader: Using the examples above as a starting
point, implement an SMTP tarpit using asyncio. Bonus points for using
TLS connections and testing it against real spammers.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>When the Compiler Bites</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/05/01/"/>
    <id>urn:uuid:02b974e1-e25b-397d-a16f-c754338e9c1e</id>
    <updated>2018-05-01T23:28:06Z</updated>
    <category term="c"/><category term="x86"/><category term="optimization"/><category term="ai"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p><em>Update: There are discussions <a href="https://old.reddit.com/r/cpp/comments/8gfhq3/when_the_compiler_bites/">on Reddit</a> and <a href="https://news.ycombinator.com/item?id=16974770">on Hacker
News</a>.</em></p>

<p>So far this year I’ve been bitten three times by compiler edge cases
in GCC and Clang, each time catching me totally by surprise. Two were
caused by historical artifacts, where an ambiguous specification led
to diverging implementations. The third was a compiler optimization
being far more clever than I expected, behaving almost like an
artificial intelligence.</p>

<p>In all examples I’ll be using GCC 7.3.0 and Clang 6.0.0 on Linux.</p>

<h3 id="x86-64-abi-ambiguity">x86-64 ABI ambiguity</h3>

<p>The first time I was bit — or, well, narrowly avoided being bit — was
when I examined a missed floating point optimization in both Clang and
GCC. Consider this function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">double</span>
<span class="nf">zero_multiply</span><span class="p">(</span><span class="kt">double</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The function multiplies its argument by zero and returns the result. Any
number multiplied by zero is zero, so this should always return zero,
right? Unfortunately, no. IEEE 754 floating point arithmetic supports
NaN, infinities, and signed zeros. This function can return NaN,
positive zero, or negative zero. (In some cases, the operation could
also potentially produce a hardware exception.)</p>
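
<p>These cases are easy to see interactively. The sketch below uses
Python rather than C, on the assumption that Python's <code class="language-plaintext highlighter-rouge">float</code> is an IEEE
754 double, which holds on mainstream platforms:</p>

```python
import math

def zero_multiply(x):
    # Same expression as the C function, evaluated on a Python float.
    return x * 0.0

for x in (1.0, -1.0, math.inf, math.nan):
    print(x, '->', zero_multiply(x))
# 1.0 -> 0.0
# -1.0 -> -0.0
# inf -> nan
# nan -> nan
```

<p>Four inputs, three distinct results, so the compiler cannot replace the
multiply with a constant.</p>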

<p>As a result, both GCC and Clang perform the multiply:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">zero_multiply:</span>
    <span class="nf">xorpd</span>  <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm1</span>
    <span class="nf">mulsd</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm1</span>
    <span class="nf">ret</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">-ffast-math</code> option relaxes the C standard floating point rules,
permitting an optimization at the cost of conformance and
<a href="https://possiblywrong.wordpress.com/2017/09/12/floating-point-agreement-between-matlab-and-c/">consistency</a>:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">zero_multiply:</span>
    <span class="nf">xorps</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm0</span>
    <span class="nf">ret</span>
</code></pre></div></div>

<p>Side note: <code class="language-plaintext highlighter-rouge">-ffast-math</code> doesn’t necessarily mean “less precise.”
Sometimes it will actually <a href="https://en.wikipedia.org/wiki/Multiply–accumulate_operation#Fused_multiply–add">improve precision</a>.</p>

<p>Here’s a modified version of the function that’s a little more
interesting. I’ve changed the argument to a <code class="language-plaintext highlighter-rouge">short</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">double</span>
<span class="nf">zero_multiply_short</span><span class="p">(</span><span class="kt">short</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s no longer possible for the argument to be one of those special
values. The <code class="language-plaintext highlighter-rouge">short</code> will be promoted to one of 65,536 possible <code class="language-plaintext highlighter-rouge">double</code>
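values, each of which results in zero when multiplied by 0.0 (strictly,
negative inputs give <code class="language-plaintext highlighter-rouge">-0.0</code>, so folding to a single constant also means
ignoring the sign of zero). GCC misses this optimization (<code class="language-plaintext highlighter-rouge">-Os</code>):</p>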

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">zero_multiply_short:</span>
    <span class="nf">movsx</span>     <span class="nb">edi</span><span class="p">,</span> <span class="nb">di</span>       <span class="c1">; sign-extend 16-bit argument</span>
    <span class="nf">xorps</span>     <span class="nv">xmm1</span><span class="p">,</span> <span class="nv">xmm1</span>    <span class="c1">; xmm1 = 0.0</span>
    <span class="nf">cvtsi2sd</span>  <span class="nv">xmm0</span><span class="p">,</span> <span class="nb">edi</span>     <span class="c1">; convert int to double</span>
    <span class="nf">mulsd</span>     <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm1</span>
    <span class="nf">ret</span>
</code></pre></div></div>
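
<p>The input space is small enough to check by brute force. This sketch
assumes a 16-bit <code class="language-plaintext highlighter-rouge">short</code> and, again, that Python floats are IEEE 754
doubles. Every product compares equal to zero, though the negative half
of the range produces negative zero:</p>

```python
import math

SHORT_MIN, SHORT_MAX = -32768, 32767  # assuming a 16-bit short

products = [float(x) * 0.0 for x in range(SHORT_MIN, SHORT_MAX + 1)]

count = len(products)                       # 65536
all_zero = all(p == 0.0 for p in products)  # True: -0.0 == 0.0
negative_zeros = sum(1 for p in products
                     if math.copysign(1.0, p) == -1.0)  # 32768

print(count, all_zero, negative_zeros)
```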

<p>Clang also misses this optimization:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">zero_multiply_short:</span>
    <span class="nf">cvtsi2sd</span> <span class="nv">xmm1</span><span class="p">,</span> <span class="nb">edi</span>
    <span class="nf">xorpd</span>    <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm0</span>
    <span class="nf">mulsd</span>    <span class="nv">xmm0</span><span class="p">,</span> <span class="nv">xmm1</span>
    <span class="nf">ret</span>
</code></pre></div></div>

<p>But hang on a minute. This is shorter by one instruction. What
happened to the sign-extension (<code class="language-plaintext highlighter-rouge">movsx</code>)? Clang is treating that
<code class="language-plaintext highlighter-rouge">short</code> argument as if it were a 32-bit value. Why do GCC and Clang
differ? Is GCC doing something unnecessary?</p>

<p>It turns out that the <a href="https://www.uclibc.org/docs/psABI-x86_64.pdf">x86-64 ABI</a> doesn’t specify what happens with
the upper bits in argument registers. Are they garbage? Are they zeroed?
GCC takes the conservative position of assuming the upper bits are
arbitrary garbage. Clang takes the bolder position of assuming
arguments smaller than 32 bits have been promoted to 32 bits by the
caller. This is what the ABI specification <em>should</em> have said, but
currently it does not.</p>

<p>Fortunately GCC is also conservative when passing arguments. It promotes
arguments to 32 bits as necessary, so there are no conflicts when
linking against Clang-compiled code. However, this is not true for
Intel’s ICC compiler: <a href="https://web.archive.org/web/20180908113552/https://stackoverflow.com/a/36760539"><strong>Clang and ICC are not ABI-compatible on
x86-64</strong></a>.</p>

<p>I don’t use ICC, so that particular issue wouldn’t bite me, <em>but</em> if I
was ever writing assembly routines that called Clang-compiled code, I’d
eventually get bit by this.</p>

<h3 id="floating-point-precision">Floating point precision</h3>

<p>Without looking it up or trying it, what does this function return?
Think carefully.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">float_compare</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">float</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="mi">3</span><span class="n">f</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">==</span> <span class="mi">1</span><span class="p">.</span><span class="mi">3</span><span class="n">f</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Confident in your answer? This is a trick question, because it can
return either 0 or 1 depending on the compiler. Boy was I confused when
this comparison returned 0 in my real world code.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc   -std=c99 -m32 cmp.c  # float_compare() == 0
$ clang -std=c99 -m32 cmp.c  # float_compare() == 1
</code></pre></div></div>

<p>So what’s going on here? The original ANSI C specification wasn’t
clear about how intermediate floating point values get rounded, and
implementations <a href="https://news.ycombinator.com/item?id=16974770">all did it differently</a>. The C99 specification
cleaned this all up and introduced <a href="https://en.wikipedia.org/wiki/C99#IEEE_754_floating_point_support"><code class="language-plaintext highlighter-rouge">FLT_EVAL_METHOD</code></a>.
Implementations can still differ, but at least you can now determine
at compile-time what the compiler would do by inspecting that macro.</p>

<p>Back in the late 1980s or early 1990s when the GCC developers were
deciding how GCC should implement floating point arithmetic, the trend
at the time was to use as much precision as possible. On the x86 this
meant using its support for 80-bit extended precision floating point
arithmetic. Floating point operations are performed in <code class="language-plaintext highlighter-rouge">long double</code>
precision and truncated afterward (<code class="language-plaintext highlighter-rouge">FLT_EVAL_METHOD == 2</code>).</p>

<p>In <code class="language-plaintext highlighter-rouge">float_compare()</code> the left-hand side is truncated to a <code class="language-plaintext highlighter-rouge">float</code> by the
assignment, but the right-hand side, <em>despite being a <code class="language-plaintext highlighter-rouge">float</code> literal</em>,
is actually “1.3” at 80 bits of precision as far as GCC is concerned.
That’s pretty unintuitive!</p>
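
<p>The effect can be imitated without an x87. Python has no 80-bit type,
so in this sketch a double stands in for the wider evaluation format;
it is an analogy for what GCC does, not a reproduction. The hypothetical
<code class="language-plaintext highlighter-rouge">to_float32</code> helper plays the role of the assignment to <code class="language-plaintext highlighter-rouge">float x</code>:</p>

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = to_float32(1.3)          # like "float x = 1.3f": rounded on assignment

print(x == 1.3)              # False: right-hand side kept at higher precision
print(x == to_float32(1.3))  # True: both sides rounded to float
```

<p>Since 1.3 has no exact binary representation, rounding it to different
widths gives different values, and the comparison fails.</p>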

<p>The remnants of this high precision trend are still in JavaScript, where
all arithmetic is double precision (even if <a href="http://thibaultlaurens.github.io/javascript/2013/04/29/how-the-v8-engine-works/#more-example-on-how-v8-optimized-javascript-code">simulated using
integers</a>), and great pains have been made <a href="https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/">to work around</a>
the performance consequences of this. <a href="http://tirania.org/blog/archive/2018/Apr-11.html">Until recently</a>, Mono had
similar issues.</p>

<p>The trend reversed once SIMD hardware became widely available and
there were huge performance gains to be had. Multiple values could be
computed at once, side by side, at lower precision. So on x86-64, this
became the default (<code class="language-plaintext highlighter-rouge">FLT_EVAL_METHOD == 0</code>). The young Clang compiler
wasn’t around until well after this trend reversed, so it behaves
differently than the <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323">backwards compatible</a> GCC on the old x86.</p>

<p>I’m a little ashamed that I’m only finding out about this now. However,
by the time I was competent enough to notice and understand this issue,
I was already doing nearly all my programming on the x86-64.</p>

<h3 id="built-in-function-elimination">Built-in Function Elimination</h3>

<p>I’ve saved this one for last since it’s my favorite. Suppose we have
this little function, <code class="language-plaintext highlighter-rouge">new_image()</code>, that allocates a greyscale image
for, say, <a href="/blog/2017/11/03/">some multimedia library</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span>
<span class="nf">new_image</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">w</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">h</span><span class="p">,</span> <span class="kt">int</span> <span class="n">shade</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">w</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">h</span> <span class="o">&lt;=</span> <span class="n">SIZE_MAX</span> <span class="o">/</span> <span class="n">w</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// overflow?</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="n">h</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">memset</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">shade</span><span class="p">,</span> <span class="n">w</span> <span class="o">*</span> <span class="n">h</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">p</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s a static function because this would be part of some <a href="https://github.com/nothings/stb">slick
header library</a> (and, secretly, because it’s necessary for
illustrating the issue). Being a responsible citizen, the function
even <a href="/blog/2017/07/19/">checks for integer overflow</a> before allocating anything.</p>
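
<p>To see why the guard is written as a division, here is the same test
modeled with Python's arbitrary-precision integers, assuming a 64-bit
<code class="language-plaintext highlighter-rouge">size_t</code>. The helper names are hypothetical:</p>

```python
SIZE_MAX = 2**64 - 1  # assuming a 64-bit size_t

def fits(w, h):
    """Mirror the C guard: True when w * h does not exceed SIZE_MAX."""
    return w == 0 or SIZE_MAX // w >= h

def wrapped_product(w, h):
    """What w * h evaluates to in modular size_t arithmetic."""
    return (w * h) % (SIZE_MAX + 1)

print(fits(1, SIZE_MAX))             # True
print(fits(2, SIZE_MAX))             # False
print(wrapped_product(2, SIZE_MAX))  # 18446744073709551614
```

<p>Multiplying first and checking afterward would be too late: the wrapped
product is a perfectly ordinary-looking size.</p>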

<p>I write a unit test to make sure it detects overflow. This function
should return 0.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* expected return == 0 */</span>
<span class="kt">int</span>
<span class="nf">test_new_image_overflow</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">new_image</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">SIZE_MAX</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">!!</span><span class="n">p</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So far my test passes. Good.</p>

<p>I’d also like to make sure it correctly returns NULL — or, more
specifically, that it doesn’t crash — if the allocation fails. But how
can I make <code class="language-plaintext highlighter-rouge">malloc()</code> fail? As a hack I can pass image dimensions that
I know cannot ever practically be allocated. Essentially I want to
force a <code class="language-plaintext highlighter-rouge">malloc(SIZE_MAX)</code>, i.e. allocate every available byte in my
virtual address space. For a conventional 64-bit machine, that’s 16
exbibytes of memory, and it leaves space for nothing else, including
the program itself.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* expected return == 0 */</span>
<span class="kt">int</span>
<span class="nf">test_new_image_oom</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">new_image</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">SIZE_MAX</span><span class="p">,</span> <span class="mh">0xff</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">!!</span><span class="n">p</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I compile with GCC, test passes. I compile with Clang and the test
fails. That is, <strong>the test somehow managed to allocate 16 exbibytes of
memory, <em>and</em> initialize it</strong>. Wat?</p>

<p>Disassembling the test reveals what’s going on:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">test_new_image_overflow:</span>
    <span class="nf">xor</span>  <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
    <span class="nf">ret</span>

<span class="nl">test_new_image_oom:</span>
    <span class="nf">mov</span>  <span class="nb">eax</span><span class="p">,</span> <span class="mi">1</span>
    <span class="nf">ret</span>
</code></pre></div></div>

<p>The first test is actually being evaluated at compile time by the
compiler. The function being tested was inlined into the unit test
itself. This permits the compiler to collapse the whole thing down to
a single instruction. The path with <code class="language-plaintext highlighter-rouge">malloc()</code> became dead code and
was trivially eliminated.</p>

<p>In the second test, Clang correctly determined that the image buffer is
not actually being used, despite the <code class="language-plaintext highlighter-rouge">memset()</code>, so it eliminated the
allocation altogether and then <em>simulated</em> a successful allocation
despite it being absurdly large. Allocating memory is not an observable
side effect as far as the language specification is concerned, so it’s
allowed to do this. My thinking was wrong, and the compiler outsmarted
me.</p>

<p>I soon realized I can take this further and trick Clang into
performing an invalid optimization, <a href="https://bugs.llvm.org/show_bug.cgi?id=37304">revealing a bug</a>. Consider
this slightly-optimized version that uses <code class="language-plaintext highlighter-rouge">calloc()</code> when the shade is
zero (black). The <code class="language-plaintext highlighter-rouge">calloc()</code> function does its own overflow check, so
<code class="language-plaintext highlighter-rouge">new_image()</code> doesn’t need to do it.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="o">*</span>
<span class="nf">new_image</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">w</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">h</span><span class="p">,</span> <span class="kt">int</span> <span class="n">shade</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">shade</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// shortcut</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">h</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">w</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">h</span> <span class="o">&lt;=</span> <span class="n">SIZE_MAX</span> <span class="o">/</span> <span class="n">w</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// overflow?</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="n">h</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">memset</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">shade</span><span class="p">,</span> <span class="n">w</span> <span class="o">*</span> <span class="n">h</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">p</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With this change, my overflow unit test is now also failing. The
situation is even worse than before. The <code class="language-plaintext highlighter-rouge">calloc()</code> is being
eliminated <em>despite the overflow</em>, and replaced with a simulated
success. This time it’s actually a bug in Clang. While failing a unit
test is mostly harmless, <strong>this could introduce a vulnerability in a
real program</strong>. The OpenBSD folks are so worried about this sort of
thing that <a href="https://marc.info/?l=openbsd-cvs&amp;m=150125592126437&amp;w=2">they’ve disabled this optimization</a>.</p>

<p>Here’s a slightly-contrived example of this. Imagine a program that
maintains a table of unsigned integers, and we want to keep track of
how many times the program has accessed each table entry. The “access
counter” table is initialized to zero, but the table of values need
not be initialized, since they’ll be written before first access (or
something like that).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">table</span> <span class="p">{</span>
    <span class="kt">unsigned</span> <span class="o">*</span><span class="n">counter</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="o">*</span><span class="n">values</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">static</span> <span class="kt">int</span>
<span class="nf">table_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">table</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">t</span><span class="o">-&gt;</span><span class="n">counter</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">counter</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">counter</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* Overflow already tested above */</span>
        <span class="n">t</span><span class="o">-&gt;</span><span class="n">values</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">n</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">values</span><span class="p">));</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">values</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">free</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">counter</span><span class="p">);</span>
            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// fail</span>
        <span class="p">}</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// success</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// fail</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This function relies on the overflow test in <code class="language-plaintext highlighter-rouge">calloc()</code> to guard the subsequent
<code class="language-plaintext highlighter-rouge">malloc()</code> allocation. However, this is a static function that’s
likely to get inlined, as we saw before. If the program doesn’t
actually make use of the <code class="language-plaintext highlighter-rouge">counter</code> table, and Clang is able to
statically determine this fact, it may eliminate the <code class="language-plaintext highlighter-rouge">calloc()</code>. This
would also <strong>eliminate the overflow test, introducing a
vulnerability</strong>. If an attacker can control <code class="language-plaintext highlighter-rouge">n</code>, then they can
overwrite arbitrary memory through that <code class="language-plaintext highlighter-rouge">values</code> pointer.</p>
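
<p>One way to avoid leaning on <code class="language-plaintext highlighter-rouge">calloc()</code> for the check is to test the
multiplication explicitly, so the test survives even if the other
allocation is optimized away. Here’s a minimal sketch, assuming a
hypothetical <code class="language-plaintext highlighter-rouge">struct table</code> and <code class="language-plaintext highlighter-rouge">table_init()</code> that only resemble the
function above. The element types are placeholders.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the table above; element types are assumed. */
struct table {
    int *counter;
    double *values;
};

/* Returns 1 on success, 0 on failure. The malloc() size has its own
 * overflow test, so it no longer depends on calloc() being kept. */
static int
table_init(struct table *t, size_t n)
{
    assert(t);
    if (n > SIZE_MAX / sizeof(*t->values))
        return 0; // n * sizeof(*t->values) would wrap around
    t->counter = calloc(n, sizeof(*t->counter));
    if (!t->counter)
        return 0; // fail
    t->values = malloc(n * sizeof(*t->values));
    if (!t->values) {
        free(t->counter);
        return 0; // fail
    }
    return 1; // success
}
```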

<h3 id="the-takeaway">The takeaway</h3>

<p>Besides this surprising little bug, the main lesson for me is that I
should probably isolate unit tests from the code being tested. The
easiest solution is to put them in separate translation units and not
use link-time optimization (LTO). Allowing tested functions to be
inlined into the unit tests is probably a bad idea.</p>

<p>The unit test issues in my <em>real</em> program, which was <a href="https://github.com/skeeto/growable-buf">a bit more
sophisticated</a> than what was presented here, gave me artificial
intelligence vibes. It’s that situation where a computer algorithm did
something really clever and I felt it outsmarted me. It’s creepy to
consider <a href="https://wiki.lesswrong.com/wiki/Paperclip_maximizer">how far that can go</a>. I’ve gotten that even from
observing <a href="/blog/2017/04/27/">AI I’ve written myself</a>, and I know for sure no human
taught it some particularly clever trick.</p>

<p>My favorite AI story along these lines is about <a href="https://www.youtube.com/watch?v=xOCurBYI_gY">an AI that learned
how to play games on the Nintendo Entertainment System</a>. It
didn’t understand the games it was playing. Its optimization task was
simply to choose controller inputs that maximized memory values,
because that’s generally associated with doing well — higher scores,
more progress, etc. The most unexpected part came when playing Tetris.
Eventually the screen would fill up with blocks, and the AI would face
the inevitable situation of losing the game, with all that memory
being reinitialized to low values. So what did it do?</p>

<p>Just before the end it would pause the game and wait… forever.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Introducing the Pokerware Secure Passphrase Generator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/07/27/"/>
    <id>urn:uuid:c2d33d1a-d2a2-3863-04ae-68d2b48eecd5</id>
    <updated>2017-07-27T17:49:10Z</updated>
    <category term="crypto"/><category term="meatspace"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p>I recently developed <a href="https://github.com/skeeto/pokerware"><strong>Pokerware</strong></a>, an offline passphrase
generator that operates in the same spirit as <a href="http://world.std.com/~reinhold/diceware.html">Diceware</a>.
The primary difference is that it uses a shuffled deck of playing
cards as its entropy source rather than dice. Draw some cards and use
them to select a uniformly random word from a list. Unless you’re some
sort of <a href="/blog/2011/01/10/">tabletop gaming nerd</a>, a deck of cards is more readily
available than five 6-sided dice, which would typically need to be
borrowed from the Monopoly board collecting dust on the shelf, then
rolled two at a time.</p>

<p>There are various flavors of two different word lists here:</p>

<ul>
  <li><a href="https://github.com/skeeto/pokerware/releases/tag/1.0">https://github.com/skeeto/pokerware/releases/tag/1.0</a></li>
</ul>

<p>Hardware random number generators are <a href="https://lwn.net/Articles/629714/">difficult to verify</a>
and may not actually be as random as they promise, either
intentionally or unintentionally. For the particularly paranoid,
Diceware and Pokerware are an easily verifiable alternative for
generating secure passphrases for <a href="/blog/2017/03/12/">cryptographic purposes</a>.
At any time, a deck of 52 playing cards is in one of 52! possible
arrangements. That’s more than 225 bits of entropy. If you give your
deck <a href="https://possiblywrong.wordpress.com/2011/03/27/card-shuffling-youre-not-done-yet/">a thorough shuffle</a>, it will be in an arrangement that
has never been seen before and will never be seen again. Pokerware
draws on some of these bits to generate passphrases.</p>
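
<p>The 225-bit figure can be checked without floating point by computing
52! exactly and counting its binary digits. This is only a sketch: the
function name and the fixed-size big integer are my own choices for
illustration.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Returns the number of bits needed to write n! in binary, computed
 * exactly with a small fixed-size big integer (32-bit limbs, least
 * significant first). Supports n up to 52. */
static int
factorial_bits(int n)
{
    assert(n >= 0 && n <= 52);
    uint32_t limb[8] = {1}; // 256 bits, enough for 52!
    for (int k = 2; k <= n; k++) {
        uint64_t carry = 0;
        for (int i = 0; i < 8; i++) {
            uint64_t v = (uint64_t)limb[i] * k + carry;
            limb[i] = (uint32_t)v;
            carry = v >> 32;
        }
        assert(carry == 0); // nonzero would mean the integer is too small
    }
    for (int i = 7; i >= 0; i--) {
        if (limb[i]) {
            int bits = 0;
            for (uint32_t v = limb[i]; v; v >>= 1)
                bits++;
            return i * 32 + bits;
        }
    }
    return 0; // not reached: n! is at least 1
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">factorial_bits(52)</code> gives 226, meaning 52! lies between
2<sup>225</sup> and 2<sup>226</sup>, which agrees with “more than 225 bits.”</p>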

<p>The Pokerware list has 5,304 words (12.4 bits per word), compared to
Diceware’s 7,776 words (12.9 bits per word). My goal was to invent a
card-drawing scheme that would uniformly select from a list in the same
sized ballpark as Diceware. Much smaller and you’d have to memorize more
words for the same passphrase strength. Much larger and the words on the
list would be more difficult to memorize, since the list would contain
longer and less frequently used words. Diceware strikes a nice balance
at five dice.</p>

<!-- Photo credit: Kelsey Wellons -->
<p><img src="/img/pokerware/deck.jpg" alt="" /></p>

<p>One important difference for me is that <em>I like my Pokerware word
lists a lot more</em> than the two official Diceware lists. My lists only
have simple, easy-to-remember words (for American English speakers, at
least), without any numbers or other short non-words. Pokerware has
two official lists, “formal” and “slang,” since my early testers
couldn’t agree on which was better. Rather than make a difficult
decision, I took the usual route of making no decision at all.</p>

<p>The “formal” list is derived in part from <a href="https://books.google.com/ngrams">Google’s Ngram
Viewer</a>, with my own additional filters and tweaking. It’s called
“formal” because the ngrams come from formal publications and represent
more formal kinds of speech.</p>

<p>The “slang” list is derived from <a href="http://files.pushshift.io/reddit/"><em>every</em> reddit comment</a> between
December 2005 and May 2017, tamed by the same additional filters. I
<a href="/blog/2016/12/01/">have this data on hand</a>, so I may as well put it to use. I
figured more casually-used words would be easier to remember. Due to
my extra filtering, there’s actually a lot of overlap between these
lists, so the differences aren’t too significant.</p>

<p>If you have your own word list, perhaps in a different language, you
can use the Makefile in the repository to build your own Pokerware
lookup table, both plain text and PDF. The PDF is generated using
Groff macros.</p>

<h3 id="passphrase-generation-instructions">Passphrase generation instructions</h3>

<ol>
  <li>
    <p>Thoroughly shuffle the deck.</p>
  </li>
  <li>
    <p>Draw two cards. Sort them by value, then suit. Suits are in
alphabetical order: Clubs, Diamonds, Hearts, Spades.</p>
  </li>
  <li>
    <p>Draw additional cards until you get a card that doesn’t match the
face value of either of your initial two cards. Observe its suit.</p>
  </li>
  <li>
    <p>Using your two cards and observed suit, look up a word in the table.</p>
  </li>
  <li>
    <p>Place all cards back in the deck, shuffle, and repeat from step 2
until you have the desired number of words. Each word is worth 12.4
bits of entropy.</p>
  </li>
</ol>
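
<p>The steps above can be sanity-checked by brute force. The sketch below
enumerates every pair of cards from step 2 and confirms that, for each
pair, the extra card from step 3 is equally likely to show any of the
four suits. The 0 through 51 card numbering is an assumption made for
the sketch, not something Pokerware defines.</p>

```c
#include <assert.h>

enum { NCARDS = 52, NSUITS = 4 };

/* Assumed numbering: card / 4 is the face value, card % 4 is the suit. */
static int
rank_of(int card)
{
    assert(card >= 0 && card < NCARDS);
    return card / NSUITS;
}

static int
suit_of(int card)
{
    assert(card >= 0 && card < NCARDS);
    return card % NSUITS;
}

/* Counts the distinct (two cards, suit) outcomes of steps 2 and 3.
 * Returns 0 if, for any pair, the four suits are not equally likely
 * for the extra card. */
static long
count_outcomes(void)
{
    long outcomes = 0;
    for (int a = 0; a < NCARDS; a++) {
        for (int b = a + 1; b < NCARDS; b++) {
            int per_suit[NSUITS] = {0};
            for (int c = 0; c < NCARDS; c++) {
                if (c == a || c == b)
                    continue; // already drawn in step 2
                if (rank_of(c) == rank_of(a) || rank_of(c) == rank_of(b))
                    continue; // step 3 skips matching face values
                per_suit[suit_of(c)]++;
            }
            for (int s = 0; s < NSUITS; s++) {
                if (per_suit[s] == 0 || per_suit[s] != per_suit[0])
                    return 0; // suit would be biased or impossible
            }
            outcomes += NSUITS;
        }
    }
    return outcomes;
}
```

<p>Here <code class="language-plaintext highlighter-rouge">count_outcomes()</code> returns 5304: there are 1,326 unordered pairs
of cards, each combined with one of four suits, which matches the
length of the word list.</p>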

<p>A word of warning about step 4: If you use software to do the word list
lookup, beware that it might save your search/command history — and
therefore your passphrase — to a file. For example, the <code class="language-plaintext highlighter-rouge">less</code> pager
will store search history in <code class="language-plaintext highlighter-rouge">~/.lesshst</code>. It’s easy to prevent that
one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LESSHISTFILE=- less pokerware-slang.txt
</code></pre></div></div>

<h4 id="example-word-generation">Example word generation</h4>

<p>Suppose in step 2 you draw King of Hearts (KH/K♥) and Queen of Clubs
(QC/Q♣).</p>

<p class="grid"><img src="/img/pokerware/kh.png" alt="" class="card" />
<img src="/img/pokerware/qc.png" alt="" class="card" /></p>

<p>In step 3 you first draw King of Diamonds (KD/K♦), discarding it because
it matches the face value of one of your cards from step 2.</p>

<p class="grid"><img src="/img/pokerware/kd.png" alt="" class="card" /></p>

<p>Next you draw Four of Spades (4S/4♠), taking spades as your extra suit.</p>

<p class="grid"><img src="/img/pokerware/4s.png" alt="" class="card" /></p>

<p>In order, this gives you Queen of Clubs, King of Hearts, and Spades:
QCKHS or Q♣K♥♠. This corresponds to “wizard” in the formal word list and
would be the first word in your passphrase.</p>
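
<p>If you’d rather not sort the cards in your head, the key construction
can be sketched in a few lines. Note the assumptions: I’m guessing that
face values run from 2 up through Ace and that a ten is written “T”, so
check the table’s actual conventions before relying on this. The
function names are made up for the example.</p>

```c
#include <assert.h>
#include <string.h>

/* Assumed orderings: ranks run 2 through Ace with "T" for ten (a guess
 * at the table's convention). Suits are alphabetical, as in step 2. */
static const char ranks[] = "23456789TJQKA";
static const char suits[] = "CDHS";

/* Sort position of a card written like "QC": value first, then suit. */
static int
card_order(const char *card)
{
    assert(card[0] && card[1]);
    const char *r = strchr(ranks, card[0]);
    const char *s = strchr(suits, card[1]);
    assert(r && s);
    return (int)(r - ranks) * 4 + (int)(s - suits);
}

/* Writes the 5-character lookup key plus a terminator into key. */
static void
lookup_key(char key[6], const char *card1, const char *card2, char suit)
{
    const char *lo = card1, *hi = card2;
    if (card_order(lo) > card_order(hi)) {
        lo = card2;
        hi = card1;
    }
    key[0] = lo[0];
    key[1] = lo[1];
    key[2] = hi[0];
    key[3] = hi[1];
    key[4] = suit;
    key[5] = 0;
}
```

<p>With the draw from the example, <code class="language-plaintext highlighter-rouge">lookup_key(key, "KH", "QC", 'S')</code>
produces <code class="language-plaintext highlighter-rouge">QCKHS</code>. Keep the history warning from step 4 in mind if
you actually run something like this.</p>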

<h4 id="a-deck-of-cards-as-an-office-tool">A deck of cards as an office tool</h4>

<p>I now have an excuse to keep a deck of cards out on my desk at work.
I’ve been using Diceware — or something approximating it since I’m not
so paranoid about hardware RNGs. From now on I’ll deal new passwords from an
in-reach deck of cards. Though typically I need to tweak the results to
meet <a href="https://www.troyhunt.com/passwords-evolved-authentication-guidance-for-the-modern-era/">outdated character-composition requirements</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Integer Overflow into Information Disclosure</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/07/19/"/>
    <id>urn:uuid:c85545c5-23a4-3147-b654-6dc7a62ee426</id>
    <updated>2017-07-19T01:57:36Z</updated>
    <category term="netsec"/><category term="c"/>
    <content type="html">
      <![CDATA[<p>Last week I was discussing <a href="https://security-tracker.debian.org/tracker/CVE-2017-7529">CVE-2017-7529</a> with <a href="/blog/2016/09/02/">my intern</a>.
Specially crafted input to Nginx causes an integer overflow which has the
potential to leak sensitive information. But how could an integer overflow
be abused to trick a program into leaking information? To answer this
question, I put together the simplest practical example I could imagine.</p>

<ul>
  <li><a href="https://github.com/skeeto/integer-overflow-demo">https://github.com/skeeto/integer-overflow-demo</a></li>
</ul>

<p>This small C program converts a vector image from a custom format
(described below) into a <a href="https://en.wikipedia.org/wiki/Netpbm_format">Netpbm image</a>, a <a href="/blog/2017/07/02/">conveniently simple
format</a>. The program defensively and carefully parses its input, but
still makes a subtle, fatal mistake. This mistake not only leads to
sensitive information disclosure, but, with a more sophisticated attack,
could be used to execute arbitrary code.</p>

<p>After getting the hang of the interface for the program, I encourage you
to take some time to work out an exploit yourself. Regardless, I’ll reveal
a functioning exploit and explain how it works.</p>

<h3 id="a-new-vector-format">A new vector format</h3>

<p>The input format is line-oriented and very similar to Netpbm itself. The
first line is the header, starting with the magic number <code class="language-plaintext highlighter-rouge">V2</code> (ASCII)
followed by the image dimensions. The target output format is Netpbm’s
“P2” (text gray scale) format, so the “V2” parallels it. The file must end
with a newline.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>V2 &lt;width&gt; &lt;height&gt;
</code></pre></div></div>

<p>What follows is drawing commands, one per line. For example, the <code class="language-plaintext highlighter-rouge">s</code>
command sets the value of a particular pixel.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>s &lt;x&gt; &lt;y&gt; &lt;00–ff&gt;
</code></pre></div></div>

<p>Since it’s not important for the demonstration, this is the only command I
implemented. It’s easy to imagine additional commands to draw lines,
circles, Bezier curves, etc.</p>

<p>Here’s an example (<code class="language-plaintext highlighter-rouge">example.txt</code>) that draws a single white point in the
middle of the image:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>V2 256 256
s 127 127 ff
</code></pre></div></div>

<p>The rendering tool reads from standard input and writes to standard output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ render &lt; example.txt &gt; example.pgm
</code></pre></div></div>

<p>Here’s what it looks like rendered:</p>

<p><img src="/img/int-overflow/example.png" alt="" /></p>

<p>However, you will notice that when you run the rendering tool, it prompts
you for a username and password. This is silly, of course, but it’s an
excuse to get “sensitive” information into memory. It will accept any
username/password combination where the username and password don’t match
each other. The key is this: <strong>It’s possible to craft a valid image that
leaks the entered password.</strong></p>

<h3 id="tour-of-the-implementation">Tour of the implementation</h3>

<p>Without spoiling anything yet, let’s look at how this program works. The
first thing to notice is that I’m using a custom “<a href="http://www.gnu.org/software/libc/manual/html_node/Obstacks.html">obstack</a>”
allocator instead of <code class="language-plaintext highlighter-rouge">malloc()</code> and <code class="language-plaintext highlighter-rouge">free()</code>. Real-world allocators have
some defenses against this particular vulnerability. Plus a specific
exploit would have to target a specific libc. By using my own allocator,
the exploit will mostly be portable, making for a better and easier
demonstration.</p>

<p>The allocator interface should be pretty self-explanatory, except for two
details. This is an <em>obstack</em> allocator, so freeing an object also frees
every object allocated after it. Also, it doesn’t call <code class="language-plaintext highlighter-rouge">malloc()</code> in the
background. At initialization you give it a buffer from which to allocate
all memory.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">mstack</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">top</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">max</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[];</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">mstack</span> <span class="o">*</span><span class="nf">mstack_init</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">);</span>
<span class="kt">void</span>          <span class="o">*</span><span class="nf">mstack_alloc</span><span class="p">(</span><span class="k">struct</span> <span class="n">mstack</span> <span class="o">*</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">);</span>
<span class="kt">void</span>           <span class="nf">mstack_free</span><span class="p">(</span><span class="k">struct</span> <span class="n">mstack</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>There are no vulnerabilities in these functions (I hope!). It’s just
here for predictability.</p>
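
<p>To make the interface concrete, here’s a minimal sketch of how such an
allocator could work, assuming a plain bump pointer with no alignment
handling. It is not the code from the repository, just an illustration
of the three functions above.</p>

```c
#include <assert.h>
#include <stddef.h>

struct mstack {
    char *top;
    char *max;
    char buf[];
};

/* Places the allocator header at the start of the caller's buffer.
 * The buffer must be suitably aligned for struct mstack. */
struct mstack *
mstack_init(void *buf, size_t size)
{
    if (size < sizeof(struct mstack))
        return 0;
    struct mstack *m = buf;
    m->top = m->buf;
    m->max = (char *)buf + size;
    return m;
}

/* Bump allocation: hand out the next size bytes, or null if they
 * don't fit. Comparing against the remaining space avoids pointer
 * overflow. This sketch ignores alignment. */
void *
mstack_alloc(struct mstack *m, size_t size)
{
    if (size > (size_t)(m->max - m->top))
        return 0;
    char *p = m->top;
    m->top += size;
    return p;
}

/* Releases p and everything allocated after it. */
void
mstack_free(struct mstack *m, void *p)
{
    assert((char *)p >= m->buf && (char *)p <= m->top);
    m->top = p;
}
```

<p>In this sketch freed memory isn’t cleared, so the next allocation
starts at the same address as the object that was freed and may still
hold whatever was stored there.</p>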

<p>Next here’s the “authentication” function. It reads a username and
password combination from <code class="language-plaintext highlighter-rouge">/dev/tty</code>. It’s only an excuse to get a flag in
memory for this capture-the-flag game. The username and password must be
less than 32 characters each.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">authenticate</span><span class="p">(</span><span class="k">struct</span> <span class="n">mstack</span> <span class="o">*</span><span class="n">m</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">FILE</span> <span class="o">*</span><span class="n">tty</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/tty"</span><span class="p">,</span> <span class="s">"r+"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">tty</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="s">"/dev/tty"</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">char</span> <span class="o">*</span><span class="n">user</span> <span class="o">=</span> <span class="n">mstack_alloc</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">user</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fclose</span><span class="p">(</span><span class="n">tty</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">fputs</span><span class="p">(</span><span class="s">"User: "</span><span class="p">,</span> <span class="n">tty</span><span class="p">);</span>
    <span class="n">fflush</span><span class="p">(</span><span class="n">tty</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">fgets</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="n">tty</span><span class="p">))</span>
        <span class="n">user</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="kt">char</span> <span class="o">*</span><span class="n">pass</span> <span class="o">=</span> <span class="n">mstack_alloc</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">pass</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fputs</span><span class="p">(</span><span class="s">"Password: "</span><span class="p">,</span> <span class="n">tty</span><span class="p">);</span>
        <span class="n">fflush</span><span class="p">(</span><span class="n">tty</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">fgets</span><span class="p">(</span><span class="n">pass</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="n">tty</span><span class="p">))</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">strcmp</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">pass</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">fclose</span><span class="p">(</span><span class="n">tty</span><span class="p">);</span>
    <span class="n">mstack_free</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">user</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Next here’s a little version of <code class="language-plaintext highlighter-rouge">calloc()</code> for the custom allocator. Hmm,
I wonder why this is called “naive”…</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="o">*</span>
<span class="nf">naive_calloc</span><span class="p">(</span><span class="k">struct</span> <span class="n">mstack</span> <span class="o">*</span><span class="n">m</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">nmemb</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">size</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">mstack_alloc</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">nmemb</span> <span class="o">*</span> <span class="n">size</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">p</span><span class="p">)</span>
        <span class="n">memset</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">nmemb</span> <span class="o">*</span> <span class="n">size</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">p</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Next up is a paranoid wrapper for <code class="language-plaintext highlighter-rouge">strtoul()</code> that defensively checks its
inputs. If it’s out of range of an <code class="language-plaintext highlighter-rouge">unsigned long</code>, it bails out. If
there’s trailing garbage, it bails out. If there’s no number at all, it
bails out. If you make prolonged eye contact, it bails out.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">unsigned</span> <span class="kt">long</span>
<span class="nf">safe_strtoul</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">nptr</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">endptr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">base</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">errno</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">n</span> <span class="o">=</span> <span class="n">strtoul</span><span class="p">(</span><span class="n">nptr</span><span class="p">,</span> <span class="n">endptr</span><span class="p">,</span> <span class="n">base</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">errno</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">perror</span><span class="p">(</span><span class="n">nptr</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">nptr</span> <span class="o">==</span> <span class="o">*</span><span class="n">endptr</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Expected an integer</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">isspace</span><span class="p">(</span><span class="o">**</span><span class="n">endptr</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Invalid character '%c'</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">**</span><span class="n">endptr</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">n</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">main()</code> function parses the header using this wrapper and allocates
some zeroed memory:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">width</span> <span class="o">=</span> <span class="n">safe_strtoul</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">p</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">height</span> <span class="o">=</span> <span class="n">safe_strtoul</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">p</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pixels</span> <span class="o">=</span> <span class="n">naive_calloc</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">pixels</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">fputs</span><span class="p">(</span><span class="s">"Not enough memory</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">stderr</span><span class="p">);</span>
        <span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Then there’s a command processing loop, also using <code class="language-plaintext highlighter-rouge">safe_strtoul()</code>. It
carefully checks bounds against <code class="language-plaintext highlighter-rouge">width</code> and <code class="language-plaintext highlighter-rouge">height</code>. Finally it writes
out the image in Netpbm P2 (.pgm) format.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">printf</span><span class="p">(</span><span class="s">"P2</span><span class="se">\n</span><span class="s">%lu %lu 255</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">);</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">y</span> <span class="o">&lt;</span> <span class="n">height</span><span class="p">;</span> <span class="n">y</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o">&lt;</span> <span class="n">width</span><span class="p">;</span> <span class="n">x</span><span class="o">++</span><span class="p">)</span>
            <span class="n">printf</span><span class="p">(</span><span class="s">"%d "</span><span class="p">,</span> <span class="n">pixels</span><span class="p">[</span><span class="n">y</span> <span class="o">*</span> <span class="n">width</span> <span class="o">+</span> <span class="n">x</span><span class="p">]);</span>
        <span class="n">putchar</span><span class="p">(</span><span class="sc">'\n'</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The vulnerability is in something I’ve shown above. Can you find it?</p>

<h3 id="exploiting-the-renderer">Exploiting the renderer</h3>

<p>Did you find it? If you’re on a platform with 64-bit <code class="language-plaintext highlighter-rouge">long</code>, here’s your
exploit:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>V2 16 1152921504606846977
</code></pre></div></div>

<p>And here’s an exploit for 32-bit <code class="language-plaintext highlighter-rouge">long</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>V2 16 268435457
</code></pre></div></div>

<p>Here’s how it looks in action. The most obvious result is that the program
crashes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo V2 16 1152921504606846977 | ./mstack &gt; capture.txt
User: coolguy
Password: mysecret
Segmentation fault
</code></pre></div></div>

<p>Here are the initial contents of <code class="language-plaintext highlighter-rouge">capture.txt</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P2
16 1152921504606846977 255
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
109 121 115 101 99 114 101 116 10 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
</code></pre></div></div>

<p>Where did those junk numbers come from in the image data? Plug them into
an ASCII table and you’ll get “mysecret”. Despite allocating the image
with <code class="language-plaintext highlighter-rouge">naive_calloc()</code>, the password has found its way into the image! How
could this be?</p>

<p>What happened is that <code class="language-plaintext highlighter-rouge">width * height</code> overflows an <code class="language-plaintext highlighter-rouge">unsigned long</code>.
(Well, technically speaking, unsigned integers are defined <em>not</em> to
overflow in C, wrapping around instead, but it’s really the same thing.)
In <code class="language-plaintext highlighter-rouge">naive_calloc()</code>, the overflow results in a value of 16, so it only
allocates and clears 16 bytes. The requested allocation “succeeds” despite
<em>far</em> exceeding the available memory. The caller has been given a lot less
memory than expected, and the memory believed to have been allocated
contains a password.</p>
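
<p>The arithmetic is easy to reproduce. The sketch below uses fixed-width
types to stand in for a 64-bit and a 32-bit <code class="language-plaintext highlighter-rouge">unsigned long</code>. The
function names are made up for the example. The two heights are
2<sup>60</sup> + 1 and 2<sup>28</sup> + 1, so multiplying by 16 lands
exactly 16 past the wraparound point.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Product as computed in a 64-bit unsigned long: reduced modulo 2^64. */
static uint64_t
wrapped_mul64(uint64_t width, uint64_t height)
{
    return width * height;
}

/* Product as computed in a 32-bit unsigned long: reduced modulo 2^32. */
static uint32_t
wrapped_mul32(uint32_t width, uint32_t height)
{
    return (uint32_t)((uint64_t)width * height);
}

/* Both exploit headers reduce to a 16-byte request. */
static void
check_exploit_headers(void)
{
    assert(wrapped_mul64(16, UINT64_C(1152921504606846977)) == 16);
    assert(wrapped_mul32(16, UINT32_C(268435457)) == 16);
}
```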

<p>The final part that writes the output doesn’t multiply the integers and
doesn’t need to test for overflow. It uses a nested loop instead,
continuing along with the original, impossible image size.</p>

<p>How do we fix this? Add an overflow check at the beginning of the
<code class="language-plaintext highlighter-rouge">naive_calloc()</code> function (making it no longer naive). This is what the
real <code class="language-plaintext highlighter-rouge">calloc()</code> does.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">nmemb</span> <span class="o">&amp;&amp;</span> <span class="n">size</span> <span class="o">&gt;</span> <span class="o">-</span><span class="mi">1UL</span> <span class="o">/</span> <span class="n">nmemb</span><span class="p">)</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>
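
<p>For completeness, here’s a sketch of how the whole function might look with the check in place. The body of the original <code class="language-plaintext highlighter-rouge">naive_calloc()</code> isn’t repeated here, so the name <code class="language-plaintext highlighter-rouge">checked_calloc()</code> and the <code class="language-plaintext highlighter-rouge">malloc()</code>-plus-<code class="language-plaintext highlighter-rouge">memset()</code> body are assumptions for illustration. It also uses <code class="language-plaintext highlighter-rouge">SIZE_MAX</code> in place of <code class="language-plaintext highlighter-rouge">-1UL</code> so that the bound matches <code class="language-plaintext highlighter-rouge">size_t</code> on any platform.</p>

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked variant of a hand-rolled calloc() */
void *
checked_calloc(size_t nmemb, size_t size)
{
    /* Refuse any request whose byte count would wrap around */
    if (nmemb && size > SIZE_MAX / nmemb)
        return 0;
    size_t total = nmemb * size;
    /* Request at least one byte so a zero-size call is still valid */
    void *p = malloc(total ? total : 1);
    if (p)
        memset(p, 0, total);
    return p;
}
```

<p>A request such as <code class="language-plaintext highlighter-rouge">checked_calloc(2, SIZE_MAX / 2 + 1)</code> now fails with a null pointer rather than quietly returning a tiny buffer.</p>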

<p>The frightening takeaway is that this check is <em>very</em> easy to forget. It’s
a subtle bug with potentially disastrous consequences.</p>

<p>In practice, this sort of program wouldn’t have sensitive data resident in
memory. Instead an attacker would target the program’s stack with those
<code class="language-plaintext highlighter-rouge">s</code> commands — specifically the <a href="/blog/2017/01/21/">return pointers</a> — and perform a ROP
attack against the application. With the exploit header above and a
platform where <code class="language-plaintext highlighter-rouge">long</code> is the same size as a <code class="language-plaintext highlighter-rouge">size_t</code>, the program will behave
as if all available memory has been allocated to the image, so the <code class="language-plaintext highlighter-rouge">s</code>
command could be used to poke custom values <em>anywhere</em> in memory. This is
a much more complicated exploit, and it has to contend with ASLR and
random stack gap, but it’s feasible.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Stack Clashing for Fun and Profit</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/06/21/"/>
    <id>urn:uuid:43402771-3340-3dff-c18f-7110caeedb7d</id>
    <updated>2017-06-21T05:28:56Z</updated>
    <category term="c"/><category term="posix"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p><em>Stack clashing</em> has been in the news lately due to <a href="https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash">some recently
discovered vulnerabilities</a> along with proof-of-concept
exploits. As the announcement itself notes, this is not a new issue,
though this appears to be the first time it’s been given this
particular name. I do know of one “good” use of stack clashing, where
it’s used for something productive rather than as part of an attack. In this
article I’ll explain how it works.</p>

<p>You can find the complete code for this article here, ready to run:</p>

<ul>
  <li><a href="https://github.com/skeeto/stack-clash-coroutine">https://github.com/skeeto/stack-clash-coroutine</a></li>
</ul>

<p>But first, what is a stack clash? Here’s a rough picture of the
typical way process memory is laid out. The stack starts at a high
memory address and grows downwards. Code and static data sit at low
memory, with a <code class="language-plaintext highlighter-rouge">brk</code> pointer growing upward to make small allocations.
In the middle is the heap, where large allocations and memory mappings
take place.</p>

<p><img src="/img/diagram/process-memory.svg" alt="" /></p>

<p>Below the stack is a slim <em>guard page</em> that divides the stack and the
region of memory reserved for the heap. Reading or writing to that
memory will trap, causing the program to crash or some special action
to be taken. The goal is to prevent the stack from growing into the
heap, which could cause all sorts of trouble, like security issues.</p>

<p>The problem is that this thin guard page isn’t enough. It’s possible to
put a large allocation on the stack, never read or write to it, and
completely skip over the guard page, such that the heap and stack
overlap without detection.</p>

<p>Once this happens, writes into the heap will change memory on the
stack and vice versa. If an attacker can cause the program to make
such a large allocation on the stack, then legitimate writes into
memory on the heap can manipulate local variables or <a href="/blog/2017/01/21/">return pointers,
changing the program’s control flow</a>. This can bypass buffer
overflow protections, such as stack canaries.</p>

<h3 id="binary-trees-and-coroutines">Binary trees and coroutines</h3>

<p><img src="/img/diagram/binary-search-tree.svg" alt="" /></p>

<p>Now, I’m going to abruptly change topics to discuss binary search
trees. We’ll get back to stack clash in a bit. Suppose we have a
binary tree which we would like to iterate depth-first. For this
demonstration, here’s the C interface to the binary tree.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">left</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">right</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">key</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">value</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">void</span>  <span class="nf">tree_insert</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">**</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">v</span><span class="p">);</span>
<span class="kt">char</span> <span class="o">*</span><span class="nf">tree_find</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">);</span>
<span class="kt">void</span>  <span class="nf">tree_visit</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="p">,</span> <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">f</span><span class="p">)(</span><span class="kt">char</span> <span class="o">*</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="p">));</span>
<span class="kt">void</span>  <span class="nf">tree_destroy</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>An empty tree is the NULL pointer, hence the double-pointer for
insert. In the demonstration it’s an unbalanced search tree, but this
could very well be a balanced search tree with the addition of another
field on the structure.</p>
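
<p>The article doesn’t show the bodies of these functions, so here’s one way the unbalanced version might be written. Treat it as a sketch under assumptions: keys are ordered with <code class="language-plaintext highlighter-rouge">strcmp()</code>, inserting an existing key replaces its value, the tree borrows the strings rather than copying them, and the <code class="language-plaintext highlighter-rouge">tree_slot()</code> helper is hypothetical.</p>

```c
#include <stdlib.h>
#include <string.h>

struct tree {
    struct tree *left;
    struct tree *right;
    char *key;
    char *value;
};

/* Hypothetical helper: walk down to the node holding k, or to the
 * empty slot where such a node belongs. */
static struct tree **
tree_slot(struct tree **t, char *k)
{
    while (*t) {
        int cmp = strcmp(k, (*t)->key);
        if (!cmp)
            break;
        t = cmp < 0 ? &(*t)->left : &(*t)->right;
    }
    return t;
}

void
tree_insert(struct tree **t, char *k, char *v)
{
    struct tree **slot = tree_slot(t, k);
    if (!*slot) {
        struct tree *node = malloc(sizeof(*node));
        if (!node)
            return;  /* out of memory: leave the tree unchanged */
        node->left = node->right = 0;
        node->key = k;
        *slot = node;
    }
    (*slot)->value = v;  /* assumption: an existing key is overwritten */
}

char *
tree_find(struct tree *t, char *k)
{
    struct tree **slot = tree_slot(&t, k);
    return *slot ? (*slot)->value : 0;
}

void
tree_destroy(struct tree *t)
{
    if (t) {
        tree_destroy(t->left);
        tree_destroy(t->right);
        free(t);  /* keys and values belong to the caller */
    }
}
```

<p>The double-pointer is what lets <code class="language-plaintext highlighter-rouge">tree_insert()</code> treat the root and every child link the same way: whichever pointer is NULL gets overwritten with the new node.</p>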

<p>For the traversal, first visit the root node, then traverse its left
tree, and finally traverse its right tree. It makes for a simple,
recursive definition — the sort of thing you’d teach a beginner.
Here’s a definition that accepts a callback, which the caller will use
to <em>visit</em> each key/value in the tree. This really is as simple as it
gets.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">tree_visit</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">f</span><span class="p">)(</span><span class="kt">char</span> <span class="o">*</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="p">))</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">f</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">,</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">);</span>
        <span class="n">tree_visit</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span>
        <span class="n">tree_visit</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Unfortunately this isn’t so convenient for the caller, who has to
split off a callback function that <a href="/blog/2017/01/08/">lacks context</a>, then hand
over control to the traversal function.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">printer</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%s = %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span>
<span class="nf">print_tree</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">tree</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">tree_visit</span><span class="p">(</span><span class="n">tree</span><span class="p">,</span> <span class="n">printer</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
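
<p>The missing context becomes a real nuisance as soon as the callback has to accumulate something. As a rough illustration, counting the entries with this interface forces the running total into a variable outside both functions. The <code class="language-plaintext highlighter-rouge">counter()</code> and <code class="language-plaintext highlighter-rouge">tree_count()</code> names are hypothetical, and <code class="language-plaintext highlighter-rouge">tree_visit()</code> is repeated only to keep the sketch self-contained.</p>

```c
#include <stddef.h>

struct tree {
    struct tree *left;
    struct tree *right;
    char *key;
    char *value;
};

/* The same recursive traversal as above */
void
tree_visit(struct tree *t, void (*f)(char *, char *))
{
    if (t) {
        f(t->key, t->value);
        tree_visit(t->left, f);
        tree_visit(t->right, f);
    }
}

/* The only channel between the caller and its callback */
static size_t count;

static void
counter(char *k, char *v)
{
    (void)k;
    (void)v;
    count++;
}

/* Hypothetical helper: not reentrant because of the shared counter */
size_t
tree_count(struct tree *t)
{
    count = 0;
    tree_visit(t, counter);
    return count;
}
```

<p>A callback interface can avoid this by accepting an extra <code class="language-plaintext highlighter-rouge">void *</code> argument and handing it back to the callback, but the caller’s logic is still split across two functions.</p>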

<p>Usually it’s much nicer for the caller if instead it’s provided an
<em>iterator</em>, which the caller can invoke at will. Here’s an interface
for it, just two functions.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="nf">tree_iterator</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="p">);</span>
<span class="kt">int</span>             <span class="nf">tree_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">v</span><span class="p">);</span>
</code></pre></div></div>

<p>The first constructs an iterator object, and the second one visits a
key/value pair each time it’s called. It returns 0 when traversal is
complete, automatically freeing any resources associated with the
iterator.</p>

<p>The caller now looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">,</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span> <span class="o">=</span> <span class="n">tree_iterator</span><span class="p">(</span><span class="n">tree</span><span class="p">);</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">tree_next</span><span class="p">(</span><span class="n">it</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">k</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">v</span><span class="p">))</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"%s = %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">);</span>
</code></pre></div></div>

<p>Notice I haven’t defined <code class="language-plaintext highlighter-rouge">struct tree_it</code>. That’s because I’ve got
four different implementations, each taking a different approach. The
last one will use stack clashing.</p>

<h4 id="manual-state-tracking">Manual State Tracking</h4>

<p>With just the standard facilities provided by C, there’s some manual
bookkeeping that has to take place in order to convert the recursive
definition into an iterator. Depth-first traversal is a stack-oriented
process, and with recursion the stack is implicit in the call stack.
As an iterator, the traversal stack needs to be <a href="/blog/2016/11/13/">managed
explicitly</a>. The iterator needs to keep track of the path it
took so that it can backtrack, which means keeping track of parent
nodes as well as which branch was taken.</p>

<p>Here’s my little implementation, which, to keep things simple, has a
hard depth limit of 32. Its structure definition includes a stack of
node pointers, and 2 bits of information per visited node, stored
across a 64-bit integer.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">stack</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="n">state</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">nstack</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span>
<span class="nf">tree_iterator</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="p">));</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">it</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The 2 bits track three different states for each visited node:</p>

<ol>
  <li>Visit the current node</li>
  <li>Traverse the left tree</li>
  <li>Traverse the right tree</li>
</ol>
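
<p>Packing those per-node states into one integer comes down to a few shifts and masks. Here’s a small sketch of the three operations involved. The helper names are hypothetical, and <code class="language-plaintext highlighter-rouge">depth</code> is assumed to be the node’s index in the stack, from 0 through 31.</p>

```c
/* Read the 2-bit state of the node at the given stack depth */
unsigned
state_get(unsigned long long state, int depth)
{
    return 3u & (state >> (depth * 2));
}

/* Move the node at the given depth on to its next state. Adding one
 * never carries into the neighboring field as long as the field is
 * below 3 beforehand. */
unsigned long long
state_advance(unsigned long long state, int depth)
{
    return state + (1ull << (depth * 2));
}

/* Zero the field at the given depth, as needed before reusing it */
unsigned long long
state_reset(unsigned long long state, int depth)
{
    return state & ~(3ull << (depth * 2));
}
```

<p>Thirty-two fields of two bits each fill a 64-bit integer exactly, which is where the depth limit of 32 comes from.</p>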

<p>It works out to the following. Don’t worry too much about trying to
understand how this works. My point is to demonstrate that converting
the recursive definition into an iterator complicates the
implementation.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">tree_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">int</span> <span class="n">shift</span> <span class="o">=</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span>
        <span class="kt">int</span> <span class="n">state</span> <span class="o">=</span> <span class="mi">3u</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="n">shift</span><span class="p">);</span>
        <span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span> <span class="o">-</span> <span class="mi">1</span><span class="p">];</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">+=</span> <span class="mi">1ull</span> <span class="o">&lt;&lt;</span> <span class="n">shift</span><span class="p">;</span>
        <span class="k">switch</span> <span class="p">(</span><span class="n">state</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">case</span> <span class="mi">0</span><span class="p">:</span>
                <span class="o">*</span><span class="n">k</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">;</span>
                <span class="o">*</span><span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">)</span> <span class="p">{</span>
                    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">;</span>
                    <span class="n">it</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">&amp;=</span> <span class="o">~</span><span class="p">(</span><span class="mi">3ull</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">shift</span> <span class="o">+</span> <span class="mi">2</span><span class="p">));</span>
                <span class="p">}</span>
                <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
            <span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">)</span> <span class="p">{</span>
                    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">;</span>
                    <span class="n">it</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">&amp;=</span> <span class="o">~</span><span class="p">(</span><span class="mi">3ull</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">shift</span> <span class="o">+</span> <span class="mi">2</span><span class="p">));</span>
                <span class="p">}</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
                <span class="n">it</span><span class="o">-&gt;</span><span class="n">nstack</span><span class="o">--</span><span class="p">;</span>
                <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Wouldn’t it be nice to keep the recursive definition while also
getting an iterator? There’s an exact solution to that: coroutines.</p>

<h4 id="coroutines">Coroutines</h4>

<p>C doesn’t come with coroutines, but there are a number of libraries
available. We can also build our own coroutines. One way to do that is
with <em>user contexts</em> (<code class="language-plaintext highlighter-rouge">&lt;ucontext.h&gt;</code>) provided by the X/Open System
Interfaces Extension (XSI), an extension to POSIX. This set of
functions allows programs to create their own call stacks and switch
between them. That’s the key ingredient for coroutines. Caveat: These
functions aren’t widely available, and probably shouldn’t be used in
new code.</p>

<p>Here’s my iterator structure definition.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define _XOPEN_SOURCE 600
#include</span> <span class="cpf">&lt;ucontext.h&gt;</span><span class="cp">
</span>
<span class="k">struct</span> <span class="n">tree_it</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
    <span class="n">ucontext_t</span> <span class="n">coroutine</span><span class="p">;</span>
    <span class="n">ucontext_t</span> <span class="n">yield</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>It needs one context for the original stack and one context for the
iterator’s stack. Each time the iterator is invoked, the program
will switch to the other stack, find the next value, then switch back.
This process is called <em>yielding</em>. Values are passed between contexts
using the <code class="language-plaintext highlighter-rouge">k</code> (key) and <code class="language-plaintext highlighter-rouge">v</code> (value) fields on the iterator.</p>

<p>Before I get into initialization, here’s the actual traversal
coroutine. It’s nearly the same as the original recursive definition
except for the <code class="language-plaintext highlighter-rouge">swapcontext()</code>. This is the <em>yield</em>, pausing execution
and sending control back to the caller. The current context is saved
in the first argument, and the second argument becomes the current
context.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">coroutine</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">;</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="n">swapcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">yield</span><span class="p">);</span>
        <span class="n">coroutine</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
        <span class="n">coroutine</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
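
<p>If the back-and-forth is hard to picture, it may help to see it stripped of the tree entirely. The following is a minimal sketch of a generator that yields 1, 2, and 3 to its caller. Everything in it is an assumption made for illustration: the names, the use of globals, and the arbitrary 64kB stack. It calls <code class="language-plaintext highlighter-rouge">getcontext()</code> before filling in the stack fields, which is the order the manual page describes.</p>

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx;  /* where the caller is parked */
static ucontext_t gen_ctx;     /* where the generator is parked */
static void *gen_stack;        /* the generator's private stack */
static int current;            /* value handed across each switch */

/* Runs on its own stack, yielding 1, 2, 3 */
static void
generator(void)
{
    for (int i = 1; i <= 3; i++) {
        current = i;
        swapcontext(&gen_ctx, &caller_ctx);  /* yield to the caller */
    }
    current = 0;  /* returning resumes uc_link, i.e. the caller */
}

/* Prepare the generator; returns 0 on failure */
int
gen_start(void)
{
    size_t size = 64 * 1024;  /* arbitrary size for this sketch */
    gen_stack = malloc(size);
    if (!gen_stack)
        return 0;
    if (getcontext(&gen_ctx) == -1) {
        free(gen_stack);
        gen_stack = 0;
        return 0;
    }
    gen_ctx.uc_stack.ss_sp = gen_stack;
    gen_ctx.uc_stack.ss_size = size;
    gen_ctx.uc_link = &caller_ctx;
    makecontext(&gen_ctx, generator, 0);
    return 1;
}

/* Resume the generator; returns the next value, or 0 once finished */
int
gen_next(void)
{
    assert(gen_stack);  /* gen_start() must have succeeded first */
    swapcontext(&caller_ctx, &gen_ctx);
    if (!current) {
        free(gen_stack);  /* safe: back on the caller's stack */
        gen_stack = 0;
    }
    return current;
}
```

<p>Each call to <code class="language-plaintext highlighter-rouge">gen_next()</code> picks up inside the loop right where the previous yield left off, with <code class="language-plaintext highlighter-rouge">i</code> intact on the generator’s stack. The tree traversal works the same way, except that what’s preserved between calls is a whole chain of recursive frames.</p>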

<p>While the actual traversal is simple again, initialization is more
complicated. The first problem is that there’s no way to pass pointer
arguments to the coroutine. Technically only <code class="language-plaintext highlighter-rouge">int</code> arguments are
permitted. (All the online tutorials get this wrong.) To work around
this problem, I smuggle the arguments in as global variables. This
would cause problems should two different threads try to create
iterators at the same time, even on different trees.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">tree_arg</span><span class="p">;</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">tree_it_arg</span><span class="p">;</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">coroutine_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">coroutine</span><span class="p">(</span><span class="n">tree_arg</span><span class="p">,</span> <span class="n">tree_it_arg</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The stack has to be allocated manually, which I do with a call to
<code class="language-plaintext highlighter-rouge">malloc()</code>. Nothing <a href="/blog/2015/05/15/">fancy is needed</a>, though this means the new
stack won’t have a guard page. For the stack size, I use the suggested
value of <code class="language-plaintext highlighter-rouge">SIGSTKSZ</code>. The <code class="language-plaintext highlighter-rouge">makecontext()</code> function is what creates the
new context from scratch, but the new context must first be
initialized with <code class="language-plaintext highlighter-rouge">getcontext()</code>, even though that particular snapshot
won’t actually be used.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span>
<span class="nf">tree_iterator</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="p">));</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_sp</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">SIGSTKSZ</span><span class="p">);</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_size</span> <span class="o">=</span> <span class="n">SIGSTKSZ</span><span class="p">;</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">.</span><span class="n">uc_link</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">yield</span><span class="p">;</span>
    <span class="n">getcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">);</span>
    <span class="n">makecontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">,</span> <span class="n">coroutine_init</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">tree_arg</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="n">tree_it_arg</span> <span class="o">=</span> <span class="n">it</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">it</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice I gave it a function pointer, a lot like I’m starting a new
thread. This is no coincidence. There’s a lot of similarity between
coroutines and multiple threads, as you’ll soon see.</p>

<p>Finally the iterator function itself. Since NULL isn’t a valid key, it
initializes the key to NULL before yielding to the iterator context.
If the iterator has no more nodes to visit, it doesn’t set the key,
which can be detected when control returns.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">tree_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">swapcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">yield</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">k</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">;</span>
        <span class="o">*</span><span class="n">v</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">;</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_sp</span><span class="p">);</span>
        <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s all it takes to create and operate a coroutine in C, provided
you’re on a system with these XSI extensions.</p>
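
<p>To see the same mechanism in isolation, here is a minimal sketch of a
counting generator built on the same XSI calls. Everything named <code class="language-plaintext highlighter-rouge">gen_*</code>
is hypothetical and invented for this illustration, the 64kB stack size is
an arbitrary assumption, and error checking is mostly omitted.</p>

```c
#include <stdlib.h>
#include <ucontext.h>

#define GEN_STACK_SIZE (64 * 1024)  /* assumed to be large enough */

/* The entry point passed to makecontext() takes no useful arguments
 * here, so the generator state lives in file-scope variables. */
static ucontext_t gen_caller;  /* context of whoever calls gen_next() */
static ucontext_t gen_body;    /* context of the generator itself */
static void *gen_stack;
static int gen_limit;
static int gen_value;
static int gen_done;

static void
gen_entry(void)
{
    for (int i = 1; i <= gen_limit; i++) {
        gen_value = i;
        swapcontext(&gen_body, &gen_caller);  /* yield to the caller */
    }
    gen_done = 1;
    /* Returning from here resumes uc_link, which is gen_caller. */
}

/* Prepare a generator that yields 1..limit. Returns 0 on failure. */
int
gen_start(int limit)
{
    gen_stack = malloc(GEN_STACK_SIZE);
    if (!gen_stack)
        return 0;
    gen_limit = limit;
    gen_done = 0;
    getcontext(&gen_body);
    gen_body.uc_stack.ss_sp = gen_stack;
    gen_body.uc_stack.ss_size = GEN_STACK_SIZE;
    gen_body.uc_link = &gen_caller;
    makecontext(&gen_body, gen_entry, 0);
    return 1;
}

/* Store the next value and return 1, or free the stack and return 0
 * once the generator has finished. */
int
gen_next(int *out)
{
    swapcontext(&gen_caller, &gen_body);
    if (gen_done) {
        free(gen_stack);
        gen_stack = 0;
        return 0;
    }
    *out = gen_value;
    return 1;
}
```

<p>After <code class="language-plaintext highlighter-rouge">gen_start(3)</code>, successive calls to <code class="language-plaintext highlighter-rouge">gen_next()</code> produce 1, 2,
and 3, then report exhaustion. As with the tree iterator, calling it
again after that point would resume a finished context, so don’t.</p>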

<h4 id="semaphores">Semaphores</h4>

<p>Instead of a coroutine, we could just use actual threads and a couple
of semaphores to synchronize them. This is a heavy implementation and
also probably shouldn’t be used in practice, but at least it’s fully
portable.</p>

<p>Here’s the structure definition:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
    <span class="n">sem_t</span> <span class="n">visitor</span><span class="p">;</span>
    <span class="n">sem_t</span> <span class="n">main</span><span class="p">;</span>
    <span class="n">pthread_t</span> <span class="kr">thread</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The main thread will wait on one semaphore and the iterator thread
will wait on the other. This <a href="/blog/2017/02/14/">should sound very familiar</a>.</p>

<p>The actual traversal function looks the same, but with <code class="language-plaintext highlighter-rouge">sem_post()</code>
and <code class="language-plaintext highlighter-rouge">sem_wait()</code> as the yield.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">visit</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">;</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="n">sem_post</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">);</span>
        <span class="n">sem_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">visitor</span><span class="p">);</span>
        <span class="n">visit</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
        <span class="n">visit</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There’s a separate function to initialize the iterator context again.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="o">*</span>
<span class="nf">thread_entrance</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span> <span class="o">=</span> <span class="n">arg</span><span class="p">;</span>
    <span class="n">sem_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">visitor</span><span class="p">);</span>
    <span class="n">visit</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">t</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
    <span class="n">sem_post</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Creating the iterator only requires initializing the semaphores and
creating the thread:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span>
<span class="nf">tree_iterator</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="p">));</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
    <span class="n">sem_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">visitor</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">sem_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">pthread_create</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="kr">thread</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">thread_entrance</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">it</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The iterator function looks just like the coroutine version.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">tree_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">sem_post</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">visitor</span><span class="p">);</span>
    <span class="n">sem_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">k</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">;</span>
        <span class="o">*</span><span class="n">v</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">;</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">pthread_join</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="kr">thread</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
        <span class="n">sem_destroy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">);</span>
        <span class="n">sem_destroy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">visitor</span><span class="p">);</span>
        <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Overall, this is almost identical to the coroutine version.</p>

<h4 id="coroutines-using-stack-clashing">Coroutines using stack clashing</h4>

<p>Finally I can tie this back into the topic at hand. Without either XSI
extensions or Pthreads, we can (usually) create coroutines by abusing
<code class="language-plaintext highlighter-rouge">setjmp()</code> and <code class="language-plaintext highlighter-rouge">longjmp()</code>. Technically this violates two of C’s
rules and relies on undefined behavior, but it generally works. This
<a href="http://fanf.livejournal.com/105413.html">is not my own invention</a>, and it dates back to at least 2010.</p>

<p>From the very beginning, C has provided a crude “exception” mechanism
that allows the stack to be abruptly unwound back to a previous state.
It’s a sort of non-local goto. Call <code class="language-plaintext highlighter-rouge">setjmp()</code> to capture an opaque
<code class="language-plaintext highlighter-rouge">jmp_buf</code> object to be used in the future. This function returns 0
the first time. Hand that <code class="language-plaintext highlighter-rouge">jmp_buf</code> to <code class="language-plaintext highlighter-rouge">longjmp()</code> later, even in a
different function, and <code class="language-plaintext highlighter-rouge">setjmp()</code> will return again, this time with a
non-zero value.</p>
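
<p>As a small, well-defined illustration of that double return, here is
a sketch that uses the pair purely as an error exit. The names
<code class="language-plaintext highlighter-rouge">on_error</code>, <code class="language-plaintext highlighter-rouge">checked_div()</code>, and <code class="language-plaintext highlighter-rouge">try_div()</code> are made up for this
example. Unlike the coroutine trick, the jump only ever goes back up the
stack to a frame that is still live, so no rules are broken.</p>

```c
#include <setjmp.h>

static jmp_buf on_error;  /* hypothetical: where to land on failure */

/* Bails out through longjmp() instead of returning an error code. */
static int
checked_div(int a, int b)
{
    if (b == 0)
        longjmp(on_error, 1);  /* never returns */
    return a / b;
}

/* Returns 1 and stores the quotient on success, 0 if the jump was
 * taken. On failure *result is left untouched. */
int
try_div(int a, int b, int *result)
{
    if (setjmp(on_error) == 0) {
        /* First return: the context was captured and setjmp() gave 0. */
        *result = checked_div(a, b);
        return 1;
    }
    /* Second return: longjmp() unwound the stack back to here. */
    return 0;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">try_div(10, 2, &amp;q)</code> takes the first branch and stores 5.
Calling <code class="language-plaintext highlighter-rouge">try_div(1, 0, &amp;q)</code> enters the same branch, but the <code class="language-plaintext highlighter-rouge">longjmp()</code>
makes <code class="language-plaintext highlighter-rouge">setjmp()</code> return a second time with a non-zero value, so the
function falls through and returns 0.</p>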

<p>It’s technically unsuitable for coroutines because the jump is a
one-way trip. The unwound stack invalidates any <code class="language-plaintext highlighter-rouge">jmp_buf</code> that was
created after the target of the jump. In practice, though, you can
still use these jumps, which is one rule being broken.</p>

<p>That’s where stack clashing comes into play. In order for it to be a
proper coroutine, it needs to have its own stack. But how can we do
that with these primitive C utilities? <strong>Extend the stack to overlap
the heap, call <code class="language-plaintext highlighter-rouge">setjmp()</code> to capture a coroutine on it, then return.</strong>
Generally we can get away with using <code class="language-plaintext highlighter-rouge">longjmp()</code> to return to this
heap-allocated stack.</p>

<p>Here’s my iterator definition for this one. Like the XSI context
struct, this has two <code class="language-plaintext highlighter-rouge">jmp_buf</code> “contexts.” The <code class="language-plaintext highlighter-rouge">stack</code> holds the
iterator’s stack buffer so that it can be freed, and the <code class="language-plaintext highlighter-rouge">gap</code> field
will be used to prevent the optimizer from spoiling our plans.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">k</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">stack</span><span class="p">;</span>
    <span class="k">volatile</span> <span class="kt">char</span> <span class="o">*</span><span class="n">gap</span><span class="p">;</span>
    <span class="kt">jmp_buf</span> <span class="n">coroutine</span><span class="p">;</span>
    <span class="kt">jmp_buf</span> <span class="n">yield</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The coroutine looks familiar again. This time the yield is performed
with <code class="language-plaintext highlighter-rouge">setjmp()</code> and <code class="language-plaintext highlighter-rouge">longjmp()</code>, just like <code class="language-plaintext highlighter-rouge">swapcontext()</code>. Remember
that <code class="language-plaintext highlighter-rouge">setjmp()</code> returns twice, hence the branch. The <code class="language-plaintext highlighter-rouge">longjmp()</code> never
returns.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">coroutine</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">,</span> <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">;</span>
        <span class="n">it</span><span class="o">-&gt;</span><span class="n">v</span> <span class="o">=</span> <span class="n">t</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">setjmp</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">))</span>
            <span class="n">longjmp</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">yield</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
        <span class="n">coroutine</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">left</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
        <span class="n">coroutine</span><span class="p">(</span><span class="n">t</span><span class="o">-&gt;</span><span class="n">right</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Next is the tricky part to cause the stack clash. First, allocate the
new stack with <code class="language-plaintext highlighter-rouge">malloc()</code> so that we can get its address. Then use a
local variable on the stack to determine how much the stack needs to
grow in order to overlap with the allocation. Taking the difference
between these pointers is illegal as far as the language is concerned,
making this the second rule I’m breaking. I can <a href="/blog/2017/05/03/">imagine an
implementation</a> where the stack and heap are in two separate
kinds of memory, and it would be meaningless to take the difference. I
don’t actually have to imagine very hard, because this is actually how
it used to work on the 8086 with its <a href="https://en.wikipedia.org/wiki/X86_memory_segmentation">segmented memory
architecture</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span>
<span class="nf">tree_iterator</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="p">));</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">STACK_SIZE</span><span class="p">);</span>
    <span class="kt">char</span> <span class="n">marker</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">gap</span><span class="p">[</span><span class="o">&amp;</span><span class="n">marker</span> <span class="o">-</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span> <span class="o">-</span> <span class="n">STACK_SIZE</span><span class="p">];</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">gap</span> <span class="o">=</span> <span class="n">gap</span><span class="p">;</span> <span class="c1">// prevent optimization</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">setjmp</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">yield</span><span class="p">))</span>
        <span class="n">coroutine_init</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">it</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">it</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I’m using a variable-length array (VLA) named <code class="language-plaintext highlighter-rouge">gap</code> to indirectly
control the stack pointer, moving it over the heap. I’m assuming the
stack grows downward, since otherwise the sign would be wrong.</p>

<p>The compiler is smart and will notice I’m not actually using <code class="language-plaintext highlighter-rouge">gap</code>,
and it’s happy to throw it away. In fact, it’s vitally important that
I <em>don’t</em> touch it since the guard page, along with a bunch of
unmapped memory, is actually somewhere in the middle of that array. I
only want the array for its side effect, but that side effect isn’t
officially supported, which means the optimizer doesn’t need to
consider it in its decisions. To inhibit the optimizer, I store the
array’s address where someone might potentially look at it, meaning
the array has to exist.</p>

<p>Finally, the iterator function looks just like the others, again.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">tree_next</span><span class="p">(</span><span class="k">struct</span> <span class="n">tree_it</span> <span class="o">*</span><span class="n">it</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">k</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">setjmp</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">yield</span><span class="p">))</span>
        <span class="n">longjmp</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">coroutine</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">k</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">;</span>
        <span class="o">*</span><span class="n">v</span> <span class="o">=</span> <span class="n">it</span><span class="o">-&gt;</span><span class="n">v</span><span class="p">;</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">);</span>
        <span class="n">free</span><span class="p">(</span><span class="n">it</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And that’s it: a nasty hack using a stack clash to create a context
for a <code class="language-plaintext highlighter-rouge">setjmp()</code>+<code class="language-plaintext highlighter-rouge">longjmp()</code> coroutine.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Manual Control Flow Guard in C</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/01/21/"/>
    <id>urn:uuid:f185405a-3e30-3612-7a21-6d4ec450519d</id>
    <updated>2017-01-21T22:44:15Z</updated>
    <category term="c"/><category term="linux"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p>Recent versions of Windows have a new exploit mitigation feature
called <a href="http://sjc1-te-ftp.trendmicro.com/assets/wp/exploring-control-flow-guard-in-windows10.pdf"><em>Control Flow Guard</em></a> (CFG). Before an indirect function
call — e.g. function pointers and virtual functions — the target
address is checked against a table of valid call addresses. If the
address isn’t the entry point of a known function, then the program is
aborted.</p>
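
<p>Conceptually, the check amounts to something like the following
sketch. This is only an illustration of the idea under simplifying
assumptions, not how Windows implements it: the handler names and the
tiny linear table are hypothetical, and a real implementation needs a
much faster lookup that also covers every loaded library.</p>

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*handler_fn)(void);

int calls;  /* bumped by the handlers so the effect is observable */

void on_open(void)        { calls += 1; }
void on_close(void)       { calls += 10; }
void not_registered(void) { calls += 100; }

/* Hypothetical list of legitimate indirect-call targets. */
static const handler_fn valid_targets[] = { on_open, on_close };

/* Returns 1 if f is the entry point of a registered function. */
int
is_valid_target(handler_fn f)
{
    size_t n = sizeof(valid_targets) / sizeof(valid_targets[0]);
    for (size_t i = 0; i < n; i++)
        if (valid_targets[i] == f)
            return 1;
    return 0;
}

/* Every indirect call goes through this check first. */
void
guarded_call(handler_fn f)
{
    if (!is_valid_target(f))
        abort();
    f();
}
```

<p>Here <code class="language-plaintext highlighter-rouge">guarded_call(on_open)</code> goes through, while
<code class="language-plaintext highlighter-rouge">guarded_call(not_registered)</code> aborts, even though that function is
perfectly real code in the same program.</p>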

<p>If an application has a buffer overflow vulnerability, an attacker may
use it to overwrite a function pointer and, by the call through that
pointer, control the execution flow of the program. This is one way to
initiate a <a href="https://skeeto.s3.amazonaws.com/share/p15-coffman.pdf"><em>Return Oriented Programming</em></a> (ROP) attack, where
the attacker constructs <a href="https://github.com/JonathanSalwan/ROPgadget">a chain of <em>gadget</em> addresses</a> — a
gadget being a couple of instructions followed by a return
instruction, all in the original program — using the indirect call as
the starting point. The execution then flows from gadget to gadget so
that the program does what the attacker wants it to do, all without
the attacker supplying any code.</p>

<p>The two most widely practiced ROP attack mitigation techniques today
are <em>Address Space Layout Randomization</em> (ASLR) and <em>stack
protectors</em>. The former randomizes the base address of executable
images (programs, shared libraries) so that process memory layout is
unpredictable to the attacker. The addresses in the ROP attack chain
depend on the run-time memory layout, so the attacker must also find
and exploit an <a href="https://github.com/torvalds/linux/blob/4c9eff7af69c61749b9eb09141f18f35edbf2210/Documentation/sysctl/kernel.txt#L373">information leak</a> to bypass ASLR.</p>

<p>For stack protectors, the compiler allocates a <em>canary</em> on the stack
above other stack allocations and sets the canary to a per-thread
random value. If a buffer overflows to overwrite the function return
pointer, the canary value will also be overwritten. Before the
function returns by the return pointer, it checks the canary. If the
canary doesn’t match the known value, the program is aborted.</p>

<p><img src="/img/cfg/canary.svg" alt="" /></p>

<p>CFG works similarly — performing a check prior to passing control to
the address in a pointer — except that instead of checking a canary,
it checks the target address itself. This is a lot more sophisticated,
and, unlike a stack canary, essentially requires coordination by the
platform. The check must be informed of all valid call targets,
whether from the main program or from shared libraries.</p>

<p>While not (yet?) widely deployed, a worthy mention is <a href="http://clang.llvm.org/docs/SafeStack.html">Clang’s
SafeStack</a>. Each thread gets <em>two</em> stacks: a “safe stack” for
return pointers and other safely-accessed values, and an “unsafe
stack” for buffers and such. Buffer overflows will corrupt other
buffers but will not overwrite return pointers, limiting the effect of
their damage.</p>

<h3 id="an-exploit-example">An exploit example</h3>

<p>Consider this trivial C program, <code class="language-plaintext highlighter-rouge">demo.c</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="n">name</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
    <span class="n">gets</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello, %s.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">name</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It reads a name into a buffer and prints it back out with a greeting.
While trivial, it’s far from innocent. That naive call to <code class="language-plaintext highlighter-rouge">gets()</code>
doesn’t check the bounds of the buffer, introducing an exploitable
buffer overflow. It’s so obvious that both the compiler and linker
will yell about it.</p>

<p>For simplicity, suppose the program also contains a dangerous
function.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">self_destruct</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">puts</span><span class="p">(</span><span class="s">"**** GO BOOM! ****"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The attacker can use the buffer overflow to call this dangerous
function.</p>

<p>To make this attack simpler for the sake of the article, assume the
program isn’t using ASLR (e.g. without <code class="language-plaintext highlighter-rouge">-fpie</code>/<code class="language-plaintext highlighter-rouge">-pie</code>, or with
<code class="language-plaintext highlighter-rouge">-fno-pie</code>/<code class="language-plaintext highlighter-rouge">-no-pie</code>). For this particular example, I’ll also
explicitly disable buffer overflow protections (e.g. <code class="language-plaintext highlighter-rouge">_FORTIFY_SOURCE</code>
and stack protectors).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Os -fno-pie -D_FORTIFY_SOURCE=0 -fno-stack-protector \
      -o demo demo.c
</code></pre></div></div>

<p>First, find the address of <code class="language-plaintext highlighter-rouge">self_destruct()</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -a demo | grep self_destruct
46: 00000000004005c5  10 FUNC  GLOBAL DEFAULT 13 self_destruct
</code></pre></div></div>

<p>This is on x86-64, so it’s a 64-bit address. The size of the <code class="language-plaintext highlighter-rouge">name</code>
buffer is 8 bytes, and peeking at the assembly I see an extra 8 bytes
allocated above, so there’s 16 bytes to fill, then 8 bytes to
overwrite the return pointer with the address of <code class="language-plaintext highlighter-rouge">self_destruct</code>.</p>
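
<p>The layout arithmetic can be spelled out in C. This is just a sketch
of the byte layout described above, with a hypothetical helper name:
16 filler bytes followed by the target address as eight little-endian
bytes, since x86-64 stores the least significant byte first.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `fill` padding bytes followed by `addr` as an 8-byte
 * little-endian value. Returns the total number of bytes written.
 * `cap` is the capacity of `out`. */
size_t
build_payload(unsigned char *out, size_t cap, size_t fill, uint64_t addr)
{
    assert(out && cap >= fill + 8);
    memset(out, 'x', fill);
    for (int i = 0; i < 8; i++)
        out[fill + i] = (unsigned char)(addr >> (8 * i));
    return fill + 8;
}
```

<p>For a fill of 16 and the address <code class="language-plaintext highlighter-rouge">0x4005c5</code>, the last eight bytes
come out as <code class="language-plaintext highlighter-rouge">c5 05 40 00 00 00 00 00</code>, for 24 bytes in total.</p>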

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo -ne 'xxxxxxxxyyyyyyyy\xc5\x05\x40\x00\x00\x00\x00\x00' &gt; boom
$ ./demo &lt; boom
Hello, xxxxxxxxyyyyyyyy?@.
**** GO BOOM! ****
Segmentation fault
</code></pre></div></div>

<p>With this input I’ve successfully exploited the buffer overflow to
divert control to <code class="language-plaintext highlighter-rouge">self_destruct()</code>. When <code class="language-plaintext highlighter-rouge">main</code> tries to return into
libc, it instead jumps to the dangerous function, and then crashes
when that function tries to return — though, presumably, the system
would have self-destructed already. Turning on the stack protector
stops this exploit.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Os -fno-pie -D_FORTIFY_SOURCE=0 -fstack-protector \
      -o demo demo.c
$ ./demo &lt; boom
Hello, xxxxxxxxyyyyyyyy?@.
*** stack smashing detected ***: ./demo terminated
======= Backtrace: =========
... lots of backtrace stuff ...
</code></pre></div></div>

<p>The stack protector successfully blocks the exploit. To get around
this, I’d have to either guess the canary value or discover an
information leak that reveals it.</p>

<p>The stack protector transformed the program into something that looks
like the following:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">long</span> <span class="n">__canary</span> <span class="o">=</span> <span class="n">__get_thread_canary</span><span class="p">();</span>
    <span class="kt">char</span> <span class="n">name</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
    <span class="n">gets</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello, %s.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">name</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">__canary</span> <span class="o">!=</span> <span class="n">__get_thread_canary</span><span class="p">())</span>
        <span class="n">abort</span><span class="p">();</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However, it’s not actually possible to implement the stack protector
within C. Buffer overflows are undefined behavior, and a canary is
only affected by a buffer overflow, allowing the compiler to optimize
it away.</p>

<h3 id="function-pointers-and-virtual-functions">Function pointers and virtual functions</h3>

<p>After the attacker successfully self-destructed the last computer,
upper management has mandated password checks before all
self-destruction procedures. Here’s what it looks like now:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">self_destruct</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">password</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="s">"12345"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">puts</span><span class="p">(</span><span class="s">"**** GO BOOM! ****"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The password is hardcoded, and it’s the kind of thing an idiot would
have on his luggage, but assume it’s actually unknown to the attacker.
Especially since, as I’ll show shortly, it won’t matter. Upper
management has also mandated stack protectors, so assume that’s
enabled from here on.</p>

<p>Additionally, the program has evolved a bit, and now <a href="/blog/2014/10/21/">uses a function
pointer for polymorphism</a>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">greeter</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">name</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
    <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">greet</span><span class="p">)(</span><span class="k">struct</span> <span class="n">greeter</span> <span class="o">*</span><span class="p">);</span>
<span class="p">};</span>

<span class="kt">void</span>
<span class="nf">greet_hello</span><span class="p">(</span><span class="k">struct</span> <span class="n">greeter</span> <span class="o">*</span><span class="n">g</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello, %s.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">name</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span>
<span class="nf">greet_aloha</span><span class="p">(</span><span class="k">struct</span> <span class="n">greeter</span> <span class="o">*</span><span class="n">g</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Aloha, %s.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">g</span><span class="o">-&gt;</span><span class="n">name</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There’s now a greeter object and the function pointer makes its
behavior polymorphic. Think of it as a hand-coded virtual function for
C. Here’s the new (contrived) <code class="language-plaintext highlighter-rouge">main</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">greeter</span> <span class="n">greeter</span> <span class="o">=</span> <span class="p">{.</span><span class="n">greet</span> <span class="o">=</span> <span class="n">greet_hello</span><span class="p">};</span>
    <span class="n">gets</span><span class="p">(</span><span class="n">greeter</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
    <span class="n">greeter</span><span class="p">.</span><span class="n">greet</span><span class="p">(</span><span class="o">&amp;</span><span class="n">greeter</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>(In a real program, something else provides <code class="language-plaintext highlighter-rouge">greeter</code> and picks its
own function pointer for <code class="language-plaintext highlighter-rouge">greet</code>.)</p>

<p>Rather than overwriting the return pointer, the attacker has the
opportunity to overwrite the function pointer on the struct. Let’s
reconstruct the exploit like before.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -a demo | grep self_destruct
54: 00000000004006a5  10 FUNC  GLOBAL DEFAULT  13 self_destruct
</code></pre></div></div>

<p>We don’t know the password, but we <em>do</em> know (from peeking at the
disassembly) that the password check is 16 bytes. The attack should
instead jump 16 bytes into the function, skipping over the check
(0x4006a5 + 16 = 0x4006b5).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo -ne 'xxxxxxxx\xb5\x06\x40\x00\x00\x00\x00\x00' &gt; boom
$ ./demo &lt; boom
**** GO BOOM! ****
</code></pre></div></div>

<p>Neither the stack protector nor the password was of any help. The
stack protector only protects the <em>return</em> pointer, not the function
pointer on the struct.</p>

<p><strong>This is where the Control Flow Guard comes into play.</strong> With CFG
enabled, the compiler inserts a check before calling the <code class="language-plaintext highlighter-rouge">greet()</code>
function pointer. It must point to the beginning of a known function,
otherwise it will abort just like the stack protector. Since the
middle of <code class="language-plaintext highlighter-rouge">self_destruct()</code> isn’t the <em>beginning</em> of a function, it
would abort if this exploit is attempted.</p>

<p>However, I’m on Linux and there’s no CFG on Linux (yet?). So I’ll
implement it myself, with manual checks.</p>

<h3 id="function-address-bitmap">Function address bitmap</h3>

<p>As described in the PDF linked at the top of this article, CFG on
Windows is implemented using a bitmap. Each bit in the bitmap
represents 8 bytes of memory. If those 8 bytes contain the beginning
of a function, the bit will be set to one. Checking a pointer means
checking its associated bit in the bitmap.</p>

<p>For my CFG, I’ve decided to keep the same 8-byte resolution: the
bottom three bits of the target address will be dropped. The next 24
bits will be used to index into the bitmap. All other bits in the
pointer will be ignored. A 24-bit bit index means the bitmap will only
be 2MB.</p>

<p>Together with the three dropped bits, these 24 bits cover 128MB of
address space. That is plenty for a typical 32-bit program, but it
means that in a larger address space, as on 64-bit systems, there may
be false positives: some addresses will not represent the start of a
function, but will have their bit set to 1. This is acceptable,
especially because only functions known to
be targets of indirect calls will be registered in the table, reducing
the false positive rate.</p>

<p>Note: Relying on <a href="/blog/2016/05/30/">the bits of a pointer cast to an integer is
unspecified</a> and isn’t portable, but this implementation will
work fine anywhere I would care to use it.</p>

<p>Here are the CFG parameters. I’ve made them macros so that they can
easily be tuned at compile-time. The <code class="language-plaintext highlighter-rouge">cfg_bits</code> is the integer type
backing the bitmap array. The <code class="language-plaintext highlighter-rouge">CFG_RESOLUTION</code> is the number of bits
dropped, so “3” is a granularity of 8 bytes.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">cfg_bits</span><span class="p">;</span>
<span class="cp">#define CFG_RESOLUTION  3
#define CFG_BITS        24
</span></code></pre></div></div>

<p>Given a function pointer <code class="language-plaintext highlighter-rouge">f</code>, this macro extracts the bitmap index.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define CFG_INDEX(f) \
    (((uintptr_t)(f) &gt;&gt; CFG_RESOLUTION) &amp; ((1UL &lt;&lt; CFG_BITS) - 1))
</span></code></pre></div></div>
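
<p>To make the index arithmetic concrete, here is a small sketch built on the same parameters. The two helper functions are hypothetical and exist only for illustration. The first computes the bitmap size in bytes, and the second reports whether two addresses share a bitmap bit.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

#define CFG_RESOLUTION  3
#define CFG_BITS        24
#define CFG_INDEX(f) \
    (((uintptr_t)(f) &gt;&gt; CFG_RESOLUTION) &amp; ((1UL &lt;&lt; CFG_BITS) - 1))

/* Hypothetical helper: bitmap size in bytes, at one bit per index. */
unsigned long
cfg_bitmap_bytes(void)
{
    return (1UL &lt;&lt; CFG_BITS) / 8;
}

/* Hypothetical helper: nonzero when both addresses map to the same bit. */
int
cfg_aliases(uintptr_t a, uintptr_t b)
{
    return CFG_INDEX(a) == CFG_INDEX(b);
}
</code></pre></div></div>

<p>With these definitions, <code class="language-plaintext highlighter-rouge">cfg_bitmap_bytes()</code> is 2097152, which is the 2MB mentioned above. The addresses <code class="language-plaintext highlighter-rouge">0x4006a5</code> and <code class="language-plaintext highlighter-rouge">0x4006a5 + (1UL &lt;&lt; 27)</code> share a bit, which is the kind of false positive just described, while <code class="language-plaintext highlighter-rouge">0x4006a5</code> and <code class="language-plaintext highlighter-rouge">0x4006b5</code> map to the distinct indexes <code class="language-plaintext highlighter-rouge">0x800d4</code> and <code class="language-plaintext highlighter-rouge">0x800d6</code>.</p>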

<p>The CFG bitmap is just an array of integers. Zero it to initialize.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">cfg</span> <span class="p">{</span>
    <span class="n">cfg_bits</span> <span class="n">bitmap</span><span class="p">[(</span><span class="mi">1UL</span> <span class="o">&lt;&lt;</span> <span class="n">CFG_BITS</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">cfg_bits</span><span class="p">)</span> <span class="o">*</span> <span class="n">CHAR_BIT</span><span class="p">)];</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Functions are manually registered in the bitmap using
<code class="language-plaintext highlighter-rouge">cfg_register()</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">cfg_register</span><span class="p">(</span><span class="k">struct</span> <span class="n">cfg</span> <span class="o">*</span><span class="n">cfg</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">i</span> <span class="o">=</span> <span class="n">CFG_INDEX</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
    <span class="kt">size_t</span> <span class="n">z</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">cfg_bits</span><span class="p">)</span> <span class="o">*</span> <span class="n">CHAR_BIT</span><span class="p">;</span>
    <span class="n">cfg</span><span class="o">-&gt;</span><span class="n">bitmap</span><span class="p">[</span><span class="n">i</span> <span class="o">/</span> <span class="n">z</span><span class="p">]</span> <span class="o">|=</span> <span class="mi">1UL</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">i</span> <span class="o">%</span> <span class="n">z</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Because functions are registered at run-time, it’s fully compatible
with ASLR. If ASLR is enabled, the bitmap will be a little different
each run. On the same note, it may be worth XORing each bitmap element
with a random, run-time value — along the same lines as the stack
canary value — to make it harder for an attacker to manipulate the
bitmap should he get the ability to overwrite it by a vulnerability.
Alternatively the bitmap could be switched to read-only (e.g.
<code class="language-plaintext highlighter-rouge">mprotect()</code>) once everything is registered.</p>
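
<p>The read-only alternative could look something like the following sketch. It assumes a POSIX system, and both helper names are hypothetical. Since <code class="language-plaintext highlighter-rouge">mprotect()</code> operates on whole pages, the bitmap is allocated with <code class="language-plaintext highlighter-rouge">mmap()</code> so that it is page-aligned and shares its pages with nothing else.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include &lt;stddef.h&gt;
#include &lt;sys/mman.h&gt;

/* Hypothetical helper: allocate len bytes of zeroed, page-aligned,
 * writable memory for the bitmap. Returns NULL on failure. */
void *
cfg_alloc(size_t len)
{
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;
    void *p = mmap(NULL, len, prot, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: drop write permission once registration is
 * complete. Returns 0 on success, -1 on failure. */
int
cfg_seal(void *bitmap, size_t len)
{
    return mprotect(bitmap, len, PROT_READ);
}
</code></pre></div></div>

<p>The intended use is to obtain the <code class="language-plaintext highlighter-rouge">struct cfg</code> from <code class="language-plaintext highlighter-rouge">cfg_alloc(sizeof(struct cfg))</code>, register every function, and then call <code class="language-plaintext highlighter-rouge">cfg_seal()</code> before processing any untrusted input. After that, a stray write into the bitmap faults instead of silently registering a new target.</p>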

<p>And finally, the check function, used immediately before indirect
calls. It ensures <code class="language-plaintext highlighter-rouge">f</code> was previously passed to <code class="language-plaintext highlighter-rouge">cfg_register()</code>
(except for false positives, as discussed). Since it will be invoked
often, it needs to be fast and simple.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">cfg_check</span><span class="p">(</span><span class="k">struct</span> <span class="n">cfg</span> <span class="o">*</span><span class="n">cfg</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">i</span> <span class="o">=</span> <span class="n">CFG_INDEX</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
    <span class="kt">size_t</span> <span class="n">z</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">cfg_bits</span><span class="p">)</span> <span class="o">*</span> <span class="n">CHAR_BIT</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">((</span><span class="n">cfg</span><span class="o">-&gt;</span><span class="n">bitmap</span><span class="p">[</span><span class="n">i</span> <span class="o">/</span> <span class="n">z</span><span class="p">]</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="n">i</span> <span class="o">%</span> <span class="n">z</span><span class="p">))</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">))</span>
        <span class="n">abort</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And that’s it! Now augment <code class="language-plaintext highlighter-rouge">main</code> to make use of it:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">cfg</span> <span class="n">cfg</span><span class="p">;</span>

<span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">cfg_register</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cfg</span><span class="p">,</span> <span class="n">self_destruct</span><span class="p">);</span>  <span class="c1">// to prove this works</span>
    <span class="n">cfg_register</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cfg</span><span class="p">,</span> <span class="n">greet_hello</span><span class="p">);</span>
    <span class="n">cfg_register</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cfg</span><span class="p">,</span> <span class="n">greet_aloha</span><span class="p">);</span>

    <span class="k">struct</span> <span class="n">greeter</span> <span class="n">greeter</span> <span class="o">=</span> <span class="p">{.</span><span class="n">greet</span> <span class="o">=</span> <span class="n">greet_hello</span><span class="p">};</span>
    <span class="n">gets</span><span class="p">(</span><span class="n">greeter</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
    <span class="n">cfg_check</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cfg</span><span class="p">,</span> <span class="n">greeter</span><span class="p">.</span><span class="n">greet</span><span class="p">);</span>
    <span class="n">greeter</span><span class="p">.</span><span class="n">greet</span><span class="p">(</span><span class="o">&amp;</span><span class="n">greeter</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And now attempting the exploit:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./demo &lt; boom
Aborted
</code></pre></div></div>

<p>Normally <code class="language-plaintext highlighter-rouge">self_destruct()</code> wouldn’t be registered since it’s not a
legitimate target of an indirect call, but the exploit <em>still</em> didn’t
work because it called into the middle of <code class="language-plaintext highlighter-rouge">self_destruct()</code>, which
isn’t a valid address in the bitmap. The check aborts the program
before it can be exploited.</p>

<p>In a real application I would have a <a href="/blog/2016/12/23/">global <code class="language-plaintext highlighter-rouge">cfg</code> bitmap</a> for
the whole program, and define <code class="language-plaintext highlighter-rouge">cfg_check()</code> in a header as an <code class="language-plaintext highlighter-rouge">inline</code>
function.</p>

<p>Despite it being possible to implement this in straight C without the help of the
toolchain, it would be far less cumbersome and error-prone to let the
compiler and platform handle Control Flow Guard. That’s the right
place to implement it.</p>

<p><em>Update</em>: Ted Unangst pointed out <a href="http://www.tedunangst.com/inks/l/849">OpenBSD performing a similar
check</a> in its mbuf library. Instead of a bitmap, the function
pointer is replaced with an index into an array of registered function
pointers. That approach is cleaner, more efficient, completely
portable, and has no false positives.</p>
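
<p>Applied to the greeter example, the index approach might look like the sketch below. This is an illustration of the idea only, not code taken from OpenBSD, and the table, enum, and helper names are all hypothetical.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

struct greeter {
    char name[8];
    unsigned greet;  /* index into greet_table, not a raw pointer */
};

static void
greet_hello(struct greeter *g)
{
    printf("Hello, %s.\n", g-&gt;name);
}

static void
greet_aloha(struct greeter *g)
{
    printf("Aloha, %s.\n", g-&gt;name);
}

enum { GREET_HELLO, GREET_ALOHA, GREET_COUNT };

/* The only functions that can ever be called through a greeter. */
static void (*const greet_table[GREET_COUNT])(struct greeter *) = {
    [GREET_HELLO] = greet_hello,
    [GREET_ALOHA] = greet_aloha,
};

/* Nonzero when i names an entry in greet_table. */
int
greet_valid(unsigned i)
{
    return i &lt; GREET_COUNT;
}

/* Checked dispatch: abort rather than call through a corrupt index. */
void
greeter_greet(struct greeter *g)
{
    if (!greet_valid(g-&gt;greet))
        abort();
    greet_table[g-&gt;greet](g);
}
</code></pre></div></div>

<p>An overflow of <code class="language-plaintext highlighter-rouge">name</code> can still clobber the index, but the worst outcome is a call to the wrong greeter or an abort. Since the table is <code class="language-plaintext highlighter-rouge">const</code>, it can live in read-only memory, and no address inside <code class="language-plaintext highlighter-rouge">self_destruct()</code> is reachable through it.</p>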

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Stealing Session Cookies with Tcpdump</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/06/23/"/>
    <id>urn:uuid:309396d4-fe6e-30a1-1a96-35281b58fb77</id>
    <updated>2016-06-23T21:55:24Z</updated>
    <category term="netsec"/><category term="javascript"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>My wife was shopping online for running shoes when she got this
classic Firefox pop-up.</p>

<p><a href="/img/tcpdump/warning.png"><img src="/img/tcpdump/warning-thumb.png" alt="" /></a></p>

<p>These days this is usually just a server misconfiguration annoyance.
However, she was logged into an account, which included a virtual
shopping cart and associated credit card payment options, meaning
actual sensitive information would be at risk.</p>

<p>The main culprit was the website’s search feature, which wasn’t
transmitted over HTTPS. There’s an HTTPS version of the search (which
I found manually), but searches aren’t directed there. This means it’s
also vulnerable to <a href="https://www.youtube.com/watch?v=MFol6IMbZ7Y">SSL stripping</a>.</p>

<p>Fortunately Firefox warns about the issue and requires a positive
response before continuing. Neither Chrome nor Internet Explorer gets
this right. Both transmit session cookies in the clear without
warning, then subtly mention it after the fact. She may not have even
noticed the problem (and then asked me about it) if not for that
pop-up.</p>

<p>I contacted the website’s technical support two weeks ago and they
never responded, nor did they fix any of their issues, so for now you
can <a href="https://www.roadrunnersports.com">see this all for yourself</a>.</p>

<h3 id="finding-the-session-cookies">Finding the session cookies</h3>

<p>To prove to myself that this whole situation was really as bad as it
looked, I decided to steal her session cookie and use it to manipulate
her shopping cart. First I hit F12 in her browser to peek at the
network headers. Perhaps nothing important was actually sent in the
clear.</p>

<p><img src="/img/tcpdump/headers.png" alt="" /></p>

<p>The session cookie (red box) was definitely sent in the request. I
only need to catch it on the network. That’s an easy job for tcpdump.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tcpdump -A -l dst www.roadrunnersports.com and dst port 80 | \
    grep "^Cookie: "
</code></pre></div></div>

<p>This command tells tcpdump to dump selected packet content as ASCII
(<code class="language-plaintext highlighter-rouge">-A</code>). It also sets output to line-buffered so that I can see packets
as soon as they arrive (<code class="language-plaintext highlighter-rouge">-l</code>). The filter will only match packets
going out to this website and only on port 80 (HTTP), so I won’t see
any extraneous noise (<code class="language-plaintext highlighter-rouge">dst &lt;addr&gt; and dst port &lt;port&gt;</code>). Finally, I
crudely run that all through grep to see if any cookies fall out.</p>

<p>On the next insecure page load I get this (wrapped here for display)
spilling many times into my terminal:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cookie: JSESSIONID=99004F61A4ED162641DC36046AC81EAB.prd_rrs12; visitSo
  urce=Registered; RoadRunnerTestCookie=true; mobify-path=; __cy_d=09A
  78CC1-AF18-40BC-8752-B2372492EDE5; _cybskt=; _cycurrln=; wpCart=0; _
  up=1.2.387590744.1465699388; __distillery=a859d68_771ff435-d359-489a
  -bf1a-1e3dba9b8c10-db57323d1-79769fcf5b1b-fc6c; DYN_USER_ID=16328657
  52; DYN_USER_CONFIRM=575360a28413d508246fae6befe0e1f4
</code></pre></div></div>

<p>That’s a bingo! I massage this into a bit of JavaScript, go to the
store page in my own browser, and dump it in the developer console. I
don’t know which cookies are important, but that doesn’t matter. I
take them all.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>document.cookie = "JSESSIONID=99004F61A4ED162641DC36046A" +
                  "C81EAB.prd_rrs12";
document.cookie = "visitSource=Registered";
document.cookie = "RoadRunnerTestCookie=true";
document.cookie = "mobify-path=";
document.cookie = "__cy_d=09A78CC1-AF18-40BC-8752-B2372492EDE5";
document.cookie = "_cybskt=";
document.cookie = "_cycurrln=";
document.cookie = "wpCart=0";
document.cookie = "_up=1.2.387590744.1465699388";
document.cookie = "__distillery=a859d68_771ff435-d359-489a-bf1a-" +
                  "1e3dba9b8c10-db57323d1-79769fcf5b1b-fc6c";
document.cookie = "DYN_USER_ID=1632865752";
document.cookie = "DYN_USER_CONFIRM=575360a28413d508246fae6befe0e1f4";
</code></pre></div></div>

<p>Refresh the page and now I’m logged in. I can see what’s in the
shopping cart. I can add and remove items. I can checkout and complete
the order. My browser is as genuine as hers.</p>

<h3 id="how-to-fix-it">How to fix it</h3>

<p>The quick and dirty thing to do is set the <a href="http://tools.ietf.org/html/rfc6265#section-4.1.2.5">Secure</a> and
<a href="http://tools.ietf.org/html/rfc6265#section-4.1.2.6">HttpOnly</a> flags on all cookies. The first prevents cookies
from being sent in the clear, where a passive observer might see them.
The second prevents JavaScript from accessing them, since an
active attacker could inject their own JavaScript in the page.
The downside is that customers would appear to be logged out on plain
HTTP pages, which is confusing.</p>

<p>However, since this is an online store, there’s absolutely no excuse
to be serving <em>anything</em> over plain HTTP. This just opens customers up
to downgrade attacks. The long term solution, in addition to the
cookie flags above, is to redirect all HTTP requests to HTTPS and
never serve or request content over HTTP, especially not executable
content like JavaScript.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A Basic Just-In-Time Compiler</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/03/19/"/>
    <id>urn:uuid:95e0437f-61f0-3932-55b7-f828e171d9ca</id>
    <updated>2015-03-19T04:57:55Z</updated>
    <category term="c"/><category term="tutorial"/><category term="netsec"/><category term="x86"/><category term="posix"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=17747759">on Hacker News</a> and <a href="https://old.reddit.com/r/programming/comments/akxq8q/a_basic_justintime_compiler/">on reddit</a>.</em></p>

<p><a href="http://redd.it/2z68di">Monday’s /r/dailyprogrammer challenge</a> was to write a program to
read a recurrence relation definition and, through interpretation,
iterate it to some number of terms. It’s given an initial term
(<code class="language-plaintext highlighter-rouge">u(0)</code>) and a sequence of operations, <code class="language-plaintext highlighter-rouge">f</code>, to apply to the previous
term (<code class="language-plaintext highlighter-rouge">u(n + 1) = f(u(n))</code>) to compute the next term. Since it’s an
easy challenge, the operations are limited to addition, subtraction,
multiplication, and division, with one operand each.</p>

<!--more-->

<p>For example, the relation <code class="language-plaintext highlighter-rouge">u(n + 1) = (u(n) + 2) * 3 - 5</code> would be
input as <code class="language-plaintext highlighter-rouge">+2 *3 -5</code>. If <code class="language-plaintext highlighter-rouge">u(0) = 0</code> then,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">u(1) = 1</code></li>
  <li><code class="language-plaintext highlighter-rouge">u(2) = 4</code></li>
  <li><code class="language-plaintext highlighter-rouge">u(3) = 13</code></li>
  <li><code class="language-plaintext highlighter-rouge">u(4) = 40</code></li>
  <li><code class="language-plaintext highlighter-rouge">u(5) = 121</code></li>
  <li>…</li>
</ul>
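
<p>For reference, a plain interpreter for this format might look like the following sketch. The <code class="language-plaintext highlighter-rouge">struct op</code> representation and the <code class="language-plaintext highlighter-rouge">apply()</code> function are hypothetical and are not taken from the submission. It assumes the input has already been parsed into an array of operator and operand pairs.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;

struct op {
    char kind;     /* one of '+', '-', '*', '/' */
    long operand;
};

/* Apply each operation in order to u and return the next term. */
long
apply(const struct op *ops, size_t n, long u)
{
    for (size_t i = 0; i &lt; n; i++) {
        switch (ops[i].kind) {
        case '+': u += ops[i].operand; break;
        case '-': u -= ops[i].operand; break;
        case '*': u *= ops[i].operand; break;
        case '/': u /= ops[i].operand; break;
        }
    }
    return u;
}
</code></pre></div></div>

<p>With the operations <code class="language-plaintext highlighter-rouge">+2 *3 -5</code> and a starting term of 0, repeated calls to <code class="language-plaintext highlighter-rouge">apply()</code> yield the terms listed above: 1, 4, 13, 40, 121.</p>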

<p>Rather than write an interpreter to apply the sequence of operations,
for <a href="https://gist.github.com/skeeto/3a1aa3df31896c9956dc">my submission</a> (<a href="/download/jit.c">mirror</a>) I took the opportunity to
write a simple x86-64 Just-In-Time (JIT) compiler. So rather than
stepping through the operations one by one, my program converts the
operations into native machine code and lets the hardware do the work
directly. In this article I’ll go through how it works and how I did
it.</p>

<p><strong>Update</strong>: The <a href="http://redd.it/2zna5q">follow-up challenge</a> uses Reverse Polish
notation to allow for more complicated expressions. I wrote another
JIT compiler for <a href="https://gist.github.com/anonymous/f7e4a5086a2b0acc83aa">my submission</a> (<a href="/download/rpn-jit.c">mirror</a>).</p>

<h3 id="allocating-executable-memory">Allocating Executable Memory</h3>

<p>Modern operating systems have page-granularity protections for
different parts of <a href="http://marek.vavrusa.com/c/memory/2015/02/20/memory/">process memory</a>: read, write, and execute.
Code can only be executed from memory with the execute bit set on its
page, memory can only be changed when its write bit is set, and some
pages aren’t allowed to be read. In a running process, the pages
holding program code and loaded libraries will have their write bit
cleared and execute bit set. Most of the other pages will have their
execute bit cleared and their write bit set.</p>

<p>The reason for this is twofold. First, it significantly increases the
security of the system. If untrusted input was read into executable
memory, an attacker could input machine code (<em>shellcode</em>) into the
buffer, then exploit a flaw in the program to cause control flow to
jump to and execute that code. If the attacker is only able to write
code to non-executable memory, this attack becomes a lot harder. The
attacker has to rely on code already loaded into executable pages
(<a href="http://en.wikipedia.org/wiki/Return-oriented_programming"><em>return-oriented programming</em></a>).</p>

<p>Second, it catches program bugs sooner and reduces their impact, so
there’s less chance for a flawed program to accidentally corrupt user
data. Accessing memory in an invalid way will cause a segmentation
fault, usually leading to program termination. For example, <code class="language-plaintext highlighter-rouge">NULL</code>
points to a special page with read, write, and execute disabled.</p>

<h4 id="an-instruction-buffer">An Instruction Buffer</h4>

<p>Memory returned by <code class="language-plaintext highlighter-rouge">malloc()</code> and friends will be writable and
readable, but non-executable. If the JIT compiler allocates memory
through <code class="language-plaintext highlighter-rouge">malloc()</code>, fills it with machine instructions, and jumps to
it without doing any additional work, there will be a segmentation
fault. So some different memory allocation calls will be made instead,
with the details hidden behind an <code class="language-plaintext highlighter-rouge">asmbuf</code> struct.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define PAGE_SIZE 4096
</span>
<span class="k">struct</span> <span class="n">asmbuf</span> <span class="p">{</span>
    <span class="kt">uint8_t</span> <span class="n">code</span><span class="p">[</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">)];</span>
    <span class="kt">uint64_t</span> <span class="n">count</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>To keep things simple here, I’m just assuming the page size is 4kB. In
a real program, we’d use <code class="language-plaintext highlighter-rouge">sysconf(_SC_PAGESIZE)</code> to discover the page
size at run time. On x86-64, pages may be 4kB, 2MB, or 1GB, but this
program will work correctly as-is regardless.</p>
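
<p>Querying the page size at run time is a one-liner on POSIX systems. In this sketch the wrapper name <code class="language-plaintext highlighter-rouge">page_size()</code> is hypothetical, while <code class="language-plaintext highlighter-rouge">sysconf()</code> and <code class="language-plaintext highlighter-rouge">_SC_PAGESIZE</code> come from <code class="language-plaintext highlighter-rouge">unistd.h</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;unistd.h&gt;

/* Hypothetical wrapper: the page size in bytes, or -1 on failure. */
long
page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
</code></pre></div></div>

<p>Using this in place of the <code class="language-plaintext highlighter-rouge">PAGE_SIZE</code> macro would mean the <code class="language-plaintext highlighter-rouge">code</code> array could no longer have a compile-time size, which is one reason to keep the fixed constant in a small demonstration.</p>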

<p>Instead of <code class="language-plaintext highlighter-rouge">malloc()</code>, the compiler allocates memory as an anonymous
memory map (<code class="language-plaintext highlighter-rouge">mmap()</code>). It’s anonymous because it’s not backed by a
file.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span>
<span class="nf">asmbuf_create</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">prot</span> <span class="o">=</span> <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_PRIVATE</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">prot</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Windows doesn’t have POSIX <code class="language-plaintext highlighter-rouge">mmap()</code>, so on that platform we use
<code class="language-plaintext highlighter-rouge">VirtualAlloc()</code> instead. Here’s the equivalent in Win32.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span>
<span class="nf">asmbuf_create</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">DWORD</span> <span class="n">type</span> <span class="o">=</span> <span class="n">MEM_RESERVE</span> <span class="o">|</span> <span class="n">MEM_COMMIT</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">VirtualAlloc</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">type</span><span class="p">,</span> <span class="n">PAGE_READWRITE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
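<p>Neither version above checks for failure. On the POSIX side this
matters because <code class="language-plaintext highlighter-rouge">mmap()</code> reports errors with <code class="language-plaintext highlighter-rouge">MAP_FAILED</code> rather than
<code class="language-plaintext highlighter-rouge">NULL</code>. The following is only a sketch of how a checked variant might
look. The name <code class="language-plaintext highlighter-rouge">asmbuf_create_checked()</code> is made up for illustration.</p>

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std=c99 on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096

struct asmbuf {
    uint8_t code[PAGE_SIZE - sizeof(uint64_t)];
    uint64_t count;
};

/* Same mapping as asmbuf_create(), but failure is reported as NULL.
 * mmap() signals errors with MAP_FAILED, which is not NULL. */
struct asmbuf *
asmbuf_create_checked(void)
{
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;
    void *p = mmap(NULL, PAGE_SIZE, prot, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

<p>Anonymous mappings start out zero-filled, so <code class="language-plaintext highlighter-rouge">count</code> begins at zero
without any explicit initialization.</p>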

<p>Anyone reading closely should notice that I haven’t actually requested
that the memory be executable, which is, like, the whole point of all
this! This was intentional. Some operating systems employ a security
feature called W^X: “write xor execute.” That is, memory is either
writable or executable, but never both at the same time. This makes
the shellcode attack I described before even harder. For <a href="http://www.tedunangst.com/flak/post/now-or-never-exec">well-behaved
JIT compilers</a> it means memory protections need to be adjusted
after code generation and before execution.</p>

<p>The POSIX <code class="language-plaintext highlighter-rouge">mprotect()</code> function is used to change memory protections.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">asmbuf_finalize</span><span class="p">(</span><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">mprotect</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">buf</span><span class="p">),</span> <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_EXEC</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Or on Win32 (that last parameter is not allowed to be <code class="language-plaintext highlighter-rouge">NULL</code>),</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">asmbuf_finalize</span><span class="p">(</span><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">DWORD</span> <span class="n">old</span><span class="p">;</span>
    <span class="n">VirtualProtect</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">buf</span><span class="p">),</span> <span class="n">PAGE_EXECUTE_READ</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">old</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Finally, instead of <code class="language-plaintext highlighter-rouge">free()</code> it gets unmapped.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">asmbuf_free</span><span class="p">(</span><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">munmap</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And on Win32,</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">asmbuf_free</span><span class="p">(</span><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">VirtualFree</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MEM_RELEASE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I won’t list the definitions here, but there are two “methods” for
inserting instructions and immediate values into the buffer. This will
be raw machine code, so the caller will be acting a bit like an
assembler.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="n">asmbuf_ins</span><span class="p">(</span><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span><span class="p">,</span> <span class="kt">int</span> <span class="n">size</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">ins</span><span class="p">);</span>
<span class="kt">void</span> <span class="n">asmbuf_immediate</span><span class="p">(</span><span class="k">struct</span> <span class="n">asmbuf</span> <span class="o">*</span><span class="p">,</span> <span class="kt">int</span> <span class="n">size</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">value</span><span class="p">);</span>
</code></pre></div></div>
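<p>For readers who want something concrete, here is one plausible way to
define the two functions. This is a sketch under my own assumptions, not
necessarily the definitions used by the real program: instructions are
written most significant byte first (so a 3-byte value <code class="language-plaintext highlighter-rouge">0xaabbcc</code> is
stored as <code class="language-plaintext highlighter-rouge">aa bb cc</code>), immediates are copied in host byte order, and
there is no bounds checking.</p>

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct asmbuf {
    uint8_t code[PAGE_SIZE - sizeof(uint64_t)];
    uint64_t count;
};

/* Append the low `size` bytes of `ins`, most significant byte first,
 * so the hex literal reads in the same order as the bytes in memory. */
void
asmbuf_ins(struct asmbuf *buf, int size, uint64_t ins)
{
    for (int i = size - 1; i >= 0; i--)
        buf->code[buf->count++] = (ins >> (i * 8)) & 0xff;
}

/* Append `size` raw bytes from `value` in the host's byte order. On
 * x86-64 that is little endian, which is what the encoding expects. */
void
asmbuf_immediate(struct asmbuf *buf, int size, const void *value)
{
    memcpy(buf->code + buf->count, value, size);
    buf->count += size;
}
```

<p>Splitting the two is a small convenience: opcodes can be written as
readable hex constants, while operands are copied straight out of a C
variable.</p>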

<h3 id="calling-conventions">Calling Conventions</h3>

<p>We’re only going to be concerned with three of x86-64’s many
registers: <code class="language-plaintext highlighter-rouge">rdi</code>, <code class="language-plaintext highlighter-rouge">rax</code>, and <code class="language-plaintext highlighter-rouge">rdx</code>. These are 64-bit (<code class="language-plaintext highlighter-rouge">r</code>) extensions
of <a href="/blog/2014/12/09/">the original 16-bit 8086 registers</a>. The sequence of
operations will be compiled into a function that we’ll be able to call
from C like a normal function. Here’s what its prototype will look
like. It takes a signed 64-bit integer and returns a signed 64-bit
integer.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">long</span> <span class="nf">recurrence</span><span class="p">(</span><span class="kt">long</span><span class="p">);</span>
</code></pre></div></div>

<p><a href="http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions">The System V AMD64 ABI calling convention</a> says that the first
integer/pointer function argument is passed in the <code class="language-plaintext highlighter-rouge">rdi</code> register.
When our JIT-compiled program gets control, that’s where its input
will be waiting. According to the ABI, the C program will be expecting
the result to be in <code class="language-plaintext highlighter-rouge">rax</code> when control is returned. If our recurrence
relation is merely the identity function (it has no operations), the
only thing it will do is copy <code class="language-plaintext highlighter-rouge">rdi</code> to <code class="language-plaintext highlighter-rouge">rax</code>.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">mov</span>   <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>
</code></pre></div></div>

<p>There’s a catch, though. You might think all the mucky
platform-dependent stuff was encapsulated in <code class="language-plaintext highlighter-rouge">asmbuf</code>. Not quite. As
usual, Windows is the oddball and has its own unique calling
convention. For our purposes here, the only difference is that the
first argument comes in <code class="language-plaintext highlighter-rouge">rcx</code> rather than <code class="language-plaintext highlighter-rouge">rdi</code>. Fortunately this only
affects the very first instruction and the rest of the assembly
remains the same.</p>
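<p>One way to handle that difference is to pick the first instruction at
compile time. The sketch below assumes the usual <code class="language-plaintext highlighter-rouge">_WIN32</code> predefined
macro. The function name <code class="language-plaintext highlighter-rouge">entry_ins()</code> is hypothetical, and the return
value is the instruction’s bytes packed into an integer, most
significant byte first.</p>

```c
#include <stdint.h>

/* First instruction of the generated function: copy the first integer
 * argument into rax. Only the source register differs by platform. */
uint64_t
entry_ins(void)
{
#ifdef _WIN32
    return 0x4889c8;  /* mov rax, rcx (Microsoft x64 convention) */
#else
    return 0x4889f8;  /* mov rax, rdi (System V AMD64 ABI) */
#endif
}
```

<p>Both encodings are 3 bytes long, so nothing else in the generated
code has to move.</p>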

<p>The very last thing it will do, assuming the result is in <code class="language-plaintext highlighter-rouge">rax</code>, is
return to the caller.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ret</span>
</code></pre></div></div>

<p>So we know the assembly, but what do we pass to <code class="language-plaintext highlighter-rouge">asmbuf_ins()</code>? This
is where we get our hands dirty.</p>

<h4 id="finding-the-code">Finding the Code</h4>

<p>If you want to do this the Right Way, you go download the x86-64
documentation, look up the instructions we’re using, and manually work
out the bytes we need and how the operands fit into it. You know, like
they used to do <a href="/blog/2016/11/17/">out of necessity</a> back in the ’60s.</p>

<p>Fortunately there’s a much easier way. We’ll have an actual assembler
do it and just copy what it does. Put both of the instructions above
in a file <code class="language-plaintext highlighter-rouge">peek.s</code> and hand it to <code class="language-plaintext highlighter-rouge">nasm</code>. It will produce a raw binary
with the machine code, which we’ll disassemble with <code class="language-plaintext highlighter-rouge">ndisasm</code> (the
NASM disassembler).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nasm peek.s
$ ndisasm -b64 peek
00000000  4889F8            mov rax,rdi
00000003  C3                ret
</code></pre></div></div>

<p>That’s straightforward. The first instruction is 3 bytes and the
return is 1 byte.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mh">0x4889f8</span><span class="p">);</span>  <span class="c1">// mov   rax, rdi</span>
<span class="c1">// ... generate code ...</span>
<span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mh">0xc3</span><span class="p">);</span>      <span class="c1">// ret</span>
</code></pre></div></div>

<p>For each operation, we’ll set it up so the operand will already be
loaded into <code class="language-plaintext highlighter-rouge">rdi</code> regardless of the operator, similar to how the
argument was passed in the first place. A smarter compiler would embed
the immediate in the operator’s instruction if it’s small (32 bits or
fewer), but I’m keeping it simple. To sneakily capture the “template”
for this instruction I’m going to use <code class="language-plaintext highlighter-rouge">0x0123456789abcdef</code> as the
operand.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">mov</span>   <span class="nb">rdi</span><span class="p">,</span> <span class="mh">0x0123456789abcdef</span>
</code></pre></div></div>

<p>Which disassembled with <code class="language-plaintext highlighter-rouge">ndisasm</code> is,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00000000  48BFEFCDAB896745  mov rdi,0x123456789abcdef
         -2301
</code></pre></div></div>

<p>Notice the operand listed little endian immediately after the
instruction. That’s also easy!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">long</span> <span class="n">operand</span><span class="p">;</span>
<span class="n">scanf</span><span class="p">(</span><span class="s">"%ld"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">operand</span><span class="p">);</span>
<span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mh">0x48bf</span><span class="p">);</span>         <span class="c1">// mov   rdi, operand</span>
<span class="n">asmbuf_immediate</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">operand</span><span class="p">);</span>
</code></pre></div></div>

<p>Apply the same discovery process individually for each operator you
want to support, accumulating the result in <code class="language-plaintext highlighter-rouge">rax</code> for each.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">switch</span> <span class="p">(</span><span class="n">operator</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="sc">'+'</span><span class="p">:</span>
        <span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mh">0x4801f8</span><span class="p">);</span>   <span class="c1">// add   rax, rdi</span>
        <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="sc">'-'</span><span class="p">:</span>
        <span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mh">0x4829f8</span><span class="p">);</span>   <span class="c1">// sub   rax, rdi</span>
        <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="sc">'*'</span><span class="p">:</span>
        <span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mh">0x480fafc7</span><span class="p">);</span> <span class="c1">// imul  rax, rdi</span>
        <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="sc">'/'</span><span class="p">:</span>
        <span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mh">0x4899</span><span class="p">);</span>     <span class="c1">// cqo   (sign-extend rax into rdx:rax)</span>
        <span class="n">asmbuf_ins</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mh">0x48f7ff</span><span class="p">);</span>   <span class="c1">// idiv  rdi</span>
        <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As an exercise, try adding support for the modulus operator (<code class="language-plaintext highlighter-rouge">%</code>), XOR
(<code class="language-plaintext highlighter-rouge">^</code>), and bit shifts (<code class="language-plaintext highlighter-rouge">&lt;</code>, <code class="language-plaintext highlighter-rouge">&gt;</code>). With the addition of these
operators, you could define a decent PRNG as a recurrence relation. It
will also eliminate the <a href="https://old.reddit.com/r/dailyprogrammer/comments/2z68di/_/cpgkcx7">closed form solution</a> to this problem so
that we actually have a reason to do all this! Or, alternatively,
switch it all to floating point.</p>

<h3 id="calling-the-generated-code">Calling the Generated Code</h3>

<p>Once we’re all done generating code, finalize the buffer to make it
executable, cast it to a function pointer, and call it. (I cast it as
a <code class="language-plaintext highlighter-rouge">void *</code> just to avoid repeating myself, since that will implicitly
convert to the correct function pointer type.)</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asmbuf_finalize</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
<span class="kt">long</span> <span class="p">(</span><span class="o">*</span><span class="n">recurrence</span><span class="p">)(</span><span class="kt">long</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">code</span><span class="p">;</span>
<span class="c1">// ...</span>
<span class="n">x</span><span class="p">[</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">recurrence</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">n</span><span class="p">]);</span>
</code></pre></div></div>
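<p>To see every step in one place, here’s a condensed, self-contained
sketch that compiles “x + k” for a fixed <code class="language-plaintext highlighter-rouge">k</code> and calls it once. It
assumes x86-64 with the System V calling convention (so not Windows),
skips the <code class="language-plaintext highlighter-rouge">asmbuf</code> wrapper in favor of plain <code class="language-plaintext highlighter-rouge">memcpy()</code>, and the name
<code class="language-plaintext highlighter-rouge">jit_add()</code> is made up for illustration.</p>

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std=c99 on glibc */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Compile "x + k" for a fixed k into a fresh page, call it once with x,
 * and store the result in *out. Returns 0 on success, -1 on failure.
 * Assumes x86-64 with the System V calling convention. */
int
jit_add(long k, long x, long *out)
{
    static const uint8_t entry[]   = {0x48, 0x89, 0xf8};  /* mov rax, rdi */
    static const uint8_t load[]    = {0x48, 0xbf};        /* mov rdi, imm64 */
    static const uint8_t add_ret[] = {0x48, 0x01, 0xf8,   /* add rax, rdi */
                                      0xc3};              /* ret */
    size_t len = 4096;
    uint8_t *code = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    /* Emit the instructions back to back; k follows the mov opcode as
     * an 8-byte little endian immediate. */
    uint8_t *p = code;
    memcpy(p, entry, sizeof(entry)); p += sizeof(entry);
    memcpy(p, load, sizeof(load));   p += sizeof(load);
    memcpy(p, &k, sizeof(k));        p += sizeof(k);
    memcpy(p, add_ret, sizeof(add_ret));

    /* W^X: give up write permission when gaining execute permission. */
    if (mprotect(code, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(code, len);
        return -1;
    }

    long (*fn)(long) = (long (*)(long))(uintptr_t)code;
    *out = fn(x);
    munmap(code, len);
    return 0;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">jit_add(3, 4, &amp;r)</code> should leave 7 in <code class="language-plaintext highlighter-rouge">r</code>. A real compiler
would keep the page around and call the function many times rather than
unmapping it after a single call.</p>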

<p>That’s pretty cool if you ask me! Now this was an extremely simplified
situation. There’s no branching, no intermediate values, no function
calls, and I didn’t even touch the stack (push, pop). The recurrence
relation definition in this challenge is practically an assembly
language itself, so after the initial setup it’s a 1:1 translation.</p>

<p>I’d like to build a JIT compiler more advanced than this in the
future. I just need to find a suitable problem that’s more complicated
than this one, warrants having a JIT compiler, but is still simple
enough that I could, on some level, justify not using LLVM.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>SSH and GPG Agents</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/08/"/>
    <id>urn:uuid:7a12226e-073a-3902-4fe8-842afdfdb951</id>
    <updated>2012-06-08T00:00:00Z</updated>
    <category term="crypto"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p>If you’re using SSH or GPG with any sort of frequency, you should
definitely be using their accompanying <code class="language-plaintext highlighter-rouge">*-agent</code> programs. The agents
allow you to gain a whole lot of convenience without compromising your
security. Many people seem to be unaware these tools exist, so here’s
an overview along with some tips on how to use them effectively.</p>

<p>Let’s start from the top.</p>

<p>Both SSH and GPG involve the use of asymmetric encryption, and the
private key is protected by a user-entered passphrase. The private key
is generally never written to the filesystem in plaintext. In the
case of GPG, these keys are the primary focus of the application. For
SSH, they’re a useful tool to make accessing remote machines less
tedious. (The SSH server is authenticated by a public key, too, but
this is unrelated to agents.)</p>

<p>For those who are unaware, rather than enter a password when logging
into a remote machine, you can identify yourself by a public
key. Generating a key is simple.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen
</code></pre></div></div>

<p>You’ll almost certainly want to accept the default location for the
key (<code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code>) because this is where SSH will look for it. Make
sure you enter a passphrase, which will encrypt the private key. The
reason this is important is because, without it, anyone who gains
access to your <code class="language-plaintext highlighter-rouge">id_rsa</code> file will be able to access any remote systems
that have been told to trust your public key. By having a passphrase,
this person needs not only the <code class="language-plaintext highlighter-rouge">id_rsa</code> file, but also the passphrase
(two-factor authentication), so you probably want to pick a long,
strong one. This may sound inconvenient, but <code class="language-plaintext highlighter-rouge">ssh-agent</code> will help
you.</p>

<p>The key generation process will create two files: <code class="language-plaintext highlighter-rouge">id_rsa</code> (private
key) and <code class="language-plaintext highlighter-rouge">id_rsa.pub</code> (public key). The latter is what you give to
remote systems.</p>

<p>Telling a remote system about your key is simple,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-copy-id &lt;host&gt;
</code></pre></div></div>

<p>This will copy your <code class="language-plaintext highlighter-rouge">id_rsa.pub</code> to the remote system, prompting you
for the <em>password</em> on the <em>remote</em> system (not the passphrase you just
entered), adding it to the file <code class="language-plaintext highlighter-rouge">~/.ssh/authorized_keys</code>. From this
point on, all logins will use your new keypair rather than prompt you
for a password. Since you put a passphrase on your key, this may seem
pointless — it seems you still need to type in a password for every
connection. Bear with me here!</p>

<p>As a side note, you should have a unique SSH keypair for each
<i>site</i>, so you’ll have several of them. This way you can revoke
access to a particular site without affecting the others.</p>

<p>For GPG — the GNU Privacy Guard, <i>the</i> free software PGP
implementation — your keys are stored under <code class="language-plaintext highlighter-rouge">~/.gnupg/</code> in a
database. Generating a key is also a simple command,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gpg --gen-key
</code></pre></div></div>

<p>This is a slightly more complicated process, which I won’t get into
here. In contrast to SSH, you’ll generally have only one keypair per
<i>identity</i> (i.e. you only have one).</p>

<p>So you’ve got these keys that are encrypted by passphrases. If they’re
going to be any use then they’ll be long, annoying things that are a
pain to type in. If that was the end of the story this would be really
inconvenient, enough to make the use of passphrases too costly for
many people to bother. Fortunately, we have agents to help.</p>

<p>An agent is a daemon process that can hold onto your passphrase
(<code class="language-plaintext highlighter-rouge">gpg-agent</code>) or your private key (<code class="language-plaintext highlighter-rouge">ssh-agent</code>) so that you only need
to enter your passphrase once within some period of time (possibly
for the entire life of the agent process), rather than type it many
times over and over again as it’s needed. The agents are very careful
about how they hold on to this sensitive information, such as avoiding
having it written to swap. You can also configure how long you want
them to hold onto your passphrase/key before purging it from memory.</p>

<p>The <code class="language-plaintext highlighter-rouge">ssh</code> and <code class="language-plaintext highlighter-rouge">gpg</code> programs need to know where to find the
agents. This is done through environment variables. For <code class="language-plaintext highlighter-rouge">ssh-agent</code>,
the process ID is stored in <code class="language-plaintext highlighter-rouge">SSH_AGENT_PID</code> and the location of the
Unix socket for communication is in <code class="language-plaintext highlighter-rouge">SSH_AUTH_SOCK</code>. <code class="language-plaintext highlighter-rouge">gpg-agent</code>
stuffs everything into one variable, <code class="language-plaintext highlighter-rouge">GPG_AGENT_INFO</code> (which is a pain
if you want to use this information in a script). When the main
program is invoked and it needs to use the private key, it will use
these variables and get in touch with the agent to see if it can
supply the needed information without bothering the user.</p>

<p>Remember, a process can’t change the environment of its parent
process so you need to set this information in the agent’s parent
shell somehow. There are two methods to set these up: eval and exec.</p>

<p>When you start the agent, it forks off its daemon process and prints
the variable information to stdout. This can be <code class="language-plaintext highlighter-rouge">eval</code>ed directly into
the current environment. You could drop these lines directly in your
<code class="language-plaintext highlighter-rouge">.bashrc</code> so that the agents are always there. (Though they won’t exit
with your shell, lingering around uselessly! More on this ahead.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eval $(ssh-agent)
eval $(gpg-agent --daemon)
</code></pre></div></div>

<p>For the exec method, you <em>replace</em> your current shell with a new one
with a modified environment. To do this, you ask the agent to exec
into a shell, with the variables set, rather than return control.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>exec ssh-agent bash
exec gpg-agent --daemon bash
</code></pre></div></div>

<p>As a cool trick, you can chain these together. <code class="language-plaintext highlighter-rouge">ssh-agent</code> becomes
<code class="language-plaintext highlighter-rouge">gpg-agent</code> which then becomes <code class="language-plaintext highlighter-rouge">bash</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>exec ssh-agent gpg-agent --daemon bash
</code></pre></div></div>

<p>Note that <code class="language-plaintext highlighter-rouge">gpg-agent</code> is capable of being an <code class="language-plaintext highlighter-rouge">ssh-agent</code> as well by
using the <code class="language-plaintext highlighter-rouge">--enable-ssh-support</code> option, so you don’t need to launch
an <code class="language-plaintext highlighter-rouge">ssh-agent</code>. Unfortunately, I don’t like to use this because
<code class="language-plaintext highlighter-rouge">gpg-agent</code> gets a little too personal with the SSH key, storing its
own copy with its own passphrase again.</p>

<p>On the other hand, <code class="language-plaintext highlighter-rouge">gpg-agent</code> is <em>much</em> more advanced than OpenSSH’s
<code class="language-plaintext highlighter-rouge">ssh-agent</code>. When you want to have <code class="language-plaintext highlighter-rouge">ssh-agent</code> manage a key, you need
to first tell it about the key with <code class="language-plaintext highlighter-rouge">ssh-add</code>. With no arguments, it
will use <code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code>. If you forget to do this, <code class="language-plaintext highlighter-rouge">ssh</code> will ask for
your passphrase directly, in your terminal, not allowing <code class="language-plaintext highlighter-rouge">ssh-agent</code>
to hold onto it. By comparison, <code class="language-plaintext highlighter-rouge">gpg</code> will always ask <code class="language-plaintext highlighter-rouge">gpg-agent</code> to
retrieve your passphrase when it’s needed (if the agent is available),
so it will cache your passphrase on demand. No need to explicitly
register with the agent. Even better, it will try its best to use a
“PIN entry” program to read your passphrase, which helps protect against some
kinds of keyloggers — preventing other processes from seeing your
keystrokes.</p>

<p>Well, this is all fine and dandy except when you’ve already got an
agent running. Say you’re launching a new terminal emulator window
from an existing one, creating a new shell. Unfortunately, even though
you have agents running <em>and</em> they’re listed in your environment (from
the origin shell), <em>they’ll still spawn new agents</em>! This is really
lousy behavior, in my opinion. There’s no <code class="language-plaintext highlighter-rouge">--inherit</code> option to tell
them to silently pass along the information of the existing agent if
it appears to be valid. This causes two problems. First, you’ll need to
enter your passphrases <em>again</em> for the new agent. Second, these new
agents will linger around after the spawning shell has exited —
hogging important non-swappable memory.</p>

<p>The direct workaround is to, in your shell init script, check for
these variables yourself and check that they’re valid (the agent
process is still running) before trying to spawn any agents. This is
tedious, error-prone, and makes each user do a lot of work that could
have been done in one place by one person instead.</p>

<p>There’s still the problem of when you launch a new shell that doesn’t
inherit the variables (e.g. a remote login), so there’s no way for it
to be aware of the existing agents. To fix this, you’d need to write
the agent information to a file. The shell init script checks this
file for an existing agent before spawning one. This is even more
complicated, more error-prone, and subject to race-conditions. Why
make every user go through this process?!</p>

<p>Fortunately someone’s done all this work so you don’t have to! There’s
an awesome little tool called
<a href="http://www.funtoo.org/wiki/Keychain">Keychain</a> which can be used to
launch the agents for you. It stores the agent information in a file
so that you only ever launch one instance of the agent, and the agents
will be shared across every shell. It <em>does</em> have an <code class="language-plaintext highlighter-rouge">--inherit</code>
option — the default behavior, so you don’t even need to ask
nicely. Instead of running the <code class="language-plaintext highlighter-rouge">*-agent</code>s directly, you just put this
in your <code class="language-plaintext highlighter-rouge">.bashrc</code>,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eval $(keychain --eval --quiet)
</code></pre></div></div>

<p>So simple and it <em>just works</em>! I was so happy when I found this. This
is the magic word that makes using agents a breeze, so I can’t
recommend it enough.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>SSH Honeypots</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/05/19/"/>
    <id>urn:uuid:f7897af2-ca8e-3a80-f951-de22639bf7c7</id>
    <updated>2012-05-19T00:00:00Z</updated>
    <category term="netsec"/>
    <content type="html">
      <![CDATA[<p>Three years ago I was experimenting with high-interaction SSH
honeypots. I failed to document the effort as a blog post
afterwards. Fortunately, I’ve been experimenting with honeypots again,
so I’m taking the time to document it this time.</p>

<p>A <em>honeypot</em> is a fake service or computer on a network used in
detecting and deflecting attacks on the network. Ideally, an attacker
is unable to tell honeypots apart from real systems, attacking the
honeypots instead. In general, honeypots fall into two categories:
<em>high-interaction</em> and <em>low-interaction</em>. The former will imitate a
real system with high fidelity while the latter may just listen for
connections on common ports, without actually accepting or sending
data.</p>
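
<p>To make the low-interaction end of that spectrum concrete, here is a minimal sketch in Python. It is an illustration only, not taken from any particular honeypot, and the function names <code class="language-plaintext highlighter-rouge">open_listener</code> and <code class="language-plaintext highlighter-rouge">record_probe</code> are invented for the example. It listens on a port, notes the address of whoever connects, and hangs up without exchanging any data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import socket


def open_listener(host="127.0.0.1", port=0):
    # Port 0 asks the OS for any free port. A real deployment would pick
    # a commonly probed port instead (an assumption of this sketch).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen()
    return server


def record_probe(server, log):
    # Take one connection, note who knocked, and hang up without
    # reading or sending a single byte.
    conn, (addr, _port) = server.accept()
    conn.close()
    log.append(addr)
    return addr
</code></pre></div></div>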

<p>What triggered my curiosity was that I wanted to put OpenBSD’s
<a href="http://www.openbsd.org/cgi-bin/man.cgi?query=securelevel"><code class="language-plaintext highlighter-rouge">securelevel(7)</code></a>
feature to the test. In short, it’s a runtime system value that ranges
from -1 (least secure) to 2 (most secure), and it’s not possible to
decrease the level without gaining physical access to the system. Each
increase makes the system more read-only, and less flexible, so it’s a
trade-off. A system running at level 2 should not carry over any state
between boots — like a LiveCD on a system with no disks.</p>

<p>I set up a fresh OpenBSD install in a <a href="http://qemu.org/">QEMU</a> virtual
machine, locked the system down with <code class="language-plaintext highlighter-rouge">securelevel</code> at 1, forwarded the
SSH port all the way out to the Internet, and then gave
<a href="http://devrand.org/">Gavin</a> the root password. I told him to go nuts,
with the ultimate goal that when he was done I should be unable to
tell he had even logged into the system. All of the system logs were
set to append-only, enforced through the kernel by <code class="language-plaintext highlighter-rouge">securelevel</code>, so
this should have been a very difficult task indeed.</p>

<p>It turned out he was much more successful than I expected. When he
told me he was done, I SSHed into the system to check the logs, finding
that there were no entries indicating he logged in at all. The only
proof I could find that he was actually in was a message he
intentionally left behind for me. Did he just subvert <code class="language-plaintext highlighter-rouge">securelevel</code>?!</p>

<p>Turns out not quite. Whew! I was just putting too much trust into a
system I knew was compromised. He mounted a loopback filesystem
over the top of <code class="language-plaintext highlighter-rouge">/var/log</code>, then filled it with fake logs. He also
sabotaged the mount programs so that they’d hide the loopback mount
from me. Since the mount programs were on a read-only system, he had
to do a loopback mount there, too. After restarting OpenSSH, it was no
longer writing to the append-only log, but to the doctored log.</p>

<p>So, the proper way to check your security logs is by mounting the
compromised filesystem in a known trusted system — or, in this case,
just rebooting would have fixed it. Even with <code class="language-plaintext highlighter-rouge">securelevel</code>, you can’t
check the compromised system in-place. Let this be a lesson to all
those amateur sysadmins out there (including me)!</p>

<p>We did a second round and he managed to trick me again by taking me
further into the rabbit hole. Instead of loopback mounts, since I was
expecting that, he had root log into a chroot environment, filled with
a full copy of the system including fake logs. This version survived
reboots and really required inspection from an external system.</p>

<p>After all this, I wanted to crank things up a notch by letting some
real attackers into my test system. I was already accustomed to seeing
many password-guesses on my SSH server in the logs, so getting someone
into my honeypot wouldn’t take long at all. While I didn’t care if
they trashed my VM — restoring from snapshot was an automatic process
— I really didn’t want them to take advantage of my Internet
connection, using it for DDoS attacks or pivoting to attack other SSH
servers. So I needed a way to allow them <em>in</em> through SSH, but not allow
any other traffic <em>out</em>.</p>

<p>If I was doing this today, I’d probably use <code class="language-plaintext highlighter-rouge">iptables</code> to only allow
SSH in, and then bridge the VM to the Internet with a TUN/TAP,
replacing my real SSH server on port 22. However, three years ago I
didn’t know how to do this. Instead I found a really simple hack to
get this done: <a href="http://tsocks.sourceforge.net/"><code class="language-plaintext highlighter-rouge">tsocks</code></a>. <code class="language-plaintext highlighter-rouge">tsocks</code>
adds SOCKS proxying to any application by replacing the sockets API
with its own. In my case, I wrapped the VM in <code class="language-plaintext highlighter-rouge">tsocks</code> configured to
use a non-existent SOCKS proxy (127.0.0.1). It could accept any
incoming connection (though limited to SSH because of NAT) but was unable
to make any outgoing connections. Perfect!</p>

<p>I hadn’t realized it yet, but this was a high-interaction SSH honeypot
I created.</p>

<p>I set the root password to “password” and let it go for a while,
tailing the OpenSSH logs to watch for activity. The brute-force bots
would eventually make their way inside but immediately log out and
keep guessing passwords for root. Either they were really poorly
programmed or they were specifically testing for honeypots that allow
different passwords. They must have logged the address for a human to
investigate some time in the future, because I never witnessed any
shell activity. On the other hand, this was all very difficult to
observe, for the same reasons Gavin was able to cover his tracks. My
honeypot was useful for catching and detecting attackers, but it
wasn’t good for observing them in action.</p>

<p>While I was investigating this I came across
<a href="http://kojoney.sourceforge.net/">Kojoney</a>, which is a low-interaction
SSH honeypot mainly for seeing what sorts of passwords attackers were
guessing. Unfortunately, I could never get it to work, so I never used
it.</p>

<p>Several years passed and I recently came across a project that didn’t
exist last time: <a href="http://code.google.com/p/kippo/">kippo</a>, a
“medium”-interaction SSH honeypot. This is everything I was looking
for before. It doesn’t require a full-blown VM, it has high-fidelity
interaction, it’s safe, and it allows me to fully observe all activity
— it even records the tty session for replay. Cool!</p>

<p>kippo is written in pure Python, so there shouldn’t be any buffer
overflows, and it doesn’t execute any external programs. It <em>should</em> be
safe, but I’m not aware of any real security reviews, so it’s a
use-at-your-own-risk thing. They warn about this on their website.</p>

<p>I’ve run this off and on on the weekends. Since I haven’t run my real
SSH server on port 22 since 2009 (no recorded attacks since!), my IP
address attracts much less attention than before, so it hasn’t seen too
much activity. I have had two humans connect and log in. Both
downloaded a well-known script kiddie tool called <code class="language-plaintext highlighter-rouge">go.sh</code>. Here’s an
analysis of the tool by someone who was actually attacked with it:
<a href="http://www.shellperson.net/hacked-ssh-bruteforce/">SSH Bruteforce</a>.</p>

<p>In fact, <code class="language-plaintext highlighter-rouge">go.sh</code> is so well known that it gave me a little scare. In
my tty recording it looked like the tool was actually executed! The
skull banner printed out and it had an interface. I was really nervous
until I found kippo’s
<a href="http://code.google.com/p/kippo/source/browse/trunk/kippo/commands/malware.py?r=204">malware.py</a>. Kippo
actually recognizes some script kiddie tools and imitates their
interfaces to further confuse attackers. I <em>do</em> run kippo as an
unprivileged user so it wouldn’t be the end of the world if something
did happen, but
<a href="http://it.slashdot.org/story/08/02/10/2011257/linux-kernel-26-local-root-exploit">I’d still</a>
<a href="http://blog.zx2c4.com/749">be uncomfortable</a>.</p>

<p>There’s a neat feature of kippo, which hilariously caught Gavin
off-guard when I had him poke at it. kippo will never disconnect a
session on its own. If an <code class="language-plaintext highlighter-rouge">exit</code> or <code class="language-plaintext highlighter-rouge">C-d</code> is given, it drops into
<em>another</em> fake shell with the hostname “localhost”, merely pretending
to log out. That way you get a chance to see some commands the
attackers are meaning to run on their own system, before they realize
their mistake. The only way to disconnect is to either close your
terminal emulator or use SSH’s <code class="language-plaintext highlighter-rouge">~.</code> escape sequence.</p>
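
<p>The fake logout can be sketched in a few lines of Python. This is a toy illustration of the idea, not kippo’s actual code: the <code class="language-plaintext highlighter-rouge">FakeShell</code> class, its prompt format, and the messages it prints are all assumptions, and only a typed <code class="language-plaintext highlighter-rouge">exit</code> or <code class="language-plaintext highlighter-rouge">logout</code> is handled (an end-of-file from <code class="language-plaintext highlighter-rouge">C-d</code> would be treated the same way).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class FakeShell:
    def __init__(self, hostname):
        self.hostname = hostname
        self.recorded = []          # every line typed is kept for review

    def prompt(self):
        return f"root@{self.hostname}:~# "

    def handle(self, line):
        self.recorded.append(line)
        words = line.split()
        if not words:
            return ""
        if words[0] in ("exit", "logout"):
            # Never hang up. Pretend the session ended and that the
            # attacker is back on their own machine.
            self.hostname = "localhost"
            return "Connection closed.\n"
        return f"bash: {words[0]}: command not found\n"
</code></pre></div></div>

<p>After the fake logout the prompt reads <code class="language-plaintext highlighter-rouge">root@localhost:~# </code>, and anything typed at it still lands in <code class="language-plaintext highlighter-rouge">recorded</code>.</p>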

<p>I’ve been considering running kippo all the time with no password set
— using it as a true honeypot. This would help keep anyone from
finding my real SSH server, since they would find the honeypot and
stop searching other ports. It would also waste time that could be
spent attacking other people’s real SSH servers, helping to protect
other servers out there. My real SSH server (on my router) doesn’t
allow password logins, only key logins, so I already feel pretty good
about its security. I’ve never seen a brute-force attempt on the
current port anyway. But if I do, I now have kippo as another tool in
my security toolbelt.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>I Finally Have Comments</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/12/"/>
    <id>urn:uuid:19b5b979-433a-3529-2a61-d25abd07a5b9</id>
    <updated>2009-05-12T00:00:00Z</updated>
    <category term="meta"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<!-- 12 May 2009 -->
<p class="abstract">
Update: This post is referring to my old web hosting situation. I'm
now using external comment hosting because my blog is now statically
hosted.
</p>
<p>
I finally have a comment system, thanks to <a
href="http://www.nukekiller.net/pollxn/">pollxn</a>, a <a
href="http://www.blosxom.com/">blosxom</a> comment system that
actually works. There is a link to it, indicating the number of
comments, at the bottom of each post. Try it out and say hello.
</p>
<p>
Unfortunately, pollxn doesn't have any sort of anti-spam or <a
href="http://en.wikipedia.org/wiki/Captcha">CAPTCHA</a> system. If
you look around the Interwebs where other people are using pollxn, you
will see everyone has their own little CAPTCHA thing. Well, I am no
different. I hacked together my own to keep away automated spammers.
</p>
<p>
It selects words from the dictionary (of 40,000 words in this case)
and encrypts them with Blowfish in CBC mode, with a unique IV each
time. This is passed to the user, who passes it to an image
generator which decrypts the word and uses GD in Perl to render it,
apply some transforms, and drop a line randomly over it. The user
submits the guess of the image along with the encrypted version
(hidden field), which is decrypted and compared on the other end. The
same encrypted ID cannot be used twice, but thanks to the IV the same
word <i>can</i> be used twice.
</p>
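<p>
The token round trip described above can be sketched in Python. This
is not the original Perl. Blowfish is not in Python's standard
library, so a toy HMAC-SHA256 pad stands in for Blowfish-CBC here (it
is unsuitable for real use), the image rendering is left out, and all
of the names (<code>KEY</code>, <code>issue_token</code>,
<code>decrypt</code>, <code>check</code>) are hypothetical.
</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)                 # hypothetical server-side secret
WORDS = ["ormolus", "softer", "grumble", "nozzle"]  # stand-in dictionary
used_tokens = set()                           # tokens already submitted once


def _pad(iv, length):
    # Toy stand-in for Blowfish-CBC: a pad derived from the key and IV.
    # Only long enough for words of up to 32 bytes.
    return hmac.new(KEY, iv, hashlib.sha256).digest()[:length]


def issue_token(word):
    iv = secrets.token_bytes(16)              # fresh IV for every challenge
    data = word.encode()
    body = bytes(a ^ b for a, b in zip(data, _pad(iv, len(data))))
    return (iv + body).hex()


def new_challenge():
    return issue_token(secrets.choice(WORDS))


def decrypt(token):
    # Used by the image generator and by the final comparison.
    raw = bytes.fromhex(token)
    iv, body = raw[:16], raw[16:]
    return bytes(a ^ b for a, b in zip(body, _pad(iv, len(body))))


def check(token, guess):
    if token in used_tokens:
        return False                          # each token is good only once
    used_tokens.add(token)
    return hmac.compare_digest(decrypt(token), guess.encode())
</code></pre></div></div>
<p>
Two calls to <code>issue_token</code> with the same word return
different tokens because each gets its own IV, which is why a word can
repeat while a token cannot.
</p>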
<p>
Here are some samples. If you hit refresh, they will render
differently. (<i>Update: not any more. These are just static examples
now.</i>)
</p>
<p>
<img src="/img/captcha/a.png" alt="CAPTCHA sample: ormolus"/>
<img src="/img/captcha/b.png" alt="CAPTCHA sample: morons" />
<img src="/img/captcha/c.png" alt="CAPTCHA sample: softer" />
<img src="/img/captcha/d.png" alt="CAPTCHA sample: zanucks" />
<img src="/img/captcha/e.png" alt="CAPTCHA sample: grumble" />
<img src="/img/captcha/f.png" alt="CAPTCHA sample: nozzle" />
</p>
<p>
It's not a great CAPTCHA, but it should be good enough for the low
volume of traffic I see here. As I inevitably collect small amounts of
spam (by spammers manually passing the CAPTCHA), I will gradually
create the needed tools to combat it. I can also easily update the
CAPTCHA image algorithm without disrupting the functioning of the
website.
</p>
<p>
I'm sure I will be making improvements to the comment system over time
as well. I should make it obfuscate e-mail addresses, for one. Maybe
add a preview. And better blosxom integration.
</p>
<p>
So say hello below! I am excited to finally have a <i>real</i> blog.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Controlling a Minefield</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/12/16/"/>
    <id>urn:uuid:0db94fda-0de6-3cec-fcdd-5b2d2e1d23ea</id>
    <updated>2008-12-16T00:00:00Z</updated>
    <category term="story"/><category term="crypto"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<!-- 16 December 2008 -->
<p>
  <img src="/img/misc/naval-mine.jpg" alt="" title="Not a space mine."
       class="left"/>

Some time ago I was watching through the entire series of <a
href="http://en.wikipedia.org/wiki/Deep_Space_Nine">Deep Space
9</a>. It was a Star Trek television show about a space station that
rests next to a <a href="http://en.wikipedia.org/wiki/Wormhole">wormhole</a>
that connects to the other side of the galaxy (the Gamma
quadrant).
</p>
<p>
The Gamma quadrant is ruled by a group called the Dominion, and they
are looking to conquer the Federation side of the galaxy (the Alpha
quadrant). At one point during the series, the Federation needs to
temporarily disable the wormhole to prevent Dominion ships from
crossing through. They do this by <a
href="http://startrek.wikia.com/wiki/Second_Battle_of_Deep_Space_9">mining
the wormhole</a> with identical, cloaked, self-replicating
mines.
</p>
<p>
If a mine is destroyed, the neighboring mines will replicate a
replacement. The minefield repairs itself. This makes removing the
minefield within a reasonable amount of time difficult to
impossible. If even a single mine is left behind, it can replicate the
entire minefield again.
</p>
<p>
The most interesting question here is this:
</p>
<blockquote>
  <p>
  When the Federation returns and wants to remove the minefield, how
  would they do it? What would stop the Dominion from doing the same
  thing?
  </p>
</blockquote>
<p>
The first thing that comes to mind is having a kill signal, but what
would this signal be? It could simply be a plain "kill" command, but
the Dominion could also broadcast such a signal to disable the
minefield. Consider that the Dominion could capture a single mine and
study everything about its workings. The minefield itself could
therefore hold no secrets whatsoever. This leaves out any possibility
of a secret kill command stored in the mines.
</p>
<p>
Here's what I would do, assuming that humans or aliens have not yet
discovered some giant breakthrough in factoring in the Star Trek
universe. I would randomly generate two very large prime
numbers. Today, two 1024-bit primes should be more than enough, but in
350 years even larger numbers would probably be necessary. Then, I
multiply these two numbers together and store the product in the mine
software. To disable the minefield, I simply broadcast these two
numbers into the minefield. The mines would be programmed to take the
product of any pair of numbers they receive. If neither number is 1
and the product matches the internal number, the mine shuts down.
</p>
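<p>
As a minimal sketch of the check a mine would run, here it is in
Python. The two primes are small, well-known Mersenne primes chosen
only so the example is reproducible; they stand in for the randomly
generated 1024-bit primes, and the names <code>STORED_PRODUCT</code>
and <code>should_shut_down</code> are made up for the illustration.
The check also refuses the trivial pair of 1 and the stored number
itself.
</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Demo values only: two known Mersenne primes stand in for the
# randomly generated 1024-bit primes.
P = 2**61 - 1
Q = 2**89 - 1

# The only value stored in each mine.
STORED_PRODUCT = P * Q


def should_shut_down(a, b):
    # Refuse the trivial pair 1 and N, which anyone who has read the
    # mine's memory could broadcast.
    if 1 in (abs(a), abs(b)):
        return False
    return a * b == STORED_PRODUCT
</code></pre></div></div>
<p>
Broadcasting <code>P</code> and <code>Q</code> shuts the mine down,
while knowing only <code>STORED_PRODUCT</code> leaves an adversary
with a factoring problem.
</p>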
<p>
Voila! A method for shutting down the minefield. The enemy can know
everything about every single mine's construction, including the
software and data stored on every mine, but will be unable to disable
the minefield without factoring a very large composite number, which
would presumably be difficult or impossible (within a reasonable
amount of time).
</p>
<p>
Another possibility would be using a hash. Come up with a strong
passphrase, then use a hashing algorithm like SHA-1 or MD5, or
whatever is available and appropriate in 350 years, to hash the
passphrase. Store the hash in the mines. When you want to disable the
minefield, broadcast the passphrase. The mines will hash the
broadcast and compare it to the stored hash. It's really the same
solution as before: a one-way function. This is also similar to how
passwords are stored inside a computer today.
</p>
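<p>
The hash variant is short enough to sketch directly. This example
uses SHA-256 from Python's <code>hashlib</code> rather than SHA-1 or
MD5, and the passphrase and the name <code>mine_accepts</code> are
placeholders invented for the illustration.
</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import hmac

# Done once, offline, by whoever lays the minefield. Only the digest is
# loaded into the mines. The passphrase itself is kept secret.
PASSPHRASE = "a hypothetical, much longer passphrase"
STORED_HASH = hashlib.sha256(PASSPHRASE.encode()).digest()


def mine_accepts(broadcast):
    # Hash whatever was received and compare it with the stored digest.
    candidate = hashlib.sha256(broadcast.encode()).digest()
    return hmac.compare_digest(candidate, STORED_HASH)
</code></pre></div></div>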
<p>
If we wanted more commands, like "don't blow up any ships for a while"
or "increase minefield density", we could generate more composites
corresponding to each command. However, once a command is issued, the
secret — the two prime numbers — is out, and it cannot be used again.
In this case, I would go into the realm
of <a href="http://en.wikipedia.org/wiki/Public_key_cryptography">public
key cryptography</a>.
</p>
<p>
I would issue a command, along with a timestamp, and maybe even a
nonce that could double as a global identifier for the command, and
sign the whole deal using my private key. On each mine I would store
the public key. When a command is received, the mines would check the
signature before executing the command, and reject any command whose
timestamp is not newer than the last one they accepted. I could then
issue repeat commands, as the timestamps would change each time. An
adversary learns nothing useful when a command is issued, because the
timestamps would make any replay attacks useless.
</p>
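<p>
Here is a toy sketch of signed, timestamped commands in Python
(3.8 or later, for the modular inverse via <code>pow</code>). It uses
textbook RSA with no padding and the same small Mersenne primes as a
demonstration key, so it shows the flow only and is nowhere near
secure. The <code>Mine</code> class, the message layout, and the
command strings are all assumptions made for the example.
</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib

# Toy key material: two Mersenne primes, far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                     # public key, stored on every mine
D = pow(E, -1, (P - 1) * (Q - 1))       # private key, never leaves the issuer


def digest(command, timestamp, nonce):
    message = f"{command}|{timestamp}|{nonce}".encode()
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(command, timestamp, nonce):
    # Textbook RSA signature (no padding), for illustration only.
    return pow(digest(command, timestamp, nonce), D, N)


class Mine:
    def __init__(self):
        self.last_timestamp = 0

    def receive(self, command, timestamp, nonce, signature):
        if pow(signature, E, N) != digest(command, timestamp, nonce):
            return False                # not signed with the private key
        if timestamp > self.last_timestamp:
            self.last_timestamp = timestamp
            return True                 # fresh, authentic command
        return False                    # replay of an old command
</code></pre></div></div>
<p>
A captured broadcast fails on a second delivery because its timestamp
is no longer newer than the last one accepted, and changing the
timestamp invalidates the signature.
</p>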
<p>
Minefields just like this exist today all over the Internet, as <a
href="http://en.wikipedia.org/wiki/Botnet">botnets</a>. Thousands of
computers all around the world become infected with malware and come
under the control of a single individual or group. Individual machines
in the botnet could be taken out, but removing the entire botnet is
difficult as it grows and repairs itself. Any security researcher
could disassemble the botnet malware and learn everything about it, so
the malware can store no secrets. How does a malicious person control
the botnet, then, without someone else taking control? Public key
cryptography, just as described above.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  

</feed>
