<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged crypto at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/crypto/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/crypto/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:799d7efd-3418-409f-b602-5191d8b4fa4c</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>Billions of Code Name Permutations in 32 bits</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/09/14/"/>
    <id>urn:uuid:bc17a779-bee1-4a60-80d1-5c5cfd8fd638</id>
    <updated>2021-09-14T21:06:59Z</updated>
    <category term="c"/><category term="crypto"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>My friend over at Possibly Wrong <a href="https://possiblywrong.wordpress.com/2021/09/13/code-name-generator/">created a code name generator</a>. By
coincidence I happened to be thinking about code names myself while
recently replaying <a href="https://en.wikipedia.org/wiki/XCOM:_Enemy_Within"><em>XCOM: Enemy Within</em></a> (2012/2013). The game
generates a random code name for each mission, and I wondered how often it
repeats. The <a href="https://www.ufopaedia.org/index.php/Mission_Names_(EU2012)">UFOpaedia page on the topic</a> gives the word lists: 53
adjectives and 76 nouns, for a total of 4028 possible code names. A
typical game has around 60 missions, and if code names are generated
naively on the fly, then per the birthday paradox more than a third of all
games will see a repeated mission code name! Fortunately this is easy to avoid,
and the particular configuration here lends itself to an interesting
implementation.</p>
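<p>The birthday-paradox figure is easy to check numerically. Here is a small sketch that multiplies out the chance that every one of <em>k</em> uniformly random picks from <em>n</em> names is distinct. The function name <code class="language-plaintext highlighter-rouge">collision_probability</code> is my own and does not come from the game or the generator.</p>

```c
/* Probability that k names drawn independently and uniformly at random
   from n possibilities contain at least one repeat (birthday problem). */
static double collision_probability(int k, int n)
{
    double all_distinct = 1.0;
    for (int i = 0; i < k; i++) {
        /* the (i+1)th draw must avoid the i names already drawn */
        all_distinct *= (double)(n - i) / n;
    }
    return 1.0 - all_distinct;
}
```

<p>For 60 missions drawn from 4,028 names this comes out to roughly 0.36, and the chance of a repeat first passes one half at 75 missions.</p>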

<p>Mission code names are built using “<em>adjective</em> <em>noun</em>”. Some examples
from the game’s word list:</p>

<ul>
  <li>Fading Hammer</li>
  <li>Fallen Jester</li>
  <li>Hidden Crown</li>
</ul>

<p>To generate a code name, we could select a random adjective and a random
noun, but as discussed it wouldn’t take long for a collision. The naive
approach is to keep a database of previously-generated names, and to
consult this database when generating new names. That works, but there’s
an even better solution: use a random permutation. Done well, we don’t
need to keep track of previous names, and the generator won’t repeat until
it’s exhausted all possibilities.</p>

<p>Further, the total number of possible code names, 4028, is suspiciously
shy of 4,096, a power of two (<code class="language-plaintext highlighter-rouge">2**12</code>). That makes designing and
implementing an efficient permutation that much easier.</p>

<h3 id="a-linear-congruential-generator">A linear congruential generator</h3>

<p>A classic, obvious solution is a <a href="/blog/2019/11/19/">linear congruential generator</a>
(LCG). A full-period, 12-bit LCG is nothing more than a permutation of the
numbers 0 to 4,095. When generating names, we can skip over the extra 68
values and pretend it’s a permutation of 4,028 elements. An LCG is
constructed like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f(n) = (f(n-1)*A + C) % M
</code></pre></div></div>

<p>Typically the seed is used for <code class="language-plaintext highlighter-rouge">f(0)</code>. M is selected based on the problem
space or implementation efficiency, and is usually a power of two. In this
case it will be 4,096. Then there are some rules for choosing A and C.</p>

<p>Simply choosing a random <code class="language-plaintext highlighter-rouge">f(0)</code> per game isn’t great. The code name order
will always be the same, and we’re only choosing where in the cycle to
start. It would be better to vary the permutation itself, which we can do
by also choosing unique A and C constants per game.</p>

<p>Choosing C is easy: it must be relatively prime to M, which for a power-of-two M means it must be
odd. Since it’s addition modulo M, there’s no reason to choose <code class="language-plaintext highlighter-rouge">C &gt;= M</code>
since the results are identical to a smaller C. If we think of C as a
12-bit integer, 1 bit is locked in, and the other 11 bits are free to
vary:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>xxxxxxxxxxx1
</code></pre></div></div>

<p>Choosing A is more complicated: it must be odd, <code class="language-plaintext highlighter-rouge">A-1</code> must be divisible by 4,
and <code class="language-plaintext highlighter-rouge">A-1</code> should <em>not</em> be divisible by 8 (better results). Again, thinking of
this in terms of a 12-bit number, this locks in 3 bits and leaves 9 bits
free:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>xxxxxxxxx101
</code></pre></div></div>

<p>This ensures all the <em>must</em> and <em>should</em> properties of A.</p>
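<p>The <em>must</em> rules above are the Hull-Dobell conditions for a full-period LCG, and in this 12-bit case they can simply be brute-forced. The sketch below checks that every multiplier of the form <code class="language-plaintext highlighter-rouge">xxxxxxxxx101</code>, paired with a handful of odd increments, visits all 4,096 states before repeating. The helper names and the sample increments are illustrative choices, not part of the original generator.</p>

```c
#include <stdint.h>

/* Cycle length through seed for f(n) = (f(n-1)*a + c) % 4096.
   Assumes a is odd, so each step is a bijection and the walk
   must eventually return to its starting point. */
static long lcg12_period(uint32_t a, uint32_t c, uint32_t seed)
{
    uint32_t start = seed & 0xfff;
    uint32_t s = start;
    long n = 0;
    do {
        s = (s*a + c) & 0xfff;
        n++;
    } while (s != start);
    return n;
}

/* Try every multiplier of the form xxxxxxxxx101 (A % 8 == 5) against a
   few odd increments. Returns 1 if every pairing has the full period. */
static int all_full_period(void)
{
    static const uint32_t cs[] = {0x001, 0x2ab, 0x801, 0xfff};
    for (uint32_t x = 0; x < 512; x++) {
        uint32_t a = x<<3 | 5;
        for (int k = 0; k < 4; k++) {
            if (lcg12_period(a, cs[k], 0) != 4096) {
                return 0;
            }
        }
    }
    return 1;
}
```

<p>By contrast, <code class="language-plaintext highlighter-rouge">lcg12_period(3, 1, 0)</code> falls short of 4,096 because <code class="language-plaintext highlighter-rouge">3-1</code> is not divisible by 4. The whole sweep is only about 8 million LCG steps.</p>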

<p>Finally, <code class="language-plaintext highlighter-rouge">0 &lt;= f(0) &lt; M</code>. Because of modular arithmetic, larger values are
redundant, and all possible values are valid since the LCG, being
full-period, will cycle through all of them. This is just choosing the
starting point in a particular permutation cycle. As a 12-bit number, all
12 bits are free:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>xxxxxxxxxxxx
</code></pre></div></div>

<p>That’s <code class="language-plaintext highlighter-rouge">9 + 11 + 12 = 32</code> free bits to fill randomly: again, how
incredibly convenient! Every 32-bit integer defines some unique code name
permutation… <em>almost</em>. Any 32-bit descriptor where <code class="language-plaintext highlighter-rouge">f(0) &gt;= 4028</code> will
collide with at least one other due to skipping, and so around 1.7% of the
state space is redundant. A small loss that should shrink with slightly
better word list planning. I don’t think anyone will notice.</p>

<h3 id="slice-and-dice">Slice and dice</h3>

<p><a href="/blog/2020/12/31/">I love compact state machines</a>, and this is an opportunity to put one
to good use. My code name generator will be just one function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">codename</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">);</span>
</code></pre></div></div>

<p>This takes one of those 32-bit permutation descriptors, writes the first
code name to <code class="language-plaintext highlighter-rouge">buf</code>, and returns a descriptor for another permutation that
starts with the next name. All we have to do is keep track of that 32-bit
number and we’ll never need to worry about repeating code names until all
have been exhausted.</p>

<p>First, let’s extract A, C, and <code class="language-plaintext highlighter-rouge">f(0)</code>, which I’m calling S. The low bits
are A, middle bits are C, and high bits are S. Note the OR with 1 and 5 to
lock in the hard-set bits.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">long</span> <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span> <span class="o">&lt;&lt;</span>  <span class="mi">3</span> <span class="o">|</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xfff</span><span class="p">;</span>  <span class="c1">//  9 bits</span>
<span class="kt">long</span> <span class="n">c</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span> <span class="o">&gt;&gt;</span>  <span class="mi">8</span> <span class="o">|</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xfff</span><span class="p">;</span>  <span class="c1">// 11 bits</span>
<span class="kt">long</span> <span class="n">s</span> <span class="o">=</span>  <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">20</span><span class="p">;</span>               <span class="c1">// 12 bits</span>
</code></pre></div></div>
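<p>To make the bit layout concrete, here is a sketch of the reverse direction: a hypothetical <code class="language-plaintext highlighter-rouge">pack_descriptor</code> that rebuilds a 32-bit state from A, C, and S, alongside the extraction above wrapped in a function. Because the locked-in bits are simply dropped on the way in and restored on the way out, packing after unpacking should return the original state exactly.</p>

```c
#include <stdint.h>

/* Hypothetical inverse of the extraction: rebuild a descriptor from
   A (low 3 bits are 101), C (odd), and S (12 bits). */
static uint32_t pack_descriptor(uint32_t a, uint32_t c, uint32_t s)
{
    return (a >> 3)             /* bits  0..8:  A's 9 free bits  */
         | (c >> 1) << 9        /* bits  9..19: C's 11 free bits */
         | (s & 0xfff) << 20;   /* bits 20..31: all 12 bits of S */
}

/* The extraction from the text, wrapped for testing. */
static void unpack_descriptor(uint32_t state,
                              uint32_t *a, uint32_t *c, uint32_t *s)
{
    *a = (state <<  3 | 5) & 0xfff;  /*  9 bits */
    *c = (state >>  8 | 1) & 0xfff;  /* 11 bits */
    *s =  state >> 20;               /* 12 bits */
}
```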

<p>Next iterate the LCG until we have a number in range:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">do</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span><span class="o">*</span><span class="n">a</span> <span class="o">+</span> <span class="n">c</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xfff</span><span class="p">;</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">s</span> <span class="o">&gt;=</span> <span class="mi">4028</span><span class="p">);</span>
</code></pre></div></div>

<p>Once we have an appropriate LCG state, compute the adjective/noun indexes
and build a code name:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">s</span> <span class="o">%</span> <span class="mi">53</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="n">s</span> <span class="o">/</span> <span class="mi">53</span><span class="p">;</span>
<span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"%s %s"</span><span class="p">,</span> <span class="n">adjvs</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">nouns</span><span class="p">[</span><span class="n">j</span><span class="p">]);</span>
</code></pre></div></div>

<p>Finally assemble the next 32-bit state. Since A and C don’t change, these
are passed through while the old S is masked out and replaced with the new
S.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">return</span> <span class="p">(</span><span class="n">state</span> <span class="o">&amp;</span> <span class="mh">0xfffff</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)</span><span class="n">s</span><span class="o">&lt;&lt;</span><span class="mi">20</span><span class="p">;</span>
</code></pre></div></div>

<p>Putting it all together:</p>

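<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdint.h&gt;</span>
<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span>

<span class="cp">#define COUNTOF(a) (int)(sizeof(a) / sizeof(*(a)))</span>

<span class="k">static</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">adjvs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>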
<span class="k">static</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">nouns</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>

<span class="kt">uint32_t</span> <span class="nf">codename</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">long</span> <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span> <span class="o">&lt;&lt;</span>  <span class="mi">3</span> <span class="o">|</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xfff</span><span class="p">;</span>  <span class="c1">//  9 bits</span>
    <span class="kt">long</span> <span class="n">c</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span> <span class="o">&gt;&gt;</span>  <span class="mi">8</span> <span class="o">|</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xfff</span><span class="p">;</span>  <span class="c1">// 11 bits</span>
    <span class="kt">long</span> <span class="n">s</span> <span class="o">=</span>  <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">20</span><span class="p">;</span>               <span class="c1">// 12 bits</span>

    <span class="k">do</span> <span class="p">{</span>
        <span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span><span class="o">*</span><span class="n">a</span> <span class="o">+</span> <span class="n">c</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xfff</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">s</span> <span class="o">&gt;=</span> <span class="n">COUNTOF</span><span class="p">(</span><span class="n">adjvs</span><span class="p">)</span><span class="o">*</span><span class="n">COUNTOF</span><span class="p">(</span><span class="n">nouns</span><span class="p">));</span>

    <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">s</span> <span class="o">%</span> <span class="n">COUNTOF</span><span class="p">(</span><span class="n">adjvs</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="n">s</span> <span class="o">/</span> <span class="n">COUNTOF</span><span class="p">(</span><span class="n">adjvs</span><span class="p">);</span>
    <span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"%s %s"</span><span class="p">,</span> <span class="n">adjvs</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">nouns</span><span class="p">[</span><span class="n">j</span><span class="p">]);</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">state</span> <span class="o">&amp;</span> <span class="mh">0xfffff</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)</span><span class="n">s</span><span class="o">&lt;&lt;</span><span class="mi">20</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
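<p>Since the word lists are elided, the no-repeat claim is easiest to test with a hypothetical variant of <code class="language-plaintext highlighter-rouge">codename</code> that reports the adjective and noun indexes instead of formatting a string. This sketch assumes the XCOM list sizes (53 adjectives, 76 nouns), makes 4,028 consecutive calls, and checks that no index pair comes up twice.</p>

```c
#include <stdint.h>

#define NADJ  53  /* adjectives in the XCOM word list (assumed sizes) */
#define NNOUN 76  /* nouns in the XCOM word list */

/* Hypothetical variant of codename(): same LCG state machine, but it
   reports the adjective and noun indexes instead of writing a string. */
static uint32_t codename_index(uint32_t state, int *adj, int *noun)
{
    long a = (state << 3 | 5) & 0xfff;  /*  9 bits */
    long c = (state >> 8 | 1) & 0xfff;  /* 11 bits */
    long s =  state >> 20;              /* 12 bits */
    do {
        s = (s*a + c) & 0xfff;
    } while (s >= NADJ*NNOUN);
    *adj  = (int)(s % NADJ);
    *noun = (int)(s / NADJ);
    return (state & 0xfffff) | (uint32_t)s<<20;
}

/* Make NADJ*NNOUN (4,028) consecutive calls starting from state and
   return 1 if no adjective/noun pair was produced twice. */
static int covers_all_names(uint32_t state)
{
    static unsigned char seen[NADJ*NNOUN];
    for (int i = 0; i < NADJ*NNOUN; i++) {
        seen[i] = 0;
    }
    for (int i = 0; i < NADJ*NNOUN; i++) {
        int adj, noun;
        state = codename_index(state, &adj, &noun);
        if (seen[noun*NADJ + adj]++) {
            return 0;  /* repeated a name before exhausting the list */
        }
    }
    return 1;
}
```

<p>Any starting state should pass: a full-period LCG visits all 4,096 values before repeating, and the loop only discards the 68 out-of-range ones.</p>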

<p>The caller just needs to generate an initial 32-bit integer. Any 32-bit
integer is valid (even zero), so this could just be, say, the current Unix
time (<code class="language-plaintext highlighter-rouge">time(2)</code>), but adjacent values will have similar-ish permutations. I
intentionally placed S in the high bits, which are least likely to vary,
since it only affects where the cycle begins, while A and C have a much
more dramatic impact and so are placed at more variable locations.</p>

<p>Regardless, it would be better to hash such an input so that adjacent time
values map to distant states. It also helps hide poorer (less random)
choices for A multipliers. I happen to have <a href="/blog/2018/07/31/">designed some great functions
for exactly this purpose</a>. Here’s one of my best:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">uint32_t</span>
<span class="nf">hash32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">+=</span> <span class="mh">0x3243f6a8U</span><span class="p">;</span> <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xd168aaadU</span><span class="p">;</span> <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xaf723597U</span><span class="p">;</span> <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This would be perfectly reasonable for generating all possible names in a
random order:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="n">state</span> <span class="o">=</span> <span class="n">hash32</span><span class="p">(</span><span class="n">time</span><span class="p">(</span><span class="mi">0</span><span class="p">));</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">4028</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
    <span class="n">state</span> <span class="o">=</span> <span class="n">codename</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
    <span class="n">puts</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>To further help cover up poorer A multipliers, it’s better for the word
list to be pre-shuffled in its static storage. If that underlying order
happens to show through, at least it will be less obvious (i.e. not in
alphabetical order). Shuffling the string list in my source is just a few
keystrokes in Vim, so this is easy enough.</p>

<h3 id="robustness">Robustness</h3>

<p>If you’re set on making the <code class="language-plaintext highlighter-rouge">codename</code> function easier to use such that
consumers don’t need to think about hashes, you could “encode” and
“decode” the descriptor going in and out of the function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">codename</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">state</span> <span class="o">+=</span> <span class="mh">0x3243f6a8U</span><span class="p">;</span> <span class="n">state</span> <span class="o">^=</span> <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">;</span>
    <span class="n">state</span> <span class="o">*=</span> <span class="mh">0x9e485565U</span><span class="p">;</span> <span class="n">state</span> <span class="o">^=</span> <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">state</span> <span class="o">*=</span> <span class="mh">0xef1d6b47U</span><span class="p">;</span> <span class="n">state</span> <span class="o">^=</span> <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>

    <span class="c1">// ...</span>

    <span class="n">state</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span> <span class="o">&amp;</span> <span class="mh">0xfffff</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)</span><span class="n">s</span><span class="o">&lt;&lt;</span><span class="mi">20</span><span class="p">;</span>
    <span class="n">state</span> <span class="o">^=</span> <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">state</span> <span class="o">*=</span> <span class="mh">0xeb00ce77U</span><span class="p">;</span>
    <span class="n">state</span> <span class="o">^=</span> <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">state</span> <span class="o">*=</span> <span class="mh">0x88ccd46dU</span><span class="p">;</span>
    <span class="n">state</span> <span class="o">^=</span> <span class="n">state</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">;</span> <span class="n">state</span> <span class="o">-=</span> <span class="mh">0x3243f6a8U</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">state</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This permutes the state coming in, and reverses that permutation on the
way out (read: inverse hash). This breaks up similar starting points.</p>
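<p>The decode half only works if each step exactly undoes its encode counterpart, in reverse order. Pulling the two halves into separate functions (the names <code class="language-plaintext highlighter-rouge">encode_state</code> and <code class="language-plaintext highlighter-rouge">decode_state</code> are hypothetical) makes that easy to check: <code class="language-plaintext highlighter-rouge">x ^= x &gt;&gt; k</code> is its own inverse when <code class="language-plaintext highlighter-rouge">k &gt;= 16</code>, and each multiplier on the way out is the multiplicative inverse, modulo 2<sup>32</sup>, of one on the way in.</p>

```c
#include <stdint.h>

/* Encode steps from the top of codename(). */
static uint32_t encode_state(uint32_t state)
{
    state += 0x3243f6a8U; state ^= state >> 17;
    state *= 0x9e485565U; state ^= state >> 16;
    state *= 0xef1d6b47U; state ^= state >> 16;
    return state;
}

/* Decode steps from the bottom of codename(): the encode steps undone
   in reverse order, using the inverse multipliers mod 2^32. */
static uint32_t decode_state(uint32_t state)
{
    state ^= state >> 16; state *= 0xeb00ce77U;  /* undoes *0xef1d6b47 */
    state ^= state >> 16; state *= 0x88ccd46dU;  /* undoes *0x9e485565 */
    state ^= state >> 17; state -= 0x3243f6a8U;  /* undoes the += */
    return state;
}
```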

<h3 id="a-random-access-code-name-permutation">A random-access code name permutation</h3>

<p>Of course this isn’t the only way to build a permutation. I recently
picked up another trick: <a href="https://andrew-helmer.github.io/permute/">Kensler permutation</a>. The key insight
is cycle-walking, which allows random access to a permutation of a smaller
domain (e.g. 4,028 elements) through a permutation of a larger domain (e.g.
4,096 elements).</p>

<p>Here’s such a code name generator built around a bespoke 12-bit
xorshift-multiply permutation. I used 4 “rounds” since xorshift-multiply
is less effective the smaller the permutation.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Generate the nth code name for this seed.</span>
<span class="kt">void</span> <span class="nf">codename_n</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">seed</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="n">n</span><span class="p">;</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="n">i</span> <span class="o">^=</span> <span class="n">i</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span> <span class="n">i</span> <span class="o">^=</span> <span class="n">seed</span> <span class="o">&gt;&gt;</span>  <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">*=</span> <span class="mh">0x325</span><span class="p">;</span> <span class="n">i</span> <span class="o">&amp;=</span> <span class="mh">0xfff</span><span class="p">;</span>
        <span class="n">i</span> <span class="o">^=</span> <span class="n">i</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span> <span class="n">i</span> <span class="o">^=</span> <span class="n">seed</span> <span class="o">&gt;&gt;</span>  <span class="mi">8</span><span class="p">;</span> <span class="n">i</span> <span class="o">*=</span> <span class="mh">0x3f5</span><span class="p">;</span> <span class="n">i</span> <span class="o">&amp;=</span> <span class="mh">0xfff</span><span class="p">;</span>
        <span class="n">i</span> <span class="o">^=</span> <span class="n">i</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span> <span class="n">i</span> <span class="o">^=</span> <span class="n">seed</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span> <span class="n">i</span> <span class="o">*=</span> <span class="mh">0xa89</span><span class="p">;</span> <span class="n">i</span> <span class="o">&amp;=</span> <span class="mh">0xfff</span><span class="p">;</span>
        <span class="n">i</span> <span class="o">^=</span> <span class="n">i</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span> <span class="n">i</span> <span class="o">^=</span> <span class="n">seed</span> <span class="o">&gt;&gt;</span> <span class="mi">24</span><span class="p">;</span> <span class="n">i</span> <span class="o">*=</span> <span class="mh">0x85b</span><span class="p">;</span> <span class="n">i</span> <span class="o">&amp;=</span> <span class="mh">0xfff</span><span class="p">;</span>
        <span class="n">i</span> <span class="o">^=</span> <span class="n">i</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="n">COUNTOF</span><span class="p">(</span><span class="n">adjvs</span><span class="p">)</span><span class="o">*</span><span class="n">COUNTOF</span><span class="p">(</span><span class="n">nouns</span><span class="p">));</span>

    <span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="n">i</span> <span class="o">%</span> <span class="n">COUNTOF</span><span class="p">(</span><span class="n">adjvs</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">b</span> <span class="o">=</span> <span class="n">i</span> <span class="o">/</span> <span class="n">COUNTOF</span><span class="p">(</span><span class="n">adjvs</span><span class="p">);</span>
    <span class="n">snprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">22</span><span class="p">,</span> <span class="s">"%s %s"</span><span class="p">,</span> <span class="n">adjvs</span><span class="p">[</span><span class="n">a</span><span class="p">],</span> <span class="n">nouns</span><span class="p">[</span><span class="n">b</span><span class="p">]);</span>
<span class="p">}</span>
</code></pre></div></div>
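<p>The cycle-walking property can be checked the same way. This sketch is an index-only copy of <code class="language-plaintext highlighter-rouge">codename_n</code> (hypothetically named <code class="language-plaintext highlighter-rouge">permute_index</code>, with the 4,028 name count hardcoded), plus a test that every <code class="language-plaintext highlighter-rouge">n</code> below 4,028 maps to a distinct in-range index for a given seed.</p>

```c
#include <stdint.h>

#define NNAMES 4028  /* 53 adjectives x 76 nouns (assumed) */

/* Index-only copy of codename_n(): four 12-bit xorshift-multiply
   rounds keyed by seed, cycle-walked until the result is in range. */
static int permute_index(uint32_t seed, int n)
{
    uint32_t i = n;
    do {
        i ^= i >> 6; i ^= seed >>  0; i *= 0x325; i &= 0xfff;
        i ^= i >> 6; i ^= seed >>  8; i *= 0x3f5; i &= 0xfff;
        i ^= i >> 6; i ^= seed >> 16; i *= 0xa89; i &= 0xfff;
        i ^= i >> 6; i ^= seed >> 24; i *= 0x85b; i &= 0xfff;
        i ^= i >> 6;
    } while (i >= NNAMES);
    return (int)i;
}

/* Return 1 if n -> permute_index(seed, n) is a bijection on [0, NNAMES). */
static int is_permutation(uint32_t seed)
{
    static unsigned char seen[NNAMES];
    for (int n = 0; n < NNAMES; n++) {
        seen[n] = 0;
    }
    for (int n = 0; n < NNAMES; n++) {
        int i = permute_index(seed, n);
        if (i < 0 || i >= NNAMES || seen[i]++) {
            return 0;
        }
    }
    return 1;
}
```

<p>Each round is a bijection on 12 bits (an xorshift, an XOR with part of the seed, and multiplication by an odd constant modulo 4,096), so following its cycle until the value lands below 4,028 yields a bijection on the smaller domain.</p>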

<p>While this is more flexible, avoids poorer permutations, and doesn’t have
state space collisions, I still have a soft spot for my LCG-based state
machine generator.</p>

<h3 id="source-code">Source code</h3>

<p>You can find the complete, working source code with both generators here:
<a href="https://github.com/skeeto/scratch/tree/master/misc/codename.c"><strong><code class="language-plaintext highlighter-rouge">codename.c</code></strong></a>. I used <a href="https://en.wikipedia.org/wiki/Secret_Service_code_name">real US Secret Service code names</a> for
my word list. Some sample outputs:</p>

<ul>
  <li>PLASTIC HUMMINGBIRD</li>
  <li>BLACK VENUS</li>
  <li>SILENT SUNBURN</li>
  <li>BRONZE AUTHOR</li>
  <li>FADING MARVEL</li>
</ul>

]]>
    </content>
  </entry>
  <entry>
    <title>Single-primitive authenticated encryption for fun</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2021/01/30/"/>
    <id>urn:uuid:92013b12-7f4b-4175-8d19-93520798a919</id>
    <updated>2021-01-30T03:39:10Z</updated>
    <category term="crypto"/><category term="c"/>
    <content type="html">
      <![CDATA[<p>Just as a fun exercise, I designed and implemented from scratch a
standalone, authenticated encryption tool, including key derivation with
stretching, using a single cryptographic primitive. Or, more specifically,
<em>half of a primitive</em>. That primitive is the encryption function of the
<a href="https://en.wikipedia.org/wiki/XXTEA">XXTEA block cipher</a>. The goal was to pare both design and
implementation down to the bone without being broken in practice — <em>I
hope</em> — and maybe learn something along the way. This article is the tour
of my design. Everything here will be nearly the opposite of the <a href="https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html">right
answers</a>.</p>

<p>The <a href="https://github.com/skeeto/xxtea/tree/v0.1">tool itself is named <strong>xxtea</strong></a> (lowercase), and it’s supported
on all unix-like and Windows systems. It’s trivial to compile, <a href="https://github.com/skeeto/w64devkit">even on
the latter</a>. The code should be easy to follow from top to bottom,
with commentary about specific decisions along the way, though I’ll quote
the most important stuff inline here.</p>

<p>The command line options <a href="/blog/2020/08/01/">follow the usual conventions</a>. The two
modes of operation are encrypt (<code class="language-plaintext highlighter-rouge">-E</code>) and decrypt (<code class="language-plaintext highlighter-rouge">-D</code>). It defaults to
using standard input and standard output so it works great in pipelines.
Supplying <code class="language-plaintext highlighter-rouge">-o</code> sends output elsewhere (automatically deleted if something
goes wrong), and the optional positional argument indicates an alternate
input source.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>usage: xxtea &lt;-E|-D&gt; [-h] [-o FILE] [-p PASSWORD] [FILE]

examples:
    $ xxtea -E -o file.txt.xxtea file.txt
    $ xxtea -D -o file.txt file.txt.xxtea
</code></pre></div></div>

<p>If no password is provided (<code class="language-plaintext highlighter-rouge">-p</code>), it prompts for a <a href="/blog/2020/05/04/">UTF-8-encoded
password</a>. Of course it’s not normally a good idea to supply a
password via command line argument, but it’s been useful for testing.</p>

<h3 id="xxtea-block-cipher">XXTEA block cipher</h3>

<p>TEA stands for <em>Tiny Encryption Algorithm</em>, and XXTEA is the second attempt
at fixing weaknesses in the cipher, with partial success. The remaining
weaknesses should not matter for this particular application. XXTEA
supports a variable block size, but I’ve hardcoded my implementation to a
128-bit block size, along with some unrolling. I’ve also discarded the
unneeded decryption function. There are no data-dependent lookups or
branches, so it’s immune to speculation attacks.</p>

<p>XXTEA operates on 32-bit words and has a 128-bit key, meaning both block
and key are four words apiece. My implementation is about a dozen lines
long. Its prototype:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Encrypt a 128-bit block using 128-bit key</span>
<span class="kt">void</span> <span class="nf">xxtea128_encrypt</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">key</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span> <span class="kt">uint32_t</span> <span class="n">block</span><span class="p">[</span><span class="mi">4</span><span class="p">]);</span>
</code></pre></div></div>
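<p>For reference, here is a minimal sketch of such a function. It assumes the published XXTEA reference algorithm by Wheeler and Needham, specialized to n = 4 words (6 + 52/4 = 19 rounds). It is not necessarily identical to the unrolled version in xxtea, and the <code class="language-plaintext highlighter-rouge">mx</code> helper and <code class="language-plaintext highlighter-rouge">XXTEA_DELTA</code> name are my own.</p>

```c
#include <stdint.h>

#define XXTEA_DELTA 0x9e3779b9u

/* Mixing function from the XXTEA reference code. */
static uint32_t mx(uint32_t y, uint32_t z, uint32_t sum,
                   const uint32_t key[4], unsigned p, unsigned e)
{
    return ((z>>5 ^ y<<2) + (y>>3 ^ z<<4)) ^ ((sum ^ y) + (key[(p&3)^e] ^ z));
}

/* Encrypt a 128-bit block (4 words) in place using a 128-bit key. */
void xxtea128_encrypt(const uint32_t key[4], uint32_t block[4])
{
    uint32_t sum = 0, z = block[3];
    for (int rounds = 6 + 52/4; rounds > 0; rounds--) {  /* 19 rounds */
        sum += XXTEA_DELTA;
        unsigned e = (sum >> 2) & 3;
        for (unsigned p = 0; p < 4; p++) {
            /* neighbor to the right; the last word wraps to block[0] */
            uint32_t y = block[(p + 1) % 4];
            z = block[p] += mx(y, z, sum, key, p, e);
        }
    }
}
```

<p>There are no branches or table lookups that depend on the key or data, consistent with the property described above.</p>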

<p>All cryptographic operations are built from this function. Another way to
think about it is that it accepts two 128-bit inputs and returns a 128-bit
result:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uint128 r = f(uint128 key, uint128 block);
</code></pre></div></div>

<p>Tuck that away in the back of your head since this will be important
later.</p>

<h3 id="encryption">Encryption</h3>

<p>If I tossed the decryption function, how are messages decrypted? I’m sure
many have already guessed: XXTEA will be used in <em>counter mode</em>, or CTR
mode. Rather than encrypt the plaintext directly, encrypt a 128-bit block
counter and treat it like a stream cipher. The message is XORed with the
encrypted counter values for both encryption and decryption. This has two
nice properties:</p>

<ul>
  <li>Only half the cipher is needed.</li>
  <li>No padding scheme is necessary. With other block modes, if a message
length is not an exact multiple of the block size, the last block must be
padded somehow.</li>
</ul>

<p>A 128-bit increment with 32-bit limbs is easy:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">increment</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">ctr</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span>
<span class="p">{</span>
    <span class="cm">/* 128-bit increment, first word changes fastest */</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!++</span><span class="n">ctr</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="k">if</span> <span class="p">(</span><span class="o">!++</span><span class="n">ctr</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">if</span> <span class="p">(</span><span class="o">!++</span><span class="n">ctr</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span> <span class="o">++</span><span class="n">ctr</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In xxtea, words are always marshalled in little endian byte order (least
significant byte first). With the first word as the least significant
limb, the entire 128-bit counter is itself little endian.</p>
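<p>The <code class="language-plaintext highlighter-rouge">loadu32</code> helper used later in <code class="language-plaintext highlighter-rouge">xxtea128_hash_final</code> isn’t shown in the article. A plausible sketch, assuming the little endian convention just described; <code class="language-plaintext highlighter-rouge">storeu32</code>, <code class="language-plaintext highlighter-rouge">ctr_load</code>, and <code class="language-plaintext highlighter-rouge">ctr_store</code> are hypothetical names for illustration:</p>

```c
#include <stdint.h>

/* Load a 32-bit word stored least significant byte first. */
static uint32_t loadu32(const void *buf)
{
    const unsigned char *p = buf;
    return (uint32_t)p[0] <<  0 | (uint32_t)p[1] <<  8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Store a 32-bit word least significant byte first. */
static void storeu32(void *buf, uint32_t x)
{
    unsigned char *p = buf;
    p[0] = (unsigned char)(x >>  0);
    p[1] = (unsigned char)(x >>  8);
    p[2] = (unsigned char)(x >> 16);
    p[3] = (unsigned char)(x >> 24);
}

/* 16 bytes to a 128-bit counter; word 0 is the least significant limb. */
static void ctr_load(uint32_t ctr[4], const unsigned char bytes[16])
{
    for (int i = 0; i < 4; i++) ctr[i] = loadu32(bytes + 4*i);
}

/* 128-bit counter to 16 bytes, least significant byte first overall. */
static void ctr_store(unsigned char bytes[16], const uint32_t ctr[4])
{
    for (int i = 0; i < 4; i++) storeu32(bytes + 4*i, ctr[i]);
}

/* Same 128-bit increment as above: first word changes fastest. */
static void increment(uint32_t ctr[4])
{
    if (!++ctr[0]) if (!++ctr[1]) if (!++ctr[2]) ++ctr[3];
}
```

<p>Because each limb is stored least significant byte first and word 0 is the least significant limb, the serialized counter reads as one 128-bit little endian integer: a carry out of word 0 shows up in byte 4.</p>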

<p>The counter doesn’t start at zero, but at some randomly-selected 128-bit
nonce called the <em>initialization vector</em> (IV), wrapping around to zero if
necessary (incredibly unlikely). The IV will be included with the message
in the clear. This nonce allows one key (password) to be used with
multiple messages, as they’ll all be encrypted using different,
randomly-chosen regions of an enormous keystream. It also provides
<em>semantic security</em>: encrypt the same file more than once and the
ciphertext will always be completely different.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="cm">/* ... */</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">cover</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="n">ctr</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">ctr</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">ctr</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">ctr</span><span class="p">[</span><span class="mi">3</span><span class="p">]};</span>
    <span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">cover</span><span class="p">);</span>
    <span class="n">block</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">0</span><span class="p">]</span> <span class="o">^=</span> <span class="n">cover</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="n">block</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">^=</span> <span class="n">cover</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="n">block</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="o">^=</span> <span class="n">cover</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
    <span class="n">block</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">3</span><span class="p">]</span> <span class="o">^=</span> <span class="n">cover</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
    <span class="n">increment</span><span class="p">(</span><span class="n">ctr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
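<p>The loop above works on whole 16-byte blocks. To show why no padding is needed, here is a hedged, byte-oriented sketch in which a short final block simply uses fewer keystream bytes. So that it compiles on its own, it bundles a compact copy of the XXTEA reference algorithm (Wheeler and Needham’s, assumed here rather than taken from xxtea), and <code class="language-plaintext highlighter-rouge">ctr_xor</code> is a hypothetical name.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Compact XXTEA encryption for n = 4 words (19 rounds), reference style. */
static void xxtea128_encrypt(const uint32_t k[4], uint32_t v[4])
{
    uint32_t sum = 0, z = v[3];
    for (int r = 0; r < 19; r++) {
        sum += 0x9e3779b9u;
        unsigned e = (sum >> 2) & 3;
        for (unsigned p = 0; p < 4; p++) {
            uint32_t y = v[(p + 1) & 3];
            z = v[p] += ((z>>5 ^ y<<2) + (y>>3 ^ z<<4)) ^
                        ((sum ^ y) + (k[p ^ e] ^ z));
        }
    }
}

static void increment(uint32_t ctr[4])
{
    if (!++ctr[0]) if (!++ctr[1]) if (!++ctr[2]) ++ctr[3];
}

/* XOR len bytes of buf with the keystream starting at ctr, advancing ctr.
 * The same call encrypts and decrypts. A final partial block just uses
 * fewer keystream bytes, so no padding is required. */
void ctr_xor(const uint32_t key[4], uint32_t ctr[4],
             unsigned char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += 16) {
        uint32_t cover[4] = {ctr[0], ctr[1], ctr[2], ctr[3]};
        xxtea128_encrypt(key, cover);
        increment(ctr);
        size_t n = len - off < 16 ? len - off : 16;
        for (size_t i = 0; i < n; i++) {
            /* little endian: keystream byte i is byte i%4 of word i/4 */
            buf[off + i] ^= (unsigned char)(cover[i/4] >> (8*(i%4)));
        }
    }
}
```

<p>Calling it a second time with the same key and starting counter restores the original bytes, since XORing with the same keystream cancels out.</p>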

<h3 id="hash-function">Hash function</h3>

<p>That’s encryption, but there’s still the matter of <em>authentication</em> and a <em>key
derivation function</em> (KDF). To deal with both I’ll need to devise a hash
function. Since I’m only using the one primitive, somehow I need to build
a hash function from a block cipher. Fortunately there’s a tool for doing
just that: the <a href="https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction">Merkle–Damgård construction</a>.</p>

<p>Recall that <code class="language-plaintext highlighter-rouge">xxtea128_encrypt</code> accepts two 128-bit inputs and returns a
128-bit result. In other words, it <em>compresses</em> 256 bits into 128 bits: a
compression function. The two 128-bit inputs are cryptographically
combined into one 128-bit result. I can repeat this operation to fold an
arbitrary number of 128-bit inputs into a 128-bit hash result.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="o">*</span><span class="n">input</span> <span class="o">=</span> <span class="cm">/* ... */</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">hash</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">};</span>
<span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">input</span> <span class="o">+</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">hash</span><span class="p">);</span>
<span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">input</span> <span class="o">+</span>  <span class="mi">4</span><span class="p">,</span> <span class="n">hash</span><span class="p">);</span>
<span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">input</span> <span class="o">+</span>  <span class="mi">8</span><span class="p">,</span> <span class="n">hash</span><span class="p">);</span>
<span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">input</span> <span class="o">+</span> <span class="mi">12</span><span class="p">,</span> <span class="n">hash</span><span class="p">);</span>
<span class="c1">// ...</span>
</code></pre></div></div>

<p>Note how the input is the key, not the block. The hash state is repeatedly
encrypted using the hash inputs as the key, mixing hash state and input.
When the input is exhausted, that block is the result. Sort of.</p>

<p>I used zero for the initial hash state in my example, but it will be more
challenging to attack if the initial state is something random. <a href="/blog/2017/09/15/">Like
Blowfish</a>, in xxtea I chose the first 128 bits of the fractional part
of pi:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span>
<span class="nf">xxtea128_hash_init</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span>
<span class="p">{</span>
    <span class="cm">/* first 32 hexadecimal digits of pi */</span>
    <span class="n">ctx</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0x243f6a88</span><span class="p">;</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0x85a308d3</span><span class="p">;</span>
    <span class="n">ctx</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0x13198a2e</span><span class="p">;</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0x03707344</span><span class="p">;</span>
<span class="p">}</span>

<span class="cm">/* Mix one block into the hash state. */</span>
<span class="kt">void</span>
<span class="nf">xxtea128_hash_update</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span> <span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">block</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span>
<span class="p">{</span>
    <span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There are still a couple of problems. First, what if the input isn’t a
multiple of the block size? This time I <em>do</em> need a padding scheme to fill
out that last block. In this case I pad it with bytes where each byte is
the number of padding bytes. For instance, <code class="language-plaintext highlighter-rouge">helloworld</code> becomes, roughly
speaking, <code class="language-plaintext highlighter-rouge">helloworld666666</code>.</p>

<p>That creates a different problem: This will have the same hash result as
an input that actually ends with these bytes. So the second rule is that
there is always a padding block, even if that block is 100% padding.</p>
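<p>A minimal sketch of these two rules applied to a whole message, assuming the padding works exactly as described (it matches PKCS#7-style padding). <code class="language-plaintext highlighter-rouge">pad_message</code> is a hypothetical helper; in xxtea the padding happens inside <code class="language-plaintext highlighter-rouge">xxtea128_hash_final</code>, shown below.</p>

```c
#include <stddef.h>
#include <string.h>

/* Copy msg into out and append padding: every padding byte holds the
 * padding length, and there is ALWAYS at least one padding byte, so a
 * message that exactly fills its last block gets a whole extra block of
 * sixteen 0x10 bytes. out must have room for len + 16 bytes.
 * Returns the padded length, always a positive multiple of 16. */
size_t pad_message(unsigned char *out, const unsigned char *msg, size_t len)
{
    size_t npad = 16 - len % 16;  /* 1..16, never 0 */
    memcpy(out, msg, len);
    memset(out + len, (int)npad, npad);
    return len + npad;
}
```

<p>For example, padding <code class="language-plaintext highlighter-rouge">helloworld</code> yields 16 bytes ending in six 0x06 bytes. The 16-byte message <code class="language-plaintext highlighter-rouge">helloworld</code> followed by six literal 0x06 bytes pads to 32 bytes instead, so the two inputs no longer hash the same.</p>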

<p>Another problem is that the Merkle–Damgård construction is prone to
<em>length-extension attacks</em>. Anyone can take my hash result and continue
appending additional data without knowing what came before. If I’m using
this hash to authenticate the ciphertext, someone could, for example, use
this attack to append arbitrary data to the end of messages.</p>

<p>Some important hash functions, such as the most common forms of SHA-2, are
vulnerable to length-extension attacks. Keeping this issue in mind, I
could address it later using HMAC, but I have an idea for nipping this in
the bud now. Before mixing the padding block into the hash state, I swap
the two middle words:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Append final raw-byte block to hash state. */</span>
<span class="kt">void</span>
<span class="nf">xxtea128_hash_final</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">4</span><span class="p">],</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">int</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">len</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">);</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">tmp</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
    <span class="n">memset</span><span class="p">(</span><span class="n">tmp</span><span class="p">,</span> <span class="mi">16</span><span class="o">-</span><span class="n">len</span><span class="p">,</span> <span class="mi">16</span><span class="p">);</span>
    <span class="n">memcpy</span><span class="p">(</span><span class="n">tmp</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
    <span class="kt">uint32_t</span> <span class="n">k</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="n">loadu32</span><span class="p">(</span><span class="n">tmp</span> <span class="o">+</span>  <span class="mi">0</span><span class="p">),</span> <span class="n">loadu32</span><span class="p">(</span><span class="n">tmp</span> <span class="o">+</span>  <span class="mi">4</span><span class="p">),</span>
        <span class="n">loadu32</span><span class="p">(</span><span class="n">tmp</span> <span class="o">+</span>  <span class="mi">8</span><span class="p">),</span> <span class="n">loadu32</span><span class="p">(</span><span class="n">tmp</span> <span class="o">+</span> <span class="mi">12</span><span class="p">),</span>
    <span class="p">};</span>
    <span class="cm">/* swap middle words to break length extension attacks */</span>
    <span class="kt">uint32_t</span> <span class="n">swap</span> <span class="o">=</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="n">ctx</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">ctx</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
    <span class="n">ctx</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">swap</span><span class="p">;</span>
    <span class="n">xxtea128_encrypt</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This operation “ties off” the last block so that the hash can’t be
extended with more input. <em>Or so I hope.</em> This is my own invention, and so
it may not actually work right. Again, this is for fun and learning!</p>

<p><strong>Update</strong>: Aristotle Pagaltzis pointed out that when these two words are
identical the hash result will be unchanged, leaving it vulnerable to
length extension attack. This occurs about once every 2<sup>32</sup>
messages, which is far too small a security margin.</p>

<h4 id="caveats">Caveats</h4>

<p>Despite all that care, there are still two more potential weaknesses.</p>

<p>First, XXTEA was never designed to be used with the Merkle–Damgård
construction. I assume attackers can modify files I will decrypt, and so
the hash input is mostly under the control of attackers, meaning
they control the cipher key. Ciphers are normally designed assuming the
key is not under hostile control. This might be vulnerable to related-key
attacks.</p>

<p>As will be discussed below, I use this custom hash function in two ways.
In one, the input is not controlled by attackers, so this is a non-issue.
In the second, the hash state is completely unknown to the attacker before
they control the input, which I believe mitigates any issues.</p>

<p>Second, a 128-bit hash state is a bit small these days. For very large
inputs, the chance of <a href="/blog/2019/07/22/">collision via the birthday paradox</a> is a
practical issue.</p>

<p>In xxtea, digests are only computed over a few megabytes of input at a time
at most, even when encrypting giant files, so a 128-bit state should be
fine.</p>
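<p>To put rough numbers on this, treat each intermediate 128-bit hash state as an independent, uniformly random value (an idealization, not a proven property of XXTEA) and assume a one megabyte chunk is 2<sup>20</sup> bytes. For n hashed blocks, the standard birthday approximation gives:</p>

```latex
P_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{129}} \approx \frac{n^2}{2^{129}}

n = \frac{2^{20}\ \text{bytes}}{2^{4}\ \text{bytes/block}} = 2^{16}
\;\Rightarrow\; P_{\text{collision}} \approx \frac{2^{32}}{2^{129}} = 2^{-97}
```

<p>Even the 64 MiB KDF input (2<sup>22</sup> blocks) works out to about 2<sup>-85</sup>. A collision only becomes likely once n is on the order of 2<sup>64</sup> blocks.</p>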

<h3 id="key-derivation">Key derivation</h3>

<p>The user will supply a password and somehow I need to turn that into a
128-bit key.</p>

<ol>
  <li>What if the password is shorter than 128 bits?</li>
  <li>What if the password is longer than 128 bits?</li>
  <li>It’s safer for the cipher if the raw password isn’t used directly.</li>
  <li>I’d like offline, brute force attacks to be expensive.</li>
</ol>

<p>The first three can be resolved by running the passphrase through the hash
function, using it as a key derivation function. What about the last item?
Rather than hash the password once, I concatenate it, including its null
terminator, repeatedly until it reaches a certain number of bytes
(hardcoded to 64 MiB, see <code class="language-plaintext highlighter-rouge">COST</code>), and hash that. That’s a computational
workload that attackers must repeat when guessing passwords.</p>

<p>To avoid timing attacks based on the password length, I precompute all
possible block arrangements before starting the hash — all the different
ways the password might appear concatenated across 16-byte blocks. Blocks
may be redundantly computed if necessary to make this part constant time.
The hash is fed entirely from these precomputed blocks.</p>
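<p>The number of distinct blocks is bounded. If the password plus its null terminator is L bytes, the concatenated stream repeats every lcm(L, 16) bytes, that is, every L / gcd(L, 16) blocks, which is at most L. Below is a minimal sketch of that precomputation under stated assumptions: <code class="language-plaintext highlighter-rouge">MAXPASS</code> is a hypothetical length cap, and <code class="language-plaintext highlighter-rouge">password_blocks</code> and <code class="language-plaintext highlighter-rouge">gcd</code> are illustrative names, not the actual xxtea code. It always fills the full table, but I haven’t audited <code class="language-plaintext highlighter-rouge">strlen</code> or the <code class="language-plaintext highlighter-rouge">%</code> operations for timing.</p>

```c
#include <stddef.h>
#include <string.h>

#define MAXPASS 256  /* hypothetical cap on password length, including NUL */

static size_t gcd(size_t a, size_t b)
{
    while (b) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Fill table with the first MAXPASS 16-byte blocks of the endless stream
 * pw NUL pw NUL pw NUL ... and return the period in blocks, which is
 * lcm(len, 16)/16 and never exceeds len. Precondition: strlen(pw) < MAXPASS.
 * The whole table is always computed, even blocks that repeat, so the
 * amount of work does not depend on the password length. */
size_t password_blocks(unsigned char table[MAXPASS][16], const char *pw)
{
    size_t len = strlen(pw) + 1;  /* include the null terminator */
    for (size_t j = 0; j < MAXPASS; j++) {
        for (size_t k = 0; k < 16; k++) {
            table[j][k] = (unsigned char)pw[(16*j + k) % len];
        }
    }
    return len / gcd(len, 16);  /* lcm(len, 16) / 16 */
}
```

<p>The KDF would then feed <code class="language-plaintext highlighter-rouge">table[i % period]</code> into the hash for each of the <code class="language-plaintext highlighter-rouge">COST / 16</code> blocks. For example, <code class="language-plaintext highlighter-rouge">hunter2</code> plus its terminator is exactly 8 bytes, so every block is identical and the period is 1, while <code class="language-plaintext highlighter-rouge">ab</code> (3 bytes with the terminator) repeats every 3 blocks.</p>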

<p>To defend against rainbow tables and the like — as well as make it harder
to attack other parts of the message construction — the initialization
vector is used as a salt, fed into the hash before the password
concatenation.</p>

<p>Unfortunately this KDF isn’t <em>memory-hard</em>, so attackers can exploit economies
of scale (GPUs, custom hardware) to strengthen their attacks. A memory-hard
KDF requires lots of memory to compute the key, making memory an expensive
and limiting factor for attackers. However, memory-hard KDFs are complex and
difficult to design, and I made the trade-off for simplicity.</p>

<h3 id="authentication">Authentication</h3>

<p>When I say the encryption is <em>authenticated</em> I mean that it should not be
possible for anyone to tamper with the ciphertext undetected without
already knowing the key. This is typically accomplished by computing a
keyed hash digest and appending it to the message: a <em>message authentication
code</em> (MAC). Since it’s keyed, only someone who knows the key can compute
the digest, and so attackers can’t spoof the MAC.</p>

<p>This is where length-extension attacks come into play: With an improperly
constructed MAC, an attacker could append input without knowing the key.
Fortunately my hash function was designed to resist length-extension attacks!</p>

<p>An alternative is to use an authenticated block mode such as <a href="https://en.wikipedia.org/wiki/Galois/Counter_Mode">GCM</a>,
which is still CTR mode at its core. Unfortunately, this is complicated,
and, unlike plain CTR, it would take me a long time to convince myself I
got it right. So instead I used CTR mode and my hash function in a
straightforward way.</p>

<p>At this point there’s a question of what exactly you input into the hash
function. Do you hash the plaintext or do you hash the ciphertext? It’s
tempting to do the former since it’s (generally) not available to
attackers, and would presumably make it harder to attack. This is a
mistake. Always compute the MAC over the ciphertext, a.k.a. encrypt then
authenticate.</p>

<p>This is called <a href="https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html">the Doom Principle</a>. Computing the MAC on the
plaintext means that recipients must decrypt untrusted ciphertext before
authenticating it. This is bad because messages should be authenticated
before decryption. So that’s exactly what xxtea does. It also happens to
be the simplest option.</p>
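<p>In practice, authenticating first means comparing the computed tag against the stored one and only touching the ciphertext on a match. The comparison itself shouldn’t exit early, or its running time could leak how many leading bytes matched. The article doesn’t say how xxtea compares tags, so this is a hedged sketch of the usual constant-time approach, with <code class="language-plaintext highlighter-rouge">mac_equal</code> as a hypothetical name:</p>

```c
#include <stddef.h>

/* Return 1 if the two 16-byte tags are equal, 0 otherwise. Every byte is
 * examined no matter where the first mismatch occurs. */
int mac_equal(const unsigned char a[16], const unsigned char b[16])
{
    unsigned char diff = 0;
    for (size_t i = 0; i < 16; i++) {
        diff |= (unsigned char)(a[i] ^ b[i]);
    }
    return diff == 0;
}
```

<p>A decryptor would call this for each chunk and XOR that chunk with the keystream only if it returns 1.</p>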

<p>We have a hash function, but to compute a MAC we need a keyed hash
function. Again, I do the simplest thing that I believe isn’t broken:
concatenate the key with the ciphertext. Or more specifically:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MAC = hash(key || ctr || ciphertext)
</code></pre></div></div>

<p><strong>Update</strong>: <a href="https://lists.sr.ht/~skeeto/public-inbox/%3C5b3ef28a-c8b7-2835-9a56-6968aca5606c%40gmail.com%3E">Dimitrije Erdeljan explains why this is broken</a> and
how to fix it. Given a valid MAC, attackers can forge arbitrary messages.</p>

<p>The counter is there because xxtea uses chunked authentication with one megabyte
chunks. It can authenticate a chunk at a time, which allows it to decrypt,
with authentication, arbitrary amounts of ciphertext in a fixed amount of
memory. The worst that can happen is truncation between chunks, an
acceptable trade-off. The counter ensures that each chunk MAC is uniquely
keyed and that chunks appear in order.</p>

<p>It’s also important to note that the counter is appended <em>after</em> the key.
The counter is under hostile control — they can choose the IV — and having
the key there first means they have no information about the hash state.</p>

<p>All chunks are one megabyte except for the last chunk, which is always
shorter, signaling the end of the message. It may even be just a MAC and
zero-length ciphertext. This avoids nasty issues with parsing potentially
unauthenticated length fields and whatnot. Just stop successfully at the
first short, authenticated chunk.</p>

<p>Some will likely have spotted it, but a potential weakness is that I’m
using the same key for both encryption and authentication. These are
normally two different keys. This is disastrous in certain cases <a href="https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/">like
CBC-MAC</a>, but I believe it’s alright here. It would be easy to
compute a separate MAC key, but I opted for simplicity.</p>

<h3 id="file-format">File format</h3>

<p>In my usual style, encrypted files have no distinguishing headers or
fields. They just look like a random block of data. A file begins with the
16-byte IV, then a sequence of zero or more one megabyte chunks, ending
with a short chunk. It’s indistinguishable from <code class="language-plaintext highlighter-rouge">/dev/random</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[IV][1MiB || MAC][1MiB || MAC][&lt;1MiB || MAC]
</code></pre></div></div>
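<p>Since the last chunk is always short, the plaintext length follows from the file size alone. A small sketch of that arithmetic, assuming “one megabyte” means 2<sup>20</sup> bytes of ciphertext per chunk and a 16-byte MAC per chunk; the function names are hypothetical:</p>

```c
#include <stdint.h>

#define CHUNK  ((uint64_t)1 << 20)  /* assumed: 1 MiB of ciphertext per chunk */
#define IVLEN  16                   /* 128-bit IV at the front of the file */
#define MACLEN 16                   /* 128-bit MAC after each chunk */

/* Encrypted file size for n bytes of plaintext. Every full chunk carries a
 * MAC, and there is always one final short chunk, possibly with zero bytes
 * of ciphertext, that carries the last MAC. */
uint64_t encrypted_size(uint64_t n)
{
    return IVLEN + n + MACLEN*(n/CHUNK + 1);
}

/* Inverse: the plaintext length for a given file size, or UINT64_MAX if
 * no plaintext length produces that size. */
uint64_t plaintext_size(uint64_t size)
{
    if (size < IVLEN + MACLEN) return UINT64_MAX;
    uint64_t body = size - IVLEN;
    uint64_t full = body / (CHUNK + MACLEN);  /* complete chunks */
    uint64_t tail = body % (CHUNK + MACLEN);  /* final short chunk */
    if (tail < MACLEN) return UINT64_MAX;     /* no room for its MAC */
    return full*CHUNK + (tail - MACLEN);
}
```

<p>For instance, an empty plaintext produces a 32-byte file (the IV plus a lone MAC), and a plaintext of exactly 1 MiB produces one full chunk followed by an empty final chunk.</p>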

<p>If the user types the incorrect password, it will be discovered when
authenticating the first chunk (read: immediately). This saves on a
dedicated check at the beginning of the file, though it means it’s not
possible to distinguish between a bad password and a modified file.</p>

<p>I know my design has weaknesses as a result of artificial, self-imposed
constraints and deliberate trade-offs, but I’m curious if I’ve made any
glaring mistakes with practical consequences.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>On-the-fly Linear Congruential Generator Using Emacs Calc</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/11/19/"/>
    <id>urn:uuid:13e56720-ef3a-4fa4-a4ff-0a6fef914504</id>
    <updated>2019-11-19T01:17:50Z</updated>
    <category term="emacs"/><category term="crypto"/><category term="optimization"/><category term="c"/><category term="java"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<p>I regularly make throwaway “projects” and do a surprising amount of
programming in <code class="language-plaintext highlighter-rouge">/tmp</code>. For Emacs Lisp, the equivalent is the
<code class="language-plaintext highlighter-rouge">*scratch*</code> buffer. These are places where I can make a mess, and the
mess usually gets cleaned up before it becomes a problem. A lot of my
established projects (<a href="/blog/2019/03/22/">ex</a>.) start out in volatile storage and
only graduate to more permanent storage once the concept has proven
itself.</p>

<p>Throughout my whole career, this sort of throwaway experimentation has
been an important part of my personal growth, and I try to <a href="/blog/2016/09/02/">encourage it
in others</a>. Even if the idea I’m trying doesn’t pan out, I usually
learn something new, and occasionally it translates into an article here.</p>

<p>I also enjoy small programming challenges. One of the most abused
tools in my mental toolbox is the Monte Carlo method, and I readily
apply it to solve toy problems. Even beyond this, random number
generators are frequently a useful tool (<a href="/blog/2017/04/27/">1</a>, <a href="/blog/2019/07/22/">2</a>), so I
find myself reaching for one all the time.</p>

<p>Nearly every programming language comes with a pseudo-random number
generation function or library. Unfortunately the language’s standard
PRNG is usually a poor choice (C, <a href="https://arvid.io/2018/06/30/on-cxx-random-number-generator-quality/">C++</a>, <a href="https://lowleveldesign.org/2018/08/15/randomness-in-net/">C#</a>, <a href="https://grokbase.com/t/gg/golang-nuts/155f6kbb7a/go-nuts-why-are-high-bits-used-by-math-rand-helpers-instead-of-low-ones">Go</a>).
It’s probably mediocre quality, <a href="/blog/2018/05/27/">slower than it needs to be</a>
(<a href="https://grokbase.com/t/gg/golang-nuts/155f6kbb7a/go-nuts-why-are-high-bits-used-by-math-rand-helpers-instead-of-low-ones">also</a>), <a href="https://lists.freebsd.org/pipermail/svn-src-head/2013-July/049068.html">lacks reliable semantics or behavior between
implementations</a>, or is missing some other property I want. So I’ve
long been a fan of <em>BYOPRNG:</em> Bring Your Own Pseudo-random Number
Generator. Just embed a generator with the desired properties directly
into the program. The <a href="/blog/2017/09/21/">best non-cryptographic PRNGs today</a> are
tiny and exceptionally friendly to embedding. Though, depending on what
you’re doing, you might <a href="/blog/2019/04/30/">need to be creative about seeding</a>.</p>

<h3 id="crafting-a-prng">Crafting a PRNG</h3>

<p>On occasion I don’t have an established, embeddable PRNG in reach, and
I have yet to commit xoshiro256** to memory. Or maybe I want to use
a totally unique PRNG for a particular project. In these cases I make
one up. With just a bit of know-how it’s not too difficult.</p>

<p>Probably the easiest decent PRNG to code from scratch is the venerable
<a href="https://en.wikipedia.org/wiki/Linear_congruential_generator">Linear Congruential Generator</a> (LCG). It’s a simple recurrence
relation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x[1] = (x[0] * A + C) % M
</code></pre></div></div>

<p>That’s trivial to remember once you know the details. You only need to
choose appropriate values for <code class="language-plaintext highlighter-rouge">A</code>, <code class="language-plaintext highlighter-rouge">C</code>, and <code class="language-plaintext highlighter-rouge">M</code>. Done correctly, it
will be a <em>full-period</em> generator: a generator that visits every
number from 0 to <code class="language-plaintext highlighter-rouge">M - 1</code> exactly once before repeating, in some
permuted order. The seed, the value of <code class="language-plaintext highlighter-rouge">x[0]</code>, chooses a starting
position in this (looping) permutation.</p>

<p><code class="language-plaintext highlighter-rouge">M</code> has a natural, obvious choice: a power of two matching the range of
operands, such as 2^32 or 2^64. With this choice the modulo operation is
free, since unsigned integer arithmetic naturally wraps at that power of two.</p>

<p>Choosing <code class="language-plaintext highlighter-rouge">C</code> also isn’t difficult. It must be co-prime with <code class="language-plaintext highlighter-rouge">M</code>, and
since <code class="language-plaintext highlighter-rouge">M</code> is a power of two, any odd number is valid. Even 1. In
theory choosing a small value like 1 is faster since the compiler
won’t need to embed a large integer in the code, but this difference
doesn’t show up in any micro-benchmarks I tried. If you want a cool,
unique generator, then choose a large random integer. More on that
below.</p>

<p>The tricky value is <code class="language-plaintext highlighter-rouge">A</code>, and getting it right is the linchpin of the
whole LCG. It must be coprime with <code class="language-plaintext highlighter-rouge">M</code> (i.e. not even), and, for a
full-period generator, <code class="language-plaintext highlighter-rouge">A-1</code> must be divisible by four. For better
results, <code class="language-plaintext highlighter-rouge">A-1</code> should not be divisible by 8. A good choice is a prime
number that satisfies these properties.</p>
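<p>These conditions are easy to check empirically on a small modulus. This sketch assumes M = 2<sup>16</sup> so the whole cycle can be walked quickly, and counts the steps before the generator returns to its seed; <code class="language-plaintext highlighter-rouge">lcg16_period</code> is a hypothetical helper. The “not divisible by 8” rule concerns output quality rather than period, so this check cannot demonstrate it.</p>

```c
#include <stdint.h>

/* Walk x = (x*a + c) mod 2^16 from seed 0 and count steps until it returns
 * to the seed. With odd a the map is a bijection, so it always returns;
 * the step limit only guards against an even a. */
uint32_t lcg16_period(uint16_t a, uint16_t c)
{
    uint16_t x = 0;
    uint32_t n = 0;
    do {
        /* widen before multiplying to avoid signed int overflow */
        x = (uint16_t)((uint32_t)x*a + c);
        n++;
    } while (x != 0 && n <= 65536);
    return n;
}
```

<p>With <code class="language-plaintext highlighter-rouge">a = 5</code> and an odd <code class="language-plaintext highlighter-rouge">c</code> the period is the full 65,536. With <code class="language-plaintext highlighter-rouge">a = 3</code> (so <code class="language-plaintext highlighter-rouge">a-1</code> isn’t divisible by four) or with an even <code class="language-plaintext highlighter-rouge">c</code>, it falls short.</p>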

<p>If your operands are 64-bit integers, or larger, how are you going to
generate a prime number?</p>

<h4 id="primes-from-emacs-calc">Primes from Emacs Calc</h4>

<p>Emacs Calc can solve this problem. I’ve <a href="/blog/2009/06/23/">noted before</a> how
featureful it is. It has arbitrary precision, random number
generation, and primality testing. It’s everything we need to choose
<code class="language-plaintext highlighter-rouge">A</code>. (In fact, this is nearly identical to <a href="/blog/2015/10/30/">the process I used to
implement RSA</a>.) For this example I’m going to generate a 64-bit
LCG for the C programming language, but it’s easy to use whatever
width you like and mostly whatever language you like. If you wanted a
<a href="http://www.pcg-random.org/posts/does-it-beat-the-minimal-standard.html">minimal standard 128-bit LCG</a>, this will still work.</p>

<p>Start by opening up Calc with <code class="language-plaintext highlighter-rouge">M-x calc</code>, then:</p>

<ol>
  <li>Push <code class="language-plaintext highlighter-rouge">2</code> on the stack</li>
  <li>Push <code class="language-plaintext highlighter-rouge">64</code> on the stack</li>
  <li>Press <code class="language-plaintext highlighter-rouge">^</code>, computing 2^64 and pushing it on the stack</li>
  <li>Press <code class="language-plaintext highlighter-rouge">k r</code> to generate a random number in this range</li>
  <li>Press <code class="language-plaintext highlighter-rouge">d r 16</code> to switch to hexadecimal display</li>
  <li>Press <code class="language-plaintext highlighter-rouge">k n</code> to find the next prime following the random value</li>
  <li>Repeat step 6 until you get a number that ends with <code class="language-plaintext highlighter-rouge">5</code> or <code class="language-plaintext highlighter-rouge">D</code></li>
  <li>Press <code class="language-plaintext highlighter-rouge">k p</code> a few times to avoid false positives.</li>
</ol>

<p>What’s left on the stack is your <code class="language-plaintext highlighter-rouge">A</code>! If you want a random value for
<code class="language-plaintext highlighter-rouge">C</code>, you can follow a similar process. Heck, make it prime, too!</p>

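<p>The reason for using hexadecimal (step 5) and looking for <code class="language-plaintext highlighter-rouge">5</code> or <code class="language-plaintext highlighter-rouge">D</code>
(step 7) is that such numbers satisfy both of the important properties
for <code class="language-plaintext highlighter-rouge">A-1</code>. A final hex digit of 5 (binary 0101) or D (binary 1101) means
the low three bits of <code class="language-plaintext highlighter-rouge">A</code> are 101, so <code class="language-plaintext highlighter-rouge">A-1</code> ends in binary 100: divisible
by four, but not by eight.</p>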

<p>Calc doesn’t try to factor your random integer. Instead it uses the
<a href="https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test">Miller–Rabin primality test</a>, a probabilistic test that, itself,
requires random numbers. It has false positives but no false negatives.
The false positives can be mitigated by repeating the test multiple
times, hence step 8.</p>
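<p>Steps 6 through 8 can be reproduced outside of Calc, too. The sketch
below swaps in a <em>deterministic</em> variant of Miller–Rabin: rather than
Calc’s random bases, it always tests the first twelve primes as bases, a
set known to give exact answers for every 64-bit input. It assumes GCC or
Clang for the <code class="language-plaintext highlighter-rouge">unsigned __int128</code> extension used in the modular multiply,
and the function names and starting value are hypothetical stand-ins for
Calc’s <code class="language-plaintext highlighter-rouge">k n</code> and <code class="language-plaintext highlighter-rouge">k r</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* (a * b) % m without overflow, via the GCC/Clang 128-bit extension. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

/* b^e % m by square-and-multiply. */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e) {
        if (e &amp; 1) {
            r = mulmod(r, b, m);
        }
        b = mulmod(b, b, m);
        e &gt;&gt;= 1;
    }
    return r;
}

/* Miller-Rabin with fixed bases: exact for all 64-bit n. */
static int is_prime(uint64_t n)
{
    static const uint64_t bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    int nbases = sizeof(bases) / sizeof(bases[0]);
    if (n &lt; 2) {
        return 0;
    }
    for (int i = 0; i &lt; nbases; i++) {
        if (n % bases[i] == 0) {
            return n == bases[i];  /* a small prime, or has a small factor */
        }
    }
    /* Write n-1 as d * 2^s with d odd. */
    uint64_t d = n - 1;
    int s = 0;
    while ((d &amp; 1) == 0) {
        d &gt;&gt;= 1;
        s++;
    }
    for (int i = 0; i &lt; nbases; i++) {
        uint64_t x = powmod(bases[i], d, n);
        if (x == 1 || x == n - 1) {
            continue;
        }
        int witness = 1;  /* assume bases[i] proves n composite */
        for (int j = 1; j &lt; s &amp;&amp; witness; j++) {
            x = mulmod(x, x, n);
            if (x == n - 1) {
                witness = 0;
            }
        }
        if (witness) {
            return 0;
        }
    }
    return 1;
}

/* Like Calc's k n: the smallest prime greater than n (no overflow check). */
static uint64_t next_prime(uint64_t n)
{
    do {
        n++;
    } while (!is_prime(n));
    return n;
}

int main(void)
{
    uint64_t x = UINT64_C(0x7c3c3267d015c000);  /* arbitrary stand-in for k r */
    do {
        x = next_prime(x);  /* step 6 */
    } while ((x &amp; 0xf) != 0x5 &amp;&amp; (x &amp; 0xf) != 0xd);  /* step 7 */
    printf("A = 0x%016llx\n", (unsigned long long)x);
    return 0;
}
</code></pre></div></div>

<p>With this fixed set of bases the answer is exact rather than
probabilistic, so the repetition in step 8 has no counterpart here.</p>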

<p>Trying this all out right now, I got this implementation (in C):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">lcg1</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

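<p>However, we can still do a little better. Outputting the entire state
doesn’t have great results because, with a power-of-two modulus, the low
bits are weak. With odd <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">C</code>, the lowest bit simply alternates
between 0 and 1. Instead it’s better to create a <em>truncated</em> LCG and
only return some portion of the most significant bits.</p>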

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">lcg2</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With only 64 bits of state, this won’t quite pass <a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a>, but the
results are pretty reasonable for most purposes.</p>
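<p>A quick way to see why truncation helps: with a power-of-two modulus,
the lowest <em>k</em> bits of the state form their own LCG modulo 2^<em>k</em>,
so they repeat at least every 2^<em>k</em> outputs. This illustrative sketch
(hypothetical helper names) brute-forces those periods for the constants
in <code class="language-plaintext highlighter-rouge">lcg1</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

static uint64_t s;

/* One step of the full-state LCG from lcg1. */
static uint64_t step(void)
{
    s = s*UINT64_C(0x7c3c3267d015ceb5) + UINT64_C(0x24bd2d95276253a9);
    return s;
}

/* Count outputs until the lowest k bits repeat their first value. Those
 * bits form a full-period LCG modulo 2^k, so this is their period. */
static unsigned long low_bits_period(int k)
{
    uint64_t mask = (UINT64_C(1) &lt;&lt; k) - 1;
    s = 0;
    uint64_t first = step() &amp; mask;
    unsigned long n = 1;
    while ((step() &amp; mask) != first) {
        n++;
    }
    return n;
}

int main(void)
{
    for (int k = 1; k &lt;= 20; k++) {
        printf("lowest %2d bits: period %lu\n", k, low_bits_period(k));
    }
    return 0;
}
</code></pre></div></div>

<p>Every line reports a period of exactly 2^<em>k</em>, far shorter than the
2^64 period of the full state, which is why <code class="language-plaintext highlighter-rouge">lcg2</code> keeps only the
upper half.</p>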

<p>But we can still do better without needing to remember much more than
this.</p>

<h3 id="appending-permutation">Appending permutation</h3>

<p>A <a href="http://www.pcg-random.org/">Permuted Congruential Generator</a> (PCG) is really just a
truncated LCG with a permutation applied to its output. Like LCGs
themselves, there are arbitrarily many variations. The “official”
implementation has a <a href="/blog/2018/02/07/">data-dependent shift</a>, for which I can
never remember the details. Fortunately a couple of simple,
easy-to-remember transformations are sufficient: basically anything I used
<a href="/blog/2018/07/31/">while prospecting for hash functions</a>. I love xorshifts, so
let’s add one of those:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">pcg1</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">=</span> <span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="n">r</span> <span class="o">^=</span> <span class="n">r</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is a big improvement, but it still fails one BigCrush test. As
they say, when xorshift isn’t enough, use xorshift-multiply! Below I
generated a 32-bit prime for the multiplier, but multiplying by any odd
integer is a valid permutation of 32-bit values.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">pcg2</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">=</span> <span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="n">r</span> <span class="o">^=</span> <span class="n">r</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">r</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0x60857ba9</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This passes BigCrush, and I can reliably build a new one entirely from
scratch using Calc any time I need it.</p>
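<p>Both output steps in <code class="language-plaintext highlighter-rouge">pcg2</code> are bijections on 32-bit values, so they
scramble the truncated output without throwing any of it away. One way
to check is to undo them. In this sketch (hypothetical helper names), the
multiply is reversed with the modular inverse of the odd constant, found
by Newton’s iteration, and the xorshift is reversed by simply applying it
again, which works whenever the shift is at least half the word width.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define M UINT32_C(0x60857ba9)  /* the odd multiplier from pcg2 */

/* The output permutation from pcg2: xorshift, then multiply. */
static uint32_t permute(uint32_t r)
{
    r ^= r &gt;&gt; 16;
    r *= M;
    return r;
}

/* Inverse of odd a modulo 2^32 by Newton's iteration. Starting from
 * x = a is correct to 3 bits, and each step doubles that: 6, 12, 24, 48. */
static uint32_t modinv32(uint32_t a)
{
    uint32_t x = a;
    for (int i = 0; i &lt; 4; i++) {
        x *= 2 - a*x;
    }
    return x;
}

/* Undo permute(): multiply by the inverse, then redo the xorshift,
 * which is its own inverse since 16 is half of 32 bits. */
static uint32_t unpermute(uint32_t r)
{
    r *= modinv32(M);
    r ^= r &gt;&gt; 16;
    return r;
}

int main(void)
{
    printf("inverse of multiplier: 0x%08lx\n", (unsigned long)modinv32(M));
    for (uint32_t r = 0; r &lt; 5; r++) {
        uint32_t p = permute(r);
        printf("%lu -&gt; 0x%08lx -&gt; %lu\n",
               (unsigned long)r, (unsigned long)p, (unsigned long)unpermute(p));
    }
    return 0;
}
</code></pre></div></div>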

<h3 id="bonus-adapting-to-other-languages">Bonus: Adapting to other languages</h3>

<p>Sometimes it’s not so straightforward to adapt this technique to other
languages. For example, JavaScript has limited support for 32-bit
integer operations (enough for a poor 32-bit LCG) and no 64-bit integer
operations on plain numbers. However, <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt">BigInt</a> is now a thing, and it
makes a great 96- or 128-bit LCG easy to build.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">lcg</span><span class="p">(</span><span class="nx">seed</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">s</span> <span class="o">=</span> <span class="nx">BigInt</span><span class="p">(</span><span class="nx">seed</span><span class="p">);</span>
    <span class="k">return</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
        <span class="nx">s</span> <span class="o">*=</span> <span class="mh">0xef725caa331524261b9646cd</span><span class="nx">n</span><span class="p">;</span>
        <span class="nx">s</span> <span class="o">+=</span> <span class="mh">0x213734f2c0c27c292d814385</span><span class="nx">n</span><span class="p">;</span>
        <span class="nx">s</span> <span class="o">&amp;=</span> <span class="mh">0xffffffffffffffffffffffff</span><span class="nx">n</span><span class="p">;</span>
        <span class="k">return</span> <span class="nb">Number</span><span class="p">(</span><span class="nx">s</span> <span class="o">&gt;&gt;</span> <span class="mi">64</span><span class="nx">n</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
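<p>For comparison, C has no 96-bit integers either, but GCC and Clang
provide <code class="language-plaintext highlighter-rouge">unsigned __int128</code> as an extension, which is enough to mirror
the BigInt version: multiply and add modulo 2^128, mask down to 96 bits,
and return the top 32. This is a sketch under that compiler assumption,
and the function and macro names are hypothetical. Starting from a seed
of zero, the first output of both versions is just the top 32 bits of
the increment, <code class="language-plaintext highlighter-rouge">0x213734f2</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

typedef unsigned __int128 u128;  /* GCC/Clang extension */

/* Assemble a 128-bit constant from two 64-bit halves. */
#define U128(hi, lo) (((u128)(hi) &lt;&lt; 64) | (u128)(lo))

/* 96-bit truncated LCG using the same constants as the BigInt version. */
static uint32_t lcg96(u128 *s)
{
    const u128 a = U128(UINT64_C(0xef725caa), UINT64_C(0x331524261b9646cd));
    const u128 c = U128(UINT64_C(0x213734f2), UINT64_C(0xc0c27c292d814385));
    const u128 mask = U128(UINT64_C(0xffffffff), UINT64_C(0xffffffffffffffff));
    *s = ((*s)*a + c) &amp; mask;     /* wraps mod 2^128, then keeps 96 bits */
    return (uint32_t)(*s &gt;&gt; 64);  /* the top 32 of the 96 state bits */
}

int main(void)
{
    u128 s = 0;  /* seed */
    for (int i = 0; i &lt; 4; i++) {
        printf("%08lx\n", (unsigned long)lcg96(&amp;s));
    }
    return 0;
}
</code></pre></div></div>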

<p>Java doesn’t have unsigned integers, so how could you build the above
PCG in Java? Easy! First, remember that Java has two’s complement
semantics, including wrap around, and that two’s complement doesn’t
care about unsigned or signed for multiplication (or addition, or
subtraction). The result is identical. Second, the oft-forgotten <code class="language-plaintext highlighter-rouge">&gt;&gt;&gt;</code>
operator does an unsigned right shift. With these two tips:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">long</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>

<span class="kt">int</span> <span class="nf">pcg2</span><span class="o">()</span> <span class="o">{</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="mh">0x7c3c3267d015ceb5</span><span class="no">L</span> <span class="o">+</span> <span class="mh">0x24bd2d95276253a9</span><span class="no">L</span><span class="o">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="o">(</span><span class="kt">int</span><span class="o">)(</span><span class="n">s</span> <span class="o">&gt;&gt;&gt;</span> <span class="mi">32</span><span class="o">);</span>
    <span class="n">r</span> <span class="o">^=</span> <span class="n">r</span> <span class="o">&gt;&gt;&gt;</span> <span class="mi">16</span><span class="o">;</span>
    <span class="n">r</span> <span class="o">*=</span> <span class="mh">0x60857ba9</span><span class="o">;</span>
    <span class="k">return</span> <span class="n">r</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>

<p>So, in addition to the Calc step list above, you may need to know some
of the finer details of your target language.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Keyringless GnuPG</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/08/09/"/>
    <id>urn:uuid:c51e0800-9bf5-4b77-93d4-35480c40a0ba</id>
    <updated>2019-08-09T23:52:39Z</updated>
    <category term="crypto"/><category term="openpgp"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=20792472">on Hacker News</a>.</em></p>

<p>My favorite music player is <a href="https://audacious-media-player.org/">Audacious</a>. It follows the Winamp
Classic tradition of not trying to manage my music library. Instead it
waits patiently for me to throw files and directories at it. These
selections will be informally grouped into transient, disposable
playlists of whatever I fancy that day.</p>

<!--more-->

<p>This matters to me because my music collection is the result of around
25 years of hoarding music files from various sources including CD rips,
Napster P2P sharing, and, most recently, <a href="https://ytdl-org.github.io/youtube-dl/">YouTube downloads</a>. It’s
not well-organized, but it’s organized well enough. Each album has its
own directory, and related albums are sometimes grouped together under
a directory for a particular artist.</p>

<p>Over the years I’ve tried various music players, and some have either
wanted to manage this library or hide the underlying file-organized
nature of my collection. Both situations are annoying because I really
don’t want or need that abstraction. I’m going just fine thinking of
my music library in terms of files, thank you very much. Same goes for
ebooks.</p>

<p><strong>GnuPG is like a media player that wants to manage your whole music
library.</strong> Rather than MP3s, it’s crypto keys on a keyring. Nearly every
operation requires keys that have been imported into the keyring. Until
GnuPG 2.2.8 (June 2018), which added the <code class="language-plaintext highlighter-rouge">--show-keys</code> command, you
couldn’t even be sure what you were importing until after it was already
imported. Hopefully it wasn’t <a href="https://github.com/skeeto/pgp-poisoner">garbage</a>.</p>

<p>GnuPG <em>does</em> have a pretty good excuse. It’s oriented around the Web of
Trust model, and it can’t follow this model effectively without having
all the keys at once. However, even if you don’t buy into the Web of
Trust, the GnuPG interface still requires you to play by its rules.
Sometimes I’ve got a message, a signature, and a public key and I just
want to verify that they’re all consistent with each other, <em>damnit</em>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --import foo.asc
gpg: key 1A719EF63AEB2CFE: public key "foo" imported
gpg: Total number processed: 1
gpg:               imported: 1
$ gpg --verify --trust-model always message.txt.sig message.txt
gpg: Signature made Fri 09 Aug 2019 05:44:43 PM EDT
gpg:                using EDDSA key ...1A719EF63AEB2CFE
gpg: Good signature from "foo" [unknown]
gpg: WARNING: Using untrusted key!
$ gpg --batch --yes --delete-key 1A719EF63AEB2CFE
</code></pre></div></div>

<p>Three commands and seven lines of output when one of each would do.
Plus there’s a false warning: Wouldn’t an “always” trust model mean
that this key is indeed trusted?</p>

<h3 id="signify">Signify</h3>

<p>Compare this to <a href="https://www.openbsd.org/papers/bsdcan-signify.html">OpenBSD’s signify</a> (<a href="https://flak.tedunangst.com/post/signify">also</a>). There’s no
keyring, and it’s up to the user — or the program shelling out to
signify — to supply the appropriate key. It’s like the music player that
just plays whatever I give it. Here’s a simplified <a href="https://man.openbsd.org/signify">usage
overview</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>signify -G [-c comment] -p pubkey -s seckey
signify -S [-x sigfile] -s seckey -m message
signify -V [-x sigfile] -p pubkey -m message
</code></pre></div></div>

<p>When generating a new keypair (<code class="language-plaintext highlighter-rouge">-G</code>), the user must choose the
destination files for the public and secret keys. When signing a message
(a file), the user must supply the secret key and the message. When
verifying a file, the user must supply the public key and the message.
This is a popular enough model that <a href="https://jedisct1.github.io/minisign/">other, compatible implementations
with the same interface</a> have been developed.</p>

<p>Signify is deliberately incompatible with OpenPGP and uses its own
simpler, and less featureful, format. Wouldn’t it be nice to have a
similar interface to verify OpenPGP signatures?</p>

<h3 id="simplegpg">SimpleGPG</h3>

<p>Well, I thought so. So I put together a shell script that wraps GnuPG
and provides such an interface:</p>

<p><strong><a href="https://github.com/skeeto/simplegpg">SimpleGPG</a></strong></p>

<p>The interface is nearly identical to signify, and the GnuPG keyring is
hidden away as if it didn’t exist. The main difference is that the keys
and signatures produced and consumed by this tool are fully compatible
with OpenPGP. You could use this script without requiring anyone else to
adopt something new or different.</p>

<p>To avoid touching your real keyring, the script creates a temporary
keyring directory each time it’s run. The GnuPG option <code class="language-plaintext highlighter-rouge">--homedir</code>
instructs it to use this temporary keyring and ignore the usual one.
The temporary keyring is destroyed when the script exits. This is kind
of clunky, but there’s no way around it.</p>

<p>Verification looks roughly like this in the script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tmp=$(mktemp -d simplegpg-XXXXXX)
$ gpg --homedir $tmp
$ gpg --homedir $tmp --import foo.asc
$ gpg --homedir $tmp --verify message.txt.sig message.txt
$ rm -rf $tmp
</code></pre></div></div>

<p>Generating a key is trivial, and there’s only a prompt for the
protection passphrase. Like signify, it will generate an Ed25519 key
and all outputs are ASCII-armored.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ simplegpg -G -p keyname.asc -s keyname.pgp
passphrase:
passphrase (confirm):
</code></pre></div></div>

<p>Since signify doesn’t have a concept of a user ID for a key, just an
“untrusted comment”, the user ID is not emphasized here. The default
user ID will be “simplegpg key”, so, if you plan to share the key with
regular GnuPG users who will need to import it into a keyring, you
probably want to use <code class="language-plaintext highlighter-rouge">-c</code> to give it a more informative name.</p>

<p>Unfortunately, due to GnuPG’s very limited, keyring-oriented interface,
key generation is about three times slower than it should be. That’s
because the passphrase is run through the String-to-Key (S2K)
algorithm <em>three times</em>:</p>

<ol>
  <li>
    <p>Immediately after the key is generated, the passphrase is converted
to a key, the key is encrypted, and it’s put onto the temporary
keyring.</p>
  </li>
  <li>
    <p>When exporting, the key passphrase is again run through the S2K to
get the protection key to decrypt it.</p>
  </li>
  <li>
    <p>The export format uses a slightly different S2K algorithm, so this
export S2K is now used to create yet another protection key.</p>
  </li>
</ol>

<p>Technically the second <em>could</em> be avoided since gpg-agent, which is
always required, could be holding the secret key material. As far as I
can tell, gpg-agent simply does not learn freshly-generated keys. I do
not know why this is the case.</p>

<p>This is related to another issue. If you’re accustomed to GnuPG, you may
notice that the passphrase prompt didn’t come from pinentry, a program
specialized for passphrase prompts. GnuPG normally uses it for this.
Instead, the script handles the passphrase prompt and passes the
passphrase to GnuPG (via a file descriptor). This would not be necessary
if gpg-agent did its job. Without this part of the script, users are
prompted three times, via pinentry, for their passphrase when generating
a key.</p>

<p>When signing messages, the passphrase prompt comes from pinentry since
it’s initiated by GnuPG.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ simplegpg -S -s keyname.pgp -m message.txt
passphrase:
</code></pre></div></div>

<p>This will produce <code class="language-plaintext highlighter-rouge">message.txt.sig</code> with an OpenPGP detached signature.</p>

<p>The passphrase prompt is for <code class="language-plaintext highlighter-rouge">--import</code>, not <code class="language-plaintext highlighter-rouge">--detach-sign</code>. As with
key generation, the S2K is run more than necessary: twice instead of
once. First to generate the decryption key, then a second time to
generate a different encryption key for the keyring since the export
format and keyring use different algorithms. Ugh.</p>

<p>But at least gpg-agent does its job this time, so only one passphrase
prompt is necessary. In general, a downside of these temporary
keyrings is that gpg-agent treats the key in each one as a different key,
and you will need to enter your passphrase once for each message signed. Just like
signify.</p>

<p>Verification, of course, requires no prompting and no S2K.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ simplegpg -V -p keyname.asc -m message.txt
</code></pre></div></div>

<p>That’s all there is to keyringless OpenPGP signatures. Since I’m not
interested in the Web of Trust or keyservers, I wish GnuPG were more
friendly to this model of operation.</p>

<h3 id="passphrase2pgp">passphrase2pgp</h3>

<p>I mentioned that SimpleGPG is fully compatible with other OpenPGP
systems. This includes <a href="/blog/2019/07/10/">my own passphrase2pgp</a>, where your secret
key is stored only in your brain. No need for a secret key file. In the
time since I first wrote about it, passphrase2pgp has gained the ability
to produce signatures itself!</p>

<p>I’ve got my environment set up — <code class="language-plaintext highlighter-rouge">$REALNAME</code>, <code class="language-plaintext highlighter-rouge">$EMAIL</code>, and <code class="language-plaintext highlighter-rouge">$KEYID</code> per
the README — so I don’t need to supply a user ID argument, nor will I be
prompted to confirm my passphrase since it’s checked against a known
fingerprint. Generating the public key, for sharing, looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp -K --armor --public &gt;keyname.asc

Or just:

$ passphrase2pgp -ap &gt;keyname.asc
</code></pre></div></div>

<p>As with signify and SimpleGPG, to sign a message I’m prompted for my
passphrase. It takes longer since the “S2K” here is much stronger by
necessity. The passphrase is used to generate the secret key, then from
that the signature on the message:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp -S message.txt
</code></pre></div></div>

<p>For the SimpleGPG user on the other side it all looks the same as before:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ simplegpg -V -p keyname.asc -m message.txt
</code></pre></div></div>

<p>I’m probably going to start signing my open source software releases,
and this is how I intend to do it.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>The Long Key ID Collider</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/07/22/"/>
    <id>urn:uuid:e688fdea-0699-42dd-89ac-564d0b2b65bc</id>
    <updated>2019-07-22T21:27:02Z</updated>
    <category term="crypto"/><category term="openpgp"/><category term="go"/>
    <content type="html">
      <![CDATA[<p>Over the last couple weeks I’ve spent a lot more time working with
OpenPGP keys. It’s a consequence of polishing my <a href="/blog/2019/07/10/">passphrase-derived
PGP key generator</a>. I’ve tightened up the internals, and it’s
enabled me to explore the corners of the format, try interesting
things, and observe how various OpenPGP implementations respond to
weird inputs.</p>

<p>For one particularly cool trick, take a look at these two (private)
keys I generated yesterday. Here’s the first:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-----BEGIN PGP PRIVATE KEY BLOCK-----

xVgEXTU3gxYJKwYBBAHaRw8BAQdAjJgvdh3N2pegXPEuMe25nJ3gI7k8gEgQvCor
AExppm4AAQC0TNsuIRHkxaGjLNN6hQowRMxLXAMrkZfMcp1DTG8GBg1TzQ9udWxs
cHJvZ3JhbS5jb23CXgQTFggAEAUCXTU3gwkQmSpe7h0QSfoAAGq0APwOtCFVCxpv
d/gzKUg0SkdygmriV1UmrQ+KYx9dhzC6xwEAqwDGsSgSbCqPdkwqi/tOn+MwZ5N9
jYxy48PZGZ2V3ws=
=bBGR
-----END PGP PRIVATE KEY BLOCK-----
</code></pre></div></div>

<p>And the second:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-----BEGIN PGP PRIVATE KEY BLOCK-----

xVgEXTU3gxYJKwYBBAHaRw8BAQdAzjSPKjpOuJoLP6G0z7pptx4sBNiqmgEI0xiH
Z4Xb16kAAP0Qyon06UB2/gOeV/KjAjCi91MeoUd7lsA5yn82RR5bOxAkzQ9udWxs
cHJvZ3JhbS5jb23CXgQTFggAEAUCXTU3gwkQmSpe7h0QSfoAAEv4AQDLRqx10v3M
bwVnJ8BDASAOzrPw+Rz1tKbjG9r45iE7NQEAhm9QVtFd8SN337kIWcq8wXA6j1tY
+UeEsjg+SHzkqA4=
=QnLn
-----END PGP PRIVATE KEY BLOCK-----
</code></pre></div></div>

<p>Concatenate these and then import them into GnuPG to have a look at
them. To avoid littering in your actual keyring, especially with
private keys, use the <code class="language-plaintext highlighter-rouge">--homedir</code> option to set up a temporary
keyring. I’m going to omit that option in the examples.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --import &lt; keys.asc
gpg: key 992A5EEE1D1049FA: public key "nullprogram.com" imported
gpg: key 992A5EEE1D1049FA: secret key imported
gpg: key 992A5EEE1D1049FA: public key "nullprogram.com" imported
gpg: key 992A5EEE1D1049FA: secret key imported
gpg: Total number processed: 2
gpg:               imported: 2
gpg:       secret keys read: 2
gpg:   secret keys imported: 2
</code></pre></div></div>

<p>The user ID is “nullprogram.com” since I made these and that’s me
taking credit. “992A5EEE1D1049FA” is called the <em>long key ID</em>: a
64-bit value that identifies the key. It’s the lowest 64 bits of the
full key ID, a 160-bit SHA-1 hash. In the old days everyone used a
<em>short key ID</em> to identify keys, which was the lowest 32 bits of the
full key ID. For these keys, that would be “1D1049FA”. However, this was
deemed <em>way</em> too short, and everyone has since switched to long key
IDs, or even the full 160-bit key ID.</p>

<p>The key ID is nothing more than a SHA-1 hash of the key creation date —
unsigned 32-bit unix epoch seconds — and the public key material. So
secret keys have the same key ID as their associated public key. This
makes sense since they’re a key <em>pair</em> and they go together.</p>

<p>Look closely and you’ll notice that both keypairs have the same long
key ID. If you hadn’t already guessed from the title of this article,
these are two different keys with the same long key ID. In other
words, <strong>I’ve created a long key ID collision</strong>. The GnuPG
<code class="language-plaintext highlighter-rouge">--list-keys</code> command prints the full key ID since it’s so important:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --list-keys
---------------------
pub   ed25519 2019-07-22 [SCA]
      A422F8B0E1BF89802521ECB2992A5EEE1D1049FA
uid           [ unknown] nullprogram.com

pub   ed25519 2019-07-22 [SCA]
      F43BC80C4FC2603904E7BE02992A5EEE1D1049FA
uid           [ unknown] nullprogram.com
</code></pre></div></div>

<p>I was only targeting the lower 64 bits, but I actually managed to
collide the lowest 68 bits by chance. So a long key ID still isn’t
enough to truly identify any particular key.</p>

<p>This isn’t news, of course. Nor am I the first person to create a long
key ID collision. In 2013, <a href="https://mailarchive.ietf.org/arch/msg/openpgp/Al8DzxTH2KT7vtFAgZ1q17Nub_g">David Leon Gil published a long key ID
collision for two 4096-bit RSA public keys</a>. However, that is the
only other example I was able to find. He did not include the private
keys and did not elaborate on how he did it. I know he <em>did</em> generate
viable keys, not just garbage for the public key portions, since they’re
both self-signed.</p>

<p>Creating these keys was trickier than I had anticipated, and there’s an
old, clever trick that makes it work. Building atop the work I did for
passphrase2pgp, I created a standalone tool that will create a long key
ID collision and print the two keypairs to standard output:</p>

<ul>
  <li><strong><a href="https://github.com/skeeto/pgpcollider">https://github.com/skeeto/pgpcollider</a></strong></li>
</ul>

<p>Example usage:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go get -u github.com/skeeto/pgpcollider
$ pgpcollider --verbose &gt; keys.asc
</code></pre></div></div>

<p>This can take up to a day to complete when run like this. The tool can
optionally coordinate many machines — see the <code class="language-plaintext highlighter-rouge">--server</code> / <code class="language-plaintext highlighter-rouge">-S</code> and
<code class="language-plaintext highlighter-rouge">--client</code> / <code class="language-plaintext highlighter-rouge">-C</code> options — to work together, greatly reducing the total
time. It took around 4 hours to create the keys above on a single
machine, generating around 1 billion extra keys in the process. As
discussed below, I actually got lucky that it only took 1 billion. If
you modify the program to do short key ID collisions, it only takes a
few seconds.</p>

<p>The rest of this article is about how it works.</p>

<h3 id="birthday-attacks">Birthday Attacks</h3>

<p>An important detail is that <strong>this technique doesn’t target any specific
key ID</strong>. Cloning someone’s long key ID is still very expensive. No,
this is a <a href="https://en.wikipedia.org/wiki/Birthday_attack"><em>birthday attack</em></a>. To find a collision in a space of
2^64, on average I only need to generate 2^32 samples — the square root
of that space. That’s perfectly feasible on a regular desktop computer.
To collide long key IDs, I need only generate about 4 billion IDs and
efficiently do membership tests on that set as I go.</p>
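<p>The square root comes from the birthday approximation. After <em>k</em>
random samples from a space of <em>N</em> values, the probability of at least
one collision is roughly 1 - exp(-k^2/2N), and the expected number of
samples before the first collision is about sqrt(pi*N/2), which is
1.25 × 2^32 for long key IDs. This sketch just evaluates those two
formulas (link with <code class="language-plaintext highlighter-rouge">-lm</code>).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/* Expected samples before the first collision among N = 2^bits values. */
static double expected_samples(int bits)
{
    double n = ldexp(1.0, bits);
    double pi = acos(-1.0);
    return sqrt(pi * n / 2);
}

/* Approximate probability of at least one collision after k samples:
 * 1 - exp(-k^2 / 2N), computed with expm1() for accuracy. */
static double collision_probability(double k, int bits)
{
    double n = ldexp(1.0, bits);
    return -expm1(-k*k / (2*n));
}

int main(void)
{
    printf("short IDs: %.0f samples expected\n", expected_samples(32));
    printf("long IDs:  %.3g samples expected\n", expected_samples(64));
    printf("P(collision) after 1e9 long IDs: %.3f\n",
           collision_probability(1e9, 64));
    return 0;
}
</code></pre></div></div>

<p>That works out to around 5.4 billion samples for long key IDs and
around 82 thousand for short ones. One billion samples gives under a 3%
chance of a long key ID collision, which fits with having gotten lucky
above.</p>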

<p>That last step is easier said than done. Naively, that might look like
this (pseudo-code):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>seen := map of long key IDs to keys
loop forever {
    key := generateKey()
    longID := key.ID[12:20]
    if longID in seen {
        output seen[longID]
        output key
        break
    } else {
        seen[longID] = key
    }
}
</code></pre></div></div>

<p>Consider the size of that map. Each long ID is 8 bytes, and we expect
to store around 2^32 of them. That’s <em>at minimum</em> 32 GB of storage
just to track all the long IDs. The map itself is going to have some
overhead, too. Since these are literally random lookups, this all
mostly needs to be in RAM or else lookups are going to be <em>very</em> slow
and impractical.</p>

<p>And I haven’t even counted the keys yet. As a saving grace, these are
Ed25519 keys, so that’s 32 bytes for the public key and 32 bytes for the
private key, which I’ll need if I want to make a self-signature. (The
signature itself will be larger than the secret key.) That’s around
256GB more storage, though at least this can be stored on the hard
drive. However, to address these from the map I’d need at least 38 bits,
plus some more in case it goes over. Just call it another 8 bytes.</p>

<p>So that’s, at a bare minimum, 64GB of RAM plus 256GB of other storage.
Since nothing is ideal, we’ll need more than this. This is all still
feasible, but will require expensive hardware. We can do a lot better.</p>

<h4 id="keys-from-seeds">Keys from seeds</h4>

<p>The first thing you might notice is that we can jettison that 256GB of
storage by being a little more clever about how we generate keys. Since
we don’t actually care about the security of these keys, we can generate
each key from a seed much smaller than the key itself. Instead of using
8 bytes to reference a key in storage, just use those 8 bytes to store
the seed used to make the key.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>counter := rand64()
seen := map of long key IDs to 64-bit seeds
loop forever {
    seed := counter
    counter++
    key := generateKey(seed)
    longID := key.ID[12:20]
    if longID in seen {
        output generateKey(seen[longID])
        output key
        break
    } else {
        seen[longID] = seed
    }
}
</code></pre></div></div>
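<p>Here’s a scaled-down sketch of that loop in C, colliding 32-bit IDs
instead of 64-bit ones so it finishes instantly. It’s an illustration
under loud assumptions, not pgpcollider’s code: a hypothetical
<code class="language-plaintext highlighter-rouge">fake_key_id()</code> built from the SplitMix64 mixing step stands in for
“generate an Ed25519 key from a seed and hash it with SHA-1”, and the
map is a simple open-addressing table where each entry holds an ID and a
seed, just like the pseudo-code.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

#define ID_MASK   UINT64_C(0xffffffff)  /* collide 32-bit IDs to finish quickly */
#define TABLE_EXP 20                     /* 2^20 slots, ample for ~2^16 IDs */

/* Hypothetical stand-in for generateKey(seed) plus the key ID hash: the
 * SplitMix64 mixing step. It's a bijection, so distinct seeds give
 * distinct 64-bit values and only the truncated IDs can collide. */
static uint64_t fake_key_id(uint64_t seed)
{
    uint64_t x = seed + UINT64_C(0x9e3779b97f4a7c15);
    x = (x ^ (x &gt;&gt; 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x &gt;&gt; 27)) * UINT64_C(0x94d049bb133111eb);
    return x ^ (x &gt;&gt; 31);
}

/* Map truncated IDs to seeds with open addressing. On a hit, report
 * both seeds; the keys themselves can be regenerated from them. */
static int find_collision(uint64_t counter, uint64_t *first, uint64_t *second)
{
    size_t size = (size_t)1 &lt;&lt; TABLE_EXP;
    size_t mask = size - 1;
    uint64_t *ids = calloc(size, sizeof(*ids));
    uint64_t *seeds = calloc(size, sizeof(*seeds));
    unsigned char *used = calloc(size, 1);
    int found = 0;
    if (ids &amp;&amp; seeds &amp;&amp; used) {
        /* Stop at half full; a 32-bit collision is all but certain by then. */
        for (size_t n = 0; !found &amp;&amp; n &lt; size/2; n++) {
            uint64_t seed = counter++;  /* unique seeds, no birthday problem */
            uint64_t id = fake_key_id(seed) &amp; ID_MASK;
            size_t i = (size_t)(id &amp; mask);
            while (used[i] &amp;&amp; ids[i] != id) {
                i = (i + 1) &amp; mask;  /* linear probing */
            }
            if (used[i]) {
                *first = seeds[i];
                *second = seed;
                found = 1;
            } else {
                used[i] = 1;
                ids[i] = id;
                seeds[i] = seed;
            }
        }
    }
    free(ids);
    free(seeds);
    free(used);
    return found;
}

int main(void)
{
    uint64_t a, b;
    if (!find_collision(UINT64_C(0x123456789abcdef0), &amp;a, &amp;b)) {
        puts("no collision found");
        return 1;
    }
    printf("seed %016llx -&gt; id %08llx\n", (unsigned long long)a,
           (unsigned long long)(fake_key_id(a) &amp; ID_MASK));
    printf("seed %016llx -&gt; id %08llx\n", (unsigned long long)b,
           (unsigned long long)(fake_key_id(b) &amp; ID_MASK));
    return 0;
}
</code></pre></div></div>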

<p>I’m incrementing a counter to generate the seeds because I don’t want the
birthday paradox to apply to my seeds. Each really must
be unique. I’m using SplitMix64 for the PRNG <a href="https://github.com/skeeto/rng-go">since I learned it’s the
fastest</a> for Go, so a simple increment to generate seeds <a href="/blog/2018/07/31/">is
perfectly fine</a>.</p>

<p>Ultimately, this still uses utterly excessive amounts of memory.
Wouldn’t it be crazy if we could somehow get this 64GB map down to just
a few MB of RAM? Well, we can!</p>

<h4 id="rainbow-tables">Rainbow tables</h4>

<p>For decades, password crackers have faced a similar problem. They want
to precompute the hashes for billions of popular passwords so that they
can efficiently reverse those password hashes later. However, storing
all those hashes would be unnecessarily expensive, or even infeasible.</p>

<p>So they don’t. Instead they use <a href="https://en.wikipedia.org/wiki/Rainbow_table"><em>rainbow tables</em></a>. Password hashes
are chained together into a hash chain, where a password hash leads to a
new password, then to a hash, and so on. Only the beginning and the end
of each chain are stored.</p>

<p>To look up a hash in the rainbow table, run the hash chain algorithm
starting from the target hash and, for each hash, check if it matches
the end of one of the chains. If so, recompute that chain and note the
step just before the target hash value. That’s the corresponding
password.</p>

<p>For example, suppose the password “foo” hashes to <code class="language-plaintext highlighter-rouge">9bfe98eb</code>, and we
have a <em>reduction function</em> that maps a hash to some password. In this
case, it maps <code class="language-plaintext highlighter-rouge">9bfe98eb</code> to “bar”. A trivial reduction function could
just be an index into a list of passwords. A hash chain starting from
“foo” might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo -&gt; 9bfe98eb -&gt; bar -&gt; 27af0841 -&gt; baz -&gt; d9d4bbcb
</code></pre></div></div>

<p>In reality a chain would be a lot longer. Another chain starting from
“apple” might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apple -&gt; 7bbc06bc -&gt; candle -&gt; 82a46a63 -&gt; dog -&gt; 98c85d0a
</code></pre></div></div>

<p>We only store the tuples (foo, <code class="language-plaintext highlighter-rouge">d9d4bbcb</code>) and (apple, <code class="language-plaintext highlighter-rouge">98c85d0a</code>) in
our database. If the chains had been one million hashes long, we’d
still only store those two tuples. That’s literally a 1:1000000
compression ratio!</p>

<p>Later on we’re faced with reversing the hash <code class="language-plaintext highlighter-rouge">27af0841</code>, which isn’t
listed directly in the database. So we run the chain forward from that
hash until either we hit the maximum chain length (i.e. password not in
the table), or we recognize a hash:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>27af0841 -&gt; baz -&gt; d9d4bbcb
</code></pre></div></div>

<p>That <code class="language-plaintext highlighter-rouge">d9d4bbcb</code> hash is listed as being in the “foo” hash chain. So we
regenerate that hash chain to discover that “bar” leads to <code class="language-plaintext highlighter-rouge">27af0841</code>.
Password cracked!</p>
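<p>The lookup procedure can be sketched in Go as a toy table. Everything here is an assumption for illustration:</p>

<ul>
  <li>The “hash” is the first four bytes of SHA-256.</li>
  <li>The password space is a small hypothetical word list.</li>
  <li>The reduction function simply indexes into that list.</li>
</ul>

<p>The hash values will not match the made-up ones in the example. True rainbow tables also vary the reduction function at each step to limit chain merges. This sketch uses a single one, which is closer to Hellman’s original hash chains. It needs Go 1.22 or later for range-over-integer loops.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// A hypothetical password space: every candidate password is one of these.
var words = []string{
	"foo", "bar", "baz", "apple", "candle", "dog",
	"egg", "fig", "grape", "hat", "ink", "jam",
}

const chainLen = 4 // hashes per chain; real tables use far longer chains

// hash is a toy 32-bit password hash: the first four bytes of SHA-256.
func hash(pw string) uint32 {
	sum := sha256.Sum256([]byte(pw))
	return binary.BigEndian.Uint32(sum[:4])
}

// reduce maps a hash back to some password by indexing the word list.
func reduce(h uint32) string {
	return words[h%uint32(len(words))]
}

// chainEnd runs a chain of chainLen hashes from start and returns the last one.
func chainEnd(start string) uint32 {
	h := hash(start)
	for range chainLen - 1 {
		h = hash(reduce(h))
	}
	return h
}

// build stores only (end, start) pairs. If two chains share an end, the
// first one wins.
func build(starts []string) map[uint32]string {
	table := make(map[uint32]string)
	for _, s := range starts {
		end := chainEnd(s)
		if _, dup := table[end]; !dup {
			table[end] = s
		}
	}
	return table
}

// lookup runs the chain forward from target until it reaches a stored end,
// then regenerates that chain from its start to find the password whose
// hash is target. A match on an end can be a false alarm (chains merge),
// in which case the search continues.
func lookup(table map[uint32]string, target uint32) (string, bool) {
	h := target
	for range chainLen {
		if start, ok := table[h]; ok {
			pw := start
			for range chainLen {
				if hash(pw) == target {
					return pw, true
				}
				pw = reduce(hash(pw))
			}
		}
		h = hash(reduce(h))
	}
	return "", false
}

func main() {
	table := build([]string{"foo", "apple"})
	// Reverse the second hash in the "foo" chain, like 27af0841 above.
	pw, ok := lookup(table, hash(reduce(hash("foo"))))
	fmt.Println(pw, ok)
}
</code></pre></div></div>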

<h4 id="collider-rainbow-table">Collider rainbow table</h4>

<p>My collider works very similarly. A hash chain works like this: Start
with a 64-bit seed as before, generate a key, get the long key ID,
<strong>then use the long key ID as the seed for the next key</strong>.</p>

<p><img src="/img/diagram/collider-chain.svg" alt="" /></p>

<p>There’s one big difference. In the rainbow table the purpose is to run
the hash function backwards by looking at the previous step in the
chain. For the collider, I want to know if any of the hash chains
collide. So long as each chain starts from a unique seed, it would mean
we’ve found <strong>two different seeds that lead to the same long key ID</strong>.</p>

<p>Alternatively, it could be two different seeds that lead to the same
key, which wouldn’t be useful, but that’s trivial to avoid.</p>

<p>A simple and efficient way to check if two chains contain the same
sequence is to stop them at the same place in that sequence. Rather than
run the hash chains for some fixed number of steps, they stop when they
reach a <em>distinguishing point</em>. In my collider a distinguishing point is
where the long key ID ends with at least N 0 bits, where N determines
the average chain length. I chose 17 bits.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>func computeChain(seed) {
    loop forever {
        key := generateKey(seed)
        longID := key.ID[12:20]
        if distinguished(longID) {
            return longID
        }
        seed = longID
    }
}
</code></pre></div></div>

<p>If two different hash chains end on the same distinguishing point,
they’re guaranteed to have collided somewhere in the middle.</p>

<p><img src="/img/diagram/collision.svg" alt="" /></p>

<p>To determine where two chains collided, regenerate each chain and find
the first long key ID that they have in common. The keys generated at the
step just before it are the colliding keys.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>counter := rand64()
seen := map of long key IDs to 64-bit seeds
loop forever {
    seed := counter
    counter++
    longID := computeChain(seed)
    if longID in seen {
        output findCollision(seed, seen[longID])
        break
    } else {
        seen[longID] = seed
    }
}
</code></pre></div></div>

<p>Hash chain computation is embarrassingly parallel, so the load can be
spread efficiently across CPU cores. With these rainbow(-like) tables,
my tool can generate and track billions of keys in mere megabytes of
memory. The additional computational cost is the time it takes to
generate a couple more chains than otherwise necessary.</p>
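<p>To make the pseudocode concrete, here is a small, self-contained Go sketch. It is not the collider’s actual code. Generating an OpenPGP key and taking its long key ID is replaced by a hypothetical <code class="language-plaintext highlighter-rouge">id</code> function that returns 32 bits of SHA-256. With IDs that short, a collision shows up after tens of thousands of IDs instead of billions. A distinguishing point is an ID ending in at least 8 zero bits. Starting seeds are taken from above the 32-bit range, so they can never equal an ID. Go 1.22 or later is assumed.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

const (
	dpBits   = 8     // a distinguishing point ends in at least 8 zero bits
	maxSteps = 16384 // abandon chains this long (probably a cycle)
)

// id stands in for "generate a key from seed, take its long key ID".
// Hypothetical: 32 bits of SHA-256, so that collisions show up quickly.
func id(seed uint64) uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], seed)
	sum := sha256.Sum256(buf[:])
	return uint64(binary.LittleEndian.Uint32(sum[:4]))
}

func distinguished(x uint64) bool {
	return bits.TrailingZeros64(x) >= dpBits
}

// computeChain feeds each ID back in as the next seed until it reaches a
// distinguishing point, returning that point and the number of steps.
func computeChain(seed uint64) (end uint64, steps int, ok bool) {
	for n := range maxSteps {
		x := id(seed)
		if distinguished(x) {
			return x, n + 1, true
		}
		seed = x
	}
	return 0, 0, false
}

// findCollision replays two chains that reached the same distinguishing
// point and returns two different seeds whose IDs are equal.
func findCollision(a uint64, na int, b uint64, nb int) (uint64, uint64, bool) {
	// Align the chains so both are the same number of steps from the end.
	for ; na > nb; na-- {
		a = id(a)
	}
	for ; nb > na; nb-- {
		b = id(b)
	}
	for range na {
		if a == b {
			return 0, 0, false // one chain started on the other: no collision
		}
		if id(a) == id(b) {
			return a, b, true
		}
		a, b = id(a), id(b)
	}
	return 0, 0, false
}

type chain struct {
	seed  uint64
	steps int
}

// collide remembers only the distinguishing points, like the seen map above.
func collide() (uint64, uint64) {
	seen := make(map[uint64]chain)
	for seed := uint64(0x100000000); ; seed++ {
		end, n, ok := computeChain(seed)
		if !ok {
			continue
		}
		if prev, dup := seen[end]; dup {
			if a, b, found := findCollision(prev.seed, prev.steps, seed, n); found {
				return a, b
			}
			continue
		}
		seen[end] = chain{seed, n}
	}
}

func main() {
	a, b := collide()
	fmt.Printf("seeds %x and %x both map to ID %08x\n", a, b, id(a))
}
</code></pre></div></div>

<p>With an average chain length of 256, the <code class="language-plaintext highlighter-rouge">seen</code> map needs only a few hundred entries to track all of the IDs visited.</p>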

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Predictable, Passphrase-Derived PGP Keys</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/07/10/"/>
    <id>urn:uuid:cae3111c-2887-404a-bc0e-80b8c45a2d06</id>
    <updated>2019-07-10T04:18:29Z</updated>
    <category term="crypto"/><category term="openpgp"/><category term="go"/>
    <content type="html">
      <![CDATA[<p><em>tl;dr</em>: <strong><a href="https://github.com/skeeto/passphrase2pgp">passphrase2pgp</a></strong>.</p>

<p>One of my long-term concerns has been losing my core cryptographic keys,
or just not having access to them when I need them. I keep my important
data backed up, and if that data is private then I store it encrypted.
My keys are private, but how am I supposed to encrypt them? The chicken
or the egg?</p>

<p>The OpenPGP solution is to (optionally) encrypt secret keys using a key
derived from a passphrase. GnuPG prompts the user for this passphrase
when generating keys and when using secret keys. This protects the keys
at rest, and, with some caution, they can be included as part of regular
backups. The <a href="https://tools.ietf.org/html/rfc4880">OpenPGP specification, RFC 4880</a>, has many options
for deriving a key from this passphrase, called <em>String-to-Key</em>, or S2K,
algorithms. None of the options are great.</p>

<p>In 2012, I selected the strongest S2K configuration at the time and,
along with a very strong passphrase, put my GnuPG keyring on the
internet as part of <a href="/blog/2012/06/23/">my public dotfiles repository</a>. It was a
kind of super-backup that would guarantee their availability anywhere
I’d need them.</p>

<p>My timing was bad because, with the release of GnuPG 2.1 in 2014, GnuPG
fundamentally changed its secret keyring format. <a href="https://dev.gnupg.org/T1800">S2K options are now
(quietly!) ignored</a> when deriving the protection keys. Instead it
auto-calibrates to much weaker settings. With this new version of GnuPG,
I could no longer update the keyring in my dotfiles repository without
significantly downgrading its protection.</p>

<p>By 2017 I was pretty irritated with the whole situation. I let my
OpenPGP keys expire, and then <a href="/blog/2017/03/12/">I wrote my own tool</a> to replace
the only feature of GnuPG I was actively using: encrypting my backups
with asymmetric encryption. One of its core features is that the
asymmetric keypair can be derived from a passphrase using a memory-hard
key derivation function (KDF). Attackers must commit a significant
quantity of memory (expensive) when attempting to crack the passphrase,
making the passphrase that much more effective.</p>

<p>Since the asymmetric keys themselves, not just the keys protecting them,
are derived from a passphrase, I never need to back them up! They’re
also always available whenever I need them. <strong>My keys are essentially
stored entirely in my brain</strong> as if I were a character in a William
Gibson story.</p>

<h3 id="tackling-openpgp-key-generation">Tackling OpenPGP key generation</h3>

<p>At the time I had expressed my interest in having this feature for
OpenPGP keys. It’s something I’ve wanted for a long time. I first <a href="https://github.com/skeeto/passphrase2pgp/tree/old-version">took
a crack at it in 2013</a> (now the <code class="language-plaintext highlighter-rouge">old-version</code> branch) for
generating RSA keys. <a href="/blog/2015/10/30/">RSA isn’t that complicated</a> but <a href="https://blog.trailofbits.com/2019/07/08/fuck-rsa/">it’s very
easy to screw up</a>. Since I was rolling it from scratch, I didn’t
really trust myself not to subtly get it wrong. Plus I never figured out
how to self-sign the key. GnuPG doesn’t accept secret keys that aren’t
self-signed, so it was never useful.</p>

<p>I took another crack at it in 2018 with a much more brute force
approach. When a program needs to generate keys, it will either read
from <code class="language-plaintext highlighter-rouge">/dev/u?random</code> or, on more modern systems, call <code class="language-plaintext highlighter-rouge">getentropy(3)</code>.
These are all ultimately system calls, and <a href="/blog/2018/06/23/">I know how to intercept
those with Ptrace</a>. If I want to control key generation for <em>any</em>
program, not just GnuPG, I could intercept these inputs and replace them
with the output of a CSPRNG keyed by a passphrase.</p>

<p><strong><a href="https://github.com/skeeto/keyed">Keyed</a>: Linux Entropy Interception</strong></p>

<p>In practice this doesn’t work at all. Real programs like GnuPG and
OpenSSH’s <code class="language-plaintext highlighter-rouge">ssh-keygen</code> don’t rely solely on these entropy inputs. They
<a href="/blog/2019/04/30/">also grab entropy from other places</a>, like <code class="language-plaintext highlighter-rouge">getpid(2)</code>,
<code class="language-plaintext highlighter-rouge">gettimeofday(2)</code>, and even extract their own scheduler and execution
time noise. Without modifying these programs I couldn’t realistically
control their key generation.</p>

<p>Besides, even if it <em>did</em> work, it would still be fragile and unreliable
since these programs could always change how they use the inputs. So,
ultimately, it was more of an experiment than something practical.</p>
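<p>The underlying idea still works when you control the code consuming the output: deterministic key material produced by a CSPRNG keyed with a passphrase. Here is a minimal Go sketch, for illustration only and not how Keyed works. It makes two stand-in choices:</p>

<ul>
  <li>The key is SHA-256 of the passphrase. A real tool should use a slow KDF such as Argon2 instead.</li>
  <li>The CSPRNG is AES in CTR mode with a zero IV.</li>
</ul>

<p>The first 32 bytes of the stream become an Ed25519 seed.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// keyedStream returns n pseudorandom bytes determined entirely by the
// passphrase: the AES-256-CTR keystream under SHA-256(passphrase), zero IV.
// Hypothetical helper; SHA-256 is far too fast to be a good passphrase KDF.
func keyedStream(passphrase string, n int) []byte {
	key := sha256.Sum256([]byte(passphrase))
	block, err := aes.NewCipher(key[:])
	if err != nil {
		panic(err) // cannot happen: the key is always 32 bytes
	}
	iv := make([]byte, aes.BlockSize)
	out := make([]byte, n)
	cipher.NewCTR(block, iv).XORKeyStream(out, out) // keystream XOR zeros
	return out
}

func main() {
	seed := keyedStream("correct horse battery staple", ed25519.SeedSize)
	priv := ed25519.NewKeyFromSeed(seed)
	fmt.Printf("pub %x\n", priv.Public().(ed25519.PublicKey))
}
</code></pre></div></div>

<p>The sketch calls <code class="language-plaintext highlighter-rouge">NewKeyFromSeed()</code> directly rather than handing the stream to a key generator as its entropy source. A generator is free to change how it reads its inputs, or to mix in other entropy, and that is the same fragility that sinks the Ptrace approach.</p>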

<h3 id="passphrase2pgp">passphrase2pgp</h3>

<p>For regular readers, it’s probably obvious that I <a href="/tags/go/">recently learned
Go</a>. While searching for good project ideas to cut my teeth on, I
noticed that <a href="https://golang.org/x/">Go’s “extended” standard library</a> has a lot of useful
cryptographic support, so the idea of generating the keys myself may be
worth revisiting.</p>

<p>Something else also happened since my previous attempt: The OpenPGP
ecosystem now has widespread support for elliptic curve cryptography. So
instead of RSA, I could generate a Curve25519 keypair, which, by design,
is basically impossible to screw up. <strong>Not only would I be generating
keys on my own terms, I’d be doing it <em>in style</em>, baby.</strong></p>

<p>There are two different ways to use Curve25519:</p>

<ol>
  <li>Digital signatures: Ed25519 (EdDSA)</li>
  <li>Diffie–Hellman (encryption): X25519 (ECDH)</li>
</ol>

<p>In GnuPG terms, the first would be a “sign only” key and the second is
an “encrypt only” key. But can’t you usually do both after you generate
a new OpenPGP key? If you’ve used GnuPG, you’ve probably seen the terms
“primary key” and “subkey”, but you probably haven’t had to think about
them since it’s all usually automated.</p>

<p>The <em>primary key</em> is the one associated directly with your identity.
It’s always a signature key. The OpenPGP specification says this is a
signature key only by convention, but, practically speaking, it really
must be, since signatures are what hold everything together. Like
packaging tape.</p>

<p>If you want to use encryption, independently generate an encryption key,
then sign that key with the primary key, binding that key as a <em>subkey</em>
to the primary key. This all happens automatically with GnuPG.</p>

<p>Fun fact: Two different primary keys can have the same subkey. Anyone
could even bind any of your subkeys to their primary key! They only need
to sign the public key! Though, of course, they couldn’t actually use
your key since they’d lack the secret key. It would just be really
confusing, and could, perhaps in certain situations, even cause some
OpenPGP clients to malfunction. (Note to self: This demands
investigation!)</p>

<p>It’s also possible to have signature subkeys. What good is that?
Paranoid folks will keep their primary key only on a secure, air-gapped
machine, then use only subkeys on regular systems. The subkeys can be revoked and
replaced independently of the primary key if something were to go wrong.</p>

<p>In Go, generating an X25519 key pair is this simple (yes, it actually
takes array pointers, <a href="https://github.com/golang/go/issues/32670">which is rather weird</a>):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"crypto/rand"</span>
	<span class="s">"fmt"</span>

	<span class="s">"golang.org/x/crypto/curve25519"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">seckey</span><span class="p">,</span> <span class="n">pubkey</span> <span class="p">[</span><span class="m">32</span><span class="p">]</span><span class="kt">byte</span>
	<span class="n">rand</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">seckey</span><span class="p">[</span><span class="o">:</span><span class="p">])</span> <span class="c">// FIXME: check for error</span>
	<span class="n">seckey</span><span class="p">[</span><span class="m">0</span><span class="p">]</span> <span class="o">&amp;=</span> <span class="m">248</span>
	<span class="n">seckey</span><span class="p">[</span><span class="m">31</span><span class="p">]</span> <span class="o">&amp;=</span> <span class="m">127</span>
	<span class="n">seckey</span><span class="p">[</span><span class="m">31</span><span class="p">]</span> <span class="o">|=</span> <span class="m">64</span>
	<span class="n">curve25519</span><span class="o">.</span><span class="n">ScalarBaseMult</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pubkey</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">seckey</span><span class="p">)</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"pub %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pubkey</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"sec %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">seckey</span><span class="p">[</span><span class="o">:</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The three bitwise operations are optional since <code class="language-plaintext highlighter-rouge">ScalarBaseMult()</code> applies
them internally, but they ensure that the secret key is in its canonical form.
The actual Diffie–Hellman exchange requires just one more function call:
<code class="language-plaintext highlighter-rouge">curve25519.ScalarMult()</code>.</p>
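<p>The sketch below shows that exchange using the standard library’s <code class="language-plaintext highlighter-rouge">crypto/ecdh</code> package (Go 1.20 and later) rather than <code class="language-plaintext highlighter-rouge">golang.org/x/crypto/curve25519</code>. It performs the same X25519 operation without the array pointers. Each side combines its own secret key with the other’s public key, and both arrive at the same 32-byte shared secret.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// sharedSecrets runs one X25519 exchange and returns what each side computes.
func sharedSecrets() ([]byte, []byte, error) {
	curve := ecdh.X25519()
	alice, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	bob, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Each party uses its own secret key and the peer's public key.
	s1, err := alice.ECDH(bob.PublicKey())
	if err != nil {
		return nil, nil, err
	}
	s2, err := bob.ECDH(alice.PublicKey())
	if err != nil {
		return nil, nil, err
	}
	return s1, s2, nil
}

func main() {
	s1, s2, err := sharedSecrets()
	if err != nil {
		panic(err)
	}
	fmt.Printf("shared %x\nequal: %v\n", s1, bytes.Equal(s1, s2))
}
</code></pre></div></div>

<p>To use a seed derived from a passphrase instead of <code class="language-plaintext highlighter-rouge">crypto/rand</code>, pass the 32 bytes to <code class="language-plaintext highlighter-rouge">ecdh.X25519().NewPrivateKey()</code>.</p>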

<p>For Ed25519, the API is higher-level:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"crypto/rand"</span>
	<span class="s">"fmt"</span>

	<span class="s">"golang.org/x/crypto/ed25519"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">seed</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="n">ed25519</span><span class="o">.</span><span class="n">SeedSize</span><span class="p">)</span>
	<span class="n">rand</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span> <span class="c">// FIXME: check for error</span>
	<span class="n">key</span> <span class="o">:=</span> <span class="n">ed25519</span><span class="o">.</span><span class="n">NewKeyFromSeed</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"pub %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">key</span><span class="p">[</span><span class="m">32</span><span class="o">:</span><span class="p">])</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"sec %x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">key</span><span class="p">[</span><span class="o">:</span><span class="m">32</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Signing a message with this key is just one function call:
<code class="language-plaintext highlighter-rouge">ed25519.Sign()</code>.</p>
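<p>Here is a minimal sketch of that call and its counterpart, <code class="language-plaintext highlighter-rouge">ed25519.Verify()</code>. It uses the standard library’s <code class="language-plaintext highlighter-rouge">crypto/ed25519</code>, which has the same API as the <code class="language-plaintext highlighter-rouge">x/crypto</code> package above. The all-zero seed is only a placeholder. Because the key comes from a fixed seed, the example also shows the property this whole project relies on: the same seed always yields the same key pair.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/ed25519"
	"fmt"
)

func main() {
	seed := make([]byte, ed25519.SeedSize) // placeholder; use a derived seed
	key := ed25519.NewKeyFromSeed(seed)
	pub := key.Public().(ed25519.PublicKey)

	msg := []byte("hello")
	sig := ed25519.Sign(key, msg) // 64 bytes, deterministic for key and msg
	fmt.Printf("sig %x\n", sig)
	fmt.Println("valid:", ed25519.Verify(pub, msg, sig))
	fmt.Println("tampered:", ed25519.Verify(pub, []byte("hellO"), sig))
}
</code></pre></div></div>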

<p>Unfortunately that’s the easy part. The other 400 lines of the real
program are concerned only with encoding these values in the complex
OpenPGP format. That’s the hard part. GnuPG’s <code class="language-plaintext highlighter-rouge">--list-packets</code> option
was really useful for debugging this part.</p>

<h3 id="openpgp-specification">OpenPGP specification</h3>

<p>(Feel free to skip this section if the OpenPGP wire format isn’t
interesting to you.)</p>

<p>Following the specification was a real challenge, especially since many
of the details for Curve25519 only appear in still incomplete (and still
erroneous) updates to the specification. I certainly don’t envy the
people who have to parse arbitrary OpenPGP packets. It’s finicky and has
arbitrary parts that don’t seem to serve any purpose, such as redundant
prefix and suffix bytes on signature inputs. Fortunately I only had to
worry about the subset that represents an unencrypted secret key export.</p>

<p>OpenPGP data is broken up into <em>packets</em>. Each packet begins with a tag
identifying its type, followed by a length, which is itself encoded with a
variable number of bytes. All the packets produced by passphrase2pgp are short, so I could
pretend lengths were all a single byte long.</p>

<p>For a secret key export with one subkey, we need the following packets
in this order:</p>

<ol>
  <li>Secret-Key: Public-Key packet with secret key appended</li>
  <li>User ID: just a length-prefixed, UTF-8 string</li>
  <li>Signature: binds Public-Key packet (1) and User ID packet (2)</li>
  <li>Secret-Subkey: Public-Subkey packet with secret subkey appended</li>
  <li>Signature: binds Public-Key packet (1) and Public-Subkey packet (4)</li>
</ol>
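<p>Under the single-byte-length simplification, a packet header is just two bytes. This sketch assumes the RFC 4880 new-format header:</p>

<ul>
  <li>The first byte is 0xC0 OR’d with the tag.</li>
  <li>The second byte is a one-octet length, which is valid for bodies up to 191 bytes.</li>
</ul>

<p>It illustrates the format and is not necessarily the exact encoding passphrase2pgp emits. The tag constants are the numbers RFC 4880 assigns to the packet types in the list above.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "fmt"

// Packet tags from RFC 4880, section 4.3.
const (
	tagSignature    = 2
	tagSecretKey    = 5
	tagPublicKey    = 6
	tagSecretSubkey = 7
	tagUserID       = 13
	tagPublicSubkey = 14
)

// packet wraps body in a new-format header with a one-octet length.
// Bodies of 192 bytes or more would need a longer length encoding.
func packet(tag byte, body []byte) []byte {
	if len(body) > 191 {
		panic("body too long for a one-octet length")
	}
	return append([]byte{0xc0 | tag, byte(len(body))}, body...)
}

func main() {
	fmt.Printf("% x\n", packet(tagUserID, []byte("Foo"))) // cd 03 46 6f 6f
}
</code></pre></div></div>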

<p>A Public-Key packet contains the creation date, key type, and public key
data. A Secret-Key packet is the same, but with the secret key literally
appended on the end and a different tag. The Key ID is (essentially) a
SHA-1 hash of the Public-Key packet, meaning <strong>the creation date is part
of the Key ID</strong>. That’s important for later.</p>
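<p>RFC 4880 (section 12.2) defines the fingerprint of a version 4 key as the SHA-1 hash of three things: the byte 0x99, a two-octet length, and the Public-Key packet body. The Key ID is the fingerprint’s low 64 bits, i.e. its last 8 bytes. The sketch below applies that definition to an EdDSA packet body. The body layout is an assumption based on the draft update to the specification, and the public key is a zero-filled placeholder. Changing only the creation timestamp produces a different Key ID.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// publicKeyBody builds a v4 Public-Key packet body for an Ed25519 key.
// Assumed layout (draft update to RFC 4880): version, creation time,
// algorithm 22 (EdDSA), curve OID, then the point as an MPI with a 0x40
// prefix byte.
func publicKeyBody(created uint32, pub [32]byte) []byte {
	body := []byte{4, 0, 0, 0, 0, 22}
	binary.BigEndian.PutUint32(body[1:5], created)
	oid := []byte{0x2b, 0x06, 0x01, 0x04, 0x01, 0xda, 0x47, 0x0f, 0x01}
	body = append(body, byte(len(oid)))
	body = append(body, oid...)
	body = append(body, 0x01, 0x07, 0x40) // MPI bit count 263, then 0x40
	return append(body, pub[:]...)
}

// fingerprint hashes 0x99, the two-octet body length, and the body.
func fingerprint(body []byte) [20]byte {
	h := sha1.New()
	h.Write([]byte{0x99, byte(len(body) >> 8), byte(len(body))})
	h.Write(body)
	var fp [20]byte
	copy(fp[:], h.Sum(nil))
	return fp
}

// keyID is the low 64 bits of the fingerprint.
func keyID(fp [20]byte) uint64 {
	return binary.BigEndian.Uint64(fp[12:])
}

func main() {
	var pub [32]byte // placeholder public key
	for _, t := range []uint32{0, 1} {
		fp := fingerprint(publicKeyBody(t, pub))
		fmt.Printf("created %d: fingerprint %x, key ID %016X\n", t, fp, keyID(fp))
	}
}
</code></pre></div></div>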

<p>I had wondered if the <a href="https://shattered.io/">SHAttered</a> attack could be used to create
two different keys with the same full Key ID. However, there’s no slack
space anywhere in the input, so I doubt it.</p>

<p>User IDs are usually an RFC 2822 name and email address, but that’s only
convention. It can literally be an empty string, though that wouldn’t be
useful. OpenPGP clients that require anything more than an empty string,
such as GnuPG during key generation, are adding artificial restrictions.</p>

<p>The first Signature packet indicates the signature date, the signature
issuer’s Key ID, and then optional metadata about how the primary key is
to be used and the capabilities of the key owner’s client. The signature
itself is formed by appending the Public-Key packet portion of the
Secret-Key packet, the User ID packet, and the previously described
contents of the signature packet. The concatenation is hashed, the hash
is signed, and the signature is appended to the packet. Since the
options are included in the signature, they can’t be changed by another
person.</p>
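<p>The “redundant prefix and suffix bytes” mentioned earlier show up at this step. Following RFC 4880, section 5.2.4, a v4 User ID certification hashes these pieces in order:</p>

<ol>
  <li>The key: 0x99, a two-octet length, and the Public-Key packet body.</li>
  <li>The User ID: 0xB4, a four-octet length, and the string.</li>
  <li>The signature packet’s hashed fields.</li>
  <li>A six-byte trailer: 0x04, 0xFF, and the four-octet length of those hashed fields.</li>
</ol>

<p>In this sketch the key body and hashed fields are placeholder arguments, and SHA-256 is an assumed choice of hash algorithm.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// certificationInput concatenates everything a v4 User ID self-signature
// covers. keyBody is the Public-Key packet body; hashed is the signature
// packet from its version byte through the hashed subpackets.
func certificationInput(keyBody []byte, uid string, hashed []byte) []byte {
	var buf []byte
	buf = append(buf, 0x99)
	buf = binary.BigEndian.AppendUint16(buf, uint16(len(keyBody)))
	buf = append(buf, keyBody...)

	buf = append(buf, 0xb4)
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(uid)))
	buf = append(buf, uid...)

	buf = append(buf, hashed...)

	// Trailer: version, 0xFF, and the length of the hashed fields.
	buf = append(buf, 0x04, 0xff)
	return binary.BigEndian.AppendUint32(buf, uint32(len(hashed)))
}

func main() {
	keyBody := []byte{4, 0, 0, 0, 0}       // placeholder, truncated
	hashed := []byte{4, 0x13, 22, 8, 0, 0} // placeholder signature fields
	digest := sha256.Sum256(certificationInput(keyBody, "Foo", hashed))
	fmt.Printf("digest %x\n", digest)
}
</code></pre></div></div>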

<p>In theory the signature is redundant. A client could accept the
Secret-Key packet and User ID packet and consider the key imported. It
would then create its own self-signature since it has everything it
needs. However, my primary target for passphrase2pgp is GnuPG, and it
will not accept secret keys that are not self-signed.</p>

<p>The Secret-Subkey packet is exactly the same as the Secret-Key packet
except that it uses a different tag to indicate it’s a subkey.</p>

<p>The second Signature packet is constructed the same as the previous
signature packet. However, it signs the concatenation of the Public-Key
and Public-Subkey packets, binding the subkey to that primary key. This
key may similarly have its own options.</p>

<p>To create a public key export from this input, a client need only chop
off the secret keys and fix up the packet tags and lengths. The
signatures remain untouched since they didn’t include the secret keys.
That’s essentially what other people will receive about your key.</p>

<p>If someone else were to create a Signature packet binding your
Public-Subkey packet with their Public-Key packet, they could set their
own options on their version of the key. So my question is: Do clients
properly track this separate set of options and separate owner for the
same key? If not, they have a problem!</p>

<p>The format may not sound so complex from this description, but there are
a ton of little details that all need to be correct. To make matters
worse, the feedback is usually just a binary “valid” or “invalid”. The
world could use an OpenPGP format debugger.</p>

<h3 id="usage">Usage</h3>

<p>There is one required argument: either <code class="language-plaintext highlighter-rouge">--uid</code> (<code class="language-plaintext highlighter-rouge">-u</code>) or <code class="language-plaintext highlighter-rouge">--load</code>
(<code class="language-plaintext highlighter-rouge">-l</code>). The former specifies a User ID since a key with an empty User ID
is pretty useless. It’s my own artificial restriction on the User ID.
The latter loads a previously-generated key which will come with a User
ID.</p>

<p>To generate a key for use in GnuPG, just pipe the output straight into
GnuPG:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp --uid "Foo &lt;foo@example.com&gt;" | gpg --import
</code></pre></div></div>

<p>You will be prompted for a passphrase. That passphrase is run through
<a href="https://github.com/P-H-C/phc-winner-argon2">Argon2id</a>, a memory-hard KDF, with the User ID as the salt.
Deriving the key requires 8 passes over 1GB of state, which takes my
current computers around 8 seconds. With the <code class="language-plaintext highlighter-rouge">--paranoid</code> (<code class="language-plaintext highlighter-rouge">-x</code>) option
enabled, that becomes 16 passes over 2GB (perhaps not paranoid enough?).
The output is 64 bytes: 32 bytes to seed the primary key and 32 bytes to
seed the subkey.</p>
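<p>Here is a minimal sketch of that derivation using <code class="language-plaintext highlighter-rouge">golang.org/x/crypto/argon2</code>. The following details come from the description above:</p>

<ul>
  <li>8 passes over 1GB of memory.</li>
  <li>The User ID as the salt.</li>
  <li>64 bytes of output.</li>
</ul>

<p>The thread count is an assumption. Turning the two halves into an Ed25519 key and an X25519 key (via <code class="language-plaintext highlighter-rouge">crypto/ecdh</code>) is an illustration, not passphrase2pgp’s actual code. The function takes the cost parameters as arguments so it can be tried cheaply.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/ecdh"
	"crypto/ed25519"
	"fmt"

	"golang.org/x/crypto/argon2"
)

const (
	passes  = 8           // settings described above
	memory  = 1024 * 1024 // in KiB, i.e. 1GB
	threads = 1           // assumption: not stated in the text
)

// deriveKeys stretches the passphrase with Argon2id, salted with the User
// ID, and splits the 64-byte output into two 32-byte seeds.
func deriveKeys(pass, uid string, time, mem uint32) (ed25519.PrivateKey, *ecdh.PrivateKey, error) {
	out := argon2.IDKey([]byte(pass), []byte(uid), time, mem, threads, 64)
	primary := ed25519.NewKeyFromSeed(out[:32])
	subkey, err := ecdh.X25519().NewPrivateKey(out[32:])
	return primary, subkey, err
}

func main() {
	// Cheap demo cost: 1 pass over 64MB. Pass passes, memory for the real thing.
	primary, subkey, err := deriveKeys("correct horse battery staple", "Foo", 1, 64*1024)
	if err != nil {
		panic(err)
	}
	fmt.Printf("primary pub %x\n", primary.Public().(ed25519.PublicKey))
	fmt.Printf("subkey pub  %x\n", subkey.PublicKey().Bytes())
}
</code></pre></div></div>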

<p>Despite the aggressive KDF settings, you will still need to choose a
strong passphrase. Anyone who has your public key can mount an offline
attack. A 10-word Diceware or <a href="/blog/2017/07/27/">Pokerware</a> passphrase is more than
sufficient (~128 bits) while also being quite reasonable to memorize.</p>

<p>Since the User ID is the salt, an attacker couldn’t build a single
rainbow table to attack passphrases for different people. (Though your
passphrase really should be strong enough that this won’t matter!) The
cost is that you’ll need to use exactly the same User ID again to
reproduce the key. <em>In theory</em> you could change the User ID afterward to
whatever you like without affecting the Key ID, though it will require a
new self-signature.</p>

<p>The keys are not encrypted (no S2K), and there are few options you can
choose when generating the keys. If you want to change any of this, use
GnuPG’s <code class="language-plaintext highlighter-rouge">--edit-key</code> tool after importing. For example, to set a
protection passphrase:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --edit-key Foo
gpg&gt; passwd
</code></pre></div></div>

<p>There’s a lot that can be configured from this interface.</p>

<p>If you just need the public key to publish or share, the <code class="language-plaintext highlighter-rouge">--public</code>
(<code class="language-plaintext highlighter-rouge">-p</code>) option will suppress the private parts and output only a public
key. It works well in combination with ASCII armor, <code class="language-plaintext highlighter-rouge">--armor</code> (<code class="language-plaintext highlighter-rouge">-a</code>).
For example, to put your public key on the clipboard:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp -u '...' -ap | xclip
</code></pre></div></div>

<p>The tool can create detached signatures (<code class="language-plaintext highlighter-rouge">--sign</code>, <code class="language-plaintext highlighter-rouge">-S</code>) entirely on its
own, too, so you don’t need to import the keys into GnuPG just to make
signatures:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp --sign --uid '...' program.exe
</code></pre></div></div>

<p>This would create a file named <code class="language-plaintext highlighter-rouge">program.exe.sig</code> with the detached
signature, ready to be verified by another OpenPGP implementation. In
fact, you can hook it directly up to Git for signing your tags and
commits without GnuPG:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git config --global gpg.program passphrase2pgp
</code></pre></div></div>

<p>This only works for signing, and it cannot verify (<code class="language-plaintext highlighter-rouge">verify-tag</code> or
<code class="language-plaintext highlighter-rouge">verify-commit</code>).</p>

<p>It’s pretty tedious to enter the <code class="language-plaintext highlighter-rouge">--uid</code> option all the time, so, if
omitted, passphrase2pgp will infer the User ID from the environment
variables REALNAME and EMAIL. Combined with the KEYID environment
variable (see the README for details), you can easily get away with
<em>never</em> storing your keys: only generate them on demand when needed.</p>

<p>That’s how I intend to use passphrase2pgp. When I want to sign a file,
I’ll only need one option, one passphrase prompt, and a few seconds of
patience:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ passphrase2pgp -S path/to/file
</code></pre></div></div>

<h4 id="january-1-1970">January 1, 1970</h4>

<p>The first time you run the tool you might notice one offensive aspect of
its output: Your key will be dated January 1, 1970 — i.e. unix epoch
zero. This predates PGP itself by more than two decades, so it might
alarm people who receive your key.</p>

<p>Why do this? As I noted before, the creation date is part of the Key ID.
Use a different date, and, as far as OpenPGP is concerned, you have a
different key. Since users probably don’t want to remember a specific
datetime, <em>at seconds resolution</em>, in addition to their passphrase,
passphrase2pgp uses the same hard-coded date by default. A date of
January 1, 1970 is like NULL in a database: no data.</p>

<p>If you don’t like this, you can override it with the <code class="language-plaintext highlighter-rouge">--time</code> (<code class="language-plaintext highlighter-rouge">-t</code>) or
<code class="language-plaintext highlighter-rouge">--now</code> (<code class="language-plaintext highlighter-rouge">-n</code>) options, but it’s up to you to remain consistent.</p>

<h4 id="vanity-keys">Vanity Keys</h4>

<p>If you’re interested in vanity keys — e.g. where the Key ID spells out
words or looks unusual — it wouldn’t take much work to hack up the
passphrase2pgp source into generating your preferred vanity keys. It
would easily beat anything else I could find online.</p>

<h3 id="reconsidering-limited-openpgp">Reconsidering limited OpenPGP</h3>

<p>Initially my intention was <em>never</em> to output an encryption subkey, and
passphrase2pgp would only be useful for signatures. By default it only
produces a signing key, but you can get an encryption subkey
with the <code class="language-plaintext highlighter-rouge">--subkey</code> (<code class="language-plaintext highlighter-rouge">-s</code>) option. I figured it might be useful to
generate an encryption key, even if it’s not output by default. Users
can always ask for it later if they have a need for it.</p>

<p>Why only a signing key? Nobody should be using OpenPGP for encryption
anymore. Use better tools instead and retire the <a href="https://blog.cryptographyengineering.com/2014/08/13/whats-matter-with-pgp/">20th century
cryptography</a>. If you don’t have an encryption subkey, nobody can
send you OpenPGP-encrypted messages.</p>

<p>In contrast, OpenPGP signatures are still kind of useful and lack a
practical alternative. The Web of Trust failed to reach critical mass,
but that doesn’t seem to matter much in practice. Important OpenPGP keys
can be bootstrapped off TLS by strategically publishing them on HTTPS
servers. Keybase.io has done interesting things in this area.</p>

<p>Further, <a href="https://github.blog/2016-04-05-gpg-signature-verification/">GitHub officially supports OpenPGP signatures</a>, and I
believe GitLab does too. This is another way to establish trust for a
key. IMHO, there’s generally too much emphasis on binding a person’s
legal identity to their OpenPGP key (e.g. the idea behind key-signing
parties). I suppose that’s useful for holding a person legally
accountable if they do something wrong. I’d prefer to trust a key that has
an established history of valuable community contributions, even if done
so <a href="https://en.wikipedia.org/wiki/Why_the_lucky_stiff">only under a pseudonym</a>.</p>

<p>So sometime in the future I may again advertise an OpenPGP public key.
If I do, those keys would certainly be generated with passphrase2pgp. I
may not even store the secret keys on a keyring, and instead generate
them on the fly only when I occasionally need them.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Looking for Entropy in All the Wrong Places</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/04/30/"/>
    <id>urn:uuid:67da1a72-1103-4e12-a646-8a57443619eb</id>
    <updated>2019-04-30T22:50:09Z</updated>
    <category term="c"/><category term="lua"/><category term="crypto"/>
    <content type="html">
      <![CDATA[<p>Imagine we’re writing a C program and we need some random numbers. Maybe
it’s for a game, or for a Monte Carlo simulation, or for cryptography.
The standard library has a <code class="language-plaintext highlighter-rouge">rand()</code> function for some of these purposes.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">rand</span><span class="p">();</span>
</code></pre></div></div>

<p>There are some problems with this. Typically the implementation is a
rather poor PRNG, and <a href="/blog/2017/09/21/">we can do much better</a>. It’s a poor choice
for Monte Carlo simulations, and outright dangerous for cryptography.
Furthermore, it’s usually a dynamic function call, which <a href="/blog/2018/05/27/">has a high
overhead</a> compared to how little the function actually does. In
glibc, it’s also synchronized, adding even more overhead.</p>

<p>But, more importantly, this function returns the same sequence of
values each time the program runs. If we want different numbers each
time the program runs, it needs to be seeded — but seeded with <em>what</em>?
Regardless of what PRNG we ultimately use, we need inputs unique to this
particular execution.</p>

<h3 id="the-right-places">The right places</h3>

<p>On any modern unix-like system, the classical approach is to open
<code class="language-plaintext highlighter-rouge">/dev/urandom</code> and read some bytes. It’s not part of POSIX but it is a
<em>de facto</em> standard. These random bits are seeded from the physical
world by the operating system, making them highly unpredictable and
uncorrelated. They’re suitable for keying a CSPRNG and, from
there, <a href="https://blog.cr.yp.to/20140205-entropy.html">generating all the secure random bits you will ever
need</a> (perhaps with <a href="https://blog.cr.yp.to/20170723-random.html">fast-key-erasure</a>). Why not
<code class="language-plaintext highlighter-rouge">/dev/random</code>? Because on Linux <a href="https://www.2uo.de/myths-about-urandom/">it’s pointlessly
superstitious</a>, which has basically ruined that path for
everyone.</p>

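<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span>
<span class="cp">#include</span> <span class="cpf">&lt;stdlib.h&gt;</span>

<span class="cm">/* Returns zero on failure. */</span>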
<span class="kt">int</span>
<span class="nf">getbits</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">FILE</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/urandom"</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">fread</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span>
        <span class="n">fclose</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span>
<span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="n">seed</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">getbits</span><span class="p">(</span><span class="o">&amp;</span><span class="n">seed</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">seed</span><span class="p">)))</span> <span class="p">{</span>
        <span class="n">srand</span><span class="p">(</span><span class="n">seed</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">abort</span><span class="p">();</span>  <span class="cm">/* no entropy available: give up */</span>
    <span class="p">}</span>

    <span class="cm">/* ... */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note how there are two different places <code class="language-plaintext highlighter-rouge">getbits()</code> could fail, with
multiple potential causes.</p>

<ul>
  <li>
    <p>It could fail to open the file. Perhaps the program isn’t running on a
modern unix-like system. Perhaps it’s running in a chroot and
<code class="language-plaintext highlighter-rouge">/dev/urandom</code> wasn’t created. Perhaps there are too many file
descriptors already open. Perhaps there isn’t enough memory available
to open a file. Perhaps the file permissions disallow it or it’s
blocked by Mandatory Access Control (MAC).</p>
  </li>
  <li>
    <p>It could fail to read the file. This essentially can’t happen unless
the system is severely misconfigured, in which case a successful
read would be suspect anyway. In this case it’s probably still a
good idea to check the result.</p>
  </li>
</ul>

<p>The need to create a file descriptor is a serious issue for libraries.
Libraries that quietly create and close file descriptors can interfere
with the main program, especially if it’s asynchronous. The main program
might rely on file descriptors being consecutive, predictable, or
monotonic (<a href="https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html">example</a>). File descriptors are also a limited resource,
so it may exhaust a file descriptor slot needed for the main program.
For a network service, a remote attacker could perhaps open enough
sockets to deny a file descriptor to <code class="language-plaintext highlighter-rouge">getbits()</code>, blocking the program
from gathering entropy.</p>

<p><code class="language-plaintext highlighter-rouge">/dev/urandom</code> is simple, but it’s not an ideal API.</p>

<h4 id="getentropy2">getentropy(2)</h4>

<p>Wouldn’t it be nicer if our program could just directly ask the
operating system to fill a buffer with random bits? That’s what the
OpenBSD folks thought, so they introduced a <a href="https://man.openbsd.org/getentropy.2"><code class="language-plaintext highlighter-rouge">getentropy(2)</code></a>
system call. When called correctly <em>it cannot fail</em>!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">getentropy</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">buflen</span><span class="p">);</span>
</code></pre></div></div>

<p>Other operating systems followed suit, <a href="https://lwn.net/Articles/711013/">including Linux</a>, though
on Linux <code class="language-plaintext highlighter-rouge">getentropy(2)</code> is a library function implemented using
<a href="http://man7.org/linux/man-pages/man2/getrandom.2.html"><code class="language-plaintext highlighter-rouge">getrandom(2)</code></a>, the actual system call. It’s been in the Linux
kernel since version 3.17 (October 2014), but the libc wrapper didn’t
appear in glibc until version 2.25 (February 2017). So, as of this
writing, there are many systems where it’s still not practical to use,
even if their kernel is new enough.</p>

<p>For now, on Linux you may still want to check for an <code class="language-plaintext highlighter-rouge">ENOSYS</code>
result and have a fallback strategy in place. Some systems are still
running kernels that are 5 years old, or older.</p>
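<p>One way to handle the <code class="language-plaintext highlighter-rouge">ENOSYS</code> case is sketched below. It assumes glibc 2.25 or later (or OpenBSD), where <code class="language-plaintext highlighter-rouge">getentropy()</code> is declared in <code class="language-plaintext highlighter-rouge">&lt;unistd.h&gt;</code>. Since <code class="language-plaintext highlighter-rouge">getentropy()</code> refuses requests larger than 256 bytes, the sketch asks for at most 256 bytes at a time, and it falls back to <code class="language-plaintext highlighter-rouge">/dev/urandom</code> only when the kernel lacks <code class="language-plaintext highlighter-rouge">getrandom(2)</code>. The name <code class="language-plaintext highlighter-rouge">fill_random()</code> is hypothetical, not a library API.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define _DEFAULT_SOURCE  /* expose getentropy() in glibc's &lt;unistd.h&gt; */
#include &lt;errno.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;

/* Hypothetical helper: fill buf with len random bytes.
 * Returns 0 on success, -1 on failure. */
int
fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len &gt; 0) {
        /* getentropy() fails (EIO) if asked for more than 256 bytes */
        size_t n = len &lt; 256 ? len : 256;
        if (getentropy(p, n) != 0) {
            if (errno != ENOSYS)
                return -1;
            /* Kernel predates getrandom(2): read the rest from the device */
            FILE *f = fopen("/dev/urandom", "rb");
            if (!f)
                return -1;
            size_t got = fread(p, 1, len, f);
            fclose(f);
            return got == len ? 0 : -1;
        }
        p += n;
        len -= n;
    }
    return 0;
}
</code></pre></div></div>

<p>The fallback still has every <code class="language-plaintext highlighter-rouge">/dev/urandom</code> caveat listed above, but it is only reached on kernels older than 3.17.</p>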

<p>OpenBSD also has another trick up its trick-filled sleeves: the
<a href="https://github.com/openbsd/src/blob/master/libexec/ld.so/SPECS.randomdata"><code class="language-plaintext highlighter-rouge">.openbsd.randomdata</code></a> section. Just as the <code class="language-plaintext highlighter-rouge">.bss</code> section is
filled with zeros, the <code class="language-plaintext highlighter-rouge">.openbsd.randomdata</code> section is filled with
securely-generated random bits. You could put your PRNG state in this
section and it will be seeded as part of loading the program. Cool!</p>

<h4 id="rtlgenrandom">RtlGenRandom()</h4>

<p>Windows doesn’t have <code class="language-plaintext highlighter-rouge">/dev/urandom</code>. Instead it has:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">CryptGenRandom()</code></li>
  <li><code class="language-plaintext highlighter-rouge">CryptAcquireContext()</code></li>
  <li><code class="language-plaintext highlighter-rouge">CryptReleaseContext()</code></li>
</ul>

<p>Though in typical Win32 fashion, the API is ugly, overly complicated,
and has multiple possible failure points. It’s essentially impossible
to use without referencing documentation. Ugh.</p>

<p>However, <a href="/blog/2018/04/13/">Windows 98 and later</a> has <a href="https://docs.microsoft.com/en-us/windows/desktop/api/ntsecapi/nf-ntsecapi-rtlgenrandom"><code class="language-plaintext highlighter-rouge">RtlGenRandom()</code></a>,
which has a much more reasonable interface. Looks an awful lot like
<code class="language-plaintext highlighter-rouge">getentropy(2)</code>, eh?</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">BOOLEAN</span> <span class="nf">RtlGenRandom</span><span class="p">(</span>
  <span class="n">PVOID</span> <span class="n">RandomBuffer</span><span class="p">,</span>
  <span class="n">ULONG</span> <span class="n">RandomBufferLength</span>
<span class="p">);</span>
</code></pre></div></div>

<p>The problem is that it’s not quite an official API, and no promises
are made about it. In practice, so much software now depends on it
that the API is unlikely to ever break. Despite the prototype
above, this function is <em>actually</em> named <code class="language-plaintext highlighter-rouge">SystemFunction036()</code>, and
you have to supply your own prototype. Here’s my little drop-in
snippet that turns it nearly into <code class="language-plaintext highlighter-rouge">getentropy(2)</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#ifdef _WIN32
#  define WIN32_LEAN_AND_MEAN
#  include &lt;windows.h&gt;
#  pragma comment(lib, "advapi32.lib")
</span>   <span class="n">BOOLEAN</span> <span class="n">NTAPI</span> <span class="nf">SystemFunction036</span><span class="p">(</span><span class="n">PVOID</span><span class="p">,</span> <span class="n">ULONG</span><span class="p">);</span>
<span class="cp">#  define getentropy(buf, len) (SystemFunction036(buf, len) ? 0 : -1)
#endif
</span></code></pre></div></div>

<p>It works in Wine, too, where, at least in my version, it reads from
<code class="language-plaintext highlighter-rouge">/dev/urandom</code>.</p>
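<p>With that drop-in in place, a call site can be written once against <code class="language-plaintext highlighter-rouge">getentropy()</code>. Here is a minimal sketch that seeds <code class="language-plaintext highlighter-rouge">srand()</code>, assuming a Windows toolchain that can link <code class="language-plaintext highlighter-rouge">advapi32</code> on one side, and glibc 2.25+ or OpenBSD on the other. The name <code class="language-plaintext highlighter-rouge">seed_rand()</code> is hypothetical.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ifdef _WIN32
#  define WIN32_LEAN_AND_MEAN
#  include &lt;windows.h&gt;
#  pragma comment(lib, "advapi32.lib")
   BOOLEAN NTAPI SystemFunction036(PVOID, ULONG);
#  define getentropy(buf, len) (SystemFunction036(buf, len) ? 0 : -1)
#else
#  define _DEFAULT_SOURCE
#  include &lt;unistd.h&gt;  /* getentropy() */
#endif
#include &lt;stdlib.h&gt;

/* Hypothetical helper: seed rand() from the operating system.
 * Returns 0 on success, -1 on failure (the caller decides what to do). */
int
seed_rand(void)
{
    unsigned seed;
    if (getentropy(&amp;seed, sizeof(seed)) != 0)
        return -1;
    srand(seed);
    return 0;
}
</code></pre></div></div>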

<h3 id="the-wrong-places">The wrong places</h3>

<p>That’s all well and good, but suppose we’re masochists. We want our
program to be <a href="/blog/2017/03/30/">maximally portable</a> so we’re sticking strictly to
functionality found in the standard C library. That means no
<code class="language-plaintext highlighter-rouge">getentropy(2)</code> and no <code class="language-plaintext highlighter-rouge">RtlGenRandom()</code>. We can still try to open
<code class="language-plaintext highlighter-rouge">/dev/urandom</code>, but it might fail, or it might not actually be useful,
so we’ll want a backup.</p>

<p>The usual approach found in a thousand tutorials is <code class="language-plaintext highlighter-rouge">time(3)</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">srand</span><span class="p">(</span><span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">));</span>
</code></pre></div></div>

<p>It would be better to <a href="/blog/2018/07/31/">use an integer hash function</a> to mix up the
result from <code class="language-plaintext highlighter-rouge">time(0)</code> before using it as a seed. Otherwise two programs
started close in time may have similar initial sequences.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">srand</span><span class="p">(</span><span class="n">triple32</span><span class="p">(</span><span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">)));</span>
</code></pre></div></div>
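<p><code class="language-plaintext highlighter-rouge">triple32()</code> isn’t defined above. A minimal sketch follows, assuming it is the same three-round xorshift-multiply permutation that serves as the finalizer in <code class="language-plaintext highlighter-rouge">hash32s()</code> below. Every step is invertible, so distinct 32-bit inputs always produce distinct seeds. Truncating <code class="language-plaintext highlighter-rouge">time_t</code> to 32 bits and the helper name <code class="language-plaintext highlighter-rouge">seed_from_time()</code> are assumptions of this sketch.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;time.h&gt;

/* 32-bit integer hash: each xorshift and odd multiply is a bijection */
uint32_t
triple32(uint32_t x)
{
    x ^= x &gt;&gt; 17;
    x *= UINT32_C(0xed5ad4bb);
    x ^= x &gt;&gt; 11;
    x *= UINT32_C(0xac4c1b51);
    x ^= x &gt;&gt; 15;
    x *= UINT32_C(0x31848bab);
    x ^= x &gt;&gt; 14;
    return x;
}

/* Hypothetical helper: consecutive seconds map to unrelated-looking seeds */
void
seed_from_time(void)
{
    srand(triple32((uint32_t)time(NULL)));
}
</code></pre></div></div>

<p>This only hides the similarity between nearby timestamps. Two runs within the same second still get the same seed.</p>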

<p>The more pressing issue is that <code class="language-plaintext highlighter-rouge">time(3)</code> has a resolution of one
second. If the program is run twice inside of a second, they’ll both
have the same sequence of numbers. It would be better to use a higher
resolution clock, but, <strong>standard C doesn’t provide a clock with greater
than one second resolution</strong>. That normally requires calling into POSIX
or Win32.</p>

<p>So, we need to find some other sources of entropy unique to each
execution of the program.</p>

<h4 id="quick-and-dirty-string-hash-function">Quick and dirty “string” hash function</h4>

<p>Before we get into that, we need a way to mix these different sources
together. Here’s a <a href="/blog/2018/06/10/">small</a>, 32-bit “string” hash function. The loop
is the same algorithm as Java’s <code class="language-plaintext highlighter-rouge">hashCode()</code>, and I appended <a href="/blog/2018/07/31/">my own
integer hash</a> as a finalizer for much better diffusion.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">hash32s</span><span class="p">(</span><span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">h</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">buf</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mi">31</span> <span class="o">+</span> <span class="n">p</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0xed5ad4bb</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">11</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0xac4c1b51</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0x31848bab</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">^=</span> <span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">14</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">h</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It accepts a starting hash value, which is essentially a “context” for
the digest that allows different inputs to be appended together. The
finalizer acts as an implicit “stop” symbol in between inputs.</p>

<p>I used fixed-width integers, but it could be written nearly as concisely
using only <code class="language-plaintext highlighter-rouge">unsigned long</code> and some masking to truncate to 32 bits. I
leave this as an exercise to the reader.</p>
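<p>For the curious, here is one possible answer: a sketch of the same function using only <code class="language-plaintext highlighter-rouge">unsigned long</code>, which is guaranteed to be at least 32 bits wide. The function masks after every step that can overflow. Because arithmetic modulo a larger power of two, reduced modulo 2<sup>32</sup>, agrees with arithmetic modulo 2<sup>32</sup>, this should match the <code class="language-plaintext highlighter-rouge">uint32_t</code> version. The name <code class="language-plaintext highlighter-rouge">hash32s_ul()</code> is hypothetical.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;

#define MASK32 0xffffffffUL

/* hash32s() without fixed-width types; the result fits in 32 bits */
unsigned long
hash32s_ul(const void *buf, size_t len, unsigned long h)
{
    const unsigned char *p = buf;
    h &amp;= MASK32;
    for (size_t i = 0; i &lt; len; i++)
        h = (h * 31 + p[i]) &amp; MASK32;
    h ^= h &gt;&gt; 17;
    h = (h * 0xed5ad4bbUL) &amp; MASK32;
    h ^= h &gt;&gt; 11;
    h = (h * 0xac4c1b51UL) &amp; MASK32;
    h ^= h &gt;&gt; 15;
    h = (h * 0x31848babUL) &amp; MASK32;
    h ^= h &gt;&gt; 14;
    return h;
}
</code></pre></div></div>

<p>The right shifts need no mask because <code class="language-plaintext highlighter-rouge">h</code> never exceeds 32 bits when they run.</p>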

<p>Some of the values to be mixed in will be pointers themselves. These
could instead be cast to integers and passed through an integer hash
function, but using the string hash avoids <a href="/blog/2016/05/30/">various caveats</a>. Besides,
one of the inputs will be a string, so we’ll need this function anyway.</p>

<h4 id="randomized-pointers-aslr-random-stack-gap-etc">Randomized pointers (ASLR, random stack gap, etc.)</h4>

<p>Attackers can use predictability to their advantage, so modern systems
use unpredictability to improve security. Memory addresses for various
objects and executable code are randomized since some attacks require
an attacker to know their addresses. We can skim entropy from these
pointers to seed our PRNG.</p>

<p>Address Space Layout Randomization (ASLR) is when the loader places
executable code and its associated data at a random offset. Code
designed for this is called Position Independent Code (PIC). This has
long been used when loading dynamic libraries so that all of the
libraries on a system don’t have to coordinate with each other to
avoid overlapping.</p>

<p>To improve security, it has more recently been extended to programs
themselves. On both modern unix-like systems and Windows,
position-independent executables (PIE) are now the default.</p>

<p>To skim entropy from ASLR, we just need the address of one of our
functions. All the functions in our program will have the same relative
offset, so there’s no reason to use more than one. An obvious choice is
<code class="language-plaintext highlighter-rouge">main()</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">uint32_t</span> <span class="n">h</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="cm">/* initial hash value */</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">mainptr</span><span class="p">)()</span> <span class="o">=</span> <span class="n">main</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mainptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">mainptr</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>Notice I had to store the address of <code class="language-plaintext highlighter-rouge">main()</code> in a variable, and then
treat <em>the pointer itself</em> as a buffer for the hash function? It’s not
hashing the machine code behind <code class="language-plaintext highlighter-rouge">main</code>, just its address. The symbol
<code class="language-plaintext highlighter-rouge">main</code> doesn’t store an address, so it can’t be given to the hash
function to represent its address. This is analogous to an array
versus a pointer.</p>

<p>On a typical x86-64 Linux system, and when this is a PIE, that’s about
3 bytes worth of entropy. On 32-bit systems, virtual memory is so
tight that it’s worth a lot less. We might want more entropy than
that, and we want to cover the case where the program isn’t compiled
as a PIE.</p>

<p>On unix-like systems, programs are typically dynamically linked against
the C library, libc. Each shared object gets its own ASLR offset, so we
can skim more entropy from each shared object by picking a function or
variable from each. Let’s do <code class="language-plaintext highlighter-rouge">malloc(3)</code> for libc ASLR:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">mallocptr</span><span class="p">)()</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mallocptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">mallocptr</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>Allocators themselves often randomize the addresses they return so that
data objects are stored at unpredictable addresses. In particular, glibc
uses different strategies for small (<code class="language-plaintext highlighter-rouge">brk(2)</code>) versus big (<code class="language-plaintext highlighter-rouge">mmap(2)</code>)
allocations. That’s two different sources of entropy:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="o">*</span><span class="n">small</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>        <span class="cm">/* 1 byte */</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">small</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">small</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">small</span><span class="p">);</span>

    <span class="kt">void</span> <span class="o">*</span><span class="n">big</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">1UL</span> <span class="o">&lt;&lt;</span> <span class="mi">20</span><span class="p">);</span>  <span class="cm">/* 1 MB */</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">big</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">big</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="n">free</span><span class="p">(</span><span class="n">big</span><span class="p">);</span>
</code></pre></div></div>

<p>Finally the stack itself is often mapped at a random address, or at
least started with a random gap, so that local variable addresses are
also randomized.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">ptr</span><span class="p">;</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ptr</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>
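<p>Collected into one place, the pointer sources above might look like the following sketch. It repeats <code class="language-plaintext highlighter-rouge">hash32s()</code> so it compiles on its own. It also takes the address of the function itself rather than <code class="language-plaintext highlighter-rouge">main()</code>, relying on the point above that every function in the executable shares one ASLR offset. The name <code class="language-plaintext highlighter-rouge">pointer_entropy()</code> is hypothetical. How many bits each source actually contributes depends on the platform, compiler flags, and allocator.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

/* hash32s() as defined above */
static uint32_t
hash32s(const void *buf, size_t len, uint32_t h)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i &lt; len; i++)
        h = h * 31 + p[i];
    h ^= h &gt;&gt; 17;
    h *= UINT32_C(0xed5ad4bb);
    h ^= h &gt;&gt; 11;
    h *= UINT32_C(0xac4c1b51);
    h ^= h &gt;&gt; 15;
    h *= UINT32_C(0x31848bab);
    h ^= h &gt;&gt; 14;
    return h;
}

/* Hypothetical helper: mix randomized addresses into h */
static uint32_t
pointer_entropy(uint32_t h)
{
    /* Executable's ASLR offset (when built as a PIE) */
    uint32_t (*self)(uint32_t) = pointer_entropy;
    h = hash32s(&amp;self, sizeof(self), h);

    /* libc's ASLR offset, if dynamically linked */
    void *(*mallocptr)(size_t) = malloc;
    h = hash32s(&amp;mallocptr, sizeof(mallocptr), h);

    /* Small and big allocations (brk vs. mmap on glibc) */
    void *small = malloc(1);
    h = hash32s(&amp;small, sizeof(small), h);
    free(small);
    void *big = malloc(1UL &lt;&lt; 20);
    h = hash32s(&amp;big, sizeof(big), h);
    free(big);

    /* Stack address */
    void *ptr = &amp;ptr;
    h = hash32s(&amp;ptr, sizeof(ptr), h);
    return h;
}
</code></pre></div></div>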

<h4 id="time-sources">Time sources</h4>

<p>We haven’t used <code class="language-plaintext highlighter-rouge">time(3)</code> yet! Let’s still do that, using the full
width of <code class="language-plaintext highlighter-rouge">time_t</code> this time around:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">time_t</span> <span class="n">t</span> <span class="o">=</span> <span class="n">time</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">t</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">t</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>We do have another time source to consider: <code class="language-plaintext highlighter-rouge">clock(3)</code>. It returns an
approximation of the processor time used by the program. There’s a
tiny bit of noise and inconsistency between repeated calls. We can use
this to extract a little bit of entropy over many repeated calls.</p>

<p>Naively we might try to use it like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="cm">/* Note: don't use this */</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">1000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">clock_t</span> <span class="n">c</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">c</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The problem is that the resolution of <code class="language-plaintext highlighter-rouge">clock()</code> is typically coarse
enough that modern computers can execute thousands of instructions between
ticks. On Windows, where <code class="language-plaintext highlighter-rouge">CLOCKS_PER_SEC</code> is only 1000, that entire loop
will typically complete before the result from <code class="language-plaintext highlighter-rouge">clock()</code> increments
even once. With that arrangement we’re hardly getting anything from
it! So here’s a better version:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">1000</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="kt">clock_t</span> <span class="n">start</span> <span class="o">=</span> <span class="n">clock</span><span class="p">();</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">clock</span><span class="p">()</span> <span class="o">==</span> <span class="n">start</span><span class="p">)</span>
            <span class="n">counter</span><span class="o">++</span><span class="p">;</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">start</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">start</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="o">&amp;</span><span class="n">counter</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">counter</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>The counter makes the resolution of the clock no longer important. If
it’s low resolution, then we’ll get lots of noise from the counter. If
it’s high resolution, then we get noise from the clock value itself.
Running the hash function an extra time between overall <code class="language-plaintext highlighter-rouge">clock(3)</code>
samples also helps with noise.</p>

<h4 id="a-legitimate-use-of-tmpnam3">A legitimate use of tmpnam(3)</h4>

<p>We’ve got one more source of entropy available: <code class="language-plaintext highlighter-rouge">tmpnam(3)</code>. This
function generates a unique, temporary file name. It’s dangerous to
use as intended because it doesn’t actually create the file. There’s a
race between generating the name for the file and actually creating
it.</p>

<p>Fortunately we don’t actually care about the name as a filename. We’re
using this to sample entropy not directly available to us. In an attempt to
get a unique name, the standard C library draws on its own sources of
entropy.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">L_tmpnam</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
    <span class="n">tmpnam</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
</code></pre></div></div>

<p>The rather unfortunate downside is that many modern systems produce
a <em>linker</em> warning when <code class="language-plaintext highlighter-rouge">tmpnam(3)</code> is linked, even though in
this case it’s completely harmless.</p>

<p>So what goes into a temporary filename? It depends on the
implementation.</p>

<h5 id="glibc-and-musl">glibc and musl</h5>

<p>Both get a high resolution timestamp and generate the filename directly
from the timestamp (no hashing, etc.). Unfortunately glibc does a very
poor job of also mixing <code class="language-plaintext highlighter-rouge">getpid(2)</code> into the timestamp before using it,
and probably makes things worse by doing so.</p>

<p>On these platforms, this is a way to sample a high resolution
timestamp without calling anything non-standard.</p>

<h5 id="dietlibc">dietlibc</h5>

<p>In the latest release as of this writing it uses <code class="language-plaintext highlighter-rouge">rand(3)</code>, which makes
this useless. It’s also a bug since the C library isn’t allowed to
affect the state of <code class="language-plaintext highlighter-rouge">rand(3)</code> outside of <code class="language-plaintext highlighter-rouge">rand(3)</code> and <code class="language-plaintext highlighter-rouge">srand(3)</code>. I
submitted a bug report and this has <a href="https://github.com/ensc/dietlibc/commit/8c8df9579962dc7449fe1f3205fd19eec461aa23">since been fixed</a>.</p>

<p>In the next release it will use a generator seeded by the <a href="https://lwn.net/Articles/301798/">ELF
<code class="language-plaintext highlighter-rouge">AT_RANDOM</code></a> value if available, or ASLR otherwise. This makes
it moderately useful.</p>

<h5 id="libiberty">libiberty</h5>

<p>Generated from <code class="language-plaintext highlighter-rouge">getpid(2)</code> alone, with a counter to handle multiple
calls. It’s basically a way to sample the process ID without actually
calling <code class="language-plaintext highlighter-rouge">getpid(2)</code>.</p>

<h5 id="bsd-libc--bionic-android">BSD libc / Bionic (Android)</h5>

<p>Actually gathers real entropy from the operating system (via
<code class="language-plaintext highlighter-rouge">arc4random(3)</code>), which means we’re getting a lot of mileage out of this
one.</p>

<h5 id="uclibc">uClibc</h5>

<p>Its implementation is obviously forked from glibc. However, it first
tries to read entropy from <code class="language-plaintext highlighter-rouge">/dev/urandom</code>, and only if that fails does
it fall back to glibc’s original high resolution clock XOR <code class="language-plaintext highlighter-rouge">getpid(2)</code>
method (still not hashing it).</p>

<h4 id="finishing-touches">Finishing touches</h4>

<p>Finally, still use <code class="language-plaintext highlighter-rouge">/dev/urandom</code> if it’s available. This doesn’t
require us to trust that the output is anything useful since it’s just
being mixed into the other inputs.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="n">rnd</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
    <span class="kt">FILE</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/urandom"</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">fread</span><span class="p">(</span><span class="n">rnd</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">rnd</span><span class="p">),</span> <span class="mi">1</span><span class="p">,</span> <span class="n">f</span><span class="p">))</span>
            <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="n">rnd</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">rnd</span><span class="p">),</span> <span class="n">h</span><span class="p">);</span>
        <span class="n">fclose</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>When we’re all done gathering entropy, set the seed from the result.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">srand</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>   <span class="cm">/* or whatever you're seeding */</span>
</code></pre></div></div>

<p>That’s bound to find <em>some</em> entropy on just about any host, though
definitely don’t rely on the results for cryptography.</p>
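<p>For reference, here’s a self-contained sketch that strings several of these sources together. The function name <code class="language-plaintext highlighter-rouge">gather_seed()</code> is hypothetical, the <code class="language-plaintext highlighter-rouge">hash32s()</code> here is a C transcription of the Lua version shown later, and <code class="language-plaintext highlighter-rouge">tmpnam(3)</code> is left out to avoid the linker warning. Like everything else in this section, it’s only fit for seeding a non-cryptographic generator.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;time.h&gt;

/* Byte accumulator plus xorshift-multiply finalizer, matching the
 * Lua hash32s() later in this section. */
static uint32_t hash32s(const void *buf, size_t len, uint32_t h)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i &lt; len; i++)
        h = h * 31 + p[i];
    h ^= h &gt;&gt; 17;
    h *= 0xed5ad4bbU;
    h ^= h &gt;&gt; 11;
    h *= 0xac4c1b51U;
    h ^= h &gt;&gt; 15;
    h *= 0x31848babU;
    h ^= h &gt;&gt; 14;
    return h;
}

/* Hypothetical helper: mix several weak entropy sources into a seed. */
uint32_t gather_seed(void)
{
    uint32_t h = 0;

    time_t t = time(0);             /* coarse wall-clock time */
    h = hash32s(&amp;t, sizeof(t), h);

    clock_t c = clock();            /* processor time used so far */
    h = hash32s(&amp;c, sizeof(c), h);

    void *p = malloc(1);            /* heap address (allocator, ASLR) */
    h = hash32s(&amp;p, sizeof(p), h);
    free(p);

    void *sp = &amp;h;                  /* stack address (ASLR) */
    h = hash32s(&amp;sp, sizeof(sp), h);

    unsigned char rnd[4];           /* best effort: may not exist */
    FILE *f = fopen("/dev/urandom", "rb");
    if (f) {
        if (fread(rnd, sizeof(rnd), 1, f))
            h = hash32s(rnd, sizeof(rnd), h);
        fclose(f);
    }
    return h;
}
</code></pre></div></div>

<p>Seeding is then just <code class="language-plaintext highlighter-rouge">srand(gather_seed())</code>.</p>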

<h3 id="lua">Lua</h3>

<p>I recently tackled this problem in Lua. It has a no-batteries-included
design, demanding very little of its host platform: nothing more than an
ANSI C implementation. Because of this, a Lua program has even fewer
options for gathering entropy than C. But it’s still not impossible!</p>

<p>To further complicate things, Lua code is often run in a sandbox with
some features removed. For example, Lua has <code class="language-plaintext highlighter-rouge">os.time()</code> and <code class="language-plaintext highlighter-rouge">os.clock()</code>
wrapping the C equivalents, allowing for the same sorts of entropy
sampling. When run in a sandbox, <code class="language-plaintext highlighter-rouge">os</code> might not be available. Similarly,
<code class="language-plaintext highlighter-rouge">io</code> might not be available for accessing <code class="language-plaintext highlighter-rouge">/dev/urandom</code>.</p>

<p>Have you ever printed a table, though? Or a function? It evaluates to
a string containing the object’s address.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ lua -e 'print(math)'
table: 0x559577668a30
$ lua -e 'print(math)'
table: 0x55e4a3679a30
</code></pre></div></div>

<p>Since the raw pointer values are leaked to Lua, we can skim allocator
entropy like before. Here’s the same hash function in Lua 5.3:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="k">function</span> <span class="nf">hash32s</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">h</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="o">#</span><span class="n">buf</span> <span class="k">do</span>
        <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mi">31</span> <span class="o">+</span> <span class="n">buf</span><span class="p">:</span><span class="n">byte</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="k">end</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mh">0xed5ad4bb</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">11</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mh">0xac4c1b51</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="o">*</span> <span class="mh">0x31848bab</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">&amp;</span> <span class="mh">0xffffffff</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">h</span> <span class="err">~</span> <span class="p">(</span><span class="n">h</span> <span class="o">&gt;&gt;</span> <span class="mi">14</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">h</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now hash a bunch of pointers in the global environment:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">({},</span> <span class="mi">0</span><span class="p">)</span>  <span class="c1">-- hash a new table</span>
<span class="k">for</span> <span class="n">varname</span><span class="p">,</span> <span class="n">value</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">_G</span><span class="p">)</span> <span class="k">do</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="n">varname</span><span class="p">,</span> <span class="n">h</span><span class="p">)</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">(</span><span class="n">value</span><span class="p">),</span> <span class="n">h</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'table'</span> <span class="k">then</span>
        <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="k">do</span>
            <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">(</span><span class="n">k</span><span class="p">),</span> <span class="n">h</span><span class="p">)</span>
            <span class="n">h</span> <span class="o">=</span> <span class="n">hash32s</span><span class="p">(</span><span class="nb">tostring</span><span class="p">(</span><span class="n">v</span><span class="p">),</span> <span class="n">h</span><span class="p">)</span>
        <span class="k">end</span>
    <span class="k">end</span>
<span class="k">end</span>

<span class="nb">math.randomseed</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>
</code></pre></div></div>

<p>Unfortunately this doesn’t actually work well on one platform I tested:
Cygwin. Cygwin has few security features: notably, it lacks ASLR and has
a largely deterministic allocator.</p>

<h3 id="when-to-use-it">When to use it</h3>

<p>In practice it’s not really necessary to use these sorts of tricks for
gathering entropy from odd places. It’s something that comes up more
in coding challenges and exercises than in real programs. I’m probably
already making platform-specific calls in programs substantial enough
to need it anyway.</p>

<p>On a few occasions I have thought about these things when debugging.
ASLR makes return pointers on the stack slightly randomized on each
run, which can change the behavior of some kinds of bugs. Allocator
and stack randomization do similar things to most of your pointers.
GDB tries to disable some of these features during debugging, but it
doesn’t get everything.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Prospecting for Hash Functions</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/07/31/"/>
    <id>urn:uuid:e865266a-2896-30c5-3f7d-cfad767b1ae2</id>
    <updated>2018-07-31T22:32:45Z</updated>
    <category term="c"/><category term="crypto"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>Update 2022</em>: <a href="https://github.com/skeeto/hash-prospector/issues/19">TheIronBorn has found even better permutations</a> using
a smarter technique. That thread completely eclipses my efforts in this
article.</p>

<p>I recently got an itch to design my own non-cryptographic integer hash
function. Firstly, I wanted to <a href="/blog/2017/09/15/">better understand</a> how hash
functions work, and the best way to learn is to do. For years I’d been
treating them like magic, shoving input into them and seeing
<a href="/blog/2018/02/07/">random-looking</a>, but deterministic, output come out the other
end. Just how is the avalanche effect achieved?</p>

<p>Secondly, could I apply my own particular strengths to craft a hash
function better than the handful of functions I could find online?
Especially the classic ones from <a href="https://gist.github.com/badboy/6267743">Thomas Wang</a> and <a href="http://burtleburtle.net/bob/hash/integer.html">Bob
Jenkins</a>. Instead of struggling with the mathematics, maybe I
could software engineer my way to victory, working from the advantage
of access to the excessive computational power of today.</p>

<p>Suppose, for example, I wrote a tool to generate a <strong>random hash
function definition</strong>, then <strong>JIT compile it</strong> to a native function in
memory, then execute that function across various inputs to <strong>evaluate
its properties</strong>. My tool could rapidly repeat this process in a loop
until it stumbled upon an incredible hash function the world had never
seen. That’s what I actually did. I call it the <strong>Hash Prospector</strong>:</p>

<p><strong><a href="https://github.com/skeeto/hash-prospector">https://github.com/skeeto/hash-prospector</a></strong></p>

<p>It only works on x86-64 because it uses the same <a href="/blog/2015/03/19/">JIT compiling
technique I’ve discussed before</a>: allocate a page of memory, write
some machine instructions into it, set the page to executable, cast the
page pointer to a function pointer, then call the generated code through
the function pointer.</p>
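<p>Here’s a minimal sketch of that technique, not the prospector’s actual code. It assumes x86-64 with the System V calling convention (first integer argument in <code class="language-plaintext highlighter-rouge">edi</code>) and anonymous pages from <code class="language-plaintext highlighter-rouge">mmap(2)</code> and <code class="language-plaintext highlighter-rouge">mprotect(2)</code>. The hypothetical <code class="language-plaintext highlighter-rouge">jit_xor()</code> hand-assembles a function computing <code class="language-plaintext highlighter-rouge">x ^ constant</code>, calls it, and returns -1 wherever the platform or its security policy won’t allow executable pages.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;
#if defined(__x86_64__) &amp;&amp; !defined(_WIN32)
#  include &lt;sys/mman.h&gt;
#endif
#if defined(__x86_64__) &amp;&amp; defined(MAP_ANONYMOUS)
#  define JIT_SUPPORTED 1
#else
#  define JIT_SUPPORTED 0
#endif

/* Hypothetical helper: JIT-compile "return x ^ k", then call it on x.
 * Returns 0 and stores the result in *out, or -1 if unsupported. */
int jit_xor(uint32_t k, uint32_t x, uint32_t *out)
{
#if JIT_SUPPORTED
    unsigned char code[] = {
        0x89, 0xf8,          /* mov eax, edi   */
        0x35, 0, 0, 0, 0,    /* xor eax, imm32 */
        0xc3,                /* ret            */
    };
    memcpy(code + 3, &amp;k, 4);  /* x86 immediates are little-endian */

    size_t size = 4096;
    void *page = mmap(0, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, code, sizeof(code));
    /* Flip the page from writable to executable. */
    if (mprotect(page, size, PROT_READ | PROT_EXEC)) {
        munmap(page, size);
        return -1;
    }

    uint32_t (*fn)(uint32_t);
    *(void **)&amp;fn = page;     /* page pointer to function pointer */
    *out = fn(x);
    munmap(page, size);
    return 0;
#else
    (void)k; (void)x; (void)out;
    return -1;
#endif
}
</code></pre></div></div>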

<h3 id="generating-a-hash-function">Generating a hash function</h3>

<p>My focus is on integer hash functions: a function that accepts an
<em>n</em>-bit integer and returns an <em>n</em>-bit integer. One of the important
properties of an <em>integer</em> hash function is that it maps its inputs to
outputs 1:1. In other words, there are <strong>no collisions</strong>. If there’s a
collision, then some outputs aren’t possible, and the function isn’t
making efficient use of its entropy.</p>

<p>This is actually a lot easier than it sounds. As long as every <em>n</em>-bit
integer operation used in the hash function is <em>reversible</em>, then the
hash function has this property. An operation is reversible if, given
its output, you can unambiguously compute its input.</p>
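<p>At small widths the 1:1 property can be checked by brute force. This sketch works at a hypothetical 16-bit width: it runs an operation over all 65,536 inputs and reports whether any output was produced twice. The example operations are illustrative. Multiplying by 3 or XOR-ing in a right shift passes, while multiplying by 2 (which discards the top bit) or OR-ing in a shift does not.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* Return 1 if f maps the 2^16 inputs to 2^16 distinct outputs
 * (i.e. f is a permutation), otherwise 0. */
int is_bijective16(uint16_t (*f)(uint16_t))
{
    unsigned char seen[65536 / 8] = {0};   /* one bit per output */
    for (uint32_t x = 0; x &lt; 65536; x++) {
        uint16_t y = f((uint16_t)x);
        unsigned char bit = (unsigned char)(1u &lt;&lt; (y &amp; 7));
        if (seen[y &gt;&gt; 3] &amp; bit)
            return 0;                      /* collision */
        seen[y &gt;&gt; 3] |= bit;
    }
    return 1;
}

/* Example operations (hypothetical names). */
uint16_t mul3(uint16_t x)    { return (uint16_t)(x * 3u); }
uint16_t mul2(uint16_t x)    { return (uint16_t)(x * 2u); }
uint16_t xorshr7(uint16_t x) { return (uint16_t)(x ^ (x &gt;&gt; 7)); }
uint16_t orshr7(uint16_t x)  { return (uint16_t)(x | (x &gt;&gt; 7)); }
</code></pre></div></div>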

<p>For example, XOR with a constant is trivially reversible: XOR the
output with the same constant to reverse it. Addition with a constant
is reversed by subtraction with the same constant. Since the integer
operations are modular arithmetic, modulo 2^n for <em>n</em>-bit integers,
multiplication by an <em>odd</em> number is reversible. Odd numbers are
coprime with the power-of-two modulus, so there is some <em>modular
multiplicative inverse</em> that reverses the operation.</p>
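<p>For example, one way to find that inverse (chosen for this sketch; it isn’t the only method) is Newton’s iteration. Every odd <em>a</em> satisfies <em>a</em>² ≡ 1 (mod 8), so <em>a</em> is already its own inverse to 3 bits, and each step doubles the number of correct low bits: 3, 6, 12, 24, then 48.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* Multiplicative inverse of an odd a modulo 2^32. */
uint32_t modinv32(uint32_t a)
{
    uint32_t x = a;     /* correct to 3 bits: a*a == 1 (mod 8) */
    x *= 2 - a * x;     /* 6 bits  */
    x *= 2 - a * x;     /* 12 bits */
    x *= 2 - a * x;     /* 24 bits */
    x *= 2 - a * x;     /* 48 bits, so all 32 are correct */
    return x;
}
</code></pre></div></div>

<p>Undoing <code class="language-plaintext highlighter-rouge">x *= 0x2c1b3c6d</code> is then <code class="language-plaintext highlighter-rouge">x *= modinv32(0x2c1b3c6d)</code>, since the product of the two constants is 1 modulo 2^32.</p>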

<p><a href="http://papa.bretmulvey.com/post/124027987928/hash-functions">Bret Mulvey’s hash function article</a> provides a convenient list
of some reversible operations available for constructing integer hash
functions. This list was the catalyst for my little project. Here are
the ones used by the hash prospector:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span>  <span class="o">=</span> <span class="o">~</span><span class="n">x</span><span class="p">;</span>
<span class="n">x</span> <span class="o">^=</span> <span class="n">constant</span><span class="p">;</span>
<span class="n">x</span> <span class="o">*=</span> <span class="n">constant</span> <span class="o">|</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// i.e. only odd constants</span>
<span class="n">x</span> <span class="o">+=</span> <span class="n">constant</span><span class="p">;</span>
<span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="n">constant</span><span class="p">;</span>
<span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="n">constant</span><span class="p">;</span>
<span class="n">x</span> <span class="o">+=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="n">constant</span><span class="p">;</span>
<span class="n">x</span> <span class="o">-=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="n">constant</span><span class="p">;</span>
<span class="n">x</span> <span class="o">&lt;&lt;&lt;=</span> <span class="n">constant</span><span class="p">;</span> <span class="c1">// left rotation</span>
</code></pre></div></div>

<p>I’ve come across a couple more useful operations while studying existing
integer hash functions, but I didn’t put these in the prospector.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hash</span> <span class="o">+=</span> <span class="o">~</span><span class="p">(</span><span class="n">hash</span> <span class="o">&lt;&lt;</span> <span class="n">constant</span><span class="p">);</span>
<span class="n">hash</span> <span class="o">-=</span> <span class="o">~</span><span class="p">(</span><span class="n">hash</span> <span class="o">&lt;&lt;</span> <span class="n">constant</span><span class="p">);</span>
</code></pre></div></div>

<p>The prospector picks some operations at random and fills in their
constants randomly within their proper constraints. For example,
here’s an awful hash function I made it generate as an example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// do NOT use this!</span>
<span class="kt">uint32_t</span>
<span class="nf">badhash32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x1eca7d79U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">20</span><span class="p">;</span>
    <span class="n">x</span>  <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">8</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">24</span><span class="p">);</span>
    <span class="n">x</span>  <span class="o">=</span> <span class="o">~</span><span class="n">x</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">5</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">+=</span> <span class="mh">0x10afe4e7U</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That function is reversible, and it would be <a href="https://naml.us/post/inverse-of-a-hash-function/">relatively
straightforward</a> to <a href="http://c42f.github.io/2015/09/21/inverting-32-bit-wang-hash.html">define its inverse</a>. However, it has
awful biases and poor avalanche. How do I know this?</p>
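<p>Before answering that, here’s a sketch of what the inverse could look like. It undoes each step of <code class="language-plaintext highlighter-rouge">badhash32()</code> in reverse order; the helper names are hypothetical. An xorshift is undone by re-applying it until the shift has covered all 32 bits, the rotation by rotating back, and the multiply by multiplying with the constant’s modular inverse (computed here with Newton’s iteration).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

uint32_t badhash32(uint32_t x)     /* from above: do NOT use */
{
    x *= 0x1eca7d79U;
    x ^= x &gt;&gt; 20;
    x  = (x &lt;&lt; 8) | (x &gt;&gt; 24);
    x  = ~x;
    x ^= x &lt;&lt; 5;
    x += 0x10afe4e7U;
    return x;
}

/* Invert y = x ^ (x &gt;&gt; s): each pass recovers s more high bits. */
static uint32_t unxorshr(uint32_t y, int s)
{
    uint32_t x = y;
    for (int i = s; i &lt; 32; i += s)
        x = y ^ (x &gt;&gt; s);
    return x;
}

/* Invert y = x ^ (x &lt;&lt; s): each pass recovers s more low bits. */
static uint32_t unxorshl(uint32_t y, int s)
{
    uint32_t x = y;
    for (int i = s; i &lt; 32; i += s)
        x = y ^ (x &lt;&lt; s);
    return x;
}

/* Inverse of an odd a modulo 2^32 (Newton's iteration). */
static uint32_t modinv32(uint32_t a)
{
    uint32_t x = a;
    for (int i = 0; i &lt; 4; i++)
        x *= 2 - a * x;
    return x;
}

uint32_t unbadhash32(uint32_t x)
{
    x -= 0x10afe4e7U;
    x  = unxorshl(x, 5);
    x  = ~x;
    x  = (x &gt;&gt; 8) | (x &lt;&lt; 24);     /* rotate right by 8 */
    x  = unxorshr(x, 20);
    x *= modinv32(0x1eca7d79U);
    return x;
}
</code></pre></div></div>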

<h3 id="the-measure-of-a-hash-function">The measure of a hash function</h3>

<p>There are two key properties I’m looking for in randomly generated hash
functions.</p>

<ol>
  <li>
    <p>High avalanche effect. When I flip one input bit, the output bits
should each flip with a 50% chance.</p>
  </li>
  <li>
    <p>Low bias. Ideally there is no correlation between which output bits
flip for a particular flipped input bit.</p>
  </li>
</ol>

<p>Initially I screwed up and only measured the first property. This led
to some hash functions that <em>seemed</em> to be amazing before close
inspection, since, for a 32-bit hash function, it was flipping over 15
output bits on average. However, the particular bits being flipped
were heavily biased, resulting in obvious patterns in the output.</p>

<p>For example, when hashing a counter starting from zero, the high bits
would follow a regular pattern. 15 to 16 bits were being flipped each
time, but it was always the same bits.</p>

<p>Conveniently it’s easy to measure both properties at the same time. For
an <em>n</em>-bit integer hash function, create an <em>n</em> by <em>n</em> table initialized
to zero. The rows are input bits and the columns are output bits. The
<em>i</em>th row and <em>j</em>th column track the correlation between the <em>i</em>th input
bit and <em>j</em>th output bit.</p>

<p>Then exhaustively iterate over all 2^n inputs, and flip each bit one at
a time. Increment the appropriate element in the table if the output bit
flips.</p>

<p>When you’re done, ideally each element in the table is exactly 2^(n-1).
That is, each output bit was flipped exactly half the time by each input
bit. Therefore the <em>bias</em> of the hash function is the distance (the
error) of the computed table from the ideal table.</p>

<p>For example, the ideal bias table for an 8-bit hash function would be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
128 128 128 128 128 128 128 128
</code></pre></div></div>

<p>The hash prospector computes the standard deviation in order to turn
this into a single, normalized measurement. Lower scores are better.</p>
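<p>Here’s a rough sketch of the whole measurement, scaled down to 16 bits so the exhaustive version finishes almost instantly. The 16-bit hash, its constants, and the exact scoring are assumptions for illustration. The score here is the mean squared deviation of each cell from the ideal 2^15, normalized so that the identity function, the worst case, scores exactly 1. Its square root gives a standard-deviation-style number; the prospector’s own normalization may differ.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* Hypothetical 16-bit hash for illustration; arbitrary odd constants. */
uint16_t toyhash16(uint16_t x)
{
    x ^= x &gt;&gt; 8;
    x  = (uint16_t)(x * 0x88b5u);
    x ^= x &gt;&gt; 7;
    x  = (uint16_t)(x * 0xdb2du);
    x ^= x &gt;&gt; 9;
    return x;
}

uint16_t identity16(uint16_t x) { return x; }

/* table[i][j] counts how often flipping input bit i flips output
 * bit j across all 2^16 inputs. Ideally every cell is 2^15. */
double bias16(uint16_t (*hash)(uint16_t))
{
    long table[16][16] = {{0}};
    for (uint32_t x = 0; x &lt; 65536; x++) {
        uint16_t h = hash((uint16_t)x);
        for (int i = 0; i &lt; 16; i++) {
            uint16_t d = h ^ hash((uint16_t)(x ^ (1u &lt;&lt; i)));
            for (int j = 0; j &lt; 16; j++)
                table[i][j] += (d &gt;&gt; j) &amp; 1;
        }
    }
    double sum = 0;
    for (int i = 0; i &lt; 16; i++) {
        for (int j = 0; j &lt; 16; j++) {
            double e = (table[i][j] - 32768.0) / 32768.0;
            sum += e * e;
        }
    }
    return sum / (16 * 16);   /* 0 is ideal, 1 is the worst case */
}
</code></pre></div></div>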

<p>However, there’s still one problem: the input space for a 32-bit hash
function is over 4 billion values. The full test takes my computer about
an hour and a half. Evaluating a 64-bit hash function is right out.</p>

<p>Again, <a href="/blog/2017/09/21/">Monte Carlo to the rescue</a>! Rather than sample the entire
space, just sample a random subset. This provides a good estimate in
less than a second, allowing lots of terrible hash functions to be
discarded early. The full test can be saved only for the known good
32-bit candidates. 64-bit functions will only ever receive the estimate.</p>
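<p>A sampled version for 32-bit functions might look like this sketch. The sampler, its name, and the score are assumptions. The score is the mean squared deviation of each input-bit/output-bit flip rate from 1/2, scaled so 0 is ideal and 1 is the worst case, and the random inputs come from SplitMix64 (any decent PRNG would do). <code class="language-plaintext highlighter-rouge">prospector32()</code> is the function presented in the next section.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* SplitMix64: a simple generator for the sampled inputs. */
static uint64_t splitmix64(uint64_t *s)
{
    uint64_t z = (*s += 0x9e3779b97f4a7c15U);
    z = (z ^ (z &gt;&gt; 30)) * 0xbf58476d1ce4e5b9U;
    z = (z ^ (z &gt;&gt; 27)) * 0x94d049bb133111ebU;
    return z ^ (z &gt;&gt; 31);
}

uint32_t prospector32(uint32_t x)
{
    x ^= x &gt;&gt; 15;
    x *= 0x2c1b3c6dU;
    x ^= x &gt;&gt; 12;
    x *= 0x297a2d39U;
    x ^= x &gt;&gt; 15;
    return x;
}

uint32_t identity32(uint32_t x) { return x; }

/* Estimate bias from n random inputs instead of all 2^32. */
double estimate_bias32(uint32_t (*hash)(uint32_t), long n, uint64_t seed)
{
    long table[32][32] = {{0}};
    for (long k = 0; k &lt; n; k++) {
        uint32_t x = (uint32_t)splitmix64(&amp;seed);
        uint32_t h = hash(x);
        for (int i = 0; i &lt; 32; i++) {
            uint32_t d = h ^ hash(x ^ (UINT32_C(1) &lt;&lt; i));
            for (int j = 0; j &lt; 32; j++)
                table[i][j] += (d &gt;&gt; j) &amp; 1;
        }
    }
    double sum = 0;
    for (int i = 0; i &lt; 32; i++) {
        for (int j = 0; j &lt; 32; j++) {
            double e = 2.0 * table[i][j] / n - 1.0;   /* flip rate vs 1/2 */
            sum += e * e;
        }
    }
    return sum / (32 * 32);
}
</code></pre></div></div>

<p>Sampling adds noise of about 1/<em>n</em> to the score of an otherwise perfectly fair cell, so even a few thousand samples separate a good function from a bad one.</p>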

<h3 id="what-did-i-find">What did I find?</h3>

<p>Once I got the bias issue sorted out, and after hours and hours of
running, followed up with some manual tweaking on my part, the
<strong>prospector stumbled across this little gem</strong>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO use this one!</span>
<span class="kt">uint32_t</span>
<span class="nf">prospector32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x2c1b3c6dU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">12</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x297a2d39U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>According to a full (i.e. not estimated) bias evaluation, this function
beats <em>the snot</em> out of most of the 32-bit hash functions I could find. It
even comes out ahead of this well known hash function that I <em>believe</em>
originates from the H2 SQL Database. (Update: Thomas Mueller has
confirmed that, indeed, this is his hash function.)</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">hash32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">=</span> <span class="p">((</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">^</span> <span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="mh">0x45d9f3bU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">=</span> <span class="p">((</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">^</span> <span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="mh">0x45d9f3bU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">^</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s still an excellent hash function, just slightly more biased than
mine.</p>

<p>Very briefly, <code class="language-plaintext highlighter-rouge">prospector32()</code> was the best 32-bit hash function I could
find, and I thought I had a major breakthrough. Then I noticed the
finalizer function for <a href="https://en.wikipedia.org/wiki/MurmurHash#Algorithm">the 32-bit variant of MurmurHash3</a>. It’s
also a 32-bit hash function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">murmurhash32_mix32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x85ebca6bU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">13</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xc2b2ae35U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This one is just <em>barely</em> less biased than mine. So I still haven’t
discovered the best 32-bit hash function, only the <em>second</em> best one.
:-)</p>

<h3 id="a-pattern-emerges">A pattern emerges</h3>

<p>If you’re paying close enough attention, you may have noticed that all
three functions above have the same structure. The prospector had
stumbled upon it all on its own without knowledge of the existing
functions. It may not be so obvious for the second function, but here it
is refactored:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">hash32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x45d9f3bU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x45d9f3bU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I hadn’t noticed this until after the prospector had come across it on
its own. The pattern for all three is XOR-right-shift, multiply,
XOR-right-shift, multiply, XOR-right-shift. There’s something
particularly useful about this <a href="http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html#multiplyxorshift">multiply-xorshift construction</a>
(<a href="http://ticki.github.io/blog/designing-a-good-non-cryptographic-hash-function/#designing-a-diffusion-function--by-example">also</a>). The XOR-right-shift diffuses bits rightward and the
multiply diffuses bits leftward. I like to think it’s “sloshing” the
bits right, left, right, left.</p>

<p>It seems that multiplication is particularly good at diffusion, so it
makes perfect sense to exploit it in non-cryptographic hash functions,
especially since modern CPUs are so fast at it. Despite this, it’s not
used much in cryptography due to <a href="http://cr.yp.to/snuffle/design.pdf">issues with completing it in constant
time</a>.</p>

<p>I like to think of this construction in terms of a five-tuple. For the
three functions it’s the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(15, 0x2c1b3c6d, 12, 0x297a2d39, 15)  // prospector32()
(16, 0x045d9f3b, 16, 0x045d9f3b, 16)  // hash32()
(16, 0x85ebca6b, 13, 0xc2b2ae35, 16)  // murmurhash32_mix32()
</code></pre></div></div>
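<p>Since the three share one shape, the tuple maps directly onto a parameterized function. This is just a sketch with a hypothetical name, <code class="language-plaintext highlighter-rouge">xmx32()</code>. The multipliers must stay odd for the result to remain reversible.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

/* xorshift, multiply, xorshift, multiply, xorshift.
 * Shifts should be in 1..31; m1 and m2 must be odd. */
uint32_t xmx32(uint32_t x, int s1, uint32_t m1, int s2, uint32_t m2, int s3)
{
    x ^= x &gt;&gt; s1;
    x *= m1;
    x ^= x &gt;&gt; s2;
    x *= m2;
    x ^= x &gt;&gt; s3;
    return x;
}

uint32_t prospector32(uint32_t x)
{
    return xmx32(x, 15, 0x2c1b3c6dU, 12, 0x297a2d39U, 15);
}

uint32_t hash32(uint32_t x)
{
    return xmx32(x, 16, 0x045d9f3bU, 16, 0x045d9f3bU, 16);
}

uint32_t murmurhash32_mix32(uint32_t x)
{
    return xmx32(x, 16, 0x85ebca6bU, 13, 0xc2b2ae35U, 16);
}
</code></pre></div></div>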

<p>The prospector actually found lots of decent functions following this
pattern, especially where the middle shift is smaller than the outer
shift. Thinking of it in terms of this tuple, I specifically directed
it to try different tuple constants. That’s what I meant by
“tweaking.” Eventually my new function popped out with its really low
bias.</p>

<p>The prospector has a template option (<code class="language-plaintext highlighter-rouge">-p</code>) if you want to try it
yourself:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./prospector -p xorr,mul,xorr,mul,xorr
</code></pre></div></div>

<p>If you really have your heart set on certain constants, such as my
specific selection of shifts, you can lock those in while randomizing
the other constants:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./prospector -p xorr:15,mul,xorr:12,mul,xorr:15
</code></pre></div></div>

<p>Or the other way around:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./prospector -p xorr,mul:2c1b3c6d,xorr,mul:297a2d39,xorr
</code></pre></div></div>

<p>My function seems a little strange using shifts of 15 bits rather than
a nice, round 16 bits. However, changing those constants to 16
increases the bias. Similarly, neither of the two 32-bit constants is
a prime number, but <strong>nudging those constants to the nearest prime
increases the bias</strong>. These parameters really do seem to sit at a local
minimum in the bias, and using prime numbers isn’t important.</p>

<h3 id="what-about-64-bit-integer-hash-functions">What about 64-bit integer hash functions?</h3>

<p>So far I haven’t been able to improve on the existing 64-bit hash functions. The main
function to beat is SplittableRandom / <a href="http://xoshiro.di.unimi.it/splitmix64.c">SplitMix64</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">splittable64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">30</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xbf58476d1ce4e5b9U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">27</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x94d049bb133111ebU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">31</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Here’s its inverse since it’s sometimes useful:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">splittable64_r</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">31</span> <span class="o">^</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">62</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x319642b2d24d8ec3U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">27</span> <span class="o">^</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">54</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x96de1b173f119089U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">30</span> <span class="o">^</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">60</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
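<p>The inverse follows mechanically from the construction, because each step is invertible on its own. You undo an XOR-right-shift by <code>s</code> by XORing in shifts of <code>s</code>, <code>2s</code>, <code>3s</code>, and so on until the shift reaches the word size. That is where the extra <code>x &gt;&gt; 62</code>, <code>x &gt;&gt; 54</code>, and <code>x &gt;&gt; 60</code> terms come from. You undo a multiply by an odd constant by multiplying by that constant’s inverse modulo 2<sup>64</sup>, which Newton’s iteration finds in five steps. Here is a sketch that derives the inverse at run time instead of hardcoding the constants. The helper names are my own.</p>

```c
#include <stdint.h>

static uint64_t
splittable64(uint64_t x)
{
    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Undo y = x ^ (x >> s) for 0 < s < 64: x = y ^ y>>s ^ y>>2s ^ ...,
 * stopping once the shift reaches 64. */
static uint64_t
unxorshift64(uint64_t y, int s)
{
    uint64_t x = y;
    for (int i = s; i < 64; i += s) {
        x ^= y >> i;
    }
    return x;
}

/* Inverse of an odd a modulo 2^64 by Newton's iteration. For odd a,
 * a*a == 1 (mod 8), so a itself is correct to 3 bits. Each step
 * doubles that (3, 6, 12, 24, 48, 96), so five steps suffice. */
static uint64_t
modinv64(uint64_t a)
{
    uint64_t y = a;
    for (int i = 0; i < 5; i++) {
        y *= 2 - a * y;
    }
    return y;
}

/* Undo splittable64() by inverting each step in reverse order. */
static uint64_t
splittable64_inv(uint64_t x)
{
    x = unxorshift64(x, 31);
    x *= modinv64(UINT64_C(0x94d049bb133111eb));
    x = unxorshift64(x, 27);
    x *= modinv64(UINT64_C(0xbf58476d1ce4e5b9));
    x = unxorshift64(x, 30);
    return x;
}
```

<p>The inverse of an odd number modulo 2<sup>64</sup> is unique. That makes <code>modinv64()</code> a handy way to double-check hardcoded constants like the ones in <code>splittable64_r()</code>.</p>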

<p>I also came across <a href="https://gist.github.com/degski/6e2069d6035ae04d5d6f64981c995ec2">this function</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">hash64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xd6e8feb86659fd93U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xd6e8feb86659fd93U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Again, these follow the same construction as before. There really is
something special about it, and many other people have noticed, too.</p>

<p>Both functions have about the same bias. (Remember, I can only estimate
the bias for 64-bit hash functions.) The prospector has found lots of
functions with about the same bias, but nothing provably better. Until
it does, I have no new 64-bit integer hash functions to offer.</p>
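<p>Estimating avalanche bias for a 64-bit function comes down to sampling. Pick random inputs, flip each input bit in turn, and record how often each output bit flips. An ideal function flips each output bit half the time. The sketch below returns the mean squared deviation from one half over all 64 by 64 input/output bit pairs. It is only an illustration. The input generator, sample count, and exact statistic are my own choices and are not necessarily what the prospector computes, so its numbers can’t be compared with the bias figures quoted here.</p>

```c
#include <stdint.h>

/* The two functions from above. */
static uint64_t
splittable64(uint64_t x)
{
    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

static uint64_t
hash64(uint64_t x)
{
    x ^= x >> 32;
    x *= UINT64_C(0xd6e8feb86659fd93);
    x ^= x >> 32;
    x *= UINT64_C(0xd6e8feb86659fd93);
    x ^= x >> 32;
    return x;
}

/* Input source for sampling: Marsaglia's 64-bit xorshift (13, 7, 17).
 * The state must be nonzero. */
static uint64_t
sample_next(uint64_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Sampled avalanche estimate: for every (input bit i, output bit j)
 * pair, measure how often flipping bit i of the input flips bit j of
 * the output, then return the mean squared deviation of those rates
 * from 1/2. 0 is ideal; the identity function scores 0.25. */
static double
avalanche_msd(uint64_t (*h)(uint64_t), long samples, uint64_t seed)
{
    unsigned long long counts[64][64] = {{0}};
    uint64_t s = seed ? seed : 1;

    for (long n = 0; n < samples; n++) {
        uint64_t x = sample_next(&s);
        uint64_t hx = h(x);
        for (int i = 0; i < 64; i++) {
            uint64_t d = hx ^ h(x ^ (UINT64_C(1) << i));
            for (int j = 0; j < 64; j++) {
                counts[i][j] += (d >> j) & 1;
            }
        }
    }

    double sum = 0.0;
    for (int i = 0; i < 64; i++) {
        for (int j = 0; j < 64; j++) {
            double p = (double)counts[i][j] / (double)samples;
            sum += (p - 0.5) * (p - 0.5);
        }
    }
    return sum / (64.0 * 64.0);
}
```

<p>Even for an ideal function, sampling noise alone contributes about 0.25/<code>samples</code> to the result. Comparing two good functions this way therefore needs a very large number of samples.</p>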

<h3 id="beyond-random-search">Beyond random search</h3>

<p>Right now the prospector does a completely random, unstructured search
hoping to stumble upon something good by chance. Perhaps it would be
worth using a genetic algorithm to breed those 5-tuples toward an
optimum? Others have had <a href="https://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html">success in this area with simulated
annealing</a>.</p>

<p>There’s probably more to exploit from the multiply-xorshift construction
that keeps popping up. If anything, the prospector is searching too
broadly, looking at constructions that could never really compete no
matter what the constants. In addition to everything above, I’ve been
looking for good 32-bit hash functions that don’t use any 32-bit
constants, but I’m really not finding any with a competitively low bias.</p>

<h3 id="update-after-one-week">Update after one week</h3>

<p>About one week after publishing this article I found an even better hash
function. I believe <strong>this is the least biased 32-bit integer hash
function <em>of this form</em> ever devised</strong>. It’s even less biased than the
MurmurHash3 finalizer.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// exact bias: 0.17353355999581582</span>
<span class="kt">uint32_t</span>
<span class="nf">lowbias32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x7feb352dU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x846ca68bU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// inverse</span>
<span class="kt">uint32_t</span>
<span class="nf">lowbias32_r</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x43021123U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span> <span class="o">^</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">30</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x1d69e2a5U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
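<p><code>lowbias32()</code> is a bijection with a cheap inverse. That means it can, for instance, turn sequential counters into scrambled-looking IDs that can be mapped back later. The sketch below reuses the two functions above unchanged. The <code>id_encode()</code> and <code>id_decode()</code> wrappers are hypothetical names. Anyone with the code can invert the mapping, so it only obscures ordering and provides no secrecy.</p>

```c
#include <stdint.h>

static uint32_t
lowbias32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

static uint32_t
lowbias32_r(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x43021123U;
    x ^= x >> 15 ^ x >> 30;
    x *= 0x1d69e2a5U;
    x ^= x >> 16;
    return x;
}

/* Hypothetical wrappers: present sequence numbers as scrambled IDs
 * and recover them. Obfuscation only; the mapping is public. */
static uint32_t
id_encode(uint32_t seq)
{
    return lowbias32(seq);
}

static uint32_t
id_decode(uint32_t id)
{
    return lowbias32_r(id);
}
```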

<p>If you’re willing to use an additional round of multiply-xorshift, this
next function actually reaches the theoretical bias limit (~0.021), the
bias exhibited by a perfect integer hash function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// exact bias: 0.020888578919738908</span>
<span class="kt">uint32_t</span>
<span class="nf">triple32</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xed5ad4bbU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">11</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0xac4c1b51U</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">15</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">*=</span> <span class="mh">0x31848babU</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">14</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><del>It’s statistically indistinguishable from a random permutation of all
32-bit integers.</del> (<em>Update 2025</em>: Peter Schmidt-Nielsen has provided a
second-order characteristic test that quickly identifies statistically
significant biases in <code class="language-plaintext highlighter-rouge">triple32</code>.)</p>

<h3 id="update-february-2020">Update, February 2020</h3>

<p>Some people have been experimenting with using my hash functions in GLSL
shaders, and the results are looking good:</p>

<ul>
  <li><a href="https://www.shadertoy.com/view/WttXWX">https://www.shadertoy.com/view/WttXWX</a></li>
  <li><a href="https://www.shadertoy.com/view/ttVGDV">https://www.shadertoy.com/view/ttVGDV</a></li>
</ul>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Inspiration from Data-dependent Rotations</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/02/07/"/>
    <id>urn:uuid:917b72f1-3aad-3af3-075a-a4b0a6eb8a4d</id>
    <updated>2018-02-07T23:59:59Z</updated>
    <category term="c"/><category term="crypto"/><category term="meta"/>
    <content type="html">
      <![CDATA[<p>This article is an expanded email I wrote in response to a question
from Frank Muller. He asked how I arrived at my solution to a
<a href="/blog/2017/10/06/">branchless UTF-8 decoder</a>:</p>

<blockquote>
  <p>I mean, when you started, I’m pretty [sure] the initial solution was using
branches, right? Then, you’ve applied some techniques to eliminate
them.</p>
</blockquote>

<p>A bottom-up approach that begins with branches and then proceeds to
eliminate them one at a time sounds like a plausible story. However,
this story is the inverse of how it actually played out. It began when I
noticed a branchless decoder could probably be done, then I put together
the pieces one at a time without introducing any branches. But what
sparked that initial idea?</p>

<p>The two prior posts reveal my train of thought at the time: <a href="/blog/2017/09/15/">a look at
the Blowfish cipher</a> and <a href="/blog/2017/09/21/">a 64-bit PRNG shootout</a>. My
layman’s study of Blowfish was actually part of an examination of a
number of different block ciphers. For example, I also read the NSA’s
<a href="http://eprint.iacr.org/2013/404.pdf">Speck and Simon paper</a> and then <a href="https://github.com/skeeto/scratch/tree/master/speck">implemented the 128/128 variant
of Speck</a> — a 128-bit key and 128-bit block. I didn’t take the
time to write an article about it, but note how the entire cipher —
key schedule, encryption, and decryption — is just 40 lines of code:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">speck</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">k</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="p">};</span>

<span class="kt">void</span>
<span class="nf">speck_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">speck</span> <span class="o">*</span><span class="n">ctx</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">x</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">uint64_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">31</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">8</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">56</span><span class="p">);</span>
        <span class="n">x</span> <span class="o">+=</span> <span class="n">y</span><span class="p">;</span>
        <span class="n">x</span> <span class="o">^=</span> <span class="n">i</span><span class="p">;</span>
        <span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span> <span class="o">&lt;&lt;</span> <span class="mi">3</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">y</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
        <span class="n">y</span> <span class="o">^=</span> <span class="n">x</span><span class="p">;</span>
        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span>
<span class="nf">speck_encrypt</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">speck</span> <span class="o">*</span><span class="n">ctx</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">x</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">32</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">8</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">*</span><span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">56</span><span class="p">);</span>
        <span class="o">*</span><span class="n">x</span> <span class="o">+=</span> <span class="o">*</span><span class="n">y</span><span class="p">;</span>
        <span class="o">*</span><span class="n">x</span> <span class="o">^=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
        <span class="o">*</span><span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">y</span> <span class="o">&lt;&lt;</span> <span class="mi">3</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">*</span><span class="n">y</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
        <span class="o">*</span><span class="n">y</span> <span class="o">^=</span> <span class="o">*</span><span class="n">x</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">speck_decrypt</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">speck</span> <span class="o">*</span><span class="n">ctx</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">x</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="o">*</span><span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">32</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="o">*</span><span class="n">y</span> <span class="o">^=</span> <span class="o">*</span><span class="n">x</span><span class="p">;</span>
        <span class="o">*</span><span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">y</span> <span class="o">&gt;&gt;</span> <span class="mi">3</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">*</span><span class="n">y</span> <span class="o">&lt;&lt;</span> <span class="mi">61</span><span class="p">);</span>
        <span class="o">*</span><span class="n">x</span> <span class="o">^=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">k</span><span class="p">[</span><span class="mi">31</span> <span class="o">-</span> <span class="n">i</span><span class="p">];</span>
        <span class="o">*</span><span class="n">x</span> <span class="o">-=</span> <span class="o">*</span><span class="n">y</span><span class="p">;</span>
        <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">8</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">*</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">56</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Isn’t that just beautiful? It’s so tiny and fast. Other than the
not-very-arbitrary selection of 32 rounds, and the use of 3-bit and
8-bit rotations, there are no magic values. One could fairly
reasonably commit this cipher to memory if necessary, similar to the
late RC4. Speck is probably my favorite block cipher right now,
<em>except</em> that I couldn’t figure out the key schedule for any of the
other key/block size variants.</p>

<p>Another cipher I studied, though in less depth, was <a href="http://people.csail.mit.edu/rivest/Rivest-rc5rev.pdf">RC5</a> (1994),
a block cipher by (obviously) Ron Rivest. The most novel part of RC5
is its use of data-dependent rotations. This was a very deliberate
decision, and the paper makes this clear:</p>

<blockquote>
  <p>RC5 should <em>highlight the use of data-dependent rotations</em>, and
encourage the assessment of the cryptographic strength data-dependent
rotations can provide.</p>
</blockquote>

<p>What’s a data-dependent rotation? In the Speck cipher shown above,
notice how the right-hand side of every shift in the rotations is a
constant (3, 8, 56, and 61). Suppose instead that these operands were
not constant but were computed from some part of the block’s value:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="o">*</span><span class="n">y</span> <span class="o">&amp;</span> <span class="mh">0x0f</span><span class="p">;</span>
    <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="n">r</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">*</span><span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="p">((</span><span class="mi">64</span> <span class="o">-</span> <span class="n">r</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">63</span><span class="p">));</span> <span class="c1">// masking avoids an undefined shift by 64 when r == 0</span>
</code></pre></div></div>

<p>Such “random” rotations “frustrate differential cryptanalysis” according
to the paper, increasing the strength of the cipher.</p>
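<p>For context, RC5 uses data-dependent rotations in every half-round. With 32-bit words, each word is XORed with the other word, rotated left by an amount taken from the low five bits of the other word, and then has a subkey added. The sketch below shows that round structure and its inverse. It deliberately leaves out RC5’s key schedule, so the subkey array <code>s</code> is only a placeholder that the caller fills in. Treat it as an illustration of the rotations, not a usable RC5 implementation.</p>

```c
#include <stdint.h>

#define RC5_ROUNDS 12

/* Rotations use only the low 5 bits of r, as RC5 does for 32-bit
 * words. Masking also keeps the C shifts well defined when r is 0. */
static uint32_t
rotl32(uint32_t x, uint32_t r)
{
    r &= 31;
    return (x << r) | (x >> (-r & 31));
}

static uint32_t
rotr32(uint32_t x, uint32_t r)
{
    r &= 31;
    return (x >> r) | (x << (-r & 31));
}

/* s must hold 2*RC5_ROUNDS + 2 subkeys. Real RC5 derives these from
 * the key; here they are caller-supplied placeholders. */
static void
rc5_encrypt(const uint32_t *s, uint32_t *a, uint32_t *b)
{
    uint32_t A = *a + s[0];
    uint32_t B = *b + s[1];
    for (int i = 1; i <= RC5_ROUNDS; i++) {
        A = rotl32(A ^ B, B) + s[2 * i];      /* rotation amount from B */
        B = rotl32(B ^ A, A) + s[2 * i + 1];  /* rotation amount from A */
    }
    *a = A;
    *b = B;
}

static void
rc5_decrypt(const uint32_t *s, uint32_t *a, uint32_t *b)
{
    uint32_t A = *a;
    uint32_t B = *b;
    for (int i = RC5_ROUNDS; i >= 1; i--) {
        B = rotr32(B - s[2 * i + 1], A) ^ A;
        A = rotr32(A - s[2 * i], B) ^ B;
    }
    *b = B - s[1];
    *a = A - s[0];
}
```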

<p>Another algorithm that uses a data-dependent shift is the <a href="http://www.pcg-random.org/">PCG family of
PRNGs</a>. Honestly, the data-dependent “permutation” shift is <em>the</em>
defining characteristic of PCG. As a reminder, here’s the simplified PCG
from my shootout:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">spcg32</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">m</span> <span class="o">=</span> <span class="mh">0x9b60933458e17d7d</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a</span> <span class="o">=</span> <span class="mh">0xd737232eeccdf7ed</span><span class="p">;</span>
    <span class="o">*</span><span class="n">s</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">shift</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="o">*</span><span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">*</span><span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="n">shift</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice how the final shift depends on the high order bits of the PRNG
state. (This one weird trick from Melissa O’Neil will significantly
improve your PRNG. Xorshift experts hate her.)</p>

<p>I think this raises a really interesting question: Why did it take until
2014 for someone to apply a data-dependent shift to a PRNG? Similarly,
why are <a href="https://crypto.stackexchange.com/q/20325">data-dependent rotations not used in many ciphers</a>?</p>

<p>My own theory is that this is because many older instruction set
architectures can’t perform data-dependent shift operations efficiently.</p>

<p>Many instruction sets only have a fixed shift (e.g. 1-bit), or can
only shift by an immediate (i.e. a constant). In these cases, a
data-dependent shift would require a loop. These loops would be a ripe
source of side-channel attacks in ciphers due to the difficulty of
making them operate in constant time. They would also be relatively slow
for video game PRNGs, which often needed to run on constrained
hardware with limited instruction sets. For example, the 6502 (Atari,
Apple II, NES, Commodore 64) and the Z80 (too many to list) can only
shift/rotate one bit at a time.</p>
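<p>To make the cost concrete, here is a small C model of an 8-bit CPU that can only shift one bit at a time. The first function is the naive loop, whose running time grows with the shift count. The second always performs the same three conditional stages (shift by 4, 2, and 1) and selects results with masks instead of branches, the way a barrel shifter is built from fixed stages. Both are illustrations only, since a C compiler guarantees constant time for neither.</p>

```c
#include <stdint.h>

/* Data-dependent shift on a machine with only a 1-bit shift: the loop
 * runs n times, so its running time depends on n. */
static uint8_t
shr_loop(uint8_t x, unsigned n)
{
    while (n--) {
        x >>= 1;   /* one 1-bit shift per iteration */
    }
    return x;
}

/* Same result with a fixed amount of work for any n in 0..7: three
 * stages that always execute, each kept or discarded by a mask. */
static uint8_t
shr_stages(uint8_t x, unsigned n)
{
    for (unsigned k = 4; k > 0; k >>= 1) {
        uint8_t keep = (uint8_t)(0u - ((n & k) != 0));  /* 0xff or 0x00 */
        uint8_t shifted = (uint8_t)(x >> k);
        x = (uint8_t)((shifted & keep) | (x & (uint8_t)~keep));
    }
    return x;
}
```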

<p>Even on an architecture with an instruction for data-dependent shifts,
such as the x86, those shifts will be slower than constant shifts, at
least in part due to the additional data dependency.</p>

<p>It turns out there are also some patent issues (e.g. <a href="https://www.google.com/patents/US5724428">1</a>, <a href="https://www.google.com/patents/US6269163">2</a>).
Fortunately most of these patents have now expired, and one in
particular is set to expire this June. I still like my theory better.</p>

<h3 id="to-branchless-decoding">To branchless decoding</h3>

<p>So I was thinking about data-dependent shifts, and I had also noticed I
could trivially check the length of a UTF-8 code point using a small
lookup table — the first step in my decoder. What if I combined these into
a data-dependent shift based on a table lookup? This would become the last
step in my decoder. The idea for a branchless UTF-8 decoder was
essentially born out of connecting these two thoughts and then filling
in the middle.</p>
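<p>Those two pieces can be sketched on their own. A 32-entry table indexed by the top five bits of the first byte gives the sequence length. A second table turns that length into a data-dependent shift that discards the bytes past the end of the sequence. This is a stripped-down illustration, not the decoder from the linked article. <code>utf8_peek()</code> is a made-up name, it always reads four bytes (so the buffer needs padding), and it does no validation at all.</p>

```c
#include <stdint.h>

/* Decode one code point from s, which must have at least four
 * readable bytes. Returns the sequence length, or 0 if s[0] cannot
 * start a sequence (in which case *c is meaningless). No validation
 * of continuation bytes, overlong forms, or surrogates. */
static int
utf8_peek(const unsigned char *s, uint32_t *c)
{
    /* Length, indexed by the top five bits of the first byte. */
    static const unsigned char lengths[32] = {
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 4, 0
    };
    static const unsigned char masks[5]  = {0x00, 0x7f, 0x1f, 0x0f, 0x07};
    static const unsigned char shifts[5] = {0, 18, 12, 6, 0};

    int len = lengths[s[0] >> 3];

    /* Assemble the bits as if this were a four-byte sequence... */
    uint32_t v = (uint32_t)(s[0] & masks[len]) << 18 |
                 (uint32_t)(s[1] & 0x3f) << 12 |
                 (uint32_t)(s[2] & 0x3f) <<  6 |
                 (uint32_t)(s[3] & 0x3f);

    /* ...then a data-dependent shift, chosen by table lookup, drops
     * the bits from bytes that are not part of this sequence. */
    *c = v >> shifts[len];
    return len;
}
```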

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Finding the Best 64-bit Simulation PRNG</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/09/21/"/>
    <id>urn:uuid:637af55f-6e33-31e5-25fa-edb590a16d44</id>
    <updated>2017-09-21T21:25:00Z</updated>
    <category term="c"/><category term="compsci"/><category term="x86"/><category term="crypto"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><strong>August 2018 Update</strong>: <em>xoroshiro128+ fails <a href="http://pracrand.sourceforge.net/">PractRand</a> very
badly. Since this article was published, its authors have supplanted it
with <strong>xoshiro256**</strong>. It has essentially the same performance, but
better statistical properties. xoshiro256** is now my preferred PRNG.</em></p>
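<p>Here is a sketch of xoshiro256** that follows the authors’ public-domain reference implementation, plus a seeding helper whose name is my own. The authors suggest filling the 256-bit state from a single 64-bit seed using SplitMix64, which is what the helper does. The state must never be all zeros.</p>

```c
#include <stdint.h>

static uint64_t
rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));   /* k is 7 or 45 here, never 0 */
}

/* xoshiro256**: four 64-bit words of state, not all zero. */
static uint64_t
xoshiro256ss(uint64_t s[4])
{
    uint64_t result = rotl64(s[1] * 5, 7) * 9;
    uint64_t t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);
    return result;
}

/* Expand one 64-bit seed into the full state with SplitMix64. */
static void
xoshiro256ss_seed(uint64_t s[4], uint64_t seed)
{
    for (int i = 0; i < 4; i++) {
        uint64_t z = (seed += UINT64_C(0x9e3779b97f4a7c15));
        z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
        z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
        s[i] = z ^ (z >> 31);
    }
}
```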

<p>I use pseudo-random number generators (PRNGs) a whole lot. They’re an
essential component in lots of algorithms and processes.</p>

<ul>
  <li>
    <p><strong>Monte Carlo simulations</strong>, where PRNGs are used to <a href="https://possiblywrong.wordpress.com/2015/09/15/kanoodle-iq-fit-and-dancing-links/">compute
numeric estimates</a> for problems that are difficult or impossible
to solve analytically.</p>
  </li>
  <li>
    <p><a href="/blog/2017/04/27/"><strong>Monte Carlo tree search AI</strong></a>, where massive numbers of games
are played out randomly in search of an optimal move. This is a
specific application of the last item.</p>
  </li>
  <li>
    <p><a href="https://github.com/skeeto/carpet-fractal-genetics"><strong>Genetic algorithms</strong></a>, where a PRNG creates the initial
population, and then later guides mutation and breeding of selected
solutions.</p>
  </li>
  <li>
    <p><a href="https://blog.cr.yp.to/20140205-entropy.html"><strong>Cryptography</strong></a>, where cryptographically secure PRNGs
(CSPRNGs) produce output that is predictable for recipients who know
a particular secret, but not for anyone else. This article is only
concerned with plain PRNGs.</p>
  </li>
</ul>

<p>For the first three “simulation” uses, there are two primary factors
that drive the selection of a PRNG. These factors can be at odds with
each other:</p>

<ol>
  <li>
    <p>The PRNG should be <em>very</em> fast. The application should spend its
time running the actual algorithms, not generating random numbers.</p>
  </li>
  <li>
    <p>PRNG output should have robust statistical qualities. Bits should
appear to be independent and the output should closely follow the
desired distribution. Poor-quality output will negatively affect
the algorithms using it. Also just as important is <a href="http://mumble.net/~campbell/2014/04/28/uniform-random-float">how you use
it</a>, but this article will focus only on generating bits.</p>
  </li>
</ol>

<p>In other situations, such as in cryptography or online gambling,
another important property is that an observer can’t learn anything
meaningful about the PRNG’s internal state from its output. For the
three simulation cases I care about, this is not a concern. Only speed
and quality properties matter.</p>

<p>Depending on the programming language, the PRNGs found in various
standard libraries may be of dubious quality. They’re slower than they
need to be, or have poorer quality than required. In some cases, such
as <code class="language-plaintext highlighter-rouge">rand()</code> in C, the algorithm isn’t specified, and you can’t rely on
it for anything outside of trivial examples. In other cases the
algorithm and behavior <em>are</em> specified, but you could easily do better
yourself.</p>

<p>My preference is to BYOPRNG: <em>Bring Your Own Pseudo-random Number
Generator</em>. You get reliable, identical output everywhere. Also, in
the case of C and C++ — and if you do it right — by embedding the PRNG
in your project, it will get inlined and unrolled, making it far more
efficient than a <a href="/blog/2016/10/27/">slow call into a dynamic library</a>.</p>

<p>A fast PRNG is going to be small, making it a great candidate for
embedding as, say, a header library. That leaves just one important
question, “Can the PRNG be small <em>and</em> have high quality output?” In
the 21st century, the answer to this question is an emphatic “yes!”</p>

<p>For the past few years my go-to drop-in PRNG has been
<a href="https://en.wikipedia.org/wiki/Xorshift">xorshift*</a>. The body of the function is 6 lines of C, and its
entire state is a 64-bit integer, directly seeded. However, there are a
number of choices here, including other variants of Xorshift. How do I
know which one is best? The only way to know is to test it, hence my
64-bit PRNG shootout:</p>

<ul>
  <li><a href="https://github.com/skeeto/prng64-shootout"><strong>64-bit PRNG Shootout</strong></a></li>
</ul>

<p>Sure, there <a href="http://xoroshiro.di.unimi.it/">are other such shootouts</a>, but they’re all missing
something I want to measure. I also want to test in an environment very
close to how I’d use these PRNGs myself.</p>

<h3 id="shootout-results">Shootout results</h3>

<p>Before getting into the details of the benchmark and each generator,
here are the results. These tests were run on an i7-6700 (Skylake)
running Linux 4.9.0.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                               Speed (MB/s)
PRNG           FAIL  WEAK  gcc-6.3.0 clang-3.8.1
------------------------------------------------
baseline          X     X      15000       13100
blowfishcbc16     0     1        169         157
blowfishcbc4      0     5        725         676
blowfishctr16     1     3        187         184
blowfishctr4      1     5        890        1000
mt64              1     7       1700        1970
pcg64             0     4       4150        3290
rc4               0     5        366         185
spcg64            0     8       5140        4960
xoroshiro128+     0     6       8100        7720
xorshift128+      0     2       7660        6530
xorshift64*       0     3       4990        5060
</code></pre></div></div>

<p><strong>The clear winner is <a href="http://xoroshiro.di.unimi.it/">xoroshiro128+</a></strong>, with a function body of
just 7 lines of C. It’s clearly the fastest, and the output had no
observed statistical failures. However, that’s not the whole story. A
couple of the other PRNGs have advantages that situationally make
them better suited than xoroshiro128+. I’ll go over these in the
discussion below.</p>

<p>These two versions of GCC and Clang were chosen because these are the
latest available in Debian 9 “Stretch.” It’s easy to build and run the
benchmark yourself if you want to try a different version.</p>

<h3 id="speed-benchmark">Speed benchmark</h3>

<p>In the speed benchmark, the PRNG is initialized, a 1-second <code class="language-plaintext highlighter-rouge">alarm(1)</code>
is set, then the PRNG fills a large <code class="language-plaintext highlighter-rouge">volatile</code> buffer of 64-bit unsigned
integers again and again as quickly as possible until the alarm fires.
The amount of memory written is measured as the PRNG’s speed.</p>

<p>The baseline “PRNG” writes zeros into the buffer. This represents the
absolute speed limit that no PRNG can exceed.</p>

<p>The purpose of making the buffer <code class="language-plaintext highlighter-rouge">volatile</code> is to force the entire
output to actually be “consumed” as far as the compiler is concerned.
Otherwise the compiler plays nasty tricks to make the program do as
little work as possible. Another way to deal with this would be to
<code class="language-plaintext highlighter-rouge">write(2)</code> the buffer, but of course I didn’t want to introduce
unnecessary I/O into a benchmark.</p>

<p>On Linux, SIGALRM was impressively consistent between runs, meaning it
was perfectly suitable for this benchmark. To account for any process
scheduling wonkiness, the benchmark was run 8 times and only the
fastest time was kept.</p>

<p>The SIGALRM handler sets a <code class="language-plaintext highlighter-rouge">volatile</code> global variable that tells the
generator to stop. The PRNG call was unrolled 8 times to keep the
alarm check from significantly impacting the benchmark. You can see
the effect for yourself by changing <code class="language-plaintext highlighter-rouge">UNROLL</code> to 1 (i.e. “don’t
unroll”) in the code. Unrolling beyond 8 times had no measurable
effect in my tests.</p>

<p>Due to the PRNGs being inlined, this unrolling makes the benchmark
less realistic, and it shows in the results. Using <code class="language-plaintext highlighter-rouge">volatile</code> for the
buffer helped to counter this effect and reground the results. This is
a fuzzy problem, and there’s not really any way to avoid it, but I
will also discuss this below.</p>
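<p>The measurement loop described in this section can be sketched roughly as follows. This is not the shootout’s actual code. The buffer size, the function names, and the use of a function pointer (which a compiler may or may not inline through) are assumptions for illustration. The sketch shows the <code class="language-plaintext highlighter-rouge">volatile</code> output buffer, the <code class="language-plaintext highlighter-rouge">volatile</code> stop flag set by the SIGALRM handler, checking that flag only once per 8 calls, and keeping the best of 8 runs.</p>

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define BUFLEN 4096  /* 64-bit words in the output buffer (assumed size) */
#define UNROLL 8     /* PRNG calls between checks of the stop flag */
#define RUNS   8     /* keep only the fastest of this many runs */

/* Cleared by the SIGALRM handler to end a run. */
static volatile sig_atomic_t running;

static void
on_alarm(int signum)
{
    (void)signum;
    running = 0;
}

/* The "baseline" generator: it only produces zeros. */
static uint64_t
baseline(uint64_t *s)
{
    (void)s;
    return 0;
}

/* Fill a volatile buffer for about one second; return bytes written. */
static uint64_t
bench_once(uint64_t (*gen)(uint64_t *), uint64_t *state)
{
    static volatile uint64_t buf[BUFLEN];  /* volatile: every store happens */
    uint64_t bytes = 0;
    size_t i = 0;

    running = 1;
    signal(SIGALRM, on_alarm);
    alarm(1);
    while (running) {
        /* Check the flag once per UNROLL calls; a compiler will usually
         * fully unroll this fixed-length inner loop. */
        for (int j = 0; j < UNROLL; j++) {
            buf[i + j] = gen(state);
        }
        i = (i + UNROLL) % BUFLEN;  /* BUFLEN is a multiple of UNROLL */
        bytes += UNROLL * sizeof(buf[0]);
    }
    return bytes;
}

/* Best of RUNS one-second runs, to filter out scheduling noise. */
static uint64_t
bench(uint64_t (*gen)(uint64_t *), uint64_t *state)
{
    uint64_t best = 0;
    for (int r = 0; r < RUNS; r++) {
        uint64_t b = bench_once(gen, state);
        if (b > best) {
            best = b;
        }
    }
    return best;
}
```

<p>Because each run lasts about one second, <code class="language-plaintext highlighter-rouge">bench(baseline, state) / 1e6</code> is roughly the MB/s figure shown in the results table.</p>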

<h3 id="statistical-benchmark">Statistical benchmark</h3>

<p>To measure the statistical quality of each PRNG — mostly as a sanity
check — the raw binary output was run through <a href="http://webhome.phy.duke.edu/~rgb/General/dieharder.php">dieharder</a> 3.31.1:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>prng | dieharder -g200 -a -m4
</code></pre></div></div>

<p>This statistical analysis has no timing characteristics and the
results should be the same everywhere. You would only need to re-run
it to test with a different version of dieharder, or a different
analysis tool.</p>
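<p>In the pipeline above, <code class="language-plaintext highlighter-rouge">prng</code> only has to write raw generator output to standard output, where <code class="language-plaintext highlighter-rouge">dieharder -g200</code> reads it. Below is a minimal sketch of such a writer. It uses the xorshift64* generator listed later in this article. The function name and block size are assumptions, not the shootout’s actual code.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* xorshift64*, as listed later in this article. */
static uint64_t
xorshift64star(uint64_t s[1])
{
    uint64_t x = s[0];
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    s[0] = x;
    return x * UINT64_C(0x2545f4914f6cdd1d);
}

/* Write nblocks blocks of raw 64-bit outputs (native byte order) to out.
 * Returns 0 on success or -1 if a write fails. Note that by default a
 * write to a pipe the reader has closed raises SIGPIPE, which simply
 * terminates the process instead of returning here. */
static int
write_raw(FILE *out, uint64_t s[1], size_t nblocks)
{
    uint64_t block[512];
    for (size_t n = 0; n < nblocks; n++) {
        for (size_t i = 0; i < sizeof(block) / sizeof(block[0]); i++) {
            block[i] = xorshift64star(s);
        }
        if (fwrite(block, sizeof(block), 1, out) != 1) {
            return -1;
        }
    }
    return 0;
}
```

<p>A <code class="language-plaintext highlighter-rouge">main</code> would seed the state with a nonzero value and call <code class="language-plaintext highlighter-rouge">write_raw(stdout, state, SIZE_MAX)</code>, which effectively runs until dieharder stops reading.</p>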

<p>There’s not much information to glean from this part of the shootout.
It mostly confirms that all of these PRNGs would work fine for
simulation purposes. The WEAK results are not very significant and are
only useful for breaking ties. Even a true RNG will get some WEAK
results. For example, the <a href="https://en.wikipedia.org/wiki/RdRand">x86 RDRAND</a> instruction (not
included in the actual shootout) got 7 WEAK results in my tests.</p>

<p>The FAIL results are more significant, but a single failure doesn’t
mean much. A non-failing PRNG should be preferred to an otherwise
equal PRNG with a failure.</p>

<h3 id="individual-prngs">Individual PRNGs</h3>

<p>Admittedly the definition for “64-bit PRNG” is rather vague. My high
performance targets are all 64-bit platforms, so the highest PRNG
throughput will be built on 64-bit operations (<a href="/blog/2015/07/10/">if not wider</a>).
The original plan was to focus on PRNGs built from 64-bit operations.</p>

<p>Curiosity got the best of me, so I included some PRNGs that don’t use
<em>any</em> 64-bit operations. I just wanted to see how they stacked up.</p>

<h4 id="blowfish">Blowfish</h4>

<p>One of the reasons I <a href="/blog/2017/09/15/">wrote a Blowfish implementation</a> was to
evaluate its performance and statistical qualities, so naturally I
included it in the benchmark. It only uses 32-bit addition and 32-bit
XOR. It has a 64-bit block size, so it naturally produces a 64-bit
integer. There are two different properties that combine to make four
variants in the benchmark: number of rounds and block mode.</p>

<p>Blowfish normally uses 16 rounds. This makes it a lot slower than a
non-cryptographic PRNG but gives it a <em>security margin</em>. I don’t care
about the security margin, so I included a 4-round variant. As
expected, it’s about four times faster.</p>

<p>The other feature I tested is the block mode: <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CBC">Cipher Block
Chaining</a> (CBC) versus <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29">Counter</a> (CTR) mode. In CBC mode it
encrypts zeros as plaintext. This just means it’s encrypting its last
output. The ciphertext is the PRNG’s output.</p>

<p>In CTR mode the PRNG is encrypting a 64-bit counter. It’s 11% faster
than CBC in the 16-round variant and 23% faster in the 4-round variant.
The reason is simple, and it’s in part an artifact of unrolling the
generation loop in the benchmark.</p>

<p>In CBC mode, each output depends on the previous, but in CTR mode all
blocks are independent. Work can begin on the next output before the
previous output is complete. The x86 architecture uses out-of-order
execution to achieve many of its performance gains: Instructions may
be executed in a different order than they appear in the program,
though their observable effects must <a href="http://preshing.com/20120515/memory-reordering-caught-in-the-act/">generally be ordered
correctly</a>. Breaking dependencies between instructions allows
out-of-order execution to be fully exercised. It also gives the
compiler more freedom in instruction scheduling, though the <code class="language-plaintext highlighter-rouge">volatile</code>
accesses cannot be reordered with respect to each other (hence it
helping to reground the benchmark).</p>
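<p>The difference in data dependencies between the two modes is easier to see in code. In this sketch, <code class="language-plaintext highlighter-rouge">toy_encrypt()</code> is a hypothetical stand-in for a 64-bit block cipher. It is <em>not</em> Blowfish and provides no security at all. It XORs in a key and then applies the SplitMix64 mixing function, so it is at least a permutation of 64-bit blocks.</p>

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit block cipher (NOT Blowfish, NOT
 * secure): XOR in a key, then apply the SplitMix64 mixing function.
 * Every step is invertible, so this permutes 64-bit blocks. */
static uint64_t
toy_encrypt(uint64_t key, uint64_t block)
{
    uint64_t z = block ^ key;
    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}

/* CBC mode over all-zero plaintext: C[i] = E(0 ^ C[i-1]) = E(C[i-1]).
 * Each output needs the previous one: a serial dependency chain. */
static uint64_t
cbc_next(uint64_t key, uint64_t *prev)
{
    *prev = toy_encrypt(key, *prev);
    return *prev;
}

/* CTR mode: C[i] = E(counter + i). Outputs don't depend on each other,
 * so several encryptions can be in flight at once. */
static uint64_t
ctr_next(uint64_t key, uint64_t *counter)
{
    return toy_encrypt(key, (*counter)++);
}
```

<p>In an unrolled loop, each <code class="language-plaintext highlighter-rouge">ctr_next()</code> call can begin before the previous one finishes. Each <code class="language-plaintext highlighter-rouge">cbc_next()</code> call must wait for its predecessor’s result.</p>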

<p>Statistically, the 4-round cipher was not significantly worse than the
16-round cipher. For simulation purposes the 4-round cipher would be
perfectly sufficient, though xoroshiro128+ is still more than 9 times
faster (with GCC) without sacrificing quality.</p>

<p>On the other hand, CTR mode had a single failure in both the 4-round
(dab_filltree2) and 16-round (dab_filltree) variants. At least for
Blowfish, is there something that makes CTR mode less suitable than CBC
mode as a PRNG?</p>

<p>In the end Blowfish is too slow and too complicated to serve as a
simulation PRNG. This was entirely expected, but it’s interesting to see
how it stacks up.</p>

<h4 id="mersenne-twister-mt19937-64">Mersenne Twister (MT19937-64)</h4>

<p>Nobody ever got fired for choosing <a href="https://en.wikipedia.org/wiki/Mersenne_Twister">Mersenne Twister</a>. It’s the
classical choice for simulations, and is still usually recommended to
this day. However, Mersenne Twister’s best days are behind it. I
tested the 64-bit variant, MT19937-64, and there are four problems:</p>

<ul>
  <li>
    <p>It’s between 1/4 and 1/5 the speed of xoroshiro128+.</p>
  </li>
  <li>
    <p>It has a large state: 2,500 bytes, versus xoroshiro128+’s 16 bytes.</p>
  </li>
  <li>
    <p>Its implementation is three times bigger than xoroshiro128+, and much
more complicated.</p>
  </li>
  <li>
    <p>It had one statistical failure (dab_filltree2).</p>
  </li>
</ul>

<p>Curiously my implementation is 16% faster with Clang than GCC. Since
Mersenne Twister isn’t seriously in the running, I didn’t take time to
dig into why.</p>

<p>Ultimately I would never choose Mersenne Twister for anything anymore.
This was also not surprising.</p>

<h4 id="permuted-congruential-generator-pcg">Permuted Congruential Generator (PCG)</h4>

<p>The <a href="http://www.pcg-random.org/">Permuted Congruential Generator</a> (PCG) has some really
interesting history behind it, particularly with its somewhat <a href="http://www.pcg-random.org/paper.html">unusual
paper</a>, controversial for both its excessive length (58 pages)
and informal style. It’s in close competition with Xorshift and
xoroshiro128+. I was really interested in seeing how it stacked up.</p>

<p>PCG is really just a Linear Congruential Generator (LCG) that doesn’t
output the lowest bits (too poor quality), and has an extra
permutation step to make up for the LCG’s other weaknesses. I included
two variants in my benchmark: the official PCG and a “simplified” PCG
(sPCG) with a simple permutation step. sPCG is just the first PCG
presented in the paper (34 pages in!).</p>

<p>Here’s essentially what the simplified version looks like:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span>
<span class="nf">spcg32</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">m</span> <span class="o">=</span> <span class="mh">0x9b60933458e17d7d</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a</span> <span class="o">=</span> <span class="mh">0xd737232eeccdf7ed</span><span class="p">;</span>
    <span class="o">*</span><span class="n">s</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">shift</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="o">*</span><span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">*</span><span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="n">shift</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The third line with the modular multiplication and addition is the
LCG. The bit shift is the permutation. This PCG uses the most
significant three bits of the result to determine which 32 bits to
output. That’s <em>the</em> novel component of PCG.</p>
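<p>To make the permutation concrete, the top three bits select a value from 0 to 7, so the shift ranges from 29 down to 22. The 32-bit output is therefore a window somewhere within bits 22 through 60 of the state. Neither the three selector bits nor the lowest 22 bits are ever output directly. A small sketch with hypothetical helper names:</p>

```c
#include <stdint.h>

/* The shift used by spcg32(): the top 3 bits pick one of 8 shifts. */
static int
spcg32_shift(uint64_t state)
{
    return 29 - (int)(state >> 61);
}

/* The 32-bit output for an (already advanced) state: bits
 * [shift, shift + 31] of the state, where shift is in 22..29. */
static uint32_t
spcg32_output(uint64_t state)
{
    return (uint32_t)(state >> spcg32_shift(state));
}
```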

<p>The two constants are entirely my own devising. They’re two 64-bit primes
generated using Emacs’ <code class="language-plaintext highlighter-rouge">M-x calc</code>: <code class="language-plaintext highlighter-rouge">2 64 ^ k r k n k p k p k p</code>.</p>

<p>Heck, that’s so simple that I could easily memorize this and code it
from scratch on demand. Key takeaway: This is <strong>one way that PCG is
situationally better than xoroshiro128+</strong>. In a pinch I could use Emacs
to generate a couple of primes and code the rest from memory. If you
participate in coding competitions, take note.</p>

<p>However, you probably also noticed PCG only generates 32-bit integers
despite using 64-bit operations. To properly generate a 64-bit value
we’d need 128-bit operations, which would need to be implemented in
software.</p>

<p>Instead, I doubled up on everything to run two PRNGs in parallel.
Despite the doubling in state size, the period doesn’t get any larger
since the PRNGs don’t interact with each other. We get something in
return, though. Remember what I said about out-of-order execution?
Except for the last step combining their results, since the two PRNGs
are independent, doubling up shouldn’t <em>quite</em> halve the performance,
particularly with the benchmark loop unrolling business.</p>

<p>Here’s my doubled-up version:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">spcg64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">m</span>  <span class="o">=</span> <span class="mh">0x9b60933458e17d7d</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a0</span> <span class="o">=</span> <span class="mh">0xd737232eeccdf7ed</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">a1</span> <span class="o">=</span> <span class="mh">0x8b260b70b8e98891</span><span class="p">;</span>
    <span class="kt">uint64_t</span> <span class="n">p0</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">p1</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">p0</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a0</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">p1</span> <span class="o">*</span> <span class="n">m</span> <span class="o">+</span> <span class="n">a1</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r0</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="n">p0</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">r1</span> <span class="o">=</span> <span class="mi">29</span> <span class="o">-</span> <span class="p">(</span><span class="n">p1</span> <span class="o">&gt;&gt;</span> <span class="mi">61</span><span class="p">);</span>
    <span class="kt">uint64_t</span> <span class="n">high</span> <span class="o">=</span> <span class="n">p0</span> <span class="o">&gt;&gt;</span> <span class="n">r0</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">low</span>  <span class="o">=</span> <span class="n">p1</span> <span class="o">&gt;&gt;</span> <span class="n">r1</span><span class="p">;</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">high</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">low</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The “full” PCG has some extra shifts that make it 25% (GCC) to 50%
(Clang) slower than the “simplified” PCG, but it does halve the WEAK
results.</p>

<p>In this 64-bit form, both are significantly slower than xoroshiro128+.
However, if you find yourself only needing 32 bits at a time (always
throwing away the high 32 bits from a 64-bit PRNG), 32-bit PCG is
faster than using xoroshiro128+ and throwing away half its output.</p>

<h4 id="rc4">RC4</h4>

<p>This is another CSPRNG where I was curious how it would stack up. It
only uses 8-bit operations, and it generates a 64-bit integer one byte
at a time. It’s the slowest after 16-round Blowfish and generally not
useful as a simulation PRNG.</p>
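<p>For reference, here is a sketch of how RC4 can serve as a 64-bit generator. It runs the standard key-scheduling and output steps and packs eight keystream bytes into each integer. Packing the first byte into the most significant position is an arbitrary choice here, as are the names. The shootout’s implementation may differ.</p>

```c
#include <stddef.h>
#include <stdint.h>

struct rc4 {
    uint8_t S[256];
    uint8_t i, j;
};

/* Standard RC4 key schedule. len must be nonzero. */
static void
rc4_init(struct rc4 *r, const uint8_t *key, size_t len)
{
    for (int k = 0; k < 256; k++) {
        r->S[k] = (uint8_t)k;
    }
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j += r->S[k] + key[k % len];  /* wraps mod 256 */
        uint8_t t = r->S[k];
        r->S[k] = r->S[j];
        r->S[j] = t;
    }
    r->i = r->j = 0;
}

/* One keystream byte: only 8-bit operations. */
static uint8_t
rc4_byte(struct rc4 *r)
{
    r->i++;
    r->j += r->S[r->i];
    uint8_t t = r->S[r->i];
    r->S[r->i] = r->S[r->j];
    r->S[r->j] = t;
    return r->S[(uint8_t)(r->S[r->i] + r->S[r->j])];
}

/* A 64-bit output assembled one byte at a time, first byte highest. */
static uint64_t
rc4_next64(struct rc4 *r)
{
    uint64_t x = 0;
    for (int k = 0; k < 8; k++) {
        x = x << 8 | rc4_byte(r);
    }
    return x;
}
```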

<h4 id="xoroshiro128">xoroshiro128+</h4>

<p>xoroshiro128+ is the obvious winner in this benchmark and it seems to be
the best 64-bit simulation PRNG available. If you need a fast, quality
PRNG, just drop these 11 lines into your C or C++ program:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">xoroshiro128plus</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">s0</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">s1</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">s0</span> <span class="o">+</span> <span class="n">s1</span><span class="p">;</span>
    <span class="n">s1</span> <span class="o">^=</span> <span class="n">s0</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">s0</span> <span class="o">&lt;&lt;</span> <span class="mi">55</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">s0</span> <span class="o">&gt;&gt;</span> <span class="mi">9</span><span class="p">))</span> <span class="o">^</span> <span class="n">s1</span> <span class="o">^</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&lt;&lt;</span> <span class="mi">14</span><span class="p">);</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&lt;&lt;</span> <span class="mi">36</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">s1</span> <span class="o">&gt;&gt;</span> <span class="mi">28</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>There’s one important caveat: <strong>That 16-byte state must be
well-seeded.</strong> Having lots of zero bytes will lead to <em>terrible</em> initial
output until the generator mixes it all up. Having all zero bytes will
completely break the generator. If you’re going to seed from, say, the
current Unix time, then XOR it with 16 static random bytes.</p>
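<p>The time-plus-constants seeding advice might look like this sketch. The two constants are arbitrary placeholders, and any 16 fixed random bytes would do. <code class="language-plaintext highlighter-rouge">seed_from_time()</code> is a hypothetical helper. Because the two constants differ, at most one state word can end up zero, so the state can never be all zeros.</p>

```c
#include <stdint.h>

/* xoroshiro128+, as listed above. */
static uint64_t
xoroshiro128plus(uint64_t s[2])
{
    uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    uint64_t result = s0 + s1;
    s1 ^= s0;
    s[0] = ((s0 << 55) | (s0 >> 9)) ^ s1 ^ (s1 << 14);
    s[1] = (s1 << 36) | (s1 >> 28);
    return result;
}

/* Hypothetical seeding helper: XOR a weak seed (e.g. the current Unix
 * time) with 16 fixed random bytes. The constants are arbitrary
 * placeholders; since they differ, s[0] and s[1] can't both be zero. */
static void
seed_from_time(uint64_t s[2], uint64_t t)
{
    s[0] = t ^ UINT64_C(0x3c6ef372fe94f82b);
    s[1] = t ^ UINT64_C(0xa54ff53a5f1d36f1);
}
```

<p>Typical use would be <code class="language-plaintext highlighter-rouge">seed_from_time(state, time(NULL))</code>, with <code class="language-plaintext highlighter-rouge">&lt;time.h&gt;</code> included.</p>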

<h4 id="xorshift128-and-xorshift64">xorshift128+ and xorshift64*</h4>

<p>These generators are closely related and, like I said, xorshift64* was
what I used for years. Looks like it’s time to retire it.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">xorshift64star</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">x</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">12</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">25</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">27</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x2545f4914f6cdd1d</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However, unlike both xoroshiro128+ and xorshift128+, xorshift64* will
tolerate weak seeding so long as it’s not literally zero. Zero will also
break this generator.</p>
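<p>Zero is a fixed point of the xorshift step. Every shift of zero is zero, so the state never leaves zero and the final multiply returns zero forever. A simple guard substitutes an arbitrary nonzero constant for a zero seed. It is sketched below with a hypothetical <code class="language-plaintext highlighter-rouge">xorshift64star_seed()</code> helper.</p>

```c
#include <stdint.h>

/* xorshift64*, as listed above. */
static uint64_t
xorshift64star(uint64_t s[1])
{
    uint64_t x = s[0];
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    s[0] = x;
    return x * UINT64_C(0x2545f4914f6cdd1d);
}

/* Hypothetical helper: any nonzero seed is acceptable, but zero would
 * lock the generator at zero, so replace it with an arbitrary constant. */
static void
xorshift64star_seed(uint64_t s[1], uint64_t seed)
{
    s[0] = seed ? seed : UINT64_C(0x9e3779b97f4a7c15);
}
```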

<p>If it weren’t for xoroshiro128+, then xorshift128+ would have been the
winner of the benchmark and my new favorite choice.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span>
<span class="nf">xorshift128plus</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">x</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="kt">uint64_t</span> <span class="n">y</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">;</span>
    <span class="n">x</span> <span class="o">^=</span> <span class="n">x</span> <span class="o">&lt;&lt;</span> <span class="mi">23</span><span class="p">;</span>
    <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span> <span class="o">^</span> <span class="n">y</span> <span class="o">^</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="mi">17</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">y</span> <span class="o">&gt;&gt;</span> <span class="mi">26</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s a lot like xoroshiro128+, including the need to be well-seeded,
but it’s just slow enough to lose out. There’s no reason to use
xorshift128+ instead of xoroshiro128+.</p>

<h3 id="conclusion">Conclusion</h3>

<p>My own takeaway (until I re-evaluate some years in the future):</p>

<ul>
  <li>The best 64-bit simulation PRNG is xoroshiro128+.</li>
  <li>“Simplified” PCG can be useful in a pinch.</li>
  <li>When only 32-bit integers are necessary, use PCG.</li>
</ul>

<p>Things can change significantly between platforms, though. Here’s the
shootout on an ARM Cortex-A53:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                    Speed (MB/s)
PRNG         gcc-5.4.0   clang-3.8.0
------------------------------------
baseline          2560        2400
blowfishcbc16       36.5        45.4
blowfishcbc4       135         173
blowfishctr16       36.4        45.2
blowfishctr4       133         168
mt64               207         254
pcg64              980         712
rc4                 96.6        44.0
spcg64            1021         948
xoroshiro128+     2560        1570
xorshift128+      2560        1520
xorshift64*       1360        1080
</code></pre></div></div>

<p>LLVM is not as mature on this platform, but, with GCC, both
xoroshiro128+ and xorshift128+ matched the baseline! It seems memory
is the bottleneck.</p>

<p>So don’t necessarily take my word for it. You can run this shootout in
your own environment — perhaps even tossing in more PRNGs — to find
what’s appropriate for your own situation.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Blowpipe: a Blowfish-encrypted, Authenticated Pipe</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/09/15/"/>
    <id>urn:uuid:1cddecb9-44b1-346c-ded6-c099069ce013</id>
    <updated>2017-09-15T23:59:59Z</updated>
    <category term="crypto"/><category term="c"/><category term="posix"/>
    <content type="html">
      <![CDATA[<p><a href="https://github.com/skeeto/blowpipe"><strong>Blowpipe</strong></a> is a <em>toy</em> crypto tool that creates a
<a href="https://www.schneier.com/academic/blowfish/">Blowfish</a>-encrypted pipe. It doesn’t open any files and instead
encrypts and decrypts from standard input to standard output. This
pipe can encrypt individual files or even encrypt a network
connection (à la netcat).</p>

<p>Most importantly, since Blowpipe is intended to be used as a pipe
(duh), it will <em>never</em> output decrypted plaintext that hasn’t been
<em>authenticated</em>. That is, it will detect tampering of the encrypted
stream and truncate its output, reporting an error, without producing
the manipulated data. Some very similar tools that <em>aren’t</em> considered
toys lack this important feature, such as <a href="http://loop-aes.sourceforge.net/aespipe.README">aespipe</a>.</p>

<h3 id="purpose">Purpose</h3>

<p>Blowpipe came about because I wanted to study Blowfish, a 64-bit block
cipher designed by Bruce Schneier in 1993. It’s played an important
role in the history of cryptography and has withstood cryptanalysis
for 24 years. Its major weakness is its small block size, leaving it
vulnerable to birthday attacks regardless of any other property of the
cipher. Even in 1993 the 64-bit block size was a bit on the small
side, but Blowfish was intended as a drop-in replacement for the Data
Encryption Standard (DES) and the International Data Encryption
Algorithm (IDEA), other 64-bit block ciphers.</p>

<p>The main reason I’m calling this program a toy is that, outside of
legacy interfaces, it’s simply <a href="https://sweet32.info/">not appropriate to deploy a 64-bit
block cipher in 2017</a>. Blowpipe shouldn’t be used to encrypt
more than a few tens of GBs of data at a time. Otherwise I’m <em>fairly</em>
confident in both my message construction and my implementation. One
detail is a little uncertain, and I’ll discuss it later when
describing the message format.</p>
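<p>The “few tens of GBs” limit follows from the birthday bound. With an <em>n</em>-bit block, collisions between ciphertext blocks become likely after roughly 2<sup>n/2</sup> blocks. For 64-bit blocks that is 2<sup>32</sup> blocks of 8 bytes each, or 2<sup>35</sup> bytes (32 GiB). A small sketch of that arithmetic, as a rule of thumb rather than a precise security limit (the function name is hypothetical):</p>

```c
#include <stdint.h>

/* Approximate birthday bound, in bytes, for a block cipher with the
 * given block size in bits: 2^(bits/2) blocks of bits/8 bytes each.
 * Only handles whole-byte block sizes up to 64 bits (larger values
 * would overflow); returns 0 otherwise. */
static uint64_t
birthday_bound_bytes(unsigned bits)
{
    if (bits == 0 || bits > 64 || bits % 8 != 0) {
        return 0;
    }
    return (UINT64_C(1) << (bits / 2)) * (bits / 8);
}
```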

<p>A tool that I <em>am</em> confident about is <a href="https://github.com/skeeto/enchive">Enchive</a>, though since
it’s <a href="/blog/2017/03/12/">intended for file encryption</a>, it’s not appropriate for use
as a pipe. It doesn’t authenticate until after it has produced most of
its output. Enchive does try its best to delete files containing
unauthenticated output when authentication fails, but this doesn’t
prevent you from consuming this output before it can be deleted,
particularly if you pipe the output into another program.</p>

<h3 id="usage">Usage</h3>

<p>As you might expect, there are two modes of operation: encryption (<code class="language-plaintext highlighter-rouge">-E</code>)
and decryption (<code class="language-plaintext highlighter-rouge">-D</code>). The simplest usage is encrypting and decrypting a
file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ blowpipe -E &lt; data.gz &gt; data.gz.enc
$ blowpipe -D &lt; data.gz.enc | gunzip &gt; data.txt
</code></pre></div></div>

<p>In both cases you will be prompted for a passphrase which can be up to
72 bytes in length. The only verification for the key is the first
Message Authentication Code (MAC) in the datastream, so Blowpipe
cannot tell the difference between damaged ciphertext and an incorrect
key.</p>

<p>In a script it would be smart to check Blowpipe’s exit code after
decrypting. The output will be truncated should authentication fail
somewhere in the middle. Since Blowpipe isn’t aware of files, it can’t
clean up for you.</p>

<p>Another use case is securely transmitting files over a network with
netcat. In this example I’ll use a pre-shared key file, <code class="language-plaintext highlighter-rouge">keyfile</code>.
Rather than prompt for a key, Blowpipe will use the raw bytes of a given
file. Here’s how I would create a key file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ head -c 32 /dev/urandom &gt; keyfile
</code></pre></div></div>

<p>First the receiver listens on a socket (<code class="language-plaintext highlighter-rouge">bind(2)</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nc -lp 2000 | blowpipe -D -k keyfile &gt; data.zip
</code></pre></div></div>

<p>Then the sender connects (<code class="language-plaintext highlighter-rouge">connect(2)</code>) and pipes Blowpipe through:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ blowpipe -E -k keyfile &lt; data.zip | nc -N hostname 2000
</code></pre></div></div>

<p>If all went well, Blowpipe will exit with 0 on the receiver side.</p>

<p>Blowpipe doesn’t buffer its output (but see <code class="language-plaintext highlighter-rouge">-w</code>). It performs one
<code class="language-plaintext highlighter-rouge">read(2)</code>, encrypts whatever it got, prepends a MAC, and calls
<code class="language-plaintext highlighter-rouge">write(2)</code> on the result. This means it can comfortably transmit live
sensitive data across the network:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nc -lp 2000 | blowpipe -D

# dmesg -w | blowpipe -E | nc -N hostname 2000
</code></pre></div></div>

<p>Kernel messages will appear on the other end as they’re produced by
<code class="language-plaintext highlighter-rouge">dmesg</code>. Though keep in mind that the size of each line will be known to
eavesdroppers. Blowpipe doesn’t pad it with noise or otherwise try to
disguise the length. Those lengths may leak useful information.</p>
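<p>The one-<code class="language-plaintext highlighter-rouge">read(2)</code>, one-<code class="language-plaintext highlighter-rouge">write(2)</code> structure described above can be sketched like this. It is not Blowpipe’s code. <code class="language-plaintext highlighter-rouge">transform()</code> is a hypothetical placeholder for the point where the real tool encrypts the chunk and prepends its MAC. Here it only flips one bit per byte so the effect is visible. Since <code class="language-plaintext highlighter-rouge">write(2)</code> may write less than requested, it is called in a loop.</p>

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical placeholder for "encrypt and authenticate this chunk".
 * It only flips bit 5 of each byte; it is NOT encryption. */
static void
transform(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= 0x20;
    }
}

/* Write all len bytes, retrying on short writes and EINTR. */
static int
write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Relay from in to out without accumulating input: whatever a single
 * read(2) returns is transformed and written out immediately. */
static int
relay(int in, int out)
{
    unsigned char buf[4096];
    for (;;) {
        ssize_t n = read(in, buf, sizeof(buf));
        if (n == 0) {
            return 0;  /* end of input */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        transform(buf, (size_t)n);
        if (write_all(out, buf, (size_t)n) != 0) {
            return -1;
        }
    }
}
```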

<h3 id="blowfish">Blowfish</h3>

<p>This whole project started when I wanted to <a href="/blog/2017/09/21/">play with Blowfish</a>
as a small drop-in library. I wasn’t satisfied with <a href="https://www.schneier.com/academic/blowfish/download.html">the
selection</a>, so I figured it would be a good exercise to write my
own. Besides, the <a href="https://www.schneier.com/academic/archives/1994/09/description_of_a_new.html">specification</a> is both an enjoyable and easy
read (and recommended). It justifies the need for a new cipher and
explains the various design decisions.</p>

<p>I coded from the specification, including writing <a href="https://github.com/skeeto/blowpipe/blob/master/gen-tables.sh">a script</a>
to generate the subkey initialization tables. Subkeys are initialized
to the binary representation of pi (the first ~10,000 decimal digits).
After a couple hours of work I hooked up the official test vectors to
see how I did, and all the tests passed on the first run. This seemed
implausible, so I spent a while longer figuring out how I’d screwed up my
tests. Turns out I absolutely <em>nailed it</em> on my first shot. It’s a
really great sign for Blowfish that it’s so easy to implement
correctly.</p>

<p>Blowfish’s key schedule produces five subkey arrays (the 18-entry
P-array and four 256-entry S-boxes) requiring 4,168 bytes of
storage. The key schedule is unusually complex: Subkeys are repeatedly
encrypted with themselves as they are being computed. This complexity
inspired the <a href="https://www.usenix.org/legacy/events/usenix99/provos/provos_html/node1.html">bcrypt</a> password hashing scheme, which
essentially works by iterating the key schedule many times in a loop,
then encrypting a constant 24-byte string. My bcrypt implementation
wasn’t nearly as successful on my first attempt, and it took hours of
debugging in order to match OpenBSD’s outputs.</p>

<p>The encryption and decryption algorithms are nearly identical, a
characteristic feature of Feistel ciphers. There are no branches
(preventing some side-channel attacks), and apart from S-box table
lookups the only operations are 32-bit XOR and 32-bit addition. This
makes it ideal for implementation on 32-bit computers.</p>
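<p>To make the Feistel structure concrete, here’s a minimal sketch in C of the F function and the 16-round encrypt and decrypt loops. The names (<code class="language-plaintext highlighter-rouge">toy_blowfish</code>, <code class="language-plaintext highlighter-rouge">toy_f()</code>, etc.) are hypothetical, and the tables are filled from a placeholder xorshift generator instead of the pi-derived constants and the real key schedule, so this is <em>not</em> real Blowfish and won’t match any test vectors. It only shows that decryption is the same loop with the P-array walked in reverse.</p>

```c
#include <stdint.h>

/* Placeholder state shaped like Blowfish's: an 18-entry P-array and four
 * 256-entry S-boxes. NOT the real pi-derived tables or key schedule. */
struct toy_blowfish {
    uint32_t p[18];
    uint32_t s[4][256];
};

/* xorshift32 step, used only to fill the placeholder tables. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

static void toy_fill(struct toy_blowfish *bf, uint32_t seed)
{
    uint32_t state = seed ? seed : 1;  /* xorshift state must be nonzero */
    for (int i = 0; i < 18; i++)
        bf->p[i] = xorshift32(&state);
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 256; i++)
            bf->s[j][i] = xorshift32(&state);
}

/* Blowfish's F: split x into four bytes, index one S-box with each, and
 * combine the results with 32-bit addition and XOR. */
static uint32_t toy_f(const struct toy_blowfish *bf, uint32_t x)
{
    return ((bf->s[0][x >> 24] + bf->s[1][x >> 16 & 0xff])
            ^ bf->s[2][x >> 8 & 0xff]) + bf->s[3][x & 0xff];
}

/* 16 Feistel rounds, unrolled two per iteration so no per-round swap is
 * needed; the final output swaps the halves. */
static void toy_encrypt(const struct toy_blowfish *bf, uint32_t *xl, uint32_t *xr)
{
    uint32_t l = *xl, r = *xr;
    for (int i = 0; i < 16; i += 2) {
        l ^= bf->p[i];
        r ^= toy_f(bf, l);
        r ^= bf->p[i + 1];
        l ^= toy_f(bf, r);
    }
    *xl = r ^ bf->p[17];
    *xr = l ^ bf->p[16];
}

/* Decryption: the same loop with the P-array applied in reverse order. */
static void toy_decrypt(const struct toy_blowfish *bf, uint32_t *xl, uint32_t *xr)
{
    uint32_t l = *xl, r = *xr;
    for (int i = 16; i > 0; i -= 2) {
        l ^= bf->p[i + 1];
        r ^= toy_f(bf, l);
        r ^= bf->p[i];
        l ^= toy_f(bf, r);
    }
    *xl = r ^ bf->p[0];
    *xr = l ^ bf->p[1];
}
```

<p>Encrypting a pair and then decrypting it with the same tables returns the original pair, whatever the table contents are.</p>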

<p>One tricky point is that encryption and decryption operate on a pair
of 32-bit integers (another giveaway that it’s a Feistel cipher). To
put the cipher to practical use, these integers have to be <a href="/blog/2016/11/22/">serialized
into a byte stream</a>. The specification doesn’t choose a byte
order, even for mixing the key into the subkeys. The official test
vectors are also 32-bit integers, not byte arrays. An implementer
could choose little endian, big endian, or even something else.</p>

<p>However, there’s one place in which this decision <em>is</em> formally made:
the official test vectors mix the key into the first subkey in big
endian byte order. By luck I happened to choose big endian as well,
which is why my tests passed on the first try. OpenBSD’s version of
bcrypt also uses big endian for all integer encoding steps, further
cementing big endian as the standard way to encode Blowfish integers.</p>
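<p>Having settled on big endian, converting between an 8-byte block and the pair of 32-bit halves might look like the following sketch. The helper names are hypothetical and not necessarily what the Blowpipe repository uses.</p>

```c
#include <stdint.h>

/* Decode 4 bytes as a big endian 32-bit integer. */
static uint32_t load32be(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] <<  8 | (uint32_t)p[3];
}

/* Encode a 32-bit integer as 4 big endian bytes. */
static void store32be(unsigned char *p, uint32_t x)
{
    p[0] = (unsigned char)(x >> 24);
    p[1] = (unsigned char)(x >> 16);
    p[2] = (unsigned char)(x >>  8);
    p[3] = (unsigned char)x;
}

/* Split an 8-byte block into the (left, right) pair Blowfish operates on. */
static void block_to_pair(const unsigned char block[8], uint32_t *xl, uint32_t *xr)
{
    *xl = load32be(block);
    *xr = load32be(block + 4);
}

/* Inverse of block_to_pair(). */
static void pair_to_block(unsigned char block[8], uint32_t xl, uint32_t xr)
{
    store32be(block, xl);
    store32be(block + 4, xr);
}
```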

<h3 id="blowfish-library">Blowfish library</h3>

<p>The Blowpipe repository contains a ready-to-use, public domain Blowfish
library written in strictly conforming C99. The interface is just three
functions:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">blowfish_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">blowfish</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">key</span><span class="p">,</span> <span class="kt">int</span> <span class="n">len</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">blowfish_encrypt</span><span class="p">(</span><span class="k">struct</span> <span class="n">blowfish</span> <span class="o">*</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">blowfish_decrypt</span><span class="p">(</span><span class="k">struct</span> <span class="n">blowfish</span> <span class="o">*</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>Technically the key can be up to 72 bytes long, but the last 16 bytes
have an incomplete effect on the subkeys, so only the first 56 bytes
should matter. Since bcrypt runs the key schedule multiple times, all
72 bytes have full effect.</p>
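<p>The 72-byte figure falls out of the first step of the key schedule: the key, repeated as needed, is XORed into the 18 32-bit entries of the P-array 4 bytes at a time in big endian order, and 18 × 4 = 72. Here’s a minimal sketch of only that step. The function name is hypothetical, and it omits the subsequent rounds that encrypt the subkeys with themselves.</p>

```c
#include <stdint.h>

/* XOR the key, cycled as needed, into the 18-entry P-array, reading
 * 4 key bytes per entry in big endian order. Only 18 * 4 = 72 key bytes
 * are ever read. Assumes len >= 1. */
static void mix_key(uint32_t p[18], const unsigned char *key, int len)
{
    int k = 0;  /* position in the key, wrapping around at len */
    for (int i = 0; i < 18; i++) {
        uint32_t word = 0;
        for (int j = 0; j < 4; j++) {
            word = word << 8 | key[k];
            k = (k + 1) % len;
        }
        p[i] ^= word;
    }
}
```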

<p>The library also includes a bcrypt implementation, though it will only
produce the raw password hash, not the base-64 encoded form. The main
reason for including bcrypt is to support Blowpipe.</p>

<h3 id="message-format">Message format</h3>

<p>The main goal of Blowpipe was to build a robust, authenticated
encryption tool using <em>only</em> Blowfish as a cryptographic primitive.</p>

<ol>
  <li>
    <p>It uses bcrypt with a moderately high cost as a key derivation
function (KDF). That’s not terrible, but bcrypt is not memory-hard, a
property that matters for protecting against brute-force attacks using
cheap hardware.</p>
  </li>
  <li>
    <p>Encryption is Blowfish in “counter” <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29">CTR mode</a>. A 64-bit
counter is incremented and encrypted, producing a keystream. The
plaintext is XORed with this keystream like a stream cipher. This
allows the last block to be truncated when output and eliminates
some padding issues. Since CTR mode is trivially malleable, the MAC
becomes even more important. In CTR mode, <code class="language-plaintext highlighter-rouge">blowfish_decrypt()</code> is
never called. In fact, Blowpipe never uses it.</p>
  </li>
  <li>
    <p>The authentication scheme is Blowfish-CBC-MAC with a unique key and
<a href="https://moxie.org/blog/the-cryptographic-doom-principle/">encrypt-then-authenticate</a> (something I harmlessly got wrong
with Enchive). It essentially encrypts the ciphertext again with a
different key, this time in <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CBC">Cipher Block Chaining mode</a> (CBC), and
keeps only the final block. That final block is prepended to the
ciphertext as the MAC. On decryption the same block is computed
again to ensure that it matches. Only someone who knows the MAC key
can compute it.</p>
  </li>
</ol>
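<p>A rough sketch of CTR mode in C, assuming a cipher with the same pair-of-halves shape as <code class="language-plaintext highlighter-rouge">blowfish_encrypt()</code>. The <code class="language-plaintext highlighter-rouge">toy_cipher()</code> here is an insecure placeholder so the sketch runs on its own, and the counter’s starting value and byte order are assumptions rather than Blowpipe’s actual format. Note that the same function both encrypts and decrypts.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Any cipher that encrypts a pair of 32-bit halves in place. */
typedef void (*block_encrypt_fn)(const void *key, uint32_t *xl, uint32_t *xr);

/* INSECURE placeholder so the sketch runs; stands in for Blowfish. */
static void toy_cipher(const void *key, uint32_t *xl, uint32_t *xr)
{
    uint32_t k = *(const uint32_t *)key;
    for (int i = 0; i < 8; i++) {
        uint32_t t = *xl + ((*xr << 4 ^ *xr >> 5) + k);
        *xl = *xr;
        *xr = t;
    }
}

/* CTR mode: encrypt successive counter values to produce a keystream and
 * XOR it into buf. The same call encrypts and decrypts, and a short final
 * block is simply truncated, so no padding is needed. *counter persists
 * so the keystream can continue across calls. */
static void ctr_xor(block_encrypt_fn enc, const void *key, uint64_t *counter,
                    unsigned char *buf, size_t len)
{
    for (size_t off = 0; off < len; off += 8) {
        uint32_t xl = (uint32_t)(*counter >> 32);
        uint32_t xr = (uint32_t)*counter;
        (*counter)++;
        enc(key, &xl, &xr);  /* only ever encrypt, never decrypt */
        unsigned char ks[8];
        for (int i = 0; i < 4; i++) {
            ks[i]     = (unsigned char)(xl >> (24 - 8 * i));
            ks[i + 4] = (unsigned char)(xr >> (24 - 8 * i));
        }
        size_t n = len - off < 8 ? len - off : 8;
        for (size_t i = 0; i < n; i++)
            buf[off + i] ^= ks[i];
    }
}
```

<p>Running <code class="language-plaintext highlighter-rouge">ctr_xor()</code> twice from the same starting counter restores the original bytes, and a 13-byte message consumes two counter values with no padding added.</p>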

<p>Of the three uses of Blowfish, I’m least confident about authentication.
<a href="https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/">CBC-MAC is tricky to get right</a>, though I am following the
rules: fixed-length messages and a different key than encryption.</p>

<p>Wait a minute. Blowpipe is pipe-oriented and can output data without
buffering the entire pipe. How can there be fixed-length messages?</p>

<p>The pipe datastream is broken into 64kB <em>chunks</em>. Each chunk is
authenticated with its own MAC. Both the MAC and chunk length are
written in the chunk header, and the length is authenticated by the
MAC. Furthermore, just like the keystream, the MAC is continued from the
previous chunk, preventing chunks from being reordered. Blowpipe can
output the content of a chunk and discard it once it’s been
authenticated. If any chunk fails to authenticate, it aborts.</p>
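<p>Here’s a sketch of a CBC-MAC whose state carries over from chunk to chunk. It’s hedged in several ways: <code class="language-plaintext highlighter-rouge">toy_cipher()</code> is an insecure stand-in for Blowfish under the separate MAC key, and the framing (absorbing the chunk length as a 64-bit big endian block, then the zero-padded data) is a hypothetical choice, not Blowpipe’s documented format. Absorbing the length first is one way to make the length part of what the tag covers.</p>

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*block_encrypt_fn)(const void *key, uint32_t *xl, uint32_t *xr);

/* INSECURE placeholder for Blowfish keyed with the separate MAC key. */
static void toy_cipher(const void *key, uint32_t *xl, uint32_t *xr)
{
    uint32_t k = *(const uint32_t *)key;
    for (int i = 0; i < 8; i++) {
        uint32_t t = *xl + ((*xr << 4 ^ *xr >> 5) + k);
        *xl = *xr;
        *xr = t;
    }
}

/* Running CBC-MAC chaining value; its current value is the tag. */
struct cbcmac {
    uint32_t xl, xr;
};

/* Absorb one 8-byte block: XOR it into the chaining value, then encrypt. */
static void cbcmac_block(struct cbcmac *m, block_encrypt_fn enc,
                         const void *key, const unsigned char b[8])
{
    for (int i = 0; i < 4; i++) {
        m->xl ^= (uint32_t)b[i] << (24 - 8 * i);
        m->xr ^= (uint32_t)b[i + 4] << (24 - 8 * i);
    }
    enc(key, &m->xl, &m->xr);
}

/* Absorb one chunk, continuing from the state earlier chunks left behind.
 * Hypothetical framing: length first (64-bit big endian), then the data
 * zero-padded to a multiple of 8 bytes. A zero-length chunk absorbs only
 * its length block. */
static void cbcmac_chunk(struct cbcmac *m, block_encrypt_fn enc,
                         const void *key, const unsigned char *data, size_t len)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)((uint64_t)len >> (56 - 8 * i));
    cbcmac_block(m, enc, key, b);
    for (size_t off = 0; off < len; off += 8) {
        for (size_t i = 0; i < 8; i++)
            b[i] = off + i < len ? data[off + i] : 0;
        cbcmac_block(m, enc, key, b);
    }
}
```

<p>Because the state is never reset between chunks, feeding the same chunks in a different order should yield a different running tag.</p>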

<p><img src="/img/diagram/blowpipe.svg" alt="" /></p>

<p>This also leads to another useful trick: The pipe is terminated with a
zero-length chunk, preventing an attacker from appending to the
datastream. Everything after the zero-length chunk is discarded. Since
chunk lengths are authenticated by the MAC, the attacker also cannot
truncate the pipe: doing so would require forging a valid zero-length
chunk, which requires knowledge of the MAC key.</p>

<p>The pipe itself has a 17-byte header: a 16-byte random bcrypt salt and 1
byte for the bcrypt cost. The salt is like an initialization vector (IV)
that allows keys to be safely reused in different Blowpipe instances.
The cost byte is the only distinguishing byte in the stream. Since even
the chunk lengths are encrypted, everything else in the datastream
should be indistinguishable from random data.</p>

<h3 id="portability">Portability</h3>

<p>Blowpipe runs on POSIX systems and Windows (Mingw-w64 and MSVC). I
initially wrote it for POSIX (on Linux) of course, but I took an unusual
approach when it came time to port it to Windows. Normally I’d <a href="/blog/2017/03/01/">invent a
generic OS interface</a> that makes the appropriate host system
calls. This time I kept the POSIX interface (<code class="language-plaintext highlighter-rouge">read(2)</code>, <code class="language-plaintext highlighter-rouge">write(2)</code>,
<code class="language-plaintext highlighter-rouge">open(2)</code>, etc.) and implemented the tiny subset of POSIX that I needed
in terms of Win32. That implementation can be found under <code class="language-plaintext highlighter-rouge">w32-compat/</code>.
I even dropped in a copy of <a href="https://github.com/skeeto/getopt">my own <code class="language-plaintext highlighter-rouge">getopt()</code></a>.</p>

<p>One really cool feature of this technique is that, on Windows, Blowpipe
will still “open” <code class="language-plaintext highlighter-rouge">/dev/urandom</code>. It’s intercepted by my own <code class="language-plaintext highlighter-rouge">open(2)</code>,
which in response to that filename actually calls
<a href="https://msdn.microsoft.com/en-us/library/windows/desktop/aa379886(v=vs.85).aspx"><code class="language-plaintext highlighter-rouge">CryptAcquireContext()</code></a> and pretends it’s a file. It’s all
hidden behind the file descriptor. That’s the unix way.</p>

<p>I’m considering giving Enchive the same treatment since it would simplify
and reduce much of the interface code. In fact, this project has taught
me a number of ways that Enchive could be improved. That’s the value of
writing “toys” such as Blowpipe.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Introducing the Pokerware Secure Passphrase Generator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/07/27/"/>
    <id>urn:uuid:c2d33d1a-d2a2-3863-04ae-68d2b48eecd5</id>
    <updated>2017-07-27T17:49:10Z</updated>
    <category term="crypto"/><category term="meatspace"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p>I recently developed <a href="https://github.com/skeeto/pokerware"><strong>Pokerware</strong></a>, an offline passphrase
generator that operates in the same spirit as <a href="http://world.std.com/~reinhold/diceware.html">Diceware</a>.
The primary difference is that it uses a shuffled deck of playing
cards as its entropy source rather than dice. Draw some cards and use
them to select a uniformly random word from a list. Unless you’re some
sort of <a href="/blog/2011/01/10/">tabletop gaming nerd</a>, a deck of cards is more readily
available than five 6-sided dice, which would typically need to be
borrowed from the Monopoly board collecting dust on the shelf, then
rolled two at a time.</p>

<p>There are various flavors of two different word lists here:</p>

<ul>
  <li><a href="https://github.com/skeeto/pokerware/releases/tag/1.0">https://github.com/skeeto/pokerware/releases/tag/1.0</a></li>
</ul>

<p>Hardware random number generators are <a href="https://lwn.net/Articles/629714/">difficult to verify</a>
and may not actually be as random as they promise, either
intentionally or unintentionally. For the particularly paranoid,
Diceware and Pokerware are an easily verifiable alternative for
generating secure passphrases for <a href="/blog/2017/03/12/">cryptographic purposes</a>.
At any time, a deck of 52 playing cards is in one of 52! possible
arrangements. That’s more than 225 bits of entropy. If you give your
deck <a href="https://possiblywrong.wordpress.com/2011/03/27/card-shuffling-youre-not-done-yet/">a thorough shuffle</a>, it will be in an arrangement that
has never been seen before and will never be seen again. Pokerware
draws on some of these bits to generate passphrases.</p>

<p>The Pokerware list has 5,304 words (12.4 bits per word), compared to
Diceware’s 7,776 words (12.9 bits per word). My goal was to invent a
card-drawing scheme that would uniformly select from a list in the same
sized ballpark as Diceware. Much smaller and you’d have to memorize more
words for the same passphrase strength. Much larger and the words on the
list would be more difficult to memorize, since the list would contain
longer and less frequently used words. Diceware strikes a nice balance
at five dice.</p>
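<p>For reference, here is the arithmetic behind the entropy figures quoted above (values rounded):</p>

```latex
\log_2(52!) \approx 225.58 \text{ bits per shuffled deck} \\
\log_2 5304 \approx 12.37 \text{ bits per Pokerware word} \\
\log_2 7776 = 5 \log_2 6 \approx 12.92 \text{ bits per Diceware word}
```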

<!-- Photo credit: Kelsey Wellons -->
<p><img src="/img/pokerware/deck.jpg" alt="" /></p>

<p>One important difference for me is that <em>I like my Pokerware word
lists a lot more</em> than the two official Diceware lists. My lists only
have simple, easy-to-remember words (for American English speakers, at
least), without any numbers or other short non-words. Pokerware has
two official lists, “formal” and “slang,” since my early testers
couldn’t agree on which was better. Rather than make a difficult
decision, I took the usual route of making no decision at all.</p>

<p>The “formal” list is derived in part from <a href="https://books.google.com/ngrams">Google’s Ngram
Viewer</a>, with my own additional filters and tweaking. It’s called
“formal” because the ngrams come from formal publications and represent
more formal kinds of speech.</p>

<p>The “slang” list is derived from <a href="http://files.pushshift.io/reddit/"><em>every</em> reddit comment</a> between
December 2005 and May 2017, tamed by the same additional filters. I
<a href="/blog/2016/12/01/">have this data on hand</a>, so I may as well put it to use. I
figured more casually-used words would be easier to remember. Due to
my extra filtering, there’s actually a lot of overlap between these
lists, so the differences aren’t too significant.</p>

<p>If you have your own word list, perhaps in a different language, you
can use the Makefile in the repository to build your own Pokerware
lookup table, both plain text and PDF. The PDF is generated using
Groff macros.</p>

<h3 id="passphrase-generation-instructions">Passphrase generation instructions</h3>

<ol>
  <li>
    <p>Thoroughly shuffle the deck.</p>
  </li>
  <li>
    <p>Draw two cards. Sort them by value, then suit. Suits are in
alphabetical order: Clubs, Diamonds, Hearts, Spades.</p>
  </li>
  <li>
    <p>Draw additional cards until you get a card that doesn’t match the
face value of either of your initial two cards. Observe its suit.</p>
  </li>
  <li>
    <p>Using your two cards and observed suit, look up a word in the table.</p>
  </li>
  <li>
    <p>Place all cards back in the deck, shuffle, and repeat from step 2
until you have the desired number of words. Each word is worth 12.4
bits of entropy.</p>
  </li>
</ol>
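<p>The list size follows from this procedure: there are 52 × 51 / 2 = 1,326 unordered pairs of cards and four possible suits, and 1,326 × 4 = 5,304. The suit from step 3 is also uniform, because after discarding both drawn ranks every suit has the same number of eligible cards left (11 each, or 12 each when both cards share a rank). The C sketch below maps a draw to an index and checks the suit balance. The card encoding and index ordering are hypothetical; the actual Pokerware table may lay out its entries differently.</p>

```c
/* Hypothetical card encoding: id = rank * 4 + suit, with rank 0..12 and
 * suits alphabetical as in step 2 (0 = Clubs, 1 = Diamonds, 2 = Hearts,
 * 3 = Spades). Sorting by value, then suit, is then sorting by id. */
static int card_id(int rank, int suit)
{
    return rank * 4 + suit;
}

/* Map two distinct cards plus the observed suit to 0..5303. The sorted
 * pair becomes 0..1325 via the combinatorial number system, and the suit
 * selects one of four entries for that pair. */
static int word_index(int card_a, int card_b, int suit)
{
    int lo = card_a < card_b ? card_a : card_b;
    int hi = card_a < card_b ? card_b : card_a;
    int pair = hi * (hi - 1) / 2 + lo;
    return pair * 4 + suit;
}

/* Step 3 skips any card sharing a rank with either drawn card. Return 1
 * if the remaining eligible cards are spread evenly over the four suits,
 * meaning the observed suit is uniformly random. */
static int suits_balanced(int card_a, int card_b)
{
    int count[4] = {0, 0, 0, 0};
    for (int c = 0; c < 52; c++)
        if (c / 4 != card_a / 4 && c / 4 != card_b / 4)
            count[c % 4]++;
    return count[0] == count[1] && count[1] == count[2] && count[2] == count[3];
}
```

<p>Looping over every pair of cards and every suit hits each index from 0 to 5,303 exactly once, and draw order doesn’t matter since the pair is sorted first.</p>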

<p>A word of warning about step 4: If you use software to do the word list
lookup, beware that it might save your search/command history — and
therefore your passphrase — to a file. For example, the <code class="language-plaintext highlighter-rouge">less</code> pager
will store search history in <code class="language-plaintext highlighter-rouge">~/.lesshst</code>. It’s easy to prevent that
one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LESSHISTFILE=- less pokerware-slang.txt
</code></pre></div></div>

<h4 id="example-word-generation">Example word generation</h4>

<p>Suppose in step 2 you draw King of Hearts (KH/K♥) and Queen of Clubs
(QC/Q♣).</p>

<p class="grid"><img src="/img/pokerware/kh.png" alt="" class="card" />
<img src="/img/pokerware/qc.png" alt="" class="card" /></p>

<p>In step 3 you first draw King of Diamonds (KD/K♦), discarding it because
it matches the face value of one of your cards from step 2.</p>

<p class="grid"><img src="/img/pokerware/kd.png" alt="" class="card" /></p>

<p>Next you draw Four of Spades (4S/4♠), taking spades as your extra suit.</p>

<p class="grid"><img src="/img/pokerware/4s.png" alt="" class="card" /></p>

<p>In order, this gives you Queen of Clubs, King of Hearts, and Spades:
QCKHS or Q♣K♥♠. This corresponds to “wizard” in the formal word list and
would be the first word in your passphrase.</p>

<h4 id="a-deck-of-cards-as-an-office-tool">A deck of cards as an office tool</h4>

<p>I now have an excuse to keep a deck of cards out on my desk at work.
I’ve been using Diceware (or something approximating it, since I’m not
so paranoid about hardware RNGs). From now on I’ll deal new passwords from an
in-reach deck of cards. Typically, though, I need to tweak the results to
meet <a href="https://www.troyhunt.com/passwords-evolved-authentication-guidance-for-the-modern-era/">outdated character-composition requirements</a>.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Why I've Retired My PGP Keys and What's Replaced It</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/03/12/"/>
    <id>urn:uuid:d8fc83d0-4f6e-31dc-95b9-3ec4427725e1</id>
    <updated>2017-03-12T21:54:38Z</updated>
    <category term="crypto"/><category term="openpgp"/>
    <content type="html">
      <![CDATA[<p><em>Update August 2019: I’ve got a PGP key again but only for signing. <a href="/blog/2019/07/10/">I
use another of my own tools, <strong>passphrase2pgp</strong></a>, to manage it.</em></p>

<p><strong>tl;dr</strong>: <a href="https://github.com/skeeto/enchive">Enchive</a> (rhymes with “archive”) has replaced my
use of GnuPG.</p>

<p>Two weeks ago I tried to encrypt a tax document for archival and
noticed my PGP keys had just expired. GnuPG had (correctly) forbidden
the action, requiring that I first edit the key and extend the
expiration date. Rather than do so, I decided to take this opportunity
to retire my PGP keys for good. Over time I’ve come to view PGP as
largely a failure — it <a href="https://blog.filippo.io/giving-up-on-long-term-pgp/">never reached the critical mass</a>, the
tooling has always <a href="https://blog.cryptographyengineering.com/2014/08/13/whats-matter-with-pgp/">been problematic</a>, and it’s now <a href="https://moxie.org/blog/gpg-and-me/">a dead
end</a>. The only thing it’s been successful at is signing Linux
packages, and even there it could be replaced with something simpler
and better.</p>

<p>I still have a use for PGP: encrypting sensitive files to myself for
long-term storage. I’ve also been using it to consistently sign Git
tags for software releases. However, very recently <a href="https://shattered.io/">this lost its
value</a>, though I doubt anyone was verifying these signatures
anyway. It’s never been useful for secure email, especially when <a href="https://josefsson.org/inline-openpgp-considered-harmful.html">most
people use it incorrectly</a>. I only need to find a replacement
for archival encryption.</p>

<p>I could use an encrypted filesystem, but which do I use? I use LUKS to
protect my laptop’s entire hard drive in the event of a theft, but for
archival I want something a little more universal. Basically I want the
following properties:</p>

<ul>
  <li>
    <p>Sensitive content must not normally be in a decrypted state. PGP
solves this by encrypting files individually. The archive filesystem
can always be mounted. An encrypted volume would need to be mounted
just prior to accessing it, during which everything would be
exposed.</p>
  </li>
  <li>
    <p>I should be able to encrypt files from any machine, even
less-trusted ones. With PGP I can load my public key on any machine
and encrypt files to myself. It’s like a good kind of ransomware.</p>
  </li>
  <li>
    <p>It should be easy to back these files up elsewhere, even on
less-trusted machines/systems. This isn’t reasonably possible with an
encrypted filesystem which would need to be backed up as a huge
monolithic block of data. With PGP I can toss encrypted files
anywhere.</p>
  </li>
  <li>
    <p>I don’t want to worry about per-file passphrases. Everything should
be encrypted with/to the same key. PGP solves this by encrypting
files to a recipient. This requirement prevents most stand-alone
crypto tools from qualifying.</p>
  </li>
</ul>

<p>I couldn’t find anything that fit the bill, so I did <strong>exactly what
you’re not supposed to do and rolled my own: <a href="https://github.com/skeeto/enchive">Enchive</a></strong>. It
was loosely inspired by <a href="http://www.tedunangst.com/flak/post/signify">OpenBSD’s signify</a>. It has the tiny
subset of PGP features that I need — using modern algorithms — plus
one more feature I’ve always wanted: the ability to <strong>generate a
keypair from a passphrase</strong>. This means I can reliably access my
archive keypair anywhere.</p>

<h3 id="on-enchive">On Enchive</h3>

<p>Here’s where I’d put the usual disclaimer about not using it for
anything serious, blah blah blah. But really, I don’t care if anyone
else uses Enchive. It exists just to scratch my own personal itch. If
you have any doubts, don’t use it. I’m putting it out there in case
anyone else is in the same boat. It would also be nice if any glaring
flaws I may have missed were pointed out.</p>

<p>Not expecting it to be available as a nice package, I wanted to make it
trivial to build Enchive anywhere I’d need it. Except for including
stdint.h in exactly one place to get the correct integers for crypto,
it’s written in straight C89. All the crypto libraries are embedded, and
there are no external dependencies. There’s even an “amalgamation” build,
so <code class="language-plaintext highlighter-rouge">make</code> isn’t required: just point your system’s <code class="language-plaintext highlighter-rouge">cc</code> at it and you’re
done.</p>

<h4 id="algorithms">Algorithms</h4>

<p>For encryption, Enchive uses <a href="https://cr.yp.to/ecdh.html">Curve25519</a>, <a href="https://cr.yp.to/chacha.html">ChaCha20</a>,
and <a href="https://tools.ietf.org/html/rfc2104">HMAC-SHA256</a>.</p>

<p>Rather than the prime-number-oriented RSA as used in classical PGP
(yes, GPG 2 <em>can</em> do better), Curve25519 is used for the asymmetric
cryptography role, using the relatively new elliptic curve
cryptography. It’s stronger cryptography and the keys are <em>much</em>
smaller. It’s a Diffie-Hellman function — an algorithm used to
exchange cryptographic keys over a public channel — so files are
encrypted by generating an ephemeral keypair and using this ephemeral
keypair to perform a key exchange with the master keys. The ephemeral
public key is included with the encrypted file and the ephemeral
private key is discarded.</p>
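<p>The ephemeral key flow is easier to see in code. The sketch below is heavily hedged: it swaps Curve25519 for textbook Diffie-Hellman modulo the prime 2<sup>31</sup> - 1 with base 7, which is far too small to be secure, and all the names are hypothetical. It only illustrates that the sender (ephemeral secret, master public key) and the recipient (master secret, ephemeral public key) arrive at the same shared key.</p>

```c
#include <stdint.h>

/* Toy stand-in for Curve25519: classic Diffie-Hellman in the
 * multiplicative group mod the prime 2^31 - 1. NOT secure. */
#define TOY_P 2147483647u
#define TOY_G 7u

/* Square-and-multiply modular exponentiation. Operands stay below 2^31,
 * so every product fits in 64 bits. */
static uint32_t modpow(uint32_t base, uint32_t exp)
{
    uint64_t result = 1, b = base % TOY_P;
    while (exp) {
        if (exp & 1)
            result = result * b % TOY_P;
        b = b * b % TOY_P;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Public key from a secret, analogous to multiplying the base point. */
static uint32_t toy_public(uint32_t secret)
{
    return modpow(TOY_G, secret);
}

/* Shared key from our secret and the other party's public key. */
static uint32_t toy_shared(uint32_t secret, uint32_t their_public)
{
    return modpow(their_public, secret);
}
```

<p>In Enchive’s terms, the ephemeral public key travels with the file, the ephemeral secret is thrown away, and the shared key is what keys ChaCha20.</p>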

<p>I used the <a href="https://github.com/agl/curve25519-donna">“donna” implementation</a> in Enchive. Despite being
the hardest to understand (mathematically), this is the easiest to
use. It’s literally just one function of two arguments to do
everything.</p>

<p>Curve25519 only establishes the shared key, so next is the stream
cipher ChaCha20. It’s keyed by the shared key to actually encrypt the
data. This algorithm has the same author as Curve25519 (<a href="https://cr.yp.to/djb.html">djb</a>),
so it’s natural to use these together. It’s really straightforward, so
there’s not much to say about it.</p>

<p>For the Message Authentication Code (MAC), I chose HMAC-SHA256. It
prevents anyone from modifying the message. Note: This doesn’t prevent
anyone who knows the master public key from replacing the file
wholesale. That would be solved with a digital signature, but this
conflicts with my goal of encrypting files without needing my secret
key. The MAC goes at the end of the file, allowing arbitrarily large
files to be encrypted single-pass as a stream.</p>

<p>There’s a little more to it (IV, etc.), which is described in detail in the
README.</p>

<h4 id="usage">Usage</h4>

<p>The first thing you’d do is generate a keypair. By default this is done
from <code class="language-plaintext highlighter-rouge">/dev/urandom</code>, in which case you should immediately back them up.
But if you’re like me, you’ll be using Enchive’s <code class="language-plaintext highlighter-rouge">--derive</code> (<code class="language-plaintext highlighter-rouge">-d</code>)
feature to create it from a passphrase. In that case, the keys are
backed up in your brain!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ enchive keygen --derive
secret key passphrase:
secret key passphrase (repeat):
passphrase (empty for none):
passphrase (repeat):
</code></pre></div></div>

<p>The first prompt is for the secret key passphrase. This is converted
into a Curve25519 keypair using an scrypt-like key derivation algorithm.
The process requires 512MB of memory (to foil hardware-based attacks)
and takes around 20 seconds.</p>

<p>The second passphrase (or the only one when <code class="language-plaintext highlighter-rouge">--derive</code> isn’t used) is
the <em>protection key</em> passphrase. The secret key is encrypted with this
passphrase to protect it at rest. You’ll need to enter it any time you
decrypt a file. The key derivation step is less aggressive for this key,
but you could also crank it up if you like.</p>

<p>At the end of this process you’ll have two new files under
<code class="language-plaintext highlighter-rouge">$XDG_CONFIG_DIR/enchive</code>: <code class="language-plaintext highlighter-rouge">enchive.pub</code> (32 bytes) and <code class="language-plaintext highlighter-rouge">enchive.sec</code>
(64 bytes). The first you can distribute anywhere you’d like to encrypt
files; it’s not particularly sensitive. The second is needed to decrypt
files.</p>

<p>To encrypt a file for archival:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ enchive archive sensitive.zip
</code></pre></div></div>

<p>No prompt for passphrase. This will create <code class="language-plaintext highlighter-rouge">sensitive.zip.enchive</code>.</p>

<p>To decrypt later:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ enchive extract sensitive.zip.enchive
passphrase:
</code></pre></div></div>

<p>If you’ve got many files to decrypt, entering your passphrase over and
over would get tiresome, so Enchive includes a key agent that keeps
the protection key in memory for a period of time (15 minutes by
default). Enable it with the <code class="language-plaintext highlighter-rouge">--agent</code> flag (it may be enabled by
default someday).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ enchive --agent extract sensitive.zip.enchive
</code></pre></div></div>

<p>Unlike ssh-agent and gpg-agent, there’s no need to start the agent
ahead of time. It’s started on demand as needed and terminates after
the timeout. It’s completely painless.</p>

<p>Both <code class="language-plaintext highlighter-rouge">archive</code> and <code class="language-plaintext highlighter-rouge">extract</code> read from stdin and write to stdout when no file is
given.</p>

<h3 id="feature-complete">Feature complete</h3>

<p>As far as I’m concerned, Enchive is feature complete. It does
everything I need, I don’t want it to do anything more, and at least
two of us have already started putting it to use. The interface and
file formats won’t change unless someone finds a rather significant
flaw. There <em>is</em> some wiggle room to replace the algorithms in the
future should Enchive have that sort of longevity.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>The Physical Analog for Encryption is the Hyperdrive</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/06/"/>
    <id>urn:uuid:89cd4041-90bf-3536-3bda-c1fe56e26383</id>
    <updated>2012-08-06T00:00:00Z</updated>
    <category term="crypto"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>I was recently
<a href="http://www.youtube.com/playlist?list=PL80F8C1F2AE9B29DD">watching GetDaved play</a>
through
<a href="http://en.wikipedia.org/wiki/Star_Wars:_X-Wing_Alliance">X-Wing Alliance</a>,
a game I myself played in college. I have a lot of nostalgia for it,
especially because
<a href="http://en.wikipedia.org/wiki/Star_Wars:_TIE_Fighter">TIE Fighter</a> was
the first game I ever invested a lot of time into playing. Just
hearing the sounds and music brings back relaxing memories.</p>

<p>In one of the early missions the player travels through hyperspace
(which ain’t like dusting crops)
<a href="http://youtu.be/SeB1sn_6Zhk">to a storage area</a> located in deep
space. It’s a family business and the player is out there to take
inventory of storage containers. Like when I
<a href="/blog/2008/12/16/">saw the wormhole minefield in Deep Space 9</a>, it
got me thinking, “<em>Why?</em>” Why keep all these storage containers in
deep space? There’s no defense or security out there to stop someone
from stealing containers. It seems like it would be better to store
those at the home base where they can be protected.</p>

<p><img src="/img/misc/deep-space.jpg" alt="" /></p>

<p>Storing items at random locations in deep space is actually <em>very</em>
secure — more so than any lock! Space is <em>huge</em>. Even with
faster-than-light travel searching a galaxy for a storage location
would be impractical. It would be as impractical as using brute-force
to find an encryption key — another huge search space. Also, if the
storage location has been in use for <em>X</em> years, you’d need to come
within <em>X</em> light-years of it, at least, in order to find it, since
even gravity itself is limited by the speed of light.</p>

<p>Physical locks are usually described as the physical analogy of
cryptography. Honestly, it’s not a very good analogy. The brute-force
method for bypassing a lock isn’t to keep trying different keys or
combinations until it works. No, it’s to just smash something (a
window, the lock) or pick the lock. When translated back into the
crypto world that’s like breaking a cipher, which isn’t a practical
attack in modern cryptography.</p>

<p>No, the physical analogy for cryptography is deep space storage. The
only practical way to access deep space items is to learn the
coordinates of the storage location, which is the equivalent of the
encryption key. If the coordinates are lost or forgotten, the items
are as good as destroyed, just like data.</p>

<p>There are actually some advantages to physical “encryption.”
Ciphertext can be decrypted offline without being detected. It’s not
possible to visit deep space storage without having a physical
presence, which is certainly more detectable than offline
decryption. There’s also the advantage that it’s somewhat easier to
tell when the key (location) generation algorithm is busted or you’re
just bad at picking passphrases: someone else’s stuff will already be
there. A <em>literal</em> collision.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Publishing My Private Keys</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/24/"/>
    <id>urn:uuid:cb40de11-5f3c-306f-b792-6214d65605a1</id>
    <updated>2012-06-24T00:00:00Z</updated>
    <category term="crypto"/><category term="openpgp"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p><em>Update March 2017: I <a href="/blog/2017/03/12/">no longer use PGP</a>. Also, there’s a
bug in GnuPG <a href="https://dev.gnupg.org/T1800">that silently discards these security settings</a>,
and it’s unlikely to ever get fixed. You’ll need to find/build an old
version of GnuPG if you want to properly protect your secret keys.</em></p>

<p><em>Update August 2019: I’ve got a PGP key again, but <a href="/blog/2019/07/10/">I’m using my own
tool, <strong>passphrase2pgp</strong></a>, to manage it. This tool allows for a
particular workflow that GnuPG has never and will never provide. It
doesn’t rely on S2K as described below.</em></p>

<p>One of the items <a href="/blog/2012/06/23/">in my dotfiles repository</a> is my
PGP keys, both private and public. I believe this is a unique approach
that hasn’t been done before — a public experiment. It may <em>seem</em>
dangerous, but I’ve given it careful thought and I’m only using the
tools already available from GnuPG. It ensures my keys are well
backed up (via the
<a href="http://markmail.org/message/bupvay4lmlxkbphr">Torvalds method</a>) and
available wherever I should need them.</p>

<p>In your GnuPG directory there are two core files: <code class="language-plaintext highlighter-rouge">secring.gpg</code> and
<code class="language-plaintext highlighter-rouge">pubring.gpg</code>. The first contains your secret keys and the second
contains public keys. <code class="language-plaintext highlighter-rouge">secring.gpg</code> is not itself encrypted as a
whole, since you can (and should) have a different passphrase for each
key. These files (or any PGP file) can be inspected with
<code class="language-plaintext highlighter-rouge">--list-packets</code>. Notice that it won’t prompt for a passphrase
in order to get this data,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --list-packets ~/.gnupg/secring.gpg
:secret key packet:
    version 4, algo 1, created 1298734547, expires 0
    skey[0]: [2048 bits]
    skey[1]: [17 bits]
    iter+salt S2K, algo: 9, SHA1 protection, hash: 10, salt: ...
    protect count: 10485760 (212)
    protect IV:  a6 61 4a 95 44 1e 7e 90 88 c3 01 70 8d 56 2e 11
    encrypted stuff follows
:user ID packet: "Christopher Wellons &lt;...&gt;"
:signature packet: algo 1, keyid 613382C548B2B841
... and so on ...
</code></pre></div></div>

<p>Each key is encrypted <em>individually</em> within this file with a
passphrase. If you try to use the key, GPG will attempt to decrypt it
by asking for the passphrase. If someone were to somehow gain access
to your <code class="language-plaintext highlighter-rouge">secring.gpg</code>, they’d still need to get your passphrase, so
pick a strong one. The official documentation
advises you to keep your <code class="language-plaintext highlighter-rouge">secring.gpg</code> well-guarded and only rely on
the passphrase as a cautionary measure. I’m ignoring that part.</p>

<p>If you’re using GPG’s defaults, your secret key is encrypted with
CAST5, a symmetric block cipher. The encryption key is derived from your
passphrase salted (mixed with a non-secret random number) and fed
repeatedly through SHA-1 until 65,536 bytes have been hashed. Running
the hash over and over like this is called
<a href="http://en.wikipedia.org/wiki/Key_stretching">key stretching</a>. It
greatly increases the amount of work required for a brute-force
attack, making your passphrase more effective. All of these settings
can be adjusted to better protect the secret key at the cost of less
portability. Since I’ve chosen to publish my <code class="language-plaintext highlighter-rouge">secring.gpg</code> in my
dotfiles repository I cranked up the settings as far as I can.</p>
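<p>The “protect count” works differently from what you might expect. In OpenPGP’s iterated and salted S2K (RFC 4880, section 3.7.1.3), the count is the number of bytes fed to the hash, not the number of hash calls. The salt and passphrase are repeated end to end, cut off at exactly that many bytes, and run through a single hash. The following is a minimal Bash sketch of that derivation, assuming GNU coreutils (<code class="language-plaintext highlighter-rouge">sha512sum</code>, <code class="language-plaintext highlighter-rouge">head -c</code>). The function name <code class="language-plaintext highlighter-rouge">s2k_sha512</code> is hypothetical, and this is not how GnuPG implements it. The sketch only covers SHA-512 producing an AES-256 key, where one digest is long enough. With SHA-1 and AES-256, the spec adds a second hash context preloaded with a zero byte, which this sketch leaves out.</p>

```shell
# Hypothetical sketch of OpenPGP iterated+salted S2K (RFC 4880 3.7.1.3)
# using SHA-512 to derive a 32-byte AES-256 key. Assumes GNU coreutils.
# Usage: s2k_sha512 SALT_HEX PASSPHRASE COUNT
s2k_sha512() {
    local salt_hex=$1 pass=$2 count=$3 tmp len
    tmp=$(mktemp) || return 1
    # One period of input: the 8 salt bytes, then the passphrase bytes.
    printf "$(printf '%s' "$salt_hex" | sed 's/../\\x&/g')" > "$tmp"
    printf '%s' "$pass" >> "$tmp"
    len=$(wc -c < "$tmp")
    # The whole period is hashed at least once, even if count is smaller.
    (( count < len )) && count=$len
    # Grow the block by doubling (so it stays whole periods) to 64 KiB or
    # more, so the loop below starts cat about count/65536 times.
    while (( $(wc -c < "$tmp") < 65536 )); do
        cat "$tmp" "$tmp" > "$tmp.2" && mv "$tmp.2" "$tmp"
    done
    # Feed exactly count bytes of the repeating stream into one SHA-512.
    # cat fails once head has exited, which ends the loop.
    while :; do cat "$tmp" || break; done |
        head -c "$count" | sha512sum | cut -c1-64
    rm -f "$tmp"
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">s2k_sha512 0102030405060708 secret 65011712</code> hashes the maximum count of bytes. The made-up salt is only for illustration. Each guess costs an attacker the same amount of hashing.</p>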

<p>I changed the cipher to AES256, which is more modern, more trusted,
and more widely used than CAST5. For the passphrase digest, I selected
SHA-512. There are better passphrase digest algorithms out there but
this is the longest, slowest one that GPG offers. The PGP spec
supports between 1024 and 65,011,712 digest iterations, so I picked
one of the largest. 65 million iterations takes my laptop over a
second to process — absolutely brutal for someone attempting a
brute-force attack. Here’s the command to change to this configuration
on an existing key,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gpg --s2k-cipher-algo AES256 --s2k-digest-algo SHA512 --s2k-mode 3 \
    --s2k-count 65000000 --edit-key &lt;key id&gt;
</code></pre></div></div>

<p>When the edit key prompt comes up, enter <code class="language-plaintext highlighter-rouge">passwd</code> to change your
passphrase. You can enter the same passphrase again and it will re-use
it with the new configuration.</p>
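<p>The <code class="language-plaintext highlighter-rouge">protect count: 10485760 (212)</code> line in the <code class="language-plaintext highlighter-rouge">--list-packets</code> output shows both forms of the count. The format stores the count in a single byte, so only some counts can be represented. The Bash sketch below decodes that byte with the formula from RFC 4880, section 3.7.1.3. It also finds the smallest coded value that covers a requested count. Both function names are hypothetical. Rounding up like this matches what GnuPG’s manual describes for an unrepresentable <code class="language-plaintext highlighter-rouge">--s2k-count</code>, which is why asking for 65,000,000 ends up at the maximum of 65,011,712.</p>

```shell
# Decode the one-octet S2K count into the number of bytes hashed:
# (16 + low nibble) shifted left by (high nibble + 6).
s2k_count_decode() {
    local c=$1
    echo $(( (16 + (c & 15)) << ((c >> 4) + 6) ))
}

# Smallest coded value whose byte count is at least n (round up).
# This sketch clamps requests above the maximum to 255 (65,011,712).
s2k_count_encode() {
    local n=$1 c
    for (( c = 0; c < 256; c++ )); do
        if (( $(s2k_count_decode "$c") >= n )); then
            echo "$c"
            return 0
        fi
    done
    echo 255
}
```

<p><code class="language-plaintext highlighter-rouge">s2k_count_decode 212</code> prints 10485760, matching the packet listing, and <code class="language-plaintext highlighter-rouge">s2k_count_encode 65000000</code> prints 255.</p>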

<p>I’m feeling quite secure with my secret key, despite publishing my
<code class="language-plaintext highlighter-rouge">secring.gpg</code>. Before now, I was much more at risk of losing it to
disk failure than having it exposed. I challenge anyone who doubts my
security to crack my secret key. I’d rather learn that I’m wrong
sooner than later!</p>

<p>With this established in my dotfiles repository, I can more easily
include private dotfiles. Rather than use a symmetric cipher with an
individual passphrase on each file, I encrypt the private dotfiles
<em>to</em> myself. All my private dotfiles are managed with one key: my PGP
key. This also plays better with Emacs. While it supports transparent
encryption, it doesn’t even attempt to manage your passphrase (with
good reason). If the file is encrypted with a symmetric cipher, Emacs
will prompt for a passphrase on each save. If I encrypt them with my
public key, I only need the passphrase when I first open the file.</p>

<p>Right now, any dotfile that ends with <code class="language-plaintext highlighter-rouge">.priv.pgp</code> is
decrypted into place — not symlinked, unfortunately, since the repository
holds only the encrypted version and there’s no plaintext file to link
to. The install script has a <code class="language-plaintext highlighter-rouge">-p</code> switch to disable private
dotfiles, such as when I’m using an untrusted computer. <code class="language-plaintext highlighter-rouge">gpg-agent</code>
ensures that I only need to enter my passphrase once during the
install process no matter how many private dotfiles there are.</p>
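<p>Here is a minimal Bash sketch of that decryption step. It is not the actual install script. It assumes the repository uses names like <code class="language-plaintext highlighter-rouge">_authinfo.priv.pgp</code> for a file installed as <code class="language-plaintext highlighter-rouge">~/.authinfo</code>, and the helper names are hypothetical. Since <code class="language-plaintext highlighter-rouge">gpg</code> asks <code class="language-plaintext highlighter-rouge">gpg-agent</code> for the passphrase, a loop over many files should only prompt once.</p>

```shell
# Hypothetical helper: map "_name.priv.pgp" to "DEST/.name".
priv_target() {
    local name=${1##*/}        # strip any leading directories
    name=${name%.priv.pgp}     # drop the encryption suffix
    printf '%s/.%s\n' "$2" "${name#_}"
}

# Hypothetical sketch: decrypt every private dotfile into place. The
# result is a real file, not a symlink: the repository only has ciphertext.
install_private() {
    local repo=$1 dest=${2:-$HOME} src out
    for src in "$repo"/_*.priv.pgp; do
        [ -f "$src" ] || continue   # the glob matched nothing
        out=$(priv_target "$src" "$dest")
        # umask 077 in a subshell so newly created plaintext is private.
        ( umask 077 && gpg --quiet --yes --output "$out" --decrypt "$src" ) ||
            return 1
    done
}
```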

]]>
    </content>
  </entry>
  <entry>
    <title>Versioning Personal Configuration Dotfiles</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/23/"/>
    <id>urn:uuid:d2428806-5a27-3996-39aa-ae0c411da126</id>
    <updated>2012-06-23T00:00:00Z</updated>
    <category term="crypto"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>For almost two months now I’ve been
<a href="https://github.com/skeeto/dotfiles">versioning all my personal dotfiles</a>
in Git. Just as <a href="/blog/2011/10/19/">when I did the same with Emacs</a>,
it’s been extremely liberating and I wish I had been doing this for
years. Currently it covers 11 different applications including my web
browser, shell, window manager, and cryptographic keys, giving me a
unified experience across all of my machines — which, between home,
work, and virtual computers, number about half a dozen.</p>

<p>The biggest problem with <em>not</em> versioning these files
is propagating changes. If I add
<a href="/blog/2012/06/08/">an interesting tweak to a dotfile</a>, I won’t see
that change on my other machines until I either copy it over or I
enter it manually again. Because I’d worry about clobbering other
unpropagated changes, it was usually the latter. Only changes I could
commit to memory would propagate. Any tweak that wasn’t easy to
duplicate manually I couldn’t rely on, so I was discouraged from
customizing too much and relied mostly on defaults. This is bad!</p>

<p>Source control solves almost all of this trivially. If I notice a
pattern in my habits or devise an interesting configuration, I can
immediately make the change, commit it, and push it. Later, when I’m
on another computer and I notice it missing, I just do a pull without
needing to worry about clobbering any local changes. When moving onto
a new computer/install, all I need to do is clone the repository and
I’ve got <em>every</em> configuration I have without having to snoop around
the last computer I used figuring out what to copy over.</p>

<p>Most of <a href="/blog/2012/04/29/">the applications I prefer</a> have tidy,
manually-editable dotfiles that version well, so I would be able to
capture almost my entire environment. One near-exception was
Firefox. By itself, it doesn’t play well, but
<a href="/blog/2009/04/03/">since I use Pentadactyl</a> I’m able to configure it
cleanly like a proper application.</p>

<p>The last straw that triggered my dotfiles repository was
<a href="/blog/2011/11/03/">managing my Bash aliases</a>. It had gotten <em>just</em>
long enough that I was tired of manually synchronizing them. It was
finally time to put in the effort to fix this once and
for all. Unsure what approach to take, I looked around to see what
other people were doing. There are two basic approaches: version your
entire home directory or symbolically link your dotfiles into place
from a stand-alone repository.</p>

<p>The first approach is straightforward but has a number of issues that
make it a poor choice. You don’t need an install script or anything
special; you just use your home directory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd
git init
git add .bashrc .gitconfig ...
</code></pre></div></div>

<p>The first problem is that most of the files Git sees are ones you
<em>do not</em> want to version. These are all going to show up in the
status listing and, because there’s no pattern to them, there’s really
no way to filter them out with a <code class="language-plaintext highlighter-rouge">.gitignore</code>. Any other clones you have in your home
directory may also confuse Git, looking like submodules. You’ll have
to dodge this extra stuff all the time when working in the repository.</p>

<p>The second problem is that Git has only one <code class="language-plaintext highlighter-rouge">.git</code> directory, in
the repository root. If there’s no <code class="language-plaintext highlighter-rouge">.git</code> in the current directory, it
will keep searching upwards until it finds one … which will
inevitably be your dotfiles repository. This will eventually lead to
annoying mistakes where you accidentally commit work to your dotfiles
repository for a while until you notice you forgot a <code class="language-plaintext highlighter-rouge">git init</code>. A
possible workaround is to keep the <code class="language-plaintext highlighter-rouge">.git</code> directory out of your home
directory and use the environment variable <code class="language-plaintext highlighter-rouge">GIT_DIR</code> to tell Git where
it is when you’re working on it. That sounds like a pain to me.</p>
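<p>For comparison, here is roughly what that workaround looks like as a Bash wrapper. <code class="language-plaintext highlighter-rouge">GIT_DIR</code> and <code class="language-plaintext highlighter-rouge">GIT_WORK_TREE</code> are real Git environment variables. The <code class="language-plaintext highlighter-rouge">dotgit</code> name, the <code class="language-plaintext highlighter-rouge">DOTFILES_GIT_DIR</code> override, and the <code class="language-plaintext highlighter-rouge">~/.dotfiles.git</code> location are assumptions for illustration.</p>

```shell
# Hypothetical wrapper: keep the repository in ~/.dotfiles.git (an assumed
# location) and treat $HOME as the work tree, so no .git sits in $HOME.
dotgit() {
    GIT_DIR="${DOTFILES_GIT_DIR:-$HOME/.dotfiles.git}" \
    GIT_WORK_TREE="$HOME" \
        git "$@"
}

# Typical use, from inside $HOME:
#   dotgit init
#   dotgit add .bashrc .gitconfig
#   dotgit commit -m 'Add shell and Git config'
```

<p>Plain <code class="language-plaintext highlighter-rouge">git</code> commands in other directories no longer find a stray repository above them, but every dotfiles operation now has to go through the wrapper.</p>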

<p>The other approach is to have your dotfiles repository cloned on its
own, then use symlinks to put the configuration files into place. You
need to write an install script to do this. However, not all
configuration files are sitting directly in your home directory. Some
have their own directory, and many modern applications keep theirs in a
directory under <code class="language-plaintext highlighter-rouge">~/.config/</code>. Your script needs to handle these.</p>

<p>Why symlinks rather than just copying the file into place? Well, if
you make any changes to the installed files, Git won’t see them and
you risk losing those changes.</p>

<p>Why symlinks rather than hard links? Symlinks deal with the atomic
replacement issue better. Conscientious applications are very careful
about how they write your data to disk. Unless it’s some kind of
database, files are never edited in place. The application rewrites
the entire file at once. If the application is stupid and overwrites
the file directly, there’s a brief instant where your data is not on
disk at all! First, it truncates the original file, deleting your
data, then it rewrites the data, and, if it’s not <em>too</em> stupid, calls
<code class="language-plaintext highlighter-rouge">fsync()</code> to force the write to the hardware. It’s stupid, but it will
work with symlinks.</p>

<p>The conscientious application will write the data to a temporary file,
call <code class="language-plaintext highlighter-rouge">fsync()</code>,
then atomically <code class="language-plaintext highlighter-rouge">rename()</code> the new file over the original. If
there’s any failure along the way, <em>some</em> intact version of the data
will be on the disk. The problem is that this will replace your
symlink, and changes won’t be captured by the repository. With
symlinks such an incident is obvious, since the file will no longer
be a symlink. With hard links it is silent: the installed file just
stops being the same file as the one in the repository.</p>
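<p>A Bash sketch of the conscientious save makes the symlink problem visible. It assumes GNU coreutils for <code class="language-plaintext highlighter-rouge">mv -T</code>. <code class="language-plaintext highlighter-rouge">atomic_write</code> is a hypothetical helper, and a plain <code class="language-plaintext highlighter-rouge">sync</code> stands in for calling <code class="language-plaintext highlighter-rouge">fsync()</code> on the temporary file alone. Because <code class="language-plaintext highlighter-rouge">mv</code> within one filesystem is a <code class="language-plaintext highlighter-rouge">rename()</code>, it replaces the symlink itself instead of writing through it.</p>

```shell
# Hypothetical helper: replace PATH's contents the "conscientious" way.
# Note the new file gets mktemp's 0600 mode; real editors copy the old mode.
atomic_write() {
    local path=$1 content=$2 tmp
    # Temp file in the same directory (same filesystem), so the final mv
    # is a single rename() rather than a copy.
    tmp=$(mktemp "$(dirname "$path")/.atomic.XXXXXX") || return 1
    printf '%s' "$content" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Coarse stand-in for fsync() on the temp file: flush everything.
    sync
    # -T: treat PATH as the name to replace, never as a target directory.
    mv -fT "$tmp" "$path"
}
```

<p>If <code class="language-plaintext highlighter-rouge">~/.bashrc</code> is a symlink into the repository, running <code class="language-plaintext highlighter-rouge">atomic_write ~/.bashrc '...'</code> leaves a regular file in its place, while the repository copy keeps its old contents.</p>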

<p>Smart applications, like Emacs, also know not to clobber your symlinks
and will handle these writes properly, leaving the symlink
intact. With hard links, there is no way for the application to know
it needs to treat a file specially.</p>

<p>I figured that I could use someone else’s install script, so I
wouldn’t have to worry about getting this right. Since Ruby is so
popular with Git, many people are using Rake for this task. However, I
want to be able to maintain the install script myself and I don’t know
Rake. I also don’t want to depend on anything unusual to install my
dotfiles. So that was out.</p>

<p>Second, I don’t want to have to specifically list the files to
install, or not install, in the script. Don’t put the same information
in two places when one will do. This script should be able to tell on
its own what files to install.</p>

<p>Third, I didn’t want my dotfiles to actually <em>be</em> dotfiles in my
repository. It makes them hard to see and manage, since they’re
hidden. They’re much easier to handle when the dot is replaced with an
underscore.</p>

<p>So I wrote my own install script which installs any file beginning
with an underscore. I’ve since added support for “private” dotfiles
along the way. These are dotfiles that contain sensitive information
and are encrypted in the repository, allowing me to continue
publishing it safely.</p>
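<p>The underscore convention is simple to sketch in Bash. This is not the install script from the repository. It is an illustration under some assumptions: it only handles regular files at the top level of the repository (no <code class="language-plaintext highlighter-rouge">~/.config/</code> subdirectories), it skips the encrypted <code class="language-plaintext highlighter-rouge">*.priv.pgp</code> files, and <code class="language-plaintext highlighter-rouge">install_dotfiles</code> is a hypothetical name.</p>

```shell
# Hypothetical sketch of the underscore convention: link REPO/_name to
# DEST/.name for every regular file at the top level of REPO.
install_dotfiles() {
    local repo dest=${2:-$HOME} src name
    repo=$(cd "$1" && pwd) || return 1   # symlinks need an absolute target
    for src in "$repo"/_*; do
        [ -f "$src" ] || continue            # skips directories, empty globs
        name=${src##*/}
        case $name in
            *.priv.pgp) continue ;;          # encrypted; decrypted separately
        esac
        # -f replaces whatever is at the destination (a real script should
        # back it up first); -n replaces an existing link, not its target.
        ln -sfn "$src" "$dest/.${name#_}" || return 1
    done
}
```

<p>With this sketch, <code class="language-plaintext highlighter-rouge">install_dotfiles ~/dotfiles</code> makes <code class="language-plaintext highlighter-rouge">~/.bashrc</code> point at <code class="language-plaintext highlighter-rouge">~/dotfiles/_bashrc</code>. Running it again just recreates the same links.</p>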

<p>If you’d like to create your own dotfiles repository, my dotfile
repository may not be useful beyond standing as an example but my
install script may be directly reusable for you.</p>

<p>There’s a lot to talk about, so I’ll be making a few more posts about
this.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>SSH and GPG Agents</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/06/08/"/>
    <id>urn:uuid:7a12226e-073a-3902-4fe8-842afdfdb951</id>
    <updated>2012-06-08T00:00:00Z</updated>
    <category term="crypto"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<p>If you’re using SSH or GPG with any sort of frequency, you should
definitely be using their accompanying <code class="language-plaintext highlighter-rouge">*-agent</code> programs. The agents
allow you to gain a whole lot of convenience without compromising your
security. Many people seem to be unaware these tools exist, so here’s
an overview along with some tips on how to use them effectively.</p>

<p>Let’s start from the top.</p>

<p>Both SSH and GPG involve the use of asymmetric encryption, and the
private key is protected by a user-entered passphrase. The private key
is normally not written to the filesystem in plaintext. In the
case of GPG, these keys are the primary focus of the application. For
SSH, they’re a useful tool to make accessing remote machines less
tedious. (The SSH server is authenticated by a public key, too, but
this is unrelated to agents.)</p>

<p>For those who are unaware, rather than enter a password when logging
into a remote machine, you can identify yourself by a public
key. Generating a key is simple.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-keygen
</code></pre></div></div>

<p>You’ll almost certainly want to accept the default location for the
key (<code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code>) because this is where SSH will look for it. Make
sure you enter a passphrase, which will encrypt the private key. This
is important because, without it, anyone who gains
access to your <code class="language-plaintext highlighter-rouge">id_rsa</code> file will be able to access any remote systems
that have been told to trust your public key. By having a passphrase,
this person needs not only the <code class="language-plaintext highlighter-rouge">id_rsa</code> file, but also the passphrase
(two-factor authentication), so you probably want to pick a long,
strong one. This may sound inconvenient, but <code class="language-plaintext highlighter-rouge">ssh-agent</code> will help
you.</p>

<p>The key generation process will create two files: <code class="language-plaintext highlighter-rouge">id_rsa</code> (private
key) and <code class="language-plaintext highlighter-rouge">id_rsa.pub</code> (public key). The latter is what you give to
remote systems.</p>

<p>Telling a remote system about your key is simple,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh-copy-id &lt;host&gt;
</code></pre></div></div>

<p>This will copy your <code class="language-plaintext highlighter-rouge">id_rsa.pub</code> to the remote system, prompting you
for the <em>password</em> on the <em>remote</em> system (not the passphrase you just
entered), adding it to the file <code class="language-plaintext highlighter-rouge">~/.ssh/authorized_keys</code>. From this
point on, all logins will use your new keypair rather than prompt you
for a password. Since you put a passphrase on your key, this may seem
pointless — it seems you still need to type in a password for every
connection. Bear with me here!</p>

<p>As a side note, you should have a unique SSH keypair for each
<i>site</i>, so you’ll have several of them. This way you can revoke
access to a particular site without affecting the others.</p>

<p>For GPG — the GNU Privacy Guard, <i>the</i> free software PGP
implementation — your keys are stored under <code class="language-plaintext highlighter-rouge">~/.gnupg/</code> in a
database. Generating a key is also a simple command,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gpg --gen-key
</code></pre></div></div>

<p>This is a slightly more complicated process, which I won’t get into
here. In contrast to SSH, you’ll generally have only one keypair per
<i>identity</i> (i.e. you only have one).</p>

<p>So you’ve got these keys encrypted by passphrases. If the passphrases
are going to be of any use, they’ll be long, annoying things that are a
pain to type in. If that were the end of the story, this would be really
inconvenient, enough that many people wouldn’t bother with passphrases
at all. Fortunately, we have agents to help.</p>

<p>An agent is a daemon process that can hold onto your passphrase
(<code class="language-plaintext highlighter-rouge">gpg-agent</code>) or your private key (<code class="language-plaintext highlighter-rouge">ssh-agent</code>) so that you only need
to enter your passphrase once within some period of time (possibly
for the entire life of the agent process), rather than type it many
times over and over again as it’s needed. The agents are very careful
about how they hold on to this sensitive information, such as avoiding
having it written to swap. You can also configure how long you want
them to hold onto your passphrase/key before purging it from memory.</p>

<p>The <code class="language-plaintext highlighter-rouge">ssh</code> and <code class="language-plaintext highlighter-rouge">gpg</code> programs need to know where to find the
agents. This is done through environment variables. For <code class="language-plaintext highlighter-rouge">ssh-agent</code>,
the process ID is stored in <code class="language-plaintext highlighter-rouge">SSH_AGENT_PID</code> and the location of the
Unix socket for communication is in <code class="language-plaintext highlighter-rouge">SSH_AUTH_SOCK</code>. <code class="language-plaintext highlighter-rouge">gpg-agent</code>
stuffs everything into one variable, <code class="language-plaintext highlighter-rouge">GPG_AGENT_INFO</code> (which is a pain
if you want to use this information in a script). When the main
program is invoked and it needs to use the private key, it will use
these variables and get in touch with the agent to see if it can
supply the needed information without bothering the user.</p>
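<p>To show the scripting annoyance: <code class="language-plaintext highlighter-rouge">GPG_AGENT_INFO</code> packs the socket path, process ID, and protocol version into one colon-separated string, something like <code class="language-plaintext highlighter-rouge">/tmp/gpg-XXXXXX/S.gpg-agent:1234:1</code>, so a script has to split it itself. Below is a minimal Bash sketch with a hypothetical helper name. GnuPG 2.1 and later no longer set this variable and use a fixed socket location instead.</p>

```shell
# Hypothetical helper: split a GPG_AGENT_INFO value ("socket:pid:protocol",
# as exported by GnuPG 2.0's gpg-agent) into its three fields.
agent_info_parts() {
    local sock pid proto
    IFS=: read -r sock pid proto <<< "$1"
    printf 'socket=%s pid=%s protocol=%s\n' "$sock" "$pid" "$proto"
}

# Typical use:  agent_info_parts "$GPG_AGENT_INFO"
```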

<p>Remember, a process can’t change the environment of its parent
process, so you need to set this information in the agent’s parent
shell somehow. There are two methods to set these up: eval and exec.</p>

<p>When you start the agent, it forks off its daemon process and prints
the variable information to stdout. This can be <code class="language-plaintext highlighter-rouge">eval</code>ed directly into
the current environment. You could drop these lines directly in your
<code class="language-plaintext highlighter-rouge">.bashrc</code> so that the agents are always there. (Though they won’t exit
with your shell, lingering around uselessly! More on this ahead.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eval "$(ssh-agent)"
eval "$(gpg-agent --daemon)"
</code></pre></div></div>

<p>For the exec method, you <em>replace</em> your current shell with a new one
with a modified environment. To do this, you ask the agent to exec
into a shell, with the variables set, rather than return control.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>exec ssh-agent bash
exec gpg-agent --daemon bash
</code></pre></div></div>

<p>As a cool trick, you can chain these together. <code class="language-plaintext highlighter-rouge">ssh-agent</code> becomes
<code class="language-plaintext highlighter-rouge">gpg-agent</code> which then becomes <code class="language-plaintext highlighter-rouge">bash</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>exec ssh-agent gpg-agent --daemon bash
</code></pre></div></div>

<p>Note that <code class="language-plaintext highlighter-rouge">gpg-agent</code> is capable of being an <code class="language-plaintext highlighter-rouge">ssh-agent</code> as well by
using the <code class="language-plaintext highlighter-rouge">--enable-ssh-support</code> option, so you don’t need to launch
an <code class="language-plaintext highlighter-rouge">ssh-agent</code>. Unfortunately, I don’t like to use this because
<code class="language-plaintext highlighter-rouge">gpg-agent</code> gets a little too personal with the SSH key, storing its
own copy with its own passphrase again.</p>

<p>On the other hand, <code class="language-plaintext highlighter-rouge">gpg-agent</code> is <em>much</em> more advanced than OpenSSH’s
<code class="language-plaintext highlighter-rouge">ssh-agent</code>. When you want to have <code class="language-plaintext highlighter-rouge">ssh-agent</code> manage a key, you need
to first tell it about the key with <code class="language-plaintext highlighter-rouge">ssh-add</code>. With no arguments, it
will use <code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code>. If you forget to do this, <code class="language-plaintext highlighter-rouge">ssh</code> will ask for
your passphrase directly, in your terminal, not allowing <code class="language-plaintext highlighter-rouge">ssh-agent</code>
to hold onto it. By comparison, <code class="language-plaintext highlighter-rouge">gpg</code> will always ask <code class="language-plaintext highlighter-rouge">gpg-agent</code> to
retrieve your passphrase when it’s needed (if the agent is available),
so it will cache your passphrase on demand. No need to explicitly
register with the agent. Even better, it will try its best to use a
“PIN entry” program to read your passphrase, which helps protect against some
kinds of keyloggers — preventing other processes from seeing your
keystrokes.</p>

<p>Well, this is all fine and dandy except when you’ve already got an
agent running. Say you’re launching a new terminal emulator window
from an existing one, creating a new shell. Unfortunately, even though
you have agents running <em>and</em> they’re listed in your environment (from
the origin shell), <em>they’ll still spawn new agents</em>! This is really
lousy behavior, in my opinion. There’s no <code class="language-plaintext highlighter-rouge">--inherit</code> option to tell
them to silently pass along the information of the existing agent if
it appears to be valid. This causes two problems. First, you’ll need to
enter your passphrases <em>again</em> for the new agent. Second, these new
agents will linger around after the spawning shell has exited —
hogging important non-swappable memory.</p>

<p>The direct workaround is to, in your shell init script, check for
these variables yourself and check that they’re valid (the agent
process is still running) before trying to spawn any agents. This is
tedious, error-prone, and makes each user do a lot of work that could
have been done in one place by one person instead.</p>
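<p>Here is a minimal Bash sketch of that manual check for <code class="language-plaintext highlighter-rouge">ssh-agent</code>, to give a sense of the work involved. The function name is hypothetical. It only trusts the inherited variables if the socket exists and <code class="language-plaintext highlighter-rouge">kill -0</code> can signal the agent’s process ID. It also shows how easy this is to get wrong: a forwarded agent (<code class="language-plaintext highlighter-rouge">ssh -A</code>) sets <code class="language-plaintext highlighter-rouge">SSH_AUTH_SOCK</code> without <code class="language-plaintext highlighter-rouge">SSH_AGENT_PID</code>, so this check would wrongly reject it.</p>

```shell
# Hypothetical check: succeed only if the inherited ssh-agent variables
# point at a live agent (socket exists and the process can be signaled).
# Caveat: a recycled PID belonging to another process would also pass.
ssh_agent_valid() {
    [ -n "$SSH_AUTH_SOCK" ] && [ -S "$SSH_AUTH_SOCK" ] &&
        [ -n "$SSH_AGENT_PID" ] && kill -0 "$SSH_AGENT_PID" 2>/dev/null
}

# In ~/.bashrc: reuse a valid inherited agent, otherwise start a new one.
#   ssh_agent_valid || eval "$(ssh-agent)"
```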

<p>There’s still the problem of when you launch a new shell that doesn’t
inherit the variables (i.e. a remote login), so there’s no way for it
to be aware of the existing agents. To fix this, you’d need to write
the agent information to a file. The shell init script checks this
file for an existing agent before spawning one. This is even more
complicated, more error-prone, and subject to race conditions. Why
make every user go through this process?!</p>

<p>Fortunately someone’s done all this work so you don’t have to! There’s
an awesome little tool called
<a href="http://www.funtoo.org/wiki/Keychain">Keychain</a> which can be used to
launch the agents for you. It stores the agent information in a file
so that you only ever launch one instance of the agent, and the agents
will be shared across every shell. It <em>does</em> have an <code class="language-plaintext highlighter-rouge">--inherit</code>
option — the default behavior, so you don’t even need to ask
nicely. Instead of running the <code class="language-plaintext highlighter-rouge">*-agent</code>s directly, you just put this
in your <code class="language-plaintext highlighter-rouge">.bashrc</code>,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eval "$(keychain --eval --quiet)"
</code></pre></div></div>

<p>So simple and it <em>just works</em>! I was so happy when I found this. This
is the magic word that makes using agents a breeze, so I can’t
recommend it enough.</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Avoid Zip Archives</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/03/22/"/>
    <id>urn:uuid:d8e0047c-9ec9-3553-f6ac-5e8528aa82ca</id>
    <updated>2009-03-22T00:00:00Z</updated>
    <category term="rant"/><category term="compression"/><category term="crypto"/>
    <content type="html">
      <![CDATA[<!-- 22 March 2009 -->
<p>
<img src="/img/misc/onion.jpg" class="right"
     title="Onion on Lettuce by swatjester, cc-by-sa 2.0"/>

In a <a href="/blog/2009/03/16">previous post</a> about the LZMA
compression algorithm, I made a negative comment about zip archives
and moved on. I would like to go into more detail about it now.
</p>
<p>
A zip archive serves three functions all-in-one: compression, archive,
and encryption. On a unix-like system, these functions would normally be
provided by three separate tools, like tar, gzip/bzip2, and GnuPG. The
unix philosophy says to "write programs that do one thing and do it
well".
</p>
<p>
So in the case of zip archives, we are doing three things poorly when,
instead, we should be using three separate tools that each do one
thing well.
</p>
<p>
When we use three different tools, our encrypted archive is a lot like
an onion. On the outside we have encryption. After we peel that off by
decrypting it, we have compression, and after removing that layer,
finally the archive. This is reflected in the filename:
<code>.tar.gz.gpg</code>. As a side note, if GPG didn't already
support it, we could add base-64 encoding if needed as another layer
on the onion: <code>.tar.gz.gpg.b64</code>.
</p>
<p>
By using separate tools, we can also swap different tools in and out
without breaking any spec. Previously I mentioned using LZMA, which
could be used in place of gzip or bzip2. Instead of
<code>.tar.gz.gpg</code> you can have <code>.tar.lzma.gpg</code>. Or
you can swap out GPG for encryption and use, say, <a
href="http://ciphersaber.gurus.org/">CipherSaber</a> as
<code>.tar.lzma.cs2</code>. If we use a single one-size-fits-all
format, we are limited by the spec.
</p>
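<p>Here is a short Bash sketch of building and peeling the onion, with the compressor as a swappable layer. The function names are hypothetical. It assumes <code>tar</code> plus whatever compressor you pass in: <code>gzip</code> by default, and <code>xz</code> or <code>lzma</code> also accept <code>-c</code> and <code>-dc</code>.</p>

```shell
# Hypothetical helpers. Inner layer: tar archives DIR. Next layer: a
# compressor that filters stdin to stdout (gzip by default).
# Without pipefail, only the compressor's exit status is reported.
make_onion() {  # make_onion DIR OUTFILE [COMPRESSOR]
    local dir=$1 out=$2 compress=${3:-gzip}
    tar -cf - -C "$(dirname "$dir")" "$(basename "$dir")" |
        "$compress" -c > "$out"
}

# Peel the layers in reverse: decompress, then list the archive contents.
peel_onion() {  # peel_onion FILE [COMPRESSOR]
    "${2:-gzip}" -dc "$1" | tar -tf -
}
```

<p>Encryption is just one more layer on the outside. For example, <code>tar -cf - docs | gzip | gpg --symmetric --output docs.tar.gz.gpg</code> builds it, and <code>gpg --decrypt docs.tar.gz.gpg | gzip -dc | tar -xf -</code> peels it back off.</p>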
<h4>Compression</h4>
<p>
Both zip and gzip basically use the same compression algorithm,
DEFLATE. The
zip spec actually allows for a variety of other compression
algorithms, but you cannot rely on other tools to support them.
</p>
<p>
Zip archives are also inside out. Instead of <a
href="http://en.wikipedia.org/wiki/Solid_archive"> solid
compression</a>, which is what happens in tarballs, each file is
compressed individually. Redundancy between different files cannot be
exploited. The equivalent would be an inside out tarball:
<code>.gz.tar</code>. This would be produced by first individually
gzipping each file in a directory tree, then archiving them with
tar. This results in larger archive sizes.
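<p>The cost of giving up solid compression is easy to measure. This Bash sketch, with hypothetical helper names, compares gzipping a whole tarball against gzipping each file on its own, the way zip does. It ignores per-file container overhead on the inside-out side, which only flatters that side. For a directory of many similar files, the solid size should come out far smaller, since gzip can refer back to earlier files within its 32 KiB window.</p>

```shell
# Solid: archive everything first, then compress the single stream.
solid_size() {  # solid_size DIR
    tar -cf - -C "$1" . | gzip -9 -c | wc -c
}

# Inside out: compress each file separately, as zip does, and add up the
# compressed sizes (container overhead is ignored).
per_file_size() {  # per_file_size DIR
    local f total=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        total=$(( total + $(gzip -9 -c "$f" | wc -c) ))
    done
    echo "$total"
}
```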
</p>
<p>
However, there is an advantage to inside out archives: random
access. We can access a file in the middle of the archive without
having to take the whole thing apart. In general use, this sort of
thing isn't really needed, and solid compression would be more useful.
</p>
<h4>Archive</h4>
<p>
In a zip archive, timestamp resolution is limited to 2 seconds, a
limit inherited from the old FAT filesystem. If your system supports
finer timestamps, you will lose that precision. But really, this
isn't a big deal.
</p>
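<p>
The 2-second limit comes from the DOS date/time fields in zip's headers, which store seconds as a 5-bit count of two-second units. Python's standard <code>zipfile</code> module writes those fields, so a quick round trip shows an odd second being dropped. The <code>roundtrip_mtime()</code> helper is hypothetical, and this sketch only exercises the standard DOS fields.
</p>

```python
import io
import zipfile

def roundtrip_mtime(date_time):
    """Store an entry stamped with date_time, (Y, M, D, h, m, s), and
    return the timestamp the zip archive actually kept."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("example.txt", date_time=date_time), b"data")
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("example.txt").date_time

# 59 seconds is stored as 29 two-second units and comes back as 58.
print(roundtrip_mtime((2008, 12, 16, 12, 30, 59)))  # (2008, 12, 16, 12, 30, 58)
```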
<p>
It also does not store file ownership information, but this is also
not a big deal. It may even be desirable as a privacy measure.
</p>
<p>
Actually, the archive part of zip seems to be pretty reasonable, and
better than I thought it was. There don't seem to be any really
annoying problems with it.
</p>
<p>
Tar still has advantages over zip. Zip doesn't quite allow the same
range of filenames as unix-like systems do, yet it does allow
characters like * and ?. What happens when you extract files with
such names on an inferior operating system that forbids those
characters depends on the tool.
</p>
<h4>Encryption</h4>
<p>
Encryption is where zip has been awful in the past. The original
spec's encryption algorithm had serious flaws, and no one should even
consider using it today.
</p>
<p>
Since then, AES encryption has been added to the standard, but
different tools implemented it differently. Unless the same zip tool
is used on each end, you can't be sure AES encryption will work.
</p>
<p>
By placing encryption as part of the file spec, each tool has to
implement its own encryption, probably leaving out considerations like
using secure memory. These tools are concentrating on archiving and
compression, and so encryption will likely not be given a solid
effort.
</p>
<p>
In the implementations I know of, the archive index isn't encrypted,
so someone could open it up and see lots of file metadata, including
filenames.
</p>
<p>
When you encrypt a tarball with GnuPG, you have all the flexibility of
PGP available. Asymmetric encryption, web of trust, multiple strong
encryption algorithms, digital signatures, strong key management,
etc. It would be unreasonable for an archive format to have this kind
of thing built in.
</p>
<h4>Conclusion</h4>
<p>
You are almost always better off using a tarball rather than a zip
archive. Unfortunately the receiver of an archive will often be unable
to open anything else, so you may have no choice.
</p>
]]>
    </content>
  </entry>
  <entry>
    <title>Controlling a Minefield</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2008/12/16/"/>
    <id>urn:uuid:0db94fda-0de6-3cec-fcdd-5b2d2e1d23ea</id>
    <updated>2008-12-16T00:00:00Z</updated>
    <category term="story"/><category term="crypto"/><category term="netsec"/>
    <content type="html">
      <![CDATA[<!-- 16 December 2008 -->
<p>
  <img src="/img/misc/naval-mine.jpg" alt="" title="Not a space mine."
       class="left"/>

Some time ago I was watching the entire series of <a
href="http://en.wikipedia.org/wiki/Deep_Space_Nine">Deep Space
9</a>. It was a Star Trek television show about a space station that
rests next to a <a href="http://en.wikipedia.org/wiki/Wormhole">
wormhole</a> that connects to the other side of the galaxy (the Gamma
quadrant).
</p>
<p>
The Gamma quadrant is ruled by a group called the Dominion, and they
are looking to conquer the Federation side of the galaxy (the Alpha
quadrant). At one point during the series, the Federation needs to
temporarily disable the wormhole to prevent Dominion ships from
crossing through. They do this by <a
href="http://startrek.wikia.com/wiki/Second_Battle_of_Deep_Space_9">
mining the wormhole</a> with identical, cloaked, self-replicating
mines.
</p>
<p>
If a mine is destroyed, the neighboring mines will replicate a
replacement. The minefield repairs itself. This makes removing the
minefield within a reasonable amount of time difficult to
impossible. If even a single mine is left behind, it can replicate the
entire minefield again.
</p>
<p>
The most interesting question here is this:
</p>
<blockquote>
  <p>
  When the Federation returns and wants to remove the minefield, how
  would they do it? What would stop the Dominion from doing the same
  thing?
  </p>
</blockquote>
<p>
The first thing that comes to mind is having a kill signal, but what
would this signal be? It could simply be a plain "kill" command, but
the Dominion could also broadcast such a signal to disable the
minefield. Consider that the Dominion could capture a single mine and
study everything about its workings. The minefield itself could
therefore hold no secrets whatsoever. This leaves out any possibility
of a secret kill command stored in the mines.
</p>
<p>
Here's what I would do, assuming that humans or aliens have not yet
discovered some giant breakthrough in factoring in the Star Trek
universe. I would randomly generate two very large prime
numbers. Today, two 1024-bit primes should be more than enough, but in
350 years even larger numbers would probably be necessary. Then, I
multiply these two numbers together and store the product in the mine
software. To disable the minefield, I simply broadcast these two
numbers into the minefield. The mines would be programmed to multiply
any pair of numbers they receive, ignoring the trivial pair of 1 and
the stored number itself. If the product matches the stored number,
the mine shuts down.
</p>
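<p>
Here is the mine's side of that check as a minimal Python sketch. The two primes are assumptions chosen only so the example runs. They are the well-known Mersenne primes <code>2**521 - 1</code> and <code>2**607 - 1</code>, so anyone could guess them, while the real scheme needs freshly generated random primes kept secret. The function name is hypothetical.
</p>

```python
# Toy primes for illustration only: famous Mersenne primes, so not secret.
P = 2**521 - 1
Q = 2**607 - 1

# The only value stored in every mine is the product.
STORED_PRODUCT = P * Q

def accepts_factors(a: int, b: int) -> bool:
    """Shut down only for a nontrivial factorization of the stored product.

    Requiring both factors to exceed 1 blocks the free answer
    (1, STORED_PRODUCT), which anyone reading the mine could send.
    """
    return a > 1 and b > 1 and a * b == STORED_PRODUCT

print(accepts_factors(P, Q))               # True
print(accepts_factors(1, STORED_PRODUCT))  # False
```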
<p>
Voila! A method for shutting down the minefield. The enemy can know
everything about every single mine's construction, including the
software and data stored on every mine, but will be unable to disable
the minefield without factoring a very large composite number, which
would presumably be difficult or impossible (within a reasonable
amount of time).
</p>
<p>
Another possibility would be using a hash. Come up with a strong
passphrase, then use a hashing algorithm like SHA-1 or MD5, or
whatever is available and appropriate in 350 years, to hash the
passphrase. Store the hash in the mines. When you want to disable the
minefield, broadcast the passphrase. The mines will hash the
broadcast and compare it to the stored hash. It's really the same
solution as before: a one-way function. This is also similar to how
passwords are stored inside a computer today.
</p>
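<p>
The hash version is even shorter. This sketch uses SHA-256 from Python's <code>hashlib</code> in place of SHA-1 or MD5, and the passphrase is a placeholder. A real mine would store only the digest. It is computed from the passphrase here just so the example runs on its own.
</p>

```python
import hashlib
import hmac

# Placeholder passphrase. A real one must be long and random, because anyone
# holding a captured mine can test guesses against the stored digest offline.
PASSPHRASE = b"placeholder passphrase, not a real one"

# What would actually be burned into each mine: the digest, not the passphrase.
STORED_DIGEST = hashlib.sha256(PASSPHRASE).digest()

def accepts_passphrase(broadcast: bytes) -> bool:
    """Hash the received broadcast and compare it with the stored digest."""
    digest = hashlib.sha256(broadcast).digest()
    # Constant-time comparison, the idiomatic way to compare secret values.
    return hmac.compare_digest(digest, STORED_DIGEST)
```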
<p>
If we wanted more commands, like "don't blow up any ships for a while"
or "increase minefield density", we could generate more composites
corresponding to each command. However, once a command is issued, its
secret (the two prime numbers) is out, and it cannot be used again.
In this case, I would go into the realm
of <a href="http://en.wikipedia.org/wiki/Public_key_cryptography">public
key cryptography</a>.
</p>
<p>
I would issue a command, along with a timestamp, and maybe even a
nonce that could double as a global identifier for the command, and
sign the whole deal using my private key. On each mine I would store
the public key. When a command is received, the mines would check the
signature before executing the command, and would reject any command
whose timestamp is not newer than the last one they accepted. I could
then issue repeat commands, as the timestamps would change each time.
An adversary who records a command learns nothing useful, because the
timestamps make replaying it useless.
</p>
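<p>
Below is a minimal Python sketch of the signed-command idea. It uses textbook RSA (signing a bare SHA-256 hash with no padding scheme) over two well-known Mersenne primes. A real system must do neither, so read it as a picture of the protocol, not usable crypto. The message layout, field names and helpers are hypothetical. So is the rule that a mine rejects any timestamp not newer than the last one it accepted. It needs Python 3.8+ for <code>pow(E, -1, m)</code>.
</p>

```python
import hashlib
import json

# Toy key from two famous Mersenne primes: illustration only, not secret.
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q                          # public modulus, stored on every mine
E = 65537                          # public exponent, stored on every mine
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the Federation

def _digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(command: str, timestamp: int, nonce: str):
    """Federation side: serialize the command and sign its hash (textbook RSA)."""
    fields = {"cmd": command, "ts": timestamp, "nonce": nonce}
    message = json.dumps(fields, sort_keys=True).encode()
    return message, pow(_digest(message), D, N)

class Mine:
    def __init__(self):
        self.last_timestamp = -1

    def receive(self, message: bytes, signature: int):
        """Return the command if the signature verifies and it isn't a replay."""
        if pow(signature, E, N) != _digest(message):
            return None  # forged, corrupted, or signed for another message
        fields = json.loads(message)
        if fields["ts"] <= self.last_timestamp:
            return None  # replayed or stale command
        self.last_timestamp = fields["ts"]
        return fields["cmd"]

mine = Mine()
msg, sig = sign("kill", 1000, "n1")
print(mine.receive(msg, sig))  # kill
print(mine.receive(msg, sig))  # None: replay rejected
```

<p>
The signed nonce also travels with every command, so it could serve as the command's global identifier. In this sketch, the monotonic timestamp check alone does the replay rejection.
</p>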
<p>
Minefields just like this exist today all over the Internet, as <a
href="http://en.wikipedia.org/wiki/Botnet">botnets</a>. Thousands of
computers all around the world become infected with malware and come
under the control of a single individual or group. Individual machines
in the botnet could be taken out, but removing the entire botnet is
difficult as it grows and repairs itself. Any security researcher
could disassemble the botnet malware and learn anything about it, so
the malware can store no secrets. How does a malicious person control
the botnet, then, without someone else taking control?  Public key
cryptography, just as described above.
</p>
]]>
    </content>
  </entry>

</feed>
