<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged elisp at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/elisp/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/elisp/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:8f0d2f56-ad2a-4f37-b61c-16d975ffd79e</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>A Makefile for Emacs Packages</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/01/22/"/>
    <id>urn:uuid:2e138ef3-bc68-4115-bb84-af260db641c0</id>
    <updated>2020-01-22T02:54:41Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Each of my Emacs packages has a Makefile to byte-compile all source
files, run the tests, build a package file, and, in some cases, run the
package in an interactive, temporary, isolated Emacs instance. These
<a href="/blog/2017/08/20/">portable Makefiles</a> have a similar structure and follow the same
conventions. It would require more thought and feedback before I’d try
to make it a <em>standard</em>, but these are conventions I’d like to see in
other package Makefiles.</p>

<p>Here’s an incomplete list of examples:</p>

<ul>
  <li><a href="https://github.com/skeeto/bitpack/blob/master/Makefile">https://github.com/skeeto/bitpack/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/cplx/blob/master/Makefile">https://github.com/skeeto/cplx/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/devdocs-lookup/blob/master/Makefile">https://github.com/skeeto/devdocs-lookup/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/elfeed/blob/master/Makefile">https://github.com/skeeto/elfeed/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-aio/blob/master/Makefile">https://github.com/skeeto/emacs-aio/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-bencode/blob/master/Makefile">https://github.com/skeeto/emacs-bencode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-memoize/blob/master/Makefile">https://github.com/skeeto/emacs-memoize/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-web-server/blob/master/Makefile">https://github.com/skeeto/emacs-web-server/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/impatient-mode/blob/master/Makefile">https://github.com/skeeto/impatient-mode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/lcg128/blob/master/Makefile">https://github.com/skeeto/lcg128/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/nasm-mode/blob/master/Makefile">https://github.com/skeeto/nasm-mode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/skewer-mode/blob/master/Makefile">https://github.com/skeeto/skewer-mode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/x86-lookup/blob/master/Makefile">https://github.com/skeeto/x86-lookup/blob/master/Makefile</a></li>
</ul>

<p>You should make a habit of compiling your Emacs Lisp files even if you
don’t think you need the performance. The byte-compiler, while
<a href="/blog/2019/02/24/">dumb</a>, does <a href="/blog/2016/12/22/">static analysis</a> and may spot bugs and other
issues early.</p>

<p>First things first: Every portable Makefile starts with a special
target, <code class="language-plaintext highlighter-rouge">.POSIX</code>, to request standard behavior. This is followed by
macro definitions. When compiling a C program, the <code class="language-plaintext highlighter-rouge">CC</code> macro is the
name of the compiler. Analogously, when compiling Emacs packages the
<code class="language-plaintext highlighter-rouge">EMACS</code> macro is the name of the Emacs program.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.POSIX</span><span class="o">:</span>
<span class="nv">EMACS</span> <span class="o">=</span> emacs
</code></pre></div></div>

<p>Users can now override the macro to specify alternate Emacs binaries. I
use this all the time to test my packages under different versions of
Emacs.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make clean
$ make EMACS=emacs-24.3 check
$ make clean
$ make EMACS=emacs-25.1 check
</code></pre></div></div>

<p>Note: It’s common to use <code class="language-plaintext highlighter-rouge">?=</code> assignment here, but that is both
non-standard and unnecessary. If you want to override macro definitions
from the environment, use the <code class="language-plaintext highlighter-rouge">-e</code> option:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export EMACS=emacs-24.3
$ make -e
</code></pre></div></div>

<p>The first non-special target in the Makefile is the default target. For
Emacs packages, this target should byte-compile all the source files,
including tests. List the byte-compiled file names as the target
dependencies:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">compile</span><span class="o">:</span> <span class="nf">foo.elc foo-test.elc</span>
</code></pre></div></div>

<p>Now for the tedious part: Define the dependencies between your different
source files. It would be nice to automate this part somehow, but
fortunately most packages just aren’t that complicated. You do not need
to list trivial dependencies — i.e. mapping each .el file to its .elc
file — since make will figure that out on its own.</p>

<p>Since <code class="language-plaintext highlighter-rouge">foo-test.elc</code> relies on <code class="language-plaintext highlighter-rouge">foo.elc</code> — it’s testing this file after
all — the relationship must be indicated to make. For single file
packages (one package file, one test file), this is all that’s needed:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">foo-test.elc</span><span class="o">:</span> <span class="nf">foo.elc</span>
</code></pre></div></div>

<p>I call my testing targets “check” and this target must depend on the
byte-compiled files containing tests. It will transitively depend on the
other package source files because of the dependencies declared in the
previous section.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">check</span><span class="o">:</span> <span class="nf">foo-test.elc</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">--batch</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">-l</span> <span class="err">foo-test.elc</span> <span class="err">-f</span> <span class="err">ert-run-tests-batch</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">-Q</code> option runs Emacs with “minimum customizations.” The <code class="language-plaintext highlighter-rouge">-L .</code>
option puts the current directory in the load path so that <code class="language-plaintext highlighter-rouge">(require
'foo)</code> will work. Finally it loads the file containing the tests and
instructs ERT to run all defined tests.</p>
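
<p>For context, here is a minimal sketch of what such a test file might
contain. The function <code class="language-plaintext highlighter-rouge">foo-add</code> and the test name are hypothetical
stand-ins, not taken from any of the packages listed above, and the sketch
assumes <code class="language-plaintext highlighter-rouge">foo.el</code> ends with <code class="language-plaintext highlighter-rouge">(provide 'foo)</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; foo-test.el --- tests for foo  -*- lexical-binding: t; -*-

(require 'ert)
(require 'foo)  ; resolved through the load path set up by "-L ."

;; Hypothetical: assumes foo.el defines (foo-add A B) returning A + B.
(ert-deftest foo-test-add ()
  (should (= 3 (foo-add 1 2))))
</code></pre></div></div>

<p>Since the “check” recipe loads this file before calling
<code class="language-plaintext highlighter-rouge">ert-run-tests-batch</code>, every <code class="language-plaintext highlighter-rouge">ert-deftest</code> form in it is defined by the
time the tests run.</p>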

<p>A good build can clean up after itself:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">clean</span><span class="o">:</span>
    <span class="err">rm</span> <span class="err">-f</span> <span class="err">foo.elc</span> <span class="err">foo-test.elc</span>
</code></pre></div></div>

<p>Finally we need one more thing to tie it all together: an inference rule
to teach make how to compile .elc files from .el files.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.SUFFIXES</span><span class="o">:</span> <span class="nf">.el .elc</span>
<span class="nl">.el.elc</span><span class="o">:</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">--batch</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">-f</span> <span class="err">batch-byte-compile</span> <span class="err">$&lt;</span>
</code></pre></div></div>

<p>This is similar to the “check” target, but compiles a source file
instead of running tests.</p>

<p>For simple, single source file packages, this is all you need!</p>

<h3 id="complex-packages">Complex packages</h3>

<p>My most complex package is Elfeed which has 10 source files and 4 test
files. It also includes a target to build a package file, which I would
upload to Marmalade when it was still functioning. I did a few extra
things to keep this tidy.</p>

<p>First, I define the package version in the Makefile:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">VERSION</span> <span class="o">=</span> 1.2.3
</code></pre></div></div>

<p>It would be nice to grab this information from a reliable place (Git
tag, source file, etc.), but I never found a reliable and satisfactory
way to do this. Simple wins.</p>

<p>To avoid repeating myself, I list the source files in a macro as well:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">EL</span>   <span class="o">=</span> foo-a.el foo-b.el foo-c.el
<span class="nv">DOC</span>  <span class="o">=</span> README.md
<span class="nv">TEST</span> <span class="o">=</span> foo-test.el
</code></pre></div></div>

<p>These will still need to have all their interdependencies individually
defined for make. For example, if C depends on both A and B, but neither
A nor B depend on each other, this is all you’d need:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">foo-c.elc</span><span class="o">:</span> <span class="nf">foo-a.elc foo-b.elc</span>
</code></pre></div></div>

<p>Done correctly you can perform parallel builds with the non-standard but
common <code class="language-plaintext highlighter-rouge">-j</code> make option. This is pretty nice since Emacs can’t do
parallel builds itself.</p>

<p>I use the file list macros in the “compile” and “check” targets:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">compile</span><span class="o">:</span> <span class="nf">$(EL:.el=.elc) $(TEST:.el=.elc)</span>
<span class="nl">check</span><span class="o">:</span> <span class="nf">$(TEST:.el=.elc)</span>
</code></pre></div></div>

<p>The “package” target copies everything under a directory and tars it up.
The directory is removed first, if it exists, so that any potential
leftover garbage from a previous build doesn’t get included.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">package</span><span class="o">:</span> <span class="nf">foo-$(VERSION).tar</span>
<span class="nl">foo-$(VERSION).tar</span><span class="o">:</span> <span class="nf">$(EL) $(DOC)</span>
    <span class="err">rm</span> <span class="err">-rf</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">mkdir</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">cp</span> <span class="err">$(EL)</span> <span class="err">$(DOC)</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">tar</span> <span class="err">cf</span> <span class="err">$@</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">rm</span> <span class="err">-rf</span> <span class="err">foo-$(VERSION)/</span>
</code></pre></div></div>

<p>In Elfeed, the target to test in an interactive, temporary Emacs
instance is called “virtual”. In Skewer it’s called “run”. The name of
the target and the specific rules will depend on the package, should you
even want this target at all. It’s handy to have the option to test without
my own configuration contaminating Emacs, and vice versa. When people
report issues, I can also direct them to reproduce their issue in the
clean environment.</p>

<p>Here’s what a simple “run” target might look like:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">run</span><span class="o">:</span> <span class="nf">$(EL:.el=.elc)</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">-l</span> <span class="err">foo-c.elc</span> <span class="err">-f</span> <span class="err">foo-mode</span>
</code></pre></div></div>

<p>Make is not really designed to run interactive programs like this, but
it works in practice.</p>

<h3 id="dependencies">Dependencies</h3>

<p>What about packages with dependencies? I’ve used <a href="https://github.com/cask/cask">Cask</a> in the
past but was never satisfied, especially when integrating it into a
Makefile. So, again, I’ve opted for the dumb-but-reliable option:
request that dependencies are cloned in adjacent directories matching
the dependency’s package name. For example, the <a href="/blog/2014/02/06/">EmacSQL</a> Makefile
header:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Clone the dependencies of this package in sibling directories:
#     $ git clone https://github.com/cbbrowne/pg.el ../pg
</span></code></pre></div></div>

<p>I also define a new “linker flags” macro, <code class="language-plaintext highlighter-rouge">LDFLAGS</code>. Like with <code class="language-plaintext highlighter-rouge">EMACS</code>,
this lets users override it if needed:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">LDFLAGS</span> <span class="o">=</span> <span class="nt">-L</span> ../pg
</code></pre></div></div>

<p>Everywhere I use <code class="language-plaintext highlighter-rouge">-L .</code> I also include <code class="language-plaintext highlighter-rouge">$(LDFLAGS)</code>. For example, in the
inference rule:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.SUFFIXES</span><span class="o">:</span> <span class="nf">.el .elc</span>
<span class="nl">.el.elc</span><span class="o">:</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">--batch</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">$(LDFLAGS)</span> <span class="err">-f</span> <span class="err">batch-byte-compile</span> <span class="err">$&lt;</span>
</code></pre></div></div>

<p>If the dependencies follow these conventions, then these can also be
compiled in a recursive way with little effort:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make -C ../pg
</code></pre></div></div>

<p>I’m not completely satisfied with this solution, particularly since it’s
an odd burden on anyone using the Makefile, but it’s worked well enough
for my needs. This is when I wish Emacs had <a href="/blog/2020/01/21/#package-management">distributed package
management</a>.</p>

]]>
    </content>
  </entry>
  <entry>
    <title>Efficient Alias of a Built-In Emacs Lisp Function</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/12/10/"/>
    <id>urn:uuid:15421609-2681-4b75-99b2-b2d6aaa835fe</id>
    <updated>2019-12-10T02:32:04Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Suppose you don’t like the names <code class="language-plaintext highlighter-rouge">car</code> and <code class="language-plaintext highlighter-rouge">cdr</code>, the traditional
identifiers for two halves of a lisp cons cell. <a href="https://irreal.org/blog/?p=8500">This is
misguided.</a> A cons is really just a 2-tuple, and the halves
don’t have any particular meaning on their own, even as “head” and
“tail.” However, maybe this is really important to you so you want to
do it anyway. What’s the best way to go about it?</p>

<h3 id="defalias">defalias</h3>

<p>Emacs Lisp has a built-in function just for this, <code class="language-plaintext highlighter-rouge">defalias</code>, which
is the obvious choice.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'car-alias</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">car</code> built-in function is so fundamental to the language that <a href="/blog/2014/01/04/">it
gets its own byte-code opcode</a>. When you call <code class="language-plaintext highlighter-rouge">car</code> in your code,
the byte-compiler doesn’t generate a function call, but instead uses a
single instruction. For example, here’s an <code class="language-plaintext highlighter-rouge">add</code> function that sums
the <code class="language-plaintext highlighter-rouge">car</code> of its two arguments. I’ve followed the definition with its
disassembly (Emacs 26.3, <a href="/blog/2016/12/22/">lexical scope</a>):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       stack-ref 1</span>
<span class="c1">;; 1       car</span>
<span class="c1">;; 2       stack-ref 1</span>
<span class="c1">;; 3       car</span>
<span class="c1">;; 4       plus</span>
<span class="c1">;; 5       return</span>
</code></pre></div></div>

<p>There are zero function calls because of the dedicated <code class="language-plaintext highlighter-rouge">car</code> opcode, and
it has the optimal six byte-code instructions.</p>
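
<p>A listing like this can be reproduced with the built-in
<code class="language-plaintext highlighter-rouge">byte-compile</code> and <code class="language-plaintext highlighter-rouge">disassemble</code> functions. This is only a sketch: it
assumes the forms are evaluated with lexical binding enabled, and the
exact output may vary between Emacs versions.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; -*- lexical-binding: t; -*-

(defun add (a b)
  (+ (car a) (car b)))

(byte-compile 'add)  ; replace the definition with its compiled form
(disassemble 'add)   ; show the byte-code in a *Disassemble* buffer
</code></pre></div></div>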

<p>The problem with <code class="language-plaintext highlighter-rouge">defalias</code> is that the definition is permitted to
change, or <a href="/blog/2013/01/22/">be advised</a>, and that robs the byte-compiler of
optimization opportunities. It’s <a href="/blog/2019/12/09/">a constraint</a>. When the
byte-code compiler sees <code class="language-plaintext highlighter-rouge">car-alias</code>, it <em>must</em> emit a function call:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add-alias</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">car-alias</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">car-alias</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       constant  car-alias</span>
<span class="c1">;; 1       stack-ref 2</span>
<span class="c1">;; 2       call      1</span>
<span class="c1">;; 3       constant  car-alias</span>
<span class="c1">;; 4       stack-ref 2</span>
<span class="c1">;; 5       call      1</span>
<span class="c1">;; 6       plus</span>
<span class="c1">;; 7       return</span>
</code></pre></div></div>

<p>This has two function calls and eight byte-code instructions. Those
function calls are significantly more expensive than a <code class="language-plaintext highlighter-rouge">car</code>
instruction, which will show in the benchmark later.</p>

<h3 id="defsubst">defsubst</h3>

<p>An alternative is <code class="language-plaintext highlighter-rouge">defsubst</code>, an inlined function definition, which
will inline an actual <code class="language-plaintext highlighter-rouge">car</code>. The semantics for <code class="language-plaintext highlighter-rouge">defsubst</code> are, like
macros, explicit that re-definitions may not affect previous uses, so
the constraint is gone. Unfortunately <a href="/blog/2019/02/24/">the byte-code compiler is
pretty dumb</a>, and does a poor job inlining <code class="language-plaintext highlighter-rouge">car-subst</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defsubst</span> <span class="nv">car-subst</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">car</span> <span class="nv">x</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">add-subst</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">car-subst</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">car-subst</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       stack-ref 1</span>
<span class="c1">;; 1       dup</span>
<span class="c1">;; 2       car</span>
<span class="c1">;; 3       stack-set 1</span>
<span class="c1">;; 5       stack-ref 1</span>
<span class="c1">;; 6       dup</span>
<span class="c1">;; 7       car</span>
<span class="c1">;; 8       stack-set 1</span>
<span class="c1">;; 10      plus</span>
<span class="c1">;; 11      return</span>
</code></pre></div></div>

<p>There are zero function calls and ten byte-code instructions. The
<code class="language-plaintext highlighter-rouge">car</code> opcode <em>is</em> in use, but there are four unnecessary instructions.
This is still faster than making the function calls, though. If the
byte-code compiler was just a little smarter and could compile this to
the ideal case, then this would be the end of the discussion.</p>

<h3 id="cl-first">cl-first</h3>

<p>The built-in <code class="language-plaintext highlighter-rouge">cl-lib</code> package has a <code class="language-plaintext highlighter-rouge">cl-first</code> alias for <code class="language-plaintext highlighter-rouge">car</code>. This was
written by someone with intimate knowledge of Emacs Lisp, so how
well did they do?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">add-cl-first</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">cl-first</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">cl-first</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       stack-ref 1</span>
<span class="c1">;; 1       car</span>
<span class="c1">;; 2       stack-ref 1</span>
<span class="c1">;; 3       car</span>
<span class="c1">;; 4       plus</span>
<span class="c1">;; 5       return</span>
</code></pre></div></div>

<p>It’s just like plain old <code class="language-plaintext highlighter-rouge">car</code>! How did they manage this? By using a
byte-compiler hint:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'cl-first</span> <span class="ss">'car</span><span class="p">)</span>
<span class="p">(</span><span class="nv">put</span> <span class="ss">'cl-first</span> <span class="ss">'byte-optimizer</span> <span class="ss">'byte-compile-inline-expand</span><span class="p">)</span>
</code></pre></div></div>

<p>They used <code class="language-plaintext highlighter-rouge">defalias</code>, but they also manually told the byte-compiler to
inline the definition like <code class="language-plaintext highlighter-rouge">defsubst</code>. In fact, <code class="language-plaintext highlighter-rouge">defsubst</code> expands to an
expression that sets <code class="language-plaintext highlighter-rouge">byte-compile-inline-expand</code>, but, as seen above,
the inline function overhead gets inlined and doesn’t get eliminated.</p>
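
<p>Applied to the alias from earlier, the same hint might look like the
following. This is a sketch that simply mirrors the <code class="language-plaintext highlighter-rouge">cl-lib</code> definition
above. The <code class="language-plaintext highlighter-rouge">byte-optimizer</code> property is a detail of the byte-compiler, so
it is an assumption that other Emacs versions treat it the same way.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defalias 'car-alias #'car)

;; Same hint as cl-first: ask the byte-compiler to expand the alias
;; inline, so that calls can compile to the car opcode.
(put 'car-alias 'byte-optimizer 'byte-compile-inline-expand)
</code></pre></div></div>

<p>The hint has to be in effect before the calling code is compiled,
since it only changes how later call sites are compiled.</p>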

<h3 id="benchmark">Benchmark</h3>

<p>So how do the alternatives perform? (<a href="https://gist.github.com/skeeto/36baa3b1493f53eab4e082b449448a96">benchmark source</a>)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add           (0.594811299 0 0.0)
add-alias     (1.232037132 0 0.0)
add-subst     (0.700044324 0 0.0)
add-cl-first  (0.58332882 0 0.0)
</code></pre></div></div>

<p>(The <code class="language-plaintext highlighter-rouge">car</code> of the list is the running time.) Since <code class="language-plaintext highlighter-rouge">add</code> and
<code class="language-plaintext highlighter-rouge">add-cl-first</code> have the same byte-codes, we shouldn’t, and didn’t, see
a significant difference. The simple use of <code class="language-plaintext highlighter-rouge">defalias</code> doubles the
running time, and using <code class="language-plaintext highlighter-rouge">defsubst</code> is about 18% slower.</p>
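
<p>The linked source is the authoritative benchmark. As a rough sketch of
how such a measurement can be taken, the built-in <code class="language-plaintext highlighter-rouge">benchmark-run</code> macro
returns a list of the same shape. The iteration count and the arguments
below are arbitrary assumptions, not the values behind the numbers above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun add (a b)
  (+ (car a) (car b)))
(byte-compile 'add)

;; Returns a list: (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
;; The measured time includes the overhead of the loop itself.
(let ((a (list 1))
      (b (list 2)))
  (benchmark-run 1000000 (add a b)))
</code></pre></div></div>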

]]>
    </content>
  </entry>
  <entry>
    <title>UTF-8 String Indexing Strategies</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/05/29/"/>
    <id>urn:uuid:12e9ed44-b5c1-495f-8750-dfaf1ab008e2</id>
    <updated>2019-05-29T21:52:06Z</updated>
    <category term="elisp"/><category term="emacs"/><category term="go"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=20049491">on Hacker News</a>.</em></p>

<p>When designing or, in some cases, implementing a programming language
with built-in support for Unicode strings, an important decision must be
made about how to represent or encode those strings in memory. Not all
representations are equal, and there are trade-offs between different
choices.</p>

<!--more-->

<p>One issue to consider is that strings typically feature random access
indexing of code points with a time complexity resembling constant
time (<code class="language-plaintext highlighter-rouge">O(1)</code>). However, not all string representations actually
support this well. Strings using variable length encoding, such as
UTF-8 or UTF-16, have <code class="language-plaintext highlighter-rouge">O(n)</code> time complexity indexing, ignoring
special cases (discussed below). The most obvious choice to achieve
<code class="language-plaintext highlighter-rouge">O(1)</code> time complexity — an array of 32-bit values, as in UCS-4 —
makes very inefficient use of memory, especially with typical strings.</p>

<p>Despite this, UTF-8 is still chosen in a number of programming
languages, or at least in their implementations. In this article I’ll
discuss three examples — Emacs Lisp, Julia, and Go — and how each takes a
slightly different approach.</p>

<h3 id="emacs-lisp">Emacs Lisp</h3>

<p>Emacs Lisp has two different types of strings that generally can be used
interchangeably: <em>unibyte</em> and <em>multibyte</em>. In fact, the difference
between them is so subtle that I bet that most people writing Emacs Lisp
don’t even realize there are two kinds of strings.</p>

<p>Emacs Lisp uses UTF-8 internally to encode all “multibyte” strings and
buffers. To fully support arbitrary sequences of bytes in the files
being edited, Emacs uses <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html">its own extension of Unicode</a> to
precisely and unambiguously represent raw bytes intermixed with text.
Any arbitrary sequence of bytes can be decoded into Emacs’ internal
representation, then losslessly re-encoded back into the exact same
sequence of bytes.</p>

<p>Unibyte strings and buffers are really just byte-strings. In practice,
they’re essentially ISO/IEC 8859-1, a.k.a. <em>Latin-1</em>. It’s a Unicode
string where all code points are below 256. Emacs prefers the smallest
and simplest string representation when possible, <a href="https://www.python.org/dev/peps/pep-0393/">similar to CPython
3.3+</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="s">"hello"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="s">"π ≈ 3.14"</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>
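
<p>The difference also shows up when comparing the character count with
the byte count of the internal representation, using the built-in
<code class="language-plaintext highlighter-rouge">length</code> and <code class="language-plaintext highlighter-rouge">string-bytes</code> functions. In the second string, π takes two
bytes and ≈ takes three:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(length "hello")
;; =&gt; 5
(string-bytes "hello")
;; =&gt; 5

(length "π ≈ 3.14")
;; =&gt; 8
(string-bytes "π ≈ 3.14")
;; =&gt; 11
</code></pre></div></div>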

<p>Emacs Lisp strings are mutable, and therein lies the kicker: As soon as
you insert a code point above 255, Emacs quietly converts the string to
multibyte.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">fish</span> <span class="s">"fish"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="nv">fish</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">fish</span> <span class="mi">2</span><span class="p">)</span> <span class="nv">?</span><span class="err">ŝ</span>
      <span class="p">(</span><span class="nb">aref</span> <span class="nv">fish</span> <span class="mi">3</span><span class="p">)</span> <span class="nv">?o</span><span class="p">)</span>

<span class="nv">fish</span>
<span class="c1">;; =&gt; "fiŝo"</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="nv">fish</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>Constant time indexing into unibyte strings is straightforward, and
Emacs does the obvious thing: the character index is also the byte
offset. It helps that most strings in Emacs are probably unibyte, even
when the user isn’t working in English.</p>

<p>Most buffers are multibyte, even if those buffers are generally just
ASCII. Since <a href="/blog/2017/09/07/">Emacs uses gap buffers</a> this is rarely a problem:
Nearly all accesses are tightly clustered around the point, so O(n)
indexing doesn’t often matter.</p>

<p>That leaves multibyte strings. Consider these idioms for iterating
across a string in Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="p">(</span><span class="nb">length</span> <span class="nb">string</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">c</span> <span class="p">(</span><span class="nb">aref</span> <span class="nb">string</span> <span class="nv">i</span><span class="p">)))</span>
    <span class="o">...</span><span class="p">))</span>

<span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">c</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nb">string</span>
         <span class="o">...</span><span class="p">)</span>
</code></pre></div></div>

<p>The latter expands into essentially the same as the former: An
incrementing index that uses <code class="language-plaintext highlighter-rouge">aref</code> to index to that code point. So is
iterating over a multibyte string — a common operation — an O(n^2)
operation?</p>

<p>The good news is that, at least in this case, no! It’s essentially just
as efficient as iterating over a unibyte string. Before going over why,
consider this little puzzle. Here’s a little string comparison function
that compares two strings a code point at a time, returning their first
difference:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">compare</span> <span class="p">(</span><span class="nv">string-a</span> <span class="nv">string-b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">a</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nv">string-a</span>
           <span class="nv">for</span> <span class="nv">b</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nv">string-b</span>
           <span class="nb">unless</span> <span class="p">(</span><span class="nb">eql</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
           <span class="nb">return</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>Let’s examine benchmarks with some long strings (100,000 code points):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (0.012568031 0 0.0)</span>
</code></pre></div></div>

<p>With two zeroed unibyte strings it takes 13ms. How about changing
the last code point in one of them to 256, converting it to a multibyte
string:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">a</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">a</span><span class="p">)))</span> <span class="mi">256</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (0.012680513 0 0.0)</span>
</code></pre></div></div>

<p>Same running time, so that multibyte string cost nothing more to iterate
across. Let’s try making them both multibyte:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">a</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">a</span><span class="p">)))</span> <span class="mi">256</span>
            <span class="p">(</span><span class="nb">aref</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">b</span><span class="p">)))</span> <span class="mi">256</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (2.327959762 0 0.0)</span>
</code></pre></div></div>

<p>That took 2.3 seconds: roughly 185x longer to run! Iterating over two
multibyte strings concurrently seems to have broken an optimization.
Can you reason about what’s happened?</p>

<p>To avoid the O(n) cost on this common indexing operation, Emacs keeps
a “bookmark” for the last indexing location into a multibyte string.
If the next access is nearby, it can start looking from this
bookmark, forwards or backwards. Like a gap buffer, this gives a big
advantage to clustered accesses, including iteration.</p>

<p>However, this string bookmark is <em>global</em>, one per Emacs instance, not
one per string. In the last benchmark, the two multibyte strings are
constantly fighting over a single string bookmark, and indexing in the
comparison function is reduced to O(n^2) time complexity.</p>
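
<p>One way to sidestep the shared bookmark, sketched here on the
assumption that the extra allocation is acceptable, is to copy each
string into a vector of code points before looping. <code class="language-plaintext highlighter-rouge">vconcat</code> walks
each string once, and indexing a vector is constant time no matter
what it holds. The name <code class="language-plaintext highlighter-rouge">compare-via-vectors</code> is made up for this
illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defun compare-via-vectors (string-a string-b)
  "Like `compare', but index into vectors instead of strings."
  ;; Each `vconcat' decodes its string in a single sequential pass.
  (let ((a (vconcat string-a))
        (b (vconcat string-b)))
    ;; Vector elements are fixed width, so no bookmark is involved.
    (cl-loop for x across a
             for y across b
             unless (eql x y)
             return (cons x y))))
</code></pre></div></div>

<p>This trades O(n) extra memory for predictable indexing. It has not
been benchmarked here, so treat it as a sketch rather than a measured
fix.</p>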

<p>So, Emacs <em>pretends</em> it has constant time access into its UTF-8 text
data, but it’s only faking it with some simple optimizations. This
usually works out just fine.</p>

<h3 id="julia">Julia</h3>

<p>Another approach is to not pretend at all, and to make this limitation
of UTF-8 explicit in the interface. Julia took this approach, and it
<a href="/blog/2014/03/06/">was one of my complaints about the language</a>. I don’t think
this is necessarily a bad choice, but I do still think it’s
inappropriate considering Julia’s target audience (i.e. Matlab users).</p>

<p>Julia strings are explicitly byte strings containing valid UTF-8 data.
All indexing occurs on bytes, which is trivially constant time, and
always decodes the multibyte code point starting at that byte. <em>But</em>
it is an error to index to a byte that doesn’t begin a code point.
That error is also trivially checked in constant time.</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="s">"π"</span>

<span class="n">s</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
<span class="c"># =&gt; 'π'</span>

<span class="n">s</span><span class="x">[</span><span class="mi">2</span><span class="x">]</span>
<span class="c"># ERROR: UnicodeError: invalid character index</span>
<span class="c">#  in getindex at ./strings/basic.jl:37</span>
</code></pre></div></div>

<p>Slices are still over bytes, but they “round up” to the end of the
current code point:</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span><span class="x">[</span><span class="mi">1</span><span class="o">:</span><span class="mi">1</span><span class="x">]</span>
<span class="c"># =&gt; "π"</span>
</code></pre></div></div>

<p>Iterating over a string requires helper functions which keep an internal
“bookmark” so that each access is constant time:</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">eachindex</span><span class="x">(</span><span class="n">string</span><span class="x">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">string</span><span class="x">[</span><span class="n">i</span><span class="x">]</span>
    <span class="c"># ...</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So Julia doesn’t pretend; it makes the problem explicit.</p>

<h3 id="go">Go</h3>

<p>Go is very similar to Julia, but takes an even more explicit view of
strings. All strings are byte strings and there are no restrictions on
their contents. Conventionally strings contain UTF-8 encoded text, but
this is not strictly required. There’s a <code class="language-plaintext highlighter-rouge">unicode/utf8</code> package for
working with strings containing UTF-8 data.</p>

<p>Beyond convention, the <code class="language-plaintext highlighter-rouge">range</code> clause also assumes the string contains
UTF-8 data, and it’s not an error if it does not. Bytes not containing
valid UTF-8 data appear as a <code class="language-plaintext highlighter-rouge">REPLACEMENT CHARACTER</code> (U+FFFD).</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">:=</span> <span class="s">"π</span><span class="se">\xff</span><span class="s">"</span>
    <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">r</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">s</span> <span class="p">{</span>
        <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c">// U+03c0</span>
<span class="c">// U+fffd</span>
</code></pre></div></div>

<p>A further case of the language favoring UTF-8 is that converting a string
to <code class="language-plaintext highlighter-rouge">[]rune</code> decodes strings into code points, like UCS-4, again using
<code class="language-plaintext highlighter-rouge">REPLACEMENT CHARACTER</code>:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">:=</span> <span class="s">"π</span><span class="se">\xff</span><span class="s">"</span>
    <span class="n">r</span> <span class="o">:=</span> <span class="p">[]</span><span class="kt">rune</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="m">0</span><span class="p">])</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="m">1</span><span class="p">])</span>
<span class="p">}</span>

<span class="c">// U+03c0</span>
<span class="c">// U+fffd</span>
</code></pre></div></div>

<p>So, like Julia, there’s no pretending, and the programmer explicitly
must consider the problem.</p>

<h3 id="preferences">Preferences</h3>

<p>All-in-all I probably prefer how Julia and Go are explicit with
UTF-8’s limitations, rather than Emacs Lisp’s attempt to cover it up
with an internal optimization. Since the abstraction is leaky, it may
as well be made explicit.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>An Async / Await Library for Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/03/10/"/>
    <id>urn:uuid:5d1462fa-a30d-432e-9a4f-827eb67862b2</id>
    <updated>2019-03-10T20:57:03Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="python"/><category term="javascript"/><category term="lang"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p>As part of <a href="/blog/2019/02/24/">building my Python proficiency</a>, I’ve learned how to
use <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a>. This new language feature <a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-492">first appeared in
Python 3.5</a> (<a href="https://www.python.org/dev/peps/pep-0492/">PEP 492</a>, September 2015). JavaScript grew <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function">a
nearly identical feature</a> in ES2017 (June 2017). An async function
can pause to await on an asynchronously computed result, much like a
generator pausing when it yields a value.</p>

<p>In fact, both Python and JavaScript async functions are essentially just
fancy generator functions with some specialized syntax and semantics.
That is, they’re <a href="https://blog.varunramesh.net/posts/stackless-vs-stackful-coroutines/">stackless coroutines</a>. Both languages already had
generators, so their generator-like async functions are a natural
extension that — unlike <a href="/blog/2017/06/21/"><em>stackful</em> coroutines</a> — do not require
significant, new runtime plumbing.</p>

<p>Emacs <a href="/blog/2018/05/31/">officially got generators in 25.1</a> (September 2016),
though, unlike Python and JavaScript, it didn’t require any additional
support from the compiler or runtime. It’s implemented entirely using
Lisp macros. In other words, it’s just another library, not a core
language feature. In theory, the generator library could be easily
backported to the first Emacs release to <a href="/blog/2016/12/22/">properly support lexical
closures</a>, Emacs 24.1 (June 2012).</p>
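
<p>For anyone who hasn’t used that library, here’s a minimal sketch of
what it provides. <code class="language-plaintext highlighter-rouge">iter-defun</code>, <code class="language-plaintext highlighter-rouge">iter-yield</code>, and <code class="language-plaintext highlighter-rouge">iter-next</code> come
from <code class="language-plaintext highlighter-rouge">generator.el</code>; the <code class="language-plaintext highlighter-rouge">count-to</code> generator is a made-up example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'generator)

(iter-defun count-to (n)
  "Yield the integers 1 through N, pausing after each one."
  (let ((i 1))
    (while (&lt;= i n)
      (iter-yield i)                ; pause here until the next `iter-next'
      (setq i (1+ i)))))

;; Calling the generator function returns an iterator object.
;; Each `iter-next' resumes the body up to the following yield.
(let ((it (count-to 3)))
  (list (iter-next it) (iter-next it) (iter-next it)))
;; =&gt; (1 2 3)
</code></pre></div></div>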

<p>For the same reason, stackless async/await coroutines can also be
implemented as a library. So that’s what I did, letting Emacs’ generator
library do most of the heavy lifting. The package is called <code class="language-plaintext highlighter-rouge">aio</code>:</p>

<ul>
  <li><strong><a href="https://github.com/skeeto/emacs-aio">https://github.com/skeeto/emacs-aio</a></strong></li>
</ul>

<p>It’s modeled more closely on JavaScript’s async functions than Python’s
asyncio, with the core representation being <em>promises</em> rather than
coroutine objects. I just have an easier time reasoning about promises
than coroutines.</p>

<p>I’m definitely <a href="https://github.com/chuntaro/emacs-async-await">not the first person to realize this was
possible</a>, and was beaten to the punch by two years. Wanting to
<a href="http://www.winestockwebdesign.com/Essays/Lisp_Curse.html">avoid fragmentation</a>, I set aside all formality in my first
iteration on the idea, not even bothering with namespacing my
identifiers. It was to be only an educational exercise. However, I got
quite attached to my little toy. Once I got my head wrapped around the
problem, everything just sort of clicked into place so nicely.</p>

<p>In this article I will show step-by-step one way to build async/await
on top of generators, laying out one concept at a time and then
building upon each. But first, some examples to illustrate the desired
final result.</p>

<h3 id="aio-example">aio example</h3>

<p>Ignoring <a href="/blog/2016/06/16/">all its problems</a> for a moment, suppose you want to use
<code class="language-plaintext highlighter-rouge">url-retrieve</code> to fetch some content from a URL and return it. To keep
this simple, I’m going to omit error handling. Also assume that
<code class="language-plaintext highlighter-rouge">lexical-binding</code> is <code class="language-plaintext highlighter-rouge">t</code> for all examples. Besides, lexical scope is
required by the generator library, and therefore also required by <code class="language-plaintext highlighter-rouge">aio</code>.</p>

<p>The most naive approach is to fetch the content synchronously:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fetch-fortune-1</span> <span class="p">(</span><span class="nv">url</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">url-retrieve-synchronously</span> <span class="nv">url</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The result is returned directly, and errors are communicated by an error
signal (i.e. Emacs’ version of exceptions). This is convenient, but the
function will block the main thread, locking up Emacs until the result
has arrived. This is obviously very undesirable, so, in practice,
everyone nearly always uses the asynchronous version:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fetch-fortune-2</span> <span class="p">(</span><span class="nv">url</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="nv">url</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_status</span><span class="p">)</span>
                      <span class="p">(</span><span class="nb">funcall</span> <span class="nv">callback</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The main thread no longer blocks, but it’s a whole lot less
convenient. The result isn’t returned to the caller, and instead the
caller supplies a callback function. The result, whether success or
failure, will be delivered via callback, so the caller must split
itself into two pieces: the part before the callback and the callback
itself. Errors cannot be delivered using an error signal because of the
inverted flow control.</p>

<p>The situation gets worse if, say, you need to fetch results from two
different URLs. You either fetch results one at a time (inefficient),
or you manage two different callbacks that could be invoked in any
order, and therefore have to coordinate.</p>
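
<p>To make that coordination concrete, here’s a rough sketch of the
second option. <code class="language-plaintext highlighter-rouge">fetch-both</code> is a hypothetical helper, not part of any
library. Its <code class="language-plaintext highlighter-rouge">fetch</code> argument is assumed to be any function shaped like
<code class="language-plaintext highlighter-rouge">fetch-fortune-2</code>, taking a URL and a one-argument callback.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun fetch-both (fetch url-a url-b callback)
  "Use FETCH on URL-A and URL-B, then call CALLBACK with both results.
CALLBACK receives the results in argument order, whichever arrives first."
  (let* ((results (make-vector 2 nil))
         (remaining 2)
         ;; Build a callback that stores its result in slot INDEX and
         ;; fires CALLBACK once both slots have been filled.
         (collect (lambda (index)
                    (lambda (content)
                      (aset results index content)
                      (setq remaining (1- remaining))
                      (when (= remaining 0)
                        (funcall callback
                                 (aref results 0)
                                 (aref results 1)))))))
    (funcall fetch url-a (funcall collect 0))
    (funcall fetch url-b (funcall collect 1))))
</code></pre></div></div>

<p>Even this small helper needs shared mutable state and a counter, and
it still ignores errors entirely.</p>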

<p><em>Wouldn’t it be nice for the function to work like the first example,
but be asynchronous like the second example?</em> Enter async/await:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">fetch-fortune-3</span> <span class="p">(</span><span class="nv">url</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-url-retrieve</span> <span class="nv">url</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>A function defined with <code class="language-plaintext highlighter-rouge">aio-defun</code> is just like <code class="language-plaintext highlighter-rouge">defun</code> except that
it can use <code class="language-plaintext highlighter-rouge">aio-await</code> to pause and wait on any other function defined
with <code class="language-plaintext highlighter-rouge">aio-defun</code> — or, more specifically, any function that returns a
promise. Borrowing Python parlance: Returning a promise makes a
function <em>awaitable</em>. If there’s an error, it’s delivered as an error
signal from <code class="language-plaintext highlighter-rouge">aio-url-retrieve</code>, just like the first example. When
called, this function returns immediately with a promise object that
represents a future result. The caller might look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defcustom</span> <span class="nv">fortune-url</span> <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">display-fortune</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"%s"</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">fetch-fortune-3</span> <span class="nv">fortune-url</span><span class="p">))))</span>
</code></pre></div></div>

<p>How wonderfully clean that looks! And, yes, it even works with
<code class="language-plaintext highlighter-rouge">interactive</code> like that. I can <code class="language-plaintext highlighter-rouge">M-x display-fortune</code> and a fortune is
printed in the minibuffer as soon as the result arrives from the
server. In the meantime Emacs doesn’t block and I can continue my
work.</p>

<p>You can’t do anything you couldn’t already do before. It’s just a
nicer way to organize the same callbacks: <em>implicit</em> rather than
<em>explicit</em>.</p>

<h3 id="promises-simplified">Promises, simplified</h3>

<p>The core object at play is the <em>promise</em>. Promises are already a
rather simple concept, but <code class="language-plaintext highlighter-rouge">aio</code> promises have been distilled to their
essence, as they’re only needed for this singular purpose. More on
this later.</p>

<p>As I said, a promise represents a future result. In practical terms, a
promise is just an object to which one can subscribe with a callback.
When the result is ready, the callbacks are invoked. Another way to
put it is that <em>promises <a href="https://en.wikipedia.org/wiki/Reification_(computer_science)">reify</a> the concept of callbacks</em>. A
callback is no longer just the idea of an extra argument on a function.
It’s a first-class <em>thing</em> that itself can be passed around as a
value.</p>

<p>Promises have two slots: the final promise <em>result</em> and a list of
<em>subscribers</em>. A <code class="language-plaintext highlighter-rouge">nil</code> result means the result hasn’t been computed
yet. It’s so simple I’m not even <a href="/blog/2018/02/14/">bothering with <code class="language-plaintext highlighter-rouge">cl-struct</code></a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-promise</span> <span class="p">()</span>
  <span class="s">"Create a new promise object."</span>
  <span class="p">(</span><span class="nv">record</span> <span class="ss">'aio-promise</span> <span class="no">nil</span> <span class="p">()))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">aio-promise-p</span> <span class="p">(</span><span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'aio-promise</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">object</span><span class="p">))</span>
       <span class="p">(</span><span class="nb">=</span> <span class="mi">3</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">object</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">aio-result</span> <span class="p">(</span><span class="nv">promise</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>To subscribe to a promise, use <code class="language-plaintext highlighter-rouge">aio-listen</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-listen</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">aio-result</span> <span class="nv">promise</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="nv">result</span>
        <span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="no">nil</span> <span class="nv">callback</span> <span class="nv">result</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">push</span> <span class="nv">callback</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">)))))</span>
</code></pre></div></div>

<p>If the result isn’t ready yet, add the callback to the list of
subscribers. If the result is ready, <em>call the callback in the next
event loop turn</em> using <code class="language-plaintext highlighter-rouge">run-at-time</code>. This is important because it
keeps all the asynchronous components isolated from one another. They
won’t see each other’s frames on the call stack, nor frames from
<code class="language-plaintext highlighter-rouge">aio</code>. This is so important that the <a href="https://promisesaplus.com/">Promises/A+ specification</a>
is explicit about it.</p>
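
<p>The ordering can be seen without any promise machinery. In this
small sketch, <code class="language-plaintext highlighter-rouge">demo-next-turn</code> is a hypothetical name, and the call to
<code class="language-plaintext highlighter-rouge">accept-process-output</code> is there only to give the event loop a turn
when the code runs non-interactively.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun demo-next-turn ()
  "Return the order in which the caller and a zero-delay timer ran."
  (let ((log ()))
    ;; Scheduled, not called: this runs on a later event loop turn.
    (run-at-time 0 nil (lambda () (push 'callback log)))
    ;; The caller carries on before the timer has had a chance to fire.
    (push 'caller log)
    ;; Wait briefly so the pending timer can run.
    (accept-process-output nil 0.1)
    (nreverse log)))
</code></pre></div></div>

<p>The result should be <code class="language-plaintext highlighter-rouge">(caller callback)</code>: the scheduling code always
finishes its own work before the callback runs.</p>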

<p>The other half of the equation is resolving a promise, which is done
with <code class="language-plaintext highlighter-rouge">aio-resolve</code>. Unlike other promises, <code class="language-plaintext highlighter-rouge">aio</code> promises don’t care
whether the promise is being <em>fulfilled</em> (success) or <em>rejected</em>
(error). Instead a promise is resolved using a <em>value function</em> — or,
usually, a <em>value closure</em>. Subscribers receive this value function
and extract the value by invoking it with no arguments.</p>

<p>Why? This lets the promise’s resolver decide the semantics of the
result. Instead of returning a value, this function can instead signal
an error, propagating an error signal that terminated an async function.
Because of this, the promise doesn’t need to know how it’s being
resolved.</p>
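
<p>As a small sketch of the idea, here are two value functions, one for
success and one for failure, along with a subscriber-like helper that
invokes them. The helper name <code class="language-plaintext highlighter-rouge">demo-extract</code> is hypothetical and not
part of <code class="language-plaintext highlighter-rouge">aio</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun demo-extract (value-function)
  "Call VALUE-FUNCTION as a subscriber would, trapping any error."
  (condition-case err
      (list 'ok (funcall value-function))
    (error (list 'failed (error-message-string err)))))

;; A "fulfilled" result is a closure that returns the value.
(demo-extract (lambda () 42))
;; =&gt; (ok 42)

;; A "rejected" result is a closure that signals when invoked.
(demo-extract (lambda () (error "fetch failed")))
;; =&gt; (failed "fetch failed")
</code></pre></div></div>

<p>Both cases travel through the promise as the same kind of object,
a function of no arguments, so the difference only shows up when a
subscriber calls it.</p>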

<p>When a promise is resolved, subscribers are each scheduled in their own
event loop turns in the same order that they subscribed. If a promise
has already been resolved, nothing happens. (Thought: Perhaps this
should be an error in order to catch API misuse?)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-resolve</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">value-function</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">aio-result</span> <span class="nv">promise</span><span class="p">)</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">callbacks</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">))))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">value-function</span>
            <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">)</span> <span class="p">())</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">callback</span> <span class="nv">callbacks</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="no">nil</span> <span class="nv">callback</span> <span class="nv">value-function</span><span class="p">)))))</span>
</code></pre></div></div>

<p>If you’re not an async function, you might subscribe to a promise like
so:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">promise</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">message</span> <span class="s">"%s"</span> <span class="p">(</span><span class="nb">funcall</span> <span class="nv">v</span><span class="p">))))</span>
</code></pre></div></div>

<p>The simplest example of a non-async function that creates and delivers
on a promise is a “sleep” function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-sleep</span> <span class="p">(</span><span class="nv">seconds</span> <span class="k">&amp;optional</span> <span class="nv">result</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">value-function</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
      <span class="p">(</span><span class="nv">run-at-time</span> <span class="nv">seconds</span> <span class="no">nil</span>
                   <span class="nf">#'</span><span class="nv">aio-resolve</span> <span class="nv">promise</span> <span class="nv">value-function</span><span class="p">))))</span>
</code></pre></div></div>

<p>Similarly, here’s a “timeout” promise that delivers a special timeout
error signal at a given time in the future.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-timeout</span> <span class="p">(</span><span class="nv">seconds</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">value-function</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nb">signal</span> <span class="ss">'aio-timeout</span> <span class="no">nil</span><span class="p">))))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
      <span class="p">(</span><span class="nv">run-at-time</span> <span class="nv">seconds</span> <span class="no">nil</span>
                   <span class="nf">#'</span><span class="nv">aio-resolve</span> <span class="nv">promise</span> <span class="nv">value-function</span><span class="p">))))</span>
</code></pre></div></div>

<p>That’s all there is to promises.</p>

<h3 id="evaluate-in-the-context-of-a-promise">Evaluate in the context of a promise</h3>

<p>Before we get into pausing functions, let’s deal with the slightly
simpler matter of delivering their return values using a promise. What
we need is a way to evaluate a “body” and capture its result in a
promise. If the body exits due to a signal, we want to capture that as
well.</p>

<p>Here’s a macro that does just this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-with-promise</span> <span class="p">(</span><span class="nv">promise</span> <span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">aio-resolve</span> <span class="o">,</span><span class="nv">promise</span>
                <span class="p">(</span><span class="nv">condition-case</span> <span class="nb">error</span>
                    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="k">progn</span> <span class="o">,@</span><span class="nv">body</span><span class="p">)))</span>
                      <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">))</span>
                  <span class="p">(</span><span class="nb">error</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
                           <span class="p">(</span><span class="nb">signal</span> <span class="p">(</span><span class="nb">car</span> <span class="nb">error</span><span class="p">)</span> <span class="c1">; rethrow</span>
                                   <span class="p">(</span><span class="nb">cdr</span> <span class="nb">error</span><span class="p">)))))))</span>
</code></pre></div></div>

<p>The body result is captured in a closure and delivered to the promise.
If there’s an error signal, it’s “<em>rethrown</em>” into subscribers by the
promise’s value function.</p>

<p>This is where Emacs Lisp has a serious weak spot. There’s not really a
concept of rethrowing a signal. Unlike a language with explicit
exception objects that can capture a snapshot of the backtrace, the
original backtrace is completely lost where the signal is caught.
There’s no way to “reattach” it to the signal when it’s rethrown. This
is unfortunate because it would greatly help debugging if you got to see
the full backtrace on the other side of the promise.</p>

<h3 id="async-functions">Async functions</h3>

<p>So we have promises and we want to pause a function on a promise.
Generators have <code class="language-plaintext highlighter-rouge">iter-yield</code> for pausing an iterator’s execution. To
tackle this problem:</p>

<ol>
  <li>Yield the promise to pause the iterator.</li>
  <li>Subscribe a callback on the promise that continues the generator
(<code class="language-plaintext highlighter-rouge">iter-next</code>) with the promise’s result as the yield result.</li>
</ol>

<p>All the hard work is done on either side of the yield, so <code class="language-plaintext highlighter-rouge">aio-await</code> is
just a simple wrapper around <code class="language-plaintext highlighter-rouge">iter-yield</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-await</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">iter-yield</span> <span class="o">,</span><span class="nv">expr</span><span class="p">)))</span>
</code></pre></div></div>

<p>Remember that <code class="language-plaintext highlighter-rouge">funcall</code> is here to extract the promise value from the
value function. If it signals an error, this propagates directly into
the iterator just as if it had been a direct call — minus an accurate
backtrace.</p>

<p>So <code class="language-plaintext highlighter-rouge">aio-lambda</code> / <code class="language-plaintext highlighter-rouge">aio-defun</code> needs to wrap the body in a generator
(<code class="language-plaintext highlighter-rouge">iter-lambda</code>), invoke it to produce an iterator, then drive the
iterator using callbacks. Here’s a simplified, unhygienic definition of
<code class="language-plaintext highlighter-rouge">aio-lambda</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-lambda</span> <span class="p">(</span><span class="nv">arglist</span> <span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
           <span class="p">(</span><span class="nv">iter</span> <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="nv">iter-lambda</span> <span class="o">,</span><span class="nv">arglist</span>
                          <span class="p">(</span><span class="nv">aio-with-promise</span> <span class="nv">promise</span>
                            <span class="o">,@</span><span class="nv">body</span><span class="p">))</span>
                        <span class="nv">args</span><span class="p">)))</span>
       <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
         <span class="p">(</span><span class="nv">aio--step</span> <span class="nv">iter</span> <span class="nv">promise</span> <span class="no">nil</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The body is evaluated inside <code class="language-plaintext highlighter-rouge">aio-with-promise</code> with the result
delivered to the promise returned directly by the async function.</p>

<p>Before returning, the iterator is handed to <code class="language-plaintext highlighter-rouge">aio--step</code>, which drives
the iterator forward until it delivers its first promise. When the
iterator yields a promise, <code class="language-plaintext highlighter-rouge">aio--step</code> attaches a callback back to
itself on the promise as described above. Immediately driving the
iterator up to the first yielded promise “primes” it, which is
important for getting the ball rolling on any asynchronous operations.</p>

<p>If the iterator ever yields something other than a promise, it’s
delivered right back into the iterator.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio--step</span> <span class="p">(</span><span class="nv">iter</span> <span class="nv">promise</span> <span class="nv">yield-result</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">condition-case</span> <span class="nv">_</span>
      <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">result</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">iter-next</span> <span class="nv">iter</span> <span class="nv">yield-result</span><span class="p">)</span>
               <span class="nv">then</span> <span class="p">(</span><span class="nv">iter-next</span> <span class="nv">iter</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">))</span>
               <span class="nv">until</span> <span class="p">(</span><span class="nv">aio-promise-p</span> <span class="nv">result</span><span class="p">)</span>
               <span class="nv">finally</span> <span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">result</span>
                                   <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">value</span><span class="p">)</span>
                                     <span class="p">(</span><span class="nv">aio--step</span> <span class="nv">iter</span> <span class="nv">promise</span> <span class="nv">value</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">iter-end-of-sequence</span><span class="p">)))</span>
</code></pre></div></div>

<p>When the iterator is done, nothing more needs to happen since the
iterator resolves its own return value promise.</p>

<p>The definition of <code class="language-plaintext highlighter-rouge">aio-defun</code> just uses <code class="language-plaintext highlighter-rouge">aio-lambda</code> with <code class="language-plaintext highlighter-rouge">defalias</code>.
There’s nothing to it.</p>

<p>That’s everything you need! Everything else in the package is merely
useful, awaitable functions like <code class="language-plaintext highlighter-rouge">aio-sleep</code> and <code class="language-plaintext highlighter-rouge">aio-timeout</code>.</p>

<h3 id="composing-promises">Composing promises</h3>

<p>Unfortunately <code class="language-plaintext highlighter-rouge">url-retrieve</code> doesn’t support timeouts. We can work
around this by composing two promises: a <code class="language-plaintext highlighter-rouge">url-retrieve</code> promise and an
<code class="language-plaintext highlighter-rouge">aio-timeout</code> promise. First define a promise-returning function,
<code class="language-plaintext highlighter-rouge">aio-select</code>, that takes a list of promises and returns (as another
promise) the first promise to resolve:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-select</span> <span class="p">(</span><span class="nv">promises</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">result</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">promises</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">promise</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span>
                              <span class="p">(</span><span class="nv">aio-resolve</span>
                               <span class="nv">result</span>
                               <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">promise</span><span class="p">))))))))</span>
</code></pre></div></div>

<p>We give <code class="language-plaintext highlighter-rouge">aio-select</code> both our <code class="language-plaintext highlighter-rouge">url-retrieve</code> and <code class="language-plaintext highlighter-rouge">timeout</code> promises, and
it tells us which resolved first:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">fetch-fortune-4</span> <span class="p">(</span><span class="nv">url</span> <span class="nv">timeout</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">promises</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">aio-url-retrieve</span> <span class="nv">url</span><span class="p">)</span>
                         <span class="p">(</span><span class="nv">aio-timeout</span> <span class="nv">timeout</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">fastest</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-select</span> <span class="nv">promises</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="nv">fastest</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Cool! Note: This will not actually cancel the URL request, just move
the async function forward earlier and prevent it from getting the
result.</p>

<h3 id="threads">Threads</h3>

<p>Despite <code class="language-plaintext highlighter-rouge">aio</code> being entirely about managing concurrent, asynchronous
operations, it has nothing at all to do with threads — as in Emacs 26’s
support for kernel threads. All async functions and promise callbacks
are expected to run <em>only</em> on the main thread. That’s not to say an
async function can’t await on a result from another thread. It just must
be <a href="/blog/2017/02/14/">done very carefully</a>.</p>

<h3 id="processes">Processes</h3>

<p>The package also includes two functions for realizing promises on
processes, whether they be subprocesses or network sockets.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">aio-process-filter</code></li>
  <li><code class="language-plaintext highlighter-rouge">aio-process-sentinel</code></li>
</ul>

<p>For example, this function loops over each chunk of output (typically
4kB) from the process, as delivered to a filter function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">process-chunks</span> <span class="p">(</span><span class="nv">process</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">chunk</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-process-filter</span> <span class="nv">process</span><span class="p">))</span>
           <span class="nv">while</span> <span class="nv">chunk</span>
           <span class="nb">do</span> <span class="p">(</span><span class="o">...</span> <span class="nv">process</span> <span class="nv">chunk</span> <span class="o">...</span><span class="p">)))</span>
</code></pre></div></div>

<p>Exercise for the reader: Write an awaitable function that returns a line
at a time rather than a chunk at a time. You can build it on top of
<code class="language-plaintext highlighter-rouge">aio-process-filter</code>.</p>

<p>I considered wrapping functions like <code class="language-plaintext highlighter-rouge">start-process</code> so that their <code class="language-plaintext highlighter-rouge">aio</code>
versions would return a promise representing some kind of result from
the process. However there are <em>so</em> many different ways to create and
configure processes that I would have ended up duplicating all the
process functions. Focusing on the filter and sentinel, and letting the
caller create and configure the process is much cleaner.</p>

<p>Unfortunately Emacs has no asynchronous API for writing output to a
process. Both <code class="language-plaintext highlighter-rouge">process-send-string</code> and <code class="language-plaintext highlighter-rouge">process-send-region</code> will block
if the pipe or socket is full. There is no callback, so you cannot await
on writing output. Maybe there’s a way to do it with a dedicated thread?</p>

<p>Another issue is that the <code class="language-plaintext highlighter-rouge">process-send-*</code> functions <a href="/blog/2013/01/14/">are
preemptible</a>, made necessary because they block. The
<code class="language-plaintext highlighter-rouge">aio-process-*</code> functions leave a gap (i.e. between filter awaits)
where no filter or sentinel function is attached. It’s a consequence
of promises being single-fire. The gap is harmless so long as the
async function doesn’t await something else or get preempted. This
needs some more thought.</p>

<p><strong><em>Update</em></strong>: These process functions no longer exist and have been
replaced by a small framework for building chains of promises. See
<code class="language-plaintext highlighter-rouge">aio-make-callback</code>.</p>

<h3 id="testing-aio">Testing aio</h3>

<p>The test suite for <code class="language-plaintext highlighter-rouge">aio</code> is a bit unusual. Emacs’ built-in test suite,
ERT, doesn’t support asynchronous tests. Furthermore, tests are
generally run in batch mode, where Emacs invokes a single function and
then exits rather than pump an event loop. Batch mode can only handle
asynchronous process I/O, not the async functions of <code class="language-plaintext highlighter-rouge">aio</code>. So it’s
not possible to run the tests in batch mode.</p>

<p>Instead I hacked together a really crude callback-based test suite. It
runs in non-batch mode and writes the test results into a buffer
(run with <code class="language-plaintext highlighter-rouge">make check</code>). Not ideal, but it works.</p>

<p>One of the tests is a sleep sort (with reasonable tolerances). It’s a
pretty neat demonstration of what you can do with <code class="language-plaintext highlighter-rouge">aio</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">sleep-sort</span> <span class="p">(</span><span class="nb">values</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promises</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nv">aio-sleep</span> <span class="nv">v</span> <span class="nv">v</span><span class="p">))</span> <span class="nb">values</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">while</span> <span class="nv">promises</span>
             <span class="nv">for</span> <span class="nv">next</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-select</span> <span class="nv">promises</span><span class="p">))</span>
             <span class="nb">do</span> <span class="p">(</span><span class="nb">setf</span> <span class="nv">promises</span> <span class="p">(</span><span class="nv">delq</span> <span class="nv">next</span> <span class="nv">promises</span><span class="p">))</span>
             <span class="nv">collect</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="nv">next</span><span class="p">))))</span>
</code></pre></div></div>

<p>To see it in action (<code class="language-plaintext highlighter-rouge">M-x sleep-sort-demo</code>):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">sleep-sort-demo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">values</span> <span class="o">'</span><span class="p">(</span><span class="mf">0.1</span> <span class="mf">0.4</span> <span class="mf">1.1</span> <span class="mf">0.2</span> <span class="mf">0.8</span> <span class="mf">0.6</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">message</span> <span class="s">"%S"</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">sleep-sort</span> <span class="nb">values</span><span class="p">)))))</span>
</code></pre></div></div>

<h3 id="asyncawait-is-pretty-awesome">Async/await is pretty awesome</h3>

<p>I’m quite happy with how this all came together. Once I had the
concepts straight — particularly resolving to value functions —
everything made sense and all the parts fit together well, and mostly
by accident. That feels good.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>The CPython Bytecode Compiler is Dumb</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/02/24/"/>
    <id>urn:uuid:4348d611-858b-4f48-a6f5-6e4b93f71a34</id>
    <updated>2019-02-24T21:56:35Z</updated>
    <category term="python"/><category term="lua"/><category term="lang"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p><em>This article was <a href="https://news.ycombinator.com/item?id=19241545">discussed on Hacker News</a>.</em></p>

<p>Due to sheer coincidence of several unrelated tasks converging on
Python at work, I recently needed to brush up on my Python skills. So
far for me, Python has been little more than <a href="/blog/2017/05/15/">a fancy extension
language for BeautifulSoup</a>, though I also used it to participate
in the recent tradition of <a href="https://github.com/skeeto/qualbum">writing one’s own static site
generator</a>, in this case for <a href="http://photo.nullprogram.com/">my wife’s photo blog</a>.
I’ve been reading through <em>Fluent Python</em> by Luciano Ramalho, and it’s
been quite effective at getting me up to speed.</p>

<!--more-->

<p>As I write Python, <a href="/blog/2014/01/04/">like with Emacs Lisp</a>, I can’t help but
consider what exactly is happening inside the interpreter. I wonder if
the code I’m writing is putting undue constraints on the bytecode
compiler and limiting its options. Ultimately I’d like the code I
write <a href="/blog/2017/01/30/">to drive the interpreter efficiently and effectively</a>.
<a href="https://www.python.org/dev/peps/pep-0020/">The Zen of Python</a> says there should be “only one obvious way to do
it,” but in practice there’s a lot of room for expression. Given
multiple ways to express the same algorithm or idea, I tend to prefer
the one that compiles to the more efficient bytecode.</p>

<p>Fortunately CPython, the main and most widely used implementation of
Python, is very transparent about its bytecode. It’s easy to inspect
and reason about its bytecode. The disassembly listing is easy to read
and understand, and I can always follow it without consulting the
documentation. This contrasts sharply with modern JavaScript engines
and their opaque use of JIT compilation, where performance is guided
by obeying certain patterns (<a href="https://www.youtube.com/watch?v=UJPdhx5zTaw">hidden classes</a>, etc.), helping the
compiler <a href="https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/">understand my program’s types</a>, and being careful
not to unnecessarily constrain the compiler.</p>

<p>So, besides just catching up with Python the language, I’ve been
studying the bytecode disassembly of the functions that I write. One
fact has become quite apparent: <strong>the CPython bytecode compiler is
pretty dumb</strong>. With a few exceptions, it’s a very literal translation
of a Python program, and there is almost <a href="https://legacy.python.org/workshops/1998-11/proceedings/papers/montanaro/montanaro.html">no optimization</a>.
Below I’ll demonstrate a case where it’s possible to detect one of the
missed optimizations without inspecting the bytecode disassembly
thanks to a small abstraction leak in the optimizer.</p>

<p>To be clear: This isn’t to say CPython is bad, or even that it should
necessarily change. In fact, as I’ll show, <strong>dumb bytecode compilers
are par for the course</strong>. In the past I’ve lamented how the Emacs Lisp
compiler could do a better job, but CPython and Lua are operating at
the same level. There are benefits to a dumb and straightforward
bytecode compiler: the compiler itself is simpler, easier to maintain,
and more amenable to modification (e.g. as Python continues to
evolve). It’s also easier to debug Python (<code class="language-plaintext highlighter-rouge">pdb</code>) because it’s such a
close match to the source listing.</p>

<p><em>Update</em>: <a href="https://codewords.recurse.com/issues/seven/dragon-taming-with-tailbiter-a-bytecode-compiler">Darius Bacon points out</a> that Guido van Rossum
himself said, “<a href="https://books.google.com/books?id=bIxWAgAAQBAJ&amp;pg=PA26&amp;lpg=PA26&amp;dq=%22Python+is+about+having+the+simplest,+dumbest+compiler+imaginable.%22&amp;source=bl&amp;ots=2OfDoWX321&amp;sig=ACfU3U32jKZBE3VkJ0gvkKbxRRgD0bnoRg&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjZ1quO89bgAhWpm-AKHfckAxUQ6AEwAHoECAkQAQ#v=onepage&amp;q=%22Python%20is%20about%20having%20the%20simplest%2C%20dumbest%20compiler%20imaginable.%22&amp;f=false">Python is about having the simplest, dumbest compiler
imaginable.</a>” So this is all very much by design.</p>

<p>The consensus seems to be that if you want or need better performance,
use something other than Python. (And if you can’t do that, at least use
<a href="https://pypy.org/">PyPy</a>.) That’s a fairly reasonable and healthy goal. Still, if
I’m writing Python, I’d like to do the best I can, which means
exploiting the optimizations that <em>are</em> available when possible.</p>

<h3 id="disassembly-examples">Disassembly examples</h3>

<p>I’m going to compare three bytecode compilers in this article: CPython
3.7, Lua 5.3, and Emacs 26.1. Each of these languages is dynamically
typed and primarily executed on a bytecode virtual machine, and each
makes it easy to access its disassembly listings. One caveat: CPython and Emacs
use a stack-based virtual machine while Lua uses a register-based
virtual machine.</p>

<p>For CPython I’ll be using the <code class="language-plaintext highlighter-rouge">dis</code> module. For Emacs Lisp I’ll use <code class="language-plaintext highlighter-rouge">M-x
disassemble</code>, and all code will use lexical scoping. In Lua I’ll use
<code class="language-plaintext highlighter-rouge">luac -l</code> on the command line.</p>
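
<p>As a rough sketch of the CPython side, a listing can be printed,
captured as a string, or walked one instruction at a time. The function
<code class="language-plaintext highlighter-rouge">add_one</code> is a made-up stand-in, not one of the examples in this
article, and the exact opcodes will vary between CPython versions:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import dis


def add_one(n):
    # Hypothetical function, only here to have something to inspect.
    return n + 1


# Print a listing in the same format as the ones in this article.
dis.dis(add_one)

# The same listing captured as a string instead of printed.
listing = dis.Bytecode(add_one).dis()

# The individual instructions as objects, reduced to opcode names.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)
</code></pre></div></div>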

<h3 id="local-variable-elimination">Local variable elimination</h3>

<p>Will the bytecode compiler eliminate local variables? Keeping the
variable around potentially involves allocating memory for it, assigning
to it, and accessing it. Take this example:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>

<p>This function is equivalent to:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="mi">0</span>
</code></pre></div></div>

<p>Despite this, CPython completely misses this optimization for both <code class="language-plaintext highlighter-rouge">x</code>
and <code class="language-plaintext highlighter-rouge">y</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (0)
              2 STORE_FAST               0 (x)
  3           4 LOAD_CONST               2 (1)
              6 STORE_FAST               1 (y)
  4           8 LOAD_FAST                0 (x)
             10 RETURN_VALUE
</code></pre></div></div>

<p>It assigns both variables, and even loads again from <code class="language-plaintext highlighter-rouge">x</code> for the return.
Missed optimizations, but, as I said, by keeping these variables around,
debugging is more straightforward. Users can always inspect variables.</p>
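
<p>The surviving locals can also be observed without a listing, since
the code object records them. A small sketch, assuming nothing beyond
the standard <code class="language-plaintext highlighter-rouge">__code__</code> attributes:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def foo():
    x = 0
    y = 1
    return x


code = foo.__code__

# Names of the local variables the compiler kept, in slot order.
print(code.co_varnames)

# Number of local variable slots reserved in each call frame.
print(code.co_nlocals)
</code></pre></div></div>

<p>Had the compiler dropped the dead store, <code class="language-plaintext highlighter-rouge">y</code> would presumably
not get a slot at all. Here both names should be reported.</p>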

<p>How about Lua?</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="kd">local</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="kd">local</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">x</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It also misses this optimization, though it matters a little less due to
its architecture (the return instruction references a register
regardless of whether or not that register is allocated to a local
variable):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     LOADK           0 -1    ; 0
        2       [3]     LOADK           1 -2    ; 1
        3       [4]     RETURN          0 2
        4       [5]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp also misses it:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">y</span> <span class="mi">1</span><span class="p">))</span>
    <span class="nv">x</span><span class="p">))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  0
1	constant  1
2	stack-ref 1
3	return
</code></pre></div></div>

<p>All three are on the same page.</p>

<h3 id="constant-folding">Constant folding</h3>

<p>Does the bytecode compiler evaluate simple constant expressions at
compile time? This is simple and everyone does it.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">/</span> <span class="mi">4</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (2.5)
              2 RETURN_VALUE
</code></pre></div></div>

<p>Lua:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="k">return</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">/</span> <span class="mi">4</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     LOADK           0 -1    ; 2.5
        2       [2]     RETURN          0 2
        3       [3]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)</span> <span class="mf">4.0</span><span class="p">))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  2.5
1	return
</code></pre></div></div>

<p>That’s something we can count on so long as the operands are all
numeric literals (or also, for Python, string literals) that are
visible to the compiler. Don’t count on your operator overloads to
work here, though.</p>
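
<p>One way to check this without reading a disassembly is to look at the
function’s constant table. This is a small sketch that relies on CPython’s
<code class="language-plaintext highlighter-rouge">co_consts</code> attribute, which is an implementation detail; the name
<code class="language-plaintext highlighter-rouge">folded</code> is just for illustration.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def folded():
    return 1 + 2 * 3 / 4

# The folded result is stored in the function's constant table.
print(folded.__code__.co_consts)
print(2.5 in folded.__code__.co_consts)
</code></pre></div></div>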

<h3 id="allocation-optimization">Allocation optimization</h3>

<p>Optimizers often perform <em>escape analysis</em>, to determine if objects
allocated in a function ever become visible outside of that function. If
they don’t then these objects could potentially be stack-allocated
(instead of heap-allocated) or even be eliminated entirely.</p>

<p>None of the bytecode compilers are this sophisticated. However CPython
does have a trick up its sleeve: tuple optimization. Since tuples are
immutable, in certain circumstances CPython will reuse them and avoid
both the constructor and the allocation.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>

<p>Check it out, the tuple is used as a constant:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 ((1, 2, 3))
              2 RETURN_VALUE
</code></pre></div></div>

<p>Which we can detect by evaluating <code class="language-plaintext highlighter-rouge">foo() is foo()</code>, which is <code class="language-plaintext highlighter-rouge">True</code>.
Though deviate from this too much and the optimization is disabled.
Remember how CPython can’t optimize away variables, and that they
break constant folding? They break this, too:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (x)
  3           4 LOAD_FAST                0 (x)
              6 LOAD_CONST               2 (2)
              8 LOAD_CONST               3 (3)
             10 BUILD_TUPLE              3
             12 RETURN_VALUE
</code></pre></div></div>

<p>This function might document that it always returns a simple tuple,
but we can tell if it’s being optimized or not using <code class="language-plaintext highlighter-rouge">is</code> like before:
<code class="language-plaintext highlighter-rouge">foo() is foo()</code> is now <code class="language-plaintext highlighter-rouge">False</code>! In some future version of Python with
a cleverer bytecode compiler, that expression might evaluate to
<code class="language-plaintext highlighter-rouge">True</code>. (Unless the <a href="https://docs.python.org/3/reference/">Python language specification</a> is specific
about this case, which I didn’t check.)</p>
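
<p>Putting the two variants side by side makes the difference easy to
observe. This is a sketch of behavior seen on CPython, not something
guaranteed by the language; the function names are made up for the
comparison.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def constant_tuple():
    return (1, 2, 3)

def built_tuple():
    x = 1
    return (x, 2, 3)

# Equal values either way; only object identity differs.
print(constant_tuple() == built_tuple())
print(constant_tuple() is constant_tuple())
print(built_tuple() is built_tuple())
</code></pre></div></div>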

<p>Note: Curiously PyPy replicates this exact behavior when examined with
<code class="language-plaintext highlighter-rouge">is</code>. Was that deliberate? I’m impressed that PyPy matches CPython’s
semantics so closely here.</p>

<p>Putting a mutable value, such as a list, in the tuple will also break
this optimization. But that’s not the compiler being dumb. That’s a
hard constraint on the compiler: the caller might change the mutable
component of the tuple, so it must always return a fresh copy.</p>

<p>Neither Lua nor Emacs Lisp has a language-level equivalent of
an immutable tuple, so there’s nothing to compare.</p>

<p>Other than the tuple situation in CPython, none of the bytecode
compilers eliminate unnecessary intermediate objects.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">return</span> <span class="p">[</span><span class="mi">1024</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  2           0 LOAD_CONST               1 (1024)
              2 BUILD_LIST               1
              4 LOAD_CONST               2 (0)
              6 BINARY_SUBSCR
              8 RETURN_VALUE
</code></pre></div></div>

<p>Lua:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="nf">foo</span><span class="p">()</span>
    <span class="k">return</span> <span class="p">({</span><span class="mi">1024</span><span class="p">})[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        1       [2]     NEWTABLE        0 1 0
        2       [2]     LOADK           1 -1    ; 1024
        3       [2]     SETLIST         0 1 1   ; 1
        4       [2]     GETTABLE        0 0 -2  ; 1
        5       [2]     RETURN          0 2
        6       [3]     RETURN          0 1
</code></pre></div></div>

<p>Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1024</span><span class="p">)))</span>
</code></pre></div></div>

<p>Disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0	constant  1024
1	list1
2	car
3	return
</code></pre></div></div>

<h3 id="dont-expect-too-much">Don’t expect too much</h3>

<p>I could go on with lots of examples, looking at loop optimizations and
so on, and each case is almost certainly unoptimized. The general rule
of thumb is to simply not expect much from these bytecode compilers.
They’re very literal in their translation.</p>

<p>Working so much in C has put me in the habit of expecting all obvious
optimizations from the compiler. This frees me to be more expressive
in my code. Lots of things are cost-free thanks to these
optimizations, such as breaking a complex expression up into several
variables, naming my constants, or not using a local variable to
manually cache memory accesses. I’m confident the compiler will
optimize away my expressiveness. The catch is that <a href="/blog/2018/05/01/">clever compilers
can take things too far</a>, so I’ve got to be mindful of how it might
undermine my intentions — i.e. when I’m doing something unusual or not
strictly permitted.</p>

<p>These bytecode compilers will never truly surprise me. The cost is
that being more expressive in Python, Lua, or Emacs Lisp may reduce
performance at run time because it shows in the bytecode. Usually this
doesn’t matter, but sometimes it does.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs 26 Brings Generators and Threads</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/05/31/"/>
    <id>urn:uuid:395c5e11-2088-32fa-53c8-0c749dca2064</id>
    <updated>2018-05-31T17:45:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>Emacs 26.1 was <a href="https://lists.gnu.org/archive/html/emacs-devel/2018-05/msg00765.html">recently released</a>. As you would expect from a
major release, it comes with lots of new goodies. Being <a href="/tags/emacs/">a bit of an
Emacs Lisp enthusiast</a>, the two most interesting new features
are <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Generators.html">generators</a> (<code class="language-plaintext highlighter-rouge">iter</code>) and <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Threads.html">native threads</a>
(<code class="language-plaintext highlighter-rouge">thread</code>).</p>

<p><strong>Correction</strong>: Generators were actually introduced in Emacs 25.1
(Sept. 2016), not Emacs 26.1. Doh!</p>

<p><strong>Update</strong>: <a href="https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual">ThreadSanitizer (TSan)</a> quickly shows that Emacs’
threading implementation has many data races, making it <a href="https://hboehm.info/boehm-hotpar11.pdf">completely
untrustworthy</a>. Until this is fixed, <strong><em>nobody</em> should use Emacs
threads for any purpose</strong>, and threads should be disabled at compile time.</p>

<!--more-->

<h3 id="generators">Generators</h3>

<p>Generators are one of those cool language features that provide a lot of
power at a small implementation cost. They’re like a constrained form of
coroutines, but, unlike coroutines, they’re typically built entirely on
top of first-class functions (e.g. closures). This means <em>no additional
run-time support is needed</em> in order to add generators to a language.
The only complications are the changes to the compiler. Generators are
not compiled the same way as normal functions despite looking so
similar.</p>

<p>What’s perhaps coolest of all about lisp-family generators, including
Emacs Lisp, is that the compiler component can be <em>implemented
entirely with macros</em>. The compiler need not be modified at all,
making generators no more than a library, and not actually part of the
language. That’s exactly how they’ve been implemented in Emacs Lisp
(<code class="language-plaintext highlighter-rouge">emacs-lisp/generator.el</code>).</p>

<p>So what’s a generator? It’s a function that returns an <em>iterator
object</em>. When an iterator object is invoked (e.g. <code class="language-plaintext highlighter-rouge">iter-next</code>) it
evaluates the body of the generator. Each iterator is independent.
What makes them unusual (and useful) is that the evaluation is
<em>paused</em> in the middle of the body to return a value, saving all the
internal state in the iterator. Normally pausing in the middle of
functions isn’t possible, which is what requires the special compiler
support.</p>

<p>Emacs Lisp generators appear to be most closely modeled after <a href="https://wiki.python.org/moin/Generators">Python
generators</a>, though they also share some similarities with
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_Generators">JavaScript generators</a>. What makes it most like Python is the use
of signals for flow control — something I’m <a href="http://wiki.c2.com/?DontUseExceptionsForFlowControl">not personally enthused
about</a>. When a Python generator
completes, it raises a <code class="language-plaintext highlighter-rouge">StopIteration</code> exception. In Emacs Lisp, it’s
an <code class="language-plaintext highlighter-rouge">iter-end-of-sequence</code> signal. A signal is out-of-band and avoids
the issue of relying on some special in-band value to communicate the end
of iteration.</p>

<p>In contrast, JavaScript’s solution is to return a “rich” object wrapping
the actual yield value. This object has a <code class="language-plaintext highlighter-rouge">done</code> field that communicates
whether iteration has completed. This avoids the use of exceptions for
flow control, but the caller has to unpack the rich object.</p>
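
<p>The two protocols can be contrasted in a few lines of Python. This is
only a sketch: <code class="language-plaintext highlighter-rouge">next_rich</code> is a hypothetical helper that imitates the
JavaScript-style result object, not a standard function in either language.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def walk(items):
    for item in items:
        yield item

# Python style: the end of iteration arrives out-of-band as an exception.
it = walk(["a", "b"])
seen = []
try:
    while True:
        seen.append(next(it))
except StopIteration:
    pass

# JavaScript style, imitated: every step returns a wrapper with a "done"
# flag. next_rich is a hypothetical helper, not a standard function.
def next_rich(iterator):
    try:
        return {"value": next(iterator), "done": False}
    except StopIteration:
        return {"value": None, "done": True}

it = walk(["a", "b"])
steps = [next_rich(it) for _ in range(3)]
print(seen)
print(steps)
</code></pre></div></div>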

<p>Fortunately the flow control issue isn’t normally exposed to Emacs Lisp
code. Most of the time you’ll use the <code class="language-plaintext highlighter-rouge">iter-do</code> macro or (my preference)
the new <code class="language-plaintext highlighter-rouge">cl-loop</code> keyword <code class="language-plaintext highlighter-rouge">iter-by</code>.</p>

<p>To illustrate how a generator works, here’s a really simple iterator
that iterates over a list:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">iter-defun</span> <span class="nv">walk</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">while</span> <span class="nb">list</span>
    <span class="p">(</span><span class="nv">iter-yield</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>Here’s how it might be used:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">i</span> <span class="p">(</span><span class="nv">walk</span> <span class="o">'</span><span class="p">(</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="ss">:c</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; =&gt; :a</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; =&gt; :b</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; =&gt; :c</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; error: iter-end-of-sequence</span>
</code></pre></div></div>

<p>The iterator object itself is <em>opaque</em> and you shouldn’t rely on any
part of its structure. That being said, I’m a firm believer that we
should understand how things work underneath the hood so that we can
make the most effective use of them. No program should rely on the
particulars of the iterator object internals for <em>correctness</em>, but a
well-written program should employ them in a way that <a href="/blog/2017/01/30/">best exploits
their expected implementation</a>.</p>

<p>Currently iterator objects are closures, and <code class="language-plaintext highlighter-rouge">iter-next</code> invokes the
closure with its own internal protocol. It asks the closure to return
the next value (<code class="language-plaintext highlighter-rouge">:next</code> operation), and <code class="language-plaintext highlighter-rouge">iter-close</code> asks it to clean
itself up (<code class="language-plaintext highlighter-rouge">:close</code> operation).</p>

<p>Since they’re just closures, another <em>really</em> cool thing about Emacs
Lisp generators is that <a href="/blog/2013/12/30/">iterator objects are generally readable</a>.
That is, you can serialize them out with <code class="language-plaintext highlighter-rouge">print</code> and bring them back to
life with <code class="language-plaintext highlighter-rouge">read</code>, even in another instance of Emacs. They exist
independently of the original generator function. This will not work if
one of the values captured in the iterator object is not readable (e.g.
buffers).</p>

<p>How does pausing work? Well, one of the other exciting new features of
Emacs 26 is the introduction of a jump table opcode, <code class="language-plaintext highlighter-rouge">switch</code>. I’d
lamented in the past that large <code class="language-plaintext highlighter-rouge">cond</code> and <code class="language-plaintext highlighter-rouge">cl-case</code> expressions could
be a lot more efficient if Emacs’ byte code supported jump tables. It
turns an O(n) sequence of comparisons into an O(1) lookup and jump.
It’s essentially the perfect foundation for a generator since it can
be used to <a href="https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html">jump straight back to the position</a> where evaluation
was paused.</p>

<p><em>Buuut</em>, generators do not currently use jump tables. The generator
library predates the new <code class="language-plaintext highlighter-rouge">switch</code> opcode, and, being independent of it,
its author, Daniel Colascione, went with the best option at the time.
Chunks of code between yields are packaged as individual closures. These
closures are linked together a bit like nodes in a graph, creating a
sort of state machine. To get the next value, the iterator object
invokes the closure representing the next state.</p>

<p>I’ve <em>manually</em> macro expanded the <code class="language-plaintext highlighter-rouge">walk</code> generator above into a form
that <em>roughly</em> resembles the expansion of <code class="language-plaintext highlighter-rouge">iter-defun</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">walk</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">state</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">cl-flet*</span> <span class="p">((</span><span class="nv">state-2</span> <span class="p">()</span>
                 <span class="p">(</span><span class="nb">signal</span> <span class="ss">'iter-end-of-sequence</span> <span class="no">nil</span><span class="p">))</span>
               <span class="p">(</span><span class="nv">state-1</span> <span class="p">()</span>
                 <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">)</span>
                   <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                     <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="nf">#'</span><span class="nv">state-2</span><span class="p">))))</span>
               <span class="p">(</span><span class="nv">state-0</span> <span class="p">()</span>
                 <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">state-2</span><span class="p">)</span>
                   <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="nf">#'</span><span class="nv">state-1</span><span class="p">)</span>
                   <span class="p">(</span><span class="nv">state-1</span><span class="p">))))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="nf">#'</span><span class="nv">state-0</span><span class="p">)</span>
      <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
        <span class="p">(</span><span class="nb">funcall</span> <span class="nv">state</span><span class="p">)))))</span>
</code></pre></div></div>

<p>This omits the protocol I mentioned, and it doesn’t have yield results
(values passed to the iterator). The actual expansion is a whole lot
messier and less optimal than this, but hopefully my hand-rolled
generator is illustrative enough. Without the protocol, this iterator is
stepped using <code class="language-plaintext highlighter-rouge">funcall</code> rather than <code class="language-plaintext highlighter-rouge">iter-next</code>.</p>

<p>The <code class="language-plaintext highlighter-rouge">state</code> variable keeps track of where in the body of the generator
this iterator is currently “paused.” Continuing the iterator is
therefore just a matter of invoking the closure that represents this
state. Each state closure may update <code class="language-plaintext highlighter-rouge">state</code> to point to a new part of
the generator body. The terminal state is obviously <code class="language-plaintext highlighter-rouge">state-2</code>. Notice
how state transitions occur around branches.</p>

<p>I had said generators can be implemented as a library in Emacs Lisp.
Unfortunately there’s a hole in this: <code class="language-plaintext highlighter-rouge">unwind-protect</code>. It’s not valid to
yield inside an <code class="language-plaintext highlighter-rouge">unwind-protect</code> form. Unlike, say, a throw-catch,
there’s no mechanism to trap an unwinding stack so that it can be
restarted later. The state closure needs to return and fall through the
<code class="language-plaintext highlighter-rouge">unwind-protect</code>.</p>

<p>A jump table version of the generator might look like the following.
I’ve used <code class="language-plaintext highlighter-rouge">cl-labels</code> since it allows for recursion.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">walk</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">state</span> <span class="mi">0</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">cl-labels</span>
        <span class="p">((</span><span class="nv">closure</span> <span class="p">()</span>
           <span class="p">(</span><span class="nv">cl-case</span> <span class="nv">state</span>
             <span class="p">(</span><span class="mi">0</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                    <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="mi">2</span><span class="p">)</span>
                  <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="mi">1</span><span class="p">))</span>
                <span class="p">(</span><span class="nv">closure</span><span class="p">))</span>
             <span class="p">(</span><span class="mi">1</span> <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">)</span>
                  <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                    <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="mi">2</span><span class="p">))))</span>
             <span class="p">(</span><span class="mi">2</span> <span class="p">(</span><span class="nb">signal</span> <span class="ss">'iter-end-of-sequence</span> <span class="no">nil</span><span class="p">)))))</span>
      <span class="nf">#'</span><span class="nv">closure</span><span class="p">)))</span>
</code></pre></div></div>

<p>When byte compiled on Emacs 26, that <code class="language-plaintext highlighter-rouge">cl-case</code> is turned into a jump
table. This “switch” form is closer to how generators are implemented in
other languages.</p>
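
<p>For comparison, here’s roughly the same state-number design written by
hand in Python. It’s a sketch under my own assumptions: the class name
<code class="language-plaintext highlighter-rouge">Walk</code> is made up, and an <code class="language-plaintext highlighter-rouge">if</code>/<code class="language-plaintext highlighter-rouge">elif</code> chain is not a real jump table,
it just mirrors the structure of the <code class="language-plaintext highlighter-rouge">cl-case</code> above.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class Walk:
    """Hand-written iterator that resumes by dispatching on a state number."""

    def __init__(self, items):
        self.items = list(items)
        self.state = 0

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self.state == 0:
                # Entry point: decide whether there is anything to yield.
                self.state = 1 if self.items else 2
            elif self.state == 1:
                # Resume point after a yield: produce the next element.
                value = self.items.pop(0)
                if not self.items:
                    self.state = 2
                return value
            else:
                # Terminal state: signal the end of iteration.
                raise StopIteration

print(list(Walk(["a", "b", "c"])))
</code></pre></div></div>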

<p>Iterator objects can <a href="/blog/2017/12/14/">share state between themselves</a> if they
close over a common environment (or, of course, use the same global
variables).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">foo</span>
      <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">list</span> <span class="o">'</span><span class="p">(</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="ss">:c</span><span class="p">)))</span>
        <span class="p">(</span><span class="nb">list</span>
         <span class="p">(</span><span class="nb">funcall</span>
          <span class="p">(</span><span class="nv">iter-lambda</span> <span class="p">()</span>
            <span class="p">(</span><span class="nv">while</span> <span class="nb">list</span>
              <span class="p">(</span><span class="nv">iter-yield</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">)))))</span>
         <span class="p">(</span><span class="nb">funcall</span>
          <span class="p">(</span><span class="nv">iter-lambda</span> <span class="p">()</span>
            <span class="p">(</span><span class="nv">while</span> <span class="nb">list</span>
              <span class="p">(</span><span class="nv">iter-yield</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">))))))))</span>

<span class="p">(</span><span class="nv">iter-next</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="nv">foo</span><span class="p">))</span>  <span class="c1">; =&gt; :a</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">1</span> <span class="nv">foo</span><span class="p">))</span>  <span class="c1">; =&gt; :b</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="nv">foo</span><span class="p">))</span>  <span class="c1">; =&gt; :c</span>
</code></pre></div></div>
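
<p>The same sharing shows up with Python generators that close over one
list. This is an analogous sketch rather than a translation of the Emacs
Lisp above, and <code class="language-plaintext highlighter-rouge">make_pair</code> is a made-up name.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def make_pair():
    items = ["a", "b", "c"]

    def gen():
        # Both generator objects pop from the same captured list.
        while items:
            yield items.pop(0)

    return gen(), gen()

first, second = make_pair()
results = [next(first), next(second), next(first)]
print(results)
</code></pre></div></div>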

<p>For years there has been a <em>very</em> crude way to “pause” a function and
allow other functions to run: <code class="language-plaintext highlighter-rouge">accept-process-output</code>. It only works in
the context of processes, but five years ago this was <a href="/blog/2013/01/14/">sufficient for me
to build primitives on top of it</a>. Unlike this old process
function, generators do not block threads, including the user interface,
which is really important.</p>

<h3 id="threads">Threads</h3>

<p>Emacs 26 also brings us threads, which have been attached in a very
bolted-on fashion. It’s not much more than a subset of pthreads: shared
memory threads, recursive mutexes, and condition variables. The
interfaces look just like they do in pthreads, and there hasn’t been
much done to integrate more naturally into the Emacs Lisp ecosystem.</p>

<p>This is also only the first step in bringing threading to Emacs Lisp.
Right now there’s effectively a global interpreter lock (GIL), and
threads only run one at a time cooperatively. Like with generators, the
Python influence is obvious. In theory, sometime in the future this
interpreter lock will be removed, making way for actual concurrency.</p>

<p>This is, again, where I think it’s useful to contrast with JavaScript,
which was also initially designed to be single-threaded. Low-level
threading primitives weren’t exposed — though mostly because
JavaScript typically runs sandboxed and there’s no safe way to expose
those primitives. Instead it got a <a href="/blog/2013/01/26/">web worker API</a> that exposes
concurrency at a much higher level, along with an efficient interface
for thread coordination.</p>

<p>For Emacs Lisp, I’d prefer something safer, more like the JavaScript
approach. Low-level pthreads are now a great way to wreck Emacs with
deadlocks (with no <code class="language-plaintext highlighter-rouge">C-g</code> escape). Playing around with the new
threading API for just a few days, I’ve already had to restart Emacs a
bunch of times. Bugs in Emacs Lisp are normally a lot more forgiving.</p>

<p>One important detail that has been designed well is that dynamic
bindings are thread-local. This is really essential for correct
behavior. This is also an easy way to create thread-local storage
(TLS): dynamically bind variables in the thread’s entrance function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-counter-tls</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-path-tls</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo-make-thread</span> <span class="p">(</span><span class="nv">path</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">make-thread</span>
   <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">foo-counter-tls</span> <span class="mi">0</span><span class="p">)</span>
           <span class="p">(</span><span class="nv">foo-path-tls</span> <span class="nv">path</span><span class="p">))</span>
       <span class="o">...</span><span class="p">))))</span>
</code></pre></div></div>

<p>However, <strong><code class="language-plaintext highlighter-rouge">cl-letf</code> “bindings” are <em>not</em> thread-local</strong>, which makes
this <a href="/blog/2017/10/27/">otherwise incredibly useful macro</a> quite dangerous in the
presence of threads. This is one way that the new threading API feels
bolted on.</p>

<h4 id="building-generators-on-threads">Building generators on threads</h4>

<p>In <a href="/blog/2017/06/21/">my stack clashing article</a> I showed a few different ways to
add coroutine support to C. One method spawned per-coroutine threads,
and coordinated using semaphores. With the new threads API in Emacs,
it’s possible to do exactly the same thing.</p>

<p>Since generators are just a limited form of coroutines, this means
threads offer another, <em>very</em> different way to implement them. The
threads API doesn’t provide semaphores, but condition variables can fill
in for them. To “pause” in the middle of the generator, just wait on a
condition variable.</p>
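
<p>As a rough sketch of how a condition variable can stand in for a
semaphore, here’s a tiny counting semaphore built on <code class="language-plaintext highlighter-rouge">make-mutex</code>,
<code class="language-plaintext highlighter-rouge">make-condition-variable</code>, <code class="language-plaintext highlighter-rouge">condition-wait</code>, and <code class="language-plaintext highlighter-rouge">condition-notify</code>. The
<code class="language-plaintext highlighter-rouge">my-sem-</code> functions and the vector layout are my own assumptions for
illustration, not something the threads API provides.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; Hypothetical helpers: a semaphore stored as [COUNT MUTEX CONDVAR].
(defun my-sem-create (count)
  "Return a semaphore with an initial COUNT."
  (let ((mutex (make-mutex)))
    (vector count mutex (make-condition-variable mutex))))

(defun my-sem-wait (sem)
  "Block until SEM's count is positive, then decrement it."
  (with-mutex (aref sem 1)
    ;; Re-check the count after every wakeup.
    (while (&lt;= (aref sem 0) 0)
      (condition-wait (aref sem 2)))
    (aset sem 0 (1- (aref sem 0)))))

(defun my-sem-post (sem)
  "Increment SEM's count and wake one waiter."
  (with-mutex (aref sem 1)
    (aset sem 0 (1+ (aref sem 0)))
    (condition-notify (aref sem 2))))
</code></pre></div></div>

<p>A generator thread would call <code class="language-plaintext highlighter-rouge">my-sem-wait</code> to pause, and the consumer
would call <code class="language-plaintext highlighter-rouge">my-sem-post</code> to let it run to its next yield.</p>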

<p>So, naturally, I just had to see if I could make it work. I call it a
“thread iterator” or “thriter.” The API is <em>very</em> similar to <code class="language-plaintext highlighter-rouge">iter</code>:</p>

<p><strong><a href="https://github.com/skeeto/thriter">https://github.com/skeeto/thriter</a></strong></p>

<p>This is merely a proof of concept so don’t actually use this library
for anything. These thread-based generators are about 5x slower than
<code class="language-plaintext highlighter-rouge">iter</code> generators, and they’re a lot more heavy-weight, needing an
entire thread per iterator object. This makes <code class="language-plaintext highlighter-rouge">thriter-close</code> all the
more important. On the other hand, these generators have no problem
yielding inside <code class="language-plaintext highlighter-rouge">unwind-protect</code>.</p>

<p>Originally this article was going to dive into the details of how
these thread-iterators worked, but <code class="language-plaintext highlighter-rouge">thriter</code> turned out to be quite a
bit more complicated than I anticipated, especially as I worked
towards feature matching <code class="language-plaintext highlighter-rouge">iter</code>.</p>

<p>The gist of it is that each side of a next/yield transaction gets its
own condition variable, but the two sides share a common mutex. Values are passed
between the threads using slots on the iterator object. The side that
isn’t currently running waits on a condition variable until the other
side frees it, after which the releaser waits on its own condition
variable for the result. This is similar to <a href="/blog/2017/02/14/">asynchronous requests in
Emacs dynamic modules</a>.</p>

<p>Rather than use signals to indicate completion, I modeled it after
JavaScript generators. Iterators return a cons cell. The car indicates
continuation and the cdr holds the yield result. To terminate an
iterator early (<code class="language-plaintext highlighter-rouge">thriter-close</code> or garbage collection), <code class="language-plaintext highlighter-rouge">thread-signal</code>
is used to essentially “cancel” the thread and knock it off the
condition variable.</p>

<p>Since threads aren’t (and shouldn’t be) garbage collected, failing to
run a thread-iterator to completion would normally cause a memory leak,
as the thread <a href="https://www.youtube.com/watch?v=AK3PWHxoT_E">sits there forever waiting on a “next” that will never
come</a>. To deal with this, a finalizer is attached to the
iterator object in such a way that it’s not visible to the thread. A
lost iterator is eventually cleaned up by the garbage collector, but, as
usual with finalizers, this is <a href="https://utcc.utoronto.ca/~cks/space/blog/programming/GoFinalizersStopLeaks">only a last resort</a>.</p>

<h4 id="the-future-of-threads">The future of threads</h4>

<p>This thread-iterator project was my initial, little experiment with
Emacs Lisp threads, similar to why I <a href="/blog/2016/11/05/">connected a joystick to Emacs
using a dynamic module</a>. While I don’t expect the current thread
API to go away, it’s not really suitable for general use in its raw
form. Bugs in Emacs Lisp programs should virtually never bring down
Emacs and require a restart. Outside of threads, the few situations
that break this rule are very easy to avoid (and very obvious that
something dangerous is happening). Dynamic modules are dangerous by
necessity, but concurrency doesn’t have to be.</p>

<p>There really needs to be a safe, high-level API with clean thread
isolation. Perhaps this higher-level API will eventually build on top of
the low-level threading API.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Lisp Lambda Expressions Are Not Self-Evaluating</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/02/22/"/>
    <id>urn:uuid:7a3cd1d1-a48c-3c9b-1564-eacde8b9aa4d</id>
    <updated>2018-02-22T21:30:57Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>This week I made a mistake that ultimately enlightened me about the
nature of function objects in Emacs Lisp. There are three kinds of
function objects, but they each behave very differently when evaluated
as objects.</p>

<p>But before we get to that, let’s talk about one of Emacs’
embarrassing, old missteps: <code class="language-plaintext highlighter-rouge">eval-after-load</code>.</p>

<h3 id="taming-an-old-dragon">Taming an old dragon</h3>

<p>One of the long-standing issues with Emacs is that loading Emacs Lisp
files (.el and .elc) is a slow process, even when those files have
been byte compiled. There are a number of dirty hacks in place to deal
with this issue, and the biggest and nastiest of them all is the
<a href="https://lwn.net/Articles/707615/"><em>dumper</em></a>, also known as <em>unexec</em>.</p>

<p>The Emacs you routinely use throughout the day is actually a previous
instance of Emacs that’s been resurrected from the dead. Your undead
Emacs was probably created months, if not years, earlier, back when it
was originally compiled. The first stage of compiling Emacs is to
compile a minimal C core called <code class="language-plaintext highlighter-rouge">temacs</code>. The second stage is loading
a bunch of Emacs Lisp files, then dumping a memory image in an
unportable, platform-dependent way. On Linux, this actually <a href="https://lwn.net/Articles/707615/">requires
special hooks in glibc</a>. The Emacs you know and love is this
dumped image loaded back into memory, continuing from where it left
off just after it was compiled. Regardless of your own feelings on the
matter, you have to admit <a href="/blog/2011/01/30/">this <em>is</em> a very lispy thing to do</a>.</p>

<p>There are two notable costs to Emacs’ dumper:</p>

<ol>
  <li>
    <p>The dumped image contains hard-coded memory addresses. This means
Emacs can’t be a <em>Position Independent Executable</em> (PIE). It can’t
take advantage of a security feature called <em>Address Space Layout
Randomization</em> (ASLR), which would increase the difficulty of
<a href="/blog/2017/07/19/">exploiting</a> some <a href="/blog/2012/09/28/">classes of bugs</a>. This might be
important to you if Emacs processes untrusted data, such as when it’s
used as <a href="/blog/2013/09/03/">a mail client</a>, <a href="https://github.com/skeeto/emacs-web-server">a web server</a> or generally
<a href="https://github.com/skeeto/elfeed">parses data downloaded across the network</a>.</p>
  </li>
  <li>
    <p>It’s not possible to cross-compile Emacs since it can only be dumped
by running <code class="language-plaintext highlighter-rouge">temacs</code> on its target platform. As an experiment I’ve
attempted to dump the Windows version of Emacs on Linux using
<a href="https://www.winehq.org/">Wine</a>, but was unsuccessful.</p>
  </li>
</ol>

<p>The good news is that there’s <a href="https://lists.gnu.org/archive/html/emacs-devel/2018-02/msg00347.html">a portable dumper</a> in the works
that makes this a lot less nasty. If you’re adventurous, you can
already disable dumping and run <code class="language-plaintext highlighter-rouge">temacs</code> directly by setting
<a href="https://lists.gnu.org/archive/html/bug-gnu-emacs/2016-11/msg00729.html"><code class="language-plaintext highlighter-rouge">CANNOT_DUMP=yes</code> at compile time</a>. Be warned, though, that a
non-dumped Emacs takes several seconds, or worse, to initialize
<em>before</em> it even begins loading your own configuration. It’s also
somewhat buggy since it seems nobody ever runs it this way
productively.</p>

<p>The other major way Emacs users have worked around slow loading is
aggressive use of lazy loading, generally via <em>autoloads</em>. A package’s
major interactive entry points are defined ahead of time as stub
functions. These stubs, when invoked, load the full package, which
overrides the stub definition, then finally the stub re-invokes the
new definition with the same arguments.</p>
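
<p>For illustration, this is roughly what one of those stubs looks like
when written by hand with <code class="language-plaintext highlighter-rouge">autoload</code>. The command <code class="language-plaintext highlighter-rouge">foo-command</code> and the
file name <code class="language-plaintext highlighter-rouge">"foo"</code> are hypothetical, and the result shown assumes
<code class="language-plaintext highlighter-rouge">foo-command</code> wasn’t already defined.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Usually generated from a ;;;###autoload cookie rather than typed out.
;; The final `t' marks the function as an interactive command.
(autoload 'foo-command "foo" "Do the foo thing." t)

;; Nothing has been loaded yet. The symbol only holds a stub.
(autoloadp (symbol-function 'foo-command))
;; =&gt; t
</code></pre></div></div>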

<p>To further assist with lazy loading, an evaluated <code class="language-plaintext highlighter-rouge">defvar</code> form will
not override an existing global variable binding. This means you can,
to a certain extent, configure a package before it’s loaded. The
package will not clobber any existing configuration when it loads.
This also explains the bizarre interfaces for the various hook
functions, like <code class="language-plaintext highlighter-rouge">add-hook</code> and <code class="language-plaintext highlighter-rouge">run-hooks</code>. These accept symbols — the
<em>names</em> of the variables — rather than <em>values</em> of those variables as
would normally be the case. The <code class="language-plaintext highlighter-rouge">add-to-list</code> function does the same
thing. It’s all intended to cooperate with lazy loading, where the
variable may not have been defined yet.</p>
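
<p>A small sketch of how this plays out, using a hypothetical <code class="language-plaintext highlighter-rouge">foo</code>
package. The variable, hook, and function names are all made up for
illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; In the user's configuration, before the package has loaded.
;; `foo-greeting', `foo-mode-hook', and `my-foo-setup' are hypothetical.
(setq foo-greeting "hi")
(add-hook 'foo-mode-hook #'my-foo-setup)

;; Later, inside the package. Neither form clobbers the values above.
(defvar foo-greeting "hello")
(defvar foo-mode-hook nil)

foo-greeting
;; =&gt; "hi"
foo-mode-hook
;; =&gt; (my-foo-setup)
</code></pre></div></div>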

<h4 id="eval-after-load">eval-after-load</h4>

<p>Sometimes this isn’t enough and you need some configuration to
take place after the package has been loaded, but without forcing it
to load early. That is, you need to tell Emacs “evaluate this code
after this particular package loads.” That’s where <code class="language-plaintext highlighter-rouge">eval-after-load</code>
comes into play, except for its fatal flaw: it takes the word “eval”
completely literally.</p>

<p>The first argument to <code class="language-plaintext highlighter-rouge">eval-after-load</code> is the name of a package. Fair
enough. The second argument is a form that will be passed to <code class="language-plaintext highlighter-rouge">eval</code>
after that package is loaded. Now hold on a minute. The general rule
of thumb is that if you’re calling <code class="language-plaintext highlighter-rouge">eval</code>, you’re probably doing
something seriously wrong, and this function is no exception. This is
<em>completely</em> the wrong mechanism for the task.</p>

<p>The second argument should have been a function — either a (sharp
quoted) symbol or a function object. And then instead of <code class="language-plaintext highlighter-rouge">eval</code> it
would be something more sensible, like <code class="language-plaintext highlighter-rouge">funcall</code>. Perhaps this
improved version would be named <code class="language-plaintext highlighter-rouge">call-after-load</code> or <code class="language-plaintext highlighter-rouge">run-after-load</code>.</p>

<p>The big problem with passing an s-expression is that it will be left
uncompiled due to being quoted. <a href="/blog/2017/12/14/">I’ve talked before about the
importance of evaluating your lambdas</a>. <code class="language-plaintext highlighter-rouge">eval-after-load</code> not
only encourages badly written Emacs Lisp, it demands it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; BAD!</span>
<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
                 <span class="o">'</span><span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))</span>
</code></pre></div></div>

<p>This was all corrected in Emacs 25. If the second argument to
<code class="language-plaintext highlighter-rouge">eval-after-load</code> is a function — the result of applying <code class="language-plaintext highlighter-rouge">functionp</code> is
non-nil — then it uses <code class="language-plaintext highlighter-rouge">funcall</code>. There’s also a new macro,
<code class="language-plaintext highlighter-rouge">with-eval-after-load</code>, to package it all up nicely.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; Better (Emacs &gt;= 25 only)</span>
<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">)))</span>

<span class="c1">;;; Best (Emacs &gt;= 25 only)</span>
<span class="p">(</span><span class="nv">with-eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))</span>
</code></pre></div></div>

<p>Though in both of these examples the compiler will likely warn about
<code class="language-plaintext highlighter-rouge">httpd-mime-types</code> not being defined. That’s a problem for another
day.</p>

<h4 id="a-workaround">A workaround</h4>

<p>But what if you <em>need</em> to use Emacs 24, as was the <a href="https://github.com/skeeto/elfeed/pull/268">situation that
sparked this article</a>? What can we do with the bad version of
<code class="language-plaintext highlighter-rouge">eval-after-load</code>? We could situate a lambda such that it’s evaluated,
but then smuggle the resulting function object into the form passed to
<code class="language-plaintext highlighter-rouge">eval-after-load</code>, all using a backquote.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; Note: this is subtly broken</span>
<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span>
    <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
       <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))))</span>
</code></pre></div></div>

<p>When everything is compiled, the backquoted form evaluates to this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">funcall</span> <span class="err">#</span><span class="nv">[0</span> <span class="nv">&lt;bytecode&gt;</span> <span class="nv">[httpd-mime-types</span> <span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span><span class="nv">]</span> <span class="nv">2]</span><span class="p">)</span>
</code></pre></div></div>

<p>Where the second value (<code class="language-plaintext highlighter-rouge">#[...]</code>) is a <a href="/blog/2014/01/04/">byte-code object</a>.
However, as the comment notes, this is subtly broken. A cleaner and
correct way to solve all this is with a named function. The damage
caused by <code class="language-plaintext highlighter-rouge">eval-after-load</code> will have been (mostly) minimized.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-simple-httpd-hook</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))</span>

<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="o">'</span><span class="p">(</span><span class="nb">funcall</span> <span class="nf">#'</span><span class="nv">my-simple-httpd-hook</span><span class="p">))</span>
</code></pre></div></div>

<p>But, let’s go back to the anonymous function solution. What was broken
about it? It all has to do with evaluating function objects.</p>

<h3 id="evaluating-function-objects">Evaluating function objects</h3>

<p>So what happens when we evaluate an expression like the one above with
<code class="language-plaintext highlighter-rouge">eval</code>? Here’s what it looks like again.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">funcall</span> <span class="err">#</span><span class="nv">[...]</span><span class="p">)</span>
</code></pre></div></div>

<p>First, <code class="language-plaintext highlighter-rouge">eval</code> notices it’s been given a non-empty list, so it’s probably
a function call. The first element is the name of the function to be
called (<code class="language-plaintext highlighter-rouge">funcall</code>) and the remaining elements are its arguments. <em>But</em>
each of these elements must be evaluated first, and the <em>result</em> of that
evaluation becomes the arguments.</p>

<p>Any value that isn’t a list or a symbol is <em>self-evaluating</em>. That is,
it evaluates to its own value:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eval</span> <span class="mi">10</span><span class="p">)</span>
<span class="c1">;; =&gt; 10</span>
</code></pre></div></div>

<p>If the value is a symbol, it’s treated as a variable. If the value is a
list, it goes through the function call process I’m describing (or one
of a number of other special cases, such as macro expansion, lambda
expressions, and special forms).</p>
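
<p>A few more quick examples of each case, with <code class="language-plaintext highlighter-rouge">my-x</code> as a throwaway
variable assumed not to exist beforehand:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Self-evaluating: neither a list nor a symbol
(eval "hello")
;; =&gt; "hello"
(eval [1 2 3])
;; =&gt; [1 2 3]

;; Symbol: looked up as a variable (my-x is hypothetical)
(defvar my-x 5)
(eval 'my-x)
;; =&gt; 5

;; List: the arguments are evaluated, then the function is called
(eval '(+ 1 (* 2 3)))
;; =&gt; 7
</code></pre></div></div>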

<p>So, conceptually <code class="language-plaintext highlighter-rouge">eval</code> recurses on the function object <code class="language-plaintext highlighter-rouge">#[...]</code>. A
function object is not a list or a symbol, so it’s self-evaluating. No
problem.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Byte-code objects are self-evaluating</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()))))</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">)))</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>What if this code <em>wasn’t</em> compiled? Rather than a byte-code object,
we’d have some other kind of function object for the interpreter.
Let’s examine the dynamic scope (<em>shudder</em>) case. Here, a lambda
<em>appears</em> to evaluate to itself, but appearances can be deceiving:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()))</span>
<span class="c1">;; =&gt; (lambda ())</span>
</code></pre></div></div>

<p>However, this is not self-evaluation. <strong>Lambda expressions are not
self-evaluating</strong>. It’s merely <em>coincidence</em> that the result of
evaluating a lambda expression looks like the original expression.
This is just how the Emacs Lisp interpreter is currently implemented
and, strictly speaking, it’s an implementation detail that <em>just so
happens</em> to be mostly compatible with byte-code objects being
self-evaluating. It would be a mistake to rely on this.</p>

<p>Instead, <strong>dynamic scope lambda expression evaluation is
<a href="https://labs.spotify.com/2013/06/18/creative-usernames/">idempotent</a>.</strong> Applying <code class="language-plaintext highlighter-rouge">eval</code> to the result will return
an <code class="language-plaintext highlighter-rouge">equal</code>, but not identical (<code class="language-plaintext highlighter-rouge">eq</code>), expression. In contrast, a
self-evaluating value is also idempotent under evaluation, but with
<code class="language-plaintext highlighter-rouge">eq</code> results.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Not self-evaluating:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">)))</span>
<span class="c1">;; =&gt; nil</span>

<span class="c1">;; Evaluation is idempotent:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">)))</span>
<span class="c1">;; =&gt; t</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">))))</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>So, with dynamic scope, the subtly broken backquote example will still
work, but only by sheer luck. Under lexical scope, the situation isn’t
so lucky:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">())</span>
<span class="c1">;; =&gt; (closure (t) nil)</span>
</code></pre></div></div>

<p>These interpreted lambda functions are neither self-evaluating nor
idempotent. Passing <code class="language-plaintext highlighter-rouge">t</code> as the second argument to <code class="language-plaintext highlighter-rouge">eval</code> tells it to
use lexical scope, as shown below:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Not self-evaluating:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span> <span class="no">t</span><span class="p">)))</span>
<span class="c1">;; =&gt; nil</span>

<span class="c1">;; Not idempotent:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span> <span class="no">t</span><span class="p">)))</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span> <span class="no">t</span><span class="p">)</span> <span class="no">t</span><span class="p">)))</span>
<span class="c1">;; error: (void-function closure)</span>
</code></pre></div></div>

<p>I can <a href="/blog/2017/05/03/">imagine an implementation</a> of Emacs Lisp where dynamic
scope lambda expressions are in the same boat, where they’re not even
idempotent. For example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: nil; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">())</span>
<span class="c1">;; =&gt; (totally-not-a-closure ())</span>
</code></pre></div></div>

<p>Most Emacs Lisp would work just fine under this change, and only code
that makes some kind of logical mistake — where there’s nested
evaluation of lambda expressions — would break. This essentially
already happened when lots of code was quietly switched over to
lexical scope after Emacs 24. Lambda idempotency was lost and
well-written code didn’t notice.</p>

<p>There’s a temptation here for Emacs to define a <code class="language-plaintext highlighter-rouge">closure</code> function or
special form that would allow interpreter closure objects to be either
self-evaluating or idempotent. This would be a mistake. It would only
serve as a hack that covers up logical mistakes that lead to nested
evaluation. Much better to catch those problems early.</p>

<h3 id="solving-the-problem-with-one-character">Solving the problem with one character</h3>

<p>So how do we fix the subtly broken example? With a strategically
placed quote right before the comma.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span>
    <span class="ss">',</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
        <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))))</span>
</code></pre></div></div>

<p>So the form passed to <code class="language-plaintext highlighter-rouge">eval-after-load</code> becomes:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Compiled:</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="k">quote</span> <span class="err">#</span><span class="nv">[...]</span><span class="p">))</span>

<span class="c1">;; Dynamic scope:</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="o">...</span><span class="p">)))</span>

<span class="c1">;; Lexical scope:</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="nv">closure</span> <span class="p">(</span><span class="no">t</span><span class="p">)</span> <span class="p">()</span> <span class="o">...</span><span class="p">)))</span>
</code></pre></div></div>

<p>The quote prevents <code class="language-plaintext highlighter-rouge">eval</code> from evaluating the function object, which
would be either needless or harmful. There’s also an argument to be
made that this is a perfect situation for a sharp-quote (<code class="language-plaintext highlighter-rouge">#'</code>), which
exists to quote functions.</p>
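
<p>If you need this in more than one place, the quote-before-comma trick
can be wrapped up in a small helper, along the lines of the
<code class="language-plaintext highlighter-rouge">call-after-load</code> wished for earlier. This is only a sketch, and
<code class="language-plaintext highlighter-rouge">my-call-after-load</code> is a name I’m making up for it:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; Hypothetical helper, not part of Emacs.
(defun my-call-after-load (feature function)
  "Call FUNCTION with no arguments once FEATURE is loaded.
The function object is quoted so that `eval' never evaluates it."
  (eval-after-load feature
    `(funcall ',function)))

(my-call-after-load 'simple-httpd
  (lambda ()
    (push '("c" . "text/plain") httpd-mime-types)))
</code></pre></div></div>

<p>Since the lambda is an ordinary argument here, it gets compiled along
with the rest of the file, whichever scope is in effect.</p>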

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Options for Structured Data in Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/02/14/"/>
    <id>urn:uuid:3837b5b2-0aba-3381-ff6f-9432f8ff03e9</id>
    <updated>2018-02-14T17:43:34Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>So your Emacs package has grown beyond a dozen or so lines of code, and
the data it manages is now structured and heterogeneous. Informal plain
old lists, the bread and butter of any lisp, are no longer cutting it.
You really need to cleanly abstract this structure, both for your own
organizational sake and for anyone reading your code.</p>

<p>With informal lists as structures, you might regularly ask questions
like, “Was the ‘name’ slot stored in the third list element, or was
it the fourth element?” A plist or alist helps with this problem, but
those are better suited for informal, externally-supplied data, not
for internal structures with fixed slots. Occasionally someone
suggests using hash tables as structures, but Emacs Lisp’s hash tables
are <em>much</em> too heavy for this. Hash tables are more appropriate when
keys themselves are data.</p>
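
<p>For reference, here’s what slot access looks like with a plist and
with an alist. The food item and its keys are hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Plist: alternating keys and values
(let ((item '(:name "milk" :weight 1.0)))
  (plist-get item :weight))
;; =&gt; 1.0

;; Alist: a list of (key . value) pairs
(let ((item '((name . "milk") (weight . 1.0))))
  (cdr (assq 'weight item)))
;; =&gt; 1.0
</code></pre></div></div>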

<h3 id="defining-a-data-structure-from-scratch">Defining a data structure from scratch</h3>

<p>Imagine a refrigerator package that manages a collection of food in a
refrigerator. A food item could be structured as a plain old list,
with slots at specific positions.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fridge-item-create</span> <span class="p">(</span><span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">list</span> <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">))</span>
</code></pre></div></div>

<p>A function that computes the mean weight of a list of food items might
look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fridge-mean-weight</span> <span class="p">(</span><span class="nv">items</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nv">items</span><span class="p">)</span>
      <span class="mf">0.0</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">sum</span> <span class="mf">0.0</span><span class="p">)</span>
          <span class="p">(</span><span class="nb">count</span> <span class="mi">0</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">item</span> <span class="nv">items</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">sum</span> <span class="nb">count</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">setf</span> <span class="nb">count</span> <span class="p">(</span><span class="nb">1+</span> <span class="nb">count</span><span class="p">)</span>
              <span class="nv">sum</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">sum</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">2</span> <span class="nv">item</span><span class="p">)))))))</span>
</code></pre></div></div>

<p>Note the use of <code class="language-plaintext highlighter-rouge">(nth 2 item)</code> at the end, used to get the item’s
weight. That magic number 2 is easy to mess up. Even worse, if lots of
code accesses “weight” this way, then future extensions will be
inhibited. Defining some accessor functions solves this problem.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="nv">item</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-expiry</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="mi">1</span> <span class="nv">item</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-weight</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="mi">2</span> <span class="nv">item</span><span class="p">))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">defsubst</code> defines an inline function, so there’s effectively no
additional run-time cost for these accessors compared to a bare
<code class="language-plaintext highlighter-rouge">nth</code>. Since these only cover <em>getting</em> slots, we should also define
some setters using the built-in gv (generalized variable) package.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'gv</span><span class="p">)</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="o">,</span><span class="nv">item</span><span class="p">)</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-expiry</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">1</span> <span class="o">,</span><span class="nv">item</span><span class="p">)</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-weight</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">2</span> <span class="o">,</span><span class="nv">item</span><span class="p">)</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>
</code></pre></div></div>

<p>This makes each slot setf-able. Generalized variables are great for
simplifying APIs, since otherwise there would need to be an equal
number of setter functions (<code class="language-plaintext highlighter-rouge">fridge-item-set-name</code>, etc.). With
generalized variables, both are at the same entrypoint:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">fridge-item-name</span> <span class="nv">item</span><span class="p">)</span> <span class="s">"Eggs"</span><span class="p">)</span>
</code></pre></div></div>

<p>There are still two more significant improvements to make.</p>

<ol>
  <li>
    <p>As far as Emacs Lisp is concerned, this isn’t a real <em>type</em>. The
type-ness of it is just a fiction created by the conventions of the
package. It would be easy to make the mistake of passing an
arbitrary list to these <code class="language-plaintext highlighter-rouge">fridge-item</code> functions, and the mistake
wouldn’t be caught so long as that list has at least three items.
A common solution is to add a <em>type tag</em>: a symbol at the
beginning of the structure that identifies it.</p>
  </li>
  <li>
    <p>It’s still a linked list, and <code class="language-plaintext highlighter-rouge">nth</code> has to walk the list (i.e.
<code class="language-plaintext highlighter-rouge">O(n)</code>) to retrieve items. It would be much more efficient to use a
vector, turning this into an efficient <code class="language-plaintext highlighter-rouge">O(1)</code> operation.</p>
  </li>
</ol>

<p>Addressing both of these at once:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fridge-item-create</span> <span class="p">(</span><span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">vector</span> <span class="ss">'fridge-item</span> <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-p</span> <span class="p">(</span><span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">vectorp</span> <span class="nv">object</span><span class="p">)</span>
       <span class="p">(</span><span class="nb">=</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">object</span><span class="p">)</span> <span class="mi">4</span><span class="p">)</span>
       <span class="p">(</span><span class="nb">eq</span> <span class="ss">'fridge-item</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">object</span> <span class="mi">0</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">fridge-item-p</span> <span class="nv">item</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">signal</span> <span class="ss">'wrong-type-argument</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">'fridge-item</span> <span class="nv">item</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">aref</span> <span class="nv">item</span> <span class="mi">1</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-name--set</span> <span class="p">(</span><span class="nv">item</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">fridge-item-p</span> <span class="nv">item</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">signal</span> <span class="ss">'wrong-type-argument</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">'fridge-item</span> <span class="nv">item</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">item</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">value</span><span class="p">))</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">fridge-item-name--set</span> <span class="o">,</span><span class="nv">item</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>

<span class="c1">;; And so on for expiry and weight...</span>
</code></pre></div></div>

<p>As long as <code class="language-plaintext highlighter-rouge">fridge-mean-weight</code> uses the <code class="language-plaintext highlighter-rouge">fridge-item-weight</code>
accessor, it continues to work unmodified across all these changes.
But, <em>whew</em>, that’s quite a lot of boilerplate to write and maintain
for each data structure in our package! Boilerplate code generation is
a perfect candidate for a macro definition. Luckily for us, Emacs
already defines a macro to generate all this code: <code class="language-plaintext highlighter-rouge">cl-defstruct</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">fridge-item</span>
  <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
</code></pre></div></div>

<p>In Emacs 25 and earlier, this innocent looking definition expands into
essentially all the above code. The code it generates is expressed in
<a href="/blog/2017/01/30/">the most optimal form</a> for its version of Emacs, and it
exploits many of the available optimizations by using function
declarations such as <code class="language-plaintext highlighter-rouge">side-effect-free</code> and <code class="language-plaintext highlighter-rouge">error-free</code>. It’s
configurable, too, allowing for the exclusion of a type tag (<code class="language-plaintext highlighter-rouge">:named</code>)
— discarding all the type checks — or using a list rather than a
vector as the underlying structure (<code class="language-plaintext highlighter-rouge">:type</code>). As a crude form of
structural inheritance, it even allows for directly embedding other
structures (<code class="language-plaintext highlighter-rouge">:include</code>).</p>
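
<p>As a rough sketch of what the default definition provides, the
following exercises the generated constructor, predicate, and
accessors, and then the <code class="language-plaintext highlighter-rouge">:type</code> option. The <code class="language-plaintext highlighter-rouge">demo-item</code> and
<code class="language-plaintext highlighter-rouge">demo-pair</code> names are hypothetical and exist only for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical struct that accepts every default.
(cl-defstruct demo-item
  name expiry weight)

(let ((item (make-demo-item :name "Eggs" :weight 11.1)))
  (setf (demo-item-expiry item) "2018-03-01")  ; slots are setf-able
  (list (demo-item-p item)
        (demo-item-name item)
        (demo-item-expiry item)))
;; =&gt; (t "Eggs" "2018-03-01")

;; Hypothetical struct backed by a plain list with no type tag.
(cl-defstruct (demo-pair (:type list))
  left right)

(make-demo-pair :left 1 :right 2)
;; =&gt; (1 2)
</code></pre></div></div>

<p>Without a type tag there is nothing to check against, so the
list-backed accessors will accept any sufficiently long list.</p>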

<h4 id="two-pitfalls">Two pitfalls</h4>

<p>There are a couple of pitfalls, though. First, for historical reasons, <strong>the
macro will define two namespace-unfriendly functions: <code class="language-plaintext highlighter-rouge">make-NAME</code> and
<code class="language-plaintext highlighter-rouge">copy-NAME</code></strong>. I always override these, preferring the <code class="language-plaintext highlighter-rouge">-create</code>
convention for the constructor, and tossing the copier since it’s
either useless or, worse, semantically wrong.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">fridge-item</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">fridge-item-create</span><span class="p">)</span>
                           <span class="p">(</span><span class="ss">:copier</span> <span class="no">nil</span><span class="p">))</span>
  <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
</code></pre></div></div>

<p>If the constructor needs to be more sophisticated than just setting
slots, it’s common to define a “private” constructor (double dash in
the name) and wrap it with a “public” constructor that has some
behavior.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">fridge-item</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">fridge-item--create</span><span class="p">)</span>
                           <span class="p">(</span><span class="ss">:copier</span> <span class="no">nil</span><span class="p">))</span>
  <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span> <span class="nv">entry-time</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">fridge-item-create</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">apply</span> <span class="nf">#'</span><span class="nv">fridge-item--create</span> <span class="ss">:entry-time</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="nv">args</span><span class="p">))</span>
</code></pre></div></div>

<p>The other pitfall is related to printing. In Emacs 25 and earlier,
types defined by <code class="language-plaintext highlighter-rouge">cl-defstruct</code> are still only types by convention.
They’re really just vectors as far as Emacs Lisp is concerned. One
benefit from this is that <a href="/blog/2013/12/30/">printing and reading</a> these
structures is “free” because vectors are printable. It’s trivial to
serialize <code class="language-plaintext highlighter-rouge">cl-defstruct</code> structures out to a file. This is <a href="/blog/2013/09/09/">exactly
how the Elfeed database works</a>.</p>

<p>The pitfall is that <strong>once a structure has been serialized, there’s no
more changing the <code class="language-plaintext highlighter-rouge">cl-defstruct</code> definition.</strong> It’s now a file format
definition, so the slots are locked in place. Forever.</p>
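
<p>To make the round trip concrete, here is a minimal sketch that
prints a structure to a string and reads it back. The <code class="language-plaintext highlighter-rouge">demo-note</code>
struct is hypothetical, and a real package would write to a file
rather than a string.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical struct, defined only for this illustration.
(cl-defstruct (demo-note (:constructor demo-note-create)
                         (:copier nil))
  title body)

(let* ((note (demo-note-create :title "Milk" :body "Buy more"))
       (text (prin1-to-string note))  ; serialize to a string
       (copy (read text)))            ; parse it back into an object
  (list (demo-note-p copy)
        (demo-note-title copy)))
;; =&gt; (t "Milk")
</code></pre></div></div>

<p>The printed form stores slot values by position, not by name, which
is why reordering or removing slots later would silently change the
meaning of data that has already been written out.</p>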

<p>Emacs 26 throws a wrench in all this, though it’s worth it in the long
run. There’s a new primitive type in Emacs 26 with its own reader
syntax: records. This is similar to hash tables <a href="/blog/2010/06/07/">becoming first class
in the reader in Emacs 23.2</a>. In Emacs 26, <code class="language-plaintext highlighter-rouge">cl-defstruct</code> uses
records instead of vectors.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Emacs 25:</span>
<span class="p">(</span><span class="nv">fridge-item-create</span> <span class="ss">:name</span> <span class="s">"Eggs"</span> <span class="ss">:weight</span> <span class="mf">11.1</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-fridge-item "Eggs" nil 11.1]</span>

<span class="c1">;; Emacs 26:</span>
<span class="p">(</span><span class="nv">fridge-item-create</span> <span class="ss">:name</span> <span class="s">"Eggs"</span> <span class="ss">:weight</span> <span class="mf">11.1</span><span class="p">)</span>
<span class="c1">;; =&gt; #s(fridge-item "Eggs" nil 11.1)</span>
</code></pre></div></div>

<p>So far slots are still accessed using <code class="language-plaintext highlighter-rouge">aref</code>, and all the type
checking still happens in Emacs Lisp. The only practical change is the
<code class="language-plaintext highlighter-rouge">record</code> function is used in place of the <code class="language-plaintext highlighter-rouge">vector</code> function when
allocating a structure. But it does pave the way for more interesting
things in the future.</p>

<p>The major short-term downside is that this breaks printed compatibility
across the Emacs 25/26 boundary. The <code class="language-plaintext highlighter-rouge">cl-old-struct-compat-mode</code>
function can be used for <em>some</em> degree of backwards, but not forwards,
compatibility. Emacs 26 can read and use some structures printed by
Emacs 25 and earlier, but the reverse will never be true. This issue
initially <a href="https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27617">tripped up Emacs’ built-in packages</a>, and when Emacs 26
is released we’ll see more of these issues arise in external packages.</p>
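
<p>A package that must load structures printed by an older Emacs might
opt in with something like the sketch below. The <code class="language-plaintext highlighter-rouge">fboundp</code> guard is
my own assumption about how to stay loadable on Emacs 25, where the
mode does not exist.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Enable the compatibility mode only where it is available.
(when (fboundp 'cl-old-struct-compat-mode)
  (cl-old-struct-compat-mode 1))
</code></pre></div></div>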

<h3 id="dynamic-dispatch">Dynamic dispatch</h3>

<p>Prior to Emacs 25, the major built-in package for dynamic dispatch —
functions that specialize on the run-time type of their arguments — was
EIEIO, though it only supported single dispatch (specializing on a
single argument). EIEIO brought much of the Common Lisp Object System
(CLOS) to Emacs Lisp, including classes and methods.</p>

<p>Emacs 25 introduced a more sophisticated dynamic dispatch package
called cl-generic. It focuses only on dynamic dispatch and supports
multiple dispatch, completely replacing the dynamic dispatch portion
of EIEIO. Since <code class="language-plaintext highlighter-rouge">cl-defstruct</code> does inheritance and cl-generic does
dynamic dispatch, there’s not really much left for EIEIO — besides bad
ideas like multiple inheritance and method combination.</p>

<p>Without either of these packages, the most direct way to build single
dispatch on top of <code class="language-plaintext highlighter-rouge">cl-defstruct</code> would be to <a href="/blog/2014/10/21/">shove a function in one
of the slots</a>. Then the “method” is just a wrapper that calls this
function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Base "class"</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">greeter</span>
  <span class="nv">greeting</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">greet</span> <span class="p">(</span><span class="nv">thing</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">greeter-greeting</span> <span class="nv">thing</span><span class="p">)</span> <span class="nv">thing</span><span class="p">))</span>

<span class="c1">;; Cow "class"</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">cow</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">greeter</span><span class="p">)</span>
                   <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">cow--create</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">cow-create</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">cow--create</span> <span class="ss">:greeting</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span> <span class="s">"Moo!"</span><span class="p">)))</span>

<span class="c1">;; Bird "class"</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">bird</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">greeter</span><span class="p">)</span>
                    <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">bird--create</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">bird-create</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">bird--create</span> <span class="ss">:greeting</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span> <span class="s">"Chirp!"</span><span class="p">)))</span>

<span class="c1">;; Usage:</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">cow-create</span><span class="p">))</span>
<span class="c1">;; =&gt; "Moo!"</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">bird-create</span><span class="p">))</span>
<span class="c1">;; =&gt; "Chirp!"</span>
</code></pre></div></div>

<p>Since cl-generic is aware of the types created by <code class="language-plaintext highlighter-rouge">cl-defstruct</code>,
functions can specialize on them as if they were native types. It’s a
lot simpler to let cl-generic do all the hard work. The people reading
your code will appreciate it, too:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-generic</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defgeneric</span> <span class="nv">greet</span> <span class="p">(</span><span class="nv">greeter</span><span class="p">))</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">cow</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defmethod</span> <span class="nv">greet</span> <span class="p">((</span><span class="nv">_</span> <span class="nv">cow</span><span class="p">))</span>
  <span class="s">"Moo!"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">bird</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defmethod</span> <span class="nv">greet</span> <span class="p">((</span><span class="nv">_</span> <span class="nv">bird</span><span class="p">))</span>
  <span class="s">"Chirp!"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">make-cow</span><span class="p">))</span>
<span class="c1">;; =&gt; "Moo!"</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">make-bird</span><span class="p">))</span>
<span class="c1">;; =&gt; "Chirp!"</span>
</code></pre></div></div>
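
<p>The same machinery extends to multiple dispatch, where a method
specializes on more than one argument. A small sketch, using
hypothetical <code class="language-plaintext highlighter-rouge">demo-rock</code> and <code class="language-plaintext highlighter-rouge">demo-paper</code> structures and a made-up
<code class="language-plaintext highlighter-rouge">demo-beats-p</code> generic function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)
(require 'cl-generic)

(cl-defstruct demo-rock)
(cl-defstruct demo-paper)

(cl-defgeneric demo-beats-p (a b)
  "Return non-nil if A beats B.")

;; Fallback used when no more specific method applies.
(cl-defmethod demo-beats-p (_a _b)
  nil)

;; Chosen only when both argument types match.
(cl-defmethod demo-beats-p ((_a demo-paper) (_b demo-rock))
  t)

(demo-beats-p (make-demo-paper) (make-demo-rock))
;; =&gt; t

(demo-beats-p (make-demo-rock) (make-demo-paper))
;; =&gt; nil
</code></pre></div></div>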

<p>The majority of the time a simple <code class="language-plaintext highlighter-rouge">cl-defstruct</code> will fulfill your
needs, keeping in mind the gotcha with the constructor and copier
names. Its use should feel almost as natural as defining functions.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>What's in an Emacs Lambda</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/12/14/"/>
    <id>urn:uuid:efcc8cf7-11d3-3bd3-9fc9-a23e80f7bf33</id>
    <updated>2017-12-14T18:18:57Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="compsci"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>There was recently some <a href="https://old.reddit.com/r/emacs/comments/7h23ed/dynamically_construct_a_lambda_function/">interesting discussion</a> about correctly
using backquotes to express a mixture of data and code. Since lambda
expressions <em>seem</em> to evaluate to themselves, what’s the difference?
For example, an association list of operations:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">'</span><span class="p">((</span><span class="nv">add</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">sub</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">mul</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">div</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
</code></pre></div></div>

<p>It looks like it would work, and indeed it does work in this case.
However, there are good reasons to actually evaluate those lambda
expressions. Eventually invoking the lambda expressions in the quoted
form above is equivalent to using <code class="language-plaintext highlighter-rouge">eval</code>. So, instead, prefer the
backquote form:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">`</span><span class="p">((</span><span class="nv">add</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">sub</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">mul</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">div</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
</code></pre></div></div>
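
<p>For illustration, an entry in such a table would typically be looked
up and invoked like this. The <code class="language-plaintext highlighter-rouge">ops</code> variable is a hypothetical name,
and the table is trimmed to two entries.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((ops `((add . ,(lambda (a b) (+ a b)))
             (sub . ,(lambda (a b) (- a b))))))
  ;; Look up the function object by key, then call it.
  (funcall (cdr (assq 'sub ops)) 10 4))
;; =&gt; 6
</code></pre></div></div>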

<p>There are a lot of interesting things to say about this, but let’s
first reduce it to two very simple cases:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
</code></pre></div></div>

<p>What’s the difference between these two forms? The first is a lambda
expression, and it evaluates to a function object. The other is a quoted
list that <em>looks like</em> a lambda expression, and it evaluates to a list —
a piece of data.</p>

<p>A naive evaluation of these expressions in <code class="language-plaintext highlighter-rouge">*scratch*</code> (<code class="language-plaintext highlighter-rouge">C-x C-e</code>)
suggests they are identical, and so it would seem that quoting a
lambda expression doesn’t really matter:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>However, there are two common situations where this is not the case:
<strong>byte compilation</strong> and <strong>lexical scope</strong>.</p>

<h3 id="lambda-under-byte-compilation">Lambda under byte compilation</h3>

<p>It’s a little trickier to evaluate these forms byte compiled in the
scratch buffer since that doesn’t happen automatically. But if it did,
it would look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: nil; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; #[(x) "\010\207" [x] 1]</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">#[...]</code> is the syntax for a byte-code function object. As
discussed in detail in <a href="/blog/2014/01/04/">my byte-code internals article</a>, it’s a
special vector object that contains byte-code, and other metadata, for
evaluation by Emacs’ virtual stack machine. Elisp is one of very few
languages with <a href="/blog/2013/12/30/">readable function objects</a>, and this feature is
core to its ahead-of-time byte compilation.</p>
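
<p>One way to observe this without compiling a whole file is to call
<code class="language-plaintext highlighter-rouge">byte-compile</code> on the function directly. This is only a sketch for
experimenting in the scratch buffer, not how compiled packages
normally come about.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Compiling an evaluated lambda yields a byte-code function object.
(byte-code-function-p (byte-compile (lambda (x) x)))
;; =&gt; t

;; The compiled object is called like any other function.
(funcall (byte-compile (lambda (x) x)) 42)
;; =&gt; 42

;; The quoted form is just a list.
(byte-code-function-p '(lambda (x) x))
;; =&gt; nil
</code></pre></div></div>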

<p>The quote, by definition, prevents evaluation, and so inhibits byte
compilation of the lambda expression. It’s vital that the byte compiler
does not try to guess the programmer’s intent and compile the expression
anyway, since that would interfere with lists that just so happen to
look like lambda expressions — i.e. any list containing the <code class="language-plaintext highlighter-rouge">lambda</code>
symbol.</p>

<p>There are three reasons you want your lambda expressions to get byte
compiled:</p>

<ul>
  <li>
    <p>Byte-compiled functions are significantly faster. That’s the main
purpose for byte compilation after all.</p>
  </li>
  <li>
    <p>The compiler performs static checks, producing warnings and errors
ahead of time. This lets you spot certain classes of problems before
they occur. The static analysis is even better under lexical scope due
to its tighter semantics.</p>
  </li>
  <li>
    <p>Under lexical scope, byte-compiled closures may use less memory. More
specifically, they won’t accidentally keep objects alive longer than
necessary. I’ve never seen a name for this implementation issue, but I
call it <em>overcapturing</em>. More on this later.</p>
  </li>
</ul>

<p>While it’s common for personal configurations to skip byte compilation,
Elisp should still generally be written as if it were going to be byte
compiled. General rule of thumb: <strong>Ensure your lambda expressions are
actually evaluated.</strong></p>
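
<p>As a minimal sketch of that rule, compare a function that returns a quoted lambda with one that returns an evaluated lambda. The names <code class="language-plaintext highlighter-rouge">quoted-handler</code> and <code class="language-plaintext highlighter-rouge">evaluated-handler</code> are made up for illustration. After byte compiling both, only the second should hand back a byte-code function object:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun quoted-handler ()
  '(lambda (x) (* x 2)))  ; quoted: plain data, left alone by the compiler

(defun evaluated-handler ()
  (lambda (x) (* x 2)))   ; evaluated: compiled with its enclosing function

(byte-compile 'quoted-handler)
(byte-compile 'evaluated-handler)

(consp (quoted-handler))
;; =&gt; t

(byte-code-function-p (evaluated-handler))
;; =&gt; t
</code></pre></div></div>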

<h3 id="lambda-in-lexical-scope">Lambda in lexical scope</h3>

<p>As I’ve stressed many times, <a href="/blog/2016/12/22/">you should <em>always</em> use lexical
scope</a>. There’s no practical disadvantage or trade-off involved.
Just do it.</p>

<p>Once lexical scope is enabled, the two expressions diverge even without
byte compilation:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure (t) (x) x)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>Under lexical scope, lambda expressions evaluate to <em>closures</em>.
Closures capture their lexical environment in their closure object —
nothing in this particular case. It’s a type of function object,
making it a valid first argument to <code class="language-plaintext highlighter-rouge">funcall</code>.</p>

<p>Since the quote prevents the second expression from being evaluated,
semantically it evaluates to a list that just so happens to look like
a (non-closure) function object. <strong>Invoking a <em>data</em> object as a
function is like using <code class="language-plaintext highlighter-rouge">eval</code></strong> — i.e. executing data as code.
Everyone already knows <code class="language-plaintext highlighter-rouge">eval</code> should not be used lightly.</p>
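
<p>To make the comparison with <code class="language-plaintext highlighter-rouge">eval</code> concrete, here’s a small sketch in which the “function” is an ordinary list assembled at run time. Nothing marks it as code until <code class="language-plaintext highlighter-rouge">funcall</code> interprets it, and I’m assuming an Emacs that still accepts lambda lists as functions:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((data (list 'lambda '(n) '(* n n))))  ; an ordinary list built at run time
  (funcall data 4))                         ; the list is interpreted as code
;; =&gt; 16
</code></pre></div></div>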

<p>It’s a little more interesting to look at a closure that actually
captures a variable, so here’s a definition for <code class="language-plaintext highlighter-rouge">constantly</code>, a
higher-order function that returns a closure that accepts any number of
arguments and returns a particular constant:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">constantly</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="nv">x</span><span class="p">))</span>
</code></pre></div></div>

<p>Without byte compiling it, here’s an example of its return value:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((x . :foo) t) (&amp;rest _) x)</span>
</code></pre></div></div>

<p>The environment has been captured as an association list (with a
trailing <code class="language-plaintext highlighter-rouge">t</code>), and we can plainly see that the variable <code class="language-plaintext highlighter-rouge">x</code> is bound to
the symbol <code class="language-plaintext highlighter-rouge">:foo</code> in this closure. Consider that we could manipulate
this data structure (e.g. <code class="language-plaintext highlighter-rouge">setcdr</code> or <code class="language-plaintext highlighter-rouge">setf</code>) to change the binding of
<code class="language-plaintext highlighter-rouge">x</code> for this closure. <em>This is essentially how closures mutate their own
environment.</em> Moreover, closures from the same environment share
structure, so such mutations are also shared. More on this later.</p>

<p>Semantically, closures are distinct objects (via <code class="language-plaintext highlighter-rouge">eq</code>), even if the
variables they close over are bound to the same value. This is because
they each have a distinct environment attached to them, even if in
some invisible way.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Without byte compilation, this is true <em>even when there’s no lexical
environment to capture</em>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">dummy</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">dummy</span><span class="p">)</span> <span class="p">(</span><span class="nv">dummy</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>The byte compiler is smart, though. <a href="/blog/2017/01/30/">As an optimization</a>, the
same closure object is reused when possible, avoiding unnecessary
work, including multiple object allocations. Though this is a bit of
an abstraction leak. A function can (ab)use this to introspect whether
it’s been byte compiled:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">have-i-been-compiled-p</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">funcs</span> <span class="p">(</span><span class="nb">vector</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">2</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="nv">i</span><span class="p">)</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
    <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="mi">1</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">have-i-been-compiled-p</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'have-i-been-compiled-p</span><span class="p">)</span>

<span class="p">(</span><span class="nv">have-i-been-compiled-p</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>The trick here is to evaluate the exact same non-capturing lambda
expression twice, which requires a loop (or at least some sort of
branch). <em>Semantically</em> we should think of these closures as being
distinct objects, but, if we squint our eyes a bit, we can see the
effects of the behind-the-scenes optimization.</p>

<p>Don’t actually do this in practice, of course. That’s what
<code class="language-plaintext highlighter-rouge">byte-code-function-p</code> is for, which won’t rely on a subtle
implementation detail.</p>
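
<p>For comparison, a sketch of the direct check. <code class="language-plaintext highlighter-rouge">example-identity</code> is a hypothetical function, and the first result assumes the definition was just evaluated rather than loaded from a compiled file:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun example-identity (x) x)  ; hypothetical, freshly evaluated

(byte-code-function-p (symbol-function 'example-identity))
;; =&gt; nil

(byte-compile 'example-identity)

(byte-code-function-p (symbol-function 'example-identity))
;; =&gt; t
</code></pre></div></div>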

<h3 id="overcapturing">Overcapturing</h3>

<p>I mentioned before that one of the potential gotchas of not byte
compiling your lambda expressions is overcapturing closure variables in
the interpreter.</p>

<p>To evaluate lisp code, Emacs has both an interpreter and a virtual
machine. The interpreter evaluates code in list form: cons cells,
numbers, symbols, etc. The byte compiler is like the interpreter, but
instead of directly executing those forms, it emits byte-code that, when
evaluated by the virtual machine, produces identical visible results to
the interpreter — <em>in theory</em>.</p>

<p>What this means is that <strong>Emacs contains two different implementations
of Emacs Lisp</strong>, one in the interpreter and one in the byte compiler.
The Emacs developers have been maintaining and expanding these
implementations side-by-side for decades. A pitfall to this approach
is that the <em>implementations can, and do, diverge in their behavior</em>.
We saw this above with that introspective function, and it <a href="/blog/2013/01/22/">comes up
in practice with advice</a>.</p>

<p>Another way they diverge is in closure variable capture. For example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">overcapture</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">when</span> <span class="nv">y</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">x</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">overcapture</span> <span class="ss">:x</span> <span class="ss">:some-big-value</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((y . :some-big-value) (x . :x) t) nil x)</span>
</code></pre></div></div>

<p>Notice that the closure captured <code class="language-plaintext highlighter-rouge">y</code> even though it’s unnecessary.
This is because the interpreter doesn’t, and shouldn’t, take the time
to analyze the body of the lambda to determine which variables should
be captured. That would need to happen at run-time each time the
lambda is evaluated, which would make the interpreter much slower.
Overcapturing can get pretty messy if macros are introducing their own
hidden variables.</p>

<p>On the other hand, the byte compiler can do this analysis just once at
compile-time. And it’s already doing the analysis as part of its job.
It can avoid this problem easily:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">overcapture</span> <span class="ss">:x</span> <span class="ss">:some-big-value</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" [:x] 1]</span>
</code></pre></div></div>

<p>It’s clear that <code class="language-plaintext highlighter-rouge">:some-big-value</code> isn’t present in the closure.</p>

<p>But… how does this work?</p>

<h3 id="how-byte-compiled-closures-are-constructed">How byte compiled closures are constructed</h3>

<p>Recall from the <a href="/blog/2014/01/04/">internals article</a> that the four core elements of a
byte-code function object are:</p>

<ol>
  <li>Parameter specification</li>
  <li>Byte-code string (opcodes)</li>
  <li>Constants vector</li>
  <li>Maximum stack usage</li>
</ol>

<p>While creating a closure <em>seems</em> like it would require compiling a whole
new function each time the lambda expression is evaluated, there’s actually
not that much to it! Namely, <a href="/blog/2017/01/08/">the <em>behavior</em> of the function remains the same</a>. Only
the closed-over environment changes.</p>

<p>What this means is that closures produced by a common lambda
expression can all share the same byte-code string (second element).
Their bodies are identical, so they compile to the same byte-code.
Where they differ is in their constants vectors (third element), which
get filled out according to the closed-over environment. It’s clear
just from examining the outputs:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">constantly</span> <span class="ss">:a</span><span class="p">)</span>
<span class="c1">;; =&gt; #[128 "\300\207" [:a] 2]</span>

<span class="p">(</span><span class="nb">constantly</span> <span class="ss">:b</span><span class="p">)</span>
<span class="c1">;; =&gt; #[128 "\300\207" [:b] 2]</span>

</code></pre></div></div>
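
<p>The sharing can also be checked directly by pulling the elements out with <code class="language-plaintext highlighter-rouge">aref</code>. This is only a sketch: <code class="language-plaintext highlighter-rouge">make-constant-fn</code> is a hypothetical stand-in for <code class="language-plaintext highlighter-rouge">constantly</code>, and it assumes lexical binding is in effect and that the function has been byte compiled:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun make-constant-fn (x)  ; hypothetical stand-in for `constantly'
  (lambda () x))

(byte-compile 'make-constant-fn)

(let ((a (make-constant-fn :a))
      (b (make-constant-fn :b)))
  (list (eq (aref a 1) (aref b 1))  ; index 1: the byte-code string
        (aref a 2)                  ; index 2: the constants vector
        (aref b 2)))
;; =&gt; (t [:a] [:b])
</code></pre></div></div>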

<p><code class="language-plaintext highlighter-rouge">constantly</code> has three of the four components of the closure in its own
constant pool. Its job is to construct the constants vector, and then
assemble the whole thing into a byte-code function object (<code class="language-plaintext highlighter-rouge">#[...]</code>).
Here it is with <code class="language-plaintext highlighter-rouge">M-x disassemble</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  make-byte-code
1       constant  128
2       constant  "\300\207"
4       constant  vector
5       stack-ref 4
6       call      1
7       constant  2
8       call      4
9       return
</code></pre></div></div>

<p>(Note: since the byte compiler doesn’t produce perfectly optimal code, I’ve
simplified it for this discussion.)</p>

<p>It pushes most of its constants on the stack. Then the <code class="language-plaintext highlighter-rouge">stack-ref 4</code> (5)
puts <code class="language-plaintext highlighter-rouge">x</code> on the stack. Then it calls <code class="language-plaintext highlighter-rouge">vector</code> to create the constants
vector (6). Finally, it constructs the function object (<code class="language-plaintext highlighter-rouge">#[...]</code>) by
calling <code class="language-plaintext highlighter-rouge">make-byte-code</code> (8).</p>

<p>Since this might be clearer, here’s the same thing expressed back in
terms of Elisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">constantly</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">128</span> <span class="s">"\300\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">x</span><span class="p">)</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p>To see the disassembly of the closure’s byte-code:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:x</span><span class="p">))</span>
</code></pre></div></div>

<p>The result isn’t very surprising:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  :x
1       return
</code></pre></div></div>

<p>Things get a little more interesting when mutation is involved. Consider
this adder closure generator. The closure it returns mutates its
environment every time it’s called:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">total</span> <span class="mi">0</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">cl-incf</span> <span class="nv">total</span><span class="p">))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">count</span> <span class="p">(</span><span class="nv">adder</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">))</span>
<span class="c1">;; =&gt; 3</span>

<span class="p">(</span><span class="nv">adder</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\211\242T\240\207" [(0)] 2]</span>
</code></pre></div></div>

<p>The adder essentially works like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\211\242T\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">0</span><span class="p">))</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p><em>In theory</em>, this closure could operate by mutating its constants vector
directly. But that wouldn’t be much of a <em>constants</em> vector, now would
it!? Instead, mutated variables are <em>boxed</em> inside a cons cell. Closures
don’t share constant vectors, so the main reason for boxing is to share
variables between closures from the same environment. That is, they have
the same cons in each of their constant vectors.</p>

<p>There’s no equivalent Elisp for the closure in <code class="language-plaintext highlighter-rouge">adder</code>, so here’s the
disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  (0)
1       dup
2       car-safe
3       add1
4       setcar
5       return
</code></pre></div></div>

<p>It puts two references to the boxed integer on the stack (<code class="language-plaintext highlighter-rouge">constant</code>,
<code class="language-plaintext highlighter-rouge">dup</code>), unboxes the top one (<code class="language-plaintext highlighter-rouge">car-safe</code>), increments that unboxed
integer, and stores it back in the box (<code class="language-plaintext highlighter-rouge">setcar</code>) via the bottom reference,
leaving the incremented value behind to be returned.</p>

<p>This all gets a little more interesting when closures interact:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fancy-adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">total</span> <span class="mi">0</span><span class="p">))</span>
    <span class="o">`</span><span class="p">(</span><span class="ss">:add</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">cl-incf</span> <span class="nv">total</span><span class="p">))</span>
      <span class="ss">:set</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">setf</span> <span class="nv">total</span> <span class="nv">v</span><span class="p">))</span>
      <span class="ss">:get</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">total</span><span class="p">))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">counter</span> <span class="p">(</span><span class="nv">fancy-adder</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:set</span><span class="p">)</span> <span class="mi">100</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:add</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:add</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:get</span><span class="p">)))</span>
<span class="c1">;; =&gt; 102</span>

<span class="p">(</span><span class="nv">fancy-adder</span><span class="p">)</span>
<span class="c1">;; =&gt; (:add #[0 "\300\211\242T\240\207" [(0)] 2]</span>
<span class="c1">;;     :set #[257 "\300\001\240\207" [(0)] 3]</span>
<span class="c1">;;     :get #[0 "\300\242\207" [(0)] 1])</span>
</code></pre></div></div>

<p>This is starting to resemble object oriented programming, with methods
acting upon fields stored in a common, closed-over environment.</p>

<p>All three closures share a common variable, <code class="language-plaintext highlighter-rouge">total</code>. Since I didn’t
use <code class="language-plaintext highlighter-rouge">print-circle</code>, this isn’t obvious from the last result, but each
of those <code class="language-plaintext highlighter-rouge">(0)</code> conses is the same object. When one closure mutates
the box, they all see the change. Here’s essentially how <code class="language-plaintext highlighter-rouge">fancy-adder</code>
is transformed by the byte compiler:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fancy-adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">box</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">0</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">list</span> <span class="ss">:add</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\211\242T\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span>
          <span class="ss">:set</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">257</span> <span class="s">"\300\001\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">3</span><span class="p">)</span>
          <span class="ss">:get</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\242\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">1</span><span class="p">))))</span>
</code></pre></div></div>
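
<p>One way to confirm that the box really is shared is to compare the first constant of two sibling closures with <code class="language-plaintext highlighter-rouge">eq</code>. The sketch below uses a smaller, hypothetical <code class="language-plaintext highlighter-rouge">shared-box-pair</code> rather than <code class="language-plaintext highlighter-rouge">fancy-adder</code>, and it leans on the same implementation details discussed above (lexical binding, byte compilation, and the captured box sitting at index 0 of the constants vector), so treat the result as an observation, not a guarantee:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun shared-box-pair ()  ; hypothetical, cut-down fancy-adder
  (let ((total 0))
    (list (lambda () (setq total (1+ total)))  ; mutates total
          (lambda () total))))                 ; reads total

(byte-compile 'shared-box-pair)

(let* ((pair (shared-box-pair))
       (box-a (aref (aref (nth 0 pair) 2) 0))  ; first constant of the adder
       (box-b (aref (aref (nth 1 pair) 2) 0))) ; first constant of the getter
  (eq box-a box-b))
;; =&gt; t
</code></pre></div></div>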

<p>The backquote in the original <code class="language-plaintext highlighter-rouge">fancy-adder</code> brings this article full
circle. This final example wouldn’t work correctly if those lambdas
weren’t evaluated properly.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Make Flet Great Again</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/10/27/"/>
    <id>urn:uuid:46576058-9269-392b-96b2-0f434cbb87a2</id>
    <updated>2017-10-27T21:02:58Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Do you long for the days before Emacs 24.3 when <code class="language-plaintext highlighter-rouge">flet</code> was dynamically
scoped? Well, you probably shouldn’t since there are <a href="/blog/2016/12/22/">some very good
reasons</a> for lexical scope. But, still, a dynamically scoped <code class="language-plaintext highlighter-rouge">flet</code>
is situationally really useful, particularly in unit testing. The good
news is that it’s trivial to get this original behavior back without
relying on deprecated functions or third-party packages.</p>

<p>But first, what is <code class="language-plaintext highlighter-rouge">flet</code> and what does it mean for it to be
dynamically scoped? The name stands for “function let” (or something
to that effect). It’s a macro to bind named functions within a local
scope, just as <code class="language-plaintext highlighter-rouge">let</code> binds variables within some local scope. It’s
provided by the now-deprecated <code class="language-plaintext highlighter-rouge">cl</code> package.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl</span><span class="p">)</span>  <span class="c1">; deprecated!</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">norm</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">square</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">v</span> <span class="nv">v</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">square</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nv">square</span> <span class="nv">y</span><span class="p">)))))</span>
</code></pre></div></div>

<p>However, a gotcha here is that <code class="language-plaintext highlighter-rouge">square</code> is visible not just to the body
of <code class="language-plaintext highlighter-rouge">norm</code> but also to any function called directly or indirectly from
the <code class="language-plaintext highlighter-rouge">flet</code> body. That’s dynamic scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">v</span> <span class="mi">2</span><span class="p">)))</span>  <span class="c1">; close enough</span>
  <span class="p">(</span><span class="nv">norm</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">))</span>
<span class="c1">;; -&gt; 4</span>
</code></pre></div></div>

<p>Note: This works because <code class="language-plaintext highlighter-rouge">sqrt</code> hasn’t (yet?) been assigned a bytecode
opcode. One weakness with <code class="language-plaintext highlighter-rouge">flet</code> is that, due to being dynamically
scoped, it is unable to define or override functions whose calls
evaporate under byte compilation. For example, addition:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add-with-flet</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nb">+</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="ss">:override</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">add-with-flet</span><span class="p">)</span>
<span class="c1">;; -&gt; :override</span>

<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">add-with-flet</span><span class="p">))</span>
<span class="c1">;; -&gt; 6</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">+</code> has its own opcode, the function call is eliminated under
byte-compilation and <code class="language-plaintext highlighter-rouge">flet</code> can’t do its job. This is similar to <a href="/blog/2013/01/22/">these
same functions being <em>unadvisable</em></a>.</p>

<h3 id="cl-lib-and-cl-flet">cl-lib and cl-flet</h3>

<p>The <code class="language-plaintext highlighter-rouge">cl-lib</code> package introduced in Emacs 24.3, replacing <code class="language-plaintext highlighter-rouge">cl</code>, adds a
namespace prefix, <code class="language-plaintext highlighter-rouge">cl-</code>, to all of these Common Lisp style functions.
In most cases this was the only change. One exception is <code class="language-plaintext highlighter-rouge">cl-flet</code>,
which has different semantics: It’s lexically scoped, just like in
Common Lisp. Its bindings aren’t visible outside of the <code class="language-plaintext highlighter-rouge">cl-flet</code>
body.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-flet</span> <span class="p">((</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">v</span> <span class="mi">2</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">norm</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">))</span>
<span class="c1">;; -&gt; 2.8284271247461903</span>
</code></pre></div></div>

<p>In most cases <em>this is what you actually want</em>. The old <code class="language-plaintext highlighter-rouge">flet</code> subtly
changes the environment for all functions called directly or
indirectly from its body.</p>

<p>Besides being cleaner and less error prone, <code class="language-plaintext highlighter-rouge">cl-flet</code> also doesn’t
have special exceptions for functions with assigned opcodes. At
macro-expansion time it walks the body, taking its action before the
byte-compiler can interfere.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add-with-cl-flet</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">cl-flet</span> <span class="p">((</span><span class="nb">+</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="ss">:override</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">add-with-cl-flet</span><span class="p">)</span>
<span class="c1">;; -&gt; :override</span>

<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">add-with-cl-flet</span><span class="p">))</span>
<span class="c1">;; -&gt; :override</span>
</code></pre></div></div>

<p>In order for it to work properly, it’s essential that functions are
quoted with sharp-quotes (<code class="language-plaintext highlighter-rouge">#'</code>) so that the macro can tell the
difference between functions and symbols. Just make a general habit of
sharp-quoting functions.</p>
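
<p>As a minimal sketch of the difference, assume a local function with the
hypothetical name <code class="language-plaintext highlighter-rouge">demo-double</code> that has no global definition. The
sharp-quoted form resolves to the <code class="language-plaintext highlighter-rouge">cl-flet</code> binding, while the plain
quoted symbol is looked up globally at call time and fails:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; demo-double is a made-up name with no global definition.
(cl-flet ((demo-double (x) (* 2 x)))
  (list
   ;; #'demo-double is rewritten to the local function.
   (mapcar #'demo-double '(1 2 3))
   ;; 'demo-double is only a symbol, resolved globally by mapcar.
   (condition-case nil
       (mapcar 'demo-double '(1 2 3))
     (void-function :not-defined-globally))))
;; -&gt; ((2 4 6) :not-defined-globally)
</code></pre></div></div>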

<p>In unit testing, temporarily overriding functions for all of Emacs is
useful, so <code class="language-plaintext highlighter-rouge">flet</code> still has some uses. But it’s deprecated!</p>

<h3 id="unit-testing-with-flet">Unit testing with flet</h3>

<p>Since Emacs can do anything, suppose there is an Emacs package that
makes sandwiches. In this package there’s an interactive function to
set the default sandwich cheese.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">default-cheese</span> <span class="ss">'cheddar</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">set-default-cheese</span> <span class="p">(</span><span class="k">type</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">options</span> <span class="o">'</span><span class="p">(</span><span class="s">"cheddar"</span> <span class="s">"swiss"</span> <span class="s">"american"</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">input</span> <span class="p">(</span><span class="nv">completing-read</span> <span class="s">"Cheese: "</span> <span class="nv">options</span> <span class="no">nil</span> <span class="no">t</span><span class="p">)))</span>
     <span class="p">(</span><span class="nb">when</span> <span class="nv">input</span>
       <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">intern</span> <span class="nv">input</span><span class="p">)))))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">default-cheese</span> <span class="k">type</span><span class="p">))</span>
</code></pre></div></div>

<p>Since it’s interactive, it uses <code class="language-plaintext highlighter-rouge">completing-read</code> to prompt the user
for input. A unit test could call this function non-interactively, but
perhaps we’d also like to test the interactive path. The code inside
<code class="language-plaintext highlighter-rouge">interactive</code> occasionally gets messy and may warrant testing. It
would obviously be inconvenient to prompt the user for input during
testing, and it wouldn’t work at all in batch mode (<code class="language-plaintext highlighter-rouge">-batch</code>).</p>

<p>With <code class="language-plaintext highlighter-rouge">flet</code> we can stub out <code class="language-plaintext highlighter-rouge">completing-read</code> just for the unit test:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">test-set-default-cheese</span> <span class="p">()</span>
  <span class="c1">;; protect original with dynamic binding</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">default-cheese</span><span class="p">)</span>
    <span class="c1">;; simulate user entering "american"</span>
    <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">completing-read</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'american</span> <span class="nv">default-cheese</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">default-cheese</code> was defined with <code class="language-plaintext highlighter-rouge">defvar</code>, it’s a
special variable, so <code class="language-plaintext highlighter-rouge">let</code> binds it dynamically even though lexical
scope is enabled in this example. Both of the <em>side effects</em> of the
tested function (setting a global variable and prompting the user) are
captured using a combination of <code class="language-plaintext highlighter-rouge">let</code> and <code class="language-plaintext highlighter-rouge">flet</code>.</p>

<p>Since <code class="language-plaintext highlighter-rouge">cl-flet</code> is lexically scoped, it cannot serve this purpose. If
<code class="language-plaintext highlighter-rouge">flet</code> is deprecated and <code class="language-plaintext highlighter-rouge">cl-flet</code> can’t do the job, what’s the right
way to fix it? The answer lies in <em>generalized variables</em>.</p>

<h3 id="cl-letf">cl-letf</h3>

<p>What’s <em>really</em> happening inside <code class="language-plaintext highlighter-rouge">flet</code> is it’s globally binding a
function name to a different function, evaluating the body, and
rebinding it back to the original definition when the body completes.
It macro-expands to something like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">original</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span>
        <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">))</span>
  <span class="p">(</span><span class="k">unwind-protect</span>
      <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span> <span class="nv">original</span><span class="p">)))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">unwind-protect</code> ensures the original function is rebound even if
the body of the call were to fail. This is very much a <code class="language-plaintext highlighter-rouge">let</code>-like
pattern, and I’m using <code class="language-plaintext highlighter-rouge">symbol-function</code> as a generalized variable via
<code class="language-plaintext highlighter-rouge">setf</code>. Is there a generalized variable version of <code class="language-plaintext highlighter-rouge">let</code>?</p>

<p>Yes! It’s called <code class="language-plaintext highlighter-rouge">cl-letf</code>! In this case the <code class="language-plaintext highlighter-rouge">f</code> suffix is analogous
to the <code class="language-plaintext highlighter-rouge">f</code> suffix in <code class="language-plaintext highlighter-rouge">setf</code>. That form above can be reduced to a more
general form:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-letf</span> <span class="p">(((</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span>
           <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">))</span>
</code></pre></div></div>

<p>And <em>that’s</em> the way to reproduce the dynamically scoped behavior of
<code class="language-plaintext highlighter-rouge">flet</code> since Emacs 24.3. There’s nothing complicated about it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">test-set-default-cheese</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">default-cheese</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">cl-letf</span> <span class="p">(((</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span>
               <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'american</span> <span class="nv">default-cheese</span><span class="p">)))))</span>

</code></pre></div></div>
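
<p>The restore-on-exit behavior can be checked with a small sketch. The
<code class="language-plaintext highlighter-rouge">demo-</code> functions are hypothetical, and the forms are assumed to be
evaluated one at a time in an interactive session:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical functions, for illustration only.
(defun demo-greet () "hello")
(defun demo-call-greet () (demo-greet))

;; The override is visible to callers outside the lexical body.
(cl-letf (((symbol-function 'demo-greet) (lambda () "stubbed")))
  (demo-call-greet))
;; -&gt; "stubbed"

;; An error inside the body still restores the original definition.
(ignore-errors
  (cl-letf (((symbol-function 'demo-greet) (lambda () "stubbed")))
    (error "Simulated failure")))

(demo-call-greet)
;; -&gt; "hello"
</code></pre></div></div>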

<p>Keep in mind that this suffers the exact same problem with
bytecode-assigned functions as <code class="language-plaintext highlighter-rouge">flet</code>, and for exactly the same
reasons. If <code class="language-plaintext highlighter-rouge">completing-read</code> were to ever be assigned its own opcode
then <code class="language-plaintext highlighter-rouge">cl-letf</code> would no longer work for this particular example.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Asynchronous Requests from Emacs Dynamic Modules</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/02/14/"/>
    <id>urn:uuid:00a59e4f-268c-343f-e6c6-bb23cde265de</id>
    <updated>2017-02-14T02:30:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="linux"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p>A few months ago I had a discussion with Vladimir Kazanov about his
<a href="https://github.com/vkazanov/toy-orgfuse">Orgfuse</a> project: a Python script that exposes an Emacs
Org-mode document as a <a href="https://en.wikipedia.org/wiki/Filesystem_in_Userspace">FUSE filesystem</a>. It permits other
programs to navigate the structure of an Org-mode document through the
standard filesystem APIs. I suggested that, with the new dynamic
modules in Emacs 25, Emacs <em>itself</em> could serve a FUSE filesystem. In
fact, support for FUSE services in general could be a package of its
own.</p>

<p>So that’s what he did: <a href="https://github.com/vkazanov/elfuse"><strong>Elfuse</strong></a>. It’s an old joke that
Emacs is an operating system, and here it is handling system calls.</p>

<p>However, there’s a tricky problem to solve, an issue also present in <a href="/blog/2016/11/05/">my
joystick module</a>. Both modules handle asynchronous events —
filesystem requests or joystick events — but Emacs runs the event loop
and owns the main thread. The external events somehow need to feed
into the main event loop. It’s even more difficult with FUSE because
FUSE <em>also</em> wants control of its own thread for its own event loop.
This requires Elfuse to spawn a dedicated FUSE thread and negotiate a
request/response hand-off.</p>

<p>When a filesystem request or joystick event arrives, how does Emacs
know to handle it? The simple and obvious solution is to poll the
module from a timer.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">queue</span> <span class="n">requests</span><span class="p">;</span>

<span class="n">emacs_value</span>
<span class="nf">Frequest_next</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">emacs_value</span> <span class="n">next</span> <span class="o">=</span> <span class="n">Qnil</span><span class="p">;</span>
    <span class="n">queue_lock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">queue_length</span><span class="p">(</span><span class="n">requests</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">request</span> <span class="o">=</span> <span class="n">queue_pop</span><span class="p">(</span><span class="n">requests</span><span class="p">,</span> <span class="n">env</span><span class="p">);</span>
        <span class="n">next</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fin_empty</span><span class="p">,</span> <span class="n">request</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">queue_unlock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And then ask Emacs to check the module every, say, 10ms:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">request--poll</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">next</span> <span class="p">(</span><span class="nv">request-next</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">when</span> <span class="nv">next</span>
      <span class="p">(</span><span class="nv">request-handle</span> <span class="nv">next</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="mf">0.01</span> <span class="nf">#'</span><span class="nv">request--poll</span><span class="p">)</span>
</code></pre></div></div>

<p>Blocking directly on the module’s event pump with Emacs’ thread would
prevent Emacs from doing important things like, you know, <em>being a
text editor</em>. The timer allows it to handle its own events
uninterrupted. It gets the job done, but it’s far from perfect:</p>

<ol>
  <li>
    <p>It imposes an arbitrary latency on handling requests. Up to one full
poll period could pass before a request is handled.</p>
  </li>
  <li>
    <p>Polling the module 100 times per second is inefficient. Unless you
really enjoy recharging your laptop, that’s no good.</p>
  </li>
</ol>

<p>The poll period is a sliding trade-off between latency and battery
life. If only there were some mechanism to, ahem, <em>signal</em> the Emacs
thread, informing it that a request is waiting…</p>

<h3 id="sigusr1">SIGUSR1</h3>

<p>Emacs Lisp programs can handle the POSIX SIGUSR1 and SIGUSR2 signals,
which is exactly the mechanism we need. The interface is a “key”
binding on <code class="language-plaintext highlighter-rouge">special-event-map</code>, the keymap that handles these kinds of
events. When the signal arrives, Emacs queues it up for the main event
loop.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">define-key</span> <span class="nv">special-event-map</span> <span class="nv">[sigusr1]</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">request-handle</span> <span class="p">(</span><span class="nv">request-next</span><span class="p">))))</span>
</code></pre></div></div>
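
<p>A binding like this can be exercised without any module by having Emacs
signal itself. This is only a sketch for a POSIX system: the counter
<code class="language-plaintext highlighter-rouge">demo-sigusr1-count</code> is a hypothetical stand-in for real request
handling, and evaluating it replaces any existing <code class="language-plaintext highlighter-rouge">[sigusr1]</code> binding.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar demo-sigusr1-count 0
  "Number of SIGUSR1 events handled so far (hypothetical counter).")

(define-key special-event-map [sigusr1]
  (lambda ()
    (interactive)
    (setq demo-sigusr1-count (1+ demo-sigusr1-count))))

;; Emacs sends SIGUSR1 to itself. The binding above runs the next
;; time the command loop reads an event, not immediately.
(signal-process (emacs-pid) 'sigusr1)
</code></pre></div></div>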

<p>The module blocks on its own thread on its own event pump. When a
request arrives, it queues the request, rings the bell for Emacs to
come handle it (<code class="language-plaintext highlighter-rouge">raise()</code>), and waits on a semaphore. For illustration
purposes, assume the module reads requests from and writes responses
to a file descriptor, like a socket.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">event_fd</span> <span class="o">=</span> <span class="cm">/* ... */</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">request</span> <span class="n">request</span><span class="p">;</span>
<span class="n">sem_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">sem</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

<span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
    <span class="cm">/* Blocking read for request event */</span>
    <span class="n">read</span><span class="p">(</span><span class="n">event_fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">event</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">event</span><span class="p">));</span>

    <span class="cm">/* Put request on the queue */</span>
    <span class="n">queue_lock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="n">queue_push</span><span class="p">(</span><span class="n">requests</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">request</span><span class="p">);</span>
    <span class="n">queue_unlock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="n">raise</span><span class="p">(</span><span class="n">SIGUSR1</span><span class="p">);</span>  <span class="c1">// TODO: Should raise() go inside the lock?</span>

    <span class="cm">/* Wait for Emacs */</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">sem_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">sem</span><span class="p">))</span>
        <span class="p">;</span>

    <span class="cm">/* Reply with Emacs' response */</span>
    <span class="n">write</span><span class="p">(</span><span class="n">event_fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">response</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">response</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">sem_wait()</code> is in a loop because signals will wake it up
prematurely, making it return -1 with <code class="language-plaintext highlighter-rouge">errno</code> set to <code class="language-plaintext highlighter-rouge">EINTR</code>. In fact,
it may even wake up due to its own signal on the line before. This is
the only way this particular use of <code class="language-plaintext highlighter-rouge">sem_wait()</code> might fail, so
there’s no need to check <code class="language-plaintext highlighter-rouge">errno</code>.</p>

<p>If there are multiple module threads making requests to the same
global queue, the lock is necessary to protect the queue. The
semaphore is only for blocking the thread until Emacs has finished
writing its particular response. Each thread has its own semaphore.</p>

<p>When Emacs is done writing the response, it releases the module thread
by incrementing the semaphore. It might look something like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">emacs_value</span>
<span class="nf">Frequest_complete</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">request</span> <span class="o">*</span><span class="n">request</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">request</span><span class="p">)</span>
        <span class="n">sem_post</span><span class="p">(</span><span class="o">&amp;</span><span class="n">request</span><span class="o">-&gt;</span><span class="n">sem</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">Qnil</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The top-level handler dispatches to the specific request handler,
calling <code class="language-plaintext highlighter-rouge">request-complete</code> above when it’s done.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">request-handle</span> <span class="p">(</span><span class="nv">next</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">condition-case</span> <span class="nv">e</span>
      <span class="p">(</span><span class="nv">cl-ecase</span> <span class="p">(</span><span class="nv">request-type</span> <span class="nv">next</span><span class="p">)</span>
        <span class="p">(</span><span class="ss">:open</span>  <span class="p">(</span><span class="nv">request-handle-open</span>  <span class="nv">next</span><span class="p">))</span>
        <span class="p">(</span><span class="ss">:close</span> <span class="p">(</span><span class="nv">request-handle-close</span> <span class="nv">next</span><span class="p">))</span>
        <span class="p">(</span><span class="ss">:read</span>  <span class="p">(</span><span class="nv">request-handle-read</span>  <span class="nv">next</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">error</span> <span class="p">(</span><span class="nv">request-respond-as-error</span> <span class="nv">next</span> <span class="nv">e</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">request-complete</span> <span class="nv">next</span><span class="p">))</span>
</code></pre></div></div>

<p>This SIGUSR1+semaphore mechanism is roughly how Elfuse currently
processes requests.</p>

<h3 id="making-it-work-on-windows">Making it work on Windows</h3>

<p>Windows doesn’t have signals. This isn’t a problem for Elfuse since
Windows doesn’t have FUSE either. Nor does it matter for Joymacs since
XInput isn’t event-driven and always requires polling. But someday
someone will need this mechanism for a dynamic module on Windows.</p>

<p>Fortunately there’s a solution: <em>input language change</em> events,
<code class="language-plaintext highlighter-rouge">WM_INPUTLANGCHANGE</code>. It’s also on <code class="language-plaintext highlighter-rouge">special-event-map</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">define-key</span> <span class="nv">special-event-map</span> <span class="nv">[language-change]</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">request-handle</span> <span class="p">(</span><span class="nv">request-next</span><span class="p">))))</span>
</code></pre></div></div>

<p>Instead of <code class="language-plaintext highlighter-rouge">raise()</code> (or <code class="language-plaintext highlighter-rouge">pthread_kill()</code>), broadcast the window event
with <code class="language-plaintext highlighter-rouge">PostMessage()</code>. Outside of invoking the <code class="language-plaintext highlighter-rouge">language-change</code> key
binding, Emacs will ignore the event because WPARAM is 0 — it doesn’t
belong to any particular window. We don’t <em>really</em> want to change the
input language, after all.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PostMessageA</span><span class="p">(</span><span class="n">HWND_BROADCAST</span><span class="p">,</span> <span class="n">WM_INPUTLANGCHANGE</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>

<p>Naturally you’ll also need to replace the POSIX threading primitives
with the Windows versions (<code class="language-plaintext highlighter-rouge">CreateThread()</code>, <code class="language-plaintext highlighter-rouge">CreateSemaphore()</code>,
etc.). With a bit of abstraction in the right places, it should be
pretty easy to support both POSIX and Windows in these asynchronous
dynamic module events.</p>
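
<p>On the Emacs Lisp side, that abstraction might be as small as choosing
the event name at load time. The following is a sketch, not how any
existing module does it: <code class="language-plaintext highlighter-rouge">demo-request-dispatch</code> is a hypothetical name,
and it assumes <code class="language-plaintext highlighter-rouge">request-next</code> returns <code class="language-plaintext highlighter-rouge">nil</code> when the queue is empty, as
in the polling example. It drains the queue in a loop because standard
POSIX signals are not queued, so several wake-ups may collapse into a
single event.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; request-next and request-handle are the functions from the
;; examples above; demo-request-dispatch is a hypothetical name.
(defun demo-request-dispatch ()
  "Handle every request currently waiting in the module's queue."
  (interactive)
  (let (next)
    (while (setq next (request-next))
      (request-handle next))))

;; Pick the wake-up event for the current platform.
(define-key special-event-map
  (if (eq system-type 'windows-nt)
      [language-change]
    [sigusr1])
  #'demo-request-dispatch)
</code></pre></div></div>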

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>How to Write Fast(er) Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/01/30/"/>
    <id>urn:uuid:cee07e3d-08cc-3465-1a29-c1e30b5bd0e2</id>
    <updated>2017-01-30T21:08:19Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Not everything written in Emacs Lisp needs to be fast. Most of Emacs
itself — around 82% — is written in Emacs Lisp <em>because</em> those parts
are generally not performance-critical. Otherwise these functions
would be built-ins written in C. Extensions to Emacs don’t have a
choice and — outside of a few exceptions like <a href="/blog/2016/11/05/">dynamic modules</a>
and inferior processes — must be written in Emacs Lisp, including
their performance-critical bits. Common performance hot spots are
automatic indentation, <a href="https://github.com/mooz/js2-mode">AST parsing</a>, and <a href="/blog/2016/12/11/">interactive
completion</a>.</p>

<p>Here are 5 guidelines, each very specific to Emacs Lisp, that will
result in faster code. The non-intrusive guidelines could be applied
at all times as a matter of style — choosing one equally expressive
and maintainable form over another just because it performs better.</p>

<p>There’s one caveat: These guidelines are focused on Emacs 25.1 and
“nearby” versions. Emacs is constantly evolving. Changes to the
<a href="/blog/2014/01/04/">virtual machine</a> and byte-code compiler may transform
currently-slow expressions into fast code, obsoleting some of these
guidelines. In the future I’ll add notes to this article for anything
that changes.</p>

<h3 id="1-use-lexical-scope">(1) Use lexical scope</h3>

<p>This guideline refers to the following being the first line of every
Emacs Lisp source file you write:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
</code></pre></div></div>

<p>This point is worth mentioning again and again. Not only will <a href="/blog/2016/12/22/">your
code be more correct</a>, it will be measurably faster. Dynamic
scope is still opt-in through the explicit use of <em>special variables</em>,
so there’s absolutely no reason not to be using lexical scope. If
you’ve written clean, dynamic scope code, then switching to lexical
scope won’t have any effect on its behavior.</p>

<p>Along similar lines, special variables are a lot slower than local,
lexical variables. Only use them when necessary.</p>
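
<p>As a small sketch of how the two coexist in one file, assuming
<code class="language-plaintext highlighter-rouge">lexical-binding</code> is in effect and using hypothetical <code class="language-plaintext highlighter-rouge">demo-</code> names:
only the variable declared with <code class="language-plaintext highlighter-rouge">defvar</code> is dynamically bound, and
everything else is lexical and can be captured by closures.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; Hypothetical names, for illustration only.
(defvar demo-indent-width 4
  "A special variable: always dynamically bound.")

(defun demo-indent (levels)
  (* levels demo-indent-width))

(defun demo-make-adder (x)
  ;; X is lexical, so the closure captures it.
  (lambda (y) (+ x y)))

(let ((demo-indent-width 2))   ; dynamic: visible inside demo-indent
  (demo-indent 3))
;; -&gt; 6

(funcall (demo-make-adder 10) 5)
;; -&gt; 15
</code></pre></div></div>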

<h3 id="2-prefer-built-in-functions">(2) Prefer built-in functions</h3>

<p>Built-in functions are written in C and are, as expected,
significantly faster than the equivalent written in Emacs Lisp.
Complete as much work as possible inside built-in functions, even if
it might mean taking more conceptual steps overall.</p>
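
<p>One way to check a claim like this on your own machine is the
<code class="language-plaintext highlighter-rouge">benchmark</code> library that ships with Emacs. The sketch below compares a
hand-written loop, under the hypothetical name <code class="language-plaintext highlighter-rouge">demo-length</code>, against the
built-in <code class="language-plaintext highlighter-rouge">length</code>. The list size and repetition count are arbitrary, and
no particular timings are implied.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun demo-length (list)
  "Count the elements of LIST with an explicit loop (hypothetical helper)."
  (let ((n 0))
    (while list
      (setq n (1+ n)
            list (cdr list)))
    n))

;; Byte-compile the helper so the comparison is not against the
;; interpreter.
(byte-compile 'demo-length)

(let ((items (make-list 100000 nil)))
  (list :lisp     (benchmark-run 100 (demo-length items))
        :built-in (benchmark-run 100 (length items))))
;; Each result is a list: (elapsed-seconds gc-count gc-seconds)
</code></pre></div></div>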

<p>For example, what’s the fastest way to accumulate a list of items?
That is, new items go on the tail but, for algorithm reasons, the list
must be constructed from the head.</p>

<p>You might be tempted to keep track of the tail of the list, appending
new elements directly to the tail with <code class="language-plaintext highlighter-rouge">setcdr</code> (via <code class="language-plaintext highlighter-rouge">setf</code> below).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fib-track-tail</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">a</span> <span class="mi">0</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">b</span> <span class="mi">1</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">head</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">tail</span> <span class="nv">head</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">_</span> <span class="nv">n</span> <span class="nv">head</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">cl-psetf</span> <span class="nv">a</span> <span class="nv">b</span>
                <span class="nv">b</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">tail</span><span class="p">)</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">b</span><span class="p">)</span>
            <span class="nv">tail</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">tail</span><span class="p">)))))</span>

<span class="p">(</span><span class="nv">fib-track-tail</span> <span class="mi">8</span><span class="p">)</span>
<span class="c1">;; =&gt; (1 1 2 3 5 8 13 21 34)</span>
</code></pre></div></div>

<p>Actually, it’s much faster to construct the list in reverse, then
destructively reverse it at the end.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fib-nreverse</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">a</span> <span class="mi">0</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">b</span> <span class="mi">1</span><span class="p">)</span>
         <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">_</span> <span class="nv">n</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="nb">list</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">cl-psetf</span> <span class="nv">a</span> <span class="nv">b</span>
                <span class="nv">b</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">push</span> <span class="nv">b</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>It might not look it, but <code class="language-plaintext highlighter-rouge">nreverse</code> is <em>very</em> fast. Not only is it a
built-in, it’s got its own opcode. Using <code class="language-plaintext highlighter-rouge">push</code> in a loop, then
finishing with <code class="language-plaintext highlighter-rouge">nreverse</code> is the canonical and fastest way to
accumulate a list of items.</p>

<p>In <code class="language-plaintext highlighter-rouge">fib-track-tail</code>, the added complexity of tracking the tail in
Emacs Lisp is much slower than zipping over the entire list a second
time in C.</p>
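
<p>To check this on your own machine, here’s a rough sketch using
<code class="language-plaintext highlighter-rouge">benchmark-run</code> from the built-in <code class="language-plaintext highlighter-rouge">benchmark</code> library. The helpers
<code class="language-plaintext highlighter-rouge">my-collect-tail</code> and <code class="language-plaintext highlighter-rouge">my-collect-nreverse</code> are hypothetical stand-ins
for the two strategies. Timings depend on the Emacs version and the
machine, so none are quoted here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun my-collect-tail (n)
  "Collect 0..N-1 by tracking the tail with `setcdr'."
  (let* ((head (list nil))   ; dummy first cell, dropped at the end
         (tail head))
    (dotimes (i n (cdr head))
      (setcdr tail (list i))
      (setq tail (cdr tail)))))

(defun my-collect-nreverse (n)
  "Collect 0..N-1 with `push', then `nreverse'."
  (let ((result ()))
    (dotimes (i n (nreverse result))
      (push i result))))

;; Byte-compile first, otherwise the interpreter swamps the difference.
(byte-compile 'my-collect-tail)
(byte-compile 'my-collect-nreverse)

;; Each call returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(benchmark-run 100 (my-collect-tail 10000))
(benchmark-run 100 (my-collect-nreverse 10000))
</code></pre></div></div>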

<h3 id="3-avoid-unnecessary-lambda-functions">(3) Avoid unnecessary lambda functions</h3>

<p>I’m talking about <code class="language-plaintext highlighter-rouge">mapcar</code> and friends.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Slower</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">expt-list</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">e</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">expt</span> <span class="nv">x</span> <span class="nv">e</span><span class="p">))</span> <span class="nb">list</span><span class="p">))</span>
</code></pre></div></div>

<p>Listen, I know you love <a href="https://github.com/magnars/dash.el">dash.el</a> and higher order functions,
but <em>this habit ain’t cheap</em>. The byte-code compiler does not know how
to inline these lambdas, so there’s an additional per-element function
call overhead.</p>

<p>Worse, if you’re using lexical scope like I told you, the above
example forms a <em>closure</em> over <code class="language-plaintext highlighter-rouge">e</code>. This means a new function object
is created (e.g. <code class="language-plaintext highlighter-rouge">make-byte-code</code>) each time <code class="language-plaintext highlighter-rouge">expt-list</code> is called. To
be clear, I don’t mean that the lambda is recompiled each time — the
same byte-code string is shared between all instances of the same
lambda. A unique function vector (<code class="language-plaintext highlighter-rouge">#[...]</code>) and constants vector are
allocated and initialized each time <code class="language-plaintext highlighter-rouge">expt-list</code> is invoked.</p>

<p>Related mini-guideline: Don’t create any more garbage than strictly
necessary in performance-critical code.</p>
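
<p>To see the allocation for yourself, here’s a small sketch that assumes
<code class="language-plaintext highlighter-rouge">lexical-binding</code> is enabled. The made-up helper <code class="language-plaintext highlighter-rouge">my-power-fn</code> returns
the same kind of lambda that <code class="language-plaintext highlighter-rouge">expt-list</code> passes to <code class="language-plaintext highlighter-rouge">mapcar</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun my-power-fn (e)
  ;; The lambda captures `e', so each call builds a fresh closure.
  (lambda (x) (expt x e)))

(funcall (my-power-fn 2) 7)
;; =&gt; 49

;; Same code, same captured value, yet two distinct objects.
(eq (my-power-fn 2) (my-power-fn 2))
;; =&gt; nil
</code></pre></div></div>

<p>Each of those objects becomes garbage as soon as its caller is done
with it.</p>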

<p>Compare to an implementation with an explicit loop, using the
<code class="language-plaintext highlighter-rouge">nreverse</code> list-accumulation technique.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">expt-list-fast</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">e</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">()))</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">x</span> <span class="nb">list</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="nv">result</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">expt</span> <span class="nv">x</span> <span class="nv">e</span><span class="p">)</span> <span class="nv">result</span><span class="p">))))</span>
</code></pre></div></div>

<ul>
  <li>No unnecessary garbage is created.</li>
  <li>No unnecessary per-element function calls.</li>
</ul>

<p>This is the fastest possible definition for this function, and it’s
what you need to use in performance-critical code.</p>

<p>Personally I prefer the list comprehension approach, using <code class="language-plaintext highlighter-rouge">cl-loop</code>
from <code class="language-plaintext highlighter-rouge">cl-lib</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">expt-list-fast</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">e</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">x</span> <span class="nv">in</span> <span class="nb">list</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nb">expt</span> <span class="nv">x</span> <span class="nv">e</span><span class="p">)))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cl-loop</code> macro will expand into essentially the previous
definition, making them practically equivalent. It takes some getting
used to, but writing efficient loops is a whole lot less tedious with
<code class="language-plaintext highlighter-rouge">cl-loop</code>.</p>

<p>Hidden lambdas count, too. In Emacs 24.4 and earlier, <code class="language-plaintext highlighter-rouge">catch</code>/<code class="language-plaintext highlighter-rouge">throw</code> is implemented by
converting the body of the <code class="language-plaintext highlighter-rouge">catch</code> into a lambda function and calling
it. If code inside the <code class="language-plaintext highlighter-rouge">catch</code> accesses a variable outside the <code class="language-plaintext highlighter-rouge">catch</code>
(very likely), then, in lexical scope, it turns into a closure,
resulting in a garbage function object just like before.</p>

<p>In Emacs 24.5 and later, the byte-code compiler uses a new opcode,
<code class="language-plaintext highlighter-rouge">pushcatch</code>. It’s a whole lot more efficient, and there’s no longer a
reason to shy away from <code class="language-plaintext highlighter-rouge">catch</code>/<code class="language-plaintext highlighter-rouge">throw</code> in performance-critical code.
This is important because it’s often the only way to perform an early
bailout.</p>
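
<p>As a minimal sketch of such an early bailout, the hypothetical
<code class="language-plaintext highlighter-rouge">my-first-negative</code> below stops scanning the moment it has an answer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-first-negative (numbers)
  "Return the first negative element of NUMBERS, or nil if none."
  (catch 'done
    (dolist (x numbers)
      (when (&lt; x 0)
        ;; Unwind straight out of the loop with the result.
        (throw 'done x)))))

(my-first-negative '(3 1 -4 1 -5))
;; =&gt; -4
</code></pre></div></div>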

<h3 id="4-prefer-using-functions-with-dedicated-opcodes">(4) Prefer using functions with dedicated opcodes</h3>

<p>When following the guideline about using built-in functions, you might
have several to pick from. Some built-in functions have dedicated
virtual machine opcodes, making them much faster to invoke. Prefer
these functions when possible.</p>

<p>How can you tell when a function has an assigned opcode? Take a peek
at the <code class="language-plaintext highlighter-rouge">byte-defop</code> listings in <a href="https://github.com/emacs-mirror/emacs/blob/master/lisp/emacs-lisp/bytecomp.el">bytecomp.el</a>. Optimization often
involves getting into the weeds, so don’t be shy.</p>

<p>For example, the <code class="language-plaintext highlighter-rouge">assq</code> and <code class="language-plaintext highlighter-rouge">assoc</code> functions search for a matching
key in an association list (alist). Both are built-in functions, and
the only difference is that the former compares keys with <code class="language-plaintext highlighter-rouge">eq</code> (e.g.
symbol or integer keys) and the latter with <code class="language-plaintext highlighter-rouge">equal</code> (typically string
keys). The difference in performance between <code class="language-plaintext highlighter-rouge">eq</code> and <code class="language-plaintext highlighter-rouge">equal</code> isn’t as
important as another factor: <code class="language-plaintext highlighter-rouge">assq</code> has its own opcode (158).</p>

<p>This means in performance-critical code you should prefer <code class="language-plaintext highlighter-rouge">assq</code>,
perhaps even going as far as restructuring your alists specifically to
have <code class="language-plaintext highlighter-rouge">eq</code> keys. That last step is probably a trade-off, which means
you’ll want to make some benchmarks to help with that decision.</p>

<p>Another example is <code class="language-plaintext highlighter-rouge">eq</code>, <code class="language-plaintext highlighter-rouge">=</code>, <code class="language-plaintext highlighter-rouge">eql</code>, and <code class="language-plaintext highlighter-rouge">equal</code>. Some macros and
functions use <code class="language-plaintext highlighter-rouge">eql</code>, especially <code class="language-plaintext highlighter-rouge">cl-lib</code> which inherits <code class="language-plaintext highlighter-rouge">eql</code> as a
default from Common Lisp. Take <code class="language-plaintext highlighter-rouge">cl-case</code>, which is like <code class="language-plaintext highlighter-rouge">switch</code> from
the C family of languages. It compares elements with <code class="language-plaintext highlighter-rouge">eql</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">op-apply</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-case</span> <span class="nv">op</span>
    <span class="p">(</span><span class="ss">:norm</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">)))</span>
    <span class="p">(</span><span class="ss">:disp</span> <span class="p">(</span><span class="nb">abs</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
    <span class="p">(</span><span class="ss">:isin</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">sin</span> <span class="nv">a</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cl-case</code> expands into a <code class="language-plaintext highlighter-rouge">cond</code>. Since Emacs byte-code lacks
support for jump tables, there’s not much room for cleverness.</p>

<p><strong>Update</strong>: Emacs 26.1, released May 2018, introduced a jump table
opcode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">op-apply</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">eql</span> <span class="nv">op</span> <span class="ss">:norm</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eql</span> <span class="nv">op</span> <span class="ss">:disp</span><span class="p">)</span> <span class="p">(</span><span class="nb">abs</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eql</span> <span class="nv">op</span> <span class="ss">:isin</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">sin</span> <span class="nv">a</span><span class="p">)))))</span>
</code></pre></div></div>

<p>It turns out <code class="language-plaintext highlighter-rouge">eql</code> is pretty much always the worst choice for
<code class="language-plaintext highlighter-rouge">cl-case</code>. Of the four equality functions I listed, the only one
lacking an opcode is <code class="language-plaintext highlighter-rouge">eql</code>. A faster definition would use <code class="language-plaintext highlighter-rouge">eq</code>. (In
theory, <code class="language-plaintext highlighter-rouge">cl-case</code> <em>could</em> have done this itself because it knows all
the keys are symbols.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">op-apply</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">:norm</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">:disp</span><span class="p">)</span> <span class="p">(</span><span class="nb">abs</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">:isin</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">sin</span> <span class="nv">a</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Fortunately <code class="language-plaintext highlighter-rouge">eq</code> can safely compare integers in Emacs Lisp, at
least within fixnum range (the bignums added in Emacs 27 need <code class="language-plaintext highlighter-rouge">eql</code>). You only
need <code class="language-plaintext highlighter-rouge">eql</code> when comparing symbols, integers, and floats all at once,
which is unusual.</p>
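
<p>A few quick evaluations illustrate the difference. These are
properties of the predicates themselves, independent of <code class="language-plaintext highlighter-rouge">cl-case</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(eq :norm :norm)  ; =&gt; t    keywords are interned symbols
(eq 42 42)        ; =&gt; t    small integers (fixnums)
(eql 1.5 1.5)     ; =&gt; t    eql compares floats by value
(eql 1 1.0)       ; =&gt; nil  an integer is never eql to a float
</code></pre></div></div>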

<h3 id="5-unroll-loops-using-andor">(5) Unroll loops using and/or</h3>

<p>Consider the following function which checks its argument against a
list of numbers, bailing out on the first match. I used <code class="language-plaintext highlighter-rouge">%</code> instead of
<code class="language-plaintext highlighter-rouge">mod</code> since the former has an opcode (166) and the latter does not.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">detect</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="k">catch</span> <span class="ss">'found</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">f</span> <span class="o">'</span><span class="p">(</span><span class="mi">2</span> <span class="mi">3</span> <span class="mi">5</span> <span class="mi">7</span> <span class="mi">11</span> <span class="mi">13</span> <span class="mi">17</span> <span class="mi">19</span> <span class="mi">23</span> <span class="mi">29</span> <span class="mi">31</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="nv">f</span><span class="p">))</span>
        <span class="p">(</span><span class="k">throw</span> <span class="ss">'found</span> <span class="nv">f</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The byte-code compiler doesn’t know how to unroll loops. Fortunately
that’s something we can do for ourselves using <code class="language-plaintext highlighter-rouge">and</code> and <code class="language-plaintext highlighter-rouge">or</code>. The
compiler will turn this into clean, efficient jumps in the byte-code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">detect-unrolled</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">2</span><span class="p">))</span> <span class="mi">2</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">3</span><span class="p">))</span> <span class="mi">3</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">5</span><span class="p">))</span> <span class="mi">5</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">7</span><span class="p">))</span> <span class="mi">7</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">11</span><span class="p">))</span> <span class="mi">11</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">13</span><span class="p">))</span> <span class="mi">13</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">17</span><span class="p">))</span> <span class="mi">17</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">19</span><span class="p">))</span> <span class="mi">19</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">23</span><span class="p">))</span> <span class="mi">23</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">29</span><span class="p">))</span> <span class="mi">29</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">31</span><span class="p">))</span> <span class="mi">31</span><span class="p">)))</span>
</code></pre></div></div>

<p>In Emacs 24.4 and earlier with the old-fashioned lambda-based <code class="language-plaintext highlighter-rouge">catch</code>,
the unrolled definition is seven times faster. With the faster
<code class="language-plaintext highlighter-rouge">pushcatch</code>-based <code class="language-plaintext highlighter-rouge">catch</code> it’s about twice as fast. This means the
loop overhead accounts for about half the work of the first definition
of this function.</p>

<p><strong>Update</strong>: It was pointed out in the comments that this particular
example is equivalent to a <code class="language-plaintext highlighter-rouge">cond</code>. That’s literally true all the way
down to the byte-code, and it would be a clearer way to express the
unrolled code. In real code it’s often not <em>quite</em> equivalent.</p>
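
<p>For reference, here’s a sketch of that <code class="language-plaintext highlighter-rouge">cond</code> form, shortened to the
first four primes. The name <code class="language-plaintext highlighter-rouge">detect-cond</code> is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun detect-cond (x)
  ;; Each clause tests one divisor and returns it on a match.
  (cond ((= 0 (% x 2)) 2)
        ((= 0 (% x 3)) 3)
        ((= 0 (% x 5)) 5)
        ((= 0 (% x 7)) 7)))

(detect-cond 35)
;; =&gt; 5
</code></pre></div></div>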

<p>Unlike some of the other guidelines, this is certainly something you’d
only want to do in code you know for sure is performance-critical.
Maintaining unrolled code is tedious and error-prone.</p>

<p>I’ve had the most success with this approach not by unrolling these
loops myself, but by <a href="/blog/2016/12/27/">using a macro</a>, or <a href="/blog/2016/12/11/">similar</a>, to
generate the unrolled form.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">with-detect</span> <span class="p">(</span><span class="nv">var</span> <span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">e</span> <span class="nv">in</span> <span class="nb">list</span>
           <span class="nv">collect</span> <span class="o">`</span><span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="o">,</span><span class="nv">var</span> <span class="o">,</span><span class="nv">e</span><span class="p">))</span> <span class="o">,</span><span class="nv">e</span><span class="p">)</span> <span class="nv">into</span> <span class="nv">conditions</span>
           <span class="nv">finally</span> <span class="nb">return</span> <span class="o">`</span><span class="p">(</span><span class="nb">or</span> <span class="o">,@</span><span class="nv">conditions</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">detect-unrolled</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-detect</span> <span class="nv">x</span> <span class="p">(</span><span class="mi">2</span> <span class="mi">3</span> <span class="mi">5</span> <span class="mi">7</span> <span class="mi">11</span> <span class="mi">13</span> <span class="mi">17</span> <span class="mi">19</span> <span class="mi">23</span> <span class="mi">29</span> <span class="mi">31</span><span class="p">)))</span>
</code></pre></div></div>
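
<p>To check what such a macro generates, expand it by hand. The sketch
below repeats the macro so that it can be evaluated on its own, and
uses a shorter list to keep the expansion readable.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defmacro with-detect (var list)
  (cl-loop for e in list
           collect `(and (= 0 (% ,var ,e)) ,e) into conditions
           finally return `(or ,@conditions)))

;; Inspect the generated form without evaluating it.
(macroexpand '(with-detect x (2 3 5)))
;; =&gt; (or (and (= 0 (% x 2)) 2)
;;        (and (= 0 (% x 3)) 3)
;;        (and (= 0 (% x 5)) 5))
</code></pre></div></div>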

<h3 id="how-can-i-find-more-optimization-opportunities-myself">How can I find more optimization opportunities myself?</h3>

<p>Use <code class="language-plaintext highlighter-rouge">M-x disassemble</code> to inspect the byte-code for your own hot spots.
Observe how the byte-code changes in response to changes in your
functions. Take note of the sorts of forms that allow the byte-code
compiler to produce the best code, and then exploit it where you can.</p>
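
<p>The same inspection works from Lisp. The sketch below compares the
byte-code for <code class="language-plaintext highlighter-rouge">assq</code> and <code class="language-plaintext highlighter-rouge">assoc</code>. The helper <code class="language-plaintext highlighter-rouge">my-disassembly</code> is
hypothetical, and it assumes the optional buffer argument of
<code class="language-plaintext highlighter-rouge">disassemble</code> to capture the listing as a string. The exact listing
varies between Emacs versions, so none is reproduced here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-disassembly (function)
  "Return the byte-code listing of FUNCTION as a string."
  (with-temp-buffer
    ;; Compile the lambda, then write its listing into this buffer.
    (disassemble (byte-compile function) (current-buffer))
    (buffer-string)))

;; Look for a dedicated instruction in the first listing and a
;; generic call in the second.
(my-disassembly '(lambda (key alist) (assq key alist)))
(my-disassembly '(lambda (key alist) (assoc key alist)))
</code></pre></div></div>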

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Domain-Specific Language Compilation in Elfeed</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/27/"/>
    <id>urn:uuid:6a6cd6a2-b44d-35b5-503c-c496d9094ac0</id>
    <updated>2016-12-27T21:46:30Z</updated>
    <category term="elfeed"/><category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Last night I pushed another performance enhancement for Elfeed, this
time reducing the time spent parsing feeds. It’s accomplished by
compiling, during macro expansion, a jQuery-like domain-specific
language within Elfeed.</p>

<h3 id="heuristic-parsing">Heuristic parsing</h3>

<p>Given the nature of the domain — <a href="/blog/2013/09/23/">an under-specified standard</a>
and a lack of robust adherence — feed parsing is much more heuristic
than strict. Sure, everyone’s feed XML is strictly conforming since
virtually no feed reader tolerates invalid XML (thank you, XML
libraries), but, for the schema, the situation resembles the <em>de
facto</em> looseness of HTML. Sometimes important or required information
is missing, or is only available in <a href="https://www.intertwingly.net/wiki/pie/DublinCore">a different namespace</a>.
Sometimes, especially in the case of timestamps, it’s in the wrong
format, or encoded incorrectly, or ambiguous. It’s real world data.</p>

<p>To get a particular piece of information, Elfeed looks in a number of
different places within the feed, starting with the preferred source
and stopping when the information is found. For example, to find the
date of an Atom entry, Elfeed searches for these elements in
order:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">&lt;published&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;updated&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;date&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;modified&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;issued&gt;</code></li>
</ol>

<p>Failing to find any of these elements, or if no parsable date is
found, it settles on the current time. Only the <code class="language-plaintext highlighter-rouge">updated</code> element is
required, but <code class="language-plaintext highlighter-rouge">published</code> usually has the desired information, so it
goes first. The last three are only valid for another namespace, but
are useful fallbacks.</p>

<p>Before Elfeed even starts this search, the XML text is parsed into an
s-expression using <code class="language-plaintext highlighter-rouge">xml-parse-region</code> — a pure Elisp XML parser
included in Emacs. The search is made over the resulting s-expression.</p>

<p>For example, here’s a sample <a href="https://tools.ietf.org/html/rfc4287">from the Atom specification</a>.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="utf-8"?&gt;</span>
<span class="nt">&lt;feed</span> <span class="na">xmlns=</span><span class="s">"http://www.w3.org/2005/Atom"</span><span class="nt">&gt;</span>

  <span class="nt">&lt;title&gt;</span>Example Feed<span class="nt">&lt;/title&gt;</span>
  <span class="nt">&lt;link</span> <span class="na">href=</span><span class="s">"http://example.org/"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;updated&gt;</span>2003-12-13T18:30:02Z<span class="nt">&lt;/updated&gt;</span>
  <span class="nt">&lt;author&gt;</span>
    <span class="nt">&lt;name&gt;</span>John Doe<span class="nt">&lt;/name&gt;</span>
  <span class="nt">&lt;/author&gt;</span>
  <span class="nt">&lt;id&gt;</span>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6<span class="nt">&lt;/id&gt;</span>

  <span class="nt">&lt;entry&gt;</span>
    <span class="nt">&lt;title&gt;</span>Atom-Powered Robots Run Amok<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;link</span> <span class="na">rel=</span><span class="s">"alternate"</span> <span class="na">href=</span><span class="s">"http://example.org/2003/12/13/atom03"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;id&gt;</span>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a<span class="nt">&lt;/id&gt;</span>
    <span class="nt">&lt;updated&gt;</span>2003-12-13T18:30:02Z<span class="nt">&lt;/updated&gt;</span>
    <span class="nt">&lt;summary&gt;</span>Some text.<span class="nt">&lt;/summary&gt;</span>
  <span class="nt">&lt;/entry&gt;</span>

<span class="nt">&lt;/feed&gt;</span>
</code></pre></div></div>

<p>Which is parsed into this s-expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="nv">feed</span> <span class="p">((</span><span class="nv">xmlns</span> <span class="o">.</span> <span class="s">"http://www.w3.org/2005/Atom"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">title</span> <span class="p">()</span> <span class="s">"Example Feed"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">link</span> <span class="p">((</span><span class="nv">href</span> <span class="o">.</span> <span class="s">"http://example.org/"</span><span class="p">)))</span>
       <span class="p">(</span><span class="nv">updated</span> <span class="p">()</span> <span class="s">"2003-12-13T18:30:02Z"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">author</span> <span class="p">()</span> <span class="p">(</span><span class="nv">name</span> <span class="p">()</span> <span class="s">"John Doe"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">id</span> <span class="p">()</span> <span class="s">"urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">entry</span> <span class="p">()</span>
              <span class="p">(</span><span class="nv">title</span> <span class="p">()</span> <span class="s">"Atom-Powered Robots Run Amok"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">link</span> <span class="p">((</span><span class="nv">rel</span> <span class="o">.</span> <span class="s">"alternate"</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">href</span> <span class="o">.</span> <span class="s">"http://example.org/2003/12/13/atom03"</span><span class="p">)))</span>
              <span class="p">(</span><span class="nv">id</span> <span class="p">()</span> <span class="s">"urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">updated</span> <span class="p">()</span> <span class="s">"2003-12-13T18:30:02Z"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">summary</span> <span class="p">()</span> <span class="s">"Some text."</span><span class="p">))))</span>
</code></pre></div></div>

<p>Each XML element is converted to a list. The first item is a symbol
that is the element’s name. The second item is an alist of attributes
— cons pairs of symbols and strings. And the rest are its children,
both string nodes and other elements. I’ve trimmed the extraneous
string nodes from the sample s-expression.</p>
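
<p>As a small illustration of that layout, here is a sketch that picks
apart the sample <code class="language-plaintext highlighter-rouge">link</code> element using only built-in list functions.
The variable name <code class="language-plaintext highlighter-rouge">element</code> is just for this example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((element '(link ((rel . "alternate")
                       (href . "http://example.org/2003/12/13/atom03")))))
  (list (car element)                      ; element name (a symbol)
        (cdr (assq 'href (cadr element)))  ; one attribute value (a string)
        (cddr element)))                   ; children (none for this element)

;; =&gt; (link "http://example.org/2003/12/13/atom03" nil)
</code></pre></div></div>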

<p>A subtle detail is that <code class="language-plaintext highlighter-rouge">xml-parse-region</code> doesn’t just return the
root element. It returns a <em>list of elements</em>, which always happens to
be a single-element list containing the root element. I don’t know why
this is, but I’ve built everything to assume this structure as input.</p>
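
<p>To see that extra layer of list, here is a minimal sketch using the
stock <code class="language-plaintext highlighter-rouge">xml-parse-region</code> from Emacs’ <code class="language-plaintext highlighter-rouge">xml</code> library on a tiny,
made-up document. Taking the <code class="language-plaintext highlighter-rouge">car</code> of the result gives the root
element itself.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'xml)

(with-temp-buffer
  (insert "&lt;a x=\"1\"&gt;hi&lt;/a&gt;")
  (xml-parse-region (point-min) (point-max)))

;; =&gt; ((a ((x . "1")) "hi"))
</code></pre></div></div>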

<p>Elfeed strips all namespaces from both elements and
attributes to make parsing simpler. As I said, it’s heuristic rather
than strict, so namespaces are treated as noise.</p>

<h3 id="a-domain-specific-language">A domain-specific language</h3>

<p>Coding up Elfeed’s s-expression searches in straight Emacs Lisp would
be tedious, error-prone, and difficult to understand. It’s a lot of
loops, <code class="language-plaintext highlighter-rouge">assoc</code>, etc. So instead I invented a jQuery-like, CSS
selector-like, domain-specific language (DSL) to express these
searches concisely and clearly.</p>

<p>For example, all of the entry links are “selected” using this
expression:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span>
</code></pre></div></div>

<p>Reading right-to-left, this matches every <code class="language-plaintext highlighter-rouge">href</code> attribute under every
<code class="language-plaintext highlighter-rouge">link</code> element with the <code class="language-plaintext highlighter-rouge">rel="alternate"</code> attribute, under every
<code class="language-plaintext highlighter-rouge">entry</code> element, under the <code class="language-plaintext highlighter-rouge">feed</code> root element. Symbols match element
names, two-element vectors match elements with a particular attribute
pair, and keywords (which must come last) narrow the selection to a
specific attribute value.</p>

<p>Imagine hand-writing the code to navigate all these conditions for
each piece of information that Elfeed requires. The RSS parser makes
up to 16 such queries, and the Atom parser makes as many as 24. That
would add up to a lot of tedious code.</p>

<p>The package (included with Elfeed) that executes this query is called
“xml-query.” It comes in two flavors: <code class="language-plaintext highlighter-rouge">xml-query</code> and <code class="language-plaintext highlighter-rouge">xml-query-all</code>.
The former returns just the first match, and the latter returns all
matches. The naming parallels the <code class="language-plaintext highlighter-rouge">querySelector()</code> and
<code class="language-plaintext highlighter-rouge">querySelectorAll()</code> DOM methods in JavaScript.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">xml</span> <span class="p">(</span><span class="nv">elfeed-xml-parse-region</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">xml-query-all</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">xml</span><span class="p">))</span>

<span class="c1">;; =&gt; ("http://example.org/2003/12/13/atom03")</span>
</code></pre></div></div>

<p>That date search I mentioned before looks roughly like this. The <code class="language-plaintext highlighter-rouge">*</code>
matches text nodes within the selected element. It must come last just
like the keyword matcher.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">published</span> <span class="nb">*</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">updated</span> <span class="nb">*</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">date</span> <span class="nb">*</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">modified</span> <span class="nb">*</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">issued</span> <span class="nb">*</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">current-time</span><span class="p">))</span>
</code></pre></div></div>

<p>Over the past three years, Elfeed has gained more and more of these
selectors as it collects more and more information from feeds. Most
recently, Elfeed collects author and category information provided by
feeds. Each new query slows feed parsing a little bit, and it’s a
perfect example of a program slowing down as it gains more features
and capabilities.</p>

<p>But I don’t want Elfeed to slow down. I want it to get <em>faster</em>!</p>

<h3 id="optimizing-the-domain-specific-language">Optimizing the domain-specific language</h3>

<p>Just like the primary jQuery function (<code class="language-plaintext highlighter-rouge">$</code>), both <code class="language-plaintext highlighter-rouge">xml-query</code> and
<code class="language-plaintext highlighter-rouge">xml-query-all</code> are functions. The xml-query engine processes the
selector from scratch on each invocation. It examines the first
element, dispatches on its type/value to apply it to the input, and
then recurses on the rest of the selector with the narrowed input,
stopping when it hits the end of the list. That’s the way it’s worked
from the start.</p>
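
<p>To make the interpreter idea concrete, here is a rough sketch of
such an engine. It is <em>not</em> Elfeed’s actual code: the name
<code class="language-plaintext highlighter-rouge">demo-query-all</code> is made up, and for brevity it loops over the selector
instead of recursing. The important part is the <code class="language-plaintext highlighter-rouge">cond</code>, which
re-examines the type of every matcher on every call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun demo-query-all (selector xml)
  "Return every match of SELECTOR in XML (hypothetical interpreter sketch)."
  ;; Start from a dummy element whose children are the top-level nodes.
  (let ((selected (list (cons 'root (cons nil xml)))))
    (dolist (matcher selector selected)
      (setq selected
            (cond
             ;; `*' selects the text (string) children.
             ((eq matcher '*)
              (cl-loop for el in selected
                       append (cl-remove-if-not #'stringp (cddr el))))
             ;; :attr selects the value of that attribute.
             ((keywordp matcher)
              (let ((attr (intern (substring (symbol-name matcher) 1))))
                (delq nil (mapcar (lambda (el) (cdr (assq attr (cadr el))))
                                  selected))))
             ;; [attr "value"] keeps elements carrying that attribute pair.
             ((vectorp matcher)
              (cl-remove-if-not
               (lambda (el)
                 (equal (cdr (assq (aref matcher 0) (cadr el)))
                        (aref matcher 1)))
               selected))
             ;; A plain symbol selects child elements with that name.
             ((symbolp matcher)
              (cl-loop for el in selected
                       append (cl-remove-if-not
                               (lambda (child)
                                 (and (consp child) (eq (car child) matcher)))
                               (cddr el)))))))))

(demo-query-all '(feed entry link [rel "alternate"] :href)
                '((feed ()
                        (entry ()
                               (link ((rel . "alternate")
                                      (href . "http://example.org/2003/12/13/atom03")))))))

;; =&gt; ("http://example.org/2003/12/13/atom03")
</code></pre></div></div>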

<p>However, every selector argument in Elfeed is a static, quoted list.
<a href="/blog/2016/12/11/">Unlike user-supplied filters</a>, I know exactly what I want to
execute ahead of time. It would be much better if the engine didn’t
have to waste time reparsing the DSL for each query.</p>

<p>This is the classic split between interpreters and compilers. An
interpreter reads input and immediately executes it, doing what the
input tells it to do. A compiler reads input and, rather than execute
it, produces output, usually in a simpler language, that, when
evaluated, has the same effect as executing the input.</p>

<p>Rather than interpret the selector, it would be better to compile it
into Elisp code, compile that <a href="/blog/2014/01/04/">into byte-code</a>, and then have the
Emacs byte-code virtual machine (VM) execute the query each time it’s
needed. The extra work of parsing the DSL is performed ahead of time,
the dispatch is entirely static, and the selector ultimately executes
on a much faster engine (byte-code VM). This should be a lot faster!</p>

<p>So I wrote a function that accepts a selector expression and emits
Elisp source that implements that selector: a compiler for my DSL.
Having a readily-available syntax tree is one of the <a href="https://en.wikipedia.org/wiki/Homoiconicity">big advantages
of homoiconicity</a>, and this sort of function makes perfect sense
in a lisp. For the external interface, this compiler function is
called by a new pair of macros, <code class="language-plaintext highlighter-rouge">xml-query*</code> and <code class="language-plaintext highlighter-rouge">xml-query-all*</code>.
These macros consume a static selector and expand into the compiled
Elisp form of the selector.</p>

<p>To demonstrate, remember that link query from before? Here’s the macro
version of that selection, but only returning the first match. Notice
the selector is no longer quoted. This is because it’s consumed by the
macro, not evaluated.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">xml-query*</span> <span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
</code></pre></div></div>

<p>This will expand into the following code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">catch</span> <span class="ss">'done</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'feed</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cddr</span> <span class="nv">v</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'entry</span><span class="p">))</span>
          <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cddr</span> <span class="nv">v</span><span class="p">))</span>
            <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'link</span><span class="p">))</span>
              <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">value</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'rel</span> <span class="p">(</span><span class="nb">cadr</span> <span class="nv">v</span><span class="p">)))))</span>
                <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">equal</span> <span class="nv">value</span> <span class="s">"alternate"</span><span class="p">)</span>
                  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'href</span> <span class="p">(</span><span class="nb">cadr</span> <span class="nv">v</span><span class="p">)))))</span>
                    <span class="p">(</span><span class="nb">when</span> <span class="nv">v</span>
                      <span class="p">(</span><span class="k">throw</span> <span class="ss">'done</span> <span class="nv">v</span><span class="p">))))))))))))</span>
</code></pre></div></div>

<p>As soon as it finds a match, it’s thrown to the top level and
returned. Without the DSL, the expansion is essentially what would
have to be written by hand. <strong>This is <em>exactly</em> the sort of leverage
you should be getting from a compiler.</strong> It compiles to around 130
byte-code instructions.</p>
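
<p>For a feel of how little machinery this takes, here is a minimal
sketch of a compiler that emits code of roughly that shape. It is a
simplified stand-in under my own assumptions, not Elfeed’s actual
implementation: the names <code class="language-plaintext highlighter-rouge">demo-compile-selector</code> and <code class="language-plaintext highlighter-rouge">demo-query*</code> are
hypothetical, and unlike the expansion above it wraps the input in a
dummy element (two extra conses) to keep the code generator uniform.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-compile-selector (selector)
  "Return a form that throws the first match of SELECTOR to `done'.
The emitted form expects the current element to be bound to `v'."
  (let ((matcher (car selector))
        (rest (cdr selector)))
    (cond
     ;; Nothing left to match: the current element is the result.
     ((null selector)
      '(throw 'done v))
     ;; `*' matches the first text (string) child.
     ((eq matcher '*)
      '(dolist (v (cddr v))
         (when (stringp v)
           (throw 'done v))))
     ;; :attr yields the attribute value, if present.
     ((keywordp matcher)
      `(let ((v (cdr (assq ',(intern (substring (symbol-name matcher) 1))
                           (cadr v)))))
         (when v
           (throw 'done v))))
     ;; [attr "value"] guards the rest of the selector.
     ((vectorp matcher)
      `(when (equal (cdr (assq ',(aref matcher 0) (cadr v)))
                    ,(aref matcher 1))
         ,(demo-compile-selector rest)))
     ;; A symbol loops over children with that element name.
     ((symbolp matcher)
      `(dolist (v (cddr v))
         (when (and (consp v) (eq (car v) ',matcher))
           ,(demo-compile-selector rest)))))))

(defmacro demo-query* (selector xml)
  "Expand into code returning the first match of unquoted SELECTOR in XML."
  `(catch 'done
     ;; Dummy element whose children are the top-level nodes.
     (let ((v (cons nil (cons nil ,xml))))
       ,(demo-compile-selector selector))))

(let ((xml '((feed ()
                   (entry ()
                          (link ((rel . "alternate")
                                 (href . "http://example.org/2003/12/13/atom03"))))))))
  (demo-query* (feed entry link [rel "alternate"] :href) xml))

;; =&gt; "http://example.org/2003/12/13/atom03"
</code></pre></div></div>

<p>All of the <code class="language-plaintext highlighter-rouge">cond</code> dispatch happens once, at macro-expansion time.
What remains at run time is just the nested loops and tests.</p>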

<p>The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> form is nearly the same, but instead of a
<code class="language-plaintext highlighter-rouge">throw</code>, it pushes the result into the return list. Only the prologue
(the outermost part) and the epilogue (the innermost part) are
different.</p>
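
<p>I haven’t reproduced the real <code class="language-plaintext highlighter-rouge">xml-query-all*</code> output here, but as a
hand-written guess at its general shape, the “collect everything”
variant of a short selector like <code class="language-plaintext highlighter-rouge">(feed title *)</code> might look something
like this. The variable name <code class="language-plaintext highlighter-rouge">matches</code> and the sample data are
assumptions for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((xml '((feed () (title () "A") (title () "B"))))
      (matches ()))                      ; prologue: set up the result list
  (dolist (v xml)
    (when (and (consp v) (eq (car v) 'feed))
      (dolist (v (cddr v))
        (when (and (consp v) (eq (car v) 'title))
          (dolist (v (cddr v))
            (when (stringp v)
              (push v matches)))))))     ; epilogue: one cons per match
  (nreverse matches))

;; =&gt; ("A" "B")
</code></pre></div></div>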

<p>Parsing feeds is a hot spot for Elfeed, so I wanted the compiler’s
output to be as efficient as possible. I had three goals for this:</p>

<ul>
  <li>
    <p><strong>No extraneous code.</strong> It’s easy for the compiler to emit
unnecessary code. The byte-code compiler might be able to eliminate
some of it, but I don’t want to rely on that. Except for the
identifiers, it should basically look like a human wrote it.</p>
  </li>
  <li>
    <p><strong>Avoid function calls.</strong> I don’t want to pay function call
overhead, and, with some care, it’s easy to avoid. In the
<code class="language-plaintext highlighter-rouge">xml-query*</code> expansion, the only function call is <code class="language-plaintext highlighter-rouge">throw</code>, which is
unavoidable. The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> version makes no function calls
whatsoever. Notice that I used <code class="language-plaintext highlighter-rouge">assq</code> rather than <code class="language-plaintext highlighter-rouge">assoc</code>. First, it
only needs to match symbols, so it should be faster. Second, <code class="language-plaintext highlighter-rouge">assq</code>
has its own byte-code instruction (158) and <code class="language-plaintext highlighter-rouge">assoc</code> does not.</p>
  </li>
  <li>
    <p><strong>No unnecessary memory allocations</strong>. The <code class="language-plaintext highlighter-rouge">xml-query*</code> expansion
makes <em>no</em> allocations. The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> version only conses
once per output, which is the minimum possible.</p>
  </li>
</ul>
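
<p>The <code class="language-plaintext highlighter-rouge">assq</code> versus <code class="language-plaintext highlighter-rouge">assoc</code> difference is easy to check for yourself.
This sketch byte-compiles two throwaway lambdas and disassembles them.
I’d expect the first listing to contain a dedicated <code class="language-plaintext highlighter-rouge">assq</code> instruction
and the second to go through a generic <code class="language-plaintext highlighter-rouge">call</code>, though the exact
listing will vary with the Emacs version.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Each call pops up a *Disassemble* buffer with the listing.
(disassemble (byte-compile '(lambda (key alist) (assq key alist))))
(disassemble (byte-compile '(lambda (key alist) (assoc key alist))))
</code></pre></div></div>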

<p>The end result is at least as optimal as hand-written code, but
without the chance of human error (typos, fat fingering) and sourced
from an easy-to-read DSL.</p>

<h3 id="performance">Performance</h3>

<p>In my tests, the <strong>xml-query macros are a full order of magnitude
faster than the functions</strong>. Yes, ten times faster! It’s an even
bigger gain than I expected.</p>

<p>In the full picture, xml-query is only one part of parsing a feed.
Measuring the time starting from raw XML text (as <a href="/blog/2016/06/16/">delivered by
cURL</a>) to a list of database entry objects, I’m seeing an
<strong>overall 25% speedup</strong> with the macros. The remaining time is
dominated by <code class="language-plaintext highlighter-rouge">xml-parse-region</code>, which is mostly out of my control.</p>

<p>With xml-query so computationally cheap, I don’t need to worry about
using it more often. Compared to parsing XML text, it’s virtually
free.</p>

<p>When it came time to validate my DSL compiler, I was <em>really</em> happy
that Elfeed had a test suite. I essentially rewrote a core component
from scratch, and passing all of the unit tests was a strong sign that
it was correct. Many times that test suite has provided confidence in
changes made both by me and by others.</p>

<p>I’ll end by describing another possible application: Apply this
technique to regular expressions, such that static strings containing
regular expressions are compiled into Elisp/byte-code via macro
expansion. I wonder if situationally this would be faster than Emacs’
own regular expression engine.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Some Performance Advantages of Lexical Scope</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/22/"/>
    <id>urn:uuid:21bc4afa-caa8-37ed-a912-a35f35d0e432</id>
    <updated>2016-12-22T02:33:36Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>I recently had a discussion with <a href="http://ergoemacs.org/">Xah Lee</a> about lexical scope in
Emacs Lisp. The topic was why <code class="language-plaintext highlighter-rouge">lexical-binding</code> exists at a file-level
when there was already <code class="language-plaintext highlighter-rouge">lexical-let</code> (from <code class="language-plaintext highlighter-rouge">cl</code>), prompted by my
previous article on <a href="/blog/2016/12/11/">JIT byte-code compilation</a>. The specific
context is Emacs Lisp, but these concepts apply to language design in
general.</p>

<p>Until Emacs 24.1 (June 2012), Elisp only had dynamically scoped
variables — a feature, mostly by accident, common to old lisp
dialects. While dynamic scope has some selective uses, it’s widely
regarded as a mistake for local variables, and virtually no other
languages have adopted it.</p>

<p>Way back in 1993, Dave Gillespie’s deviously clever <code class="language-plaintext highlighter-rouge">lexical-let</code>
macro <a href="http://git.savannah.gnu.org/cgit/emacs.git/commit/?h=fcd73769&amp;id=fcd737693e8e320acd70f91ec8e0728563244805">was committed</a> to the <code class="language-plaintext highlighter-rouge">cl</code> package, providing a rudimentary
form of opt-in lexical scope. The macro walks its body replacing local
variable names with guaranteed-unique gensym names: the exact same
technique used in macros to create “hygienic” bindings that aren’t
visible to the macro body. It essentially “fakes” lexical scope within
Elisp’s dynamic scope by preventing variable name collisions.</p>

<p>For example, here’s one of the consequences of dynamic scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">inner</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">setq</span> <span class="nv">v</span> <span class="ss">:inner</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">outer</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="ss">:outer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">inner</span><span class="p">)</span>
    <span class="nv">v</span><span class="p">))</span>

<span class="p">(</span><span class="nv">outer</span><span class="p">)</span>
<span class="c1">;; =&gt; :inner</span>
</code></pre></div></div>

<p>The “local” variable <code class="language-plaintext highlighter-rouge">v</code> in <code class="language-plaintext highlighter-rouge">outer</code> is visible to its callee, <code class="language-plaintext highlighter-rouge">inner</code>,
which can access and manipulate it. The meaning of the <em>free variable</em>
<code class="language-plaintext highlighter-rouge">v</code> in <code class="language-plaintext highlighter-rouge">inner</code> depends entirely on the run-time call stack. It might
be a global variable, or it might be a local variable for a caller,
direct or indirect.</p>

<p>Using <code class="language-plaintext highlighter-rouge">lexical-let</code> deconflicts these names, giving the effect of
lexical scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">v</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">lexical-outer</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">v</span> <span class="ss">:outer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">inner</span><span class="p">)</span>
    <span class="nv">v</span><span class="p">))</span>

<span class="p">(</span><span class="nv">lexical-outer</span><span class="p">)</span>
<span class="c1">;; =&gt; :outer</span>
</code></pre></div></div>

<p>But there’s more to lexical scope than this. Closures only make sense
in the context of lexical scope, and the most useful feature of
<code class="language-plaintext highlighter-rouge">lexical-let</code> is that lambda expressions evaluate to closures. The
macro implements this using a technique called <a href="https://en.wikipedia.org/wiki/Lambda_lifting"><em>closure
conversion</em></a>. Additional parameters are added to the original
lambda function, one for each lexical variable (and not just each
closed-over variable), and the whole thing is wrapped in <em>another</em>
lambda function that invokes the original lambda function with the
additional parameters filled with the closed-over variables — yes, the
variables (i.e. symbols) themselves, <em>not</em> just their values (i.e.
pass-by-reference). The last point means different closures can
properly close over the same variables, and they can bind new values.</p>

<p>To roughly illustrate how this works, the first lambda expression
below, which closes over the lexical variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code>, would be
converted into the latter by <code class="language-plaintext highlighter-rouge">lexical-let</code>. The <code class="language-plaintext highlighter-rouge">#:</code> is Elisp’s syntax
for uninterned symbols. So <code class="language-plaintext highlighter-rouge">#:x</code> is <em>a</em> symbol <code class="language-plaintext highlighter-rouge">x</code>, but not <em>the</em>
symbol <code class="language-plaintext highlighter-rouge">x</code> (see <code class="language-plaintext highlighter-rouge">print-gensym</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Before conversion:</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">))</span>

<span class="c1">;; After conversion:</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
           <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">symbol-value</span> <span class="nv">x</span><span class="p">)</span>
              <span class="p">(</span><span class="nb">symbol-value</span> <span class="nv">y</span><span class="p">)))</span>
         <span class="o">'</span><span class="ss">#:x</span> <span class="o">'</span><span class="ss">#:y</span> <span class="nv">args</span><span class="p">))</span>
</code></pre></div></div>
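
<p>Here’s a runnable, hand-rolled sketch of the same trick, written
without <code class="language-plaintext highlighter-rouge">lexical-let</code> itself. The function <code class="language-plaintext highlighter-rouge">demo-make-counter</code> is
hypothetical and only approximates what the macro generates: a fresh
uninterned symbol serves as the variable’s storage, and the returned
lambda is a list assembled at run time that passes that symbol to an
inner function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-make-counter ()
  "Return a counter function that keeps its state in an uninterned symbol."
  (let ((cell (make-symbol "count")))   ; fresh symbol on every call
    (set cell 0)
    ;; The "closure" is a lambda list built at run time.
    `(lambda (&amp;rest args)
       (apply (lambda (count)
                (set count (1+ (symbol-value count))))
              ',cell args))))

(let ((counter (demo-make-counter)))
  (funcall counter)
  (funcall counter))

;; =&gt; 2
</code></pre></div></div>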

<p>I’ve said on multiple occasions that <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> has
significant advantages, both in performance and static analysis, and
so it should be used for all future Elisp code. The only reason it’s
not the default is because it breaks some old (badly written) code.
However, <strong><code class="language-plaintext highlighter-rouge">lexical-let</code> doesn’t realize any of these advantages</strong>! In
fact, it has worse performance than straightforward dynamic scope with
<code class="language-plaintext highlighter-rouge">let</code>.</p>

<ol>
  <li>
    <p>New symbol objects are allocated and initialized (<code class="language-plaintext highlighter-rouge">make-symbol</code>) on
each run-time evaluation, one per lexical variable.</p>
  </li>
  <li>
    <p>Since it’s just faking it, <code class="language-plaintext highlighter-rouge">lexical-let</code> still uses dynamic
bindings, which are more expensive than lexical bindings. It varies
depending on the C compiler that built Emacs, but dynamic variable
accesses (opcode <code class="language-plaintext highlighter-rouge">varref</code>) take around 30% longer than lexical
variable accesses (opcode <code class="language-plaintext highlighter-rouge">stack-ref</code>). Assignment is far worse,
where dynamic variable assignment (<code class="language-plaintext highlighter-rouge">varset</code>) takes 650% longer than
lexical variable assignment (<code class="language-plaintext highlighter-rouge">stack-set</code>). How I measured all this
is a topic for another article.</p>
  </li>
  <li>
    <p>The “lexical” variables are accessed using <code class="language-plaintext highlighter-rouge">symbol-value</code>, a full
function call, so they’re even slower than normal dynamic
variables.</p>
  </li>
  <li>
    <p>Because converted lambda expressions are constructed dynamically at
run-time within the body of <code class="language-plaintext highlighter-rouge">lexical-let</code>, the resulting closure is
only partially byte-compiled even if the code as a whole has been
byte-compiled. In contrast, <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> closures are fully
compiled. How this works is worth <a href="/blog/2017/12/14/">its own article</a>.</p>
  </li>
  <li>
    <p>Converted lambda expressions include the additional internal
function invocation, making them slower.</p>
  </li>
</ol>

<p>While <code class="language-plaintext highlighter-rouge">lexical-let</code> is clever, and occasionally useful prior to Emacs
24, it may come at a hefty performance cost if evaluated frequently.
There’s no reason to use it anymore.</p>

<h3 id="constraints-on-code-generation">Constraints on code generation</h3>

<p>Another reason to be wary of dynamic scope is that it puts needless
constraints on the compiler, preventing a number of important
optimization opportunities. For example, consider the following
function, <code class="language-plaintext highlighter-rouge">bar</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">bar</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">y</span> <span class="mi">2</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">foo</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
</code></pre></div></div>

<p>Byte-compile this function under dynamic scope (<code class="language-plaintext highlighter-rouge">lexical-binding:
nil</code>) and <a href="/blog/2014/01/04/">disassemble it</a> to see what it looks like.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">bar</span><span class="p">)</span>
<span class="p">(</span><span class="nb">disassemble</span> <span class="nf">#'</span><span class="nv">bar</span><span class="p">)</span>
</code></pre></div></div>

<p>That pops up a buffer with the disassembly listing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  1
1       constant  2
2       varbind   y
3       varbind   x
4       constant  foo
5       call      0
6       discard
7       varref    x
8       varref    y
9       plus
10      unbind    2
11      return
</code></pre></div></div>

<p>It’s 12 instructions, 5 of which deal with dynamic bindings. The
byte-compiler doesn’t always produce optimal byte-code, but this just
so happens to be <em>nearly</em> optimal byte-code. The <code class="language-plaintext highlighter-rouge">discard</code> (a very
fast instruction) isn’t necessary, but otherwise no more compiler
smarts can improve on this. Since the variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are
visible to <code class="language-plaintext highlighter-rouge">foo</code>, they must be bound before the call and <a href="/blog/2016/07/25/">loaded after
the call</a>. While generally this function will return 3, the
compiler cannot assume so since it ultimately depends on the behavior of
<code class="language-plaintext highlighter-rouge">foo</code>. Its hands are tied.</p>

<p>Compare this to the lexical scope version (<code class="language-plaintext highlighter-rouge">lexical-binding: t</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  1
1       constant  2
2       constant  foo
3       call      0
4       discard
5       stack-ref 1
6       stack-ref 1
7       plus
8       return
</code></pre></div></div>

<p>It’s only 9 instructions, none of which are expensive dynamic variable
instructions. And this isn’t even close to the optimal byte-code. In
fact, as of Emacs 25.1 the byte-compiler often doesn’t produce the
optimal byte-code for lexical scope code and still needs some work.
<strong>Despite not firing on all cylinders, lexical scope still manages to
beat dynamic scope in performance benchmarks.</strong></p>
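
<p>To reproduce the lexical listing, one option is a file along these
lines, evaluated with <code class="language-plaintext highlighter-rouge">eval-buffer</code>. The first-line cookie is what
switches the file to lexical scope. Treat it as a sketch: the
byte-compiler changes between Emacs releases, so your listing may not
match the one above instruction for instruction.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun bar ()
  (let ((x 1)
        (y 2))
    (foo)
    (+ x y)))

;; Compile the definition in place, then show its byte-code.
(byte-compile #'bar)
(disassemble #'bar)
</code></pre></div></div>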

<p>Here’s the optimal byte-code, should the byte-compiler become smarter
someday:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  foo
1       call      0
2       constant  3
3       return
</code></pre></div></div>

<p>It’s down to 4 instructions due to computing the math operation at
compile time. Emacs’ byte-compiler only has rudimentary constant
folding, so it doesn’t notice that <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are constants and
misses this optimization. I speculate this is due to its roots
compiling under dynamic scope. Since <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are no longer exposed
to <code class="language-plaintext highlighter-rouge">foo</code>, the compiler has the opportunity to optimize them out of
existence. I haven’t measured it, but I would expect this to be
significantly faster than the dynamic scope version of this function.</p>

<h3 id="optional-dynamic-scope">Optional dynamic scope</h3>

<p>You might be thinking, “What if I really <em>do</em> want <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> to be
dynamically bound for <code class="language-plaintext highlighter-rouge">foo</code>?” This is often useful. Many of Emacs’ own
functions are designed to have certain variables dynamically bound
around them. For example, the print family of functions use the global
variable <code class="language-plaintext highlighter-rouge">standard-output</code> to determine where to send output by
default.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">princ</span> <span class="s">"value = "</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">prin1</span> <span class="nv">value</span><span class="p">))</span>
</code></pre></div></div>

<p>Have no fear: <strong>With <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> you can have your cake and
eat it too.</strong> Variables declared with <code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defconst</code>, or
<code class="language-plaintext highlighter-rouge">defvaralias</code> are marked as “special” with an internal bit flag
(<code class="language-plaintext highlighter-rouge">declared_special</code> in C). When the compiler detects one of these
variables (<code class="language-plaintext highlighter-rouge">special-variable-p</code>), it uses a classical dynamic binding.</p>

<p>Declaring both <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> as special restores the original semantics,
reverting <code class="language-plaintext highlighter-rouge">bar</code> back to its old byte-code definition (next time it’s
compiled, that is). But it would be poor form to mark <code class="language-plaintext highlighter-rouge">x</code> or <code class="language-plaintext highlighter-rouge">y</code> as
special: You’d de-optimize all code (compiled <em>after</em> the declaration)
anywhere in Emacs that uses these names. As a package author, only do
this with the namespace-prefixed variables that belong to you.</p>
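
<p>As a minimal sketch of that advice, here is a namespace-prefixed special
variable used from lexically scoped code. The names <code class="language-plaintext highlighter-rouge">my-demo-indent</code> and
<code class="language-plaintext highlighter-rouge">my-demo-current-indent</code> are hypothetical, and the cookie on the first line
only takes effect when the code lives in its own file.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; -*- lexical-binding: t; -*-

;; Hypothetical package variable. `defvar' marks it special.
(defvar my-demo-indent 0
  "Indentation used by `my-demo-current-indent' (illustration only).")

(defun my-demo-current-indent ()
  my-demo-indent)

(special-variable-p 'my-demo-indent)      ; =&gt; t

;; A special variable is bound dynamically, so the callee sees it.
(let ((my-demo-indent 4))
  (my-demo-current-indent))               ; =&gt; 4

;; Outside the `let' the global value is back.
(my-demo-current-indent)                  ; =&gt; 0
</code></pre></div></div>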

<p>The only way to unmark a special variable is with the undocumented
function <code class="language-plaintext highlighter-rouge">internal-make-var-non-special</code>. I expected <code class="language-plaintext highlighter-rouge">makunbound</code> to
do this, but as of Emacs 25.1 it does not. This could possibly be
considered a bug.</p>
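
<p>A short sketch of the difference, using a throwaway variable named
<code class="language-plaintext highlighter-rouge">my-demo-flag</code> (hypothetical). The results in the comments are what I’d
expect given the behavior described above. Since
<code class="language-plaintext highlighter-rouge">internal-make-var-non-special</code> is undocumented, treat it as subject to
change.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-demo-flag nil
  "Throwaway variable for this illustration.")

(special-variable-p 'my-demo-flag)             ; =&gt; t

;; Removing the value leaves the special mark in place.
(makunbound 'my-demo-flag)
(special-variable-p 'my-demo-flag)             ; =&gt; t

;; The undocumented function named in the text clears the mark.
(internal-make-var-non-special 'my-demo-flag)
(special-variable-p 'my-demo-flag)             ; =&gt; nil
</code></pre></div></div>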

<h3 id="accidental-closures">Accidental closures</h3>

<p>I’ve said there are absolutely no advantages to <code class="language-plaintext highlighter-rouge">lexical-binding: nil</code>.
It’s only the default for the sake of backwards-compatibility. However,
there <em>is</em> one case where <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> introduces a subtle issue
that would otherwise not exist. Take this code for example (and
never mind <code class="language-plaintext highlighter-rouge">prin1-to-string</code> for a moment):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">function-as-string</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nb">prin1</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="ss">:example</span><span class="p">)</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
</code></pre></div></div>

<p>This creates and serializes a closure, which is <a href="/blog/2013/12/30/">one of Elisp’s unique
features</a>. It doesn’t close over any variables, so it should be
pretty simple. However, this function will only work correctly under
<code class="language-plaintext highlighter-rouge">lexical-binding: t</code> when byte-compiled.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">function-as-string</span><span class="p">)</span>
<span class="c1">;; =&gt; "(closure ((temp-buffer . #&lt;buffer  *temp*&gt;) t) nil :example)"</span>
</code></pre></div></div>

<p>The interpreter doesn’t analyze the closure, so it just closes over
everything. This includes the hidden variable <code class="language-plaintext highlighter-rouge">temp-buffer</code> created by
the <code class="language-plaintext highlighter-rouge">with-temp-buffer</code> macro, resulting in an abstraction leak.
Buffers aren’t readable, so this will signal an error if an attempt is
made to read this function back into an s-expression. The
byte-compiler fixes this by noticing <code class="language-plaintext highlighter-rouge">temp-buffer</code> isn’t actually
closed over and so doesn’t include it in the closure, making it work
correctly.</p>
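
<p>One way to catch this kind of leak before it bites is to check whether the
printed form reads back at all. This is only a sketch, and
<code class="language-plaintext highlighter-rouge">my-demo-readable-p</code> is a made-up helper, not part of Emacs.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-demo-readable-p (string)
  "Return t if STRING can be read back as one Lisp object."
  (condition-case nil
      (progn (read string) t)
    (error nil)))

(my-demo-readable-p "(lambda nil :example)")
;; =&gt; t

(my-demo-readable-p
 "(closure ((temp-buffer . #&lt;buffer  *temp*&gt;) t) nil :example)")
;; =&gt; nil, since #&lt;...&gt; has no read syntax
</code></pre></div></div>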

<p>Under <code class="language-plaintext highlighter-rouge">lexical-binding: nil</code> it works correctly either way:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">function-as-string</span><span class="p">)</span>
<span class="c1">;; -&gt; "(lambda nil :example)"</span>
</code></pre></div></div>

<p>This may seem contrived — it’s certainly unlikely — but <a href="https://github.com/jwiegley/emacs-async/issues/17">it has come
up in practice</a>. Still, it’s no reason to avoid <code class="language-plaintext highlighter-rouge">lexical-binding: t</code>.</p>

<h3 id="use-lexical-scope-in-all-new-code">Use lexical scope in all new code</h3>

<p>As I’ve said again and again, always use <code class="language-plaintext highlighter-rouge">lexical-binding: t</code>. Use
dynamic variables judiciously. And <code class="language-plaintext highlighter-rouge">lexical-let</code> is no replacement. It
has virtually none of the benefits, performs <em>worse</em>, and it only
applies to <code class="language-plaintext highlighter-rouge">let</code>, not any of the other places bindings are created:
function parameters, <code class="language-plaintext highlighter-rouge">dotimes</code>, <code class="language-plaintext highlighter-rouge">dolist</code>, and <code class="language-plaintext highlighter-rouge">condition-case</code>.</p>
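
<p>As a small illustration of one of those other binding sites, here’s a sketch
where closures capture a <code class="language-plaintext highlighter-rouge">dolist</code> variable. It assumes the code is loaded from
a file whose first line carries the cookie shown, and
<code class="language-plaintext highlighter-rouge">my-demo-make-adders</code> is a hypothetical name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; -*- lexical-binding: t; -*-

;; Each `dolist' iteration gets its own lexical binding of N, so each
;; closure remembers a different number. No `lexical-let' involved.
(defun my-demo-make-adders (numbers)
  (let ((adders '()))
    (dolist (n numbers)
      (push (lambda (x) (+ x n)) adders))
    (nreverse adders)))

(mapcar (lambda (f) (funcall f 10))
        (my-demo-make-adders '(1 2 3)))
;; =&gt; (11 12 13)
</code></pre></div></div>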

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Faster Elfeed Search Through JIT Byte-code Compilation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/11/"/>
    <id>urn:uuid:47002cc3-816a-3cb8-b462-327364e3f943</id>
    <updated>2016-12-11T23:16:42Z</updated>
    <category term="emacs"/><category term="elfeed"/><category term="optimization"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Today I pushed an update for <a href="https://github.com/skeeto/elfeed">Elfeed</a> that doubles the speed
of the search filter in the worst case. This is the user-entered
expression that dynamically narrows the entry listing to a subset that
meets certain criteria: published after a particular date,
with/without particular tags, and matching/non-matching zero or more
regular expressions. The filter is live, applied to the database as
the expression is edited, so it’s important for usability that this
search completes before the user can notice a delay.</p>

<p><img src="/img/elfeed/filter.gif" alt="" /></p>

<p>The typical workaround for these kinds of interfaces is to make
filtering/searching asynchronous. It’s possible to do this well, but
it’s usually a terrible, broken design. If the user acts upon the
asynchronous results — say, by typing the query and hitting enter to
choose the current or expected top result — then the final behavior is
non-deterministic, a race between the user’s typing speed and the
asynchronous search. Elfeed will keep its synchronous live search.</p>

<p>For anyone not familiar with Elfeed, here’s a filter that finds all
entries from within the past year tagged “youtube” (<code class="language-plaintext highlighter-rouge">+youtube</code>) that
mention Linux or Linus (<code class="language-plaintext highlighter-rouge">linu[xs]</code>), but aren’t tagged “bsd” (<code class="language-plaintext highlighter-rouge">-bsd</code>),
limited to the most recent 15 entries (<code class="language-plaintext highlighter-rouge">#15</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@1-year-old +youtube linu[xs] -bsd #15
</code></pre></div></div>

<p>The database is primarily indexed over publication date, so filters on
publication dates are the most efficient filters. Entries are visited
in order starting with the most recently published, and the search can
bail out early once it crosses the filter threshold. Time-oriented
filters have been encouraged as the solution to keep the live search
feeling lively.</p>

<h3 id="filtering-overview">Filtering Overview</h3>

<p>The first step in filtering is parsing the filter text entered by the
user. This string is broken into its components using the
<code class="language-plaintext highlighter-rouge">elfeed-search-parse-filter</code> function. Date filter components are
converted into a unix epoch interval, tags are interned into symbols,
regular expressions are gathered up as strings, and the entry limit is
parsed into a plain integer. Absence of a filter component is
indicated by nil.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">elfeed-search-parse-filter</span> <span class="s">"@1-year-old +youtube linu[xs] -bsd #15"</span><span class="p">)</span>
<span class="c1">;; =&gt; (31557600.0 (youtube) (bsd) ("linu[xs]") nil 15)</span>
</code></pre></div></div>
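
<p>To give a feel for this step, here’s a heavily simplified sketch of such a
parser. It is not Elfeed’s implementation: <code class="language-plaintext highlighter-rouge">my-demo-parse-filter</code> is a
hypothetical name, it returns a plist rather than the list shown above, and it
skips date components entirely.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my-demo-parse-filter (filter)
  "Parse FILTER into a plist. A simplified, hypothetical stand-in."
  (let (must-have must-not-have matches limit)
    (dolist (element (split-string filter))
      (let ((type (aref element 0))
            (rest (substring element 1)))
        (cl-case type
          (?+ (push (intern rest) must-have))
          (?- (push (intern rest) must-not-have))
          (?# (setq limit (string-to-number rest)))
          (?@ nil)                      ; date handling omitted here
          (t  (push element matches)))))
    (list :must-have (nreverse must-have)
          :must-not-have (nreverse must-not-have)
          :matches (nreverse matches)
          :limit limit)))

(my-demo-parse-filter "+youtube linu[xs] -bsd #15")
;; =&gt; (:must-have (youtube) :must-not-have (bsd)
;;     :matches ("linu[xs]") :limit 15)
</code></pre></div></div>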

<p>Previously, the next step was to apply the <code class="language-plaintext highlighter-rouge">elfeed-search-filter</code>
function with this structured filter representation to the database.
Except for special early-bailout situations, it works left-to-right
across the filter, checking each condition against each entry. This is
analogous to an interpreter, with the filter being a program.</p>
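
<p>Here’s a rough sketch of that interpreter-style approach, reduced to tags
only. Entries are plain lists of tag symbols, and both function names are
hypothetical. The real routine handles more component types.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my-demo-entry-matches-p (entry must-have must-not-have)
  "Return t if ENTRY, a list of tag symbols, satisfies both tag lists."
  (and (cl-every  (lambda (tag) (memq tag entry)) must-have)
       (cl-notany (lambda (tag) (memq tag entry)) must-not-have)))

(defun my-demo-interpret-filter (db must-have must-not-have)
  "Return the entries of DB that pass the filter, checked one at a time."
  (cl-remove-if-not
   (lambda (entry)
     (my-demo-entry-matches-p entry must-have must-not-have))
   db))

(my-demo-interpret-filter '((youtube linux) (youtube bsd) (blog))
                          '(youtube) '(bsd))
;; =&gt; ((youtube linux))
</code></pre></div></div>

<p>The filter structure gets re-examined for every single entry, which is the
cost a compiled filter could avoid.</p>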

<p>Thinking about it that way, what if the filter was instead compiled
into an Emacs byte-code function and executed directly by the Emacs
virtual machine? That’s what this latest update does.</p>

<h3 id="benchmarks">Benchmarks</h3>

<p>With six different filter components, the actual filtering routine is
a bit too complicated for an article, so I’ll set up a simpler, but
roughly equivalent, scenario. With a reasonable cut-off date, the
filter was already sufficiently fast, so for benchmarking I’ll focus
on the worst case: no early bailout opportunities. An entry will be
just a list of tags (symbols), and the filter will have to test every
entry.</p>

<p>My <a href="/blog/2016/08/12/">real-world Elfeed database</a> currently has 46,772 entries with
36 distinct tags. For my benchmark I’ll round this up to a nice
100,000 entries, and use 26 distinct tags (A–Z), which has the nice
alphabet property and more closely reflects the number of tags I still
care about.</p>

<p>First, here’s <code class="language-plaintext highlighter-rouge">make-random-entry</code> to generate a random list of 1–5
tags (i.e. an entry). The <code class="language-plaintext highlighter-rouge">state</code> parameter is the random state,
allowing for deterministic benchmarks on a randomly-generated
database.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">make-random-entry</span> <span class="p">(</span><span class="k">&amp;key</span> <span class="nv">state</span> <span class="p">(</span><span class="nb">min</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="nb">max</span> <span class="mi">5</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="p">(</span><span class="nb">+</span> <span class="nb">min</span> <span class="p">(</span><span class="nv">cl-random</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">-</span> <span class="nb">max</span> <span class="nb">min</span><span class="p">))</span> <span class="nv">state</span><span class="p">))</span>
           <span class="nv">for</span> <span class="nv">letter</span> <span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">?A</span> <span class="p">(</span><span class="nv">cl-random</span> <span class="mi">26</span> <span class="nv">state</span><span class="p">))</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nb">intern</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%c"</span> <span class="nv">letter</span><span class="p">))))</span>
</code></pre></div></div>

<p>The database is just a big list of entries. In Elfeed this is actually
an AVL tree. Without dates, the order doesn’t matter.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">make-random-database</span> <span class="p">(</span><span class="k">&amp;key</span> <span class="nv">state</span> <span class="p">(</span><span class="nb">count</span> <span class="mi">100000</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="nb">count</span> <span class="nv">collect</span> <span class="p">(</span><span class="nv">make-random-entry</span> <span class="ss">:state</span> <span class="nv">state</span><span class="p">)))</span>
</code></pre></div></div>

<p>Here’s <a href="/blog/2009/05/28/">my old time macro</a>. An important change I’ve made over
the years is to call <code class="language-plaintext highlighter-rouge">garbage-collect</code> before starting the clock,
eliminating bad samples from unlucky garbage collection events.
Depending on what you want to measure, it may even be worth disabling
garbage collection during the measurement by setting
<code class="language-plaintext highlighter-rouge">gc-cons-threshold</code> to a high value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">measure-time</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">start</span> <span class="p">(</span><span class="nb">make-symbol</span> <span class="s">"start"</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="k">progn</span>
       <span class="c1">;; Collect at run time, inside the expansion, not at macro-expansion time.</span>
       <span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>
       <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="o">,</span><span class="nv">start</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)))</span>
         <span class="o">,@</span><span class="nv">body</span>
         <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="o">,</span><span class="nv">start</span><span class="p">)))))</span>
</code></pre></div></div>
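
<p>For the other option mentioned above, disabling garbage collection during
the measurement, a variant might look like this sketch. The name
<code class="language-plaintext highlighter-rouge">my-demo-measure-time-no-gc</code> is made up, and binding
<code class="language-plaintext highlighter-rouge">gc-cons-threshold</code> to <code class="language-plaintext highlighter-rouge">most-positive-fixnum</code> is just one way to pick “a high
value”.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defmacro my-demo-measure-time-no-gc (&amp;rest body)
  "Like the macro above, but keep the collector quiet while timing BODY."
  (declare (indent defun))
  (let ((start (make-symbol "start")))
    `(let ((gc-cons-threshold most-positive-fixnum) ; effectively no GC
           (,start (float-time)))
       ,@body
       (- (float-time) ,start))))

(my-demo-measure-time-no-gc
  (dotimes (_ 1000)
    (make-list 100 nil)))
;; =&gt; elapsed seconds as a float
</code></pre></div></div>

<p>The threshold is only bound for the duration of the measurement, so normal
collection resumes afterwards.</p>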

<p>Finally, the benchmark harness. It uses a hard-coded seed to generate
the same pseudo-random database. The test is run against a filter
function, <code class="language-plaintext highlighter-rouge">f</code>, 100 times in search for the same 6 tags, and the timing
results are averaged.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">benchmark</span> <span class="p">(</span><span class="nv">f</span> <span class="k">&amp;optional</span> <span class="p">(</span><span class="nv">n</span> <span class="mi">100</span><span class="p">)</span> <span class="p">(</span><span class="nv">tags</span> <span class="o">'</span><span class="p">(</span><span class="nv">A</span> <span class="nv">B</span> <span class="nv">C</span> <span class="nv">D</span> <span class="nv">E</span> <span class="nv">F</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">state</span> <span class="p">(</span><span class="nv">copy-sequence</span> <span class="nv">[cl-random-state-tag</span> <span class="mi">-1</span> <span class="mi">30</span> <span class="nv">267466518]</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">db</span> <span class="p">(</span><span class="nv">make-random-database</span> <span class="ss">:state</span> <span class="nv">state</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="nv">n</span>
             <span class="nv">sum</span> <span class="p">(</span><span class="nv">measure-time</span>
                   <span class="p">(</span><span class="nb">funcall</span> <span class="nv">f</span> <span class="nv">db</span> <span class="nv">tags</span><span class="p">))</span>
             <span class="nv">into</span> <span class="nv">total</span>
             <span class="nv">finally</span> <span class="nb">return</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">total</span> <span class="p">(</span><span class="nb">float</span> <span class="nv">n</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The baseline will be <code class="language-plaintext highlighter-rouge">memq</code> (test for membership using identity,
<code class="language-plaintext highlighter-rouge">eq</code>). There are two lists of tags to compare: the list that is the
entry, and the list from the filter. This requires a nested loop for
each entry, one explicit (<code class="language-plaintext highlighter-rouge">cl-loop</code>) and one implicit (<code class="language-plaintext highlighter-rouge">memq</code>), both
with early bailout.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">memq-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">memq</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>Byte-code compiling everything and running the benchmark on my laptop
I get:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.041 seconds</span>
</code></pre></div></div>

<p>That’s actually not too bad. One of the advantages of this definition
is that there are no function calls. The <code class="language-plaintext highlighter-rouge">memq</code> built-in function has
its own opcode (62), and the rest of the definition is special forms
and macros expanding to special forms (<code class="language-plaintext highlighter-rouge">cl-loop</code>). It’s exactly the
thing I need to exploit to make filters faster.</p>

<p>As a sanity check, what would happen if I used <code class="language-plaintext highlighter-rouge">member</code> instead of
<code class="language-plaintext highlighter-rouge">memq</code>? In theory it should be slower because it uses <code class="language-plaintext highlighter-rouge">equal</code> for
tests instead of <code class="language-plaintext highlighter-rouge">eq</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">member-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nb">member</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s only slightly slower because <code class="language-plaintext highlighter-rouge">member</code>, <a href="/blog/2013/01/22/">like many other
built-ins</a>, also has an opcode (157). It’s just a tiny bit
more overhead.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">member-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.047 seconds</span>
</code></pre></div></div>
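
<p>If you’re curious whether a given function has its own opcode, the
byte-compiler records that on the function’s symbol. The sketch below leans on
the undocumented <code class="language-plaintext highlighter-rouge">byte-opcode</code> symbol property, an internal detail that could
change, and <code class="language-plaintext highlighter-rouge">my-demo-opcode</code> is a hypothetical helper.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'bytecomp)

(defun my-demo-opcode (function)
  "Return the opcode number the byte-compiler uses for FUNCTION, or nil.
Relies on the undocumented `byte-opcode' symbol property."
  (let ((op (get function 'byte-opcode)))
    (and op (boundp op) (symbol-value op))))

(my-demo-opcode 'memq)    ; compare with the opcode quoted in the text
(my-demo-opcode 'member)  ; likewise
(my-demo-opcode 'my-demo-no-such-function) ; =&gt; nil
</code></pre></div></div>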

<p>To test function call overhead while still using the built-in (i.e.
written in C) <code class="language-plaintext highlighter-rouge">memq</code>, I’ll alias it so that the byte-code compiler is
forced to emit a function call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'memq-alias</span> <span class="ss">'memq</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">memq-alias-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">memq-alias</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>To verify that this is doing what I expect, I <code class="language-plaintext highlighter-rouge">M-x disassemble</code> the
function and inspect the byte-code disassembly. Here’s a simple
example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span>
 <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="p">(</span><span class="nv">memq</span> <span class="ss">:foo</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>When compiled under lexical scope (<code class="language-plaintext highlighter-rouge">lexical-binding</code> is true), here’s
the disassembly. To understand what this means, see <a href="/blog/2014/01/04/"><em>Emacs Byte-code
Internals</em></a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  :foo
1       stack-ref 1
2       memq
3       return
</code></pre></div></div>

<p>Notice the <code class="language-plaintext highlighter-rouge">memq</code> instruction. Try using <code class="language-plaintext highlighter-rouge">memq-alias</code> instead:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span>
 <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="p">(</span><span class="nv">memq-alias</span> <span class="ss">:foo</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>Resulting in a function call:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  memq-alias
1       constant  :foo
2       stack-ref 2
3       call      2
4       return
</code></pre></div></div>

<p>And the benchmark:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-alias-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.052 seconds</span>
</code></pre></div></div>

<p>So the function call adds about 27% overhead. This means it would be a
good idea to <strong>avoid calling functions in the filter</strong> if I can help
it. I should rely on these special opcodes.</p>

<p>Suppose <code class="language-plaintext highlighter-rouge">memq</code> was written in Emacs Lisp rather than C. How much would
that hurt performance? My version of <code class="language-plaintext highlighter-rouge">my-memq</code> below isn’t quite the
same since it returns t rather than the sublist, but it’s good enough
for this purpose. (I’m using <code class="language-plaintext highlighter-rouge">cl-loop</code> because writing early bailout
in plain Elisp without recursion is, in my opinion, ugly.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-memq</span> <span class="p">(</span><span class="nv">needle</span> <span class="nv">haystack</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">element</span> <span class="nv">in</span> <span class="nv">haystack</span>
           <span class="nb">when</span> <span class="p">(</span><span class="nb">eq</span> <span class="nv">needle</span> <span class="nv">element</span><span class="p">)</span>
           <span class="nb">return</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">my-memq-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">my-memq</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>And the benchmark:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">my-memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.137 seconds</span>
</code></pre></div></div>

<p>Oof! It’s more than 3 times slower than the opcode. This means <strong>I
should use built-ins as much as possible</strong> in the filter.</p>

<h3 id="dynamic-vs-lexical-scope">Dynamic vs. lexical scope</h3>

<p>There’s one last thing to watch out for. Everything so far has been
compiled with lexical scope. You should really turn this on by default
for all new code that you write. It has three important advantages:</p>

<ol>
  <li>It allows the compiler to catch more mistakes.</li>
  <li>It eliminates a class of bugs related to dynamic scope: Local
variables are exposed to manipulation by callees.</li>
  <li><a href="/blog/2016/12/22/">Lexical scope has better performance</a>.</li>
</ol>

<p>Here are all the benchmarks with the default dynamic scope:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.065 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">member-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.070 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-alias-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.074 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">my-memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.256 seconds</span>
</code></pre></div></div>

<p>It makes each function roughly 1.4 to 1.9 times slower in this benchmark, and for no benefit. Under
dynamic scope, local variables use the <code class="language-plaintext highlighter-rouge">varref</code> opcode — a global
variable lookup — instead of the <code class="language-plaintext highlighter-rouge">stack-ref</code> opcode — a simple array
index.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">norm</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>Under dynamic scope, this compiles to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       varref    a
1       varref    b
2       diff
3       varref    a
4       varref    b
5       diff
6       mult
7       return
</code></pre></div></div>

<p>And under lexical scope (notice the variable names disappear):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       stack-ref 1
1       stack-ref 1
2       diff
3       stack-ref 2
4       stack-ref 2
5       diff
6       mult
7       return
</code></pre></div></div>

<h3 id="jit-compiled-filters">JIT-compiled filters</h3>

<p>So far I’ve been moving in the wrong direction, making things slower
rather than faster. How can I make it faster than the straight <code class="language-plaintext highlighter-rouge">memq</code>
version? By compiling the filter into byte-code.</p>

<p>I won’t write the byte-code directly, but instead generate Elisp code
and use the byte-code compiler on it. This is safer, will work
correctly in future versions of Emacs, and leverages the optimizations
performed by the byte-compiler. This sort of thing recently <a href="http://emacshorrors.com/posts/when-data-becomes-code.html">got a bad
rap on Emacs Horrors</a>, but I was happy to see that this
technique is already established.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">jit-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">memq-list</span> <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                             <span class="nv">collect</span> <span class="o">`</span><span class="p">(</span><span class="nv">memq</span> <span class="ss">',tag</span> <span class="nv">entry</span><span class="p">)))</span>
         <span class="p">(</span><span class="k">function</span> <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span>
                               <span class="nb">count</span> <span class="p">(</span><span class="nb">or</span> <span class="o">,@</span><span class="nv">memq-list</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">compiled</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="k">function</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">compiled</span> <span class="nv">db</span><span class="p">)))</span>
</code></pre></div></div>

<p>It dynamically builds the code as an s-expression, runs that through
the byte-code compiler, executes it, and throws it away. It’s
“just-in-time,” though compiling to byte-code and not <a href="/blog/2015/03/19/">native
code</a>. For the benchmark tags of <code class="language-plaintext highlighter-rouge">(A B C D E F)</code>, this builds
the following:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span>
           <span class="nb">count</span> <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">memq</span> <span class="ss">'A</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'B</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'C</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'D</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'E</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'F</span> <span class="nv">entry</span><span class="p">))))</span>
</code></pre></div></div>

<p>Due to its short-circuiting behavior, <code class="language-plaintext highlighter-rouge">or</code> is a special form, so this
function is just special forms and <code class="language-plaintext highlighter-rouge">memq</code> in its opcode form. It’s as
fast as Elisp can get.</p>

<p>Having s-expressions is a real strength for lisp, since the
alternative (in, say, JavaScript) would be to assemble the function by
concatenating code strings. By contrast, this looks a lot like a
regular lisp macro. Invoking the byte-code compiler does add some
overhead compared to the interpreted filter, but it’s insignificant.</p>

<p>How much faster is this?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">jit-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.017 seconds</span>
</code></pre></div></div>

<p><strong>It’s more than twice as fast!</strong> The big gain here is through <em>loop
unrolling</em>. The loop over the tags has been unrolled into the <code class="language-plaintext highlighter-rouge">or</code> expression.
That section of byte-code looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  A
1       stack-ref 1
2       memq
3       goto-if-not-nil-else-pop 1
6       constant  B
7       stack-ref 1
8       memq
9       goto-if-not-nil-else-pop 1
12      constant  C
13      stack-ref 1
14      memq
15      goto-if-not-nil-else-pop 1
18      constant  D
19      stack-ref 1
20      memq
21      goto-if-not-nil-else-pop 1
24      constant  E
25      stack-ref 1
26      memq
27      goto-if-not-nil-else-pop 1
30      constant  F
31      stack-ref 1
32      memq
33:1    return
</code></pre></div></div>

<p>In Elfeed, not only does it unroll these loops, it completely
eliminates the overhead for unused filter components. Comparing to
this benchmark, I’m seeing roughly matching gains in Elfeed’s worst
case. In Elfeed, I also bind <code class="language-plaintext highlighter-rouge">lexical-binding</code> around the
<code class="language-plaintext highlighter-rouge">byte-compile</code> call to force lexical scope, since otherwise it just
uses the buffer-local value (usually nil).</p>

<p>Filter compilation can be toggled on and off by setting
<code class="language-plaintext highlighter-rouge">elfeed-search-compile-filter</code>. If you’re up to date, try out live
filters with it both enabled and disabled. See if you can notice the
difference.</p>

<h3 id="result-summary">Result summary</h3>

<p>Here are the results in a table, all run with Emacs 24.4 on x86-64.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ms)      memq      member    memq-alias my-memq   jit
lexical   41        47        52         137       17
dynamic   65        70        74         256       21
</code></pre></div></div>

<p>And the same benchmarks on Aarch64 (Emacs 24.5, ARM Cortex-A53), where
I also occasionally use Elfeed, and where I have been very interested
in improving performance.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ms)      memq      member    memq-alias my-memq   jit
lexical   170       235       242        614       79
dynamic   274       340       345        1130      92
</code></pre></div></div>

<p>And here’s how you can run the benchmarks for yourself, perhaps with
different parameters:</p>

<ul>
  <li><a href="/download/jit-bench.el">jit-bench.el</a></li>
</ul>

<p>The header explains how to run the benchmark in batch mode:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ emacs -Q -batch -f batch-byte-compile jit-bench.el
$ emacs -Q -batch -l jit-bench.elc -f benchmark-batch
</code></pre></div></div>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs, Dynamic Modules, and Joysticks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/11/05/"/>
    <id>urn:uuid:c53305bb-4770-3a7f-934c-31eea37d38eb</id>
    <updated>2016-11-05T04:01:51Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>Two months ago Emacs 25 was released and introduced a <a href="http://diobla.info/blog-archive/modules-tut.html">new dynamic
module feature</a>. Emacs can now load shared libraries built
against Emacs’ module API, defined in <a href="http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs-module.h?h=emacs-25.1">emacs-module.h</a>. What’s
interesting about this API is that it doesn’t require linking against
Emacs or any sort of library. Instead, at run time Emacs supplies the
module’s initialization function with function pointers for the entire
API.</p>

<p>As a demonstration, in this article I’ll build an Emacs joystick
interface (Linux only) using a dynamic module. It will allow Emacs to
read events from any joystick on the system. All the source code is
here:</p>

<ul>
  <li><a href="https://github.com/skeeto/joymacs">https://github.com/skeeto/joymacs</a></li>
</ul>

<p>It includes a calibration interface (<code class="language-plaintext highlighter-rouge">M-x joydemo</code>) within Emacs:</p>

<p><a href="/img/joymacs/joymacs.png"><img src="/img/joymacs/joymacs-thumb.png" alt="" /></a></p>

<p>Currently, Emacs’ emacs-module.h header is the entirety of the module
documentation. It’s a bit thin and leaves ambiguities that require
some reading of the Emacs source code. Even reading the source, it’s
not clear which behaviors are a reliable part of the interface. For
example, if there’s a pending non-local exit, it’s safe for a function
to return <code class="language-plaintext highlighter-rouge">NULL</code> since the return value is never inspected (Emacs
25.1), but will this always be the case? While mistakes are
unforgiving (a hard crash), the API is mostly intuitive and it’s been
pretty easy to feel my way around it.</p>

<p><em>Update</em>: Philipp Stephani has <a href="https://phst.github.io/emacs-modules">written thorough, reliable module
documentation</a>.</p>

<h3 id="dynamic-module-types">Dynamic Module Types</h3>

<p>All Emacs values — integers, floats, cons cells, vectors, strings,
etc. — are represented as the polymorphic, pointer-valued type,
<code class="language-plaintext highlighter-rouge">emacs_value</code>. Despite being a pointer, <code class="language-plaintext highlighter-rouge">NULL</code> is not a valid value,
as convenient as that would be. The API includes functions for
creating and extracting the fundamental types: integers, floats,
strings. Almost all other object types can only be accessed by making
Lisp function calls to regular Emacs functions from the module.</p>

<p>Modules also introduce a brand new Emacs object type: a <em>user
pointer</em>. These are <a href="/blog/2013/12/30/">non-readable</a>, opaque pointer values
returned by modules, typically representing a handle to some resource,
be it a memory block, database connection, or a joystick. These
objects include a finalizer function pointer — which, surprisingly, is
not permitted to be NULL — and their lifetime is managed by Emacs’
garbage collector.</p>

<p>User pointers are a somewhat dangerous feature since there’s little to
stop Emacs Lisp code from misusing them. A Lisp program can take a
user pointer from one module and pass it to a function in a different
module. Since it’s just a pointer, there’s no way to type check it. At
best, a module could maintain a table of all its live pointers,
checking all user pointer arguments against the table before
dereferencing. But I don’t expect this to be normal practice.</p>
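
<p>A minimal sketch of such a table, written in plain C so it stands apart from the module API. The names (<code class="language-plaintext highlighter-rouge">handle_register</code>, <code class="language-plaintext highlighter-rouge">handle_is_live</code>, <code class="language-plaintext highlighter-rouge">handle_release</code>) and the fixed capacity are hypothetical, not taken from any real module, and it assumes handles are real, non-null pointers:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

#define MAX_HANDLES 16  /* hypothetical fixed capacity */

/* A NULL entry marks a free slot. */
static void *live_handles[MAX_HANDLES];

/* Record a handle just before returning it to Lisp.
 * Returns false when the table is full. */
static bool
handle_register(void *p)
{
    for (size_t i = 0; i &lt; MAX_HANDLES; i++) {
        if (!live_handles[i]) {
            live_handles[i] = p;
            return true;
        }
    }
    return false;
}

/* True only for pointers this module handed out and has not released. */
static bool
handle_is_live(void *p)
{
    if (!p)
        return false;
    for (size_t i = 0; i &lt; MAX_HANDLES; i++)
        if (live_handles[i] == p)
            return true;
    return false;
}

/* Forget a handle, for example from the finalizer. */
static void
handle_release(void *p)
{
    for (size_t i = 0; i &lt; MAX_HANDLES; i++)
        if (live_handles[i] == p)
            live_handles[i] = NULL;
}
</code></pre></div></div>

<p>A module function would run <code class="language-plaintext highlighter-rouge">handle_is_live()</code> on the pointer taken from a user pointer argument and signal an error, rather than dereference, when it returns false.</p>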

<h3 id="module-initialization">Module Initialization</h3>

<p>After loading the module through the platform’s mechanism, the first
thing Emacs does is check for the symbol <code class="language-plaintext highlighter-rouge">plugin_is_GPL_compatible</code>.
While tacky, this is not surprising given the culture around Emacs.</p>

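<p>Next it calls <code class="language-plaintext highlighter-rouge">emacs_module_init()</code>, passing it a runtime structure that holds the first function
pointer, <code class="language-plaintext highlighter-rouge">get_environment</code>. From this, the module can get a Lisp environment and start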
doing Emacs things, such as binding module functions to Lisp symbols.</p>

<p>Here’s a complete “Hello, world!” example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"emacs-module.h"</span><span class="cp">
</span>
<span class="kt">int</span> <span class="n">plugin_is_GPL_compatible</span><span class="p">;</span>

<span class="kt">int</span>
<span class="nf">emacs_module_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">emacs_runtime</span> <span class="o">*</span><span class="n">ert</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="n">ert</span><span class="o">-&gt;</span><span class="n">get_environment</span><span class="p">(</span><span class="n">ert</span><span class="p">);</span>
    <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"message"</span><span class="p">);</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="n">hi</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"Hello, world!"</span><span class="p">;</span>
    <span class="n">emacs_value</span> <span class="n">string</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">hi</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">hi</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">env</span><span class="o">-&gt;</span><span class="n">funcall</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">message</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">string</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In a real module, it’s common to create function objects for native
functions, then fetch the <code class="language-plaintext highlighter-rouge">fset</code> symbol and make a Lisp call on it to
bind the newly-created function object to a name. You’ll see this in
action later.</p>

<h3 id="joystick-api">Joystick API</h3>

<p>The joystick API will closely resemble <a href="https://www.kernel.org/doc/Documentation/input/joystick-api.txt">Linux’s own joystick API</a>,
making for a fairly thin wrapper. It’s so thin that Emacs <em>almost</em>
doesn’t even need a dynamic module. This is because, on Linux,
joysticks are just files under <code class="language-plaintext highlighter-rouge">/dev/input/</code>. Want to see the input
events on the first joystick? Just read <code class="language-plaintext highlighter-rouge">/dev/input/js0</code>. So Plan 9.</p>

<p>Emacs already knows how to read files, but these virtual files are a
little <em>too</em> special for that. The header <code class="language-plaintext highlighter-rouge">linux/joystick.h</code> defines a
<code class="language-plaintext highlighter-rouge">struct js_event</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">js_event</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">time</span><span class="p">;</span>  <span class="cm">/* event timestamp in milliseconds */</span>
    <span class="kt">int16_t</span> <span class="n">value</span><span class="p">;</span>
    <span class="kt">uint8_t</span> <span class="n">type</span><span class="p">;</span>
    <span class="kt">uint8_t</span> <span class="n">number</span><span class="p">;</span> <span class="cm">/* axis/button number */</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The idea is to read from the joystick device into this structure. The
first several reads return initialization events that define the axes and
buttons of the joystick and their initial state. Further events are
queued up for the file descriptor. This all means that the file can’t
just be opened each time joystick input is needed. It has to be held
open for the duration, and is typically configured non-blocking.</p>
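
<p>As a rough sketch of that pattern, assuming nothing beyond POSIX <code class="language-plaintext highlighter-rouge">open(2)</code> and <code class="language-plaintext highlighter-rouge">read(2)</code>: open the device once, then poll it for one event at a time. The names <code class="language-plaintext highlighter-rouge">joy_open</code>, <code class="language-plaintext highlighter-rouge">joy_poll</code>, and <code class="language-plaintext highlighter-rouge">struct joy_event</code> are hypothetical. The struct repeats the layout shown above only so the example stands alone.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;errno.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdint.h&gt;
#include &lt;unistd.h&gt;

/* Mirrors the struct js_event layout shown above so the sketch stands
 * alone; real code would include linux/joystick.h instead. */
struct joy_event {
    uint32_t time;
    int16_t value;
    uint8_t type;
    uint8_t number;
};

/* Open once, read-only and non-blocking, and keep the descriptor. */
static int
joy_open(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Returns 1 if an event was stored in *e, 0 if none is queued right
 * now, and -1 on a real error, end of file, or a short read. */
static int
joy_poll(int fd, struct joy_event *e)
{
    ssize_t r = read(fd, e, sizeof(*e));
    if (r == (ssize_t)sizeof(*e))
        return 1;
    if (r == -1 &amp;&amp; errno == EAGAIN)
        return 0;
    return -1;
}
</code></pre></div></div>

<p>Because the descriptor is non-blocking, a caller can invoke <code class="language-plaintext highlighter-rouge">joy_poll()</code> in a loop until it returns 0 without ever stalling on an idle joystick.</p>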

<p>The Emacs package will be called <code class="language-plaintext highlighter-rouge">joymacs</code> and there will be three
functions:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">joymacs-open</span> <span class="nv">N</span><span class="p">)</span>
<span class="p">(</span><span class="nv">joymacs-close</span> <span class="nv">JOYSTICK</span><span class="p">)</span>
<span class="p">(</span><span class="nv">joymacs-read</span> <span class="nv">JOYSTICK</span> <span class="nv">EVENT-VECTOR</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="joymacs-open">joymacs-open</h4>

<p>The <code class="language-plaintext highlighter-rouge">joymacs-open</code> function will take an integer, opening the Nth
joystick (<code class="language-plaintext highlighter-rouge">/dev/input/jsN</code>). It will create a file descriptor for the
joystick device, returning it as a user pointer. Think of it as a sort
of “joystick handle.” Now, it <em>could</em> instead return the file
descriptor as an integer, but the user pointer has two significant
benefits:</p>

<ol>
  <li>
    <p><strong>The resource will be garbage collected.</strong> If the caller loses
track of a file descriptor returned as an integer, the joystick
device will be held open until Emacs shuts down, using up one of
Emacs’ file descriptors. By putting it in a user pointer, the
garbage collector will have the module release the file
descriptor if the user loses track of it.</p>
  </li>
  <li>
    <p><strong>It should be difficult for the user to make a dangerous call.</strong>
Emacs Lisp can’t create user pointers — they only come from modules
— and so the module is less likely to get passed the wrong thing.
In the case of <code class="language-plaintext highlighter-rouge">joymacs-close</code>, the module will be calling
<code class="language-plaintext highlighter-rouge">close(2)</code> on the argument. We definitely don’t want to make that
system call on file descriptors owned by Emacs. Further, since user
pointers are mutable, the module can ensure it doesn’t call
<code class="language-plaintext highlighter-rouge">close(2)</code> twice.</p>
  </li>
</ol>

<p>Here’s the implementation for <code class="language-plaintext highlighter-rouge">joymacs-open</code>. I’ll go over each part
in detail.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">emacs_value</span>
<span class="nf">joymacs_open</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">id</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">extract_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
    <span class="kt">int</span> <span class="n">buflen</span> <span class="o">=</span> <span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"/dev/input/js%d"</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">O_RDONLY</span> <span class="o">|</span> <span class="n">O_NONBLOCK</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">buflen</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fin_close</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">fd</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The C function name doesn’t matter to Emacs. It’s <code class="language-plaintext highlighter-rouge">static</code> because it
doesn’t even matter if the function is visible to Emacs. Emacs will get the
function pointer later as part of initialization.</p>

<p>This is the prototype for all functions callable by Emacs Lisp,
regardless of arity. It has four arguments:</p>

<ol>
  <li>
    <p>It gets an environment, <code class="language-plaintext highlighter-rouge">env</code>, through which to call back into
Emacs.</p>
  </li>
  <li>
    <p>It gets <code class="language-plaintext highlighter-rouge">n</code>, the number of arguments. This is guaranteed to be the
correct number of arguments, as specified later when creating the
function object, so only variadic functions need to inspect this
argument.</p>
  </li>
  <li>
    <p>The Lisp arguments are passed as an array of values, <code class="language-plaintext highlighter-rouge">args</code>.
There’s no type declaration when declaring a function object, so
these may be of the wrong type. I’ll go over how to deal with this.</p>
  </li>
  <li>
    <p>Finally, it gets an arbitrary pointer, supplied at function object
creation time. This allows the module to create closures, but will
usually be ignored.</p>
  </li>
</ol>

<p>The first thing the function does is extract its integer argument.
This is actually an <code class="language-plaintext highlighter-rouge">intmax_t</code>, but I don’t think anyone has that many
USB ports. An <code class="language-plaintext highlighter-rouge">int</code> will suffice.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">int</span> <span class="n">id</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">extract_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
</code></pre></div></div>

<p>As for not underestimating fools, what if the user passed a value that
isn’t an integer? Will the world come crashing down? Fortunately Emacs
checks that in <code class="language-plaintext highlighter-rouge">extract_integer</code> and, if there’s a mismatch, sets a
pending error signal in the environment. This is really great because
checking types directly in the module is a <em>real pain in the ass</em>. So,
before committing to anything further, such as opening a file, I check
for this signal and bail out early if necessary. In Emacs 25.1 it’s
safe to return NULL since the return value will be completely ignored,
but I’d rather hedge my bets.</p>

<p>By the way, the <code class="language-plaintext highlighter-rouge">nil</code> here is a global variable set in initialization.
You don’t just get that for free!</p>

<p>The next step is opening the joystick device, read-only and
non-blocking. The non-blocking is vital because the module would
otherwise hang Emacs later if there are no events (well, except for
the read being quickly interrupted by a POSIX signal).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
    <span class="kt">int</span> <span class="n">buflen</span> <span class="o">=</span> <span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"/dev/input/js%d"</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">O_RDONLY</span> <span class="o">|</span> <span class="n">O_NONBLOCK</span><span class="p">);</span>
</code></pre></div></div>

<p>If the joystick fails to open (e.g. it doesn’t exist, or the user
lacks permission), manually set an error signal for a non-local exit.
I chose the <code class="language-plaintext highlighter-rouge">file-error</code> signal and I’m just using the filename as the
signal data.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">buflen</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Otherwise create the user pointer. No need to allocate any memory;
just stuff it in the pointer itself. If the user mistakenly passes it
to another module, it will sure be in for a surprise when it tries to
dereference it.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">return</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fin_close</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">fd</span><span class="p">);</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">fin_close()</code> function is defined as:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">fin_close</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">fdptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">fdptr</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The garbage collector will call this function when the user pointer is
lost. If the user closes it early with <code class="language-plaintext highlighter-rouge">joymacs-close</code>, that function
will set the user pointer to -1, an invalid file descriptor, so that
it doesn’t get closed a second time here.</p>
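
<p>The pointer-stuffing trick is easy to isolate from the module API. Here’s a
minimal sketch; the helper names <code class="language-plaintext highlighter-rouge">fd_to_ptr</code> and <code class="language-plaintext highlighter-rouge">ptr_to_fd</code> are
hypothetical and not part of joymacs, which writes the casts inline.</p>

```c
#include <stdint.h>

/* Hypothetical helper: pack a file descriptor into a pointer-sized
 * value so that no memory needs to be allocated for the handle. */
static void *
fd_to_ptr(int fd)
{
    return (void *)(intptr_t)fd;
}

/* Hypothetical helper: recover the file descriptor. A result of -1
 * marks a handle that has already been closed. */
static int
ptr_to_fd(void *p)
{
    return (int)(intptr_t)p;
}
```

<p>Converting an integer to a pointer is implementation-defined in C, so
this leans on the behavior of mainstream platforms, where small integers
survive the round trip through <code class="language-plaintext highlighter-rouge">intptr_t</code>.</p>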

<h4 id="joymacs-close">joymacs-close</h4>

<p>Here’s <code class="language-plaintext highlighter-rouge">joymacs-close</code>, which is a bit simpler.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">emacs_value</span>
<span class="nf">joymacs_close</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">set_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Again, it starts by extracting its argument, relying on Emacs to do
the check:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
</code></pre></div></div>

<p>If the user pointer hasn’t been closed yet, then close it and strip
out the file descriptor to prevent further closes.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">set_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>

<h4 id="joymacs-read">joymacs-read</h4>

<p>The <code class="language-plaintext highlighter-rouge">joymacs-read</code> function is doing something a little unusual for an
Emacs Lisp function. It takes two arguments: the joystick handle and a
5-element vector. Instead of returning the event in some
representation, it fills the vector with the event details. There are
two reasons for this:</p>

<ol>
  <li>
    <p>The API has no function for creating vectors … though the module
<em>could</em> get the <code class="language-plaintext highlighter-rouge">make-vector</code> symbol and call it to create a
vector.</p>
  </li>
  <li>
    <p>The idiom for event pumps is for the caller to supply a buffer to
the pump. This has better performance by avoiding lots of
unnecessary allocations, especially since events tend to be
message-like objects with a short, well-defined extent.</p>
  </li>
</ol>
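
<p>The caller-supplied buffer idiom from the second point looks roughly
like this in plain C. It’s only a sketch: <code class="language-plaintext highlighter-rouge">struct event</code>, <code class="language-plaintext highlighter-rouge">struct queue</code>,
and <code class="language-plaintext highlighter-rouge">queue_next</code> are made-up names for illustration, with an in-memory
array standing in for a real device.</p>

```c
#include <stddef.h>

/* Hypothetical event type for illustration. */
struct event {
    int number;
    int value;
};

/* Hypothetical event source backed by an in-memory array. */
struct queue {
    const struct event *events;
    size_t len;
    size_t pos;
};

/* Copy the next event into caller-owned storage. Returns 1 if *out
 * was filled, 0 if the queue is drained. The pump never allocates. */
static int
queue_next(struct queue *q, struct event *out)
{
    if (q->pos == q->len)
        return 0;
    *out = q->events[q->pos++];
    return 1;
}
```

<p>The caller declares one <code class="language-plaintext highlighter-rouge">struct event</code> and reuses it on every
iteration of a <code class="language-plaintext highlighter-rouge">while (queue_next(&amp;q, &amp;e))</code> loop, which is the same
shape as passing one vector to <code class="language-plaintext highlighter-rouge">joymacs-read</code> over and over.</p>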

<p>Here’s the full definition:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">emacs_value</span>
<span class="nf">joymacs_read</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">js_event</span> <span class="n">e</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">e</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">e</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">==</span> <span class="n">EAGAIN</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* No more events. */</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* An actual read error (joystick unplugged, etc.). */</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">error</span> <span class="o">=</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">);</span>
        <span class="kt">size_t</span> <span class="n">len</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="cm">/* Fill out event vector. */</span>
        <span class="n">emacs_value</span> <span class="n">v</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
        <span class="n">emacs_value</span> <span class="n">type</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_BUTTON</span> <span class="o">?</span> <span class="n">button</span> <span class="o">:</span> <span class="n">axis</span><span class="p">;</span>
        <span class="n">emacs_value</span> <span class="n">value</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="o">==</span> <span class="n">button</span><span class="p">)</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">;</span>
        <span class="k">else</span>
            <span class="n">value</span> <span class="o">=</span>  <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_float</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">/</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">INT16_MAX</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">time</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">type</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">number</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_INIT</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As before, extract the first argument and check for a signal. Then
call <code class="language-plaintext highlighter-rouge">read(2)</code> to get an event. If the read fails with <code class="language-plaintext highlighter-rouge">EAGAIN</code>, it’s
not a real failure. There are just no more events, so return nil.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">struct</span> <span class="n">js_event</span> <span class="n">e</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">e</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">e</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">==</span> <span class="n">EAGAIN</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* No more events. */</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>
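
<p>The <code class="language-plaintext highlighter-rouge">EAGAIN</code> convention is easy to try outside of Emacs. Here’s a
minimal sketch that assumes a pipe as a stand-in for the joystick
device; <code class="language-plaintext highlighter-rouge">poll_byte</code> and <code class="language-plaintext highlighter-rouge">make_nonblocking_pipe</code> are hypothetical
helpers, not part of joymacs.</p>

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to read one byte without blocking. Returns 1 if a byte was
 * read, 0 if nothing is pending, -1 on a real error or end of file. */
static int
poll_byte(int fd, char *out)
{
    ssize_t r = read(fd, out, 1);
    if (r == 1)
        return 1;
    if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}

/* Create a pipe whose read end is non-blocking. Returns 0 on
 * success, -1 on failure. */
static int
make_nonblocking_pipe(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}
```

<p>Reading from the empty pipe reports “nothing pending” rather than
blocking or failing, which is what lets a timer poll for events
without stalling Emacs.</p>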

<p>If the read failed with something else — perhaps the joystick was
unplugged — signal an error. The <code class="language-plaintext highlighter-rouge">strerror(3)</code> string is used for the
signal data.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* An actual read error (joystick unplugged, etc.). */</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">error</span> <span class="o">=</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">error</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Otherwise fill out the event vector. If the second argument isn’t a
vector, or if it’s too short, the signal will automatically get raised
by Emacs. The module can keep plowing through the <code class="language-plaintext highlighter-rouge">vec_set()</code> calls
safely since it’s not committing to anything.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="cm">/* Fill out event vector. */</span>
        <span class="n">emacs_value</span> <span class="n">v</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
        <span class="n">emacs_value</span> <span class="n">type</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_BUTTON</span> <span class="o">?</span> <span class="n">button</span> <span class="o">:</span> <span class="n">axis</span><span class="p">;</span>
        <span class="n">emacs_value</span> <span class="n">value</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="o">==</span> <span class="n">button</span><span class="p">)</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">;</span>
        <span class="k">else</span>
            <span class="n">value</span> <span class="o">=</span>  <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_float</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">/</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">INT16_MAX</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">time</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">type</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">number</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_INIT</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
</code></pre></div></div>

<p>The Linux event struct has four fields and the function fills out five
values of the vector. This is because the <code class="language-plaintext highlighter-rouge">type</code> field has a bit flag
indicating initialization events. This is split out into an extra
t/nil value. It also normalizes axis values and converts button values
into t/nil, which makes more sense for Emacs Lisp. The event itself is
returned since it’s a truthy value and it’s convenient for the caller.</p>

<p>The astute programmer might notice that the negative side of the axis
could go just below -1.0, since <code class="language-plaintext highlighter-rouge">INT16_MIN</code> has one extra value over
<code class="language-plaintext highlighter-rouge">INT16_MAX</code> (two’s complement). It doesn’t seem to be documented, but
the joystick drivers I’ve seen never exactly return <code class="language-plaintext highlighter-rouge">INT16_MIN</code>, so
this is in fact the correct way to normalize it.</p>
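
<p>To make the edge case concrete, here’s the normalization pulled out
into a function of its own, next to a clamped variant. Both function
names are hypothetical, and the clamped version is not what joymacs
does; it’s just one way to guard against a driver that does report
<code class="language-plaintext highlighter-rouge">INT16_MIN</code>.</p>

```c
#include <stdint.h>

/* Same arithmetic as the module: divide by INT16_MAX. An input of
 * INT16_MIN lands slightly below -1.0. */
static double
normalize_axis(int16_t raw)
{
    return raw / (double)INT16_MAX;
}

/* Alternative (not in joymacs): clamp so the result never leaves
 * the range [-1.0, 1.0]. */
static double
normalize_axis_clamped(int16_t raw)
{
    double v = raw / (double)INT16_MAX;
    return v < -1.0 ? -1.0 : v;
}
```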

<h4 id="initialization">Initialization</h4>

<p><em>Update 2021</em>: In a previous version of this article, I talked about
interning symbols during initialization so that they do not need to be
re-interned each time the module is called. This <a href="https://github.com/skeeto/joymacs/issues/1">no longer works</a>,
and it was probably never intended to work in the first place. The
lesson is simple: <strong>Do not reuse Emacs objects between module calls.</strong></p>

<p>First grab the <code class="language-plaintext highlighter-rouge">fset</code> symbol since this function will be needed to bind
names to the module’s functions.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">emacs_value</span> <span class="n">fset</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"fset"</span><span class="p">);</span>
</code></pre></div></div>

<p>Using <code class="language-plaintext highlighter-rouge">fset</code>, bind the functions. The second and third arguments to
<code class="language-plaintext highlighter-rouge">make_function</code> are the minimum and maximum number of arguments, which
<a href="/blog/2014/01/04/">may look familiar</a>. The last argument is that closure pointer
I mentioned at the beginning.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">emacs_value</span> <span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
    <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"joymacs-open"</span><span class="p">);</span>
    <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_function</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">joymacs_open</span><span class="p">,</span> <span class="n">doc</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">env</span><span class="o">-&gt;</span><span class="n">funcall</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fset</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
</code></pre></div></div>

<p>If the module is to be loaded with <code class="language-plaintext highlighter-rouge">require</code> like any other package,
it needs to provide: <code class="language-plaintext highlighter-rouge">(provide 'joymacs)</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">emacs_value</span> <span class="n">provide</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"provide"</span><span class="p">);</span>
    <span class="n">emacs_value</span> <span class="n">joymacs</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"joymacs"</span><span class="p">);</span>
    <span class="n">env</span><span class="o">-&gt;</span><span class="n">funcall</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">provide</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">joymacs</span><span class="p">);</span>
</code></pre></div></div>

<p>And that’s it!</p>

<p>The source repository now includes a port to Windows (XInput). If
you’re on Linux or Windows, have Emacs 25 with modules enabled, and a
joystick is plugged in, then <code class="language-plaintext highlighter-rouge">make run</code> in the repository should bring
up Emacs running a joystick calibration demonstration. The module
can’t poke at Emacs when events are ready, so instead there’s a timer
that polls the module for events.</p>

<p>I’d like to someday see an Emacs Lisp game well-suited for a joystick.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elfeed, cURL, and You</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/06/16/"/>
    <id>urn:uuid:76942398-f693-3127-fd45-19d508b5c044</id>
    <updated>2016-06-16T18:22:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>This morning I pushed out an important update to <a href="https://github.com/skeeto/elfeed">Elfeed</a>, my
web feed reader for Emacs. The update should be available in MELPA by
the time you read this. Elfeed now has support for fetching feeds
using <a href="https://curl.haxx.se/">cURL</a> through a <code class="language-plaintext highlighter-rouge">curl</code> inferior process. You’ll need
the program in your PATH or configured through
<code class="language-plaintext highlighter-rouge">elfeed-curl-program-name</code>.</p>

<p>I’ve been using it for a couple of days now, but, while I work out the
remaining kinks, it’s disabled by default. So in addition to having
cURL installed, you’ll need to set <code class="language-plaintext highlighter-rouge">elfeed-use-curl</code> to non-nil.
Sometime soon it will be enabled by default whenever cURL is
available. The original <code class="language-plaintext highlighter-rouge">url-retrieve</code> fetcher will remain in place
for the time being. However, cURL <em>may</em> become a requirement someday.</p>

<p>Fetching with a <code class="language-plaintext highlighter-rouge">curl</code> inferior process has some huge advantages.</p>
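
<p>To make “inferior process” concrete, here is a rough sketch of
fetching a URL by running <code class="language-plaintext highlighter-rouge">curl</code> asynchronously from Emacs. This
is not Elfeed’s actual implementation. The names <code class="language-plaintext highlighter-rouge">my/curl-fetch</code> and
<code class="language-plaintext highlighter-rouge">my/curl-sentinel</code> are hypothetical, and it assumes <code class="language-plaintext highlighter-rouge">curl</code> is in
PATH.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/curl-sentinel (process _event)
  "Deliver the output buffer of PROCESS to its stored callback on exit."
  ;; Sentinels also run for other state changes, so only act once the
  ;; process has actually finished.
  (when (memq (process-status process) '(exit signal))
    (funcall (process-get process :callback)
             (process-buffer process)
             (process-exit-status process))))

(defun my/curl-fetch (url callback)
  "Fetch URL with curl, then call CALLBACK with the buffer and exit code."
  (let* ((buffer (generate-new-buffer " *my-curl*"))
         (process (start-process "my-curl" buffer "curl"
                                 "--silent" "--location" url)))
    ;; Store the callback on the process itself so the sentinel does
    ;; not depend on closures or lexical binding.
    (process-put process :callback callback)
    (set-process-sentinel process #'my/curl-sentinel)
    process))

;; Example call (performs a real network request):
;; (my/curl-fetch "https://nullprogram.com/tags/elisp/feed/"
;;                (lambda (buffer code) (message "curl exited with %d" code)))
</code></pre></div></div>

<p>The DNS look-up, TCP handshake, and TLS all happen inside the child
process, so Emacs itself never blocks on the network.</p>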

<h3 id="its-much-faster">It’s much faster</h3>

<p>The most obvious change is that you should experience a huge speedup
on updates and better responsiveness during updates after the first
cURL run. There are two important reasons:</p>

<p><strong>Asynchronous DNS and TCP</strong>: Emacs 24 and earlier performs DNS
queries synchronously even for asynchronous network processes. This is
being fixed on some platforms (including Linux) in Emacs 25, but now
we don’t have to wait.</p>

<p>On Windows it’s even worse: the TCP connection is also established
synchronously. This is especially bad when fetching relatively small
items such as feeds, because the DNS look-up and TCP handshake dominate
the overall fetch time. It essentially makes the whole process
synchronous.</p>

<p><strong>Conditional GET</strong>: HTTP has two mechanisms to avoid transmitting
information that a client has previously fetched. One is the
Last-Modified header delivered by the server with the content. When
querying again later, the client echoes the date back <a href="https://utcc.utoronto.ca/~cks/space/blog/web/IfModifiedSinceHowNot">like a
token</a> in the If-Modified-Since header.</p>

<p>The second is the “entity tag,” an arbitrary server-selected token
associated with each version of the content. The server delivers it
along with the content in the ETag header, and the client hands it
back later in the If-None-Match header, sort of like a cookie.</p>

<p>This is highly valuable for feeds because, unless the feed is
particularly active, most of the time the feed hasn’t been updated
since the last query. This avoids sending anything other than a
handful of headers each way. In Elfeed’s case, it means <strong>it doesn’t
have to parse the same XML over and over again</strong>.</p>
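
<p>As an illustration of the client side of this exchange, the sketch
below turns saved validator values into <code class="language-plaintext highlighter-rouge">curl</code> command line arguments.
The function name <code class="language-plaintext highlighter-rouge">my/conditional-get-args</code> is hypothetical and this
is not Elfeed’s code. When the content is unchanged, the server
answers 304 Not Modified with no body.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/conditional-get-args (etag last-modified)
  "Return curl arguments for a conditional GET.
ETAG and LAST-MODIFIED are strings saved from an earlier response, or
nil when the server never supplied them."
  (append
   ;; Hand the entity tag back unchanged, like a cookie.
   (when etag
     (list "--header" (format "If-None-Match: %s" etag)))
   ;; Echo the date back exactly as the server sent it.
   (when last-modified
     (list "--header" (format "If-Modified-Since: %s" last-modified)))))

(my/conditional-get-args "\"abc123\"" "Thu, 16 Jun 2016 18:22:16 GMT")
;; =&gt; ("--header" "If-None-Match: \"abc123\""
;;     "--header" "If-Modified-Since: Thu, 16 Jun 2016 18:22:16 GMT")
</code></pre></div></div>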

<p>Both of these being outside of cURL’s scope, Elfeed has to manage
conditional GET itself. I had no control over the HTTP headers until
now, so I couldn’t take advantage of it. Emacs’ <code class="language-plaintext highlighter-rouge">url-retrieve</code>
function allows for sending custom headers through dynamically binding
<code class="language-plaintext highlighter-rouge">url-request-extra-headers</code>, but this isn’t available when calling
<code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> since the request itself is created
asynchronously.</p>
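
<p>For reference, this is roughly what the dynamic binding looks like
with plain <code class="language-plaintext highlighter-rouge">url-retrieve</code>. The wrapper name <code class="language-plaintext highlighter-rouge">my/fetch-if-changed</code> is
hypothetical. The same <code class="language-plaintext highlighter-rouge">let</code> around <code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> would have
expired by the time the queued request is actually created.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'url)

(defun my/fetch-if-changed (url etag callback)
  "Fetch URL with `url-retrieve', sending ETAG in If-None-Match."
  ;; `url-request-extra-headers' is a special variable, so this `let'
  ;; is a dynamic binding, visible while `url-retrieve' sets up the
  ;; request and gone as soon as the `let' form returns.
  (let ((url-request-extra-headers `(("If-None-Match" . ,etag))))
    (url-retrieve url callback)))
</code></pre></div></div>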

<p>Both the ETag and Last-Modified values are stored in the database and
persist across sessions. This is the reason the full speedup isn’t
realized until the second fetch. The initial cURL fetch doesn’t have
these values.</p>

<h3 id="fewer-bugs">Fewer bugs</h3>

<p>As mentioned previously, Emacs has a built-in URL retrieval library
called <code class="language-plaintext highlighter-rouge">url</code>. The central function is <code class="language-plaintext highlighter-rouge">url-retrieve</code> which
asynchronously fetches the content at an arbitrary URL (usually HTTP)
and delivers the buffer and status to a callback when it’s ready.
There’s also a queue front-end for it, <code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> which
limits the number of parallel connections. Elfeed hands this function
a pile of feed URLs all at once and it fetches them N at a time.</p>

<p>Unfortunately both these functions are <em>incredibly</em> buggy. It’s been a
thorn in my side for years.</p>

<p>Here’s what the interface looks like for both:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">url-retrieve</span> <span class="nv">URL</span> <span class="nv">CALLBACK</span> <span class="k">&amp;optional</span> <span class="nv">CBARGS</span> <span class="nv">SILENT</span> <span class="nv">INHIBIT-COOKIES</span><span class="p">)</span>
</code></pre></div></div>

<p>It takes a URL and a callback. Seeing this, the sane, unsurprising
expectation is that the callback will be invoked <em>exactly once</em> for each time
<code class="language-plaintext highlighter-rouge">url-retrieve</code> was called. In any case where the request fails, it
should report it through the callback. <a href="http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20159">This is not the case</a>.
The callback may be invoked any number of times, <em>including zero</em>.</p>

<p>In this example, suppose you have a webserver that will return an HTTP
404 for a requested URL. Below, I fire off 10 asynchronous requests in a
row.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://127.0.0.1:8080/404"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>What would you guess is the length of <code class="language-plaintext highlighter-rouge">results</code>? It’s initially 0
before any requests complete and over time (a very short time) I would
expect this to top out at 10. On Emacs 24, here’s the real answer:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">length</span> <span class="nv">results</span><span class="p">)</span>
<span class="c1">;; =&gt; 46</span>
</code></pre></div></div>

<p>The same error is reported multiple times to the callback. At least
the pattern is obvious.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-count</span> <span class="mi">0</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 9</span>
<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">1</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 8</span>
<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">2</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 7</span>

<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">9</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 1</span>
</code></pre></div></div>
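
<p>Rather than counting one index at a time, a single loop can tally
the callbacks for every request. The helper name <code class="language-plaintext highlighter-rouge">my/callback-tally</code>
is hypothetical, and the sample list below is made-up data for
illustration, not output from a real run.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my/callback-tally (results n)
  "Return a list of N counts, one per request index.
RESULTS is a list of (INDEX . STATUS) pairs as pushed by the callback."
  (cl-loop for i below n
           collect (cl-count i results :key #'car)))

;; Made-up sample: request 0 reported twice, request 1 never,
;; request 2 once.
(my/callback-tally '((0 . a) (0 . b) (2 . c)) 3)
;; =&gt; (2 0 1)
</code></pre></div></div>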

<p>Here’s another one, this time to the non-existent foo.example. The DNS
query should never resolve.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://foo.example/"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>What’s the length of <code class="language-plaintext highlighter-rouge">results</code>? This time it’s zero. Remember how DNS
is synchronous? Because of this, DNS failures are reported
synchronously as a signaled error. This gets a lot worse with
<code class="language-plaintext highlighter-rouge">url-queue-retrieve</code>. Since the request is put off until later, DNS
doesn’t fail until later, and you get neither a callback nor an error
signal. This also puts the queue in a bad state and necessitated
<code class="language-plaintext highlighter-rouge">elfeed-unjam</code> to manually clear it. This one should get fixed in
Emacs 25 when DNS is asynchronous.</p>
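
<p>For direct <code class="language-plaintext highlighter-rouge">url-retrieve</code> calls, one partial workaround is to catch
the signaled error and hand it to the callback. This is only a sketch
with a hypothetical name, <code class="language-plaintext highlighter-rouge">my/url-retrieve-safe</code>, and it does nothing
for <code class="language-plaintext highlighter-rouge">url-queue-retrieve</code>, where no error is signaled to the caller at
all.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'url)

(defun my/url-retrieve-safe (url callback)
  "Like `url-retrieve', but route a signaled error to CALLBACK.
On a synchronous failure CALLBACK receives (:error ERR), imitating the
usual status plist, and this function returns nil."
  (condition-case err
      (url-retrieve url callback)
    (error
     ;; Unlike a normal callback, this runs in the caller's buffer,
     ;; not in a response buffer.
     (funcall callback (list :error err))
     nil)))
</code></pre></div></div>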

<p>This last one assumes you don’t have anything listening on port 57432
(pulled out of nowhere) so that the connection fails.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://127.0.0.1:57432/"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>On Linux, we finally get the sane result of 10. However, on Windows,
it’s zero. The synchronous TCP connection will fail, signaling an
error just like DNS failures. Not only is it broken, it’s broken in
different ways on different platforms.</p>

<p>There are many more cases of callback weirdness which depend on the
connection and HTTP session being in various states when things go
awry. These were just the easiest to demonstrate. By using cURL, I get
to bypass this mess.</p>

<h3 id="no-more-gnutls-issues">No more GnuTLS issues</h3>

<p>At compile time, Emacs can optionally be linked against GnuTLS, giving
it robust TLS support so long as the shared library is available.
<code class="language-plaintext highlighter-rouge">url-retrieve</code> uses this for fetching HTTPS content. Unfortunately,
this library is noisy and will occasionally echo non-informational
messages in the minibuffer and in <code class="language-plaintext highlighter-rouge">*Messages*</code> that cannot be
suppressed.</p>

<p>When not linked against GnuTLS, Emacs will instead run the GnuTLS
command line program as an inferior process, just like Elfeed now does
with cURL. Unfortunately this interface is very slow and frequently
fails, basically preventing Elfeed from fetching HTTPS feeds. I
suspect it’s in part due to an improper <code class="language-plaintext highlighter-rouge">coding-system-for-read</code>.</p>

<p>cURL handles all the TLS negotiation itself, so both these problems
disappear. The compile-time configuration doesn’t matter.</p>

<h3 id="windows-is-now-supported">Windows is now supported</h3>

<p>Emacs’ Windows networking code is so unstable, even in Emacs 25, that
I couldn’t make any practical use of Elfeed on that platform. Even the
Cygwin emacs-w32 version couldn’t cut it. It hard crashes Emacs every
time I’ve tried to fetch feeds. Fortunately the inferior process code
is a whole lot more stable, meaning fetching with cURL works great. As
of today, you can now use Elfeed on Windows. The biggest obstacle is
getting cURL installed and configured.</p>

<h3 id="interface-changes">Interface changes</h3>

<p>With cURL, obviously the values of <code class="language-plaintext highlighter-rouge">url-queue-timeout</code> and
<code class="language-plaintext highlighter-rouge">url-queue-parallel-processes</code> no longer have any meaning to Elfeed.
If you set these for yourself, you should instead call the functions
<code class="language-plaintext highlighter-rouge">elfeed-set-timeout</code> and <code class="language-plaintext highlighter-rouge">elfeed-set-max-connections</code>, which will do
the appropriate thing depending on the value of <code class="language-plaintext highlighter-rouge">elfeed-use-curl</code>.
Each also comes with a getter so you can query the current value.</p>
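
<p>In an init file that might look like the following. The values are
arbitrary examples, and the timeout is presumably in seconds, as with
<code class="language-plaintext highlighter-rouge">url-queue-timeout</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Arbitrary example values. Each setter configures whichever fetcher
;; `elfeed-use-curl' selects.
(elfeed-set-timeout 30)
(elfeed-set-max-connections 16)
</code></pre></div></div>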

<p>The deprecated <code class="language-plaintext highlighter-rouge">elfeed-max-connections</code> has been removed.</p>

<p>Feed objects now have meta tags <code class="language-plaintext highlighter-rouge">:etag</code>, <code class="language-plaintext highlighter-rouge">:last-modified</code>, and
<code class="language-plaintext highlighter-rouge">:canonical-url</code>. The latter can identify feeds that have been moved,
though it needs a real UI.</p>

<h3 id="see-any-bugs">See any bugs?</h3>

<p>If you use Elfeed, grab the current update and give the cURL fetcher a
shot. Please open a ticket if you find problems. Be sure to report
your Emacs version, operating system, and cURL version.</p>

<p>As of this writing there’s just one thing missing compared to
url-queue: connection reuse. cURL supports it, so I just need to code
it up.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>RSA Signatures in Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/10/30/"/>
    <id>urn:uuid:9d9ef14d-d513-3cad-b053-fb016f3c3bf0</id>
    <updated>2015-10-30T22:35:13Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<p>Emacs comes with a wonderful arbitrary-precision computer algebra
system called <a href="http://www.gnu.org/software/emacs/manual/html_mono/calc.html">calc</a>. I’ve <a href="/blog/2009/06/23/">discussed it previously</a> and
continue to use it on a daily basis. That’s right, people, <em>Emacs can
do calculus</em>. Like everything Emacs, it’s programmable and extensible
from Emacs Lisp. In this article, I’m going to implement the <a href="https://en.wikipedia.org/wiki/RSA_(cryptosystem)">RSA
public-key cryptosystem</a> in Emacs Lisp using calc.</p>

<p>If you want to dive right in first, here’s the repository:</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-rsa">https://github.com/skeeto/emacs-rsa</a></li>
</ul>

<p>This is only a toy implementation and not really intended for serious
cryptographic work. It’s also far too slow when using keys of
reasonable length.</p>

<h3 id="evaluation-with-calc">Evaluation with calc</h3>

<p>The calc package is particularly useful when considering Emacs’
limited integer type. Emacs uses a tagged integer scheme where
integers are embedded within pointers. It’s a lot faster than the
alternative (individually-allocated integer objects), but it means
they’re always a few bits short of the platform’s native integer type.</p>

<p>calc has a large API, but the user-friendly porcelain for it is the
under-documented <code class="language-plaintext highlighter-rouge">calc-eval</code> function. It evaluates an expression
string with format-like argument substitutions (<code class="language-plaintext highlighter-rouge">$n</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"2^16 - 1"</span><span class="p">)</span>
<span class="c1">;; =&gt; "65535"</span>

<span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"2^$1 - 1"</span> <span class="no">nil</span> <span class="mi">128</span><span class="p">)</span>
<span class="c1">;; =&gt; "340282366920938463463374607431768211455"</span>
</code></pre></div></div>

<p>Notice it returns strings, which is one of the ways calc represents
arbitrary precision numbers. For arguments, it accepts regular Elisp
numbers and strings just like this function returns. The implicit
radix is 10. To explicitly set the radix, prefix the number with the
radix and <code class="language-plaintext highlighter-rouge">#</code>. This is the same as in the user interface of calc. For
example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"16#deadbeef"</span><span class="p">)</span>
<span class="c1">;; =&gt; "3735928559"</span>
</code></pre></div></div>

<p>The second argument (optional) to <code class="language-plaintext highlighter-rouge">calc-eval</code> adjusts its behavior.
Given <code class="language-plaintext highlighter-rouge">nil</code>, it simply evaluates the string and returns the result.
The manual documents the different options, but the only other
relevant option for RSA is the symbol <code class="language-plaintext highlighter-rouge">pred</code>, which asks it to return
a boolean “predicate” result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &lt; $2"</span> <span class="ss">'pred</span> <span class="s">"4000"</span> <span class="s">"5000"</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<h3 id="generating-primes">Generating primes</h3>

<p>RSA is founded on the difficulty of factoring large composites with
large factors. Generating an RSA keypair starts with generating two
prime numbers, <code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">q</code>, and using these primes to compute two
mathematically related composite numbers.</p>

<p>calc has a function <code class="language-plaintext highlighter-rouge">calc-next-prime</code> for finding the next prime
number following any arbitrary number. It uses a probabilistic
primality test (the <del>Fermat</del> Miller-Rabin primality test)
to efficiently test large integers. It increments the input until
it finds a result that passes enough iterations of the primality test.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"nextprime($1)"</span> <span class="no">nil</span> <span class="s">"100000000000000000"</span><span class="p">)</span>
<span class="c1">;; =&gt; "100000000000000003"</span>
</code></pre></div></div>

<p>So to generate a random n-bit prime, first generate a random n-bit
number and then increment it until a prime number is found.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Generate a 128-bit prime, 10 iterations (0.000084% error rate)</span>
<span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"nextprime(random(2^$1), 10)"</span> <span class="no">nil</span> <span class="mi">128</span><span class="p">)</span>
<span class="c1">;; =&gt; "111618319598394878409654851283959105123"</span>
</code></pre></div></div>

<p>Unfortunately calc’s <code class="language-plaintext highlighter-rouge">random</code> function is based on Emacs’ <code class="language-plaintext highlighter-rouge">random</code>
function, which is entirely unsuitable for cryptography. In the real
implementation I read n bits from <code class="language-plaintext highlighter-rouge">/dev/urandom</code> to generate an n-bit
number.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nv">set-buffer-multibyte</span> <span class="no">nil</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">call-process</span> <span class="s">"head"</span> <span class="s">"/dev/urandom"</span> <span class="no">t</span> <span class="no">nil</span> <span class="s">"-c"</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%d"</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">bits</span> <span class="mi">8</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">f</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="nf">#'</span><span class="nb">format</span> <span class="s">"%02x"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">concat</span> <span class="s">"16#"</span> <span class="p">(</span><span class="nv">mapconcat</span> <span class="nv">f</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span> <span class="s">""</span><span class="p">))))</span>
</code></pre></div></div>

<p>(Note: <code class="language-plaintext highlighter-rouge">/dev/urandom</code> <em>is</em> the right choice. There’s <a href="http://www.2uo.de/myths-about-urandom/">no reason to use
<code class="language-plaintext highlighter-rouge">/dev/random</code> for generating keys</a>.)</p>
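
<p>Putting the two pieces together, prime generation might look
something like the sketch below. The repository’s own function is not
reproduced here. The names <code class="language-plaintext highlighter-rouge">my/random-bits</code> and <code class="language-plaintext highlighter-rouge">my/generate-prime</code>
are hypothetical, and binding <code class="language-plaintext highlighter-rouge">coding-system-for-read</code> is my own
precaution against the bytes being decoded as text. It assumes a
Unix-like system with <code class="language-plaintext highlighter-rouge">head</code> and <code class="language-plaintext highlighter-rouge">/dev/urandom</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calc)

(defun my/random-bits (bits)
  "Return BITS random bits from /dev/urandom as a calc hex string."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; Read raw bytes: no character decoding, no EOL conversion.
    (let ((coding-system-for-read 'binary))
      (call-process "head" "/dev/urandom" t nil
                    "-c" (format "%d" (/ bits 8))))
    (concat "16#" (mapconcat (lambda (byte) (format "%02x" byte))
                             (buffer-string) ""))))

(defun my/generate-prime (bits)
  "Return a probable prime at or above a random number of up to BITS bits.
The result is a decimal string. Ten primality test iterations are used."
  (calc-eval "nextprime($1, 10)" nil (my/random-bits bits)))
</code></pre></div></div>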

<h3 id="computing-e-and-d">Computing e and d</h3>

<p>From here the code just follows along from the Wikipedia article.
After generating the primes <code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">q</code>, two composites are computed,
<code class="language-plaintext highlighter-rouge">n = p * q</code> and <code class="language-plaintext highlighter-rouge">i = (p - 1) * (q - 1)</code>. Lacking any reason to do
otherwise, I chose 65,537 for the public exponent <code class="language-plaintext highlighter-rouge">e</code>.</p>

<p>The function <code class="language-plaintext highlighter-rouge">rsa--inverse</code> is just a straight Emacs Lisp + calc
implementation of the extended Euclidean algorithm from <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm">the Wikipedia
article pseudocode</a>, computing <code class="language-plaintext highlighter-rouge">d ≡ e^-1 (mod i)</code>. It’s not much
use sharing it here, so take a look at the repository if you’re
curious.</p>
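
<p>For readers who would rather not open the repository, here is my own
sketch of a modular inverse along the same lines, following the
Wikipedia pseudocode. It is not the repository’s <code class="language-plaintext highlighter-rouge">rsa--inverse</code>, and
the name <code class="language-plaintext highlighter-rouge">my/mod-inverse</code> is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calc)
(require 'cl-lib)

(defun my/mod-inverse (a n)
  "Return the inverse of A modulo N as a calc string, or nil if none.
A and N are positive integers, given as Elisp integers or calc strings."
  (let ((s "0") (new-s "1")   ; running coefficients for A
        (r n) (new-r a))      ; running remainders
    (while (calc-eval "$1 &gt; 0" 'pred new-r)
      (let ((q (calc-eval "$1 \\ $2" nil r new-r)))
        ;; Parallel assignment: each right-hand side sees the old values.
        (cl-psetf s new-s
                  new-s (calc-eval "$1 - $2 * $3" nil s q new-s)
                  r new-r
                  new-r (calc-eval "$1 - $2 * $3" nil r q new-r))))
    (cond ((calc-eval "$1 &gt; 1" 'pred r) nil) ; gcd is not 1: no inverse
          ((calc-eval "$1 &lt; 0" 'pred s) (calc-eval "$1 + $2" nil s n))
          (t s))))

;; The textbook toy key: e = 17, i = 3120.
(my/mod-inverse 17 3120)
;; =&gt; "2753"
</code></pre></div></div>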

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa-generate-keypair</span> <span class="p">(</span><span class="nv">bits</span><span class="p">)</span>
  <span class="s">"Generate a fresh RSA keypair plist of BITS length."</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">p</span> <span class="p">(</span><span class="nv">rsa-generate-prime</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">bits</span> <span class="mi">2</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">q</span> <span class="p">(</span><span class="nv">rsa-generate-prime</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">bits</span> <span class="mi">2</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">n</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 * $2"</span> <span class="no">nil</span> <span class="nv">p</span> <span class="nv">q</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">i</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"($1 - 1) * ($2 - 1)"</span> <span class="no">nil</span> <span class="nv">p</span> <span class="nv">q</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">e</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"2^16+1"</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">d</span> <span class="p">(</span><span class="nv">rsa--inverse</span> <span class="nv">e</span> <span class="nv">i</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="ss">:public</span>  <span class="p">(</span><span class="ss">:n</span> <span class="o">,</span><span class="nv">n</span> <span class="ss">:e</span> <span class="o">,</span><span class="nv">e</span><span class="p">)</span> <span class="ss">:private</span> <span class="p">(</span><span class="ss">:n</span> <span class="o">,</span><span class="nv">n</span> <span class="ss">:d</span> <span class="o">,</span><span class="nv">d</span><span class="p">))))</span>
</code></pre></div></div>

<p>The public key is <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">e</code> and the private key is <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">d</code>. From
here we can compute and verify cryptographic signatures.</p>

<h3 id="signatures">Signatures</h3>

<p>To compute signature <code class="language-plaintext highlighter-rouge">s</code> of an integer <code class="language-plaintext highlighter-rouge">m</code> (where <code class="language-plaintext highlighter-rouge">m &lt; n</code>), compute
<code class="language-plaintext highlighter-rouge">s ≡ m^d (mod n)</code>. I chose the right-to-left binary method, again
straight from <a href="https://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method">the Wikipedia pseudocode</a> (lazy!). I’ll share this
one since it’s short. The backslash denotes integer division.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa--mod-pow</span> <span class="p">(</span><span class="nv">base</span> <span class="nv">exponent</span> <span class="nv">modulus</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="mi">1</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="nv">base</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 % $2"</span> <span class="no">nil</span> <span class="nv">base</span> <span class="nv">modulus</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &gt; 0"</span> <span class="ss">'pred</span> <span class="nv">exponent</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 % 2 == 1"</span> <span class="ss">'pred</span> <span class="nv">exponent</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">setf</span> <span class="nv">result</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"($1 * $2) % $3"</span> <span class="no">nil</span> <span class="nv">result</span> <span class="nv">base</span> <span class="nv">modulus</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">exponent</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 \\ 2"</span> <span class="no">nil</span> <span class="nv">exponent</span><span class="p">)</span>
            <span class="nv">base</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"($1 * $1) % $2"</span> <span class="no">nil</span> <span class="nv">base</span> <span class="nv">modulus</span><span class="p">)))</span>
    <span class="nv">result</span><span class="p">))</span>
</code></pre></div></div>

<p>Verifying the signature is the same process, but with the public key’s
<code class="language-plaintext highlighter-rouge">e</code>: <code class="language-plaintext highlighter-rouge">m ≡ s^e (mod n)</code>. If the signature is valid, <code class="language-plaintext highlighter-rouge">m</code> will be
recovered. In theory, only someone who knows <code class="language-plaintext highlighter-rouge">d</code> can feasibly compute
<code class="language-plaintext highlighter-rouge">s</code> from <code class="language-plaintext highlighter-rouge">m</code>. If <code class="language-plaintext highlighter-rouge">n</code> is <a href="http://crypto.stackexchange.com/a/5942">small enough to factor</a>, revealing
<code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">q</code>, then <code class="language-plaintext highlighter-rouge">d</code> can be feasibly recomputed from the public key.
So mind your Ps and Qs.</p>
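
<p>The round trip is easy to check with a toy key small enough for
native Elisp integers. This sketch uses the same right-to-left binary
method as <code class="language-plaintext highlighter-rouge">rsa--mod-pow</code>, minus calc. The name <code class="language-plaintext highlighter-rouge">my/mod-pow</code> is
hypothetical, and the key (p = 61, q = 53, so n = 3233, with e = 17
and d = 2753) is the usual textbook example, far too small for real
use.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/mod-pow (base exponent modulus)
  "Return BASE raised to EXPONENT modulo MODULUS, for small integers."
  (let ((result 1))
    (setq base (mod base modulus))
    (while (&gt; exponent 0)
      ;; Multiply in the current power of BASE when the low bit is set.
      (when (= (% exponent 2) 1)
        (setq result (mod (* result base) modulus)))
      (setq exponent (/ exponent 2)
            base (mod (* base base) modulus)))
    result))

(let* ((n 3233) (e 17) (d 2753)
       (m 65)
       (s (my/mod-pow m d n)))   ; sign with the private exponent
  (= (my/mod-pow s e n) m))      ; verify with the public exponent
;; =&gt; t
</code></pre></div></div>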

<p>So that leaves one problem: generally users want to sign strings and
files and such, not integers. A hash function is used to reduce an
arbitrary quantity of data into an integer suitable for signing. Emacs
comes with a bunch of them, accessible through <code class="language-plaintext highlighter-rouge">secure-hash</code>. It
hashes strings and buffers.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">secure-hash</span> <span class="ss">'sha224</span> <span class="s">"Hello, world!"</span><span class="p">)</span>
<span class="c1">;; =&gt; "8552d8b7a7dc5476cb9e25dee69a8091290764b7f2a64fe6e78e9568"</span>
</code></pre></div></div>

<p>Since the result is hexadecimal, just prefix <code class="language-plaintext highlighter-rouge">16#</code> to turn it into a
calc integer.</p>

<p>Here are the signature and verification functions. Any string or buffer
can be signed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa-sign</span> <span class="p">(</span><span class="nv">private-key</span> <span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">n</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">private-key</span> <span class="ss">:n</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">d</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">private-key</span> <span class="ss">:d</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">hash</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"16#"</span> <span class="p">(</span><span class="nv">secure-hash</span> <span class="ss">'sha384</span> <span class="nv">object</span><span class="p">))))</span>
    <span class="c1">;; truncate hash such that hash &lt; n</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &gt; $2"</span> <span class="ss">'pred</span> <span class="nv">hash</span> <span class="nv">n</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">hash</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 \\ 2"</span> <span class="no">nil</span> <span class="nv">hash</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">rsa--mod-pow</span> <span class="nv">hash</span> <span class="nv">d</span> <span class="nv">n</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa-verify</span> <span class="p">(</span><span class="nv">public-key</span> <span class="nv">object</span> <span class="nv">sig</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">n</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">public-key</span> <span class="ss">:n</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">e</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">public-key</span> <span class="ss">:e</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">hash</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"16#"</span> <span class="p">(</span><span class="nv">secure-hash</span> <span class="ss">'sha384</span> <span class="nv">object</span><span class="p">))))</span>
    <span class="c1">;; truncate hash such that hash &lt; n</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &gt; $2"</span> <span class="ss">'pred</span> <span class="nv">hash</span> <span class="nv">n</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">hash</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 \\ 2"</span> <span class="no">nil</span> <span class="nv">hash</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">rsa--mod-pow</span> <span class="nv">sig</span> <span class="nv">e</span> <span class="nv">n</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 == $2"</span> <span class="ss">'pred</span> <span class="nv">result</span> <span class="nv">hash</span><span class="p">))))</span>
</code></pre></div></div>

<p>Note the hash truncation step. If this is actually necessary, then
your <code class="language-plaintext highlighter-rouge">n</code> is <em>very</em> easy to factor! It’s in there since this is just a
toy and I want it to work with small keys.</p>

<h3 id="putting-it-all-together">Putting it all together</h3>

<p>Here’s the whole thing in action with an extremely small, 128-bit key.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">message</span> <span class="s">"hello, world!"</span><span class="p">)</span>

<span class="p">(</span><span class="nb">setf</span> <span class="nv">keypair</span> <span class="p">(</span><span class="nv">rsa-generate-keypair</span> <span class="mi">128</span><span class="p">))</span>
<span class="c1">;; =&gt; (:public  (:n "74924929503799951536367992905751084593"</span>
<span class="c1">;;               :e "65537")</span>
<span class="c1">;;     :private (:n "74924929503799951536367992905751084593"</span>
<span class="c1">;;               :d "36491277062297490768595348639394259869"))</span>

<span class="p">(</span><span class="nb">setf</span> <span class="nv">sig</span> <span class="p">(</span><span class="nv">rsa-sign</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">keypair</span> <span class="ss">:private</span><span class="p">)</span> <span class="nv">message</span><span class="p">))</span>
<span class="c1">;; =&gt; "31982247477262471348259501761458827454"</span>

<span class="p">(</span><span class="nv">rsa-verify</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">keypair</span> <span class="ss">:public</span><span class="p">)</span> <span class="nv">message</span> <span class="nv">sig</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>

<span class="p">(</span><span class="nv">rsa-verify</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">keypair</span> <span class="ss">:public</span><span class="p">)</span> <span class="p">(</span><span class="nv">capitalize</span> <span class="nv">message</span><span class="p">)</span> <span class="nv">sig</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Each of these operations took less than a second. For larger,
secure-length keys, this implementation is painfully slow. For
example, generating a 2048-bit key takes my laptop about half an hour,
and computing a signature with that key (any size message) takes about
a minute. That’s probably a little too slow for, say, signing ELPA
packages.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Counting Processor Cores in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/10/14/"/>
    <id>urn:uuid:dbfba1a0-b3af-356d-4d01-96917d622906</id>
    <updated>2015-10-14T03:17:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>One of the great advantages of dependency analysis is parallelization.
Modern processors reorder instructions whose results don’t affect each
other. Compilers reorder expressions and statements to improve
throughput. Build systems know which outputs are inputs for other
targets and can choose any arbitrary build order within that
constraint. This article involves the last case.</p>

<p>The build system I use most often is GNU Make, either directly or
indirectly (Autoconf, CMake). It’s far from perfect, but it does what
I need. I almost always invoke it from within Emacs rather than in a
terminal. In fact, I do it so often that I’ve wrapped Emacs’ <code class="language-plaintext highlighter-rouge">compile</code>
command for rapid invocation.</p>

<p>I recently helped a co-worker set this up for himself, so it had
me thinking about the problem again. The situation <a href="https://github.com/skeeto/.emacs.d">in my
config</a> is much more complicated than it needs to be, so I’ll
share a simplified version instead.</p>

<p>First bring in the usual goodies (we’re going to be making closures):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>
</code></pre></div></div>

<p>We need a couple of configuration variables.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">quick-compile-command</span> <span class="s">"make -k "</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">quick-compile-build-file</span> <span class="s">"Makefile"</span><span class="p">)</span>
</code></pre></div></div>

<p>Then a couple of interactive functions to set these on the fly. It’s
not strictly necessary, but I like giving each a key binding. I also
like having a history available via <code class="language-plaintext highlighter-rouge">read-string</code>, so I can switch
between a couple of different options with ease.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">quick-compile-set-command</span> <span class="p">(</span><span class="nv">command</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">read-string</span> <span class="s">"Command: "</span> <span class="nv">quick-compile-command</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-command</span> <span class="nv">command</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">quick-compile-set-build-file</span> <span class="p">(</span><span class="nv">build-file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">read-string</span> <span class="s">"Build file: "</span> <span class="nv">quick-compile-build-file</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-build-file</span> <span class="nv">build-file</span><span class="p">))</span>
</code></pre></div></div>

<p>Now finally to the good part. Below, <code class="language-plaintext highlighter-rouge">quick-compile</code> is a
non-interactive function that returns an interactive closure ready to
be bound to any key I desire. It takes an optional target. This means
I don’t use the above <code class="language-plaintext highlighter-rouge">quick-compile-set-command</code> to choose a target,
only for setting other options. That will make more sense in a moment.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">quick-compile</span> <span class="p">(</span><span class="k">&amp;optional</span> <span class="p">(</span><span class="nv">target</span> <span class="s">""</span><span class="p">))</span>
  <span class="s">"Return an interactive function that runs `compile' for TARGET."</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">save-buffer</span><span class="p">)</span>  <span class="c1">; so I don't get asked</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">default-directory</span>
            <span class="p">(</span><span class="nv">locate-dominating-file</span>
             <span class="nv">default-directory</span> <span class="nv">quick-compile-build-file</span><span class="p">)))</span>
      <span class="p">(</span><span class="k">if</span> <span class="nv">default-directory</span>
          <span class="p">(</span><span class="nb">compile</span> <span class="p">(</span><span class="nv">concat</span> <span class="nv">quick-compile-command</span> <span class="s">" "</span> <span class="nv">target</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">error</span> <span class="s">"Cannot find %s"</span> <span class="nv">quick-compile-build-file</span><span class="p">)))))</span>
</code></pre></div></div>

<p>It traverses up (down?) the directory hierarchy towards root looking
for a Makefile — or whatever is set for <code class="language-plaintext highlighter-rouge">quick-compile-build-file</code>
— then invokes the build system there. I <a href="http://aegis.sourceforge.net/auug97.pdf">don’t believe in recursive
<code class="language-plaintext highlighter-rouge">make</code></a>.</p>

<p>So how do I put this to use? I clobber some key bindings I don’t
otherwise care about. A better choice might be the F-keys, but my
muscle memory is already committed elsewhere.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x c"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span><span class="p">))</span> <span class="c1">; default target</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x C"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"clean"</span><span class="p">))</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x t"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"test"</span><span class="p">))</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x r"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"run"</span><span class="p">))</span>
</code></pre></div></div>

<p>Each of those invokes a different target without second guessing me.
Let me tell you, having “clean” at the tip of my fingers is wonderful.</p>

<h3 id="parallel-builds">Parallel Builds</h3>

<p>An extension common to many different <code class="language-plaintext highlighter-rouge">make</code> programs is <code class="language-plaintext highlighter-rouge">-j</code>, which
asks <code class="language-plaintext highlighter-rouge">make</code> to build targets in parallel where possible. These days,
when multi-core machines are the norm, you nearly always want to use
this option, ideally set to the number of logical processor cores on
your system. It’s a huge time-saver.</p>

<p>My recent revelation was that my default build command could be
better: <code class="language-plaintext highlighter-rouge">make -k</code> is minimal. It should at least include <code class="language-plaintext highlighter-rouge">-j</code>, but
choosing an argument (number of processor cores) is a problem. Today I
use different machines with 2, 4, or 8 cores, so most of the time any
given number will be wrong. I could use a per-system configuration,
but I’d rather not. Unfortunately GNU Make will not automatically
detect the number of cores. That leaves the matter up to Emacs Lisp.</p>

<p>Emacs doesn’t currently have a built-in function that returns the
number of processor cores. I’ll need to reach into the operating
system to figure it out. My usual development environments are Linux,
Windows, and OpenBSD, so my solution should work on each. I’ve ranked
them by order of importance.</p>

<h4 id="number-of-cores-on-linux">Number of cores on Linux</h4>

<p>Linux has the <code class="language-plaintext highlighter-rouge">/proc</code> virtual filesystem in the fashion of Plan 9,
allowing different aspects of the system to be explored through the
standard filesystem API. The relevant file here is <code class="language-plaintext highlighter-rouge">/proc/cpuinfo</code>,
listing useful information about each of the system’s processors. To
get the number of processors, count the number of processor entries in
this file. I’ve guarded it with <code class="language-plaintext highlighter-rouge">file-exists-p</code> so that it returns
<code class="language-plaintext highlighter-rouge">nil</code> on other operating systems instead of throwing an error.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nv">file-exists-p</span> <span class="s">"/proc/cpuinfo"</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"/proc/cpuinfo"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">how-many</span> <span class="s">"^processor[[:space:]]+:"</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="number-of-cores-on-windows">Number of cores on Windows</h4>

<p>When I was first researching how to do this on Windows, I thought I
would need to invoke the <code class="language-plaintext highlighter-rouge">wmic</code> command line program and hope the
output could be parsed the same way on different versions of the
operating system and tool. However, it turns out the solution for
Windows is trivial. The environment variable <code class="language-plaintext highlighter-rouge">NUMBER_OF_PROCESSORS</code>
gives every process the answer for free. Being an environment
variable, it will need to be parsed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">number-of-processors</span> <span class="p">(</span><span class="nv">getenv</span> <span class="s">"NUMBER_OF_PROCESSORS"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">when</span> <span class="nv">number-of-processors</span>
    <span class="p">(</span><span class="nv">string-to-number</span> <span class="nv">number-of-processors</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="number-of-cores-on-bsd">Number of cores on BSD</h4>

<p>This seems to work the same across all the BSDs, including OS X,
though I haven’t yet tested it exhaustively. Invoke <code class="language-plaintext highlighter-rouge">sysctl</code>, which
returns an undecorated number to be parsed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nb">ignore-errors</span>
    <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">zerop</span> <span class="p">(</span><span class="nv">call-process</span> <span class="s">"sysctl"</span> <span class="no">nil</span> <span class="no">t</span> <span class="no">nil</span> <span class="s">"-n"</span> <span class="s">"hw.ncpu"</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">string-to-number</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Also not complicated, but it’s the heaviest solution of the three.</p>

<h3 id="putting-it-all-together">Putting it all together</h3>

<p>Join all these together with <code class="language-plaintext highlighter-rouge">or</code>, call it <code class="language-plaintext highlighter-rouge">numcores</code>, and ta-da.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-command</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"make -kj%d"</span> <span class="p">(</span><span class="nv">numcores</span><span class="p">)))</span>
</code></pre></div></div>

<p>Now <code class="language-plaintext highlighter-rouge">make</code> is invoked correctly on any system by default.</p>
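
<p>For reference, here’s a minimal sketch of how the three snippets above
might be joined. The <code class="language-plaintext highlighter-rouge">or</code> structure follows the description, but the
zero-count guard on the Linux branch and the final fallback of <code class="language-plaintext highlighter-rouge">1</code> are
assumptions added here so that <code class="language-plaintext highlighter-rouge">format</code> always receives a positive
integer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun numcores ()
  "Return the number of logical processor cores, or 1 if unknown."
  (or
   ;; Linux: count the processor entries in /proc/cpuinfo.
   (when (file-exists-p "/proc/cpuinfo")
     (with-temp-buffer
       (insert-file-contents "/proc/cpuinfo")
       (let ((count (how-many "^processor[[:space:]]+:")))
         ;; Assumption: treat a count of zero as "unknown" so that
         ;; `or' moves on to the next method.
         (when (&gt; count 0) count))))
   ;; Windows: parse the environment variable.
   (let ((number-of-processors (getenv "NUMBER_OF_PROCESSORS")))
     (when number-of-processors
       (string-to-number number-of-processors)))
   ;; BSD and OS X: ask sysctl.
   (with-temp-buffer
     (ignore-errors
       (when (zerop (call-process "sysctl" nil t nil "-n" "hw.ncpu"))
         (string-to-number (buffer-string)))))
   ;; Assumption: fall back to a single job when nothing else worked.
   1))
</code></pre></div></div>

<p>The branches are ordered the same way as the sections above, so the
cheapest checks run first and the external process is only spawned as a
last resort.</p>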

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Autotetris Mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/10/19/"/>
    <id>urn:uuid:e76556be-ebeb-3f65-7041-bffbe2e19952</id>
    <updated>2014-10-19T21:45:53Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p>For more than a decade now, Emacs has come with a built-in Tetris
clone, originally written by XEmacs’ Glynn Clements. Just run <code class="language-plaintext highlighter-rouge">M-x
tetris</code> any time you want to play. For anyone too busy to waste time
playing Tetris, earlier this year I wrote an autotetris-mode that will
play the Emacs game automatically.</p>

<ul>
  <li><a href="https://github.com/skeeto/autotetris-mode">https://github.com/skeeto/autotetris-mode</a></li>
</ul>

<p>Load the source, <code class="language-plaintext highlighter-rouge">autotetris-mode.el</code>, and run <code class="language-plaintext highlighter-rouge">M-x autotetris</code>. It will
start the built-in Tetris but make all the moves itself. It works best
when byte compiled.</p>

<p><img src="/img/diagram/tetris/screenshot.png" alt="" /></p>

<p>At the time I had read <a href="http://www.cs.cornell.edu/boom/1999sp/projects/tetris/">an article</a> and was interested in trying
my hand at my own Tetris AI. Like most things Emacs, the built-in
Tetris game is very hackable. It’s also pretty simple and easy to
understand. Rather than write my own I chose to build upon this one.</p>

<h3 id="heuristics">Heuristics</h3>

<p>It’s not a particularly strong AI. It doesn’t pay attention to the
next piece in queue, it doesn’t know the game’s basic shapes, and it
doesn’t try to maximize the score (clearing multiple rows at once).
The goal is to continue running for as long as possible. But since
it’s able to get to the point where the game is so fast that the AI is
unable to move pieces fast enough (it’s rate limited like a human
player), that means it’s good enough.</p>

<p>When a new piece appears at the top of the screen, the AI, in memory,
tries placing it in all possible positions and all possible
orientations. For each of these positions it runs a heuristic on the
resulting game state, summing five metrics. Each metric is scaled by a
hand-tuned weight to adjust its relative priority. Smaller is better,
so the position with the lowest score is selected.</p>

<h4 id="number-of-holes">Number of Holes</h4>

<p><img src="/img/diagram/tetris/holes.png" alt="" /></p>

<p>A hole is any open space that has a solid block above it, even if that
hole is accessible without passing through a solid block. Count these
holes.</p>

<h4 id="maximum-height">Maximum Height</h4>

<p><img src="/img/diagram/tetris/height.png" alt="" /></p>

<p>Add the height of the tallest column. Column height includes any holes
in the column. The game ends when a column touches the top of the
screen (or something like that), so this should be kept in check.</p>

<h4 id="mean-height">Mean Height</h4>

<p><img src="/img/diagram/tetris/mean.png" alt="" /></p>

<p>Add the mean height of all columns. The higher this is, the closer we
are to losing the game. Since each row will have at least one hole,
this will be a similar measure to the hole count.</p>

<h4 id="height-disparity">Height Disparity</h4>

<p><img src="/img/diagram/tetris/disparity.png" alt="" /></p>

<p>Add the difference between the shortest column height and the tallest
column height. If this number is large it means we’re not making
effective use of the playing area. It also discourages the AI from
getting into that annoying situation we all remember: when you
<em>really</em> need a 4x1 piece that never seems to come. Those are the
brief moments when I truly believe the version I’m playing has to be
rigged.</p>

<h4 id="surface-roughness">Surface Roughness</h4>

<p><img src="/img/diagram/tetris/surface.png" alt="" /></p>

<p>Take the root mean square of the column heights. A rougher surface
leaves fewer options when placing pieces. This measure will be similar
to the disparity measurement.</p>
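
<p>To make the weighted sum concrete, here’s a rough sketch of how the five
metrics could be combined. It is not the code from autotetris-mode: the
function name, the weight keys, and the idea of passing in a precomputed
hole count are all hypothetical, and the roughness term follows the
description above literally.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun sketch-tetris-score (heights holes weights)
  "Return a weighted sum of five board metrics; lower is better.
HEIGHTS is a non-empty list of column heights, HOLES is a
precomputed hole count, and WEIGHTS is a plist of hypothetical
hand-tuned weights."
  (let* ((n (float (length heights)))
         (tallest (apply #'max heights))
         (shortest (apply #'min heights))
         (mean (/ (apply #'+ heights) n))
         ;; Root mean square of the column heights.
         (rms (sqrt (/ (apply #'+ (mapcar (lambda (h) (* h h)) heights))
                       n))))
    (+ (* (plist-get weights :holes) holes)
       (* (plist-get weights :max) tallest)
       (* (plist-get weights :mean) mean)
       (* (plist-get weights :disparity) (- tallest shortest))
       (* (plist-get weights :roughness) rms))))

;; Example call with every weight set to 1.0 (made-up values):
(sketch-tetris-score
 '(1 2 3) 2
 '(:holes 1.0 :max 1.0 :mean 1.0 :disparity 1.0 :roughness 1.0))
</code></pre></div></div>

<p>Scoring every candidate placement with a function like this and keeping
the minimum gives the selection rule described earlier.</p>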

<h3 id="emacs-specific-details">Emacs-specific Details</h3>

<p>With a position selected, the AI sends player inputs at a limited rate
to the game itself, moving the piece into place. This is done by
calling <code class="language-plaintext highlighter-rouge">tetris-move-right</code>, <code class="language-plaintext highlighter-rouge">tetris-move-left</code>, and
<code class="language-plaintext highlighter-rouge">tetris-rotate-next</code>, which, in the normal game, are bound to the
arrow keys.</p>

<p>The built-in tetris-mode isn’t quite designed for this kind of
extension, so it needs a little bit of help. I defined two pieces of
advice to create hooks. These hooks alert my AI to two specific events
in the game: the game start and a fresh, new piece.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">tetris-new-shape</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">autotetris-new-shape-hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'autotetris-new-shape-hook</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">tetris-start-game</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">autotetris-start-game-hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'autotetris-start-game-hook</span><span class="p">))</span>
</code></pre></div></div>
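
<p>As an illustration of the pattern, here’s a self-contained sketch that
turns a function call into a hook event and then subscribes to it. It
uses a stand-in function rather than tetris-mode itself, and it uses
<code class="language-plaintext highlighter-rouge">advice-add</code> (available since Emacs 24.4) instead of <code class="language-plaintext highlighter-rouge">defadvice</code>. All
of the <code class="language-plaintext highlighter-rouge">demo-</code> names are hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Stand-in for a game function such as `tetris-new-shape'.
(defun demo-new-shape ()
  'shape)

(defvar demo-new-shape-hook nil
  "Hypothetical hook run after `demo-new-shape'.")

;; After-advice that fires the hook, ignoring the call's arguments.
(advice-add 'demo-new-shape :after
            (lambda (&amp;rest _) (run-hooks 'demo-new-shape-hook)))

(defvar demo-pieces-seen 0
  "Number of times the hook has fired.")

;; A subscriber: this is where an AI would plan its next move.
(add-hook 'demo-new-shape-hook
          (lambda () (setq demo-pieces-seen (1+ demo-pieces-seen))))

(demo-new-shape)  ; returns its usual value and runs the hook
</code></pre></div></div>

<p>The advised function keeps its original return value, so callers inside
the game are unaffected by the extra hook.</p>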

<p>I talked before about <a href="/blog/2014/10/12/">the problems with global state</a>.
Fortunately, tetris-mode doesn’t store any game state in global
variables. It stores everything in buffer-local variables, which can
be exploited for use in the AI. To perform the “in memory” heuristic
checks, it creates a copy of the game state and manipulates the copy.
The copy is made by way of <code class="language-plaintext highlighter-rouge">clone-buffer</code> on the <code class="language-plaintext highlighter-rouge">*Tetris*</code> buffer.
The tetris-mode functions all work equally well on the clone, so I
can use the existing game rules to properly place the next piece in
each available position. The game’s own rules take care of clearing
rows and checking for collisions for me. I wrote an
<code class="language-plaintext highlighter-rouge">autotetris-save-excursion</code> macro to handle the messy details.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">autotetris-save-excursion</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="s">"Restore tetris game state after BODY completes."</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">tetris-buffer-name</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">autotetris-saved</span> <span class="p">(</span><span class="nv">clone-buffer</span> <span class="s">"*Tetris-saved*"</span><span class="p">)))</span>
       <span class="p">(</span><span class="k">unwind-protect</span>
           <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">autotetris-saved</span>
             <span class="p">(</span><span class="nv">kill-local-variable</span> <span class="ss">'kill-buffer-hook</span><span class="p">)</span>
             <span class="o">,@</span><span class="nv">body</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">kill-buffer</span> <span class="nv">autotetris-saved</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">kill-buffer-hook</code> variable is also cloned, but I don’t want
tetris-mode to respond to the clone being killed, so I clear out the
hook.</p>

<p>That’s basically all there is to it! While watching, it feels like it’s
making dumb mistakes, not placing pieces in optimal positions, but it
recovers well from these situations almost every time, so it must know
what it’s doing. Currently it’s a better player than me, which is <a href="/blog/2011/08/24/">my
rule-of-thumb</a> for calling an AI successful.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Unicode Pitfalls</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/13/"/>
    <id>urn:uuid:1ebd6db9-a40e-3433-dc30-192cf133b2f0</id>
    <updated>2014-06-13T05:58:34Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>GNU Emacs is seven years older than Unicode. Support for Unicode had
to be added relatively late in Emacs’ existence. This means Emacs has
existed longer without Unicode support (16 years) than with it (14
years). Despite this, Emacs has excellent Unicode support. It feels as
if it was there the whole time.</p>

<p>However, as a natural result of Unicode covering all sorts of edge
cases for every known human language, there are pitfalls and
complications. As a <em>user</em> of Emacs, you’re not particularly affected
by these, but extension developers might run into trouble while
handling Emacs character-oriented data structures: strings and
buffers.</p>

<p>In this article I’ll go over Elisp’s Unicode surprises. I’ve been
caught by some of these myself. In fact, as a result of writing this
article, I’ve discovered subtle encoding bugs in some of my own
extensions. None of these pitfalls are Emacs’ fault. They’re just the
result of complexities of natural language.</p>

<h3 id="unicode-and-code-points">Unicode and Code Points</h3>

<p>First, there are excellent materials online for learning Unicode. I
recommend starting with <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">UTF-8 and Unicode FAQ for Unix/Linux</a>.
There’s no reason for me to repeat all this information here, but I’ll
attempt to quickly summarize it.</p>

<p>Unicode maps <em>code points</em> (integers) to specific characters, along
with a standard name. As of this writing, Unicode defines over 110,000
characters. For backwards compatibility, the first 128 code points are
mapped to ASCII. This trend continues for other character standards,
like Latin-1.</p>

<p>In Emacs, Unicode characters are entered into a buffer with <code class="language-plaintext highlighter-rouge">C-x 8
RET</code> (<code class="language-plaintext highlighter-rouge">insert-char</code>). You can enter either the official name of the
character (e.g. “GREEK SMALL LETTER PI” for π) or the hexadecimal code
point. Outside of Emacs it depends on the application, but <code class="language-plaintext highlighter-rouge">C-S-u</code>
followed by the hexadecimal code works for most of the applications I
care about.</p>

<h4 id="encodings">Encodings</h4>

<p>The Unicode standard also describes several methods for encoding
sequences of code points into sequences of bytes. Obviously a selection
of 110,000 characters cannot be encoded with one byte per letter, so
these are multibyte encodings. The two most popular encodings are
probably UTF-8 and UTF-16.</p>

<p>UTF-8 was designed to be backwards compatible with ASCII, Unix, and
existing C APIs (null-terminated C strings). The first 128 code points
are encoded directly as a single byte. Every other character is
encoded with two to four bytes, with the highest bit of each byte set
to 1. This ensures that no part of a multibyte character will be
interpreted as ASCII, nor will it contain a null (0). The latter means
that C programs and C APIs can handle UTF-8 strings with few or no
changes. Most importantly, every ASCII encoded file is automatically a
UTF-8 encoded file.</p>
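
<p>A quick way to see these properties from Elisp is to encode a string
and look at the resulting bytes. The helper name below is made up for
this sketch; <code class="language-plaintext highlighter-rouge">encode-coding-string</code> is the standard function doing the
work.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-utf8-bytes (string)
  "Return the UTF-8 encoding of STRING as a list of byte values."
  (append (encode-coding-string string 'utf-8) nil))

(demo-utf8-bytes "a")       ; =&gt; (97)
(demo-utf8-bytes "\u03c0")  ; =&gt; (207 128), GREEK SMALL LETTER PI
</code></pre></div></div>

<p>The ASCII letter is a single byte below 128, while both bytes of the
encoded π have their highest bit set.</p>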

<p>UTF-16 encodes all the characters from the <em>Basic Multilingual Plane</em>
(BMP) with two bytes. Even the original ASCII characters get two bytes
(<em>16</em> bits). The BMP covers virtually all modern languages and is
generally all you’ll ever practically need. However, this doesn’t
include the important <a href="http://www.fileformat.info/info/unicode/char/1f379/index.htm">TROPICAL DRINK</a> or <a href="http://www.fileformat.info/info/unicode/char/1F4A9/index.htm">PILE OF POO</a>
characters from the supplemental (“astral”) plane. If you need to use
these characters in UTF-16, you’re going to run into problems:
characters outside the BMP don’t fit in two bytes. To accommodate
these characters, UTF-16 uses <em>surrogate pairs</em>: these characters are
encoded with two 16-bit units.</p>
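
<p>The surrogate pair arithmetic can be sketched in a few lines. The
function name <code class="language-plaintext highlighter-rouge">my-surrogate-pair</code> is hypothetical, and it assumes its
argument is a code point above U+FFFF.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-surrogate-pair (code-point)
  "Return the UTF-16 surrogate pair for CODE-POINT as two hex strings.
CODE-POINT is assumed to lie outside the BMP."
  (let ((v (- code-point #x10000)))           ; 20 bits remain
    (list (format "%04X" (+ #xD800 (ash v -10)))         ; high 10 bits
          (format "%04X" (+ #xDC00 (logand v #x3FF)))))) ; low 10 bits

(my-surrogate-pair #x1F4A9)
;; =&gt; ("D83D" "DCA9")
</code></pre></div></div>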

<p>Because of this last point, <strong>UTF-16 offers no practical advantages
over UTF-8</strong>. Its <a href="http://www.utf8everywhere.org/">existence was probably a big mistake</a>. You
can’t do constant-time character lookup because you have to scan for
surrogate pairs. It’s not backwards compatible and cannot be stored in
null-terminated strings. In both Java and JavaScript, it leads to the
awkward situation where the “length” of a string is not the number of
characters, code points, or even bytes. Worst of all, <a href="https://speakerdeck.com/mathiasbynens/hacking-with-unicode?slide=114">it has serious
security implications</a>. New applications should avoid it
whenever possible.</p>

<h3 id="emacs-and-utf-8">Emacs and UTF-8</h3>

<p><strong>Emacs internally stores all text as UTF-8.</strong> This was an excellent
choice! When text leaves Emacs, such as writing to a file or to a
process, Emacs automatically converts it to the coding system
configured for that particular file or process. When it accepts text
from a file or process, it either converts it to UTF-8 or preserves it
as raw bytes.</p>

<p>There are two modes for this in Emacs: unibyte and multibyte. Unibyte
strings/buffers are just raw bytes. They offer constant-time, O(1), access
but can only hold single-byte values. The <a href="/blog/2014/01/04/">byte-code compiler
outputs unibyte strings</a>.</p>

<p>Multibyte strings/buffers hold UTF-8 encoded code points. Character
access is O(n) because the string/buffer has to be scanned to count
characters.</p>

<p>The actual encoding is rarely relevant because there’s little way (and
need) to access it directly. Emacs automatically converts text as
needed when it leaves Emacs and arrives in Emacs, so there’s no need
to know the internal encoding. If you <em>really</em> want to see it anyway,
you can use <code class="language-plaintext highlighter-rouge">string-as-unibyte</code> to get a copy of a string with the
exact same bytes, but as a byte-string.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">string-as-unibyte</span> <span class="s">"π"</span><span class="p">)</span>
<span class="c1">;; =&gt; "\317\200"</span>
</code></pre></div></div>

<p>This can be reversed with <code class="language-plaintext highlighter-rouge">string-as-multibyte</code>, which changes a unibyte
string holding UTF-8 encoded text back into a multibyte string. Note
that these functions are different from <code class="language-plaintext highlighter-rouge">string-to-unibyte</code> and
<code class="language-plaintext highlighter-rouge">string-to-multibyte</code>, which will attempt a conversion rather than
preserving the raw bytes.</p>

<p>The <code class="language-plaintext highlighter-rouge">length</code> and <code class="language-plaintext highlighter-rouge">buffer-size</code> functions always count characters in
multibyte and bytes in unibyte. Being UTF-8, there are no surrogate
pairs to worry about here. The <code class="language-plaintext highlighter-rouge">string-bytes</code> and <code class="language-plaintext highlighter-rouge">position-bytes</code>
functions return byte information for both multibyte and unibyte.</p>
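
<p>A quick sketch of the difference, using the five-character
precomposed spelling of “naïve” as an arbitrary sample:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(length "na\u00EFve")        ; multibyte: counts characters
;; =&gt; 5

(string-bytes "na\u00EFve")  ; ï takes two bytes
;; =&gt; 6

;; The encoded copy is unibyte, so length counts bytes
(length (encode-coding-string "na\u00EFve" 'utf-8))
;; =&gt; 6
</code></pre></div></div>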

<p>To specify a Unicode character in a string literal without using the
character directly, use <code class="language-plaintext highlighter-rouge">\uXXXX</code>. The <code class="language-plaintext highlighter-rouge">XXXX</code> is the hexadecimal code
point for the character and is always 4 digits long. For characters
outside the BMP, which won’t fit in four digits, use a capital U with
eight digits: <code class="language-plaintext highlighter-rouge">\UXXXXXXXX</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">"\u03C0"</span>
<span class="c1">;; =&gt; "π"</span>

<span class="s">"\U0001F4A9"</span>
<span class="c1">;; =&gt; "💩"  (PILE OF POO)</span>
</code></pre></div></div>

<p>Finally, Emacs extends Unicode with 128 additional “characters”
representing the raw bytes #x80 through #xFF. This allows raw bytes to
be embedded distinctly within UTF-8 sequences. For example, it’s used
to distinguish the code point U+00E9 from the raw byte #xE9. As far as
I can tell, this isn’t used very often.</p>
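
<p>To see one of these raw-byte characters, here is a small sketch
that converts a one-byte unibyte string with <code class="language-plaintext highlighter-rouge">string-to-multibyte</code>.
The byte #xE9 is an arbitrary choice.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; "\351" is a unibyte string holding the single byte #xE9
(let ((raw (aref (string-to-multibyte "\351") 0)))
  (list (format "#x%X" raw)   ; the raw-byte character's code
        (eq raw #xE9)))       ; distinct from the code point U+00E9
;; =&gt; ("#x3FFFE9" nil)
</code></pre></div></div>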

<h3 id="combining-characters">Combining Characters</h3>

<p>Some Unicode characters are defined as <em>combining characters</em>. These
characters modify the non-combining character that appears before them,
typically with accents or diacritical marks.</p>

<p>For example, the word “naïve” can be written as <em>six</em> characters as
<code class="language-plaintext highlighter-rouge">"nai\u0308ve"</code>. The fourth character, U+0308 (COMBINING DIAERESIS),
is a combining character that changes the “i” (U+0069 LATIN SMALL
LETTER I) into an umlaut character.</p>

<p>The most commonly accented characters have a code of their own. These
are called <em>precomposed characters</em>. This includes ï (U+00EF LATIN
SMALL LETTER I WITH DIAERESIS). This means “naïve” can also be written
as <em>five</em> characters as <code class="language-plaintext highlighter-rouge">"na\u00EFve"</code>.</p>

<h4 id="normalization">Normalization</h4>

<p>So what happens when comparing two different representations of the
same text? They’re not equal.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">string=</span> <span class="s">"nai\u0308ve"</span> <span class="s">"na\u00EFve"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>To deal with situations like this, the Unicode standard defines four
different kinds of normalization. The two most important ones are NFC
(composition) and NFD (decomposition). The former uses precomposed
characters whenever possible and the latter breaks them apart. The
functions <code class="language-plaintext highlighter-rouge">ucs-normalize-NFC-string</code> and <code class="language-plaintext highlighter-rouge">ucs-normalize-NFD-string</code>
perform this operation.</p>
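
<p>A brief sketch of what each function does to the two spellings of
“naïve” from above. The explicit <code class="language-plaintext highlighter-rouge">require</code> is there as a precaution in
case the functions aren’t already loaded.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'ucs-normalize)

(length (ucs-normalize-NFC-string "nai\u0308ve"))  ; composes i + U+0308
;; =&gt; 5

(length (ucs-normalize-NFD-string "na\u00EFve"))   ; splits U+00EF apart
;; =&gt; 6
</code></pre></div></div>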

<p>Pitfall #1: <strong>Proper string comparison requires normalization.</strong> It
doesn’t matter which normalization you use (though NFD should be
slightly faster), you just need to use it consistently. Unfortunately
this can get tricky when using <code class="language-plaintext highlighter-rouge">equal</code> to compare complex data
structures with multiple strings.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">string=</span> <span class="p">(</span><span class="nv">ucs-normalize-NFD-string</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">ucs-normalize-NFD-string</span> <span class="s">"na\u00EFve"</span><span class="p">))</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>Emacs itself fails to do this. It doesn’t normalize strings before
interning them, which is probably a mistake. This means you can have
differently defined variables and functions with the same canonical
name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">intern</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">intern</span> <span class="s">"na\u00EFve"</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">print-r</span><span class="err">é</span><span class="nv">sum</span><span class="err">é</span> <span class="p">()</span>
  <span class="s">"NFC-normalized form."</span>
  <span class="p">(</span><span class="nb">print</span> <span class="s">"I'm going to sabotage your team."</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">print-re</span><span class="err">́</span><span class="nv">sume</span><span class="err">́</span> <span class="p">()</span>
  <span class="s">"NFD-normalized form."</span>
  <span class="p">(</span><span class="nb">print</span> <span class="s">"I'd be a great asset to your team."</span><span class="p">))</span>

<span class="p">(</span><span class="nv">print-r</span><span class="err">é</span><span class="nv">sum</span><span class="err">é</span><span class="p">)</span>
<span class="c1">;; =&gt; "I'm going to sabotage your team."</span>
</code></pre></div></div>

<h4 id="string-width">String Width</h4>

<p>There are three ways to quantify multibyte text. These are often the
same value, but in some circumstances they can each be different.</p>

<ul>
  <li><em>length</em>: number of characters, including combining characters</li>
  <li><em>bytes</em>:  number of bytes in its UTF-8 encoding</li>
  <li><em>width</em>:  number of columns it would occupy in the current buffer</li>
</ul>

<p>Most of the time, one character is one column (a width of one). Some
characters, like combining characters, consume no columns. Many Asian
characters consume two columns (U+4000, 䀀). Tabs consume <code class="language-plaintext highlighter-rouge">tab-width</code>
columns, usually 8.</p>
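
<p>Here’s a sketch that reports all three measurements at once. The
helper <code class="language-plaintext highlighter-rouge">my-measure</code> is hypothetical, and the widths assume a typical
configuration where U+4000 is treated as double-width.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-measure (string)
  "Return (LENGTH BYTES WIDTH) for STRING."
  (list (length string) (string-bytes string) (string-width string)))

(my-measure "naive")
;; =&gt; (5 5 5)

(my-measure "nai\u0308ve")  ; combining character: no columns, two bytes
;; =&gt; (6 7 5)

(my-measure "\u4000")       ; one character, three bytes, two columns
;; =&gt; (1 3 2)
</code></pre></div></div>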

<p>Generally, a string should have the same width regardless of
whether it’s NFD or NFC. However, due to bugs and incomplete Unicode
support, this isn’t strictly true. For example, some combining
characters, such as U+20DD ⃝, won’t combine correctly in Emacs nor in
other applications.</p>

<p>Pitfall #2: <strong>Always measure text by width, not length, when laying
out a buffer</strong>. Width is measured with the <code class="language-plaintext highlighter-rouge">string-width</code> function.
This comes up when laying out tables in a buffer. The number of
characters that fit in a column depends on what those characters are.</p>
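
<p>As a minimal sketch of width-based layout, here’s a hypothetical
padding helper, <code class="language-plaintext highlighter-rouge">my-pad-to-width</code>, that fills a table cell by columns
rather than by characters. It assumes the string has no tabs and is
not already wider than the cell.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-pad-to-width (string width)
  "Pad STRING on the right with spaces until it occupies WIDTH columns."
  (let ((missing (- width (string-width string))))
    (concat string (make-string (max 0 missing) ?\s))))

(let ((cell (my-pad-to-width "\u4000" 4)))
  (list (length cell) (string-width cell)))
;; =&gt; (3 4)
</code></pre></div></div>

<p>The built-in <code class="language-plaintext highlighter-rouge">truncate-string-to-width</code> covers the opposite case of
cutting a string down to a column count.</p>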

<p>Fortunately I accidentally got this right in <a href="/blog/2013/09/04/">Elfeed</a> because
I used the <code class="language-plaintext highlighter-rouge">format</code> function for layout. The <code class="language-plaintext highlighter-rouge">%s</code> directive operates
on width, as would be expected. However, this has the side effect that
the output of <code class="language-plaintext highlighter-rouge">format</code> may change depending on the current buffer!
Pitfall #3: <strong>Be mindful of the current buffer when using the format
function.</strong></p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">tab-width</span> <span class="mi">4</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">length</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%.6s"</span> <span class="s">"\t"</span><span class="p">)))</span>
<span class="c1">;; =&gt; 1</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">tab-width</span> <span class="mi">8</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">length</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%.6s"</span> <span class="s">"\t"</span><span class="p">)))</span>
<span class="c1">;; =&gt; 0</span>
</code></pre></div></div>

<h3 id="string-reversal">String Reversal</h3>

<p>Say you want to reverse a multibyte string. Simple, right?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">reverse-string</span> <span class="p">(</span><span class="nb">string</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">concat</span> <span class="p">(</span><span class="nb">reverse</span> <span class="p">(</span><span class="nv">string-to-list</span> <span class="nb">string</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">reverse-string</span> <span class="s">"abc"</span><span class="p">)</span>
<span class="c1">;; =&gt; "cba"</span>
</code></pre></div></div>

<p>Wrong! The combining characters will get flipped around to the wrong
side of the character they’re meant to modify.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">reverse-string</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
<span class="c1">;; =&gt; "ev̈ian"</span>
</code></pre></div></div>

<p>Pitfall #4: <strong><a href="https://github.com/mathiasbynens/esrever">Reversing Unicode strings is non-trivial</a>.</strong>
The <a href="http://rosettacode.org/wiki/Reverse_a_string">Rosetta Code</a> page is full of incorrect examples, and
<a href="/blog/2012/11/15/">I’m personally guilty</a> of this, too. The other day I
<a href="https://github.com/magnars/s.el/pull/58">submitted a patch to s.el</a> to correct its <code class="language-plaintext highlighter-rouge">s-reverse</code> function
for Unicode. If it’s accepted, you should never need to worry about
this.</p>
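
<p>For illustration, here’s one way a combining-aware reversal might
look. This is only a sketch and not necessarily what the patch does:
it treats any character in general category Mn, Mc, or Me as attached
to the preceding character, and ignores other kinds of grapheme
clusters. Both function names are hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-combining-p (char)
  "Return non-nil if CHAR is a Unicode mark (category Mn, Mc, or Me)."
  (memq (get-char-code-property char 'general-category) '(Mn Mc Me)))

(defun my-reverse-string (string)
  "Reverse STRING, keeping combining marks after their base character."
  (let ((clusters ())   ; finished clusters; pushing reverses their order
        (current ()))   ; characters of the cluster in progress, reversed
    (dolist (char (string-to-list string))
      ;; A non-combining character starts a new cluster.
      (when (and current (not (my-combining-p char)))
        (push (concat (nreverse current)) clusters)
        (setq current ()))
      (push char current))
    (when current
      (push (concat (nreverse current)) clusters))
    (apply #'concat clusters)))

(string= (my-reverse-string "nai\u0308ve") "evi\u0308an")
;; =&gt; t
</code></pre></div></div>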

<h3 id="regular-expressions">Regular Expressions</h3>

<p>Regular expressions operate on code points. This means combining
characters are counted separately and the match may change depending
on how characters are composed. To avoid this, you might want to
consider NFC normalization before performing some kinds of regular
expression matching.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Like string= from before:</span>
<span class="p">(</span><span class="nv">string-match-p</span>  <span class="s">"na\u00EFve"</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="c1">;; The . only matches part of the composition</span>
<span class="p">(</span><span class="nv">string-match-p</span> <span class="s">"na.ve"</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Pitfall #5: <strong>Be mindful of combining characters when using regular
expressions.</strong> Prefer NFC normalization when dealing with regular
expressions.</p>
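
<p>A short sketch of that advice, normalizing the decomposed text to
NFC before matching:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'ucs-normalize)

;; After NFC normalization the . matches the precomposed ï
(string-match-p "na.ve" (ucs-normalize-NFC-string "nai\u0308ve"))
;; =&gt; 0

;; The literal precomposed pattern now matches too
(string-match-p "na\u00EFve" (ucs-normalize-NFC-string "nai\u0308ve"))
;; =&gt; 0
</code></pre></div></div>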

<p>Another potential problem is ranges, though this is quite uncommon.
Ranges of characters can be expressed inside brackets, e.g.
<code class="language-plaintext highlighter-rouge">[a-zA-Z]</code>. If the range begins or ends with a decomposed combining
character you won’t get the proper range because its parts are
considered separately by the regular expression engine.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">match-weird</span> <span class="s">"[\u00E0-\u00F6]+"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">string-match-p</span> <span class="nv">match-weird</span> <span class="s">"áâãäå"</span><span class="p">)</span>
<span class="c1">;; =&gt; 0  (successful match)</span>

<span class="p">(</span><span class="nv">string-match-p</span> <span class="p">(</span><span class="nv">ucs-normalize-NFD-string</span> <span class="nv">match-weird</span><span class="p">)</span> <span class="s">"áâãäå"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>It’s <em>especially</em> important to keep all of this in mind when
sanitizing untrusted input, such as when using Emacs as a web server.
An attacker might use a denormalized or strange grapheme cluster to
bypass a filter.</p>

<h3 id="interacting-with-the-world">Interacting with the World</h3>

<p>Here’s a mistake I’ve made twice now. Emacs uses UTF-8 internally,
regardless of whatever encoding the original text came in. Pitfall #6:
<strong>When working with bytes of text, the counts may be different than
the original source of the text.</strong></p>

<p>For example, HTTP/1.1 introduced persistent connections. Before this,
a client connects to a server and asks for content. The server sends
the content and then closes the connection to signal the end of the
data. In HTTP/1.1, when <code class="language-plaintext highlighter-rouge">Connection: close</code> isn’t specified, the
server will instead send a <code class="language-plaintext highlighter-rouge">Content-Length</code> header indicating the
length of the content in bytes. The connection can then be re-used for
more requests, or, more importantly, pipelining requests.</p>

<p>The main problem is that HTTP headers usually have a different
encoding than the content body. Emacs is not prepared to handle
multiple encodings from a single source, so the only correct way to
talk HTTP with a network process is raw. My mistake was allowing Emacs
to do the UTF-8 conversion, then measuring the length of the content
in its UTF-8 encoding. This just happens to work fine about 99.9% of
the time since clients tend to speak UTF-8, or something like it,
anyway, but it’s not correct.</p>
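
<p>To make the discrepancy concrete, here’s a sketch with a made-up
five-byte Latin-1 payload. Once decoded, Emacs holds the text in its
internal UTF-8 form, so the byte count no longer matches what was on
the wire.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((wire "na\357ve")   ; raw bytes as received: "naïve" in Latin-1
       (text (decode-coding-string wire 'latin-1)))
  (list (length wire)          ; bytes on the wire
        (length text)          ; characters after decoding
        (string-bytes text)))  ; bytes in the internal encoding
;; =&gt; (5 5 6)
</code></pre></div></div>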

<h3 id="further-reading">Further Reading</h3>

<p>A lot of this investigation was inspired by JavaScript’s and other
languages’ Unicode shortcomings.</p>

<ul>
  <li><a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">UTF-8 and Unicode FAQ for Unix/Linux</a></li>
  <li><a href="https://speakerdeck.com/mathiasbynens/hacking-with-unicode">Hacking with Unicode</a></li>
  <li><a href="https://github.com/mathiasbynens/jsesc">jsesc</a></li>
  <li><a href="http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#unicode">java.lang.Character Unicode Character Representations</a></li>
  <li><a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Strings-and-Characters.html">GNU Emacs Lisp Reference Manual: Strings and Characters</a></li>
</ul>

<p>Comparatively, Emacs Lisp has really great Unicode support. This isn’t
too surprising considering that its primary purpose is
manipulating text.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Lisp Buffer Passing Style</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/05/27/"/>
    <id>urn:uuid:f61fa819-f174-3147-1bfe-3493c1c18bc6</id>
    <updated>2014-05-27T01:58:09Z</updated>
    <category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Emacs Lisp strings are mutable, fixed-length character (multibyte) or
byte (unibyte) arrays. Any operation that would change a string’s length
requires allocating a new string object. This is common in many
programming languages’ strings. Python, Java, and JavaScript go even
further, with strings being completely immutable.</p>

<p>In these languages, performing many string operations at a time,
especially with the <code class="language-plaintext highlighter-rouge">+=</code> operator, allocates many temporary strings.
It’s also awkward. For these situations, Java provides a class,
<a href="http://docs.oracle.com/javase/7/docs/api/java/lang/StringBuffer.html">StringBuilder</a>, so that these operations can be done with a
temporary, efficient, mutable data structure that will emit the final
string when complete.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">Collection</span><span class="o">&lt;</span><span class="no">T</span><span class="o">&gt;</span> <span class="n">collection</span><span class="o">;</span>

<span class="kd">public</span> <span class="nc">String</span> <span class="nf">toString</span><span class="o">()</span> <span class="o">{</span>
    <span class="nc">StringBuilder</span> <span class="n">sb</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StringBuilder</span><span class="o">();</span>
    <span class="k">for</span> <span class="o">(</span><span class="no">T</span> <span class="n">element</span> <span class="o">:</span> <span class="n">collection</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">sb</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">element</span><span class="o">);</span>
    <span class="o">}</span>
    <span class="k">return</span> <span class="n">sb</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<p>In JavaScript a popular string building idiom is to use an array. Push
the components onto an array and join() the result.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">toString</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">output</span> <span class="o">=</span> <span class="p">[];</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">k</span> <span class="k">in</span> <span class="nx">object</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">output</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">k</span><span class="p">);</span>
        <span class="nx">output</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="dl">'</span><span class="s1"> -&gt; </span><span class="dl">'</span><span class="p">);</span>
        <span class="nx">output</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">object</span><span class="p">[</span><span class="nx">k</span><span class="p">]);</span>
        <span class="nx">output</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="dl">'</span><span class="se">\n</span><span class="dl">'</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nx">output</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="dl">''</span><span class="p">);</span>
<span class="p">}</span>

<span class="nx">toString</span><span class="p">({</span><span class="na">a</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="na">b</span><span class="p">:</span> <span class="mi">2</span><span class="p">});</span>
<span class="c1">// =&gt; "a -&gt; 1\nb -&gt; 2\n"</span>
</code></pre></div></div>

<h3 id="emacs-lisp">Emacs Lisp</h3>

<p>What character sequence data structure already exists in Elisp that’s
efficient at insert, update, and delete? Buffers, of course! I know
it’s easy to forget, but editing sequences of characters is <em>the</em>
primary purpose of Emacs, after all. To make use of a buffer as a
string builder, use one of my favorite macros: <code class="language-plaintext highlighter-rouge">with-temp-buffer</code>. I
like to combine this with setting <code class="language-plaintext highlighter-rouge">standard-output</code> so that all of the
printing functions go there.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">to-string</span> <span class="p">(</span><span class="nv">alist</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">pair</span> <span class="nv">alist</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">princ</span> <span class="p">(</span><span class="nv">cl-first</span> <span class="nv">pair</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">princ</span> <span class="s">" -&gt; "</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">princ</span> <span class="p">(</span><span class="nv">cl-second</span> <span class="nv">pair</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">princ</span> <span class="s">"\n"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
</code></pre></div></div>

<p>Update: Jon O. pointed out that Emacs has a <code class="language-plaintext highlighter-rouge">with-output-to-string</code>
macro available to do this more concisely.</p>
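
<p>A sketch of the same function written that way. The name
<code class="language-plaintext highlighter-rouge">my-to-string</code> is hypothetical, and I’ve used <code class="language-plaintext highlighter-rouge">car</code> and <code class="language-plaintext highlighter-rouge">cadr</code> so it
doesn’t depend on cl-lib.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-to-string (alist)
  "Format ALIST, a list of (KEY VALUE) lists, one pair per line."
  ;; with-output-to-string binds standard-output to a temporary buffer
  (with-output-to-string
    (dolist (pair alist)
      (princ (car pair))
      (princ " -&gt; ")
      (princ (cadr pair))
      (princ "\n"))))

(my-to-string '((a 1) (b 2)))
;; =&gt; "a -&gt; 1\nb -&gt; 2\n"
</code></pre></div></div>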

<p>Internally Elisp buffers are <a href="http://en.wikipedia.org/wiki/Gap_buffer">gap buffers</a>, a rather simple data
structure where the data is split into two sequences with a “gap” in
between. Insertion and deletion occur at the gap, which is slid up
and down the overall sequence. This makes gap buffers efficient for
making lots of edits localized in a single area, just as a human would
do while editing text.</p>

<p>Each character in a buffer is a full Unicode code point and can have
an arbitrary set of properties associated with it (font-lock-face,
read-only, nonstickiness, etc.). Along with inline image objects, this
makes buffers rich enough to display rendered HTML (to a limited
extent).</p>

<h3 id="the-catch">The Catch</h3>

<p>There’s an important caveat to using buffers as mutable strings:
they’re <a href="/blog/2014/01/27/">not managed by the garbage collector</a>. Each buffer
goes into the global buffer list, implemented internally as an
intrusive linked list. If a buffer is not on this list, it’s a dead
buffer.</p>

<p>Ultimately this makes buffer objects poor return values. It’s an
impedance mismatch. The caller has to be careful to free (“kill”) the
buffer. It’s easy to miss if an error is signaled. For example,
<code class="language-plaintext highlighter-rouge">url-retrieve</code> and <code class="language-plaintext highlighter-rouge">url-retrieve-synchronously</code> return a buffer with
the response from a web server. It’s not uncommon for Elisp programs
to leak these buffers during normal operation.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-current-buffer</span> <span class="p">(</span><span class="nv">url-retrieve-synchronously</span> <span class="nv">some-url</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">goto-char</span> <span class="nv">url-http-end-of-headers</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">json-read</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))</span>
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">json-read</code> fails, the buffer is leaked.</p>
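
<p>One way to plug the leak with the API as it stands is
<code class="language-plaintext highlighter-rouge">unwind-protect</code>. This is a sketch under the same assumptions as the
snippet above; the function name <code class="language-plaintext highlighter-rouge">my-fetch-json</code> is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'url)
(require 'json)

(defun my-fetch-json (url)
  "Fetch URL and parse the body as JSON, always killing the response buffer."
  (let ((buffer (url-retrieve-synchronously url)))
    (unwind-protect
        (with-current-buffer buffer
          (goto-char url-http-end-of-headers)
          (json-read))
      ;; Runs on normal return and when an error is signaled.
      ;; The guard matters: (kill-buffer nil) kills the current buffer.
      (when (buffer-live-p buffer)
        (kill-buffer buffer)))))
</code></pre></div></div>

<p>It works, but every caller has to remember to write it.</p>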

<p>As a side note: alternatively you could use <a href="https://github.com/skeeto/elisp-finalize">my finalize
package</a> to associate the buffer with an object that
is subject to garbage collection. The buffer will be killed
immediately when the object is garbage collected.</p>

<h4 id="buffer-passing-style">Buffer Passing Style</h4>

<p>To deal with this, my preferred idiom is what I call <em>buffer-passing
style</em>. Rather than have the callee instantiate the buffer, the caller
instantiates the buffer and “passes” it implicitly as the <em>current
buffer</em>. The callee fills it with something. The caller should use
something like <code class="language-plaintext highlighter-rouge">with-temp-buffer</code> so that the buffer has a clean
life-cycle, fully managed by the caller.</p>
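
<p>Here’s a minimal sketch of the idiom without any networking. The
callee, a hypothetical <code class="language-plaintext highlighter-rouge">my-insert-report</code>, only inserts into whatever
buffer is current, and the caller owns the buffer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-insert-report (alist)
  "Callee: insert ALIST as \"key -&gt; value\" lines into the current buffer."
  (dolist (pair alist)
    (insert (format "%s -&gt; %s\n" (car pair) (cadr pair)))))

;; Caller: creates the buffer, passes it implicitly, and it is
;; killed when the form exits, even on error.
(with-temp-buffer
  (my-insert-report '((a 1) (b 2)))
  (buffer-string))
;; =&gt; "a -&gt; 1\nb -&gt; 2\n"
</code></pre></div></div>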

<p>Imagine that, instead of returning a buffer, <code class="language-plaintext highlighter-rouge">url-retrieve-synchronously</code>
puts the result in the current buffer.
If anything goes wrong, the buffer will be automatically killed by
<code class="language-plaintext highlighter-rouge">with-temp-buffer</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nv">url-retrieve-synchronously</span> <span class="nv">some-url</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">goto-char</span> <span class="nv">url-http-end-of-headers</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">json-read</span><span class="p">))</span>
</code></pre></div></div>
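
<p>Something close to this can be had today with
<code class="language-plaintext highlighter-rouge">url-insert-file-contents</code> from url-handlers, which inserts the
response body into the current buffer. A sketch, with a hypothetical
function name and without any error handling for failed requests:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'url-handlers)
(require 'json)

(defun my-fetch-json-bps (url)
  "Fetch URL into a temporary buffer and parse its body as JSON."
  (with-temp-buffer
    (url-insert-file-contents url)
    ;; Make sure parsing starts at the top of the inserted body.
    (goto-char (point-min))
    (json-read)))
</code></pre></div></div>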

<p>Buffer-passing style is what I settled on for <a href="https://github.com/skeeto/emacs-web-server">simple-httpd</a>.
Servlets are called with the output buffer as the current buffer and
with <code class="language-plaintext highlighter-rouge">standard-output</code> set to this buffer. The servlet is only
responsible for filling this buffer with content. Thanks to
<code class="language-plaintext highlighter-rouge">process-send-region</code>, the content is never actually copied into a
string.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet*</span> <span class="nb">search</span> <span class="ss">:application/json</span> <span class="p">(</span><span class="nv">q</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">princ</span> <span class="p">(</span><span class="nv">json-encode</span> <span class="p">(</span><span class="nv">search-results</span> <span class="nv">q</span><span class="p">))))</span>
</code></pre></div></div>

<p>I didn’t recognize buffer-passing style until much later. As a result,
far too much of simple-httpd is still string oriented when it
shouldn’t be.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>An Emacs Foreign Function Interface</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/04/26/"/>
    <id>urn:uuid:ba31fe59-f5b0-3603-b243-4bcae00aebcf</id>
    <updated>2014-04-26T16:25:51Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>For many years Richard Stallman (RMS) prohibited a foreign function
interface (FFI) in GNU Emacs. An FFI is an API for dynamically calling
native libraries at run-time, like the Java Native Interface (JNI). He
was concerned that people might use it to make proprietary extensions
to the popular editor. This was the same (paranoid) justification for
rejecting a package manager in Emacs for many years, that someone
might use it to distribute proprietary packages.</p>

<p>Fortunately, times have changed. RMS reevaluated his
<a href="http://lists.gnu.org/archive/html/emacs-devel/2010-03/msg00240.html">stances on FFI</a> and on package managers. Today Emacs comes with
a package manager (package.el), and there are multiple package
repositories with no proprietary packages in sight. Though, outside of
<a href="http://www.loveshack.ukfsn.org/emacs/dynamic-loading/">some unaccepted patches</a>, no significant progress has been
made to add an FFI.</p>

<p>A few weeks ago I did something about that by writing a package that
adds an FFI. It requires no patches or any other changes to Emacs
itself. Instead, it drives a subprocess running <a href="http://sourceware.org/libffi/">libffi</a>,
passing arguments and return values back and forth through a pipe, in
the spirit of <a href="/blog/2014/02/06/">EmacSQL</a>. It’s not as efficient as a built-in
API, but it could potentially be distributed through an ELPA
repository.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-ffi">Emacs Lisp Foreign Function Interface</a></li>
</ul>

<p>The API is modeled loosely after <a href="http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code/">Julia’s elegant FFI</a>. A call
interface (CIF) doesn’t need to be prepared ahead of time. Provide all
the necessary information at the call site and the library takes care
of building and caching CIFs and handles for you.</p>

<h3 id="api-examples">API Examples</h3>

<p>The core function for the FFI is <code class="language-plaintext highlighter-rouge">ffi-call</code>. Here’s an example that
calls the system’s <code class="language-plaintext highlighter-rouge">srand()</code> and then <code class="language-plaintext highlighter-rouge">rand()</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; seed with 0</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"srand"</span> <span class="nv">[:void</span> <span class="ss">:uint32]</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1">;; =&gt; :void</span>

<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"rand"</span> <span class="nv">[:sint32]</span><span class="p">)</span>
<span class="c1">;; =&gt; 1102520059</span>
</code></pre></div></div>

<p>The first two arguments are similar to the first two arguments of
<code class="language-plaintext highlighter-rouge">dlsym()</code>. For <code class="language-plaintext highlighter-rouge">ffi-call</code>, the first argument is the library’s shared
object name. The back-end automatically takes care of obtaining a
handle on the library with <code class="language-plaintext highlighter-rouge">dlopen()</code>. In this case we’re accessing a
function that’s already in the main program, so we pass nil. This is
identical to passing NULL to <code class="language-plaintext highlighter-rouge">dlsym()</code>. In this FFI, nil always
corresponds to NULL.</p>

<p>The second argument is the function name, just like <code class="language-plaintext highlighter-rouge">dlsym()</code>’s second
argument.</p>

<p>The third argument is the function signature. It’s a vector of
keywords declaring the return value type followed by the types of each
argument. In this example, <code class="language-plaintext highlighter-rouge">srand()</code> returns nothing (void) and
accepts a single 32-bit unsigned argument, so the signature is
<code class="language-plaintext highlighter-rouge">[:void :uint32]</code>.</p>

<p>The remaining arguments are the native function arguments. I can keep
making the second FFI call (“rand”) to retrieve different numbers,
using the first FFI call (“srand”) to reset the sequence.</p>

<h4 id="using-a-library">Using a Library</h4>

<p>Here’s another example, loading <code class="language-plaintext highlighter-rouge">libm</code> and calling <code class="language-plaintext highlighter-rouge">cos</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; cos(1.2)</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="s">"libm.so"</span> <span class="s">"cos"</span> <span class="nv">[:double</span> <span class="ss">:double]</span> <span class="mf">1.2</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.362357754476674</span>
</code></pre></div></div>

<p>The first time a library is used, the back-end creates a handle for it
with <code class="language-plaintext highlighter-rouge">dlopen()</code>. Further calls will reuse the handle, trying to be as
efficient as possible. Handles are never closed.</p>
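
<p>The caching itself is easy to picture. Here is a minimal sketch in
plain Elisp, not the package’s actual code: <code class="language-plaintext highlighter-rouge">my-ffi-handles</code>,
<code class="language-plaintext highlighter-rouge">my-ffi-get-handle</code>, and the stand-in <code class="language-plaintext highlighter-rouge">my-ffi-fake-open</code> (which plays the
role of the real <code class="language-plaintext highlighter-rouge">dlopen()</code> round trip) are all hypothetical names.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical cache: library name -&gt; handle.
(defvar my-ffi-handles (make-hash-table :test 'equal))

(defun my-ffi-get-handle (library open-function)
  "Return the cached handle for LIBRARY, opening it on first use.
OPEN-FUNCTION is a stand-in for the real dlopen() round trip."
  (or (gethash library my-ffi-handles)
      (puthash library (funcall open-function library) my-ffi-handles)))

;; Count how often the stand-in "dlopen" actually runs.
(defvar my-ffi-open-count 0)

(defun my-ffi-fake-open (library)
  "Pretend to open LIBRARY and return a made-up handle symbol."
  (setq my-ffi-open-count (1+ my-ffi-open-count))
  (intern (format "handle-for-%s" library)))

(my-ffi-get-handle "libm.so" #'my-ffi-fake-open)
(my-ffi-get-handle "libm.so" #'my-ffi-fake-open)

my-ffi-open-count
;; =&gt; 1
</code></pre></div></div>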

<h4 id="pointers">Pointers</h4>

<p>Here are a couple of examples that use pointers. As stated before, nil
is used to pass a NULL pointer. Like the underlying libffi, the FFI
doesn’t care what <em>kind</em> of pointer you’re passing, just that it’s a
pointer, so it’s declared with <code class="language-plaintext highlighter-rouge">:pointer</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; time(NULL);</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"time"</span> <span class="nv">[:uint64</span> <span class="ss">:pointer]</span> <span class="no">nil</span><span class="p">)</span>
<span class="c1">;; =&gt; 1396496875</span>
</code></pre></div></div>

<p>Strings are automatically copied to the subprocess, their lifetime
tied to the lifetime of the Elisp string (note: this detail is still
unimplemented). When used as arguments, they become pointers.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; getenv("DISPLAY")</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"getenv"</span> <span class="nv">[:pointer</span> <span class="ss">:pointer]</span> <span class="s">"DISPLAY"</span><span class="p">)</span>
<span class="c1">;; =&gt; 0x7fffc13ceb29</span>

<span class="p">(</span><span class="nv">ffi-get-string</span> <span class="ss">'0x7fffc13ceb29</span><span class="p">)</span>
<span class="c1">;; =&gt; ":0"</span>
</code></pre></div></div>

<p>Pointers can be handled as values on the Elisp side. They’re
represented as symbols whose name is an address. In the above example,
<code class="language-plaintext highlighter-rouge">0x7fffc13ceb29</code> is one of these symbols. I would have preferred to
use a plain integer to represent pointers, but, because Elisp integers
are <em>tagged</em>, they’re guaranteed not to be wide enough for this. I
plan to add pointer operators to do pointer arithmetic on these
special pointer values.</p>
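
<p>Such an operator might look something like the sketch below.
<code class="language-plaintext highlighter-rouge">my-ffi-pointer-offset</code> is a hypothetical name, not part of the package.
It also cheats by routing the address through an Elisp integer, so it
only works when the address happens to fit in one, which is exactly
the limitation described above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-ffi-pointer-offset (pointer offset)
  "Return the pointer symbol OFFSET bytes past POINTER.
POINTER is a symbol whose name is a hex address, like 0x7fffc13ceb29."
  ;; Strip the leading "0x" and parse the rest as base 16.
  (let ((address (string-to-number (substring (symbol-name pointer) 2) 16)))
    (intern (format "0x%x" (+ address offset)))))

(my-ffi-pointer-offset '0x7fffc13ceb29 8)
;; =&gt; 0x7fffc13ceb31
</code></pre></div></div>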

<p>The function <code class="language-plaintext highlighter-rouge">ffi-get-string</code> is used to retrieve the null-terminated
string referenced by a pointer. If the string returned by <code class="language-plaintext highlighter-rouge">getenv()</code>
needed to be freed (it doesn’t and shouldn’t), the FFI caller would
need to be careful to call <code class="language-plaintext highlighter-rouge">free()</code> as another FFI call.</p>

<h3 id="how-it-works-the-stack-machine">How It Works: The Stack Machine</h3>

<p>My goal is to keep the back-end as simple as possible. All resource
management is handled by Emacs, <a href="/blog/2014/01/27/">tied to garbage collection</a>.
For example, the pointer returned by <code class="language-plaintext highlighter-rouge">dlopen()</code> isn’t stored anywhere
in the subprocess. It’s passed to Emacs and managed there. To call a
function using the handle, the pointer is transmitted back to the
subprocess.</p>

<p>To keep it simple, the back-end is just a stack machine with a simple
human-readable bytecode. You can see the instruction set by looking at
the big switch statement in <code class="language-plaintext highlighter-rouge">ffi-glue.cc</code>. For example, to push a
signed 2-byte integer 237 onto the stack, send a <code class="language-plaintext highlighter-rouge">j</code> followed by an
ASCII representation of the number (terminated by a space if needed):
<code class="language-plaintext highlighter-rouge">j237</code>.</p>

<p>As usual, my assumption is that the Elisp printer and reader are faster
than any possible serialization I could implement within Elisp itself.
This also nicely sidesteps the byte-order issue.</p>

<p>The function signature is declared by pushing zeros of the
return/argument types onto the stack, with a special void “value” used
to communicate <code class="language-plaintext highlighter-rouge">void</code>. Once it’s all set up, the <code class="language-plaintext highlighter-rouge">C</code> instruction is
called, collapsing the signature into a CIF handle: a pointer for the
Elisp side to manage.</p>

<p>Pointers to raw strings of bytes are pushed onto the stack with the
<code class="language-plaintext highlighter-rouge">M</code> instruction. It pops the top integer on the stack to get the byte
count, reads that number of bytes from input into a buffer,
null-terminates the buffer in case it’s used as a string, and finally
puts a pointer to that buffer on the stack.</p>
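
<p>On the Elisp side, producing that sequence is a one-liner. This is
only a sketch: <code class="language-plaintext highlighter-rouge">my-ffi-encode-string</code> is a hypothetical name, and
encoding the string as UTF-8 before counting bytes is my assumption
here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-ffi-encode-string (string)
  "Return bytecode that pushes a pointer to a copy of STRING."
  (let ((bytes (encode-coding-string string 'utf-8)))
    ;; Byte count first (w), then M, then the raw bytes themselves.
    (concat "w" (number-to-string (length bytes)) "M" bytes)))

(my-ffi-encode-string "cos")
;; =&gt; "w3Mcos"
</code></pre></div></div>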

<p>Calling functions is just a matter of pushing all the needed
information onto the stack, invoking libffi to magically call the
function, then popping the result off the stack. Popping a value
transmits it to Elisp.</p>

<h4 id="stack-machine-example">Stack Machine Example</h4>

<p>Here’s a concise example that calls <code class="language-plaintext highlighter-rouge">cos(1.2)</code> (assuming libm.so is
already linked). The actual Elisp-generated FFI bytecode doesn’t plan
things quite this way — particularly because it needs to keep track
of the various pointers involved — but this example keeps it simple.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>d1.2d0d0w1Cp0w3McosSco
</code></pre></div></div>

<p>You can run this example manually by executing the <code class="language-plaintext highlighter-rouge">ffi-glue</code> program
and pasting in that line as standard input. The result will be
printed.</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">d1.2</code> : Push a double, 1.2, onto the stack. This will be the
function argument.</li>
  <li><code class="language-plaintext highlighter-rouge">d0d0</code> : Push a couple of zero doubles onto the stack. This is our
function signature. It takes a double and returns a double.</li>
  <li><code class="language-plaintext highlighter-rouge">w1</code> : Push an unsigned 32-bit 1 onto the stack. Instructions that
use integers accept unsigned 32-bit integers. This 1 indicates that
our function accepts one argument.</li>
  <li><code class="language-plaintext highlighter-rouge">C</code> : create a CIF. The integer 1 and the two 0 doubles are
consumed and a pointer to a CIF is put on the stack. Elisp would
normally pop this off and save it for future use, but we’re going
to leave it there (and ultimately leak it in the example).</li>
  <li><code class="language-plaintext highlighter-rouge">p0</code> : Push a NULL onto the stack. <code class="language-plaintext highlighter-rouge">p</code> means push a pointer and 0
is a NULL pointer. This is our library handle. We’re assuming <code class="language-plaintext highlighter-rouge">cos</code>
will be in the main program.</li>
  <li><code class="language-plaintext highlighter-rouge">w3Mcos</code> : Put a pointer to the string “cos” onto the stack. First
push on the number 3 (string length), then <code class="language-plaintext highlighter-rouge">M</code> to read from input,
then pass three bytes: “cos”. In our example, this buffer will be
leaked because we lose the buffer pointer.</li>
  <li><code class="language-plaintext highlighter-rouge">S</code> : Call <code class="language-plaintext highlighter-rouge">dlsym()</code> on the string and handle on top of the stack.
This consumes the top two values (NULL and “cos”), and pushes a
function handle on top of the stack. At this point the stack has
three values: 1.2, the CIF, and the function handle.</li>
  <li><code class="language-plaintext highlighter-rouge">c</code> : Call the function pointed to by the top of the stack. This
consumes the top pointer, the CIF below it, and the CIF indicates
how many more values to consume: just one in this case, since the
function takes one argument. The function’s return value is pushed
on the stack. If the function is <code class="language-plaintext highlighter-rouge">void</code>, the special void “value”
is pushed on the stack.</li>
  <li><code class="language-plaintext highlighter-rouge">o</code> : Pop the top stack value, sending it to Emacs. This is
what would be returned by <code class="language-plaintext highlighter-rouge">ffi-call</code>.</li>
</ol>
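
<p>The nine steps above can be assembled mechanically. Here is a small
sketch that builds the same program for any argument. The name
<code class="language-plaintext highlighter-rouge">my-ffi-cos-program</code> is hypothetical, and I’m assuming the argument is
an Elisp float so that it prints with a decimal point.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-ffi-cos-program (x)
  "Return stack machine bytecode that computes cos(X), a float."
  (concat (format "d%s" x)  ; 1. push the argument
          "d0d0"            ; 2. signature: double in, double out
          "w1"              ; 3. argument count
          "C"               ; 4. create the CIF
          "p0"              ; 5. NULL library handle
          "w3Mcos"          ; 6. pointer to the string "cos"
          "S"               ; 7. dlsym()
          "c"               ; 8. call the function
          "o"))             ; 9. send the result to Emacs

(my-ffi-cos-program 1.2)
;; =&gt; "d1.2d0d0w1Cp0w3McosSco"
</code></pre></div></div>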

<p>Before I got the Elisp side of things going, I was testing out the
back-end by writing lots of little programs like this by hand.</p>

<h3 id="a-safe-ffi">A Safe FFI</h3>

<p>While using an FFI through a pipe is slow compared to a built-in FFI,
there is a distinct advantage. The FFI can never crash Emacs! Normally,
making calls to an FFI is <em>unsafe</em>. It allows the programmer to
violate normal language constraints. If the programmer misuses the
FFI, the whole process may crash or become corrupt. Here that process
is the subprocess, not Emacs. A crash will lose any state held behind
the foreign interface, but Emacs will be safe.</p>

<p>In my package, the handle for the FFI Emacs subprocess is called the
<em>context</em>. A context is automatically established and bound to the
<code class="language-plaintext highlighter-rouge">ffi-context</code> global variable as needed. This context keeps track of
CIFs, string buffers, handles, and any other resources held by the
subprocess. If the subprocess dies, the context becomes meaningless
since the pointers it holds are dead.</p>
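
<p>This isolation isn’t specific to my package. It can be seen with any
subprocess. The sketch below assumes a POSIX <code class="language-plaintext highlighter-rouge">sh</code> is on the path and
uses a hypothetical <code class="language-plaintext highlighter-rouge">my-crash-demo</code> function. The child kills itself with
SIGSEGV, and all Emacs sees is a status value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-crash-demo ()
  "Run a child process that dies from SIGSEGV and return its status."
  ;; `call-process' returns a number for a normal exit and a string
  ;; describing the signal when the child is killed by one.
  (call-process "sh" nil nil nil "-c" "kill -SEGV $$"))

(let ((status (my-crash-demo)))
  (message "child ended with %S; Emacs is still running" status))
</code></pre></div></div>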

<h3 id="limitations">Limitations</h3>

<p>This FFI package is about 80% complete. It occasionally leaks memory
in the subprocess, it’s overly-sensitive to mis-typing, it doesn’t
manage stdin/stdout, it can’t inspect/modify structs, and it can’t set
up closures.</p>

<p>The last point, closures, would require some changes to the
interprocess communication. The purpose here would be to allow foreign
functions to call Elisp functions. The subprocess would need to be
able to initiate activity with Elisp.</p>

<p>Manipulating structs is complex, and even libffi has limited support
for working with them. It allows structs to be declared, but leaves
alignment and access up to the user to sort out. That’s where the
previously-mentioned pointer arithmetic comes into play.</p>

<p>Currently stdin, stdout, and stderr are problems, as I found when
trying to write a test GTK application with Elisp. Any command
line junkie knows that GTK (and Qt) applications are ridiculously
noisy. They spew hundreds of lines of warnings and notifications as
part of their normal operation. This noise interferes with FFI
communication with Emacs. I need to figure out how to separate this
and get standard input/output/error to/from Emacs through separate
channels.</p>

<p>Like libffi, there are no guarantees about variadic function calls. It
should generally Just Work, but you can’t rely on it.</p>

<p>The whole thing will not work as well in 32-bit Emacs, where integers
are limited to a tiny 29 bits. For example, those <code class="language-plaintext highlighter-rouge">rand()</code> return
values will simply not fit. In the long run, this is probably the
single largest barrier to making the FFI work smoothly. It’s too easy
to run into large integer values.</p>
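
<p>A quick back-of-the-envelope check shows the problem. I’m reading “29
bits” as 29 value bits plus a sign, and <code class="language-plaintext highlighter-rouge">my-fixnum-max</code> is a
hypothetical helper. On a real build, the actual limit is in the
variable <code class="language-plaintext highlighter-rouge">most-positive-fixnum</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-fixnum-max (value-bits)
  "Return the largest integer that fits in VALUE-BITS bits plus a sign."
  (1- (expt 2 value-bits)))

(my-fixnum-max 29)
;; =&gt; 536870911

;; The rand() result from the first example is too big:
(&gt; 1102520059 (my-fixnum-max 29))
;; =&gt; t
</code></pre></div></div>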

<p>Right now I consider it a proof of concept; an FFI <em>really can</em> be
done this way. I don’t have any particular uses in mind, and, outside
of the “cool factor,” I can’t actually think of any useful
applications. If a solid FFI had already existed, I might have tried to use
it for EmacSQL rather than use this subprocess trick. My FFI <em>is</em>
probably mature enough to drive SQLite, so maybe this is the future of
EmacSQL.</p>

<p>If you can think of a good use for an Emacs FFI, please share it. I
need good test ideas.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Lisp Defstruct Namespace Convention</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/03/19/"/>
    <id>urn:uuid:624f92a9-6696-33bb-f955-d6c83da56fc1</id>
    <updated>2014-03-19T01:41:52Z</updated>
    <category term="emacs"/><category term="lisp"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>One of the drawbacks of Emacs Lisp is the lack of namespaces. Every
<code class="language-plaintext highlighter-rouge">defun</code>, <code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defcustom</code>, <code class="language-plaintext highlighter-rouge">defface</code>, <code class="language-plaintext highlighter-rouge">defalias</code>, <code class="language-plaintext highlighter-rouge">defstruct</code>,
and <code class="language-plaintext highlighter-rouge">defclass</code> establishes one or more names in the global scope. To
work around this, package authors are strongly encouraged to prefix
every global name with the name of its package. That way there should
never be a naming conflict between two different packages.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">mypackage-foo-limit</span> <span class="mi">10</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">mypackage--bar-counter</span> <span class="mi">0</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">mypackage-init</span> <span class="p">()</span>
  <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">mypackage-compute-children</span> <span class="p">(</span><span class="nv">node</span><span class="p">)</span>
  <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nb">provide</span> <span class="ss">'mypackage</span><span class="p">)</span>
</code></pre></div></div>

<p>While this has solved the problem for the time being, attaching the
package name to almost every identifier, including private function
and variable names, is quite cumbersome. Namespaces can <em>almost</em> be
hacked into the language by using multiple obarrays,
<a href="/blog/2011/08/18/">but symbols have internal linked lists</a> that prohibit
inclusion in multiple obarrays.</p>
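
<p>To see what a second obarray does and doesn’t give you, here’s a
small sketch. It assumes Emacs 25 or newer for <code class="language-plaintext highlighter-rouge">obarray-make</code>, and the
symbol name is arbitrary. Interning the same name in two obarrays
produces two unrelated symbols rather than one shared symbol.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'obarray)

(let* ((private (obarray-make))
       (global-symbol (intern "my-obarray-demo-foo"))
       (private-symbol (intern "my-obarray-demo-foo" private)))
  (list
   ;; Same name, different obarrays: not the same symbol.
   (eq global-symbol private-symbol)
   ;; Looking the name up again in the private obarray finds it.
   (eq private-symbol (intern-soft "my-obarray-demo-foo" private))))
;; =&gt; (nil t)
</code></pre></div></div>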

<p>By convention, private names are given a double-dash after the
namespace. If a “bar counter” is an implementation detail that may
disappear in the future, it will be called <code class="language-plaintext highlighter-rouge">mypackage--bar-counter</code> to
warn users and other package authors not to rely on it.</p>

<p>There’s been a recent push to follow this namespace-prefix policy more
strictly, particularly with the deprecation of <code class="language-plaintext highlighter-rouge">cl</code> and introduction
of <code class="language-plaintext highlighter-rouge">cl-lib</code>. I suspect someday when namespaces are finally introduced,
packages with strictly clean namespaces will be at an advantage,
somehow automatically supported. <a href="http://nic.ferrier.me.uk/blog/2013_06/adding-namespaces-to-elisp">Nic Ferrier has proposed ideas</a>
for how to move forward on this.</p>

<h3 id="how-strict-are-we-talking">How strict are we talking?</h3>

<p>Over the last few years I’ve gotten much stricter in my own packages
when it comes to namespace prefixes. You can see the progression going
from <a href="https://github.com/skeeto/javadoc-lookup">javadoc-lookup</a> (2010) where I was completely sloppy about
it, to <a href="https://github.com/skeeto/emacsql">EmacSQL</a> (2014) where every single global identifier
is meticulously prefixed.</p>

<p>For a time I considered names such as <code class="language-plaintext highlighter-rouge">make-*</code> and <code class="language-plaintext highlighter-rouge">with-*</code> to be
exceptions to the rule, since these names are idioms inherited from
Common Lisp. The namespace comes <em>after</em> the expected prefix. I’ve
changed my mind about this, which has caused me to change my usage of
<code class="language-plaintext highlighter-rouge">defstruct</code> (now <code class="language-plaintext highlighter-rouge">cl-defstruct</code>).</p>

<p>Just as in Common Lisp, by default <code class="language-plaintext highlighter-rouge">cl-defstruct</code> defines a
constructor starting with <code class="language-plaintext highlighter-rouge">make-*</code>. This is fine in Common Lisp, where
it’s a package-private function by default, but in Emacs Lisp this
pollutes the global namespace.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="c1">;; Defines make-circle, circle-x, circle-y, circle-radius, circle-p</span>
<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">circle</span>
  <span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">unit-circle</span> <span class="p">(</span><span class="nv">make-circle</span> <span class="ss">:x</span> <span class="mf">0.0</span> <span class="ss">:y</span> <span class="mf">0.0</span> <span class="ss">:radius</span> <span class="mf">1.0</span><span class="p">))</span>

<span class="nv">unit-circle</span>
<span class="c1">;; =&gt; [cl-struct-circle 0.0 0.0 1.0]</span>

<span class="p">(</span><span class="nv">circle-radius</span> <span class="nv">unit-circle</span><span class="p">)</span>
<span class="c1">;; =&gt; 1.0</span>
</code></pre></div></div>

<p>This constructor isn’t namespace clean, so package authors should
avoid defstruct’s default. If the package is named <code class="language-plaintext highlighter-rouge">circle</code> then all
of the accessors are perfectly fine, though.</p>

<p>To fix this, I now use another, more recent Emacs Lisp idiom: name the
constructor <code class="language-plaintext highlighter-rouge">create</code>. That is, for the package <code class="language-plaintext highlighter-rouge">circle</code>, we desire
<code class="language-plaintext highlighter-rouge">circle-create</code>. To get this behavior from <code class="language-plaintext highlighter-rouge">cl-defstruct</code>, use the
<code class="language-plaintext highlighter-rouge">:constructor</code> option.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Clean!</span>
<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">circle</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">circle-create</span><span class="p">))</span>
  <span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>

<span class="p">(</span><span class="nv">circle-create</span> <span class="ss">:x</span> <span class="mi">0</span> <span class="ss">:y</span> <span class="mi">0</span> <span class="ss">:radius</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-circle 0 0 1]</span>

<span class="p">(</span><span class="nb">provide</span> <span class="ss">'circle</span><span class="p">)</span>
</code></pre></div></div>

<p>This affords a new opportunity to craft a better constructor. Have
<code class="language-plaintext highlighter-rouge">cl-defstruct</code> define a private constructor, then manually write a
constructor with a nicer interface. It may also do additional work,
like enforce invariants or initialize dependent slots.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">circle</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">circle--create</span><span class="p">))</span>
  <span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">circle-create</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">circle</span> <span class="p">(</span><span class="nv">circle--create</span> <span class="ss">:x</span> <span class="nv">x</span> <span class="ss">:y</span> <span class="nv">y</span> <span class="ss">:radius</span> <span class="nv">radius</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">radius</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">error</span> <span class="s">"must have non-negative radius"</span><span class="p">)</span>
      <span class="nv">circle</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">circle-create</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-circle 0 0 1]</span>

<span class="p">(</span><span class="nv">circle-create</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">-1</span><span class="p">)</span>
<span class="c1">;; error: "must have non-negative radius"</span>
</code></pre></div></div>

<p>This is now how I always use <code class="language-plaintext highlighter-rouge">cl-defstruct</code> in Emacs Lisp. It’s a tidy
convention that will probably become more common in the future.</p>
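
<p>One related detail: by default <code class="language-plaintext highlighter-rouge">cl-defstruct</code> also defines a copier
named <code class="language-plaintext highlighter-rouge">copy-*</code>, which has the same prefix problem as <code class="language-plaintext highlighter-rouge">make-*</code>. A sketch
of one way to deal with it, using the <code class="language-plaintext highlighter-rouge">:copier</code> option and a placeholder
struct named <code class="language-plaintext highlighter-rouge">my-circle</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; my-circle and its slots are placeholders for this sketch.
(cl-defstruct (my-circle (:constructor my-circle-create)
                         (:copier nil))
  x y radius)

(my-circle-radius (my-circle-create :x 0 :y 0 :radius 2))
;; =&gt; 2

;; Neither default name was defined:
(list (fboundp 'make-my-circle) (fboundp 'copy-my-circle))
;; =&gt; (nil nil)
</code></pre></div></div>

<p>Passing a name instead of nil, such as <code class="language-plaintext highlighter-rouge">(:copier my-circle-copy)</code>, keeps
the copier but moves it under the package prefix.</p>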

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Introducing EmacSQL</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/02/06/"/>
    <id>urn:uuid:af878773-296e-3411-593a-cc516856b832</id>
    <updated>2014-02-06T05:52:37Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Yesterday I made the first official release of <a href="https://github.com/skeeto/emacsql">EmacSQL</a>, an
Emacs package I’ve been working on for the past few weeks. EmacSQL is
a high-level SQL database for Emacs. It primarily targets SQLite as a
back-end, but it also currently supports PostgreSQL and MySQL.</p>

<ul>
  <li><a href="https://github.com/skeeto/emacsql">https://github.com/skeeto/emacsql</a></li>
</ul>

<p>It’s <a href="http://melpa.milkbox.net/#/emacsql">available on MELPA</a> and is ready for immediate use. It
depends on the <a href="/blog/2014/01/27/">finalizers package</a> I added last week.</p>

<p>While there’s a non-Elisp component, SQLite, there are no special
requirements for the user to worry about. When the package’s Elisp is
compiled, if a C compiler is available it will use it to compile a
SQLite binary for EmacSQL. If not, it will later offer to download a
binary that I pre-built. Ideally this makes the non-Elisp part
of EmacSQL completely transparent and users can pretend Emacs has a
built-in relational database.</p>

<p>The official SQLite command line shell is not used even if present,
and I’ll explain why below.</p>

<p>Just as <a href="/blog/2012/10/31/">Skewer</a> jump started my web development experience,
EmacSQL has been a crash course in SQL and relational databases.
Before starting this project I knew little about this topic and I’ve
gained a lot of appreciation for it in the process. Building an Emacs
extension is a very rapid way to dive into a new topic.</p>

<p>If you’re a total newb about this stuff like I was and want to learn
SQL for SQLite yourself, I highly recommend <a href="http://www.amazon.com/gp/product/0596521189/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0596521189&amp;linkCode=as2&amp;tag=nullprogram-20">Using SQLite</a>. It’s
a really solid introduction.</p>

<h3 id="high-level-sql-compiler">High-level SQL Compiler</h3>

<p>By “high-level” I mean that it goes beyond assembling strings
containing SQL code. In EmacSQL, statements are assembled from
s-expressions which, behind the scenes, are compiled into SQL using
some simple rules. This means if you already know SQL you should be
able to hit the ground running with EmacSQL. Here’s an example,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'emacsql</span><span class="p">)</span>

<span class="c1">;; Connect to the database, SQLite in this case:</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">db</span> <span class="p">(</span><span class="nv">emacsql-connect</span> <span class="s">"~/office.db"</span><span class="p">))</span>

<span class="c1">;; Create a table with 3 columns:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:create-table</span> <span class="nv">patients</span>
             <span class="p">(</span><span class="nv">[name</span> <span class="p">(</span><span class="nv">id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="p">(</span><span class="nv">weight</span> <span class="nb">float</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span>

<span class="c1">;; Insert a few rows:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:insert</span> <span class="ss">:into</span> <span class="nv">patients</span>
             <span class="ss">:values</span> <span class="p">(</span><span class="nv">[</span><span class="s">"Jeff"</span> <span class="mi">1000</span> <span class="nv">184.2]</span> <span class="nv">[</span><span class="s">"Susan"</span> <span class="mi">1001</span> <span class="nv">118.9]</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span>

<span class="c1">;; Query the database:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:select</span> <span class="nv">[name</span> <span class="nv">id]</span>
             <span class="ss">:from</span> <span class="nv">patients</span>
             <span class="ss">:where</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">weight</span> <span class="mf">150.0</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span>
<span class="c1">;; =&gt; (("Susan" 1001))</span>

<span class="c1">;; Queries can be templates, using $s1, $i2, etc. as parameters:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:select</span> <span class="nv">[name</span> <span class="nv">id]</span>
             <span class="ss">:from</span> <span class="nv">patients</span>
             <span class="ss">:where</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="nv">weight</span> <span class="nv">$s1</span><span class="p">)</span><span class="nv">]</span>
         <span class="mi">100</span><span class="p">)</span>
<span class="c1">;; =&gt; (("Jeff" 1000) ("Susan" 1001))</span>
</code></pre></div></div>

<p>A query is a vector of keywords, identifiers, parameters, and data.
Thanks to parameters, these s-expression statements should not need to
be constructed dynamically at run-time.</p>

<p>The compilation rules are listed in the EmacSQL documentation so I
won’t repeat them in detail here. In short, lisp keywords become SQL
keywords, row-oriented information is always presented as vectors,
expressions are lists, and symbols are identifiers, except when
quoted.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:select</span> <span class="nv">[name</span> <span class="nv">weight]</span> <span class="ss">:from</span> <span class="nv">patients</span> <span class="ss">:where</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">weight</span> <span class="mf">150.0</span><span class="p">)</span><span class="nv">]</span>
</code></pre></div></div>

<p>That compiles to this,</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="n">weight</span> <span class="k">FROM</span> <span class="n">patients</span> <span class="k">WHERE</span> <span class="n">weight</span> <span class="o">&lt;</span> <span class="mi">150</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>

<p>Also, any <a href="/blog/2013/12/30/#almost_everything_prints_readably">readable lisp value</a> can be stored in an
attribute. Integers are mapped to INTEGER, floats are mapped to REAL,
nil is mapped to NULL, and everything else is printed and stored as
TEXT. The specifics vary depending on the back-end.</p>
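
<p>As a rough sketch of that mapping, assuming nothing about how EmacSQL
implements it internally (both function names are made up for
illustration):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helpers, not part of EmacSQL.
(defun my-sql-storage-class (value)
  "Name the SQL storage class VALUE would get under the mapping above."
  (cond ((null value) 'NULL)
        ((integerp value) 'INTEGER)
        ((floatp value) 'REAL)
        (t 'TEXT)))

(defun my-sql-encode (value)
  "Return VALUE unchanged if it maps directly, otherwise its printed form."
  (if (eq (my-sql-storage-class value) 'TEXT)
      (prin1-to-string value)
    value))

(my-sql-encode 45)       ; =&gt; 45
(my-sql-encode "Jim")    ; =&gt; "\"Jim\""
(my-sql-encode 'hiking)  ; =&gt; "hiking"
</code></pre></div></div>

<p>The printed form of a string keeps its double quotes, which is why
strings show up in the generated SQL as <code class="language-plaintext highlighter-rouge">'"Jim"'</code>.</p>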

<h4 id="parameters">Parameters</h4>

<p>A symbol beginning with a dollar sign is a parameter. It has a type —
identifier (i), scalar (s), vector (v), schema (S) — and an argument
position.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:select</span> <span class="nv">[$i1]</span> <span class="ss">:from</span> <span class="nv">$i2</span> <span class="ss">:where</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">$i3</span> <span class="nv">$s4</span><span class="p">)</span><span class="nv">]</span>
</code></pre></div></div>

<p>Given the arguments <code class="language-plaintext highlighter-rouge">name people age 21</code>, three symbols and an
integer, it compiles to:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">people</span> <span class="k">WHERE</span> <span class="n">age</span> <span class="o">&lt;</span> <span class="mi">21</span><span class="p">;</span>
</code></pre></div></div>
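
<p>To make the naming scheme concrete, here is a minimal sketch of how
such a symbol could be split into its type and argument position. The
function is hypothetical and only illustrates the convention; it is not
EmacSQL’s own parser.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper: split a symbol like $i1 into (TYPE . POSITION).
(defun my-parse-param (symbol)
  (let ((name (symbol-name symbol))
        (case-fold-search nil))         ; $s and $S are different types
    (when (string-match "\\`\\$\\([isvS]\\)\\([0-9]+\\)\\'" name)
      (cons (pcase (match-string 1 name)
              ("i" 'identifier)
              ("s" 'scalar)
              ("v" 'vector)
              ("S" 'schema))
            (string-to-number (match-string 2 name))))))

(my-parse-param '$i1)   ; =&gt; (identifier . 1)
(my-parse-param '$S2)   ; =&gt; (schema . 2)
(my-parse-param 'name)  ; =&gt; nil, an ordinary identifier
</code></pre></div></div>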

<p>A vector parameter supplies either rows to be inserted or a set for an
<code class="language-plaintext highlighter-rouge">IN</code> expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:insert-into</span> <span class="nv">people</span> <span class="nv">[name</span> <span class="nv">age]</span> <span class="ss">:values</span> <span class="nv">$v1]</span>
</code></pre></div></div>

<p>Given the argument <code class="language-plaintext highlighter-rouge">(["Jim" 45] ["Jeff" 34])</code>, a list of two rows,
this becomes,</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">people</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'"Jim"'</span><span class="p">,</span> <span class="mi">45</span><span class="p">),</span> <span class="p">(</span><span class="s1">'"Jeff"'</span><span class="p">,</span> <span class="mi">34</span><span class="p">);</span>
</code></pre></div></div>

<p>And this,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:select</span> <span class="nb">*</span> <span class="ss">:from</span> <span class="nv">tags</span> <span class="ss">:where</span> <span class="p">(</span><span class="nv">in</span> <span class="nv">tag</span> <span class="nv">$v1</span><span class="p">)</span><span class="nv">]</span>
</code></pre></div></div>

<p>Given the argument <code class="language-plaintext highlighter-rouge">[hiking camping biking]</code>, this becomes,</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">tags</span> <span class="k">WHERE</span> <span class="n">tag</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'hiking'</span><span class="p">,</span> <span class="s1">'camping'</span><span class="p">,</span> <span class="s1">'biking'</span><span class="p">);</span>
</code></pre></div></div>

<p>When writing these expressions keep in mind the command
<code class="language-plaintext highlighter-rouge">emacsql-show-last-sql</code>. It will display in the minibuffer the SQL
result of the s-expression statement before the point.</p>

<h3 id="schemas">Schemas</h3>

<p>A table schema is a list whose first element is a column specification
vector (i.e. row-oriented information is presented as vectors). The
remaining elements are table constraints. Here are the examples from
the documentation,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; No constraints schema with four columns:</span>
<span class="p">(</span><span class="nv">[name</span> <span class="nv">id</span> <span class="nv">building</span> <span class="nv">room]</span><span class="p">)</span>

<span class="c1">;; Add some column constraints:</span>
<span class="p">(</span><span class="nv">[</span><span class="p">(</span><span class="nv">name</span> <span class="ss">:unique</span><span class="p">)</span> <span class="p">(</span><span class="nv">id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="nv">building</span> <span class="nv">room]</span><span class="p">)</span>

<span class="c1">;; Add some table constraints:</span>
<span class="p">(</span><span class="nv">[</span><span class="p">(</span><span class="nv">name</span> <span class="ss">:unique</span><span class="p">)</span> <span class="p">(</span><span class="nv">id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="nv">building</span> <span class="nv">room]</span>
 <span class="p">(</span><span class="ss">:unique</span> <span class="nv">[building</span> <span class="nv">room]</span><span class="p">)</span>
 <span class="p">(</span><span class="ss">:check</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="nv">id</span> <span class="mi">0</span><span class="p">)))</span>
</code></pre></div></div>

<p>In the handful of EmacSQL databases I’ve created for practice and
testing, I’ve put the schema in a global constant. A table schema is a
part of a program’s type specifications, and rows are instances of
that type, so it makes sense to declare schemas up top with things
like defstructs.</p>

<p>These schemas can be substituted into a SQL statement using a <code class="language-plaintext highlighter-rouge">$S</code>
parameter (capital “S” for <em>S</em>chema).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defconst</span> <span class="nv">foo-schema-people</span>
  <span class="o">'</span><span class="p">(</span><span class="nv">[</span><span class="p">(</span><span class="nv">person-id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="nv">name</span> <span class="nv">age]</span><span class="p">))</span>

<span class="c1">;; ...</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo-init</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:create-table</span> <span class="nv">$i1</span> <span class="nv">$S2]</span> <span class="ss">'people</span> <span class="nv">foo-schema-people</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="back-ends">Back-ends</h3>

<p>Everything I’ve discussed so far is restricted to the SQL statement
compiler. It’s completely independent of the back-end implementations,
themselves mostly handling strings of SQL statements.</p>

<h4 id="sqlite-implementation-difficulties">SQLite Implementation Difficulties</h4>

<p>A little over a year ago I wrote <a href="/blog/2012/12/29/">a pastebin webapp</a> in
Elisp. I wanted to use SQLite as a back-end for storing pastes but
struggled to get the SQLite command shell, sqlite3, to cooperate with
Emacs. The problem was that all of the output modes except for “tcl”
are ambiguous. This includes the “csv” formatted output. TEXT values
can dump newlines, allowing rows to span an arbitrary number of lines.
They can dump things that look like the sqlite3 prompt, so it’s
impossible to know when sqlite3 is done printing results. I ultimately
decided the command shell was inadequate as an Emacs subprocess.</p>

<p>Recently there <a href="/blog/2013/09/09/">was some discussion</a> from alexbenjm and Andres
Ramirez on an Elfeed post about using SQLite as an Elfeed back-end.
This inspired me to take another look and that’s when I came up with a
workaround for SQLite’s ambiguity: only store printed Elisp values for
TEXT values! With <code class="language-plaintext highlighter-rouge">print-escape-newlines</code> set, TEXT values no longer
span multiple lines, and I can use <code class="language-plaintext highlighter-rouge">read</code> to pull in data from
sqlite3. All of sqlite3’s output modes were now unambiguous.</p>
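
<p>A small demonstration of the round trip, using only the built-in
printer and reader:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((value "line one\nline two")
       (printed (let ((print-escape-newlines t))
                  (prin1-to-string value))))
  ;; PRINTED holds a backslash followed by "n", not a literal newline,
  ;; so it occupies exactly one line of output.
  (list (string-match-p "\n" printed)
        (equal value (read printed))))
;; =&gt; (nil t)
</code></pre></div></div>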

<p>However, after making significant progress I discovered an even bigger
issue: GNU Readline. The sqlite3 binary provided by Linux package
repositories is almost always compiled with Readline support. This
makes the tool much more friendly to use, but it’s a huge problem for
Emacs.</p>

<p>First, sqlite3 the command shell is not up to the same standards as
SQLite the database. Not by a long shot. In my short time working with
SQLite I’ve already discovered several bugs in the command shell. For
one, it’s not properly integrated with GNU Readline. There’s an
<code class="language-plaintext highlighter-rouge">.echo</code> meta-command that turns command echoing on and off. That is,
it repeats your command back to you. Useful in some circumstances,
though not mine. The bug is that this echo is separate from GNU
Readline’s echo. When Readline is active and <code class="language-plaintext highlighter-rouge">.echo</code> is enabled, there
are actually <em>two</em> echoes. Turn it off and there’s one echo.</p>

<h5 id="pseudo-terminals">Pseudo-terminals</h5>

<p>Under some circumstances, like when communicating over a pipe rather
than a PTY, Readline will mostly become deactivated. This would have
been a workaround, but when Readline is disabled sqlite3 heavily
buffers its output. This breaks any sort of interaction. Even worse,
on Windows <a href="http://sqlite.1065341.n5.nabble.com/Command-line-shell-not-flushing-stderr-when-interactive-td73340.html">stderr is not always unbuffered</a>, so sqlite3’s
error messages may not appear for a long time (another bug).</p>

<p>Besides the problem of getting Readline to shut up, another problem is
getting Readline to stop acting on control characters. The first 32
characters in ASCII are control characters. A pseudo-terminal (PTY)
that is not in raw mode will immediately act upon any control
characters it sees. There’s no escaping them.</p>

<p>Emacs communicates with subprocesses through a PTY by default
(probably an early design mistake), limiting the kind of data that can
be transmitted. You can try this yourself in a comint mode sometime
where a subprocess is used (not a socket like SLIME). Fire up <code class="language-plaintext highlighter-rouge">M-x
sql-sqlite</code> (part of Emacs) and try sending a string containing byte
0x1C (28, file separator). You can type one by pressing <code class="language-plaintext highlighter-rouge">C-q C-\</code>.
Send that byte and the subprocess dies.</p>

<p>There are two ways to work around this. One is to use a pipe (bind
<code class="language-plaintext highlighter-rouge">process-connection-type</code> to nil). Pipes don’t respond to control
characters. This doesn’t work with sqlite3 because of the
previously-mentioned buffering issue.</p>
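
<p>For a program that doesn’t have the buffering problem, the pipe
approach looks something like this. It’s a sketch that assumes a
Unix-like system with <code class="language-plaintext highlighter-rouge">cat</code> on the path, and the function name is
made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper: send STRING through "cat" over a pipe.
(defun my-pipe-roundtrip (string)
  "Return whatever a cat subprocess echoes back for STRING."
  (with-temp-buffer
    (let* ((process-connection-type nil) ; nil = pipe, t = PTY
           (proc (start-process "cat" (current-buffer) "cat")))
      ;; Keep the default sentinel from writing into the buffer.
      (set-process-sentinel proc #'ignore)
      (set-process-query-on-exit-flag proc nil)
      (process-send-string proc string)
      (process-send-eof proc)
      ;; Collect output until cat exits.
      (while (process-live-p proc)
        (accept-process-output proc 0.1))
      (buffer-string))))

;; Byte 0x1C is now just data on its way to the subprocess.
(my-pipe-roundtrip (string ?a #x1c ?b ?\n))
</code></pre></div></div>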

<p>The other way to work around this is to put the PTY in raw mode.
Unfortunately there’s no function to do this so you need to call
<code class="language-plaintext highlighter-rouge">stty</code>. Of course, this program needs to run on the same PTY, so a
<code class="language-plaintext highlighter-rouge">start-process-shell-command</code> is required.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">start-process-shell-command</span> <span class="nv">name</span> <span class="nv">buffer</span> <span class="s">"stty raw &amp;&amp; &lt;your command&gt;"</span><span class="p">)</span>
</code></pre></div></div>

<p>Windows has neither <code class="language-plaintext highlighter-rouge">stty</code> nor PTYs (nor any of PTY’s issues) so
you’ll need to check the operating system before starting the process.
Even this still doesn’t work for sqlite3 because Readline itself will
respond to control characters. There’s no option to disable this.</p>
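
<p>The operating system check can be sketched with <code class="language-plaintext highlighter-rouge">system-type</code>. The
wrapper below is hypothetical, and it assumes <code class="language-plaintext highlighter-rouge">stty</code> is available
everywhere except Windows and DOS.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical wrapper around start-process-shell-command.
(defun my-start-raw-process (name buffer command)
  "Start shell COMMAND, putting the PTY in raw mode where PTYs exist."
  (if (memq system-type '(windows-nt ms-dos))
      ;; No PTYs and no stty here: start the command as-is.
      (start-process-shell-command name buffer command)
    (let ((process-connection-type t)) ; ask for a PTY explicitly
      (start-process-shell-command
       name buffer (concat "stty raw &amp;&amp; " command)))))
</code></pre></div></div>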

<p>There’s a package called <a href="https://github.com/mhayashi1120/Emacs-esqlite">esqlite</a> that is also a SQLite
front-end. It’s built to use sqlite3 and therefore suffers from all of
these problems.</p>

<h4 id="a-custom-sqlite-binary">A Custom SQLite Binary</h4>

<p>Since sqlite3 proved unreliable I developed my own protocol and
external program. It’s just a tiny bit of C that accepts a SQL string
and returns results as an s-expression. I’m no longer constrained to
storing readable values, but I’m still keeping that paradigm. First,
it keeps the C glue program simple and, more importantly, I can rely
entirely on the Emacs reader to parse the results. This makes
communication between Emacs and the subprocess as fast as it can
possibly be. The reader is faster than any possible Elisp program.</p>
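
<p>On the Emacs side, parsing a whole result set is then a single call
to the reader. The output string below is invented for illustration; the
exact format the helper program emits is not shown here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical subprocess output: one list per row.
(let ((output "((\"Jeff\" 1000 184.2) (\"Susan\" 1001 118.9))"))
  ;; read-from-string returns (OBJECT . END-POSITION).
  (car (read-from-string output)))
;; =&gt; (("Jeff" 1000 184.2) ("Susan" 1001 118.9))
</code></pre></div></div>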

<p>As I mentioned before, this C program is compiled when possible, and
otherwise a pre-built binary is fetched from my server (popular
platforms only, obviously). It’s likely EmacSQL will have at least one
working back-end on whatever you’re using.</p>

<h3 id="other-back-ends">Other Back-ends</h3>

<p>Both PostgreSQL and MySQL are also supported, though these require the
user have the appropriate client programs installed (psql or mysql).
Both of these are much better behaved than sqlite3 and, with the
<code class="language-plaintext highlighter-rouge">stty</code> trick, each can reliably be used without any special help. Both
pass all of the unit tests, so, in theory, they’ll work just as well
as SQLite.</p>

<p>To use them with the example at the beginning of this article, require
<code class="language-plaintext highlighter-rouge">emacsql-psql</code> or <code class="language-plaintext highlighter-rouge">emacsql-mysql</code>, then swap <code class="language-plaintext highlighter-rouge">emacsql-connect</code> for the
constructors <code class="language-plaintext highlighter-rouge">emacsql-psql</code> or <code class="language-plaintext highlighter-rouge">emacsql-mysql</code> (along with the proper
arguments). All three of these constructors return an
<code class="language-plaintext highlighter-rouge">emacsql-connection</code> object that works with the same API.</p>

<p>EmacSQL only goes so far to normalize the interfaces to these
databases, so for any non-trivial program you may not be able to swap
back-ends without some work. All of the EmacSQL functions that operate
on connections are generic functions (EIEIO), so changing back-ends
will only have an effect on the program’s SQL statements. For example,
if you use a SQLite-ism (dynamic typing) it won’t translate to either
of the other databases should they be swapped in.</p>

<p>I’ll cover the connections API, and what it takes to implement a new
back-end, in a future post. Outside of the PTY caveats, it’s actually
very easy. The MySQL implementation is just 80 lines of code.</p>

<h3 id="emacsqls-future">EmacSQL’s Future</h3>

<p>I hope this becomes a reliable and trusted database solution that
other packages can depend upon. Twice so far, the pastebin demo and
Elfeed, I’ve really wanted something like this and, instead, ended up
having to hack together my own database.</p>

<p>I’ve already started a branch on Elfeed re-implementing its database
in EmacSQL. Someday it may become Elfeed’s primary database if I feel
there’s no disadvantage to it. EmacSQL builds SQLite with the
full-text search engine enabled, which opens the door to a
powerful, fast Elfeed search API. Currently the main obstacle is
actually Elfeed’s database API being somewhat incompatible with ACID
database transactions — shortsightedness on my part!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Lisp Object Finalizers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/01/27/"/>
    <id>urn:uuid:48023a80-358c-39b4-371b-d74dfb248897</id>
    <updated>2014-01-27T05:24:16Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p><strong>Update</strong>: Emacs 25.1 (released Sept. 2016) formally introduced
finalizers to Emacs Lisp. This article is left here for historical
purposes.</p>

<p><strong>Problem</strong>: You have a special resource, such as a buffer or process,
associated with an Emacs Lisp object which is not managed by the
garbage collector. You want this resource to be cleaned up when the
owning lisp object is garbage collected. Unlike some other languages,
Elisp doesn’t provide <a href="http://en.wikipedia.org/wiki/Finalizer">finalizers</a> for this job, so what do
you do?</p>

<p><strong>Solution</strong>: This is Emacs Lisp. We can just add this feature to the
language ourselves!</p>

<p>I’ve already implemented this feature as a package called <code class="language-plaintext highlighter-rouge">finalize</code>,
available on MELPA. I will be using it as part of a larger, upcoming
project.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-finalize">https://github.com/skeeto/elisp-finalize</a></li>
</ul>

<p>In this article I will describe how it works.</p>

<h3 id="processes-and-buffers">Processes and Buffers</h3>

<p>Processes and buffers are special types of objects. Immediately after
instantiation these objects are added to a global list. They will
never become unreachable without explicitly being killed. The garbage
collector will never manage them for you.</p>

<p>This is a problem for APIs like those provided by the url package. The
functions <code class="language-plaintext highlighter-rouge">url-retrieve</code> and <code class="language-plaintext highlighter-rouge">url-retrieve-synchronously</code> create
buffers and hand them back to their callers. Ownership is transferred
to the caller and the caller must be careful to kill the buffer, or
transfer ownership again, before it returns. Otherwise the buffer is
“leaked.” The url package tries to manage this a little bit with
<code class="language-plaintext highlighter-rouge">url-gc-dead-buffers</code>, but this can’t be relied upon.</p>

<p>Another issue is when a process is started and is stored in a struct
or some other kind of object. There is probably a “close” function
that accepts one of these structs and kills the process. But if that
function isn’t called, due to a bug or an error condition, it will
become a “dangling” process. If the struct is completely lost, it will
probably be inconvenient to deal with the process — the “close”
function is no longer useful.</p>

<h3 id="with-macros">With Macros</h3>

<p>A common way to deal with this problem is using a <code class="language-plaintext highlighter-rouge">with-</code> macro. This
macro establishes a resource, evaluates a body, and ensures the
resource is properly cleaned up regardless of the body’s termination
state. The latter is accomplished using <code class="language-plaintext highlighter-rouge">unwind-protect</code>. For example,
<code class="language-plaintext highlighter-rouge">with-temp-buffer</code>,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Fetch the first 10 bytes of foo.txt</span>
<span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"foo.txt"</span> <span class="no">nil</span> <span class="mi">0</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">buffer-string</span><span class="p">))</span>
</code></pre></div></div>

<p>This expands (roughly) to the following expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">temp-buffer</span> <span class="p">(</span><span class="nv">generate-new-buffer</span> <span class="s">"*temp*"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">temp-buffer</span>
    <span class="p">(</span><span class="k">unwind-protect</span>
        <span class="p">(</span><span class="k">progn</span>
          <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"foo.txt"</span> <span class="no">nil</span> <span class="mi">0</span> <span class="mi">10</span><span class="p">)</span>
          <span class="p">(</span><span class="nv">buffer-string</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nv">buffer-live-p</span> <span class="nv">temp-buffer</span><span class="p">)</span>
           <span class="p">(</span><span class="nv">kill-buffer</span> <span class="nv">temp-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>For dealing with open files, Common Lisp has <code class="language-plaintext highlighter-rouge">with-open-stream</code>. It
establishes a binding for a new stream over its body and ensures the
stream is closed when the body is complete. There’s no chance for a
stream to be left open, leaking a system resource.</p>
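
<p>The same pattern carries over to other resources. As a sketch, here
is a hypothetical <code class="language-plaintext highlighter-rouge">with-</code> macro for a subprocess (the macro name and its
argument layout are made up for illustration):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical macro, modeled on with-temp-buffer.
(defmacro my-with-process (spec &amp;rest body)
  "Bind VAR to a new process, evaluate BODY, then delete the process.
SPEC is (VAR NAME PROGRAM ARGS...)."
  (declare (indent 1))
  (let ((var (car spec)))
    `(let ((,var (start-process ,(nth 1 spec) nil ,@(nthcdr 2 spec))))
       (unwind-protect
           (progn ,@body)
         ;; Runs on normal exit, on error, and on non-local exit.
         (when (process-live-p ,var)
           (delete-process ,var))))))

(my-with-process (proc "ping" "ping" "localhost")
  (process-live-p proc))
</code></pre></div></div>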

<p>However, <code class="language-plaintext highlighter-rouge">with-</code> macros aren’t useful in asynchronous situations. In
Emacs this would be the case for asynchronous sub-processes, such as
an attached language interpreter. The extent of the process goes
beyond a single body.</p>

<h3 id="finalizers">Finalizers</h3>

<p>What would really be useful is to have a callback — a finalizer —
that runs when an object is garbage collected. This ensures that the
resource will not outlive its owner, restoring management back to the
garbage collector. However, Emacs provides no such hook.</p>

<p>Fortunately this feature can be built using weak hash tables and the
<code class="language-plaintext highlighter-rouge">post-gc-hook</code>, a list of functions that are run immediately after
garbage collection.</p>

<h4 id="weak-references">Weak References</h4>

<p>I’ve discussed before <a href="/blog/2012/12/17/">how to create weak references in Elisp</a>.
The only weak references in Emacs are built into weak hash tables.
Normally the language provides weak references first and hash tables
are built on top of them. With Emacs we do this backwards.</p>

<p>The <code class="language-plaintext highlighter-rouge">make-hash-table</code> function accepts a key argument <code class="language-plaintext highlighter-rouge">:weakness</code> to
specify how strongly keys and values should be held by the table. To
make a weak reference just create a hash table of size 1 and set
<code class="language-plaintext highlighter-rouge">:weakness</code> to t.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">weak-ref</span> <span class="p">(</span><span class="nv">thing</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">ref</span> <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:size</span> <span class="mi">1</span> <span class="ss">:weakness</span> <span class="no">t</span> <span class="ss">:test</span> <span class="ss">'eq</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">ref</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">gethash</span> <span class="no">t</span> <span class="nv">ref</span><span class="p">)</span> <span class="nv">thing</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">deref</span> <span class="p">(</span><span class="nv">ref</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">gethash</span> <span class="no">t</span> <span class="nv">ref</span><span class="p">))</span>
</code></pre></div></div>

<p>The same trick can be used to detect when an object is garbage
collected. If the result of <code class="language-plaintext highlighter-rouge">deref</code> is nil, then the object was
garbage collected. (Or the weakly-referenced object <em>is</em> nil, but this
object will never be garbage collected anyway.)</p>

<p>To check if we need to run a finalizer all we have to do is create a
weak reference to the object, then check the reference after garbage
collection. This check can be done in a <code class="language-plaintext highlighter-rouge">post-gc-hook</code> function.</p>

<h4 id="registration">Registration</h4>

<p>To avoid cluttering up <code class="language-plaintext highlighter-rouge">post-gc-hook</code> with one closure per object
we’ll keep a register of all watched objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">finalizable-objects</span> <span class="p">())</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">register</span> <span class="p">(</span><span class="nv">object</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="p">(</span><span class="nv">weak-ref</span> <span class="nv">object</span><span class="p">)</span> <span class="nv">callback</span><span class="p">)</span> <span class="nv">finalizable-objects</span><span class="p">))</span>
</code></pre></div></div>

<p>Now a function to check for missing objects, <code class="language-plaintext highlighter-rouge">try-finalize</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">try-finalize</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">alive</span> <span class="p">(</span><span class="nv">cl-remove-if-not</span> <span class="nf">#'</span><span class="nv">deref</span> <span class="nv">finalizable-objects</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">dead</span> <span class="p">(</span><span class="nv">cl-remove-if</span> <span class="nf">#'</span><span class="nv">deref</span> <span class="nv">finalizable-objects</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="nv">finalizable-objects</span> <span class="nv">alive</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">mapc</span> <span class="nf">#'</span><span class="nb">funcall</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="nf">#'</span><span class="nb">cdr</span> <span class="nv">dead</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'post-gc-hook</span> <span class="nf">#'</span><span class="nv">try-finalize</span><span class="p">)</span>
</code></pre></div></div>

<p>Now to try it out. Create a process, stuff it in a vector (like a
defstruct), register <code class="language-plaintext highlighter-rouge">delete-process</code> as a finalizer, and, for the
sake of demonstration, immediately forget the vector.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"ping"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="s">"localhost"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">register</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">process</span><span class="p">)</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">delete-process</span> <span class="nv">process</span><span class="p">))))</span>

<span class="c1">;; Assuming the garbage collector has not already run.</span>
<span class="p">(</span><span class="nv">get-process</span> <span class="s">"ping"</span><span class="p">)</span>
<span class="c1">;; =&gt; #&lt;process ping&gt;</span>

<span class="c1">;; Force garbage collection.</span>
<span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>

<span class="p">(</span><span class="nv">get-process</span> <span class="s">"ping"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>The garbage collector killed the process for us!</p>

<p>There are some problems with this implementation. Using <code class="language-plaintext highlighter-rouge">cl-remove-if</code>
is unwise in a <code class="language-plaintext highlighter-rouge">post-gc-hook</code> function. It allocates lots of new cons
cells but garbage collection is inhibited while the function is run.
The docstring warns us:</p>

<blockquote>
  <p>Garbage collection is inhibited while the hook functions run, so be
careful writing them.</p>
</blockquote>

<p>Similarly, all of the finalizers are run within the context of this
memory-sensitive hook. Instead they should be delayed until the next
evaluation turn (i.e. <code class="language-plaintext highlighter-rouge">run-at-time</code> of 0). Some of the finalizers
could also fail, which would cause the remaining finalizers to never
run. The real implementation deals with all of these issues.</p>
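
<p>A rough sketch of what addressing those issues might look like,
reusing <code class="language-plaintext highlighter-rouge">finalizable-objects</code> and <code class="language-plaintext highlighter-rouge">deref</code> from above. This is not the
package’s actual code, and it still allocates a few cons cells and
timers inside the hook, just fewer than before.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical replacement for try-finalize; relies on the
;; finalizable-objects variable and the deref function defined earlier.
(defun my-run-finalizer (callback)
  "Call CALLBACK, reporting an error instead of propagating it."
  (condition-case err
      (funcall callback)
    (error (message "finalizer failed: %S" err))))

(defun my-try-finalize ()
  "Split the register in one pass and defer the dead objects' callbacks."
  (let ((alive ())
        (dead ()))
    (dolist (entry finalizable-objects)
      (if (deref (car entry))
          (push entry alive)
        (push (cdr entry) dead)))
    (setf finalizable-objects alive)
    (dolist (callback dead)
      ;; Run on the next evaluation turn, outside of post-gc-hook.
      (run-at-time 0 nil #'my-run-finalizer callback))))
</code></pre></div></div>

<p>Because each callback runs in its own timer and inside its own
<code class="language-plaintext highlighter-rouge">condition-case</code>, one failing finalizer no longer blocks the rest.</p>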

<p>A major drawback to these Emacs Lisp finalizers compared to other
languages is that the actual object is not available. We don’t know
it’s getting collected until after it’s already gone. This solves the
object resurrection problem, but it’s darn inconvenient. One possible
workaround in the case of defstructs and EIEIO objects is to make a
copy of the original object (<code class="language-plaintext highlighter-rouge">copy-sequence</code> or <code class="language-plaintext highlighter-rouge">clone</code>) and run the
finalizer on the copy as if it was the original.</p>

<h3 id="the-real-implementation">The Real Implementation</h3>

<p>The real implementation is more carefully namespaced and its API has
just one function: <code class="language-plaintext highlighter-rouge">finalize-register</code>. It works just like <code class="language-plaintext highlighter-rouge">register</code>
above but it accepts <code class="language-plaintext highlighter-rouge">&amp;rest</code> arguments to be passed to the finalizer.
This makes the registration call simpler and avoids some
<a href="/blog/2013/12/30/#the_readable_closures_catch">significant problems with closures</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"ping"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="s">"localhost"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">finalize-register</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">process</span><span class="p">)</span> <span class="nf">#'</span><span class="nv">delete-process</span> <span class="nv">process</span><span class="p">))</span>
</code></pre></div></div>

<p>Here’s a more formal example of how it might really be used.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">pinger</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">pinger--create</span><span class="p">))</span>
  <span class="nv">process</span> <span class="nv">host</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">pinger-create</span> <span class="p">(</span><span class="nv">host</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"pinger"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="nv">host</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">object</span> <span class="p">(</span><span class="nv">pinger--create</span> <span class="ss">:process</span> <span class="nv">process</span> <span class="ss">:host</span> <span class="nv">host</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">finalize-register</span> <span class="nv">object</span> <span class="nf">#'</span><span class="nv">delete-process</span> <span class="nv">process</span><span class="p">)</span>
    <span class="nv">object</span><span class="p">))</span>
</code></pre></div></div>

<p>To make things cleaner for EIEIO classes there’s also a <code class="language-plaintext highlighter-rouge">finalizable</code>
mixin class that ensures the <code class="language-plaintext highlighter-rouge">finalize</code> generic function is called on
a copy of the object (the original object is gone) when it’s garbage
collected.</p>

<p>Here’s how it would be used for the same “pinger” concept, this time
as an EIEIO class. An advantage here is that anyone can manually call
<code class="language-plaintext highlighter-rouge">finalize</code> early if desired.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'eieio</span><span class="p">)</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'finalizable</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defclass</span> <span class="nv">pinger</span> <span class="p">(</span><span class="nv">finalizable</span><span class="p">)</span>
  <span class="p">((</span><span class="nv">process</span> <span class="ss">:initarg</span> <span class="ss">:process</span> <span class="ss">:reader</span> <span class="nv">pinger-process</span><span class="p">)</span>
   <span class="p">(</span><span class="nv">host</span> <span class="ss">:initarg</span> <span class="ss">:host</span> <span class="ss">:reader</span> <span class="nv">pinger-host</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">pinger-create</span> <span class="p">(</span><span class="nv">host</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'pinger</span>
                 <span class="ss">:process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"ping"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="nv">host</span><span class="p">)</span>
                 <span class="ss">:host</span> <span class="nv">host</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">finalize</span> <span class="p">((</span><span class="nv">pinger</span> <span class="nv">pinger</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">delete-process</span> <span class="p">(</span><span class="nv">pinger-process</span> <span class="nv">pinger</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s a small package but I think it can be quite handy.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Measure Elisp Object Memory Usage with Calipers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/01/26/"/>
    <id>urn:uuid:3ba9664d-2758-30c8-6b33-2c17835575d1</id>
    <updated>2014-01-26T01:15:02Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>A couple of weeks ago I wrote a library to measure the retained memory
footprint of arbitrary Elisp objects for the purposes of optimization.
It’s called Caliper.</p>

<ul>
  <li><a href="https://github.com/skeeto/caliper">https://github.com/skeeto/caliper</a></li>
</ul>

<p>Note, Caliper requires <a href="/blog/2013/12/18/">predd, my predicate dispatch library</a>.
Neither of these packages is on MELPA or Marmalade since they’re
mostly for fun.</p>

<p>The reason I wanted this was that I came across a post on reddit where
someone had <a href="http://old.reddit.com/r/datasets/comments/1uyd0t/">scraped 217,000 <em>Jeopardy!</em> questions</a> from
<a href="http://www.j-archive.com/">J! Archive</a> and dumped them out into a single, large JSON
file. The significance of the effort is that it dealt with some of the
inconsistencies of <em>J! Archive</em>’s data presentation, normalizing them
for the JSON output.</p>

<ul>
  <li><a href="https://skeeto.s3.amazonaws.com/share/JEOPARDY_QUESTIONS1.json.gz">JEOPARDY_QUESTIONS1.json.gz</a> (12MB, 53MB uncompressed)</li>
</ul>

<p>When I want to examine a JSON dataset like this I have three preferred
options:</p>

<ul>
  <li>Load it into a browser page and poke at it from JavaScript remotely
with <a href="https://github.com/skeeto/skewer-mode">Skewer</a>. With the JSON text weighing in at 53MB and
with such a large object count, I decided this was too large for a
browser page. It definitely <em>could</em> be done, it’s just that the
browser is not the place to be working on large datasets.</li>
  <li>Load it into Clojure. I’m familiar with Clojure’s
<a href="https://github.com/clojure/data.json">data.json</a>. This is not a bad choice, but there’s
something else I always reach for first if I can.</li>
  <li>Load it into Emacs using json.el (part of Emacs). This is what I
ended up doing.</li>
</ul>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">jeopardy</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"/tmp/JEOPARDY_QUESTIONS1.json"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">json-read</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">length</span> <span class="nv">jeopardy</span><span class="p">)</span>
<span class="c1">;; =&gt; 216930</span>
</code></pre></div></div>

<p>Here, <code class="language-plaintext highlighter-rouge">jeopardy</code> is bound to a vector of 216,930 association lists
(alists). I’m curious exactly how much heap memory this data structure
is using. To find out, we need to walk the data structure and sum the
sizes of everything we come across. However, care must be taken not to
count the same object twice, such as symbols, which, being
interned, appear many times in this data.</p>

<h3 id="measuring-object-sizes">Measuring Object Sizes</h3>

<p>This is lisp so let’s start with the cons cell. A cons cell is just a
pair of pointers, called <em>car</em> and <em>cdr</em>.</p>

<p><img src="/img/diagram/cons.png" alt="" /></p>

<p>These are used to assemble lists.</p>

<p><img src="/img/diagram/list.png" alt="" /></p>

<p>So a cons cell itself — the <em>shallow</em> size — is two words: 16 bytes
on a 64-bit operating system. To make sure Elisp doesn’t happen to
have any additional information attached to cons cells, let’s take a
look at the Emacs source code.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Lisp_Cons</span>
  <span class="p">{</span>
    <span class="cm">/* Car of this cons cell.  */</span>
    <span class="n">Lisp_Object</span> <span class="n">car</span><span class="p">;</span>

    <span class="k">union</span>
    <span class="p">{</span>
      <span class="cm">/* Cdr of this cons cell.  */</span>
      <span class="n">Lisp_Object</span> <span class="n">cdr</span><span class="p">;</span>

      <span class="cm">/* Used to chain conses on a free list.  */</span>
      <span class="k">struct</span> <span class="n">Lisp_Cons</span> <span class="o">*</span><span class="n">chain</span><span class="p">;</span>
    <span class="p">}</span> <span class="n">u</span><span class="p">;</span>
  <span class="p">};</span>
</code></pre></div></div>

<p>The return value from <code class="language-plaintext highlighter-rouge">garbage-collect</code> backs this up. The first value
after each type is the shallow size of that type. From here on, all
values have been computed for 64-bit Emacs running on x86-64
GNU/Linux.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>
<span class="c1">;; =&gt; ((conses 16 9923172 2036943)</span>
<span class="c1">;;     (symbols 48 57017 54)</span>
<span class="c1">;;     (miscs 40 10203 18892)</span>
<span class="c1">;;     (strings 32 4810027 197961)</span>
<span class="c1">;;     (string-bytes 1 104599635)</span>
<span class="c1">;;     (vectors 16 103138)</span>
<span class="c1">;;     (vector-slots 8 2921744 131076)</span>
<span class="c1">;;     (floats 8 12494 5816)</span>
<span class="c1">;;     (intervals 56 119911 69249)</span>
<span class="c1">;;     (buffers 960 134)</span>
<span class="c1">;;     (heap 1024 593412 133853))</span>
</code></pre></div></div>

<p>A <code class="language-plaintext highlighter-rouge">Lisp_Object</code> is just a pointer to a lisp object. The <em>retained</em>
size of a cons cell is its shallow size plus, recursively, the
retained size of the objects in its car and cdr.</p>

<h4 id="integers-and-floats">Integers and Floats</h4>

<p>Integers are a special case. Elisp uses what is called <em>tagged
integers</em>. They’re not heap-allocated objects. Instead they’re
embedded inside the object pointers. That is, those <code class="language-plaintext highlighter-rouge">Lisp_Object</code>
pointers in <code class="language-plaintext highlighter-rouge">Lisp_Cons</code> will hold integers directly. This means that,
to Caliper, integers have a retained size of 0. We can use this to
verify Caliper’s return value for cons cells.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="mi">100</span><span class="p">)</span>
<span class="c1">;; =&gt; 0</span>

<span class="p">(</span><span class="nv">caliper-object-size</span> <span class="p">(</span><span class="nb">cons</span> <span class="mi">100</span> <span class="mi">200</span><span class="p">))</span>
<span class="c1">;; =&gt; 16</span>
</code></pre></div></div>

<p>Tagged integers are fast and save on memory. They also compare
properly with <code class="language-plaintext highlighter-rouge">eq</code>, which is just a pointer (identity) comparison.
However, because a few bits need to be reserved for differentiating
them from actual pointers these integers have a restricted dynamic
range.</p>

<p>Floats are not tagged and exist as immutable objects in the heap.
That’s why <code class="language-plaintext highlighter-rouge">eql</code> is still useful in Elisp — it’s like <code class="language-plaintext highlighter-rouge">eq</code> but will
handle numbers properly. (By convention you should use <code class="language-plaintext highlighter-rouge">eql</code> for
integers, too.)</p>
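
<p>A few expressions illustrate the difference. This is only a sketch: the results shown are the ones I would expect on a 64-bit build like the one used in this article.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Small integers are stored in the pointer itself, so an identity
;; comparison works.
(eq 100 100)
;; =&gt; t

;; Floats are separate heap objects: compare them with `eql' (or `=').
(let ((a (+ 1.0 0.5))
      (b (+ 1.0 0.5)))
  (eql a b))
;; =&gt; t

;; The reserved tag bits leave fewer than 64 bits for the value.
(&lt; most-positive-fixnum (expt 2.0 63))
;; =&gt; t
</code></pre></div></div>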

<h4 id="symbols-and-strings">Symbols and Strings</h4>

<p>Not counting the string’s contents, a string’s base size is 32 bytes
according to <code class="language-plaintext highlighter-rouge">garbage-collect</code>. The <code class="language-plaintext highlighter-rouge">length</code> of the string can’t be
used here because that counts characters, which vary in size. There’s
a <code class="language-plaintext highlighter-rouge">string-bytes</code> function for this. A string’s size is 32 plus its
<code class="language-plaintext highlighter-rouge">string-bytes</code> value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">string-bytes</span> <span class="s">"naïveté"</span><span class="p">)</span>
<span class="c1">;; =&gt; 9</span>
<span class="p">(</span><span class="nv">caliper-object-size</span> <span class="s">"naïveté"</span><span class="p">)</span>
<span class="c1">;; =&gt; 41  (i.e. 32 + 9)</span>
</code></pre></div></div>

<p>As you can see from above, symbols are <em>huge</em>. Without even counting
either the string holding the name of the symbol or the symbol’s
plist, a symbol is 48 bytes.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="ss">'hello</span><span class="p">)</span>
<span class="c1">;; =&gt; 1038</span>
</code></pre></div></div>

<p>This 1,038 bytes is a little misleading. The symbol itself is 48
bytes, the string <code class="language-plaintext highlighter-rouge">"hello"</code> is 37 bytes, and the plist is nil. The
retained size of <code class="language-plaintext highlighter-rouge">nil</code> is significant. On my system, nil’s plist has 4
key-value pairs, which themselves have retained sizes. When examining
symbols, Caliper doesn’t care if they’re interned or not, including
symbols like <code class="language-plaintext highlighter-rouge">nil</code> and <code class="language-plaintext highlighter-rouge">t</code>. However, nil is only counted once, so it
will have little impact on a large data structure.</p>

<h4 id="miscellaneous">Miscellaneous</h4>

<p>Outside of vectors, measuring object sizes starts to get fuzzy. For
example, it’s not possible to examine the exact internals of a hash
table from Elisp. We can see its contents and the number of elements
it can hold without re-sizing, but there’s intermediate structure
that’s not visible. Caliper makes rough estimates for each of these
types.</p>

<h4 id="circularity-and-double-counting">Circularity and Double Counting</h4>

<p>To avoid double counting objects, a hash table with a test of <code class="language-plaintext highlighter-rouge">eq</code> is
dynamically bound by the top level call. It’s used like a set. Before
an object is examined, the hash table is checked. If the object is
listed, the reported size is 0 (it consumes no space beyond what is
already accounted for).</p>

<p>This automatically solves the circularity problem. There’s no way we
can traverse into the same data structure a second time because we’ll
stop when we see it twice.</p>
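
<p>Here’s a minimal sketch of that idea. It is not Caliper’s actual code: the <code class="language-plaintext highlighter-rouge">my-size</code> names are hypothetical, the byte counts are the 64-bit figures from above, and only integers, cons cells, strings, and floats are handled.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-size--seen nil
  "Set of objects already counted, as an `eq' hash table.")

(defun my-size--walk (object)
  "Return the retained size of OBJECT, skipping anything already seen."
  (cond
   ;; Tagged integers live inside the pointer: no heap space.
   ((integerp object) 0)
   ;; Already counted, or part of a cycle: no additional space.
   ((gethash object my-size--seen) 0)
   (t
    (puthash object t my-size--seen)
    (cond
     ((consp object)
      (+ 16
         (my-size--walk (car object))
         (my-size--walk (cdr object))))
     ((stringp object)
      (+ 32 (string-bytes object)))
     ((floatp object) 8)
     ;; Symbols, vectors, and everything else are ignored in this sketch.
     (t 0)))))

(defun my-size (object)
  "Top-level entry point: bind a fresh set, then walk OBJECT."
  (let ((my-size--seen (make-hash-table :test 'eq)))
    (my-size--walk object)))

(my-size (cons 100 200))
;; =&gt; 16

(let ((s (copy-sequence "hello")))
  (my-size (list s s)))
;; =&gt; 69  (two cons cells plus one 37-byte string)

(let ((cell (list 1)))
  (setcdr cell cell)                    ; circular list
  (my-size cell))
;; =&gt; 16
</code></pre></div></div>

<p>Because this version recurses on the cdr, a very long list would exceed <code class="language-plaintext highlighter-rouge">max-lisp-eval-depth</code>. A practical version would need to iterate along lists instead.</p>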

<h3 id="using-caliper">Using Caliper</h3>

<p>So what’s the total retained size of the <code class="language-plaintext highlighter-rouge">jeopardy</code> structure? About
124MB.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="nv">jeopardy</span><span class="p">)</span>
<span class="c1">;; =&gt; 130430198</span>
</code></pre></div></div>

<p>For fun, let’s see how much we can improve on this.</p>

<p>json.el will return alists for objects by default, but this can be
changed by setting <code class="language-plaintext highlighter-rouge">json-object-type</code> to something else. Initially I
thought maybe using plists instead would save space, but I later
realized that <strong>plists use exactly the same number of cons cells as
alists</strong>. If this doesn’t sound right, try to picture the cons cells
in your head (an exercise for the reader).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">jeopardy</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">json-object-type</span> <span class="ss">'plist</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">with-temp-buffer</span>
      <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"~/JEOPARDY_QUESTIONS1.json"</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">json-read</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">caliper-object-size</span> <span class="nv">jeopardy</span><span class="p">)</span>
<span class="c1">;; =&gt; 130430077 (plist)</span>
</code></pre></div></div>

<p>Strangely this is 121 bytes smaller. I don’t know why yet, but in the
scope of 124MB that’s nothing.</p>
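
<p>For anyone who would rather check the claim about cons cells than picture it, here’s a small counting sketch. The helper <code class="language-plaintext highlighter-rouge">my-count-conses</code> is hypothetical and assumes the structure has no shared or circular parts.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-count-conses (object)
  "Count the cons cells in OBJECT, assuming no sharing and no cycles."
  (if (consp object)
      (+ 1
         (my-count-conses (car object))
         (my-count-conses (cdr object)))
    0))

(my-count-conses '((a . 1) (b . 2)))    ; alist
;; =&gt; 4

(my-count-conses '(:a 1 :b 2))          ; plist
;; =&gt; 4
</code></pre></div></div>

<p>An alist spends one cell per pair on the list spine and one on the pair itself. A plist spends two spine cells per pair. Either way it comes to two cells per key-value pair.</p>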

<p>So what do these questions look like?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">elt</span> <span class="nv">jeopardy</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1">;; =&gt; (:show_number "4680"</span>
<span class="c1">;;     :round "Jeopardy!"</span>
<span class="c1">;;     :answer "Copernicus"</span>
<span class="c1">;;     :value "$200"</span>
<span class="c1">;;     :question "..." ;; omitted</span>
<span class="c1">;;     :air_date "2004-12-31"</span>
<span class="c1">;;     :category "HISTORY")</span>
</code></pre></div></div>

<p>They’re (now) plists of 7 pairs. All of the keys are symbols, and, as
such, are interned and consuming very little memory. All of the values
are strings. Surely we can do better here. The strings can be interned
and the numbers can be turned into tagged integers. The :category
values would probably be good candidates for conversion into symbols.</p>

<p>Here’s an interesting fact about Jeopardy! that can be exploited for
our purposes. While Jeopardy! covers a broad range of trivia,
<a href="http://vimeo.com/29001512">it does so very shallowly</a>. The same answers appear many
times. For example, the very first answer from our dataset,
Copernicus, appears 14 times. That makes even the answers good
candidates for interning.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">question</span> <span class="nv">across</span> <span class="nv">jeopardy</span>
         <span class="nv">for</span> <span class="nv">answer</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">question</span> <span class="ss">:answer</span><span class="p">)</span>
         <span class="nb">count</span> <span class="p">(</span><span class="nb">string=</span> <span class="nv">answer</span> <span class="s">"Copernicus"</span><span class="p">))</span>
<span class="c1">;; =&gt; 14</span>
</code></pre></div></div>

<p>A string pool is trivial to implement. Just use a weak, <code class="language-plaintext highlighter-rouge">equal</code> hash
table to track strings. Making it weak keeps it from leaking memory by
holding onto strings for longer than necessary.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">string-pool</span>
  <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:test</span> <span class="ss">'equal</span> <span class="ss">:weakness</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">intern-string</span> <span class="p">(</span><span class="nb">string</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">gethash</span> <span class="nb">string</span> <span class="nv">string-pool</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">gethash</span> <span class="nb">string</span> <span class="nv">string-pool</span><span class="p">)</span> <span class="nb">string</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">jeopardy-fix</span> <span class="p">(</span><span class="nv">question</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="p">(</span><span class="nv">key</span> <span class="nv">value</span><span class="p">)</span> <span class="nv">on</span> <span class="nv">question</span> <span class="nv">by</span> <span class="nf">#'</span><span class="nb">cddr</span>
           <span class="nv">collect</span> <span class="nv">key</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nv">cl-case</span> <span class="nv">key</span>
                     <span class="p">(</span><span class="ss">:show_number</span> <span class="p">(</span><span class="nb">read</span> <span class="nv">value</span><span class="p">))</span>
                     <span class="p">(</span><span class="ss">:value</span> <span class="p">(</span><span class="k">if</span> <span class="nv">value</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nv">substring</span> <span class="nv">value</span> <span class="mi">1</span><span class="p">))))</span>
                     <span class="p">(</span><span class="ss">:category</span> <span class="p">(</span><span class="nb">intern</span> <span class="nv">value</span><span class="p">))</span>
                     <span class="p">(</span><span class="nv">otherwise</span> <span class="p">(</span><span class="nv">intern-string</span> <span class="nv">value</span><span class="p">)))))</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">jeopardy-interned</span>
  <span class="p">(</span><span class="nv">cl-map</span> <span class="ss">'vector</span> <span class="nf">#'</span><span class="nv">jeopardy-fix</span> <span class="nv">jeopardy</span><span class="p">))</span>
</code></pre></div></div>

<p>So how are we looking now?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="nv">jeopardy-interned</span><span class="p">)</span>
<span class="c1">;; =&gt; 83254322</span>
</code></pre></div></div>

<p>That’s down to 79MB of memory. Not bad! If we <code class="language-plaintext highlighter-rouge">print-circle</code> this,
taking advantage of string interning in the printed representation, I
wonder how it compares to the original JSON.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">print-circle</span> <span class="no">t</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">prin1</span> <span class="nv">jeopardy-interned</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">buffer-size</span><span class="p">)))</span>
<span class="c1">;; =&gt; 45554437</span>
</code></pre></div></div>

<p>About 44MB, down from JSON’s 53MB. With <code class="language-plaintext highlighter-rouge">print-circle</code> set to nil it’s about
48MB.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Byte-code Internals</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/01/04/"/>
    <id>urn:uuid:c03869b5-fca0-3f9e-8dda-c3f361b287a8</id>
    <updated>2014-01-04T05:07:26Z</updated>
    <category term="emacs"/><category term="lisp"/><category term="elisp"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>Byte-code compilation is an underdocumented — and in the case of the
recent lexical binding updates, undocumented — part of Emacs. Most
users know that Elisp is usually compiled into a byte-code saved to
<code class="language-plaintext highlighter-rouge">.elc</code> files, and that byte-code loads and runs faster than uncompiled
Elisp. That’s all users really need to know, and the <em>GNU Emacs Lisp
Reference Manual</em> specifically discourages poking around too much.</p>

<blockquote>
  <p><strong>People do not write byte-code;</strong> that job is left to the byte
compiler. But we provide a disassembler to satisfy a cat-like
curiosity.</p>
</blockquote>

<p>Screw that! What if I want to handcraft some byte-code myself? :-) The
purpose of this article is to introduce the internals of the Elisp
byte-code interpreter. I will explain how it works, why lexically
scoped code is faster, and demonstrate writing some byte-code by hand.</p>

<h3 id="the-humble-stack-machine">The Humble Stack Machine</h3>

<p>The byte-code interpreter is a simple stack machine. The stack holds
arbitrary lisp objects. The interpreter is backwards compatible but
not forwards compatible (old versions can’t run new byte-code). Each
instruction is between 1 and 3 bytes. The first byte is the opcode and
the second and third bytes are either a single operand or a single
immediate value. Some operands are packed into the opcode byte.</p>

<p>As of this writing (Emacs 24.3) there are 142 opcodes, 6 of which have
been declared obsolete. Most opcodes refer to commonly used built-in
functions for fast access. (Looking at the selection, Elisp really is
geared towards text!) Considering packed operands, there are up to 27
potential opcodes unused, reserved for the future.</p>

<ul>
  <li>opcodes 48 - 55</li>
  <li>opcode 97</li>
  <li>opcode 128</li>
  <li>opcodes 169 - 174</li>
  <li>opcodes 180 - 181</li>
  <li>opcodes 183 - 191</li>
</ul>

<p>The easiest place to access the opcode listing is in
<a href="http://cvs.savannah.gnu.org/viewvc/emacs/emacs/lisp/emacs-lisp/bytecomp.el?view=markup">bytecomp.el</a>. Beware that some of the opcode comments are
currently out of date.</p>

<h3 id="segmentation-fault-warning">Segmentation Fault Warning</h3>

<p>Byte-code does not offer the same safety as normal Elisp. <strong>Bad
byte-code can, and will, cause Emacs to crash.</strong> You can try it out for
yourself right now,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'
</code></pre></div></div>

<p>Or evaluate the code manually in a buffer (save everything first!),</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="err">#</span><span class="nv">[0</span> <span class="s">"\300\207"</span> <span class="nv">[]</span> <span class="nv">0]</span><span class="p">)</span>
</code></pre></div></div>

<p>This segfault, caused by referencing beyond the end of the constants
vector, is <em>not</em> an Emacs bug. Doing a boundary test would slow down
the byte-code interpreter. Not performing this test at run-time is a
practical engineering decision. The Emacs developers have instead
chosen to rely on valid byte-code output from the compiler, making a
disclaimer to anyone wanting to write their own byte-code,</p>

<blockquote>
  <p>You should not try to come up with the elements for a byte-code
function yourself, because if they are inconsistent, Emacs may crash
when you call the function. Always leave it to the byte compiler to
create these objects; it makes the elements consistent (we hope).</p>
</blockquote>

<p>You’ve been warned. Now it’s time to start playing with firecrackers.</p>

<h3 id="the-byte-code-object">The Byte-code Object</h3>

<p>A byte-code object is functionally equivalent to a normal Elisp vector
<em>except</em> that it can be evaluated as a function. Elements are accessed
in constant time, the syntax is similar to vector syntax (<code class="language-plaintext highlighter-rouge">[...]</code> vs.
<code class="language-plaintext highlighter-rouge">#[...]</code>), and it can be of any length, though valid functions must
have at least 4 elements.</p>

<p>There are two ways to create a byte-code object: using a byte-code
object literal or with <code class="language-plaintext highlighter-rouge">make-byte-code</code>.
<a href="/blog/2012/07/17/">Like vector literals</a>, byte-code literals don’t need to be
quoted.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">""</span> <span class="nv">[]</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "" [] 0]</span>

<span class="err">#</span><span class="nv">[1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="nv">4]</span>
<span class="c1">;; =&gt; #[1 2 3 4]</span>

<span class="p">(</span><span class="err">#</span><span class="nv">[0</span> <span class="s">""</span> <span class="nv">[]</span> <span class="nv">0]</span><span class="p">)</span>
<span class="c1">;; error: Invalid byte opcode</span>
</code></pre></div></div>

<p>The elements of an object literal are:</p>

<ul>
  <li>Function parameter (lambda) list</li>
  <li>Unibyte string of byte-code</li>
  <li>Constants vector</li>
  <li>Maximum stack usage</li>
  <li>Docstring (optional, nil for none)</li>
  <li>Interactive specification (optional)</li>
</ul>

<h4 id="parameter-list">Parameter List</h4>

<p>The parameter list takes on two different forms depending on whether
the function is lexically or dynamically scoped. If the function is
dynamically scoped, the argument list is exactly what appears in lisp
code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="k">&amp;optional</span> <span class="nv">c</span><span class="p">)))</span>
<span class="c1">;; =&gt; #[(a b &amp;optional c) "\300\207" [nil] 1]</span>
</code></pre></div></div>

<p>There’s really no shorter way to represent the parameter list because
preserving the argument names is critical. Remember that, in dynamic
scope, while the function body is being evaluated these variables are
<em>globally</em> bound (eww!) to the function’s arguments.</p>

<p>When the function is lexically scoped, the parameter list is packed
into an Elisp integer, indicating the counts of the different kinds of
parameters: required, <code class="language-plaintext highlighter-rouge">&amp;optional</code>, and <code class="language-plaintext highlighter-rouge">&amp;rest</code>.</p>

<p><img src="/img/diagram/elisp-params.png" alt="" /></p>

<p>The least significant 7 bits indicate the number of required
arguments. Notice that this limits compiled, lexically-scoped
functions to 127 required arguments. The 8th bit is the number of
<code class="language-plaintext highlighter-rouge">&amp;rest</code> arguments (up to 1). The remaining bits indicate the total
number of optional and required arguments (not counting <code class="language-plaintext highlighter-rouge">&amp;rest</code>). It’s
really easy to parse these in your head when viewed as hexadecimal
because each portion almost always fits inside its own “digit.”</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">())</span>
<span class="c1">;; =&gt; #x000  (0 args, 0 rest, 0 required)</span>

<span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
<span class="c1">;; =&gt; #x202  (2 args, 0 rest, 2 required)</span>

<span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="k">&amp;optional</span> <span class="nv">c</span><span class="p">))</span>
<span class="c1">;; =&gt; #x302  (3 args, 0 rest, 2 required)</span>

<span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="k">&amp;optional</span> <span class="nv">c</span> <span class="k">&amp;rest</span> <span class="nv">d</span><span class="p">))</span>
<span class="c1">;; =&gt; #x382  (3 args, 1 rest, 2 required)</span>
</code></pre></div></div>

<p>The names of the arguments don’t matter in lexical scope: they’re
purely positional. This tighter argument specification is one of the
reasons lexical scope is faster: the byte-code interpreter doesn’t
need to parse the entire lambda list and assign all of the variables
on each function invocation.</p>
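
<p>The packing described above can be undone with a little bit
arithmetic. This is only a sketch: <code class="language-plaintext highlighter-rouge">my-decode-args-desc</code> is a
hypothetical helper, not part of Emacs, and it assumes the integer
layout given in this section.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-decode-args-desc (desc)
  "Unpack the integer parameter descriptor DESC into a plist.
Hypothetical helper for illustration only."
  (list :required (logand desc #x7f)          ; low 7 bits
        :rest (/= 0 (logand desc #x80))       ; 8th bit
        :max-positional (ash desc -8)))       ; remaining bits

(my-decode-args-desc #x382)
;; =&gt; (:required 2 :rest t :max-positional 3)

(my-decode-args-desc #x202)
;; =&gt; (:required 2 :rest nil :max-positional 2)
</code></pre></div></div>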

<h4 id="unibyte-string-byte-code">Unibyte String Byte-code</h4>

<p>The second element is a unibyte string — it strictly holds octets and
is not to be interpreted as any sort of Unicode encoding. These
strings should be created with <code class="language-plaintext highlighter-rouge">unibyte-string</code> because <code class="language-plaintext highlighter-rouge">string</code> may
return a multibyte string. To disambiguate the string type to the lisp
reader when higher values are present (&gt; 127), the strings are printed
in an escaped octal notation, keeping the string literal inside the
ASCII character set.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">unibyte-string</span> <span class="mi">100</span> <span class="mi">200</span> <span class="mi">250</span><span class="p">)</span>
<span class="c1">;; =&gt; "d\310\372"</span>
</code></pre></div></div>
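
<p>A quick way to check the difference between the two constructors is
<code class="language-plaintext highlighter-rouge">multibyte-string-p</code>. The following sketch compares the same three
values built both ways, then lists the octets of the unibyte version.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(list (multibyte-string-p (unibyte-string 100 200 250))
      (multibyte-string-p (string 100 200 250))  ; 200 and 250 are non-ASCII
      (append (unibyte-string 100 200 250) nil)) ; string -&gt; list of octets
;; =&gt; (nil t (100 200 250))
</code></pre></div></div>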

<p>It’s unusual to see a byte-code string that doesn’t end with 135
(#o207, byte-return). Perhaps this should have been implicit? I’ll
talk more about the byte-code below.</p>

<h4 id="constants-vector">Constants Vector</h4>

<p>The byte-code has very limited operands. Most operands are only a few
bits, some fill an entire byte, and occasionally two bytes. The meat
of the function, holding all of its constants, function symbols, and
variable symbols, is the constants vector. It’s a normal Elisp vector
and can be created with <code class="language-plaintext highlighter-rouge">vector</code> or a vector literal. Operands
index either into this vector or into the stack itself.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nv">my-func</span> <span class="nv">b</span> <span class="nv">a</span><span class="p">)))</span>
<span class="c1">;; =&gt; #[(a b) "\302\010\011\042\207" [b a my-func] 3]</span>
</code></pre></div></div>

<p>Note that the constants vector lists the variable symbols as well as
the external function symbol. If this was a lexically scoped function
the constants vector wouldn’t have the variables listed, being only
<code class="language-plaintext highlighter-rouge">[my-func]</code>.</p>

<h4 id="maximum-stack-usage">Maximum Stack Usage</h4>

<p>This is the maximum stack space used by this byte-code. This value can
be derived from the byte-code itself, but it’s pre-computed so that
the byte-code interpreter can quickly check for stack overflow.
Under-reporting this value is probably another way to crash Emacs.</p>

<h4 id="docstring">Docstring</h4>

<p>The simplest component and completely optional. It’s either the
docstring itself, or if the docstring is especially large it’s a cons
cell indicating a compiled <code class="language-plaintext highlighter-rouge">.elc</code> and a position for lazy access. Only
one position, the start, is needed because the lisp reader is used to
load it and it knows how to recognize the end.</p>

<h4 id="interactive-specification">Interactive Specification</h4>

<p>If this element is present and non-nil then the function is an
interactive function. It holds the exact contents of <code class="language-plaintext highlighter-rouge">interactive</code>
in the uncompiled function definition.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nv">interactive</span> <span class="s">"nNumber: "</span><span class="p">)</span> <span class="nv">n</span><span class="p">))</span>
<span class="c1">;; =&gt; #[(n) "\010\207" [n] 1 nil "nNumber: "]</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nv">interactive</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">read</span><span class="p">)))</span> <span class="nv">n</span><span class="p">))</span>
<span class="c1">;; =&gt; #[(n) "\010\207" [n] 1 nil (list (read))]</span>
</code></pre></div></div>

<p>The interactive expression is always interpreted, never byte-compiled.
This is usually fine because, by definition, this code is going to be
waiting on user input. However, it slows down keyboard macro playback.</p>
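
<p>The stored specification can be read back without touching the
object’s slots directly. A small sketch using the standard
<code class="language-plaintext highlighter-rouge">commandp</code> and <code class="language-plaintext highlighter-rouge">interactive-form</code> functions:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((fn (byte-compile '(lambda (n) (interactive "nNumber: ") n))))
  (list (commandp fn)            ; interactive, so it's a command
        (interactive-form fn)))  ; the spec survives compilation as-is
;; =&gt; (t (interactive "nNumber: "))
</code></pre></div></div>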

<h3 id="opcodes">Opcodes</h3>

<p>The bulk of the established opcode bytes is for variable, stack, and
constant access opcodes, most of which use packed operands.</p>

<ul>
  <li>0 - 7   : (<code class="language-plaintext highlighter-rouge">stack-ref</code>) stack reference</li>
  <li>8 - 15  : (<code class="language-plaintext highlighter-rouge">varref</code>) variable reference (from constants vector)</li>
  <li>16 - 23 : (<code class="language-plaintext highlighter-rouge">varset</code>) variable set (from constants vector)</li>
  <li>24 - 31 : (<code class="language-plaintext highlighter-rouge">varbind</code>) variable binding (from constants vector)</li>
  <li>32 - 39 : (<code class="language-plaintext highlighter-rouge">call</code>) function call (immediate = number of arguments)</li>
  <li>40 - 47 : (<code class="language-plaintext highlighter-rouge">unbind</code>) variable unbinding (immediate = number of bindings to undo)</li>
  <li>129, 192-255 : (<code class="language-plaintext highlighter-rouge">constant</code>) direct constants vector access</li>
</ul>

<p>Except for the last item, each kind of instruction comes in sets of 8.
The nth such instruction means access the nth thing. For example, the
instruction “<code class="language-plaintext highlighter-rouge">2</code>” copies the third stack item to the top of the stack.
An instruction of “<code class="language-plaintext highlighter-rouge">9</code>” pushes onto the stack the value of the
variable named by the second element listed in the constants vector.</p>

<p>However, the 7th and 8th such instructions in each set take an operand
byte or two. The 7th instruction takes a 1-byte operand and the 8th
takes a 2-byte operand. A 2-byte operand is written in little-endian
byte-order regardless of the host platform.</p>
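
<p>The three encodings can be sketched as a tiny encoder. Note that
<code class="language-plaintext highlighter-rouge">my-encode-packed</code> is a hypothetical helper written for this
illustration, not an Emacs function, and the base value 8 used below
is the first <code class="language-plaintext highlighter-rouge">varref</code> opcode from the list above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-encode-packed (base n)
  "Return the bytes for the opcode family starting at BASE with operand N.
Hypothetical helper for illustration only."
  (cond ((&lt; n 6)   (list (+ base n)))           ; operand packed into opcode
        ((&lt; n 256) (list (+ base 6) n))         ; 7th form: 1-byte operand
        (t         (list (+ base 7)             ; 8th form: 2-byte operand,
                         (logand n #xff)        ;   low byte first
                         (ash n -8)))))         ;   then high byte

(my-encode-packed 8 3)    ;; =&gt; (11)
(my-encode-packed 8 9)    ;; =&gt; (14 9)
(my-encode-packed 8 300)  ;; =&gt; (15 44 1)
</code></pre></div></div>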

<p>For example, let’s manually craft an instruction that returns the
value of the global variable <code class="language-plaintext highlighter-rouge">foo</code>. Each opcode has a named constant
of <code class="language-plaintext highlighter-rouge">byte-X</code> so we don’t have to worry about their actual byte-code
number.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'bytecomp</span><span class="p">)</span>  <span class="c1">; named opcodes</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo</span> <span class="s">"hello"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>                 <span class="c1">; no arguments</span>
    <span class="p">(</span><span class="nv">unibyte-string</span>
      <span class="p">(</span><span class="nb">+</span> <span class="mi">0</span> <span class="nv">byte-varref</span><span class="p">)</span>   <span class="c1">; ref variable under first constant</span>
      <span class="nv">byte-return</span><span class="p">)</span>        <span class="c1">; pop and return</span>
    <span class="nv">[foo]</span>                 <span class="c1">; constants</span>
    <span class="mi">1</span><span class="p">))</span>                   <span class="c1">; only using 1 stack space</span>

<span class="p">(</span><span class="nv">get-foo</span><span class="p">)</span>
<span class="c1">;; =&gt; "hello"</span>
</code></pre></div></div>

<p>Ta-da! That’s a handcrafted byte-code function. I left a “+ 0” in
there so that I can change the offset. This function has the exact
same behavior; it’s just less optimal:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>
    <span class="p">(</span><span class="nv">unibyte-string</span>
      <span class="p">(</span><span class="nb">+</span> <span class="mi">3</span> <span class="nv">byte-varref</span><span class="p">)</span>     <span class="c1">; 4th form of varref</span>
      <span class="nv">byte-return</span><span class="p">)</span>
    <span class="nv">[nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">foo]</span>
    <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">foo</code> was the 10th constant, we would need to use the 1-byte
operand version. Again, the same behavior, just less optimal.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>
    <span class="p">(</span><span class="nv">unibyte-string</span>
      <span class="p">(</span><span class="nb">+</span> <span class="mi">6</span> <span class="nv">byte-varref</span><span class="p">)</span>     <span class="c1">; 7th form of varref</span>
      <span class="mi">9</span>                     <span class="c1">; operand, (constant index 9)</span>
      <span class="nv">byte-return</span><span class="p">)</span>
    <span class="nv">[nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">foo]</span>
    <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>Dynamically-scoped code makes heavy use of <code class="language-plaintext highlighter-rouge">varref</code> but
lexically-scoped code rarely uses it (global variables only), instead
relying heavily on <code class="language-plaintext highlighter-rouge">stack-ref</code>, which is faster. This is where the
different calling conventions come into play.</p>

<h3 id="calling-convention">Calling Convention</h3>

<p>Each kind of scope gets its own calling convention. Here we finally
get to glimpse some of the really great work by Stefan Monnier
updating the compiler for lexical scope.</p>

<h4 id="dynamic-scope-calling-convention">Dynamic Scope Calling Convention</h4>

<p>Remembering back to the parameter list element of the byte-code
object, dynamically scoped functions keep track of all their argument
names. Before executing a function the interpreter examines the lambda
list and binds (<code class="language-plaintext highlighter-rouge">varbind</code>) every variable globally to an argument.</p>

<p>If the caller was byte-compiled, each argument started on the stack,
was popped and bound to a variable, and, to be accessed by the
function, is pushed right back onto the stack (<code class="language-plaintext highlighter-rouge">varref</code>). There’s
a lot of argument indirection for each function call.</p>

<h4 id="lexical-scope-calling-convention">Lexical Scope Calling Convention</h4>

<p>With lexical scope, the argument names are not actually bound while
the byte-code is evaluated. The names are completely gone because the
compiler has converted local variables into stack offsets.</p>

<p>When calling a lexically-scoped function, the byte-code interpreter
examines the integer parameter descriptor. It checks to make sure the
appropriate number of arguments have been provided, and for each
unprovided <code class="language-plaintext highlighter-rouge">&amp;optional</code> argument it pushes a nil onto the stack. If the
function has a <code class="language-plaintext highlighter-rouge">&amp;rest</code> parameter, any extra arguments are popped off
into a list and that list is pushed onto the stack.</p>
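
<p>That setup step can be modeled in plain Elisp. This is a rough sketch
of the behavior just described, not the interpreter’s actual code:
<code class="language-plaintext highlighter-rouge">my-initial-stack</code> is a hypothetical function that returns the
stack contents, bottom first, as a list.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-initial-stack (desc args)
  "Model the stack a lexical function sees for ARGS under descriptor DESC.
Hypothetical helper for illustration only."
  (let* ((required (logand desc #x7f))
         (rest (/= 0 (logand desc #x80)))
         (nonrest (ash desc -8))
         (nargs (length args)))
    ;; Argument count check.
    (when (or (&lt; nargs required)
              (and (not rest) (&gt; nargs nonrest)))
      (signal 'wrong-number-of-arguments (list desc nargs)))
    (let* ((positional (if (&gt; nargs nonrest)
                           (butlast args (- nargs nonrest))
                         args))
           ;; One nil per unprovided &amp;optional argument.
           (padding (make-list (max 0 (- nonrest nargs)) nil))
           (stack (append positional padding)))
      ;; Extra arguments are collected into a single list.
      (if rest
          (append stack (list (nthcdr nonrest args)))
        stack))))

(my-initial-stack #x382 '(1 2))        ;; =&gt; (1 2 nil nil)
(my-initial-stack #x382 '(1 2 3 4 5))  ;; =&gt; (1 2 3 (4 5))
</code></pre></div></div>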

<p>From here the function can access its arguments directly on the stack
without any named variable indirection. It can even consume them
directly.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t -*-</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>

<span class="p">(</span><span class="nb">symbol-function</span> <span class="nf">#'</span><span class="nv">foo</span><span class="p">)</span>
<span class="c1">;; =&gt; #[#x101 "\207" [] 2]</span>
</code></pre></div></div>

<p>The byte-code for <code class="language-plaintext highlighter-rouge">foo</code> is a single instruction: <code class="language-plaintext highlighter-rouge">return</code>. The
function’s argument is already on the stack so it doesn’t have to do
anything. Strangely the maximum stack usage element is wrong here (2),
but it won’t cause a crash.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; (As of this writing `byte-compile' always uses dynamic scope.)</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; =&gt; #[(x) "\010\207" [x] 1]</span>
</code></pre></div></div>

<p>It takes longer to set up (x is implicitly bound), it has to make an
explicit variable dereference (<code class="language-plaintext highlighter-rouge">varref</code>), then it has to clean up by
unbinding x (implicit <code class="language-plaintext highlighter-rouge">unbind</code>). It’s no wonder lexical scope is
faster!</p>

<p>Note that there’s also a <code class="language-plaintext highlighter-rouge">disassemble</code> function for examining
byte-code, but it only reveals part of the story.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span> <span class="nf">#'</span><span class="nv">foo</span><span class="p">)</span>
<span class="c1">;; byte code:</span>
<span class="c1">;;   args: (x)</span>
<span class="c1">;; 0       varref    x</span>
<span class="c1">;; 1       return</span>
</code></pre></div></div>

<h3 id="compiler-intermediate-lapcode">Compiler Intermediate “lapcode”</h3>

<p>The Elisp byte-compiler has an intermediate language called lapcode
(“Lisp Assembly Program”), which is much easier to optimize than
byte-code. It’s basically an assembly language built out of
s-expressions. Opcodes are referenced by name and operands, including
packed operands, are handled whole. Each instruction is a cons cell,
<code class="language-plaintext highlighter-rouge">(opcode . operand)</code>, and a program is a list of these.</p>

<p>Let’s rewrite our last <code class="language-plaintext highlighter-rouge">get-foo</code> using lapcode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>
    <span class="p">(</span><span class="nv">byte-compile-lapcode</span>
      <span class="o">'</span><span class="p">((</span><span class="nv">byte-varref</span> <span class="o">.</span> <span class="mi">9</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">byte-return</span><span class="p">)))</span>
    <span class="nv">[nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">foo]</span>
    <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>We didn’t have to worry about which form of <code class="language-plaintext highlighter-rouge">varref</code> we were using or
even how to encode a 2-byte operand. The lapcode “assembler” took care
of that detail.</p>

<h3 id="project-ideas">Project Ideas?</h3>

<p>The Emacs byte-code compiler and interpreter are fascinating. Having
spent time studying them I’m really tempted to build a project on top
of it all. Perhaps implementing a programming language that targets
the byte-code interpreter, improving compiler optimization, or, for a
really big project, JIT compiling Emacs byte-code.</p>

<p><strong>People <em>can</em> write byte-code!</strong></p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Lisp Readable Closures</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/30/"/>
    <id>urn:uuid:84f86fb6-e029-3a57-1450-1d25be3fdee0</id>
    <updated>2013-12-30T23:52:38Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<p>I’ve stated before that one of the unique features of Emacs Lisp is
that its closures are <em>readable</em>. Closures can be serialized by the
printer and read back in with the reader. I am unaware of any other
programming language that has this feature. In fact it’s essential for
Elisp byte-code compilation because byte-compiled Elisp files are
merely s-expressions of byte-code dumped out as source.</p>

<h3 id="lisp-printing">Lisp Printing</h3>

<p>The Lisp family of languages is <em>homoiconic</em>. Lisp source code is
written in the syntax of its own data structures, s-expressions. Since
a compiler/interpreter is usually provided at run-time, a consequence
of this is that reading and printing are fundamental features of
Lisps. A value can be handed to the printer, which will serialize the
value into an s-expression as a sequence of characters. Later on the
reader can parse the s-expression back into an <code class="language-plaintext highlighter-rouge">equal</code> value.</p>
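
<p>In Emacs Lisp that round trip is a one-liner. A small sketch with an
arbitrary sample value:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((value '(1 "two" [3 4.0] (a . b))))
  ;; Print to a string, read it back, and compare structurally.
  (equal value (read (prin1-to-string value))))
;; =&gt; t
</code></pre></div></div>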

<p>To compare, JavaScript originally had half of this in place.
JavaScript has convenient object syntax for defining an associative
array, known today as JSON. The <code class="language-plaintext highlighter-rouge">eval</code> function could (dangerously) be
used as a reader for parsing a string containing JSON-encoded data
into a value. But until <code class="language-plaintext highlighter-rouge">JSON.stringify()</code> became standard, developers
had to write their own printer. Lisp s-expression syntax is much more
powerful (and complicated) than JSON, maintaining
<a href="/blog/2013/03/28/">both identity and cycles</a> (e.g. <code class="language-plaintext highlighter-rouge">*print-circle*</code>).</p>

<p>Not all values can be read. They’ll still print (when <code class="language-plaintext highlighter-rouge">*print-readably*</code>
is nil) but will do so using special syntax that will signal an error
in the reader: <code class="language-plaintext highlighter-rouge">#&lt;</code>. For example, in Emacs Lisp buffers cannot be
serialized so they print using this syntax.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">prin1-to-string</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
<span class="c1">;; =&gt; "#&lt;buffer *scratch*&gt;"</span>
</code></pre></div></div>

<p>It doesn’t matter what’s between the angle brackets, or even that
there’s a closing angle bracket. The reader will signal an error as
soon as it hits a <code class="language-plaintext highlighter-rouge">#&lt;</code>.</p>
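
<p>The error can be observed by catching it. This sketch assumes the
reader reports the problem as <code class="language-plaintext highlighter-rouge">invalid-read-syntax</code>, which is how
current Emacs versions signal it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(condition-case err
    (read "#&lt;buffer *scratch*&gt;")
  (invalid-read-syntax (car err)))  ; return just the error symbol
;; =&gt; invalid-read-syntax
</code></pre></div></div>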

<h4 id="almost-everything-prints-readably">Almost Everything Prints Readably</h4>

<p>Elisp has a small set of primitive data types. All of these primitive
types print readably:</p>

<ul>
  <li>integer (<code class="language-plaintext highlighter-rouge">1024</code>, <code class="language-plaintext highlighter-rouge">?a</code>)</li>
  <li>float (<code class="language-plaintext highlighter-rouge">1.7</code>)</li>
  <li>cons/list (<code class="language-plaintext highlighter-rouge">(...)</code>)</li>
  <li>vector (one-dimensional, <code class="language-plaintext highlighter-rouge">[...]</code>)</li>
  <li>bool-vector (<code class="language-plaintext highlighter-rouge">#&amp;n"..."</code>)</li>
  <li>string (<code class="language-plaintext highlighter-rouge">"..."</code>)</li>
  <li>char-table (<code class="language-plaintext highlighter-rouge">#^[...]</code>)</li>
  <li>hash-table (readable as of Emacs 23.3, <code class="language-plaintext highlighter-rouge">#s(hash-table ...)</code>)</li>
  <li>byte-code function object (<code class="language-plaintext highlighter-rouge">#[...]</code>)</li>
  <li>symbol</li>
</ul>

<p>Here are all the non-readable types. Each one has a good reason for
not being serializable.</p>

<ul>
  <li>buffer</li>
  <li>process (external state)</li>
  <li>frame (user interface element)</li>
  <li>marker (live, automatically updates)</li>
  <li>overlay (belongs to a buffer)</li>
  <li>built-in functions (native code)</li>
  <li>user-ptr (opaque pointers from Emacs 25 dynamic modules)</li>
</ul>

<p>And that’s it. Every other value in Elisp is constructed from one or
more of these primitives, including keymaps, functions, macros, syntax
tables, <code class="language-plaintext highlighter-rouge">defstruct</code> structs, and EIEIO objects. This means that as
long as these values don’t refer to an unreadable value, they
themselves can be printed.</p>

<p>An interesting note here is that, unlike the Common Lisp Object System
(CLOS), EIEIO objects are readable by default. To Elisp they’re just
vectors, so of course they print. CLOS objects are unreadable without
manually defining a print method per class.</p>

<h3 id="elisp-closures">Elisp Closures</h3>

<p>Elisp got lexical scoping in Emacs 24, released in June 2012. It’s now
one of the relatively few languages to have both dynamic and lexical
scope. Like Common Lisp, variables declared with <code class="language-plaintext highlighter-rouge">defvar</code> (and family)
continue to have dynamic scope. For backwards compatibility with old
Lisp code, lexical scope is disabled by default. It’s enabled for a
specific file or buffer by setting <code class="language-plaintext highlighter-rouge">lexical-binding</code> to non-nil.</p>

<p>With lexical scope, anonymous functions become closures, a powerful
functional programming primitive: a function plus a captured lexical
environment. It also provides some performance benefits. In my own
tests, compiled Elisp with lexical scope enabled is about 10% to 15%
faster than with the default dynamic scope.</p>

<p>What do closures look like in Emacs Lisp? They take one of two forms
depending on whether the closure is compiled or not. For example,
consider this function, <code class="language-plaintext highlighter-rouge">foo</code>, that takes two arguments and returns a
closure that returns the first argument.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">x</span><span class="p">))</span>

<span class="p">(</span><span class="nv">foo</span> <span class="ss">:bar</span> <span class="ss">:ignored</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((y . :ignored) (x . :bar) t) () x)</span>
</code></pre></div></div>

<p>An uncompiled closure is a list beginning with the symbol <code class="language-plaintext highlighter-rouge">closure</code>.
The second element is the lexical environment, the third is the
argument list (lambda list), and the rest is the body of the function.
Here we can see that both <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> have been “closed over.” This is
a little bit sloppy because the function never makes use of <code class="language-plaintext highlighter-rouge">y</code>.
Capturing it has a few problems.</p>

<ul>
  <li>The closure has a larger footprint than necessary.</li>
  <li>Values are held longer than necessary, delaying collection.</li>
  <li>It affects the readability of the closure, which I’ll get to later.</li>
</ul>

<p>Fortunately the compiler is smart enough to see this and will avoid
capturing unused variables. To prove this, I’ve now compiled <code class="language-plaintext highlighter-rouge">foo</code> so
that it returns a compiled closure.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">foo</span> <span class="ss">:bar</span> <span class="ss">:ignored</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" [:bar] 1]</span>
</code></pre></div></div>

<p>What’s returned here is a byte-code function object, with the <code class="language-plaintext highlighter-rouge">#[...]</code>
syntax. It has these elements:</p>

<ol>
  <li>The function’s lambda list (zero arguments)</li>
  <li>Byte-codes stored in a unibyte string</li>
  <li>Constants vector</li>
  <li>Maximum stack space needed by this function</li>
</ol>

<p>Notice that the lexical environment has been captured in the constants
vector, specifically noting the lack of <code class="language-plaintext highlighter-rouge">:ignored</code> in this vector. The
compiler didn’t capture it.</p>

<p>For those curious about the byte-code here’s an explanation. The
string syntax shown is in octal, representing a string containing two
bytes: 192 and 135. The
<a href="/blog/2014/01/04/">Elisp byte-code interpreter is stack-based</a>. The 192
(<code class="language-plaintext highlighter-rouge">constant 0</code>) says to push the first constant onto the stack. The 135
(<code class="language-plaintext highlighter-rouge">return</code>) says to pop the top element from the stack and return it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">coerce</span> <span class="s">"\300\207"</span> <span class="ss">'list</span><span class="p">)</span>
<span class="c1">;; =&gt; (192 135)</span>
</code></pre></div></div>

<h3 id="the-readable-closures-catch">The Readable Closures Catch</h3>

<p>Since closures are byte-code function objects, they print readably.
You can capture an environment in a closure, serialize it, read it
back in, and evaluate it. That’s pretty cool! This means closures can
be transmitted to other Emacs instances in a multi-processing setup
(e.g. <a href="https://github.com/nicferrier/elnode">Elnode</a>, <a href="https://github.com/jwiegley/emacs-async">Async</a>).</p>
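
<p>Here’s a minimal sketch of that round trip. To keep it independent of
how the file is compiled, it builds the closure by hand with
<code class="language-plaintext highlighter-rouge">make-byte-code</code> (bytes 192 and 135, with <code class="language-plaintext highlighter-rouge">:bar</code> as the only
constant) rather than relying on the compiler.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((closure (make-byte-code 0 (unibyte-string 192 135) [:bar] 1))
       (printed (prin1-to-string closure))  ; serialize
       (revived (read printed)))            ; deserialize
  (list (byte-code-function-p revived)
        (funcall revived)))
;; =&gt; (t :bar)
</code></pre></div></div>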

<p>The catch is that it’s easy to accidentally capture an unreadable
value, especially buffers. Consider this function <code class="language-plaintext highlighter-rouge">bar</code> which uses a
temporary buffer as an efficient string builder. It returns a closure
that returns the result. (Weird, but stick with me here!)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">bar</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">i</span> <span class="nv">from</span> <span class="mi">0</span> <span class="nv">to</span> <span class="nv">n</span> <span class="nb">do</span> <span class="p">(</span><span class="nb">princ</span> <span class="nv">i</span><span class="p">))</span>
      <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">string</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
        <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nb">string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The compiled form looks fine,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">bar</span> <span class="mi">3</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" ["0123"] 1]</span>
</code></pre></div></div>

<p>But the interpreted form of the closure has a problem. The
<code class="language-plaintext highlighter-rouge">with-temp-buffer</code> macro silently introduced a new binding — an
abstraction leak.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">bar</span> <span class="mi">3</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((string . "0123")</span>
<span class="c1">;;              (temp-buffer . #&lt;killed buffer&gt;)</span>
<span class="c1">;;              (n . 3) t)</span>
<span class="c1">;;      () string)</span>
</code></pre></div></div>

<p>The temporary buffer is mistakenly captured in the closure, making it
unreadable, but <em>only</em> in its uncompiled form. This creates the
awkward situation where compiled and uncompiled code have <a href="/blog/2016/12/22/#accidental-closures">different
behavior</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Clojure-style Multimethods in Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/18/"/>
    <id>urn:uuid:029f9acb-a29f-3e58-14f3-457f245cdb5d</id>
    <updated>2013-12-18T23:06:15Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="clojure"/>
    <content type="html">
      <![CDATA[<p>This past week I added <a href="http://clojure.org/multimethods">Clojure-style multimethods</a> to Emacs
Lisp through a package I call <code class="language-plaintext highlighter-rouge">predd</code> (predicate dispatch). <strong>I
believe it is Elisp’s very first complete <em>multiple dispatch</em> object
system!</strong> That is, methods are dispatched based on the dynamic,
run-time type of <a href="http://en.wikipedia.org/wiki/Multimethods">more than one of their arguments</a>.</p>

<ul>
  <li><a href="https://github.com/skeeto/predd">https://github.com/skeeto/predd</a></li>
</ul>

<p>(Unfortunately I was unaware of the
<a href="https://github.com/kurisuwhyte/emacs-multi">other Clojure-style multimethod library</a> when I wrote mine.
However, my version is <em>much</em> more complete, has better performance,
and is public domain.)</p>

<p>As of version 23.2, Emacs includes a CLOS-like object system cleverly
named EIEIO. While CLOS (Common Lisp Object System) is multiple
dispatch, EIEIO is, like most object systems, only <em>single dispatch</em>.
The predd package is also very different than my other Elisp object
system, <a href="/blog/2013/04/07/">@</a>, which was prototype based and, therefore, also single
dispatch (and comically slow).</p>

<p>The <a href="http://clojure.org/multimethods">Clojure multimethods documentation</a> provides a good
introduction. The predd package works almost exactly the same way,
except that due to Elisp’s lack of namespacing the function names are
prefixed with <code class="language-plaintext highlighter-rouge">predd-</code>. Also different is that the optional hierarchy
(<code class="language-plaintext highlighter-rouge">h</code>) argument is handled by the dynamic variable <code class="language-plaintext highlighter-rouge">predd-hierarchy</code>,
which holds the global hierarchy.</p>

<h3 id="combination-example">Combination Example</h3>

<p>To define a multimethod, pick a name and give it a <em>classifier
function</em>. The classifier function will look at the method’s arguments
and return a <em>dispatch value</em>. This value is used to select a
particular method. What makes predd a multiple dispatch system is that the
dispatch value can be derived from any number of the method’s arguments.
Because the dispatch value is computed at run-time, this is called
<em>late binding</em>.</p>

<p>Here I’m going to define a multimethod called <code class="language-plaintext highlighter-rouge">combine</code> that takes two
arguments. It combines its arguments appropriately depending on their
dynamic run-time types.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmulti</span> <span class="nv">combine</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="s">"Appropriately combine A and B."</span><span class="p">)</span>
</code></pre></div></div>

<p>The classifier uses <code class="language-plaintext highlighter-rouge">type-of</code>, an Elisp built-in, to examine its
argument types. It returns them as a tuple in the form of a vector. The
classifier of a method can be accessed with <code class="language-plaintext highlighter-rouge">predd-classifier</code>, which
I’ll use to demonstrate what these dispatch values will look like.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">predd-classifier</span> <span class="ss">'combine</span><span class="p">)</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">)</span>    <span class="c1">; =&gt; [integer integer]</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">predd-classifier</span> <span class="ss">'combine</span><span class="p">)</span> <span class="mi">1</span> <span class="s">"2"</span><span class="p">)</span>  <span class="c1">; =&gt; [integer string]</span>
</code></pre></div></div>

<p>I chose a vector for the dispatch value because I like the bracket
style when defining methods (you’ll see below). The dispatch value can
be literally anything that <code class="language-plaintext highlighter-rouge">equal</code> knows how to compare, not just
vectors. Note that it’s actually faster to create a list than a vector
up to a length of about 6, so this multimethod would be faster if the
classifier returned a list — or even better: a single cons.</p>
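
<p>To make the role of <code class="language-plaintext highlighter-rouge">equal</code> concrete, here’s a rough sketch of
classifier-based dispatch as a lookup in an <code class="language-plaintext highlighter-rouge">equal</code>-tested hash table.
This is not predd’s actual implementation, it ignores hierarchies
entirely, and the <code class="language-plaintext highlighter-rouge">sketch-</code> names are made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar sketch-methods (make-hash-table :test 'equal)
  "Hypothetical table mapping dispatch values to functions.")

(defun sketch-classify (a b)
  "Return a dispatch value for A and B, like the classifier above."
  (vector (type-of a) (type-of b)))

(defun sketch-combine (a b)
  "Look up the method for A and B by dispatch value and call it."
  (let* ((value (sketch-classify a b))
         (method (gethash value sketch-methods)))
    (if method
        (funcall method a b)
      (error "No method found for %S" value))))

;; Any key that `equal' can compare works, including vectors.
(puthash [integer integer] #'+ sketch-methods)
(puthash [string string] #'concat sketch-methods)

(sketch-combine 1 2)      ; =&gt; 3
(sketch-combine "a" "b")  ; =&gt; "ab"
</code></pre></div></div>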

<p>Now define some methods for different dispatch values.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">combine</span> <span class="nv">[integer</span> <span class="nv">integer]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>

<span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">combine</span> <span class="nv">[string</span> <span class="nv">string]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">concat</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>

<span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">combine</span> <span class="nv">[cons</span> <span class="nv">cons]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">append</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
</code></pre></div></div>

<p>Now try it out.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">combine</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">)</span>            <span class="c1">; =&gt; 3</span>
<span class="p">(</span><span class="nv">combine</span> <span class="s">"a"</span> <span class="s">"b"</span><span class="p">)</span>        <span class="c1">; =&gt; "ab"</span>
<span class="p">(</span><span class="nv">combine</span> <span class="o">'</span><span class="p">(</span><span class="mi">1</span> <span class="mi">2</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="mi">3</span> <span class="mi">4</span><span class="p">))</span>  <span class="c1">; =&gt; (1 2 3 4)</span>

<span class="p">(</span><span class="nv">combine</span> <span class="mi">1</span> <span class="o">'</span><span class="p">(</span><span class="mi">3</span> <span class="mi">4</span><span class="p">))</span>
<span class="c1">; error: "No method found in combine for [integer cons]"</span>
</code></pre></div></div>

<p>Notice in the last case it didn’t know how to combine these two types,
so it threw an error. In this simple example each method just calls
a single function, so rather than use the <code class="language-plaintext highlighter-rouge">predd-defmethod</code> macro
these methods can be added directly with the <code class="language-plaintext highlighter-rouge">predd-add-method</code>
function. This has the exact same result except that it has slightly
better performance (no wrapper functions).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[integer</span> <span class="nv">integer]</span> <span class="nf">#'</span><span class="nb">+</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[string</span> <span class="nv">string]</span>   <span class="nf">#'</span><span class="nv">concat</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[cons</span> <span class="nv">cons]</span>       <span class="nf">#'</span><span class="nb">append</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="use-the-hierarchy">Use the Hierarchy</h4>

<p>Hmmm, the <code class="language-plaintext highlighter-rouge">+</code> function is already polymorphic. It seamlessly operates
on both floats and integers. So far it seems there’s no way to exploit
this with multimethods. Fortunately we can solve this by defining our
own ad hoc hierarchy using <code class="language-plaintext highlighter-rouge">predd-derive</code>. Both integers and floats
are a kind of number. It’s important to note that <code class="language-plaintext highlighter-rouge">type-of</code> never
returns <code class="language-plaintext highlighter-rouge">number</code>. We’re introducing that name here ourselves.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">type-of</span> <span class="mf">1.0</span><span class="p">)</span>  <span class="c1">; =&gt; float</span>

<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'integer</span> <span class="ss">'number</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'float</span> <span class="ss">'number</span><span class="p">)</span>

<span class="c1">;; Types can derive from multiple parents, like multiple inheritance</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'integer</span> <span class="ss">'exact</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'float</span> <span class="ss">'inexact</span><span class="p">)</span>
</code></pre></div></div>

<p>This says that <code class="language-plaintext highlighter-rouge">integer</code> and <code class="language-plaintext highlighter-rouge">float</code> are each a kind of <code class="language-plaintext highlighter-rouge">number</code>. Now
we can use <code class="language-plaintext highlighter-rouge">number</code> in a dispatch value. When it sees something like
<code class="language-plaintext highlighter-rouge">[float integer]</code> it knows that it matches <code class="language-plaintext highlighter-rouge">[number number]</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[number</span> <span class="nv">number]</span> <span class="nf">#'</span><span class="nb">+</span><span class="p">)</span>

<span class="p">(</span><span class="nv">combine</span> <span class="mf">1.5</span> <span class="mi">2</span><span class="p">)</span>  <span class="c1">; =&gt; 3.5</span>
</code></pre></div></div>

<p>We can check the hierarchy explicitly with <code class="language-plaintext highlighter-rouge">predd-isa-p</code> (like
Clojure’s <code class="language-plaintext highlighter-rouge">isa?</code>). It compares two values just like <code class="language-plaintext highlighter-rouge">equal</code>, but it
also accounts for all <code class="language-plaintext highlighter-rouge">predd-derive</code> declarations. Because of this
extra concern, unlike <code class="language-plaintext highlighter-rouge">equal</code>, <code class="language-plaintext highlighter-rouge">predd-isa-p</code> is <em>not</em> commutative.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-isa-p</span> <span class="ss">'number</span> <span class="ss">'number</span><span class="p">)</span>  <span class="c1">; =&gt; 0</span>
<span class="p">(</span><span class="nv">predd-isa-p</span> <span class="ss">'float</span> <span class="ss">'number</span><span class="p">)</span>   <span class="c1">; =&gt; 1</span>
<span class="p">(</span><span class="nv">predd-isa-p</span> <span class="ss">'number</span> <span class="ss">'float</span><span class="p">)</span>   <span class="c1">; =&gt; nil</span>

<span class="p">(</span><span class="nv">predd-isa-p</span> <span class="nv">[float</span> <span class="nv">float]</span> <span class="nv">[number</span> <span class="nv">number]</span><span class="p">)</span>  <span class="c1">; =&gt; 2</span>
</code></pre></div></div>

<p>(Remember that <code class="language-plaintext highlighter-rouge">0</code> is truthy in Elisp.) The integer returned is a
distance metric used by method dispatch to determine which values are
“closer” so that the most appropriate method is selected.</p>
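
<p>I haven’t shown how predd computes this metric, so take the following
as one plausible sketch rather than the real thing: count the derivation
steps between two values, and for vectors of equal length sum the
distances element by element. The hierarchy table and the <code class="language-plaintext highlighter-rouge">sketch-</code>
names are made up, but the results agree with the examples above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar sketch-parents '((integer number exact)
                         (float number inexact))
  "Hypothetical hierarchy: each entry is (CHILD PARENT...).")

(defun sketch-isa-p (child parent)
  "Return the distance from CHILD to PARENT, or nil if unrelated."
  (cond
   ;; Identical values are at distance zero.
   ((equal child parent) 0)
   ;; Vectors of the same length: sum the element distances.
   ((and (vectorp child) (vectorp parent)
         (= (length child) (length parent)))
    (let ((total 0))
      (dotimes (i (length child))
        (let ((d (and total
                      (sketch-isa-p (aref child i) (aref parent i)))))
          (setq total (and d (+ total d)))))
      total))
   ;; Otherwise walk up through each declared parent, keeping the
   ;; shortest path found.
   (t (let (best)
        (dolist (p (cdr (assq child sketch-parents)) best)
          (let ((d (sketch-isa-p p parent)))
            (when (and d (or (null best) (&lt; (1+ d) best)))
              (setq best (1+ d)))))))))

(sketch-isa-p 'number 'number)                  ; =&gt; 0
(sketch-isa-p 'float 'number)                   ; =&gt; 1
(sketch-isa-p 'number 'float)                   ; =&gt; nil
(sketch-isa-p [float float] [number number])    ; =&gt; 2
</code></pre></div></div>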

<p>You might be worried that introducing <code class="language-plaintext highlighter-rouge">number</code> will make the
multimethod slower. Examining the hierarchy will definitely have a
cost after all. Fortunately predd has a dispatch cache, so
introducing this indirection will have <em>no</em> additional performance
penalty after the first call with a particular dispatch value.</p>

<h3 id="struct-example">Struct Example</h3>

<p>Something that really sets these multimethods apart from other object
systems is a lack of concern about encapsulation — or really about
object data in general. That’s the classifier’s concern. So here’s an
example of how to combine predd with <code class="language-plaintext highlighter-rouge">defstruct</code> from cl/cl-lib.</p>

<p>Imagine we’re making some kind of game where each of the creatures is
represented by an <code class="language-plaintext highlighter-rouge">actor</code> struct. Each actor has a name, hit points,
and active status effects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defstruct</span> <span class="nv">actor</span>
  <span class="p">(</span><span class="nv">name</span> <span class="s">"Unknown"</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">hp</span> <span class="mi">100</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">statuses</span> <span class="p">()))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">defstruct</code> macro has a useful inheritance feature that we can
exploit for our game to create subtypes. The parent accessors will
work on these subtypes, immediately providing some (efficient)
polymorphism even before multimethods are involved.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defstruct</span> <span class="p">(</span><span class="nv">player</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">actor</span><span class="p">))</span>
  <span class="nv">control-scheme</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defstruct</span> <span class="p">(</span><span class="nv">stinkmonster</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">actor</span><span class="p">))</span>
  <span class="p">(</span><span class="k">type</span> <span class="ss">'sewage</span><span class="p">))</span>

<span class="p">(</span><span class="nv">actor-hp</span> <span class="p">(</span><span class="nv">make-stinkmonster</span><span class="p">))</span>  <span class="c1">; =&gt; 100</span>
</code></pre></div></div>

<p>As a side note: this isn’t necessarily the best way to go about
modeling a game. We probably shouldn’t be relying on inheritance too
much, but bear with me for this example.</p>

<p>Say we want an <code class="language-plaintext highlighter-rouge">attack</code> method for handling attacks between different
types of monsters. Elisp structs have a very useful property by
default: they’re simply vectors whose first element is a symbol
denoting their type. We can use this in a multimethod classifier.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">make-player</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-player "Unknown" 100 nil nil]</span>

<span class="p">(</span><span class="nv">predd-defmulti</span> <span class="nv">attack</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">attacker</span> <span class="nv">victim</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">attacker</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">victim</span> <span class="mi">0</span><span class="p">)))</span>
  <span class="s">"Perform an attack from ATTACKER on VICTIM."</span><span class="p">)</span>
</code></pre></div></div>

<p>Let’s define a base case. This will be overridden by more specific
methods (determined by that distance metric).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">attack</span> <span class="nv">[cl-struct-actor</span> <span class="nv">cl-struct-actor]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">v</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">decf</span> <span class="p">(</span><span class="nv">actor-hp</span> <span class="nv">v</span><span class="p">)</span> <span class="mi">10</span><span class="p">))</span>
</code></pre></div></div>

<p>We could have instead used <code class="language-plaintext highlighter-rouge">:default</code> for the dispatch value, which is
a special catch-all value. The <code class="language-plaintext highlighter-rouge">actor-hp</code> function will signal an
error for any non-actor victim anyway. However, not using <code class="language-plaintext highlighter-rouge">:default</code>
will force both argument types to be checked. It will also demonstrate
specialization for the example.</p>

<p>However, before we can make use of this we need to teach predd about
the relationship between these structs. It doesn’t check <code class="language-plaintext highlighter-rouge">defstruct</code>
hierarchies. This step is what makes combining <code class="language-plaintext highlighter-rouge">defstruct</code> and predd
a little unwieldy. A wrapper macro is probably due for this.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'cl-struct-player</span> <span class="ss">'cl-struct-actor</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'cl-struct-stinkmonster</span> <span class="ss">'cl-struct-actor</span><span class="p">)</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">player</span> <span class="p">(</span><span class="nv">make-player</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">monster</span> <span class="p">(</span><span class="nv">make-stinkmonster</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">attack</span> <span class="nv">player</span> <span class="nv">monster</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">actor-hp</span> <span class="nv">monster</span><span class="p">))</span>
<span class="c1">;; =&gt; 90</span>
</code></pre></div></div>

<p>When the stinkmonster attacks players it doesn’t do damage. Instead it
applies a status effect.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">attack</span> <span class="nv">[cl-struct-stinkmonster</span> <span class="nv">cl-struct-player]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">v</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">pushnew</span> <span class="p">(</span><span class="nv">stinkmonster-type</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">actor-statuses</span> <span class="nv">v</span><span class="p">)))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">player</span> <span class="p">(</span><span class="nv">make-player</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">monster</span> <span class="p">(</span><span class="nv">make-stinkmonster</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">attack</span> <span class="nv">monster</span> <span class="nv">player</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">actor-statuses</span> <span class="nv">player</span><span class="p">))</span>
<span class="c1">;; =&gt; (sewage)</span>
</code></pre></div></div>

<p>If the monster applied a status effect in addition to the default
attack behavior then CLOS-style method combination would be far more
appropriate here (if only it was available in Elisp). The method would
instead be defined as an “after” method and it would automatically run
in addition to the default behavior.</p>

<p>If I was actually building a system combining structs and predd, I would
be using this helper function for building classifiers. It returns a
dispatch value for selected arguments.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">struct-classifier</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">pattern</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">select-p</span> <span class="nv">in</span> <span class="nv">pattern</span> <span class="nb">and</span> <span class="nv">arg</span> <span class="nv">in</span> <span class="nv">args</span>
          <span class="nb">when</span> <span class="nv">select-p</span> <span class="nv">collect</span> <span class="p">(</span><span class="nb">elt</span> <span class="nv">arg</span> <span class="mi">0</span><span class="p">))))</span>

<span class="c1">;; Takes 3 arguments, dispatches on the first 2 argument types.</span>
<span class="p">(</span><span class="nv">predd-defmulti</span> <span class="nv">speak</span> <span class="p">(</span><span class="nv">struct-classifier</span> <span class="no">t</span> <span class="no">t</span> <span class="no">nil</span><span class="p">))</span>

<span class="c1">;; Messages sent to the player are displayed.</span>
<span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">speak</span> <span class="o">'</span><span class="p">(</span><span class="nv">cl-struct-actor</span> <span class="nv">cl-struct-player</span><span class="p">)</span> <span class="p">(</span><span class="nv">from</span> <span class="nv">to</span> <span class="nv">message</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"%s says %s."</span> <span class="p">(</span><span class="nv">actor-name</span> <span class="nv">from</span><span class="p">)</span> <span class="nv">message</span><span class="p">))</span>
</code></pre></div></div>
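
<p>To see what such a classifier returns without involving predd at all,
here’s a small self-contained variant. It assumes lexical binding (the
returned lambda closes over <code class="language-plaintext highlighter-rouge">pattern</code>), uses <code class="language-plaintext highlighter-rouge">cl-loop</code> from cl-lib,
and substitutes plain vectors with made-up tags for real structs.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defun sketch-struct-classifier (&amp;rest pattern)
  "Return a classifier collecting element 0 of the selected arguments."
  (lambda (&amp;rest args)
    (cl-loop for select-p in pattern and arg in args
             when select-p collect (elt arg 0))))

;; Plain vectors standing in for structs: element 0 is the type tag.
;; The third argument is not selected, so it is never inspected.
(funcall (sketch-struct-classifier t t nil)
         [cat "Tom"] [mouse "Jerry"] "hello")
;; =&gt; (cat mouse)
</code></pre></div></div>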

<h3 id="the-future">The Future</h3>

<p>As of this writing there isn’t yet a <code class="language-plaintext highlighter-rouge">prefer-method</code> for
disambiguating equally preferred dispatch values. I will add it in the
future. I think <code class="language-plaintext highlighter-rouge">prefer-method</code> gets unwieldy quickly as the type
hierarchy grows, so it should be avoided anyway.</p>

<p>I haven’t put predd in MELPA or otherwise published it yet. That’s
what this post is for. But I think it’s ready for prime time, so feel
free to try it out.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Lisp Reddit API Wrapper</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/16/"/>
    <id>urn:uuid:3362934d-9762-3f58-e05c-4d8b28175367</id>
    <updated>2013-12-16T23:27:23Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="reddit"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>A couple of months ago I wrote an Emacs Lisp wrapper for the
<a href="http://old.reddit.com/dev/api">reddit API</a>. I didn’t put it in MELPA,
not yet anyway. If anyone is finding it useful I’ll see about getting
that done. My intention was to give it some exercise and testing before
putting it out there for people to use, locking down the API. You can
find it here,</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-reddit-api">https://github.com/skeeto/emacs-reddit-api</a></li>
</ul>

<p>Except for logging in, the library is agnostic about the actual API
endpoints themselves. It just knows how to translate between Elisp and
the reddit API protocol. This makes the library dead simple to use. I
had considered supporting <a href="http://blog.jenkster.com/2013/10/an-oauth2-in-emacs-example.html">OAuth2 authentication</a> rather than
password authentication, but reddit’s OAuth2 support is pretty rough
around the edges.</p>

<h3 id="library-usage">Library Usage</h3>

<p>The reddit API has two kinds of endpoints, GET and POST, so there are
really only three functions to concern yourself with.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">reddit-login</code></li>
  <li><code class="language-plaintext highlighter-rouge">reddit-get</code></li>
  <li><code class="language-plaintext highlighter-rouge">reddit-post</code></li>
</ul>

<p>And one variable,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">reddit-session</code></li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">reddit-login</code> function is really just a special case of
<code class="language-plaintext highlighter-rouge">reddit-post</code>. It returns a session value (cookie/modhash tuple) that
is used by the other two functions for authenticating the user. Just
as you get automatically with almost all Elisp data structures —
probably more so than <em>any</em> other popular programming language — it
can be serialized with the printer and reader, allowing a reddit
session to be maintained across Emacs sessions.</p>

<p>The return value of <code class="language-plaintext highlighter-rouge">reddit-login</code> generally doesn’t need to be
captured. It automatically sets the dynamic variable <code class="language-plaintext highlighter-rouge">reddit-session</code>,
which is what the other functions access for authentication. This can
be bound with <code class="language-plaintext highlighter-rouge">let</code> to other session values in order to switch between
different users.</p>
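
<p>As a sketch of how a session could be kept across Emacs sessions using
nothing but the printer and reader, the following writes a value to a
file and reads it back. The <code class="language-plaintext highlighter-rouge">sketch-</code> functions are not part of the
library, and the cons used as a session here is a made-up stand-in; the
real session value may be shaped differently.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun sketch-save-value (value file)
  "Write VALUE to FILE in its printed representation."
  (with-temp-file file
    (prin1 value (current-buffer))))

(defun sketch-load-value (file)
  "Read back the first Lisp value stored in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    (read (current-buffer))))

;; Round-trip a stand-in session value through a temporary file.
(let ((file (make-temp-file "session"))
      (session '("cookie-text" . "modhash-text")))
  (sketch-save-value session file)
  (prog1 (equal session (sketch-load-value file))
    (delete-file file)))
;; =&gt; t
</code></pre></div></div>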

<p>Both <code class="language-plaintext highlighter-rouge">reddit-get</code> and <code class="language-plaintext highlighter-rouge">reddit-post</code> take an endpoint name and a list
of key-value pairs in the form of a property list (plist). (The
<code class="language-plaintext highlighter-rouge">api-type</code> key is automatically supplied.) They each return the JSON
response from the server in association list (alist) form. The actual
shape of this data matches the response from reddit, which,
unfortunately, is inconsistent and unspecified, so writing any sort of
program to operate on the API requires lots of trial and error. If the
API responded with an error, these functions signal a <code class="language-plaintext highlighter-rouge">reddit-error</code>.</p>

<p>Typical usage looks like so. Notice that values need not be only
strings; they just need to print to something reasonable.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Login first</span>
<span class="p">(</span><span class="nv">reddit-login</span> <span class="s">"your-username"</span> <span class="s">"your-password"</span><span class="p">)</span>

<span class="c1">;; Subscribe to a subreddit</span>
<span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/subscribe"</span> <span class="o">'</span><span class="p">(</span><span class="ss">:sr</span> <span class="s">"t5_2s49f"</span> <span class="ss">:action</span> <span class="nv">sub</span><span class="p">))</span>

<span class="c1">;; Post a comment</span>
<span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/comment/"</span> <span class="o">'</span><span class="p">(</span><span class="ss">:text</span> <span class="s">"Hello world."</span> <span class="ss">:thing_id</span> <span class="s">"t1_cd3ar7y"</span><span class="p">))</span>
</code></pre></div></div>
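
<p>To illustrate what “print to something reasonable” can mean, here’s a
rough sketch of turning such a plist into a query string. The function
<code class="language-plaintext highlighter-rouge">demo-plist-to-query</code> is hypothetical and isn’t necessarily how the
library encodes its arguments; it only shows that <code class="language-plaintext highlighter-rouge">format</code> with
<code class="language-plaintext highlighter-rouge">%s</code> accepts symbols and numbers as well as strings.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'url-util)  ; for url-hexify-string

;; Hypothetical helper, shown for illustration only.
(defun demo-plist-to-query (plist)
  "Encode PLIST as a URL query string."
  (let (pairs)
    (while plist
      (let ((key (substring (symbol-name (pop plist)) 1)) ; drop the colon
            (value (format "%s" (pop plist))))            ; print any value
        (push (concat (url-hexify-string key) "=" (url-hexify-string value))
              pairs)))
    (mapconcat #'identity (nreverse pairs) "&amp;")))

(demo-plist-to-query '(:sr "t5_2s49f" :action sub))
;; =&gt; "sr=t5_2s49f&amp;action=sub"
</code></pre></div></div>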

<p>For plist keys I considered automatically converting between dashes
and underscores so that the keywords could have Lisp-style names. But
the reddit API is inconsistent, using both, so there’s no correct way
to do this.</p>

<p>To further refine the API it might be worth defining a function for
each of the reddit endpoints, forming a facade for the wrapper
library, hiding away the plist arguments and complicated responses.
That would eliminate the trial and error of using the API.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">reddit-api-comment</span> <span class="p">(</span><span class="nv">parent</span> <span class="nv">comment</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nv">reddit-session</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">error</span> <span class="s">"Not logged in."</span><span class="p">)</span>
    <span class="c1">;; TODO: reduce the return value into a thing/struct</span>
    <span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/comment/"</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">:thing_id</span> <span class="nv">parent</span> <span class="ss">:text</span> <span class="nv">comment</span><span class="p">))))</span>
</code></pre></div></div>

<p>Furthermore there could be defstructs for comments, posts, subreddits,
etc. so that the “thing” ID stuff is hidden away. This is basically
what was already done for sessions out of necessity. I might add these
structs and functions someday but I don’t currently have a need for
them.</p>
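
<p>As a sketch of what such a struct might look like, here’s one for
comments. Everything named <code class="language-plaintext highlighter-rouge">demo-</code> is hypothetical, and treating
<code class="language-plaintext highlighter-rouge">t1_</code> as the comment prefix is an assumption based on the ID in the
usage example above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical struct; the field names are illustrative.
(cl-defstruct (demo-comment (:constructor demo-comment-create))
  id body)

(defun demo-comment-thing-id (comment)
  "Return the \"thing\" ID for COMMENT, assuming a t1_ prefix."
  (concat "t1_" (demo-comment-id comment)))

(demo-comment-thing-id
 (demo-comment-create :id "cd3ar7y" :body "Hello world."))
;; =&gt; "t1_cd3ar7y"
</code></pre></div></div>

<p>A facade function could then accept one of these structs and build
the plist itself, so callers never handle the raw ID strings.</p>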

<p>It would be neat to use this API to create an interface to reddit from
within Emacs. I imagine it might look like one of the Emacs mail
clients, or <a href="/blog/2013/09/04/">like Elfeed</a>. Almost everything, including
viewing image posts within Emacs, should be possible.</p>

<h3 id="background">Background</h3>

<p>For the last 3.5 years I’ve been a moderator of <a href="http://old.reddit.com/r/civ">/r/civ</a>,
<a href="http://old.reddit.com/r/civ/comments/clxj4/lets_tidy_rciv_up_a_bit/">starting back when it had about 100 subscribers</a>. As of this
writing it’s just short of 60k subscribers and we’re now up to 9
moderators.</p>

<p>A few months ago we decided to institute a self-post-only Sunday. All
day Sunday, midnight to midnight Eastern time, only self-posts are
allowed in the subreddit. One of the other moderators was turning this
on and off manually, so I offered to write a bot to do the job. There
<a href="https://github.com/reddit/reddit/wiki/API-Wrappers">weren’t any Lisp wrappers yet</a> (though raw4j could be used
with Clojure), so I decided to write one.</p>

<p>As mentioned before, the reddit API leaves <em>a lot</em> to be desired. It
randomly returns errors, so a correct program needs to be prepared to
retry requests after a short delay, depending on the error. My
particular annoyance is that the <code class="language-plaintext highlighter-rouge">/api/site_admin</code> endpoint requires
that most of its keys are supplied, and it’s not documented which ones
are required. Even worse, there’s no single endpoint to get all of the
required values, the key names between endpoints are inconsistent, and
even the values themselves can’t be returned as-is, requiring
<a href="http://old.reddit.com/r/bugs/comments/1t162o/">massaging/fixing before returning them back to the API</a>.</p>
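
<p>The retry-after-a-delay idea can be sketched generically. The helper
<code class="language-plaintext highlighter-rouge">demo-call-with-retries</code> is hypothetical, not part of the library. It
retries on any <code class="language-plaintext highlighter-rouge">error</code> condition; whether <code class="language-plaintext highlighter-rouge">reddit-error</code> is declared as
a child of <code class="language-plaintext highlighter-rouge">error</code> is an assumption, and a real bot would want to
inspect the error before deciding to retry.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, shown for illustration only.
(defun demo-call-with-retries (fn attempts delay)
  "Call FN up to ATTEMPTS times, sleeping DELAY seconds after a failure.
Return the value of the first successful call. If the final
attempt fails, signal its error again."
  (let ((result nil) (done nil) (n 0))
    (while (not done)
      (setq n (1+ n))
      (condition-case err
          (setq result (funcall fn)
                done t)
        (error
         (if (&gt;= n attempts)
             (signal (car err) (cdr err)) ; out of attempts, give up
           (sleep-for delay)))))
    result))

;; Possible usage with the library loaded and a session established:
;; (demo-call-with-retries
;;  (lambda () (reddit-post "/api/subscribe" '(:sr "t5_2s49f" :action sub)))
;;  3 2)
</code></pre></div></div>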

<p>I hope other people find this library useful!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs, Thanksgiving, and Hanukkah</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/11/28/"/>
    <id>urn:uuid:cd66c73c-cb8c-3e12-9c16-396a36266f8b</id>
    <updated>2013-11-28T22:25:36Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="meatspace"/>
    <content type="html">
      <![CDATA[<p>Today is Thanksgiving in the United States. It also happens to be
Hanukkah. There’s been news going around that Thanksgiving and
Hanukkah <a href="http://www.leancrew.com/all-this/2013/01/hanukkah-and-thanksgiving/">will not coincide again for about 80,000 years</a>. This
sounded somewhat unbelievable to me because
<a href="http://blog.plover.com/calendar/july-weekends.html">the Gregorian calendar repeats every 400 years</a>. I decided to
compute it for myself to double-check this figure.</p>

<p>I’m not Jewish and I know very little about Hanukkah, so I had to look
it up. After learning that Hanukkah is based on the Hebrew calendar,
the rumors were sounding more believable. The Hebrew calendar repeats
every 689,472 Hebrew years. This means the cycle over which the
Gregorian and Hebrew calendars realign <a href="http://hebrewcalendar.tripod.com/">is about 14 billion years</a>.
That 80,000 figure seems lowball.</p>

<p>Since I decided to use Emacs Lisp for the computation, I fortunately
was able to ignore all the unfamiliar, complicated rules for the
Hebrew calendar: Emacs knows how to compute Hebrew dates. It can be
accessed through the function <code class="language-plaintext highlighter-rouge">calendar-hebrew-date-string</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Thanksgiving 2013</span>
<span class="p">(</span><span class="nv">calendar-hebrew-date-string</span> <span class="o">'</span><span class="p">(</span><span class="mi">11</span> <span class="mi">28</span> <span class="mi">2013</span><span class="p">))</span>
<span class="c1">;; =&gt; "Kislev 25, 5774"</span>
</code></pre></div></div>

<p>Hanukkah begins on the 25th of Kislev, so I can write a
quick-and-dirty function to detect if a date is the first day of
Hanukkah.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">hanukkah-p</span> <span class="p">(</span><span class="nv">date</span><span class="p">)</span>
  <span class="s">"Return non-nil if DATE is the first day of Hanukkah."</span>
  <span class="p">(</span><span class="nv">string-match-p</span> <span class="s">"^Kislev 25"</span> <span class="p">(</span><span class="nv">calendar-hebrew-date-string</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>
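
<p>A variation that skips the string formatting and regexp is to compare
the numeric Hebrew date directly. This is only a sketch under the
assumption that Kislev is month 9 in Emacs’ Hebrew month numbering
(Nisan being 1); <code class="language-plaintext highlighter-rouge">demo-hanukkah-first-day-p</code> is a hypothetical name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calendar)
(require 'cal-hebrew)

;; Hypothetical alternative to the string-matching version.
(defun demo-hanukkah-first-day-p (date)
  "Return non-nil if Gregorian DATE (MONTH DAY YEAR) is Kislev 25."
  (let ((hebrew (calendar-hebrew-from-absolute
                 (calendar-absolute-from-gregorian date))))
    (and (= 9 (car hebrew))      ; month, assuming Kislev is 9
         (= 25 (cadr hebrew))))) ; day of the month

(demo-hanukkah-first-day-p '(11 28 2013))
;; =&gt; t
</code></pre></div></div>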

<p>Next I need a function to compute Thanksgiving, which is really
simple. Thanksgiving falls on the fourth Thursday of November.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">thanksgiving</span> <span class="p">(</span><span class="nv">year</span><span class="p">)</span>
  <span class="s">"Return the date of Thanksgiving for YEAR."</span>
  <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">day</span> <span class="nv">from</span> <span class="mi">1</span> <span class="nv">upto</span> <span class="mi">7</span>
        <span class="nb">when</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">4</span> <span class="p">(</span><span class="nv">calendar-day-of-week</span> <span class="o">`</span><span class="p">(</span><span class="mi">11</span> <span class="o">,</span><span class="nv">day</span> <span class="o">,</span><span class="nv">year</span><span class="p">)))</span>
        <span class="nb">return</span> <span class="o">`</span><span class="p">(</span><span class="mi">11</span> <span class="o">,</span><span class="p">(</span><span class="nb">+</span> <span class="nv">day</span> <span class="mi">21</span><span class="p">)</span> <span class="o">,</span><span class="nv">year</span><span class="p">)))</span>
</code></pre></div></div>
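
<p>Emacs’ calendar library can also do this search itself. As a sketch,
<code class="language-plaintext highlighter-rouge">calendar-nth-named-day</code> takes the ordinal, the day of the week (0 is
Sunday, so Thursday is 4), the month, and the year. The wrapper name
<code class="language-plaintext highlighter-rouge">demo-thanksgiving</code> is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calendar)

;; Hypothetical wrapper around the built-in calendar function.
(defun demo-thanksgiving (year)
  "Return the fourth Thursday of November in YEAR as (MONTH DAY YEAR)."
  (calendar-nth-named-day 4 4 11 year))

(demo-thanksgiving 2013)
;; =&gt; (11 28 2013)
</code></pre></div></div>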

<p>If there was no <code class="language-plaintext highlighter-rouge">calendar-day-of-week</code> I could compute it using
<a href="http://en.wikipedia.org/wiki/Determination_of_the_day_of_the_week#Gauss.27s_algorithm">Zeller’s algorithm</a>, which I already happen to have
implemented,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">cal/day-of-week</span> <span class="p">(</span><span class="nv">year</span> <span class="nv">month</span> <span class="nv">day</span><span class="p">)</span>
  <span class="s">"Return day of week number (0-6, where 0 is Sunday)."</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">Y</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">month</span> <span class="mi">3</span><span class="p">)</span> <span class="p">(</span><span class="nb">1-</span> <span class="nv">year</span><span class="p">)</span> <span class="nv">year</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">m</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">mod</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">month</span> <span class="mi">9</span><span class="p">)</span> <span class="mi">12</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">y</span> <span class="p">(</span><span class="nb">mod</span> <span class="nv">Y</span> <span class="mi">100</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">c</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">Y</span> <span class="mi">100</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">mod</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">day</span> <span class="p">(</span><span class="nb">floor</span> <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">26</span> <span class="nv">m</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">10</span><span class="p">)</span> <span class="nv">y</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">y</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">c</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">-2</span> <span class="nv">c</span><span class="p">))</span> <span class="mi">7</span><span class="p">)))</span>
</code></pre></div></div>

<p>Now for each year find Thanksgiving and test it for Hanukkah. I
started with 1942 because that’s when the fourth-Thursday-of-November
rule was established. Presumably due to the regexp part, this
expression takes a moment to compute.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">year</span> <span class="nv">from</span> <span class="mi">1942</span> <span class="nv">to</span> <span class="mi">80000</span>
      <span class="nb">when</span> <span class="p">(</span><span class="nv">hanukkah-p</span> <span class="p">(</span><span class="nv">thanksgiving</span> <span class="nv">year</span><span class="p">))</span>
      <span class="nv">collect</span> <span class="nv">year</span><span class="p">)</span>
<span class="c1">;; =&gt; (2013 79043 79290 79537 79564 79635 79784 79811 79882)</span>
</code></pre></div></div>

<p>My result exactly matches what I’m seeing elsewhere. The rumors are
correct! The next coincidence occurs on November 23rd, 79043. Thanks,
Emacs!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elfeed Tips and Tricks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/11/26/"/>
    <id>urn:uuid:45fbc221-dbea-302c-22c0-ec0527421ed8</id>
    <updated>2013-11-26T00:38:20Z</updated>
    <category term="elfeed"/><category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>This past weekend I had some questions from next-user-here (NUH) on my
<a href="/blog/2013/09/04/">original Elfeed post</a> about changing some of Elfeed’s
behavior. NUH is an Elisp novice so accomplishing some of the
requested modifications wasn’t obvious. A novice is mostly limited to
setting variables, not defining advice or using hooks. I’ve also been
using Elfeed daily for about three months now as my sole web feed
reader and along the way I’ve developed some best practices. In
addition to responding to some of NUH’s questions here, I’d like to
share some tips and tricks.</p>

<h3 id="custom-entry-launchers">Custom Entry Launchers</h3>

<p>Currently you can press “b” to launch one or more entries in your
browser. You can use “y” to copy a single entry to the clipboard.
What if you want to add another action?</p>

<p>In my configuration I have a fancy binding that sends the entry URLs
in the selected region to <a href="http://rg3.github.io/youtube-dl/">youtube-dl</a> for downloading the
videos. It’s too large to share as a snippet so here’s a small example
of something similar using a program called <code class="language-plaintext highlighter-rouge">xcowsay</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">xcowsay</span> <span class="p">(</span><span class="nv">message</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">call-process</span> <span class="s">"xcowsay"</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">message</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">elfeed-xcowsay</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">entry</span> <span class="p">(</span><span class="nv">elfeed-search-selected</span> <span class="ss">:single</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">xcowsay</span> <span class="p">(</span><span class="nv">elfeed-entry-title</span> <span class="nv">entry</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">define-key</span> <span class="nv">elfeed-search-mode-map</span> <span class="s">"x"</span> <span class="nf">#'</span><span class="nv">elfeed-xcowsay</span><span class="p">)</span>
</code></pre></div></div>

<p>Now when I hit “x” over an entry in Elfeed I’m greeted by a cow
announcing the title.</p>

<p><img src="/img/screenshot/xcowsay-small.png" alt="" /></p>

<h3 id="entry-listing-customization">Entry Listing Customization</h3>

<p>The <em>search</em> buffer you see when starting Elfeed, where entries are
listed, can be customized a few different ways. First, this buffer
<em>does</em> grow dynamically. After re-sizing the window/frame horizontally
you just have to refresh the view by pressing <code class="language-plaintext highlighter-rouge">g</code> (an Emacs
convention). How it fills out depends on the settings of these
variables,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-max-width</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-min-width</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-trailing-width</code></li>
</ul>

<p>They control how wide the different columns should be as the window
size changes. An important caveat to this is that the cache stored in
<code class="language-plaintext highlighter-rouge">elfeed-search-cache</code> <em>must</em> be cleared before the changes will be
reflected in the display. This cache exists because building the
display, assembling all the special faces, is actually quite
CPU-intensive. It was an optimization I established early on.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">clrhash</span> <span class="nv">elfeed-search-cache</span><span class="p">)</span>
</code></pre></div></div>

<p>If you set these variables in your start-up configuration you don’t
need to worry about clearing the cache because it will already be
empty. It’s only a concern when playing with the settings.</p>

<h4 id="date-display">Date Display</h4>

<p>Another question was about adding time to the entry listing. Elfeed
only displays the entry’s date. Dates are formatted by the function
<code class="language-plaintext highlighter-rouge">elfeed-search-format-date</code>. This can be redefined to display dates
differently.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">elfeed-search-format-date</span> <span class="p">(</span><span class="nv">date</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">format-time-string</span> <span class="s">"%Y-%m-%d %H:%M"</span> <span class="p">(</span><span class="nv">seconds-to-time</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s given epoch seconds as a float and it returns a string to display
as a date.</p>
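
<p>To see that contract in isolation, here’s a stand-alone sketch with a
hypothetical name, <code class="language-plaintext highlighter-rouge">demo-format-date</code>. Unlike the redefinition above it
passes a third argument of <code class="language-plaintext highlighter-rouge">t</code> to <code class="language-plaintext highlighter-rouge">format-time-string</code> so the result is
in UTC and doesn’t depend on the local time zone.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical stand-alone formatter, for illustration only.
(defun demo-format-date (date)
  "Format DATE, a float of epoch seconds, as a UTC timestamp string."
  (format-time-string "%Y-%m-%d %H:%M" (seconds-to-time date) t))

(demo-format-date 1385426300.0)
;; =&gt; "2013-11-26 00:38"
</code></pre></div></div>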

<h4 id="faces-and-colors">Faces and Colors</h4>

<p>All of the faces used in the display are declared for customization,
so these can be changed to whatever you like.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-date-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-feed-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-tag-face</code></li>
</ul>

<p>Say you suffered a head injury and decided you want your Elfeed dates
to be bold, purple, and underlined,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">custom-set-faces</span>
 <span class="o">'</span><span class="p">(</span><span class="nv">elfeed-search-date-face</span>
   <span class="p">((</span><span class="no">t</span> <span class="ss">:foreground</span> <span class="s">"#f0f"</span>
       <span class="ss">:weight</span> <span class="nv">extra-bold</span>
       <span class="ss">:underline</span> <span class="no">t</span><span class="p">))))</span>
</code></pre></div></div>

<h3 id="database-manipulation">Database Manipulation</h3>

<p>Feeds and entries in the database can be manipulated to become
whatever you want them to be. Because Elfeed is regularly modifying
the database, the trick is to perform the manipulation at <em>just</em> the
right time.</p>

<h4 id="feed-title-changes">Feed Title Changes</h4>

<p>Say you want to change a feed title because you don’t like the title
supplied by the feed. For example, the title to my blog’s feed is
“null program” but instead you think it should be “Seriously Handsome
Programmer” (head injury, remember?). The function
<code class="language-plaintext highlighter-rouge">elfeed-db-get-feed</code> can be used to fetch a feed’s data structure from
the database, given its exact URL as listed in your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">feed</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="s">"https://nullprogram.com/feed/"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-feed-title</span> <span class="nv">feed</span><span class="p">)</span> <span class="s">"Seriously Handsome Programmer"</span><span class="p">))</span>
</code></pre></div></div>

<p>Hold it, that didn’t work. First, that display cache is getting in the
way again. Feed titles change very infrequently so they’re cached
aggressively. More importantly, next time you update your feeds Elfeed
will re-synchronize the feed title with the official title. It’s going
to fight against your intervention.</p>

<p>The solution is to do it with a little bit of advice just before the
title is displayed. Advise the function <code class="language-plaintext highlighter-rouge">elfeed-search-update</code> with
some “before” advice.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">elfeed-search-update</span> <span class="p">(</span><span class="nv">before</span> <span class="nv">nullprogram</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">feed</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="s">"https://nullprogram.com/feed/"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-feed-title</span> <span class="nv">feed</span><span class="p">)</span> <span class="s">"Seriously Handsome Programmer"</span><span class="p">)))</span>
</code></pre></div></div>
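
<p>On Emacs 24.4 and later the same effect can probably be had with the
newer <code class="language-plaintext highlighter-rouge">advice-add</code> interface. This is a sketch, not something Elfeed
provides: <code class="language-plaintext highlighter-rouge">demo-rename-nullprogram-feed</code> is a hypothetical name, and the
advice is a named function so that it can be removed again.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Hypothetical named advice; assumes Emacs 24.4 or later.
(defun demo-rename-nullprogram-feed (&amp;rest _args)
  "Force the stored feed title just before the search buffer updates."
  (let ((feed (elfeed-db-get-feed "https://nullprogram.com/feed/")))
    (setf (elfeed-feed-title feed) "Seriously Handsome Programmer")))

(advice-add 'elfeed-search-update :before #'demo-rename-nullprogram-feed)

;; To undo it later:
;; (advice-remove 'elfeed-search-update #'demo-rename-nullprogram-feed)
</code></pre></div></div>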

<h4 id="entry-tweaking">Entry Tweaking</h4>

<p>Automatic entry modification should happen immediately upon discovery
so that it looks like the entry arrived that way. This is done through
the <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code>. Generally this would be used for applying
custom tags. These examples are from the documentation:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Mark all YouTube entries</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"youtube\\.com"</span>
                              <span class="ss">:add</span> <span class="o">'</span><span class="p">(</span><span class="nv">video</span> <span class="nv">youtube</span><span class="p">)))</span>

<span class="c1">;; Entries older than 2 weeks are marked as read</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:before</span> <span class="s">"2 weeks ago"</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>

<span class="c1">;; Building subset feeds</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"example\\.com"</span>
                              <span class="ss">:entry-title</span> <span class="o">'</span><span class="p">(</span><span class="nb">not</span> <span class="s">"something interesting"</span><span class="p">)</span>
                              <span class="ss">:add</span> <span class="ss">'junk</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>

<p>Due to a feature I recently ported from my personal configuration,
this tagger helper function is less necessary. You can put lists in
your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> list to supply automatic tags.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">setq</span> <span class="nv">elfeed-feeds</span>
      <span class="o">'</span><span class="p">((</span><span class="s">"https://nullprogram.com/feed/"</span> <span class="nv">blog</span> <span class="nv">emacs</span><span class="p">)</span>
        <span class="s">"http://www.50ply.com/atom.xml"</span>  <span class="c1">; no autotagging</span>
        <span class="p">(</span><span class="s">"http://nedroid.com/feed/"</span> <span class="nv">webcomic</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="content-tweaking">Content Tweaking</h4>

<p>Going beyond tagging you could change the content of the feed. Say you
want to <a href="http://xkcd.com/1031/">make feeds 100 times better</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">hundred-times-better</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">original</span> <span class="p">(</span><span class="nv">elfeed-deref</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)))</span>
         <span class="p">(</span><span class="nb">replace</span> <span class="p">(</span><span class="nv">replace-regexp-in-string</span> <span class="s">"keyboard"</span> <span class="s">"leopard"</span> <span class="nv">original</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="nb">replace</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span> <span class="nf">#'</span><span class="nv">hundred-times-better</span><span class="p">)</span>
</code></pre></div></div>

<p>The same trick could be used to remove advertising, change the date,
change the title, etc. The <code class="language-plaintext highlighter-rouge">elfeed-deref</code> and <code class="language-plaintext highlighter-rouge">elfeed-ref</code> parts are
needed to fetch and store content in the content database. Only a
reference is stored on the structure. You can actually use these
functions at any time outside of Elfeed, but they’ll eventually get
garbage collected if Elfeed doesn’t know about them.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="s">"Hello, World"</span><span class="p">))</span>
<span class="c1">;; =&gt; [cl-struct-elfeed-ref "907d14fb3af2b0d4f18c2d46abe8aedce17367bd"]</span>

<span class="p">(</span><span class="nv">elfeed-deref</span> <span class="nv">ref</span><span class="p">)</span>
<span class="c1">;; =&gt; "Hello, World"</span>
</code></pre></div></div>
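
<p>The reference printed above looks like a hash of the content. As a
toy illustration of that content-addressing idea, and not Elfeed’s
actual implementation, here’s an in-memory store keyed by SHA-1. All of
the <code class="language-plaintext highlighter-rouge">demo-</code> names are hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Toy content-addressable store; not how Elfeed stores content.
(defvar demo-content-store (make-hash-table :test 'equal)
  "Maps a content hash to the content itself.")

(defun demo-ref (content)
  "Store CONTENT and return its hash as a reference."
  (let ((id (secure-hash 'sha1 content)))
    (puthash id content demo-content-store)
    id))

(defun demo-deref (id)
  "Return the content stored under ID, or nil if there is none."
  (gethash id demo-content-store))

(demo-deref (demo-ref "Hello, World"))
;; =&gt; "Hello, World"
</code></pre></div></div>

<p>Identical content always produces the same reference, so storing it
twice costs nothing extra.</p>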

<h3 id="deletion">Deletion</h3>

<p>A question that’s been asked a few times is if entries can be <em>deleted</em>.
To start off, the answer to that question is “no.” There is no
function provided to remove entries from the database. If you want to
remove entries you’re probably taking the wrong approach.</p>

<p>The main problem with removal is that Elfeed needs to keep track of
what it’s seen before. If an entry is removed and then rediscovered,
it will reappear as unread. There are better ways to “remove” entries,
such as tagging them specially.</p>

<p>On a moderately-powerful computer Elfeed can easily handle <em>at least</em>
several tens of thousands of database entries. If “too many entries”
ever becomes a performance problem I’d rather solve it by making the
database faster than by removing information from the database. It’s
already very date-oriented so that older entries are infrequently
touched.</p>

<p>If storage is a concern, you shouldn’t get too worked up about that.
As of this post I have about 6,000 entries in my database and the
index file is only 3.5 MB. The content database after garbage
collection, which is the <code class="language-plaintext highlighter-rouge">data/</code> directory under <code class="language-plaintext highlighter-rouge">~/.elfeed/</code>, with
these 6k entries is 17 MB. When I run <code class="language-plaintext highlighter-rouge">M-x elfeed-db-compact</code>,
currently an experimental feature, it drops down to 1.8 MB. That’s less
than 1 kB per entry. It’s also less than my personal Liferea database
of roughly the same amount of content (~15 MB) before I wrote Elfeed.</p>

<p>If even this storage is still too much you can always blow away your
<code class="language-plaintext highlighter-rouge">data/</code> content database directory. This is safe to do even while
Emacs is running. You’ll still see all of the entries listed in the
search buffer but won’t be able to read them within Emacs until after
the next database update (when it re-fetches the most recent entry
content).</p>

<p>You can also clear out the content database from within Elisp by
visiting every entry and clearing its content field.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-elfeed-db-visit</span> <span class="p">(</span><span class="nv">entry</span> <span class="nv">_</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="no">nil</span><span class="p">))</span>

<span class="p">(</span><span class="nv">elfeed-db-gc</span><span class="p">)</span>  <span class="c1">;; garbage collect everything</span>
</code></pre></div></div>

<p>The same sort of expression can be used to run over all known entries
to perform other changes. If there was a delete function you might use
it here to remove entries older than a certain date, then hope they’re
not rediscovered.</p>
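
<p>As an example of such a change that stays within the tagging approach,
here’s a sketch that marks everything older than some number of days as
read. The function name is hypothetical, and it assumes entry dates are
epoch seconds, as they are when passed to the date formatter.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Hypothetical helper built on the visitor macro shown above.
(defun demo-mark-old-entries-read (days)
  "Remove the `unread' tag from entries older than DAYS days."
  (let ((cutoff (- (float-time) (* days 24 60 60))))
    (with-elfeed-db-visit (entry _)
      (when (&lt; (elfeed-entry-date entry) cutoff)
        (elfeed-untag entry 'unread)))))

;; Possible usage: (demo-mark-old-entries-read 14)
</code></pre></div></div>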

<p>If you <em>never</em> want to store entry content (you never read entries
within Emacs), you can use a hook to always drop it on the floor as it
arrives,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span> <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="no">nil</span><span class="p">)))</span>
</code></pre></div></div>

<h3 id="questions">Questions?</h3>

<p>If you have any questions or suggestions about how to make Elfeed do
what you want it to do, feel free to ask. Some things may actually
require that I make changes to Elfeed to support them, though I hope
I’ve anticipated your particular need well enough to avoid that.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>The Elfeed Database</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/09/"/>
    <id>urn:uuid:8aba2e49-22a0-330b-e664-54fb50ecdd00</id>
    <updated>2013-09-09T05:53:41Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>The design of <a href="/blog/2013/09/04/">Elfeed’s</a> database took some experimentation
before any part of it was settled. A major design constraint was
Emacs’ very limited file input/output. There’s no random access and,
without the aid of an external program, files must always be read and
written wholesale. That’s not database-friendly at all! In the end I
settled on a design that minimized the size of the frequently
rewritten parts, an index with two different data models, by storing
immutable data in a loose-file, content-addressable database.</p>

<p>At the moment there really aren’t any pure-Elisp database solutions
for Emacs. This is almost certainly due to the aforementioned I/O
limitations. I ran into this same problem last year when I created
<a href="/blog/2012/12/29/">an Emacs pastebin server</a>. I attempted, and failed, to
interface with a SQLite database through its command line program.
Nic Ferrier has published a <a href="https://github.com/nicferrier/emacs-db">generic database interface</a>,
but it lacks concrete implementations.</p>

<p>As a bit of good news, as far as I know Emacs <em>does</em> properly handle
atomic file updates across all platforms, so a pure-Elisp database
developer would never have to worry about only writing half the
database. It’s always a safe operation. Worst case scenario you’re
left with an old version of data rather than no data at all.</p>
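
<p>One common way to get that property by hand is to write the new data
to a temporary file and then rename it over the old one. The following
is a minimal sketch, assuming the rename is atomic on the underlying
file system. The name <code class="language-plaintext highlighter-rouge">my-atomic-write</code> is made up for illustration,
and this is not necessarily what Emacs does internally.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-atomic-write (file string)
  "Replace the contents of FILE with STRING via a temporary file."
  ;; An absolute prefix puts the temporary file in FILE's own directory,
  ;; so the final rename never crosses file systems.
  (let ((temp (make-temp-file (concat (expand-file-name file) ".")))
        (coding-system-for-write 'utf-8))
    (write-region string nil temp nil 'silent)
    ;; A reader sees either the old file or the new one, never half of each.
    (rename-file temp file t)))
</code></pre></div></div>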

<p>A real possibility for a database would be connecting to an
established database server via TCP with an Emacs network process. If
the server has a specified wire protocol Elisp could talk to it
efficiently. In fact, there exists <a href="http://www.online-marketwatch.com/pgel/pg.html">pg.el</a> that does <em>exactly</em>
this for PostgreSQL. Unfortunately I was not able to get this working
with my pastebin, nor is this solution appropriate for Elfeed. It
would be unreasonable to require users to first set up a PostgreSQL
server just to read web feeds!</p>

<p>Ultimately it would seem that any efficient Emacs database requires
the help of an external program. The <a href="http://notmuchmail.org/">notmuch</a> mail client,
which inspired Elfeed, does this. To access the notmuch database a
command line program is run once for each request. A query is passed
as a program argument and the output of the program is parsed into the
result.</p>
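
<p>The run-once-per-request pattern can be sketched as below. This
assumes a program that prints a single readable Lisp object on standard
output, which says nothing about notmuch’s real output format. The
helper <code class="language-plaintext highlighter-rouge">my-query</code> is hypothetical, and Emacs itself in batch mode
stands in for the external database program.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-query (program &amp;rest args)
  "Run PROGRAM with ARGS and parse its standard output as one Lisp object."
  (with-temp-buffer
    ;; (t nil) sends stdout to this buffer and discards stderr.
    (apply #'call-process program nil '(t nil) nil args)
    (goto-char (point-min))
    (read (current-buffer))))

;; Emacs in batch mode plays the part of the database program here.
(my-query (expand-file-name invocation-name invocation-directory)
          "-Q" "--batch" "--eval" "(prin1 (list 1 2 3))")
</code></pre></div></div>

<p>Every request pays for a process launch, which is the price of this
approach.</p>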

<h3 id="the-early-database">The Early Database</h3>

<p>For the first few days of its existence Elfeed only had an in-memory
database. Closing Emacs would lose everything. For my personal usage
patterns, where I read, or at least address, all entries that arrive
— and especially because I use Elfeed on a couple of different
computers — I don’t really <em>need</em> to track things long term. I could
easily mark everything after a certain date as read and forget about
them. However, it would be nice to have and, more importantly, many
people wouldn’t use Elfeed without persistence between Emacs sessions.</p>

<p>So, for the first database I did what I always do: dumped the data
structure to a file using the printer and parsed it back in later
using the reader. This is dead simple in Lisp, it’s very fast, and it
even works for circular data structures. It’s something I missed so
much with the much-less-capable JSON format earlier this year that I
<a href="/blog/2013/03/28/">wrote a JavaScript library to do it</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">save-data</span> <span class="p">(</span><span class="nv">file</span> <span class="nv">data</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-file</span> <span class="nv">file</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">print-circle</span> <span class="no">t</span><span class="p">))</span>  <span class="c1">; Allow circular data</span>
      <span class="p">(</span><span class="nb">prin1</span> <span class="nv">data</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">load-data</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="nv">file</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">save-data</span> <span class="s">"demo.dat"</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span> <span class="nv">[</span><span class="s">"1"</span> <span class="mi">2</span> <span class="nv">3]</span><span class="p">))</span>
<span class="p">(</span><span class="nv">load-data</span> <span class="s">"demo.dat"</span><span class="p">)</span>
<span class="c1">;; =&gt; (a b c ["1" 2 3])</span>
</code></pre></div></div>

<p>Anything with a printed representation can be serialized and stored
this way, including symbols, strings, numbers, lists, vectors (structs,
objects), hash tables, and even compiled functions (.elc files).
Basically every Emacs library that stores data on disk uses this
technique.</p>
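
<p>As a small illustration that this covers more than lists, here is a
hash table making the round trip through the printer and reader. The
helper name <code class="language-plaintext highlighter-rouge">my-copy-via-printer</code> is made up for this sketch.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-copy-via-printer (object)
  "Serialize OBJECT with the printer and rebuild it with the reader."
  (read (prin1-to-string object)))

(let ((table (make-hash-table :test 'equal)))
  (puthash "feed-id" '(1 2 3) table)
  ;; The copy is a distinct hash table with the same contents.
  (gethash "feed-id" (my-copy-via-printer table)))
;; =&gt; (1 2 3)
</code></pre></div></div>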

<p>Unfortunately, this is where I hit another serious database
constraint: <a href="http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-08/msg00860.html"><strong><code class="language-plaintext highlighter-rouge">print-circle</code> is broken in Emacs 24.3</strong></a>,
the current stable release. This means Elfeed cannot take advantage of
this useful feature, at least not for a long time, as I had been
counting on. The final database is slightly slower and larger than
strictly required as a result.</p>
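
<p>To make concrete what is lost, this sketch shows what <code class="language-plaintext highlighter-rouge">print-circle</code>
is meant to preserve, assuming an Emacs where it works as documented.
The helper <code class="language-plaintext highlighter-rouge">my-roundtrip</code> is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-roundtrip (object circle)
  "Print OBJECT and read it back with `print-circle' bound to CIRCLE."
  (let ((print-circle circle))
    (read (prin1-to-string object))))

(let* ((entry (list "title" "link"))
       (shared (list entry entry))          ; the same object, twice
       (with (my-roundtrip shared t))
       (without (my-roundtrip shared nil)))
  (list (eq (car with) (cadr with))         ; identity survives
        (eq (car without) (cadr without)))) ; identity is lost
</code></pre></div></div>

<p>With the feature on, the two elements are still one object after
reading. With it off they come back as two separate but equal lists.</p>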

<h3 id="the-content-database">The Content Database</h3>

<p>After breaking the circular references of the in-memory database I
finally had persistence for the first time. With the naive
printer/reader approach it was slow, almost 1 second to write just a
few thousand entries on my 6-year-old laptop (my minimum requirements
target machine). I wanted Elfeed to support hundreds of thousands of
entries, if not millions, so this was much too slow.</p>

<p>The big slowdown was writing out all the entry content each time the
database is saved. These are large strings containing HTML that rarely
change. There’s no reason to write these out every time, nor is there
a reason to even keep them in memory all the time, as they’re rarely
accessed. The solution is a loose-file, content-addressable database,
very similar to an unpacked Git object database.</p>

<p>The content database stores immutable sequences of characters — not
just raw bytes, but rather multibyte strings — using an unspecified
coding system (right now it’s UTF-8 for all platforms). The filename
for the content is the content hashed with SHA-1
(“content-addressable”). To limit the number of files per directory,
these files are stored in subdirectories named by the first
hex-encoded byte of the hash (just like Git). A database of 4 items
might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data/
   18/
      18ff6f11945b1e9f3e3c4cae8b5275d36b9944e1
      184c06a83f0bc73a8345c6d886f9043bcae095f8
   6b/
      6b59ae257f2bea24703d8adf5747049c138dfc82
   cc/
      cc47d53872ae2a9186151ef1a68392a94e1f091f
</code></pre></div></div>
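
<p>The layout above can be sketched in a few lines. This is an
illustration only, not Elfeed’s own code: the function name
<code class="language-plaintext highlighter-rouge">my-content-path</code> is made up, and encoding the content as UTF-8 before
hashing is an assumption.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-content-path (root content)
  "Return the file name under ROOT where CONTENT would be stored."
  (let ((hash (secure-hash 'sha1 (encode-coding-string content 'utf-8 t))))
    ;; Two hex digits are one byte of the hash and name the subdirectory.
    (expand-file-name hash (expand-file-name (substring hash 0 2) root))))

(my-content-path "data" "Hello, world!")
</code></pre></div></div>

<p>Two hex digits give at most 256 subdirectories, so each one holds
roughly 1/256 of the files.</p>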

<p>Something really neat about the content database is that it’s
completely agnostic about Elfeed. If it weren’t for Elfeed’s garbage
collector, anyone could use it to store arbitrary content. The
function <code class="language-plaintext highlighter-rouge">elfeed-ref</code> accepts a string and returns a reference into
the database. Because of the hash, providing the same string in the
future will return the same reference without actually performing a
write. References are dereferenced with <code class="language-plaintext highlighter-rouge">elfeed-deref</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="s">"Hello, world!"</span><span class="p">))</span>
<span class="c1">;; =&gt; [cl-struct-elfeed-ref "943a702d06f34599aee1f8da8ef9f7296031d699"]</span>

<span class="p">(</span><span class="nv">elfeed-deref</span> <span class="nv">ref</span><span class="p">)</span>
<span class="c1">;; =&gt; "Hello, world!"</span>
</code></pre></div></div>

<p>With content stored elsewhere, entries are a struct containing only
some small metadata: title, link, date, and a content database
reference. Writing out many of them at once is much, much faster.</p>

<p>I don’t expect it happens often, but this also means content is
de-duplicated. If two entries happen to have the same content they’ll
share content database storage. A small savings.</p>
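
<p>The de-duplication falls out of the addressing scheme, as this toy
in-memory version shows. The names <code class="language-plaintext highlighter-rouge">my-store</code>, <code class="language-plaintext highlighter-rouge">my-ref</code>, and
<code class="language-plaintext highlighter-rouge">my-deref</code> are hypothetical stand-ins, and a hash table replaces the
files on disk.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-store (make-hash-table :test 'equal)
  "Toy stand-in for the on-disk content database.")

(defun my-ref (content)
  "Store CONTENT if it is new and return its SHA-1 hash as a reference."
  (let ((hash (secure-hash 'sha1 content)))
    (unless (gethash hash my-store)   ; identical content is stored once
      (puthash hash content my-store))
    hash))

(defun my-deref (ref)
  "Return the content that REF refers to, or nil if there is none."
  (gethash ref my-store))
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">my-ref</code> twice with equal strings returns equal references and
leaves a single item in the store.</p>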

<p>At this point it’s really tempting to get fancier and really put this
content database to use. The core index itself could be stored as raw
content, and the root to accessing the database would be a single
SHA-1 hash referencing it — again, <em>very</em> similar to Git. If an index
stores a reference to the previously written index, then the
Elfeed database would be an immutable structure tracking its entire
history. Such a change would cost virtually nothing in performance,
just disk space.</p>

<h3 id="multiple-representations">Multiple Representations</h3>

<p>With all the content out of the way, the database is now just a lean
index. At this point it’s a hash table mapping feed IDs to feeds.
Each feed contains a list of its entries. To build the entry listing for
the elfeed-search buffer, Elfeed needs to visit each feed in the hash
table, gather its entries into one giant list, then finally sort that
list by date. At around O(n log n), that sort operation is a real
performance killer. Completely unacceptable. To fix this we need to
think about how the data is updated and used.</p>

<p>First, <strong>entries are <em>always</em> viewed in date order</strong>, no exceptions.
From my experience of using web feeds for the last six years I <em>never</em>
had a reason to list feed entries by any other order. The vast
majority of the time, newer entries are most relevant, and if I need
to look for something specific I can search for it.</p>

<p>We definitely want to store entries in date-order so we can create
entry listings without performing a sort: something around O(n) or so.
Inserting new entries into this structure should also be efficient.</p>

<p>Second, <strong>entries are never <em>removed</em> from the database</strong>. This isn’t
e-mail. Even if a user doesn’t want to see an entry again, we have to
keep track of it. Otherwise it will show up as new if it’s discovered
in a feed again, which is likely. Things are added to the database and
never removed. In Elfeed, I use a <code class="language-plaintext highlighter-rouge">junk</code> tag to completely hide
entries I don’t want to see, and I always have a <code class="language-plaintext highlighter-rouge">-junk</code> element in my
filter.</p>

<p>There’s an important caveat to this one that I had missed until after
the public release: entry dates can change! When a previously
discovered entry is read from a feed, Elfeed updates (read: mutates)
the entry struct to reflect the new state. This includes the date.
It’s very likely that a date-sorted representation won’t tolerate date
changes underneath it since it’s keying off of them. Either we refuse
to update the entry date, or we remove the entry, update the date, and
then re-insert it (how it currently works).</p>

<p>Third, <strong>entries are generally added with a recent date</strong>. After the
database is initially populated, it’s only picking up new items.
Adding recently-dated entries should be faster than adding
older entries. I didn’t get a chance to take advantage of this, but
it’s something to keep in mind.</p>

<p>Fourth, <strong>entries need to be keyed by an ID string</strong>. Each entry has a
unique, unchanging identifier string, either provided by the feed
itself (RSS’s <code class="language-plaintext highlighter-rouge">guid</code> or Atom’s <code class="language-plaintext highlighter-rouge">id</code>) or generated intelligently by
Elfeed. Especially because of the <code class="language-plaintext highlighter-rouge">print-circle</code> bug, we need to be
able to talk about feeds in terms of their ID — an indirect pointer.</p>

<p>(Actually, even when RSS <code class="language-plaintext highlighter-rouge">guid</code> tags are present, they’re permalinks
by default. So, unfortunately, RSS IDs are not at all resistant to
collisions across feeds. To work around this, entry identifiers are a
<em>pair</em> of strings: feed ID and entry ID. Atom doesn’t have this
problem, but we’re stuck with the lowest common denominator.)</p>

<p>A date-oriented representation would be unable to efficiently look up
an entry by its ID, so it needs to be supplemented by an ID-oriented
representation. This means we need two representations in our
database: date-oriented and ID-oriented.</p>

<p>So what do we use? Well, for keeping entries sorted by date we want
some sort of balanced tree. A B-tree is probably a good choice. Rather
than write one I went with an AVL tree since Emacs comes with a
library for it (<code class="language-plaintext highlighter-rouge">avl-tree</code>). It’s already debugged and optimized! The
bad news is that the internal structure is unspecified, so there are
no guarantees that it can be serialized. A future update to the
library may break the Elfeed database. I also had to hack into it to
work around a security issue. The comparison function is embedded in
the tree. After deserializing the database, Elfeed needs to ensure
that no one stuck a malicious function in there.</p>
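
<p>For a feel of the library, here is a minimal sketch that keeps
entries in date order as they are inserted. The <code class="language-plaintext highlighter-rouge">(DATE . TITLE)</code> pairs
and the function name <code class="language-plaintext highlighter-rouge">my-titles-by-date</code> are made up for
illustration and are not Elfeed’s entry struct.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'avl-tree)

(defun my-titles-by-date (entries)
  "Return the titles of ENTRIES, newest first, without an explicit sort.
ENTRIES is a list of hypothetical (DATE . TITLE) pairs."
  ;; The comparison swaps its arguments so that larger dates come first.
  (let ((tree (avl-tree-create (lambda (a b) (car-less-than-car b a)))))
    (dolist (entry entries)
      (avl-tree-enter tree entry))   ; the tree rebalances on each insert
    (mapcar #'cdr (avl-tree-flatten tree))))

(my-titles-by-date '((200 . "second") (300 . "newest") (100 . "oldest")))
;; =&gt; ("newest" "second" "oldest")
</code></pre></div></div>

<p>One caveat of this sketch: two entries with the same date compare as
equal, so the second replaces the first. A real index needs a
tie-breaker in the comparison.</p>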

<p>The choice for an ID database was super-easy: a hash table. Due to the
<code class="language-plaintext highlighter-rouge">print-circle</code> bug, this is actually the main representation. The AVL
tree only stores IDs and it has to reach into the hash table to do any
date comparisons. If <code class="language-plaintext highlighter-rouge">print-circle</code> was working I could store the same
exact entry objects in the AVL tree as the hash table, so mutating
them would update them in all representations. However, with
<code class="language-plaintext highlighter-rouge">print-circle</code> off, on deserialization these would become unique
objects and updates would break.</p>
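
<p>The two representations can be sketched together as follows. This is
a simplified model under stated assumptions, not Elfeed’s actual
code: entries are plain <code class="language-plaintext highlighter-rouge">(DATE . TITLE)</code> pairs, IDs are single strings,
and every name starting with <code class="language-plaintext highlighter-rouge">my-</code> is hypothetical. The tree holds
only IDs and its comparison reaches into the hash table for dates.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'avl-tree)

(defvar my-entries (make-hash-table :test 'equal)
  "ID-oriented index: maps an entry ID to a (DATE . TITLE) pair.")

(defun my-id-before-p (a b)
  "Return non-nil if entry ID A sorts before entry ID B, newest first."
  (let ((ea (gethash a my-entries))
        (eb (gethash b my-entries)))
    (cond ((car-less-than-car eb ea) t)    ; A is newer than B
          ((car-less-than-car ea eb) nil)  ; A is older than B
          (t (string-lessp a b)))))        ; same date: fall back to the ID

(defvar my-by-date (avl-tree-create #'my-id-before-p)
  "Date-oriented index: an AVL tree of entry IDs.")

(defun my-add (id date title)
  "Record a new entry in both representations."
  (puthash id (cons date title) my-entries)
  (avl-tree-enter my-by-date id))

(defun my-redate (id new-date)
  "Change the date of entry ID without corrupting the tree."
  (avl-tree-delete my-by-date id)   ; remove while the old date still applies
  (setcar (gethash id my-entries) new-date)
  (avl-tree-enter my-by-date id))
</code></pre></div></div>

<p>The order of operations in <code class="language-plaintext highlighter-rouge">my-redate</code> matters. Mutating the date
first would leave the tree unable to find the ID at its old position.</p>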

<h3 id="the-future">The Future</h3>

<p>That’s where the database is today. I put in a few extra fields that
aren’t actually used yet, so that there’s room to make a few changes
without breaking the database. Perhaps someday I’ll work out a whole
new database structure, or maybe a proper database library will come
into existence, and this post will simply document the old database.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>A Handy Emacs Package Configuration Macro</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/02/"/>
    <id>urn:uuid:3e64c259-69a6-3500-0e87-3a93d61c1644</id>
    <updated>2013-06-02T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p><em>Update April 2015</em>: I now use <a href="https://github.com/jwiegley/use-package">use-package</a> instead of
the <code class="language-plaintext highlighter-rouge">with-package</code> macro explained below. It’s cleaner, nicer, and
better maintained.</p>

<p>I was inspired by <a href="http://milkbox.net/note/single-file-master-emacs-configuration/">a post recently written by Milkypostman</a>
(the M in MELPA). He describes some of his <code class="language-plaintext highlighter-rouge">init.el</code> configuration,
specifically focusing on an <code class="language-plaintext highlighter-rouge">after</code> macro that wraps the misdesigned
<code class="language-plaintext highlighter-rouge">eval-after-load</code> function. I wanted to take this macro further in
three ways:</p>

<ul>
  <li>
    <p>The delayed expression should be <a href="http://lunaryorn.com/blog/2013/05/31/byte-compiling-eval-after-load/">properly byte-compiled</a>,
which doesn’t happen by default with <code class="language-plaintext highlighter-rouge">eval-after-load</code>.</p>
  </li>
  <li>
    <p>In a few cases my expression depends on multiple, independent
packages but <code class="language-plaintext highlighter-rouge">eval-after-load</code> only accepts one.</p>
  </li>
  <li>
    <p>If I’m specifying packages when using my macro, why bother listing
them at the top of my initialization file? I could DRY things up by
learning what packages to install when the macro is used. Here’s
the kicker: <strong>I can pretend that every available package is already
installed like built-in packages!</strong></p>
  </li>
</ul>

<p>The result is a pair of macros <code class="language-plaintext highlighter-rouge">with-package</code> and <code class="language-plaintext highlighter-rouge">with-package*</code>
which can be found in <a href="https://github.com/skeeto/.emacs.d/blob/master/lisp/package-helper.el">package-helper.el</a>. The latter form
doesn’t wait but immediately loads the specified packages with
<code class="language-plaintext highlighter-rouge">require</code>. It’s shaped just like Milkypostman’s <code class="language-plaintext highlighter-rouge">after</code> macro, except
that it can accept a list of packages in place of a single symbol.
Also, the package names aren’t quoted; they don’t need to be since
this is a macro instead of a function.</p>
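
<p>A rough sketch of the delayed, multiple-package part is below. It is
not the code in package-helper.el. It assumes <code class="language-plaintext highlighter-rouge">with-eval-after-load</code>,
which to my knowledge arrived in Emacs 24.4 and so postdates this
article, and it leaves out package installation entirely. The name
<code class="language-plaintext highlighter-rouge">my-after</code> is made up.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defmacro my-after (features &amp;rest body)
  "Run BODY once every feature in FEATURES has been loaded.
FEATURES is an unquoted symbol or an unquoted list of symbols."
  (declare (indent 1))
  (let ((form `(progn ,@body)))
    ;; Nest one wrapper per feature, with the first feature outermost.
    (dolist (feature (reverse (if (listp features) features (list features))))
      (setq form `(with-eval-after-load ',feature ,form)))
    form))
</code></pre></div></div>

<p>Because the body ends up inside a real lambda rather than a quoted
form, the byte compiler gets to see it.</p>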

<p>Here’s a typical use case for each macro. That <code class="language-plaintext highlighter-rouge">expose</code> higher-order
function is <a href="/blog/2010/09/29/">from my personal <code class="language-plaintext highlighter-rouge">utility</code> library</a>. The
expressions to be evaluated depend on both packages and neither needs
to be loaded immediately, so I’m using the first form of the macro.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-package</span> <span class="p">(</span><span class="nv">skewer-mode</span> <span class="nv">utility</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">skewer-setup</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">define-key</span> <span class="nv">skewer-mode-map</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-c $"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">expose</span> <span class="nf">#'</span><span class="nv">skewer-bower-load</span> <span class="s">"jquery"</span> <span class="s">"1.9.1"</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">with-package*</span> <span class="nv">smex</span>
  <span class="p">(</span><span class="nv">smex-initialize</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"M-x"</span><span class="p">)</span> <span class="ss">'smex</span><span class="p">))</span>
</code></pre></div></div>

<p>For the second one, I’m going to be using smex right away (takes over
<code class="language-plaintext highlighter-rouge">M-x</code>), so I use the second form, which immediately loads smex. The
macro isn’t really necessary at all here since I could just use
<code class="language-plaintext highlighter-rouge">require</code> and follow it with these expressions, but I <em>really</em> like
how this organizes my <code class="language-plaintext highlighter-rouge">init.el</code>. It creates a domain-specific language
(DSL) just for Emacs configuration. Each package configuration is
grouped up in a clean <code class="language-plaintext highlighter-rouge">let</code>-like form. Since I’ve added syntax
highlighting to <code class="language-plaintext highlighter-rouge">with-package</code> it looks very elegant. Normal syntax
highlighters aren’t going to do this, so here’s a screenshot of my
buffer.</p>

<p><img src="/img/emacs/with-package.png" alt="" /></p>

<p>JavaScript developers with a keen eye may notice a familiar pattern
here. This macro is shaped a bit like the
<a href="https://github.com/amdjs/amdjs-api/wiki/AMD">Asynchronous Module Definition (AMD)</a>, with asynchrony in
mind. Since this is Lisp with a powerful macro system, I get to hide
away the function wrapper part.</p>

<p>Using this macro has caused me to use <code class="language-plaintext highlighter-rouge">eval-after-load</code> with just
about everything. This has cut my initialization time down to about
10% of what it was before! On those occasions that I <em>do</em> restart
Emacs, it’s really nice that it’s back to under 1 second (0.6 seconds
vs 6 seconds).</p>

<h3 id="the-problem-of-eval-after-load">The problem of eval-after-load</h3>

<p>I’m calling <code class="language-plaintext highlighter-rouge">eval-after-load</code> poorly designed because it’s a perfect
example of an inappropriate use of <code class="language-plaintext highlighter-rouge">eval</code>. In function form it
<em>should</em> have accepted a function as its second argument instead of an
s-expression, so it would work like a hook. This is even more
inappropriate now that Emacs has proper lexical closures, which is the
perfect mechanism for delayed evaluation. <strong>The whole point of
<code class="language-plaintext highlighter-rouge">eval-after-load</code> is to speed up Emacs initialization time, but <em>using
<code class="language-plaintext highlighter-rouge">eval</code> is slow</em></strong>. To the compiler, this isn’t code, just data. This
means no byte-compilation and no compiler warnings.</p>

<p>A possible alternative design for <code class="language-plaintext highlighter-rouge">eval-after-load</code> would be a hook
named something like <code class="language-plaintext highlighter-rouge">&lt;package&gt;-load-hook</code>. Then when <code class="language-plaintext highlighter-rouge">load</code> or
<code class="language-plaintext highlighter-rouge">require</code> loads a file, it runs the hook with the matching name. This
removes <code class="language-plaintext highlighter-rouge">eval-after-load</code> as its own standalone language concept.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'skewer-mode-load-hook</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="o">...</span><span class="p">))</span>
</code></pre></div></div>

<p>The problem here is when the package is already loaded the hook is
never run. In contrast, when <code class="language-plaintext highlighter-rouge">eval-after-load</code> is used on an
already-loaded package, the expression is immediately evaluated.</p>

<p>Given this, if there was something I could change about this it would
simply be for <code class="language-plaintext highlighter-rouge">eval-after-load</code>, whatever it would be called, to take
a function for the second argument. I would also provide a simple
macro just like <code class="language-plaintext highlighter-rouge">after</code> that wraps this function. Why not just a
macro? The function form would be really useful for a situation like
this,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'skewer-mode</span> <span class="nf">#'</span><span class="nv">skewer-setup</span><span class="p">)</span>
</code></pre></div></div>

<p>Here there’s no need to instantiate a new anonymous function or
s-expression. If all it’s doing is calling a zero-arity function, that
function can be passed in directly.</p>
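
<p>Recent Emacs releases document that <code class="language-plaintext highlighter-rouge">eval-after-load</code> also accepts a
function in place of a form. Assuming such a version, the sketch below
shows that, along with the immediate call on an already-loaded feature.
The names <code class="language-plaintext highlighter-rouge">my-feature</code>, <code class="language-plaintext highlighter-rouge">my-setup</code>, and <code class="language-plaintext highlighter-rouge">my-setup-count</code> are
hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-setup-count 0
  "Number of times `my-setup' has run.")

(defun my-setup ()
  "Stand-in for a package's zero-argument setup function."
  (setq my-setup-count (1+ my-setup-count)))

(provide 'my-feature)   ; pretend the package is already loaded

;; The feature is present, so the function is called right away.
(eval-after-load 'my-feature #'my-setup)
</code></pre></div></div>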

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Prototype-based Elisp Objects with @</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/04/07/"/>
    <id>urn:uuid:d1361157-9022-3e77-270c-5410d903c7d4</id>
    <updated>2013-04-07T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p><strong>Reflection from the future</strong>: <em>This library is super slow and
inefficient. It should probably not be used for anything serious.</em></p>

<p>Last weekend I had the itch to play around with a multiple-inheritance
prototype-based object system in lisp. It would
<a href="http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html">look a lot like JavaScript’s object system</a> but I wanted to try
experimenting with some different ideas. My favorite lisp to hack in is
Emacs Lisp, so that’s what I built it on. What I ended up with is
actually pretty neat. Despite the lack of reader macros in Elisp, I
still managed to introduce new syntax by manipulating symbols at
compile time.</p>

<ul>
  <li><a href="https://github.com/skeeto/at-el">https://github.com/skeeto/at-el</a></li>
</ul>

<p>See the README for a quick demonstration. What follows is the long
explanation.</p>

<p>It’s called <a href="https://github.com/skeeto/at-el">@</a>, due to the syntax that it adds to Elisp as a
domain-specific language. It’s a mini-language, really. The name is also
a challenge to the code that supports Elisp, because so much of it —
including emacs-lisp-mode and Paredit — doesn’t properly handle @ in
identifiers. <del>Even <a href="https://github.com/bhollis/maruku">Maruku</a>, the Markdown to HTML translator I
use for this blog, has bugs that won’t allow it to handle the @
characters in my code, so I had to forgo most syntax highlighting for
this post.</del> (Update: I now use Kramdown so this is no longer an issue.)</p>

<p>Fortunately <code class="language-plaintext highlighter-rouge">require</code> <em>does</em> manage just fine.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'@</span><span class="p">)</span>
</code></pre></div></div>

<p>Objects in @ are vectors with the symbol @ as the first element. The
rest of the elements are implementation specific, but, at the moment,
the second element is a plist (property list) of all of that object’s
properties.</p>

<p>The root object of @ is @, and all other objects are instances of this
object, either directly or indirectly. Because it’s prototype based,
creating a new object is a matter of extending one or more
(multiple-inheritance) existing objects. This is done with the
function <code class="language-plaintext highlighter-rouge">@extend</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Create a brand new object</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">@extend</span> <span class="nv">@</span><span class="p">))</span>
</code></pre></div></div>

<p>If no objects are given to <code class="language-plaintext highlighter-rouge">@extend</code>, @ will be used as the parent
object, so it’s not necessary as an argument above. This is actually
very important, as objects that don’t inherit from @ will not work at
all! I’ll get into that detail in a bit. Additionally, <code class="language-plaintext highlighter-rouge">@extend</code>
accepts keyword arguments, which become properties on the created
object.</p>

<p>The function @ is used to access properties on an object. Remember,
Elisp is a <em>lisp-2</em> meaning that variables and functions exist in
their own namespaces. This means there can be both a variable @ (the
root object) and function @ (property accessor).</p>
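
<p>A tiny illustration of the two namespaces, using a made-up name
rather than @ itself:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-thing 'variable-value)      ; value cell of the symbol
(defun my-thing () 'function-value)    ; function cell of the same symbol

;; Position decides which cell is consulted.
(list my-thing (my-thing))
;; =&gt; (variable-value function-value)
</code></pre></div></div>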

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">rectangle</span> <span class="p">(</span><span class="nv">@extend</span> <span class="ss">:width</span> <span class="mi">3</span> <span class="ss">:height</span> <span class="mi">4</span><span class="p">))</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:width</span><span class="p">)</span>  <span class="c1">; =&gt; 3</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:height</span><span class="p">)</span>  <span class="c1">; =&gt; 4</span>
</code></pre></div></div>

<p>The @ function is also <em>setf-able</em>, so setting properties should be
obvious to any lisper.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:width</span><span class="p">)</span> <span class="mi">13</span><span class="p">)</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:width</span><span class="p">)</span>  <span class="c1">; =&gt; 13</span>
</code></pre></div></div>

<p>Like JavaScript, methods are just functions stored in properties on an
object. In @, the first argument for a method is the object itself,
which is called @@ by convention.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:area</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">@@</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">@@</span> <span class="ss">:width</span><span class="p">)</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">@@</span> <span class="ss">:height</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:area</span><span class="p">)</span> <span class="nv">rectangle</span><span class="p">)</span>  <span class="c1">; =&gt; 52</span>
</code></pre></div></div>

<h3 id="new-syntax">New Syntax</h3>

<p>Here’s the first really neat part. I find all that <code class="language-plaintext highlighter-rouge">(@ @@ ...)</code>
business to be visually unpleasing. Fortunately this can be fixed by
adding syntax. The macro <code class="language-plaintext highlighter-rouge">def@</code> transforms variables that look like @:
into these @ accessors. The following declaration is equivalent to the
lambda assignment above. It’s meant to be very convenient.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">rectangle</span> <span class="ss">:area</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">*</span> <span class="nv">@:width</span> <span class="nv">@:height</span><span class="p">))</span>
</code></pre></div></div>

<p>This macro walks the body of the function at compile-time (macro
expansion time) and transforms these symbols into the full @ calls
above. Like most lisp macros, this has <em>no</em> run-time performance cost.</p>
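
<p>To make the tree-walking idea concrete, here’s a minimal sketch of such a
transformation. It is not the actual <code class="language-plaintext highlighter-rouge">def@</code> implementation: the name
<code class="language-plaintext highlighter-rouge">my-@-expand</code> is made up for illustration, and this naive version ignores
quoted data and the function-position case.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical sketch; not the real def@ code walker.
(defun my-@-expand (form)
  "Return FORM with every symbol named @:NAME replaced by (@ @@ :NAME)."
  (cond ((and (symbolp form)
              (string-prefix-p "@:" (symbol-name form)))
         ;; "@:width" -&gt; (@ @@ :width)
         (list '@ '@@ (intern (substring (symbol-name form) 1))))
        ((consp form)
         ;; Recurse into both halves of the cons cell.
         (cons (my-@-expand (car form))
               (my-@-expand (cdr form))))
        (t form)))

(my-@-expand '(* @:width @:height))
;; =&gt; (* (@ @@ :width) (@ @@ :height))
</code></pre></div></div>

<p>A macro can call a function like this on its body and return the result,
so the rewriting happens once, when the macro is expanded.</p>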

<p>Because using <code class="language-plaintext highlighter-rouge">funcall</code> all the time and remembering to pass the
object as the first argument is tedious, the @! function is provided
for calling methods.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">@!</span> <span class="nv">rectangle</span> <span class="ss">:area</span><span class="p">)</span>  <span class="c1">; =&gt; 52</span>
</code></pre></div></div>

<p>The @: variables become function calls when in function position.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">rectangle</span> <span class="ss">:double-area</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">*</span> <span class="mi">2</span> <span class="p">(</span><span class="nv">@:area</span><span class="p">))</span>
</code></pre></div></div>

<p>In a <em>lisp-1</em> this would happen for free, but in Elisp this situation
expands to the @! form.</p>

<h3 id="inheritance">Inheritance</h3>

<p>This <code class="language-plaintext highlighter-rouge">rectangle</code> is starting to look like a nice re-usable object.
There’s a @ convention for this: prefix “class” object names with @.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">@rectangle</span> <span class="nv">rectangle</span><span class="p">)</span>
</code></pre></div></div>

<p>Now to create new rectangle objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">@extend</span> <span class="nv">@rectangle</span> <span class="ss">:width</span> <span class="mi">3</span> <span class="ss">:height</span> <span class="mf">7.1</span><span class="p">))</span>
<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:area</span><span class="p">)</span>  <span class="c1">; =&gt; 21.3</span>
</code></pre></div></div>

<p>Notice that the <code class="language-plaintext highlighter-rouge">foo</code> object doesn’t actually have an <code class="language-plaintext highlighter-rouge">:area</code> property
on itself. It was found on its parent, <code class="language-plaintext highlighter-rouge">@rectangle</code>, by inheritance.
<code class="language-plaintext highlighter-rouge">:width</code> and <code class="language-plaintext highlighter-rouge">:height</code> were not looked up on the parent because
they’re already bound on <code class="language-plaintext highlighter-rouge">foo</code>.</p>

<p>Here’s another re-usable prototype. Notice that @: variables are
also setf-able — using <code class="language-plaintext highlighter-rouge">push</code> in this case.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@colored</span> <span class="p">(</span><span class="nv">@extend</span> <span class="ss">:color</span> <span class="p">()))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@colored</span> <span class="ss">:mix</span> <span class="p">(</span><span class="nv">color</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">push</span> <span class="nv">color</span> <span class="nv">@:color</span><span class="p">))</span>
</code></pre></div></div>

<p>The object system has multiple-inheritance, so colored rectangles can
be created from these two objects. The parent objects of an object are
listed in the <code class="language-plaintext highlighter-rouge">:proto</code> property as a list (similar to JavaScript’s
<code class="language-plaintext highlighter-rouge">__proto__</code>), which can be modified at any time to change an object’s
prototype chain.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">@extend</span> <span class="nv">@colored</span> <span class="nv">@rectangle</span> <span class="ss">:width</span> <span class="mi">10</span> <span class="ss">:height</span> <span class="mi">4</span><span class="p">))</span>

<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:area</span><span class="p">)</span>  <span class="c1">; =&gt; 40</span>
<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:mix</span> <span class="ss">:red</span><span class="p">)</span>
<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:mix</span> <span class="ss">:blue</span><span class="p">)</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">foo</span> <span class="ss">:color</span><span class="p">)</span>  <span class="c1">; =&gt; (:blue :red)</span>
</code></pre></div></div>

<p>Even though the initial property was read from the parent, the
assignment (<code class="language-plaintext highlighter-rouge">push</code>), like all assignments, actually occurred on <code class="language-plaintext highlighter-rouge">foo</code>.</p>
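
<p>The read-from-parent, write-on-self rule can be modeled in a few lines.
This is only a toy with single inheritance, assuming objects are plain
<code class="language-plaintext highlighter-rouge">(PARENT . ALIST)</code> cons cells. Real @ objects are represented differently,
and the <code class="language-plaintext highlighter-rouge">my-proto-</code> names are hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Toy model only: an object is (PARENT . ALIST).
(defun my-proto-get (object property)
  "Look up PROPERTY on OBJECT, falling back to its parent chain."
  (when object
    (let ((binding (assq property (cdr object))))
      (if binding
          (cdr binding)
        (my-proto-get (car object) property)))))

(defun my-proto-set (object property value)
  "Bind PROPERTY to VALUE directly on OBJECT, never on a parent."
  (let ((binding (assq property (cdr object))))
    (if binding
        (setcdr binding value)
      (setcdr object (cons (cons property value) (cdr object))))
    value))

(let* ((parent (cons nil (list (cons :color nil))))
       (child  (cons parent nil)))
  ;; The same read-then-write sequence that `push' performs.
  (my-proto-set child :color (cons :red (my-proto-get child :color)))
  (list (my-proto-get child :color)
        (my-proto-get parent :color)))
;; =&gt; ((:red) nil)
</code></pre></div></div>

<p>The parent’s <code class="language-plaintext highlighter-rouge">:color</code> is untouched, while the child now shadows it with its
own binding.</p>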

<h3 id="setters-and-getters">Setters and Getters</h3>

<p>Remember how I said that objects that don’t eventually inherit from @
will be broken? This is because properties are actually set and
accessed through <code class="language-plaintext highlighter-rouge">:set</code> and <code class="language-plaintext highlighter-rouge">:get</code> methods. That is, @ calls these
methods as needed. The @ object provides the default actions for
these. An interesting part of the @ code: initially setting <code class="language-plaintext highlighter-rouge">:set</code> on
@ is a circularity problem, so there’s a special bootstrap step to
accomplish it.</p>

<p>By providing your own you can fundamentally change how your object
works. For example, here’s an <code class="language-plaintext highlighter-rouge">@immutable</code> mix-in which prevents all
property assignments. It’s provided as part of @.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@immutable</span> <span class="p">(</span><span class="nv">@extend</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@immutable</span> <span class="ss">:set</span> <span class="p">(</span><span class="nv">property</span> <span class="nv">_value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">error</span> <span class="s">"Object is immutable, cannot set %s"</span> <span class="nv">property</span><span class="p">))</span>
</code></pre></div></div>

<p>This <code class="language-plaintext highlighter-rouge">:set</code> method will be found before the @ <code class="language-plaintext highlighter-rouge">:set</code> method, so it
gets overridden.</p>

<p>Remember how I said all objects have a <code class="language-plaintext highlighter-rouge">:proto</code> that can be used to
modify the object’s inheritance? This can be used to <em>freeze</em> an
object’s properties in place. Here’s a <code class="language-plaintext highlighter-rouge">:freeze</code> method for all
objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">@</span> <span class="ss">:freeze</span> <span class="p">()</span>
  <span class="s">"Make this object immutable."</span>
  <span class="p">(</span><span class="nb">push</span> <span class="nv">@immutable</span> <span class="nv">@:proto</span><span class="p">))</span>
</code></pre></div></div>

<p>Pretty cool, eh?</p>

<p>The <code class="language-plaintext highlighter-rouge">:get</code> method can be used to provide virtual properties.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@squares</span> <span class="p">(</span><span class="nv">@extend</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@squares</span> <span class="ss">:get</span> <span class="p">(</span><span class="nv">property</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">numberp</span> <span class="nv">property</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">expt</span> <span class="nv">property</span> <span class="mi">2</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">@^:get</span> <span class="nv">property</span><span class="p">)))</span>  <span class="c1">; explained in a moment</span>

<span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">@squares</span> <span class="nv">n</span><span class="p">))</span> <span class="o">'</span><span class="p">(</span><span class="mi">0</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">))</span>
<span class="c1">; =&gt; (0 1 4 9 16)</span>
</code></pre></div></div>

<p>I use this technique in the <code class="language-plaintext highlighter-rouge">@vector</code> class under <code class="language-plaintext highlighter-rouge">lib/</code> to expose the
elements of the internal vector as if they were properties.
<a href="http://50ply.com/">Brian</a> used this trick to make a @buffer prototype that wraps
Emacs’ buffers, with methods provided virtually by <code class="language-plaintext highlighter-rouge">:get</code>. For
example, the <code class="language-plaintext highlighter-rouge">:string</code> property would return a lambda that calls
<code class="language-plaintext highlighter-rouge">buffer-string</code>.</p>

<p>With multiple-inheritance and these setters and getters, there are a
lot of interesting mix-in possibilities. I’m only just discovering
some of them now.</p>

<h3 id="supermethods">Supermethods</h3>

<p>Sometimes it’s really useful to call supermethods. There’s syntax for
this: @^:. This calls the next method of that name in the prototype
chain. For example, here’s a <code class="language-plaintext highlighter-rouge">@watchable</code> mix-in (also provided by @)
that allows other code to be notified of changes to an object. It
needs to override <code class="language-plaintext highlighter-rouge">:set</code> but still call the original <code class="language-plaintext highlighter-rouge">:set</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@watchable</span> <span class="p">(</span><span class="nv">@extend</span> <span class="ss">:watchers</span> <span class="no">nil</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@watchable</span> <span class="ss">:watch</span> <span class="p">(</span><span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">push</span> <span class="nv">callback</span> <span class="nv">@:watchers</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@watchable</span> <span class="ss">:unwatch</span> <span class="p">(</span><span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">@:watchers</span> <span class="p">(</span><span class="nb">remove</span> <span class="nv">callback</span> <span class="nv">@:watchers</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@watchable</span> <span class="ss">:set</span> <span class="p">(</span><span class="nv">property</span> <span class="nv">new</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">callback</span> <span class="nv">@:watchers</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">callback</span> <span class="nv">@@</span> <span class="nv">property</span> <span class="nv">new</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">@^:set</span> <span class="nv">property</span> <span class="nv">new</span><span class="p">))</span>
</code></pre></div></div>

<p>This behavior is also used for constructors. By convention, the
<code class="language-plaintext highlighter-rouge">:init</code> method is the constructor. It should generally call the next
constructor with <code class="language-plaintext highlighter-rouge">(@^:init)</code>. @ has a no-op, no-argument <code class="language-plaintext highlighter-rouge">:init</code>
method to bottom-out this process.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">@rectangle</span> <span class="ss">:init</span> <span class="p">(</span><span class="nv">width</span> <span class="nv">height</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">@^:init</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">@:width</span> <span class="nv">width</span> <span class="nv">@:height</span> <span class="nv">height</span><span class="p">))</span>

<span class="p">(</span><span class="nv">@!</span> <span class="p">(</span><span class="nv">@!</span> <span class="nv">@rectangle</span> <span class="ss">:new</span> <span class="mf">13.2</span> <span class="mf">2.1</span><span class="p">)</span> <span class="ss">:area</span><span class="p">)</span> <span class="c1">; =&gt; 27.72</span>
</code></pre></div></div>

<p>As shown, the <code class="language-plaintext highlighter-rouge">:new</code> method provided by the @ object combines both
<code class="language-plaintext highlighter-rouge">@extend</code> and <code class="language-plaintext highlighter-rouge">:init</code> to provide simple single-object inheritance.</p>

<h3 id="the-cost-of-">The Cost of @</h3>

<p>In the lib/ directory there are a bunch of example objects
implemented: including @vector, @queue, @stack, and @heap. I found
these to be very enjoyable to write, and they’ve been the testing
grounds for @. @heap uses an internal @vector instance and exercises
@’s features the most.</p>

<p>The performance cost of @ is very apparent with @heap. Even byte-compiled,
it’s slower than the naive implementation (compose <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">sort</code>)
even with as many as 1,000 elements. While I think @ leads to elegant
code, there’s still plenty to do for performance. It’s comically slow.</p>

<p>This really caught Brian’s interest, because it was an opportunity to
put on his programming language designer’s hat — which I believe to
be his favorite hat. He’s been trying different caching strategies to
reduce all the walking of the prototype chain. This effort can be
found in the other repository branches and in his fork. The system is
so dynamic that cache invalidation is a really complex problem.</p>

<p>Every time a property is set, @ has to find the <code class="language-plaintext highlighter-rouge">:set</code> property for
that object, which generally means walking all the way up to @.
Because <code class="language-plaintext highlighter-rouge">:proto</code> can be modified at any time, every property look-up
requires computing the precedence order (lazily). This all makes
property assignment quite expensive! I can understand why real object
systems aren’t this flexible. It comes at a high price.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Limits of Emacs Advice</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/22/"/>
    <id>urn:uuid:1cfcdeee-19e5-33b8-1344-8fef68333e41</id>
    <updated>2013-01-22T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Today at work I was using <a href="http://www.50ply.com/blog/2012/08/13/introducing-impatient-mode/">impatient-mode</a> to share some code
with <a href="http://www.50ply.com/">Brian</a>. It makes for a really handy live pastebin. To
limit the buffer to the relevant code, I narrowed it down with
<code class="language-plaintext highlighter-rouge">narrow-to-region</code>. However, the browser wouldn’t update to show only
the narrowed region until I made an edit. This makes sense because
impatient-mode hooks <code class="language-plaintext highlighter-rouge">after-change-functions</code>.  Narrowing the buffer
doesn’t <em>change</em> anything in the buffer, so, as expected, this hook is
not called.</p>

<p>The solution would be to also join whatever hook is called when the
buffer restriction changes. Unfortunately,
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Standard-Hooks.html">no such hook exists</a>. I thought I could create this hook with
some <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Advising-Functions.html">advice</a>, but this turns out to be currently impossible.</p>

<h3 id="emacs-advice">Emacs Advice</h3>

<p>What’s advice? It’s a handy feature of Emacs lisp that allows users to
modify the behavior of almost any function without having to redefine
it. It works a little bit like methods in the Common Lisp Object
System (CLOS): advice is code that can be evaluated before, after, or
around a function.</p>

<p>Advice is defined with <code class="language-plaintext highlighter-rouge">defadvice</code>. Duh. For example, say we wanted to
be silly and have Emacs say “Ouch!” when a line is killed with
<code class="language-plaintext highlighter-rouge">kill-line</code>. We can advise this function to display a message.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">kill-line</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">say-ouch</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"Ouch!"</span><span class="p">))</span>
</code></pre></div></div>

<p>This says we want to advise the function <code class="language-plaintext highlighter-rouge">kill-line</code>, we want this
advice to execute <em>after</em> <code class="language-plaintext highlighter-rouge">kill-line</code> has run, our advice is named
“<code class="language-plaintext highlighter-rouge">say-ouch</code>”, and we want to immediately activate this advice so it
gets used right away. The rest is the body of the advice, like the
body of a function. After evaluating this <code class="language-plaintext highlighter-rouge">defadvice</code>, every time I
hit <code class="language-plaintext highlighter-rouge">C-k</code> Emacs says “Ouch!” in the echo area. Cool!</p>
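
<p>For comparison, Emacs 24.4 and later also ship a newer advice interface,
<code class="language-plaintext highlighter-rouge">advice-add</code>, which attaches an ordinary named function as advice. Here’s a
rough equivalent of the above, where <code class="language-plaintext highlighter-rouge">my-say-ouch</code> is a name I made up for
the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-say-ouch (&amp;rest _args)
  "Display a message. Meant as :after advice, so the arguments are ignored."
  (message "Ouch!"))

;; Run `my-say-ouch' after every call to `kill-line'.
(advice-add 'kill-line :after #'my-say-ouch)

;; To undo it later:
;; (advice-remove 'kill-line #'my-say-ouch)
</code></pre></div></div>

<p>Because the advice is just a function, it can be removed or redefined
without touching <code class="language-plaintext highlighter-rouge">kill-line</code> itself.</p>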

<h3 id="narrow-to-region-and-widen">narrow-to-region and widen</h3>

<p>A hook is a variable that holds a list of functions. (Or maybe hooks
are the functions in this list? Emacs’ documentation calls both of
these things hooks.) These functions are called, usually without
arguments, when some specific event occurs. For example, every mode
has its own mode hook which is called when the mode is activated in a
buffer. This allows users to extend or modify the mode — like by
enabling additional minor modes — without editing the mode’s source
code directly.</p>

<p>To make our hook work we need to advise <code class="language-plaintext highlighter-rouge">narrow-to-region</code> and <code class="language-plaintext highlighter-rouge">widen</code>
to run the hook after they’ve done their work. These are the primitive
narrowing functions which all the other narrowing functions eventually
call, like <code class="language-plaintext highlighter-rouge">narrow-to-defun</code>, <code class="language-plaintext highlighter-rouge">narrow-to-page</code>, and any other
mode-specific narrowing. <strong>Advising these two functions will cover all
buffer narrowing.</strong> It <em>should</em> be this simple.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">change-restriction-hook</span> <span class="p">())</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">narrow-to-region</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'change-restriction-hook</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">widen</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'change-restriction-hook</span><span class="p">))</span>
</code></pre></div></div>

<p>At first this seems to work. I can add a test hook and see it activate
when I use <code class="language-plaintext highlighter-rouge">M-x narrow-to-region</code> and <code class="language-plaintext highlighter-rouge">M-x widen</code>. However, when I use
other narrowing functions, like <code class="language-plaintext highlighter-rouge">narrow-to-defun</code>, my hook functions
aren’t called.</p>

<p>Is there a narrowing primitive I missed? I check the source
code. Nope, these are lisp functions which ultimately call
<code class="language-plaintext highlighter-rouge">narrow-to-region</code>. Is the advice not getting used when called
indirectly? I test that out.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">narrow-to-region</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p>This works fine. Hmmm, these other functions are byte-compiled, maybe
that’s the problem.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'foo</span><span class="p">)</span>
</code></pre></div></div>

<p>Bingo. The advice has stopped working. It has something to do with
byte-compilation.</p>

<h3 id="bytecode">Bytecode</h3>

<p>Let’s take a look at the bytecode for <code class="language-plaintext highlighter-rouge">foo</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; =&gt; #[nil "\300\301}\207" [1 2] 2 nil nil]</span>
</code></pre></div></div>

<p>I don’t know too much about Emacs’ byte code, but here’s the gist of
it. A compiled function is a special type of vector (hence the <code class="language-plaintext highlighter-rouge">#[]</code>
form). This is a legal s-expression which you can use directly in
regular Elisp code just like it was a function. The only reason you’d
do so is for obfuscation, so it would look very suspicious.</p>

<p>The first element of this function vector is the parameter list,
empty in this case. The second is a string containing the actual
bytecodes. The third is a vector holding the various constants from the
function body, including the symbols of other functions called by this
function. The remaining elements are the maximum stack depth, the
docstring, and the interactive specification. It’s important to note
that <strong><code class="language-plaintext highlighter-rouge">narrow-to-region</code> does not
appear in the constants vector</strong>!</p>

<p>Curious. Let’s take a closer look at the bytecode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">coerce</span> <span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span> <span class="mi">1</span><span class="p">)</span> <span class="ss">'list</span><span class="p">)</span>
<span class="c1">;; =&gt; (192 193 125 135)</span>
</code></pre></div></div>

<p>Looking at <code class="language-plaintext highlighter-rouge">bytecomp.el</code> from the Emacs distribution I can see that
codes 192 and 193 are used for accessing constants. This pushes my
constants 1 and 2 onto a stack for use as function arguments. Next up
is 125, which corresponds to <code class="language-plaintext highlighter-rouge">byte-narrow-to-region</code>. Gotcha!</p>

<p>It turns out <code class="language-plaintext highlighter-rouge">narrow-to-region</code> is so special — probably because it’s
used very frequently — that it gets its own bytecode. The <strong>primitive
function call is being compiled away into a single instruction</strong>. This
means my advice will not be considered in byte-compiled code. Darnit.
The same is true for <code class="language-plaintext highlighter-rouge">widen</code> (code 126).</p>
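
<p>One way to check for this situation without reading opcodes is to look
for the function’s symbol in the constants vector of the compiled caller.
The helper below is a sketch with made-up names (<code class="language-plaintext highlighter-rouge">my-constants-mention-p</code>,
<code class="language-plaintext highlighter-rouge">my-narrow</code>, <code class="language-plaintext highlighter-rouge">my-greet</code>), and it assumes the functions start out
interpreted so that <code class="language-plaintext highlighter-rouge">byte-compile</code> produces ordinary byte-code objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-constants-mention-p (function-name symbol)
  "Return non-nil if compiled FUNCTION-NAME lists SYMBOL among its constants."
  (let ((fn (symbol-function function-name)))
    (and (byte-code-function-p fn)
         ;; Slot 2 of a byte-code object is its constants vector.
         (memq symbol (append (aref fn 2) nil)))))

(defun my-narrow () (narrow-to-region 1 2))
(defun my-greet () (message "hello"))
(byte-compile 'my-narrow)
(byte-compile 'my-greet)

;; An ordinary call keeps the callee's symbol as a constant.
(my-constants-mention-p 'my-greet 'message)

;; Expected to be nil when the dedicated opcode is used, as described above.
(my-constants-mention-p 'my-narrow 'narrow-to-region)
</code></pre></div></div>

<p>If the symbol is missing from the constants, the call can’t be going
through the symbol’s function cell, which is where advice lives.</p>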

<h3 id="where-to-go-now">Where to go now?</h3>

<p>Since it’s not possible to hook or advise the buffer-narrowing
primitives, impatient-mode would need to hook some other event that
tends to happen at the same time. Perhaps any time a command is
executed in the current buffer it could check for changes to the
buffer restriction and, if so, update any attached web clients. I’ll
figure something out.</p>
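
<p>Here’s a rough sketch of that polling idea, using <code class="language-plaintext highlighter-rouge">post-command-hook</code>. All
of the <code class="language-plaintext highlighter-rouge">my-</code> names are hypothetical, and this is not what impatient-mode
actually does. It tracks how many characters are hidden before and after
the accessible region, so ordinary edits inside the region don’t count as
a restriction change.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-change-restriction-hook nil
  "Hypothetical hook run when the current buffer's restriction changes.")

(defvar my-last-restriction nil
  "Characters hidden (BEFORE . AFTER) the region, as of the last check.")
(make-variable-buffer-local 'my-last-restriction)

(defun my-check-restriction ()
  "Run `my-change-restriction-hook' if the restriction has changed."
  (let ((now (cons (1- (point-min))
                   (- (1+ (buffer-size)) (point-max)))))
    ;; A buffer never checked before is assumed to have been widened.
    (unless (equal now (or my-last-restriction '(0 . 0)))
      (setq my-last-restriction now)
      (run-hooks 'my-change-restriction-hook))))

(add-hook 'post-command-hook #'my-check-restriction)
</code></pre></div></div>

<p>The trade-off is that the hook only fires after the command finishes, and
narrowing done by timers or process filters goes unnoticed until the next
command.</p>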

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Turning Asynchronous into Synchronous in Elisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/14/"/>
    <id>urn:uuid:61c72b82-371c-304f-7e0e-f5ea4a990ef3</id>
    <updated>2013-01-14T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>As a <a href="/blog/2013/01/07/">new user of nREPL</a> I was poking around
nrepl.el, seeing what sorts of Elisp tricks I could learn. Even though
it was written 6 months before Skewer, and I was completely unaware of
nREPL’s existence until two weeks ago, there’s a lot of similarity
between nrepl.el and <a href="/blog/2012/10/31/">Skewer</a>. Due to serving the
same purpose for different platforms, this isn’t very surprising.</p>

<p>In particular, Skewer has <code class="language-plaintext highlighter-rouge">skewer-eval</code> for sending a string to the
browser for evaluation. Like JavaScript, Emacs Lisp is
single-threaded: there’s only one execution context at a time and it
has to return to the top-level before a new context can execute. There
are no continuations or coroutines. <code class="language-plaintext highlighter-rouge">skewer-eval</code> requires
coordination with an external process (the browser) making it
inherently asynchronous. So as a second, optional argument, a callback
can be provided for receiving the result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Echo the result in the minibuffer.</span>
<span class="p">(</span><span class="nv">skewer-eval</span> <span class="s">"Math.pow(2.1, 3.1)"</span>
             <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">r</span><span class="p">)</span> <span class="p">(</span><span class="nv">message</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="ss">'value</span> <span class="nv">r</span><span class="p">)))))</span>
</code></pre></div></div>

<p>However, <strong>the equivalent function in nrepl.el, <code class="language-plaintext highlighter-rouge">nrepl-eval</code>, is
synchronous!</strong> It <em>returns</em> the evaluation result. “That’s not true!
That’s impossible!”</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; !!!</span>
<span class="p">(</span><span class="nv">plist-get</span> <span class="p">(</span><span class="nv">nrepl-eval</span> <span class="s">"(Math/pow 2.1 3.1)"</span><span class="p">)</span> <span class="ss">:value</span><span class="p">)</span>
<span class="c1">;; =&gt; "9.97423999265871"</span>
</code></pre></div></div>

<p>Well, it turns out what I said above about execution contexts wasn’t
completely true. There’s exactly <em>one</em> sneaky function that breaks the
rule: <code class="language-plaintext highlighter-rouge">accept-process-output</code>. It blocks the current execution context
allowing some other execution contexts to run, including timers and
I/O. However, it will lock up Emacs’ interface. <code class="language-plaintext highlighter-rouge">nrepl-eval</code> uses this
function to poll for a response from the nREPL process.</p>
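
<p>The polling pattern looks roughly like the sketch below. This is not
nrepl.el’s actual code. <code class="language-plaintext highlighter-rouge">my-wait-until</code> and its arguments are names I’ve
made up to show the shape of the loop.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-wait-until (predicate process &amp;optional timeout)
  "Block until PREDICATE returns non-nil or TIMEOUT seconds pass.
While waiting, let Emacs handle output from PROCESS.  If PROCESS is
nil, output from any process is accepted."
  (let ((deadline (and timeout (+ (float-time) timeout))))
    (while (and (not (funcall predicate))
                (or (null deadline) (&lt; (float-time) deadline)))
      ;; Wait up to 50ms for output, then re-check the predicate.
      (accept-process-output process 0.05))
    ;; Report whether the condition was actually met.
    (funcall predicate)))
</code></pre></div></div>

<p>A process filter would typically store the response somewhere, and the
predicate just checks whether that value has arrived yet.</p>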

<p>When I saw this, a lightbulb went off in my head. This lone loophole
in Emacs’ execution model can be abused to provide interesting
benefits. Specifically, it can be used to create a <strong><em>latch</em></strong>
synchronization primitive.</p>

<p>The full source code is here if you want to dive right in. I’ll be
going over a simplified version piece-by-piece below.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-latch">https://github.com/skeeto/elisp-latch</a></li>
</ul>

<h3 id="the-latch-primitive">The Latch Primitive</h3>

<p>The idea of a latch is that a thread can <em>wait</em> on the latch, blocking
its execution. It will remain in that state until another thread
<em>notifies</em> the latch, releasing any threads blocked on the
latch. Here’s how it might look in Lisp.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">result</span> <span class="no">nil</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">my-latch</span> <span class="p">(</span><span class="nv">make-latch</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">get-result</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">if</span> <span class="nv">result</span>
      <span class="nv">result</span>
    <span class="p">(</span><span class="nv">wait</span> <span class="nv">my-latch</span><span class="p">)</span> <span class="c1">; Block, waiting for the result</span>
    <span class="nv">result</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">set-result</span> <span class="p">(</span><span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">result</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">notify</span> <span class="nv">my-latch</span><span class="p">))</span> <span class="c1">; Release anyone waiting on my-latch</span>
</code></pre></div></div>

<p>The pattern above is similar to a <strong><em>promise</em></strong>, which we will later
implement on top of latches. In our latch implementation I’d also like
to optionally pass a value from <code class="language-plaintext highlighter-rouge">notify</code> to anyone <code class="language-plaintext highlighter-rouge">wait</code>ing, which
would make the above simpler.</p>

<p>Emacs doesn’t have threads but instead non-preemptive execution
contexts. Ignoring the Emacs UI lockup, we can mostly ignore that
distinction for now.</p>

<p>To exploit <code class="language-plaintext highlighter-rouge">accept-process-output</code> each latch needs to have its own
process object. When blocking on a latch it will simply wait for that
process to receive input. To notify a latch, we need to send data to
that process.</p>

<p>For the process, we’ll ask Emacs to make a pseudo-terminal “process.”
It’s basically just a pipe for Emacs to talk to itself. It’s possible
to literally make a pipe, which is better for this purpose, but
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=698096">that’s currently broken</a>. To make such a process, we call
<code class="language-plaintext highlighter-rouge">start-process</code> with <code class="language-plaintext highlighter-rouge">nil</code> as the program name (third argument).</p>

<p>Let’s start by making a new class called <code class="language-plaintext highlighter-rouge">latch</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'eieio</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defclass</span> <span class="nv">latch</span> <span class="p">()</span>
  <span class="p">((</span><span class="nv">process</span> <span class="ss">:initform</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"latch"</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">))</span>
   <span class="p">(</span><span class="nv">value</span> <span class="ss">:initform</span> <span class="no">nil</span><span class="p">)))</span>
</code></pre></div></div>

<p>This class has two slots, <code class="language-plaintext highlighter-rouge">process</code> and <code class="language-plaintext highlighter-rouge">value</code>. The process slot
holds the aforementioned process we’ll be blocking on. The <code class="language-plaintext highlighter-rouge">value</code>
slot will be used to pass a value from <code class="language-plaintext highlighter-rouge">notify</code> to <code class="language-plaintext highlighter-rouge">wait</code>. The
<code class="language-plaintext highlighter-rouge">process</code> slot is initialized with a brand new process object upon
instantiation.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmethod</span> <span class="nv">wait</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">latch</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">accept-process-output</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'process</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'value</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">notify</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">latch</span><span class="p">)</span> <span class="k">&amp;optional</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'value</span><span class="p">)</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">process-send-string</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'process</span><span class="p">)</span> <span class="s">"\n"</span><span class="p">))</span>
</code></pre></div></div>

<p>To wait, call <code class="language-plaintext highlighter-rouge">accept-process-output</code> on the latch’s private
process. This function won’t return until data is sent to the
process. By that time, the <code class="language-plaintext highlighter-rouge">value</code> slot will be filled in with the
value from <code class="language-plaintext highlighter-rouge">notify</code>.</p>

<p>To notify, send a newline with <code class="language-plaintext highlighter-rouge">process-send-string</code>. The data to send
is arbitrary, but I wanted to send as little as possible (one byte)
and I figure a newline might be safer when it comes to flushing any
sort of buffer. Buffers tend to flush on newlines. Before sending
data, we set the <code class="language-plaintext highlighter-rouge">value</code> slot to the value that <code class="language-plaintext highlighter-rouge">wait</code> will return.</p>

<p>That’s basically it! However, processes are not garbage collected by
Emacs, so we need a <code class="language-plaintext highlighter-rouge">destroy</code> destructor method. The name <code class="language-plaintext highlighter-rouge">destroy</code>
here is not special to Emacs. It’s something for the user of the
library to call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmethod</span> <span class="nv">destroy</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">latch</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">ignore-errors</span>
    <span class="p">(</span><span class="nv">delete-process</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'process</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-latch</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'latch</span><span class="p">))</span>
</code></pre></div></div>

<p>I also made a convenience constructor function, <code class="language-plaintext highlighter-rouge">make-latch</code>, using
the conventional <code class="language-plaintext highlighter-rouge">make-</code> prefix, since users shouldn’t have to call
<code class="language-plaintext highlighter-rouge">make-instance</code> for our classes.</p>

<p>That’s enough to turn <code class="language-plaintext highlighter-rouge">skewer-eval</code> into a synchronous function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">skewer-eval-synchronously</span> <span class="p">(</span><span class="nv">js-code</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">latch</span> <span class="p">(</span><span class="nv">make-latch</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">skewer-eval</span> <span class="nv">js-code</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="nf">#'</span><span class="nv">notify</span> <span class="nv">latch</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">wait</span> <span class="nv">latch</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">destroy</span> <span class="nv">latch</span><span class="p">))))</span>
</code></pre></div></div>

<p>In combination with <code class="language-plaintext highlighter-rouge">lexical-let</code>, <code class="language-plaintext highlighter-rouge">apply-partially</code> returns a closure
that will notify the latch with the return value passed to it from
skewer. We need to get the return value from <code class="language-plaintext highlighter-rouge">wait</code>, destroy the
latch, then return the value, so I use a <code class="language-plaintext highlighter-rouge">prog1</code> for this.</p>

<h3 id="one-use-latches">One-use Latches</h3>

<p>In my experimenting, I noticed the <code class="language-plaintext highlighter-rouge">prog1</code> pattern coming up a
lot. Having to destroy my latch after a single use was really
inconvenient. Fortunately this pattern can be captured by a subclass:
one-time-latch.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defclass</span> <span class="nv">one-time-latch</span> <span class="p">(</span><span class="nv">latch</span><span class="p">)</span>
  <span class="p">())</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-one-time-latch</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'one-time-latch</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">wait</span> <span class="ss">:after</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">one-time-latch</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">destroy</span> <span class="nv">latch</span><span class="p">))</span>
</code></pre></div></div>

<p>This subclass destroys the latch after the superclass’s <code class="language-plaintext highlighter-rouge">wait</code> is
done, through an <code class="language-plaintext highlighter-rouge">:after</code> method (purely for side-effects). CLOS is
fun, isn’t it?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">skewer-eval-synchronously</span> <span class="p">(</span><span class="nv">js-code</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">latch</span> <span class="p">(</span><span class="nv">make-one-time-latch</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">skewer-eval</span> <span class="nv">js-code</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="nf">#'</span><span class="nv">notify</span> <span class="nv">latch</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">wait</span> <span class="nv">latch</span><span class="p">)))</span>
</code></pre></div></div>

<p>There, that’s a lot more elegant.</p>

<p>If eieio were a more capable mini-CLOS I could also demonstrate a
<code class="language-plaintext highlighter-rouge">countdown-latch</code>, but this would require an <code class="language-plaintext highlighter-rouge">:around</code> method. Most
uses of <code class="language-plaintext highlighter-rouge">notify</code> would need to skip over the superclass method.</p>

<h3 id="promises">Promises</h3>

<p>We can build promises on top of our latch implementation. Basically, a
promise is a one-time-latch where we can query the <code class="language-plaintext highlighter-rouge">notify</code> value more
than once. In a one-time-latch we can only <code class="language-plaintext highlighter-rouge">wait</code> once.</p>

<p>Our promise will have two similar methods, <code class="language-plaintext highlighter-rouge">deliver</code> (like notify),
and <code class="language-plaintext highlighter-rouge">retrieve</code> (like wait). If a value has been delivered already,
<code class="language-plaintext highlighter-rouge">retrieve</code> will return that value. Otherwise, it will block and wait
until a value is delivered.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defclass</span> <span class="nv">promise</span> <span class="p">()</span>
  <span class="p">((</span><span class="nv">latch</span> <span class="ss">:initform</span> <span class="p">(</span><span class="nv">make-one-time-latch</span><span class="p">))</span>
   <span class="p">(</span><span class="nv">delivered</span> <span class="ss">:initform</span> <span class="no">nil</span><span class="p">)</span>
   <span class="p">(</span><span class="nv">value</span> <span class="ss">:initform</span> <span class="no">nil</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-promise</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'promise</span><span class="p">))</span>
</code></pre></div></div>

<p>It has three slots: the one-time-latch used for blocking, a Boolean
determining the delivery status, and the <code class="language-plaintext highlighter-rouge">value</code> of the promise.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmethod</span> <span class="nv">deliver</span> <span class="p">((</span><span class="nv">promise</span> <span class="nv">promise</span><span class="p">)</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'delivered</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">error</span> <span class="s">"Promise has already been delivered."</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'value</span><span class="p">)</span> <span class="nv">value</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'delivered</span><span class="p">)</span> <span class="no">t</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">notify</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'latch</span><span class="p">)</span> <span class="nv">value</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">retrieve</span> <span class="p">((</span><span class="nv">promise</span> <span class="nv">promise</span><span class="p">))</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'delivered</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'value</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">wait</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'latch</span><span class="p">))))</span>
</code></pre></div></div>

<p>A promise can only be delivered once, so it throws an error if it is
attempted more than once. Otherwise it updates the promise state and
releases anything waiting on it.</p>
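
<p>As a usage sketch, assuming the <code class="language-plaintext highlighter-rouge">latch</code>, <code class="language-plaintext highlighter-rouge">one-time-latch</code>, and
<code class="language-plaintext highlighter-rouge">promise</code> definitions above have already been evaluated, a timer can
stand in for whatever asynchronous source eventually delivers the
value. The one-second delay and the value <code class="language-plaintext highlighter-rouge">42</code> are arbitrary.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((promise (make-promise)))
  ;; Deliver from a timer, about one second from now.
  (run-at-time 1 nil #'deliver promise 42)
  (list (retrieve promise)     ; blocks until the timer delivers
        (retrieve promise)))   ; already delivered, returns right away
</code></pre></div></div>

<p>If everything behaves as described, both calls should produce the
delivered value, with only the first one blocking.</p>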

<h3 id="what-to-do-with-this">What to do with this?</h3>

<p>Locking up Emacs’ UI really limits the usefulness of this
library. Since Emacs’ primary purpose is being a text editor, it needs
to remain very lively or else the user will become annoyed. If I used
a synchronous version of <code class="language-plaintext highlighter-rouge">skewer-eval</code>, Emacs would completely lock up
(easily interrupted with <code class="language-plaintext highlighter-rouge">C-g</code>) until the browser responds — which
would be never if no browser is connected. That’s unacceptable.</p>

<p>Also, not very many Emacs functions have the callback pattern. The
only core function I’m aware of that does is <code class="language-plaintext highlighter-rouge">url-retrieve</code>, but it
already has a <code class="language-plaintext highlighter-rouge">url-retrieve-synchronously</code> counterpart.</p>

<p>Please tell me if you have a neat use of any of this!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>An Emacs Pastebin</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/12/29/"/>
    <id>urn:uuid:cbfbf5b0-607d-34d0-6f31-b2712d4e421f</id>
    <updated>2012-12-29T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/><category term="javascript"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>Luke is doing an interesting <s>three</s> five-part tutorial on writing
a pastebin in PHP: <a href="http://terminally-incoherent.com/blog/2012/12/17/php-like-a-pro-part-1/">PHP Like a Pro</a> (<a href="http://terminally-incoherent.com/blog/2012/12/19/php-like-a-pro-part-2/">2</a>, <a href="http://terminally-incoherent.com/blog/2012/12/26/php-like-a-pro-part-3/">3</a>,
<a href="http://terminally-incoherent.com/blog/2013/01/02/php-like-a-pro-part-4/">4</a>, <a href="http://terminally-incoherent.com/blog/2013/01/04/php-like-a-pro-part-5/">5</a>). The tutorial is largely an introduction to
the set of tools a professional would use to accomplish a more
involved project, the most interesting of which, for me, is
<a href="http://vagrantup.com/">Vagrant</a>.</p>

<p>Because I have <a href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">no intention of ever using PHP</a>, I decided to
follow along in parallel with my own version. I used Emacs Lisp with
my <a href="/blog/2012/08/20/">simple-httpd</a> package for the server. I really
like my servlet API, so it was a lot more fun than I expected it to be!
Here’s the source code,</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-pastebin">https://github.com/skeeto/emacs-pastebin</a></li>
</ul>

<p>Here’s what it looked like once I was all done,</p>

<p><a href="/img/screenshot/pastebin.png"><img src="/img/screenshot/pastebin-thumb.png" alt="" /></a></p>

<p>It has syntax highlighting, paste expiration, and light version
control. The server side is as simple as possible, consisting of only
three servlets,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/</code>: static files</li>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/get</code>: serves (immutable) pastes in JSON</li>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/post</code>: accepts new pastes in JSON, returns the ID</li>
</ul>

<p>A paste’s JSON is the raw paste content plus some metadata, including
post date, expiration date, language (highlighting), parent paste ID,
and title. That’s it! The server is just a database and static file
host. It performs no dynamic page generation. Instead, the client-side
JavaScript does all the work.</p>
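
<p>To give a rough idea of the shape of that JSON, here’s a sketch using
Emacs’ built-in <code class="language-plaintext highlighter-rouge">json</code> library. The variable <code class="language-plaintext highlighter-rouge">my-example-paste</code> and
every field name in it are hypothetical, chosen only to mirror the
metadata listed above, so the real pastebin may well name things
differently.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'json)

;; Hypothetical paste record. The real field names may differ.
(defvar my-example-paste
  '((title . "hello.el")
    (language . "lisp")
    (parent . "aB3_")
    (content . "(message \"hello\")")))

;; An alist with symbol keys encodes as a JSON object.
(json-encode my-example-paste)
</code></pre></div></div>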

<p>For you non-Emacs users, the repository has a <code class="language-plaintext highlighter-rouge">pastebin-standalone.el</code>
which can be used to launch a standalone instance of the pastebin
server, so long as you have Emacs on your computer. It will fetch any
needed dependencies automatically. See the header comment of this file
for instructions.</p>

<h3 id="ids">IDs</h3>

<p>A paste ID is four or more randomly-generated numbers, letters, dashes
or underscores, with some minor restrictions (<code class="language-plaintext highlighter-rouge">pastebin-id-valid-p</code>).
It’s appended to the end of the servlet URL.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/&lt;id&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/get/&lt;id&gt;</code></li>
</ul>

<p>In the first case, the servlet entirely ignores the ID. Its job is
only to serve static files. In the second case the server looks up the
ID in the database and returns the paste JSON.</p>
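
<p>The basic shape of an ID check might look something like the sketch
below. The function <code class="language-plaintext highlighter-rouge">my-paste-id-valid-p</code> is a stand-in I made up for
illustration. It only covers the “four or more numbers, letters, dashes
or underscores” part and deliberately leaves out the minor restrictions
that the real <code class="language-plaintext highlighter-rouge">pastebin-id-valid-p</code> applies.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-paste-id-valid-p (id)
  "Return t if ID is four or more letters, digits, dashes or underscores.
Hypothetical helper; the real validator has additional restrictions."
  (and (stringp id)
       ;; \\` and \\' anchor the match to the whole string.
       (string-match-p "\\`[A-Za-z0-9_-]\\{4,\\}\\'" id)
       t))

(my-paste-id-valid-p "aB3_")   ; =&gt; t
(my-paste-id-valid-p "abc")    ; =&gt; nil (too short)
(my-paste-id-valid-p "ab/cd")  ; =&gt; nil (slash not allowed)
</code></pre></div></div>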

<p>The client-side inspects the page’s URL to determine the ID currently
being viewed, if any. It performs an asynchronous request to
<code class="language-plaintext highlighter-rouge">/pastebin/get/&lt;id&gt;</code> to fetch the paste and insert the result, if
found, into the current page.</p>

<p>Form submission isn’t done the normal way. Instead, the submission is
intercepted by an event handler, which wraps the form data up in JSON
(much cleaner to parse!) and sends it asynchronously to
<code class="language-plaintext highlighter-rouge">/pastebin/post</code> via POST. This servlet inserts the paste in the
database and responds in <code class="language-plaintext highlighter-rouge">text/plain</code> with the paste ID it
generated. The client-side then redirects the browser to the paste URL
for that paste.</p>

<h3 id="features">Features</h3>

<p>As I said, the server performs no page generation, so syntax
highlighting is done in the client with
<a href="http://softwaremaniacs.org/soft/highlight/en/">highlight.js</a>. I <em>could</em> have used <a href="http://emacswiki.org/emacs/Htmlize">htmlize</a>
and supported any language that Emacs supports. However, I wanted to
keep the server as simple as possible, and, more importantly, I
<em>really</em> don’t trust Emacs’ various modes to be secure in operating on
arbitrary data. That’s a huge attack surface and these modes were
written without security in mind (fairly reasonable). It’s actually a
deliberate feature for Emacs to automatically <code class="language-plaintext highlighter-rouge">eval</code> Elisp in comments
<a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html">under certain circumstances</a>.</p>

<p>Version control is accomplished by keeping track of which paste was
the parent of the paste being posted. When viewing a paste, the
content is also placed in a textarea for editing. Submitting this form
will create a new paste with the current paste as the parent. When
viewing a paste that has a parent, a “diff” option is provided to view
a diff patch of the current paste with its parent (see the screenshot
above). Again, the server is dead simple, so this patch is computed by
JavaScript after fetching the parent paste from the server.</p>

<h3 id="databases">Databases</h3>

<p>As part of my fun I made a generic database API for the servlets, then
implemented three different database backends. I used eieio, Emacs
Lisp’s CLOS-like object system, to implement this API. Creating a new
database backend is just a matter of making a new class that
implements two specific methods.</p>
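
<p>As a rough sketch of what such a backend could look like, here’s an
in-memory one. The class name <code class="language-plaintext highlighter-rouge">my-db-hash</code> and the two generics
<code class="language-plaintext highlighter-rouge">my-db-get</code> and <code class="language-plaintext highlighter-rouge">my-db-put</code> are hypothetical stand-ins, not the
pastebin’s actual API, and I’m using the newer <code class="language-plaintext highlighter-rouge">cl-defmethod</code> from
<code class="language-plaintext highlighter-rouge">cl-lib</code> here, so this assumes a reasonably recent Emacs.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)
(require 'eieio)

;; Hypothetical backend class wrapping a hash table.
(defclass my-db-hash ()
  ((table :initarg :table)))

(defun my-db-hash-create ()
  "Make a backend with its own fresh hash table."
  (make-instance 'my-db-hash :table (make-hash-table :test 'equal)))

;; The two methods every backend would need to implement.
(cl-defgeneric my-db-get (db id)
  "Return the paste stored under ID in DB, or nil.")
(cl-defgeneric my-db-put (db id paste)
  "Store PASTE under ID in DB.")

(cl-defmethod my-db-get ((db my-db-hash) id)
  (gethash id (slot-value db 'table)))

(cl-defmethod my-db-put ((db my-db-hash) id paste)
  (puthash id paste (slot-value db 'table)))
</code></pre></div></div>

<p>A servlet written against the two generics wouldn’t need to know
which backend it was handed.</p>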

<p>The first, and default, implementation uses an Elisp hash table for
storage, which is lost when Emacs exits.</p>

<p>The second is a flat-file database. I estimate it should be able to
support at least 16 million different pastes gracefully. The on-disk
format for pastes is an s-expression. Basically, this is read by
Emacs, expiration date checked, converted to JSON, then served to the
client.</p>

<p>To my great surprise there is practically no support for programmatic
access to a SQL database from <em>GNU</em> Emacs Lisp (other Emacsen do). The
closest I found was <a href="http://www.online-marketwatch.com/pgel/pg.html">pg.el</a>, which is asynchronous by
necessity. However, the specific target I had in mind was SQLite.</p>

<p>I <em>did</em> manage to implement a third backend that uses SQLite, but it’s
a big hack. It invokes the <code class="language-plaintext highlighter-rouge">sqlite3</code> command line program once for
every request, asking for a response in CSV — the only output format
that seems to escape unambiguously. This response then has to be
parsed, so long as it’s not long enough to blow the regex stack.</p>

<p><em>Update February 2014</em>: I have
<a href="/blog/2014/02/06/">found a solution to this problem</a>!</p>

<h3 id="future">Future</h3>

<p>This has been an educational project for me. As a tutorial and for
practice I’ll probably write the server again from scratch using other
languages and platforms (Node.js and Hunchentoot maybe?), keeping the
same front-end.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>How a simple-httpd Vulnerability Slipped In</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/12/18/"/>
    <id>urn:uuid:b27d5765-62bd-3770-5b69-b4c0572f7213</id>
    <updated>2012-12-18T00:00:00Z</updated>
    <category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Over Thanksgiving weekend I discovered a vulnerability in my recent
<a href="/blog/2012/08/20/">simple-httpd</a> overhaul. I fixed it immediately and
pushed out the patch. Despite being careful about the translation of
request paths to filesystem paths, one thing slipped by. Here’s how.</p>

<p>When writing my original web server a couple of years ago I
established a global variable, <code class="language-plaintext highlighter-rouge">httpd-root</code>, which is the location on
the filesystem from where all files are served. Nothing above this
directory should be visible to clients in any way. The simple,
dangerous way to do this is with a plain <code class="language-plaintext highlighter-rouge">concat</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">concat</span> <span class="nv">httpd-root</span> <span class="nv">request-path</span><span class="p">)</span>
</code></pre></div></div>

<p>The vulnerability here is that <code class="language-plaintext highlighter-rouge">request-path</code> could contain <code class="language-plaintext highlighter-rouge">..</code>, a
reference to the parent directory. This would allow a request to
access anything on the filesystem. This was obvious to me at the time,
so I wrote a <code class="language-plaintext highlighter-rouge">httpd-clean-path</code> to remove any <code class="language-plaintext highlighter-rouge">..</code> portions in the
request. As long as <code class="language-plaintext highlighter-rouge">httpd-root</code> isn’t an empty string, this closes
all the holes.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">concat</span> <span class="nv">httpd-root</span> <span class="p">(</span><span class="nv">httpd-clean-path</span> <span class="nv">request-path</span><span class="p">))</span>
</code></pre></div></div>

<p>A couple of years later, after I’ve honed my Emacs Lisp skills, I go
through refactoring everything. I’ve since learned about the function
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/File-Name-Expansion.html"><code class="language-plaintext highlighter-rouge">expand-file-name</code></a> and when I see <code class="language-plaintext highlighter-rouge">concat</code> being
used to build a path, I change it to this function, which is more
appropriate for the job. This happened in commit <a href="https://github.com/skeeto/emacs-http-server/commit/3b405343977df26eee6706a9a4d244e92d695fd5">3b405343</a>
(2012-08-07). I’m using <code class="language-plaintext highlighter-rouge">httpd-clean-path</code> to handle everything
dangerous, so it’s safe, right?!</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">expand-file-name</span> <span class="p">(</span><span class="nv">httpd-clean-path</span> <span class="nv">request-path</span><span class="p">)</span> <span class="nv">httpd-root</span><span class="p">)</span>
</code></pre></div></div>

<p>When the path being expanded by <code class="language-plaintext highlighter-rouge">expand-file-name</code> is an absolute
path, that path is returned directly, ignoring the second
argument. Unfortunately, anything beginning with <code class="language-plaintext highlighter-rouge">~</code> is an absolute
path, because this automatically expands into a home directory. With
<code class="language-plaintext highlighter-rouge">concat</code> this didn’t need to be handled because any <code class="language-plaintext highlighter-rouge">~</code> was always
prepended with <code class="language-plaintext highlighter-rouge">httpd-root</code>. Now that I was using <code class="language-plaintext highlighter-rouge">expand-file-name</code>,
this allowed everyone read-access to everything in the hosting user’s
home directory if the request path started with a
<code class="language-plaintext highlighter-rouge">~</code>. Doh!</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">expand-file-name</span> <span class="s">"~foo"</span> <span class="s">"/etc"</span><span class="p">)</span>  <span class="c1">; =&gt; "/home/foo"</span>
</code></pre></div></div>

<p>The fix is dead simple: prefix the cleaned path with a <code class="language-plaintext highlighter-rouge">./</code> before
using <code class="language-plaintext highlighter-rouge">expand-file-name</code>. This forces the path to be relative so that
it’s expanded properly.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">expand-file-name</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"./"</span> <span class="s">"~foo"</span><span class="p">)</span> <span class="s">"/etc"</span><span class="p">)</span>  <span class="c1">; =&gt; "/etc/~foo"</span>
</code></pre></div></div>

<p>I apologize to anyone using simple-httpd. The fix has been on MELPA
for about a month now, so make sure you’re updated. I myself have a
long-running simple-httpd instance exposed to the Internet
(impatient-mode makes for a wonderful pastebin!), and when I found
this my stomach sank in panic. I do keep an eye on my <code class="language-plaintext highlighter-rouge">*httpd*</code> log
and I never saw anyone exploit this. Due to simple-httpd’s obscurity
I’m pretty sure no one else discovered this anyway. The vulnerability
only existed for about three months before I caught it. If you are
able to find any other vulnerability please tell me!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Weak References</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/12/17/"/>
    <id>urn:uuid:45be87f8-bd8f-3012-09fb-336698c25046</id>
    <updated>2012-12-17T00:00:00Z</updated>
    <category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Today I added a <code class="language-plaintext highlighter-rouge">skewer-eval-print-last-expression</code> function to
<a href="/blog/2012/10/31/">Skewer</a>, functionality I’ve been sorely missing
for a while. To properly support it I needed a hash table with
automatically expiring entries. Specifically, I needed to keep track
of state in Emacs that I couldn’t trust the (untrusted) browser to
track for me. The alternative would be to send an encrypted blob to
the browser along with code to evaluate, which it would send back with
the result. Instead of getting into questionable, hand-rolled
encryption I wrote an <a href="https://github.com/skeeto/skewer-mode/blob/master/cache-table.el">expiring hash table</a>
implementation.</p>
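
<p>The idea behind an expiring table can be sketched in a few lines. This
is only an illustration, not the code in <code class="language-plaintext highlighter-rouge">cache-table.el</code>: the names
<code class="language-plaintext highlighter-rouge">my-expiring-put</code> and <code class="language-plaintext highlighter-rouge">my-expiring-get</code> are made up for this example,
and it assumes it’s enough to stamp each entry with its insertion time
and check its age on lookup.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-expiring-put (table key value)
  "Store VALUE under KEY in TABLE along with the current time."
  (puthash key (cons (float-time) value) table))

(defun my-expiring-get (table key max-age)
  "Return the value for KEY in TABLE unless it is older than MAX-AGE seconds.
An expired entry is removed and nil is returned."
  (let ((entry (gethash key table)))
    (when entry
      (if (&gt; (- (float-time) (car entry)) max-age)
          (progn (remhash key table) nil)   ; too old: drop it
        (cdr entry)))))

;; Hypothetical usage: the entry is fresh, so it is still returned.
(let ((table (make-hash-table :test 'equal)))
  (my-expiring-put table "id" 'state)
  (my-expiring-get table "id" 60))  ; =&gt; state
</code></pre></div></div>

<p>In this sketch an entry that is never looked up again is never removed,
so a real implementation would also need some kind of periodic sweep.</p>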

<p>This had me take a careful look over
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Hash-Tables.html">Elisp’s hash table documentation</a>, which reminded me of a
cool feature they have: key/value weakness. The hash table can be
configured such that it doesn’t prevent its keys and values from being
garbage collected. Elisp’s hash tables are <em>really</em> flexible in this
regard; any combination of key and value weakness is supported. This
is more flexible than Java’s <a href="http://docs.oracle.com/javase/7/docs/api/java/util/WeakHashMap.html">WeakHashMap</a>, which only
supports weak keys. For example, to make a hash table that weakly
holds its values,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:weakness</span> <span class="ss">'value</span><span class="p">)</span>
</code></pre></div></div>
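
<p>As a quick reference, here’s a small sketch of the other documented
<code class="language-plaintext highlighter-rouge">:weakness</code> values. The variable names are just for illustration. With
<code class="language-plaintext highlighter-rouge">key-or-value</code> either a live key or a live value keeps the entry, while
<code class="language-plaintext highlighter-rouge">key-and-value</code> (for which <code class="language-plaintext highlighter-rouge">t</code> is an alias) requires both to be live.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((weak-keys   (make-hash-table :weakness 'key))
      (weak-values (make-hash-table :weakness 'value))
      (weak-either (make-hash-table :weakness 'key-or-value))
      (weak-both   (make-hash-table :weakness 'key-and-value)))
  ;; `hash-table-weakness' reports how each table was configured.
  (mapcar #'hash-table-weakness
          (list weak-keys weak-values weak-either weak-both)))
;; =&gt; (key value key-or-value key-and-value)
</code></pre></div></div>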

<p>Oddly, Elisp lacks functionality to use weak references more
generally. Fortunately <a href="https://github.com/skeeto/elisp-weak-ref">this can be fixed</a>!</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">weak-ref</span> <span class="p">(</span><span class="nv">thing</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">ref</span> <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:size</span> <span class="mi">1</span> <span class="ss">:weakness</span> <span class="no">t</span> <span class="ss">:test</span> <span class="ss">'eq</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">ref</span>
      <span class="p">(</span><span class="nv">puthash</span> <span class="no">t</span> <span class="nv">thing</span> <span class="nv">ref</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">deref</span> <span class="p">(</span><span class="nv">ref</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">gethash</span> <span class="no">t</span> <span class="nv">ref</span><span class="p">))</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">weak-ref</code> wraps an object in a weak hash table of size 1 under the
key <code class="language-plaintext highlighter-rouge">t</code>. The second function, <code class="language-plaintext highlighter-rouge">deref</code>, fetches the object from the
hash table if it’s still there. Otherwise it returns <code class="language-plaintext highlighter-rouge">nil</code>. Here it is
in action,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">setq</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">weak-ref</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>

<span class="c1">;; It's still there.</span>
<span class="p">(</span><span class="nv">deref</span> <span class="nv">ref</span><span class="p">)</span>  <span class="c1">; =&gt; (1 2 3)</span>

<span class="c1">;; Now run garbage collection.</span>
<span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>

<span class="c1">;; The list has been garbage collected.</span>
<span class="p">(</span><span class="nv">deref</span> <span class="nv">ref</span><span class="p">)</span>  <span class="c1">; =&gt; nil</span>
</code></pre></div></div>

<p>I had to use <code class="language-plaintext highlighter-rouge">setq</code> here instead of <code class="language-plaintext highlighter-rouge">defvar</code> because garbage
collection seems to always get triggered after <code class="language-plaintext highlighter-rouge">defvar</code>.</p>

<p>I don’t have a use-case for this at the moment. Weak references are
mostly useful in hash tables (caches), and these functions would be
entirely redundant in that case. I originally implemented these as
macros, but I felt that made them too inflexible: they couldn’t be
passed around as functions.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>A Use For Macrolet</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/12/06/"/>
    <id>urn:uuid:271d91dd-abad-31c8-4e26-f8dfdd5ffa92</id>
    <updated>2012-12-06T00:00:00Z</updated>
    <category term="lisp"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>I recently had a good use for Common Lisp’s <code class="language-plaintext highlighter-rouge">macrolet</code> special
operator. Just as <code class="language-plaintext highlighter-rouge">let</code> establishes new variable bindings and <code class="language-plaintext highlighter-rouge">flet</code>
establishes new function bindings, <code class="language-plaintext highlighter-rouge">macrolet</code> establishes new macro
definitions.</p>

<p>For example, here’s a locally-defined <a href="http://en.wikipedia.org/wiki/Anaphoric_macro">anaphoric</a> <code class="language-plaintext highlighter-rouge">lambda</code>
macro called <code class="language-plaintext highlighter-rouge">fn</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">macrolet</span> <span class="p">((</span><span class="nv">fn</span> <span class="p">(</span><span class="k">&amp;body</span> <span class="nv">body</span><span class="p">)</span> <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span> <span class="o">,@</span><span class="nv">body</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">map</span> <span class="ss">'string</span> <span class="p">(</span><span class="nv">fn</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">standard-char-p</span> <span class="nv">_</span><span class="p">)</span> <span class="nv">_</span> <span class="sc">#\*</span><span class="p">))</span> <span class="s">"naïve"</span><span class="p">))</span>
<span class="c1">;; =&gt; "na*ve"</span>
</code></pre></div></div>

<p>My particular use case was about making my code cleaner for
<a href="http://redd.it/137f7h">a brainfuck interpreter</a>. The state of the machine was being
tracked by this struct. (Interesting side note: SBCL warns about using
<code class="language-plaintext highlighter-rouge">p</code> as a slot name because the accessor function will look like a
predicate.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defstruct</span> <span class="nv">bf</span>
  <span class="p">(</span><span class="nv">p</span> <span class="mi">0</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">mem</span> <span class="p">(</span><span class="nb">make-array</span> <span class="mi">30000</span> <span class="ss">:initial-element</span> <span class="mi">0</span><span class="p">)))</span>
</code></pre></div></div>

<p>The BF instructions <code class="language-plaintext highlighter-rouge">+</code> and <code class="language-plaintext highlighter-rouge">-</code> increment and decrement the byte at the data
pointer. The Common Lisp <code class="language-plaintext highlighter-rouge">incf</code> and <code class="language-plaintext highlighter-rouge">decf</code> macros can be used to do
this. Similarly, the <code class="language-plaintext highlighter-rouge">,</code> instruction sets the byte at the data
pointer, which can be done with <code class="language-plaintext highlighter-rouge">setf</code>. All three of these macros are
<em>place</em>-modifying.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">interp</span> <span class="p">(</span><span class="nv">program</span> <span class="nv">state</span><span class="p">)</span>
  <span class="c1">;; ...</span>
  <span class="p">(</span><span class="nb">incf</span> <span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nv">bf-mem</span> <span class="nv">state</span><span class="p">)</span> <span class="p">(</span><span class="nv">bf-p</span> <span class="nv">state</span><span class="p">)))</span>
  <span class="c1">;; ...</span>
  <span class="p">(</span><span class="nb">decf</span> <span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nv">bf-mem</span> <span class="nv">state</span><span class="p">)</span> <span class="p">(</span><span class="nv">bf-p</span> <span class="nv">state</span><span class="p">)))</span>
  <span class="c1">;; ...</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nv">bf-mem</span> <span class="nv">state</span><span class="p">)</span> <span class="p">(</span><span class="nv">bf-p</span> <span class="nv">state</span><span class="p">))</span> <span class="p">(</span><span class="nb">char-code</span> <span class="p">(</span><span class="nb">read-char</span><span class="p">))))</span>
</code></pre></div></div>

<p>That’s a whole lot of redundancy for a Lisp program. Under similar
circumstances elsewhere I might use <code class="language-plaintext highlighter-rouge">flet</code> to reduce it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; This won't work.</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">interp</span> <span class="p">(</span><span class="nv">program</span> <span class="nv">state</span><span class="p">)</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">ref</span> <span class="p">()</span> <span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nv">bf-mem</span> <span class="nv">state</span><span class="p">)</span> <span class="p">(</span><span class="nv">bf-p</span> <span class="nv">state</span><span class="p">))))</span>
    <span class="c1">;; ...</span>
    <span class="p">(</span><span class="nb">incf</span> <span class="p">(</span><span class="nv">ref</span><span class="p">))</span>
    <span class="c1">;; ...</span>
    <span class="p">(</span><span class="nb">decf</span> <span class="p">(</span><span class="nv">ref</span><span class="p">))))</span>
</code></pre></div></div>

<p>The problem is that <code class="language-plaintext highlighter-rouge">ref</code> isn’t a <a href="http://www.lispworks.com/documentation/HyperSpec/Body/05_aa.htm"><em>generalized reference</em></a>,
which <code class="language-plaintext highlighter-rouge">incf</code>, <code class="language-plaintext highlighter-rouge">decf</code>, and <code class="language-plaintext highlighter-rouge">setf</code> all require. Common Lisp’s
<em>place</em>-modifying utilities are implemented as macros. It’s known at
compile-time what kind of place they are modifying: a variable, array
index, object/struct slot, car, cdr, or many other things (Emacs <code class="language-plaintext highlighter-rouge">cl</code>
package allows all sorts of things to be <code class="language-plaintext highlighter-rouge">setf</code>ed, like
<code class="language-plaintext highlighter-rouge">(point)</code>). The macro expands into the proper form for setting that
kind of place.</p>

<p>The specific expansion is implementation-dependent, but, for example,
<code class="language-plaintext highlighter-rouge">setf</code> could expand into a <code class="language-plaintext highlighter-rouge">setq</code> when the first argument is a
symbol. New generalized references can be defined with <code class="language-plaintext highlighter-rouge">defsetf</code>.</p>

<p>In my case, a simple macro expansion can fill the role. Below, the
place-modifying macro will expand <code class="language-plaintext highlighter-rouge">ref</code>
(<a href="http://www.lispworks.com/documentation/lw60/CLHS/Body/05_abg.htm"><em>after</em> looking elsewhere</a>) to decide what to do, and <code class="language-plaintext highlighter-rouge">ref</code>
will expand to an <code class="language-plaintext highlighter-rouge">aref</code> form.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">interp</span> <span class="p">(</span><span class="nv">program</span> <span class="nv">state</span><span class="p">)</span>
  <span class="p">(</span><span class="k">macrolet</span> <span class="p">((</span><span class="nv">ref</span> <span class="p">()</span> <span class="o">'</span><span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nv">bf-mem</span> <span class="nv">state</span><span class="p">)</span> <span class="p">(</span><span class="nv">bf-p</span> <span class="nv">state</span><span class="p">))))</span>
    <span class="c1">;; ...</span>
    <span class="p">(</span><span class="nb">incf</span> <span class="p">(</span><span class="nv">ref</span><span class="p">))</span>
    <span class="c1">;; ...</span>
    <span class="p">(</span><span class="nb">decf</span> <span class="p">(</span><span class="nv">ref</span><span class="p">))</span>
    <span class="c1">;; ...</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">ref</span><span class="p">)</span> <span class="p">(</span><span class="nb">char-code</span> <span class="p">(</span><span class="nb">read-char</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Because the macro has no parameters I could have even more easily used
<code class="language-plaintext highlighter-rouge">symbol-macrolet</code>. I just didn’t think of it at the time.</p>
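
<p>For readers following along in Emacs, here’s a rough sketch of the
<code class="language-plaintext highlighter-rouge">symbol-macrolet</code> variant translated to Emacs Lisp with <code class="language-plaintext highlighter-rouge">cl-lib</code>. It is not
taken from the interpreter itself. The struct <code class="language-plaintext highlighter-rouge">my-bf</code>, its slot names, and
<code class="language-plaintext highlighter-rouge">my-bf-bump</code> are hypothetical, and the slot is named <code class="language-plaintext highlighter-rouge">ptr</code> rather than <code class="language-plaintext highlighter-rouge">p</code>
so that its accessor doesn’t collide with the <code class="language-plaintext highlighter-rouge">my-bf-p</code> predicate that
<code class="language-plaintext highlighter-rouge">cl-defstruct</code> generates.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(cl-defstruct my-bf
  (ptr 0)
  (mem (make-vector 30000 0)))

(defun my-bf-bump (state)
  "Increment the current cell of STATE twice, decrement it once, return it."
  ;; `cell' is a symbol macro: each use expands to the `aref' form,
  ;; which the place-modifying macros know how to set.
  (cl-symbol-macrolet ((cell (aref (my-bf-mem state) (my-bf-ptr state))))
    (cl-incf cell)
    (cl-incf cell)
    (cl-decf cell)
    cell))

(my-bf-bump (make-my-bf))  ; =&gt; 1
</code></pre></div></div>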

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Abnormal Termination</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/28/"/>
    <id>urn:uuid:81986f77-1a4a-3b00-ec3a-a6517f9ca4ca</id>
    <updated>2012-09-28T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="debian"/>
    <content type="html">
      <![CDATA[<p><em>Update: This bug was fixed in Emacs 24.4 (released October 2014).</em></p>

<p>A few months ago I <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682995">filed a bug report for Emacs</a>
(<a href="http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg01071.html">upstream</a>) when I stumbled across Emacs aborting under <em>very</em>
specific circumstances. I was editing in <a href="http://jblevins.org/projects/markdown-mode/">markdown-mode</a> and a
regular expression replacement on lists would reliably, and
frustratingly, cause Emacs to crash.</p>

<p>Through a sort-of binary search I loaded only half of
markdown-mode to see in which half it would trigger, then I cut that
half in half again and repeated recursively until I had it down to a
small expression that causes a <code class="language-plaintext highlighter-rouge">--no-init-file</code> (<code class="language-plaintext highlighter-rouge">-q</code>) Emacs to
abort. It almost looks like I found it through fuzz testing. Change or
remove anything even slightly and it no longer triggers the abort.</p>

<p>To trigger it, there’s an <code class="language-plaintext highlighter-rouge">after-change-functions</code> hook that performs
a regular expression search immediately after a <code class="language-plaintext highlighter-rouge">replace-regexp</code>. A
peek at the backtrace with gdb shows that this somehow causes the
point to leave the bounds of the buffer. Emacs catches this with an
assertion before dereferencing anything, and it aborts, thus
preventing a buffer overflow vulnerability. This is important for
<a href="/blog/2009/05/17/">my Emacs web server</a> because if there’s a way to
trigger this bug in the web server I’d much rather have it abort than
run arbitrary shellcode injected by a malicious HTTP request.</p>

<p>My bug report has seen no activity since I posted it. I can understand
why. The circumstances to trigger it are unlikely and it’s a very old
bug, so it’s low priority. It’s also a huge pain to debug. Hacking on
Emacs from Lisp is pleasant but hacking on Emacs from C is not. The
bug likely sits in the bowels of the complicated regular expression
engine, making it even more unpleasant. I personally have no interest
in trying to fix it myself.</p>

<p>So, since it looks like it’s here for the long haul it’s kind of fun
to implement an <code class="language-plaintext highlighter-rouge">abort</code> function on top of it, allowing Elisp programs
to terminate Emacs abnormally — you know, in case <code class="language-plaintext highlighter-rouge">kill-emacs</code> isn’t
fun enough.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">abort</span> <span class="p">()</span>
  <span class="s">"Ask Emacs to abnormally terminate itself (bug#12077)."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="s">"#\n*\n"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'after-change-functions</span>
              <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span><span class="p">)</span> <span class="p">(</span><span class="nv">re-search-forward</span> <span class="s">""</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">replace-regexp</span> <span class="s">"^\\*"</span> <span class="s">" *"</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s interactive so you could even bind a key to it.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Programs as Elisp Macros</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/21/"/>
    <id>urn:uuid:22a67760-c114-3285-fff8-36a6d23f0c65</id>
    <updated>2012-09-21T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>This evening I came across an interesting idea:
<a href="http://sunng.info/blog/2012/09/shake-every-program-can-be-a-clojure-function/">using system programs as functions</a>. Credit for the original idea goes to
<a href="http://amoffat.github.com/sh/index.html"><code class="language-plaintext highlighter-rouge">sh</code></a>, a Python module that
exposes system programs as functions. There’s also a Clojure library
called <a href="https://github.com/sunng87/shake/"><code class="language-plaintext highlighter-rouge">shake</code></a> to do the same thing in Clojure.</p>

<p>Thanks to symbols, I think the idea maps especially well onto Lisp
because arguments don’t need to be provided as strings. Here are some
examples,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ls -lh)
(uname -a)
(cat /etc/debian_version)
(git checkout -b foo)
</code></pre></div></div>

<p>It’s easy to achieve the same effect in Elisp,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'cl</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-shell-macro</span> <span class="p">(</span><span class="nv">program</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">fset</span> <span class="nv">program</span>
        <span class="p">(</span><span class="nb">cons</span> <span class="ss">'macro</span>
              <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
                <span class="o">`</span><span class="p">(</span><span class="nv">with-temp-buffer</span>
                   <span class="p">(</span><span class="nb">funcall</span> <span class="nf">#'</span><span class="nv">call-process</span>
                            <span class="o">,</span><span class="p">(</span><span class="nb">symbol-name</span> <span class="nv">program</span><span class="p">)</span> <span class="no">nil</span> <span class="no">t</span> <span class="no">nil</span>
                            <span class="o">,@</span><span class="p">(</span><span class="nb">mapcar</span> <span class="nf">#'</span><span class="nb">prin1-to-string</span> <span class="nv">args</span><span class="p">))</span>
                   <span class="p">(</span><span class="nv">buffer-string</span><span class="p">))))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">path</span> <span class="p">(</span><span class="nb">mapcan</span> <span class="nf">#'</span><span class="nv">directory-files</span> <span class="p">(</span><span class="nv">parse-colon-path</span> <span class="p">(</span><span class="nv">getenv</span> <span class="s">"PATH"</span><span class="p">)))))</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">program</span> <span class="p">(</span><span class="nb">remove-if</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">f</span><span class="p">)</span> <span class="p">(</span><span class="nb">member</span> <span class="nv">f</span> <span class="o">'</span><span class="p">(</span><span class="s">"."</span> <span class="s">".."</span><span class="p">)))</span> <span class="nv">path</span><span class="p">))</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nc">symbol</span> <span class="p">(</span><span class="nb">intern</span> <span class="nv">program</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nb">fboundp</span> <span class="nc">symbol</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">make-shell-macro</span> <span class="nc">symbol</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Evaluating the above will install macros for all programs in your
<code class="language-plaintext highlighter-rouge">PATH</code>, except where you already have functions or macros defined. I
messed up on the latter point while writing this and broke Emacs
enough to require a restart. The system program is called
synchronously and the output is returned as a string.</p>

<p>However, <em>because</em> arguments aren’t evaluated (these are macros) this has
limited usefulness. These calls are static and can’t be
passed variable arguments. In order to do that arguments would need to
be evaluated and symbols would need to be quoted. For example,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">git-checkout</span> <span class="p">(</span><span class="nv">branch</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">git</span> <span class="ss">'checkout</span> <span class="nv">branch</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">ls-l</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">ls</span> <span class="ss">'-l</span> <span class="nv">file</span><span class="p">))</span>
</code></pre></div></div>

<p>So I think I’d prefer this interface to the one provided by Clojure’s
<code class="language-plaintext highlighter-rouge">shake</code> (and my Elisp code at the top). I have little need to call
programs with static arguments.</p>
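
<p>A minimal sketch of that evaluated-argument interface, written as an
ordinary function instead of a macro per program. The names
<code class="language-plaintext highlighter-rouge">my-shell-call</code> and <code class="language-plaintext highlighter-rouge">my-ls-l</code> are hypothetical, and it assumes that symbols
and numbers should simply be turned into strings with <code class="language-plaintext highlighter-rouge">format</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-shell-call (program &amp;rest args)
  "Run PROGRAM synchronously with ARGS and return its output as a string.
ARGS are evaluated like any function arguments, then each one is
converted to a string, so quoted symbols and numbers are accepted."
  (with-temp-buffer
    (apply #'call-process program nil t nil
           (mapcar (lambda (arg) (format "%s" arg)) args))
    (buffer-string)))

;; Variable arguments now work because this is a plain function call.
(defun my-ls-l (file)
  (my-shell-call "ls" '-l file))
</code></pre></div></div>

<p>Since the program name is passed as a string here, nothing gets
installed into the function namespace, which also sidesteps the
clobbering problem mentioned above.</p>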
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Recursive Descent Parser (rdp)</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/20/"/>
    <id>urn:uuid:0cb87ff3-6862-3772-6d64-3222ff8e56fe</id>
    <updated>2012-09-20T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<p>I recently developed a recursive descent parser, named rdp, for use in
Emacs Lisp programs. I’ve already used it to write a compiler.</p>

<ul>
  <li><a href="https://github.com/skeeto/rdp">https://github.com/skeeto/rdp</a></li>
</ul>

<p>It’s available as a package on <a href="http://melpa.milkbox.net/">MELPA</a>.</p>

<h3 id="the-long-story">The Long Story</h3>

<p>Last month <a href="http://www.50ply.com/">Brian</a> invited me to take
<a href="http://www.cs.brown.edu/courses/cs173/2012/">a free, online programming languages course</a> with him. You
may recall that <a href="/blog/2011/01/11/">we developed a programming language
together</a> so it was only natural we would take
this class.</p>

<p>The first part of the class is oriented around a small programming
language created just for this class called <a href="http://www.cs.brown.edu/courses/cs173/2012/Assignments/ParselTest/">ParselTongue</a>.  It
looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deffun evenp(x)
    if ==(x, 0) then
        true
    else if ==(x, 1) then
            false
        else evenp(-(x, 2))
in defvar x = 14 in {
    while (evenp(x)) { x--; };   # Make sure x odd
    print("This is an odd number: ");
    print(x);
    ""; # No output
}
</code></pre></div></div>

<p>I’ve gotten so used to having a solid Emacs major mode when coding
that I can’t stand writing code without the support of a major
mode. Since this language was invented recently <em>just</em> for this class
there was no mode for it, nor would there be unless someone stepped up
to make one. I ended up taking that role. It was an opportunity to
learn how to create a major mode, something I had never done before.</p>

<p>It’s called <a href="https://github.com/skeeto/psl-mode">psl-mode</a>.</p>

<p>At first it was just some syntax highlighting (very easy) and some
poor automatic indentation. The indentation function would get
confused by anything non-trivial. It’s actually <em>really</em> hard to get
it right. I’ve gained a much greater appreciation for automatic
indentation in other modes.</p>

<p>In an attempt to improve this I decided I would try to fully parse the
language and use the resulting parse tree to determine indentation —
something like the depth of the pointer in the
tree. <a href="/blog/2009/01/04/">My experience with Perl’s Parse::RecDescent</a>
some years ago was very positive and I wanted to reproduce that
effect. However, rather than write the grammar in a separate language
that mixes in the programming language, which I find extremely messy,
I wanted to use pure s-expressions. A grammar looks very nice
as an alist of symbols.</p>

<h4 id="arithmetic-parser">Arithmetic Parser</h4>

<p>For example, here’s a grammar for simple arithmetic expressions,
including operator precedence and grouping (e.g. “4 + 5 * 2.5”,
“(4 + 5) * 2.5”, etc.).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">arith-tokens</span>
  <span class="o">'</span><span class="p">((</span><span class="nv">sum</span>       <span class="nv">prod</span>  <span class="nv">[</span><span class="p">(</span><span class="nv">[+</span> <span class="nv">-]</span> <span class="nv">sum</span><span class="p">)</span>  <span class="nv">no-sum]</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">prod</span>      <span class="nv">value</span> <span class="nv">[</span><span class="p">(</span><span class="nv">[*</span> <span class="nv">/]</span> <span class="nv">prod</span><span class="p">)</span> <span class="nv">no-prod]</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">num</span>     <span class="o">.</span> <span class="s">"-?[0-9]+\\(\\.[0-9]*\\)?"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span>       <span class="o">.</span> <span class="s">"\\+"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">-</span>       <span class="o">.</span> <span class="s">"-"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">*</span>       <span class="o">.</span> <span class="s">"\\*"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">/</span>       <span class="o">.</span> <span class="s">"/"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">pexpr</span>     <span class="s">"("</span> <span class="nv">[sum</span> <span class="nv">prod</span> <span class="nv">num</span> <span class="nv">pexpr]</span> <span class="s">")"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">value</span>   <span class="o">.</span> <span class="nv">[pexpr</span> <span class="nv">num]</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">no-prod</span> <span class="o">.</span> <span class="s">""</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">no-sum</span>  <span class="o">.</span> <span class="s">""</span><span class="p">)))</span>
</code></pre></div></div>

<p>Strings are regular expressions, the only things that actually match
input text (<em>terminals</em>). Lists are <em>sequences</em>, where each element in
the list must match in order. Vectors (in brackets) are <em>choices</em>
where one of the elements must match. Symbols name an expression so
that it can be referred to by other expressions recursively.</p>
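
<p>To make those four rules concrete, here’s a toy matcher that follows
them. It is a sketch under my own assumptions and not how rdp itself is
implemented: it works on a string rather than the current buffer, doesn’t
skip whitespace, and the name <code class="language-plaintext highlighter-rouge">my-rd-match</code>, the small grammar, and the
shape of the returned tree are all invented for this example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-rd-match (grammar expr input pos)
  "Match EXPR against INPUT at POS using GRAMMAR, an alist of named rules.
Return (TREE . END) on success or nil on failure."
  (cond
   ;; An unknown rule name (or an empty expression) never matches.
   ((null expr) nil)
   ;; String: a regular expression that must match exactly at POS.
   ((stringp expr)
    (when (eq (string-match expr input pos) pos)
      (cons (match-string 0 input) (match-end 0))))
   ;; Symbol: look up the named expression and match that instead.
   ((symbolp expr)
    (let ((result (my-rd-match grammar (cdr (assq expr grammar)) input pos)))
      (when result
        (cons (list expr (car result)) (cdr result)))))
   ;; Vector: choices, tried in order; the first success wins.
   ((vectorp expr)
    (let ((choices (append expr nil))
          (result nil))
      (while (and choices (not result))
        (setq result (my-rd-match grammar (car choices) input pos)
              choices (cdr choices)))
      result))
   ;; List: a sequence; every element must match, one after another.
   ((consp expr)
    (let ((trees '())
          (ok t))
      (while (and ok expr)
        (let ((result (my-rd-match grammar (car expr) input pos)))
          (if result
              (setq trees (cons (car result) trees)
                    pos (cdr result)
                    expr (cdr expr))
            (setq ok nil))))
      (when ok
        (cons (nreverse trees) pos))))))

;; A tiny hypothetical grammar for sums of integers such as "1+2".
(defvar my-rd-grammar
  '((sum    num [(plus sum) none])
    (num  . "[0-9]+")
    (plus . "\\+")
    (none . "")))

(my-rd-match my-rd-grammar 'sum "1+2" 0)
;; =&gt; ((sum ((num "1") ((plus "+") (sum ((num "2") (none "")))))) . 3)
</code></pre></div></div>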

<p>Give this alist to the parser and it will return an s-expression of
the parse tree of the current buffer. Due to the way the grammar must
be written this parse tree isn’t really pleasant to handle
directly. For example, a series of multiplications (“1 * 2 * 3 * 4”)
wouldn’t parse to a nice flat list but to one that nests deeper with each
additional operand.</p>

<p>To help squash these, the parser will accept an alist of symbols and
functions which process the parse tree at parse time. For example,
these corresponding functions will make sure <code class="language-plaintext highlighter-rouge">"4 * 5 * 6"</code> gets parsed
into <code class="language-plaintext highlighter-rouge">(* 4 (* 5 (* 6 1)))</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">arith-op</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">destructuring-bind</span> <span class="p">(</span><span class="nv">a</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">b</span><span class="p">))</span> <span class="nv">expr</span>
    <span class="p">(</span><span class="nb">list</span> <span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">arith-funcs</span>
  <span class="o">`</span><span class="p">((</span><span class="nv">sum</span>     <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nv">arith-op</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">prod</span>    <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nv">arith-op</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">num</span>     <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nv">string-to-number</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">-</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">*</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">/</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">pexpr</span>   <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">cadr</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">value</span>   <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">identity</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">no-prod</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">e</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="nb">*</span> <span class="mi">1</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">no-sum</span>  <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">e</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="nb">+</span> <span class="mi">0</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Notice how normal Emacs functions could be supplied directly in most
cases! That makes this approach so elegant in my opinion.</p>

<p>Also, in <code class="language-plaintext highlighter-rouge">arith-op</code> note the use of <code class="language-plaintext highlighter-rouge">destructuring-bind</code>. I’ve found
that macro to be invaluable when writing these syntax tree functions.</p>

<p>In this case, we can be even more clever. Rather than build a nice
parse tree, the expression can be evaluated directly. All it takes is
one small change,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">arith-op</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">destructuring-bind</span> <span class="p">(</span><span class="nv">a</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">b</span><span class="p">))</span> <span class="nv">expr</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>With this, the parser returns the computed value directly. So this
evaluates to 120.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">rdp-parse-string</span> <span class="s">"4 * 5 * 6"</span> <span class="nv">arith-tokens</span> <span class="nv">arith-funcs</span><span class="p">)</span>
</code></pre></div></div>
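
<p>To see why swapping <code class="language-plaintext highlighter-rouge">list</code> for <code class="language-plaintext highlighter-rouge">funcall</code> turns the tree builder into
an evaluator, here’s a sketch that applies both versions by hand, innermost
first, without rdp. The <code class="language-plaintext highlighter-rouge">demo-</code> names are made up for illustration, the
<code class="language-plaintext highlighter-rouge">(a (op b))</code> shape is assumed from <code class="language-plaintext highlighter-rouge">arith-op</code>, and <code class="language-plaintext highlighter-rouge">cl-destructuring-bind</code>
is the <code class="language-plaintext highlighter-rouge">cl-lib</code> spelling of <code class="language-plaintext highlighter-rouge">destructuring-bind</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical stand-ins for the two versions of arith-op.
(defun demo-arith-tree (expr)
  "Build a prefix expression from EXPR, shaped like (a (op b))."
  (cl-destructuring-bind (a (op b)) expr
    (list op a b)))

(defun demo-arith-eval (expr)
  "Evaluate EXPR, shaped like (a (op b)), immediately."
  (cl-destructuring-bind (a (op b)) expr
    (funcall op a b)))

(defun demo-fold (fn)
  "Apply FN by hand, innermost first, as for the input \"4 * 5 * 6\"."
  (let* ((inner  (funcall fn '(6 (* 1))))   ; no-prod supplies (* 1)
         (middle (funcall fn (list 5 (list '* inner)))))
    (funcall fn (list 4 (list '* middle)))))

(demo-fold #'demo-arith-tree)  ; returns (* 4 (* 5 (* 6 1)))
(demo-fold #'demo-arith-eval)  ; returns 120
</code></pre></div></div>

<p>Each inner result is already a number by the time the outer call sees
it, so there’s never a tree left over to walk.</p>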

<h4 id="parseltongue-compiler">ParselTongue Compiler</h4>

<p>I discovered this useful side effect while making my ParselTongue
parser. The original intention was that I’d parse the buffer for use
in indentation, then maybe I’d create an interpreter to evaluate the
parser output. However, the resulting parse tree was looking a lot
like Elisp. In an epiphany I realized I could simply emit valid Elisp
directly and forgo writing the interpreter altogether. And so I
accidentally created a ParselTongue compiler! This was incredibly
exciting for me to realize.</p>

<p>This ParselTongue program,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defvar obj = {x: 1} in { obj.x }
</code></pre></div></div>

<p>Compiles to this Elisp,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">obj</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">cons</span> <span class="ss">'x</span> <span class="mi">1</span><span class="p">))))</span>
  <span class="p">(</span><span class="k">progn</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'x</span> <span class="nv">obj</span><span class="p">))))</span>
</code></pre></div></div>

<p>Because it compiles to such a high level language, and because
ParselTongue is very Lisp-like semantically, it’s a bit
unconventional: the compiler emits code <em>during</em> parsing. In fact,
when the parser backtracks, some emitted code is thrown away.</p>

<p>By the end of the first evening I had implemented the majority of the
compiler, which quickly took precedence over indentation. The compiler
is now integrated as part of psl-mode. The current buffer can be
evaluated at any time with <code class="language-plaintext highlighter-rouge">psl-eval-buffer</code>. This function compiles
the buffer and has Emacs <code class="language-plaintext highlighter-rouge">eval</code> the result, printing the output in the
minibuffer. Compiler output can be viewed with
<code class="language-plaintext highlighter-rouge">psl-show-elisp-compilation</code> (mostly for my own debugging).</p>

<p>After a few days I integrated indentation with parsing, which required
modifying the parser (changes included in rdp itself). The parser
needed to keep track of where the point is in the parse tree. For
indentation it basically counts the depth into the parse tree, plus a
few more checks for special cases.</p>

<p>The parser was intentionally isolated from the rest of psl-mode so
that it could be separated for general use, which I have now
done. It’s been a <em>really</em> handy general purpose tool since then. That
arithmetic parser is only 35 lines of code and took about half an hour
to create.</p>

<h4 id="future-directions">Future Directions</h4>

<p>I also <a href="https://github.com/skeeto/emacs-torrent/blob/master/bencode.el">wrote a bencode parser</a>: <em>only</em> the
<code class="language-plaintext highlighter-rouge">bencode-tokens</code> and <code class="language-plaintext highlighter-rouge">bencode-funcs</code> alists are needed to parse
bencode, about 30 LOC. Careful observation will reveal that I cheated
and the result is a little hackish. Due to the way strings work,
bencode is <em>not</em> context-free so it can’t be parsed purely by the
grammar. I can work around it by having the parse tree function for
strings consume input, since it’s called during parsing.</p>

<p>I’ll be using rdp to parse many more things in the future, I’m
sure. It’s much more powerful than I expected.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Fractal Rendering in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/14/"/>
    <id>urn:uuid:71006dd1-ae7b-3860-e82b-4a0affd6524a</id>
    <updated>2012-09-14T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Taking advantage of Emacs’ <code class="language-plaintext highlighter-rouge">image-mode</code> and the handy
<a href="http://en.wikipedia.org/wiki/Netpbm_format">Netpbm format</a>, it’s
possible to generate and render images <em>inside</em> Emacs using
Elisp. This function will generate a
<a href="http://en.wikipedia.org/wiki/Sierpi%C5%84ski_carpet">Sierpinski carpet</a>
and display the result in a buffer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">sierpinski</span> <span class="p">(</span><span class="nv">s</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">pop-to-buffer</span> <span class="p">(</span><span class="nv">get-buffer-create</span> <span class="s">"*sierpinski*"</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">fundamental-mode</span><span class="p">)</span> <span class="p">(</span><span class="nv">erase-buffer</span><span class="p">)</span>
  <span class="p">(</span><span class="k">labels</span> <span class="p">((</span><span class="nv">fill-p</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
                   <span class="p">(</span><span class="nb">cond</span> <span class="p">((</span><span class="nb">or</span> <span class="p">(</span><span class="nb">zerop</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">zerop</span> <span class="nv">y</span><span class="p">))</span> <span class="s">"0"</span><span class="p">)</span>
                         <span class="p">((</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">mod</span> <span class="nv">x</span> <span class="mi">3</span><span class="p">))</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">mod</span> <span class="nv">y</span> <span class="mi">3</span><span class="p">)))</span> <span class="s">"1"</span><span class="p">)</span>
                         <span class="p">(</span><span class="no">t</span> <span class="p">(</span><span class="nv">fill-p</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">x</span> <span class="mi">3</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">y</span> <span class="mi">3</span><span class="p">))))))</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"P1\n%d %d\n"</span> <span class="nv">s</span> <span class="nv">s</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">y</span> <span class="nv">s</span><span class="p">)</span> <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">s</span><span class="p">)</span> <span class="p">(</span><span class="nv">insert</span> <span class="p">(</span><span class="nv">fill-p</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)</span> <span class="s">" "</span><span class="p">))))</span>
  <span class="p">(</span><span class="nv">image-mode</span><span class="p">))</span>
</code></pre></div></div>

<p>It’s best called with powers of three,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">sierpinski</span> <span class="p">(</span><span class="nb">expt</span> <span class="mi">3</span> <span class="mi">5</span><span class="p">))</span>
</code></pre></div></div>

<p><a href="/img/fractal/sierpinski.png"><img src="/img/fractal/sierpinski-thumb.png" alt="" /></a></p>
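
<p>For a sense of what the buffer holds before <code class="language-plaintext highlighter-rouge">image-mode</code> takes over,
here’s a sketch that returns the same text as a string instead of displaying
it. The name <code class="language-plaintext highlighter-rouge">demo-sierpinski-pbm</code> is made up for illustration, and it uses
<code class="language-plaintext highlighter-rouge">cl-labels</code>, the <code class="language-plaintext highlighter-rouge">cl-lib</code> spelling of <code class="language-plaintext highlighter-rouge">labels</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun demo-sierpinski-pbm (s)
  "Return the plain PBM (P1) text of an S by S Sierpinski carpet."
  (cl-labels ((fill-p (x y)
                (cond ((or (zerop x) (zerop y)) "0")
                      ((and (= 1 (mod x 3)) (= 1 (mod y 3))) "1")
                      (t (fill-p (/ x 3) (/ y 3))))))
    (with-temp-buffer
      ;; Header: magic number, then width and height.
      (insert (format "P1\n%d %d\n" s s))
      ;; One "0" or "1" per pixel, row by row.
      (dotimes (y s)
        (dotimes (x s)
          (insert (fill-p x y) " ")))
      (buffer-string))))

(demo-sierpinski-pbm 3)  ; returns "P1\n3 3\n0 0 0 0 1 0 0 0 0 "
</code></pre></div></div>

<p>In plain PBM a 1 is a black pixel, so the lone 1 in the 3 by 3 case is
the hole in the middle of the carpet.</p>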

<p>This one should <a href="/blog/2007/10/01/">look quite familiar</a>. Using the
same technique,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">mandelbrot</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">pop-to-buffer</span> <span class="p">(</span><span class="nv">get-buffer-create</span> <span class="s">"*mandelbrot*"</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">w</span> <span class="mi">400</span><span class="p">)</span> <span class="p">(</span><span class="nv">h</span> <span class="mi">300</span><span class="p">)</span> <span class="p">(</span><span class="nv">d</span> <span class="mi">32</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">fundamental-mode</span><span class="p">)</span> <span class="p">(</span><span class="nv">erase-buffer</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">set-buffer-multibyte</span> <span class="no">nil</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"P6\n%d %d\n255\n"</span> <span class="nv">w</span> <span class="nv">h</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">y</span> <span class="nv">h</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">w</span><span class="p">)</span>
        <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">cx</span> <span class="p">(</span><span class="nb">*</span> <span class="mf">1.5</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">w</span> <span class="mf">1.45</span><span class="p">))</span> <span class="nv">w</span> <span class="mf">0.45</span><span class="p">)))</span>
               <span class="p">(</span><span class="nv">cy</span> <span class="p">(</span><span class="nb">*</span> <span class="mf">1.5</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">y</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">h</span> <span class="mf">2.0</span><span class="p">))</span> <span class="nv">h</span> <span class="mf">0.5</span><span class="p">)))</span>
               <span class="p">(</span><span class="nv">zr</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nv">zi</span> <span class="mi">0</span><span class="p">)</span>
               <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="nv">d</span> <span class="nv">d</span><span class="p">)</span>
                    <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zr</span> <span class="nv">zr</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zi</span> <span class="nv">zi</span><span class="p">))</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nb">return</span> <span class="nv">i</span><span class="p">)</span>
                      <span class="p">(</span><span class="nb">psetq</span> <span class="nv">zr</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zr</span> <span class="nv">zr</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zi</span> <span class="nv">zi</span><span class="p">))</span> <span class="nv">cx</span><span class="p">)</span>
                             <span class="nv">zi</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zr</span> <span class="nv">zi</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span> <span class="nv">cy</span><span class="p">))))))</span>
          <span class="p">(</span><span class="nv">insert-char</span> <span class="p">(</span><span class="nb">floor</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">256</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">v</span> <span class="mf">1.0</span> <span class="nv">d</span><span class="p">)))</span> <span class="mi">3</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">image-mode</span><span class="p">)))</span>
</code></pre></div></div>

<p><img src="/img/fractal/elisp-mandelbrot.png" alt="" /></p>

<p>Tweak it with a colormap,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">colormap</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span>
  <span class="s">"Given a value between 0 and 1.0, insert a P6 color."</span>
  <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">3</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">insert-char</span> <span class="p">(</span><span class="nb">floor</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">256</span> <span class="p">(</span><span class="nb">min</span> <span class="mf">0.99</span> <span class="p">(</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">-</span> <span class="mi">3</span> <span class="nv">i</span><span class="p">)</span> <span class="nv">v</span><span class="p">)))))</span> <span class="mi">1</span><span class="p">)))</span>
</code></pre></div></div>

<p><img src="/img/fractal/elisp-mandelbrot-color.png" alt="" /></p>

<p>One of the project ideas on my mental back-burner of things I’ll never
get to is to create a little graphics library for Elisp. It would use
a technique like this to pull it off. Assuming support was compiled
in, Emacs can even render SVGs to a buffer, so creating a rich
graphics library wouldn’t be difficult at all. Plus, unlike bare
Elisp, it would be <em>fast</em>.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Markov Chain Text Generation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/05/"/>
    <id>urn:uuid:3f808165-be65-3f4b-f485-8df6aacccd04</id>
    <updated>2012-09-05T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="ai"/>
    <content type="html">
      <![CDATA[<p>You may have been confused by
<a href="/blog/2012/09/04/">yesterday’s nonsense post</a>. That’s because it was
generated by a few
<a href="https://github.com/skeeto/markov-text">Elisp Markov chain functions</a>. It
was fed my entire blog and used to generate a ~1500 word post.  I
tidied up a bit to make sure the markup was valid and parentheses were
balanced, but that’s about it.</p>

<p>The algorithm is really simple and I was quite surprised by the
quality of the output. After feeding it <em>Great Expectations</em> and <em>A
Princess of Mars</em> (easily obtainable from
<a href="http://www.gutenberg.org/">Project Gutenberg</a>) I had a good laugh at
some of the output. Some choice quotes,</p>

<blockquote>
  <p>He wiped himself again, as if he didn’t marry her by hand.</p>
</blockquote>

<blockquote>
  <p>I admit having done so, and the summer afternoon toned down into the
house.</p>
</blockquote>

<p>My favorite of yesterday’s post was this one,</p>

<blockquote>
  <p>Suppose you want to read a great story, I recommend it.</p>
</blockquote>

<p>The output also looks like some types of spam, so this may be how some
spammers generate content in order to get around spam filters.</p>

<p>To build a Markov chain from input, the program looks at
<code class="language-plaintext highlighter-rouge">markov-text-state-size</code> words (default 3) and makes note of what word
follows. Then it slides the window forward one word and repeats. To
generate text, the last <code class="language-plaintext highlighter-rouge">markov-text-state-size</code> words output are
the state and the next word is selected from these notes at random,
weighted by the frequency of its appearance in the input text. Smaller
state sizes generate more random output and larger state sizes
generate better structured output. Too large and the output is the
input verbatim.</p>

<p>For example, given this sentence and a state size of <em>two</em> words,</p>

<blockquote>
  <p>Quickly, he ran and he ran until he couldn’t.</p>
</blockquote>

<p>The produced chain looks like this in alist form,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((("Quickly," "he") "ran")
 (("he" "ran") "and" "until")
 (("ran" "and") "he")
 (("and" "he") "ran")
 (("ran" "until") "he")
 (("until" "he") "couldn't.")
 (("he" "couldn't.")))
</code></pre></div></div>

<p><a href="/img/diagram/markov-chain.gv"><img src="/img/diagram/markov-chain.png" alt="" /></a></p>

<p>Because there are two options for (“he” “ran”), the generator might
loop around that state for a while like so,</p>

<blockquote>
  <p>Quickly, he ran and he ran and he ran and he ran until he couldn’t.</p>
</blockquote>

<p>Or it might skip the section altogether,</p>

<blockquote>
  <p>Quickly, he ran until he couldn’t.</p>
</blockquote>
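
<p>The sliding window and the random walk can be sketched in a few lines
of Elisp. This is a minimal illustration and not the code in
<code class="language-plaintext highlighter-rouge">markov-text</code>: the <code class="language-plaintext highlighter-rouge">demo-</code> names are made up, and it keeps the chain
as a plain alist like the one above, which would be slow on a novel-sized
input.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun demo-markov-chain (words n)
  "Return an alist mapping each N-word state in WORDS to its followers."
  (let ((chain '()))
    ;; Keep going while at least N words remain under the window.
    (while (nthcdr (1- n) words)
      (let* ((state (cl-subseq words 0 n))
             (next (nth n words))
             (entry (assoc state chain)))
        (unless entry
          (setq entry (list state))
          (push entry chain))
        ;; Repeated followers are kept, which is what weights the
        ;; random choice by frequency.
        (when next
          (nconc entry (list next))))
      (setq words (cdr words)))
    (nreverse chain)))

(defun demo-markov-generate (chain state)
  "Walk CHAIN from STATE until a state with no followers is reached."
  (let ((out (copy-sequence state))
        followers)
    (while (setq followers (cdr (assoc state chain)))
      (let ((next (nth (random (length followers)) followers)))
        (setq out (append out (list next))
              state (append (cdr state) (list next)))))
    out))

(let* ((words (split-string "Quickly, he ran and he ran until he couldn't."))
       (chain (demo-markov-chain words 2)))
  ;; CHAIN is now the alist shown above.
  (mapconcat #'identity (demo-markov-generate chain '("Quickly," "he")) " "))
</code></pre></div></div>

<p>Every run starts with “Quickly, he ran” and ends with “until he
couldn’t.”; only the number of trips around the loop varies.</p>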

<p>Also notice that the punctuation is part of the word. This makes the
output more natural, automatically forming sentences. Moreover, my
program also holds onto all newlines. This breaks the output into nice
paragraphs without any extra effort. Since I wrote it in Elisp, I use
<code class="language-plaintext highlighter-rouge">fill-paragraph</code> to properly wrap the paragraphs as I generate them,
so superfluous single newlines don’t hurt anything.</p>

<p>One problem I did run into with my input text was quotes. I was using
novels so there is a lot of quoted text (character dialog). The
generated text tends to balance quotes poorly. My solution for the
moment is to strip these out along with spaces when forming
words. That’s still not ideal.</p>

<p>I’m going to play with this a bit more, using it as a tool for other
project ideas (ERC bot, etc.). I already did this by including a
<a href="http://en.wikipedia.org/wiki/Lorem_ipsum"><em>lorem ipsum</em></a> generator
alongside the <code class="language-plaintext highlighter-rouge">markov-text</code> package. The input text is Cicero’s <em>De
finibus bonorum et malorum</em>, the original source of <em>lorem
ipsum</em>. This was actually the original inspiration for this project,
after I saw <code class="language-plaintext highlighter-rouge">lorem-ipsum.el</code> on EmacsWiki and decided I could do
better.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>simple-httpd and impatient-mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/20/"/>
    <id>urn:uuid:627e438d-36e5-3a9d-dd35-6a9a3914a63a</id>
    <updated>2012-08-20T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>After <a href="/blog/2012/08/12/">settling in with MELPA</a> I wanted to see
about turning <a href="/blog/2009/05/17/">my Emacs web server</a> into an
installable package. Someone had already uploaded my code to
<a href="http://marmalade-repo.org/">Marmalade</a> after taking credit for all
the work and slapping the GPL on it (my version is public domain). So,
due to that and because the name <code class="language-plaintext highlighter-rouge">httpd.el</code> is already overloaded as
it is, I renamed it to <code class="language-plaintext highlighter-rouge">simple-httpd</code>. That’s the name of the package
in MELPA.</p>

<p>I did more than rename the package; it got an overhaul. I rewrote a
few functions, tossed a whole bunch of functions, created
<a href="/blog/2012/08/15/">a test suite</a>, and <strong>finally added directory
listings</strong> — a feature that had long been on the TODO list. To keep
with the name “simple”, I ripped out the
<a href="/blog/2009/11/03/">clunky servlet system</a> (sorry Chunye). This new
version was leaner, cleaner, and more useful.</p>

<p>I’ve definitely improved my software development skills over the last
three years since I originally wrote it. In my refactor I made it
buffer oriented. When a request comes in, the server fills a buffer
with the response and sends it back. This means I could send a
<code class="language-plaintext highlighter-rouge">Content-Length</code> header and use keep-alive to serve multiple requests
over one connection. It also suggested a new servlet paradigm — the
servlet prepares a buffer and the server sends it to the client.</p>

<h3 id="servlets">Servlets</h3>

<p>So I ended up adding servlet support again, from scratch. This time
it’s really easy to use. Here’s a “Hello, World” servlet,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet</span> <span class="nv">hello-world</span> <span class="nv">text/plain</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"Hello, World"</span><span class="p">))</span>
</code></pre></div></div>

<p>The “function name” part is the path to the servlet. This one would be
found at <code class="language-plaintext highlighter-rouge">/hello-world</code>. The second is the MIME type as a
symbol. We’re just sending plain text in this example. The third is
the argument list. A servlet takes up to three arguments: the path,
the query alist, and the full request object (which includes the first
two). Unless a more specific servlet is defined, this servlet handles
everything under its root, in this case <code class="language-plaintext highlighter-rouge">/hello-world</code>, including
<code class="language-plaintext highlighter-rouge">/hello-world/foo</code> and <code class="language-plaintext highlighter-rouge">/hello-world/foo/bar.txt</code>. This is why the
path argument is relevant.</p>

<p>This servlet uses the path to get a name,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet</span> <span class="nv">hello</span> <span class="nv">text/plain</span> <span class="p">(</span><span class="nv">path</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"hello, "</span> <span class="p">(</span><span class="nv">file-name-nondirectory</span> <span class="nv">path</span><span class="p">)))</span>
</code></pre></div></div>

<p>If you visit <code class="language-plaintext highlighter-rouge">/hello/Chris</code> it will send you “hello, Chris”. Servlets
are trivial to write!</p>

<p>This one serves the contents of the <code class="language-plaintext highlighter-rouge">*scratch*</code> buffer,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet</span> <span class="nv">scratch</span> <span class="nv">text/plain</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">insert-buffer-substring</span> <span class="p">(</span><span class="nv">get-buffer</span> <span class="s">"*scratch*"</span><span class="p">)))</span>
</code></pre></div></div>

<p>In the background I continue to use Chunye’s symbol dispatch
technique, so all servlets are actually functions that begin with
<code class="language-plaintext highlighter-rouge">httpd/</code> (<code class="language-plaintext highlighter-rouge">httpd/hello-world</code> and <code class="language-plaintext highlighter-rouge">httpd/hello</code>). For a more advanced
servlet, the function can be written directly. There’s another macro,
<code class="language-plaintext highlighter-rouge">with-httpd-buffer</code>, to help keep this simple. The server will always
pass four arguments (the connection process followed by the three
servlet arguments), so when creating the function directly it needs to
accept at least four arguments.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">httpd/hello</span> <span class="p">(</span><span class="nv">proc</span> <span class="nv">path</span> <span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-httpd-buffer</span> <span class="nv">proc</span> <span class="s">"text/plain"</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="s">"hello, "</span> <span class="p">(</span><span class="nv">file-name-nondirectory</span> <span class="nv">path</span><span class="p">))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">proc</code> object here is the network connection process, providing
more exclusive access to the client. This allows the servlet to do
more interesting things like respond in the future (long polls). The
<code class="language-plaintext highlighter-rouge">with-httpd-buffer</code> macro creates a temporary buffer and, when the
body completes, sends an HTTP header and the buffer as the content,
similar to <code class="language-plaintext highlighter-rouge">defservlet</code>.</p>

<p>With access to the process, the servlet can do more specialized things
like send custom headers with <code class="language-plaintext highlighter-rouge">httpd-send-header</code>, send files with
<code class="language-plaintext highlighter-rouge">httpd-send-file</code>, send an error page with <code class="language-plaintext highlighter-rouge">httpd-error</code>, or do
redirects with <code class="language-plaintext highlighter-rouge">httpd-redirect</code>. The file server part of the server is
actually just another servlet as well: <code class="language-plaintext highlighter-rouge">httpd/</code>. This could be
redefined to redirect the browser to our example servlet (HTTP 301).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">httpd/</span> <span class="p">(</span><span class="nv">proc</span> <span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">httpd-redirect</span> <span class="nv">proc</span> <span class="s">"/hello-world"</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="impatient-mode">impatient-mode</h3>

<p>I showed this to <a href="http://50ply.com">Brian</a>, like I do everything, and
he found my servlet concept to be compelling, especially the
buffer-serving servlet. I believe his exact words were, “That’s so
simple.” He found it interesting enough that
<a href="http://www.50ply.com/blog/2012/08/13/introducing-impatient-mode/">he wrote a mode based on it called <code class="language-plaintext highlighter-rouge">impatient-mode</code></a>!</p>

<p>It serves a buffer’s content live to the web browser, including syntax
highlighting (via htmlize). Updates to the buffer are communicated by
a long-poll. The browser initiates a request in the background for an
update. Emacs adds the request to a list. A hook in
<code class="language-plaintext highlighter-rouge">after-change-functions</code> updates all the browsers waiting for an
update.</p>

<p>Enabling <code class="language-plaintext highlighter-rouge">impatient-mode</code>, a minor mode, publishes the buffer. If the
server’s running, the list of published buffers can be found under
<code class="language-plaintext highlighter-rouge">/imp</code> —
i.e. <a href="http://localhost:8080/imp">http://localhost:8080/imp</a>. The
buffer can be accessed directly at <code class="language-plaintext highlighter-rouge">/imp/live/&lt;buffer-name&gt;</code>, which is
where <code class="language-plaintext highlighter-rouge">/imp</code> will link.</p>

<p>Perhaps the coolest thing is serving an HTML buffer <em>without</em>
htmlize. That is, send the raw buffer as <code class="language-plaintext highlighter-rouge">text/html</code>. Brian has a demo
of this in the linked post. You can tweak CSS and HTML and watch it
update live in the browser as you edit. It’s a really neat way to edit
CSS, since it’s often unintuitive (at least for me).</p>

<p><code class="language-plaintext highlighter-rouge">impatient-mode</code> can also be installed through MELPA.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Unit Testing with ERT</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/15/"/>
    <id>urn:uuid:f5798f49-155b-3038-a8d9-4f5a5f1c2d0c</id>
    <updated>2012-08-15T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Emacs 24 comes with a unit testing library, ERT (Emacs Lisp
Regression Testing). I learned about it after watching
<a href="http://emacsrocks.com/">Extending Emacs Rocks!</a> and I’ve been using
it ever since. It’s been a pleasant experience; enough so that
<a href="https://github.com/skeeto/.emacs.d/commit/59d3eac73edbad8a5be72a81c7d6c5b1193bbb90">I made a key binding for it</a>
so that I can effortlessly run tests at any time. When I recently made
a major overhaul to my Emacs web server I added
<a href="https://github.com/skeeto/emacs-http-server/blob/master/simple-httpd-test.el">a small test suite</a>
using ERT.</p>

<p>Emacs also comes with the ERT manual so it’s easy to start learning,
but here’s the gist of it. There are essentially two macros to worry
about: <code class="language-plaintext highlighter-rouge">ert-deftest</code> and <code class="language-plaintext highlighter-rouge">should</code>. The first is used to create tests
and the second works like <code class="language-plaintext highlighter-rouge">assert</code> but reports failures in more detail. Here’s
an example,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">example-test</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">9</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">11</span><span class="p">)))</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">ert-deftest</code> is what you’d expect from every other <code class="language-plaintext highlighter-rouge">def*</code>. The empty
parameter list does nothing at the moment other than to make it feel
like writing a <code class="language-plaintext highlighter-rouge">defun</code>. The body is evaluated as normal. This is all
turned into an anonymous function which is stuffed in the <em>plist</em> of
the symbol <code class="language-plaintext highlighter-rouge">example-test</code>. When it comes time to run tests, they
are found by searching the plists of every interned symbol.</p>

<p>The other macro, <code class="language-plaintext highlighter-rouge">should</code>, takes one argument: a form that <em>should</em>
evaluate to true. There is also a <code class="language-plaintext highlighter-rouge">should-not</code> and a <code class="language-plaintext highlighter-rouge">should-error</code>,
which do what you would expect.</p>
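
<p>As a small sketch of those two variants (the test name here is made
up for illustration), a test can assert that a form is false and that
another form signals a particular error:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'ert)

(ert-deftest example-should-variants ()
  ;; Passes because the form evaluates to nil.
  (should-not (= (+ 9 2) 100))
  ;; Passes because integer division by zero signals `arith-error'.
  (should-error (/ 1 0) :type 'arith-error))
</code></pre></div></div>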

<p>Tests are run with <code class="language-plaintext highlighter-rouge">M-x ert</code>. It will ask for a <em>test selector</em>, where
<code class="language-plaintext highlighter-rouge">t</code> selects all defined tests. There are many ways to select a subset
of all tests (<code class="language-plaintext highlighter-rouge">:new</code>, <code class="language-plaintext highlighter-rouge">:passed</code>, <code class="language-plaintext highlighter-rouge">:failed</code>, etc.) but I usually just
run all of them (as my key binding makes obvious). The results are
displayed in a separate pop-up buffer which, as usual, can be
dismissed with <code class="language-plaintext highlighter-rouge">q</code>.</p>
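
<p>A binding along these lines is enough to run everything with one
key. This is only a sketch: the command name and the <code class="language-plaintext highlighter-rouge">C-c t</code> key are
assumptions for illustration, not the binding from the commit linked
above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'ert)

(defun my-ert-run-all ()
  "Run every defined ERT test, i.e. use the selector t."
  (interactive)
  (ert t))

;; Hypothetical key choice; any free key would do.
(global-set-key (kbd "C-c t") #'my-ert-run-all)
</code></pre></div></div>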

<h3 id="running-ert">Running ERT</h3>

<p>What makes <code class="language-plaintext highlighter-rouge">should</code> special is error reporting. When tests fail you
will be provided with the forms that failed and their return
values. For example, suppose we modify the test above so that it fails:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">example-test</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">9</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">100</span><span class="p">)))</span>
</code></pre></div></div>

<p>Then run the test and it will note the failure. There is also some red
coloring not captured here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">F</span> <span class="nv">example-test</span>
    <span class="p">(</span><span class="nv">ert-test-failed</span>
     <span class="p">((</span><span class="nv">should</span>
       <span class="p">(</span><span class="nb">=</span>
        <span class="p">(</span><span class="nb">+</span> <span class="mi">9</span> <span class="mi">2</span><span class="p">)</span>
        <span class="mi">100</span><span class="p">))</span>
      <span class="ss">:form</span>
      <span class="p">(</span><span class="nb">=</span> <span class="mi">11</span> <span class="mi">100</span><span class="p">)</span>
      <span class="ss">:value</span> <span class="no">nil</span><span class="p">))</span>
</code></pre></div></div>

<p>Displayed are the forms we were comparing — <code class="language-plaintext highlighter-rouge">(+ 9 2)</code> and <code class="language-plaintext highlighter-rouge">100</code> — and
what they evaluated to: <code class="language-plaintext highlighter-rouge">(= 11 100)</code>. If I put the point at the test
result and type <code class="language-plaintext highlighter-rouge">.</code> it will take me to the test definition so that I
can start looking further. Or I can press <code class="language-plaintext highlighter-rouge">b</code> to see a backtrace, <code class="language-plaintext highlighter-rouge">m</code>
to see all output messages from that test, or, if I’m in disbelief,
<code class="language-plaintext highlighter-rouge">r</code> to rerun that test.</p>

<h3 id="mocking">Mocking</h3>

<p>Elisp’s dynamic bindings really come in handy when functions need to
be mocked. For example, say I have a function that, at some point,
needs to check whether or not a particular file exists. This would be
done using <code class="language-plaintext highlighter-rouge">file-exists-p</code>. Creating or removing the file in the
filesystem before the test isn’t a well-contained unit test. Tests
running in parallel could interfere and there are a number of ways
something could go wrong.</p>

<p>Instead I’ll temporarily override the definition of <code class="language-plaintext highlighter-rouge">file-exists-p</code>
with a <em>mock</em> function using <code class="language-plaintext highlighter-rouge">let</code>’s cousin, <code class="language-plaintext highlighter-rouge">flet</code> (from the <code class="language-plaintext highlighter-rouge">cl</code> library). Note that
<code class="language-plaintext highlighter-rouge">file-exists-p</code> is a C source function but I can still override it as
if it were any regular Lisp function.</p>

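<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; flet is provided by the cl library</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'cl</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">determine-next-action</span> <span class="p">()</span>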
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nv">file-exists-p</span> <span class="s">"death-star-plans.org"</span><span class="p">)</span>
      <span class="ss">'bring-him-the-passengers</span>
    <span class="ss">'tear-this-ship-apart</span><span class="p">))</span>

<span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">file-check-test</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">file-exists-p</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span> <span class="no">t</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">determine-next-action</span><span class="p">)</span> <span class="ss">'bring-him-the-passengers</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">file-exists-p</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span> <span class="no">nil</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">determine-next-action</span><span class="p">)</span> <span class="ss">'tear-this-ship-apart</span><span class="p">))))</span>
</code></pre></div></div>

<p>This is a very simple mock. For a real unit test I might want the mock
to return <code class="language-plaintext highlighter-rouge">t</code> for some filename patterns and <code class="language-plaintext highlighter-rouge">nil</code> for others. There’s
an extension to ERT, <code class="language-plaintext highlighter-rouge">el-mock.el</code>, which assists in creating more
complex mocks, but I haven’t used or needed it yet.</p>
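
<p>A pattern-based mock does not need much more code. The sketch below
makes two assumptions. First, it uses <code class="language-plaintext highlighter-rouge">cl-letf</code> from <code class="language-plaintext highlighter-rouge">cl-lib</code>, since
Emacs 24.3 and later mark <code class="language-plaintext highlighter-rouge">flet</code> as obsolete; rebinding
<code class="language-plaintext highlighter-rouge">symbol-function</code> gives the same temporary, dynamic override. Second,
<code class="language-plaintext highlighter-rouge">plans-available-p</code> is a hypothetical function under test, invented
for this example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)
(require 'ert)

(defun plans-available-p (file)
  "Hypothetical function under test: non-nil if FILE exists."
  (file-exists-p file))

(ert-deftest pattern-mock-test ()
  ;; Rebind the function cell of `file-exists-p' for the duration of
  ;; the body.  In this mock only names ending in ".org" "exist".
  (cl-letf (((symbol-function 'file-exists-p)
             (lambda (file) (string-match-p "\\.org\\'" file))))
    (should (plans-available-p "death-star-plans.org"))
    (should-not (plans-available-p "death-star-plans.txt"))))
</code></pre></div></div>

<p>When the body exits, normally or through an error, the original
definition is restored.</p>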

<p>Since it’s so convenient I’m going to be using ERT more and more until
it becomes second-nature.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Switching to the Emacs Lisp Package Archive</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/12/"/>
    <id>urn:uuid:3e3186c8-dccd-3167-1f42-79f34d08a3dd</id>
    <updated>2012-08-12T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p><em>Update June 2017</em>: I no longer use Emacs’ <code class="language-plaintext highlighter-rouge">package.el</code> and instead
manage packages and their dependencies (manually) through my own
decentralized package system called <code class="language-plaintext highlighter-rouge">gpkg</code> (“git package”).</p>

<p>For those who are unaware, Emacs 24 was finally released this past
June. I had been following the official repository for about a year
before the release using what was becoming version 24, very quickly
becoming dependent on several of <a href="http://www.gnu.org/software/emacs/NEWS.24.1">the new features</a>. Now that
it’s been officially released I’m back to using a stable version of
Emacs, about which I’m quite relieved.</p>

<p>One of the new features that I <em>hadn’t</em> been using until recently was
the package manager, <code class="language-plaintext highlighter-rouge">package</code>, and the
<a href="http://elpa.gnu.org/">Emacs Lisp Package Archive</a> (ELPA). You can now
ask Emacs to download and install new modes and extensions from the
Internet. By default, it only uses the official archive. That archive
only hosts packages with copyright assigned to the FSF, which is quite
restrictive. There are alternatives, the most popular of which is
<a href="http://marmalade-repo.org/">Marmalade</a>. Fortunately it’s easy to ask
<code class="language-plaintext highlighter-rouge">package</code> to use additional repositories, so this is a non-issue.</p>

<p>Because it was still unstable and buggy at the time, I avoided using
it when <a href="/blog/2011/10/19/">setting up my configuration repository</a>.
Instead I opted to gather packages by way of Git submodules. I’d give
<code class="language-plaintext highlighter-rouge">package</code> a shot once Emacs 24 was released. Once it was released in
June it was just a matter of time until I invested in this new
system.</p>

<p>The trigger was an e-mail from one of my readers, Rolando. He asked me
if I could move my <a href="/blog/2012/08/02/">recently updated</a> memoization
function into its own repository and touch it up so that it could be
turned into a package with <a href="http://melpa.milkbox.net/">MELPA</a>, another
alternative package repository. This forced me to finally investigate.</p>

<p>It turns out MELPA is <em>really</em> interesting. Each package is described
by a “recipe” file, which is essentially just a tiny s-expression
listing the repository URL. In the case of my memoization package,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">memoize</span> <span class="ss">:repo</span> <span class="s">"skeeto/emacs-memoize"</span>
         <span class="ss">:fetcher</span> <span class="nv">github</span><span class="p">)</span>
</code></pre></div></div>

<p>From a package maintainer’s point-of-view, this is fantastic. I don’t
have to take any extra steps to publish updates to my package. I just
keep doing what I do and it happens automatically. However, I need to
be more careful about not pushing broken commits — which is why I
started unit testing (to be
<a href="/blog/2012/08/15/">covered in a future post</a>). And I need to be extra
careful with my SSH keys, since they’re now used to publish code that
other people automatically trust and execute.</p>

<p>Excited about MELPA and wanting to actually use my own package, I
started throwing out my submodules, replacing them with their package
equivalents. If you follow my configuration repository you probably
noticed all the recent disruption, because updating requires manual
intervention. Git leaves submodules around (for good reason!) so they
need to be manually removed.</p>

<p>I also heavily updated and renamed <a href="/blog/2009/05/17/">my web server</a>
(now called <code class="language-plaintext highlighter-rouge">simple-httpd</code>) to provide it as a package (also to be
covered in a future post). Thanks to MELPA, I follow the package
rather than my own repository since it follows so closely (&lt; 1 hour).</p>

<p>Another barrier was that I was using an old version of <a href="https://github.com/magit/magit">Magit</a>
due to a bad interaction of modern versions with Wombat, my preferred
color theme. After <a href="https://github.com/skeeto/.emacs.d/commit/aec488937ff9a344278359ded7732446f2380748">some face tweaking</a>, I not only fixed it
but I made it better than it was before. Sinking an hour or two into
these sorts of annoyances usually works out really well. I need to
remind myself of this in the future when I run into annoyance issues.</p>

<p>Surprisingly, <code class="language-plaintext highlighter-rouge">package</code> doesn’t seem to be written with managed
configuration in mind. The provided functionality is designed to be
used interactively rather than programmatically. <code class="language-plaintext highlighter-rouge">package-install</code> is
only meant to be invoked once, so care needs to be taken in listing
packages in a configuration and doing everything in the right
order. Here’s how I have it set up at the moment, after listing
the packages to use in <code class="language-plaintext highlighter-rouge">my-packages</code>,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'package</span><span class="p">)</span>
<span class="p">(</span><span class="nv">add-to-list</span> <span class="ss">'package-archives</span>
             <span class="o">'</span><span class="p">(</span><span class="s">"melpa"</span> <span class="o">.</span> <span class="s">"http://melpa.milkbox.net/packages/"</span><span class="p">)</span> <span class="no">t</span><span class="p">)</span>
<span class="p">(</span><span class="nv">package-initialize</span><span class="p">)</span>
<span class="p">(</span><span class="nb">unless</span> <span class="nv">package-archive-contents</span>
  <span class="p">(</span><span class="nv">package-refresh-contents</span><span class="p">))</span>
<span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">p</span> <span class="nv">my-packages</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nv">package-installed-p</span> <span class="nv">p</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">package-install</span> <span class="nv">p</span><span class="p">)))</span>
</code></pre></div></div>
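
<p>The snippet assumes <code class="language-plaintext highlighter-rouge">my-packages</code> has already been defined earlier
in the configuration. A minimal sketch of such a definition follows;
the package names are only an illustrative selection, not my actual
list.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Must be evaluated before the `dolist' form that installs packages.
(defvar my-packages
  '(memoize simple-httpd impatient-mode magit)
  "Packages to install at startup (an illustrative list).")
</code></pre></div></div>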

<p>Upgrading/updating is currently a manual process. Run
<code class="language-plaintext highlighter-rouge">package-refresh-contents</code>, list the packages with <code class="language-plaintext highlighter-rouge">list-packages</code>,
type <code class="language-plaintext highlighter-rouge">U</code> to mark updates, then <code class="language-plaintext highlighter-rouge">x</code> to e<code class="language-plaintext highlighter-rouge">x</code>ecute the upgrade. Sometime
I may work that into my configuration to be done automatically
once-per-week or something.</p>

<p>I really look forward to making more use of the package manager,
especially as packages can more easily become interdependent, reducing
duplication of effort.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Programmatically Setting Lisp Docstrings</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/02/"/>
    <id>urn:uuid:d35e27e8-212a-3d1c-5168-afcccc04bf76</id>
    <updated>2012-08-02T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I just updated my <a href="/blog/2010/07/26/">Elisp memoization function</a> so
that it’s no longer a dirty hack. To work around the lack of closures,
due to the lack of lexical scope in Elisp, the original version used
uninterned symbols to store the look-up table. The new version in the
post uses <code class="language-plaintext highlighter-rouge">lexical-let</code>, which does the same thing internally to fake
a closure. The new version in <a href="/blog/2011/10/19/">my dotfiles</a>
repository uses the brand new
<a href="http://www.gnu.org/software/emacs/NEWS.24.1">Emacs 24 lexical scoping</a>.</p>

<p>It was “dirty” because it built a lambda function out of a list at run
time, taking advantage of the way Elisp currently handles
functions. The reason for this was that I wanted to inject the
original documentation string into the new function which can’t
normally be done when <code class="language-plaintext highlighter-rouge">lambda</code> is used the correct way. When I updated
the function I fixed this as well. It uses a trick provided by Elisp,
which is different than the Common Lisp way that I assumed.</p>

<p>Both Elisp and Common Lisp have a <code class="language-plaintext highlighter-rouge">documentation</code> function for
programmatically accessing symbol documentation. The Elisp version
only provides <em>function</em> documentation, so it only requires one
argument.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="s">"Foo."</span>
  <span class="no">nil</span><span class="p">)</span>

<span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Foo."</span>
</code></pre></div></div>

<p>The Common Lisp version must be told what type of documentation to
return, such as <code class="language-plaintext highlighter-rouge">function</code> or <code class="language-plaintext highlighter-rouge">variable</code> (<code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defconstant</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span> <span class="ss">'function</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Foo."</span>
</code></pre></div></div>

<p>As might be expected, this is <code class="language-plaintext highlighter-rouge">setf</code>-able! It’s possible to update
or modify documentation strings without needing to redefine the
function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span> <span class="ss">'function</span><span class="p">)</span> <span class="s">"New doc string."</span><span class="p">)</span>
</code></pre></div></div>

<p>Unfortunately it’s not <code class="language-plaintext highlighter-rouge">setf</code>-able in Elisp. Instead you can set the
<code class="language-plaintext highlighter-rouge">function-documentation</code> <em>property</em> of the symbol. The <code class="language-plaintext highlighter-rouge">documentation</code>
function will prefer this over the string stored in the function
itself.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">put</span> <span class="ss">'foo</span> <span class="ss">'function-documentation</span> <span class="s">"Foo updated."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Foo updated."</span>
</code></pre></div></div>
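
<p>The same property can carry a docstring from one function over to
another, which is the situation a wrapper such as a memoizer is in.
The following is a minimal sketch with made-up function names, not the
code from my memoization function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-square (x)
  "Return X squared."
  (* x x))

;; A replacement definition that carries no docstring of its own.
(defalias 'my-fast-square (lambda (x) (* x x)))

;; Copy the original docstring onto the new symbol via its plist.
(put 'my-fast-square 'function-documentation
     (documentation 'my-square))

(documentation 'my-fast-square) ; returns "Return X squared."
</code></pre></div></div>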

<p>The downside is that this is a second place to put docstrings, leading
to surprising behavior for developers unaware of this hack.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">put</span> <span class="ss">'foo</span> <span class="ss">'function-documentation</span> <span class="s">"Old docstring."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="s">"New docstring."</span>
  <span class="no">nil</span><span class="p">)</span>

<span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Old docstring."</span>
</code></pre></div></div>

<p>This can be fixed by setting the symbol property for
<code class="language-plaintext highlighter-rouge">function-documentation</code> to <code class="language-plaintext highlighter-rouge">nil</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">put</span> <span class="ss">'foo</span> <span class="ss">'function-documentation</span> <span class="no">nil</span><span class="p">)</span>
</code></pre></div></div>

<p>I prefer the Common Lisp method.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Literal Arrays and Vectors in Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/07/17/"/>
    <id>urn:uuid:71e44f9e-2e92-30e9-8b29-8418229a7ce1</id>
    <updated>2012-07-17T00:00:00Z</updated>
    <category term="lisp"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Despite being a Lisper, and unlike <a href="http://www.50ply.com/">Brian</a>, I
haven’t gotten into Clojure yet. I’ve been following along at a safe
distance. Due to
<a href="http://www.50ply.com/blog/2012/07/06/asynchronous-sequential-code-shape/">a recent post of his</a>
I learned about a significant difference between Clojure and other
Lisps when it comes to arrays/vectors.</p>

<p>In this recent post, Brian wrote a ClojureScript <code class="language-plaintext highlighter-rouge">let</code>-like macro to
hide JavaScript asynchronous function chains so that they can be used
just like regular synchronous functions. Following Clojure’s style, the
asynchronous functions are written inside a vector rather than a list
to indicate to the macro that they’re special.</p>

<div class="language-clojure highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">doasync</span><span class="w">
  </span><span class="p">[</span><span class="n">text</span><span class="w"> </span><span class="p">[</span><span class="n">fetch</span><span class="w"> </span><span class="s">"/foo/json"</span><span class="p">]</span><span class="w">
   </span><span class="n">url</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">text</span><span class="w"> </span><span class="s">".html"</span><span class="p">)</span><span class="w">
   </span><span class="n">result</span><span class="w"> </span><span class="p">[</span><span class="n">fetch</span><span class="w"> </span><span class="n">url</span><span class="p">]</span><span class="w">
   </span><span class="n">_</span><span class="w"> </span><span class="p">(</span><span class="nf">.show</span><span class="w"> </span><span class="n">view</span><span class="w"> </span><span class="n">result</span><span class="p">)</span><span class="w">
   </span><span class="n">_</span><span class="w"> </span><span class="p">[</span><span class="n">timeout</span><span class="w"> </span><span class="mi">1000</span><span class="p">]</span><span class="w">
   </span><span class="n">_</span><span class="w"> </span><span class="p">(</span><span class="nf">.makeEditable</span><span class="w"> </span><span class="n">view</span><span class="p">)])</span><span class="w">
</span></code></pre></div></div>

<p>That sounded completely reasonable to me, since array literals are
rarely used inside Common Lisp code. When they are used, it’s as a
global constant.</p>

<p>A few days later when I was talking to Brian at the metaphorical water
cooler he mentioned that the macro was actually conflicting with what
he would normally write. Sometimes he really did want to use a vector
literal in a <code class="language-plaintext highlighter-rouge">let</code> binding. Why would he do that? In Common Lisp,
that’s just asking for trouble — same for Elisp and Scheme.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="o">#(</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">foo</span> <span class="nv">v</span><span class="p">))</span>
</code></pre></div></div>

<p>The reason why this is a bad idea is that the <em>same exact</em> array will
always be passed to <code class="language-plaintext highlighter-rouge">foo</code>. The array is created once at <em>read time</em> by
the reader and re-used for the life of that code. If anyone makes a
modification to the array it will damage the array for everyone using
it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="o">#(</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">))</span>
<span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">foo</span><span class="p">)</span> <span class="p">(</span><span class="nv">foo</span><span class="p">))</span>
<span class="nv">=&gt;</span> <span class="nv">T</span>
</code></pre></div></div>

<p>The safer method is to create a fresh array every time by <em>not</em> using
a literal but instead calling <code class="language-plaintext highlighter-rouge">vector</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="p">(</span><span class="nb">vector</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">foo</span> <span class="nv">v</span><span class="p">))</span>
</code></pre></div></div>
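
<p>The same distinction can be checked in Emacs Lisp, where the literal
syntax is <code class="language-plaintext highlighter-rouge">[1 2 3]</code> rather than <code class="language-plaintext highlighter-rouge">#(1 2 3)</code>. The function names in
this sketch are made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun literal-vec ()
  [1 2 3])          ; one vector, built once by the reader

(defun fresh-vec ()
  (vector 1 2 3))   ; a new vector on every call

(eq (literal-vec) (literal-vec))   ; t, the very same object
(eq (fresh-vec) (fresh-vec))       ; nil, two distinct objects
(equal (fresh-vec) (fresh-vec))    ; t, same contents
</code></pre></div></div>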

<p>Clojure data structures are immutable, including vectors, so using the
same exact vector in multiple places is safe. That makes using literal
vectors in code less awkward. But that still left a question hanging:
why was Brian using literal vectors so often that he needed one so
soon after writing this macro?</p>

<p>In Common Lisp, they’re not very useful because the elements are
never evaluated: the reader stores them as-is. When this vector is
evaluated the result is a vector where the second element is a list
containing three atoms.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">#(</span><span class="mi">1</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)</span> <span class="mi">4</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="o">#(</span><span class="mi">1</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)</span> <span class="mi">4</span><span class="p">)</span>
</code></pre></div></div>

<p>Evaluated arrays return themselves unchanged. To do most useful
things, a fresh vector needs to be constructed piecemeal. If somehow
the uniqueness of a literal array wasn’t an issue, they still couldn’t
be used for much.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="o">#(</span><span class="nv">x</span> <span class="nv">x</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="nv">foo</span> <span class="mi">10</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="o">#(</span><span class="nv">X</span> <span class="nv">X</span> <span class="nv">X</span><span class="p">)</span>
</code></pre></div></div>

<p>To achieve the desired effect, the <code class="language-plaintext highlighter-rouge">vector</code> function needs to be used
again. Because it’s a normal function call, the arguments are
evaluated.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">vector</span> <span class="nv">x</span> <span class="nv">x</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="nv">foo</span> <span class="mi">10</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="o">#(</span><span class="mi">10</span> <span class="mi">10</span> <span class="mi">10</span><span class="p">)</span>
</code></pre></div></div>

<p>However, to my surprise, Clojure doesn’t work like this! Literal
vectors have their elements evaluated and, if necessary, are created
fresh on every use — exactly like a call to <code class="language-plaintext highlighter-rouge">vector</code>.</p>

<div class="language-clojure highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="p">[</span><span class="n">x</span><span class="p">]</span><span class="w">
  </span><span class="p">[</span><span class="n">x</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="n">x</span><span class="p">])</span><span class="w">
</span><span class="p">(</span><span class="nf">foo</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w">
</span><span class="n">=&gt;</span><span class="w"> </span><span class="p">[</span><span class="mi">10</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="mi">10</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">identical?</span><span class="w"> </span><span class="p">(</span><span class="nf">foo</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">foo</span><span class="w"> </span><span class="mi">10</span><span class="p">))</span><span class="w">
</span><span class="n">=&gt;</span><span class="w"> </span><span class="n">false</span><span class="w">
</span></code></pre></div></div>

<p>If the exact form of the vector is needed unevaluated, it needs to be
quoted just like lists.</p>

<div class="language-clojure highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">foo</span><span class="w"> </span><span class="p">[</span><span class="n">x</span><span class="p">]</span><span class="w">
  </span><span class="o">'</span><span class="p">[</span><span class="n">x</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="n">x</span><span class="p">])</span><span class="w">
</span><span class="p">(</span><span class="nf">foo</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w">
</span><span class="n">=&gt;</span><span class="w"> </span><span class="p">[</span><span class="n">x</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="n">x</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">identical?</span><span class="w"> </span><span class="p">(</span><span class="nf">foo</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">foo</span><span class="w"> </span><span class="mi">10</span><span class="p">))</span><span class="w">
</span><span class="n">=&gt;</span><span class="w"> </span><span class="n">true</span><span class="w">
</span></code></pre></div></div>

<p>After further reflection, I now feel like this is the <em>right</em> way to
go about implementing vectors. When I was first learning Lisp the
non-evaluating nature of arrays really caught me by surprise. Vectors
should evaluate their elements by default; if the Common Lisp behavior
is needed it can always be quoted. It’s impossible to “fix” any
established Lisp of course, so I’m merely wishing this was the
behavior defined decades ago.</p>

<p>To recap: normally in Lisp, vectors evaluate to themselves, like
numbers and strings. Instead, evaluation of a vector should return a
<em>new</em> vector containing the result of evaluating each
evaluated. Since Clojure’s data structures are immutable, the compiler
can take a shortcut when it can guarantee each of a vector’s elements
always evaluate to themselves, and have the vector evaluate to itself
— purely as an optimization.</p>
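
<p>For comparison, Emacs Lisp vectors behave like the Common Lisp ones
described above. The following is a minimal sketch of that behavior, where
<code class="language-plaintext highlighter-rouge">sketch-literal</code> is a hypothetical name used only for
illustration. It shows the literal returning itself unevaluated and shared,
and two ways to get evaluated elements instead.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Emacs Lisp sketch; `sketch-literal' is a hypothetical name.
(defun sketch-literal ()
  "Return a literal vector; the same object comes back on every call."
  [1 (+ 2 3) 4])

(aref (sketch-literal) 1)               ; the unevaluated list (+ 2 3)
(eq (sketch-literal) (sketch-literal))  ; t, one shared object
(vector 1 (+ 2 3) 4)                    ; [1 5 4], fresh on every call
`[1 ,(+ 2 3) 4]                         ; [1 5 4], built with backquote
</code></pre></div></div>

<p>The backquote form is probably the closest Elisp gets to the Clojure
behavior, though each element still has to be unquoted by hand.</p>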
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Fake Emacs Namespaces</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/08/18/"/>
    <id>urn:uuid:f89408fd-9b2f-3110-af83-fe96f7c1e7f7</id>
    <updated>2011-08-18T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>Back in May I wrote a crude <code class="language-plaintext highlighter-rouge">defpackage</code> macro for Elisp, modeled
after Common Lisp’s version. I’m calling them fakespaces.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-fakespace">https://github.com/skeeto/elisp-fakespace</a></li>
</ul>

<p>It works like so (see <code class="language-plaintext highlighter-rouge">example.el</code> for detailed information on this
code),</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'fakespace</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defpackage</span> <span class="nv">example</span>
  <span class="p">(</span><span class="ss">:use</span> <span class="nv">cl</span> <span class="nv">ido</span><span class="p">)</span>
  <span class="p">(</span><span class="ss">:export</span> <span class="nv">example-main</span> <span class="nv">example-var</span> <span class="nv">eq-hello</span> <span class="nv">hello</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">my-var</span> <span class="mi">100</span>
  <span class="s">"A hidden variable."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">example-var</span> <span class="no">nil</span>
  <span class="s">"A public variable."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">my-func</span> <span class="p">()</span>
  <span class="s">"A private function."</span>
  <span class="nv">my-var</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">example-main</span> <span class="p">()</span>
  <span class="s">"An exported function. Notice we can access all the private
variables and functions from here."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">my-func</span><span class="p">)</span> <span class="nv">my-var</span><span class="p">)</span> <span class="nv">example-var</span>
        <span class="p">(</span><span class="nv">ido-completing-read</span> <span class="s">"New value: "</span> <span class="p">(</span><span class="nb">list</span> <span class="s">"foo"</span> <span class="s">"bar"</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">eq-hello</span> <span class="p">(</span><span class="nv">sym</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">sym</span> <span class="ss">'hello</span><span class="p">))</span>

<span class="p">(</span><span class="nv">end-package</span><span class="p">)</span>
</code></pre></div></div>

<p>Notice <code class="language-plaintext highlighter-rouge">end-package</code> at the end, which is not needed in Common
Lisp. That’s part of what makes it crude.</p>

<p>If you run those functions and try changing the assignment of
non-exported symbols, you’ll see the namespace separation in
action. <code class="language-plaintext highlighter-rouge">my-var</code> and <code class="language-plaintext highlighter-rouge">my-func</code> are completely different symbols than
the ones you’re seeing after <code class="language-plaintext highlighter-rouge">end-package</code>.</p>

<p>It’s really simple in how it works (it’s 40 lines of code). The
<code class="language-plaintext highlighter-rouge">defpackage</code> macro takes a snapshot of the symbol table. Then new
symbols get interned through various function and variable
definitions. Finally <code class="language-plaintext highlighter-rouge">end-package</code> compares the current symbol table
to the snapshot and uninterns any new symbols. These symbols will be
inaccessible to other code, effectively giving them their own
namespace.</p>
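
<p>The snapshot-and-unintern idea can be sketched in a few lines. This is
not the code from the fakespace repository: the names
<code class="language-plaintext highlighter-rouge">sketch-snapshot</code> and <code class="language-plaintext highlighter-rouge">sketch-unintern-new</code> are
hypothetical, and there is no stack of snapshots here, just the core
comparison.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helpers, not the actual fakespace implementation.
(defun sketch-snapshot ()
  "Return a list of every symbol currently interned in `obarray'."
  (let (syms)
    (mapatoms (lambda (s) (push s syms)))
    syms))

(defun sketch-unintern-new (snapshot)
  "Unintern every symbol that was interned after SNAPSHOT was taken."
  (let ((old (make-hash-table :test 'eq)))
    ;; Index the snapshot so each membership check is cheap.
    (dolist (s snapshot)
      (puthash s t old))
    (dolist (s (sketch-snapshot))
      (unless (gethash s old)
        (unintern s obarray)))))
</code></pre></div></div>

<p>Code that already holds a reference to an uninterned symbol keeps
working. Only a later <code class="language-plaintext highlighter-rouge">read</code> or <code class="language-plaintext highlighter-rouge">intern</code> of the same name
produces a different symbol.</p>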

<p>Snapshots are pushed onto a stack, so it’s safe to create a new
package within another package, as long as <code class="language-plaintext highlighter-rouge">end-package</code> is used
properly. This is necessary when one namespaced package depends on
another, because the dependency will tend to be loaded in the middle
of defining the current package.</p>

<p><code class="language-plaintext highlighter-rouge">in-package</code> is not provided, so there’s no way to get the symbols
back to where they can be accessed. It’s impossible to modify a
package using fake namespacing. Worst of all, implementing
<code class="language-plaintext highlighter-rouge">in-package</code> is currently (and will likely always be) impossible. When
symbols are uninterned they would need to be stored in a package
symbol table for future re-interning. <code class="language-plaintext highlighter-rouge">in-package</code>’s job would be to
unintern and store away the current package’s symbols and then place
the new package’s symbols into the main symbol table.</p>

<p>However, symbols cannot be re-interned. This is because it’s
impossible for a symbol to exist in two different obarrays at the same
time, so the functionality is intentionally not provided. An obarray
is an Elisp vector containing symbols. It’s treated like a hash table:
the symbol is hashed to choose a location in the vector. If the slot
is already taken, the symbol is invisibly chained behind the residing
symbol by an inaccessible linked list. If the symbol was in two
obarrays at once, it would need to be able to chain to two different
symbols at the same time.</p>
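
<p>A small experiment shows that a name lives in one obarray at a time. It
assumes a reasonably recent Emacs that provides
<code class="language-plaintext highlighter-rouge">obarray-make</code>, and the symbol name is made up for the
demonstration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'obarray)

;; The same name interned in two obarrays yields two distinct symbols.
(let* ((private (obarray-make))
       (a (intern "sketch-demo-symbol" private))
       (b (intern "sketch-demo-symbol")))   ; the global obarray
  (list (eq a b)
        (eq a (intern-soft "sketch-demo-symbol" private))))
;; =&gt; (nil t)
</code></pre></div></div>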

<p>Providing access to symbols through a colon-specified namespace
(<code class="language-plaintext highlighter-rouge">my-package:my-symbol</code>) is also currently impossible — without
hacking in C anyway.</p>

<p>There’s a neat trick to the <code class="language-plaintext highlighter-rouge">:export</code> list. The <code class="language-plaintext highlighter-rouge">defpackage</code> macro
definition actually ignores that list altogether, because it works
automatically. By the time <code class="language-plaintext highlighter-rouge">defpackage</code> is invoked, the listed symbols
have already been interned by the reader, so they get stored in the
snapshot.</p>

<p>I doubt I’ll ever make use of this for my own packages. This was
mostly a fun exercise in toying with Elisp.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elisp Function Composition</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/11/15/"/>
    <id>urn:uuid:86809c4e-9f00-396d-71ab-b48d5950a343</id>
    <updated>2010-11-15T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<!-- 15 November 2010 -->
<p>
During my recent Elisp hacking I've wanted function composition often
enough that I finally implemented it for myself. While there is
an <a href="/blog/2010/09/29/"> <code>apply-partially</code></a>
function, Elisp does not currently come with a <code>compose</code>
function. Here's an Elisp definition,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: f0c736a9-afec-3e3f-455c-40997023e130</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">compose</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">funs</span><span class="p">)</span>
  <span class="s">"Return function composed of FUNS."</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">lex-funs</span> <span class="nv">funs</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">reduce</span> <span class="ss">'funcall</span> <span class="p">(</span><span class="nb">butlast</span> <span class="nv">lex-funs</span><span class="p">)</span>
              <span class="ss">:from-end</span> <span class="no">t</span>
              <span class="ss">:initial-value</span> <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">last</span> <span class="nv">lex-funs</span><span class="p">))</span> <span class="nv">args</span><span class="p">)))))</span></code></pre></figure>
<p>
Here it is in action with three functions.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">compose</span> <span class="ss">'prin1-to-string</span> <span class="ss">'random*</span> <span class="ss">'exp</span><span class="p">)</span> <span class="mi">10</span><span class="p">)</span></code></pre></figure>
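<p>
Because that example ends in a random number, a deterministic trace makes
the order of application easier to check. The sketch below is a variant,
not the definition above: the names <code>sketch-compose</code>
and <code>sketch-compose--call</code> are hypothetical, and it captures the
function list with <code>apply-partially</code> instead
of <code>lexical-let</code>. It assumes at least one function is given.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(require 'cl-lib)

(defun sketch-compose--call (funs &amp;rest args)
  "Apply the last of FUNS to ARGS, then feed the result through the rest."
  (cl-reduce #'funcall (butlast funs)
             :from-end t
             :initial-value (apply (car (last funs)) args)))

(defun sketch-compose (&amp;rest funs)
  "Return the composition of FUNS; the rightmost function runs first."
  (apply-partially #'sketch-compose--call funs))

;; (* 2 3) is 6, then 1+ gives 7, then number-to-string gives "7".
(funcall (sketch-compose #'number-to-string #'1+ #'*) 2 3)
;; =&gt; "7"</code></pre></figure>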
<p>
I'll be using this in later posts (and linking back here when I do).
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Find All Files</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/09/30/"/>
    <id>urn:uuid:6b914b5a-f8f8-3d5d-469d-5d2c25b909c8</id>
    <updated>2010-09-30T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 30 September 2010 -->
<p>
Here's another bit of code I started using recently. I often find
myself wanting to open — or reopen
after <code>kill-matching-buffers</code> — all the files under a
specific point in the file system. I'm using it at work now to open up
all the source files in a deep Java source tree on a small-ish
project. Once it's all open I can switch to any file quickly
with <a href="http://www.emacswiki.org/emacs/InteractivelyDoThings">
ido's fuzzy matching</a>, flattening out the directory structure a
bit. (And the ridiculous "security" software at work imposes a
3-second I/O block when opening files, so I get to pay this all up
front at once rather than having it later
<a href="http://c2.com/cgi/wiki?MentalStateCalledFlow"> break my
flow</a>.)
</p>
<p>
This just recursively travels down the sub-directories opening a
buffer for everything it comes across. It ignores dot-files, like the
ones your source control might litter.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: 72dc0a9e-c41c-31f8-c8f5-d9db8482de1e</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">find-all-files</span> <span class="p">(</span><span class="nv">dir</span><span class="p">)</span>
  <span class="s">"Open all files and sub-directories below the given directory."</span>
  <span class="p">(</span><span class="nv">interactive</span> <span class="s">"DBase directory: "</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nb">list</span> <span class="p">(</span><span class="nv">directory-files</span> <span class="nv">dir</span> <span class="no">t</span> <span class="s">"^[^.]"</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">files</span> <span class="p">(</span><span class="nb">remove-if</span> <span class="ss">'file-directory-p</span> <span class="nb">list</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">dirs</span> <span class="p">(</span><span class="nb">remove-if-not</span> <span class="ss">'file-directory-p</span> <span class="nb">list</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">file</span> <span class="nv">files</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">find-file-noselect</span> <span class="nv">file</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">dir</span> <span class="nv">dirs</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">find-file-noselect</span> <span class="nv">dir</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">find-all-files</span> <span class="nv">dir</span><span class="p">))))</span></code></pre></figure>
<p>
One caveat: if you have a symbolic link that creates a file system
loop, this will probably get hung on it.
</p>
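<p>
One possible way around that caveat is to never descend into a directory
that is itself a symbolic link. This is only a sketch under that
assumption, with the hypothetical name <code>sketch-find-all-files</code>.
Unlike the function above, it does not open a buffer for each directory.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun sketch-find-all-files (dir)
  "Open every non-dot file below DIR, skipping symlinked directories."
  (dolist (path (directory-files dir t "^[^.]"))
    (cond ((not (file-directory-p path))
           ;; Regular file, or a symlink that points at one.
           (find-file-noselect path))
          ((not (file-symlink-p path))
           ;; A real directory: recurse. Symlinked ones are ignored.
           (sketch-find-all-files path)))))</code></pre></figure>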
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Higher-order Conversion to Interactive</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/09/29/"/>
    <id>urn:uuid:0677cf0d-4ba9-300f-3bf0-795a821b1287</id>
    <updated>2010-09-29T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 29 September 2010 -->
<p>
For those not familiar with extending Emacs, when you create a
function in Elisp it cannot be called directly by the user
("interactively") without declaring the function interactive. The
simplest way to do this is by adding <code>(interactive)</code> to the
top of the function definition. The <code>interactive</code> call can
be made more complex, if needed, to ask the user interactively for
input.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">hello-world</span> <span class="p">()</span>
  <span class="s">"Example function."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"hello"</span><span class="p">))</span></code></pre></figure>
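<p>
As a sketch of the more complex form, an <code>interactive</code> spec
string can prompt for an argument. The function
name <code>sketch-greet</code> is hypothetical; the <code>s</code> code
letter reads a string from the minibuffer.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun sketch-greet (name)
  "Greet NAME in the echo area."
  ;; "s" prompts for a string; the text after it is the prompt.
  (interactive "sName: ")
  (message "hello, %s" name))</code></pre></figure>
<p>
Called through <code>M-x</code> it asks for the name. Called from Lisp, the
argument is passed directly and no prompt appears.
</p>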
<p>
There are some handy higher-order functions in Elisp, such
as <code>compose</code> and <code>apply-partially</code>. Today I
wanted to bind the output of <code>apply-partially</code> to a key. My
situation was this: I use <code>revert-buffer</code> often enough that
it needs a binding. Also because I use it so much, I wanted it to
stop asking me for confirmation. (Yes,
there <a href="http://www.emacswiki.org/emacs/YesOrNoP"> are other
ways to do this</a> including <code>revert-without-query</code>, but I
wanted a general solution.) Using <code>apply-partially</code> I could
supply the needed function arguments at keybind time.
</p>
<p>
The problem is that you can only bind interactive functions, and the
output of <code>apply-partially</code> is not interactive. A quick way
to work around this is to wrap it in an anonymous function, which also
takes away the need for <code>apply-partially</code>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span> <span class="p">(</span><span class="nv">revert-buffer</span> <span class="no">nil</span> <span class="no">t</span><span class="p">))</span></code></pre></figure>
<p>
I'd rather there be <i>another</i> higher-order function that takes a
non-interactive function and creates an interactive version. Here it is,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: c7db6dec-e7ab-3b0f-bf26-0fa268674c6c</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">expose</span> <span class="p">(</span><span class="k">function</span><span class="p">)</span>
  <span class="s">"Return an interactive version of FUNCTION."</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">lex-func</span> <span class="k">function</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
      <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">funcall</span> <span class="nv">lex-func</span><span class="p">))))</span></code></pre></figure>
<p>
Now the binding looks like this,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">global-set-key</span> <span class="nv">[f2]</span> <span class="p">(</span><span class="nv">expose</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="ss">'revert-buffer</span> <span class="no">nil</span> <span class="no">t</span><span class="p">)))</span></code></pre></figure>
<p>
I think this more clearly expresses my intention than
the <code>lambda</code> wrapper would. Maybe?
</p>
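<p>
The difference between the two kinds of function can be checked
with <code>commandp</code>. The sketch below uses a hypothetical
variant, <code>sketch-expose</code>, that builds the wrapper with backquote
so it does not depend on <code>lexical-let</code>, and it
uses <code>+</code> as a harmless stand-in
for <code>revert-buffer</code>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun sketch-expose (function)
  "Return an interactive wrapper around FUNCTION, built with backquote."
  `(lambda () (interactive) (funcall ',function)))

(let ((partial (apply-partially #'+ 1 2)))
  (list (commandp partial)                   ; not bindable as-is
        (commandp (sketch-expose partial))   ; bindable once wrapped
        (funcall (sketch-expose partial))))  ; still calls through
;; =&gt; (nil t 3)</code></pre></figure>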
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Distributed Computing with Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/08/07/"/>
    <id>urn:uuid:9bc67be6-abda-37cd-d34f-ef0e4622358c</id>
    <updated>2010-08-07T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 7 August 2010 -->
<p>
I got an Elisp idea today and even went as far as implementing a proof
of concept for it: distributed computing with Emacs Lisp. As usual for
me the idea takes advantage of Lisp features to make the task pretty
simple, very specifically Elisp's implementation. In this case it's
the Lisp reader, printer, and the fact that Elisp functions have a
printed representation, both byte-compiled and not.
</p>
<p>
Here's the proof of concept code: <a href="/download/dist-emacs.el">dist-emacs.el</a>
</p>
<p>
A central server listens for TCP connections. Clients offering their
CPU for use connect to the server and await instructions. The server
sends a single, no-argument, anonymous function to the client. The
client calls the function, returning the resulting form back to the
server. In order to transmit the function it's encoded into a string
using the Lisp printer, and the client turns it back into an
executable function with the Lisp reader.
</p>
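<p>
The printer and reader round trip can be tried without any network code.
This sketch stands in for the transmission step only, and the variable names
are illustrative.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(let* ((job '(lambda () (* 6 7)))      ; what the server wants run
       (wire (prin1-to-string job))    ; the text that would be sent
       (received (read wire)))         ; what the client reconstructs
  (list wire (funcall received)))
;; =&gt; ("(lambda nil (* 6 7))" 42)</code></pre></figure>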
<p>
For some simple security there is a shared password between the client
and server. When the server sends a function it includes a signature,
and the client only runs code that matches the signature. To create a
signature the string-encoded version of the function is concatenated with
the password (both strings) and hashed with a secure hashing
algorithm. Only someone who knows the password — including other
clients — can create the signature.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">sign-sexp</span> <span class="p">(</span><span class="nv">password</span> <span class="nv">sexp</span><span class="p">)</span>
  <span class="s">"Return signature of the given s-exp."</span>
  <span class="p">(</span><span class="nv">sha1</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%s%s"</span> <span class="nv">password</span> <span class="nv">sexp</span><span class="p">)))</span></code></pre></figure>
<p>
To make it easy for the client to read in both the signature and the
function we just cons them together before encoding them as text.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">encode</span> <span class="p">(</span><span class="nv">password</span> <span class="nv">sexp</span><span class="p">)</span>
  <span class="s">"Encode a s-exp for transmission to client."</span>
  <span class="p">(</span><span class="nb">prin1-to-string</span> <span class="p">(</span><span class="nb">cons</span> <span class="p">(</span><span class="nv">sign-sexp</span> <span class="nv">password</span> <span class="nv">sexp</span><span class="p">)</span> <span class="nv">sexp</span><span class="p">)))</span></code></pre></figure>
<p>
The client calls the Lisp reader on the string, then checks the
signature in the <code>car</code> cell against the s-expression in
the <code>cdr</code> cell. This will return the function if it's
legitimate, otherwise <code>nil</code>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">decode</span> <span class="p">(</span><span class="nv">password</span> <span class="nv">str</span><span class="p">)</span>
  <span class="s">"Decode string into s-exp, checking the signature in the process."</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nb">cons</span> <span class="p">(</span><span class="nb">read</span> <span class="nv">str</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">sig</span>  <span class="p">(</span><span class="nb">car</span> <span class="nb">cons</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">sexp</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nb">cons</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">equal</span> <span class="nv">sig</span> <span class="p">(</span><span class="nv">sign-sexp</span> <span class="nv">password</span> <span class="nv">sexp</span><span class="p">))</span>
        <span class="nv">sexp</span>
      <span class="no">nil</span><span class="p">)))</span></code></pre></figure>
<p>
And that's the core of it. It just needs some network code to move the
string between computers. That part can be found in the linked source
above.
</p>
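<p>
The sign, encode, and decode steps can be exercised on their own. The sketch
below restates them under hypothetical <code>sketch-</code> names so it is
self-contained. One assumption differs from the code above: it hashes
the <code>prin1</code> form of the s-expression so that strings keep their
quotes.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun sketch-sign (password sexp)
  "Return a SHA-1 signature of SEXP keyed by PASSWORD."
  (sha1 (concat password (prin1-to-string sexp))))

(defun sketch-encode (password sexp)
  "Return SEXP as text, paired with its signature."
  (prin1-to-string (cons (sketch-sign password sexp) sexp)))

(defun sketch-decode (password str)
  "Return the s-exp in STR if its signature matches, otherwise nil."
  (let* ((pair (read str))
         (sexp (cdr pair)))
    (when (equal (car pair) (sketch-sign password sexp))
      sexp)))

(let ((msg (sketch-encode "secret" '(lambda () (+ 1 2)))))
  (list (equal (sketch-decode "secret" msg) '(lambda () (+ 1 2)))
        (sketch-decode "wrong" msg)))
;; =&gt; (t nil)</code></pre></figure>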
<p>
To demo this, I'll use the <code>whiten</code> function from
my <a href="/blog/2010/07/26/">previous post</a>. I'll run it with
three different strings on three different computers. Assume we
started the dist-emacs server (<code>dist-start</code>) and connected
three clients (<code>dist-connect</code>) from three computers to
it. The clients were fired up from scratch so there's
no <code>whiten</code> function on them yet, but there <i>is</i> one
defined on the server. First we'll send the function definition to the
clients. The <code>dist-dist</code> function takes a list of functions
and passes each one to a client. Ideally I'd want this function to be
more intelligent, managing a work queue so that an arbitrary length
list of functions will be fed one at a time to each client. That's not
the case here.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">dist-dist</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">p</span><span class="p">)</span>
                          <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
                             <span class="p">(</span><span class="nv">fset</span> <span class="ss">'whiten</span> <span class="o">,</span><span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'whiten</span><span class="p">))))</span>
                      <span class="nv">dist-clients</span><span class="p">))</span></code></pre></figure>
<p>
Also like in the previous post, this is an abstraction leak with the
Emacs implementation. But I like this trick so I'm going to use it
anyway. :-) Next we call it on each client with a different string.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">dist-dist</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">whiten</span> <span class="s">"good"</span><span class="p">))</span>
                 <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">whiten</span> <span class="s">"news"</span><span class="p">))</span>
                 <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">whiten</span> <span class="s">"everyone"</span><span class="p">))))</span></code></pre></figure>
<p>
The way I have it set up for my proof of concept the results are just
spit back into the server's <code>*Messages*</code> buffer. If we
watch that buffer we can see each result come back one at a time
as each machine finishes. I can watch Emacs saturate the CPU on every
client machine simultaneously as it works.
</p>
<pre>
"2577343027adf7817185db876032d8ed"
"46a65dac2c0040afde175adf1e9a81fd"
"f39baf9e74475dd5be7d5495a025fe84"
</pre>
<p>
This isn't the same order as the clients, but the order in which the
jobs were completed.
</p>
<p>
As for practicality, I doubt there really is any. It's really only
a neat concept (or maybe not even that). For almost the exact same
reasons as my <a href="/blog/2009/06/09/">distributed JavaScript</a>
idea, this is a solution looking for a problem. The problem needs to
be able to be broken into small computation units, because Emacs has
no threading, and it has to be low bandwidth, because it has to be
parsed all at once from a string. If you want to pass large data sets
it needs to be done out-of-band, which probably defeats the
purpose. There seem to be few to no problems that fit these
limitations.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Memoize</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/07/26/"/>
    <id>urn:uuid:637ae303-cbcd-3817-8917-26e1640944f5</id>
    <updated>2010-07-26T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 26 July 2010 -->
<p>
<a href="/blog/2008/03/25/">Memoization</a> is something I think
should be packaged as a standard function for just about every
language. That's not generally the case, but luckily this is easy to
fix in Lisps. I needed memoization recently for an Elisp project I'm
working on. I could have hand-written one but a generic memoization
function would have worked just fine. Since I didn't find any generic
Elisp memoization on-line I wrote my own.
</p>
<p>
<b>Download:
  <a href="https://raw.github.com/skeeto/emacs-memoize/master/memoize.el">
    memoize.el
  </a>
</b>
</p>
<p>
Just put it somewhere in your <code>load-path</code>
and <code>(require 'memoize)</code> it. Here's the core
function.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: 83bae208-da65-3e26-2ecb-4941fb310848</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">memoize-wrap</span> <span class="p">(</span><span class="nv">func</span><span class="p">)</span>
  <span class="s">"Return the memoized version of FUNC."</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">table</span> <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:test</span> <span class="ss">'equal</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
      <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">value</span> <span class="p">(</span><span class="nb">gethash</span> <span class="nv">args</span> <span class="nv">table</span><span class="p">)))</span>
        <span class="p">(</span><span class="k">if</span> <span class="nv">value</span>
            <span class="nv">value</span>
          <span class="p">(</span><span class="nv">puthash</span> <span class="nv">args</span> <span class="p">(</span><span class="nb">apply</span> <span class="nv">func</span> <span class="nv">args</span><span class="p">)</span> <span class="nv">table</span><span class="p">))))))</span></code></pre></figure>

<p>
The hash table is stored inside the fake closure provided by
<code>lexical-let</code>. In a previous version of this function, I
stored it in an uninterned symbol, which is what is going on behind
the scenes of <code>lexical-let</code>.
</p>
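<p>
For the curious, the uninterned-symbol approach can be sketched roughly
like this. It is an illustration only, not the earlier version of
memoize.el, and <code>my-memoize-wrap-symbol</code> is a made-up name.
The table lives in the value cell of a symbol that nothing else can
reach by name.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-memoize-wrap-symbol (func)
  "Return a memoized FUNC, keeping the table in an uninterned symbol."
  (let ((table (make-symbol "table")))
    ;; The symbol is reachable only through the lambda built below.
    (set table (make-hash-table :test 'equal))
    `(lambda (&amp;rest args)
       (let ((value (gethash args (symbol-value ',table))))
         (if value
             value
           (puthash args (apply ',func args) (symbol-value ',table)))))))</code></pre></figure>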
<p>
Note that in the full code it keeps the original function
documentation intact. I want the memoization wrapper to be as
unobtrusive as possible.
</p>
<p>
Here's a demo of it in action. This <code>whiten</code> function is
computationally expensive: it performs key whitening. It repeats a
hash function thousands of times to produce an expensive value. This
isn't something you generally want to memoize, but stick with me.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">whiten</span> <span class="p">(</span><span class="nv">key</span><span class="p">)</span>
  <span class="s">"Perform key whitening with the md5 hash function."</span>
  <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">100000</span> <span class="nv">key</span><span class="p">)</span>
    <span class="p">(</span><span class="k">setq</span> <span class="nv">key</span> <span class="p">(</span><span class="nv">md5</span> <span class="nv">key</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">whiten</span> <span class="s">"password"</span><span class="p">)</span>   <span class="c1">; takes a couple of seconds</span></code></pre></figure>

<p>
On my laptop that takes a couple of seconds to run. Increase that
counter if it's quick on your computer. My memoize package provides
a <code>memoize</code> function which will create a new function that
wraps the original, then installs the new function in place of the old
one if we give it the function symbol.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">memoize</span> <span class="ss">'whiten</span><span class="p">)</span></code></pre></figure>

<p>
The first time you run it after memoization it will be slow, but after
that the memoization kicks in for a quick return.
</p>
<p>
There are two Elisp specific issues at hand. First is that memoizing
an interactive function will produce a non-interactive function. It
would be easy to fix this problem when it comes to non-byte-compiled
functions, but recovering the interactive definition from a
byte-compiled function is more complex than I care to deal
with. Besides, interactive functions are always used for their side
effects so there's no reason to memoize them.
</p>
<p>
Second is a limitation of how the wrapper uses the hash table. By
default <code>gethash</code> returns nil both for a stored nil value
and for a missing key, and the wrapper does not tell the two apart.
This means nil returns are never cached. But a computationally
expensive function shouldn't be returning nil anyway.
</p>
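<p>
If caching nil ever matters, one way around it is the optional default
argument to <code>gethash</code> combined with a unique sentinel. This
is only a sketch, not what memoize.el does: it assumes an Emacs with
lexical binding (Emacs 24 or later), and
<code>my-memoize-wrap</code> is a made-up name.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">;;; -*- lexical-binding: t; -*-
;; Assumption: this file is loaded with lexical binding enabled.

(defun my-memoize-wrap (func)
  "Return a memoized version of FUNC that also caches nil results."
  (let ((table (make-hash-table :test 'equal))
        ;; A fresh uninterned symbol can never be a cached value.
        (missing (make-symbol "missing")))
    (lambda (&amp;rest args)
      (let ((value (gethash args table missing)))
        (if (eq value missing)
            (puthash args (apply func args) table)
          value)))))

;; The wrapped function runs once even though it returns nil.
(let* ((calls 0)
       (f (my-memoize-wrap (lambda (_x) (setq calls (1+ calls)) nil))))
  (funcall f 1)
  (funcall f 1)
  calls)
;; =&gt; 1</code></pre></figure>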
<p>
<i>Update</i>: As of August 2012, several other people and I have
gotten good mileage out of this function! It's an essential part of my
Emacs dotfiles.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs ParEdit and IELM</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/06/10/"/>
    <id>urn:uuid:2fdb7ef9-e9c6-379c-073f-ab33fc8f5875</id>
    <updated>2010-06-10T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 10 June 2010 -->
<p>
<a href="http://www.emacswiki.org/emacs/ParEdit">ParEdit</a> is a
powerful extension to Emacs that I've just begun using recently. It's
a minor mode that forces all parentheses, square brackets, and quotes
to be balanced at all times. While it's useful for any programming
language it's especially suited for Lisps, because it's designed for
manipulating nested parentheses, i.e. s-expressions. It's not
currently part of Emacs so you have to drop the script in
your <code>load-path</code> somewhere.
</p>
<p>
I've frequently thought that a Lisp-based shell would be an
interesting and powerful tool, much like a normal Lisp REPL. Programs
would be treated like Lisp functions. For example,
</p>
<pre>
wellons@luna:~$ (ls -l .emacs)
-rw------- 1 wellons wellons 4859 2010-06-10 23:20 .emacs
wellons@luna:~$
</pre>
<p>
But typing all those parentheses all the time would be quite the
nuisance. I know this from experience typing at Lisp REPLs. I imagined
something that works exactly like ParEdit would be needed to make all
that work go away. To save even more time each prompt would begin with
a nested pair, with the cursor placed between them. Then typing a
quick command is no different than a normal shell.
</p>
<pre>
wellons@luna:~$ ()
</pre>
<p>
Well, in Emacs we have both ParEdit and REPLs, so we can compose these
features together with just a little advice. Here's how to do it with
the Interactive Emacs-Lisp Mode (IELM) REPL. First tell IELM to use
ParEdit,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'ielm-mode-hook</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">paredit-mode</span> <span class="mi">1</span><span class="p">)))</span></code></pre></figure>
<p>
The function in IELM that spits out the next prompt
is <code>ielm-eval-input</code>, so we give it the advice to call the
ParEdit function afterwards to insert a parenthesis pair.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">ielm-eval-input</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">ielm-paredit</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="s">"Begin each IELM prompt with a ParEdit parenthesis pair."</span>
  <span class="p">(</span><span class="nv">paredit-open-round</span><span class="p">))</span></code></pre></figure>
<p>
And that's it! Note that the first IELM prompt is not placed by this
function, so the pair won't appear until the second prompt.
</p>
<pre>
*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP>
ELISP> ()
</pre>
<p>
If you want to enter a single atom and don't need parentheses, just
hit backspace once. This is much less common so it gets the extra
keystroke.
</p>
<p>
This can be done for <code>inferior-lisp</code>
and <a href="/blog/2010/01/15/">SLIME</a> to enhance those REPLs as
well. You just have to figure out which defun to advise.
</p>
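<p>
On Emacs 24.4 and later the same advice can also be written with
<code>advice-add</code>. This is a sketch that assumes ParEdit is
already loaded; <code>my-ielm-paredit-open</code> is a made-up name.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-ielm-paredit-open (&amp;rest _args)
  "Insert a balanced parenthesis pair at the fresh IELM prompt."
  ;; The advised function's arguments are ignored.
  (paredit-open-round))

;; Run the function above after every call to `ielm-eval-input'.
(advice-add 'ielm-eval-input :after #'my-ielm-paredit-open)</code></pre></figure>
<p>
A named function makes the advice easy to undo
with <code>(advice-remove 'ielm-eval-input #'my-ielm-paredit-open)</code>.
</p>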
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Web Servlets</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/11/03/"/>
    <id>urn:uuid:b0a4e98c-4cf7-3c5b-6425-ca437c1ca4ee</id>
    <updated>2009-11-03T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 3 November 2009 -->
<p>
Remember that <a href="/blog/2009/05/17/">Emacs web server I wrote</a>
back in May? Well, I got an e-mail last night from Chunye Wang
containing a patch with a variant of my dynamic lisp idea, called
"servlets" (not to be confused with Java servlets). Chunye had similar
ideas for an Emacs web server for a long time, but never implemented
them because Emacs lacked network functionality until recently
(Specifically, <a href="http://www.gnu.org/software/emacs/NEWS.22.1">
<code>make-network-process</code></a> in Emacs 22.1, June 2007). This
led Chunye to find my implementation.
</p>
<p>
Again, you can clone/view the code here. I turned the patch into a
series of commits,
</p>
<pre>
git clone <a href="https://github.com/skeeto/emacs-http-server">git://github.com/skeeto/emacs-http-server.git</a>
</pre>
<p>
This is some cool stuff here.
</p>
<p>
The servlets are simply functions installed under an
"<code>httpd/</code>" namespace, where the trailing slash represents
the server root. So, the function <code>httpd/example-servlet</code>
will be executed when "/example-servlet" is requested from the
server. The servlet runs on a temporary buffer, whose contents are
served when the servlet function returns.
</p>
<p>
To assist in HTML generation, Chunye also wrote a function to turn an
<a href="http://en.wikipedia.org/wiki/S-expression">S-expression</a>
into HTML, similar to the one I described in the previous web server
post. Symbols are converted into strings, alists become attributes, and
the <code>elisp</code> symbol marks code to be executed, whose
results are used to generate HTML. For a simple hello world,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">html</span> <span class="p">(</span><span class="nv">head</span> <span class="p">(</span><span class="nv">title</span> <span class="s">"hello world"</span><span class="p">))</span> <span class="p">(</span><span class="nv">body</span> <span class="s">"hello world"</span><span class="p">))</span></code></pre></figure>
<p>
And for some dynamic content, a die roller,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">httpd/roll-die</span> <span class="p">(</span><span class="nv">uri-query</span> <span class="nv">req</span> <span class="nv">uri-path</span><span class="p">)</span>
  <span class="s">"Rolls a die with the requested number of sides (default 6)."</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">sides</span>
         <span class="p">(</span><span class="nv">string-to-number</span> <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">cadr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="s">"sides"</span> <span class="nv">uri-query</span><span class="p">))</span> <span class="s">"6"</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">httpd-generate-html</span>
     <span class="o">'</span><span class="p">(</span><span class="nv">html</span>
       <span class="p">(</span><span class="nv">head</span>
        <span class="p">(</span><span class="nv">title</span> <span class="s">"Die Roll Servlet"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">body</span>
        <span class="p">(</span><span class="nv">h1</span> <span class="s">"Die Roll Servlet"</span><span class="p">)</span>
        <span class="s">"You rolled a "</span>
        <span class="p">(</span><span class="nv">b</span>
         <span class="p">(</span><span class="nv">elisp</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">number-to-string</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">random</span> <span class="nv">sides</span><span class="p">)))))))))))</span></code></pre></figure>
<p>
That one would be accessed from the browser with
"<code>/roll-die</code>" or "<code>/roll-die?sides=100</code>".
</p>
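<p>
To give a feel for the S-expression to HTML conversion, here is a
minimal sketch. It is not Chunye's function: the name
<code>my-sexp-to-html</code> is made up, it only handles tags and
strings, and it skips attributes, the <code>elisp</code> form, and
escaping of special characters.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-sexp-to-html (sexp)
  "Return SEXP rendered as an HTML string.
Strings pass through unchanged. A list (TAG CHILDREN...) becomes an
element named TAG that wraps the rendered CHILDREN."
  (cond
   ((null sexp) "")
   ((stringp sexp) sexp)
   ((symbolp sexp) (symbol-name sexp))
   ((consp sexp)
    (let ((tag (symbol-name (car sexp))))
      (concat "&lt;" tag "&gt;"
              (mapconcat #'my-sexp-to-html (cdr sexp) "")
              "&lt;/" tag "&gt;")))
   (t (format "%s" sexp))))

(my-sexp-to-html '(html (head (title "hello world")) (body "hello world")))
;; =&gt; "&lt;html&gt;&lt;head&gt;&lt;title&gt;hello world&lt;/title&gt;&lt;/head&gt;&lt;body&gt;hello world&lt;/body&gt;&lt;/html&gt;"</code></pre></figure>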
<p>
Chunye provided some sample servlets that list the buffers, with links
that serve them up. There is also another servlet that will switch the
current buffer, which I find compelling. All of Emacs' functionality
is available to the servlet.
</p>
<p>
Now, to write a servlet that runs the Emacs psychiatrist ...
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Lisp Fantasy Name Generator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/07/03/"/>
    <id>urn:uuid:492b2b80-c4ba-3655-9811-1b183558d806</id>
    <updated>2009-07-03T00:00:00Z</updated>
    <category term="elisp"/><category term="lisp"/><category term="game"/>
    <content type="html">
      <![CDATA[<!-- 3 July 2009 -->
<p>
Earlier this year <a href="/blog/2009/01/04">I implemented the
RinkWorks fantasy name generator in Perl</a>. I think Lisp lends
itself even better to the task, and so I have a partial elisp
implementation for you.
</p>
<p>
What stands out for me is that the patterns can easily be represented
as an S-expression. We represent substitutions with symbols, literals
with strings, and groups with lists. For example, this pattern,
</p>
<pre>
s(ith|&lt;'C&gt;)V
</pre>
<p>
can be represented in code as,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">s</span> <span class="p">(</span><span class="s">"ith"</span> <span class="p">(</span><span class="s">"'"</span> <span class="nv">C</span><span class="p">))</span> <span class="nv">V</span><span class="p">)</span></code></pre></figure>
<p>
I want a function I can apply to this to generate a name. First, I set
up an association list mapping each symbol to its replacements,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defvar</span> <span class="nv">namegen-subs</span>
  <span class="o">'</span><span class="p">((</span><span class="nv">s</span> <span class="nv">ach</span> <span class="nv">ack</span> <span class="nv">ad</span> <span class="nv">age</span> <span class="nv">ald</span> <span class="nv">ale</span> <span class="nv">an</span> <span class="nv">ang</span> <span class="nv">ar</span> <span class="nv">ard</span> <span class="nv">as</span> <span class="nb">ash</span> <span class="nv">at</span> <span class="nv">ath</span> <span class="nv">augh</span>
       <span class="nv">aw</span> <span class="nv">ban</span> <span class="nv">bel</span> <span class="nv">bur</span> <span class="nv">cer</span> <span class="nv">cha</span> <span class="nv">che</span> <span class="nv">dan</span> <span class="nv">dar</span> <span class="nv">del</span> <span class="nv">den</span> <span class="nv">dra</span> <span class="nv">dyn</span>
       <span class="nv">ech</span> <span class="nv">eld</span> <span class="nv">elm</span> <span class="nv">em</span> <span class="nv">en</span> <span class="nv">end</span> <span class="nv">eng</span> <span class="nv">enth</span> <span class="nv">er</span> <span class="nv">ess</span> <span class="nv">est</span> <span class="nv">et</span> <span class="nv">gar</span> <span class="nv">gha</span>
       <span class="nv">hat</span> <span class="nv">hin</span> <span class="nv">hon</span> <span class="nv">ia</span> <span class="nv">ight</span> <span class="nv">ild</span> <span class="nv">im</span> <span class="nv">ina</span> <span class="nv">ine</span> <span class="nv">ing</span> <span class="nv">ir</span> <span class="nv">is</span> <span class="nv">iss</span> <span class="nv">it</span>
       <span class="nv">kal</span> <span class="nv">kel</span> <span class="nv">kim</span> <span class="nv">kin</span> <span class="nv">ler</span> <span class="nv">lor</span> <span class="nv">lye</span> <span class="nv">mor</span> <span class="nv">mos</span> <span class="nv">nal</span> <span class="nv">ny</span> <span class="nv">nys</span> <span class="nv">old</span> <span class="nv">om</span>
       <span class="nv">on</span> <span class="nb">or</span> <span class="nv">orm</span> <span class="nv">os</span> <span class="nv">ough</span> <span class="nv">per</span> <span class="nv">pol</span> <span class="nv">qua</span> <span class="nv">que</span> <span class="nv">rad</span> <span class="nv">rak</span> <span class="nv">ran</span> <span class="nv">ray</span> <span class="nv">ril</span>
       <span class="nv">ris</span> <span class="nv">rod</span> <span class="nv">roth</span> <span class="nv">ryn</span> <span class="nv">sam</span> <span class="nv">say</span> <span class="nv">ser</span> <span class="nv">shy</span> <span class="nv">skel</span> <span class="nv">sul</span> <span class="nv">tai</span> <span class="nb">tan</span> <span class="nv">tas</span>
       <span class="nv">ther</span> <span class="nv">tia</span> <span class="nv">tin</span> <span class="nv">ton</span> <span class="nv">tor</span> <span class="nv">tur</span> <span class="nv">um</span> <span class="nv">und</span> <span class="nv">unt</span> <span class="nv">urn</span> <span class="nv">usk</span> <span class="nv">ust</span> <span class="nv">ver</span>
       <span class="nv">ves</span> <span class="nv">vor</span> <span class="nv">war</span> <span class="nv">wor</span> <span class="nv">yer</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">v</span> <span class="nv">a</span> <span class="nv">e</span> <span class="nv">i</span> <span class="nv">o</span> <span class="nv">u</span> <span class="nv">y</span><span class="p">)</span>
    <span class="o">...</span>
    <span class="p">(</span><span class="nv">d</span> <span class="nv">elch</span> <span class="nv">idiot</span> <span class="nv">ob</span> <span class="nv">og</span> <span class="nv">ok</span> <span class="nv">olph</span> <span class="nv">olt</span> <span class="nv">omph</span> <span class="nv">ong</span> <span class="nv">onk</span> <span class="nv">oo</span> <span class="nv">oob</span> <span class="nv">oof</span> <span class="nv">oog</span>
       <span class="nv">ook</span> <span class="nv">ooz</span> <span class="nv">org</span> <span class="nv">ork</span> <span class="nv">orm</span> <span class="nv">oron</span> <span class="nv">ub</span> <span class="nv">uck</span> <span class="nv">ug</span> <span class="nv">ulf</span> <span class="nv">ult</span> <span class="nv">um</span> <span class="nv">umb</span> <span class="nv">ump</span> <span class="nv">umph</span>
       <span class="nv">un</span> <span class="nv">unb</span> <span class="nv">ung</span> <span class="nv">unk</span> <span class="nv">unph</span> <span class="nv">unt</span> <span class="nv">uzz</span><span class="p">))</span>
  <span class="s">"Substitutions for the name generator."</span><span class="p">)</span></code></pre></figure>
<p>
Since we will need this in a couple places, make a function to
randomly select an element from a list,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">randth</span> <span class="p">(</span><span class="nv">lst</span><span class="p">)</span>
  <span class="s">"Select random element from the given list."</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="p">(</span><span class="nb">random</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">lst</span><span class="p">))</span> <span class="nv">lst</span><span class="p">))</span></code></pre></figure>
<p>
A function for replacing a symbol,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">namegen-select</span> <span class="p">(</span><span class="nv">sym</span><span class="p">)</span>
  <span class="s">"Select a replacement for the given symbol."</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="p">(</span><span class="nb">assoc</span> <span class="nv">sym</span> <span class="nv">namegen-subs</span><span class="p">))</span>
      <span class="p">(</span><span class="k">throw</span> <span class="ss">'bad-symbol</span>
             <span class="p">(</span><span class="nv">concat</span> <span class="s">"Invalid substitution symbol: "</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%s"</span> <span class="nv">sym</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">symbol-name</span> <span class="p">(</span><span class="nv">randth</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="nv">sym</span> <span class="nv">namegen-subs</span><span class="p">))))))</span></code></pre></figure>
<p>
And finally, the generator. Find a string, pass it through, find a
symbol, substitute it, find a list, pick one element and recurse on
it.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">namegen</span> <span class="p">(</span><span class="nv">sexp</span><span class="p">)</span>
  <span class="s">"Generate a name from the given sexp generator."</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">null</span> <span class="nv">sexp</span><span class="p">)</span> <span class="s">""</span><span class="p">)</span>
   <span class="p">((</span><span class="nb">stringp</span> <span class="nv">sexp</span><span class="p">)</span> <span class="nv">sexp</span><span class="p">)</span>
   <span class="p">((</span><span class="nb">symbolp</span> <span class="nv">sexp</span><span class="p">)</span> <span class="p">(</span><span class="nv">namegen-select</span> <span class="nv">sexp</span><span class="p">))</span>
   <span class="p">((</span><span class="nb">listp</span> <span class="nv">sexp</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">concat</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">listp</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">sexp</span><span class="p">))</span> <span class="p">(</span><span class="nv">namegen</span> <span class="p">(</span><span class="nv">randth</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">sexp</span><span class="p">)))</span>
              <span class="p">(</span><span class="nv">namegen</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">sexp</span><span class="p">)))</span>
            <span class="p">(</span><span class="nv">namegen</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">sexp</span><span class="p">))))))</span></code></pre></figure>
<p>
That's it! We can apply it to the expression above,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">namegen</span> <span class="o">'</span><span class="p">(</span><span class="nv">s</span> <span class="p">(</span><span class="s">"ith"</span> <span class="p">(</span><span class="s">"'"</span> <span class="nv">C</span><span class="p">))</span> <span class="nv">V</span><span class="p">))</span>
<span class="nv">-&gt;</span> <span class="s">"rynithi"</span></code></pre></figure>
<p>
But that's really the easy part. The hard part would be converting the
original pattern into the S-expression, which I don't plan on doing
right now.
</p>
<p>
Something else to note: this is thousands of times faster than the
Perl version I wrote earlier.
</p>
<p>
I threw the code in with the rest of my name generation code
(namegen.el),
</p>
<pre>
git clone <a href="https://github.com/skeeto/fantasyname">git://github.com/skeeto/fantasyname.git</a>
</pre>
<p>
S-expressions are handy anywhere.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>United States Hamiltonian Paths</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/06/21/"/>
    <id>urn:uuid:4124ec41-1b0a-3a60-a8a2-9667a2243cad</id>
    <updated>2009-06-21T00:00:00Z</updated>
    <category term="math"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 21 June 2009 -->
<p>
A while ago I wanted to <a
href="http://threesixty360.wordpress.com/2009/04/22/traveling-the-lower-48/">
find every Hamiltonian path in the contiguous 48 states</a>. That is,
trips that visit each state exactly once. Writing a program to search
for Hamiltonian paths is easy (<a href="/blog/2009/05/27">I did this
already</a>). The most time-consuming part was actually putting
together the data that specified the graph to be searched. I hope
someone somewhere finds it useful. Here is a map for reference,
</p>
<p class="center">
<a href="/img/diagram/us48.png">
  <img src="/img/diagram/us48-small.png" alt=""/>
</a>
</p>
<p>
It took me several passes before I stopped finding errors. I
<i>think</i> I have it all right now, but there could still be some
mistakes. If you see one, leave a comment and I'll fix it here. Here
is the graph as an S-expression <a
href="http://en.wikipedia.org/wiki/Association_list#Association_lists">
alist</a>; the car (first) element in each list is a state, and the
cdr (rest) is the unordered list of states that can be reached from
it.
</p>
<pre>
((me nh)
 (nh vt ma me)
 (vt ny ma nh)
 (ma ri ct ny nh vt)
 (ny pa nj ma ct vt)
 (ri ma ct)
 (ct ri ma ny)
 (nj pa ny de)
 (de md pa nj)
 (pa nj ny de md wv oh)
 (md pa de va wv)
 (va md wv ky tn nc)
 (nc va tn ga sc)
 (sc nc ga)
 (ga fl sc al nc tn)
 (al ms fl ga tn)
 (ms la ar tn al)
 (tn ms al ga nc va ky mo ar)
 (ky wv va tn mo il in oh)
 (wv md pa oh ky va)
 (oh pa wv ky in mi)
 (fl al ga)
 (mi wi oh in)
 (wi mn ia il mi)
 (il in ky mo ia wi)
 (in oh ky il mi)
 (mo il ky tn ar ok ks ne ia)
 (ar mo tn ms la tx ok)
 (la ms ar tx)
 (tx ok nm ar la)
 (ok ks mo ar tx nm co)
 (ks ok co ne mo)
 (ne sd ia mo ks co wy)
 (sd nd mn ia ne wy mt)
 (nd mt sd mn)
 (ia ne mo il wi mn sd)
 (mn wi ia sd nd)
 (mt id wy sd nd)
 (wy id ut co ne sd mt)
 (co ne ks ok nm ut wy)
 (nm co ok tx az)
 (az nm ut ca nv)
 (ut nv id wy co az)
 (id mt wy ut nv or wa)
 (wa or id)
 (or wa id nv ca)
 (nv or id ut az ca)
 (ca az nv or))
</pre>
<p>
Note that all paths must start or end in Maine because it connects to
only one other state.
</p>
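<p>
One cheap sanity check for data like this is symmetry: if A lists B
then B must list A. Here is a minimal sketch of such a check, with a
made-up name (<code>my-graph-asymmetries</code>) and a deliberately
broken toy graph rather than the full list above.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-graph-asymmetries (graph)
  "Return a list of (A . B) pairs where A lists B but B does not list A.
GRAPH is an alist whose car is a node and whose cdr is its neighbors."
  (let (bad)
    (dolist (entry graph (nreverse bad))
      (dolist (neighbor (cdr entry))
        ;; Look up the neighbor's own entry and check for the back edge.
        (unless (memq (car entry) (cdr (assq neighbor graph)))
          (push (cons (car entry) neighbor) bad))))))

;; Toy graph: vt lists ma, but ma forgot to list vt.
(my-graph-asymmetries '((me nh) (nh vt ma me) (vt ma nh) (ma nh)))
;; =&gt; ((vt . ma))</code></pre></figure>
<p>
An empty result only means the list is consistent with itself. A
border missing from both states would still slip through.
</p>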
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Elisp Wishlist</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/29/"/>
    <id>urn:uuid:41fa774c-1f9e-3ef1-1029-69b775475150</id>
    <updated>2009-05-29T00:00:00Z</updated>
    <category term="rant"/><category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 29 May 2009 -->
<p class="abstract">
<b>Update:</b> It looks like all these wishes, except the last one,
may actually be coming
true! <a href="http://lists.gnu.org/archive/html/emacs-devel/2010-04/msg00665.html">
Guile can run Elisp better than Emacs</a>! The idea is that the Elisp
engine is replaced with Guile — the GNU project's Scheme
implementation designed to be used as an extension language — and
written in Scheme is an Elisp compiler that targets Guile's VM. The
extension language of Emacs then becomes Scheme, but Emacs is still
able to run all the old Elisp code. At the same time Elisp itself,
which I'm sure many people will continue to use, gets an upgrade of
arbitrary precision, closures, and better performance.
</p>
<p>
I've been using elisp a lot lately, but unfortunately it's missing a
lot of features that one would find in a more standard lisp. The
following are some features I wish elisp had. Many of these could be
fit into a generic "be more like Scheme or Common Lisp". Some of these
features would break the existing mountain of elisp code out there,
requiring a massive rewrite, which is likely the main reason they are
being held back.
</p>
<p>
<b>Closures</b>, and maybe continuations. Closures are one of the
features I miss the most when writing elisp. They would allow the
implementation of Scheme-style lazy evaluation with <code>delay</code>
and <code>force</code>, among other neat tools. Continuations would
just be a neat thing to have, though they come with a performance
penalty.
</p>
<p>
Closures would also pretty much require Emacs switch to lexical
scoping.
</p>
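<p>
To illustrate what closures would buy, here is a minimal sketch
of <code>delay</code> and <code>force</code>. It assumes an Emacs that
supports lexical binding (the <code>lexical-binding</code> file
variable, Emacs 24 and later), and the names <code>my-delay</code>
and <code>my-force</code> are made up for the example.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">;;; -*- lexical-binding: t; -*-
;; Assumption: this file is loaded with lexical binding enabled.

(defmacro my-delay (&amp;rest body)
  "Return a promise: a closure that evaluates BODY at most once."
  (let ((done (make-symbol "done"))
        (value (make-symbol "value")))
    `(let ((,done nil) (,value nil))
       (lambda ()
         (unless ,done
           (setq ,value (progn ,@body)
                 ,done t))
         ,value))))

(defun my-force (promise)
  "Return the value of PROMISE, computing it on first use."
  (funcall promise))

;; The body runs once no matter how often the promise is forced.
(let* ((count 0)
       (p (my-delay (setq count (1+ count)) (* 6 7))))
  (list (my-force p) (my-force p) count))
;; =&gt; (42 42 1)</code></pre></figure>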
<p>
<b>Arbitrary precision</b>. Really, any high-level language's
numbers should be bignums. Emacs 22 <i>does</i> come with the Calc
package which provides arbitrary precision via
<code>defmath</code>. Perl does something like this with the bignum
module.
</p>
<p>
<b>Packages/namespaces</b>. Without namespaces, each Emacs package
prefixes its functions and variables with its own name
(e.g. <code>dired-</code>). Some real namespaces would be useful for
large projects.
</p>
<p>
<b>C interface</b>. This is something GNU Emacs will never have
because Richard Stallman considers shared library support in Emacs to
be <a href="http://www.emacswiki.org/emacs/DynamicallyExtendingEmacs">
a GPL threat</a>. If Emacs could be dynamically extended some useful
libraries could be linked in and exposed to elisp.
</p>
<p>
<b>Concurrency</b>. If some elisp is being executed Emacs will lock
up. This is a particular problem for Gnus. Again, Emacs would really
need to switch to lexical scoping before this could happen. Threading
would be nice.
</p>
<p>
<b>Speed</b>. Emacs Lisp is pretty slow, even when compiled. Lexical
scoping would help with performance (compile time vs. run time
binding).
</p>
<p>
<b>Regex type</b>. I mention this last because I think this would be
really cool, and I am not aware of any other lisps that do it. Emacs
does regular expressions with strings, which is silly and
cumbersome. Backslashes need extra escaping, for example. Instead, I
would rather have a regex type like Perl and JavaScript have. So
instead of,
</p>
<pre>
(string-match "\\w[0-9]+" "foo525")
</pre>
<p>
we have,
</p>
<pre>
(string-match /\w[0-9]+/ "foo525")
</pre>
<p>
Naturally there would be a <code>regexpp</code> predicate for checking
its type. There could also be a function for compiling a regexp from a
string into a regexp object. As a bonus, I would also like to use it
directly as a function,
</p>
<pre>
(/\w[0-9]+/ "foo525")
</pre>
<p>
I think a regexp type would really give elisp an edge, and would be
entirely appropriate for a text editor. It could also be done without
breaking anything (keep string-style regexp support).
</p>
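<p>
Short of a real regex type, the backslash problem can at least be
sidestepped with the <code>rx</code> macro that ships with Emacs, which
builds the regexp string from an s-expression. Here is a small sketch
of the same pattern written that way. The variable name
<code>my-word-then-digits</code> is made up for the example, and the
generated string need not be character-for-character identical to the
hand-written one, only equivalent.
</p>
<pre>
(require 'rx)

;; The same idea as "\\w[0-9]+", written without any backslashes.
;; `my-word-then-digits' is a made-up name for this sketch.
(defvar my-word-then-digits
  (rx wordchar (one-or-more (any "0-9")))
  "Regexp matching a word character followed by one or more digits.")

;; `rx' still produces an ordinary string, not a distinct regex type.
(stringp my-word-then-digits)

;; Returns the index where the match starts, here 2.
(string-match my-word-then-digits "foo525")
</pre>
<p>
This only helps with readability. The result is still a plain string,
so there is nothing like <code>regexpp</code> to tell it apart from
other strings, and it cannot be called as a function.
</p>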
<p>
There is more commentary over at EmacsWiki: <a
href="http://www.emacswiki.org/emacs/WhyDoesElispSuck"> Why Does Elisp
Suck</a>.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Running Time Macro</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/28/"/>
    <id>urn:uuid:a693dd66-ee70-35ea-3140-6468705ee2d9</id>
    <updated>2009-05-28T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 28 May 2009 -->
<p>
I wanted an elisp macro that could measure the running time of a block
of code. Specifically, I wanted it to work like this,
</p>
<pre>
(measure-time
  <i>...
  body
  ...</i>)
</pre>
<p>
And it would return the running time as seconds in floating
point. Well, here's a macro that does it!
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: 6a3f3d99-f0da-329a-c01c-bb6b868f3239</span>
<span class="p">(</span><span class="nb">defmacro</span> <span class="nv">measure-time</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="s">"Measure and return the running time of the code block."</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">start</span> <span class="p">(</span><span class="nb">make-symbol</span> <span class="s">"start"</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="o">,</span><span class="nv">start</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)))</span>
       <span class="o">,@</span><span class="nv">body</span>
       <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="o">,</span><span class="nv">start</span><span class="p">))))</span></code></pre></figure>
<p>
It's only good for up to around 18 hours, then the time integer
overflows. If only Emacs had arbitrary precision numbers. Here it is
in action using my <a href="/blog/2009/05/23">binomial function from
last week</a>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">measure-time</span>
  <span class="p">(</span><span class="nv">nck</span> <span class="mi">20</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">nck</span> <span class="mi">30</span> <span class="mi">7</span><span class="p">))</span></code></pre></figure>
<p>
Which, just now, returned <code>3.643713</code> seconds when executed.
</p>
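<p>
For comparison, the <code>benchmark</code> library that comes with
Emacs offers macros in the same spirit. The sketch below assumes that
library is available, and <code>my-busy-work</code> is a made-up
function that exists only to burn some time.
</p>
<pre>
(require 'benchmark)

;; `my-busy-work' is a made-up function that just burns some time.
(defun my-busy-work (n)
  "Add up the integers below N."
  (let ((sum 0))
    (dotimes (i n)
      (setq sum (+ sum i)))
    sum))

;; Elapsed wall-clock seconds as a float, much like `measure-time'.
(benchmark-elapse (my-busy-work 100000))

;; Three repetitions: (TOTAL-SECONDS GC-COUNT GC-SECONDS).
(benchmark-run 3 (my-busy-work 100000))
</pre>
<p>
<code>benchmark-run</code> also reports how much of the time went to
garbage collection, which is handy when the body conses a lot.
</p>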
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Web Server</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/17/"/>
    <id>urn:uuid:1e0a3639-6df2-3c5b-3b16-bb29ac300602</id>
    <updated>2009-05-17T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 17 May 2009 -->
<p>
As part of my quest to develop a solid knowledge of <a
href="http://www.gnu.org/software/emacs/">GNU Emacs</a> lisp, I have
implemented a pseudo-HTTP/1.0 web server within Emacs. Behold,
</p>
<pre>
git clone <a href="https://github.com/skeeto/emacs-http-server">git://github.com/skeeto/emacs-http-server.git</a>
</pre>
<p>
To all other non-emacsen text editors, can your text editor do that?!
Ha! Even though elisp is a slow, closure-less, dynamically scoped,
ugly cousin of more popular lisps, it's still a lot of fun to write.
</p>
<p>
To fire it up, load it into Emacs and run the extended command
(<code>M-x</code>) <code>httpd-start</code>. By default it will serve
files from "<code>~/public_html</code>". To change this, change the
variable <code>httpd-root</code> to the desired web root. You can stop
the server with <code>httpd-stop</code>.
</p>
<p>
It's about 200 lines of code and can serve static websites made of
small, static files. I say small files because it serves files from
buffers, meaning it has to read the entire file in first.
</p>
<p>
For a simple, text-editor-based server, it holds up under a pretty
decent load. At one point I hit it with 8 <code>wget</code> instances all making
rapid recursive downloads and my manual navigation wasn't slowed down
noticeably. Even though it runs in the slow elisp interpreter, I think it
could perform much better by caching commonly served files in
buffers.
<p>
It <i>should</i> run, unmodified, anywhere a modern Emacs can run, so
I expect that it's already very portable. I can imagine it being
useful in a situation where someone needs to temporarily host some
files but there isn't a web server on the machine. Just grab this
script and throw it at Emacs.
</p>
<p>
Well, it only does IPv4 right now, though I expect IPv6 only requires
changing one number (namely, 4 to 6). I don't have any IPv6 systems to
test it on.
</p>
<p>
When writing it I also had security in mind so, as far as I know, it
should be safe to use. It cleans up the <code>GET</code> path from the
client so that no files outside the serving root can be accessed.
</p>
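<p>
To give an idea of what that kind of cleanup involves, here is a
minimal sketch. It is not the code from the server itself:
<code>my-httpd-safe-path</code> is a made-up name, and it assumes a
Unix-style file system and an Emacs recent enough to have
<code>string-prefix-p</code>.
</p>
<pre>
;; A sketch only; not the code from emacs-http-server.
;; `my-httpd-safe-path' is a made-up name.
(defun my-httpd-safe-path (root uri-path)
  "Resolve URI-PATH under ROOT; return nil if the result leaves ROOT."
  (let* ((root (file-name-as-directory (expand-file-name root)))
         ;; Drop leading slashes so the path is taken relative to ROOT.
         (relative (replace-regexp-in-string "\\`/+" "" uri-path))
         ;; `expand-file-name' collapses "." and ".." components.
         (full (expand-file-name relative root)))
    ;; Accept only ROOT itself or something strictly inside it.
    (when (or (string-prefix-p root full)
              (string= full (directory-file-name root)))
      full)))

(my-httpd-safe-path "/srv/www" "/0001.html")     ; "/srv/www/0001.html"
(my-httpd-safe-path "/srv/www" "/../etc/passwd") ; nil
</pre>
<p>
The idea is to resolve the request to an absolute name first and then
check where it landed, rather than trying to spot every spelling of
<code>..</code> in the raw request. The sketch does not decode percent
escapes or follow symbolic links, both of which a real server would
have to think about.
</p>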
<p>
The server log is lisp itself. Here is an example log starting the
server, serving one request, and halting,
</p>
<pre>
'(log
  (start "Wed May 13 23:33:34 2009")
  (connection
   (date "Wed May 13 23:36:25 2009")
   (address "192.168.0.3")
   (get "/0001.html")
   (req
    ("Referer" "http://192.168.0.2:8080/")
    ("Connection" "keep-alive")
    ("Keep-Alive" "300")
    ("Accept-Charset" "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
    ("Accept-Encoding" "gzip,deflate")
    ("Accept-Language" "en-us,en;q=0.5")
    ("Accept" "image/png,image/*;q=0.8,*/*;q=0.5")
    ("User-Agent" "Mozilla/5.0 [...] Iceweasel/3.0.9 (Debian-3.0.9-1)")
    ("Host" "192.168.0.2:8080")
    ("GET" "/0001.html" "HTTP/1.1"))
   (path "~/public_html/0001.html")
   (status 200))
  (stop "Wed May 13 23:38:17 2009"))
</pre>
<p>
The log is an alist of alists, making a hierarchical tree structure that
can be explored with some simple lisp functions. Normally this sort of
thing is done with XML, but lisp already has its own structured
format: lists!
</p>
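<p>
As a sketch of what "some simple lisp functions" might look like, the
helper below walks down the tree one key at a time. The names
<code>my-sample-log</code> and <code>my-log-ref</code> are made up for
this example, and the data is an abbreviated copy of the log above.
</p>
<pre>
;; An abbreviated copy of the log above, quoted as data.
(defvar my-sample-log
  '(log
    (start "Wed May 13 23:33:34 2009")
    (connection
     (date "Wed May 13 23:36:25 2009")
     (address "192.168.0.3")
     (get "/0001.html")
     (req
      ("Host" "192.168.0.2:8080")
      ("GET" "/0001.html" "HTTP/1.1"))
     (path "~/public_html/0001.html")
     (status 200))
    (stop "Wed May 13 23:38:17 2009")))

(defun my-log-ref (tree keys)
  "Follow KEYS down TREE and return what is stored under the last key."
  (let ((node tree))
    (dolist (key keys)
      ;; Each node is (NAME . ENTRIES), and ENTRIES is an alist.
      (setq node (assoc key (cdr node))))
    (cdr node)))

(my-log-ref my-sample-log '(connection status))     ; (200)
(my-log-ref my-sample-log '(connection req "Host")) ; ("192.168.0.2:8080")
(my-log-ref my-sample-log '(connection req "GET"))  ; ("/0001.html" "HTTP/1.1")
</pre>
<p>
Because <code>assoc</code> compares with <code>equal</code>, the same
helper handles both the symbol keys and the string header names. It
only ever finds the first entry with a given key, so a log with several
connections would need a loop over the entries instead.
</p>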
<p>
When <code>GET</code> is a directory, it looks for
"<code>index.html</code>" and serves that if it exists. More indexes
can be added to the variable <code>httpd-indexes</code>. This can
actually be done in a special "<code>.htaccess.el</code>" file.
</p>
<p>
If a "<code>.htaccess.el</code>" exists in the directory from which a
file is being served, Emacs will first load/execute it. You see, it's
just a lisp program. If you wanted to add a new index file name, the
hypertext access file could contain this,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">add-to-list</span> <span class="ss">'httpd-indexes</span> <span class="s">"0001.html"</span><span class="p">)</span></code></pre></figure>
<p>
It's a bit like a <code>.emacs</code> file.
</p>
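<p>
The index lookup itself could be as simple as the following sketch. It
is not taken from the server: <code>my-httpd-indexes</code> stands in
for <code>httpd-indexes</code>, and <code>my-httpd-find-index</code> is
a made-up name.
</p>
<pre>
;; A sketch only; the real server's internals may differ.
;; `my-httpd-indexes' stands in for `httpd-indexes', and
;; `my-httpd-find-index' is a made-up name.
(defvar my-httpd-indexes '("index.html")
  "File names to try, in order, when a directory is requested.")

(defun my-httpd-find-index (dir)
  "Return the first existing index file in DIR, or nil if there is none."
  (let ((candidates my-httpd-indexes)
        (found nil))
    (while (and candidates (not found))
      (let ((file (expand-file-name (car candidates) dir)))
        (when (file-exists-p file)
          (setq found file)))
      (setq candidates (cdr candidates)))
    found))
</pre>
<p>
Since <code>add-to-list</code> puts new elements at the front by
default, a name added that way would be tried before
"<code>index.html</code>" in a lookup like this one.
</p>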
<p>
But I think one of the coolest things about having a lisp-based server
is that the server can be modified in place without disrupting or
restarting it. In my Emacs web server, the only change that requires a
restart is changing the server port. In fact, I wrote most of it while
the server was running and tested my changes from a browser right as I
made them — all on the same instance of the server.
</p>
<p>
If you want to look into the AI side of this, the server could modify
its own code in response to its use.
</p>
<p>
I also had the idea of creating dynamic websites with elisp, in the
same way PHP or Perl does. If a <code>.el</code> file (or
<code>.elc</code>) is accessed, the server would pass the
<code>GET</code>/<code>POST</code> arguments as an alist to a function
in the elisp file. The server would also provide some nifty HTML
generation macros. A dynamic script might look like this,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">script</span> <span class="p">(</span><span class="nb">get</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">html</span>
   <span class="p">(</span><span class="nv">head</span>
    <span class="p">(</span><span class="nv">title</span> <span class="s">"My Script"</span><span class="p">))</span>
   <span class="p">(</span><span class="nv">body</span>
    <span class="p">(</span><span class="nv">h1</span> <span class="s">"Your Query"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">p</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"Your query was "</span>
               <span class="p">(</span><span class="nv">html-sanitize</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="s">"q"</span> <span class="nb">get</span><span class="p">)))</span> <span class="s">"."</span><span class="p">)))))</span></code></pre></figure>
<p>
However, this is not (yet?) implemented. Just an idea.
</p>
<p>
I will continue to work on it, though I don't expect to add much more
to it. I will mostly improve the code and documentation.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  

</feed>
