<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged emacs at null program</title>
  <link rel="alternate" type="text/html"
        href="https://nullprogram.com/tags/emacs/"/>
  <link rel="self" type="application/atom+xml"
        href="https://nullprogram.com/tags/emacs/feed/"/>
  <updated>2026-04-09T13:25:45Z</updated>
  <id>urn:uuid:3d01fe0a-7c1c-475c-b07c-47b7b19e8870</id>

  <author>
    <name>Christopher Wellons</name>
    <uri>https://nullprogram.com</uri>
    <email>wellons@nullprogram.com</email>
  </author>

  <entry>
    <title>A Makefile for Emacs Packages</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2020/01/22/"/>
    <id>urn:uuid:2e138ef3-bc68-4115-bb84-af260db641c0</id>
    <updated>2020-01-22T02:54:41Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Each of my Emacs packages has a Makefile to byte-compile all source
files, run the tests, build a package file, and, in some cases, run the
package in an interactive, temporary, isolated Emacs instance. These
<a href="/blog/2017/08/20/">portable Makefiles</a> have a similar structure and follow the same
conventions. It would require more thought and feedback before I’d try
to make it a <em>standard</em>, but these are conventions I’d like to see in
other package Makefiles.</p>

<p>Here’s an incomplete list of examples:</p>

<ul>
  <li><a href="https://github.com/skeeto/bitpack/blob/master/Makefile">https://github.com/skeeto/bitpack/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/cplx/blob/master/Makefile">https://github.com/skeeto/cplx/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/devdocs-lookup/blob/master/Makefile">https://github.com/skeeto/devdocs-lookup/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/elfeed/blob/master/Makefile">https://github.com/skeeto/elfeed/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-aio/blob/master/Makefile">https://github.com/skeeto/emacs-aio/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-bencode/blob/master/Makefile">https://github.com/skeeto/emacs-bencode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-memoize/blob/master/Makefile">https://github.com/skeeto/emacs-memoize/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/emacs-web-server/blob/master/Makefile">https://github.com/skeeto/emacs-web-server/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/impatient-mode/blob/master/Makefile">https://github.com/skeeto/impatient-mode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/lcg128/blob/master/Makefile">https://github.com/skeeto/lcg128/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/nasm-mode/blob/master/Makefile">https://github.com/skeeto/nasm-mode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/skewer-mode/blob/master/Makefile">https://github.com/skeeto/skewer-mode/blob/master/Makefile</a></li>
  <li><a href="https://github.com/skeeto/x86-lookup/blob/master/Makefile">https://github.com/skeeto/x86-lookup/blob/master/Makefile</a></li>
</ul>

<p>You should make a habit of compiling your Emacs Lisp files even if you
don’t think you need the performance. The byte-compiler, while
<a href="/blog/2019/02/24/">dumb</a>, does <a href="/blog/2016/12/22/">static analysis</a> and may spot bugs and other
issues early.</p>
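
<p>As a rough illustration of the kind of mistake this catches, here is a
deliberately broken sketch. The file and function names are hypothetical,
and the exact warning text depends on the Emacs version. With lexical
binding enabled, the byte-compiler should flag the misspelled variable as
a free variable reference and note that the real argument goes unused:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; foo-area.el --- hypothetical example -*- lexical-binding: t; -*-

(defun foo-area (width height)
  "Return the area of a WIDTH by HEIGHT rectangle."
  ;; Deliberate typo: `heigth' is not bound anywhere, so calling this
  ;; function signals `void-variable' at run time. Byte-compiling the
  ;; file reports the problem without ever calling the function.
  (* width heigth))
</code></pre></div></div>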

<p>First things first: Every portable Makefile starts with a special
target, <code class="language-plaintext highlighter-rouge">.POSIX</code>, to request standard behavior. This is followed by
macro definitions. When compiling a C program, the <code class="language-plaintext highlighter-rouge">CC</code> macro is the
name of the compiler. Analogously, when compiling Emacs packages the
<code class="language-plaintext highlighter-rouge">EMACS</code> macro is the name of the Emacs program.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.POSIX</span><span class="o">:</span>
<span class="nv">EMACS</span> <span class="o">=</span> emacs
</code></pre></div></div>

<p>Users can now override the macro to specify alternate Emacs binaries. I
use this all the time to test my packages under different versions of
Emacs.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make clean
$ make EMACS=emacs-24.3 check
$ make clean
$ make EMACS=emacs-25.1 check
</code></pre></div></div>

<p>Note: It’s common to use <code class="language-plaintext highlighter-rouge">?=</code> assignment here, but that is both
non-standard and unnecessary. If you want to override macro definitions
from the environment, use the <code class="language-plaintext highlighter-rouge">-e</code> option:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export EMACS=emacs-24.3
$ make -e
</code></pre></div></div>

<p>The first non-special target in the Makefile is the default target. For
Emacs packages, this target should byte-compile all the source files,
including tests. List the byte-compiled file names as the target
dependencies:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">compile</span><span class="o">:</span> <span class="nf">foo.elc foo-test.elc</span>
</code></pre></div></div>

<p>Now for the tedious part: Define the dependencies between your different
source files. It would be nice to automate this part somehow, but
fortunately most packages just aren’t that complicated. You do not need
to list trivial dependencies — i.e. mapping each .el file to its .elc
file — since make will figure that out on its own.</p>

<p>Since <code class="language-plaintext highlighter-rouge">foo-test.elc</code> relies on <code class="language-plaintext highlighter-rouge">foo.elc</code> — it’s testing this file after
all — the relationship must be indicated to make. For single file
packages (one package file, one test file), this is all that’s needed:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">foo-test.elc</span><span class="o">:</span> <span class="nf">foo.elc</span>
</code></pre></div></div>

<p>I call my testing targets “check” and this target must depend on the
byte-compiled files containing tests. It will transitively depend on the
other package source files because of the previous section.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">check</span><span class="o">:</span> <span class="nf">foo-test.elc</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">--batch</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">-l</span> <span class="err">foo-test.elc</span> <span class="err">-f</span> <span class="err">ert-run-tests-batch</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">-Q</code> option runs Emacs with “minimum customizations.” The <code class="language-plaintext highlighter-rouge">-L .</code>
option puts the current directory in the load path so that <code class="language-plaintext highlighter-rouge">(require
'foo)</code> will work. Finally it loads the file containing the tests and
instructs ERT to run all defined tests.</p>
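
<p>For reference, a minimal pair of files that this target could exercise
might look like the following sketch. The function
<code class="language-plaintext highlighter-rouge">foo-double</code> and the test name are made up for illustration.
Both files are shown in one listing, which also evaluates cleanly as a
single buffer because the feature is provided before it is required:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; foo.el --- hypothetical package under test -*- lexical-binding: t; -*-

(defun foo-double (n)
  "Return N multiplied by two."
  (* 2 n))

(provide 'foo)

;;; foo-test.el --- hypothetical tests for foo.el

(require 'ert)
(require 'foo)   ; found via "-L ." when run from the Makefile

(ert-deftest foo-double-test ()
  "Check `foo-double' on a couple of small inputs."
  (should (= 4 (foo-double 2)))
  (should (= 0 (foo-double 0))))
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">(require 'foo)</code> in the test file is also the reason
<code class="language-plaintext highlighter-rouge">foo-test.elc</code> lists <code class="language-plaintext highlighter-rouge">foo.elc</code> as a prerequisite: the package
is loaded while the tests are being compiled.</p>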

<p>A good build can clean up after itself:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">clean</span><span class="o">:</span>
    <span class="err">rm</span> <span class="err">-f</span> <span class="err">foo.elc</span> <span class="err">foo-test.elc</span>
</code></pre></div></div>

<p>Finally we need one more thing to tie it all together: an inference rule
to teach make how to compile .elc files from .el files.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.SUFFIXES</span><span class="o">:</span> <span class="nf">.el .elc</span>
<span class="nl">.el.elc</span><span class="o">:</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">--batch</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">-f</span> <span class="err">batch-byte-compile</span> <span class="err">$&lt;</span>
</code></pre></div></div>

<p>This is similar to the “check” target, but compiles a source file
instead of running tests.</p>

<p>For simple, single source file packages, this is all you need!</p>

<h3 id="complex-packages">Complex packages</h3>

<p>My most complex package is Elfeed which has 10 source files and 4 test
files. It also includes a target to build a package file, which I would
upload to Marmalade when it was still functioning. I did a few extra
things to keep this tidy.</p>

<p>First, I define the package version in the Makefile:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">VERSION</span> <span class="o">=</span> 1.2.3
</code></pre></div></div>

<p>It would be nice to grab this information from a reliable place (Git
tag, source file, etc.), but I never found a reliable and satisfactory
way to do this. Simple wins.</p>

<p>To avoid repeating myself, I list the source files in a macro as well:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">EL</span>   <span class="o">=</span> foo-a.el foo-b.el foo-c.el
<span class="nv">DOC</span>  <span class="o">=</span> README.md
<span class="nv">TEST</span> <span class="o">=</span> foo-test.el
</code></pre></div></div>

<p>These will still need to have all their interdependencies individually
defined for make. For example, if C depends on both A and B, but neither
A nor B depend on each other, this is all you’d need:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">foo-c.elc</span><span class="o">:</span> <span class="nf">foo-a.elc foo-b.elc</span>
</code></pre></div></div>

<p>Done correctly you can perform parallel builds with the non-standard but
common <code class="language-plaintext highlighter-rouge">-j</code> make option. This is pretty nice since Emacs can’t do
parallel builds itself.</p>

<p>I use the file list macros in the “compile” and “check” targets:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">compile</span><span class="o">:</span> <span class="nf">$(EL:.el=.elc) $(TEST:.el=.elc)</span>
<span class="nl">check</span><span class="o">:</span> <span class="nf">$(TEST:.el=.elc)</span>
</code></pre></div></div>

<p>The “package” target copies everything under a directory and tars it up.
The directory is removed first, if it exists, so that any potential
leftover garbage doesn’t get included.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">package</span><span class="o">:</span> <span class="nf">foo-$(VERSION).tar</span>
<span class="nl">foo-$(VERSION).tar</span><span class="o">:</span> <span class="nf">$(EL) $(DOC)</span>
    <span class="err">rm</span> <span class="err">-rf</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">mkdir</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">cp</span> <span class="err">$(EL)</span> <span class="err">$(DOC)</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">tar</span> <span class="err">cf</span> <span class="err">$@</span> <span class="err">foo-$(VERSION)/</span>
    <span class="err">rm</span> <span class="err">-rf</span> <span class="err">foo-$(VERSION)/</span>
</code></pre></div></div>

<p>In Elfeed, the target to test in an interactive, temporary Emacs
instance is called “virtual”. In Skewer it’s called “run”. The name of
the target and the specific rules will depend on the package, should you
even want this target at all. It’s handy to have the option to test without
my own configuration contaminating Emacs, and vice versa. When people
report issues, I can also direct them to reproduce their issue in the
clean environment.</p>

<p>Here’s what a simple “run” target might look like:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">run</span><span class="o">:</span> <span class="nf">$(EL:.el=.elc)</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">-l</span> <span class="err">foo-c.elc</span> <span class="err">-f</span> <span class="err">foo-mode</span>
</code></pre></div></div>

<p>Make is not really designed to run interactive programs like this, but
it works in practice.</p>

<h3 id="dependencies">Dependencies</h3>

<p>What about packages with dependencies? I’ve used <a href="https://github.com/cask/cask">Cask</a> in the
past but was never satisfied, especially when integrating it into a
Makefile. So, again, I’ve opted for the dumb-but-reliable option:
request that dependencies are cloned in adjacent directories matching
the dependency’s package name. For example, the <a href="/blog/2014/02/06/">EmacSQL</a> Makefile
header:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Clone the dependencies of this package in sibling directories:
#     $ git clone https://github.com/cbbrowne/pg.el ../pg
</span></code></pre></div></div>

<p>I also define a new “linker flags” macro, <code class="language-plaintext highlighter-rouge">LDFLAGS</code>. Like with <code class="language-plaintext highlighter-rouge">EMACS</code>,
this lets users override it if needed:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">LDFLAGS</span> <span class="o">=</span> <span class="nt">-L</span> ../pg
</code></pre></div></div>

<p>Everywhere I use <code class="language-plaintext highlighter-rouge">-L .</code> I also include <code class="language-plaintext highlighter-rouge">$(LDFLAGS)</code>. For example, in the
inference rule:</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.SUFFIXES</span><span class="o">:</span> <span class="nf">.el .elc</span>
<span class="nl">.el.elc</span><span class="o">:</span>
    <span class="err">$(EMACS)</span> <span class="err">-Q</span> <span class="err">--batch</span> <span class="err">-L</span> <span class="err">.</span> <span class="err">$(LDFLAGS)</span> <span class="err">-f</span> <span class="err">batch-byte-compile</span> <span class="err">$&lt;</span>
</code></pre></div></div>

<p>If the dependencies follow these conventions, then these can also be
compiled in a recursive way with little effort:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make -C ../pg
</code></pre></div></div>

<p>I’m not completely satisfied with this solution, particularly since it’s
an odd burden on anyone using the Makefile, but it’s worked well enough
for my needs. This is when I wish Emacs had <a href="/blog/2020/01/21/#package-management">distributed package
management</a>.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Efficient Alias of a Built-In Emacs Lisp Function</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/12/10/"/>
    <id>urn:uuid:15421609-2681-4b75-99b2-b2d6aaa835fe</id>
    <updated>2019-12-10T02:32:04Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Suppose you don’t like the names <code class="language-plaintext highlighter-rouge">car</code> and <code class="language-plaintext highlighter-rouge">cdr</code>, the traditional
identifiers for two halves of a lisp cons cell. <a href="https://irreal.org/blog/?p=8500">This is
misguided.</a> A cons is really just a 2-tuple, and the halves
don’t have any particular meaning on their own, even as “head” and
“tail.” However, maybe this is really important to you so you want to
do it anyway. What’s the best way to go about it?</p>

<h3 id="defalias">defalias</h3>

<p>Emacs Lisp has a built-in function just for this, <code class="language-plaintext highlighter-rouge">defalias</code>, which
is the obvious choice.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'car-alias</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">car</code> built-in function is so fundamental to the language that <a href="/blog/2014/01/04/">it
gets its own byte-code opcode</a>. When you call <code class="language-plaintext highlighter-rouge">car</code> in your code,
the byte-compiler doesn’t generate a function call, but instead uses a
single instruction. For example, here’s an <code class="language-plaintext highlighter-rouge">add</code> function that sums
the <code class="language-plaintext highlighter-rouge">car</code> of its two arguments. I’ve followed the definition with its
disassembly (Emacs 26.3, <a href="/blog/2016/12/22/">lexical scope</a>):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       stack-ref 1</span>
<span class="c1">;; 1       car</span>
<span class="c1">;; 2       stack-ref 1</span>
<span class="c1">;; 3       car</span>
<span class="c1">;; 4       plus</span>
<span class="c1">;; 5       return</span>
</code></pre></div></div>

<p>There are zero function calls because of the dedicated <code class="language-plaintext highlighter-rouge">car</code> opcode, and
it has the optimal six byte-code instructions.</p>

<p>The problem with <code class="language-plaintext highlighter-rouge">defalias</code> is that the definition is permitted to change
— or <a href="/blog/2013/01/22/">be advised</a> — and that robs the byte-compiler of
optimization opportunities. It’s <a href="/blog/2019/12/09/">a constraint</a>. When the
byte-code compiler sees <code class="language-plaintext highlighter-rouge">car-alias</code>, it <em>must</em> emit a function call:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add-alias</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">car-alias</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">car-alias</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       constant  car-alias</span>
<span class="c1">;; 1       stack-ref 2</span>
<span class="c1">;; 2       call      1</span>
<span class="c1">;; 3       constant  car-alias</span>
<span class="c1">;; 4       stack-ref 2</span>
<span class="c1">;; 5       call      1</span>
<span class="c1">;; 6       plus</span>
<span class="c1">;; 7       return</span>
</code></pre></div></div>

<p>This has two function calls and eight byte-code instructions. Those
function calls are significantly more expensive than a <code class="language-plaintext highlighter-rouge">car</code>
instruction, which will show in the benchmark later.</p>
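
<p>A small sketch of why the call cannot be optimized away. The helper
<code class="language-plaintext highlighter-rouge">first-of</code> is a hypothetical name used only for this example.
Because the alias may be repointed at any time, existing callers have to
look it up on every call:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defalias 'car-alias #'car)

(defun first-of (cell)
  "Return one half of CELL by calling through `car-alias'."
  (car-alias cell))

(first-of '(1 . 2))   ; =&gt; 1

;; Repoint the alias. Callers defined earlier must observe the change,
;; so the compiler cannot bake a `car' opcode into `first-of'.
(defalias 'car-alias #'cdr)

(first-of '(1 . 2))   ; =&gt; 2
</code></pre></div></div>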

<h3 id="defsubst">defsubst</h3>

<p>An alternative is <code class="language-plaintext highlighter-rouge">defsubst</code>, an inlined function definition, which
will inline an actual <code class="language-plaintext highlighter-rouge">car</code>. The semantics for <code class="language-plaintext highlighter-rouge">defsubst</code> are, like
macros, explicit that re-definitions may not affect previous uses, so
the constraint is gone. Unfortunately <a href="/blog/2019/02/24/">the byte-code compiler is
pretty dumb</a>, and does a poor job inlining <code class="language-plaintext highlighter-rouge">car-subst</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defsubst</span> <span class="nv">car-subst</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">car</span> <span class="nv">x</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">add-subst</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">car-subst</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">car-subst</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       stack-ref 1</span>
<span class="c1">;; 1       dup</span>
<span class="c1">;; 2       car</span>
<span class="c1">;; 3       stack-set 1</span>
<span class="c1">;; 5       stack-ref 1</span>
<span class="c1">;; 6       dup</span>
<span class="c1">;; 7       car</span>
<span class="c1">;; 8       stack-set 1</span>
<span class="c1">;; 10      plus</span>
<span class="c1">;; 11      return</span>
</code></pre></div></div>

<p>There are zero function calls and ten byte-code instructions. The
<code class="language-plaintext highlighter-rouge">car</code> opcode <em>is</em> in use, but there are four unnecessary instructions.
This is still faster than making the function calls, though. If the
byte-code compiler was just a little smarter and could compile this to
the ideal case, then this would be the end of the discussion.</p>

<h3 id="cl-first">cl-first</h3>

<p>The built-in <code class="language-plaintext highlighter-rouge">cl-lib</code> package has a <code class="language-plaintext highlighter-rouge">cl-first</code> alias for <code class="language-plaintext highlighter-rouge">car</code>. This was
written by someone with intimate knowledge of Emacs Lisp, so how
well did they do?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">add-cl-first</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">cl-first</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">cl-first</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; 0       stack-ref 1</span>
<span class="c1">;; 1       car</span>
<span class="c1">;; 2       stack-ref 1</span>
<span class="c1">;; 3       car</span>
<span class="c1">;; 4       plus</span>
<span class="c1">;; 5       return</span>
</code></pre></div></div>

<p>It’s just like plain old <code class="language-plaintext highlighter-rouge">car</code>! How did they manage this? By using a
byte-compiler hint:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'cl-first</span> <span class="ss">'car</span><span class="p">)</span>
<span class="p">(</span><span class="nv">put</span> <span class="ss">'cl-first</span> <span class="ss">'byte-optimizer</span> <span class="ss">'byte-compile-inline-expand</span><span class="p">)</span>
</code></pre></div></div>

<p>They used <code class="language-plaintext highlighter-rouge">defalias</code>, but they also manually told the byte-compiler to
inline the definition like <code class="language-plaintext highlighter-rouge">defsubst</code>. In fact, <code class="language-plaintext highlighter-rouge">defsubst</code> expands to an
expression that sets <code class="language-plaintext highlighter-rouge">byte-compile-inline-expand</code>, but, as seen above,
the inline function overhead gets inlined and doesn’t get eliminated.</p>
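
<p>The same hint can presumably be applied to an alias of your own. The
sketch below assumes the inline expansion treats a user-defined alias
the same way it treats <code class="language-plaintext highlighter-rouge">cl-first</code>. The names
<code class="language-plaintext highlighter-rouge">my-first</code> and <code class="language-plaintext highlighter-rouge">add-my-first</code> are hypothetical, and the
disassembly may differ between Emacs versions, so compare the output
against the listings above rather than taking it for granted:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; Define the alias, then ask the byte-compiler to inline it.
(defalias 'my-first #'car)
(put 'my-first 'byte-optimizer 'byte-compile-inline-expand)

(defun add-my-first (a b)
  "Sum the first elements of the lists A and B."
  (+ (my-first a) (my-first b)))

;; Byte-compile the function and display its byte-code for inspection.
(disassemble (byte-compile 'add-my-first))
</code></pre></div></div>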

<h3 id="benchmark">Benchmark</h3>

<p>So how do the alternatives perform? (<a href="https://gist.github.com/skeeto/36baa3b1493f53eab4e082b449448a96">benchmark source</a>)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add           (0.594811299 0 0.0)
add-alias     (1.232037132 0 0.0)
add-subst     (0.700044324 0 0.0)
add-cl-first  (0.58332882 0 0.0)
</code></pre></div></div>

<p>(The <code class="language-plaintext highlighter-rouge">car</code> of the list is the running time.) Since <code class="language-plaintext highlighter-rouge">add</code> and
<code class="language-plaintext highlighter-rouge">add-cl-first</code> have the same byte-codes, we shouldn’t, and didn’t, see
a significant difference. The simple use of <code class="language-plaintext highlighter-rouge">defalias</code> doubles the
running time, and using <code class="language-plaintext highlighter-rouge">defsubst</code> is about 18% slower.</p>
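
<p>The linked gist is the actual benchmark. As a simpler stand-in, a
measurement of this shape can be taken with the built-in
<code class="language-plaintext highlighter-rouge">benchmark-run</code> macro, which returns a list in the same format
as the results above. The repetition count here is an arbitrary
assumption, and absolute timings will vary by machine:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'benchmark)

(defun add (a b)
  "Sum the first elements of the lists A and B."
  (+ (car a) (car b)))

;; Time the byte-compiled definition, not the interpreted one.
(byte-compile 'add)

;; Returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(benchmark-run 1000000
  (add '(1) '(2)))
</code></pre></div></div>

<p>Repeating this for each variant and dividing the elapsed times gives
the ratios quoted above, for example 1.232 / 0.595 for the plain
<code class="language-plaintext highlighter-rouge">defalias</code> case.</p>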

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>On-the-fly Linear Congruential Generator Using Emacs Calc</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/11/19/"/>
    <id>urn:uuid:13e56720-ef3a-4fa4-a4ff-0a6fef914504</id>
    <updated>2019-11-19T01:17:50Z</updated>
    <category term="emacs"/><category term="crypto"/><category term="optimization"/><category term="c"/><category term="java"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<p>I regularly make throwaway “projects” and do a surprising amount of
programming in <code class="language-plaintext highlighter-rouge">/tmp</code>. For Emacs Lisp, the equivalent is the
<code class="language-plaintext highlighter-rouge">*scratch*</code> buffer. These are places where I can make a mess, and the
mess usually gets cleaned up before it becomes a problem. A lot of my
established projects (<a href="/blog/2019/03/22/">ex</a>.) start out in volatile storage and
only graduate to more permanent storage once the concept has proven
itself.</p>

<p>Throughout my whole career, this sort of throwaway experimentation has
been an important part of my personal growth, and I try to <a href="/blog/2016/09/02/">encourage it
in others</a>. Even if the idea I’m trying doesn’t pan out, I usually
learn something new, and occasionally it translates into an article here.</p>

<p>I also enjoy small programming challenges. One of the most abused
tools in my mental toolbox is the Monte Carlo method, and I readily
apply it to solve toy problems. Even beyond this, random number
generators are frequently a useful tool (<a href="/blog/2017/04/27/">1</a>, <a href="/blog/2019/07/22/">2</a>), so I
find myself reaching for one all the time.</p>

<p>Nearly every programming language comes with a pseudo-random number
generation function or library. Unfortunately the language’s standard
PRNG is usually a poor choice (C, <a href="https://arvid.io/2018/06/30/on-cxx-random-number-generator-quality/">C++</a>, <a href="https://lowleveldesign.org/2018/08/15/randomness-in-net/">C#</a>, <a href="https://grokbase.com/t/gg/golang-nuts/155f6kbb7a/go-nuts-why-are-high-bits-used-by-math-rand-helpers-instead-of-low-ones">Go</a>).
It’s probably mediocre quality, <a href="/blog/2018/05/27/">slower than it needs to be</a>
(<a href="https://grokbase.com/t/gg/golang-nuts/155f6kbb7a/go-nuts-why-are-high-bits-used-by-math-rand-helpers-instead-of-low-ones">also</a>), <a href="https://lists.freebsd.org/pipermail/svn-src-head/2013-July/049068.html">lacks reliable semantics or behavior between
implementations</a>, or is missing some other property I want. So I’ve
long been a fan of <em>BYOPRNG:</em> Bring Your Own Pseudo-random Number
Generator. Just embed a generator with the desired properties directly
into the program. The <a href="/blog/2017/09/21/">best non-cryptographic PRNGs today</a> are
tiny and exceptionally friendly to embedding. Though, depending on what
you’re doing, you might <a href="/blog/2019/04/30/">need to be creative about seeding</a>.</p>

<h3 id="crafting-a-prng">Crafting a PRNG</h3>

<p>On occasion I don’t have an established, embeddable PRNG in reach, and
I have yet to commit xoshiro256** to memory. Or maybe I want to use
a totally unique PRNG for a particular project. In these cases I make
one up. With just a bit of know-how it’s not too difficult.</p>

<p>Probably the easiest decent PRNG to code from scratch is the venerable
<a href="https://en.wikipedia.org/wiki/Linear_congruential_generator">Linear Congruential Generator</a> (LCG). It’s a simple recurrence
relation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x[1] = (x[0] * A + C) % M
</code></pre></div></div>

<p>That’s trivial to remember once you know the details. You only need to
choose appropriate values for <code class="language-plaintext highlighter-rouge">A</code>, <code class="language-plaintext highlighter-rouge">C</code>, and <code class="language-plaintext highlighter-rouge">M</code>. Done correctly, it
will be a <em>full-period</em> generator — a generator that visits a
permutation of each of the numbers between 0 and <code class="language-plaintext highlighter-rouge">M - 1</code>. The seed,
the value of <code class="language-plaintext highlighter-rouge">x[0]</code>, chooses a starting position in this (looping)
permutation.</p>
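
<p>To make the idea of a full period concrete, here is a toy sketch in
Emacs Lisp. The parameters (<code class="language-plaintext highlighter-rouge">M = 16</code>, <code class="language-plaintext highlighter-rouge">A = 5</code>,
<code class="language-plaintext highlighter-rouge">C = 3</code>) are chosen purely for illustration and are small enough
to check by hand. The function names are hypothetical:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun lcg-next (x a c m)
  "Return the state after X for multiplier A, increment C, modulus M."
  (mod (+ (* x a) c) m))

(defun lcg-cycle (seed a c m)
  "Return the states visited from SEED until the sequence returns to SEED.
This assumes A is odd and M is a power of two, which makes each step
reversible and guarantees the sequence does return to SEED."
  (let ((x (lcg-next seed a c m))
        (states (list seed)))
    (while (/= x seed)
      (push x states)
      (setq x (lcg-next x a c m)))
    (nreverse states)))

(lcg-cycle 0 5 3 16)
;; =&gt; (0 3 2 13 4 7 6 1 8 11 10 5 12 15 14 9)
</code></pre></div></div>

<p>All 16 values appear exactly once before the sequence loops. A
different seed just starts the same loop at a different position.</p>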

<p><code class="language-plaintext highlighter-rouge">M</code> has a natural, obvious choice: a power of two matching the range of
operands, such as 2^32 or 2^64. With this the modulo operation is free
as a natural side effect of the computer architecture.</p>

<p>Choosing <code class="language-plaintext highlighter-rouge">C</code> also isn’t difficult. It must be coprime with <code class="language-plaintext highlighter-rouge">M</code>, and
since <code class="language-plaintext highlighter-rouge">M</code> is a power of two, any odd number is valid. Even 1. In
theory choosing a small value like 1 is faster since the compiler
won’t need to embed a large integer in the code, but this difference
doesn’t show up in any micro-benchmarks I tried. If you want a cool,
unique generator, then choose a large random integer. More on that
below.</p>

<p>The tricky value is <code class="language-plaintext highlighter-rouge">A</code>, and getting it right is the linchpin of the
whole LCG. It must be coprime with <code class="language-plaintext highlighter-rouge">M</code> (i.e. not even), and, for a
full-period generator, <code class="language-plaintext highlighter-rouge">A-1</code> must be divisible by four. For better
results, <code class="language-plaintext highlighter-rouge">A-1</code> should not be divisible by 8. A good choice is a prime
number that satisfies these properties.</p>
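<p>To see these rules in action, here’s a minimal sketch that brute-forces
the period of a small 16-bit LCG. The names <code class="language-plaintext highlighter-rouge">lcg_is_full_period</code> and
<code class="language-plaintext highlighter-rouge">lcg16_period</code> are made up for illustration, and the check assumes <code class="language-plaintext highlighter-rouge">M</code> is
a power of two, as above.</p>

```c
#include <stdint.h>

/* Full-period conditions for a power-of-two modulus:
 * C must be odd and A-1 must be divisible by four. */
static int lcg_is_full_period(uint64_t a, uint64_t c)
{
    return (c & 1) == 1 && (a & 3) == 1;
}

/* Brute-force the period of a 16-bit LCG (M = 2^16) seeded with 0.
 * Returns 0 if the sequence never comes back to the seed. */
static uint32_t lcg16_period(uint16_t a, uint16_t c)
{
    uint16_t s = 0;
    uint32_t n = 0;
    do {
        /* Widen first to avoid signed overflow from integer promotion;
         * truncating back to 16 bits is the "% M". */
        s = (uint16_t)((uint32_t)s * a + c);
        n++;
    } while (s != 0 && n < 65536);
    return s == 0 ? n : 0;
}
```

<p>With <code class="language-plaintext highlighter-rouge">A = 5</code> and <code class="language-plaintext highlighter-rouge">C = 1</code> the period is the full 65,536. Break either
rule, say <code class="language-plaintext highlighter-rouge">A = 3</code> or <code class="language-plaintext highlighter-rouge">C = 2</code>, and the period comes up short.</p>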

<p>If your operands are 64-bit integers, or larger, how are you going to
generate a prime number?</p>

<h4 id="primes-from-emacs-calc">Primes from Emacs Calc</h4>

<p>Emacs Calc can solve this problem. I’ve <a href="/blog/2009/06/23/">noted before</a> how
featureful it is. It has arbitrary precision, random number
generation, and primality testing. It’s everything we need to choose
<code class="language-plaintext highlighter-rouge">A</code>. (In fact, this is nearly identical to <a href="/blog/2015/10/30/">the process I used to
implement RSA</a>.) For this example I’m going to generate a 64-bit
LCG for the C programming language, but it’s easy to use whatever
width you like and mostly whatever language you like. If you wanted a
<a href="http://www.pcg-random.org/posts/does-it-beat-the-minimal-standard.html">minimal standard 128-bit LCG</a>, this will still work.</p>

<p>Start by opening up Calc with <code class="language-plaintext highlighter-rouge">M-x calc</code>, then:</p>

<ol>
  <li>Push <code class="language-plaintext highlighter-rouge">2</code> on the stack</li>
  <li>Push <code class="language-plaintext highlighter-rouge">64</code> on the stack</li>
  <li>Press <code class="language-plaintext highlighter-rouge">^</code>, computing 2^64 and pushing it on the stack</li>
  <li>Press <code class="language-plaintext highlighter-rouge">k r</code> to generate a random number in this range</li>
  <li>Press <code class="language-plaintext highlighter-rouge">d r 16</code> to switch to hexadecimal display</li>
  <li>Press <code class="language-plaintext highlighter-rouge">k n</code> to find the next prime following the random value</li>
  <li>Repeat step 6 until you get a number that ends with <code class="language-plaintext highlighter-rouge">5</code> or <code class="language-plaintext highlighter-rouge">D</code></li>
  <li>Press <code class="language-plaintext highlighter-rouge">k p</code> a few times to avoid false positives.</li>
</ol>

<p>What’s left on the stack is your <code class="language-plaintext highlighter-rouge">A</code>! If you want a random value for
<code class="language-plaintext highlighter-rouge">C</code>, you can follow a similar process. Heck, make it prime, too!</p>

<p>The reason for using hexadecimal (step 5) and looking for <code class="language-plaintext highlighter-rouge">5</code> or <code class="language-plaintext highlighter-rouge">D</code>
(step 7) is that such numbers satisfy both of the important properties
for <code class="language-plaintext highlighter-rouge">A-1</code>.</p>
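<p>Spelled out: <code class="language-plaintext highlighter-rouge">A-1</code> divisible by four but not by eight means the low
three bits of <code class="language-plaintext highlighter-rouge">A</code> are <code class="language-plaintext highlighter-rouge">101</code>, so the last hexadecimal digit is either
<code class="language-plaintext highlighter-rouge">5</code> (0101) or <code class="language-plaintext highlighter-rouge">D</code> (1101). As a quick sketch, with a helper name of my
own invention:</p>

```c
#include <stdint.h>

/* Returns 1 when A-1 is divisible by 4 but not by 8, i.e. A % 8 == 5.
 * Such values are odd, so they are also coprime with a power-of-two M. */
static int lcg_multiplier_ok(uint64_t a)
{
    return (a & 7) == 5;
}
```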

<p>Calc doesn’t try to factor your random integer. Instead it uses the
<a href="https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test">Miller–Rabin primality test</a>, a probabilistic test that, itself,
requires random numbers. It has false positives but no false negatives.
The false positives can be mitigated by repeating the test multiple
times, hence step 8.</p>
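<p>For a feel of how the test works, here’s a rough sketch of Miller–Rabin
in C. This is not Calc’s implementation. To keep it short and
self-contained I’ve assumed 32-bit inputs and swapped the random bases
for the fixed bases 2, 7, and 61, which are known to give correct
answers for every 32-bit input. The function names are my own.</p>

```c
#include <stdint.h>

/* Compute b^e mod m using 64-bit intermediates. Requires m >= 2. */
static uint32_t powmod32(uint32_t b, uint32_t e, uint32_t m)
{
    uint64_t r = 1, x = b % m;
    while (e) {
        if (e & 1) r = r * x % m;
        x = x * x % m;
        e >>= 1;
    }
    return (uint32_t)r;
}

/* One round: returns 0 if base a proves odd n composite, else 1.
 * Requires n-1 == d * 2^s with d odd. */
static int mr_round(uint32_t n, uint32_t d, int s, uint32_t a)
{
    uint64_t x = powmod32(a, d, n);
    if (x == 1 || x == n - 1) return 1;
    for (int i = 1; i < s; i++) {
        x = x * x % n;
        if (x == n - 1) return 1;
    }
    return 0;
}

static int is_prime32(uint32_t n)
{
    static const uint32_t bases[] = {2, 7, 61};
    if (n < 2) return 0;
    for (int i = 0; i < 3; i++) {
        if (n == bases[i]) return 1;
        if (n % bases[i] == 0) return 0;
    }
    /* n is now odd, so factor n-1 into d * 2^s with d odd */
    uint32_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }
    for (int i = 0; i < 3; i++) {
        if (!mr_round(n, d, s, bases[i])) return 0;
    }
    return 1;
}
```

<p>A single round with one base can be fooled by a composite, which is why
a probabilistic version like Calc’s benefits from repetition.</p>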

<p>Trying this all out right now, I got this implementation (in C):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">lcg1</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However, we can still do a little better. Outputting the entire state
doesn’t have great results, so instead it’s better to create a
<em>truncated</em> LCG and only return some portion of the most significant
bits.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">lcg2</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This won’t quite pass <a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> in 64-bit form, but the results
are pretty reasonable for most purposes.</p>

<p>But we can still do better without needing to remember much more than
this.</p>

<h3 id="appending-permutation">Appending permutation</h3>

<p>A <a href="http://www.pcg-random.org/">Permuted Congruential Generator</a> (PCG) is really just a
truncated LCG with a permutation applied to its output. Like LCGs
themselves, there are arbitrarily many variations. The “official”
implementation has a <a href="/blog/2018/02/07/">data-dependent shift</a>, for which I can
never remember the details. Fortunately a couple of simple, easy to
remember transformations are sufficient. Basically anything I used
<a href="/blog/2018/07/31/">while prospecting for hash functions</a>. I love xorshifts, so
let’s add one of those:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">pcg1</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">=</span> <span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="n">r</span> <span class="o">^=</span> <span class="n">r</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is a big improvement, but it still fails one BigCrush test. As
they say, when xorshift isn’t enough, use xorshift-multiply! Below I
generated a 32-bit prime for the multiply, but any odd integer is a
valid permutation.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">pcg2</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">uint64_t</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x7c3c3267d015ceb5</span><span class="p">)</span> <span class="o">+</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0x24bd2d95276253a9</span><span class="p">);</span>
    <span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">=</span> <span class="n">s</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
    <span class="n">r</span> <span class="o">^=</span> <span class="n">r</span> <span class="o">&gt;&gt;</span> <span class="mi">16</span><span class="p">;</span>
    <span class="n">r</span> <span class="o">*=</span> <span class="n">UINT32_C</span><span class="p">(</span><span class="mh">0x60857ba9</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This passes BigCrush, and I can reliably build a new one entirely from
scratch using Calc any time I need it.</p>
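<p>One way to convince yourself that the xorshift-multiply step really is
a permutation of the 32-bit values is to build its inverse. The sketch
below does that for the constants above. The names <code class="language-plaintext highlighter-rouge">permute</code>,
<code class="language-plaintext highlighter-rouge">unpermute</code>, and <code class="language-plaintext highlighter-rouge">mulinv32</code> are mine, and the multiplicative inverse is
found with a few Newton iterations, which only works because the
multiplier is odd.</p>

```c
#include <stdint.h>

/* The output permutation from pcg2 above. */
static uint32_t permute(uint32_t r)
{
    r ^= r >> 16;
    r *= UINT32_C(0x60857ba9);
    return r;
}

/* Multiplicative inverse of odd a modulo 2^32. For odd a, a*a == 1
 * (mod 8), so a itself is correct to 3 bits. Each Newton step doubles
 * the number of correct bits: 3, 6, 12, 24, 48. */
static uint32_t mulinv32(uint32_t a)
{
    uint32_t x = a;
    for (int i = 0; i < 4; i++) {
        x *= 2 - a * x;
    }
    return x;
}

/* Undo permute(): invert the multiply, then the xorshift. A right
 * xorshift by at least half the word width is its own inverse. */
static uint32_t unpermute(uint32_t r)
{
    r *= mulinv32(UINT32_C(0x60857ba9));
    r ^= r >> 16;
    return r;
}
```

<p>Since every output maps back to exactly one input, no two inputs collide
and nothing from the truncated LCG is lost.</p>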

<h3 id="bonus-adapting-to-other-languages">Bonus: Adapting to other languages</h3>

<p>Sometimes it’s not so straightforward to adapt this technique to other
languages. For example, JavaScript has limited support for 32-bit
integer operations (enough for a poor 32-bit LCG) and no 64-bit
integer operations. Though <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt">BigInt</a> is now a thing, and should
make a great 96- or 128-bit LCG easy to build.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">lcg</span><span class="p">(</span><span class="nx">seed</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">s</span> <span class="o">=</span> <span class="nx">BigInt</span><span class="p">(</span><span class="nx">seed</span><span class="p">);</span>
    <span class="k">return</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
        <span class="nx">s</span> <span class="o">*=</span> <span class="mh">0xef725caa331524261b9646cd</span><span class="nx">n</span><span class="p">;</span>
        <span class="nx">s</span> <span class="o">+=</span> <span class="mh">0x213734f2c0c27c292d814385</span><span class="nx">n</span><span class="p">;</span>
        <span class="nx">s</span> <span class="o">&amp;=</span> <span class="mh">0xffffffffffffffffffffffff</span><span class="nx">n</span><span class="p">;</span>
        <span class="k">return</span> <span class="nb">Number</span><span class="p">(</span><span class="nx">s</span> <span class="o">&gt;&gt;</span> <span class="mi">64</span><span class="nx">n</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Java doesn’t have unsigned integers, so how could you build the above
PCG in Java? Easy! First, remember that Java has two’s complement
semantics, including wrap around, and that two’s complement doesn’t
care about unsigned or signed for multiplication (or addition, or
subtraction). The result is identical. Second, the oft-forgotten <code class="language-plaintext highlighter-rouge">&gt;&gt;&gt;</code>
operator does an unsigned right shift. With these two tips:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">long</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>

<span class="kt">int</span> <span class="nf">pcg2</span><span class="o">()</span> <span class="o">{</span>
    <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="o">*</span><span class="mh">0x7c3c3267d015ceb5</span><span class="no">L</span> <span class="o">+</span> <span class="mh">0x24bd2d95276253a9</span><span class="no">L</span><span class="o">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="o">(</span><span class="kt">int</span><span class="o">)(</span><span class="n">s</span> <span class="o">&gt;&gt;&gt;</span> <span class="mi">32</span><span class="o">);</span>
    <span class="n">r</span> <span class="o">^=</span> <span class="n">r</span> <span class="o">&gt;&gt;&gt;</span> <span class="mi">16</span><span class="o">;</span>
    <span class="n">r</span> <span class="o">*=</span> <span class="mh">0x60857ba9</span><span class="o">;</span>
    <span class="k">return</span> <span class="n">r</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>

<p>So, in addition to the Calc step list above, you may need to know some
of the finer details of your target language.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>UTF-8 String Indexing Strategies</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/05/29/"/>
    <id>urn:uuid:12e9ed44-b5c1-495f-8750-dfaf1ab008e2</id>
    <updated>2019-05-29T21:52:06Z</updated>
    <category term="elisp"/><category term="emacs"/><category term="go"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p><em>This article was discussed <a href="https://news.ycombinator.com/item?id=20049491">on Hacker News</a>.</em></p>

<p>When designing or, in some cases, implementing a programming language
with built-in support for Unicode strings, an important decision must be
made about how to represent or encode those strings in memory. Not all
representations are equal, and there are trade-offs between different
choices.</p>

<!--more-->

<p>One issue to consider is that strings typically feature random access
indexing of code points with a time complexity resembling constant
time (<code class="language-plaintext highlighter-rouge">O(1)</code>). However, not all string representations actually
support this well. Strings using variable length encoding, such as
UTF-8 or UTF-16, have <code class="language-plaintext highlighter-rouge">O(n)</code> time complexity indexing, ignoring
special cases (discussed below). The most obvious choice to achieve
<code class="language-plaintext highlighter-rouge">O(1)</code> time complexity — an array of 32-bit values, as in UCS-4 —
makes very inefficient use of memory, especially with typical strings.</p>

<p>Despite this, UTF-8 is still chosen in a number of programming
languages, or at least in their implementations. In this article I’ll
discuss three examples — Emacs Lisp, Julia, and Go — and how each takes a
slightly different approach.</p>
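<p>To make the <code class="language-plaintext highlighter-rouge">O(n)</code> cost concrete, here’s a minimal sketch of naive code
point indexing over UTF-8 in C. It assumes the input is already valid
UTF-8, and <code class="language-plaintext highlighter-rouge">utf8_offset</code> is a made-up name rather than anything from the
languages discussed below.</p>

```c
#include <stddef.h>

/* Return the byte offset of the n-th code point (0-based) in a buffer
 * of valid UTF-8, or len if the buffer holds n or fewer code points. */
static size_t utf8_offset(const unsigned char *s, size_t len, size_t n)
{
    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx. Anything else
         * begins a new code point. */
        if ((s[i] & 0xC0) != 0x80) {
            if (n == 0) return i;
            n--;
        }
    }
    return len;
}
```

<p>There’s no way to skip ahead: finding code point <code class="language-plaintext highlighter-rouge">n</code> means looking at
every byte before it.</p>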

<h3 id="emacs-lisp">Emacs Lisp</h3>

<p>Emacs Lisp has two different types of strings that generally can be used
interchangeably: <em>unibyte</em> and <em>multibyte</em>. In fact, the difference
between them is so subtle that I bet that most people writing Emacs Lisp
don’t even realize there are two kinds of strings.</p>

<p>Emacs Lisp uses UTF-8 internally to encode all “multibyte” strings and
buffers. To fully support arbitrary sequences of bytes in the files
being edited, Emacs uses <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html">its own extension of Unicode</a> to
precisely and unambiguously represent raw bytes intermixed with text.
Any arbitrary sequence of bytes can be decoded into Emacs’ internal
representation, then losslessly re-encoded back into the exact same
sequence of bytes.</p>

<p>Unibyte strings and buffers are really just byte-strings. In practice,
they’re essentially ISO/IEC 8859-1, a.k.a. <em>Latin-1</em>. It’s a Unicode
string where all code points are below 256. Emacs prefers the smallest
and simplest string representation when possible, <a href="https://www.python.org/dev/peps/pep-0393/">similar to CPython
3.3+</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="s">"hello"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="s">"π ≈ 3.14"</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>Emacs Lisp strings are mutable, and therein lies the kicker: As soon as
you insert a code point above 255, Emacs quietly converts the string to
multibyte.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">fish</span> <span class="s">"fish"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="nv">fish</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">fish</span> <span class="mi">2</span><span class="p">)</span> <span class="nv">?</span><span class="err">ŝ</span>
      <span class="p">(</span><span class="nb">aref</span> <span class="nv">fish</span> <span class="mi">3</span><span class="p">)</span> <span class="nv">?o</span><span class="p">)</span>

<span class="nv">fish</span>
<span class="c1">;; =&gt; "fiŝo"</span>

<span class="p">(</span><span class="nv">multibyte-string-p</span> <span class="nv">fish</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>Constant time indexing into unibyte strings is straightforward, and
Emacs does the obvious thing: the index is simply the byte offset. It
helps that most strings in Emacs are probably unibyte, even when the
user isn’t working in English.</p>

<p>Most buffers are multibyte, even if those buffers are generally just
ASCII. Since <a href="/blog/2017/09/07/">Emacs uses gap buffers</a> this is rarely a problem:
Nearly all accesses are tightly clustered around the point, so O(n)
indexing seldom comes into play.</p>

<p>That leaves multibyte strings. Consider these idioms for iterating
across a string in Emacs Lisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="p">(</span><span class="nb">length</span> <span class="nb">string</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">c</span> <span class="p">(</span><span class="nb">aref</span> <span class="nb">string</span> <span class="nv">i</span><span class="p">)))</span>
    <span class="o">...</span><span class="p">))</span>

<span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">c</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nb">string</span>
         <span class="o">...</span><span class="p">)</span>
</code></pre></div></div>

<p>The latter expands into essentially the same as the former: An
incrementing index that uses <code class="language-plaintext highlighter-rouge">aref</code> to index to that code point. So is
iterating over a multibyte string — a common operation — an O(n^2)
operation?</p>

<p>The good news is that, at least in this case, no! It’s essentially just
as efficient as iterating over a unibyte string. Before going over why,
consider this little puzzle. Here’s a little string comparison function
that compares two strings a code point at a time, returning their first
difference:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">compare</span> <span class="p">(</span><span class="nv">string-a</span> <span class="nv">string-b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">a</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nv">string-a</span>
           <span class="nv">for</span> <span class="nv">b</span> <span class="nv">being</span> <span class="k">the</span> <span class="nv">elements</span> <span class="nv">of</span> <span class="nv">string-b</span>
           <span class="nb">unless</span> <span class="p">(</span><span class="nb">eql</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
           <span class="nb">return</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>Let’s examine benchmarks with some long strings (100,000 code points):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (0.012568031 0 0.0)</span>
</code></pre></div></div>

<p>Using two zeroed unibyte strings, it takes 13ms. How about changing
the last code point in one of them to 256, converting it to a multibyte
string:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">a</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">a</span><span class="p">)))</span> <span class="mi">256</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (0.012680513 0 0.0)</span>
</code></pre></div></div>

<p>Same running time, so that multibyte string cost nothing more to iterate
across. Let’s try making them both multibyte:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark-run</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">a</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">b</span> <span class="p">(</span><span class="nb">make-string</span> <span class="mi">100000</span> <span class="mi">0</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">a</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">a</span><span class="p">)))</span> <span class="mi">256</span>
            <span class="p">(</span><span class="nb">aref</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">1-</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">b</span><span class="p">)))</span> <span class="mi">256</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">compare</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
<span class="c1">;; =&gt; (2.327959762 0 0.0)</span>
</code></pre></div></div>

<p>That took 2.3 seconds: about 185x longer to run! Iterating over two
multibyte strings concurrently seems to have broken an optimization.
Can you reason about what’s happened?</p>

<p>To avoid the O(n) cost on this common indexing operation, Emacs keeps
a “bookmark” for the last indexing location into a multibyte string.
If the next access is nearby, it can start looking from this
bookmark, forwards or backwards. Like a gap buffer, this gives a big
advantage to clustered accesses, including iteration.</p>

<p>However, this string bookmark is <em>global</em>, one per Emacs instance, not
one per string. In the last benchmark, the two multibyte strings are
constantly fighting over a single string bookmark, and indexing in
the comparison function is reduced to O(n^2) time complexity.</p>

<p>So, Emacs <em>pretends</em> it has constant time access into its UTF-8 text
data, but it’s only faking it with some simple optimizations. This
usually works out just fine.</p>
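<p>The idea can be sketched in a few lines of C. To be clear, this is a
simplified illustration and not Emacs’ actual implementation: the
<code class="language-plaintext highlighter-rouge">struct bookmark</code> type, the <code class="language-plaintext highlighter-rouge">global_bookmark</code> variable, and the
<code class="language-plaintext highlighter-rouge">offset_with_bookmark</code> function are all hypothetical. It assumes a
NUL-terminated string of valid UTF-8, identifies strings by their
address, and requires that <code class="language-plaintext highlighter-rouge">n</code> be no larger than the number of code
points.</p>

```c
#include <stddef.h>

struct bookmark {
    const unsigned char *str;  /* string the bookmark points into */
    size_t chr;                /* code point index */
    size_t byte;               /* matching byte offset */
};

/* A single bookmark shared by every string. */
static struct bookmark global_bookmark;

/* Byte offset of code point n in s, scanning from the bookmark. */
static size_t offset_with_bookmark(const unsigned char *s, size_t n)
{
    struct bookmark *b = &global_bookmark;
    if (b->str != s) {
        /* Bookmark belongs to another string: start over. */
        b->str = s;
        b->chr = 0;
        b->byte = 0;
    }
    while (b->chr < n) {
        /* Step forward to the next byte that begins a code point. */
        do {
            b->byte++;
        } while ((s[b->byte] & 0xC0) == 0x80);
        b->chr++;
    }
    while (b->chr > n) {
        /* Step backward to the previous byte that begins a code point. */
        do {
            b->byte--;
        } while ((s[b->byte] & 0xC0) == 0x80);
        b->chr--;
    }
    return b->byte;
}
```

<p>Sequential access to one string moves the bookmark a single code point
at a time. Alternate between two strings, though, and every call resets
the bookmark and rescans from the beginning.</p>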

<h3 id="julia">Julia</h3>

<p>Another approach is to not pretend at all, and to make this limitation
of UTF-8 explicit in the interface. Julia took this approach, and it
<a href="/blog/2014/03/06/">was one of my complaints about the language</a>. I don’t think
this is necessarily a bad choice, but I do still think it’s
inappropriate considering Julia’s target audience (i.e. Matlab users).</p>

<p>Julia strings are explicitly byte strings containing valid UTF-8 data.
All indexing occurs on bytes, which is trivially constant time, and
always decodes the multibyte code point starting at that byte. <em>But</em>
it is an error to index to a byte that doesn’t begin a code point.
That error is also trivially checked in constant time.</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="s">"π"</span>

<span class="n">s</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
<span class="c"># =&gt; 'π'</span>

<span class="n">s</span><span class="x">[</span><span class="mi">2</span><span class="x">]</span>
<span class="c"># ERROR: UnicodeError: invalid character index</span>
<span class="c">#  in getindex at ./strings/basic.jl:37</span>
</code></pre></div></div>
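<p>The constant time check works because UTF-8 continuation bytes are
self-identifying. As a rough sketch in C, assuming valid UTF-8 and using
a hypothetical <code class="language-plaintext highlighter-rouge">utf8_is_boundary</code> helper (with 0-based byte indexes, unlike
Julia’s 1-based indexes):</p>

```c
#include <stddef.h>

/* Returns 1 if byte index i begins a code point. Continuation bytes
 * always have the form 10xxxxxx, so one mask and compare settles it. */
static int utf8_is_boundary(const unsigned char *s, size_t len, size_t i)
{
    return i < len && (s[i] & 0xC0) != 0x80;
}
```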

<p>Slices are still over bytes, but they “round up” to the end of the
current code point:</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span><span class="x">[</span><span class="mi">1</span><span class="o">:</span><span class="mi">1</span><span class="x">]</span>
<span class="c"># =&gt; "π"</span>
</code></pre></div></div>

<p>Iterating over a string requires helper functions which keep an internal
“bookmark” so that each access is constant time:</p>

<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">eachindex</span><span class="x">(</span><span class="n">string</span><span class="x">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">string</span><span class="x">[</span><span class="n">i</span><span class="x">]</span>
    <span class="c"># ...</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So Julia doesn’t pretend, it makes the problem explicit.</p>

<h3 id="go">Go</h3>

<p>Go is very similar to Julia, but takes an even more explicit view of
strings. All strings are byte strings and there are no restrictions on
their contents. Conventionally strings contain UTF-8 encoded text, but
this is not strictly required. There’s a <code class="language-plaintext highlighter-rouge">unicode/utf8</code> package for
working with strings containing UTF-8 data.</p>

<p>Beyond convention, the <code class="language-plaintext highlighter-rouge">range</code> clause also assumes the string contains
UTF-8 data, and it’s not an error if it does not. Bytes not containing
valid UTF-8 data appear as a <code class="language-plaintext highlighter-rouge">REPLACEMENT CHARACTER</code> (U+FFFD).</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">:=</span> <span class="s">"π</span><span class="se">\xff</span><span class="s">"</span>
    <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">r</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">s</span> <span class="p">{</span>
        <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c">// U+03c0</span>
<span class="c">// U+fffd</span>
</code></pre></div></div>
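<p>For comparison, here’s roughly what that decoding policy looks like when
written by hand in C. This is my own sketch, not Go’s source. The
<code class="language-plaintext highlighter-rouge">utf8_decode</code> name and signature are assumptions, and on any invalid or
truncated sequence it produces U+FFFD and consumes one byte.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s[0..len) into *cp and return the number
 * of bytes consumed. Invalid input yields U+FFFD and consumes a single
 * byte. Requires len > 0. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point allowed for each encoded length. */
    static const uint32_t min[] = {0, 0, 0x80, 0x800, 0x10000};
    size_t n;
    uint32_t c;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; c = s[0] & 0x07;
    } else {
        goto invalid;  /* stray continuation byte, or 0xF8 and up */
    }

    if (len < n) goto invalid;  /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) goto invalid;
        c = c << 6 | (s[i] & 0x3F);
    }
    /* Reject overlong encodings, out-of-range values, and surrogates. */
    if (c < min[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        goto invalid;
    }
    *cp = c;
    return n;

invalid:
    *cp = 0xFFFD;
    return 1;
}
```

<p>Calling this in a loop, advancing by the returned length each time, walks
the same string as above and yields the same two code points.</p>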

<p>A further case of the language favoring UTF-8 is that casting a string
to <code class="language-plaintext highlighter-rouge">[]rune</code> decodes strings into code points, like UCS-4, again using
<code class="language-plaintext highlighter-rouge">REPLACEMENT CHARACTER</code>:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">s</span> <span class="o">:=</span> <span class="s">"π</span><span class="se">\xff</span><span class="s">"</span>
    <span class="n">r</span> <span class="o">:=</span> <span class="p">[]</span><span class="kt">rune</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="m">0</span><span class="p">])</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"U+%04x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">[</span><span class="m">1</span><span class="p">])</span>
<span class="p">}</span>

<span class="c">// U+03c0</span>
<span class="c">// U+fffd</span>
</code></pre></div></div>

<p>So, like Julia, there’s no pretending, and the programmer must
explicitly consider the problem.</p>

<h3 id="preferences">Preferences</h3>

<p>All in all, I probably prefer how Julia and Go are explicit with
UTF-8’s limitations, rather than Emacs Lisp’s attempt to cover it up
with an internal optimization. Since the abstraction is leaky, it may
as well be made explicit.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>An Async / Await Library for Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2019/03/10/"/>
    <id>urn:uuid:5d1462fa-a30d-432e-9a4f-827eb67862b2</id>
    <updated>2019-03-10T20:57:03Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="python"/><category term="javascript"/><category term="lang"/><category term="asyncio"/>
    <content type="html">
      <![CDATA[<p>As part of <a href="/blog/2019/02/24/">building my Python proficiency</a>, I’ve learned how to
use <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a>. This new language feature <a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-492">first appeared in
Python 3.5</a> (<a href="https://www.python.org/dev/peps/pep-0492/">PEP 492</a>, September 2015). JavaScript grew <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function">a
nearly identical feature</a> in ES2017 (June 2017). An async function
can pause to await on an asynchronously computed result, much like a
generator pausing when it yields a value.</p>

<p>In fact, both Python and JavaScript async functions are essentially just
fancy generator functions with some specialized syntax and semantics.
That is, they’re <a href="https://blog.varunramesh.net/posts/stackless-vs-stackful-coroutines/">stackless coroutines</a>. Both languages already had
generators, so their generator-like async functions are a natural
extension that — unlike <a href="/blog/2017/06/21/"><em>stackful</em> coroutines</a> — do not require
significant, new runtime plumbing.</p>

<p>Emacs <a href="/blog/2018/05/31/">officially got generators in 25.1</a> (September 2016),
though, unlike Python and JavaScript, it didn’t require any additional
support from the compiler or runtime. It’s implemented entirely using
Lisp macros. In other words, it’s just another library, not a core
language feature. In theory, the generator library could be easily
backported to the first Emacs release to <a href="/blog/2016/12/22/">properly support lexical
closures</a>, Emacs 24.1 (June 2012).</p>

<p>For the same reason, stackless async/await coroutines can also be
implemented as a library. So that’s what I did, letting Emacs’ generator
library do most of the heavy lifting. The package is called <code class="language-plaintext highlighter-rouge">aio</code>:</p>

<ul>
  <li><strong><a href="https://github.com/skeeto/emacs-aio">https://github.com/skeeto/emacs-aio</a></strong></li>
</ul>

<p>It’s modeled more closely on JavaScript’s async functions than Python’s
asyncio, with the core representation being <em>promises</em> rather than
coroutine objects. I just have an easier time reasoning about promises
than coroutines.</p>

<p>I’m definitely <a href="https://github.com/chuntaro/emacs-async-await">not the first person to realize this was
possible</a>, and was beaten to the punch by two years. Wanting to
<a href="http://www.winestockwebdesign.com/Essays/Lisp_Curse.html">avoid fragmentation</a>, I set aside all formality in my first
iteration on the idea, not even bothering with namespacing my
identifiers. It was to be only an educational exercise. However, I got
quite attached to my little toy. Once I got my head wrapped around the
problem, everything just sort of clicked into place so nicely.</p>

<p>In this article I will show step-by-step one way to build async/await
on top of generators, laying out one concept at a time and then
building upon each. But first, some examples to illustrate the desired
final result.</p>

<h3 id="aio-example">aio example</h3>

<p>Ignoring <a href="/blog/2016/06/16/">all its problems</a> for a moment, suppose you want to use
<code class="language-plaintext highlighter-rouge">url-retrieve</code> to fetch some content from a URL and return it. To keep
this simple, I’m going to omit error handling. Also assume that
<code class="language-plaintext highlighter-rouge">lexical-binding</code> is <code class="language-plaintext highlighter-rouge">t</code> for all examples. Besides, lexical scope is
required by the generator library, and therefore also required by <code class="language-plaintext highlighter-rouge">aio</code>.</p>

<p>The most naive approach is to fetch the content synchronously:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fetch-fortune-1</span> <span class="p">(</span><span class="nv">url</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">url-retrieve-synchronously</span> <span class="nv">url</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The result is returned directly, and errors are communicated by an error
signal (i.e. Emacs’ version of exceptions). This is convenient, but the
function will block the main thread, locking up Emacs until the result
has arrived. This is obviously very undesirable, so, in practice,
everyone nearly always uses the asynchronous version:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fetch-fortune-2</span> <span class="p">(</span><span class="nv">url</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="nv">url</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_status</span><span class="p">)</span>
                      <span class="p">(</span><span class="nb">funcall</span> <span class="nv">callback</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The main thread no longer blocks, but it’s a whole lot less
convenient. The result isn’t returned to the caller, and instead the
caller supplies a callback function. The result, whether success or
failure, will be delivered via callback, so the caller must split
itself into two pieces: the part before the callback and the callback
itself. Errors cannot be delivered using an error signal because of the
inverted flow control.</p>

<p>The situation gets worse if, say, you need to fetch results from two
different URLs. You either fetch results one at a time (inefficient),
or you manage two different callbacks that could be invoked in any
order, and therefore have to coordinate.</p>
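
<p>To make that coordination burden concrete, here is a minimal sketch of
joining two callback-style operations by hand. The name <code class="language-plaintext highlighter-rouge">demo-join-2</code> and
its calling convention are hypothetical, invented for this illustration,
and <code class="language-plaintext highlighter-rouge">lexical-binding</code> is assumed as elsewhere in this article:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun demo-join-2 (start-a start-b callback)
  "Run START-A and START-B concurrently and join their results.
Each is called with a one-argument callback on which it must
eventually deliver its result.  Once both have delivered, in
either order, call CALLBACK with the two results as arguments."
  (let* ((results (vector nil nil))
         (remaining 2)
         (collector (lambda (index)
                      ;; Build the callback for result slot INDEX.
                      (lambda (result)
                        (aset results index result)
                        (setq remaining (1- remaining))
                        (when (zerop remaining)
                          (funcall callback
                                   (aref results 0)
                                   (aref results 1)))))))
    (funcall start-a (funcall collector 0))
    (funcall start-b (funcall collector 1))))

;; Hypothetical usage with the callback-based fetcher above, where
;; url-a and url-b stand in for two real URLs:
;;
;; (demo-join-2 (lambda (cb) (fetch-fortune-2 url-a cb))
;;              (lambda (cb) (fetch-fortune-2 url-b cb))
;;              (lambda (a b) (message "%s\n%s" a b)))
</code></pre></div></div>

<p>Even this small case needs shared mutable state and a countdown, and it
still has no story for errors.</p>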

<p><em>Wouldn’t it be nice for the function to work like the first example,
but be asynchronous like the second example?</em> Enter async/await:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">fetch-fortune-3</span> <span class="p">(</span><span class="nv">url</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-url-retrieve</span> <span class="nv">url</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>A function defined with <code class="language-plaintext highlighter-rouge">aio-defun</code> is just like <code class="language-plaintext highlighter-rouge">defun</code> except that
it can use <code class="language-plaintext highlighter-rouge">aio-await</code> to pause and wait on any other function defined
with <code class="language-plaintext highlighter-rouge">aio-defun</code> — or, more specifically, any function that returns a
promise. Borrowing Python parlance: Returning a promise makes a
function <em>awaitable</em>. If there’s an error, it’s delivered as an error
signal from <code class="language-plaintext highlighter-rouge">aio-url-retrieve</code>, just like the first example. When
called, this function returns immediately with a promise object that
represents a future result. The caller might look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defcustom</span> <span class="nv">fortune-url</span> <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">display-fortune</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"%s"</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">fetch-fortune-3</span> <span class="nv">fortune-url</span><span class="p">))))</span>
</code></pre></div></div>

<p>How wonderfully clean that looks! And, yes, it even works with
<code class="language-plaintext highlighter-rouge">interactive</code> like that. I can <code class="language-plaintext highlighter-rouge">M-x display-fortune</code> and a fortune is
printed in the minibuffer as soon as the result arrives from the
server. In the meantime Emacs doesn’t block and I can continue my
work.</p>

<p>You can’t do anything you couldn’t already do before. It’s just a
nicer way to organize the same callbacks: <em>implicit</em> rather than
<em>explicit</em>.</p>

<h3 id="promises-simplified">Promises, simplified</h3>

<p>The core object at play is the <em>promise</em>. Promises are already a
rather simple concept, but <code class="language-plaintext highlighter-rouge">aio</code> promises have been distilled to their
essence, as they’re only needed for this singular purpose. More on
this later.</p>

<p>As I said, a promise represents a future result. In practical terms, a
promise is just an object to which one can subscribe with a callback.
When the result is ready, the callbacks are invoked. Another way to
put it is that <em>promises <a href="https://en.wikipedia.org/wiki/Reification_(computer_science)">reify</a> the concept of callbacks</em>. A
callback is no longer just the idea of an extra argument to a function.
It’s a first-class <em>thing</em> that itself can be passed around as a
value.</p>

<p>Promises have two slots: the final promise <em>result</em> and a list of
<em>subscribers</em>. A <code class="language-plaintext highlighter-rouge">nil</code> result means the result hasn’t been computed
yet. It’s so simple I’m not even <a href="/blog/2018/02/14/">bothering with <code class="language-plaintext highlighter-rouge">cl-struct</code></a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-promise</span> <span class="p">()</span>
  <span class="s">"Create a new promise object."</span>
  <span class="p">(</span><span class="nv">record</span> <span class="ss">'aio-promise</span> <span class="no">nil</span> <span class="p">()))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">aio-promise-p</span> <span class="p">(</span><span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'aio-promise</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">object</span><span class="p">))</span>
       <span class="p">(</span><span class="nb">=</span> <span class="mi">3</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">object</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">aio-result</span> <span class="p">(</span><span class="nv">promise</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>To subscribe to a promise, use <code class="language-plaintext highlighter-rouge">aio-listen</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-listen</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">aio-result</span> <span class="nv">promise</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="nv">result</span>
        <span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="no">nil</span> <span class="nv">callback</span> <span class="nv">result</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">push</span> <span class="nv">callback</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">)))))</span>
</code></pre></div></div>

<p>If the result isn’t ready yet, add the callback to the list of
subscribers. If the result is ready, <em>call the callback in the next
event loop turn</em> using <code class="language-plaintext highlighter-rouge">run-at-time</code>. This is important because it
keeps all the asynchronous components isolated from one another. They
won’t see each other’s frames on the call stack, nor frames from
<code class="language-plaintext highlighter-rouge">aio</code>. This is so important that the <a href="https://promisesaplus.com/">Promises/A+ specification</a>
is explicit about it.</p>
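
<p>The “next event loop turn” behavior can be observed in isolation. This
is only a sketch: <code class="language-plaintext highlighter-rouge">demo-turn-order</code> is a hypothetical name, and it
assumes <code class="language-plaintext highlighter-rouge">lexical-binding</code> is <code class="language-plaintext highlighter-rouge">t</code>. A zero-delay timer is scheduled
first, yet its callback cannot run until the caller gives control back
to the event loop:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun demo-turn-order ()
  "Return the order in which the caller and a zero-delay timer ran."
  (let ((log '()))
    (run-at-time 0 nil (lambda () (push 'callback log)))
    ;; The timer is only scheduled at this point; it has not run yet.
    (push 'caller log)
    ;; Yield to the event loop briefly so that pending timers can fire.
    (accept-process-output nil 0.1)
    (nreverse log)))
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">(demo-turn-order)</code> should produce <code class="language-plaintext highlighter-rouge">(caller callback)</code>: the
code that scheduled the callback always finishes its own turn first.</p>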

<p>The other half of the equation is resolving a promise, which is done
with <code class="language-plaintext highlighter-rouge">aio-resolve</code>. Unlike other promises, <code class="language-plaintext highlighter-rouge">aio</code> promises don’t care
whether the promise is being <em>fulfilled</em> (success) or <em>rejected</em>
(error). Instead a promise is resolved using a <em>value function</em> — or,
usually, a <em>value closure</em>. Subscribers receive this value function
and extract the value by invoking it with no arguments.</p>

<p>Why? This lets the promise’s resolver decide the semantics of the
result. Instead of returning a value, this function can instead signal
an error, propagating an error signal that terminated an async function.
Because of this, the promise doesn’t need to know how it’s being
resolved.</p>
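
<p>As a small sketch of the subscriber’s side of this arrangement, the
hypothetical helper <code class="language-plaintext highlighter-rouge">demo-extract</code> below (not part of <code class="language-plaintext highlighter-rouge">aio</code>) invokes
a value function and reports whether it produced a value or signaled:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun demo-extract (value-function)
  "Invoke VALUE-FUNCTION the way a promise subscriber would.
Return (ok VALUE) on success or (failed CONDITION) on an error signal."
  (condition-case err
      (list 'ok (funcall value-function))
    (error (list 'failed (car err)))))

;; A value function that simply returns its value.
(demo-extract (lambda () 42))
;; returns (ok 42)

;; A value function that signals instead of returning.
(demo-extract (lambda () (signal 'arith-error nil)))
;; returns (failed arith-error)
</code></pre></div></div>

<p>The subscriber uses the same <code class="language-plaintext highlighter-rouge">funcall</code> in both cases. Success and
failure differ only in what the closure does when invoked.</p>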

<p>When a promise is resolved, subscribers are each scheduled in their own
event loop turns in the same order that they subscribed. If a promise
has already been resolved, nothing happens. (Thought: Perhaps this
should be an error in order to catch API misuse?)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-resolve</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">value-function</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">aio-result</span> <span class="nv">promise</span><span class="p">)</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">callbacks</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">))))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">value-function</span>
            <span class="p">(</span><span class="nb">aref</span> <span class="nv">promise</span> <span class="mi">2</span><span class="p">)</span> <span class="p">())</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">callback</span> <span class="nv">callbacks</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="no">nil</span> <span class="nv">callback</span> <span class="nv">value-function</span><span class="p">)))))</span>
</code></pre></div></div>

<p>If you’re not an async function, you might subscribe to a promise like
so:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">promise</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">message</span> <span class="s">"%s"</span> <span class="p">(</span><span class="nb">funcall</span> <span class="nv">v</span><span class="p">))))</span>
</code></pre></div></div>

<p>The simplest example of a non-async function that creates and delivers
on a promise is a “sleep” function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-sleep</span> <span class="p">(</span><span class="nv">seconds</span> <span class="k">&amp;optional</span> <span class="nv">result</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">value-function</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
      <span class="p">(</span><span class="nv">run-at-time</span> <span class="nv">seconds</span> <span class="no">nil</span>
                   <span class="nf">#'</span><span class="nv">aio-resolve</span> <span class="nv">promise</span> <span class="nv">value-function</span><span class="p">))))</span>
</code></pre></div></div>

<p>Similarly, here’s a “timeout” promise that delivers a special timeout
error signal at a given time in the future.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-timeout</span> <span class="p">(</span><span class="nv">seconds</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">value-function</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nb">signal</span> <span class="ss">'aio-timeout</span> <span class="no">nil</span><span class="p">))))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
      <span class="p">(</span><span class="nv">run-at-time</span> <span class="nv">seconds</span> <span class="no">nil</span>
                   <span class="nf">#'</span><span class="nv">aio-resolve</span> <span class="nv">promise</span> <span class="nv">value-function</span><span class="p">))))</span>
</code></pre></div></div>

<p>That’s all there is to promises.</p>

<h3 id="evaluate-in-the-context-of-a-promise">Evaluate in the context of a promise</h3>

<p>Before we get into pausing functions, let’s deal with the slightly
simpler matter of delivering their return values using a promise. What
we need is a way to evaluate a “body” and capture its result in a
promise. If the body exits due to a signal, we want to capture that as
well.</p>

<p>Here’s a macro that does just this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-with-promise</span> <span class="p">(</span><span class="nv">promise</span> <span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">aio-resolve</span> <span class="o">,</span><span class="nv">promise</span>
                <span class="p">(</span><span class="nv">condition-case</span> <span class="nb">error</span>
                    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="k">progn</span> <span class="o">,@</span><span class="nv">body</span><span class="p">)))</span>
                      <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">))</span>
                  <span class="p">(</span><span class="nb">error</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
                           <span class="p">(</span><span class="nb">signal</span> <span class="p">(</span><span class="nb">car</span> <span class="nb">error</span><span class="p">)</span> <span class="c1">; rethrow</span>
                                   <span class="p">(</span><span class="nb">cdr</span> <span class="nb">error</span><span class="p">)))))))</span>
</code></pre></div></div>

<p>The body result is captured in a closure and delivered to the promise.
If there’s an error signal, it’s “<em>rethrown</em>” into subscribers by the
promise’s value function.</p>

<p>This is where Emacs Lisp has a serious weak spot. There’s not really a
concept of rethrowing a signal. Unlike a language with explicit
exception objects that can capture a snapshot of the backtrace, the
original backtrace is completely lost where the signal is caught.
There’s no way to “reattach” it to the signal when it’s rethrown. This
is unfortunate because it would greatly help debugging if you got to see
the full backtrace on the other side of the promise.</p>

<h3 id="async-functions">Async functions</h3>

<p>So we have promises and we want to pause a function on a promise.
Generators have <code class="language-plaintext highlighter-rouge">iter-yield</code> for pausing an iterator’s execution. To
tackle this problem:</p>

<ol>
  <li>Yield the promise to pause the iterator.</li>
  <li>Subscribe a callback on the promise that continues the generator
(<code class="language-plaintext highlighter-rouge">iter-next</code>) with the promise’s result as the yield result.</li>
</ol>

<p>All the hard work is done on either side of the yield, so <code class="language-plaintext highlighter-rouge">aio-await</code> is
just a simple wrapper around <code class="language-plaintext highlighter-rouge">iter-yield</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-await</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">iter-yield</span> <span class="o">,</span><span class="nv">expr</span><span class="p">)))</span>
</code></pre></div></div>

<p>Remember that <code class="language-plaintext highlighter-rouge">funcall</code> is here to extract the promise value from the
value function. If it signals an error, this propagates directly into
the iterator just as if it had been a direct call — minus an accurate
backtrace.</p>
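
<p>The generator mechanics this relies on can be seen without any
promises. In the sketch below, <code class="language-plaintext highlighter-rouge">demo-echo</code> and <code class="language-plaintext highlighter-rouge">demo-drive</code> are
hypothetical names for illustration, and <code class="language-plaintext highlighter-rouge">lexical-binding</code> is assumed.
The second argument to <code class="language-plaintext highlighter-rouge">iter-next</code> becomes the value of the pending
<code class="language-plaintext highlighter-rouge">iter-yield</code> inside the generator:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'generator)

;; Yield the symbol `question', then return whatever was sent back.
(iter-defun demo-echo ()
  (let ((answer (iter-yield 'question)))
    (list 'got answer)))

(defun demo-drive ()
  "Drive `demo-echo' by hand; return what it yielded and returned."
  (let* ((iter (demo-echo))
         (yielded (iter-next iter)))   ; run up to the first yield
    (condition-case done
        ;; Resume: inside the generator, `iter-yield' now returns 42.
        (iter-next iter 42)
      (iter-end-of-sequence
       ;; The generator's return value is the signal's data.
       (list yielded (cdr done))))))
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">(demo-drive)</code> should evaluate to <code class="language-plaintext highlighter-rouge">(question (got 42))</code>. An async
function works the same way, except that the yielded value is a promise
and the value sent back in is that promise’s value function.</p>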

<p>So <code class="language-plaintext highlighter-rouge">aio-lambda</code> / <code class="language-plaintext highlighter-rouge">aio-defun</code> needs to wrap the body in a generator
(<code class="language-plaintext highlighter-rouge">iter-lambda</code>), invoke it to produce an iterator, then drive the
iterator using callbacks. Here’s a simplified, unhygienic definition of
<code class="language-plaintext highlighter-rouge">aio-lambda</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">aio-lambda</span> <span class="p">(</span><span class="nv">arglist</span> <span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promise</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">))</span>
           <span class="p">(</span><span class="nv">iter</span> <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="nv">iter-lambda</span> <span class="o">,</span><span class="nv">arglist</span>
                          <span class="p">(</span><span class="nv">aio-with-promise</span> <span class="nv">promise</span>
                            <span class="o">,@</span><span class="nv">body</span><span class="p">))</span>
                        <span class="nv">args</span><span class="p">)))</span>
       <span class="p">(</span><span class="nb">prog1</span> <span class="nv">promise</span>
         <span class="p">(</span><span class="nv">aio--step</span> <span class="nv">iter</span> <span class="nv">promise</span> <span class="no">nil</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The body is evaluated inside <code class="language-plaintext highlighter-rouge">aio-with-promise</code> with the result
delivered to the promise returned directly by the async function.</p>

<p>Before returning, the iterator is handed to <code class="language-plaintext highlighter-rouge">aio--step</code>, which drives
the iterator forward until it delivers its first promise. When the
iterator yields a promise, <code class="language-plaintext highlighter-rouge">aio--step</code> attaches a callback back to
itself on the promise as described above. Immediately driving the
iterator up to the first yielded promise “primes” it, which is
important for getting the ball rolling on any asynchronous operations.</p>

<p>If the iterator ever yields something other than a promise, it’s
delivered right back into the iterator.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio--step</span> <span class="p">(</span><span class="nv">iter</span> <span class="nv">promise</span> <span class="nv">yield-result</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">condition-case</span> <span class="nv">_</span>
      <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">result</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">iter-next</span> <span class="nv">iter</span> <span class="nv">yield-result</span><span class="p">)</span>
               <span class="nv">then</span> <span class="p">(</span><span class="nv">iter-next</span> <span class="nv">iter</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">result</span><span class="p">))</span>
               <span class="nv">until</span> <span class="p">(</span><span class="nv">aio-promise-p</span> <span class="nv">result</span><span class="p">)</span>
               <span class="nv">finally</span> <span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">result</span>
                                   <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">value</span><span class="p">)</span>
                                     <span class="p">(</span><span class="nv">aio--step</span> <span class="nv">iter</span> <span class="nv">promise</span> <span class="nv">value</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">iter-end-of-sequence</span><span class="p">)))</span>
</code></pre></div></div>

<p>When the iterator is done, nothing more needs to happen since the
iterator resolves its own return value promise.</p>

<p>The definition of <code class="language-plaintext highlighter-rouge">aio-defun</code> just uses <code class="language-plaintext highlighter-rouge">aio-lambda</code> with <code class="language-plaintext highlighter-rouge">defalias</code>.
There’s nothing to it.</p>

<p>That’s everything you need! Everything else in the package is merely
useful, awaitable functions like <code class="language-plaintext highlighter-rouge">aio-sleep</code> and <code class="language-plaintext highlighter-rouge">aio-timeout</code>.</p>

<h3 id="composing-promises">Composing promises</h3>

<p>Unfortunately <code class="language-plaintext highlighter-rouge">url-retrieve</code> doesn’t support timeouts. We can work
around this by composing two promises: a <code class="language-plaintext highlighter-rouge">url-retrieve</code> promise and
an <code class="language-plaintext highlighter-rouge">aio-timeout</code> promise. First define a promise-returning function,
<code class="language-plaintext highlighter-rouge">aio-select</code>, that takes a list of promises and returns (as another
promise) the first promise to resolve:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">aio-select</span> <span class="p">(</span><span class="nv">promises</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">aio-promise</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">result</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">promise</span> <span class="nv">promises</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">aio-listen</span> <span class="nv">promise</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span>
                              <span class="p">(</span><span class="nv">aio-resolve</span>
                               <span class="nv">result</span>
                               <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">promise</span><span class="p">))))))))</span>
</code></pre></div></div>

<p>We give <code class="language-plaintext highlighter-rouge">aio-select</code> both our <code class="language-plaintext highlighter-rouge">url-retrieve</code> and <code class="language-plaintext highlighter-rouge">timeout</code> promises, and
it tells us which resolved first:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">fetch-fortune-4</span> <span class="p">(</span><span class="nv">url</span> <span class="nv">timeout</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">promises</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">aio-url-retrieve</span> <span class="nv">url</span><span class="p">)</span>
                         <span class="p">(</span><span class="nv">aio-timeout</span> <span class="nv">timeout</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">fastest</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-select</span> <span class="nv">promises</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">buffer</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="nv">fastest</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buffer</span>
      <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">kill-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Cool! Note: This will not actually cancel the URL request, just move
the async function forward earlier and prevent it from getting the
result.</p>
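
<p>If the timeout wins the race, awaiting it has to produce something. Here’s
a minimal sketch of handling that case, assuming that awaiting a timed-out
<code class="language-plaintext highlighter-rouge">aio-timeout</code> promise signals an <code class="language-plaintext highlighter-rouge">aio-timeout</code> error and that
<code class="language-plaintext highlighter-rouge">condition-case</code> may wrap an <code class="language-plaintext highlighter-rouge">aio-await</code>. The wrapper name
<code class="language-plaintext highlighter-rouge">my-fetch-fortune-or-default</code> is hypothetical:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'aio)

;; Hypothetical wrapper around fetch-fortune-4 from above.
;; Assumption: awaiting a timed-out promise signals `aio-timeout'.
(aio-defun my-fetch-fortune-or-default (url timeout default)
  (condition-case nil
      (aio-await (fetch-fortune-4 url timeout))
    ;; The timeout resolved first: fall back to DEFAULT.
    (aio-timeout default)))
</code></pre></div></div>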

<h3 id="threads">Threads</h3>

<p>Despite <code class="language-plaintext highlighter-rouge">aio</code> being entirely about managing concurrent, asynchronous
operations, it has nothing at all to do with threads — as in Emacs 26’s
support for kernel threads. All async functions and promise callbacks
are expected to run <em>only</em> on the main thread. That’s not to say an
async function can’t await on a result from another thread. It just must
be <a href="/blog/2017/02/14/">done very carefully</a>.</p>

<h3 id="processes">Processes</h3>

<p>The package also includes two functions for realizing promises on
processes, whether they be subprocesses or network sockets.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">aio-process-filter</code></li>
  <li><code class="language-plaintext highlighter-rouge">aio-process-sentinel</code></li>
</ul>

<p>For example, this function loops over each chunk of output (typically
4kB) from the process, as delivered to a filter function:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">process-chunks</span> <span class="p">(</span><span class="nv">process</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">chunk</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-process-filter</span> <span class="nv">process</span><span class="p">))</span>
           <span class="nv">while</span> <span class="nv">chunk</span>
           <span class="nb">do</span> <span class="p">(</span><span class="o">...</span> <span class="nv">process</span> <span class="nv">chunk</span> <span class="o">...</span><span class="p">)))</span>
</code></pre></div></div>

<p>Exercise for the reader: Write an awaitable function that returns a line
at a time rather than a chunk at a time. You can build it on top of
<code class="language-plaintext highlighter-rouge">aio-process-filter</code>.</p>

<p>I considered wrapping functions like <code class="language-plaintext highlighter-rouge">start-process</code> so that their <code class="language-plaintext highlighter-rouge">aio</code>
versions would return a promise representing some kind of result from
the process. However there are <em>so</em> many different ways to create and
configure processes that I would have ended up duplicating all the
process functions. Focusing on the filter and sentinel, and letting the
caller create and configure the process is much cleaner.</p>

<p>Unfortunately Emacs has no asynchronous API for writing output to a
process. Both <code class="language-plaintext highlighter-rouge">process-send-string</code> and <code class="language-plaintext highlighter-rouge">process-send-region</code> will block
if the pipe or socket is full. There is no callback, so you cannot await
on writing output. Maybe there’s a way to do it with a dedicated thread?</p>

<p>Another issue is that the <code class="language-plaintext highlighter-rouge">process-send-*</code> functions <a href="/blog/2013/01/14/">are
preemptible</a>, made necessary because they block. The
<code class="language-plaintext highlighter-rouge">aio-process-*</code> functions leave a gap (i.e. between filter awaits)
where no filter or sentinel function is attached. It’s a consequence
of promises being single-fire. The gap is harmless so long as the
async function doesn’t await something else or get preempted. This
needs some more thought.</p>

<p><strong><em>Update</em></strong>: These process functions no longer exist and have been
replaced by a small framework for building chains of promises. See
<code class="language-plaintext highlighter-rouge">aio-make-callback</code>.</p>

<h3 id="testing-aio">Testing aio</h3>

<p>The test suite for <code class="language-plaintext highlighter-rouge">aio</code> is a bit unusual. Emacs’ built-in test suite,
ERT, doesn’t support asynchronous tests. Furthermore, tests are
generally run in batch mode, where Emacs invokes a single function and
then exits rather than pump an event loop. Batch mode can only handle
asynchronous process I/O, not the async functions of <code class="language-plaintext highlighter-rouge">aio</code>. So it’s
not possible to run the tests in batch mode.</p>

<p>Instead I hacked together a really crude callback-based test suite. It
runs in non-batch mode and writes the test results into a buffer
(run with <code class="language-plaintext highlighter-rouge">make check</code>). Not ideal, but it works.</p>

<p>One of the tests is a sleep sort (with reasonable tolerances). It’s a
pretty neat demonstration of what you can do with <code class="language-plaintext highlighter-rouge">aio</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">sleep-sort</span> <span class="p">(</span><span class="nb">values</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">promises</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nv">aio-sleep</span> <span class="nv">v</span> <span class="nv">v</span><span class="p">))</span> <span class="nb">values</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">while</span> <span class="nv">promises</span>
             <span class="nv">for</span> <span class="nv">next</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">aio-select</span> <span class="nv">promises</span><span class="p">))</span>
             <span class="nb">do</span> <span class="p">(</span><span class="nb">setf</span> <span class="nv">promises</span> <span class="p">(</span><span class="nv">delq</span> <span class="nv">next</span> <span class="nv">promises</span><span class="p">))</span>
             <span class="nv">collect</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="nv">next</span><span class="p">))))</span>
</code></pre></div></div>

<p>To see it in action (<code class="language-plaintext highlighter-rouge">M-x sleep-sort-demo</code>):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">aio-defun</span> <span class="nv">sleep-sort-demo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">values</span> <span class="o">'</span><span class="p">(</span><span class="mf">0.1</span> <span class="mf">0.4</span> <span class="mf">1.1</span> <span class="mf">0.2</span> <span class="mf">0.8</span> <span class="mf">0.6</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">message</span> <span class="s">"%S"</span> <span class="p">(</span><span class="nv">aio-await</span> <span class="p">(</span><span class="nv">sleep-sort</span> <span class="nb">values</span><span class="p">)))))</span>
</code></pre></div></div>

<h3 id="asyncawait-is-pretty-awesome">Async/await is pretty awesome</h3>

<p>I’m quite happy with how this all came together. Once I had the
concepts straight — particularly resolving to value functions —
everything made sense and all the parts fit together well, and mostly
by accident. That feels good.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs 26 Brings Generators and Threads</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/05/31/"/>
    <id>urn:uuid:395c5e11-2088-32fa-53c8-0c749dca2064</id>
    <updated>2018-05-31T17:45:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>Emacs 26.1 was <a href="https://lists.gnu.org/archive/html/emacs-devel/2018-05/msg00765.html">recently released</a>. As you would expect from a
major release, it comes with lots of new goodies. Being <a href="/tags/emacs/">a bit of an
Emacs Lisp enthusiast</a>, the two most interesting new features
are <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Generators.html">generators</a> (<code class="language-plaintext highlighter-rouge">iter</code>) and <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Threads.html">native threads</a>
(<code class="language-plaintext highlighter-rouge">thread</code>).</p>

<p><strong>Correction</strong>: Generators were actually introduced in Emacs 25.1
(Sept. 2016), not Emacs 26.1. Doh!</p>

<p><strong>Update</strong>: <a href="https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual">ThreadSanitizer (TSan)</a> quickly shows that Emacs’
threading implementation has many data races, making it <a href="https://hboehm.info/boehm-hotpar11.pdf">completely
untrustworthy</a>. Until this is fixed, <strong><em>nobody</em> should use Emacs
threads for any purpose</strong>, and threads should be disabled at compile time.</p>

<!--more-->

<h3 id="generators">Generators</h3>

<p>Generators are one of those cool language features that provide a lot of
power at a small implementation cost. They’re like a constrained form of
coroutines, but, unlike coroutines, they’re typically built entirely on
top of first-class functions (e.g. closures). This means <em>no additional
run-time support is needed</em> in order to add generators to a language.
The only complications are the changes to the compiler. Generators are
not compiled the same way as normal functions despite looking so
similar.</p>

<p>What’s perhaps coolest of all about lisp-family generators, including
Emacs Lisp, is that the compiler component can be <em>implemented
entirely with macros</em>. The compiler need not be modified at all,
making generators no more than a library, and not actually part of the
language. That’s exactly how they’ve been implemented in Emacs Lisp
(<code class="language-plaintext highlighter-rouge">emacs-lisp/generator.el</code>).</p>

<p>So what’s a generator? It’s a function that returns an <em>iterator
object</em>. When an iterator object is invoked (e.g. <code class="language-plaintext highlighter-rouge">iter-next</code>) it
evaluates the body of the generator. Each iterator is independent.
What makes them unusual (and useful) is that the evaluation is
<em>paused</em> in the middle of the body to return a value, saving all the
internal state in the iterator. Normally pausing in the middle of
functions isn’t possible, which is what requires the special compiler
support.</p>

<p>Emacs Lisp generators appear to be most closely modeled after <a href="https://wiki.python.org/moin/Generators">Python
generators</a>, though they also share some similarities with
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_Generators">JavaScript generators</a>. What makes them most like Python is the use
of signals for flow control — something I’m <a href="http://wiki.c2.com/?DontUseExceptionsForFlowControl">not personally enthused
about</a>. When a Python generator
completes, it raises a <code class="language-plaintext highlighter-rouge">StopIteration</code> exception. In Emacs Lisp, it’s
an <code class="language-plaintext highlighter-rouge">iter-end-of-sequence</code> signal. A signal is out-of-band and avoids
the issue of relying on some special in-band value to communicate the end
of iteration.</p>

<p>In contrast, JavaScript’s solution is to return a “rich” object wrapping
the actual yield value. This object has a <code class="language-plaintext highlighter-rouge">done</code> field that communicates
whether iteration has completed. This avoids the use of exceptions for
flow control, but the caller has to unpack the rich object.</p>

<p>Fortunately the flow control issue isn’t normally exposed to Emacs Lisp
code. Most of the time you’ll use the <code class="language-plaintext highlighter-rouge">iter-do</code> macro or (my preference)
the new <code class="language-plaintext highlighter-rouge">cl-loop</code> keyword <code class="language-plaintext highlighter-rouge">iter-by</code>.</p>

<p>To illustrate how a generator works, here’s a really simple iterator
that iterates over a list:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">iter-defun</span> <span class="nv">walk</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">while</span> <span class="nb">list</span>
    <span class="p">(</span><span class="nv">iter-yield</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>Here’s how it might be used:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">i</span> <span class="p">(</span><span class="nv">walk</span> <span class="o">'</span><span class="p">(</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="ss">:c</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; =&gt; :a</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; =&gt; :b</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; =&gt; :c</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="nv">i</span><span class="p">)</span>  <span class="c1">; error: iter-end-of-sequence</span>
</code></pre></div></div>
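
<p>For comparison, here’s a small sketch of the two higher-level forms
mentioned earlier, <code class="language-plaintext highlighter-rouge">iter-do</code> and the <code class="language-plaintext highlighter-rouge">iter-by</code> keyword for <code class="language-plaintext highlighter-rouge">cl-loop</code>.
Neither exposes the end-of-sequence signal to the caller. The generator
<code class="language-plaintext highlighter-rouge">my-count-range</code> is a hypothetical name made up for this example, and the
code assumes lexical binding:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'generator)

;; Hypothetical generator: yields START, START+1, ..., END.
(iter-defun my-count-range (start end)
  (while (&lt;= start end)
    (iter-yield start)
    (setq start (1+ start))))

;; cl-loop drives the iterator and stops at the end of the sequence.
(cl-loop for n iter-by (my-count-range 1 4)
         collect (* n n))  ; =&gt; (1 4 9 16)

;; iter-do is the dolist-like equivalent.
(let ((sum 0))
  (iter-do (n (my-count-range 1 4))
    (setq sum (+ sum n)))
  sum)  ; =&gt; 10
</code></pre></div></div>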

<p>The iterator object itself is <em>opaque</em> and you shouldn’t rely on any
part of its structure. That being said, I’m a firm believer that we
should understand how things work underneath the hood so that we can
make the most effective use of them. No program should rely on the
particulars of the iterator object internals for <em>correctness</em>, but a
well-written program should employ them in a way that <a href="/blog/2017/01/30/">best exploits
their expected implementation</a>.</p>

<p>Currently iterator objects are closures, and <code class="language-plaintext highlighter-rouge">iter-next</code> invokes the
closure with its own internal protocol. It asks the closure to return
the next value (<code class="language-plaintext highlighter-rouge">:next</code> operation), and <code class="language-plaintext highlighter-rouge">iter-close</code> asks it to clean
itself up (<code class="language-plaintext highlighter-rouge">:close</code> operation).</p>

<p>Since they’re just closures, another <em>really</em> cool thing about Emacs
Lisp generators is that <a href="/blog/2013/12/30/">iterator objects are generally readable</a>.
That is, you can serialize them out with <code class="language-plaintext highlighter-rouge">print</code> and bring them back to
life with <code class="language-plaintext highlighter-rouge">read</code>, even in another instance of Emacs. They exist
independently of the original generator function. This will not work if
one of the values captured in the iterator object is not readable (e.g.
buffers).</p>

<p>How does pausing work? Well, one of the other exciting new features of
Emacs 26 is the introduction of a jump table opcode, <code class="language-plaintext highlighter-rouge">switch</code>. I’d
lamented in the past that large <code class="language-plaintext highlighter-rouge">cond</code> and <code class="language-plaintext highlighter-rouge">cl-case</code> expressions could
be a lot more efficient if Emacs’ byte code supported jump tables. It
turns an O(n) sequence of comparisons into an O(1) lookup and jump.
It’s essentially the perfect foundation for a generator since it can
be used to <a href="https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html">jump straight back to the position</a> where evaluation
was paused.</p>

<p><em>Buuut</em>, generators do not currently use jump tables. The generator
library predates the new <code class="language-plaintext highlighter-rouge">switch</code> opcode, and, being independent of it,
its author, Daniel Colascione, went with the best option at the time.
Chunks of code between yields are packaged as individual closures. These
closures are linked together a bit like nodes in a graph, creating a
sort of state machine. To get the next value, the iterator object
invokes the closure representing the next state.</p>

<p>I’ve <em>manually</em> macro expanded the <code class="language-plaintext highlighter-rouge">walk</code> generator above into a form
that <em>roughly</em> resembles the expansion of <code class="language-plaintext highlighter-rouge">iter-defun</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">walk</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">state</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">cl-flet*</span> <span class="p">((</span><span class="nv">state-2</span> <span class="p">()</span>
                 <span class="p">(</span><span class="nb">signal</span> <span class="ss">'iter-end-of-sequence</span> <span class="no">nil</span><span class="p">))</span>
               <span class="p">(</span><span class="nv">state-1</span> <span class="p">()</span>
                 <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">)</span>
                   <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                     <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="nf">#'</span><span class="nv">state-2</span><span class="p">))))</span>
               <span class="p">(</span><span class="nv">state-0</span> <span class="p">()</span>
                 <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">state-2</span><span class="p">)</span>
                   <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="nf">#'</span><span class="nv">state-1</span><span class="p">)</span>
                   <span class="p">(</span><span class="nv">state-1</span><span class="p">))))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="nf">#'</span><span class="nv">state-0</span><span class="p">)</span>
      <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
        <span class="p">(</span><span class="nb">funcall</span> <span class="nv">state</span><span class="p">)))))</span>
</code></pre></div></div>

<p>This omits the protocol I mentioned, and it doesn’t have yield results
(values passed to the iterator). The actual expansion is a whole lot
messier and less optimal than this, but hopefully my hand-rolled
generator is illustrative enough. Without the protocol, this iterator is
stepped using <code class="language-plaintext highlighter-rouge">funcall</code> rather than <code class="language-plaintext highlighter-rouge">iter-next</code>.</p>

<p>The <code class="language-plaintext highlighter-rouge">state</code> variable keeps track of where in the body of the generator
this iterator is currently “paused.” Continuing the iterator is
therefore just a matter of invoking the closure that represents this
state. Each state closure may update <code class="language-plaintext highlighter-rouge">state</code> to point to a new part of
the generator body. The terminal state is obviously <code class="language-plaintext highlighter-rouge">state-2</code>. Notice
how state transitions occur around branches.</p>
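
<p>As for the yield results left out of the hand-rolled version, here’s a
rough sketch of what they look like with the real library: the optional
second argument to <code class="language-plaintext highlighter-rouge">iter-next</code> becomes the return value of the paused
<code class="language-plaintext highlighter-rouge">iter-yield</code>. The names <code class="language-plaintext highlighter-rouge">my-running-total</code> and <code class="language-plaintext highlighter-rouge">my-it</code> are hypothetical:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'generator)

;; Hypothetical generator: each value passed to iter-next is added to
;; a running total, which is then yielded back to the caller.
(iter-defun my-running-total ()
  (let ((total 0))
    (while t
      (let ((sent (iter-yield total)))
        (when sent
          (setq total (+ total sent)))))))

(setq my-it (my-running-total))
(iter-next my-it)     ; =&gt; 0   (runs up to the first yield)
(iter-next my-it 5)   ; =&gt; 5   (the paused iter-yield returns 5)
(iter-next my-it 10)  ; =&gt; 15
(iter-close my-it)    ; clean up, since this generator never finishes
</code></pre></div></div>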

<p>I had said generators can be implemented as a library in Emacs Lisp.
Unfortunately there’s a hole in this: <code class="language-plaintext highlighter-rouge">unwind-protect</code>. It’s not valid to
yield inside an <code class="language-plaintext highlighter-rouge">unwind-protect</code> form. Unlike, say, a throw-catch,
there’s no mechanism to trap an unwinding stack so that it can be
restarted later. The state closure needs to return and fall through the
<code class="language-plaintext highlighter-rouge">unwind-protect</code>.</p>

<p>A jump table version of the generator might look like the following.
I’ve used <code class="language-plaintext highlighter-rouge">cl-labels</code> since it allows for recursion.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">walk</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">state</span> <span class="mi">0</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">cl-labels</span>
        <span class="p">((</span><span class="nv">closure</span> <span class="p">()</span>
           <span class="p">(</span><span class="nv">cl-case</span> <span class="nv">state</span>
             <span class="p">(</span><span class="mi">0</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                    <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="mi">2</span><span class="p">)</span>
                  <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="mi">1</span><span class="p">))</span>
                <span class="p">(</span><span class="nv">closure</span><span class="p">))</span>
             <span class="p">(</span><span class="mi">1</span> <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">)</span>
                  <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">null</span> <span class="nb">list</span><span class="p">)</span>
                    <span class="p">(</span><span class="nb">setf</span> <span class="nv">state</span> <span class="mi">2</span><span class="p">))))</span>
             <span class="p">(</span><span class="mi">2</span> <span class="p">(</span><span class="nb">signal</span> <span class="ss">'iter-end-of-sequence</span> <span class="no">nil</span><span class="p">)))))</span>
      <span class="nf">#'</span><span class="nv">closure</span><span class="p">)))</span>
</code></pre></div></div>

<p>When byte compiled on Emacs 26, that <code class="language-plaintext highlighter-rouge">cl-case</code> is turned into a jump
table. This “switch” form is closer to how generators are implemented in
other languages.</p>

<p>Iterator objects can <a href="/blog/2017/12/14/">share state between themselves</a> if they
close over a common environment (or, of course, use the same global
variables).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">foo</span>
      <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">list</span> <span class="o">'</span><span class="p">(</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="ss">:c</span><span class="p">)))</span>
        <span class="p">(</span><span class="nb">list</span>
         <span class="p">(</span><span class="nb">funcall</span>
          <span class="p">(</span><span class="nv">iter-lambda</span> <span class="p">()</span>
            <span class="p">(</span><span class="nv">while</span> <span class="nb">list</span>
              <span class="p">(</span><span class="nv">iter-yield</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">)))))</span>
         <span class="p">(</span><span class="nb">funcall</span>
          <span class="p">(</span><span class="nv">iter-lambda</span> <span class="p">()</span>
            <span class="p">(</span><span class="nv">while</span> <span class="nb">list</span>
              <span class="p">(</span><span class="nv">iter-yield</span> <span class="p">(</span><span class="nb">pop</span> <span class="nb">list</span><span class="p">))))))))</span>

<span class="p">(</span><span class="nv">iter-next</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="nv">foo</span><span class="p">))</span>  <span class="c1">; =&gt; :a</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">1</span> <span class="nv">foo</span><span class="p">))</span>  <span class="c1">; =&gt; :b</span>
<span class="p">(</span><span class="nv">iter-next</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="nv">foo</span><span class="p">))</span>  <span class="c1">; =&gt; :c</span>
</code></pre></div></div>

<p>For years there has been a <em>very</em> crude way to “pause” a function and
allow other functions to run: <code class="language-plaintext highlighter-rouge">accept-process-output</code>. It only works in
the context of processes, but five years ago this was <a href="/blog/2013/01/14/">sufficient for me
to build primitives on top of it</a>. Unlike this old process
function, generators do not block threads, including the user interface,
which is really important.</p>

<h3 id="threads">Threads</h3>

<p>Emacs 26 also brings us threads, which have been attached in a very
bolted-on fashion. They’re not much more than a subset of pthreads: shared
memory threads, recursive mutexes, and condition variables. The
interfaces look just like they do in pthreads, and there hasn’t been
much done to integrate more naturally into the Emacs Lisp ecosystem.</p>

<p>This is also only the first step in bringing threading to Emacs Lisp.
Right now there’s effectively a global interpreter lock (GIL), and
threads only run one at a time cooperatively. Like with generators, the
Python influence is obvious. In theory, sometime in the future this
interpreter lock will be removed, making way for actual parallelism.</p>

<p>This is, again, where I think it’s useful to contrast with JavaScript,
which was also initially designed to be single-threaded. Low-level
threading primitives weren’t exposed — though mostly because
JavaScript typically runs sandboxed and there’s no safe way to expose
those primitives. Instead it got a <a href="/blog/2013/01/26/">web worker API</a> that exposes
concurrency at a much higher level, along with an efficient interface
for thread coordination.</p>

<p>For Emacs Lisp, I’d prefer something safer, more like the JavaScript
approach. Low-level pthreads are now a great way to wreck Emacs with
deadlocks (with no <code class="language-plaintext highlighter-rouge">C-g</code> escape). Playing around with the new
threading API for just a few days, I’ve already had to restart Emacs a
bunch of times. Bugs in Emacs Lisp are normally a lot more forgiving.</p>

<p>One important detail that has been designed well is that dynamic
bindings are thread-local. This is really essential for correct
behavior. This is also an easy way to create thread-local storage
(TLS): dynamically bind variables in the thread’s entrance function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-counter-tls</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-path-tls</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo-make-thread</span> <span class="p">(</span><span class="nv">path</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">make-thread</span>
   <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">foo-counter-tls</span> <span class="mi">0</span><span class="p">)</span>
           <span class="p">(</span><span class="nv">foo-path-tls</span> <span class="nv">path</span><span class="p">))</span>
       <span class="o">...</span><span class="p">))))</span>
</code></pre></div></div>

<p>However, <strong><code class="language-plaintext highlighter-rouge">cl-letf</code> “bindings” are <em>not</em> thread-local</strong>, which makes
this <a href="/blog/2017/10/27/">otherwise incredibly useful macro</a> quite dangerous in the
presence of threads. This is one way that the new threading API feels
bolted on.</p>
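
<p>As a rough illustration of the difference, here’s a minimal sketch in
which a freshly created thread reports what it sees. Every <code class="language-plaintext highlighter-rouge">demo-</code> name
is hypothetical and exists only for this example. The result is passed
back through a captured variable rather than the return value of
<code class="language-plaintext highlighter-rouge">thread-join</code>, since I’m assuming nothing about which Emacs version is
in use.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical names, for illustration only.
(defvar demo-var 'global)
(defun demo-fn () 'global)

(defun demo-observe ()
  "Return what a freshly created thread sees for `demo-var' and `demo-fn'."
  (let* ((seen nil)
         (thread (make-thread
                  (lambda ()
                    (setq seen (list demo-var (demo-fn)))))))
    (thread-join thread)
    seen))

;; A dynamic binding belongs to the binding thread, so the new thread
;; should still see the global value of demo-var.
(let ((demo-var 'rebound))
  (demo-observe))

;; cl-letf on a place like symbol-function is a temporary global
;; assignment, so the new thread should see the replacement function.
(cl-letf (((symbol-function 'demo-fn) (lambda () 'rebound)))
  (demo-observe))
</code></pre></div></div>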

<h4 id="building-generators-on-threads">Building generators on threads</h4>

<p>In <a href="/blog/2017/06/21/">my stack clashing article</a> I showed a few different ways to
add coroutine support to C. One method spawned per-coroutine threads,
and coordinated using semaphores. With the new threads API in Emacs,
it’s possible to do exactly the same thing.</p>

<p>Since generators are just a limited form of coroutines, this means
threads offer another, <em>very</em> different way to implement them. The
threads API doesn’t provide semaphores, but condition variables can fill
in for them. To “pause” in the middle of the generator, just wait on a
condition variable.</p>
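
<p>To make that concrete, here’s a minimal sketch of a counting
semaphore built from a mutex and a condition variable. The <code class="language-plaintext highlighter-rouge">demo-sem</code>
names are hypothetical, and this isn’t necessarily how <code class="language-plaintext highlighter-rouge">thriter</code> itself
is put together. It only shows that the primitives are sufficient.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical counting semaphore, for illustration only.
(cl-defstruct (demo-sem (:constructor demo-sem--make))
  mutex condvar (count 0))

(defun demo-sem-create (&amp;optional count)
  "Create a semaphore with an initial COUNT (default 0)."
  (let ((mutex (make-mutex)))
    (demo-sem--make :mutex mutex
                    :condvar (make-condition-variable mutex)
                    :count (or count 0))))

(defun demo-sem-post (sem)
  "Increment SEM and wake one waiting thread, if any."
  (with-mutex (demo-sem-mutex sem)
    (cl-incf (demo-sem-count sem))
    (condition-notify (demo-sem-condvar sem))))

(defun demo-sem-wait (sem)
  "Block until SEM is positive, then decrement it."
  (with-mutex (demo-sem-mutex sem)
    ;; Re-check in a loop: being woken up is only a hint that the
    ;; count may have changed.
    (while (zerop (demo-sem-count sem))
      (condition-wait (demo-sem-condvar sem)))
    (cl-decf (demo-sem-count sem))))
</code></pre></div></div>

<p>Be careful calling something like <code class="language-plaintext highlighter-rouge">demo-sem-wait</code> from the main
thread. If nothing ever posts, Emacs just sits there.</p>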

<p>So, naturally, I just had to see if I could make it work. I call it a
“thread iterator” or “thriter.” The API is <em>very</em> similar to <code class="language-plaintext highlighter-rouge">iter</code>:</p>

<p><strong><a href="https://github.com/skeeto/thriter">https://github.com/skeeto/thriter</a></strong></p>

<p>This is merely a proof of concept so don’t actually use this library
for anything. These thread-based generators are about 5x slower than
<code class="language-plaintext highlighter-rouge">iter</code> generators, and they’re a lot more heavy-weight, needing an
entire thread per iterator object. This makes <code class="language-plaintext highlighter-rouge">thriter-close</code> all the
more important. On the other hand, these generators have no problem
yielding inside <code class="language-plaintext highlighter-rouge">unwind-protect</code>.</p>

<p>Originally this article was going to dive into the details of how
these thread-iterators worked, but <code class="language-plaintext highlighter-rouge">thriter</code> turned out to be quite a
bit more complicated than I anticipated, especially as I worked
towards feature matching <code class="language-plaintext highlighter-rouge">iter</code>.</p>

<p>The gist of it is that each side of a next/yield transaction gets its
own condition variable, but the two sides share a common mutex. Values are passed
between the threads using slots on the iterator object. The side that
isn’t currently running waits on a condition variable until the other
side frees it, after which the releaser waits on its own condition
variable for the result. This is similar to <a href="/blog/2017/02/14/">asynchronous requests in
Emacs dynamic modules</a>.</p>

<p>Rather than use signals to indicate completion, I modeled it after
JavaScript generators. Iterators return a cons cell. The car indicates
continuation and the cdr holds the yield result. To terminate an
iterator early (<code class="language-plaintext highlighter-rouge">thriter-close</code> or garbage collection), <code class="language-plaintext highlighter-rouge">thread-signal</code>
is used to essentially “cancel” the thread and knock it off the
condition variable.</p>

<p>Since threads aren’t (and shouldn’t be) garbage collected, failing to
run a thread-iterator to completion would normally cause a memory leak,
as the thread <a href="https://www.youtube.com/watch?v=AK3PWHxoT_E">sits there forever waiting on a “next” that will never
come</a>. To deal with this, a finalizer is attached to the
iterator object in such a way that it’s not visible to the thread. A
lost iterator is eventually cleaned up by the garbage collector, but, as
usual with finalizers, this is <a href="https://utcc.utoronto.ca/~cks/space/blog/programming/GoFinalizersStopLeaks">only a last resort</a>.</p>
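
<p>The general shape of that trick might look something like the
following sketch. The <code class="language-plaintext highlighter-rouge">demo-handle</code> type is hypothetical and this is
not the actual <code class="language-plaintext highlighter-rouge">thriter</code> code. The important part is that the
finalizer’s closure captures only the thread, and the thread’s own
function must never hold a reference to the handle.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical handle type, for illustration only.
(cl-defstruct (demo-handle (:constructor demo-handle--make))
  thread finalizer)

(defun demo-handle-create (thread)
  "Wrap THREAD in a handle that cancels THREAD once the handle is lost."
  ;; The finalizer is created before the handle exists, so its closure
  ;; can only capture THREAD. If it captured the handle, the handle
  ;; could never become garbage.
  (let ((finalizer (make-finalizer
                    (lambda ()
                      (thread-signal thread 'quit nil)))))
    (demo-handle--make :thread thread :finalizer finalizer)))
</code></pre></div></div>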

<h4 id="the-future-of-threads">The future of threads</h4>

<p>This thread-iterator project was my initial, little experiment with
Emacs Lisp threads, similar to why I <a href="/blog/2016/11/05/">connected a joystick to Emacs
using a dynamic module</a>. While I don’t expect the current thread
API to go away, it’s not really suitable for general use in its raw
form. Bugs in Emacs Lisp programs should virtually never bring down
Emacs and require a restart. Outside of threads, the few situations
that break this rule are very easy to avoid (and very obvious that
something dangerous is happening). Dynamic modules are dangerous by
necessity, but concurrency doesn’t have to be.</p>

<p>There really needs to be a safe, high-level API with clean thread
isolation. Perhaps this higher-level API will eventually build on top of
the low-level threading API.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Lisp Lambda Expressions Are Not Self-Evaluating</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/02/22/"/>
    <id>urn:uuid:7a3cd1d1-a48c-3c9b-1564-eacde8b9aa4d</id>
    <updated>2018-02-22T21:30:57Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>This week I made a mistake that ultimately enlightened me about the
nature of function objects in Emacs Lisp. There are three kinds of
function objects, but they each behave very differently when evaluated
as objects.</p>

<p>But before we get to that, let’s talk about one of Emacs’
embarrassing, old missteps: <code class="language-plaintext highlighter-rouge">eval-after-load</code>.</p>

<h3 id="taming-an-old-dragon">Taming an old dragon</h3>

<p>One of the long-standing issues with Emacs is that loading Emacs Lisp
files (.el and .elc) is a slow process, even when those files have
been byte compiled. There are a number of dirty hacks in place to deal
with this issue, and the biggest and nastiest of them all is the
<a href="https://lwn.net/Articles/707615/"><em>dumper</em></a>, also known as <em>unexec</em>.</p>

<p>The Emacs you routinely use throughout the day is actually a previous
instance of Emacs that’s been resurrected from the dead. Your undead
Emacs was probably created months, if not years, earlier, back when it
was originally compiled. The first stage of compiling Emacs is to
compile a minimal C core called <code class="language-plaintext highlighter-rouge">temacs</code>. The second stage is loading
a bunch of Emacs Lisp files, then dumping a memory image in an
unportable, platform-dependent way. On Linux, this actually <a href="https://lwn.net/Articles/707615/">requires
special hooks in glibc</a>. The Emacs you know and love is this
dumped image loaded back into memory, continuing from where it left
off just after it was compiled. Regardless of your own feelings on the
matter, you have to admit <a href="/blog/2011/01/30/">this <em>is</em> a very lispy thing to do</a>.</p>

<p>There are two notable costs to Emacs’ dumper:</p>

<ol>
  <li>
    <p>The dumped image contains hard-coded memory addresses. This means
Emacs can’t be a <em>Position Independent Executable</em> (PIE). It can’t
take advantage of a security feature called <em>Address Space Layout
Randomization</em> (ASLR), which would increase the difficulty of
<a href="/blog/2017/07/19/">exploiting</a> some <a href="/blog/2012/09/28/">classes of bugs</a>. This might be
important to you if Emacs processes untrusted data, such as when it’s
used as <a href="/blog/2013/09/03/">a mail client</a>, <a href="https://github.com/skeeto/emacs-web-server">a web server</a> or generally
<a href="https://github.com/skeeto/elfeed">parses data downloaded across the network</a>.</p>
  </li>
  <li>
    <p>It’s not possible to cross-compile Emacs since it can only be dumped
by running <code class="language-plaintext highlighter-rouge">temacs</code> on its target platform. As an experiment I’ve
attempted to dump the Windows version of Emacs on Linux using
<a href="https://www.winehq.org/">Wine</a>, but was unsuccessful.</p>
  </li>
</ol>

<p>The good news is that there’s <a href="https://lists.gnu.org/archive/html/emacs-devel/2018-02/msg00347.html">a portable dumper</a> in the works
that makes this a lot less nasty. If you’re adventurous, you can
already disable dumping and run <code class="language-plaintext highlighter-rouge">temacs</code> directly by setting
<a href="https://lists.gnu.org/archive/html/bug-gnu-emacs/2016-11/msg00729.html"><code class="language-plaintext highlighter-rouge">CANNOT_DUMP=yes</code> at compile time</a>. Be warned, though, that a
non-dumped Emacs takes several seconds, or worse, to initialize
<em>before</em> it even begins loading your own configuration. It’s also
somewhat buggy since it seems nobody ever runs it this way
productively.</p>

<p>The other major way Emacs users have worked around slow loading is
aggressive use of lazy loading, generally via <em>autoloads</em>. The major
package interactive entry points are defined ahead of time as stub
functions. These stubs, when invoked, load the full package, which
overrides the stub definition, then finally the stub re-invokes the
new definition with the same arguments.</p>

<p>To further assist with lazy loading, an evaluated <code class="language-plaintext highlighter-rouge">defvar</code> form will
not override an existing global variable binding. This means you can,
to a certain extent, configure a package before it’s loaded. The
package will not clobber any existing configuration when it loads.
This also explains the bizarre interfaces for the various hook
functions, like <code class="language-plaintext highlighter-rouge">add-hook</code> and <code class="language-plaintext highlighter-rouge">run-hooks</code>. These accept symbols — the
<em>names</em> of the variables — rather than <em>values</em> of those variables as
would normally be the case. The <code class="language-plaintext highlighter-rouge">add-to-list</code> function does the same
thing. It’s all intended to cooperate with lazy loading, where the
variable may not have been defined yet.</p>
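
<p>Here’s a small sketch of how that plays out. The package and all of
its names (<code class="language-plaintext highlighter-rouge">demo-pkg-greeting</code>, <code class="language-plaintext highlighter-rouge">demo-pkg-mode-hook</code>, <code class="language-plaintext highlighter-rouge">demo-my-setup</code>)
are made up for the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; In your init file, before the (hypothetical) package has loaded:
(setq demo-pkg-greeting "hi")
(add-hook 'demo-pkg-mode-hook #'demo-my-setup)

;; Later, when the package's own file is loaded:
(defvar demo-pkg-greeting "hello"
  "Greeting used by demo-pkg.")
(defvar demo-pkg-mode-hook nil
  "Hook run when demo-pkg-mode starts.")

;; Neither defvar clobbers the earlier configuration:
demo-pkg-greeting    ; =&gt; "hi"
demo-pkg-mode-hook   ; =&gt; (demo-my-setup)
</code></pre></div></div>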

<h4 id="eval-after-load">eval-after-load</h4>

<p>Sometimes this isn’t enough and you need some configuration to
take place after the package has been loaded, but without forcing it
to load early. That is, you need to tell Emacs “evaluate this code
after this particular package loads.” That’s where <code class="language-plaintext highlighter-rouge">eval-after-load</code>
comes into play, except for its fatal flaw: it takes the word “eval”
completely literally.</p>

<p>The first argument to <code class="language-plaintext highlighter-rouge">eval-after-load</code> is the name of a package. Fair
enough. The second argument is a form that will be passed to <code class="language-plaintext highlighter-rouge">eval</code>
after that package is loaded. Now hold on a minute. The general rule
of thumb is that if you’re calling <code class="language-plaintext highlighter-rouge">eval</code>, you’re probably doing
something seriously wrong, and this function is no exception. This is
<em>completely</em> the wrong mechanism for the task.</p>

<p>The second argument should have been a function — either a (sharp
quoted) symbol or a function object. And then instead of <code class="language-plaintext highlighter-rouge">eval</code> it
would be something more sensible, like <code class="language-plaintext highlighter-rouge">funcall</code>. Perhaps this
improved version would be named <code class="language-plaintext highlighter-rouge">call-after-load</code> or <code class="language-plaintext highlighter-rouge">run-after-load</code>.</p>

<p>The big problem with passing an s-expression is that it will be left
uncompiled due to being quoted. <a href="/blog/2017/12/14/">I’ve talked before about the
importance of evaluating your lambdas</a>. <code class="language-plaintext highlighter-rouge">eval-after-load</code> not
only encourages badly written Emacs Lisp, it demands it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; BAD!</span>
<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
                 <span class="o">'</span><span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))</span>
</code></pre></div></div>

<p>This was all corrected in Emacs 25. If the second argument to
<code class="language-plaintext highlighter-rouge">eval-after-load</code> is a function — the result of applying <code class="language-plaintext highlighter-rouge">functionp</code> is
non-nil — then it uses <code class="language-plaintext highlighter-rouge">funcall</code>. There’s also a new macro,
<code class="language-plaintext highlighter-rouge">with-eval-after-load</code>, to package it all up nicely.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; Better (Emacs &gt;= 25 only)</span>
<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">)))</span>

<span class="c1">;;; Best (Emacs &gt;= 25 only)</span>
<span class="p">(</span><span class="nv">with-eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))</span>
</code></pre></div></div>

<p>Though in both of these examples the compiler will likely warn about
<code class="language-plaintext highlighter-rouge">httpd-mime-types</code> not being defined. That’s a problem for another
day.</p>
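
<p>For what it’s worth, one common way to quiet that warning is a
value-less <code class="language-plaintext highlighter-rouge">defvar</code> ahead of the form. This is just a sketch of the
usual idiom, not necessarily the fix I’d settle on:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Declare the variable as special without giving it a value. The
;; byte-compiler takes note, but the variable is left unbound.
(defvar httpd-mime-types)

(with-eval-after-load 'simple-httpd
  (push '("c" . "text/plain") httpd-mime-types))
</code></pre></div></div>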

<h4 id="a-workaround">A workaround</h4>

<p>But what if you <em>need</em> to use Emacs 24, as was the <a href="https://github.com/skeeto/elfeed/pull/268">situation that
sparked this article</a>? What can we do with the bad version of
<code class="language-plaintext highlighter-rouge">eval-after-load</code>? We could situate a lambda such that it’s evaluated,
but then smuggle the resulting function object into the form passed to
<code class="language-plaintext highlighter-rouge">eval-after-load</code>, all using a backquote.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; Note: this is subtly broken</span>
<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span>
    <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
       <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">)))</span>
</code></pre></div></div>

<p>When everything is compiled, the backquoted form evaluates to this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">funcall</span> <span class="err">#</span><span class="nv">[0</span> <span class="nv">&lt;bytecode&gt;</span> <span class="nv">[httpd-mime-types</span> <span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span><span class="nv">]</span> <span class="nv">2]</span><span class="p">)</span>
</code></pre></div></div>

<p>Where the second value (<code class="language-plaintext highlighter-rouge">#[...]</code>) is a <a href="/blog/2014/01/04/">byte-code object</a>.
However, as the comment notes, this is subtly broken. A cleaner and
correct way to solve all this is with a named function. The damage
caused by <code class="language-plaintext highlighter-rouge">eval-after-load</code> will have been (mostly) minimized.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-simple-httpd-hook</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">))</span>

<span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="o">'</span><span class="p">(</span><span class="nb">funcall</span> <span class="nf">#'</span><span class="nv">my-simple-httpd-hook</span><span class="p">))</span>
</code></pre></div></div>

<p>But, let’s go back to the anonymous function solution. What was broken
about it? It all has to do with evaluating function objects.</p>

<h3 id="evaluating-function-objects">Evaluating function objects</h3>

<p>So what happens when we evaluate an expression like the one above with
<code class="language-plaintext highlighter-rouge">eval</code>? Here’s what it looks like again.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">funcall</span> <span class="err">#</span><span class="nv">[...]</span><span class="p">)</span>
</code></pre></div></div>

<p>First, <code class="language-plaintext highlighter-rouge">eval</code> notices it’s been given a non-empty list, so it’s probably
a function call. The first element is the name of the function to be
called (<code class="language-plaintext highlighter-rouge">funcall</code>) and the remaining elements are its arguments. <em>But</em>
each of these elements must be evaluated first, and the <em>result</em> of that
evaluation becomes the arguments.</p>

<p>Any value that isn’t a list or a symbol is <em>self-evaluating</em>. That is,
it evaluates to its own value:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eval</span> <span class="mi">10</span><span class="p">)</span>
<span class="c1">;; =&gt; 10</span>
</code></pre></div></div>

<p>If the value is a symbol, it’s treated as a variable. If the value is a
list, it goes through the function call process I’m describing (or one
of a number of other special cases, such as macro expansion, lambda
expressions, and special forms).</p>
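
<p>A few quick examples of each case, using a hypothetical variable
<code class="language-plaintext highlighter-rouge">demo-x</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar demo-x 42)       ; hypothetical variable

(eval "hello")           ; =&gt; "hello"  (self-evaluating)
(eval [1 2 3])           ; =&gt; [1 2 3]  (self-evaluating)
(eval 'demo-x)           ; =&gt; 42       (symbol: variable lookup)
(eval '(+ 1 2))          ; =&gt; 3        (list: function call)
(eval '(quote demo-x))   ; =&gt; demo-x   (list: special form)
</code></pre></div></div>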

<p>So, conceptually <code class="language-plaintext highlighter-rouge">eval</code> recurses on the function object <code class="language-plaintext highlighter-rouge">#[...]</code>. A
function object is not a list or a symbol, so it’s self-evaluating. No
problem.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Byte-code objects are self-evaluating</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()))))</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">)))</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>What if this code <em>wasn’t</em> compiled? Rather than a byte-code object,
we’d have some other kind of function object for the interpreter.
Let’s examine the dynamic scope (<em>shudder</em>) case. Here, a lambda
<em>appears</em> to evaluate to itself, but appearances can be deceiving:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()))</span>
<span class="c1">;; =&gt; (lambda ())</span>
</code></pre></div></div>

<p>However, this is not self-evaluation. <strong>Lambda expressions are not
self-evaluating</strong>. It’s merely <em>coincidence</em> that the result of
evaluating a lambda expression looks like the original expression.
This is just how the Emacs Lisp interpreter is currently implemented
and, strictly speaking, it’s an implementation detail that <em>just so
happens</em> to be mostly compatible with byte-code objects being
self-evaluating. It would be a mistake to rely on this.</p>

<p>Instead, <strong>dynamic scope lambda expression evaluation is
<a href="https://labs.spotify.com/2013/06/18/creative-usernames/">idempotent</a>.</strong> Applying <code class="language-plaintext highlighter-rouge">eval</code> to the result will return
an <code class="language-plaintext highlighter-rouge">equal</code>, but not identical (<code class="language-plaintext highlighter-rouge">eq</code>), expression. In contrast, a
self-evaluating value is also idempotent under evaluation, but with
<code class="language-plaintext highlighter-rouge">eq</code> results.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Not self-evaluating:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">)))</span>
<span class="c1">;; =&gt; nil</span>

<span class="c1">;; Evaluation is idempotent:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">)))</span>
<span class="c1">;; =&gt; t</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span><span class="p">))))</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>So, with dynamic scope, the subtly broken backquote example will still
work, but only by sheer luck. Under lexical scope, the situation isn’t
so lucky:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">())</span>
<span class="c1">;; =&gt; (closure (t) nil)</span>
</code></pre></div></div>

<p>These interpreted lambda functions are neither self-evaluating nor
idempotent. Passing <code class="language-plaintext highlighter-rouge">t</code> as the second argument to <code class="language-plaintext highlighter-rouge">eval</code> tells it to
use lexical scope, as shown below:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Not self-evaluating:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span> <span class="no">t</span><span class="p">)))</span>
<span class="c1">;; =&gt; nil</span>

<span class="c1">;; Not idempotent:</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span> <span class="no">t</span><span class="p">)))</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
  <span class="p">(</span><span class="nb">equal</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">x</span> <span class="no">t</span><span class="p">)</span> <span class="no">t</span><span class="p">)))</span>
<span class="c1">;; error: (void-function closure)</span>
</code></pre></div></div>

<p>I can <a href="/blog/2017/05/03/">imagine an implementation</a> of Emacs Lisp where dynamic
scope lambda expressions are in the same boat, where they’re not even
idempotent. For example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: nil; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">())</span>
<span class="c1">;; =&gt; (totally-not-a-closure ())</span>
</code></pre></div></div>

<p>Most Emacs Lisp would work just fine under this change, and only code
that makes some kind of logical mistake — where there’s nested
evaluation of lambda expressions — would break. This essentially
already happened when lots of code was quietly switched over to
lexical scope after Emacs 24. Lambda idempotency was lost and
well-written code didn’t notice.</p>

<p>There’s a temptation here for Emacs to define a <code class="language-plaintext highlighter-rouge">closure</code> function or
special form that would allow interpreter closure objects to be either
self-evaluating or idempotent. This would be a mistake. It would only
serve as a hack that covers up logical mistakes that lead to nested
evaluation. Much better to catch those problems early.</p>

<h3 id="solving-the-problem-with-one-character">Solving the problem with one character</h3>

<p>So how do we fix the subtly broken example? With a strategically
placed quote right before the comma.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'simple-httpd</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">funcall</span>
    <span class="ss">',</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
        <span class="p">(</span><span class="nb">push</span> <span class="o">'</span><span class="p">(</span><span class="s">"c"</span> <span class="o">.</span> <span class="s">"text/plain"</span><span class="p">)</span> <span class="nv">httpd-mime-types</span><span class="p">)))</span>
</code></pre></div></div>

<p>So the form passed to <code class="language-plaintext highlighter-rouge">eval-after-load</code> becomes:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Compiled:</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="k">quote</span> <span class="err">#</span><span class="nv">[...]</span><span class="p">))</span>

<span class="c1">;; Dynamic scope:</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="o">...</span><span class="p">)))</span>

<span class="c1">;; Lexical scope:</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="k">quote</span> <span class="p">(</span><span class="nv">closure</span> <span class="p">(</span><span class="no">t</span><span class="p">)</span> <span class="p">()</span> <span class="o">...</span><span class="p">)))</span>
</code></pre></div></div>

<p>The quote prevents <code class="language-plaintext highlighter-rouge">eval</code> from evaluating the function object, which
would be either needless or harmful. There’s also an argument to be
made that this is a perfect situation for a sharp-quote (<code class="language-plaintext highlighter-rouge">#'</code>), which
exists to quote functions.</p>
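
<p>As a sketch, the sharp-quote variant would look like this. The
<code class="language-plaintext highlighter-rouge">function</code> special form hands its argument back without evaluating it,
so I’d expect it to behave the same as the plain quote here:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(eval-after-load 'simple-httpd
  `(funcall
    #',(lambda ()
         (push '("c" . "text/plain") httpd-mime-types))))
</code></pre></div></div>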

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Options for Structured Data in Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/02/14/"/>
    <id>urn:uuid:3837b5b2-0aba-3381-ff6f-9432f8ff03e9</id>
    <updated>2018-02-14T17:43:34Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>So your Emacs package has grown beyond a dozen or so lines of code, and
the data it manages is now structured and heterogeneous. Informal plain
old lists, the bread and butter of any lisp, are no longer cutting it.
You really need to cleanly abstract this structure, both for your own
organizational sake and for anyone reading your code.</p>

<p>With informal lists as structures, you might regularly ask questions
like, “Was the ‘name’ slot stored in the third list element, or was
it the fourth element?” A plist or alist helps with this problem, but
those are better suited for informal, externally-supplied data, not
for internal structures with fixed slots. Occasionally someone
suggests using hash tables as structures, but Emacs Lisp’s hash tables
are <em>much</em> too heavy for this. Hash tables are more appropriate when
keys themselves are data.</p>

<h3 id="defining-a-data-structure-from-scratch">Defining a data structure from scratch</h3>

<p>Imagine a refrigerator package that manages a collection of food in a
refrigerator. A food item could be structured as a plain old list,
with slots at specific positions.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fridge-item-create</span> <span class="p">(</span><span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">list</span> <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">))</span>
</code></pre></div></div>

<p>A function that computes the mean weight of a list of food items might
look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fridge-mean-weight</span> <span class="p">(</span><span class="nv">items</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nv">items</span><span class="p">)</span>
      <span class="mf">0.0</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">sum</span> <span class="mf">0.0</span><span class="p">)</span>
          <span class="p">(</span><span class="nb">count</span> <span class="mi">0</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">item</span> <span class="nv">items</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">sum</span> <span class="nb">count</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">setf</span> <span class="nb">count</span> <span class="p">(</span><span class="nb">1+</span> <span class="nb">count</span><span class="p">)</span>
              <span class="nv">sum</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">sum</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">2</span> <span class="nv">item</span><span class="p">)))))))</span>
</code></pre></div></div>

<p>Note the use of <code class="language-plaintext highlighter-rouge">(nth 2 item)</code> at the end, used to get the item’s
weight. That magic number 2 is easy to mess up. Even worse, if lots of
code accesses “weight” this way, then future extensions will be
inhibited. Defining some accessor functions solves this problem.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="nv">item</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-expiry</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="mi">1</span> <span class="nv">item</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-weight</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">nth</span> <span class="mi">2</span> <span class="nv">item</span><span class="p">))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">defsubst</code> defines an inline function, so there’s effectively no
additional run-time cost for these accessors compared to a bare
<code class="language-plaintext highlighter-rouge">nth</code>. Since these only cover <em>getting</em> slots, we should also define
some setters using the built-in gv (generalized variable) package.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'gv</span><span class="p">)</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">0</span> <span class="o">,</span><span class="nv">item</span><span class="p">)</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-expiry</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">1</span> <span class="o">,</span><span class="nv">item</span><span class="p">)</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-weight</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">nth</span> <span class="mi">2</span> <span class="o">,</span><span class="nv">item</span><span class="p">)</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>
</code></pre></div></div>

<p>This makes each slot setf-able. Generalized variables are great for
simplifying APIs, since otherwise there would need to be an equal
number of setter functions (<code class="language-plaintext highlighter-rouge">fridge-item-set-name</code>, etc.). With
generalized variables, the getter and setter share a single entry point:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">fridge-item-name</span> <span class="nv">item</span><span class="p">)</span> <span class="s">"Eggs"</span><span class="p">)</span>
</code></pre></div></div>
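
<p>As a rough sketch of how the pieces fit together, here is the
list-backed weight accessor and its setter repeated in one
self-contained block, followed by a small usage example. The sample
values are made up for illustration, and <code class="language-plaintext highlighter-rouge">cl-incf</code> (from cl-lib) is
used only to show that any generalized-variable macro picks up the
setter, not because the package requires it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)
(require 'gv)

;; Getter: the weight lives in the third list element.
(defsubst fridge-item-weight (item)
  (nth 2 item))

;; Setter: makes (setf (fridge-item-weight ITEM) VALUE) work.
(gv-define-setter fridge-item-weight (value item)
  `(setf (nth 2 ,item) ,value))

;; Usage with made-up sample data:
(let ((item (list "Eggs" nil 11.1)))
  (setf (fridge-item-weight item) 12.0)    ; plain assignment
  (cl-incf (fridge-item-weight item) 0.5)  ; read-modify-write
  item)
;; =&gt; ("Eggs" nil 12.5)
</code></pre></div></div>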

<p>There are still two more significant improvements.</p>

<ol>
  <li>
    <p>As far as Emacs Lisp is concerned, this isn’t a real <em>type</em>. The
type-ness of it is just a fiction created by the conventions of the
package. It would be easy to make the mistake of passing an
arbitrary list to these <code class="language-plaintext highlighter-rouge">fridge-item</code> functions, and the mistake
wouldn’t be caught so long as that list has at least three items.
A common solution is to add a <em>type tag</em>: a symbol at the
beginning of the structure that identifies it.</p>
  </li>
  <li>
    <p>It’s still a linked list, and <code class="language-plaintext highlighter-rouge">nth</code> has to walk the list (i.e.
<code class="language-plaintext highlighter-rouge">O(n)</code>) to retrieve items. It would be much more efficient to use a
vector, turning this into an efficient <code class="language-plaintext highlighter-rouge">O(1)</code> operation.</p>
  </li>
</ol>

<p>Addressing both of these at once:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fridge-item-create</span> <span class="p">(</span><span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">vector</span> <span class="ss">'fridge-item</span> <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-p</span> <span class="p">(</span><span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">vectorp</span> <span class="nv">object</span><span class="p">)</span>
       <span class="p">(</span><span class="nb">=</span> <span class="p">(</span><span class="nb">length</span> <span class="nv">object</span><span class="p">)</span> <span class="mi">4</span><span class="p">)</span>
       <span class="p">(</span><span class="nb">eq</span> <span class="ss">'fridge-item</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">object</span> <span class="mi">0</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">item</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">fridge-item-p</span> <span class="nv">item</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">signal</span> <span class="ss">'wrong-type-argument</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">'fridge-item</span> <span class="nv">item</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">aref</span> <span class="nv">item</span> <span class="mi">1</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defsubst</span> <span class="nv">fridge-item-name--set</span> <span class="p">(</span><span class="nv">item</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nv">fridge-item-p</span> <span class="nv">item</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">signal</span> <span class="ss">'wrong-type-argument</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">'fridge-item</span> <span class="nv">item</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">item</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">value</span><span class="p">))</span>

<span class="p">(</span><span class="nv">gv-define-setter</span> <span class="nv">fridge-item-name</span> <span class="p">(</span><span class="nv">value</span> <span class="nv">item</span><span class="p">)</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">fridge-item-name--set</span> <span class="o">,</span><span class="nv">item</span> <span class="o">,</span><span class="nv">value</span><span class="p">))</span>

<span class="c1">;; And so on for expiry and weight...</span>
</code></pre></div></div>

<p>As long as <code class="language-plaintext highlighter-rouge">fridge-mean-weight</code> uses the <code class="language-plaintext highlighter-rouge">fridge-item-weight</code>
accessor, it continues to work unmodified across all these changes.
But, <em>whew</em>, that’s quite a lot of boilerplate to write and maintain
for each data structure in our package! Boilerplate code generation is
a perfect candidate for a macro definition. Luckily for us, Emacs
already defines a macro to generate all this code: <code class="language-plaintext highlighter-rouge">cl-defstruct</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">fridge-item</span>
  <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
</code></pre></div></div>

<p>In Emacs 25 and earlier, this innocent-looking definition expands into
essentially all the above code. The code it generates is expressed in
<a href="/blog/2017/01/30/">the most optimal form</a> for its version of Emacs, and it
exploits many of the available optimizations by using function
declarations such as <code class="language-plaintext highlighter-rouge">side-effect-free</code> and <code class="language-plaintext highlighter-rouge">error-free</code>. It’s
configurable, too, allowing for the exclusion of a type tag (<code class="language-plaintext highlighter-rouge">:named</code>)
— discarding all the type checks — or using a list rather than a
vector as the underlying structure (<code class="language-plaintext highlighter-rouge">:type</code>). As a crude form of
structural inheritance, it even allows for directly embedding other
structures (<code class="language-plaintext highlighter-rouge">:include</code>).</p>
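
<p>To make those options concrete, here is a minimal sketch. All of the
<code class="language-plaintext highlighter-rouge">demo-</code> names are hypothetical and exist only for this illustration.
The first structure uses <code class="language-plaintext highlighter-rouge">:type</code> without <code class="language-plaintext highlighter-rouge">:named</code>, so it is a bare
list with no tag and no type checks. The second pair uses <code class="language-plaintext highlighter-rouge">:include</code>
so that the child carries all of the parent’s slots.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; An untagged, list-backed structure: no type tag, no type checks.
(cl-defstruct (demo-point (:type list)
                          (:constructor demo-point-create)
                          (:copier nil))
  x y)

(demo-point-create :x 1 :y 2)
;; =&gt; (1 2)

;; Structural inheritance: demo-drink embeds all of demo-food's slots.
(cl-defstruct (demo-food (:constructor demo-food-create)
                         (:copier nil))
  name weight)

(cl-defstruct (demo-drink (:include demo-food)
                          (:constructor demo-drink-create)
                          (:copier nil))
  volume)

(let ((milk (demo-drink-create :name "Milk" :weight 1.0 :volume 0.95)))
  (list (demo-food-name milk)      ; parent accessor works on the child
        (demo-drink-volume milk)   ; slot added by the child
        (demo-food-p milk)))       ; child satisfies the parent predicate
;; =&gt; ("Milk" 0.95 t)
</code></pre></div></div>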

<h4 id="two-pitfalls">Two pitfalls</h4>

<p>There are a couple of pitfalls, though. First, for historical reasons, <strong>the
macro will define two namespace-unfriendly functions: <code class="language-plaintext highlighter-rouge">make-NAME</code> and
<code class="language-plaintext highlighter-rouge">copy-NAME</code></strong>. I always override these, preferring the <code class="language-plaintext highlighter-rouge">-create</code>
convention for the constructor, and tossing the copier since it’s
either useless or, worse, semantically wrong.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">fridge-item</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">fridge-item-create</span><span class="p">)</span>
                           <span class="p">(</span><span class="ss">:copier</span> <span class="no">nil</span><span class="p">))</span>
  <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span><span class="p">)</span>
</code></pre></div></div>

<p>If the constructor needs to be more sophisticated than just setting
slots, it’s common to define a “private” constructor (double dash in
the name) and wrap it with a “public” constructor that has some
behavior.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">fridge-item</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">fridge-item--create</span><span class="p">)</span>
                           <span class="p">(</span><span class="ss">:copier</span> <span class="no">nil</span><span class="p">))</span>
  <span class="nv">name</span> <span class="nv">expiry</span> <span class="nv">weight</span> <span class="nv">entry-time</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">fridge-item-create</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">apply</span> <span class="nf">#'</span><span class="nv">fridge-item--create</span> <span class="ss">:entry-time</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="nv">args</span><span class="p">))</span>
</code></pre></div></div>

<p>The other pitfall is related to printing. In Emacs 25 and earlier,
types defined by <code class="language-plaintext highlighter-rouge">cl-defstruct</code> are still only types by convention.
They’re really just vectors as far as Emacs Lisp is concerned. One
benefit from this is that <a href="/blog/2013/12/30/">printing and reading</a> these
structures is “free” because vectors are printable. It’s trivial to
serialize <code class="language-plaintext highlighter-rouge">cl-defstruct</code> structures out to a file. This is <a href="/blog/2013/09/09/">exactly
how the Elfeed database works</a>.</p>

<p>The pitfall is that <strong>once a structure has been serialized, there’s no
more changing the <code class="language-plaintext highlighter-rouge">cl-defstruct</code> definition.</strong> It’s now a file format
definition, so the slots are locked in place. Forever.</p>
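
<p>A minimal sketch of that round trip, assuming the three-slot
<code class="language-plaintext highlighter-rouge">fridge-item</code> definition from above and made-up sample data. It prints
a structure to a string and reads it back, which is the same idea as
writing it to a file. This is not how Elfeed itself is written, only
an illustration of the mechanism.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(cl-defstruct (fridge-item (:constructor fridge-item-create)
                           (:copier nil))
  name expiry weight)

(let* ((item (fridge-item-create :name "Eggs" :weight 11.1))
       (text (prin1-to-string item))          ; serialize to a string
       (copy (car (read-from-string text))))  ; parse it back
  (list (equal item copy)
        (fridge-item-name copy)))
;; =&gt; (t "Eggs")
</code></pre></div></div>

<p>The printed form records slot values by position only. If a later
version of the package inserted a slot before <code class="language-plaintext highlighter-rouge">weight</code>, previously
written data would be read back with its values in the wrong slots.</p>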

<p>Emacs 26 throws a wrench in all this, though it’s worth it in the long
run. There’s a new primitive type in Emacs 26 with its own reader
syntax: records. This is similar to hash tables <a href="/blog/2010/06/07/">becoming first class
in the reader in Emacs 23.2</a>. In Emacs 26, <code class="language-plaintext highlighter-rouge">cl-defstruct</code> uses
records instead of vectors.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Emacs 25:</span>
<span class="p">(</span><span class="nv">fridge-item-create</span> <span class="ss">:name</span> <span class="s">"Eggs"</span> <span class="ss">:weight</span> <span class="mf">11.1</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-fridge-item "Eggs" nil 11.1]</span>

<span class="c1">;; Emacs 26:</span>
<span class="p">(</span><span class="nv">fridge-item-create</span> <span class="ss">:name</span> <span class="s">"Eggs"</span> <span class="ss">:weight</span> <span class="mf">11.1</span><span class="p">)</span>
<span class="c1">;; =&gt; #s(fridge-item "Eggs" nil 11.1)</span>
</code></pre></div></div>

<p>So far slots are still accessed using <code class="language-plaintext highlighter-rouge">aref</code>, and all the type
checking still happens in Emacs Lisp. The only practical change is the
<code class="language-plaintext highlighter-rouge">record</code> function is used in place of the <code class="language-plaintext highlighter-rouge">vector</code> function when
allocating a structure. But it does pave the way for more interesting
things in the future.</p>
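
<p>A small sketch of what that looks like from Lisp, assuming Emacs 26
or later (<code class="language-plaintext highlighter-rouge">recordp</code> does not exist before then) and the same
<code class="language-plaintext highlighter-rouge">fridge-item</code> definition with made-up sample data:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(cl-defstruct (fridge-item (:constructor fridge-item-create)
                           (:copier nil))
  name expiry weight)

(let ((item (fridge-item-create :name "Eggs" :weight 11.1)))
  (list (recordp item)    ; it is a record...
        (vectorp item)    ; ...and no longer a vector
        (type-of item)    ; the tag in slot 0 is now the type
        (aref item 1)))   ; slots are still reachable with aref
;; =&gt; (t nil fridge-item "Eggs")
</code></pre></div></div>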

<p>The major short-term downside is that this breaks printed compatibility
across the Emacs 25/26 boundary. The <code class="language-plaintext highlighter-rouge">cl-old-struct-compat-mode</code>
function can be used for <em>some</em> degree of backwards, but not forwards,
compatibility. Emacs 26 can read and use some structures printed by
Emacs 25 and earlier, but the reverse will never be true. This issue
initially <a href="https://debbugs.gnu.org/cgi/bugreport.cgi?bug=27617">tripped up Emacs’ built-in packages</a>, and when Emacs 26
is released we’ll see more of these issues arise in external packages.</p>

<h3 id="dynamic-dispatch">Dynamic dispatch</h3>

<p>Prior to Emacs 25, the major built-in package for dynamic dispatch —
functions that specialize on the run-time type of their arguments — was
EIEIO, though it only supported single dispatch (specializing on a
single argument). EIEIO brought much of the Common Lisp Object System
(CLOS) to Emacs Lisp, including classes and methods.</p>

<p>Emacs 25 introduced a more sophisticated dynamic dispatch package
called cl-generic. It focuses only on dynamic dispatch and supports
multiple dispatch, completely replacing the dynamic dispatch portion
of EIEIO. Since <code class="language-plaintext highlighter-rouge">cl-defstruct</code> does inheritance and cl-generic does
dynamic dispatch, there’s not really much left for EIEIO — besides bad
ideas like multiple inheritance and method combination.</p>
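
<p>As a quick illustration of multiple dispatch, here is a sketch with a
hypothetical generic function, <code class="language-plaintext highlighter-rouge">demo-combine</code>, whose methods specialize
on the types of <em>both</em> arguments. It assumes Emacs 25 or later.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-generic)

;; Hypothetical generic function, for illustration only.
(cl-defgeneric demo-combine (a b)
  "Combine A and B in a way that depends on both of their types.")

(cl-defmethod demo-combine ((a integer) (b integer))
  (+ a b))

(cl-defmethod demo-combine ((a string) (b string))
  (concat a b))

(cl-defmethod demo-combine ((a string) (b integer))
  (format "%s%d" a b))

(list (demo-combine 1 2)
      (demo-combine "foo" "bar")
      (demo-combine "foo" 2))
;; =&gt; (3 "foobar" "foo2")
</code></pre></div></div>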

<p>Without either of these packages, the most direct way to build single
dispatch on top of <code class="language-plaintext highlighter-rouge">cl-defstruct</code> would be to <a href="/blog/2014/10/21/">shove a function in one
of the slots</a>. Then the “method” is just a wrapper that calls this
function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Base "class"</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">greeter</span>
  <span class="nv">greeting</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">greet</span> <span class="p">(</span><span class="nv">thing</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">greeter-greeting</span> <span class="nv">thing</span><span class="p">)</span> <span class="nv">thing</span><span class="p">))</span>

<span class="c1">;; Cow "class"</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">cow</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">greeter</span><span class="p">)</span>
                   <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">cow--create</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">cow-create</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">cow--create</span> <span class="ss">:greeting</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span> <span class="s">"Moo!"</span><span class="p">)))</span>

<span class="c1">;; Bird "class"</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">bird</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">greeter</span><span class="p">)</span>
                    <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">bird--create</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">bird-create</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">bird--create</span> <span class="ss">:greeting</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">_</span><span class="p">)</span> <span class="s">"Chirp!"</span><span class="p">)))</span>

<span class="c1">;; Usage:</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">cow-create</span><span class="p">))</span>
<span class="c1">;; =&gt; "Moo!"</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">bird-create</span><span class="p">))</span>
<span class="c1">;; =&gt; "Chirp!"</span>
</code></pre></div></div>

<p>Since cl-generic is aware of the types created by <code class="language-plaintext highlighter-rouge">cl-defstruct</code>,
functions can specialize on them as if they were native types. It’s a
lot simpler to let cl-generic do all the hard work. The people reading
your code will appreciate it, too:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-generic</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defgeneric</span> <span class="nv">greet</span> <span class="p">(</span><span class="nv">greeter</span><span class="p">))</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">cow</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defmethod</span> <span class="nv">greet</span> <span class="p">((</span><span class="nv">_</span> <span class="nv">cow</span><span class="p">))</span>
  <span class="s">"Moo!"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">bird</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-defmethod</span> <span class="nv">greet</span> <span class="p">((</span><span class="nv">_</span> <span class="nv">bird</span><span class="p">))</span>
  <span class="s">"Chirp!"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">make-cow</span><span class="p">))</span>
<span class="c1">;; =&gt; "Moo!"</span>

<span class="p">(</span><span class="nv">greet</span> <span class="p">(</span><span class="nv">make-bird</span><span class="p">))</span>
<span class="c1">;; =&gt; "Chirp!"</span>
</code></pre></div></div>
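
<p>The two packages also combine. In the following sketch, which uses
hypothetical <code class="language-plaintext highlighter-rouge">demo-</code> names, <code class="language-plaintext highlighter-rouge">:include</code> supplies the inheritance and
<code class="language-plaintext highlighter-rouge">cl-call-next-method</code> lets the child’s method build on the parent’s:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)
(require 'cl-generic)

(cl-defstruct (demo-animal (:constructor demo-animal-create)
                           (:copier nil))
  name)

(cl-defstruct (demo-dog (:include demo-animal)
                        (:constructor demo-dog-create)
                        (:copier nil)))

(cl-defgeneric demo-describe (animal))

;; Method for the parent type.
(cl-defmethod demo-describe ((animal demo-animal))
  (format "%s is an animal" (demo-animal-name animal)))

;; Method for the child type, extending the parent's result.
(cl-defmethod demo-describe ((_dog demo-dog))
  (concat (cl-call-next-method) " and a dog"))

(list (demo-describe (demo-animal-create :name "Generic"))
      (demo-describe (demo-dog-create :name "Rex")))
;; =&gt; ("Generic is an animal" "Rex is an animal and a dog")
</code></pre></div></div>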

<p>The majority of the time a simple <code class="language-plaintext highlighter-rouge">cl-defstruct</code> will fulfill your
needs, keeping in mind the gotcha with the constructor and copier
names. Its use should feel almost as natural as defining functions.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Debugging Emacs or: How I Learned to Stop Worrying and Love DTrace</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2018/01/17/"/>
    <id>urn:uuid:a55cabc9-2d87-30a4-9066-9ec5e45b8bce</id>
    <updated>2018-01-17T23:59:49Z</updated>
    <category term="emacs"/><category term="elfeed"/><category term="bsd"/>
    <content type="html">
      <![CDATA[<p><em>Update: This article was featured on <a href="https://www.youtube.com/watch?v=Xi_pX2QIzho">BSD Now 233</a> (starting
at 21:38).</em></p>

<p>For some time <a href="https://github.com/skeeto/elfeed">Elfeed</a> was experiencing a strange, spurious
failure. Every so often users were <a href="https://github.com/skeeto/elfeed/issues/248">seeing an error</a> (spoiler
warning) when updating feeds: “error in process sentinel: Search
failed.” If you use Elfeed, you might have even seen this yourself.
On the surface it appeared that curl, tasked with the
<a href="/blog/2016/06/16/">responsibility for downloading feed data</a>, was producing
incomplete output despite reporting a successful run. Since the run
was successful, Elfeed assumed certain data was in curl’s output
buffer, but, since it wasn’t, it failed hard.</p>

<!--more-->

<p>Unfortunately this issue was not reproducible. Manually running curl
outside of Emacs never revealed any issues. Asking Elfeed to retry
fetching the feeds would work fine. The issue would only randomly rear
its head when Elfeed was fetching many feeds in parallel, under
stress. By the time the error was discovered, the curl process had
exited and vital debugging information was lost. Considering that
this was likely to be a bug in Emacs itself, there really wasn’t a
reliable way to capture the necessary debugging information from
within Emacs Lisp. And, indeed, this later proved to be the case.</p>

<p>A quick-and-dirty workaround is to use <code class="language-plaintext highlighter-rouge">condition-case</code> to catch and
swallow the error. When the bizarre issue shows up, rather than fail
badly in front of the user, Elfeed could attempt to swallow the error
— assuming it can be reliably detected — and treat the fetch as simply
a failure. That didn’t sit comfortably with me. Elfeed had done its
due diligence checking for errors already. <em>Someone</em> was lying to
Elfeed, and I intended to catch them with their pants on fire.
Someday.</p>
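
<p>For reference, the swallow-the-error approach would look roughly like
the following sketch. This is not Elfeed’s actual code: the function
name <code class="language-plaintext highlighter-rouge">demo-find-marker</code> and the sample buffer contents are made up. It
only shows <code class="language-plaintext highlighter-rouge">condition-case</code> turning a <code class="language-plaintext highlighter-rouge">search-failed</code> error into an
ordinary <code class="language-plaintext highlighter-rouge">nil</code> result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, for illustration only.
(defun demo-find-marker (marker)
  "Return the position just after MARKER in the current buffer.
Return nil, rather than signaling an error, if MARKER is absent."
  (condition-case nil
      (save-excursion
        (goto-char (point-min))
        (search-forward marker))  ; signals search-failed when absent
    (search-failed nil)))         ; swallow the error, report failure

(with-temp-buffer
  (insert "HTTP/1.1 200 OK\n")
  (list (demo-find-marker "200")
        (demo-find-marker "missing-token")))
;; =&gt; (13 nil)
</code></pre></div></div>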

<p>I’d just need to witness the bug on one of my own machines. Elfeed is
part of my daily routine, so surely I’d have to experience this issue
myself someday. My plan was, should that day come, to run a modified
Elfeed, instrumented to capture extra data. I would have also routinely
run Emacs under GDB so that I could inspect the failure more deeply.</p>

<p>For now I just had to wait to <a href="https://www.youtube.com/watch?v=fE2KDzZaxvE">hunt that zebra</a>.</p>

<h3 id="bryan-cantrill-dtrace-and-freebsd">Bryan Cantrill, DTrace, and FreeBSD</h3>

<p>Over the holidays I re-discovered <a href="https://en.wikipedia.org/wiki/Bryan_Cantrill">Bryan Cantrill</a>, a systems
software engineer who worked for Sun between 1996 and 2010, and is best
known for <a href="http://dtrace.org/blogs/about/">DTrace</a>. My first exposure to him was in a <a href="https://www.youtube.com/watch?v=l6XQUciI-Sc">BSD
Now interview</a> in 2015. I had re-watched that interview and decided
there was a lot more I had to learn from him. He’s become a personal
hero to me. So I scoured the internet for <a href="http://dtrace.org/blogs/bmc/2018/02/03/talks/">more of his writing and
talks</a>. Besides what I’ve already linked in this article, here
are a couple more great presentations:</p>

<ul>
  <li><a href="https://www.youtube.com/watch?v=4PaWFYm0kEw">Oral Tradition in Software Engineering</a></li>
  <li><a href="https://www.youtube.com/watch?v=-zRN7XLCRhc">Fork Yeah! The Rise and Development of illumos</a></li>
</ul>

<p>You can also find some of his writing <a href="http://dtrace.org/blogs/bmc/">scattered around the DTrace
blog</a>.</p>

<p>Some interesting operating system technology came out of Sun during
its final 15 or so years — most notably DTrace and ZFS — and Bryan
speaks about it passionately. Almost as a matter of luck, most of it
survived the Oracle acquisition thanks to Sun releasing it as open
source in just the nick of time. Otherwise it would have been lost
forever. The scattered ex-Sun employees, still passionate about their
prior work at Sun, along with some of their old customers have since
picked up the pieces and kept going as a community under the name
<a href="https://illumos.org/">illumos</a>. It’s like an open source flotilla.</p>

<p>Naturally I wanted to get my hands on this stuff to try it out for
myself. Is it really as good as they say? Normally I stick to Linux,
but it (generally) doesn’t have these Sun technologies. The main
reason is license incompatibility. Sun released its code under the
<a href="https://opensource.org/licenses/CDDL-1.0">CDDL</a>, which is incompatible with the GPL. Ubuntu <em>does</em>
<a href="https://insights.ubuntu.com/2016/02/18/zfs-licensing-and-linux/">infamously include ZFS</a>, but other distributions are
unwilling to take that risk. Porting DTrace is a serious undertaking
since it’s got its fingers throughout the kernel, which also makes the
licensing issues even more complicated.</p>

<p>(<em>Update February 2018</em>: <a href="https://gnu.wildebeest.org/blog/mjw/2018/02/14/dtrace-for-linux-oracle-does-the-right-thing/">DTrace has been released under the
GPLv2</a>, allowing it to be legally integrated with Linux.)</p>

<p>Linux has a reputation for Not Invented Here (NIH) syndrome, and these
licensing issues certainly contribute to that. Rather than adopt ZFS
and DTrace, they’ve been reinvented from scratch: btrfs instead of
ZFS, and <a href="http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html">a slew of partial options</a> instead of DTrace.
Normally I’m most interested in system call tracing, and my go-to is
<a href="https://en.wikipedia.org/wiki/Strace">strace</a>, though it certainly has its limitations — including
this situation of debugging curl under Emacs. Another famous example
of NIH is Linux’s <a href="http://man7.org/linux/man-pages/man7/epoll.7.html"><code class="language-plaintext highlighter-rouge">epoll(2)</code></a>, which is a <a href="https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/">broken</a>
<a href="https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/">version</a> of BSD <a href="https://www.freebsd.org/cgi/man.cgi?query=kqueue&amp;sektion=2"><code class="language-plaintext highlighter-rouge">kqueue(2)</code></a>.</p>

<p>So, if I want to try these for myself, I’ll need to install a
different operating system. I’ve dabbled with <a href="https://omnios.omniti.com/">OmniOS</a>, an OS
built on illumos, in virtual machines, using it as an alien
environment to test some of my software (e.g. <a href="/blog/2017/03/12/">enchive</a>).
OmniOS has a philosophy called <a href="https://omnios.omniti.com/wiki.php/KYSTY">Keep Your Software To Yourself</a>
(KYSTY), which is really just code for “we don’t do packaging.”
Honestly, you can’t blame them since <a href="https://utcc.utoronto.ca/~cks/space/blog/solaris/IllumosSupportLimits">they’re a tiny community</a>.
The best solution to this is probably <a href="https://www.pkgsrc.org/">pkgsrc</a>, which is
essentially a universal packaging system. Otherwise <a href="/blog/2017/06/19/">you’re on your
own</a>.</p>

<p>There’s also <a href="https://www.openindiana.org/">OpenIndiana</a>, which is a more friendly
desktop-oriented illumos distribution. Still, the short of it is that
you’re very much on your own when things don’t work. The situation is
like running Linux a couple decades ago, when it was still difficult
to do.</p>

<p>If you’re interested in trying DTrace, the easiest option these days is
probably <a href="https://www.freebsd.org/">FreeBSD</a>. It’s got a big, active community, thorough
documentation, and a huge selection of packages. Its license (the <em>BSD
license</em>, duh) is compatible with the CDDL, so both ZFS and DTrace have
been ported to FreeBSD.</p>

<h3 id="what-is-dtrace">What is DTrace?</h3>

<p>I’ve done all this talking but haven’t yet described what <a href="https://wiki.freebsd.org/DTrace/Tutorial">DTrace
really is</a>. I won’t pretend to write my own tutorial, but I’ll
provide enough information to follow along. DTrace is a tracing
framework for debugging production systems <em>in real time</em>, both for
the kernel and for applications. The “production systems” part means
it’s stable and safe — using DTrace won’t put your system at risk of
crashing or damaging data. The “real time” part means it has little
impact on performance, so you can use DTrace on live, active systems.
Both of these core design principles are vital for
troubleshooting those really tricky bugs that only show up in
production.</p>

<p>There are DTrace <em>probes</em> scattered all throughout the system: on
system calls, scheduler events, networking events, process events,
signals, virtual memory events, etc. Using a specialized language
called D (unrelated to the general purpose programming language D),
you can dynamically add behavior at these instrumentation points.
Generally the behavior is to capture information, but it can also
manipulate the event being traced.</p>

<p>Each probe is fully identified by a 4-tuple delimited by colons:
provider, module, function, and probe name. An empty element denotes a
sort of wildcard. For example, <code class="language-plaintext highlighter-rouge">syscall::open:entry</code> is a probe at the
beginning (i.e. “entry”) of <code class="language-plaintext highlighter-rouge">open(2)</code>. <code class="language-plaintext highlighter-rouge">syscall:::entry</code> matches all
system call entry probes.</p>

<p>Unlike strace on Linux, which monitors a specific process, DTrace
applies to the entire system when active. To run curl under strace
from Emacs, I’d have to modify Emacs’ behavior to do so. With DTrace I
can instrument every curl process without making a single change to
Emacs, and with negligible impact on Emacs. That’s a big deal.</p>

<p>So, when it comes to this Elfeed issue, FreeBSD is much better poised
for debugging the problem. All I have to do is catch it in the act.
However, it’s been months since that bug report and I’m not really
making this connection yet. I’m just hoping I eventually find an
interesting problem where I can apply DTrace.</p>

<h3 id="freebsd-on-a-raspberry-pi-2">FreeBSD on a Raspberry Pi 2</h3>

<p>So I’ve settled on FreeBSD as the playground for these technologies. I
just have to decide where. I could always run it in a virtual machine,
but it’s always more interesting to try things out on real hardware.
<a href="https://wiki.freebsd.org/FreeBSD/arm/Raspberry%20Pi">FreeBSD supports the Raspberry Pi 2</a> as a Tier 2 system, and
I had a Raspberry Pi 2 sitting around collecting dust, so I put it to
use.</p>

<p>I wrote the image to an SD card, and for a few days I stretched my
legs on this new system. I cloned a couple dozen of my own git
repositories, ran the builds and the tests, and just got a feel for
things. I tried out the ports system for the first time, mainly to
discover that the low-powered Raspberry Pi 2 takes days to build some
of the packages I want to try.</p>

<p>I <a href="/blog/2017/04/01/">mostly program in Vim these days</a>, so it’s some days before I
even set up Emacs. Eventually I do build Emacs, clone my
configuration, fire it up, and give Elfeed a spin.</p>

<p>And that’s when the “search failed” bug strikes! Not just once, but
dozens of times. Perfect! This low-powered platform is the jackpot for
this particular bug, triggering it left and right. Given that I’ve got
DTrace at my disposal, it’s <em>the</em> perfect place to debug this.
Something is lying to Elfeed and DTrace will play the judge.</p>

<p>Before I dive in I see three possibilities:</p>

<ol>
  <li>curl is reporting success but truncating its output.</li>
  <li>Emacs is quietly truncating curl’s output.</li>
  <li>Emacs is misinterpreting curl’s exit status.</li>
</ol>

<p>With DTrace I can observe what every curl process writes to Emacs, and
I can also double check curl’s exit status. I come up with the
following (newbie) DTrace script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>syscall::write:entry
/execname == "curl"/
{
    printf("%d WRITE %d \"%s\"\n",
           pid, arg2, stringof(copyin(arg1, arg2)));
}

syscall::exit:entry
/execname == "curl"/
{
    printf("%d EXIT  %d\n", pid, arg0);
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">/execname == "curl"/</code> is a predicate that (obviously) causes the
behavior to only fire for curl processes. The first probe has DTrace
print a line for every <code class="language-plaintext highlighter-rouge">write(2)</code> from curl. <code class="language-plaintext highlighter-rouge">arg0</code>, <code class="language-plaintext highlighter-rouge">arg1</code>, and
<code class="language-plaintext highlighter-rouge">arg2</code> correspond to the arguments of <code class="language-plaintext highlighter-rouge">write(2)</code>: fd, buf, count. It
logs the process ID (pid) of the write, the length of the write, and
the actual contents written. Remember that these curl processes are
run in parallel by Emacs, so the pid allows me to associate the
separate writes and the exit status.</p>

<p>The second probe prints the pid and the exit status (the first argument
to <code class="language-plaintext highlighter-rouge">exit(2)</code>).</p>

<p>I also want to compare this to exactly what is delivered to Elfeed when
curl exits, so I modify the <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Sentinels.html">process sentinel</a> — the callback
that handles a subprocess exiting — to call <code class="language-plaintext highlighter-rouge">write-file</code> before any
action is taken. I can compare these buffer dumps to the logs produced
by DTrace.</p>
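
<p>A rough sketch of that kind of instrumented sentinel, not Elfeed’s
actual code. The name <code class="language-plaintext highlighter-rouge">my-dump-sentinel</code> and the dump location are made
up for illustration, and it uses <code class="language-plaintext highlighter-rouge">write-region</code> rather than
<code class="language-plaintext highlighter-rouge">write-file</code> so the buffer isn’t re-associated with the dump file:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-dump-sentinel (process _event)
  "Hypothetical sentinel: dump PROCESS's buffer to a file once it exits."
  (when (and (memq (process-status process) '(exit signal))
             (buffer-live-p (process-buffer process)))
    (with-current-buffer (process-buffer process)
      ;; Write the whole buffer without changing which file it visits.
      (write-region nil nil
                    (expand-file-name (concat (process-name process) ".dump")
                                      temporary-file-directory)
                    nil 'silent))))
</code></pre></div></div>

<p>It would be attached with <code class="language-plaintext highlighter-rouge">set-process-sentinel</code>, and the real
handling would run after the dump.</p>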

<p>There are two important findings.</p>

<p>First, when the “search failed” bug occurs, the buffer was completely
empty (95% of the time) or truncated at the end of the HTTP headers
(5% of the time), right at the blank line. DTrace indicates that curl
did its job to the full, so it’s Emacs who’s the liar. It’s not
delivering all of curl’s data to Elfeed. That’s pretty annoying.</p>

<p>Second, <strong>curl was line-buffered</strong>. Each line was a separate,
independent <code class="language-plaintext highlighter-rouge">write(2)</code>. I was certainly <em>not</em> expecting this. Normally
the C library only does line buffering when the output is a terminal.
That’s because it’s guessing a user may be watching, expecting the
output to arrive a line at a time.</p>

<p>Here’s a sample of what it looked like in the log:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>88188 WRITE 32 "Server: Apache/2.4.18 (Ubuntu)
"
88188 WRITE 46 "Location: https://blog.plover.com/index.atom
"
88188 WRITE 21 "Content-Length: 299
"
88188 WRITE 45 "Content-Type: text/html; charset=iso-8859-1
"
88188 WRITE 2 "
"
</code></pre></div></div>

<p>Why would curl think Emacs is a terminal?</p>

<p><em>Oh.</em> That’s right. <em>This is the <a href="/blog/2014/02/06/">same problem I ran into four years
ago when writing EmacSQL</a>.</em> By default Emacs connects to
subprocesses through a pseudo-terminal (pty). I called this a mistake
in Emacs back then, and I still stand by that claim. The pty causes
weird, annoying problems for little benefit:</p>

<ul>
  <li>Interpreting control characters. Hope you weren’t transferring binary
data!</li>
  <li>Subprocesses will generally get line buffered. This makes them
slower, though in some situations it might be desirable.</li>
  <li>Stdout and stderr get mixed together. (Optional since Emacs 25.)</li>
  <li><em>New!</em> There’s a bug somewhere in Emacs that causes truncation when
ptys are used heavily in parallel.</li>
</ul>

<p>Just from eyeballing the DTrace log I knew what to do: dump the pty
and switch to a pipe. This is controlled with the
<code class="language-plaintext highlighter-rouge">process-connection-type</code> variable, and fixing it <a href="https://github.com/skeeto/elfeed/commit/945765a57d2f27996b6a43bc62e803dc167d1547">is a
one-liner</a>.</p>
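
<p>The general shape of such a fix looks something like the sketch
below. This is not the actual commit, and <code class="language-plaintext highlighter-rouge">my-start-process-with-pipe</code>
is a hypothetical helper name. The only real knob involved is
<code class="language-plaintext highlighter-rouge">process-connection-type</code>, bound to <code class="language-plaintext highlighter-rouge">nil</code> around the call that starts
the subprocess:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-start-process-with-pipe (name buffer program &amp;rest args)
  "Like `start-process', but connect through a pipe instead of a pty."
  ;; nil selects a pipe; a non-nil value selects a pty where available.
  (let ((process-connection-type nil))
    (apply #'start-process name buffer program args)))
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">make-process</code> (Emacs 25 and later) the same choice can be
made per process by passing <code class="language-plaintext highlighter-rouge">:connection-type 'pipe</code>.</p>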

<p>Not only did this completely resolve the truncation issue, Elfeed is
noticeably faster at fetching feeds on all machines. It’s no longer
receiving mountains of XML one line at a time, like sucking pudding
through a straw. It’s now quite zippy even on my Raspberry Pi 2, which
had <em>never</em> been the case before (without the “search failed” bug).
Even if you were never affected by this bug, you will benefit from the
fix.</p>

<p>I haven’t officially reported this as an Emacs bug yet because
reproducibility is still an issue. It needs something better than
“fire off a bunch of HTTP requests across the internet in parallel
from a Raspberry Pi.”</p>

<p>The fix reminds me of that <a href="https://www.buzzmaven.com/old-engineer-hammer-2/">old boilermaker story</a> about
charging a lot of money just to swing a hammer. Once the problem
arose, <strong>DTrace quickly helped to identify the place to hit Emacs with
the hammer</strong>.</p>

<p><em>Finally, a big thanks to alphapapa for originally taking the time to
report this bug months ago.</em></p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>What's in an Emacs Lambda</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/12/14/"/>
    <id>urn:uuid:efcc8cf7-11d3-3bd3-9fc9-a23e80f7bf33</id>
    <updated>2017-12-14T18:18:57Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="compsci"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>There was recently some <a href="https://old.reddit.com/r/emacs/comments/7h23ed/dynamically_construct_a_lambda_function/">interesting discussion</a> about correctly
using backquotes to express a mixture of data and code. Since lambda
expressions <em>seem</em> to evaluate to themselves, what’s the difference?
For example, an association list of operations:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">'</span><span class="p">((</span><span class="nv">add</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">sub</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">mul</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">div</span> <span class="o">.</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
</code></pre></div></div>

<p>It looks like it would work, and indeed it does work in this case.
However, there are good reasons to actually evaluate those lambda
expressions. Eventually invoking the lambda expressions in the quoted
form above is equivalent to using <code class="language-plaintext highlighter-rouge">eval</code>. So, instead, prefer the
backquote form:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">`</span><span class="p">((</span><span class="nv">add</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">sub</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">mul</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">div</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
</code></pre></div></div>
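
<p>As a small usage sketch, a table built this way holds real function
objects that can be looked up and passed straight to <code class="language-plaintext highlighter-rouge">funcall</code>. The
names <code class="language-plaintext highlighter-rouge">my-ops</code> and <code class="language-plaintext highlighter-rouge">my-apply-op</code> are invented for this example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defvar my-ops
  `((add . ,(lambda (a b) (+ a b)))
    (sub . ,(lambda (a b) (- a b))))
  "Hypothetical table mapping operation names to function objects.")

(defun my-apply-op (name a b)
  "Look up NAME in `my-ops' and call its function with A and B."
  (funcall (cdr (assq name my-ops)) a b))

(my-apply-op 'add 2 3)
;; =&gt; 5
</code></pre></div></div>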

<p>There are a lot of interesting things to say about this, but let’s
first reduce it to two very simple cases:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
</code></pre></div></div>

<p>What’s the difference between these two forms? The first is a lambda
expression, and it evaluates to a function object. The other is a quoted
list that <em>looks like</em> a lambda expression, and it evaluates to a list —
a piece of data.</p>

<p>A naive evaluation of these expressions in <code class="language-plaintext highlighter-rouge">*scratch*</code> (<code class="language-plaintext highlighter-rouge">C-x C-e</code>)
suggests they are identical, and so it would seem that quoting a
lambda expression doesn’t really matter:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>However, there are two common situations where this is not the case:
<strong>byte compilation</strong> and <strong>lexical scope</strong>.</p>

<h3 id="lambda-under-byte-compilation">Lambda under byte compilation</h3>

<p>It’s a little trickier to evaluate these forms byte compiled in the
scratch buffer since that doesn’t happen automatically. But if it did,
it would look like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: nil; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; #[(x) "\010\207" [x] 1]</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">#[...]</code> is the syntax for a byte-code function object. As
discussed in detail in <a href="/blog/2014/01/04/">my byte-code internals article</a>, it’s a
special vector object that contains byte-code, and other metadata, for
evaluation by Emacs’ virtual stack machine. Elisp is one of very few
languages with <a href="/blog/2013/12/30/">readable function objects</a>, and this feature is
core to its ahead-of-time byte compilation.</p>

<p>The quote, by definition, prevents evaluation, and so inhibits byte
compilation of the lambda expression. It’s vital that the byte compiler
does not try to guess the programmer’s intent and compile the expression
anyway, since that would interfere with lists that just so happen to
look like lambda expressions — i.e. any list containing the <code class="language-plaintext highlighter-rouge">lambda</code>
symbol.</p>

<p>There are three reasons you want your lambda expressions to get byte
compiled:</p>

<ul>
  <li>
    <p>Byte-compiled functions are significantly faster. That’s the main
purpose for byte compilation after all.</p>
  </li>
  <li>
    <p>The compiler performs static checks, producing warnings and errors
ahead of time. This lets you spot certain classes of problems before
they occur. The static analysis is even better under lexical scope due
to its tighter semantics.</p>
  </li>
  <li>
    <p>Under lexical scope, byte-compiled closures may use less memory. More
specifically, they won’t accidentally keep objects alive longer than
necessary. I’ve never seen a name for this implementation issue, but I
call it <em>overcapturing</em>. More on this later.</p>
  </li>
</ul>

<p>While it’s common for personal configurations to skip byte compilation,
Elisp should still generally be written as if it were going to be byte
compiled. General rule of thumb: <strong>Ensure your lambda expressions are
actually evaluated.</strong></p>

<h3 id="lambda-in-lexical-scope">Lambda in lexical scope</h3>

<p>As I’ve stressed many times, <a href="/blog/2016/12/22/">you should <em>always</em> use lexical
scope</a>. There’s no practical disadvantage or trade-off involved.
Just do it.</p>

<p>Once lexical scope is enabled, the two expressions diverge even without
byte compilation:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure (t) (x) x)</span>

<span class="o">'</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>
<span class="c1">;; =&gt; (lambda (x) x)</span>
</code></pre></div></div>

<p>Under lexical scope, lambda expressions evaluate to <em>closures</em>.
Closures capture their lexical environment in their closure object —
nothing in this particular case. It’s a type of function object,
making it a valid first argument to <code class="language-plaintext highlighter-rouge">funcall</code>.</p>

<p>Since the quote prevents the second expression from being evaluated,
semantically it evaluates to a list that just so happens to look like
a (non-closure) function object. <strong>Invoking a <em>data</em> object as a
function is like using <code class="language-plaintext highlighter-rouge">eval</code></strong> — i.e. executing data as code.
Everyone already knows <code class="language-plaintext highlighter-rouge">eval</code> should not be used lightly.</p>

<p>It’s a little more interesting to look at a closure that actually
captures a variable, so here’s a definition for <code class="language-plaintext highlighter-rouge">constantly</code>, a
higher-order function that returns a closure that accepts any number of
arguments and returns a particular constant:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">constantly</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="nv">x</span><span class="p">))</span>
</code></pre></div></div>

<p>Without byte compiling it, here’s an example of its return value:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((x . :foo) t) (&amp;rest _) x)</span>
</code></pre></div></div>

<p>The environment has been captured as an association list (with a
trailing <code class="language-plaintext highlighter-rouge">t</code>), and we can plainly see that the variable <code class="language-plaintext highlighter-rouge">x</code> is bound to
the symbol <code class="language-plaintext highlighter-rouge">:foo</code> in this closure. Consider that we could manipulate
this data structure (e.g. <code class="language-plaintext highlighter-rouge">setcdr</code> or <code class="language-plaintext highlighter-rouge">setf</code>) to change the binding of
<code class="language-plaintext highlighter-rouge">x</code> for this closure. <em>This is essentially how closures mutate their own
environment.</em> Moreover, closures from the same environment share
structure, so such mutations are also shared. More on this later.</p>

<p>Semantically, closures are distinct objects (via <code class="language-plaintext highlighter-rouge">eq</code>), even if the
variables they close over are bound to the same value. This is because
they each have a distinct environment attached to them, even if in
some invisible way.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">)</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:foo</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Without byte compilation, this is true <em>even when there’s no lexical
environment to capture</em>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">dummy</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">dummy</span><span class="p">)</span> <span class="p">(</span><span class="nv">dummy</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>The byte compiler is smart, though. <a href="/blog/2017/01/30/">As an optimization</a>, the
same closure object is reused when possible, avoiding unnecessary
work, including multiple object allocations. Though this is a bit of
an abstraction leak. A function can (ab)use this to introspect whether
it’s been byte compiled:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">have-i-been-compiled-p</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">funcs</span> <span class="p">(</span><span class="nb">vector</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">2</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="nv">i</span><span class="p">)</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">())))</span>
    <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">funcs</span> <span class="mi">1</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">have-i-been-compiled-p</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'have-i-been-compiled-p</span><span class="p">)</span>

<span class="p">(</span><span class="nv">have-i-been-compiled-p</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<p>The trick here is to evaluate the exact same non-capturing lambda
expression twice, which requires a loop (or at least some sort of
branch). <em>Semantically</em> we should think of these closures as being
distinct objects, but, if we squint our eyes a bit, we can see the
effects of the behind-the-scenes optimization.</p>

<p>Don’t actually do this in practice, of course. That’s what
<code class="language-plaintext highlighter-rouge">byte-code-function-p</code> is for, which won’t rely on a subtle
implementation detail.</p>
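
<p>As a quick illustration of that predicate, here’s a sketch using a
throwaway function (<code class="language-plaintext highlighter-rouge">my-identity</code> is a made-up name). It assumes the
definition is evaluated from source, e.g. in the scratch buffer, rather
than loaded from an already compiled file:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-identity (x) x)

;; Still interpreted at this point.
(byte-code-function-p (symbol-function 'my-identity))
;; =&gt; nil

;; Compile the definition in place.
(byte-compile 'my-identity)

(byte-code-function-p (symbol-function 'my-identity))
;; =&gt; t
</code></pre></div></div>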

<h3 id="overcapturing">Overcapturing</h3>

<p>I mentioned before that one of the potential gotchas of not byte
compiling your lambda expressions is overcapturing closure variables in
the interpreter.</p>

<p>To evaluate Lisp code, Emacs has both an interpreter and a virtual
machine. The interpreter evaluates code in list form: cons cells,
numbers, symbols, etc. The byte compiler is like the interpreter, but
instead of directly executing those forms, it emits byte-code that, when
evaluated by the virtual machine, produces identical visible results to
the interpreter — <em>in theory</em>.</p>

<p>What this means is that <strong>Emacs contains two different implementations
of Emacs Lisp</strong>, one in the interpreter and one in the byte compiler.
The Emacs developers have been maintaining and expanding these
implementations side-by-side for decades. A pitfall to this approach
is that the <em>implementations can, and do, diverge in their behavior</em>.
We saw this above with that introspective function, and it <a href="/blog/2013/01/22/">comes up
in practice with advice</a>.</p>

<p>Another way they diverge is in closure variable capture. For example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">overcapture</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">when</span> <span class="nv">y</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">x</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">overcapture</span> <span class="ss">:x</span> <span class="ss">:some-big-value</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((y . :some-big-value) (x . :x) t) nil x)</span>
</code></pre></div></div>

<p>Notice that the closure captured <code class="language-plaintext highlighter-rouge">y</code> even though it’s unnecessary.
This is because the interpreter doesn’t, and shouldn’t, take the time
to analyze the body of the lambda to determine which variables should
be captured. That would need to happen at run-time each time the
lambda is evaluated, which would make the interpreter much slower.
Overcapturing can get pretty messy if macros are introducing their own
hidden variables.</p>

<p>On the other hand, the byte compiler can do this analysis just once at
compile-time. And it’s already doing the analysis as part of its job.
It can avoid this problem easily:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">overcapture</span> <span class="ss">:x</span> <span class="ss">:some-big-value</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" [:x] 1]</span>
</code></pre></div></div>

<p>It’s clear that <code class="language-plaintext highlighter-rouge">:some-big-value</code> isn’t present in the closure.</p>

<p>But… how does this work?</p>

<h3 id="how-byte-compiled-closures-are-constructed">How byte compiled closures are constructed</h3>

<p>Recall from the <a href="/blog/2014/01/04/">internals article</a> that the four core elements of a
byte-code function object are:</p>

<ol>
  <li>Parameter specification</li>
  <li>Byte-code string (opcodes)</li>
  <li>Constants vector</li>
  <li>Maximum stack usage</li>
</ol>
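
<p>Since byte-code function objects support <code class="language-plaintext highlighter-rouge">aref</code>, these four slots
can be pulled out directly. A minimal sketch, with <code class="language-plaintext highlighter-rouge">my-byte-code-slots</code>
as a hypothetical helper name; the exact values depend on the Emacs
version and on whether lexical scope is enabled:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-byte-code-slots (fn)
  "Return the four core elements of byte-code function object FN."
  (list (aref fn 0)     ; 1. parameter specification
        (aref fn 1)     ; 2. byte-code string (opcodes)
        (aref fn 2)     ; 3. constants vector
        (aref fn 3)))   ; 4. maximum stack usage

(my-byte-code-slots (byte-compile (lambda (x) x)))
</code></pre></div></div>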

<p>While creating a closure <em>seems</em> like compiling a whole new function each time
the lambda expression is evaluated, there’s actually not that much to
it! Namely, <a href="/blog/2017/01/08/">the <em>behavior</em> of the function remains the same</a>. Only
the closed-over environment changes.</p>

<p>What this means is that closures produced by a common lambda
expression can all share the same byte-code string (second element).
Their bodies are identical, so they compile to the same byte-code.
Where they differ is in their constants vector (third element), which
gets filled out according to the closed-over environment. It’s clear
just from examining the outputs:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">constantly</span> <span class="ss">:a</span><span class="p">)</span>
<span class="c1">;; =&gt; #[128 "\300\207" [:a] 2]</span>

<span class="p">(</span><span class="nb">constantly</span> <span class="ss">:b</span><span class="p">)</span>
<span class="c1">;; =&gt; #[128 "\300\207" [:b] 2]</span>

</code></pre></div></div>
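
<p>The same comparison can be made programmatically. This sketch
assumes lexical scope and a byte-compiled generator, and it uses the
made-up name <code class="language-plaintext highlighter-rouge">my-constantly</code> to avoid clobbering anything. Going by
the outputs above, I’d expect the first comparison to be true and the
second to be false:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun my-constantly (x)
  (lambda (&amp;rest _) x))

(byte-compile 'my-constantly)

(let ((a (my-constantly :a))
      (b (my-constantly :b)))
  (list (equal (aref a 1) (aref b 1))    ; compare byte-code strings
        (equal (aref a 2) (aref b 2))))  ; compare constants vectors
</code></pre></div></div>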

<p><code class="language-plaintext highlighter-rouge">constantly</code> has three of the four components of the closure in its own
constant pool. Its job is to construct the constants vector, and then
assemble the whole thing into a byte-code function object (<code class="language-plaintext highlighter-rouge">#[...]</code>).
Here it is with <code class="language-plaintext highlighter-rouge">M-x disassemble</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  make-byte-code
1       constant  128
2       constant  "\300\207"
4       constant  vector
5       stack-ref 4
6       call      1
7       constant  2
8       call      4
9       return
</code></pre></div></div>

<p>(Note: since the byte compiler doesn’t produce perfectly optimal code, I’ve
simplified it for this discussion.)</p>

<p>It pushes most of its constants on the stack. Then the <code class="language-plaintext highlighter-rouge">stack-ref 4</code> (5)
puts <code class="language-plaintext highlighter-rouge">x</code> on the stack. Then it calls <code class="language-plaintext highlighter-rouge">vector</code> to create the constants
vector (6). Finally, it constructs the function object (<code class="language-plaintext highlighter-rouge">#[...]</code>) by
calling <code class="language-plaintext highlighter-rouge">make-byte-code</code> (8).</p>

<p>Since this might be clearer, here’s the same thing expressed back in
terms of Elisp:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">constantly</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">128</span> <span class="s">"\300\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">x</span><span class="p">)</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p>To see the disassembly of the closure’s byte-code:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span> <span class="p">(</span><span class="nb">constantly</span> <span class="ss">:x</span><span class="p">))</span>
</code></pre></div></div>

<p>The result isn’t very surprising:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  :x
1       return
</code></pre></div></div>

<p>Things get a little more interesting when mutation is involved. Consider
this adder closure generator, which mutates its environment every time
it’s called:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">total</span> <span class="mi">0</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">cl-incf</span> <span class="nv">total</span><span class="p">))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">count</span> <span class="p">(</span><span class="nv">adder</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="nb">count</span><span class="p">))</span>
<span class="c1">;; =&gt; 3</span>

<span class="p">(</span><span class="nv">adder</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\211\242T\240\207" [(0)] 2]</span>
</code></pre></div></div>

<p>The adder essentially works like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\211\242T\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">0</span><span class="p">))</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p><em>In theory</em>, this closure could operate by mutating its constants vector
directly. But that wouldn’t be much of a <em>constants</em> vector, now would
it!? Instead, mutated variables are <em>boxed</em> inside a cons cell. Closures
don’t share constants vectors, so the main reason for boxing is to share
variables between closures from the same environment. That is, they have
the same cons in each of their constants vectors.</p>

<p>There’s no equivalent Elisp for the closure in <code class="language-plaintext highlighter-rouge">adder</code>, so here’s the
disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  (0)
1       dup
2       car-safe
3       add1
4       setcar
5       return
</code></pre></div></div>

<p>It puts two references to the boxed integer on the stack (<code class="language-plaintext highlighter-rouge">constant</code>,
<code class="language-plaintext highlighter-rouge">dup</code>), unboxes the top one (<code class="language-plaintext highlighter-rouge">car-safe</code>), increments that unboxed
integer, stores it back in the box (<code class="language-plaintext highlighter-rouge">setcar</code>) via the bottom reference,
leaving the incremented value behind to be returned.</p>

<p>This all gets a little more interesting when closures interact:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fancy-adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">total</span> <span class="mi">0</span><span class="p">))</span>
    <span class="o">`</span><span class="p">(</span><span class="ss">:add</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">cl-incf</span> <span class="nv">total</span><span class="p">))</span>
      <span class="ss">:set</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">setf</span> <span class="nv">total</span> <span class="nv">v</span><span class="p">))</span>
      <span class="ss">:get</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">total</span><span class="p">))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">counter</span> <span class="p">(</span><span class="nv">fancy-adder</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:set</span><span class="p">)</span> <span class="mi">100</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:add</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:add</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">counter</span> <span class="ss">:get</span><span class="p">)))</span>
<span class="c1">;; =&gt; 102</span>

<span class="p">(</span><span class="nv">fancy-adder</span><span class="p">)</span>
<span class="c1">;; =&gt; (:add #[0 "\300\211\242T\240\207" [(0)] 2]</span>
<span class="c1">;;     :set #[257 "\300\001\240\207" [(0)] 3]</span>
<span class="c1">;;     :get #[0 "\300\242\207" [(0)] 1])</span>
</code></pre></div></div>

<p>This is starting to resemble object oriented programming, with methods
acting upon fields stored in a common, closed-over environment.</p>

<p>All three closures share a common variable, <code class="language-plaintext highlighter-rouge">total</code>. Since I didn’t
use <code class="language-plaintext highlighter-rouge">print-circle</code>, this isn’t obvious from the last result, but all
of those <code class="language-plaintext highlighter-rouge">(0)</code> conses are the same object. When one closure mutates
the box, they all see the change. Here’s essentially how <code class="language-plaintext highlighter-rouge">fancy-adder</code>
is transformed by the byte compiler:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fancy-adder</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">box</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">0</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">list</span> <span class="ss">:add</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\211\242T\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span>
          <span class="ss">:set</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">257</span> <span class="s">"\300\001\240\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">3</span><span class="p">)</span>
          <span class="ss">:get</span> <span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">"\300\242\207"</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">box</span><span class="p">)</span> <span class="mi">1</span><span class="p">))))</span>
</code></pre></div></div>

<p>The backquote in the original <code class="language-plaintext highlighter-rouge">fancy-adder</code> brings this article full
circle. This final example wouldn’t work correctly if those lambdas
weren’t evaluated properly.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Make Flet Great Again</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/10/27/"/>
    <id>urn:uuid:46576058-9269-392b-96b2-0f434cbb87a2</id>
    <updated>2017-10-27T21:02:58Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Do you long for the days before Emacs 24.3 when <code class="language-plaintext highlighter-rouge">flet</code> was dynamically
scoped? Well, you probably shouldn’t since there are <a href="/blog/2016/12/22/">some very good
reasons</a> for lexical scope. But, still, a dynamically scoped <code class="language-plaintext highlighter-rouge">flet</code>
is really useful in some situations, particularly in unit testing. The good
news is that it’s trivial to get this original behavior back without
relying on deprecated functions or third-party packages.</p>

<p>But first, what is <code class="language-plaintext highlighter-rouge">flet</code> and what does it mean for it to be
dynamically scoped? The name stands for “function let” (or something
to that effect). It’s a macro to bind named functions within a local
scope, just as <code class="language-plaintext highlighter-rouge">let</code> binds variables within some local scope. It’s
provided by the now-deprecated <code class="language-plaintext highlighter-rouge">cl</code> package.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl</span><span class="p">)</span>  <span class="c1">; deprecated!</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">norm</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">square</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">v</span> <span class="nv">v</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">square</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nv">square</span> <span class="nv">y</span><span class="p">)))))</span>
</code></pre></div></div>

<p>However, a gotcha here is that <code class="language-plaintext highlighter-rouge">square</code> is visible not just to the body
of <code class="language-plaintext highlighter-rouge">norm</code> but also to any function called directly or indirectly from
the <code class="language-plaintext highlighter-rouge">flet</code> body. That’s dynamic scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">v</span> <span class="mi">2</span><span class="p">)))</span>  <span class="c1">; close enough</span>
  <span class="p">(</span><span class="nv">norm</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">))</span>
<span class="c1">;; -&gt; 4</span>
</code></pre></div></div>

<p>Note: This works because <code class="language-plaintext highlighter-rouge">sqrt</code> hasn’t (yet?) been assigned a bytecode
opcode. One weakness with <code class="language-plaintext highlighter-rouge">flet</code> is that, due to being dynamically
scoped, it is unable to define or override functions whose calls
evaporate under byte compilation. For example, addition:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add-with-flet</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nb">+</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="ss">:override</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">add-with-flet</span><span class="p">)</span>
<span class="c1">;; -&gt; :override</span>

<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">add-with-flet</span><span class="p">))</span>
<span class="c1">;; -&gt; 6</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">+</code> has its own opcode, the function call is eliminated under
byte-compilation and <code class="language-plaintext highlighter-rouge">flet</code> can’t do its job. This is similar to <a href="/blog/2013/01/22/">these
same functions being <em>unadvisable</em></a>.</p>

<h3 id="cl-lib-and-cl-flet">cl-lib and cl-flet</h3>

<p>The <code class="language-plaintext highlighter-rouge">cl-lib</code> package introduced in Emacs 24.3, replacing <code class="language-plaintext highlighter-rouge">cl</code>, adds a
namespace prefix, <code class="language-plaintext highlighter-rouge">cl-</code>, to all of these Common Lisp style functions.
In most cases this was the only change. One exception is <code class="language-plaintext highlighter-rouge">cl-flet</code>,
which has different semantics: It’s lexically scoped, just like in
Common Lisp. Its bindings aren’t visible outside of the <code class="language-plaintext highlighter-rouge">cl-flet</code>
body.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="p">(</span><span class="nv">cl-flet</span> <span class="p">((</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">v</span> <span class="mi">2</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">norm</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">))</span>
<span class="c1">;; -&gt; 2.8284271247461903</span>
</code></pre></div></div>

<p>In most cases <em>this is what you actually want</em>. The old <code class="language-plaintext highlighter-rouge">flet</code> subtly
changes the environment for all functions called directly or
indirectly from its body.</p>

<p>Besides being cleaner and less error prone, <code class="language-plaintext highlighter-rouge">cl-flet</code> also doesn’t
have special exceptions for functions with assigned opcodes. At
macro-expansion time it walks the body, taking its action before the
byte-compiler can interfere.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">add-with-cl-flet</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">cl-flet</span> <span class="p">((</span><span class="nb">+</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="ss">:override</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">add-with-cl-flet</span><span class="p">)</span>
<span class="c1">;; -&gt; :override</span>

<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">add-with-cl-flet</span><span class="p">))</span>
<span class="c1">;; -&gt; :override</span>
</code></pre></div></div>

<p>In order for it to work properly, it’s essential that functions are
quoted with sharp-quotes (<code class="language-plaintext highlighter-rouge">#'</code>) so that the macro can tell the
difference between functions and symbols. Just make a general habit of
sharp-quoting functions.</p>
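
<p>As a minimal sketch of why the sharp-quote matters, assume a
hypothetical local function named <code class="language-plaintext highlighter-rouge">demo-double</code> that has no global
definition. Sharp-quoted, the name resolves to the <code class="language-plaintext highlighter-rouge">cl-flet</code> binding.
With a plain quote it’s only a symbol, so <code class="language-plaintext highlighter-rouge">mapcar</code> consults the global
function cell and signals <code class="language-plaintext highlighter-rouge">void-function</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Sharp-quoted: refers to the local cl-flet binding.
;; demo-double is a hypothetical name with no global definition.
(cl-flet ((demo-double (x) (* 2 x)))
  (mapcar #'demo-double '(1 2 3)))
;; -&gt; (2 4 6)

;; Plain quote: just a symbol, so mapcar looks in the global
;; function cell, where demo-double was never defined.
(condition-case nil
    (cl-flet ((demo-double (x) (* 2 x)))
      (mapcar 'demo-double '(1 2 3)))
  (void-function :not-visible))
;; -&gt; :not-visible
</code></pre></div></div>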

<p>In unit testing, temporarily overriding functions for all of Emacs is
useful, so <code class="language-plaintext highlighter-rouge">flet</code> still has some uses. But it’s deprecated!</p>

<h3 id="unit-testing-with-flet">Unit testing with flet</h3>

<p>Since Emacs can do anything, suppose there is an Emacs package that
makes sandwiches. In this package there’s an interactive function to
set the default sandwich cheese.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">default-cheese</span> <span class="ss">'cheddar</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">set-default-cheese</span> <span class="p">(</span><span class="k">type</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">options</span> <span class="o">'</span><span class="p">(</span><span class="s">"cheddar"</span> <span class="s">"swiss"</span> <span class="s">"american"</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">input</span> <span class="p">(</span><span class="nv">completing-read</span> <span class="s">"Cheese: "</span> <span class="nv">options</span> <span class="no">nil</span> <span class="no">t</span><span class="p">)))</span>
     <span class="p">(</span><span class="nb">when</span> <span class="nv">input</span>
       <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">intern</span> <span class="nv">input</span><span class="p">)))))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">default-cheese</span> <span class="k">type</span><span class="p">))</span>
</code></pre></div></div>

<p>Since it’s interactive, it uses <code class="language-plaintext highlighter-rouge">completing-read</code> to prompt the user
for input. A unit test could call this function non-interactively, but
perhaps we’d also like to test the interactive path. The code inside
<code class="language-plaintext highlighter-rouge">interactive</code> occasionally gets messy and may warrant testing. It
would obviously be inconvenient to prompt the user for input during
testing, and it wouldn’t work at all in batch mode (<code class="language-plaintext highlighter-rouge">-batch</code>).</p>

<p>With <code class="language-plaintext highlighter-rouge">flet</code> we can stub out <code class="language-plaintext highlighter-rouge">completing-read</code> just for the unit test:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">test-set-default-cheese</span> <span class="p">()</span>
  <span class="c1">;; protect original with dynamic binding</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">default-cheese</span><span class="p">)</span>
    <span class="c1">;; simulate user entering "american"</span>
    <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">completing-read</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'american</span> <span class="nv">default-cheese</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">default-cheese</code> was defined with <code class="language-plaintext highlighter-rouge">defvar</code>, it will be
dynamically scoped despite <code class="language-plaintext highlighter-rouge">let</code> normally using lexical scope in this
example. Both of the <em>side effects</em> of the tested function — setting a
global variable and prompting the user — are captured using a
combination of <code class="language-plaintext highlighter-rouge">let</code> and <code class="language-plaintext highlighter-rouge">flet</code>.</p>

<p>Since <code class="language-plaintext highlighter-rouge">cl-flet</code> is lexically scoped, it cannot serve this purpose. If
<code class="language-plaintext highlighter-rouge">flet</code> is deprecated and <code class="language-plaintext highlighter-rouge">cl-flet</code> can’t do the job, what’s the right
way to fix it? The answer lies in <em>generalized variables</em>.</p>

<h3 id="cl-letf">cl-letf</h3>

<p>What’s <em>really</em> happening inside <code class="language-plaintext highlighter-rouge">flet</code> is it’s globally binding a
function name to a different function, evaluating the body, and
rebinding it back to the original definition when the body completes.
It macro-expands to something like this:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">original</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span>
        <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">))</span>
  <span class="p">(</span><span class="k">unwind-protect</span>
      <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span> <span class="nv">original</span><span class="p">)))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">unwind-protect</code> ensures the original function is rebound even if
the body of the call were to fail. This is very much a <code class="language-plaintext highlighter-rouge">let</code>-like
pattern, and I’m using <code class="language-plaintext highlighter-rouge">symbol-function</code> as a generalized variable via
<code class="language-plaintext highlighter-rouge">setf</code>. Is there a generalized variable version of <code class="language-plaintext highlighter-rouge">let</code>?</p>

<p>Yes! It’s called <code class="language-plaintext highlighter-rouge">cl-letf</code>! In this case the <code class="language-plaintext highlighter-rouge">f</code> suffix is analogous
to the <code class="language-plaintext highlighter-rouge">f</code> suffix in <code class="language-plaintext highlighter-rouge">setf</code>. That form above can be reduced to a more
general form:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-letf</span> <span class="p">(((</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span>
           <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">))</span>
</code></pre></div></div>

<p>And <em>that’s</em> the way to reproduce the dynamically scoped behavior of
<code class="language-plaintext highlighter-rouge">flet</code> since Emacs 24.3. There’s nothing complicated about it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">test-set-default-cheese</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">default-cheese</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">cl-letf</span> <span class="p">(((</span><span class="nb">symbol-function</span> <span class="ss">'completing-read</span><span class="p">)</span>
               <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">_</span><span class="p">)</span> <span class="s">"american"</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">call-interactively</span> <span class="nf">#'</span><span class="nv">set-default-cheese</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'american</span> <span class="nv">default-cheese</span><span class="p">)))))</span>

</code></pre></div></div>
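
<p>The restoration can be checked directly. Here’s a small sketch using
two hypothetical functions, <code class="language-plaintext highlighter-rouge">demo-greeting</code> and <code class="language-plaintext highlighter-rouge">demo-caller</code>, which
aren’t part of the sandwich package. The override should be visible to
an indirect caller, and the original definition should come back even
when the body signals an error:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun demo-greeting () "hello")        ; hypothetical function to override
(defun demo-caller () (demo-greeting))  ; reaches it indirectly

(list
 ;; The override is visible to callees: dynamic scope.
 (cl-letf (((symbol-function 'demo-greeting) (lambda () "stubbed")))
   (demo-caller))
 ;; The original is restored, even after a non-local exit.
 (progn
   (ignore-errors
     (cl-letf (((symbol-function 'demo-greeting) (lambda () "stubbed")))
       (error "Body failed")))
   (demo-caller)))
;; -&gt; ("stubbed" "hello")
</code></pre></div></div>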

<p>Keep in mind that this suffers the exact same problem with
bytecode-assigned functions as <code class="language-plaintext highlighter-rouge">flet</code>, and for exactly the same
reasons. If <code class="language-plaintext highlighter-rouge">completing-read</code> were to ever be assigned its own opcode
then <code class="language-plaintext highlighter-rouge">cl-letf</code> would no longer work for this particular example.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Gap Buffers Are Not Optimized for Multiple Cursors</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/09/07/"/>
    <id>urn:uuid:8c80d068-2342-356a-9b78-f180806418a4</id>
    <updated>2017-09-07T01:34:04Z</updated>
    <category term="emacs"/><category term="c"/><category term="vim"/>
    <content type="html">
      <![CDATA[<p>Gap buffers are a common data structure for representing a text buffer
in a text editor. Emacs famously uses gap buffers — long-standing proof
that gap buffers are a perfectly sufficient way to represent a text
buffer.</p>

<ul>
  <li>
    <p>Gap buffers are <em>very</em> easy to implement. A bare minimum
implementation is about 60 lines of C.</p>
  </li>
  <li>
    <p>Gap buffers are especially efficient for the majority of typical
editing commands, which tend to be clustered in a small area.</p>
  </li>
  <li>
    <p>Except for the gap, the content of the buffer is contiguous, making
the search and display implementations simpler and more efficient.
There’s also the potential for most of the gap buffer to be
memory-mapped to the original file, though typical encoding and
decoding operations prevent this from being realized.</p>
  </li>
  <li>
    <p>Due to having contiguous content, saving a gap buffer is basically
just two <code class="language-plaintext highlighter-rouge">write(2)</code> system calls. (Plus <a href="https://www.youtube.com/watch?v=LMe7hf2G1po"><code class="language-plaintext highlighter-rouge">fsync(2)</code>, etc.</a>)</p>
  </li>
</ul>

<p>A gap buffer is really a pair of buffers where one buffer holds all of
the content before the cursor (or <em>point</em> for Emacs), and the other
buffer holds the content after the cursor. When the cursor is moved
through the buffer, characters are copied from one buffer to the
other. Inserts and deletes close to the gap are very efficient.</p>

<p>Typically it’s implemented as a single large buffer, with the
pre-cursor content at the beginning, the post-cursor content at the
end, and the gap spanning the middle. Here’s an illustration:</p>

<p><img src="/img/gap-buffer/intro.gif" alt="" /></p>

<p>The top of the animation is the display of the text content and cursor
as the user would see it. The bottom is the gap buffer state, where
each character is represented as a gray block, and a literal gap for
the cursor.</p>

<p>Ignoring for a moment more complicated concerns such as undo and
Unicode, a gap buffer could be represented by something as simple as
the following:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">gapbuf</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">total</span><span class="p">;</span>  <span class="cm">/* total size of buf */</span>
    <span class="kt">size_t</span> <span class="n">front</span><span class="p">;</span>  <span class="cm">/* size of content before cursor */</span>
    <span class="kt">size_t</span> <span class="n">gap</span><span class="p">;</span>    <span class="cm">/* size of the gap */</span>
<span class="p">};</span>
</code></pre></div></div>

<p>This is close to <a href="http://git.savannah.gnu.org/cgit/emacs.git/tree/src/buffer.h?h=emacs-25.2#n425">how Emacs represents it</a>. In the structure
above, the size of the content after the cursor isn’t tracked directly,
but can be computed on the fly from the other three quantities. That is
to say, this data structure is <em>normalized</em>.</p>
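<p>To make that concrete, here’s a minimal sketch of insertion and cursor movement on this struct. The function names (<code class="language-plaintext highlighter-rouge">gapbuf_back</code>, <code class="language-plaintext highlighter-rouge">gapbuf_init</code>, <code class="language-plaintext highlighter-rouge">gapbuf_insert</code>, <code class="language-plaintext highlighter-rouge">gapbuf_move</code>) are made up for illustration and are not taken from Emacs. It also assumes a fixed-size buffer, where a real editor would grow the buffer once the gap fills up.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gapbuf {
    char *buf;
    size_t total;  /* total size of buf */
    size_t front;  /* size of content before cursor */
    size_t gap;    /* size of the gap */
};

/* Size of the content after the cursor: derived, never stored. */
size_t gapbuf_back(const struct gapbuf *b)
{
    return b->total - b->front - b->gap;
}

/* Allocate an empty buffer that is all gap. Returns 0 on success. */
int gapbuf_init(struct gapbuf *b, size_t total)
{
    b->buf = malloc(total);
    b->total = total;
    b->front = 0;
    b->gap = total;
    return b->buf ? 0 : -1;
}

/* Insert one character at the cursor. Returns -1 when the gap is full
 * (hypothetical simplification: no reallocation). */
int gapbuf_insert(struct gapbuf *b, char c)
{
    if (b->gap == 0)
        return -1;
    b->buf[b->front++] = c;
    b->gap--;
    return 0;
}

/* Move the cursor to pos by copying content across the gap. */
void gapbuf_move(struct gapbuf *b, size_t pos)
{
    assert(pos <= b->front + gapbuf_back(b));
    if (pos < b->front) {
        /* Moving left: [pos, front) slides to the far side of the gap. */
        memmove(b->buf + pos + b->gap, b->buf + pos, b->front - pos);
    } else {
        /* Moving right: the first bytes after the gap slide before it. */
        memmove(b->buf + b->front, b->buf + b->front + b->gap,
                pos - b->front);
    }
    b->front = pos;
}
```

<p>Note that an insert at the cursor touches a single byte, while moving the cursor copies as many bytes as the distance moved.</p>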

<p>As an optimization, the cursor could be tracked separately from the
gap such that non-destructive cursor movement is essentially free. The
difference between cursor and gap would only need to be reconciled for
a destructive change — an insert or delete.</p>
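<p>A rough sketch of that optimization, under my own assumptions: the struct gains a hypothetical <code class="language-plaintext highlighter-rouge">cursor</code> field, plus a <code class="language-plaintext highlighter-rouge">moves</code> counter that exists only to make the savings visible. The names <code class="language-plaintext highlighter-rouge">lazybuf_seek</code>, <code class="language-plaintext highlighter-rouge">lazybuf_reconcile</code>, and <code class="language-plaintext highlighter-rouge">lazybuf_insert</code> are invented for this example.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct lazybuf {
    char *buf;
    size_t total;   /* total size of buf */
    size_t front;   /* size of content before the gap */
    size_t gap;     /* size of the gap */
    size_t cursor;  /* logical cursor position, independent of the gap */
    long moves;     /* gap relocations so far (illustration only) */
};

/* Cursor movement only updates an integer; no bytes are copied.
 * The caller is responsible for keeping pos within the content. */
void lazybuf_seek(struct lazybuf *b, size_t pos)
{
    b->cursor = pos;
}

/* Bring the gap to the cursor. Only needed before a destructive edit. */
void lazybuf_reconcile(struct lazybuf *b)
{
    assert(b->cursor <= b->total - b->gap);
    if (b->cursor == b->front)
        return;
    if (b->cursor < b->front)
        memmove(b->buf + b->cursor + b->gap, b->buf + b->cursor,
                b->front - b->cursor);
    else
        memmove(b->buf + b->front, b->buf + b->front + b->gap,
                b->cursor - b->front);
    b->front = b->cursor;
    b->moves++;
}

/* Insert at the cursor. Returns -1 if the gap is exhausted. */
int lazybuf_insert(struct lazybuf *b, char c)
{
    if (b->gap == 0)
        return -1;
    lazybuf_reconcile(b);
    b->buf[b->front++] = c;
    b->gap--;
    b->cursor = b->front;
    return 0;
}
```

<p>Any number of seeks in a row costs nothing. The copy is paid once, at the first insert that follows.</p>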

<p>A gap buffer certainly isn’t the only way to do it. For example, the
original <a href="https://ecc-comp.blogspot.com/2015/05/a-brief-glance-at-how-5-text-editors.html">vi used an array of lines</a>, which sort of explains
some of its quirky <a href="http://vimhelp.appspot.com/options.txt.html#'backspace'">line-oriented idioms</a>. The BSD clone of vi, nvi,
<a href="https://en.wikipedia.org/wiki/Nvi">uses an entire database</a> to represent buffers. Vim uses a fairly
complex <a href="https://en.wikipedia.org/wiki/Rope_(data_structure)">rope</a>-like <a href="https://github.com/vim/vim/blob/e723c42836d971180d1bf9f98916966c5543fff1/src/memline.c">data structure</a> with <a href="http://www.free-soft.org/FSM/english/issue01/vim.html">page-oriented
blocks</a>, which may be stored out-of-order in its swap file.</p>

<h3 id="multiple-cursors">Multiple cursors</h3>

<p><a href="http://emacsrocks.com/e13.html"><em>Multiple cursors</em></a> is a fairly recent text editor invention that
has gained a lot of popularity in recent years. It seems every major
editor either has the feature built in or a readily-available
extension. I myself used Magnar Sveen’s <a href="https://github.com/magnars/multiple-cursors.el">well-polished package</a>
for several years. Though obviously the concept didn’t originate in
Emacs or else it would have been called <em>multiple points</em>, which
doesn’t roll off the tongue quite the same way.</p>

<p>The concept is simple: If the same operation needs to be done in many
different places in a buffer, you place a cursor at each position, then
drive them all in parallel using the same commands. It’s super flashy
and great for impressing all your friends.</p>

<p>However, as a result of <a href="/blog/2017/04/01/">improving my typing skills</a>, I’ve
come to the conclusion that <a href="https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db">multiple cursors is all hat and no
cattle</a>. It doesn’t compose well with other editing commands, it
doesn’t scale up to large operations, and it’s got all sorts of flaky
edge cases (off-screen cursors). Nearly anything you can do with
multiple cursors, you can do better with old, well-established editing
paradigms.</p>

<p>Somewhere around 99% of my multiple cursors usage was adding a common
prefix to a contiguous series of lines. As similar brute force
options, Emacs already has rectangular editing, and Vim already has
visual block mode.</p>

<p>The most sophisticated, flexible, and robust alternative is a good old
macro. You can play it back anywhere it’s needed. You can zip it across
a huge buffer. The only downside is that it’s less flashy and so you’ll
get invited to a slightly smaller number of parties.</p>

<p>But if you don’t buy my arguments about multiple cursors being
tasteless, there’s still a good technical argument: <strong>Gap buffers are
not designed to work well in the face of multiple cursors!</strong></p>

<p>For example, suppose we have a series of function calls and we’d like to
add the same set of arguments to each. It’s a classic situation for a
macro or for multiple cursors. Here’s the original code:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo();
bar();
baz();
</code></pre></div></div>

<p>The example is tiny so that it will fit in the animations to come.
Here’s the desired code:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foo(x, y);
bar(x, y);
baz(x, y);
</code></pre></div></div>

<p>With multiple cursors you would place a cursor inside each set of
parentheses, then type <code class="language-plaintext highlighter-rouge">x, y</code>. Visually it looks something like this:</p>

<p><img src="/img/gap-buffer/illusion.gif" alt="" /></p>

<p>Text is magically inserted in parallel in multiple places at a time.
However, if this is a text editor that uses a gap buffer, the
situation underneath isn’t quite so magical. The entire edit doesn’t
happen at once. First the <code class="language-plaintext highlighter-rouge">x</code> is inserted in each location, then the
comma, and so on. The edits are not clustered so nicely.</p>

<p>From the gap buffer’s point of view, here’s what it looks like:</p>

<p><img src="/img/gap-buffer/multicursors.gif" alt="" /></p>

<p>For every individual character insertion the buffer has to visit each
cursor in turn, performing lots of copying back and forth. The more
cursors there are, the worse it gets. For an edit of length <code class="language-plaintext highlighter-rouge">n</code> with
<code class="language-plaintext highlighter-rouge">m</code> cursors, that’s <code class="language-plaintext highlighter-rouge">O(n * m)</code> calls to <code class="language-plaintext highlighter-rouge">memmove(3)</code>. Multiple cursors
scales badly.</p>

<p>Compare that to the old school hacker who can’t be bothered with
something as tacky and <em>modern</em> (eww!) as multiple cursors, instead
choosing to record a macro, then play it back:</p>

<p><img src="/img/gap-buffer/macros.gif" alt="" /></p>

<p>The entire edit is done locally before moving on to the next location.
It’s perfectly in tune with the gap buffer’s expectations, only needing
<code class="language-plaintext highlighter-rouge">O(m)</code> calls to <code class="language-plaintext highlighter-rouge">memmove(3)</code>. Most of the work flows neatly into the
gap.</p>
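<p>The two orderings can be compared with a small counting model. This is only a sketch: it assumes every relocation of the gap costs exactly one <code class="language-plaintext highlighter-rouge">memmove(3)</code> call, it ignores buffer growth, and <code class="language-plaintext highlighter-rouge">count_gap_moves</code> is a hypothetical helper written for this illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Count gap relocations needed to type n characters at each of m cursors.
 * pos[] holds the cursor positions in ascending order and is updated in
 * place. With interleaved set, one character is typed at every cursor
 * before the next character (multiple cursors). Otherwise all n characters
 * are typed at one cursor before moving to the next (macro playback). */
long count_gap_moves(size_t *pos, size_t m, size_t n, int interleaved)
{
    long moves = 0;
    size_t gap = (size_t)-1;  /* gap location unknown at the start */
    size_t outer = interleaved ? n : m;
    size_t inner = interleaved ? m : n;

    assert(pos != NULL);
    for (size_t i = 0; i < outer; i++) {
        for (size_t j = 0; j < inner; j++) {
            size_t c = interleaved ? j : i;  /* which cursor is typing */
            if (pos[c] != gap)
                moves++;   /* the gap has to be moved to this cursor */
            pos[c]++;      /* the cursor advances past the new character */
            gap = pos[c];
            for (size_t k = c + 1; k < m; k++)
                pos[k]++;  /* later cursors shift right by one */
        }
    }
    return moves;
}
```

<p>For the three-line example above, the cursors start at offsets 4, 11, and 18, and <code class="language-plaintext highlighter-rouge">x, y</code> is four characters. The interleaved order costs 12 relocations in this model and the clustered order costs 3.</p>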

<p>So, don’t waste your time with multiple cursors, especially if you’re
using a gap buffer text editor. Instead get more comfortable with your
editor’s macro feature. If your editor doesn’t have a good macro
feature, get a new editor.</p>

<p>If you want to make your own gap buffer animations, here’s the source
code. It includes a tiny gap buffer implementation:</p>

<ul>
  <li><a href="https://github.com/skeeto/gap-buffer-animator">https://github.com/skeeto/gap-buffer-animator</a></li>
</ul>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Vim vs. Emacs: the Working Directory</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/08/22/"/>
    <id>urn:uuid:f27469f8-4731-35b5-1c55-4bbeb200fcad</id>
    <updated>2017-08-22T04:51:36Z</updated>
    <category term="vim"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>Vim and Emacs have different internal models for the current working
directory, and these models influence the overall workflow for each
editor. They decide how files are opened, how shell commands are
executed, and how the build system is operated. These effects even reach
outside the editor to influence the overall structure of the project
being edited.</p>

<p>In the traditional unix model, which was <a href="https://web.archive.org/web/0/https://blogs.msdn.microsoft.com/oldnewthing/20101011-00/?p=12563">eventually adopted</a>
everywhere else, each process has a particular working directory
tracked by the operating system. When a process makes a request to the
operating system using a relative path — a path that doesn’t begin
with a slash — the operating system uses the process’ working
directory to convert the path into an absolute path. When a process
forks, its child starts in the same directory. A process can change
its working directory at any time using <a href="http://pubs.opengroup.org/onlinepubs/9699919799/functions/chdir.html"><code class="language-plaintext highlighter-rouge">chdir(2)</code></a>, though
most programs never need to do it. The most obvious way this system
call is exposed to regular users is through the shell’s built-in <code class="language-plaintext highlighter-rouge">cd</code>
command.</p>
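<p>As a quick illustration of the fork rule, here’s a sketch in C. The helper <code class="language-plaintext highlighter-rouge">child_cwd</code> is hypothetical: it forks a child, has the child report its working directory over a pipe, and hands the result back to the parent. Error handling is kept to a minimum.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and learn which working directory it starts in.
 * Writes a NUL-terminated path into out and returns 0 on success. */
int child_cwd(char *out, size_t len)
{
    int fds[2];
    assert(out != NULL && len > 0);
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the working directory was inherited at fork time. */
        char cwd[PATH_MAX];
        close(fds[0]);
        if (!getcwd(cwd, sizeof(cwd)) ||
            write(fds[1], cwd, strlen(cwd)) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: read the child's report, then reap the child. */
    close(fds[1]);
    ssize_t n = read(fds[0], out, len - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n <= 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

<p>If the parent calls <code class="language-plaintext highlighter-rouge">chdir("/")</code> first, the child reports <code class="language-plaintext highlighter-rouge">/</code> as well.</p>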

<p>Vim’s spiritual heritage is obviously rooted in vi, one of the classic
unix text editors, and the <a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/vi.html">most elaborate text editor standardized by
POSIX</a>. Like vi, Vim closely follows the unix model for working
directories. At any given time Vim has exactly one working directory.
Shell commands that are run within Vim will start in Vim’s working
directory. Like a shell, the <code class="language-plaintext highlighter-rouge">cd</code> ex command changes and queries Vim’s
working directory.</p>

<p>Emacs eschews this model and instead each buffer has its own working
directory tracked using a buffer-local variable, <code class="language-plaintext highlighter-rouge">default-directory</code>.
Emacs internally simulates working directories for its buffers like an
operating system, resolving relative paths itself, giving credence to
the idea that Emacs is an operating system (“lacking only a decent
editor”). Perhaps this model comes from ye olde lisp machines?</p>

<p>In contrast, Emacs’ <code class="language-plaintext highlighter-rouge">M-x cd</code> command manipulates the local variable
and has no effect on the Emacs process’ working directory. In fact,
Emacs completely hides its operating system working directory from
Emacs Lisp. This can cause some trouble if that hidden working
directory happens to be sitting on a filesystem you’d like to unmount.</p>

<p>Vim can be configured to simulate Emacs’ model with its <code class="language-plaintext highlighter-rouge">autochdir</code>
option. When set, Vim will literally <code class="language-plaintext highlighter-rouge">chdir(2)</code> each time the user
changes buffers, switches windows, etc. To the user, this feels just
like Emacs’ model, but this is just a convenience, and the core
working directory model is still the same.</p>

<h3 id="single-instance-editors">Single instance editors</h3>

<p>For most of my Emacs career, I’ve stuck to running a single,
long-lived Emacs instance no matter how many different tasks I’m
touching simultaneously. I start the Emacs daemon shortly after
logging in, and it continues running until I log out — typically only
when the machine is shut down. It’s common to have multiple Emacs
windows (frames) for different tasks, but they’re all bound to the
same daemon process.</p>

<p>While <a href="https://github.com/jwiegley/use-package">with care</a> it’s possible to have a complex, rich Emacs
configuration that doesn’t significantly impact Emacs’ startup time, the
general consensus is that Emacs is slow to start. But since it has a
really solid daemon, this doesn’t matter: hardcore Emacs users only ever
start Emacs occasionally. The rest of the time they’re launching
<code class="language-plaintext highlighter-rouge">emacsclient</code> and connecting to the daemon. Outside of system
administration, it’s the most natural way to use Emacs.</p>

<p>The case isn’t so clear for Vim. Vim is so fast that many users fire
it up on demand and exit when they’ve finished the immediate task. At
the other end of the spectrum, others <a href="https://vimeo.com/4446112">advocate using a single
instance of Vim</a> like running a single Emacs daemon. In <a href="/blog/2017/04/01/">my
initial dive into Vim</a>, I tried the single-instance, Emacs way of
doing things. I set <code class="language-plaintext highlighter-rouge">autochdir</code> out of necessity and pretended each
buffer had its own working directory.</p>

<p>At least for me, this isn’t the right way to use Vim, and it all comes
down to working directories. <strong>I want Vim to be anchored at the
project root</strong> with one Vim instance per project. Everything is
smoother when it happens in the context of the project’s root
directory, from opening files, to running shell commands (<code class="language-plaintext highlighter-rouge">ctags</code> in
particular), to invoking the build system. With <code class="language-plaintext highlighter-rouge">autochdir</code>, these
actions are difficult to do correctly, particularly the last two.</p>

<h3 id="invoking-the-build">Invoking the build</h3>

<p>I suspect Emacs’ model of per-buffer working directories has, in a
<a href="https://en.wikipedia.org/wiki/Linguistic_relativity">Sapir-Whorf</a> sort of way, been responsible for leading developers
towards <a href="/blog/2017/08/20/">poorly-designed, recursive Makefiles</a>. Without a global
concept of working directory, it’s inconvenient to invoke the build
system (<code class="language-plaintext highlighter-rouge">M-x compile</code>) in some particular grandparent directory that
is the root of the project. If each directory has its own Makefile, it
usually makes sense to invoke <code class="language-plaintext highlighter-rouge">make</code> in the same directory as the file
being edited.</p>

<p>Over the years I’ve been reinventing the same solution to this
problem, and it wasn’t until I spent time with Vim and its alternate
working directory model that I truly understood the problem. Emacs
itself has long had a solution lurking deep in its bowels, unseen by
daylight: <em>dominating files</em>. The function I’m talking about is
<code class="language-plaintext highlighter-rouge">locate-dominating-file</code>:</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">(locate-dominating-file FILE NAME)</code></p>

  <p>Look up the directory hierarchy from FILE for a directory containing
NAME. Stop at the first parent directory containing a file NAME, and
return the directory. Return nil if not found. Instead of a string,
NAME can also be a predicate taking one argument (a directory) and
returning a non-nil value if that directory is the one for which we’re
looking.</p>
</blockquote>
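<p>The search itself is simple enough to sketch outside of Emacs. The following C version is only an illustration of the idea, not how Emacs implements it. The name <code class="language-plaintext highlighter-rouge">find_dominating_dir</code> is hypothetical, it accepts only absolute starting paths, it doesn’t support the predicate form of NAME, and it returns the directory without a trailing slash.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Walk upward from the absolute directory start until some directory
 * contains an entry called name. On success, write that directory into
 * out and return 0. Return -1 if the root is reached without a match. */
int find_dominating_dir(const char *start, const char *name,
                        char *out, size_t len)
{
    char dir[PATH_MAX], probe[PATH_MAX];

    assert(start[0] == '/');
    if (snprintf(dir, sizeof(dir), "%s", start) >= (int)sizeof(dir))
        return -1;

    for (;;) {
        int n = snprintf(probe, sizeof(probe), "%s/%s", dir, name);
        if (n > 0 && n < (int)sizeof(probe) && access(probe, F_OK) == 0) {
            if (snprintf(out, len, "%s", dir) >= (int)len)
                return -1;  /* result does not fit in out */
            return 0;
        }

        char *slash = strrchr(dir, '/');  /* never NULL: dir starts with '/' */
        if (slash == dir) {
            if (dir[1] == '\0')
                return -1;   /* already checked the root itself */
            dir[1] = '\0';   /* next candidate is the root */
        } else {
            *slash = '\0';   /* strip the last path component */
        }
    }
}
```

<p>Given a start of <code class="language-plaintext highlighter-rouge">/home/user/project/src/util</code> and a name of <code class="language-plaintext highlighter-rouge">Makefile</code> (paths invented for the example), this probes <code class="language-plaintext highlighter-rouge">util</code>, then <code class="language-plaintext highlighter-rouge">src</code>, then <code class="language-plaintext highlighter-rouge">project</code>, and so on, stopping at the first directory that has a <code class="language-plaintext highlighter-rouge">Makefile</code>.</p>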

<p>The trouble with invoking the build system at the project root is that
Emacs doesn’t really have a concept of a project root. It doesn’t know
where it is or how to find it. The vi model inherited by Vim is to
leave the working directory at the project root. While Vim can
simulate Emacs’ working directory model, Emacs cannot (currently)
simulate Vim’s model.</p>

<p>Instead, given a file name unique to the project’s root (i.e. a
“dominating” file) such as <code class="language-plaintext highlighter-rouge">Makefile</code> or <code class="language-plaintext highlighter-rouge">build.xml</code>,
<code class="language-plaintext highlighter-rouge">locate-dominating-file</code> can discover the project root. All that’s
left is wrapping <code class="language-plaintext highlighter-rouge">M-x compile</code> so that <code class="language-plaintext highlighter-rouge">default-directory</code> is
temporarily adjusted to the project’s root.</p>

<p>That looks <em>very</em> roughly like this (and needs more work):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-compile</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">default-directory</span> <span class="p">(</span><span class="nv">locate-dominating-file</span> <span class="s">"."</span> <span class="s">"Makefile"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">compile</span> <span class="s">"make"</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s a pattern I’ve used <a href="https://github.com/skeeto/.emacs.d/blob/e8af63ca3585598f5e509bc274e0bb3b875206d3/lisp/ctags.el#L40">again</a> and <a href="https://github.com/skeeto/.emacs.d/blob/e8af63ca3585598f5e509bc274e0bb3b875206d3/etc/compile-bind.el#L38">again</a> and
<a href="https://github.com/skeeto/ant-project-mode/blob/335070891f1fabe8d3205418374a68bb13cec8c0/ant-project-mode.el#L211">again</a>, working against the same old friction. By running one
Vim instance per project at the project’s root, I get the correct
behavior for free.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>My Journey with Touch Typing and Vim</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/04/01/"/>
    <id>urn:uuid:985ef250-4b1f-3ec0-76a4-79406f3e993e</id>
    <updated>2017-04-01T04:02:08Z</updated>
    <category term="vim"/><category term="emacs"/><category term="meatspace"/>
    <content type="html">
      <![CDATA[<p><em>Given the title, the publication date of this article is probably
really confusing. This was deliberate.</em></p>

<p>Three weeks ago I made a conscious decision to improve my typing
habits. You see, I had <a href="http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html">a dirty habit</a>. Despite spending literally
decades typing on a daily basis, I’ve been a weak typist. It wasn’t
exactly finger pecking, nor did it require looking down at the
keyboard as I typed, but rather a six-finger dance I developed
organically over the years. My technique was optimized towards Emacs’
frequent use of CTRL and ALT combinations, avoiding most of the hand
scrunching. It was fast enough to keep up with my thinking most of the
time, but was ultimately limiting due to its poor accuracy. I was
hitting the wrong keys far too often.</p>

<p>My prime motivation was to learn Vim — or, more specifically, to learn
modal editing. Lots of people swear by it, including people whose
opinions I hold in high regard. The modal editing community is without
a doubt larger than the Emacs community, especially since, thanks to
Viper and <a href="https://github.com/emacs-evil/evil">Evil</a>, a subset of the Emacs community is also part
of the modal editing community. There’s obviously <em>something</em>
significantly valuable about it, and I wanted to understand what that
was.</p>

<p>But I was a lousy typist who couldn’t hit the right keys often enough to
make effective use of modal editing. I would need to learn touch typing
first.</p>

<h3 id="touch-typing">Touch typing</h3>

<p>How would I learn? Well, the first search result for “online touch
typing course” was <a href="https://www.typingclub.com/">Typing Club</a>, so that’s what I went with. By
the way, here’s my official review: “Good enough not to bother
checking out the competition.” For a website it’s pretty much the
ultimate compliment, but it’s not exactly the sort of thing you’d want
to hear from your long-term partner.</p>

<p>My hard rule was that I would immediately abandon my old habits cold
turkey. Poor typing is a bad habit just like smoking, minus the cancer
and weakened sense of smell. It was vital that I unlearn all that old
muscle memory. That included not just my six-finger dance, but also my
<a href="http://www.nethack.org/">NetHack</a> muscle memory. NetHack uses “hjkl” for navigation just
like Vim. The problem was that I’d spent a couple hundred hours in
NetHack over the past decade with my index finger on “h”, not the
proper home row location. It was disorienting to navigate around Vim
initially, like <a href="https://www.youtube.com/watch?v=MFzDaBzBlL0">riding a bicycle with inverted controls</a>.</p>

<p>Based on reading other people’s accounts, I determined I’d need
several days of introductory practice where I’d be utterly
unproductive. I took a three-day weekend, starting my touch typing
lessons on a Thursday evening. Boy, they weren’t kidding about it
being slow going. It was a rough weekend. When checking in on my
practice, my wife literally said she pitied me. Ouch.</p>

<p>By Monday I was at a level resembling a very slow touch typist. For
the rest of the first week I followed all the lessons up through the
number keys, never progressing past an exercise until I had exceeded
the target speed with at least 90% accuracy. This was now enough to
get me back on my feet for programming at a glacial, frustrating pace.
Programming involves a lot more numbers and symbols than other kinds
of typing, making that top row so important. For a programmer, it
would probably be better for these lessons to be earlier in the
series.</p>

<h3 id="modal-editing">Modal editing</h3>

<p>For that first week I mostly used Emacs while I was finding my feet
(or finding my fingers?). That’s when I experienced first hand what
all these non-Emacs people — people who I, until recently, considered
to be unenlightened simpletons — had been complaining about all these
years: <strong>Pressing CTRL and ALT key combinations from the home row is a
real pain in the ass!</strong> These complaints were suddenly making
sense. I was already seeing the value of modal editing before I even
started really learning Vim. It made me look forward to it even more.</p>

<p>During the second week of touch typing I went through <a href="http://derekwyatt.org/vim/tutorials/">Derek Wyatt’s
Vim videos</a> and learned my way around the :help system enough
to bootstrap my Vim education. I then read through the user manual,
practicing along the way. I’ll definitely have to pass through it a
few more times to pick up all sorts of things that didn’t stick. This
is one way that Emacs and Vim are a lot alike.</p>

<p>Update: <a href="https://pragprog.com/book/dnvim2/practical-vim-second-edition"><em>Practical Vim: Edit Text at the Speed of Thought</em></a> was
recommended in the comments, and it’s certainly a better place to
start than the Vim user manual. Unlike the manual, it’s opinionated
and focuses on good habits, which is exactly what a newbie needs.</p>

<p>One of my rules when learning Vim was to resist the urge to remap
keys. I’ve done it a lot with Emacs: “Hmm, that’s not very convenient.
I’ll change it.” It means <a href="https://github.com/skeeto/.emacs.d">my Emacs configuration</a> is fairly
non-standard, and using Emacs without my configuration is like using
an unfamiliar editor. This is both good and bad. The good is that I’ve
truly changed Emacs to be <em>my</em> editor, suited just for me. The bad is
that I’m extremely dependent on my configuration. What if there was a
text editing emergency?</p>

<p>With Vim as a sort of secondary editor, I want to be able to fire it
up unconfigured and continue to be nearly as productive. A pile of
remappings would prohibit this. In my mind this is like a form of
emergency preparedness. Other people stock up food and supplies. I’m
preparing myself to sit at a strange machine without any of my
configuration so that I can start the rewrite of the <a href="/blog/2016/11/17/">software lost in
the disaster</a>, so long as that machine has <a href="/blog/2017/03/30/">vi, cc, and
make</a>. If I can’t code in C, then what’s the point in surviving
anyway?</p>

<p>The other reason is that I’m just learning. A different mapping might
<em>seem</em> more appropriate, but what do I know at this point? It’s better
to follow the beaten path at first, lest I form a bunch of bad habits
again. Trust in the knowledge of the ancients.</p>

<h3 id="future-directions">Future directions</h3>

<p><strong>I am absolutely sticking with modal editing for the long term.</strong> I’m
<em>really</em> enjoying it so far. At three weeks of touch typing and two
weeks of modal editing, I’m around 80% caught back up with my old
productivity speed, but this time I’ve got a lot more potential for
improvement.</p>

<p>For now, Vim will continue taking over more and more of my text
editing work. My last three articles were written in Vim. It’s really
important to keep building proficiency. I still <a href="/blog/2013/09/03/">rely on Emacs for
email</a> and for <a href="https://github.com/skeeto/elfeed">syndication feeds</a>, and that’s not
changing any time soon. I also <a href="https://github.com/magit/magit">really like Magit</a> as a Git
interface. Plus I don’t want to <a href="/tags/emacs/">abandon years of accumulated
knowledge</a> and leave the users of my various Emacs packages out
to dry. Ultimately I believe I will end up using Evil, to get what seems
to be the best of both worlds: modal editing and Emacs’ rich
extensibility.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Asynchronous Requests from Emacs Dynamic Modules</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/02/14/"/>
    <id>urn:uuid:00a59e4f-268c-343f-e6c6-bb23cde265de</id>
    <updated>2017-02-14T02:30:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="linux"/><category term="win32"/>
    <content type="html">
      <![CDATA[<p>A few months ago I had a discussion with Vladimir Kazanov about his
<a href="https://github.com/vkazanov/toy-orgfuse">Orgfuse</a> project: a Python script that exposes an Emacs
Org-mode document as a <a href="https://en.wikipedia.org/wiki/Filesystem_in_Userspace">FUSE filesystem</a>. It permits other
programs to navigate the structure of an Org-mode document through the
standard filesystem APIs. I suggested that, with the new dynamic
modules in Emacs 25, Emacs <em>itself</em> could serve a FUSE filesystem. In
fact, support for FUSE services in general could be a package of his
own.</p>

<p>So that’s what he did: <a href="https://github.com/vkazanov/elfuse"><strong>Elfuse</strong></a>. It’s an old joke that
Emacs is an operating system, and here it is handling system calls.</p>

<p>However, there’s a tricky problem to solve, an issue also present in <a href="/blog/2016/11/05/">my
joystick module</a>. Both modules handle asynchronous events —
filesystem requests or joystick events — but Emacs runs the event loop
and owns the main thread. The external events somehow need to feed
into the main event loop. It’s even more difficult with FUSE because
FUSE <em>also</em> wants control of its own thread for its own event loop.
This requires Elfuse to spawn a dedicated FUSE thread and negotiate a
request/response hand-off.</p>

<p>When a filesystem request or joystick event arrives, how does Emacs
know to handle it? The simple and obvious solution is to poll the
module from a timer.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">queue</span> <span class="n">requests</span><span class="p">;</span>

<span class="n">emacs_value</span>
<span class="nf">Frequest_next</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">emacs_value</span> <span class="n">next</span> <span class="o">=</span> <span class="n">Qnil</span><span class="p">;</span>
    <span class="n">queue_lock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">queue_length</span><span class="p">(</span><span class="n">requests</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">request</span> <span class="o">=</span> <span class="n">queue_pop</span><span class="p">(</span><span class="n">requests</span><span class="p">,</span> <span class="n">env</span><span class="p">);</span>
        <span class="n">next</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fin_empty</span><span class="p">,</span> <span class="n">request</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">queue_unlock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And then ask Emacs to check the module every, say, 10ms:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">request--poll</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">next</span> <span class="p">(</span><span class="nv">request-next</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">when</span> <span class="nv">next</span>
      <span class="p">(</span><span class="nv">request-handle</span> <span class="nv">next</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">run-at-time</span> <span class="mi">0</span> <span class="mf">0.01</span> <span class="nf">#'</span><span class="nv">request--poll</span><span class="p">)</span>
</code></pre></div></div>

<p>Blocking directly on the module’s event pump with Emacs’ thread would
prevent Emacs from doing important things like, you know, <em>being a
text editor</em>. The timer allows it to handle its own events
uninterrupted. It gets the job done, but it’s far from perfect:</p>

<ol>
  <li>
    <p>It imposes an arbitrary latency on handling requests. Up to the
poll period could pass before a request is handled.</p>
  </li>
  <li>
    <p>Polling the module 100 times per second is inefficient. Unless you
really enjoy recharging your laptop, that’s no good.</p>
  </li>
</ol>
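<p>For reference, the queue behind this polling scheme might look something like the following. It’s a sketch under my own assumptions: a fixed-capacity ring guarded by a pthread mutex, with hypothetical signatures that differ from the simplified calls above. In particular, the lock is taken inside each function rather than by the caller, and <code class="language-plaintext highlighter-rouge">queue_poll</code> is an invented name.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical fixed-capacity queue shared by the module's thread
 * (producer) and the Emacs thread (consumer). */
struct queue {
    pthread_mutex_t lock;
    void *items[QUEUE_CAP];
    size_t head;   /* index of the oldest item */
    size_t count;  /* number of queued items */
};

void queue_init(struct queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = 0;
    q->count = 0;
}

/* Called from the module's thread. Returns -1 if the queue is full. */
int queue_push(struct queue *q, void *item)
{
    int r = -1;
    assert(item != NULL);  /* NULL is reserved to mean "empty" */
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = item;
        q->count++;
        r = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return r;
}

/* Called from the poll timer. It never waits for work to arrive:
 * NULL comes back immediately when nothing is queued. */
void *queue_poll(struct queue *q)
{
    void *item = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

<p>The important property is that the consumer side returns right away when the queue is empty, so each poll costs a lock and an unlock and nothing more.</p>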

<p>The poll period is a sliding trade-off between latency and battery
life. If only there was some mechanism to, ahem, <em>signal</em> the Emacs
thread, informing it that a request is waiting…</p>

<h3 id="sigusr1">SIGUSR1</h3>

<p>Emacs Lisp programs can handle the POSIX SIGUSR1 and SIGUSR2 signals,
which is exactly the mechanism we need. The interface is a “key”
binding on <code class="language-plaintext highlighter-rouge">special-event-map</code>, the keymap that handles these kinds of
events. When the signal arrives, Emacs queues it up for the main event
loop.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">define-key</span> <span class="nv">special-event-map</span> <span class="nv">[sigusr1]</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">request-handle</span> <span class="p">(</span><span class="nv">request-next</span><span class="p">))))</span>
</code></pre></div></div>

<p>The module blocks on its own thread on its own event pump. When a
request arrives, it queues the request, rings the bell for Emacs to
come handle it (<code class="language-plaintext highlighter-rouge">raise()</code>), and waits on a semaphore. For illustration
purposes, assume the module reads requests from and writes responses
to a file descriptor, like a socket.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">event_fd</span> <span class="o">=</span> <span class="cm">/* ... */</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">request</span> <span class="n">request</span><span class="p">;</span>
<span class="n">sem_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">sem</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

<span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
    <span class="cm">/* Blocking read for request event */</span>
    <span class="n">read</span><span class="p">(</span><span class="n">event_fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">event</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">event</span><span class="p">));</span>

    <span class="cm">/* Put request on the queue */</span>
    <span class="n">queue_lock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="n">queue_push</span><span class="p">(</span><span class="n">requests</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">request</span><span class="p">);</span>
    <span class="n">queue_unlock</span><span class="p">(</span><span class="n">requests</span><span class="p">);</span>
    <span class="n">raise</span><span class="p">(</span><span class="n">SIGUSR1</span><span class="p">);</span>  <span class="cm">/* Ring the bell; the request is already queued */</span>

    <span class="cm">/* Wait for Emacs */</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">sem_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">sem</span><span class="p">))</span>
        <span class="p">;</span>

    <span class="cm">/* Reply with Emacs' response */</span>
    <span class="n">write</span><span class="p">(</span><span class="n">event_fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">request</span><span class="p">.</span><span class="n">response</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">response</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">sem_wait()</code> is in a loop because a signal delivered to the
thread interrupts the wait, causing it to return early with an error
(<code class="language-plaintext highlighter-rouge">EINTR</code>). In fact, it may even wake up due to its own signal on the
line before. This is the only way this particular use of <code class="language-plaintext highlighter-rouge">sem_wait()</code>
might fail, so there’s no need to check <code class="language-plaintext highlighter-rouge">errno</code>.</p>

<p>If there are multiple module threads making requests to the same
global queue, the lock is necessary to protect the queue. The
semaphore is only for blocking the thread until Emacs has finished
writing its particular response. Each thread has its own semaphore.</p>

<p>When Emacs is done writing the response, it releases the module thread
by incrementing the semaphore. It might look something like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">emacs_value</span>
<span class="nf">Frequest_complete</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">request</span> <span class="o">*</span><span class="n">request</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">request</span><span class="p">)</span>
        <span class="n">sem_post</span><span class="p">(</span><span class="o">&amp;</span><span class="n">request</span><span class="o">-&gt;</span><span class="n">sem</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">Qnil</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The top-level handler dispatches to the specific request handler,
calling <code class="language-plaintext highlighter-rouge">request-complete</code> above when it’s done.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">request-handle</span> <span class="p">(</span><span class="nv">next</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">condition-case</span> <span class="nv">e</span>
      <span class="p">(</span><span class="nv">cl-ecase</span> <span class="p">(</span><span class="nv">request-type</span> <span class="nv">next</span><span class="p">)</span>
        <span class="p">(</span><span class="ss">:open</span>  <span class="p">(</span><span class="nv">request-handle-open</span>  <span class="nv">next</span><span class="p">))</span>
        <span class="p">(</span><span class="ss">:close</span> <span class="p">(</span><span class="nv">request-handle-close</span> <span class="nv">next</span><span class="p">))</span>
        <span class="p">(</span><span class="ss">:read</span>  <span class="p">(</span><span class="nv">request-handle-read</span>  <span class="nv">next</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">error</span> <span class="p">(</span><span class="nv">request-respond-as-error</span> <span class="nv">next</span> <span class="nv">e</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">request-complete</span> <span class="nv">next</span><span class="p">))</span>
</code></pre></div></div>

<p>This SIGUSR1+semaphore mechanism is roughly how Elfuse currently
processes requests.</p>

<h3 id="making-it-work-on-windows">Making it work on Windows</h3>

<p>Windows doesn’t have signals. This isn’t a problem for Elfuse since
Windows doesn’t have FUSE either. Nor does it matter for Joymacs since
XInput isn’t event-driven and always requires polling. But someday
someone will need this mechanism for a dynamic module on Windows.</p>

<p>Fortunately there’s a solution: <em>input language change</em> events,
<code class="language-plaintext highlighter-rouge">WM_INPUTLANGCHANGE</code>. It’s also on <code class="language-plaintext highlighter-rouge">special-event-map</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">define-key</span> <span class="nv">special-event-map</span> <span class="nv">[language-change]</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">request-handle</span> <span class="p">(</span><span class="nv">request-next</span><span class="p">))))</span>
</code></pre></div></div>

<p>Instead of <code class="language-plaintext highlighter-rouge">raise()</code> (or <code class="language-plaintext highlighter-rouge">pthread_kill()</code>), broadcast the window event
with <code class="language-plaintext highlighter-rouge">PostMessage()</code>. Outside of invoking the <code class="language-plaintext highlighter-rouge">language-change</code> key
binding, Emacs will ignore the event because WPARAM is 0 — it doesn’t
belong to any particular window. We don’t <em>really</em> want to change the
input language, after all.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PostMessageA</span><span class="p">(</span><span class="n">HWND_BROADCAST</span><span class="p">,</span> <span class="n">WM_INPUTLANGCHANGE</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>

<p>Naturally you’ll also need to replace the POSIX threading primitives
with the Windows versions (<code class="language-plaintext highlighter-rouge">CreateThread()</code>, <code class="language-plaintext highlighter-rouge">CreateSemaphore()</code>,
etc.). With a bit of abstraction in the right places, it should be
pretty easy to support both POSIX and Windows in these asynchronous
dynamic module events.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>How to Write Fast(er) Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2017/01/30/"/>
    <id>urn:uuid:cee07e3d-08cc-3465-1a29-c1e30b5bd0e2</id>
    <updated>2017-01-30T21:08:19Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Not everything written in Emacs Lisp needs to be fast. Most of Emacs
itself — around 82% — is written in Emacs Lisp <em>because</em> those parts
are generally not performance-critical. Otherwise these functions
would be built-ins written in C. Extensions to Emacs don’t have a
choice and — outside of a few exceptions like <a href="/blog/2016/11/05/">dynamic modules</a>
and inferior processes — must be written in Emacs Lisp, including
their performance-critical bits. Common performance hot spots are
automatic indentation, <a href="https://github.com/mooz/js2-mode">AST parsing</a>, and <a href="/blog/2016/12/11/">interactive
completion</a>.</p>

<p>Here are 5 guidelines, each very specific to Emacs Lisp, that will
result in faster code. The non-intrusive guidelines could be applied
at all times as a matter of style — choosing one equally expressive
and maintainable form over another just because it performs better.</p>

<p>There’s one caveat: These guidelines are focused on Emacs 25.1 and
“nearby” versions. Emacs is constantly evolving. Changes to the
<a href="/blog/2014/01/04/">virtual machine</a> and byte-code compiler may transform
currently-slow expressions into fast code, obsoleting some of these
guidelines. In the future I’ll add notes to this article for anything
that changes.</p>

<h3 id="1-use-lexical-scope">(1) Use lexical scope</h3>

<p>This guideline refers to the following being the first line of every
Emacs Lisp source file you write:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
</code></pre></div></div>

<p>This point is worth mentioning again and again. Not only will <a href="/blog/2016/12/22/">your
code be more correct</a>, it will be measurably faster. Dynamic
scope is still opt-in through the explicit use of <em>special variables</em>,
so there’s absolutely no reason not to be using lexical scope. If
you’ve written clean, dynamic scope code, then switching to lexical
scope won’t have any effect on its behavior.</p>
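
<p>Here is a short sketch of that opt-in, assuming the code is evaluated
with lexical binding enabled. All of the names are made up for
illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; defvar marks the variable as special, so let binds it dynamically.
(defvar my-indent-width 4)

(defun my-current-width ()
  my-indent-width)

(let ((my-indent-width 8))
  (my-current-width))
;; =&gt; 8

;; An ordinary let variable is lexical: callees cannot see it.
(defun my-sees-local-p ()
  (boundp 'my-local-width))

(let ((my-local-width 8))
  (ignore my-local-width)               ; silence the unused warning
  (my-sees-local-p))
;; =&gt; nil
</code></pre></div></div>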

<p>Along similar lines, special variables are a lot slower than local,
lexical variables. Only use them when necessary.</p>

<h3 id="2-prefer-built-in-functions">(2) Prefer built-in functions</h3>

<p>Built-in functions are written in C and are, as expected,
significantly faster than the equivalent written in Emacs Lisp.
Complete as much work as possible inside built-in functions, even if
it might mean taking more conceptual steps overall.</p>

<p>For example, what’s the fastest way to accumulate a list of items?
That is, new items go on the tail but, for algorithm reasons, the list
must be constructed from the head.</p>

<p>You might be tempted to keep track of the tail of the list, appending
new elements directly to the tail with <code class="language-plaintext highlighter-rouge">setcdr</code> (via <code class="language-plaintext highlighter-rouge">setf</code> below).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fib-track-tail</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">a</span> <span class="mi">0</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">b</span> <span class="mi">1</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">head</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">tail</span> <span class="nv">head</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">_</span> <span class="nv">n</span> <span class="nv">head</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">psetf</span> <span class="nv">a</span> <span class="nv">b</span>
             <span class="nv">b</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">tail</span><span class="p">)</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">b</span><span class="p">)</span>
            <span class="nv">tail</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">tail</span><span class="p">)))))</span>

<span class="p">(</span><span class="nv">fib-track-tail</span> <span class="mi">8</span><span class="p">)</span>
<span class="c1">;; =&gt; (1 1 2 3 5 8 13 21 34)</span>
</code></pre></div></div>

<p>Actually, it’s much faster to construct the list in reverse, then
destructively reverse it at the end.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">fib-nreverse</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">a</span> <span class="mi">0</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">b</span> <span class="mi">1</span><span class="p">)</span>
         <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">1</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">_</span> <span class="nv">n</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="nb">list</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">psetf</span> <span class="nv">a</span> <span class="nv">b</span>
             <span class="nv">b</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">push</span> <span class="nv">b</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>It might not look it, but <code class="language-plaintext highlighter-rouge">nreverse</code> is <em>very</em> fast. Not only is it a
built-in, it’s got its own opcode. Using <code class="language-plaintext highlighter-rouge">push</code> in a loop, then
finishing with <code class="language-plaintext highlighter-rouge">nreverse</code> is the canonical and fastest way to
accumulate a list of items.</p>

<p>In <code class="language-plaintext highlighter-rouge">fib-track-tail</code>, the added complexity of tracking the tail in
Emacs Lisp is much slower than zipping over the entire list a second
time in C.</p>
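
<p>A rough way to check this on a particular Emacs build is to
byte-compile both approaches and time them with <code class="language-plaintext highlighter-rouge">benchmark-run</code>. The
sketch below uses two hypothetical helpers that collect squares rather
than the Fibonacci functions above. The iteration counts are arbitrary,
and the timings will vary between versions and machines.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun my-squares-track-tail (n)
  "Collect the squares of 0..N-1 by appending to a tracked tail."
  (let* ((head (list nil))              ; dummy first cell
         (tail head))
    (dotimes (i n)
      (setcdr tail (list (* i i)))
      (setq tail (cdr tail)))
    (cdr head)))

(defun my-squares-nreverse (n)
  "Collect the squares of 0..N-1 with push, then nreverse."
  (let ((result ()))
    (dotimes (i n)
      (push (* i i) result))
    (nreverse result)))

;; Byte-compile both so the comparison is between compiled code.
(byte-compile 'my-squares-track-tail)
(byte-compile 'my-squares-nreverse)

;; Sanity check: both produce the same list.
(equal (my-squares-track-tail 5) (my-squares-nreverse 5))
;; =&gt; t

;; Each call returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
(benchmark-run 100 (my-squares-track-tail 100000))
(benchmark-run 100 (my-squares-nreverse 100000))
</code></pre></div></div>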

<h3 id="3-avoid-unnecessary-lambda-functions">(3) Avoid unnecessary lambda functions</h3>

<p>I’m talking about <code class="language-plaintext highlighter-rouge">mapcar</code> and friends.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Slower</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">expt-list</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">e</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">expt</span> <span class="nv">x</span> <span class="nv">e</span><span class="p">))</span> <span class="nb">list</span><span class="p">))</span>
</code></pre></div></div>

<p>Listen, I know you love <a href="https://github.com/magnars/dash.el">dash.el</a> and higher order functions,
but <em>this habit ain’t cheap</em>. The byte-code compiler does not know how
to inline these lambdas, so there’s an additional per-element function
call overhead.</p>

<p>Worse, if you’re using lexical scope like I told you, the above
example forms a <em>closure</em> over <code class="language-plaintext highlighter-rouge">e</code>. This means a new function object
is created (e.g. <code class="language-plaintext highlighter-rouge">make-byte-code</code>) each time <code class="language-plaintext highlighter-rouge">expt-list</code> is called. To
be clear, I don’t mean that the lambda is recompiled each time — the
same byte-code string is shared between all instances of the same
lambda. A unique function vector (<code class="language-plaintext highlighter-rouge">#[...]</code>) and constants vector are
allocated and initialized each time <code class="language-plaintext highlighter-rouge">expt-list</code> is invoked.</p>
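
<p>The allocation is easy to observe, since each call yields a distinct
function object. This sketch uses a hypothetical helper and assumes
lexical binding is enabled.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; Returns a closure over E, like the lambda in expt-list.
(defun my-make-power (e)
  (lambda (x) (expt x e)))

;; Two calls produce two separate function objects.
(eq (my-make-power 2) (my-make-power 2))
;; =&gt; nil

;; Each one still behaves the same when called.
(funcall (my-make-power 2) 7)
;; =&gt; 49
</code></pre></div></div>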

<p>Related mini-guideline: Don’t create any more garbage than strictly
necessary in performance-critical code.</p>

<p>Compare to an implementation with an explicit loop, using the
<code class="language-plaintext highlighter-rouge">nreverse</code> list-accumulation technique.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">expt-list-fast</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">e</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="p">()))</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">x</span> <span class="nb">list</span> <span class="p">(</span><span class="nb">nreverse</span> <span class="nv">result</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">expt</span> <span class="nv">x</span> <span class="nv">e</span><span class="p">)</span> <span class="nv">result</span><span class="p">))))</span>
</code></pre></div></div>

<ul>
  <li>No unnecessary garbage is created.</li>
  <li>No unnecessary per-element function calls.</li>
</ul>

<p>This is the fastest possible definition for this function, and it’s
what you need to use in performance-critical code.</p>

<p>Personally I prefer the list comprehension approach, using <code class="language-plaintext highlighter-rouge">cl-loop</code>
from <code class="language-plaintext highlighter-rouge">cl-lib</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">expt-list-fast</span> <span class="p">(</span><span class="nb">list</span> <span class="nv">e</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">x</span> <span class="nv">in</span> <span class="nb">list</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nb">expt</span> <span class="nv">x</span> <span class="nv">e</span><span class="p">)))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cl-loop</code> macro will expand into essentially the previous
definition, making them practically equivalent. It takes some getting
used to, but writing efficient loops is a whole lot less tedious with
<code class="language-plaintext highlighter-rouge">cl-loop</code>.</p>

<p>In Emacs 24.4 and earlier, <code class="language-plaintext highlighter-rouge">catch</code>/<code class="language-plaintext highlighter-rouge">throw</code> is implemented by
converting the body of the <code class="language-plaintext highlighter-rouge">catch</code> into a lambda function and calling
it. If code inside the <code class="language-plaintext highlighter-rouge">catch</code> accesses a variable outside the <code class="language-plaintext highlighter-rouge">catch</code>
(very likely), then, in lexical scope, it turns into a closure,
resulting in the garbage function object like before.</p>

<p>In Emacs 24.5 and later, the byte-code compiler uses a new opcode,
<code class="language-plaintext highlighter-rouge">pushcatch</code>. It’s a whole lot more efficient, and there’s no longer a
reason to shy away from <code class="language-plaintext highlighter-rouge">catch</code>/<code class="language-plaintext highlighter-rouge">throw</code> in performance-critical code.
This is important because it’s often the only way to perform an early
bailout.</p>

<h3 id="4-prefer-using-functions-with-dedicated-opcodes">(4) Prefer using functions with dedicated opcodes</h3>

<p>When following the guideline about using built-in functions, you might
have several to pick from. Some built-in functions have dedicated
virtual machine opcodes, making them much faster to invoke. Prefer
these functions when possible.</p>

<p>How can you tell when a function has an assigned opcode? Take a peek
at the <code class="language-plaintext highlighter-rouge">byte-defop</code> listings in <a href="https://github.com/emacs-mirror/emacs/blob/master/lisp/emacs-lisp/bytecomp.el">bytecomp.el</a>. Optimization often
involves getting into the weeds, so don’t be shy.</p>

<p>For example, the <code class="language-plaintext highlighter-rouge">assq</code> and <code class="language-plaintext highlighter-rouge">assoc</code> functions search for a matching
key in an association list (alist). Both are built-in functions, and
the only difference is that the former compares keys with <code class="language-plaintext highlighter-rouge">eq</code> (e.g.
symbol or integer keys) and the latter with <code class="language-plaintext highlighter-rouge">equal</code> (typically string
keys). The difference in performance between <code class="language-plaintext highlighter-rouge">eq</code> and <code class="language-plaintext highlighter-rouge">equal</code> isn’t as
important as another factor: <code class="language-plaintext highlighter-rouge">assq</code> has its own opcode (158).</p>
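
<p>One way to see the difference without reading the compiler source is
to disassemble a tiny compiled lambda for each function. This is a sketch
to evaluate one form at a time, since each call overwrites the
<code class="language-plaintext highlighter-rouge">*Disassemble*</code> buffer. The exact listing depends on the Emacs
version and binding mode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; The listing should contain a dedicated assq instruction.
(disassemble (byte-compile '(lambda (key alist) (assq key alist))))

;; Here the symbol assoc is pushed as a constant and invoked
;; through the generic call instruction.
(disassemble (byte-compile '(lambda (key alist) (assoc key alist))))
</code></pre></div></div>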

<p>This means in performance-critical code you should prefer <code class="language-plaintext highlighter-rouge">assq</code>,
perhaps even going as far as restructuring your alists specifically to
have <code class="language-plaintext highlighter-rouge">eq</code> keys. That last step is probably a trade-off, which means
you’ll want to make some benchmarks to help with that decision.</p>

<p>Another example is <code class="language-plaintext highlighter-rouge">eq</code>, <code class="language-plaintext highlighter-rouge">=</code>, <code class="language-plaintext highlighter-rouge">eql</code>, and <code class="language-plaintext highlighter-rouge">equal</code>. Some macros and
functions use <code class="language-plaintext highlighter-rouge">eql</code>, especially <code class="language-plaintext highlighter-rouge">cl-lib</code> which inherits <code class="language-plaintext highlighter-rouge">eql</code> as a
default from Common Lisp. Take <code class="language-plaintext highlighter-rouge">cl-case</code>, which is like <code class="language-plaintext highlighter-rouge">switch</code> from
the C family of languages. It compares elements with <code class="language-plaintext highlighter-rouge">eql</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">op-apply</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-case</span> <span class="nv">op</span>
    <span class="p">(</span><span class="ss">:norm</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">)))</span>
    <span class="p">(</span><span class="ss">:disp</span> <span class="p">(</span><span class="nb">abs</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
    <span class="p">(</span><span class="ss">:isin</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">sin</span> <span class="nv">a</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cl-case</code> expands into a <code class="language-plaintext highlighter-rouge">cond</code>. Since Emacs byte-code lacks
support for jump tables, there’s not much room for cleverness.</p>

<p><strong>Update</strong>: Emacs 26.1, released May 2018, introduced a jump table
opcode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">op-apply</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">eql</span> <span class="nv">op</span> <span class="ss">:norm</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eql</span> <span class="nv">op</span> <span class="ss">:disp</span><span class="p">)</span> <span class="p">(</span><span class="nb">abs</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eql</span> <span class="nv">op</span> <span class="ss">:isin</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">sin</span> <span class="nv">a</span><span class="p">)))))</span>
</code></pre></div></div>

<p>It turns out <code class="language-plaintext highlighter-rouge">eql</code> is pretty much always the worst choice for
<code class="language-plaintext highlighter-rouge">cl-case</code>. Of the four equality functions I listed, the only one
lacking an opcode is <code class="language-plaintext highlighter-rouge">eql</code>. A faster definition would use <code class="language-plaintext highlighter-rouge">eq</code>. (In
theory, <code class="language-plaintext highlighter-rouge">cl-case</code> <em>could</em> have done this itself because it knows all
the keys are symbols.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">op-apply</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">:norm</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">b</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">:disp</span><span class="p">)</span> <span class="p">(</span><span class="nb">abs</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">:isin</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">b</span> <span class="p">(</span><span class="nb">sin</span> <span class="nv">a</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Fortunately <code class="language-plaintext highlighter-rouge">eq</code> can safely compare integers in Emacs Lisp. You only
need <code class="language-plaintext highlighter-rouge">eql</code> when comparing symbols, integers, and floats all at once,
which is unusual.</p>
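
<p>A few evaluations illustrate the distinction. One assumption is worth
stating: this holds for fixnums. Emacs 27 and later add
arbitrary-precision integers, and <code class="language-plaintext highlighter-rouge">eq</code> is not reliable for integers
outside the fixnum range there.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(eq 3 3)        ;; =&gt; t    fixnums compare by value
(eql 3.0 3.0)   ;; =&gt; t    same type and same value
(eql 3 3.0)     ;; =&gt; nil  integer versus float
(= 3 3.0)       ;; =&gt; t    numeric comparison across types
</code></pre></div></div>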

<h3 id="5-unroll-loops-using-andor">(5) Unroll loops using and/or</h3>

<p>Consider the following function which checks its argument against a
list of numbers, bailing out on the first match. I used <code class="language-plaintext highlighter-rouge">%</code> instead of
<code class="language-plaintext highlighter-rouge">mod</code> since the former has an opcode (166) and the latter does not.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">detect</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="k">catch</span> <span class="ss">'found</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">f</span> <span class="o">'</span><span class="p">(</span><span class="mi">2</span> <span class="mi">3</span> <span class="mi">5</span> <span class="mi">7</span> <span class="mi">11</span> <span class="mi">13</span> <span class="mi">17</span> <span class="mi">19</span> <span class="mi">23</span> <span class="mi">29</span> <span class="mi">31</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="nv">f</span><span class="p">))</span>
        <span class="p">(</span><span class="k">throw</span> <span class="ss">'found</span> <span class="nv">f</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The byte-code compiler doesn’t know how to unroll loops. Fortunately
that’s something we can do for ourselves using <code class="language-plaintext highlighter-rouge">and</code> and <code class="language-plaintext highlighter-rouge">or</code>. The
compiler will turn this into clean, efficient jumps in the byte-code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">detect-unrolled</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">2</span><span class="p">))</span> <span class="mi">2</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">3</span><span class="p">))</span> <span class="mi">3</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">5</span><span class="p">))</span> <span class="mi">5</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">7</span><span class="p">))</span> <span class="mi">7</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">11</span><span class="p">))</span> <span class="mi">11</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">13</span><span class="p">))</span> <span class="mi">13</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">17</span><span class="p">))</span> <span class="mi">17</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">19</span><span class="p">))</span> <span class="mi">19</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">23</span><span class="p">))</span> <span class="mi">23</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">29</span><span class="p">))</span> <span class="mi">29</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="nv">x</span> <span class="mi">31</span><span class="p">))</span> <span class="mi">31</span><span class="p">)))</span>
</code></pre></div></div>

<p>In Emacs 24.4 and earlier with the old-fashioned lambda-based <code class="language-plaintext highlighter-rouge">catch</code>,
the unrolled definition is seven times faster. With the faster
<code class="language-plaintext highlighter-rouge">pushcatch</code>-based <code class="language-plaintext highlighter-rouge">catch</code> it’s about twice as fast. This means the
loop overhead accounts for about half the work of the first definition
of this function.</p>
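
<p>A rough way to reproduce this kind of measurement is sketched below.
It assumes shortened, hypothetical copies of the two definitions
(<code class="language-plaintext highlighter-rouge">my-detect-loop</code> and <code class="language-plaintext highlighter-rouge">my-detect-unrolled</code>, four primes only) and
uses the <code class="language-plaintext highlighter-rouge">benchmark-run</code> macro that ships with Emacs. Timings vary by
machine and Emacs version, so no results are shown.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'benchmark)

;; Hypothetical, shortened stand-ins for the two definitions above.
(defun my-detect-loop (x)
  (catch 'found
    (dolist (f '(2 3 5 7))
      (when (= 0 (% x f))
        (throw 'found f)))))

(defun my-detect-unrolled (x)
  (or (and (= 0 (% x 2)) 2)
      (and (= 0 (% x 3)) 3)
      (and (= 0 (% x 5)) 5)
      (and (= 0 (% x 7)) 7)))

;; Byte-compile both so the comparison is between byte-code,
;; not between interpreted definitions.
(byte-compile 'my-detect-loop)
(byte-compile 'my-detect-unrolled)

;; 121 has none of these factors, so every test runs (the worst case).
;; Each result is a list: (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(list (benchmark-run 100000 (my-detect-loop 121))
      (benchmark-run 100000 (my-detect-unrolled 121)))
</code></pre></div></div>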

<p>Update: It was pointed out in the comments that this particular
example is equivalent to a <code class="language-plaintext highlighter-rouge">cond</code>. That’s literally true all the way
down to the byte-code, and it would be a clearer way to express the
unrolled code. In real code it’s often not <em>quite</em> equivalent.</p>

<p>Unlike some of the other guidelines, this is certainly something you’d
only want to do in code you know for sure is performance-critical.
Maintaining unrolled code is tedious and error-prone.</p>

<p>I’ve had the most success with this approach not by unrolling these
loops myself, but by <a href="/blog/2016/12/27/">using a macro</a>, or <a href="/blog/2016/12/11/">similar</a>, to
generate the unrolled form.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">with-detect</span> <span class="p">(</span><span class="nv">var</span> <span class="nb">list</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">e</span> <span class="nv">in</span> <span class="nb">list</span>
           <span class="nv">collect</span> <span class="o">`</span><span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">0</span> <span class="p">(</span><span class="nv">%</span> <span class="o">,</span><span class="nv">var</span> <span class="o">,</span><span class="nv">e</span><span class="p">))</span> <span class="o">,</span><span class="nv">e</span><span class="p">)</span> <span class="nv">into</span> <span class="nv">conditions</span>
           <span class="nv">finally</span> <span class="nb">return</span> <span class="o">`</span><span class="p">(</span><span class="nb">or</span> <span class="o">,@</span><span class="nv">conditions</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">detect-unrolled</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-detect</span> <span class="nv">x</span> <span class="p">(</span><span class="mi">2</span> <span class="mi">3</span> <span class="mi">5</span> <span class="mi">7</span> <span class="mi">11</span> <span class="mi">13</span> <span class="mi">17</span> <span class="mi">19</span> <span class="mi">23</span> <span class="mi">29</span> <span class="mi">31</span><span class="p">)))</span>
</code></pre></div></div>

<h3 id="how-can-i-find-more-optimization-opportunities-myself">How can I find more optimization opportunities myself?</h3>

<p>Use <code class="language-plaintext highlighter-rouge">M-x disassemble</code> to inspect the byte-code for your own hot spots.
Observe how the byte-code changes in response to changes in your
functions. Take note of the sorts of forms that allow the byte-code
compiler to produce the best code, and then exploit it where you can.</p>
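
<p>For comparing two variants side by side, it can help to capture the
listing as a string. The sketch below assumes a small helper,
<code class="language-plaintext highlighter-rouge">my-disassembly</code>, which is not part of Emacs. It relies on the
optional buffer argument of <code class="language-plaintext highlighter-rouge">disassemble</code>. The two toy functions are
also made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, not part of Emacs: return the disassembly
;; of FUNCTION as a string instead of displaying a buffer.
(defun my-disassembly (function)
  (with-temp-buffer
    (disassemble function (current-buffer))
    (buffer-string)))

(defun my-rem (x) (% x 7))    ; `%' has its own opcode
(defun my-mod (x) (mod x 7))  ; `mod' is an ordinary function call

(byte-compile 'my-rem)
(byte-compile 'my-mod)

;; Print both listings and compare them by eye.
(message "%s\n%s" (my-disassembly 'my-rem) (my-disassembly 'my-mod))
</code></pre></div></div>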

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Domain-Specific Language Compilation in Elfeed</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/27/"/>
    <id>urn:uuid:6a6cd6a2-b44d-35b5-503c-c496d9094ac0</id>
    <updated>2016-12-27T21:46:30Z</updated>
    <category term="elfeed"/><category term="emacs"/><category term="elisp"/><category term="optimization"/>
    <content type="html">
      <![CDATA[<p>Last night I pushed another performance enhancement for Elfeed, this
time reducing the time spent parsing feeds. It’s accomplished by
compiling, during macro expansion, a jQuery-like domain-specific
language within Elfeed.</p>

<h3 id="heuristic-parsing">Heuristic parsing</h3>

<p>Given the nature of the domain — <a href="/blog/2013/09/23/">an under-specified standard</a>
and a lack of robust adherence — feed parsing is much more heuristic
than strict. Sure, everyone’s feed XML is strictly conforming since
virtually no feed reader tolerates invalid XML (thank you, XML
libraries), but, for the schema, the situation resembles the <em>de
facto</em> looseness of HTML. Sometimes important or required information
is missing, or is only available in <a href="https://www.intertwingly.net/wiki/pie/DublinCore">a different namespace</a>.
Sometimes, especially in the case of timestamps, it’s in the wrong
format, or encoded incorrectly, or ambiguous. It’s real world data.</p>

<p>To get a particular piece of information, Elfeed looks in a number of
different places within the feed, starting with the preferred source
and stopping when the information is found. For example, to find the
date of an Atom entry, Elfeed first searches for elements in this
order:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">&lt;published&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;updated&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;date&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;modified&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">&lt;issued&gt;</code></li>
</ol>

<p>Failing to find any of these elements, or if no parsable date is
found, it settles on the current time. Only the <code class="language-plaintext highlighter-rouge">updated</code> element is
required, but <code class="language-plaintext highlighter-rouge">published</code> usually has the desired information, so it
goes first. The last three are only valid for another namespace, but
are useful fallbacks.</p>

<p>Before Elfeed even starts this search, the XML text is parsed into an
s-expression using <code class="language-plaintext highlighter-rouge">xml-parse-region</code> — a pure Elisp XML parser
included in Emacs. The search is made over the resulting s-expression.</p>

<p>For example, here’s a sample <a href="https://tools.ietf.org/html/rfc4287">from the Atom specification</a>.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="utf-8"?&gt;</span>
<span class="nt">&lt;feed</span> <span class="na">xmlns=</span><span class="s">"http://www.w3.org/2005/Atom"</span><span class="nt">&gt;</span>

  <span class="nt">&lt;title&gt;</span>Example Feed<span class="nt">&lt;/title&gt;</span>
  <span class="nt">&lt;link</span> <span class="na">href=</span><span class="s">"http://example.org/"</span><span class="nt">/&gt;</span>
  <span class="nt">&lt;updated&gt;</span>2003-12-13T18:30:02Z<span class="nt">&lt;/updated&gt;</span>
  <span class="nt">&lt;author&gt;</span>
    <span class="nt">&lt;name&gt;</span>John Doe<span class="nt">&lt;/name&gt;</span>
  <span class="nt">&lt;/author&gt;</span>
  <span class="nt">&lt;id&gt;</span>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6<span class="nt">&lt;/id&gt;</span>

  <span class="nt">&lt;entry&gt;</span>
    <span class="nt">&lt;title&gt;</span>Atom-Powered Robots Run Amok<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;link</span> <span class="na">rel=</span><span class="s">"alternate"</span> <span class="na">href=</span><span class="s">"http://example.org/2003/12/13/atom03"</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;id&gt;</span>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a<span class="nt">&lt;/id&gt;</span>
    <span class="nt">&lt;updated&gt;</span>2003-12-13T18:30:02Z<span class="nt">&lt;/updated&gt;</span>
    <span class="nt">&lt;summary&gt;</span>Some text.<span class="nt">&lt;/summary&gt;</span>
  <span class="nt">&lt;/entry&gt;</span>

<span class="nt">&lt;/feed&gt;</span>
</code></pre></div></div>

<p>Which is parsed into this s-expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="nv">feed</span> <span class="p">((</span><span class="nv">xmlns</span> <span class="o">.</span> <span class="s">"http://www.w3.org/2005/Atom"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">title</span> <span class="p">()</span> <span class="s">"Example Feed"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">link</span> <span class="p">((</span><span class="nv">href</span> <span class="o">.</span> <span class="s">"http://example.org/"</span><span class="p">)))</span>
       <span class="p">(</span><span class="nv">updated</span> <span class="p">()</span> <span class="s">"2003-12-13T18:30:02Z"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">author</span> <span class="p">()</span> <span class="p">(</span><span class="nv">name</span> <span class="p">()</span> <span class="s">"John Doe"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">id</span> <span class="p">()</span> <span class="s">"urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6"</span><span class="p">)</span>
       <span class="p">(</span><span class="nv">entry</span> <span class="p">()</span>
              <span class="p">(</span><span class="nv">title</span> <span class="p">()</span> <span class="s">"Atom-Powered Robots Run Amok"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">link</span> <span class="p">((</span><span class="nv">rel</span> <span class="o">.</span> <span class="s">"alternate"</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">href</span> <span class="o">.</span> <span class="s">"http://example.org/2003/12/13/atom03"</span><span class="p">)))</span>
              <span class="p">(</span><span class="nv">id</span> <span class="p">()</span> <span class="s">"urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">updated</span> <span class="p">()</span> <span class="s">"2003-12-13T18:30:02Z"</span><span class="p">)</span>
              <span class="p">(</span><span class="nv">summary</span> <span class="p">()</span> <span class="s">"Some text."</span><span class="p">))))</span>
</code></pre></div></div>

<p>Each XML element is converted to a list. The first item is a symbol
that is the element’s name. The second item is an alist of attributes
— cons pairs of symbols and strings. And the rest are its children,
both string nodes and other elements. I’ve trimmed the extraneous
string nodes from the sample s-expression.</p>
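
<p>Given that layout, the three parts of an element can be picked apart
with plain list functions. This is only an illustration using the
<code class="language-plaintext highlighter-rouge">link</code> element from the sample. It has no children, so the last
value is <code class="language-plaintext highlighter-rouge">nil</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((node '(link ((rel . "alternate")
                    (href . "http://example.org/2003/12/13/atom03")))))
  (list (car node)                      ; element name
        (cdr (assq 'href (cadr node)))  ; attribute lookup in the alist
        (cddr node)))                   ; children (none here)
;; =&gt; (link "http://example.org/2003/12/13/atom03" nil)
</code></pre></div></div>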

<p>A subtle detail is that <code class="language-plaintext highlighter-rouge">xml-parse-region</code> doesn’t just return the
root element. It returns a <em>list of elements</em>, which always happens to
be a single-element list containing the root element. I don’t know why
this is, but I’ve built everything to assume this structure as input.</p>

<p>Elfeed strips all namespaces from both elements and
attributes to make parsing simpler. As I said, it’s heuristic rather
than strict, so namespaces are treated as noise.</p>

<h3 id="a-domain-specific-language">A domain-specific language</h3>

<p>Coding up Elfeed’s s-expression searches in straight Emacs Lisp would
be tedious, error-prone, and difficult to understand. It’s a lot of
loops, <code class="language-plaintext highlighter-rouge">assoc</code>, etc. So instead I invented a jQuery-like, CSS
selector-like, domain-specific language (DSL) to express these
searches concisely and clearly.</p>

<p>For example, all of the entry links are “selected” using this
expression:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span>
</code></pre></div></div>

<p>Reading right-to-left, this matches every <code class="language-plaintext highlighter-rouge">href</code> attribute under every
<code class="language-plaintext highlighter-rouge">link</code> element with the <code class="language-plaintext highlighter-rouge">rel="alternate"</code> attribute, under every
<code class="language-plaintext highlighter-rouge">entry</code> element, under the <code class="language-plaintext highlighter-rouge">feed</code> root element. Symbols match element
names, two-element vectors match elements with a particular attribute
pair, and keywords (which must come last) narrow the selection to a
specific attribute value.</p>
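
<p>The three kinds of selector step can be told apart by type alone. The
helper below, <code class="language-plaintext highlighter-rouge">my-describe-step</code>, is hypothetical and exists only to
show the dispatch. One detail worth noting is that keywords are also
symbols, so the keyword test has to come first.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-describe-step (step)
  "Classify one selector STEP (hypothetical helper for illustration)."
  (cond ((keywordp step) 'attribute-value)   ; must precede `symbolp'
        ((vectorp step) 'attribute-filter)
        ((symbolp step) 'element-name)))

(mapcar #'my-describe-step '(feed entry link [rel "alternate"] :href))
;; =&gt; (element-name element-name element-name attribute-filter attribute-value)
</code></pre></div></div>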

<p>Imagine hand-writing the code to navigate all these conditions for
each piece of information that Elfeed requires. The RSS parser makes
up to 16 such queries, and the Atom parser makes as many as 24. That
would add up to a lot of tedious code.</p>

<p>The package (included with Elfeed) that executes this query is called
“xml-query.” It comes in two flavors: <code class="language-plaintext highlighter-rouge">xml-query</code> and <code class="language-plaintext highlighter-rouge">xml-query-all</code>.
The former returns just the first match, and the latter returns all
matches. The naming parallels the <code class="language-plaintext highlighter-rouge">querySelector()</code> and
<code class="language-plaintext highlighter-rouge">querySelectorAll()</code> DOM methods in JavaScript.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">xml</span> <span class="p">(</span><span class="nv">elfeed-xml-parse-region</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">xml-query-all</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">xml</span><span class="p">))</span>

<span class="c1">;; =&gt; ("http://example.org/2003/12/13/atom03")</span>
</code></pre></div></div>

<p>That date search I mentioned before looks roughly like this. The <code class="language-plaintext highlighter-rouge">*</code>
matches text nodes within the selected element. It must come last just
like the keyword matcher.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">published</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">updated</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">date</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">modified</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">xml-query</span> <span class="o">'</span><span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">issued</span> <span class="nb">*</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">current-time</span><span class="p">))</span>
</code></pre></div></div>

<p>Over the past three years, Elfeed has gained more and more of these
selectors as it collects more and more information from feeds. Most
recently, Elfeed collects author and category information provided by
feeds. Each new query slows feed parsing a little bit, and it’s a
perfect example of a program slowing down as it gains more features
and capabilities.</p>

<p>But I don’t want Elfeed to slow down. I want it to get <em>faster</em>!</p>

<h3 id="optimizing-the-domain-specific-language">Optimizing the domain-specific language</h3>

<p>Just like the primary jQuery function (<code class="language-plaintext highlighter-rouge">$</code>), both <code class="language-plaintext highlighter-rouge">xml-query</code> and
<code class="language-plaintext highlighter-rouge">xml-query-all</code> are functions. The xml-query engine processes the
selector from scratch on each invocation. It examines the first
element, dispatches on its type/value to apply it to the input, and
then recurses on the rest of the selector with the narrowed input,
stopping when it hits the end of the list. That’s the way it’s worked
from the start.</p>
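
<p>To make the interpreting approach concrete, here is a minimal sketch
of such an engine. It is not Elfeed’s code: the name <code class="language-plaintext highlighter-rouge">my-query-all</code>
is made up, it uses a loop in place of the recursion, and it leaves
out the <code class="language-plaintext highlighter-rouge">*</code> matcher. What it does show is the cost being discussed,
namely that every call re-examines each step of the selector.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my-query-all (selector xml)
  "Interpret SELECTOR against XML and return every match (a sketch)."
  ;; A virtual root makes the top-level list look like an element's children.
  (let ((matches (list (cons nil (cons nil xml)))))
    (dolist (step selector matches)
      (setq matches
            (cond
             ;; Keywords are symbols too, so they are tested first.
             ((keywordp step)
              (let ((attr (intern (substring (symbol-name step) 1))))
                (delq nil (mapcar (lambda (m) (cdr (assq attr (cadr m))))
                                  matches))))
             ;; Two-element vector: keep elements with that attribute pair.
             ((vectorp step)
              (cl-remove-if-not
               (lambda (m) (equal (cdr (assq (aref step 0) (cadr m)))
                                  (aref step 1)))
               matches))
             ;; Plain symbol: descend into children with that name.
             ((symbolp step)
              (cl-loop for m in matches
                       append (cl-remove-if-not
                               (lambda (c) (and (consp c) (eq (car c) step)))
                               (cddr m)))))))))

(my-query-all '(feed entry link [rel "alternate"] :href)
              '((feed nil
                      (entry nil
                             (link ((rel . "self") (href . "http://example.org/feed")))
                             (link ((rel . "alternate") (href . "http://example.org/a")))))))
;; =&gt; ("http://example.org/a")
</code></pre></div></div>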

<p>However, every selector argument in Elfeed is a static, quoted list.
<a href="/blog/2016/12/11/">Unlike user-supplied filters</a>, I know exactly what I want to
execute ahead of time. It would be much better if the engine didn’t
have to waste time reparsing the DSL for each query.</p>

<p>This is the classic split between interpreters and compilers. An
interpreter reads input and immediately executes it, doing what the
input tells it to do. A compiler reads input and, rather than execute
it, produces output, usually in a simpler language, that, when
evaluated, has the same effect as executing the input.</p>

<p>Rather than interpret the selector, it would be better to compile it
into Elisp code, compile that <a href="/blog/2014/01/04/">into byte-code</a>, and then have the
Emacs byte-code virtual machine (VM) execute the query each time it’s
needed. The extra work of parsing the DSL is performed ahead of time,
the dispatch is entirely static, and the selector ultimately executes
on a much faster engine (byte-code VM). This should be a lot faster!</p>

<p>So I wrote a function that accepts a selector expression and emits
Elisp source that implements that selector: a compiler for my DSL.
Having a readily-available syntax tree is one of the <a href="https://en.wikipedia.org/wiki/Homoiconicity">big advantages
of homoiconicity</a>, and this sort of function makes perfect sense
in a lisp. For the external interface, this compiler function is
called by a new pair of macros, <code class="language-plaintext highlighter-rouge">xml-query*</code> and <code class="language-plaintext highlighter-rouge">xml-query-all*</code>.
These macros consume a static selector and expand into the compiled
Elisp form of the selector.</p>

<p>To demonstrate, remember that link query from before? Here’s the macro
version of that selection, but only returning the first match. Notice
the selector is no longer quoted. This is because it’s consumed by the
macro, not evaluated.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">xml-query*</span> <span class="p">(</span><span class="nv">feed</span> <span class="nv">entry</span> <span class="nv">link</span> <span class="nv">[rel</span> <span class="s">"alternate"</span><span class="nv">]</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">xml</span><span class="p">)</span>
</code></pre></div></div>

<p>This will expand into the following code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">catch</span> <span class="ss">'done</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="nv">xml</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'feed</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cddr</span> <span class="nv">v</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'entry</span><span class="p">))</span>
          <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cddr</span> <span class="nv">v</span><span class="p">))</span>
            <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">consp</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">v</span><span class="p">)</span> <span class="ss">'link</span><span class="p">))</span>
              <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">value</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'rel</span> <span class="p">(</span><span class="nb">cadr</span> <span class="nv">v</span><span class="p">)))))</span>
                <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">equal</span> <span class="nv">value</span> <span class="s">"alternate"</span><span class="p">)</span>
                  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'href</span> <span class="p">(</span><span class="nb">cadr</span> <span class="nv">v</span><span class="p">)))))</span>
                    <span class="p">(</span><span class="nb">when</span> <span class="nv">v</span>
                      <span class="p">(</span><span class="k">throw</span> <span class="ss">'done</span> <span class="nv">v</span><span class="p">))))))))))))</span>
</code></pre></div></div>

<p>As soon as it finds a match, it’s thrown to the top level and
returned. Without the DSL, the expansion is essentially what would
have to be written by hand. <strong>This is <em>exactly</em> the sort of leverage
you should be getting from a compiler.</strong> It compiles to around 130
byte-code instructions.</p>
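
<p>A compiler that emits code of this shape can be quite small. The
following is a simplified sketch rather than Elfeed’s implementation:
<code class="language-plaintext highlighter-rouge">my-compile-selector</code> and <code class="language-plaintext highlighter-rouge">my-query*</code> are hypothetical names, the
<code class="language-plaintext highlighter-rouge">*</code> matcher is omitted, and the attribute test is emitted without
the intermediate <code class="language-plaintext highlighter-rouge">let</code> seen in the expansion above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; The macro calls this function at expansion time, so it must also
;; be defined while byte-compiling the file.
(eval-and-compile
  (defun my-compile-selector (selector source)
    "Return a form that throws the first match of SELECTOR to `done'.
SOURCE is a form yielding the list of nodes for the next element step."
    (let ((step (car selector))
          (rest (cdr selector)))
      (cond
       ;; End of selector: the current element is the match.
       ((null selector)
        '(throw 'done v))
       ;; Keywords are symbols too, so they are tested first.
       ((keywordp step)
        (let ((attr (intern (substring (symbol-name step) 1))))
          `(let ((v (cdr (assq ',attr (cadr v)))))
             (when v (throw 'done v)))))
       ;; Two-element vector: test an attribute of the current element.
       ((vectorp step)
        `(when (equal (cdr (assq ',(aref step 0) (cadr v))) ,(aref step 1))
           ,(my-compile-selector rest '(cddr v))))
       ;; Plain symbol: loop over SOURCE looking for that element name.
       ((symbolp step)
        `(dolist (v ,source)
           (when (and (consp v) (eq (car v) ',step))
             ,(my-compile-selector rest '(cddr v)))))))))

(defmacro my-query* (selector xml)
  "Expand SELECTOR into search code over XML, returning the first match."
  `(catch 'done
     ,(my-compile-selector selector xml)))

(my-query* (feed entry link [rel "alternate"] :href)
           '((feed nil
                   (entry nil
                          (link ((rel . "self") (href . "http://example.org/feed")))
                          (link ((rel . "alternate") (href . "http://example.org/a")))))))
;; =&gt; "http://example.org/a"
</code></pre></div></div>

<p>All of the type dispatch happens inside <code class="language-plaintext highlighter-rouge">my-compile-selector</code>, which
runs once at macro expansion. None of it survives into the emitted
code.</p>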

<p>The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> form is nearly the same, but instead of a
<code class="language-plaintext highlighter-rouge">throw</code>, it pushes the result into the return list. Only the prologue
(the outermost part) and the epilogue (the innermost part) are
different.</p>
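
<p>As an illustration of that difference, here is a hand-written
function in the style such an expansion might take for the selector
<code class="language-plaintext highlighter-rouge">(feed entry link :href)</code>. The function name and the exact shape of
the prologue and epilogue are assumptions, not the macro’s actual
output.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-all-entry-hrefs (xml)
  "Collect every feed/entry/link href in XML (hand-written sketch)."
  (let ((result '()))                        ; prologue: start an empty list
    (dolist (v xml)
      (when (and (consp v) (eq (car v) 'feed))
        (dolist (v (cddr v))
          (when (and (consp v) (eq (car v) 'entry))
            (dolist (v (cddr v))
              (when (and (consp v) (eq (car v) 'link))
                (let ((v (cdr (assq 'href (cadr v)))))
                  (when v
                    ;; epilogue: collect the match instead of throwing
                    (push v result)))))))))
    (nreverse result)))

(my-all-entry-hrefs
 '((feed nil
         (entry nil (link ((href . "http://example.org/a"))))
         (entry nil (link ((href . "http://example.org/b")))))))
;; =&gt; ("http://example.org/a" "http://example.org/b")
</code></pre></div></div>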

<p>Parsing feeds is a hot spot for Elfeed, so I wanted the compiler’s
output to be as efficient as possible. I had three goals for this:</p>

<ul>
  <li>
    <p><strong>No extraneous code.</strong> It’s easy for the compiler to emit
unnecessary code. The byte-code compiler might be able to eliminate
some of it, but I don’t want to rely on that. Except for the
identifiers, it should basically look like a human wrote it.</p>
  </li>
  <li>
    <p><strong>Avoid function calls.</strong> I don’t want to pay function call
overhead, and, with some care, it’s easy to avoid. In the
<code class="language-plaintext highlighter-rouge">xml-query*</code> expansion, the only function call is <code class="language-plaintext highlighter-rouge">throw</code>, which is
unavoidable. The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> version makes no function calls
whatsoever. Notice that I used <code class="language-plaintext highlighter-rouge">assq</code> rather than <code class="language-plaintext highlighter-rouge">assoc</code>. First, it
only needs to match symbols, so it should be faster. Second, <code class="language-plaintext highlighter-rouge">assq</code>
has its own byte-code instruction (158) and <code class="language-plaintext highlighter-rouge">assoc</code> does not.</p>
  </li>
  <li>
    <p><strong>No unnecessary memory allocations</strong>. The <code class="language-plaintext highlighter-rouge">xml-query*</code> expansion
makes <em>no</em> allocations. The <code class="language-plaintext highlighter-rouge">xml-query-all*</code> version only conses
once per output, which is the minimum possible.</p>
  </li>
</ul>

<p>The end result is at least as optimal as hand-written code, but
without the chance of human error (typos, fat fingering) and sourced
from an easy-to-read DSL.</p>

<h3 id="performance">Performance</h3>

<p>In my tests, the <strong>xml-query macros are a full order of magnitude
faster than the functions</strong>. Yes, ten times faster! It’s an even
bigger gain than I expected.</p>

<p>In the full picture, xml-query is only one part of parsing a feed.
Measuring the time starting from raw XML text (as <a href="/blog/2016/06/16/">delivered by
cURL</a>) to a list of database entry objects, I’m seeing an
<strong>overall 25% speedup</strong> with the macros. The remaining time is
dominated by <code class="language-plaintext highlighter-rouge">xml-parse-region</code>, which is mostly out of my control.</p>

<p>With xml-query so computationally cheap, I don’t need to worry about
using it more often. Compared to parsing XML text, it’s virtually
free.</p>

<p>When it came time to validate my DSL compiler, I was <em>really</em> happy
that Elfeed had a test suite. I essentially rewrote a core component
from scratch, and passing all of the unit tests was a strong sign that
it was correct. Many times that test suite has provided confidence in
changes made both by me and by others.</p>

<p>I’ll end by describing another possible application: Apply this
technique to regular expressions, such that static strings containing
regular expressions are compiled into Elisp/byte-code via macro
expansion. I wonder if situationally this would be faster than Emacs’
own regular expression engine.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Some Performance Advantages of Lexical Scope</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/22/"/>
    <id>urn:uuid:21bc4afa-caa8-37ed-a912-a35f35d0e432</id>
    <updated>2016-12-22T02:33:36Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="optimization"/><category term="compsci"/>
    <content type="html">
      <![CDATA[<p>I recently had a discussion with <a href="http://ergoemacs.org/">Xah Lee</a> about lexical scope in
Emacs Lisp. The topic was why <code class="language-plaintext highlighter-rouge">lexical-binding</code> exists at a file-level
when there was already <code class="language-plaintext highlighter-rouge">lexical-let</code> (from <code class="language-plaintext highlighter-rouge">cl</code>), prompted by my
previous article on <a href="/blog/2016/12/11/">JIT byte-code compilation</a>. The specific
context is Emacs Lisp, but these concepts apply to language design in
general.</p>

<p>Until Emacs 24.1 (June 2012), Elisp only had dynamically scoped
variables — a feature, mostly by accident, common to old lisp
dialects. While dynamic scope has some selective uses, it’s widely
regarded as a mistake for local variables, and virtually no other
languages have adopted it.</p>

<p>Way back in 1993, Dave Gillespie’s deviously clever <code class="language-plaintext highlighter-rouge">lexical-let</code>
macro <a href="http://git.savannah.gnu.org/cgit/emacs.git/commit/?h=fcd73769&amp;id=fcd737693e8e320acd70f91ec8e0728563244805">was committed</a> to the <code class="language-plaintext highlighter-rouge">cl</code> package, providing a rudimentary
form of opt-in lexical scope. The macro walks its body replacing local
variable names with guaranteed-unique gensym names: the exact same
technique used in macros to create “hygienic” bindings that aren’t
visible to the macro body. It essentially “fakes” lexical scope within
Elisp’s dynamic scope by preventing variable name collisions.</p>

<p>For example, here’s one of the consequences of dynamic scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">inner</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">setq</span> <span class="nv">v</span> <span class="ss">:inner</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">outer</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">v</span> <span class="ss">:outer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">inner</span><span class="p">)</span>
    <span class="nv">v</span><span class="p">))</span>

<span class="p">(</span><span class="nv">outer</span><span class="p">)</span>
<span class="c1">;; =&gt; :inner</span>
</code></pre></div></div>

<p>The “local” variable <code class="language-plaintext highlighter-rouge">v</code> in <code class="language-plaintext highlighter-rouge">outer</code> is visible to its callee, <code class="language-plaintext highlighter-rouge">inner</code>,
which can access and manipulate it. The meaning of the <em>free variable</em>
<code class="language-plaintext highlighter-rouge">v</code> in <code class="language-plaintext highlighter-rouge">inner</code> depends entirely on the run-time call stack. It might
be a global variable, or it might be a local variable for a caller,
direct or indirect.</p>

<p>Using <code class="language-plaintext highlighter-rouge">lexical-let</code> deconflicts these names, giving the effect of
lexical scope.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">v</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">lexical-outer</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">v</span> <span class="ss">:outer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">inner</span><span class="p">)</span>
    <span class="nv">v</span><span class="p">))</span>

<span class="p">(</span><span class="nv">lexical-outer</span><span class="p">)</span>
<span class="c1">;; =&gt; :outer</span>
</code></pre></div></div>

<p>But there’s more to lexical scope than this. Closures only make sense
in the context of lexical scope, and the most useful feature of
<code class="language-plaintext highlighter-rouge">lexical-let</code> is that lambda expressions evaluate to closures. The
macro implements this using a technique called <a href="https://en.wikipedia.org/wiki/Lambda_lifting"><em>closure
conversion</em></a>. Additional parameters are added to the original
lambda function, one for each lexical variable (and not just each
closed-over variable), and the whole thing is wrapped in <em>another</em>
lambda function that invokes the original lambda function with the
additional parameters filled with the closed-over variables — yes, the
variables (i.e. symbols) themselves, <em>not</em> just their values (i.e.
pass-by-reference). The last point means different closures can
properly close over the same variables, and they can bind new values.</p>

<p>To roughly illustrate how this works, the first lambda expression
below, which closes over the lexical variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code>, would be
converted into the latter by <code class="language-plaintext highlighter-rouge">lexical-let</code>. The <code class="language-plaintext highlighter-rouge">#:</code> is Elisp’s syntax
for uninterned symbols. So <code class="language-plaintext highlighter-rouge">#:x</code> is <em>a</em> symbol <code class="language-plaintext highlighter-rouge">x</code>, but not <em>the</em>
symbol <code class="language-plaintext highlighter-rouge">x</code> (see <code class="language-plaintext highlighter-rouge">print-gensym</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Before conversion:</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">))</span>

<span class="c1">;; After conversion:</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
           <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">symbol-value</span> <span class="nv">x</span><span class="p">)</span>
              <span class="p">(</span><span class="nb">symbol-value</span> <span class="nv">y</span><span class="p">)))</span>
         <span class="o">'</span><span class="ss">#:x</span> <span class="o">'</span><span class="ss">#:y</span> <span class="nv">args</span><span class="p">))</span>
</code></pre></div></div>

<p>I’ve said on multiple occasions that <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> has
significant advantages, both in performance and static analysis, and
so it should be used for all future Elisp code. The only reason it’s
not the default is because it breaks some old (badly written) code.
However, <strong><code class="language-plaintext highlighter-rouge">lexical-let</code> doesn’t realize any of these advantages</strong>! In
fact, it has worse performance than straightforward dynamic scope with
<code class="language-plaintext highlighter-rouge">let</code>.</p>

<ol>
  <li>
    <p>New symbol objects are allocated and initialized (<code class="language-plaintext highlighter-rouge">make-symbol</code>) on
each run-time evaluation, one per lexical variable.</p>
  </li>
  <li>
    <p>Since it’s just faking it, <code class="language-plaintext highlighter-rouge">lexical-let</code> still uses dynamic
bindings, which are more expensive than lexical bindings. It varies
depending on the C compiler that built Emacs, but dynamic variable
accesses (opcode <code class="language-plaintext highlighter-rouge">varref</code>) take around 30% longer than lexical
variable accesses (opcode <code class="language-plaintext highlighter-rouge">stack-ref</code>). Assignment is far worse,
where dynamic variable assignment (<code class="language-plaintext highlighter-rouge">varset</code>) takes 650% longer than
lexical variable assignment (<code class="language-plaintext highlighter-rouge">stack-set</code>). How I measured all this
is a topic for another article.</p>
  </li>
  <li>
    <p>The “lexical” variables are accessed using <code class="language-plaintext highlighter-rouge">symbol-value</code>, a full
function call, so they’re even slower than normal dynamic
variables.</p>
  </li>
  <li>
    <p>Because converted lambda expressions are constructed dynamically at
run-time within the body of <code class="language-plaintext highlighter-rouge">lexical-let</code>, the resulting closure is
only partially byte-compiled even if the code as a whole has been
byte-compiled. In contrast, <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> closures are fully
compiled. How this works is worth <a href="/blog/2017/12/14/">its own article</a>.</p>
  </li>
  <li>
    <p>Converted lambda expressions include the additional internal
function invocation, making them slower.</p>
  </li>
</ol>

<p>While <code class="language-plaintext highlighter-rouge">lexical-let</code> is clever, and occasionally useful prior to Emacs
24, it may come at a hefty performance cost if evaluated frequently.
There’s no reason to use it anymore.</p>

<h3 id="constraints-on-code-generation">Constraints on code generation</h3>

<p>Another reason to be wary of dynamic scope is that it puts needless
constraints on the compiler, preventing a number of important
optimization opportunities. For example, consider the following
function, <code class="language-plaintext highlighter-rouge">bar</code>:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">bar</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">y</span> <span class="mi">2</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">foo</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
</code></pre></div></div>

<p>Byte-compile this function under dynamic scope (<code class="language-plaintext highlighter-rouge">lexical-binding:
nil</code>) and <a href="/blog/2014/01/04/">disassemble it</a> to see what it looks like.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="nf">#'</span><span class="nv">bar</span><span class="p">)</span>
<span class="p">(</span><span class="nb">disassemble</span> <span class="nf">#'</span><span class="nv">bar</span><span class="p">)</span>
</code></pre></div></div>

<p>That pops up a buffer with the disassembly listing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  1
1       constant  2
2       varbind   y
3       varbind   x
4       constant  foo
5       call      0
6       discard
7       varref    x
8       varref    y
9       plus
10      unbind    2
11      return
</code></pre></div></div>

<p>It’s 12 instructions, 5 of which deal with dynamic bindings. The
byte-compiler doesn’t always produce optimal byte-code, but this just
so happens to be <em>nearly</em> optimal byte-code. The <code class="language-plaintext highlighter-rouge">discard</code> (a very
fast instruction) isn’t necessary, but otherwise no more compiler
smarts can improve on this. Since the variables <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are
visible to <code class="language-plaintext highlighter-rouge">foo</code>, they must be bound before the call and <a href="/blog/2016/07/25/">loaded after
the call</a>. While generally this function will return 3, the
compiler cannot assume so since it ultimately depends on the behavior of
<code class="language-plaintext highlighter-rouge">foo</code>. Its hands are tied.</p>

<p>Compare this to the lexical scope version (<code class="language-plaintext highlighter-rouge">lexical-binding: t</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  1
1       constant  2
2       constant  foo
3       call      0
4       discard
5       stack-ref 1
6       stack-ref 1
7       plus
8       return
</code></pre></div></div>

<p>It’s only 9 instructions, none of which are expensive dynamic variable
instructions. And this isn’t even close to the optimal byte-code. In
fact, as of Emacs 25.1 the byte-compiler often doesn’t produce the
optimal byte-code for lexical scope code and still needs some work.
<strong>Despite not firing on all cylinders, lexical scope still manages to
beat dynamic scope in performance benchmarks.</strong></p>

<p>Here’s the optimal byte-code, should the byte-compiler become smarter
someday:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  foo
1       call      0
2       constant  3
3       return
</code></pre></div></div>

<p>It’s down to 4 instructions due to computing the math operation at
compile time. Emacs’ byte-compiler only has rudimentary constant
folding, so it doesn’t notice that <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are constants and
misses this optimization. I speculate this is due to its roots
compiling under dynamic scope. Since <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are no longer exposed
to <code class="language-plaintext highlighter-rouge">foo</code>, the compiler has the opportunity to optimize them out of
existence. I haven’t measured it, but I would expect this to be
significantly faster than the dynamic scope version of this function.</p>

<h3 id="optional-dynamic-scope">Optional dynamic scope</h3>

<p>You might be thinking, “What if I really <em>do</em> want <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> to be
dynamically bound for <code class="language-plaintext highlighter-rouge">foo</code>?” This is often useful. Many of Emacs’ own
functions are designed to have certain variables dynamically bound
around them. For example, the print family of functions use the global
variable <code class="language-plaintext highlighter-rouge">standard-output</code> to determine where to send output by
default.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">princ</span> <span class="s">"value = "</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">prin1</span> <span class="nv">value</span><span class="p">))</span>
</code></pre></div></div>

<p>Have no fear: <strong>With <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> you can have your cake and
eat it too.</strong> Variables declared with <code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defconst</code>, or
<code class="language-plaintext highlighter-rouge">defvaralias</code> are marked as “special” with an internal bit flag
(<code class="language-plaintext highlighter-rouge">declared_special</code> in C). When the compiler detects one of these
variables (<code class="language-plaintext highlighter-rouge">special-variable-p</code>), it uses a classical dynamic binding.</p>
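<p>As a minimal sketch of that mix, assuming a hypothetical package prefix <code class="language-plaintext highlighter-rouge">my-pkg-</code> (none of these names exist in Emacs itself), a variable declared with <code class="language-plaintext highlighter-rouge">defvar</code> stays dynamically bound while ordinary locals and parameters in the same file remain lexical:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; -*- lexical-binding: t; -*-

;; Hypothetical package variable. Giving it a value with defvar
;; marks the symbol as special.
(defvar my-pkg-verbose nil)

(defun my-pkg-report ()
  ;; Free variable reference, resolved dynamically at run time.
  (if my-pkg-verbose :verbose :quiet))

(defun my-pkg-run (flag)
  ;; FLAG is a lexical parameter, but this let creates a dynamic
  ;; binding because my-pkg-verbose is special, so the callee sees it.
  (let ((my-pkg-verbose flag))
    (my-pkg-report)))

(special-variable-p 'my-pkg-verbose)
;; =&gt; t

(my-pkg-run t)
;; =&gt; :verbose

(my-pkg-run nil)
;; =&gt; :quiet
</code></pre></div></div>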

<p>Declaring both <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> as special restores the original semantics,
reverting <code class="language-plaintext highlighter-rouge">bar</code> back to its old byte-code definition (next time it’s
compiled, that is). But it would be poor form to mark <code class="language-plaintext highlighter-rouge">x</code> or <code class="language-plaintext highlighter-rouge">y</code> as
special: You’d de-optimize all code (compiled <em>after</em> the declaration)
anywhere in Emacs that uses these names. As a package author, only do
this with the namespace-prefixed variables that belong to you.</p>

<p>The only way to unmark a special variable is with the undocumented
function <code class="language-plaintext highlighter-rouge">internal-make-var-non-special</code>. I expected <code class="language-plaintext highlighter-rouge">makunbound</code> to
do this, but as of Emacs 25.1 it does not. This could possibly be
considered a bug.</p>
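<p>A small sketch of that behavior, using a hypothetical variable name. The results in the comments are what the description above implies for Emacs 25.1, and since the function is undocumented, they may differ in other versions:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical variable, used only for illustration.
(defvar my-pkg-temp 0)

(special-variable-p 'my-pkg-temp)
;; =&gt; t

;; makunbound removes the value but leaves the special flag set.
(makunbound 'my-pkg-temp)
(special-variable-p 'my-pkg-temp)
;; =&gt; t

;; The undocumented internal function clears the flag.
(internal-make-var-non-special 'my-pkg-temp)
(special-variable-p 'my-pkg-temp)
;; =&gt; nil
</code></pre></div></div>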

<h3 id="accidental-closures">Accidental closures</h3>

<p>I’ve said there are absolutely no advantages to <code class="language-plaintext highlighter-rouge">lexical-binding: nil</code>.
It’s only the default for the sake of backwards-compatibility. However,
there <em>is</em> one case where <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> introduces a subtle issue
that would otherwise not exist. Take this code for example (and
never mind <code class="language-plaintext highlighter-rouge">prin1-to-string</code> for a moment):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">function-as-string</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nb">prin1</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="ss">:example</span><span class="p">)</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
</code></pre></div></div>

<p>This creates and serializes a closure, which is <a href="/blog/2013/12/30/">one of Elisp’s unique
features</a>. It doesn’t close over any variables, so it should be
pretty simple. However, this function will only work correctly under
<code class="language-plaintext highlighter-rouge">lexical-binding: t</code> when byte-compiled.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">function-as-string</span><span class="p">)</span>
<span class="c1">;; =&gt; "(closure ((temp-buffer . #&lt;buffer  *temp*&gt;) t) nil :example)"</span>
</code></pre></div></div>

<p>The interpreter doesn’t analyze the closure, so it just closes over
everything. This includes the hidden variable <code class="language-plaintext highlighter-rouge">temp-buffer</code> created by
the <code class="language-plaintext highlighter-rouge">with-temp-buffer</code> macro, resulting in an abstraction leak.
Buffers aren’t readable, so this will signal an error if an attempt is
made to read this function back into an s-expression. The
byte-compiler fixes this by noticing <code class="language-plaintext highlighter-rouge">temp-buffer</code> isn’t actually
closed over and so doesn’t include it in the closure, making it work
correctly.</p>

<p>Under <code class="language-plaintext highlighter-rouge">lexical-binding: nil</code> it works correctly either way:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">function-as-string</span><span class="p">)</span>
<span class="c1">;; =&gt; "(lambda nil :example)"</span>
</code></pre></div></div>

<p>This may seem contrived — it’s certainly unlikely — but <a href="https://github.com/jwiegley/emacs-async/issues/17">it has come
up in practice</a>. Still, it’s no reason to avoid <code class="language-plaintext highlighter-rouge">lexical-binding: t</code>.</p>

<h3 id="use-lexical-scope-in-all-new-code">Use lexical scope in all new code</h3>

<p>As I’ve said again and again, always use <code class="language-plaintext highlighter-rouge">lexical-binding: t</code>. Use
dynamic variables judiciously. And <code class="language-plaintext highlighter-rouge">lexical-let</code> is no replacement. It
has virtually none of the benefits, performs <em>worse</em>, and it only
applies to <code class="language-plaintext highlighter-rouge">let</code>, not any of the other places bindings are created:
function parameters, <code class="language-plaintext highlighter-rouge">dotimes</code>, <code class="language-plaintext highlighter-rouge">dolist</code>, and <code class="language-plaintext highlighter-rouge">condition-case</code>.</p>
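<p>To illustrate one of those other binding sites, here’s a minimal sketch (the function name <code class="language-plaintext highlighter-rouge">make-adder</code> is made up for this example) of a closure capturing a function parameter, which works under <code class="language-plaintext highlighter-rouge">lexical-binding: t</code> without any macro help:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; -*- lexical-binding: t; -*-

;; N is a function parameter, not a let binding. Under lexical
;; scope the returned lambda closes over it directly.
(defun make-adder (n)
  (lambda (x) (+ x n)))

(funcall (make-adder 3) 4)
;; =&gt; 7
</code></pre></div></div>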

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Faster Elfeed Search Through JIT Byte-code Compilation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/11/"/>
    <id>urn:uuid:47002cc3-816a-3cb8-b462-327364e3f943</id>
    <updated>2016-12-11T23:16:42Z</updated>
    <category term="emacs"/><category term="elfeed"/><category term="optimization"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Today I pushed an update for <a href="https://github.com/skeeto/elfeed">Elfeed</a> that doubles the speed
of the search filter in the worst case. This is the user-entered
expression that dynamically narrows the entry listing to a subset that
meets certain criteria: published after a particular date,
with/without particular tags, and matching/non-matching zero or more
regular expressions. The filter is live, applied to the database as
the expression is edited, so it’s important for usability that this
search completes under a threshold that the user might notice.</p>

<p><img src="/img/elfeed/filter.gif" alt="" /></p>

<p>The typical workaround for these kinds of interfaces is to make
filtering/searching asynchronous. It’s possible to do this well, but
it’s usually a terrible, broken design. If the user acts upon the
asynchronous results — say, by typing the query and hitting enter to
choose the current or expected top result — then the final behavior is
non-deterministic, a race between the user’s typing speed and the
asynchronous search. Elfeed will keep its synchronous live search.</p>

<p>For anyone not familiar with Elfeed, here’s a filter that finds all
entries from within the past year tagged “youtube” (<code class="language-plaintext highlighter-rouge">+youtube</code>) that
mention Linux or Linus (<code class="language-plaintext highlighter-rouge">linu[xs]</code>), but aren’t tagged “bsd” (<code class="language-plaintext highlighter-rouge">-bsd</code>),
limited to the most recent 15 entries (<code class="language-plaintext highlighter-rouge">#15</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@1-year-old +youtube linu[xs] -bsd #15
</code></pre></div></div>

<p>The database is primarily indexed over publication date, so filters on
publication dates are the most efficient filters. Entries are visited
in order starting with the most recently published, and the search can
bail out early once it crosses the filter threshold. Time-oriented
filters have been encouraged as the solution to keep the live search
feeling lively.</p>

<h3 id="filtering-overview">Filtering Overview</h3>

<p>The first step in filtering is parsing the filter text entered by the
user. This string is broken into its components using the
<code class="language-plaintext highlighter-rouge">elfeed-search-parse-filter</code> function. Date filter components are
converted into a unix epoch interval, tags are interned into symbols,
regular expressions are gathered up as strings, and the entry limit is
parsed into a plain integer. Absence of a filter component is
indicated by nil.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">elfeed-search-parse-filter</span> <span class="s">"@1-year-old +youtube linu[xs] -bsd #15"</span><span class="p">)</span>
<span class="c1">;; =&gt; (31557600.0 (youtube) (bsd) ("linu[xs]") nil 15)</span>
</code></pre></div></div>

<p>Previously, the next step was to apply the <code class="language-plaintext highlighter-rouge">elfeed-search-filter</code>
function with this structured filter representation to the database.
Except for special early-bailout situations, it works left-to-right
across the filter, checking each condition against each entry. This is
analogous to an interpreter, with the filter being a program.</p>

<p>Thinking about it that way, what if the filter was instead compiled
into an Emacs byte-code function and executed directly by the Emacs
virtual machine? That’s what this latest update does.</p>

<h3 id="benchmarks">Benchmarks</h3>

<p>With six different filter components, the actual filtering routine is
a bit too complicated for an article, so I’ll set up a simpler, but
roughly equivalent, scenario. With a reasonable cut-off date, the
filter was already sufficiently fast, so for benchmarking I’ll focus
on the worst case: no early bailout opportunities. An entry will be
just a list of tags (symbols), and the filter will have to test every
entry.</p>

<p>My <a href="/blog/2016/08/12/">real-world Elfeed database</a> currently has 46,772 entries with
36 distinct tags. For my benchmark I’ll round this up to a nice
100,000 entries, and use 26 distinct tags (A–Z), which has the nice
alphabet property and more closely reflects the number of tags I still
care about.</p>

<p>First, here’s <code class="language-plaintext highlighter-rouge">make-random-entry</code> to generate a random list of 1–5
tags (i.e. an entry). The <code class="language-plaintext highlighter-rouge">state</code> parameter is the random state,
allowing for deterministic benchmarks on a randomly-generated
database.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">make-random-entry</span> <span class="p">(</span><span class="k">&amp;key</span> <span class="nv">state</span> <span class="p">(</span><span class="nb">min</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="nb">max</span> <span class="mi">5</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="p">(</span><span class="nb">+</span> <span class="nb">min</span> <span class="p">(</span><span class="nv">cl-random</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">-</span> <span class="nb">max</span> <span class="nb">min</span><span class="p">))</span> <span class="nv">state</span><span class="p">))</span>
           <span class="nv">for</span> <span class="nv">letter</span> <span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">?A</span> <span class="p">(</span><span class="nv">cl-random</span> <span class="mi">26</span> <span class="nv">state</span><span class="p">))</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nb">intern</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%c"</span> <span class="nv">letter</span><span class="p">))))</span>
</code></pre></div></div>

<p>The database is just a big list of entries. In Elfeed this is actually
an AVL tree. Without dates, the order doesn’t matter.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">make-random-database</span> <span class="p">(</span><span class="k">&amp;key</span> <span class="nv">state</span> <span class="p">(</span><span class="nb">count</span> <span class="mi">100000</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="nb">count</span> <span class="nv">collect</span> <span class="p">(</span><span class="nv">make-random-entry</span> <span class="ss">:state</span> <span class="nv">state</span><span class="p">)))</span>
</code></pre></div></div>

<p>Here’s <a href="/blog/2009/05/28/">my old time macro</a>. An important change I’ve made since
years ago is to call <code class="language-plaintext highlighter-rouge">garbage-collect</code> before starting the clock,
eliminating bad samples from unlucky garbage collection events.
Depending on what you want to measure, it may even be worth disabling
garbage collection during the measurement by setting
<code class="language-plaintext highlighter-rouge">gc-cons-threshold</code> to a high value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">measure-time</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="c1">;; Collect garbage at run time, right before the clock starts,</span>
  <span class="c1">;; rather than once at macro-expansion time.</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">start</span> <span class="p">(</span><span class="nb">make-symbol</span> <span class="s">"start"</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="k">progn</span>
       <span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>
       <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="o">,</span><span class="nv">start</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)))</span>
         <span class="o">,@</span><span class="nv">body</span>
         <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="o">,</span><span class="nv">start</span><span class="p">)))))</span>
</code></pre></div></div>
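<p>For the other option mentioned above, disabling garbage collection during the measurement, a rough sketch might look like the following. The function name <code class="language-plaintext highlighter-rouge">my-time-without-gc</code> is hypothetical, and binding <code class="language-plaintext highlighter-rouge">gc-cons-threshold</code> to <code class="language-plaintext highlighter-rouge">most-positive-fixnum</code> is just one way to pick “a high value”:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-time-without-gc (thunk)
  "Return the seconds taken to call THUNK with GC effectively disabled."
  ;; Start from a clean slate, then raise the threshold so that no
  ;; collection is triggered while the clock is running.
  (garbage-collect)
  (let ((gc-cons-threshold most-positive-fixnum)
        (start (float-time)))
    (funcall thunk)
    (- (float-time) start)))

;; Usage: time a loop that allocates 100,000 cons cells.
(my-time-without-gc
 (lambda ()
   (dotimes (_ 100000)
     (cons nil nil))))
</code></pre></div></div>

<p>The dynamic binding is undone when the <code class="language-plaintext highlighter-rouge">let</code> exits, so the normal threshold applies again afterward. The trade-off is that memory use is unbounded while the thunk runs.</p>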

<p>Finally, the benchmark harness. It uses a hard-coded seed to generate
the same pseudo-random database. The test is run against a filter
function, <code class="language-plaintext highlighter-rouge">f</code>, 100 times in search for the same 6 tags, and the timing
results are averaged.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">benchmark</span> <span class="p">(</span><span class="nv">f</span> <span class="k">&amp;optional</span> <span class="p">(</span><span class="nv">n</span> <span class="mi">100</span><span class="p">)</span> <span class="p">(</span><span class="nv">tags</span> <span class="o">'</span><span class="p">(</span><span class="nv">A</span> <span class="nv">B</span> <span class="nv">C</span> <span class="nv">D</span> <span class="nv">E</span> <span class="nv">F</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">state</span> <span class="p">(</span><span class="nv">copy-sequence</span> <span class="nv">[cl-random-state-tag</span> <span class="mi">-1</span> <span class="mi">30</span> <span class="nv">267466518]</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">db</span> <span class="p">(</span><span class="nv">make-random-database</span> <span class="ss">:state</span> <span class="nv">state</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">repeat</span> <span class="nv">n</span>
             <span class="nv">sum</span> <span class="p">(</span><span class="nv">measure-time</span>
                   <span class="p">(</span><span class="nb">funcall</span> <span class="nv">f</span> <span class="nv">db</span> <span class="nv">tags</span><span class="p">))</span>
             <span class="nv">into</span> <span class="nv">total</span>
             <span class="nv">finally</span> <span class="nb">return</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">total</span> <span class="p">(</span><span class="nb">float</span> <span class="nv">n</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The baseline will be <code class="language-plaintext highlighter-rouge">memq</code> (test for membership using identity,
<code class="language-plaintext highlighter-rouge">eq</code>). There are two lists of tags to compare: the list that is the
entry, and the list from the filter. This requires a nested loop for
each entry, one explicit (<code class="language-plaintext highlighter-rouge">cl-loop</code>) and one implicit (<code class="language-plaintext highlighter-rouge">memq</code>), both
with early bailout.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">memq-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">memq</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>Byte-code compiling everything and running the benchmark on my laptop
I get:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.041 seconds</span>
</code></pre></div></div>

<p>That’s actually not too bad. One of the advantages of this definition
is that there are no function calls. The <code class="language-plaintext highlighter-rouge">memq</code> built-in function has
its own opcode (62), and the rest of the definition is special forms
and macros expanding to special forms (<code class="language-plaintext highlighter-rouge">cl-loop</code>). It’s exactly the
thing I need to exploit to make filters faster.</p>

<p>As a sanity check, what would happen if I used <code class="language-plaintext highlighter-rouge">member</code> instead of
<code class="language-plaintext highlighter-rouge">memq</code>? In theory it should be slower because it uses <code class="language-plaintext highlighter-rouge">equal</code> for
tests instead of <code class="language-plaintext highlighter-rouge">eq</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">member-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nb">member</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s only slightly slower because <code class="language-plaintext highlighter-rouge">member</code>, <a href="/blog/2013/01/22/">like many other
built-ins</a>, also has an opcode (157). It’s just a tiny bit
more overhead.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">member-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.047 seconds</span>
</code></pre></div></div>

<p>To test function call overhead while still using the built-in (i.e.
written in C) <code class="language-plaintext highlighter-rouge">memq</code>, I’ll alias it so that the byte-code compiler is
forced to emit a function call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'memq-alias</span> <span class="ss">'memq</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">memq-alias-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">memq-alias</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>To verify that this is doing what I expect, I <code class="language-plaintext highlighter-rouge">M-x disassemble</code> the
function and inspect the byte-code disassembly. Here’s a simple
example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span>
 <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="p">(</span><span class="nv">memq</span> <span class="ss">:foo</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>When compiled under lexical scope (<code class="language-plaintext highlighter-rouge">lexical-binding</code> is true), here’s
the disassembly. To understand what this means, see <a href="/blog/2014/01/04/"><em>Emacs Byte-code
Internals</em></a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  :foo
1       stack-ref 1
2       memq
3       return
</code></pre></div></div>

<p>Notice the <code class="language-plaintext highlighter-rouge">memq</code> instruction. Try using <code class="language-plaintext highlighter-rouge">memq-alias</code> instead:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span>
 <span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nb">list</span><span class="p">)</span> <span class="p">(</span><span class="nv">memq-alias</span> <span class="ss">:foo</span> <span class="nb">list</span><span class="p">))))</span>
</code></pre></div></div>

<p>Resulting in a function call:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  memq-alias
1       constant  :foo
2       stack-ref 2
3       call      2
4       return
</code></pre></div></div>

<p>And the benchmark:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-alias-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.052 seconds</span>
</code></pre></div></div>

<p>So the function call adds about 27% overhead. This means it would be a
good idea to <strong>avoid calling functions in the filter</strong> if I can help
it. I should rely on these special opcodes.</p>
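
<p>The percentage comes straight from the two timings above. As a small
sketch of the arithmetic, with <code class="language-plaintext highlighter-rouge">relative-overhead</code> being a helper name
made up for this illustration:</p>

```elisp
;; Hypothetical helper, not part of jit-bench.el.
(defun relative-overhead (baseline measured)
  "Return how much slower MEASURED is than BASELINE, in percent.
Both arguments are floats, in seconds."
  (* 100.0 (/ (- measured baseline) baseline)))

;; memq opcode (0.041 s) versus the aliased function call (0.052 s).
(relative-overhead 0.041 0.052)   ; about 26.8
```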

<p>Suppose <code class="language-plaintext highlighter-rouge">memq</code> was written in Emacs Lisp rather than C. How much would
that hurt performance? My version of <code class="language-plaintext highlighter-rouge">my-memq</code> below isn’t quite the
same since it returns t rather than the sublist, but it’s good enough
for this purpose. (I’m using <code class="language-plaintext highlighter-rouge">cl-loop</code> because writing early bailout
in plain Elisp without recursion is, in my opinion, ugly.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-memq</span> <span class="p">(</span><span class="nv">needle</span> <span class="nv">haystack</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">element</span> <span class="nv">in</span> <span class="nv">haystack</span>
           <span class="nb">when</span> <span class="p">(</span><span class="nb">eq</span> <span class="nv">needle</span> <span class="nv">element</span><span class="p">)</span>
           <span class="nb">return</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">my-memq-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span> <span class="nb">count</span>
           <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                    <span class="nb">when</span> <span class="p">(</span><span class="nv">my-memq</span> <span class="nv">tag</span> <span class="nv">entry</span><span class="p">)</span>
                    <span class="nb">return</span> <span class="no">t</span><span class="p">)))</span>
</code></pre></div></div>

<p>And the benchmark:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">my-memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.137 seconds</span>
</code></pre></div></div>

<p>Oof! It’s more than 3 times slower than the opcode. This means <strong>I
should use built-ins as much as possible</strong> in the filter.</p>
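
<p>For comparison, here is a sketch of what a plain Elisp version could
look like if it returned the sublist the way the real <code class="language-plaintext highlighter-rouge">memq</code> does. The
name <code class="language-plaintext highlighter-rouge">my-memq-tail</code> is invented for this example, and I have not
benchmarked it.</p>

```elisp
;; Hypothetical variant of my-memq, written with a plain `while' loop.
(defun my-memq-tail (needle haystack)
  "Return the tail of HAYSTACK whose car is `eq' to NEEDLE, or nil."
  ;; Walk the list until it runs out or the current element matches.
  (while (and haystack (not (eq needle (car haystack))))
    (setq haystack (cdr haystack)))
  haystack)

(my-memq-tail 'b '(a b c))   ; returns (b c), like (memq 'b '(a b c))
(my-memq-tail 'z '(a b c))   ; returns nil
```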

<h3 id="dynamic-vs-lexical-scope">Dynamic vs. lexical scope</h3>

<p>There’s one last thing to watch out for. Everything so far has been
compiled with lexical scope. You should really turn this on by default
for all new code that you write. It has three important advantages:</p>

<ol>
  <li>It allows the compiler to catch more mistakes.</li>
  <li>It eliminates a class of bugs related to dynamic scope: Local
variables are exposed to manipulation by callees.</li>
  <li><a href="/blog/2016/12/22/">Lexical scope has better performance</a>.</li>
</ol>
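
<p>The second point is easy to demonstrate. In this sketch the same form
is evaluated under each scoping rule by way of the second argument to
<code class="language-plaintext highlighter-rouge">eval</code>. The names <code class="language-plaintext highlighter-rouge">sneaky-callee</code> and <code class="language-plaintext highlighter-rouge">demo-counter</code> are made up for
the illustration, and it assumes <code class="language-plaintext highlighter-rouge">demo-counter</code> has not been declared
special with <code class="language-plaintext highlighter-rouge">defvar</code>.</p>

```elisp
;; Hypothetical callee that assigns to a variable it never bound itself.
(defun sneaky-callee ()
  (setq demo-counter 100))

;; Dynamic scope: the callee sees, and clobbers, the caller's local.
(eval '(let ((demo-counter 0))
         (sneaky-callee)
         demo-counter)
      nil)   ; returns 100

;; Lexical scope: the caller's local is private, so the callee's
;; assignment lands on the global variable instead.
(eval '(let ((demo-counter 0))
         (sneaky-callee)
         demo-counter)
      t)     ; returns 0
```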

<p>Here are all the benchmarks with the default dynamic scope:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.065 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">member-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.070 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">memq-alias-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.074 seconds</span>

<span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">my-memq-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.256 seconds</span>
</code></pre></div></div>

<p>Each of these four benchmarks takes roughly 1.4 to 1.9 times as long
as its lexical counterpart, and for no benefit. Under
dynamic scope, local variables use the <code class="language-plaintext highlighter-rouge">varref</code> opcode (a global
variable lookup) instead of the <code class="language-plaintext highlighter-rouge">stack-ref</code> opcode (a simple array
index).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">norm</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>Under dynamic scope, this compiles to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       varref    a
1       varref    b
2       diff
3       varref    a
4       varref    b
5       diff
6       mult
7       return
</code></pre></div></div>

<p>And under lexical scope (notice the variable names disappear):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       stack-ref 1
1       stack-ref 1
2       diff
3       stack-ref 2
4       stack-ref 2
5       diff
6       mult
7       return
</code></pre></div></div>

<h3 id="jit-compiled-filters">JIT-compiled filters</h3>

<p>So far I’ve been moving in the wrong direction, making things slower
rather than faster. How can I make it faster than the straight <code class="language-plaintext highlighter-rouge">memq</code>
version? By compiling the filter into byte-code.</p>

<p>I won’t write the byte-code directly, but instead generate Elisp code
and use the byte-code compiler on it. This is safer, will work
correctly in future versions of Emacs, and leverages the optimizations
performed by the byte-compiler. This sort of thing recently <a href="http://emacshorrors.com/posts/when-data-becomes-code.html">got a bad
rap on Emacs Horrors</a>, but I was happy to see that this
technique is already established.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">jit-count</span> <span class="p">(</span><span class="nv">db</span> <span class="nv">tags</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">memq-list</span> <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">tag</span> <span class="nv">in</span> <span class="nv">tags</span>
                             <span class="nv">collect</span> <span class="o">`</span><span class="p">(</span><span class="nv">memq</span> <span class="ss">',tag</span> <span class="nv">entry</span><span class="p">)))</span>
         <span class="p">(</span><span class="k">function</span> <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span>
                               <span class="nb">count</span> <span class="p">(</span><span class="nb">or</span> <span class="o">,@</span><span class="nv">memq-list</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">compiled</span> <span class="p">(</span><span class="nv">byte-compile</span> <span class="k">function</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">compiled</span> <span class="nv">db</span><span class="p">)))</span>
</code></pre></div></div>

<p>It dynamically builds the code as an s-expression, runs that through
the byte-code compiler, executes it, and throws it away. It’s
“just-in-time,” though compiling to byte-code and not <a href="/blog/2015/03/19/">native
code</a>. For the benchmark tags of <code class="language-plaintext highlighter-rouge">(A B C D E F)</code>, this builds
the following:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">entry</span> <span class="nv">in</span> <span class="nv">db</span>
           <span class="nb">count</span> <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">memq</span> <span class="ss">'A</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'B</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'C</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'D</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'E</span> <span class="nv">entry</span><span class="p">)</span>
                     <span class="p">(</span><span class="nv">memq</span> <span class="ss">'F</span> <span class="nv">entry</span><span class="p">))))</span>
</code></pre></div></div>

<p>Due to its short-circuiting behavior, <code class="language-plaintext highlighter-rouge">or</code> is a special form, so this
function is just special forms and <code class="language-plaintext highlighter-rouge">memq</code> in its opcode form. It’s as
fast as Elisp can get.</p>

<p>Having s-expressions is a real strength for Lisp, since the
alternative (in, say, JavaScript) would be to assemble the function by
concatenating code strings. By contrast, this looks a lot like a
regular Lisp macro. Invoking the byte-code compiler does add some
overhead compared to the interpreted filter, but it’s insignificant.</p>
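
<p>Because the generated filter is ordinary list data, it can be
inspected before it is compiled, and the compiled function can be kept
around instead of being thrown away. Here is a minimal sketch of that
split. The names <code class="language-plaintext highlighter-rouge">jit-filter-form</code> and <code class="language-plaintext highlighter-rouge">jit-filter-compile</code> are
hypothetical, not what Elfeed uses.</p>

```elisp
(require 'cl-lib)

;; Hypothetical helper: build the filter as data, without compiling it.
(defun jit-filter-form (tags)
  "Return a lambda form counting the entries that contain any of TAGS."
  (let ((memq-list (cl-loop for tag in tags
                            collect `(memq ',tag entry))))
    `(lambda (db)
       (cl-loop for entry in db
                count (or ,@memq-list)))))

;; Hypothetical helper: compile once so the result can be reused.
(defun jit-filter-compile (tags)
  "Byte-compile the filter form for TAGS and return the function."
  (byte-compile (jit-filter-form tags)))

;; Look at the generated code, then run one compiled filter twice.
(jit-filter-form '(A B))
(let ((filter (jit-filter-compile '(A B))))
  (list (funcall filter '((A X) (Y) (B)))   ; two entries match
        (funcall filter '((Z)))))           ; none match
```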

<p>How much faster is this?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">benchmark</span> <span class="nf">#'</span><span class="nv">jit-count</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.017 seconds</span>
</code></pre></div></div>

<p><strong>It’s more than twice as fast!</strong> The big gain here is through <em>loop
unrolling</em>. The explicit loop over the tags has been unrolled into the <code class="language-plaintext highlighter-rouge">or</code> expression.
That section of byte-code looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0       constant  A
1       stack-ref 1
2       memq
3       goto-if-not-nil-else-pop 1
6       constant  B
7       stack-ref 1
8       memq
9       goto-if-not-nil-else-pop 1
12      constant  C
13      stack-ref 1
14      memq
15      goto-if-not-nil-else-pop 1
18      constant  D
19      stack-ref 1
20      memq
21      goto-if-not-nil-else-pop 1
24      constant  E
25      stack-ref 1
26      memq
27      goto-if-not-nil-else-pop 1
30      constant  F
31      stack-ref 1
32      memq
33:1    return
</code></pre></div></div>

<p>In Elfeed, not only does it unroll these loops, it completely
eliminates the overhead for unused filter components. Comparing to
this benchmark, I’m seeing roughly matching gains in Elfeed’s worst
case. In Elfeed, I also bind <code class="language-plaintext highlighter-rouge">lexical-binding</code> around the
<code class="language-plaintext highlighter-rouge">byte-compile</code> call to force lexical scope, since otherwise it just
uses the buffer-local value (usually nil).</p>
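
<p>The binding trick can be sketched in isolation. This is only an
illustration of the idea, not Elfeed’s actual code, and
<code class="language-plaintext highlighter-rouge">compile-with-lexical-scope</code> is a name invented for the example.</p>

```elisp
;; Hypothetical wrapper around byte-compile.
(defun compile-with-lexical-scope (form)
  "Byte-compile FORM, a lambda expression, under lexical scope."
  ;; `lexical-binding' is a special variable, so this `let' rebinds it
  ;; dynamically for the duration of the byte-compile call.
  (let ((lexical-binding t))
    (byte-compile form)))

;; The disassembly should use stack-ref rather than varref for `list'.
(disassemble
 (compile-with-lexical-scope '(lambda (list) (memq :foo list))))
```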

<p>Filter compilation can be toggled on and off by setting
<code class="language-plaintext highlighter-rouge">elfeed-search-compile-filter</code>. If you’re up to date, try out live
filters with it both enabled and disabled. See if you can notice the
difference.</p>

<h3 id="result-summary">Result summary</h3>

<p>Here are the results in a table, all run with Emacs 24.4 on x86-64.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ms)      memq      member    memq-alias my-memq   jit
lexical   41        47        52         137       17
dynamic   65        70        74         256       21
</code></pre></div></div>

<p>And the same benchmarks on AArch64 (Emacs 24.5, ARM Cortex-A53), where
I also occasionally use Elfeed, and where I have been very interested
in improving performance.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ms)      memq      member    memq-alias my-memq   jit
lexical   170       235       242        614       79
dynamic   274       340       345        1130      92
</code></pre></div></div>

<p>And here’s how you can run the benchmarks for yourself, perhaps with
different parameters:</p>

<ul>
  <li><a href="/download/jit-bench.el">jit-bench.el</a></li>
</ul>

<p>The header explains how to run the benchmark in batch mode:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ emacs -Q -batch -f batch-byte-compile jit-bench.el
$ emacs -Q -batch -l jit-bench.elc -f benchmark-batch
</code></pre></div></div>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>A Showerthoughts Fortune File</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/12/01/"/>
    <id>urn:uuid:0a266c4d-a224-3399-a851-848f71b47dc3</id>
    <updated>2016-12-01T23:58:15Z</updated>
    <category term="reddit"/><category term="linux"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I have created a <a href="https://en.wikipedia.org/wiki/Fortune_(Unix)"><code class="language-plaintext highlighter-rouge">fortune</code> file</a> for the all-time top 10,000
<a href="https://old.reddit.com/r/Showerthoughts/">/r/Showerthoughts</a> posts, as of October 2016. As a word of
warning: Many of these entries are adult humor and may not be
appropriate for your work computer. These fortunes would be
categorized as “offensive” (<code class="language-plaintext highlighter-rouge">fortune -o</code>).</p>

<p>Download: <a href="https://skeeto.s3.amazonaws.com/share/showerthoughts" class="download">showerthoughts</a> (1.3 MB)</p>

<p>The copyright status of this file is subject to each of its thousands
of authors. Since it’s not possible to contact many of these authors —
some may not even still live — it’s obviously never going to be under
an open source license (Creative Commons, etc.). Even more, some
quotes are probably from comedians and such, rather than by the
redditor who made the post. I distribute it only for fun.</p>

<h3 id="installation">Installation</h3>

<p>To install this into your <code class="language-plaintext highlighter-rouge">fortune</code> database, first process it with
<code class="language-plaintext highlighter-rouge">strfile</code> to create a random-access index, showerthoughts.dat, then
copy both files to the directory with the rest of your fortunes.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ strfile showerthoughts
"showerthoughts.dat" created
There were 10000 strings
Longest string: 343 bytes
Shortest string: 39 bytes

$ cp showerthoughts* /usr/share/games/fortunes/
</code></pre></div></div>

<p>Alternatively, <code class="language-plaintext highlighter-rouge">fortune</code> can be told to use this file directly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ fortune showerthoughts
Not once in my life have I stepped into somebody's house and
thought, "I sure hope I get an apology for 'the mess'."
        ―AndItsDeepToo, Aug 2016
</code></pre></div></div>

<p>If you didn’t already know, <code class="language-plaintext highlighter-rouge">fortune</code> is an old unix utility that
displays a random quotation from a quotation database — a digital
<em>fortune cookie</em>. I use it as an interactive login shell greeting on
my <a href="http://www.hardkernel.com/main/products/prdt_info.php">ODROID-C2</a> server:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if </span><span class="nb">shopt</span> <span class="nt">-q</span> login_shell<span class="p">;</span> <span class="k">then
    </span>fortune ~/.fortunes
<span class="k">fi</span>
</code></pre></div></div>

<h3 id="how-was-it-made">How was it made?</h3>

<p>Fortunately I didn’t have to do something crazy like scrape reddit for
weeks on end. Instead, I downloaded <a href="http://files.pushshift.io/reddit/">the pushshift.io submission
archives</a>, which are currently around 70 GB compressed. Each file
contains one month’s worth of JSON data, one object per submission,
one submission per line, all compressed with bzip2.</p>

<p>Unlike so many other datasets, especially those made up of
arbitrary inputs from millions of people, the format of the
/r/Showerthoughts posts is surprisingly clean and requires
virtually no touching up. It’s some really fantastic data.</p>

<p>A nice feature of bzip2 is that concatenating compressed files also
concatenates the uncompressed files. Additionally, it’s easy to
parallelize bzip2 compression and decompression, which gives it <a href="/blog/2009/03/16/">an
edge over xz</a>. I strongly recommend using <a href="http://lbzip2.org/">lbzip2</a> to
decompress this data, should you want to process it yourself.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat </span>RS_<span class="k">*</span>.bz2 | lbunzip2 <span class="o">&gt;</span> everything.json
</code></pre></div></div>

<p><a href="https://stedolan.github.io/jq/">jq</a> is my favorite command line tool for processing JSON (and
<a href="/blog/2016/09/15/">rendering fractals</a>). To filter all the /r/Showerthoughts posts,
it’s a simple <code class="language-plaintext highlighter-rouge">select</code> expression. Just mind the capitalization of the
subreddit’s name. The <code class="language-plaintext highlighter-rouge">-c</code> tells <code class="language-plaintext highlighter-rouge">jq</code> to keep it one per line.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat </span>RS_<span class="k">*</span>.bz2 | <span class="se">\</span>
    lbunzip2 | <span class="se">\</span>
    jq <span class="nt">-c</span> <span class="s1">'select(.subreddit == "Showerthoughts")'</span> <span class="se">\</span>
    <span class="o">&gt;</span> showerthoughts.json
</code></pre></div></div>

<p>However, you’ll quickly find that jq is the bottleneck, parsing all
that JSON. Your cores won’t be exploited by lbzip2 as they should. So
I throw <code class="language-plaintext highlighter-rouge">grep</code> in front to dramatically decrease the workload for
<code class="language-plaintext highlighter-rouge">jq</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="k">*</span>.bz2 | <span class="se">\</span>
    lbunzip2 | <span class="se">\</span>
    <span class="nb">grep</span> <span class="nt">-a</span> Showerthoughts | <span class="se">\</span>
    jq <span class="nt">-c</span> <span class="s1">'select(.subreddit == "Showerthoughts")'</span> <span class="se">\</span>
    <span class="o">&gt;</span> showerthoughts.json
</code></pre></div></div>

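<p>The <code class="language-plaintext highlighter-rouge">grep</code> will let some extra lines through, but its output is a superset of what I want, and <code class="language-plaintext highlighter-rouge">jq</code> still does the exact match. The <code class="language-plaintext highlighter-rouge">-a</code>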
option is necessary because the data contains some null bytes. Without
it, <code class="language-plaintext highlighter-rouge">grep</code> switches into binary mode and breaks everything. This is
incredibly frustrating when you’ve already waited half an hour for
results.</p>

<p>To reduce the workload further down the pipeline, I take
advantage of the fact that only four fields will be needed: <code class="language-plaintext highlighter-rouge">title</code>,
<code class="language-plaintext highlighter-rouge">score</code>, <code class="language-plaintext highlighter-rouge">author</code>, and <code class="language-plaintext highlighter-rouge">created_utc</code>. The rest can — and should, for
efficiency’s sake — be thrown away where it’s cheap to do so.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="k">*</span>.bz2 | <span class="se">\</span>
    lbunzip2 | <span class="se">\</span>
    <span class="nb">grep</span> <span class="nt">-a</span> Showerthoughts | <span class="se">\</span>
    jq <span class="nt">-c</span> <span class="s1">'select(.subreddit == "Showerthoughts") |
               {title, score, author, created_utc}'</span> <span class="se">\</span>
    <span class="o">&gt;</span> showerthoughts.json
</code></pre></div></div>

<p>This gathers all 1,199,499 submissions into a 185 MB JSON file (as of
this writing). Most of these submissions are terrible, so the next
step is narrowing it to the small set of good submissions and putting
them into the <code class="language-plaintext highlighter-rouge">fortune</code> database format.</p>

<p><strong>It turns out reddit already has a method for finding the best
submissions: a voting system.</strong> Just pick the highest scoring posts.
Through experimentation I arrived at 10,000 as the magic cut-off
number. After this the quality really starts to drop off. Over time
this should probably be scaled up with the total number of
submissions.</p>

<p>I did both steps at the same time using a bit of Emacs Lisp, which is
particularly well-suited to the task:</p>

<ul>
  <li><a href="https://github.com/skeeto/showerthoughts">https://github.com/skeeto/showerthoughts</a></li>
</ul>

<p>This Elisp program reads one JSON object at a time and sticks each
into an AVL tree sorted by score (descending), then timestamp
(ascending), then title (ascending). The AVL tree is limited to 10,000
items, with the lowest items being dropped. This was a lot faster than
the more obvious approach: collecting everything into a big list,
sorting it, and keeping the top 10,000 items.</p>
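
<p>The idea can be sketched with the <code class="language-plaintext highlighter-rouge">avl-tree</code> library that ships with
Emacs. This is a simplified illustration rather than the code in the
repository: the function names and the <code class="language-plaintext highlighter-rouge">(SCORE TIMESTAMP TITLE)</code> item
layout are assumptions, and it assumes no two items compare as equal.</p>

```elisp
(require 'avl-tree)
(require 'cl-lib)

;; Hypothetical ordering: score descending, then timestamp ascending,
;; then title ascending. Items are (SCORE TIMESTAMP TITLE) lists.
(defun top-n-before-p (a b)
  "Return non-nil if item A sorts before item B."
  (cl-destructuring-bind (score-a time-a title-a) a
    (cl-destructuring-bind (score-b time-b title-b) b
      (cond ((/= score-a score-b) (> score-a score-b))
            ((/= time-a time-b) (> time-b time-a))
            (t (string-lessp title-a title-b))))))

(defun top-n (items n)
  "Return the N best ITEMS, best first, without sorting all of them."
  (let ((tree (avl-tree-create #'top-n-before-p))
        (size 0))   ; tracked by hand since avl-tree-size walks the tree
    (dolist (item items)
      (avl-tree-enter tree item)
      (setq size (1+ size))
      ;; Over the limit: drop the lowest-ranked item.
      (when (> size n)
        (avl-tree-delete tree (avl-tree-last tree))
        (setq size (1- size))))
    (avl-tree-flatten tree)))

(top-n '((5 1 "a") (9 2 "b") (7 3 "c") (1 4 "d")) 2)
;; returns ((9 2 "b") (7 3 "c"))
```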

<h4 id="formatting">Formatting</h4>

<p>The most complicated part is actually paragraph wrapping the
submissions. Most are too long for a single line, and letting the
terminal hard wrap them is visually unpleasing. The submissions are
encoded in UTF-8, some with characters beyond simple ASCII. Proper
wrapping requires not just Unicode awareness, but also some degree of
Unicode <em>rendering</em>. The algorithm needs to recognize grapheme
clusters and know the size of the rendered text. This is not so
trivial! Most paragraph wrapping tools and libraries get this wrong,
some counting width by bytes, others counting width by codepoints.</p>
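
<p>To make the byte/codepoint distinction concrete, here is a small C sketch that counts codepoints in a UTF-8 string. It assumes well-formed input, and the function name <code class="language-plaintext highlighter-rouge">utf8_codepoints</code> is just for illustration. Neither count is the rendered width, which is the whole problem.</p>

```c
#include <stddef.h>
#include <string.h>

/* Count codepoints in a well-formed UTF-8 string by skipping
 * continuation bytes, which all look like 10xxxxxx. */
static size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

<p>For an “e” followed by a combining acute accent (U+0301), <code class="language-plaintext highlighter-rouge">strlen()</code> reports 3 bytes and this function reports 2 codepoints, yet a terminal draws one column. A single CJK character such as U+65E5 is 3 bytes and 1 codepoint, but typically occupies two columns.</p>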

<p>Emacs’ <code class="language-plaintext highlighter-rouge">M-x fill-paragraph</code> knows how to do all these things — only
for a monospace font, which is all I needed — and I decided to
leverage it when generating the <code class="language-plaintext highlighter-rouge">fortune</code> file. Here’s an example that
paragraph-wraps a string:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">string-fill-paragraph</span> <span class="p">(</span><span class="nv">s</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="nv">s</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">fill-paragraph</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
</code></pre></div></div>

<p>For the file format, items are delimited by a <code class="language-plaintext highlighter-rouge">%</code> on a line by itself.
I put the wrapped content, followed by a <a href="http://www.fileformat.info/info/unicode/char/2015/index.htm">quotation dash</a>, the
author, and the date. A surprising number of these submissions have
date-sensitive content (“on this day X years ago”), so I found it was
important to include a date.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>April Fool's Day is the one day of the year when people critically
evaluate news articles before accepting them as true.
        ―kellenbrent, Apr 2015
%
Of all the bodily functions that could be contagious, thank god
it's the yawn.
        ―MKLV, Aug 2015
%
</code></pre></div></div>

<p>There’s the potential that a submission itself could end with a lone
<code class="language-plaintext highlighter-rouge">%</code> and, with a bit of bad luck, it happens to wrap that onto its own
line. Fortunately this hasn’t happened yet. But, now that I’ve
advertised it, someone could make such a submission, popular enough
for the top 10,000, with the intent to personally trip me up in a
future update. I accept this, though it’s unlikely, and it would be
fairly easy to work around if it happened.</p>
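
<p>One possible workaround, sketched here in C rather than in the Elisp the generator actually uses, is to scan each wrapped submission for a line that is exactly <code class="language-plaintext highlighter-rouge">%</code> before writing it out. The function name <code class="language-plaintext highlighter-rouge">has_lone_percent</code> is hypothetical, and what to do on a hit (re-wrap at a different width, skip the item) is left open.</p>

```c
#include <string.h>

/* Return 1 if any line of text consists solely of "%". */
static int
has_lone_percent(const char *text)
{
    const char *line = text;
    while (*line) {
        size_t len = strcspn(line, "\n");  /* length of this line */
        if (len == 1 && line[0] == '%')
            return 1;
        line += len;
        if (*line == '\n')
            line++;  /* step over the newline to the next line */
    }
    return 0;
}
```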

<p>The <code class="language-plaintext highlighter-rouge">strfile</code> program looks for the <code class="language-plaintext highlighter-rouge">%</code> delimiters and fills out a
table of file offsets. The header of the <code class="language-plaintext highlighter-rouge">.dat</code> file indicates the
number of strings along with some other metadata. What follows is a table
of 32-bit file offsets.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">str_version</span><span class="p">;</span>  <span class="cm">/* version number */</span>
    <span class="kt">uint32_t</span> <span class="n">str_numstr</span><span class="p">;</span>   <span class="cm">/* # of strings in the file */</span>
    <span class="kt">uint32_t</span> <span class="n">str_longlen</span><span class="p">;</span>  <span class="cm">/* length of longest string */</span>
    <span class="kt">uint32_t</span> <span class="n">str_shortlen</span><span class="p">;</span> <span class="cm">/* shortest string length */</span>
    <span class="kt">uint32_t</span> <span class="n">str_flags</span><span class="p">;</span>    <span class="cm">/* bit field for flags */</span>
    <span class="kt">char</span> <span class="n">str_delim</span><span class="p">;</span>        <span class="cm">/* delimiting character */</span>
<span class="p">}</span>
</code></pre></div></div>
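
<p>As a sketch of reading that header back, the C below decodes the five integer fields and the delimiter from an in-memory copy of the start of a <code class="language-plaintext highlighter-rouge">.dat</code> file. It assumes the integers are stored big-endian and packed back to back in the order shown, which is my understanding of common <code class="language-plaintext highlighter-rouge">strfile</code> implementations but isn’t stated above, so check it against the <code class="language-plaintext highlighter-rouge">strfile</code> on your system. The names <code class="language-plaintext highlighter-rouge">be32</code> and <code class="language-plaintext highlighter-rouge">strfile_parse</code> are made up.</p>

```c
#include <stdint.h>

struct strfile_header {
    uint32_t version, numstr, longlen, shortlen, flags;
    char delim;
};

/* Decode a big-endian 32-bit integer from four bytes. */
static uint32_t
be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] <<  8 | (uint32_t)p[3];
}

/* Parse the first 21 bytes of a .dat file image (assumed layout). */
static void
strfile_parse(const unsigned char *buf, struct strfile_header *h)
{
    h->version  = be32(buf +  0);
    h->numstr   = be32(buf +  4);
    h->longlen  = be32(buf +  8);
    h->shortlen = be32(buf + 12);
    h->flags    = be32(buf + 16);
    h->delim    = (char)buf[20];
}
```

<p>Decoding byte by byte, rather than reading straight into the struct, sidesteps both host byte order and struct padding.</p>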

<p>Note that the table doesn’t necessarily need to list the strings in
the same order as they appear in the original file. In fact, recent
versions of <code class="language-plaintext highlighter-rouge">strfile</code> can sort the strings by sorting the table, all
without touching the original file. Though none of this is important to
<code class="language-plaintext highlighter-rouge">fortune</code>.</p>

<p>Now that you know how it all works, you can build your own <code class="language-plaintext highlighter-rouge">fortune</code>
file from your own inputs!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs, Dynamic Modules, and Joysticks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/11/05/"/>
    <id>urn:uuid:c53305bb-4770-3a7f-934c-31eea37d38eb</id>
    <updated>2016-11-05T04:01:51Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>Two months ago Emacs 25 was released and introduced a <a href="http://diobla.info/blog-archive/modules-tut.html">new dynamic
module feature</a>. Emacs can now load shared libraries built
against Emacs’ module API, defined in <a href="http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs-module.h?h=emacs-25.1">emacs-module.h</a>. What’s
interesting about this API is that it doesn’t require linking against
Emacs or any sort of library. Instead, at run time Emacs supplies the
module’s initialization function with function pointers for the entire
API.</p>
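
<p>The general pattern, stripped of anything Emacs-specific, looks something like the following C sketch. The host fills in a table of function pointers and passes it to the module, so the module never needs a link-time reference to the host. Every name here (<code class="language-plaintext highlighter-rouge">host_api</code>, <code class="language-plaintext highlighter-rouge">module_init</code>, <code class="language-plaintext highlighter-rouge">host_message</code>) is hypothetical and not taken from emacs-module.h.</p>

```c
#include <stdio.h>
#include <string.h>

/* What the "host" hands to the module: a table of entry points. */
struct host_api {
    size_t size;  /* lets the module detect an older, smaller table */
    int (*message)(const char *text);
};

/* --- module side: calls the host only through the table --- */
static int
module_init(const struct host_api *api)
{
    if (api->size < sizeof(*api))
        return 1;  /* host too old for this module */
    return api->message("Hello from the module") < 0;
}

/* --- host side: fills in the table before calling the module --- */
static char host_log[64];

static int
host_message(const char *text)
{
    return snprintf(host_log, sizeof(host_log), "%s", text);
}
```

<p>A host would build a <code class="language-plaintext highlighter-rouge">struct host_api</code> with its own function addresses, look up the module’s init function through the platform’s loader, and call it with a pointer to the table.</p>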

<p>As a demonstration, in this article I’ll build an Emacs joystick
interface (Linux only) using a dynamic module. It will allow Emacs to
read events from any joystick on the system. All the source code is
here:</p>

<ul>
  <li><a href="https://github.com/skeeto/joymacs">https://github.com/skeeto/joymacs</a></li>
</ul>

<p>It includes a calibration interface (<code class="language-plaintext highlighter-rouge">M-x joydemo</code>) within Emacs:</p>

<p><a href="/img/joymacs/joymacs.png"><img src="/img/joymacs/joymacs-thumb.png" alt="" /></a></p>

<p>Currently, Emacs’ emacs-module.h header is the entirety of the module
documentation. It’s a bit thin and leaves ambiguities that require
some reading of the Emacs source code. Even reading the source, it’s
not clear which behaviors are a reliable part of the interface. For
example, if there’s a pending non-local exit, it’s safe for a function
to return <code class="language-plaintext highlighter-rouge">NULL</code> since the return value is never inspected (Emacs
25.1), but will this always be the case? While mistakes are
unforgiving (a hard crash), the API is mostly intuitive and it’s been
pretty easy to feel my way around it.</p>

<p><em>Update</em>: Philipp Stephani has <a href="https://phst.github.io/emacs-modules">written thorough, reliable module
documentation</a>.</p>

<h3 id="dynamic-module-types">Dynamic Module Types</h3>

<p>All Emacs values — integers, floats, cons cells, vectors, strings,
etc. — are represented as the polymorphic, pointer-valued type,
<code class="language-plaintext highlighter-rouge">emacs_value</code>. Despite being a pointer, <code class="language-plaintext highlighter-rouge">NULL</code> is not a valid value,
as convenient as that would be. The API includes functions for
creating and extracting the fundamental types: integers, floats,
strings. Almost all other object types can only be accessed by making
Lisp function calls to regular Emacs functions from the module.</p>

<p>Modules also introduce a brand new Emacs object type: a <em>user
pointer</em>. These are <a href="/blog/2013/12/30/">non-readable</a>, opaque pointer values
returned by modules, typically representing a handle to some resource,
be it a memory block, database connection, or a joystick. These
objects include a finalizer function pointer — which, surprisingly, is
not permitted to be NULL — and their lifetime is managed by Emacs’
garbage collector.</p>

<p>User pointers are a somewhat dangerous feature since there’s little to
stop Emacs Lisp code from misusing them. A Lisp program can take a
user pointer from one module and pass it to a function in a different
module. Since it’s just a pointer, there’s no way to type check it. At
best, a module could maintain a table of all its live pointers,
checking all user pointer arguments against the table before
dereferencing. But I don’t expect this to be normal practice.</p>
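
<p>For what it’s worth, such a table doesn’t take much code. This is a minimal sketch in plain C with a fixed capacity and linear search; the names <code class="language-plaintext highlighter-rouge">live_add</code>, <code class="language-plaintext highlighter-rouge">live_check</code>, and <code class="language-plaintext highlighter-rouge">live_remove</code> are invented here, and joymacs does nothing of the sort.</p>

```c
#include <stddef.h>

#define LIVE_MAX 64  /* hypothetical fixed capacity */

static void *live[LIVE_MAX];
static size_t nlive;

/* Record a pointer the module has handed out. Returns 0 if full. */
static int
live_add(void *p)
{
    if (nlive == LIVE_MAX)
        return 0;
    live[nlive++] = p;
    return 1;
}

/* Return 1 if p was handed out by this module and is still live. */
static int
live_check(const void *p)
{
    for (size_t i = 0; i < nlive; i++)
        if (live[i] == p)
            return 1;
    return 0;
}

/* Forget a pointer, e.g. from its finalizer. */
static void
live_remove(const void *p)
{
    for (size_t i = 0; i < nlive; i++) {
        if (live[i] == p) {
            live[i] = live[--nlive];  /* fill the hole with the last entry */
            return;
        }
    }
}
```

<p>A module function would call <code class="language-plaintext highlighter-rouge">live_check()</code> on each user pointer argument and signal an error on a miss. It only catches foreign pointers, though: an address that was freed and later reused by the same module would still pass.</p>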

<h3 id="module-initialization">Module Initialization</h3>

<p>After loading the module through the platform’s mechanism, the first
thing Emacs does is check for the symbol <code class="language-plaintext highlighter-rouge">plugin_is_GPL_compatible</code>.
While tacky, this is not surprising given the culture around Emacs.</p>

<p>Next it calls <code class="language-plaintext highlighter-rouge">emacs_module_init()</code>, passing it the first function
pointer. From this, the module can get a Lisp environment and start
doing Emacs things, such as binding module functions to Lisp symbols.</p>

<p>Here’s a complete “Hello, world!” example:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"emacs-module.h"</span><span class="cp">
</span>
<span class="kt">int</span> <span class="n">plugin_is_GPL_compatible</span><span class="p">;</span>

<span class="kt">int</span>
<span class="nf">emacs_module_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">emacs_runtime</span> <span class="o">*</span><span class="n">ert</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span> <span class="o">=</span> <span class="n">ert</span><span class="o">-&gt;</span><span class="n">get_environment</span><span class="p">(</span><span class="n">ert</span><span class="p">);</span>
    <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"message"</span><span class="p">);</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="n">hi</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"Hello, world!"</span><span class="p">;</span>
    <span class="n">emacs_value</span> <span class="n">string</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">hi</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">hi</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">env</span><span class="o">-&gt;</span><span class="n">funcall</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">message</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">string</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In a real module, it’s common to create function objects for native
functions, then fetch the <code class="language-plaintext highlighter-rouge">fset</code> symbol and make a Lisp call on it to
bind the newly-created function object to a name. You’ll see this in
action later.</p>

<h3 id="joystick-api">Joystick API</h3>

<p>The joystick API will closely resemble <a href="https://www.kernel.org/doc/Documentation/input/joystick-api.txt">Linux’s own joystick API</a>,
making for a fairly thin wrapper. It’s so thin that Emacs <em>almost</em>
doesn’t even need a dynamic module. This is because, on Linux,
joysticks are just files under <code class="language-plaintext highlighter-rouge">/dev/input/</code>. Want to see the input
events on the first joystick? Just read <code class="language-plaintext highlighter-rouge">/dev/input/js0</code>. So Plan 9.</p>

<p>Emacs already knows how to read files, but these virtual files are a
little <em>too</em> special for that. The header <code class="language-plaintext highlighter-rouge">linux/joystick.h</code> defines a
<code class="language-plaintext highlighter-rouge">struct js_event</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">js_event</span> <span class="p">{</span>
    <span class="kt">uint32_t</span> <span class="n">time</span><span class="p">;</span>  <span class="cm">/* event timestamp in milliseconds */</span>
    <span class="kt">int16_t</span> <span class="n">value</span><span class="p">;</span>
    <span class="kt">uint8_t</span> <span class="n">type</span><span class="p">;</span>
    <span class="kt">uint8_t</span> <span class="n">number</span><span class="p">;</span> <span class="cm">/* axis/button number */</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The idea is to read from the joystick device into this structure. The
first several reads are initialization that define the axes and
buttons of the joystick and their initial state. Further events are
queued up for the file descriptor. This all means that the file can’t
just be opened each time joystick input is needed. It has to be held
open for the duration, and is typically configured non-blocking.</p>
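
<p>Reading from a non-blocking descriptor means telling “no event yet” apart from a real failure. Here’s a rough C sketch of that distinction. It redeclares the event structure under a different name so it compiles without <code class="language-plaintext highlighter-rouge">linux/joystick.h</code>, and the function <code class="language-plaintext highlighter-rouge">read_event</code> and its return convention are my own, not joymacs’.</p>

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Same layout as struct js_event above, redeclared so the sketch
 * does not depend on linux/joystick.h. */
struct js_event_sketch {
    uint32_t time;
    int16_t value;
    uint8_t type;
    uint8_t number;
};

/* Try to read one event from a non-blocking descriptor.
 * Returns 1 if an event was read, 0 if none is pending, -1 on error. */
static int
read_event(int fd, struct js_event_sketch *e)
{
    ssize_t r = read(fd, e, sizeof(*e));
    if (r == (ssize_t)sizeof(*e))
        return 1;
    if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;  /* real error, end of file, or a short read */
}
```

<p>A caller would typically loop until this returns 0 to drain everything queued since the last poll.</p>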

<p>The Emacs package will be called <code class="language-plaintext highlighter-rouge">joymacs</code> and there will be three
functions:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">joymacs-open</span> <span class="nv">N</span><span class="p">)</span>
<span class="p">(</span><span class="nv">joymacs-close</span> <span class="nv">JOYSTICK</span><span class="p">)</span>
<span class="p">(</span><span class="nv">joymacs-read</span> <span class="nv">JOYSTICK</span> <span class="nv">EVENT-VECTOR</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="joymacs-open">joymacs-open</h4>

<p>The <code class="language-plaintext highlighter-rouge">joymacs-open</code> function will take an integer, opening the Nth
joystick (<code class="language-plaintext highlighter-rouge">/dev/input/jsN</code>). It will create a file descriptor for the
joystick device, returning it as a user pointer. Think of it as a sort
of “joystick handle.” Now, it <em>could</em> instead return the file
descriptor as an integer, but the user pointer has two significant
benefits:</p>

<ol>
  <li>
    <p><strong>The resource will be garbage collected.</strong> If the caller loses
track of a file descriptor returned as an integer, the joystick
device will be held open until Emacs shuts down, using up one of
Emacs’ file descriptors. By putting it in a user pointer, the
garbage collector will have the module release the file
descriptor if the user loses track of it.</p>
  </li>
  <li>
    <p><strong>It should be difficult for the user to make a dangerous call.</strong>
Emacs Lisp can’t create user pointers — they only come from modules
— and so the module is less likely to get passed the wrong thing.
In the case of <code class="language-plaintext highlighter-rouge">joymacs-close</code>, the module will be calling
<code class="language-plaintext highlighter-rouge">close(2)</code> on the argument. We definitely don’t want to make that
system call on file descriptors owned by Emacs. Further, since user
pointers are mutable, the module can ensure it doesn’t call
<code class="language-plaintext highlighter-rouge">close(2)</code> twice.</p>
  </li>
</ol>

<p>Here’s the implementation for <code class="language-plaintext highlighter-rouge">joymacs-open</code>. I’ll go over each part
in detail.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">emacs_value</span>
<span class="nf">joymacs_open</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">id</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">extract_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
    <span class="kt">int</span> <span class="n">buflen</span> <span class="o">=</span> <span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"/dev/input/js%d"</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">O_RDONLY</span> <span class="o">|</span> <span class="n">O_NONBLOCK</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">buflen</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fin_close</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">fd</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The C function name doesn’t matter to Emacs. It’s <code class="language-plaintext highlighter-rouge">static</code> because it
doesn’t even matter if the function is visible to Emacs. Emacs will get the
function pointer later as part of initialization.</p>

<p>This is the prototype for all functions callable by Emacs Lisp,
regardless of their arity. It has four arguments:</p>

<ol>
  <li>
    <p>It gets an environment, <code class="language-plaintext highlighter-rouge">env</code>, through which to call back into
Emacs.</p>
  </li>
  <li>
    <p>It gets <code class="language-plaintext highlighter-rouge">n</code>, the number of arguments. This is guaranteed to be the
correct number of arguments, as specified later when creating the
function object, so only variadic functions need to inspect this
argument.</p>
  </li>
  <li>
    <p>The Lisp arguments are passed as an array of values, <code class="language-plaintext highlighter-rouge">args</code>.
There’s no type declaration when declaring a function object, so
these may be of the wrong type. I’ll go over how to deal with this.</p>
  </li>
  <li>
    <p>Finally, it gets an arbitrary pointer, supplied at function object
creation time. This allows the module to create closures, but will
usually be ignored.</p>
  </li>
</ol>
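
<p>The fourth argument is the interesting one. As a sketch of how a data pointer turns one C function into many closures, here is the same shape without the Emacs types. The names <code class="language-plaintext highlighter-rouge">native_fn</code>, <code class="language-plaintext highlighter-rouge">closure</code>, and <code class="language-plaintext highlighter-rouge">add_captured</code> are hypothetical, with <code class="language-plaintext highlighter-rouge">long</code> standing in for <code class="language-plaintext highlighter-rouge">emacs_value</code>.</p>

```c
#include <stddef.h>

/* A callable with the same shape as the prototype above, minus the
 * Emacs types: argument count, arguments, and a data pointer. */
typedef long (*native_fn)(ptrdiff_t n, const long *args, void *data);

/* A "function object": code plus the pointer captured at creation. */
struct closure {
    native_fn fn;
    void *data;
};

/* One C function serves as many closures, differing only in data. */
static long
add_captured(ptrdiff_t n, const long *args, void *data)
{
    (void)n;  /* fixed arity, so the count is not inspected */
    return args[0] + *(long *)data;
}

static long
closure_call(const struct closure *c, ptrdiff_t n, const long *args)
{
    return c->fn(n, args, c->data);
}
```

<p>Pairing <code class="language-plaintext highlighter-rouge">add_captured</code> with a pointer to 5 gives an “add five” function; pairing it with a pointer to 10 gives “add ten”. The captured data has to outlive every call, which is the module’s problem to manage.</p>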

<p>The first thing the function does is extract its integer argument.
This is actually an <code class="language-plaintext highlighter-rouge">intmax_t</code>, but I don’t think anyone has that many
USB ports. An <code class="language-plaintext highlighter-rouge">int</code> will suffice.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">int</span> <span class="n">id</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">extract_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
</code></pre></div></div>

<p>As for not underestimating fools, what if the user passed a value that
isn’t an integer? Will the world come crashing down? Fortunately Emacs
checks that in <code class="language-plaintext highlighter-rouge">extract_integer</code> and, if there’s a mismatch, sets a
pending error signal in the environment. This is really great because
checking types directly in the module is a <em>real pain in the ass</em>. So,
before committing to anything further, such as opening a file, I check
for this signal and bail out early if necessary. In Emacs 25.1 it’s
safe to return NULL since the return value will be completely ignored,
but I’d rather hedge my bets.</p>
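
<p>The “pending error in the environment” idea is easy to model outside Emacs. In this sketch every type and function is a hypothetical stand-in, not the real emacs-module.h API: an extractor records a pending signal on a type mismatch and returns a dummy value, and the caller checks the environment before doing anything with side effects.</p>

```c
/* Hypothetical stand-ins, not the real emacs-module.h types. */
enum pending_kind { PENDING_NONE, PENDING_SIGNAL };
enum value_type { TYPE_INT, TYPE_STRING };

struct value { enum value_type type; long i; };
struct env { enum pending_kind pending; };

/* On a type mismatch, record a pending signal and return a dummy. */
static long
extract_int(struct env *env, struct value v)
{
    if (v.type != TYPE_INT) {
        env->pending = PENDING_SIGNAL;
        return 0;
    }
    return v.i;
}

/* Caller pattern: extract, then check before doing anything costly.
 * Returns the device number, or -1 if an error is pending. */
static long
checked_device_number(struct env *env, struct value arg)
{
    long id = extract_int(env, arg);
    if (env->pending != PENDING_NONE)
        return -1;  /* bail out; the host will raise the error */
    return id;
}
```

<p>The dummy return value is why the check matters: without it, a string argument would quietly be treated as joystick 0.</p>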

<p>By the way, the <code class="language-plaintext highlighter-rouge">nil</code> here is a global variable set in initialization.
You don’t just get that for free!</p>

<p>The next step is opening the joystick device, read-only and
non-blocking. The non-blocking is vital because the module would
otherwise hang Emacs later if there are no events (well, except for
the read being quickly interrupted by a POSIX signal).</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
    <span class="kt">int</span> <span class="n">buflen</span> <span class="o">=</span> <span class="n">sprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="s">"/dev/input/js%d"</span><span class="p">,</span> <span class="n">id</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">O_RDONLY</span> <span class="o">|</span> <span class="n">O_NONBLOCK</span><span class="p">);</span>
</code></pre></div></div>

<p>If the joystick fails to open (e.g. it doesn’t exist, or the user
lacks permission), manually set an error signal for a non-local exit.
I chose the <code class="language-plaintext highlighter-rouge">file-error</code> signal and I’m just using the filename as the
signal data.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">buflen</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Otherwise create the user pointer. No need to allocate any memory;
just stuff it in the pointer itself. If the user mistakenly passes it
to another module, it will sure be in for a surprise when it tries to
dereference it.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">return</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fin_close</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">fd</span><span class="p">);</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">fin_close()</code> function is defined as:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">fin_close</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">fdptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">fdptr</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The garbage collector will call this function when the user pointer is
lost. If the user closes it early with <code class="language-plaintext highlighter-rouge">joymacs-close</code>, that function
will set the user pointer to -1, an invalid file descriptor, so that
it doesn’t get closed a second time here.</p>
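
<p>The close-at-most-once logic can be tried outside Emacs with a plain struct standing in for the user pointer. This is only a sketch: <code class="language-plaintext highlighter-rouge">handle</code>, <code class="language-plaintext highlighter-rouge">handle_wrap</code>, and <code class="language-plaintext highlighter-rouge">handle_close</code> are invented names, though the casts through <code class="language-plaintext highlighter-rouge">intptr_t</code> and the -1 sentinel mirror the module code.</p>

```c
#include <stdint.h>
#include <unistd.h>

/* Stand-in for a user pointer: just a mutable void * slot. */
struct handle {
    void *ptr;
};

/* Store the descriptor in the pointer itself; no allocation needed. */
static struct handle
handle_wrap(int fd)
{
    struct handle h = {(void *)(intptr_t)fd};
    return h;
}

/* Close at most once. Returns 1 if this call closed the descriptor,
 * 0 if the handle had already been closed. */
static int
handle_close(struct handle *h)
{
    int fd = (int)(intptr_t)h->ptr;
    if (fd == -1)
        return 0;
    close(fd);
    h->ptr = (void *)(intptr_t)-1;  /* sentinel: nothing left to close */
    return 1;
}
```

<p>Closing twice matters more than it might seem. Descriptor numbers get reused, so a second <code class="language-plaintext highlighter-rouge">close(2)</code> could hit an unrelated file that was opened in between.</p>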

<h4 id="joymacs-close">joymacs-close</h4>

<p>Here’s <code class="language-plaintext highlighter-rouge">joymacs-close</code>, which is a bit simpler.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">emacs_value</span>
<span class="nf">joymacs_close</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">set_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Again, it starts by extracting its argument, relying on Emacs to do
the check:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
</code></pre></div></div>

<p>If the user pointer hasn’t been closed yet, then close it and strip
out the file descriptor to prevent further closes.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">set_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
</code></pre></div></div>
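
<p>The close-once pattern can also be sketched outside of Emacs. This is only an illustration, not the module’s code: <code class="language-plaintext highlighter-rouge">struct handle</code> and <code class="language-plaintext highlighter-rouge">handle_close</code> are hypothetical names, and a plain struct stands in for the user pointer.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the Emacs user pointer. */
struct handle {
    int fd;  /* -1 is the sentinel for "already closed" */
};

/* Close the descriptor at most once. Returns 1 if this call closed
 * it and 0 if the handle had already been closed. */
static int
handle_close(struct handle *h)
{
    assert(h != NULL);
    if (h->fd == -1)
        return 0;
    close(h->fd);
    h->fd = -1;  /* overwrite so a second call is a no-op */
    return 1;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">handle_close</code> twice on the same handle closes the descriptor only the first time, so a descriptor number that the OS has since reused is never closed by accident.</p>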

<h4 id="joymacs-read">joymacs-read</h4>

<p>The <code class="language-plaintext highlighter-rouge">joymacs-read</code> function is doing something a little unusual for an
Emacs Lisp function. It takes two arguments: the joystick handle and a
5-element vector. Instead of returning the event in some
representation, it fills the vector with the event details. There are
two reasons for this:</p>

<ol>
  <li>
    <p>The API has no function for creating vectors … though the module
<em>could</em> get the <code class="language-plaintext highlighter-rouge">make-vector</code> function and call it to create a
vector.</p>
  </li>
  <li>
    <p>The idiom for event pumps is for the caller to supply a buffer to
the pump. This has better performance by avoiding lots of
unnecessary allocations, especially since events tend to be
message-like objects with a short, well-defined extent.</p>
  </li>
</ol>
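
<p>The caller-supplied buffer idiom from the second point might look like this in plain C. It is a sketch under assumptions: <code class="language-plaintext highlighter-rouge">struct event</code>, <code class="language-plaintext highlighter-rouge">struct source</code>, and <code class="language-plaintext highlighter-rouge">source_read</code> are hypothetical, and an in-memory array plays the part of the device.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record. */
struct event {
    unsigned time;
    int number;
    int value;
};

/* Hypothetical event source backed by a fixed array. */
struct source {
    const struct event *events;
    size_t count;
    size_t next;
};

/* Copy the next event into the caller's buffer. Returns the buffer
 * itself, or NULL when there are no more events. Nothing is
 * allocated per event. */
static struct event *
source_read(struct source *s, struct event *out)
{
    assert(s != NULL && out != NULL);
    if (s->next == s->count)
        return NULL;
    *out = s->events[s->next++];
    return out;
}
```

<p>A caller declares one <code class="language-plaintext highlighter-rouge">struct event</code> and loops with <code class="language-plaintext highlighter-rouge">while (source_read(&amp;s, &amp;e))</code>, reusing the same storage for every event. <code class="language-plaintext highlighter-rouge">joymacs-read</code> has the same shape, with the vector playing the role of the buffer.</p>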

<p>Here’s the full definition:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">emacs_value</span>
<span class="nf">joymacs_read</span><span class="p">(</span><span class="n">emacs_env</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="kt">ptrdiff_t</span> <span class="n">n</span><span class="p">,</span> <span class="n">emacs_value</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span>
<span class="p">{</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">n</span><span class="p">;</span>
    <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">ptr</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="p">(</span><span class="kt">intptr_t</span><span class="p">)</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">get_user_ptr</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_check</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="o">!=</span> <span class="n">emacs_funcall_exit_return</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">js_event</span> <span class="n">e</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">e</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">e</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">==</span> <span class="n">EAGAIN</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* No more events. */</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* An actual read error (joystick unplugged, etc.). */</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">error</span> <span class="o">=</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">);</span>
        <span class="kt">size_t</span> <span class="n">len</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">error</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="cm">/* Fill out event vector. */</span>
        <span class="n">emacs_value</span> <span class="n">v</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
        <span class="n">emacs_value</span> <span class="n">type</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_BUTTON</span> <span class="o">?</span> <span class="n">button</span> <span class="o">:</span> <span class="n">axis</span><span class="p">;</span>
        <span class="n">emacs_value</span> <span class="n">value</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="o">==</span> <span class="n">button</span><span class="p">)</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">;</span>
        <span class="k">else</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_float</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">/</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">INT16_MAX</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">time</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">type</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">number</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_INIT</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As before, extract the first argument and check for a signal. Then
call <code class="language-plaintext highlighter-rouge">read(2)</code> to get an event. If the read fails with <code class="language-plaintext highlighter-rouge">EAGAIN</code>, it’s
not a real failure. There are just no more events, so return nil.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">struct</span> <span class="n">js_event</span> <span class="n">e</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">e</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">e</span><span class="p">));</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">errno</span> <span class="o">==</span> <span class="n">EAGAIN</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* No more events. */</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>
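
<p>The <code class="language-plaintext highlighter-rouge">EAGAIN</code> behavior only applies to a descriptor in non-blocking mode. The sketch below shows the same three-way outcome on any such descriptor. The names <code class="language-plaintext highlighter-rouge">set_nonblocking</code>, <code class="language-plaintext highlighter-rouge">poll_once</code>, and <code class="language-plaintext highlighter-rouge">enum poll_result</code> are hypothetical, and treating a short read as an error is a simplification chosen for this example.</p>

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum poll_result { POLL_EMPTY, POLL_READY, POLL_ERROR };

/* Put a descriptor into non-blocking mode. Returns -1 on failure. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Try to read exactly len bytes without blocking. */
static enum poll_result
poll_once(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t r = read(fd, buf, len);
    if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return POLL_EMPTY;  /* nothing queued, not a failure */
    if (r != (ssize_t)len)
        return POLL_ERROR;  /* real error, end of file, or short read */
    return POLL_READY;
}
```

<p>On an empty non-blocking pipe, <code class="language-plaintext highlighter-rouge">poll_once</code> reports <code class="language-plaintext highlighter-rouge">POLL_EMPTY</code> rather than blocking, which is the case <code class="language-plaintext highlighter-rouge">joymacs-read</code> maps to nil.</p>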

<p>If the read failed with something else — perhaps the joystick was
unplugged — signal an error. The <code class="language-plaintext highlighter-rouge">strerror(3)</code> string is used for the
signal data.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">r</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* An actual read error (joystick unplugged, etc.). */</span>
        <span class="n">emacs_value</span> <span class="n">signal</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"file-error"</span><span class="p">);</span>
        <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">error</span> <span class="o">=</span> <span class="n">strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">);</span>
        <span class="n">emacs_value</span> <span class="n">message</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_string</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">error</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">non_local_exit_signal</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">signal</span><span class="p">,</span> <span class="n">message</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Otherwise fill out the event vector. If the second argument isn’t a
vector, or if it’s too short, the signal will automatically get raised
by Emacs. The module can keep plowing through the <code class="language-plaintext highlighter-rouge">vec_set()</code> calls
safely since it’s not committing to anything.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="cm">/* Fill out event vector. */</span>
        <span class="n">emacs_value</span> <span class="n">v</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
        <span class="n">emacs_value</span> <span class="n">type</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_BUTTON</span> <span class="o">?</span> <span class="n">button</span> <span class="o">:</span> <span class="n">axis</span><span class="p">;</span>
        <span class="n">emacs_value</span> <span class="n">value</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="o">==</span> <span class="n">button</span><span class="p">)</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">;</span>
        <span class="k">else</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_float</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">value</span> <span class="o">/</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">INT16_MAX</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">time</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">type</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_integer</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">number</span><span class="p">));</span>
        <span class="n">env</span><span class="o">-&gt;</span><span class="n">vec_set</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">type</span> <span class="o">&amp;</span> <span class="n">JS_EVENT_INIT</span> <span class="o">?</span> <span class="n">t</span> <span class="o">:</span> <span class="n">nil</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
</code></pre></div></div>

<p>The Linux event struct has four fields and the function fills out five
values of the vector. This is because the <code class="language-plaintext highlighter-rouge">type</code> field has a bit flag
indicating initialization events. This is split out into an extra
t/nil value. It also normalizes axis values and converts button values
into t/nil, which makes more sense for Emacs Lisp. The event itself is
returned since it’s a truthy value and it’s convenient for the caller.</p>
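
<p>Splitting the flag out of the <code class="language-plaintext highlighter-rouge">type</code> field can be shown in isolation. In this sketch the constants repeat the values from <code class="language-plaintext highlighter-rouge">&lt;linux/joystick.h&gt;</code> so that it compiles anywhere, while <code class="language-plaintext highlighter-rouge">struct decoded</code> and <code class="language-plaintext highlighter-rouge">decode_type</code> are hypothetical names that do not appear in the module.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as <linux/joystick.h>, repeated here so the example
 * does not depend on Linux headers. */
#define JS_EVENT_BUTTON 0x01
#define JS_EVENT_AXIS   0x02
#define JS_EVENT_INIT   0x80

/* Hypothetical decoded form of the type field. */
struct decoded {
    int is_button;  /* 1 for a button event, 0 for an axis event */
    int is_init;    /* 1 for a synthetic initial-state event */
};

/* Separate the event kind from the initialization flag. */
static void
decode_type(uint8_t type, struct decoded *out)
{
    assert(out != NULL);
    out->is_button = (type & JS_EVENT_BUTTON) != 0;
    out->is_init   = (type & JS_EVENT_INIT) != 0;
}
```

<p>A value of <code class="language-plaintext highlighter-rouge">JS_EVENT_BUTTON | JS_EVENT_INIT</code> decodes to a button event with the initialization flag set, which is why one C field becomes two vector slots.</p>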

<p>The astute programmer might notice that the negative side of the axis
could go just below -1.0, since <code class="language-plaintext highlighter-rouge">INT16_MIN</code> has one extra value over
<code class="language-plaintext highlighter-rouge">INT16_MAX</code> (two’s complement). It doesn’t seem to be documented, but
the joystick drivers I’ve seen never exactly return <code class="language-plaintext highlighter-rouge">INT16_MIN</code>, so
this is in fact the correct way to normalize it.</p>
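
<p>The asymmetry is easy to check. The sketch below uses hypothetical helper names: <code class="language-plaintext highlighter-rouge">axis_normalize</code> divides the same way the module does, and <code class="language-plaintext highlighter-rouge">axis_normalize_clamped</code> is an alternative the module does not use, for drivers that might report <code class="language-plaintext highlighter-rouge">INT16_MIN</code>.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Same division as the module: INT16_MAX maps to exactly 1.0. */
static double
axis_normalize(int16_t raw)
{
    return raw / (double)INT16_MAX;
}

/* Alternative that pins the one out-of-range input to -1.0. */
static double
axis_normalize_clamped(int16_t raw)
{
    double v = raw / (double)INT16_MAX;
    assert(v <= 1.0);  /* raw can never exceed INT16_MAX */
    return v < -1.0 ? -1.0 : v;
}
```

<p>With these, <code class="language-plaintext highlighter-rouge">axis_normalize(INT16_MIN)</code> lands slightly below -1.0, while the clamped variant returns exactly -1.0. Every other input gives the same result from both.</p>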

<h4 id="initialization">Initialization</h4>

<p><em>Update 2021</em>: In a previous version of this article, I talked about
interning symbols during initialization so that they do not need to be
re-interned each time the module is called. This <a href="https://github.com/skeeto/joymacs/issues/1">no longer works</a>,
and it was probably never intended to work in the first place. The
lesson is simple: <strong>Do not reuse Emacs objects between module calls.</strong></p>

<p>First grab the <code class="language-plaintext highlighter-rouge">fset</code> symbol since this function will be needed to bind
names to the module’s functions.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">emacs_value</span> <span class="n">fset</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"fset"</span><span class="p">);</span>
</code></pre></div></div>

<p>Using <code class="language-plaintext highlighter-rouge">fset</code>, bind the functions. The second and third arguments to
<code class="language-plaintext highlighter-rouge">make_function</code> are the minimum and maximum number of arguments, which
<a href="/blog/2014/01/04/">may look familiar</a>. The last argument is that closure pointer
I mentioned at the beginning.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">emacs_value</span> <span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
    <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"joymacs-open"</span><span class="p">);</span>
    <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">make_function</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">joymacs_open</span><span class="p">,</span> <span class="n">doc</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">env</span><span class="o">-&gt;</span><span class="n">funcall</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">fset</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
</code></pre></div></div>

<p>If the module is to be loaded with <code class="language-plaintext highlighter-rouge">require</code> like any other package,
it needs to provide a feature, the equivalent of <code class="language-plaintext highlighter-rouge">(provide 'joymacs)</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">emacs_value</span> <span class="n">provide</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"provide"</span><span class="p">);</span>
    <span class="n">emacs_value</span> <span class="n">joymacs</span> <span class="o">=</span> <span class="n">env</span><span class="o">-&gt;</span><span class="n">intern</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"joymacs"</span><span class="p">);</span>
    <span class="n">env</span><span class="o">-&gt;</span><span class="n">funcall</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">provide</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">joymacs</span><span class="p">);</span>
</code></pre></div></div>

<p>And that’s it!</p>

<p>The source repository now includes a port to Windows (XInput). If
you’re on Linux or Windows, have Emacs 25 with modules enabled, and a
joystick is plugged in, then <code class="language-plaintext highlighter-rouge">make run</code> in the repository should bring
up Emacs running a joystick calibration demonstration. The module
can’t poke at Emacs when events are ready, so instead there’s a timer
that polls the module for events.</p>

<p>I’d like to someday see an Emacs Lisp game well-suited for a joystick.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>An Elfeed Database Analysis</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/08/12/"/>
    <id>urn:uuid:7f407aaa-229a-388c-ab7a-73e8ed24c04a</id>
    <updated>2016-08-12T03:20:16Z</updated>
    <category term="emacs"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>The end of the month marks <a href="/blog/2013/09/04/">Elfeed’s third birthday</a>. Surprising
to nobody, it’s also been three years of heavy, daily use by me. While
I’ve used Elfeed concurrently on a number of different machines over
this period, I’ve managed to keep an Elfeed <a href="/blog/2013/09/09/">database index</a>
with a lineage going all the way back to the initial development
stages, before the announcement. It’s a large, organically-grown
database that serves as a daily performance stress test. Hopefully
this means I’m one of the first people to have trouble if an invisible
threshold is ever exceeded.</p>

<p>I’m also the sort of person who gets excited when I come across an
interesting dataset, and I have this gem sitting right in front of me.
So a couple of days ago I pushed a new Elfeed function,
<code class="language-plaintext highlighter-rouge">elfeed-csv-export</code>, which exports a database index into three CSV
files. These are intended to serve as three tables in a SQL database,
exposing the database to interesting relational queries and joins.
Entry content (HTML, etc.) has always been considered volatile, so
this is not exported. The export function isn’t interactive (yet?), so
if you want to generate your own you’ll need to <code class="language-plaintext highlighter-rouge">(require
'elfeed-csv)</code> and evaluate it yourself.</p>

<p>All the source code for performing the analysis below on your own
database can be found here:</p>

<ul>
  <li><a href="https://github.com/skeeto/elfeed-analysis">https://github.com/skeeto/elfeed-analysis</a></li>
</ul>

<p>The three exported tables are <em>feeds</em>, <em>entries</em>, and <em>tags</em>. Here are
the corresponding columns (optional CSV header) for each:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>url, title, canonical-url, author
id, feed, title, link, date
entry, feed, tag
</code></pre></div></div>

<p>And here’s the SQLite schema I’m using for these tables:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">feeds</span> <span class="p">(</span>
    <span class="n">url</span> <span class="nb">TEXT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span>
    <span class="n">title</span> <span class="nb">TEXT</span><span class="p">,</span>
    <span class="n">canonical_url</span> <span class="nb">TEXT</span><span class="p">,</span>
    <span class="n">author</span> <span class="nb">TEXT</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">entries</span> <span class="p">(</span>
    <span class="n">id</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">feed</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">REFERENCES</span> <span class="n">feeds</span> <span class="p">(</span><span class="n">url</span><span class="p">),</span>
    <span class="n">title</span> <span class="nb">TEXT</span><span class="p">,</span>
    <span class="n">link</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="nb">date</span> <span class="nb">REAL</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">feed</span><span class="p">)</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">tags</span> <span class="p">(</span>
    <span class="n">entry</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">feed</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="n">tag</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
    <span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">entry</span><span class="p">,</span> <span class="n">feed</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">entries</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">feed</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>Web authors are notoriously awful at picking actually-unique entry
IDs, even when <a href="/blog/2013/09/23/">using the smarter option</a>, Atom. I still simply
don’t trust that entry IDs are unique, so, as usual, I’ve qualified
them by their source feed URL, hence the primary key on both columns
in <code class="language-plaintext highlighter-rouge">entries</code>.</p>

<p>At this point I wish I had collected a lot more information. If I were
to start fresh today, Elfeed’s database schema would not only fully
match Atom’s schema, but also exceed it with additional logging:</p>

<ul>
  <li>When was each entry actually fetched?</li>
  <li>How did each entry change since the last fetch?</li>
  <li>When and for what reason did a feed fetch fail?</li>
  <li>When did an entry stop appearing in a feed?</li>
  <li>How long did fetching take?</li>
  <li>How long did parsing take?</li>
  <li>Which computer (hostname) performed the fetch?</li>
  <li>What interesting HTTP headers were included?</li>
  <li>Even if not kept for archival, how large was the content?</li>
</ul>

<p>I may start tracking some of these. If I don’t, I’ll be kicking myself
three years from now when I look at this again.</p>

<h3 id="a-look-at-my-index">A look at my index</h3>

<p>So just how big is my index? It’s <strong>25MB uncompressed</strong>, 2.5MB
compressed. I currently follow 117 feeds, but my index includes
<strong>43,821 entries</strong> from <strong>309 feeds</strong>. These entries are marked with
<strong>53,360 tags</strong> from a set of 35 unique tags. Some of these datapoints
are the result of temporarily debugging Elfeed issues and don’t
represent content that I actually follow. I’m more careful these days
to test in a temporary database so as to avoid contamination. Some are
duplicates due to feeds changing URLs over the years. Some are
artifacts from old bugs. This all represents a bit of noise, but
should be negligible. During my analysis I noticed some of these
anomalies and took a moment to clean up obviously bogus data (weird
dates, etc.), all by adjusting tags.</p>

<p>The first thing I wanted to know is the weekday frequency. A number of
times I’ve blown entire Sundays working on Elfeed, and, as if to
frustrate my testing, it’s not unusual for several hours to pass
between new entries on Sundays. Is this just my perception or are
Sundays really that slow?</p>

<p>Here’s my query. I’m using SQLite’s <a href="https://www.sqlite.org/lang_datefunc.html">strftime</a> to shift the
result into my local time zone, Eastern Time. This time zone is the
source, or close to the source, of a large amount of the content. This
also automatically accounts for daylight saving time, which can’t be
done with a simple divide and subtract.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">cast</span><span class="p">(</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%w'</span><span class="p">,</span> <span class="nb">date</span><span class="p">,</span> <span class="s1">'unixepoch'</span><span class="p">,</span> <span class="s1">'localtime'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="k">AS</span> <span class="k">day</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="k">day</span><span class="p">;</span>
</code></pre></div></div>

<p>The most frequent tag (13,666 appearances) is “youtube”, which marks
every YouTube video, and I’ll use gnuplot to visualize it. The input
“file” is actually a command since gnuplot is poor at filtering data
itself, especially for histograms.</p>

<pre><code class="language-gnuplot">plot '&lt; grep ^youtube, weekdays.csv' using 2:3 with boxes
</code></pre>
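
<p>For context, here’s a rough sketch of how a file like <code class="language-plaintext highlighter-rouge">weekdays.csv</code>
could be produced and filtered, using Python’s <code class="language-plaintext highlighter-rouge">sqlite3</code> and <code class="language-plaintext highlighter-rouge">csv</code>
modules on a tiny made-up dataset. The cut-down schema, the sample rows,
and the helper names are all assumptions. To keep the output independent
of the machine’s time zone, this version leaves out the <code class="language-plaintext highlighter-rouge">'localtime'</code>
modifier, so days are binned in UTC.</p>

<pre><code class="language-python">import csv
import io
import sqlite3

QUERY = """
SELECT tag,
       cast(strftime('%w', date, 'unixepoch') AS INT) AS day,
       count(id) AS count
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag, day
ORDER BY tag, day
"""

DAY = 24 * 60 * 60

def sample_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entries (id TEXT, feed TEXT, date REAL);
        CREATE TABLE tags (entry TEXT, feed TEXT, tag TEXT);
    """)
    # 1970-01-04 was a Sunday (%w = 0), 1970-01-05 a Monday (%w = 1).
    entries = [("a", "f", 3 * DAY), ("b", "f", 4 * DAY), ("c", "f", 4 * DAY)]
    tags = [("a", "f", "youtube"), ("b", "f", "youtube"),
            ("c", "f", "youtube"), ("c", "f", "emacs")]
    db.executemany("INSERT INTO entries VALUES (?, ?, ?)", entries)
    db.executemany("INSERT INTO tags VALUES (?, ?, ?)", tags)
    return db

def weekdays_csv(db):
    # One "tag,day,count" line per group, as the gnuplot command expects.
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(db.execute(QUERY))
    return out.getvalue()

def grep_tag(text, tag):
    # Same effect as the shell filter: grep ^TAG,
    return [line for line in text.splitlines() if line.startswith(tag + ",")]

if __name__ == "__main__":
    print(grep_tag(weekdays_csv(sample_db()), "youtube"))
</code></pre>

<p>Column 2 is the weekday number (0 is Sunday) and column 3 is the count,
which is what <code class="language-plaintext highlighter-rouge">using 2:3</code> picks out.</p>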

<p><a href="/img/elfeed-graphs/weekdays-youtube.png"><img src="/img/elfeed-graphs/weekdays-youtube-thumb.png" alt="" /></a></p>

<p>Wow, things <em>do</em> quiet down dramatically on weekends! From the
glass-half-full perspective, this gives me a chance to catch up when I
inevitably fall behind on these videos during the week.</p>

<p>The same is basically true for other types of content, including
“comic” (12,465 entries) and “blog” (7,505 entries).</p>

<p><a href="/img/elfeed-graphs/weekdays-comic.png"><img src="/img/elfeed-graphs/weekdays-comic-thumb.png" alt="" /></a></p>

<p><a href="/img/elfeed-graphs/weekdays-blog.png"><img src="/img/elfeed-graphs/weekdays-blog-thumb.png" alt="" /></a></p>

<p>However, “emacs” (2,404 entries) is a different story. It doesn’t slow
down on the weekend, but Emacs users sure love to talk about Emacs on
Mondays. In my own index, this spike largely comes from <a href="http://planet.emacsen.org/">Planet
Emacsen</a>. Initially I thought maybe this was an artifact of
Planet Emacsen’s date handling — i.e. perhaps it does a big fetch on
Mondays and groups up the dates — but I double checked: they pass the
date directly through from the original articles.</p>

<p>Conclusion: Emacs users love Mondays. Or maybe they hate Mondays and
talk about Emacs as an escape.</p>

<p><a href="/img/elfeed-graphs/weekdays-emacs.png"><img src="/img/elfeed-graphs/weekdays-emacs-thumb.png" alt="" /></a></p>

<p>I can reuse the same query to look at different time scales. When
during the day do entries appear? Adjusting the time zone here becomes
a lot more important.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">cast</span><span class="p">(</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%H'</span><span class="p">,</span> <span class="nb">date</span><span class="p">,</span> <span class="s1">'unixepoch'</span><span class="p">,</span> <span class="s1">'localtime'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="k">AS</span> <span class="n">hour</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="n">hour</span><span class="p">;</span>
</code></pre></div></div>

<p>Emacs bloggers tend to follow a nice Eastern Time sleeping schedule.
(I wonder how Vim bloggers compare, since, as an Emacs user, I
naturally assume Vim users’ schedules are as undisciplined as their
bathing habits.) However, this also <a href="http://irreal.org/blog/">might be the prolific
Irreal</a> breaking the curve.</p>

<p><a href="/img/elfeed-graphs/hours-emacs.png"><img src="/img/elfeed-graphs/hours-emacs-thumb.png" alt="" /></a></p>

<p>The YouTube channels I follow are a bit more erratic, but there’s
still a big drop in the early morning and a spike in the early
afternoon. It’s unclear if the timestamp published in the feed is the
upload time or the publication time. This would make a difference in
the result (e.g. overnight video uploads).</p>

<p><a href="/img/elfeed-graphs/hours-youtube.png"><img src="/img/elfeed-graphs/hours-youtube-thumb.png" alt="" /></a></p>

<p>Do you suppose there’s a slow <em>month</em>?</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">cast</span><span class="p">(</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%m'</span><span class="p">,</span> <span class="nb">date</span><span class="p">,</span> <span class="s1">'unixepoch'</span><span class="p">,</span> <span class="s1">'localtime'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="k">AS</span> <span class="n">month</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="n">month</span><span class="p">;</span>
</code></pre></div></div>
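
<p>The weekday, hour, and month queries differ only in the <code class="language-plaintext highlighter-rouge">strftime</code>
format code, so they could be folded into one helper. This is just a
sketch: <code class="language-plaintext highlighter-rouge">binned_counts</code>, the <code class="language-plaintext highlighter-rouge">bucket</code> alias, the cut-down schema, and
the sample rows are all made up here, and the default bins in UTC so
that the result doesn’t depend on the machine’s time zone.</p>

<pre><code class="language-python">import sqlite3

# Format codes used in this article: weekday, hour, and month.
FORMATS = ("%w", "%H", "%m")

def binned_counts(db, fmt, local=False):
    """Count tagged entries per (tag, bucket) for one strftime code."""
    if fmt not in FORMATS:
        raise ValueError("unsupported format code: " + fmt)
    modifiers = "'unixepoch', 'localtime'" if local else "'unixepoch'"
    query = (
        "SELECT tag,"
        " cast(strftime(?, date, " + modifiers + ") AS INT) AS bucket,"
        " count(id) AS count"
        " FROM entries"
        " JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed"
        " GROUP BY tag, bucket"
        " ORDER BY tag, bucket"
    )
    return db.execute(query, (fmt,)).fetchall()

def sample_db():
    """Two made-up entries: 1970-01-01 09:00 and 1970-02-01 21:00 UTC."""
    hour = 60 * 60
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entries (id TEXT, feed TEXT, date REAL);
        CREATE TABLE tags (entry TEXT, feed TEXT, tag TEXT);
    """)
    stamps = [("a", "f", 9 * hour), ("b", "f", (31 * 24 + 21) * hour)]
    db.executemany("INSERT INTO entries VALUES (?, ?, ?)", stamps)
    db.executemany("INSERT INTO tags VALUES (?, 'f', 'blog')", [("a",), ("b",)])
    return db

if __name__ == "__main__":
    db = sample_db()
    for fmt in FORMATS:
        print(fmt, binned_counts(db, fmt))
</code></pre>

<p>Passing <code class="language-plaintext highlighter-rouge">local=True</code> adds the <code class="language-plaintext highlighter-rouge">'localtime'</code> modifier, as in the
queries in this article.</p>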

<p>December is a big drop across all tags, probably for the holidays.
Both “comic” and “blog” also have an interesting drop in August. For
brevity, I’ll only show one. This might be partially due to my not
waiting until the end of this month for this analysis, since there are
only 2.5 Augusts in my 3-year dataset.</p>

<p><a href="/img/elfeed-graphs/months-comic.png"><img src="/img/elfeed-graphs/months-comic-thumb.png" alt="" /></a></p>

<p>Unfortunately the timestamp is the only direct <em>numerical</em> quantity in
the data. So far I’ve been binning data points and counting to get a
second numerical quantity. Everything else is text, so I’ll need to
get more creative to find other interesting relationships.</p>

<p>So let’s have a look at the lengths of entry titles.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">tag</span><span class="p">,</span>
       <span class="k">length</span><span class="p">(</span><span class="n">title</span><span class="p">)</span> <span class="k">AS</span> <span class="k">length</span><span class="p">,</span>
       <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">tag</span><span class="p">,</span> <span class="k">length</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="k">length</span><span class="p">;</span>
</code></pre></div></div>

<p>The shortest are the webcomics. I’ve <a href="/blog/2015/09/26/">complained about poor webcomic
titles before</a>, so this isn’t surprising. The spikes are from
comics that follow a strict (uncreative) title format.</p>

<p><a href="/img/elfeed-graphs/lengths-comic.png"><img src="/img/elfeed-graphs/lengths-comic-thumb.png" alt="" /></a></p>

<p>Emacs article titles follow a nice distribution. You can tell these
are programmers because so many titles are exactly 32 characters long.
Picking this number is such a natural instinct that we aren’t even
aware of it. Or maybe all their database schemas have <code class="language-plaintext highlighter-rouge">VARCHAR(32)</code>
title columns?</p>

<p><a href="/img/elfeed-graphs/lengths-emacs.png"><img src="/img/elfeed-graphs/lengths-emacs-thumb.png" alt="" /></a></p>

<p>Blogs in general follow a nice distribution. The big spike is from the
<a href="http://www.bay12games.com/dwarves/index.html">Dwarf Fortress development blog</a>, which follows a strict date
format.</p>

<p><a href="/img/elfeed-graphs/lengths-blog.png"><img src="/img/elfeed-graphs/lengths-blog-thumb.png" alt="" /></a></p>

<p>The longest on average are YouTube videos. This is largely due to the
kinds of videos I watch (“Let’s Play” videos), which tend to have
long, predictable names.</p>

<p><a href="/img/elfeed-graphs/lengths-youtube.png"><img src="/img/elfeed-graphs/lengths-youtube-thumb.png" alt="" /></a></p>
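
<p>One way to check an “on average” comparison like this directly, rather
than reading it off the histograms, is to let SQLite compute the mean
title length per tag. This is a sketch against a made-up dataset. The
cut-down schema, the titles, and the <code class="language-plaintext highlighter-rouge">mean_length</code> alias are
assumptions.</p>

<pre><code class="language-python">import sqlite3

QUERY = """
SELECT tag, avg(length(title)) AS mean_length
FROM entries
JOIN tags ON tags.entry = entries.id AND tags.feed = entries.feed
GROUP BY tag
ORDER BY mean_length DESC
"""

def mean_title_lengths(db):
    """Return (tag, mean title length) pairs, longest first."""
    return db.execute(QUERY).fetchall()

def sample_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entries (id TEXT, feed TEXT, title TEXT);
        CREATE TABLE tags (entry TEXT, feed TEXT, tag TEXT);
    """)
    # Made-up titles: two terse comic titles and two long video titles.
    entries = [("a", "f", "#1"), ("b", "f", "#22"),
               ("c", "g", "x" * 40), ("d", "g", "x" * 60)]
    tags = [("a", "f", "comic"), ("b", "f", "comic"),
            ("c", "g", "youtube"), ("d", "g", "youtube")]
    db.executemany("INSERT INTO entries VALUES (?, ?, ?)", entries)
    db.executemany("INSERT INTO tags VALUES (?, ?, ?)", tags)
    return db

if __name__ == "__main__":
    print(mean_title_lengths(sample_db()))
</code></pre>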

<p>And finally, here’s the most interesting-looking graph of them all.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="p">((</span><span class="nb">date</span> <span class="o">-</span> <span class="mi">4</span><span class="o">*</span><span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="mi">24</span><span class="o">*</span><span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">60</span><span class="o">*</span><span class="mi">60</span><span class="p">)</span> <span class="k">AS</span> <span class="n">day_time</span><span class="p">,</span>
       <span class="k">length</span><span class="p">(</span><span class="n">title</span><span class="p">)</span> <span class="k">AS</span> <span class="k">length</span>
<span class="k">FROM</span> <span class="n">entries</span>
<span class="k">JOIN</span> <span class="n">tags</span> <span class="k">ON</span> <span class="n">tags</span><span class="p">.</span><span class="n">entry</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">id</span> <span class="k">AND</span> <span class="n">tags</span><span class="p">.</span><span class="n">feed</span> <span class="o">=</span> <span class="n">entries</span><span class="p">.</span><span class="n">feed</span><span class="p">;</span>
</code></pre></div></div>

<p>This is the title length versus time of day (not binned). Each point
is one of the 53,360 tagged entries (an entry with several tags is plotted once per tag).</p>
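
<p>The <code class="language-plaintext highlighter-rouge">day_time</code> arithmetic in that query can be worked through by
hand. Here’s the same computation as a small Python function, with a
made-up timestamp. The function name and the offset parameter are
illustrative. The default of -4 mirrors the query’s hard-coded shift,
which matches Eastern Daylight Time, so for entries published under
Eastern Standard Time (UTC-5) the result is an hour ahead of local
time.</p>

<pre><code class="language-python">HOUR = 60 * 60
DAY = 24 * HOUR

def day_time(epoch, utc_offset_hours=-4):
    """Fractional hour of the day at a fixed offset from UTC."""
    return ((epoch + utc_offset_hours * HOUR) % DAY) / HOUR

# A made-up timestamp: 16:30 UTC on 1970-01-11.
stamp = 10 * DAY + 16 * HOUR + 30 * 60

print(day_time(stamp))      # 12.5, i.e. 12:30 at UTC-4
print(day_time(stamp, -5))  # 11.5, i.e. 11:30 at UTC-5
</code></pre>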

<pre><code class="language-gnuplot">set style fill transparent solid 0.25 noborder
set style circle radius 0.04
plot 'length-vs-daytime.csv' using 1:2 with circles
</code></pre>

<p>(This is a good one to follow through to the full size image.)</p>

<p><a href="/img/elfeed-graphs/length-vs-daytime.png"><img src="/img/elfeed-graphs/length-vs-daytime-thumb.png" alt="" /></a></p>

<p>Again, all Eastern Time since I’m self-centered like that. Vertical
lines are authors rounding their post dates to the hour. Horizontal
lines are the length spikes from above, such as the line of entries at
title length 10 in the evening (Dwarf Fortress blog). There’s a
mid-day cloud of entries of various title lengths, with the shortest
title cloud around mid-morning. That’s probably when many of the
webcomics come up.</p>

<p>Additional analysis could look further at textual content, beyond
simply length, in some quantitative way (n-grams? soundex?). But
mostly I really need to keep track of more data!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elfeed, cURL, and You</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2016/06/16/"/>
    <id>urn:uuid:76942398-f693-3127-fd45-19d508b5c044</id>
    <updated>2016-06-16T18:22:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>This morning I pushed out an important update to <a href="https://github.com/skeeto/elfeed">Elfeed</a>, my
web feed reader for Emacs. The update should be available in MELPA by
the time you read this. Elfeed now has support for fetching feeds
using <a href="https://curl.haxx.se/">cURL</a> through a <code class="language-plaintext highlighter-rouge">curl</code> inferior process. You’ll need
the program in your PATH or configured through
<code class="language-plaintext highlighter-rouge">elfeed-curl-program-name</code>.</p>

<p>I’ve been using it for a couple of days now, but, while I work out the
remaining kinks, it’s disabled by default. So in addition to having
cURL installed, you’ll need to set <code class="language-plaintext highlighter-rouge">elfeed-use-curl</code> to non-nil.
Sometime soon it will be enabled by default whenever cURL is
available. The original <code class="language-plaintext highlighter-rouge">url-retrieve</code> fetcher will remain in place
for the time being. However, cURL <em>may</em> become a requirement someday.</p>

<p>Fetching with a <code class="language-plaintext highlighter-rouge">curl</code> inferior process has some huge advantages.</p>

<h3 id="its-much-faster">It’s much faster</h3>

<p>The most obvious change is that you should experience a huge speedup
on updates and better responsiveness during updates after the first
cURL run. There are two important reasons:</p>

<p><strong>Asynchronous DNS and TCP</strong>: Emacs 24 and earlier performs DNS
queries synchronously even for asynchronous network processes. This is
being fixed on some platforms (including Linux) in Emacs 25, but now
we don’t have to wait.</p>

<p>On Windows it’s even worse: the TCP connection is also established
synchronously. This is especially bad when fetching relatively small
items such as feeds, because the DNS look-up and TCP handshake dominate
the overall fetch time. It essentially makes the whole process
synchronous.</p>

<p><strong>Conditional GET</strong>: HTTP has two mechanisms to avoid transmitting
information that a client has previously fetched. One is the
Last-Modified header delivered by the server with the content. When
querying again later, the client echoes the date back <a href="https://utcc.utoronto.ca/~cks/space/blog/web/IfModifiedSinceHowNot">like a
token</a> in the If-Modified-Since header.</p>

<p>The second is the “entity tag,” an arbitrary server-selected token
associated with each version of the content. The server delivers it
along with the content in the ETag header, and the client hands it
back later in the If-None-Match header, sort of like a cookie.</p>

<p>This is highly valuable for feeds because, unless the feed is
particularly active, most of the time the feed hasn’t been updated
since the last query. This avoids sending anything other than a
handful of headers each way. In Elfeed’s case, it means <strong>it doesn’t
have to parse the same XML over and over again</strong>.</p>

<p>Both of these being outside of cURL’s scope, Elfeed has to manage
conditional GET itself. I had no control over the HTTP headers until
now, so I couldn’t take advantage of it. Emacs’ <code class="language-plaintext highlighter-rouge">url-retrieve</code>
function allows for sending custom headers through dynamically binding
<code class="language-plaintext highlighter-rouge">url-request-extra-headers</code>, but this isn’t available when calling
<code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> since the request itself is created
asynchronously.</p>

<p>Both the ETag and Last-Modified values are stored in the database and
persist across sessions. This is the reason the full speedup isn’t
realized until the second fetch. The initial cURL fetch doesn’t have
these values.</p>
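
<p>To make the bookkeeping concrete, here’s a language-neutral sketch of
the client side of conditional GET, written in Python. It is not
Elfeed’s actual code. The function names, the shape of the cache
record, and the sample header values are all made up, and it assumes
response headers arrive in a dictionary under their canonical names.</p>

<pre><code class="language-python">def conditional_headers(cached):
    """Build request headers from validators saved on an earlier fetch."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers

def handle_response(status, response_headers, body, cached):
    """Return (updated cache record, body to parse or None)."""
    if status == 304:
        # Not Modified: nothing to parse, keep the stored validators.
        return cached, None
    updated = dict(cached)
    updated["etag"] = response_headers.get("ETag")
    updated["last_modified"] = response_headers.get("Last-Modified")
    return updated, body

if __name__ == "__main__":
    # First fetch: no validators yet, so the request is unconditional.
    print(conditional_headers({}))
    reply = {"ETag": '"abc123"',
             "Last-Modified": "Thu, 16 Jun 2016 18:22:16 GMT"}
    cache, body = handle_response(200, reply, "feed body", {})
    # Second fetch: both validators are echoed back to the server.
    print(conditional_headers(cache))
    # A 304 reply leaves the cache alone and yields nothing to parse.
    print(handle_response(304, {}, "", cache))
</code></pre>

<p>The first call sends no conditional headers at all, which is why the
speedup only shows up from the second fetch onward.</p>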

<h3 id="fewer-bugs">Fewer bugs</h3>

<p>As mentioned previously, Emacs has a built-in URL retrieval library
called <code class="language-plaintext highlighter-rouge">url</code>. The central function is <code class="language-plaintext highlighter-rouge">url-retrieve</code> which
asynchronously fetches the content at an arbitrary URL (usually HTTP)
and delivers the buffer and status to a callback when it’s ready.
There’s also a queue front-end for it, <code class="language-plaintext highlighter-rouge">url-queue-retrieve</code> which
limits the number of parallel connections. Elfeed hands this function
a pile of feed URLs all at once and it fetches them N at a time.</p>

<p>Unfortunately both these functions are <em>incredibly</em> buggy. It’s been a
thorn in my side for years.</p>

<p>Here’s what the interface looks like for both:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">url-retrieve</span> <span class="nv">URL</span> <span class="nv">CALLBACK</span> <span class="k">&amp;optional</span> <span class="nv">CBARGS</span> <span class="nv">SILENT</span> <span class="nv">INHIBIT-COOKIES</span><span class="p">)</span>
</code></pre></div></div>

<p>It takes a URL and a callback. Seeing this, the sane, unsurprising
expectation is the callback will be invoked <em>exactly once</em> for each time
<code class="language-plaintext highlighter-rouge">url-retrieve</code> was called. In any case where the request fails, it
should report it through the callback. <a href="http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20159">This is not the case</a>.
The callback may be invoked any number of times, <em>including zero</em>.</p>

<p>In this example, suppose you have a webserver that will return an HTTP
404 for a requested URL. Below, I fire off 10 asynchronous requests in a
row.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://127.0.0.1:8080/404"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>What would you guess is the length of <code class="language-plaintext highlighter-rouge">results</code>? It’s initially 0
before any requests complete and over time (a very short time) I would
expect this to top out at 10. On Emacs 24, here’s the real answer:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">length</span> <span class="nv">results</span><span class="p">)</span>
<span class="c1">;; =&gt; 46</span>
</code></pre></div></div>

<p>The same error is reported multiple times to the callback. At least
the pattern is obvious.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-count</span> <span class="mi">0</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 9</span>
<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">1</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 8</span>
<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">2</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 7</span>

<span class="p">(</span><span class="nv">cl-count</span> <span class="mi">9</span> <span class="nv">results</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)</span>
<span class="c1">;; =&gt; 1</span>
</code></pre></div></div>

<p>Here’s another one, this time to the non-existent foo.example. The DNS
query should never resolve.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://foo.example/"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>What’s the length of <code class="language-plaintext highlighter-rouge">results</code>? This time it’s zero. Remember how DNS
is synchronous? Because of this, DNS failures are reported
synchronously as a signaled error. This gets a lot worse with
<code class="language-plaintext highlighter-rouge">url-queue-retrieve</code>. Since the request is put off until later, DNS
doesn’t fail until later, and you get neither a callback nor an error
signal. This also puts the queue in a bad state and necessitated
<code class="language-plaintext highlighter-rouge">elfeed-unjam</code> for manually clearing it. This one should get fixed in
Emacs 25 when DNS is asynchronous.</p>

<p>This last one assumes you don’t have anything listening on port 57432
(pulled out of nowhere) so that the connection fails.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">results</span> <span class="p">())</span>
<span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">url-retrieve</span> <span class="s">"http://127.0.0.1:57432/"</span>
                <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">status</span><span class="p">)</span> <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">i</span> <span class="nv">status</span><span class="p">)</span> <span class="nv">results</span><span class="p">))))</span>
</code></pre></div></div>

<p>On Linux, we finally get the sane result of 10. However, on Windows,
it’s zero. The synchronous TCP connection will fail, signaling an
error just like DNS failures. Not only is it broken, it’s broken in
different ways on different platforms.</p>

<p>There are many more cases of callback weirdness which depend on the
connection and HTTP session being in various states when things go
awry. These were just the easiest to demonstrate. By using cURL, I get
to bypass this mess.</p>
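
<p>For comparison, the contract described at the start of this section
(exactly one callback per request, with failures reported through it)
can be approximated with a thin guard around an unreliable fetcher.
This is a Python sketch of the idea, not how Elfeed or <code class="language-plaintext highlighter-rouge">url-retrieve</code>
work. Every name in it is hypothetical, and the two stub fetchers only
imitate the duplicate-callback and signaled-error behaviors shown
above.</p>

<pre><code class="language-python">def exactly_once(callback):
    """Wrap callback so only its first invocation goes through."""
    fired = []

    def guarded(status):
        if fired:
            return  # Swallow duplicate reports of the same result.
        fired.append(True)
        callback(status)

    return guarded

def retrieve(fetch, url, callback):
    """Start a fetch, turning synchronous failures into a callback call."""
    guarded = exactly_once(callback)
    try:
        fetch(url, guarded)
    except Exception as error:
        guarded(("error", str(error)))

def noisy_fetch(url, callback):
    # Stub: reports the same failure three times.
    for _ in range(3):
        callback(("error", "404"))

def failing_fetch(url, callback):
    # Stub: fails before any callback, like a synchronous DNS error.
    raise OSError("no such host")

if __name__ == "__main__":
    results = []
    retrieve(noisy_fetch, "http://127.0.0.1:8080/404", results.append)
    retrieve(failing_fetch, "http://foo.example/", results.append)
    print(results)
</code></pre>

<p>A guard like this covers duplicate and synchronous failures, but not a
fetcher that silently never calls back. That case would still need a
timeout.</p>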

<h3 id="no-more-gnutls-issues">No more GnuTLS issues</h3>

<p>At compile time, Emacs can optionally be linked against GnuTLS, giving
it robust TLS support so long as the shared library is available.
<code class="language-plaintext highlighter-rouge">url-retrieve</code> uses this for fetching HTTPS content. Unfortunately,
this library is noisy and will occasionally echo non-informational
messages in the minibuffer and in <code class="language-plaintext highlighter-rouge">*Messages*</code> that cannot be
suppressed.</p>

<p>When not linked against GnuTLS, Emacs will instead run the GnuTLS
command line program as an inferior process, just like Elfeed now does
with cURL. Unfortunately this interface is very slow and frequently
fails, basically preventing Elfeed from fetching HTTPS feeds. I
suspect it’s in part due to an improper <code class="language-plaintext highlighter-rouge">coding-system-for-read</code>.</p>

<p>cURL handles all the TLS negotiation itself, so both these problems
disappear. The compile-time configuration doesn’t matter.</p>

<h3 id="windows-is-now-supported">Windows is now supported</h3>

<p>Emacs’ Windows networking code is so unstable, even in Emacs 25, that
I couldn’t make any practical use of Elfeed on that platform. Even the
Cygwin emacs-w32 version couldn’t cut it. It hard crashes Emacs every
time I’ve tried to fetch feeds. Fortunately the inferior process code
is a whole lot more stable, meaning fetching with cURL works great. As
of today, you can now use Elfeed on Windows. The biggest obstacle is
getting cURL installed and configured.</p>

<h3 id="interface-changes">Interface changes</h3>

<p>With cURL, obviously the values of <code class="language-plaintext highlighter-rouge">url-queue-timeout</code> and
<code class="language-plaintext highlighter-rouge">url-queue-parallel-processes</code> no longer have any meaning to Elfeed.
If you set these for yourself, you should instead call the functions
<code class="language-plaintext highlighter-rouge">elfeed-set-timeout</code> and <code class="language-plaintext highlighter-rouge">elfeed-set-max-connections</code>, which will do
the appropriate thing depending on the value of <code class="language-plaintext highlighter-rouge">elfeed-use-curl</code>.
Each also comes with a getter so you can query the current value.</p>

<p>The deprecated <code class="language-plaintext highlighter-rouge">elfeed-max-connections</code> has been removed.</p>

<p>Feed objects now have meta tags <code class="language-plaintext highlighter-rouge">:etag</code>, <code class="language-plaintext highlighter-rouge">:last-modified</code>, and
<code class="language-plaintext highlighter-rouge">:canonical-url</code>. The latter can identify feeds that have been moved,
though it needs a real UI.</p>

<h3 id="see-any-bugs">See any bugs?</h3>

<p>If you use Elfeed, grab the current update and give the cURL fetcher a
shot. Please open a ticket if you find problems. Be sure to report
your Emacs version, operating system, and cURL version.</p>

<p>As of this writing there’s just one thing missing compared to
url-queue: connection reuse. cURL supports it, so I just need to code
it up.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>9 Elfeed Features You Might Not Know</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/12/03/"/>
    <id>urn:uuid:26807fd8-4b69-3caa-552a-90308cc0b24f</id>
    <updated>2015-12-03T22:33:17Z</updated>
    <category term="emacs"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>It’s been two years since <a href="/blog/2013/11/26/">I last wrote about Elfeed</a>, my
<a href="https://github.com/skeeto/elfeed">Atom/RSS feed reader for Emacs</a>. I’ve used it every single
day since, and I continue to maintain it with help from the community.
So far 18 people besides me have contributed commits. Over the last
couple of years it’s accumulated some new features, some more obvious
than others.</p>

<p>Every time I mark a new release, I update the ChangeLog at the top of
elfeed.el which lists what’s new. Since it’s easy to overlook many of
the newer useful features, I thought I’d list the more important ones
here.</p>

<h4 id="custom-entry-colors">Custom Entry Colors</h4>

<p>You can now customize entry faces through <code class="language-plaintext highlighter-rouge">elfeed-search-face-alist</code>.
This variable maps tags to faces. An entry inherits the face of any
tag it carries. Previously “unread” was a special tag that got a bold
face, but this is now implemented as nothing more than an initial
entry in the alist.</p>

<p><a href="/img/elfeed/colors.png"><img src="/img/elfeed/colors-thumb.png" alt="" /></a></p>

<p>I’ve been using it to mark different kinds of content (videos,
podcasts, comics) with different colors.</p>
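
<p>As a minimal sketch of what that configuration might look like, the
following defines a face and maps a tag to it. The <code class="language-plaintext highlighter-rouge">video</code> tag, the
face name, and the color are placeholders chosen for illustration, not
anything Elfeed defines.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Hypothetical face for entries carrying the (assumed) `video' tag.
(defface my-elfeed-video-face
  '((t :foreground "#f9f"))
  "Face for Elfeed entries tagged `video'.")

;; Each alist element is a tag followed by the face(s) to apply.
(push '(video my-elfeed-video-face) elfeed-search-face-alist)
</code></pre></div></div>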

<h4 id="autotagging">Autotagging</h4>

<p>You can specify the starting tags for entries from particular feeds
directly in the feed listing. This has been a feature for a while now,
but it’s not something you’d want to miss. It started out as a feature
in my personal configuration that eventually migrated into Elfeed
proper.</p>

<p>For example, your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> may initially look like this,
especially if you imported from OPML.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="s">"https://nullprogram.com/feed/"</span>
 <span class="s">"http://nedroid.com/feed/"</span>
 <span class="s">"https://www.youtube.com/feeds/videos.xml?user=quill18"</span><span class="p">)</span>
</code></pre></div></div>

<p>If you wanted certain tags applied to entries from each, you would
need to putz around with <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code>. For the most common
case — apply certain tags to all entries from a URL — it’s much
simpler to specify the information as part of the listing itself,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">((</span><span class="s">"https://nullprogram.com/feed/"</span> <span class="nv">blog</span> <span class="nv">emacs</span><span class="p">)</span>
 <span class="p">(</span><span class="s">"http://nedroid.com/feed/"</span> <span class="nv">webcomic</span><span class="p">)</span>
 <span class="p">(</span><span class="s">"https://www.youtube.com/feeds/videos.xml?user=quill18"</span> <span class="nv">youtube</span><span class="p">))</span>
</code></pre></div></div>

<p>Today I only use custom tagger functions in my own configuration to
filter within a couple of particularly noisy feeds.</p>
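
<p>For comparison, a custom tagger of that sort might look roughly like
the sketch below. The URL pattern, the title pattern, and the <code class="language-plaintext highlighter-rouge">junk</code>
tag are all made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Hypothetical: within one noisy feed, tag entries whose titles match
;; a pattern as `junk' and mark them as already read.
(add-hook 'elfeed-new-entry-hook
          (elfeed-make-tagger :feed-url "noisy\\.example\\.com"
                              :entry-title "sponsored"
                              :add 'junk
                              :remove 'unread))
</code></pre></div></div>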

<h4 id="arbitrary-metadata">Arbitrary Metadata</h4>

<p>Metadata is more for Elfeed extensions (i.e. <a href="https://github.com/remyhonig/elfeed-org">elfeed-org</a>)
than regular users. You can attach arbitrary, <a href="/blog/2013/12/30/">readable</a>
metadata to any Elfeed object (entry, feed). This metadata is
automatically stored in the database. It’s a plist.</p>

<p>Metadata is accessed entirely through one setf-able function:
<code class="language-plaintext highlighter-rouge">elfeed-meta</code>. For example, you might want to track <em>when</em> you’ve read
something, not just that you’ve read it. You could use this to
selectively update certain feeds or just to evaluate your own habits.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">my-elfeed-mark-read</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">elfeed-untag</span> <span class="nv">entry</span> <span class="ss">'unread</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">date</span> <span class="p">(</span><span class="nv">format-time-string</span> <span class="s">"%FT%T%z"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-meta</span> <span class="nv">entry</span> <span class="ss">:read-date</span><span class="p">)</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>

<p>Two things motivated this feature. First, without a plist, if I added
more properties in the future, I would need to change the database
format to support them. I modified the database format to add
metadata, requiring an upgrade function to quietly upgrade older
databases as they were loaded. I’d really like to avoid this in the
future.</p>

<p>Second, I wanted to make it easy for extension authors to store their
own data. I still imagine an extension someday to update feeds
intelligently based on their history. For example, the database
doesn’t track when the feed was last fetched, just the date of the
most recent entry (if any). A smart-update extension could use
metadata to tag feeds with this information.</p>

<p>Elfeed itself already uses two metadata keys: <code class="language-plaintext highlighter-rouge">:failures</code> on feeds and
<code class="language-plaintext highlighter-rouge">:title</code> on both. <code class="language-plaintext highlighter-rouge">:failures</code> counts the total number of times
fetching that feed resulted in an error. You could use this to get a
listing of troublesome feeds like so,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">url</span> <span class="nv">in</span> <span class="p">(</span><span class="nv">elfeed-feed-list</span><span class="p">)</span>
         <span class="nv">for</span> <span class="nv">feed</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="nv">url</span><span class="p">)</span>
         <span class="nv">for</span> <span class="nv">failures</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">elfeed-meta</span> <span class="nv">feed</span> <span class="ss">:failures</span><span class="p">)</span>
         <span class="nb">when</span> <span class="nv">failures</span>
         <span class="nv">collect</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">url</span> <span class="nv">failures</span><span class="p">))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">:title</code> property allows for a custom title for both feeds and
entries in the search buffer listing, assuming you’re using the
default function (see below). It overrides the title provided by the
feed itself. This is different than <code class="language-plaintext highlighter-rouge">elfeed-entry-title</code> and
<code class="language-plaintext highlighter-rouge">elfeed-feed-title</code>, which are kept in sync with feed content. Metadata
is not kept in sync with the feed itself.</p>
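
<p>A small sketch of setting such an override, assuming the feed URL
below is one of your feeds and that the replacement title is whatever
you prefer to see in the listing:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Hypothetical override: display a custom title for one feed.
(let ((feed (elfeed-db-get-feed "https://nullprogram.com/feed/")))
  (setf (elfeed-meta feed :title) "null program"))
</code></pre></div></div>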

<h4 id="filter-inversion">Filter Inversion</h4>

<p>You can invert filter components by prefixing them with <code class="language-plaintext highlighter-rouge">!</code>. For
example, say you’re looking at all my posts from the past 6 months:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@6-months nullprogram.com
</code></pre></div></div>

<p>But say you’re tired of me and decide you want to see every entry from
the past 6 months <em>excluding</em> my posts.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@6-months !nullprogram.com
</code></pre></div></div>

<h4 id="filter-limiter">Filter Limiter</h4>

<p>Normally you limit the number of results by date, but you can now
limit the result by count using <code class="language-plaintext highlighter-rouge">#n</code>. For example, to see my most
recent 12 posts regardless of date,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nullprogram.com #12
</code></pre></div></div>

<p>This is used internally in the live filter to limit the number of
results to the height of the screen. If you noticed that live
filtering has been much more responsive in the last few months, this is
probably why.</p>

<h4 id="bookmark-support">Bookmark Support</h4>

<p>Elfeed properly integrates with Emacs’ bookmarks (<a href="https://github.com/skeeto/elfeed/issues/110">thanks to
groks</a>). You can bookmark the current filter with <code class="language-plaintext highlighter-rouge">M-x
bookmark-set</code> (<code class="language-plaintext highlighter-rouge">C-x r m</code>). By default, Emacs will persist bookmarks
between sessions. To revisit a filter in the future, <code class="language-plaintext highlighter-rouge">M-x
bookmark-jump</code> (<code class="language-plaintext highlighter-rouge">C-x r b</code>).</p>

<p>Since this requires no configuration, this may serve as an easy
replacement for manually building “view” toggles — filters bound to
certain keys — which I know many users have done, including me.</p>
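
<p>For anyone who hasn’t seen one, a hand-built “view” toggle typically
looks something like this sketch. The command name, the key, and the
<code class="language-plaintext highlighter-rouge">video</code> tag are assumptions for the sake of the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)
(require 'elfeed-search)

;; Hypothetical "view": one key press switches to a fixed filter.
(defun my-elfeed-show-videos ()
  "Show unread entries from the past month tagged `video'."
  (interactive)
  (elfeed-search-set-filter "@1-month-ago +unread +video"))

(define-key elfeed-search-mode-map (kbd "C-c v") #'my-elfeed-show-videos)
</code></pre></div></div>

<p>A bookmark stores the same filter string without any of this code.</p>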

<h4 id="new-header">New Header</h4>

<p>If you’ve updated very recently, you probably noticed Elfeed got a
brand new header. Previously it faked a header by writing to the first
line of the buffer. This is because somehow I had no idea Emacs had
official support for buffer headers (despite notmuch using them all
this time).</p>

<p>The new header includes additional information, such as the current
filter, the number of unread entries, the total number of entries, and
the number of unique feeds currently in view. You’ll see this as
<code class="language-plaintext highlighter-rouge">&lt;unread&gt;/&lt;total&gt;:&lt;feeds&gt;</code> in the middle of the header.</p>

<p>As of this writing, the new header has not been made part of a formal
release. So if you’re only tracking stable releases, you won’t see
this for a while longer.</p>

<p>You can supply your own header via <code class="language-plaintext highlighter-rouge">elfeed-search-header-function</code>
(<a href="https://github.com/skeeto/elfeed/issues/111">thanks to Gergely Nagy</a>).</p>
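
<p>Here is a bare-bones sketch of a replacement header. It assumes the
function is called with no arguments and returns the header string;
check the variable’s documentation before relying on that.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)
(require 'elfeed-search)

;; Assumption: called with no arguments, returns the header string.
(defun my-elfeed-header ()
  "Return a minimal header showing only the current filter."
  (format "Elfeed filter: %s" elfeed-search-filter))

(setq elfeed-search-header-function #'my-elfeed-header)
</code></pre></div></div>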

<h4 id="scoped-updates">Scoped Updates</h4>

<p>As you already know, in the search buffer listing you can press <code class="language-plaintext highlighter-rouge">G</code> to
update your feeds. But did you know it takes a prefix argument?
Run as <code class="language-plaintext highlighter-rouge">C-u G</code>, it only updates feeds with entries currently listed in
the buffer.</p>

<p>As of this writing, this is another feature not yet in a formal
release. I’d been wanting something like this for a while but couldn’t
think of a reasonable interface. Directly prompting the user for feeds
is neither elegant nor composable. However, groks <a href="https://github.com/skeeto/elfeed/issues/109">suggested the
prefix argument</a>, which composes perfectly with Elfeed’s
existing idioms.</p>

<h4 id="listing-customizations">Listing Customizations</h4>

<p>In addition to custom faces, there are a number of ways to customize
the listing.</p>

<ul>
  <li>Choose the sort order with <code class="language-plaintext highlighter-rouge">elfeed-sort-order</code>.</li>
  <li>Set a custom date format with <code class="language-plaintext highlighter-rouge">elfeed-search-date-format</code>.</li>
  <li>Adjust field widths with <code class="language-plaintext highlighter-rouge">elfeed-search-*-width</code>.</li>
  <li>Or override everything with <code class="language-plaintext highlighter-rouge">elfeed-search-print-entry-function</code>.</li>
</ul>
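
<p>A short configuration sketch touching the first three of these. The
specific values are arbitrary examples, and <code class="language-plaintext highlighter-rouge">elfeed-search-title-max-width</code>
is just one of the width variables.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'elfeed)

;; Oldest entries first instead of the default newest first.
(setq elfeed-sort-order 'ascending)

;; Date format as (FORMAT-STRING WIDTH ALIGNMENT).
(setq elfeed-search-date-format '("%Y-%m-%d %H:%M" 16 :left))

;; Allow longer titles before truncation.
(setq elfeed-search-title-max-width 100)
</code></pre></div></div>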

<p>Gergely Nagy has been throwing lots of commits at me over the last
couple of weeks to open up lots of Elfeed’s behavior to customization,
so there are more to come.</p>

<h3 id="thank-you-emacs-community">Thank You, Emacs Community</h3>

<p>Apologies about any features I missed or anyone I forgot to mention
who’s made contributions. The above comes from my ChangeLogs, the
commit log, the GitHub issue listing, and my own memory, so I’m likely
to have forgotten some things. A couple of these features I had
forgotten about myself!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Quickly Access x86 Documentation in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/11/21/"/>
    <id>urn:uuid:982279c7-22a7-3b69-016b-749883870385</id>
    <updated>2015-11-21T05:42:17Z</updated>
    <category term="x86"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I recently released an Emacs package called <a href="https://github.com/skeeto/x86-lookup"><strong>x86-lookup</strong></a>.
Given a mnemonic, Emacs will open up a local copy of <a href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html">Intel’s
software developer manual</a> PDF at the page documenting the
instruction. It complements <a href="/blog/2015/04/19/">nasm-mode</a>, released earlier this
year.</p>

<ul>
  <li><a href="https://github.com/skeeto/x86-lookup">https://github.com/skeeto/x86-lookup</a></li>
</ul>

<p>x86-lookup is also available from <a href="https://melpa.org/">MELPA</a>.</p>

<p>To use it, you’ll need <a href="http://poppler.freedesktop.org/">Poppler’s</a> pdftotext command line
program — used to build an index of the PDF — and a copy of the
complete Volume 2 of Intel’s instruction set manual. There’s only one
command to worry about: <code class="language-plaintext highlighter-rouge">M-x x86-lookup</code>.</p>
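
<p>A minimal configuration sketch, assuming the PDF lives at the
hypothetical path below. The key binding is optional and equally
arbitrary.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'x86-lookup)

;; Hypothetical path: point this at your copy of the Volume 2 PDF.
(setq x86-lookup-pdf "~/doc/intel-sdm-vol-2.pdf")

;; Optional, hypothetical key binding.
(global-set-key (kbd "C-h x") #'x86-lookup)
</code></pre></div></div>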

<h3 id="minimize-documentation-friction">Minimize documentation friction</h3>

<p>This package should be familiar to anyone who’s used
<a href="https://github.com/skeeto/javadoc-lookup">javadoc-lookup</a>, one of my older packages. It has a common
underlying itch: the context switch to read API documentation while
coding should have as little friction as possible, otherwise I’m
discouraged from doing it. In an ideal world I wouldn’t ever need to
check documentation because it’s already in my head. By visiting
documentation frequently with ease, it’s going to become familiar that
much faster and I’ll be reaching for it less and less, approaching the
ideal.</p>

<p>I picked up x86 assembly about a year ago and for the first few
months I struggled to find a good online reference for the instruction
set. There are little scraps here and there, but not much of
substance. The big exception is <a href="http://www.felixcloutier.com/x86/">Félix Cloutier’s reference</a>,
which is an amazingly well-done HTML conversion of Intel’s PDF
manuals. Unfortunately I could never get it working locally to
generate my own. There’s also the <a href="http://ref.x86asm.net/">X86 Opcode and Instruction
Reference</a>, but it’s more for machines than humans.</p>

<p>Besides, I often work without an Internet connection, so offline
documentation is absolutely essential. (You hear that Microsoft? Not
only do I avoid coding against Win32 because it’s badly designed, but
even more so because you don’t offer offline documentation anymore!
The friction to reference your API documentation is enormous.)</p>

<p>I avoided the official x86 documentation for a while, thinking it would
be too opaque, at least until I became more accustomed to the
instruction set. But really, it’s not bad! With a handle on the
basics, I would encourage anyone to dive into either Intel’s or <a href="http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/">AMD’s
manuals</a>. The reason there’s not much online in HTML form is
because these manuals are nearly everything you need.</p>

<p>I chose Intel’s manual for x86-lookup because I’m more familiar with
it, it’s more popular, it’s (slightly) easier to parse, it’s offered
as a single PDF, and it’s more complete. The regular expression for
finding instructions is tuned for Intel’s manual and it won’t work
with AMD’s manuals.</p>

<p>For a couple months prior to writing x86-lookup, I had a couple of
scratch functions to very roughly accomplish the same thing. The
tipping point for formalizing it was that last month I wrote my own
x86 assembler. A single mnemonic often has a dozen or more different
opcodes depending on the instruction’s operands, and there are often
several ways to encode the same operation. I was frequently looking up
opcodes, and navigating the PDF quickly became a real chore. I only
needed about 80 different opcodes, so I was just adding them to the
assembler’s internal table manually as needed.</p>

<h3 id="how-does-it-work">How does it work?</h3>

<p>Say you want to look up the instruction RDRAND.</p>

<p><img src="/img/screenshot/rdrand-pdf.png" alt="" /></p>

<p>Initially Emacs has no idea what page this is on, so the first step is
to build an index mapping mnemonics to pages. x86-lookup runs the
pdftotext command line program on the PDF and loads the result into a
temporary buffer.</p>

<p>The killer feature of pdftotext is that it emits FORM FEED (U+000C)
characters between pages. Think of these as page breaks. By counting
form feed characters, x86-lookup can track the page for any part of
the document. In fact, Emacs is already set up to do this with its
<code class="language-plaintext highlighter-rouge">forward-page</code> and <code class="language-plaintext highlighter-rouge">backward-page</code> commands. So to build the index,
x86-lookup steps forward page-by-page looking for mnemonics, keeping
note of the page. Since this process typically takes about 10 seconds,
the index is cached in a file (see <code class="language-plaintext highlighter-rouge">x86-lookup-cache-directory</code>) for
future use. It only needs to happen once for a particular manual on a
particular computer.</p>
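
<p>The idea can be sketched in a few lines. This is not x86-lookup’s
actual code: the function name, the string-based approach, and the
heading format in the sample text are all invented for illustration.
It splits the text on form feeds and records the page number of each
regular expression match.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-build-page-index (text regexp)
  "Return an alist of (NAME . PAGE) for each match of REGEXP in TEXT.
Pages in TEXT are separated by form feeds and numbered from 1.
Group 1 of REGEXP captures NAME."
  (let ((case-fold-search nil)
        (page 0)
        (index '()))
    (dolist (page-text (split-string text "\f"))
      (setq page (1+ page))
      (let ((start 0))
        (while (string-match regexp page-text start)
          (push (cons (match-string 1 page-text) page) index)
          (setq start (match-end 0)))))
    (nreverse index)))

;; A made-up two-page "manual" with an invented heading format.
(my-build-page-index "ADD - Add\nSUB - Subtract\fRDRAND - Read Random Number"
                     "^\\([A-Z0-9]+\\) - ")
;; =&gt; (("ADD" . 1) ("SUB" . 1) ("RDRAND" . 2))
</code></pre></div></div>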

<p>The mnemonic listing is slightly incomplete, so x86-lookup expands
certain mnemonics into the familiar set. For example, all the
conditional jumps are listed under “Jcc,” but this is probably not
what you’d expect to look up. I compared x86-lookup’s mnemonic listing
against NASM/nasm-mode’s mnemonics to ensure everything was accounted
for. Both packages benefited from this process.</p>
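
<p>As a rough sketch of that kind of expansion (again, not the package’s
real code), a mnemonic ending in “cc” can be turned into one name per
condition code. The function name is hypothetical and the condition
code list is deliberately abbreviated.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-condition-codes '("A" "AE" "B" "BE" "E" "NE" "G" "GE" "L" "LE")
  "Abbreviated, illustrative list of x86 condition code suffixes.")

(defun my-expand-mnemonic (mnemonic)
  "Expand a MNEMONIC ending in \"cc\" into one name per condition code.
Any other mnemonic is returned unchanged as a one-element list."
  (if (string-suffix-p "cc" mnemonic)
      (let ((stem (substring mnemonic 0 -2)))
        (mapcar (lambda (cc) (concat stem cc)) my-condition-codes))
    (list mnemonic)))

(my-expand-mnemonic "Jcc")
;; =&gt; ("JA" "JAE" "JB" "JBE" "JE" "JNE" "JG" "JGE" "JL" "JLE")
</code></pre></div></div>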

<p>Once the index is built, pdftotext is no longer needed. If you’re
desperate and don’t have this program available, you can borrow the
index file from another computer. But you’re on your own for figuring
that out!</p>

<p>So to look up RDRAND, x86-lookup checks the index for the page number
and invokes a PDF reader on that page. This is where not all PDF
readers are created equal. There’s no convention for opening a PDF to
a particular page and each PDF reader differs. Some don’t even support
it. To deal with this, x86-lookup has a function specialized for
different PDF readers. Similar to <code class="language-plaintext highlighter-rouge">browse-url-browser-function</code>,
x86-lookup has <code class="language-plaintext highlighter-rouge">x86-lookup-browse-pdf-function</code>.</p>
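
<p>Plugging in another reader might look like the sketch below. It
assumes the function receives the PDF path and a page number, and it
uses a made-up program, <code class="language-plaintext highlighter-rouge">mypdfreader</code>, with a made-up <code class="language-plaintext highlighter-rouge">--page</code> flag.
Substitute whatever your reader actually accepts.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'x86-lookup)

;; Hypothetical reader and flag; assumes a (PDF PAGE) calling convention.
(defun my-x86-lookup-browse-pdf (pdf page)
  "Open PDF at PAGE using a hypothetical external reader."
  (start-process "mypdfreader" nil "mypdfreader"
                 "--page" (number-to-string page) pdf))

(setq x86-lookup-browse-pdf-function #'my-x86-lookup-browse-pdf)
</code></pre></div></div>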

<p>By default it tries to open the PDF for viewing within Emacs (did you
know Emacs is a PDF viewer?), falling back on other options if the
feature is unavailable. I welcome pull requests for any PDF readers
not yet supported by x86-lookup. Perhaps this functionality deserves
its own package.</p>

<p>That’s it! It’s a simple feature that has already saved me a lot of
time. If you’re ever programming in x86 assembly, give x86-lookup a
spin.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>RSA Signatures in Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/10/30/"/>
    <id>urn:uuid:9d9ef14d-d513-3cad-b053-fb016f3c3bf0</id>
    <updated>2015-10-30T22:35:13Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<p>Emacs comes with a wonderful arbitrary-precision computer algebra
system called <a href="http://www.gnu.org/software/emacs/manual/html_mono/calc.html">calc</a>. I’ve <a href="/blog/2009/06/23/">discussed it previously</a> and
continue to use it on a daily basis. That’s right, people, <em>Emacs can
do calculus</em>. Like everything Emacs, it’s programmable and extensible
from Emacs Lisp. In this article, I’m going to implement the <a href="https://en.wikipedia.org/wiki/RSA_(cryptosystem)">RSA
public-key cryptosystem</a> in Emacs Lisp using calc.</p>

<p>If you want to dive right in first, here’s the repository:</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-rsa">https://github.com/skeeto/emacs-rsa</a></li>
</ul>

<p>This is only a toy implementation and not really intended for serious
cryptographic work. It’s also far too slow when using keys of
reasonable length.</p>

<h3 id="evaluation-with-calc">Evaluation with calc</h3>

<p>The calc package is particularly useful when considering Emacs’
limited integer type. Emacs uses a tagged integer scheme where
integers are embedded within pointers. It’s a lot faster than the
alternative (individually-allocated integer objects), but it means
they’re always a few bits short of the platform’s native integer type.</p>

<p>calc has a large API, but the user-friendly porcelain for it is the
under-documented <code class="language-plaintext highlighter-rouge">calc-eval</code> function. It evaluates an expression
string with format-like argument substitutions (<code class="language-plaintext highlighter-rouge">$n</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"2^16 - 1"</span><span class="p">)</span>
<span class="c1">;; =&gt; "65535"</span>

<span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"2^$1 - 1"</span> <span class="no">nil</span> <span class="mi">128</span><span class="p">)</span>
<span class="c1">;; =&gt; "340282366920938463463374607431768211455"</span>
</code></pre></div></div>

<p>Notice it returns strings, which is one of the ways calc represents
arbitrary precision numbers. For arguments, it accepts regular Elisp
numbers and strings just like this function returns. The implicit
radix is 10. To explicitly set the radix, prefix the number with the
radix and <code class="language-plaintext highlighter-rouge">#</code>. This is the same as in the user interface of calc. For
example:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"16#deadbeef"</span><span class="p">)</span>
<span class="c1">;; =&gt; "3735928559"</span>
</code></pre></div></div>

<p>The second argument (optional) to <code class="language-plaintext highlighter-rouge">calc-eval</code> adjusts its behavior.
Given <code class="language-plaintext highlighter-rouge">nil</code>, it simply evaluates the string and returns the result.
The manual documents the different options, but the only other
relevant option for RSA is the symbol <code class="language-plaintext highlighter-rouge">pred</code>, which asks it to return
a boolean “predicate” result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &lt; $2"</span> <span class="ss">'pred</span> <span class="s">"4000"</span> <span class="s">"5000"</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>

<h3 id="generating-primes">Generating primes</h3>

<p>RSA is founded on the difficulty of factoring large composites with
large factors. Generating an RSA keypair starts with generating two
prime numbers, <code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">q</code>, and using these primes to compute two
mathematically related composite numbers.</p>

<p>calc has a function <code class="language-plaintext highlighter-rouge">calc-next-prime</code> for finding the next prime
number following any arbitrary number. It uses a probabilistic
primality test (the <del>Fermat</del> Miller-Rabin primality test)
to efficiently test large integers. It increments the input until
it finds a result that passes enough iterations of the primality test.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"nextprime($1)"</span> <span class="no">nil</span> <span class="s">"100000000000000000"</span><span class="p">)</span>
<span class="c1">;; =&gt; "100000000000000003"</span>
</code></pre></div></div>

<p>So to generate a random n-bit prime, first generate a random n-bit
number and then increment it until a prime number is found.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Generate a 128-bit prime, 10 iterations (0.000084% error rate)</span>
<span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"nextprime(random(2^$1), 10)"</span> <span class="no">nil</span> <span class="mi">128</span><span class="p">)</span>
<span class="s">"111618319598394878409654851283959105123"</span>
</code></pre></div></div>

<p>Unfortunately calc’s <code class="language-plaintext highlighter-rouge">random</code> function is based on Emacs’ <code class="language-plaintext highlighter-rouge">random</code>
function, which is entirely unsuitable for cryptography. In the real
implementation I read n bits from <code class="language-plaintext highlighter-rouge">/dev/urandom</code> to generate an n-bit
number.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nv">set-buffer-multibyte</span> <span class="no">nil</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">call-process</span> <span class="s">"head"</span> <span class="s">"/dev/urandom"</span> <span class="no">t</span> <span class="no">nil</span> <span class="s">"-c"</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%d"</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">bits</span> <span class="mi">8</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">f</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="nf">#'</span><span class="nb">format</span> <span class="s">"%02x"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">concat</span> <span class="s">"16#"</span> <span class="p">(</span><span class="nv">mapconcat</span> <span class="nv">f</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)</span> <span class="s">""</span><span class="p">))))</span>
</code></pre></div></div>

<p>(Note: <code class="language-plaintext highlighter-rouge">/dev/urandom</code> <em>is</em> the right choice. There’s <a href="http://www.2uo.de/myths-about-urandom/">no reason to use
<code class="language-plaintext highlighter-rouge">/dev/random</code> for generating keys</a>.)</p>

<h3 id="computing-e-and-d">Computing e and d</h3>

<p>From here the code just follows along from the Wikipedia article.
After generating the primes <code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">q</code>, two composites are computed,
<code class="language-plaintext highlighter-rouge">n = p * q</code> and <code class="language-plaintext highlighter-rouge">i = (p - 1) * (q - 1)</code>. Lacking any reason to do
otherwise, I chose 65,537 for the public exponent <code class="language-plaintext highlighter-rouge">e</code>.</p>
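
<p>To make the arithmetic concrete, here is the same computation with
toy primes. The values 61 and 53 are chosen purely for illustration
and are far too small for real use.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calc)

;; Toy primes for illustration only.
(let* ((p 61)
       (q 53)
       (n (calc-eval "$1 * $2" nil p q))
       (i (calc-eval "($1 - 1) * ($2 - 1)" nil p q)))
  (list n i))
;; =&gt; ("3233" "3120")
</code></pre></div></div>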

<p>The function <code class="language-plaintext highlighter-rouge">rsa--inverse</code> is just a straight Emacs Lisp + calc
implementation of the extended Euclidean algorithm from <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm">the Wikipedia
article pseudocode</a>, computing <code class="language-plaintext highlighter-rouge">d ≡ e^-1 (mod i)</code>. It’s not much
use sharing it here, so take a look at the repository if you’re
curious.</p>
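
<p>For a feel of what the algorithm does, here is a sketch of the
extended Euclidean algorithm on native integers. This is not the
repository’s calc-based <code class="language-plaintext highlighter-rouge">rsa--inverse</code>; the name <code class="language-plaintext highlighter-rouge">my-mod-inverse</code> is
hypothetical, and it only suits toy-sized inputs.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my-mod-inverse (a n)
  "Return the inverse of A modulo N, or nil if A has no inverse."
  (let ((old-r a) (r n)
        (old-s 1) (s 0))
    (while (/= r 0)
      (let ((q (/ old-r r)))
        ;; Update remainders and coefficients in parallel.
        (cl-psetf old-r r
                  r (- old-r (* q r))
                  old-s s
                  s (- old-s (* q s)))))
    ;; An inverse exists only when gcd(A, N) is 1.
    (when (= old-r 1)
      (mod old-s n))))

;; With toy values e = 17 and i = 3120:
(my-mod-inverse 17 3120)
;; =&gt; 2753
</code></pre></div></div>

<p>As a check, 17 × 2753 = 46801 = 15 × 3120 + 1.</p>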

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa-generate-keypair</span> <span class="p">(</span><span class="nv">bits</span><span class="p">)</span>
  <span class="s">"Generate a fresh RSA keypair plist of BITS length."</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">p</span> <span class="p">(</span><span class="nv">rsa-generate-prime</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">bits</span> <span class="mi">2</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">q</span> <span class="p">(</span><span class="nv">rsa-generate-prime</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">bits</span> <span class="mi">2</span><span class="p">))))</span>
         <span class="p">(</span><span class="nv">n</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 * $2"</span> <span class="no">nil</span> <span class="nv">p</span> <span class="nv">q</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">i</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"($1 - 1) * ($2 - 1)"</span> <span class="no">nil</span> <span class="nv">p</span> <span class="nv">q</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">e</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"2^16+1"</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">d</span> <span class="p">(</span><span class="nv">rsa--inverse</span> <span class="nv">e</span> <span class="nv">i</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="ss">:public</span>  <span class="p">(</span><span class="ss">:n</span> <span class="o">,</span><span class="nv">n</span> <span class="ss">:e</span> <span class="o">,</span><span class="nv">e</span><span class="p">)</span> <span class="ss">:private</span> <span class="p">(</span><span class="ss">:n</span> <span class="o">,</span><span class="nv">n</span> <span class="ss">:d</span> <span class="o">,</span><span class="nv">d</span><span class="p">))))</span>
</code></pre></div></div>

<p>The public key is <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">e</code> and the private key is <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">d</code>. From
here we can compute and verify cryptographic signatures.</p>

<h3 id="signatures">Signatures</h3>

<p>To compute signature <code class="language-plaintext highlighter-rouge">s</code> of an integer <code class="language-plaintext highlighter-rouge">m</code> (where <code class="language-plaintext highlighter-rouge">m &lt; n</code>), compute
<code class="language-plaintext highlighter-rouge">s ≡ m^d (mod n)</code>. I chose the right-to-left binary method, again
straight from <a href="https://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method">the Wikipedia pseudocode</a> (lazy!). I’ll share this
one since it’s short. The backslash denotes integer division.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa--mod-pow</span> <span class="p">(</span><span class="nv">base</span> <span class="nv">exponent</span> <span class="nv">modulus</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">result</span> <span class="mi">1</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="nv">base</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 % $2"</span> <span class="no">nil</span> <span class="nv">base</span> <span class="nv">modulus</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &gt; 0"</span> <span class="ss">'pred</span> <span class="nv">exponent</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 % 2 == 1"</span> <span class="ss">'pred</span> <span class="nv">exponent</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">setf</span> <span class="nv">result</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"($1 * $2) % $3"</span> <span class="no">nil</span> <span class="nv">result</span> <span class="nv">base</span> <span class="nv">modulus</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">exponent</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 \\ 2"</span> <span class="no">nil</span> <span class="nv">exponent</span><span class="p">)</span>
            <span class="nv">base</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"($1 * $1) % $2"</span> <span class="no">nil</span> <span class="nv">base</span> <span class="nv">modulus</span><span class="p">)))</span>
    <span class="nv">result</span><span class="p">))</span>
</code></pre></div></div>

<p>Verifying the signature is the same process, but with the public key’s
<code class="language-plaintext highlighter-rouge">e</code>: <code class="language-plaintext highlighter-rouge">m ≡ s^e (mod n)</code>. If the signature is valid, <code class="language-plaintext highlighter-rouge">m</code> will be
recovered. In theory, only someone who knows <code class="language-plaintext highlighter-rouge">d</code> can feasibly compute
<code class="language-plaintext highlighter-rouge">s</code> from <code class="language-plaintext highlighter-rouge">m</code>. If <code class="language-plaintext highlighter-rouge">n</code> is <a href="http://crypto.stackexchange.com/a/5942">small enough to factor</a>, revealing
<code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">q</code>, then <code class="language-plaintext highlighter-rouge">d</code> can be feasibly recomputed from the public key.
So mind your Ps and Qs.</p>
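
<p>As a quick sanity check of the sign/verify relationship, here’s a minimal sketch using native integers instead of calc strings. It assumes Emacs 27 or later, where Lisp integers are arbitrary precision. The name <code class="language-plaintext highlighter-rouge">demo-mod-pow</code> is hypothetical and not part of the package, and the tiny key (<code class="language-plaintext highlighter-rouge">p = 61</code>, <code class="language-plaintext highlighter-rouge">q = 53</code>) is the usual textbook example rather than anything generated by the code above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-mod-pow (base exponent modulus)
  "Return BASE raised to EXPONENT, modulo MODULUS (right-to-left binary)."
  (let ((result 1))
    (setq base (mod base modulus))
    (while (&gt; exponent 0)
      ;; Low bit set: fold the current power of BASE into RESULT.
      (when (= (% exponent 2) 1)
        (setq result (mod (* result base) modulus)))
      ;; Shift EXPONENT right by one bit and square BASE.
      (setq exponent (/ exponent 2)
            base (mod (* base base) modulus)))
    result))

;; Toy key: n = 61 * 53 = 3233, phi(n) = 60 * 52 = 3120,
;; e = 17, d = 2753 (17 * 2753 = 46801 = 15 * 3120 + 1).
(let* ((n 3233) (e 17) (d 2753)
       (m 65)
       (s (demo-mod-pow m d n)))   ; sign with the private exponent
  (= m (demo-mod-pow s e n)))      ; verify with the public exponent
;; =&gt; t
</code></pre></div></div>

<p>A modulus this small is trivially factored, so this only illustrates the arithmetic, not anything secure.</p>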

<p>So that leaves one problem: generally users want to sign strings and
files and such, not integers. A hash function is used to reduce an
arbitrary quantity of data into an integer suitable for signing. Emacs
comes with a bunch of them, accessible through <code class="language-plaintext highlighter-rouge">secure-hash</code>. It
hashes strings and buffers.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">secure-hash</span> <span class="ss">'sha224</span> <span class="s">"Hello, world!"</span><span class="p">)</span>
<span class="c1">;; =&gt; "8552d8b7a7dc5476cb9e25dee69a8091290764b7f2a64fe6e78e9568"</span>
</code></pre></div></div>

<p>Since the result is hexadecimal, just prefix <code class="language-plaintext highlighter-rouge">16#</code> to turn it into a
calc integer.</p>
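
<p>A small sketch of that conversion, for illustration only. The two-digit input is made up to keep things readable, and the native <code class="language-plaintext highlighter-rouge">string-to-number</code> call is there just for comparison; a real digest is handled the same way by calc, only longer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calc)

;; Native parse of a short hex string, for comparison.
(string-to-number "ff" 16)
;; =&gt; 255

;; The same digits as a calc integer. `calc-eval' hands back a string
;; holding the value in calc's current display radix.
(calc-eval "$1" nil (concat "16#" "ff"))

;; A full digest goes through the same path.
(calc-eval "$1" nil (concat "16#" (secure-hash 'sha224 "Hello, world!")))
</code></pre></div></div>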

<p>Here are the signature and verification functions. Any string or buffer
can be signed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa-sign</span> <span class="p">(</span><span class="nv">private-key</span> <span class="nv">object</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">n</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">private-key</span> <span class="ss">:n</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">d</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">private-key</span> <span class="ss">:d</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">hash</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"16#"</span> <span class="p">(</span><span class="nv">secure-hash</span> <span class="ss">'sha384</span> <span class="nv">object</span><span class="p">))))</span>
    <span class="c1">;; truncate hash such that hash &lt; n</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &gt;= $2"</span> <span class="ss">'pred</span> <span class="nv">hash</span> <span class="nv">n</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">hash</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 \\ 2"</span> <span class="no">nil</span> <span class="nv">hash</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">rsa--mod-pow</span> <span class="nv">hash</span> <span class="nv">d</span> <span class="nv">n</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">rsa-verify</span> <span class="p">(</span><span class="nv">public-key</span> <span class="nv">object</span> <span class="nv">sig</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">n</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">public-key</span> <span class="ss">:n</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">e</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">public-key</span> <span class="ss">:e</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">hash</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"16#"</span> <span class="p">(</span><span class="nv">secure-hash</span> <span class="ss">'sha384</span> <span class="nv">object</span><span class="p">))))</span>
    <span class="c1">;; truncate hash such that hash &lt; n</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 &gt;= $2"</span> <span class="ss">'pred</span> <span class="nv">hash</span> <span class="nv">n</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="nv">hash</span> <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 \\ 2"</span> <span class="no">nil</span> <span class="nv">hash</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">result</span> <span class="p">(</span><span class="nv">rsa--mod-pow</span> <span class="nv">sig</span> <span class="nv">e</span> <span class="nv">n</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">calc-eval</span> <span class="s">"$1 == $2"</span> <span class="ss">'pred</span> <span class="nv">result</span> <span class="nv">hash</span><span class="p">))))</span>
</code></pre></div></div>

<p>Note the hash truncation step. If this is actually necessary, then
your <code class="language-plaintext highlighter-rouge">n</code> is <em>very</em> easy to factor! It’s in there since this is just a
toy and I want it to work with small keys.</p>

<h3 id="putting-it-all-together">Putting it all together</h3>

<p>Here’s the whole thing in action with an extremely small, 128-bit key.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">message</span> <span class="s">"hello, world!"</span><span class="p">)</span>

<span class="p">(</span><span class="nb">setf</span> <span class="nv">keypair</span> <span class="p">(</span><span class="nv">rsa-generate-keypair</span> <span class="mi">128</span><span class="p">))</span>
<span class="c1">;; =&gt; (:public  (:n "74924929503799951536367992905751084593"</span>
<span class="c1">;;               :e "65537")</span>
<span class="c1">;;     :private (:n "74924929503799951536367992905751084593"</span>
<span class="c1">;;               :d "36491277062297490768595348639394259869"))</span>

<span class="p">(</span><span class="nb">setf</span> <span class="nv">sig</span> <span class="p">(</span><span class="nv">rsa-sign</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">keypair</span> <span class="ss">:private</span><span class="p">)</span> <span class="nv">message</span><span class="p">))</span>
<span class="c1">;; =&gt; "31982247477262471348259501761458827454"</span>

<span class="p">(</span><span class="nv">rsa-verify</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">keypair</span> <span class="ss">:public</span><span class="p">)</span> <span class="nv">message</span> <span class="nv">sig</span><span class="p">)</span>
<span class="c1">;; =&gt; t</span>

<span class="p">(</span><span class="nv">rsa-verify</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">keypair</span> <span class="ss">:public</span><span class="p">)</span> <span class="p">(</span><span class="nv">capitalize</span> <span class="nv">message</span><span class="p">)</span> <span class="nv">sig</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Each of these operations took less than a second. For larger,
secure-length keys, this implementation is painfully slow. For
example, generating a 2048-bit key takes my laptop about half an hour,
and computing a signature with that key (any size message) takes about
a minute. That’s probably a little too slow for, say, signing ELPA
packages.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Counting Processor Cores in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/10/14/"/>
    <id>urn:uuid:dbfba1a0-b3af-356d-4d01-96917d622906</id>
    <updated>2015-10-14T03:17:16Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="c"/><category term="cpp"/>
    <content type="html">
      <![CDATA[<p>One of the great advantages of dependency analysis is parallelization.
Modern processors reorder instructions whose results don’t affect each
other. Compilers reorder expressions and statements to improve
throughput. Build systems know which outputs are inputs for other
targets and can choose any arbitrary build order within that
constraint. This article involves the last case.</p>

<p>The build system I use most often is GNU Make, either directly or
indirectly (Autoconf, CMake). It’s far from perfect, but it does what
I need. I almost always invoke it from within Emacs rather than in a
terminal. In fact, I do it so often that I’ve wrapped Emacs’ <code class="language-plaintext highlighter-rouge">compile</code>
command for rapid invocation.</p>

<p>I recently helped a co-worker set this up for himself, so it had
me thinking about the problem again. The situation <a href="https://github.com/skeeto/.emacs.d">in my
config</a> is much more complicated than it needs to be, so I’ll
share a simplified version instead.</p>

<p>First bring in the usual goodies (we’re going to be making closures):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>
</code></pre></div></div>

<p>We need a couple of configuration variables.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">quick-compile-command</span> <span class="s">"make -k "</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">quick-compile-build-file</span> <span class="s">"Makefile"</span><span class="p">)</span>
</code></pre></div></div>

<p>Then a couple of interactive functions to set these on the fly. It’s
not strictly necessary, but I like giving each a key binding. I also
like having a history available via <code class="language-plaintext highlighter-rouge">read-string</code>, so I can switch
between a couple of different options with ease.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">quick-compile-set-command</span> <span class="p">(</span><span class="nv">command</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">read-string</span> <span class="s">"Command: "</span> <span class="nv">quick-compile-command</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-command</span> <span class="nv">command</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">quick-compile-set-build-file</span> <span class="p">(</span><span class="nv">build-file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">interactive</span>
   <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">read-string</span> <span class="s">"Build file: "</span> <span class="nv">quick-compile-build-file</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-build-file</span> <span class="nv">build-file</span><span class="p">))</span>
</code></pre></div></div>

<p>Now finally to the good part. Below, <code class="language-plaintext highlighter-rouge">quick-compile</code> is a
non-interactive function that returns an interactive closure ready to
be bound to any key I desire. It takes an optional target. This means
I don’t use the above <code class="language-plaintext highlighter-rouge">quick-compile-set-command</code> to choose a target,
only for setting other options. That will make more sense in a moment.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defun</span> <span class="nv">quick-compile</span> <span class="p">(</span><span class="k">&amp;optional</span> <span class="p">(</span><span class="nv">target</span> <span class="s">""</span><span class="p">))</span>
  <span class="s">"Return an interactive function that runs `compile' for TARGET."</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
    <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">save-buffer</span><span class="p">)</span>  <span class="c1">; so I don't get asked</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">default-directory</span>
            <span class="p">(</span><span class="nv">locate-dominating-file</span>
             <span class="nv">default-directory</span> <span class="nv">quick-compile-build-file</span><span class="p">)))</span>
      <span class="p">(</span><span class="k">if</span> <span class="nv">default-directory</span>
          <span class="p">(</span><span class="nb">compile</span> <span class="p">(</span><span class="nv">concat</span> <span class="nv">quick-compile-command</span> <span class="s">" "</span> <span class="nv">target</span><span class="p">))</span>
        <span class="p">(</span><span class="nb">error</span> <span class="s">"Cannot find %s"</span> <span class="nv">quick-compile-build-file</span><span class="p">)))))</span>
</code></pre></div></div>

<p>It traverses up (down?) the directory hierarchy towards root looking
for a Makefile — or whatever is set for <code class="language-plaintext highlighter-rouge">quick-compile-build-file</code>
— then invokes the build system there. I <a href="http://aegis.sourceforge.net/auug97.pdf">don’t believe in recursive
<code class="language-plaintext highlighter-rouge">make</code></a>.</p>
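
<p>To see the upward search in isolation, here’s a small sketch that builds a throwaway directory tree and asks <code class="language-plaintext highlighter-rouge">locate-dominating-file</code> where the nearest Makefile lives. The directory names are made up for the demonstration, and the tree is deleted afterwards.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((root (make-temp-file "quick-compile-demo" t))   ; temporary directory
       (deep (expand-file-name "src/lib/" root)))
  (unwind-protect
      (progn
        (make-directory deep t)
        ;; An empty Makefile at the top of the tree.
        (write-region "" nil (expand-file-name "Makefile" root))
        ;; Start in src/lib/ and walk towards root until a Makefile appears.
        ;; Non-nil when the directory found is the top of the tree.
        (file-equal-p root (locate-dominating-file deep "Makefile")))
    (delete-directory root t)))
</code></pre></div></div>

<p>When no ancestor contains the file, <code class="language-plaintext highlighter-rouge">locate-dominating-file</code> returns <code class="language-plaintext highlighter-rouge">nil</code>, which is the case the <code class="language-plaintext highlighter-rouge">error</code> branch above handles.</p>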

<p>So how do I put this to use? I clobber some key bindings I don’t
otherwise care about. A better choice might be the F-keys, but my
muscle memory is already committed elsewhere.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x c"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span><span class="p">))</span> <span class="c1">; default target</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x C"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"clean"</span><span class="p">))</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x t"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"test"</span><span class="p">))</span>
<span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-x r"</span><span class="p">)</span> <span class="p">(</span><span class="nv">quick-compile</span> <span class="s">"run"</span><span class="p">))</span>
</code></pre></div></div>

<p>Each of those invokes a different target without second guessing me.
Let me tell you, having “clean” at the tip of my fingers is wonderful.</p>

<h3 id="parallel-builds">Parallel Builds</h3>

<p>An extension common to many different <code class="language-plaintext highlighter-rouge">make</code> programs is <code class="language-plaintext highlighter-rouge">-j</code>, which
asks <code class="language-plaintext highlighter-rouge">make</code> to build targets in parallel where possible. These days
when multi-core machines are the norm, you nearly always want to use
this option, ideally set to the number of logical processor cores on
your system. It’s a huge time-saver.</p>

<p>My recent revelation was that my default build command could be
better: <code class="language-plaintext highlighter-rouge">make -k</code> is minimal. It should at least include <code class="language-plaintext highlighter-rouge">-j</code>, but
choosing an argument (number of processor cores) is a problem. Today I
use different machines with 2, 4, or 8 cores, so most of the time any
given number will be wrong. I could use a per-system configuration,
but I’d rather not. Unfortunately GNU Make will not automatically
detect the number of cores. That leaves the matter up to Emacs Lisp.</p>

<p>Emacs doesn’t currently have a built-in function that returns the
number of processor cores. I’ll need to reach into the operating
system to figure it out. My usual development environments are Linux,
Windows, and OpenBSD, so my solution should work on each. I’ve ranked
them by order of importance.</p>

<h4 id="number-of-cores-on-linux">Number of cores on Linux</h4>

<p>Linux has the <code class="language-plaintext highlighter-rouge">/proc</code> virtual filesystem in the fashion of Plan 9,
allowing different aspects of the system to be explored through the
standard filesystem API. The relevant file here is <code class="language-plaintext highlighter-rouge">/proc/cpuinfo</code>,
listing useful information about each of the system’s processors. To
get the number of processors, count the number of processor entries in
this file. I’ve wrapped it in a <code class="language-plaintext highlighter-rouge">file-exists-p</code> check so that it returns
<code class="language-plaintext highlighter-rouge">nil</code> on other operating systems instead of throwing an error.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nv">file-exists-p</span> <span class="s">"/proc/cpuinfo"</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"/proc/cpuinfo"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">how-many</span> <span class="s">"^processor[[:space:]]+:"</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="number-of-cores-on-windows">Number of cores on Windows</h4>

<p>When I was first researching how to do this on Windows, I thought I
would need to invoke the <code class="language-plaintext highlighter-rouge">wmic</code> command line program and hope the
output could be parsed the same way on different versions of the
operating system and tool. However, it turns out the solution for
Windows is trivial. The environment variable <code class="language-plaintext highlighter-rouge">NUMBER_OF_PROCESSORS</code>
gives every process the answer for free. Being an environment
variable, it will need to be parsed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">number-of-processors</span> <span class="p">(</span><span class="nv">getenv</span> <span class="s">"NUMBER_OF_PROCESSORS"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">when</span> <span class="nv">number-of-processors</span>
    <span class="p">(</span><span class="nv">string-to-number</span> <span class="nv">number-of-processors</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="number-of-cores-on-bsd">Number of cores on BSD</h4>

<p>This seems to work the same across all the BSDs, including OS X,
though I haven’t yet tested it exhaustively. Invoke <code class="language-plaintext highlighter-rouge">sysctl</code>, which
returns an undecorated number to be parsed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nb">ignore-errors</span>
    <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">zerop</span> <span class="p">(</span><span class="nv">call-process</span> <span class="s">"sysctl"</span> <span class="no">nil</span> <span class="no">t</span> <span class="no">nil</span> <span class="s">"-n"</span> <span class="s">"hw.ncpu"</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">string-to-number</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Also not complicated, but it’s the heaviest solution of the three.</p>

<h3 id="putting-it-all-together">Putting it all together</h3>

<p>Join all these together with <code class="language-plaintext highlighter-rouge">or</code>, call it <code class="language-plaintext highlighter-rouge">numcores</code>, and ta-da.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">quick-compile-command</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"make -kj%d"</span> <span class="p">(</span><span class="nv">numcores</span><span class="p">)))</span>
</code></pre></div></div>

<p>Now <code class="language-plaintext highlighter-rouge">make</code> is invoked correctly on any system by default.</p>
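
<p>For reference, here’s one way the pieces might be joined, as a sketch rather than the exact definition from my config. Two details are assumptions added for the example: the result is checked to be a positive integer, since a count of zero is still non-nil in Lisp and would short-circuit the <code class="language-plaintext highlighter-rouge">or</code>, and the function falls back to 1 when nothing works.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun numcores ()
  "Return the number of logical processor cores, or 1 if unknown."
  (let ((count
         (or
          ;; Linux: count the processor entries in /proc/cpuinfo.
          (when (file-exists-p "/proc/cpuinfo")
            (with-temp-buffer
              (insert-file-contents "/proc/cpuinfo")
              (how-many "^processor[[:space:]]+:")))
          ;; Windows: the environment already has the answer.
          (let ((number-of-processors (getenv "NUMBER_OF_PROCESSORS")))
            (when number-of-processors
              (string-to-number number-of-processors)))
          ;; BSD and OS X: ask sysctl.
          (with-temp-buffer
            (ignore-errors
              (when (zerop (call-process "sysctl" nil t nil "-n" "hw.ncpu"))
                (string-to-number (buffer-string))))))))
    ;; Assumed fallback: anything that isn't a positive integer becomes 1.
    (if (and (integerp count) (&gt; count 0))
        count
      1)))
</code></pre></div></div>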

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>NASM x86 Assembly Major Mode for Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2015/04/19/"/>
    <id>urn:uuid:6966e5d3-9e81-3fc0-5d47-eeb29677f7da</id>
    <updated>2015-04-19T02:38:23Z</updated>
    <category term="emacs"/><category term="x86"/>
    <content type="html">
      <![CDATA[<p>Last weekend I created a new Emacs mode, <a href="https://github.com/skeeto/nasm-mode"><strong>nasm-mode</strong></a>,
for editing <a href="http://www.nasm.us/">Netwide Assembler</a> (NASM) x86 assembly programs.
Over the past week I tweaked it until it felt comfortable enough to
share on <a href="http://melpa.org/">MELPA</a>. It’s got what you’d expect from a standard
Emacs programming language mode: syntax highlighting, automatic
indentation, and imenu support. It’s not a full parser, but it knows
all of NASM’s instructions and directives.</p>

<p>Until recently I didn’t really have preferences about x86 assemblers
(<a href="https://www.gnu.org/software/binutils/">GAS</a>, NASM, <a href="http://yasm.tortall.net/">YASM</a>, <a href="http://flatassembler.net/">FASM</a>, MASM, etc.) or syntax
(Intel, AT&amp;T). I stuck to the GNU Assembler (GAS) since it’s already
there with all the other GNU development tools I know and love, and
it’s required for inline assembly in GCC. However, nasm-mode now marks
my commitment to NASM as my primary x86 assembler.</p>

<h3 id="why-nasm">Why NASM?</h3>

<p>I need an assembler that can assemble 16-bit code (8086, 8088, 80186,
80286), because <a href="/blog/2014/12/09/">real mode is fun</a>. Despite its <code class="language-plaintext highlighter-rouge">.code16gcc</code>
directive, GAS is not suitable for this purpose. It’s <em>just</em> enough to
get the CPU into protected mode — as needed when writing an operating
system with GCC — and that’s it. A different assembler is required
for serious 16-bit programming.</p>

<p><a href="http://x86asm.net/articles/what-i-dislike-about-gas/">GAS syntax has problems</a>. I’m not talking about the argument
order (source first or destination first), since there’s no right
answer to that one. The linked article covers a number of problems,
with these being the big ones for me:</p>

<ul>
  <li>
    <p>The use of <code class="language-plaintext highlighter-rouge">%</code> sigils on all registers is tedious. I’m sure it’s
handy when generating code, where it becomes a register namespace,
but it’s annoying to write.</p>
  </li>
  <li>
    <p>Integer constants are an easy source of bugs. Forget the <code class="language-plaintext highlighter-rouge">$</code> and
suddenly you’re doing absolute memory access, which is a poor
default. NASM simplifies this by using brackets <code class="language-plaintext highlighter-rouge">[]</code> for all such
“dereferences.”</p>
  </li>
  <li>
    <p>GAS cannot produce pure binaries — raw machine code without any
headers or container (ELF, COFF, PE). Pure binaries are useful for
developing <a href="http://www.vividmachines.com/shellcode/shellcode.html">shellcode</a>, bootloaders, 16-bit COM programs,
and <a href="/blog/2015/03/19/">just-in-time compilers</a>.</p>
  </li>
</ul>

<p>Being a portable assembler, GAS is the jack of all instruction sets,
master of none. If I’m going to write a lot of x86 assembly, I want a
tool specialized for the job.</p>

<h4 id="yasm">YASM</h4>

<p>I also looked at YASM, a rewrite of NASM. It supports 16-bit assembly
and mostly uses NASM syntax. In my research I found that NASM used to
lag behind in features due to slower development, which is what
spawned YASM. In recent years this seems to have flipped around, with
YASM lagging behind. If you’re using YASM, nasm-mode should work
pretty well for you, since it’s still very similar.</p>

<p>YASM optionally supports GAS syntax, but this reintroduces almost all
of GAS’s problems. Even YASM’s improvements (i.e. its <code class="language-plaintext highlighter-rouge">ORG</code> directive)
become broken when switching to GAS syntax.</p>

<h4 id="fasm">FASM</h4>

<p>FASM is the “flat assembler,” an assembler written in assembly
language. This means it’s only available on x86 platforms. While I
don’t really plan on developing x86 assembly on a Raspberry Pi, I’d
rather not limit my options! I already regard 16-bit DOS programming
as a form of embedded programming, and this may very well extend to
the rest of x86 someday.</p>

<p>Also, it hasn’t made its way into the various Linux distribution
package repositories, including Debian, so it’s already at a
disadvantage for me.</p>

<h4 id="masm">MASM</h4>

<p>This is Microsoft’s assembler that comes with Visual Studio. Windows
only and not open source, this is in no way a serious consideration.
But since NASM’s syntax was originally derived from MASM, it’s worth
mentioning. NASM takes the good parts of MASM and <a href="https://courses.engr.illinois.edu/ece390/archive/mp/f99/mp5/masm_nasm.html">fixes the
mistakes</a> (such as the <code class="language-plaintext highlighter-rouge">offset</code> operator). It’s different enough
that nasm-mode would not work well with MASM.</p>

<h4 id="nasm">NASM</h4>

<p>It’s not perfect, but it’s got an <a href="http://www.nasm.us/doc/">excellent manual</a>, it’s a
solid program that does exactly what it says it will do, it has a
powerful macro system and great 16-bit support, it’s highly portable and
easy to build, and its semantics and syntax have been carefully considered. It
also comes with a simple, pure binary disassembler (<code class="language-plaintext highlighter-rouge">ndisasm</code>). In
retrospect it seems like an obvious choice!</p>

<p>My one complaint would be that it’s <em>too</em> flexible about
labels. The colon on labels is optional, which can lead to subtle
bugs. NASM will warn about this under some conditions (orphan-labels).
Combined with the preprocessor, the difference between a macro and a
label is ambiguous, short of re-implementing the entire preprocessor
in Emacs Lisp.</p>

<h3 id="why-nasm-mode">Why nasm-mode?</h3>

<p>Emacs comes with an <code class="language-plaintext highlighter-rouge">asm-mode</code> for editing assembly code for various
architectures. Unfortunately it’s another jack-of-all-trades that’s
not very good. Moreover, it doesn’t follow Emacs’ normal editing
conventions, having unusual automatic indentation and self-insertion
behaviors. It’s what prompted me to make nasm-mode.</p>

<p>To be fair, I don’t think it’s possible to write a major mode that
covers many different instruction set architectures. Each architecture
has its own quirks and oddities that essentially give it a
unique language. This is especially true with x86, which, from its
37-year tenure touched by so many different vendors, comes in a number of
incompatible flavors. Each assembler/architecture pair needs its own
major mode. I hope I just wrote NASM’s.</p>

<p>One area where I’m still stuck is that I can’t find an x86 style
guide. It’s easy to find half a dozen style guides of varying
authority for any programming language that’s more than 10 years old
… except x86. There’s no obvious answer when it comes to automatic
indentation. How are comments formatted and indented? How are
instructions aligned? Should labels be on the same line as the
instruction? Should labels require a colon? (I’ve decided this is
“yes.”) What about long label names? How are function
prototypes/signatures documented? (The mode could take advantage of
such a standard, a la ElDoc.) It seems everyone uses their own style.
This is another conundrum for a generic asm-mode.</p>

<p>There are a couple of <a href="http://matthieuhauglustaine.blogspot.com/2011/08/nasm-mode-for-emacs.html">other nasm-modes</a> floating around with
different levels of completeness. Mine should supersede these, and
will be much easier to maintain into the future as NASM evolves.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Autotetris Mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/10/19/"/>
    <id>urn:uuid:e76556be-ebeb-3f65-7041-bffbe2e19952</id>
    <updated>2014-10-19T21:45:53Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="interactive"/>
    <content type="html">
      <![CDATA[<p>For more than a decade now, Emacs has come with a built-in Tetris
clone, originally written by XEmacs’ Glynn Clements. Just run <code class="language-plaintext highlighter-rouge">M-x
tetris</code> any time you want to play. For anyone too busy to waste time
playing Tetris, earlier this year I wrote an autotetris-mode that will
play the Emacs game automatically.</p>

<ul>
  <li><a href="https://github.com/skeeto/autotetris-mode">https://github.com/skeeto/autotetris-mode</a></li>
</ul>

<p>Load the source, <code class="language-plaintext highlighter-rouge">autotetris-mode.el</code>, and run <code class="language-plaintext highlighter-rouge">M-x autotetris</code>. It will
start the built-in Tetris but make all the moves itself. It works best
when byte compiled.</p>

<p><img src="/img/diagram/tetris/screenshot.png" alt="" /></p>

<p>At the time I had read <a href="http://www.cs.cornell.edu/boom/1999sp/projects/tetris/">an article</a> and was interested in trying
my hand at my own Tetris AI. Like most things Emacs, the built-in
Tetris game is very hackable. It’s also pretty simple and easy to
understand. Rather than write my own I chose to build upon this one.</p>

<h3 id="heuristics">Heuristics</h3>

<p>It’s not a particularly strong AI. It doesn’t pay attention to the
next piece in queue, it doesn’t know the game’s basic shapes, and it
doesn’t try to maximize the score (clearing multiple rows at once).
The goal is to continue running for as long as possible. But since
it’s able to get to the point where the game is so fast that the AI is
unable to move pieces fast enough (it’s rate limited like a human
player), that means it’s good enough.</p>

<p>When a new piece appears at the top of the screen, the AI, in memory,
tries placing it in all possible positions and all possible
orientations. For each of these positions it runs a heuristic on the
resulting game state, summing five metrics. Each metric is scaled by a
hand-tuned weight to adjust its relative priority. Smaller is better,
so the position with the lowest score is selected.</p>
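
<p>As a rough illustration of that selection step, here’s a minimal
sketch. It’s not the code from autotetris-mode: the names
<code class="language-plaintext highlighter-rouge">demo-score</code> and <code class="language-plaintext highlighter-rouge">demo-best-placement</code>, the candidate format, and
the weights are all made up for the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun demo-score (metrics weights)
  "Return the weighted sum of METRICS, scaling each by WEIGHTS."
  (apply #'+ (cl-mapcar #'* metrics weights)))

(defun demo-best-placement (candidates weights)
  "Return the lowest-scoring element of CANDIDATES.
Each candidate is a list (PLACEMENT METRIC...)."
  (let ((best nil)
        (best-score nil))
    (dolist (candidate candidates best)
      (let ((score (demo-score (cdr candidate) weights)))
        (when (or (null best-score) (&lt; score best-score))
          (setq best candidate
                best-score score))))))

;; Hypothetical metrics, in order: holes, max height, mean height,
;; disparity, roughness. The weights are invented, not the tuned ones.
(car (demo-best-placement '((left  2 5 3.0 4 1.5)
                            (right 0 6 3.5 5 2.0))
                          '(4 1 1 1 1)))
;; =&gt; right
</code></pre></div></div>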

<h4 id="number-of-holes">Number of Holes</h4>

<p><img src="/img/diagram/tetris/holes.png" alt="" /></p>

<p>A hole is any open space that has a solid block above it, even if that
hole is accessible without passing through a solid block. Count these
holes.</p>
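
<p>To make that definition concrete, here’s a small sketch that
assumes a board stored as a list of columns, each column a list of
cells from top to bottom with <code class="language-plaintext highlighter-rouge">t</code> for a solid block. That
representation and the <code class="language-plaintext highlighter-rouge">demo-</code> names are my own invention for this
example, not how tetris-mode stores its grid.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-column-holes (column)
  "Count the empty cells in COLUMN that have a solid block above them.
COLUMN lists its cells from top to bottom; non-nil means solid."
  (let ((seen-block nil)
        (holes 0))
    (dolist (cell column holes)
      (cond (cell (setq seen-block t))
            (seen-block (setq holes (1+ holes)))))))

(defun demo-count-holes (board)
  "Sum the holes over every column of BOARD."
  (apply #'+ (mapcar #'demo-column-holes board)))

(demo-count-holes '((nil t   nil t   nil)
                    (nil nil t   t   t)
                    (t   nil nil nil nil)))
;; =&gt; 6
</code></pre></div></div>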

<h4 id="maximum-height">Maximum Height</h4>

<p><img src="/img/diagram/tetris/height.png" alt="" /></p>

<p>Add the height of the tallest column. Column height includes any holes
in the column. The game ends when a column touches the top of the
screen (or something like that), so this should be kept in check.</p>

<h4 id="mean-height">Mean Height</h4>

<p><img src="/img/diagram/tetris/mean.png" alt="" /></p>

<p>Add the mean height of all columns. The higher this is, the closer we
are to losing the game. Since each row will have at least one hole,
this will be a similar measure to the hole count.</p>

<h4 id="height-disparity">Height Disparity</h4>

<p><img src="/img/diagram/tetris/disparity.png" alt="" /></p>

<p>Add the difference between the shortest column height and the tallest
column height. If this number is large it means we’re not making
effective use of the playing area. It also discourages the AI from
getting into that annoying situation we all remember: when you
<em>really</em> need a 4x1 piece that never seems to come. Those are the
brief moments when I truly believe the version I’m playing has to be
rigged.</p>

<h4 id="surface-roughness">Surface Roughness</h4>

<p><img src="/img/diagram/tetris/surface.png" alt="" /></p>

<p>Take the root mean square of the column heights. A rougher surface
leaves fewer options when placing pieces. This measure will be similar
to the disparity measurement.</p>
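
<p>The four height-based metrics are all simple functions of the
column heights. Here’s a sketch that takes a plain list of heights,
which is an assumption of the example (as is the name
<code class="language-plaintext highlighter-rouge">demo-height-metrics</code>), and computes them as described above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-height-metrics (heights)
  "Return a plist of height metrics for HEIGHTS, a list of numbers."
  (let* ((count (float (length heights)))
         (tallest (apply #'max heights))
         (shortest (apply #'min heights))
         (squares (mapcar (lambda (h) (* h h)) heights)))
    (list :max tallest
          :mean (/ (apply #'+ heights) count)
          :disparity (- tallest shortest)
          :rms (sqrt (/ (apply #'+ squares) count)))))

(demo-height-metrics '(3 4 0 4 3))
;; max 4, mean 2.8, disparity 4, RMS about 3.16
</code></pre></div></div>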

<h3 id="emacs-specific-details">Emacs-specific Details</h3>

<p>With a position selected, the AI sends player inputs at a limited rate
to the game itself, moving the piece into place. This is done by
calling <code class="language-plaintext highlighter-rouge">tetris-move-right</code>, <code class="language-plaintext highlighter-rouge">tetris-move-left</code>, and
<code class="language-plaintext highlighter-rouge">tetris-rotate-next</code>, which, in the normal game, are bound to the
arrow keys.</p>

<p>The built-in tetris-mode isn’t quite designed for this kind of
extension, so it needs a little bit of help. I defined two pieces of
advice to create hooks. These hooks alert my AI to two specific events
in the game: the game start and a fresh, new piece.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">tetris-new-shape</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">autotetris-new-shape-hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'autotetris-new-shape-hook</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">tetris-start-game</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">autotetris-start-game-hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'autotetris-start-game-hook</span><span class="p">))</span>
</code></pre></div></div>
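
<p>On Emacs 24.4 and later the same kind of hook can be built with
<code class="language-plaintext highlighter-rouge">advice-add</code>. The sketch below uses a stand-in function,
<code class="language-plaintext highlighter-rouge">demo-spawn</code>, in place of the real game functions so that it runs on
its own. Every <code class="language-plaintext highlighter-rouge">demo-</code> name is hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar demo-spawn-hook nil
  "Hypothetical hook run after `demo-spawn' returns.")
(defvar demo-spawn-count 0
  "Number of times the hook has fired.")

(defun demo-spawn ()
  "Stand-in for a game function such as `tetris-new-shape'."
  'piece)

(defun demo-run-spawn-hook (&amp;rest _args)
  "Run `demo-spawn-hook', ignoring the advised function's arguments."
  (run-hooks 'demo-spawn-hook))

;; Fire the hook every time the advised function returns.
(advice-add 'demo-spawn :after #'demo-run-spawn-hook)

;; A listener, playing the role of the AI.
(add-hook 'demo-spawn-hook
          (lambda () (setq demo-spawn-count (1+ demo-spawn-count))))

(demo-spawn)
demo-spawn-count
;; =&gt; 1
</code></pre></div></div>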

<p>I talked before about <a href="/blog/2014/10/12/">the problems with global state</a>.
Fortunately, tetris-mode doesn’t store any game state in global
variables. It stores everything in buffer-local variables, which can
be exploited for use in the AI. To perform the “in memory” heuristic
checks, it creates a copy of the game state and manipulates the copy.
The copy is made by way of <code class="language-plaintext highlighter-rouge">clone-buffer</code> on the <code class="language-plaintext highlighter-rouge">*Tetris*</code> buffer.
The tetris-mode functions all work equally well on the clone, so I
can use the existing game rules to properly place the next piece in
each available position. The game’s own rules take care of clearing
rows and checking for collisions for me. I wrote an
<code class="language-plaintext highlighter-rouge">autotetris-save-excursion</code> macro to handle the messy details.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmacro</span> <span class="nv">autotetris-save-excursion</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="s">"Restore tetris game state after BODY completes."</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="o">`</span><span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">tetris-buffer-name</span>
     <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">autotetris-saved</span> <span class="p">(</span><span class="nv">clone-buffer</span> <span class="s">"*Tetris-saved*"</span><span class="p">)))</span>
       <span class="p">(</span><span class="k">unwind-protect</span>
           <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">autotetris-saved</span>
             <span class="p">(</span><span class="nv">kill-local-variable</span> <span class="ss">'kill-buffer-hook</span><span class="p">)</span>
             <span class="o">,@</span><span class="nv">body</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">kill-buffer</span> <span class="nv">autotetris-saved</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">kill-buffer-hook</code> variable is also cloned, but I don’t want
tetris-mode to respond to the clone being killed, so I clear out the
hook.</p>
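
<p>The trick relies on <code class="language-plaintext highlighter-rouge">clone-buffer</code> copying buffer-local variables
into the new buffer. Here’s a stripped-down sketch of that behavior
using a temporary buffer and a made-up variable,
<code class="language-plaintext highlighter-rouge">demo-rows-cleared</code>, rather than the real game state.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar demo-rows-cleared nil
  "Hypothetical piece of buffer-local game state.")

(with-temp-buffer
  (setq-local demo-rows-cleared 10)
  (let ((copy (clone-buffer "*demo-clone*")))
    (unwind-protect
        (with-current-buffer copy
          (setq demo-rows-cleared 99))  ; changes only the clone
      (kill-buffer copy)))
  demo-rows-cleared)                    ; the original is untouched
;; =&gt; 10
</code></pre></div></div>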

<p>That’s basically all there is to it! While watching, it feels like it’s
making dumb mistakes, not placing pieces in optimal positions, but it
recovers well from these situations almost every time, so it must know
what it’s doing. Currently it’s a better player than me, which is <a href="/blog/2011/08/24/">my
rule-of-thumb</a> for calling an AI successful.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Unicode Pitfalls</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/06/13/"/>
    <id>urn:uuid:1ebd6db9-a40e-3433-dc30-192cf133b2f0</id>
    <updated>2014-06-13T05:58:34Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>GNU Emacs is seven years older than Unicode. Support for Unicode had
to be added relatively late in Emacs’ existence. This means Emacs has
existed longer without Unicode support (16 years) than with it (14
years). Despite this, Emacs has excellent Unicode support. It feels as
if it was there the whole time.</p>

<p>However, as a natural result of Unicode covering all sorts of edge
cases for every known human language, there are pitfalls and
complications. As a <em>user</em> of Emacs, you’re not particularly affected
by these, but extension developers might run into trouble while
handling Emacs character-oriented data structures: strings and
buffers.</p>

<p>In this article I’ll go over Elisp’s Unicode surprises. I’ve been
caught by some of these myself. In fact, as a result of writing this
article, I’ve discovered subtle encoding bugs in some of my own
extensions. None of these pitfalls are Emacs’ fault. They’re just the
result of complexities of natural language.</p>

<h3 id="unicode-and-code-points">Unicode and Code Points</h3>

<p>First, there are excellent materials online for learning Unicode. I
recommend starting with <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">UTF-8 and Unicode FAQ for Unix/Linux</a>.
There’s no reason for me to repeat all this information here, but I’ll
attempt to quickly summarize it.</p>

<p>Unicode maps <em>code points</em> (integers) to specific characters, along
with a standard name. As of this writing, Unicode defines over 110,000
characters. For backwards compatibility, the first 128 code points are
mapped to ASCII. This trend continues for other character standards,
like Latin-1.</p>

<p>In Emacs, Unicode characters are entered into a buffer with <code class="language-plaintext highlighter-rouge">C-x 8
RET</code> (<code class="language-plaintext highlighter-rouge">insert-char</code>). You can enter either the official name of the
character (e.g. “GREEK SMALL LETTER PI” for π) or the hexadecimal code
point. Outside of Emacs it depends on the application, but <code class="language-plaintext highlighter-rouge">C-S-u</code>
followed by the hexadecimal code works for most of the applications I
care about.</p>

<h4 id="encodings">Encodings</h4>

<p>The Unicode standard also describes several methods for encoding
sequences of code points into sequences of bytes. Obviously a selection
of 110,000 characters cannot be encoded with one byte per letter, so
these are multibyte encodings. The two most popular encodings are
probably UTF-8 and UTF-16.</p>

<p>UTF-8 was designed to be backwards compatible with ASCII, Unix, and
existing C APIs (null-terminated C strings). The first 128 code points
are encoded directly as a single byte. Every other character is
encoded with two to four bytes, with the highest bit of each byte set
to 1. This ensures that no part of a multibyte character will be
interpreted as ASCII, nor will it contain a null (0). The latter means
that C programs and C APIs can handle UTF-8 strings with few or no
changes. Most importantly, every ASCII encoded file is automatically a
UTF-8 encoded file.</p>
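
<p>A quick way to see the variable-length encoding from Emacs Lisp is
to encode a few one-character strings and count the bytes. This is
only an illustration using <code class="language-plaintext highlighter-rouge">encode-coding-string</code>; the sample
characters are arbitrary picks.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; "a", GREEK SMALL LETTER PI, EURO SIGN, PILE OF POO
(mapcar (lambda (s) (length (encode-coding-string s 'utf-8)))
        '("a" "\u03C0" "\u20AC" "\U0001F4A9"))
;; =&gt; (1 2 3 4)

(encode-coding-string "\u03C0" 'utf-8)
;; =&gt; "\317\200"
</code></pre></div></div>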

<p>UTF-16 encodes all the characters from the <em>Basic Multilingual Plane</em>
(BMP) with two bytes. Even the original ASCII characters get two bytes
(<em>16</em> bits). The BMP covers virtually all modern languages and is
generally all you’ll ever practically need. However, this doesn’t
include the important <a href="http://www.fileformat.info/info/unicode/char/1f379/index.htm">TROPICAL DRINK</a> or <a href="http://www.fileformat.info/info/unicode/char/1F4A9/index.htm">PILE OF POO</a>
characters from the supplemental (“astral”) plane. If you need to use
these characters in UTF-16, you’re going to run into problems:
characters outside the BMP don’t fit in two bytes. To accommodate
these characters, UTF-16 uses <em>surrogate pairs</em>: these characters are
encoded with two 16-bit units.</p>
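
<p>The surrogate pair arithmetic is short enough to sketch. The
function name <code class="language-plaintext highlighter-rouge">demo-surrogate-pair</code> is made up for this example, and it
does no validation of its argument.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-surrogate-pair (code-point)
  "Return the UTF-16 surrogate pair for CODE-POINT as a list.
CODE-POINT must lie outside the BMP (#x10000 to #x10FFFF)."
  (let ((offset (- code-point #x10000)))
    (list (+ #xD800 (ash offset -10))          ; high: top 10 bits
          (+ #xDC00 (logand offset #x3FF)))))  ; low: bottom 10 bits

;; PILE OF POO, U+1F4A9
(mapcar (lambda (unit) (format "%04X" unit))
        (demo-surrogate-pair #x1F4A9))
;; =&gt; ("D83D" "DCA9")
</code></pre></div></div>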

<p>Because of this last point, <strong>UTF-16 offers no practical advantages
over UTF-8</strong>. Its <a href="http://www.utf8everywhere.org/">existence was probably a big mistake</a>. You
can’t do constant-time character lookup because you have to scan for
surrogate pairs. It’s not backwards compatible and cannot be stored in
null-terminated strings. In both Java and JavaScript, it leads to the
awkward situation where the “length” of a string is not the number of
characters, code points, or even bytes. Worst of all, <a href="https://speakerdeck.com/mathiasbynens/hacking-with-unicode?slide=114">it has serious
security implications</a>. New applications should avoid it
whenever possible.</p>

<h3 id="emacs-and-utf-8">Emacs and UTF-8</h3>

<p><strong>Emacs internally stores all text as UTF-8.</strong> This was an excellent
choice! When text leaves Emacs, such as writing to a file or to a
process, Emacs automatically converts it to the coding system
configured for that particular file or process. When it accepts text
from a file or process, it either converts it to UTF-8 or preserves it
as raw bytes.</p>

<p>There are two modes for this in Emacs: unibyte and multibyte. Unibyte
strings/buffers are just raw bytes. They have constant, O(1), access
time but can only hold single-byte values. The <a href="/blog/2014/01/04/">byte-code compiler
outputs unibyte strings</a>.</p>

<p>Multibyte strings/buffers hold UTF-8 encoded code points. Character
access is O(n) because the string/buffer has to be scanned to count
characters.</p>

<p>The actual encoding is rarely relevant because there’s little way (and
need) to access it directly. Emacs automatically converts text as
needed when it leaves Emacs and arrives in Emacs, so there’s no need
to know the internal encoding. If you <em>really</em> want to see it anyway,
you can use <code class="language-plaintext highlighter-rouge">string-as-unibyte</code> to get a copy of a string with the
exact same bytes, but as a byte-string.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">string-as-unibyte</span> <span class="s">"π"</span><span class="p">)</span>
<span class="c1">;; =&gt; "\317\200"</span>
</code></pre></div></div>

<p>This can be reversed with <code class="language-plaintext highlighter-rouge">string-as-multibyte</code>, which changes a unibyte
string holding UTF-8 encoded text back into a multibyte string. Note
that these functions are different than <code class="language-plaintext highlighter-rouge">string-to-unibyte</code> and
<code class="language-plaintext highlighter-rouge">string-to-multibyte</code>, which will attempt a conversion rather than
preserving the raw bytes.</p>

<p>The <code class="language-plaintext highlighter-rouge">length</code> and <code class="language-plaintext highlighter-rouge">buffer-size</code> functions always count characters in
multibyte and bytes in unibyte. Being UTF-8, there are no surrogate
pairs to worry about here. The <code class="language-plaintext highlighter-rouge">string-bytes</code> and <code class="language-plaintext highlighter-rouge">position-bytes</code>
functions return byte information for both multibyte and unibyte.</p>
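
<p>Here’s a small comparison of the two counts, using the same word
as a multibyte string and as its UTF-8 encoded unibyte form. It’s just
an illustration. Each triple below is multibyte-ness, length, and byte
count.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((multi "na\u00EFve")
       (uni (encode-coding-string multi 'utf-8)))
  (list (multibyte-string-p multi) (length multi) (string-bytes multi)
        (multibyte-string-p uni) (length uni) (string-bytes uni)))
;; =&gt; (t 5 6 nil 6 6)
</code></pre></div></div>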

<p>To specify a Unicode character in a string literal without using the
character directly, use <code class="language-plaintext highlighter-rouge">\uXXXX</code>. The <code class="language-plaintext highlighter-rouge">XXXX</code> is the hexadecimal code
point for the character and is always 4 digits long. For characters
outside the BMP, which won’t fit in four digits, use a capital U with
eight digits: <code class="language-plaintext highlighter-rouge">\UXXXXXXXX</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">"\u03C0"</span>
<span class="c1">;; =&gt; "π"</span>

<span class="s">"\U0001F4A9"</span>
<span class="c1">;; =&gt; "💩"  (PILE OF POO)</span>
</code></pre></div></div>

<p>Finally, Emacs extends Unicode with 128 additional “characters”
representing the raw bytes #x80 through #xFF. This allows raw bytes to
be embedded distinctly within UTF-8 sequences. For example, it’s used
to distinguish the code point U+00E9 from the raw byte #xE9. As far as
I can tell, this isn’t used very often.</p>
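
<p>For the curious, here’s a sketch of what one of these raw-byte
characters looks like from Lisp, using the byte #xE9 as an arbitrary
example. The raw byte and the Unicode character U+00E9 each make a
one-character string, yet the two strings aren’t equal.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(unibyte-char-to-multibyte #xE9)
;; =&gt; 4194281  (#x3FFFE9, not #xE9)

(let ((raw (string-to-multibyte "\351"))  ; raw byte #xE9
      (text "\u00E9"))                    ; U+00E9
  (list (length raw) (length text) (string= raw text)))
;; =&gt; (1 1 nil)
</code></pre></div></div>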

<h3 id="combining-characters">Combining Characters</h3>

<p>Some Unicode characters are defined as <em>combining characters</em>. These
characters modify the non-combining character that appears before them,
typically with accents or diacritical marks.</p>

<p>For example, the word “naïve” can be written as <em>six</em> characters as
<code class="language-plaintext highlighter-rouge">"nai\u0308ve"</code>. The fourth character, U+0308 (COMBINING DIAERESIS),
is a combining character that changes the “i” (U+0069 LATIN SMALL
LETTER I) into an umlaut character.</p>

<p>The most commonly accented characters have a code of their own. These
are called <em>precomposed characters</em>. This includes ï (U+00EF LATIN
SMALL LETTER I WITH DIAERESIS). This means “naïve” can also be written
as <em>five</em> characters as <code class="language-plaintext highlighter-rouge">"na\u00EFve"</code>.</p>

<h4 id="normalization">Normalization</h4>

<p>So what happens when comparing two different representations of the
same text? They’re not equal.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">string=</span> <span class="s">"nai\u0308ve"</span> <span class="s">"na\u00EFve"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>To deal with situations like this, the Unicode standard defines four
different kinds of normalization. The two most important ones are NFC
(composition) and NFD (decomposition). The former uses precomposed
characters whenever possible and the latter breaks them apart. The
functions <code class="language-plaintext highlighter-rouge">ucs-normalize-NFC-string</code> and <code class="language-plaintext highlighter-rouge">ucs-normalize-NFD-string</code>
perform this operation.</p>

<p>Pitfall #1: <strong>Proper string comparison requires normalization.</strong> It
doesn’t matter which normalization you use (though NFD should be
slightly faster), you just need to use it consistently. Unfortunately
this can get tricky when using <code class="language-plaintext highlighter-rouge">equal</code> to compare complex data
structures with multiple strings.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">string=</span> <span class="p">(</span><span class="nv">ucs-normalize-NFD-string</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
         <span class="p">(</span><span class="nv">ucs-normalize-NFD-string</span> <span class="s">"na\u00EFve"</span><span class="p">))</span>
<span class="c1">;; =&gt; t</span>
</code></pre></div></div>
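
<p>For the <code class="language-plaintext highlighter-rouge">equal</code> case, one option is to normalize every string in a
structure before comparing. This is only a sketch:
<code class="language-plaintext highlighter-rouge">demo-normalize-tree</code> is a hypothetical helper that handles strings,
conses, and vectors and leaves everything else alone.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'ucs-normalize)

(defun demo-normalize-tree (object)
  "Return a copy of OBJECT with every string NFC-normalized."
  (cond ((stringp object)
         (ucs-normalize-NFC-string object))
        ((consp object)
         (cons (demo-normalize-tree (car object))
               (demo-normalize-tree (cdr object))))
        ((vectorp object)
         (vconcat (mapcar #'demo-normalize-tree object)))
        (t object)))

(equal (demo-normalize-tree '(:name "nai\u0308ve" :tags ["re\u0301sume\u0301"]))
       (demo-normalize-tree '(:name "na\u00EFve" :tags ["r\u00E9sum\u00E9"])))
;; =&gt; t
</code></pre></div></div>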

<p>Emacs itself fails to do this. It doesn’t normalize strings before
interning them, which is probably a mistake. This means you can have
differently defined variables and functions with the same canonical
name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nb">intern</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">intern</span> <span class="s">"na\u00EFve"</span><span class="p">))</span>
<span class="c1">;; =&gt; nil</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">print-r</span><span class="err">é</span><span class="nv">sum</span><span class="err">é</span> <span class="p">()</span>
  <span class="s">"NFC-normalized form."</span>
  <span class="p">(</span><span class="nb">print</span> <span class="s">"I'm going to sabotage your team."</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">print-re</span><span class="err">́</span><span class="nv">sume</span><span class="err">́</span> <span class="p">()</span>
  <span class="s">"NFD-normalized form."</span>
  <span class="p">(</span><span class="nb">print</span> <span class="s">"I'd be a great asset to your team."</span><span class="p">))</span>

<span class="p">(</span><span class="nv">print-r</span><span class="err">é</span><span class="nv">sum</span><span class="err">é</span><span class="p">)</span>
<span class="c1">;; =&gt; "I'm going to sabotage your team."</span>
</code></pre></div></div>

<h4 id="string-width">String Width</h4>

<p>There are three ways to quantify multibyte text. These are often the
same value, but in some circumstances they can each be different.</p>

<ul>
  <li><em>length</em>: number of characters, including combining characters</li>
  <li><em>bytes</em>:  number of bytes in its UTF-8 encoding</li>
  <li><em>width</em>:  number of columns it would occupy in the current buffer</li>
</ul>

<p>Most of the time, one character is one column (a width of one). Some
characters, like combining characters, consume no columns. Many Asian
characters consume two columns (U+4000, 䀀). Tabs consume <code class="language-plaintext highlighter-rouge">tab-width</code>
columns, usually 8.</p>
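
<p>Here are the three measurements side by side for a plain ASCII
string, a string with a combining character, and a single wide
character. Each triple is length, bytes, and width. The widths are what
I’d expect from a default configuration; a customized
<code class="language-plaintext highlighter-rouge">char-width-table</code> could change them.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(mapcar (lambda (s) (list (length s) (string-bytes s) (string-width s)))
        '("naive" "nai\u0308ve" "\u4000"))
;; =&gt; ((5 5 5) (6 7 5) (1 3 2))
</code></pre></div></div>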

<p>Generally, a string should have the same width regardless of
whether it’s NFD or NFC. However, due to bugs and incomplete Unicode
support, this isn’t strictly true. For example, some combining
characters, such as U+20DD ⃝, won’t combine correctly in Emacs nor in
other applications.</p>

<p>Pitfall #2: <strong>Always measure text by width, not length, when laying
out a buffer</strong>. Width is measured with the <code class="language-plaintext highlighter-rouge">string-width</code> function.
This comes up when laying out tables in a buffer. The number of
characters that fit in a column depends on what those characters are.</p>
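
<p>One way to build a fixed-width table cell is
<code class="language-plaintext highlighter-rouge">truncate-string-to-width</code>, which works in columns rather than
characters and can pad the result. The wrapper <code class="language-plaintext highlighter-rouge">demo-pad-cell</code> is a
made-up name for this sketch.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-pad-cell (text columns)
  "Truncate or pad TEXT with spaces so it occupies COLUMNS columns."
  (truncate-string-to-width text columns nil ?\s))

;; Three strings with three different lengths, all laid out 8 wide.
(mapcar (lambda (s) (string-width (demo-pad-cell s 8)))
        '("naive" "nai\u0308ve" "\u4000\u4000\u4000\u4000\u4000"))
;; =&gt; (8 8 8)
</code></pre></div></div>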

<p>Fortunately I accidentally got this right in <a href="/blog/2013/09/04/">Elfeed</a> because
I used the <code class="language-plaintext highlighter-rouge">format</code> function for layout. The <code class="language-plaintext highlighter-rouge">%s</code> directive operates
on width, as would be expected. However, this has the side effect that
the output of <code class="language-plaintext highlighter-rouge">format</code> may change depending on the current buffer!
Pitfall #3: <strong>Be mindful of the current buffer when using the format
function.</strong></p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">tab-width</span> <span class="mi">4</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">length</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%.6s"</span> <span class="s">"\t"</span><span class="p">)))</span>
<span class="c1">;; =&gt; 1</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">tab-width</span> <span class="mi">8</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">length</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%.6s"</span> <span class="s">"\t"</span><span class="p">)))</span>
<span class="c1">;; =&gt; 0</span>
</code></pre></div></div>

<h3 id="string-reversal">String Reversal</h3>

<p>Say you want to reverse a multibyte string. Simple, right?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">reverse-string</span> <span class="p">(</span><span class="nb">string</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">concat</span> <span class="p">(</span><span class="nb">reverse</span> <span class="p">(</span><span class="nv">string-to-list</span> <span class="nb">string</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">reverse-string</span> <span class="s">"abc"</span><span class="p">)</span>
<span class="c1">;; =&gt; "cba"</span>
</code></pre></div></div>

<p>Wrong! The combining characters will get flipped around to the wrong
side of the character they’re meant to modify.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">reverse-string</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
<span class="c1">;; =&gt; "ev̈ian"</span>
</code></pre></div></div>

<p>Pitfall #4: <strong><a href="https://github.com/mathiasbynens/esrever">Reversing Unicode strings is non-trivial</a>.</strong>
The <a href="http://rosettacode.org/wiki/Reverse_a_string">Rosetta Code</a> page is full of incorrect examples, and
<a href="/blog/2012/11/15/">I’m personally guilty</a> of this, too. The other day I
<a href="https://github.com/magnars/s.el/pull/58">submitted a patch to s.el</a> to correct its <code class="language-plaintext highlighter-rouge">s-reverse</code> function
for Unicode. If it’s accepted, you should never need to worry about
this.</p>
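
<p>To give an idea of what a fix involves, here’s a minimal sketch
that keeps combining marks attached to their base character. It is not
the s.el patch. The <code class="language-plaintext highlighter-rouge">demo-</code> names are hypothetical, and it only looks at
the Unicode general category (Mn, Mc, Me), so it ignores other
grapheme cluster rules such as joiners and Hangul jamo.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-combining-p (char)
  "Return non-nil if CHAR is a combining mark."
  (memq (get-char-code-property char 'general-category) '(Mn Mc Me)))

(defun demo-reverse-string (string)
  "Reverse STRING, keeping combining marks after their base character."
  (let ((clusters '()))
    (dolist (char (string-to-list string))
      (if (and clusters (demo-combining-p char))
          ;; Attach the mark to the most recent base character.
          (setcar clusters (concat (car clusters) (string char)))
        ;; Pushing to the front builds the list in reverse order.
        (push (string char) clusters)))
    (apply #'concat clusters)))

(string= (demo-reverse-string "nai\u0308ve") "evi\u0308an")
;; =&gt; t
</code></pre></div></div>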

<h3 id="regular-expressions">Regular Expressions</h3>

<p>Regular expressions operate on code points. This means combining
characters are counted separately and the match may change depending
on how characters are composed. To avoid this, you might want to
consider NFC normalization before performing some kinds of regular
expression matching.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Like string= from before:</span>
<span class="p">(</span><span class="nv">string-match-p</span>  <span class="s">"na\u00EFve"</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>

<span class="c1">;; The . only matches part of the composition</span>
<span class="p">(</span><span class="nv">string-match-p</span> <span class="s">"na.ve"</span> <span class="s">"nai\u0308ve"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>Pitfall #5: <strong>Be mindful of combining characters when using regular
expressions.</strong> Prefer NFC normalization when dealing with regular
expressions.</p>
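
<p>As a sketch of that advice, here are the same two searches again
with the input NFC-normalized first. Both now match at the start of
the string.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'ucs-normalize)

(string-match-p "na\u00EFve" (ucs-normalize-NFC-string "nai\u0308ve"))
;; =&gt; 0

(string-match-p "na.ve" (ucs-normalize-NFC-string "nai\u0308ve"))
;; =&gt; 0
</code></pre></div></div>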

<p>Another potential problem is ranges, though this is quite uncommon.
Ranges of characters can be expressed inside brackets, e.g.
<code class="language-plaintext highlighter-rouge">[a-zA-Z]</code>. If the range begins or ends with a decomposed combining
character you won’t get the proper range because its parts are
considered separately by the regular expression engine.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">match-weird</span> <span class="s">"[\u00E0-\u00F6]+"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">string-match-p</span> <span class="nv">match-weird</span> <span class="s">"áâãäå"</span><span class="p">)</span>
<span class="c1">;; =&gt; 0  (successful match)</span>

<span class="p">(</span><span class="nv">string-match-p</span> <span class="p">(</span><span class="nv">ucs-normalize-NFD-string</span> <span class="nv">match-weird</span><span class="p">)</span> <span class="s">"áâãäå"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>It’s <em>especially</em> important to keep all of this in mind when
sanitizing untrusted input, such as when using Emacs as a web server.
An attacker might use a denormalized or strange grapheme cluster to
bypass a filter.</p>

<h3 id="interacting-with-the-world">Interacting with the World</h3>

<p>Here’s a mistake I’ve made twice now. Emacs uses UTF-8 internally,
regardless of whatever encoding the original text came in. Pitfall #6:
<strong>When working with bytes of text, the counts may be different than
the original source of the text.</strong></p>

<p>For example, HTTP/1.1 introduced persistent connections. Before this,
a client connects to a server and asks for content. The server sends
the content and then closes the connection to signal the end of the
data. In HTTP/1.1, when <code class="language-plaintext highlighter-rouge">Connection: close</code> isn’t specified, the
server will instead send a <code class="language-plaintext highlighter-rouge">Content-Length</code> header indicating the
length of the content in bytes. The connection can then be re-used for
more requests, or, more importantly, pipelining requests.</p>

<p>The main problem is that HTTP headers usually have a different
encoding than the content body. Emacs is not prepared to handle
multiple encodings from a single source, so the only correct way to
talk HTTP with a network process is raw. My mistake was allowing Emacs
to do the UTF-8 conversion, then measuring the length of the content
in its UTF-8 encoding. This just happens to work fine about 99.9% of
the time since clients tend to speak UTF-8, or something like it,
anyway, but it’s not correct.</p>
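
<p>Here’s a sketch of the safer approach when producing a response:
encode the body explicitly, count the encoded bytes, and hand the
process a unibyte string. The function <code class="language-plaintext highlighter-rouge">demo-http-response</code> and the
choice of UTF-8 for the body are assumptions of the example, not code
from any of my packages.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-http-response (body)
  "Return a complete HTTP/1.1 response for BODY as a unibyte string.
Content-Length counts the bytes of the encoded body, not characters."
  (let ((payload (encode-coding-string body 'utf-8)))
    (concat "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain; charset=utf-8\r\n"
            (format "Content-Length: %d\r\n" (string-bytes payload))
            "\r\n"
            payload)))

;; Five characters, but six bytes on the wire.
(list (length "na\u00EFve")
      (string-bytes (encode-coding-string "na\u00EFve" 'utf-8)))
;; =&gt; (5 6)

;; The network process itself should be left raw, for example with
;; (set-process-coding-system proc 'binary 'binary), where proc is
;; whatever process object the server is using.
</code></pre></div></div>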

<h3 id="further-reading">Further Reading</h3>

<p>A lot of this investigation was inspired by JavaScript’s and other
languages’ Unicode shortcomings.</p>

<ul>
  <li><a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">UTF-8 and Unicode FAQ for Unix/Linux</a></li>
  <li><a href="https://speakerdeck.com/mathiasbynens/hacking-with-unicode">Hacking with Unicode</a></li>
  <li><a href="https://github.com/mathiasbynens/jsesc">jsesc</a></li>
  <li><a href="http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#unicode">java.lang.Character Unicode Character Representations</a></li>
  <li><a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Strings-and-Characters.html">GNU Emacs Lisp Reference Manual: Strings and Characters</a></li>
</ul>

<p>Comparatively, Emacs Lisp has really great Unicode support. This isn’t
too surprising considering that its primary purpose is
manipulating text.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>An Emacs Foreign Function Interface</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/04/26/"/>
    <id>urn:uuid:ba31fe59-f5b0-3603-b243-4bcae00aebcf</id>
    <updated>2014-04-26T16:25:51Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>For many years Richard Stallman (RMS) prohibited a foreign function
interface (FFI) in GNU Emacs. An FFI is an API for dynamically calling
native libraries at run-time, like the Java Native Interface (JNI). He
was concerned that people might use it to make proprietary extensions
to the popular editor. This was the same (paranoid) justification for
rejecting a package manager in Emacs for many years, that someone
might use it to distribute proprietary packages.</p>

<p>Fortunately, times have changed. RMS reevaluated his
<a href="http://lists.gnu.org/archive/html/emacs-devel/2010-03/msg00240.html">stances on FFI</a> and on package managers. Today Emacs comes with
a package manager (package.el), and there are multiple package
repositories with no proprietary packages in sight. Though, outside of
<a href="http://www.loveshack.ukfsn.org/emacs/dynamic-loading/">some unaccepted patches</a>, no significant progress has been
made to add an FFI.</p>

<p>A few weeks ago I did something about that by writing a package that
adds an FFI. It requires no patches or any other changes to Emacs
itself. Instead, it drives a subprocess running <a href="http://sourceware.org/libffi/">libffi</a>,
passing arguments and return values back and forth through a pipe, in
the spirit of <a href="/blog/2014/02/06/">EmacSQL</a>. It’s not as efficient as a built-in
API, but it could potentially be distributed through an ELPA
repository.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-ffi">Emacs Lisp Foreign Function Interface</a></li>
</ul>

<p>The API is modeled loosely after <a href="http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code/">Julia’s elegant FFI</a>. A call
interface (CIF) doesn’t need to be prepared ahead of time. Provide all
the necessary information at the call site and the library takes care
of building and caching CIFs and handles for you.</p>

<h3 id="api-examples">API Examples</h3>

<p>The core function for the FFI is <code class="language-plaintext highlighter-rouge">ffi-call</code>. Here’s an example that
calls the system’s <code class="language-plaintext highlighter-rouge">srand()</code> and then <code class="language-plaintext highlighter-rouge">rand()</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; seed with 0</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"srand"</span> <span class="nv">[:void</span> <span class="ss">:uint32]</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1">;; =&gt; :void</span>

<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"rand"</span> <span class="nv">[:sint32]</span><span class="p">)</span>
<span class="c1">;; =&gt; 1102520059</span>
</code></pre></div></div>

<p>The first two arguments are similar to the first two arguments of
<code class="language-plaintext highlighter-rouge">dlsym()</code>. For <code class="language-plaintext highlighter-rouge">ffi-call</code>, the first argument is the library shared
object name. The back-end automatically takes care of obtaining a
handle on the library with <code class="language-plaintext highlighter-rouge">dlopen()</code>. In this case we’re accessing a
function that’s already in the main program, so we pass nil. This is
identical to passing NULL to <code class="language-plaintext highlighter-rouge">dlsym()</code>. In this FFI, nil always
corresponds to NULL.</p>

<p>The second argument is the function name, just like <code class="language-plaintext highlighter-rouge">dlsym()</code>’s second
argument.</p>

<p>The third argument is the function signature. It’s a vector of
keywords declaring the return value type followed by the types of each
argument. In this example, <code class="language-plaintext highlighter-rouge">srand()</code> returns nothing (void) and
accepts a single 32-bit unsigned argument, so the signature is
<code class="language-plaintext highlighter-rouge">[:void :uint32]</code>.</p>

<p>The remaining arguments are the native function arguments. I can keep
making the second FFI call (“rand”) to retrieve different numbers,
using the first FFI call (“srand”) to reset the sequence.</p>

<h4 id="using-a-library">Using a Library</h4>

<p>Here’s another example, loading <code class="language-plaintext highlighter-rouge">libm</code> and calling <code class="language-plaintext highlighter-rouge">cos</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; cos(1.2)</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="s">"libm.so"</span> <span class="s">"cos"</span> <span class="nv">[:double</span> <span class="ss">:double]</span> <span class="mf">1.2</span><span class="p">)</span>
<span class="c1">;; =&gt; 0.362357754476674</span>
</code></pre></div></div>

<p>The first time a library is used, the back-end creates a handle for it
with <code class="language-plaintext highlighter-rouge">dlopen()</code>. Further calls will reuse the handle, trying to be as
efficient as possible. Handles are never closed.</p>

<h4 id="pointers">Pointers</h4>

<p>Here are a couple of examples that use pointers. As stated before, nil
is used to pass a NULL pointer. Like the underlying libffi, the FFI
doesn’t care what <em>kind</em> of pointer you’re passing, just that it’s a
pointer, so it’s declared with <code class="language-plaintext highlighter-rouge">:pointer</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; time(NULL);</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"time"</span> <span class="nv">[:uint64</span> <span class="ss">:pointer]</span> <span class="no">nil</span><span class="p">)</span>
<span class="c1">;; =&gt; 1396496875</span>
</code></pre></div></div>

<p>Strings are automatically copied to the subprocess, their lifetime
tied to the lifetime of the Elisp string (note: this detail is still
unimplemented). When used as arguments, they become pointers.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; getenv("DISPLAY")</span>
<span class="p">(</span><span class="nv">ffi-call</span> <span class="no">nil</span> <span class="s">"getenv"</span> <span class="nv">[:pointer</span> <span class="ss">:pointer]</span> <span class="s">"DISPLAY"</span><span class="p">)</span>
<span class="c1">;; =&gt; 0x7fffc13ceb29</span>

<span class="p">(</span><span class="nv">ffi-get-string</span> <span class="ss">'0x7fffc13ceb29</span><span class="p">)</span>
<span class="c1">;; =&gt; ":0"</span>
</code></pre></div></div>

<p>Pointers can be handled as values on the Elisp side. They’re
represented as symbols whose name is an address. In the above example,
<code class="language-plaintext highlighter-rouge">0x7fffc13ceb29</code> is one of these symbols. I would have preferred to
use a plain integer to represent pointers, but, because Elisp integers
are <em>tagged</em>, they’re guaranteed not to be wide enough for this. I
plan to add pointer operators to do pointer arithmetic on these
special pointer values.</p>
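
<p>The package doesn’t provide these operators yet, so the following is
only a sketch of what one might look like. The name
<code class="language-plaintext highlighter-rouge">my-ffi-pointer+</code> is hypothetical, and it assumes the address fits in
an Emacs integer, which holds for this address on a 64-bit build.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-ffi-pointer+ (pointer offset)
  "Return the pointer symbol OFFSET bytes past POINTER.
Hypothetical helper, not part of the package."
  ;; Drop the leading 0x and parse the rest as hexadecimal.
  (let ((address (string-to-number (substring (symbol-name pointer) 2) 16)))
    (intern (format "0x%x" (+ address offset)))))

(my-ffi-pointer+ '0x7fffc13ceb29 8)
;; =&gt; 0x7fffc13ceb31
</code></pre></div></div>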

<p>The function <code class="language-plaintext highlighter-rouge">ffi-get-string</code> is used to retrieve the null-terminated
string referenced by a pointer. If the string returned by <code class="language-plaintext highlighter-rouge">getenv()</code>
needed to be freed (it doesn’t and shouldn’t), the FFI caller would
need to be careful to call <code class="language-plaintext highlighter-rouge">free()</code> as another FFI call.</p>

<h3 id="how-it-works-the-stack-machine">How It Works: The Stack Machine</h3>

<p>My goal is to keep the back-end as simple as possible. All resource
management is handled by Emacs, <a href="/blog/2014/01/27/">tied to garbage collection</a>.
For example, the pointer returned by <code class="language-plaintext highlighter-rouge">dlopen()</code> isn’t stored anywhere
in the subprocess. It’s passed to Emacs and managed there. To call a
function using the handle, the pointer is transmitted back to the
subprocess.</p>

<p>To keep it simple, the back-end is just a stack machine with a simple
human-readable bytecode. You can see the instruction set by looking at
the big switch statement in <code class="language-plaintext highlighter-rouge">ffi-glue.cc</code>. For example, to push a
signed 2-byte integer 237 onto the stack, send a <code class="language-plaintext highlighter-rouge">j</code> followed by an
ASCII representation of the number (terminated by a space if needed):
<code class="language-plaintext highlighter-rouge">j237</code>.</p>

<p>As usual, my assumption is that the Elisp printer and reader are faster
than any possible serialization I could implement within Elisp itself.
This also nicely sidesteps the byte-order issue.</p>

<p>The function signature is declared by pushing zeros of the
return/argument types onto the stack, with a special void “value” used
to communicate <code class="language-plaintext highlighter-rouge">void</code>. Once it’s all set up, the <code class="language-plaintext highlighter-rouge">C</code> instruction is
called, collapsing the signature into a CIF handle: a pointer for the
Elisp side to manage.</p>

<p>Pointers to raw strings of bytes are pushed onto the stack with the
<code class="language-plaintext highlighter-rouge">M</code> instruction. It pops the top integer on the stack to get the byte
count, reads that number of bytes from input into a buffer,
null-terminates the buffer in case it’s used as a string, and finally
puts a pointer to that buffer on the stack.</p>

<p>Calling functions is just a matter of pushing all the needed
information onto the stack, invoking libffi to magically call the
function, then popping the result off the stack. Popping a value
transmits it to Elisp.</p>

<h4 id="stack-machine-example">Stack Machine Example</h4>

<p>Here’s a concise example that calls <code class="language-plaintext highlighter-rouge">cos(1.2)</code> (assuming libm.so is
already linked). The actual Elisp-generated FFI bytecode doesn’t plan
things quite this way — particularly because it needs to keep track
of the various pointers involved — but this example keeps it simple.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>d1.2d0d0w1Cp0w3McosSco
</code></pre></div></div>

<p>You can run this example manually by executing the <code class="language-plaintext highlighter-rouge">ffi-glue</code> program
and pasting in that line as standard input. The result will be
printed.</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">d1.2</code> : Push a double, 1.2, onto the stack. This will be the
function argument.</li>
  <li><code class="language-plaintext highlighter-rouge">d0d0</code> : Push a couple of zero doubles onto the stack. This is our
function signature. It takes a double and returns a double.</li>
  <li><code class="language-plaintext highlighter-rouge">w1</code> : Push an unsigned 32-bit 1 onto the stack. Instructions that
use integers accept unsigned 32-bit integers. This 1 indicates that
our function accepts one argument.</li>
  <li><code class="language-plaintext highlighter-rouge">C</code> : Create a CIF. The integer 1 and the two 0 doubles are
consumed and a pointer to a CIF is put on the stack. Elisp would
normally pop this off and save it for future use, but we’re going
to leave it there (and ultimately leak it in the example).</li>
  <li><code class="language-plaintext highlighter-rouge">p0</code> : Push a NULL onto the stack. <code class="language-plaintext highlighter-rouge">p</code> means push a pointer and 0
is a NULL pointer. This is our library handle. We’re assuming <code class="language-plaintext highlighter-rouge">cos</code>
will be in the main program.</li>
  <li><code class="language-plaintext highlighter-rouge">w3Mcos</code> : Put a pointer to the string “cos” onto the stack. First
push the number 3 (string length), then <code class="language-plaintext highlighter-rouge">M</code> to read from input,
then pass three bytes: “cos”. In our example, this buffer will be
leaked because we lose the buffer pointer.</li>
  <li><code class="language-plaintext highlighter-rouge">S</code> : Call <code class="language-plaintext highlighter-rouge">dlsym()</code> on the string and handle on top of the stack.
This consumes the top two values (NULL and “cos”), and pushes a
function handle on top of the stack. At this point the stack has
three values: 1.2, the CIF, and the function handle.</li>
  <li><code class="language-plaintext highlighter-rouge">c</code> : Call the function pointed to by the top of the stack. This
consumes the top pointer, the CIF below it, and the CIF indicates
how many more values to consume: just one in this case, since the
function takes one argument. The function’s return value is pushed
on the stack. If the function is <code class="language-plaintext highlighter-rouge">void</code>, the special void “value”
is pushed on the stack.</li>
  <li><code class="language-plaintext highlighter-rouge">o</code> : Pop the top stack value, sending it to Emacs. This is
what would be returned by <code class="language-plaintext highlighter-rouge">ffi-call</code>.</li>
</ol>

<p>Before I got the Elisp side of things going, I was testing out the
back-end by writing lots of little programs like this by hand.</p>
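
<p>The same program can be assembled from Elisp. This is only a sketch:
<code class="language-plaintext highlighter-rouge">my-ffi-push-bytes</code> is a hypothetical helper, not the package’s real
code generator, and it assumes the string is sent as UTF-8 so that the
byte count matches what the <code class="language-plaintext highlighter-rouge">M</code> instruction reads.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-ffi-push-bytes (string)
  "Return bytecode that pushes a pointer to STRING's bytes.
Hypothetical helper: emits the byte count, M, then the raw bytes."
  (let ((bytes (encode-coding-string string 'utf-8)))
    (concat (format "w%dM" (length bytes)) bytes)))

(concat "d1.2"                     ; the argument
        "d0d0"                     ; signature: takes and returns a double
        "w1" "C"                   ; one argument, then create the CIF
        "p0"                       ; NULL library handle
        (my-ffi-push-bytes "cos")  ; the function name
        "S" "c" "o")               ; dlsym, call, pop the result
;; =&gt; "d1.2d0d0w1Cp0w3McosSco"
</code></pre></div></div>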

<h3 id="a-safe-ffi">A Safe FFI</h3>

<p>While using an FFI through a pipe is slow compared to a built-in FFI,
there is a distinct advantage. The FFI can never crash Emacs! Normally,
making calls to an FFI is <em>unsafe</em>. It allows the programmer to
violate normal language constraints. If the programmer misuses the
FFI, the whole process may crash or become corrupt. Here that process
is the subprocess, not Emacs. A crash will lose any state held behind
the foreign interface, but Emacs will be safe.</p>

<p>In my package, the handle for the FFI Emacs subprocess is called the
<em>context</em>. A context is automatically established and bound to the
<code class="language-plaintext highlighter-rouge">ffi-context</code> global variable as needed. This context keeps track of
CIFs, string buffers, handles, and any other resources held by the
subprocess. If the subprocess dies, the context becomes meaningless
since the pointers it holds are dead.</p>
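
<p>To illustrate the idea, here is a rough sketch of such a context. It
is not the package’s actual data structure: every name prefixed with
<code class="language-plaintext highlighter-rouge">my-ffi-</code> is hypothetical, and only library handles are tracked. The
point is that a cached pointer is only trusted while the subprocess
that produced it is alive.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(cl-defstruct (my-ffi-context (:constructor my-ffi-context--create)
                              (:copier nil))
  process                                    ; the subprocess
  (handles (make-hash-table :test 'equal)))  ; library name -&gt; pointer

(defun my-ffi-context-live-p (context)
  "Return non-nil if CONTEXT's subprocess is still running."
  (process-live-p (my-ffi-context-process context)))

(defun my-ffi-context-handle (context library)
  "Return the cached handle for LIBRARY, or nil if CONTEXT is dead."
  (when (my-ffi-context-live-p context)
    (gethash library (my-ffi-context-handles context))))
</code></pre></div></div>

<p>Once the subprocess exits, <code class="language-plaintext highlighter-rouge">my-ffi-context-handle</code> returns nil for
everything, so a stale pointer is never handed to a fresh subprocess.</p>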

<h3 id="limitations">Limitations</h3>

<p>This FFI package is about 80% complete. It occasionally leaks memory
in the subprocess, it’s overly-sensitive to mis-typing, it doesn’t
manage stdin/stdout, it can’t inspect/modify structs, and it can’t set
up closures.</p>

<p>The last point, closures, would require some changes to the
interprocess communication. The purpose here would be to allow foreign
functions to call Elisp functions. The subprocess would need to be
able to initiate activity with Elisp.</p>

<p>Manipulating structs is complex, and even libffi has limited support
for working with them. It allows structs to be declared, but leaves
alignment and access up to the user to sort out. That’s where the
previously-mentioned pointer arithmetic comes into play.</p>

<p>Currently stdin, stdout, and stderr are problems, as I found when
trying to write a test GTK application with Elisp. Any command
line junkie knows that GTK (and Qt) applications are ridiculously
noisy. They spew hundreds of lines of warnings and notifications as
part of their normal operation. This noise interferes with FFI
communication with Emacs. I need to figure out how to separate this
and get standard input/output/error to/from Emacs through separate
channels.</p>

<p>Like libffi, there are no guarantees about variadic function calls. It
should generally Just Work, but you can’t rely on it.</p>

<p>The whole thing will not work as well in 32-bit Emacs, where integers
are limited to a tiny 29 bits. For example, those <code class="language-plaintext highlighter-rouge">rand()</code> return
values will simply not fit. In the long run, this is probably the
single largest barrier to making the FFI work smoothly. It’s too easy
to run into large integer values.</p>
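
<p>A caller could at least detect the problem before it bites. This is a
small sketch with a hypothetical helper name,
<code class="language-plaintext highlighter-rouge">my-ffi-fixnum-p</code>, that checks a value against the limits of the
running Emacs.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-ffi-fixnum-p (value)
  "Return non-nil if the number VALUE is within this Emacs's fixnum range."
  (and (&lt;= most-negative-fixnum value)
       (&lt;= value most-positive-fixnum)))

;; Non-nil on a 64-bit build; nil wherever most-positive-fixnum
;; is smaller than this earlier rand() result.
(my-ffi-fixnum-p 1102520059)
</code></pre></div></div>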

<p>Right now I consider it a proof of concept; an FFI <em>really can</em> be
done this way. I don’t have any particular uses in mind, and, outside
of the “cool factor,” I can’t actually think of any useful
applications. If a solid FFI had already existed, I might have tried to use
it for EmacSQL rather than use this subprocess trick. My FFI <em>is</em>
probably mature enough to drive SQLite, so maybe this is the future of
EmacSQL.</p>

<p>If you can think of a good use for an Emacs FFI, please share it. I
need good test ideas.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Lisp Defstruct Namespace Convention</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/03/19/"/>
    <id>urn:uuid:624f92a9-6696-33bb-f955-d6c83da56fc1</id>
    <updated>2014-03-19T01:41:52Z</updated>
    <category term="emacs"/><category term="lisp"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>One of the drawbacks of Emacs Lisp is the lack of namespaces. Every
<code class="language-plaintext highlighter-rouge">defun</code>, <code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defcustom</code>, <code class="language-plaintext highlighter-rouge">defface</code>, <code class="language-plaintext highlighter-rouge">defalias</code>, <code class="language-plaintext highlighter-rouge">defstruct</code>,
and <code class="language-plaintext highlighter-rouge">defclass</code> establishes one or more names in the global scope. To
work around this, package authors are strongly encouraged to prefix
every global name with the name of its package. That way there should
never be a naming conflict between two different packages.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">mypackage-foo-limit</span> <span class="mi">10</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">mypackage--bar-counter</span> <span class="mi">0</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">mypackage-init</span> <span class="p">()</span>
  <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">mypackage-compute-children</span> <span class="p">(</span><span class="nv">node</span><span class="p">)</span>
  <span class="o">...</span><span class="p">)</span>

<span class="p">(</span><span class="nb">provide</span> <span class="ss">'mypackage</span><span class="p">)</span>
</code></pre></div></div>

<p>While this has solved the problem for the time being, attaching the
package name to almost every identifier, including private function
and variable names, is quite cumbersome. Namespaces can <em>almost</em> be
hacked into the language by using multiple obarrays,
<a href="/blog/2011/08/18/">but symbols have internal linked lists</a> that prohibit
inclusion in multiple obarrays.</p>

<p>By convention, private names are given a double-dash after the
namespace. If a “bar counter” is an implementation detail that may
disappear in the future, it will be called <code class="language-plaintext highlighter-rouge">mypackage--bar-counter</code> to
warn users and other package authors not to rely on it.</p>

<p>There’s been a recent push to follow this namespace-prefix policy more
strictly, particularly with the deprecation of <code class="language-plaintext highlighter-rouge">cl</code> and introduction
of <code class="language-plaintext highlighter-rouge">cl-lib</code>. I suspect someday when namespaces are finally introduced,
packages with strictly clean namespaces will be at an advantage,
somehow automatically supported. <a href="http://nic.ferrier.me.uk/blog/2013_06/adding-namespaces-to-elisp">Nic Ferrier has proposed ideas</a>
for how to move forward on this.</p>

<h3 id="how-strict-are-we-talking">How strict are we talking?</h3>

<p>Over the last few years I’ve gotten much stricter in my own packages
when it comes to namespace prefixes. You can see the progression going
from <a href="https://github.com/skeeto/javadoc-lookup">javadoc-lookup</a> (2010) where I was completely sloppy about
it, to <a href="https://github.com/skeeto/emacsql">EmacSQL</a> (2014) where every single global identifier
is meticulously prefixed.</p>

<p>For a time I considered names such as <code class="language-plaintext highlighter-rouge">make-*</code> and <code class="language-plaintext highlighter-rouge">with-*</code> to be
exceptions to the rule, since these names are idioms inherited from
Common Lisp. The namespace comes <em>after</em> the expected prefix. I’ve
changed my mind about this, which has caused me to change my usage of
<code class="language-plaintext highlighter-rouge">defstruct</code> (now <code class="language-plaintext highlighter-rouge">cl-defstruct</code>).</p>

<p>Just as in Common Lisp, by default <code class="language-plaintext highlighter-rouge">cl-defstruct</code> defines a
constructor starting with <code class="language-plaintext highlighter-rouge">make-*</code>. This is fine in Common Lisp, where
it’s a package-private function by default, but in Emacs Lisp this
pollutes the global namespace.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span>

<span class="c1">;; Defines make-circle, copy-circle, circle-x, circle-y, circle-radius, circle-p</span>
<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="nv">circle</span>
  <span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">unit-circle</span> <span class="p">(</span><span class="nv">make-circle</span> <span class="ss">:x</span> <span class="mf">0.0</span> <span class="ss">:y</span> <span class="mf">0.0</span> <span class="ss">:radius</span> <span class="mf">1.0</span><span class="p">))</span>

<span class="nv">unit-circle</span>
<span class="c1">;; =&gt; [cl-struct-circle 0.0 0.0 1.0]</span>

<span class="p">(</span><span class="nv">circle-radius</span> <span class="nv">unit-circle</span><span class="p">)</span>
<span class="c1">;; =&gt; 1.0</span>
</code></pre></div></div>

<p>This constructor isn’t namespace clean, so package authors should
avoid defstruct’s default. If the package is named <code class="language-plaintext highlighter-rouge">circle</code> then all
of the accessors are perfectly fine, though.</p>

<p>To fix this, I now use another, more recent Emacs Lisp idiom: name the
constructor <code class="language-plaintext highlighter-rouge">create</code>. That is, for the package <code class="language-plaintext highlighter-rouge">circle</code>, we desire
<code class="language-plaintext highlighter-rouge">circle-create</code>. To get this behavior from <code class="language-plaintext highlighter-rouge">cl-defstruct</code>, use the
<code class="language-plaintext highlighter-rouge">:constructor</code> option.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Clean!</span>
<span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">circle</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">circle-create</span><span class="p">))</span>
  <span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>

<span class="p">(</span><span class="nv">circle-create</span> <span class="ss">:x</span> <span class="mi">0</span> <span class="ss">:y</span> <span class="mi">0</span> <span class="ss">:radius</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-circle 0 0 1]</span>

<span class="p">(</span><span class="nb">provide</span> <span class="ss">'circle</span><span class="p">)</span>
</code></pre></div></div>

<p>This affords a new opportunity to craft a better constructor. Have
<code class="language-plaintext highlighter-rouge">cl-defstruct</code> define a private constructor, then manually write a
constructor with a nicer interface. It may also do additional work,
like enforce invariants or initialize dependent slots.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">circle</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">circle--create</span><span class="p">))</span>
  <span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">circle-create</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span> <span class="nv">radius</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">circle</span> <span class="p">(</span><span class="nv">circle--create</span> <span class="ss">:x</span> <span class="nv">x</span> <span class="ss">:y</span> <span class="nv">y</span> <span class="ss">:radius</span> <span class="nv">radius</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">radius</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">(</span><span class="nb">error</span> <span class="s">"must have non-negative radius"</span><span class="p">)</span>
      <span class="nv">circle</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">circle-create</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-circle 0 0 1]</span>

<span class="p">(</span><span class="nv">circle-create</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">-1</span><span class="p">)</span>
<span class="c1">;; error: "must have non-negative radius"</span>
</code></pre></div></div>

<p>This is now how I always use <code class="language-plaintext highlighter-rouge">cl-defstruct</code> in Emacs Lisp. It’s a tidy
convention that will probably become more common in the future.</p>
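
<p>One related detail: by default <code class="language-plaintext highlighter-rouge">cl-defstruct</code> also defines a copier
named <code class="language-plaintext highlighter-rouge">copy-*</code>, which has the same prefix problem as <code class="language-plaintext highlighter-rouge">make-*</code>. As a
sketch, using a hypothetical package named <code class="language-plaintext highlighter-rouge">shape</code>, the <code class="language-plaintext highlighter-rouge">:copier</code>
option renames it in the same way.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical package "shape": rename both constructor and copier.
(cl-defstruct (shape (:constructor shape-create)
                     (:copier shape-copy))
  x y)

(fboundp 'shape-copy)
;; =&gt; t

(fboundp 'copy-shape)
;; =&gt; nil
</code></pre></div></div>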

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Introducing EmacSQL</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/02/06/"/>
    <id>urn:uuid:af878773-296e-3411-593a-cc516856b832</id>
    <updated>2014-02-06T05:52:37Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Yesterday I made the first official release of <a href="https://github.com/skeeto/emacsql">EmacSQL</a>, an
Emacs package I’ve been working on for the past few weeks. EmacSQL is
a high-level SQL database for Emacs. It primarily targets SQLite as a
back-end, but it also currently supports PostgreSQL and MySQL.</p>

<ul>
  <li><a href="https://github.com/skeeto/emacsql">https://github.com/skeeto/emacsql</a></li>
</ul>

<p>It’s <a href="http://melpa.milkbox.net/#/emacsql">available on MELPA</a> and is ready for immediate use. It
depends on the <a href="/blog/2014/01/27/">finalizers package</a> I added last week.</p>

<p>While there’s a non-Elisp component, SQLite, there are no special
requirements for the user to worry about. When the package’s Elisp is
compiled, if a C compiler is available it will use it to compile a
SQLite binary for EmacSQL. If not, it will later offer to download a
pre-built binary that I built. Ideally this makes the non-Elisp part
of EmacSQL completely transparent and users can pretend Emacs has a
built-in relational database.</p>

<p>The official SQLite command line shell is not used even if present,
and I’ll explain why below.</p>

<p>Just as <a href="/blog/2012/10/31/">Skewer</a> jump-started my web development experience,
EmacSQL has been a crash course in SQL and relational databases.
Before starting this project I knew little about this topic and I’ve
gained a lot of appreciation for it in the process. Building an Emacs
extension is a very rapid way to dive into a new topic.</p>

<p>If you’re a total newb about this stuff like I was and want to learn
SQL for SQLite yourself, I highly recommend <a href="http://www.amazon.com/gp/product/0596521189/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0596521189&amp;linkCode=as2&amp;tag=nullprogram-20">Using SQLite</a>. It’s
a really solid introduction.</p>

<h3 id="high-level-sql-compiler">High-level SQL Compiler</h3>

<p>By “high-level” I mean that it goes beyond assembling strings
containing SQL code. In EmacSQL, statements are assembled from
s-expressions which, behind the scenes, are compiled into SQL using
some simple rules. This means if you already know SQL you should be
able to hit the ground running with EmacSQL. Here’s an example,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'emacsql</span><span class="p">)</span>

<span class="c1">;; Connect to the database, SQLite in this case:</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">db</span> <span class="p">(</span><span class="nv">emacsql-connect</span> <span class="s">"~/office.db"</span><span class="p">))</span>

<span class="c1">;; Create a table with 3 columns:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:create-table</span> <span class="nv">patients</span>
             <span class="p">(</span><span class="nv">[name</span> <span class="p">(</span><span class="nv">id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="p">(</span><span class="nv">weight</span> <span class="nb">float</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span>

<span class="c1">;; Insert a few rows:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:insert</span> <span class="ss">:into</span> <span class="nv">patients</span>
             <span class="ss">:values</span> <span class="p">(</span><span class="nv">[</span><span class="s">"Jeff"</span> <span class="mi">1000</span> <span class="nv">184.2]</span> <span class="nv">[</span><span class="s">"Susan"</span> <span class="mi">1001</span> <span class="nv">118.9]</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span>

<span class="c1">;; Query the database:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:select</span> <span class="nv">[name</span> <span class="nv">id]</span>
             <span class="ss">:from</span> <span class="nv">patients</span>
             <span class="ss">:where</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">weight</span> <span class="mf">150.0</span><span class="p">)</span><span class="nv">]</span><span class="p">)</span>
<span class="c1">;; =&gt; (("Susan" 1001))</span>

<span class="c1">;; Queries can be templates, using $s1, $i2, etc. as parameters:</span>
<span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:select</span> <span class="nv">[name</span> <span class="nv">id]</span>
             <span class="ss">:from</span> <span class="nv">patients</span>
             <span class="ss">:where</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="nv">weight</span> <span class="nv">$s1</span><span class="p">)</span><span class="nv">]</span>
         <span class="mi">100</span><span class="p">)</span>
<span class="c1">;; =&gt; (("Jeff" 1000) ("Susan" 1001))</span>
</code></pre></div></div>

<p>A query is a vector of keywords, identifiers, parameters, and data.
Thanks to parameters, these s-expression statements should not need to
be constructed dynamically at run-time.</p>

<p>The compilation rules are listed in the EmacSQL documentation so I
won’t repeat them in detail here. In short, lisp keywords become SQL
keywords, row-oriented information is always presented as vectors,
expressions are lists, and symbols are identifiers, except when
quoted.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:select</span> <span class="nv">[name</span> <span class="nv">weight]</span> <span class="ss">:from</span> <span class="nv">patients</span> <span class="ss">:where</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">weight</span> <span class="mf">150.0</span><span class="p">)</span><span class="nv">]</span>
</code></pre></div></div>

<p>That compiles to this,</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="n">weight</span> <span class="k">FROM</span> <span class="n">patients</span> <span class="k">WHERE</span> <span class="n">weight</span> <span class="o">&lt;</span> <span class="mi">150</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>

<p>Also, any <a href="/blog/2013/12/30/#almost_everything_prints_readably">readable lisp value</a> can be stored in an
attribute. Integers are mapped to INTEGER, floats are mapped to REAL,
nil is mapped to NULL, and everything else is printed and stored as
TEXT. The specifics vary depending on the back-end.</p>
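
<p>As a rough illustration of that mapping, here is a minimal sketch
that turns a lisp value into a SQL literal. The function name
<code class="language-plaintext highlighter-rouge">demo-sql-literal</code> is hypothetical and this is not EmacSQL’s
actual code, so a real back-end may differ in the details.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper for illustration, not part of EmacSQL.
(defun demo-sql-literal (value)
  "Return a string holding VALUE as a SQL literal."
  (cond ((null value) "NULL")                      ; nil becomes NULL
        ((numberp value) (prin1-to-string value))  ; INTEGER or REAL
        (t (let ((print-escape-newlines t))        ; the rest becomes TEXT
             (concat "'"
                     ;; Double any single quotes, per SQL string syntax.
                     (replace-regexp-in-string
                      "'" "''" (prin1-to-string value) t t)
                     "'")))))

(demo-sql-literal "Jim")  ; =&gt; "'\"Jim\"'"
(demo-sql-literal 45)     ; =&gt; "45"
(demo-sql-literal nil)    ; =&gt; "NULL"
</code></pre></div></div>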

<h4 id="parameters">Parameters</h4>

<p>A symbol beginning with a dollar sign is a parameter. It has a type —
identifier (i), scalar (s), vector (v), schema (S) — and an argument
position.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:select</span> <span class="nv">[$i1]</span> <span class="ss">:from</span> <span class="nv">$i2</span> <span class="ss">:where</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">$i3</span> <span class="nv">$s4</span><span class="p">)</span><span class="nv">]</span>
</code></pre></div></div>

<p>Given the arguments <code class="language-plaintext highlighter-rouge">name people age 21</code>, three symbols and an
integer, it compiles to:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">people</span> <span class="k">WHERE</span> <span class="n">age</span> <span class="o">&lt;</span> <span class="mi">21</span><span class="p">;</span>
</code></pre></div></div>
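
<p>Written out as a call, that might look like the sketch below. It
assumes <code class="language-plaintext highlighter-rouge">db</code> is an open connection, as in the example at the top, and
that a <code class="language-plaintext highlighter-rouge">people</code> table with <code class="language-plaintext highlighter-rouge">name</code> and <code class="language-plaintext highlighter-rouge">age</code> columns exists. Both are
assumptions made for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Assumes `db' is an open EmacSQL connection and that a `people'
;; table with `name' and `age' columns exists (hypothetical here).
;; The arguments fill $i1, $i2, $i3, and $s4 in order.
(emacsql db [:select [$i1] :from $i2 :where (&lt; $i3 $s4)]
         'name 'people 'age 21)
</code></pre></div></div>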

<p>A vector parameter supplies either the rows to be inserted or the
set for an <code class="language-plaintext highlighter-rouge">IN</code> expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:insert-into</span> <span class="nv">people</span> <span class="nv">[name</span> <span class="nv">age]</span> <span class="ss">:values</span> <span class="nv">$v1]</span>
</code></pre></div></div>

<p>Given the argument <code class="language-plaintext highlighter-rouge">(["Jim" 45] ["Jeff" 34])</code>, a list of two rows,
this becomes,</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">people</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="s1">'"Jim"'</span><span class="p">,</span> <span class="mi">45</span><span class="p">),</span> <span class="p">(</span><span class="s1">'"Jeff"'</span><span class="p">,</span> <span class="mi">34</span><span class="p">);</span>
</code></pre></div></div>

<p>And this,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">[:select</span> <span class="nb">*</span> <span class="ss">:from</span> <span class="nv">tags</span> <span class="ss">:where</span> <span class="p">(</span><span class="nv">in</span> <span class="nv">tag</span> <span class="nv">$v1</span><span class="p">)</span><span class="nv">]</span>
</code></pre></div></div>

<p>Given the argument <code class="language-plaintext highlighter-rouge">[hiking camping biking]</code> becomes,</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">tags</span> <span class="k">WHERE</span> <span class="n">tag</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'hiking'</span><span class="p">,</span> <span class="s1">'camping'</span><span class="p">,</span> <span class="s1">'biking'</span><span class="p">);</span>
</code></pre></div></div>

<p>When writing these expressions keep in mind the command
<code class="language-plaintext highlighter-rouge">emacsql-show-last-sql</code>. It will display in the minibuffer the SQL
result of the s-expression statement before the point.</p>

<h3 id="schemas">Schemas</h3>

<p>A table schema is a list whose first element is a column specification
vector (i.e. row-oriented information is presented as vectors). The
remaining elements are table constraints. Here are the examples from
the documentation,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; No constraints schema with four columns:</span>
<span class="p">(</span><span class="nv">[name</span> <span class="nv">id</span> <span class="nv">building</span> <span class="nv">room]</span><span class="p">)</span>

<span class="c1">;; Add some column constraints:</span>
<span class="p">(</span><span class="nv">[</span><span class="p">(</span><span class="nv">name</span> <span class="ss">:unique</span><span class="p">)</span> <span class="p">(</span><span class="nv">id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="nv">building</span> <span class="nv">room]</span><span class="p">)</span>

<span class="c1">;; Add some table constraints:</span>
<span class="p">(</span><span class="nv">[</span><span class="p">(</span><span class="nv">name</span> <span class="ss">:unique</span><span class="p">)</span> <span class="p">(</span><span class="nv">id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="nv">building</span> <span class="nv">room]</span>
 <span class="p">(</span><span class="ss">:unique</span> <span class="nv">[building</span> <span class="nv">room]</span><span class="p">)</span>
 <span class="p">(</span><span class="ss">:check</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="nv">id</span> <span class="mi">0</span><span class="p">)))</span>
</code></pre></div></div>

<p>In the handful of EmacSQL databases I’ve created for practice and
testing, I’ve put the schema in a global constant. A table schema is a
part of a program’s type specifications, and rows are instances of
that type, so it makes sense to declare schemas up top with things
like defstructs.</p>

<p>These schemas can be substituted into a SQL statement using a <code class="language-plaintext highlighter-rouge">$S</code>
parameter (capital “S” for <em>S</em>chema).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defconst</span> <span class="nv">foo-schema-people</span>
  <span class="o">'</span><span class="p">(</span><span class="nv">[</span><span class="p">(</span><span class="nv">person-id</span> <span class="nc">integer</span> <span class="ss">:primary-key</span><span class="p">)</span> <span class="nv">name</span> <span class="nv">age]</span><span class="p">))</span>

<span class="c1">;; ...</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo-init</span> <span class="p">(</span><span class="nv">db</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">emacsql</span> <span class="nv">db</span> <span class="nv">[:create-table</span> <span class="nv">$i1</span> <span class="nv">$S2]</span> <span class="ss">'people</span> <span class="nv">foo-schema-people</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="back-ends">Back-ends</h3>

<p>Everything I’ve discussed so far is restricted to the SQL statement
compiler. It’s completely independent of the back-end implementations,
themselves mostly handling strings of SQL statements.</p>

<h4 id="sqlite-implementation-difficulties">SQLite Implementation Difficulties</h4>

<p>A little over a year ago I wrote <a href="/blog/2012/12/29/">a pastebin webapp</a> in
Elisp. I wanted to use SQLite as a back-end for storing pastes but
struggled to get the SQLite command shell, sqlite3, to cooperate with
Emacs. The problem was that all of the output modes except for “tcl”
are ambiguous. This includes the “csv” formatted output. TEXT values
can dump newlines, allowing rows to span an arbitrary number of lines.
They can dump things that look like the sqlite3 prompt, so it’s
impossible to know when sqlite3 is done printing results. I ultimately
decided the command shell was inadequate as an Emacs subprocess.</p>

<p>Recently there <a href="/blog/2013/09/09/">was some discussion</a> from alexbenjm and Andres
Ramirez on an Elfeed post about using SQLite as an Elfeed back-end.
This inspired me to take another look and that’s when I came up with a
workaround for SQLite’s ambiguity: only store printed Elisp values for
TEXT values! With <code class="language-plaintext highlighter-rouge">print-escape-newlines</code> set, TEXT values no longer
span multiple lines, and I can use <code class="language-plaintext highlighter-rouge">read</code> to pull in data from
sqlite3. All of sqlite3’s output modes were now unambiguous.</p>
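
<p>To see why this helps, here is a small sketch using only built-in
functions. The variable names are just for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((value "line one\nline two")
       ;; With `print-escape-newlines' bound to t the newline is
       ;; printed as a backslash followed by n, so the printed form
       ;; fits on a single line.
       (printed (let ((print-escape-newlines t))
                  (prin1-to-string value))))
  (list (string-match-p "\n" printed)                    ; no raw newline
        (equal value (car (read-from-string printed))))) ; round trip
;; =&gt; (nil t)
</code></pre></div></div>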

<p>However, after making significant progress I discovered an even bigger
issue: GNU Readline. The sqlite3 binary provided by Linux package
repositories is almost always compiled with Readline support. This
makes the tool much more friendly to use, but it’s a huge problem for
Emacs.</p>

<p>First, sqlite3 the command shell is not up to the same standards as
SQLite the database. Not by a long shot. In my short time working with
SQLite I’ve already discovered several bugs in the command shell. For
one, it’s not properly integrated with GNU Readline. There’s an
<code class="language-plaintext highlighter-rouge">.echo</code> meta-command that turns command echoing on and off. That is,
it repeats your command back to you. Useful in some circumstances,
though not mine. The bug is that this echo is separate from GNU
Readline’s echo. When Readline is active and <code class="language-plaintext highlighter-rouge">.echo</code> is enabled, there
are actually <em>two</em> echoes. Turn it off and there’s one echo.</p>

<h5 id="pseudo-terminals">Pseudo-terminals</h5>

<p>Under some circumstances, like when communicating over a pipe rather
than a PTY, Readline will mostly become deactivated. This would have
been a workaround, but when Readline is disabled sqlite3 heavily
buffers its output. This breaks any sort of interaction. Even worse,
on Windows <a href="http://sqlite.1065341.n5.nabble.com/Command-line-shell-not-flushing-stderr-when-interactive-td73340.html">stderr is not always unbuffered</a>, so sqlite3’s
error messages may not appear for a long time (another bug).</p>

<p>Besides the problem of getting Readline to shut up, another problem is
getting Readline to stop acting on control characters. The first 32
characters in ASCII are control characters. A pseudo-terminal (PTY)
that is not in raw mode will immediately act upon any control
characters it sees. There’s no escaping them.</p>

<p>Emacs communicates with subprocesses through a PTY by default
(probably an early design mistake), limiting the kind of data that can
be transmitted. You can try this yourself in a comint mode sometime
where a subprocess is used (not a socket like SLIME). Fire up <code class="language-plaintext highlighter-rouge">M-x
sql-sqlite</code> (part of Emacs) and try sending a string containing byte
0x1C (28, file separator). You can type one by pressing <code class="language-plaintext highlighter-rouge">C-q C-\</code>.
Send that byte and the subprocess dies.</p>

<p>There are two ways to work around this. One is to use a pipe (bind
<code class="language-plaintext highlighter-rouge">process-connection-type</code> to nil). Pipes don’t respond to control
characters. This doesn’t work with sqlite3 because of the
previously-mentioned buffering issue.</p>
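
<p>For reference, a minimal sketch of starting a subprocess over a
pipe. The process name and the use of <code class="language-plaintext highlighter-rouge">cat</code> are placeholders, and it
assumes <code class="language-plaintext highlighter-rouge">cat</code> is available on the system.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Bind `process-connection-type' to nil so that `start-process'
;; connects through a pipe instead of a PTY.
(let ((process-connection-type nil))
  (start-process "pipe-demo" nil "cat"))
</code></pre></div></div>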

<p>The other way to work around this is to put the PTY in raw mode.
Unfortunately there’s no function to do this so you need to call
<code class="language-plaintext highlighter-rouge">stty</code>. Of course, this program needs to run on the same PTY, so a
<code class="language-plaintext highlighter-rouge">start-process-shell-command</code> is required.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">start-process-shell-command</span> <span class="nv">name</span> <span class="nv">buffer</span> <span class="s">"stty raw &amp;&amp; &lt;your command&gt;"</span><span class="p">)</span>
</code></pre></div></div>

<p>Windows has neither <code class="language-plaintext highlighter-rouge">stty</code> nor PTYs (nor any of PTY’s issues) so
you’ll need to check the operating system before starting the process.
Even this still doesn’t work for sqlite3 because Readline itself will
respond to control characters. There’s no option to disable this.</p>
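
<p>The operating system check might be sketched as follows. The
function name <code class="language-plaintext highlighter-rouge">demo-start-raw</code> is hypothetical, and testing
<code class="language-plaintext highlighter-rouge">system-type</code> against <code class="language-plaintext highlighter-rouge">windows-nt</code> and <code class="language-plaintext highlighter-rouge">ms-dos</code> is my assumption about
which systems lack <code class="language-plaintext highlighter-rouge">stty</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical wrapper: prepend "stty raw" only where a PTY exists.
(defun demo-start-raw (name buffer command)
  "Start shell COMMAND as process NAME in BUFFER, with a raw PTY if any."
  (start-process-shell-command
   name buffer
   (if (memq system-type '(windows-nt ms-dos))
       command                          ; no PTY, nothing to configure
     (concat "stty raw &amp;&amp; " command))))
</code></pre></div></div>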

<p>There’s a package called <a href="https://github.com/mhayashi1120/Emacs-esqlite">esqlite</a> that is also a SQLite
front-end. It’s built to use sqlite3 and therefore suffers from all of
these problems.</p>

<h4 id="a-custom-sqlite-binary">A Custom SQLite Binary</h4>

<p>Since sqlite3 proved unreliable I developed my own protocol and
external program. It’s just a tiny bit of C that accepts a SQL string
and returns results as an s-expression. I’m no longer constrained to
storing readable values, but I’m still keeping that paradigm. First,
it keeps the C glue program simple and, more importantly, I can rely
entirely on the Emacs reader to parse the results. This makes
communication between Emacs and the subprocess as fast as it can
possibly be. The reader is faster than any possible Elisp program.</p>
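
<p>On the Emacs side, parsing such a reply can be as small as the
sketch below. The reply text is made up for illustration, since the
actual protocol isn’t shown in this article.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; The inserted string stands in for what the subprocess wrote.
(with-temp-buffer
  (insert "((\"Susan\" 1001) (\"Jeff\" 1000))")
  (goto-char (point-min))
  (read (current-buffer)))  ; the reader does all of the parsing
;; =&gt; (("Susan" 1001) ("Jeff" 1000))
</code></pre></div></div>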

<p>As I mentioned before, this C program is compiled when possible, and
otherwise a pre-built binary is fetched from my server (popular
platforms only, obviously). It’s likely EmacSQL will have at least one
working back-end on whatever you’re using.</p>

<h3 id="other-back-ends">Other Back-ends</h3>

<p>Both PostgreSQL and MySQL are also supported, though these require the
user have the appropriate client programs installed (psql or mysql).
Both of these are much better behaved than sqlite3 and, with the
<code class="language-plaintext highlighter-rouge">stty</code> trick, each can reliably be used without any special help. Both
pass all of the unit tests, so, in theory, they’ll work just as well
as SQLite.</p>

<p>To use them with the example at the beginning of this article, require
<code class="language-plaintext highlighter-rouge">emacsql-psql</code> or <code class="language-plaintext highlighter-rouge">emacsql-mysql</code>, then swap <code class="language-plaintext highlighter-rouge">emacsql-connect</code> for the
constructors <code class="language-plaintext highlighter-rouge">emacsql-psql</code> or <code class="language-plaintext highlighter-rouge">emacsql-mysql</code> (along with the proper
arguments). All three of these constructors return an
<code class="language-plaintext highlighter-rouge">emacsql-connection</code> object that works with the same API.</p>

<p>EmacSQL only goes so far to normalize the interfaces to these
databases, so for any non-trivial program you may not be able to swap
back-ends without some work. All of the EmacSQL functions that operate
on connections are generic functions (EIEIO), so changing back-ends
will only have an effect on the program’s SQL statements. For example,
if you use a SQLite-ism (dynamic typing) it won’t translate to either
of the other databases should they be swapped in.</p>

<p>I’ll cover the connections API, and what it takes to implement a new
back-end, in a future post. Outside of the PTY caveats, it’s actually
very easy. The MySQL implementation is just 80 lines of code.</p>

<h3 id="emacsqls-future">EmacSQL’s Future</h3>

<p>I hope this becomes a reliable and trusted database solution that
other packages can depend upon. Twice so far, the pastebin demo and
Elfeed, I’ve really wanted something like this and, instead, ended up
having to hack together my own database.</p>

<p>I’ve already started a branch on Elfeed re-implementing its database
in EmacSQL. Someday it may become Elfeed’s primary database if I feel
there’s no disadvantage to it. EmacSQL builds SQLite with the
full-text search engine enabled, which opens the door to a
powerful, fast Elfeed search API. Currently the main obstacle is
actually Elfeed’s database API being somewhat incompatible with ACID
database transactions — shortsightedness on my part!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Lisp Object Finalizers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/01/27/"/>
    <id>urn:uuid:48023a80-358c-39b4-371b-d74dfb248897</id>
    <updated>2014-01-27T05:24:16Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p><strong>Update</strong>: Emacs 25.1 (released Sept. 2016) formally introduced
finalizers to Emacs Lisp. This article is left here for historical
purposes.</p>

<p><strong>Problem</strong>: You have a special resource, such as a buffer or process,
associated with an Emacs Lisp object which is not managed by the
garbage collector. You want this resource to be cleaned up when the
owning lisp object is garbage collected. Unlike some other languages,
Elisp doesn’t provide <a href="http://en.wikipedia.org/wiki/Finalizer">finalizers</a> for this job, so what do
you do?</p>

<p><strong>Solution</strong>: This is Emacs Lisp. We can just add this feature to the
language ourselves!</p>

<p>I’ve already implemented this feature as a package called <code class="language-plaintext highlighter-rouge">finalize</code>,
available on MELPA. I will be using it as part of a larger, upcoming
project.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-finalize">https://github.com/skeeto/elisp-finalize</a></li>
</ul>

<p>In this article I will describe how it works.</p>

<h3 id="processes-and-buffers">Processes and Buffers</h3>

<p>Processes and buffers are special types of objects. Immediately after
instantiation these objects are added to a global list. They will
never become unreachable without explicitly being killed. The garbage
collector will never manage them for you.</p>

<p>This is a problem for APIs like those provided by the url package. The
functions <code class="language-plaintext highlighter-rouge">url-retrieve</code> and <code class="language-plaintext highlighter-rouge">url-retrieve-synchronously</code> create
buffers and hand them back to their callers. Ownership is transferred
to the caller and the caller must be careful to kill the buffer, or
transfer ownership again, before it returns. Otherwise the buffer is
“leaked.” The url package tries to manage this a little bit with
<code class="language-plaintext highlighter-rouge">url-gc-dead-buffers</code>, but this can’t be relied upon.</p>

<p>Another issue is when a process is started and is stored in a struct
or some other kind of object. There is probably a “close” function
that accepts one of these structs and kills the process. But if that
function isn’t called, due to a bug or an error condition, the process
is left “dangling.” If the struct is completely lost, it will
probably be inconvenient to deal with the process — the “close”
function is no longer useful.</p>

<h3 id="with-macros">With Macros</h3>

<p>A common way to deal with this problem is using a <code class="language-plaintext highlighter-rouge">with-</code> macro. This
macro establishes a resource, evaluates a body, and ensures the
resource is properly cleaned up regardless of the body’s termination
state. The latter is accomplished using <code class="language-plaintext highlighter-rouge">unwind-protect</code>. For example,
<code class="language-plaintext highlighter-rouge">with-temp-buffer</code>,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Fetch the first 10 bytes of foo.txt</span>
<span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"foo.txt"</span> <span class="no">nil</span> <span class="mi">0</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">buffer-string</span><span class="p">))</span>
</code></pre></div></div>

<p>This expands (roughly) to the following expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">temp-buffer</span> <span class="p">(</span><span class="nv">generate-new-buffer</span> <span class="s">"*temp*"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">temp-buffer</span>
    <span class="p">(</span><span class="k">unwind-protect</span>
        <span class="p">(</span><span class="k">progn</span>
          <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"foo.txt"</span> <span class="no">nil</span> <span class="mi">0</span> <span class="mi">10</span><span class="p">)</span>
          <span class="p">(</span><span class="nv">buffer-string</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nv">buffer-live-p</span> <span class="nv">temp-buffer</span><span class="p">)</span>
           <span class="p">(</span><span class="nv">kill-buffer</span> <span class="nv">temp-buffer</span><span class="p">)))))</span>
</code></pre></div></div>

<p>For dealing with open files, Common Lisp has <code class="language-plaintext highlighter-rouge">with-open-stream</code>. It
establishes a binding for a new stream over its body and ensures the
stream is closed when the body is complete. There’s no chance for a
stream to be left open, leaking a system resource.</p>
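
<p>The same pattern is easy to write for other resources. Here is a
minimal sketch for a subprocess. The macro name <code class="language-plaintext highlighter-rouge">demo-with-process</code>
and the process name are hypothetical, and the usage assumes a
<code class="language-plaintext highlighter-rouge">sleep</code> program is available.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical macro in the style of `with-temp-buffer'.
(defmacro demo-with-process (spec &amp;rest body)
  "Run BODY with a process bound, then delete the process.
SPEC has the form (VAR PROGRAM ARGS...)."
  (declare (indent 1))
  (let ((var (car spec))
        (command (cdr spec)))
    `(let ((,var (start-process "demo" nil ,@command)))
       (unwind-protect
           (progn ,@body)
         ;; Runs on normal exit, on error, and on non-local exit.
         (when (process-live-p ,var)
           (delete-process ,var))))))

;; The process only lives for the extent of the body.
(demo-with-process (proc "sleep" "60")
  (process-live-p proc))
</code></pre></div></div>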

<p>However, <code class="language-plaintext highlighter-rouge">with-</code> macros aren’t useful in asynchronous situations. In
Emacs this would be the case for asynchronous sub-processes, such as
an attached language interpreter. The extent of the process goes
beyond a single body.</p>

<h3 id="finalizers">Finalizers</h3>

<p>What would really be useful is to have a callback — a finalizer —
that runs when an object is garbage collected. This ensures that the
resource will not outlive its owner, restoring management back to the
garbage collector. However, Emacs provides no such hook.</p>

<p>Fortunately this feature can be built using weak hash tables and the
<code class="language-plaintext highlighter-rouge">post-gc-hook</code>, a list of functions that are run immediately after
garbage collection.</p>

<h4 id="weak-references">Weak References</h4>

<p>I’ve discussed before <a href="/blog/2012/12/17/">how to create weak references in Elisp</a>.
The only weak references in Emacs are built into weak hash tables.
Normally the language provides weak references first and hash tables
are built on top of them. With Emacs we do this backwards.</p>

<p>The <code class="language-plaintext highlighter-rouge">make-hash-table</code> function accepts a key argument <code class="language-plaintext highlighter-rouge">:weakness</code> to
specify how strongly keys and values should be held by the table. To
make a weak reference just create a hash table of size 1 and set
<code class="language-plaintext highlighter-rouge">:weakness</code> to t.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">weak-ref</span> <span class="p">(</span><span class="nv">thing</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">ref</span> <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:size</span> <span class="mi">1</span> <span class="ss">:weakness</span> <span class="no">t</span> <span class="ss">:test</span> <span class="ss">'eq</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="nv">ref</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">gethash</span> <span class="no">t</span> <span class="nv">ref</span><span class="p">)</span> <span class="nv">thing</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">deref</span> <span class="p">(</span><span class="nv">ref</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">gethash</span> <span class="no">t</span> <span class="nv">ref</span><span class="p">))</span>
</code></pre></div></div>

<p>The same trick can be used to detect when an object is garbage
collected. If the result of <code class="language-plaintext highlighter-rouge">deref</code> is nil, then the object was
garbage collected. (Or the weakly-referenced object <em>is</em> nil, but this
object will never be garbage collected anyway.)</p>

<p>To check if we need to run a finalizer all we have to do is create a
weak reference to the object, then check the reference after garbage
collection. This check can be done in a <code class="language-plaintext highlighter-rouge">post-gc-hook</code> function.</p>

<h4 id="registration">Registration</h4>

<p>To avoid cluttering up <code class="language-plaintext highlighter-rouge">post-gc-hook</code> with one closure per object
we’ll keep a register of all watched objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">finalizable-objects</span> <span class="p">())</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">register</span> <span class="p">(</span><span class="nv">object</span> <span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">push</span> <span class="p">(</span><span class="nb">cons</span> <span class="p">(</span><span class="nv">weak-ref</span> <span class="nv">object</span><span class="p">)</span> <span class="nv">callback</span><span class="p">)</span> <span class="nv">finalizable-objects</span><span class="p">))</span>
</code></pre></div></div>

<p>Now a function to check for missing objects, <code class="language-plaintext highlighter-rouge">try-finalize</code>.</p>

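<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'cl-lib</span><span class="p">)</span> <span class="c1">; provides cl-remove-if and cl-remove-if-not</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">try-finalize</span> <span class="p">()</span>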
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">alive</span> <span class="p">(</span><span class="nv">cl-remove-if-not</span> <span class="nf">#'</span><span class="nv">deref</span> <span class="nv">finalizable-objects</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">dead</span> <span class="p">(</span><span class="nv">cl-remove-if</span> <span class="nf">#'</span><span class="nv">deref</span> <span class="nv">finalizable-objects</span> <span class="ss">:key</span> <span class="nf">#'</span><span class="nb">car</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="nv">finalizable-objects</span> <span class="nv">alive</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">mapc</span> <span class="nf">#'</span><span class="nb">funcall</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="nf">#'</span><span class="nb">cdr</span> <span class="nv">dead</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'post-gc-hook</span> <span class="nf">#'</span><span class="nv">try-finalize</span><span class="p">)</span>
</code></pre></div></div>

<p>Now to try it out. Create a process, stuff it in a vector (like a
defstruct), register <code class="language-plaintext highlighter-rouge">delete-process</code> as a finalizer, and, for the
sake of demonstration, immediately forget the vector.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"ping"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="s">"localhost"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">register</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">process</span><span class="p">)</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">delete-process</span> <span class="nv">process</span><span class="p">))))</span>

<span class="c1">;; Assuming the garbage collector has not already run.</span>
<span class="p">(</span><span class="nv">get-process</span> <span class="s">"ping"</span><span class="p">)</span>
<span class="c1">;; =&gt; #&lt;process ping&gt;</span>

<span class="c1">;; Force garbage collection.</span>
<span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>

<span class="p">(</span><span class="nv">get-process</span> <span class="s">"ping"</span><span class="p">)</span>
<span class="c1">;; =&gt; nil</span>
</code></pre></div></div>

<p>The garbage collector killed the process for us!</p>

<p>There are some problems with this implementation. Using <code class="language-plaintext highlighter-rouge">cl-remove-if</code>
is unwise in a <code class="language-plaintext highlighter-rouge">post-gc-hook</code> function. It allocates lots of new cons
cells but garbage collection is inhibited while the function is run.
The docstring warns us:</p>

<blockquote>
  <p>Garbage collection is inhibited while the hook functions run, so be
careful writing them.</p>
</blockquote>

<p>Similarly, all of the finalizers are run within the context of this
memory-sensitive hook. Instead they should be delayed until the next
evaluation turn (i.e. <code class="language-plaintext highlighter-rouge">run-at-time</code> of 0). Some of the finalizers
could also fail, which would cause the remaining finalizers to never
run. The real implementation deals with all of these issues.</p>
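
<p>A minimal sketch of those two fixes, deferring each finalizer with
<code class="language-plaintext highlighter-rouge">run-at-time</code> and isolating failures with <code class="language-plaintext highlighter-rouge">condition-case</code>,
might look like the following. The <code class="language-plaintext highlighter-rouge">demo-</code> names are made up for
illustration and are not taken from the real package.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-call-safely (finalizer)
  "Call FINALIZER, reporting any error instead of propagating it."
  (condition-case err
      (funcall finalizer)
    (error (message "Finalizer failed: %S" err))))

(defun demo-run-later (finalizers)
  "Schedule each function in FINALIZERS for the next evaluation turn."
  (dolist (finalizer finalizers)
    ;; A timer of 0 seconds fires the next time Emacs processes
    ;; timers, which is after the hook has returned. Each finalizer
    ;; gets its own timer, so one failure can't block the others.
    (run-at-time 0 nil #'demo-call-safely finalizer)))
</code></pre></div></div>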

<p>A major drawback to these Emacs Lisp finalizers compared to other
languages is that the actual object is not available. We don’t know
it’s getting collected until after it’s already gone. This solves the
object resurrection problem, but it’s darn inconvenient. One possible
workaround in the case of defstructs and EIEIO objects is to make a
copy of the original object (<code class="language-plaintext highlighter-rouge">copy-sequence</code> or <code class="language-plaintext highlighter-rouge">clone</code>) and run the
finalizer on the copy as if it was the original.</p>

<h3 id="the-real-implementation">The Real Implementation</h3>

<p>The real implementation is more carefully namespaced and its API has
just one function: <code class="language-plaintext highlighter-rouge">finalize-register</code>. It works just like <code class="language-plaintext highlighter-rouge">register</code>
above but it accepts <code class="language-plaintext highlighter-rouge">&amp;rest</code> arguments to be passed to the finalizer.
This makes the registration call simpler and avoids some
<a href="/blog/2013/12/30/#the_readable_closures_catch">significant problems with closures</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"ping"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="s">"localhost"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">finalize-register</span> <span class="p">(</span><span class="nb">vector</span> <span class="nv">process</span><span class="p">)</span> <span class="nf">#'</span><span class="nv">delete-process</span> <span class="nv">process</span><span class="p">))</span>
</code></pre></div></div>

<p>Here’s a more formal example of how it might really be used.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-defstruct</span> <span class="p">(</span><span class="nv">pinger</span> <span class="p">(</span><span class="ss">:constructor</span> <span class="nv">pinger--create</span><span class="p">))</span>
  <span class="nv">process</span> <span class="nv">host</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">pinger-create</span> <span class="p">(</span><span class="nv">host</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"pinger"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="nv">host</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">object</span> <span class="p">(</span><span class="nv">pinger--create</span> <span class="ss">:process</span> <span class="nv">process</span> <span class="ss">:host</span> <span class="nv">host</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">finalize-register</span> <span class="nv">object</span> <span class="nf">#'</span><span class="nv">delete-process</span> <span class="nv">process</span><span class="p">)</span>
    <span class="nv">object</span><span class="p">))</span>
</code></pre></div></div>

<p>To make things cleaner for EIEIO classes there’s also a <code class="language-plaintext highlighter-rouge">finalizable</code>
mixin class that ensures the <code class="language-plaintext highlighter-rouge">finalize</code> generic function is called on
a copy of the object (the original object is gone) when it’s garbage
collected.</p>

<p>Here’s how it would be used for the same “pinger” concept, this time
as an EIEIO class. An advantage here is that anyone can manually call
<code class="language-plaintext highlighter-rouge">finalize</code> early if desired.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'eieio</span><span class="p">)</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'finalizable</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defclass</span> <span class="nv">pinger</span> <span class="p">(</span><span class="nv">finalizable</span><span class="p">)</span>
  <span class="p">((</span><span class="nv">process</span> <span class="ss">:initarg</span> <span class="ss">:process</span> <span class="ss">:reader</span> <span class="nv">pinger-process</span><span class="p">)</span>
   <span class="p">(</span><span class="nv">host</span> <span class="ss">:initarg</span> <span class="ss">:host</span> <span class="ss">:reader</span> <span class="nv">pinger-host</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">pinger-create</span> <span class="p">(</span><span class="nv">host</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'pinger</span>
                 <span class="ss">:process</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"ping"</span> <span class="no">nil</span> <span class="s">"ping"</span> <span class="nv">host</span><span class="p">)</span>
                 <span class="ss">:host</span> <span class="nv">host</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">finalize</span> <span class="p">((</span><span class="nv">pinger</span> <span class="nv">pinger</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">delete-process</span> <span class="p">(</span><span class="nv">pinger-process</span> <span class="nv">pinger</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s a small package but I think it can be quite handy.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Measure Elisp Object Memory Usage with Calipers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/01/26/"/>
    <id>urn:uuid:3ba9664d-2758-30c8-6b33-2c17835575d1</id>
    <updated>2014-01-26T01:15:02Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>A couple of weeks ago I wrote a library to measure the retained memory
footprint of arbitrary Elisp objects for the purposes of optimization.
It’s called Caliper.</p>

<ul>
  <li><a href="https://github.com/skeeto/caliper">https://github.com/skeeto/caliper</a></li>
</ul>

<p>Note, Caliper requires <a href="/blog/2013/12/18/">predd, my predicate dispatch library</a>.
Neither of these packages is on MELPA or Marmalade since they’re
mostly for fun.</p>

<p>The reason I wanted this was that I came across a post on reddit where
someone had <a href="http://old.reddit.com/r/datasets/comments/1uyd0t/">scraped 217,000 <em>Jeopardy!</em> questions</a> from
<a href="http://www.j-archive.com/">J! Archive</a> and dumped them out into a single, large JSON
file. The significance of the effort is that it dealt with some of the
inconsistencies of <em>J! Archive</em>’s data presentation, normalizing them
for the JSON output.</p>

<ul>
  <li><a href="https://skeeto.s3.amazonaws.com/share/JEOPARDY_QUESTIONS1.json.gz">JEOPARDY_QUESTIONS1.json.gz</a> (12MB, 53MB uncompressed)</li>
</ul>

<p>When I want to examine a JSON dataset like this I have three preferred
options:</p>

<ul>
  <li>Load it into a browser page and poke at it from JavaScript remotely
with <a href="https://github.com/skeeto/skewer-mode">Skewer</a>. With the JSON text weighing in at 53MB and
with such a large object count, I decided this was too large for a
browser page. It definitely <em>could</em> be done, it’s just that the
browser is not the place to be working on large datasets.</li>
  <li>Load it into Clojure. I’m familiar with Clojure’s
<a href="https://github.com/clojure/data.json">data.json</a>. This is not a bad choice, but there’s
something else I always reach for first if I can.</li>
  <li>Load it into Emacs using json.el (part of Emacs). This is what I
ended up doing.</li>
</ul>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">jeopardy</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"/tmp/JEOPARDY_QUESTIONS1.json"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">json-read</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">length</span> <span class="nv">jeopardy</span><span class="p">)</span>
<span class="c1">;; =&gt; 216930</span>
</code></pre></div></div>

<p>Here, <code class="language-plaintext highlighter-rouge">jeopardy</code> is bound to a vector of 216,930 association lists
(alists). I’m curious exactly how much heap memory this data structure
is using. To find out, we need to walk the data structure and sum the
sizes of everything we come across. However, care must be taken not to
count identical objects twice, such as symbols, which, being
interned, appear many times in this data.</p>

<h3 id="measuring-object-sizes">Measuring Object Sizes</h3>

<p>This is lisp so let’s start with the cons cell. A cons cell is just a
pair of pointers, called <em>car</em> and <em>cdr</em>.</p>

<p><img src="/img/diagram/cons.png" alt="" /></p>

<p>These are used to assemble lists.</p>

<p><img src="/img/diagram/list.png" alt="" /></p>

<p>So a cons cell itself — the <em>shallow</em> size — is two words: 16 bytes
on a 64-bit operating system. To make sure Elisp doesn’t happen to
have any additional information attached to cons cells, let’s take a
look at the Emacs source code.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Lisp_Cons</span>
  <span class="p">{</span>
    <span class="cm">/* Car of this cons cell.  */</span>
    <span class="n">Lisp_Object</span> <span class="n">car</span><span class="p">;</span>

    <span class="k">union</span>
    <span class="p">{</span>
      <span class="cm">/* Cdr of this cons cell.  */</span>
      <span class="n">Lisp_Object</span> <span class="n">cdr</span><span class="p">;</span>

      <span class="cm">/* Used to chain conses on a free list.  */</span>
      <span class="k">struct</span> <span class="n">Lisp_Cons</span> <span class="o">*</span><span class="n">chain</span><span class="p">;</span>
    <span class="p">}</span> <span class="n">u</span><span class="p">;</span>
  <span class="p">};</span>
</code></pre></div></div>

<p>The return value from <code class="language-plaintext highlighter-rouge">garbage-collect</code> backs this up. The first value
after each type is the shallow size of that type. From here on, all
values have been computed for 64-bit Emacs running on x86-64
GNU/Linux.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">garbage-collect</span><span class="p">)</span>
<span class="c1">;; =&gt; ((conses 16 9923172 2036943)</span>
<span class="c1">;;     (symbols 48 57017 54)</span>
<span class="c1">;;     (miscs 40 10203 18892)</span>
<span class="c1">;;     (strings 32 4810027 197961)</span>
<span class="c1">;;     (string-bytes 1 104599635)</span>
<span class="c1">;;     (vectors 16 103138)</span>
<span class="c1">;;     (vector-slots 8 2921744 131076)</span>
<span class="c1">;;     (floats 8 12494 5816)</span>
<span class="c1">;;     (intervals 56 119911 69249)</span>
<span class="c1">;;     (buffers 960 134)</span>
<span class="c1">;;     (heap 1024 593412 133853))</span>
</code></pre></div></div>

<p>A <code class="language-plaintext highlighter-rouge">Lisp_Object</code> is just a pointer to a lisp object. The <em>retained</em>
size of a cons cell is its shallow size plus, recursively, the
retained size of the objects in its car and cdr.</p>

<h4 id="integers-and-floats">Integers and Floats</h4>

<p>Integers are a special case. Elisp uses what is called <em>tagged
integers</em>. They’re not heap-allocated objects. Instead they’re
embedded inside the object pointers. That is, those <code class="language-plaintext highlighter-rouge">Lisp_Object</code>
pointers in <code class="language-plaintext highlighter-rouge">Lisp_Cons</code> will hold integers directly. This means that,
to Caliper, integers have a retained size of 0. We can use this to verify
Caliper’s return value for cons cells.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="mi">100</span><span class="p">)</span>
<span class="c1">;; =&gt; 0</span>

<span class="p">(</span><span class="nv">caliper-object-size</span> <span class="p">(</span><span class="nb">cons</span> <span class="mi">100</span> <span class="mi">200</span><span class="p">))</span>
<span class="c1">;; =&gt; 16</span>
</code></pre></div></div>

<p>Tagged integers are fast and save on memory. They also compare
properly with <code class="language-plaintext highlighter-rouge">eq</code>, which is just a pointer (identity) comparison.
However, because a few bits need to be reserved for differentiating
them from actual pointers these integers have a restricted dynamic
range.</p>

<p>Floats are not tagged and exist as immutable objects in the heap.
That’s why <code class="language-plaintext highlighter-rouge">eql</code> is still useful in Elisp — it’s like <code class="language-plaintext highlighter-rouge">eq</code> but will
handle numbers properly. (By convention you should use <code class="language-plaintext highlighter-rouge">eql</code> for
integers, too.)</p>
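
<p>To see the difference, compare two separately computed floats of
equal value. This is only a small sketch. What <code class="language-plaintext highlighter-rouge">eq</code> returns for
floats isn’t something to rely on either way, so only the <code class="language-plaintext highlighter-rouge">eql</code>
result is shown.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((a (+ 1.0 0.5))
      (b (+ 1.0 0.5)))
  ;; a and b hold the same value but are generally distinct heap
  ;; objects, so `eq' can't be trusted here. `eql' compares the
  ;; numeric value instead.
  (eql a b))
;; =&gt; t
</code></pre></div></div>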

<h4 id="symbols-and-strings">Symbols and Strings</h4>

<p>Not counting the string’s contents, a string’s base size is 32 bytes
according to <code class="language-plaintext highlighter-rouge">garbage-collect</code>. The <code class="language-plaintext highlighter-rouge">length</code> of the string can’t be
used here because that counts characters, which vary in size. There’s
a <code class="language-plaintext highlighter-rouge">string-bytes</code> function for this. A string’s size is 32 plus its
<code class="language-plaintext highlighter-rouge">string-bytes</code> value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">string-bytes</span> <span class="s">"naïveté"</span><span class="p">)</span>
<span class="c1">;; =&gt; 9</span>
<span class="p">(</span><span class="nv">caliper-object-size</span> <span class="s">"naïveté"</span><span class="p">)</span>
<span class="c1">;; =&gt; 41  (i.e. 32 + 9)</span>
</code></pre></div></div>

<p>As you can see from above, symbols are <em>huge</em>. Without even counting
either the string holding the name of the symbol or the symbol’s
plist, a symbol is 48 bytes.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="ss">'hello</span><span class="p">)</span>
<span class="c1">;; =&gt; 1038</span>
</code></pre></div></div>

<p>This 1,038 bytes is a little misleading. The symbol itself is 48
bytes, the string <code class="language-plaintext highlighter-rouge">"hello"</code> is 37 bytes, and the plist is nil. The
retained size of <code class="language-plaintext highlighter-rouge">nil</code> is significant. On my system, nil’s plist has 4
key-value pairs, which themselves have retained sizes. When examining
symbols, caliper doesn’t care if they’re interned or not, including
symbols like <code class="language-plaintext highlighter-rouge">nil</code> and <code class="language-plaintext highlighter-rouge">t</code>. However, nil is only counted once, so it
will have little impact on a large data structure.</p>

<h4 id="miscellaneous">Miscellaneous</h4>

<p>Outside of vectors, measuring object sizes starts to get fuzzy. For
example, it’s not possible to examine the exact internals of a hash
table from Elisp. We can see its contents and the number of elements
it can hold without re-sizing, but there’s intermediate structure
that’s not visible. Caliper makes rough estimates for each of these
types.</p>

<h4 id="circularity-and-double-counting">Circularity and Double Counting</h4>

<p>To avoid double counting objects, a hash table with a test of <code class="language-plaintext highlighter-rouge">eq</code> is
dynamically bound by the top level call. It’s used like a set. Before
an object is examined, the hash table is checked. If the object is
listed, the reported size is 0 (it consumes no space beyond what is
already accounted for).</p>

<p>This automatically solves the circularity problem. There’s no way we
can traverse into the same data structure a second time because we’ll
stop when we see it twice.</p>
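
<p>The approach can be sketched in a few lines. This is not Caliper’s
code: <code class="language-plaintext highlighter-rouge">demo-retained-size</code> is a hypothetical name, it only understands
conses, strings, and integers, it passes the set of seen objects as an
argument rather than binding it dynamically, and it hard-codes the
64-bit sizes reported by <code class="language-plaintext highlighter-rouge">garbage-collect</code> above.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-retained-size (object &amp;optional seen)
  "Rough retained size of OBJECT in bytes, counting each object once.
SEEN is an `eq' hash table used as a set of visited objects."
  (let ((seen (or seen (make-hash-table :test 'eq))))
    (cond ((integerp object) 0)      ; assumed to be a tagged integer
          ((gethash object seen) 0)  ; already counted (or circular)
          (t
           ;; Mark before descending so cycles terminate.
           (puthash object t seen)
           (cond ((consp object)
                  ;; Plain recursion: fine for small structures, but
                  ;; very long lists would need an iterative walk.
                  (+ 16
                     (demo-retained-size (car object) seen)
                     (demo-retained-size (cdr object) seen)))
                 ((stringp object)
                  (+ 32 (string-bytes object)))
                 (t 0))))))          ; other types ignored in this sketch

(demo-retained-size (cons 100 200))
;; =&gt; 16

(let ((s "abc"))
  (demo-retained-size (list s s)))
;; =&gt; 67  (two conses at 16 each, plus "abc" counted once at 35)

(let ((c (list 1 2)))
  (setcdr (cdr c) c)                 ; make the list circular
  (demo-retained-size c))
;; =&gt; 32
</code></pre></div></div>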

<h3 id="using-caliper">Using Caliper</h3>

<p>So what’s the total retained size of the <code class="language-plaintext highlighter-rouge">jeopardy</code> structure? About
124MB.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="nv">jeopardy</span><span class="p">)</span>
<span class="c1">;; =&gt; 130430198</span>
</code></pre></div></div>

<p>For fun, let’s see how much we can improve on this.</p>

<p>json.el will return alists for objects by default, but this can be
changed by setting <code class="language-plaintext highlighter-rouge">json-object-type</code> to something else. Initially I
thought maybe using plists instead would save space, but I later
realized that <strong>plists use exactly the same number of cons cells as
alists</strong>. If this doesn’t sound right, try to picture the cons cells
in your head (an exercise for the reader).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">jeopardy</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">json-object-type</span> <span class="ss">'plist</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">with-temp-buffer</span>
      <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="s">"~/JEOPARDY_QUESTIONS1.json"</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">json-read</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">caliper-object-size</span> <span class="nv">jeopardy</span><span class="p">)</span>
<span class="c1">;; =&gt; 130430077 (plist)</span>
</code></pre></div></div>

<p>Strangely this is 121 bytes smaller. I don’t know why yet, but in the
scope of 124MB that’s nothing.</p>
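
<p>The cons cell claim is easy to check on a small example.
<code class="language-plaintext highlighter-rouge">demo-count-conses</code> is a hypothetical helper, and it assumes a tree
with no shared or circular structure.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-count-conses (object)
  "Count the cons cells reachable from OBJECT."
  (if (consp object)
      (+ 1
         (demo-count-conses (car object))
         (demo-count-conses (cdr object)))
    0))

;; alist: one cell per list element plus one cell per pair
(demo-count-conses '((a . 1) (b . 2)))
;; =&gt; 4

;; plist: one cell per key plus one cell per value
(demo-count-conses '(:a 1 :b 2))
;; =&gt; 4
</code></pre></div></div>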

<p>So what do these questions look like?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">elt</span> <span class="nv">jeopardy</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1">;; =&gt; (:show_number "4680"</span>
<span class="c1">;;     :round "Jeopardy!"</span>
<span class="c1">;;     :answer "Copernicus"</span>
<span class="c1">;;     :value "$200"</span>
<span class="c1">;;     :question "..." ;; omitted</span>
<span class="c1">;;     :air_date "2004-12-31"</span>
<span class="c1">;;     :category "HISTORY")</span>
</code></pre></div></div>

<p>They’re (now) plists of 7 pairs. All of the keys are symbols, and, as
such, are interned and consuming very little memory. All of the values
are strings. Surely we can do better here. The strings can be interned
and the numbers can be turned into tagged integers. The :category
values would probably be good candidates for conversion into symbols.</p>

<p>Here’s an interesting fact about Jeopardy! that can be exploited for
our purposes. While Jeopardy! covers a broad range of trivia,
<a href="http://vimeo.com/29001512">it does so very shallowly</a>. The same answers appear many
times. For example, the very first answer from our dataset,
Copernicus, appears 14 times. That makes even the answers good
candidates for interning.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="nv">question</span> <span class="nv">across</span> <span class="nv">jeopardy</span>
         <span class="nv">for</span> <span class="nv">answer</span> <span class="nb">=</span> <span class="p">(</span><span class="nv">plist-get</span> <span class="nv">question</span> <span class="ss">:answer</span><span class="p">)</span>
         <span class="nb">count</span> <span class="p">(</span><span class="nb">string=</span> <span class="nv">answer</span> <span class="s">"Copernicus"</span><span class="p">))</span>
<span class="c1">;; =&gt; 14</span>
</code></pre></div></div>

<p>A string pool is trivial to implement. Just use a weak, <code class="language-plaintext highlighter-rouge">equal</code> hash
table to track strings. Making it weak keeps it from leaking memory by
holding onto strings for longer than necessary.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">string-pool</span>
  <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:test</span> <span class="ss">'equal</span> <span class="ss">:weakness</span> <span class="no">t</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">intern-string</span> <span class="p">(</span><span class="nb">string</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">gethash</span> <span class="nb">string</span> <span class="nv">string-pool</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">gethash</span> <span class="nb">string</span> <span class="nv">string-pool</span><span class="p">)</span> <span class="nb">string</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">jeopardy-fix</span> <span class="p">(</span><span class="nv">question</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">cl-loop</span> <span class="nv">for</span> <span class="p">(</span><span class="nv">key</span> <span class="nv">value</span><span class="p">)</span> <span class="nv">on</span> <span class="nv">question</span> <span class="nv">by</span> <span class="nf">#'</span><span class="nb">cddr</span>
           <span class="nv">collect</span> <span class="nv">key</span>
           <span class="nv">collect</span> <span class="p">(</span><span class="nv">cl-case</span> <span class="nv">key</span>
                     <span class="p">(</span><span class="ss">:show_number</span> <span class="p">(</span><span class="nb">read</span> <span class="nv">value</span><span class="p">))</span>
                     <span class="p">(</span><span class="ss">:value</span> <span class="p">(</span><span class="k">if</span> <span class="nv">value</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nv">substring</span> <span class="nv">value</span> <span class="mi">1</span><span class="p">))))</span>
                     <span class="p">(</span><span class="ss">:category</span> <span class="p">(</span><span class="nb">intern</span> <span class="nv">value</span><span class="p">))</span>
                     <span class="p">(</span><span class="nv">otherwise</span> <span class="p">(</span><span class="nv">intern-string</span> <span class="nv">value</span><span class="p">)))))</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">jeopardy-interned</span>
  <span class="p">(</span><span class="nv">cl-map</span> <span class="ss">'vector</span> <span class="nf">#'</span><span class="nv">jeopardy-fix</span> <span class="nv">jeopardy</span><span class="p">))</span>
</code></pre></div></div>

<p>So how are we looking now?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">caliper-object-size</span> <span class="nv">jeopardy-interned</span><span class="p">)</span>
<span class="c1">;; =&gt; 83254322</span>
</code></pre></div></div>

<p>That’s down to 79MB of memory. Not bad! If we <code class="language-plaintext highlighter-rouge">print-circle</code> this,
taking advantage of string interning in the printed representation, I
wonder how it compares to the original JSON.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-temp-buffer</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">print-circle</span> <span class="no">t</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">prin1</span> <span class="nv">jeopardy-interned</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">buffer-size</span><span class="p">)))</span>
<span class="c1">;; =&gt; 45554437</span>
</code></pre></div></div>

<p>About 44MB, down from JSON’s 53MB. With <code class="language-plaintext highlighter-rouge">print-circle</code> set to nil it’s about
48MB.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Byte-code Internals</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2014/01/04/"/>
    <id>urn:uuid:c03869b5-fca0-3f9e-8dda-c3f361b287a8</id>
    <updated>2014-01-04T05:07:26Z</updated>
    <category term="emacs"/><category term="lisp"/><category term="elisp"/><category term="lang"/>
    <content type="html">
      <![CDATA[<p>Byte-code compilation is an underdocumented — and in the case of the
recent lexical binding updates, undocumented — part of Emacs. Most
users know that Elisp is usually compiled into a byte-code saved to
<code class="language-plaintext highlighter-rouge">.elc</code> files, and that byte-code loads and runs faster than uncompiled
Elisp. That’s all users really need to know, and the <em>GNU Emacs Lisp
Reference Manual</em> specifically discourages poking around too much.</p>

<blockquote>
  <p><strong>People do not write byte-code;</strong> that job is left to the byte
compiler. But we provide a disassembler to satisfy a cat-like
curiosity.</p>
</blockquote>

<p>Screw that! What if I want to handcraft some byte-code myself? :-) The
purpose of this article is to introduce the internals of the Elisp
byte-code interpreter. I will explain how it works, why lexically
scoped code is faster, and demonstrate writing some byte-code by hand.</p>

<h3 id="the-humble-stack-machine">The Humble Stack Machine</h3>

<p>The byte-code interpreter is a simple stack machine. The stack holds
arbitrary lisp objects. The interpreter is backwards compatible but
not forwards compatible (old versions can’t run new byte-code). Each
instruction is between 1 and 3 bytes. The first byte is the opcode and
the second and third bytes are either a single operand or a single
immediate value. Some operands are packed into the opcode byte.</p>
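
<p>As a small illustration, the bytes of a unibyte byte-code string can
be listed with <code class="language-plaintext highlighter-rouge">append</code>. The string below is the two-instruction
program that shows up in later examples. The opcode names in the
comments are my reading of the definitions in bytecomp.el and should
be checked against your Emacs version.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(append "\300\207" nil)
;; =&gt; (192 135)

;; 192 is the "constant" opcode with operand 0 packed into the opcode
;; byte, so the whole instruction is 1 byte. 135 is "return".
</code></pre></div></div>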

<p>As of this writing (Emacs 24.3) there are 142 opcodes, 6 of which have
been declared obsolete. Most opcodes refer to commonly used built-in
functions for fast access. (Looking at the selection, Elisp really is
geared towards text!) Considering packed operands, there are up to 27
potential opcodes unused, reserved for the future.</p>

<ul>
  <li>opcodes 48 - 55</li>
  <li>opcode 97</li>
  <li>opcode 128</li>
  <li>opcodes 169 - 174</li>
  <li>opcodes 180 - 181</li>
  <li>opcodes 183 - 191</li>
</ul>

<p>The easiest place to access the opcode listing is in
<a href="http://cvs.savannah.gnu.org/viewvc/emacs/emacs/lisp/emacs-lisp/bytecomp.el?view=markup">bytecomp.el</a>. Beware that some of the opcode comments are
currently out of date.</p>

<h3 id="segmentation-fault-warning">Segmentation Fault Warning</h3>

<p>Byte-code does not offer the same safety as normal Elisp. <strong>Bad
byte-code can, and will, cause Emacs to crash.</strong> You can try it out for
yourself right now,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'
</code></pre></div></div>

<p>Or evaluate the code manually in a buffer (save everything first!),</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="err">#</span><span class="nv">[0</span> <span class="s">"\300\207"</span> <span class="nv">[]</span> <span class="nv">0]</span><span class="p">)</span>
</code></pre></div></div>

<p>This segfault, caused by referencing beyond the end of the constants
vector, is <em>not</em> an Emacs bug. Doing a boundary test would slow down
the byte-code interpreter. Not performing this test at run-time is a
practical engineering decision. The Emacs developers have instead
chosen to rely on valid byte-code output from the compiler, making a
disclaimer to anyone wanting to write their own byte-code,</p>

<blockquote>
  <p>You should not try to come up with the elements for a byte-code
function yourself, because if they are inconsistent, Emacs may crash
when you call the function. Always leave it to the byte compiler to
create these objects; it makes the elements consistent (we hope).</p>
</blockquote>

<p>You’ve been warned. Now it’s time to start playing with firecrackers.</p>

<h3 id="the-byte-code-object">The Byte-code Object</h3>

<p>A byte-code object is functionally equivalent to a normal Elisp vector
<em>except</em> that it can be evaluated as a function. Elements are accessed
in constant time, the syntax is similar to vector syntax (<code class="language-plaintext highlighter-rouge">[...]</code> vs.
<code class="language-plaintext highlighter-rouge">#[...]</code>), and it can be of any length, though valid functions must
have at least 4 elements.</p>

<p>There are two ways to create a byte-code object: using a byte-code
object literal or with <code class="language-plaintext highlighter-rouge">make-byte-code</code>.
<a href="/blog/2012/07/17/">Like vector literals</a>, byte-code literals don’t need to be
quoted.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">make-byte-code</span> <span class="mi">0</span> <span class="s">""</span> <span class="nv">[]</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "" [] 0]</span>

<span class="err">#</span><span class="nv">[1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="nv">4]</span>
<span class="c1">;; =&gt; #[1 2 3 4]</span>

<span class="p">(</span><span class="err">#</span><span class="nv">[0</span> <span class="s">""</span> <span class="nv">[]</span> <span class="nv">0]</span><span class="p">)</span>
<span class="c1">;; error: Invalid byte opcode</span>
</code></pre></div></div>

<p>The elements of an object literal are:</p>

<ul>
  <li>Function parameter (lambda) list</li>
  <li>Unibyte string of byte-code</li>
  <li>Constants vector</li>
  <li>Maximum stack usage</li>
  <li>Docstring (optional, nil for none)</li>
  <li>Interactive specification (optional)</li>
</ul>
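
<p>Because the object behaves like a vector, each of these slots can be
read back with <code class="language-plaintext highlighter-rouge">aref</code>. Here is a minimal sketch using a throwaway
function; the constant <code class="language-plaintext highlighter-rouge">:hello</code> is made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Byte-code: push constant 0, then return it.
(let ((fn (make-byte-code 0 "\300\207" [:hello] 1)))
  (list (aref fn 0)     ; parameter descriptor
        (aref fn 2)     ; constants vector
        (aref fn 3)     ; maximum stack usage
        (funcall fn)))  ; call it like any other function
;; =&gt; (0 [:hello] 1 :hello)
</code></pre></div></div>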

<h4 id="parameter-list">Parameter List</h4>

<p>The parameter list takes on two different forms depending on whether the
function is lexically or dynamically scoped. If the function is
dynamically scoped, the argument list is exactly what appears in lisp
code.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="k">&amp;optional</span> <span class="nv">c</span><span class="p">)))</span>
<span class="c1">;; =&gt; #[(a b &amp;optional c) "\300\207" [nil] 1]</span>
</code></pre></div></div>

<p>There’s really no shorter way to represent the parameter list because
preserving the argument names is critical. Remember that, in dynamic
scope, while the function body is being evaluated these variables are
<em>globally</em> bound (eww!) to the function’s arguments.</p>
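
<p>To see what “globally bound” means in practice, here is a small
sketch that uses <code class="language-plaintext highlighter-rouge">let</code> on a special variable to stand in for
argument binding. The names <code class="language-plaintext highlighter-rouge">my-x</code> and <code class="language-plaintext highlighter-rouge">my-peek</code> are hypothetical. The
binding is visible to any function called while it is in effect, then
disappears.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-x :global)    ; special, so always dynamically scoped

(defun my-peek ()        ; reads whatever my-x is bound to right now
  my-x)

(list (let ((my-x :argument))  ; dynamic binding, as for an argument
        (my-peek))             ; the callee sees the temporary value
      (my-peek))               ; afterwards the old value is back
;; =&gt; (:argument :global)
</code></pre></div></div>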

<p>When the function is lexically scoped, the parameter list is packed
into an Elisp integer, indicating the counts of the different kinds of
parameters: required, <code class="language-plaintext highlighter-rouge">&amp;optional</code>, and <code class="language-plaintext highlighter-rouge">&amp;rest</code>.</p>

<p><img src="/img/diagram/elisp-params.png" alt="" /></p>

<p>The least significant 7 bits indicate the number of required
arguments. Notice that this limits compiled, lexically-scoped
functions to 127 required arguments. The 8th bit is the number of
<code class="language-plaintext highlighter-rouge">&amp;rest</code> arguments (up to 1). The remaining bits indicate the total
number of optional and required arguments (not counting <code class="language-plaintext highlighter-rouge">&amp;rest</code>). It’s
really easy to parse these in your head when viewed as hexadecimal
because each portion almost always fits inside its own “digit.”</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">())</span>
<span class="c1">;; =&gt; #x000  (0 args, 0 rest, 0 required)</span>

<span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
<span class="c1">;; =&gt; #x202  (2 args, 0 rest, 2 required)</span>

<span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="k">&amp;optional</span> <span class="nv">c</span><span class="p">))</span>
<span class="c1">;; =&gt; #x302  (3 args, 0 rest, 2 required)</span>

<span class="p">(</span><span class="nv">byte-compile-make-args-desc</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="k">&amp;optional</span> <span class="nv">c</span> <span class="k">&amp;rest</span> <span class="nv">d</span><span class="p">))</span>
<span class="c1">;; =&gt; #x382  (3 args, 1 rest, 2 required)</span>
</code></pre></div></div>

<p>The names of the arguments don’t matter in lexical scope: they’re
purely positional. This tighter argument specification is one of the
reasons lexical scope is faster: the byte-code interpreter doesn’t
need to parse the entire lambda list and assign all of the variables
on each function invocation.</p>
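
<p>Going the other direction is just as mechanical. The following is a
sketch of a decoder for the layout described above. The helper
<code class="language-plaintext highlighter-rouge">my-decode-args-desc</code> is hypothetical and not part of Emacs.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-decode-args-desc (desc)
  "Unpack the integer parameter descriptor DESC into a plist."
  (list :required (logand desc #x7f)                   ; low 7 bits
        :rest     (if (zerop (logand desc #x80)) 0 1)  ; 8th bit
        :total    (ash desc -8)))                      ; remaining bits

(my-decode-args-desc #x382)
;; =&gt; (:required 2 :rest 1 :total 3)
</code></pre></div></div>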

<h4 id="unibyte-string-byte-code">Unibyte String Byte-code</h4>

<p>The second element is a unibyte string — it strictly holds octets and
is not to be interpreted as any sort of Unicode encoding. These
strings should be created with <code class="language-plaintext highlighter-rouge">unibyte-string</code> because <code class="language-plaintext highlighter-rouge">string</code> may
return a multibyte string. To disambiguate the string type to the lisp
reader when higher values are present (&gt; 127), the strings are printed
in an escaped octal notation, keeping the string literal inside the
ASCII character set.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">unibyte-string</span> <span class="mi">100</span> <span class="mi">200</span> <span class="mi">250</span><span class="p">)</span>
<span class="c1">;; =&gt; "d\310\372"</span>
</code></pre></div></div>
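
<p>The difference between the two constructors is easy to check with
<code class="language-plaintext highlighter-rouge">multibyte-string-p</code>. A quick sketch:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(multibyte-string-p (string 200))          ; 200 is treated as a character
;; =&gt; t

(multibyte-string-p (unibyte-string 200))  ; 200 is treated as an octet
;; =&gt; nil

(append (unibyte-string 100 200 250) nil)  ; the octets come back unchanged
;; =&gt; (100 200 250)
</code></pre></div></div>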

<p>It’s unusual to see a byte-code string that doesn’t end with 135
(#o207, byte-return). Perhaps this should have been implicit? I’ll
talk more about the byte-code below.</p>

<h4 id="constants-vector">Constants Vector</h4>

<p>The byte-code has very limited operands. Most operands are only a few
bits, some fill an entire byte, and occasionally two bytes. The meat
of the function that holds all the constants, function symbols, and
variable symbols is the constants vector. It’s a normal Elisp vector
and can be created with <code class="language-plaintext highlighter-rouge">vector</code> or a vector literal. Operands
reference either this vector or they index into the stack itself.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nv">my-func</span> <span class="nv">b</span> <span class="nv">a</span><span class="p">)))</span>
<span class="c1">;; =&gt; #[(a b) "\302\010\011\042\207" [b a my-func] 3]</span>
</code></pre></div></div>

<p>Note that the constants vector lists the variable symbols as well as
the external function symbol. If this was a lexically scoped function
the constants vector wouldn’t have the variables listed, being only
<code class="language-plaintext highlighter-rouge">[my-func]</code>.</p>

<h4 id="maximum-stack-usage">Maximum Stack Usage</h4>

<p>This is the maximum stack space used by this byte-code. This value can
be derived from the byte-code itself, but it’s pre-computed so that
the byte-code interpreter can quickly check for stack overflow.
Under-reporting this value is probably another way to crash Emacs.</p>

<h4 id="docstring">Docstring</h4>

<p>The simplest component and completely optional. It’s either the
docstring itself, or if the docstring is especially large it’s a cons
cell indicating a compiled <code class="language-plaintext highlighter-rouge">.elc</code> and a position for lazy access. Only
one position, the start, is needed because the lisp reader is used to
load it and it knows how to recognize the end.</p>

<h4 id="interactive-specification">Interactive Specification</h4>

<p>If this element is present and non-nil then the function is an
interactive function. It holds the exact contents of <code class="language-plaintext highlighter-rouge">interactive</code>
in the uncompiled function definition.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nv">interactive</span> <span class="s">"nNumber: "</span><span class="p">)</span> <span class="nv">n</span><span class="p">))</span>
<span class="c1">;; =&gt; #[(n) "\010\207" [n] 1 nil "nNumber: "]</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nv">interactive</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">read</span><span class="p">)))</span> <span class="nv">n</span><span class="p">))</span>
<span class="c1">;; =&gt; #[(n) "\010\207" [n] 1 nil (list (read))]</span>
</code></pre></div></div>

<p>The interactive expression is always interpreted, never byte-compiled.
This is usually fine because, by definition, this code is going to be
waiting on user input. However, it slows down keyboard macro playback.</p>

<h3 id="opcodes">Opcodes</h3>

<p>The bulk of the established opcode bytes is for variable, stack, and
constant access opcodes, most of which use packed operands.</p>

<ul>
  <li>0 - 7   : (<code class="language-plaintext highlighter-rouge">stack-ref</code>) stack reference</li>
  <li>8 - 15  : (<code class="language-plaintext highlighter-rouge">varref</code>) variable reference (from constants vector)</li>
  <li>16 - 23 : (<code class="language-plaintext highlighter-rouge">varset</code>) variable set (from constants vector)</li>
  <li>24 - 31 : (<code class="language-plaintext highlighter-rouge">varbind</code>) variable binding (from constants vector)</li>
  <li>32 - 39 : (<code class="language-plaintext highlighter-rouge">call</code>) function call (immediate = number of arguments)</li>
  <li>40 - 47 : (<code class="language-plaintext highlighter-rouge">unbind</code>) variable unbinding (immediate = number of bindings to undo)</li>
  <li>129, 192-255 : (<code class="language-plaintext highlighter-rouge">constant</code>) direct constants vector access</li>
</ul>

<p>Except for the last item, each kind of instruction comes in sets of 8.
The nth such instruction means access the nth thing. For example, the
instruction “<code class="language-plaintext highlighter-rouge">2</code>” copies the third stack item to the top of the stack.
An instruction of “<code class="language-plaintext highlighter-rouge">9</code>” pushes onto the stack the value of the
variable named by the second element listed in the constants vector.</p>

<p>However, the 7th and 8th such instructions in each set take an operand
byte or two. The 7th instruction takes a 1-byte operand and the 8th
takes a 2-byte operand. A 2-byte operand is written in little-endian
byte-order regardless of the host platform.</p>
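
<p>The grouping is regular enough that a byte can be classified with a
shift and a mask. This is a rough sketch, not how Emacs itself decodes
anything: <code class="language-plaintext highlighter-rouge">my-classify-opcode</code> is a hypothetical helper, and it lumps
every other opcode together as <code class="language-plaintext highlighter-rouge">other</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-classify-opcode (byte)
  "Return (NAME . N) for the packed-operand opcode BYTE."
  (let ((names [stack-ref varref varset varbind call unbind]))
    (cond ((&lt; byte 48)                        ; six sets of 8
           (cons (aref names (ash byte -3))   ; which set
                 (logand byte 7)))            ; which of the 8 (6 and 7
                                              ; mean operand bytes follow)
          ((&gt;= byte 192)
           (cons 'constant (- byte 192)))
          (t (cons 'other byte)))))

;; This string has no operand bytes, so every byte is an opcode.
(mapcar #'my-classify-opcode "\302\010\011\042\207")
;; =&gt; ((constant . 2) (varref . 0) (varref . 1) (call . 2) (other . 135))
</code></pre></div></div>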

<p>For example, let’s manually craft an instruction that returns the
value of the global variable <code class="language-plaintext highlighter-rouge">foo</code>. Each opcode has a named constant
of <code class="language-plaintext highlighter-rouge">byte-X</code> so we don’t have to worry about their actual byte-code
number.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'bytecomp</span><span class="p">)</span>  <span class="c1">; named opcodes</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo</span> <span class="s">"hello"</span><span class="p">)</span>

<span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>                 <span class="c1">; no arguments</span>
    <span class="p">(</span><span class="nv">unibyte-string</span>
      <span class="p">(</span><span class="nb">+</span> <span class="mi">0</span> <span class="nv">byte-varref</span><span class="p">)</span>   <span class="c1">; ref variable under first constant</span>
      <span class="nv">byte-return</span><span class="p">)</span>        <span class="c1">; pop and return</span>
    <span class="nv">[foo]</span>                 <span class="c1">; constants</span>
    <span class="mi">1</span><span class="p">))</span>                   <span class="c1">; only using 1 stack space</span>

<span class="p">(</span><span class="nv">get-foo</span><span class="p">)</span>
<span class="c1">;; =&gt; "hello"</span>
</code></pre></div></div>

<p>Ta-da! That’s a handcrafted byte-code function. I left a “+ 0” in
there so that I can change the offset. This function has exactly the
same behavior; it’s just less optimal:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>
    <span class="p">(</span><span class="nv">unibyte-string</span>
      <span class="p">(</span><span class="nb">+</span> <span class="mi">3</span> <span class="nv">byte-varref</span><span class="p">)</span>     <span class="c1">; 4th form of varref</span>
      <span class="nv">byte-return</span><span class="p">)</span>
    <span class="nv">[nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">foo]</span>
    <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">foo</code> was the 10th constant, we would need to use the 1-byte
operand version. Again, the same behavior, just less optimal.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>
    <span class="p">(</span><span class="nv">unibyte-string</span>
      <span class="p">(</span><span class="nb">+</span> <span class="mi">6</span> <span class="nv">byte-varref</span><span class="p">)</span>     <span class="c1">; 7th form of varref</span>
      <span class="mi">9</span>                     <span class="c1">; operand, (constant index 9)</span>
      <span class="nv">byte-return</span><span class="p">)</span>
    <span class="nv">[nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">foo]</span>
    <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>
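
<p>The 2-byte operand form can be sketched the same way. The operand is
written low byte first, so constant index 9 becomes the bytes 9 and 0.
The names <code class="language-plaintext highlighter-rouge">my-foo</code> and <code class="language-plaintext highlighter-rouge">my-get-foo-wide</code> are made up for this example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'bytecomp)  ; named opcodes

(defvar my-foo "hello")

(defalias 'my-get-foo-wide
  (make-byte-code
    #x000
    (unibyte-string
      (+ 7 byte-varref)     ; 8th form of varref
      9 0                   ; 2-byte operand, little-endian (index 9)
      byte-return)
    [nil nil nil nil nil nil nil nil nil my-foo]
    1))

(my-get-foo-wide)
;; =&gt; "hello"
</code></pre></div></div>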

<p>Dynamically-scoped code makes heavy use of <code class="language-plaintext highlighter-rouge">varref</code> but
lexically-scoped code rarely uses it (global variables only), instead
relying heavily on <code class="language-plaintext highlighter-rouge">stack-ref</code>, which is faster. This is where the
different calling conventions come into play.</p>

<h3 id="calling-convention">Calling Convention</h3>

<p>Each kind of scope gets its own calling convention. Here we finally
get to glimpse some of the really great work by Stefan Monnier
updating the compiler for lexical scope.</p>

<h4 id="dynamic-scope-calling-convention">Dynamic Scope Calling Convention</h4>

<p>Remembering back to the parameter list element of the byte-code
object, a dynamically scoped function keeps track of all its argument
names. Before executing a function the interpreter examines the lambda
list and binds (<code class="language-plaintext highlighter-rouge">varbind</code>) every variable globally to an argument.</p>

<p>If the caller was byte-compiled, each argument started on the stack,
was popped and bound to a variable, and, to be accessed by the
function, is pushed right back onto the stack (<code class="language-plaintext highlighter-rouge">varref</code>). There’s
a lot of argument indirection for each function call.</p>

<h4 id="lexical-scope-calling-convention">Lexical Scope Calling Convention</h4>

<p>With lexical scope, the argument names are not actually bound while
the byte-code is evaluated. The names are completely gone because the
compiler has converted local variables into stack offsets.</p>

<p>When calling a lexically-scoped function, the byte-code interpreter
examines the integer parameter descriptor. It checks to make sure the
appropriate number of arguments have been provided, and for each
unprovided <code class="language-plaintext highlighter-rouge">&amp;optional</code> argument it pushes a nil onto the stack. If the
function has a <code class="language-plaintext highlighter-rouge">&amp;rest</code> parameter, any extra arguments are popped off
into a list and that list is pushed onto the stack.</p>

<p>From here the function can access its arguments directly on the stack
without any named variable indirection. It can even consume them
directly.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t -*-</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span>

<span class="p">(</span><span class="nb">symbol-function</span> <span class="nf">#'</span><span class="nv">foo</span><span class="p">)</span>
<span class="c1">;; =&gt; #[#x101 "\207" [] 2]</span>
</code></pre></div></div>

<p>The byte-code for <code class="language-plaintext highlighter-rouge">foo</code> is a single instruction: <code class="language-plaintext highlighter-rouge">return</code>. The
function’s argument is already on the stack so it doesn’t have to do
anything. Strangely the maximum stack usage element is wrong here (2),
but it won’t cause a crash.</p>
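
<p>The same convention can be exercised by hand. Below is a sketch of
two handcrafted lexical functions, both with hypothetical names: one
picks its first argument off the stack with <code class="language-plaintext highlighter-rouge">stack-ref</code>, and the other
shows the nil pushed for a missing <code class="language-plaintext highlighter-rouge">&amp;optional</code> argument.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'bytecomp)  ; named opcodes

(defalias 'my-first-of-two
  (make-byte-code
    #x202                     ; 2 args, 0 rest, 2 required
    (unibyte-string
      (+ 1 byte-stack-ref)    ; copy the item just below the top
      byte-return)
    []
    3))                       ; two arguments plus one copy

(my-first-of-two :a :b)
;; =&gt; :a

(defalias 'my-optional-identity
  (make-byte-code
    #x100                     ; 1 arg, 0 rest, 0 required
    (unibyte-string byte-return)
    []
    1))

(my-optional-identity)
;; =&gt; nil
(my-optional-identity 5)
;; =&gt; 5
</code></pre></div></div>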

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; (As of this writing `byte-compile' always uses dynamic scope.)</span>

<span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; =&gt; #[(x) "\010\207" [x] 1]</span>
</code></pre></div></div>

<p>It takes longer to set up (x is implicitly bound), it has to make an
explicit variable dereference (<code class="language-plaintext highlighter-rouge">varref</code>), then it has to clean up by
unbinding x (implicit <code class="language-plaintext highlighter-rouge">unbind</code>). It’s no wonder lexical scope is
faster!</p>

<p>Note that there’s also a <code class="language-plaintext highlighter-rouge">disassemble</code> function for examining
byte-code, but it only reveals part of the story.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">disassemble</span> <span class="nf">#'</span><span class="nv">foo</span><span class="p">)</span>
<span class="c1">;; byte code:</span>
<span class="c1">;;   args: (x)</span>
<span class="c1">;; 0       varref    x</span>
<span class="c1">;; 1       return</span>
</code></pre></div></div>

<h3 id="compiler-intermediate-lapcode">Compiler Intermediate “lapcode”</h3>

<p>The Elisp byte-compiler has an intermediate language called lapcode
(“Lisp Assembly Program”), which is much easier to optimize than
byte-code. It’s basically an assembly language built out of
s-expressions. Opcodes are referenced by name and operands, including
packed operands, are handled whole. Each instruction is a cons cell,
<code class="language-plaintext highlighter-rouge">(opcode . operand)</code>, and a program is a list of these.</p>

<p>Let’s rewrite our last <code class="language-plaintext highlighter-rouge">get-foo</code> using lapcode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defalias</span> <span class="ss">'get-foo</span>
  <span class="p">(</span><span class="nv">make-byte-code</span>
    <span class="m">#x000</span>
    <span class="p">(</span><span class="nv">byte-compile-lapcode</span>
      <span class="o">'</span><span class="p">((</span><span class="nv">byte-varref</span> <span class="o">.</span> <span class="mi">9</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">byte-return</span><span class="p">)))</span>
    <span class="nv">[nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">foo]</span>
    <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>

<p>We didn’t have to worry about which form of <code class="language-plaintext highlighter-rouge">varref</code> we were using or
even how to encode a 2-byte operand. The lapcode “assembler” took care
of that detail.</p>

<h3 id="project-ideas">Project Ideas?</h3>

<p>The Emacs byte-code compiler and interpreter are fascinating. Having
spent time studying them I’m really tempted to build a project on top
of it all. Perhaps implementing a programming language that targets
the byte-code interpreter, improving compiler optimization, or, for a
really big project, JIT compiling Emacs byte-code.</p>

<p><strong>People <em>can</em> write byte-code!</strong></p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Lisp Readable Closures</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/30/"/>
    <id>urn:uuid:84f86fb6-e029-3a57-1450-1d25be3fdee0</id>
    <updated>2013-12-30T23:52:38Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<p>I’ve stated before that one of the unique features of Emacs Lisp is
that its closures are <em>readable</em>. Closures can be serialized by the
printer and read back in with the reader. I am unaware of any other
programming language that has this feature. In fact it’s essential for
Elisp byte-code compilation because byte-compiled Elisp files are
merely s-expressions of byte-code dumped out as source.</p>

<h3 id="lisp-printing">Lisp Printing</h3>

<p>The languages of the Lisp family are <em>homoiconic</em>. Lisp source code is
written in the syntax of its own data structures, s-expressions. Since
a compiler/interpreter is usually provided at run-time, a consequence
of this is that reading and printing are fundamental features of
Lisps. A value can be handed to the printer, which will serialize the
value into an s-expression as a sequence of characters. Later on the
reader can parse the s-expression back into an <code class="language-plaintext highlighter-rouge">equal</code> value.</p>
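
<p>As a quick illustration in Elisp, here is a round trip through the
printer and the reader. The sample value is arbitrary.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((value '(1 "two" [3 4.5] sym))
       (text  (prin1-to-string value))          ; printer
       (copy  (car (read-from-string text))))   ; reader
  (list text
        (equal value copy)   ; same structure and contents
        (eq value copy)))    ; but a freshly built object
;; =&gt; ("(1 \"two\" [3 4.5] sym)" t nil)
</code></pre></div></div>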

<p>To compare, JavaScript originally had half of this in place.
JavaScript has convenient object syntax for defining an associative
array, known today as JSON. The <code class="language-plaintext highlighter-rouge">eval</code> function could (dangerously) be
used as a reader for parsing a string containing JSON-encoded data
into a value. But until <code class="language-plaintext highlighter-rouge">JSON.stringify()</code> became standard, developers
had to write their own printer. Lisp s-expression syntax is much more
powerful (and complicated) than JSON, maintaining
<a href="/blog/2013/03/28/">both identity and cycles</a> (e.g. <code class="language-plaintext highlighter-rouge">*print-circle*</code>).</p>

<p>Not all values can be read. They’ll still print (when <code class="language-plaintext highlighter-rouge">*print-readably*</code>
is nil) but will do so using special syntax that will signal an error
in the reader: <code class="language-plaintext highlighter-rouge">#&lt;</code>. For example, in Emacs Lisp buffers cannot be
serialized so they print using this syntax.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">prin1-to-string</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
<span class="c1">;; =&gt; "#&lt;buffer *scratch*&gt;"</span>
</code></pre></div></div>

<p>It doesn’t matter what’s between the angle brackets, or even that
there’s a closing angle bracket. The reader will signal an error as
soon as it hits a <code class="language-plaintext highlighter-rouge">#&lt;</code>.</p>
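
<p>The error can be observed by handing such a string straight to the
reader. A small sketch that catches the error and reports only its
condition name:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(condition-case err
    (read "#&lt;buffer *scratch*&gt;")
  (invalid-read-syntax (car err)))   ; report only the error symbol
;; =&gt; invalid-read-syntax
</code></pre></div></div>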

<h4 id="almost-everything-prints-readably">Almost Everything Prints Readably</h4>

<p>Elisp has a small set of primitive data types. All of these primitive
types print readably:</p>

<ul>
  <li>integer (<code class="language-plaintext highlighter-rouge">1024</code>, <code class="language-plaintext highlighter-rouge">?a</code>)</li>
  <li>float (<code class="language-plaintext highlighter-rouge">1.7</code>)</li>
  <li>cons/list (<code class="language-plaintext highlighter-rouge">(...)</code>)</li>
  <li>vector (one-dimensional, <code class="language-plaintext highlighter-rouge">[...]</code>)</li>
  <li>bool-vector (<code class="language-plaintext highlighter-rouge">#&amp;n"..."</code>)</li>
  <li>string (<code class="language-plaintext highlighter-rouge">"..."</code>)</li>
  <li>char-table (<code class="language-plaintext highlighter-rouge">#^[...]</code>)</li>
  <li>hash-table (readable as of Emacs 23.3, <code class="language-plaintext highlighter-rouge">#s(hash-table ...)</code>)</li>
  <li>byte-code function object (<code class="language-plaintext highlighter-rouge">#[...]</code>)</li>
  <li>symbol</li>
</ul>

<p>Here are all the non-readable types. Each one has a good reason for
not being serializable.</p>

<ul>
  <li>buffer</li>
  <li>process (external state)</li>
  <li>frame (user interface element)</li>
  <li>marker (live, automatically updates)</li>
  <li>overlay (belongs to a buffer)</li>
  <li>built-in functions (native code)</li>
  <li>user-ptr (opaque pointers from Emacs 25 dynamic modules)</li>
</ul>

<p>And that’s it. Every other value in Elisp is constructed from one or
more of these primitives, including keymaps, functions, macros, syntax
tables, <code class="language-plaintext highlighter-rouge">defstruct</code> structs, and EIEIO objects. This means that as
long as these values don’t refer to an unreadable value, they
themselves can be printed.</p>

<p>An interesting note here is that, unlike the Common Lisp Object System
(CLOS), EIEIO objects are readable by default. To Elisp they’re just
vectors, so of course they print. CLOS objects are unreadable without
manually defining a print method per class.</p>

<h3 id="elisp-closures">Elisp Closures</h3>

<p>Elisp got lexical scoping in Emacs 24, released in June 2012. It’s now
one of the relatively few languages to have both dynamic and lexical
scope. Like Common Lisp, variables declared with <code class="language-plaintext highlighter-rouge">defvar</code> (and family)
continue to have dynamic scope. For backwards compatibility with old
Lisp code, lexical scope is disabled by default. It’s enabled for a
specific file or buffer by setting <code class="language-plaintext highlighter-rouge">lexical-binding</code> to non-nil.</p>

<p>With lexical scope, anonymous functions become closures, a powerful
functional programming primitive: a function plus a captured lexical
environment. It also provides some performance benefits. In my own
tests, compiled Elisp with lexical scope enabled is about 10% to 15%
faster than with the default dynamic scope.</p>

<p>What do closures look like in Emacs Lisp? They take one of two forms
depending on whether the closure is compiled or not. For example,
consider this function, <code class="language-plaintext highlighter-rouge">foo</code>, that takes two arguments and returns a
closure that returns the first argument.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">x</span><span class="p">))</span>

<span class="p">(</span><span class="nv">foo</span> <span class="ss">:bar</span> <span class="ss">:ignored</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((y . :ignored) (x . :bar) t) () x)</span>
</code></pre></div></div>

<p>An uncompiled closure is a list beginning with the symbol <code class="language-plaintext highlighter-rouge">closure</code>.
The second element is the lexical environment, the third is the
argument list (lambda list), and the rest is the body of the function.
Here we can see that both <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> have been “closed over.” This is
a little bit sloppy because the function never makes use of <code class="language-plaintext highlighter-rouge">y</code>.
Capturing it has a few problems.</p>

<ul>
  <li>The closure has a larger footprint than necessary.</li>
  <li>Values are held longer than necessary, delaying collection.</li>
  <li>It affects the readability of the closure, which I’ll get to later.</li>
</ul>

<p>Fortunately the compiler is smart enough to see this and will avoid
capturing unused variables. To prove this, I’ve now compiled <code class="language-plaintext highlighter-rouge">foo</code> so
that it returns a compiled closure.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">foo</span> <span class="ss">:bar</span> <span class="ss">:ignored</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" [:bar] 1]</span>
</code></pre></div></div>

<p>What’s returned here is a byte-code function object, with the <code class="language-plaintext highlighter-rouge">#[...]</code>
syntax. It has these elements:</p>

<ol>
  <li>The function’s lambda list (zero arguments)</li>
  <li>Byte-codes stored in a unibyte string</li>
  <li>Constants vector</li>
  <li>Maximum stack space needed by this function</li>
</ol>

<p>Notice that the lexical environment has been captured in the constants
vector, specifically noting the lack of <code class="language-plaintext highlighter-rouge">:ignored</code> in this vector. The
compiler didn’t capture it.</p>

<p>For those curious about the byte-code here’s an explanation. The
string syntax shown is in octal, representing a string containing two
bytes: 192 and 135. The
<a href="/blog/2014/01/04/">Elisp byte-code interpreter is stack-based</a>. The 192
(<code class="language-plaintext highlighter-rouge">constant 0</code>) says to push the first constant onto the stack. The 135
(<code class="language-plaintext highlighter-rouge">return</code>) says to pop the top element from the stack and return it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">coerce</span> <span class="s">"\300\207"</span> <span class="ss">'list</span><span class="p">)</span>
<span class="c1">;; =&gt; (192 135)</span>
</code></pre></div></div>

<h3 id="the-readable-closures-catch">The Readable Closures Catch</h3>

<p>Since closures are byte-code function objects, they print readably.
You can capture an environment in a closure, serialize it, read it
back in, and evaluate it. That’s pretty cool! This means closures can
be transmitted to other Emacs instances in a multi-processing setup
(e.g. <a href="https://github.com/nicferrier/elnode">Elnode</a>, <a href="https://github.com/jwiegley/emacs-async">Async</a>).</p>
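
<p>As a rough sketch of that round trip, the following prints a compiled
function to a string and reads it back within a single Emacs instance.
The helper name <code class="language-plaintext highlighter-rouge">my-roundtrip</code> is hypothetical, and a real
multi-process setup would send the string to another process instead.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-roundtrip (object)
  "Print OBJECT with the printer, then read it back (hypothetical helper)."
  (read (prin1-to-string object)))

;; Serialize a compiled function, revive it, and call the copy.
(let* ((original (byte-compile '(lambda () "0123")))
       (copy (my-roundtrip original)))
  (funcall copy))
</code></pre></div></div>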

<p>The catch is that it’s easy to accidentally capture an unreadable
value, especially buffers. Consider this function <code class="language-plaintext highlighter-rouge">bar</code> which uses a
temporary buffer as an efficient string builder. It returns a closure
that returns the result. (Weird, but stick with me here!)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">bar</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">i</span> <span class="nv">from</span> <span class="mi">0</span> <span class="nv">to</span> <span class="nv">n</span> <span class="nb">do</span> <span class="p">(</span><span class="nb">princ</span> <span class="nv">i</span><span class="p">))</span>
      <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">string</span> <span class="p">(</span><span class="nv">buffer-string</span><span class="p">)))</span>
        <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nb">string</span><span class="p">)))))</span>
</code></pre></div></div>

<p>The compiled form looks fine,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">bar</span> <span class="mi">3</span><span class="p">)</span>
<span class="c1">;; =&gt; #[0 "\300\207" ["0123"] 1]</span>
</code></pre></div></div>

<p>But the interpreted form of the closure has a problem. The
<code class="language-plaintext highlighter-rouge">with-temp-buffer</code> macro silently introduced a new binding — an
abstraction leak.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">bar</span> <span class="mi">3</span><span class="p">)</span>
<span class="c1">;; =&gt; (closure ((string . "0123")</span>
<span class="c1">;;              (temp-buffer . #&lt;killed buffer&gt;)</span>
<span class="c1">;;              (n . 3) t)</span>
<span class="c1">;;      () string)</span>
</code></pre></div></div>

<p>The temporary buffer is mistakenly captured in the closure, making it
unreadable, but <em>only</em> in its uncompiled form. This creates the
awkward situation where compiled and uncompiled code have <a href="/blog/2016/12/22/#accidental-closures">different
behavior</a>.</p>
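
<p>One way to catch this kind of accident early is to check whether a value
survives a trip through the printer and reader. This is only a sketch:
<code class="language-plaintext highlighter-rouge">my-readable-p</code> is a hypothetical helper, not something Emacs provides.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-readable-p (object)
  "Return non-nil if OBJECT prints to something that reads back as equal."
  (condition-case nil
      (equal object (read (prin1-to-string object)))
    ;; Buffers and similar objects print with a syntax the reader rejects.
    (error nil)))

;; Plain data passes the check.
(my-readable-p '((string . "0123") (n . 3)))

;; An environment holding a buffer does not.
(with-temp-buffer
  (my-readable-p (list (cons 'temp-buffer (current-buffer)))))
</code></pre></div></div>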

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Clojure-style Multimethods in Emacs Lisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/18/"/>
    <id>urn:uuid:029f9acb-a29f-3e58-14f3-457f245cdb5d</id>
    <updated>2013-12-18T23:06:15Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="clojure"/>
    <content type="html">
      <![CDATA[<p>This past week I added <a href="http://clojure.org/multimethods">Clojure-style multimethods</a> to Emacs
Lisp through a package I call <code class="language-plaintext highlighter-rouge">predd</code> (predicate dispatch). <strong>I
believe it is Elisp’s very first complete <em>multiple dispatch</em> object
system!</strong> That is, methods are dispatched based on the dynamic,
run-time type of <a href="http://en.wikipedia.org/wiki/Multimethods">more than one of its arguments</a>.</p>

<ul>
  <li><a href="https://github.com/skeeto/predd">https://github.com/skeeto/predd</a></li>
</ul>

<p>(Unfortunately I was unaware of the
<a href="https://github.com/kurisuwhyte/emacs-multi">other Clojure-style multimethod library</a> when I wrote mine.
However, my version is <em>much</em> more complete, has better performance,
and is public domain.)</p>

<p>As of version 23.2, Emacs includes a CLOS-like object system cleverly
named EIEIO. While CLOS (Common Lisp Object System) is multiple
dispatch, EIEIO is, like most object systems, only <em>single dispatch</em>.
The predd package is also very different than my other Elisp object
system, <a href="/blog/2013/04/07/">@</a>, which was prototype based and, therefore, also single
dispatch (and comically slow).</p>

<p>The <a href="http://clojure.org/multimethods">Clojure multimethods documentation</a> provides a good
introduction. The predd package works almost exactly the same way,
except that due to Elisp’s lack of namespacing the function names are
prefixed with <code class="language-plaintext highlighter-rouge">predd-</code>. Also different is that the optional hierarchy
(<code class="language-plaintext highlighter-rouge">h</code>) argument is handled by the dynamic variable <code class="language-plaintext highlighter-rouge">predd-hierarchy</code>,
which holds the global hierarchy.</p>

<h3 id="combination-example">Combination Example</h3>

<p>To define a multimethod, pick a name and give it a <em>classifier
function</em>. The classifier function will look at the method’s arguments
and return a <em>dispatch value</em>. This value is used to select a
particular method. What makes predd a multiple dispatch system is that
the dispatch value can be derived from any number of the method’s
arguments. Because the dispatch value is computed at run-time, this is
called <em>late binding</em>.</p>

<p>Here I’m going to define a multimethod called <code class="language-plaintext highlighter-rouge">combine</code> that takes two
arguments. It combines its arguments appropriately depending on their
dynamic run-time types.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmulti</span> <span class="nv">combine</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span> <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nb">type-of</span> <span class="nv">b</span><span class="p">)))</span>
  <span class="s">"Appropriately combine A and B."</span><span class="p">)</span>
</code></pre></div></div>

<p>The classifier uses <code class="language-plaintext highlighter-rouge">type-of</code>, an Elisp built-in, to examine its
argument types. It returns them as a tuple in the form of a vector. The
classifier of a method can be accessed with <code class="language-plaintext highlighter-rouge">predd-classifier</code>, which
I’ll use to demonstrate what these dispatch values will look like.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">predd-classifier</span> <span class="ss">'combine</span><span class="p">)</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">)</span>    <span class="c1">; =&gt; [integer integer]</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">predd-classifier</span> <span class="ss">'combine</span><span class="p">)</span> <span class="mi">1</span> <span class="s">"2"</span><span class="p">)</span>  <span class="c1">; =&gt; [integer string]</span>
</code></pre></div></div>

<p>I chose a vector for the dispatch value because I like the bracket
style when defining methods (you’ll see below). The dispatch value can
be literally anything that <code class="language-plaintext highlighter-rouge">equal</code> knows how to compare, not just
vectors. Note that it’s actually faster to create a list than a vector
up to a length of about 6, so this multimethod would be faster if the
classifier returned a list — or even better: a single cons.</p>

<p>Now define some methods for different dispatch values.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">combine</span> <span class="nv">[integer</span> <span class="nv">integer]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>

<span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">combine</span> <span class="nv">[string</span> <span class="nv">string]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">concat</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>

<span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">combine</span> <span class="nv">[cons</span> <span class="nv">cons]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">append</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
</code></pre></div></div>

<p>Now try it out.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">combine</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">)</span>            <span class="c1">; =&gt; 3</span>
<span class="p">(</span><span class="nv">combine</span> <span class="s">"a"</span> <span class="s">"b"</span><span class="p">)</span>        <span class="c1">; =&gt; "ab"</span>
<span class="p">(</span><span class="nv">combine</span> <span class="o">'</span><span class="p">(</span><span class="mi">1</span> <span class="mi">2</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="mi">3</span> <span class="mi">4</span><span class="p">))</span>  <span class="c1">; =&gt; (1 2 3 4)</span>

<span class="p">(</span><span class="nv">combine</span> <span class="mi">1</span> <span class="o">'</span><span class="p">(</span><span class="mi">3</span> <span class="mi">4</span><span class="p">))</span>
<span class="c1">; error: "No method found in combine for [integer cons]"</span>
</code></pre></div></div>

<p>Notice in the last case it didn’t know how to combine these two types,
so it threw an error. In this simple example each method only calls
a single function, so rather than use the <code class="language-plaintext highlighter-rouge">predd-defmethod</code> macro,
these methods can be added directly with the <code class="language-plaintext highlighter-rouge">predd-add-method</code>
function. This has the exact same result except that it has slightly
better performance (no wrapper functions).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[integer</span> <span class="nv">integer]</span> <span class="nf">#'</span><span class="nb">+</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[string</span> <span class="nv">string]</span>   <span class="nf">#'</span><span class="nv">concat</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[cons</span> <span class="nv">cons]</span>       <span class="nf">#'</span><span class="nb">append</span><span class="p">)</span>
</code></pre></div></div>
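
<p>To make the mechanics concrete, here is a rough sketch of
classifier-based dispatch in plain Elisp, without predd. It is not how
predd is implemented. The names <code class="language-plaintext highlighter-rouge">my-combine</code> and <code class="language-plaintext highlighter-rouge">my-combine-methods</code>
are hypothetical, and the method table is a simple alist searched with
<code class="language-plaintext highlighter-rouge">assoc</code>, which compares keys using <code class="language-plaintext highlighter-rouge">equal</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-combine-methods
  (list (cons [integer integer] #'+)
        (cons [string string] #'concat)
        (cons [cons cons] #'append))
  "Hypothetical method table mapping dispatch values to handlers.")

(defun my-combine (a b)
  "Combine A and B by dispatching on the run-time types of both."
  (let* ((value (vector (type-of a) (type-of b)))   ; the classifier step
         (method (cdr (assoc value my-combine-methods))))
    (if method
        (funcall method a b)
      (error "No method found for %S" value))))

(my-combine 1 2)
(my-combine "a" "b")
</code></pre></div></div>

<p>An exact-match table like this has no notion of one type being a kind
of another.</p>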

<h4 id="use-the-hierarchy">Use the Hierarchy</h4>

<p>Hmmm, the <code class="language-plaintext highlighter-rouge">+</code> function is already polymorphic. It seamlessly operates
on both floats and integers. So far it seems there’s no way to exploit
this with multimethods. Fortunately we can solve this by defining our
own ad hoc hierarchy using <code class="language-plaintext highlighter-rouge">predd-derive</code>. Both integers and floats
are a kind of number. It’s important to note that <code class="language-plaintext highlighter-rouge">type-of</code> never
returns <code class="language-plaintext highlighter-rouge">number</code>. We’re introducing that name here ourselves.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">type-of</span> <span class="mf">1.0</span><span class="p">)</span>  <span class="c1">; =&gt; float</span>

<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'integer</span> <span class="ss">'number</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'float</span> <span class="ss">'number</span><span class="p">)</span>

<span class="c1">;; Types can derive from multiple parents, like multiple inheritance</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'integer</span> <span class="ss">'exact</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'float</span> <span class="ss">'inexact</span><span class="p">)</span>
</code></pre></div></div>

<p>This says that <code class="language-plaintext highlighter-rouge">integer</code> and <code class="language-plaintext highlighter-rouge">float</code> are each a kind of <code class="language-plaintext highlighter-rouge">number</code>. Now
we can use <code class="language-plaintext highlighter-rouge">number</code> in a dispatch value. When it sees something like
<code class="language-plaintext highlighter-rouge">[float integer]</code> it knows that it matches <code class="language-plaintext highlighter-rouge">[number number]</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-add-method</span> <span class="ss">'combine</span> <span class="nv">[number</span> <span class="nv">number]</span> <span class="nf">#'</span><span class="nb">+</span><span class="p">)</span>

<span class="p">(</span><span class="nv">combine</span> <span class="mf">1.5</span> <span class="mi">2</span><span class="p">)</span>  <span class="c1">; =&gt; 3.5</span>
</code></pre></div></div>

<p>We can check the hierarchy explicitly with <code class="language-plaintext highlighter-rouge">predd-isa-p</code> (like
Clojure’s <code class="language-plaintext highlighter-rouge">isa?</code>). It compares two values just like <code class="language-plaintext highlighter-rouge">equal</code>, but it
also accounts for all <code class="language-plaintext highlighter-rouge">predd-derive</code> declarations. Because of this
extra concern, unlike <code class="language-plaintext highlighter-rouge">equal</code>, <code class="language-plaintext highlighter-rouge">predd-isa-p</code> is <em>not</em> symmetric: swapping its arguments can change the result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-isa-p</span> <span class="ss">'number</span> <span class="ss">'number</span><span class="p">)</span>  <span class="c1">; =&gt; 0</span>
<span class="p">(</span><span class="nv">predd-isa-p</span> <span class="ss">'float</span> <span class="ss">'number</span><span class="p">)</span>   <span class="c1">; =&gt; 1</span>
<span class="p">(</span><span class="nv">predd-isa-p</span> <span class="ss">'number</span> <span class="ss">'float</span><span class="p">)</span>   <span class="c1">; =&gt; nil</span>

<span class="p">(</span><span class="nv">predd-isa-p</span> <span class="nv">[float</span> <span class="nv">float]</span> <span class="nv">[number</span> <span class="nv">number]</span><span class="p">)</span>  <span class="c1">; =&gt; 2</span>
</code></pre></div></div>

<p>(Remember that <code class="language-plaintext highlighter-rouge">0</code> is truthy in Elisp.) The integer returned is a
distance metric used by method dispatch to determine which values are
“closer” so that the most appropriate method is selected.</p>
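
<p>As a rough illustration of such a distance metric, here is a sketch
written under my own assumptions rather than taken from predd’s source.
The names <code class="language-plaintext highlighter-rouge">my-parents</code> and <code class="language-plaintext highlighter-rouge">my-isa-p</code> are hypothetical. It assumes that
distances for vectors are summed element by element, which is consistent
with the <code class="language-plaintext highlighter-rouge">2</code> shown above but is otherwise a guess.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-parents '((integer number exact)
                     (float number inexact))
  "Hypothetical hierarchy: each entry is (CHILD PARENT...).")

(defun my-isa-p (child parent)
  "Return the distance from CHILD up to PARENT, or nil if unrelated."
  (cond
   ((equal child parent) 0)
   ((and (vectorp child) (vectorp parent)
         (= (length child) (length parent)))
    ;; Vectors match element by element and the distances are summed.
    ;; TOTAL becomes nil as soon as any pair of elements is unrelated.
    (let ((total 0) (i 0))
      (while (and total (not (= i (length child))))
        (let ((d (my-isa-p (aref child i) (aref parent i))))
          (setq total (and d (+ total d))
                i (1+ i))))
      total))
   (t
    ;; Otherwise walk up through the declared parents, keeping the
    ;; shortest path found.
    (let ((best nil))
      (dolist (p (cdr (assq child my-parents)) best)
        (let ((d (my-isa-p p parent)))
          (when d
            (setq best (if best (min best (1+ d)) (1+ d))))))))))

(my-isa-p 'float 'number)
(my-isa-p [float float] [number number])
</code></pre></div></div>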

<p>You might be worried that introducing <code class="language-plaintext highlighter-rouge">number</code> will make the
multimethod slower. Examining the hierarchy will definitely have a
cost after all. Fortunately predd has a dispatch cache, so
introducing this indirection will have <em>no</em> additional performance
penalty after the first call with a particular dispatch value.</p>
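
<p>The general idea of such a cache can be sketched with a hash table
keyed on the dispatch value. This is an assumption-laden illustration,
not predd’s code: all of the <code class="language-plaintext highlighter-rouge">my-</code> names are hypothetical, and
<code class="language-plaintext highlighter-rouge">my-search-methods</code> merely stands in for the expensive hierarchy search.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my-dispatch-cache (make-hash-table :test 'equal)
  "Hypothetical cache mapping a dispatch value to its chosen method.")

(defvar my-search-count 0
  "How many times the slow search ran; kept only for illustration.")

(defun my-search-methods (value)
  "Stand-in for the expensive hierarchy search on VALUE."
  (setq my-search-count (1+ my-search-count))
  (if (equal value [float integer]) #'+ #'list))

(defun my-lookup (value)
  "Return the method for VALUE, searching only on a cache miss."
  (or (gethash value my-dispatch-cache)
      (puthash value (my-search-methods value) my-dispatch-cache)))

;; The second call is answered from the cache without another search.
(my-lookup [float integer])
(my-lookup [float integer])
</code></pre></div></div>

<p>A real cache would presumably also need to be cleared whenever methods
or derivations change.</p>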

<h3 id="struct-example">Struct Example</h3>

<p>Something that really sets these multimethods apart from other object
systems is a lack of concern about encapsulation — or really about
object data in general. That’s the classifier’s concern. So here’s an
example of how to combine predd with <code class="language-plaintext highlighter-rouge">defstruct</code> from cl/cl-lib.</p>

<p>Imagine we’re making some kind of game where each of the creatures is
represented by an <code class="language-plaintext highlighter-rouge">actor</code> struct. Each actor has a name, hit points,
and active status effects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defstruct</span> <span class="nv">actor</span>
  <span class="p">(</span><span class="nv">name</span> <span class="s">"Unknown"</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">hp</span> <span class="mi">100</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">statuses</span> <span class="p">()))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">defstruct</code> macro has a useful inheritance feature that we can
exploit for our game to create subtypes. The parent accessors will
work on these subtypes, immediately providing some (efficient)
polymorphism even before multimethods are involved.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defstruct</span> <span class="p">(</span><span class="nv">player</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">actor</span><span class="p">))</span>
  <span class="nv">control-scheme</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defstruct</span> <span class="p">(</span><span class="nv">stinkmonster</span> <span class="p">(</span><span class="ss">:include</span> <span class="nv">actor</span><span class="p">))</span>
  <span class="p">(</span><span class="k">type</span> <span class="ss">'sewage</span><span class="p">))</span>

<span class="p">(</span><span class="nv">actor-hp</span> <span class="p">(</span><span class="nv">make-stinkmonster</span><span class="p">))</span>  <span class="c1">; =&gt; 100</span>
</code></pre></div></div>

<p>As a side note: this isn’t necessarily the best way to go about
modeling a game. We probably shouldn’t be relying on inheritance too
much, but bear with me for this example.</p>

<p>Say we want an <code class="language-plaintext highlighter-rouge">attack</code> method for handling attacks between different
types of monsters. Elisp structs have a very useful property by
default: they’re simply vectors whose first element is a symbol
denoting the struct’s type. We can use this in a multimethod classifier.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">make-player</span><span class="p">)</span>
<span class="c1">;; =&gt; [cl-struct-player "Unknown" 100 nil nil]</span>

<span class="p">(</span><span class="nv">predd-defmulti</span> <span class="nv">attack</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">attacker</span> <span class="nv">victim</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">vector</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">attacker</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">aref</span> <span class="nv">victim</span> <span class="mi">0</span><span class="p">)))</span>
  <span class="s">"Perform an attack from ATTACKER on VICTIM."</span><span class="p">)</span>
</code></pre></div></div>
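
<p>One caveat: the vector representation depends on the Emacs version. To
the best of my knowledge, in Emacs 26 and later <code class="language-plaintext highlighter-rouge">cl-defstruct</code> instances
are records rather than plain vectors, and <code class="language-plaintext highlighter-rouge">type-of</code> returns the struct
name directly. Here is a sketch of an equivalent classifier under that
assumption, using hypothetical <code class="language-plaintext highlighter-rouge">my-</code> prefixed structs so it doesn’t clash
with the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical structs mirroring the actor example.
(cl-defstruct my-actor
  (name "Unknown")
  (hp 100))

(cl-defstruct (my-player (:include my-actor))
  control-scheme)

(defun my-attack-classifier (attacker victim)
  "Return a dispatch value from the struct types of ATTACKER and VICTIM."
  ;; Assumes Emacs 26 or later, where type-of reports the struct name.
  (vector (type-of attacker) (type-of victim)))

(my-attack-classifier (make-my-player) (make-my-actor))
</code></pre></div></div>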

<p>Let’s define a base case. This will be overridden by more specific
methods (determined by that distance metric).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">attack</span> <span class="nv">[cl-struct-actor</span> <span class="nv">cl-struct-actor]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">v</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">decf</span> <span class="p">(</span><span class="nv">actor-hp</span> <span class="nv">v</span><span class="p">)</span> <span class="mi">10</span><span class="p">))</span>
</code></pre></div></div>

<p>We could have instead used <code class="language-plaintext highlighter-rouge">:default</code> for the dispatch value, which is
a special catch-all value. The <code class="language-plaintext highlighter-rouge">actor-hp</code> function will signal an
error for any non-actor victim anyway. However, not using <code class="language-plaintext highlighter-rouge">:default</code>
will force both argument types to be checked. It will also demonstrate
specialization for the example.</p>

<p>However, before we can make use of this we need to teach predd about
the relationship between these structs. It doesn’t check <code class="language-plaintext highlighter-rouge">defstruct</code>
hierarchies. This step is what makes combining <code class="language-plaintext highlighter-rouge">defstruct</code> and predd
a little unwieldy. A wrapper macro is probably due for this.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'cl-struct-player</span> <span class="ss">'cl-struct-actor</span><span class="p">)</span>
<span class="p">(</span><span class="nv">predd-derive</span> <span class="ss">'cl-struct-stinkmonster</span> <span class="ss">'cl-struct-actor</span><span class="p">)</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">player</span> <span class="p">(</span><span class="nv">make-player</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">monster</span> <span class="p">(</span><span class="nv">make-stinkmonster</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">attack</span> <span class="nv">player</span> <span class="nv">monster</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">actor-hp</span> <span class="nv">monster</span><span class="p">))</span>
<span class="c1">;; =&gt; 90</span>
</code></pre></div></div>

<p>When the stinkmonster attacks players it doesn’t do damage. Instead it
applies a status effect.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">attack</span> <span class="nv">[cl-struct-stinkmonster</span> <span class="nv">cl-struct-player]</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">v</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">pushnew</span> <span class="p">(</span><span class="nv">stinkmonster-type</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nv">actor-statuses</span> <span class="nv">v</span><span class="p">)))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">player</span> <span class="p">(</span><span class="nv">make-player</span><span class="p">))</span>
      <span class="p">(</span><span class="nv">monster</span> <span class="p">(</span><span class="nv">make-stinkmonster</span><span class="p">)))</span>
  <span class="p">(</span><span class="nv">attack</span> <span class="nv">monster</span> <span class="nv">player</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">actor-statuses</span> <span class="nv">player</span><span class="p">))</span>
<span class="c1">;; =&gt; (sewage)</span>
</code></pre></div></div>

<p>If the monster applied a status effect in addition to the default
attack behavior then CLOS-style method combination would be far more
appropriate here (if only it was available in Elisp). The method would
instead be defined as an “after” method and it would automatically run
in addition to the default behavior.</p>

<p>If I were actually building a system combining structs and predd, I would
be using this helper function for building classifiers. It returns a
dispatch value for selected arguments.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">struct-classifier</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">pattern</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">select-p</span> <span class="nv">in</span> <span class="nv">pattern</span> <span class="nb">and</span> <span class="nv">arg</span> <span class="nv">in</span> <span class="nv">args</span>
          <span class="nb">when</span> <span class="nv">select-p</span> <span class="nv">collect</span> <span class="p">(</span><span class="nb">elt</span> <span class="nv">arg</span> <span class="mi">0</span><span class="p">))))</span>

<span class="c1">;; Takes 3 arguments, dispatches on the first 2 argument types.</span>
<span class="p">(</span><span class="nv">predd-defmulti</span> <span class="nv">speak</span> <span class="p">(</span><span class="nv">struct-classifier</span> <span class="no">t</span> <span class="no">t</span> <span class="no">nil</span><span class="p">))</span>

<span class="c1">;; Messages sent to the player are displayed.</span>
<span class="p">(</span><span class="nv">predd-defmethod</span> <span class="nv">speak</span> <span class="o">'</span><span class="p">(</span><span class="nv">cl-struct-actor</span> <span class="nv">cl-struct-player</span><span class="p">)</span> <span class="p">(</span><span class="nv">from</span> <span class="nv">to</span> <span class="nv">message</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"%s says %s."</span> <span class="p">(</span><span class="nv">actor-name</span> <span class="nv">from</span><span class="p">)</span> <span class="nv">message</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="the-future">The Future</h3>

<p>As of this writing there isn’t yet a <code class="language-plaintext highlighter-rouge">prefer-method</code> for
disambiguating equally preferred dispatch values. I will add it in the
future. I think <code class="language-plaintext highlighter-rouge">prefer-method</code> gets unwieldy quickly as the type
hierarchy grows, so it should be avoided anyway.</p>

<p>I haven’t put predd in MELPA or otherwise published it yet. That’s
what this post is for. But I think it’s ready for prime time, so feel
free to try it out.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Lisp Reddit API Wrapper</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/12/16/"/>
    <id>urn:uuid:3362934d-9762-3f58-e05c-4d8b28175367</id>
    <updated>2013-12-16T23:27:23Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="reddit"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>A couple of months ago I wrote an Emacs Lisp wrapper for the
<a href="http://old.reddit.com/dev/api">reddit API</a>. I didn’t put it in MELPA,
not yet anyway. If anyone is finding it useful I’ll see about getting
that done. My intention was to give it some exercise and testing before
putting it out there for people to use, locking down the API. You can
find it here,</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-reddit-api">https://github.com/skeeto/emacs-reddit-api</a></li>
</ul>

<p>Except for logging in, the library is agnostic about the actual API
endpoints themselves. It just knows how to translate between Elisp and
the reddit API protocol. This makes the library dead simple to use. I
had considered supporting <a href="http://blog.jenkster.com/2013/10/an-oauth2-in-emacs-example.html">OAuth2 authentication</a> rather than
password authentication, but reddit’s OAuth2 support is pretty rough
around the edges.</p>

<h3 id="library-usage">Library Usage</h3>

<p>The reddit API has two kinds of endpoints, GET and POST, so there are
really only three functions to concern yourself with.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">reddit-login</code></li>
  <li><code class="language-plaintext highlighter-rouge">reddit-get</code></li>
  <li><code class="language-plaintext highlighter-rouge">reddit-post</code></li>
</ul>

<p>And one variable,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">reddit-session</code></li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">reddit-login</code> function is really just a special case of
<code class="language-plaintext highlighter-rouge">reddit-post</code>. It returns a session value (cookie/modhash tuple) that
is used by the other two functions for authenticating the user. Just
as you get automatically with almost all Elisp data structures —
probably more so than <em>any</em> other popular programming language — it
can be serialized with the printer and reader, allowing a reddit
session to be maintained across Emacs sessions.</p>
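
<p>Persisting such a value could look roughly like the following. This is
a sketch and not part of the library: <code class="language-plaintext highlighter-rouge">my-save-value</code> and <code class="language-plaintext highlighter-rouge">my-load-value</code>
are hypothetical helpers, and the file name is up to the caller.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-save-value (value file)
  "Write VALUE to FILE using the printer (hypothetical helper)."
  (with-temp-file file
    (prin1 value (current-buffer))))

(defun my-load-value (file)
  "Read back the first value stored in FILE (hypothetical helper)."
  (with-temp-buffer
    (insert-file-contents file)
    (read (current-buffer))))
</code></pre></div></div>

<p>The session would be saved with <code class="language-plaintext highlighter-rouge">my-save-value</code> before exiting Emacs and
restored with <code class="language-plaintext highlighter-rouge">my-load-value</code> in a later one.</p>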

<p>The return value of <code class="language-plaintext highlighter-rouge">reddit-login</code> generally doesn’t need to be
captured. It automatically sets the dynamic variable <code class="language-plaintext highlighter-rouge">reddit-session</code>,
which is what the other functions access for authentication. This can
be bound with <code class="language-plaintext highlighter-rouge">let</code> to other session values in order to switch between
different users.</p>

<p>Both <code class="language-plaintext highlighter-rouge">reddit-get</code> and <code class="language-plaintext highlighter-rouge">reddit-post</code> take an endpoint name and a list
of key-value pairs in the form of a property list (plist). (The
<code class="language-plaintext highlighter-rouge">api-type</code> key is automatically supplied.) They each return the JSON
response from the server in association list (alist) form. The actual
shape of this data matches the response from reddit, which,
unfortunately, is inconsistent and unspecified, so writing any sort of
program to operate on the API requires lots of trial and error. If the
API responded with an error, these functions signal a <code class="language-plaintext highlighter-rouge">reddit-error</code>.</p>

<p>Typical usage looks like so. Notice that values need not be only
strings; they just need to print to something reasonable.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Login first</span>
<span class="p">(</span><span class="nv">reddit-login</span> <span class="s">"your-username"</span> <span class="s">"your-password"</span><span class="p">)</span>

<span class="c1">;; Subscribe to a subreddit</span>
<span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/subscribe"</span> <span class="o">'</span><span class="p">(</span><span class="ss">:sr</span> <span class="s">"t5_2s49f"</span> <span class="ss">:action</span> <span class="nv">sub</span><span class="p">))</span>

<span class="c1">;; Post a comment</span>
<span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/comment/"</span> <span class="o">'</span><span class="p">(</span><span class="ss">:text</span> <span class="s">"Hello world."</span> <span class="ss">:thing_id</span> <span class="s">"t1_cd3ar7y"</span><span class="p">))</span>
</code></pre></div></div>
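
<p>To illustrate what “print to something reasonable” means, here is
a sketch of how a plist could be turned into a query string. The
<code class="language-plaintext highlighter-rouge">demo-plist-to-query</code> helper is hypothetical and is not
necessarily how the library does it; it only shows that symbols and
numbers work as values because each one is formatted with
<code class="language-plaintext highlighter-rouge">%s</code> before being encoded.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'url-util)

(defun demo-plist-to-query (plist)
  "Encode PLIST as a URL query string (hypothetical helper)."
  (let ((pairs '()))
    (while plist
      ;; Drop the leading colon from the keyword to get the key name.
      (let ((key (substring (symbol-name (car plist)) 1))
            (value (format "%s" (cadr plist))))
        (push (concat (url-hexify-string key) "=" (url-hexify-string value))
              pairs))
      (setq plist (cddr plist)))
    (mapconcat #'identity (nreverse pairs) "&amp;")))

(demo-plist-to-query '(:sr "t5_2s49f" :action sub))
;; =&gt; "sr=t5_2s49f&amp;action=sub"
</code></pre></div></div>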

<p>For plist keys I considered automatically converting between dashes
and underscores so that the keywords could have Lisp-style names. But
the reddit API is inconsistent, using both, so there’s no correct way
to do this.</p>

<p>To further refine the API it might be worth defining a function for
each of the reddit endpoints, forming a facade for the wrapper
library, hiding away the plist arguments and complicated responses.
That would eliminate the trial and error of using the API.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">reddit-api-comment</span> <span class="p">(</span><span class="nv">parent</span> <span class="nv">comment</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">null</span> <span class="nv">reddit-session</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">error</span> <span class="s">"Not logged in."</span><span class="p">)</span>
    <span class="c1">;; TODO: reduce the return value into a thing/struct</span>
    <span class="c1">;; Build the plist with `list' so that parent and comment are evaluated.</span>
    <span class="p">(</span><span class="nv">reddit-post</span> <span class="s">"/api/comment/"</span> <span class="p">(</span><span class="nb">list</span> <span class="ss">:thing_id</span> <span class="nv">parent</span> <span class="ss">:text</span> <span class="nv">comment</span><span class="p">))))</span>
</code></pre></div></div>

<p>Furthermore there could be defstructs for comments, posts, subreddits,
etc. so that the “thing” ID stuff is hidden away. This is basically
what was already done for sessions out of necessity. I might add these
structs and functions someday but I don’t currently have a need for
it.</p>
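
<p>A sketch of what such a struct might look like follows. Everything
here is hypothetical: the <code class="language-plaintext highlighter-rouge">demo-comment</code> struct, its slots, and
the alist keys it reads are assumptions chosen for illustration, not a
documented schema.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

;; Hypothetical struct; the slot names are illustrative only.
(cl-defstruct demo-comment id author body)

(defun demo-comment-from-alist (alist)
  "Build a `demo-comment' from ALIST, a decoded JSON object.
The keys looked up here are assumptions, not a documented schema."
  (make-demo-comment :id (cdr (assq 'name alist))
                     :author (cdr (assq 'author alist))
                     :body (cdr (assq 'body alist))))

(demo-comment-body
 (demo-comment-from-alist '((name . "t1_cd3ar7y")
                            (author . "someone")
                            (body . "Hello world."))))
;; =&gt; "Hello world."
</code></pre></div></div>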

<p>It would be neat to use this API to create an interface to reddit from
within Emacs. I imagine it might look like one of the Emacs mail
clients, or <a href="/blog/2013/09/04/">like Elfeed</a>. Almost everything, including
viewing image posts within Emacs, should be possible.</p>

<h3 id="background">Background</h3>

<p>For the last 3.5 years I’ve been a moderator of <a href="http://old.reddit.com/r/civ">/r/civ</a>,
<a href="http://old.reddit.com/r/civ/comments/clxj4/lets_tidy_rciv_up_a_bit/">starting back when it had about 100 subscribers</a>. As of this
writing it’s just short of 60k subscribers and we’re now up to 9
moderators.</p>

<p>A few months ago we decided to institute a self-post-only Sunday. All
day Sunday, midnight to midnight Eastern time, only self-posts are
allowed in the subreddit. One of the other moderators was turning this
on and off manually, so I offered to write a bot to do the job. There
<a href="https://github.com/reddit/reddit/wiki/API-Wrappers">weren’t any Lisp wrappers yet</a> (though raw4j could be used
with Clojure), so I decided to write one.</p>

<p>As mentioned before, the reddit API leaves <em>a lot</em> to be desired. It
randomly returns errors, so a correct program needs to be prepared to
retry requests after a short delay, depending on the error. My
particular annoyance is that the <code class="language-plaintext highlighter-rouge">/api/site_admin</code> endpoint requires
that most of its keys are supplied, and it’s not documented which ones
are required. Even worse, there’s no single endpoint to get all of the
required values, the key names between endpoints are inconsistent, and
even the values themselves can’t be returned as-is, requiring
<a href="http://old.reddit.com/r/bugs/comments/1t162o/">massaging/fixing before returning them back to the API</a>.</p>
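
<p>The retry logic can be sketched generically. The helper below is
hypothetical and deliberately simple: it catches any
<code class="language-plaintext highlighter-rouge">error</code>, whereas a real bot would presumably handle only
<code class="language-plaintext highlighter-rouge">reddit-error</code> and inspect the error data to decide whether a
retry makes sense.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-call-with-retry (fn attempts delay)
  "Call FN until it returns without error, at most ATTEMPTS times.
Wait DELAY seconds between attempts.  The last error is re-signaled."
  (let ((tries 0)
        (done nil)
        (result nil))
    (while (not done)
      (setq tries (1+ tries))
      (condition-case err
          (setq result (funcall fn)
                done t)
        (error
         (if (&gt;= tries attempts)
             (signal (car err) (cdr err))
           (sleep-for delay)))))
    result))

;; Possible usage: up to 3 attempts, 2 seconds apart.
;; (demo-call-with-retry
;;  (lambda () (reddit-post "/api/subscribe" '(:sr "t5_2s49f" :action sub)))
;;  3 2)
</code></pre></div></div>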

<p>I hope other people find this library useful!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs, Thanksgiving, and Hanukkah</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/11/28/"/>
    <id>urn:uuid:cd66c73c-cb8c-3e12-9c16-396a36266f8b</id>
    <updated>2013-11-28T22:25:36Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="meatspace"/>
    <content type="html">
      <![CDATA[<p>Today is Thanksgiving in the United States. It also happens to be
Hanukkah. There’s been news going around that Thanksgiving and
Hanukkah <a href="http://www.leancrew.com/all-this/2013/01/hanukkah-and-thanksgiving/">will not coincide again for about 80,000 years</a>. This
sounded somewhat unbelievable to me because
<a href="http://blog.plover.com/calendar/july-weekends.html">the Gregorian calendar repeats every 400 years</a>. I decided to
compute it for myself to double-check this figure.</p>
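
<p>The 400-year Gregorian cycle itself is quick to verify. One cycle
has 400 common years plus 97 leap days, and because the total is a
multiple of 7 the weekdays line up again as well:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Leap days per cycle: 100 multiples of four, minus the 4 century
;; years, plus the 1 century year divisible by 400.
(let ((days (+ (* 400 365) 97)))
  (list days (% days 7)))
;; =&gt; (146097 0)
</code></pre></div></div>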

<p>I’m not Jewish and I know very little about Hanukkah, so I had to look
it up. After learning that Hanukkah is based on the Hebrew calendar,
the rumors were sounding more believable. The Hebrew calendar repeats
every 689,472 Hebrew years. This means the combined cycle of the
Gregorian and Hebrew calendars <a href="http://hebrewcalendar.tripod.com/">is about 14 billion years</a>.
Against that, 80,000 years sounds like a lowball figure.</p>

<p>Since I decided to use Emacs Lisp for the computation, I fortunately
was able to ignore all the unfamiliar, complicated rules for the
Hebrew calendar: Emacs knows how to compute Hebrew dates. It can be
accessed through the function <code class="language-plaintext highlighter-rouge">calendar-hebrew-date-string</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Thanksgiving 2013</span>
<span class="p">(</span><span class="nv">calendar-hebrew-date-string</span> <span class="o">'</span><span class="p">(</span><span class="mi">11</span> <span class="mi">28</span> <span class="mi">2013</span><span class="p">))</span>
<span class="c1">;; =&gt; "Kislev 25, 5774"</span>
</code></pre></div></div>

<p>Hanukkah begins on the 25th of Kislev, so I can write a
quick-and-dirty function to detect if a date is the first day of
Hanukkah.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">hanukkah-p</span> <span class="p">(</span><span class="nv">date</span><span class="p">)</span>
  <span class="s">"Return non-nil if DATE is the first day of Hanukkah."</span>
  <span class="p">(</span><span class="nv">string-match-p</span> <span class="s">"^Kislev 25"</span> <span class="p">(</span><span class="nv">calendar-hebrew-date-string</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>

<p>Next I need a function to compute Thanksgiving, which is really
simple. Thanksgiving falls on the fourth Thursday of November.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">thanksgiving</span> <span class="p">(</span><span class="nv">year</span><span class="p">)</span>
  <span class="s">"Return the date of Thanksgiving for YEAR."</span>
  <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">day</span> <span class="nv">from</span> <span class="mi">1</span> <span class="nv">upto</span> <span class="mi">7</span>
        <span class="nb">when</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">4</span> <span class="p">(</span><span class="nv">calendar-day-of-week</span> <span class="o">`</span><span class="p">(</span><span class="mi">11</span> <span class="o">,</span><span class="nv">day</span> <span class="o">,</span><span class="nv">year</span><span class="p">)))</span>
        <span class="nb">return</span> <span class="o">`</span><span class="p">(</span><span class="mi">11</span> <span class="o">,</span><span class="p">(</span><span class="nb">+</span> <span class="nv">day</span> <span class="mi">21</span><span class="p">)</span> <span class="o">,</span><span class="nv">year</span><span class="p">)))</span>
</code></pre></div></div>
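
<p>As an alternative sketch, Emacs’ calendar library can find the nth
weekday of a month directly with <code class="language-plaintext highlighter-rouge">calendar-nth-named-day</code>. The
<code class="language-plaintext highlighter-rouge">demo-thanksgiving</code> name is only for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calendar)

(defun demo-thanksgiving (year)
  "Return the fourth Thursday of November in YEAR as (MONTH DAY YEAR)."
  ;; Arguments: N = 4 (fourth), DAYNAME = 4 (Thursday), MONTH = 11.
  (calendar-nth-named-day 4 4 11 year))

(demo-thanksgiving 2013)
;; =&gt; (11 28 2013)
</code></pre></div></div>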

<p>If there was no <code class="language-plaintext highlighter-rouge">calendar-day-of-week</code> I could compute it using
<a href="http://en.wikipedia.org/wiki/Determination_of_the_day_of_the_week#Gauss.27s_algorithm">Zeller’s algorithm</a>, which I already happen to have
implemented,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">cal/day-of-week</span> <span class="p">(</span><span class="nv">year</span> <span class="nv">month</span> <span class="nv">day</span><span class="p">)</span>
  <span class="s">"Return day of week number (0-6, Sunday is 0)."</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">Y</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&lt;</span> <span class="nv">month</span> <span class="mi">3</span><span class="p">)</span> <span class="p">(</span><span class="nb">1-</span> <span class="nv">year</span><span class="p">)</span> <span class="nv">year</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">m</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">mod</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">month</span> <span class="mi">9</span><span class="p">)</span> <span class="mi">12</span><span class="p">)))</span>
         <span class="p">(</span><span class="nv">y</span> <span class="p">(</span><span class="nb">mod</span> <span class="nv">Y</span> <span class="mi">100</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">c</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">Y</span> <span class="mi">100</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">mod</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">day</span> <span class="p">(</span><span class="nb">floor</span> <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">26</span> <span class="nv">m</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">10</span><span class="p">)</span> <span class="nv">y</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">y</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">c</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">-2</span> <span class="nv">c</span><span class="p">))</span> <span class="mi">7</span><span class="p">)))</span>
</code></pre></div></div>

<p>Now for each year find Thanksgiving and test it for Hanukkah. I
started with 1942 because that’s when the fourth-Thursday-of-November
rule was established. Presumably due to the regexp part, this
expression takes a moment to compute.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nv">year</span> <span class="nv">from</span> <span class="mi">1942</span> <span class="nv">to</span> <span class="mi">80000</span>
      <span class="nb">when</span> <span class="p">(</span><span class="nv">hanukkah-p</span> <span class="p">(</span><span class="nv">thanksgiving</span> <span class="nv">year</span><span class="p">))</span>
      <span class="nv">collect</span> <span class="nv">year</span><span class="p">)</span>
<span class="c1">;; =&gt; (2013 79043 79290 79537 79564 79635 79784 79811 79882)</span>
</code></pre></div></div>
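
<p>If the string formatting and regexp matching really are a factor in
the run time, they can be sidestepped by comparing the numeric Hebrew
date instead. This is an untimed sketch, and <code class="language-plaintext highlighter-rouge">demo-hanukkah-p</code>
is a hypothetical name:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'calendar)
(require 'cal-hebrew)

(defun demo-hanukkah-p (date)
  "Return non-nil if Gregorian DATE (MONTH DAY YEAR) is Kislev 25."
  (let ((hebrew (calendar-hebrew-from-absolute
                 (calendar-absolute-from-gregorian date))))
    ;; Emacs numbers Hebrew months from Nisan, making Kislev month 9.
    (and (= 9 (calendar-extract-month hebrew))
         (= 25 (calendar-extract-day hebrew)))))

(demo-hanukkah-p '(11 28 2013))
;; =&gt; t
</code></pre></div></div>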

<p>My result exactly matches what I’m seeing elsewhere. The rumors are
correct! The next coincidence occurs on November 23rd, 79043. Thanks,
Emacs!</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elfeed Tips and Tricks</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/11/26/"/>
    <id>urn:uuid:45fbc221-dbea-302c-22c0-ec0527421ed8</id>
    <updated>2013-11-26T00:38:20Z</updated>
    <category term="elfeed"/><category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>This past weekend I had some questions from next-user-here (NUH) on my
<a href="/blog/2013/09/04/">original Elfeed post</a> about changing some of Elfeed’s
behavior. NUH is an Elisp novice so accomplishing some of the
requested modifications wasn’t obvious. A novice is mostly limited to
setting variables, not defining advice or using hooks. I’ve also been
using Elfeed daily for about three months now as my sole web feed
reader and along the way I’ve developed some best practices. In
addition to responding to some of NUH’s questions here, I’d like to
share some tips and tricks.</p>

<h3 id="custom-entry-launchers">Custom Entry Launchers</h3>

<p>Currently you can press “b” to launch one or more entries in your
browser. You can use “y” to copy a single entry to the clipboard.
What if you want to add another action?</p>

<p>In my configuration I have a fancy binding that sends the entry URLs
in the selected region to <a href="http://rg3.github.io/youtube-dl/">youtube-dl</a> for downloading the
videos. It’s too large to share as a snippet so here’s a small example
of something similar using a program called <code class="language-plaintext highlighter-rouge">xcowsay</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">xcowsay</span> <span class="p">(</span><span class="nv">message</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">call-process</span> <span class="s">"xcowsay"</span> <span class="no">nil</span> <span class="no">nil</span> <span class="no">nil</span> <span class="nv">message</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">elfeed-xcowsay</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">entry</span> <span class="p">(</span><span class="nv">elfeed-search-selected</span> <span class="ss">:single</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">xcowsay</span> <span class="p">(</span><span class="nv">elfeed-entry-title</span> <span class="nv">entry</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">define-key</span> <span class="nv">elfeed-search-mode-map</span> <span class="s">"x"</span> <span class="nf">#'</span><span class="nv">elfeed-xcowsay</span><span class="p">)</span>
</code></pre></div></div>

<p>Now when I hit “x” over an entry in Elfeed I’m greeted by a cow
announcing the title.</p>

<p><img src="/img/screenshot/xcowsay-small.png" alt="" /></p>

<h3 id="entry-listing-customization">Entry Listing Customization</h3>

<p>The <em>search</em> buffer you see when starting Elfeed, where entries are
listed, can be customized a few different ways. First, this buffer
<em>does</em> grow dynamically. After re-sizing the window/frame horizontally
you just have to refresh the view by pressing <code class="language-plaintext highlighter-rouge">g</code> (an Emacs
convention). How it fills out depends on the settings of these
variables,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-max-width</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-min-width</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-trailing-width</code></li>
</ul>

<p>They control how wide the different columns should be as the window
size changes. An important caveat to this is that the cache stored in
<code class="language-plaintext highlighter-rouge">elfeed-search-cache</code> <em>must</em> be cleared before the changes will be
reflected in the display. This cache exists because building the
display, assembling all the special faces, is actually quite
CPU-intensive. It was an optimization I established early on.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">clrhash</span> <span class="nv">elfeed-search-cache</span><span class="p">)</span>
</code></pre></div></div>

<p>If you set these variables in your start-up configuration you don’t
need to worry about clearing the cache because it will already be
empty. It’s only a concern when playing with the settings.</p>

<h4 id="date-display">Date Display</h4>

<p>Another question was about adding time to the entry listing. Elfeed
only displays the entry’s date. Dates are formatted by the function
<code class="language-plaintext highlighter-rouge">elfeed-search-format-date</code>. This can be redefined to display dates
differently.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">elfeed-search-format-date</span> <span class="p">(</span><span class="nv">date</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">format-time-string</span> <span class="s">"%Y-%m-%d %H:%M"</span> <span class="p">(</span><span class="nv">seconds-to-time</span> <span class="nv">date</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s given epoch seconds as a float and it returns a string to display
as a date.</p>
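
<p>To see the conversion in isolation, here is a small sketch with a
fixed timestamp. Unlike the definition above it passes
<code class="language-plaintext highlighter-rouge">t</code> as the last argument to get UTC, purely so the result does
not depend on the local time zone; <code class="language-plaintext highlighter-rouge">demo-format-date</code> is a
hypothetical name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'time-date)

(defun demo-format-date (seconds)
  "Format epoch SECONDS as a UTC date and time string."
  ;; The final t requests UTC rather than the local time zone.
  (format-time-string "%Y-%m-%d %H:%M" (seconds-to-time seconds) t))

(demo-format-date 1385426300.0)
;; =&gt; "2013-11-26 00:38"
</code></pre></div></div>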

<h4 id="faces-and-colors">Faces and Colors</h4>

<p>All of the faces used in the display are declared for customization,
so these can be changed to whatever you like.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-date-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-title-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-feed-face</code></li>
  <li><code class="language-plaintext highlighter-rouge">elfeed-search-tag-face</code></li>
</ul>

<p>Say you suffered a head injury and decided you want your Elfeed dates
to be bold, purple, and underlined,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">custom-set-faces</span>
 <span class="o">'</span><span class="p">(</span><span class="nv">elfeed-search-date-face</span>
   <span class="p">((</span><span class="no">t</span> <span class="ss">:foreground</span> <span class="s">"#f0f"</span>
       <span class="ss">:weight</span> <span class="nv">extra-bold</span>
       <span class="ss">:underline</span> <span class="no">t</span><span class="p">))))</span>
</code></pre></div></div>

<h3 id="database-manipulation">Database Manipulation</h3>

<p>Feeds and entries in the database can be manipulated to become
whatever you want them to be. Because Elfeed is regularly modifying
the database, the trick is to perform the manipulation at <em>just</em> the
right time.</p>

<h4 id="feed-title-changes">Feed Title Changes</h4>

<p>Say you want to change a feed title because you don’t like the title
supplied by the feed. For example, the title to my blog’s feed is
“null program” but instead you think it should be “Seriously Handsome
Programmer” (head injury, remember?). The function
<code class="language-plaintext highlighter-rouge">elfeed-db-get-feed</code> can be used to fetch a feed’s data structure from
the database, given its exact URL as listed in your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">feed</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="s">"https://nullprogram.com/feed/"</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-feed-title</span> <span class="nv">feed</span><span class="p">)</span> <span class="s">"Seriously Handsome Programmer"</span><span class="p">))</span>
</code></pre></div></div>

<p>Hold it, that didn’t work. First, that display cache is getting in the
way again. Feed titles change very infrequently so they’re cached
aggressively. More importantly, next time you update your feeds Elfeed
will re-synchronize the feed title with the official title. It’s going
to fight against your intervention.</p>

<p>The solution is to do it with a little bit of advice just before the
title is displayed. Advise the function <code class="language-plaintext highlighter-rouge">elfeed-search-update</code> with
some “before” advice.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">elfeed-search-update</span> <span class="p">(</span><span class="nv">before</span> <span class="nv">nullprogram</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">feed</span> <span class="p">(</span><span class="nv">elfeed-db-get-feed</span> <span class="s">"https://nullprogram.com/feed/"</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-feed-title</span> <span class="nv">feed</span><span class="p">)</span> <span class="s">"Seriously Handsome Programmer"</span><span class="p">)))</span>
</code></pre></div></div>

<h4 id="entry-tweaking">Entry Tweaking</h4>

<p>Automatic entry modification should happen immediately upon discovery
so that it looks like the entry arrived that way. This is done through
the <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code>. Generally this would be used for applying
custom tags. These examples are from the documentation:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Mark all YouTube entries</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"youtube\\.com"</span>
                              <span class="ss">:add</span> <span class="o">'</span><span class="p">(</span><span class="nv">video</span> <span class="nv">youtube</span><span class="p">)))</span>

<span class="c1">;; Entries older than 2 weeks are marked as read</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:before</span> <span class="s">"2 weeks ago"</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>

<span class="c1">;; Building subset feeds</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"example\\.com"</span>
                              <span class="ss">:entry-title</span> <span class="o">'</span><span class="p">(</span><span class="nb">not</span> <span class="s">"something interesting"</span><span class="p">)</span>
                              <span class="ss">:add</span> <span class="ss">'junk</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>

<p>Due to a feature I recently ported from my personal configuration,
this tagger helper function is less necessary. You can put lists in
your <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> list to supply automatic tags.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">setq</span> <span class="nv">elfeed-feeds</span>
      <span class="o">'</span><span class="p">((</span><span class="s">"https://nullprogram.com/feed/"</span> <span class="nv">blog</span> <span class="nv">emacs</span><span class="p">)</span>
        <span class="s">"http://www.50ply.com/atom.xml"</span>  <span class="c1">; no autotagging</span>
        <span class="p">(</span><span class="s">"http://nedroid.com/feed/"</span> <span class="nv">webcomic</span><span class="p">)))</span>
</code></pre></div></div>
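
<p>Since each item is now either a bare URL or a list headed by one,
code that walks the list has to handle both shapes. A sketch with
hypothetical helpers (these are not Elfeed’s own functions):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun demo-feed-url (item)
  "Return the URL of ITEM, a string or a (URL TAG...) list."
  (if (consp item) (car item) item))

(defun demo-feed-tags (item)
  "Return the automatic tags of ITEM, or nil if it is a bare URL."
  (if (consp item) (cdr item) nil))

(mapcar (lambda (item) (cons (demo-feed-url item) (demo-feed-tags item)))
        '(("https://nullprogram.com/feed/" blog emacs)
          "http://www.50ply.com/atom.xml"))
;; =&gt; (("https://nullprogram.com/feed/" blog emacs)
;;     ("http://www.50ply.com/atom.xml"))
</code></pre></div></div>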

<h4 id="content-tweaking">Content Tweaking</h4>

<p>Going beyond tagging you could change the content of the feed. Say you
want to <a href="http://xkcd.com/1031/">make feeds 100 times better</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">hundred-times-better</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">original</span> <span class="p">(</span><span class="nv">elfeed-deref</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)))</span>
         <span class="p">(</span><span class="nb">replace</span> <span class="p">(</span><span class="nv">replace-regexp-in-string</span> <span class="s">"keyboard"</span> <span class="s">"leopard"</span> <span class="nv">original</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="nb">replace</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span> <span class="nf">#'</span><span class="nv">hundred-times-better</span><span class="p">)</span>
</code></pre></div></div>

<p>The same trick could be used to remove advertising, change the date,
change the title, etc. The <code class="language-plaintext highlighter-rouge">elfeed-deref</code> and <code class="language-plaintext highlighter-rouge">elfeed-ref</code> parts are
needed to fetch and store content in the content database. Only a
reference is stored on the structure. You can actually use these
functions at any time outside of Elfeed, but they’ll eventually get
garbage collected if Elfeed doesn’t know about them.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="s">"Hello, World"</span><span class="p">))</span>
<span class="c1">;; =&gt; [cl-struct-elfeed-ref "907d14fb3af2b0d4f18c2d46abe8aedce17367bd"]</span>

<span class="p">(</span><span class="nv">elfeed-deref</span> <span class="nv">ref</span><span class="p">)</span>
<span class="c1">;; =&gt; "Hello, World"</span>
</code></pre></div></div>
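
<p>The 40-hex-digit identifier has the shape of a SHA-1 digest, which
fits a content-addressed store: the reference is derived from the
content itself. The following is a toy in-memory sketch of that idea
with hypothetical names, not Elfeed’s on-disk implementation.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar demo-store (make-hash-table :test 'equal)
  "Hypothetical in-memory content store, keyed by digest.")

(defun demo-ref (content)
  "Store CONTENT and return its SHA-1 digest as a reference."
  (let ((id (secure-hash 'sha1 content)))
    (puthash id content demo-store)
    id))

(defun demo-deref (id)
  "Return the content stored under ID, or nil if it is unknown."
  (gethash id demo-store))

(demo-deref (demo-ref "Hello, World"))
;; =&gt; "Hello, World"
</code></pre></div></div>

<p>Identical content always maps to the same reference, so storing it
twice costs nothing extra.</p>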

<h3 id="deletion">Deletion</h3>

<p>A question that’s been asked a few times is whether entries can be <em>deleted</em>.
To start off, the answer to that question is “no.” There is no
function provided to remove entries from the database. If you want to
remove entries you’re probably taking the wrong approach.</p>

<p>The main problem with removal is that Elfeed needs to keep track of
what it’s seen before. If an entry is removed and then rediscovered,
it will reappear as unread. There are better ways to “remove” entries,
such as tagging them specially.</p>

<p>On a moderately-powerful computer Elfeed can easily handle <em>at least</em>
several tens of thousands of database entries. If “too many entries”
ever becomes a performance problem I’d rather solve it by making the
database faster than by removing information from the database. It’s
already very date-oriented so that older entries are infrequently
touched.</p>

<p>If storage is a concern, you shouldn’t get too worked up about that.
As of this post I have about 6,000 entries in my database and the
index file is only 3.5 MB. The content database after garbage
collection, which is the <code class="language-plaintext highlighter-rouge">data/</code> directory under <code class="language-plaintext highlighter-rouge">~/.elfeed/</code>, with
these 6k entries is 17 MB. When I run <code class="language-plaintext highlighter-rouge">M-x elfeed-db-compact</code>,
currently an experimental feature, it drops down to 1.8 MB. That’s less
than 1 kB per entry. It’s also less than my personal Liferea database
of roughly the same amount of content (~15 MB) before I wrote Elfeed.</p>

<p>If even this storage is still too much you can always blow away your
<code class="language-plaintext highlighter-rouge">data/</code> content database directory. This is safe to do even while
Emacs is running. You’ll still see all of the entries listed in the
search buffer but won’t be able to read them within Emacs until after
the next database update (when it re-fetches the most recent entry
content).</p>

<p>You can also clear out the content database from within Elisp by
visiting every entry and clearing its content field.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-elfeed-db-visit</span> <span class="p">(</span><span class="nv">entry</span> <span class="nv">_</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="no">nil</span><span class="p">))</span>

<span class="p">(</span><span class="nv">elfeed-db-gc</span><span class="p">)</span>  <span class="c1">;; garbage collect everything</span>
</code></pre></div></div>

<p>The same sort of expression can be used to run over all known entries
to perform other changes. If there was a delete function you might use
it here to remove entries older than a certain date, then hope they’re
not rediscovered.</p>

<p>If you <em>never</em> want to store entry content (you never read entries
within Emacs), you can use a hook to always drop it on the floor as it
arrives,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">entry</span><span class="p">)</span> <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">elfeed-entry-content</span> <span class="nv">entry</span><span class="p">)</span> <span class="no">nil</span><span class="p">)))</span>
</code></pre></div></div>

<h3 id="questions">Questions?</h3>

<p>If you have any questions or suggestions about how to make Elfeed do
what you want it to do, feel free to ask. Some things may actually
require that I make changes to Elfeed to support it, though I hope
I’ve anticipated your particular need well enough to avoid that.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>The Elfeed Database</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/09/"/>
    <id>urn:uuid:8aba2e49-22a0-330b-e664-54fb50ecdd00</id>
    <updated>2013-09-09T05:53:41Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>The design of <a href="/blog/2013/09/04/">Elfeed’s</a> database took some experimentation
before any part of it was settled. A major design constraint was
Emacs’ very limited file input/output. There’s no random access and,
without the aid of an external program, files must always be read and
written wholesale. That’s not database-friendly at all! In the end I
settled on a design that minimized the size of the frequently
rewritten parts, an index with two different data models, by storing
immutable data in a loose-file, content-addressable database.</p>

<p>At the moment there really aren’t any pure-Elisp database solutions
for Emacs. This is almost certainly due to the aforementioned I/O
limitations. I ran into this same problem last year when I created
<a href="/blog/2012/12/29/">an Emacs pastebin server</a>. I attempted, and failed, to
interface with a SQLite database through its command line program.
Nic Ferrier has published a <a href="https://github.com/nicferrier/emacs-db">generic database interface</a>,
but it lacks concrete implementations.</p>

<p>As a bit of good news, as far as I know Emacs <em>does</em> properly handle
atomic file updates across all platforms, so a pure-Elisp database
developer would never have to worry about only writing half the
database. It’s always a safe operation. Worst case scenario you’re
left with an old version of data rather than no data at all.</p>
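
<p>A common way to get this all-or-nothing behavior by hand is to write
the new contents to a temporary file in the same directory and then
rename it over the old one. The following is a minimal sketch of that
idea, not a description of what Emacs does internally. The name
<code class="language-plaintext highlighter-rouge">my/atomic-write</code> is hypothetical, and the sketch assumes that a
rename within one directory is atomic on the platform.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/atomic-write (file string)
  "Write STRING to FILE via a temporary file renamed over FILE."
  (let* ((dir (file-name-directory (expand-file-name file)))
         ;; The temporary file lives next to FILE so the rename
         ;; never crosses a filesystem boundary.
         (tmp (make-temp-file (expand-file-name ".tmp-" dir))))
    (let ((coding-system-for-write 'utf-8))
      ;; A string as the first argument writes that string
      ;; instead of buffer contents.
      (write-region string nil tmp nil 'silent))
    ;; The final argument allows replacing an existing FILE.
    (rename-file tmp file t)))
</code></pre></div></div>

<p>A reader of the file sees either the old contents or the new
contents, never a half-written mix.</p>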

<p>A real possibility for a database would be connecting to an
established database server via TCP with an Emacs network process. If
the server has a specified wire protocol Elisp could talk to it
efficiently. In fact, there exists <a href="http://www.online-marketwatch.com/pgel/pg.html">pg.el</a> that does <em>exactly</em>
this for PostgreSQL. Unfortunately I was not able to get this working
with my pastebin, nor is this solution appropriate for Elfeed. It
would be unreasonable to require users to first set up a PostgreSQL
server just to read web feeds!</p>

<p>Ultimately it would seem that any efficient Emacs database requires
the help of an external program. The <a href="http://notmuchmail.org/">notmuch</a> mail client,
which inspired Elfeed, does this. To access the notmuch database a
command line program is run once for each request. A query is passed
as a program argument and the output of the program is parsed into the
result.</p>
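
<p>That pattern, one process per request with the output parsed into a
result, can be sketched with the built-in <code class="language-plaintext highlighter-rouge">call-process</code>. The helper
<code class="language-plaintext highlighter-rouge">my/query-program</code> is a hypothetical name, <code class="language-plaintext highlighter-rouge">echo</code> merely stands in
for a real database program, and the sketch assumes the program prints
a single readable Lisp object.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/query-program (program &amp;rest args)
  "Run PROGRAM with ARGS and parse its output as one Lisp object."
  (with-temp-buffer
    ;; A destination of t sends standard output to this buffer.
    (let ((status (apply #'call-process program nil t nil args)))
      (unless (eq status 0)
        (error "%s exited with status %s" program status))
      (goto-char (point-min))
      (read (current-buffer)))))

;; On a Unix-like system, `echo' prints its argument back, which
;; is then read as a list of two strings.
(my/query-program "echo" "(\"entry-1\" \"entry-2\")")
</code></pre></div></div>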

<h3 id="the-early-database">The Early Database</h3>

<p>For the first few days of its existence Elfeed only had an in-memory
database. Closing Emacs would lose everything. For my personal usage
patterns, where I read, or at least address, all entries that arrive
— and especially because I use Elfeed on a couple of different
computers — I don’t really <em>need</em> to track things long term. I could
easily mark everything after a certain date as read and forget about
them. However, it would be nice to have and, more importantly, many
people wouldn’t use Elfeed without persistence between Emacs sessions.</p>

<p>So, for the first database I did what I always do: dumped the data
structure to a file using the printer and parsed it back in later
using the reader. This is dead simple in Lisp, it’s very fast, and it
even works for circular data structures. It’s something I missed so
much with the much-less-capable JSON format earlier this year that I
<a href="/blog/2013/03/28/">wrote a JavaScript library to do it</a>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">save-data</span> <span class="p">(</span><span class="nv">file</span> <span class="nv">data</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-file</span> <span class="nv">file</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">standard-output</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))</span>
          <span class="p">(</span><span class="nv">print-circle</span> <span class="no">t</span><span class="p">))</span>  <span class="c1">; Allow circular data</span>
      <span class="p">(</span><span class="nb">prin1</span> <span class="nv">data</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">load-data</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert-file-contents</span> <span class="nv">file</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">save-data</span> <span class="s">"demo.dat"</span> <span class="o">'</span><span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span> <span class="nv">[</span><span class="s">"1"</span> <span class="mi">2</span> <span class="nv">3]</span><span class="p">))</span>
<span class="p">(</span><span class="nv">load-data</span> <span class="s">"demo.dat"</span><span class="p">)</span>
<span class="c1">;; =&gt; (a b c ["1" 2 3])</span>
</code></pre></div></div>

<p>Anything with a printed representation can be serialized and stored
this way, including symbols, strings, numbers, lists, vectors (structs,
objects), hash tables, and even compiled functions (.elc files).
Basically every Emacs library that stores data on disk uses this
technique.</p>
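
<p>As a small illustration that this goes beyond lists and vectors,
here is a hash table making the same round trip through the printer
and reader, in memory rather than through a file. The keys and values
are made up for the example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((table (make-hash-table :test 'equal)))
  (puthash "feed-1" '(:title "Example" :entries 3) table)
  ;; `prin1-to-string' produces the printed representation and
  ;; `read' turns that text back into an equivalent hash table.
  (let ((copy (read (prin1-to-string table))))
    (gethash "feed-1" copy)))
;; =&gt; (:title "Example" :entries 3)
</code></pre></div></div>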

<p>Unfortunately, this is where I hit another serious database
constraint: <a href="http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-08/msg00860.html"><strong><code class="language-plaintext highlighter-rouge">print-circle</code> is broken in Emacs 24.3</strong></a>,
the current stable release. This means Elfeed cannot take advantage of
this useful feature, at least not for a long time, as I had been
counting on. The final database is slightly slower and larger than
strictly required as a result.</p>

<h3 id="the-content-database">The Content Database</h3>

<p>After breaking the circular references of the in-memory database I
finally had persistence for the first time. With the naive
printer/reader approach it was slow, almost 1 second to write just a
few thousand entries on my 6-year-old laptop (my minimum requirements
target machine). I wanted Elfeed to support hundreds of thousands of
entries, if not millions, so this was much too slow.</p>

<p>The big slowdown was writing out all the entry content each time the
database is saved. These are large strings containing HTML that rarely
change. There’s no reason to write these out every time, nor is there
a reason to even keep them in memory all the time, as it’s rarely
accessed. The solution is a loose-file, content-addressable database,
very similar to an unpacked Git object database.</p>

<p>The content database stores immutable sequences of characters — not
just raw bytes, but rather multibyte strings — using an unspecified
coding system (right now it’s UTF-8 for all platforms). The filename
for the content is the content hashed with SHA-1
(“content-addressable”). To limit the number of files per directory,
these files are stored in subdirectories named by the first
hex-encoded byte of the hash (just like Git). A database of 4 items
might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data/
   18/
      18ff6f11945b1e9f3e3c4cae8b5275d36b9944e1
      184c06a83f0bc73a8345c6d886f9043bcae095f8
   6b/
      6b59ae257f2bea24703d8adf5747049c138dfc82
   cc/
      cc47d53872ae2a9186151ef1a68392a94e1f091f
</code></pre></div></div>

<p>Something really neat about the content database is that it’s
completely agnostic about Elfeed. If it weren’t for Elfeed’s garbage
collector, anyone could use it to store arbitrary content. The
function <code class="language-plaintext highlighter-rouge">elfeed-ref</code> accepts a string and returns a reference into
the database. Because of the hash, providing the same string in the
future will return the same reference without actually performing a
write. References are dereferenced with <code class="language-plaintext highlighter-rouge">elfeed-deref</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">ref</span> <span class="p">(</span><span class="nv">elfeed-ref</span> <span class="s">"Hello, world!"</span><span class="p">))</span>
<span class="c1">;; =&gt; [cl-struct-elfeed-ref "943a702d06f34599aee1f8da8ef9f7296031d699"]</span>

<span class="p">(</span><span class="nv">elfeed-deref</span> <span class="nv">ref</span><span class="p">)</span>
<span class="c1">;; =&gt; "Hello, world!"</span>
</code></pre></div></div>
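
<p>To make the mechanics concrete, here is a toy content-addressable
store in the same spirit. It is a sketch under stated assumptions, not
Elfeed’s implementation: the <code class="language-plaintext highlighter-rouge">my/content-</code> names are hypothetical, it
returns a bare hash string rather than a reference struct, and it
always encodes as UTF-8.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar my/content-root (make-temp-file "content-db-" t)
  "Directory holding the toy content database.")

(defun my/content-file (hash)
  "Return the file for HASH, inside a directory named by its first byte."
  (expand-file-name hash (expand-file-name (substring hash 0 2)
                                           my/content-root)))

(defun my/content-put (content)
  "Store CONTENT and return its SHA-1 hash, writing only if it is new."
  (let* ((bytes (encode-coding-string content 'utf-8))
         (hash (secure-hash 'sha1 bytes))
         (file (my/content-file hash)))
    ;; Same content means same hash means same file: skip the write.
    (unless (file-exists-p file)
      (make-directory (file-name-directory file) t)
      (let ((coding-system-for-write 'binary))
        (write-region bytes nil file nil 'silent)))
    hash))

(defun my/content-get (hash)
  "Return the content stored under HASH, or nil if it is missing."
  (let ((file (my/content-file hash)))
    (when (file-exists-p file)
      (with-temp-buffer
        (let ((coding-system-for-read 'utf-8-unix))
          (insert-file-contents file))
        (buffer-string)))))
</code></pre></div></div>

<p>Storing the same string twice returns the same hash and touches the
disk only once, which is where the de-duplication comes from.</p>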

<p>With content stored elsewhere, entries are a struct containing only
some small metadata: title, link, date, and a content database
reference. Writing out many of them at once is much, much faster.</p>

<p>I don’t expect it happens often, but this also means content is
de-duplicated. If two entries happen to have the same content they’ll
share content database storage. A small savings.</p>

<p>At this point it’s really tempting to get fancier and really put this
content database to use. The core index itself could be stored as raw
content, and the root to accessing the database would be a single
SHA-1 hash referencing it — again, <em>very</em> similar to Git. If an index
stores a reference to the previously written index, then the
Elfeed database would be an immutable structure tracking its entire
history. Such a change would cost virtually nothing in performance,
just disk space.</p>

<h3 id="multiple-representations">Multiple Representations</h3>

<p>With all the content out of the way, the database is now just a lean
index. At this point it’s a hash table mapping feed IDs to feeds.
Each feed contains a list of its entries. To build the entry listing for
the elfeed-search buffer, Elfeed needs to visit each feed in the hash
table, gather its entries into one giant list, then finally sort that
list by date. At around O(n log n), that sort operation is a real
performance killer. Completely unacceptable. To fix this we need to
think about how the data is updated and used.</p>

<p>First, <strong>entries are <em>always</em> viewed in date order</strong>, no exceptions.
From my experience of using web feeds for the last six years I <em>never</em>
had a reason to list feed entries by any other order. The vast
majority of the time, newer entries are most relevant, and if I need
to look for something specific I can search for it.</p>

<p>We definitely want to store entries in date-order so we can create
entry listings without performing a sort: something around O(n) or so.
Inserting new entries into this structure should also be efficient.</p>

<p>Second, <strong>entries are never <em>removed</em> from the database</strong>. This isn’t
e-mail. Even if a user doesn’t want to see an entry again, we have to
keep track of it. Otherwise it will show up as new if it’s discovered
in a feed again, which is likely. Things are added to the database and
never removed. In Elfeed, I use a <code class="language-plaintext highlighter-rouge">junk</code> tag to completely hide
entries I don’t want to see, and I always have a <code class="language-plaintext highlighter-rouge">-junk</code> element in my
filter.</p>

<p>There’s an important caveat to this one that I had missed until after
the public release: entry dates can change! When a previously
discovered entry is read from a feed, Elfeed updates (read: mutates)
the entry struct to reflect the new state. This includes the date.
It’s very likely that a date-sorted representation won’t tolerate date
changes underneath it since it’s keying off of them. Either we refuse
to update the entry date, or we remove the entry, update the date, and
then re-insert it (how it currently works).</p>

<p>Third, <strong>entries are generally added with a recent date</strong>. After the
database is initially populated, it’s only picking up new items. We
should prefer that adding recently-dated entries be faster than adding
older entries. I didn’t get a chance to take advantage of this, but
it’s something to keep in mind.</p>

<p>Fourth, <strong>entries need to be keyed by an ID string</strong>. Each entry has a
unique, unchanging identifier string, either provided by the feed
itself (RSS’s <code class="language-plaintext highlighter-rouge">guid</code> or Atom’s <code class="language-plaintext highlighter-rouge">id</code>) or generated intelligently by
Elfeed. Especially because of the <code class="language-plaintext highlighter-rouge">print-circle</code> bug, we need to be
able to talk about feeds in terms of their ID — an indirect pointer.</p>

<p>(Actually, even when RSS <code class="language-plaintext highlighter-rouge">guid</code> tags are present, they’re permalinks
by default. So, unfortunately, RSS IDs are not at all resistant to
collisions across feeds. To work around this, entry identifiers are a
<em>pair</em> of strings: feed ID and entry ID. Atom doesn’t have this
problem, but we’re stuck with the lowest common denominator.)</p>
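
<p>A pair key like this works directly with an <code class="language-plaintext highlighter-rouge">equal</code>-tested hash
table. The following is a minimal sketch with made-up feed URLs and
entry values, not Elfeed’s actual data layout.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let ((entries (make-hash-table :test 'equal)))
  ;; Two feeds reuse the same entry ID; the feed ID keeps them apart.
  (puthash (cons "http://a.example/feed" "post-1") "Entry from feed A" entries)
  (puthash (cons "http://b.example/feed" "post-1") "Entry from feed B" entries)
  (list (hash-table-count entries)
        (gethash (cons "http://b.example/feed" "post-1") entries)))
;; =&gt; (2 "Entry from feed B")
</code></pre></div></div>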

<p>A date-oriented representation would be unable to efficiently look up
an entry by its ID, so it needs to be supplemented by an ID-oriented
representation. This means we need two representations in our
database: date-oriented and ID-oriented.</p>

<p>So what do we use? Well, for keeping entries sorted by date we want
some sort of balanced tree. A B-tree is probably a good choice. Rather
than write one I went with an AVL tree since Emacs comes with a
library for it (<code class="language-plaintext highlighter-rouge">avl-tree</code>). It’s already debugged and optimized! The
bad news is that the internal structure is unspecified, so there are
no guarantees that it can be serialized. A future update to the
library may break the Elfeed database. I also had to hack into it to
work around a security issue. The comparison function is embedded in
the tree. After deserializing the database, Elfeed needs to ensure
that no one stuck a malicious function in there.</p>
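
<p>Here is a small sketch of a date-ordered index built on the bundled
<code class="language-plaintext highlighter-rouge">avl-tree</code> library. The <code class="language-plaintext highlighter-rouge">(DATE . ID)</code> key layout and the timestamps
are assumptions for illustration and differ from what Elfeed stores.
Note that the comparison function is handed to the tree when it is
created, which is why a deserialized tree has to be checked.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'avl-tree)

(let ((tree (avl-tree-create
             ;; Keys are (DATE . ID) pairs. Newer dates sort first,
             ;; with the ID as a tie-breaker for equal dates.
             (lambda (a b)
               (or (&gt; (car a) (car b))
                   (and (= (car a) (car b))
                        (string&lt; (cdr a) (cdr b))))))))
  (avl-tree-enter tree '(1378000000 . "old"))
  (avl-tree-enter tree '(1378700000 . "new"))
  (avl-tree-enter tree '(1378300000 . "middle"))
  ;; Flattening walks the tree in order, so no sort is needed.
  (mapcar #'cdr (avl-tree-flatten tree)))
;; =&gt; ("new" "middle" "old")
</code></pre></div></div>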

<p>The choice for an ID database was super-easy: a hash table. Due to the
<code class="language-plaintext highlighter-rouge">print-circle</code> bug, this is actually the main representation. The AVL
tree only stores IDs and it has to reach into the hash table to do any
date comparisons. If <code class="language-plaintext highlighter-rouge">print-circle</code> was working I could store the same
exact entry objects in the AVL tree as the hash table, so mutating
them would update them in all representations. However, with
<code class="language-plaintext highlighter-rouge">print-circle</code> off, on deserialization these would become unique
objects and updates would break.</p>
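
<p>The difference is easy to see on an Emacs where <code class="language-plaintext highlighter-rouge">print-circle</code>
behaves correctly. In this sketch a plist stands in for the database
and one entry object is referenced from two places.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(let* ((entry (list :title "Hello"))
       ;; The same object is referenced once per representation.
       (db (list :by-id entry :by-date entry))
       (plain (read (prin1-to-string db)))
       (shared (let ((print-circle t))
                 (read (prin1-to-string db)))))
  (list (eq (plist-get plain :by-id) (plist-get plain :by-date))
        (eq (plist-get shared :by-id) (plist-get shared :by-date))))
;; =&gt; (nil t)
</code></pre></div></div>

<p>Without <code class="language-plaintext highlighter-rouge">print-circle</code> the entry comes back as two separate copies,
so mutating one would leave the other stale.</p>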

<h3 id="the-future">The Future</h3>

<p>That’s where the database is today. I put in a few extra fields that
aren’t actually used yet, so that there’s room to make a few changes
without breaking the database. Perhaps someday I’ll work out a whole
new database structure, or maybe a proper database library will come
into existence, and this post will simply document the old database.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Introducing Elfeed, an Emacs Web Feed Reader</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/04/"/>
    <id>urn:uuid:fdfd55d2-65dd-39cc-6695-655c3ea7e8e0</id>
    <updated>2013-09-04T05:33:10Z</updated>
    <category term="emacs"/><category term="web"/><category term="elfeed"/>
    <content type="html">
      <![CDATA[<p>Unsatisfied with the results of my
<a href="/blog/2013/06/13/">recent search for a new web feed reader</a>, I created my own
from scratch, called <a href="https://github.com/skeeto/elfeed">Elfeed</a>. It’s built on top of Emacs and
is available for download through <a href="http://melpa.milkbox.net/">MELPA</a>. I intend it to be
highly extensible, a power user’s web feed reader. It supports both
Atom and RSS.</p>

<ul>
  <li><a href="https://github.com/skeeto/elfeed">https://github.com/skeeto/elfeed</a></li>
</ul>

<p>The design of Elfeed was inspired by <a href="http://notmuchmail.org/">notmuch</a>, which is
<a href="/blog/2013/09/03/">my e-mail client of choice</a>. I’ve enjoyed the notmuch search
interface and the extensibility of the whole system — a side-effect
of being written in Emacs Lisp — so much that I wanted a similar
interface for my web feed reader.</p>

<h3 id="the-search-buffer">The search buffer</h3>

<p>Unlike many other feed readers, Elfeed is oriented around <em>entries</em> —
the Atom term for articles — rather than <em>feeds</em>. It cares less about
where entries came from and more about listing relevant entries for
reading. This listing is the <code class="language-plaintext highlighter-rouge">*elfeed-search*</code> buffer. It looks like
this,</p>

<p><a href="/img/elfeed/search.png"><img src="/img/elfeed/search-thumb.png" alt="" /></a></p>

<p>This buffer is not necessarily about listing unread or recent entries,
it’s a filtered view of all entries in the local Elfeed database.
Hence the “search” buffer. Entries are marked with various <em>tags</em>,
which play a role in view filtering — the notmuch model. By default,
all new entries are tagged <code class="language-plaintext highlighter-rouge">unread</code> (customize with
<code class="language-plaintext highlighter-rouge">elfeed-initial-tags</code>). I’ll cover the filtering syntax shortly.</p>

<p>From the search buffer there are a number of ways to interact with
entries. You can select a single entry with the point, or multiple
entries at once with a region, and interact with them.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">b</code>: visit the selected entries in a browser</li>
  <li><code class="language-plaintext highlighter-rouge">y</code>: copy the selected entry URL to the clipboard</li>
  <li><code class="language-plaintext highlighter-rouge">r</code>: mark selected entries as read</li>
  <li><code class="language-plaintext highlighter-rouge">u</code>: mark selected entries as unread</li>
  <li><code class="language-plaintext highlighter-rouge">+</code>: add a specific tag to selected entries</li>
  <li><code class="language-plaintext highlighter-rouge">-</code>: remove a specific tag from selected entries</li>
  <li><code class="language-plaintext highlighter-rouge">RET</code>: view selected entry in a buffer</li>
</ul>

<p>(This list can be viewed within Emacs with the standard <code class="language-plaintext highlighter-rouge">C-h m</code>.)</p>

<p>The last action uses the Simple HTTP Renderer (shr), now part of
Emacs, to render entry content into a buffer for viewing. It will even
fetch and display images in the buffer, assuming your Emacs has been
built for it. (Note: the GNU-provided Windows build of Emacs doesn’t
ship with the necessary libraries.) It looks a lot like reading an
e-mail within Emacs,</p>

<p><a href="/img/elfeed/show.png"><img src="/img/elfeed/show-thumb.png" alt="" /></a></p>

<p>The standard read-only keys are in action. Space and backspace are for
page up/down. The <code class="language-plaintext highlighter-rouge">n</code> and <code class="language-plaintext highlighter-rouge">p</code> keys switch between the next and
previous entries from the search buffer. The idea is that you should
be able to hop into the first entry and work your way along reading
them within Emacs when possible.</p>

<h3 id="configuration">Configuration</h3>

<p>Elfeed maintains a database in <code class="language-plaintext highlighter-rouge">~/.elfeed/</code> (configurable). It will
start out empty because you need to tell it what feeds you’d like to
follow. List your feeds in the <code class="language-plaintext highlighter-rouge">elfeed-feeds</code> variable. You would do this in
your <code class="language-plaintext highlighter-rouge">.emacs</code> or other initialization files.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">setq</span> <span class="nv">elfeed-feeds</span>
      <span class="o">'</span><span class="p">(</span><span class="s">"http://www.50ply.com/atom.xml"</span>
        <span class="s">"http://possiblywrong.wordpress.com/feed/"</span>
        <span class="c1">;; ...</span>
        <span class="s">"http://www.devrand.org/feeds/posts/default"</span><span class="p">))</span>
</code></pre></div></div>

<p>Once set, hitting <code class="language-plaintext highlighter-rouge">G</code> (capitalized) in the search buffer or running
<code class="language-plaintext highlighter-rouge">elfeed-update</code> will tell Elfeed to fetch each of these feeds and load
in their entries. Entries will populate the search buffer as they are
discovered (assuming they pass the current filter), where they can be
immediately acted upon. Pressing <code class="language-plaintext highlighter-rouge">g</code> (lower case) refreshes the search
buffer view without fetching any feeds.</p>

<p>Everything fetched will be added to the database for next time you run
Emacs. It’s not required at all in order to use Elfeed, but I’ll
discuss some of
<a href="/blog/2013/09/09/">the details of the database format in another post</a>.</p>

<h3 id="the-search-filter">The search filter</h3>

<p>Pressing <code class="language-plaintext highlighter-rouge">s</code> in the search buffer will allow you to edit the search
filter in action.</p>

<p>There are three kinds of ways to filter on entries, in order of
efficiency: by age, by tag, and by regular expression. For an entry to
be shown, it must pass each of the space-delimited components of the
filter.</p>

<p>Ages are described by plain language relative time, starting with <code class="language-plaintext highlighter-rouge">@</code>.
This component is ultimately parsed by Emacs’ <code class="language-plaintext highlighter-rouge">timer-duration</code>
function. Here are some examples.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">@1-year-old</code></li>
  <li><code class="language-plaintext highlighter-rouge">@5-days-ago</code></li>
  <li><code class="language-plaintext highlighter-rouge">@2-weeks</code></li>
</ul>
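
<p>As a rough sketch of how such a component could be turned into a
number of seconds with Emacs’ built-in <code class="language-plaintext highlighter-rouge">timer-duration</code>: the helper
<code class="language-plaintext highlighter-rouge">my/age-seconds</code> below is hypothetical, and the exact cleanup Elfeed
performs before parsing may differ.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/age-seconds (component)
  "Convert a filter COMPONENT such as \"@2-weeks-ago\" into seconds."
  (let* ((text (substring component 1)) ; drop the leading "@"
         (text (replace-regexp-in-string "-" " " text))
         ;; Trailing filler words are not understood by `timer-duration'.
         (text (replace-regexp-in-string " *\\(ago\\|old\\)\\'" "" text)))
    (timer-duration text)))

(my/age-seconds "@5-days-ago")  ; =&gt; 432000
(my/age-seconds "@2-weeks")     ; =&gt; 1209600
</code></pre></div></div>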

<p>Tag filters start with <code class="language-plaintext highlighter-rouge">+</code> and <code class="language-plaintext highlighter-rouge">-</code>. When <code class="language-plaintext highlighter-rouge">+</code>, entries <em>must</em> be tagged
with that tag. When <code class="language-plaintext highlighter-rouge">-</code>, entries <em>must not</em> be tagged with that tag.
Some examples,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">+unread</code>: show only unread posts.</li>
  <li><code class="language-plaintext highlighter-rouge">-junk +unread</code>: show only unread entries that aren’t tagged “junk”.</li>
</ul>

<p>Anything else is treated like a regular expression. However, the
regular expression is applied <em>only</em> to titles and URLs for both
entries and feeds. It’s not currently possible to filter on entry
content, and I’ve found that I never want to do this anyway.</p>

<p>Putting it all together, here are some examples.</p>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">linu[xs] @1-year-old</code>: only show entries about Linux or Linus from
the last year.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">-unread +youtube</code>: only show previously-read entries tagged
with <code class="language-plaintext highlighter-rouge">youtube</code>.</p>
  </li>
</ul>
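
<p>The rules above amount to classifying each space-delimited component
by its first character. Here is a minimal sketch of such a parser. It
is not Elfeed’s own code, and <code class="language-plaintext highlighter-rouge">my/parse-filter</code> and the plist keys it
returns are hypothetical names.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my/parse-filter (filter)
  "Split FILTER into a plist of age, tag, and regexp components."
  (let (after must-have must-not-have matches)
    (dolist (component (split-string filter " " t))
      (let ((prefix (aref component 0))
            (rest (substring component 1)))
        (cond ((eq prefix ?@) (setq after rest))
              ((eq prefix ?+) (push (intern rest) must-have))
              ((eq prefix ?-) (push (intern rest) must-not-have))
              ;; Anything else is kept whole as a regular expression.
              (t (push component matches)))))
    (list :after after
          :must-have (nreverse must-have)
          :must-not-have (nreverse must-not-have)
          :matches (nreverse matches))))

(my/parse-filter "linu[xs] @1-year-old -junk +unread")
;; =&gt; (:after "1-year-old" :must-have (unread)
;;     :must-not-have (junk) :matches ("linu[xs]"))
</code></pre></div></div>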

<p>Note: the database is date-oriented, so age filtering is by far the
fastest. Including an age limit will greatly increase the performance
of the search buffer, so I recommend adding it to the default filter
(<code class="language-plaintext highlighter-rouge">elfeed-search-search-filter</code>).</p>

<h3 id="tagging">Tagging</h3>

<p>Generally you don’t want to spend time tagging entries. Fortunately
this step can easily be automated using <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code>. To tag
all YouTube entries with <code class="language-plaintext highlighter-rouge">youtube</code> and <code class="language-plaintext highlighter-rouge">video</code>,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"youtube\\.com"</span>
                              <span class="ss">:add</span> <span class="o">'</span><span class="p">(</span><span class="nv">video</span> <span class="nv">youtube</span><span class="p">)))</span>
</code></pre></div></div>

<p>Any functions added to <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code> are called with the new
entry as their argument. The <code class="language-plaintext highlighter-rouge">elfeed-make-tagger</code> function returns a
function that applies tags to entries matching specific criteria.</p>

<p>This tagger tags old entries as read. It’s handy for initializing an
Elfeed database on a new computer, since I’ve likely already read most
of the entries being discovered.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:before</span> <span class="s">"2 weeks ago"</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="creating-custom-subfeeds">Creating custom subfeeds</h3>

<p>Tagging is also really handy for fixing some kinds of broken feeds or
otherwise filtering out unwanted content. I like to use a <code class="language-plaintext highlighter-rouge">junk</code> tag
to indicate uninteresting entries.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'elfeed-new-entry-hook</span>
          <span class="p">(</span><span class="nv">elfeed-make-tagger</span> <span class="ss">:feed-url</span> <span class="s">"example\\.com"</span>
                              <span class="ss">:entry-title</span> <span class="o">'</span><span class="p">(</span><span class="nb">not</span> <span class="s">"something interesting"</span><span class="p">)</span>
                              <span class="ss">:add</span> <span class="ss">'junk</span>
                              <span class="ss">:remove</span> <span class="ss">'unread</span><span class="p">))</span>
</code></pre></div></div>

<p>There are a few feeds I’d <em>like</em> to follow but do not because the
entries lack dates. This makes them difficult to follow without a
shared, persistent database. I’ve contacted the authors of these feeds
to try to get them fixed but have not gotten any responses. I haven’t
quite figured out how to do it yet, but I will eventually create a
function for <code class="language-plaintext highlighter-rouge">elfeed-new-entry-hook</code> that adds reasonable dates to
these feeds.</p>

<h3 id="custom-actions">Custom actions</h3>

<p>In <a href="https://github.com/skeeto/.emacs.d">my own .emacs.d configuration</a> I’ve added a new entry action
to Elfeed: video downloads with youtube-dl. When I hit <code class="language-plaintext highlighter-rouge">d</code> on a
YouTube entry either in the entry “show” buffer or the search buffer,
Elfeed will download that video into my local drive. I consume quite a
few YouTube videos on a regular basis (I’m a “cord-never”), so this
has already saved me a lot of time.</p>

<p>Adding custom actions like this to Elfeed is exactly the extensibility
I’m interested in supporting. I want this to be easy. After just a
week of usage I’ve already customized Elfeed a lot for myself — very
specific customizations which are not included with Elfeed.</p>

<h3 id="web-interface">Web interface</h3>

<p>Elfeed also includes a web interface! If you’ve loaded/installed
<code class="language-plaintext highlighter-rouge">elfeed-web</code>, start it with <code class="language-plaintext highlighter-rouge">elfeed-web-start</code> and visit this URL in
your browser (check your <code class="language-plaintext highlighter-rouge">httpd-port</code>).</p>

<ul>
  <li>http://localhost:8080/elfeed/</li>
</ul>

<p><a href="/img/elfeed/web.png"><img src="/img/elfeed/web-thumb.png" alt="" /></a></p>

<p>Elfeed exposes a RESTful JSON API, consumable by any application. The
web interface builds on this using AngularJS, behaving as a
single-page application. It includes a filter search box that filters
out entries as you type. I think it’s pretty slick, though still a bit
rough.</p>

<p>It still needs some work to truly be useful. I’m intending for this to
become the “mobile” interface to Elfeed, for remote access on a phone
or tablet. Patches welcome.</p>

<h3 id="try-it-out">Try it out</h3>

<p>After Google Reader closed I tried The Old Reader for a while. When
that collapsed under its own popularity I decided to go with a local
client reader. Canto was crushed under the weight of all my feeds, so
I ended up using Liferea for a while. Frustrated at Liferea’s lack of
extensibility and text-file configuration, I ended up writing Elfeed.</p>

<p>Elfeed is now serving 100% of my personal web feed reader needs. I think
it’s already far better than any reader I’ve used before. Another case
of “I should have done this years ago,” though I think I lacked the
expertise to pull it off well until fairly recently.</p>

<p>At the moment I believe Elfeed is already the most extensible and
powerful web feed reader in the world.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Leaving Gmail Behind</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/09/03/"/>
    <id>urn:uuid:8d4f4f52-2295-32c7-8452-d22d4f45a437</id>
    <updated>2013-09-03T03:45:58Z</updated>
    <category term="emacs"/>
    <content type="html">
      <![CDATA[<p><em>Update May 2017</em>: Shortly after <a href="/blog/2017/04/01/">switching to modal editing</a>, I
stopped using Emacs and Notmuch as my mail client. I <a href="/blog/2017/06/15/">now use Mutt and
Vim</a>.</p>

<p>For the last 8 years I have been using Gmail as my e-mail provider,
and for 3 of those years I was a student. It was very convenient,
inexpensive (cost-free!), and especially suitable for a student using
various lab computers and stuck behind a fairly restrictive firewall
(an anti-file-sharing measure). I didn’t have to worry about filtering
spam or maintaining or paying for a server. Easy, easy, easy. This has
finally come to an end.</p>

<p>That convenience kept me as a user after college up until now. As of
two weeks ago I changed my e-mail address. You can find it listed
below my portrait on this page (if you’re reading this on my website).
Not only am I using my own domain name, but I’m running my own e-mail
server. After attempting, and failing, at creating a decent setup with
<a href="http://www.djcbsoftware.nl/code/mu/mu4e.html">mu4e</a>, I ended up following this guide:</p>

<ul>
  <li><a href="http://dbpmail.net/essays/2013-06-29-hackers-replacement-for-gmail.html">A Hacker’s Replacement for GMail</a></li>
</ul>

<p>It’s built around the superb <a href="http://notmuchmail.org/">notmuch mail indexer</a>, which
includes a powerful, fast Emacs e-mail client. Just from its technical
superiority <strong>I wish I had switched to notmuch years ago</strong>. I’m now
understanding for the first time why all those old fogey hackers like
to use e-mail for everything: mailing lists, software patches, bug
reporting, etc. E-mail is a user-interface-agnostic system, giving
everyone their own choice. The tricky part is setting up a decent
interface to it.</p>

<h3 id="prompting-the-change">Prompting the change</h3>

<p>As you know, <a href="/blog/2013/06/13/">Google Reader shut down this past July</a>. This
left me using only two Google services: Talk and Mail. Not only is
Talk easily replaceable — it has no significant data on Google’s
servers — <a href="http://windowspbx.blogspot.com/2013/05/hangouts-wont-hangout-with-other.html">it will be shut down soon as well</a>. I would need to
move to a different XMPP server anyway. If I could move off of Gmail,
I would finally be able to discontinue my Google account <em>for good</em>!</p>

<p>Why would I want to do this? It’s become increasingly apparent,
especially this year, that there is very little privacy to be had when
logged into a Google account. Various intelligence and law enforcement
organizations have easy — likely automated — access to user data,
especially e-mail. I’d really like to take that privacy back.</p>

<p>There are also the technical reasons. It’s a bit embarrassing to
admit, but the final straw that pushed me to finally leave Gmail was
<a href="http://gizmodo.com/gmails-new-compose-window-will-soon-be-your-only-choic-1123199551">the new compose interface</a> — a technical issue rather than
a privacy issue — which was pushed out just a few days before I left.
I found it to be very unpleasant and, worse, completely incompatible
with Pentadactyl, as there is no longer a plain-text option (the one
listed is a fake). A huge technical step backwards, towards the layman
and away from the power user. I could do so much better than this.</p>

<p>Also embarrassing was being unable to have any meaningful use of PGP
with e-mail all these years. That’s something I’ve always wanted to
fix.</p>

<p><a href="http://www.50ply.com/">Brian</a> was a guinea pig for me, because, between the two of
us, he was actually the first one to move to his own e-mail platform.
Seeing his success was a big encouragement for me. Not only could it
be done fairly easily, but the results would be a huge improvement.
This was all within my grasp!</p>

<h3 id="a-daunting-task">A daunting task</h3>

<p>Switching <em>everything</em> about my e-mail — provider, client, spam
filter, server, domain — and running it myself would be a daunting
task. There’s 8 years of archived e-mail to manage, though I kept it
trimmed to a relatively light 1 GB of storage (which I cut down to 200
MB before exporting). I’ve made backups of all this e-mail on
occasion, but I never had to worry about searching or actually using
it. The backups were done “just in case.”</p>

<p>Gmail has excellent spam filtering, an advantage of having so many
samples available, and I’m a complete newbie at dealing with it
myself. Despite having my e-mail address published on this blog for
the last 6 years, I surprisingly only receive about one spam message
per day, so this wasn’t actually a huge risk for me.</p>

<p>Then there’s the issue of not looking like spam myself. My e-mail
server needs to sit in a friendly IP neighborhood. I need to have a
proper PTR record (reverse DNS). I need to generally look legitimate.
No showing up to deliver mail to other mail servers in just a t-shirt.
This is actually something I’m still struggling with right now. In
fact, if I’ve sent you personal e-mail in the past and you’re using
Gmail, you should check your spam folder right now, because something
I’ve sent recently may well have been caught in it. I don’t know why
yet.</p>

<p>I would also need to learn the ropes of a new e-mail client. I used
Eudora from 1995 to 2005, then Gmail up until now. Now, I’m the last
person to be reluctant about learning how to operate a new piece of
software. I’m constantly on the lookout for better software. The
problem is that I use e-mail for a lot of very important things. I
can’t afford mistakes. I need to hit the ground running on this one.</p>

<h3 id="notmuch-vs-mu4e">notmuch vs mu4e</h3>

<p>Since I decided early on to go with an Emacs-based e-mail client,
learning a new e-mail client got a lot easier. I know Emacs Lisp
<em>pretty</em> darn well. In the worst case of getting stuck, I could very
easily study the client’s source code and work out for myself whatever
is going on. I could even monkey patch it in my configuration if it
was causing problems for me (and I’ve already done exactly that).</p>

<p>I wanted to use the <a href="http://cr.yp.to/proto/maildir.html">Maildir format</a>, something I could hack
on if needed. The two obvious choices for this were mu4e and notmuch,
both started in 2009. I initially reached for mu4e. Compared to
notmuch, it follows Emacs idioms more closely. For example, the e-mail
listing is oriented around a mark and execute paradigm, like a dired
buffer. After an initial glance, it felt more integrated.</p>

<p>Unfortunately, mu4e is still not mature enough for real productive
use, making it far too risky for e-mail. I found out the hard way that
the database format has varied regularly between versions. Worse, mu4e
is not suitable for remote access. Not only does it assume the Maildir
directory is on the local host, it uses absolute paths to access it,
so it won’t work over <a href="http://fuse.sourceforge.net/sshfs.html">sshfs</a> as I had hoped. Bummer.</p>

<p>In contrast, the notmuch client
<a href="http://notmuchmail.org/remoteusage/">is specifically designed to be operated remotely</a>. Emacs
doesn’t realize that it runs the notmuch client over SSH. Emacs
doesn’t need to touch the Maildir directly. It’s a beautiful setup,
one very friendly to <a href="/blog/2012/06/23/">versioning dotfiles</a>. I’ve already
done some pretty heavy configuration to get it exactly the way I want
it. On top of this, notmuch is incredibly fast and stable. It’s been a
very enjoyable client. So much so that it inspired me to build a web
feed reader with a similar interface (to be described in my next
post).</p>
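
<p>On the Emacs side, the remote setup can be very small. This is a
minimal sketch, assuming a wrapper script that forwards each notmuch
invocation to the mail server over SSH, as the remote usage page linked
above describes. The script name and path are hypothetical.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Sketch only: "~/bin/notmuch-remote" is a hypothetical wrapper script
;; that runs the real notmuch binary on the mail server via SSH.
;; Emacs calls it exactly as it would call a local notmuch.
(setq notmuch-command (expand-file-name "~/bin/notmuch-remote"))
</code></pre></div></div>

<p>Since this is ordinary configuration, it can live in the same
versioned dotfiles as everything else.</p>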

<h3 id="the-mail-server">The mail server</h3>

<p>My early plan was to run an e-mail server on a Raspberry Pi. It’s
low-powered, making it very inexpensive and quiet to operate. It’s
also very portable, so I wouldn’t need to lug a server around if I
needed to move it — no more difficult than moving a cell phone
charger. I could run it from my nightstand next to my bed if I wanted
to. On the other hand, I would be a little nervous running my mail
server on a residential connection. The downtime would be a product of
my ISP, my power company, and my router. It’s not bad, but it’s risky
when I’m worried about receiving important e-mail. If I was in the
middle of job hunting I probably wouldn’t attempt it at all.
Fortunately e-mail servers will retry over the course of several days,
so I think this would generally be manageable.</p>

<p>This plan was struck down by Comcast’s network policies. The good news
is that my IP address has not changed in years. The bad news is that
they block port 25 both incoming and outgoing as an anti-spam measure.
This makes it impossible to run an e-mail server, because e-mail
<em>must</em> be received on port 25 (MX records were a misdesigned feature).
For outgoing, I would need to send e-mail through Comcast’s smarthost
server, which brings up the privacy issue again. I assume the same
organizations have this tapped as well. Even if port 25 wasn’t
blocked, I wouldn’t be able to set a PTR record and my IP neighborhood
would be suspicious.</p>

<p>I ended up going with <a href="https://www.digitalocean.com/?refcode=a613ef5c79c1">Digital Ocean</a>, as the linked guide
suggested. The smallest, cheapest offering is more than suitable for
my needs both as an e-mail server and an XMPP server. It will probably
be handy for <a href="/blog/2013/01/26/">other short-lived, experimental</a> servers
too.</p>

<p><a href="http://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/">I used getmail</a> to get all of my old mail onto the mail
server in the Maildir format. It was completely straightforward and
probably the easiest part of it all.</p>

<h3 id="pgp">PGP</h3>

<p>I can finally use GnuPG with my e-mail. An important factor in this
setup is that encryption and decryption are done <em>locally</em>, not on my
e-mail server. I don’t need to trust the server with my private keys.
However, verification of signatures is done on the server, which is
slightly less than ideal, but manageable.</p>

<p>I thought I might need to generate a fresh PGP key for this new e-mail
address, but instead I learned something new about PGP. A key can have
multiple identities attached to it, so all I needed to do was edit the
key and add the new e-mail address. The PGP designers had already
thought of this problem two decades ago! The updated key is linked
next to my portrait, as well as distributed on the public keyservers.</p>

<p>I look forward to making more use of cryptography with my e-mail.</p>

<h3 id="the-old-address">The old address</h3>

<p>It will take me a while, a year or more, to move everything off of my
old e-mail address. I have a lot of accounts associated with it, many
of which won’t allow me to change my e-mail address. So, in the
meantime, anything sent there will continue to be forwarded to me.
That’s now the only purpose of my Google account.</p>

<p>I’m a little worried about using my new e-mail address as my Digital
Ocean account because it presents a circularity problem. I could
easily get myself locked out of everything. I’ll have to figure out
how I’m going to handle that situation. (Use my work e-mail address?
So long as I don’t get locked out of my e-mail and get fired all at
the same time.)</p>

<p>If you’re a hacker, I encourage you to run your own e-mail server if
you’re not doing so already. It’s been extremely liberating for me.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Personal OS Configuration Live System</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/17/"/>
    <id>urn:uuid:ce0f1153-4efa-3703-2e46-4c78b52adba6</id>
    <updated>2013-06-17T00:00:00Z</updated>
    <category term="debian"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I don’t know what to title or name this thing, so bear with me.</p>

<p>Two years ago I started <a href="/blog/2011/10/19/">versioning my Emacs configuration</a>.
One year ago I started <a href="/blog/2012/06/23/">versioning the rest of my dotfiles</a>
the same way. This has composed beautifully with Debian, which truly
is <a href="http://www.debian.org/"><em>the</em> universal operating system</a>. To create my
comfortable development environment from scratch on a new (or used)
computer, all I need to do is
<a href="http://www.debian.org/CD/netinst/">install a bare-bones Debian system</a>, then direct it to
automatically install a short list of my preferred software (apt-get),
and finally clone these two repositories into place. Given a decent
Internet connection, the whole process takes under an hour to go from
blank hard drive to highly-productive computer system.</p>

<p>In fact, this whole process is so straightforward that it can be
automated using an amazing tool called <a href="http://live.debian.net/">live-build</a>!
Taking the next step in versioning and automation, I wrote a
live-build build that creates a live system with my personal
configuration baked in.</p>

<ul>
  <li><a href="https://github.com/skeeto/live-dev-env">https://github.com/skeeto/live-dev-env</a></li>
</ul>

<p><strong>A link to the latest ISO build can be found in the above link</strong>. To
try it out, burn it to a CD, write it onto a flash drive (it’s a
hybrid ISO), or just fire it up in your favorite virtual machine. You
will be booted directly into <em>very nearly</em> my exact configuration,
down to the same random wallpaper selection. It’s extremely minimal and
will look like this.</p>

<p><img src="/img/screenshot/live-skeeto.jpg" alt="" /></p>

<p>I don’t know for sure yet if this will be useful to anyone except me.
On occasions where I need to make quick use of some arbitrary
computer, or maybe just for system rescue, having this will be
incredibly handy. <a href="http://www.knopper.net/knoppix/index-en.html">Knoppix</a> is nice, but working without my
own configuration can be discouraging; it’s so slow in comparison.</p>

<p>It’s also a chance for others to glimpse at my workflow without any
commitment (i.e. potential dotfile clobbering). I like to study other
developers’ workflows, stealing their ideas for my own, so I want to
make mine easy to study. I think I’m doing some innovative things with
a <a href="https://github.com/skeeto/dotfiles#openbox">hacked-together pseudo-tiling window manager</a>, my Firefox
configuration, and my Emacs configuration. At least two of my
co-workers’ Emacs configurations are forked from mine (you know who
you are!). From a selfish perspective, the more people using workflows
like mine, the better these workflows will be supported by the
community at large!</p>

<p>I had said that it’s “very nearly” my exact configuration. At the time
of this writing, what’s missing is <a href="http://www.quicklisp.org/">Quicklisp</a> and
<a href="https://github.com/technomancy/leiningen">Leiningen</a>, since these aren’t available as fully-functioning
Debian packages at the moment. I’ll work them in eventually. The build
is non-incremental and takes about an hour right now, so adding these
little extras by trial-and-error will take some time.</p>

<p><a href="http://5digits.org/pentadactyl/">Pentadactyl</a> really shines here, because it allows me to
completely configure Firefox/Iceweasel from a version-friendly text
file. Except for some user scripts (still figuring out how to install
those at build time), the browser in that image is identical to what
I’m using right now. Even though <a href="/blog/2013/02/25/">V8 is the king of performance</a>,
Firefox still wins hands-down for power users due to its superior
configurability.</p>

<p>However, this Firefox configuration still has an annoyance. Several of
the browser add-ons that I pre-install always pop up their first-run
welcome messages. These messages flip a setting in Firefox’s registry
so they don’t run again, a setting that is lost when the system shuts
down. I can toggle these settings from the Pentadactyl configuration
file, but not early enough. By the time Pentadactyl gets to applying
these settings it’s too late. The messages, including Pentadactyl’s,
are already queued to be shown. I don’t think it’s possible to
completely fix this, even if Pentadactyl is fixed.</p>

<p>Anything not already installed is readily installable through apt-get.
Right now I’m considering the feasibility of some sort of lazy-install
system based on command-not-found. Debian has a package called
command-not-found which intercepts the shell’s error handler when an
issued command is not found. Instead of just giving the normal
warning, it prints out what package needs to be installed in order to
provide the command. What I think would be really neat is if the
needed package were automatically installed and the requested command
then re-run, all without returning control to the shell in the
interim. It would be a lot like Emacs autoloads. As long as I have an
Internet connection, most of Debian’s packages would be <em>virtually</em>
installed on my live system as far as I’m concerned. The initial run
of any program just takes a little longer.</p>

<p>I’ll continue to tweak this image over time, not only as I figure out
how to make things work in Debian live-build, but also as my
preferences and workflows evolve. Adjusting my configuration to work
on a live-system has been enlightening, revealing all sorts of little
manual things that I hadn’t yet automated. Perhaps someday this build
will replace traditional operating system installations for me, at
least for productive work. I could do all of my work from a portable
read-only live system with a bit of short-lived (i.e. local cache)
user-data persistence stored on a separate writable medium.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Mouse Slider Mode for Numbers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/07/"/>
    <id>urn:uuid:26f803e9-776a-309d-9d7d-76448c2d1231</id>
    <updated>2013-06-07T00:00:00Z</updated>
    <category term="emacs"/><category term="media"/>
    <content type="html">
      <![CDATA[<p>One of my regular commenters, and as of recently co-worker, Ahmed
Fasih, sent me a video, <a href="http://youtu.be/FpxIfCHKGpQ">Live coding in Lua</a>. The author of the
video added support to his IDE for scaling numbers in source code by
dragging over them with the mouse. This feature was directly inspired
by <a href="http://worrydream.com/#">Bret Victor</a>, a user interface visionary, probably best
introduced through his presentation <a href="http://youtu.be/PUv66718DII">Inventing on Principle</a>.</p>

<p>I think Bret’s interface ideas are interesting and his demos very
impressive. However, I feel they’re <em>too</em> specialized to generally be
very useful. <a href="https://github.com/skeeto/skewer-mode">Skewer</a> suffers from the same problem: in order
to truly be useful, programs need to be written in a form that exposes
them well enough for Skewer to manipulate at run-time. Some
styles of programming are simply better suited to live development
than others. This problem is amplified in Bret’s case by the extreme
specialty of the tools. They’re fun to play with, and probably great
for education, but I can’t imagine any time I would find them useful
while being productive.</p>

<p>Anyway, Ahmed wanted to know if it would be possible to implement this
feature in Emacs. I said yes, knowing that Emacs ships with
<a href="http://www.emacswiki.org/emacs/ArtistMode">artist-mode</a>, where the mouse can be used to draw with
characters in an editing buffer. That’s proof that Emacs has the
necessary mouse events to do the job. After spending a couple of hours
on the problem I was able to create a working prototype:
<strong>mouse-slider-mode</strong>.</p>

<ul>
  <li><a href="https://github.com/skeeto/mouse-slider-mode">https://github.com/skeeto/mouse-slider-mode</a></li>
</ul>

<video src="https://nullprogram.s3.amazonaws.com/skewer/mouse-slider-mode.webm" controls="controls" width="350" height="350">
  Demo video requires HTML5 with WebM support.
</video>

<p>It’s a bit rough around the edges, but it works. When this minor mode
is enabled, right-clicking and dragging left or right on any number
will decrease or increase that number’s value. Moreover, if the current
major mode has an entry in the <code class="language-plaintext highlighter-rouge">mouse-slider-mode-eval-funcs</code> alist,
as the value is scaled the expression around it is automatically
evaluated in the live environment. The documentation shows how to
enable this in js2-mode buffers using skewer-mode. This is actually a
step up from the other, non-Emacs implementations of this mouse slider
feature. If I understood correctly, the other implementations
re-evaluate the entire buffer on each update. My version only needs to
evaluate the surrounding expression, so the manipulated code doesn’t
need to be so isolated.</p>
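
<p>As a rough sketch of that configuration, assuming the alist maps a
major mode symbol to the command that evaluates the expression around
the point, and that <code class="language-plaintext highlighter-rouge">skewer-eval-defun</code> is the Skewer command for
doing so, the js2-mode entry might look like this. Check the package
documentation for the exact form.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Sketch only: the entry shape (MAJOR-MODE . EVAL-COMMAND) is an
;; assumption based on the description above.
(with-eval-after-load 'mouse-slider-mode
  (add-to-list 'mouse-slider-mode-eval-funcs
               '(js2-mode . skewer-eval-defun)))
</code></pre></div></div>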

<p>There is one limitation that cannot be fixed using Elisp. If the mouse
exits the Emacs window, Elisp stops receiving valid mouse events.
Number scaling is limited by the width of the Emacs window. Fixing
this would require patching Emacs itself.</p>

<p>This is purely a proof-of-concept. It’s not installed in my Emacs
configuration and I probably won’t ever use it myself, except to show
it off as a flashy demo with an HTML5 canvas. If anyone out there
finds it useful, or thinks it could be better, go ahead and adopt it.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>A Handy Emacs Package Configuration Macro</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/02/"/>
    <id>urn:uuid:3e64c259-69a6-3500-0e87-3a93d61c1644</id>
    <updated>2013-06-02T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p><em>Update April 2015</em>: I now use <a href="https://github.com/jwiegley/use-package">use-package</a> instead of
the <code class="language-plaintext highlighter-rouge">with-package</code> macro explained below. It’s cleaner, nicer, and
better maintained.</p>

<p>I was inspired by <a href="http://milkbox.net/note/single-file-master-emacs-configuration/">a post recently written by Milkypostman</a>
(the M in MELPA). He describes some of his <code class="language-plaintext highlighter-rouge">init.el</code> configuration,
specifically focusing on an <code class="language-plaintext highlighter-rouge">after</code> macro that wraps the misdesigned
<code class="language-plaintext highlighter-rouge">eval-after-load</code> function. I wanted to take this macro further in
three ways:</p>

<ul>
  <li>
    <p>The delayed expression should be <a href="http://lunaryorn.com/blog/2013/05/31/byte-compiling-eval-after-load/">properly byte-compiled</a>,
which doesn’t happen by default with <code class="language-plaintext highlighter-rouge">eval-after-load</code>.</p>
  </li>
  <li>
    <p>In a few cases my expression depends on multiple, independent
packages but <code class="language-plaintext highlighter-rouge">eval-after-load</code> only accepts one.</p>
  </li>
  <li>
    <p>If I’m specifying packages when using my macro, why bother listing
them at the top of my initialization file? I could DRY things up by
learning what packages to install when the macro is used. Here’s
the kicker: <strong>I can pretend that every available package is already
installed like built-in packages!</strong></p>
  </li>
</ul>

<p>The result is a pair of macros <code class="language-plaintext highlighter-rouge">with-package</code> and <code class="language-plaintext highlighter-rouge">with-package*</code>
which can be found in <a href="https://github.com/skeeto/.emacs.d/blob/master/lisp/package-helper.el">package-helper.el</a>. The latter form
doesn’t wait but immediately loads the specified packages with
<code class="language-plaintext highlighter-rouge">require</code>. It’s shaped just like Milkypostman’s <code class="language-plaintext highlighter-rouge">after</code> macro, except
that it can accept a list of packages in place of a single symbol.
Also, the package names aren’t quoted; they don’t need to be since
this is a macro instead of a function.</p>

<p>Here’s a typical use case for each macro. That <code class="language-plaintext highlighter-rouge">expose</code> higher-order
function is <a href="/blog/2010/09/29/">from my personal <code class="language-plaintext highlighter-rouge">utility</code> library</a>. The
expressions to be evaluated depend on both packages and neither needs
to be loaded immediately, so I’m using the first form of the macro.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">with-package</span> <span class="p">(</span><span class="nv">skewer-mode</span> <span class="nv">utility</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">skewer-setup</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">define-key</span> <span class="nv">skewer-mode-map</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-c $"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">expose</span> <span class="nf">#'</span><span class="nv">skewer-bower-load</span> <span class="s">"jquery"</span> <span class="s">"1.9.1"</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">with-package*</span> <span class="nv">smex</span>
  <span class="p">(</span><span class="nv">smex-initialize</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">global-set-key</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"M-x"</span><span class="p">)</span> <span class="ss">'smex</span><span class="p">))</span>
</code></pre></div></div>

<p>For the second one, I’m going to be using smex right away (takes over
<code class="language-plaintext highlighter-rouge">M-x</code>), so I use the second form, which immediately loads smex. The
macro isn’t really necessary at all here since I could just use
<code class="language-plaintext highlighter-rouge">require</code> and follow it with these expressions, but I <em>really</em> like
how this organizes my <code class="language-plaintext highlighter-rouge">init.el</code>. It creates a domain-specific language
(DSL) just for Emacs configuration. Each package configuration is
grouped up in a clean <code class="language-plaintext highlighter-rouge">let</code>-like form. Since I’ve added syntax
highlighting to <code class="language-plaintext highlighter-rouge">with-package</code> it looks very elegant. Normal syntax
highlighters aren’t going to do this, so here’s a screenshot of my
buffer.</p>

<p><img src="/img/emacs/with-package.png" alt="" /></p>

<p>JavaScript developers with a keen eye may notice a familiar pattern
here. This macro is shaped a bit like the
<a href="https://github.com/amdjs/amdjs-api/wiki/AMD">Asynchronous Module Definition (AMD)</a>, with asynchrony in
mind. Since this is Lisp with a powerful macro system, I get to hide
away the function wrapper part.</p>

<p>Using this macro has caused me to use <code class="language-plaintext highlighter-rouge">eval-after-load</code> with just
about everything. This has cut my initialization time down to about
10% of what it was before! On those occasions that I <em>do</em> restart
Emacs, it’s really nice that it’s back to under 1 second (0.6 seconds
vs 6 seconds).</p>

<h3 id="the-problem-of-eval-after-load">The problem of eval-after-load</h3>

<p>I’m calling <code class="language-plaintext highlighter-rouge">eval-after-load</code> poorly designed because it’s a perfect
example of an inappropriate use of <code class="language-plaintext highlighter-rouge">eval</code>. In function form it
<em>should</em> have accepted a function as its second argument instead of an
s-expression, so it would work like a hook. This is even more
inappropriate now that Emacs has proper lexical closures, which is the
perfect mechanism for delayed evaluation. <strong>The whole point of
<code class="language-plaintext highlighter-rouge">eval-after-load</code> is to speed up Emacs initialization time, but <em>using
<code class="language-plaintext highlighter-rouge">eval</code> is slow</em></strong>. To the compiler, this isn’t code, just data. This
means no byte-compilation and no compiler warnings.</p>
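
<p>To illustrate the compiled alternative, here is a minimal sketch, not
the <code class="language-plaintext highlighter-rouge">with-package</code> macro itself. It assumes Emacs 24.4 or later, where
<code class="language-plaintext highlighter-rouge">with-eval-after-load</code> wraps its body in a function so the compiler
sees it as code. The name <code class="language-plaintext highlighter-rouge">my-after-all</code> is hypothetical, and it does
no package installation. It only shows waiting on several features at
once.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, for illustration only.
(defmacro my-after-all (features &amp;rest body)
  "Evaluate BODY once every feature in the list FEATURES is loaded."
  (declare (indent 1))
  (if (null features)
      ;; No features left to wait for: run the body.
      `(progn ,@body)
    ;; Wait for the first feature, then recurse on the rest.
    `(with-eval-after-load ',(car features)
       (my-after-all ,(cdr features) ,@body))))

;; Usage: the body runs only after both features have been loaded.
(my-after-all (skewer-mode utility)
  (message "skewer-mode and utility are both loaded"))
</code></pre></div></div>

<p>Nesting one delayed form per feature keeps the macro short, at the
cost of registering a new callback each time an outer one fires.</p>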

<p>A possible alternative design for <code class="language-plaintext highlighter-rouge">eval-after-load</code> would be a hook
named something like <code class="language-plaintext highlighter-rouge">&lt;package&gt;-load-hook</code>. Then when <code class="language-plaintext highlighter-rouge">load</code> or
<code class="language-plaintext highlighter-rouge">require</code> loads a file, it runs the hook with the matching name. This
removes <code class="language-plaintext highlighter-rouge">eval-after-load</code> as its own standalone language concept.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'skewer-mode-load-hook</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="o">...</span><span class="p">))</span>
</code></pre></div></div>

<p>The problem here is that when the package is already loaded the hook is
never run. In contrast, when <code class="language-plaintext highlighter-rouge">eval-after-load</code> is used on an
already-loaded package, the expression is immediately evaluated.</p>

<p>Given this, if there was something I could change about this it would
simply be for <code class="language-plaintext highlighter-rouge">eval-after-load</code>, whatever it would be called, to take
a function for the second argument. I would also provide a simple
macro just like <code class="language-plaintext highlighter-rouge">after</code> that wraps this function. Why not just a
macro? The function form would be really useful for a situation like
this,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">eval-after-load</span> <span class="ss">'skewer-mode</span> <span class="nf">#'</span><span class="nv">skewer-setup</span><span class="p">)</span>
</code></pre></div></div>

<p>Here there’s no need to instantiate a new anonymous function or
s-expression. If all it’s doing is calling a zero-arity function, that
function can be passed in directly.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Skewer Gets HTML Interaction</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/06/01/"/>
    <id>urn:uuid:f8c13ac6-2da6-3851-497c-8785db8a203e</id>
    <updated>2013-06-01T00:00:00Z</updated>
    <category term="javascript"/><category term="emacs"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>A month ago Zane Ashby made a pull request that <a href="https://github.com/skeeto/skewer-mode/pull/19">added another minor
mode to Skewer</a>: skewer-html-mode. It’s analogous to the
skewer-css minor mode in that it evaluates HTML “expressions” in the
context of the current page. The original pull request was mostly a
proof of concept, with evaluated HTML snippets being appended to the
end of the page (<code class="language-plaintext highlighter-rouge">body</code>) unless a target selector is manually
specified.</p>

<p>This mode is still a bit rough around the edges, but since I think
it’s useful enough for productive work I’ve merged it in.</p>

<h3 id="replacing-html">Replacing HTML</h3>

<p>Unsatisfied with just appending content, I ran with the idea and
updated it to automatically <em>replace</em> structurally-matching content on
the page when possible. Zane’s fundamental idea remained intact: a CSS
selector is sent to the browser along with the HTML. Skewer running in
the browser uses <code class="language-plaintext highlighter-rouge">querySelector()</code> to find the relevant part of the
document and replaces it with the provided HTML. This is done with the
command <code class="language-plaintext highlighter-rouge">skewer-html-eval-tag</code> (default: <code class="language-plaintext highlighter-rouge">C-M-x</code>), which selects the
innermost tag enclosing the point.</p>

<p>To accomplish this, an important piece of skewer-html exists to
compute this CSS selector. It’s a purely structural selector, ignoring
classes, IDs, and so on, instead relying on the pseudo-selector
<a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-of-type">:nth-of-type</a>. For example, say this is the content of
the buffer and the point is somewhere inside the second heading (Bar).</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;html&gt;</span>
  <span class="nt">&lt;head&gt;&lt;/head&gt;</span>
  <span class="nt">&lt;body&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">id=</span><span class="s">"main"</span><span class="nt">&gt;</span>
      <span class="nt">&lt;h1&gt;</span>Foo<span class="nt">&lt;/h1&gt;</span>
      <span class="nt">&lt;p&gt;</span>I am foo.<span class="nt">&lt;/p&gt;</span>
      <span class="nt">&lt;h1&gt;</span>Bar<span class="nt">&lt;/h1&gt;</span>
      <span class="nt">&lt;p&gt;</span>I am bar.<span class="nt">&lt;/p&gt;</span>
    <span class="nt">&lt;/div&gt;</span>
  <span class="nt">&lt;/body&gt;</span>
<span class="nt">&lt;/html&gt;</span>
</code></pre></div></div>

<p>The function <code class="language-plaintext highlighter-rouge">skewer-html-compute-selector</code> will generate this
selector. Note that :nth-of-type is 1-indexed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>body:nth-of-type(1) &gt; div:nth-of-type(1) &gt; h1:nth-of-type(2)
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">&gt;</code> syntax requires that these all be direct descendants and
:nth-of-type allows it to ignore all those paragraph elements. This
means other types of elements can be added around these headers, like
additional paragraphs, without changing the selector. The :nth-of-type
on <code class="language-plaintext highlighter-rouge">body</code> is obviously unnecessary, but this is just to keep
skewer-html dead simple. It doesn’t need to know the semantics of
HTML, just the surface syntax. There will only ever be one <code class="language-plaintext highlighter-rouge">body</code> tag,
but to skewer-html it’s just another HTML tag.</p>

<p>Side note: this is why I <em>strongly</em> prefer to use <code class="language-plaintext highlighter-rouge">/&gt;</code> self-closing
syntax in HTML5 even though it’s unnecessary. Unlike XML, that closing
slash is treated as whitespace and it’s impossible to self-close tags.
The schema specifies which tags are “void” (always self-closing:
<code class="language-plaintext highlighter-rouge">img</code>, <code class="language-plaintext highlighter-rouge">br</code>) and which tags are “normal” (explicitly closed: <code class="language-plaintext highlighter-rouge">script</code>,
<code class="language-plaintext highlighter-rouge">canvas</code>). This means if you <em>don’t</em> use <code class="language-plaintext highlighter-rouge">/&gt;</code> syntax, your editor
would need to know the HTML5 schema in order to properly understand
the syntax. I prefer not to require this of a text editor — or
anything else doing dumb manipulations of HTML text — especially with
the HTML5 specification constantly changing.</p>

<p>When I was writing this I originally included <code class="language-plaintext highlighter-rouge">html</code> in the selector.
Selector computation would just walk up to the root of the document
regardless of what the tags were. Curiously, including this causes the
selector to fail to match even though this is literally the page
structure. So, out of necessity, skewer-html knows enough to leave it
off.</p>
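
<p>As a rough illustration of the idea, here is a sketch of the same
computation done in the browser rather than in Emacs. It is not
skewer-html’s actual code: the names <code class="language-plaintext highlighter-rouge">nthOfType</code>, <code class="language-plaintext highlighter-rouge">selectorParts</code>, and
<code class="language-plaintext highlighter-rouge">computeSelector</code> are made up for this example. It only assumes that
each node has <code class="language-plaintext highlighter-rouge">tagName</code>, <code class="language-plaintext highlighter-rouge">parentElement</code>, and <code class="language-plaintext highlighter-rouge">children</code>, and it
stops before the root element so that <code class="language-plaintext highlighter-rouge">html</code> is left off.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: 1-based position of node among same-tag siblings.
function nthOfType(node) {
    var count = 0;
    var siblings = node.parentElement.children;
    for (var i = 0; i !== siblings.length; i++) {
        if (siblings[i].tagName === node.tagName) {
            count++;
        }
        if (siblings[i] === node) {
            return count;
        }
    }
    throw new Error('node is not among the children of its parent');
}

// Hypothetical helper: one selector component per ancestor, outermost
// first. The root element has no parentElement, so it is never included.
function selectorParts(node) {
    var parts = [];
    while (node.parentElement) {
        parts.unshift(node.tagName.toLowerCase() +
                      ':nth-of-type(' + nthOfType(node) + ')');
        node = node.parentElement;
    }
    return parts;
}

// Join the components with the child combinator.
function computeSelector(node) {
    return selectorParts(node).join(' &gt; ');
}
</code></pre></div></div>

<p>Called on the second heading of the example document, this should
produce the same three components as the selector shown earlier.</p>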

<p>For replacement, rather than a simple <code class="language-plaintext highlighter-rouge">innerHTML</code> assignment on the
selected element, Skewer is parsing the HTML into a node object,
removing the selected node object, and putting the new one in its
place. The reason for this is that I want to include all of the
replacement element’s attributes.</p>
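
<p>A minimal sketch of that replacement step might look like the
following. This is an illustration, not Skewer’s source: the function
name <code class="language-plaintext highlighter-rouge">replaceElement</code> and its <code class="language-plaintext highlighter-rouge">doc</code> parameter are assumptions (in a
browser you would pass <code class="language-plaintext highlighter-rouge">document</code>), and it assumes the HTML parses
to a single element inside a <code class="language-plaintext highlighter-rouge">div</code>.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: swap the element matching selector for a new
// element parsed from an HTML string, attributes included.
function replaceElement(doc, selector, html) {
    var target = doc.querySelector(selector);
    if (target === null) {
        return null;  // nothing matched, so nothing to replace
    }
    // Parse the HTML by assigning it to a detached container.
    var container = doc.createElement('div');
    container.innerHTML = html;
    var replacement = container.firstElementChild;
    // Put the freshly parsed node where the old one was.
    target.parentNode.replaceChild(replacement, target);
    return replacement;
}
</code></pre></div></div>

<p>Because the whole node is swapped rather than only its contents, any
attributes on the new tag come along with it.</p>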

<p>Another HTML oddity is that the <code class="language-plaintext highlighter-rouge">body</code> and <code class="language-plaintext highlighter-rouge">head</code> elements cannot be
replaced. It’s a limitation of the DOM. This means these tags cannot
be “evaluated” directly, only their descendants. Brian and I also ran
into this issue in <a href="http://www.50ply.com/blog/2012/08/13/introducing-impatient-mode/">impatient-mode</a> while trying to work around a
strange HTML encoding corner case: scripts loaded with a <code class="language-plaintext highlighter-rouge">script</code> tag
created by <code class="language-plaintext highlighter-rouge">document.write()</code> are parsed with a different encoding
than when loaded directly by adding a <code class="language-plaintext highlighter-rouge">script</code> element to the page.</p>

<p>This last part is actually a small saving grace for skewer-css, which
works by appending new stylesheets to the end of <code class="language-plaintext highlighter-rouge">body</code>. Why <code class="language-plaintext highlighter-rouge">body</code>
and not <code class="language-plaintext highlighter-rouge">head</code>? Because some documents out there have stylesheets
linked from <code class="language-plaintext highlighter-rouge">body</code>, and properly overriding these requires appending
stylesheets <em>after</em> them. If <code class="language-plaintext highlighter-rouge">body</code> is replaced by skewer-html, all of
the dynamic stylesheets appended by skewer-css would be lost,
reverting the style of the page. Since we can’t do that, this isn’t an
issue!</p>

<h3 id="appending-html">Appending HTML</h3>

<p>So what happens when the selector doesn’t match anything in the
current document? Skewer fills in the missing part of the structure
and sticks the content in the right place. Next time the tag is
evaluated, the structure exists and it becomes a replacement
operation. This means the document in the browser can start completely
empty (like the <code class="language-plaintext highlighter-rouge">run-skewer</code> page) and you can fill in content as you
write it.</p>

<p>But what if the page already has content? There’s an interactive
command <code class="language-plaintext highlighter-rouge">skewer-html-fetch-selector-into-buffer</code>. You select a part of
the page and it gets inserted into the current buffer (probably a
scratch buffer). The idea is that you can then modify and then
evaluate it to update the page. This is the roughest part of
skewer-html right now since I’m still figuring out a good workflow
around it.</p>

<p>If you have Skewer installed and updated, you already have
skewer-html. It was merged into <code class="language-plaintext highlighter-rouge">master</code> about a month ago. If you
have any ideas or opinions for how you think this minor mode should
work, please share them. The intended workflow is still not a
fully-formed idea.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Should Emacs Packages Self Configure?</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/05/23/"/>
    <id>urn:uuid:3421bb8a-23e9-3f5f-3329-f3ec256a91af</id>
    <updated>2013-05-23T00:00:00Z</updated>
    <category term="emacs"/>
    <content type="html">
      <![CDATA[<p><em>Update 2013-06-01</em>: I ultimately decided that Skewer should <em>not</em>
modify any mode hooks automatically. Instead the major mode hooks can
be configured by putting <code class="language-plaintext highlighter-rouge">(skewer-setup)</code> in your initialization file.
This function is designed to play well with autoloading, so using it
won’t increase your startup time.</p>

<p>There’s a discussion happening on a Skewer issue on GitHub:
<a href="https://github.com/skeeto/skewer-mode/issues/22">Problems with skewer-css autoload</a>. The issue was opened by
Steve Purcell. Right now Skewer’s CSS minor mode is enabled by default
in less-css-mode, which is, of course, incompatible with the minor
mode. It’s enabled because less-css-mode is derived from css-mode, so
it runs all of css-mode’s hooks.</p>

<p>There are actually two separate problems here.</p>

<ul>
  <li>
    <p>The hook to activate the minor mode needs to check what major mode
is activating it, because css-mode-hook is run by other modes. This
is easy to fix, though it’s not very elegant. I would need to do
the same for skewer-html-mode.</p>
  </li>
  <li>
    <p>Skewer is eagerly configuring itself once it’s installed. This is
intentional: I want Skewer to be <em>really</em> easy to use right
out-of-the-box. There’s no “install the package, then add these
lines to your startup configuration.” If I remove this behavior,
the previous problem becomes the user’s problem, since it’s up to
them to activate the minor mode.</p>
  </li>
</ul>

<p>Steve is telling me that this auto-configuration is a bad idea; it so
often causes these sorts of messes. The gist of his argument is that
<em>installing</em> a package is separate from <em>enabling</em> a package. It’s up
to the user to decide how and when they want to use a package. Steve
maintains a number of popular Emacs packages and, even more
importantly, he’s one of the MELPA maintainers. He knows this stuff
much better than I do. He even used to share my current opinion
<a href="https://github.com/purcell/elisp-slime-nav/pull/6">up until two months ago</a> when someone changed his mind.</p>

<p>On the other hand, I really dislike when software has such awful
defaults that it’s unusable without first configuring it. Skewer’s
case wouldn’t be too bad since it can be enabled manually without
editing any configuration, but practical use would require that users
configure Skewer in their startup files. It would also make it harder
for users to discover Skewer’s features. They might not otherwise even
be aware there are CSS and HTML minor modes! If the concern is
separating installation and activation, package.el <em>does</em> have a
variable, <code class="language-plaintext highlighter-rouge">package-load-list</code>, for this purpose, though using it isn’t
very convenient.</p>

<p>Right now I’m stuck in this dilemma like a deer caught in the
headlights. Formal packages are a very new thing for Emacs, so there
doesn’t seem to be a community consensus on this issue yet. I really
like when the packages I install behave like Skewer does now (e.g.
<a href="https://github.com/kingtim/nrepl.el">nrepl.el</a>) so I don’t need to configure them. But I would also
be easily frustrated if that configuration magic was getting in my
way. This particular annoyance happens to me outside of Emacs often
enough (e.g. Chromium), though it’s worse because it generally can’t
easily be fixed like in Emacs.</p>

<p>I’m still trying to make up my mind about this. If you have an opinion
on the matter I’d like to hear it. You can leave a comment here, or
much better, leave your comment on the issue on GitHub. It’s not going
to come down to a vote or anything like that. I just want to get a
feel for how people expect Emacs packages to work.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Load Libraries in Skewer with Bower</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/05/18/"/>
    <id>urn:uuid:4dc317df-853c-3d75-68da-ebaa2a43f628</id>
    <updated>2013-05-18T00:00:00Z</updated>
    <category term="javascript"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I recently added support to <a href="/blog/2012/10/31/">Skewer</a> for loading libraries on
the fly using <a href="https://github.com/bower/bower">Bower’s package infrastructure</a>. Just make sure
you’re up to date, then while skewering a page run <code class="language-plaintext highlighter-rouge">M-x
skewer-bower-load</code>. It will prompt for a package and version, download
the library, then inject it into the currently skewered page.</p>

<p>Because the Bower infrastructure is so simple, <strong>Bower is not actually
needed in order to use this.</strong> Only Git is required, configured by
<code class="language-plaintext highlighter-rouge">skewer-bower-git-executable</code>, which it tries to configure itself from
<a href="https://github.com/magit/magit">Magit</a> if it’s been loaded.</p>

<h3 id="motivation">Motivation</h3>

<p>Skewer comes with a <a href="http://en.wikipedia.org/wiki/Greasemonkey">userscript</a> that adds a small toggle button
to the top-right corner of every page I visit. Here’s a screenshot of the
toggle on this page.</p>

<p><img src="/img/screenshot/skewer-toggle.png" alt="" /></p>

<p>When that little red triangle is clicked, the page is connected to
Emacs and the triangle turns green. Click it again and it disconnects,
turning red. It remembers its state between page refreshes so that I’m
not constantly having to toggle.</p>

<p>It’s mainly for development purposes, but it’s occasionally useful to
Skewer an arbitrary page on the Internet so that I can poke at it from
Emacs. One habit that I noticed comes up a lot is that I want to use
jQuery as I fiddle with the page, but jQuery isn’t actually loaded for
this page. What I’ll do is visit a jQuery script in Emacs and load
this buffer (<code class="language-plaintext highlighter-rouge">C-c C-k</code>). As expected, this is tedious and easily
automated.</p>

<p>Rather than add specific support for jQuery, I thought it would be
more useful to hook into one of the existing JavaScript package
managers. Not only would I get jQuery but I’d be able to load anything
else provided by the package manager. This means if I learn about a
cool new library, chances are I could just switch to my <code class="language-plaintext highlighter-rouge">*javascript*</code>
scratch buffer, load the library with this new Skewer feature, and
play with it. Very convenient.</p>

<h3 id="how-it-works">How it Works</h3>

<p>There are <a href="http://wp.me/p2UXnc-f">a number of package managers</a> out there. I chose
Bower because of its emphasis on client-side JavaScript and, more so,
because its infrastructure is <em>so</em> simple that I wouldn’t actually
need to use Bower itself to access it. In adding this feature to
Skewer, I wrote half a Bower client from scratch very easily.</p>

<p>The only part of the Bower infrastructure hosted by Bower itself is a
tiny registry that maps package names to Git repositories. This host
also accepts new mappings, unauthenticated, for registering new
packages. The entire database is served up as plain old JSON.</p>

<ul>
  <li><a href="https://bower.herokuapp.com/packages">https://bower.herokuapp.com/packages</a></li>
</ul>

<p>To find out what versions are available, clone this repository with
Git and inspect the repository tags. Tags that follow the
<a href="http://semver.org/">Semantic Versioning</a> scheme are versions of the package
available for use with Bower. Once a version is specified, look at
<code class="language-plaintext highlighter-rouge">bower.json</code> in the tree-ish referenced by that tag to get the rest of
the package metadata, such as dependencies, endpoint listing, and
description.</p>
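
<p>The tag-filtering step is simple enough to sketch. The function
below is only an illustration with a made-up name, <code class="language-plaintext highlighter-rouge">semverTags</code>. It
assumes plain <code class="language-plaintext highlighter-rouge">MAJOR.MINOR.PATCH</code> tags, tolerates a leading <code class="language-plaintext highlighter-rouge">v</code> (an
assumption on my part), and ignores the pre-release and build
metadata that Semantic Versioning also allows.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: keep only tags that look like MAJOR.MINOR.PATCH
// and sort them from oldest to newest.
function semverTags(tags) {
    var pattern = /^v?(\d+)\.(\d+)\.(\d+)$/;
    return tags.filter(function (tag) {
        return pattern.test(tag);
    }).sort(function (a, b) {
        var pa = pattern.exec(a);
        var pb = pattern.exec(b);
        // Compare major, then minor, then patch, numerically.
        for (var i = 1; i !== 4; i++) {
            var diff = Number(pa[i]) - Number(pb[i]);
            if (diff !== 0) {
                return diff;
            }
        }
        return 0;
    });
}
</code></pre></div></div>

<p>Comparing the components as numbers matters here: a plain string
sort would put 1.10.0 before 1.9.1.</p>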

<p>This is all very clever. The Bower registry doesn’t have to host any
code, so it remains simple and small. It could probably be rewritten
from scratch in 15-30 minutes. Almost all the repositories are on
GitHub, which most package developers are already comfortable with.
Package maintainers don’t need to use any tools or interact with any
new host systems. Except for adding some metadata they just keep doing
what they’re doing. I think this last point is a big part of
<a href="http://melpa.milkbox.net/">MELPA’s</a> success.</p>

<h3 id="bowers-fatal-weaknesses">Bower’s Fatal Weaknesses</h3>

<p>Unfortunately Bower has two issues, one of which is widespread, that
seriously impact its usefulness.</p>

<h4 id="dependency-specification">Dependency Specification</h4>

<p>Even though Bower specifies <a href="http://semver.org/">Semantic Versioning</a> for package
versions, which <em>very</em> precisely describes version syntax and
semantics, the <strong>dependencies field in <code class="language-plaintext highlighter-rouge">bower.json</code> is
underspecified</strong>. There’s no agreed upon method for specifying
relative dependency versions.</p>

<p>Say your package depends on jQuery and it relies on the newer jQuery
1.6 behavior of <code class="language-plaintext highlighter-rouge">attr()</code>. You would mark down that you depend on
jQuery 1.6.0. Say a user of your package is also using another package
that depends on jQuery, it’s using the <code class="language-plaintext highlighter-rouge">on()</code> method, which requires
jQuery 1.7 or newer. It specifies jQuery 1.7.0. This is a dependency
conflict.</p>

<p>Of course your package works perfectly fine with 1.7.0. It works fine
at 1.6.0 and later. In other package management systems, you would
probably have marked that you depend on “&gt;=1.6.0” rather than just
1.6.0. Unfortunately, Bower doesn’t specify this as a valid dependency
version. Some package maintainers have gone ahead and specified
relative versions anyway, but inconsistently. Some use the “&gt;=” prefix
like I did above, some prefix with “~” (“<em>about</em> this version”), which
is pretty useless.</p>
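
<p>To make the difference concrete, here is a small sketch of how a
client could treat each dependency as a minimum version (the “&gt;=”
reading) instead of an exact match. Nothing here comes from Bower
itself; <code class="language-plaintext highlighter-rouge">compareVersions</code> and <code class="language-plaintext highlighter-rouge">satisfiesMinimum</code> are hypothetical
names, and versions are assumed to be plain <code class="language-plaintext highlighter-rouge">MAJOR.MINOR.PATCH</code>
strings.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: returns -1, 0, or 1 when a is older than,
// equal to, or newer than b.
function compareVersions(a, b) {
    var pa = a.split('.').map(Number);
    var pb = b.split('.').map(Number);
    for (var i = 0; i !== 3; i++) {
        if (pa[i] !== pb[i]) {
            return Math.sign(pa[i] - pb[i]);
        }
    }
    return 0;
}

// Hypothetical helper: true when version is at least minimum.
function satisfiesMinimum(version, minimum) {
    return compareVersions(version, minimum) !== -1;
}

// Two packages ask for jQuery 1.6.0 and 1.7.0 respectively.
var minimums = ['1.6.0', '1.7.0'];
var candidate = '1.7.0';
var ok = minimums.every(function (minimum) {
    return satisfiesMinimum(candidate, minimum);
});
</code></pre></div></div>

<p>With exact matching, 1.6.0 and 1.7.0 conflict. Read as minimums,
jQuery 1.7.0 satisfies both and <code class="language-plaintext highlighter-rouge">ok</code> is true.</p>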

<p>And this leads into the other flaw.</p>

<h4 id="most-bower-packages-are-broken">Most Bower Packages are Broken</h4>

<p>While some parts of Bower are underspecified, most packages don’t
follow the simple specifications that already exist! That is to say,
<strong>most Bower packages are broken</strong>. This is incredibly unfortunate
because it means at least half of the packages can’t be loaded by this
new Skewer feature.</p>

<p>How are they broken? As of this writing, there are 2,195 packages listed
in Bower’s registry.</p>

<ul>
  <li>
    <p>113 (5%) of them have unreachable or unresponsive repositories.
About half of these are due to invalid repository URLs.</p>
  </li>
  <li>
    <p>1,830 (83%) have no bower.json metadata file. This means the client
has to guess at the metadata.</p>
  </li>
  <li>
    <p>1,034 (47%) have unguessable endpoints. My client looks for other
package management metadata outside of Bower’s, as well as tries to
guess based on the package name. Failing to guess causes the package
to fail to load. These packages aren’t a subset of the last set
with missing bower.json files. Sometimes the bower.json files
contain incorrect information, which causes my client to drop into
guessing mode.</p>
  </li>
  <li>
    <p>1,400 (64%) don’t use Semantic Versioning: either no versioning at
all or some other arbitrary versioning system.</p>
  </li>
  <li>
    <p>In total, <strong>2,041 (93%) of all Bower packages have invalid or
missing metadata</strong> — bad registry entry, missing bower.json file,
or lack of semantic version tags.</p>
  </li>
</ul>

<p>The good news is that most of the important libraries, like jQuery and
Underscore, work properly. I’ve also registered two of my JavaScript
libraries, <a href="/blog/2013/03/28/">ResurrectJS</a> and <a href="/blog/2013/03/25/">rng-js</a>, so these can be
loaded on the fly in Skewer.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Tracking Mobile Device Orientation with Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/04/27/"/>
    <id>urn:uuid:3e015231-d0f9-3d53-72a1-ec7d4a30c941</id>
    <updated>2013-04-27T00:00:00Z</updated>
    <category term="emacs"/><category term="javascript"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>Nine years ago I bought my first laptop computer. For the first time I
could carry my computer around and do productive things at places
beyond my desk. In the meantime a new paradigm of mobile computing has
arrived. Following a similar pattern, this month I bought a Samsung
Galaxy Note 10.1, an Android tablet computer. Having never owned a
smartphone, this is my first taste of modern mobile computing.</p>

<p><a href="/img/misc/tablet.jpg"><img src="/img/misc/tablet-thumb.jpg" alt="" /></a></p>

<p>Once the technology caught up, laptops were capable enough to fully
replace desktops. However, this tablet is no replacement for my
laptop. <a href="http://www.terminally-incoherent.com/blog/2012/06/13/ipad/">Mobile devices are purely for consumption</a>, so I will
continue to use desktops and laptops for the majority of my computing.
I’m writing this post on my laptop, not my tablet, for example.</p>

<p>Owning a tablet has opened up a whole new platform for me to explore
as a programmer. I’m not particularly interested in writing Android
apps, though. I’m obviously not alone in this, as I’ve found that
nearly all Android software available right now is somewhere between
poor and mediocre in quality. The hardware was worth the cost of the
device, but the software still has a long way to go. I’m optimistic
about this so I have no regrets.</p>

<h3 id="a-new-web-platform">A New Web Platform</h3>

<p>Instead, I’m interested in mobile devices as a web platform. Among
the few high-quality pieces of software on Android are the web
browsers (Chrome and Firefox), and I’m already familiar with
developing for these. Even more, I can develop software live on the
tablet remotely from my laptop using <a href="/blog/2012/10/31/">Skewer</a> —
i.e. the exact same development tools and workflow I’m already using.</p>

<p>What’s new and challenging is the user interface. Instead of
traditional clicking and typing, mobile users tap, hold, swipe, and
even tilt the screen. Most challenging of all is probably
accommodating both kinds of interfaces at once.</p>

<p>One of the first things I wanted to play with after buying the tablet
was the gyro. The tablet knows its acceleration and orientation at all
times. This information can be accessed in JavaScript using
<a href="http://dev.w3.org/geo/api/spec-source-orientation.html">a fairly new API</a>. The two events of interest are
<code class="language-plaintext highlighter-rouge">ondevicemotion</code> and <code class="language-plaintext highlighter-rouge">ondeviceorientation</code>. Using
<a href="/blog/2012/08/20/">simple-httpd</a> I can transmit all this information
to Emacs as it arrives.</p>

<p>Instead of writing a new servlet for this, to try it out I used
<code class="language-plaintext highlighter-rouge">skewer.log()</code>. Connect a web page viewed on the tablet to Skewer
hosted on the laptop, then evaluate this in a <code class="language-plaintext highlighter-rouge">js2-mode</code> buffer on the
laptop.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">devicemotion</span><span class="dl">'</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">a</span> <span class="o">=</span> <span class="nx">event</span><span class="p">.</span><span class="nx">accelerationIncludingGravity</span><span class="p">;</span>
    <span class="nx">skewer</span><span class="p">.</span><span class="nx">log</span><span class="p">([</span><span class="nx">a</span><span class="p">.</span><span class="nx">x</span><span class="p">,</span> <span class="nx">a</span><span class="p">.</span><span class="nx">y</span><span class="p">,</span> <span class="nx">a</span><span class="p">.</span><span class="nx">z</span><span class="p">]);</span>
<span class="p">});</span>
</code></pre></div></div>

<p>Or for orientation,</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">deviceorientation</span><span class="dl">'</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">skewer</span><span class="p">.</span><span class="nx">log</span><span class="p">([</span><span class="nx">event</span><span class="p">.</span><span class="nx">alpha</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">beta</span><span class="p">,</span> <span class="nx">event</span><span class="p">.</span><span class="nx">gamma</span><span class="p">]);</span>
<span class="p">});</span>
</code></pre></div></div>

<p>These orientation values appeared in my <code class="language-plaintext highlighter-rouge">*skewer-repl*</code> buffer as I
casually rolled the tablet on one axis. The units are obviously
degrees.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[157.4155398727678, 0.38583511837777246, -44.61023992234689]
[155.4477623728871, -0.6438986350040569, -44.69645057005079]
[154.32208572596647, -0.7516393196323073, -45.79730289443301]
[155.437674183483, -0.48375529832044045, -46.406449900466015]
[156.2974174150692, 0.21938214098430556, -47.482812581579154]
[154.85869270791937, 0.11046702400456986, -48.67378583696511]
[153.3284161451347, -0.9344782009891125, -48.61755630462298]
[154.11860073021347, -0.6553947505116874, -49.949668589018074]
[155.85919247792117, 0.05473832995756562, -49.84400214746339]
[156.92487274317241, 0.4946305069438346, -49.86369016774595]
[158.06542554210534, 0.712759801803332, -49.61875275392013]
[159.356905031128, 1.3387109941852697, -49.9372717956745]
</code></pre></div></div>

<p>It would be neat to pump these into a 3D plot display as they come in,
such that my laptop displays the current tablet orientation on the
screen as I move it around, but I didn’t see any quick way to do this.</p>

<p>Here are some acceleration values at rest. Since I took these samples
on Earth the units are obviously in meters per second per second.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[-0.009576806798577309, 0.31603461503982544, 9.816226959228516]
[-0.047884032130241394, 0.3064578175544739, 9.806650161743164]
[-0.009576806798577309, 0.28730419278144836, 9.787496566772461]
[0.009576806798577309, 0.3064578175544739, 9.816226959228516]
[-0.06703764945268631, 0.3256114423274994, 9.797073364257812]
[-0.047884032130241394, 0.2968810200691223, 9.864110946655273]
[-0.028730420395731926, 0.2968810200691223, 9.576807022094727]
[-0.019153613597154617, 0.363918662071228, 9.691728591918945]
[-0.05746084079146385, 0.3734954595565796, 10.199298858642578]
</code></pre></div></div>
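
<p>As a quick sanity check on those units, the length of each vector
should come out close to standard gravity, roughly 9.81 meters per
second per second, when the tablet is at rest. The helper name
<code class="language-plaintext highlighter-rouge">magnitude</code> below is just something made up for this sketch.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: Euclidean length of an [x, y, z] sample.
function magnitude(v) {
    return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// The first at-rest sample from the listing above.
var sample = [-0.009576806798577309, 0.31603461503982544, 9.816226959228516];
var g = magnitude(sample);
</code></pre></div></div>

<p>For this sample the result is a little over 9.82, which is about
what I would expect from a device lying nearly flat.</p>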

<p>Now that I have the hardware for it, I really want to use this API to
do something interesting in a web application. I just don’t have any
specific ideas yet.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Prototype-based Elisp Objects with @</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/04/07/"/>
    <id>urn:uuid:d1361157-9022-3e77-270c-5410d903c7d4</id>
    <updated>2013-04-07T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p><strong>Reflection from the future</strong>: <em>This library is super slow and
inefficient. It should probably not be used for anything serious.</em></p>

<p>Last weekend I had the itch to play around with a multiple-inheritance
prototype-based object system in lisp. It would
<a href="http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html">look a lot like JavaScript’s object system</a>, but I wanted to try
experimenting with some different ideas. My favorite lisp to hack in is
Emacs Lisp, so that’s what I built it on. What I ended up with is
actually pretty neat. Despite the lack of reader macros in Elisp, I
still managed to introduce new syntax by manipulating symbols at
compile time.</p>

<ul>
  <li><a href="https://github.com/skeeto/at-el">https://github.com/skeeto/at-el</a></li>
</ul>

<p>See the README for a quick demonstration. What follows is the long
explanation.</p>

<p>It’s called <a href="https://github.com/skeeto/at-el">@</a>, due to the syntax that it adds to Elisp as a
domain-specific language. It’s a mini-language, really. The name is also
a challenge to the code that supports Elisp, because so much of it —
including emacs-lisp-mode and Paredit — doesn’t properly handle @ in
identifiers. <del>Even <a href="https://github.com/bhollis/maruku">Maruku</a>, the Markdown to HTML translator I
use for this blog, has bugs that won’t allow it to handle the @
characters in my code, so I had to forgo most syntax highlighting for
this post.</del> (Update: I now use Kramdown so this is no longer an issue.)</p>

<p>Fortunately <code class="language-plaintext highlighter-rouge">require</code> <em>does</em> manage just fine.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'@</span><span class="p">)</span>
</code></pre></div></div>

<p>Objects in @ are vectors with the symbol @ as the first element. The
rest of the elements are implementation specific, but, at the moment,
the second element is a plist (property list) of all of that object’s
properties.</p>

<p>The root object of @ is @, and all other objects are instances of this
object, either directly or indirectly. Because it’s prototype based,
creating a new object is a matter of extending one or more
(multiple-inheritance) existing objects. This is done with the
function <code class="language-plaintext highlighter-rouge">@extend</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Create a brand new object</span>
<span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">@extend</span> <span class="nv">@</span><span class="p">))</span>
</code></pre></div></div>

<p>If no objects are given to <code class="language-plaintext highlighter-rouge">@extend</code>, @ will be used as the parent
object, so it’s not necessary as an argument above. This is actually
very important, as objects that don’t inherit from @ will not work at
all! I’ll get into that detail in a bit. Additionally, <code class="language-plaintext highlighter-rouge">@extend</code>
accepts keyword arguments, which become properties on the created
object.</p>

<p>The function @ is used to access properties on an object. Remember,
Elisp is a <em>lisp-2</em>, meaning that variables and functions exist in
their own namespaces. This means there can be both a variable @ (the
root object) and function @ (property accessor).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">rectangle</span> <span class="p">(</span><span class="nv">@extend</span> <span class="ss">:width</span> <span class="mi">3</span> <span class="ss">:height</span> <span class="mi">4</span><span class="p">))</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:width</span><span class="p">)</span>  <span class="c1">; =&gt; 3</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:height</span><span class="p">)</span>  <span class="c1">; =&gt; 4</span>
</code></pre></div></div>

<p>The @ function is also <em>setf-able</em>, so setting properties should be
obvious to any lisper.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:width</span><span class="p">)</span> <span class="mi">13</span><span class="p">)</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:width</span><span class="p">)</span>  <span class="c1">; =&gt; 13</span>
</code></pre></div></div>

<p>Like JavaScript, methods are just functions stored in properties on an
object. In @, the first argument for a method is the object itself,
which is called @@ by convention.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:area</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">@@</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">@@</span> <span class="ss">:width</span><span class="p">)</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">@@</span> <span class="ss">:height</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">rectangle</span> <span class="ss">:area</span><span class="p">)</span> <span class="nv">rectangle</span><span class="p">)</span>  <span class="c1">; =&gt; 52</span>
</code></pre></div></div>

<h3 id="new-syntax">New Syntax</h3>

<p>Here’s the first really neat part. I find all that <code class="language-plaintext highlighter-rouge">(@ @@ ...)</code>
business to be visually unpleasing. Fortunately this can be fixed by
adding syntax. The macro <code class="language-plaintext highlighter-rouge">def@</code> transforms variables that look like @:
into these @ accessors. The following declaration is equivalent to the
lambda assignment above. It’s meant to be very convenient.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">rectangle</span> <span class="ss">:area</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">*</span> <span class="nv">@:width</span> <span class="nv">@:height</span><span class="p">))</span>
</code></pre></div></div>

<p>This macro walks the body of the function at compile-time (macro
expansion time) and transforms these symbols into the full @ calls
above. Like most lisp macros, this has <em>no</em> run-time performance cost.</p>
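<p>As a rough illustration of that walk, here is a minimal sketch of the symbol rewriting. It is not the actual <code class="language-plaintext highlighter-rouge">def@</code> implementation: the helper name <code class="language-plaintext highlighter-rouge">my-expand-accessors</code> is hypothetical, and it deliberately ignores function position and quoted data.</p>

```elisp
;; Hypothetical helper, not part of @: rewrite @:name symbols in FORM.
(defun my-expand-accessors (form)
  "Return FORM with each symbol named @:NAME replaced by (@ @@ :NAME)."
  (cond
   ;; A symbol such as @:width becomes the accessor call (@ @@ :width).
   ((and (symbolp form)
         (string-prefix-p "@:" (symbol-name form)))
    (list '@ '@@ (intern (substring (symbol-name form) 1))))
   ;; Recurse into both halves of a cons so nested forms are covered.
   ((consp form)
    (cons (my-expand-accessors (car form))
          (my-expand-accessors (cdr form))))
   ;; Numbers, strings, and other symbols are returned untouched.
   (t form)))

(my-expand-accessors '(* @:width @:height))
;; => (* (@ @@ :width) (@ @@ :height))
```

<p>The rewriting operates on the code as data before it is ever evaluated, which is why nothing is left to do at run-time.</p>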

<p>Because using <code class="language-plaintext highlighter-rouge">funcall</code> all the time and remembering to pass the
object as the first argument is tedious, the @! function is provided
for calling methods.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">@!</span> <span class="nv">rectangle</span> <span class="ss">:area</span><span class="p">)</span>  <span class="c1">; =&gt; 52</span>
</code></pre></div></div>

<p>The @: variables become function calls when in function position.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">rectangle</span> <span class="ss">:double-area</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">*</span> <span class="mi">2</span> <span class="p">(</span><span class="nv">@:area</span><span class="p">)))</span>
</code></pre></div></div>

<p>In a <em>lisp-1</em> this would happen for free, but in Elisp this situation
expands to the @! form.</p>

<h3 id="inheritance">Inheritance</h3>

<p>This <code class="language-plaintext highlighter-rouge">rectangle</code> is starting to look like a nice re-usable object.
There’s a @ convention for this: prefix “class” object names with @.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">@rectangle</span> <span class="nv">rectangle</span><span class="p">)</span>
</code></pre></div></div>

<p>Now to create new rectangle objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">@extend</span> <span class="nv">@rectangle</span> <span class="ss">:width</span> <span class="mi">3</span> <span class="ss">:height</span> <span class="mf">7.1</span><span class="p">))</span>
<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:area</span><span class="p">)</span>  <span class="c1">; =&gt; 21.299999999999997</span>
</code></pre></div></div>

<p>Notice that the <code class="language-plaintext highlighter-rouge">foo</code> object doesn’t actually have an <code class="language-plaintext highlighter-rouge">:area</code> property
on itself. It was found on its parent, <code class="language-plaintext highlighter-rouge">@rectangle</code> by inheritance.
<code class="language-plaintext highlighter-rouge">:width</code> and <code class="language-plaintext highlighter-rouge">:height</code> were not looked up on the parent because
they’re already bound on <code class="language-plaintext highlighter-rouge">foo</code>.</p>
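<p>The lookup rule can be sketched without @ at all. In the toy model below, which is an assumption-laden illustration rather than how @ is implemented, an object is a plain plist whose <code class="language-plaintext highlighter-rouge">:proto</code> entry lists its parents, and the hypothetical <code class="language-plaintext highlighter-rouge">my-proto-get</code> searches the object first and then its parents depth-first.</p>

```elisp
;; Toy model, not @ itself: objects are plists, :proto lists the parents.
(defun my-proto-find (object property)
  "Return the plist tail holding PROPERTY on OBJECT or an ancestor, else nil."
  (or (plist-member object property)
      (let ((parents (plist-get object :proto))
            (hit nil))
        ;; Try each parent in order until one supplies PROPERTY.
        (while (and parents (not hit))
          (setq hit (my-proto-find (car parents) property)
                parents (cdr parents)))
        hit)))

(defun my-proto-get (object property)
  "Look up PROPERTY on OBJECT, falling back to its prototype chain."
  (cadr (my-proto-find object property)))

(defvar my-rectangle (list :proto nil :sides 4))
(defvar my-foo (list :proto (list my-rectangle) :width 3 :height 7.1))

(my-proto-get my-foo :width)  ; => 3, bound on my-foo itself
(my-proto-get my-foo :sides)  ; => 4, found on the parent
```

<p>Every miss on the object itself turns into a walk up the chain, which is worth keeping in mind when performance comes up later.</p>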

<p>Here’s another re-usable prototype. Notice that @: variables are
also setf-able — using <code class="language-plaintext highlighter-rouge">push</code> in this case.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@colored</span> <span class="p">(</span><span class="nv">@extend</span> <span class="ss">:color</span> <span class="p">()))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@colored</span> <span class="ss">:mix</span> <span class="p">(</span><span class="nv">color</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">push</span> <span class="nv">color</span> <span class="nv">@:color</span><span class="p">))</span>
</code></pre></div></div>

<p>The object system has multiple-inheritance, so colored rectangles can
be created from these two objects. The parent objects of an object are
listed in the <code class="language-plaintext highlighter-rouge">:proto</code> property as a list (similar to JavaScript’s
<code class="language-plaintext highlighter-rouge">__proto__</code>), which can be modified at any time to change an object’s
prototype chain.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="nv">foo</span> <span class="p">(</span><span class="nv">@extend</span> <span class="nv">@colored</span> <span class="nv">@rectangle</span> <span class="ss">:width</span> <span class="mi">10</span> <span class="ss">:height</span> <span class="mi">4</span><span class="p">))</span>

<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:area</span><span class="p">)</span>  <span class="c1">; =&gt; 40</span>
<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:mix</span> <span class="ss">:red</span><span class="p">)</span>
<span class="p">(</span><span class="nv">@!</span> <span class="nv">foo</span> <span class="ss">:mix</span> <span class="ss">:blue</span><span class="p">)</span>
<span class="p">(</span><span class="nv">@</span> <span class="nv">foo</span> <span class="ss">:color</span><span class="p">)</span>  <span class="c1">; =&gt; (:blue :red)</span>
</code></pre></div></div>

<p>Even though the initial property was read from the parent, the
assignment (<code class="language-plaintext highlighter-rouge">push</code>), like all assignments, actually occurred on <code class="language-plaintext highlighter-rouge">foo</code>.</p>

<h3 id="setters-and-getters">Setters and Getters</h3>

<p>Remember how I said that objects that don’t eventually inherit from @
will be broken? This is because properties are actually set and
accessed through <code class="language-plaintext highlighter-rouge">:set</code> and <code class="language-plaintext highlighter-rouge">:get</code> methods. That is, @ calls these
methods as needed. The @ object provides the default actions for
these. An interesting part of the @ code: initially setting <code class="language-plaintext highlighter-rouge">:set</code> on
@ is a circularity problem, so there’s a special bootstrap step to
accomplish it.</p>

<p>By providing your own you can fundamentally change how your object
works. For example, here’s an <code class="language-plaintext highlighter-rouge">@immutable</code> mix-in which prevents all
property assignments. It’s provided as part of @.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@immutable</span> <span class="p">(</span><span class="nv">@extend</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@immutable</span> <span class="ss">:set</span> <span class="p">(</span><span class="nv">property</span> <span class="nv">_value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">error</span> <span class="s">"Object is immutable, cannot set %s"</span> <span class="nv">property</span><span class="p">))</span>
</code></pre></div></div>

<p>This <code class="language-plaintext highlighter-rouge">:set</code> method will be found before the @ <code class="language-plaintext highlighter-rouge">:set</code> method, so it
gets overridden.</p>

<p>Remember how I said all objects have a <code class="language-plaintext highlighter-rouge">:proto</code> that can be used to
modify the object’s inheritance? This can be used to <em>freeze</em> an
object’s properties in place. Here’s a <code class="language-plaintext highlighter-rouge">:freeze</code> method for all
objects.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">@</span> <span class="ss">:freeze</span> <span class="p">()</span>
  <span class="s">"Make this object immutable."</span>
  <span class="p">(</span><span class="nb">push</span> <span class="nv">@immutable</span> <span class="nv">@:proto</span><span class="p">))</span>
</code></pre></div></div>

<p>Pretty cool, eh?</p>

<p>The <code class="language-plaintext highlighter-rouge">:get</code> method can be used to provide virtual properties.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@squares</span> <span class="p">(</span><span class="nv">@extend</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@squares</span> <span class="ss">:get</span> <span class="p">(</span><span class="nv">property</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">numberp</span> <span class="nv">property</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">expt</span> <span class="nv">property</span> <span class="mi">2</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">@^:get</span> <span class="nv">property</span><span class="p">)))</span>  <span class="c1">; explained in a moment</span>

<span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nv">@</span> <span class="nv">@squares</span> <span class="nv">n</span><span class="p">))</span> <span class="o">'</span><span class="p">(</span><span class="mi">0</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">))</span>
<span class="c1">; =&gt; (0 1 4 9 16)</span>
</code></pre></div></div>

<p>I use this technique in the <code class="language-plaintext highlighter-rouge">@vector</code> class under <code class="language-plaintext highlighter-rouge">lib/</code> to expose the
elements of the internal vector as if they were properties.
<a href="http://50ply.com/">Brian</a> used this trick to make a @buffer prototype that wraps
Emacs’ buffers, with methods provided virtually by <code class="language-plaintext highlighter-rouge">:get</code>. For
example, the <code class="language-plaintext highlighter-rouge">:string</code> property would return a lambda that calls
<code class="language-plaintext highlighter-rouge">buffer-string</code>.</p>

<p>With multiple-inheritance and these setters and getters, there are a
lot of interesting mix-in possibilities. I’m only just discovering
some of them now.</p>

<h3 id="supermethods">Supermethods</h3>

<p>Sometimes it’s really useful to call supermethods. There’s syntax for
this: @^:. This calls the next method of that name in the prototype
chain. For example, here’s a <code class="language-plaintext highlighter-rouge">@watchable</code> mix-in (also provided by @)
that allows other code to be notified of changes to an object. It
needs to override <code class="language-plaintext highlighter-rouge">:set</code> but still call the original <code class="language-plaintext highlighter-rouge">:set</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">@watchable</span> <span class="p">(</span><span class="nv">@extend</span> <span class="ss">:watchers</span> <span class="no">nil</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@watchable</span> <span class="ss">:watch</span> <span class="p">(</span><span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">push</span> <span class="nv">callback</span> <span class="nv">@:watchers</span><span class="p">))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@watchable</span> <span class="ss">:unwatch</span> <span class="p">(</span><span class="nv">callback</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">@:watchers</span> <span class="p">(</span><span class="nb">remove</span> <span class="nv">callback</span> <span class="nv">@:watchers</span><span class="p">)))</span>

<span class="p">(</span><span class="nv">def@</span> <span class="nv">@watchable</span> <span class="ss">:set</span> <span class="p">(</span><span class="nv">property</span> <span class="nv">new</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">callback</span> <span class="nv">@:watchers</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">callback</span> <span class="nv">@@</span> <span class="nv">property</span> <span class="nv">new</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">@^:set</span> <span class="nv">property</span> <span class="nv">new</span><span class="p">))</span>
</code></pre></div></div>

<p>This behavior is also used for constructors. By convention, the
<code class="language-plaintext highlighter-rouge">:init</code> method is the constructor. It should generally call the next
constructor with <code class="language-plaintext highlighter-rouge">(@^:init)</code>. @ has a no-op, no-argument <code class="language-plaintext highlighter-rouge">:init</code>
method to bottom-out this process.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">def@</span> <span class="nv">@rectangle</span> <span class="ss">:init</span> <span class="p">(</span><span class="nv">width</span> <span class="nv">height</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">@^:init</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">@:width</span> <span class="nv">width</span> <span class="nv">@:height</span> <span class="nv">height</span><span class="p">))</span>

<span class="p">(</span><span class="nv">@!</span> <span class="p">(</span><span class="nv">@!</span> <span class="nv">@rectangle</span> <span class="ss">:new</span> <span class="mf">13.2</span> <span class="mf">2.1</span><span class="p">)</span> <span class="ss">:area</span><span class="p">)</span> <span class="c1">; =&gt; 27.72</span>
</code></pre></div></div>

<p>As shown, the <code class="language-plaintext highlighter-rouge">:new</code> method provided by the @ object combines both
<code class="language-plaintext highlighter-rouge">@extend</code> and <code class="language-plaintext highlighter-rouge">:init</code> to provide simple single-object inheritance.</p>

<h3 id="the-cost-of-">The Cost of @</h3>

<p>In the lib/ directory a bunch of example objects are implemented,
including @vector, @queue, @stack, and @heap. I found
these to be very enjoyable to write, and they’ve been the testing
grounds for @. @heap uses an internal @vector instance and exercises
@’s features the most.</p>

<p>The performance cost of @ is very apparent with @heap. Even byte-compiled,
it’s slower than the naive implementation (compose <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">sort</code>)
even with as many as 1,000 elements. While I think @ leads to elegant
code, there’s still plenty to do for performance. It’s comically slow.</p>
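<p>For a sense of the baseline, composing <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">sort</code> might look like the sketch below. This is one reading of "naive implementation", not the code that was benchmarked, and every <code class="language-plaintext highlighter-rouge">my-</code> name is hypothetical.</p>

```elisp
;; Hypothetical naive priority queue: a plain list re-sorted on insert.
(defvar my-naive-heap nil
  "List of numbers, largest first.")

(defun my-naive-heap-add (n)
  "Insert N by pushing it and re-sorting the whole list."
  (push n my-naive-heap)
  ;; `sort' is destructive on lists, so keep the returned value.
  (setq my-naive-heap (sort my-naive-heap #'>)))

(defun my-naive-heap-next ()
  "Remove and return the largest element, or nil when empty."
  (pop my-naive-heap))

(dolist (n '(3 9 1 7))
  (my-naive-heap-add n))
(my-naive-heap-next)  ; => 9
(my-naive-heap-next)  ; => 7
```

<p>Each insertion re-sorts the entire list, so this is far from clever, which makes it a telling yardstick.</p>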

<p>This really caught Brian’s interest, because it was an opportunity to
put on his programming language designer’s hat — which I believe to
be his favorite hat. He’s been trying different caching strategies to
reduce all the walking of the prototype chain. This effort can be
found in the other repository branches and in his fork. The system is
so dynamic that cache invalidation is a really complex problem.</p>

<p>Every time a property is set, @ has to find the <code class="language-plaintext highlighter-rouge">:set</code> property for
that object, which generally means walking all the way up to @.
Because <code class="language-plaintext highlighter-rouge">:proto</code> can be modified at any time, every property look-up
requires computing the precedence order (lazily). This all makes
property assignment quite expensive! I can understand why real object
systems aren’t this flexible. It comes at a high price.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Fast Monte Carlo Method with JavaScript</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/02/25/"/>
    <id>urn:uuid:0208230e-3f57-334e-5d57-7a18f3794288</id>
    <updated>2013-02-25T00:00:00Z</updated>
    <category term="emacs"/><category term="lisp"/><category term="c"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<blockquote>
  <p>How many times should a random number from <code class="language-plaintext highlighter-rouge">[0, 1]</code> be drawn to have
it sum over 1?</p>
</blockquote>

<p>If you want to figure it out for yourself, stop reading now and come
back when you’re done.</p>

<p><a href="http://bayesianthink.blogspot.com/2013/02/the-expected-draws-to-sum-over-one.html">The answer</a> is <em>e</em>. When I came across this question I took
the lazy programmer route and, rather than work out the math, I
estimated the answer using the Monte Carlo method. I used the language
I always use for these scratchpad computations: Emacs Lisp. All I need
to do is switch to the <code class="language-plaintext highlighter-rouge">*scratch*</code> buffer and start hacking. No
external program needed.</p>

<p>The downside is that Elisp is incredibly slow. Fortunately, Elisp is
so similar to Common Lisp that porting to it is almost trivial. My
preferred Common Lisp implementation, SBCL, is very, very fast so it’s
a huge speed upgrade with little cost, should I need it. As far as I
know, SBCL is the fastest Common Lisp implementation.</p>

<p>Even though Elisp was fast enough to determine that the answer is
probably <em>e</em>, I wanted to play around with it. This little test
program doubles as a way to estimate the value of <em>e</em>,
<a href="http://math.fullerton.edu/mathews/n2003/montecarlopimod.html">similar to estimating <em>pi</em></a>. The more trial runs I give it the
more accurate my answer will get — to a point.</p>
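<p>A scratch-buffer version in Emacs Lisp could be as small as the following sketch. It is a reconstruction under assumptions, not the original code: the <code class="language-plaintext highlighter-rouge">my-</code> names are hypothetical, and it relies on <code class="language-plaintext highlighter-rouge">cl-random</code> from <code class="language-plaintext highlighter-rouge">cl-lib</code> for uniform floats.</p>

```elisp
(require 'cl-lib)

;; Sketch only; the function names are hypothetical.
(defun my-trial ()
  "Count the uniform draws from [0, 1) needed for their sum to exceed 1."
  (let ((count 0)
        (total 0.0))
    (while (not (> total 1.0))
      (setq total (+ total (cl-random 1.0))
            count (1+ count)))
    count))

(defun my-monte-carlo (n)
  "Average `my-trial' over N runs; the result approaches e as N grows."
  (let ((total 0))
    (dotimes (_ n)
      (setq total (+ total (my-trial))))
    ;; Convert N to a float so the division is not truncated.
    (/ total (float n))))

(my-monte-carlo 100000)  ; roughly 2.72; the exact value varies per run
```

<p>A single trial can never return less than 2, since one draw from [0, 1) cannot exceed 1 on its own.</p>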

<p>Here’s the Common Lisp version. (I love the loop macro, obviously.)</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">trial</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">loop</span> <span class="nv">for</span> <span class="nb">count</span> <span class="nv">upfrom</span> <span class="mi">1</span>
     <span class="nv">sum</span> <span class="p">(</span><span class="nb">random</span> <span class="mf">1.0</span><span class="p">)</span> <span class="nv">into</span> <span class="nv">total</span>
     <span class="nv">until</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="nv">total</span> <span class="mi">1</span><span class="p">)</span>
     <span class="nv">finally</span> <span class="p">(</span><span class="nb">return</span> <span class="nb">count</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">monte-carlo</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">loop</span> <span class="nv">repeat</span> <span class="nv">n</span>
     <span class="nv">sum</span> <span class="p">(</span><span class="nv">trial</span><span class="p">)</span> <span class="nv">into</span> <span class="nv">total</span>
     <span class="nv">finally</span> <span class="p">(</span><span class="nb">return</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">total</span> <span class="mf">1.0</span> <span class="nv">n</span><span class="p">))))</span>
</code></pre></div></div>

<p>Using SBCL 1.0.57.0.debian on an Intel Core i7-2600 CPU, once
everything’s warmed up this takes about 9.4 seconds with 100 million
trials.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(time (monte-carlo 100000000))
Evaluation took:
  9.423 seconds of real time
  9.388587 seconds of total run time (9.380586 user, 0.008001 system)
  99.64% CPU
  31,965,834,356 processor cycles
  99,008 bytes consed
2.7185063
</code></pre></div></div>

<p>Since this makes for an interesting benchmark I gave it a whirl in
JavaScript.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">trial</span><span class="p">()</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="nx">sum</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">sum</span> <span class="o">+=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">();</span>
        <span class="nx">count</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nx">count</span><span class="p">;</span>
<span class="p">}</span>

<span class="kd">function</span> <span class="nx">monteCarlo</span><span class="p">(</span><span class="nx">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">total</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">n</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="nx">total</span> <span class="o">+=</span> <span class="nx">trial</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nx">total</span> <span class="o">/</span> <span class="nx">n</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I ran this on Chromium 24.0.1312.68 Debian 7.0 (180326) which uses V8,
currently the fastest JavaScript engine. With 100 million trials,
<strong>this only took about 2.7 seconds</strong>!</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">monteCarlo</span><span class="p">(</span><span class="mi">100000000</span><span class="p">);</span> <span class="c1">// ~2.7 seconds, according to Skewer</span>
<span class="c1">// =&gt; 2.71850356</span>
</code></pre></div></div>

<p>Whoa! It beat SBCL! I was shocked. Let’s try using C as a
baseline. Surely C will be the fastest.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">trial</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">double</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">sum</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">sum</span> <span class="o">+=</span> <span class="n">rand</span><span class="p">()</span> <span class="o">/</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span> <span class="n">RAND_MAX</span><span class="p">;</span>
        <span class="n">count</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">count</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">double</span> <span class="nf">monteCarlo</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">total</span> <span class="o">+=</span> <span class="n">trial</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">total</span> <span class="o">/</span> <span class="p">(</span><span class="kt">double</span><span class="p">)</span> <span class="n">n</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%f</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">monteCarlo</span><span class="p">(</span><span class="mi">100000000</span><span class="p">));</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I used the highest optimization setting on the compiler.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -ansi -W -Wall -Wextra -O3 temp.c
$ time ./a.out
2.718359

real	0m3.782s
user	0m3.760s
sys	0m0.000s
</code></pre></div></div>

<p>Incredible! <strong>JavaScript was faster than C!</strong> That was completely
unexpected.</p>

<h3 id="the-circumstances">The Circumstances</h3>

<p>Both the Common Lisp and C code could probably be carefully tweaked to
improve performance. In Common Lisp’s case I could attach type
information and turn down safety. For C I could use more compiler
flags to squeeze out a bit more performance. Then <em>maybe</em> they could
beat JavaScript.</p>

<p>In contrast, as far as I can tell the JavaScript code is already as
optimized as it can get. There just aren’t many knobs to tweak. Note
that minifying the code will make no difference, especially since I’m
not measuring the parsing time. Except for the functions themselves,
the variables are all local, so they are never “looked up” at
run-time. Their name length doesn’t matter. Remember, in JavaScript
<em>global</em> variables are expensive, because they’re (generally) hash
table lookups on the global object at run-time. For any decent
compiler, local variables are basically precomputed memory offsets —
very fast.</p>

<p>The function names themselves are global variables, but the V8
compiler appears to eliminate this cost (inlining?). Wrapping the
entire thing in another function, turning the two original functions
into local variables, makes no difference in performance.</p>

<p>While Common Lisp and C <em>may</em> be able to beat JavaScript if time is
invested in optimizing them — something to be done rarely — in a
casual implementation of this algorithm, JavaScript beats them both. I
find this really exciting.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>How to Make an Emacs Minor Mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/02/06/"/>
    <id>urn:uuid:064f3179-c778-3ed8-f75e-01be022f9751</id>
    <updated>2013-02-06T00:00:00Z</updated>
    <category term="emacs"/>
    <content type="html">
      <![CDATA[<p>An Emacs buffer always has one major mode and zero or more minor
modes. Major modes tend to be significant efforts, especially when it
comes to automatic indentation. In contrast, minor modes are often
simple, perhaps only overlaying a small keymap for additional
functionality. Creating a new minor mode is really easy: it’s just a
matter of understanding Emacs’ conventions.</p>

<p>Mode names should end in <code class="language-plaintext highlighter-rouge">-mode</code> and the command for toggling the mode
should have the same name. The keymap for the mode should be called
<em>mode</em><code class="language-plaintext highlighter-rouge">-map</code> and the mode’s toggle hook should be called
<em>mode</em><code class="language-plaintext highlighter-rouge">-hook</code>. Keep all of this in mind when picking a name for your
minor mode.</p>

<p>There are a number of other tedious issues that need to be taken into
account when manually building a minor mode. The good news is that no
one needs to worry about most of it! Lisp has macros for cutting down
on boilerplate code and so there’s a macro for this very purpose:
<code class="language-plaintext highlighter-rouge">define-minor-mode</code>. Here’s all it takes to make a new minor mode,
<code class="language-plaintext highlighter-rouge">foo-mode</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">define-minor-mode</span> <span class="nv">foo-mode</span>
  <span class="s">"Get your foos in the right places."</span><span class="p">)</span>
</code></pre></div></div>

<p>This creates a command <code class="language-plaintext highlighter-rouge">foo-mode</code> for toggling the minor mode and a
hook called <code class="language-plaintext highlighter-rouge">foo-mode-hook</code>. There’s a strange caveat about the hook:
it’s not immediately declared as a variable. My guess is that this is
some archaic optimization which now exists as bad design. The hook
function <code class="language-plaintext highlighter-rouge">add-hook</code> will create this variable lazily when needed and
the function <code class="language-plaintext highlighter-rouge">run-hooks</code> will ignore hook variables that don’t yet
exist, so it doesn’t get tripped up by this situation. So despite its
strange initial absence, the new minor mode <em>will</em> use this hook as
soon as functions are added to it.</p>
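
<p>As a minimal sketch of that behavior, a function can be added to the
hook right away. The function name <code class="language-plaintext highlighter-rouge">my-foo-mode-report</code> is hypothetical,
made up for illustration; <code class="language-plaintext highlighter-rouge">foo-mode</code> here is the state variable that
<code class="language-plaintext highlighter-rouge">define-minor-mode</code> defines alongside the command.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical hook function: report the new state of the mode.
(defun my-foo-mode-report ()
  (message "foo-mode is now %s" (if foo-mode "on" "off")))

;; add-hook creates foo-mode-hook if it does not exist yet.
(add-hook 'foo-mode-hook #'my-foo-mode-report)
</code></pre></div></div>

<p>Because the hook runs on every toggle, the function should be called
both when the mode is turned on and when it is turned off.</p>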

<h3 id="minor-mode-options">Minor Mode Options</h3>

<p>This mode doesn’t <em>do</em> anything yet. It doesn’t have its own keymap
and it doesn’t even show up in the modeline. It’s just a toggle and a
hook that’s run when the toggle is used. To add more to the mode,
<code class="language-plaintext highlighter-rouge">define-minor-mode</code> accepts a number of keywords. Here are the
important ones.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">:lighter</code>: the name, a string, to show in the modeline</li>
  <li><code class="language-plaintext highlighter-rouge">:keymap</code>: the mode’s keymap</li>
  <li><code class="language-plaintext highlighter-rouge">:global</code>: specifies if the minor mode is <em>global</em></li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">:lighter</code> option has one caveat: it’s concatenated to the rest of
the modeline without any delimiter. This means it needs to be prefixed
with a space. I think this is a mistake, but we’re stuck with it
probably forever. Otherwise this string should be kept short: there’s
generally not much room on the modeline.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">define-minor-mode</span> <span class="nv">foo-mode</span>
  <span class="s">"Get your foos in the right places."</span>
  <span class="ss">:lighter</span> <span class="s">" foo"</span><span class="p">)</span>
</code></pre></div></div>

<p>New, empty keymaps are created with <code class="language-plaintext highlighter-rouge">(make-keymap)</code> or
<code class="language-plaintext highlighter-rouge">(make-sparse-keymap)</code>. The latter is more efficient when the map will
contain a small number of keybindings, as is the case with most minor
modes. The fact that these separate functions exist is probably
another outdated, premature optimization. To avoid confusing others, I
recommend you use the one that matches your intended usage.</p>

<p>The keymap can be provided directly to <code class="language-plaintext highlighter-rouge">:keymap</code> and it will be bound
to <code class="language-plaintext highlighter-rouge">foo-mode-map</code> automatically. I could just put an empty keymap here
and define keys separately outside the <code class="language-plaintext highlighter-rouge">define-minor-mode</code>
declaration, but I like the idea of creating the whole map in one
expression.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">insert-foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"foo"</span><span class="p">))</span>

<span class="p">(</span><span class="nv">define-minor-mode</span> <span class="nv">foo-mode</span>
  <span class="s">"Get your foos in the right places."</span>
  <span class="ss">:lighter</span> <span class="s">" foo"</span>
  <span class="ss">:keymap</span> <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">map</span> <span class="p">(</span><span class="nv">make-sparse-keymap</span><span class="p">)))</span>
            <span class="p">(</span><span class="nv">define-key</span> <span class="nb">map</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-c f"</span><span class="p">)</span> <span class="ss">'insert-foo</span><span class="p">)</span>
            <span class="nb">map</span><span class="p">))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">:global</code> option means the minor mode is not local to a buffer,
it’s present everywhere. As far as I know, the only global minor mode
I’ve ever used is <a href="https://github.com/capitaomorte/yasnippet">YASnippet</a>.</p>
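
<p>For illustration, a global variant might look like the following
sketch. The name <code class="language-plaintext highlighter-rouge">foo-global-mode</code> is hypothetical and not part of the
mode being built here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical global variant: toggling it affects every buffer at once.
(define-minor-mode foo-global-mode
  "Get your foos in the right places, everywhere."
  :global t
  :lighter " FOO")
</code></pre></div></div>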

<h3 id="minor-mode-body">Minor Mode Body</h3>

<p>The rest of <code class="language-plaintext highlighter-rouge">define-minor-mode</code> is a body for arbitrary Lisp, like a
<code class="language-plaintext highlighter-rouge">defun</code>. It’s run every time the mode is toggled off or on, so it’s
like a built-in hook function. Use it to do any sort of special setup
or teardown, such as hooking or unhooking Emacs’ hooks. A likely thing to
be done in here is specifying <em>buffer-local variables</em>.</p>
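
<p>Here is a rough sketch of setup and teardown in the body, assuming a
hypothetical mode <code class="language-plaintext highlighter-rouge">foo-demo-mode</code> and helper <code class="language-plaintext highlighter-rouge">foo-demo-before-save</code>. By
the time the body runs, the mode variable already holds the new state,
so it can be used to tell the two cases apart.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical helper, run before each save while the mode is on.
(defun foo-demo-before-save ()
  (message "Saving a buffer with foo-demo-mode enabled"))

(define-minor-mode foo-demo-mode
  "Demonstrate setup and teardown in a minor mode body."
  :lighter " foo-demo"
  (if foo-demo-mode
      ;; Just turned on: install a buffer-local hook function.
      (add-hook 'before-save-hook #'foo-demo-before-save nil t)
    ;; Just turned off: remove it again.
    (remove-hook 'before-save-hook #'foo-demo-before-save t)))
</code></pre></div></div>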

<p>Any time the Emacs interpreter is evaluating an expression there’s
always a <em>current buffer</em> acting as context. Many functions that
operate on buffers don’t actually accept a buffer as an
argument. Instead they operate on the current buffer. Furthermore,
some variables are buffer-local: the binding is dynamic over the
current buffer. This is useful for maintaining state relevant only to
a particular buffer.</p>

<p>Side note: the <code class="language-plaintext highlighter-rouge">with-current-buffer</code> macro is used to specify a
different current buffer for a body of code. It can be used to access
another buffer’s local variables. Similarly, <code class="language-plaintext highlighter-rouge">with-temp-buffer</code> creates
a brand new buffer, uses it as the current buffer for its body, and
then destroys the buffer.</p>
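
<p>A small sketch of both macros follows. The buffer name <code class="language-plaintext highlighter-rouge">*foo-demo*</code>
is a made-up example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Evaluate the body with another buffer as the current buffer.
;; "*foo-demo*" is a hypothetical buffer, created if it doesn't exist.
(with-current-buffer (get-buffer-create "*foo-demo*")
  (buffer-name))

;; Do some scratch work in a throwaway buffer and return its contents,
;; here the string "foofoo". The buffer is killed afterwards.
(with-temp-buffer
  (insert "foo")
  (insert "foo")
  (buffer-string))
</code></pre></div></div>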

<p>For example, let’s say I want to keep track of how many times
<code class="language-plaintext highlighter-rouge">foo-mode</code> inserted “foo” into the current buffer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-count</span> <span class="mi">0</span>
  <span class="s">"Number of foos inserted into the current buffer."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">insert-foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">setq</span> <span class="nv">foo-count</span> <span class="p">(</span><span class="nb">1+</span> <span class="nv">foo-count</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"foo"</span><span class="p">))</span>

<span class="p">(</span><span class="nv">define-minor-mode</span> <span class="nv">foo-mode</span>
  <span class="s">"Get your foos in the right places."</span>
  <span class="ss">:lighter</span> <span class="s">" foo"</span>
  <span class="ss">:keymap</span> <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">map</span> <span class="p">(</span><span class="nv">make-sparse-keymap</span><span class="p">)))</span>
            <span class="p">(</span><span class="nv">define-key</span> <span class="nb">map</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-c f"</span><span class="p">)</span> <span class="ss">'insert-foo</span><span class="p">)</span>
            <span class="nb">map</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">make-local-variable</span> <span class="ss">'foo-count</span><span class="p">))</span>
</code></pre></div></div>

<p>The built-in function <code class="language-plaintext highlighter-rouge">make-local-variable</code> creates a new buffer-local
version of a global variable in the current buffer. Here, the
buffer-local <code class="language-plaintext highlighter-rouge">foo-count</code> will be initialized with the value 0 from the
global variable but all reassignments will only be visible in the
current buffer.</p>

<p>However, in this case it may be better to use
<code class="language-plaintext highlighter-rouge">make-variable-buffer-local</code> on the global variable and skip the
<code class="language-plaintext highlighter-rouge">make-local-variable</code>. The main reason is that I don’t want
<code class="language-plaintext highlighter-rouge">insert-foo</code> to clobber the global variable if it happens to be used
in a buffer that doesn’t have the minor mode enabled.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">make-variable-buffer-local</span>
 <span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-count</span> <span class="mi">0</span>
   <span class="s">"Number of foos inserted into the current buffer."</span><span class="p">))</span>
</code></pre></div></div>

<p>A big advantage is that this buffer-local intention for the variable
is documented globally. This message will appear in the variable’s
documentation.</p>

<blockquote>
  <p>Automatically becomes buffer-local when set in any fashion.</p>
</blockquote>

<p>Which method you use is up to your personal preference. The Emacs
documentation encourages the former but I think the latter is nicer
in many situations.</p>
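
<p>To see the automatic behavior in isolation, here is a small sketch
using a hypothetical variable <code class="language-plaintext highlighter-rouge">foo-demo-count</code>. Setting it inside a
temporary buffer leaves the global default untouched.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical variable, separate from foo-count above.
(make-variable-buffer-local
 (defvar foo-demo-count 0
   "Demo counter that becomes buffer-local when set."))

;; Returns (5 0): the temporary buffer sees 5, the default stays 0.
(with-temp-buffer
  (setq foo-demo-count 5)
  (list foo-demo-count (default-value 'foo-demo-count)))
</code></pre></div></div>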

<h3 id="automatically-enabling-the-minor-mode">Automatically Enabling the Minor Mode</h3>

<p>Some minor modes don’t have any particular major mode association and
the user will toggle them at will. Some minor modes only make sense when
used with a particular major mode and it might make sense to
automatically enable them along with that mode. This is done by hooking
that major mode’s hook. So long as the mode follows Emacs’ conventions
as mentioned at the top, this hook should be easy to find.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'text-mode-hook</span> <span class="ss">'foo-mode</span><span class="p">)</span>
</code></pre></div></div>

<p>Here, <code class="language-plaintext highlighter-rouge">foo-mode</code> will automatically be activated in all <code class="language-plaintext highlighter-rouge">text-mode</code>
buffers.</p>
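
<p>The same idea extends to several major modes, and it can be undone
with <code class="language-plaintext highlighter-rouge">remove-hook</code>. This is only a sketch; <code class="language-plaintext highlighter-rouge">prog-mode-hook</code> is chosen
here as an arbitrary second example.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Enable foo-mode in more than one major mode.
(dolist (hook '(text-mode-hook prog-mode-hook))
  (add-hook hook 'foo-mode))

;; Change of heart: stop enabling it in programming modes.
(remove-hook 'prog-mode-hook 'foo-mode)
</code></pre></div></div>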

<h3 id="full-code">Full Code</h3>

<p>Here’s the final code for our minor mode, saved to <code class="language-plaintext highlighter-rouge">foo-mode.el</code>. It
has one keybinding and it’s easily open for users to define more keys
in <code class="language-plaintext highlighter-rouge">foo-mode-map</code>. It also automatically activates when the user is
editing a plain text file.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">make-variable-buffer-local</span>
 <span class="p">(</span><span class="nb">defvar</span> <span class="nv">foo-count</span> <span class="mi">0</span>
   <span class="s">"Number of foos inserted into the current buffer."</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">insert-foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="k">setq</span> <span class="nv">foo-count</span> <span class="p">(</span><span class="nb">1+</span> <span class="nv">foo-count</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"foo"</span><span class="p">))</span>

<span class="c1">;;;###autoload</span>
<span class="p">(</span><span class="nv">define-minor-mode</span> <span class="nv">foo-mode</span>
  <span class="s">"Get your foos in the right places."</span>
  <span class="ss">:lighter</span> <span class="s">" foo"</span>
  <span class="ss">:keymap</span> <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nb">map</span> <span class="p">(</span><span class="nv">make-sparse-keymap</span><span class="p">)))</span>
            <span class="p">(</span><span class="nv">define-key</span> <span class="nb">map</span> <span class="p">(</span><span class="nv">kbd</span> <span class="s">"C-c f"</span><span class="p">)</span> <span class="ss">'insert-foo</span><span class="p">)</span>
            <span class="nb">map</span><span class="p">))</span>

<span class="c1">;;;###autoload</span>
<span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'text-mode-hook</span> <span class="ss">'foo-mode</span><span class="p">)</span>

<span class="p">(</span><span class="nb">provide</span> <span class="ss">'foo-mode</span><span class="p">)</span>
</code></pre></div></div>

<p>I added some autoload declarations and a <code class="language-plaintext highlighter-rouge">provide</code> in case this mode
is ever distributed or used as a package. If an autoloads script is
generated for this minor mode, a temporary function called <code class="language-plaintext highlighter-rouge">foo-mode</code>
will be defined whose sole purpose is to load the real <code class="language-plaintext highlighter-rouge">foo-mode.el</code>
and then call <code class="language-plaintext highlighter-rouge">foo-mode</code> again with its new definition, which was
loaded overtop the temporary definition.</p>

<p>The autoloads script also adds this temporary <code class="language-plaintext highlighter-rouge">foo-mode</code> function to
the <code class="language-plaintext highlighter-rouge">text-mode-hook</code>. If a <code class="language-plaintext highlighter-rouge">text-mode</code> buffer is created, the hook
will call <code class="language-plaintext highlighter-rouge">foo-mode</code> which will load <code class="language-plaintext highlighter-rouge">foo-mode.el</code>, redefining
<code class="language-plaintext highlighter-rouge">foo-mode</code> to its real definition, then activate <code class="language-plaintext highlighter-rouge">foo-mode</code>.</p>

<p>The point of autoloads is to defer loading code until it’s needed. You
may notice this as a short delay the first time you activate a mode
after starting Emacs. This is what keeps Emacs’ start time reasonable
despite having millions of lines of Elisp virtually loaded at startup.</p>
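
<p>Roughly speaking, the entry generated for the mode resembles a call to
the built-in <code class="language-plaintext highlighter-rouge">autoload</code> function. The following is a hand-written
approximation, not the exact output of the autoloads generator.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Register foo-mode as an interactive command that lives in foo-mode.el.
;; The file itself is only loaded the first time foo-mode is called.
(autoload 'foo-mode "foo-mode"
  "Get your foos in the right places."
  t)
</code></pre></div></div>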

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Javadoc Lookups Get a Facelift</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/30/"/>
    <id>urn:uuid:18159666-2379-359b-35dd-edf5c4bdbf60</id>
    <updated>2013-01-30T00:00:00Z</updated>
    <category term="emacs"/><category term="java"/><category term="clojure"/>
    <content type="html">
      <![CDATA[<p>Ever since
<a href="/blog/2012/08/12/">I started using the Emacs package archive</a>,
specifically <a href="http://melpa.milkbox.net/">MELPA</a>, I’d been wanting to tidy up
<a href="/blog/2010/10/14/">my Emacs Java extensions</a>, java-mode-plus, into a
nice, official package. Observing my own attitude after the switch, I
noticed that if a package isn’t available on ELPA or MELPA, it
practically doesn’t exist for me. Manually installing anything now
seems like so much trouble in comparison, and getting a package on
MELPA is so easy that there’s little excuse for package authors not to
have their package in at least one of the three major Elisp
archives. This is exactly the attitude my own un-archived package
would be facing from other people, and rightfully so.</p>

<p>Before I dive in, this is what the user configuration now looks like,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">javadoc-add-artifacts</span> <span class="nv">[org.lwjgl.lwjgl</span> <span class="nv">lwjgl</span> <span class="s">"2.8.2"</span><span class="nv">]</span>
                       <span class="nv">[com.nullprogram</span> <span class="nv">native-guide</span> <span class="s">"0.2"</span><span class="nv">]</span>
                       <span class="nv">[org.apache.commons</span> <span class="nv">commons-math3</span> <span class="s">"3.0"</span><span class="nv">]</span><span class="p">)</span>
</code></pre></div></div>

<p>That’s right: it knows how to find, fetch, and index documentation on
its own. Keep reading if this sounds useful to you.</p>

<h3 id="the-problem">The Problem</h3>

<p>The problem was that java-mode-plus was doing two unrelated things:</p>

<ul>
  <li>
    <p>Supporting Ant-oriented Java projects. Not
 <a href="http://kent.spillner.org/blog/work/2009/11/14/java-build-tools.html">being a fan of Maven</a>, I’ve used Ant for all of my own
 personal projects. (However, I really do like the Maven
 infrastructure, so I use Apache Ivy.) It seems Maven is a lot more
 popular, so this part isn’t useful for many people.</p>
  </li>
  <li>
    <p>Quick Javadoc referencing, which I was calling java-docs. I think
 this is generally useful for anyone writing Java in Emacs, even if
 they’re using another suite like JDEE or writing in another JVM
 language. It would be nice for people to be able to use this without
 pulling in all of java-mode-plus — which was somewhat intrusive.</p>
  </li>
</ul>

<p>I also didn’t like the names I had picked. java-mode-plus wasn’t even
a mode until recently and its name isn’t conventional. And “java-docs”
is just stupid. I recently solved all this by splitting the
java-mode-plus into two new packages,</p>

<ul>
  <li>
    <p><a href="https://github.com/skeeto/ant-project-mode"><em>ant-project-mode</em></a> — A minor mode that
 performs the duties of the first task above. Since I’ve
 <a href="/blog/2012/08/12/">phased Java out</a> from my own personal projects
 and no longer intend to write Java anymore, this part isn’t very
 useful to me personally at the moment. If I do need to write Java for
 work again I’ll probably dust this off. It’s by no means
 un-maintained, it’s just in maintenance mode for now. Because of
 this, it is not in any Emacs package archive.</p>
  </li>
  <li>
    <p><a href="https://github.com/skeeto/javadoc-lookup"><em>javadoc-lookup</em></a> — This is java-docs renamed and
 <strong>with some new goodies!</strong> I also <strong>put this on MELPA</strong>, where it’s
 easy for anyone to use. This continues to be useful for me as I
 use Clojure.</p>
  </li>
</ul>

<h3 id="javadoc-lookup">javadoc-lookup</h3>

<p>This is used like java-docs before it, just under a different
name. The function <code class="language-plaintext highlighter-rouge">javadoc-lookup</code> asks for a Java class for
documentation. I like to bind this to <code class="language-plaintext highlighter-rouge">C-h j</code>.</p>
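
<p>That binding can be set up globally with one line in the Emacs
initialization file:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Make C-h j prompt for a Java class and open its documentation.
(global-set-key (kbd "C-h j") 'javadoc-lookup)
</code></pre></div></div>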

<p>The function <code class="language-plaintext highlighter-rouge">javadoc-add-roots</code> provides filesystem paths to be
indexed for lookup.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">javadoc-add-roots</span> <span class="s">"/usr/share/doc/openjdk-6-jdk/api"</span>
                   <span class="s">"~/src/project/doc"</span><span class="p">)</span>
</code></pre></div></div>

<p>Also, as before, if you don’t provide a root for the core Java API, it
will automatically load an index of the official Javadoc hosted
online. This means it can be installed from MELPA and used immediately
without any configuration. Good defaults and minimal required
configuration <a href="/blog/2012/10/31/">is something I highly value</a>.</p>

<p>Back in the java-docs days, when I started using a new library I’d
track down the Javadoc jar, unzip it somewhere on my machine, and add
it to be indexed. I regularly do development on four different
computers, so this gets tedious fast. Since the Javadoc jars are
easily available from the Maven repository, I maintained a small Ant
project within my .emacs.d for a while just to do this fetching, but it
was a dirty hack.</p>

<h4 id="finally-the-goodies">Finally, the Goodies</h4>

<p>Here’s the cool new part: I built this functionality into
javadoc-lookup. <strong>It can fetch all your documentation for you!</strong>
Instead of providing a path on your filesystem, you name an artifact
that Maven can find. javadoc-lookup will call Maven to fetch the
Javadoc jar, unzip it into a cache directory, and index it for
lookups. You will need Maven installed either on your <code class="language-plaintext highlighter-rouge">$PATH</code> or at
<code class="language-plaintext highlighter-rouge">maven-program-name</code> (Elisp variable).</p>
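
<p>If Maven lives somewhere unusual, the variable can be pointed at it
directly. The path below is a made-up example and will differ from
system to system.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical install location; adjust to wherever mvn actually is.
(setq maven-program-name "/opt/maven/bin/mvn")
</code></pre></div></div>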

<p>Here’s a sample configuration. Each entry is a group, artifact, and version provided
as a sequence. I say “sequence” because it can be either a list or a
vector and those names can be either strings or symbols. I prefer the
vector/symbol method because it requires
<a href="/blog/2012/07/17/">the least quoting</a>, plus it looks Clojure-ish.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">javadoc-add-artifacts</span> <span class="nv">[org.lwjgl.lwjgl</span> <span class="nv">lwjgl</span> <span class="s">"2.8.2"</span><span class="nv">]</span>
                       <span class="nv">[com.nullprogram</span> <span class="nv">native-guide</span> <span class="s">"0.2"</span><span class="nv">]</span>
                       <span class="nv">[org.apache.commons</span> <span class="nv">commons-math3</span> <span class="s">"3.0"</span><span class="nv">]</span><span class="p">)</span>
</code></pre></div></div>
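
<p>For comparison, the list and string form of one of those entries
might look like this. It’s a sketch that assumes
<code class="language-plaintext highlighter-rouge">javadoc-add-artifacts</code> evaluates its arguments like an ordinary
function, which is why the list needs a quote and the vector doesn’t.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Same artifact as above, as a quoted list of strings.
(javadoc-add-artifacts '("org.apache.commons" "commons-math3" "3.0"))
</code></pre></div></div>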

<p>Put that in your initialization and all this documentation will appear
in the lookup index. It only needs to fetch from Maven once per
artifact per system — a very very slow process. After that it
operates entirely from its own cache which is very fast, so it won’t
slow down your startup.</p>

<p>This has been extremely convenient for me so I hope other people find
it useful, too.</p>

<p>As a final note, javadoc-lookup also exploits structural sharing in
its tables, using a lot less memory than java-docs. Not that it was a
problem before; it’s a feel-good feature.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Live CSS Interaction with Skewer</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/24/"/>
    <id>urn:uuid:92c8a519-1e4c-374b-7f90-37b1dadfc862</id>
    <updated>2013-01-24T00:00:00Z</updated>
    <category term="emacs"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>This evening <a href="/blog/2012/10/31/">Skewer</a> gained support for live CSS.
When editing CSS code, you can send your rules and declarations from
the editing buffer to be applied in the open page in the browser. It
makes experimenting with CSS really, really easy. The functionality is
exposed through the familiar interaction keybindings, so if you’re
already familiar with other Emacs interaction modes
(<a href="/blog/2010/01/15/">SLIME</a>, <a href="/blog/2013/01/07/">nREPL</a>, Skewer,
<a href="http://www.nongnu.org/geiser/">Geiser</a>, Emacs Lisp), this should feel right at home.</p>

<p>To provide the keybindings in css-mode there’s a new minor mode,
skewer-css-mode. CSS “expressions” are sent to the browser through the
communication channel already provided by Skewer. It’s essentially an
extension to Skewer: it could have been created without making any
changes to Skewer itself.</p>

<p>Unfortunately Emacs’ css-mode is nowhere near as sophisticated as
js2-mode — which reads in and exposes a full JavaScript AST. I had to
write my own very primitive CSS parsing routines to tease things
apart. It should generally be able to parse declarations and rules
reasonably no matter how it’s indented, but it’s not very good at
navigating <em>around</em> comments, especially when they contain CSS
syntax. If I find a way to parse CSS more easily sometime I’ll see
about fixing it, but it’s plenty good enough for now.</p>

<p>To “evaluate” the CSS, the code is simply dropped into the page as a
new <code class="language-plaintext highlighter-rouge">&lt;style&gt;</code> tag. I had considered other approaches, but this seemed
to be by far the simplest way to support arbitrary selectors and
shorthand properties. The more programmatic approaches would require
re-writing something that the browser already does.</p>

<p>The consequence of this is that every “evaluation” adds a new
<code class="language-plaintext highlighter-rouge">&lt;style&gt;</code> tag to the page, which adds more and more load to style
computation, most of which completely mask each other. Since there’s
no way to tell when a particular <code class="language-plaintext highlighter-rouge">&lt;style&gt;</code> tag has been completely
masked I can’t remove any of them from the page. That might revert a
declaration that’s still in use. I haven’t seen it happen yet but I
wonder if it’s possible to run into browser problems during extended
CSS interaction, when thousands of stylesheets have built up on a
single page. Time will tell.</p>

<p>Just before doing all this, I added full support for Cross-Origin
Resource Sharing (CORS), which means <em>any</em> page from any server can be
skewered, not just pages hosted by Emacs itself … as long as you can
get skewer.js in the page as a script. To help with that, I wrote a
<a href="https://github.com/skeeto/skewer-mode/blob/master/skewer-everything.user.js">Greasemonkey userscript</a> that can automatically skewer any
visited page. I can now manipulate from Emacs the JavaScript and CSS
of <em>any</em> page I visit in my browser. It feels really powerful. I
already have a good use for this at work right now.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>The Limits of Emacs Advice</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/22/"/>
    <id>urn:uuid:1cfcdeee-19e5-33b8-1344-8fef68333e41</id>
    <updated>2013-01-22T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Today at work I was using <a href="http://www.50ply.com/blog/2012/08/13/introducing-impatient-mode/">impatient-mode</a> to share some code
with <a href="http://www.50ply.com/">Brian</a>. It makes for a really handy live pastebin. To
limit the buffer to the relevant code, I narrowed it down with
<code class="language-plaintext highlighter-rouge">narrow-to-region</code>. However, the browser wouldn’t update to show only
the narrowed region until I made an edit. This makes sense because
impatient-mode hooks <code class="language-plaintext highlighter-rouge">after-change-functions</code>.  Narrowing the buffer
doesn’t <em>change</em> anything in the buffer, so, as expected, this hook is
not called.</p>

<p>The solution would be to also join whatever hook is called when the
buffer restriction changes. Unfortunately,
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Standard-Hooks.html">no such hook exists</a>. I thought I could create this hook with
some <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Advising-Functions.html">advice</a>, but this turns out to be currently impossible.</p>

<h3 id="emacs-advice">Emacs Advice</h3>

<p>What’s advice? It’s a handy feature of Emacs Lisp that allows users to
modify the behavior of almost any function without having to redefine
it. It works a little bit like methods in the Common Lisp Object
System (CLOS): advice is code that can be evaluated before, after, or
around a function.</p>

<p>Advice is defined with <code class="language-plaintext highlighter-rouge">defadvice</code>. Duh. For example, say we wanted to
be silly and have Emacs say “Ouch!” when a line is killed with
<code class="language-plaintext highlighter-rouge">kill-line</code>. We can advise this function to display a message.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">kill-line</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">say-ouch</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"Ouch!"</span><span class="p">))</span>
</code></pre></div></div>

<p>This says we want to advise the function <code class="language-plaintext highlighter-rouge">kill-line</code>, we want this
advice to execute <em>after</em> <code class="language-plaintext highlighter-rouge">kill-line</code> has run, our advice is named
“<code class="language-plaintext highlighter-rouge">say-ouch</code>”, and we want to immediately activate this advice so it
gets used right away. The rest is the body of the advice, like the
body of a function. After evaluating this <code class="language-plaintext highlighter-rouge">defadvice</code>, every time I
hit <code class="language-plaintext highlighter-rouge">C-k</code> Emacs says “Ouch!” in the minibuffer. Cool!</p>
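
<p>For readers on a newer Emacs (24.4 or later), roughly the same effect
can be sketched with the <code class="language-plaintext highlighter-rouge">advice-add</code> interface, which has the benefit
of being easy to undo. This is an alternative API, not the one used in
the rest of this article, and <code class="language-plaintext highlighter-rouge">my-say-ouch</code> is a hypothetical name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; The advice function receives the same arguments as kill-line,
;; which are ignored here.
(defun my-say-ouch (&amp;rest _args)
  (message "Ouch!"))

;; Run my-say-ouch after every call to kill-line.
(advice-add 'kill-line :after #'my-say-ouch)

;; Remove the advice again when the joke gets old.
(advice-remove 'kill-line #'my-say-ouch)
</code></pre></div></div>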

<h3 id="narrow-to-region-and-widen">narrow-to-region and widen</h3>

<p>A hook is a variable that holds a list of functions. (Or maybe hooks
are the functions in this list? Emacs’ documentation calls both of
these things hooks.) These functions are called, usually without
arguments, when some specific event occurs. For example, every mode
has its own mode hook which is called when the mode is activated in a
buffer. This allows users to extend or modify the mode — like by
enabling additional minor modes — without editing the mode’s source
code directly.</p>
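
<p>Stripped to its essentials, the mechanism can be sketched like this.
The hook name <code class="language-plaintext highlighter-rouge">foo-demo-event-hook</code> is hypothetical and exists only for
this illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; A hook is just a variable holding a list of functions.
(defvar foo-demo-event-hook nil
  "Functions to call when the (imaginary) foo-demo event occurs.")

;; Users register functions on the hook.
(add-hook 'foo-demo-event-hook
          (lambda () (message "foo-demo event occurred")))

;; Whoever owns the event calls each function on the hook in turn.
(run-hooks 'foo-demo-event-hook)
</code></pre></div></div>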

<p>To make our hook work we need to advise <code class="language-plaintext highlighter-rouge">narrow-to-region</code> and <code class="language-plaintext highlighter-rouge">widen</code>
to run the hook after they’ve done their work. These are the primitive
narrowing functions which all the other narrowing functions eventually
call, like <code class="language-plaintext highlighter-rouge">narrow-to-defun</code>, <code class="language-plaintext highlighter-rouge">narrow-to-page</code>, and any other
mode-specific narrowing. <strong>Advising these two functions will cover all
buffer narrowing.</strong> It <em>should</em> be this simple.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">change-restriction-hook</span> <span class="p">())</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">narrow-to-region</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'change-restriction-hook</span><span class="p">))</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">widen</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">hook</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'change-restriction-hook</span><span class="p">))</span>
</code></pre></div></div>

<p>At first this seems to work. I can add a test hook and see it activate
when I use <code class="language-plaintext highlighter-rouge">M-x narrow-to-region</code> and <code class="language-plaintext highlighter-rouge">M-x widen</code>. However, when I use
other narrowing functions, like <code class="language-plaintext highlighter-rouge">narrow-to-defun</code>, my hook functions
aren’t called.</p>

<p>Is there a narrowing primitive I missed? I check the source
code. Nope, these are lisp functions which ultimately call
<code class="language-plaintext highlighter-rouge">narrow-to-region</code>. Is the advice not getting used when called
indirectly? I test that out.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">narrow-to-region</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div></div>

<p>This works fine. Hmmm, these other functions are byte-compiled, maybe
that’s the problem.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">byte-compile</span> <span class="ss">'foo</span><span class="p">)</span>
</code></pre></div></div>

<p>Bingo. The advice has stopped working. It has something to do with
byte-compilation.</p>

<h3 id="bytecode">Bytecode</h3>

<p>Let’s take a look at the bytecode for <code class="language-plaintext highlighter-rouge">foo</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="c1">;; =&gt; #[nil "\300\301}\207" [1 2] 2 nil nil]</span>
</code></pre></div></div>

<p>I don’t know too much about Emacs’ byte code, but here’s the gist of
it. A compiled function is a special type of vector (hence the <code class="language-plaintext highlighter-rouge">#[]</code>
form). This is a legal s-expression which you can use directly in
regular Elisp code just like it was a function. The only reason you’d
do so is for obfuscation, so it would look very suspicious.</p>

<p>The first element of this function vector is the parameter list,
which is empty in this case. The second is a string containing the
actual bytecodes. The third is a vector holding the constants from the
function body, including the symbols of other functions called by this
function. (The remaining elements are the maximum stack depth, the
docstring, and the interactive specification.) It’s important to note
that <strong><code class="language-plaintext highlighter-rouge">narrow-to-region</code> does not
appear in this list</strong>!</p>
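
<p>Rather than reading the printed vector by eye, the same check can be
done programmatically. This is only a sketch: it uses a throwaway
function with the made-up name <code class="language-plaintext highlighter-rouge">my-foo</code> so it doesn’t clobber
<code class="language-plaintext highlighter-rouge">foo</code>, and it assumes the compiler still gives <code class="language-plaintext highlighter-rouge">narrow-to-region</code>
its own opcode in your Emacs version.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical stand-in for foo.
(defun my-foo ()
  (interactive)
  (narrow-to-region 1 2))

(byte-compile 'my-foo)

;; Index 2 of a byte-code function object is its constants vector.
;; Convert it to a list and search it for the symbol. If the call was
;; compiled into a dedicated opcode, the symbol is absent and this
;; evaluates to nil.
(memq 'narrow-to-region
      (append (aref (symbol-function 'my-foo) 2) nil))
</code></pre></div></div>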

<p>Curious. Let’s take a closer look at the bytecode.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">coerce</span> <span class="p">(</span><span class="nb">aref</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'foo</span><span class="p">)</span> <span class="mi">1</span><span class="p">)</span> <span class="ss">'list</span><span class="p">)</span>
<span class="c1">;; =&gt; (192 193 125 135)</span>
</code></pre></div></div>

<p>Looking at <code class="language-plaintext highlighter-rouge">bytecomp.el</code> from the Emacs distribution I can see that
codes 192 and 193 are used for accessing constants. This pushes my
constants 1 and 2 onto a stack for use as function arguments. Next up
is 125, which corresponds to <code class="language-plaintext highlighter-rouge">byte-narrow-to-region</code>. Gotcha!</p>

<p>It turns out <code class="language-plaintext highlighter-rouge">narrow-to-region</code> is so special — probably because it’s
used very frequently — that it gets its own bytecode. The <strong>primitive
function call is being compiled away into a single instruction</strong>. This
means my advice will not be considered in byte-compiled code. Darnit.
The same is true for <code class="language-plaintext highlighter-rouge">widen</code> (code 126).</p>

<h3 id="where-to-go-now">Where to go now?</h3>

<p>Since it’s not possible to hook or advise the buffer-narrowing
primitives, impatient-mode would need to hook some other event that
tends to happen at the same time. Perhaps any time a command is
executed in the current buffer it could check for changes to the
buffer restriction and, if so, update any attached web clients. I’ll
figure something out.</p>
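
<p>To make that idea concrete, here’s a rough sketch of the
check-after-each-command approach. It’s one possible shape, not
necessarily what impatient-mode will end up doing, and every
<code class="language-plaintext highlighter-rouge">my-</code> name is hypothetical. It records how many characters are
hidden before and after the restriction, so ordinary typing inside the
visible region doesn’t look like a narrowing change.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defvar change-restriction-hook nil)  ; the same hook variable as above

(defvar-local my-last-restriction nil
  "Hypothetical: the restriction seen at the previous check.")

(defun my-restriction-state ()
  "Return (HIDDEN-BEFORE . HIDDEN-AFTER) as character counts."
  (cons (1- (point-min))
        (- (1+ (buffer-size)) (point-max))))

(defun my-check-restriction ()
  "Run `change-restriction-hook' if the restriction has changed."
  (let ((state (my-restriction-state)))
    (unless (equal state my-last-restriction)
      (setq my-last-restriction state)
      (run-hooks 'change-restriction-hook))))

;; Check after every command. Note that the very first check in each
;; buffer also fires the hook, since nothing has been recorded yet.
(add-hook 'post-command-hook #'my-check-restriction)
</code></pre></div></div>

<p>The obvious cost is that this runs after every command in every
buffer, and it only notices a change once the command has finished.</p>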

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Turning Asynchronous into Synchronous in Elisp</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/14/"/>
    <id>urn:uuid:61c72b82-371c-304f-7e0e-f5ea4a990ef3</id>
    <updated>2013-01-14T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>As a <a href="/blog/2013/01/07/">new user of nREPL</a> I was poking around
nrepl.el, seeing what sorts of Elisp tricks I could learn. Even though
it was written 6 months before Skewer, and I was completely unaware of
nREPL’s existence until two weeks ago, there’s a lot of similarity
between nrepl.el and <a href="/blog/2012/10/31/">Skewer</a>. Due to serving the
same purpose for different platforms, this isn’t very surprising.</p>

<p>In particular, Skewer has <code class="language-plaintext highlighter-rouge">skewer-eval</code> for sending a string to the
browser for evaluation. Like JavaScript, Emacs Lisp is
single-threaded: there’s only one execution context at a time and it
has to return to the top-level before a new context can execute. There
are no continuations or coroutines. <code class="language-plaintext highlighter-rouge">skewer-eval</code> requires
coordination with an external process (the browser) making it
inherently asynchronous. So as a second, optional argument, a callback
can be provided for receiving the result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Echo the result in the minibuffer.</span>
<span class="p">(</span><span class="nv">skewer-eval</span> <span class="s">"Math.pow(2.1, 3.1)"</span>
             <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">r</span><span class="p">)</span> <span class="p">(</span><span class="nv">message</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="ss">'value</span> <span class="nv">r</span><span class="p">)))))</span>
</code></pre></div></div>

<p>However, <strong>the equivalent function in nrepl.el, <code class="language-plaintext highlighter-rouge">nrepl-eval</code>, is
synchronous!</strong> It <em>returns</em> the evaluation result. “That’s not true!
That’s impossible!”</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; !!!</span>
<span class="p">(</span><span class="nv">plist-get</span> <span class="p">(</span><span class="nv">nrepl-eval</span> <span class="s">"(Math/pow 2.1 3.1)"</span><span class="p">)</span> <span class="ss">:value</span><span class="p">)</span>
<span class="c1">;; =&gt; "9.97423999265871"</span>
</code></pre></div></div>

<p>Well, it turns out what I said above about execution contexts wasn’t
completely true. There’s exactly <em>one</em> sneaky function that breaks the
rule: <code class="language-plaintext highlighter-rouge">accept-process-output</code>. It blocks the current execution context
allowing some other execution contexts to run, including timers and
I/O. However, it will lock up Emacs’ interface. <code class="language-plaintext highlighter-rouge">nrepl-eval</code> uses this
function to poll for a response from the nREPL process.</p>
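
<p>To see the loophole in isolation, here’s a minimal sketch that doesn’t
involve nREPL at all. This is not nrepl.el’s actual code, and
<code class="language-plaintext highlighter-rouge">my-wait-for-timer</code> is a made-up name. A one-shot timer stands in
for the asynchronous response, and the polling loop turns it into a
return value.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-wait-for-timer ()
  "Block until a one-shot timer fires, then return what it delivered."
  (let ((box (list nil)))
    ;; Asynchronous part: in 0.1 seconds, store the symbol done in BOX.
    (run-at-time 0.1 nil #'setcar box 'done)
    ;; Synchronous part: poll, letting timers and process I/O run
    ;; while this function is still on the stack.
    (while (null (car box))
      (accept-process-output nil 0.05))
    (car box)))

(my-wait-for-timer)
</code></pre></div></div>

<p>The call blocks for roughly a tenth of a second and then returns
whatever the timer stored, even though the timer ran “in the middle” of
the function.</p>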

<p>When I saw this, a lightbulb went off in my head. This lone loophole
in Emacs’ execution model can be abused to provide interesting
benefits. Specifically, it can be used to create a <strong><em>latch</em></strong>
synchronization primitive.</p>

<p>The full source code is here if you want to dive right in. I’ll be
going over a simplified version piece-by-piece below.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-latch">https://github.com/skeeto/elisp-latch</a></li>
</ul>

<h3 id="the-latch-primitive">The Latch Primitive</h3>

<p>The idea of a latch is that a thread can <em>wait</em> on the latch, blocking
its execution. It will remain in that state until another thread
<em>notifies</em> the latch, releasing any threads blocked on the
latch. Here’s how it might look in Lisp.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">result</span> <span class="no">nil</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">my-latch</span> <span class="p">(</span><span class="nv">make-latch</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">get-result</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">if</span> <span class="nv">result</span>
      <span class="nv">result</span>
    <span class="p">(</span><span class="nv">wait</span> <span class="nv">my-latch</span><span class="p">)</span> <span class="c1">; Block, waiting for the result</span>
    <span class="nv">result</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">set-result</span> <span class="p">(</span><span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="nv">result</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">notify</span> <span class="nv">my-latch</span><span class="p">))</span> <span class="c1">; Release anyone waiting on my-latch</span>
</code></pre></div></div>

<p>The pattern above is similar to a <strong><em>promise</em></strong>, which we will later
implement on top of latches. In our latch implementation I’d also like
to optionally pass a value from <code class="language-plaintext highlighter-rouge">notify</code> to anyone <code class="language-plaintext highlighter-rouge">wait</code>ing, which
would make the above simpler.</p>

<p>Emacs doesn’t have threads but instead non-preemptive execution
contexts. Ignoring the Emacs UI lockup, we can mostly ignore that
distinction for now.</p>

<p>To exploit <code class="language-plaintext highlighter-rouge">accept-process-output</code> each latch needs to have its own
process object. When blocking on a latch it will simply wait for that
process to receive input. To notify a latch, we need to send data to
that process.</p>

<p>For the process, we’ll ask Emacs to make a pseudo-terminal “process.”
It’s basically just a pipe for Emacs to talk to itself. It’s possible
to literally make a pipe, which is better for this purpose, but
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=698096">that’s currently broken</a>. To make such a process, we call
<code class="language-plaintext highlighter-rouge">start-process</code> with <code class="language-plaintext highlighter-rouge">nil</code> as the program name (third argument).</p>

<p>Let’s start by making a new class called <code class="language-plaintext highlighter-rouge">latch</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'eieio</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defclass</span> <span class="nv">latch</span> <span class="p">()</span>
  <span class="p">((</span><span class="nv">process</span> <span class="ss">:initform</span> <span class="p">(</span><span class="nv">start-process</span> <span class="s">"latch"</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">))</span>
   <span class="p">(</span><span class="nv">value</span> <span class="ss">:initform</span> <span class="no">nil</span><span class="p">)))</span>
</code></pre></div></div>

<p>This class has two slots, <code class="language-plaintext highlighter-rouge">process</code> and <code class="language-plaintext highlighter-rouge">value</code>. The process slot
holds the aforementioned process we’ll be blocking on. The <code class="language-plaintext highlighter-rouge">value</code>
slot will be used to pass a value from <code class="language-plaintext highlighter-rouge">notify</code> to <code class="language-plaintext highlighter-rouge">wait</code>. The
<code class="language-plaintext highlighter-rouge">process</code> slot is initialized with a brand new process object upon
instantiation.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmethod</span> <span class="nv">wait</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">latch</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">accept-process-output</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'process</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'value</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">notify</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">latch</span><span class="p">)</span> <span class="k">&amp;optional</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'value</span><span class="p">)</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">process-send-string</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'process</span><span class="p">)</span> <span class="s">"\n"</span><span class="p">))</span>
</code></pre></div></div>

<p>To wait, call <code class="language-plaintext highlighter-rouge">accept-process-output</code> on the latch’s private
process. This function won’t return until data is sent to the
process. By that time, the <code class="language-plaintext highlighter-rouge">value</code> slot will be filled in with the
value from <code class="language-plaintext highlighter-rouge">notify</code>.</p>

<p>To notify, send a newline with <code class="language-plaintext highlighter-rouge">process-send-string</code>. The data to send
is arbitrary, but I wanted to send as little as possible (one byte)
and I figure a newline might be safer when it comes to flushing any
sort of buffer. Buffers tend to flush on newlines. Before sending
data, we set the <code class="language-plaintext highlighter-rouge">value</code> slot to the value that <code class="language-plaintext highlighter-rouge">wait</code> will return.</p>

<p>That’s basically it! However, processes are not garbage collected by
Emacs, so we need a <code class="language-plaintext highlighter-rouge">destroy</code> destructor method. The name <code class="language-plaintext highlighter-rouge">destroy</code>
here is not special to Emacs. It’s something for the user of the
library to call.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmethod</span> <span class="nv">destroy</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">latch</span><span class="p">))</span>
  <span class="p">(</span><span class="nb">ignore-errors</span>
    <span class="p">(</span><span class="nv">delete-process</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">latch</span> <span class="ss">'process</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-latch</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'latch</span><span class="p">))</span>
</code></pre></div></div>

<p>I also made a convenience constructor function <code class="language-plaintext highlighter-rouge">make-latch</code>, using the
conventional <code class="language-plaintext highlighter-rouge">make-</code> prefix, since users shouldn’t have to call
<code class="language-plaintext highlighter-rouge">make-instance</code> for our classes.</p>

<p>That’s enough to turn <code class="language-plaintext highlighter-rouge">skewer-eval</code> into a synchronous function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">skewer-eval-synchronously</span> <span class="p">(</span><span class="nv">js-code</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">latch</span> <span class="p">(</span><span class="nv">make-latch</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">skewer-eval</span> <span class="nv">js-code</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="nf">#'</span><span class="nv">notify</span> <span class="nv">latch</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">prog1</span> <span class="p">(</span><span class="nv">wait</span> <span class="nv">latch</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">destroy</span> <span class="nv">latch</span><span class="p">))))</span>
</code></pre></div></div>

<p>In combination with <code class="language-plaintext highlighter-rouge">lexical-let</code>, <code class="language-plaintext highlighter-rouge">apply-partially</code> returns a closure
that will notify the latch with the return value passed to it from
skewer. We need to get the return value from <code class="language-plaintext highlighter-rouge">wait</code>, destroy the
latch, then return the value, so I use a <code class="language-plaintext highlighter-rouge">prog1</code> for this.</p>
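
<p>If <code class="language-plaintext highlighter-rouge">apply-partially</code> is unfamiliar: it returns a new function with
the leading arguments already filled in. Here’s a tiny illustration,
using <code class="language-plaintext highlighter-rouge">list</code> in place of <code class="language-plaintext highlighter-rouge">notify</code> and plain symbols in place of a
real latch and result.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; The partial application remembers its first argument...
(funcall (apply-partially #'list 'latch) 'result)
;; =&gt; (latch result)

;; ...so it behaves like this hand-written callback:
(funcall (lambda (r) (list 'latch r)) 'result)
;; =&gt; (latch result)
</code></pre></div></div>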

<h3 id="one-use-latches">One-use Latches</h3>

<p>In my experimenting, I noticed the <code class="language-plaintext highlighter-rouge">prog1</code> pattern coming up a
lot. Having to destroy my latch after a single use was really
inconvenient. Fortunately this pattern can be captured by a subclass:
one-time-latch.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defclass</span> <span class="nv">one-time-latch</span> <span class="p">(</span><span class="nv">latch</span><span class="p">)</span>
  <span class="p">())</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-one-time-latch</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'one-time-latch</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">wait</span> <span class="ss">:after</span> <span class="p">((</span><span class="nv">latch</span> <span class="nv">one-time-latch</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">destroy</span> <span class="nv">latch</span><span class="p">))</span>
</code></pre></div></div>

<p>This subclass destroys the latch after the superclass’s <code class="language-plaintext highlighter-rouge">wait</code> is
done, through an <code class="language-plaintext highlighter-rouge">:after</code> method (purely for side-effects). CLOS is
fun, isn’t it?</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">skewer-eval-synchronously</span> <span class="p">(</span><span class="nv">js-code</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">latch</span> <span class="p">(</span><span class="nv">make-one-time-latch</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">skewer-eval</span> <span class="nv">js-code</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="nf">#'</span><span class="nv">notify</span> <span class="nv">latch</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">wait</span> <span class="nv">latch</span><span class="p">)))</span>
</code></pre></div></div>

<p>There, that’s a lot more elegant.</p>

<p>If eieio were a more capable mini-CLOS I could also demonstrate a
<code class="language-plaintext highlighter-rouge">countdown-latch</code>, but this would require an <code class="language-plaintext highlighter-rouge">:around</code> method. Most
uses of <code class="language-plaintext highlighter-rouge">notify</code> would need to skip over the superclass method.</p>

<h3 id="promises">Promises</h3>

<p>We can build promises on top of our latch implementation. Basically, a
promise is a one-time-latch where we can query the <code class="language-plaintext highlighter-rouge">notify</code> value more
than once. In a one-time-latch we can only <code class="language-plaintext highlighter-rouge">wait</code> once.</p>

<p>Our promise will have two similar methods, <code class="language-plaintext highlighter-rouge">deliver</code> (like notify),
and <code class="language-plaintext highlighter-rouge">retrieve</code> (like wait). If a value has been delivered already,
<code class="language-plaintext highlighter-rouge">retrieve</code> will return that value. Otherwise, it will block and wait
until a value is delivered.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defclass</span> <span class="nv">promise</span> <span class="p">()</span>
  <span class="p">((</span><span class="nv">latch</span> <span class="ss">:initform</span> <span class="p">(</span><span class="nv">make-one-time-latch</span><span class="p">))</span>
   <span class="p">(</span><span class="nv">delivered</span> <span class="ss">:initform</span> <span class="no">nil</span><span class="p">)</span>
   <span class="p">(</span><span class="nv">value</span> <span class="ss">:initform</span> <span class="no">nil</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-promise</span> <span class="p">()</span>
  <span class="p">(</span><span class="nb">make-instance</span> <span class="ss">'promise</span><span class="p">))</span>
</code></pre></div></div>

<p>It has three slots, the one-time-latch used for blocking, a Boolean
determining the delivery status, and the <code class="language-plaintext highlighter-rouge">value</code> of the promise.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defmethod</span> <span class="nv">deliver</span> <span class="p">((</span><span class="nv">promise</span> <span class="nv">promise</span><span class="p">)</span> <span class="nv">value</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'delivered</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">error</span> <span class="s">"Promise has already been delivered."</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'value</span><span class="p">)</span> <span class="nv">value</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'delivered</span><span class="p">)</span> <span class="no">t</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">notify</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'latch</span><span class="p">)</span> <span class="nv">value</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defmethod</span> <span class="nv">retrieve</span> <span class="p">((</span><span class="nv">promise</span> <span class="nv">promise</span><span class="p">))</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'delivered</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'value</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">wait</span> <span class="p">(</span><span class="nb">slot-value</span> <span class="nv">promise</span> <span class="ss">'latch</span><span class="p">))))</span>
</code></pre></div></div>

<p>A promise can only be delivered once, so <code class="language-plaintext highlighter-rouge">deliver</code> signals an
error if it is called a second time. Otherwise it updates the promise
state and releases anything waiting on it.</p>

<h3 id="what-to-do-with-this">What to do with this?</h3>

<p>Locking up Emacs’ UI really limits the usefulness of this
library. Since Emacs’ primary purpose is being a text editor, it needs
to remain very lively or else the user will become annoyed. If I used
a synchronous version of <code class="language-plaintext highlighter-rouge">skewer-eval</code>, Emacs would completely lock up
(easily interrupted with <code class="language-plaintext highlighter-rouge">C-g</code>) until the browser responds — which
would be never if no browser is connected. That’s unacceptable.</p>

<p>Also, not very many Emacs functions have the callback pattern. The
only core function I’m aware of that does is <code class="language-plaintext highlighter-rouge">url-retrieve</code>, but it
already has a <code class="language-plaintext highlighter-rouge">url-retrieve-synchronously</code> counterpart.</p>

<p>Please tell me if you have a neat use of any of this!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Clojure and Emacs for Lispers</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2013/01/07/"/>
    <id>urn:uuid:5a316444-8f60-30c0-2a94-b87eb239eb13</id>
    <updated>2013-01-07T00:00:00Z</updated>
    <category term="lisp"/><category term="emacs"/><category term="clojure"/>
    <content type="html">
      <![CDATA[<p>According to my e-mail archives I’ve been interested in
<a href="http://clojure.org/">Clojure</a> for about three and a half years now. During that
period I would occasionally spend an evening trying to pick it up,
only to give up after getting stuck on some installation or
configuration issue. With a little bit of pushing from <a href="http://50ply.com/">Brian</a>,
and the fact that this installation and configuration <em>is now
trivial</em>, I finally broke that losing streak last week.</p>

<h3 id="im-damn-picky">I’m Damn Picky</h3>

<p>Personally, I face a high barrier to learning new programming
languages. It’s entirely my own fault. I’m <em>really</em> picky about my
development environment. If I’m going to write code in a language I
need Emacs to support a comfortable workflow around it. Otherwise
progress feels agonizingly sluggish. If at all possible this means
live interaction with the runtime (Lisp, JavaScript). If not, then I
need to be able to invoke builds and run tests from within Emacs (C,
Java). Basically, <strong>I want to leave the Emacs window as infrequently
as possible</strong>.</p>

<p>I also need a major mode with decent indentation support. This tends
to be the hardest part to create. Automatic indentation in Emacs is
considered black magic. Fortunately, it’s unusual to come across a
language that doesn’t already have a major mode written for it. It’s
only happened once for me and that’s because it was a custom language
for a computer languages course. To remedy this, I ended up
<a href="/blog/2012/09/20/">writing my own major mode</a>, including in-Emacs evaluation.</p>

<p>Unsatisfied with JDEE, I did the same for Java,
<a href="/blog/2011/11/19/">growing my own extensions</a> to support my development for the
couple of years when Java was my primary programming language. The
dread of having to switch back and forth between Emacs and my browser
kept me away from web development for years. That changed this past
October when I <a href="/blog/2012/10/31/">wrote skewer-mode</a> to support interactive
JavaScript development. JavaScript is now one of my favorite
programming languages.</p>

<p>I’ve wasted enough time in my life configuring and installing
software. I hate sinking time into doing so without capturing that
work in source control, so that I never need to spend time on that
particular thing again. I don’t mean the installation itself but the
configuration — the difference from the defaults. (And the better the
defaults, the smaller my configuration needs to be.) With
<a href="/blog/2012/06/23/">my dotfiles repository</a> and Debian, I can go from a
computer with no operating system to a fully productive development
environment inside of about one hour. Almost all of that time is just
waiting on Debian to install all its packages. Any new language
development workflow needs to be compatible with this.</p>

<h3 id="clojure-installation">Clojure Installation</h3>

<p>Until sometime last year the standard way to interact with Clojure
from Emacs was through <a href="https://github.com/technomancy/swank-clojure">swank-clojure</a> with
SLIME. Well, installing <a href="/blog/2010/01/15/">SLIME itself can be a pain</a>.
<a href="http://www.quicklisp.org/">Quicklisp</a> now makes this part trivial but it’s specific
to Common Lisp. Using SLIME for Clojure also conflicts with using it for Common Lisp, so I’d
basically need to choose one language or the other.</p>

<p>SLIME doesn’t have any official stable releases. On top of this, the
SWANK protocol is undocumented and subject to change at any time. As a
result, SWANK backends are generally tied to a very specific version
of SLIME and it’s not unusual for something to break when upgrading
one or the other. I know because I wrote
<a href="/blog/2011/01/30/">my own SWANK backend</a> for BrianScheme. Thanks to
Quicklisp, today this isn’t an issue for Common Lisp users, but it’s
not as much help for Clojure.</p>

<p>The good news is that <strong>swank-clojure is now deprecated</strong>. The
replacement is a similar, but entirely independent, library called
<strong>nREPL</strong>. (I’d link to it but there doesn’t seem to be a website.)
Additionally, there’s an excellent Emacs interface to it:
<a href="https://github.com/kingtim/nrepl.el">nrepl.el</a>. It’s available on MELPA, so installation is
trivial.</p>

<p>There’s also a clojure-mode package on MELPA, so install that, too.</p>

<p>That covers the Emacs side of things, so what about Clojure itself?
The Clojure community is a fast-moving target and the Debian packages
can’t quite keep up. At the time of this writing they’re too old to
use nREPL. The good news is that there’s an alternative that’s just as
good, if not better: <a href="http://leiningen.org/">Leiningen</a>.</p>

<p>Leiningen is the standard Clojure build tool and dependency
manager. Here, “dependencies” includes Clojure itself. If you have
Leiningen you have Clojure. Installing Leiningen is as simple as
placing a single shell script in your <code class="language-plaintext highlighter-rouge">$PATH</code>. Since I always have
<code class="language-plaintext highlighter-rouge">~/bin</code> in my <code class="language-plaintext highlighter-rouge">$PATH</code>, all I need to do is wget/curl the script there
and <code class="language-plaintext highlighter-rouge">chmod +x</code> it. The first time it runs it pulls down all of its own
dependencies automatically. Right now the biggest downside seems to be
that it’s really slow to start. I think the JVM warmup time is to
blame.</p>

<p>Let’s review. To install a working Emacs live-interaction Clojure
development environment,</p>

<ul>
  <li>
    <p><strong>Install the nrepl.el package in Emacs.</strong> For me this happens
automatically by the configuration in my
<a href="/blog/2011/10/19/">.emacs.d repository</a>. I only had to do this step once.</p>
  </li>
  <li>
    <p><strong>Install the clojure-mode package.</strong> Same deal.</p>
  </li>
  <li>
    <p><strong>Install a JDK.</strong> OpenJDK is probably in your system’s package
manager, so this is trivial.</p>
  </li>
  <li>
    <p><strong>Put the <code class="language-plaintext highlighter-rouge">lein</code> shell script in the <code class="language-plaintext highlighter-rouge">$PATH</code>.</strong> This takes about
five seconds. If even this was too much for my precious
sensibilities I could put this script in my dotfiles repository.</p>
  </li>
</ul>

<p>With this all in place, do <code class="language-plaintext highlighter-rouge">M-x nrepl-jack-in</code> in Emacs and any
clojure-mode buffer will be ready to evaluate code as expected. It’s
wonderful.</p>

<h3 id="further-extending-emacs">Further Extending Emacs</h3>

<p>I made some tweaks to further increase my comfort. Perhaps nREPL’s
biggest annoyance is that it doesn’t focus the error buffer the way all the
other interactive modes do. Once I’m done glancing at it I’ll dismiss it with
<code class="language-plaintext highlighter-rouge">q</code>. This advice fixes that.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">nrepl-default-err-handler</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">nrepl-focus-errors</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="s">"Focus the error buffer after errors, like Emacs normally does."</span>
  <span class="p">(</span><span class="nv">select-window</span> <span class="p">(</span><span class="nv">get-buffer-window</span> <span class="s">"*nrepl-error*"</span><span class="p">)))</span>
</code></pre></div></div>

<p>I also like having expressions flash when I evaluate them. Both SLIME
and Skewer do this. This uses <code class="language-plaintext highlighter-rouge">slime-flash-region</code> to do so when
available.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">nrepl-eval-last-expression</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">nrepl-flash-last</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">fboundp</span> <span class="ss">'slime-flash-region</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">slime-flash-region</span> <span class="p">(</span><span class="nv">save-excursion</span> <span class="p">(</span><span class="nv">backward-sexp</span><span class="p">)</span> <span class="p">(</span><span class="nv">point</span><span class="p">))</span> <span class="p">(</span><span class="nv">point</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">defadvice</span> <span class="nv">nrepl-eval-expression-at-point</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">nrepl-flash-at</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">fboundp</span> <span class="ss">'slime-flash-region</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">apply</span> <span class="nf">#'</span><span class="nv">slime-flash-region</span> <span class="p">(</span><span class="nv">nrepl-region-for-expression-at-point</span><span class="p">))))</span>
</code></pre></div></div>

<p>For Lisp modes I use parenface to de-emphasize parentheses. Reading
Lisp is more about indentation than parentheses. Clojure uses square
brackets (<code class="language-plaintext highlighter-rouge">[]</code>) and curly braces (<code class="language-plaintext highlighter-rouge">{}</code>) heavily, so these now also get
special highlighting. See <a href="/blog/2011/10/19/">my .emacs.d</a> for that. Here’s
what it looks like,</p>

<p><img src="/img/screenshot/clojure-brackets.png" alt="" /></p>

<h3 id="learning-clojure">Learning Clojure</h3>

<p>The next step is actually learning Clojure. I already know Common Lisp
very well. It has a lot in common with Clojure so I didn’t want to
start from a pure introductory text. More importantly, I needed to
know upfront <a href="http://clojure.org/lisps">which of my pre-conceptions were wrong</a>. This was
an issue I had, and still have, with JavaScript. Nearly all the
introductory texts for JavaScript are aimed at beginner
programmers. It’s a lot of text for very little new information.</p>

<p>More good news! There’s a very thorough Clojure introductory guide
that starts at a reasonable level of knowledge.</p>

<ul>
  <li><a href="http://java.ociweb.com/mark/clojure/article.html">Clojure - Functional Programming for the JVM</a></li>
</ul>

<p>A few hours going through that while experimenting in a <code class="language-plaintext highlighter-rouge">*clojure*</code>
scratch buffer and I was already feeling pretty comfortable. With a
few months of studying <a href="http://clojure.github.com/clojure/">the API</a>, learning the idioms, and
practicing, I expect to be a fluent speaker.</p>

<p>I think it’s ultimately a good thing I didn’t get into Clojure a
couple of years ago. That gave me time to build up — as a sort of
rite of passage — needed knowledge and experience with Java, which
deliberately, through the interop, plays a significant role in
Clojure.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>An Emacs Pastebin</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/12/29/"/>
    <id>urn:uuid:cbfbf5b0-607d-34d0-6f31-b2712d4e421f</id>
    <updated>2012-12-29T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/><category term="javascript"/><category term="web"/>
    <content type="html">
      <![CDATA[<p>Luke is doing an interesting <s>three</s>five-part tutorial on writing
a pastebin in PHP: <a href="http://terminally-incoherent.com/blog/2012/12/17/php-like-a-pro-part-1/">PHP Like a Pro</a> (<a href="http://terminally-incoherent.com/blog/2012/12/19/php-like-a-pro-part-2/">2</a>, <a href="http://terminally-incoherent.com/blog/2012/12/26/php-like-a-pro-part-3/">3</a>,
<a href="http://terminally-incoherent.com/blog/2013/01/02/php-like-a-pro-part-4/">4</a>, <a href="http://terminally-incoherent.com/blog/2013/01/04/php-like-a-pro-part-5/">5</a>). The tutorial is largely an introduction to
the set of tools a professional would use to accomplish a more
involved project, the most interesting of which, for me, is
<a href="http://vagrantup.com/">Vagrant</a>.</p>

<p>Because I have <a href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">no intention of ever using PHP</a>, I decided to
follow along in parallel with my own version. I used Emacs Lisp with
my <a href="/blog/2012/08/20/">simple-httpd</a> package for the server. I really
like my servlet API, so it was a lot more fun than I expected it to be!
Here’s the source code,</p>

<ul>
  <li><a href="https://github.com/skeeto/emacs-pastebin">https://github.com/skeeto/emacs-pastebin</a></li>
</ul>

<p>Here’s what it looked like once I was all done,</p>

<p><a href="/img/screenshot/pastebin.png"><img src="/img/screenshot/pastebin-thumb.png" alt="" /></a></p>

<p>It has syntax highlighting, paste expiration, and light version
control. The server side is as simple as possible, consisting of only
three servlets,</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/</code>: static files</li>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/get</code>: serves (immutable) pastes in JSON</li>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/post</code>: accepts new pastes in JSON, returns the ID</li>
</ul>

<p>A paste’s JSON is the raw paste content plus some metadata, including
post date, expiration date, language (highlighting), parent paste ID,
and title. That’s it! The server is just a database and static file
host. It performs no dynamic page generation. Instead, the client-side
JavaScript does all the work.</p>

<p>For you non-Emacs users, the repository has a <code class="language-plaintext highlighter-rouge">pastebin-standalone.el</code>
which can be used to launch a standalone instance of the pastebin
server, so long as you have Emacs on your computer. It will fetch any
needed dependencies automatically. See the header comment of this file
for instructions.</p>

<h3 id="ids">IDs</h3>

<p>A paste ID is four or more randomly-generated numbers, letters, dashes
or underscores, with some minor restrictions (<code class="language-plaintext highlighter-rouge">pastebin-id-valid-p</code>).
It’s appended to the end of the servlet URL.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/&lt;id&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">/pastebin/get/&lt;id&gt;</code></li>
</ul>

<p>In the first case, the servlet entirely ignores the ID. Its job is
only to serve static files. In the second case the server looks up the
ID in the database and returns the paste JSON.</p>
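
<p>As a rough sketch of the ID format described above (not the code from
the repository), generating and checking such an ID in Elisp might look
like the following. The names <code class="language-plaintext highlighter-rouge">demo-id-alphabet</code>, <code class="language-plaintext highlighter-rouge">demo-make-id</code>, and
<code class="language-plaintext highlighter-rouge">demo-id-well-formed-p</code> are hypothetical, and the “minor restrictions”
enforced by the real <code class="language-plaintext highlighter-rouge">pastebin-id-valid-p</code> are not reproduced here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

;; Hypothetical names throughout; this only illustrates the ID shape.
(defconst demo-id-alphabet
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"
  "Characters an ID may contain: letters, digits, dash, underscore.")

(defun demo-make-id (n)
  "Return a string of N characters drawn at random from `demo-id-alphabet'."
  (let ((id (make-string n ?a)))
    (dotimes (i n)
      ;; `random' with an integer argument returns a value in [0, limit).
      (aset id i (aref demo-id-alphabet
                       (random (length demo-id-alphabet)))))
    id))

(defun demo-id-well-formed-p (id)
  "Return t if ID is a string of four or more allowed characters."
  (and (stringp id)
       (string-match-p "\\`[A-Za-z0-9_-]\\{4,\\}\\'" id)
       t))
</code></pre></div></div>

<p>Note that <code class="language-plaintext highlighter-rouge">random</code> is not a cryptographic generator, so IDs made this
way should be treated as guessable.</p>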

<p>The client side inspects the page’s URL to determine the ID currently
being viewed, if any. It performs an asynchronous request to
<code class="language-plaintext highlighter-rouge">/pastebin/get/&lt;id&gt;</code> to fetch the paste and insert the result, if
found, into the current page.</p>

<p>Form submission isn’t done the normal way. Instead, the submission is
intercepted by an event handler, which wraps the form data up in JSON
(much cleaner to parse!) and sends it asynchronously to
<code class="language-plaintext highlighter-rouge">/pastebin/post</code> via POST. This servlet inserts the paste in the
database and responds in <code class="language-plaintext highlighter-rouge">text/plain</code> with the paste ID it
generated. The client side then redirects the browser to the URL for
that paste.</p>

<h3 id="features">Features</h3>

<p>As I said, the server performs no page generation, so syntax
highlighting is done in the client with
<a href="http://softwaremaniacs.org/soft/highlight/en/">highlight.js</a>. I <em>could</em> have used <a href="http://emacswiki.org/emacs/Htmlize">htmlize</a>
and supported any language that Emacs supports. However, I wanted to
keep the server as simple as possible, and, more importantly, I
<em>really</em> don’t trust Emacs’ various modes to be secure in operating on
arbitrary data. That’s a huge attack surface and these modes were
written without security in mind (fairly reasonable). It’s actually a
deliberate feature for Emacs to automatically <code class="language-plaintext highlighter-rouge">eval</code> Elisp in comments
<a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html">under certain circumstances</a>.</p>

<p>Version control is accomplished by keeping track of which paste was
the parent of the paste being posted. When viewing a paste, the
content is also placed in a textarea for editing. Submitting this form
will create a new paste with the current paste as the parent. When
viewing a paste that has a parent, a “diff” option is provided to view
a diff patch of the current paste with its parent (see the screenshot
above). Again, the server is dead simple, so this patch is computed by
JavaScript after fetching the parent paste from the server.</p>

<h3 id="databases">Databases</h3>

<p>As part of my fun I made a generic database API for the servlets, then
implemented three different database backends. I used eieio, Emacs
Lisp’s CLOS-like object system, to implement this API. Creating a new
database backend is just a matter of making a new class that
implements two specific methods.</p>

<p>The first, and default, implementation uses an Elisp hash table for
storage, which is lost when Emacs exits.</p>
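
<p>To illustrate the general shape, here is a minimal sketch of a
two-method storage API with a hash table backend. It assumes a recent
Emacs where EIEIO classes work with <code class="language-plaintext highlighter-rouge">cl-defgeneric</code> and <code class="language-plaintext highlighter-rouge">cl-defmethod</code>,
and every <code class="language-plaintext highlighter-rouge">demo-</code> name is made up for the example. The actual class and
method names in the repository may differ.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'eieio)
(require 'cl-lib)

;; Hypothetical abstract base class: backends inherit from it.
(defclass demo-db () ()
  :abstract t
  :documentation "Base class for paste storage backends.")

;; The two operations every backend must implement.
(cl-defgeneric demo-db-get (db id)
  "Return the paste stored in DB under ID, or nil if there is none.")

(cl-defgeneric demo-db-put (db id paste)
  "Store PASTE in DB under ID.")

;; In-memory backend: everything is lost when Emacs exits.
(defclass demo-db-hash (demo-db)
  ((table :initarg :table))
  :documentation "Backend that keeps pastes in a hash table.")

(defun demo-db-hash-create ()
  "Return a fresh, empty hash table backend."
  (make-instance 'demo-db-hash
                 :table (make-hash-table :test 'equal)))

(cl-defmethod demo-db-get ((db demo-db-hash) id)
  (gethash id (slot-value db 'table)))

(cl-defmethod demo-db-put ((db demo-db-hash) id paste)
  (puthash id paste (slot-value db 'table)))
</code></pre></div></div>

<p>A servlet written against <code class="language-plaintext highlighter-rouge">demo-db-get</code> and <code class="language-plaintext highlighter-rouge">demo-db-put</code> would not need
to know which backend class it was handed.</p>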

<p>The second is a flat-file database. I estimate it should be able to
support at least 16 million different pastes gracefully. The on-disk
format for pastes is an s-expression. Basically, this is read by
Emacs, expiration date checked, converted to JSON, then served to the
client.</p>
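
<p>That read, check, convert pipeline can be sketched roughly as follows.
This assumes each paste is stored as an alist with a hypothetical
<code class="language-plaintext highlighter-rouge">expires</code> key holding a float timestamp. The real on-disk layout and
function names are not shown in this article, so treat these as
placeholders.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-
(require 'json)

(defun demo-paste-load (file)
  "Read and return the single s-expression stored in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    (read (current-buffer))))

(defun demo-paste-json (paste now)
  "Return the alist PASTE encoded as JSON, or nil if it has expired.
NOW is the current time in float seconds, as from `float-time'.
The `expires' key is an assumption made for this sketch."
  (let ((expires (cdr (assq 'expires paste))))
    (unless (and expires (&gt;= now expires))
      (json-encode paste))))
</code></pre></div></div>

<p>A servlet could then respond with
<code class="language-plaintext highlighter-rouge">(demo-paste-json (demo-paste-load file) (float-time))</code>, treating a nil
result as “not found.”</p>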

<p>To my great surprise there is practically no support for programmatic
access to a SQL database from <em>GNU</em> Emacs Lisp (other Emacsen have it). The
closest I found was <a href="http://www.online-marketwatch.com/pgel/pg.html">pg.el</a>, which is asynchronous by
necessity. However, the specific target I had in mind was SQLite.</p>

<p>I <em>did</em> manage to implement a third backend that uses SQLite, but it’s
a big hack. It invokes the <code class="language-plaintext highlighter-rouge">sqlite3</code> command line program once for
every request, asking for a response in CSV — the only output format
that seems to escape unambiguously. This response then has to be
parsed, which works only so long as it isn’t long enough to blow the regex
stack.</p>
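
<p>The one-process-per-request approach can be sketched like this,
assuming the <code class="language-plaintext highlighter-rouge">sqlite3</code> program is on the <code class="language-plaintext highlighter-rouge">PATH</code> and using its <code class="language-plaintext highlighter-rouge">-csv</code>
option. The function name <code class="language-plaintext highlighter-rouge">demo-sqlite-query</code> is hypothetical, and the CSV
parsing step is left out.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;;; -*- lexical-binding: t; -*-

(defun demo-sqlite-query (db-file sql)
  "Run SQL against DB-FILE using the sqlite3 program.
Return the raw CSV output as a string, or signal an error if the
program exits with a non-zero status."
  (with-temp-buffer
    ;; A destination of t collects the program's output in this buffer.
    (let ((status (call-process "sqlite3" nil t nil "-csv" db-file sql)))
      (unless (eq status 0)
        (error "sqlite3 failed: %s" (buffer-string)))
      (buffer-string))))
</code></pre></div></div>

<p>Since the SQL is passed as a plain string, any values taken from a
request would have to be escaped before being spliced into it.</p>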

<p><em>Update February 2014</em>: I have
<a href="/blog/2014/02/06/">found a solution to this problem</a>!</p>

<h3 id="future">Future</h3>

<p>This has been an educational project for me. As a tutorial and for
practice I’ll probably write the server again from scratch using other
languages and platforms (Node.js and Hunchentoot maybe?), keeping the
same front-end.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Skewer: Emacs Live Browser Interaction</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/10/31/"/>
    <id>urn:uuid:2564011d-0057-3a3e-eaff-c28748d077b0</id>
    <updated>2012-10-31T00:00:00Z</updated>
    <category term="emacs"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<p>Inspired by <a href="http://youtu.be/qwtVtcQQfqc">Emacs Rocks! Episode 11</a> on
<a href="https://github.com/swank-js/swank-js">swank-js</a>, I spent the last week writing a new extension to
Emacs to improve support for web development. It’s called
<a href="https://github.com/skeeto/skewer-mode">Skewer</a> and it allows you to interact with a browser
like you would an inferior Lisp process. It’s written in pure Emacs
Lisp, operates as a servlet for <a href="/blog/2009/05/17/">my Elisp webserver</a>,
and <strong>requires no special support from your browser or any other
external programs</strong>, making it portable and very easy to set up.</p>

<h3 id="repository">Repository</h3>

<ul>
  <li>
    <p>Available on <a href="http://melpa.milkbox.net/">MELPA</a></p>
  </li>
  <li>
    <p><a href="https://github.com/skeeto/skewer-mode">https://github.com/skeeto/skewer-mode</a></p>
  </li>
</ul>

<h3 id="demo">Demo</h3>

<p>(No audio.)</p>

<video src="https://nullprogram.s3.amazonaws.com/skewer/demo.webm" controls="controls" width="600" height="375">
       <a href="http://youtu.be/4tyTgyzUJqM">YouTube video</a>
</video>

<p>The video is also <a href="http://youtu.be/4tyTgyzUJqM">on YouTube</a>.</p>

<p>It works a little bit like <a href="http://50ply.com/blog/2012/08/13/introducing-impatient-mode/">impatient-mode</a>. First,
the browser makes a long poll to Emacs. When you’re ready to send code
to the browser to evaluate, Emacs wraps the expression in a bit of
JSON and sends it to the browser. The browser responds with the result
and starts another long poll.</p>

<p>As such, the browser doesn’t need to do anything special to support
Skewer. If it can run jQuery, it can be skewered. I’ve tested it and
found it working successfully on the latest versions of all the major
browsers, including <em>you-know-who</em>.</p>

<p>To properly grab expressions around/before the point I’m using the
<em>amazing</em> <a href="https://github.com/mooz/js2-mode">js2-mode</a>, originally written by the famous Steve
Yegge. If you’re developing JavaScript you should be using this mode
anyway! I thought I was clever with my <a href="https://github.com/skeeto/psl-mode">psl-mode</a>, writing
my own full language parser. Steve Yegge did the same thing on a much
larger scale three years ago with js2-mode. It includes an entire
JavaScript 1.8 parser so the mode has <em>full</em> semantic understanding of
the language. For Skewer, I use js2-mode’s functions to access the AST
and extract complete, valid expressions.</p>

<h3 id="whats-wrong-with-swank-js">What’s wrong with swank-js?</h3>

<p>Skewer provides nearly the same functionality as swank-js, a
JavaScript back-end to SLIME. At a glance my extension seems redundant.</p>

<p>The problem with swank-js is the complicated setup. It requires a
cooperating Node.js server, a particular version of SLIME, and a lot
of patience. I could never get it working, and if I did I wouldn’t
want to have to do all that setup again on another computer. In
contrast, Skewer is just another Emacs package, no special setup
needed. Thanks to <code class="language-plaintext highlighter-rouge">package.el</code> installing and using it should be no
more difficult than installing any other package.</p>

<p>Most importantly, with Skewer I can capture the setup in my
<a href="/blog/2011/10/19/">.emacs.d repository</a> where it will automatically
work across any operating system, so long as it has Emacs installed.</p>

<h3 id="getting-into-javascript">Getting into JavaScript</h3>

<p>I already used Skewer to develop a little <a href="https://github.com/skeeto/boids-js">boids toy</a>, which
I’m using to demonstrate the mode (the video). Unlike my previous
experiences in web development, this was extremely enjoyable —
probably because it felt a lot like I was writing Lisp. And unlike any
Lisp I’ve used so far, I had a canvas to draw on with my live
code. That’s a satisfying tool to have.</p>

<p>Due to those prior poor experiences, I had avoided web development for
a long time. But now that I have some decent tools configured I’m
going to get into it more. In fact, I’ve decided I’m completely done
with <a href="/toys/">writing Java applets</a>. Bounze will have been my last
one.</p>

<p>This has become a pattern for me. When I want to start using a new
language or platform I need to figure out a work-flow with Emacs. This
involves trying out new modes, reading about how other people do it,
and, ultimately, when I find the existing stuff inadequate, I
build my own extensions to create the work-flow I desire. I did this
with <a href="/blog/2010/10/14/">Java</a>, recently with psl-mode (which was to
be expected), and now web development.</p>

<p>In my recent proper introduction to JavaScript, undertaken in order to
create and demo Skewer mode idiomatically, perhaps the most exciting discovery
this past week was the JavaScript community itself. I’ve been mostly
unaware of this community and taking my first steps into it has been
enlightening.</p>

<p>JavaScript had a rough start. It was designed in a rush by developers
who, at the time, didn’t quite understand the consequences of their
design decisions, and later extended by similar people. The name of
the language itself is evidence of this. Fortunately some really smart
people jumped on board along the way (including Guy Steele of Lisp
fame) and have tried to undo, or at least mitigate, the mistakes.</p>

<p>Due to the coarseness of the language, the JavaScript community is
actually a lot like the Elisp community, but on a larger scale:
there’s still a whole lot of frontier to explore and it’s pretty easy
to make a noticeable splash.</p>

<p>Here’s to splashing!</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Emacs visual-indentation-mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/29/"/>
    <id>urn:uuid:6171fe76-7d01-3230-66b1-a9fa03078001</id>
    <updated>2012-09-29T00:00:00Z</updated>
    <category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I watched this presentation last night about introducing Clojure in
the workplace, <a href="http://www.infoq.com/presentations/Bootstrapping-Clojure">Bootstrapping Clojure</a>. Most of the video is
Tyler presenting code he wrote, so there’s a lot of Clojure code
displayed in the presentation. The way he presented the code itself
was interesting: indentation was highlighted in alternating shades of
gray.</p>

<p>Such emphasis on indentation could be useful specifically for reading
Lisp, because, for humans, what makes s-expressions readable isn’t the
parentheses but the indentation. That’s why we Lispers shake our heads
when non-Lispers complain that there are too many parentheses. We
don’t notice them!</p>

<p>He’s a Vim user so I assume this is some Vim extension, but maybe it’s
just the output from a particular pretty printer. I thought it would
be interesting to have this feature in Emacs, so I created a minor
mode for it.</p>

<ul>
  <li><a href="https://github.com/skeeto/visual-indentation-mode">https://github.com/skeeto/visual-indentation-mode</a></li>
</ul>

<p>Here’s what it looks like with the default Emacs theme.</p>

<p><img src="/img/emacs/visual-indentation-mode-lisp.png" alt="" /></p>

<p>And some Java with <code class="language-plaintext highlighter-rouge">visual-indentation-width</code> set to 4,</p>

<p><img src="/img/emacs/visual-indentation-mode-java.png" alt="" /></p>

<p>Dark themes (like the Wombat theme I personally use) will work as
well, with the indentation highlighting appearing darker.</p>

<p>It can be enabled by default in all programming modes easily,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'prog-mode-hook</span> <span class="ss">'visual-indentation-mode</span><span class="p">)</span>
</code></pre></div></div>

<p>It completely falls apart when tabs are used for indentation, which
Emacs will use by default. <a href="/blog/2011/10/19/">My configuration</a>
forbids tabs (tabs are stupid, people!) but I still need to edit other
people’s code containing tabs. I don’t think there’s a way to apply
highlighting to <em>part</em> of a tab, so I’m not sure if there’s a way to
fix that. Because I don’t intend to actually use the mode regularly,
it’s just a proof of concept, and fixing it would be non-trivial, I
don’t intend to fix it.</p>
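
<p>For reference, the usual way to keep Emacs from inserting tabs when
indenting is the setting below. This is the standard variable for the
job, not necessarily the exact form used in my configuration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Indent with spaces only, in every buffer that doesn't override it.
(setq-default indent-tabs-mode nil)
</code></pre></div></div>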

<h3 id="see-also">See Also</h3>

<ul>
  <li><a href="https://github.com/antonj/Highlight-Indentation-for-Emacs/">Highlighting indentation for Emacs</a></li>
  <li><a href="http://emacswiki.org/emacs/ShowWhiteSpace">Show White Space</a></li>
</ul>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Abnormal Termination</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/28/"/>
    <id>urn:uuid:81986f77-1a4a-3b00-ec3a-a6517f9ca4ca</id>
    <updated>2012-09-28T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="debian"/>
    <content type="html">
      <![CDATA[<p><em>Update: This bug was fixed in Emacs 24.4 (released October 2014).</em></p>

<p>A few months ago I <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682995">filed a bug report for Emacs</a>
(<a href="http://lists.gnu.org/archive/html/bug-gnu-emacs/2012-07/msg01071.html">upstream</a>) when I stumbled across Emacs aborting under <em>very</em>
specific circumstances. I was editing in <a href="http://jblevins.org/projects/markdown-mode/">markdown-mode</a> and a
regular expression replacement on lists would reliably, and
frustratingly, cause Emacs to crash.</p>

<p>Through a sort-of binary search I loaded only half of
markdown-mode to see in which half it would trigger, then I cut that
half in half again and repeated recursively until I had it down to a
small expression that causes a <code class="language-plaintext highlighter-rouge">--no-init-file</code> (<code class="language-plaintext highlighter-rouge">-q</code>) Emacs to
abort. It almost looks like I found it through fuzz testing. Change or
remove anything even slightly and it no longer triggers the abort.</p>

<p>To trigger it, there’s an <code class="language-plaintext highlighter-rouge">after-change-functions</code> hook that performs
a regular expression search immediately after a <code class="language-plaintext highlighter-rouge">replace-regexp</code>. A
peek at the backtrace with gdb shows that this somehow causes the
point to leave the bounds of the buffer. An assertion in Emacs catches
this before anything is dereferenced, and Emacs aborts, thus
preventing a buffer overflow vulnerability. This is important for
<a href="/blog/2009/05/17/">my Emacs web server</a> because if there’s a way to
trigger this bug in the web server I’d much rather have it abort than
run arbitrary shellcode injected in by a malicious HTTP request.</p>

<p>My bug report has seen no activity since I posted it. I can understand
why. The circumstances to trigger it are unlikely and it’s a very old
bug, so it’s low priority. It’s also a huge pain to debug. Hacking on
Emacs from Lisp is pleasant but hacking on Emacs from C is not. The
bug likely sits in the bowels of the complicated regular expression
engine, making it even more unpleasant. I personally have no interest
in trying to fix it myself.</p>

<p>So, since it looks like it’s here for the long haul it’s kind of fun
to implement an <code class="language-plaintext highlighter-rouge">abort</code> function on top of it, allowing Elisp programs
to terminate Emacs abnormally — you know, in case <code class="language-plaintext highlighter-rouge">kill-emacs</code> isn’t
fun enough.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nb">abort</span> <span class="p">()</span>
  <span class="s">"Ask Emacs to abnormally terminate itself (bug#12077)."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="s">"#\n*\n"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'after-change-functions</span>
              <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span><span class="p">)</span> <span class="p">(</span><span class="nv">re-search-forward</span> <span class="s">""</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">replace-regexp</span> <span class="s">"^\\*"</span> <span class="s">" *"</span><span class="p">)))</span>
</code></pre></div></div>

<p>It’s interactive so you could even bind a key to it.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Programs as Elisp Macros</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/21/"/>
    <id>urn:uuid:22a67760-c114-3285-fff8-36a6d23f0c65</id>
    <updated>2012-09-21T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>This evening I came across an interesting idea:
<a href="http://sunng.info/blog/2012/09/shake-every-program-can-be-a-clojure-function/">using system programs as functions</a>. Credit for the original idea goes to
<a href="http://amoffat.github.com/sh/index.html"><code class="language-plaintext highlighter-rouge">sh</code></a>, a Python module that
exposes system programs as functions. There’s also a Clojure library
called <a href="https://github.com/sunng87/shake/"><code class="language-plaintext highlighter-rouge">shake</code></a> to do the same thing in Clojure.</p>

<p>Thanks to symbols, I think the idea maps especially well onto Lisp
because arguments don’t need to be provided as strings. Here are some
examples,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(ls -lh)
(uname -a)
(cat /etc/debian_version)
(git checkout -b foo)
</code></pre></div></div>

<p>It’s easy to achieve the same effect in Elisp,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;;; -*- lexical-binding: t; -*-</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'cl</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">make-shell-macro</span> <span class="p">(</span><span class="nv">program</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">fset</span> <span class="nv">program</span>
        <span class="p">(</span><span class="nb">cons</span> <span class="ss">'macro</span>
              <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
                <span class="o">`</span><span class="p">(</span><span class="nv">with-temp-buffer</span>
                   <span class="p">(</span><span class="nb">funcall</span> <span class="nf">#'</span><span class="nv">call-process</span>
                            <span class="o">,</span><span class="p">(</span><span class="nb">symbol-name</span> <span class="nv">program</span><span class="p">)</span> <span class="no">nil</span> <span class="no">t</span> <span class="no">nil</span>
                            <span class="o">,@</span><span class="p">(</span><span class="nb">mapcar</span> <span class="nf">#'</span><span class="nb">prin1-to-string</span> <span class="nv">args</span><span class="p">))</span>
                   <span class="p">(</span><span class="nv">buffer-string</span><span class="p">))))))</span>

<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">path</span> <span class="p">(</span><span class="nb">mapcan</span> <span class="nf">#'</span><span class="nv">directory-files</span> <span class="p">(</span><span class="nv">parse-colon-path</span> <span class="p">(</span><span class="nv">getenv</span> <span class="s">"PATH"</span><span class="p">)))))</span>
  <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">program</span> <span class="p">(</span><span class="nb">remove-if</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">f</span><span class="p">)</span> <span class="p">(</span><span class="nb">member</span> <span class="nv">f</span> <span class="o">'</span><span class="p">(</span><span class="s">"."</span> <span class="s">".."</span><span class="p">)))</span> <span class="nv">path</span><span class="p">))</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nc">symbol</span> <span class="p">(</span><span class="nb">intern</span> <span class="nv">program</span><span class="p">)))</span>
      <span class="p">(</span><span class="nb">unless</span> <span class="p">(</span><span class="nb">fboundp</span> <span class="nc">symbol</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">make-shell-macro</span> <span class="nc">symbol</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Evaluating the above will install macros for all programs in your
<code class="language-plaintext highlighter-rouge">PATH</code>, except where you already have functions or macros defined. I
messed up on the latter point while writing this and broke Emacs
enough to require a restart. The system program is called
synchronously and the output is returned as a string.</p>

<p>However, <em>because</em> these are macros, their arguments aren’t
evaluated, which limits their usefulness. The calls are static and
can’t be passed variable arguments. To allow that, they would have to
be functions, so that arguments are evaluated, and symbols would need
to be quoted. For example,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">git-checkout</span> <span class="p">(</span><span class="nv">branch</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">git</span> <span class="ss">'checkout</span> <span class="nv">branch</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">ls-l</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">ls</span> <span class="ss">'-l</span> <span class="nv">file</span><span class="p">))</span>
</code></pre></div></div>

<p>So I think I’d prefer this interface to the one provided by Clojure’s
<code class="language-plaintext highlighter-rouge">shake</code> (and my Elisp code at the top). I have little need to call
programs with static arguments.</p>
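
<p>As a rough sketch of that preferred interface, the same idea can be
written with functions instead of macros. The names
<code class="language-plaintext highlighter-rouge">my-shell-call</code> and
<code class="language-plaintext highlighter-rouge">my-make-shell-function</code> are hypothetical, and converting
arguments with <code class="language-plaintext highlighter-rouge">format</code> is my own assumption so that
strings, symbols, and numbers can all be passed.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-shell-call (program &amp;rest args)
  "Run PROGRAM synchronously with ARGS and return its output as a string.
Each of ARGS may be a string, a symbol, or a number."
  (with-temp-buffer
    (apply #'call-process program nil t nil
           ;; "%s" prints strings without quotes and symbols by name.
           (mapcar (lambda (arg) (format "%s" arg)) args))
    (buffer-string)))

(defun my-make-shell-function (program)
  "Define the symbol PROGRAM as a function that runs the program of that name."
  ;; Backquote bakes the program name into the lambda, so this works
  ;; with or without lexical binding.
  (defalias program
    `(lambda (&amp;rest args)
       (apply #'my-shell-call ,(symbol-name program) args))))
</code></pre></div></div>

<p>After <code class="language-plaintext highlighter-rouge">(my-make-shell-function 'git)</code>, the
<code class="language-plaintext highlighter-rouge">git-checkout</code> wrapper above should work as written, since
<code class="language-plaintext highlighter-rouge">branch</code> is now evaluated before the call.</p>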
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Recursive Descent Parser (rdp)</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/20/"/>
    <id>urn:uuid:0cb87ff3-6862-3772-6d64-3222ff8e56fe</id>
    <updated>2012-09-20T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<p>I recently developed a recursive descent parser, named rdp, for use in
Emacs Lisp programs. I’ve already used it to write a compiler.</p>

<ul>
  <li><a href="https://github.com/skeeto/rdp">https://github.com/skeeto/rdp</a></li>
</ul>

<p>It’s available as a package on <a href="http://melpa.milkbox.net/">MELPA</a>.</p>

<h3 id="the-long-story">The Long Story</h3>

<p>Last month <a href="http://www.50ply.com/">Brian</a> invited me to take
<a href="http://www.cs.brown.edu/courses/cs173/2012/">a free, online programming languages course</a> with him. You
may recall that <a href="/blog/2011/01/11/">we developed a programming language
together</a> so it was only natural we would take
this class.</p>

<p>The first part of the class is oriented around a small programming
language created just for this class called <a href="http://www.cs.brown.edu/courses/cs173/2012/Assignments/ParselTest/">ParselTongue</a>.  It
looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deffun evenp(x)
    if ==(x, 0) then
        true
    else if ==(x, 1) then
            false
        else evenp(-(x, 2))
in defvar x = 14 in {
    while (evenp(x)) { x--; };   # Make sure x odd
    print("This is an odd number: ");
    print(x);
    ""; # No output
}
</code></pre></div></div>

<p>I’ve gotten so used to having a solid Emacs major mode when coding
that I can’t stand writing code without the support of a major
mode. Since this language was invented recently <em>just</em> for this class
there was no mode for it, nor would there be unless someone stepped up
to make one. I ended up taking that role. It was an opportunity to
learn how to create a major mode, something I had never done before.</p>

<p>It’s called <a href="https://github.com/skeeto/psl-mode">psl-mode</a>.</p>

<p>At first it was just some syntax highlighting (very easy) and some
poor automatic indentation. The indentation function would get
confused by anything non-trivial. It’s actually <em>really</em> hard to get
it right. I’ve gained a much greater appreciation for automatic
indentation in other modes.</p>

<p>In an attempt to improve this I decided I would try to fully parse the
language and use the resulting parse tree to determine indentation —
something like the depth of the pointer in the
tree. <a href="/blog/2009/01/04/">My experience with Perl’s Parse::RecDescent</a>
some years ago was very positive and I wanted to reproduce that
effect. However, rather than write the grammar in a separate language
that mixes in the programming language, which I find extremely messy,
instead I wanted to use pure s-expressions. A grammar looks very nice
as an alist of symbols.</p>

<h4 id="arithmetic-parser">Arithmetic Parser</h4>

<p>For example, here’s a grammar for simple arithmetic expressions,
including operator precedence and grouping (i.e. “4 + 5 * 2.5”,
“(4 + 5) * 2.5”, etc.).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defvar</span> <span class="nv">arith-tokens</span>
  <span class="o">'</span><span class="p">((</span><span class="nv">sum</span>       <span class="nv">prod</span>  <span class="nv">[</span><span class="p">(</span><span class="nv">[+</span> <span class="nv">-]</span> <span class="nv">sum</span><span class="p">)</span>  <span class="nv">no-sum]</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">prod</span>      <span class="nv">value</span> <span class="nv">[</span><span class="p">(</span><span class="nv">[*</span> <span class="nv">/]</span> <span class="nv">prod</span><span class="p">)</span> <span class="nv">no-prod]</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">num</span>     <span class="o">.</span> <span class="s">"-?[0-9]+\\(\\.[0-9]*\\)?"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span>       <span class="o">.</span> <span class="s">"\\+"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">-</span>       <span class="o">.</span> <span class="s">"-"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">*</span>       <span class="o">.</span> <span class="s">"\\*"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">/</span>       <span class="o">.</span> <span class="s">"/"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">pexpr</span>     <span class="s">"("</span> <span class="nv">[sum</span> <span class="nv">prod</span> <span class="nv">num</span> <span class="nv">pexpr]</span> <span class="s">")"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">value</span>   <span class="o">.</span> <span class="nv">[pexpr</span> <span class="nv">num]</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">no-prod</span> <span class="o">.</span> <span class="s">""</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">no-sum</span>  <span class="o">.</span> <span class="s">""</span><span class="p">)))</span>
</code></pre></div></div>

<p>Strings are regular expressions, the only things that actually match
input text (<em>terminals</em>). Lists are <em>sequences</em>, where each element in
the list must match in order. Vectors (in brackets) are <em>choices</em>
where one of the elements must match. Symbols name an expression so
that it can be referred to by other expressions recursively.</p>

<p>Give this alist to the parser and it will return an s-expression of
the parse tree of the current buffer. Due to the way the grammar must
be written this parse tree isn’t really pleasant to handle
directly. For example, a series of multiplications (“1 * 2 * 3 * 4”)
wouldn’t parse to a nice flat list but to a tree that grows one level
deeper for each additional operand.</p>

<p>To help squash these, the parser will accept an alist of symbols and
functions which process the parse tree at parse time. For example,
these corresponding functions will make sure <code class="language-plaintext highlighter-rouge">"4 * 5 * 6"</code> gets parsed
into <code class="language-plaintext highlighter-rouge">(* 4 (* 5 (* 6 1)))</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">arith-op</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">destructuring-bind</span> <span class="p">(</span><span class="nv">a</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">b</span><span class="p">))</span> <span class="nv">expr</span>
    <span class="p">(</span><span class="nb">list</span> <span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">arith-funcs</span>
  <span class="o">`</span><span class="p">((</span><span class="nv">sum</span>     <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nv">arith-op</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">prod</span>    <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nv">arith-op</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">num</span>     <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nv">string-to-number</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">+</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">-</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">*</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">/</span>       <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">intern</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">pexpr</span>   <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">cadr</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">value</span>   <span class="o">.</span> <span class="o">,</span><span class="nf">#'</span><span class="nb">identity</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">no-prod</span> <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">e</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="nb">*</span> <span class="mi">1</span><span class="p">)))</span>
    <span class="p">(</span><span class="nv">no-sum</span>  <span class="o">.</span> <span class="o">,</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">e</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="nb">+</span> <span class="mi">0</span><span class="p">)))))</span>
</code></pre></div></div>

<p>Notice how normal Emacs functions could be supplied directly in most
cases! That makes this approach so elegant in my opinion.</p>

<p>Also, in <code class="language-plaintext highlighter-rouge">arith-op</code> note the use of <code class="language-plaintext highlighter-rouge">destructuring-bind</code>. I’ve found
that macro to be invaluable when writing these syntax tree functions.</p>
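
<p>To see the reshaping in isolation, here is a small sketch that calls
a copy of the function by hand. The input shape
<code class="language-plaintext highlighter-rouge">(a (op b))</code> is inferred from the grammar and from
the expected output above, not taken from rdp’s internals, and the name
<code class="language-plaintext highlighter-rouge">my-arith-op</code> is hypothetical. It uses
<code class="language-plaintext highlighter-rouge">cl-destructuring-bind</code>, the cl-lib spelling of the same
macro.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my-arith-op (expr)
  "Turn EXPR, shaped (A (OP B)), into the prefix form (OP A B)."
  (cl-destructuring-bind (a (op b)) expr
    (list op a b)))

;; "6" alone: the value 6 followed by the no-prod result (* 1).
(my-arith-op '(6 (* 1)))  ; returns (* 6 1)

;; "5 * 6": the right-hand side has already been reduced.
(my-arith-op (list 5 (list '* (my-arith-op '(6 (* 1))))))
;; returns (* 5 (* 6 1))
</code></pre></div></div>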

<p>In this case, we can be even more clever. Rather than build a nice
parse tree, the expression can be evaluated directly. All it takes is
one small change,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">arith-op</span> <span class="p">(</span><span class="nv">expr</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">destructuring-bind</span> <span class="p">(</span><span class="nv">a</span> <span class="p">(</span><span class="nv">op</span> <span class="nv">b</span><span class="p">))</span> <span class="nv">expr</span>
    <span class="p">(</span><span class="nb">funcall</span> <span class="nv">op</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">)))</span>
</code></pre></div></div>

<p>With this, the parser returns the computed value directly. So this
evaluates to 120.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">rdp-parse-string</span> <span class="s">"4 * 5 * 6"</span> <span class="nv">arith-tokens</span> <span class="nv">arith-funcs</span><span class="p">)</span>
</code></pre></div></div>
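
<p>The reduction can be traced by hand without the parser. This is only
a sketch: the nested input shapes are my reading of the grammar, and
<code class="language-plaintext highlighter-rouge">my-arith-eval</code> is a hypothetical stand-in for the
<code class="language-plaintext highlighter-rouge">funcall</code> version of <code class="language-plaintext highlighter-rouge">arith-op</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)

(defun my-arith-eval (expr)
  "Apply the operator in EXPR, shaped (A (OP B)), to A and B."
  (cl-destructuring-bind (a (op b)) expr
    (funcall op a b)))

;; Innermost operand first, as each nested rule finishes.
(let* ((six    (my-arith-eval '(6 (* 1))))                ; 6 * 1
       (thirty (my-arith-eval (list 5 (list '* six))))    ; 5 * 6
       (result (my-arith-eval (list 4 (list '* thirty))))) ; 4 * 30
  result)  ; returns 120
</code></pre></div></div>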

<h4 id="parseltongue-compiler">ParselTongue Compiler</h4>

<p>I discovered this useful side effect while making my ParselTongue
parser. The original intention was that I’d parse the buffer for use
in indentation, then maybe I’d create an interpreter to evaluate the
parser output. However, the resulting parse tree was looking a lot
like Elisp. In an epiphany I realized I could simply emit valid Elisp
directly and forgo writing the interpreter altogether. And so I
accidentally created a ParselTongue compiler! This was incredibly
exciting for me to realize.</p>

<p>This ParselTongue program,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defvar obj = {x: 1} in { obj.x }
</code></pre></div></div>

<p>Compiles to this Elisp,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">obj</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">cons</span> <span class="ss">'x</span> <span class="mi">1</span><span class="p">))))</span>
  <span class="p">(</span><span class="k">progn</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nv">assq</span> <span class="ss">'x</span> <span class="nv">obj</span><span class="p">))))</span>
</code></pre></div></div>

<p>Because it compiles to such a high level language, and because
ParselTongue is very Lisp-like semantically, it’s a bit
unconventional: the compiler emits code <em>during</em> parsing. In fact,
when the parser backtracks, some emitted code is thrown away.</p>
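
<p>Emitting Elisp from a parse tree function can be as small as a
backquote template. The helper below is a hypothetical illustration, not
code from psl-mode. It builds the field lookup shown above as plain list
data, which can then be handed to
<code class="language-plaintext highlighter-rouge">eval</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-emit-lookup (object field)
  "Return an Elisp form that looks up FIELD in the alist named OBJECT."
  `(cdr (assq ',field ,object)))

(my-emit-lookup 'obj 'x)  ; returns (cdr (assq 'x obj))

;; The emitted form is ordinary data until it is evaluated.
(eval `(let ((obj (list (cons 'x 1))))
         ,(my-emit-lookup 'obj 'x))
      t)  ; returns 1
</code></pre></div></div>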

<p>By the end of the first evening I had implemented the majority of the
compiler, which quickly took precedence over indentation. The compiler
is now integrated as part of psl-mode. The current buffer can be
evaluated at any time with <code class="language-plaintext highlighter-rouge">psl-eval-buffer</code>. This function compiles
the buffer and has Emacs <code class="language-plaintext highlighter-rouge">eval</code> the result, printing the output in the
minibuffer. Compiler output can be viewed with
<code class="language-plaintext highlighter-rouge">psl-show-elisp-compilation</code> (mostly for my own debugging).</p>

<p>After a few days I integrated indentation with parsing, which required
modifying the parser (changes included in rdp itself). The parser
needed to keep track of where the point is in the parse tree. For
indentation it basically counts the depth into the parse tree, plus a
few more checks for special cases.</p>

<p>The parser was intentionally isolated from the rest of psl-mode so
that it could be separated for general use, which I have now
done. It’s been a <em>really</em> handy general purpose tool since then. That
arithmetic parser is only 35 lines of code and took about half an hour
to create.</p>

<h4 id="future-directions">Future Directions</h4>

<p>I also <a href="https://github.com/skeeto/emacs-torrent/blob/master/bencode.el">wrote a bencode parser</a>: <em>only</em> the
<code class="language-plaintext highlighter-rouge">bencode-tokens</code> and <code class="language-plaintext highlighter-rouge">bencode-funcs</code> alists are needed to parse
bencode, about 30 LOC. Careful observation will reveal that I cheated
and the result is a little hackish. Due to the way strings work,
bencode is <em>not</em> context-free so it can’t be parsed purely by the
grammar. I can work around it by having the parse tree function for
strings consume input, since it’s called during parsing.</p>
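
<p>The trouble is that a bencode string is length-prefixed, as in
<code class="language-plaintext highlighter-rouge">4:spam</code>, so a regular expression can match the
prefix but not the body. A minimal sketch of the “consume input”
step, independent of rdp and using a hypothetical function name, might
look like this. It counts characters, which only equals bytes in a
unibyte buffer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-bencode-read-string ()
  "Read a bencode string such as \"4:spam\" at point and move past it.
Return nil if point is not at a length prefix."
  (when (looking-at "\\([0-9]+\\):")
    (let* ((len (string-to-number (match-string 1)))
           (start (match-end 0))
           (end (+ start len)))
      ;; The grammar cannot know LEN, so skip the body by hand.
      (goto-char end)
      (buffer-substring-no-properties start end))))

(with-temp-buffer
  (insert "4:spam3:egg")
  (goto-char (point-min))
  (list (my-bencode-read-string) (my-bencode-read-string)))
;; returns ("spam" "egg")
</code></pre></div></div>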

<p>I’ll be using rdp to parse many more things in the future, I’m
sure. It’s much more powerful than I expected.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Fractal Rendering in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/14/"/>
    <id>urn:uuid:71006dd1-ae7b-3860-e82b-4a0affd6524a</id>
    <updated>2012-09-14T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Taking advantage of Emacs’ <code class="language-plaintext highlighter-rouge">image-mode</code> and the handy
<a href="http://en.wikipedia.org/wiki/Netpbm_format">Netpbm format</a> it’s
possible to generate and render images <em>inside</em> Emacs using
Elisp. This function will generate a
<a href="http://en.wikipedia.org/wiki/Sierpi%C5%84ski_carpet">Sierpinski carpet</a>
and display the result in a buffer.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">sierpinski</span> <span class="p">(</span><span class="nv">s</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">pop-to-buffer</span> <span class="p">(</span><span class="nv">get-buffer-create</span> <span class="s">"*sierpinski*"</span><span class="p">))</span>
  <span class="p">(</span><span class="nv">fundamental-mode</span><span class="p">)</span> <span class="p">(</span><span class="nv">erase-buffer</span><span class="p">)</span>
  <span class="p">(</span><span class="k">labels</span> <span class="p">((</span><span class="nv">fill-p</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
                   <span class="p">(</span><span class="nb">cond</span> <span class="p">((</span><span class="nb">or</span> <span class="p">(</span><span class="nb">zerop</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nb">zerop</span> <span class="nv">y</span><span class="p">))</span> <span class="s">"0"</span><span class="p">)</span>
                         <span class="p">((</span><span class="nb">and</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">mod</span> <span class="nv">x</span> <span class="mi">3</span><span class="p">))</span> <span class="p">(</span><span class="nb">=</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">mod</span> <span class="nv">y</span> <span class="mi">3</span><span class="p">)))</span> <span class="s">"1"</span><span class="p">)</span>
                         <span class="p">(</span><span class="no">t</span> <span class="p">(</span><span class="nv">fill-p</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">x</span> <span class="mi">3</span><span class="p">)</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">y</span> <span class="mi">3</span><span class="p">))))))</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"P1\n%d %d\n"</span> <span class="nv">s</span> <span class="nv">s</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">y</span> <span class="nv">s</span><span class="p">)</span> <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">s</span><span class="p">)</span> <span class="p">(</span><span class="nv">insert</span> <span class="p">(</span><span class="nv">fill-p</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)</span> <span class="s">" "</span><span class="p">))))</span>
  <span class="p">(</span><span class="nv">image-mode</span><span class="p">))</span>
</code></pre></div></div>

<p>It’s best called with powers of three,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">sierpinski</span> <span class="p">(</span><span class="nb">expt</span> <span class="mi">3</span> <span class="mi">5</span><span class="p">))</span>
</code></pre></div></div>

<p><a href="/img/fractal/sierpinski.png"><img src="/img/fractal/sierpinski-thumb.png" alt="" /></a></p>
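
<p>The plain-text <code class="language-plaintext highlighter-rouge">P1</code> format used above is just a
header with the width and height followed by one
<code class="language-plaintext highlighter-rouge">0</code> or <code class="language-plaintext highlighter-rouge">1</code> per pixel. As a
smaller illustration, this hypothetical helper returns a checkerboard as
a string instead of displaying it.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-pbm-checkerboard (size)
  "Return a SIZE by SIZE checkerboard as a plain (P1) PBM string."
  (with-temp-buffer
    (insert (format "P1\n%d %d\n" size size))
    (dotimes (y size)
      (dotimes (x size)
        ;; In PBM, 1 is black and 0 is white.
        (insert (if (= 0 (mod (+ x y) 2)) "1" "0") " "))
      (insert "\n"))
    (buffer-string)))

(my-pbm-checkerboard 2)  ; returns "P1\n2 2\n1 0 \n0 1 \n"
</code></pre></div></div>

<p>Inserting that string into a buffer and calling
<code class="language-plaintext highlighter-rouge">image-mode</code>, as
<code class="language-plaintext highlighter-rouge">sierpinski</code> does, should display it.</p>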

<p>This one should <a href="/blog/2007/10/01/">look quite familiar</a>. Using the
same technique,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">mandelbrot</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">pop-to-buffer</span> <span class="p">(</span><span class="nv">get-buffer-create</span> <span class="s">"*mandelbrot*"</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">w</span> <span class="mi">400</span><span class="p">)</span> <span class="p">(</span><span class="nv">h</span> <span class="mi">300</span><span class="p">)</span> <span class="p">(</span><span class="nv">d</span> <span class="mi">32</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">fundamental-mode</span><span class="p">)</span> <span class="p">(</span><span class="nv">erase-buffer</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">set-buffer-multibyte</span> <span class="no">nil</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"P6\n%d %d\n255\n"</span> <span class="nv">w</span> <span class="nv">h</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">y</span> <span class="nv">h</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">x</span> <span class="nv">w</span><span class="p">)</span>
        <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">cx</span> <span class="p">(</span><span class="nb">*</span> <span class="mf">1.5</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">x</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">w</span> <span class="mf">1.45</span><span class="p">))</span> <span class="nv">w</span> <span class="mf">0.45</span><span class="p">)))</span>
               <span class="p">(</span><span class="nv">cy</span> <span class="p">(</span><span class="nb">*</span> <span class="mf">1.5</span> <span class="p">(</span><span class="nb">/</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">y</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">h</span> <span class="mf">2.0</span><span class="p">))</span> <span class="nv">h</span> <span class="mf">0.5</span><span class="p">)))</span>
               <span class="p">(</span><span class="nv">zr</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nv">zi</span> <span class="mi">0</span><span class="p">)</span>
               <span class="p">(</span><span class="nv">v</span> <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="nv">d</span> <span class="nv">d</span><span class="p">)</span>
                    <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&gt;</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zr</span> <span class="nv">zr</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zi</span> <span class="nv">zi</span><span class="p">))</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nb">return</span> <span class="nv">i</span><span class="p">)</span>
                      <span class="p">(</span><span class="nb">psetq</span> <span class="nv">zr</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zr</span> <span class="nv">zr</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zi</span> <span class="nv">zi</span><span class="p">))</span> <span class="nv">cx</span><span class="p">)</span>
                             <span class="nv">zi</span> <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">zr</span> <span class="nv">zi</span><span class="p">)</span> <span class="mi">2</span><span class="p">)</span> <span class="nv">cy</span><span class="p">))))))</span>
          <span class="p">(</span><span class="nv">insert-char</span> <span class="p">(</span><span class="nb">floor</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">256</span> <span class="p">(</span><span class="nb">/</span> <span class="nv">v</span> <span class="mf">1.0</span> <span class="nv">d</span><span class="p">)))</span> <span class="mi">3</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">image-mode</span><span class="p">)))</span>
</code></pre></div></div>

<p><img src="/img/fractal/elisp-mandelbrot.png" alt="" /></p>

<p>Tweak it with a colormap,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">colormap</span> <span class="p">(</span><span class="nv">v</span><span class="p">)</span>
  <span class="s">"Given a value between 0 and 1.0, insert a P6 color."</span>
  <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">3</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">insert-char</span> <span class="p">(</span><span class="nb">floor</span> <span class="p">(</span><span class="nb">*</span> <span class="mi">256</span> <span class="p">(</span><span class="nb">min</span> <span class="mf">0.99</span> <span class="p">(</span><span class="nb">sqrt</span> <span class="p">(</span><span class="nb">*</span> <span class="p">(</span><span class="nb">-</span> <span class="mi">3</span> <span class="nv">i</span><span class="p">)</span> <span class="nv">v</span><span class="p">)))))</span> <span class="mi">1</span><span class="p">)))</span>
</code></pre></div></div>

<p><img src="/img/fractal/elisp-mandelbrot-color.png" alt="" /></p>
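
<p>To check what the colormap produces without touching a buffer, here
is a side-effect-free sketch of the same formula. The name
<code class="language-plaintext highlighter-rouge">my-colormap-rgb</code> is hypothetical. Each channel scales
<code class="language-plaintext highlighter-rouge">v</code> by a different factor before the square root, and
the <code class="language-plaintext highlighter-rouge">min</code> keeps every result within a byte.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-colormap-rgb (v)
  "Return the list (R G B) of bytes that `colormap' would insert for V."
  (mapcar (lambda (i)
            (floor (* 256 (min 0.99 (sqrt (* (- 3 i) v))))))
          '(0 1 2)))

(my-colormap-rgb 0.0)   ; returns (0 0 0)
(my-colormap-rgb 0.25)  ; returns (221 181 128)
(my-colormap-rgb 1.0)   ; returns (253 253 253)
</code></pre></div></div>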

<p>One of the project ideas on my mental back-burner of things I’ll never
get to is to create a little graphics library for Elisp. It would use
a technique like this to pull it off. Assuming support was compiled
in, Emacs can even render SVGs to a buffer, so creating a rich
graphics library wouldn’t be difficult at all. Plus, unlike bare
Elisp, it would be <em>fast</em>.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Markov Chain Text Generation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/05/"/>
    <id>urn:uuid:3f808165-be65-3f4b-f485-8df6aacccd04</id>
    <updated>2012-09-05T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/><category term="ai"/>
    <content type="html">
      <![CDATA[<p>You may have been confused by
<a href="/blog/2012/09/04/">yesterday’s nonsense post</a>. That’s because it was
generated by a few
<a href="https://github.com/skeeto/markov-text">Elisp Markov chain functions</a>. It
was fed my entire blog and used to generate a ~1500 word post.  I
tidied up a bit to make sure the markup was valid and parentheses were
balanced, but that’s about it.</p>

<p>The algorithm is really simple and I was quite surprised by the
quality of the output. After feeding it <em>Great Expectations</em> and <em>A
Princess of Mars</em> (easily obtainable from
<a href="http://www.gutenberg.org/">Project Gutenberg</a>) I had a good laugh at
some of the output. Some choice quotes,</p>

<blockquote>
  <p>He wiped himself again, as if he didn’t marry her by hand.</p>
</blockquote>

<blockquote>
  <p>I admit having done so, and the summer afternoon toned down into the
house.</p>
</blockquote>

<p>My favorite of yesterday’s post was this one,</p>

<blockquote>
  <p>Suppose you want to read a great story, I recommend it.</p>
</blockquote>

<p>The output also looks like some types of spam, so this may be how some
spammers generate content in order to get around spam filters.</p>

<p>To build a Markov chain from input, the program looks at
<code class="language-plaintext highlighter-rouge">markov-text-state-size</code> words (default 3) and makes note of what word
follows. Then it slides the window forward one word and repeats. To
generate text, the last <code class="language-plaintext highlighter-rouge">markov-text-state-size</code> words of output form
the state and the next word is selected from these notes at random,
weighted by the frequency of its appearance in the input text. Smaller
state sizes generate more random output and larger state sizes
generate better structured output. Too large and the output is the
input verbatim.</p>

<p>For example, given this sentence and a state size of <em>two</em> words,</p>

<blockquote>
  <p>Quickly, he ran and he ran until he couldn’t.</p>
</blockquote>

<p>The produced chain looks like this in alist form,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((("Quickly," "he") "ran")
 (("he" "ran") "and" "until")
 (("ran" "and") "he")
 (("and" "he") "ran")
 (("ran" "until") "he")
 (("until" "he") "couldn't.")
 (("he" "couldn't.")))
</code></pre></div></div>

<p><a href="/img/diagram/markov-chain.gv"><img src="/img/diagram/markov-chain.png" alt="" /></a></p>
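
<p>As a rough illustration of the window-sliding step, here is a minimal
Elisp sketch that rebuilds the alist above from the example sentence. The
function name <code class="language-plaintext highlighter-rouge">my-markov-build</code> is made up for this example and is not
taken from the <code class="language-plaintext highlighter-rouge">markov-text</code> package, whose actual implementation may
differ. Repeated followers are kept as duplicates, which is one simple
way to get frequency weighting.</p>

```elisp
(require 'cl-lib)

(defun my-markov-build (words state-size)
  "Return an alist mapping each STATE-SIZE-word state in WORDS to its followers.
Hypothetical helper for illustration only."
  (let ((chain '()))
    (while (>= (length words) state-size)
      (let* ((state (cl-subseq words 0 state-size))
             (next (nth state-size words))
             (entry (assoc state chain)))
        ;; First time this state is seen: start an entry with no followers
        (unless entry
          (setq entry (list state))
          (push entry chain))
        ;; Note the word that follows, keeping duplicates for weighting
        (when next
          (setcdr entry (append (cdr entry) (list next)))))
      ;; Slide the window forward one word
      (setq words (cdr words)))
    (nreverse chain)))

(my-markov-build
 (split-string "Quickly, he ran and he ran until he couldn't.") 2)
```

<p>The final state has no followers because the input ends there, which
is what lets a generator know when to stop.</p>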

<p>Because there are two options for (“he” “ran”), the generator might
loop around that state for a while like so,</p>

<blockquote>
  <p>Quickly, he ran and he ran and he ran and he ran until he couldn’t.</p>
</blockquote>

<p>Or it might skip the section altogether,</p>

<blockquote>
  <p>Quickly, he ran until he couldn’t.</p>
</blockquote>
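
<p>The generation step can be sketched as a random walk over that alist.
This assumes the chain is exactly the alist shown earlier, and
<code class="language-plaintext highlighter-rouge">my-markov-generate</code> is a hypothetical name, not the package’s API.
Picking uniformly from a follower list that retains duplicates would give
the frequency weighting for free.</p>

```elisp
(defvar my-example-chain
  '((("Quickly," "he") "ran")
    (("he" "ran") "and" "until")
    (("ran" "and") "he")
    (("and" "he") "ran")
    (("ran" "until") "he")
    (("until" "he") "couldn't.")
    (("he" "couldn't.")))
  "The example chain; a state without followers ends the walk.")

(defun my-markov-generate (chain state)
  "Return the list of words produced by walking CHAIN from STATE."
  (let ((output (copy-sequence state))
        followers)
    ;; Stop once the current state has no recorded followers
    (while (setq followers (cdr (assoc state chain)))
      (let ((next (nth (random (length followers)) followers)))
        (setq output (append output (list next))
              ;; Slide the window: drop the oldest word, add the new one
              state (append (cdr state) (list next)))))
    output))

(mapconcat #'identity
           (my-markov-generate my-example-chain '("Quickly," "he"))
           " ")
```

<p>Each run loops through “and he ran” some random number of times,
possibly zero, before reaching “until he couldn’t.”</p>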

<p>Also notice that the punctuation is part of the word. This makes the
output more natural, automatically forming sentences. Moreover, my
program also holds onto all newlines. This breaks the output into nice
paragraphs without any extra effort. Since I wrote it in Elisp, I use
<code class="language-plaintext highlighter-rouge">fill-paragraph</code> to properly wrap the paragraphs as I generate them,
so superfluous single newlines don’t hurt anything.</p>

<p>One problem I did run into with my input text was quotes. I was using
novels so there is a lot of quoted text (character dialog). The
generated text tends to balance quotes poorly. My solution for the
moment is to strip these out along with spaces when forming
words. That’s still not ideal.</p>

<p>I’m going to play with this a bit more, using it as a tool for other
project ideas (ERC bot, etc.). I already did this by including a
<a href="http://en.wikipedia.org/wiki/Lorem_ipsum"><em>lorem ipsum</em></a> generator
alongside the <code class="language-plaintext highlighter-rouge">markov-text</code> package. The input text is Cicero’s <em>De
finibus bonorum et malorum</em>, the original source of <em>lorem
ipsum</em>. This was actually the original inspiration for this project,
after I saw <code class="language-plaintext highlighter-rouge">lorem-ipsum.el</code> on EmacsWiki and decided I could do
better.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Implemented Is Simple Data Compression</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/09/04/"/>
    <id>urn:uuid:2dd8265c-e498-333f-7f07-4fd93d873975</id>
    <updated>2012-09-04T00:00:00Z</updated>
    <category term="emacs"/><category term="lisp"/><category term="ai"/>
    <content type="html">
      <![CDATA[<p><em>Update</em>: This post shouldn’t make sense to anyone
(hopefully). <a href="/blog/2012/09/05/">Read the follow-up</a> for an
explanation.</p>

<hr />

<p>When a branch of my posts remains simple.</p>

<p>This is necessary when one will assume Alan is more important than
number 12. By using numbers to repeat them, but this won’t work with
any sort of thing you want to load what’s needed. This includes
reimplementing the reader as it seems you still need to specify any
video-specific parameters, <code class="language-plaintext highlighter-rouge">ppmtoy4m</code> is the whole thing is just that,
decorated with some tips on how the current space as visited, then
recurse from the client to read a great story, I recommend you use to
launch a daemon process and prints the variable information to
stdout. As an added bonus, when a second variable for accumulation and
a second argument is relevant.</p>

<p>Suppose you want to read a great story, I recommend it.</p>

<p>This servlet uses the Term::ProgressBar, if it’s any good, but it’s
funny. As anyone with cats knows, it’s not <em>too</em> stupid to call
<code class="language-plaintext highlighter-rouge">fsync()</code> to force the write to the snapshot and uninterns any new
symbols. These symbols will be added to the the second experiment.</p>

<p>At this line, you can perform a number from a couple of these and give
them back any other language that can turn out even from a large
header comment in the logs, so getting someone into my honeypot
wouldn’t take long at all. The only proof I could then
cherry-pick/pull the issues from that repository and see the
polynomial interpolation at that time, presented in order. This makes
so much of web development (I think that’s his name). I am an Emacs
person myself, which I use branches all the time, now that they can be
written.</p>

<p>We will run your build system in a web front-end to it, and made a
couple of seconds.</p>

<p>You should also be a good head start, though. The SPARC is big-endian
and the results to seed their program accordingly. You could do this
is by mounting the compromised filesystem in a list. In the
decentralized model, everyone has their own solutions in parallel when
it comes across 10 it emits 0.</p>

<p>Here’s an example of some of the fire gem activated and exploded,
causing no blindness to me. They take a look at the same level as the
printed string. You can grab my source code in response to abuse by
spammers who hide fraudulent URLs behind shortened ones. If these
services ever went down all at once, these shortened URLs would rot,
destroying many of the image, with the FFI.</p>

<p>Because I wrote a shell script that will also remove the execs and
live with nested shells because the zeros cancel out everything else?
Here is the protocol.</p>

<p>Generate a 10-byte random IV. This need not implement this.</p>

<p>Note that the shell script, and the arcfour key scheduler at least n
days.</p>

<p>However, generating a series of commits to all other encounters
nothing changes.</p>

<p>Your program should simulate this by having the user to reseed
somewhere. There’s no direct way to install it to dominate for
awhile. It is strange that Matlab itself doesn’t have any sort of
syntax highlighting. Boring! I finally ran into this image. After each
paste, make a saving throw to prevent an explosion.</p>

<p>Because Gnohkk would also suffer from the bottom are arranged around
the cats in the logs, so getting someone into my honeypot wouldn’t
take long at the link in the block. Another was going to used a
stationary magnet.</p>

<p>Our team went with this array (and replaced the current layer 5). Now,
duplicate the work was done just once by freeing the entire number, it
can perform both compression and decompression on both sides don’t pay
attention to the development loop is just an ordered list of 50 H’s
and T’s. If you implement this in the same time. This is along the
way, clone my repository right into the official website so I had to
do this for any long-blocking function that I use <code class="language-plaintext highlighter-rouge">ppmtoy4m</code> to pipe
the new frames to keep, such as n^p mod M, which this will handle
efficiently. For example, to add a new compression algorithm in terms
of brute-force attacks it requires using numbers long enough to fit
three Emacs’ windows side-by-side at 78 columns each. The leftmost one
contains my active work buffer where I do most useful things, a fresh
array every time it sees a free musical.  Unfortunately, my writing
skills are even worse. I have gotten good mileage out of a file based
on their website demonstrating how to increment the iterator. I have
to type a negative comment about zip archives and moved on. I <em>am</em>
using a constant amount of memory.</p>

<p>It turns out that everyone is free to share his source code samples,
particularly more recent entries, was that producing the relief
surface was an e-mail address, I get home from work I don’t recommend
doing this with secret Java applets.</p>

<p>There are a few weeks since I last used KOffice, so I could easily
plug it into Emacs and run the test above, I would rather not do
damage, but rather a patient human being. Getting tired of manually
synchronizing them. It was finally time to document the effort as a
single mine is destroyed, the neighboring mines will replicate a
replacement. The minefield itself could therefore hold no secrets
whatsoever. This leaves out any possibility of a rumor among a group
of people. At any given time, each person in the background. My shell
habits looked like the ones you’re seeing after <code class="language-plaintext highlighter-rouge">end-package</code>.</p>

<p>It’s really simple way to detect edges all over the weekend I came up
with some rough edges. So I got it right while IE, Opera, Safari, and
Chrome all do it again.</p>

<p>Numbers can be found inside the fake closure provided by
lexical-let. In a previous post about Lua, another about a third of my
name generation code.</p>

<p>S-expressions are handy anywhere.</p>

<p>Two months ago I was so happy when I run the program with the proper
Perl regular expression contains quotes and these will not be worth
it.</p>

<p>I can’t help but think that a knight moving according to the current
symbol table to the existing mountain of elisp code out there,
requiring a massive increase in speed when using OpenCL. In fact,
there is virtually no computation involved. So what I want to look
like SBCL. Fortunately, that’s not all!!! There is a fake service or
computer on a chess board such that it’s somewhat easier to tell when
the handler can present any contents it wants. In this case, rather
than just one, even though I don’t know what it looks good, except you
want to italicize a few bits smaller than a minute. All the other day
I will probably be ordered by their own directory. Modern applications
have moved into a directory under <code class="language-plaintext highlighter-rouge">~/.config/</code>. Your script needs to
be broken into small computation units, because Emacs lacked network
functionality until recently was the package manager, <code class="language-plaintext highlighter-rouge">package</code>, and
the Emacs Lisp Package Archive.</p>

<p>One of the info field in the list, which sounds like a .emacs file in
your program. If the slot is already taken, the symbol was in an
external system.</p>

<p>After all this, I thought I’d give it a YouTube URL and a single
password if the required artifacts, digitally signs them, and bundles
them up.</p>

<p>The demo at the same length as the variable declarations are exactly
the right magical string of, say, 31 fractions.</p>

<p>The story is really happening. Optimizing away variables that point to
it.</p>

<p>Oh, and I was just a tiny subset of the memory at once became a lot of
memory. For example, here’s my laptop’s /bin/ls, very roughly
labeled.</p>

<p>The different segments of the game area was a mistake on my rolls and
had some wires, connected to some sort of bad things this may happen
subconsciously, which is given in ImageMagick’s montage tool, which
made the final montage out of the image functions described below.</p>

<p>You can write a lexer or tokenizer without one. Because of this tool,
Samuel Stoddard, gives some in-game context to the light of day. I
just use your own program, the script in your load-path somewhere.</p>

<p>I’ve frequently thought that a Lisp-based shell would be produced by
first individually gzipping each file in first.</p>

<p>For a long ways away from a simple double-click shortcut. If you just
want to duplicate the remaining canines. Her reward for victory was a
very similar process, but without any sort of thing is
transparent. I’ve already used it with a degree in, say, a few
months. I’ve used POSIX threads, Pthreads, before, so it suits my
needs for the first two arguments from filter2, as well as some more
to see my changes, but I don’t know much about it, user AJR spoiled it
with ssh-add and it queries for your passphrase, storing it in two
obarrays at once, these shortened URLs would rot, destroying many of
its input. For example, this is what registration looks like,</p>

<p>Unfortunately, the HTML output is a Harsh Mistress. If you know that
the opposite way that the adventures and characters are riddled with
mistakes and very unbalanced. For an easier way to set up properly in
your configuration.</p>

<p>I strongly recommend that you generally want to have a master pad, K,
that you often generate very improbable series of commits.</p>

<p>To all other encounters nothing changes.</p>

<p>And that’s it! I put this line in your program. If you are subscribed
to the rescue!</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>simple-httpd and impatient-mode</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/20/"/>
    <id>urn:uuid:627e438d-36e5-3a9d-dd35-6a9a3914a63a</id>
    <updated>2012-08-20T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>After <a href="/blog/2012/08/12/">settling in with MELPA</a> I wanted to see
about turning <a href="/blog/2009/05/17/">my Emacs web server</a> into an
installable package. Someone had already uploaded my code to
<a href="http://marmalade-repo.org/">Marmalade</a> after taking credit for all
the work and slapping the GPL on it (my version is public domain). So,
due to that and because the name <code class="language-plaintext highlighter-rouge">httpd.el</code> is already overloaded as
it is, I renamed it to <code class="language-plaintext highlighter-rouge">simple-httpd</code>. That’s the name of the package
in MELPA.</p>

<p>I did more than rename the package; it got an overhaul. I rewrote a
few functions, tossed a whole bunch of functions, created
<a href="/blog/2012/08/15/">a test suite</a>, and <strong>finally added directory
listings</strong> — a feature that had long been on the TODO list. To keep
with the name “simple”, I ripped out the
<a href="/blog/2009/11/03/">clunky servlet system</a> (sorry Chunye). This new
version was leaner, cleaner, and more useful.</p>

<p>I’ve definitely improved my software development skill over the last
three years since I originally wrote it. In my refactor I made it
buffer oriented. When a request comes in, the server fills a buffer
with the response and sends it back. This means I could send a
<code class="language-plaintext highlighter-rouge">Content-Length</code> header and use keep-alive to serve multiple requests
over one connection. It also suggested a new servlet paradigm — the
servlet prepares a buffer and the server sends it to the client.</p>

<h3 id="servlets">Servlets</h3>

<p>So I ended up adding servlet support again, from scratch. This time
it’s really easy to use. Here’s a “Hello, World” servlet,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet</span> <span class="nv">hello-world</span> <span class="nv">text/plain</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"Hello, World"</span><span class="p">))</span>
</code></pre></div></div>

<p>The “function name” part is the path to the servlet. This one would be
found at <code class="language-plaintext highlighter-rouge">/hello-world</code>. The second is the MIME type as a
symbol. We’re just sending plain text in this example. The third is
the argument list. A servlet takes up to three arguments: the path,
the query alist, and the full request object (which includes the first
two). Unless a more specific servlet is defined, this servlet handles
everything under its root: in this case <code class="language-plaintext highlighter-rouge">/hello-world</code>, including
<code class="language-plaintext highlighter-rouge">/hello-world/foo</code> and <code class="language-plaintext highlighter-rouge">/hello-world/foo/bar.txt</code>. This is why the
path argument is relevant.</p>

<p>This servlet uses the path to get a name,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet</span> <span class="nv">hello</span> <span class="nv">text/plain</span> <span class="p">(</span><span class="nv">path</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">insert</span> <span class="s">"hello, "</span> <span class="p">(</span><span class="nv">file-name-nondirectory</span> <span class="nv">path</span><span class="p">)))</span>
</code></pre></div></div>

<p>If you visit <code class="language-plaintext highlighter-rouge">/hello/Chris</code> it will send you “hello, Chris”. Servlets
are trivial to write!</p>

<p>This one serves the contents of the <code class="language-plaintext highlighter-rouge">*scratch*</code> buffer,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">defservlet</span> <span class="nv">scratch</span> <span class="nv">text/plain</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">insert-buffer-substring</span> <span class="p">(</span><span class="nv">get-buffer</span> <span class="s">"*scratch*"</span><span class="p">)))</span>
</code></pre></div></div>

<p>In the background I continue to use Chunye’s symbol dispatch
technique, so all servlets are actually functions that begin with
<code class="language-plaintext highlighter-rouge">httpd/</code> (<code class="language-plaintext highlighter-rouge">httpd/hello-world</code> and <code class="language-plaintext highlighter-rouge">httpd/hello</code>). For a more advanced
servlet, the function can be written directly. There’s another macro,
<code class="language-plaintext highlighter-rouge">with-httpd-buffer</code>, to help keep this simple. The server will always
pass four arguments (the three servlet arguments plus one more), so
when creating the function directly it needs to accept at least four
arguments.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">httpd/hello</span> <span class="p">(</span><span class="nv">proc</span> <span class="nv">path</span> <span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-httpd-buffer</span> <span class="nv">proc</span> <span class="s">"text/plain"</span>
    <span class="p">(</span><span class="nv">insert</span> <span class="s">"hello, "</span> <span class="p">(</span><span class="nv">file-name-nondirectory</span> <span class="nv">path</span><span class="p">))))</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">proc</code> object here is the network connection process, providing
more exclusive access to the client. This allows the servlet to do
more interesting things like respond in the future (long polls). The
<code class="language-plaintext highlighter-rouge">with-httpd-buffer</code> macro creates a temporary buffer and, when the
body completes, sends an HTTP header and the buffer as the content,
similar to <code class="language-plaintext highlighter-rouge">defservlet</code>.</p>

<p>With access to the process, the servlet can do more specialized things
like send custom headers with <code class="language-plaintext highlighter-rouge">httpd-send-header</code>, send files with
<code class="language-plaintext highlighter-rouge">httpd-send-file</code>, send an error page with <code class="language-plaintext highlighter-rouge">httpd-error</code>, or do
redirects with <code class="language-plaintext highlighter-rouge">httpd-redirect</code>. The file server part of the server is
actually just another servlet as well: <code class="language-plaintext highlighter-rouge">httpd/</code>. This could be
redefined to redirect the browser to our example servlet (HTTP 301).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">httpd/</span> <span class="p">(</span><span class="nv">proc</span> <span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">httpd-redirect</span> <span class="nv">proc</span> <span class="s">"/hello-world"</span><span class="p">))</span>
</code></pre></div></div>

<h3 id="impatient-mode">impatient-mode</h3>

<p>I showed this to <a href="http://50ply.com">Brian</a>, like I do everything, and
he found my servlet concept to be compelling, especially the
buffer-serving servlet. I believe his exact words were, “That’s so
simple.” He found it interesting enough that
<a href="http://www.50ply.com/blog/2012/08/13/introducing-impatient-mode/">he wrote a mode based on it called <code class="language-plaintext highlighter-rouge">impatient-mode</code></a>!</p>

<p>It serves a buffer’s content live to the web browser, including syntax
highlighting (via htmlize). Updates to the buffer are communicated by
a long-poll. The browser initiates a request in the background for an
update. Emacs adds the request to a list. A hook in
<code class="language-plaintext highlighter-rouge">after-change-functions</code> updates all the browsers waiting for an
update.</p>
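
<p>The long-poll bookkeeping can be sketched without any networking. In
this hypothetical version plain functions stand in for the parked browser
requests, and every <code class="language-plaintext highlighter-rouge">my-</code> name is invented for illustration; it is not
how <code class="language-plaintext highlighter-rouge">impatient-mode</code> itself is written.</p>

```elisp
(defvar-local my-waiting-clients nil
  "Callbacks standing in for browser requests awaiting an update.")

(defun my-long-poll (client)
  "Park CLIENT, a one-argument function, until the buffer next changes."
  (push client my-waiting-clients))

(defun my-notify-clients (_beg _end _len)
  "Answer every waiting client with the buffer text, then forget them."
  (let ((clients my-waiting-clients)
        (text (buffer-string)))
    ;; Each client is answered once; it must poll again for more updates
    (setq my-waiting-clients nil)
    (dolist (client clients)
      (funcall client text))))

(defun my-publish-buffer ()
  "Notify waiting clients whenever the current buffer changes."
  (add-hook 'after-change-functions #'my-notify-clients nil t))

(with-temp-buffer
  (my-publish-buffer)
  (my-long-poll (lambda (text) (message "client got: %S" text)))
  (insert "hello"))
```

<p>Clearing the list before answering means a client that immediately
polls again is queued for the next change rather than the current
one.</p>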

<p>Enabling <code class="language-plaintext highlighter-rouge">impatient-mode</code>, a minor mode, publishes the buffer. If the
server’s running, the list of published buffers can be found under
<code class="language-plaintext highlighter-rouge">/imp</code> —
i.e. <a href="http://localhost:8080/imp">http://localhost:8080/imp</a>. The
buffer can be accessed directly at <code class="language-plaintext highlighter-rouge">/imp/live/&lt;buffer-name&gt;</code>, which is
where <code class="language-plaintext highlighter-rouge">/imp</code> will link.</p>

<p>Perhaps the coolest thing is serving an HTML buffer <em>without</em>
htmlize. That is, send the raw buffer as <code class="language-plaintext highlighter-rouge">text/html</code>. Brian has a demo
of this in the linked post. You can tweak CSS and HTML and watch it
update live in the browser as you edit. It’s a really neat way to edit
CSS, since it’s often unintuitive (at least for me).</p>

<p><code class="language-plaintext highlighter-rouge">impatient-mode</code> can also be installed through MELPA.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Unit Testing with ERT</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/15/"/>
    <id>urn:uuid:f5798f49-155b-3038-a8d9-4f5a5f1c2d0c</id>
    <updated>2012-08-15T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p>Emacs 24 comes with a unit testing library, ERT (Emacs Lisp
Regression Testing). I learned about it after watching
<a href="http://emacsrocks.com/">Extending Emacs Rocks!</a> and I’ve been using
it ever since. It’s been a pleasant experience; enough so that
<a href="https://github.com/skeeto/.emacs.d/commit/59d3eac73edbad8a5be72a81c7d6c5b1193bbb90">I made a key binding for it</a>
so that I can effortlessly run tests at any time. When I recently made
a major overhaul to my Emacs web server I added
<a href="https://github.com/skeeto/emacs-http-server/blob/master/simple-httpd-test.el">a small test suite</a>
using ERT.</p>

<p>Emacs also comes with the ERT manual so it’s easy to start learning,
but here’s the gist of it. There are essentially two macros to worry
about: <code class="language-plaintext highlighter-rouge">ert-deftest</code> and <code class="language-plaintext highlighter-rouge">should</code>. The first is used to create tests
and the second behaves like <code class="language-plaintext highlighter-rouge">assert</code> but with nicer behavior. Here’s
an example,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">example-test</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">9</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">11</span><span class="p">)))</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">ert-deftest</code> is what you’d expect from every other <code class="language-plaintext highlighter-rouge">def*</code>. The empty
parameter list does nothing at the moment other than to make it feel
like writing a <code class="language-plaintext highlighter-rouge">defun</code>. The body is evaluated as normal. This is all
turned into an anonymous function which is stuffed in the <em>plist</em> of
the symbol <code class="language-plaintext highlighter-rouge">example-test</code>. When it comes time to run tests, they
are found by searching the plists of every interned symbol.</p>

<p>The other macro, <code class="language-plaintext highlighter-rouge">should</code>, takes one argument: a form that <em>should</em>
evaluate to true. There is also a <code class="language-plaintext highlighter-rouge">should-not</code> and a <code class="language-plaintext highlighter-rouge">should-error</code>,
which do what you would expect.</p>
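
<p>A quick sketch of the other two macros, using a made-up test name.
The <code class="language-plaintext highlighter-rouge">:type</code> keyword narrows <code class="language-plaintext highlighter-rouge">should-error</code> to a particular error
condition.</p>

```elisp
(require 'ert)

(ert-deftest my-should-variants-test ()
  ;; Passes when the form evaluates to nil
  (should-not (= (+ 9 2) 100))
  ;; Passes when the form signals any error
  (should-error (/ 1 0))
  ;; Passes only when the error is of the given condition
  (should-error (/ 1 0) :type 'arith-error))
```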

<p>Tests are run with <code class="language-plaintext highlighter-rouge">M-x ert</code>. It will ask for a <em>test selector</em>, where
<code class="language-plaintext highlighter-rouge">t</code> selects all defined tests. There are many ways to select a subset
of all tests (<code class="language-plaintext highlighter-rouge">:new</code>, <code class="language-plaintext highlighter-rouge">:passed</code>, <code class="language-plaintext highlighter-rouge">:failed</code>, etc.) but I usually just
run all of them (as my key binding makes obvious). The results are
displayed in a separate pop-up buffer which, as usual, can be
dismissed with <code class="language-plaintext highlighter-rouge">q</code>.</p>

<h3 id="running-ert">Running ERT</h3>

<p>What makes <code class="language-plaintext highlighter-rouge">should</code> special is error reporting. When tests fail you
will be provided with the forms that failed and their return
values. For example, suppose we modify the test above to fail,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">example-test</span> <span class="p">()</span>
  <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">=</span> <span class="p">(</span><span class="nb">+</span> <span class="mi">9</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">100</span><span class="p">)))</span>
</code></pre></div></div>

<p>Then run the test and it will note the failure. There is also some red
coloring not captured here.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">F</span> <span class="nv">example-test</span>
    <span class="p">(</span><span class="nv">ert-test-failed</span>
     <span class="p">((</span><span class="nv">should</span>
       <span class="p">(</span><span class="nb">=</span>
        <span class="p">(</span><span class="nb">+</span> <span class="mi">9</span> <span class="mi">2</span><span class="p">)</span>
        <span class="mi">100</span><span class="p">))</span>
      <span class="ss">:form</span>
      <span class="p">(</span><span class="nb">=</span> <span class="mi">11</span> <span class="mi">100</span><span class="p">)</span>
      <span class="ss">:value</span> <span class="no">nil</span><span class="p">))</span>
</code></pre></div></div>

<p>Displayed are the forms we were comparing — <code class="language-plaintext highlighter-rouge">(+ 9 2)</code> and <code class="language-plaintext highlighter-rouge">100</code> — and
what they evaluated to: <code class="language-plaintext highlighter-rouge">(= 11 100)</code>. If I put the point at the test
result and type <code class="language-plaintext highlighter-rouge">.</code> it will take me to the test definition so that I
can start looking further. Or I can press <code class="language-plaintext highlighter-rouge">b</code> to see a backtrace, <code class="language-plaintext highlighter-rouge">m</code>
to see all output messages from that test, or, if I’m in disbelief,
<code class="language-plaintext highlighter-rouge">r</code> to rerun that test.</p>

<h3 id="mocking">Mocking</h3>

<p>Elisp’s dynamic bindings really come in handy when functions need to
be mocked. For example, say I have a function that, at some point,
needs to check whether or not a particular file exists. This would be
done using <code class="language-plaintext highlighter-rouge">file-exists-p</code>. Creating or removing the file in the
filesystem before the test isn’t a well-contained unit test. Tests
running in parallel could interfere and there are a number of ways
something could go wrong.</p>

<p>Instead I’ll temporarily override the definition of <code class="language-plaintext highlighter-rouge">file-exists-p</code>
with a <em>mock</em> function using <code class="language-plaintext highlighter-rouge">let</code>’s cousin, <code class="language-plaintext highlighter-rouge">flet</code>. Note that
<code class="language-plaintext highlighter-rouge">file-exists-p</code> is a C source function but I can still override it as
if it were any regular Lisp function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">determine-next-action</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nv">file-exists-p</span> <span class="s">"death-star-plans.org"</span><span class="p">)</span>
      <span class="ss">'bring-him-the-passengers</span>
    <span class="ss">'tear-this-ship-apart</span><span class="p">))</span>

<span class="p">(</span><span class="nv">ert-deftest</span> <span class="nv">file-check-test</span> <span class="p">()</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">file-exists-p</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span> <span class="no">t</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">determine-next-action</span><span class="p">)</span> <span class="ss">'bring-him-the-passengers</span><span class="p">)))</span>
  <span class="p">(</span><span class="k">flet</span> <span class="p">((</span><span class="nv">file-exists-p</span> <span class="p">(</span><span class="nv">file</span><span class="p">)</span> <span class="no">nil</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">should</span> <span class="p">(</span><span class="nb">eq</span> <span class="p">(</span><span class="nv">determine-next-action</span><span class="p">)</span> <span class="ss">'tear-this-ship-apart</span><span class="p">))))</span>
</code></pre></div></div>

<p>This is a very simple mock. For a real unit test I might want the mock
to return <code class="language-plaintext highlighter-rouge">t</code> for some filename patterns and <code class="language-plaintext highlighter-rouge">nil</code> for others. There’s
an extension to ERT, <code class="language-plaintext highlighter-rouge">el-mock.el</code>, which assists in creating more
complex mocks, but I haven’t used or needed it yet.</p>
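
<p>A pattern-sensitive mock could look something like the sketch below. It
assumes <code class="language-plaintext highlighter-rouge">cl-letf</code> from <code class="language-plaintext highlighter-rouge">cl-lib</code> (Emacs 24.3 or later), which rebinds
the function cell dynamically much like the older <code class="language-plaintext highlighter-rouge">flet</code>. The test name
and the <code class="language-plaintext highlighter-rouge">.org</code> pattern are made up for illustration.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(require 'cl-lib)
(require 'ert)

(defun determine-next-action ()
  (if (file-exists-p "death-star-plans.org")
      'bring-him-the-passengers
    'tear-this-ship-apart))

(ert-deftest file-check-pattern-test ()
  ;; Hypothetical mock: only names ending in ".org" appear to exist.
  (cl-letf (((symbol-function 'file-exists-p)
             (lambda (file) (and (string-match-p "\\.org\\'" file) t))))
    (should (file-exists-p "death-star-plans.org"))
    (should-not (file-exists-p "death-star-plans.txt"))
    (should (eq (determine-next-action) 'bring-him-the-passengers))))
</code></pre></div></div>

<p>Because the override is dynamic, it applies to every caller of
<code class="language-plaintext highlighter-rouge">file-exists-p</code> while the body runs, so it’s best kept to small test
bodies.</p>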

<p>Since it’s so convenient I’m going to be using ERT more and more until
it becomes second-nature.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Switching to the Emacs Lisp Package Archive</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/12/"/>
    <id>urn:uuid:3e3186c8-dccd-3167-1f42-79f34d08a3dd</id>
    <updated>2012-08-12T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<p><em>Update June 2017</em>: I no longer use Emacs’ <code class="language-plaintext highlighter-rouge">package.el</code> and instead
manage packages and their dependencies (manually) through my own
decentralized package system called <code class="language-plaintext highlighter-rouge">gpkg</code> (“git package”).</p>

<p>For those who are unaware, Emacs 24 was finally released this past
June. I had been following the official repository for about a year
before the release using what was becoming version 24, very quickly
becoming dependent on several of <a href="http://www.gnu.org/software/emacs/NEWS.24.1">the new features</a>. Now that
it’s been officially released I’m back to using a stable version of
Emacs, about which I’m quite relieved.</p>

<p>One of the new features that I <em>hadn’t</em> been using until recently was
the package manager, <code class="language-plaintext highlighter-rouge">package</code>, and the
<a href="http://elpa.gnu.org/">Emacs Lisp Package Archive</a> (ELPA). You can now
ask Emacs to download and install new modes and extensions from the
Internet. By default, it only uses the official archive. It only hosts
packages with copyright assigned to the FSF — quite
restrictive. There are alternatives, the most popular of which is
<a href="http://marmalade-repo.org/">Marmalade</a>. Fortunately it’s easy to ask
<code class="language-plaintext highlighter-rouge">package</code> to use additional repositories, so this is a non-issue.</p>

<p>Because it was still unstable and buggy at the time, I avoided using
it when <a href="/blog/2011/10/19/">setting up my configuration repository</a>.
Instead I opted to gather packages by way of Git submodules. I’d give
<code class="language-plaintext highlighter-rouge">package</code> a shot once Emacs 24 was released. Once it was released in
June it was just a matter of time until I invested in this new
system.</p>

<p>The trigger was an e-mail from one of my readers, Rolando. He asked me
if I could move my <a href="/blog/2012/08/02/">recently updated</a> memoization
function into its own repository and touch it up so that it could be
turned into a package with <a href="http://melpa.milkbox.net/">MELPA</a>, another
alternative package repository. This forced me to finally investigate.</p>

<p>It turns out MELPA is <em>really</em> interesting. Each package is described
by a “recipe” file, which is essentially just a tiny s-expression
listing the repository URL. In the case of my memoization package,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">memoize</span> <span class="ss">:repo</span> <span class="s">"skeeto/emacs-memoize"</span>
         <span class="ss">:fetcher</span> <span class="nv">github</span><span class="p">)</span>
</code></pre></div></div>

<p>From a package maintainer’s point-of-view, this is fantastic. I don’t
have to take any extra steps to publish updates to my package. I just
keep doing what I do and it happens automatically. However, I need to
be more careful about not pushing broken commits — which is why I
started unit testing (to be
<a href="/blog/2012/08/15/">covered in a future post</a>). And I need to be extra
careful with my SSH keys, since they’re now used to publish code that
other people automatically trust and execute.</p>

<p>Excited about MELPA and wanting to actually use my own package, I
started throwing out my submodules, replacing them with their package
equivalents. If you follow my configuration repository you probably
noticed all the recent disruption, because updating requires manual
intervention. Git leaves submodules around (for good reason!) so they
need to be manually removed.</p>

<p>I also heavily updated and renamed <a href="/blog/2009/05/17/">my web server</a>
(now called <code class="language-plaintext highlighter-rouge">simple-httpd</code>) to provide it as a package (also to be
covered in a future post). Thanks to MELPA, I follow the package
rather than my own repository since it follows so closely (&lt; 1 hour).</p>

<p>Another barrier was that I was using an old version of <a href="https://github.com/magit/magit">Magit</a>
due to a bad interaction of modern versions with Wombat, my preferred
color theme. After <a href="https://github.com/skeeto/.emacs.d/commit/aec488937ff9a344278359ded7732446f2380748">some face tweaking</a>, I not only fixed it
but I made it better than it was before. Sinking an hour or two into
these sorts of annoyances usually works out really well. I need to
remind myself of this in the future when I run into annoyance issues.</p>

<p>Surprisingly, <code class="language-plaintext highlighter-rouge">package</code> doesn’t seem to be written with managed
configuration in mind. The provided functionality is designed to be
used interactively rather than programmatically. <code class="language-plaintext highlighter-rouge">package-install</code> is
only meant to be invoked once, so care needs to be taken in listing
packages in a configuration and doing everything in the right
order. Here’s how I have it set up at the moment, after listing
the packages to use in <code class="language-plaintext highlighter-rouge">my-packages</code>,</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'package</span><span class="p">)</span>
<span class="p">(</span><span class="nv">add-to-list</span> <span class="ss">'package-archives</span>
             <span class="o">'</span><span class="p">(</span><span class="s">"melpa"</span> <span class="o">.</span> <span class="s">"http://melpa.milkbox.net/packages/"</span><span class="p">)</span> <span class="no">t</span><span class="p">)</span>
<span class="p">(</span><span class="nv">package-initialize</span><span class="p">)</span>
<span class="p">(</span><span class="nb">unless</span> <span class="nv">package-archive-contents</span>
  <span class="p">(</span><span class="nv">package-refresh-contents</span><span class="p">))</span>
<span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">p</span> <span class="nv">my-packages</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">when</span> <span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nv">package-installed-p</span> <span class="nv">p</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">package-install</span> <span class="nv">p</span><span class="p">)))</span>
</code></pre></div></div>
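
<p>The snippet above expects <code class="language-plaintext highlighter-rouge">my-packages</code> to already hold a list of
package names as symbols. A minimal sketch, using packages mentioned in
this post; the selection is only illustrative.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Illustrative only: any list of package-name symbols works here.
(defvar my-packages '(magit memoize simple-httpd)
  "Packages to install at startup when they are missing.")
</code></pre></div></div>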

<p>Upgrading/updating is currently a manual process. Run
<code class="language-plaintext highlighter-rouge">package-refresh-contents</code>, list the packages with <code class="language-plaintext highlighter-rouge">list-packages</code>,
type <code class="language-plaintext highlighter-rouge">U</code> to mark updates, then <code class="language-plaintext highlighter-rouge">x</code> to e<code class="language-plaintext highlighter-rouge">x</code>ecute the upgrade. Sometime
I may work that into my configuration to be done automatically
once-per-week or something.</p>

<p>I really look forward to making more use of the package manager,
especially as packages can more easily become interdependent, reducing
duplication of effort.</p>

]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Programmatically Setting Lisp Docstrings</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/02/"/>
    <id>urn:uuid:d35e27e8-212a-3d1c-5168-afcccc04bf76</id>
    <updated>2012-08-02T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I just updated my <a href="/blog/2010/07/26/">Elisp memoization function</a> so
that it’s no longer a dirty hack. To work around the lack of closures,
due to the lack of lexical scope in Elisp, the original version used
uninterned symbols to store the look-up table. The new version in the
post uses <code class="language-plaintext highlighter-rouge">lexical-let</code>, which does the same thing internally to fake
a closure. The new version in <a href="/blog/2011/10/19/">my dotfiles</a>
repository uses the brand new
<a href="http://www.gnu.org/software/emacs/NEWS.24.1">Emacs 24 lexical scoping</a>.</p>

<p>It was “dirty” because it built a lambda function out of a list at run
time, taking advantage of the way Elisp currently handles
functions. The reason for this was that I wanted to inject the
original documentation string into the new function which can’t
normally be done when <code class="language-plaintext highlighter-rouge">lambda</code> is used the correct way. When I updated
the function I fixed this as well. It uses a trick provided by Elisp,
which is different than the Common Lisp way that I assumed.</p>

<p>Both Elisp and Common Lisp have a <code class="language-plaintext highlighter-rouge">documentation</code> function for
programmatically accessing symbol documentation. The Elisp version
only provides <em>function</em> documentation, so it takes no argument
naming the kind of documentation.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="s">"Foo."</span>
  <span class="no">nil</span><span class="p">)</span>

<span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Foo."</span>
</code></pre></div></div>

<p>The Common Lisp version must be told what type of documentation to
return, such as <code class="language-plaintext highlighter-rouge">function</code> or <code class="language-plaintext highlighter-rouge">variable</code> (<code class="language-plaintext highlighter-rouge">defvar</code>, <code class="language-plaintext highlighter-rouge">defconstant</code>).</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span> <span class="ss">'function</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Foo."</span>
</code></pre></div></div>

<p>As it might be expected, this is <code class="language-plaintext highlighter-rouge">setf</code>-able! It’s possible to update
or modify documentation strings without needing to redefine the
function.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">setf</span> <span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span> <span class="ss">'function</span><span class="p">)</span> <span class="s">"New doc string."</span><span class="p">)</span>
</code></pre></div></div>

<p>Unfortunately it’s not <code class="language-plaintext highlighter-rouge">setf</code>-able in Elisp. Instead you can set the
<code class="language-plaintext highlighter-rouge">function-documentation</code> <em>property</em> of the symbol. The <code class="language-plaintext highlighter-rouge">documentation</code>
function will prefer this over the string stored in the function
itself.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">put</span> <span class="ss">'foo</span> <span class="ss">'function-documentation</span> <span class="s">"Foo updated."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Foo updated."</span>
</code></pre></div></div>
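
<p>This is enough to carry a docstring across to a replacement function.
A rough sketch of the idea, where <code class="language-plaintext highlighter-rouge">my-replace-keeping-doc</code> and <code class="language-plaintext highlighter-rouge">square</code>
are hypothetical names invented for the example rather than the code
from the memoization function itself:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-replace-keeping-doc (symbol new-function)
  "Install NEW-FUNCTION as SYMBOL's definition, keeping the old docstring."
  ;; Read the docstring first, before the old definition is gone.
  (let ((doc (documentation symbol)))
    (fset symbol new-function)
    (put symbol 'function-documentation doc)
    symbol))

(defun square (x)
  "Return X squared."
  (* x x))

;; The replacement lambda has no docstring of its own.
(my-replace-keeping-doc 'square (lambda (x) (* x x)))

(documentation 'square)
=&gt; "Return X squared."
</code></pre></div></div>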

<p>The downside is that this is a second place to put docstrings, leading
to surprising behavior for developers unaware of this hack.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">put</span> <span class="ss">'foo</span> <span class="ss">'function-documentation</span> <span class="s">"Old docstring."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">foo</span> <span class="p">()</span>
  <span class="s">"New docstring."</span>
  <span class="no">nil</span><span class="p">)</span>

<span class="p">(</span><span class="nb">documentation</span> <span class="ss">'foo</span><span class="p">)</span>
<span class="nv">=&gt;</span> <span class="s">"Old docstring."</span>
</code></pre></div></div>

<p>This can be fixed by setting the symbol property for
<code class="language-plaintext highlighter-rouge">function-documentation</code> to <code class="language-plaintext highlighter-rouge">nil</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">put</span> <span class="ss">'foo</span> <span class="ss">'function-documentation</span> <span class="no">nil</span><span class="p">)</span>
</code></pre></div></div>

<p>I prefer the Common Lisp method.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Viewing Java Class Files in Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/08/01/"/>
    <id>urn:uuid:1bb9f8a9-61eb-34eb-bb62-83e93166cbea</id>
    <updated>2012-08-01T00:00:00Z</updated>
    <category term="emacs"/><category term="java"/>
    <content type="html">
      <![CDATA[<p>One of the users of <a href="/blog/2010/10/15/">my Emacs java extensions</a>
e-mailed me with a question/suggestion about viewing .class files in
Emacs. Emacs has automatic compression, encryption, and archive modes
which allow certain non-text files to be viewed within Emacs in a
sensible text form. He wanted to do the same with Java byte-compiled
.class files: when opening a .class file, Emacs should automatically
and transparently decompile the bytecode into Java source.</p>

<p>He mentioned
<a href="http://en.wikipedia.org/wiki/JAD_(JAva_Decompiler%29">JAD</a>
specifically, a popular, proprietary, but unmaintained and outdated
Java bytecode decompiler. I’ve never used it and honestly I see no
reason to start using it. Unfortunately there are no other decompilers
in the Debian package archives and I know nothing else about Java
decompiling, so this left me kind of stuck. Instead I decided to build
a proof-of-concept using <code class="language-plaintext highlighter-rouge">javap</code>, the Java disassembler, which comes
with JDKs.</p>

<p>Here it is: <a href="https://gist.github.com/3178747">javap-handler.el</a>. With
these forms evaluated, try opening a .class file in Emacs. Rather than
a screen full of junk, you’ll (hopefully) be presented with a
read-only buffer containing detailed information about the class.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nv">add-to-list</span> <span class="ss">'file-name-handler-alist</span> <span class="o">'</span><span class="p">(</span><span class="s">"\\.class$"</span> <span class="o">.</span> <span class="nv">javap-handler</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">javap-handler</span> <span class="p">(</span><span class="nv">op</span> <span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
  <span class="s">"Handle .class files by putting the output of javap in the buffer."</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">eq</span> <span class="nv">op</span> <span class="ss">'get-file-buffer</span><span class="p">)</span>
    <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">file</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">args</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="p">(</span><span class="nv">create-file-buffer</span> <span class="nv">file</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">call-process</span> <span class="s">"javap"</span> <span class="no">nil</span> <span class="p">(</span><span class="nv">current-buffer</span><span class="p">)</span> <span class="no">nil</span> <span class="s">"-verbose"</span>
                      <span class="s">"-classpath"</span> <span class="p">(</span><span class="nv">file-name-directory</span> <span class="nv">file</span><span class="p">)</span>
                      <span class="p">(</span><span class="nv">file-name-sans-extension</span>
                       <span class="p">(</span><span class="nv">file-name-nondirectory</span> <span class="nv">file</span><span class="p">)))</span>
        <span class="p">(</span><span class="k">setq</span> <span class="nv">buffer-file-name</span> <span class="nv">file</span><span class="p">)</span>
        <span class="p">(</span><span class="k">setq</span> <span class="nv">buffer-read-only</span> <span class="no">t</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">set-buffer-modified-p</span> <span class="no">nil</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">))</span>
        <span class="p">(</span><span class="nv">java-mode</span><span class="p">)</span>
        <span class="p">(</span><span class="nv">current-buffer</span><span class="p">))))</span>
   <span class="p">((</span><span class="nv">javap-handler-real</span> <span class="nv">op</span> <span class="nv">args</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">javap-handler-real</span> <span class="p">(</span><span class="nv">operation</span> <span class="nv">args</span><span class="p">)</span>
  <span class="s">"Run the real handler without the javap handler installed."</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">inhibit-file-name-handlers</span>
         <span class="p">(</span><span class="nb">cons</span> <span class="ss">'javap-handler</span>
               <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">eq</span> <span class="nv">inhibit-file-name-operation</span> <span class="nv">operation</span><span class="p">)</span>
                    <span class="nv">inhibit-file-name-handlers</span><span class="p">)))</span>
        <span class="p">(</span><span class="nv">inhibit-file-name-operation</span> <span class="nv">operation</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">apply</span> <span class="nv">operation</span> <span class="nv">args</span><span class="p">)))</span>
</code></pre></div></div>

<p><a href="/img/emacs/javap-junk.png"><img src="/img/emacs/javap-junk-thumb.png" alt="" /></a></p>

<p><a href="/img/emacs/javap-clear.png"><img src="/img/emacs/javap-clear-thumb.png" alt="" /></a></p>

<p>This was harder to do than I thought it would be. To make a new
“magic” file mode requires the use of a half-documented, hackish
file-name-handler API. There’s
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Magic-File-Names.html">a page on it</a>
in the <em>GNU Emacs Lisp Reference Manual</em> but I mostly figured it out
by reading the source code around auto-compression-mode and
auto-encryption-mode.</p>

<p>It works by installing a handler function in <code class="language-plaintext highlighter-rouge">file-name-handler-alist</code>
— similar to <code class="language-plaintext highlighter-rouge">auto-mode-alist</code>. The handler has complete control over
how a particularly-named class of files is handled by Emacs. For
example, the most useful part is instead of actually providing the
contents of a file, the handler can present any contents it wants. In
this case, rather than read in the actual bytecode, the handler
executes <code class="language-plaintext highlighter-rouge">javap</code> on the file and uses the output for the buffer
content.</p>
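
<p>To see the lookup side of this in isolation, here is a small sketch
with a hypothetical pass-through handler (<code class="language-plaintext highlighter-rouge">my-demo-handler</code> and the
<code class="language-plaintext highlighter-rouge">.demo</code> extension are invented for the example). The function
<code class="language-plaintext highlighter-rouge">find-file-name-handler</code> reports which handler, if any, would receive
an operation on a given file name.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(defun my-demo-handler (operation &amp;rest args)
  "Hypothetical handler that defers every OPERATION back to Emacs."
  (let ((inhibit-file-name-handlers
         (cons 'my-demo-handler
               (and (eq inhibit-file-name-operation operation)
                    inhibit-file-name-handlers)))
        (inhibit-file-name-operation operation))
    (apply operation args)))

;; Claim every file name ending in ".demo".
(add-to-list 'file-name-handler-alist '("\\.demo\\'" . my-demo-handler))

(find-file-name-handler "/tmp/example.demo" 'file-exists-p)
=&gt; my-demo-handler
</code></pre></div></div>

<p>A name that matches no entry in the alist yields <code class="language-plaintext highlighter-rouge">nil</code>, and Emacs
performs the operation the normal way.</p>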

<p>The hackish part is when the handler wants to let Emacs handle an
operation the normal way, which is pretty much every case except for
<code class="language-plaintext highlighter-rouge">get-file-buffer</code>. The handler has to disable itself by temporarily
setting a dynamically-scoped variable (one of the many legacy areas
that prevents Emacs from being lexically-scoped by default), then ask
Emacs to try the operation again.</p>

<p>As I said, this is just a proof-of-concept so there are two issues
remaining. The first was something requested specifically: viewing
.class files inside .jar archives. It could do this if it was just a
little bit smarter about the classpath. I leave that as an exercise to
the reader. :-)</p>

<p>The second is finding a well-behaved, reasonable decompiler (GUI-less
Unix filter) and replacing <code class="language-plaintext highlighter-rouge">javap</code> with it. Given that assumption,
this should be as simple as replacing a couple of strings in the
<code class="language-plaintext highlighter-rouge">call-process</code>.</p>

<p>This is interesting enough that, if I fix it up for correctness
sometime, I may include it as part of java-mode-plus.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Presentations with Jekyll and deck.js</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/04/30/"/>
    <id>urn:uuid:23949cf7-f8c2-332e-6889-5d4c9d128cf7</id>
    <updated>2012-04-30T00:00:00Z</updated>
    <category term="emacs"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>At work, this has been The Year of Presentations for me so far. I’ve
prepared and performed three hour-long presentations already this year,
and I will continue to do more. The presentations I’ve done before
haven’t been too serious; I’d just slap a few slides together in
whatever was handy and talk in front of them. However, with these more
serious presentations, I was making much more use of the associated
software. I haven’t been happy with any of them. They violate my
<a href="/blog/2012/04/29/">preference for precision</a>, after all.</p>

<p>For the first one I went with KPresenter, part of KOffice. It had been
years since I last used KOffice, so I thought I’d give it a shot. On
the good side, I liked the templates. However, it crashed on me a lot,
which was very frustrating. The GUI is lacking in a lot of places. For
example, I wanted to re-arrange my slides, and dragging and dropping
them feels like the natural choice. The mouse cursor even suggests it
by switching to a hand icon. Nope, dragging and dropping does
nothing. Overall, it felt like using a crummy version of Inkscape. The
presentation was a mess when viewed by other presentation software, so
I had to export it to a PDF to use it.</p>

<p>For the second one, I used LibreOffice’s Impress. It’s better than
KPresenter, but it still feels clunky. It took some wrestling to get
it to do what I wanted. As is to be expected, I still had the same
feeling of uneasiness I have about any WYSIWYG tool.</p>

<p>For the third one I used PowerPoint, as provided by my employer. The
main reason for this was that I was <s>stealing</s> borrowing some
important slides from a couple of other people’s presentations, so I
had little choice. It was also an opportunity to compare it to the
others. Overall I’d say it’s on the same level as Impress, with some
slightly nicer GUI behavior.</p>

<p>Fortunately, I recently discovered what may become my preferred
presentation tool! It’s <a href="http://imakewebthings.com/deck.js/">deck.js</a>.</p>

<p>With deck.js, I’ll be writing my presentations in HTML 5, something
with which I’m already comfortable and experienced. Most importantly,
I’ll be able to create my presentations with Emacs and version them
with Git. That allows for easy collaboration on presentations
without all the stupid e-mailing documents back and forth — though
the other person would need to be comfortable with using deck.js,
too. That leaves … well, just <a href="http://50ply.com/">Brian</a> I
guess. So, <em>in theory</em>, this could make collaboration easier.</p>

<p>The downside to deck.js is that it requires a lot of boilerplate,
especially if you want to use the extensions, a couple of which are
absolutely <em>essential</em> in my opinion. Creating a new presentation
requires going through this setup phase, and then working around all
the boilerplate the rest of the time. I’ve successfully used Git to
<a href="/blog/2010/10/04/">work around this problem with Java</a>, so I’ve done
the same here, with a little bit of help from
<a href="https://github.com/mojombo/jekyll">Jekyll</a>.</p>

<p>What I’ve done is use Jekyll to provide a default layout for deck.js. It
hides away all of the deck.js boilerplate so that I can focus on my
presentation. It also makes it trivial to start a new
presentation. All I have to do is clone this repository and I’m ready
to go.</p>

<pre>
git clone --recursive <a href="https://github.com/skeeto/jekyll-deck">https://github.com/skeeto/jekyll-deck.git</a> <i>my-pres</i>
</pre>

<p>The result looks like this: <a href="/jekyll-deck/">A Jekyll / deck.js Presentation</a>.</p>

<p>Jekyll <em>almost</em> opens up the opportunity to really take deck.js to the
next level: presentations written in Markdown! That would be
wonderful. Unfortunately, the HTML output is a little bit too
demanding for Jekyll (i.e. Maruku) to manage. It’s not quite
extensible enough to pull it off. So it’s just HTML5 for now, which is
unfortunately bulky when it comes to lists — a common element of
presentations. Oh well. I do still get syntax highlighting with
Pygments!</p>

<p>I haven’t used it for anything serious yet, so it’s still untried. In
my experimentation I found it enjoyable to work with, so I really look
forward to making use of it in the future. Feel free to use it
yourself, of course, and tell me how it goes.</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Why Do Developers Prefer Certain Kinds of Tools?</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2012/04/29/"/>
    <id>urn:uuid:4dd7c07d-982d-3ff6-5cdd-70db7c3800bb</id>
    <updated>2012-04-29T00:00:00Z</updated>
    <category term="rant"/><category term="emacs"/><category term="git"/>
    <content type="html">
      <![CDATA[<p>In my experience, software developers generally prefer some flavor of
programmer’s tools when it comes to getting things done. We like plain
text, text editors, command line programs, source control, markup, and
shells. In contrast, non-developer computer users generally prefer
WYSIWYG word processors and GUIs. Developers often have somewhere
between a distaste and a
<a href="http://terminally-incoherent.com/blog/2008/10/16/wysiwyg-is-a-lie/">revulsion</a>
to WYSIWYG editors.</p>

<p>Why is this? What are programmers looking for that other users aren’t?
What I believe it really comes down to is one simple idea: <strong>clean
state transformations</strong>. I’m talking about modifying data, text or
binary, in a precise manner with the possibility of verifying the
modification for correctness in the future.</p>

<p>Think of a file produced by a word processor. It may be some
proprietary format, like Word’s old .doc format, or, more likely as
we move into the future, it’s in some bloated XML format that’s dumped
into a .zip file. In either case, it’s a blob of data that requires a
complex word processor to view and manipulate. It’s opaque to source
control, so even merging documents requires a capable, full word
processor.</p>

<p>For example, say you’ve received such a document from a colleague by
e-mail, for editing. You’ve read it over and think it looks good,
except you want to italicize a few words in the document. To do that,
you open up the document in a word processor and go through looking
for the words you want to modify. When you’re done you click save.</p>

<p>The problem is: did you accidentally make any other changes? Maybe you
had to reply to an email while you were in the middle of it and you
accidentally typed an extra letter into the document. It would be easy
to miss and you’re probably not set up to easily check what changes
you’ve made.</p>

<p>I am aware that modern word processors have a feature that can show
changes made, which can then be committed to the document. This is
really crude compared to a good source control management system. Due
to the nature of WYSIWYG, you’re still not seeing all of the
changes. There could be invisible markup changes and there’s no way to
know. It’s an example of a single program trying to do too many
unrelated things, so that it ends up doing many things poorly.</p>

<p>With source code, the idea of patches comes up frequently. The program
<code class="language-plaintext highlighter-rouge">diff</code>, given two text files, can produce a patch file describing
their differences. The complementary program is <code class="language-plaintext highlighter-rouge">patch</code>, which can
take the output from <code class="language-plaintext highlighter-rouge">diff</code> and one of the original files, and use it
to produce the other file. As an example, say you have this source
file <code class="language-plaintext highlighter-rouge">example.c</code>,</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello, world."</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If you change the string and save it as a different file, then run
<code class="language-plaintext highlighter-rouge">diff -u</code> (<code class="language-plaintext highlighter-rouge">-u</code> for unified, producing a diff with extra context), you
get this output,</p>

<div class="language-udiff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- example.c  2012-04-29 21:50:00.250249543 -0400
</span><span class="gi">+++ example2.c   2012-04-29 21:50:09.514206233 -0400
</span><span class="p">@@ -1,5 +1,5 @@</span>
 int main()
 {
<span class="gd">-    printf("Hello, world.");
</span><span class="gi">+    printf("Goodbye, world.");
</span>     return 0;
 }
</code></pre></div></div>

<p>This is very human readable. It states what two files are being
compared, where they differ, some context around the difference
(beginning with a space), and shows which lines were added and removed
(beginning with <code class="language-plaintext highlighter-rouge">+</code> and <code class="language-plaintext highlighter-rouge">-</code>, respectively). A diff like this is capable of
describing any number of files and changes in a row, so it can all fit
comfortably in a single patch file.</p>

<p>If you made changes to a codebase and calculated a diff, you could
send the patch (the diff) to other people with the same codebase and
they could use it to reproduce your exact changes. By looking at it,
they know exactly what changed, so it’s not some mystery to them. This
patch is a <em>clean transformation</em> from one source code state to
another.</p>

<p>More than that: you can send it to people with a similar, but not
exactly identical, codebase and they could still likely apply your
changes. This process is really what source control is all about: an
easy way to coordinate and track patches from many people. A good
version history is going to be a tidy set of patches that take the
source code in its original form and add a feature or fix a bug
through a series of concise changes.</p>

<p>On a side note, you could efficiently store a series of changes to a
file by storing the original document along with a series of
relatively small patches. This is called delta encoding. This is how
both source control and video codecs usually store data on disk.</p>

<p>Anytime I’m outside of this world of precision I start to get
nervous. I feel sloppy and become distrustful of my tools, because I
generally can’t verify that they’re doing what I think they’re
doing. This applies not just to source code, but also writing. I’m
typing this article in Emacs and when I’m done I’ll commit it to
Git. If I make any corrections, I’ll verify that my changes are what I
wanted them to be (via <a href="http://philjackson.github.com/magit/">Magit</a>)
before committing and publishing them.</p>

<p>One of my long-term goals with my work is to try to do as much as
possible with my precision developer tools. I’ve already got
<a href="/blog/2011/11/28/">basic video editing</a> and
<a href="/blog/2012/04/10/">GIF creation</a> worked out. I’m still working out a
happy process for documents (i.e. LaTeX and friends) and
presentations.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Try Out My Java With Emacs Workflow Within Minutes</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/11/19/"/>
    <id>urn:uuid:0096ac53-9db1-3aa8-81ed-64497696bdcb</id>
    <updated>2011-11-19T00:00:00Z</updated>
    <category term="emacs"/><category term="java"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<p><strong>Update January 2013:</strong> I’ve learned more about Java dependency
management and no longer use my old .ant repository. As a result, I
have deleted it, so ignore any references to it below. The only thing
I keep in <code class="language-plaintext highlighter-rouge">$HOME/.ant/lib</code> these days is an up-to-date <code class="language-plaintext highlighter-rouge">ivy.jar</code>.</p>

<hr />

<p>Last month I started <a href="/blog/2011/10/19/">managing my entire Emacs configuration in
Git</a>, which has already paid for itself by saving
me time. I found out a few other people have been using it (including
<a href="http://www.50ply.com/">Brian</a>), so I also <a href="https://github.com/skeeto/.emacs.d#readme">wrote up a README
file</a> describing my
specific changes.</p>

<p>With Emacs being a breeze to synchronize between my computers, I
noticed a new bottleneck emerged: my <code class="language-plaintext highlighter-rouge">.ant</code>
directory. <a href="http://ant.apache.org/">Apache Ant</a> puts everything in
<code class="language-plaintext highlighter-rouge">$ANT_HOME/lib</code> and <code class="language-plaintext highlighter-rouge">$HOME/.ant/lib</code> into its classpath. So, for
example, if you wanted to use <a href="http://www.junit.org/">JUnit</a> with Ant,
you’d toss <code class="language-plaintext highlighter-rouge">junit.jar</code> in either of those directories. <code class="language-plaintext highlighter-rouge">$ANT_HOME</code>
tends to be a system directory, and I prefer to only modify system
directories indirectly through <code class="language-plaintext highlighter-rouge">apt</code>, so I put everything in
<code class="language-plaintext highlighter-rouge">$HOME/.ant/lib</code>. Unfortunately, that’s another directory to keep
track of on my own. Fortunately, I already know how to deal with
that. It’s now another Git repository,</p>

<p><a href="https://github.com/skeeto/.ant">https://github.com/skeeto/.ant</a>
(<a href="https://github.com/skeeto/.ant#readme">README</a>)</p>

<p>With that in place, settling into a new computer for development is
almost as simple as cloning those two repositories. Yesterday I
eliminated the only significant step that remained:
<a href="/blog/2010/10/14/">setting up <code class="language-plaintext highlighter-rouge">java-docs</code></a>. Previously, to take
full advantage of my Java extension, you needed to have a
Javadoc directory scanned by Emacs. The results of that scan not only
provided an easy way to jump into documentation, but also provided the
lists for class name completion. Now <code class="language-plaintext highlighter-rouge">java-docs</code> automatically
loads up the core Java Javadoc, linking to the official website, if
the user never sets it up.</p>

<p>So if you want to see exactly how my Emacs workflow with Java
operates, it’s just a few small steps away. This <em>should</em> work for any
operating system suitable for Java development.</p>

<p>Let’s start by getting Java set up. First, install a JDK and Apache
Ant. This is trivial to do on Debian-based systems,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install openjdk-6-jdk ant
</code></pre></div></div>

<p>On Windows, the JDK is easy, but Ant needs some help. You probably
need to set <code class="language-plaintext highlighter-rouge">ANT_HOME</code> to point to the install location, and you
definitely need to add it to your <code class="language-plaintext highlighter-rouge">PATH</code>.</p>

<p>Next install Git. This should be straightforward; just make sure it’s
in your <code class="language-plaintext highlighter-rouge">PATH</code> (so Emacs can find it).</p>

<p>Clone my <code class="language-plaintext highlighter-rouge">.ant</code> repository in your home directory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd
git clone https://github.com/skeeto/.ant.git
</code></pre></div></div>

<p>Except for Emacs, that’s really all I need to develop with Java. This
setup should allow you to compile and hack on just about any of my
Java projects. To test it out, clone one of my projects anywhere you
like, such as my
<a href="https://github.com/skeeto/sample-java-project">example project</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/skeeto/sample-java-project.git
</code></pre></div></div>

<p>You should be able to build and run it now,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd sample-java-project
ant run
</code></pre></div></div>

<p>If that works, you’re ready to set up Emacs. First, install Emacs. If
you’re not familiar with Emacs, now would be the time to go through
the tutorial to pick up the basics. Fire it up and type <code class="language-plaintext highlighter-rouge">CTRL + h</code> and
then <code class="language-plaintext highlighter-rouge">t</code> (in Emacs’ terms: <code class="language-plaintext highlighter-rouge">C-h t</code>), or select the tutorial from the
menu.</p>

<p>Move any existing configuration out of the way,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mv .emacs .old.emacs
mv .emacs.d .old.emacs.d
</code></pre></div></div>

<p>Clone my configuration,</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/skeeto/.emacs.d.git
</code></pre></div></div>

<p>Then run Emacs. You should be greeted with a plain, gray window: the
wombat theme. No menu bar, no toolbar, just a minibuffer, mode line,
and wide open window. Anything else is a waste of screen real
estate. This initial empty buffer has a great aesthetic, don’t you
think?</p>

<p><a href="/img/emacs/init.png"><img src="/img/emacs/init-thumb.png" alt="" /></a></p>

<p>Now to go for a test drive: open up that Java project you cloned, with
<code class="language-plaintext highlighter-rouge">M-x open-java-project</code>. That will prompt you for the root directory
of the project. All this does is pre-open all of the
source files for you, which exposes their contents to <code class="language-plaintext highlighter-rouge">dabbrev-expand</code> and
makes jumping to other source files as easy as changing buffers — so
it’s not <em>strictly</em> necessary.</p>
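
<p>For a rough idea of what pre-opening involves, here is a minimal
sketch. It is not the actual <code class="language-plaintext highlighter-rouge">open-java-project</code> source: the name
<code class="language-plaintext highlighter-rouge">sketch-open-java-files</code> is made up for illustration, and it assumes
Emacs 25 or later for <code class="language-plaintext highlighter-rouge">directory-files-recursively</code>.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical sketch, not the real open-java-project.
(defun sketch-open-java-files (root)
  "Visit every .java file under ROOT in the background.
Return the list of files visited."
  (interactive "DProject root: ")
  (let ((files (directory-files-recursively root "\\.java\\'")))
    (dolist (file files)
      ;; Create a buffer for FILE without displaying or selecting it.
      (find-file-noselect file))
    files))
</code></pre></div></div>

<p>Once those buffers exist, <code class="language-plaintext highlighter-rouge">dabbrev-expand</code> can draw completions
from them, and every source file is a buffer switch away.</p>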

<p>Switch to a buffer with a source file, such as
<code class="language-plaintext highlighter-rouge">SampleJavaProject.java</code> if you used my example project. Change
whatever you like, such as the printed string. You can add import
statements at any time with <code class="language-plaintext highlighter-rouge">C-x I</code> (note: capital <code class="language-plaintext highlighter-rouge">I</code>), where
<code class="language-plaintext highlighter-rouge">java-docs</code> will present you with a huge list of classes from which to
pick. The import will be added at the top of the buffer in the correct
position in the import listing.</p>

<p><a href="/img/emacs/java-import.png"><img src="/img/emacs/java-import-thumb.png" alt="" /></a></p>

<p>Without needing to save, hit <code class="language-plaintext highlighter-rouge">C-x r</code> to run the program from Emacs. A
<code class="language-plaintext highlighter-rouge">*compilation-1*</code> buffer will pop up with all of the output from Ant
and the program. If you just want to compile without running it, type
<code class="language-plaintext highlighter-rouge">C-x c</code> instead. If there were any errors, Ant will report them in the
compilation buffer. You can jump directly to these with <code class="language-plaintext highlighter-rouge">C-x `</code>
(that’s a backtick).</p>

<p><a href="/img/emacs/java-run.png"><img src="/img/emacs/java-run-thumb.png" alt="" /></a></p>

<p>Now open a new source file in the same package (same directory) as the
source file you just edited. Type <code class="language-plaintext highlighter-rouge">cls</code> and hit tab. The boilerplate,
including package statement, will be filled out for you by
YASnippet. There are a bunch of completion snippets available. Try
<code class="language-plaintext highlighter-rouge">jal</code> for example, which completes with information from <code class="language-plaintext highlighter-rouge">java-docs</code>.</p>

<p>When I’m developing a library, I don’t have a main function, so
there’s nothing to “run”. Instead, I drive things from unit tests,
which can be run with <code class="language-plaintext highlighter-rouge">C-x t</code>, which runs the “test” target if there
is one.</p>

<p><a href="/img/emacs/junit-mock.png"><img src="/img/emacs/junit-mock-thumb.png" alt="" /></a></p>

<p>To see your changes, type <code class="language-plaintext highlighter-rouge">C-x g</code> to bring up Magit and type <code class="language-plaintext highlighter-rouge">M-s</code> in
the Magit buffer (to show a full diff). From here you can make
commits, push, pull, merge, switch branches, reset, and so on. To
learn how to do all this, see the
<a href="http://philjackson.github.com/magit/magit.html">Magit manual</a>. You
can type <code class="language-plaintext highlighter-rouge">q</code> to exit the Magit window, or use <code class="language-plaintext highlighter-rouge">S-&lt;arrow key&gt;</code> to move
to an adjacent window in any direction.</p>

<p><a href="/img/emacs/magit.png"><img src="/img/emacs/magit-thumb.png" alt="" /></a></p>

<p>And that’s basically my workflow. Developing in C is a very similar
process, but without the <code class="language-plaintext highlighter-rouge">java-docs</code> part.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Configuration Repository</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/10/19/"/>
    <id>urn:uuid:d3b9a99d-3526-3c74-7b43-643b752df6ac</id>
    <updated>2011-10-19T00:00:00Z</updated>
    <category term="emacs"/>
    <content type="html">
      <![CDATA[<p>I finally got my entire Emacs configuration into source control. My
previous solution was to copy around my <code class="language-plaintext highlighter-rouge">.emacs.d/</code> to each computer I
use. This works well enough with two computers, but beyond that it’s
difficult to propagate any changes I make. Counting all my VMs, I have
around a dozen systems where I use Emacs. This is the exact problem
that source control exists to fix.</p>

<p>If you move your <code class="language-plaintext highlighter-rouge">.emacs</code> and <code class="language-plaintext highlighter-rouge">.emacs.d/</code> out of the way, clone my
repository right into your home directory, clone the submodules, and
then run Emacs 23 or greater, you’ll see my exact Emacs setup, theme
and all.</p>

<pre>
cd
git clone <a href="https://github.com/skeeto/.emacs.d">git://github.com/skeeto/.emacs.d.git</a>
cd .emacs.d
git submodule init
git submodule update
</pre>

<p>Notice there’s an <code class="language-plaintext highlighter-rouge">init.el</code> in there. Emacs tries to load <code class="language-plaintext highlighter-rouge">~/.emacs</code>
first, but if that doesn’t exist it loads <code class="language-plaintext highlighter-rouge">~/.emacs.d/init.el</code>. That’s
why you need to move your own <code class="language-plaintext highlighter-rouge">.emacs</code> out of the way to see my
stuff. I do still make use of a <code class="language-plaintext highlighter-rouge">.emacs</code> file. That’s my
system-specific configuration, where, for example, I tell Emacs <a href="/blog/2010/10/14/">where
to find Javadoc files</a>. At the top of this file I
make sure to load my other init file.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">;; Load standard configuration</span>
<span class="p">(</span><span class="nv">load-file</span> <span class="s">"~/.emacs.d/init.el"</span><span class="p">)</span>
</code></pre></div></div>

<p>One reason I didn’t use source control right away was the submodule
problem — my configuration is largely made up of <em>other</em> repositories.
Git has good support for putting foreign Git repositories within your
own repository, but a couple of repositories I was using were
Subversion and CVS. I managed to cut down to just Git repositories</p>
<s>and one Subversion repository, for which I now maintain a Git
mirror, making these *all* Git repositories</s>
<p>. (<em>Update November
2011</em>: YASnippet has moved to Git.)</p>

<p>I also trimmed down a bit, cutting out some things I noticed I wasn’t
using (breadcrumbs, pabbrev) or things that didn’t need to be in
there, such as Slime. I now use <a href="http://www.quicklisp.org/">Quicklisp</a>
to manage my Slime installation, which I connect with my configuration
in my system-specific <code class="language-plaintext highlighter-rouge">.emacs</code>. Using source control will help better
track what I’m using and not using, keeping the whole thing more
tidy. Removing an experimental addition should be a simple revert
commit.</p>

<p>Some of the important pieces of my configuration are a spattering of
new modes, <a href="http://philjackson.github.com/magit/">Magit</a> (<code class="language-plaintext highlighter-rouge">C-x g</code>),
<a href="https://github.com/capitaomorte/yasnippet">yasnippet</a> (including
several of my own snippets),
<a href="http://www.emacswiki.org/emacs/DiredPlus">dired+</a>,
<a href="http://www.emacswiki.org/emacs/ParEdit">ParEdit</a>,
<a href="https://github.com/nonsequitur/smex">smex</a>, my
<a href="/blog/2010/10/15/">Java editing extensions</a>, and
<a href="/blog/2009/05/17/">a web server</a> (<code class="language-plaintext highlighter-rouge">M-x httpd-start</code>).</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Fake Emacs Namespaces</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2011/08/18/"/>
    <id>urn:uuid:f89408fd-9b2f-3110-af83-fe96f7c1e7f7</id>
    <updated>2011-08-18T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<p>Back in May I wrote a crude <code class="language-plaintext highlighter-rouge">defpackage</code> function for Elisp, modeled
after Common Lisp’s version. I’m calling them fakespaces.</p>

<ul>
  <li><a href="https://github.com/skeeto/elisp-fakespace">https://github.com/skeeto/elisp-fakespace</a></li>
</ul>

<p>It works like so (see <code class="language-plaintext highlighter-rouge">example.el</code> for detailed information on this
code),</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nb">require</span> <span class="ss">'fakespace</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defpackage</span> <span class="nv">example</span>
  <span class="p">(</span><span class="ss">:use</span> <span class="nv">cl</span> <span class="nv">ido</span><span class="p">)</span>
  <span class="p">(</span><span class="ss">:export</span> <span class="nv">example-main</span> <span class="nv">example-var</span> <span class="nv">eq-hello</span> <span class="nv">hello</span><span class="p">))</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">my-var</span> <span class="mi">100</span>
  <span class="s">"A hidden variable."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defvar</span> <span class="nv">example-var</span> <span class="no">nil</span>
  <span class="s">"A public variable."</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">my-func</span> <span class="p">()</span>
  <span class="s">"A private function."</span>
  <span class="nv">my-var</span><span class="p">)</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">example-main</span> <span class="p">()</span>
  <span class="s">"An exported function. Notice we can access all the private
variables and functions from here."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">my-func</span><span class="p">)</span> <span class="nv">my-var</span><span class="p">)</span> <span class="nv">example-var</span>
        <span class="p">(</span><span class="nv">ido-completing-read</span> <span class="s">"New value: "</span> <span class="p">(</span><span class="nb">list</span> <span class="s">"foo"</span> <span class="s">"bar"</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">defun</span> <span class="nv">eq-hello</span> <span class="p">(</span><span class="nv">sym</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">eq</span> <span class="nv">sym</span> <span class="ss">'hello</span><span class="p">))</span>

<span class="p">(</span><span class="nv">end-package</span><span class="p">)</span>
</code></pre></div></div>

<p>Notice <code class="language-plaintext highlighter-rouge">end-package</code> at the end, which is not needed in Common
Lisp. That’s part of what makes it crude.</p>

<p>If you run those functions and try changing the assignment of
non-exported symbols, you’ll see the namespace separation in
action. <code class="language-plaintext highlighter-rouge">my-var</code> and <code class="language-plaintext highlighter-rouge">my-func</code> are completely different symbols than
the ones you’re seeing after <code class="language-plaintext highlighter-rouge">end-package</code>.</p>

<p>It’s really simple in how it works (it’s 40 lines of code). The
<code class="language-plaintext highlighter-rouge">defpackage</code> macro takes a snapshot of the symbol table. Then new
symbols get interned through various function and variable
definitions. Finally <code class="language-plaintext highlighter-rouge">end-package</code> compares the current symbol table
to the snapshot and uninterns any new symbols. These symbols will be
inaccessible to other code, effectively giving them their own
namespace.</p>
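
<p>The mechanism can be sketched in a few lines. This is not the actual
fakespace source: the <code class="language-plaintext highlighter-rouge">sketch-</code> names are hypothetical, and the
sketch leaves out the snapshot stack.</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; Hypothetical sketch of the snapshot-and-unintern idea.
(defun sketch-snapshot ()
  "Return a list of every symbol currently interned in `obarray'."
  (let ((symbols '()))
    (mapatoms (lambda (sym) (push sym symbols)))
    symbols))

(defun sketch-hide-new-symbols (snapshot)
  "Unintern every symbol interned since SNAPSHOT was taken.
Return the list of symbols that were hidden."
  (let ((old (make-hash-table :test 'eq))
        (new '()))
    (dolist (sym snapshot)
      (puthash sym t old))
    ;; Collect first, then unintern, so the table is not modified
    ;; while `mapatoms' is walking it.
    (mapatoms (lambda (sym)
                (unless (gethash sym old)
                  (push sym new))))
    (dolist (sym new)
      (unintern sym obarray))
    new))

(let ((snapshot (sketch-snapshot)))
  (intern "sketch-hidden")           ; interned after the snapshot
  (sketch-hide-new-symbols snapshot)
  (intern-soft "sketch-hidden"))     ; nil, the symbol is gone
</code></pre></div></div>

<p>Try it in a throwaway Emacs session, since it uninterns everything
that was interned after the snapshot.</p>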

<p>Snapshots are pushed onto a stack, so it’s safe to create a new
package within another package, as long as <code class="language-plaintext highlighter-rouge">end-package</code> is used
properly. This is necessary when one namespaced package depends on
another, because the dependency will tend to be loaded in the middle
of defining the current package.</p>

<p><code class="language-plaintext highlighter-rouge">in-package</code> is not provided, so there’s no way to get the symbols
back to where they can be accessed. It’s impossible to modify a
package using fake namespacing. Worst of all, implementing
<code class="language-plaintext highlighter-rouge">in-package</code> is currently (and will likely always be) impossible. When
symbols are uninterned they would need to be stored in a package
symbol table for future re-interning. <code class="language-plaintext highlighter-rouge">in-package</code>’s job would be to
unintern and store away the current package’s symbols and then place
the new package’s symbols into the main symbol table.</p>

<p>However, symbols cannot be re-interned. This is because it’s
impossible for a symbol to exist in two different obarrays at the same
time, so the functionality is intentionally not provided. An obarray
is an Elisp vector containing symbols. It’s treated like a hash table:
the symbol is hashed to choose a location in the vector. If the slot
is already taken, the symbol is invisibly chained behind the residing
symbol by an inaccessible linked list. If the symbol was in two
obarrays at once, it would need to be able to chain to two different
symbols at the same time.</p>
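
<p>What does work is interning the same <em>name</em> in two obarrays, which
produces two unrelated symbols. A small illustration, assuming a private
obarray created the traditional way with <code class="language-plaintext highlighter-rouge">make-vector</code> (the
<code class="language-plaintext highlighter-rouge">sketch-</code> names are hypothetical):</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>;; A fresh, empty obarray (hypothetical name).
(defvar sketch-private-obarray (make-vector 17 0))

(let ((private (intern "sketch-name" sketch-private-obarray))
      (global  (intern "sketch-name")))   ; the main obarray
  (list
   ;; Same name, different obarrays: two distinct symbols.
   (eq private global)
   ;; Looking the name up again in the private obarray finds the
   ;; symbol that was interned there.
   (eq private (intern-soft "sketch-name" sketch-private-obarray))))
;; Returns (nil t)
</code></pre></div></div>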

<p>Providing access to symbols through a colon-specified namespace
(<code class="language-plaintext highlighter-rouge">my-package:my-symbol</code>) is also currently impossible — without
hacking in C anyway.</p>

<p>There’s a neat trick to the <code class="language-plaintext highlighter-rouge">:export</code> list. The <code class="language-plaintext highlighter-rouge">defpackage</code> macro
definition actually ignores that list altogether, because it works
automatically. By the time <code class="language-plaintext highlighter-rouge">defpackage</code> is invoked, the listed symbols
have already been interned by the reader, so they get stored in the
snapshot.</p>
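
<p>The reader’s role is easy to see in isolation. In this small
illustration the symbol name is made up, and the result assumes nothing
in the session has interned it before:</p>

<div class="language-cl highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(list
 ;; So far the name only appears inside strings, so no such symbol exists.
 (intern-soft "sketch-unread-name")
 (progn
   ;; Reading the text interns the symbol, before anything is evaluated.
   (read "sketch-unread-name")
   (and (intern-soft "sketch-unread-name") t)))
;; On a first evaluation this returns (nil t)
</code></pre></div></div>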

<p>I doubt I’ll ever make use of this for my own packages. This was
mostly a fun exercise in toying with Elisp.</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elisp Function Composition</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/11/15/"/>
    <id>urn:uuid:86809c4e-9f00-396d-71ab-b48d5950a343</id>
    <updated>2010-11-15T00:00:00Z</updated>
    <category term="elisp"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<!-- 15 November 2010 -->
<p>
During my recent Elisp hacking I've wanted function composition often
enough that I finally implemented it for myself. While there is
an <a href="/blog/2010/09/29/"> <code>apply-partially</code></a>
function, Elisp does not currently come with a <code>compose</code>
function. Here's an Elisp definition,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: f0c736a9-afec-3e3f-455c-40997023e130</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">compose</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">funs</span><span class="p">)</span>
  <span class="s">"Return function composed of FUNS."</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">lex-funs</span> <span class="nv">funs</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">reduce</span> <span class="ss">'funcall</span> <span class="p">(</span><span class="nb">butlast</span> <span class="nv">lex-funs</span><span class="p">)</span>
              <span class="ss">:from-end</span> <span class="no">t</span>
              <span class="ss">:initial-value</span> <span class="p">(</span><span class="nb">apply</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">last</span> <span class="nv">lex-funs</span><span class="p">))</span> <span class="nv">args</span><span class="p">)))))</span></code></pre></figure>
<p>
Here it is in action with three functions.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">compose</span> <span class="ss">'prin1-to-string</span> <span class="ss">'random*</span> <span class="ss">'exp</span><span class="p">)</span> <span class="mi">10</span><span class="p">)</span></code></pre></figure>
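<p>
For a deterministic check of the right-to-left order, here is a hedged
variation rather than the definition above. It is built
on <code>apply-partially</code> so it does not depend
on <code>lexical-let</code>, the <code>sketch-</code> names are made up for
illustration, and it assumes <code>cl-lib</code> is available (Emacs 24.3
or later).
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(require 'cl-lib)

(defun sketch-compose-call (funs &amp;rest args)
  "Apply the last of FUNS to ARGS, then the remaining FUNS right to left."
  (cl-reduce #'funcall (butlast funs)
             :from-end t
             :initial-value (apply (car (last funs)) args)))

(defun sketch-compose (&amp;rest funs)
  "Return a function composed of FUNS (hypothetical variant)."
  (apply-partially #'sketch-compose-call funs))

;; (* 2 3 4) is 24, 1+ makes it 25, number-to-string gives "25".
(funcall (sketch-compose #'number-to-string #'1+ #'*) 2 3 4)</code></pre></figure>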
<p>
I'll be using this in later posts (and linking back here when I do).
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Introducing Java Mode Plus</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/10/15/"/>
    <id>urn:uuid:5cf6344b-537b-36eb-53d8-6ca6b24fd492</id>
    <updated>2010-10-15T00:00:00Z</updated>
    <category term="java"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<!-- 15 October 2010 -->
<p>
There's an extension to Emacs
called <a href="http://jdee.sourceforge.net/"> JDEE</a> which tries to
turn Emacs into a heavyweight IDE for Java. I've never had any success
with it, and I don't know anyone else who has either. It's difficult
to set up, the dependencies are even worse, it's poorly documented, and
then it doesn't seem to work very well anyway. I think it's too
divorced from Emacs'
core <a href="http://en.wikipedia.org/wiki/Composability">
composable</a> functionality to be of much use. I may as well be using
a big IDE.
</p>
<p>
So, instead, as <a href="/blog/2009/12/06/">I've
posted</a> <a href="/blog/2010/09/30/">about</a>
<a href="/blog/2010/10/14/">over time</a>, I've started with the basic
Emacs Java functionality and tweaked my way up from there. I've
extended it enough that I decided to package it up on its own, and
hopefully others will find it useful too. I call
it <code>java-mode-plus</code>!
</p>
<pre>
git clone <a href="https://github.com/skeeto/emacs-java">git://github.com/skeeto/emacs-java.git</a>
</pre>
<p>
Specifically: <a href="https://github.com/skeeto/emacs-java/blob/master/java-mode-plus.el">java-mode-plus.el</a>
</p>
<p>
It provides a hook into <code>java-mode</code> that creates a bunch of
new bindings. It also creates some new globally-available functions
like <code>open-java-project</code>. It's all heavily Ant-based since
that's what I like to use. It wouldn't be very hard to modify it to
use Maven, if that's your thing.
</p>
<p>
<b>My very thorough documentation is in a large header comment in the
source file itself.</b> I cover my whole workflow from top to
bottom. If you're interested in making Emacs more Java-friendly take a
look at it. It's not a lot of code, but each line has been
thoughtfully added after hours and hours of Java development.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Jump to Java Documentation from Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/10/14/"/>
    <id>urn:uuid:4fe4bf26-5447-3337-33bd-1cb0647708b0</id>
    <updated>2010-10-14T00:00:00Z</updated>
    <category term="java"/><category term="emacs"/>
    <content type="html">
      <![CDATA[<!-- 14 October 2010 -->
<p class="abstract">
Update January 2013: this package has been refined and formally
renamed
to <a href="https://github.com/skeeto/javadoc-lookup">javadoc-lookup</a>.
The user interface is essentially the same — under different function
names — but with some extra goodies. It's available for install from
MELPA.
</p>

<p>
I keep either running to a search engine or, when offline, manually
browsing to Java API documentation when I need to look something
up. When I'm using Emacs this is stupid, so I fixed it. I put together
a <code>java-docs</code> package that lets me quickly jump to
documentation from within Emacs.
</p>
<p>
Repository: <code>git clone <a href="https://github.com/skeeto/javadoc-lookup">git://github.com/skeeto/javadoc-lookup.git</a></code>
</p>
<p>
Unfortunately it launches it in a web browser right now because there
doesn't seem to be a reasonable way to render the
documentation <i>inside</i> Emacs itself. So
you'll <a href="http://www.emacswiki.org/emacs/BrowseUrl"> need to
have <code>browse-url</code> set up properly</a> in your
configuration.
</p>
<p>
I strongly recommend you use this
with <a href="http://www.emacswiki.org/emacs/InteractivelyDoThings">
Ido</a>, which comes with Emacs. If you do, you'll want to load
it <i>after</i> you enable <code>ido-mode</code>, which will enable
the Ido minibuffer completion in <code>java-docs</code>.
</p>
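<p>
In an init file, that ordering might look like the following sketch. It
assumes <code>java-docs.el</code> is already on
your <code>load-path</code>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(require 'ido)
(ido-mode 1)          ; enable Ido first
(require 'java-docs)  ; loaded afterwards, so it can pick up Ido completion</code></pre></figure>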
<p>
So, after you <code>require</code> <code>java-docs</code>, you give it
a list of places to look for documentation.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">require</span> <span class="ss">'java-docs</span><span class="p">)</span>
<span class="p">(</span><span class="nv">java-docs</span> <span class="s">"/usr/share/doc/openjdk-6-jdk/api"</span> <span class="s">"~/src/project/doc"</span><span class="p">)</span></code></pre></figure>
<p>
It will scan these locations and build up an index of
classes. <a href="/blog/2010/06/07/">If you're using a recent enough
version of Emacs</a> it will cache that index for faster loading in
the future, since on <i>certain</i> systems it can needlessly take a
bit of time.
</p>
<p>
After that you can jump to documentation with <code>C-h j</code>
(<code>java-docs-lookup</code>). It will ask you what you want to look
up and offer completion with your preferred completion function.
</p>
<p class="center">
  <a href="/img/emacs/java-import.png">
    <img src="/img/emacs/java-import-thumb.png" alt=""/>
  </a>
</p>
<p>
If you don't want to open it up in an external browser, you can set
Emacs to run a text-based browser inside itself.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="k">setq</span> <span class="nv">browse-url-browser-function</span> <span class="ss">'browse-url-text-emacs</span><span class="p">)</span></code></pre></figure>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Set Window to 80 Columns</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/10/06/"/>
    <id>urn:uuid:90343c39-62b8-35c2-d9a7-8b3b29cb8338</id>
    <updated>2010-10-06T00:00:00Z</updated>
    <category term="emacs"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 6 October 2010 -->
<p>
When I'm coding, I maximize Emacs and enable
<a href="http://www.emacswiki.org/emacs/WinnerMode">
<code>winner-mode</code></a>, turning my display into something much
like a <a href="http://en.wikipedia.org/wiki/Tiling_window_manager">
tiling window manager</a>. Then I try not to leave Emacs until it's
necessary. It's a really nice way to work: no mouse touching needed.
</p>
<p>
At work they gave me a nice 24" monitor, 1920 pixels across. That's
just about enough to fit three Emacs windows side-by-side at 78
columns each. The leftmost one contains my active work buffer where I
do most of my typing. The center one is usually split
horizontally. The top half is the <code>*compilation*</code> buffer
and the bottom half is either <a href="/blog/2009/06/23/">Emacs
calculator</a> or an <code>*ansi-term*</code> buffer. The rightmost
window contains something more static, like some sort of reference
material.
</p>
<p>
However, I like my main editing window to be 80 columns wide; 78
columns is just too short. For a while I was creating 80 dashes
(<code>C-u 80 -</code>) and adjusting the window width manually to
size. After doing it a few times I decided to extend Emacs to do it
instead. First define a function to set the current window width.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">set-window-width</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="s">"Set the selected window's width."</span>
  <span class="p">(</span><span class="nv">adjust-window-trailing-edge</span> <span class="p">(</span><span class="nv">selected-window</span><span class="p">)</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">n</span> <span class="p">(</span><span class="nv">window-width</span><span class="p">))</span> <span class="no">t</span><span class="p">))</span></code></pre></figure>

<p>
Wrap it with an interactive function and bind it.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">set-80-columns</span> <span class="p">()</span>
  <span class="s">"Set the selected window to 80 columns."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">set-window-width</span> <span class="mi">80</span><span class="p">))</span>

<span class="p">(</span><span class="nv">global-set-key</span> <span class="s">"\C-x~"</span> <span class="ss">'set-80-columns</span><span class="p">)</span></code></pre></figure>

<p>
For those paying extra attention: instead of writing the extra
function, you could
use <a href="/blog/2010/09/29/">my <code>expose</code> function from
the other day</a>.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">global-set-key</span> <span class="s">"\C-x~"</span> <span class="p">(</span><span class="nv">expose</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="ss">'set-window-width</span> <span class="mi">80</span><span class="p">)))</span></code></pre></figure>

<p>
The problem with this, though, is the dynamically generated function
doesn't have a name or a docstring. Someone
using <code>describe-key</code> would have little information to go
on.
</p>
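<p>
One possible middle ground, as a sketch: a single named command that
reads the width from the prefix argument. The
name <code>my-set-window-columns</code> is made up for this example. It
keeps a name and docstring for <code>describe-key</code> and defaults to
80 columns.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-set-window-columns (&amp;optional n)
  "Set the selected window's width to N columns, 80 by default.
Interactively, N is the numeric prefix argument."
  (interactive "P")
  (let ((columns (if n (prefix-numeric-value n) 80)))
    ;; Same call as `set-window-width' above, with a computed target.
    (adjust-window-trailing-edge (selected-window)
                                 (- columns (window-width))
                                 t)))</code></pre></figure>

<p>
With this, <code>C-u 100 M-x my-set-window-columns</code> would ask for
100 columns, and a plain invocation for 80.
</p>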
]]>
    </content>
  </entry>
    
  
    
  
    
  <entry>
    <title>Emacs Find All Files</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/09/30/"/>
    <id>urn:uuid:6b914b5a-f8f8-3d5d-469d-5d2c25b909c8</id>
    <updated>2010-09-30T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 30 September 2010 -->
<p>
Here's another bit of code I started using recently. I often find
myself wanting to open — or reopen
after <code>kill-matching-buffers</code> — all the files under a
specific point in the file system. I'm using it at work now to open up
all the source files in a deep Java source tree on a small-ish
project. Once it's all open I can switch to any file quickly
with <a href="http://www.emacswiki.org/emacs/InteractivelyDoThings">
ido's fuzzy matching</a>, flattening out the directory structure a
bit. (And the ridiculous "security" software at work imposes a
3-second I/O block when opening files, so I get to pay this all up
front at once rather than having it later
<a href="http://c2.com/cgi/wiki?MentalStateCalledFlow"> break my
flow</a>.)
</p>
<p>
This just recursively travels down the sub-directories opening a
buffer for everything it comes across. It ignores dot-files, like the
ones your source control might litter.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: 72dc0a9e-c41c-31f8-c8f5-d9db8482de1e</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">find-all-files</span> <span class="p">(</span><span class="nv">dir</span><span class="p">)</span>
  <span class="s">"Open all files and sub-directories below the given directory."</span>
  <span class="p">(</span><span class="nv">interactive</span> <span class="s">"DBase directory: "</span><span class="p">)</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nb">list</span> <span class="p">(</span><span class="nv">directory-files</span> <span class="nv">dir</span> <span class="no">t</span> <span class="s">"^[^.]"</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">files</span> <span class="p">(</span><span class="nb">remove-if</span> <span class="ss">'file-directory-p</span> <span class="nb">list</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">dirs</span> <span class="p">(</span><span class="nb">remove-if-not</span> <span class="ss">'file-directory-p</span> <span class="nb">list</span><span class="p">)))</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">file</span> <span class="nv">files</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">find-file-noselect</span> <span class="nv">file</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">dolist</span> <span class="p">(</span><span class="nv">dir</span> <span class="nv">dirs</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">find-file-noselect</span> <span class="nv">dir</span><span class="p">)</span>
      <span class="p">(</span><span class="nv">find-all-files</span> <span class="nv">dir</span><span class="p">))))</span></code></pre></figure>
<p>
One caveat: if you have a symbolic link that creates a file system
loop, this will probably get hung on it.
</p>
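<p>
A sketch of one way around that, assuming it is acceptable to skip any
directory whose true name has already been visited. The function
name <code>my-find-all-files</code> and the <code>seen</code> table are
made up for this example, and unlike the version above it does not open
Dired buffers for the directories themselves.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-find-all-files (dir &amp;optional seen)
  "Open all non-dot files below DIR, visiting each real directory once.
SEEN is a hash table of directory truenames already visited."
  (interactive "DBase directory: ")
  (let ((seen (or seen (make-hash-table :test 'equal)))
        ;; Resolve symlinks and drop any trailing slash so names compare equal.
        (truename (directory-file-name (file-truename dir))))
    (unless (gethash truename seen)
      ;; Record the resolved name so a symlink loop ends here.
      (puthash truename t seen)
      (dolist (file (directory-files dir t "^[^.]"))
        (if (file-directory-p file)
            (my-find-all-files file seen)
          (find-file-noselect file))))))</code></pre></figure>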
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Higher-order Conversion to Interactive</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/09/29/"/>
    <id>urn:uuid:0677cf0d-4ba9-300f-3bf0-795a821b1287</id>
    <updated>2010-09-29T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 29 September 2010 -->
<p>
For those not familiar with extending Emacs, when you create a
function in Elisp it cannot be called directly by the user
("interactively") without declaring the function interactive. The
simplest way to do this is by adding <code>(interactive)</code> to the
top of the function definition. The <code>interactive</code> call can
be made more complex, if needed, to ask the user interactively for
input.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">hello-world</span> <span class="p">()</span>
  <span class="s">"Example function."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">message</span> <span class="s">"hello"</span><span class="p">))</span></code></pre></figure>
<p>
There are some handy higher-order functions in Elisp, such
as <code>compose</code> and <code>apply-partially</code>. Today I
wanted to bind the output of <code>apply-partially</code> to a key. My
situation was this: I use <code>revert-buffer</code> often enough that
it needs a binding. Also because I use it so much, I wanted it to
stop asking me for confirmation. (Yes,
there <a href="http://www.emacswiki.org/emacs/YesOrNoP"> are other
ways to do this</a> including <code>revert-without-query</code>, but I
wanted a general solution.) Using <code>apply-partially</code> I could
supply the needed function arguments at keybind time.
</p>
<p>
The problem is that you can only bind interactive functions, and the
output of <code>apply-partially</code> is not interactive. A quick way
to work around this is to wrap it in an anonymous function, which also
takes away the need for <code>apply-partially</code>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span> <span class="p">(</span><span class="nv">revert-buffer</span> <span class="no">nil</span> <span class="no">t</span><span class="p">))</span></code></pre></figure>
<p>
I'd rather there be <i>another</i> higher-order function that takes a
non-interactive function and creates an interactive version. Here it is,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: c7db6dec-e7ab-3b0f-bf26-0fa268674c6c</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">expose</span> <span class="p">(</span><span class="k">function</span><span class="p">)</span>
  <span class="s">"Return an interactive version of FUNCTION."</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">lex-func</span> <span class="k">function</span><span class="p">))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
      <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
      <span class="p">(</span><span class="nb">funcall</span> <span class="nv">lex-func</span><span class="p">))))</span></code></pre></figure>
<p>
Now the binding looks like this,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">global-set-key</span> <span class="nv">[f2]</span> <span class="p">(</span><span class="nv">expose</span> <span class="p">(</span><span class="nv">apply-partially</span> <span class="ss">'revert-buffer</span> <span class="no">nil</span> <span class="no">t</span><span class="p">)))</span></code></pre></figure>
<p>
I think this more clearly expresses my intention than
the <code>lambda</code> wrapper would. Maybe?
</p>
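<p>
On Emacs 24 or later a file can opt in to lexical binding, in which case
a plain <code>lambda</code> closes over its arguments
and <code>lexical-let</code> is not needed. A minimal sketch under that
assumption; <code>my-expose</code> is a made-up name to avoid clashing
with the definition above.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">;;; -*- lexical-binding: t -*-
;; The cookie above has to be on the first line of the file.

(defun my-expose (function)
  "Return an interactive version of FUNCTION, called with no arguments."
  ;; With lexical binding enabled, this lambda captures FUNCTION directly.
  (lambda ()
    (interactive)
    (funcall function)))

(global-set-key [f2] (my-expose (apply-partially 'revert-buffer nil t)))</code></pre></figure>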
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Distributed Computing with Emacs</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/08/07/"/>
    <id>urn:uuid:9bc67be6-abda-37cd-d34f-ef0e4622358c</id>
    <updated>2010-08-07T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 7 August 2010 -->
<p>
I got an Elisp idea today and even went as far as implementing a proof
of concept for it: distributed computing with Emacs Lisp. As usual for
me, the idea takes advantage of Lisp features, specifically those of
Elisp's implementation, to make the task pretty simple. In this case it's
the Lisp reader, printer, and the fact that Elisp functions have a
printed representation, both byte-compiled and not.
</p>
<p>
Here's the proof of concept code: <a href="/download/dist-emacs.el">dist-emacs.el</a>
</p>
<p>
A central server listens for TCP connections. Clients offering their
CPU for use connect to the server and await instructions. The server
sends a single, no-argument, anonymous function to the client. The
client calls the function, returning the resulting form back to the
server. In order to transmit the function it's encoded into a string
using the Lisp printer, and the client turns it back into an
executable function with the Lisp reader.
</p>
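<p>
As a small illustration of that round trip, with no networking involved,
an uncompiled function can be printed to a string and read back into
something callable. The function body here is arbitrary and only meant as
a sketch.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(let* ((func '(lambda () (* 6 7)))           ; a function as plain list data
       (text (prin1-to-string func))         ; what would go over the wire
       (copy (car (read-from-string text)))) ; what the client reconstructs
  (funcall copy))                            ; evaluates to 42</code></pre></figure>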
<p>
For some simple security there is a shared password between the client
and server. When the server sends a function it includes a signature,
and the client only runs code that matches the signature. To create a
signature the password is prepended to the string-encoded version of the
function (both strings) and the result is hashed with a secure hashing
algorithm. Only someone who knows the password — including other
clients — can create the signature.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">sign-sexp</span> <span class="p">(</span><span class="nv">password</span> <span class="nv">sexp</span><span class="p">)</span>
  <span class="s">"Return signature of the given s-exp."</span>
  <span class="p">(</span><span class="nv">sha1</span> <span class="p">(</span><span class="nb">format</span> <span class="s">"%s%s"</span> <span class="nv">password</span> <span class="nv">sexp</span><span class="p">)))</span></code></pre></figure>
<p>
To make it easy for the client to read in both the signature and the
function we just cons them together before encoding them as text.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">encode</span> <span class="p">(</span><span class="nv">password</span> <span class="nv">sexp</span><span class="p">)</span>
  <span class="s">"Encode a s-exp for transmission to client."</span>
  <span class="p">(</span><span class="nb">prin1-to-string</span> <span class="p">(</span><span class="nb">cons</span> <span class="p">(</span><span class="nv">sign-sexp</span> <span class="nv">password</span> <span class="nv">sexp</span><span class="p">)</span> <span class="nv">sexp</span><span class="p">)))</span></code></pre></figure>
<p>
The client calls the Lisp reader on the string, then checks the
signature in the <code>car</code> cell against the s-expression in
the <code>cdr</code> cell. This will return the function if it's
legitimate, otherwise <code>nil</code>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">decode</span> <span class="p">(</span><span class="nv">password</span> <span class="nv">str</span><span class="p">)</span>
  <span class="s">"Decode string into s-exp, checking the signature in the process."</span>
  <span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nb">cons</span> <span class="p">(</span><span class="nb">read</span> <span class="nv">str</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">sig</span>  <span class="p">(</span><span class="nb">car</span> <span class="nb">cons</span><span class="p">))</span>
         <span class="p">(</span><span class="nv">sexp</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nb">cons</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">equal</span> <span class="nv">sig</span> <span class="p">(</span><span class="nv">sign-sexp</span> <span class="nv">password</span> <span class="nv">sexp</span><span class="p">))</span>
        <span class="nv">sexp</span>
      <span class="no">nil</span><span class="p">)))</span></code></pre></figure>
<p>
And that's the core of it. It just needs some network code to move the
string between computers. That part can be found in the linked source
above.
</p>
<p>
To demo this, I'll use the <code>whiten</code> function from
my <a href="/blog/2010/07/26/">previous post</a>. I'll run it with
three different strings on three different computers. Assume we
started the dist-emacs server (<code>dist-start</code>) and connected
three clients (<code>dist-connect</code>) from three computers to
it. The clients were fired up from scratch so there's
no <code>whiten</code> function on them yet, but there <i>is</i> one
defined on the server. First we'll send the function definition to the
clients. The <code>dist-dist</code> function takes a list of functions
and passes each one to a client. Ideally I'd want this function to be
more intelligent, managing a work queue so that an arbitrary length
list of functions will be fed one at a time to each client. That's not
the case here.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">dist-dist</span> <span class="p">(</span><span class="nb">mapcar</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">p</span><span class="p">)</span>
                          <span class="o">`</span><span class="p">(</span><span class="k">lambda</span> <span class="p">()</span>
                             <span class="p">(</span><span class="nv">fset</span> <span class="ss">'whiten</span> <span class="o">,</span><span class="p">(</span><span class="nb">symbol-function</span> <span class="ss">'whiten</span><span class="p">))))</span>
                      <span class="nv">dist-clients</span><span class="p">))</span></code></pre></figure>
<p>
Also like in the previous post, this is an abstraction leak with the
Emacs implementation. But I like this trick so I'm going to use it
anyway. :-) Next we call it on each client with a different string.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">dist-dist</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">whiten</span> <span class="s">"good"</span><span class="p">))</span>
                 <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">whiten</span> <span class="s">"news"</span><span class="p">))</span>
                 <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">whiten</span> <span class="s">"everyone"</span><span class="p">))))</span></code></pre></figure>
<p>
The way I have it set up for my proof of concept the results are just
spit back into the server's <code>*Messages*</code> buffer. If we
watch that buffer we can see each result come back, one at a time,
as each machine finishes. I can watch Emacs saturate the CPU on every
client machine simultaneously as it works.
</p>
<pre>
"2577343027adf7817185db876032d8ed"
"46a65dac2c0040afde175adf1e9a81fd"
"f39baf9e74475dd5be7d5495a025fe84"
</pre>
<p>
This isn't the same order as the clients, but the order in which the
jobs were completed.
</p>
<p>
As for the practicality, I doubt there really is one. It's really only
a neat concept (or maybe not even that). For almost the exact same
reasons as my <a href="/blog/2009/06/09/">distributed JavaScript</a>
idea, this is a solution looking for a problem. The problem needs to
be able to be broken into small computation units, because Emacs has
no threading, and it has to be low bandwidth, because it has to be
parsed all at once from a string. If you want to pass large data sets
it needs to be done out-of-band, which probably defeats the
purpose. There seem to be few to no problems that fit these
limitations.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Memoize</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/07/26/"/>
    <id>urn:uuid:637ae303-cbcd-3817-8917-26e1640944f5</id>
    <updated>2010-07-26T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 26 July 2010 -->
<p>
<a href="/blog/2008/03/25/">Memoization</a> is something I think
should be packaged as a standard function for just about every
language. That's not generally the case, but luckily this is easy to
fix in Lisps. I needed memoization recently for an Elisp project I'm
working on. I could have hand-written one but a generic memoization
function would have worked just fine. Since I didn't find any generic
Elisp memoization on-line I wrote my own.
</p>
<p>
<b>Download:
  <a href="https://raw.github.com/skeeto/emacs-memoize/master/memoize.el">
    memoize.el
  </a>
</b>
</p>
<p>
Just put it in your <code>load-path</code>
and <code>(require 'memoize)</code> it. Here's the core
function.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: 83bae208-da65-3e26-2ecb-4941fb310848</span>
<span class="p">(</span><span class="nb">defun</span> <span class="nv">memoize-wrap</span> <span class="p">(</span><span class="nv">func</span><span class="p">)</span>
  <span class="s">"Return the memoized version of FUNC."</span>
  <span class="p">(</span><span class="nv">lexical-let</span> <span class="p">((</span><span class="nv">table</span> <span class="p">(</span><span class="nb">make-hash-table</span> <span class="ss">:test</span> <span class="ss">'equal</span><span class="p">)))</span>
    <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">args</span><span class="p">)</span>
      <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">value</span> <span class="p">(</span><span class="nb">gethash</span> <span class="nv">args</span> <span class="nv">table</span><span class="p">)))</span>
        <span class="p">(</span><span class="k">if</span> <span class="nv">value</span>
            <span class="nv">value</span>
          <span class="p">(</span><span class="nv">puthash</span> <span class="nv">args</span> <span class="p">(</span><span class="nb">apply</span> <span class="nv">func</span> <span class="nv">args</span><span class="p">)</span> <span class="nv">table</span><span class="p">))))))</span></code></pre></figure>

<p>
The hash table is stored inside the fake closure provided by
<code>lexical-let</code>. In a previous version of this function, I
stored it in an uninterned symbol, which is what is going on behind
the scenes of <code>lexical-let</code>.
</p>
<p>
Note that in the full code it keeps the original function
documentation intact. I want the memoization wrapper to be as
unobtrusive as possible.
</p>
<p>
Here's a demo of it in action. This <code>whiten</code> function is
computationally expensive: it performs key whitening. It repeats a
hash function thousands of times to produce an expensive value. This
isn't something you generally want to memoize, but stick with me.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">whiten</span> <span class="p">(</span><span class="nv">key</span><span class="p">)</span>
  <span class="s">"Perform key whitening with the md5 hash function."</span>
  <span class="p">(</span><span class="nb">dotimes</span> <span class="p">(</span><span class="nv">i</span> <span class="mi">100000</span> <span class="nv">key</span><span class="p">)</span>
    <span class="p">(</span><span class="k">setq</span> <span class="nv">key</span> <span class="p">(</span><span class="nv">md5</span> <span class="nv">key</span><span class="p">))))</span>

<span class="p">(</span><span class="nv">whiten</span> <span class="s">"password"</span><span class="p">)</span>   <span class="c1">; takes a couple of seconds</span></code></pre></figure>

<p>
On my laptop that takes a couple of seconds to run. Increase that
counter if it's quick on your computer. My memoize package provides
a <code>memoize</code> function which will create a new function that
wraps the original, then installs the new function in place of the old
one if we give it the function symbol.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">memoize</span> <span class="ss">'whiten</span><span class="p">)</span></code></pre></figure>

<p>
The first time you run it after memoization it will be slow, but after
that the memoization kicks in for a quick return.
</p>
<p>
There are two Elisp specific issues at hand. First is that memoizing
an interactive function will produce a non-interactive function. It
would be easy to fix this problem when it comes to non-byte-compiled
functions, but recovering the interactive definition from a
byte-compiled function is more complex than I care to deal
with. Besides, interactive functions are always used for their side
effects so there's no reason to memoize them.
</p>
<p>
Second is a limitation of how the wrapper uses Elisp hash tables. Called
with two arguments, <code>gethash</code> returns nil both for a stored nil
value and for a missing key, so the wrapper cannot tell them apart. This
means you cannot memoize nil returns. But a computationally
expensive function shouldn't be returning nil anyway.
</p>
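<p>
For what it's worth, <code>gethash</code> accepts an optional third
argument, a default to return when the key is missing. A sketch of how a
unique marker passed there could let nil results be cached too. It
assumes lexical binding (Emacs 24 or later), and the
names <code>my-memoize-wrap</code> and <code>missing</code> are made up
for this example.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">;;; -*- lexical-binding: t -*-

(defun my-memoize-wrap (func)
  "Return a memoized version of FUNC that also caches nil results."
  (let ((table (make-hash-table :test 'equal))
        ;; An uninterned symbol: no function can return this exact object.
        (missing (make-symbol "missing")))
    (lambda (&amp;rest args)
      (let ((value (gethash args table missing)))
        (if (eq value missing)
            ;; First call with ARGS: compute, store, and return the result.
            (puthash args (apply func args) table)
          value)))))</code></pre></figure>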
<p>
<i>Update</i>: As of August 2012, several other people and I have
gotten good mileage out of this function! It's an essential part of my
Emacs dotfiles.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Byte Compilation</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/07/01/"/>
    <id>urn:uuid:0c22fcdf-5f7b-315f-5a59-54ce3d7804c3</id>
    <updated>2010-07-01T00:00:00Z</updated>
    <category term="emacs"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 1 July 2010 -->
<p>
A feature unique to some Lisps is the ability to compile functions
individually at any time. This could be to a bytecode or native code,
depending on the dialect and implementation. In Lisp implementations
where compilation matters (such as <a href="http://clisp.cons.org/">
CLISP</a>), there are typically two forms in which code can be
evaluated: a slower, unoptimized uncompiled form and a fast, efficient
compiled form. The uncompiled form would have some sort of advantage,
even if it's merely not having to spend time on compilation.
</p>
<p>
In Emacs Lisp, the uncompiled form of a function is just a lambda
s-expression. The only thing that gives it a name is the symbol it's
stored in. The compiled form is a (special) vector, with the actual
byte codes stored in a string as the second element. Constants, the
docstring, and other things are stored in this function vector as
well. The Elisp function to compile functions
is <code>byte-compile</code>. It can be given a lambda function or a
symbol. In the case of a symbol, the compiled function is installed
over top of the s-expression form.
</p>
<pre>
(byte-compile (lambda (x) (* 2 x)))
  <i>=&gt; #[(x) "^H\301_\207" [x 2] 2]</i>
</pre>
<p>
The compiler will not only convert the function to bytecode and expand
macros, but also perform optimizations such as removing dead code,
evaluating safe constant forms, and inlining functions. This provides a
nice performance boost (testing using my <a href="/blog/2009/05/28/">
measure-time macro</a>),
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">fib</span> <span class="p">(</span><span class="nv">n</span><span class="p">)</span>
  <span class="s">"Fibonacci sequence."</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">&lt;=</span> <span class="nv">n</span> <span class="mi">2</span><span class="p">)</span> <span class="mi">1</span>
    <span class="p">(</span><span class="nb">+</span> <span class="p">(</span><span class="nv">fib</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">))</span> <span class="p">(</span><span class="nv">fib</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">n</span> <span class="mi">2</span><span class="p">)))))</span></code></pre></figure>
<pre>
(measure-time
 (fib 30))
  <i>=&gt; 1.0508708953857422</i>

(byte-compile 'fib)

(measure-time
 (fib 30))
  <i>=&gt; 0.4302399158477783</i>
</pre>
<p>
Most of the installed functions in a typical Emacs instance are
already compiled, since they are loaded already compiled. But a number
of them <i>aren't</i> compiled. So, I thought, why not spend a few
seconds to do this?
</p>
<p>
In Common Lisp, there is a predicate for testing whether a function
has been compiled or not: <code>compiled-function-p</code>. For
whatever reason, there is no equivalent predefined in Elisp, so I
wrote one,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">byte-compiled-p</span> <span class="p">(</span><span class="nv">func</span><span class="p">)</span>
  <span class="s">"Return t if function is byte compiled."</span>
  <span class="p">(</span><span class="nb">cond</span>
   <span class="p">((</span><span class="nb">symbolp</span>   <span class="nv">func</span><span class="p">)</span> <span class="p">(</span><span class="nv">byte-compiled-p</span> <span class="p">(</span><span class="nb">symbol-function</span> <span class="nv">func</span><span class="p">)))</span>
   <span class="p">((</span><span class="nb">functionp</span> <span class="nv">func</span><span class="p">)</span> <span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nv">sequencep</span> <span class="nv">func</span><span class="p">)))</span>
   <span class="p">(</span><span class="no">t</span> <span class="no">nil</span><span class="p">)))</span></code></pre></figure>
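<p>
For byte-code objects specifically, Emacs has a built-in
predicate, <code>byte-code-function-p</code>. A sketch of the same check
built on it; <code>my-byte-compiled-p</code> is a made-up name, and
unlike the version above it answers nil for primitives written in C.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-byte-compiled-p (func)
  "Return non-nil if FUNC, a function or function symbol, is byte-compiled."
  (when (and (symbolp func) (fboundp func))
    ;; Follow aliases down to the underlying function object.
    (setq func (indirect-function func)))
  (byte-code-function-p func))</code></pre></figure>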
<p>
My idea was to iterate over every interned symbol and, if the function
slot contains an uncompiled function, using the test above, I would
call <code>byte-compile</code> on it. Well, it turns out
that <code>byte-compile</code> is very flexible and will ignore
symbols with no function and symbols with already compiled functions.
</p>
<p>
So next, how do we iterate over every interned symbol? There is
a <code>mapatoms</code> function for this. Provide it a function and
it calls it on every interned symbol. Well, that's simple and
anticlimactic.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">mapatoms</span> <span class="ss">'byte-compile</span><span class="p">)</span></code></pre></figure>
<p>
That's it! It will take only a few seconds and spew a lot of
warnings. I haven't found a way to disable those warnings, so this
isn't something you'd want to have run automatically, unless you like
having an extra window thrown in your face. I've only discovered this
recently, so I'm not sure what sort of bad things this may do to your
Emacs session. Not every function was written with compilation in
mind. There are interactions with macros to consider.
</p>
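<p>
One thing that may help with the noise, offered as a sketch rather than
something verified across Emacs versions: the byte compiler consults
the <code>byte-compile-warnings</code> variable, and binding it to nil
around the call asks it to issue none of its optional warnings. Errors
would presumably still be reported. The
name <code>my-quiet-byte-compile</code> is made up for this example.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(require 'bytecomp)   ; make sure `byte-compile-warnings' is defined

(defun my-quiet-byte-compile (symbol)
  "Byte-compile SYMBOL's function with compiler warnings switched off."
  ;; nil means no warning categories are enabled during this call.
  (let ((byte-compile-warnings nil))
    (byte-compile symbol)))</code></pre></figure>
<p>
It could then be handed to <code>mapatoms</code> in place
of <code>byte-compile</code>.
</p>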
<p>
I doubt there will be a noticeable performance difference. Like I said
before, most everything is already compiled, and those are the
functions that get used the most. There's just something nice about
knowing all your functions are compiled and optimized.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs ParEdit and IELM</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/06/10/"/>
    <id>urn:uuid:2fdb7ef9-e9c6-379c-073f-ab33fc8f5875</id>
    <updated>2010-06-10T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 10 June 2010 -->
<p>
<a href="http://www.emacswiki.org/emacs/ParEdit">ParEdit</a> is a
powerful extension to Emacs that I've just begun using. It's
a minor mode that forces all parentheses, square brackets, and quotes
to be balanced at all times. While it's useful for any programming
language, it's especially suited for Lisps, because it's designed for
manipulating nested parentheses, i.e. s-expressions. It's not
currently part of Emacs so you have to drop the script in
your <code>load-path</code> somewhere.
</p>
<p>
I've frequently thought that a Lisp-based shell would be an
interesting and powerful tool, much like a normal Lisp REPL. Programs
would be treated like Lisp functions. For example,
</p>
<pre>
wellons@luna:~$ (ls -l .emacs)
-rw------- 1 wellons wellons 4859 2010-06-10 23:20 .emacs
wellons@luna:~$
</pre>
<p>
But typing all those parentheses all the time would be quite the
nuisance. I know this from experience typing at Lisp REPLs. I imagined
something that works exactly like ParEdit would be needed to make all
that work go away. To save even more time each prompt would begin with
a nested pair, with the cursor placed between them. Then typing a
quick command is no different than a normal shell.
</p>
<pre>
wellons@luna:~$ ()
</pre>
<p>
Well, in Emacs we have both ParEdit and REPLs, so we can compose these
features together with just a little advice. Here's how to do it with
the Interactive Emacs-Lisp Mode (IELM) REPL. First tell IELM to use
ParEdit,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'ielm-mode-hook</span> <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">paredit-mode</span> <span class="mi">1</span><span class="p">)))</span></code></pre></figure>
<p>
The function in IELM that spits out the next prompt
is <code>ielm-eval-input</code>, so we give it the advice to call the
ParEdit function afterwards to insert a parenthesis pair.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">defadvice</span> <span class="nv">ielm-eval-input</span> <span class="p">(</span><span class="nv">after</span> <span class="nv">ielm-paredit</span> <span class="nv">activate</span><span class="p">)</span>
  <span class="s">"Begin each IELM prompt with a ParEdit parenthesis pair."</span>
  <span class="p">(</span><span class="nv">paredit-open-round</span><span class="p">))</span></code></pre></figure>
<p>
And that's it! Note that the first IELM prompt is not placed by this
function, so the pair won't appear until the second prompt.
</p>
<pre>
*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP>
ELISP> ()
</pre>
<p>
If you want to enter a single atom and don't need parentheses, just
hit backspace once. This is much less common so it gets the extra
keystroke.
</p>
<p>
This can be done for <code>inferior-lisp</code>
and <a href="/blog/2010/01/15/">SLIME</a> to enhance those REPLs as
well. You just have to figure out which defun to advise.
</p>
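<p>
On Emacs 24.4 and later, the same advice could also be written
with <code>advice-add</code> rather than <code>defadvice</code>. This
is a sketch under the assumption that ParEdit is already loaded;
<code>my-ielm-open-pair</code> is a name invented here.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-ielm-open-pair (&amp;rest _args)
  "Insert a balanced parenthesis pair at the fresh IELM prompt.
Hypothetical helper; the arguments given to the advised function are ignored."
  (paredit-open-round))

;; Run the helper after each evaluation, once the next prompt is in place.
(advice-add 'ielm-eval-input :after #'my-ielm-open-pair)</code></pre></figure>
<p>
The advice can be taken back out with <code>(advice-remove
'ielm-eval-input #'my-ielm-open-pair)</code>.
</p>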
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Printed Hash Tables</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/06/07/"/>
    <id>urn:uuid:3a623665-57e7-3a74-cb71-fef1573e0e09</id>
    <updated>2010-06-07T00:00:00Z</updated>
    <category term="emacs"/><category term="javascript"/>
    <content type="html">
      <![CDATA[<!-- 7 June 2010 -->
<p>
A printed hash table representation is pretty new to Elisp, and a bit
late. As far as I know Elisp didn't come with a way to print, and read
back in, a hash table without rolling your own
(like <a href="http://curiousprogrammer.wordpress.com/2010/06/07/data-dumper-in-emacs-lisp/">
Jared Dilettante was doing with a Data::Dumper style output</a>),
until 23.1 in July 2009. This is
when <a href="http://www.gnu.org/software/emacs/NEWS.23.1">
<code>json.el</code> was first included with Emacs</a>, for dumping to
and reading from <a href="http://en.wikipedia.org/wiki/JSON">JSON</a>.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">require</span> <span class="ss">'json</span><span class="p">)</span>

<span class="p">(</span><span class="k">setq</span> <span class="nv">hash</span> <span class="p">(</span><span class="nb">make-hash-table</span><span class="p">))</span>
<span class="p">(</span><span class="nv">puthash</span> <span class="s">"key1"</span> <span class="s">"data1"</span> <span class="nv">hash</span><span class="p">)</span>
<span class="p">(</span><span class="nv">puthash</span> <span class="s">"key2"</span> <span class="s">"data2"</span> <span class="nv">hash</span><span class="p">)</span>

<span class="p">(</span><span class="nv">insert</span> <span class="s">"\n;; "</span> <span class="p">(</span><span class="nv">json-encode</span> <span class="nv">hash</span><span class="p">))</span>
<span class="c1">;; {"key2":"data2", "key1":"data1"}</span></code></pre></figure>

<p>
Just a month ago Emacs 23.2 came out, very silently
including <a href="http://www.gnu.org/software/emacs/NEWS.23.2">a new
printed representation for hash tables</a> with a <code>#s</code> hash
notation.
</p>

<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="sx">#s</span><span class="p">(</span><span class="nc">hash-table</span> <span class="nv">data</span> <span class="p">(</span><span class="s">"key1"</span> <span class="s">"data1"</span> <span class="s">"key2"</span> <span class="s">"data2"</span><span class="p">))</span></code></pre></figure>

<p>
With this, hash tables can be printed and read as part of normal
s-expressions with the standard lisp reader and printer functions.  It
seems heavy, having to write out "<code>hash-table</code>" in there,
but I think it's because the <code>#s</code> notation will be used to
create printed forms of other lisp objects that currently do not have
one.
</p>
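<p>
As a quick round trip, the sketch below prints a table to a string and
reads it back. It assumes Emacs 23.2 or later, and it uses
an <code>equal</code> test so that string keys can be looked up again
in the copy. The name <code>my-hash-round-trip</code> is just for
illustration.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-hash-round-trip (table)
  "Print TABLE to a string, then read it back as a new hash table.
Hypothetical helper, not part of Emacs."
  (read (prin1-to-string table)))

(let ((table (make-hash-table :test 'equal)))
  (puthash "key1" "data1" table)
  (puthash "key2" "data2" table)
  (let ((copy (my-hash-round-trip table)))
    ;; COPY is a distinct table holding the same keys and values.
    (list (hash-table-p copy)
          (gethash "key1" copy)
          (gethash "key2" copy))))</code></pre></figure>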
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs cat-safe</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/03/31/"/>
    <id>urn:uuid:47f5f041-2bfc-363f-0a94-73abb0b96ca6</id>
    <updated>2010-03-31T00:00:00Z</updated>
    <category term="emacs"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 31 March 2010 -->
<p>
<img src="/img/emacs/cat-paw.jpg" class="right"
     alt="" title="Calvin's front right paw."/>

I was inspired by an item
in <a href="http://www.terminally-incoherent.com/blog/">
Luke's</a> <a href="http://random.terminally-incoherent.com/post/484785491">
Tumblr blog</a> last night. It was a screenshot of a program
called <a href="http://www.bitboost.com/pawsense/">PawSense</a>, which
monitors a computer's keyboard for cat activity. (I don't know if it's
any good, but it's funny.) As anyone with cats knows, it's not unusual
to leave a computer only to come back later to see garbage typed in by
a wandering cat. I wrote a version for Emacs today.
</p>
<pre>
git clone <a href="https://github.com/skeeto/cat-safe">git://github.com/skeeto/cat-safe.git</a>
</pre>
<p>
Put it (<code>cat-safe.el</code>) somewhere in
your <code>load-path</code> (like <code>~/.emacs.d/</code>) and put
this line in your <code>.emacs</code> file,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">require</span> <span class="ss">'cat-safe</span><span class="p">)</span></code></pre></figure>
<p class="center">
<img src="/img/emacs/cat-safe-thumb.png" title="cat-safe in action"
     alt="Emacs switches focus to a new buffer to stop cat damage."/>
</p>
<p>
This only monitors Emacs itself; it should help protect your buffers
but not your web browser. When cat interference is detected Emacs
switches focus to a junk buffer and lets the cat make a mess there
instead. In case your
cat <a href="http://en.wikipedia.org/wiki/Infinite_monkey_theorem">happens
to type out some Shakespeare</a> you will be able to read it in the
junk buffer. Just kill the junk buffer to return to work.
</p>
<p>
It could still use some improvement. Right now it looks for a single
key being held down, excepting keys humans tend to hold down like
backspace, delete, and space. If you play around with it you'll notice
if you press several keys at once Emacs will sometimes create a
pattern with them. I need to figure out a good way to detect this.
</p>
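<p>
The single-key heuristic can be sketched as a small predicate. This is
not the code from <code>cat-safe.el</code>; the names, the exempt list,
and the idea of passing in a newest-first list of recent keys are all
assumptions made for illustration.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defvar my-cat-exempt-keys '(?\s ?\d backspace delete)
  "Keys a human plausibly holds down (hypothetical list).")

(defun my-cat-suspicious-p (recent-keys threshold)
  "Non-nil if the newest THRESHOLD keys are one repeated, non-exempt key.
RECENT-KEYS is a list of key events, newest first.  THRESHOLD is a
positive integer."
  (let* ((extra (- (length recent-keys) threshold))
         (run (butlast recent-keys extra)) ; keep the newest THRESHOLD keys
         (key (car run)))
    (and (= (length run) threshold)        ; enough history to judge
         (null (remove key run))           ; every key in the run is the same
         (not (memq key my-cat-exempt-keys)))))

(my-cat-suspicious-p '(?j ?j ?j ?j ?a) 4)</code></pre></figure>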
<p>
I'm going to run it at home for a while to make sure it remains
transparent, but still does its job. It will probably incur a
performance penalty on frequently repeated keyboard macros.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Setting up a Common Lisp Environment</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2010/01/15/"/>
    <id>urn:uuid:3d8b15b6-6f09-3969-b991-b37fa4c13ab4</id>
    <updated>2010-01-15T00:00:00Z</updated>
    <category term="emacs"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 15 January 2010 -->
<p class="abstract">
Update August 2011: Things have changed again, which has always been
the problem with Slime, and the reason I originally wrote
this. Currently, I think the best way to install Slime is
with <a href="http://www.quicklisp.org/">Quicklisp</a>
using <code>quicklisp-slime-helper</code>.
</p>
<p>
Common Lisp is possibly the most advanced programming language. Think
of pretty much any programming language feature and Common Lisp
probably has it. Since lisp is the programmable programming language,
when someone invents a new language feature it can probably be added
to Common Lisp without even touching the language core.
</p>
<p>
<img src="/img/emacs/slime.png" alt="" title="SBCL SLIME REPL"
     class="right"/>

However, if you're interested in digging into Common Lisp to try it
out, you may find yourself quickly running into walls just getting
started. It's a lot different than other programming environments you
may be used to. The Common Lisp tutorials generally skip this step,
assuming the user has an environment, or leaving that setup for the
"vendor" to handle. So, here's a guide to setting up a great Common
Lisp environment with <a href="http://www.gnu.org/software/emacs/">
Emacs</a> and <a href="http://common-lisp.net/project/slime/">
SLIME</a>. It should work with any Common Lisp implementation and any
operating system that can run Emacs (i.e. most of them). Even a much
less capable one like Windows.
</p>
<p>
First, you need to pick a Common Lisp implementation and install
it. Ideally, it should end up in your PATH. Like C, the language is
defined solely by its standardized specification, rather than some
canonical implementation. <a href="http://www.sbcl.org/"> Steel Bank
Common Lisp</a> (SBCL) is currently
the <a href="http://www.cliki.net/Benchmarks">
highest</a> <a href="http://john.freml.in/sbcl-optimise">
performing</a> implementation, it's Free Software, and it runs on a
wide variety of platforms, so take a look at that one if you're not
sure.
</p>
<p>
Next, install Emacs. We're using Emacs not just because it's the best
text editor ever created. <code>:-D</code> It's because that's what
SLIME is written for, and Emacs is a lisp-aware editor. Really, Emacs
is a lisp interpreter that <i>happens</i> to be geared towards
text-editing. It's accused of breaking the rules of unix by being a
single, monolithic program, but it's really a whole bunch of small
lisp programs. You can even have a lisp REPL in Emacs
(<code>ielm</code>), similar to what we will have once we're done
here. It plays very well with Common Lisp.
</p>
<p>
If you're unfamiliar with Emacs, you should stop here and familiarize
yourself with it a bit. Really, you could spend a decade learning
Emacs and still have more to learn. The tutorial should be good enough
for now. Fire up Emacs and run the tutorial by pressing
<code>control+h</code> then <code>t</code>. In Emacs notation,
that's <code>C-h t</code>. <code>C-h</code> is the help/documentation
prefix, which can be used to look up variables/symbols
(<code>v</code>), functions (<code>f</code>), key bindings
(<code>k</code>), info manuals (<code>i</code>), the current mode
(<code>m</code>), and apropos (searching) (<code>a</code>). In the
info manuals, you should be able to find the full Emacs manual, Elisp
reference, and Elisp tutorial, since they are generally installed
alongside Emacs these days. Nearly anything you might need to know can
be found inside the included documentation.
</p>
<p>
Next, install SLIME. I'll be a bit more specific for this one. Make
a <code>.emacs.d</code> directory in your home directory (whatever
your HOME environment variable is set to). This is a common place to
put user-installed Emacs extensions. You will be putting
your <code>slime</code> directory in here. There are two basic ways to
obtain SLIME, as indicated right on their main page. You can do a CVS
checkout of the SLIME repository, which allows you to follow it and
run the latest version. Or you can grab a snapshot of the repository,
which is provided, and dump it in there. Since I like you so much,
I'll give you a third option. Here's a Git repository, maintained by
someone very kind, that follows SLIME's CVS repository,
</p>
<pre>
git clone git://git.boinkor.net/slime.git
</pre>
<p>
Ultimately, you should have a directory <code>~/.emacs.d/slime/</code>
that contains a bunch of SLIME source files directly inside.
</p>
<p>
Now, we tell Emacs where SLIME is and how to use it. Make
a <code>.emacs</code> file in your home directory, if you haven't
already, and put this in it,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">add-to-list</span> <span class="ss">'load-path</span> <span class="s">"~/.emacs.d/slime/"</span><span class="p">)</span>
<span class="p">(</span><span class="nb">require</span> <span class="ss">'slime</span><span class="p">)</span>
<span class="p">(</span><span class="nv">slime-setup</span> <span class="o">'</span><span class="p">(</span><span class="nv">slime-repl</span><span class="p">))</span></code></pre></figure>
<p>
Once it's saved, either restart Emacs, or simply evaluate those lines
by putting the cursor after each of them in turn and typing <code>C-x
C-e</code>. If you did everything right so far, you shouldn't have any
errors. (If you did, go back up and see what you did wrong.) If your
Common Lisp installation didn't end up in your PATH as
"<code>lisp</code>" (not uncommon) for some reason, you may need to
tell Emacs where it is. For example, I can point directly to my SBCL
installation with this line,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="k">setq</span> <span class="nv">inferior-lisp-program</span> <span class="s">"/usr/bin/sbcl"</span><span class="p">)</span></code></pre></figure>
<p>
If everything is set up right, fire up SLIME with "<code>M-x
slime</code>". It should compile the back-end, called swank, and run a
Common Lisp REPL as an inferior process to Emacs. You should end up
with a nice prompt like this,
</p>
<pre>
CL-USER>
</pre>
<p>
At this line, you can start evaluating lisp expressions as you
please. But this isn't where the true power of SLIME comes in
yet. I'll give you an example: make a new file with
a <code>.lisp</code> extension and open it. Throw some lisp in there,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">adder</span> <span class="p">(</span><span class="nv">x</span><span class="p">)</span>
  <span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nv">y</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span></code></pre></figure>
<p>
Type <code>C-c C-k</code> and it will send the current buffer over to
be compiled and loaded. This code here uses a closure, so you know you
aren't accidentally using Emacs lisp, as it doesn't have closures. At
the REPL you can call it,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="nv">CL-USER&gt;</span> <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nv">adder</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">6</span><span class="p">)</span></code></pre></figure>
<p>
Which will print the return value, <code>11</code>. That's all there
is to it. You write code in the buffer, then with a simple keystroke
send it to the Common Lisp system to be evaluated and loaded. Because
the SLIME key bindings eclipse the Emacs lisp key bindings, you can
type this same line in the lisp source buffer, place the cursor at the
end, and type <code>C-x C-e</code>, which will send it out to Common Lisp to be
evaluated. Look at the mode help (<code>C-h m</code>) to see all the
key bindings made available.
</p>
<p>
This is a great programming environment that makes Common Lisp all the
more fun to use. You run a single, continuous instance of your program,
growing it gradually. (This is exactly how I
built <a href="/blog/2009/05/17/">my Emacs web server</a> with elisp.)
You can test your code as soon as it's written.
</p>
<p>
The setup can get even more advanced. The Common Lisp REPL need not be
running on the same computer. It can be running on another computer,
as long as SLIME is able to connect to it over the network. Several
developers could even share a single Common Lisp process running on a
common machine. Lots of possibilities.
</p>
<p>
If you don't have a Common Lisp book yet,
there's <a href="http://gigamonkeys.com/book/">Practical Common
Lisp</a>, which you can read at no cost online
or <a href="http://www.computer-books.us/lisp_0004.php">download</a>
for reading offline. It's based on an Emacs and SLIME setup, so you'll
be right on track.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  <entry>
    <title>Tweaking Emacs for Ant and Java</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/12/06/"/>
    <id>urn:uuid:42b07d86-b8d5-3992-5b5e-ad5c41b9256d</id>
    <updated>2009-12-06T00:00:00Z</updated>
    <category term="emacs"/><category term="java"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 6 December 2009 -->
<p class="abstract">
Update: This is now part of my
<a href="https://github.com/skeeto/emacs-java">
<code>java-mode-plus</code></a> Emacs extension.
</p>
<p>
Developing C in Emacs is a real joy, and it's mostly thanks to the
compile command. Once you have your Makefile (or SConstruct or
whatever build system you like) set up and you want to compile your
latest changes, just run <code>M-x compile</code>, which will run your
build system in a buffer. You can then step through the errors and
warnings with <code>C-x `</code>, and Emacs will take you to them.
It's a very nice way to write code.
</p>
<p>
I use the compile command so much that I bound it to <code>C-x
C-k</code> (<code>C-k</code> tends to be part of compile key
bindings),
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">global-set-key</span> <span class="s">"\C-x\C-k"</span> <span class="ss">'compile</span><span class="p">)</span></code></pre></figure>
<p>
Until recently, I didn't have as nice of a setup for Java. Since they
generally force offensive IDEs onto me at work this wasn't something I
needed yet anyway, but <i>I</i> get to choose my environment on a new
project this time. If you're using Makefiles for some reason when
building your Java project, it still works out fairly well because
they're usually called recursively. It gets more complicated with <a
href="http://ant.apache.org/">Ant</a>, where there is only one
top-level build file. Emacs' compile command only runs the build
command in the buffer's current directory.
</p>
<p>
I know three solutions to this problem. One is to provide the build
file's absolute path when <code>compile</code> asks for the command
with the <code>-buildfile</code> (<code>-f</code>) option. You only
need to type it once per Emacs session, so that's not <i>too</i> bad.
</p>
<pre>
ant -emacs -buildfile /path/to/build.xml
</pre>
<p>
It's not well documented, but there is a <code>-find</code> option
that can be given to Ant that will cause it to search for the build
file itself. This is even nicer than the previous solution. Just
remember to place it last, unless you give it the build filename
too. For example, if you wanted to run the <code>clean</code> target,
</p>
<pre>
ant -emacs clean -find
</pre>
<p>
To keep the actual call as simple as possible, I wrote a wrapper for
<code>compile</code>, and put a hook in <code>java-mode</code> to
change the local binding. The wrapper, <code>ant-compile</code>,
searches for the build file the same way <code>-find</code> would do.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">ant-compile</span> <span class="p">()</span>
  <span class="s">"Traveling up the path, find build.xml file and run compile."</span>
  <span class="p">(</span><span class="nv">interactive</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">with-temp-buffer</span>
    <span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nv">file-exists-p</span> <span class="s">"build.xml"</span><span class="p">))</span>
                <span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nb">equal</span> <span class="s">"/"</span> <span class="nv">default-directory</span><span class="p">)))</span>
      <span class="p">(</span><span class="nv">cd</span> <span class="s">".."</span><span class="p">))</span>
    <span class="p">(</span><span class="nv">call-interactively</span> <span class="ss">'compile</span><span class="p">)))</span></code></pre></figure>
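<p>
The upward search could also lean
on <code>locate-dominating-file</code>, which walks up the directory
tree looking for a named file. This is a sketch that assumes an Emacs
recent enough to provide that function; the <code>my-</code> names are
invented here.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-ant-build-directory ()
  "Return the nearest directory at or above the current one with build.xml.
Return nil when no such directory exists."
  (locate-dominating-file default-directory "build.xml"))

(defun my-ant-compile ()
  "Run `compile' from the directory containing build.xml, if one is found."
  (interactive)
  ;; Rebinding `default-directory' only affects this one call.
  (let ((default-directory (or (my-ant-build-directory) default-directory)))
    (call-interactively 'compile)))</code></pre></figure>
<p>
Because the directory is bound with <code>let</code>, no temporary
buffer is needed and the source buffer's own directory is left
untouched.
</p>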
<p>
So I can transparently keep using my muscle memory compile binding, I
set up the key binding in a hook,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">add-hook</span> <span class="ss">'java-mode-hook</span>
          <span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="p">(</span><span class="nv">local-set-key</span> <span class="s">"\C-x\C-k"</span> <span class="ss">'ant-compile</span><span class="p">)))</span></code></pre></figure>
<p>
Voila! Java work looks a little bit more like C.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Emacs Web Servlets</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/11/03/"/>
    <id>urn:uuid:b0a4e98c-4cf7-3c5b-6425-ca437c1ca4ee</id>
    <updated>2009-11-03T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 3 November 2009 -->
<p>
Remember that <a href="/blog/2009/05/17/">Emacs web server I wrote</a>
back in May? Well, I got an e-mail last night from Chunye Wang
containing a patch with a variant of my dynamic lisp idea, called
"servlets" (not to be confused with Java servlets). Chunye had a similar
concept for an Emacs web server for a long time, but never implemented
it because Emacs lacked network functionality until recently
(specifically, <a href="http://www.gnu.org/software/emacs/NEWS.22.1">
<code>make-network-process</code></a> in Emacs 22.1, June 2007). This
led Chunye to find my implementation.
</p>
<p>
Again, you can clone/view the code here. I turned the patch into a
series of commits,
</p>
<pre>
git clone <a href="https://github.com/skeeto/emacs-http-server">git://github.com/skeeto/emacs-http-server.git</a>
</pre>
<p>
This is some cool stuff here.
</p>
<p>
The servlets are simply functions installed under an
"<code>httpd/</code>" namespace, where the trailing slash represents
the server root. So, the function <code>httpd/example-servlet</code>
will be executed when "/example-servlet" is requested from the
server. The servlet runs on a temporary buffer, whose contents are
served when the servlet function returns.
</p>
<p>
To assist in HTML generation, Chunye also wrote a function to turn an
<a href="http://en.wikipedia.org/wiki/S-expression">S-expression</a>
into HTML, similar to the one I described in the previous web server
post. Symbols are converted into strings, alists are attributes, and
the <code>elisp</code> symbol indicates code to be executed, with the
results used to generate HTML. For a simple hello world,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">html</span> <span class="p">(</span><span class="nv">head</span> <span class="p">(</span><span class="nv">title</span> <span class="s">"hello world"</span><span class="p">))</span> <span class="p">(</span><span class="nv">body</span> <span class="s">"hello world"</span><span class="p">))</span></code></pre></figure>
<p>
And for some dynamic content, a die roller,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">httpd/roll-die</span> <span class="p">(</span><span class="nv">uri-query</span> <span class="nv">req</span> <span class="nv">uri-path</span><span class="p">)</span>
  <span class="s">"Rolls a die with the requested number of sides (default 6)."</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">sides</span>
         <span class="p">(</span><span class="nv">string-to-number</span> <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">cadr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="s">"sides"</span> <span class="nv">uri-query</span><span class="p">))</span> <span class="s">"6"</span><span class="p">))))</span>
    <span class="p">(</span><span class="nv">httpd-generate-html</span>
     <span class="o">'</span><span class="p">(</span><span class="nv">html</span>
       <span class="p">(</span><span class="nv">head</span>
        <span class="p">(</span><span class="nv">title</span> <span class="s">"Die Roll Servlet"</span><span class="p">))</span>
       <span class="p">(</span><span class="nv">body</span>
        <span class="p">(</span><span class="nv">h1</span> <span class="s">"Die Roll Servlet"</span><span class="p">)</span>
        <span class="s">"You rolled a "</span>
        <span class="p">(</span><span class="nv">b</span>
         <span class="p">(</span><span class="nv">elisp</span> <span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="nv">number-to-string</span> <span class="p">(</span><span class="nb">1+</span> <span class="p">(</span><span class="nb">random</span> <span class="nv">sides</span><span class="p">)))))))))))</span></code></pre></figure>
<p>
That one would be accessed from the browser with
"<code>/roll-die</code>" or "<code>/roll-die?sides=100</code>".
</p>
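<p>
The arithmetic at the heart of a die roll is worth spelling out on its
own: <code>(random n)</code> returns an integer from 0
to <code>n - 1</code>, so adding one yields a roll from 1
to <code>n</code>. The helpers below are a sketch with invented names,
and they assume the query arrives as an alist of <code>(name
value)</code> lists, as the <code>cadr</code> in the servlet suggests.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(defun my-die-sides (uri-query)
  "Return the requested number of sides from URI-QUERY, defaulting to 6."
  (string-to-number (or (cadr (assoc "sides" uri-query)) "6")))

(defun my-roll-die (sides)
  "Return a random integer from 1 to SIDES, inclusive."
  (1+ (random sides)))

(my-roll-die (my-die-sides '(("sides" "100"))))</code></pre></figure>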
<p>
Chunye provided some sample servlets that list the buffers, with links
that serve them up. There is also another servlet that will switch the
current buffer, which I find compelling. All of Emacs' functionality
is available to the servlet.
</p>
<p>
Now, to write a servlet that runs the Emacs psychiatrist ...
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  <entry>
    <title>The Emacs Calculator</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/06/23/"/>
    <id>urn:uuid:f1c60b0a-4b3b-3dd5-fad6-2cfe72c6305e</id>
    <updated>2009-06-23T00:00:00Z</updated>
    <category term="emacs"/><category term="tutorial"/>
    <content type="html">
      <![CDATA[<!-- 23 June 2009 -->
<p>
Did you know that <a
href="http://www.gnu.org/software/emacs/calc.html">Emacs comes with a
calculator</a>? Woop-dee-doo!  Call the presses! Wow, a whole
calculator!  Sounds a bit lame, right?
</p>
<p>
Actually, it's much more than just a simple calculator. It's a <a
href="http://en.wikipedia.org/wiki/Computer_algebra_system"> computer
algebra system</a>! It is officially called a calculator, which isn't
fair. It's an understatement, and I am sure has caused many people to
overlook it. I finally ran into it during a thorough (re)reading of
the Emacs manuals and almost skipped over it myself.
</p>
<p>
Ever see that demonstration by Will Wright for the game <i>Spore</i>
several years ago? The player starts as a single-cell organism and
evolves into a civilization with interstellar presence. When he
started the demo he showed a cell through what looked like a
microscope. No one had any idea yet what the game was about, so every
time he increased the scope, from bacteria to animal, animal to
civilization, civilization to space travel, interplanetary travel to
interstellar travel, there was a huge reaction from the audience. It
was like those infomercials: "But that's not all!!!"
</p>
<p>
As I made my way through the Emacs calc manual I was continually
amazed by its power, with a similar constant increase in scope. Each
new page was almost saying, "But that's not all!!!"
</p>
<p>
Like an infomercial I'm going to run through some of its features. See
the calc manual for a real thorough introduction. It has practice
exercises that show some gotchas and interesting feature
interactions.
</p>
<p>
Fire it up with <code>C-x * c</code> or <code>M-x calc</code>. There
will be two new windows (Emacs windows, that is), one with the
calculator and the other with usage history (the "trail").
</p>
<p>
First of all, the calculator operates on a stack and so its basic use
is done with RPN. The stack builds vertically, downwards. Type in
numbers and hit enter to push them onto the stack. Operators can be
typed right after the number, so no need to hit enter all the
time. Because negative (<code>-</code>) is reserved for subtraction an
underscore <code>_</code> is used to type a negative number. An
example stack with 3, 4, and 10,
</p>
<pre>
3:  3
2:  4
1:  10
    .
</pre>
<p>
10 is at the "top" of the stack (indicated by the "1:"), so if we type
a <code>*</code> the top two elements are multiplied. Like so,
</p>
<pre>
2:  3
1:  40
    .
</pre>
<p>
The calculator has no limitations on the size of integers, so you can work
with large numbers without losing precision. For example, we'll
take <code>2^200</code>.
</p>
<pre>
2:  2
1:  200
    .
</pre>
<p>
Apply the <code>^</code> operator,
</p>
<pre>
1:  1606938044258990275541962092341162602522202993782792835301376
    .
</pre>
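<p>
The same arithmetic is reachable from Emacs Lisp
through <code>calc-eval</code>, which takes a formula in algebraic
notation (not RPN) and hands the result back as a string. A small
sketch of that, mirroring the two examples above:
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl">(require 'calc)

;; The product from the first stack example, written algebraically.
(calc-eval "4 * 10")

;; Integers are arbitrary precision, so no digits are lost here.
(calc-eval "2^200")</code></pre></figure>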
<p>
But that's not all!!! It has a complex number type, which is entered
in pairs (real, imaginary) with parentheses. They can be operated on
like any other number. Take <code>-1 + 2i</code> minus <code>4 +
2i</code>,
</p>
<pre>
2:  (-1, 2)
1:  (4, 2)
    .
</pre>
<p>
Subtract with <code>-</code>,
</p>
<pre>
1:  -5
    .
</pre>
<p>
Then take the square root of that using <code>Q</code>, the square
root function.
</p>
<pre>
1:  (0., 2.2360679775)
    .
</pre>
<p>
We can set the calculator's precision with <code>p</code>. The default
is 12 places, showing here <code>1 / 7</code>.
</p>
<pre>
1:  0.142857142857
    .
</pre>
<p>
If we adjust the precision to 50 and do it again,
</p>
<pre>
2:  0.142857142857
1:  0.14285714285714285714285714285714285714285714285714
    .
</pre>
<p>
Numbers can be displayed in various notations, too, like fixed-point,
scientific notation, and engineering notation. It will switch between
these without losing any information (the stored form is separate from
the displayed form).
</p>
<p>
But that's not all!!! We can represent rational numbers precisely with
ratios. These are entered with a <code>:</code>. Push
on <code>1/7</code>, <code>3/14</code>, and <code>17/29</code>,
</p>
<pre>
3:  1:7
2:  3:14
1:  17:29
    .
</pre>
<p>
And multiply them all together, which displays in the lowest form,
</p>
<pre>
1:  51:2842
    .
</pre>
<p>
There is a mode for working in these automatically.
</p>
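<p>
The product can be reproduced from elisp as well. This sketch assumes
that <code>calc-eval</code> accepts the same colon fraction notation
in its formula string as the interactive calculator does, so the
operands stay exact ratios instead of becoming floats.
</p>
<pre>
(require 'calc)

;; Colon notation keeps each operand as an exact fraction, and the
;; result is reduced to lowest terms.
(calc-eval "1:7 * 3:14 * 17:29")
</pre>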
<p>
But that's not all!!! We can change the radix. To enter a number with
a different radix, we prefix it with the radix and a
<code>#</code>. Here is how we enter 29 in base-2,
</p>
<pre>
2#11101
</pre>
<p>
We can change the display radix with <code>d r</code>. With 29 on the
stack, here's base-4,
</p>
<pre>
1:  4#131
    .
</pre>
<p>
Base-16,
</p>
<pre>
1:  16#1D
    .
</pre>
<p>
Base-36,
</p>
<pre>
1:  36#T
    .
</pre>
<p>
But that's not all!!! We can enter algebraic expressions onto the
stack with apostrophe, <code>'</code>. Symbols can be entered as part
of the expression. Note: these expressions are not entered in RPN.
</p>
<pre>
1:  a^3 + a^2 b / c d - a / b
    .
</pre>
<p>
There is a "big" mode (<code>d B</code>) for easier reading,
</p>
<pre>
          2
     3   a  b   a
1:  a  + ---- - -
         c d    b

    .
</pre>
<p>
We can assign values to variables to have the expression evaluated. If
we assign <code>a</code> to 10 and use the "evaluates-to" operator,
</p>
<pre>
          2
     3   a  b   a             100 b   10
1:  a  + ---- - -  =>  1000 + ----- - --
         c d    b              c d    b

    .
</pre>
<p>
But that's not all!!! There is a vector type for working with vectors
and matrices and doing linear algebra. They are entered with
brackets, <code>[]</code>.
</p>
<pre>
2:  [4, 1, 5]
1:  [ [ 1, 2, 3 ]
      [ 4, 5, 6 ]
      [ 6, 7, 8 ] ]
    .
</pre>
<p>
And take the dot product, then take the cross product of this vector and matrix,
</p>
<pre>
2:  [38, 48, 58]
1:  [ [ -14, -18, -22 ]
      [ -19, -18, -17 ]
      [ 15,  18,  21  ] ]
    .
</pre>
<p>
Any matrix and vector operator you could probably think of is
available, including map and reduce (and you can define your own
expression to apply).
</p>
<p>
We can use this to solve a linear system. Find <code>x</code>
and <code>y</code> in terms of <code>a</code> and <code>b</code>,
</p>
<pre>
x + a y = 6
x + b y = 10
</pre>
<p>
Enter it (note we are using symbols),
</p>
<pre>
2:  [6, 10]
1:  [ [ 1, a ]
      [ 1, b ] ]
    .
</pre>
<p>
And divide,
</p>
<pre>
          4 a     4
1:  [6 + -----, -----]
         a - b  b - a

    .
</pre>
<p>
But that's not all!!! We can create graphs if gnuplot is installed. We
can give it two vectors, or an algebraic expression. This plot
of <code>sin(x)</code> and <code>x cos(x)</code> was made with just a
few keystrokes,
</p>
<p class="center">
<img src="/img/emacs/calc-plot.png" alt="" title="See! Pretty!"/>
</p>
<p>
But that's not all!!! There is an HMS type for handling times and
angles. For 2 hours, 30 minutes, and 4 seconds, and some others,
</p>
<pre>
3:  2@ 30' 4"
2:  4@ 22' 13"
1:  1@ 2' 56"
    .
</pre>
<p>
Of course, the normal operators work as expected. We can add them all up,
</p>
<pre>
1:  7@ 55' 13"
    .
</pre>
<p>
We can convert between this and radians, and degrees, and so on.
</p>
<p>
But that's not all!!! The calculator also has a date type, entered
inside angled brackets, <code>&lt;&gt;</code> (in algebra entry
mode). It is really flexible on input dates. We can insert the current
date with <code>t N</code>.
</p>
<pre>
1:  &lt;6:59:34pm Tue Jun 23, 2009&gt;
    .
</pre>
<p>
If we add numbers they are treated as days. Add 4,
</p>
<pre>
1:  &lt;6:59:34pm Sat Jun 27, 2009&gt;
    .
</pre>
<p>
It works with the HMS format from before too. Subtract <code>2@ 3'
15"</code>.
</p>
<pre>
1:  &lt;4:56:19pm Sat Jun 27, 2009&gt;
    .
</pre>
<p>
But that's not all!!! There is a modulo form for performing modulo
arithmetic. For example, 17 mod 24,
</p>
<pre>
1:  17 mod 24
    .
</pre>
<p>
Add 10,
</p>
<pre>
1:  3 mod 24
    .
</pre>
<p>
This is most useful for forms such as <code>n^p mod M</code>, which
this will handle efficiently. For example, <code>3^100000 mod
24</code>. The naive way would be to find <code>3^100000</code> first,
then take the modulus. This involves a computationally expensive
middle step of calculating <code>3^100000</code>, a huge number. The
modulo form does it smarter.
</p>
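<p>
I haven't checked how Calc does it internally, but the textbook trick
is square-and-multiply: reduce by the modulus after every
multiplication so the intermediate values never grow large. A sketch
in plain elisp, where <code>my-expt-mod</code> is a made-up name and
not part of Calc:
</p>
<pre>
(defun my-expt-mod (base exponent modulus)
  "Return BASE^EXPONENT mod MODULUS by repeated squaring.
Hypothetical helper for illustration; not part of Calc."
  (let ((result (mod 1 modulus))
        (base (mod base modulus)))
    (while (/= exponent 0)
      ;; Multiply in the current power when the low bit is set.
      (unless (zerop (logand exponent 1))
        (setq result (mod (* result base) modulus)))
      ;; Square the base and move on to the next bit.
      (setq base (mod (* base base) modulus)
            exponent (ash exponent -1)))
    result))

(my-expt-mod 3 100000 24)   ; =&gt; 9
</pre>
<p>
The loop runs once per bit of the exponent, about 17 times here, and
no intermediate value exceeds the square of the modulus.
</p>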
<p>
But that's not all!!! The calculator can do unit conversions. The
version of Emacs (22.3.1) I am typing in right now knows about 159
different units. For example, I push 65 mph onto the stack,
</p>
<pre>
1:  65 mph
    .
</pre>
<p>
Convert to meters per second with <code>u c</code>,
</p>
<pre>
1:  29.0576 m / s
    .
</pre>
<p>
It is flexible about mixing types of units. For example, I enter 3
cubic meters,
</p>
<pre>
       3
1:  3 m

    .
</pre>
<p>
I can convert to gallons,
</p>
<pre>
1:  792.516157074 gal
    .
</pre>
<p>
I work in a lab without Internet access during the day, so when I need
to do various conversions Emacs is indispensable.
</p>
<p>
The speed of light is also a unit. I can enter <code>1 c</code> and
convert to meters per second,
</p>
<pre>
1:  299792458 m / s
    .
</pre>
<p>
But that's not all!!! As I said, it's a computer algebra system so it
understands symbolic math. Remember those algebraic expressions from
before? I can operate on those. Let's push some expressions onto the
stack,
</p>
<pre>
3:  ln(x)

       2   a x
2:  a x  + --- + c
            b

1:  y + c

    .
</pre>
<p>
Multiply the top two, then add the third,
</p>
<pre>
                2   a x
1:  ln(x) + (a x  + --- + c) (y + c)
                     b

    .
</pre>
<p>
Expand with <code>a x</code>, then simplify with <code>a s</code>,
</p>
<pre>
                 2   a x y              2   a c x    2
1:  ln(x) + a y x  + ----- + c y + a c x  + ----- + c
                       b                      b

    .
</pre>
<p>
Now, one of the coolest features: calculus. Differentiate with respect
to x, with <code>a d</code>,
</p>
<pre>
    1             a y             a c
1:  - + 2 a y x + --- + 2 a c x + ---
    x              b               b

    .
</pre>
<p>
Or undo that and integrate it,
</p>
<pre>
                       3      2                  3        2
                  a y x    a x  y           a c x    a c x       2
1:  x ln(x) - x + ------ + ------ + c x y + ------ + ------ + x c
                    3       2 b               3       2 b

    .
</pre>
<p>
That's just awesome! That's a text editor ... doing calculus!
</p>
<p>
So, that was most of the main features. It was kind of exhausting
going through all of that, and I am only scratching the surface of
what the calculator can do.
</p>
<p>
Naturally, it can be extended with some elisp. It provides a
<code>defmath</code> macro specifically for this.
</p>
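<p>
As a rough sketch of what that looks like, here is a factorial
function modeled on the example in the Calc manual. The
name <code>myfact</code> is arbitrary, and I'm assuming a function
defined this way can be called by that name in an algebraic formula
passed to <code>calc-eval</code>.
</p>
<pre>
(require 'calc)
(require 'calc-ext)

;; `defmath' rewrites the arithmetic in the body to use Calc's own
;; arbitrary-precision routines.
(defmath myfact (n)
  (if (&gt; n 0)
      (* n (myfact (1- n)))
    1))

;; Call the new function from an algebraic formula.
(calc-eval "myfact(5)")
</pre>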
<p>
I bet (hope?) someday it will have functions for doing Laplace and
Fourier transforms.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Elisp Wishlist</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/29/"/>
    <id>urn:uuid:41fa774c-1f9e-3ef1-1029-69b775475150</id>
    <updated>2009-05-29T00:00:00Z</updated>
    <category term="rant"/><category term="emacs"/><category term="elisp"/><category term="lisp"/>
    <content type="html">
      <![CDATA[<!-- 29 May 2009 -->
<p class="abstract">
<b>Update:</b> It looks like all these wishes, except the last one,
may actually be coming
true! <a href="http://lists.gnu.org/archive/html/emacs-devel/2010-04/msg00665.html">
Guile can run Elisp better than Emacs</a>! The idea is that the Elisp
engine is replaced with Guile — the GNU project's Scheme
implementation designed to be used as an extension language — and
written in Scheme is an Elisp compiler that targets Guile's VM. The
extension language of Emacs then becomes Scheme, but Emacs is still
able to run all the old Elisp code. At the same time Elisp itself,
which I'm sure many people will continue to use, gets an upgrade of
arbitrary precision, closures, and better performance.
</p>
<p>
I've been using elisp a lot lately, but unfortunately it's missing a
lot of features that one would find in a more standard lisp. The
following are some features I wish elisp had. Many of these could be
fit into a generic "be more like Scheme or Common Lisp". Some of these
features would break the existing mountain of elisp code out there,
requiring a massive rewrite, which is likely the main reason they are
being held back.
</p>
<p>
<b>Closures</b>, and maybe continuations. Closures are one of the
features I miss the most when writing elisp. They would allow the
implementation of Scheme-style lazy evaluation with <code>delay</code>
and <code>force</code>, among other neat tools. Continuations would
just be a neat thing to have, though they come with a performance
penalty.
</p>
<p>
Closures would also pretty much require Emacs switch to lexical
scoping.
</p>
<p>
<b>Arbitrary precision</b>. Really, any high-level language's
numbers should be bignums. Emacs 22 <i>does</i> come with the Calc
package which provides arbitrary precision via
<code>defmath</code>. Perl does something like this with the bignum
module.
</p>
<p>
<b>Packages/namespaces</b>. Without namespaces, each Emacs
package prefixes its functions and variables with its own name
(e.g. <code>dired-</code>). Some real namespaces would be useful for
large projects.
</p>
<p>
<b>C interface</b>. This is something GNU Emacs will never have
because Richard Stallman considers shared library support in Emacs to
be <a href="http://www.emacswiki.org/emacs/DynamicallyExtendingEmacs">
a GPL threat</a>. If Emacs could be dynamically extended, some useful
libraries could be linked in and exposed to elisp.
</p>
<p>
<b>Concurrency</b>. If some elisp is being executed Emacs will lock
up. This is a particular problem for Gnus. Again, Emacs would really
need to switch to lexical scoping before this could happen. Threading
would be nice.
</p>
<p>
<b>Speed</b>. Emacs lisp is pretty slow, even when compiled. Lexical
scoping would help with performance (compile time vs. run time
binding).
</p>
<p>
<b>Regex type</b>. I mention this last because I think this would be
really cool, and I am not aware of any other lisps that do it. Emacs
does regular expressions with strings, which is silly and
cumbersome. Backslashes need extra escaping, for example. Instead, I
would rather have a regex type like Perl and JavaScript have. So
instead of,
</p>
<pre>
(string-match "\\w[0-9]+" "foo525")
</pre>
<p>
we have,
</p>
<pre>
(string-match /\w[0-9]+/ "foo525")
</pre>
<p>
Naturally there would be a <code>regexpp</code> predicate for checking
its type. There could also be a function for compiling a regexp from a
string into a regexp object. As a bonus, I would also like to use it
directly as a function,
</p>
<pre>
(/\w[0-9]+/ "foo525")
</pre>
<p>
I think a regexp type would really give elisp an edge, and would be
entirely appropriate for a text editor. It could also be done without
breaking anything (keep string-style regexp support).
</p>
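<p>
Short of a real regex type, the <code>rx</code> macro that comes with
Emacs at least avoids the backslash escaping. It is not a separate
type, since it just expands to an ordinary regexp string, so this is
only a partial workaround. A small comparison:
</p>
<pre>
(require 'rx)

;; String form: every backslash has to be doubled.
(string-match "\\w[0-9]+" "foo525")   ; =&gt; 2

;; The same pattern written as an s-expression, with no backslashes.
(string-match (rx (syntax word) (one-or-more (any "0-9")))
              "foo525")               ; =&gt; 2
</pre>
<p>
Both calls return 2, the index of the last "o", since that is the
first word character immediately followed by digits.
</p>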
<p>
There is more commentary over at EmacsWiki: <a
href="http://www.emacswiki.org/emacs/WhyDoesElispSuck"> Why Does Elisp
Suck</a>.
</p>
]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>Elisp Running Time Macro</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/28/"/>
    <id>urn:uuid:a693dd66-ee70-35ea-3140-6468705ee2d9</id>
    <updated>2009-05-28T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 28 May 2009 -->
<p>
I wanted an elisp macro that could measure the running time of a block
of code. Specifically, I wanted it to work like this,
</p>
<pre>
(measure-time
  <i>...
  body
  ...</i>)
</pre>
<p>
And it would return the running time as seconds in floating
point. Well, here's a macro that does it!
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="c1">;; ID: 6a3f3d99-f0da-329a-c01c-bb6b868f3239</span>
<span class="p">(</span><span class="nb">defmacro</span> <span class="nv">measure-time</span> <span class="p">(</span><span class="k">&amp;rest</span> <span class="nv">body</span><span class="p">)</span>
  <span class="s">"Measure and return the running time of the code block."</span>
  <span class="p">(</span><span class="k">declare</span> <span class="p">(</span><span class="nv">indent</span> <span class="nb">defun</span><span class="p">))</span>
  <span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">start</span> <span class="p">(</span><span class="nb">make-symbol</span> <span class="s">"start"</span><span class="p">)))</span>
    <span class="o">`</span><span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="o">,</span><span class="nv">start</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)))</span>
       <span class="o">,@</span><span class="nv">body</span>
       <span class="p">(</span><span class="nb">-</span> <span class="p">(</span><span class="nv">float-time</span><span class="p">)</span> <span class="o">,</span><span class="nv">start</span><span class="p">))))</span></code></pre></figure>
<p>
It's only good for up to around 18 hours, then the time integer
overflows. If only Emacs had arbitrary precision numbers. Here it is
in action using my <a href="/blog/2009/05/23">binomial function from
last week</a>.
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">measure-time</span>
  <span class="p">(</span><span class="nv">nck</span> <span class="mi">20</span> <span class="mi">10</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">nck</span> <span class="mi">30</span> <span class="mi">7</span><span class="p">))</span></code></pre></figure>
<p>
Which, just now, returned <code>3.643713</code> seconds when executed.
</p>
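<p>
A note on the <code>make-symbol</code> call: it creates an uninterned
symbol, so the macro's own <code>start</code> can't collide with a
variable named <code>start</code> used inside the body. Here's a small
check, with the macro repeated so the snippet runs on its own,
</p>
<pre>
(defmacro measure-time (&amp;rest body)
  "Measure and return the running time of the code block."
  (declare (indent defun))
  (let ((start (make-symbol "start")))
    `(let ((,start (float-time)))
       ,@body
       (- (float-time) ,start))))

;; The body refers to the caller's `start', not the macro's hidden one.
(let ((start 42))
  (measure-time
    (setq start (1+ start)))
  start)   ; =&gt; 43
</pre>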
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  <entry>
    <title>Emacs Web Server</title>
    <link rel="alternate" type="text/html" href="https://nullprogram.com/blog/2009/05/17/"/>
    <id>urn:uuid:1e0a3639-6df2-3c5b-3b16-bb29ac300602</id>
    <updated>2009-05-17T00:00:00Z</updated>
    <category term="emacs"/><category term="elisp"/>
    <content type="html">
      <![CDATA[<!-- 17 May 2009 -->
<p>
As part of my quest to develop solid knowledge of <a
href="http://www.gnu.org/software/emacs/">GNU Emacs</a> lisp, I have
implemented a pseudo-HTTP/1.0 web server within Emacs. Behold,
</p>
<pre>
git clone <a href="https://github.com/skeeto/emacs-http-server">git://github.com/skeeto/emacs-http-server.git</a>
</pre>
<p>
To all other non-emacsen text editors, can your text editor do that?!
Ha! Even though elisp is a slow, closure-less, dynamically scoped,
ugly cousin of more popular lisps, it's still a lot of fun to write.
</p>
<p>
To fire it up, load it into Emacs and run the extended command
(<code>M-x</code>) <code>httpd-start</code>. By default it will serve
files from "<code>~/public_html</code>". To change this, change the
variable <code>httpd-root</code> to the desired web root. You can stop
the server with <code>httpd-stop</code>.
</p>
<p>
It's about 200 lines of code and can serve static websites made of
small, static files. I say small files because it serves files from
buffers, meaning it has to read the entire file in first.
</p>
<p>
For a simple, text editor based server it can hold up to a pretty
decent load. At one point I hit it with 8 <code>wget</code> instances all making
rapid recursive downloads and my manual navigation wasn't slowed down
noticeably. Despite running in the slow elisp interpreter, I think it
can have much better performance by caching commonly served files in
buffers.
</p>
<p>
It <i>should</i> run, unmodified, anywhere a modern Emacs can run, so
I expect that it's already very portable. I can imagine it being
useful in a situation where someone needs to temporarily host some
files but there isn't a web server on the machine. Just grab this
script and throw it at Emacs.
</p>
<p>
Well, it only does IPv4 right now, though I expect IPv6 only requires
changing one number (namely, 4 to 6). I don't have any IPv6 systems to
test it on.
</p>
<p>
When writing it I also had security in mind so, as far as I know, it
should be safe to use. It cleans up the <code>GET</code> from the
client so that no files outside the serving root can be accessed.
</p>
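<p>
To illustrate the idea of keeping requests inside the serving root,
here is one way such a check could be written. This is a sketch and
not the server's actual code: <code>my-httpd-safe-path</code> is a
made-up name, and it doesn't account for symbolic links.
</p>
<pre>
(defun my-httpd-safe-path (root request-path)
  "Map REQUEST-PATH to a file under ROOT, or return nil if it escapes ROOT.
Hypothetical helper for illustration; not the server's actual code."
  (let* ((root (file-name-as-directory (expand-file-name root)))
         ;; Prefix "." so the leading slash is taken relative to ROOT.
         (file (expand-file-name (concat "." request-path) root)))
    ;; `expand-file-name' resolves ".." lexically, so a prefix check
    ;; catches any path that climbed out of ROOT.
    (when (string-prefix-p root file)
      file)))

(my-httpd-safe-path "/srv/www" "/0001.html")      ; =&gt; "/srv/www/0001.html"
(my-httpd-safe-path "/srv/www" "/../etc/passwd")  ; =&gt; nil
</pre>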
<p>
The server log is lisp itself. Here is an example log starting the
server, serving one request, and halting,
</p>
<pre>
'(log
  (start "Wed May 13 23:33:34 2009")
  (connection
   (date "Wed May 13 23:36:25 2009")
   (address "192.168.0.3")
   (get "/0001.html")
   (req
    ("Referer" "http://192.168.0.2:8080/")
    ("Connection" "keep-alive")
    ("Keep-Alive" "300")
    ("Accept-Charset" "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
    ("Accept-Encoding" "gzip,deflate")
    ("Accept-Language" "en-us,en;q=0.5")
    ("Accept" "image/png,image/*;q=0.8,*/*;q=0.5")
    ("User-Agent" "Mozilla/5.0 [...] Iceweasel/3.0.9 (Debian-3.0.9-1)")
    ("Host" "192.168.0.2:8080")
    ("GET" "/0001.html" "HTTP/1.1"))
   (path "~/public_html/0001.html")
   (status 200))
  (stop "Wed May 13 23:38:17 2009"))
</pre>
<p>
The log is an alist of alists, making a hierarchical tree structure that
can be explored with some simple lisp functions. Normally this sort of
thing is done with XML, but lisp already has its own structured
format: lists!
</p>
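<p>
For instance, pulling the requested path and status out of a
connection entry takes nothing more than <code>assq</code>. The
helper <code>my-log-field</code> and the trimmed-down
<code>my-sample-log</code> below are made up for this example,
</p>
<pre>
(defvar my-sample-log
  '(log
    (start "Wed May 13 23:33:34 2009")
    (connection
     (date "Wed May 13 23:36:25 2009")
     (address "192.168.0.3")
     (get "/0001.html")
     (status 200))
    (stop "Wed May 13 23:38:17 2009"))
  "A shortened copy of the example log, for illustration only.")

(defun my-log-field (entry key)
  "Return the first value stored under KEY in ENTRY, a list of (KEY VALUE...)."
  (cadr (assq key entry)))

;; Skip the leading `log' symbol, then look up the connection entry.
(let ((connection (cdr (assq 'connection (cdr my-sample-log)))))
  (list (my-log-field connection 'get)
        (my-log-field connection 'status)))   ; =&gt; ("/0001.html" 200)
</pre>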
<p>
When <code>GET</code> is a directory, it looks for
"<code>index.html</code>" and serves that if it exists. More indexes
can be added to the variable <code>httpd-indexes</code>. This can
actually be done in a special "<code>.htaccess.el</code>" file.
</p>
<p>
If a "<code>.htaccess.el</code>" exists in the directory from which a
file is being served, Emacs will first load/execute it. You see, it's
just a lisp program. If you wanted to add a new index file name, the
hypertext access file could contain this,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nv">add-to-list</span> <span class="ss">'httpd-indexes</span> <span class="s">"0001.html"</span><span class="p">)</span></code></pre></figure>
<p>
It's a bit like a <code>.emacs</code> file.
</p>
<p>
But I think one of the coolest things about having a lisp-based server
is that the server can be modified in place without disrupting or
restarting it. In my Emacs web server, the only change that requires a
restart is changing the server port. In fact, I wrote most of it while
the server was running and tested my changes from a browser right as I
made them — all on the same instance of the server.
</p>
<p>
If you want to look into the AI side of this, the server could modify
its own code in response to its use.
</p>
<p>
I also had the idea of creating dynamic websites with elisp, in the
same way PHP or Perl does. If a <code>.el</code> file (or
<code>.elc</code>) is accessed, the server would pass the
<code>GET</code>/<code>POST</code> arguments as an alist to a function
in the elisp file. The server would also provide some nifty HTML
generation macros. A dynamic script might look like this,
</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><span class="p">(</span><span class="nb">defun</span> <span class="nv">script</span> <span class="p">(</span><span class="nb">get</span><span class="p">)</span>
  <span class="p">(</span><span class="nv">html</span>
   <span class="p">(</span><span class="nv">head</span>
    <span class="p">(</span><span class="nv">title</span> <span class="s">"My Script"</span><span class="p">))</span>
   <span class="p">(</span><span class="nv">body</span>
    <span class="p">(</span><span class="nv">h1</span> <span class="s">"Your Query"</span><span class="p">)</span>
    <span class="p">(</span><span class="nv">p</span> <span class="p">(</span><span class="nv">concat</span> <span class="s">"Your query was "</span>
               <span class="p">(</span><span class="nv">html-sanitize</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">assoc</span> <span class="s">"q"</span> <span class="nb">get</span><span class="p">))</span> <span class="s">"."</span><span class="p">))))))</span></code></pre></figure>
<p>
However, this is not (yet?) implemented. Just an idea.
</p>
<p>
I will continue to work on it, though I don't expect to add much more
to it. I will mostly improve the code and documentation.
</p>
]]>
    </content>
  </entry>
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  
    
  

</feed>
